
Better Online Citations – Details Part 2 (GEDCOM)

Sunday, 3 May 2009 | by Mark Tucker

GEDCOM support by Legacy 7, RootsMagic 4, and Family Tree Maker 2009

In Better Online Citations – Details Part 1 we examined how the QuickCheck model for “Book: Basic format” from Evidence Explained was coded in Family Tree Maker 2009, Legacy 7, and RootsMagic 4. From the screens we were able to identify implementation differences between the three applications. There are also differences between the applications in how citation information is conveyed via a GEDCOM export. The individual fields shown on the template screens are lost in the standard GEDCOM export, making it impossible to create a rich EE-style citation in one application, export it to GEDCOM, and import it into another application while retaining that richness. In all cases (except when both the exporter and the importer of the GEDCOM are RootsMagic 4), the citation is changed from a “Book: Basic format” to a generic “old-style” (pre-EE) format with important details lost.

In the previous post I skipped all details of the file format needed to support online downloadable source citations. From my own observations, from survey feedback, and from comments on this blog and through e-mail, there are two main camps when it comes to this topic. One group feels that the best approach is an extension to the existing GEDCOM 5.5 standard (which was released in 1996). They feel that it is the best choice because it lowers the barrier to adoption and avoids YAFF (Yet Another File Format). See the comments from Tamura Jones on Part 1. The second group is open to a new file format based on XML (Extensible Markup Language), which has wide support among programming languages. As a programmer I lean slightly to the side of XML, and that is what I used in the prototype shown in the video. But I am open to either view.

In that spirit of openness, I will first look at how the three applications that support EE-style citations represent those citations in an exported GEDCOM file. In this post, we will look at sections of GEDCOM files, which is a little technical, but there will be plenty of explanation for readers of all levels. In the previous post we saw differences in how each application implemented the EE-style QuickCheck model, and this post will show differences in how each exports them. You will also see some forethought in the export from one vendor.

For those who have never seen a GEDCOM file, each line starts with a number: 0, 1, 2, etc. A zero is the beginning of a new record and higher numbers are “nested” under lower numbers. It is a way of grouping information together. In the GEDCOM fragments I show here, I will indent the lines to make them easier to understand. I have also done some slight rearranging of lines to make the comparisons easier. After the number, each line has a tag which is a shortened identifier of the information contained on that line. You can see definitions of the standard tags here.
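For readers who like to see this as code, here is a minimal Python sketch of splitting one GEDCOM line into its parts. The names GedcomLine and parse_line are my own for illustration and do not come from any of the three applications. It assumes well-formed lines and tolerates the indentation I add to the fragments in this post.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # 0 starts a new record; higher numbers nest
    xref: Optional[str]   # record id such as "@I1@", only on record lines
    tag: str              # shortened identifier such as NAME or SOUR
    value: str            # rest of the line, "" when absent


# level, optional @XREF@, tag, optional value
_LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_line(raw: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional record id, tag and value."""
    match = _LINE.match(raw.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


print(parse_line("0 @I1@ INDI"))
print(parse_line("\t1 NAME Worth /Tucker/"))
```

A pointer such as "2 SOUR @S1@" is parsed with the id as the value, since the id only counts as a record id when it sits between the level and the tag.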

A GEDCOM file starts with a header that describes the software application that generated the file as well as the version of the GEDCOM standard:

Family Tree Maker 2009

0 HEAD
	1 SOUR FTM
		2 VERS Family Tree Maker (18.0.0.305)
		2 NAME Family Tree Maker for Windows
		2 CORP The Generations Network
	1 DEST GED55
	1 GEDC
		2 VERS 5.5
		2 FORM LINEAGE-LINKED

Legacy 7

0 HEAD
	1 SOUR Legacy
		2 VERS 7.0
		2 NAME Legacy (R)
		2 CORP Millennia Corp.
	1 DEST Gedcom55
	1 GEDC
		2 VERS 5.5
		2 FORM LINEAGE-LINKED

RootsMagic 4

0 HEAD
	1 SOUR RootsMagic
		2 VERS 4.0
		2 NAME RootsMagic
		2 CORP RootsMagic, Inc.
	1 DEST RootsMagic
	1 GEDC
		2 VERS 5.5.1
		2 FORM LINEAGE-LINKED

The first line starts the header record, which contains a source program (SOUR) that generated the file and a destination program or format (DEST) that will use the file. The level under source shows the program, version and company information. The GEDC or GEDCOM tag identifies that the file adheres to the specified GEDCOM version and format. FTM and Legacy show 5.5 whereas RM shows 5.5.1. GEDCOM 5.5.1 was published as a draft in 1999, but the changes between it and GEDCOM 5.5 do not affect our discussion. The rest of the header record was removed as it is not important for our comparison. Even though the numbering and abbreviations take a little getting used to, the file format is pretty straightforward. The rest of the file is just as understandable.
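To show how the nesting carries meaning, here is a small sketch that pulls the program and GEDCOM version out of a header. It keeps a list of the tags currently "open" at each level, so VERS under SOUR can be told apart from VERS under GEDC. The function name header_summary and the dictionary keys are my own choices, and the sketch assumes levels never skip a number.

```python
def header_summary(lines):
    """Collect a few HEAD values, using nesting to tell the two VERS lines apart."""
    wanted = {
        "HEAD/SOUR": "source",
        "HEAD/SOUR/VERS": "source_version",
        "HEAD/DEST": "destination",
        "HEAD/GEDC/VERS": "gedcom_version",
    }
    summary = {}
    path = []  # tags of the enclosing lines, one entry per level
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        del path[level:]      # close anything at this level or deeper
        path.append(tag)
        key = "/".join(path)
        if key in wanted:
            summary[wanted[key]] = value
    return summary


rootsmagic_head = [
    "0 HEAD",
    "\t1 SOUR RootsMagic",
    "\t\t2 VERS 4.0",
    "\t\t2 NAME RootsMagic",
    "\t\t2 CORP RootsMagic, Inc.",
    "\t1 DEST RootsMagic",
    "\t1 GEDC",
    "\t\t2 VERS 5.5.1",
    "\t\t2 FORM LINEAGE-LINKED",
]
print(header_summary(rootsmagic_head))
```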

As you might remember from the video, we have my great grandfather Worth Tucker who owned property in Elmo, Emery, Utah. The source of this information was a book. From the book we have 4 images: three that go with the source entry and one for a specific page that should be associated with the citation or source detail. Lastly, there is an extract from a page added as the citation text.

Here is what part of that looks like:

Family Tree Maker 2009

0 @I00001@ INDI
	1 NAME Worth /Tucker/
	1 SEX U

Legacy 7

0 @I1@ INDI
	1 NAME Worth /Tucker/
		2 GIVN Worth
		2 SURN Tucker
	1 SEX U

RootsMagic 4

0 @I1@ INDI
	1 NAME Worth /Tucker/
		2 GIVN Worth
		2 SURN Tucker

These next lines start an individual (INDI) record (notice the 0 prefix). The ID between @ characters (e.g., @I00001@) uniquely identifies this person record from any other person records in the file. The name of the person is indicated in one or two ways. The NAME tag requires the full name with the surname between “/” characters. The name can also be broken into given name and surname as represented by GIVN and SURN. Some applications set the gender or sex to U for unknown, while others don’t include the tag unless it is set.
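Since FTM exports only the NAME line, an importer has to recover the given name and surname from the slashes itself. Here is a rough sketch of that step. The function name split_name is mine, and folding any text after the closing slash (such as a suffix) into the given name is a simplifying assumption, not how any of these applications necessarily handles it.

```python
import re


def split_name(name_value: str):
    """Split a NAME value such as 'Worth /Tucker/' into (given, surname)."""
    match = re.match(r"^(.*?)/([^/]*)/(.*)$", name_value)
    if match is None:
        # No slashes: we cannot tell which part is the surname.
        return name_value.strip(), ""
    before, surname, after = match.groups()
    # Assumption: text after the surname is kept with the given name.
    given = " ".join((before + " " + after).split())
    return given, surname.strip()


print(split_name("Worth /Tucker/"))
```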

Included in the individual record is the property ownership event:

Family Tree Maker 2009

1 EVEN 80 acres
	2 TYPE Property
	2 DATE 1908
	2 PLAC Elmo, Emery, Utah

Legacy 7

1 EVEN 80 acres
	2 TYPE Property
	2 DATE 1908
	2 PLAC Elmo, Emery, Utah

RootsMagic 4

1 PROP 80 acres
	2 DATE 1908
	2 PLAC Elmo, Emery, Utah

FTM and Legacy choose to use a generic event tag (EVEN) with a corresponding qualifying type of “Property”, whereas RM simplifies it by using the property tag (PROP). Both are equivalent. Each includes the property description of “80 acres” and the date and place of ownership.
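An importer that wants to treat the two spellings as the same fact could normalize them along these lines. This is only a sketch: the function name normalize_property and the shape of the returned dictionary are assumptions of mine, and it expects just the lines of one event as they appear in the fragments above.

```python
def normalize_property(lines):
    """Return one common shape for 'EVEN + TYPE Property' and 'PROP' facts."""
    tag = description = ""
    details = {}
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, line_tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 1:
            tag, description = line_tag, value
        elif level == 2:
            details[line_tag] = value
    is_property = tag == "PROP" or (
        tag == "EVEN" and details.get("TYPE", "").lower() == "property"
    )
    if not is_property:
        return None
    return {
        "kind": "property",
        "description": description,
        "date": details.get("DATE", ""),
        "place": details.get("PLAC", ""),
    }


ftm_event = ["1 EVEN 80 acres", "\t2 TYPE Property", "\t2 DATE 1908", "\t2 PLAC Elmo, Emery, Utah"]
rm_event = ["1 PROP 80 acres", "\t2 DATE 1908", "\t2 PLAC Elmo, Emery, Utah"]
print(normalize_property(ftm_event) == normalize_property(rm_event))
```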

GEDCOM supports the concept of a master source with its information and multiple citation details corresponding to an event. In part 1, this concept was shown in the application screens in two screens (FTM 2009 and Legacy 7) or a single screen with different colored top and bottom sections (RootsMagic 4). The way this is represented in the GEDCOM format is to have a separate source record with a unique id and to reference that source record for the specific event being cited. Additional citation details are then given. Notice that the section below starts at level 2 and appears in the file at the same level as PLAC and right below it. This signifies that what is being cited is the parent at level 1, the property ownership event.

Family Tree Maker 2009

2 SOUR @S00002@
	3 PAGE 179
	3 DATA
		4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucker purchased
			5 CONC eighty acres of an adjacent school section. These properties became
			5 CONC the Elmo townsite, platted into lots that were sold to prospective
			5 CONC residents for $10.

Legacy 7

2 SOUR @S4@
	3 PAGE 179.
	3 DATA
		4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucke
			5 CONC r purchased eighty acres of an adjacent school section. The
			5 CONC se properties became the Elmo townsite, platted into lots t
			5 CONC hat were sold to prospective residents for $10.
	3 OBJE
		4 FORM jpg
		4 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-page179.jpg
		4 _SCBK Y
		4 _PRIM Y
		4 _TYPE PHOTO

RootsMagic 4

2 SOUR @S1@
	3 PAGE 179
	3 DATA
		4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucker purchased e
			5 CONC ighty acres of an adjacent school section. These properties became the E
			5 CONC lmo townsite, platted into lots that were sold to prospective residents f
			5 CONC or $10.
	3 OBJE
		4 FORM jpg
		4 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-page179.jpg
		4 _SCBK Y
		4 _PRIM Y
		4 _TYPE PHOTO
	3 _TMPLT
		4 FIELD
			5 NAME Page
			5 VALUE 179

The source tag (SOUR) here references the source record with the unique id between the “@” characters. We will look at its details in the next section. The level 3s and higher are for the citation detail. This citation is from page 179 of the source, and text from that page is represented by the TEXT tag under the DATA tag. The concatenation tag (CONC) is used to break up lengthy text onto multiple lines. Both Legacy 7 and RootsMagic 4 export the directory path to the image file of page 179. This is done with the object (OBJE) tag, whose FORM and FILE lines give the file format (jpg) and the path. The next three tags begin with an underscore character “_”, which means that they are custom extensions to GEDCOM made by applications and that other applications are not required to support them. It appears that both Legacy and RootsMagic have chosen to support these tags. The _SCBK tag indicates if this image should appear in the scrapbook or media viewer inside the application, with a Y value signifying “yes”. The _PRIM tag indicates if this is the primary image, which is shown in the application in situations where only one image about the citation is shown. The _TYPE tag identifies this as being a photograph as opposed to some other media type. RootsMagic 4 has done something above and beyond the other vendors. It supports a custom template tag (_TMPLT) which we will investigate in a minute.
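Putting the extract back together on import is simple: in GEDCOM, a CONC value is appended directly to the text before it with no space added, while the related CONT tag starts a new line. That is why the Legacy and RootsMagic fragments can break in the middle of a word. Here is a minimal sketch, with join_text being my own name for the helper and the input being just the TEXT line and the lines under it.

```python
def join_text(lines):
    """Rebuild a long value from a TEXT line and its CONC/CONT lines."""
    text = ""
    for raw in lines:
        # Strip only the leading indent; trailing spaces belong to the value.
        parts = raw.lstrip().rstrip("\r\n").split(" ", 2)
        if len(parts) < 2:
            continue
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag in ("TEXT", "CONC"):
            text += value          # CONC adds no space
        elif tag == "CONT":
            text += "\n" + value   # CONT starts a new line
    return text


rm_text = [
    "\t\t4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucker purchased e",
    "\t\t\t5 CONC ighty acres of an adjacent school section. These properties became the E",
    "\t\t\t5 CONC lmo townsite, platted into lots that were sold to prospective residents f",
    "\t\t\t5 CONC or $10.",
]
print(join_text(rm_text))
```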

Before we continue it might be helpful to review page 646 from Evidence Explained to see how the Book: Basic format is represented as a source list entry, full reference note, and short reference note:

Evidence Explained - Book Basic Format - Source List Entry

Evidence Explained - Book Basic Format - Full Reference Note

Evidence Explained - Book Basic Format - Short Note

Take just a minute to examine the fields for each type, the formatting, as well as the ordering and contents of the author field.

The final section of the GEDCOM file that we will examine is the actual source record:

Family Tree Maker 2009

0 @S00002@ SOUR
	1 TITL Geary, Edward A., A History of Emery County
	1 NOTE
		2 CONC Geary, Edward A..  A History of Emery County:  .  Salt Lake City:
		2 CONC Utah State Historical Society, 1996.

Legacy 7

0 @S4@ SOUR
	1 ABBR History of Emery County
	1 TITL A History of Emery County
	1 AUTH Edward A. Geary
	1 PUBL Salt Lake City: Utah State Historical Society, 1996.
	1 OBJE
		2 FORM jpg
		2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-frontcover.jpg
		2 _SCBK Y
		2 _PRIM Y
		2 _TYPE PHOTO
	1 OBJE
		2 FORM jpg
		2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-titlepage.jpg
		2 _SCBK Y
		2 _TYPE PHOTO
	1 OBJE
		2 FORM jpg
		2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-copyrightpage.jpg
		2 _SCBK Y
		2 _TYPE PHOTO

RootsMagic 4

0 @S1@ SOUR
	1 ABBR History of Emery County
	1 TITL Edward A. Geary, A History of Emery County (Salt Lake City: Utah S
		2 CONC tate Historical Society, 1996), [Page].
	1 _SUBQ Edward A. Geary, A History of Emery County, [Page].
	1 _BIBL Edward A. Geary. A History of Emery County. Salt Lake City: Utah S
		2 CONC tate Historical Society, 1996.
	1 OBJE
		2 FORM jpg
		2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-frontcover.jpg
		2 _SCBK Y
		2 _PRIM Y
		2 _TYPE PHOTO
	1 OBJE
		2 FORM jpg
		2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-titlepage.jpg
		2 _SCBK Y
		2 _PRIM N
		2 _TYPE PHOTO
	1 OBJE
		2 FORM jpg
		2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-copyrightpage.jpg
		2 _SCBK Y
		2 _PRIM N
		2 _TYPE PHOTO
	1 _TMPLT
		2 TID 372
		2 FIELD
			3 NAME Author
			3 VALUE Edward A. Geary
		2 FIELD
			3 NAME Title
			3 VALUE A History of Emery County
		2 FIELD
			3 NAME SubTitle
		2 FIELD
			3 NAME PubPlace
			3 VALUE Salt Lake City
		2 FIELD
			3 NAME Publisher
			3 VALUE Utah State Historical Society
		2 FIELD
			3 NAME PubDate
			3 VALUE 1996

FTM takes a shortcut: it formats the source as a source list entry and puts that text in the source note (NOTE). The title (TITL) comes from how FTM automatically named the source when it was created. There are two things to note about the NOTE text: 1) it does not have a way to indicate that the book title should be italicized and 2) there appear to be extra periods and spaces in it.

Legacy 7 takes the approach of trying to stuff the EE-style citation into the fewer fields available for the “old-style” citation. There are 6 parts of a source list entry citation for a basic format book:

  1. Author
  2. Main Title
  3. Sub Title
  4. Place of publication
  5. Publisher
  6. Year

Legacy 7 implements the basic book template with the following fields (the last two pertaining to the citation detail):

  1. Author Last Name
  2. Author Given Name(s)
  3. Author Suffix
  4. Title
  5. Short Title
  6. Publisher City
  7. Publisher State
  8. Publisher
  9. Publish Date
  10. Volume Data
  11. Page
  12. Volume

The standard fields available in GEDCOM are:

  1. Title
  2. Author
  3. Publication

So Legacy makes the following matches:

  • GEDCOM Title = Title
  • GEDCOM Author = Author Given Name(s) + Author Last Name + Author Suffix
  • GEDCOM Publication = Publisher City + Publisher State + Publisher + Publish Date

Note that some formatting from the source list entry survives and can be seen in the PUBL tag: it follows the order of the fields and has the colon and comma in the correct locations. Also, the abbreviation (ABBR) tag is used to name the source in the master list after it is imported. The rest of the GEDCOM contents from the Legacy 7 file specify the 3 media files associated with the source. Nothing new there.
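The matches above can be sketched as a small function that folds the template fields into the four tags Legacy writes. This is my reconstruction from the exported file, not Legacy's actual code: the dictionary keys are hypothetical, and how a suffix or a publisher state would be joined in is a guess, since both are empty in this example.

```python
def legacy_style_source(fields: dict) -> dict:
    """Fold EE-style book fields into ABBR, TITL, AUTH and PUBL values."""
    author_parts = (fields.get("given", ""), fields.get("last", ""), fields.get("suffix", ""))
    author = " ".join(part for part in author_parts if part)
    # Assumption: city and state are joined with a comma when both are present.
    place_parts = (fields.get("city", ""), fields.get("state", ""))
    place = ", ".join(part for part in place_parts if part)
    publication = f"{place}: {fields.get('publisher', '')}, {fields.get('date', '')}."
    return {
        "ABBR": fields.get("short_title", ""),
        "TITL": fields.get("title", ""),
        "AUTH": author,
        "PUBL": publication,
    }


book = {
    "last": "Geary",
    "given": "Edward A.",
    "title": "A History of Emery County",
    "short_title": "History of Emery County",
    "city": "Salt Lake City",
    "publisher": "Utah State Historical Society",
    "date": "1996",
}
for tag, value in legacy_style_source(book).items():
    print(f"1 {tag} {value}")
```

Going the other way is the hard part: once the values are merged into AUTH and PUBL, an importer can only guess where one field ended and the next began.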

I have yet to do additional experiments to determine how the translation to “old style” citations works with more complicated citation formats.

Finally we look at RootsMagic 4. It uses the abbreviation and object tags in the same way as Legacy 7, but some interesting things are happening in the rest of the record. Notice that the title tag follows the format for a full reference note, complete with parentheses, commas, and colons. The title is placed between the special formatting tags < i > and < /i > to indicate that it should be italicized. Where the page number should go is the textual placeholder “[Page]”. The custom subsequent tag (_SUBQ) contains the short note format, although it should contain just the author’s last name rather than the full name. The custom bibliography tag (_BIBL) contains the source list entry format. It appears that a bug in the export is causing the bibliography entry to not show the author with last name first. It is important to note that any application that imports a RM4-generated GEDCOM will get only the contents of the title tag, and the user will have to manually edit it to remove the italicization indicators, which those applications don’t support.

Now let’s get to the part where RootsMagic 4 has shown some innovation in their GEDCOM. Remember the custom template (_TMPLT) tag we saw for the citation:

3 _TMPLT
	4 FIELD
		5 NAME Page
		5 VALUE 179

There is also one in the source:

1 _TMPLT
	2 TID 372
	2 FIELD
		3 NAME Author
		3 VALUE Edward A. Geary
	2 FIELD
		3 NAME Title
		3 VALUE A History of Emery County
	2 FIELD
		3 NAME SubTitle
	2 FIELD
		3 NAME PubPlace
		3 VALUE Salt Lake City
	2 FIELD
		3 NAME Publisher
		3 VALUE Utah State Historical Society
	2 FIELD
		3 NAME PubDate
		3 VALUE 1996

Now compare that with the source entry screen:

RootsMagic 4 - Basic Book Citation

Notice that in the yellow Master Source section, there are 6 entry fields: Author, Title, Sub-title, Publish Place, Publisher, and Publish Date. These correspond to the 6 template (_TMPLT) field name entries in the GEDCOM: Author, Title, SubTitle, PubPlace, Publisher, and PubDate. In the green Source Details section, Page corresponds to the single field name entry in the citation section: Page. The VALUE tags contain the actual values. That way the knowledge of individual fields and their values is not lost. Completing this is the template id or TID tag, a unique number used internally by RootsMagic 4 to always refer to this template. That is why you can never edit existing templates in RootsMagic 4.
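Because the field names and values are kept separate, reading them back is straightforward. Here is a sketch of how an importer might turn a _TMPLT block into a template id and a dictionary of fields. The function name template_fields is my own, and it assumes it is handed only the lines of one _TMPLT block in the layout shown above.

```python
def template_fields(lines):
    """Return (template_id, {field name: value}) from one _TMPLT block."""
    template_id = None
    fields = {}
    current_name = None
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag == "TID":
            template_id = value
        elif tag == "NAME":
            current_name = value
            fields[current_name] = ""   # a field may have no VALUE line
        elif tag == "VALUE" and current_name is not None:
            fields[current_name] = value
    return template_id, fields


source_template = [
    "1 _TMPLT",
    "\t2 TID 372",
    "\t2 FIELD",
    "\t\t3 NAME Author",
    "\t\t3 VALUE Edward A. Geary",
    "\t2 FIELD",
    "\t\t3 NAME Title",
    "\t\t3 VALUE A History of Emery County",
    "\t2 FIELD",
    "\t\t3 NAME SubTitle",
    "\t2 FIELD",
    "\t\t3 NAME PubPlace",
    "\t\t3 VALUE Salt Lake City",
    "\t2 FIELD",
    "\t\t3 NAME Publisher",
    "\t\t3 VALUE Utah State Historical Society",
    "\t2 FIELD",
    "\t\t3 NAME PubDate",
    "\t\t3 VALUE 1996",
]
print(template_fields(source_template))
```

The empty SubTitle comes back as an empty string, so the importer still knows the field exists even though nothing was entered.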

Here are the details of the template for id 372 as shown in the Source Templates screen:

RootsMagic 4 Source Template for Book

It is interesting that each field is given a type to indicate if it is a Name, Place, Date, or Text. This could come in handy in future situations. Imagine searching all sources not just for the text “White” but for all sources that contain a name that contains “White.” Searches like that would return more appropriate results.
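To make the “White” example concrete, here is a sketch of a type-aware search. Everything in it is illustrative: the FIELD_TYPES mapping is my guess at how the book template's fields might be typed, the function name is mine, and the second and third sources are made up purely to show the difference between a name match and a title match.

```python
# Assumed field types for the book template (not taken from the export).
FIELD_TYPES = {
    "Author": "Name",
    "Title": "Text",
    "SubTitle": "Text",
    "PubPlace": "Place",
    "Publisher": "Text",
    "PubDate": "Date",
}


def sources_with_name(sources, needle):
    """Return ids of sources where a Name-typed field contains the needle."""
    needle = needle.lower()
    return [
        source_id
        for source_id, fields in sources.items()
        if any(
            FIELD_TYPES.get(name) == "Name" and needle in value.lower()
            for name, value in fields.items()
        )
    ]


sources = {
    "@S1@": {"Author": "Edward A. Geary", "Title": "A History of Emery County"},
    "@S2@": {"Author": "Ellen White", "Title": "Pioneer Letters"},       # made up
    "@S3@": {"Author": "John Smith", "Title": "The White Mountains"},    # made up
}
print(sources_with_name(sources, "White"))
```

A plain text search would also return the third source because of its title, which is exactly the kind of noise typed fields could remove.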

So what we have discovered is that the three applications that currently support EE-style templates do so slightly differently on the input side (part 1) and vary greatly when it comes to GEDCOM output. As it stands today, much is lost in the GEDCOM export, which renders rich citations into blobs of text. RootsMagic 4 solves this problem in a proprietary way using its own template id and template field names. Currently no real interoperability exists between these applications when it comes to EE-style source citations.

This post is already long enough and I will likely expound on my ideas in a follow-up post. But imagine the RM4 implementation standardized and universally accepted. What a world of interoperability that would open up!

There is so much to think about. What do you think?

7 Comments »

  1. Mark,

    I’ll save some quibbles about software architecture approaches and supposed camps for later, and focus on what you’ve shown here.

    In this post you’ve examined the current quality of the GEDCOM output with regard to EE-citations. A very quick summary might be that the FTM export is poor, Legacy 7 export is mediocre and RM 4 export is good.

    That the citations do not transfer well from one program to another is a foregone conclusion, we’d need some kind of GEDCOM standard for EE-style citations, and there is none yet.
    I’d still be interested to hear how well these three programs import each other’s citations, to get an even better idea of the status quo.

    The more intriguing question is this; how well does each application import its own citations back in again?
    That test relates to this fundamental consideration: is it reasonable to say that an application really supports citations if even that application itself cannot import its own exported citations back in without loss?

    Another relevant question is how well their export matches the existing GEDCOM standard?
    That question directly relates to another practical issue: how well do other genealogy applications import their current EE-style citations?

    - Tamura

    Comment by Tamura Jones — 3 May 2009 @ 11:11 am

  2. My tree on WorldConnect is now a mess. I tried the importing into another program and exporting to GedCom in a failed attempt to “clean up” some of the mess. So I can answer that. It continues to make bigger and bigger messes.

    Comment by Sheri — 4 May 2009 @ 7:24 pm

  3. Sheri,

    This is because some sites and programs don’t know how to ignore user defined tags as detailed in the GEDCOM specs. We will be releasing an update that will have an option to strip the user defined tags out of a GEDCOM to work with those systems that don’t know how to ignore user defined tags.

    In the meantime, WorldConnect has an option where you can specify tags to ignore. You can reprocess using the advanced option and remove these custom tags in the gedcom:

    _UID, _SDATE, _BIBL, _SUBQ, _TMPLT, _COLOR

    - Bruce

    Comment by Bruce Buzbee — 7 May 2009 @ 7:26 am

  4. Bruce,
    I am so glad I came back here to see if anything had been posted. I really love RootsMagic 4 and hated to have to go back. I thought I’d check to see if anyone had an idea. I didn’t know I could remove tags in WC. I will do that until the update.
    Thank you for taking the time to tell me that. I’m going to give it a try when I get home tonight.
    Sheri

    Comment by Sheri — 7 May 2009 @ 8:28 am

  5. [...] Online Sources.  Some of the suggestions that came from the survey and posts Details Part 1 and Details Part 2 (GEDCOM) was why not use an existing [...]

    Pingback by Better Online Citations – Details Part 3 (MARC) | ThinkGenealogy — 20 Jun 2009 @ 12:08 am

  6. [...] cite online sources (Part 1), how citation information can be stored as a file using GEDCOM format (Part 2) and MARC format (Part 3). This post takes the next logical step and discusses MARC [...]

    Pingback by Better Online Citations – Details Part 4 (MARC XML) | ThinkGenealogy — 20 Jun 2009 @ 11:24 am

  7. Hi Mark, great article that raises more questions than answers for frustrated family historians, which is a good thing.

    I have one to add:

    How I wonder could Turnitin.com.au (UN of South Australia’s Baby) be applied to citations in GEDCOM files, and would that work after importation or before, or both?

    I, too, am continually frustrated with acceptable formats across the globe. cheers sonya
    PS My website is still under construction

    Comment by sonya staffolani — 8 Nov 2010 @ 5:31 am


Copyright 2010 Mark Tucker. All rights reserved.