In Better Online Citations – Details Part 1 we examined how the QuickCheck model for “Book: Basic format” from Evidence Explained was coded in Family Tree Maker 2009, Legacy 7, and RootsMagic 4. From the screens we were able to identify implementation differences between the three applications. The applications also differ in how they convey citation information through a GEDCOM export. The individual fields shown on the template screens are lost in the standard GEDCOM export, making it impossible to create a rich EE-style citation in one application, export it to GEDCOM, and import it into another application while retaining that richness. In all cases (except when both the exporter and the importer of the GEDCOM are RootsMagic 4), the citation is changed from a “Book: Basic format” to a generic “old-style” (pre-EE) format with important details lost.
In the previous post I skipped all details of the file format needed to support online downloadable source citations. From my own observations, survey feedback, comments on this blog, and e-mail, there are two main camps when it comes to this topic. One group feels that the best approach is an extension to the existing GEDCOM 5.5 standard (which was released in 1996). They feel that it is the best choice to lower the barrier to adoption and avoid YAFF (Yet Another File Format). See the comments from Tamura Jones on Part 1. The second group is open to a new file format based on XML (Extensible Markup Language), which has wide support among programming languages. As a programmer I lean slightly to the side of XML, and that is what I used in the prototype shown in the video. But I am open to either view.
In that spirit of openness, I will first look at how the three applications that support EE-style citations represent those citations in an exported GEDCOM file. In this post, we will look at sections of GEDCOM files. That is a little technical, but there will be plenty of explanation for readers of all levels. In the previous post we saw differences in how each application implemented the EE-style QuickCheck model, and this post will show differences in how each exports it. You will also see some forethought in the export from one vendor.
For those who have never seen a GEDCOM file, each line starts with a number: 0, 1, 2, etc. A zero is the beginning of a new record and higher numbers are “nested” under lower numbers. It is a way of grouping information together. In the GEDCOM fragments I show here, I will indent the lines to make them easier to understand. I have also done some slight rearranging of lines to make the comparisons easier. After the number, each line has a tag which is a shortened identifier of the information contained on that line. You can see definitions of the standard tags here.
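For the programmers reading along, here is a minimal sketch of how one GEDCOM line can be split into its level number, optional record ID, tag, and value. The names `GedcomLine` and `parse_line` are my own for illustration and do not come from any of the three applications. The sketch tolerates leading spaces, so it also accepts the indented fragments shown in this post.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # 0 starts a new record; higher numbers nest under lower ones
    xref: Optional[str]   # record ID such as "@I1@", present only on some lines
    tag: str              # shortened identifier such as NAME, SOUR, or PAGE
    value: str            # the rest of the line, possibly empty


# level, optional @ID@, tag, optional value (hypothetical helper pattern)
_LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into level, record ID, tag, and value."""
    match = _LINE.match(text)
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


if __name__ == "__main__":
    print(parse_line("0 @I1@ INDI"))
    print(parse_line("  2 PLAC Elmo, Emery, Utah"))
```

Keeping the level as a number makes it easy to rebuild the nesting later: a line belongs to the nearest preceding line whose level is one lower.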
A GEDCOM file starts with a header that describes the software application that generated the file as well as the version of the GEDCOM standard:
Family Tree Maker 2009
0 HEAD
  1 SOUR FTM
    2 VERS Family Tree Maker (220.127.116.115)
    2 NAME Family Tree Maker for Windows
    2 CORP The Generations Network
  1 DEST GED55
  1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED

Legacy 7
0 HEAD
  1 SOUR Legacy
    2 VERS 7.0
    2 NAME Legacy (R)
    2 CORP Millennia Corp.
  1 DEST Gedcom55
  1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED

RootsMagic 4
0 HEAD
  1 SOUR RootsMagic
    2 VERS 4.0
    2 NAME RootsMagic
    2 CORP RootsMagic, Inc.
  1 DEST RootsMagic
  1 GEDC
    2 VERS 5.5.1
    2 FORM LINEAGE-LINKED
The first line starts the header record, which contains the source program (SOUR) that generated the file and the destination program or format (DEST) that will use the file. The level under the source shows the program, version, and company information. The GEDC (GEDCOM) tag states that the file adheres to the specified GEDCOM version and format. FTM and Legacy show 5.5 whereas RM shows 5.5.1. GEDCOM 5.5.1 was published as a draft in 1999, but the changes between it and GEDCOM 5.5 do not affect our discussion. The rest of the header record was removed as it is not important for our comparison. Even though the numbering and abbreviations take a little getting used to, the file format is pretty straightforward. The rest of the file is just as understandable.
As you might remember from the video, we have my great grandfather Worth Tucker who owned property in Elmo, Emery, Utah. The source of this information was a book. From the book we have 4 images: three that go with the source entry and one for a specific page that should be associated with the citation or source detail. Lastly, there is an extract from a page added as the citation text.
Here is what part of that looks like:
Family Tree Maker 2009
0 @I00001@ INDI
  1 NAME Worth /Tucker/
  1 SEX U

Legacy 7
0 @I1@ INDI
  1 NAME Worth /Tucker/
    2 GIVN Worth
    2 SURN Tucker
  1 SEX U

RootsMagic 4
0 @I1@ INDI
  1 NAME Worth /Tucker/
    2 GIVN Worth
    2 SURN Tucker
These lines start an individual (INDI) record (notice the 0 prefix). The ID between @ characters (ex: @I00001@) uniquely distinguishes this person record from every other record in the file. The name of the person is given in one or two ways. The NAME tag requires the full name with the surname between “/” characters. The name can also be broken into given name and surname, as represented by GIVN and SURN. Some applications set the gender or sex to U for unknown, while others leave the tag out unless a value has been set.
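As a small illustration of the NAME convention, the sketch below pulls the surname out from between the “/” characters. The function name `split_name` is hypothetical, and treating anything after the closing slash as a suffix is my own simplification rather than something shown in the exports above.

```python
from typing import Tuple


def split_name(name_value: str) -> Tuple[str, str, str]:
    """Split a NAME value such as 'Worth /Tucker/' into (given, surname, suffix)."""
    before, slash, rest = name_value.partition("/")
    if not slash:
        # No slashes at all: treat the whole value as the given name.
        return name_value.strip(), "", ""
    surname, _, after = rest.partition("/")
    return before.strip(), surname.strip(), after.strip()


if __name__ == "__main__":
    print(split_name("Worth /Tucker/"))
```

An importer that receives only the NAME line, as in the FTM export, could use something like this to fill in the given name and surname that Legacy and RootsMagic write out explicitly as GIVN and SURN.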
Included in the individual record is the property ownership event:
Family Tree Maker 2009
  1 EVEN 80 acres
    2 TYPE Property
    2 DATE 1908
    2 PLAC Elmo, Emery, Utah

Legacy 7
  1 EVEN 80 acres
    2 TYPE Property
    2 DATE 1908
    2 PLAC Elmo, Emery, Utah

RootsMagic 4
  1 PROP 80 acres
    2 DATE 1908
    2 PLAC Elmo, Emery, Utah
FTM and Legacy choose to use the generic event tag (EVEN) with a qualifying type of “Property”, whereas RM simplifies it by using the property tag (PROP). Both are equivalent. Each includes the property description of “80 acres” and the date and place of ownership.
GEDCOM supports the concept of a master source with its own information, plus multiple citation details, each tied to an event. In Part 1, this concept appeared in the application screens either as two screens (FTM 2009 and Legacy 7) or as a single screen with differently colored top and bottom sections (RootsMagic 4). In the GEDCOM format this is represented by a separate source record with a unique ID, which is then referenced from the specific event being cited. Additional citation details follow the reference. Notice that the section below starts at level 2 and appears in the file right below PLAC, at the same level. This signifies that what is being cited is the parent level 1 line, that is, the property ownership event.
Family Tree Maker 2009
    2 SOUR @S00002@
      3 PAGE 179
      3 DATA
        4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucker purchased
          5 CONC eighty acres of an adjacent school section. These properties became
          5 CONC the Elmo townsite, platted into lots that were sold to prospective
          5 CONC residents for $10.

Legacy 7
    2 SOUR @S4@
      3 PAGE 179.
      3 DATA
        4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucke
          5 CONC r purchased eighty acres of an adjacent school section. The
          5 CONC se properties became the Elmo townsite, platted into lots t
          5 CONC hat were sold to prospective residents for $10.
      3 OBJE
        4 FORM jpg
        4 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-page179.jpg
        4 _SCBK Y
        4 _PRIM Y
        4 _TYPE PHOTO

RootsMagic 4
    2 SOUR @S1@
      3 PAGE 179
      3 DATA
        4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucker purchased e
          5 CONC ighty acres of an adjacent school section. These properties became the E
          5 CONC lmo townsite, platted into lots that were sold to prospective residents f
          5 CONC or $10.
      3 OBJE
        4 FORM jpg
        4 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-page179.jpg
        4 _SCBK Y
        4 _PRIM Y
        4 _TYPE PHOTO
      3 _TMPLT
        4 FIELD
          5 NAME Page
          5 VALUE 179
The source tag (SOUR) here references the source record with the unique ID between the “@” characters. We will look at its details in the next section. The lines at level 3 and higher hold the citation detail. This citation is from page 179 of the source, and text from that page is represented by the TEXT tag under the DATA tag. The concatenation tag (CONC) is used to break lengthy text across multiple lines. Both Legacy 7 and RootsMagic 4 export the directory path to the image file of page 179. This is done with the object (OBJE) tag, under which FORM specifies that the format of the file is jpg. The next three tags begin with an underscore character “_”, which means that they are custom extensions to GEDCOM made by applications and that other applications are not required to support them. It appears that both Legacy and RootsMagic have chosen to support these tags. The _SCBK tag indicates whether this image should appear in the scrapbook or media viewer inside the application, with a Y value signifying “yes”. The _PRIM tag indicates whether this is the primary image, the one shown in situations where the application displays only one image for the citation. The _TYPE tag identifies this as a photograph as opposed to some other media type. RootsMagic 4 has done something above and beyond the other vendors. It supports a custom template tag (_TMPLT), which we will investigate in a minute.
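To show what the concatenation tag means for an importing program, here is a minimal sketch that glues a TEXT line and its CONC lines back into one string. The function name `join_text` is hypothetical. The sketch assumes each CONC value is appended directly, with no space added, which is why Legacy and RootsMagic can split in the middle of a word.

```python
from typing import Iterable


def join_text(lines: Iterable[str]) -> str:
    """Rebuild the full text from a TEXT line followed by its CONC lines."""
    parts = []
    for raw in lines:
        # Drop only the leading indentation; trailing spaces may be part of the value.
        pieces = raw.lstrip().split(" ", 2)
        if len(pieces) < 2:
            continue
        tag = pieces[1]
        value = pieces[2] if len(pieces) > 2 else ""
        if tag in ("TEXT", "CONC"):
            parts.append(value)
    return "".join(parts)


if __name__ == "__main__":
    legacy_lines = [
        "4 TEXT In 1908 Eliza Oviatt filed on eighty acres and Worth Tucke",
        "5 CONC r purchased eighty acres of an adjacent school section. The",
        "5 CONC se properties became the Elmo townsite, platted into lots t",
        "5 CONC hat were sold to prospective residents for $10.",
    ]
    print(join_text(legacy_lines))
```

Run against the Legacy 7 lines, the words “Tucker”, “These”, and “that” come back whole even though each was split across two lines in the file.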
Before we continue it might be helpful to review page 646 from Evidence Explained to see how the Book: Basic format is represented as a source list entry, full reference note, and short reference note:
Take just a minute to examine the fields for each type, the formatting, and the ordering and contents of the author field.
The final section of the GEDCOM file that we will examine is the actual source record:
Family Tree Maker 2009
0 @S00002@ SOUR
  1 TITL Geary, Edward A., A History of Emery County
  1 NOTE
    2 CONC Geary, Edward A.. A History of Emery County: . Salt Lake City:
    2 CONC Utah State Historical Society, 1996.

Legacy 7
0 @S4@ SOUR
  1 ABBR History of Emery County
  1 TITL A History of Emery County
  1 AUTH Edward A. Geary
  1 PUBL Salt Lake City: Utah State Historical Society, 1996.
  1 OBJE
    2 FORM jpg
    2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-frontcover.jpg
    2 _SCBK Y
    2 _PRIM Y
    2 _TYPE PHOTO
  1 OBJE
    2 FORM jpg
    2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-titlepage.jpg
    2 _SCBK Y
    2 _TYPE PHOTO
  1 OBJE
    2 FORM jpg
    2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-copyrightpage.jpg
    2 _SCBK Y
    2 _TYPE PHOTO

RootsMagic 4
0 @S1@ SOUR
  1 ABBR History of Emery County
  1 TITL Edward A. Geary, A History of Emery County (Salt Lake City: Utah S
    2 CONC tate Historical Society, 1996), [Page].
  1 _SUBQ Edward A. Geary, A History of Emery County, [Page].
  1 _BIBL Edward A. Geary. A History of Emery County. Salt Lake City: Utah S
    2 CONC tate Historical Society, 1996.
  1 OBJE
    2 FORM jpg
    2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-frontcover.jpg
    2 _SCBK Y
    2 _PRIM Y
    2 _TYPE PHOTO
  1 OBJE
    2 FORM jpg
    2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-titlepage.jpg
    2 _SCBK Y
    2 _PRIM N
    2 _TYPE PHOTO
  1 OBJE
    2 FORM jpg
    2 FILE C:\Users\mtucker\Documents\RootsMagic downloads - test\historyofemerycounty-copyrightpage.jpg
    2 _SCBK Y
    2 _PRIM N
    2 _TYPE PHOTO
  1 _TMPLT
    2 TID 372
    2 FIELD
      3 NAME Author
      3 VALUE Edward A. Geary
    2 FIELD
      3 NAME Title
      3 VALUE A History of Emery County
    2 FIELD
      3 NAME SubTitle
    2 FIELD
      3 NAME PubPlace
      3 VALUE Salt Lake City
    2 FIELD
      3 NAME Publisher
      3 VALUE Utah State Historical Society
    2 FIELD
      3 NAME PubDate
      3 VALUE 1996
FTM takes a shortcut: it formats the source as a source list entry and puts that text in the source note (NOTE). The title (TITL) comes from how FTM automatically named the source when it was created. There are two things to note about the NOTE text: 1) it has no way to indicate that the book title should be italicized, and 2) there appear to be extra periods and spaces in it.
Legacy 7 takes the approach of trying to stuff the EE-style citation into the fewer fields available for the “old-style” citation. There are 6 parts of a source list entry citation for a basic format book:
- Author
- Main Title
- Sub Title
- Place of publication
- Publisher
- Year of publication
Legacy 7 implements the basic book template with the following fields (the last two pertaining to the citation detail):
- Author Last Name
- Author Given Name(s)
- Author Suffix
- Short Title
- Publisher City
- Publisher State
- Publish Date
- Volume Data
The standard fields available in GEDCOM are:
- Title (TITL)
- Author (AUTH)
- Publication (PUBL)
So Legacy makes the following matches:
- GEDCOM Title = Title
- GEDCOM Author = Author Given Name(s) + Author Last Name + Author Suffix
- GEDCOM Publication = Publisher City + Publisher State + Publisher + Publish Date
Note that some of the source list entry formatting survives. It can be seen in the PUBL tag, which follows the order of the fields and has the colon and comma in the correct locations. Also, the abbreviation (ABBR) tag is used to name the source in the master list after it is imported. The rest of the GEDCOM contents from the Legacy 7 file specify the 3 media files associated with the source. Nothing new there.
I have yet to do additional experiments to determine how the translation to “old style” citations works with more complicated citation formats.
Finally we look at RootsMagic 4. It uses the abbreviation and object tags in the same way as Legacy 7. But some interesting things are happening in the rest of the file. Notice that the title tag follows the format for a full reference note, complete with parentheses, commas, and colons. The title is between special formatting tags < i > and < /i > to indicate that it should be italicized. Where the page number should go is the textual placeholder “[Page]”. The custom subsequent tag (_SUBQ) contains the short reference note format, although for the author it should contain just the last name. The custom bibliography tag (_BIBL) contains the source list entry format. It appears that a bug in the export keeps the bibliography entry from showing the author with last name first. It is important to note that any other application that imports an RM4-generated GEDCOM will get only the contents of the title tag, and its user will have to manually edit that text to remove the italicization indicators, which those applications don’t support.
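To make the “[Page]” placeholder concrete, here is a minimal sketch of how an importing program might combine the exported title text with the PAGE value from the citation. This is my assumption about how the placeholder is meant to be used; I have not confirmed how RootsMagic 4 itself does it, and the function name `fill_page` is hypothetical.

```python
def fill_page(note_template: str, page: str) -> str:
    """Replace the textual placeholder [Page] with the cited page."""
    return note_template.replace("[Page]", page)


if __name__ == "__main__":
    # TITL and _SUBQ values from the RootsMagic 4 export, with CONC lines already joined.
    full_note = (
        "Edward A. Geary, A History of Emery County "
        "(Salt Lake City: Utah State Historical Society, 1996), [Page]."
    )
    short_note = "Edward A. Geary, A History of Emery County, [Page]."
    print(fill_page(full_note, "179"))
    print(fill_page(short_note, "179"))
```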
Now let’s get to the part where RootsMagic 4 has shown some innovation in their GEDCOM. Remember the custom template (_TMPLT) tag we saw for the citation:
      3 _TMPLT
        4 FIELD
          5 NAME Page
          5 VALUE 179
There is also one in the source:
  1 _TMPLT
    2 TID 372
    2 FIELD
      3 NAME Author
      3 VALUE Edward A. Geary
    2 FIELD
      3 NAME Title
      3 VALUE A History of Emery County
    2 FIELD
      3 NAME SubTitle
    2 FIELD
      3 NAME PubPlace
      3 VALUE Salt Lake City
    2 FIELD
      3 NAME Publisher
      3 VALUE Utah State Historical Society
    2 FIELD
      3 NAME PubDate
      3 VALUE 1996
Now compare that with the source entry screen:
Notice that in the yellow Master Source section, there are 6 entry fields: Author, Title, Sub-title, Publish Place, Publisher, and Publish Date. These correspond to the 6 template (_TMPLT) field name entries in the GEDCOM: Author, Title, SubTitle, PubPlace, Publisher, and PubDate. In the green Source Details section, Page corresponds to the field name entry in the citation section: Page. The VALUE tags contain the actual values. That way the individual fields and their values are not lost. Completing this is the template ID or TID tag, a unique number used internally by RootsMagic 4 to always refer to this template. That is why you can never edit existing templates in RootsMagic 4.
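Here is a minimal sketch of how an importer could recover those individual fields from a _TMPLT block. The function name `template_fields` is hypothetical, and the sketch assumes the layout seen in this one export: each FIELD line is followed by a NAME line and, when the field is filled in, a VALUE line.

```python
from typing import Dict, Iterable, Optional, Tuple


def template_fields(lines: Iterable[str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Return the template ID and a {field name: value} mapping from a _TMPLT block."""
    tid: Optional[str] = None
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lines:
        pieces = raw.split(None, 2)  # level, tag, and the rest of the line
        if len(pieces) < 2:
            continue
        tag = pieces[1]
        value = pieces[2].strip() if len(pieces) > 2 else ""
        if tag == "TID":
            tid = value
        elif tag == "FIELD":
            current = None           # a new field starts; wait for its NAME
        elif tag == "NAME":
            current = value
            fields[current] = ""     # stays empty when no VALUE line follows
        elif tag == "VALUE" and current is not None:
            fields[current] = value
    return tid, fields


if __name__ == "__main__":
    block = [
        "1 _TMPLT", "2 TID 372",
        "2 FIELD", "3 NAME Author", "3 VALUE Edward A. Geary",
        "2 FIELD", "3 NAME Title", "3 VALUE A History of Emery County",
        "2 FIELD", "3 NAME SubTitle",
        "2 FIELD", "3 NAME PubPlace", "3 VALUE Salt Lake City",
        "2 FIELD", "3 NAME Publisher", "3 VALUE Utah State Historical Society",
        "2 FIELD", "3 NAME PubDate", "3 VALUE 1996",
    ]
    tid, fields = template_fields(block)
    print(tid, fields)
```

With the fields back in a dictionary, a program no longer has to guess where the author ends and the title begins, which is exactly the information the plain TITL text cannot carry.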
Here are the details of the template for id 372 as shown in the Source Templates screen:
It is interesting that each field is given a type to indicate if it is a Name, Place, Date, or Text. This could come in handy in future situations. Imagine searching all sources not just for the text “White” but for all sources that contain a name that contains “White.” Searches like that would return more appropriate results.
So what we have discovered is that the three applications that currently support EE-style templates do so slightly differently on the input side (Part 1) and vary greatly when it comes to GEDCOM output. As it stands today, much is lost in the GEDCOM export, which renders rich citations into blobs of text. RootsMagic 4 solves this problem in a proprietary way using its own template ID and template field names. Currently no real interoperability exists between these applications when it comes to EE-style source citations.
This post is already long enough and I will likely expound on my ideas in a follow-up post. But imagine the RM4 implementation standardized and universally accepted. What a world of interoperability that would open up!
There is so much to think about. What do you think?