Better Online Citations – Details Part 1

Tuesday, 28 Apr 2009 | by Mark Tucker

There have been a number of comments from viewers of the video, “A Better Way to Cite Online Sources”, asking about how things work behind the scenes. Being a geek by nature, I tend to be technical in my writing and so I tried to stay away from too many details in the video. The main point was to show what a solution to the online citation problem might look like.

For those who want to know more, here are the details.

We will start with the QuickCheck models found in Evidence Explained. Software developers can use these models as a feature specification:

Evidence Explained - Book Basic Format Citation

This example from page 646 specifies that a basic book citation consists of 7 parts:

  1. Author
  2. Main Title
  3. Sub Title
  4. Place of publication
  5. Publisher
  6. Year
  7. Page

It also indicates the format of the citation, specifying where to put commas, colons, periods, and parentheses, as well as when text is italicized.
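To show how a developer might treat such a model as a spec, here is a minimal Python sketch. The field names, the sample values, and the punctuation layout (a common Chicago-style arrangement, with asterisks standing in for italics) are my assumptions for illustration; they are not copied from the QuickCheck model itself.

```python
from dataclasses import dataclass


@dataclass
class BookCitation:
    """The seven parts of a basic book citation; field names are illustrative."""
    author: str
    main_title: str
    sub_title: str
    place: str
    publisher: str
    year: str
    page: str


def format_reference_note(c: BookCitation) -> str:
    # Assumed layout: Author, *Main Title: Sub Title* (Place: Publisher, Year), Page.
    # Asterisks mark the text that would be italicized.
    title = c.main_title
    if c.sub_title:
        title = f"{title}: {c.sub_title}"
    return f"{c.author}, *{title}* ({c.place}: {c.publisher}, {c.year}), {c.page}."


# Made-up sample data, not a real book.
example = BookCitation(
    author="Jane Doe",
    main_title="A Sample Title",
    sub_title="With a Subtitle",
    place="Sampletown",
    publisher="Example Press",
    year="2009",
    page="12",
)
print(format_reference_note(example))
```

Keeping the seven parts as separate fields, rather than one pre-formatted string, is what lets different applications render the same data consistently.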

These QuickCheck models as well as other citation formats specified in EE (Evidence Explained) have been coded into Legacy 7, Family Tree Maker 2009, and RootsMagic 4.

Even though all 3 of these desktop genealogy applications used EE as their spec and received clarification from their outside business analyst, Elizabeth Shown Mills, the implementations vary slightly. To show this, here is how each application deals with a book citation:

RootsMagic 4

RootsMagic 4 - Basic Book Citation

Legacy Family Tree 7

Legacy 7 - Basic Book Citation Legacy 7 - Basic Book Citation 2

Family Tree Maker 2009

Family Tree Maker 2009 - Basic Book Citation Family Tree Maker 2009 - Basic Book Citation 2

Here is a comparison of these three applications showing different interpretations of the EE citation model:

Book Citation Format Inconsistencies

So currently we have 3 applications that support EE, and each is slightly different. What is needed is a standard that each can be measured against, so that as more applications support these citation formats, tests can be created to verify compatibility. Some sort of consortium needs to be created to discuss the current differences and reach a consensus.

Let’s say that process has already occurred and agreement has been reached. Now a standardized file format can be designed to handle the additional level of detail required for the citation models. In some ways this file format would serve a purpose similar to GEDCOM's, but with the added ability to handle more detailed source citations and to reference media such as images and files.

When visiting a website that supports this new file format, the researcher will encounter a download button or link that references that file:

Citation Download Link

In the above example, the link references a file called book.cite. The .cite extension represents a file of a specific content type (called a MIME type) identified as: application/cite+xml. The important thing to know about this is that a .cite file can now be uniquely identified from other content types.
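As a small illustration of how software can tell a .cite file apart from other content, the sketch below registers the extension with Python's standard mimetypes module. Note that application/cite+xml and .cite come from the prototype described here; as far as I know neither is an officially registered type.

```python
import mimetypes

# Map the proposed extension to the proposed content type.
# Both are prototype values, not registered standards.
mimetypes.add_type("application/cite+xml", ".cite")

content_type, encoding = mimetypes.guess_type("book.cite")
print(content_type)
```

A web server would need an equivalent mapping so that it sends this content type when book.cite is downloaded.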

We will not discuss the actual structure of the file at this point. Details will be provided in a later post.

When the researcher clicks the link, the web browser (in this case Firefox) asks whether to save or open the file:

Firefox Download

As the screen shows, Firefox correctly identifies the file as a CITE file and picks the default application used to open it, ClickCite Launcher.

Part of the prototype code that I wrote was an application called ClickCite Launcher. Its purpose is to intercept CITE files and pass them along to an importer application. ClickCite Launcher would need to be installed on the researcher’s computer, and part of the installation would create a file association between .cite files and ClickCite Launcher. This is how Firefox knows which application to use.

The launcher application and the file association are for a computer running Windows. I am not familiar with Mac computers in this regard, but it would surprise me if a similar capability were not available.
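For the curious, a Windows file association comes down to a handful of registry values. The Python sketch below only builds the list of values an installer might write for the current user; it does not touch the registry. The ProgID "ClickCite.Launcher" and the launcher path are hypothetical names chosen for illustration, not the ones used in my prototype.

```python
def cite_association_entries(launcher_path: str) -> list[tuple[str, str, str]]:
    """Return (registry key, value name, data) triples for a per-user association.

    An empty value name means the key's default value.
    """
    prog_id = "ClickCite.Launcher"  # hypothetical ProgID
    classes = "\\".join(["HKEY_CURRENT_USER", "Software", "Classes"])
    extension_key = "\\".join([classes, ".cite"])
    command_key = "\\".join([classes, prog_id, "shell", "open", "command"])
    return [
        # Point the .cite extension at the launcher's ProgID.
        (extension_key, "", prog_id),
        # Record the content type alongside the extension.
        (extension_key, "Content Type", "application/cite+xml"),
        # "%1" is replaced by Windows with the path of the opened file.
        (command_key, "", f'"{launcher_path}" "%1"'),
    ]


# Hypothetical install location.
for key, name, data in cite_association_entries(r"C:\ClickCite\launcher.exe"):
    print(key, "|", name or "(Default)", "|", data)
```

On Windows, an installer could write these values with the standard winreg module or with its own setup tooling.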

This launcher application is aware of all installed applications that support importing of CITE files and presents the user with a list of desktop genealogy applications:

Citation Launcher

After an application is picked from the list and Import is clicked, the launcher application starts the importer application and passes it the book.cite file. The launcher application would likely be open source software, whereas each desktop application that supports CITE files would provide its own importer. In the video demonstration, the importer for RootsMagic 4 presented no user interface and just updated the most recently opened database file, test.rmgc. The developers of an importer could choose to show a user interface, which might list all previously created databases and let the user choose the target of the import.

The process is depicted in the following diagram:

Citation Import Process

  1. Genealogist clicks download link which causes the browser to download the file onto his/her computer.
  2. When the file is opened, either manually or by the browser when the Open option is selected, Windows runs the application associated with the extension. In this case, .cite is associated with the launcher application. The launcher receives the location of the downloaded CITE file.
  3. When Import is clicked on the launcher, the importer for the selected genealogy software is started and passed the location of the downloaded CITE file.
  4. The importer loads the CITE file and adds information to the application’s database file.
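
Steps 2 and 3 can be sketched in a few lines of Python. This is not the prototype's code, just an illustration under stated assumptions: the importer table, the executable paths, and the function names are all hypothetical, and a real launcher would discover installed importers rather than hard-code them.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical table of installed importers: display name -> importer executable.
IMPORTERS = {
    "RootsMagic 4": r"C:\ClickCite\importers\rootsmagic4-import.exe",
    "Legacy Family Tree 7": r"C:\ClickCite\importers\legacy7-import.exe",
}


def build_import_command(app_name: str, cite_path: str,
                         importers: dict[str, str]) -> list[str]:
    """Return the command line that hands the CITE file to the chosen importer."""
    if app_name not in importers:
        raise ValueError(f"No CITE importer registered for {app_name!r}")
    return [importers[app_name], cite_path]


def launch_import(app_name: str, cite_path: str, importers: dict[str, str],
                  run: Callable[[Sequence[str]], object] = subprocess.run) -> list[str]:
    """Start the importer (step 3); `run` can be swapped out for testing."""
    command = build_import_command(app_name, cite_path, importers)
    run(command)
    return command


# Show the command without starting anything.
print(build_import_command("RootsMagic 4", r"C:\Downloads\book.cite", IMPORTERS))
```

In a real launcher, the CITE file location would arrive as a command-line argument (step 2), and the application name would come from the user's pick in the list.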

I hope this addresses some of the technical questions that I have been receiving.


  1. Mark,

    This brief comparison of these three applications clearly shows that there is a need for a formal specification to support any citation standard, EE-style or otherwise. The idea several vendors have, that an individual name or place name can and should be split into parts, is perfectly reasonable. The issue is that for data to transfer correctly, the various applications need to do the same thing.

    Although XML is an obvious choice as a vendor-neutral and widely supported base format, citations only make sense in context, and here that context is genealogy. Therefore, the citation format needs to be part of or at least integrate with a genealogy format.

    Also, as we discussed by email, whatever its technical merits, no one is really eager to support Yet Another File Format. Therefore, the most promising way forward may be to seek to standardise how EE-style citations should be supported in GEDCOM. That integrates with the genealogy and additionally builds on work that these vendors have already done. You will not be asking them to do something new, merely to standardise what they are already doing.

    This could be formalised as a multi-vendor GEDCOM extension that defines how to provide EE-style citations, just like the multi-vendor GEDCOM EL extension defines how to provide first-class place name support.
    Then, when GEDCOM is replaced by some XML-based format, all the work already done will naturally carry forward to that new format.

    - Tamura

    Comment by Tamura Jones — 29 Apr 2009 @ 5:03 am

  2. Excellent comparison of how the three programs handle EE-style source citations. I think FTM 2009 also has a Year of Publication entry, which you don’t show in your comparison chart.

    Thanks for the detailed explanation of how this will work.

    It will be interesting to see which database company will step up and be the first to do what you suggest, and make a marketing issue of it.

    Comment by Randy Seaver — 29 Apr 2009 @ 9:02 am

  3. What Tamura says makes a lot of sense. After trying several genealogy applications, I found that I had the best citation GEDCOM exports by using the free-form option in RootsMagic. The citation exports with the “title” tag and therefore displays well on my TNG (The Next Generation) site. However, that does mean writing all citations by hand. I have decided that, for myself, I like the control that gives me as well as the increasing familiarity with correct forms of citations.

    Comment by Patti Hobbs — 29 Apr 2009 @ 10:57 am

  4. [...] Better Online Citations – Details Part 1 we examined how the QuickCheck model for “Book: Basic format” from Evidence Explained was coded [...]

    Pingback by Better Online Citations - Details Part 2 (GEDCOM) | ThinkGenealogy — 3 May 2009 @ 9:19 am

  5. Some form of the RIS format done up in XML might be worth looking into.

    Comment by Drew Smith — 3 May 2009 @ 10:40 am

  6. Or better yet, MODS.

    Comment by Drew Smith — 3 May 2009 @ 10:46 am

  7. Have you looked into “microformats”? There’s quite a bit of discussion/work there on this subject.

    Certainly separate, though linked, entities for the source, citation and event would be needed (just as in GEDCOM) to allow re-use. In addition, some global source id (ISBN?) would be required to prevent duplicate “sources”.

    In many (esp. machine-generated) cases, the event of interest would be unknown. For example, Google Books may provide a source for a book and a citation for a page (or even a paragraph or sentence), but it won’t know what person or event you might extract from it.

    The advantage of a semantic XHTML microformat approach is that a global search (Google) would be possible: eg. find all web pages that cite page 192 in a given source.

    Comment by Mark Roy — 6 May 2009 @ 2:08 pm

  8. This is certainly a terrific idea and I’ve wished for something along this line for the past several years. Your example is of a simple book citation, and there would be a need, of course, for dozens, if not hundreds of additional formats. I do like the idea that a commenter made of using Zotero. It’s open source and the additional formats could be added to their “library” of formats.

    You’ve focused on the reference note portion of the citation, which sometimes omits details that appear in the source list entry–somehow the export needs to deal with both element sets: reference note citation elements and source list elements (which largely or completely overlap in some cases).

    Comment by Steven M. Law — 13 May 2009 @ 9:59 am

  9. There’s a difference between a “file format” and a data model. If a common data model for citations was defined, then that same model could be implemented as part of a GEDCOM extension (for conclusional genealogical data); implemented as a file format for “.cite” files; as an XML schema for exchange in web services that deal with sources; and used as the basis for UI and internal data structures in various desktop applications.

    Citations in genealogical data are probably not terribly different from citations anywhere else, except that they probably deal with even more types of sources than many other disciplines (individual manuscripts, headstones, etc., in addition to more commonly cited sources like published books). It would be very helpful if a standard citation format had a good answer for how to represent source citations from various existing models, such as those used by WorldCat/OCLC; GEDCOM; Zotero; BibTeX; and other existing “standards”. A new model would not necessarily have to be a derivative of any of these, though there is some built-in adoption that happens by extending an existing standard rather than creating yet another one. But by looking at the existing standards, a new standard can avoid missing things that are necessary, and can avoid reinventing things in a new way that is worse or at least no better than how it has already been done.

    A mention was made above of the actual data that is in the source. Modeling that data is not the purpose of a citation model. A citation describes a “source” or a part thereof, not the data that the source gave you. That would be the scope of a conclusional model like GEDCOM or a source data model for structured, extracted data.

    A mention was made of ISBN. That is one ID that can be used for some published sources, but not all. OCLC/WorldCat is hopefully assigning IDs to its entries as well, but I don’t know if they’ve dealt with resolving duplicates when they accidentally get multiple entries for the same real source. It would be great if a consortium could create a “source authority” to track sources, give them unique, long-lived IDs, and provide canonical, structured source citations for sources or parts of sources that are of interest.

    Comment by Randy Wilson — 13 May 2009 @ 11:27 am

  10. [...] A Better Way to Cite Online Sources.  Some of the suggestions that came from the survey and posts Details Part 1 and Details Part 2 (GEDCOM) was why not use an existing [...]

    Pingback by Better Online Citations – Details Part 3 (MARC) | ThinkGenealogy — 20 Jun 2009 @ 12:06 am

  11. [...] posts have explored a better way to cite online sources (Part 1), how citation information can be stored as a file using GEDCOM format (Part 2) and MARC format [...]

    Pingback by Better Online Citations – Details Part 4 (MARC XML) | ThinkGenealogy — 20 Jun 2009 @ 1:07 am


Copyright 2010 Mark Tucker. All rights reserved.