Citations in genealogical data are probably not terribly different from citations anywhere else, except that they probably deal with even more types of sources than many other disciplines (individual manuscripts, headstones, etc., in addition to more commonly cited sources like published books). It would be very helpful if a standard citation format had a good answer for how to represent source citations from various existing models, such as those used by WorldCat/OCLC, GEDCOM, Zotero, BibTeX, and other existing “standards”. A new model would not necessarily have to be a derivative of any of these, though there is some built-in adoption that happens by extending an existing standard rather than creating yet another one. But by looking at the existing standards, a new standard can avoid missing things that are necessary, and can avoid reinventing things in a way that is worse, or at least no better, than how it has already been done.
A mention was made above of the actual data that is in the source. Modeling that data is not the purpose of a citation model. A citation describes a “source” or a part thereof, not the data that the source gave you. Modeling the data itself would be the scope of a conclusional model like GEDCOM or a source data model for structured, extracted data.
A mention was made of ISBN. That is one ID that can be used for some published sources, but not all. OCLC/WorldCat is hopefully assigning IDs to its entries as well, but I don’t know if they’ve dealt with resolving duplicates when they accidentally get multiple entries for the same real source. It would be great if a consortium could create a “source authority” to track sources, give them unique, long-lived IDs, and provide canonical, structured source citations for sources or parts of sources that are of interest.
You’ve focused on the reference note portion of the citation, which sometimes omits details that appear in the source list entry. Somehow the export needs to deal with both element sets: reference note citation elements and source list elements (which largely or completely overlap in some cases).
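As a rough sketch of what "dealing with both element sets" could mean for an export, the two sets can be treated as key/value maps and combined, with the overlap checked for consistency. The element names (`author`, `title`, `page`, `publisher`, `year`), the sample values, and the `merge_for_export` helper are all made up for illustration; they do not come from any existing template or standard.

```python
# Hypothetical element names and values; a real template would define its own.
reference_note = {"author": "A. Author", "title": "Example Title", "page": "192"}
source_list_entry = {
    "author": "A. Author",
    "title": "Example Title",
    "publisher": "Example Press",
    "year": "1900",
}


def merge_for_export(note, entry):
    """Combine both element sets and report which elements they share."""
    shared = sorted(note.keys() & entry.keys())
    # Overlapping elements should agree; otherwise the export is ambiguous.
    conflicts = [key for key in shared if note[key] != entry[key]]
    if conflicts:
        raise ValueError(f"conflicting elements: {conflicts}")
    return {**entry, **note}, shared


merged, shared = merge_for_export(reference_note, source_list_entry)
```

Here the export carries every element exactly once, and a receiving application could still rebuild either form from the combined set.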
Certainly separate, though linked, entities for the source, citation and event would be needed (just as in GEDCOM) to allow re-use. In addition, some global source ID (ISBN?) would be required to prevent duplicate “sources”.
In many (esp. machine-generated) cases, the event of interest would be unknown. For example, Google Books may provide a source for a book and a citation for a page (or even a paragraph or sentence), but it won’t know what person or event you might extract from it.
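A minimal sketch of that separation, assuming a simple in-memory model: the class names, field names, and the placeholder identifier `isbn:0000000000` are hypothetical and are not taken from GEDCOM or any other format. The event link is optional, which covers the machine-generated case where the event is unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    global_id: str  # some global identifier, e.g. an ISBN (assumed scheme)
    title: str


@dataclass(frozen=True)
class Event:
    description: str


@dataclass(frozen=True)
class Citation:
    source_id: str                 # links to Source.global_id
    locator: str                   # e.g. a page, paragraph or sentence
    event: Optional[Event] = None  # unknown for machine-generated citations


class SourceRegistry:
    """Keeps one Source per global id so duplicates collapse."""

    def __init__(self):
        self._sources = {}

    def add(self, source):
        # Return the record already stored under this id, if there is one.
        return self._sources.setdefault(source.global_id, source)

    def __len__(self):
        return len(self._sources)


registry = SourceRegistry()
first = registry.add(Source("isbn:0000000000", "Example Parish Register"))
second = registry.add(Source("isbn:0000000000", "Example Parish Register (duplicate)"))

machine_made = Citation(first.global_id, "p. 192")
researcher_made = Citation(first.global_id, "p. 192", Event("baptism"))
```

Both citations point at the same source record, and only the researcher's citation names an event.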
The advantage of a semantic XHTML microformat approach is that a global search (Google) would be possible: e.g., find all web pages that cite page 192 in a given source.
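To illustrate why such markup would be searchable, here is a small sketch that scans a page for citations of a given source and page using Python's standard `html.parser`. The class names `citation`, `source-id` and `page`, and the sample markup, are invented for this example; no existing microformat is implied.

```python
from html.parser import HTMLParser


class CitationFinder(HTMLParser):
    """Collects (source id, page) pairs from hypothetical citation markup."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._current = None  # fields of the citation being read
        self._field = None    # field whose text is expected next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "citation" in classes:
            self._current = {}
        elif self._current is not None:
            for name in ("source-id", "page"):
                if name in classes:
                    self._field = name

    def handle_data(self, data):
        if self._current is None or self._field is None:
            return
        self._current[self._field] = data.strip()
        self._field = None
        # A citation is complete once both fields have been seen.
        if "source-id" in self._current and "page" in self._current:
            self.found.append((self._current["source-id"], self._current["page"]))
            self._current = None


page_html = """
<p>Born 1850 <span class="citation">
  <span class="source-id">isbn:0000000000</span>,
  p. <span class="page">192</span></span>.</p>
<p>Married 1875 <span class="citation">
  <span class="source-id">isbn:0000000000</span>,
  p. <span class="page">201</span></span>.</p>
"""

finder = CitationFinder()
finder.feed(page_html)
hits = [c for c in finder.found if c == ("isbn:0000000000", "192")]
```

A search engine would do the same thing at scale: index the marked-up fields, then answer "who cites page 192 of this source" directly.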
Thanks for the detailed explanation of how this will work.
It will be interesting to see which database company will step up and be the first to do what you suggest, and make a marketing issue of it.
This brief comparison of these three applications clearly shows that there is a need for a formal specification to support any citation standard, EE-style or otherwise. The idea several vendors have, that an individual name or place name can and should be split up into parts, is perfectly reasonable. The issue is that for data to transfer correctly, the various applications need to do the same thing.
Although XML is an obvious choice as a vendor-neutral and widely supported base format, citations only make sense in context, and here that context is genealogy. Therefore, the citation format needs to be part of or at least integrate with a genealogy format.
Also, as we discussed by email, whatever its technical merits, no one is really eager to support Yet Another File Format. Therefore, the most promising way forward may be to seek to standardise how EE-style citations should be supported in GEDCOM. That integrates with the genealogy and additionally builds on work that these vendors have already done. You will not be asking them to do something new, merely to standardise what they are already doing.
This could be formalised as a multi-vendor GEDCOM extension that defines how to provide EE-style citations, just like the multi-vendor GEDCOM EL extension defines how to provide first-class place name support.
Then, when GEDCOM is replaced by some XML-based format, all the work already done will naturally carry forward to that new format.
- Tamura