What’s in a Citation

An exploration of citation complexity; an extension of two posts ago.

In the family tree of one of my relatives appears the following citation:

"New York, New York City Marriage Records, 1829-1940," database, FamilySearch (https://familysearch.org/ark:/61903/1:1:24HG-T2H : 10 February 2018), Peter Tychoniewicz in entry for Gabriel Tychoniewicz and Ahaphia Oleszczuk, 05 Oct 1907; citing Marriage, Manhattan, New York, New York, United States, New York City Municipal Archives, New York; FHL microfilm 1,452,208.

There’s a lot going on here. Let’s try taking it apart.

The most specific part is an entry part, one for Peter Tychoniewicz.
That is part of an entry for the marriage of Gabriel Tychoniewicz and Ahaphia Oleszczuk. The entry is dated 1907-10-05.
That entry was part of a database. The database’s name is “‍New York, New York City Marriage Records, 1829-1940‍”. The database’s publisher is FamilySearch. As of 2018-02-10, the entry’s URL in the database is “‍https://familysearch.org/ark:/61903/1:1:24HG-T2H‍”.
That database cites another document. That document’s name is “‍Marriage, Manhattan, New York, New York, United States‍”.
That document is stored in an archive. That archive’s name is “‍New York City Municipal Archives, New York‍”. That archive’s address is “‍New York‍” – implicitly, New York State, USA.
That document is also copied onto a microfilm. That microfilm’s name is “‍FHL microfilm 1,452,208‍”.
Implicitly, the citation also suggests that:
- The microfilm is stored in the repository named “The Family History Library” in Salt Lake City, Utah, USA;
- The database entry was created from the microfilm, not from the document itself;
- The database is hosted in the repository known as The Internet.
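
The explicit part of that breakdown can be written down as data. What follows is only a minimal sketch in Python, not a proposed format: the `Part` class, the dictionary keys, and the relationship names (`part_of`, `cites`, `stored_in`, `copied_onto`) are labels invented here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One part of a citation: a type plus a handful of details."""
    kind: str
    details: dict = field(default_factory=dict)


# Parts stated explicitly in the citation. The keys are hypothetical labels.
parts = {
    "entry_part": Part("entry part", {"name": "Peter Tychoniewicz"}),
    "entry": Part("entry", {
        "name": "Gabriel Tychoniewicz and Ahaphia Oleszczuk",
        "date": "1907-10-05",
        "url": "https://familysearch.org/ark:/61903/1:1:24HG-T2H",
        "url_as_of": "2018-02-10",
    }),
    "database": Part("database", {
        "name": "New York, New York City Marriage Records, 1829-1940",
        "publisher": "FamilySearch",
    }),
    "document": Part("document", {
        "name": "Marriage, Manhattan, New York, New York, United States",
    }),
    "archive": Part("archive", {
        "name": "New York City Municipal Archives, New York",
        "address": "New York",
    }),
    "microfilm": Part("microfilm", {"name": "FHL microfilm 1,452,208"}),
}

# Relationships as (source, relation, target) triples, one per step above.
relations = [
    ("entry_part", "part_of", "entry"),
    ("entry", "part_of", "database"),
    ("database", "cites", "document"),
    ("document", "stored_in", "archive"),
    ("document", "copied_onto", "microfilm"),
]


def related_to(key):
    """List the (relation, target) pairs leading out of one part."""
    return [(rel, target) for source, rel, target in relations if source == key]
```

Calling `related_to("document")` returns two pairs, one for the archive and one for the microfilm, which is the first hint that this is not a simple chain.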

This single example – literally the first one I grabbed; I didn’t look around for a special one – illustrates several points worth keeping in mind about family history citations.

First, the common presentation of citations is not in an obvious order or format. The general rule is something like most-important-data-first; thus we begin with a name that can indicate what kind of source this is, then how to find it online, then where to look in it once you find it. After that we put the sources of the source, original first and then derivative. Some details are expressed as English phrases like “in entry for”, and some are elided completely: the citation assumes you know that an entry for two people in this database means an entry for their marriage, and that you know which repository hosts any microfilm with a name starting “FHL”.

Second, the citation itself is made of multiple parts and their relationships to one another. Each part is relatively simple: a type and 1–3 details. Most parts are related to just one other part, generally by being within that other part; but some have multiple relationships, such as the database being related to the document and to the microfilm. The implicit data might suggest that the document is actually related to the database only through the microfilm, making a single path of relationships, but that implicit data also adds two other branches: the database derives from the microfilm and is hosted in The Internet, and the microfilm derives from the document and is hosted in the Family History Library.

We might be able to finagle this example into a simple non-branching structure by calling repositories properties instead of parts, but in general that won’t be possible. Branching provenance is the common case.
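
The finagling for this one example can be sketched in a few lines of Python. This is a rough illustration under the implicit reading described above (database from microfilm, microfilm from document); the relation names and the `branching_parts` helper are made up for the purpose.

```python
from collections import Counter

# The implicit reading as (source, relation, target) triples.
# Relation names are hypothetical labels, not a standard vocabulary.
relations = [
    ("entry part", "part of", "entry"),
    ("entry", "part of", "database"),
    ("database", "derives from", "microfilm"),
    ("database", "hosted in", "The Internet"),
    ("microfilm", "derives from", "document"),
    ("microfilm", "hosted in", "Family History Library"),
    ("document", "stored in", "New York City Municipal Archives"),
]

# Relations we would drop if repositories became properties of a part.
REPOSITORY_RELATIONS = {"hosted in", "stored in"}


def branching_parts(triples):
    """Return the parts that have more than one outgoing relationship."""
    counts = Counter(source for source, _, _ in triples)
    return sorted(part for part, n in counts.items() if n > 1)


with_repositories = branching_parts(relations)
without_repositories = branching_parts(
    [t for t in relations if t[1] not in REPOSITORY_RELATIONS]
)

print(with_repositories)     # ['database', 'microfilm']
print(without_repositories)  # []
```

With repositories treated as parts, two parts branch; once they are folded away, what remains is a single chain from entry part to document. That trick works here only because every branch happens to end in a repository.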

For contrast, let’s look at an academic citation:

Luther Tychonievich and James P. Cohoon. 2020. “‍Lessons Learned from Providing Hundreds of Hours of Diversity Training.‍” In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20). Association for Computing Machinery, New York, NY, USA, 206–212. DOI:https://doi.org/10.1145/3328778.3366930

This citation also has multiple parts:

The most specific part is an article. The article’s title is “‍Lessons Learned from Providing Hundreds of Hours of Diversity Training.‍” The article has two authors: Luther Tychonievich and James P. Cohoon.
The article was published in a proceedings. The publication occurred in 2020. The proceedings’ title is “Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20)”. The location of the article within that publication is pages 206 through 212, inclusive.
The proceedings has a publisher. The publisher’s name is “‍Association for Computing Machinery‍”. The publisher’s address is “‍New York, NY, USA‍”.
The article was also published online. The Digital Object Identifier for the article is 10.1145/3328778.3366930. The URL for the article is thus https://doi.org/10.1145/3328778.3366930.
Implicitly, the citation also suggests that:
- The online publication has the same publisher as the paginated publication.
- There may or may not even be a paper version: pagination might simply be a throwback to an earlier time.
- Because the name of the publication is “‍Proceedings of …‍” there was probably a conference in which this work was presented, but the article was written before the presentation occurred so it’s really an article, not a conference report.

Several of the previous observations still apply here. Most important first: reputation and originality are the currency of academia, so author and date are the most important, followed by the venue as that is a proxy for exclusivity and thus merit, with data needed to actually locate the article last. The citation is also a branching set of interconnected parts, though in this case the parts are very predictable and can easily be flattened into a couple dozen possible key:value pairs.
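
As a rough sketch of that flattening, here is the citation above as a single Python dictionary. The key names are assumptions, loosely modeled on BibTeX-style fields, and `doi_url` is a hypothetical helper showing how the URL follows from the DOI.

```python
# The academic citation flattened into key:value pairs.
# Key names are illustrative assumptions, loosely BibTeX-like.
citation = {
    "type": "article in proceedings",
    "author": ["Luther Tychonievich", "James P. Cohoon"],
    "title": "Lessons Learned from Providing Hundreds of Hours of Diversity Training",
    "year": 2020,
    "booktitle": "Proceedings of the 51st ACM Technical Symposium on "
                 "Computer Science Education (SIGCSE ’20)",
    "publisher": "Association for Computing Machinery",
    "address": "New York, NY, USA",
    "pages": "206–212",
    "doi": "10.1145/3328778.3366930",
}


def doi_url(doi):
    """Build the resolver URL for a bare DOI."""
    return "https://doi.org/" + doi


print(doi_url(citation["doi"]))
```

The publisher and its address sit at the same level as the article's title even though they describe different parts. That loss of structure is tolerable here because each key can belong to only one kind of part.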

Citations, academic and family historical, are a brief presentation of a fairly large amount of interlinked data. For academic citations, the total set of parts is generally quite limited and well handled by various digital formats. For family history citations, the chains get much longer with more branching and there are fewer limitations that can be exploited to simplify them.