Yesterday afternoon I got an email from Ancestry touting their newly added US Sons of the American Revolution Application dataset. (By the way, you can access it FREE this weekend in celebration of the Fourth of July holiday.) So I naturally logged on to Ancestry, went to the new SAR dataset and entered one of the surnames that I research to see what would pop up. And sure enough, I got some hits! The third or fourth one on the list I recognized as being one of my 5x great-grandfathers. Clicking on his name sent me to a screen where I could access the actual scanned image of a SAR application. And in viewing the application I was able to see birth, death and marriage dates and places for the generations that separated the applicant from the patriot.
Now, for me, this is both a blessing and a curse! Why? Because now I have to decide how to structure this within the confines of the method Legacy Family Tree has implemented sources and citations. Basically, I need to decide what information from the SAR application I want to store in the source and what I want to store in the citation. This discussion comes up often on the Legacy Users Group email list and is generally referred to as “lumping and splitting.” (FYI: Legacy has chosen not to provide a forum/message board, and with their email archiving being fractured and possibly dropping messages, many questions are repeated periodically.)
I think the both the source/citation and so-called lumping/splitting concepts are best illustrated with examples. The most easily understood source example is a book. Most people (regardless of whether they are lumpers or splitters) would agree that the book information (title, author, publisher, date, etc) should be stored in the source table and the page (or chapter, page range, etc) should be stored in the citation. As this illustrates, the purpose of the citation is to link the source to the event, and as such it should refer to the specific subset of the source that contains the data that was extracted and inserted into the event or fact.
Another common source – census data – is much less straight forward in terms of what part is source and what part is citation. If you follow Legacy Source Writer templates for US Federal Censuses, your sources will be specific to year, state, county and online database. In other words:
- source 1 – 1880 Census, Pennsylvania, Berks, Ancestry.com
- source 2 – 1880 Census, Pennsylvania, Berks, HeritageQuest
- source 3 – 1880 Census, Pennsylvania, Chester, Ancestry.com
- source 4 – 1800 Census, Pennsylvania, Chester, HeritageQuest
Splitters may break this down further, with separate sources for each town or city. Some have even suggested each census sheet is a separate source! As you can see, splitters have a very large number of very specific sources. Very little, if any, additional data is stored in the citation and it becomes just a link between source and event.
On the other hand, I fall into the lumper category. I still use Source Writer, but I leave the state and county fields blank when I create the census source. Then, when I add a citation, I preface the municipality field with the state and county. Since the source writer citation form does not have a logical place to store the online database (i.e. Ancestry vs. HeritageQuest, etc), my sources are based on year and online database. I’d rather it just be year, but I want to know which online database I used, so I make this concession in order to use source writer. Thus my sources look something this:
- Source 1 – 1880 US Federal Census, Ancestry
- Source 2 – 1880 US Federal Census, HeritageQuest
In general, lumpers have fewer sources and those sources are (for the most part) not repetitious. My guess is that most Legacy users who have a background in software or database design will probably lean toward the lumper end of the spectrum because that more closely follows the design principles to which we are accustomed.
Getting back to the SAR application and how that fits into the source/citation structure. A splitter would most likely consider each application a separate source. On the other hand, I consider the Ancestry SAR Application Collection the source, thus the specific data on an individual application would be part of my citation. But for a document like a SAR application, where it will serve as a citation for many events or facts, the fact that Legacy stores multiple copies of the citations is trouble waiting to happen. — more on this later in part 2 of this article [link].