Monthly Archives: July 2011

Tuesday’s Tip – Consider ALL the Possibilities!

So you just got your hot little hands on a copy of great-great-granddad’s will and he names “his brother-in-law Joe Blow” as one of his co-executors. Do you now throw your arms up in the air and do the genealogy happy dance because you have uncovered the long-sought-after maiden name of your great-great-grandmother? It’s tempting! But maybe it’s not quite time to break out the champagne. Here’s a true story from my family files.

Sebastian Keeley (sometimes spelled Keely) was my 5x great-grandfather. At the time of his death he lived on a “plantation” in Vincent Township, Chester County, Pennsylvania. Records show he also operated a tavern. He died on November 8, 1777 at the age of 48. In his will he names as co-executors his wife Elizabeth, his son Matthias and his brother-in-law George Christman. It is my belief that this is where William Henry Egle, the Pennsylvania State Librarian who wrote a series of genealogy-related articles for the Harrisburg, PA Daily Telegraph newspaper in the late 1800s, got his information. Egle’s “Notes and Queries” is a multi-volume compilation of about 5000 pages containing historical and biographical information on families from eastern and central Pennsylvania. It is an important and widely available genealogical resource in Pennsylvania. And it says my 5x great-grandmother was Elizabeth Christman.

So this is my starting point. Elizabeth Christman is my 5x great-grandmother. Now what? Well, naturally I want to find her parents – my 6x great-grandparents. And this is where everything starts to fall apart.

The Keely family, headed by Sebastian’s father Valentine, arrived in America around 1728. Valentine settled in what would become western Montgomery County, Pennsylvania. Two of his children, Matthias and Sebastian, settled across the river in Chester County when they came of age.

Daniel, patriarch of the Christman family, came to America in 1730. He purchased land near Valentine Keely’s property in 1738. According to Christman genealogists, Daniel’s children were born between 1731 and 1744. Some of his children also wound up settling across the river in Chester County.

This area (i.e. northern Chester County and western Montgomery County, Pennsylvania) was not a huge population center. Despite pockets of housing developments, even in 2011 most people would characterize this area as relatively rural — particularly on the Chester County side of the river. So back in the late 1700s the population was downright sparse. This is good because there are a limited number of Keelys as well as a limited number of Christmans living in the area for us to analyze.

Now, as you’ll recall, Sebastian was born in 1729. Logically, his spouse would be in the generation of Daniel’s children. There’s also a small chance Elizabeth could be a granddaughter of Daniel – particularly if Sebastian was widowed and she was a second wife. So let’s look at George Christman – Sebastian’s brother-in-law. In a deed dated 1801, the executors of the estate of Sebastian Keely sold land in Limerick township, Montgomery County to Jacob Keely (one of Sebastian’s sons). In that deed George Christman is described as a yeoman of Pikeland township. Now I have been told (correct me if I’m wrong) that in this case, the term yeoman refers to a farmer who owned land. Through a combination of civil and church records, it can be proved that the George Christman who lived and owned land in Pikeland township, and who was of age to be an executor in 1777, was the son of Daniel Christman, the immigrant mentioned above. And George had a sister named Elizabeth!

So far so good, right? Well, let’s keep going with this. Daniel Christman died in 1760. In his will he describes his daughter Elizabeth as a spinster. So Elizabeth Christman was unmarried when her father wrote his will. So if Sebastian and Elizabeth married after Daniel’s death in 1760, all of their (legitimate) children would have been born after that and would have been under the age of 17 when Sebastian died in 1777. But baptism and other records show that Sebastian had at least four children born between 1754 and 1760, as well as four more between about 1762 and 1772. Egle apparently knew this. He implied that Elizabeth was the second wife of Sebastian. What he apparently did not realize, however, was that church records show that Elizabeth Christman, daughter of Daniel, married Johannes Haas/Hause on March 12, 1761. Burial records for Vincent Mennonite Cemetery (aka Rhoades Burial Ground) show that Elizabeth Hause, wife of Johannes, died in 1777.

As if that isn’t bad enough, when Elizabeth Keeley died in 1807, her son Jacob petitioned the Orphans’ Court to partition or sell land. Jacob was born in 1758. Jacob’s petition lists the names of his then-living brothers and sisters as well as the surviving children of his deceased brother Sebastian. From the way this document is written, it would appear that Elizabeth was the mother of all of Sebastian’s known children. To top it off, the birth date of Elizabeth Christman, daughter of Daniel, as recorded in her baptism record, is inconsistent with the birth date inscribed on Elizabeth Keeley’s tombstone. It is, however, consistent with the birth date on the tombstone of Elizabeth Haas/Hause.

So where did we go wrong? Perhaps we found the wrong set of George/Elizabeth Christman siblings. But given the facts we have about George the executor, the only George Christman who possibly fits the bill is unequivocally also the son of Daniel. In the end, our entire premise for Elizabeth Keeley having the maiden name of Christman is the fact that Sebastian called George Christman his brother-in-law.

So who are your brothers-in-law? Strictly speaking, your brother-in-law is the brother of your wife or the spouse of your sister. In the interest of space I’ll skip the details, but in this case George’s wife was not Sebastian’s sister. Nor did George have a deceased wife who was Sebastian’s sister. Nor did George have a deceased sister who was Sebastian’s first wife. That pretty much covers all the brother-in-law bases. Except one. My husband has three sisters. They are my sisters-in-law. Strictly speaking, their husbands are not my brothers-in-law, but I’ve called them brothers-in-law. Don’t most people?

What if Sebastian’s wife and George’s wife were sisters? Would Sebastian call George his brother-in-law? Well, I think so, and I think that’s exactly what happened in this case. George’s wife was Sophia Frey or Fry. I am slowly finding more information on the Frys in the area. So far, it’s all starting to fit together. There aren’t the glaring inconsistencies that there are with Daniel Christman’s daughter. But it’s still a little premature to start doing the happy dance. Of course, there’s also the bit about Elizabeth Christman. She was declared to be Sebastian’s wife by a very well-known, respected source over 100 years ago. It’s been an uphill battle trying to convince other Keeley/Keely researchers that that may not be the case!

So getting back to the Tuesday’s Tip: check all the possibilities. Particularly when it comes to relationships. Sometimes siblings are really half-siblings (or even step-siblings). Sometimes brothers are brothers-in-law. (I also have a will where this is the case.) Sometimes adopted or step children are not explicitly identified as such. And sometimes a brother-in-law is, in the strictest sense of the word, not a brother-in-law but rather the spouse of your sister-in-law!

My Struggle with Legacy Family Tree Sourcing – Part 2

In my previous post (link), I started the discussion of sources and citations and lumping and splitting as related to creating sources in Legacy Family Tree software. Part 2 will talk a little about the database structure and the trade-offs of lumping and splitting.

I made a very simplistic, high-level diagram of how things work in Legacy. It shows a source table with 5 sources. Next is the Citation Table, which shows 5 citations all pointing back to source 1. Notice, however, that the first three of these citation records contain identical data. This identical data is stored three times because each one points to a different event. The 4th and 5th Citation Data records are also identical.

Diagram 1 - Legacy Source/Citation Database Overview

This data model involves redundant storage of data and is very bad in terms of data maintainability. Now, in order to maintain data integrity, if any piece of the citation data needs to be changed it is necessary to find all copies and change each of them. Legacy shifts this onus of maintaining data integrity to the user, supplying only a Search and Replace tool. But there are several cases where Search and Replace cannot make the desired change easily and some cases where it cannot make the desired change at all. It then falls on the user to edit each copy of the citation data manually.
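The maintenance burden described above can be sketched in a few lines of Python. This is only an illustration of the idea behind Diagram 1, not Legacy's actual schema: the class names, field names, the source name and the count of 30 events are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Source:
    source_id: int
    name: str


@dataclass
class Citation:
    # One row per event: the citation detail is copied into every row.
    source_id: int
    event_id: int
    detail: str


def build_citations(source_id, event_ids, detail):
    """Link one source to many events by copying the detail text each time."""
    return [Citation(source_id, event_id, detail) for event_id in event_ids]


def update_detail(citations, old_detail, new_detail):
    """Change every copy of a citation detail; returns how many rows changed."""
    changed = 0
    for citation in citations:
        if citation.detail == old_detail:
            citation.detail = new_detail
            changed += 1
    return changed


# Hypothetical data: one source cited by 30 events.
source = Source(1, "SAR Applications, Ancestry")
citations = build_citations(source.source_id, range(1, 31), "application detail")

rows_changed = update_detail(citations, "application detail", "application detail, revised")
print(rows_changed)  # 30: one edit per linked event
```

If even one of those copies is missed, or differs slightly in wording so the match fails, the copies drift apart. That is the data integrity risk.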

Diagram 2 illustrates the exact same relationships, but the citation data/event links are pulled out into a separate table. This allows for each unique citation record to be stored once, regardless of the number of events to which it is linked.

Diagram 2 - Alternate Data Model
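Here is a rough Python sketch of the idea behind Diagram 2, with the citation/event links held in their own table. As before, the class and field names and the sample data are assumptions for illustration only; they do not come from Legacy or any other product.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Each unique citation is stored exactly once.
    citation_id: int
    source_id: int
    detail: str


@dataclass(frozen=True)
class CitationEventLink:
    # The link table: which citation supports which event.
    citation_id: int
    event_id: int


# Hypothetical data: one citation linked to 30 events.
citations = {1: Citation(1, 1, "application detail")}
links = [CitationEventLink(1, event_id) for event_id in range(1, 31)]


def detail_for_event(event_id):
    """Look up the citation details attached to an event through the link table."""
    return [citations[link.citation_id].detail for link in links if link.event_id == event_id]


# A single edit is seen by every linked event.
citations[1].detail = "application detail, revised"
print(detail_for_event(7))  # ['application detail, revised']
```

The trade-off is one extra lookup through the link table, in exchange for never having to hunt down duplicate copies.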

Had Legacy implemented a model similar to diagram 2, I would be a very happy camper. But they didn’t. So now it all comes down to trade-offs. On one hand, you can become a splitter. This entails creating a multitude of very specific, repetitious sources and minimizing the citation data content. In many, but not all cases, it gets rid of the data integrity problem as described above. On the downside, since you have so many sources, you need to be very careful and vigilant when you create them so that all the repetitious data is consistent from one source to the next. (That is if you care about consistent wording and formatting.) Admittedly, the ability to create a new source by copying an existing one helps with this.

It is also extremely important to come up with an organization scheme that is built into the source name. All Legacy does is present you with an alphabetized list of sources – no filtering or classification on the list. Granted, this is also a problem for lumpers, but the problem is magnified for splitters because they have so many more sources! Over the past couple of years, I’ve tweaked my naming system a couple of times to force the sort order of the source list. I don’t want to even imagine how much more laborious and time-consuming this would have been were I an extreme splitter! And while I don’t know of a hard limit on sources, if you have a large database of people you may find that extreme splitting creates just too many sources to be practical.
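To show what building the organization into the source name can look like, here is a small Python sketch. The category prefixes and source titles are hypothetical and are not my actual naming scheme. Python's default string sort also stands in for Legacy's alphabetized list, which may order things a little differently (upper vs. lower case, for instance).

```python
def source_name(category, title):
    """Prefix a title with its category so an A-Z list groups related sources."""
    return f"{category} - {title}"


# Hypothetical source names, entered in no particular order.
names = [
    source_name("Census", "1880 US Federal Census, Ancestry"),
    source_name("Book", "Notes and Queries"),
    source_name("Census", "1870 US Federal Census, Ancestry"),
    source_name("Book", "County History"),
]

# Sorting alphabetically now keeps each category together.
for name in sorted(names):
    print(name)
```

Because the category comes first, a plain alphabetical sort does the grouping that the source list itself does not offer.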

On the flip side, lumpers are more likely to have a more manageable number of sources. This has the advantage of making it somewhat easier to tweak them. I have actually tweaked my naming convention to force sort order (as mentioned above) as well as tweaked wording and content of source fields for better consistency in how footnotes and endnotes appear in reports. I also feel that some degree of lumping is more in keeping with database and software design principles. The downside of lumping is, of course, the citation maintainability problem which is the focus of the article.

So where does all this leave me? Well, I’ve already characterized myself as a lumper – although I am sure there are some who are more extreme. Over the years I’ve done a lot of experimenting and tweaking and I like the system I have for deciding what to include in the source table vs what to include in the citation. What works best for me is to define a source for each dataset or data collection. So each database in Ancestry gets its own source. I also tend to create separate sources for each of the datasets stored on the USGenWeb county pages. On the splitter/lumper spectrum, I would say that I’m a moderate lumper. In terms of Source Writer, I have recently decided to primarily use just 2 templates for online data – online database or online database with images. (But that’s starting to get into a whole other discussion: the multitude of Source Writer templates!)

My struggle with Legacy sourcing is nothing new. It’s an issue I’ve had to deal with from the time I first started using it back in early 2004. Based on what people say on the LUG email list, the database implementation has pushed some people toward extreme splitting. That never felt right to me, and by now I’m just too entrenched in the way I have been doing things to even consider changing. Why then did I bother writing this post? After creating the SAR source and citation and linking it to about 30 events (so 30 copies of the citation) it occurred to me I should have included something additional in the citation. I tried to fix it with search and replace and guess what — it didn’t work!!!! It said it found and made 30 changes, but when I looked at the data, it had NOT changed. So mostly out of frustration I wrote this post. I still have to go back and try to fix things up, but at least I had a chance to vent.

Maybe some day Legacy will fix this design. Maybe some day some other company will design and build genealogy software that does not have this issue. Maybe some company already has — sometimes it’s hard to tell from the marketing oriented data on their websites. Maybe somebody will see this post and let me know!!

My Struggle with Legacy Family Tree Sourcing – Part 1

Yesterday afternoon I got an email from Ancestry touting their newly added US Sons of the American Revolution Application dataset. (By the way, you can access it FREE this weekend in celebration of the Fourth of July holiday.) So I naturally logged on to Ancestry, went to the new SAR dataset and entered one of the surnames that I research to see what would pop up. And sure enough, I got some hits! The third or fourth one on the list I recognized as being one of my 5x great-grandfathers. Clicking on his name sent me to a screen where I could access the actual scanned image of a SAR application. And in viewing the application I was able to see birth, death and marriage dates and places for the generations that separated the applicant from the patriot.

Now, for me, this is both a blessing and a curse! Why? Because now I have to decide how to structure this within the confines of the way Legacy Family Tree has implemented sources and citations. Basically, I need to decide what information from the SAR application I want to store in the source and what I want to store in the citation. This discussion comes up often on the Legacy Users Group email list and is generally referred to as “lumping and splitting.” (FYI: Legacy has chosen not to provide a forum/message board, and with their email archiving being fractured and possibly dropping messages, many questions are repeated periodically.)

I think both the source/citation and so-called lumping/splitting concepts are best illustrated with examples. The most easily understood source example is a book. Most people (regardless of whether they are lumpers or splitters) would agree that the book information (title, author, publisher, date, etc) should be stored in the source table and the page (or chapter, page range, etc) should be stored in the citation. As this illustrates, the purpose of the citation is to link the source to the event, and as such it should refer to the specific subset of the source that contains the data that was extracted and inserted into the event or fact.

Another common source – census data – is much less straightforward in terms of what part is source and what part is citation. If you follow Legacy Source Writer templates for US Federal Censuses, your sources will be specific to year, state, county and online database. In other words:

  • source 1 – 1880 Census, Pennsylvania, Berks, Ancestry.com
  • source 2 – 1880 Census, Pennsylvania, Berks, HeritageQuest
  • source 3 – 1880 Census, Pennsylvania, Chester, Ancestry.com
  • source 4 – 1880 Census, Pennsylvania, Chester, HeritageQuest

Splitters may break this down further, with separate sources for each town or city. Some have even suggested each census sheet is a separate source! As you can see, splitters have a very large number of very specific sources. Very little, if any, additional data is stored in the citation and it becomes just a link between source and event.

On the other hand, I fall into the lumper category. I still use Source Writer, but I leave the state and county fields blank when I create the census source. Then, when I add a citation, I preface the municipality field with the state and county. Since the Source Writer citation form does not have a logical place to store the online database (i.e. Ancestry vs. HeritageQuest, etc), my sources are based on year and online database. I’d rather it just be year, but I want to know which online database I used, so I make this concession in order to use Source Writer. Thus my sources look something like this:

  • Source 1 – 1880 US Federal Census, Ancestry
  • Source 2 – 1880 US Federal Census, HeritageQuest
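The two approaches can be put side by side in a short Python sketch. It takes the same census lookups and shows where each piece of information ends up. The function names and the sample municipalities are assumptions for illustration, and the name formats only roughly follow the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusLookup:
    year: int
    state: str
    county: str
    municipality: str
    database: str


def split_source_name(lookup):
    """Splitter style: year, state, county and database all live in the source."""
    return f"{lookup.year} Census, {lookup.state}, {lookup.county}, {lookup.database}"


def split_citation(lookup):
    """Splitter style: little is left for the citation."""
    return lookup.municipality


def lumped_source_name(lookup):
    """Lumper style: the source is just year plus online database."""
    return f"{lookup.year} US Federal Census, {lookup.database}"


def lumped_citation(lookup):
    """Lumper style: state and county are prefixed to the municipality."""
    return f"{lookup.state}, {lookup.county}, {lookup.municipality}"


# Hypothetical lookups for illustration.
lookups = [
    CensusLookup(1880, "Pennsylvania", "Berks", "Reading", "Ancestry"),
    CensusLookup(1880, "Pennsylvania", "Chester", "West Chester", "Ancestry"),
    CensusLookup(1880, "Pennsylvania", "Chester", "West Chester", "HeritageQuest"),
]

split_sources = {split_source_name(lookup) for lookup in lookups}
lumped_sources = {lumped_source_name(lookup) for lookup in lookups}
print(len(split_sources), len(lumped_sources))  # 3 2
print(lumped_citation(lookups[0]))  # Pennsylvania, Berks, Reading
```

Even with three lookups the splitter already has one more source than the lumper, and the gap widens with every new county.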

In general, lumpers have fewer sources and those sources are (for the most part) not repetitious. My guess is that most Legacy users who have a background in software or database design will probably lean toward the lumper end of the spectrum because that more closely follows the design principles to which we are accustomed.

Let’s get back to the SAR application and how that fits into the source/citation structure. A splitter would most likely consider each application a separate source. On the other hand, I consider the Ancestry SAR Application Collection the source, thus the specific data on an individual application would be part of my citation. But for a document like a SAR application, where it will serve as a citation for many events or facts, the fact that Legacy stores multiple copies of the citations is trouble waiting to happen. More on this later in part 2 of this article [link].