Category Archives: Resources – Software, Websites, Records, Etc

Finding Enumeration Maps for Southeastern Pennsylvania Counties

This is an update to my previous post “Tuesday’s Tip – Finding an Enumeration District in the 1940 Census” [link]. In response to that post, I received an email from Ken McCrea, who has added a utility to his GermanNames website to help researchers find the ED maps for various southeastern Pennsylvania counties and their population centers (cities/towns/townships).

Go to his website [link], and at the very top you can click on “Guides to the 1940 Census for Southeastern Pennsylvania.” From there it is pretty self-explanatory. He provides direct links to the maps on the NARA Online Public Access site, eliminating the need to formulate a search query. It makes finding the maps a little more straightforward.

Tuesday’s Tip – PA State Death Index, 1906-1961 — WooHoo!

The long-awaited (for me, anyway) Pennsylvania State Death Index for the years 1906 to 1961 finally came online yesterday, and I have been making the most of it! The vast majority of the individuals in my genealogy database are from Pennsylvania. All of my immigrant ancestors arrived in America between the late 1600s and the mid- to late 1700s and came to PA. My direct lines, as well as many cousin lines, stayed.

I am using this index as an additional death date source for individuals whose death date I have already found elsewhere (e.g., in an obit or on a tombstone). I am also using it to find an exact date (which can be confirmed with further research) in cases where I have only a year, or a month and year. So far, I have found just over 100!

As other bloggers [link and link] have noted, the database is not searchable. It is really just a collection of browsable files that are images of the paper index. For some years the index is alphabetical. For other years, it is arranged by Soundex code. There are instructions on the site for calculating the Soundex code by hand, but I found an online calculator here: [link].
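If you would rather compute the codes yourself, the hand calculation is simple enough to script. Below is a minimal Python sketch of the standard American Soundex rules (keep the first letter, code the following consonants, collapse adjacent duplicates, let vowels but not H or W separate duplicates, pad to four characters). It assumes the PA index follows these standard rules; if the instructions on the site differ, follow the site.

```python
# Standard American Soundex letter groups (vowels, Y, H and W have no code).
_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(surname: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("surname must contain at least one letter A-Z")

    first = letters[0]
    digits = []
    # Start with the first letter's own code so that a following letter
    # with the same code is not counted again (e.g. Pfister -> P236).
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate letters that share a code.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
            if len(digits) == 3:
                break
        # Vowels (and Y) reset prev to "", so they DO separate equal codes.
        prev = code
    # Pad with zeros to the standard letter + three digits.
    return (first + "".join(digits)).ljust(4, "0")
```

For example, `soundex("Bechtel")` gives `B234`, and `Robert` and `Rupert` both come out as `R163`, which is exactly why the Soundex-ordered years group spelling variants together.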

The index for each year is broken up into several files, usually five or six and sometimes more. To make things go a little faster, I used Legacy Family Tree’s advanced tagging feature to find all the individuals who might be in the PA Death Index. I then exported those individuals to GenViewer (a Legacy add-on) so I could sort them and avoid flipping back and forth between different years and different files within a year. This has really helped to streamline my workflow.
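The sorting idea behind that workflow can be sketched in a few lines of Python. This is not how Legacy or GenViewer work internally; the `Person` fields and the `pa_index_worklist` function are hypothetical, assuming you have an exported list with surname, given name, death year and death state. It keeps only Pennsylvania deaths inside the 1906-1961 window and orders them by year, then surname, so each year's files are opened once and walked through in order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical export fields; not Legacy's or GenViewer's actual format.
    surname: str
    given: str
    death_year: Optional[int]  # None when no death year is known
    death_state: str


def pa_index_worklist(people, first_year=1906, last_year=1961):
    """Keep Pennsylvania deaths inside the index years, ordered year by year."""
    candidates = [
        p for p in people
        if p.death_state == "PA"
        and p.death_year is not None
        and first_year <= p.death_year <= last_year
    ]
    # Sorting by year first means each year's files are opened only once;
    # sorting by surname next means the files within a year are visited in order.
    return sorted(candidates, key=lambda p: (p.death_year, p.surname.upper(), p.given.upper()))
```

For the years that are arranged by Soundex code rather than alphabetically, the surname in the sort key could be swapped for the person's Soundex code.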

If you have Pennsylvania people in your database for these years, then this is a resource that you really need to check out. (Oh, there’s also a birth index, but due to the 105-year privacy restriction, it only covers 1906 for now.)

RootsTech Observations from a Home Viewer

It seems that the whole genealogy community is buzzing about the recent RootsTech conference – and with good reason! I was one of the unfortunate many who could not attend the conference live, but was able to catch a bit of the excitement by watching several of the presentations that were streamed live on the internet. So here goes with some general observations.

Cloud computing was a huge topic in the sessions I saw online. This included using the cloud for backups, synchronization, collaboration and storage of family trees. I’ve always been a little distrustful of “the cloud,” but I was convinced to take a few more steps in that direction, or at least to check it out in more detail. For example, I know that a lot of people use Dropbox for their genealogy data, but I’ve been hesitant. Hearing all the conference talk, however, prompted me to do a Google search, which turned up a product called SecretSync that encrypts files before uploading them to Dropbox. This gets around some of the concerns people have expressed about the Dropbox privacy policy. It probably isn’t necessary to run every file through SecretSync before adding it to Dropbox, but I will probably do this for any information I consider personal or sensitive. On the other hand, I didn’t get a warm fuzzy feeling about Geni. I still plan to keep my primary genealogy database on my PC and upload a subset to the various tree sites.

In viewing the presentations, I also realized that I’m under-utilizing some important resources, especially maps. Legacy Family Tree has built-in mapping based on Bing, but I have been unable to get it to work on my relatively new Windows 7 computer. The Legacy Family Tree website says that its mapping requires IE7. I don’t use IE, but I have version 8 installed on my computer, and I am reluctant to go back to version 7. After seeing some of the RootsTech presentations, I’m going to look into using Google Earth tours and possibly some basic mapping with Google Maps. It won’t be integrated with Legacy, but I guess you can’t have everything. :(

While I enjoyed each and every presentation that I saw, the topic that got me most excited was the Google presentation segment on Historical-data.org. In a nutshell, this is a way to add semantic information to a web page so that search engines can better assess its relevance to a “genealogy search.” I even went so far as to start updating one of my obituary web pages by defining my ancestor Augustus Bechtel as a HistoricalPerson. I did this after the Historical-data.org schema definitions were touched upon in the Day 1 keynote address. I wasn’t sure how to define the HistoricalDates and felt vindicated during the Google presentation on Day 2, when the speaker said that even the large companies they were working with struggled with this. They (Google, et al.) are promising to add examples to the Historical-data.org blog, and you can bet that I am now subscribed and waiting for that post! I even submitted a product enhancement suggestion asking Legacy Family Tree to add this to the web pages that Legacy generates. (Crossing my fingers that they at least consider it.)

That’s about it for now. As I try out some of the software and concepts, I may post follow-ups!

My Struggle with Legacy Family Tree Sourcing – Part 2

In my previous post (link), I started the discussion of sources and citations, and of lumping and splitting, as they relate to creating sources in Legacy Family Tree software. Part 2 will talk a little about the database structure and the trade-offs of lumping and splitting.

I made a very simplistic, high-level diagram of how things work in Legacy. It shows a Source Table with 5 sources. Next is the Citation Table, which shows 5 citations all pointing back to source 1. Notice, however, that the first three of these citation records contain identical data. This identical data is stored three times because each record points to a different event. The 4th and 5th Citation Data records are also identical.

Diagram 1 - Legacy Source/Citation Database Overview

This data model stores data redundantly, which is very bad for maintainability. To maintain data integrity, if any piece of the citation data needs to be changed, every copy must be found and changed. Legacy shifts this burden of maintaining data integrity to the user, supplying only a Search and Replace tool. But there are several cases where Search and Replace cannot make the desired change easily, and some cases where it cannot make it at all. It then falls on the user to edit each copy of the citation data manually.
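To make the maintenance problem concrete, here is a minimal Python sketch of the simplified model in Diagram 1. It is only an illustration of the diagram, not Legacy's actual schema; the `CitationRow` class, the `replace_citation_detail` function and the page values are hypothetical. Because the citation text lives in every row, a correction has to touch each copy, and an exact-match replace misses any copy that has drifted even slightly.

```python
from dataclasses import dataclass


@dataclass
class CitationRow:
    # One row per event, so shared citation text is duplicated (Diagram 1).
    source_id: int
    event_id: int
    detail: str


# Hypothetical data mirroring the diagram: rows 1-3 and rows 4-5 are identical copies.
citation_table = [
    CitationRow(source_id=1, event_id=101, detail="p. 12"),
    CitationRow(source_id=1, event_id=102, detail="p. 12"),
    CitationRow(source_id=1, event_id=103, detail="p. 12"),
    CitationRow(source_id=1, event_id=104, detail="p. 40"),
    CitationRow(source_id=1, event_id=105, detail="p. 40"),
]


def replace_citation_detail(rows, source_id, old, new):
    """Change every copy of a citation; returns how many rows were edited."""
    changed = 0
    for row in rows:
        # Only exact matches are found; a copy with a stray typo would be missed.
        if row.source_id == source_id and row.detail == old:
            row.detail = new
            changed += 1
    return changed
```

Fixing the first citation here means three separate edits, one per linked event; with 30 linked events it would be 30.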

Diagram 2 illustrates exactly the same relationships, but the citation/event links are pulled out into a separate table. This allows each unique citation record to be stored once, regardless of the number of events to which it is linked.

Diagram 2 - Alternate Data Model
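The same example restructured along the lines of Diagram 2 might look like the Python sketch below. Again, this is a hypothetical illustration of the alternate model, not anything Legacy provides; `Citation`, `citation_event_links` and the helper functions are made-up names. Each citation record exists once, and a link table holds only the (citation, event) pairs, so one edit is seen by every linked event.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    citation_id: int
    source_id: int
    detail: str  # stored exactly once


# Hypothetical data: the same five events as Diagram 1, but only two citation records.
citations = {
    1: Citation(citation_id=1, source_id=1, detail="p. 12"),
    2: Citation(citation_id=2, source_id=1, detail="p. 40"),
}

# Link table: one (citation_id, event_id) pair per event, with no citation text.
citation_event_links = [(1, 101), (1, 102), (1, 103), (2, 104), (2, 105)]


def citations_for_event(event_id: int) -> list[Citation]:
    """Look up the citation records linked to one event."""
    return [citations[cid] for cid, eid in citation_event_links if eid == event_id]


def update_citation(citation_id: int, new_detail: str) -> None:
    # One edit; every linked event sees the change through the link table.
    citations[citation_id].detail = new_detail
```

Calling `update_citation(1, "p. 12, line 7")` once is enough for events 101, 102 and 103 to all report the corrected citation, which is the whole point of the normalized design.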

Had Legacy implemented a model similar to Diagram 2, I would be a very happy camper. But they didn’t. So now it all comes down to trade-offs. On one hand, you can become a splitter. This entails creating a multitude of very specific, repetitious sources and minimizing the citation data content. In many, but not all, cases, this gets rid of the data integrity problem described above. On the downside, since you have so many sources, you need to be very careful and vigilant when you create them so that all the repetitious data is consistent from one source to the next (that is, if you care about consistent wording and formatting). Admittedly, the ability to create a new source by copying an existing one helps with this.

It is also extremely important to come up with an organization scheme that is built into the source name. All Legacy does is present you with an alphabetized list of sources, with no filtering or classification. Granted, this is also a problem for lumpers, but the problem is magnified for splitters because they have so many more sources! Over the past couple of years, I’ve tweaked my naming system a couple of times to force the sort order of the source list. I don’t even want to imagine how much more laborious and time-consuming this would have been were I an extreme splitter! And while I don’t know of a hard limit on the number of sources, if you have a large database of people, you may find that extreme splitting creates too many sources to be practical.
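Here is a small Python sketch of how a category prefix built into the source name can force grouping in a plain alphabetical list. The three-letter prefixes, the `source_name` helper and most of the titles are hypothetical (my actual naming scheme isn't spelled out here), and it assumes a simple case-insensitive alphabetical sort.

```python
# Hypothetical category prefixes; not the author's actual naming scheme.
PREFIX = {"census": "CEN", "military": "MIL", "probate": "PRO", "vital": "VIT"}


def source_name(category: str, title: str) -> str:
    """Build a source name whose prefix controls where it sorts."""
    return f"{PREFIX[category]} - {title}"


# Hypothetical sources, entered in no particular order.
sources = [
    ("vital", "Pennsylvania Death Index, 1906-1961"),
    ("census", "1880 US Federal Census, Ancestry"),
    ("military", "US Sons of the American Revolution Applications, Ancestry"),
    ("census", "1880 US Federal Census, HeritageQuest"),
    ("probate", "Chester County Wills, USGenWeb"),
]


def sorted_source_list(entries):
    # A plain alphabetical sort, like a list with no filtering or classification;
    # the prefixes alone are what group the sources by record type.
    return sorted((source_name(c, t) for c, t in entries), key=str.casefold)
```

The trade-off is that renaming a prefix later means editing every source that uses it, which is far more work for a splitter with hundreds of sources than for a lumper with a few dozen.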

On the flip side, lumpers are more likely to have a manageable number of sources, which makes those sources somewhat easier to tweak. I have tweaked my naming convention to force sort order (as mentioned above), and I have also tweaked the wording and content of source fields for better consistency in how footnotes and endnotes appear in reports. I also feel that some degree of lumping is more in keeping with database and software design principles. The downside of lumping is, of course, the citation maintainability problem that is the focus of this article.

So where does all this leave me? Well, I’ve already characterized myself as a lumper, although I am sure there are some who are more extreme. Over the years I’ve done a lot of experimenting and tweaking, and I like the system I have for deciding what to include in the source table versus what to include in the citation. What works best for me is to define a source for each dataset or data collection. So each database on Ancestry gets its own source. I also tend to create separate sources for each of the datasets stored on the USGenWeb county pages. On the splitter/lumper spectrum, I would say that I’m a moderate lumper. In terms of Source Writer, I have recently decided to primarily use just 2 templates for online data: online database, or online database with images. (But that’s starting to get into a whole other discussion: the multitude of Source Writer templates!)

My struggle with Legacy sourcing is nothing new. It’s an issue I’ve had to deal with since I first started using it back in early 2004. Based on what people say on the LUG email list, the database implementation has pushed some people toward extreme splitting. That never felt right to me, and by now I’m too entrenched in the way I have been doing things to even consider changing. Why, then, did I bother writing this post? After creating the SAR source and citation and linking it to about 30 events (so 30 copies of the citation), it occurred to me that I should have included something additional in the citation. I tried to fix it with Search and Replace and, guess what, it didn’t work!!!! It said it found and made 30 changes, but when I looked at the data, it had NOT changed. So mostly out of frustration I wrote this post. I still have to go back and try to fix things up, but at least I had a chance to vent.

Maybe someday Legacy will fix this design. Maybe someday another company will design and build genealogy software that does not have this issue. Maybe some company already has; sometimes it’s hard to tell from the marketing-oriented material on their websites. Maybe somebody will see this post and let me know!!

My Struggle with Legacy Family Tree Sourcing – Part 1

Yesterday afternoon I got an email from Ancestry touting their newly added US Sons of the American Revolution Application dataset. (By the way, you can access it FREE this weekend in celebration of the Fourth of July holiday.) So I naturally logged on to Ancestry, went to the new SAR dataset and entered one of the surnames that I research to see what would pop up. And sure enough, I got some hits! The third or fourth one on the list I recognized as being one of my 5x great-grandfathers. Clicking on his name sent me to a screen where I could access the actual scanned image of a SAR application. And in viewing the application I was able to see birth, death and marriage dates and places for the generations that separated the applicant from the patriot.

Now, for me, this is both a blessing and a curse! Why? Because now I have to decide how to structure this within the confines of the way Legacy Family Tree has implemented sources and citations. Basically, I need to decide what information from the SAR application I want to store in the source and what I want to store in the citation. This discussion comes up often on the Legacy Users Group email list and is generally referred to as “lumping and splitting.” (FYI: Legacy has chosen not to provide a forum/message board, and because their email archiving is fractured and may drop messages, many questions are repeated periodically.)

I think both the source/citation and the so-called lumping/splitting concepts are best illustrated with examples. The most easily understood source example is a book. Most people (regardless of whether they are lumpers or splitters) would agree that the book information (title, author, publisher, date, etc.) should be stored in the source table and the page (or chapter, page range, etc.) should be stored in the citation. As this illustrates, the purpose of the citation is to link the source to the event, and as such it should refer to the specific subset of the source containing the data that was extracted and entered into the event or fact.
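The book case maps neatly onto two record types. In this minimal Python sketch (the class names, the `footnote` helper and the sample book are all hypothetical, and the footnote format is simplified rather than Source Writer output), the book-level facts live once in a `Source`, while each `Citation` holds only the page and the event it supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    # Book-level facts: stored once, shared by every citation.
    title: str
    author: str
    publisher: str
    year: int


@dataclass(frozen=True)
class Citation:
    # The specific part of the source that supports one event or fact.
    source: Source
    detail: str  # page, page range, chapter, etc.
    event: str


def footnote(citation: Citation) -> str:
    # Simplified footnote format for illustration only (not Source Writer output).
    s = citation.source
    return f"{s.author}, {s.title} ({s.publisher}, {s.year}), {citation.detail}."
```

Two citations to different pages of the same hypothetical book, for example `Citation(book, "p. 12", "birth")` and `Citation(book, "pp. 40-41", "marriage")`, share one `Source` object, so a typo in the title only ever needs fixing in one place.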

Another common source, census data, is much less straightforward in terms of what part is source and what part is citation. If you follow the Legacy Source Writer templates for US Federal Censuses, your sources will be specific to year, state, county and online database. In other words:

  • Source 1 – 1880 Census, Pennsylvania, Berks, Ancestry.com
  • Source 2 – 1880 Census, Pennsylvania, Berks, HeritageQuest
  • Source 3 – 1880 Census, Pennsylvania, Chester, Ancestry.com
  • Source 4 – 1880 Census, Pennsylvania, Chester, HeritageQuest
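Seen as data, the splitter approach makes year, state, county and online database all part of the source's identity. In this hypothetical Python sketch (`CensusHit`, `splitter_source` and `splitter_citation` are made-up names, and the townships and pages are sample values), every new county or database you touch produces another source, and the citation shrinks to little more than a place and page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusHit:
    # Hypothetical record of where a household was found.
    year: int
    state: str
    county: str
    database: str
    municipality: str
    page: str


def splitter_source(hit: CensusHit) -> str:
    # One source per year + state + county + online database.
    return f"{hit.year} Census, {hit.state}, {hit.county}, {hit.database}"


def splitter_citation(hit: CensusHit) -> str:
    # Little is left for the citation beyond the exact place and page.
    return f"{hit.municipality}, {hit.page}"
```

Four sample hits spread over two counties and two databases already yield three distinct sources; repeat that across every census year and every county in a database, and the source list grows quickly.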

Splitters may break this down further, with separate sources for each town or city. Some have even suggested each census sheet is a separate source! As you can see, splitters have a very large number of very specific sources. Very little, if any, additional data is stored in the citation and it becomes just a link between source and event.

On the other hand, I fall into the lumper category. I still use Source Writer, but I leave the state and county fields blank when I create the census source. Then, when I add a citation, I preface the municipality field with the state and county. Since the Source Writer citation form does not have a logical place to store the online database (e.g., Ancestry vs. HeritageQuest), my sources are based on year and online database. I’d rather they be based on year alone, but I want to know which online database I used, so I make this concession in order to use Source Writer. Thus my sources look something like this:

  • Source 1 – 1880 US Federal Census, Ancestry
  • Source 2 – 1880 US Federal Census, HeritageQuest
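The lumper arrangement just moves state and county out of the source and into the citation's municipality field. This Python sketch reuses the same hypothetical `CensusHit` shape (redefined so it stands alone; `lumper_source` and `lumper_municipality_field` are made-up names) to show that the same four sample hits now collapse into one source per year and online database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusHit:
    # Hypothetical record of where a household was found.
    year: int
    state: str
    county: str
    database: str
    municipality: str
    page: str


def lumper_source(hit: CensusHit) -> str:
    # Source depends only on census year and online database.
    return f"{hit.year} US Federal Census, {hit.database}"


def lumper_municipality_field(hit: CensusHit) -> str:
    # State and county are prefixed onto the citation's municipality field.
    return f"{hit.state}, {hit.county}, {hit.municipality}"
```

The catch, as described in the SAR discussion, is that the state and county text now lives in the citation, so it is copied once for every event the citation is attached to.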

In general, lumpers have fewer sources and those sources are (for the most part) not repetitious. My guess is that most Legacy users who have a background in software or database design will probably lean toward the lumper end of the spectrum because that more closely follows the design principles to which we are accustomed.

Now, getting back to the SAR application and how it fits into the source/citation structure: a splitter would most likely consider each application a separate source. I, on the other hand, consider the Ancestry SAR Application Collection the source, so the specific data on an individual application would be part of my citation. But for a document like a SAR application, which will serve as a citation for many events or facts, the fact that Legacy stores multiple copies of the citation is trouble waiting to happen. More on this later in part 2 of this article [link].