Category Archives: Resources – Software, Websites, Records, Etc

A Must-See Webinar if you have German Ancestors

Genealogy webinars just keep growing in popularity, and I have watched a good number of them over the last year or so. The folks over at Millennia, makers of LegacyFamilyTree, have an ongoing series in which the live broadcasts are free. Then, depending on the presenter, recordings are either free with no expiration or free for a limited time, after which you can purchase a CD. I’ve watched many of the Legacy webinars and almost always pick up at least one or two tidbits (or more!). But I have to say that yesterday’s webinar “Researching Your German Ancestors” by Kory Meyerink was one of the most informative and relevant to my research. I have many German-speaking ancestors who arrived in America throughout the 1700s. If you do too, you really should check out this webinar. It’s available to view for free until June 18 at this link.

Finding Enumeration Maps for Southeastern Pennsylvania Counties

This is an update to my previous post “Tuesday’s Tip – Finding an Enumeration District in the 1940 Census” [link]. In response to that post, I received an email from Ken McCrea, who has added a utility to his GermanNames website to help researchers find the ED maps for various southeastern Pennsylvania counties and their population centers (cities/towns/townships).

Go to his website [link], and at the very top you can click on “Guides to the 1940 Census for Southeastern Pennsylvania.” From there it is pretty self-explanatory. He provides direct links to the maps at the NARA Online Public Access site, eliminating the need to formulate a search query. It makes finding the maps a little more straightforward.

Tuesday’s Tip – PA State Death Index, 1906-1961 — WooHoo!

The long-awaited (for me, anyway) Pennsylvania State Death Index for the years 1906 to 1961 finally came online yesterday, and I have been making the most of it! The vast majority of the individuals in my genealogy database are from Pennsylvania. All of my immigrant ancestors arrived in America between the late 1600s and the mid-to-late 1700s and came to PA. My direct lines, as well as many cousin lines, stayed.

I am using this index as an additional death-date source for individuals for whom I have already found a death date (e.g., in an obit or on a tombstone). I am also using it to find an exact date (which can be confirmed with further research) in cases where I have only a year or a month and year. At this point, I have found just over 100 matches!

As other bloggers [link and link] have noted, the database is not searchable. It is really just a collection of browsable files that are images of the paper index. For some years the index is alphabetical; for other years, it is arranged by Soundex code. There are instructions on the site for calculating the Soundex code by hand, but I found an online calculator here: [link].
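For the Soundex years, the code can also be worked out with a few lines of code. This is a minimal sketch of the standard American Soundex rules as commonly described (keep the first letter, code the remaining consonants, collapse adjacent duplicates, let H and W not separate duplicates, pad to four characters). It is not the calculator linked above, and since some indexes used slightly different Soundex variants, it is worth spot-checking a few results against the instructions on the site.

```python
# Digit assigned to each consonant group in American Soundex.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(surname: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    # Keep only plain A-Z letters (drops spaces, apostrophes, hyphens).
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("surname must contain at least one letter A-Z")

    first = letters[0]
    digits = []
    # A second letter with the same code as the first letter is skipped.
    previous = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")  # vowels and Y map to ""
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code after it counts again
    return (first + "".join(digits) + "000")[:4]
```

For example, `soundex("Bechtel")` gives `B234` and `soundex("Meyerink")` gives `M652`.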

The index for each year is broken up into several files, usually five or six and sometimes more. To make things go a little faster, I used Legacy Family Tree’s advanced tagging feature to find all the individuals who might possibly be in the PA Death Index. I then exported those individuals to GenViewer (a Legacy add-on) for ease of sorting, so I’m not flipping back and forth between different years and different files within a year. This has really helped to streamline my workflow.
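The same tag-then-sort idea can be sketched in a few lines if a person list is exported to a file. This is a hypothetical illustration, not what Legacy or GenViewer actually does: it assumes each person is a dict with `surname`, `given`, `death_year`, and `death_place` keys, and that the tagging rule is "died in Pennsylvania between 1906 and 1961." It groups the candidates by year and sorts by surname within each year, so each year's files only need to be opened once.

```python
from collections import defaultdict


def plan_index_lookups(people, first_year=1906, last_year=1961, state="Pennsylvania"):
    """Group likely index candidates by death year, sorted by surname within each year.

    `people` is a list of dicts with hypothetical keys: surname, given,
    death_year (int or None), and death_place (str).
    """
    by_year = defaultdict(list)
    for person in people:
        year = person.get("death_year")
        if year is None or not first_year <= year <= last_year:
            continue  # outside the years the index covers
        if state not in person.get("death_place", ""):
            continue  # assumed rule: only deaths recorded in the state
        by_year[year].append(person)
    # One entry per year, in year order, so each year's files are opened once.
    return {
        year: sorted(group, key=lambda p: (p["surname"], p["given"]))
        for year, group in sorted(by_year.items())
    }
```

For the years arranged by Soundex, you would sort each group on the Soundex code instead of the surname.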

If you have Pennsylvania people in your database for these years, then this is a resource that you really need to check out. (Oh, there’s also a birth index, but due to the 105-year privacy restriction, it covers only the year 1906 for now.)

RootsTech observations from a Home Viewer

It seems that the whole genealogy community is buzzing about the recent RootsTech conference – and with good reason! I was one of the unfortunate many who could not attend the conference live, but was able to catch a bit of the excitement by watching several of the presentations that were streamed live on the internet. So here goes with some general observations.

Cloud computing was a huge topic in the sessions that I saw online. This included using the cloud for backups, synchronization, collaboration and storage of family trees. I’ve always been a little distrustful of “the cloud,” but I was convinced to take a few more steps in that direction, or at least check it out in more detail. As an example, I know that a lot of people use Dropbox for their genealogy data, but I’ve been hesitant. Hearing all the conference talk, however, prompted me to do a Google search, which turned up a product called SecretSync that encrypts files before uploading them to Dropbox. This gets around some of the concerns people are expressing about the Dropbox privacy policy. It probably isn’t necessary to run every file through SecretSync before adding it to Dropbox, but I will probably do this for any information I consider personal or sensitive. On the other hand, I didn’t get a warm fuzzy feeling about Geni. I still plan to keep my primary genealogy database on my PC and upload a subset to the various tree sites.

In viewing the presentations, I also realized that I’m under-utilizing some important resources, especially maps. LegacyFamilyTree has built-in mapping based on Bing, but I have been unable to get it to work on my relatively new Windows 7 computer. The LegacyFamilyTree website says that their mapping requires IE7. I don’t use IE, but I have version 8 installed on my computer, and I am reluctant to go back to version 7. After seeing some of the RootsTech presentations, I’m going to look into using Google Earth tours and possibly some basic mapping with Google Maps. It won’t be integrated with Legacy, but I guess you can’t have everything. :(
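Google Earth reads placemarks from KML files, so one low-tech route is to generate a KML file of ancestral places and open it in Google Earth. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `(name, description, lat, lon)` input format and the function names are my own assumptions. It only creates points, not a tour, but a tour would fly between points like these.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def tag(name: str) -> str:
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def build_kml(places) -> str:
    """Build a KML document with one Placemark per (name, description, lat, lon) tuple."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for name, description, lat, lon in places:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        ET.SubElement(placemark, tag("description")).text = description
        point = ET.SubElement(placemark, tag("Point"))
        # KML lists longitude first, then latitude.
        ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

For example, `build_kml([("Philadelphia, PA", "example placemark", 39.9526, -75.1652)])` returns a string that can be saved as a `.kml` file and opened in Google Earth.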

While I enjoyed each and every presentation that I saw, the topic that got me most excited was the Google presentation segment on Historical-data.org. In a nutshell, this is a way to add semantic information to a web page so that search engines can better assess its relevance to a “genealogy search.” I even went so far as to start updating one of my obituary web pages by defining my ancestor Augustus Bechtel as a HistoricalPerson. I did this after the Historical-data.org schema definitions were touched upon in the Day 1 keynote address. I wasn’t sure how to define the HistoricalDates and felt vindicated when watching the Google presentation on Day 2, when the speaker said even the large companies they were working with struggled with this. They (Google, et al.) are promising to add examples to the Historical-data.org blog, and you can just bet that I am now subscribed and waiting for that post! I even put in a product enhancement suggestion for LegacyFamilyTree to add this to the web pages that Legacy generates. (Crossing fingers that they at least consider it.)
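The exact Historical-data.org type and property names aren't reproduced here. As a stand-in, this sketch uses schema.org's `Person` type, whose `name`, `birthDate`, and `deathDate` properties do exist, to show what "adding semantic information to a web page" looks like as HTML microdata. It is an illustration only, not the markup on my obituary page and not the Historical-data.org schema; the function name is hypothetical.

```python
from html import escape


def person_microdata(name, birth_date=None, death_date=None):
    """Return an HTML fragment marking up one person with schema.org microdata.

    Dates, if given, are expected as ISO 8601 strings (YYYY-MM-DD).
    """
    lines = [
        '<div itemscope itemtype="https://schema.org/Person">',
        f'  <span itemprop="name">{escape(name)}</span>',
    ]
    for label, prop, value in (("born", "birthDate", birth_date),
                               ("died", "deathDate", death_date)):
        if value:
            safe = escape(value)
            lines.append(f'  {label} <time itemprop="{prop}" datetime="{safe}">{safe}</time>')
    lines.append("</div>")
    return "\n".join(lines)
```

`person_microdata("Augustus Bechtel")` produces a `<div>` marked as a `Person` with only the name. Fuzzy dates such as "about 1850" or "before 1900" are exactly what this simple form cannot express, which is the same problem the speaker described.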

That’s about it for now. As I try out some of the software and concepts, I may post follow-ups!

My Struggle with Legacy Family Tree Sourcing – Part 2

In my previous post (link), I started the discussion of sources and citations and lumping and splitting as related to creating sources in Legacy Family Tree software. Part 2 will talk a little about the database structure and the trade-offs of lumping and splitting.

I made a very simplistic, high-level diagram of how things work in Legacy. It shows a Source Table with 5 sources. Next is the Citation Table, which shows 5 citations, all pointing back to Source 1. Notice, however, that the first three of these citation records contain identical data. This identical data is stored three times because each record points to a different event. The 4th and 5th Citation Data records are also identical.

Diagram 1 - Legacy Source/Citation Database Overview

This data model stores data redundantly, which is very bad for maintainability. To maintain data integrity, if any piece of the citation data needs to be changed, it is necessary to find all copies and change each of them. Legacy shifts this burden of maintaining data integrity to the user, supplying only a Search and Replace tool. But there are several cases where Search and Replace cannot make the desired change easily and some cases where it cannot make the desired change at all. It then falls on the user to edit each copy of the citation data manually.
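To make the redundancy concrete, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are hypothetical, shaped after Diagram 1 rather than taken from Legacy's actual database: five citation rows hold only two distinct pieces of citation data, so correcting one of them means updating three rows.

```python
import sqlite3


def build_redundant_model():
    """Hypothetical tables shaped like Diagram 1 (not Legacy's real schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT);
        -- One row per (citation data, event) pair: the detail text is copied per event.
        CREATE TABLE citation (
            citation_id INTEGER PRIMARY KEY,
            source_id   INTEGER REFERENCES source(source_id),
            event_id    INTEGER,
            detail      TEXT
        );
    """)
    db.execute("INSERT INTO source VALUES (1, 'Source 1')")
    rows = [(1, 101, "page 12"), (1, 102, "page 12"), (1, 103, "page 12"),
            (1, 104, "page 40"), (1, 105, "page 40")]
    db.executemany(
        "INSERT INTO citation (source_id, event_id, detail) VALUES (?, ?, ?)", rows)
    return db


def correct_detail(db, old, new):
    """Change every copy of a citation's detail; returns how many rows had to change."""
    cur = db.execute("UPDATE citation SET detail = ? WHERE detail = ?", (new, old))
    return cur.rowcount
```

Here the exact-match `UPDATE` plays the role of Search and Replace: `correct_detail(db, "page 12", "page 12, line 7")` has to touch three rows, and any copy whose wording had drifted would not match and would have to be fixed by hand.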

Diagram 2 illustrates the exact same relationships, but the citation data/event links are pulled out into a separate table. This allows for each unique citation record to be stored once, regardless of the number of events to which it is linked.

Diagram 2 - Alternate Data Model
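The same example reshaped along the lines of Diagram 2 might look like this (again a hypothetical sketch with made-up table names, not any program's real schema). Each unique citation is stored once, and a separate link table records which events it supports, so a correction is a single-row update that every linked event sees.

```python
import sqlite3


def build_linked_model():
    """Hypothetical tables shaped like Diagram 2: citation data stored once, links separate."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE citation (
            citation_id INTEGER PRIMARY KEY,
            source_id   INTEGER REFERENCES source(source_id),
            detail      TEXT      -- stored once per unique citation
        );
        CREATE TABLE citation_event (
            citation_id INTEGER REFERENCES citation(citation_id),
            event_id    INTEGER,
            PRIMARY KEY (citation_id, event_id)
        );
    """)
    db.execute("INSERT INTO source VALUES (1, 'Source 1')")
    db.executemany("INSERT INTO citation VALUES (?, 1, ?)",
                   [(1, "page 12"), (2, "page 40")])
    links = [(1, 101), (1, 102), (1, 103), (2, 104), (2, 105)]
    db.executemany("INSERT INTO citation_event VALUES (?, ?)", links)
    return db


def update_citation(db, citation_id, new_detail):
    """Correct one citation; every linked event sees the change."""
    cur = db.execute("UPDATE citation SET detail = ? WHERE citation_id = ?",
                     (new_detail, citation_id))
    return cur.rowcount


def events_for(db, citation_id):
    """List the events linked to one citation."""
    rows = db.execute(
        "SELECT event_id FROM citation_event WHERE citation_id = ? ORDER BY event_id",
        (citation_id,))
    return [event_id for (event_id,) in rows]
```

With this layout, `update_citation(db, 1, "page 12, line 7")` changes one row, and `events_for(db, 1)` still lists the same three events.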

Had Legacy implemented a model similar to Diagram 2, I would be a very happy camper. But they didn’t. So now it all comes down to trade-offs. On one hand, you can become a splitter. This entails creating a multitude of very specific, repetitious sources and minimizing the citation data content. In many, but not all, cases, this gets rid of the data integrity problem described above. On the downside, since you have so many sources, you need to be very careful and vigilant when you create them so that all the repetitious data is consistent from one source to the next. (That is, if you care about consistent wording and formatting.) Admittedly, the ability to create a new source by copying an existing one helps with this.

It is also extremely important to come up with an organization scheme that is built into the source name. All Legacy does is present you with an alphabetized list of sources, with no filtering or classification. Granted, this is also a problem for lumpers, but the problem is magnified for splitters because they have so many more sources! Over the past couple of years, I’ve tweaked my naming system a couple of times to force the sort order of the source list. I don’t even want to imagine how much more laborious and time-consuming this would have been were I an extreme splitter! And while I don’t know of a hard limit on sources, if you have a large database of people, you may find that extreme splitting creates too many sources to be practical.

On the flip side, lumpers are more likely to have a manageable number of sources. This makes it somewhat easier to tweak them. I have actually tweaked my naming convention to force sort order (as mentioned above), as well as the wording and content of source fields, for better consistency in how footnotes and endnotes appear in reports. I also feel that some degree of lumping is more in keeping with database and software design principles. The downside of lumping is, of course, the citation maintainability problem that is the focus of this article.

So where does all this leave me? Well, I’ve already characterized myself as a lumper, although I am sure there are some who are more extreme. Over the years I’ve done a lot of experimenting and tweaking, and I like the system I have for deciding what to include in the source table vs. what to include in the citation. What works best for me is to define a source for each dataset or data collection. So each database on Ancestry gets its own source. I also tend to create separate sources for each of the datasets stored on the USGenWeb county pages. On the splitter/lumper spectrum, I would say that I’m a moderate lumper. In terms of Source Writer, I have recently decided to primarily use just two templates for online data: online database, or online database with images. (But that’s starting to get into a whole other discussion: the multitude of Source Writer templates!)

My struggle with Legacy sourcing is nothing new. It’s an issue I’ve had to deal with from the time I first started using it back in early 2004. Based on what people say on the LUG email list, the database implementation has pushed some people toward extreme splitting. That never felt right to me, and by now I’m just too entrenched in the way I have been doing things to even consider changing. Why then did I bother writing this post? After creating the SAR source and citation and linking it to about 30 events (so 30 copies of the citation) it occurred to me I should have included something additional in the citation. I tried to fix it with search and replace and guess what — it didn’t work!!!! It said it found and made 30 changes, but when I looked at the data, it had NOT changed. So mostly out of frustration I wrote this post. I still have to go back and try to fix things up, but at least I had a chance to vent.

Maybe some day Legacy will fix this design. Maybe some day another company will design and build genealogy software that does not have this issue. Maybe some company already has; sometimes it’s hard to tell from the marketing-oriented information on their websites. Maybe somebody will see this post and let me know!!