My Struggle with Legacy Family Tree Sourcing – Part 2

In my previous post (link), I started the discussion of sources and citations, and of lumping and splitting, as related to creating sources in Legacy Family Tree software. Part 2 will talk a little about the database structure and the trade-offs of lumping and splitting.

I made a very simplistic, high-level diagram of how things work in Legacy. It shows a source table with 5 sources. Next is the Citation Table, which shows 5 citations all pointing back to source 1. Notice, however, that the first three of these citation records contain identical data. This identical data is stored three times because each one points to a different event. The 4th and 5th Citation Data records are also identical.

Diagram 1 - Legacy Source/Citation Database Overview

This data model involves redundant storage of data and is very bad in terms of data maintainability. Now, in order to maintain data integrity, if any piece of the citation data needs to be changed it is necessary to find all copies and change each of them. Legacy shifts this onus of maintaining data integrity to the user, supplying only a Search and Replace tool. But there are several cases where Search and Replace cannot make the desired change easily and some cases where it cannot make the desired change at all. It then falls on the user to edit each copy of the citation data manually.
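To make the maintenance problem concrete, here is a minimal Python/sqlite3 sketch of the layout in Diagram 1. I do not know Legacy's actual schema, so the table names, column names, event numbers and the "detail A"/"detail B" text are all assumptions made up for illustration. The point is only that the citation text lives on every row, so correcting one copy leaves the others stale.

```python
import sqlite3


def build_duplicated_model():
    """Build an in-memory database where citation text is copied per event.

    Hypothetical schema for illustration only; not Legacy's real tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE citation (
            id        INTEGER PRIMARY KEY,
            source_id INTEGER REFERENCES source(id),
            event_id  INTEGER,  -- one row per linked event
            detail    TEXT      -- citation text, repeated on every row
        );
        """
    )
    conn.execute("INSERT INTO source (id, name) VALUES (1, 'Source 1')")
    # Five citation rows, all pointing at source 1: three copies of
    # "detail A" and two copies of "detail B", as in the diagram.
    rows = [
        (1, 101, "detail A"),
        (1, 102, "detail A"),
        (1, 103, "detail A"),
        (1, 104, "detail B"),
        (1, 105, "detail B"),
    ]
    conn.executemany(
        "INSERT INTO citation (source_id, event_id, detail) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def update_copy(conn, event_id, new_detail):
    """Edit the citation text attached to a single event (one copy only)."""
    conn.execute(
        "UPDATE citation SET detail = ? WHERE event_id = ?",
        (new_detail, event_id),
    )


def distinct_details(conn):
    """Return the distinct citation texts currently stored, sorted."""
    cursor = conn.execute("SELECT DISTINCT detail FROM citation ORDER BY detail")
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = build_duplicated_model()
    update_copy(conn, 101, "detail A (corrected)")
    # The copies for events 102 and 103 still carry the old text.
    print(distinct_details(conn))
```

After the single edit, what was meant to be one citation exists in two versions. That is exactly the situation where the user has to hunt down every remaining copy by hand.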

Diagram 2 illustrates the exact same relationships, but the citation data/event links are pulled out into a separate table. This allows for each unique citation record to be stored once, regardless of the number of events to which it is linked.

Diagram 2 - Alternate Data Model
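As a rough sketch of the alternate model in Diagram 2, the Python/sqlite3 snippet below stores each unique citation once and moves the citation/event links into their own table. Again, every table name, column name and data value here is an assumption of mine for illustration; it is not how Legacy (or any other product) is actually built.

```python
import sqlite3


def build_linked_model():
    """Build an in-memory database with a separate citation/event link table.

    Hypothetical schema for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE citation (
            id        INTEGER PRIMARY KEY,
            source_id INTEGER REFERENCES source(id),
            detail    TEXT  -- citation text, stored exactly once
        );
        CREATE TABLE citation_event (
            citation_id INTEGER REFERENCES citation(id),
            event_id    INTEGER,
            PRIMARY KEY (citation_id, event_id)
        );
        """
    )
    conn.execute("INSERT INTO source (id, name) VALUES (1, 'Source 1')")
    # Two unique citations instead of five duplicated rows.
    conn.executemany(
        "INSERT INTO citation (id, source_id, detail) VALUES (?, ?, ?)",
        [(1, 1, "detail A"), (2, 1, "detail B")],
    )
    # The same five event links as before, now holding no citation text.
    conn.executemany(
        "INSERT INTO citation_event (citation_id, event_id) VALUES (?, ?)",
        [(1, 101), (1, 102), (1, 103), (2, 104), (2, 105)],
    )
    conn.commit()
    return conn


def update_citation(conn, citation_id, new_detail):
    """Edit a citation once; every linked event sees the new text."""
    conn.execute(
        "UPDATE citation SET detail = ? WHERE id = ?",
        (new_detail, citation_id),
    )


def details_by_event(conn):
    """Return (event_id, citation text) pairs, ordered by event."""
    query = """
        SELECT ce.event_id, c.detail
        FROM citation_event AS ce
        JOIN citation AS c ON c.id = ce.citation_id
        ORDER BY ce.event_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_linked_model()
    update_citation(conn, 1, "detail A (corrected)")
    for event_id, detail in details_by_event(conn):
        print(event_id, detail)
```

With this layout a single update changes what all three linked events report, so there is no second copy that can drift out of sync.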

Had Legacy implemented a model similar to diagram 2, I would be a very happy camper. But they didn’t. So now it all comes down to trade-offs. On one hand, you can become a splitter. This entails creating a multitude of very specific, repetitious sources and minimizing the citation data content. In many, but not all, cases it gets rid of the data integrity problem described above. On the downside, since you have so many sources, you need to be very careful and vigilant when you create them so that all the repetitious data is consistent from one source to the next. (That is, if you care about consistent wording and formatting.) Admittedly, the ability to create a new source by copying an existing one helps with this.

It is also extremely important to come up with an organization scheme that is built into the source name. All Legacy does is present you with an alphabetized list of sources – no filtering or classification on the list. Granted, this is also a problem for lumpers, but the problem is magnified for splitters because they have so many more sources! Over the past couple of years, I’ve tweaked my naming system a couple of times to force the sort order of the source list. I don’t even want to imagine how much more laborious and time-consuming this would have been were I an extreme splitter! And while I don’t know of a hard limit on sources, if you have a large database of people you may find that extreme splitting creates just too many sources to be practical.

On the flip side, lumpers are more likely to have a manageable number of sources. This has the advantage of making it somewhat easier to tweak them. I have actually tweaked my naming convention to force sort order (as mentioned above) as well as tweaked wording and content of source fields for better consistency in how footnotes and endnotes appear in reports. I also feel that some degree of lumping is more in keeping with database and software design principles. The downside of lumping is, of course, the citation maintainability problem which is the focus of this article.

So where does all this leave me? Well, I’ve already characterized myself as a lumper – although I am sure there are some who are more extreme. Over the years I’ve done a lot of experimenting and tweaking and I like the system I have for deciding what to include in the source table vs what to include in the citation. What works best for me is to define a source for each dataset or data collection. So each database in Ancestry gets its own source. I also tend to create separate sources for each of the datasets stored on the USGenWeb county pages. On the splitter/lumper spectrum, I would say that I’m a moderate lumper. In terms of Source Writer, I have recently decided to primarily use just 2 templates for online data – online database or online database with images. (But that’s starting to get into a whole other discussion: the multitude of Source Writer templates!)

My struggle with Legacy sourcing is nothing new. It’s an issue I’ve had to deal with from the time I first started using it back in early 2004. Based on what people say on the LUG email list, the database implementation has pushed some people toward extreme splitting. That never felt right to me, and by now I’m just too entrenched in the way I have been doing things to even consider changing. Why then did I bother writing this post? After creating the SAR source and citation and linking it to about 30 events (so 30 copies of the citation) it occurred to me I should have included something additional in the citation. I tried to fix it with search and replace and guess what — it didn’t work!!!! It said it found and made 30 changes, but when I looked at the data, it had NOT changed. So mostly out of frustration I wrote this post. I still have to go back and try to fix things up, but at least I had a chance to vent.

Maybe some day Legacy will fix this design. Maybe some day some other company will design and build genealogy software that does not have this issue. Maybe some company already has; sometimes it’s hard to tell from the marketing-oriented data on their websites. Maybe somebody will see this post and let me know!!

10 responses to “My Struggle with Legacy Family Tree Sourcing – Part 2”

  1. Pingback: My Struggle with Legacy Family Tree Sourcing – part 1 | Janis' Genealogy Blog

  2. Vicki Pfeffer

    Have you considered emailing your analysis to Legacy?

  3. I guess I’m not following what you are describing because you can change something in a source in Legacy and it will make that change in every place where you have used that source. The only place where it would not make changes to all instances where you used the source is if you made a change in the source details. But, usually, the source details are specific enough that they are only used a few times.

  4. Hi Debbie. You are absolutely correct that the source is stored only once and that if you change it whatever field within it that you updated is now seen universally. In my case I made a change in what you are calling the source-detail (what I have called the citation). For my example, the source is the Ancestry SAR Application Dataset. The specific application of John Doe is a citation (or source detail). So it is stored multiple times – in this case 30 because it listed birth/death/marriage dates for several generations and I attached it to all those “events.” If I find another applicable SAR app, it will be another unique citation (source detail). Regardless of how many SAR apps I find, I will have 1 SAR Source. Each application will be a unique citation (source detail) but each will be stored many times for each event that it supports. It all goes back to the lumper/splitter concept that I was trying to describe.

    Hi Vicki. You may be right that I should contact Legacy about this. But a situation similar to this was recently discussed on the LUG email list where someone wanted to attach a scan to the source detail/citation. Actually, similar situations come up on the LUG fairly regularly. So Legacy (or more properly Millennia Corp) knows. The Search and Replace is usually brought up as the work-around (or solution), although that doesn’t seem to work at all in the case of adding the scanned image.

  5. Interesting analysis. I use and like Legacy but I agree that the sourcing has been problematic in that there was a big buildup with ESM Evidence Explained (great resource) as well as the ability to convert previous sources (which never came to fruition). Sometimes I think the programmers are not adequately supported (or perhaps they are not genealogists) by many of the programs out there and/or certain bells & whistles are more flash than substance.

    That said, I find Legacy the best and most intuitive from my limited testing. So, how does Legacy make it better? Have you sent your analysis to them? I know that I read the LUG but after being on the list for a while I dropped off – there needs to be a better method of testing, playing with and discussing than some of the LUG discussions. I would be interested in hearing if Legacy gets back with you and please follow up by letting us know your system and any tweaks. I am early enough in the process that I can make changes and would be interested in learning how you make the best use of Legacy. Thanks.

  6. I presume you are aware of the Source Clipboard, which makes changing your citations a whole lot easier? All you have to do is change the Source Detail once, then click in those 30 places where you’ve used that exact citation (although I have a hard time understanding how one SAR application would be linked to 30 events; you may be a lumper with sources but you sound like an extreme splitter with events).

    I have census sources by year and county, but I’d have only one Master Source for the SAR applications database. I do not find naming and finding my multiple sources at all difficult, but you are right that regardless of whether you lump or split (or are somewhere in between like I am by your definition) one must have a logical, planned method for naming sources. That is also true of other databases I’ve used.

  7. Hi Connie. Yes, I am aware of the source clipboard. It is actually the feature I used when assigning all of those 30 citations to begin with. And you are pretty much describing the “manual” process that I use when I want to make a change to a citation (aka source detail). That would be to bring up the assigned sources window for an individual, highlight the one I want to change, click on the edit detail button, make the change, save it, load it onto the source clipboard, go back to the assigned source window, delete the old version from that event, then add the new and delete the old to any other applicable events for that individual. Then navigate to the next affected individual. Go to that person’s assigned sources window, do the add/delete as applicable. And repeat the process for each affected individual.

    In the case of the SAR application, it has birth, marriage, and in some cases death information for all the generations that separate the applicant from the patriot. These are the “events” of which I am speaking. Mostly they refer to the birth, death or marriage fields/fact/event, but some I have put as Alt. Birth, Alt. Marriage or Alt. Death if the date on the app differs from what I already have. The app also had the birth dates for the applicant’s three children and the birth date of his spouse and their marriage date. So there’s no event splitting going on at all; the events/facts/data pertain to many individuals.

    The point I am trying to make is that it is easy to miss one or more of them when doing this changing process manually. What if the phone rings, someone knocks on the door, one of the kids needs something, etc … So I stop what I am doing and get back to it minutes (or hours, or a day) later. Now where was I? Did I finish the current person? Did I get all the events for him? Did I change the citations for his wife yet? My contention here is that this way of implementing (i.e. storing multiple copies of what should be identical data) dramatically increases the odds that at some point in time, citations/source detail that we want to be identical will somehow not be.

  8. I too share your frustration with how Legacy Family Tree (LFT) handles Sources/Citations/Events. My frustration reached a level such that I reviewed another genealogy program to see if there was a better way.

    The other program I reviewed was RootsMagic (RM).

    Before I discuss how RM handles SCE’s, there is one additional link to your Diagram 2 which should be added – and that is the link between Events and individuals. In my case, as I am sure it is for you, the relationship between an Event and the name database can be a one-to-many relationship. For example, a single census record (page) usually contains information for several members of the same family, and in fact a single census record (page) sometimes will contain information for other families/names which are in your genealogy database. Now if you are an extreme splitter, this census page becomes a Source, applied to many individuals. Although LFT’s method of applying this Source is very cumbersome (must be done for one individual at a time), a modification to this Source automatically applies to all name links. However, if you are a lumper (which I am) and you use Citations to define the census page number, district, etc., a modification to the Citation must be done for each individual reference. But I guess you said this.

    RM has a better way to handle Sources, Citations, and shared Sources/Citations. When an Event (called “Fact” in RM) is entered for an individual, you have the option to share the Event/Fact for a multiplicity of individuals. RM has a convenient method for choosing the individuals for sharing Facts (Events) by automatically selecting the family (by spouse), ancestors, or descendants of the individual for which the Fact is being recorded. There are also other automated selection options. But most importantly, if a change is made to the Source Citation for this Fact, the change is made for all individuals for which the Fact is shared. And, of course, a change to a Source is applied automatically to all individual references, no matter what the Citation. Unfortunately Citations are not standalone entities and as such cannot be changed independently of the Fact and the link to the source and individual. But the ability to link Sources/Citations/Facts en masse is very time saving.

    However, LFT is light years ahead of RM in several other ways, particularly in searching and navigating in the DB, applying timelines against individual chronologies, and reporting. Else, I would abandon LFT and switch to RM (I may do this anyway). To me it is apparent that the program DB design for RM is led by relational database programmers and LFT program DB design is largely influenced by non-programming genealogists. Also it seems to me that RM suffers from lack of capability simply due to program immaturity and perhaps a lack of programming resources. But that’s just a guess on my part.

    If you are considering switching from LFT to RM, be aware that Source and Citation data are transferred in the “free-form” format. All your Sources and Citations would have to be reentered into the RM database to retain their detailed attributes as they were in LFT.

  9. Hi Joe. Thanks for the info on RM. I actually looked at RM a couple of years ago – back when Legacy was on V6. I can’t remember the actual details anymore, but I know that back then RM was lacking some of the functionality that I was used to in Legacy and it wasn’t worth switching. From what you are saying about the searching and chronology, that still appears to be the case. (I really like those features!)

    You are also definitely right about the relationship of events/facts to people. Logically, that should also be a many-many relationship and in Legacy it is not implemented that way. I have often wondered why they chose to implement the database as they have. Admittedly my relational database experience is with Sybase and Oracle on Unix platforms, so maybe there were performance trade-offs with Access on a PC that drove them to the choices they made.

    Too bad there isn’t a product with all the functionality of Legacy combined with the database structure of RM!!

    • “Too bad there isn’t a product with all the functionality of Legacy combined with the database structure of RM!!”

      My sentiments exactly! So, let’s ask Legacy to improve their database structure. Access will handle it.
