Friday, February 6, 2015

He was born in the United States WHEN?!

I’ve recently been doing a bit of research on my early American ancestors, immigrants who arrived in various New England locations between the 1620s and the 1700s.  This was a time when North America was just being opened up for settlement along the east coast.  Life was hard, and people died young.  People often married two or three times as spouses died off, and families tended to be large, often with many infant deaths.

Some early settlements, like Hartford, Connecticut and Springfield, Massachusetts, left us a decent number of records of when people were born, married, and died.  These records are available on genealogy websites and other places online.  But here we run into a problem, one that has been causing me some irritation.  When attaching many of these records to my ancestors, I keep running into location fields populated with a location, followed by “United States”.  This is obviously incorrect data, as prior to 4 July 1776, there was no United States of America!  How am I supposed to trust records that include such blatantly incorrect data?

It’s 2015.  We’re living in a time when computers are everywhere, and systems are being built with a lot of built-in intelligence.  Why, then, haven’t the genealogy services used some of that intelligence to fight this sort of database corruption?  For it is corruption to include demonstrably false data in a database.  Why have they not implemented controls that examine records being entered for anachronisms such as countries that did not yet exist on the date in question?  What’s more, we know when most counties in various states were created, and we could screen for that as well!  Not only would this catch bad data, it could also alert the user to the problem, so that they could do further research to get the correct data, instead of relying on erroneous entries that have been passed about for decades.
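
To give a rough idea of the kind of check I mean, here is a minimal sketch in Python.  The lookup table and the function name are made up for illustration.  A real service would need a far larger table of countries, states, and counties with their creation dates, and I am only assuming that place names are stored as comma-separated text.

```python
from datetime import date

# Hypothetical lookup table: the date each jurisdiction name came into being.
# Only one entry is shown; counties and states could be added the same way.
JURISDICTION_FOUNDED = {
    "United States": date(1776, 7, 4),
}

def find_anachronisms(place, event_date):
    """Return the parts of a place string that did not yet exist on event_date."""
    parts = [p.strip() for p in place.split(",")]
    return [
        p for p in parts
        if p in JURISDICTION_FOUNDED and event_date < JURISDICTION_FOUNDED[p]
    ]

# A 1650 birth in Hartford should be flagged; an 1800 birth should not.
print(find_anachronisms("Hartford, Connecticut, United States", date(1650, 3, 1)))
print(find_anachronisms("Hartford, Connecticut, United States", date(1800, 3, 1)))
```

Any record that comes back with a non-empty list could be shown to the user with a warning before it is saved.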

As an IT person, I have a little experience with programming, and I know this problem is not trivial, but it is also not insurmountable.  Data could be examined and blatantly incorrect entries repaired, or perhaps the incorrect portions could simply be removed.  This would not fix the existing databases of users of those services, but it would keep new users from filling their databases with bad data!  Perhaps Ancestry or MyHeritage could even offer a database cleaning service, to examine users’ data and suggest items to be cleaned.  After all, most major genealogy software now offers some error checking capability; this could be implemented for users who are just using the websites as well.  Heck, it could even be set up as an in-app purchase to help cover the costs of implementing it!
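
As a rough sketch of the “suggest items to be cleaned” idea, something like the following Python could propose a corrected place name and leave the decision to the user.  The function name and the list of spellings are my own assumptions, not anything Ancestry or MyHeritage actually provides, and it only handles the one anachronism discussed here.

```python
from datetime import date

US_INDEPENDENCE = date(1776, 7, 4)

# Assumed spellings of the country name that might appear in a place field.
ANACHRONISTIC_NAMES = {"United States", "United States of America", "USA"}

def suggest_cleaned_place(place, event_date):
    """Return a suggested place with the anachronistic country removed,
    or None if the record needs no change."""
    if event_date >= US_INDEPENDENCE:
        return None
    parts = [p.strip() for p in place.split(",")]
    kept = [p for p in parts if p not in ANACHRONISTIC_NAMES]
    if len(kept) == len(parts):
        return None  # nothing anachronistic was found
    return ", ".join(kept)

# The suggestion is only offered; the original record is never changed here.
print(suggest_cleaned_place("Springfield, Massachusetts, United States", date(1680, 5, 12)))
```

Returning a suggestion rather than silently rewriting the record keeps the researcher in control, which matters when the “wrong” part of a place name might still be a clue worth following up.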

We’re using all of our computing power to collect and store reams of data.  Isn’t it time we used some of that power to make sure the data’s correct?

This and all other articles on this blog are © copyright 2015 by Daniel G. Dillman