I’ve recently been doing a bit of research on my early
American ancestors, immigrants who arrived in various New England locations in
the 1620s through the 1700s. This was a time when
North America was just being opened up for settlement along the east coast. Life was hard, and people died young. Often, people would marry two or three times
as spouses died off, and families tended to be large, often with many infant
deaths.
Some early settlements, like Hartford, Connecticut and
Springfield, Massachusetts, left us a decent amount of records of when people
were born, married, and died. These records
are available on Ancestry.com and other places online. But here we run into a problem, one that has
been causing me some irritation. When
importing many of these records to my ancestors, I keep running into location
fields populated with a location, followed by “United States”. This is obviously incorrect data, as prior to
4 July 1776, there was no United States of America! How am I supposed to trust records that
include such blatantly incorrect data?
It’s 2015. We’re
living in a time when computers are everywhere, and systems are being built
with a lot of built-in intelligence.
Why, then, haven’t services like Ancestry.com applied some of that
intelligence to fighting this sort of database corruption? For it is corruption to include demonstrably
false data in a database. Why have they
not implemented controls that examine records being entered for such
anachronisms as countries that did not yet exist? What’s more, we know when most counties in
various states were created, and we could screen for that as well! Not only would this catch bad data, it
could also flag the problem for the user, so that they could do further research
to get the correct data, instead of relying on erroneous entries that have been
passed about for decades.
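As a rough sketch of what such a screen might look like, here is a small Python check that compares each part of a place name against the date that jurisdiction came into existence. The table name JURISDICTION_START and the function find_anachronisms are my own inventions for illustration, not anything Ancestry.com actually uses, and the table holds only the one date mentioned above; a real one would also need county creation dates.

```python
from datetime import date

# Hypothetical lookup table: the earliest date each jurisdiction name is valid.
# Only the United States entry comes from the post; county dates would go here too.
JURISDICTION_START = {
    "United States": date(1776, 7, 4),
    "United States of America": date(1776, 7, 4),
    "USA": date(1776, 7, 4),
}


def find_anachronisms(place: str, event_date: date) -> list[str]:
    """Return the parts of a comma-separated place that did not exist on event_date."""
    parts = [part.strip() for part in place.split(",")]
    return [
        part
        for part in parts
        if part in JURISDICTION_START and event_date < JURISDICTION_START[part]
    ]


if __name__ == "__main__":
    # A birth recorded in Hartford in 1650 should be flagged.
    problems = find_anachronisms("Hartford, Connecticut, United States", date(1650, 3, 12))
    print(problems)
```

A record that comes back with a non-empty list could be shown to the user with a warning before it is ever saved to their tree.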
As an IT person, I have a little experience with programming,
and I know this problem is not trivial, but it is also not insurmountable. Data could be examined and blatantly
incorrect entries repaired, or perhaps the incorrect portions simply
removed. This would not fix existing
databases of users of those services, but it would keep new users from filling
their databases with bad data! Perhaps
Ancestry or MyHeritage could even offer a database-cleaning service to examine
users’ data and suggest items to be cleaned.
After all, most major genealogy software now offers some error-checking
capability; this could be implemented for users who are just using the websites
as well. Heck, it could even be set up
as an in-app purchase to help cover the costs of implementing it!
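To give an idea of what a cleaning suggestion might involve, here is a minimal Python sketch that removes the anachronistic country from a place name and explains why. It assumes places are stored as comma-separated text; the names US_FOUNDING, US_NAMES, and suggest_cleanup are hypothetical, and the function only suggests a change, leaving the decision to the user.

```python
from datetime import date

US_FOUNDING = date(1776, 7, 4)  # the date cited in this post
# Assumed spellings of the country name that might appear in a place field.
US_NAMES = {"United States", "United States of America", "USA"}


def suggest_cleanup(place, event_date):
    """Return (suggested_place, note). The note is None when nothing is flagged."""
    if event_date >= US_FOUNDING:
        return place, None  # the country existed, so there is nothing to clean

    parts = [part.strip() for part in place.split(",") if part.strip()]
    kept = [part for part in parts if part not in US_NAMES]
    if len(kept) == len(parts):
        return place, None  # no anachronistic country name found

    suggestion = ", ".join(kept)
    note = (
        f"'{place}' names a country that did not exist on "
        f"{event_date.isoformat()}; suggested: '{suggestion}'"
    )
    return suggestion, note


if __name__ == "__main__":
    suggestion, note = suggest_cleanup(
        "Springfield, Massachusetts, United States", date(1650, 5, 1)
    )
    print(suggestion)
    print(note)
```

Returning a note alongside the suggestion means the service never changes a record silently, which seems the safer choice when users may have reasons of their own for how they record a place.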
We’re using all of our computing power to collect and store
reams of data. Isn’t it time we used
some of that power to make sure the data’s correct?
This and all other articles on this blog are © copyright 2015 by Daniel G. Dillman