Big data is usually discussed in terms of its applicability to business or scientific research, but it can be valuable for much more. Consider, for instance, the release of the 1940 census data by the U.S. National Archives earlier this week. Hey, they’ve even managed to find Jimmy Hoffa!

Well, they’ve been able to find Hoffa’s record from the 1940 census, anyway. Hoffa was “enumerated” (the census takers were called “enumerators”) on April 6, 1940.

While Hoffa’s a pretty well-researched subject, I was curious about some less-famous characters. Specifically, I wanted to see if I could find any information about my family via the 1940 census.

At the moment, you can’t search the indexes by name. You can only browse the images of census sheets by “enumeration district” and then scan those images for names. If you have a reasonably good idea where someone lived, you can find the enumeration district pretty easily.

At least on my father’s side, this was pretty trivial because I knew what city his parents lived in, and it’s a small city. If all you know is the state, for example, it’s going to be much trickier until the data is indexed by name. (If you’d like to volunteer to help, check out the volunteer page and watch the video on using the indexing program.)

Using the census data, I’ve been able to suss out a few family details that I wasn’t privy to before and that had been lost in the mists of time. For example, I now know what year my great-grandfather was born. I also now know that three of my great-uncles were living at home in 1940, along with their occupations and levels of schooling.

To be sure, it’s not a lot of information. But it’s a few pieces of information I didn’t have before. Armed with that, I can do some further research, too.

This is an amazing undertaking, and literally the first of its kind. The National Archives have made available 3.8 million “pages” captured from microfilm of the census sheets that seem to average about 5MB each. This means that the entire collection weighs in at more than 18TB. Not the biggest of big data, perhaps, but nothing to sneeze at.
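To show where the “more than 18TB” figure comes from, here is a rough back-of-the-envelope calculation. It uses only the approximate numbers above (3.8 million images at about 5MB each, which is an average, not an exact size). The total depends on whether “MB” and “TB” are read as decimal or binary units.

```python
# Back-of-the-envelope estimate of the 1940 census image collection size.
# Assumption: ~3.8 million page images averaging ~5 MB each (approximate figures).
pages = 3_800_000
mb_per_page = 5

total_mb = pages * mb_per_page            # 19,000,000 MB in total
total_tb_decimal = total_mb / 1_000_000   # decimal units: 1 TB = 10^6 MB
total_tib_binary = total_mb / 1024**2     # binary units: 1 TiB = 1024^2 MiB

print(f"{total_tb_decimal:.1f} TB (decimal)")
print(f"{total_tib_binary:.1f} TiB (binary)")
```

Read in binary units, the estimate comes to roughly 18.1 TiB. In decimal units, it is about 19 TB. Either way, the collection is comfortably above 18TB.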

This is the first time these images have been made freely available online. Earlier census data has been released, but you have to go through commercial sites to get to the images.

Big Data and History

Digitizing historical information and putting it online is not new, but it’s still too rare. We need more efforts like this one, which empower people to piece together their family histories or pursue historical research.

I don’t know about you, but I’m eagerly awaiting the release of the 1950 census.