You cannot separate “history” from how history is done. This makes intuitive sense to anyone who thinks about it, but is easy to forget when you think of “history” as something that’s already known, as opposed to a field where there is lots of new research, discoveries, and re-interpretations of all manner of things that everyone thought they already knew. The methodologies used to do historical research have a direct impact on what we are able to draw from that research. This has me thinking a lot about how “doing history” in the future is going to be dramatically different than it is today, with big implications for whose stories are preserved and understood.

I recently finished an episode of a (highly recommended) history podcast that I subscribe to: “Tides of History” by Patrick Wyman. Patrick is both a scholar of Roman history and an MMA commentator, which arguably puts him in the running for most interesting person around. In that episode, Patrick mentions how, as a part of his doctoral research, he read every single surviving letter from the late Roman period. That’s a lot of letters – in the low thousands – but it’s (apparently) doable. I find that remarkable.

Compare that with the situation of medieval or early modern European historians. Barely a century after the Gutenberg printing innovations (I hesitate to say “invention”), there were so many printed materials that it would be impossible even to read all of those that survive today. (Thousands of French liturgical sermons alone!) This is part of what drives specialization in historical research – it’s simply possible to understand more detail about more specific topics in eras from which more primary sources survive. (The historical record’s capacity for specialization also vastly outstrips the ability of academia to provide jobs for the people who would do it, meaning that some topics get researched, while others don’t. C’est la vie.)

As the “Big Data” people constantly remind us, we’re creating more “data” than ever today. More photos alone than all of humanity created up until 10 years ago, far more music and written material, all that sort of stuff. But the computer revolution has also effectively ended the lifecycle that most of our everyday documents of life used to have: deeds, transactions, birth/marriage/death certificates, bills, etc. Like everything else, these are mostly all recorded electronically now and preserved forever, because storage is essentially free. Of course, formats and media will change, but it’s reasonable to assume that 500 or 1,000 or 5,000 years from now, smart historians will be able to figure out how to read the digital formats of the early 21st century. People disagree about just how much electronic storage is likely to degrade over very long time periods like these, but I tend to suspect it’ll hold up extremely well (especially compared to paper or clay tablets).

In other words, the historians of 2518 or 3018 or whatever are going to have so much primary source material with which to understand our era that it actually becomes a problem. At their disposal will likely be the entire digital lives of billions of people – billions of hours of video, trillions of pages of text in hundreds of languages, billions of complete social media histories, unfathomable amounts of news articles, commentary and more. (And that’s not even considering the metadata.) Even with hyperspecific topical specialization, how would anyone approach that task? Where do you even start?

The answer is that they won’t – not the humans, anyway. The only way to make any sort of sense out of that inconceivable amount of data will be through very powerful data mining and translation technology, probably paired with a form of artificial intelligence that is still far beyond us. It could be that future researchers will be required to form hypotheses first, which they then must test for explanatory power against the “historical record” that only AIs can actually query in its entirety.

This would be a remarkable flip of the traditional model of historical inquiry, in which researchers use primary sources to then form explanatory theses. When there are just far too many primary sources for even a group of humans to ever examine, even with a representative sample, data analysis tools like these will be the only way of grasping what the world was like.

Another big challenge that those future historians will encounter is whose record will survive.

This is already a well-understood issue in historical research. The Roman plebs urbana, for example, are woefully underrepresented in the primary sources that survive to this day, because most of them left little or no document trail, just like most people (who were mostly poor) throughout history. The same dynamic exists today, of course, except that even the poor today often leave a significant digital “data wake” behind them that is preserved indefinitely. The overwhelming dominance of free consumer platforms like Google and Facebook (and Tencent and Weibo) makes this inevitable. While these companies may look very different a century from now, the constellations of data they possess seem very unlikely to ever die.

In much of history, a good way to ensure that a record of your existence would survive was to be a member of the clergy or some sort of official (being a Lord or King or Consul helped). Today, that same permanence has effectively been extended to every person with a Facebook or Google account – Bill Gates has the same type of Google profile that a poor farmer in Oaxaca does. For the vast majority of users who do not delete their free email (let alone social media) accounts, it’s perfectly reasonable to assume that their information will live forever, and one day wind up in a database used for research.

I have long been a proponent of the “own your own platform” school, which is why I blog here, on my own domain and server, instead of on Medium or something. I still think that’s the best approach; but one drawback is that, not belonging to a corporate mega-platform, the content here is less likely to survive the test of time. All of these words live on an AWS server, which will promptly be erased and re-written once I stop paying for it. (Though I’ll probably back it up physically one day. We’ll see.)

The moment we stopped storing most of our society’s information in hardcopy, the question of what our history will look like became one of business models (e.g., who pays for it?) rather than how long clay or paper or tape can be preserved. It’s tantalizing to imagine what future historians might get wrong about us one day – and, along the same lines, what we get wrong about the ancients now.


