On the Internet's memory

This may be a hot take, but I believe that the Internet should remember it all. Every social media post, every webpage, every article—no one should be authorized to remove it, not the author, not the platform, and definitely not the authorities.

This may seem like a radical stance; let me first ground and nuance it.

Things have gotten particularly bad in recent years and in certain parts of the Internet. We have Redact; we have social media platforms that explicitly sell "transient posts" as a feature (yes, Instagram Stories); we have news articles that get unpublished for no particular reason; certain platforms have the most cancerous moderation policies I've seen and would censor anything they consider inappropriate for toddlers. Every time I see a 404, it's like seeing a little tombstone for something of value that once existed, and a part of me dies inside.

Post-hoc modification and removal of content undermines the Internet in three ways: it destroys historical memory, erodes the integrity of the hyperlinked network, and weakens public accountability.

First is the preservation of history, knowledge, and collective memory. We mourn the burning of the Library of Alexandria to this day, because of the countless rare copies of ancient wisdom it held. It turns out that, by the nature of the Internet, most of the content we see exists as a rare copy too: once deleted, it is gone forever unless someone specifically set out to preserve it. This is indeed the noble mission of the Internet Archive, but its coverage is still limited to the crawlable Internet. I came across the following Reddit comment on Redact:

Nothing pisses me off more than when I have a hyper-specific technological problem and I look it up, get a hit on Reddit, the top comment is "head shoulders knees toes knees toes this comment has been anonymized by redact" and the OP replies with "worked perfectly, thanks!"

Knowledge can always be regenerated and rediscovered, so its loss is relatively excusable. The erasure of collective memory, on the other hand, is tantamount to destroying historical monuments. Much of modern culture (memes, fandom communities, subcultures, and activism) exists primarily online. When a historian studies 19th-century politics, they will find ample primary sources preserved on paper. When they study more recent subjects, such as BLM, the COVID pandemic, or something more niche like, say, the furry community, imagine their frustration if the primary sources testifying to the origin of a specific term or practice have been erased without a trace.

Second is the integrity of the articles that remain. Removing one piece of content does not just destroy that particular piece of information; it also deals collateral damage to every website linking to and citing it. As a maintainer of MDN, one of my weekly duties is to maintain the wellbeing of all external links, and I regularly encounter webpages taken down without replacement (most of them on company websites, after the company rebrands and removes all its old posts). Occasionally, an MDN article relies so heavily on critical information in the linked page that the removal of the latter effectively destroys the former as well and leaves it in need of a rewrite. The impact multiplies when the page is cited by an academic paper, whose very purpose is to last for eternity, like the stones upon which we carve human knowledge. When content disappears, research becomes unverifiable, journalism loses its sources, and our knowledge fragments.

Third is transparency and accountability. I believe that permanence benefits the public far more than it benefits institutions. Politicians, companies, and public figures often delete or edit statements once they become controversial. You cannot change printed newspapers, but you can change your tweets, your blog posts, and even online press releases. Without archives, digital history becomes easily manipulable, which serves the interests of those with the power to edit and delete, not those who read, much like the "memory hole" in 1984. Indeed, a permanent public record is one of the pillars of democracy, because it grants everyone equal access to information, and thus an equal voice and equal responsibility. But we cannot hope to achieve information symmetry if control over that information is asymmetrical. To have a level playing field, information governance must be decentralized rather than entrusted to a single authority; otherwise, the public record is no more than a veil concealing untold truths.

Dissenters often cite the right to be forgotten as a counterargument. However, this legislation only lets you be forgotten by a particular platform; it does not erase your digital footprint at large. The Internet Archive, quotes by other people, and screenshots can all preserve precisely your "forgotten" content. I respect the need to prevent the abuse of personal information by a single company or platform, but it is practically impossible to take back what has already been released to the public, so why bother? Let your thoughts and your life live on on the Internet.

Finally, there's the question of "should we remember absolutely everything?", to which my answer is yes. Your hate speech, your violent threats, your graphic porn, and your embarrassing posts should also live on.[1] Remember that our ultimate goal is to build the library of human knowledge and memory, where each work is more or less unique and irreplaceable. We are not in a position to judge the content's appropriateness or value, and we should not be. To this day, there is no unambiguous and universal definition of what is "good", and to enforce some particular standard is to endorse a particular ideology of a particular authority, which returns us to the point about power asymmetry in information governance. The tension is not just between the public and the institutions of the moment, but also between the people of today and the people of the future. This is why even content removal by consensus is problematic: it excludes future generations from the consent process. By removing certain kinds of content, we deprive all future generations of the opportunity to learn from, to experience, and to witness the full spectrum of the past. Times change, dynamics shift, and morality evolves,[2] so let us not impose our current standards on what the future should consume.

For information to be truly permanent, its governance must be decentralized and critical functions must be distributed across multiple parties. This is the vision of the decentralized web. Unfortunately, it seems to be at odds with the desires of the general public: decentralized Mastodon has been overtaken by centralized Bluesky; decentralized Matrix has been overtaken by centralized Discord; not to mention that blockchain-based, truly "undeletable" social media platforms remain niche. Centralization implies convenience and accountability, which are much more enticing to most users than the intangible ideals of freedom and permanence. Thus, we continue to witness content being edited and deleted at the whim of a few, and the Internet's memory continues to morph in real time. Humanity's rite of passage will come on the day we start upholding the data we produce as treasurable artifacts of our existence, and not as disposable materials for entertainment and self-expression.

Footnotes

  1. My opinions on copyright are a topic for another day. As a baseline, I am not against the removal of exact copies, because it does not cause any loss of information.

  2. This is load-bearing. 800 years ago, blasphemy would have been criminal; 400 years ago, opposing the monarchy would have been criminal; 100 years ago, interracial marriage and homosexuality would have been criminal. We cannot expect the future to share our current moral standards, and thus we cannot legislate on its behalf.