The Internet Archive has been fighting for 25 years to keep what’s on the web from disappearing – and you can help – GCN
This year the Internet Archive will be 25 years old. It is best known for its pioneering role in archiving the Internet through the Wayback Machine, which allows users to see what websites have looked like in the past.
More and more of daily life takes place online. School, work, communication with friends and family, news and pictures are all accessed through a variety of websites. Information that used to be printed, physically mailed, or kept in photo albums and notebooks may now be available only online. The COVID-19 pandemic has pushed even more interactions onto the internet.
You may not know that parts of the internet are disappearing all the time. As librarians and archivists, we strengthen collective memory by preserving materials that document society’s cultural heritage, including on the internet. As a citizen archivist, you can also help us to save the Internet.
People and organizations remove content from the web for a variety of reasons. Sometimes it’s the result of changing internet culture, such as the recent Yahoo Answers shutdown.
It can also be the result of following website design best practices. For example, updating a website will overwrite the previous version – unless it has been archived.
Web archiving is the process of collecting, preserving and providing continued access to information on the internet. Often this work is done by librarians and archivists using automated technologies such as web crawlers.
Web crawlers are programs that index websites in order to make them available via search engines or to preserve them for the long term. The Internet Archive, a not-for-profit organization, uses thousands of computer servers to store multiple digital copies of these pages, which take up over 70 petabytes of storage. It is financed through donations, grants and payments for its digitization services. More than 750 million web pages are captured each day in the Internet Archive’s Wayback Machine.
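For readers curious about what a crawler does in practice, here is a minimal sketch in Python. It is an illustration only and not how the Internet Archive's own crawlers are built. The names `LinkExtractor`, `crawl`, `fetch` and the tiny in-memory `SITE` are all made up for this example, and a real crawler would also download pages over the network, respect site rules and record when each copy was taken.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that returns {url: html} for each page captured.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    page's HTML as a string, or None if the page cannot be retrieved.
    """
    archive = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        archive[url] = html  # keep a copy of the page
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return archive


# A tiny in-memory "website" standing in for real network requests.
SITE = {
    "https://example.org/": '<a href="/about">About</a> <a href="news.html">News</a>',
    "https://example.org/about": '<a href="/">Home</a>',
    "https://example.org/news.html": '<a href="/about#team">Team</a>',
}

snapshot = crawl("https://example.org/", SITE.get)
print(sorted(snapshot))
```

Starting from one page, the sketch follows every link it finds and keeps a copy of each page it reaches, which is the basic idea behind collecting a site for an archive.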
In 2018, President Donald Trump falsely claimed on Twitter that Google had promoted President Barack Obama’s State of the Union address on its homepage, but not his own. Archived versions of the Google homepage proved that Google had in fact highlighted Trump’s State of the Union address in the same way. Several news outlets use the Internet Archive’s Wayback Machine as a source for verifying claims of this kind because screenshots alone are easily altered.
A 2019 report by the Tow Center for Digital Journalism examined the digital archiving practices and guidelines of newspapers, magazines, and other news producers. The interviews revealed that many news media workers either lack the resources to archive their work or misunderstand digital archiving, equating it with keeping a backup copy.
When a news story disappeared from the Gawker website a year after the site shut down, the Freedom of the Press Foundation worried about what could happen if wealthy people bought websites with the intent to delete or censor their archives. The foundation has partnered with the Internet Archive to start a web archive collection focused on preserving the web archives of vulnerable news outlets – and on preventing billionaires from buying up such material in order to censor it.
Archiving websites that document social justice issues, such as Black Lives Matter, helps explain these movements to people now and in the future.
Archiving government websites promotes transparency and accountability. Government websites are especially prone to deletion during transitions of power between political parties.
In 2017, the Library of Congress announced that, because of Twitter’s growth as a communication tool, it would no longer archive every single tweet. Twitter provides the Library of Congress with the text of tweets, but not the images or videos shared in them. Instead of a comprehensive collection, the Library of Congress now archives only tweets of considerable national importance.
Archived websites that document the culture and history of the internet, like the Geocities Gallery, are not only fun, but also illustrate how early websites were created and used by individuals.