The Internet Archive has been fighting for 25 years to keep what's on the web from disappearing, and you can help
Dayton, USA, August 14th (The Conversation) The Internet Archive turns 25 this year. It is best known for its pioneering role in archiving the Internet through the Wayback Machine, which allows users to see what websites have looked like in the past.
A large part of daily life is increasingly done online. School, work, communication with friends and family, and news and pictures are all accessed through a variety of websites. Information that used to be printed, physically sent, or kept in photo albums and notebooks may now only be available online. The COVID-19 pandemic has pushed even more interactions onto the internet.
You may not know that parts of the internet are disappearing all the time. As librarians and archivists, we strengthen collective memory by preserving materials that document society’s cultural heritage, including on the internet. As a citizen archivist, you can also help us to save the Internet.
Disappearing act
People and organizations remove content from the web for a variety of reasons. Sometimes it's the result of shifts in internet culture, such as the recent shutdown of Yahoo Answers.
It can also be the result of following website design best practices. For example, updating a website overwrites the previous version unless that version has been archived.
Web archiving is the process of collecting, maintaining and providing continuous access to information on the Internet. Often this work is done by librarians and archivists with the assistance of automated technologies such as web crawlers.
Web crawlers are programs that systematically visit websites, either to index them for search engines or to store copies of them for the long term. The Internet Archive, a nonprofit organization, uses thousands of computer servers to store multiple digital copies of these pages, which take up more than 70 petabytes. It is funded through donations, grants and payments for its digitization services. The Wayback Machine captures more than 750 million web pages every day.
Why archive?
In 2018, President Donald Trump falsely claimed on Twitter that Google had promoted President Barack Obama's State of the Union address on its homepage but not his own.
Archived versions of the Google homepage proved that Google actually highlighted Trump’s State of the Union address in the same way. Several news outlets use the Internet Archive’s Wayback Machine as a source for verifying these types of allegations because screenshots alone are easily altered.
A 2019 report by the Tow Center for Digital Journalism examined the digital archiving practices and guidelines of newspapers, magazines and other news producers. Interviews conducted for the report revealed that many news media workers either lack the resources to archive their work or misunderstand digital archiving, equating it with simply keeping a backup copy.
When a news story disappeared from the Gawker website a year after the site shut down, the Freedom of the Press Foundation grew concerned about what could happen if wealthy people bought websites with the intent of deleting or censoring their archives. The foundation partnered with the Internet Archive to start a web archive collection focused on preserving the websites of vulnerable news organizations and keeping billionaires from buying up such material in order to censor it.
Archiving websites that document social justice issues, such as Black Lives Matter, helps explain these movements to people today and in the future.
Archiving government websites promotes transparency and accountability. Government websites are especially prone to deletion during transition periods, when power changes hands between political parties.
In 2017, the Library of Congress announced that it would no longer archive every single tweet because of Twitter's growth as a communication tool. Twitter provided the library only with the text of tweets, not shared images or videos. Instead of collecting comprehensively, the Library of Congress now archives only tweets of considerable national importance.
Archived websites that document the culture and history of the internet, like the GeoCities Gallery, are not only fun but also show how individuals created and used early websites.
Citizen archivists
Archiving the internet is a monumental task that librarians and archivists cannot do on their own. Anyone can become a citizen archivist and preserve history through the Internet Archive's Wayback Machine.
The "Save Page Now" feature lets anyone archive a single public web page for free. Keep in mind that some websites use code that blocks crawling and archiving, or require you to log in. This can be because of sensitive content or simply the preferences of the site's developer.
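For readers comfortable with a little scripting, the same services can be reached by web address. The sketch below assumes the address patterns the Internet Archive has publicly described: putting https://web.archive.org/save/ in front of a page's address asks Save Page Now to capture it, and the Wayback Machine availability API at https://archive.org/wayback/available reports the archived snapshot closest to an optional date. The helper names are hypothetical, and the code only builds the addresses without sending any requests; check the Internet Archive's current documentation before relying on these patterns.

```python
from urllib.parse import urlencode

# Assumed address patterns; verify against the Internet Archive's current docs.
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_now_url(page_url: str) -> str:
    """Hypothetical helper: the address that asks Save Page Now to capture one public page."""
    return SAVE_PREFIX + page_url


def availability_query(page_url: str, timestamp: str = "") -> str:
    """Hypothetical helper: build a Wayback Machine availability API query.

    timestamp is YYYYMMDDhhmmss and may be shortened (e.g. "20180130");
    the service is expected to answer with the closest archived snapshot, if any.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    # urlencode percent-escapes characters such as ':' and '/' in the page address.
    return AVAILABILITY_API + "?" + urlencode(params)


if __name__ == "__main__":
    print(save_page_now_url("https://example.com/"))
    # Illustrative date: the day of the 2018 State of the Union address.
    print(availability_query("google.com", "20180130"))
```

Opening the first address in a browser would start a capture; fetching the second should return a short JSON description of the nearest snapshot, which is one way to check what a homepage looked like on a given day, as in the Google example above.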
Local cultural heritage institutions such as libraries, archives and museums also actively archive the internet. More than 800 institutions use Archive-It, a tool from the Internet Archive, to create collections of archived websites. At the University of Dayton, we curate collections on our Catholic and Marian heritage, ranging from Catholic blogs to news stories about the Virgin Mary.
Through its rapid response collections, Archive-It works with organizations and individuals to create collections of "web content about a specific event that captures vulnerable content in a time of crisis." Similarly, in partnership with the Institute of Museum and Library Services, it launched the Community Webs program to help public libraries create collections of archived web content relevant to their local communities.
Today's websites are tomorrow's historical record, but only if they are archived. If they are lost, so is vital information about corporate and government decisions, modern forms of communication such as social media, and social movements with a significant online presence, like Black Lives Matter and #MeToo.
Together with librarians and archivists, you can help ensure the survival of this evidence and save internet history.