ZENODO
Other literature type . 2024
License: CC BY
Data sources: ZENODO
ZENODO
Conference object . 2024
License: CC BY
Data sources: Datacite

This Research product is the result of merged Research products in OpenAIRE.

Some URLs Are Immortal, Most Are Ephemeral

Authors: Garg, Kritika;

Abstract

"How long does a web page last?" is often answered with "44 to 100 days," but the web has changed since those numbers were first given in 1996. We examined how web page lifespans have evolved using a sample of 27.3 million URLs archived from 1996 to 2021 by the Internet Archive (IA). Only 35% of URLs remained active in 2023, indicating significant web inactivity. Our preliminary analysis suggests that these numbers are not inflated by soft 404s and other phenomena. We encountered DNS failures for 30% of 7 million unique domains. Surprisingly, almost half of the URLs initially archived in 1996–2000 were still active in 2023, suggesting the longevity of some early URLs; sites like nasa.gov continue to exist. Conversely, some URLs had lifespans that defied measurement. Nearly 30% had short lifespans, with only one archived page or no "200 OK" mementos, indicating brief existence or limited archival interest. The average lifespan of a web page in our dataset is 5.1 years, with a median of 2.3 years. However, this average conceals the bimodal nature of root URLs, where 10% persist for less than a year and nearly 20% thrive for over 20 years, resulting in a median lifespan of 8.8 years. Deep links have a median lifespan of 1.3 years. We also examined web page half-life, i.e., the time it takes for half of the pages to disappear. Root URLs had a half-life of nine years, compared to one year for deep links. URLs from different decades exhibited varying lifespans: 1990s URLs had a half-life of 15-20 years, early 2000s URLs had 6-7 years, and URLs from 2003 to 2021 had 6 months to 3 years. Using the IA as a source for sample URLs provides the only realistic, public option to study the evolution of the web at this scale and duration. However, well-known classes of pages are not present in the Wayback Machine, and our findings apply only to publicly archivable pages. These findings provide a nuanced understanding of web page longevity, emphasizing that while some URLs survive a long time, most have an ephemeral lifespan.
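
To make the half-life definition in the abstract concrete, the following is a minimal Python sketch of how a half-life could be read off a list of observed lifespans. It is an illustration under stated assumptions, not the study's actual method: the function names and the sample lifespans are hypothetical, every page is assumed to have a fully observed lifespan, and pages still alive at measurement time (which a real analysis would need to treat as censored) are ignored.

```python
from statistics import mean, median


def survival_fraction(lifespans, t):
    """Fraction of pages whose lifespan exceeds t years."""
    return sum(1 for x in lifespans if x > t) / len(lifespans)


def half_life(lifespans):
    """Smallest observed lifespan t at which at most half the pages survive.

    Assumes every lifespan is fully observed (no censoring).
    """
    if not lifespans:
        raise ValueError("need at least one lifespan")
    for t in sorted(lifespans):
        if survival_fraction(lifespans, t) <= 0.5:
            return t


# Hypothetical lifespans in years, invented for illustration only.
root_urls = [0.5, 0.8, 6.0, 9.0, 9.5, 21.0, 24.0, 25.0]
deep_links = [0.2, 0.4, 0.9, 1.0, 1.5, 2.0, 3.0, 12.0]

for name, sample in (("root URLs", root_urls), ("deep links", deep_links)):
    print(name,
          "mean:", mean(sample),
          "median:", median(sample),
          "half-life:", half_life(sample))
```

In the made-up deep-link sample, a single long-lived page pulls the mean well above the median, which mirrors the abstract's caution that an average can conceal how the lifespans are actually distributed.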
