Online Content Is Disappearing

141 likes

Submitted 2 years ago by funn@lemy.lol to technology@beehaw.org

https://www.pewresearch.org/data-labs/2024/05/17/when-online-content-disappears/

cross-posted from: lemy.lol/post/25166889

Comments

Powderhorn@beehaw.org 2 years ago
Poorly thought-out Facebook posts are forever; coverage of city council malfeasance from two years ago, not so much.

adespoton@lemmy.ca 2 years ago
This isn’t helped by most websites reinventing themselves every couple of years so the old links 404 even though the content still exists.

- brisk@aussie.zone 2 years ago
  If anyone is considering how to avoid this on their own site: indieweb.org/URL_design
  
C4d@beehaw.org 2 years ago
I think much of Geocities remained accessible until 2013/2014 before going away completely (apart from GeoCities Japan, which lasted until 2019 or so).

- lemmyvore@feddit.nl 2 years ago
  I wonder how much of this stuff may still be around on hard drives somewhere. Random blogs probably not, because they were on shared hosting that would overwrite and reuse the space when the blog went down, and the drives were typically destroyed when the servers got decommissioned. But large platforms like Geocities might still be archived somewhere.
  
  - synae@lemmy.sdf.org 2 years ago
    The Internet Archive actively tried to gather as much as possible before GeoCities was decommissioned:
    
    archive.org/web/geocities.php
    
  - frog@beehaw.org 2 years ago
    
    I wonder how much of this stuff may still be around on hard drives somewhere.
    
    Probably quite a lot!
    
    Just as an example, I’m a part of an art and writing focused community that’s been around off-and-on since the late 90s. Typically each member has/had their own website. So a few years ago when we went from an “off” phase back to “on” again, a major project became reconstructing the stuff that used to be on Geocities, the various smaller platforms of the 90s and 00s, and ISP-provided webhosting. And obviously it’s hard to judge how much stuff we don’t remember and therefore don’t know we’re missing, but well over half of what we have reconstructed has come from “I found my external hard drive from 2006 and it had X, Y and Z on it!” I personally had ~3000 files sitting on my NAS, which I had moved off my own hard drive at some point but had been unwilling to delete, so I just dumped them into long-term storage. Four years into the reconstruction project, we still occasionally find files we thought were lost forever, usually when someone’s found an old hard drive in a box in their attic/basement. The found content often was created by someone else, but downloaded and archived by the hard drive owner.
    
    Although this is representative of just one community, given how apparently common it was for people to download offline copies of websites they liked, it could well be that large swathes of the old internet are sitting on people’s hard drives, waiting to be rediscovered.
    
  - SnotFlickerman@lemmy.blahaj.zone 2 years ago
    Probably a good bit. I have a backup of my personal website from 2006 to about 2013, along with a lot of media going back to my teen years. I’m really lucky I’ve been a good digital steward of my own data and have lost almost none of my personal digital history.
    
Ilandar@aussie.zone 2 years ago
This is even more concerning (or funny, depending on how dark your humour is) when you realise that it will be replaced by AI-generated webpages. Humanity’s presence on the internet is disappearing before our eyes.

- rwhitisissle@beehaw.org 2 years ago
  Dead Internet Theory is one of the few “conspiracy” theories I sort of buy, in the sense that it’s probably not descriptive of the nature of the current internet, but rather predictive of what it’s becoming: en.wikipedia.org/wiki/Dead_Internet_theory
  
  - Ilandar@aussie.zone 2 years ago
    I think the Wikipedia article needs to be updated to be honest. Continuing to describe it as a “conspiracy theory” is quite misleading given the phenomenon is already underway and only picking up pace.
    
Kissaki@beehaw.org 2 years ago

54% of Wikipedia pages contain at least one link in their “References” section that points to a page that no longer exists.

My impression is that over the last decade or so, Wikipedia has made an effort to stop linking only to the original page: either supplementing direct links with dated Web Archive links, or linking to the Web Archive copy directly. The date records “sourced from this, in this state”, which also protects against later edits upstream.

- tiago@beehaw.org 2 years ago
  Have not contributed enough to Wikipedia! I’d like to implement this on links I come across.
  
  How does one get an Archive link for a specific date and website? I tried snooping around but am not familiar with it.
  
  - Kissaki@beehaw.org 2 years ago
    Wikipedia has guidance on this in “Citing sources”; for web links specifically, see the sections “Handling links” and “Preventing and repairing dead links”.
    
    The Web Archive “Wayback Machine” is available at . It has a “Save Page Now” action too.
    
    gives you a history of archived versions of that URL.
    
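For the question of getting an archive link for a specific date, here is a minimal Python sketch. It assumes the Wayback Machine's commonly used URL patterns: `https://web.archive.org/web/<timestamp>/<url>` for a dated capture (timestamp in 14-digit `YYYYMMDDhhmmss` form), `https://web.archive.org/web/*/<url>` for the list of captures, and `https://web.archive.org/save/<url>` for Save Page Now. These patterns are assumed here rather than quoted from the thread, and the helper names `snapshot_url`, `history_url` and `save_url` are hypothetical. The functions only build URLs; they never contact the archive.

```python
from datetime import datetime

# Assumed base URL and path patterns for the Wayback Machine (not taken from the thread).
WAYBACK = "https://web.archive.org"

def snapshot_url(page_url: str, when: datetime) -> str:
    """Build the URL of the capture of page_url closest to `when` (hypothetical helper)."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp, e.g. 20240517000000
    return f"{WAYBACK}/web/{stamp}/{page_url}"

def history_url(page_url: str) -> str:
    """Build the URL of the calendar view listing every capture of page_url."""
    return f"{WAYBACK}/web/*/{page_url}"

def save_url(page_url: str) -> str:
    """Build the URL that asks the archive to capture page_url now ("Save Page Now")."""
    return f"{WAYBACK}/save/{page_url}"

if __name__ == "__main__":
    article = "https://www.pewresearch.org/data-labs/2024/05/17/when-online-content-disappears/"
    print(snapshot_url(article, datetime(2024, 5, 17)))
    print(history_url(article))
    print(save_url(article))
```

If no capture exists at exactly that timestamp, the archive typically redirects to the nearest one. On Wikipedia, the resulting snapshot URL is what goes in the `archive-url` field of a citation template such as {{cite web}}, with the capture date in `archive-date`.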
some_guy@lemmy.sdf.org 2 years ago
And now, with Google regurgitating summaries of the content it has crawled, there will be no incentive to publish, because no one will click through and generate ad payments.

- thecodemonk@programming.dev 2 years ago
  We need to revive the days when people wrote blog posts to help others instead of pushing ads to make money. The content was far better.
  
  - Facebones@reddthat.com 2 years ago
    I started a home server on an RPi running some services, with a dinky HTML page on it; I need to start actually posting to it. I’m already not on fb or Twitter or anything, so if you want to know what I’m doing, go to my homepage, loser! 🤣
    
Blackout@kbin.run 2 years ago
If the information was important wouldn't it already be passed around and expanded upon? The Internet is probably 99% junk, at least the posts I've made. Only the good stuff like goatse survives.

- rar@discuss.online 2 years ago
  Problem is, people rarely realize the importance until they’re lost. Plenty of posts from 90s and 2000s containing valuable insights are probably lost forever. Remember that not everything online is in English, either.
  
  - SnotFlickerman@lemmy.blahaj.zone 2 years ago
    Finding sources about Bush and Cheney fuckery from 2000-2008 is getting increasingly difficult. Their crimes are getting memory-holed.
    
- rwhitisissle@beehaw.org 2 years ago
  From a historical or intellectual archaeological perspective, no one in 2000 BC Babylon thought their pottery would be of historical significance, but 4000 years later, it is. These websites, particularly ones independently created and maintained by hobbyists, are snapshots of the ideas of the time and people that created them. These websites may not have been intensely popular, but they were in many ways a foundational part of the inchoate tapestry of the internet that would eventually become the “modern web.”
  
  - TehPers@beehaw.org 2 years ago
    On the flip side, nobody can be expected to keep their website up for 4000 years. Hosting costs money and time, and at some point, the thing you’re hosting will fall out of relevance enough to no longer be worth the cost.
    
    This is why archiving is important. Hopefully most of the content that was lost was archived at some point. Getting a good chunk of that content onto long term storage would do future generations a favor (even if it’s just a bunch of tape storage locked away in a warehouse or something).
    
toothpicks@beehaw.org 2 years ago
Yeah. Might need to start backing up stuff myself. I noticed a lot of old YouTube videos go missing as well.

snooggums@midwest.social 2 years ago
This isn’t inherently bad.

Some web pages are extraneous, redundant, or only relevant for a limited period of time. A sign up page for a concert doesn’t need to exist permanently. Consolidating a large website down to fewer pages that are accessible for everyone is a good thing.

Archiving services that retain web pages that deserve saving are how we should retain that history of the web, but the actual creators don’t necessarily need to indefinitely maintain a web page that becomes obsolete.

Yes, a lot is lost that could have just continued to exist and archiving is good, but getting rid of clutter is not a bad thing.

- nossaquesapao@lemmy.eco.br 2 years ago
  But how do you know it’s the clutter that is being removed? One of the indicators listed in the article is broken links on Wikipedia. These links very likely point to informational resources or news articles, yet they are also being lost at a high rate.
  
- The_Che_Banana@beehaw.org 2 years ago
  Yeah, but Amish Rakefight GFY (a dedicated page: you sent someone a link, and it told them to Go Fuck Yourself, along with a counter telling you how many people had been told to fuck off) has been gone for a few years now… and damn, I miss it.
  
  - snooggums@midwest.social 2 years ago
    For sure, things that were not intended to be temporary or were replaced with a better version are sad to lose.
    
    At least we still have penisland.net
    
- Kissaki@beehaw.org 2 years ago
  
  Consolidating a large website down to fewer pages that are accessible for everyone is a good thing.
  
  For consolidation, the clean thing is to introduce redirection to the new location.
  
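The redirect approach suggested for consolidation can be sketched as WSGI middleware in Python: requests for retired paths get a `301 Moved Permanently` pointing at the page that absorbed their content, and everything else passes through to the site. The path mapping, `site_app` and `with_redirects` are hypothetical placeholders for illustration; in practice the same mapping is often written as web-server configuration instead (for example an nginx `return 301` or Apache's `Redirect permanent`).

```python
from typing import Callable, Iterable

# Hypothetical map of retired URLs to the pages that absorbed their content.
REDIRECTS = {
    "/concert-signup-2019": "/events",  # temporary page folded into an index page
    "/blog/2012/05/old-post.html": "/posts/old-post",  # URL scheme changed in a redesign
}

def site_app(environ, start_response) -> Iterable[bytes]:
    """Placeholder for the real site: answers every path with 200 OK."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Page: {environ.get('PATH_INFO', '/')}\n".encode("utf-8")]

def with_redirects(app: Callable, redirects: dict) -> Callable:
    """Wrap a WSGI app so that retired paths answer with a permanent redirect."""
    def wrapped(environ, start_response):
        target = redirects.get(environ.get("PATH_INFO", "/"))
        if target is None:
            return app(environ, start_response)  # not a retired URL: serve normally
        # 301 tells browsers, search engines and archivers that the move is permanent,
        # so old links keep resolving instead of returning 404.
        start_response("301 Moved Permanently", [
            ("Location", target),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [f"Moved to {target}\n".encode("utf-8")]
    return wrapped

app = with_redirects(site_app, REDIRECTS)
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` will serve it. The key choice is a permanent (301) rather than temporary redirect, so that links in old posts, bookmarks and citations follow the content to its new home.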
Feyter@programming.dev 2 years ago
I think it’s much more striking that stuff added in 2018 and 2019 has a much higher probability of being deleted today than stuff added in 2017…

I wonder if that has anything to do with COVID: maybe businesses with new business models opened two years before failing, and so the websites of those companies disappeared.

Also, I think it would be nice to see a graph of new websites being created over the same time span.

- millie@beehaw.org 2 years ago
  I mean, that kind of makes sense. A lot of small websites are probably for temporary projects, or may even be experiments. When the project ends, it usually makes financial sense to quit paying for hosting and domains.
  
petrescatraian@libranet.de 2 years ago
@funn Online content is, sadly, more vulnerable than we think. All it takes is one server going down, and the entire website/thing goes bust.

ChallengeApathy@infosec.pub 2 years ago
Dead Internet.

Maeve@kbin.social 2 years ago
The memory hole is real.

5714@lemmy.dbzer0.com 2 years ago
This is like the disinvention of the printing press, at least from an archeological perspective.

- OpenStars@discuss.online 2 years ago
  Well, websites can still be made, so not quite the same, but I get what you mean:-).
  
  And similarly, a ton of written material was lost, e.g. when a library was burned, and unique or rare material was often subsequently lost in other ways too, so it is very much the same process.