Web archiving

The remedy to cowardice, inattention, censorship and amorality.

Due to Ciro Santilli's campaign for freedom of speech in China, Ciro Santilli maintains information on this mostly at:

Dan Dascalescu's "Web page archiving" comparison table: web.archive.org/web/20130922192354/http://wiki.dandascalescu.com/reviews/online_services/web_page_archiving

Archive.today (archive.is)

 1  0

cirosantilli.com/china-dictatorship/archive-today

Some of their archiving accounts:

vk.com/id534525981 e.g. archive.ph/GTM7S

Creator of Archive.today (Denis Petrov of Archive.Is)

 0  0

webapps.stackexchange.com/questions/145817/on-which-country-are-the-creators-and-servers-of-archive-today-archive-is-base/175600#175600 by Ciro Santilli
Points to:
- www.linkedin.com/in/denispetrov/
"Alex Conferno" is also brought up: twitter.com/conferno
www.reddit.com/r/COPYRIGHT/comments/1bcqf3y/archivetoday_archiveis_copyright_victims/
drive.google.com/file/d/1JTPVd09NPaGH-KzGv2jU3XXcFiJAoUjw/view some crazy dude investigating, let's see how long until it goes down, posted at:
www.reddit.com/r/DataHoarder/comments/12trawt/has_anyone_ever_actually_spoken_to_denis_petrov/
gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/. Trended on Hacker News: news.ycombinator.com/item?id=37009598
gigazine.net/gsc_news/en/20240326-archive-today/

Other mentions of "Denis Petrov":

webmasters.stackexchange.com/questions/88257/deny-access-to-archive-is

In 2025, the FBI got interested in the creator of the website due to its news paywall circumvention usage, and the dude started going nuts, coming after anyone who had previously investigated him with OSINT, e.g. the author of gyrovague.com and Ciro Santilli:

Later on in 2026 Wikipedia banned archive.today links because it had DDoS'ed Wikipedia and tampered with snapshots: arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos-and-altered-web-captures/ The dude had gone completely unhinged, destroying everything he built.

Internet Archive

 1  0

Internet Archive Open Library

 0  0

Previously called "Lending Library" it seems: help.archive.org/hc/en-us/articles/360016554912-Borrowing-From-The-Lending-Library

You can borrow online books from them for a few hours/days: help.archive.org/hc/en-us/articles/360016554912-Borrowing-From-The-Lending-Library This is the most amazing thing ever made!!! You can even link to specific pages, e.g. archive.org/details/supermenstory00murr/page/80/mode/2up
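As a minimal sketch of the page-linking trick, the helper below rebuilds the example link above from its parts. The URL pattern is inferred from that single example only, and the function name `book_page_url` and the default `mode` value are assumptions made for illustration, not a documented archive.org API.

```python
def book_page_url(identifier: str, page: int, mode: str = "2up") -> str:
    """Build a deep link to one page of an archive.org book reader.

    Assumption: the pattern /details/<identifier>/page/<page>/mode/<mode>
    is inferred from the single example URL in the text.
    """
    if page < 0:
        raise ValueError("page must be non-negative")
    return f"https://archive.org/details/{identifier}/page/{page}/mode/{mode}"


# Rebuilds the example link from the text.
print(book_page_url("supermenstory00murr", 80))
```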

They seem to have a separate URL with the same content as well for some reason: openlibrary.org/, classic messy Internet Archive style.

Bastards are suing them: www.theverge.com/2020/6/1/21277036/internet-archive-publishers-lawsuit-open-library-ebook-lending. The plaintiffs are Hachette, Penguin Random House, Wiley, and HarperCollins.

It is quite hard to decide if an upload is from the official legal lending library, or just some illegal upload, e.g.:

archive.org/details/TheGoogleStory likely illegal
archive.org/details/isbn_9780385342728 likely legal

so the URLs are basically the same style. Some legality indicators:

Access-restricted-item: true
present in the collection: archive.org/details/internetarchivebooks?tab=about
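The two indicators above could be checked mechanically along these lines. This is only a sketch: it assumes you already have an item's metadata as a Python dict (for example the "metadata" object that archive.org/metadata/<identifier> is believed to return), that `collection` may be either a string or a list, and that the flag is stored as the string "true". The function name and both sample records are hypothetical, and the indicators are hints, not proof of legality.

```python
def legality_indicators(metadata: dict) -> dict:
    """Report the two legality hints mentioned in the text for one item."""
    collections = metadata.get("collection", [])
    # Assumption: a single collection may come back as a bare string.
    if isinstance(collections, str):
        collections = [collections]
    flag = str(metadata.get("access-restricted-item", "")).strip().lower()
    return {
        "access_restricted": flag == "true",
        "in_internetarchivebooks": "internetarchivebooks" in collections,
    }


# Hypothetical records, not fetched from archive.org.
lending_like = {
    "collection": ["internetarchivebooks", "inlibrary"],
    "access-restricted-item": "true",
}
upload_like = {"collection": "opensource"}

print(legality_indicators(lending_like))
print(legality_indicators(upload_like))
```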

Hachette v. Internet Archive (2023)

 1  0

Wayback Machine

 1  0

cirosantilli.com/china-dictatorship/wayback-machine

 Tagged

CIA 2010 covert communication websites / Wayback Machine

Wayback Machine save screen shot

 0  0

Feature added in 2019 apparently: www.reddit.com/r/DataHoarder/comments/dj6ot5/you_can_now_save_a_screenshot_of_your_saved_pages/
github.com/ourbigbook/template/archive/refs/heads/master.zip
But TODO: how to access the screenshot afterwards?

Wayback Machine rate limit

 0  0

archive.org/details/toomanyrequests_20191110 says 15 archives / minute, but apparently also 15 retrievals per minute on Wikipedia, after which you get a 5 minute blacklist. After that, you start getting some 429s, and after that, the server refuses to connect at all.
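One way to stay under such a limit is a client-side sliding window. The sketch below assumes the 15 per minute and 5 minute figures quoted above are roughly right, which is not guaranteed. The class and function names are made up for illustration, and the clock and sleep functions are injectable so the logic can be tried without actually waiting.

```python
import time
from collections import deque

BLACKLIST_SECONDS = 300.0  # the "5 min blacklist" figure from the text


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any window of period seconds."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def _drop_expired(self, now):
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._drop_expired(now)
        self.calls.append(now)


def pause_after_status(status: int) -> float:
    """Seconds to back off after a response: sit out the blacklist on 429."""
    return BLACKLIST_SECONDS if status == 429 else 0.0
```

Call `limiter.wait()` before every save or fetch request, and sleep for `pause_after_status(response_status)` seconds afterwards.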

CDX: no limits apparently, they might just throttle you? Made 10k requests in a Bash loop and it was going fine. But note that if you get put on the create/fetch request blacklist, the server fails to connect here as well.
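For reference, a CDX query is just a GET with a few parameters. The sketch below builds such a URL and parses the JSON output. The endpoint `web.archive.org/cdx/search/cdx` and the `url`, `output`, `matchType` and `limit` parameters are taken from the public CDX server documentation as best remembered, not from the text above, so treat them as assumptions to double check. The helper names and the sample response are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: public Wayback CDX endpoint, not mentioned in the text.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url, match_type="exact", limit=None):
    """Build a CDX query URL that asks for JSON output."""
    params = {"url": url, "output": "json", "matchType": match_type}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text):
    """Turn CDX JSON (first row is the header) into a list of dicts."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# Hypothetical response body, shaped like CDX JSON output.
sample = (
    '[["urlkey", "timestamp", "original"],'
    ' ["com,example)/", "20200101000000", "http://example.com/"]]'
)
print(cdx_query_url("example.com", limit=5))
print(parse_cdx_json(sample))
```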

archive.org/post/1055220/how-to-query-for-all-the-websites-that-end-in-combr
archive.org/details/WebArchiveDomainFiles: only a random list of per-ccTLD domain files, made upon request of (presumably paying) partners. As of 2023 it only contains the Netherlands: archive.org/details/Dotnl-2016-present-domains-in-wayback-domainyear-of-last-capture

Wayback Machine pages don't show up right after you just finished archiving them

 0  0

There seems to be a delay between the moment they say they have "archived it" and the moment you can actually see what was archived.

Their system is that bad unsurprisingly.
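A workaround is to poll until the snapshot becomes visible. The sketch below assumes the availability endpoint `archive.org/wayback/available?url=...` and its `archived_snapshots.closest` JSON shape, which come from memory of the public API docs and not from the text above. The function names, the retry count and the delay are arbitrary illustrative choices, kept well under 15 requests per minute. Since "closest" may be an older capture, `not_before` lets you ignore snapshots older than a given 14-digit timestamp.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public availability endpoint, not mentioned in the text.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def fetch_availability(url):
    """Query the availability endpoint and return the decoded JSON."""
    query = urlencode({"url": url})
    with urlopen(f"{AVAILABILITY_ENDPOINT}?{query}", timeout=30) as response:
        return json.load(response)


def closest_snapshot(payload, not_before=""):
    """Return the closest snapshot URL, or None if missing or too old."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Timestamps are YYYYMMDDhhmmss strings, so string comparison works.
    if closest.get("timestamp", "") < not_before:
        return None
    return closest.get("url")


def wait_for_snapshot(url, not_before="", attempts=10, delay=30.0,
                      fetch=fetch_availability, sleep=time.sleep):
    """Poll until a recent enough snapshot is visible, or give up."""
    for attempt in range(attempts):
        snapshot = closest_snapshot(fetch(url), not_before)
        if snapshot:
            return snapshot
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Passing `fetch` and `sleep` as parameters is only there so the loop can be exercised with canned responses instead of the real server.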
