The Usenet Text Archive goes online (300 GB, approx. 300 million posts and growing) - DataHoarder

Hey folks,

Someone on a Usenet forum recommended that I post this note here as well.

I've just launched https://UsenetArchives.com, which offers free, ad-free access to archives of the BIG-8 and other Usenet text newsgroups.

The database is currently about 300 GB and holds close to 300 million posts (roughly 1 million posts per GB), and I am adding anywhere from 1 to 10 million new posts a day. About 9,000 newsgroups are available right now; that number should grow to roughly 50,000 to 60,000 text groups when I am done. Once archiving is complete, I expect the archive to contain around 1 to 2 billion posts. For real-time collection stats, visit the Stats page.

At the moment, I am mainly synchronizing the BIG-8 groups, which you can browse via these links: Comp, Humanities, Misc, News, Rec, Sci, Soc, and Talk. I have also recently added the Alt and Microsoft groups.

I wanted to do this before it is lost to future generations. I am aware that Google Groups already archives Usenet, but not all of the newsgroups are there, and no one can be sure how long it will stay around.

To access a specific newsgroup, pick a topic from the 'Categories' menu on https://UsenetArchives.com, then navigate to the threads and posts you're interested in.

Note: For now, I am limiting access to threads with at least 5 replies. This is mainly to keep traffic from indexing bots from eating all my bandwidth, since it's a lot of text :). Even with this limit in place, tens of millions of discussions are already publicly visible. The 5+ reply limit will likely be lifted toward the end of this year.

If you have any ideas for improving it, please let me know. I am doing all of the backend and frontend coding myself, so please bear with me, especially on the UI, which is still in alpha.

Enjoy, and feel free to leave any comments.

https://www.reddit.com/r/DataHoarder/comments/in7rd8/the_usenet_text_archive_goes_online_300_gigs/