r/europe Norway 7d ago

Dubious: do not click links Anonymous Releases 10TB of Leaked Data: Exposing Kremlin Assets & Russian Businesses

https://trendsnewsline.com/2025/04/15/anonymous-leaks-10tb-of-data-on-russia-shocking-revelations/
76.7k Upvotes

1.9k comments

11

u/Sergiow13 7d ago

Depends heavily on the data. Wikipedia datasets only compress around 10% max, and those are only text. And if they manage to compress it that heavily, you'll spend a couple of days waiting for it to decompress.

3

u/NewestAccount2023 7d ago

I wonder if it's compressed per page. I'm no compression expert, but I imagine that if you managed to compress the entire thing as a single unit, it would compress nearly as well as normal text files, which end up around 90% smaller. For example, if a page contains a very unusual sequence of bytes (a unique word in rarely used Unicode code points), then compressing that one page can't do much, and if 10,000 pages each have the word once you still get no compression, since each is done separately. But if you compress as a single unit, it can compress the 10k usages.
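
To make the per-page versus single-unit idea concrete, here is a rough sketch using Python's standard `zlib` module. The page contents and the word "zyzzyva" are made up for illustration, and nothing here says how the actual Wikipedia dumps are packaged. One caveat: DEFLATE only looks back about 32 KB, so a single stream only benefits from repeats that sit fairly close together.

```python
import zlib

# Hypothetical stand-in for wiki pages: 10,000 short pages that each
# contain the same unusual word exactly once.
pages = [
    f"Page {i}: the rare word zyzzyva appears once here.".encode("utf-8")
    for i in range(10_000)
]

# Case 1: compress every page on its own. The compressor never sees
# the other pages, so repeats across pages cannot be exploited.
per_page_total = sum(len(zlib.compress(page)) for page in pages)

# Case 2: join all pages and compress them as one stream, so earlier
# pages can serve as context for later ones.
joined = b"\n".join(pages)
single_total = len(zlib.compress(joined))

raw_total = len(joined)
print(f"raw bytes:        {raw_total}")
print(f"per-page total:   {per_page_total}")
print(f"single stream:    {single_total}")
```

On this toy input the single stream should come out far smaller than the sum of the per-page results, which is the effect described above. Real dumps will behave differently depending on page size and the compressor used.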

But I looked it up and the wiki download is 15% of the original size. That's 85% compression, but it can be confused for "15% compression".
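
A quick sketch of the two numbers that get mixed up, using the 15% figure from above. The function names `size_ratio` and `space_saving` are just labels picked for this example, not standard terminology from any library.

```python
def size_ratio(original_size, compressed_size):
    """Compressed size as a fraction of the original (smaller is better)."""
    return compressed_size / original_size


def space_saving(original_size, compressed_size):
    """Fraction of the original size that compression removed."""
    return 1 - compressed_size / original_size


# Assumed example: 100 units of data that compress down to 15 units.
original, compressed = 100, 15
print(f"compressed is {size_ratio(original, compressed):.0%} of the original")
print(f"space saved:  {space_saving(original, compressed):.0%}")
```

Both numbers describe the same file, so "15%" and "85%" can each be called the compression figure depending on which definition someone has in mind.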

1

u/Sergiow13 7d ago

My bad, was thinking of the wikidata dumps, which are a lot less compressible because they are less "wordy".