AI bots strain Wikimedia as bandwidth surges 50%


Dark Jaguar

Ars Legatus Legionis
11,758
"All your base, are belongs to us" - AI 2025

Truthfully, while I do want AI to make the lives of people better, I do not like how corporations are going about it.
This article is about how AI is making our lives WORSE.

AI isn't improving our lives at all, and in fact, I'm convinced that under capitalism, it can't, ever.
 
Score: 154 (170 / -16)
In the past (around 2016 or so), Wikipedia saw a significant decrease in server costs because so many people were now getting their facts from the Google Knowledge Panel, which Google proxied from Wikipedia without Wikipedia needing to serve the page. At the same time, though, Wikipedia's cost per user rose significantly, because the people still browsing the site skewed toward logged-in editors, who are served a heavyweight page that costs more to produce. Anyway, my point is that Wikimedia's costs have long been tied to the actions of major corporations, so perhaps Wikimedia can work with those same corporations to distribute its media for free.
 
Score: 31 (34 / -3)

muddledzen

Wise, Aged Ars Veteran
239
Subscriptor
Really hope we end up with a robust honeypot system as a first layer, like Cloudflare and others are working on, and then a way for places like Wikipedia and other FOSS projects to charge these companies an infrastructure tax to get out of the honeypot.

Of course that would require cooperation, laws, and general ethical behavior, and we're in a shortage of all three these days.
 
Score: 68 (68 / 0)

case_ratchet

Ars Centurion
327
Subscriptor
I read the article.

I understand AI is the future. You can be mad about it, dislike it, speak out against AI, but AI is happening. I would rather more people got involved as opposed to just being upset and letting others define the future of AI.
I’m sick of AI inevitability rhetoric. No technology is inevitable. If it’s not broadly useful or efficient, or if it’s just a fucking bummer in the end, it can, in fact, go away.
 
Score: 199 (206 / -7)
I’m sick of AI inevitability rhetoric. No technology is inevitable. If it’s not broadly useful or efficient, or if it’s just a fucking bummer in the end, it can, in fact, go away.

Hate to break it to you, but increased AI usage is also linked to decreased critical thinking skills. It’s literally making humanity stupider…hence, we are most likely screwed.
 
Score: 91 (97 / -6)
Something, once invented, does not go away. AI is such a thing.

Even if all the corporations abandon AI, there would still be open source developments looking to advance AI. AI is not going anywhere. It has begun.
We've banned technology before, and we can do it again. For example, human cloning is outlawed. Gun control laws exist in many nations (though sadly not all), and, of course, nuclear weapons research is highly restricted by those who... well, already have the nukes. Heck, in the '90s, US export rules for a time limited how many "bits" of encryption companies could ship in their software. Now, will such laws prevent individual people from using AI at home? No, of course not; I'm not that naive. But will they stop large corporations from using it, given that companies that large would be caught instantly on every attempt? Yes, they would. The whole "genie is out of the bottle" line isn't a literal fact of existence; it's more a comment on geopolitical and military realities.

And, did you actually CARE what the article said? You haven't commented at all on the content of the article, or acknowledged that this is a purely harmful case of it. What would you consider a "helpful" version of this kind of data scraping, that is to say, outright theft of authorship?

Remember, copyright law is meant to protect creators, and so if creators are having all their work digested and then spat back out in a billion permutations such that the original work can't even be singled out in the noise, wouldn't you say that creators are not being protected at all unless such use of their works is outlawed?
 
Score: 70 (77 / -7)
This is dumb. A compressed copy of Wikipedia is only around 25 GB. They should download it once and reuse that.
Me: spend one day and a few hundred USD of development time to download a Wikipedia dump once and ingest it.
vs.
You: spend millions?

Of course I am going to save the few hundred.

Honestly, I find it very unlikely that external costs ever cross the minds of the psychopathic leaders of the tech industry.
 
Score: 41 (45 / -4)
I wonder how big an anti-bot fence needs to be around a site before you could argue that accessing it counts as "hacking" because it circumvents those measures, and thus prompt better legal action than merely playing defense against soft DDoSing.
So it's OK for AI companies to steal others' works, but then charge you for prompts? No, that's not how it works, AI companies. Sites need firewalls that block them and send cease-and-desist complaints to those companies' ISPs and legal departments. "It's stealing. Public domain or not... because they are reselling access."
 
Score: 44 (48 / -4)

muddledzen

Wise, Aged Ars Veteran
239
Subscriptor
Something, once invented, does not go away. AI is such a thing.

Even if all the corporations abandon AI, there would still be open source developments looking to advance AI. AI is not going anywhere. It has begun.
Lots of things have in fact gone away over the years because they ended up being nowhere near as broadly useful as their marketing hype suggested, or because other things made them expensive and irrelevant.

3D TV is one of the more recent examples.
 
Score: 93 (95 / -2)
Lots of things have in fact gone away over the years because they ended up being nowhere near as broadly useful as their marketing hype suggested, or because other things made them expensive and irrelevant.

3D TV is one of the more recent examples.
I love watching theatrical 3D movies that are done right, with bright, real IMAX or even 4DX screens. But in the home, the glasses I’ve tried, active or passive, just aren’t good enough, dimness being one of the biggest issues.
 
Score: -8 (9 / -17)

traumadog

Ars Tribunus Angusticlavius
8,166
This article is about how AI is making our lives WORSE.

AI isn't improving our lives at all, and in fact, I'm convinced that under capitalism, it can't, ever.
Welcome to end-stage capitalism. Where corporate profits exceed any protection of public good in terms of motivation.
 
Score: 26 (32 / -6)

DarthSlack

Ars Legatus Legionis
20,710
Subscriptor++
This is dumb. A compressed copy of Wikipedia is only around 25 GB. They should download it once and reuse that.

But that would require them to think through the problem. It's much easier to write a script that just scrapes any and all sites it can find and let the peasants worry about the bandwidth costs.
 
Score: 45 (45 / 0)

PhilipStorry

Ars Scholae Palatinae
1,091
Subscriptor++
The AI companies want to use it? Let them pay for it. Especially as they are supposedly making money off of it.

AI is making money?

That's news to me. From everything I've seen, it's basically a money pit. Big companies are taking the losses simply because they want to be the last one standing.

The only way to make money with AI these days is to be writing or talking about it. The companies that actually develop and run it are very far away from any kind of sustainable business model with it.
 
Score: 55 (58 / -3)

caramelpolice

Ars Scholae Palatinae
1,410
AI has improved my life. I work in AI (not this kind; I work for a large brick-and-mortar retailer), so it pays the bills, and Copilot has genuinely made my life easier. I generate like half the work I turn in with LLMs. I know other non-AI programmers who feel the same way about Copilot and other code-generating AIs.

I get it: the haters here will say "it's only a matter of time before it replaces you," and that's unknowable, so I won't comment on it. But I will say that AI has both increased the amount of money I make and reduced the amount of work I have to do. That's saying a lot.
Plagiarism is generally more efficient than doing things yourself, yes.
 
Score: 67 (71 / -4)
I read the article.

I understand AI is the future. You can be mad about it, dislike it, speak out against AI, but AI is happening. I would rather more people got involved as opposed to just being upset and letting others define the future of AI.
Just like 3D, VR, metaverse, blockchain, cryptocoins and NFTs are the future?

Just because marketing is pushing AI as the future does not mean they'll make fetch happen.

After the bubble pops, there will probably be niche uses that remain. AGI? Possibly only when it's running on quantum PCs powered by cold fusion.
 
Score: 65 (67 / -2)
Lots of things have in fact gone away over the years because they ended up being nowhere near as broadly useful as their marketing hype suggested, or because other things made them expensive and irrelevant.

3D TV is one of the more recent examples.
Heck, we don't even have to be obscure. CRTs just aren't being made anywhere anymore, and those DO have some modern use cases, just not enough to outweigh the advantages of flat-screen displays.
 
Score: 19 (21 / -2)

CliffJerrison

Smack-Fu Master, in training
16
All this for bots that still don't know how to understand or prioritize the information so they'll still randomly make shit up. Companies are pouring so much gas in this car before they've figured out how to make the wheels circular.

But also: you can download Wikipedia. There's a bunch of mirrors that will give you the full contents of a recent version of the site. AI companies could download it once and use that, instead of crawling the site directly and putting a continuous strain on it. But that's, like, work I guess.

Anyway, I guess it's all worth it because AI Is The Future. Now pardon me, I need to take my self-driving car to the Hyperloop station while I buy some NFTs in the Metaverse.
 
Score: 45 (46 / -1)

shahms

Smack-Fu Master, in training
51
Better coordination between AI developers and resource providers could potentially resolve these issues

They are actively ignoring the existing coordination mechanisms and clearly have no interest in resolving these issues. Anything which would moderate their data scraping is a barrier to their business model.
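For reference, the most basic of those existing coordination mechanisms is robots.txt, plus the widely used (but not standardized) Crawl-delay directive. A minimal sketch of what honoring them takes, using Python's standard `urllib.robotparser`; the robots.txt rules, the `ExampleAIBot` user agent, and the example.org URLs are all made up for illustration and are not Wikimedia's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration. A live crawler would instead call
# parser.set_url("https://example.org/robots.txt") and parser.read().
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /w/
Crawl-delay: 5
"""


def polite_plan(agent: str, urls: list[str]) -> tuple[list[str], float]:
    """Return the URLs `agent` may fetch and the seconds to wait between requests."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    allowed = [url for url in urls if parser.can_fetch(agent, url)]
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = parser.crawl_delay(agent)
    wait = float(delay) if delay is not None else 0.0
    return allowed, wait
```

With these made-up rules, `ExampleAIBot` may fetch nothing at all, while any other agent may fetch `/wiki/` pages but not `/w/` ones and is asked to wait 5 seconds between requests. The point is that checking this costs a crawler a few lines of standard-library code.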
 
Score: 28 (28 / 0)

caramelpolice

Ars Scholae Palatinae
1,410
Just like 3D, VR, metaverse, blockchain, cryptocoins and NFTs are the future?

Just because marketing is pushing AI as the future does not mean they'll make fetch happen.
A lot of rich, powerful, absolutely delusional billionaires who run or invest in tech companies are convinced that the future of humanity is essentially 'turning us all into AI models that live forever in the metaverse and use a blockchain-based economy'. They call this belief 'longtermism' and 'rationalism' despite it being an absurd fantasy that they will kill us all to pursue.
 
Score: 36 (38 / -2)
I'm a little surprised that they don't appear to be using a CDN of any sort, like Cloudflare, Akamai, or even AWS. My employer uses Akamai, and it has a list of known AI bots that we can block outright if we choose, so that traffic never even establishes a connection to our origin. I'm willing to bet Cloudflare and many other CDN providers have similar capabilities.
 
Score: -4 (4 / -8)

Nathan2055

Ars Centurion
359
Subscriptor
It’s honestly bizarre just how bad the crawling problem has become.

Search engine crawlers were at least smart enough to consider how often pages are updated and to prioritize crawling the pages that actually change more frequently. AI crawlers seemingly don’t do that, and they have been seen constantly crawling very rarely viewed pages.
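The update-aware prioritization described above is commonly done with an adaptive revisit interval. A minimal sketch, assuming a simple halve-on-change, double-on-no-change rule (one common heuristic, not a description of how any particular search engine works); `MIN_INTERVAL`, `MAX_INTERVAL`, and the simulated pages are hypothetical values chosen for illustration.

```python
import hashlib
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MIN_INTERVAL = 3600.0        # assumed floor: never revisit more than hourly
MAX_INTERVAL = 30 * 86400.0  # assumed ceiling: revisit at least every 30 days


@dataclass(order=True)
class Page:
    next_visit: float  # heap key: simulated time (seconds) of the next fetch
    url: str = field(compare=False)
    interval: float = field(default=86400.0, compare=False)  # current revisit interval
    last_digest: Optional[str] = field(default=None, compare=False)


def reschedule(page: Page, now: float, content: str) -> None:
    """Halve the interval if the page changed since the last fetch, double it if not."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if page.last_digest is None:
        factor = 1.0  # first fetch: nothing to compare against yet
    elif digest != page.last_digest:
        factor = 0.5
    else:
        factor = 2.0
    page.interval = min(MAX_INTERVAL, max(MIN_INTERVAL, page.interval * factor))
    page.last_digest = digest
    page.next_visit = now + page.interval


def crawl(pages: List[Page], fetch: Callable[[str, float], str], until: float) -> Dict[str, int]:
    """Simulate crawling until `until`; `fetch(url, now)` returns content. Returns fetches per URL."""
    heap = list(pages)
    heapq.heapify(heap)
    counts = {page.url: 0 for page in pages}
    while heap and heap[0].next_visit <= until:
        page = heapq.heappop(heap)  # the page whose next visit is due soonest
        counts[page.url] += 1
        reschedule(page, page.next_visit, fetch(page.url, page.next_visit))
        heapq.heappush(heap, page)
    return counts
```

In a simulated 90 days with one page that changes on every fetch and one that never changes, the busy page converges to hourly visits (more than 2,000 fetches) while the static page is fetched only 7 times, which is the behavior the comment says AI crawlers lack.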

They also haven’t bothered doing anything that might reduce their load on the web (or even their own bandwidth load), like ingesting local archives of data (such as Wikipedia's freely licensed, freely available database dumps) or using sources like Common Crawl to reduce the infrastructure strain they’re causing.
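For contrast, consuming a local dump takes very little code. A minimal sketch, assuming the usual MediaWiki XML export layout (`<page>` elements containing `<title>` and `<revision><text>`), bz2 compression, and only Python's standard library; the function names are hypothetical. It streams pages one at a time instead of loading the whole file.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to namespaced tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(dump: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs out of a bz2-compressed MediaWiki XML export."""
    with bz2.open(dump, "rb") as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""  # last <text> wins if a page has several revisions
            yield title, text
            elem.clear()  # drop the finished page's children so memory use stays modest


def iter_pages_in_memory(compressed: bytes) -> Iterator[Tuple[str, str]]:
    """Convenience wrapper for a small compressed sample already held in memory."""
    return iter_pages(io.BytesIO(compressed))
```

Against a real download it would be used as `with open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:` followed by `for title, text in iter_pages(f): ...`; that filename follows Wikimedia's usual naming but is an assumption here. The whole dump is fetched once, and every later pass reads it from local disk.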

Given that open source AI projects have been doing some pretty impressive things by significantly reducing the amount of training data and focusing on improving the training process instead, and we’re seeing Chinese commercial AI companies with limited access to latest-generation hardware start to adopt similar strategies to pretty extreme success (see: DeepSeek), I really have a hard time believing that this level of mass crawling is even necessary for AI development.

It would honestly be really funny if the end result of the hardware export bans and sites locking down programmatic access is that the smaller, more efficient AI companies wind up beating out the giants whose only strategy seems to be to Ctrl+S the planet and hope they figure something out later.

It reminds me of all the controversy and legislation over embryonic stem cells, only for scientists to eventually find out that ordinary adult cells can be reprogrammed into induced pluripotent stem cells, which sidestep the ethical objections and are comparably versatile. In the end, none of the debates were even necessary, and they just resulted in negative publicity that the field still hasn’t recovered from to this day.

I think that AI mass crawling will probably end up the same way when all of this is said and done. It’s just frustrating that we have to watch the Internet get destroyed first.
 
Score: 47 (47 / 0)

Resolute

Ars Tribunus Angusticlavius
7,286
Just like 3D, VR, metaverse, blockchain, cryptocoins and NFTs are the future?

Just because marketing is pushing AI as the future does not mean they'll make fetch happen.

After the bubble pops, there will probably be niche uses that remain. AGI? Possibly only when it's running on quantum PCs powered by cold fusion.
There's also the fact that, in their ignorance, that user is conflating LLMs with AI. AI, broadly, will add value across many industries.

LLMs are going to be the 3D TV - something overhyped and almost completely useless for 99% of the population.
 
Score: 32 (35 / -3)