The Web Is Broken – Botnet Part 2

(jan.wildeboer.net)

Comments

aorth 20 April 2025
In the last week I've had to deal with two large-scale influxes of traffic on one particular web server in our organization.

The first involved requests from 300,000 unique IPs in a span of a few hours. I analyzed them and found that ~250,000 were from Brazil. I'm used to using ASNs to block network ranges sending this kind of traffic, but in this case they were spread thinly over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).

A few days later this same web server was on fire again. I performed the same analysis on IPs and found a similar number of unique addresses, but spread across Turkey, Russia, Argentina, Algeria and many more countries. What is going on?! Eventually I think I found a pattern to identify the requests, in that they were using ancient Chrome user agents. Chrome 40, 50, 60 and up to 90, all released 5 to 15 years ago. Then, just before I could implement a block based on these user agents, the traffic stopped.
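
A minimal sketch of the user-agent check described above, assuming the pattern really is "claims to be Chrome 90 or older". The cutoff of 90 comes from the comment; the function name and regex are illustrative, not what was actually deployed.

```python
import re

# Matches the major version in a "Chrome/NN.x.y.z" user-agent token.
# Note: other Chromium-based browsers also send this token.
CHROME_RE = re.compile(r"Chrome/(\d+)\.")

# Assumption: anything at or below Chrome 90 counts as "ancient" here.
MAX_STALE_MAJOR = 90


def is_stale_chrome(user_agent: str, max_stale_major: int = MAX_STALE_MAJOR) -> bool:
    """Return True if the UA claims a Chrome major version <= max_stale_major."""
    match = CHROME_RE.search(user_agent)
    if match is None:
        return False  # no Chrome token at all, so this rule does not apply
    return int(match.group(1)) <= max_stale_major
```

A check like this would also catch the few real users on very old browsers, so it probably belongs behind a rate limit or challenge rather than a hard block.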

In both cases the traffic from datacenter networks was limited because I already rate limit a few dozen of the larger ones.

Sysadmin life...

hubraumhugo 20 April 2025
We all agree that AI crawlers are a big issue as they don't respect any established best practices, but we rarely talk about the path forward. Scraping has been around for as long as the internet, and it was mostly fine. There are many very legitimate use cases for browser automation and data extraction (I work in this space).

So what are potential solutions? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infra costs [0].

How can we enable beneficial automation while protecting against abusive AI crawlers?

[0] https://arxiv.org/abs/2311.10911

zahlman 19 April 2025
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.

I imagine that e.g. YouTube would be happy to agree with this. Not that it would turn them against AI generally.

Quarrel 20 April 2025
FWIW, Trend Micro wrote up a decent piece on this space in 2023.

It is still a pretty good lay-of-the-land.

https://www.trendmicro.com/vinfo/us/security/news/vulnerabil...

aucisson_masque 19 April 2025
It's interesting but so far there is no definitive proof it's happening.

People are jumping to conclusions a bit fast here. Yes, technically it's possible, but this kind of behavior would be relatively easy to spot because the app would have to make direct connections to the website it wants to scrape.

Your calculator app for instance connecting to CNN.com ...

iOS has an App Privacy Report where one can check what connections each app makes, how often, when the last one was, etc.

Android by Google doesn't have such a useful feature of course, but you can run a third-party firewall like PCAPdroid, which I highly recommend.

macOS (Little Snitch).

Windows (Fort Firewall).

Not everyone runs these apps obviously, only the most nerdy like myself, but we're also the kind of people who would report on an app using our device to make what is, in fact, a zombie or bot network.

I'm not saying it's necessarily false but imo it remains a theory until proven otherwise.

jeroenhd 19 April 2025
> So there is a (IMHO) shady market out there that gives app developers on iOS, Android, MacOS and Windows money for including a library into their apps that sells users network bandwidth

AKA "why do Cloudflare and Google make me fill out these CAPTCHAs all day"

I don't know why Play Protect/MS Defender/whatever Apple has for antivirus don't classify apps that embed such malware as such. It's ridiculous that this is allowed to go on when detection is so easy. I don't know a more obvious example of a trojan than an SDK library making a user's device part of a botnet.

Liftyee 19 April 2025
I don't know if I should be surprised about what's described in this article, given the current state of the world. Certainly I didn't know about it before, and I agree with the article's conclusion.

Personally, I think the "network sharing" software bundled with apps should fall into the category of potentially unwanted applications along with adware and spyware. All of the above "tag along" with something the user DID want to install, and quietly misuse the user's resources. Proxies like this definitely have an impact for metered/slow connections - I'm tempted to start Wireshark'ing my devices now to look for suspicious activity.

There should be a public repository of apps known to have these shady behaviours. Having done some light web scraping for archival/automation before, it's a pity that it'll become collateral damage in the anti-AI-botfarm fight.

karmanGO 19 April 2025
Has anyone tried to compile a list of software that uses these libraries? It would be great to know what apps to avoid
api 19 April 2025
This is nasty in other ways too. What happens when someone uses these B2P residential proxies to commit crimes that get traced back to you?

Anything incorporating anything like this is malware.

kastden 19 April 2025
Are there any lists with known c&c servers for these services that can be added to Pihole/etc?
__MatrixMan__ 19 April 2025
The broken thing about the web is that in order for data to remain readable, a unique sysadmin somewhere has to keep a server running in the face of an increasingly hostile environment.

If instead we had a content addressed model, we could drop the uniqueness constraint. Then these AI scrapers could be gossiping the data to one another (and incidentally serving it to the rest of us) without placing any burden on the original source.

Having other parties interested in your data should make your life easier (because other parties will host it for you), not harder (because now you need to work extra hard to host it for them).
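
To make the content-addressed idea concrete, here is a minimal sketch in which the address of a blob is simply its SHA-256 digest. `ContentStore` and `verify` are hypothetical names for illustration; real systems add chunking, discovery, and transport on top of this.

```python
import hashlib


def verify(address: str, data: bytes) -> bool:
    """Check that data really is the content named by address."""
    return hashlib.sha256(data).hexdigest() == address


class ContentStore:
    """Toy in-memory store keyed by the SHA-256 digest of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes, not from who hosts them.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        if not verify(address, data):
            raise ValueError("content does not match its address")
        return data
```

Because `verify` needs only the address and the bytes, a reader can accept a copy from any peer that holds it, which is what removes the dependence on one unique server.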

reconnecting 19 April 2025
Residential IP proxies have some weaknesses. One is that they often change IP addresses during a single web session. Second, if the IPs come from the same proxy provider, they are often concentrated within a single ASN, making them easier to detect.
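
The first weakness can be sketched as a simple check over request logs. This is an illustration only, assuming each log event carries a session ID and a client IP; the function name and the threshold of two IPs are made up for the example and are not taken from any particular product.

```python
from collections import defaultdict


def sessions_with_ip_churn(events, max_ips=2):
    """Return {session_id: set_of_ips} for sessions seen from more than max_ips addresses.

    events is an iterable of (session_id, ip) pairs.
    """
    ips_by_session = defaultdict(set)
    for session_id, ip in events:
        ips_by_session[session_id].add(ip)
    return {
        session_id: ips
        for session_id, ips in ips_by_session.items()
        if len(ips) > max_ips
    }
```

Mobile users legitimately hop between Wi-Fi and cellular, so a signal like this is probably better treated as one input to a score than as grounds for an outright block.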

We are working on an open‑source fraud prevention platform [1], and detecting fake users coming from residential proxies is one of its use cases.

[1] https://www.github.com/tirrenotechnologies/tirreno

Pesthuf 19 April 2025
We need a list of apps that include these libraries, and every malware scanner - including Windows Defender, Play Protect and whatever Apple calls theirs - needs to put infected applications into quarantine immediately. Just because it's not directly causing damage to the device the malware is running on, that doesn't mean it's not malware.
reincoder 20 April 2025
I work for IPinfo (a commercial service). We offer a residential proxy detection service, but it costs money.

If you are being bombarded by suspicious IP addresses, please consider using our free service and blocking IP addresses by ASN or Country. I think ASN is a common parameter for malicious IP addresses. If you do not have time to explore our services/tools (it is mostly just our CLI: https://github.com/ipinfo/cli), simply paste the IP addresses (or logs) in plain text, send it to me and I will let you know the ASNs and corresponding ranges to block.
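
For readers who want to see what "grouping IPs by ASN" amounts to, here is a rough sketch using only the standard library. The prefix-to-ASN table is entirely hypothetical (documentation address ranges and private-use AS numbers); in practice that mapping would come from an IP data provider, and this code does not use or represent the IPinfo CLI.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-ASN table, for illustration only.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64512,
    ipaddress.ip_network("198.51.100.0/24"): 64513,
}


def count_by_asn(ips, table=PREFIX_TO_ASN):
    """Count addresses per ASN; addresses with no matching prefix are counted under None."""
    counts = Counter()
    for raw in ips:
        addr = ipaddress.ip_address(raw)
        asn = next((asn for net, asn in table.items() if addr in net), None)
        counts[asn] += 1
    return counts
```

Calling `count_by_asn(...).most_common(10)` on the addresses from a log would then show whether a handful of networks dominate, or whether the traffic is spread thinly, as in the 6,000+ ASN case described earlier in the thread.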

at0mic22 19 April 2025
Strange that HolaVPN (i.e. Bright Data) is not mentioned. They've been using user hosts for those purposes for decades, and also selling proxies en masse. Fun fact: they don't have any servers for the VPN. All the VPN traffic is routed through ... other users!
armchairhacker 19 April 2025
> I am now of the opinion that every form of web-scraping should be considered abusive behaviour and web servers should block all of them. If you think your web-scraping is acceptable behaviour, you can thank these shady companies and the “AI” hype for moving you to the bad corner.

Why jump to that conclusion?

If a scraper clearly advertises itself, follows robots.txt, and has reasonable backoff, it's not abusive. You can easily block such a scraper, but then you're encouraging stealth scrapers because they're still getting your data.

I'd block the scrapers that try to hide and waste compute, but deliberately allow those that don't. And maybe provide a sitemap and API (which besides being easier to scrape, can be faster to handle).
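
As a rough illustration of what "advertises itself, follows robots.txt, and has reasonable backoff" can look like, here is a sketch built on Python's standard `urllib.robotparser`. The `ExampleBot/1.0` user agent and the helper names are hypothetical, and the robots.txt text is assumed to have been fetched already.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity; a real scraper would also publish contact details.
USER_AGENT = "ExampleBot/1.0"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the caller has already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(rules: RobotFileParser, url: str) -> bool:
    """True if robots.txt permits USER_AGENT to request this URL."""
    return rules.can_fetch(USER_AGENT, url)


def backoff_delays(base=1.0, factor=2.0, retries=5, cap=60.0):
    """Exponentially growing waits in seconds, capped, for use after failed or throttled requests."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]
```

A scraper built this way would check `may_fetch` before every request, honour `rules.crawl_delay(USER_AGENT)` when the site sets one, and sleep through `backoff_delays()` whenever the server answers with errors or throttling.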

amiga-workbench 19 April 2025
What is the point of app stores holding up releases for review if they don't even catch obvious malware like this?
arewethereyeta 19 April 2025
I have some success in catching most of them at https://visitorquery.com
pton_xd 19 April 2025
I thought the closed-garden app stores were supposed to protect us from this sort of thing?
dspillett 20 April 2025
> So there is a (IMHO) shady market out there that gives app developers on iOS, Android, MacOS and Windows money for including a library into their apps that sells users network bandwidth.

This is yet another reason why we need to be wary of popular apps, add-ons, extensions, and so forth changing hands, by legitimate sale or more nefarious methods. Initially innocent utilities can be quickly coopted into being parts of this sort of scheme.

greesil 19 April 2025
How would I know if an app on my device was doing this?
hinkley 20 April 2025
When the enshittification initially hit the fan, I had little flashbacks of Phil Zimmermann talking about Web of Trust and amusing myself thinking maybe we need humans proving they're humans to other humans so we know we aren't arguing with LLMs on the internet or letting them scan our websites.

But it just doesn't scale to internet size so I'm fucked if I know how we should fix it. We all have that cousin or dude in our high school class who would do anything for a bit of money, including introducing his 'friend' Paul, who is in fact a bot whose owner paid for the lie. And not like enough money to make it a moral dilemma, just drinking money or enough for a new video game. So once you get past about 10,000 people you're pretty much back where we are right now.

rsedgwick 19 April 2025
I think tech can still be beautiful in a less grandiose and "omniparadisical" way than people used to dream of. "A wide open internet, free as in speech this, free as in beer that, open source wonders, open gardens..." Well, there are a lot of incentives that fight that, and game theory wins. Maybe we download software dependencies from our friends, the ones we actually trust. Maybe we write more code ourselves--more homesteading families that raise their own chickens, jar their own pickled carrots, and code their own networking utilities. Maybe we operate on servers we own, or our friends own, and we don't get blindsided by news that the platforms are selling our data and scraping it for training.

Maybe it's less convenient and more expensive and onerous. Do good things require hard work? Or did we expect everyone to ignore incentives forever while the trillion-dollar hyperscalers fought for an open and noble internet and then wrapped it in affordable consumer products to our delight?

It reminds me of the post here a few weeks ago about how Netflix used to be good and "maybe I want a faster horse" - we want things to be built for us, easily, cheaply, conveniently, by companies, and we want those companies not to succumb to enshittification - but somehow when the companies just follow the game theory and turn everything into a TikToky neural-networks-maximizing-engagement-infinite-scroll-experience, it's their fault, and not ours for going with the easy path while hoping the corporations would not take the easy path.

yungporko 19 April 2025
it's funny, i've never heard of or thought about the possibility of this happening but actually in hindsight it seems almost too obvious to not be a thing.
neilv 19 April 2025
Couldn't Apple and Google (and, to a lesser extent, Microsoft) pretty easily shut down almost all the apps that steal bandwidth?
panny 19 April 2025
>Apple, Microsoft and Google should act.

Do nothing, win.

They are the primary beneficiaries buying this data since they are the largest AI players.

panstromek 19 April 2025
I'd expect this to be against App Store and Google Play rules, they are very picky.
matheusmoreira 19 April 2025
"Peer-to-business network"! Amazing. uBlock Origin gets rid of this, right?
_ink_ 20 April 2025
How can I detect such behaviour on my devices / in my home network?
theteapot 19 April 2025
Are ad blockers like AdBlock, uBlock effective against these?
proxy_err 19 April 2025
It's a fair point but very dynamic to sort out. This needs a full research team to figure out. Or you know.. all of us combined!! It is definitely a problem.

TINFOIL: I've sometimes wondered if Azure or AWS used bots to push site traffic hits to generate money... they know you are hosted with them.. They have your info.. Send out bots to drive micro accumulation. Slow boil..

badmonster 19 April 2025
do you think there’s a realistic path forward for better transparency or detection—maybe at the OS level or through network-level anomaly detection?
y42 20 April 2025
Let me get this straight: we want computers knowing everything, to solve current and future problems, but we don't want to give them access to our knowledge?
jt2190 19 April 2025
I’m really struggling to understand how this is different than malware we’ve had forever. Can someone explain what’s novel about this?
jgalt212 20 April 2025
I blame the VCs. They don't stop, and implicitly encourage, website-crushing scrapers among their funded ventures.

It's not a crime if we do it with an app

https://pluralistic.net/2025/01/25/potatotrac/#carbo-loading

jonplackett 19 April 2025
How is this not just illegal? Surely there’s something in GDPR that makes this not allowed.
vlan121 19 April 2025
when the shit hits the fan, this seems like the product.
ChrisMarshallNY 19 April 2025
> So if you as an app developer include such a 3rd party SDK in your app to make some money — you are part of the problem and I think you should be held responsible for delivering malware to your users, making them botnet members.

I suspect that this goes for many different SDKs. Personally, I am really, really sick of hearing "That's a solved problem!", whenever I mention that I tend to "roll my own," as opposed to including some dependency, recommended by some jargon-addled dependency addict.

Bad actors love the dependency addiction of modern developers, and have learned to set some pretty clever traps.

156287745637 20 April 2025
AI scrapers and "sneaker bots" are just the tip of the iceberg. Why are all these entities concentrated and metastasizing from just a few superhubs? Why do they look, smell and behave like state-level machinery? If you've researched you'll know exactly what I'm talking about.

Unless complicit, tech leaders (Apple, Google, Microsoft) have a duty to respond swiftly and decisively. This has been going on far too long.

gpi 20 April 2025
"Infatica is partnered with Bitdefender, a global leader in cybersecurity, to protect our SDK users from malicious web traffic and content, including infected URLs, untrusted web pages, fraudulent and phishing links, and more."

That's not good.