How Twitter is (probably) crawling the Internet for AI
As a bored sysadmin you might SSH into one of your servers and run htop or tail -f access.log.
Last Sunday was one of those tail -f access.log days.
You take a cursory look at your web server logs slowly scrolling past and see a bunch of requests come in. They look a bit odd.
142.147.166.136 - - [XX/Dec/2025:XX:XX:XX +0000] "GET /some/path2 HTTP/1.1" 200 1953 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3" "-"
104.37.31.15 - - [XX/Dec/2025:XX:XX:XX +0000] "GET /some/path3 HTTP/1.1" 200 2091 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "-"
23.226.210.75 - - [XX/Dec/2025:XX:XX:XX +0000] "GET /some/path4 HTTP/1.1" 200 1786 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "-"
170.23.18.148 - - [XX/Dec/2025:XX:XX:XX +0000] "GET /some/path6 HTTP/1.1" 200 2415 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5" "-"
8.160.39.134 - - [XX/Dec/2025:XX:XX:XX +0000] "GET /some/path1 HTTP/1.1" 200 83209 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36" "-"
85.254.140.83 - - [XX/Dec/2025:XX:XX:XX +0000] "GET /some/path5 HTTP/1.1" 200 3441 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3" "-"

"Trailer/93.3.8652.5"?

Investigating bot traffic
Plugging that last token into Google returns little of use, except for a single German blog post [1] about one IP with clearly bot-like behavior that cycles through multiple user agents very similar to ours.
Now "stealth" web bots are not a new phenomenon by any means. But given a good clue I decided to investigate a bit more.
Having established that "Trailer" is apparently not part of any normal user-agent
header I grep my logs for \sTrailer/[0-9]+. This matches more than 10000 requests
over the last month. Huh.
Conversely, the spectrum of pages visited is extremely wide. More than 80% of URLs were visited exactly once with this user agent, notably including some very old stuff that usually nobody looks at. The IP addresses paint a similar picture: 91% of IPs make just one request and never visit again.
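The grep and the "visited only once" percentages above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming the log format shown in the excerpt at the top; the function name trailer_stats and the returned keys are made up for illustration and are not taken from any existing tool.

```python
import re
from collections import Counter

# Matches the log format from the excerpt: client IP, timestamp, request line,
# status, size, referer, user agent. Anything after the user agent is ignored.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
# Same pattern as the grep: whitespace, "Trailer/", then digits.
TRAILER_RE = re.compile(r'\sTrailer/[0-9]+')


def trailer_stats(lines):
    """Count 'Trailer' requests and the share of URLs/IPs seen exactly once."""
    urls, ips = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or not TRAILER_RE.search(m["ua"]):
            continue  # unparsable line or a different user agent
        urls[m["path"]] += 1
        ips[m["ip"]] += 1

    def once_share(counter):
        # Fraction of distinct keys that occur exactly once (0.0 if empty).
        if not counter:
            return 0.0
        return sum(1 for n in counter.values() if n == 1) / len(counter)

    return {
        "requests": sum(urls.values()),
        "urls_once": once_share(urls),
        "ips_once": once_share(ips),
    }
```

Feeding it an open log file, for example trailer_stats(open("access.log")), returns the request count plus the two fractions. Lines that do not parse are silently skipped, so it is worth comparing the count against a plain grep.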
For a more useful overview you should plug the IPs into your favorite IP-lookup tool and check which networks (AS) they belong to, which I did. The most common ones are as follows:
2517 "AS212238 Datacamp Limited"
2210 "AS9009 M247 Europe SRL"
2090 "AS3257 GTT Communications Inc."
903 "AS210906 UAB \"Bite Lietuva\""
882 "AS203020 HostRoyale Technologies Pvt Ltd"
723 "AS62874 Web2Objects LLC"
622 "AS7979 Servers.com, Inc."
If these do not strike you as ISPs you usually use at home, you'd be right. They're datacenters and carriers.
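If you would rather script the per-network tally than paste IPs into a lookup tool, the counting step could look roughly like the sketch below. It assumes you already have a prefix-to-AS table from some IP database; the function tally_by_network and the table layout are hypothetical. In the usage comment, only 69.12.56.0/21 and its AS come from this article, the second entry uses reserved documentation ranges as a stand-in.

```python
import ipaddress
from collections import Counter


def tally_by_network(ips, prefix_table):
    """Count entries of `ips` per network label, using longest-prefix match.

    prefix_table maps CIDR strings to labels (hypothetical layout), e.g.
        {"69.12.56.0/21": "AS63179 Twitter Inc.",
         "192.0.2.0/24": "AS64500 Example"}   # documentation range, stand-in
    """
    # Most specific prefixes first, so the first hit is the longest match.
    nets = sorted(
        ((ipaddress.ip_network(cidr), label) for cidr, label in prefix_table.items()),
        key=lambda item: item[0].prefixlen,
        reverse=True,
    )
    counts = Counter()
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        label = next((lbl for net, lbl in nets if addr in net), "unknown")
        counts[label] += 1
    # Sorted by count, highest first, like the list above.
    return counts.most_common()
```

Whether you pass in every request's IP or a de-duplicated set decides if the numbers mean requests or distinct addresses. The linear scan over prefixes is fine for a handful of networks but would be slow against a full routing table.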
The bottom of the list however turned out to be much more interesting:
4 "AS202914 Adeo Datacenter ApS"
4 "AS63179 Twitter Inc."
1 "AS44244 Iran Cell Service and Communication Company"
Twitter???
I initially assumed this to be a false positive and went back to the unprocessed logs, only to discover that, no, Twitter had indeed visited with this peculiar user agent. Specifically, from 69.12.56.x, 69.12.57.x and 69.12.58.x.
I took a step back and searched the logs for all requests from this IP network. The result looked as follows:
69.12.59.37 - - [06/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.3" "-"
69.12.56.29 - - [06/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.3" "-"
69.12.59.31 - - [06/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 OPR/117.0.0.0" "-"
69.12.59.19 - - [06/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "-"
69.12.57.39 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.3" "-"
69.12.59.33 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1" "-"
69.12.59.36 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.3" "-"
69.12.56.7 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3" "-"
69.12.56.34 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.3" "-"
69.12.58.21 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3" "-"
69.12.58.1 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5" "-"
69.12.56.17 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1" "-"
69.12.58.47 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5" "-"
69.12.58.4 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3" "-"
69.12.56.47 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1" "-"
69.12.59.24 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3" "-"
69.12.57.2 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5" "-"
69.12.57.24 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "-"
69.12.59.25 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "-"
69.12.56.44 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1" "-"
69.12.56.11 - - [07/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0" "-"
69.12.56.36 - - [08/Nov/2025:XX:XX:XX +0000] "..." XXX X "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5" "-"
The exact same bot traffic pattern. This network showed up on these three days only and was never seen again.
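Searching the logs for a whole network, as done above, is awkward with grep alone because a /21 does not line up with a simple text prefix. A minimal sketch using Python's ipaddress module follows; it assumes the client IP is the first space-separated field of each line, and the names TWITTER_NET and lines_from_network are made up for illustration.

```python
import ipaddress

# 69.12.56.0/21 spans 69.12.56.0 through 69.12.63.255.
TWITTER_NET = ipaddress.ip_network("69.12.56.0/21")


def lines_from_network(lines, network=TWITTER_NET):
    """Yield log lines whose first field is an IP inside `network`."""
    for line in lines:
        first_field = line.split(" ", 1)[0]
        try:
            if ipaddress.ip_address(first_field) in network:
                yield line
        except ValueError:
            # First field is not an IP address (blank or malformed line).
            continue
```

Because it is a generator, it can be run over a large log file without loading everything into memory, for example by printing each line from lines_from_network(open("access.log")).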
The immediate question of "What are they doing this for?" is best answered by running a traceroute to one of the IPs (this was a coincidental finding).
Loss% Snt Last Avg Best Wrst StDev
5.|-- ae1-0.0001.prrx.02.dus.de.net.telefonica.de (62.53.13.223) 0.0% 2 7.3 7.6 7.3 7.8 0.4
6.|-- ae3-0-grtdusix1.net.telefonicaglobalsolutions.com (213.140.51.62) 0.0% 2 7.5 7.5 7.5 7.6 0.0
7.|-- 94.142.107.133 0.0% 2 9.0 8.5 8.1 9.0 0.7
8.|-- ??? 100.0 2 0.0 0.0 0.0 0.0 0.0
9.|-- X.AI-LLC.ear4.Atlanta2.Level3.net (4.14.249.150) 0.0% 2 122.5 122.7 122.5 122.8 0.2
10.|-- ??? 100.0 2 0.0 0.0 0.0 0.0 0.0
While AS63179 or 69.12.56.0/21 are only generally attributable to Twitter Inc,
their carrier (Level 3) [3] has helpfully indicated in the reverse DNS that
the company they're routing this particular network for is X.AI LLC.
So let's collect the theory so far:
X.AI is doing web scraping presumably to train Grok
their bots pretend to be browsers (badly) and do not follow robots.txt
they're using some kind of proxy, VPN or similar network
on three days in November they slipped up and used their own IP space
Connecting the dots
Now what's so special about all those other IPs visiting us?
Staring at the list of networks isn't going to help so it's time for some better tools. I started querying Spur for these IPs and this painted a very clear picture:
Oxylabs is apparently one of the largest proxy network providers. Basically, you pay them money and they let you use one of their many IPs to visit websites. They even advertise their services for use with AI, or for web scraping.
I do not want to go into legal questions, but while there are innocent use cases such as product price monitoring, any proxy network will inevitably be used for scalping, spamming and fraud. No matter how well-intentioned, the operator of a proxy network can hardly prevent this.
(In my opinion web scraping doesn't go into the "innocent" category either, but it's not as unequivocally bad as the other examples.)
Going back to the main topic: I revisited the top list of networks and found around 100 IPs from Verizon and Comcast on there too. Looking these up on Spur also indicates Oxylabs. So it appears that X.AI uses some percentage of residential proxies too.
To verify my discovery I googled to see if anyone else had connected Twitter or Grok to Oxylabs yet:

The single mention [4] I found (ironically on Twitter itself) was a tweet from August 2025 and provided a direct confirmation that Grok does web scraping with spoofed user agents through Oxylabs.
Headers
It was already apparent from the normal web server logs alone that their scraper isn't doing a good job of pretending to be a browser, but I wanted a closer look anyway.
Since the scraping was still ongoing I made some quick changes to my web server stack to capture the full request. A few hours later I had usable results and all requests with the suspicious user agent showed the same pattern:
GET /some/url HTTP/1.1
host: some.website
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.5
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5
accept: */*
If you've done deeper browser request debugging before there's an immediate red flag:
Browsers never use Accept */* to fetch web pages [5].
Their header value generally includes some variation of text/html, as well as SXG (application/signed-exchange) on Chrome.
Also very indicative is the total lack of fetch metadata headers (Sec-Fetch-*), which according to caniuse.com have long been supported in Chrome, well before version 134, and should be sent with every request.
If you want to learn more about metadata headers I can recommend this
excellent blog post [6] by Sicuranext.
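The two header checks described above are easy to turn into a rough heuristic. The following is only a sketch under stated assumptions: the function name browser_red_flags is made up, the input is a plain dict of request headers for a page navigation, and the request arrived over HTTPS (browsers only attach Sec-Fetch-* headers to requests for trustworthy URLs, so the second check says nothing about plain HTTP).

```python
def browser_red_flags(headers):
    """Return reasons why a page request does not look like a real browser.

    `headers` is a dict of request headers; names are matched
    case-insensitively. Intended for HTTPS page navigations only.
    """
    h = {name.lower(): value for name, value in headers.items()}
    flags = []

    # Browsers ask for text/html (plus more) when navigating, not a bare */*.
    if h.get("accept", "").strip() == "*/*":
        flags.append("accept is */*")

    # A user agent claiming to be Chrome should come with Sec-Fetch-* headers.
    claims_chrome = "Chrome/" in h.get("user-agent", "")
    has_fetch_metadata = any(name.startswith("sec-fetch-") for name in h)
    if claims_chrome and not has_fetch_metadata:
        flags.append("no Sec-Fetch-* headers")

    return flags
```

Run against the captured request shown earlier, both checks fire. Keep in mind that an Accept of */* is perfectly normal for script-initiated requests such as fetch() calls, so this should only be applied to requests for HTML pages.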
If we wanted to dive deeper, some promising avenues would be TCP fingerprinting and SSL/TLS fingerprinting, but I decided against spending more time on this.
Conclusion
Before I end this article I want to note that you don't have to just trust me that Twitter was crawling from their own IPs in November. If you check AbuseIPDB you can find a bunch of reports for the relevant days [7] [8].
Today, we discovered by accident that Twitter (X.AI) is very likely running a hidden web scraping operation to feed their LLMs and is using a major proxy network to hide their tracks.
Now this isn't a major breakthrough or anything, and you can bet that they're not the only ones doing this. Aggressive stealth crawlers are a common complaint of webmasters [9], and often attributed to LLMs. But it's nice to have circumstantial evidence on who is behind this particular bot traffic.


