Over the last months the world has changed. The internet has changed. The shift began a while ago: LLMs entered the focus of public perception, and now they are everywhere. There is a widespread customer base around the world, with many people using them via an app on their mobile phones, and the competition between the companies is fierce. In the end, these companies are in the business of making money, not of making the world better. Keep this in mind.
The LLMs need input data to learn from, and they acquire this content from the internet, especially from websites, without paying. The content is free as in free beer. The costs of content creation are externalized, which is a pretty neat business model. On the technical level this is the same method used by search engine spiders, aka web crawlers. In the context of LLM companies we use the label scraper, with an explicitly unfriendly connotation. Why is that?
- Some of the scrapers don't respect robots.txt and rate limits. This causes load and disturbs other users.
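For contrast, here is a minimal sketch of what respecting robots.txt and a rate limit looks like on the crawler side. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and the helper functions are made up for illustration, and a real crawler would fetch the file from the site instead of hardcoding it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch);
# a real crawler would download it from https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def delay_for(parser, user_agent, default=1.0):
    """Return the requested Crawl-delay in seconds, or a default if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default
```

A polite crawler would call `may_fetch()` before every request and sleep for `delay_for()` seconds between requests to the same host. That is all it takes.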
This can be summarized as the first level of evilness, being a Padawan of the Order of Assholes (POA). But there is more to learn for a real Sith apprentice:
- Some of the scrapers don't identify themselves properly. Obviously those companies have already learned that they are not welcome and get filtered.
- Some of the scrapers don't use the company's own network as their IP address origin, to avoid identification.
This is much better. Why only annoy other people when there is a chance to increase the level of evilness? Now the second level is reached and you get the attention of a Sith Master. The next level a Sith Lord needs to know is blaming:
- Some of the scrapers don't just fake generic User Agent strings, they abuse User Agent IDs of other companies, e.g. pretending to be Googlebot.
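On the defending side, a faked crawler identity can usually be exposed with a forward-confirmed reverse DNS check: look up the host name for the client IP, check that it belongs to the crawler operator's domain, then resolve that name again and see whether it points back to the same IP. The following is a minimal sketch under assumptions: the function names are invented for illustration, and the trusted suffixes should be checked against the crawler operator's own documentation.

```python
import socket

# Assumed host name suffixes for genuine Googlebot addresses;
# verify against the operator's current documentation before relying on them.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the primary host name for an IP address (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host):
    """Return the set of IP addresses a host name resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_verified_crawler(ip, reverse=reverse_lookup, forward=forward_lookup,
                        suffixes=TRUSTED_SUFFIXES):
    """Forward-confirmed reverse DNS: PTR must match a trusted suffix
    and the resulting name must resolve back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False
        return ip in forward(host)
    except OSError:
        # No PTR record or failed resolution: treat as unverified.
        return False
```

A request that claims to be Googlebot in its User Agent string but fails `is_verified_crawler()` for its source IP is an impostor and can be filtered accordingly. The resolver functions are passed in as parameters so the check can be tested without network access, and the results should be cached in practice, since DNS lookups per request are expensive.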
This is an interesting behaviour since it might raise legal trademark issues. But we haven't yet reached the end of evilness. To become a Sith Master yourself, you need to learn the art of deception and make other people follow and help you, presumably without their knowledge.
- Some companies offer LLM apps for mobile phones and similar goodies to build a user base of volunteers. Those users and devices can be abused as a proxy scraper bot network, acting as a Borg Collective and performing severe DDoS attacks.
Now we have got a new benchmark. I'm looking forward to the next level of escalation. Will it be enough to destroy the internet? Or will it continue and destroy the planet by energy consumption? The future will tell us...
If those companies force a war of survival upon the internet, guess who has a chance to survive in the end. Hint: it's not those companies. But there is one outstanding and conspicuous company among them all. Let's call it Chinesus Palpatinus Sidious Sauronus. People involved in the matter will have no difficulty figuring out which company I'm talking about. The final words are especially for you, CPSS:
Have you ever heard of the United System Administrator Corps (USAC)? If not, be patient. You are on their list. Always remember: We're Not Only Good Looking - We'll Kick Your Ass Too.
The Few. The Proud. The Sysadmins.