

I take my shitposts very seriously.
That’s why the developer is working on a better detection mechanism. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
Given how much authority you wrote with earlier, I thought you’d be able to grasp the concept. I’m sorry I assumed better.
THEN (and this is the part you don’t seem to understand) the client process has to either waste time solving the challenge or cancel the request. Issuing and checking the challenge is, by the way, orders of magnitude lighter on the server than serving the actual meaningful content. If a new request is sent during that time, it will also have to solve a challenge first. The scraper will get through eventually, but the challenge delays the response and reduces the load on the server, because while the scrapers are busy computing, it doesn’t have to serve meaningful content to them.
It’s not client-side because validation happens on the server side. The content won’t be displayed until and unless the server receives a valid response, and the challenge is formulated in such a way that calculating a valid answer will always take a long time. It can’t be spoofed because the server will know that the answer is bullshit. In my example, the server will know that the prime factors returned by the client are wrong because their product won’t be equal to the original semiprime. Delegating to a sub-process won’t work either, because what’s the parent process supposed to do? Move on to another piece of content that is also protected by Anubis?
The point is to waste the client’s time and thus reduce the number of requests the server has to handle, not to prevent scraping altogether.
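To make the flow concrete, here’s a minimal sketch in Python using the factoring example from above (the names like `issue_challenge` and the tiny fixed primes are made up for illustration; Anubis itself uses a different challenge under the hood):

```python
import secrets

# Hypothetical in-memory store mapping challenge tokens to semiprimes.
PENDING: dict[str, int] = {}

def issue_challenge() -> tuple[str, int]:
    # A real deployment would use large random primes; these small
    # fixed ones just keep the sketch self-contained and quick to run.
    p, q = 1_000_003, 1_000_033
    token = secrets.token_hex(16)
    PENDING[token] = p * q
    return token, p * q  # issuing costs the server one multiplication

def handle_request(token: str, p: int, q: int) -> str:
    n = PENDING.pop(token, None)  # tokens are single-use
    # Checking costs one more multiplication plus sanity bounds.
    if n is None or p <= 1 or q <= 1 or p * q != n:
        return "403: invalid or missing proof of work"
    return "200: the actual content"
```

The server’s cost per request is a token lookup and a couple of multiplications; the factoring bill lands entirely on the client, which is the whole point.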
That’s the great thing about Anubis: it’s not client-side. Not entirely, anyway. Like public-key encryption schemes, it exploits the computational asymmetry of certain functions: solving the challenge is expensive, but checking the solution is cheap. The client can’t just say “solved, let me through” because it has to calculate a number, based on the parameters of the challenge, that fits certain mathematical criteria, and then present it to the server. That’s the “proof of work” component.
A challenge could be something like “find the two prime factors of the semiprime 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139”. This number is known as RSA-100; it was first factorized in 1991, which took several days of CPU time, but checking the result is trivial since it’s just integer multiplication. A similar semiprime of 260 decimal digits still hasn’t been factorized to this day. You can’t get around mathematics, no matter how advanced your AI model is.
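You can check that yourself: the factors of RSA-100 have been public since 1991, and confirming them is a single multiplication. A quick sketch, using the published factor values:

```python
# RSA-100 and its published prime factors (found in 1991 after several
# days of CPU time; the check below runs in microseconds).
n = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
p = 37975227936943673922808872755445627854565536638199
q = 40094690950920881030683735292761468389214899724061

print(p * q == n)  # True: verifying is trivial, factoring is not
```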
The developer is working on upgrades and better tools. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.
The purpose is to reduce the flood to a manageable level, not to block every single scraper request.
You can go with Debian (or Devuan), easily. My home server runs Proxmox on bare metal (itself Debian-based), with a virtualized OPNSense and multiple Debian containers, all on an i3-4150 that used to have 4GB of RAM (now a mismatched 10GB).
Bro is also concerned about attacks on exposed well-known ports, in which case bro can use Tailscale Funnel to expose a service without exposing a port. Besides, bro can make up bro’s own mind.
Consider Tailscale. It’s a mesh VPN based on WireGuard that uses a hosted service to manage keys and devices. It works without having to expose any ports on the firewall, and it can expose a service through a relay server.
Some people will say that you shouldn’t trust it because company bad, but you should give it a try and make up your own mind. If you’re feeling adventurous, you can install Headscale on a VPS to serve as a control server.
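If you want to try Funnel, it’s roughly a one-liner (a sketch; the exact syntax depends on your Tailscale version, and Funnel has to be enabled for your tailnet first):

```sh
# Expose the service listening on local port 8080 to the public
# internet through Tailscale's relays; no router ports are opened.
tailscale funnel 8080
```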
I can’t believe it. The incident has actually been reported!
I don’t use Caddy, but it seems like it tried to generate and write a TLS certificate into `/usr/local`, but didn’t have the necessary permissions. Basically, it tried to use `sudo tee ...` to write a file. Is Caddy running in a container? If it is, you might need to create a volume at `/usr/local/share/ca-certificates`. If not in a container, you’ll need to grant the `caddy` user write permissions in that directory.
But to answer your question directly, it’s not a cause for concern. You’re not getting hacked, it’s just a configuration error.
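For the non-container case, the fix is something like this (a sketch assuming Caddy runs as a `caddy` system user; adjust the user and path to your setup):

```sh
# Let the caddy user write its locally generated root certificate
# into the system CA directory it was trying to use.
sudo mkdir -p /usr/local/share/ca-certificates
sudo chown caddy: /usr/local/share/ca-certificates
```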
Do not do that. You need to set up VLANs and proper separation between them on both the switch and the router, assuming the switch even supports tagged trunk lines. If you don’t, you’re just connecting all of your hosts to the unfiltered internet.
Yes, that will be enough. You can also use a single port on the NIC and the one on the motherboard if it can handle the ethernet speed you want.
This is my network setup on Proxmox:
`vmbr0` is a bridge with a single port going to the modem. The OPNSense VM’s first virtual interface is connected to this bridge and configured as the WAN interface. Nothing else connects to it, as it is exposed to the internet.
`vmbr1` also has a single port, which goes to the physical switch. OPNSense’s second interface connects to it as the LAN port, along with every other VM and container running on the server.
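On disk, that setup amounts to something like the following in `/etc/network/interfaces` (a sketch; `enp1s0`/`enp2s0` and the address are placeholders for your actual NICs and LAN subnet):

```
auto vmbr0
iface vmbr0 inet manual
    bridge-ports enp1s0      # physical port to the modem (WAN side)
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet static
    address 192.168.1.10/24  # the host's own LAN address
    bridge-ports enp2s0      # physical port to the switch (LAN side)
    bridge-stp off
    bridge-fd 0
```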
You can run OPNSense inside a virtual machine, using either plain QEMU or the Proxmox toolkit installed on top of Debian to manage it. I’ve been using this setup for years without issue.
You’ll have to create one bridge network for the WAN interface and one for the LAN, connect both to the VM, then configure the virtual interfaces inside OPNSense.
Please share those options, don’t keep them secret.
PVE running on a pile of e-waste. Most of the parts are leftovers from my parents’ old PC that couldn’t handle Win10. Proxmox loves it. Even the 10GB of mismatched DDR3 memory. The only full VM is OPNSense (formerly pfSense); everything else runs inside Debian containers. It only struggles when Jellyfin has to transcode something, because I don’t have a spare GPU.
+1 for OMV. I use it at work all the time to serve Clonezilla images through an SMB share. It’s extremely reliable. The Clonezilla PXE server is a separate VM, but the toolkit is available in the `clonezilla` package, and I could even integrate the two services if I felt particularly masochistic one day.
My first choice for that role was TrueNAS, but at the time I had to use an old-ass Dell server that only had hardware RAID, and TrueNAS couldn’t use ZFS with it.
Still don’t understand why they cut the fleeb.
Anti-intellectualism as a defence. Nice. Abandon grammar and take language where an LLM can’t follow. Surely we’ll be able to tell a text written by a human from one written by a machine if the human writes it like a dumbass, I can’t see anything wrong with that. i mean why even use proper punctuation and capitalization an ai wouldnt write sumthin like dis isnnnt it better you can tell a human wrote dis
Have I made my point?
New developments: just a few hours before I posted this comment, The Register published an article about AI crawler traffic. https://www.theregister.com/2025/08/21/ai_crawler_traffic/
Anubis’ developer was interviewed and they posted the responses on their website: https://xeiaso.net/notes/2025/el-reg-responses/
In particular:
So, yeah. If we believe Xe, OOP’s article is complete hogwash.