• 5opn0o30@lemmy.world
    3 months ago

    Wow. A lot of cynicism here. The AI bots are (currently) honoring robots.txt, so this is an easy way to tell them to go away. Honeypot URLs can serve as a second line of defense, as can blocking the IP ranges the crawler operators publish. These bots are no different from the other bots that have existed for years.
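
    A rough sketch of those three layers in Python, in case it helps. The crawler names in the robots.txt text, the `/trap/` honeypot path, the IP range (a reserved documentation range) and both function names are placeholders I made up for illustration, not anyone's real setup:

```python
import ipaddress
from urllib import robotparser

# Layer 1: robots.txt asks named crawlers to stay away entirely and
# tells every other bot not to touch the honeypot path.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /trap/
"""

HONEYPOT_PREFIX = "/trap/"  # hypothetical path, never linked for humans
# Placeholder range; a real list would come from the crawler operator.
BLOCKED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
banned_ips = set()  # filled in as clients hit the honeypot


def polite_bot_allowed(user_agent, path):
    """What a crawler that honors robots.txt would conclude."""
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def should_block(ip, path):
    """Layers 2 and 3: published IP ranges, then the honeypot."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_RANGES):
        return True
    if path.startswith(HONEYPOT_PREFIX):
        # Only a client ignoring robots.txt should ever request this.
        banned_ips.add(ip)
    return ip in banned_ips
```

    The honeypot check keys on behavior rather than on the User-Agent header, so in this sketch it still catches a client that claims to be a browser.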

    • digdilem@lemmy.ml
      2 months ago

      In my experience, the AI bots are absolutely not honoring robots.txt, and there are literally hundreds of unique ones. Everyone and their dog has unleashed AI/LLM harvesters over the past year without much thought for the impact on low-bandwidth sites.

      Many of them don’t even identify themselves as AI bots; they fake human user-agents instead.