Proof of Work is a terrible solution because it assumes computational costs are significant expense for scrapers compared to proxy costs. It’ll never come close to costing the same as residential proxies and meanwhile every smartphone user will be complaining about your website draining their battery.
You can do something like only challenge data data center IPs but you’ll have to do better than Proof-of-Work. Canvas fingerprinting would work.
My issue with canvas fingerprinting and, well, any other fingerprinting is that it makes the situation even worse. It plays right into the hands of data brokers, and is something I’ve been heavily fighting against, and simply don’t visit any website that doesn’t work in my browser that’s trying hard not to be fingerprintable.
Just now there is an article on the front page of programming.net about how are data brokers boasting to have extreme amounts of data on almost every user of the internet. If the defense against bot will be based on fingerprinting, it will heavily discourage use of anti-fingerprinting methods, which in turn makes them way less effective - if you’re one of the few people who isn’t fingerprintable, then it doesn’t matter that you have no fingeprint, because it makes it a fingerprint in itself.
So, please no. Eat away on my CPU however you want, but don’t help the data brokers.
it assumes computational costs are significant expense for scrapers compared to proxy costs
The assumption is correct. PoW has been proven to significantly reduce bot traffic… meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.
Canvas fingerprinting would work.
Demonstrably false… people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks
The assumption is correct. PoW has been proven to significantly reduce bot traffic.
What you’re doing is filtering out bots that can’t be bothered to execute JavaScript. You don’t need to do a computational heavy PoW task to do that.
meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.
Correct, and thats why they are the number one expense for any scraping company. Any scraper that can’t be bothered to spin up a headless browser isn’t going to cough up the dough for residential proxies.
Demonstrably false… people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks
That’s not what “demonstrably false” even means. Canvas fingerprinting filters out bots better than PoW. What you’re complaining about too strict settings and some users being denied. Make your Anubis settings too high you’ll have users waiting long times while their batteries drain.
What you’re doing is filtering out bots that can’t be bothered to execute JavaScript. You don’t need to do a computational heavy PoW task to do that.
Most bots and scrapers from what I’ve seen already are using (headless) full browsers, and hence are executing javascript, so I think anything that slows them down or increases their cost can reduce the traffic they bring.
Canvas fingerprinting filters out bots better than PoW
Source? I strongly disagree, and it’s not hard to change your browser characteristics to get a new canvas fingerprint every time, some browsers like firefox even have built-in options for it.
@refalo@sudo If Proof of Work gets widely adopted I foresee a future where bot running data-centers can out-compute humans to visit sites, while old devices of users in poorer countries struggle to compute the required task for hours … Or is that fear misguided?
Proof of Work is a terrible solution because it assumes computational costs are significant expense for scrapers compared to proxy costs. It’ll never come close to costing the same as residential proxies and meanwhile every smartphone user will be complaining about your website draining their battery.
You can do something like only challenge data data center IPs but you’ll have to do better than Proof-of-Work. Canvas fingerprinting would work.
My issue with canvas fingerprinting and, well, any other fingerprinting is that it makes the situation even worse. It plays right into the hands of data brokers, and is something I’ve been heavily fighting against, and simply don’t visit any website that doesn’t work in my browser that’s trying hard not to be fingerprintable.
Just now there is an article on the front page of programming.net about how are data brokers boasting to have extreme amounts of data on almost every user of the internet. If the defense against bot will be based on fingerprinting, it will heavily discourage use of anti-fingerprinting methods, which in turn makes them way less effective - if you’re one of the few people who isn’t fingerprintable, then it doesn’t matter that you have no fingeprint, because it makes it a fingerprint in itself.
So, please no. Eat away on my CPU however you want, but don’t help the data brokers.
Hard disagree, because:
The assumption is correct. PoW has been proven to significantly reduce bot traffic… meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.
Demonstrably false… people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks
What you’re doing is filtering out bots that can’t be bothered to execute JavaScript. You don’t need to do a computational heavy PoW task to do that.
Correct, and thats why they are the number one expense for any scraping company. Any scraper that can’t be bothered to spin up a headless browser isn’t going to cough up the dough for residential proxies.
That’s not what “demonstrably false” even means. Canvas fingerprinting filters out bots better than PoW. What you’re complaining about too strict settings and some users being denied. Make your Anubis settings too high you’ll have users waiting long times while their batteries drain.
Most bots and scrapers from what I’ve seen already are using (headless) full browsers, and hence are executing javascript, so I think anything that slows them down or increases their cost can reduce the traffic they bring.
Source? I strongly disagree, and it’s not hard to change your browser characteristics to get a new canvas fingerprint every time, some browsers like firefox even have built-in options for it.
@refalo @sudo If Proof of Work gets widely adopted I foresee a future where bot running data-centers can out-compute humans to visit sites, while old devices of users in poorer countries struggle to compute the required task for hours … Or is that fear misguided?