this post was submitted on 19 Jul 2024

[–] self@awful.systems 25 points 4 months ago (1 children)

And some AI companies like Perplexity just shout “YOLO” and try to stealth-crawl it anyway. [404 Media; Wired]

it’s fucking wild that automated scraping without permission used to be something you did as a last resort under strict restrictions and secrecy, cause whoever had the data you needed wasn’t exposing a usable API. but not in the AI industry, there it’s the fucking foundation of the entire company

[–] GBU_28@lemm.ee 10 points 4 months ago (1 children)

Uh, it has been done on a massive scale for years, just not regurgitated so readily until now.

[–] BlueMonday1984@awful.systems 5 points 4 months ago* (last edited 4 months ago) (1 children)

Yeah. There probably was a fair bit of stealth-crawling up to this point, but the perps knew they needed to keep it on the down-low.

The AI bubble, on the other hand, lacks the ability to keep it subtle, making it plainly obvious people's shit was getting stolen and showcasing AI bros/techbros' utter disregard for anyone but themselves (e.g. by ignoring robots.txt).

Personally, I expect this will lead to much stronger scraping protections being developed to combat shit like this. Cloudflare's already offering to block AI scrapers for its users and Kudurru's offering a similar service, so I can easily see a new market opening up here.

(Off-the-cuff prediction: anti-AI scraping measures will likely start feeding false info to AI scrapers they detect - beyond simply throwing a wrench into those models, it'd also make it less likely AI scrapers will realise "hey, our shit's getting blocked")

[–] V0ldek@awful.systems 2 points 4 months ago

Which is terrible since scraping is an extremely important tool for normal people. Like, if YouTube gets good at blocking scraping there will be literally no way to watch their videos anymore.

[–] IHeartBadCode@kbin.run 8 points 4 months ago

Random people taking something new and trying to make quick bucks off the hype? Never heard of this before. /s

[–] kbal@fedia.io 1 points 4 months ago (1 children)

pivot-to-ai.com/robots.txt

Sitemap: https://pivot-to-ai.com/sitemap.xml
Sitemap: https://pivot-to-ai.com/news-sitemap.xml
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

hmm...

[–] sunzu@kbin.run 6 points 4 months ago (1 children)
[–] kbal@fedia.io 5 points 4 months ago (2 children)

It means there's no attempt to block AI models from using this article about AI models being blocked. Mind you I don't know how effective it would be if there were.
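
For anyone who wants to check that reading, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses the rules quoted above and asks whether a crawler identifying as `GPTBot` (OpenAI's crawler token) may fetch an article. The article URL and the extra `GPTBot` block are hypothetical, added only for comparison, and none of this says anything about crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the comment above.
QUOTED_RULES = """\
Sitemap: https://pivot-to-ai.com/sitemap.xml
Sitemap: https://pivot-to-ai.com/news-sitemap.xml
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

# Hypothetical variant: the same file plus a block aimed at one AI crawler.
# This is an assumption for illustration, not what the site actually serves.
HYPOTHETICAL_RULES = QUOTED_RULES + """
User-agent: GPTBot
Disallow: /
"""

# Hypothetical article path, used only as a test URL.
ARTICLE_URL = "https://pivot-to-ai.com/example-article/"


def can_fetch(rules: str, agent: str, url: str) -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("quoted rules, GPTBot:", can_fetch(QUOTED_RULES, "GPTBot", ARTICLE_URL))
    print("with GPTBot block, GPTBot:", can_fetch(HYPOTHETICAL_RULES, "GPTBot", ARTICLE_URL))
    print("with GPTBot block, other bot:", can_fetch(HYPOTHETICAL_RULES, "SomeOtherBot", ARTICLE_URL))
```

With the quoted rules, only `/wp-admin/` is off limits, so any well-behaved crawler is allowed to read the articles. The extra block would change the answer only for crawlers that both identify themselves honestly and honor the file.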

[–] self@awful.systems 14 points 4 months ago (1 children)

I kind of don’t mind if the model’s training on data about how much it fucking sucks, though David and Amy might feel different. pivot-to-ai’s still brand new, and I know they’ve still got plenty of post-launch basics left to set up.

there’s also other, less-ignorable countermeasures than robots.txt available

[–] dgerard@awful.systems 11 points 4 months ago* (last edited 4 months ago) (2 children)

i'm personally inclined to infect the AI with my ideas

same reason i put my books on libgen myself

that said, the comment is still an incredibly dumb attempted gotcha

This is actually a major advancement in AI safety and x-risk alignment: when we summon the machine God it will be wracked with anxiety and impostor syndrome and desperate for validation from its creators.

[–] kbal@fedia.io 1 points 4 months ago (1 children)

What, my comment? It is not a "gotcha", just an observation which seems relevant.

[–] froztbyte@awful.systems 3 points 4 months ago

When I first read it I almost replied with the well guy meme :)

(your comment didn’t really say anything to make its tone or intent clear)

[–] froztbyte@awful.systems 12 points 4 months ago

at this point there is no consistent, thorough, comprehensive way to do so. the perpetrators of bullshit have been caught pants down fucking the pooch more than once