this post was submitted on 17 Jun 2023
128 points (100.0% liked)

Lemmy.World Announcements

This community is for posts by the admins about the Lemmy.world server.

Follow us for server news 🐘

Outages 🔥

https://status.lemmy.world/

For support with issues at Lemmy.world, go to the Lemmy.world Support community.

Support e-mail

Support requests are best sent to info@lemmy.world.

Donations 💗

If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.

If you can, please use (or switch to) Ko-Fi; it has the lowest fees for us.

Ko-Fi (Donate)

Bunq (Donate)

Open Collective backers and sponsors

Patreon

Join the team

founded 1 year ago

One of the side effects of the Reddit meltdown is that many search results became unavailable because communities went private. It would be great if we could fill the void with Lemmy content instead.

topnomi@kbin.social 23 points 1 year ago

I heard that Reddit has a dedicated CDN each for Microsoft's and Google's scrapers. That's why Reddit posts show up so well in their search results. It will probably take some effort to feed data that well from the fediverse.

On that note, perhaps we should have a per-community as well as a per-post scrape/noscrape toggle, though it might be difficult to get buy-in from all parties.
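A minimal sketch of how such toggles might be turned into crawler directives, assuming two hypothetical `noscrape` flags (one per community, one per post) invented here for illustration; they are not existing Lemmy settings. Community pages live under `/c/<name>`, so robots.txt can express the community rule, but post pages are `/post/<id>` and carry no community name, so the per-post rule (and the community rule it inherits) has to go into a robots meta tag on the page itself.

```python
from dataclasses import dataclass


# Hypothetical data model: these flags are illustrative, not Lemmy settings.
@dataclass
class Community:
    name: str
    noscrape: bool = False


@dataclass
class Post:
    post_id: int
    community: Community
    noscrape: bool = False


def robots_txt(communities: list[Community]) -> str:
    """Build robots.txt rules asking every crawler to skip opted-out communities."""
    lines = ["User-agent: *"]
    for c in communities:
        if c.noscrape:
            # Assumes community pages are served under /c/<name>, as on Lemmy.
            lines.append(f"Disallow: /c/{c.name}")
    return "\n".join(lines) + "\n"


def robots_meta(post: Post) -> str:
    """Return a robots meta tag for a post page, or "" if indexing is allowed.

    Post URLs (/post/<id>) don't include the community, so the community's
    opt-out is applied here rather than in robots.txt.
    """
    if post.noscrape or post.community.noscrape:
        # Asks compliant crawlers not to index the page or follow its links.
        return '<meta name="robots" content="noindex, nofollow">'
    return ""
```

One caveat the sketch ignores: robots.txt `Disallow` rules are prefix matches, so `Disallow: /c/news` would also cover a community called `newsroom`.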

Trebach@kbin.social 6 points 1 year ago

Whether a community can actually opt out of being scraped depends on the scraper respecting robots.txt and/or the page's robots meta tag.
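To make the point concrete, here is a rough sketch of the check a well-behaved crawler might run, using Python's standard `urllib.robotparser` for robots.txt and `html.parser` for the robots meta tag. The names `RobotsMetaParser` and `may_index` are made up for illustration. Note the order: a crawler that robots.txt blocks never fetches the page, so it never sees the meta tag either.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute values may be None
        if (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(robots_lines, user_agent, url, page_html):
    """Hypothetical polite-crawler check: robots.txt must allow the fetch,
    and the fetched page's robots meta tag must not forbid indexing."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    if not rp.can_fetch(user_agent, url):
        return False  # a compliant crawler stops here and never fetches the page
    meta = RobotsMetaParser()
    meta.feed(page_html)
    return not ({"noindex", "none"} & meta.directives)
```

Nothing obliges a scraper to run anything like this; both mechanisms are requests, not enforcement.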

Not all scrapers do, particularly those scraping for SEO purposes, so instances might need to add IP bans for scrapers that ignore those restrictions.
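One way an instance could approximate such bans, sketched under loose assumptions: count requests for paths that robots.txt disallows and ban a client IP once it reaches a threshold. `ScraperBanList`, `allow` and the default threshold are hypothetical. In practice this would more likely live in a reverse proxy or firewall than in application code, and a per-IP ban only helps against scrapers that don't rotate addresses.

```python
import ipaddress
from collections import Counter
from urllib.robotparser import RobotFileParser


class ScraperBanList:
    """Hypothetical helper: ban client IPs that keep requesting paths
    the instance's robots.txt disallows."""

    def __init__(self, robots_lines, threshold=3):
        self.rules = RobotFileParser()
        self.rules.parse(robots_lines)
        self.threshold = threshold  # disallowed requests that trigger a ban
        self.violations = Counter()
        self.banned = set()

    def allow(self, client_ip, user_agent, url):
        """Return True if the request should be served."""
        addr = ipaddress.ip_address(client_ip)  # parses IPv4 or IPv6 text
        if addr in self.banned:
            return False
        if not self.rules.can_fetch(user_agent, url):
            # robots.txt is advisory, so violations below the threshold are
            # still served; the request that reaches it triggers the ban.
            self.violations[addr] += 1
            if self.violations[addr] >= self.threshold:
                self.banned.add(addr)
                return False
        return True
```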