this post was submitted on 26 Jul 2024
106 points (92.7% liked)

Fediverse

28731 readers
307 users here now

A community to talk about the Fediverse and all it's related services using ActivityPub (Mastodon, Lemmy, KBin, etc).

If you wanted to get help with moderating your own community then head over to !moderators@lemmy.world!

Rules

Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration), Search Lemmy

founded 2 years ago
MODERATORS
 

All the posts about Reddit blocking everyone except Google and Brave got me thinking: What if SearNGX was federated? I.E. when data is retrieved via a providers API, that data is then federated to all other instances.

It would spread the API load out amongst instances, removing the API bottlenecks that come from search providers.

It would allow for more anonymous search, since users could cycle between instances and get the same results.

Geographic bias would be a thing of the past.

Other than ActivityPub overhead and storage, which could be reduced by federating text-only content, I fail to see any downside.

Thoughts?

you are viewing a single comment's thread
view the rest of the comments
[–] Max_P@lemmy.max-p.me 11 points 5 months ago

I ran a YaCy instance for a while like a decade ago. It does federate index requests, and when you search it propagates the search request across a bunch of nodes. When my node came online it almost immediately started crawling stuff and it did get a bunch of search queries. But the network was still pretty small back then and the search results were... not great. That's the price of independence from Google's and Microsoft's giant server farms, it's hard to compete with that size.

But at the rate Google and Bing are enshittifying, I think it's worth revisiting.

Using ActivityPub for this would be immensely wasteful. It's just not feasable that all instances would have the whole index because it's so large. Back when I tried it, the network still had several TBs worth of indexed pages. This is firmly in the realm of distributed P2P systems. One could have an ActivityPub plugin however to receive updates from social media near instantly and index those immediately with less overhead. But you still want to index wikipedia, forums, blogs, whatever the crawlers can find.