this post was submitted on 19 Jun 2023

I’d like to explore a project for NLP based search on the fediverse. But I’m a fediverse beginner and am not sure if it’s possible to index fediverse content.

My general idea is:

  1. Set up my own read-only instance, let’s say of kbin. I’m not sure if the concept of a read-only instance makes sense. It’s read-only because the instance only needs to be able to read the content already on the fediverse and doesn’t need the ability to post content.
  2. At some regular interval, let’s say once a day, monitor any changes in the content from the previous run. I’m not sure if there is a single “fediverse” where all the content can be read from. If not, then I can start with tracking the same content as on kbin.social. Is it possible to monitor changes to content on a kbin instance?
  3. I’ll convert the content into vector embeddings by using an NLP ML model like CLIP. The embeddings will be stored in a vector store. The vector store will also include the url of the content as metadata.
  4. When a user requests a search, the search term is converted to its vector embedding using the same ML model and the most similar vectors are identified.
  5. The user gets the search results as urls of the most relevant content, and perhaps a preview of the content. The user can then access the full content from where it’s originally posted using its url.
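To make steps 3 to 5 concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real model (it is not CLIP), `VectorStore` is a hypothetical in-memory class rather than an actual vector database, and the `example.social` urls are made up. It only shows the shape of the pipeline: embed content, keep the url as metadata, embed the query the same way, and return the closest urls with a preview.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: normalised word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


class VectorStore:
    """Hypothetical in-memory store: one (vector, url, preview) per post."""

    def __init__(self):
        self.items = []

    def add(self, url, text):
        # Step 3: embed the content and keep its url as metadata.
        self.items.append((embed(text), url, text[:80]))

    def search(self, query, k=3):
        # Step 4: embed the query with the same function, rank by cosine
        # similarity (vectors are already unit length, so a dot product).
        q = embed(query)
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), url, preview)
            for vec, url, preview in self.items
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        # Step 5: return (score, url, preview) for the best matches.
        return scored[:k]


store = VectorStore()
store.add("https://example.social/post/1", "How federation works between kbin and Lemmy")
store.add("https://example.social/post/2", "My favourite sourdough bread recipe")

for score, url, preview in store.search("kbin federation", k=1):
    print(url, preview)
```

Swapping `embed` for a real model and `VectorStore` for a real vector database would not change the overall flow.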

I’m comfortable with setting up steps 3 and 4. But I do not know the fediverse well enough to answer whether steps 1, 2, and 5 would work, or even make sense the way I’m envisioning them.
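For step 2, whichever way the content ends up being fetched, the "what changed since the previous run" part could be handled on my side by comparing content hashes between runs. A rough sketch, assuming each run produces a mapping of url to post text (the function names `fingerprint` and `diff_snapshot` and the urls are hypothetical, and how the posts are actually fetched is left out):

```python
import hashlib


def fingerprint(text):
    """Stable hash of a post's text, used to detect edits between runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshot(previous, current_posts):
    """Compare the last run's fingerprints with the posts fetched now.

    previous: dict of url -> fingerprint saved by the previous run
    current_posts: dict of url -> post text fetched in this run
    Returns the new fingerprints plus lists of added, changed, removed urls.
    """
    current = {url: fingerprint(text) for url, text in current_posts.items()}
    added = sorted(u for u in current if u not in previous)
    changed = sorted(
        u for u in current if u in previous and previous[u] != current[u]
    )
    removed = sorted(u for u in previous if u not in current)
    return current, added, changed, removed


# First run: nothing stored yet, so every post counts as added.
snapshot, added, changed, removed = diff_snapshot(
    {}, {"https://example.social/post/1": "first draft"}
)

# Next run: the post was edited, so only it needs to be re-embedded.
snapshot, added, changed, removed = diff_snapshot(
    snapshot, {"https://example.social/post/1": "edited text"}
)
print(added, changed, removed)
```

Only the urls in `added` and `changed` would need new embeddings, and the ones in `removed` could be dropped from the vector store.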

Can some of the fediverse veterans help me understand if this is a feasible approach or if I’ve got it all wrong?

noodlejetski@kbin.social 2 points 1 year ago

prepare for a ton of instances defederating from yours on day 1.