kbin.social disallows crawlers from indexing the site, and honest crawlers will honor that.
https://kbin.social/robots.txt
This actually seems to be the standard configuration that kbin ships with, so most instances will have that in place.
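For anyone who wants to see what "honoring" robots.txt means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `ExampleBot` user agent, and the magazine URL are made-up assumptions for illustration, not a copy of what kbin.social actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body that blocks every crawler from every path.
# This is an assumed example, NOT the real kbin.social file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL path are placeholders.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://kbin.social/m/some-magazine"))
```

A well-behaved crawler runs a check like this before every request. Nothing enforces it, which is why only "honest" crawlers are affected.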
This seems like a roadblock to people discovering any community here. It removes the Reddit effect, where adding 'Reddit' to a search led to non-ad posts with first-person relevance, and eliminates the easiest way for someone who's not part of the federated universe to find the federated universe.
Counterproductive, seems to me.
@raphael Why is that? To prevent the site and posts from becoming ad-focused? I can see that as a benefit. But on the other hand, content that is not indexed by search engines is not much different from Discord. I always criticized Discord content for not being indexed when arguing with my bro, and explained why Reddit (or any other open platform) is a much better place to post. Honestly, kbin disallowing crawlers from indexing the site is a big blow to me! I don't like that at all.
🤷‍♂️
It is just a decision that every instance owner can make for themselves (if they are aware of it).
It will be a huge headache for search engines anyway: all posts are basically replicated across all instances and look local to a search engine. So a single post will have hundreds of copies in the search engine's database, and it will probably output all of them as results (for now).
Is it possible/reasonable to have some sort of fediverse-encompassing API for search engines that would help index only the original threads? A separate instance maybe? Or is it going to stay as is?
@fearout It just occurred to me that all you need is your own server, and then you only have to index that one server. It basically gets data from all other instances through the standard ActivityPub protocol. It works differently than traditional crawlers, but the outcome is the same.
The search engines are going to have to deal with that. However, you can provide context on the instance in the form of a canonical URL, to tell a search engine where the content originated.
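To illustrate the canonical URL idea, here is a rough sketch of how an indexer could collapse federated copies: it reads the standard `<link rel="canonical" href="...">` tag from each page and groups fetched URLs by it. Whether kbin instances emit this tag for federated posts is an assumption here, and the function names and `*.example` URLs are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "canonical nofollow".
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def group_by_canonical(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each canonical URL to the fetched URLs that point at it.

    pages maps fetched URL -> HTML body. A page without a canonical tag
    is treated as its own original.
    """
    groups: Dict[str, List[str]] = {}
    for url, html in pages.items():
        key = canonical_url(html) or url
        groups.setdefault(key, []).append(url)
    return groups
```

With something like this, the hundreds of copies of one post would end up under a single key, and only that original would need to be shown as a search result.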
@raphael I didn't know the instances would copy the messages. Interesting! I think search engines need to be redesigned to respect the robots.txt of the origin instance then. If they are not designed for this, the content surely looks local. That's kind of a mess then, from a search engine's perspective.
Strangely enough, if I search for "final fantasy site:kbin.social" with my search engine based on SearXNG, it finds a few links. They only point to tag or user pages, not the actual content. So maybe use tags, if you want to get indexed anyway.
Thanks, I totally forgot that robots.txt existed. Since you are replying, do you know the difference between badges and tags?
I wonder if there is some way for instances to use the canonical tag to point to the original and make it less annoying for search engines.
That said, I guess I'd rather see it crawled and let search engines figure it out than have it not crawled at all. Not really sure where that feedback goes though.
"Tags" are similar to Twitter hashtags: you can use them to categorize a post. For example, if you create a post about Android, you can add the tag "android" (without "#"). When you manage a magazine, you can add one or more tags (always without "#") to pull in posts from the fediverse with that tag.
I have no idea about "badges" :)
Wow I wish this were changed. Lots of good content to be discovered here.
On one hand, I support a strong robots.txt being in place. It keeps data from being used by honest engines (though what constitutes "honesty" varies). But at the same time, indexing and caching are how we can grow. If you want the site to grow, you want to get it to the first position on the first SERP.
It's a tricky balance.
As far as I am aware, badges are intended to be similar to Reddit's post flair system. They don't seem to do anything at the moment though.
Community moderators can set up badges, but even with those in place you can't see them when making a post.