this post was submitted on 04 Feb 2024

memes


Many "alternative" search engines are better for privacy, but they are still vulnerable to censorship because they rely on g**gle's and m*crosoft's indices for their search results. This isn't a well-kept secret either: many of them disclose which search index they use on their "about" page.

There are still search engines that (claim to) maintain their own index, most surprisingly br*ve.

Mojeek@lemmy.ml 3 points 7 months ago* (last edited 7 months ago)

If you look at the repo, they give thanks to:

"The commoncrawl organization for crawling the web and making the dataset readily available. Even though we have our own crawler now, commoncrawl has been a huge help in the early stages of development."

I can't find anything that says how much of the index is Common Crawl (CC) and how much is their own. If a decent amount of it is CC, bear in mind that CC was originally meant for researchers and the like, so it's not the best resource in the world for a search index: https://commoncrawl.org/

That being said, speaking as an independent search engine, it's always good to see people take on the massive task of actually building an index rather than becoming a proxy.