this post was submitted on 08 Jun 2023
40 points (100.0% liked)

Asklemmy


Yes, I’m certain I could find answers to all these questions via research, but I’m coming here as part of the Reddit diaspora. My guess is that there’s a benefit to others like me to have this discussion.

I can vaguely understand the federation concept, the idea that my account is hosted at an individual Lemmy server and that other servers trust that one to validate my account. What’s the network flow like? I’m posting this to the lemmy.ml /asklemmy community, but I’m composing it on the sh.itjust.works interface. I’m assuming sh.itjust.works hands this over to lemmy.ml. How does my browsing work? Is all of my traffic routed through sh.itjust.works?

Assuming there’s a mass influx of redditors, what does it look like as things fail? I’m assuming some servers can keep up under the load and some can’t. If sh.itjust.works goes down under the load, can I still browse other servers? Or, do those servers think I should have some token from sh.itjust.works, because my cookies say I’m still logged in, and I can’t even do that?

Are there easy mechanisms to allow me to grab my post history?

I’m assuming most (all?) Lemmy servers are hosted in home labs? The idea of Lemmy excites me, but the growth pain that could be coming scares me. Anybody using a CDN in front of their servers? That could be good, but with unconstrained growth, that could be costly, which is very bad.

I can imagine lots of different worst-case scenarios, but I’m curious what those of you who run servers imagine for the best-case scenario? A manageable growth that just gets more vibrant communities, which can’t ever lead to the breadth and variety of Reddit?

Also, for those running servers, have any of you experienced issues during this growth? What scares you?

[–] phase_change@sh.itjust.works 5 points 1 year ago (2 children)

Thanks. That was an incredibly detailed response that answers the questions I was asking.

Doesn’t the fact that every Lemmy server has a copy of every federated post mean that if Lemmy takes off, only a few people with strong donation feeds can afford to survive?

If there’s an active forum (sub-lemmy?) on a server that has to spin down, the history stays on the remaining active ones, but I assume the only option is forking?

Moderation can only happen on the server hosting a forum, or each server can moderate posts in that server’s db?

[–] Hexorg@beehaw.org 6 points 1 year ago

> Doesn’t the fact that every Lemmy server has a copy of every federated post mean that if Lemmy takes off, only a few people with strong donation feeds can afford to survive?

Yes, I’ve seen something like that on Mastodon already. The caching is scoped by time, though, so you can choose to cache only the last 24 hours (or less), which scales down the storage requirements.
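To make the time-based caching idea concrete, here is a minimal sketch of a retention window. It is not how Mastodon or Lemmy implement it; the `prune_cache` function, the `published` field, and the sample posts are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def prune_cache(cached_posts, now, max_age=timedelta(hours=24)):
    """Keep only the cached posts published within the retention window."""
    cutoff = now - max_age
    return [post for post in cached_posts if post["published"] >= cutoff]


# Hypothetical cache contents: one recent post and one older than a day.
now = datetime(2023, 6, 8, 12, 0, tzinfo=timezone.utc)
cached_posts = [
    {"id": 1, "published": now - timedelta(hours=2)},
    {"id": 2, "published": now - timedelta(hours=30)},
]

kept = prune_cache(cached_posts, now)
print([post["id"] for post in kept])  # prints [1]
```

Shrinking `max_age` trades history for disk space, which is the knob being described above.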

[–] PriorProject@lemmy.world 6 points 1 year ago* (last edited 1 year ago)

> Doesn’t the fact that every Lemmy server has a copy of every federated post mean that if Lemmy takes off, only a few people with strong donation feeds can afford to survive?

It's not precisely true that every Lemmy instance stores every post. A given Lemmy instance will store a given post if and only if:

  • A user on the storing instance is "interested" in that post. Being interested has a somewhat complicated definition that I myself don't fully understand, but two examples of being interested are being subscribed to the community BEFORE the post was made (importantly, the storing instance doesn't fetch the forever backlog of posts just because someone subscribes... rather... it asks to receive FUTURE posts), or a user searching for the post by URL.
  • Subject to some caching policy. I just learned this today from another comment, and maybe it doesn't apply to Lemmy, or doesn't apply yet... but we already know that the storing instance doesn't fetch historical posts going back forever. It could also decide to forget posts older than a certain time.

The first of these is most important though, because it means that posts and comments that no one is interested in don't get shipped around the federated network. And this leads to the property that the size/cost of a Lemmy instance is going to depend on the size of the "active" usage. A single-user Lemmy instance subscribing to a handful of communities will always be small and cheap, because it doesn't subscribe to much content. A bigger Lemmy instance need not scale to the entirety of the content in the lemmyverse, but rather to the "active set" of posts and comments its users interact with this month. That could get big, but what the Lemmy devs are saying (sorry no link, I've read too many posts lately to remember all my sources) is that user traffic browsing the local DB of the Lemmy instance is dwarfing the replication load, which is great news because user browsing is much easier to optimize than federated replication.
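As a rough illustration of the "interested" rule, here is a toy model in Python. It only encodes the two examples given above (subscribed before the post was made, or fetched by URL) and is not Lemmy's actual logic; the `Instance` class, its fields, and the post URLs are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Instance:
    # community -> time a local user first subscribed (hypothetical bookkeeping)
    subscribed_since: dict = field(default_factory=dict)
    # post URLs that a local user explicitly searched for
    fetched_urls: set = field(default_factory=set)

    def stores(self, post_url, community, published):
        """True if this toy instance would hold a copy of the post."""
        since = self.subscribed_since.get(community)
        subscribed_before = since is not None and since <= published
        return subscribed_before or post_url in self.fetched_urls


def day(d, month=6):
    return datetime(2023, month, d, tzinfo=timezone.utc)


# A small instance whose users subscribed to one community on 1 June.
small = Instance(subscribed_since={"asklemmy@lemmy.ml": day(1)})

# Made after the subscription: delivered and stored.
print(small.stores("https://lemmy.ml/post/2", "asklemmy@lemmy.ml", day(8)))  # True
# Made before the subscription: the backlog is not fetched.
print(small.stores("https://lemmy.ml/post/1", "asklemmy@lemmy.ml", day(20, month=5)))  # False
# Same old post after a local user searches for it by URL.
small.fetched_urls.add("https://lemmy.ml/post/1")
print(small.stores("https://lemmy.ml/post/1", "asklemmy@lemmy.ml", day(20, month=5)))  # True
```

In this model, storage grows with what local users subscribe to and look up, not with the total amount of content on the network.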

> If there’s an active forum (sub-lemmy?) on a server that has to spin down, the history stays on the remaining active ones, but I assume the only option is forking?

(FYI, the thing you subscribe to is called a community in Lemmy. Some folks say sublemmy, but this is a redditism that isn't used in the code or official docs. It's a "community", which is why the url for a community is hxxp://my.lemmy.social/c/mycommunity. The "c" in the middle stands for community.)
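If it helps, here is a small sketch of pulling the community name and its home instance out of a URL of that shape. The `community_from_url` helper is made up for this example, and it assumes that a `name@host` segment after `/c/` points at a community hosted on another instance.

```python
from urllib.parse import urlparse


def community_from_url(url):
    """Return (home instance, community name) for a /c/<community> URL."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "c":
        raise ValueError(f"not a community URL: {url}")
    # "name@host" means the community lives on another instance (assumption).
    name, _, remote_host = parts[1].partition("@")
    return (remote_host or parsed.netloc), name


print(community_from_url("https://my.lemmy.social/c/mycommunity"))
# ('my.lemmy.social', 'mycommunity')
print(community_from_url("https://sh.itjust.works/c/lemmy@lemmy.ml"))
# ('lemmy.ml', 'lemmy')
```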

Well, we've already talked about caching and expiry. It's not clear to me that any Lemmy instance other than the one that hosts the community is required to keep the ENTIRE post/comment history (though yeah, the active/recent ones will be all over the federated network).

I haven't lived through a major instance shutdown, maybe an old-timer can weigh in here. Speculating, I'd think there would be 2 options:

  • BEFORE the old instance shuts down (or using a third big community like [!lemmy@lemmy.ml](/c/lemmy@lemmy.ml) to coordinate), make a new community on a different server and the mods post telling everyone to subscribe there. The new community would be... well... new. It wouldn't have the old posts, it would be made from scratch. The only things that would bind it to the old community are the mods that come over, the users that follow them, and the culture.
  • Optionally, using database tricks, or a migration tool (that I don't think currently exists, but could almost certainly be created)... after doing all of the above... someone with direct db access on both instances (aka an admin) could copy the old posts to the new community. This might be a terrible idea for federation reasons, or it might be prohibitively complicated because of the db schema... but it feels to me like it COULD work. I'm not aware of it having been done before, or of any tooling that makes it "easier" for someone who isn't pretty strong at doing complex data migrations in Postgres.