this post was submitted on 05 Jun 2023
Lemmy
Everything about Lemmy; bugs, gripes, praises, and advocacy.
For discussion about the lemmy.ml instance, go to !meta@lemmy.ml.
You should use this relatively quiet time to migrate to a larger server, because when the time comes that you have to do it, you're going to be in for a world of hurt. This is the calm before the storm, so take advantage of it.
Ultimately, you need to scale horizontally. You need to shard your database and separate out your different functions (database, front end, whatever back-end applications you use, etc.) onto different servers, all fronted by load balancers. That's the only way to even begin to handle increasing load. If you don't have a small team of experienced engineers with a deep understanding of how to build for scale, and you get a sudden mass exodus of users from Reddit, you're fucked. So if I were you, here's what I'd do:
1. Scale up to the largest instance type you can. If possible, switch (at least temporarily) to AWS and use something from the c6i/c6id instance families, such as the c6id.32xlarge. AWS bills on-demand instances at an hourly rate for the time they actually run, so you wouldn't need to pay for an entire month up front if you only need that extra horsepower for a few days (such as during the blackouts planned for the 12th through the 14th).
2. Because the above does nothing but buy you time until you crash (and if you get a huge spike of users without horizontal scaling, you WILL crash), migrate your DNS to something like Cloudflare. From there, configure Workers to respond when health checks to your site fail, so that users trying to reach the site are shown a static page directing them to something like http://join-lemmy.org instead of simply getting 5xx errors.
3. Once the hug of death is over, evaluate where you stand. Reduce your instance size if you can, and start investigating what it's going to take to scale horizontally.
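To make the "shard your database" part of horizontal scaling a bit more concrete, here's a minimal sketch of hash-based shard routing in Python. It is only an illustration, not how Lemmy works or a recommended design: the shard hostnames, the `SHARDS` list, and the `shard_for` function are all made-up names, and using the community name as the shard key is an assumption.

```python
import hashlib
from typing import Sequence

# Hypothetical shard hostnames (assumption); a real deployment would load these from config.
SHARDS = [
    "db-shard-0.internal",
    "db-shard-1.internal",
    "db-shard-2.internal",
    "db-shard-3.internal",
]


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Deterministically map a key (e.g. a community name) to one shard."""
    # Use a stable hash. Python's built-in hash() is randomized per process
    # for strings, so it would route the same key differently after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for community in ["lemmy", "technology", "asklemmy"]:
        print(community, "->", shard_for(community))
```

One caveat with plain modulo routing like this: changing the number of shards remaps most keys, which is why larger setups tend to reach for consistent hashing or a lookup table instead.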
I'm not a SQL expert, but I am a principal network architect, and my day job for the last 15 years has been working on scale and automation for the world's largest companies, including 7 years spent at AWS. In my world, websites like Reddit, as large as they are, are still considered to be of 'average' size. I can't help you with the database side, but I'm happy to provide guidance around networking, DNS, scale, automation, security, etc.
I believe @ernest is just about to do a backend refactor on https://kbin.social/. If you have the time and inclination, a ticket outlining some optimisations for horizontal scaling might be timely: https://codeberg.org/Kbin/kbin-core
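For the static-page fallback described above, the decision logic is roughly the following. This is a sketch in plain Python rather than actual Cloudflare Workers code, and `serve`, `fetch_origin`, and `FALLBACK_PAGE` are hypothetical names; the only thing taken from the comment is the idea of answering with a static page pointing at http://join-lemmy.org whenever the origin is unreachable or returns a 5xx.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)

# Hypothetical static page (assumption); the link is the one suggested above.
FALLBACK_PAGE = (
    "<html><body><h1>This instance is overloaded</h1>"
    '<p>Try another instance: <a href="http://join-lemmy.org">join-lemmy.org</a></p>'
    "</body></html>"
)


def serve(path: str, fetch_origin: Callable[[str], Response]) -> Response:
    """Return the origin's response, or a static fallback if the origin is failing."""
    try:
        status, body = fetch_origin(path)
    except OSError:
        # Connection refused, timeout, DNS failure and similar network errors.
        return 503, FALLBACK_PAGE
    if 500 <= status <= 599:
        # The origin answered, but with a server error: hide it behind the static page.
        return 503, FALLBACK_PAGE
    return status, body


# Stand-in origins for demonstration only.
def healthy_origin(path: str) -> Response:
    return 200, "hello from " + path


def failing_origin(path: str) -> Response:
    return 502, "bad gateway"


def unreachable_origin(path: str) -> Response:
    raise ConnectionRefusedError("origin down")


if __name__ == "__main__":
    for origin in (healthy_origin, failing_origin, unreachable_origin):
        status, _ = serve("/", origin)
        print(origin.__name__, "->", status)
```

Passing the origin fetch in as a function keeps the fallback decision separate from the networking, so the same check can be exercised without a live server. Client errors such as 404 are passed through untouched, since they don't indicate that the site is down.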