sh.itjust.works Main Community

Home of the sh.itjust.works instance.

founded 1 year ago

MODERATORS

TheDude@sh.itjust.works

manifex@sh.itjust.works

biela@sh.itjust.works

imaqtpie@sh.itjust.works

Update on SJW upgrade to v19.3 (sh.itjust.works)

submitted 8 months ago* (last edited 8 months ago) by TheDude@sh.itjust.works to c/main@sh.itjust.works

Hello sh.itjust.works community,

Many of you have been eager for an update on when the sh.itjust.works instance will get its upgrade to the latest version of Lemmy. Here's an update, along with a tentative timeline.

In December 2023 I purchased a new server for this community. It took me a while, but I eventually made the time to get it racked at the local datacenter. For the sysadmins lingering around and anyone else who's interested, here are the specs:

Dual Xeon 2.9 GHz CPUs (32 cores total)
256 GB RAM
4 x 1 TB SSD in RAID 10 (with room to add 6 more disks)
10 Gbit networking
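
As a rough illustration of what that disk layout means for capacity, here is a small Python sketch. It assumes equal-size disks, ignores filesystem and metadata overhead, and assumes the 6 extra bays would be filled with 1 TB disks like the existing ones; `usable_tb` is a hypothetical helper, not part of any RAID tool. With 4 x 1 TB in RAID 10, roughly 2 TB is usable.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for equal-size disks, ignoring filesystem overhead (simplified model)."""
    if level == "raid10":
        # Classic nested RAID 1+0: disks are paired into mirrors, then striped,
        # so half of the raw space is usable.
        if disks < 4 or disks % 2:
            raise ValueError("nested RAID 1+0 needs an even number of disks, at least 4")
        return disks * disk_tb / 2
    if level == "raid5":
        # One disk's worth of space holds the distributed parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    raise ValueError(f"unsupported level: {level}")


# Current 4-disk array, and the hypothetical fully populated 10-disk chassis.
for disks in (4, 10):
    print(f"{disks} x 1 TB: RAID 10 = {usable_tb('raid10', disks, 1.0):.0f} TB, "
          f"RAID 5 = {usable_tb('raid5', disks, 1.0):.0f} TB")
```

This makes the trade-off debated in the comments visible: RAID 5 gets more usable space out of the same disks, while RAID 10 gives up half the raw capacity in exchange for no parity math and simple mirror-copy rebuilds.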

While I'm ready to proceed with the upgrade, I've decided to first migrate this instance over to the new hardware. Here are the two reasons:

Those of you who have been around long enough may remember that I've been running this instance on "borrowed" unused resources that were available at the time. There are no more resources available for this instance to grow.
There are reports that the latest version of Lemmy may use more resources. Since we are among the bigger instances, if I ever needed to add resources to keep things fast, I would be limited by the current hardware.

Here's the tentative timeline:

Task	Date	Expected Downtime
Migration to new server	Tuesday February 27 2024 @ 8:00PM ET	90 Minutes
Upgrade to V19.3	Thursday February 29 2024 @ 8:00PM ET	Up to 120 Minutes

If anything major goes wrong on the 27th, I will revert the changes and bring the instance back up on the current server.
If anything major goes wrong on the 29th, I will roll back to an earlier snapshot. If that fails, I will restore from a backup.

During these two planned events, anyone who wants to provide moral support or get periodic updates is more than welcome to join us on our Matrix channel.

=========================================================
Update February 29 2024
We've successfully completed the upgrade to v0.19.3. I'm happy to announce that we did it in an astonishing 27 minutes, a whole 93 minutes under what was expected. The extra legwork done over the last few weeks, combined with the better hardware, definitely played a part. Looking over the processes, the service responsible for images is still doing some work, so you may come across some broken images. I'll be keeping an eye on that over the next while and will make adjustments if needed. Thank you all for the support, and thanks to all of you who kept me company on our Matrix channel. Have a good evening.

=========================================================
Update February 27 2024
We've successfully completed the migration. I'm happy to announce that this instance is now running on its new hardware, dedicated solely to this community! We experienced just under 40 minutes of downtime, a whole 50 minutes less than expected. Please give the instance a chance to catch up on what it missed; we should be good within the next 30 or so minutes. Thank you!

4am@lemm.ee 20 points 8 months ago* (last edited 8 months ago)

Speed is usually the reason. SSDs in general are faster, enterprise SSDs are not only faster but much more write-tolerant and last a very long time in comparison to consumer SSDs.

They can also (in many cases) do write caching at the speed of a DRAM buffer, which makes the SATA or SAS bus itself the bottleneck (SAS is like enterprise SATA: 12 Gb/s as opposed to SATA's 6 Gb/s). NVMe, which connects over PCIe, can be even faster. This means that programs that write data (i.e., Lemmy and its database) aren't left waiting for the drive to acknowledge each write before moving on to other things. Shaving a few milliseconds off each write makes a massive difference when a server under load may be handling thousands of IOPS (input/output operations per second) or more. Low latency is everything in servers.
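
The link between per-write latency and throughput can be sketched with Little's law (throughput = operations in flight / time per operation). The Python below is an illustration under assumed numbers: the 5 ms and 0.05 ms acknowledgment latencies are placeholders, not measurements of any real drive, and `max_iops` is a hypothetical helper that assumes latency stays constant as queue depth grows, which real devices don't guarantee.

```python
def max_iops(latency_ms: float, queue_depth: int = 1) -> float:
    """Completed I/Os per second from Little's law: in-flight requests / time per request.

    Assumes each request takes latency_ms regardless of how many are in flight.
    """
    return queue_depth * 1000.0 / latency_ms


# Illustrative latencies only (assumed, not measured on any real drive).
for label, latency_ms in [("slow write ack", 5.0), ("fast write ack", 0.05)]:
    print(f"{label}: {max_iops(latency_ms):,.0f} IOPS at QD1, "
          f"{max_iops(latency_ms, 32):,.0f} IOPS at QD32")
```

At queue depth 1, a program that waits for every acknowledgment can never exceed 1 / latency operations per second, which is why a fast write acknowledgment from a DRAM cache matters so much for a database that issues many small synchronous writes.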

When you are running a public service with requests coming in constantly and at a high rate, you really do not want storage latency to bottleneck you, because that problem compounds extremely quickly. It's an even bigger issue with HDDs, where mechanical seek times alone add latency before write caching or buffering even comes into play.
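
Why latency "compounds" under load can be illustrated with a textbook M/M/1 queueing model, swapped in here as a simplification of real storage behavior: a single server, random arrivals, and a mean response time of T = S / (1 - ρ), where S is the service time and ρ the utilization. The 5 ms service time is an assumed value for illustration, and `mm1_response_ms` is a hypothetical helper.

```python
def mm1_response_ms(service_ms: float, arrivals_per_s: float) -> float:
    """Mean time in system for an M/M/1 queue: T = S / (1 - rho), with rho = lambda * S."""
    rho = arrivals_per_s * service_ms / 1000.0  # utilization of the single server
    if rho >= 1.0:
        # At or above 100% utilization the queue never settles and waits grow without bound.
        return float("inf")
    return service_ms / (1.0 - rho)


service_ms = 5.0  # assumed per-request storage time, for illustration only
for rate in (50, 100, 150, 190):
    print(f"{rate:>3} req/s -> utilization {rate * service_ms / 10:.0f}%, "
          f"mean latency {mm1_response_ms(service_ms, rate):.1f} ms")
```

In this model, mean latency doubles at 50% utilization and grows twentyfold at 95%, even though the time to serve any single request never changes; shrinking the service time pushes that curve out to much higher request rates.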

We could talk all day about whether four SSDs in RAID 10 are optimal, but sometimes you have to think about budget and complexity as well. For the load that a popular Lemmy instance might currently draw, I'd make an educated guess that this is sufficient for now. Room to expand was also mentioned, which is the second most important part of a storage plan.

_cnt0@sh.itjust.works 2 points 8 months ago

I'd wager RAID 5 would be better, but with 4 SSDs it would either require a dedicated RAID controller or hog the CPU.

burrito@sh.itjust.works 2 points 8 months ago

Software RAID is much faster than you think, even in RAID 5. Many software RAID implementations use SIMD CPU instructions (such as SSE and AVX on x86) to compute parity at a very fast rate. Reading data, by far the most common operation on a Lemmy instance, costs even less than writing, because a healthy array doesn't need to compute parity on reads at all.
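
To make the parity discussion concrete, here is a minimal Python sketch of the XOR parity that RAID 5 relies on. It works on a single stripe of made-up 8-byte blocks; real implementations such as Linux md rotate parity across disks, work on much larger chunks, and use SIMD-optimized XOR routines rather than a Python loop. `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks; RAID 5 parity is the XOR of a stripe's data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID 5: three data blocks plus one parity block.
data = [b"block-A1", b"block-B2", b"block-C3"]
parity = xor_blocks(*data)  # recomputed whenever a write touches this stripe

# A healthy read just returns the data blocks; no parity math is involved.
# If the disk holding data[1] fails, XOR of the surviving blocks rebuilds it.
rebuilt = xor_blocks(data[0], data[2], parity)
assert rebuilt == data[1]
print(rebuilt)
```

For a write that changes only part of a stripe, parity can instead be updated as old_parity XOR old_data XOR new_data, which is where the extra reads and CPU work on RAID 5 writes come from.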

[–] 4am@lemm.ee 2 points 8 months ago

Yeah, ZFS rocks these days. Fast and rock solid for me, even on older hardware. I run my whole array as mirrored vdevs (so, basically a bunch of RAID 10) to keep resilver times down when I replace drives. No issues so far!
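
A back-of-the-envelope model shows why mirrors tend to resilver with less work than parity layouts. This Python sketch is a simplification, not how ZFS actually schedules a resilver: it ignores metadata, compression, fragmentation, and raidz's variable-width stripes, and the 0.5 TB figure and `resilver_read_tb` helper are hypothetical. It only estimates how much data must be read from surviving disks to rebuild one failed disk.

```python
def resilver_read_tb(layout: str, vdev_width: int, allocated_tb: float) -> float:
    """Approximate data read from surviving disks to rebuild one failed disk.

    Simplified model: ignores metadata, compression, fragmentation and
    raidz's variable-width stripes.
    """
    if vdev_width < 2:
        raise ValueError("a redundant vdev needs at least 2 disks")
    if layout == "mirror":
        # Each block is copied from one healthy side of the mirror;
        # other vdevs in the pool are not involved.
        return allocated_tb
    if layout == "raidz1":
        # Each missing block is recomputed from the matching blocks on
        # every other disk in the vdev (data plus parity).
        return (vdev_width - 1) * allocated_tb
    raise ValueError(f"unsupported layout: {layout}")


# Hypothetical example: the failed disk held 0.5 TB of allocated data.
print("mirror:       ", resilver_read_tb("mirror", 2, 0.5), "TB read")
print("5-wide raidz1:", resilver_read_tb("raidz1", 5, 0.5), "TB read")
```

Under these assumptions, a mirror rebuild reads about as much as the failed disk held, from a single partner, while a raidz1 rebuild has to read from every other disk in the vdev, which loads the whole vdev for longer.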