cross-posted from: https://lemmy.daqfx.com/post/24701
I'm hosting my own Lemmy instance and trying to figure out how to optimize PSQL to reduce disk IO at the expense of memory.
I accept the increased risk this introduces, but I need to figure out parameters that will let a server with a ton of RAM and reliable power operate without constantly sitting at 20% iowait.
Current settings:
```
# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 32 GB
# CPUs num: 8
# Data Storage: hdd
max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
fsync = off
synchronous_commit = off
wal_writer_delay = 800
wal_buffers = 64MB
```
Most load comes from LCS script seeding content and not actual users.
Solution: My issue turned out to be really banal - Lemmy's PostgreSQL container was pointing at the default location for the config file (/var/lib/postgresql/data/postgresql.conf) and not at the location where I had actually mounted the custom config file (/etc/postgresql.conf). Everything is working as expected after I updated the docker-compose.yaml file to point PostgreSQL at the correct config file. Thanks @bahmanm@lemmy.ml for pointing me in the right direction!
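A quick way to catch this kind of misconfiguration is to ask the running server which config file it actually loaded and spot-check a known value. A minimal sketch from `psql` (the paths and the `shared_buffers` value are just the ones from this particular setup):

```sql
-- Which postgresql.conf did the server load? If this prints
-- /var/lib/postgresql/data/postgresql.conf instead of the mounted
-- /etc/postgresql.conf, the custom config is being ignored.
SHOW config_file;

-- Spot-check one setting from the custom file (8GB per the config above).
SHOW shared_buffers;

-- List every setting the server read from a configuration file,
-- including which file and line each one came from.
SELECT name, setting, sourcefile, sourceline
FROM pg_settings
WHERE source = 'configuration file';
```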
A few things off the top of my head in order of importance:
- How frequently do you `VACUUM` the database? Have you tried `VACUUM`ing a few times over a 5 min span to see if there are changes to the disk I/O afterwards?
- I've got no idea how Lemmy works, but "seeding content", to my mind, possibly means a lot of `INSERT`/`UPDATE`s. Is that correct? If yes, there's a chance you may be thrashing your indices and invalidating them too frequently, which triggers a lot of rebuilding and could swallow a very large portion of the `shared_buffers`. To rule that out, you can simply bump `shared_buffers` (eg 16GB) and `effective_cache_size` and see if it makes any difference.
- Please include a bit more information about PG activity, namely from `pg_stat_activity`, `pg_stat_bgwriter` & `pg_stat_wal`.
- You've got quite a high value for `max_connections` - I don't believe that's the culprit here.
- And finally, if possible, I'd highly recommend that you take a few minutes and install Prometheus, Prometheus node exporter, Prometheus PG exporter and Grafana to monitor the state of your deployment. It's way easier to find correlations between data points using that toolset.
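For the vacuuming and stats-gathering suggestions above, a rough sketch of what to run from `psql` (these views and columns are standard, though `pg_stat_wal` requires PG 14+, which matches the PG 15 server here):

```sql
-- Manually vacuum and refresh planner statistics; repeat a few times
-- over several minutes while watching iowait to see if it moves.
VACUUM (VERBOSE, ANALYZE);

-- What is the server busy with right now?
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle';

-- How are buffers being written? A large buffers_backend relative to
-- buffers_checkpoint suggests shared_buffers is too small for the
-- write workload.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;

-- WAL volume generated since the last stats reset.
SELECT wal_records, wal_bytes, wal_buffers_full
FROM pg_stat_wal;
```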
Your Prom PG Exporter link is a 404 (I think there is a trailing t in the URL).
Do you have any recommendations for Grafana dashboards for PG?
As well as which statistics are important to watch?
This is something I'm going to be putting into my deployment, and it's really easy to get overwhelmed with data!
Oh, updated the link 🤦‍♂️
The stock Grafana dashboard for PG is a good starting point - at least, that's how I started. Then add new metrics to your dashboard as you find you actually need them, as you said.
Don't forget to install node-exporter too. It gives some important bits of info about the PG host. Again the stock dashboard is a decent one to start w/.