cross-posted from: https://lemmy.daqfx.com/post/24701
I'm hosting my own Lemmy instance and trying to figure out how to optimize PSQL to reduce disk IO at the expense of memory.
I accept the increased risk this introduces, but I need to figure out parameters that will let a server with a ton of RAM and reliable power operate without constantly sitting at 20% iowait.
Current settings:
```
# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 32 GB
# CPUs num: 8
# Data Storage: hdd
max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
fsync = off
synchronous_commit = off
wal_writer_delay = 800
wal_buffers = 64MB
```
Most load comes from LCS script seeding content and not actual users.
Solution: My issue turned out to be really banal - Lemmy's PostgreSQL container was pointing at the default location for the config file (/var/lib/postgresql/data/postgresql.conf) and not at the location where I had actually mounted the custom config file (/etc/postgresql.conf). Everything is working as expected after I updated the docker-compose.yaml file to point PostgreSQL at the correct config file. Thanks @bahmanm@lemmy.ml for pointing me in the right direction!
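A quick way to catch this kind of misconfiguration is to ask the running server which config file it actually loaded and spot-check a known value. A minimal sketch from `psql` (the paths and the `shared_buffers` value are just the ones from this particular setup):

```sql
-- Which postgresql.conf did the server load? If this prints
-- /var/lib/postgresql/data/postgresql.conf instead of the mounted
-- /etc/postgresql.conf, the custom config is being ignored.
SHOW config_file;

-- Spot-check one setting from the custom file (8GB per the config above).
SHOW shared_buffers;

-- List every setting the server read from a configuration file,
-- including which file and line each one came from.
SELECT name, setting, sourcefile, sourceline
FROM pg_settings
WHERE source = 'configuration file';
```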
A few things off the top of my head in order of importance:
- How frequently do you `VACUUM` the database? Have you tried `VACUUM`ing a few times over a 5 min span to see if there are changes to the disk I/O afterwards?
- I've got no idea how Lemmy works, but "seeding content", to my mind, possibly means a lot of `INSERT`/`UPDATE`s. Is that correct? If yes, there's a chance you may be thrashing your indices and invalidating them too frequently, which triggers a lot of rebuilding and could swallow a very large portion of the `shared_buffers`. To rule that out, you can simply bump `shared_buffers` (eg 16GB) and `effective_cache_size` and see if it makes any difference.
- Please include a bit more information about PG activity, namely from `pg_stat_activity`, `pg_stat_bgwriter` & `pg_stat_wal`.
- You've got quite a high value for `max_connections` - I don't believe that's the culprit here.
- And finally, if possible, I'd highly recommend that you take a few minutes and install Prometheus, Prometheus node exporter, Prometheus PG exporter and Grafana to monitor the state of your deployment. It's way easier to find correlations between data points using that toolset.
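For the vacuuming and stats-gathering suggestions above, a rough sketch of what to run from `psql` (these views and columns are standard, though `pg_stat_wal` requires PG 14+, which matches the PG 15 server here):

```sql
-- Manually vacuum and refresh planner statistics; repeat a few times
-- over several minutes while watching iowait to see if it moves.
VACUUM (VERBOSE, ANALYZE);

-- What is the server busy with right now?
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle';

-- How are buffers being written? A large buffers_backend relative to
-- buffers_checkpoint suggests shared_buffers is too small for the
-- write workload.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;

-- WAL volume generated since the last stats reset.
SELECT wal_records, wal_bytes, wal_buffers_full
FROM pg_stat_wal;
```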
Your Prom PG Exporter link is a 404 (I think there is a trailing t in the URL).
Do you have any recommendations for Grafana dashboards for PG?
As well as which statistics are important to watch?
This is something I'm going to be putting into my deployment, and it's really easy to get overwhelmed with data!
Oh, updated the link 🤦‍♂️
The stock Grafana dashboard for PG is a good starting point - at least, that's how I started. Then add new metrics to your dashboard as you find you actually need them, as you said.
Don't forget to install node-exporter too. It gives some important bits of info about the PG host. Again the stock dashboard is a decent one to start w/.