this post was submitted on 31 Jul 2023
You're not telling me you're reading 62 TB in 117 hours, right? Right? xD The old ones were even in the petabytes.
Those numbers are just insane. I have worked with AI training and storage. I have never seen such numbers.
Well, I suppose that NVMe was very much EOL. Now I understand the behavior. This many operations in such a short time will put serious strain on your system. No wonder parts can give up. Are you using a RAID config? Sorry if you already mentioned it.
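For a sense of scale, here is a quick back-of-the-envelope sketch of what 62 TB in 117 hours means as a sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and that the load was spread evenly over the whole period, which it almost certainly wasn't; the function name is just made up for illustration.

```python
def average_throughput_mb_s(terabytes: float, hours: float) -> float:
    """Average throughput in MB/s, using decimal units (1 TB = 10**12 bytes)."""
    total_bytes = terabytes * 10**12
    seconds = hours * 3600
    return total_bytes / seconds / 10**6


# Numbers from the comment above: 62 TB read within 117 hours.
rate = average_throughput_mb_s(62, 117)
print(f"{rate:.0f} MB/s sustained")  # 147 MB/s sustained
```

So roughly 147 MB/s around the clock, if the load really were constant.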
I am not sure about those numbers on the new ones... it was one DB restore and a few hours of uptime, a scrub, then I rsynced some stuff over, and since then the thing has been idle 🤷
Sample of the currently active system. I think at time of arrival it was 2+ TB written or something.
I might now understand what happened to your NVMe (just a guess):
SSDs have "spare" sectors that are not available to you until the old ones are used up. Then the spares get cycled in.
The other info said: no spare available, usage 250%
I have read about this, I think. If the spare sectors run out and the drive starts to get smaller and smaller, the system will still fill it up to its old capacity and overwrite data, thus corrupting itself.
That's what happens with phony SSDs that get sold as TB drives but are really 250 GB USB drives. As long as you only fill 250 GB, you won't notice that something is wrong. Once you go above that, you start losing data.
Not totally sure it works that way in SSDs, but I'm somewhat sure this 250% usage is an indicator of a worn-out SSD.
And I still think it is pure negligence of Hetzner not to have swapped them out when they were due.
Didn't they run in RAID 1 or something? Usually, if a drive fails, the second one should hold.
I am a bit confused now... the spare was 98%, as you can read in my snippet above. Where does it say "no spare available"? I think it is on me to request a swap, and that's what I did, as the one with slightly less wear also reported 255% used, which afaik is an approximate estimate of how much of the rated lifetime is used up, based on r/w cycles (not sure about all factors).
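A rough sketch of how those two SMART values can be read side by side, since they measure different things. As far as I know, the NVMe spec lets "Percentage Used" go above 100 and caps it at 255, while "Available Spare" is the remaining spare capacity in percent. The function name and the 10% default threshold are assumptions for illustration; the real threshold is reported by the drive itself.

```python
def summarize_nvme_wear(available_spare_pct: int, percentage_used: int,
                        spare_threshold_pct: int = 10) -> str:
    """Summarize two NVMe health fields; the threshold default is an assumption."""
    notes = []
    if available_spare_pct < spare_threshold_pct:
        notes.append("spare blocks below threshold")
    else:
        notes.append(f"spare blocks fine ({available_spare_pct}% left)")

    if percentage_used >= 255:
        # 255 is the maximum the field can hold, so the real value may be higher.
        notes.append("rated endurance exceeded, counter saturated at 255%")
    elif percentage_used > 100:
        notes.append(f"rated endurance exceeded ({percentage_used}%)")
    else:
        notes.append(f"{percentage_used}% of rated endurance used")
    return "; ".join(notes)


# Values mentioned in this thread: 98% spare, 255% used.
summary = summarize_nvme_wear(98, 255)
print(summary)
```

So a drive can be far past its rated endurance and still report plenty of spare blocks; the two numbers don't contradict each other.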
The one the hoster left in for me to play with, said no:
Tried multiple kernel flags and stuff but couldn't get past that error. It would have been interesting to have the hoster ship the thing to me (and maybe that would have been a long enough cooldown to get the thing working again), but I assume that would have been expensive from Helsinki.
My bad. I must have misread. Sorry.
Yes, shipping it to you would probably have been a good idea. Does it cost a lot less to use the Helsinki location? Otherwise Falkenstein would be a pretty good alternative, I guess.