this post was submitted on 24 Jan 2024
12 points (92.9% liked)

Permacomputing


Computing to support life on Earth

Computing in the age of climate crisis is often wasteful and adds nothing useful to our real life communities. Here we try to find out how to change that.

Definition and purpose of permacomputing: http://viznut.fi/files/texts-en/permacomputing.html

XMPP chat: https://movim.slrpnk.net/chat/lowtech%40chat.disroot.org/room

Sister community over at lemmy.sdf.org: !permacomputing@lemmy.sdf.org

There's also a wiki: https://permacomputing.net/

Website: http://wiki.xxiivv.com/site/permacomputing.html


Short reminder:

The human genome, with all its magic, is about 3,117,275,501 base pairs long. Source: Wikipedia

If you encoded that data digitally, at 2 bits per base, and stored it on an SSD, it would take up less than 1 GB (roughly 780 MB).
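A quick back-of-the-envelope check of that figure (a rough sketch; it assumes the standard 2-bits-per-base packing for A/C/G/T and ignores ambiguity codes):

```python
# Back-of-the-envelope: size of one human genome at 2 bits per base.
BASE_PAIRS = 3_117_275_501   # length cited above
BITS_PER_BASE = 2            # A, C, G, T -> 2 bits each

total_bits = BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits / 8

print(f"{total_bytes / 1e6:.0f} MB")   # ~779 MB, comfortably under 1 GB
```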

So, if so much magic fits in under 1 GB, that should be an inspiration for all software to do more with less space.

Thank you for coming to my talk.

you are viewing a single comment's thread
[–] gandalf_der_12te@feddit.de 1 points 10 months ago (1 children)

I think it might be even less than that per person, considering that two humans share about 99.9% of their genome.

So the differences are maybe 1 MB per person (0.1% of roughly 780 MB).

For 8 billion people, that would make for about 8000 TB.

And then there's compression. Obviously, there's going to be a lot of redundancy if you take the genes of all humans, since they are more or less just passed on with slight modifications.

So it's definitely going to be 80 TB or less.
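A sketch of the arithmetic behind those estimates (assuming 8 billion living people and the ~780 MB per-genome figure from the post; the 100x compression factor at the end just mirrors the comment's guess, it is not a measured number):

```python
# Rough storage estimate for "everyone's genome", following the comment's logic.
GENOME_BYTES = 780e6        # one genome at 2 bits per base (see above)
SHARED_FRACTION = 0.999     # two humans are ~99.9% identical
POPULATION = 8e9            # living humans, roughly

diff_per_person = GENOME_BYTES * (1 - SHARED_FRACTION)   # ~0.78 MB, i.e. "maybe 1 MB"
total = POPULATION * diff_per_person                      # thousands of TB

print(f"per person: {diff_per_person / 1e6:.1f} MB")
print(f"population total: {total / 1e12:.0f} TB")

# If redundancy between people lets a compressor shave off another ~100x,
# the total lands in the tens-of-terabytes range the comment guesses at.
print(f"with 100x compression: {total / 100 / 1e12:.1f} TB")
```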

[–] Sibbo@sopuli.xyz 1 points 10 months ago* (last edited 10 months ago) (1 children)

Right, peta is two steps above giga. Then I'll go with one terabyte. Well, then there is roughly 125 bytes per genome. Hmm, that is a bit little. Maybe the 80 TB estimate is quite good. Then it would be about 10 KB per genome.

You could probably build a phylogenetic tree with some heuristic, and then the differences along the edges would be very small.
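A minimal sketch of that tree-delta idea, with a made-up toy tree, reference sequence and edit format (real tools would of course work very differently):

```python
# Toy sketch: store a full sequence only at the root, and small edit lists on edges.
from typing import Dict, List, Tuple

Edit = Tuple[int, str]   # (position, new base)

root_sequence = "ACGTACGTACGT"            # hypothetical ancestral sequence
parent: Dict[str, str] = {"child": "root", "grandchild": "child"}
edits_on_edge: Dict[str, List[Edit]] = {
    "child": [(3, "G")],                  # differs from root at one position
    "grandchild": [(7, "A")],             # differs from its parent at one position
}

def reconstruct(node: str) -> str:
    """Walk up to the root, then replay the edits back down to this node."""
    path = []
    while node != "root":
        path.append(node)
        node = parent[node]
    seq = list(root_sequence)
    for n in reversed(path):
        for pos, base in edits_on_edge[n]:
            seq[pos] = base
    return "".join(seq)

print(reconstruct("grandchild"))   # ACGGACGAACGT: two single-base changes from the root
```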

Or, build an index of all variants, and then represent each genome as a compressed bitvector with a one for each variant it contains.
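A minimal sketch of that variant-index idea, with made-up toy variants (real pangenome formats such as VCF or graph pangenomes are far more involved):

```python
# Toy sketch: a shared index of known variants, plus one bit per variant per genome.
# Variant identifiers here are made-up (position, alt-base) pairs for illustration.
from typing import Dict, List, Set, Tuple

Variant = Tuple[int, str]   # (position in reference, alternative base)

def build_index(genomes: List[Set[Variant]]) -> Dict[Variant, int]:
    """Assign each distinct variant seen in any genome a column in the bitvector."""
    index: Dict[Variant, int] = {}
    for variants in genomes:
        for v in sorted(variants):
            index.setdefault(v, len(index))
    return index

def to_bitvector(variants: Set[Variant], index: Dict[Variant, int]) -> bytes:
    """One bit per known variant: 1 if this genome carries it, 0 otherwise."""
    bits = bytearray((len(index) + 7) // 8)
    for v in variants:
        i = index[v]
        bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)

# Hypothetical example: three genomes described only by their variants.
genomes = [
    {(1_000, "A"), (52_000, "T")},
    {(1_000, "A")},
    {(52_000, "T"), (90_001, "G")},
]
index = build_index(genomes)
vectors = [to_bitvector(g, index) for g in genomes]
print(len(index), "variants ->", len(vectors[0]), "byte(s) per genome before compression")
```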

Well, it now seems that this would still be a lot of variants, given how many single bases can differ. So maybe 80 TB is a bit too little.

[–] gandalf_der_12te@feddit.de 1 points 10 months ago (1 children)

Yeah, but nobody's gonna encode all of humanity's genes at once. It's like adding up the game save data of all users combined. It doesn't make sense.

Normally, you look at the storage space for one individual at a time.

[–] Sibbo@sopuli.xyz 2 points 10 months ago (1 children)

There is an entire research field about looking at sets of genomes; it's called pangenomics. I think there are hundreds of thousands of human genomes of available data right now. I know for sure it was ten thousand a few years ago.

Considering multiple genomes is one of the keys to understanding the effects of the different genomic variants on the individual. One can for example look at various chronic diseases and see if there is anything special about the genomes of the sick individuals compared to the healthy ones.

This requires a lot of samples, because just comparing one sick and one healthy individual will bring up a lot of false positive variants that differ between the individuals but are not related to the disease.
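A toy illustration of that sampling point, with made-up variant sets (real association studies use proper statistical tests, not a raw frequency cutoff):

```python
# Toy illustration: with one sick and one healthy genome, every difference looks
# "disease-related"; with groups, unrelated differences mostly wash out.
import random

random.seed(0)
ALL_VARIANTS = list(range(1000))        # hypothetical catalogue of variant IDs
DISEASE_VARIANT = 42                    # the one variant we pretend matters

def random_genome(sick: bool) -> set:
    variants = {v for v in ALL_VARIANTS if random.random() < 0.1}  # background noise
    if sick:
        variants.add(DISEASE_VARIANT)
    return variants

sick_group = [random_genome(True) for _ in range(200)]
healthy_group = [random_genome(False) for _ in range(200)]

# One-vs-one comparison: every variant unique to the sick individual is a "candidate".
one_vs_one = sick_group[0] - healthy_group[0]
print("candidates from a single pair:", len(one_vs_one))

# Group comparison: keep only variants that are much more frequent among the sick.
def frequency(variant, group):
    return sum(variant in g for g in group) / len(group)

enriched = [v for v in ALL_VARIANTS
            if frequency(v, sick_group) - frequency(v, healthy_group) > 0.5]
print("candidates from group frequencies:", enriched)
```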

[–] gandalf_der_12te@feddit.de 2 points 10 months ago

thanks, I hadn't thought of that.