this post was submitted on 24 Jan 2024
12 points (92.9% liked)

Permacomputing


Computing to support life on Earth

Computing in the age of climate crisis is often wasteful and adds nothing useful to our real life communities. Here we try to find out how to change that.

Definition and purpose of permacomputing: http://viznut.fi/files/texts-en/permacomputing.html

XMPP chat: https://movim.slrpnk.net/chat/lowtech%40chat.disroot.org/room

Sister community over at lemmy.sdf.org: !permacomputing@lemmy.sdf.org

There's also a wiki: https://permacomputing.net/

Website: http://wiki.xxiivv.com/site/permacomputing.html


Short reminder:

The human genome, with all its magic, is about 3,117,275,501 base pairs long. Source: Wikipedia

If you encoded that data digitally at 2 bits per base pair and stored it on an SSD, it would take up < 1 GB (roughly 779 MB).

So, if we can do so much magic with less than 1 GB, that should be an inspiration to all software to do more with less space.

Thank you for coming to my talk.

[–] somnuz@lemm.ee 5 points 9 months ago (3 children)

1 GB is okay, can we — for the sake of this argument — compress it?

[–] numberfour002@lemmy.world 4 points 9 months ago (2 children)

Yes, but I don't know how much (and it would vary based on numerous factors).

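An uncompressed format would need 3,117,275,501 × 2 = 6,234,551,002 bits (about 779 MB) to guarantee that it can encode any DNA sequence of 3,117,275,501 base pairs (caveats and nuances aside). Two bits per position is enough because each position holds one of only four bases: A, C, G, or T.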
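To make the arithmetic concrete, here is a minimal Python sketch of a 2-bit encoding. The mapping of bases to bit patterns and the function names (`packed_size_bytes`, `pack`, `unpack`) are my own assumptions for illustration, not any standard genome file format.

```python
GENOME_BASE_PAIRS = 3_117_275_501  # figure quoted in the original post
BITS_PER_BASE = 2                  # four symbols (A, C, G, T) fit in 2 bits

# Arbitrary mapping chosen for this sketch; any one-to-one mapping works.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits each, rounded up."""
    return (n_bases * BITS_PER_BASE + 7) // 8


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray(packed_size_bytes(len(seq)))
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)          # first base goes in the top 2 bits
        out[i // 4] |= CODES[base] << shift
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Inverse of pack(); n_bases is needed because the last byte may be padded."""
    return "".join(
        LETTERS[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(n_bases)
    )


print(packed_size_bytes(GENOME_BASE_PAIRS))  # size in bytes for the whole genome
```

For the full genome this comes out to 779,318,876 bytes, which is where the "< 1 GB" figure comes from. It ignores real-world details such as unknown bases, so treat it as a lower bound for this naive scheme rather than a file format.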

However, human DNA sequences aren't completely random! There are constraints on what would actually be a valid human DNA sequence. That opens the possibility of compressing the data.

For example, you'll never find someone whose genome is 3,117,275,501 copies of the same base pair (i.e. AAAAAAAAAA....AAAAAA), it's impossible. Based solely on that, you don't actually need all 3,117,275,501 × 2 bits of information. In fact, the set of valid human DNA sequences is probably considerably smaller than the set of all possible DNA sequences of the same size (can't find any specific data here, so you'll just have to "trust me bro"). So, a good/smart algorithm can make use of that to generate representations that require fewer bits of storage.

Another aspect of human DNA is that it contains a lot of repeated segments. A quick check of Wikipedia even suggests that two-thirds of human DNA is composed of these repeating patterns. Because they make up so much of our DNA, repeating patterns like that are ripe for compression.

I'm sure there are other aspects at play here, but those two facts in and of themselves pretty much guarantee that we can compress otherwise uncompressed binary representations of human DNA sequences.
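A rough way to see the effect of repeats is to compare a general-purpose compressor on random versus repetitive input. This is only a sketch on synthetic data: the sequences, the 300-base motif, and the names `compressed_ratio`, `random_seq`, and `repetitive_seq` are all made up for illustration. Real genomic repeats are inexact copies, so the gain on actual DNA would likely be much smaller than what this toy shows.

```python
import random
import zlib


def compressed_ratio(seq: str) -> float:
    """Compressed size divided by original size (one ASCII byte per base)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def round_trips(seq: str) -> bool:
    """Check that compression is lossless: decompressing returns the input."""
    raw = seq.encode("ascii")
    return zlib.decompress(zlib.compress(raw, 9)) == raw


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "genome" with no structure: each base chosen uniformly at random.
random_seq = "".join(rng.choice("ACGT") for _ in range(100_000))

# Synthetic "genome" made of one 300-base motif repeated exactly.
motif = "".join(rng.choice("ACGT") for _ in range(300))
repetitive_seq = motif * (100_000 // 300)

random_ratio = compressed_ratio(random_seq)
repetitive_ratio = compressed_ratio(repetitive_seq)

print(f"random:     {random_ratio:.3f}")
print(f"repetitive: {repetitive_ratio:.3f}")
```

The random sequence can't do much better than the 2 bits per base from the uncompressed format, while the repeated one shrinks dramatically, and both decompress back to exactly the original.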

[–] somnuz@lemm.ee 2 points 9 months ago

You! Yes, you. We need this type of approach. Can you now go make some useful software that technically won’t be bloatware, or a game that won’t exceed 50 gigs? Pretty please?

[–] gandalf_der_12te@feddit.de 1 points 9 months ago* (last edited 9 months ago)

Yes, exactly.

And also, we probably don't need most of our genome anyway. IIRC, 90% or so seems to have no apparent function. It's just there as an artifact of evolution, and never got removed.

So, if you left all that out, the actually useful genome would be much less than 3,117,275,501 base pairs.

The thing is, we have no idea which parts are useful and which are not. It is often very difficult to say, and any error would probably lead to disease. So we don't mess with DNA.

[–] JoMomma@lemm.ee 3 points 9 months ago

It is compressed by histones, except during cell division and when specific sections are expanded for reading and transcribing

[–] gandalf_der_12te@feddit.de 1 points 9 months ago (1 children)

yes, though I don't know how much

[–] somnuz@lemm.ee 2 points 9 months ago

Let’s just agree on lossless with solid error correction — we really don’t want to fumble this bag.