Technology

102

AI language models can exceed PNG and FLAC in lossless compression, says study (arstechnica.com)

submitted 1 year ago by FlickOfTheBean@beehaw.org to c/technology@beehaw.org

29 comments

While LLMs have been used for... a lot, it seems like this use might be one where it's not only reliable but it appears to outperform existing methods of image compression. Being able to cram more data into less space tends to lead to interesting developments, so I will be keeping my eye on this.

What do you guys think? Does it seem like it deserves less hype than I'm giving it? What kind of security holes do you think this could open?

[–] skip0110@lemm.ee 24 points 1 year ago* (last edited 1 year ago) (2 children)

I think this model has billions of weights, so the model itself is quite large. Since the receiver needs to already have this model, I’d suggest that rather than compressing the data, we have instead pre-encoded it and embedded it in the model weights, so the “compression” is basically just passing a primary key that points to the data stored in the model.

It’s like, if you already have a copy of a book, I can “compress” any text in that book into 2 numbers: a page offset, and a word offset on that page. But that’s cheating because, at some point, we had to transfer the book too!
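
To make the book analogy concrete, here is a minimal Python sketch. The two-page `BOOK`, `encode`, and `decode` are all made up for illustration and have nothing to do with how the study's model works. It only shows that a word shrinks to two small numbers once both sides already hold the same book.

```python
# Hypothetical shared "book": sender and receiver both have it in advance.
BOOK = [
    "it was the best of times it was the worst of times".split(),
    "it was the age of wisdom it was the age of foolishness".split(),
]

def encode(word, book=BOOK):
    """Return (page offset, word offset) of the first occurrence of word."""
    for page_no, page in enumerate(book):
        if word in page:
            return page_no, page.index(word)
    # Anything not already in the book cannot be "compressed" this way.
    raise KeyError(f"{word!r} is not in the shared book")

def decode(pointer, book=BOOK):
    """Look the word back up from its (page offset, word offset) pointer."""
    page_no, offset = pointer
    return book[page_no][offset]

pointer = encode("wisdom")
assert decode(pointer) == "wisdom"
```

The pointer is tiny, but only because the cost of the book was paid up front, and a word that is not in the book cannot be encoded at all.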

[–] puttputt@beehaw.org 14 points 1 year ago

Yeah, it's like saying I can "compress" a png of the Mona Lisa to just the string "Mona Lisa" because I have a database of art.

[–] coffeejunky@beehaw.org 2 points 1 year ago

I feel it's somewhere in the middle. Your book example only works if you already have the book. But if this is a model that is a few gigabytes of data and it works for every movie or audio file, it can still be usable. In that case it's not that you have to send the book first, but you do need to have the same dictionary.
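
A rough sketch of the "same dictionary" idea, using the preset-dictionary (`zdict`) option in Python's standard `zlib` module. This is ordinary DEFLATE, not the LLM-based method from the article, and `SHARED_DICT` and the sample message are invented for illustration. The point is just that a dictionary agreed on ahead of time can shrink messages it was never built from, as long as both ends hold the same copy.

```python
import zlib

# Hypothetical shared dictionary: both sides must hold identical bytes.
SHARED_DICT = b"the quick brown fox jumps over the lazy dog " * 4

def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """DEFLATE-compress data, optionally primed with a preset dictionary."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of compress(); needs the same dictionary that was used to compress."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

# A new message that merely reuses phrases found in the dictionary.
message = b"the lazy dog jumps over the quick brown fox"

plain = compress(message)
with_dict = compress(message, SHARED_DICT)

assert decompress(plain) == message
assert decompress(with_dict, SHARED_DICT) == message
print(len(message), len(plain), len(with_dict))
```

The dictionary-primed output comes out smaller than the plain one here, but that saving only counts if you ignore the cost of getting the dictionary to the receiver in the first place, which is the trade-off being argued about above.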