this post was submitted on 15 Jun 2023


Different development paths of LLMs (lemmy.intai.tech)

submitted 1 year ago by manitcor@lemmy.intai.tech to c/technology@beehaw.org


cross-posted from: https://lemmy.intai.tech/post/5502

cross-posted from: https://lemmy.intai.tech/post/5501

https://www.interconnects.ai/p/llm-development-paths


[–] Ongar@beehaw.org 6 points 1 year ago (1 children)

Can someone give me a simple explanation of the differences between the main branches on this graphic?

[–] person594@feddit.de 4 points 1 year ago (1 children)

All the colored branches are based on the Transformer neural network architecture. If we allow for a lot of handwaving, a transformer allows each word to look at other words in the sequence in order to make decisions/produce output.
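To make the "each word looks at other words" idea a bit more concrete, here is a minimal sketch of a single attention step in plain Python. The toy vectors and the `attend` helper are made up for illustration, and a real transformer would first pass each word through learned query/key/value projections, which this sketch skips.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Mix the value vectors, weighted by how well each key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, mixed

# Toy 2-d vectors standing in for a three-word sequence (made-up numbers).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The third word "looks at" all three words, itself included.
weights, mixed = attend(vectors[2], vectors, vectors)
```

The weights say how much the third word draws on each word in the sequence; its output is the correspondingly weighted mix of their vectors.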

In a decoder transformer architecture, each word can only look at the words that come before it in the sequence. Such models are naturally suited for the task of next word prediction: you can ask each word to predict what word comes next, and they can look at all the words before them, but cannot "cheat" by looking ahead. These models are used for text generation: start with an empty sequence, and repeatedly predict what word should come next.
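A rough sketch of both halves of that paragraph: the "can only look backwards" rule as a mask, and the generate-one-word-at-a-time loop. `NEXT_WORD` and `predict_next` are hypothetical stand-ins for a trained model, and the sketch starts from a `<start>` marker rather than a truly empty sequence so the toy lookup has something to key on.

```python
def causal_mask(n):
    """mask[i][j] is True when position i is allowed to look at position j."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Hypothetical stand-in for a trained decoder: a lookup keyed on the last word.
# A real model would use every earlier word, which is what the mask permits.
NEXT_WORD = {"<start>": "the", "the": "cat", "cat": "sat", "sat": "<end>"}

def predict_next(words):
    """Predict the next word from the words generated so far."""
    return NEXT_WORD.get(words[-1], "<end>")

def generate(max_words=10):
    """Repeatedly predict the next word and append it to the sequence."""
    words = ["<start>"]
    for _ in range(max_words):
        nxt = predict_next(words)
        if nxt == "<end>":
            break
        words.append(nxt)
    return words[1:]  # drop the start marker

mask = causal_mask(3)
sentence = generate()
```

Row `i` of `mask` is what word `i` may see: itself and everything before it, nothing after.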

On the other hand, in an encoder architecture, each word can look at every other word, in front of and behind it. You can't use these for next word prediction, since they can cheat, but they are commonly used for masked language modeling, a task where we delete some words from the input and then try to predict which words were deleted. It isn't straightforward to generate text with these models, but they can learn a lot of deep statistical properties of text, which can then be used for other tasks/models.
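As a sketch of how masked language modeling data might be prepared: hide some words, and keep the originals as the answers the model must recover. The `mask_words` helper, the `[MASK]` token string, and the 15% default rate are assumptions for illustration (the rate is borrowed from BERT-style setups), not something the graphic or article specifies.

```python
import random

def full_mask(n):
    """Encoder-style visibility: every position may look at every other one."""
    return [[True] * n for _ in range(n)]

def mask_words(words, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Hide a random subset of words; return the masked input and the answers."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    masked = list(words)
    targets = {}  # position -> original word the model should predict
    for i, word in enumerate(words):
        if rng.random() < mask_rate:
            masked[i] = mask_token
            targets[i] = word
    return masked, targets

words = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_words(words, mask_rate=0.3)
```

Because the encoder sees words on both sides of each `[MASK]`, it can use the full context to guess the hidden word, which is exactly why the same setup would count as cheating for next word prediction.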

Some model architectures use both an encoder and a decoder, but I am not too familiar with how they are used for language modeling. The classic example of such a model is a translation system, which uses an encoder to "encode" the source language text and a decoder to generate target language text from that encoding.

[–] Ongar@beehaw.org 1 points 1 year ago

Thanks for the explanation, I think I can wrap my head around that.

[–] bownage@beehaw.org 4 points 1 year ago (1 children)

I was going to ask which are transformer-based, but it's actually nearly all of them (everything non-grey) 😳. I thought BERT still predated transformers. Goes to show how quickly things have evolved in recent years.

[–] manitcor@lemmy.intai.tech 1 points 1 year ago

yeah, the idea of making it transformers all the way down was key, each node is a complete Turing machine.

[–] milkytoast@kbin.social 3 points 1 year ago

Can someone explain Decoder only vs Encoder/Decoder?

[–] Davel23@kbin.social 2 points 1 year ago

Where's Eliza?

[–] sabret00the@beehaw.org 2 points 1 year ago (1 children)

Doesn't Bard run on LaMDA?

[–] notfromhere@lemmy.one 2 points 1 year ago (2 children)

Wikipedia says it was initially based on LaMDA then PaLM.

[–] entropicdrift@lemmy.sdf.org 1 points 1 year ago

And currently PaLM 2

[–] sabret00the@beehaw.org 1 points 1 year ago

So in that case, shouldn't the branch reflect that? It looks as though they're different projects.

[–] entropicdrift@lemmy.sdf.org 2 points 1 year ago

Also of note is RWKV, the only purely RNN-structured LLM. I'm keeping an eye on that one because of its theoretically infinite context length.

[–] ludeth@kbin.social 2 points 1 year ago

I don’t know what transformer vs non transformer means. But the tree is interesting!

[–] Steeve@lemmy.ca 1 points 1 year ago

Companies that open source their code, you love to see it.

[–] thepiguy@lemmy.blahaj.zone 1 points 1 year ago

Whooooa. Neat.