Microsoft Announces: LongNet - Scaling LLM Transformers to 1,000,000,000 Tokens & Context Length (lemmy.world)

submitted 1 year ago by Blaed@lemmy.world to c/technology@lemmy.world

cross-posted from: https://lemmy.world/post/1115513

Microsoft Announces a New Breakthrough: LongNet: Scaling AI/LLM Transformers to 1,000,000,000 Tokens & Context Length

Official Microsoft Breakthroughs:

https://arxiv.org/pdf/2307.02486.pdf

https://github.com/microsoft/unilm/tree/master

See one of the first implementations of LongNet here:

https://github.com/kyegomez/LongNet

In the realm of large language models, scaling sequence length has emerged as a significant challenge. Current methods often grapple with computational complexity or model expressivity, limiting the maximum sequence length. This paper introduces LongNet, a Transformer variant designed to scale sequence length to over 1 billion tokens without compromising performance on shorter sequences. The key innovation is dilated attention, which exponentially expands the attentive field as the distance increases.
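To make the dilated attention idea a bit more concrete, here is a minimal sketch of which token positions end up attending to each other. It assumes the scheme as I read it from the paper: the sequence is cut into segments, and within each segment only every r-th position is kept. The function name `dilated_indices`, its parameters, and the `offset` shift are illustrative assumptions, not the official API.

```python
def dilated_indices(seq_len, segment_len, dilation, offset=0):
    """Positions kept in each segment when keeping every `dilation`-th token."""
    if segment_len <= 0 or dilation <= 0:
        raise ValueError("segment_len and dilation must be positive")
    # Hypothetical shift so that different heads could sample different positions.
    shift = offset % dilation
    kept = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        kept.append(list(range(start + shift, end, dilation)))
    return kept


# Short segments, no gaps: dense local attention.
print(dilated_indices(16, 4, 1))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
# Longer segments with sparser sampling: wider reach, same tokens per segment.
print(dilated_indices(16, 8, 2))   # [[0, 2, 4, 6], [8, 10, 12, 14]]
print(dilated_indices(16, 16, 4))  # [[0, 4, 8, 12]]
```

Attention would then be computed only among the positions inside each kept group. Note that every group above holds four positions, so the work per segment stays flat while the reach grows.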

Features

LongNet offers several compelling advantages:

Linear Computation Complexity: It maintains a linear computational complexity and a logarithmic dependency between tokens.

Distributed Trainer: LongNet can serve as a distributed trainer for extremely long sequences.

Dilated Attention: This new feature is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimization.

(Plus many others that are hard to fit here. Please read the full paper, linked above, for more insights.)
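The linear-complexity claim can be sanity-checked by simply counting query-key pairs. This is a rough sketch, assuming segment lengths and dilation rates that both grow by the same factor at each level (my reading of the paper's geometric schedule). The names `dense_pairs` and `dilated_pairs` and the defaults `w0=4`, `alpha=2` are illustrative assumptions, and causal masking is ignored.

```python
def dense_pairs(n):
    """Query-key pairs scored by standard full attention."""
    return n * n


def dilated_pairs(n, w0=4, alpha=2):
    """Pairs scored when segment length and dilation both grow by `alpha`."""
    total = 0
    w, r = w0, 1  # assumed starting segment length and dilation rate
    while w <= n:
        segments = n // w
        kept_per_segment = w // r  # stays equal to w0 at every level
        total += segments * kept_per_segment ** 2
        w *= alpha
        r *= alpha
    return total


for n in (1024, 4096, 16384):
    print(n, dense_pairs(n), dilated_pairs(n), dilated_pairs(n) / n)
```

With these settings the pairs-per-token ratio in the last column stays below 2 * w0 = 8 no matter how large n gets, while the dense count grows with the square of n.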

Experimental results show that LongNet delivers strong performance on both long-sequence modeling and general language tasks. This work paves the way for modeling very long sequences, such as treating an entire corpus or even the whole Internet as a sequence.

If computation and inference hurdles keep being overcome the way they are now, we may see near-infinite context lengths sooner than many had initially thought. How exciting!

Arxiv Paper | The Abstract:

(Graph not shown here. Take it with a grain of salt: it is not indicative of logarithmic scaling.)

Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. In this work, we introduce LONGNET, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LONGNET has significant advantages:

It has a linear computation complexity and a logarithm dependency between tokens.

It can be served as a distributed trainer for extremely long sequences.

Its dilated attention is a drop-in replacement for standard attention, which can be seamlessly integrated with the existing Transformer-based optimization.

Experiments results demonstrate that LONGNET yields strong performance on both long-sequence modeling and general language tasks.

Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence. Code is available at https://aka.ms/LongNet.

[–] simple@lemmy.world 11 points 1 year ago* (last edited 1 year ago) (1 children)

Woah. Isn't this huge? Now LLMs essentially can have long-term memory or be able to digest entire codebases to answer questions on them. This sounds crazy, but I'll hold my hype until we see a working demo for it.

[–] TheDarkKnight@lemmy.world 8 points 1 year ago (1 children)

Could this in theory enable a persistent assistant that learns and improves with time?

[–] dandelo@lemmy.world 7 points 1 year ago (1 children)

Or something very close to it! Crazy to think that GPT-4's context is only 32k tokens, and it makes you wonder what we’re going to have at our fingertips 3-5 years from now…

[–] Blaed@lemmy.world 1 points 1 year ago

All of these are great thoughts and ponderings! Totally correct in the right circumstances, too.

Massive context lengths that can retain coherent memory and attention over long periods of time would enable all sorts of breakthroughs in LLM technology. At this point, you would be held back by performance, compute, and datasets, rather than LLM context windows and short-term memory. In this context, our focus would be towards optimizing attention or improving speed and accuracy.

Let's say you had hundreds of pages of a digital journal and felt like feeding this to a local LLM (where your data stays private). If the model ran at sufficiently high quality, you could have an AI assistant, coach, partner, or tutor that was caught up to speed with your project's goals, your personal aspirations, and your daily life within a matter of a few hours (or a few weeks, depending on hardware capabilities).

Missing areas of expertise you want your AI to have? Upload and feed it more datasets, Matrix style. Any text-based information that humanity has shared online is available to the model.

From here, you could further finetune your LLM and give it a persona, ending up with an assistant and personal operating system that breaks down your life with you. Or you could simply 'chat' with your life (those pages you fed it) and reflect upon your thoughts and memories, tuned to a super intelligence beyond your own.

Poses some fascinating questions, doesn't it? About consciousness? Thought? You? This is the sort of stuff that keeps me up at night... If you trained a private LLM on your own notes, thoughts, reflections and introspection, wouldn't you be imposing a level of consciousness into a system far beyond your own mental capacities? I have already started to use LLMs on the daily. In the right conditions, I would absolutely utilize a tool like this. We're not at super intelligence yet, but an unlimited context window for a model of that caliber would be groundbreaking.

Information of any kind could be digitized and formatted into datasets (at massive lengths), enabling this assistant or personal database to grow over time with innovations of a project, you, your life, learning and discovering things alongside the intention and desire for it to function. At that point, we're starting to get into augmented human capabilities.

What this means over the course of many years and breakthroughs in models and training methods would be a fascinating thought experiment to consider for a society where everyone is using massive context length LLMs regularly.

Sci-fi is quickly becoming a reality. How exciting! I'm here for it, that's for sure. Let's hope the technology stays free, open, and accessible for all of us to participate in its marvels.

[–] BeepTheJeep@lemmy.world 4 points 1 year ago

Long long net!

[–] cybirdman@lemmy.ca 3 points 1 year ago (1 children)

Oh cool, I'm too lazy to read but what would that mean in terms of resources? Is it going to take an insane amount of memory or just take a long time to process longer chains?

[–] Blaed@lemmy.world 3 points 1 year ago* (last edited 1 year ago)

You are correct in thinking this will demand a lot of compute. Hardware will need to scale to match these context lengths, but that is becoming increasingly possible with things like NVIDIA's Grace Hopper architecture and AMD's recent commitment to expanding their hardware selection for emerging AI markets and demand.
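For a rough feel of the memory side, here is some back-of-the-envelope arithmetic. It assumes a naive implementation that stores one full score matrix for a single attention head in 16-bit floats. Real implementations avoid materializing this matrix, and `score_matrix_bytes` is just an illustrative helper name.

```python
def score_matrix_bytes(seq_len, bytes_per_value=2):
    """Bytes for one dense seq_len x seq_len attention score matrix."""
    return seq_len * seq_len * bytes_per_value


for n in (32_768, 1_000_000, 1_000_000_000):
    gib = score_matrix_bytes(n) / 2**30
    print(f"{n:>13,} tokens -> {gib:,.0f} GiB")
```

At 32,768 tokens this already comes to 2 GiB per head, and at a billion tokens it is 2 * 10^18 bytes, which is why approaches that avoid the full matrix matter so much for contexts this long.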

There are also some really interesting frameworks and hardware developments being made at TinyCorp & TinyGrad that aim to run these emerging technologies efficiently and accessibly. George Hotz, the founder, talks about this in detail in his podcast episode with Lex Fridman, a great watch if you're interested in this sort of stuff.

It is an exciting time for technology and innovation. We have already started to hit exaflops of compute...
