'Up to 1,000x faster': AI startup wants to make GPU training obsolete with an extraordinary piece of tech — meet the Tsetlin machine which may come to a device near you sooner than you think (www.techradar.com)

submitted 8 months ago by ooli@lemmy.world to c/technology@lemmy.world

kromem@lemmy.world 38 points 8 months ago

For anyone interested in algorithmic changes that improve efficiency, Microsoft's recent research around moving from floating point weights to ternary ones (1, 0, -1) was really impressive:

https://arxiv.org/abs/2402.17764

Basically, at larger parameter sizes it matches or outperforms FP networks while using a fraction of the memory and replacing nearly all of the multiplications in matrix multiplication with additions and subtractions.
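To make the "no multiplication" point concrete, here's a rough sketch in plain Python. It is not code from the paper; `ternary_matvec` and `bits_per_weight` are names I made up for illustration. With every weight restricted to 1, 0 or -1, each term of a matrix-vector product is either added, skipped, or subtracted:

```python
import math

def ternary_matvec(weights, x):
    """Compute W @ x for W with entries in {-1, 0, 1} using only add/subtract."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add the input
            elif w == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0: the input is skipped entirely
        out.append(acc)
    return out

# Hypothetical 2x3 ternary weight matrix and input vector.
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.0]
y = ternary_matvec(W, x)

# Information content of one three-valued weight, in bits.
bits_per_weight = math.log2(3)
```

Three possible values per weight works out to log2(3), roughly 1.58 bits, versus 16 bits for an FP16 weight, which is where the memory savings come from.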

It kind of makes sense that it works, too. Past research suggests that networks create a virtualized node topology out of combinations of physical nodes, so with enough nodes to work with there isn't a loss in functionality. The discrete weights should also arrive at optimal thresholds more easily than slight adjustments to FP values would.

The next generation of models built on this will need to be trained from scratch (this is about pretraining, not quantization after the fact), but it should open the door to new hardware architectures better optimized for networks of ternary weights.
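For a sense of what "ternary during training" can look like: as I understand the paper, the full-precision weights are scaled by their mean absolute value and then rounded and clipped to {-1, 0, 1} in the forward pass. Below is a minimal sketch of just that rounding step, assuming that reading is right. `absmean_ternarize` and `eps` are my own names, and the rest of the training loop (gradients, activation quantization) is left out.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Map a 2D list of float weights to {-1, 0, 1} using an absmean scale.

    Returns the ternary matrix and the scale that was used.
    """
    flat = [w for row in weights for w in row]
    # Scale = average absolute weight (eps avoids division by zero).
    scale = sum(abs(w) for w in flat) / len(flat) + eps

    def round_clip(v):
        # Round to the nearest integer, then clamp into [-1, 1].
        # Note: Python's round() rounds exact halves to the nearest even integer.
        return max(-1, min(1, round(v)))

    ternary = [[round_clip(w / scale) for w in row] for row in weights]
    return ternary, scale

# Hypothetical full-precision weights, for illustration only.
fp_weights = [[0.9, -0.05, -1.2],
              [0.1, 0.7, -0.65]]
ternary, scale = absmean_ternarize(fp_weights)
```

Small weights collapse to 0 and large ones saturate at 1 or -1, so the model has to learn around those constraints during training rather than having them imposed afterwards.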