Thanks for this. It will definitely come in handy 👍
Not the same, but similar: Petals.dev
> Run large language models at home, BitTorrent‑style
>
> - Generate text with Llama 2 (70B), Falcon (40B+), BLOOM (176B) (or their derivatives) and fine‑tune them for your tasks — using a consumer-grade GPU or Google Colab.
> - You load a small part of the model, then join a network of people serving the other parts. Single‑batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B) — enough for chatbots and interactive apps.
> - Beyond classic LLM APIs — you can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch and 🤗 Transformers.
Haven't had the time to tinker with any of them myself, so it's just FYI.
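
Going by the example in their README (untested on my end, and the model name is just one of the ones they list as served on the public swarm), basic usage looks roughly like:

```python
# Rough sketch of distributed generation with Petals, following their README example.
# Model name and prompt are placeholders; I haven't run this myself.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # a Llama 2 (70B) derivative

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only a small shard of the model is loaded locally; the remaining layers
# are executed by other peers in the swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```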