KoboldCpp has documentation on its GitHub page. If that doesn't do it for you, try googling for other guides.
My advice is: do one step at a time. Get it running first, without the fancy stuff. Start with a small model and no GPU acceleration. Then get the acceleration/CUDA working. Then try a bigger model. And only then do the elaborate stuff like keeping some layers in VRAM and the rest in RAM, and raising the context size past the default 2048. Don't do it all at once. That way you can tell which step a problem shows up at.
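Roughly, that progression looks like this. A sketch only: the model filenames are hypothetical placeholders, and the flag names (--usecublas, --gpulayers, --contextsize) are what recent KoboldCpp builds document, so check `python koboldcpp.py --help` for your version:

```
# Step 1: small model, CPU only - just confirm it runs at all
python koboldcpp.py --model llama-7b.ggmlv3.q4_0.bin

# Step 2: same small model, now with CUDA acceleration
python koboldcpp.py --model llama-7b.ggmlv3.q4_0.bin --usecublas

# Step 3: bigger model, everything else still at defaults
python koboldcpp.py --model llama-13b.ggmlv3.q4_0.bin --usecublas

# Step 4: the elaborate stuff - offload some layers to VRAM (rest stays in RAM)
# and raise the context size past the 2048 default
python koboldcpp.py --model llama-13b.ggmlv3.q4_0.bin --usecublas --gpulayers 20 --contextsize 4096
```

If one of those steps breaks, the step that broke tells you where to look.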
(Edit: And make sure to always use the latest version. You're playing with pretty recent stuff that might still have bugs.)
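If you're running from a git checkout, updating is quick. Again just a sketch, assuming you built from source with the cuBLAS flag the README describes:

```
cd koboldcpp
git pull
make clean && make LLAMA_CUBLAS=1   # rebuild with CUDA support after pulling
```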
I can't say much about the Windows side of things or the current state of the integration layers in oobabooga's.