this post was submitted on 04 Oct 2024
6 points (87.5% liked)
Rant
You have to tell the AI specifically what it is in order to shape its response. Models tend to default to explaining subjects to the dumbest potential user.
Math, and generalizations like it, covers an enormous range of contexts, so you need to specify. If you are using a more advanced interface that shows the token perplexity scores for the reply, you'll likely see that the AI does not know the context of itself or of the question. And if you are using an ultra-simplistic general interface with a top-p/top-k sampler over the softmax output, this type of reply is almost inevitable. Depending on the model architecture, mirostat sampling would likely give better results in general, but without a visible token perplexity score it is very difficult to tell when the issue is due to the prompt and when it is due to the model itself.
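To make the top-p/top-k point concrete, here is a minimal sketch of what such a sampler does to the softmax distribution: it throws away everything outside the k highest logits and the top-p probability mass, then draws from whatever survives. This is a generic illustration, not any particular interface's implementation, and all the default values are illustrative.

```python
import numpy as np

def sample_next_token(logits, top_k=40, top_p=0.9, temperature=0.8, rng=None):
    """Pick one token id from raw logits using top-k, then top-p (nucleus) filtering."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)

    # Softmax over the surviving tokens.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p: keep the smallest prefix of tokens (by probability) whose mass reaches p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]

    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))
```

Because low-probability tokens are cut off before the draw, an ambiguous prompt never gets the "unlikely but correct" continuation; the sampler can only pick from the blandest high-probability candidates, which is why the generic-explainer reply is almost inevitable.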
One cheap and easy trick is to tell the model a few extra details. This can be as simple as, "You are an AI assistant for MIT undergraduate students." One of my favorites is, "Questions and answers with Richard Stallman's AI Assistant." Since Stallman studied AI and has contributed to systems running in present-day LLMs, this instruction tends to guide competence considerably. The AI will often rise to the higher level of expectations of the associated context.
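In a chat-style API this trick is just a system message prepended to the conversation. A minimal sketch, assuming the common OpenAI-style message format (the role string and example question are illustrative):

```python
def build_messages(system_role, question):
    """Prepend a role-setting system message to steer the model's register."""
    return [
        {"role": "system", "content": system_role},
        {"role": "user", "content": question},
    ]

# The system message sets the expected audience before the question is asked.
messages = build_messages(
    "You are an AI assistant for MIT undergraduate students.",
    "What is an eigenvalue?",
)
```

The same messages list is then passed to whatever chat endpoint or local runtime you are using; the only thing that matters is that the role-setting text comes first.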
Everything you ask for is building momentum. If you use an interface that is data mining and recycling all of your conversations into a hidden history, like ChatGPT or other hosted services, you're relying on the massive model size alone to find a result, without any momentum in the information that is truly available. If you use an open-weights offline model, or have control over the history so that you can remove unrelated questions or conversations, you gain more depth and utility in what you're able to access and how.
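With control over the history, pruning unrelated turns is a one-liner. A minimal sketch, again assuming the OpenAI-style message format; the function name and turn count are my own:

```python
def prune_history(messages, keep_last_turns=3):
    """Keep the system message plus the last N user/assistant exchanges,
    dropping earlier, unrelated turns so they don't steer the next reply."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # Each turn is one user message plus one assistant message.
    return system + dialogue[-2 * keep_last_turns:]
```

Hosted interfaces that keep a hidden history do the opposite: every stray question you ever asked stays in scope, and you can't see or trim what is feeding the next answer.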