Yeah, some of the smaller models are even reasonable on my old laptop in CPU mode.
General rule of thumb: the larger the model, the better it is... but not necessarily. I've found zephyr and mistral are both a good tradeoff and work on CPU. Of the ones that really need more RAM and/or a GPU with a lot of vRAM, mixtral seems like the best.
Additional fun is to use a Modelfile (which is like a Containerfile, but is a recipe for models instead of containers) to customize a local model on top of one of the existing ones.
For a simple one to demonstrate, I have a system instruction to output everything in the form of the poem "This Is Just To Say", but customized per topic.
It really works best with mixtral (I've tried other ones, especially smaller ones):
FROM mixtral
PARAMETER temperature 1
SYSTEM """
You will respond to everything in a modified poem in the form of "This Is Just To Say" by William Carlos Williams, except change all the specifics to be what the subject is. Do not say any other text. Try to make the syllables the same as the original and use the same formatting.
You can expand in length in responses when there is too much to talk about, but keep the format and style of the poem.
Do not respond in any other way.
For reference, the full poem is:
I have eaten
the plums
that were in
the icebox
and which
you were probably
saving
for breakfast
Forgive me
they were delicious
so sweet
and so cold
"""
Yes, you just instruct the system with natural text like that and it (usually) abides. I tried it without the poem being referenced inline, and it mostly worked fine... but it works even better being mentioned in the file.
I have that saved in ~/Projects/ollama/ as Modelfile.fun-plums.
I run the server almost as above, but now also pass in my ollama project directory as a mounted volume with z (for SELinux mapping)... don't forget to run sudo setsebool container_use_devices=true first, else it won't work:
podman run --detach --replace --device /dev/kfd --device /dev/dri --group-add video -v ollama:/root/.ollama -p 11434:11434 -v ~/Projects/ollama:/models:z --name ollama ollama/ollama:0.1.24-rocm
(You can run this command even if you already have the server running; it will replace the old one. This is for AMD; you'd want the NVidia or CPU container if you don't have an AMD card. The CPU container is the fastest to download. The version here is newer than the AMD one I listed above, so it might be a multi-gigabyte download if you don't already have it. The important new part is ~/Projects/ollama:/models:z)
Then, create the model. This will be almost instant if you already have the base model downloaded (in this case, mixtral); otherwise it will auto-download the base model:
podman exec -it ollama ollama create fun-plums -f /models/Modelfile.fun-plums
(The path to the model in this command is the internal path from the point of view within the container.)
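Since the podman command maps the server's port to localhost:11434, you can also confirm the model was created over ollama's HTTP API (the /api/tags endpoint lists installed models). A minimal Python sketch; it degrades gracefully if the server isn't running:

```python
import json
import urllib.request

def extract_names(tags_json):
    # Pull just the model names out of an /api/tags response body.
    return [m["name"] for m in tags_json.get("models", [])]

def model_names(host="http://localhost:11434"):
    # Ask the running ollama server for its installed models.
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return extract_names(json.loads(resp.read()))

if __name__ == "__main__":
    try:
        names = model_names()
        print("fun-plums present:", any(n.startswith("fun-plums") for n in names))
    except OSError:
        # Server not reachable (e.g. container not running); nothing to list.
        print("ollama server not reachable on localhost:11434")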
Then, you run it like any other model.
Here's me running it, and bringing up the topic of leftover pizza.
$ podman exec -it ollama ollama run fun-plums
>>> pizza
I have consumed
the pizza
that was on
the counter
and which
you were likely
saving
for lunch
Forgive me
it was satisfying
so tasty
and so warm
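Besides the interactive CLI, you can hit the custom model programmatically through the same mapped port, using ollama's /api/generate endpoint. A hedged sketch with Python's stdlib urllib (the model name matches the one created above); if the server isn't up, it just prints the request payload it would have sent:

```python
import json
import urllib.request

def build_payload(model, prompt):
    # Request body for ollama's /api/generate endpoint;
    # stream=False returns one complete JSON response.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt, host="http://localhost:11434"):
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    try:
        print(ask("fun-plums", "pizza"))
    except OSError:
        # Server not running locally; show the payload shape instead.
        print(json.dumps(build_payload("fun-plums", "pizza")))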
You can also paste the text from the reader mode of an article and it'll summarize it with a poem based on that one. 🤣
For example, copying and pasting the text from https://www.theverge.com/2024/2/10/24068931/star-wars-phantom-menace-theater-showings-25th-anniversary-may resulted in:
I have watched
the Phantom Menace
that was on
the silver screen
and which
you may have
missed or
disliked once
Forgive me
it has charm
a new sheen
and Darth Maul
It certainly is a differentiator: uBlock Origin already works best on Firefox. https://github.com/gorhill/uBlock/wiki/uBlock-Origin-works-best-on-Firefox
And when Manifest v3 is fully enforced in Chromium (currently slated for July 2024), the more restricted uBlock Origin Lite would need to be used instead.
(I'm not sure if Arc will fully adopt v3, but they might not have a choice at some point in time.)
The Lite version still works well considering all the restrictions, but has a lot of limitations: https://github.com/uBlockOrigin/uBlock-issues/issues/338#issuecomment-1507539114
TL;DR: The way uBlock Origin works on Firefox right now is already better, but if Arc has to go along with Manifest v3 in Chromium in a few months, then it'll be even more of a differentiator.
It also looks like they're even thinking about rolling out their own tracker blocker (instead of using uBlock Origin) as a result of the Manifest v3 changes:
https://www.reddit.com/r/ArcBrowser/wiki/index/#wiki_how_will_arc_handle_the_transition_from_manifest_v2_to_manifest_v3.3F
https://twitter.com/joshm/status/1728926780600508716