this post was submitted on 07 Jan 2024
200 points (96.3% liked)

Selfhosted


I'm interested in hosting something like this, and I'd like to hear about your experiences with it.

My main reasons for hosting this are privacy and being able to integrate my own PKM data (mainly markdown files).

Feel free to recommend me videos, articles, other Lemmy communities, etc.

all 35 comments
[–] woodgen@lemm.ee 27 points 10 months ago* (last edited 10 months ago) (1 children)

I tried a bunch, but the current state of the art is text-generation-webui, which can load multiple models and has a workflow similar to stable-diffusion-webui.

https://github.com/oobabooga/text-generation-webui

[–] CumBroth@discuss.tchncs.de 2 points 10 months ago

I've tried both this and https://github.com/jmorganca/ollama. I liked the latter a lot more; just can't remember why.

GUI for ollama is a separate project: https://github.com/ollama-webui/ollama-webui

[–] dacookingsenpai@lemme.discus.sh 16 points 10 months ago

Absolutely yes. You can try GPT4ALL, which works on any decent CPU-only computer (the minimum I managed to run it on is a 2018 6-core 2.0 GHz ARM64 processor) and has a lot of built-in models. You can also import uncensored models (like the TheBloke ones on Huggingface).

I also tried AutoGPT some time ago, which is quite complex and cool.

[–] CubitOom@infosec.pub 12 points 10 months ago

Check out ollama.

There are a lot of models you can pull from the official library.

Using ollama, you can also run external GGUF models found in places like Huggingface if you use a modelfile, with something as simple as

echo "FROM ~/Documents/ollama/models/$model_filepath" >| ~/Documents/ollama/modelfiles/$model_name.modelfile
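A minimal sketch of the same step in Python, in case you'd rather script it. It writes the one-line modelfile and then prints the `ollama create` / `ollama run` commands you would run next. The function names and the `example.gguf` file name are made up for illustration, and expanding `~` to an absolute path is my own assumption about what is safest, not something ollama is known to require.

```python
import tempfile
from pathlib import Path


def write_modelfile(gguf_path: str, modelfile_path: str) -> Path:
    """Write a one-line modelfile whose FROM points at a local GGUF file."""
    # Expand "~" here so the modelfile contains an absolute path.
    gguf = Path(gguf_path).expanduser().resolve()
    target = Path(modelfile_path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(f"FROM {gguf}\n", encoding="utf-8")
    return target


def ollama_commands(model_name: str, modelfile: Path) -> list[str]:
    """Return the CLI commands that register the model and then run it."""
    return [
        f"ollama create {model_name} -f {modelfile}",
        f"ollama run {model_name}",
    ]


if __name__ == "__main__":
    # Demo in a throwaway directory; example.gguf is a hypothetical file name.
    with tempfile.TemporaryDirectory() as tmp:
        mf = write_modelfile(f"{tmp}/models/example.gguf",
                             f"{tmp}/modelfiles/example.modelfile")
        print(mf.read_text(encoding="utf-8"), end="")
        for cmd in ollama_commands("example", mf):
            print(cmd)
```

The commands are only printed, not executed, so you can check the paths before registering anything.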
[–] sj_zero@lotide.fbxl.net 11 points 10 months ago (1 children)

I've been using a number of different tools which I interface to my nextcloud.

My main nextcloud has an LLM plugin which was really easy to install: you just install the plug-in, make sure that python is configured properly in your path, and then run an occ command to download one of a few models.

https://localai.io/

I also hosted LocalAI, which was a little more involved, but the website did a decent enough job of explaining everything you need to do to get all the different types of AI model working. Besides LLMs, it also supports text to speech, speech to text, and image generation.

Two things that are important: first, if your server doesn't have a pretty advanced video card then you're going to be using the CPU exclusively for AI, and that'll be pretty slow. Second, I found out very quickly that the amount of RAM you have is critical. My main server is a core i5 4th gen, so I put the AI software on another one of my servers, which is a core i5 7th gen. You would think that the latter would work a lot better, but it had half the RAM, and it basically wasn't even able to get started.

Besides hosting AI on a server, if you have a desktop computer or gaming laptop you can run local AI models. There's a fantastic piece of software called Faraday that works pretty well on my laptop. You can run more and more sophisticated models depending on how much memory you have.

https://youtu.be/aLy_vVLUHZk

Krita has DALL-E-style AI image generation available as a plug-in. I haven't used it yet because I only started the download last night before I went to bed, but the installation process as described in the video seems accurate and was extremely easy and mostly automated.

https://youtu.be/AU8NDSBIS1U

[–] ReallyActuallyFrankenstein@lemmynsfw.com 0 points 10 months ago (1 children)

Second, I found out very quickly that the amount of RAM you have is critical. My main server is a core i5 4th gen, so I put the AI software on another one of my servers, which is a core i5 7th gen. You would think that the latter would work a lot better, but it had half the RAM, and it basically wasn't even able to get started.

Is there an amount of RAM that's currently considered the bare minimum for CPU-only self-hosting?

[–] exu@feditown.com 3 points 10 months ago (1 children)

If you're using llama.cpp, have a look at the GGUF models by TheBloke on huggingface. He puts approximate RAM required in the readme based on the quantisation level.

From personal experience I'd estimate 12G for 7B models, based on how full RAM was on a machine with 16 gigs. For Mixtral, at least 32G.
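If you want a rough number before downloading anything, here is a back-of-the-envelope sketch: weight memory is roughly parameter count times bits per weight divided by 8, plus some headroom for context and the runtime. The 1.5 GB overhead default and the ~46.7B parameter count for Mixtral 8x7B are my own assumptions, and the result covers only the model, not the OS or anything else in RAM, so treat the readme figures as the authority.

```python
# Bits per weight for two llama.cpp quantisation formats: each block of
# 32 weights also stores a 16-bit scale, hence the extra half bit.
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q8_0": 8.5}


def estimate_ram_gb(params_billion: float, bits_per_weight: float,
                    overhead_gb: float = 1.5) -> float:
    """Rough RAM need in GB: quantised weights plus a fixed overhead.

    overhead_gb is a hypothetical allowance for context/KV cache and the
    runtime itself; real usage grows with context length.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


if __name__ == "__main__":
    # Parameter counts in billions; the Mixtral figure is approximate.
    models = {"7B": 7.0, "Mixtral 8x7B": 46.7}
    for name, params in models.items():
        for quant, bits in BITS_PER_WEIGHT.items():
            print(f"{name} {quant}: ~{estimate_ram_gb(params, bits):.1f} GB")
```

This only sizes the model itself, which is why it comes out lower than what you see when looking at total RAM in use on a running system.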

Thanks, appreciate it (I'm new to local text CPU models, I know it was a stupid question).

[–] Buffalobuffalo@lemmy.dbzer0.com 10 points 10 months ago

Dbzero Lemmy has a relationship with the Horde AI shared LLM group. My primary use is for chat roleplay, but they have streamlined guides to hosting your own models for personal or horde use. One of the primary interfaces is SillyTavern, but they integrate numerous models.

[–] WeLoveCastingSpellz@lemmy.dbzer0.com 8 points 10 months ago* (last edited 10 months ago)

I use koboldAI. It is local and open source.

[–] hactar42@lemmy.world 5 points 10 months ago

I've played around with a few of them. I've found LM Studio the most robust and user friendly.

[–] Gooey0210@sh.itjust.works 3 points 10 months ago (1 children)

Recently started using HuggingChat 🤗

[–] Potatos_are_not_friends@lemmy.world 3 points 10 months ago (1 children)

HuggingChat for image generation is beautiful, beautiful nightmare fuel.

I seriously love it.

[–] Gooey0210@sh.itjust.works 1 points 10 months ago

I do image generation on AUTOMATIC1111.

Really happy that I switched the text AI to something more open than CloseAI.

[–] AlphaAutist@lemmy.world 3 points 10 months ago (1 children)

I haven’t tried any of them but I did just listen to a podcast the other week where they talk about LlamaGPT vs Ollama and other related tools. If you’re interested it’s episode 540: Uncensored AI on Linux by Linux Unplugged

[–] TCB13@lemmy.world -4 points 10 months ago

“Uncensored” models are bullshit, anything but uncensored. Just ask them for a Windows XP Pro key and you'll see how uncensored they really are.

[–] Imacat@lemmy.dbzer0.com 3 points 10 months ago* (last edited 10 months ago) (1 children)

There’s a local llama subreddit with a lot of good information, and 4chan’s /g/ board will usually have a good thread with a ton of helpful links in the first post. Don’t think there’s anything on lemmy yet. You can run some good models on a decent home PC, but training and fine-tuning will likely require renting some cloud GPUs.

[–] Rolando@lemmy.world 4 points 10 months ago

Don’t think there’s anything on lemmy yet.

!fosai@lemmy.world -- has a good overview/introduction

!localllama@sh.itjust.works

!localllm@lemmy.world

!localai@lemmy.world

!localllama@kbin.social

Some of those are inactive, though.

[–] Haggunenons@lemmy.world 3 points 10 months ago

Mixtral is an amazing one that isn't super slow and doesn't require incredible hardware for a decent speed.

In general this guy has really good videos/tutorials for the latest tools.

[–] amzd@kbin.social 3 points 10 months ago

ollama + codellama works perfectly. I use it from neovim with a plug-in called gen-nvim, I think.

[–] hottari@lemmy.ml 2 points 10 months ago

Last time I checked this, out of all the options available Serge was the simplest to host and use. Though you need a beefy computer to get fast and/or good responses.

[–] SuperiorOne@lemmy.ml 2 points 10 months ago

I'm actively using ollama with docker to run the llama2:13b model. It generally works fine but is heavy on resources, as expected.
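For anyone who wants to call it from a script rather than the CLI, here is a minimal sketch that talks to ollama's HTTP API using only the standard library. It assumes the container publishes ollama's default port 11434 on localhost and that llama2:13b has already been pulled; the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumes the docker container maps ollama's default port to localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama2:13b",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for the ollama API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama2:13b",
             timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    The generous timeout is a guess; CPU-only inference can be slow.
    """
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the container running, `print(generate("Why is the sky blue?"))` should print the model's answer once generation finishes.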

[–] db0@lemmy.dbzer0.com 2 points 10 months ago

If you want to be able to use your models from everywhere securely, then koboldcpp on the AI Horde is your best option. Super easy to set up.

[–] beta_tester@lemmy.ml 1 points 10 months ago

Not with success, but I've been using Huggingface for a couple of days. You may want to have a look at it.

[–] TCB13@lemmy.world -2 points 10 months ago (1 children)

Yes, mostly https://gpt4all.io/ only to find out that even the "uncensored" models are bullshit and won't even provide you with a Windows XP Pro key. That's kind of my benchmark for models nowadays. :P

[–] cashews_best_nut@lemmy.world 3 points 10 months ago

Will it tell you how to make meth?