About how much free space do you need on your steam deck to do this? 20 gigs?
I am very excited to try this! Thank you for writing the guide.
Also in the "Install Distrobox and Podman" step the link for podman is deprecated. I am going to try using lilipod from here https://github.com/89luca89/lilipod/releases/download/v0.0.2/lilipod-linux-amd64
Another helpful detail: if someone goes through blindly punching in the commands at the "Let’s install ROCm, AMD’s GPU drivers/programming environment" step, this link will fail:
http://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_5.7.50700-1_all.deb
I don't know enough about this stuff to know whether you can assume the latest version of amdgpu will work, but if you want to link to a specific version in your blog post, this is the permanent link:
http://repo.radeon.com/amdgpu-install/5.7/ubuntu/jammy/amdgpu-install_5.7.50700-1_all.deb
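For reference, the sequence I would expect to work with the pinned installer looks like this. I have not verified it end to end on the Deck, and the --no-dkms flag is my assumption for skipping kernel module builds inside the distrobox container:

wget http://repo.radeon.com/amdgpu-install/5.7/ubuntu/jammy/amdgpu-install_5.7.50700-1_all.deb
sudo apt install ./amdgpu-install_5.7.50700-1_all.deb
# assumption: skip DKMS kernel modules, since the container runs on SteamOS's own kernel
sudo amdgpu-install --usecase=rocm --no-dkms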
This reddit thread also has some useful information for getting started with different LLM models, specifically the quote below:
https://www.reddit.com/r/LocalLLaMA/comments/18hzun0/sharing_a_simple_local_llm_setup/
--------------------------------
Steps:
Install llama.cpp, the steps are detailed in the repo.
Download an LLM from huggingface.
For those not familiar with this step, look for anything that has `GGUF` in its name. You will probably find that on TheBloke's page. If you are not sure which one to choose, start with one that has lots of likes or downloads, or browse this community for impressions and feedback. Once you find the model you like, go to its page, click on `Files and versions` and then choose a file that ends with `.gguf` and download it. If you are not familiar with the sizes, go for Q4_K_M and make sure the size of the file seems to be something that can fit in your GPU or CPU memory.
Start the llama.cpp server, here is the command I use to start Mixtral with GPU:
./server -m ./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --alias gpt-3.5-turbo --n-gpu-layers 10000 --mlock --port 3077 --cont-batching --ctx-size 4096
I started the server on port 3077, which means that the endpoint at `http://127.0.0.1:3077/v1/chat/completions` is up. In `Chatbox` UI, I can just go to settings and replace the API Host with `http://127.0.0.1:3077` and the app runs with my local model now.
-----------------
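To build on the quoted download step: if you prefer the command line over the browser, the huggingface_hub CLI can fetch a single GGUF file. A sketch, assuming huggingface-cli is installed (pip install huggingface_hub) and using TheBloke's Mixtral repo as an example; swap in whichever model and quant you picked:

# download one quantized file from the model repo into the current directory
huggingface-cli download TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --local-dir .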
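And once the quoted server command is running, you can sanity-check the OpenAI-compatible endpoint with curl before pointing a UI at it. A minimal sketch, assuming the same --alias gpt-3.5-turbo and port 3077 as above; the prompt is just a placeholder:

# send one chat request to the local llama.cpp server
curl http://127.0.0.1:3077/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello from the Steam Deck"}]}'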