Run Ollama on Raspberry PI: Self-Hosted Generative AI
Last Updated on 27th September 2024 by peppe8o
This tutorial will show you how to install Ollama on your Raspberry PI, giving you a self-hosted, open-source generative AI system.
Please note that this tutorial applies only to Raspberry PI computer boards. Ollama (like any generative AI system) requires a lot of CPU and RAM, so it will probably run well only on the latest Raspberry PI computer models, from the Raspberry PI 4 onward, with 8GB of RAM.
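If you are not sure how much RAM your board has, you can quickly check it from the terminal with the standard free command before starting:
free -h
The total value in the Mem: row shows the memory available to your system.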
What is Ollama
Ollama is an open-source project providing an easy way to run large language models (LLMs) on local devices. Since you can install it on your local computer, all your data stays there, so you avoid relying on large service providers. The main goal of Ollama is to make AI accessible to anyone using their own hardware.
You can think of Ollama as a platform giving you all the tools to download and use pre-trained AI models. Once you download a specific model to your Raspberry PI, you can start a chat with it, just as you are used to doing with the most famous AI services.
Many models are available for free from the Ollama Library, where you can get details for each model.
It is important to note that, because of the limited computing and memory resources of our Raspberry PI, we will be able to use only the models with low computing requirements, as you will see by testing them yourself.
Here, I will show you a few models that I’ve tested myself on my Raspberry PI 5 Model B (8GB).
What We Need
As usual, I suggest adding all the needed hardware to your favourite e-commerce shopping cart from now, so that at the end you will be able to evaluate the overall costs and decide whether to continue with the project or remove the items from your cart. So, the hardware will be only:
- Raspberry PI Computer Board (including a proper power supply or using a smartphone micro USB charger with at least 3A). I suggest a computer board with at least 8GB of RAM. A Raspberry PI 5 Model B should be enough to run generative AI models with decent performance.
- High-speed micro SD card. I suggest a large micro SD, at least 32 GB, as you will need storage space to host the models locally. I also suggest using a fast micro SD (I used an A1 class micro SD, but it should be at least class 10).
Step-by-Step Procedure
Prepare the Raspberry PI Operating System
The first step is installing the Raspberry PI OS Lite (I suggest the 64-bit version, for boards supporting it) to get a fast and light operating system (headless). This is the best solution for this project, as the absence of a desktop environment leaves more resources available for running your programs. If you need a desktop environment, you can also use the Raspberry PI OS Desktop, in this case working from its terminal app. Please find the differences between the 2 OS versions in my Raspberry PI OS Lite vs Desktop article.
Please make sure that your OS is up to date. From your terminal, use the following command:
sudo apt update -y && sudo apt upgrade -y
Install Ollama on Raspberry PI
Installing Ollama is made simple by the project’s one-line install script. You will have it installed and running with the following terminal command:
curl -fsSL https://ollama.com/install.sh | sh
You can verify that it is installed by checking the related systemd service:
pi@raspberrypi:~ $ sudo systemctl status ollama.service
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-27 12:56:27 CEST; 2h 11min ago
   Main PID: 1267 (ollama)
      Tasks: 26 (limit: 9259)
        CPU: 1h 30min 21.398s
     CGroup: /system.slice/ollama.service
             └─1267 /usr/local/bin/ollama serve
Sep 27 14:42:07 raspberrypi ollama[1267]: llama_new_context_with_model: KV self size = 896.00 MiB, K (f16): 448.00 MiB, V (f16): 448.00 MiB
Sep 27 14:42:07 raspberrypi ollama[1267]: llama_new_context_with_model: CPU output buffer size = 2.00 MiB
Sep 27 14:42:07 raspberrypi ollama[1267]: llama_new_context_with_model: CPU compute buffer size = 424.01 MiB
...
Alternatively, you can check it by querying the installed version:
pi@raspberrypi:~ $ ollama -v
ollama version is 0.3.12
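As a side note, besides the command line, the Ollama service also exposes a local REST API, listening by default on port 11434. As a minimal sketch (assuming you have already pulled a model, as shown in the next paragraphs), you can query it with curl; the prompt here is just a sample:
curl http://localhost:11434/api/generate -d '{"model": "tinyllama", "prompt": "Why is the sky blue?", "stream": false}'
The answer comes back as JSON, with the generated text in the response field.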
Running a Generative AI Model
As you can see, the Ollama installation is really simple. The same applies to running generative models. You must first choose them from the Ollama Library. Once you have identified your favourite, you can download it locally to your Raspberry PI with the ollama pull <model_name> command.
Please consider that the models can be quite large: the smallest we’re going to use is nearly 700 MB, while the optimized models often have a size of 3-4 GB.
I’m going to show you 3 language models I tested:
- tinyllama
- llama3.1
- llama3.2
Let’s download the first one:
ollama pull tinyllama
Once the download finishes, you can start a chat with this model with the following command:
ollama run tinyllama
This command will drop you into an interactive session with your generative AI. You can ask the AI anything you want, and it will soon print the answer:
pi@raspberrypi:~ $ ollama run tinyllama
>>> Send a message (/? for help)
You will soon notice that this model runs fast enough on your Raspberry PI, but specific questions may generate imprecise answers. This is the big trade-off of generative AI systems: the bigger the model, the slower it answers, but the more precise the answers will be. Depending on your needs, you will have to find the right balance between performance and answer quality.
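As a side note, if you just need a quick one-shot answer without entering the interactive session, you should also be able to pass the prompt directly on the command line (the question below is just a sample):
ollama run tinyllama "Why is the sky blue?"
The model will print its answer and then return you to the shell.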
In the chat environment, there are some special inputs that allow you to perform specific actions. You can list them by calling the help with the /? special input:
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.
As you can see, for example, you can exit the chat session by typing /bye in the chat.
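Similarly, as the help suggests, you can use the """ delimiter to send a multi-line message, with something like the following (the text is just a sample):
>>> """
Write a haiku about
the Raspberry PI.
"""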
In the same way, you can also test the other 2 language models by pulling them:
ollama pull llama3.1
ollama pull llama3.2
You can run them as seen in the previous paragraphs.
Simple Differences Between Language Models
By testing these 3 language models on my Raspberry PI, I got the following impressions (with a simple way to measure speed shown right after the list):
- tinyllama: this language model performed really well on my Raspberry PI 5 in terms of answer speed and text generation. On the other hand, with more specific questions the answers started to appear generic or completely wrong.
- llama3.1: this language model performed slowly on my Raspberry PI 5, but the answers were more precise. Moreover, it was able to answer properly in languages other than English too (I tested it in Italian).
- llama3.2: this language model was a bit faster than llama3.1, offering an intermediate balance between speed and precision.
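If you want to turn these impressions into numbers, a simple trick is running a one-shot prompt with the --verbose flag (not shown in the outputs below, but available for the run command), which makes Ollama print timing statistics after the answer, including the evaluation rate in tokens per second. For example:
ollama run llama3.2 --verbose "Describe the Raspberry PI in one sentence."
This makes it easy to compare the models’ speed on your specific board.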
Ollama Commands
Besides the pull and run commands, Ollama also has some additional commands that you can discover from your Raspberry PI terminal by calling the related help (ollama -h):
pi@raspberrypi:~ $ ollama -h
Large language model runner
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
Besides the others, two important commands are ollama list and ollama rm <model_name>, as they allow you to manage the models stored on your Raspberry PI (whose storage will be limited) and remove the unused ones.
The ollama list command will print the pulled language models, together with their full names and sizes:
pi@raspberrypi:~ $ ollama list
NAME               ID              SIZE      MODIFIED
llama3.1:latest    42182419e950    4.7 GB    29 minutes ago
llama3.2:latest    a80c4f17acd5    2.0 GB    2 hours ago
tinyllama:latest   2644915ede35    637 MB    3 hours ago
From here, you can remove the models one by one with the second command presented above. For example:
pi@raspberrypi:~ $ ollama rm llama3.1
deleted 'llama3.1'
pi@raspberrypi:~ $ ollama list
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    2 hours ago
tinyllama:latest   2644915ede35    637 MB    3 hours ago
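Since the models compete for your micro SD space, it may also be useful to check the free storage with the standard df command. With the default installation script, the models should be stored under the ollama user’s home (please note that this path may change between Ollama versions):
df -h /
sudo du -sh /usr/share/ollama/.ollama/models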
What’s Next
Interested in more cool projects for your Raspberry PI? Take a look at peppe8o Raspberry PI tutorials.
Enjoy!