
Run Ollama on Raspberry PI: Self-Hosted Generative AI

This tutorial will show you how to install Ollama on your Raspberry PI, getting a self-hosted, open-source generative AI system.

Please note that this tutorial applies only to Raspberry PI computer boards. Ollama (like any generative AI) requires a lot of CPU and RAM, so it will probably run well only on the latest Raspberry PI computer models, from the Raspberry PI 4 onward, with 8GB of RAM.
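
If you are not sure how much RAM your board has, you can check it from the terminal with the standard free command:

free -h

The “total” value in the “Mem” row shows the installed RAM.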

What is Ollama

Ollama is an open-source project providing an easy way to run large language models (LLMs) on local devices. As you install it on your local computer, all your data stays there, so you avoid relying on large service providers. The main goal of Ollama is to make AI accessible to anyone by using their own hardware.

You can think of Ollama as a platform giving you all the tools to download and use pre-trained AI models. Once you download a specific model to your Raspberry PI, you can start a chat with it, just as you are used to doing with the most famous AI services.

Many models are available for free from the Ollama Library, where you can get details for each model.

It is important to note that, because of the limited computing and memory resources of our Raspberry PI, we will be able to use only the models with low computing requirements, as you will see by testing them yourself.

Here, I will show you a few models that I’ve tested myself on my Raspberry PI 5 Model B (8GB).

What We Need

As usual, I suggest adding all the needed hardware to your favourite e-commerce shopping cart from the start, so that at the end you will be able to evaluate the overall costs and decide whether to continue with the project or remove the items from the cart. So, the hardware will be only:

  • Raspberry PI Computer Board (including a proper power supply, or using a smartphone micro USB charger with at least 3A). I suggest a computer board with at least 8GB of RAM. A Raspberry PI 5 Model B (or newer) should be enough to run generative AI models with decent performance.
  • High-speed micro SD card. I suggest considering a large micro SD, at least 32 GB, as you will need storage space to host the models locally. I also suggest using a fast micro SD (I used an A1 class micro SD, but it should be at least class 10).

Step-by-Step Procedure

Prepare the Raspberry PI Operating System

The first step is installing Raspberry PI OS Lite (I suggest the 64-bit version, for boards supporting it) to get a fast and light operating system (headless). This is the best solution for this project, as the missing desktop environment means that more resources will be available for your programs. If you need a desktop environment, you can also use Raspberry PI OS Desktop, in this case working from its terminal app. Please find the differences between the 2 OS versions in my Raspberry PI OS Lite vs Desktop article.
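
If your board is already running and you want to check whether the installed OS is 64-bit, you can use the uname command, which prints “aarch64” on a 64-bit OS:

uname -m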

Please make sure that your OS is up to date. From your terminal, use the following command:

sudo apt update -y && sudo apt upgrade -y

Install Ollama on the Raspberry PI

Installing Ollama is made simple by the project’s one-line install command. You will have it installed and running with the following terminal command:

curl -fsSL https://ollama.com/install.sh | sh

You can verify that it is installed by checking the related systemd service:

pi@raspberrypi:~ $ sudo systemctl status ollama.service
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-09-27 12:56:27 CEST; 2h 11min ago
   Main PID: 1267 (ollama)
      Tasks: 26 (limit: 9259)
        CPU: 1h 30min 21.398s
     CGroup: /system.slice/ollama.service
             └─1267 /usr/local/bin/ollama serve

Sep 27 14:42:07 raspberrypi ollama[1267]: llama_new_context_with_model: KV self size  =  896.00 MiB, K (f16):  448.00 MiB, V (f16):  448.00 MiB
Sep 27 14:42:07 raspberrypi ollama[1267]: llama_new_context_with_model:        CPU  output buffer size =     2.00 MiB
Sep 27 14:42:07 raspberrypi ollama[1267]: llama_new_context_with_model:        CPU compute buffer size =   424.01 MiB
...
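
Please note that the service listens, by default, on the local port 11434. Assuming the default port, you can also check that it answers with a simple curl request:

curl http://localhost:11434

It should reply with a short “Ollama is running” message.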

Or you can do it by querying the installed version:

pi@raspberrypi:~ $ ollama -v
ollama version is 0.3.12

Running a Generative AI Model

As you can see, the Ollama installation has been really simple. The same applies to running generative models. You must first choose one from the Ollama Library. Once you have identified your favourite, you can download it locally to your Raspberry PI with the ollama pull <model_name> command.

Please consider that the models can be quite large: the smallest we’re going to use is nearly 700 MB, while the optimized models often have a size of 3-4 GB.
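
For this reason, before pulling a new model, it may be worth checking the free space remaining on your micro SD card:

df -h /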

I’m going to show you 3 language models I tested:

  • tinyllama
  • llama3.1
  • llama3.2

Let’s download the first one:

ollama pull tinyllama

Once the download finishes, you can start a chat with this model with the following command:

ollama run tinyllama

This command will take you into an interactive session with your generative AI. You can ask the AI anything you want, and it will soon print the answer:

pi@raspberrypi:~ $ ollama run tinyllama
>>> Send a message (/? for help)
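
If you prefer a single answer instead of an interactive chat, you can also pass the prompt directly as an argument to the run command, and Ollama will print the answer and return to the terminal. For example (the question is just a sample):

ollama run tinyllama "Why is the sky blue?"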

You will soon note that this model runs fast enough on your Raspberry PI, but some precise questions will generate imprecise answers. This is the big compromise of generative AI systems: the bigger the model, the slower the answers and the more precise they will be. Depending on your needs, you will have to find the right balance between performance and answer quality.

In the chat environment, there are some special inputs allowing you to perform specific actions. You can see them by calling the help with the /? special input:

>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

As you can see, for example, you can exit from the chat session by typing /bye in the chat.
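
As suggested by the last line of the help, you can also send a message spanning multiple lines by wrapping it between triple quotes. For example, you can type something like the following (the text is just a sample):

>>> """
Summarize the following text:
Ollama makes it possible to run large language models locally.
"""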

In the same way, you can also test the other 2 language models by pulling them:

ollama pull llama3.1
ollama pull llama3.2

You can run them as seen in the previous paragraphs.
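
Please also note that, besides the terminal chat, the Ollama service exposes a REST API on the same local 11434 port, so that your scripts can query the models programmatically. The following is a minimal example, assuming the default port and the tinyllama model already pulled:

curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

The command returns a JSON object including the generated answer in its “response” field.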

Simple Differences Between Language Models

By testing these 3 language models on my Raspberry PI, I got the following impressions:

  • tinyllama: this language model performed really well on my Raspberry PI 5 in terms of answer speed and text generation. On the other hand, with more specific questions, the answers started to appear generic or completely wrong.
  • llama3.1: this language model performed slowly on my Raspberry PI 5, but the answers were more precise. Moreover, it was able to answer properly also in languages other than English (I tested it in Italian).
  • llama3.2: this language model was a bit faster than llama3.1, offering an intermediate balance between speed and precision.

Ollama Commands

Besides the pull and run commands, Ollama also has some additional commands that you can discover from your Raspberry PI terminal by calling the related help (ollama -h):

pi@raspberrypi:~ $ ollama -h
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

Besides the others, two important commands are ollama list and ollama rm <model_name>, as they allow you to manage the models stored on your Raspberry PI (whose storage will be limited) and remove the unused ones.

The ollama list command will print the pulled language models, together with their full name and size:

pi@raspberrypi:~ $ ollama list
NAME                ID              SIZE      MODIFIED
llama3.1:latest     42182419e950    4.7 GB    29 minutes ago
llama3.2:latest     a80c4f17acd5    2.0 GB    2 hours ago
tinyllama:latest    2644915ede35    637 MB    3 hours ago

From here, you can remove the models one by one with the second command already presented. For example:

pi@raspberrypi:~ $ ollama rm llama3.1
deleted 'llama3.1'

pi@raspberrypi:~ $ ollama list
NAME                ID              SIZE      MODIFIED
llama3.2:latest     a80c4f17acd5    2.0 GB    2 hours ago
tinyllama:latest    2644915ede35    637 MB    3 hours ago

What’s Next

Interested in more cool projects for your Raspberry PI? Take a look at peppe8o Raspberry PI tutorials.

Enjoy!
