Ollama Deployment

Posted on 2025-05-09 Edited on 2025-06-18 In Tools

These notes record how I set up Ollama, covering installation, environment configuration, and basic usage.

Download and install Ollama

curl -fsSL https://ollama.com/install.sh | sh
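A small post-install check can be sketched as follows, assuming bash. The function name check_ollama_install is hypothetical; it only confirms that the ollama binary ended up on PATH and then prints version information with ollama --version.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Ollama): confirm the install script put `ollama` on PATH.
check_ollama_install() {
  if ! command -v ollama > /dev/null 2>&1; then
    echo "ollama not found on PATH; re-run the install script" >&2
    return 1
  fi
  # Prints version information for the installed ollama binary.
  ollama --version
}
```

Call check_ollama_install right after the script finishes; a non-zero exit status means the binary was not found.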

Download and import model files

1. ollama pull

Use ollama pull to download model files.

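2. Custom models

Importing from GGUF:

Ollama supports importing a GGUF model through a Modelfile:

1. Create a file named Modelfile containing a FROM instruction that points to the local file path of the model you want to import.

FROM ./vicuna-33b.Q4_0.gguf

2. Create the model in Ollama:

ollama create example -f Modelfile

Run the model:

ollama run example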
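The two steps above can be scripted. This is a minimal sketch assuming bash; write_modelfile is a hypothetical helper, not part of Ollama. Besides FROM, it can optionally add a PARAMETER temperature line and a SYSTEM prompt, both standard Modelfile instructions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write_modelfile <gguf_path> <output_file> [temperature] [system_prompt]
# Writes a Modelfile whose FROM line points at a local GGUF file, optionally
# followed by a PARAMETER temperature line and a SYSTEM prompt.
write_modelfile() {
  local gguf_path=${1:-} out=${2:-} temperature=${3:-} system_prompt=${4:-}
  if [[ -z $gguf_path || -z $out ]]; then
    echo "usage: write_modelfile <gguf_path> <output_file> [temperature] [system_prompt]" >&2
    return 2
  fi
  {
    printf 'FROM %s\n' "$gguf_path"
    if [[ -n $temperature ]]; then
      printf 'PARAMETER temperature %s\n' "$temperature"
    fi
    if [[ -n $system_prompt ]]; then
      printf 'SYSTEM """%s"""\n' "$system_prompt"
    fi
  } > "$out"
}
```

Usage, with an illustrative temperature and system prompt:

write_modelfile ./vicuna-33b.Q4_0.gguf Modelfile 0.7 "You are a helpful assistant."
ollama create example -f Modelfile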

ollama run llama3.2
pulling manifest
pulling dde5aa3fc5ff: 100% ▕██████████████████████████████████████████████████████████▏ 2.0 GB
pulling 966de95ca8a6: 100% ▕██████████████████████████████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da: 100% ▕██████████████████████████████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9: 100% ▕██████████████████████████████████████████████████████████▏ 6.0 KB
pulling 56bb8bd477a5: 100% ▕██████████████████████████████████████████████████████████▏   96 B
pulling 34bb5ab01051: 100% ▕██████████████████████████████████████████████████████████▏  561 B
verifying sha256 digest
writing manifest
success
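>>> Who are you?
I am GPT-4, an artificial intelligence language model. My main function is to process and understand natural language and to provide information, answers, and support.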

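CLI reference

Create a model

ollama create creates a model from a Modelfile.

ollama create mymodel -f ./Modelfile

Pull a model

ollama pull llama3.2

This command can also be used to update a local model. Only the difference will be pulled.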
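Because pull only fetches what changed, re-pulling every installed model is a cheap way to update them all. A sketch assuming bash, and assuming that ollama list prints a header row followed by one model per line with the model name in the first column; list_model_names and update_all_models are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the NAME column of `ollama list` output read from stdin,
# skipping the header row and any blank lines.
list_model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Hypothetical helper: re-pull every locally installed model.
update_all_models() {
  local name
  ollama list | list_model_names | while read -r name; do
    echo "Updating $name"
    ollama pull "$name" < /dev/null   # keep pull from consuming the loop's stdin
  done
}
```

Run update_all_models to walk through the output of ollama list and pull each model in turn.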

Remove a model

ollama rm llama3.2

Copy a model

ollama cp llama3.2 my-model

Multiline input

For multiline input, you can wrap the text in """:

>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.

Multimodal models

ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
The image features a yellow smiley face, which is likely the central focus of the picture.

Pass the prompt as an argument


$ ollama run llama3.2 "Summarize this file: $(cat README.md)"
 Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.

Show model information

ollama show llama3.2

List the models on your computer

ollama list

List the models that are currently loaded

ollama ps

Stop a model that is currently running

ollama stop llama3.2

Start Ollama

ollama serve starts Ollama without running the desktop application.

Build

See the developer guide.

Run a local build

Next, start the server:

ollama serve

Finally, in a separate shell, run a model:

ollama run llama3.2
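When ollama serve is started in the background (for example from a script) rather than in a separate shell, it helps to wait until the server answers before calling ollama run. A sketch assuming bash and curl; it polls GET /api/version on the default address http://localhost:11434, and wait_for_ollama and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_for_ollama [base_url] [attempts]
# Polls GET <base_url>/api/version once per second until the server answers.
wait_for_ollama() {
  local base_url=${1:-http://localhost:11434} attempts=${2:-30} i
  for (( i = 1; i <= attempts; i++ )); do
    if curl -fsS --max-time 2 "$base_url/api/version" > /dev/null 2>&1; then
      return 0
    fi
    if (( i < attempts )); then
      sleep 1
    fi
  done
  echo "Ollama did not answer at $base_url after $attempts attempts" >&2
  return 1
}
```

Usage:

ollama serve &
wait_for_ollama && ollama run llama3.2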

REST API

Ollama has a REST API for running and managing models.

Generate a response

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt":"Why is the sky blue?"
}'
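By default /api/generate streams its answer as a series of JSON objects, one per line. Setting "stream": false returns a single object whose response field holds the whole text. The sketch below, assuming bash and curl, also escapes the prompt so that quotes or newlines (for example from $(cat README.md)) do not break the JSON body. json_escape, build_generate_payload, ollama_generate and the OLLAMA_URL variable are hypothetical names; the escaping covers only backslash, double quote, newline, carriage return and tab.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ollama).
# Minimal JSON string escaping: backslash, double quote, newline, CR and tab only.
json_escape() {
  local s=$1 bs='\' q='"' nl=$'\n' cr=$'\r' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}   # must run first so later escapes are not doubled
  s=${s//"$q"/"$bs$q"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$cr"/"${bs}r"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

# build_generate_payload <model> <prompt>: print a non-streaming /api/generate body.
build_generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# ollama_generate <model> <prompt>: POST to the server; OLLAMA_URL is a hypothetical override.
ollama_generate() {
  curl -sS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -d "$(build_generate_payload "$1" "$2")"
}
```

Usage:

ollama_generate llama3.2 "Summarize this file: $(cat README.md)"

If jq is installed, appending | jq -r .response prints only the generated text.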

Chat with a model

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ],
  "stream":false
}'
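/api/chat does not keep earlier turns on the server side; each request carries the whole conversation in messages. A sketch of keeping that history in a bash array, assuming bash; chat_escape, chat_add, chat_payload and CHAT_MESSAGES are hypothetical names, and the escaping handles only backslash, double quote, newline, carriage return and tab.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ollama).
# Minimal JSON string escaping: backslash, double quote, newline, CR and tab only.
chat_escape() {
  local s=$1 bs='\' q='"' nl=$'\n' cr=$'\r' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}   # must run first so later escapes are not doubled
  s=${s//"$q"/"$bs$q"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$cr"/"${bs}r"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

CHAT_MESSAGES=()   # one JSON message object per turn, oldest first

# chat_add <role> <content>: append a turn (role is system, user or assistant).
chat_add() {
  CHAT_MESSAGES+=("{\"role\":\"$1\",\"content\":\"$(chat_escape "$2")\"}")
}

# chat_payload <model>: print the non-streaming /api/chat body for the whole history.
chat_payload() {
  local IFS=,   # join the message objects with commas
  printf '{"model":"%s","messages":[%s],"stream":false}' "$1" "${CHAT_MESSAGES[*]}"
}
```

Usage:

chat_add user "why is the sky blue?"
curl -sS http://localhost:11434/api/chat -d "$(chat_payload llama3.2)"

With stream set to false the reply text is in message.content; add it with chat_add assistant "<reply text>" before asking the next question.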

Install Docker and Open WebUI

Separate installation: Ollama runs on the Docker host, and Open WebUI runs in a Docker container. Make sure to install the CUDA version.
Bundled installation: Ollama and Open WebUI both run in the same Docker container.
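Both layouts can be started with a single docker run. The sketch below assumes bash, and its flags are assumed to match the Open WebUI README (check the current README before relying on them): host port 3000 maps to the container's port 8080, data lives in the open-webui volume, --add-host=host.docker.internal:host-gateway lets the container reach an Ollama running on the host, the :cuda image plus --gpus all covers the GPU case, and the :ollama image bundles Ollama itself (GPU variant here). run_open_webui and the DRY_RUN switch are hypothetical; --gpus requires the NVIDIA Container Toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run_open_webui <separate|separate-cuda|bundled>
# Flags assumed to match the Open WebUI README; set DRY_RUN=1 to print instead of run.
run_open_webui() {
  local mode=${1:-separate}
  # Options shared by every layout: port 3000 -> 8080, persistent data volume, auto-restart.
  local -a args=(run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --restart always)
  case $mode in
    separate)      # Ollama on the host; host.docker.internal resolves to the host gateway.
      args+=(--add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main) ;;
    separate-cuda) # Same, with the CUDA image and all GPUs.
      args+=(--gpus all --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:cuda) ;;
    bundled)       # Ollama inside the same container (GPU variant); models persist in the ollama volume.
      args+=(--gpus=all -v ollama:/root/.ollama ghcr.io/open-webui/open-webui:ollama) ;;
    *)
      echo "unknown mode: $mode (expected separate, separate-cuda or bundled)" >&2
      return 2 ;;
  esac
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf 'docker %s\n' "${args[*]}"
  else
    docker "${args[@]}"
  fi
}
```

DRY_RUN=1 run_open_webui separate-cuda prints the command without running it; drop DRY_RUN to start the container, then open http://localhost:3000.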

Configure cross-origin access for Ollama

sudo vim /etc/systemd/system/ollama.service

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3

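# Listen on all network interfaces instead of only 127.0.0.1
Environment="OLLAMA_HOST=0.0.0.0"
# Accept cross-origin requests from any origin
Environment="OLLAMA_ORIGINS=*"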

[Install]
WantedBy=default.target
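Instead of editing the unit file itself, which the install script is likely to overwrite when Ollama is reinstalled or upgraded, the same variables can live in a systemd drop-in; sudo systemctl edit ollama.service opens an editor on exactly such an override. The function below is a hypothetical scripted equivalent, assuming bash; the optional directory argument exists only so the output can be inspected somewhere harmless first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write_ollama_override [dir]
# Writes override.conf into a systemd drop-in directory for ollama.service.
write_ollama_override() {
  local dir=${1:-/etc/systemd/system/ollama.service.d}
  mkdir -p "$dir" || return 1
  cat > "$dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
EOF
}
```

Run it as root with no argument, then run sudo systemctl daemon-reload and sudo systemctl restart ollama.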


Backup of the original PATH environment line:
Environment="PATH=/usr/local/cuda/bin:/home/yang/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/bin:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/libnvvp:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/Program Files/NVIDIA Corporation/NVIDIA app/NvDLISR:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/e/Windows Kits/10/Windows Performance Toolkit/:/mnt/c/Program Files/NVIDIA Corporation/Nsight Compute 2022.3.0/:/mnt/c/Program Files/Git/cmd:/mnt/c/Program Files/nodejs/:/mnt/c/Program Files/PowerShell/7/:/mnt/c/Program Files/Docker/Docker/resources/bin:/mnt/c/Users/biela/.local/bin:/mnt/c/Users/biela/anaconda3:/mnt/c/Users/biela/anaconda3/Library/mingw-w64/bin:/mnt/c/Users/biela/anaconda3/Library/usr/bin:/mnt/c/Users/biela/anaconda3/Library/bin:/mnt/c/Users/biela/anaconda3/Scripts:/mnt/c/Users/biela/AppData/Local/Microsoft/WindowsApps:/mnt/d/Program Files/w64devkit/bin:/mnt/c/Users/biela/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/biela/AppData/Roaming/npm:/snap/bin:/usr/local/cuda-11.8/bin"

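Reload the systemd daemon, then enable and start the Ollama service. If the service is already running, use sudo systemctl restart ollama instead of start so that the changed environment takes effect.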


sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
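To confirm that the service actually picked up the new variables, systemctl show -p Environment ollama prints the unit's environment on one line prefixed with Environment=. A sketch of pulling a single value out of that line, assuming bash; env_value_from_show is a hypothetical name, and because it splits on spaces, values that themselves contain spaces (such as the PATH backup above) are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: env_value_from_show <VAR>
# Reads `systemctl show -p Environment <unit>` output on stdin and prints VAR's value.
# Splits on spaces, so values containing spaces are not handled.
env_value_from_show() {
  sed -e 's/^Environment=//' | tr ' ' '\n' | sed -n "s/^$1=//p"
}
```

Usage:

systemctl show -p Environment ollama | env_value_from_show OLLAMA_HOST
curl http://localhost:11434

Once the service is up, the curl request should answer with "Ollama is running".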

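WSL2 mirrored mode

Using WSL2 mirrored networking mode for the Docker setup caused many problems.

Troubleshooting approach: the Docker address mapping was wrong, because localhost maps differently on the host and inside WSL2. I had to trace how the machine's network addresses are forwarded, along with the routing table and the firewall.

Check the IP addresses with hostname -I and try accessing Ollama through them:
172.17.0.1 192.168.1.101

Test the port from inside the Docker container: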

docker exec open-webui curl http://host.docker.internal:11434
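Instead of testing one address at a time, every candidate can be probed from inside the container. A sketch assuming bash, a container named open-webui that has curl (as the command above already assumes), and Ollama listening on port 11434 on all interfaces (OLLAMA_HOST=0.0.0.0); candidate_urls and probe_ollama_from_container are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn one line of `hostname -I` output (stdin) into candidate
# Ollama URLs, starting with host.docker.internal.
candidate_urls() {
  local -a ips=()
  local ip
  printf 'http://%s:11434\n' host.docker.internal
  read -ra ips || true   # read returns non-zero on empty input or a missing newline
  for ip in "${ips[@]}"; do
    printf 'http://%s:11434\n' "$ip"
  done
}

# Hypothetical helper: probe_ollama_from_container [container] < urls
# Calls GET <url>/api/version from inside the container for each URL on stdin.
probe_ollama_from_container() {
  local container=${1:-open-webui} url
  while read -r url; do
    if docker exec "$container" curl -fsS --max-time 3 "$url/api/version" > /dev/null 2>&1 < /dev/null; then
      echo "reachable:   $url"
    else
      echo "unreachable: $url"
    fi
  done
}
```

Usage:

hostname -I | candidate_urls | probe_ollama_from_container open-webui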

NAT mode

After switching to NAT mode, there were no address-mapping or cross-origin access problems at all.