<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Inference &amp; Deployment on 杨の草原</title><link>https://thinkless-github-io.pages.dev/tags/%E6%8E%A8%E7%90%86%E9%83%A8%E7%BD%B2/</link><description>Recent content in Inference &amp; Deployment on 杨の草原</description><generator>Hugo</generator><language>zh-CN</language><lastBuildDate>Fri, 19 Dec 2025 17:14:09 +0800</lastBuildDate><atom:link href="https://thinkless-github-io.pages.dev/tags/%E6%8E%A8%E7%90%86%E9%83%A8%E7%BD%B2/index.xml" rel="self" type="application/rss+xml"/><item><title>Multimodal Reasoning Models</title><link>https://thinkless-github-io.pages.dev/posts/%E5%A4%9A%E6%A8%A1%E6%80%81%E6%8E%A8%E7%90%86%E6%A8%A1%E5%9E%8B/</link><pubDate>Fri, 19 Dec 2025 17:14:09 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/%E5%A4%9A%E6%A8%A1%E6%80%81%E6%8E%A8%E7%90%86%E6%A8%A1%E5%9E%8B/</guid><description>A survey of the technical evolution of Large Multimodal Reasoning Models (LMRMs): from early perception-driven modular designs, to chain-of-thought (CoT) reasoning in the large-model era, to reinforcement-learning-based long-horizon planning systems.</description></item><item><title>vLLM Inference Performance Benchmarking</title><link>https://thinkless-github-io.pages.dev/posts/vllm%E6%8E%A8%E7%90%86%E6%80%A7%E8%83%BD%E5%8E%8B%E6%B5%8B/</link><pubDate>Thu, 17 Jul 2025 21:02:19 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/vllm%E6%8E%A8%E7%90%86%E6%80%A7%E8%83%BD%E5%8E%8B%E6%B5%8B/</guid><description>Notes on vLLM inference performance benchmarking, covering PagedAttention memory allocation, KV cache optimization, parameter tuning, and throughput test results.</description></item><item><title>vLLM</title><link>https://thinkless-github-io.pages.dev/posts/vllm/</link><pubDate>Wed, 21 May 2025 21:35:30 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/vllm/</guid><description>Notes on the vLLM inference framework, covering PagedAttention memory management, quantization techniques, distributed deployment, and usage of the OpenAI-compatible API.</description></item><item><title>Inference Frameworks</title><link>https://thinkless-github-io.pages.dev/posts/%E6%8E%A8%E7%90%86%E6%A1%86%E6%9E%B6/</link><pubDate>Tue, 20 May 2025 10:20:29 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/%E6%8E%A8%E7%90%86%E6%A1%86%E6%9E%B6/</guid><description>My hands-on notes on inference frameworks: ONNX, TensorRT, and TorchScript, covering the underlying principles along with deployment pitfalls and optimization tips.</description></item><item><title>Ollama Deployment</title><link>https://thinkless-github-io.pages.dev/posts/ollama%E9%83%A8%E7%BD%B2/</link><pubDate>Fri, 09 May 2025 11:08:46 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/ollama%E9%83%A8%E7%BD%B2/</guid><description>A walkthrough of local Ollama deployment, covering installation and configuration, model downloads, GGUF format import, custom Modelfile creation, and runtime management.</description></item><item><title>OpenWeb UI Guide (Docker-based Installation)</title><link>https://thinkless-github-io.pages.dev/posts/openweb-ui%E6%8C%87%E5%8D%97%E5%9F%BA%E4%BA%8Edocker%E5%AE%89%E8%A3%85/</link><pubDate>Tue, 29 Apr 2025 21:29:21 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/openweb-ui%E6%8C%87%E5%8D%97%E5%9F%BA%E4%BA%8Edocker%E5%AE%89%E8%A3%85/</guid><description>An OpenWeb UI Docker deployment guide, covering Windows WSL2 environment setup, network proxy configuration, and mirror source optimization. Resolves common installation issues to quickly stand up a local AI chat interface.</description></item></channel></rss>