<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Model Fine-Tuning on 杨の草原</title><link>https://thinkless-github-io.pages.dev/tags/%E6%A8%A1%E5%9E%8B%E5%BE%AE%E8%B0%83/</link><description>Recent content in Model Fine-Tuning on 杨の草原</description><generator>Hugo</generator><language>zh-CN</language><lastBuildDate>Tue, 27 Jan 2026 20:24:09 +0800</lastBuildDate><atom:link href="https://thinkless-github-io.pages.dev/tags/%E6%A8%A1%E5%9E%8B%E5%BE%AE%E8%B0%83/index.xml" rel="self" type="application/rss+xml"/><item><title>KL-Driven SFT and DPO</title><link>https://thinkless-github-io.pages.dev/posts/kl-%E9%A9%B1%E5%8A%A8%E4%B8%8B%E7%9A%84-sft-%E4%B8%8E-dpo/</link><pubDate>Tue, 27 Jan 2026 20:24:09 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/kl-%E9%A9%B1%E5%8A%A8%E4%B8%8B%E7%9A%84-sft-%E4%B8%8E-dpo/</guid><description>Hands-on notes on fine-tuning Qwen3 with LoRA: in the SFT stage, KL divergence is used to limit degradation of general capabilities; in the DPO stage, tuning beta controls preference strength. Combining Qwen3 with LoRA on all linear layers balances general capability and domain performance under low GPU memory.</description></item><item><title>LLaMA-Factory</title><link>https://thinkless-github-io.pages.dev/posts/llama-factory/</link><pubDate>Tue, 20 May 2025 16:08:08 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/llama-factory/</guid><description>Hands-on notes on LLaMA-Factory, a fine-tuning framework for large language models, covering core features such as installation and configuration, LoRA fine-tuning, supervised instruction fine-tuning, and PPO training, for quickly building customized language models.</description></item><item><title>Training and Fine-Tuning Techniques</title><link>https://thinkless-github-io.pages.dev/posts/%E8%AE%AD%E7%BB%83%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%8A%80%E6%9C%AF/</link><pubDate>Tue, 06 May 2025 11:12:21 +0800</pubDate><guid>https://thinkless-github-io.pages.dev/posts/%E8%AE%AD%E7%BB%83%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%8A%80%E6%9C%AF/</guid><description>Notes on training and fine-tuning large language models, covering full fine-tuning, parameter-efficient fine-tuning, the principles of LoRA, and instruction fine-tuning methods.</description></item></channel></rss>