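MetaGPT: A Multi-Agent Framework

MetaGPT makes GPTs work as a software company, collaborating on more complex tasks.

MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, and more.
Internally, MetaGPT includes product managers, architects, project managers, and engineers, and it distills the workflow of a software company into a set of SOPs (standard operating procedures).
Code = SOP(Team) is the core philosophy: make the SOP concrete, then hand it to a team of LLMs to execute.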

A software company consists of LLM-based roles

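Quick start

Install the stable version

pip install metagpt

Install in development mode

Recommended for developers and researchers who need to customize the framework, experiment with new ideas, or build complex capabilities on top of it (such as a new memory mechanism).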

git clone https://github.com/geekan/MetaGPT.git
cd /your/path/to/MetaGPT
pip install -e .

Install submodules

RAG: pip install 'metagpt[rag]'. For systems based on RAG (Retrieval-Augmented Generation) that combine multiple LLMs (large language models) with vector stores.
OCR: pip install 'metagpt[ocr]'. For optical character recognition (OCR) tasks, that is, recognizing and extracting text from images.
search-ddg: pip install 'metagpt[search-ddg]'. For DuckDuckGo search.
search-google: pip install 'metagpt[search-google]'. For interacting with Google APIs such as the Google Search API.
selenium: pip install 'metagpt[selenium]'. For browser automation and web scraping.

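Configure the LLM API

OpenAI API

Other LLM APIs are configured in much the same way, by editing config2.yaml.

Using `config2.yaml`

Create a folder named config in the current working directory and add a new file named config2.yaml to it.
Copy the contents of the example config2.yaml file into the new file.
Fill in your own values: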

llm:
  api_type: 'openai' # or azure / ollama / groq etc. Check LLMType for more options
  api_key: 'sk-...' # YOUR_API_KEY
  model: 'gpt-4-turbo' # or gpt-3.5-turbo
  # base_url: 'https://api.openai.com/v1'  # or any forward url.
  # proxy: 'YOUR_LLM_PROXY_IF_NEEDED'  # Optional. If you want to use a proxy, set it here.
  # pricing_plan: 'YOUR_PRICING_PLAN' # Optional. If your pricing plan uses a different name than the `model`.

Note: MetaGPT reads the configuration in the following priority order: ~/.metagpt/config2.yaml > config/config2.yaml
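
The priority order above can be pictured with a small lookup. This is only a sketch under an assumption: it treats the rule as "the first file that exists wins", which the text does not spell out, and resolve_config_path is a hypothetical helper, not a MetaGPT function.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(home_dir: Path, work_dir: Path) -> Optional[Path]:
    """Return the first existing config file, following the stated priority.

    Hypothetical helper for illustration; MetaGPT's own loader may behave differently.
    """
    candidates = [
        home_dir / ".metagpt" / "config2.yaml",  # highest priority
        work_dir / "config" / "config2.yaml",    # fallback in the working directory
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


print(resolve_config_path(Path.home(), Path.cwd()))
```

If neither file exists the helper returns None, which is a convenient way to check that a fresh setup still needs a config2.yaml.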

Building software from a one-line requirement

First, import the roles that are already implemented.

import asyncio
from metagpt.roles import (
    Architect,
    Engineer,
    ProductManager,
    ProjectManager,
)
from metagpt.team import Team

Next, initialize the company team, hire the agents, set a budget, and give the team a requirement for a small game.

async def startup(idea: str):
    company = Team()
    company.hire(
        [
            ProductManager(),
            Architect(),
            ProjectManager(),
            Engineer(),
        ]
    )
    company.invest(investment=3.0)
    company.run_project(idea=idea)

    await company.run(n_round=5)

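Running it produces the generated game code. The top-level await below works in Jupyter or IPython; in a plain script, call asyncio.run(startup(idea="write a cli blackjack game")) instead.

await startup(idea="write a cli blackjack game")  # blackjack: the card game also known as twenty-one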

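An agent with a single action

Suppose we want to write code from natural language and have one agent do the job. We call it SimpleCoder, and building it takes two steps:

Define an action that writes code
Equip the agent with this action

Define the action

In MetaGPT, the Action class is the logical abstraction of an action. Calling self._aask hands the work to the LLM; under the hood it issues an LLM API request.

Below we define SimpleWriteCode, which inherits from Action. It is essentially a wrapper around a prompt and an LLM call, but packaging it as an Action pays off in more complex tasks: the agent only needs to "perform the write-code action" instead of assembling prompts and calling the model all over the place.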

import re

from metagpt.actions import Action  # import the Action base class

class SimpleWriteCode(Action):

    PROMPT_TEMPLATE: str = """
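    Write a python function that can {instruction} and provide two runnable test cases.
    Return ```python your_code_here ``` with NO other texts,
    your code:
    """
    # Class attribute: the prompt template sent to the LLM.
    # {instruction} is a placeholder that is replaced with the user's instruction at run time.
    # The template asks the LLM to return only a code block that opens with "```python" and closes with "```".
    # This strict format makes the response easier to parse afterwards.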

    name: str = "SimpleWriteCode"

    async def run(self, instruction: str):
        # Async method holding the core logic of the action.
        # `instruction` is the user's description of the code they want.

        prompt = self.PROMPT_TEMPLATE.format(instruction=instruction)
        # Fill PROMPT_TEMPLATE with the user's `instruction` to build the full prompt.
        # For example, with instruction "calculates the sum of a list", the prompt starts with
        # "Write a python function that can calculates the sum of a list and provide ..."

        rsp = await self._aask(prompt)
        code_text = SimpleWriteCode.parse_code(rsp)
        # parse_code is a static method that extracts the bare Python code from the raw LLM response `rsp`.
        # This step matters because the response may include explanatory text, markdown formatting, and so on.

        return code_text

    @staticmethod
    def parse_code(rsp):
        
        pattern = r"```python(.*)```"
        # Regex pattern, piece by piece:
        #   "```python" matches the literal opening fence.
        #   "(.*)" is a capture group matching any character, zero or more times (greedy).
        #   "```" matches the literal closing fence.
        # Goal: capture everything between the opening and closing fences.

        match = re.search(pattern, rsp, re.DOTALL)
        # re.search finds the first position in the LLM response `rsp` where the pattern matches.
        # re.DOTALL lets "." match newlines too, so multi-line code is captured.

        code_text = match.group(1) if match else rsp
        return code_text
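
To see what parse_code returns, the same regex can be tried outside MetaGPT on hand-written responses. The sketch below copies the parsing logic into a standalone function; the sample responses are made up, and the fence is built with FENCE only to keep three literal backticks out of the example text.

```python
import re

FENCE = "`" * 3  # three backticks, assembled at run time


def parse_code(rsp: str) -> str:
    """Same logic as SimpleWriteCode.parse_code, reproduced for experimentation."""
    pattern = FENCE + r"python(.*)" + FENCE
    match = re.search(pattern, rsp, re.DOTALL)
    return match.group(1) if match else rsp


# A made-up LLM response that wraps the code in a fenced block.
fenced = f"Sure, here it is:\n{FENCE}python\nprint(sum([1, 2]))\n{FENCE}\nHope it helps."
# A made-up response with no fence at all.
plain = "print(sum([1, 2]))"

print(repr(parse_code(fenced)))  # the text between the fences, newlines included
print(repr(parse_code(plain)))   # no fence found, so the response comes back unchanged
```

Because (.*) is greedy, a response containing several fenced blocks is captured from the first opening fence to the last closing fence. Switching to (.*?) would stop at the first closing fence instead; which one is preferable depends on how the model tends to answer.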

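Define the role

In MetaGPT, the Role class is the logical abstraction of an agent. A Role can perform specific Actions and can have memory, a thinking process, and an action strategy. It ties these components together.

This example creates a SimpleCoder that writes code from a human's natural-language description. The steps are:

Give it a name and a profile.
Equip it with the desired action, SimpleWriteCode, using self.set_actions.
Override the _act function with the agent's concrete logic. The agent takes the human instruction from its most recent memory and runs the configured action, which MetaGPT handles behind the scenes as the to-do item (self.rc.todo), and finally returns a complete message.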

from metagpt.logs import logger
from metagpt.roles import Role
from metagpt.schema import Message

class SimpleCoder(Role):
    name: str = "Alice"
    profile: str = "SimpleCoder"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_actions([SimpleWriteCode])

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: to do {self.rc.todo}({self.rc.todo.name})")
        todo = self.rc.todo  # todo will be SimpleWriteCode()

        msg = self.get_memories(k=1)[0]  # find the most recent messages
        code_text = await todo.run(msg.content)
        msg = Message(content=code_text, role=self.profile, cause_by=type(todo))

        return msg

Run your role

Now initialize the agent and run it with an initial message.

import asyncio

from metagpt.context import Context
from metagpt.logs import logger

async def main():
    msg = "write a function that calculates the sum of a list"
    context = Context()
    role = SimpleCoder(context=context)
    logger.info(msg)
    result = await role.run(msg)
    logger.info(result)

asyncio.run(main())

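An agent with multiple actions

The Role abstraction becomes really useful once you combine several actions (and other components such as memory, covered later). Chaining actions builds a workflow that lets the agent handle more complex tasks.

Suppose we now want not only to generate code from natural language but also to execute it right away. An agent with multiple actions fits this scenario. We call it RunnableCoder: it writes code and runs it immediately. It needs two Actions: SimpleWriteCode and SimpleRunCode.

Define the actions

First, define SimpleWriteCode. We simply reuse the version created above.

Next, define SimpleRunCode. Conceptually, an action may call an LLM or not touch one at all. SimpleRunCode is the latter: it starts a subprocess to run the code and collects the result. The point is that action logic has no fixed template; you can design it to fit your own needs.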

import subprocess

class SimpleRunCode(Action):
    name: str = "SimpleRunCode"

    async def run(self, code_text: str):
        result = subprocess.run(["python3", "-c", code_text], capture_output=True, text=True)
        code_result = result.stdout
        logger.info(f"{code_result=}")
        return code_result
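
SimpleRunCode returns only stdout, so a failing snippet yields an empty string. As a hedged variation (not part of MetaGPT), the standalone sketch below shows how the same subprocess call could surface errors and guard against hangs. The run_code name, the timeout default, and the use of sys.executable instead of "python3" are assumptions made for this illustration.

```python
import subprocess
import sys


def run_code(code_text: str, timeout: float = 10.0) -> str:
    """Run code_text in a child Python process; return stdout, or stderr on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code_text],  # same interpreter as the caller
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout} seconds"
    # A non-zero exit code means the snippet raised or exited with an error.
    return result.stdout if result.returncode == 0 else result.stderr


print(run_code("print(sum([1, 2, 3]))"))
print(run_code("1 / 0"))
```

Running generated code in a subprocess keeps a crash in the snippet from taking down the agent, but it is not a sandbox; untrusted code still has the same permissions as the parent process.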

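Define the role

This differs little from the single-action agent. The key changes are:

Initialize all Actions with self.set_actions.
Specify which Action the Role chooses at each step. Here react_mode is set to "by_order", which means the Role runs its Actions in the order given to self.set_actions (both modes are described below). In this example, when the Role executes _act, self.rc.todo is first SimpleWriteCode and then SimpleRunCode.
Override the _act function. The Role retrieves the message from the previous round (human input or an Action's output), passes the relevant Message content to the current Action (self.rc.todo), and returns a Message built from that Action's output.

Standard ReAct mode (default)
Think first, then act, until the Role decides it is time to stop. This is the standard think-act loop from the ReAct paper, which alternates thinking and acting while solving a task: _think -> _act -> _think -> _act -> ...
During each _think, the Role chooses an Action in response to the current observation, and the _act phase runs the chosen Action. The Action's output then becomes the new observation for the next _think. Because the framework uses the LLM inside _think to choose actions dynamically, this mode is fairly general.

By order
Each step runs the next available Action in the order defined in set_actions: _act (Action1) -> _act (Action2) -> _act (Action3) -> ...
This mode suits deterministic standard operating procedures (SOPs), where we know exactly which actions the Role should take and in what order. With it, you only define the Actions and the framework takes care of building the pipeline.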

class RunnableCoder(Role):
    name: str = "Alice"
    profile: str = "RunnableCoder"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_actions([SimpleWriteCode, SimpleRunCode])
        self._set_react_mode(react_mode="by_order")

    async def _act(self) -> Message:
        logger.info(f"{self._setting}: to do {self.rc.todo}({self.rc.todo.name})")
        # By choosing the Action by order under the hood
        # todo will be first SimpleWriteCode() then SimpleRunCode()
        todo = self.rc.todo

        msg = self.get_memories(k=1)[0]  # find the most k recent messages
        result = await todo.run(msg.content)

        msg = Message(content=result, role=self.profile, cause_by=type(todo))
        self.rc.memory.add(msg)
        return msg
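
The "by_order" flow used by RunnableCoder can be pictured without the framework. In this toy sketch each step receives the most recent message and appends its output to a list standing in for the Role's memory. All names here (write_code, run_code, run_by_order) are hypothetical stand-ins, and no LLM or subprocess is involved.

```python
import asyncio
from typing import Awaitable, Callable, List

Step = Callable[[str], Awaitable[str]]


async def write_code(instruction: str) -> str:
    # Stand-in for SimpleWriteCode: returns fixed code instead of calling an LLM.
    return "print(sum([1, 2, 3]))"


async def run_code(code_text: str) -> str:
    # Stand-in for SimpleRunCode: pretends to execute the code.
    return f"ran: {code_text}"


async def run_by_order(steps: List[Step], first_message: str) -> List[str]:
    """Run each step once, in order, feeding it the most recent message."""
    memory = [first_message]
    for step in steps:
        latest = memory[-1]                # plays the role of get_memories(k=1)[0].content
        memory.append(await step(latest))  # plays the role of self.rc.memory.add(msg)
    return memory


history = asyncio.run(run_by_order([write_code, run_code], "sum a list"))
print(history)
```

The sketch also shows why each Action's output has to be stored: the second step only sees the code because the first step's result was appended before it ran.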

Run your role

Initialize it and run it with an initial message.

import asyncio

from metagpt.context import Context
from metagpt.logs import logger

async def main():
    msg = "write a function that calculates the sum of a list"
    context = Context()
    role = RunnableCoder(context=context)
    logger.info(msg)
    result = await role.run(msg)
    logger.info(result)

asyncio.run(main())

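Using memory

Memory is one of an agent's core components. An agent needs memory to obtain the context required for making decisions or performing actions, and to accumulate skills and experience.

This section covers:

The concept of memory in MetaGPT
How to add or retrieve memories

What is memory

In MetaGPT, the Memory class is the abstraction of an agent's memory. On initialization, a Role creates a Memory object as its self.rc.memory attribute; afterwards, _observe stores every Message in it for later retrieval. In short, a Role's memory is a list of Messages.

Retrieving memory

When you need memories (that is, when preparing input context for the LLM), use self.get_memories. The function is defined as follows: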

def get_memories(self, k=0) -> list[Message]:
    """A wrapper to return the most recent k memories of this role, return all when k=0"""
    return self.rc.memory.get(k=k)

async def _act(self) -> Message:
        logger.info(f"{self._setting}: ready to {self.rc.todo}")
        todo = self.rc.todo

        # context = self.get_memories(k=1)[0].content # use the most recent memory as context
        context = self.get_memories() # use all memories as context

        code_text = await todo.run(context, k=5) # specify arguments

        msg = Message(content=code_text, role=self.profile, cause_by=todo)

        return msg

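Adding memory

Use self.rc.memory.add(msg) to add a memory, where msg must be an instance of Message. The _act method of RunnableCoder shown earlier demonstrates the usage.

When defining _act logic, it is recommended to add the Message holding the action output to the Role's memory. A Role usually needs to remember what it said or did before in order to decide its next move.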
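
The "memory is a list of Messages" idea, together with the k=0 convention of get_memories, can be mimicked in a few lines. ToyMessage and ToyMemory below are simplified stand-ins written for this illustration; they are not MetaGPT's Message and Memory classes, which carry more fields and methods.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMessage:
    content: str
    role: str = "user"


@dataclass
class ToyMemory:
    storage: List[ToyMessage] = field(default_factory=list)

    def add(self, message: ToyMessage) -> None:
        """Append a message to the end of the memory."""
        self.storage.append(message)

    def get(self, k: int = 0) -> List[ToyMessage]:
        """Return the k most recent messages, or all of them when k == 0."""
        return list(self.storage) if k == 0 else self.storage[-k:]


memory = ToyMemory()
for text in ["write a sum function", "def total(xs): return sum(xs)", "6"]:
    memory.add(ToyMessage(content=text))

print([m.content for m in memory.get(k=1)])  # only the latest message
print(len(memory.get()))                     # k=0 returns the whole history
```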

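Creating and using tools

Creating a tool in MetaGPT is straightforward: write your own function or class and place it in the metagpt/tools/libs directory.

Steps to create a tool

Create the function or class:
Write a function or class dedicated to a specific interaction with the external environment, and place it in the metagpt/tools/libs directory.
Use Google-style docstrings:
Give each function or class a Google-style docstring that describes its purpose, input parameters, and expected output.
Apply the @register_tool decorator:
Use the @register_tool decorator so that the tool is registered in the tool registry. This is what lets DataInterpreter find and call it.

A custom factorial tool

Create your own function in metagpt/tools/libs, say in calculate_factorial.py, and add the @register_tool decorator to register it as a tool.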

# metagpt/tools/libs/calculate_factorial.py
import math
from metagpt.tools.tool_registry import register_tool

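# Register the function as a tool with the decorator
@register_tool()
def calculate_factorial(n):
    """Calculate the factorial of a non-negative integer.

    Args:
        n (int): A non-negative integer.

    Returns:
        int: The factorial of n.
    """
    if n < 0:
        raise ValueError("Input must be a non-negative integer")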
    return math.factorial(n)

Use the tool in DataInterpreter

# main.py
import asyncio
from metagpt.roles.di.data_interpreter import DataInterpreter
from metagpt.tools.libs import calculate_factorial  # importing the module registers the tool

async def main(requirement: str):
    role = DataInterpreter(tools=["calculate_factorial"])  # attach the tool by name
    await role.run(requirement)

if __name__ == "__main__":
    requirement = "Please calculate the factorial of 5"
    asyncio.run(main(requirement))

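Notes:

Remember to write a docstring for your function; it helps DataInterpreter choose the right tool and understand how it works.
When a tool is registered, its name is the function's name.
Before running DataInterpreter, import your calculate_factorial module from metagpt.tools.libs so that the tool is registered.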
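
Two of these notes (the tool name equals the function name, and the module must be imported first) follow from how decorator-based registries generally work. The toy registry below is an illustration only: toy_register_tool and TOY_REGISTRY are hypothetical and far simpler than MetaGPT's register_tool.

```python
import math
from typing import Callable, Dict

TOY_REGISTRY: Dict[str, Callable] = {}


def toy_register_tool() -> Callable[[Callable], Callable]:
    """Decorator factory that stores a function under its own name."""
    def decorator(func: Callable) -> Callable:
        TOY_REGISTRY[func.__name__] = func  # the key is the function's name
        return func
    return decorator


@toy_register_tool()
def calculate_factorial(n: int) -> int:
    """Calculate the factorial of a non-negative integer."""
    return math.factorial(n)


tool = TOY_REGISTRY["calculate_factorial"]
print(tool.__doc__)  # the docstring is what a tool-selecting agent could read
print(tool(5))
```

The decorator body runs when the module defining the function is executed, which is why a tool that is never imported never appears in the registry.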

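Human in the loop

In some real-world scenarios human involvement is necessary, for example for quality control, for taking part in key decisions, or for playing a character in a game.

Switching between the LLM and a human

This example assumes a team of three Roles: SimpleCoder, SimpleTester, and SimpleReviewer. By default, the LLM plays SimpleReviewer. For tighter control over the review process, a human can take over this Role. It takes a single switch: set is_human=True at initialization. The code becomes: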

team.hire(
    [
        SimpleCoder(),
        SimpleTester(),
        # SimpleReviewer(),  # original line
        SimpleReviewer(is_human=True),  # change it to this line
    ]
)

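As a human we now act as SimpleReviewer and interact with the two LLM-based agents, SimpleCoder and SimpleTester. For example, we can comment on the unit tests SimpleTester wrote, ask for more edge cases, and have SimpleTester rewrite them. The switch is transparent to the original SOP and Role definitions, so it applies to any similar scenario.

Each time it is the human's turn to respond, the run pauses and waits for input. Once the input is entered, the message is sent to the other agents.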
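
The effect of the is_human switch can be sketched with a toy reviewer that either calls a model or waits for typed input. Everything here (ToyReviewer, fake_llm, the ask_human parameter) is hypothetical and only illustrates the idea; MetaGPT's Role handles the switch internally.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"LLM review of: {prompt}"


class ToyReviewer:
    def __init__(self, is_human: bool = False, ask_human: Callable[[str], str] = input):
        self.is_human = is_human
        self.ask_human = ask_human  # injectable so the example runs without a terminal

    def review(self, work: str) -> str:
        if self.is_human:
            # With the default `input`, this call blocks until the human replies.
            return self.ask_human(f"Please review:\n{work}\n> ")
        return fake_llm(work)


print(ToyReviewer().review("def add(a, b): return a + b"))

# A scripted reply replaces the keyboard here so the example is repeatable.
scripted = ToyReviewer(is_human=True, ask_human=lambda prompt: "add edge-case tests")
print(scripted.review("def add(a, b): return a + b"))
```

The rest of the team sees the same kind of reply in both cases, which is the sense in which the switch is transparent to the SOP.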

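Integrating open-source LLMs

Once an open-source model is deployed behind an API endpoint, integrating it only requires editing the configuration file config/config2.yaml.

OpenAI-compatible API

For example, OpenAI-compatible endpoints served by LLaMA-Factory, FastChat, or vLLM.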

config/config2.yaml

llm:
  api_type: open_llm
  base_url: 'http://106.75.10.xxx:8000/v1'
  model: 'llama2-13b'

ollama API

For example, a model service deployed with ollama.

config/config2.yaml

llm:
  api_type: ollama
  base_url: 'http://127.0.0.1:11434/api'
  model: 'llama2'

The full route of the ollama chat endpoint is http://127.0.0.1:11434/api/chat. base_url only needs to go as far as http://127.0.0.1:11434/api; OllamaLLM fills in the rest. model is the actual value of the model parameter in the request.
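
The relationship between base_url and the full chat route amounts to a simple join. The helper below is a hypothetical illustration of that composition, not the code OllamaLLM actually uses.

```python
def chat_endpoint(base_url: str) -> str:
    """Append the chat route to a base_url that ends at /api."""
    return base_url.rstrip("/") + "/chat"


print(chat_endpoint("http://127.0.0.1:11434/api"))
```

If base_url already included /chat, a join like this would produce a doubled route, which is why the configuration stops at /api.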

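Configuring different LLMs for roles or actions

MetaGPT lets different Roles and Actions in a team use different LLMs. You can pick a model to suit each role's needs and each action's characteristics, which gives finer control over response quality and cost.

The setup steps are:

Define configurations: use the default configuration, or load custom configurations from the ~/.metagpt directory.
Assign configurations: assign specific LLM configurations to Roles and Actions. The priority is Action config > Role config > Global config (the config in config2.yaml).
Team interaction: create a team with an environment and start the interaction.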
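
The priority rule in the second step reads as "use the most specific configuration that is set". A minimal sketch follows, with plain strings standing in for Config objects; effective_config is a hypothetical helper, not a MetaGPT API.

```python
from typing import Optional


def effective_config(action_cfg: Optional[str], role_cfg: Optional[str], global_cfg: str) -> str:
    """Pick the first config that is set: Action, then Role, then Global."""
    for cfg in (action_cfg, role_cfg):
        if cfg is not None:
            return cfg
    return global_cfg


print(effective_config("action-llm", "role-llm", "global-llm"))  # Action config wins
print(effective_config(None, "role-llm", "global-llm"))          # falls back to the Role config
print(effective_config(None, None, "global-llm"))                # falls back to the Global config
```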

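Example

Consider a live broadcast of a US election with three Roles: A, B, and C. A and B are candidates, and C is a voter.

Define configurations

You can configure LLMs for different Roles and Actions using the default configuration, or load custom configurations from the ~/.metagpt directory.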

from metagpt.config2 import Config

# Example configurations for gpt-4, gpt-4-turbo, and gpt-3.5-turbo.
gpt4 = Config.from_home("gpt-4.yaml")  # load the custom configuration `gpt-4.yaml` from the `~/.metagpt` directory
gpt4t = Config.default()  # use the default configuration, i.e. the one in `config2.yaml`, whose model here is "gpt-4-turbo"
gpt35 = Config.default()
gpt35.llm.model = "gpt-3.5-turbo"  # change the model to "gpt-3.5-turbo"

Assign configurations

Create the Roles and Actions and assign configurations to them.

from metagpt.roles import Role
from metagpt.actions import Action

# Create three Actions, a1, a2, and a3, and give a1 the `gpt4t` configuration.
a1 = Action(config=gpt4t, name="Say", instruction="Say your opinion with emotion and don't repeat it")
a2 = Action(name="Say", instruction="Say your opinion with emotion and don't repeat it")
a3 = Action(name="Vote", instruction="Vote for the candidate, and say why you vote for him/her")

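# Create three Roles, A, B, and C: the Democratic candidate, the Republican candidate, and the voter.
# A sets its config to gpt4, but a1 already has its own Action config, so A uses the gpt4 configuration while a1 uses the gpt4t configuration.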
A = Role(name="A", profile="Democratic candidate", goal="Win the election", actions=[a1], watch=[a2], config=gpt4)
# B sets its config to gpt35 and a2 has no Action config, so both B and a2 use the Role config, i.e. the gpt35 configuration.
B = Role(name="B", profile="Republican candidate", goal="Win the election", actions=[a2], watch=[a1], config=gpt35)
# Neither C nor a3 sets a config, so both use the Global config, i.e. the config2.yaml configuration (gpt4t, whose model here is gpt-4-turbo).
C = Role(name="C", profile="Voter", goal="Vote for the candidate", actions=[a3], watch=[a1, a2])

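For a given Action, the configuration priority is Action config > Role config > Global config. The Global config is the one in config2.yaml, which is gpt4t in this example. The configurations of the Roles and Actions above are as follows:

How Action and Role configs relate:

Action of interest    Global config    Role config    Action config    Effective config for the Action
a1                    gpt4t            gpt4           gpt4t            gpt4t
a2                    gpt4t            gpt35          unspecified      gpt35
a3                    gpt4t            unspecified    unspecified      gpt4t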

Team interaction

Create a team with an environment and let it start interacting.

import asyncio
from metagpt.environment import Environment
from metagpt.team import Team

# Create an environment described as a US election live broadcast
env = Environment(desc="US election live broadcast")
team = Team(investment=10.0, env=env, roles=[A, B, C])
# Run the team; we should see the roles interacting with each other
asyncio.run(team.run(idea="Topic: climate change. Under 80 words per message.", send_to="A", n_round=3))
# await team.run(idea="Topic: climate change. Under 80 words per message.", send_to="A", n_round=3)  # use this line instead when running in a Jupyter Notebook

Complete code and matching configuration examples

Default configuration: ~/.metagpt/config2.yaml

llm:
  api_type: 'openai'
  model: 'gpt-4-turbo'
  base_url: 'https://api.openai.com/v1'
  api_key: 'sk-...' # YOUR_API_KEY

Custom configuration: ~/.metagpt/gpt-4.yaml

llm:
  api_type: 'openai'
  model: 'gpt-4o'
  base_url: 'https://api.openai.com/v1'
  api_key: 'sk-...' # YOUR_API_KEY

from metagpt.config2 import Config
from metagpt.roles import Role
from metagpt.actions import Action
import asyncio
from metagpt.environment import Environment
from metagpt.team import Team

# Example configurations for gpt-4, gpt-4-turbo, and gpt-3.5-turbo.
gpt4 = Config.from_home("gpt-4.yaml")  # load the custom configuration `gpt-4.yaml` from the `~/.metagpt` directory
gpt4t = Config.default()  # use the default configuration, i.e. the one in `config2.yaml`, whose model here is "gpt-4-turbo"
gpt35 = Config.default()
gpt35.llm.model = "gpt-3.5-turbo"  # change the model to "gpt-3.5-turbo"

# Create three Actions, a1, a2, and a3, and give a1 the `gpt4t` configuration.
a1 = Action(config=gpt4t, name="Say", instruction="Say your opinion with emotion and don't repeat it")
a2 = Action(name="Say", instruction="Say your opinion with emotion and don't repeat it")
a3 = Action(name="Vote", instruction="Vote for the candidate, and say why you vote for him/her")

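# Create three Roles, A, B, and C: the Democratic candidate, the Republican candidate, and the voter.
# A sets its config to gpt4, but a1 already has its own Action config, so A uses the gpt4 configuration while a1 uses the gpt4t configuration.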
A = Role(name="A", profile="Democratic candidate", goal="Win the election", actions=[a1], watch=[a2], config=gpt4)
# B sets its config to gpt35 and a2 has no Action config, so both B and a2 use the Role config, i.e. the gpt35 configuration.
B = Role(name="B", profile="Republican candidate", goal="Win the election", actions=[a2], watch=[a1], config=gpt35)
# Neither C nor a3 sets a config, so both use the Global config, i.e. the config2.yaml configuration (gpt4t, whose model here is gpt-4-turbo).
C = Role(name="C", profile="Voter", goal="Vote for the candidate", actions=[a3], watch=[a1, a2])

# Create an environment described as a US election live broadcast
env = Environment(desc="US election live broadcast")
team = Team(investment=10.0, env=env, roles=[A, B, C])
# Run the team; we should see the roles interacting with each other
asyncio.run(team.run(idea="Topic: climate change. Under 80 words per message.", send_to="A", n_round=3))
# await team.run(idea="Topic: climate change. Under 80 words per message.", send_to="A", n_round=3)  # use this line instead when running in a Jupyter Notebook
