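Tools in Dify

Tools in Dify fall into two categories: built-in tools (hard-coded) and third-party tools (OpenAPI Swagger / ChatGPT Plugin). Both Workflows and Agents can use tools. A Workflow can also be published as a tool, so one Workflow can in turn use another Workflow as a tool.

I. Dify Built-in Tools

This section uses Google as the example. On the frontend you only need to enter a SerpApi API key, so the rest of the section focuses on the backend implementation.

Source location: dify-0.6.9/api/core/tools/provider/builtin/google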

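1. Prepare the tool provider YAML

Source location: dify-0.6.9/api/core/tools/provider/builtin/google/google.yaml

identity:  # basic information about the tool provider
  author: Dify  # author
  name: google  # provider name; must be unique and may not clash with any other provider
  label:  # label shown in the frontend
    en_US: Google  # English label
    zh_Hans: Google  # Simplified Chinese label
    pt_BR: Google  # Portuguese label
  description:  # description shown in the frontend
    en_US: Google  # English description
    zh_Hans: GoogleSearch  # Simplified Chinese description
    pt_BR: Google  # Portuguese description
  icon: icon.svg  # icon file name; the icon file must be placed in this module's _assets directory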
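The label and description blocks are keyed by locale. As a rough illustration of how such a block might be consumed, the sketch below picks the text for the user's locale and falls back to en_US. The helper pick_label and its fallback order are assumptions for illustration, not Dify's actual lookup logic.

```python
# Hypothetical helper: pick_label is NOT part of Dify; the fallback order is an assumption.
def pick_label(label: dict[str, str], locale: str, fallback: str = "en_US") -> str:
    """Return the text for `locale`, else for `fallback`, else the first value defined."""
    if locale in label:
        return label[locale]
    if fallback in label:
        return label[fallback]
    # Last resort: any value from the YAML block, or "" if the block is empty.
    return next(iter(label.values()), "")


# Mirrors the `description` block of google.yaml above.
description = {"en_US": "Google", "zh_Hans": "GoogleSearch", "pt_BR": "Google"}
print(pick_label(description, "zh_Hans"))  # GoogleSearch
print(pick_label(description, "ja_JP"))    # Google (falls back to en_US)
```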

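2. Prepare the provider credentials

Source location: dify-0.6.9/api/core/tools/provider/builtin/google/google.yaml

The Google tool uses the API provided by SerpApi, and SerpApi requires an API key. The tool therefore needs a credential before it can be used, which is why the frontend asks for a SerpApi API key.

credentials_for_provider:  # credential fields
  serpapi_api_key:  # unique identifier of the credential field
    type: secret-input  # credential field type
    required: true  # whether the field is required
    label:  # label shown in the frontend
      en_US: SerpApi API key  # English label
      zh_Hans: SerpApi API key  # Simplified Chinese label
      pt_BR: SerpApi API key  # Portuguese label
    placeholder:  # placeholder shown in the frontend
      en_US: Please input your SerpApi API key  # English placeholder
      zh_Hans: 请输入你的 SerpApi API key  # Simplified Chinese placeholder
      pt_BR: Please input your SerpApi API key  # Portuguese placeholder
    help:  # help text for the credential field
      en_US: Get your SerpApi API key from SerpApi  # English help text
      zh_Hans: 从 SerpApi 获取您的 SerpApi API key  # Simplified Chinese help text
      pt_BR: Get your SerpApi API key from SerpApi  # Portuguese help text
    url: https://serpapi.com/manage-api-key  # help link for the credential field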

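type: the credential field type. Three types are currently supported: secret-input, text-input, and select, corresponding to a password input box, a text input box, and a drop-down list. For secret-input, the frontend hides what the user types and the backend encrypts the value.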
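The sketch below shows, under stated assumptions, how a schema shaped like credentials_for_provider could drive two simple checks: reporting required fields that are missing, and masking secret-input values for display. The function names and the masking rule are hypothetical. The sketch does not perform the backend encryption described above; it only hides values in output.

```python
# Hypothetical sketch, not Dify's code: names and masking rule are assumptions.
CREDENTIALS_SCHEMA = {
    "serpapi_api_key": {"type": "secret-input", "required": True},
}


def missing_credentials(schema: dict, submitted: dict) -> list[str]:
    """Names of required fields that were not submitted or are empty."""
    return [name for name, spec in schema.items()
            if spec.get("required") and not submitted.get(name)]


def mask_for_display(schema: dict, submitted: dict) -> dict:
    """Hide secret-input values; other field types are returned unchanged."""
    masked = {}
    for name, value in submitted.items():
        if schema.get(name, {}).get("type") == "secret-input":
            text = str(value)
            # Short secrets are hidden completely; longer ones keep 2 chars at each end.
            masked[name] = ("*" * len(text) if len(text) <= 4
                            else text[:2] + "*" * (len(text) - 4) + text[-2:])
        else:
            masked[name] = value
    return masked


print(missing_credentials(CREDENTIALS_SCHEMA, {}))  # ['serpapi_api_key']
print(mask_for_display(CREDENTIALS_SCHEMA, {"serpapi_api_key": "abcdefgh"}))
# {'serpapi_api_key': 'ab****gh'}
```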

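3. Prepare the tool YAML

Source location: dify-0.6.9/api/core/tools/provider/builtin/google/tools/google_search.yaml

A provider can contain multiple tools. Each tool needs its own YAML file that describes it, including its basic information, parameters, and output.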

identity:  # basic information about the tool
  name: google_search  # unique name of the tool
  author: Dify  # author of the tool
  label:  # tool label, shown in the frontend
    en_US: GoogleSearch  # English label
    zh_Hans: 谷歌搜索  # Simplified Chinese label
    pt_BR: GoogleSearch  # Portuguese label
description:  # tool description
  human:  # human-readable description
    en_US: A tool for performing a Google SERP search and extracting snippets and webpages.Input should be a search query.
    zh_Hans: 一个用于执行 Google SERP 搜索并提取片段和网页的工具。输入应该是一个搜索查询。
    pt_BR: A tool for performing a Google SERP search and extracting snippets and webpages.Input should be a search query.
  # Description passed to the LLM. To help the LLM understand the tool, describe it here in as much detail as possible so that the LLM can understand and use it
  llm: A tool for performing a Google SERP search and extracting snippets and webpages.Input should be a search query.
parameters:  # parameter list
  - name: query  # parameter name
    type: string  # parameter type
    required: true  # whether the parameter is required
    label:  # parameter label
      en_US: Query string  # English label
      zh_Hans: 查询语句  # Simplified Chinese label
      pt_BR: Query string  # Portuguese label
    human_description:  # parameter description, shown in the frontend
      en_US: used for searching  # English description
      zh_Hans: 用于搜索网页内容  # Simplified Chinese description
      pt_BR: used for searching  # Portuguese description
    # Description passed to the LLM. As above, describe the parameter in as much detail as possible so that the LLM can understand it
    llm_description: key words for searching
    form: llm  # form type; llm means the Agent infers this parameter itself and the frontend does not display it
  - name: result_type  # parameter name
    type: select  # parameter type
    required: true  # whether the parameter is required
    options:  # parameter options
      - value: text
        label:
          en_US: text
          zh_Hans: 文本
          pt_BR: texto
      - value: link
        label:
          en_US: link
          zh_Hans: 链接
          pt_BR: link
    default: link  # the default value is link
    label:
      en_US: Result type
      zh_Hans: 结果类型
      pt_BR: Result type
    human_description:
      en_US: used for selecting the result type, text or link
      zh_Hans: 用于选择结果类型，使用文本还是链接进行展示
      pt_BR: used for selecting the result type, text or link
    form: form  # form type; form means the user fills in this parameter in the frontend before the conversation starts

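identity is required. It holds the tool's basic information (name, author, and label); description sits alongside it at the top level.
parameters: the parameter list
- name: parameter name; must be unique and may not clash with other parameters
- type: parameter type; currently string, number, boolean, and select are supported, corresponding to a string, a number, a boolean, and a drop-down list
- required: whether the parameter is required
  - in llm mode, a required parameter must be inferred by the Agent
  - in form mode, a required parameter must be filled in by the user in the frontend before the conversation starts
- options: parameter options
  - in llm mode, Dify passes all options to the LLM, which can reason over them
  - in form mode, when type is select, the frontend displays these options
- default: default value
- label: parameter label, shown in the frontend
- human_description: description shown in the frontend; supports multiple languages
- llm_description: description passed to the LLM; to help the LLM understand the parameter, describe it here in as much detail as possible
- form: form type; currently llm and form are supported, meaning the value is inferred by the Agent or filled in on the frontend, respectively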
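The interplay of form, required, default, and options can be sketched as a small resolver. In this sketch, llm parameters come from the Agent's arguments and form parameters come from values the user filled in. Defaults are applied, and select values are checked against their options. resolve_parameters is hypothetical and not Dify's code. For brevity, options are plain values here rather than the value/label objects used in the YAML.

```python
# Hypothetical sketch of how the `form` field splits parameter sources.
# Not Dify's code: names, error handling, and merge order are assumptions.
from typing import Any

PARAMETERS = [
    {"name": "query", "type": "string", "required": True, "form": "llm"},
    {"name": "result_type", "type": "select", "required": True, "form": "form",
     "options": ["text", "link"], "default": "link"},
]


def resolve_parameters(params: list[dict], llm_args: dict[str, Any],
                       form_values: dict[str, Any]) -> dict[str, Any]:
    """Merge Agent-inferred (llm) and user-filled (form) values, applying defaults."""
    resolved: dict[str, Any] = {}
    for p in params:
        source = llm_args if p["form"] == "llm" else form_values
        value = source.get(p["name"], p.get("default"))
        if value is None:
            if p.get("required"):
                raise ValueError(f"missing required parameter: {p['name']}")
            continue  # optional and absent: leave it out
        if p["type"] == "select" and value not in p.get("options", []):
            raise ValueError(f"{p['name']} must be one of {p['options']}")
        resolved[p["name"]] = value
    return resolved


print(resolve_parameters(PARAMETERS, {"query": "dify tools"}, {}))
# {'query': 'dify tools', 'result_type': 'link'}
```

Keeping the two sources separate mirrors the YAML semantics: the Agent never overrides a form value, and the user never has to fill in an llm value.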

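4. Prepare the tool code

Source location: dify-0.6.9/api/core/tools/provider/builtin/google/tools/google_search.py

from typing import Any, Union

from core.tools.entities.tool_entities import ToolInvokeMessage
from core.tools.tool.builtin_tool import BuiltinTool

# SerpAPI is a small HTTP client class defined earlier in this same file (not shown).

class GoogleSearchTool(BuiltinTool):
    def _invoke(self,
                user_id: str,  # ID of the calling user
                tool_parameters: dict[str, Any],  # parameters resolved for this call
                ) -> Union[ToolInvokeMessage, list[ToolInvokeMessage]]:  # one or more messages
        """
            invoke tools
        """
        query = tool_parameters['query']  # search query (inferred by the Agent)
        result_type = tool_parameters['result_type']  # "text" or "link" (chosen by the user)
        api_key = self.runtime.credentials['serpapi_api_key']  # provider credential
        result = SerpAPI(api_key).run(query, result_type=result_type)  # run the search
        if result_type == 'text':
            return self.create_text_message(text=result)  # return a text message
        return self.create_link_message(link=result)  # otherwise return a link message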
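To follow the control flow without a Dify checkout, the sketch below mocks the same _invoke pattern. It reads the parameters, reads the credential from the runtime, runs the search, and wraps the result as a text or link message. Message, FakeSearch, and MockGoogleSearchTool are hypothetical stand-ins for ToolInvokeMessage, SerpAPI, and GoogleSearchTool. No network call is made.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Message:
    type: str     # "text" or "link", standing in for ToolInvokeMessage.MessageType
    message: str


class FakeSearch:
    """Stand-in for SerpAPI: returns canned data instead of calling the API."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def run(self, query: str, result_type: str = "link") -> str:
        if result_type == "text":
            return f"Top snippet for '{query}'"
        return "https://example.com/result"


class MockGoogleSearchTool:
    def __init__(self, credentials: dict[str, str]) -> None:
        # Plays the role of self.runtime.credentials.
        self.credentials = credentials

    def _invoke(self, user_id: str, tool_parameters: dict[str, Any]) -> Message:
        query = tool_parameters["query"]
        # Assumption: fall back to the YAML default ("link") if the value is absent.
        result_type = tool_parameters.get("result_type", "link")
        result = FakeSearch(self.credentials["serpapi_api_key"]).run(query, result_type=result_type)
        if result_type == "text":
            return Message("text", result)
        return Message("link", result)


msg = MockGoogleSearchTool({"serpapi_api_key": "demo"})._invoke("user-1", {"query": "dify", "result_type": "text"})
print(msg.type, msg.message)  # text Top snippet for 'dify'
```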

5. Prepare the provider code

Source location: dify-0.6.9/api/core/tools/provider/builtin/google/google.py

from typing import Any

from core.tools.errors import ToolProviderCredentialValidationError
from core.tools.provider.builtin.google.tools.google_search import GoogleSearchTool
from core.tools.provider.builtin_tool_provider import BuiltinToolProviderController


class GoogleProvider(BuiltinToolProviderController):
    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
        try:
            # Fork a tool instance that carries the submitted credentials,
            # then run one real test search with it.
            GoogleSearchTool().fork_tool_runtime(
                meta={
                    "credentials": credentials,
                }
            ).invoke(
                user_id='',
                tool_parameters={
                    "query": "test",
                    "result_type": "link"
                },
            )
        except Exception as e:
            # Any failure (bad key, network error, ...) rejects the credentials.
            raise ToolProviderCredentialValidationError(str(e))
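A provider class is mainly responsible for validating credentials before they are saved. A common way to do this is to make one cheap test call with the submitted credentials and convert any failure into a single validation error. The sketch below shows that pattern in isolation. CredentialValidationError and fake_search are hypothetical stand-ins, and the "valid-key" check replaces a real API call.

```python
class CredentialValidationError(Exception):
    """Hypothetical stand-in for ToolProviderCredentialValidationError."""


def fake_search(api_key: str, query: str) -> str:
    # Hypothetical backend that accepts exactly one key.
    if api_key != "valid-key":
        raise PermissionError("Invalid API key")
    return f"results for {query}"


def validate_credentials(credentials: dict[str, str]) -> None:
    """Run one cheap test query; any failure means the credentials are rejected."""
    try:
        fake_search(credentials["serpapi_api_key"], "test")
    except Exception as e:  # bad key, missing key, network error, ...
        raise CredentialValidationError(str(e)) from e


validate_credentials({"serpapi_api_key": "valid-key"})  # passes silently
try:
    validate_credentials({"serpapi_api_key": "wrong"})
except CredentialValidationError as err:
    print("rejected:", err)  # rejected: Invalid API key
```

Validating through a real call costs one request per save. In exchange, it also catches revoked or expired keys, which a format check alone would miss.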

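II. Returning Messages from the Tool Interface

1. Message types

Source location: dify-0.6.9/api/core/tools/tool/tool.py

Dify supports several message types, including text, links, images, and file BLOBs. A tool returns each type of message to the LLM and the user through the following interfaces.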

def create_image_message(self, image: str, save_as: str = '') -> ToolInvokeMessage:
    """
        create an image message

        :param image: the url of the image
        :return: the image message
    """
    return ToolInvokeMessage(type=ToolInvokeMessage.MessageType.IMAGE, 
                             message=image, 
                             save_as=save_as)

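def create_file_var_message(self, file_var: FileVar) -> ToolInvokeMessage:
    """
        create a file variable message

        :param file_var: the file variable, carried in meta['file_var']
        :return: the file variable message
    """
    return ToolInvokeMessage(type=ToolInvokeMessage.MessageType.FILE_VAR,
                             message='',
                             meta={
                                 'file_var': file_var
                             },
                             save_as='')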

def create_link_message(self, link: str, save_as: str = '') -> ToolInvokeMessage:
    """
        create a link message

        :param link: the url of the link
        :return: the link message
    """
    return ToolInvokeMessage(type=ToolInvokeMessage.MessageType.LINK, 
                             message=link, 
                             save_as=save_as)

def create_text_message(self, text: str, save_as: str = '') -> ToolInvokeMessage:
    """
        create a text message

        :param text: the text
        :return: the text message
    """
    return ToolInvokeMessage(type=ToolInvokeMessage.MessageType.TEXT, 
                             message=text,
                             save_as=save_as
                             )

def create_blob_message(self, blob: bytes, meta: dict = None, save_as: str = '') -> ToolInvokeMessage:
    """
        create a blob message

        :param blob: the blob
        :return: the blob message
    """
    return ToolInvokeMessage(type=ToolInvokeMessage.MessageType.BLOB, 
                             message=blob, meta=meta,
                             save_as=save_as
                             )
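The helpers above only differ in the message type they attach. On the consuming side, something has to branch on that type. The sketch below illustrates this with hypothetical stand-ins (MessageType, InvokeMessage) that mimic the shape of ToolInvokeMessage. It also normalizes the single-message-or-list return value of _invoke. The rendering format and the application/octet-stream fallback are this sketch's own choices, not Dify's.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Union


class MessageType(Enum):
    TEXT = "text"
    LINK = "link"
    IMAGE = "image"
    BLOB = "blob"


@dataclass
class InvokeMessage:
    type: MessageType
    message: Union[str, bytes]
    meta: dict[str, Any] = field(default_factory=dict)
    save_as: str = ""


def as_list(result: Union[InvokeMessage, list[InvokeMessage]]) -> list[InvokeMessage]:
    """_invoke may return one message or a list; normalize to a list."""
    return result if isinstance(result, list) else [result]


def render(messages: list[InvokeMessage]) -> list[str]:
    """Turn each message into a display string according to its type."""
    out = []
    for m in messages:
        if m.type is MessageType.TEXT:
            out.append(str(m.message))
        elif m.type in (MessageType.LINK, MessageType.IMAGE):
            out.append(f"[{m.type.value}] {m.message}")
        else:  # BLOB: show the mime type and size instead of raw bytes
            mime = m.meta.get("mime_type", "application/octet-stream")
            out.append(f"[blob {mime}, {len(m.message)} bytes]")
    return out


msgs = as_list([
    InvokeMessage(MessageType.TEXT, "hello"),
    InvokeMessage(MessageType.LINK, "https://example.com"),
    InvokeMessage(MessageType.BLOB, b"\x89PNG", meta={"mime_type": "image/png"}),
])
print(render(msgs))
# ['hello', '[link] https://example.com', '[blob image/png, 4 bytes]']
```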

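To return a file's raw data, such as an image, audio, video, PPT, Word, or Excel file, use a file BLOB.

blob: the file's raw data, of type bytes.
meta: the file's metadata. If you know the file's type, it is best to pass a mime_type; otherwise Dify uses octet/stream as the default type. For example:

# b64decode decodes a Base64-encoded string into the raw bytes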
self.create_blob_message(blob=b64decode(image.b64_json), meta={ 'mime_type': 'image/png' }, save_as=self.VARIABLE_KEY.IMAGE.value)

self.create_blob_message(blob=response.content, meta={'mime_type': 'image/svg+xml'})

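application/octet-stream is a generic MIME type for binary data. An "octet" is an 8-bit byte, and "stream" refers to a data stream. The type is typically used for binary data whose kind is unknown. When a server or client cannot determine a file's specific type during a download or upload, it may fall back to application/octet-stream. For example, when you download a .exe or .zip file, the Content-Type header of the HTTP response may be set to application/octet-stream.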
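When the file name is known, Python's standard mimetypes module can usually supply the mime_type for the meta argument, so the generic fallback is only needed for unknown extensions. The helper blob_meta_for below is a hypothetical illustration. Its fallback value is the standard application/octet-stream, which is this sketch's choice rather than Dify's default.

```python
import mimetypes

DEFAULT_MIME = "application/octet-stream"  # generic binary type used as this sketch's fallback


def blob_meta_for(filename: str) -> dict[str, str]:
    """Guess a mime_type from the file extension; fall back to a generic binary type."""
    mime, _encoding = mimetypes.guess_type(filename)
    return {"mime_type": mime or DEFAULT_MIME}


print(blob_meta_for("logo.png"))          # {'mime_type': 'image/png'}
print(blob_meta_for("data.unknownext"))   # {'mime_type': 'application/octet-stream'}
```

Inside a tool, the result would be passed along as meta=blob_meta_for(name) in a create_blob_message call.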

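2. Summarization and crawling

Two other commonly used helpers, one for summarizing text and one for crawling web pages, are shown below:

Source location: dify-0.6.9/api/core/tools/tool/builtin_tool.py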

def summary(self, user_id: str, content: str) -> str:
    max_tokens = self.get_max_tokens()

    if self.get_prompt_tokens(prompt_messages=[
        UserPromptMessage(content=content)
    ]) < max_tokens * 0.6:
        return content
    
    def get_prompt_tokens(content: str) -> int:
        return self.get_prompt_tokens(prompt_messages=[
            SystemPromptMessage(content=_SUMMARY_PROMPT),
            UserPromptMessage(content=content)
        ])
    
    def summarize(content: str) -> str:
        summary = self.invoke_model(user_id=user_id, prompt_messages=[
            SystemPromptMessage(content=_SUMMARY_PROMPT),
            UserPromptMessage(content=content)
        ], stop=[])

        return summary.message.content

    lines = content.split('\n')
    new_lines = []
    # split long line into multiple lines
    for i in range(len(lines)):
        line = lines[i]
        if not line.strip():
            continue
        if len(line) < max_tokens * 0.5:
            new_lines.append(line)
        elif get_prompt_tokens(line) > max_tokens * 0.7:
            while get_prompt_tokens(line) > max_tokens * 0.7:
                new_lines.append(line[:int(max_tokens * 0.5)])
                line = line[int(max_tokens * 0.5):]
            new_lines.append(line)
        else:
            new_lines.append(line)

    # merge lines into messages with max tokens
    messages: list[str] = []
    for i in new_lines:
        if len(messages) == 0:
            messages.append(i)
        else:
            if len(messages[-1]) + len(i) < max_tokens * 0.5:
                messages[-1] += i
            elif get_prompt_tokens(messages[-1] + i) > max_tokens * 0.7:  # elif, not if: otherwise a short line is merged twice
                messages.append(i)
            else:
                messages[-1] += i

    summaries = []
    for i in range(len(messages)):
        message = messages[i]
        summary = summarize(message)
        summaries.append(summary)

    result = '\n'.join(summaries)

    if self.get_prompt_tokens(prompt_messages=[
        UserPromptMessage(content=result)
    ]) > max_tokens * 0.7:
        return self.summary(user_id=user_id, content=result)
    
    return result

def get_url(self, url: str, user_agent: str = None) -> str:
    """
        get url
    """
    return get_url(url, user_agent=user_agent)
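The summary method follows a split, summarize, and merge strategy. If the text already fits, it is returned unchanged. Otherwise, the lines are packed into chunks under a token budget, each chunk is summarized by the model, and the process recurses while the joined summary is still too long. The sketch below reproduces that strategy with two assumptions: word counts stand in for model tokens, and first_three_words stands in for the LLM call. Unlike the Dify code, it does not split over-long single lines. It also only recurses when the text actually shrank, which guarantees termination.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real model tokens.
    return len(text.split())


def chunk_lines(lines: list[str], budget: int) -> list[str]:
    """Greedily merge non-empty lines into chunks of at most `budget` tokens.

    A single line longer than `budget` becomes its own chunk (no further splitting).
    """
    chunks: list[str] = []
    for line in lines:
        if not line.strip():
            continue
        if chunks and count_tokens(chunks[-1]) + count_tokens(line) <= budget:
            chunks[-1] += "\n" + line
        else:
            chunks.append(line)
    return chunks


def summarize_long(text: str, max_tokens: int,
                   summarize: Callable[[str], str]) -> str:
    """Split, summarize each chunk, join; recurse while the result is still too long."""
    if count_tokens(text) < max_tokens * 0.6:
        return text  # short enough: return unchanged, like the 0.6 check above
    chunks = chunk_lines(text.split("\n"), int(max_tokens * 0.5))
    result = "\n".join(summarize(chunk) for chunk in chunks)
    # Recurse only if still over budget AND the text actually shrank (avoids endless loops).
    if count_tokens(result) > max_tokens * 0.7 and count_tokens(result) < count_tokens(text):
        return summarize_long(result, max_tokens, summarize)
    return result


def first_three_words(chunk: str) -> str:
    # Stub "model": a real implementation would call an LLM here.
    return " ".join(chunk.split()[:3])


doc = "\n".join(" ".join(f"w{i}_{j}" for j in range(10)) for i in range(20))
short = summarize_long(doc, max_tokens=20, summarize=first_three_words)
print(count_tokens(doc), count_tokens(short))  # 200 9
```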

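3. Variable pool

Put simply, the variable pool stores the variables, files, and other data that tools produce while they run, so that other tools can use them later in the same run. DallE3 and Vectorizer.AI are used below to show how to use the variable pool.

DallE3 is an image generation tool that creates images from text; here it is asked to generate a logo for a café.
Vectorizer.AI converts images into vector graphics; here it turns the PNG logo generated by DallE3 into a vector image that a designer can actually use.

# DallE returns its message, saving the image under a variable key
self.create_blob_message(blob=b64decode(image.b64_json), meta={ 'mime_type': 'image/png' }, save_as=self.VARIABLE_KEY.IMAGE.value)

# Vectorizer.AI fetches the image DallE generated earlier from the variable pool
image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE)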
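The mechanism is a shared, key-addressed store. One tool writes a file under a key via save_as, and a later tool reads it back by the same key. The in-memory VariablePool and VariableKey below are hypothetical stand-ins used only to show that hand-off. Dify's real variable pool is richer, and the overwrite-on-same-key behavior is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class VariableKey(Enum):
    # Hypothetical stand-in for self.VARIABLE_KEY in the snippet above.
    IMAGE = "image"


class VariablePool:
    """Minimal in-memory pool shared by tools in one run (illustrative only)."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def save_blob(self, key: str, blob: bytes) -> None:
        # In this sketch a later save under the same key overwrites the earlier one.
        self._files[key] = blob

    def get_variable_file(self, key: VariableKey) -> Optional[bytes]:
        return self._files.get(key.value)


pool = VariablePool()
# "DallE" step: store the generated PNG bytes under save_as=IMAGE.
pool.save_blob(VariableKey.IMAGE.value, b"\x89PNG...")
# "Vectorizer" step: read the same bytes back by key.
image_binary = pool.get_variable_file(VariableKey.IMAGE)
print(image_binary is not None)  # True
```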

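III. Dify Third-party Tools

You can create custom tools; the OpenAPI Swagger and ChatGPT Plugin specifications are currently supported. The OpenAPI schema can be pasted in directly or imported from a URL. Custom tools currently support two authentication methods: no authentication and API Key.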
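The two authentication methods differ only in what is attached to each outgoing request. With no authentication, nothing is added. With API Key authentication, the key is sent in a header, optionally behind a prefix such as Bearer. The config shape, header default, and prefix handling below are assumptions for illustration, not Dify's actual settings model.

```python
def build_auth_headers(auth: dict) -> dict[str, str]:
    """Headers for one request, given a hypothetical auth config dict."""
    auth_type = auth.get("type", "none")
    if auth_type == "none":
        return {}
    if auth_type == "api_key":
        header = auth.get("header", "Authorization")
        prefix = auth.get("prefix", "")  # e.g. "Bearer"; empty means send the raw key
        key = auth["api_key"]
        return {header: f"{prefix} {key}" if prefix else key}
    raise ValueError(f"unsupported auth type: {auth_type}")


print(build_auth_headers({"type": "none"}))  # {}
print(build_auth_headers({"type": "api_key", "prefix": "Bearer", "api_key": "sk-123"}))
# {'Authorization': 'Bearer sk-123'}
```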

1. Weather (JSON)

{
      "openapi": "3.1.0",
      "info": {
        "title": "Get weather data",
        "description": "Retrieves current weather data for a location.",
        "version": "v1.0.0"
      },
      "servers": [
        {
          "url": "https://weather.example.com"
        }
      ],
      "paths": {
        "/location": {
          "get": {
            "description": "Get temperature for a specific location",
            "operationId": "GetCurrentWeather",
            "parameters": [
              {
                "name": "location",
                "in": "query",
                "description": "The city and state to retrieve the weather for",
                "required": true,
                "schema": {
                  "type": "string"
                }
              }
            ],
            "deprecated": false
          }
        }
      },
      "components": {
        "schemas": {}
      }
}
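Each path and method pair in a schema like this becomes one callable operation, identified by its operationId and built from the server URL plus the path. The sketch below flattens an OpenAPI JSON document that way, using only the standard json module. list_operations is a hypothetical helper, not Dify's importer. A real importer would also need to resolve $ref references, merge path-level parameters, and handle request bodies, all of which this sketch skips.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(schema_text: str) -> list[dict]:
    """Flatten an OpenAPI document into one entry per (path, method) operation."""
    spec = json.loads(schema_text)
    servers = spec.get("servers") or [{}]
    base_url = servers[0].get("url", "")
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for method, op in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            params = op.get("parameters", [])  # assumes inline parameters (no $ref)
            operations.append({
                "operation_id": op.get("operationId", f"{method}_{path}"),
                "method": method.upper(),
                "url": base_url + path,
                "params": [p["name"] for p in params],
                "required": [p["name"] for p in params if p.get("required")],
            })
    return operations


weather = json.dumps({
    "openapi": "3.1.0",
    "servers": [{"url": "https://weather.example.com"}],
    "paths": {"/location": {"get": {
        "operationId": "GetCurrentWeather",
        "parameters": [{"name": "location", "in": "query", "required": True,
                        "schema": {"type": "string"}}],
    }}},
})
print(list_operations(weather))
# [{'operation_id': 'GetCurrentWeather', 'method': 'GET', 'url': 'https://weather.example.com/location', 'params': ['location'], 'required': ['location']}]
```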

2. Pet Store (YAML)

# Taken from https://github.com/OAI/OpenAPI-Specification/blob/main/examples/v3.0/petstore.yaml

    openapi: "3.0.0"
    info:
      version: 1.0.0
      title: Swagger Petstore
      license:
        name: MIT
    servers:
      - url: https://petstore.swagger.io/v1
    paths:
      /pets:
        get:
          summary: List all pets
          operationId: listPets
          tags:
            - pets
          parameters:
            - name: limit
              in: query
              description: How many items to return at one time (max 100)
              required: false
              schema:
                type: integer
                maximum: 100
                format: int32
          responses:
            '200':
              description: A paged array of pets
              headers:
                x-next:
                  description: A link to the next page of responses
                  schema:
                    type: string
              content:
                application/json:    
                  schema:
                    $ref: "#/components/schemas/Pets"
            default:
              description: unexpected error
              content:
                application/json:
                  schema:
                    $ref: "#/components/schemas/Error"
        post:
          summary: Create a pet
          operationId: createPets
          tags:
            - pets
          responses:
            '201':
              description: Null response
            default:
              description: unexpected error
              content:
                application/json:
                  schema:
                    $ref: "#/components/schemas/Error"
      /pets/{petId}:
        get:
          summary: Info for a specific pet
          operationId: showPetById
          tags:
            - pets
          parameters:
            - name: petId
              in: path
              required: true
              description: The id of the pet to retrieve
              schema:
                type: string
          responses:
            '200':
              description: Expected response to a valid request
              content:
                application/json:
                  schema:
                    $ref: "#/components/schemas/Pet"
            default:
              description: unexpected error
              content:
                application/json:
                  schema:
                    $ref: "#/components/schemas/Error"
    components:
      schemas:
        Pet:
          type: object
          required:
            - id
            - name
          properties:
            id:
              type: integer
              format: int64
            name:
              type: string
            tag:
              type: string
        Pets:
          type: array
          maxItems: 100
          items:
            $ref: "#/components/schemas/Pet"
        Error:
          type: object
          required:
            - code
            - message
          properties:
            code:
              type: integer
              format: int32
            message:
              type: string

3. Empty Template (JSON)

{
      "openapi": "3.1.0",
      "info": {
        "title": "Untitled",
        "description": "Your OpenAPI specification",
        "version": "v1.0.0"
      },
      "servers": [
        {
          "url": ""
        }
      ],
      "paths": {},
      "components": {
        "schemas": {}
      }
}

IV. Cloudflare Workers

A function-calling tool can be deployed to Cloudflare Workers and described with an OpenAPI schema. Cloudflare Workers is Cloudflare's service for running JavaScript functions on its edge network. In short, dify-tools-worker is a Cloudflare Worker for creating tools for Dify applications.

# Clone the code
git clone https://github.com/crazywoola/dify-tools-worker

# Development mode
cp .wrangler.toml.example .wrangler.toml
npm install
npm run dev
# You will get a URL like this: http://localhost:8787

# Deployment mode
npm run deploy
# You will get a URL like this: https://difytoolsworker.yourname.workers.dev

Then enter this URL under "Import from URL", as shown below: