程序员福音!OpenAI新推出的模型API全部支持结构化输出,JSON Schema匹配率高达100%,成本还立减一半。
还在绞尽脑汁想一堆提示词,为一顿操作后五花八门的输出结果而头疼?OpenAI终于听到了群众的呼声,为广大开发者送上渴望已久的第一大功能。OpenAI今日宣布新功能上线,ChatGPT API现已支持JSON结构化输出。
JSON(JavaScript Object Notation)是文件和数据交换格式的行业标准,因为它既易于人类读取又易于机器解析。然而,LLM常常与JSON对着干,经常会产生幻觉,要不生成仅部分遵循指令的响应,要不就生成一堆「天书」,根本无法完全解析。这就需要开发人员使用多种开源工具、尝试不同的提示或重复请求等来生成理想的输出结果,耗时耗力。结构化输出功能于今天发布,以上棘手的难题迎刃而解,确保模型生成的输出与JSON中规定的schema相匹配。一直以来,结构化输出功能是开发人员呼声最高的头号功能,奥特曼在推文中也表示,该版本是应广大用户的要求发布的。OpenAI发布的新功能确实击中了许多开发者的心,他们一致认为「This is a big deal」。纷纷留言表示赞叹,直呼「Excellent!」。几家欢喜几家愁,OpenAI的这次更新,又让人担心会吞噬初创公司。然而,对于更多的普通用户来说,他们更关心的问题是GPT-5到底什么时候发布,至于JSON Schema,「那是什么?」毕竟,没有GPT-5的消息,OpenAI今年秋季的DevDay,可能与去年相比,将会显得安静了许多。
轻松确保模式一致性
有了结构化输出,只需要定义一个JSON Schema,AI就会不再「任性」,乖乖按照指令要求输出数据。并且,新功能不仅仅让AI变得更加听话,还能大大提高输出内容的可靠性。在对复杂的JSON schema的跟踪评估中,带有结构化输出的新模型gpt-4o-2024-08-06获得了100%的满分。相比之下,gpt-4-0613的得分不到40%。实际上,JSON Schema功能就是OpenAI在去年的DevDay上推出的。现在,OpenAI在API中扩展了这项功能,确保模型生成的输出与开发人员提供的JSON Schema完全匹配。从非结构化输入生成结构化数据是当今应用中人工智能的核心用例之一。开发人员使用OpenAI API构建强大的助手,能够通过函数调用获取数据和回答问题,提取结构化数据以进行数据输入,并构建多步骤的智能体工作流(multi-step agentic workflows),从而允许LLM采取行动。
技术原理
OpenAI采用了一种双管齐下的方法来提高模型输出与JSON Schema的匹配度。最新的gpt-4o-2024-08-06模型经过训练,可以更好地理解复杂的Schema并生成与之匹配的输出。尽管模型性能已显著提升,在基准测试中达到了93%的准确性,但固有不确定性仍然存在。为了确保开发者构建应用的稳定性,OpenAI提供了一种更高准确度的方法来约束模型的输出,从而实现100%的可靠性。
约束解码
OpenAI采用了一种称为约束采样或约束解码的技术,默认情况下,模型生成输出时完全不受约束,可能从词汇表中选择任何token作为下一个输出。这种灵活性可能导致错误,例如,在生成有效JSON时随意插入无效字符。为了避免此类错误,OpenAI使用动态约束解码的方法,确保生成的输出token始终符合提供的schema。为了实现这一点,OpenAI将提供的JSON Schema转换为上下文无关文法(CFG)。对于每个JSON Schema,OpenAI计算出一个代表该模式的语法,并在采样期间高效地访问预处理的组件。这种方法不仅使生成的输出更准确,还减少了不必要的延迟。首次请求新模式可能会有额外的处理时间,但随后的请求通过缓存机制实现快速响应。
备选方案
除了CFG方法,其他方法通常使用有限状态机(FSM)或正则表达式来进行约束解码。然而,这些方法在动态更新有效token时能力有限。特别是对于复杂的嵌套或递归数据结构,FSM通常难以处理。OpenAI的CFG方法在表达复杂schema时表现出色。例如,支持递归模式的JSON schema在OpenAI API上已得到实现,但无法通过FSM方法表达。
输入成本节省一半
支持函数调用的所有模型均可实现结构化输出,包括最新的GPT-4o和GPT-4o-mini模型,以及微调模型。此功能可在Chat Completions API、Assistants API和Batch API上使用,并兼容视觉输入。与gpt-4o-2024-05-13版本相比,gpt-4o-2024-08-06版本在成本上也更具优势,开发者可以在输入端节省50%的成本(2.50美元/1M oken),在输出端节省33%的成本(10.00美元/1M token)。
如何使用结构化输出
在API中可以使用两种形式引入结构化输出:
函数调用
通过在函数定义中设置strict: true,可以实现通过工具的结构化输出。此功能适用于支持工具的所有型号,包括所有型号gpt-4-0613和gpt-3.5-turbo-0613及更高版本。启用结构化输出后,模型输出将与提供的工具定义匹配。示例请求:
POST /v1/chat/completions
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
},
{
"role": "user",
"content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "query",
"description": "Execute a query.",
"strict": true,
"parameters": {
"type": "object",
"properties": {
"table_name": {
"type": "string",
"enum": ["orders"]
},
"columns": {
"type": "array",
"items": {
"type": "string",
"enum": [
"id",
"status",
"expected_delivery_date",
"delivered_at",
"shipped_at",
"ordered_at",
"canceled_at"
]
}
},
"conditions": {
"type": "array",
"items": {
"type": "object",
"properties": {
"column": {
"type": "string"
},
"operator": {
"type": "string",
"enum": ["=", ">", "<", ">=", "<=", "!="]
},
"value": {
"anyOf": [
{
"type": "string"
},
{
"type": "number"
},
{
"type": "object",
"properties": {
"column_name": {
"type": "string"
}
},
"required": ["column_name"],
"additionalProperties": false
}
]
}
},
"required": ["column", "operator", "value"],
"additionalProperties": false
}
},
"order_by": {
"type": "string",
"enum": ["asc", "desc"]
}
},
"required": ["table_name", "columns", "conditions", "order_by"],
"additionalProperties": false
}
}
}
]
}
示例输出:
{
"table_name": "orders",
"columns": ["id", "status", "expected_delivery_date", "delivered_at"],
"conditions": [
{
"column": "status",
"operator": "=",
"value": "fulfilled"
},
{
"column": "ordered_at",
"operator": ">=",
"value": "2023-05-01"
},
{
"column": "ordered_at",
"operator": "<",
"value": "2023-06-01"
},
{
"column": "delivered_at",
"operator": ">",
"value": {
"column_name": "expected_delivery_date"
}
}
],
"order_by": "asc"
}
response_format参数的新选项
开发人员现在可以通过response_format的新选项json_schema选择是否需要规定格式的输出。当模型不调用工具,而是以结构化方式响应用户时,这一功能非常有用。此功能适用于最新的GPT-4o型号:今天发布的gpt-4o-2024-08-06和gpt-4o-mini-2024-07-18 。将response_format设置为strict:true时,模型输出将与提供的schema匹配。示例请求:
POST /v1/chat/completions
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor."
},
{
"role": "user",
"content": "solve 8x + 31 = 2"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "math_response",
"strict": true,
"schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"explanation": {
"type": "string"
},
"output": {
"type": "string"
}
},
"required": ["explanation", "output"],
"additionalProperties": false
}
},
"final_answer": {
"type": "string"
}
},
"required": ["steps", "final_answer"],
"additionalProperties": false
}
}
}
}
示例输出:
{
"steps": [
{
"explanation": "Subtract 31 from both sides to isolate the term with x.",
"output": "8x + 31 - 31 = 2 - 31"
},
{
"explanation": "This simplifies to 8x = -29.",
"output": "8x = -29"
},
{
"explanation": "Divide both sides by 8 to solve for x.",
"output": "x = -29 / 8"
}
],
"final_answer": "x = -29 / 8"
}
开发人员可以使用结构化输出逐步生成答案,以引导达到预期的输出。根据OpenAI的说法,开发人员不需要验证或重试格式不正确的响应,并且该功能允许更简单的提示。
原生SDK支持
OpenAI称他们的Python和Node SDK已更新,原生支持结构化输出。为工具提供架构或响应格式就像提供Pydantic或Zod对象一样简单,OpenAI的SDK能将数据类型转换为支持的JSON模式、自动将JSON响应反序列化为类型化数据结构以及解析拒绝。
from enum import Enum
from typing import Union
from pydantic import BaseModel
import openai
from openai import OpenAI
class Table(str, Enum):
orders = "orders"
customers = "customers"
products = "products"
class Column(str, Enum):
id = "id"
status = "status"
expected_delivery_date = "expected_delivery_date"
delivered_at = "delivered_at"
shipped_at = "shipped_at"
ordered_at = "ordered_at"
canceled_at = "canceled_at"
class Operator(str, Enum):
eq = "="
gt = ">"
lt = "<"
le = "<="
ge = ">="
ne = "!="
class OrderBy(str, Enum):
asc = "asc"
desc = "desc"
class DynamicValue(BaseModel):
column_name: str
class Condition(BaseModel):
column: str
operator: Operator
value: Union[str, int, DynamicValue]
class Query(BaseModel):
table_name: Table
columns: list[Column]
conditions: list[Condition]
order_by: OrderBy
client = OpenAI()
completion = client.beta.chat.completions.parse(
model="gpt-4o-2024-08-06",
messages=[
{
"role": "system",
"content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function.",
},
{
"role": "user",
"content": "look up all my orders in may of last year that were fulfilled but not delivered on time",
},
],
tools=[
openai.pydantic_function_tool(Query),
],
)
print(completion.choices[0].message.tool_calls[0].function.parsed_arguments)
而且,本机结构化输出支持也可用于response_format 。
from pydantic import BaseModel
from openai import OpenAI
class Step(BaseModel):
explanation: str
output: str
class MathResponse(BaseModel):
steps: list[Step]
final_answer: str
client = OpenAI()
completion = client.beta.chat.completions.parse(
model="gpt-4o-2024-08-06",
messages=[
{"role": "system", "content": "You are a helpful math tutor."},
{"role": "user", "content": "solve 8x + 31 = 2"},
],
response_format=MathResponse,
)
message = completion.choices[0].message
if message.parsed:
print(message.parsed.steps)
print(message.parsed.final_answer)
else:
print(message.refusal)
其他用例
开发人员经常使用OpenAI的模型为各种用例生成结构化数据。其他一些例子包括:-根据用户意图动态生成用户界面开发人员可以使用结构化输出来创建代码或UI生成应用程序。使用相同的response_format ,可根据用户输入生成不同的 UI。例如,创建「园丁的登录界面」:就是用以下代码生成的:
{
"type": "div",
"label": "",
"children": [
{
"type": "header",
"label": "",
"children": [
{
"type": "div",
"label": "Green Thumb Gardening",
"children": [],
"attributes": [{ "name": "className", "value": "site-title" }]
},
{
"type": "div",
"label": "Bringing Life to Your Garden",
"children": [],
"attributes": [{ "name": "className", "value": "site-tagline" }]
}
],
"attributes": [{ "name": "className", "value": "header" }]
},
{
"type": "section",
"label": "",
"children": [
{
"type": "div",
"label": "",
"children": [
{
"type": "div",
"label": "About Us",
"children": [
{
"type": "div",
"label": "At Green Thumb Gardening, we specialize in transforming your outdoor spaces into beautiful, thriving gardens. Our team has decades of experience in horticulture and landscape design.",
"children": [],
"attributes": [
{ "name": "className", "value": "about-description" }
]
}
],
"attributes": [{ "name": "className", "value": "about-section" }]
}
],
"attributes": [{ "name": "className", "value": "content" }]
}
],
"attributes": [{ "name": "className", "value": "about-container" }]
},
{
"type": "section",
"label": "",
"children": [
{
"type": "div",
"label": "",
"children": [
{
"type": "div",
"label": "Our Services",
"children": [
{
"type": "div",
"label": "Garden Design",
"children": [],
"attributes": [
{ "name": "className", "value": "service-item" }
]
},
{
"type": "div",
"label": "Plant Care & Maintenance",
"children": [],
"attributes": [
{ "name": "className", "value": "service-item" }
]
},
{
"type": "div",
"label": "Seasonal Cleanup",
"children": [],
"attributes": [
{ "name": "className", "value": "service-item" }
]
},
{
"type": "div",
"label": "Custom Landscaping",
"children": [],
"attributes": [
{ "name": "className", "value": "service-item" }
]
}
],
"attributes": [{ "name": "className", "value": "services-list" }]
}
],
"attributes": [{ "name": "className", "value": "content" }]
}
],
"attributes": [{ "name": "className", "value": "services-container" }]
}
],
"attributes": [{ "name": "className", "value": "landing-page" }]
}
– 将最终答案与支撑性的推理或附加评论分开为模型提供一个单独的思维链字段可以提高响应的最终质量。请求:
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "9.11 and 9.9 -- which is bigger?"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "reasoning_schema",
"strict": true,
"schema": {
"type": "object",
"properties": {
"reasoning_steps": {
"type": "array",
"items": {
"type": "string"
},
"description": "The reasoning steps leading to the final conclusion."
},
"answer": {
"type": "string",
"description": "The final answer, taking into account the reasoning steps."
}
},
"required": ["reasoning_steps", "answer"],
"additionalProperties": false
}
}
}
}
结构化输出:
{
"reasoning_steps": [
"First step is to compare the numbers 9.11 and 9.9.",
"Both numbers have the same whole number part, which is 9.",
"To compare the decimal parts, convert them to the same number of decimal places.",
"9.11 has two decimal places: it is 9.11.",
"9.9 has one decimal place: it can be rewritten as 9.90.",
"Now, compare 9.11 and 9.90 by looking at the decimal parts.",
"Compare 11 with 90.",
"90 is greater than 11, so 9.90 is greater than 9.11."
],
"answer": "9.9 is bigger than 9.11."
}
– 从非结构化数据中提取结构化数据例如,指示模型从会议记录中提取待办事项、截止日期和作业等内容。请求:
POST /v1/chat/completions
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "Extract action items, due dates, and owners from meeting notes."
},
{
"role": "user",
"content": "...meeting notes go here..."
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "action_items",
"strict": true,
"schema": {
"type": "object",
"properties": {
"action_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "Description of the action item."
},
"due_date": {
"type": ["string", "null"],
"description": "Due date for the action item, can be null if not specified."
},
"owner": {
"type": ["string", "null"],
"description": "Owner responsible for the action item, can be null if not specified."
}
},
"required": ["description", "due_date", "owner"],
"additionalProperties": false
},
"description": "List of action items from the meeting."
}
},
"required": ["action_items"],
"additionalProperties": false
}
}
}
}
结构化输出:
{
"action_items": [
{
"description": "Collaborate on optimizing the path planning algorithm",
"due_date": "2024-06-30",
"owner": "Jason Li"
},
{
"description": "Reach out to industry partners for additional datasets",
"due_date": "2024-06-25",
"owner": "Aisha Patel"
},
{
"description": "Explore alternative LIDAR sensor configurations and report findings",
"due_date": "2024-06-27",
"owner": "Kevin Nguyen"
},
{
"description": "Schedule extended stress tests for the integrated navigation system",
"due_date": "2024-06-28",
"owner": "Emily Chen"
},
{
"description": "Retest the system after bug fixes and update the team",
"due_date": "2024-07-01",
"owner": "David Park"
}
]
}
安全的结构化输出安全是OpenAI的首要任务——新的结构化输出功能将遵守OpenAI现有的安全政策,并且仍然允许模型拒绝不安全的请求。为了使开发更简单,API响应上有一个新的refusal字符串值,它允许开发人员以编程方式检测模型是否生成拒绝而不是与架构匹配的输出。当响应不包含拒绝并且模型的响应没有过早中断(如finish_reason所示)时,模型的响应将可靠地生成与提供的schema匹配的有效JSON。
{
"id": "chatcmpl-9nYAG9LPNonX8DAyrkwYfemr3C8HC",
"object": "chat.completion",
"created": 1721596428,
"model": "gpt-4o-2024-08-06",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"refusal": "I'm sorry, I cannot assist with that request."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 81,
"completion_tokens": 11,
"total_tokens": 92
},
"system_fingerprint": "fp_3407719c7f"
}
参考资料:
https://openai.com/index/introducing-structured-outputs-in-the-api/
https://x.com/sama/status/1820881534909300769
https://venturebeat.com/ai/openai-has-finally-released-the-no-1-feature-developers-have-been-desperate-for/