Techniques

Chain of Thought

Step-by-Step Reasoning for Complex Tasks

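Chain of Thought (CoT) prompting is a technique that can markedly improve an AI model's performance on complex reasoning tasks. Its core idea is to ask the model to lay out its reasoning step by step.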

Show Your Reasoning

Just as a math teacher asks students to show their work, CoT prompting asks the AI to make its reasoning visible.

The Problem CoT Solves

When asked to give an answer directly, AI models can struggle with multi-step reasoning.

Direct answer (often wrong)

Question: A store sells apples for $2 each. If you buy 5 or more, you get 20% off. How much do 7 apples cost?

Answer: $14 ❌

Chain of Thought (correct)

Question: A store sells apples for $2 each. If you buy 5 or more, you get 20% off. How much do 7 apples cost?

Let's work through this step by step:
1. Regular price per apple: $2
2. Number of apples: 7
3. Since 7 ≥ 5, the discount applies
4. Regular total: 7 × $2 = $14
5. Discount: 20% of $14 = $2.80
6. Final price: $14 - $2.80 = $11.20

Answer: $11.20 ✓
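
The arithmetic in the worked answer can be checked with a few lines of Python. This is only a sketch: the function name `discounted_total` and its parameters are illustrative and do not come from the original example.

```python
def discounted_total(unit_price, quantity, min_qty=5, discount=0.20):
    """Total price, with a bulk discount once quantity reaches min_qty."""
    regular_total = unit_price * quantity
    if quantity >= min_qty:
        # The discount applies to the whole order, not only the extra items.
        return round(regular_total * (1 - discount), 2)
    return round(regular_total, 2)


print(discounted_total(2.00, 7))  # 11.2
```

The direct answer of $14 corresponds to skipping the `quantity >= min_qty` branch, which is exactly the step the CoT version makes explicit.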

Basic CoT Prompts

Simple Trigger Phrases

Add one of the following phrases to your prompt:

"Let's think step by step."
"Think through this carefully."
"Work through this problem methodically."
"Show your reasoning at each step."
"Explain your thought process."

Explicit Step Requests

Try Chain of Thought

Ask the AI to solve a problem step by step.

Solve this problem:

A car rental costs $40 per day plus $0.25 per mile. If someone rents a car for 3 days and drives 200 miles, what is the total cost?

Before giving your final answer:
1. Identify what information is given
2. Determine what we need to find
3. Plan your approach
4. Execute each step, showing work
5. Verify your answer
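
For the "verify your answer" step, it helps to have an independent calculation to compare against the model's result. The sketch below is one way to do that. The function name `rental_cost` and its default rates are assumptions taken from the numbers in the prompt.

```python
def rental_cost(days, miles, daily_rate=40.00, per_mile=0.25):
    """Total rental cost: a flat daily charge plus a per-mile charge."""
    daily_charge = days * daily_rate      # 3 days at $40
    mileage_charge = miles * per_mile     # 200 miles at $0.25
    return daily_charge + mileage_charge


print(rental_cost(3, 200))  # 170.0
```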

Zero-Shot vs Few-Shot CoT

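There are two main approaches to Chain of Thought prompting, and they trade simplicity against control.

Zero-Shot CoT

The simplest approach: add a trigger phrase and let the model decide how to reason. No examples are needed. This works surprisingly well for many problems, although you get less control over the format of the reasoning.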

If John has 3 apples and gives half to Mary, then buys 4 more, how many does he have?

Let's think step by step.

Example output:

Let's think step by step.
- John starts with 3 apples
- Half of 3 is 1.5, but we can't split apples
- If we round down: John gives 1 apple, keeps 2
- After buying 4 more: 2 + 4 = 6 apples

Answer: 6 apples
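
In code, zero-shot CoT amounts to appending a trigger phrase to the question before sending it to a model. A minimal sketch, assuming a hypothetical helper named `add_cot_trigger` (no particular model API is implied):

```python
COT_TRIGGER = "Let's think step by step."


def add_cot_trigger(question: str, trigger: str = COT_TRIGGER) -> str:
    """Return the question followed by a CoT trigger phrase in its own paragraph."""
    return f"{question.strip()}\n\n{trigger}"


prompt = add_cot_trigger(
    "If John has 3 apples and gives half to Mary, then buys 4 more, "
    "how many does he have?"
)
print(prompt)
```

Any of the trigger phrases listed earlier can be passed as `trigger`. The returned string is what you would send as the prompt.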

Few-Shot CoT

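For more consistent results, show the model how you want it to reason by providing complete worked examples. This gives you control over the style, format, and level of detail of the reasoning, because the model imitates the pattern you show.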

Example 1:
Q: A baker has 24 cupcakes. She puts them equally into 4 boxes. Then she eats 2 cupcakes from one box. How many cupcakes total remain?

A: Let's work through this:
- Total cupcakes: 24
- Cupcakes per box: 24 ÷ 4 = 6
- After eating 2 from one box: 6 - 2 = 4 cupcakes in that box
- Total remaining: (3 boxes × 6) + 4 = 18 + 4 = 22 cupcakes

Example 2:
Q: A train travels at 60 mph for 2 hours, then 40 mph for 3 hours. What is its average speed?

A: Let's work through this:
- Distance at 60 mph: 60 × 2 = 120 miles
- Distance at 40 mph: 40 × 3 = 120 miles
- Total distance: 120 + 120 = 240 miles
- Total time: 2 + 3 = 5 hours
- Average speed: 240 ÷ 5 = 48 mph

Now solve:
Q: A store sells notebooks for $3 each. If you buy 10 or more, you get 25% off. How much do 12 notebooks cost?

Example output:

A: Let's work through this:
- Regular price: $3, Quantity: 12
- Since 12 ≥ 10, the 25% discount applies
- Regular total: 12 × $3 = $36
- Discount: 25% of $36 = $9
- Final price: $36 - $9 = $27
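
A few-shot CoT prompt like the one above can be assembled from a list of worked examples. The sketch below shows one possible layout. The helper name `build_few_shot_prompt` and the data shape (question plus a list of reasoning steps) are assumptions made for illustration.

```python
def build_few_shot_prompt(examples, question, opener="Let's work through this:"):
    """Build a few-shot CoT prompt.

    examples: list of (question, [reasoning steps]) pairs.
    question: the new problem the model should solve in the same style.
    """
    parts = []
    for i, (q, steps) in enumerate(examples, start=1):
        bullet_lines = "\n".join(f"- {step}" for step in steps)
        parts.append(f"Example {i}:\nQ: {q}\n\nA: {opener}\n{bullet_lines}")
    parts.append(f"Now solve:\nQ: {question}")
    return "\n\n".join(parts)


examples = [
    (
        "A train travels at 60 mph for 2 hours, then 40 mph for 3 hours. "
        "What is its average speed?",
        [
            "Distance at 60 mph: 60 × 2 = 120 miles",
            "Distance at 40 mph: 40 × 3 = 120 miles",
            "Total distance: 120 + 120 = 240 miles",
            "Total time: 2 + 3 = 5 hours",
            "Average speed: 240 ÷ 5 = 48 mph",
        ],
    ),
]
print(build_few_shot_prompt(examples, "How much do 12 notebooks cost?"))
```

Keeping every example in the same layout matters more than the exact wording, since the model copies the pattern it sees.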

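Structured CoT Formats

For consistent, reproducible reasoning, use a structured format with named steps. These frameworks keep the model from skipping important stages and make the output easier to parse and verify.

The BREAK Format

An easy-to-remember acronym that guides the model through the full problem-solving process, from understanding to verification.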

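The BREAK Framework

B - Begin
R - Reason
E - Execute
A - Answer
K - Know (confirm)

Solve this problem using BREAK:

B - Begin by restating the problem
R - Reason about which approach to use
E - Execute the solution step by step
A - Answer clearly
K - Know it is right by verifying/checking

Problem: A rectangle's length is twice its width. If the perimeter is 36 cm, what is the area?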
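
To check a model's answer to this rectangle problem, the Execute and Know steps can be mirrored in code. With length = 2 × width, the perimeter is 2 × (2w + w) = 6w. The sketch below assumes a hypothetical helper named `rectangle_area` and is only one way to write the check.

```python
import math


def rectangle_area(perimeter, length_to_width=2):
    """Area of a rectangle whose length is `length_to_width` times its width."""
    # Execute: perimeter = 2 * (length + width) = 2 * (ratio + 1) * width
    width = perimeter / (2 * (length_to_width + 1))
    length = length_to_width * width
    # Know: confirm the dimensions reproduce the given perimeter.
    assert math.isclose(2 * (length + width), perimeter)
    return length * width


print(rectangle_area(36))  # 72.0
```

The width comes out as 6 cm and the length as 12 cm, giving an area of 72 cm².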

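Reasoning Template

A more formal structure that separates inputs, goal, and execution. It suits technical problems where the solution process needs to be clearly documented.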

Use this template to solve the problem:

GIVEN: [List known information]
GOAL: [What we need to find]
APPROACH: [Strategy we'll use]
STEPS:
1. [First step with reasoning]
2. [Second step with reasoning]
...
VERIFICATION: [Check the answer]
ANSWER: [Final answer]

Problem: A car uses 8 liters of fuel per 100 km. If the tank holds 45 liters and fuel costs $1.50 per liter, how much does it cost to drive 300 km?

Example output:

GIVEN: 8L/100km, tank 45L, $1.50/L, drive 300km
GOAL: Calculate fuel cost
APPROACH: Fuel needed × cost per liter
STEPS:
1. Fuel = (300 ÷ 100) × 8 = 24 liters
2. Cost = 24 × $1.50 = $36
VERIFICATION: 24L < 45L tank ✓
ANSWER: $36
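
Because the template uses fixed labels, a response in this format is straightforward to parse and validate. The following is a rough sketch, not a robust parser: the function names `parse_reasoning` and `missing_sections` are hypothetical, and it assumes each label starts its own line.

```python
import re

LABELS = ("GIVEN", "GOAL", "APPROACH", "STEPS", "VERIFICATION", "ANSWER")


def parse_reasoning(text):
    """Split a templated response into a {label: content} dictionary."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^([A-Z]+):\s*(.*)$", line)
        if match and match.group(1) in LABELS:
            current = match.group(1)
            sections[current] = match.group(2)
        elif current is not None and line.strip():
            # Continuation line (for example a numbered step) of the current section.
            sections[current] = (sections[current] + "\n" + line.strip()).strip()
    return sections


def missing_sections(sections):
    """Labels that are absent or empty, useful for rejecting incomplete output."""
    return [label for label in LABELS if not sections.get(label)]


sample = """GIVEN: 8L/100km, tank 45L, $1.50/L, drive 300km
GOAL: Calculate fuel cost
APPROACH: Fuel needed × cost per liter
STEPS:
1. Fuel = (300 ÷ 100) × 8 = 24 liters
2. Cost = 24 × $1.50 = $36
VERIFICATION: 24L < 45L tank ✓
ANSWER: $36"""

parsed = parse_reasoning(sample)
print(parsed["ANSWER"])          # $36
print(missing_sections(parsed))  # []
```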

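CoT for Different Problem Types

Different kinds of problems call for different CoT approaches. Below are patterns suited to common scenarios.

Mathematical Reasoning

Math problems benefit the most from CoT because each step builds on the one before it. Errors compound, so showing the work helps catch mistakes early. The model should clearly label each calculation step.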

A company's revenue was $1.2M in Q1. It grew 15% in Q2, declined 10% in Q3, and grew 25% in Q4. What was the total annual revenue?

Think through each quarter, showing your calculations clearly.

Example output:

Q1: $1,200,000 (given)
Q2: $1,200,000 × 1.15 = $1,380,000
Q3: $1,380,000 × 0.90 = $1,242,000
Q4: $1,242,000 × 1.25 = $1,552,500

Total: $5,374,500
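
Since each quarter depends on the previous one, the chain of calculations is easy to reproduce in code. This sketch uses integer percentages to avoid floating-point rounding. The function name `quarterly_revenue` is an assumption for illustration.

```python
def quarterly_revenue(q1, pct_changes):
    """Revenue per quarter, applying each percentage change to the prior quarter."""
    quarters = [q1]
    for pct in pct_changes:
        # Integer arithmetic: +15% means multiplying by 115 and dividing by 100.
        quarters.append(quarters[-1] * (100 + pct) // 100)
    return quarters


quarters = quarterly_revenue(1_200_000, [15, -10, 25])
print(quarters)       # [1200000, 1380000, 1242000, 1552500]
print(sum(quarters))  # 5374500
```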

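Logical Reasoning

Logic puzzles call for systematic elimination and hypothesis testing. CoT helps the model track constraints, test possibilities, and verify that every condition is satisfied. The key is to explore methodically rather than guess.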

Three friends—Alice, Bob, and Carol—each have different pets (cat, dog, fish) and different jobs (teacher, engineer, doctor).

Clues:
1. Alice doesn't have a cat
2. The engineer has a dog
3. Bob is a teacher
4. Carol doesn't have a fish

Who has which pet and job? Reason through this systematically, testing hypotheses and verifying against all clues.

Example output:

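From clue 3: Bob = teacher
→ Alice and Carol are engineer/doctor

If Carol = engineer:
- Carol has dog (clue 2)
- Alice = doctor, has fish (clue 1)
- Bob has cat
Verify: All 4 clues satisfied ✓

If Alice = engineer:
- Alice has dog (clue 2)
- Carol = doctor, has cat (clue 4)
- Bob has fish
Verify: All 4 clues satisfied ✓

Answer: The clues allow two solutions:
- Alice (doctor, fish), Bob (teacher, cat), Carol (engineer, dog)
- Alice (engineer, dog), Bob (teacher, fish), Carol (doctor, cat)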
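
A brute-force search is one way to verify reasoning on a small puzzle like this, because it tests every hypothesis rather than only the first one that works. The sketch below enumerates all pet and job assignments and keeps those that satisfy the four clues. For the clues as listed it finds two consistent assignments, so the puzzle needs one more clue to have a unique answer. The function name `solve` is illustrative.

```python
from itertools import permutations

PEOPLE = ("Alice", "Bob", "Carol")


def solve():
    """Return every assignment of pets and jobs that satisfies all four clues."""
    solutions = []
    for pets in permutations(("cat", "dog", "fish")):
        for jobs in permutations(("teacher", "engineer", "doctor")):
            pet = dict(zip(PEOPLE, pets))
            job = dict(zip(PEOPLE, jobs))
            engineer = next(p for p in PEOPLE if job[p] == "engineer")
            if (
                pet["Alice"] != "cat"          # clue 1
                and pet[engineer] == "dog"     # clue 2
                and job["Bob"] == "teacher"    # clue 3
                and pet["Carol"] != "fish"     # clue 4
            ):
                solutions.append({p: (job[p], pet[p]) for p in PEOPLE})
    return solutions


for solution in solve():
    print(solution)
```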

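Code Debugging

Debugging benefits from CoT because it forces the model to trace execution instead of guessing at the bug. Stepping through the code with concrete values makes the actual behavior visible and exposes where it diverges from the expected behavior.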

This function should return the sum of even numbers in a list, but it's returning wrong values. Debug step by step by tracing through with the input [2, 3, 4, 6]:

def sum_evens(numbers):
  total = 0
  for n in numbers:
      if n % 2 == 0:
          total += 1
  return total

Trace each step, find the bug, and provide the fix.

Example output:

Trace [2, 3, 4, 6]:
n=2: total += 1 → 1 (wait, adding 1 not n!)
n=3: skip (odd)
n=4: total = 2
n=6: total = 3

BUG: "total += 1" should be "total += n"
Expected: 12, Got: 3
Fix: Change to "total += n"
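
Applying the fix from the trace gives the corrected function below, followed by the same input used in the trace as a quick check.

```python
def sum_evens(numbers):
    """Return the sum of the even numbers in a list."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n  # was "total += 1", which counted evens instead of summing them
    return total


print(sum_evens([2, 3, 4, 6]))  # 12
```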

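Self-Consistency with CoT

For critical problems, don't rely on a single reasoning path. Generate several independent solutions and compare the results. If different approaches reach the same answer, confidence is high. Disagreement signals that the problem needs closer review.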

Solve this problem 3 different ways, then give your final answer based on which solution appears most correct:

Problem: A farmer has chickens and cows. Together they have 30 heads and 74 legs. How many chickens and how many cows does the farmer have?

Approach 1: Algebra
Approach 2: Guess and check
Approach 3: Logical reasoning

Then compare and give your final answer.

Example output:

Approach 1 (Algebra): c + w = 30, 2c + 4w = 74 → c=23, w=7
Approach 2 (Guess): Try 23/7 → 46 + 28 = 74 ✓
Approach 3 (Logic): 30 chickens = 60 legs, need 14 more → 7 cows

All agree: 23 chickens, 7 cows
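
If you collect the final answer from each approach (or from several separately sampled responses), the comparison step can be automated with a simple majority vote. This is a sketch under that assumption. The names `majority_answer` and `satisfies_constraints` are hypothetical, and extracting answers from model text is left out.

```python
from collections import Counter


def majority_answer(answers):
    """Return (most common answer, share of answers that agree with it)."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def satisfies_constraints(chickens, cows, heads=30, legs=74):
    """Check a candidate answer against the puzzle's two constraints."""
    return chickens + cows == heads and 2 * chickens + 4 * cows == legs


# Final answers from the three approaches, as (chickens, cows).
answers = [(23, 7), (23, 7), (23, 7)]
best, agreement = majority_answer(answers)
print(best, agreement)               # (23, 7) 1.0
print(satisfies_constraints(*best))  # True
```

A low agreement share is the signal to review the reasoning more closely before trusting any single answer.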

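CoT for Decision-Making

Decisions involve weighing trade-offs across several dimensions. CoT makes sure every relevant factor is considered systematically rather than jumping to a conclusion. The structured approach also records the reasoning for future reference.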

Should we adopt microservices architecture for our application?

Context:
- Monolithic application with 50,000 lines of code
- Team of 5 developers
- 100 daily active users
- Planning for 10x growth in 2 years

Think through this systematically:
1. List the current state
2. Identify factors to consider (team size, scale, velocity, future growth)
3. Weigh each factor as for/against
4. Give a recommendation with reasoning

Example output:

FACTORS:
- Team size (5): Too small for microservices ❌
- Scale (100 DAU): No scaling need ❌  
- Velocity: Monolith = faster iteration ❌
- Future growth: Uncertain timeline ⚠️

WEIGHING: 3 strong against, 1 weak for

RECOMMENDATION: Stay monolith, use clear module boundaries to ease future transition.

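When to Use CoT

Good Fits for CoT

Math problems: reduces calculation errors

Logic puzzles: prevents skipped steps

Complex analysis: organizes thinking

Code debugging: traces execution

Decision-making: weighs trade-offs

Poor Fits for CoT

Simple Q&A: unnecessary overhead

Creative writing: may constrain creativity

Fact lookup: no reasoning needed

Translation: a direct task

Summarization: usually straightforward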

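Limitations of CoT

CoT is powerful, but it is not a cure-all. Knowing its limitations helps you apply it appropriately.

  1. Higher token usage: more output means higher cost
  2. Not always necessary: simple tasks don't benefit from it
  3. Can be verbose: you may need to ask for concision
  4. Reasoning can be flawed: CoT does not guarantee correctness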

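Summary

Key Takeaways

CoT markedly improves complex reasoning by making implicit steps explicit. It works well for math, logic, analysis, and debugging. The trade-off: more tokens in exchange for better accuracy.

When should you not use Chain of Thought prompting?

In the next chapter, we'll explore Few-Shot Learning: teaching the model through examples.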