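6. Spring AI Core Concepts · Albert's Stack

With the infrastructure in place (WebFlux, R2DBC, the test framework), it is time to plug in an AI model and see how it works. We start from the overall architecture and then walk through the design and usage of each component.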

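1. Spring AI Core Components

Spring AI's components are designed around the conversation scenario, and the structure is straightforward.

1.1 Message (the basic unit of a conversation)

Each message consists of a role and content. The three Message implementations used most often in chat are:

Class	Role	Description	Constructor example
`SystemMessage`	SYSTEM	System instruction that sets the AI's behavior and persona	`new SystemMessage("You are a Java assistant")`
`UserMessage`	USER	The user's input	`new UserMessage("What is WebFlux?")`
`AssistantMessage`	ASSISTANT	The AI's reply (carries history in multi-turn conversations)	`new AssistantMessage(previousReply)`

Multi-turn conversation boils down to sending the message history (a System message followed by alternating User and Assistant messages) to the model with every request, so that it can see the context.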
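The history-resending idea can be sketched in plain Java without any Spring AI dependency. The `ConversationSketch` class and its nested `ChatMsg` record are hypothetical stand-ins invented for this illustration, not Spring AI types; the point is only that each turn appends to one list and the whole list is what gets sent.

```java
import java.util.ArrayList;
import java.util.List;

final class ConversationSketch {

    // Hypothetical, simplified stand-in for a chat message (not Spring AI's Message).
    record ChatMsg(String role, String text) {}

    private final List<ChatMsg> history = new ArrayList<>();

    ConversationSketch(String systemPrompt) {
        history.add(new ChatMsg("SYSTEM", systemPrompt));
    }

    // Appends the user turn and returns the full list that would be sent to the model.
    List<ChatMsg> ask(String userText) {
        history.add(new ChatMsg("USER", userText));
        return List.copyOf(history);
    }

    // Stores the model's reply so that the next request carries it as context.
    void recordReply(String assistantText) {
        history.add(new ChatMsg("ASSISTANT", assistantText));
    }

    public static void main(String[] args) {
        ConversationSketch chat = new ConversationSketch("You are a math assistant");
        chat.ask("My name is Albert");
        chat.recordReply("Nice to meet you, Albert.");
        List<ChatMsg> secondRequest = chat.ask("What is my name?");
        secondRequest.forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

The second request contains all four messages, which is why the model can answer a question about the first turn. The request grows with every turn, so a real application would eventually need to trim or summarize old history.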

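1.2 Prompt (the request wrapper)

A Prompt packs a list of Messages into one request object and can carry generation parameters:

Constructor	Description
`new Prompt("Hello")`	Shortcut that wraps the text in a single UserMessage
`new Prompt(List<Message>)`	Takes several messages (multi-turn conversation)
`new Prompt(List<Message>, ChatOptions)`	Takes messages plus generation parameters

Common ChatOptions parameters:

Parameter	Description	Typical values
`temperature`	Randomness: higher values give more varied answers, lower values more deterministic ones	`0` for precise answers, around `0.7` for general chat, `1.0` for creative output (the actual default depends on the provider and model)
`maxTokens`	Maximum number of tokens to generate	`2048`
`topP`	Nucleus sampling threshold: sample only from the smallest set of tokens whose cumulative probability reaches this value	`0.9`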
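To make the two sampling parameters less abstract, here is a minimal sketch of the textbook definitions: temperature divides the logits before softmax, and top-p keeps the smallest set of most likely tokens whose probabilities add up to the threshold. The class name and the example logits are made up for illustration, and a real inference backend may implement the details differently.

```java
import java.util.Arrays;

final class SamplingSketch {

    // Softmax over logits divided by the temperature. Temperature must be > 0 here;
    // backends usually treat 0 as "always pick the most likely token".
    static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElseThrow();
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max keeps Math.exp from overflowing.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Number of highest-probability tokens needed to reach cumulative mass topP.
    static int nucleusSize(double[] probs, double topP) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted); // ascending, so walk from the end
        double cumulative = 0.0;
        int count = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            cumulative += sorted[i];
            count++;
            if (cumulative >= topP) {
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.1}; // hypothetical scores for three candidate tokens
        System.out.println("T=0.2: " + Arrays.toString(softmax(logits, 0.2)));
        System.out.println("T=1.0: " + Arrays.toString(softmax(logits, 1.0)));
        System.out.println("tokens kept at topP=0.7: "
                + nucleusSize(new double[] {0.5, 0.3, 0.2}, 0.7));
    }
}
```

At the lower temperature almost all probability mass sits on the top token, which is why low values give more deterministic answers.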

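1.3 ChatModel (low-level model invocation)

ChatModel is Spring AI's low-level abstraction; every model provider integration (Ollama, OpenAI, Azure, and so on) implements it:

Method	Return type	Description
`call(Prompt)`	`ChatResponse`	Synchronous call; blocks until the full reply is available
`stream(Prompt)`	`Flux<ChatResponse>`	Streaming call; emits the reply in chunks as it is generated

A ChatResponse exposes:

Method	Description
`getResult()`	Returns the generation result (a `Generation` object)
`getResult().getOutput()`	Returns the AI's reply as an `AssistantMessage`
`getResult().getOutput().getText()`	Returns the plain text of the reply
`getMetadata()`	Returns metadata (model name, token usage, and so on)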

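1.4 ChatClient (high-level API)

ChatClient wraps ChatModel behind a fluent API and is the recommended entry point for business code.

Common ChatClient.Builder settings:

Method	Description
`defaultSystem(String)`	Sets the default system prompt, sent with every request
`defaultOptions(ChatOptions)`	Sets the default generation parameters (temperature and so on)
`build()`	Builds the ChatClient instance

Building a request (available after .prompt()):

Method	Description
`.system(String)`	Sets the system prompt for this request (overrides the default)
`.user(String)`	Sets the user message
`.messages(List<Message>)`	Passes a full message list (multi-turn conversation)
`.options(ChatOptions)`	Sets the generation parameters for this request
`.call()`	Synchronous call, returns a `CallResponseSpec`
`.stream()`	Streaming call, returns a `StreamResponseSpec`

Extracting the response:

Method	Return type	Description
`.call().content()`	`String`	Returns the reply text directly
`.call().chatResponse()`	`ChatResponse`	Returns the full response (including metadata)
`.stream().content()`	`Flux<String>`	Streams text fragments
`.stream().chatResponse()`	`Flux<ChatResponse>`	Streams full response chunks

Usage example: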

java

String response = chatClient.prompt()
        .system("You are a senior Java developer")
        .user("What is Spring Boot?")
        .call()
        .content();

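2. Auto-configuration

Once you add spring-ai-starter-model-ollama to pom.xml and configure the Ollama address and model in application.yaml, Spring Boot automatically creates the following beans:

OllamaChatModel: the Ollama client that implements the ChatModel interface
ChatClient.Builder: used to build ChatClient instances

We still need a configuration class that uses the Builder to build a ChatClient and registers it as a bean: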

src/main/java/com/albertstack/aichat/config/AiConfig.java

java

package com.albertstack.aichat.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}

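3. Calling the Model from Code

Before running the tests, make sure the Ollama service is running (ollama serve) and that the model named in the configuration file has been pulled.

3.1 Basic call tests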

src/test/java/com/albertstack/aichat/ai/ChatClientTest.java

java

package com.albertstack.aichat.ai;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import reactor.test.StepVerifier;

import java.util.List;

import lombok.extern.slf4j.Slf4j;

import static org.junit.jupiter.api.Assertions.*;

@Slf4j
@SpringBootTest
class ChatClientTest {

    @Autowired
    private ChatClient chatClient;

    @Autowired
    private ChatModel chatModel;

    /**
     * Logs the full ChatResponse (reply text, thinking process, metadata).
     */
    private void logChatResponse(String label, ChatResponse response) {
        Generation result = response.getResult();

        // Reply text
        log.info("[{}] Reply: {}", label, result.getOutput().getText());

        // Thinking process (supported by some models such as qwen3 and deepseek-r1; stored in the Generation metadata)
        Object thinking = result.getMetadata().get("thinking");
        if (thinking != null && !thinking.toString().isBlank()) {
            log.info("[{}] Thinking: {}", label, thinking);
        }

        // Model metadata
        log.info("[{}] Model: {}", label, response.getMetadata().getModel());
    }

    // ========== Synchronous calls ==========

    @Test
    void call_shouldReturnNonEmptyResponse() {
        String userMsg = "Explain what Java is in one sentence";
        log.info("---------- Synchronous call (basic) ----------");
        log.info("[Question] {}", userMsg);

        ChatResponse response = chatClient.prompt()
                .user(userMsg)
                .call()
                .chatResponse();

        assertNotNull(response.getResult());
        logChatResponse("sync call", response);
        log.info(" ✅ Synchronous call succeeded");
    }

    @Test
    void call_withSystemMessage_shouldFollowInstruction() {
        String systemMsg = "You are a concise assistant; answer in at most 20 words";
        String userMsg = "What is Spring Boot?";
        log.info("---------- Synchronous call (system prompt) ----------");
        log.info("[System] {}", systemMsg);
        log.info("[Question] {}", userMsg);

        ChatResponse response = chatClient.prompt()
                .system(systemMsg)
                .user(userMsg)
                .call()
                .chatResponse();

        assertNotNull(response.getResult());
        logChatResponse("concise mode", response);
        log.info(" ✅ System prompt applied");
    }

    // ========== Streaming call ==========

    @Test
    void stream_shouldEmitMultipleChunks() {
        String userMsg = "Introduce Spring AI in three sentences";
        log.info("---------- Streaming call ----------");
        log.info("[Question] {}", userMsg);

        var flux = chatClient.prompt()
                .user(userMsg)
                .stream()
                .chatResponse();

        StringBuilder content = new StringBuilder();
        StringBuilder thinking = new StringBuilder();
        // Single-element arrays so the lambda below can mutate the counters
        int[] chunkCount = {0};
        boolean[] thinkingPhase = {true};

        StepVerifier.create(flux)
                .thenConsumeWhile(chunk -> {
                    Generation result = chunk.getResult();
                    if (result == null || result.getOutput() == null) return true;

                    chunkCount[0]++;

                    // Thinking process: print each chunk as it arrives
                    Object reason = result.getMetadata().get("thinking");
                    if (reason != null && !reason.toString().isEmpty()) {
                        if (thinkingPhase[0] && thinking.isEmpty()) {
                            System.out.print("[Thinking] ");
                        }
                        System.out.print(reason);
                        System.out.flush();
                        thinking.append(reason);
                    }

                    // Reply text: print each chunk as it arrives
                    String text = result.getOutput().getText();
                    if (text != null && !text.isEmpty()) {
                        if (thinkingPhase[0]) {
                            thinkingPhase[0] = false;
                            System.out.println();
                            System.out.print("[Reply] ");
                        }
                        System.out.print(text);
                        System.out.flush();
                        content.append(text);
                    }

                    return true;
                })
                .verifyComplete();

        System.out.println();
        log.info("[Stream] Received {} chunks (thinking: {} chars, reply: {} chars)",
                chunkCount[0], thinking.length(), content.length());
        log.info(" ✅ Streaming call completed");
    }

    // ========== Low-level ChatModel call ==========

    @Test
    void chatModel_call_shouldReturnChatResponse() {
        String userMsg = "What is 1 + 1? Answer with the number only";
        log.info("---------- Low-level ChatModel call ----------");
        log.info("[Question] {}", userMsg);

        Prompt prompt = new Prompt(List.of(new UserMessage(userMsg)));
        ChatResponse response = chatModel.call(prompt);

        assertNotNull(response);
        assertNotNull(response.getResult());
        logChatResponse("ChatModel", response);
        log.info(" ✅ Low-level ChatModel call succeeded");
    }

    // ========== Multi-turn conversation ==========

    @Test
    void multiTurn_shouldMaintainContext() {
        log.info("---------- Multi-turn conversation ----------");

        // First turn
        String systemMsg = "You are a math assistant";
        String firstUserMsg = "My name is Albert, please remember it";
        log.info("[System] {}", systemMsg);
        log.info("[First question] {}", firstUserMsg);

        List<Message> messages = List.of(
                new SystemMessage(systemMsg),
                new UserMessage(firstUserMsg)
        );

        ChatResponse firstResponse = chatModel.call(new Prompt(messages));
        String firstReply = firstResponse.getResult().getOutput().getText();
        logChatResponse("turn 1", firstResponse);

        // Second turn: resend the history together with the new question
        String secondUserMsg = "What is my name?";
        log.info("[Second question] {}", secondUserMsg);

        List<Message> secondMessages = List.of(
                new SystemMessage(systemMsg),
                new UserMessage(firstUserMsg),
                new AssistantMessage(firstReply),
                new UserMessage(secondUserMsg)
        );

        ChatResponse secondResponse = chatModel.call(new Prompt(secondMessages));
        String secondReply = secondResponse.getResult().getOutput().getText();
        logChatResponse("turn 2", secondResponse);

        assertTrue(secondReply.contains("Albert"), "The AI should remember the user's name, actual reply: " + secondReply);
        log.info(" ✅ Multi-turn context preserved");
    }
}

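The first run may be slow because the model has to be loaded; later calls are much faster.

3.2 What the tests cover

Test	What it verifies	Key API
`call_shouldReturnNonEmptyResponse`	Synchronous call with full response	`.call().chatResponse()` returns a `ChatResponse`
`call_withSystemMessage_shouldFollowInstruction`	System prompt	`.system()` sets the role
`stream_shouldEmitMultipleChunks`	Streaming call with thinking process	`.stream().chatResponse()` returns `Flux<ChatResponse>`
`chatModel_call_shouldReturnChatResponse`	Low-level API call	`chatModel.call(Prompt)` -> `ChatResponse`
`multiTurn_shouldMaintainContext`	Multi-turn context	Manually assembled `Message` list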

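4. Call Chain Walkthrough

4.1 Synchronous call

java

chatClient.prompt().user("hello").call().content()

Simple as it looks, this line goes through five stages internally. Each stage is broken down below.

1. ChatClient - build the Prompt

.prompt() creates a request builder (a ChatClientRequestSpec), and .user("hello") records the user text, which ends up in the message list as a UserMessage.

The defaults you set on ChatClient.Builder, such as defaultSystem and defaultAdvisors, are merged in at this point as well.

2. Advisor chain - intercept and enrich

Calling .call() first triggers the Advisor chain.

Spring AI's Advisor mechanism resembles a Servlet Filter chain. Each advisor runs in order and may modify the request before handing it to the next one (for example, injecting RAG context or chat memory); once the model has answered, the advisors process the response in reverse order. In Spring AI 1.x the synchronous path goes through CallAdvisor.adviseCall(...) and the streaming path through StreamAdvisor.adviseStream(...); the older RequestResponseAdvisor with adviseRequest() / adviseResponse() belongs to the pre-1.0 milestone releases.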
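The "in order on the way in, reverse order on the way out" behavior is easier to see in a small plain-Java model of a filter chain. The `Advisor` and `Chain` types below are hypothetical and exist only for this sketch; they are not Spring AI's advisor interfaces, and the "model" is just a function that echoes the prompt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class AdvisorChainSketch {

    // Hypothetical interface for illustration only; not Spring AI's advisor API.
    interface Advisor {
        String advise(String request, Chain chain);
    }

    // Passes the request to the next advisor, or to the model when none remain.
    static final class Chain {
        private final List<Advisor> advisors;
        private final int index;
        private final UnaryOperator<String> model;

        Chain(List<Advisor> advisors, int index, UnaryOperator<String> model) {
            this.advisors = advisors;
            this.index = index;
            this.model = model;
        }

        String next(String request) {
            if (index == advisors.size()) {
                return model.apply(request);
            }
            return advisors.get(index).advise(request, new Chain(advisors, index + 1, model));
        }
    }

    // An advisor that tags the request on the way in and the response on the way out.
    static Advisor tagging(String name, List<String> trace) {
        return (request, chain) -> {
            trace.add(name + ":request");
            String response = chain.next(request + "[" + name + "]");
            trace.add(name + ":response");
            return response + "[" + name + "]";
        };
    }

    // Runs "hello" through two advisors and a fake model, recording the call order.
    static String run(List<String> trace) {
        List<Advisor> advisors = List.of(tagging("memory", trace), tagging("rag", trace));
        UnaryOperator<String> model = prompt -> "reply-to(" + prompt + ")";
        return new Chain(advisors, 0, model).next("hello");
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(run(trace));
        System.out.println(trace);
    }
}
```

Because each advisor wraps the rest of the chain, the first advisor to see the request is the last one to see the response, which is the same nesting a Servlet Filter has.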

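3. OllamaChatModel - .call(Prompt)

Once the Advisor chain has handled the request, OllamaChatModel.call(Prompt) takes over.

It converts Spring AI's generic Prompt into the request format the Ollama API expects: the model name (e.g. qwen3.5:9b), the message array, inference parameters such as temperature and top-p, and the stream: false flag.

4. RestClient - POST /api/chat

The converted request is sent via RestClient as a POST to http://localhost:11434/api/chat .

Ollama loads the model weights if they are not already in memory, runs inference, and returns the complete JSON response in one go once all tokens have been generated.

5. Response path - ChatResponse -> String

OllamaChatModel parses the JSON returned by Ollama into a Spring AI ChatResponse, which contains:

Generation (the model output)

ChatResponseMetadata (token usage, model information, and so on)

The response then travels back through the Advisor chain in reverse order, and finally .content() extracts Generation.getOutput().getText() and returns the plain text.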

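4.2 Streaming call

java

chatClient.prompt().user("hello").stream().chatResponse()

The streaming path follows the same overall chain as the synchronous one but differs from step 3 onward:

Stages 1 and 2 (building the Prompt and running the Advisor chain) are the same as in the synchronous call.

3. OllamaChatModel - .stream(Prompt)

The only difference from the synchronous call is that the stream field of the request is set to true; the other parameters (model name, message array, inference parameters) are identical.

4. WebClient - POST /api/chat

The transport switches to WebClient (instead of the RestClient used for synchronous calls).

Ollama responds with Transfer-Encoding: chunked and pushes one JSON object per line as tokens are generated, rather than waiting for the whole reply.

5. Response path - Flux<ChatResponse> -> Flux<String>

Each JSON fragment is parsed into its own ChatResponse, so the overall return type is Flux<ChatResponse>.

Calling .content() instead maps this further to Flux<String>, where each element is a small piece of text. The front end can render these pieces progressively over SSE to get a "typewriter" effect.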
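As a rough illustration of that last mapping step, the sketch below turns a few streamed JSON lines into text fragments and then into the successive frames a typewriter-style UI would show. The chunk shape is modelled on Ollama's /api/chat streaming output; the regex extraction is a deliberate simplification (a real client would use a JSON parser), and the class is hypothetical rather than part of Spring AI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StreamChunkSketch {

    // Naive extraction of the "content" string value; it does not unescape JSON escapes.
    private static final Pattern CONTENT =
            Pattern.compile("\"content\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Maps each JSON line to its text fragment (empty string if no content is found).
    static List<String> fragments(List<String> jsonLines) {
        List<String> out = new ArrayList<>();
        for (String line : jsonLines) {
            Matcher m = CONTENT.matcher(line);
            out.add(m.find() ? m.group(1) : "");
        }
        return out;
    }

    // Successive "typewriter" frames: each frame is all the text received so far.
    static List<String> frames(List<String> fragments) {
        List<String> out = new ArrayList<>();
        StringBuilder soFar = new StringBuilder();
        for (String fragment : fragments) {
            soFar.append(fragment);
            out.add(soFar.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed chunk shape, one JSON object per line.
        List<String> lines = List.of(
                "{\"message\":{\"role\":\"assistant\",\"content\":\"Hel\"},\"done\":false}",
                "{\"message\":{\"role\":\"assistant\",\"content\":\"lo\"},\"done\":false}",
                "{\"message\":{\"role\":\"assistant\",\"content\":\"\"},\"done\":true}");
        frames(fragments(lines)).forEach(System.out::println);
    }
}
```

In the real pipeline the first method corresponds roughly to what the framework does when it maps chunks to text, and the second to what the browser does as SSE events arrive.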

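5. ChatClient vs ChatModel

Feature	ChatClient	ChatModel
API style	Fluent, chained calls	Plain method calls
Abstraction level	High (business-facing)	Low (protocol-facing)
Default configuration	Default system prompt, options, and advisors	No default system prompt; only the default options configured for the model
Return type	`String` or `Flux<String>` via `.content()`, or the full `ChatResponse` via `.chatResponse()`	`ChatResponse` (extract the text yourself)
Recommended for	Business code	Cases that need fine-grained control

Prefer ChatClient in business code; it can also return metadata such as token usage through .call().chatResponse(). Drop down to ChatModel when you want to build the Prompt yourself and bypass ChatClient's defaults and advisors.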

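6. Practical Tips

6.1 Controlling thinking mode

Some models (such as qwen3.5 and deepseek-r1) enable "thinking mode" by default and emit a reasoning trace before the answer. That helps when a task needs logical derivation, but it slows down simple Q&A.

Thinking mode is essentially a capability of the model rather than a plain parameter switch: Ollama's API does accept a think field, but only for models that support it. The most reliable approach is to prepare two models, one with thinking and one without, and switch between them in code depending on the scenario:

Model	Description	Source
`qwen3.5:9b`	Thinking enabled by default, suited to complex reasoning	Official Ollama library
`frob/qwen3.5-instruct:9b`	Same architecture with thinking turned off, faster responses	Community model

frob/qwen3.5-instruct is a community-built instruct (instruction-following) variant of qwen3.5 that skips the internal reasoning step and answers directly. It comes in the same sizes as qwen3.5 (4b / 9b / 27b / 35b / 122b). Pull it before use: ollama pull frob/qwen3.5-instruct:9b

6.1.1 Configuring two models

Configure both model names in application.yaml: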

yaml

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: qwen3.5:9b # default model (with thinking)

# custom settings
ai:
  model:
    no-think: frob/qwen3.5-instruct:9b
  prompt:
    default: classpath:prompts/default.txt
    no-think: classpath:prompts/no-think.txt

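6.1.2 System prompt files

Moving the system prompts into separate files makes them easier to maintain and tune: you can change the AI's behavior by editing a text file instead of Java code.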

src/main/resources/prompts/default.txt

text

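You are a rigorous AI assistant. Strictly follow the rules below:

[Factual requirements]
- Answer only from known information or reasonable inference
- Do not fabricate, invent, or guess facts
- If information is insufficient, say "cannot determine" or "more information needed"

[Expression requirements]
- Answer in concise, clear, professional Chinese
- Avoid redundancy, filler, and repetition
- Give the conclusion first, then add any necessary explanation

[Style requirements]
- Stay objective, neutral, and technically oriented
- Do not use emotional or ingratiating language

Answer the user's question according to the rules above.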

src/main/resources/prompts/no-think.txt

text

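You are an efficient AI assistant. Strictly follow the rules below:

[Core requirements]
- Give the conclusion directly, without showing the reasoning
- Do not fabricate facts; say so when you are unsure

[Expression requirements]
- Use concise, clear Chinese
- Keep the output short and avoid redundant explanation

[Behavior constraints]
- Do not output a thinking process
- Do not add unnecessary extended analysis

Answer the user's question directly.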

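6.1.3 Updating AiConfig

Use OllamaChatOptions as the default options of a second ChatClient to override the model name, so that switching models is a matter of choosing which client to call: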

src/main/java/com/albertstack/aichat/config/AiConfig.java

java

package com.albertstack.aichat.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.api.OllamaChatOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

@Configuration
public class AiConfig {

    @Value("${ai.model.no-think}")
    private String noThinkModel;

    @Value("${ai.prompt.default}")
    private Resource defaultPrompt;

    @Value("${ai.prompt.no-think}")
    private Resource noThinkPrompt;

    /**
     * Default ChatClient (uses the default model from the yaml config, with thinking).
     */
    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) throws IOException {
        return builder
                .defaultSystem(defaultPrompt.getContentAsString(StandardCharsets.UTF_8))
                .build();
    }

    /**
     * No-think ChatClient (uses the instruct model and skips the reasoning step).
     */
    @Bean
    public ChatClient noThinkChatClient(ChatClient.Builder builder) throws IOException {
        return builder
                .defaultSystem(noThinkPrompt.getContentAsString(StandardCharsets.UTF_8))
                .defaultOptions(OllamaChatOptions.builder()
                        .model(noThinkModel)
                        .build())
                .build();
    }
}

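Inject the appropriate ChatClient for each scenario. With two ChatClient beans of the same type, Spring falls back to matching the field name against the bean name, so the field names below must match the bean method names (or use @Qualifier):

java

@Autowired
private ChatClient chatClient; // with thinking, suited to complex questions

@Autowired
private ChatClient noThinkChatClient; // no thinking, suited to simple Q&A

// Complex question -> use the thinking model
String deepAnswer = chatClient.prompt()
        .user("Prove that the square root of 2 is irrational")
        .call()
        .content();

// Simple Q&A -> use the no-think model for a faster response
String quickAnswer = noThinkChatClient.prompt()
        .user("What is 1 + 1?")
        .call()
        .content();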

Reading the thinking content (only returned by models with thinking enabled):

java

ChatResponse response = chatClient.prompt()
        .user("Prove that the square root of 2 is irrational")
        .call()
        .chatResponse();

String reply = response.getResult().getOutput().getText();

// Thinking process (only thinking models provide it; null for the instruct model)
Object thinking = response.getResult().getMetadata().get("thinking");
if (thinking != null && !thinking.toString().isBlank()) {
    log.info("Thinking: {}", thinking);
}

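6.2 System Prompt Design Principles

The quality of the system prompt largely determines the quality of the AI's replies. Looking back at prompts/default.txt, it applies the following principles:

Principle	Description	In our prompt
State the role	Tell the AI who it is	"You are a rigorous AI assistant"
Limit the scope	What to do and what not to do	"Do not fabricate, invent, or guess facts"
Constrain the format	Specify the language and manner of expression	"Answer in concise, clear, professional Chinese"
Control the style	Define the tone of the reply	"Stay objective, neutral, and technically oriented"
Organize in modules	Group rules by category for a clear structure	[Factual requirements] [Expression requirements] [Style requirements]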

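6.3 PromptTemplate (templated prompts)

Besides the system prompt, the user prompt sometimes needs variables filled in at runtime. Plain string concatenation is error-prone, so Spring AI provides PromptTemplate, which supports variable placeholders:

java

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;

import java.util.Map;

// Define the template (use {variableName} as a placeholder)
PromptTemplate template = new PromptTemplate(
        "Translate the following into {language}:\n\n{content}"
);

// Fill in the variables and create the Prompt
Prompt prompt = template.create(Map.of(
        "language", "English",
        "content", "Spring AI 是一个很棒的框架"
));

// Call the model
String result = chatClient.prompt(prompt).call().content();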

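Templates can also live in resource files, which suits long templates:

src/main/resources/prompts/translate.st

text

Translate the following into {language}:

{content}

Requirements:
- Leave technical terms from the original untranslated
- The translation should read naturally

java

import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.core.io.ClassPathResource;

PromptTemplate template = new PromptTemplate(
        new ClassPathResource("prompts/translate.st")
);

PromptTemplate is meant for dynamic user prompts (translation, code generation, and other cases that need variables filled in), while the system prompt (the role definition) is better managed centrally through defaultSystem plus an external file. Each has its own job.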
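For intuition about what filling a template involves, here is a minimal plain-Java sketch of `{name}` placeholder substitution. It is not how PromptTemplate is implemented (Spring AI delegates to a template engine); the `PlaceholderSketch` class and its fail-fast behavior on missing variables are assumptions made for this illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {

    // Matches {name} where name consists of letters, digits, or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces each {name} with its value and fails fast if a variable is missing.
    static String render(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Translate the following into {language}:\n\n{content}";
        System.out.println(render(template, Map.of(
                "language", "English",
                "content", "Spring AI 是一个很棒的框架")));
    }
}
```

Compared with string concatenation, a missing variable becomes an immediate error instead of a silently malformed prompt, and the template text stays readable in one piece.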

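6.4 Tests for the practical tips

The three tips above (switching thinking mode, system prompt design, and PromptTemplate) are hard to appreciate from snippets alone. The standalone test class below verifies them: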

src/test/java/com/albertstack/aichat/ai/PromptTipsTest.java

java

package com.albertstack.aichat.ai;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.Map;

import lombok.extern.slf4j.Slf4j;

import static org.junit.jupiter.api.Assertions.*;

@Slf4j
@SpringBootTest
class PromptTipsTest {

    @Autowired
    private ChatClient chatClient; // default model (with thinking)

    @Autowired
    private ChatClient noThinkChatClient; // instruct model (no thinking)

    // ========== Switching thinking mode ==========

    @Test
    void thinkModel_shouldIncludeThinkingProcess() {
        String userMsg = "What is the basic idea behind proving 1 + 1 = 2? Summarize in two sentences";
        log.info("---------- Thinking model ----------");
        log.info("[Question] {}", userMsg);

        ChatResponse response = chatClient.prompt()
                .user(userMsg)
                .call()
                .chatResponse();

        Generation result = response.getResult();
        assertNotNull(result);

        String reply = result.getOutput().getText();
        log.info("[Reply] {}", reply);

        // A thinking model should include its reasoning
        Object thinking = result.getMetadata().get("thinking");
        boolean hasThinking = thinking != null && !thinking.toString().isBlank();
        log.info("[Thinking] {}", hasThinking ? "yes (" + thinking.toString().length() + " chars)" : "(none)");

        assertTrue(hasThinking, "A thinking model should return its thinking process");
        log.info(" ✅ Thinking model verified");
    }

    @Test
    void noThinkModel_shouldSkipThinkingProcess() {
        String userMsg = "What is 1 + 1? Answer with the number only";
        log.info("---------- No-think model ----------");
        log.info("[Question] {}", userMsg);

        ChatResponse response = noThinkChatClient.prompt()
                .user(userMsg)
                .call()
                .chatResponse();

        Generation result = response.getResult();
        assertNotNull(result);

        String reply = result.getOutput().getText();
        log.info("[Reply] {}", reply);

        // The instruct model should not return a thinking process
        Object thinking = result.getMetadata().get("thinking");
        boolean hasThinking = thinking != null && !thinking.toString().isBlank();
        log.info("[Thinking] {}", hasThinking ? thinking : "(none)");

        assertFalse(hasThinking, "A no-think model should not return a thinking process");
        log.info(" ✅ No-think model verified");
    }

    // ========== System prompt design principles ==========

    @Test
    void systemPrompt_shouldConstrainResponseFormat() {
        String systemPrompt = """
                You are a JSON formatting assistant. Follow these rules:
                1. Output valid JSON only, nothing else
                2. Do not wrap the output in a Markdown code block
                3. If the input cannot be converted to JSON, return {"error": "cannot process"}
                """;
        String userMsg = "Convert the following to JSON: name Albert, age 18, occupation programmer";
        log.info("---------- System prompt: format constraint ----------");
        log.info("[System] {}", systemPrompt);
        log.info("[Question] {}", userMsg);

        String reply = chatClient.prompt()
                .system(systemPrompt)
                .user(userMsg)
                .call()
                .content();

        log.info("[Reply] {}", reply);

        // Loose check: the reply looks like JSON and contains the expected value (it is not parsed)
        assertTrue(reply.contains("Albert"), "The reply should contain the name");
        assertTrue(reply.contains("{") && reply.contains("}"), "The reply should be in JSON format");
        log.info(" ✅ System prompt constrained the output format");
    }

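    @Test
    void systemPrompt_shouldLimitScope() {
        String systemPrompt = """
                You are a Java programming assistant.
                Rule: answer only Java programming questions. If the user asks about anything else, reply "Sorry, I can only answer programming questions".
                """;
        String userMsg = "What is the weather like today?";
        log.info("---------- System prompt: scope limit ----------");
        log.info("[System] {}", systemPrompt);
        log.info("[Question] {}", userMsg);

        String reply = chatClient.prompt()
                .system(systemPrompt)
                .user(userMsg)
                .call()
                .content();

        log.info("[Reply] {}", reply);

        // The AI should decline the non-programming question
        String lower = reply.toLowerCase();
        assertTrue(lower.contains("programming") || lower.contains("sorry") || lower.contains("cannot"),
                "The AI should decline non-programming questions, actual reply: " + reply);
        log.info(" ✅ System prompt limited the answer scope");
    }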

    // ========== PromptTemplate: templated prompts ==========

    @Test
    void promptTemplate_shouldFillVariables() {
        log.info("---------- PromptTemplate variable filling ----------");

        PromptTemplate template = new PromptTemplate(
                "Translate the following into {language} and output only the translation:\n\n{content}"
        );

        String targetLang = "English";
        String content = "Spring AI 让 Java 开发者能快速集成大语言模型";
        log.info("[Template variables] language={}, content={}", targetLang, content);

        Prompt prompt = template.create(Map.of(
                "language", targetLang,
                "content", content
        ));

        String reply = chatClient.prompt(prompt).call().content();
        log.info("[Translation] {}", reply);

        // The translation should contain English words; (?s) lets '.' match line breaks in multi-line replies
        assertTrue(reply.matches("(?s).*[a-zA-Z]{3,}.*"),
                "The translation should contain English, actual reply: " + reply);
        log.info(" ✅ PromptTemplate variables filled and call succeeded");
    }

    @Test
    void promptTemplate_multiVariable_shouldGenerateStructuredOutput() {
        log.info("---------- PromptTemplate with multiple variables ----------");

        PromptTemplate template = new PromptTemplate(
                "Write a {language} function that {function}. Requirements:\n- Function name: {functionName}\n- Output code only, no explanation"
        );

        Prompt prompt = template.create(Map.of(
                "language", "Java",
                "function", "checks whether a number is prime",
                "functionName", "isPrime"
        ));

        log.info("[Generated prompt] {}", prompt.getContents());

        String reply = chatClient.prompt(prompt).call().content();
        log.info("[Generated code]\n{}", reply);

        // Verify that the generated code contains the requested function name
        assertTrue(reply.contains("isPrime"),
                "The generated code should contain the function name isPrime, actual reply: " + reply);
        log.info(" ✅ Multi-variable template produced structured code");
    }
}

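6.5 What the tests cover

Test	Tip verified	Key point
`thinkModel_shouldIncludeThinkingProcess`	Thinking mode (on)	The default model returns `thinking` metadata
`noThinkModel_shouldSkipThinkingProcess`	Thinking mode (off)	The instruct model returns no `thinking` metadata
`systemPrompt_shouldConstrainResponseFormat`	System prompt format constraint	The system prompt makes the AI output JSON only
`systemPrompt_shouldLimitScope`	System prompt scope limit	The system prompt makes the AI decline non-programming questions
`promptTemplate_shouldFillVariables`	PromptTemplate variable filling	The `{language}` and `{content}` placeholders are replaced before the translation is generated
`promptTemplate_multiVariable_shouldGenerateStructuredOutput`	PromptTemplate with multiple variables	Several variables combine to produce structured code output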

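7. Summary

Spring AI wraps model invocation in four layers of abstraction (Message, Prompt, ChatModel, ChatClient) and exposes it as a regular Java API. Business code only needs to target the fluent API, while the framework handles the underlying protocol.

Topic	Description
Message	The basic unit of a conversation, with System / User / Assistant roles
Prompt	Wraps a message list into a request and accepts `ChatOptions` for generation parameters
ChatModel	Low-level model interface: `call()` is synchronous, `stream()` is streaming
ChatClient	High-level fluent API, recommended for business code
Multi-turn conversation	Boils down to sending the message history along with each request
Thinking mode	Implemented by switching between two models (`qwen3.5:9b` with thinking / instruct without)
Prompt optimization	System prompt design principles plus `PromptTemplate` templating

Next up is topic and message management, where we model the data layer and build full-stack CRUD for topics.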