Unified Wrapper Interface for LLMs: Solution Design
Goal: wrap the chat completion interface in a general-purpose Java framework so that different LLMs can be called through one unified API.
Scope: only the chat (text) interface is wrapped for now; other interfaces will be planned later.
OpenAI chat completion API analysis
==> Official documentation
curl http://chat.xxxxxx.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
]
}'
Response
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "gpt-3.5-turbo-0125",
"system_fingerprint": "fp_44709d6fcb",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?",
},
"logprobs": null,
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 12,
"total_tokens": 21
}
}
from openai import OpenAI
# client = OpenAI()
client = OpenAI(
base_url='http://chat.xxxxxx.com/'
)
completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
stream=True
)
for chunk in completion:
    print(chunk.choices[0].delta)
Response (streaming chunks)
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1694268190,
"model": "gpt-3.5-turbo-0125",
"system_fingerprint": "fp_44709d6fcb",
"choices": [
{
"index": 0,
"delta": {
"role": "assistant",
"content": ""
},
"logprobs": null,
"finish_reason": null
}
]
}
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1694268190,
"model": "gpt-3.5-turbo-0125",
"system_fingerprint": "fp_44709d6fcb",
"choices": [
{
"index": 0,
"delta": {
"content": "Hello"
},
"logprobs": null,
"finish_reason": null
}
]
}
....
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1694268190,
"model": "gpt-3.5-turbo-0125",
"system_fingerprint": "fp_44709d6fcb",
"choices": [
{
"index": 0,
"delta": {},
"logprobs": null,
"finish_reason": "stop"
}
]
}
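For later use in the Java wrapper, here is a minimal sketch of how the streamed chunks above can be folded back into a complete assistant message. It assumes each data: payload has already been read off the SSE stream (and that the terminating data: [DONE] line is filtered out beforehand); the field names come from the chunks above, and fastjson2 is used as in the Java example later in this document.

import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;

public class DeltaAccumulator {
    private final StringBuilder answer = new StringBuilder();

    // chunkJson is one SSE "data:" payload (without the "data: " prefix, and not "[DONE]")
    public void onChunk(String chunkJson) {
        JSONObject chunk = JSON.parseObject(chunkJson);
        JSONObject choice = chunk.getJSONArray("choices").getJSONObject(0);
        JSONObject delta = choice.getJSONObject("delta");
        String piece = delta == null ? null : delta.getString("content");
        if (piece != null) {
            answer.append(piece); // the "", "Hello", ... fragments from the chunks above
        }
        if ("stop".equals(choice.getString("finish_reason"))) {
            System.out.println("final answer: " + answer); // complete assistant reply
        }
    }
}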
Request parameter analysis
- model  optional; defaults to gpt-3.5-turbo
- messages  required
[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
- stream  optional; defaults to false
If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message
- temperature  optional; defaults to 1
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
- top_p  optional; defaults to 1
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
Other request parameters that may be used (a request-building sketch covering these follows this list):
- max_tokens  optional; no default
The maximum number of tokens that can be generated in the chat completion.
- n  integer or null; optional; defaults to 1
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
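To make the parameter list above concrete, a small sketch of how the request body could be assembled in Java (using fastjson2 as in the example later in this document; the class and method names here are illustrative, and per the note above normally only one of temperature / top_p would be set):

import com.alibaba.fastjson2.JSONArray;
import com.alibaba.fastjson2.JSONObject;

public class OpenAiRequestBuilder {
    // Builds the JSON body for POST /v1/chat/completions as shown above.
    public static String build(String model, JSONArray messages,
                               boolean stream, Double temperature, Double topP) {
        JSONObject body = new JSONObject();
        body.put("model", model);       // e.g. "gpt-3.5-turbo"
        body.put("messages", messages); // required; list of {role, content}
        body.put("stream", stream);     // default false
        if (temperature != null) {
            body.put("temperature", temperature); // [0, 2], default 1
        }
        if (topP != null) {
            body.put("top_p", topP);              // [0, 1], default 1
        }
        return body.toJSONString();
    }
}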
Output comparison (default vs. streaming results)
The streaming result has no usage field, so token counts have to be computed separately; the API itself does not return them (see the token-counting sketch after these notes).
Official note on counting tokens:
Another small drawback of streaming responses is that the response no longer includes the usage field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using tiktoken.
How to use the Python tiktoken library
In the streaming result, the message field inside choices is replaced by a delta field.
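Since the streamed result carries no usage field, the token count has to be computed on the client. tiktoken is a Python library; on the JVM one option (an assumption of this note, not something the official docs prescribe) is the JTokkit library, which ships the same BPE encodings. A rough sketch:

import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.ModelType;

public class TokenCounter {
    public static void main(String[] args) {
        EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
        Encoding encoding = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO);
        // Count the tokens of the concatenated streamed content. This counts the raw text only;
        // exact prompt-side accounting adds a few tokens of per-message overhead.
        String combined = "Hello there, how may I assist you today?";
        System.out.println(encoding.countTokens(combined));
    }
}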
灵医Bot Chat API analysis
Endpoint path: /api/01bot/sse-gateway/stream
For server-side invocation, 灵医Bot Chat refers callers to "Server Sent Events server and client for Golang".
Header parameters
Name | Example | Type | Required | Description |
---|---|---|---|---|
Content-Type | application/json | string | Yes | Content type of the request |
X-IHU-Authorization-V2 | see the authentication document (《鉴权认证文档》) | string | Yes | Signature string |
Non-streaming request example (single turn): parameters
{
"model": "test-model",
"stream": false,
"messages": [
{
"version":"api-v2",
"created": 1683944235,
"role": "user",
"content": [{
"type":"text",
"body":"患者3天前面部肿胀伴多发红疹,自觉瘙痒,是什么疾病?"
}]
}
]
}
Differences from OpenAI
- Each entry in messages carries extra version and created fields.
- The documentation lists a required default parameter on message (a fallback switch: 0 = no fallback, 1 = fall back to 一言), but it does not appear in the provided examples.
- The content structure is different: it changes from a plain string to an object with (type, body) fields (see the mapping sketch below).
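To make the structural difference concrete, a small sketch of how an OpenAI-style message could be mapped onto the 灵医Bot structure, reusing the Message and Content beans from the Java request example later in this section (the MessageMapper class is illustrative and assumes it sits next to that Main class):

import java.util.Collections;

public class MessageMapper {
    // OpenAI-style input: role + plain-text content.
    // 01bot output: version/created plus content as a list of (type, body) objects.
    public static Main.Message toBotMessage(String role, String text) {
        Main.Content content = new Main.Content();
        content.setType("text");
        content.setBody(text);

        Main.Message message = new Main.Message();
        message.setVersion("api-v2");
        message.setCreated(System.currentTimeMillis() / 1000); // unix seconds, as in the examples
        message.setRole(role);
        message.setContent(Collections.singletonList(content));
        return message;
    }
}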
Non-streaming response (single turn)
{
"error_msg":"",
"logid":"",
"need_clear_history":false,
"ban_round":0,
"result":[
{
"messages":[
{
"extend":null,
"version":"api-v2",
"role":"",
"intention":"",
"content":[
{
"type":"text",
"body":"我不是医生或医疗专家,不能提供专业的医疗建议或诊断,以下信息仅供参考:
根据您提供的患者信息,可能的诊断是湿疹,荨麻疹;
诊断依据:
1. 患者3天前面部肿胀伴多发红疹,自觉瘙痒,提示患者有皮肤过敏反应;
2. 病历中未提及患者有发热、关节痛等症状,因此可以排除结节性红斑的可能性。
此外建议您补充以下检查:
1. 血常规检查:了解是否有感染或炎症反应;
2. 过敏原检测:了解过敏原,为治疗提供依据。"
}
],
"created":0
}
],
"created":0
}
],
"error_code":0,
"user_id":0
}
The response structure differs significantly from OpenAI's.
- extend  string; additional response data, empty in the reference example
- intention  string; intent information for the current conversation, empty in the reference example
Non-streaming request example (multi-turn)
{
"model": "test-model",
"stream": false,
"messages": [
{
"version":"api-v2",
"created": 1683944235,
"role": "user",
"content": [{
"type":"text",
"body":"患者3天前面部肿胀伴多发红疹,自觉瘙痒,是什么疾病?"
}]
},
{
"version":"api-v2",
"created": 1683944235,
"role": "assistant",
"content": [{
"type":"text",
"body":"我不是医生或医疗专家,不能提供专业的医疗建议或诊断,以下信息仅供参考:
根据您提供的患者信息,可能的诊断是湿疹,荨麻疹;
诊断依据:
1. 患者3天前面部肿胀伴多发红疹,自觉瘙痒,提示患者有皮肤过敏反应;
2. 病历中未提及患者有发热、关节痛等症状,因此可以排除结节性红斑的可能性。
此外建议您补充以下检查:
1. 血常规检查:了解是否有感染或炎症反应;
2. 过敏原检测:了解过敏原,为治疗提供依据。"
}]
},
{
"version":"api-v2",
"created": 1683944235,
"role": "user",
"content": [{
"type":"text",
"body":"应该吃什么药?"
}]
}
]
}
Non-streaming response (multi-turn)
{
"error_msg":"",
"logid":"",
"need_clear_history":false,
"ban_round":0,
"result":[
{
"messages":[
{
"extend":null,
"version":"api-v2",
"role":"",
"intention":"",
"content":[
{
"type":"text",
"body":"我无法作为医学专业人员为您提供用药指导,推荐结果仅供参考。
1. 湿疹:湿疹是一种常见的皮肤疾病,通常表现为皮肤上的红斑、丘疹和水疱等症状。推荐使用非处方药氢化可的松乳膏或地奈德乳膏,它们可以缓解湿疹引起的瘙痒和炎症。如果症状严重或持续时间较长,建议咨询医生并使用处方药。
2. 荨麻疹:荨麻疹通常表现为皮肤上的瘙痒性风团,推荐使用非处方药氯雷他定或西替利嗪,它们可以缓解荨麻疹引起的瘙痒和红肿。如果症状严重或持续时间较长,建议咨询医生并使用处方药。
请您在必要时咨询医学相关专业人士意见;同时,也要注意药品的使用方法和剂量,以免造成不良后果。"
}
],
"created":0
}
],
"created":0
}
],
"error_code":0,
"user_id":0
}
Request example (streaming)
{
"model": "test-model",
"stream": true,
"messages": [
{
"version":"api-v2",
"created": 1683944235,
"role": "user",
"content": [{
"type":"text",
"body":"患者3天前面部肿胀伴多发红疹,自觉瘙痒,是什么疾病?"
}]
}
]
}
Response example (streaming)
The first chunk is shown below; subsequent chunks look similar. The last chunk returns is_end = true, marking the end of the whole answer (a consumption sketch follows the example).
"event":"message"
"data": {
"error_msg":"",
"logid":"",
"is_end": false,
"sentence_id": 0,
"need_clear_history":false,
"ban_round":0,
"result":[
{
"messages":[
{
"extend":null,
"version":"api-v2",
"role":"",
"intention":"",
"content":[
{
"type":"text",
"body":"xxxxxxxx"
}
],
"created":0
}
],
"created":0
}
],
"error_code":0,
"user_id":0
}
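A rough sketch of how a client could consume this stream and stop at is_end (assumptions: the signed POST request has already been sent as in the Java example below, the reader wraps the response stream, and each SSE payload arrives on a line prefixed with "data:"):

import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;
import java.io.BufferedReader;
import java.io.IOException;

public class BotSseReader {
    public static void readAll(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.startsWith("data:")) {
                continue; // skip "event: message" and blank keep-alive lines
            }
            JSONObject data = JSON.parseObject(line.substring("data:".length()).trim());
            // each chunk carries the same envelope as the non-streaming response, plus is_end / sentence_id
            System.out.println(data.getJSONArray("result"));
            if (data.getBooleanValue("is_end")) {
                break; // final chunk: the whole answer has been returned
            }
        }
    }
}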
Request example (Java)
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.net.ssl.HttpsURLConnection;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;
import com.alibaba.fastjson2.JSONObject;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.InputStreamReader;
import java.net.URL;
public class Main {
private static final String ak = "replace with the AK you applied for";
private static final String sk = "replace with the SK you applied for";
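// getMd5: hex-encoded MD5 digest of the (unicode-escaped) request body, used as the content-md5 value in the canonical request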
public static String getMd5(String data) throws NoSuchAlgorithmException, UnsupportedEncodingException {
MessageDigest md = MessageDigest.getInstance("MD5");
md.update(data.getBytes(StandardCharsets.UTF_8));
byte[] digest = md.digest();
StringBuilder sb = new StringBuilder();
for (byte b : digest) {
sb.append(String.format("%02x", b));
}
return sb.toString();
}
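// string2Unicode: escapes every non-ASCII character in \\uXXXX form so the body that is sent and signed contains only ASCII characters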
private static String string2Unicode(String string) {
if (string.equals("")) {
return null;
}
char[] bytes = string.toCharArray();
StringBuffer unicode = new StringBuffer();
for (int i = 0; i < bytes.length; i++) {
char c = bytes[i];
// characters in the standard ASCII range are output as-is
if (c >= 0 && c <= 127) {
unicode.append(c);
continue;
}
String hexString = Integer.toHexString(bytes[i]);
unicode.append("\\u");
// pad to four hex digits with leading zeros
if (hexString.length() < 4) {
unicode.append("0000".substring(hexString.length(), 4));
}
unicode.append(hexString);
}
return unicode.toString();
}
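// hmacSha256: hex-encoded HMAC-SHA256 of message keyed with secret; used both to derive the signing key and to compute the final signature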
public static String hmacSha256(String secret, String message) throws NoSuchAlgorithmException {
try {
SecretKeySpec secretKeySpec = new SecretKeySpec(secret.getBytes(), "HmacSHA256");
Mac mac = Mac.getInstance("HmacSHA256");
mac.init(secretKeySpec);
byte[] rawHmac = mac.doFinal(message.getBytes("UTF-8"));
StringBuilder sb = new StringBuilder();
for (byte b : rawHmac) {
sb.append(String.format("%02x", b));
}
return sb.toString();
} catch (Exception e) {
throw new RuntimeException("Failed to calculate HMAC-SHA256", e);
}
}
public static class Message {
private String version;
private long created;
private String role;
private List<Content> content;
public void setVersion(String version) {
this.version = version;
}
public String getVersion() {
return version;
}
public void setCreated(long created) {
this.created = created;
}
public long getCreated() {
return created;
}
public void setRole(String role) {
this.role = role;
}
public String getRole() {
return role;
}
public void setContent(List<Content> content) {
this.content = content;
}
public List<Content> getContent() {
return content;
}
}
public static class Content {
private String type;
private String body;
public void setType(String type) {
this.type = type;
}
public String getType() {
return type;
}
public void setBody(String body) {
this.body = body;
}
public String getBody() {
return body;
}
}
public static class MessageBean {
private String model;
public boolean isStream() {
return stream;
}
public void setStream(boolean stream) {
this.stream = stream;
}
private boolean stream;
private List<Message> messages;
public void setModel(String model) {
this.model = model;
}
public String getModel() {
return model;
}
public void setMessages(List<Message> messages) {
this.messages = messages;
}
public List<Message> getMessages() {
return messages;
}
}
public static void main(String[] args) throws IOException {
MessageBean messageBean = new MessageBean();
messageBean.setModel("replace with the model you applied for");
messageBean.setStream(false);
Message message = new Message();
ArrayList<Message> messages = new ArrayList<>();
message.setRole("user");
message.setCreated(1683944235);
message.setVersion("api-v2");
ArrayList<Content> contents = new ArrayList<>();
Content content = new Content();
content.setType("text");
content.setBody("阿莫西林应该怎么吃");
contents.add(content);
message.setContent(contents);
messages.add(message);
messageBean.setMessages(messages);
String messageJson = JSONObject.toJSONString(messageBean);
System.out.println(messageJson);
messageJson = string2Unicode(messageJson);
String md5 = "";
try {
md5 = getMd5(messageJson);
} catch (NoSuchAlgorithmException e) {
System.out.println("Cannot get content md5");
System.exit(0);
}
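// Build the auth string prefix "ihcloud/{AK}/{timestamp}/300" and derive the signing key from it with the SK (HMAC-SHA256)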
DateFormat sdf = new SimpleDateFormat("dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
sdf.setTimeZone(TimeZone.getTimeZone("GMT")); // format the timestamp in GMT, as the 'GMT' literal in the pattern implies
String trimester = sdf.format(new Date());
String authStringPrefix = "ihcloud/" + ak + "/" + trimester + "/300";
String signingKey = "";
try {
signingKey = hmacSha256(sk, authStringPrefix);
} catch (NoSuchAlgorithmException e) {
System.out.println("Cannot get content md5");
System.exit(0);
}
String host = "replace with the host for your environment; production is https://01bot.baidu.com";
String path = "/api/01bot/sse-gateway/stream";
String url = host + path;
URL obj = new URL(url);
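// Canonical request: the HTTP method, the URL path, and the content-md5 header, joined with '\n'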
String canonicalRequest = String.join("\n", "POST", obj.getPath(), "content-md5:" + md5);
String signature = "";
try {
signature = hmacSha256(signingKey, canonicalRequest);
} catch (NoSuchAlgorithmException e) {
System.out.println("gen signature failed");
System.exit(0);
}
HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();
// add request headers
con.setRequestMethod("POST");
con.setRequestProperty("Content-Type", "application/json");
con.setRequestProperty("X-IHU-Authorization-V2", authStringPrefix + "/" + signature);
// send the POST request
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(messageJson);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
System.out.println("Response Code : " + responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
// print the result
System.out.println(response.toString());
}
}
Summary
- The request structures are fairly close: a few extra fields and some structural changes. Integrating additional LLMs later will widen the gap.
- The response structures of different LLMs differ significantly.
- For multi-turn conversations the caller must assemble the message history itself, and token consumption grows quickly as the history accumulates.
- Multi-turn conversations also require proper history management (per user/session) and a cap on conversation length; this is not planned for the current version.
- 灵医Bot Chat's signature scheme is relatively complex: a fresh signature must be computed for every request, which costs extra resources, so adapting it behind the unified interface takes noticeably more work than an ordinary endpoint.
Proposed unified interface contract
The unified interface does not handle authentication; it is assumed to run in a trusted environment.
Input (request parameters)
{
"systemId": "1234567890", # string 不为空, 给出系统请求的标识(统计权限控制等使用) ; 第一个版本可随机传,后端服务不做控制
"userId": "1234567890", # string 不为空, 给出用户请求的标识(统计或会话等使用) ; 第一个版本可随机传,后端服务不做控制
"model": "baidu_01bot", # 大模型名称 string 厂商_大模型名称 第一个版本支持 [baidu_01bot]
"stream": true, # 是否为流式请求 bool 默认 false
"messages": [ # 请求消息列表 list 非空 { "role": ["system","user","assistant"],"content":"string"}
{
"role": "user",
"content": "患者3天前面部肿胀伴多发红疹,自觉瘙痒,是什么疾病?"
}
],
"extra": { # 额外参数 map 可为空 (针对不同的大模型提供 最外层参数 的额外参数扩展)
"key1": "values1",
"key2": 22.52
}
}
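A minimal sketch of how this request contract could look as Java beans. The class names are illustrative, and public fields are used only to keep the sketch short; a real implementation would add proper getters/setters and validation:

import java.util.List;
import java.util.Map;

public class UnifiedChatRequest {
    public String systemId;               // caller system id, non-empty
    public String userId;                 // caller user id, non-empty
    public String model;                  // "vendor_model", e.g. "baidu_01bot"
    public boolean stream;                // default false
    public List<UnifiedMessage> messages; // non-empty
    public Map<String, Object> extra;     // optional per-model extension parameters

    public static class UnifiedMessage {
        public String role;               // "system" | "user" | "assistant"
        public String content;
    }
}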
Suppose the OpenAI extra parameters are defined as follows and the backend already supports them:
1. temperature float [0,2]
2. top_p float [0,1]
An example of passing extra parameters in a request is shown below. Note that OpenAI recommends not setting both parameters at the same time; they appear together here only to illustrate the syntax.
"extra": {
"temperature": 1.2,
"top_p": 0.5
}
Extra parameters only take effect if the corresponding interface adapter supports them.
Response parameters - default response (result)
{
"success": true, # bool 是否成功,
"seqId": "guid", # 序列id string,
"code": 0, # int 0:成功, 其他:失败代码 ,
"message": "success", # string 成功/失败的信息
"model": "baidu_01bot", # 大模型名称 string
"result": [
{
"role": "assistant",
"content": "我不是医生或医疗专家,不能提供专业的医疗建议或诊断,以下信息仅供参考:
根据您提......",
}
],
"extra": { # 附加的响应参数 map , 针对扩展功能预留
"key1": "values1",
}
}
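A matching sketch for the response contract, in the same illustrative style. The ended flag belongs to the streaming variant described further below and is modelled here as a nullable Boolean so that one bean can serve both cases:

import java.util.List;
import java.util.Map;

public class UnifiedChatResponse {
    public boolean success;
    public String seqId;              // sequence id (GUID)
    public int code;                  // 0 = success, anything else = error code
    public String message;            // success/error message
    public String model;              // e.g. "baidu_01bot"
    public Boolean ended;             // streaming only; null for the default response
    public List<ResultMessage> result;
    public Map<String, Object> extra; // may be null when empty (see the convention below)

    public static class ResultMessage {
        public String role;           // normally "assistant"
        public String content;
    }
}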
Notes on the extra parameter
Suppose 01bot additionally returns intention (the current conversation's intent) at the top level and the client already supports parsing it; extra would then be returned as:
"extra": {
"intention": "string" # 当前会话意图
}
Extra response parameters only become available if the response adapter supports them.
Response parameters - streaming response (result)
Apart from the added ended field, the data structure is the same as the result structure of the default response.
"event":"message"
"data": {
"success": true, # bool 是否成功,
"seqId": "guid", # 序列id string,
"code": 0, # int 0:成功, 其他:失败代码 ,
"message": "success", # string 成功/失败的信息
"model": "baidu_01bot", # 大模型名称 string
"ended": false, # bool 是否结束流式请求 , true:结束流式请求 , false:继续流式请求 ,
"result":[
{
"role": "assistant",
"content": "我不是医生或医疗专家,不能提供专业的医疗建议或诊断,以下信息仅供参考:
根据您提......",
}
],
"extra": { # 附加的响应参数 map , 针对扩展功能预留
"key1": "values1",
}
}
Notes on the extra parameter. Note: this example shows how arrays are used.
Suppose each entry in 01bot's result additionally carries an intention (conversation intent); to express the intent of every entry, extra would be returned as:
"extra": {
"intentions": ["string"] # result结果的每个会话意图 , 数组长度和result长度一致
}
Conventions for extra
Since Java is an object-oriented language, when the extra response payload is empty it is returned as:
"extra": null
Special notes
Different LLM APIs differ significantly in both response structure and request parameters; supporting different needs will require adding model-specific parameters for the corresponding models.
Although the design tries to accommodate different LLMs as far as possible, changes in usage requirements and adjustments to the LLM APIs themselves will very likely cause the request and response parameters to change later on.
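Finally, a sketch of how the contract could be wired to concrete models behind an adapter layer, using the UnifiedChatRequest / UnifiedChatResponse beans sketched above. The ChatModelAdapter and AdapterRegistry names and the dispatch-by-model-name approach are illustrative assumptions, not part of the contract:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public interface ChatModelAdapter {
    // Blocking call: map UnifiedChatRequest to the vendor request, call the vendor API,
    // and map the vendor response back to UnifiedChatResponse.
    UnifiedChatResponse chat(UnifiedChatRequest request);

    // Streaming call: onChunk is invoked once per chunk; the last chunk has ended == true.
    void streamChat(UnifiedChatRequest request, Consumer<UnifiedChatResponse> onChunk);
}

// Dispatch by the "model" field, e.g. "baidu_01bot" -> the 灵医Bot adapter.
class AdapterRegistry {
    private final Map<String, ChatModelAdapter> adapters = new HashMap<>();

    public void register(String model, ChatModelAdapter adapter) {
        adapters.put(model, adapter);
    }

    public ChatModelAdapter resolve(String model) {
        ChatModelAdapter adapter = adapters.get(model);
        if (adapter == null) {
            throw new IllegalArgumentException("unsupported model: " + model);
        }
        return adapter;
    }
}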