How to get token usage for each openAI ChatCompletion API call in streaming mode?
  • Date: 2023-03-23 15:14:59
  • Tags: openai-api

According to OpenAI's documentation (https://platform.openai.com/docs/guides/chat/chat-vs-completions), you should get the token usage from the response. However, I am currently making the API call with stream set to True, and the response doesn't seem to contain a usage property.

So how can I get the token usage in this case?
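For reference, a minimal sketch of what I get back in streaming mode (with the pre-1.0 openai Python package): each chunk only carries a delta, and none of them has a usage field.

import openai

stream = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)
for chunk in stream:
    # each chunk has id/object/created/model/choices, but no "usage" key
    print(chunk["choices"][0].get("delta", {}))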

Answers

You can use tiktoken:

pip install tiktoken

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo":
        print("Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301")
    elif model == "gpt-4":
        print("Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.")
        return num_tokens_from_messages(messages, model="gpt-4-0314")
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif model == "gpt-4-0314":
        tokens_per_message = 3
        tokens_per_name = 1
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")
    num_tokens = 0

    if isinstance(messages, list):
        for message in messages:
            num_tokens += tokens_per_message
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":
                    num_tokens += tokens_per_name
        num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    elif isinstance(messages, str):
        num_tokens += len(encoding.encode(messages))
    return num_tokens
import openai

result = []

for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ],  # these messages are the prompt, e.g. prompt_tokens = num_tokens_from_messages(messages)
    stream=True
):
    content = chunk["choices"][0].get("delta", {}).get("content")
    if content:
        result.append(content)


# Count the completion tokens from the streamed content
completion_tokens = num_tokens_from_messages("".join(result))
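
To get the total usage, you can also count the prompt tokens from the same messages with the function above and add the two numbers; a small sketch, assuming the list passed to ChatCompletion.create above is stored in a messages variable:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"}
]

prompt_tokens = num_tokens_from_messages(messages)
total_tokens = prompt_tokens + completion_tokens
print(f"prompt={prompt_tokens}, completion={completion_tokens}, total={total_tokens}")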

All the answers provided share the same core solution: we have to use some kind of workaround and handle the token counting ourselves (i.e. with tiktoken). Here, in my answer, I'd like to present how this is implemented in an Azure OpenAI setup.

The logic here is to build a reverse proxy (a YARP reverse proxy). You can find the complete project at Enterprise-azureai-proxy.

Below is the main part of the solution, which handles the stream:

using AsyncAwaitBestPractices;
using Azure.Core;
using AzureAI.Proxy.Models;
using AzureAI.Proxy.OpenAIHandlers;
using AzureAI.Proxy.Services;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;
using Yarp.ReverseProxy.Transforms;
using Yarp.ReverseProxy.Transforms.Builder;

namespace AzureAI.Proxy.ReverseProxy;

internal class OpenAIChargebackTransformProvider : ITransformProvider
{
   
    private readonly IConfiguration _config;
    private readonly IManagedIdentityService _managedIdentityService;
    private readonly ILogIngestionService _logIngestionService;
   
    private string accessToken = "";

    private TokenCredential _managedIdentityCredential;

    public OpenAIChargebackTransformProvider(
        IConfiguration config, 
        IManagedIdentityService managedIdentityService,
        ILogIngestionService logIngestionService)
    {
        _config = config;
        _managedIdentityService = managedIdentityService;
        _logIngestionService = logIngestionService;
               
        _managedIdentityCredential = _managedIdentityService.GetTokenCredential();

    }

    public void ValidateRoute(TransformRouteValidationContext context) { return; }

    public void ValidateCluster(TransformClusterValidationContext context) { return; }
    
    public void Apply(TransformBuilderContext context)
    {
        context.AddRequestTransform(async requestContext => {
            //enabling buffering allows us to read the request body twice (once for forwarding, once for analysis)
            requestContext.HttpContext.Request.EnableBuffering();

            //check accessToken before replacing the Auth Header
            if (String.IsNullOrEmpty(accessToken) || OpenAIAccessToken.IsTokenExpired(accessToken, _config["EntraId:TenantId"]))
            {
                accessToken = await OpenAIAccessToken.GetAccessTokenAsync(_managedIdentityCredential, CancellationToken.None);
            }

            //replace the auth header with the access token of the managed identity of the proxy
            requestContext.ProxyRequest.Headers.Remove("api-key");
            requestContext.ProxyRequest.Headers.Remove("Authorization");
            requestContext.ProxyRequest.Headers.Add("Authorization", $"Bearer {accessToken}");

        });
        context.AddResponseTransform(async responseContext =>
        {
            var originalStream = await responseContext.ProxyResponse.Content.ReadAsStreamAsync();
            string capturedBody = "";

            // Buffer for reading chunks
            byte[] buffer = new byte[8192];
            int bytesRead;

            // Read, inspect, and write the data in chunks - this is especially needed for streaming content
            while ((bytesRead = await originalStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Convert the chunk to a string for inspection
                var chunk = Encoding.UTF8.GetString(buffer, 0, bytesRead);

                capturedBody += chunk;

                // Write the unmodified chunk back to the response
                await responseContext.HttpContext.Response.Body.WriteAsync(buffer, 0, bytesRead);
            }

            //flush any remaining content to the client
            await responseContext.HttpContext.Response.CompleteAsync();

            //now perform the analysis and create a log record
            var record = new LogAnalyticsRecord();
            record.TimeGenerated = DateTime.UtcNow;
            
            if (responseContext.HttpContext.Request.Headers["X-Consumer"].ToString() != "")
            {
                record.Consumer = responseContext.HttpContext.Request.Headers["X-Consumer"].ToString();
            }
            else
            {
                record.Consumer = "Unknown Consumer";
            }
           
            bool firstChunck = true;
            var chunks = capturedBody.Split("data:");
            foreach (var chunk in chunks)
            {
                var trimmedChunck = chunk.Trim();
                if (trimmedChunck != "" && trimmedChunck != "[DONE]")
                {

                    JsonNode jsonNode = JsonSerializer.Deserialize<JsonNode>(trimmedChunck);
                    if (jsonNode["error"] is not null)
                    {
                        Error.Handle(jsonNode);
                    }
                    else
                    {
                        string objectValue = jsonNode["object"].ToString();

                        switch (objectValue)
                        {
                            case "chat.completion":
                                Usage.Handle(jsonNode, ref record);
                                record.ObjectType = objectValue;
                                break;
                            case "chat.completion.chunk":
                                if (firstChunck)
                                {
                                    record = Tokens.CalculateChatInputTokens(responseContext.HttpContext.Request, record);
                                    record.ObjectType = objectValue;
                                    firstChunck = false;
                                }
                                ChatCompletionChunck.Handle(jsonNode, ref record);
                                break;
                            case "list":
                                if (jsonNode["data"][0]["object"].ToString() == "embedding")
                                {
                                    record.ObjectType = jsonNode["data"][0]["object"].ToString();
                                    //it's an embedding
                                    Usage.Handle(jsonNode, ref record);
                                }
                                break;
                            default:
                                break;
                        }
                    }
                }

            }

            record.TotalTokens = record.InputTokens + record.OutputTokens;
            _logIngestionService.LogAsync(record).SafeFireAndForget();
        });
    }
}
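
If you just want the gist of the stream handling without the C# proxy around it, here is a rough Python sketch of the same idea, assuming you have already captured the raw SSE body of a streamed chat completion as a string (the function name and the use of tiktoken here are my own, not part of the project above):

import json
import tiktoken

def count_completion_tokens_from_sse(sse_body: str, model: str = "gpt-3.5-turbo") -> int:
    """Split the captured SSE body on 'data:', parse each chunk, and count the streamed content."""
    encoding = tiktoken.encoding_for_model(model)
    pieces = []
    for raw in sse_body.split("data:"):
        raw = raw.strip()
        if not raw or raw == "[DONE]":
            continue
        node = json.loads(raw)
        if node.get("object") == "chat.completion.chunk":
            delta = node["choices"][0].get("delta", {})
            content = delta.get("content")
            if content:
                pieces.append(content)
    return len(encoding.encode("".join(pieces)))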

You can retrieve the total number of tokens from the response by checking response.usage.total_tokens.

Example:

response = openai_client.embeddings.create(model="text-embedding-3-large", input="test text", encoding_format="float")
if response.data:
    embedding = response.data[0].embedding
    
    total_tokens = response.usage.total_tokens
    print ("Total tokens: ", total_tokens)

Before the embedding call, with tiktoken:

import tiktoken

def get_number_of_tokens(string: str) -> int:
    encoding = tiktoken.encoding_for_model("text-embedding-3-large")
    num_tokens = len(encoding.encode(string))
    return num_tokens

total_token = get_number_of_tokens("test text")
print(total_token)

Finally, I'm glad that after a few hours of digging through the documentation I found a solution, so hopefully this helps someone who is stuck. If you find a mismatch between the reported and the computed token usage, please let me know.

Unfortunately, they don't offer an option to query usage information through the API in streaming mode, or even to just return the usage with the stream, which would be the easier solution. Instead, here is my implementation. It involves:

  • Counting tokens for images with the new gpt-4-turbo/vision models
  • The scuffed and varied additional tokens that get added in with OpenAI's API
  • Wrapping the returned Stream generator, appending any tokens to a list before yielding, and finally processing the list as the output message

The counting classes (the type hints are slightly incomplete; I haven't included all of them in this SO code, but they are in my actual project if you need all the types).

It's implemented in my reference project; check the helper functions in https://github.com/flatypus/flowchat/blob/main/flowchat/ private/ private_helpers.py

Code:

from io import BytesIO
from math import ceil
from PIL import Image
from requests import get
from typing import Any, Callable, Dict, List
import base64
import tiktoken

# "Message" and "StreamChatCompletion" are type hints from the author's project
# (see the link above); simple stand-ins so the snippet below is self-contained:
Message = Dict[str, Any]
StreamChatCompletion = Any

class CalculateImageTokens:
    def __init__(self, image: str):
        self.image = image

    def _get_image_dimensions(self):
        if self.image.startswith("data:image"):
            image = self.image.split(",")[1]
            image = base64.b64decode(image)
            image = Image.open(BytesIO(image))
            return image.size
        else:
            response = get(self.image)
            image = Image.open(BytesIO(response.content))
            return image.size

    def _openai_resize(self, width: int, height: int):
        if width > 1024 or height > 1024:
            if width > height:
                height = int(height * 1024 / width)
                width = 1024
            else:
                width = int(width * 1024 / height)
                height = 1024
        return width, height

    def count_image_tokens(self):
        width, height = self._get_image_dimensions()
        width, height = self._openai_resize(width, height)
        h = ceil(height / 512)
        w = ceil(width / 512)
        total = 85 + 170 * h * w
        return total


class CountStreamTokens:
    def __init__(self, model: str, messages: List[Message]):
        self.collect_tokens: List[str] = []
        self.messages = messages
        self.model = self._get_model(model)
        self.tokens_per_message = 3
        self.tokens_per_name = 1

    def _get_model(self, model: str):
        """Picks the right model and sets the additional tokens. See https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb"""
        try:
            self.encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            self.encoding = tiktoken.get_encoding("cl100k_base")

        if model in {
            "gpt-3.5-turbo-0613",
            "gpt-3.5-turbo-16k-0613",
            "gpt-4-0314",
            "gpt-4-32k-0314",
            "gpt-4-0613",
            "gpt-4-32k-0613",
        }:
            self.tokens_per_message = 3
            self.tokens_per_name = 1

        elif model == "gpt-3.5-turbo-0301":
            # every message follows <|start|>{role/name}\n{content}<|end|>
            self.tokens_per_message = 4
            self.tokens_per_name = -1  # if there's a name, the role is omitted
        elif "gpt-3.5-turbo" in model:
            self._get_model("gpt-3.5-turbo-0613")
        elif "gpt-4" in model:
            self._get_model("gpt-4-0613")

    def _count_text_tokens(self, message: Message) -> int:
        """Return the number of tokens used by a list of messages. See above link for context"""
        num_tokens = self.tokens_per_message
        for key, value in message.items():
            num_tokens += len(self.encoding.encode(str(value)))
            if key == "name":
                num_tokens += self.tokens_per_name

        return num_tokens

    def _count_input_tokens(self):
        tokens = 0
        text_messages: List[Message] = []
        image_messages: List[Dict[str, Any]] = []

        for message in self.messages:
            content = message["content"]
            role = message["role"]
            if isinstance(content, str):
                text_messages.append({"role": role, "content": content})
            else:
                for item in content:
                    if item["type"] == "text":
                        text_messages.append(
                            {"role": role, "content": item["text"]})
                    else:
                        image_messages.append(item)

        for message in text_messages:
            tokens += self._count_text_tokens(message)

        for message in image_messages:
            image = message["image_url"]
            detail = image.get("detail", "high")
            if detail == "low":
                tokens += 85
            else:
                tokens += (
                    CalculateImageTokens(message["image_url"]["url"])
                    .count_image_tokens()
                )

        tokens += 3  # every reply is primed with <|start|>assistant<|message|>

        return tokens

    def _count_output_tokens(self, message: str):
        return len(self.encoding.encode(message))

    def wrap_stream_and_count(self, generator: StreamChatCompletion, callback: Callable[[int, int], None]):
        for response in generator:
            content = response.choices[0].delta.content
            yield response

            if content is None:
                output_message = "".join(self.collect_tokens)
                prompt_tokens = self._count_input_tokens()
                completion_tokens = self._count_output_tokens(output_message)
                callback(prompt_tokens, completion_tokens)
                continue

            self.collect_tokens.append(content)

def add_token_count(self, prompt_tokens: int, completion_tokens: int) -> None:
    # Method on my own wrapper class: I append the tokens to a running total here.
    # This is called as a callback once the calculation is finished;
    # you can do anything you like with the numbers.
    self.usage["prompt_tokens"] += prompt_tokens
    self.usage["completion_tokens"] += completion_tokens

# ============= YOUR CODE =============

completion = openai.chat.completions.create(messages=messages, stream=True, **params)

# completion is now a generator, or a "stream" object.
# CountStreamTokens is a custom class that is initialized with the model you use and the messages you want to query with.
# These are saved as class attributes for use in the .wrap_stream_and_count() function.
# .wrap_stream_and_count() returns another generator, yielding all the same chunks that OpenAI provides,
# but simultaneously collecting the output tokens.
# When the generator detects a None (ending) token in the stream,
# it yields the final chunk and then counts the tokens (so as to keep the stream running).

return CountStreamTokens(model, messages).wrap_stream_and_count(completion, _add_token_count)
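
As a usage sketch (the wrapper object and its usage dict below are assumptions for illustration, not part of the code above): the wrapped generator is consumed exactly like the raw stream, and the callback fires once the final chunk arrives.

# hypothetical wrapper whose method runs the code above and returns the wrapped generator
stream = my_chat_wrapper.ask(messages)

for chunk in stream:
    piece = chunk.choices[0].delta.content
    if piece is not None:
        print(piece, end="", flush=True)

# after the stream is exhausted, the callback has updated the running totals
print(my_chat_wrapper.usage)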

If you are using LangChain, you can also use get_openai_callback:

from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    response = qa({"question": prompt, "chat_history": chat_history})

    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")



