{"id":1276,"date":"2023-02-25T09:58:29","date_gmt":"2023-02-25T01:58:29","guid":{"rendered":"https:\/\/www.aqwu.net\/wp\/?p=1276"},"modified":"2023-02-25T16:24:29","modified_gmt":"2023-02-25T08:24:29","slug":"openai-api5-%e4%be%8b%e5%ad%90%ef%bc%9a%e5%a6%82%e4%bd%95%e6%9e%84%e5%bb%ba%e5%8f%af%e4%bb%a5%e5%9b%9e%e7%ad%94%e6%9c%89%e5%85%b3%e6%82%a8%e7%bd%91%e7%ab%99%e7%9a%84%e9%97%ae%e9%a2%98%e7%9a%84","status":"publish","type":"post","link":"https:\/\/www.aqwu.net\/wp\/?p=1276","title":{"rendered":"Getting Started with the OpenAI API: 5. Example: How to Build an AI That Can Answer Questions About Your Website"},"content":{"rendered":"\n<p>This tutorial walks through a simple example: crawling a website (the OpenAI website in this example), using the Embeddings API to turn the crawled pages into <a href=\"https:\/\/platform.openai.com\/docs\/guides\/embeddings\">embeddings<\/a>, and then building a basic search feature that lets users ask questions about the embedded information. It is intended as a starting point for more sophisticated applications that use a custom knowledge base.<\/p>\n\n\n\n<p>Original article: <a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\">Web Q&amp;A &#8211; OpenAI API<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\"><a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/getting-started\">Getting started<\/a><\/h1>\n\n\n\n<p>Some basic knowledge of Python and GitHub is helpful for this tutorial. Before diving in, make sure to <a 
href=\"https:\/\/platform.openai.com\/docs\/api-reference\/introduction\">set up an OpenAI API key<\/a> and walk through the <a href=\"https:\/\/platform.openai.com\/docs\/quickstart\">quickstart tutorial<\/a>. This will give you a good intuition for how to use the API to its full potential.<\/p>\n\n\n\n<p>Python is used as the main programming language, along with the OpenAI, Pandas, transformers, NumPy, and other popular packages. If you run into any issues while working through this tutorial, please ask a question on the&nbsp;<a href=\"https:\/\/community.openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI Community Forum<\/a>.<\/p>\n\n\n\n<p>To start with the code, clone the&nbsp;<a href=\"https:\/\/github.com\/openai\/openai-cookbook\/tree\/main\/apps\/web-crawl-q-and-a\" target=\"_blank\" rel=\"noreferrer noopener\">full code for this tutorial on GitHub<\/a>. Alternatively, follow along and copy each section into a Jupyter notebook and run the code step by step, or just read along. A good way to avoid issues is to set up a new virtual environment and install the required packages by running the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python -m venv env\n\nsource env\/bin\/activate\n\npip install -r requirements.txt<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/setting-up-a-web-crawler\"><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a 
href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/setting-up-a-web-crawler\">Setting up a web crawler<\/a><\/h2>\n\n\n\n<p>The main focus of this tutorial is the OpenAI API, so if you prefer, you can skip the background on how to create a web crawler and just <a href=\"https:\/\/github.com\/openai\/openai-cookbook\/tree\/main\/apps\/web-crawl-q-and-a\" target=\"_blank\" rel=\"noreferrer noopener\">download the source code<\/a>. Otherwise, the \u201cLearn how to build a web crawler\u201d section of the original tutorial walks through the implementation of the scraping mechanism.<\/p>\n\n\n\n<p><a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/building-an-embeddings-index\"><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/building-an-embeddings-index\">Building an embeddings index<\/a><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn.openai.com\/API\/docs\/images\/tutorials\/web-qa\/DALL-E-woman-turning-a-stack-of-papers-into-numbers-pixel-art.png\" alt=\"DALL-E: Woman turning a stack of papers into numbers pixel art\"\/><\/figure>\n\n\n\n<p>CSV is a common format for storing embeddings. You can use this format with Python by converting the raw text files (which are in the text directory) into Pandas DataFrames. Pandas 
is a popular open source library that helps you work with tabular data (data stored in rows and columns).<\/p>\n\n\n\n<p>Blank lines can clutter the text files and make them harder to process. A simple function can remove those lines and tidy up the files.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def remove_newlines(serie):\n    serie = serie.str.replace('\\n', ' ')\n    serie = serie.str.replace('\\\\n', ' ')\n    serie = serie.str.replace('  ', ' ')\n    serie = serie.str.replace('  ', ' ')\n    return serie<\/code><\/pre>\n\n\n\n<p>Converting the text to CSV requires looping through the text files in the text directory created earlier. After opening each file, remove the extra spacing and append the modified text to a list. Then, add the text with the newlines removed to an empty Pandas DataFrame and write the DataFrame to a CSV file.<\/p>\n\n\n\n<p>Extra spacing and newlines can clutter the text and complicate the embeddings process. The code used here helps remove some of them, but you may find third-party libraries or other methods useful for removing more unnecessary characters.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import os\nimport pandas as pd\n\n# Assumes `domain` was set in the crawler step (for example, domain = \"openai.com\")\n\n# Create a list to store the text files\ntexts=&#91;]\n\n# Get all the text files in the text directory\nfor file in os.listdir(\"text\/\" + domain + \"\/\"):\n\n    # Open 
the file and read the text\n    with open(\"text\/\" + domain + \"\/\" + file, \"r\", encoding=\"UTF-8\") as f:\n        text = f.read()\n\n        # Omit the first 11 characters and the last 4 characters of the file name, replace - and _ with spaces, and drop #update.\n        texts.append((file&#91;11:-4].replace('-',' ').replace('_', ' ').replace('#update',''), text))\n\n# Create a dataframe from the list of texts\ndf = pd.DataFrame(texts, columns = &#91;'fname', 'text'])\n\n# Set the text column to be the raw text with the newlines removed\ndf&#91;'text'] = df.fname + \". \" + remove_newlines(df.text)\ndf.to_csv('processed\/scraped.csv')\ndf.head()<\/code><\/pre>\n\n\n\n<p>Tokenization is the next step after saving the raw text into a CSV file. This process splits the input text into tokens by breaking down the sentences and words. A visual demonstration of this can be seen by <a href=\"https:\/\/platform.openai.com\/tokenizer\">checking out the Tokenizer<\/a>&nbsp;in the docs.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly three quarters of a word (so 100 tokens ~= 75 words).<\/p>\n<\/blockquote>\n\n\n\n<p>The API has a limit on the maximum number of input tokens for embeddings. To stay below the limit, the text in the CSV 
file needs to be broken down into multiple rows. The existing length of each row is recorded first to identify which rows need to be split.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import tiktoken\n\n# Load the cl100k_base tokenizer which is designed to work with the ada-002 model\ntokenizer = tiktoken.get_encoding(\"cl100k_base\")\n\ndf = pd.read_csv('processed\/scraped.csv', index_col=0)\ndf.columns = &#91;'title', 'text']\n\n# Tokenize the text and save the number of tokens to a new column\ndf&#91;'n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))\n\n# Visualize the distribution of the number of tokens per row using a histogram\ndf.n_tokens.hist()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn.openai.com\/API\/docs\/images\/tutorials\/web-qa\/embeddings-initial-histrogram.png\" alt=\"Embeddings histogram\"\/><\/figure>\n\n\n\n<p>The newest embeddings model can handle inputs of up to 8191 tokens, so most of the rows would not need any chunking. That may not hold for every scraped subpage, however, so the next code block splits the longer rows into smaller chunks.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>max_tokens = 500\n\n# Function to split the text into chunks of a maximum number of tokens\ndef split_into_many(text, max_tokens = max_tokens):\n\n    # Split the text into sentences\n    sentences = text.split('. 
')\n\n    # Get the number of tokens for each sentence\n    n_tokens = &#91;len(tokenizer.encode(\" \" + sentence)) for sentence in sentences]\n    \n    chunks = &#91;]\n    tokens_so_far = 0\n    chunk = &#91;]\n\n    # Loop through the sentences and tokens joined together in a tuple\n    for sentence, token in zip(sentences, n_tokens):\n\n        # If the number of tokens so far plus the number of tokens in the current sentence is greater \n        # than the max number of tokens, then add the chunk to the list of chunks and reset\n        # the chunk and tokens so far\n        if tokens_so_far + token &gt; max_tokens:\n            chunks.append(\". \".join(chunk) + \".\")\n            chunk = &#91;]\n            tokens_so_far = 0\n\n        # If the number of tokens in the current sentence is greater than the max number of \n        # tokens, go to the next sentence\n        if token &gt; max_tokens:\n            continue\n\n        # Otherwise, add the sentence to the chunk and add the number of tokens to the total\n        chunk.append(sentence)\n        tokens_so_far += token + 1\n\n    # Add the last chunk, which the loop above never flushes\n    if chunk:\n        chunks.append(\". \".join(chunk) + \".\")\n\n    return chunks\n    \n\nshortened = &#91;]\n\n# Loop through the dataframe\nfor row in df.iterrows():\n\n    # If the text is None, go to the next row\n    if row&#91;1]&#91;'text'] is None:\n        continue\n\n    # If the number of tokens is greater than the max number of tokens, split the text into chunks\n    if row&#91;1]&#91;'n_tokens'] &gt; max_tokens:\n        shortened += split_into_many(row&#91;1]&#91;'text'])\n    \n    # Otherwise, add the text to the list of shortened texts\n    else:\n        shortened.append( row&#91;1]&#91;'text'] )<\/code><\/pre>\n\n\n\n<p>Visualizing the updated histogram again can help confirm that the rows were successfully split into shortened sections.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df = pd.DataFrame(shortened, columns = &#91;'text'])\ndf&#91;'n_tokens'] = 
df.text.apply(lambda x: len(tokenizer.encode(x)))\ndf.n_tokens.hist()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn.openai.com\/API\/docs\/images\/tutorials\/web-qa\/embeddings-tokenized-output.png\" alt=\"Embeddings tokenized output\"\/><\/figure>\n\n\n\n<p>The content is now broken down into smaller chunks, and a simple request can be sent to the OpenAI API specifying the use of the new text-embedding-ada-002 model to create the embeddings:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import openai\n\ndf&#91;'embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')&#91;'data']&#91;0]&#91;'embedding'])\n\ndf.to_csv('processed\/embeddings.csv')\ndf.head()<\/code><\/pre>\n\n\n\n<p>This should take about 3-5 minutes, but after that you will have your embeddings ready to use!<\/p>\n\n\n\n<p><a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/building-a-question-answer-system-with-your-embeddings\"><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/platform.openai.com\/docs\/tutorials\/web-qa-embeddings\/building-a-question-answer-system-with-your-embeddings\">Building a question answer system with your embeddings<\/a><\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/cdn.openai.com\/API\/docs\/images\/tutorials\/web-qa\/DALL-E-friendly-robot-question-and-answer-system-pixel-art.png\" 
alt=\"DALL-E: Friendly robot question and answer system pixel art\"\/><\/figure>\n\n\n\n<p>The embeddings are ready, and the final step of this process is to create a simple question and answer system. This takes a user's question, creates an embedding of it, and compares it with the existing embeddings to retrieve the most relevant text from the scraped website. The text-davinci-003 model then generates a natural-sounding answer based on the retrieved text.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Turning the embeddings into a NumPy array is the first step. It provides more flexibility in how the embeddings can be used, given the many functions available that operate on NumPy arrays. It also flattens the dimension to 1-D, which is the required format for many subsequent operations.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nfrom openai.embeddings_utils import distances_from_embeddings\n\ndf=pd.read_csv('processed\/embeddings.csv', index_col=0)\ndf&#91;'embeddings'] = 
df&#91;'embeddings'].apply(eval).apply(np.array)\n\ndf.head()<\/code><\/pre>\n\n\n\n<p>Now that the data is ready, the question needs to be converted into an embedding with a simple function. This is important because search with embeddings compares vectors of numbers (the converted form of the raw text) using cosine distance. Vectors whose cosine distance is small are likely related and may contain the answer to the question. The OpenAI Python package has a built-in <code>distances_from_embeddings<\/code> function that is useful here.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def create_context(\n    question, df, max_len=1800, size=\"ada\"\n):\n    \"\"\"\n    Create a context for a question by finding the most similar context from the dataframe\n    \"\"\"\n\n    # Get the embeddings for the question\n    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')&#91;'data']&#91;0]&#91;'embedding']\n\n    # Get the distances from the embeddings\n    df&#91;'distances'] = distances_from_embeddings(q_embeddings, df&#91;'embeddings'].values, distance_metric='cosine')\n\n\n    returns = &#91;]\n    cur_len = 0\n\n    # Sort by distance and add the text to the context until the context is too long\n    for i, row in df.sort_values('distances', ascending=True).iterrows():\n        \n        # Add the length of the text to the current length\n        cur_len += row&#91;'n_tokens'] + 4\n        \n        # If the context is too long, break\n        if cur_len &gt; max_len:\n            break\n        \n        # Else add it to the text that is being returned\n        returns.append(row&#91;\"text\"])\n\n    # Return the context\n    return 
\"\\n\\n###\\n\\n\".join(returns)<\/code><\/pre>\n\n\n\n<p>The text was broken up into smaller sets of tokens, so looping through in ascending order of distance and continuing to add text is a critical step to ensure a full answer. The max_len can also be changed to a smaller value if more content than desired is returned.<\/p>\n\n\n\n<p>The previous step only retrieved chunks of text that are semantically related to the question, so they might contain the answer, but there is no guarantee of it. The chance of finding an answer can be further increased by returning the top 5 most likely results.<\/p>\n\n\n\n<p>The answering prompt will then try to extract the relevant facts from the retrieved contexts in order to formulate a coherent answer. If there is no relevant answer, the prompt will return \u201cI don't know\u201d.<\/p>\n\n\n\n<p>A realistic-sounding answer to the question can be created with the completion endpoint using <code>text-davinci-003<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def answer_question(\n    df,\n    model=\"text-davinci-003\",\n    question=\"Am I allowed to publish model outputs to Twitter, without a human review?\",\n    max_len=1800,\n    size=\"ada\",\n    debug=False,\n    max_tokens=150,\n    stop_sequence=None\n):\n    \"\"\"\n    Answer a question based on the most similar context from the dataframe texts\n    \"\"\"\n    context = create_context(\n        question,\n        df,\n        max_len=max_len,\n        size=size,\n    )\n    # If debug, print the retrieved context\n    if debug:\n        print(\"Context:\\n\" 
+ context)\n        print(\"\\n\\n\")\n\n    try:\n        # Create a completion using the question and context\n        response = openai.Completion.create(\n            prompt=f\"Answer the question based on the context below, and if the question can't be answered based on the context, say \\\"I don't know\\\"\\n\\nContext: {context}\\n\\n---\\n\\nQuestion: {question}\\nAnswer:\",\n            temperature=0,\n            max_tokens=max_tokens,\n            top_p=1,\n            frequency_penalty=0,\n            presence_penalty=0,\n            stop=stop_sequence,\n            model=model,\n        )\n        return response&#91;\"choices\"]&#91;0]&#91;\"text\"].strip()\n    except Exception as e:\n        print(e)\n        return \"\"<\/code><\/pre>\n\n\n\n<p>It is done! A working Q&amp;A system that has the knowledge embedded from the OpenAI website is now ready. A few quick tests can be run to see the quality of the output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>answer_question(df, question=\"What day is it?\", debug=False)\n\nanswer_question(df, question=\"What is our newest embeddings model?\")\n\nanswer_question(df, question=\"What is ChatGPT?\")<\/code><\/pre>\n\n\n\n<p>The responses will look something like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\"I don't know.\"\n\n'The newest embeddings model is text-embedding-ada-002.'\n\n'ChatGPT is a model trained to interact in a conversational way. 
It is able to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.'<\/code><\/pre>\n\n\n\n<p>If the system is not able to answer a question that it is expected to answer, it is worth searching through the raw text files to see whether the information that is expected to be known actually ended up being embedded. The crawling process that was done initially was set up to skip sites outside the original domain that was provided, so it might not have that knowledge if a subdomain was set up.<\/p>\n\n\n\n<p>Currently, the DataFrame is passed in each time to answer a question. For more production-oriented workflows, a <a href=\"https:\/\/platform.openai.com\/docs\/guides\/embeddings\/how-can-i-retrieve-k-nearest-embedding-vectors-quickly\">vector database solution<\/a> should be used instead of storing the embeddings in a CSV file, but the current approach is a great option for prototyping.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial walks through a simple example: crawling a website (the OpenAI website in this example), using the Embeddings API to turn the crawled pages 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[289,312,43],"tags":[242,314],"class_list":["post-1276","post","type-post","status-publish","format-standard","hentry","category-gpt","category-openai","category-infoarticle","tag-chatgpt","tag-openai-api"],"views":917,"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts\/1276","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1276"}],"version-history":[{"count":3,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts\/1276\/revisions"}],"predecessor-version":[{"id":1313,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts\/1276\/revision
s\/1313"}],"wp:attachment":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1276"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1276"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1276"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}