{"id":4433,"date":"2024-08-13T04:31:57","date_gmt":"2024-08-12T20:31:57","guid":{"rendered":"https:\/\/www.aqwu.net\/wp\/?p=4433"},"modified":"2024-08-13T04:34:05","modified_gmt":"2024-08-12T20:34:05","slug":"%e6%ac%a2%e8%bf%8e-falconmamba%ef%bc%9a%e9%a6%96%e6%ac%be%e5%bc%ba%e5%a4%a7%e7%9a%84%e6%97%a0%e5%85%b3%e6%b3%a8-7b-%e5%9e%8b%e5%8f%b7","status":"publish","type":"post","link":"https:\/\/www.aqwu.net\/wp\/?p=4433","title":{"rendered":"Welcome FalconMamba: The first strong attention-free 7B model"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/falconllm.tii.ae\/tii-releases-first-sslm-with-falcon-mamba-7b.html\">Falcon Mamba<\/a> is a new model from the <a href=\"https:\/\/www.tii.ae\/ai-and-digital-science\">Technology Innovation Institute (TII)<\/a> in Abu Dhabi, released under the TII Falcon License 2.0. The model is open access and available within the Hugging Face ecosystem <a href=\"https:\/\/huggingface.co\/tiiuae\/falcon-mamba-7b\">here<\/a> for anyone to use for research or application purposes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this blog post, we go through the design decisions behind the model, how it stays competitive with other existing SoTA models, and how to use it within the Hugging Face ecosystem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The first general-purpose large-scale pure Mamba model<\/h2>\n\n\n\n<p 
class=\"wp-block-paragraph\">Transformers, which are based on the attention mechanism, are the dominant architecture in all of today's strongest large language models. However, the attention mechanism is fundamentally limited when processing long sequences, because its compute and memory costs grow with sequence length. Various alternative architectures, in particular State Space Language Models (SSLMs), have tried to address this sequence scaling limitation but fell behind SoTA transformers in performance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With Falcon Mamba, we demonstrate that the sequence scaling limitation can indeed be overcome without a loss in performance. Falcon Mamba is based on the original Mamba architecture proposed in <a href=\"https:\/\/arxiv.org\/abs\/2312.00752\"><em>Mamba: Linear-Time Sequence Modeling with Selective State Spaces<\/em><\/a>, with extra RMS normalization layers added to ensure stable training at scale. This choice of architecture ensures that Falcon Mamba:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>can process sequences of arbitrary length without any increase in memory storage, in particular fitting on a single A10 24GB GPU.<\/li>\n\n\n\n<li>takes a constant amount of time to generate a new token, regardless of the size of the context (see this <a 
href=\"https:\/\/huggingface.co\/blog\/falconmamba#hardware-performance\">section<\/a>)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/huggingface.co\/blog\/falconmamba#model-training\"><\/a>Model training<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Falcon Mamba was trained on ~5500 GT of data, mainly composed of RefinedWeb data, with the addition of high-quality technical data and code data from public sources. We used a constant learning rate for most of the training, followed by a relatively short learning rate decay stage. In this last stage, we also added a small portion of high-quality curated data to further improve model performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/huggingface.co\/blog\/falconmamba#evaluations\"><\/a>Evaluations<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We evaluate our model on all benchmarks of the new leaderboard version using the <code>lm-evaluation-harness<\/code> package, and then normalize the evaluation results with Hugging Face score normalization.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><code>model name<\/code><\/th><th><code>IFEval<\/code><\/th><th><code>BBH<\/code><\/th><th><code>MATH LvL5<\/code><\/th><th><code>GPQA<\/code><\/th><th><code>MUSR<\/code><\/th><th><code>MMLU-PRO<\/code><\/th><th><code>Average<\/code><\/th><\/tr><\/thead><tbody><tr><td><em><strong>Pure SSM 
models<\/strong><\/em><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td><code>Falcon Mamba-7B<\/code><\/td><td>33.36<\/td><td>19.88<\/td><td>3.63<\/td><td>8.05<\/td><td>10.86<\/td><td>14.47<\/td><td><strong>15.04<\/strong><\/td><\/tr><tr><td><code>TRI-ML\/mamba-7b-rw<\/code><sup>*<\/sup><\/td><td>22.46<\/td><td>6.71<\/td><td>0.45<\/td><td>1.12<\/td><td>5.51<\/td><td>1.69<\/td><td>6.25<\/td><\/tr><tr><td><em><strong>Hybrid SSM attention models<\/strong><\/em><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td><code>recurrentgemma-9b<\/code><\/td><td>30.76<\/td><td>14.80<\/td><td>4.83<\/td><td>4.70<\/td><td>6.60<\/td><td>17.88<\/td><td>13.20<\/td><\/tr><tr><td><code>Zyphra\/Zamba-7B-v1<\/code><sup>*<\/sup><\/td><td>24.06<\/td><td>21.12<\/td><td>3.32<\/td><td>3.03<\/td><td>7.74<\/td><td>16.02<\/td><td>12.55<\/td><\/tr><tr><td><em><strong>Transformer models<\/strong><\/em><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td><code>Falcon2-11B<\/code><\/td><td>32.61<\/td><td>21.94<\/td><td>2.34<\/td><td>2.80<\/td><td>7.53<\/td><td>15.44<\/td><td>13.78<\/td><\/tr><tr><td><code>Meta-Llama-3-8B<\/code><\/td><td>14.55<\/td><td>24.50<\/td><td>3.25<\/td><td>7.38<\/td><td>6.24<\/td><td>24.55<\/td><td>13.41<\/td><\/tr><tr><td><code>Meta-Llama-3.1-8B<\/code><\/td><td>12.70<\/td><td>25.29<\/td><td>4.61<\/td><td>6.15<\/td><td>8.98<\/td><td>24.95<\/td><td>13.78<\/td><\/tr><tr><td><code>Mistral-7B-v0.1<\/code><\/td><td>23.86<\/td><td>22.02<\/td><td>2.49<\/td><td>5.59<\/td><td>10.68<\/td><td>22.36<\/td><td>14.50<\/td><\/tr><tr><td><code>Mistral-Nemo-Base-2407 
(12B)<\/code><\/td><td>16.83<\/td><td>29.37<\/td><td>4.98<\/td><td>5.82<\/td><td>6.52<\/td><td>27.46<\/td><td>15.08<\/td><\/tr><tr><td><code>gemma-7B<\/code><\/td><td>26.59<\/td><td>21.12<\/td><td>6.42<\/td><td>4.92<\/td><td>10.98<\/td><td>21.64<\/td><td><strong>15.28<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We also evaluate our model on the benchmarks of the first version of the LLM Leaderboard using <code>lighteval<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><code>model name<\/code><\/th><th><code>ARC<\/code><\/th><th><code>HellaSwag<\/code><\/th><th><code>MMLU<\/code><\/th><th><code>Winogrande<\/code><\/th><th><code>TruthfulQA<\/code><\/th><th><code>GSM8K<\/code><\/th><th><code>Average<\/code><\/th><\/tr><\/thead><tbody><tr><td><em><strong>Pure SSM models<\/strong><\/em><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td><code>Falcon Mamba-7B<\/code><sup>*<\/sup><\/td><td>62.03<\/td><td>80.82<\/td><td>62.11<\/td><td>73.64<\/td><td>53.42<\/td><td>52.54<\/td><td><strong>64.09<\/strong><\/td><\/tr><tr><td><code>TRI-ML\/mamba-7b-rw<\/code><sup>*<\/sup><\/td><td>51.25<\/td><td>80.85<\/td><td>33.41<\/td><td>71.11<\/td><td>32.08<\/td><td>4.70<\/td><td>45.52<\/td><\/tr><tr><td><em><strong>Hybrid SSM 
attention models<\/strong><\/em><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td><code>recurrentgemma-9b<\/code><sup>**<\/sup><\/td><td>52.00<\/td><td>80.40<\/td><td>60.50<\/td><td>73.60<\/td><td>38.60<\/td><td>42.60<\/td><td>57.95<\/td><\/tr><tr><td><code>Zyphra\/Zamba-7B-v1<\/code><sup>*<\/sup><\/td><td>56.14<\/td><td>82.23<\/td><td>58.11<\/td><td>79.87<\/td><td>52.88<\/td><td>30.78<\/td><td>60.00<\/td><\/tr><tr><td><em><strong>Transformer models<\/strong><\/em><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td><code>Falcon2-11B<\/code><\/td><td>59.73<\/td><td>82.91<\/td><td>58.37<\/td><td>78.30<\/td><td>52.56<\/td><td>53.83<\/td><td><strong>64.28<\/strong><\/td><\/tr><tr><td><code>Meta-Llama-3-8B<\/code><\/td><td>60.24<\/td><td>82.23<\/td><td>66.70<\/td><td>78.45<\/td><td>42.93<\/td><td>45.19<\/td><td>62.62<\/td><\/tr><tr><td><code>Meta-Llama-3.1-8B<\/code><\/td><td>58.53<\/td><td>82.13<\/td><td>66.43<\/td><td>74.35<\/td><td>44.29<\/td><td>47.92<\/td><td>62.28<\/td><\/tr><tr><td><code>Mistral-7B-v0.1<\/code><\/td><td>59.98<\/td><td>83.31<\/td><td>64.16<\/td><td>78.37<\/td><td>42.15<\/td><td>37.83<\/td><td>60.97<\/td><\/tr><tr><td><code>gemma-7B<\/code><\/td><td>61.09<\/td><td>82.20<\/td><td>64.56<\/td><td>79.01<\/td><td>44.79<\/td><td>50.87<\/td><td>63.75<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">For models marked with one <em>star<\/em>, we evaluated the tasks internally, while for models marked with two <em>stars<\/em>, the results were taken from the paper or model card.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/huggingface.co\/blog\/falconmamba#processing-large-sequences\"><\/a>Processing large sequences<\/h2>\n\n\n\n<p 
class=\"wp-block-paragraph\">Following the theoretical efficiency of SSM models in processing large sequences, we compare memory usage and generation throughput between Falcon Mamba and popular transformer models using the <a href=\"https:\/\/github.com\/huggingface\/optimum-benchmark\">optimum-benchmark<\/a> library. For a fair comparison, we rescaled the vocabulary size of all transformer models to match Falcon Mamba, since vocabulary size has a large effect on a model's memory requirements.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Before going to the results, let's first discuss the difference between the prompt (prefill) and generated (decode) parts of the sequence. As we will see, the details of prefill matter more for state space models than for transformer models. When a transformer generates the next token, it needs to attend to the keys and values of all previous tokens in the context. This implies that both memory requirements and generation time scale linearly with context length. A state space model attends to and stores only its recurrent state, and therefore needs no additional memory or time to generate long sequences. While this explains the claimed advantage of SSM 
models over transformers in the decode stage, the prefill stage requires additional effort to take full advantage of the SSM architecture.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The standard approach to prefill is to process the whole prompt in parallel to fully utilize the GPU. This approach is used in the <a href=\"https:\/\/github.com\/huggingface\/optimum-benchmark\">optimum-benchmark<\/a> library, and we will refer to it as parallel prefill. Parallel prefill needs to store in memory the hidden states of every token in the prompt. For transformers, this additional memory is dominated by the stored KV caches. For SSM models, no caching is required, and the memory for storing hidden states becomes the only component proportional to the prompt length. As a result, the memory requirement scales with prompt length, and SSM models lose the ability to process arbitrarily long sequences, just like transformers.<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">An alternative to parallel prefill is to process the prompt token by token, which we will refer to as <em>sequential prefill<\/em>. Akin to sequence parallelism, it can also be done on larger chunks of the prompt instead of individual tokens for better GPU utilization. While sequential prefill makes little sense for transformers, it restores the ability of SSM models to process arbitrarily long prompts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With these remarks in mind, we first test the largest sequence length that fits on a single 24 GB A10 GPU; the results are shown in the <a href=\"https:\/\/huggingface.co\/blog\/falconmamba#max-length\">figure<\/a> below. The batch size is fixed at 1, and we use float32 precision. Even with parallel prefill, Falcon Mamba can fit larger sequences than a transformer, while with sequential prefill it unlocks its full potential and can process arbitrarily long prompts.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"432\" src=\"https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-2.png\" alt=\"\" class=\"wp-image-4438\" srcset=\"https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-2.png 720w, https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-2-300x180.png 300w, 
https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-2-600x360.png 600w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/huggingface.co\/datasets\/tiiuae\/documentation-images\/resolve\/main\/falcon_mamba\/max_len_llalma3-1.png\"><img decoding=\"async\" src=\"https:\/\/huggingface.co\/datasets\/tiiuae\/documentation-images\/resolve\/main\/falcon_mamba\/max_len_llalma3-1.png\" alt=\"Model performance\"\/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next, we measure generation throughput with a prompt of length 1 and up to 130k generated tokens, using batch size 1 on an H100 GPU. The results are reported in the <a href=\"https:\/\/huggingface.co\/blog\/falconmamba#throughput\">figure<\/a> below. We observe that Falcon Mamba generates all tokens at constant throughput and without any increase in CUDA peak memory. For the transformer model, peak memory grows and generation slows down as the number of generated tokens increases.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"384\" src=\"https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-1.png\" alt=\"\" class=\"wp-image-4437\" srcset=\"https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-1.png 720w, https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-1-300x160.png 300w, https:\/\/www.aqwu.net\/wp\/wp-content\/uploads\/2024\/08\/\u56fe\u7247-1-600x320.png 600w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<figure 
class=\"wp-block-image\"><a href=\"https:\/\/huggingface.co\/datasets\/tiiuae\/documentation-images\/resolve\/main\/falcon_mamba\/thoughput-llama3-1.png\"><img decoding=\"async\" src=\"https:\/\/huggingface.co\/datasets\/tiiuae\/documentation-images\/resolve\/main\/falcon_mamba\/thoughput-llama3-1.png\" alt=\"Model performance\"\/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/huggingface.co\/blog\/falconmamba#how-to-use-it-within-hugging-face-transformers\"><\/a>How to use it within Hugging Face transformers?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The Falcon Mamba architecture will be available in the next release of the Hugging Face transformers library (&gt;4.45.0). To use the model, make sure to install the latest version of Hugging Face transformers or install the library from source.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Falcon Mamba is compatible with most of the Hugging Face APIs you are already familiar with, such as <code>AutoModelForCausalLM<\/code> or <code>pipeline<\/code>:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:python decode:true \" >from transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_id = \"tiiuae\/falcon-mamba-7b\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\n\n# Load the weights in their stored dtype and place them on the available device(s)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=\"auto\", device_map=\"auto\")\n# Move the tokenized prompt to GPU 0\ninputs = tokenizer(\"Hello world, today\", return_tensors=\"pt\").to(0)\n\n# Sample up to 100 new tokens and decode the result\noutput = model.generate(**inputs, max_new_tokens=100, do_sample=True)\nprint(tokenizer.decode(output[0], skip_special_tokens=True))\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Because the model is large, it also supports features such as <code>bitsandbytes<\/code> 
quantization to run the model under tighter GPU memory constraints, for example:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:python decode:true \" >from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\nmodel_id = \"tiiuae\/falcon-mamba-7b\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\n\n# Quantize the weights to 4 bits at load time\nquantization_config = BitsAndBytesConfig(load_in_4bit=True)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)\n\ninputs = tokenizer(\"Hello world, today\", return_tensors=\"pt\").to(0)\noutput = model.generate(**inputs, max_new_tokens=100, do_sample=True)\n\nprint(tokenizer.decode(output[0], skip_special_tokens=True))\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">We are also pleased to introduce the instruction-tuned version of Falcon Mamba, which has been fine-tuned with an additional 5 billion tokens of supervised fine-tuning (SFT) data. This extended training improves the model's ability to perform instructional tasks with better precision and effectiveness. You can try the instruct model through our demo, available <a href=\"https:\/\/huggingface.co\/spaces\/tiiuae\/falcon-mamba-playground\">here<\/a>. For the chat template, we use the following format:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:python decode:true \" >&lt;|im_start|&gt;user\nprompt&lt;|im_end|&gt;\n&lt;|im_start|&gt;assistant\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">You can also directly use the 4-bit converted versions of the <a 
href=\"https:\/\/huggingface.co\/tiiuae\/falcon-mamba-7b-4bit\">base model<\/a> and the <a href=\"https:\/\/huggingface.co\/tiiuae\/falcon-mamba-7b-instruct-4bit\">instruct model<\/a>. Make sure you have access to a GPU that is compatible with the <code>bitsandbytes<\/code> library to run the quantized model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can also benefit from faster inference using <code>torch.compile<\/code>; simply call <code>model = torch.compile(model)<\/code> once you have loaded the model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/huggingface.co\/blog\/falconmamba#acknowledgments\"><\/a>Acknowledgments<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The authors of this blog post would like to thank the Hugging Face team for their smooth support and integration within their ecosystem, in particular:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/huggingface.co\/alozowski\">Alina Lozovskaya<\/a> and <a href=\"https:\/\/huggingface.co\/clefourrier\">Clementine Fourrier<\/a> for helping us evaluate the model on the leaderboard<\/li>\n\n\n\n<li><a href=\"https:\/\/huggingface.co\/ArthurZ\">Arthur Zucker<\/a> for the transformers integration<\/li>\n\n\n\n<li><a href=\"https:\/\/huggingface.co\/reach-vb\">Vaibhav Srivastav<\/a>, <a href=\"https:\/\/huggingface.co\/hysts\">hysts<\/a> and <a href=\"https:\/\/huggingface.co\/osanseviero\">Omar Sanseviero<\/a> for their support with Hub-related questions<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The authors would also like to thank Tri Dao and Albert Gu for implementing the Mamba architecture and open-sourcing it to the community.<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">Original link: <a href=\"https:\/\/huggingface.co\/blog\/falconmamba\">Welcome Falcon Mamba: The first strong attention-free 7B model (huggingface.co)<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Falcon Mamba is a new model from the Technology Innovation Institute (TII) in Abu Dhabi, released under the TII Falcon [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[444,445,443,442],"tags":[534,404],"class_list":["post-4433","post","type-post","status-publish","format-standard","hentry","category-ai","category-ainews","category-llm","category-llms","tag-falcon-mamba-7b","tag-llm"],"views":1645,"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts\/4433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4433"}],"version-history":[{"count":3,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts\/4433\/revisions"}],"predecessor-version":[{"id":4439,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=\/wp\/v2\/posts\/4433\/revisions\/4439"}],"wp:attachment":[{"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4433"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aqwu.net\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}