Understanding the LLaMA-2 Model Structure (1)

0. Introduction

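Llama 2 (sometimes written LLaMA-2) is a large language model released by Meta Platforms (formerly Facebook) for natural language processing (NLP) tasks. It belongs to the LLaMA (Large Language Model Meta AI) family, which aims to provide efficient, scalable models for a wide range of NLP and text-generation tasks. Meta released Llama 2 in 2023 together with a technical report describing its architecture, and the exact layer structure can also be inspected directly by loading the model, as this article does below.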

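In general, large language models such as the LLaMA family and similar models (e.g. GPT-3, GPT-4, BERT) are based on the Transformer architecture. Introduced by Vaswani et al. in 2017, the Transformer is built around the self-attention mechanism, which lets the model dynamically weigh how important each part of the input is while processing it.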
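To make self-attention concrete, here is a minimal single-head sketch in PyTorch of causal scaled dot-product attention, the variant used by decoder-only models such as Llama. The function name and the random projection matrices are illustrative assumptions; real models use many heads and learned weights, and the LlamaSdpaAttention module printed later delegates this computation to PyTorch's fused torch.nn.functional.scaled_dot_product_attention.

```python
import math
import torch

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention (illustrative sketch).

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns the attended values and the attention-weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every query with every key, scaled by sqrt(d_head)
    scores = q @ k.T / math.sqrt(q.shape[-1])
    # Mask out future positions so token i only attends to tokens 0..i
    seq_len = x.shape[0]
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    # Each row sums to 1: how strongly token i attends to each visible token
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights
```

Because of the causal mask, the first token can only attend to itself, so its output is exactly its own value vector; later tokens mix the values of all earlier tokens according to the weights.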

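A Transformer model typically contains the following main components:

Self-attention layer: lets the model process every element of the input sequence while dynamically adjusting how much attention it pays to each element based on the content of the others.
Feed-forward network (FFN): after the self-attention layer, the output at each position passes through a feed-forward network, a simple neural network that processes the data further.
Layer normalization and residual connections: to make training more stable and efficient, layer normalization and residual connections are added around the self-attention layer and the feed-forward network.
Encoder/decoder structure: some models (such as the GPT series) use only a decoder, others (such as BERT) use only an encoder, and some (such as the original Transformer) use both.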
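The components above fit together as sketched below: one generic decoder-only block in PyTorch. This is a minimal sketch under stated assumptions, not Llama's code: it uses the pre-norm arrangement (normalize, run the sub-layer, add the residual) common in decoder-only models, whereas the original Transformer normalized after the residual addition, and nn.LayerNorm, nn.MultiheadAttention and GELU are stand-ins chosen for illustration. The class name DecoderBlock and its default sizes are hypothetical.

```python
import torch
from torch import nn

class DecoderBlock(nn.Module):
    """Hypothetical generic decoder-only Transformer block (pre-norm variant)."""

    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        # Position-wise feed-forward network: expand, apply a nonlinearity, project back
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        seq_len = x.shape[1]
        # True above the diagonal means "may not attend", which blocks future tokens
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                  # residual connection around attention
        x = x + self.ffn(self.norm2(x))   # residual connection around the FFN
        return x
```

Stacking N such blocks between a token embedding and an output projection gives a decoder-only language model; changing the last input token leaves the outputs at earlier positions unchanged, which is what the causal mask guarantees.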

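Llama 2 is a decoder-only model built from these components, with several refinements over the original Transformer: RMSNorm applied before each sub-layer (pre-normalization), a gated SwiGLU feed-forward network, and rotary position embeddings (RoPE). The larger variants (34B and 70B) additionally use grouped-query attention (GQA).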

For the complete architectural details, see the technical report and documentation Meta has published for Llama 2.

1. Environment Setup

conda create -y -n llama2 python=3.10
conda activate llama2

git clone https://github.com/facebookresearch/llama

cd llama
mkdir newsrc
mkdir -p meta-llama/Llama-2-7b-chat-hf


2. Download the Model

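Download the files from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf into the meta-llama/Llama-2-7b-chat-hf directory created above. The repository is gated: you must accept Meta's license on Hugging Face before the files can be downloaded.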

Contents of the model directory:

ls -l meta-llama/Llama-2-7b-chat-hf/
total 13163396
-rwxrwxrwx 1 tony tony        614 Mar 10 23:51 config.json
-rwxrwxrwx 1 tony tony        188 Mar 10 23:51 generation_config.json
-rwxrwxrwx 1 tony tony       1519 Mar 10 23:51 gitattributes
-rwxrwxrwx 1 tony tony 9976576152 Mar 11 00:58 model-00001-of-00002.safetensors
-rwxrwxrwx 1 tony tony 3500296424 Mar 11 00:17 model-00002-of-00002.safetensors
-rwxrwxrwx 1 tony tony      26788 Mar 10 23:51 model.safetensors.index.json
-rwxrwxrwx 1 tony tony      26788 Mar 10 23:51 pytorch_model.bin.index.json
-rwxrwxrwx 1 tony tony        414 Mar 10 23:51 special_tokens_map.json
-rwxrwxrwx 1 tony tony    1842767 Mar 10 23:51 tokenizer.json
-rwxrwxrwx 1 tony tony     499723 Mar 10 23:51 tokenizer.model
-rwxrwxrwx 1 tony tony       1618 Mar 10 23:51 tokenizer_config.json
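A partial download is easy to miss with multi-gigabyte shards. The sketch below checks a local model directory, assuming the standard sharded-safetensors layout shown in the listing: model.safetensors.index.json contains a "weight_map" that maps every tensor name to the shard file storing it. The helper name and the REQUIRED_FILES list are hypothetical choices for illustration.

```python
import json
from pathlib import Path

# Hypothetical minimum set of non-weight files needed to load a sharded model
REQUIRED_FILES = ["config.json", "model.safetensors.index.json"]

def missing_files(model_dir):
    """Return required files and weight shards that are absent from model_dir."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    index_path = model_dir / "model.safetensors.index.json"
    if index_path.is_file():
        # weight_map: tensor name -> shard filename; collect each shard once
        weight_map = json.loads(index_path.read_text())["weight_map"]
        for shard in sorted(set(weight_map.values())):
            if not (model_dir / shard).is_file():
                missing.append(shard)
    return missing
```

For example, run missing_files("meta-llama/Llama-2-7b-chat-hf") from the llama directory; an empty list means every file and shard referenced by the index is present (it does not verify file contents).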


3. Loading the Model

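To run the code, make sure the transformers library is installed, together with PyTorch, which transformers uses to load this model. If they are not installed yet, install them with:

pip install torch transformers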
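A quick way to confirm the packages are available in the active conda environment is to query their installed versions with Python's standard importlib.metadata module. This is a minimal sketch; the package list and the helper name are assumptions based on what this article uses.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

if __name__ == "__main__":
    for name, ver in installed_versions(["torch", "transformers", "safetensors"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```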


The following code loads Llama-2-7b-chat-hf. Save it as test01.py in the newsrc directory:

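from transformers import AutoModelForCausalLM

# Path to the model; this relative path resolves to the local directory
# created earlier, because the script is run from the llama/ directory
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model
hf_model = AutoModelForCausalLM.from_pretrained(model_path)

# Print the model structure
print(hf_model)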


Once the model files are in place, run test01.py:

python newsrc/test01.py
Loading checkpoint shards: 100%|███████| 2/2 [02:32<00:00, 76.49s/it]
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
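Every shape in this printout follows from a handful of hyperparameters in config.json. The sketch below reproduces them, assuming the standard LlamaConfig field names (hidden_size, intermediate_size, vocab_size, num_attention_heads, num_hidden_layers, and optionally num_key_value_heads); the helper names are hypothetical. Linear layers are reported as (in_features, out_features) and the embedding as (num_embeddings, embedding_dim), matching how PyTorch prints them.

```python
import json
from pathlib import Path

def load_config(model_dir):
    """Read config.json from a local Hugging Face model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text())

def expected_shapes(cfg):
    """Predict the layer shapes printed by print(hf_model) from config values."""
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    vocab = cfg["vocab_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    # Without grouped-query attention, num_key_value_heads equals num_attention_heads,
    # so k_proj and v_proj are as wide as q_proj
    kv_dim = cfg.get("num_key_value_heads", cfg["num_attention_heads"]) * head_dim
    return {
        "num_layers": cfg["num_hidden_layers"],
        "embed_tokens": (vocab, hidden),
        "q_proj": (hidden, hidden),
        "k_proj": (hidden, kv_dim),
        "v_proj": (hidden, kv_dim),
        "o_proj": (hidden, hidden),
        "gate_proj": (hidden, inter),
        "up_proj": (hidden, inter),
        "down_proj": (inter, hidden),
        "lm_head": (hidden, vocab),
    }
```

For example, expected_shapes(load_config("meta-llama/Llama-2-7b-chat-hf")) run from the llama directory can be compared against the printout. For variants that use grouped-query attention, k_proj and v_proj come out narrower than q_proj.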


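The output above shows the structure of the Llama 2 model in detail. The model was loaded with the transformers library from meta-llama/Llama-2-7b-chat-hf, the 7-billion-parameter Llama 2 variant fine-tuned for chat. The key points of the structure are: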

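Embedding layer (Embedding): converts input tokens into fixed-size vectors of dimension 4096; the vocabulary has 32000 tokens.
Decoder layers (LlamaDecoderLayer): 32 decoder layers, each with the following components:
- Self-attention (LlamaSdpaAttention): four linear projections (q_proj, k_proj, v_proj, o_proj), each 4096 → 4096, plus rotary position embeddings (rotary_emb).
- MLP (LlamaMLP): a gate projection (gate_proj) and an up projection (up_proj), both 4096 → 11008, a down projection (down_proj) back to 4096, and the activation function (act_fn), here SiLU (Sigmoid Linear Unit). The SiLU-activated gate output is multiplied element-wise with the up projection before down_proj, which makes this a SwiGLU feed-forward network.
- Normalization (LlamaRMSNorm): RMS normalization is applied to the input of the self-attention block (input_layernorm) and to the input of the MLP (post_attention_layernorm), which helps stabilize training.
Output head (Linear): after a final LlamaRMSNorm (norm), a linear layer (lm_head) projects the decoder output back into vocabulary space to produce the output tokens; its output size is 32000, the same as the vocabulary size.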
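The two Llama-specific building blocks above can be sketched in a few lines of PyTorch. This is a simplified illustration that mirrors the structure of LlamaRMSNorm and LlamaMLP, not the library's code: the Hugging Face versions also handle details such as computing the norm in float32, and the eps default here is illustrative (the real value comes from rms_norm_eps in config.json). The class names RMSNorm and GatedMLP are hypothetical.

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learnable scale (no mean subtraction, no bias)."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # the layer's only parameters
        self.eps = eps

    def forward(self, x):
        # Divide each vector by its RMS over the last dimension, then rescale
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)

class GatedMLP(nn.Module):
    """SwiGLU-style feed-forward: down(silu(gate(x)) * up(x)), without biases."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)
        self.act_fn = nn.SiLU()

    def forward(self, x):
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
```

With dim=4096 and hidden=11008 these modules have the same parameter shapes as the printed LlamaRMSNorm and LlamaMLP; the small sizes in the test below keep it fast.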

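This structure shows how Llama 2 processes natural language: a deep stack of 32 identical decoder layers, each combining self-attention (with rotary position embeddings) and a gated MLP, which together let the model capture and generate complex language patterns. The model also uses RMSNorm for normalization and the SiLU activation in its MLP, choices that support training stability and model quality.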

4. How Is the Model's Parameter Size Calculated?

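Calculating a model's parameter size means counting all of its trainable parameters. In deep learning models, and especially in large Transformer models like Llama 2, these parameters are concentrated in a few key parts: the embedding layers, the self-attention layers, the feed-forward networks, and possibly other specialized layers or components.

The parameter count of each part can be computed as follows: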

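Embedding Layers:
- For the word-embedding layer, the parameter count is the vocabulary size times the embedding dimension. For example, with a vocabulary of 32,000 and an embedding dimension of 4,096, the count is 32,000 × 4,096.
Self-Attention Layers:
- A self-attention layer contains several linear projections (e.g. q_proj, k_proj, v_proj, o_proj). Without bias terms, each projection has in_features × out_features parameters. Since every decoder layer contains one self-attention block, the total is multiplied by the number of layers.
Feed-Forward Networks:
- A feed-forward network contains at least two linear layers, and its parameter count is the sum of in_features × out_features over those layers. For example, if the first layer maps 4,096 to 11,008 and the second maps back to 4,096, the count is 4,096 × 11,008 + 11,008 × 4,096. Llama 2's LlamaMLP has three such matrices (gate_proj, up_proj, down_proj), so each of its MLPs has 3 × 4,096 × 11,008 parameters.
Other Layers and Components:
- Any remaining components have comparatively few parameters: each LlamaRMSNorm holds a single weight vector of size 4,096, and activation functions such as SiLU have no parameters at all.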

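Combining these, we can compute the approximate size of the whole model: for a model with 32 decoder layers, compute the per-layer count with the formulas above, multiply by the number of layers, and add the embedding layer, the output head and any remaining components. For complex models like Llama 2, the total typically ranges from hundreds of millions to billions of parameters.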

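As a concrete example, consider a simplified calculation that counts only the embedding layer and the self-attention projections:

Embedding layer: 32,000 × 4,096 = 131,072,000 (word embeddings)
Self-attention (assuming every decoder layer has the same parameter count):
- Linear projections only (q, k, v, o): 4 × (4,096 × 4,096) = 67,108,864 (per layer)
- Across 32 layers: 67,108,864 × 32 = 2,147,483,648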
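Extending this rough estimate to every component gives the exact total. The sketch below assumes what the printed structure shows: no bias terms, three MLP projections, two RMSNorm weight vectors per layer plus a final one, and an lm_head that is separate from (not tied to) embed_tokens; rotary_emb contributes no trainable parameters. The function name and its kv_dim argument are hypothetical.

```python
def llama_param_count(vocab_size, hidden, intermediate, layers, kv_dim=None):
    """Exact parameter count for a Llama-style decoder with untied embeddings and no biases."""
    kv_dim = hidden if kv_dim is None else kv_dim   # kv_dim < hidden only with GQA
    embed = vocab_size * hidden                      # embed_tokens
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * hidden * intermediate                  # gate_proj, up_proj, down_proj
    norms = 2 * hidden                               # input_ and post_attention_layernorm
    per_layer = attn + mlp + norms
    final_norm = hidden                              # model.norm
    lm_head = hidden * vocab_size                    # separate output projection
    return embed + layers * per_layer + final_norm + lm_head

if __name__ == "__main__":
    print(f"{llama_param_count(32000, 4096, 11008, 32):,}")
```

For the 7B chat model this gives 6,738,415,616, the same total that test02.py reports below. The MLP (about 4.33 billion) accounts for roughly twice as many parameters as attention (about 2.15 billion).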

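This is only a very rough estimate; an exact count has to include every layer and component in detail. If you need a precise number, the most direct way is to let the framework (e.g. PyTorch or TensorFlow) count the parameters for you. In PyTorch, sum(p.numel() for p in model.parameters()) returns the total number of parameters in a model.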

Save the following program as test02.py in the newsrc directory:

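from transformers import AutoModelForCausalLM

# Path to the local model directory
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model (in the run shown below, the weights were loaded as float32)
hf_model = AutoModelForCausalLM.from_pretrained(model_path)

# Count the parameters
total_params = sum(p.numel() for p in hf_model.parameters())

# Model size in bytes, using each tensor's actual element size (4 bytes for float32)
model_size_bytes = sum(p.numel() * p.element_size() for p in hf_model.parameters())

# Convert to a more readable unit: MB (here 1 MB = 1024 ** 2 bytes)
model_size_mb = model_size_bytes / (1024 ** 2)

print(f"Total Parameters: {total_params}")
print(f"Model Size: {model_size_mb:.2f} MB")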


Run test02.py:

python newsrc/test02.py
Loading checkpoint shards: 100%|██████████████████| 2/2 [01:35<00:00, 47.99s/it]
Total Parameters: 6738415616
Model Size: 25705.02 MB


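The script reports a total of 6,738,415,616 parameters, i.e. about 6.74 billion, and a model size of 25,705.02 MB (computed with 1 MB = 1024² bytes), which is about 25.1 GiB, or about 27.0 GB in decimal units.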
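To see which components these parameters come from, a small helper can group the counts by module name. This sketch assumes Hugging Face's parameter naming (e.g. model.layers.0.self_attn.q_proj.weight), so the component is the name segment just before .weight; params_by_component is a hypothetical helper, not part of transformers.

```python
from collections import Counter

def params_by_component(model):
    """Sum parameter counts per module name, aggregated across all layers."""
    counts = Counter()
    for name, p in model.named_parameters():
        # "model.layers.0.self_attn.q_proj.weight" -> "q_proj"
        parts = name.split(".")
        component = parts[-2] if len(parts) >= 2 else parts[0]
        counts[component] += p.numel()
    return dict(counts)
```

Calling print(params_by_component(hf_model)) at the end of test02.py should, by the formulas above, show 536,870,912 for each of q_proj, k_proj, v_proj and o_proj (32 × 4,096²), 1,442,840,576 for each MLP projection, and 131,072,000 for both embed_tokens and lm_head.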

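The size of a model (in MB or GB) follows from the storage its parameters need. Each parameter is stored as a floating-point number, commonly 32-bit (float32, 4 bytes per parameter) or, for large language models, often 16-bit (float16 or bfloat16, 2 bytes per parameter). Assuming float32 storage, as in test02.py, we can compute the theoretical size of the model.

Using the reported parameter count: 6,738,415,616 parameters × 4 bytes = 26,953,662,464 bytes, and 26,953,662,464 / 1024² ≈ 25,705.02 MB.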

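This matches the 25,705.02 MB (about 25.1 GiB) reported by the script. A model of this size is difficult to run in resource-constrained environments, especially on devices with limited memory: in float32, the weights alone need more than 25 GiB of RAM or GPU memory.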
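The same arithmetic applies to other storage formats, which is why half precision or quantization is common for running such models. The sketch below computes weight memory only (activations and the attention key/value cache come on top); the format table and helper name are illustrative assumptions. As a cross-check, the two safetensors shards in the directory listing total 13,476,872,576 bytes, just slightly more than 6,738,415,616 × 2 = 13,476,831,232 bytes, which is consistent with the weights being stored on disk in a 16-bit format.

```python
# Bytes per parameter for common storage formats (int4 packs two values per byte)
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1, "int4": 0.5}

def weight_memory_bytes(num_params):
    """Memory needed for the weights alone, per storage format."""
    return {fmt: num_params * size for fmt, size in BYTES_PER_PARAM.items()}

if __name__ == "__main__":
    for fmt, nbytes in weight_memory_bytes(6_738_415_616).items():
        print(f"{fmt:>16}: {nbytes / 1024**2:12,.2f} MB  ({nbytes / 1024**3:6.2f} GiB)")
```

The float32 row reproduces the 25,705.02 MB reported by test02.py; each halving of the bytes per parameter halves the weight memory.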
