Fine-Tuning Llama 2 Models with Amazon SageMaker


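This article walks through an example of fine-tuning a Llama 2 model with [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail). The example covers:

1. An overview of Llama 2
2. An introduction to Llama 2 fine-tuning
3. Environment setup for Llama 2
4. Fine-tuning Llama 2

### Preface

As interest in generative AI grows, foundation large language models (LLMs) are being released in quick succession around the world, and a wide variety of applications are being built on top of them. A well-trained foundation LLM generalizes well, but it may never have seen the specialized terminology, phrasing, or context of a particular domain. Fine-tuning on domain-specific data helps the model adapt to the language patterns and structures of the target domain. Combining the generality of the foundation model with domain specificity makes the model more useful in real applications.

### Llama 2 Overview

Llama 2 is Meta's latest open-source LLM, available in 7B, 13B, and 70B versions. It was trained on 40% more data than Llama 1, reaching 2 trillion tokens, and its context length was increased to 4K, which allows considerably more dialogue turns and longer prompt inputs. In addition, the Llama 2 Chat models use Reinforcement Learning from Human Feedback (RLHF) and are heavily optimized for dialogue, achieving strong results on helpfulness and safety benchmarks. Inference frameworks such as Hugging Face TGI and vLLM include optimizations for Llama 2, which further improves its usability.

Llama 2 is widely regarded as a first choice among open-source LLMs. Many vertical-domain models use it as their foundation model and then continue pre-training or fine-tune it on industry data to cover more industry scenarios.

### Llama 2 Fine-Tuning

Fine-tuning approaches fall into two main groups: full fine-tuning and PEFT (Parameter-Efficient Fine-Tuning). Full fine-tuning updates all model parameters, so it takes longer to train and needs more resources. PEFT freezes most parameters and trains only a small set of parameters or added network modules; common methods are LoRA and P-Tuning v2.

Because PEFT updates relatively few parameters, the model may not absorb all of the domain knowledge, and inference can be unstable for specific tasks or domains. For this reason, many production systems fine-tune all parameters, and this article likewise uses full fine-tuning to show how to fine-tune Llama 2 on [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail).

### Llama 2 Environment Setup

Note: the sample code for this project is stored in the following repository: https://github.com/aws-samples/llm-workshop-on-amazon-sagemaker?trk=cndc-detail

1. Upgrade the SageMaker Python SDK

```bash
pip install -U sagemaker
```

2. Get the runtime resources, including the Region, role, account, and S3 bucket

```python
import boto3
import sagemaker
from sagemaker import get_execution_role

sess = sagemaker.Session()
role = get_execution_role()
sagemaker_default_bucket = sess.default_bucket()

account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
```

### Fine-Tuning Llama 2

#### Preparation

**Clone the code**

- This example fine-tunes Llama 2 with FastChat, the platform released by the lm-sys team. FastChat was also used to train the well-known Vicuna model, and it offers clean code conventions and performance optimizations.

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
git reset --hard 974537efbd82093b45e64d07904efe7728193a52
```

**Download the original Llama 2 model**

```python
import os
from pathlib import Path

from huggingface_hub import snapshot_download

local_cache_path = Path("./model")
local_cache_path.mkdir(exist_ok=True)

model_name = "TheBloke/Llama-2-13B-fp16"

# Only download pytorch checkpoint files
allow_patterns = ["*.json", "*.pt", "*.bin", "*.model", "*.py"]

model_download_path = snapshot_download(
    repo_id=model_name,
    cache_dir=local_cache_path,
    allow_patterns=allow_patterns,
    revision='b2e65e8ad4bb35e5abaee0170ebd5fc2134a50bb'
)

# Get the model files path: locate config.json inside the cache directory
local_model_path = None

paths = os.walk(r'./model')
for root, dirs, files in paths:
    for file in files:
        if file == 'config.json':
            print(os.path.join(root, file))
            # Strip the trailing "config.json" (11 characters), keeping the trailing slash
            local_model_path = str(os.path.join(root, file))[0:-11]
            print(local_model_path)
if local_model_path is None:
    print("Model download may have failed, please check the prior step!")
```

**Copy the model and data to [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail)**

```bash
# local_model_path and sagemaker_default_bucket are the values obtained in the earlier steps
chmod +x ./s5cmd
./s5cmd sync ${local_model_path} s3://${sagemaker_default_bucket}/llm/models/llama2/TheBloke/Llama-2-13B-fp16/

rm -rf model
```

#### Model Fine-Tuning

- Fine-tuning updates all model parameters so that the fine-tuned model behaves stably.
- The open-source DeepSpeed framework is used to accelerate training.

**Prepare the base image**

Use the deep learning training image customized for [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) as the base image, then install the dependencies that Llama 2 training requires. The Dockerfile is as follows:

```dockerfile
%%writefile Dockerfile
## Change the region code below to the region you are using; this sample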
## uses us-west-2.
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04

ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

RUN pip3 uninstall -y deepspeed \
    && pip3 install deepspeed==0.10.0 \
    && pip3 install transformers==4.30.2

## Make all local GPUs visible
ENV NVIDIA_VISIBLE_DEVICES="all"
```

**Fine-tuning code**

The fine-tuning source code is lengthy; see the Git repository above for details.

**Fine-tuning parameters**

- DeepSpeed ZeRO Stage 3 is used to save GPU memory.
- bf16 is enabled during training to balance numerical range and precision.
- The training dataset is the dummy_conversation.json file that ships with FastChat. Each record holds an `id` and a `conversations` list whose turns have `from` and `value` fields, so multi-turn dialogues are supported.

```bash
DEEPSPEED_OPTS="""
  FastChat/fastchat/train/train_mem.py
  --deepspeed ds.json
  --model_name_or_path "/tmp/llama_pretrain/"
  --data_path FastChat/data/dummy_conversation.json
  --output_dir "/tmp/llama_out"
  --num_train_epochs 1
  --per_device_train_batch_size 1
  --per_device_eval_batch_size 1
  --gradient_accumulation_steps 4
  --evaluation_strategy "no"
  --save_strategy "no"
  --save_steps 2000
  --save_total_limit 1
  --learning_rate 2e-5
  --weight_decay 0.
  --warmup_ratio 0.03
  --lr_scheduler_type "cosine"
  --logging_steps 1
  --cache_dir '/tmp'
  --model_max_length 2048
  --gradient_checkpointing True
  --lazy_preprocess True
  --bf16 True
  --tf32 True
  --report_to "none"
"""
```

**Fine-tuning script**

- Fine-tuning uses torchrun + DeepSpeed for distributed training.

```bash
%%writefile ./src/ds-train-dist.sh
#!/bin/bash

CURRENT_HOST="${SM_CURRENT_HOST}"

# Split the SageMaker host list and find this node's index in it
IFS=',' read -ra hosts_array <<< "${SM_HOSTS}"
NNODES=${#hosts_array[@]}
NODE_RANK=0

for i in "${!hosts_array[@]}"; do
    if [[ "${hosts_array[$i]}" == *${CURRENT_HOST}* ]]; then
        echo "host index: $i"
        NODE_RANK="$i"
    fi
done

MASTER_PORT="13579"
export NCCL_SOCKET_IFNAME="eth0"

#Configure the distributed arguments for torch.distributed.launch.
# Note: MASTER_ADDR is not set anywhere in this excerpt and must be defined before this point;
# DEEPSPEED_OPTS is the argument string shown under the fine-tuning parameters.
GPUS_PER_NODE="$SM_NUM_GPUS"
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE \
                  --nnodes $NNODES \
                  --node_rank $NODE_RANK \
                  --master_addr $MASTER_ADDR \
                  --master_port $MASTER_PORT"

# Pull the pretrained model from S3 to local disk
chmod +x ./s5cmd
./s5cmd sync s3://$MODEL_S3_BUCKET/llm/models/llama2/TheBloke/Llama-2-13B-fp16/* /tmp/llama_pretrain/

CMD="torchrun ${DISTRIBUTED_ARGS} ${DEEPSPEED_OPTS}"
echo ${CMD}
${CMD} 2>&1

# Only the master node uploads the fine-tuned model to S3
if [[ "${CURRENT_HOST}" == "${MASTER_ADDR}" ]]; then
    ./s5cmd sync /tmp/llama_out s3://$MODEL_S3_BUCKET/llm/models/llama2/output/TheBloke/Llama-2-13B-fp16/$(date +%Y-%m-%d-%H-%M-%S)/
fi
```

**Launch fine-tuning**

- Full fine-tuning requires at least one ml.p4d.24xlarge instance (8 × A100 40 GB GPUs) as the training machine.
- When fine-tuning finishes, the trained model is stored automatically in the specified S3 bucket and can be used for subsequent deployment and inference.

```python
import time
from sagemaker.estimator import Estimator

environment = {
    'MODEL_S3_BUCKET': sagemaker_default_bucket  # The bucket to store pretrained model and fine-tune model
}

base_job_name = 'llama2-13b-finetune'
instance_type = 'ml.p4d.24xlarge'

# image_uri is not defined in this excerpt: it must be set to the URI of the custom
# training image built from the Dockerfile above and pushed to Amazon ECR.
estimator = Estimator(role=role,
                      entry_point='ds-train-dist.sh',
                      source_dir='./src',
                      base_job_name=base_job_name,
                      instance_count=1,
                      instance_type=instance_type,
                      image_uri=image_uri,
                      environment=environment,
                      disable_profiler=True,
                      debugger_hook_config=False)

estimator.fit()
```

### Summary

Large language models are still on the rise and are changing and influencing the world in many ways. As customers embrace LLMs, the Amazon Web Services team is likewise investing in understanding customer needs and LLM technology, so that it can better help customers meet their requirements and increase business value.
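
A supplementary note on the fine-tuning parameters: the training command passes `--deepspeed ds.json`, but the contents of that file are not shown in this article. The following is a minimal sketch of what a ZeRO Stage 3 configuration with bf16 might look like; it is not the file from the repository. The function names `build_zero3_config` and `write_config` are hypothetical, and the option values are assumptions chosen to match the settings described above (Stage 3, bf16). The `"auto"` entries rely on the Hugging Face Trainer's DeepSpeed integration to fill in values from the command-line arguments.

```python
import json
from pathlib import Path


def build_zero3_config() -> dict:
    """Return an assumed ZeRO Stage 3 + bf16 DeepSpeed config as a plain dict.

    Values set to "auto" are resolved by the Hugging Face Trainer from its own
    arguments (for example --bf16, --per_device_train_batch_size and
    --gradient_accumulation_steps).
    """
    return {
        # Mixed precision: follows the --bf16 training argument
        "bf16": {"enabled": "auto"},
        # ZeRO Stage 3 partitions parameters, gradients and optimizer states
        # across GPUs to reduce per-GPU memory usage
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather the full 16-bit weights when saving, so the checkpoint
            # can be loaded without DeepSpeed
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


def write_config(path: str = "ds.json") -> Path:
    """Serialize the config to a JSON file and return its path."""
    target = Path(path)
    target.write_text(json.dumps(build_zero3_config(), indent=2))
    return target


if __name__ == "__main__":
    # Print the JSON so it can be inspected before saving it as ds.json
    print(json.dumps(build_zero3_config(), indent=2))
```

Calling `write_config("ds.json")` in the directory that is passed to the training job would produce the file that the `--deepspeed` argument points to. Keeping batch-size and precision fields on `"auto"` avoids having the same value defined in two places with the risk of a mismatch.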