{"value":"There have been many recent advancements in the NLP domain. Pre-trained models and fully managed NLP services have democratised access and adoption of NLP. [Amazon Comprehend](https://aws.amazon.com/comprehend/) is a fully managed service that can perform NLP tasks like custom entity recognition, topic modelling, sentiment analysis and more to extract insights from data without the need of any prior ML experience.\n\nLast year, AWS announced a [partnership ](https://aws.amazon.com/blogs/machine-learning/aws-and-hugging-face-collaborate-to-simplify-and-accelerate-adoption-of-natural-language-processing-models/)with [Hugging Face](https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face) to help bring natural language processing (NLP) models to production faster. Hugging Face is an open-source AI community, focused on NLP. Their Python-based library ([Transformers](https://github.com/huggingface/transformers)) provides tools to easily use popular state-of-the-art Transformer architectures like BERT, RoBERTa, and GPT. You can apply these models to a variety of NLP tasks, such as text classification, information extraction, and question answering, among [others](https://huggingface.co/transformers/task_summary.html).\n\n[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process, making it easier to develop high-quality models. 
The SageMaker Python SDK provides open-source APIs and containers to train and deploy models on SageMaker, using several different ML and deep learning frameworks.\n\nThe Hugging Face integration with SageMaker allows you to build Hugging Face models at scale on your own domain-specific use cases.\n\nIn this post, we walk you through an example of how to build and deploy a custom Hugging Face text summarizer on SageMaker. We use Pegasus [1] for this purpose, the first Transformer-based model specifically pre-trained on an objective tailored for abstractive text summarization. BERT is pre-trained by masking random words in a sentence and predicting them; in contrast, during Pegasus’s pre-training, whole sentences are masked from an input document. The model then generates the missing sentences as a single output sequence using all the unmasked sentences as context, creating an executive summary of the document as a result.\n\nThanks to the flexibility of the Hugging Face library, you can easily adapt the code shown in this post for other types of transformer models, such as T5, BART, and more.\n\n### **Load your own dataset to fine-tune a Hugging Face model**\n\nTo load a custom dataset from a CSV file, we use the ```load_dataset``` method from the Hugging Face Datasets package. We can apply tokenization to the loaded dataset using the ```datasets.Dataset.map``` function. The ```map``` function iterates over the loaded dataset and applies the tokenize function to each batch of examples. The tokenized dataset can then be passed to the trainer for fine-tuning the model. 
See the following code:\n\n```\n# Python\nimport os\n\nfrom datasets import load_dataset\n\n# tokenizer and args (the parsed script arguments) are defined elsewhere in the training script\ndef tokenize(batch):\n    tokenized_input = tokenizer(batch[args.input_column], padding='max_length', truncation=True, max_length=args.max_source)\n    tokenized_target = tokenizer(batch[args.target_column], padding='max_length', truncation=True, max_length=args.max_target)\n    # The trainer expects the target token IDs under the 'labels' key\n    tokenized_input['labels'] = tokenized_target['input_ids']\n\n    return tokenized_input\n\n\ndef load_and_tokenize_dataset(data_dir):\n    # If data_dir holds several files, only the last one listed is kept\n    for file in os.listdir(data_dir):\n        dataset = load_dataset(\"csv\", data_files=os.path.join(data_dir, file), split='train')\n    tokenized_dataset = dataset.map(lambda batch: tokenize(batch), batched=True, batch_size=512)\n    tokenized_dataset.set_format('numpy', columns=['input_ids', 'attention_mask', 'labels'])\n\n    return tokenized_dataset\n```\n\n### Build your training script for the Hugging Face SageMaker estimator\n\nAs explained in the post [AWS and Hugging Face collaborate to simplify and accelerate adoption of Natural Language Processing models](https://aws.amazon.com/blogs/machine-learning/aws-and-hugging-face-collaborate-to-simplify-and-accelerate-adoption-of-natural-language-processing-models/), training a Hugging Face model on SageMaker has never been easier. We can do so by using the Hugging Face estimator from the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html?highlight=Hugging%20Face).\n\nThe following code snippet fine-tunes Pegasus on our dataset. You can also find many [sample notebooks](https://github.com/huggingface/notebooks/tree/master/sagemaker) that guide you through fine-tuning different types of models, available directly in the Hugging Face notebooks GitHub repository. 
To enable distributed training, we can use the [Data Parallelism Library](https://aws.amazon.com/blogs/aws/managed-data-parallelism-in-amazon-sagemaker-simplifies-training-on-large-datasets/) in SageMaker, which has been built into the Hugging Face Trainer API. To enable data parallelism, we need to define the ```distribution``` parameter in our Hugging Face estimator.\n\n```\n# Python\nfrom sagemaker.huggingface import HuggingFace\n\n# configuration for running training on smdistributed Data Parallel\ndistribution = {'smdistributed': {'dataparallel': {'enabled': True}}}\n\n# output_path, role, and the input paths are defined earlier in the notebook\nhuggingface_estimator = HuggingFace(\n    entry_point='train.py',\n    source_dir='code',\n    base_job_name='huggingface-pegasus',\n    instance_type='ml.p3.16xlarge',\n    instance_count=1,\n    transformers_version='4.6',\n    pytorch_version='1.7',\n    py_version='py36',\n    output_path=output_path,\n    role=role,\n    hyperparameters={\n        'model_name': 'google/pegasus-xsum',\n        'epoch': 10,\n        'per_device_train_batch_size': 2\n    },\n    distribution=distribution\n)\nhuggingface_estimator.fit({'train': training_input_path, 'validation': validation_input_path, 'test': test_input_path})\n```\n\nThe maximum training batch size you can configure depends on the model size and the GPU memory of the instance used. If SageMaker distributed training is enabled, the total batch size is the sum of the per-device batch sizes across all devices (GPUs). If we use an ml.p3.16xlarge instance (8 GPUs) with distributed training instead of an ml.p3.2xlarge instance (1 GPU), we have eight times as much GPU memory available. The batch size per device remains the same, but eight devices are training in parallel.\n\nAs usual with SageMaker, we create a ```train.py``` script to use with Script Mode and pass hyperparameters for training. 
The following code snippet for Pegasus loads the model and trains it using the Transformers ```Seq2SeqTrainer``` class:\n\n```\n# Python\nimport os\n\nfrom transformers import (\n    AutoModelForSeq2SeqLM,\n    AutoTokenizer,\n    Seq2SeqTrainer,\n    Seq2SeqTrainingArguments\n)\n\n# model_name, device, args, and dataset are defined earlier in train.py\nmodel = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)\n\ntraining_args = Seq2SeqTrainingArguments(\n    output_dir=args.model_dir,\n    num_train_epochs=args.epoch,\n    per_device_train_batch_size=args.train_batch_size,\n    per_device_eval_batch_size=args.eval_batch_size,\n    warmup_steps=args.warmup_steps,\n    weight_decay=args.weight_decay,\n    logging_dir=f\"{args.output_data_dir}/logs\",\n    logging_strategy='epoch',\n    evaluation_strategy='epoch',\n    save_strategy='epoch',\n    adafactor=True,\n    do_train=True,\n    do_eval=True,\n    do_predict=True,\n    save_total_limit=3,\n    # With the goal of deploying the best checkpoint to production,\n    # it is important to set load_best_model_at_end=True:\n    # this makes sure that the best model is the one saved\n    # at the root of the model_dir directory\n    load_best_model_at_end=True,\n    metric_for_best_model='eval_loss'\n)\n\ntrainer = Seq2SeqTrainer(\n    model=model,\n    args=training_args,\n    train_dataset=dataset['train'],\n    eval_dataset=dataset['validation']\n)\n\ntrainer.train()\ntrainer.save_model()\n\n# Get rid of unused checkpoints inside the container to limit the model.tar.gz size\nos.system(f\"rm -rf {args.model_dir}/checkpoint-*/\")\n```\n\nThe full code is available on [GitHub](https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/huggingface_byo_scripts_and_data).\n\n### **Deploy the trained Hugging Face model to SageMaker**\n\nOur friends at Hugging Face have made inference on SageMaker for Transformers models simpler than ever thanks to the [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit). 
You can directly deploy the previously trained model by simply setting up the environment variable ```\"HF_TASK\":\"summarization\"``` (for instructions, see [Pegasus Models](https://huggingface.co/google/pegasus-xsum)), choosing **Deploy**, and then choosing **Amazon SageMaker**, without needing to write an inference script.\n\nHowever, if you need some specific way to generate or postprocess predictions, for example generating several summary suggestions based on a list of different text generation parameters, writing your own inference script can be useful and relatively straightforward:\n\n```\n# Python\n# inference.py script\n\nimport os\nimport json\nimport torch\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\ndef model_fn(model_dir):\n    # Create the model and tokenizer and load weights\n    # from the previous training Job, passed here through \"model_dir\"\n    # that is reflected in HuggingFaceModel \"model_data\"\n    tokenizer = AutoTokenizer.from_pretrained(model_dir)\n    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir).to(device).eval()\n\n    model_dict = {'model':model, 'tokenizer':tokenizer}\n\n    return model_dict\n\n\ndef predict_fn(input_data, model_dict):\n    # Return predictions/generated summaries\n    # using the loaded model and tokenizer on input_data\n    text = input_data.pop('inputs')\n    parameters_list = input_data.pop('parameters_list', None)\n\n    tokenizer = model_dict['tokenizer']\n    model = model_dict['model']\n\n    # Parameters may or may not be passed\n    input_ids = tokenizer(text, truncation=True, padding='longest', return_tensors=\"pt\").input_ids.to(device)\n\n    if parameters_list:\n        predictions = []\n        for parameters in parameters_list:\n            output = model.generate(input_ids, **parameters)\n            predictions.append(tokenizer.batch_decode(output, skip_special_tokens=True))\n    else:\n        output = model.generate(input_ids)\n        predictions = 
tokenizer.batch_decode(output, skip_special_tokens=True)\n\n    return predictions\n\n\ndef input_fn(request_body, request_content_type):\n    # Transform the input request to a dictionary\n    request = json.loads(request_body)\n    return request\n```\n\nAs shown in the preceding code, such an inference script for Hugging Face on SageMaker only needs the following template functions:\n\n- **model_fn()** – Reads the content of what was saved at the end of the training job inside ```SM_MODEL_DIR```, or from an existing model weights directory saved as a tar.gz file in [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3). It’s used to load the trained model and associated tokenizer.\n- **input_fn()** – Formats the data received from a request made to the endpoint.\n- **predict_fn()** – Uses the output of ```model_fn()``` (the model and tokenizer) to run inference on the output of ```input_fn()``` (the formatted data).\n\nOptionally, you can create an ```output_fn()``` function to format the inference response, using the output of ```predict_fn()```; we don’t demonstrate it in this post.\n\nWe can then deploy the trained Hugging Face model with its associated inference script to SageMaker using the [Hugging Face SageMaker Model](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html#hugging-face-model) class:\n\n```\n# Python\nfrom sagemaker.huggingface import HuggingFaceModel\n\n# Framework versions match the ones used for the training job\nmodel = HuggingFaceModel(\n    model_data=huggingface_estimator.model_data,\n    role=role,\n    transformers_version='4.6',\n    pytorch_version='1.7',\n    py_version='py36',\n    entry_point='inference.py',\n    source_dir='code'\n)\n\npredictor = model.deploy(\n    initial_instance_count=1,\n    instance_type='ml.g4dn.xlarge'\n)\n```\n\n### **Test the deployed model**\n\nFor this demo, we trained the model on the [Women’s E-Commerce Clothing Reviews dataset](https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/), which contains reviews of clothing articles (which we consider as the input 
text) and their associated titles (which we consider as summaries). After we remove articles with missing titles, the dataset contains 19,675 reviews. Fine-tuning the Pegasus model on a training set containing 70% of those articles for five epochs took approximately 3.5 hours on an ml.p3.16xlarge instance.\n\nWe can then deploy the model and test it with some example data from the test set. The following is an example review describing a sweater:\n\n```\n# Python\nReview Text\n\"I ordered this sweater in green in petite large. The color and knit is beautiful and the shoulders and body fit comfortably; however, the sleeves were very long for a petite. I roll them, and it looks okay but would have rather had a normal petite length sleeve.\"\n\nOriginal Title\n\"Long sleeves\"\n\nRating\n3\n```\n\nThanks to our custom inference script hosted in a SageMaker endpoint, we can generate several summaries for this review with different text generation parameters. For example, we can ask the endpoint to generate a range of very short to moderately long summaries specifying different length penalties (the smaller the length penalty, the shorter the generated summary). The following are some parameter input examples, and the subsequent machine-generated summaries:\n\n```\n# Python\ninputs = {\n \"inputs\":[\n\"I ordered this sweater in green in petite large. The color and knit is beautiful and the shoulders and body fit comfortably; however, the sleeves were very long for a petite. 
I roll them, and it looks okay but would have rather had a normal petite length sleeve.\"\n    ],\n\n    \"parameters_list\":[\n        {\n            \"length_penalty\":2\n        },\n        {\n            \"length_penalty\":1\n        },\n        {\n            \"length_penalty\":0.6\n        },\n        {\n            \"length_penalty\":0.4\n        }\n    ]\n}\n\nresult = predictor.predict(inputs)\nprint(result)\n\n# Output:\n[\n    [\"Beautiful color and knit but sleeves are very long for a petite\"],\n    [\"Beautiful sweater, but sleeves are too long for a petite\"],\n    [\"Cute, but sleeves are long\"],\n    [\"Very long sleeves\"]\n]\n```\n\nWhich summary do you prefer? The first generated title captures all the important facts about the review in about a quarter of the words. In contrast, the last one only uses three words (less than 1/10th the length of the original review) to focus on the most important feature of the sweater.\n\n### **Conclusion**\n\nYou can fine-tune a text summarizer on your custom dataset and deploy it to production on SageMaker with this simple example available on [GitHub](https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/huggingface_byo_scripts_and_data). Additional [sample notebooks](https://github.com/huggingface/notebooks/tree/master/sagemaker) to train and deploy Hugging Face models on SageMaker are also available.\n\nAs always, AWS welcomes feedback. Please submit any comments or questions.\n\n#### **References**\n\n[[1] PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf)\n\n### **About the authors**\n\n![image.png](https://dev-media.amazoncloud.cn/afc9ee2a9cba4a0dabfa0007605a20a2_image.png)\n\n**Viktor Malesevic** is a Machine Learning Engineer with AWS Professional Services, passionate about Natural Language Processing and MLOps. He works with customers to develop challenging deep learning models and put them into production on AWS. 
In his spare time, he enjoys sharing a glass of red wine and some cheese with friends.\n\n![image.png](https://dev-media.amazoncloud.cn/f3114731cc2342798a26a810a6f038f7_image.png)\n\n**Aamna Najmi** is a Data Scientist with AWS Professional Services. She is passionate about helping customers innovate with Big Data and Artificial Intelligence technologies to tap business value and insights from data. In her spare time, she enjoys gardening and traveling to new places.\n\n\n","render":"<p>There have been many recent advancements in the NLP domain. Pre-trained models and fully managed NLP services have democratised access and adoption of NLP. <a href=\"https://aws.amazon.com/comprehend/\" target=\"_blank\">Amazon Comprehend</a> is a fully managed service that can perform NLP tasks like custom entity recognition, topic modelling, sentiment analysis and more to extract insights from data without the need of any prior ML experience.</p>\n<p>Last year, AWS announced a <a href=\"https://aws.amazon.com/blogs/machine-learning/aws-and-hugging-face-collaborate-to-simplify-and-accelerate-adoption-of-natural-language-processing-models/\" target=\"_blank\">partnership </a>with <a href=\"https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face\" target=\"_blank\">Hugging Face</a> to help bring natural language processing (NLP) models to production faster. Hugging Face is an open-source AI community, focused on NLP. Their Python-based library (<a href=\"https://github.com/huggingface/transformers\" target=\"_blank\">Transformers</a>) provides tools to easily use popular state-of-the-art Transformer architectures like BERT, RoBERTa, and GPT. 
You can apply these models to a variety of NLP tasks, such as text classification, information extraction, and question answering, among <a href=\"https://huggingface.co/transformers/task_summary.html\" target=\"_blank\">others</a>.</p>\n<p><a href=\"https://aws.amazon.com/sagemaker/\" target=\"_blank\">Amazon SageMaker</a> is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process, making it easier to develop high-quality models. The SageMaker Python SDK provides open-source APIs and containers to train and deploy models on SageMaker, using several different ML and deep learning frameworks.</p>\n<p>The Hugging Face integration with SageMaker allows you to build Hugging Face models at scale on your own domain-specific use cases.</p>\n<p>In this post, we walk you through an example of how to build and deploy a custom Hugging Face text summarizer on SageMaker. We use Pegasus [1] for this purpose, the first Transformer-based model specifically pre-trained on an objective tailored for abstractive text summarization. BERT is pre-trained by masking random words in a sentence and predicting them; in contrast, during Pegasus’s pre-training, whole sentences are masked from an input document. The model then generates the missing sentences as a single output sequence using all the unmasked sentences as context, creating an executive summary of the document as a result.</p>\n<p>Thanks to the flexibility of the Hugging Face library, you can easily adapt the code shown in this post for other types of transformer models, such as T5, BART, and more.</p>\n<h3><a id=\"Load_your_own_dataset_to_finetune_a_Hugging_Face_model_12\"></a><strong>Load your own dataset to fine-tune a Hugging Face model</strong></h3>\n<p>To load a custom dataset from a CSV file, we use the <code>load_dataset</code> method from the Hugging Face Datasets package. 
We can apply tokenization to the loaded dataset using the <code>datasets.Dataset.map</code> function. The <code>map</code> function iterates over the loaded dataset and applies the tokenize function to each batch of examples. The tokenized dataset can then be passed to the trainer for fine-tuning the model. See the following code:</p>\n<pre><code class=\"lang-\"># Python\nimport os\n\nfrom datasets import load_dataset\n\n# tokenizer and args (the parsed script arguments) are defined elsewhere in the training script\ndef tokenize(batch):\n    tokenized_input = tokenizer(batch[args.input_column], padding='max_length', truncation=True, max_length=args.max_source)\n    tokenized_target = tokenizer(batch[args.target_column], padding='max_length', truncation=True, max_length=args.max_target)\n    # The trainer expects the target token IDs under the 'labels' key\n    tokenized_input['labels'] = tokenized_target['input_ids']\n\n    return tokenized_input\n\n\ndef load_and_tokenize_dataset(data_dir):\n    # If data_dir holds several files, only the last one listed is kept\n    for file in os.listdir(data_dir):\n        dataset = load_dataset(&quot;csv&quot;, data_files=os.path.join(data_dir, file), split='train')\n    tokenized_dataset = dataset.map(lambda batch: tokenize(batch), batched=True, batch_size=512)\n    tokenized_dataset.set_format('numpy', columns=['input_ids', 'attention_mask', 'labels'])\n\n    return tokenized_dataset\n</code></pre>\n<h3><a id=\"Build_your_training_script_for_the_Hugging_Face_SageMaker_estimator_35\"></a>Build your training script for the Hugging Face SageMaker estimator</h3>\n<p>As explained in the post <a href=\"https://aws.amazon.com/blogs/machine-learning/aws-and-hugging-face-collaborate-to-simplify-and-accelerate-adoption-of-natural-language-processing-models/\" target=\"_blank\">AWS and Hugging Face collaborate to simplify and accelerate adoption of Natural Language Processing models</a>, training a Hugging Face model on SageMaker has never been easier. 
We can do so by using the Hugging Face estimator from the <a href=\"https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html?highlight=Hugging%20Face\" target=\"_blank\">SageMaker SDK</a>.</p>\n<p>The following code snippet fine-tunes Pegasus on our dataset. You can also find many <a href=\"https://github.com/huggingface/notebooks/tree/master/sagemaker\" target=\"_blank\">sample notebooks</a> that guide you through fine-tuning different types of models, available directly in the Hugging Face notebooks GitHub repository. To enable distributed training, we can use the <a href=\"https://aws.amazon.com/blogs/aws/managed-data-parallelism-in-amazon-sagemaker-simplifies-training-on-large-datasets/\" target=\"_blank\">Data Parallelism Library</a> in SageMaker, which has been built into the Hugging Face Trainer API. To enable data parallelism, we need to define the <code>distribution</code> parameter in our Hugging Face estimator.</p>\n<pre><code class=\"lang-\"># Python\nfrom sagemaker.huggingface import HuggingFace\n\n# configuration for running training on smdistributed Data Parallel\ndistribution = {'smdistributed': {'dataparallel': {'enabled': True}}}\n\n# output_path, role, and the input paths are defined earlier in the notebook\nhuggingface_estimator = HuggingFace(\n    entry_point='train.py',\n    source_dir='code',\n    base_job_name='huggingface-pegasus',\n    instance_type='ml.p3.16xlarge',\n    instance_count=1,\n    transformers_version='4.6',\n    pytorch_version='1.7',\n    py_version='py36',\n    output_path=output_path,\n    role=role,\n    hyperparameters={\n        'model_name': 'google/pegasus-xsum',\n        'epoch': 10,\n        'per_device_train_batch_size': 2\n    },\n    distribution=distribution\n)\nhuggingface_estimator.fit({'train': training_input_path, 'validation': validation_input_path, 'test': test_input_path})\n</code></pre>\n<p>The maximum training batch size you can configure depends on the model size and the GPU memory of the instance used. 
If SageMaker distributed training is enabled, the total batch size is the sum of the per-device batch sizes across all devices (GPUs). If we use an ml.p3.16xlarge instance (8 GPUs) with distributed training instead of an ml.p3.2xlarge instance (1 GPU), we have eight times as much GPU memory available. The batch size per device remains the same, but eight devices are training in parallel.</p>\n<p>As usual with SageMaker, we create a <code>train.py</code> script to use with Script Mode and pass hyperparameters for training. The following code snippet for Pegasus loads the model and trains it using the Transformers <code>Seq2SeqTrainer</code> class:</p>\n<pre><code class=\"lang-\"># Python\nimport os\n\nfrom transformers import (\n    AutoModelForSeq2SeqLM,\n    AutoTokenizer,\n    Seq2SeqTrainer,\n    Seq2SeqTrainingArguments\n)\n\n# model_name, device, args, and dataset are defined earlier in train.py\nmodel = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)\n\ntraining_args = Seq2SeqTrainingArguments(\n    output_dir=args.model_dir,\n    num_train_epochs=args.epoch,\n    per_device_train_batch_size=args.train_batch_size,\n    per_device_eval_batch_size=args.eval_batch_size,\n    warmup_steps=args.warmup_steps,\n    weight_decay=args.weight_decay,\n    logging_dir=f&quot;{args.output_data_dir}/logs&quot;,\n    logging_strategy='epoch',\n    evaluation_strategy='epoch',\n    save_strategy='epoch',\n    adafactor=True,\n    do_train=True,\n    do_eval=True,\n    do_predict=True,\n    save_total_limit=3,\n    # With the goal of deploying the best checkpoint to production,\n    # it is important to set load_best_model_at_end=True:\n    # this makes sure that the best model is the one saved\n    # at the root of the model_dir directory\n    load_best_model_at_end=True,\n    metric_for_best_model='eval_loss'\n)\n\ntrainer = Seq2SeqTrainer(\n    model=model,\n    args=training_args,\n    train_dataset=dataset['train'],\n    eval_dataset=dataset['validation']\n)\n\ntrainer.train()\ntrainer.save_model()\n\n# Get rid of unused checkpoints inside the container to limit the model.tar.gz size\nos.system(f&quot;rm -rf 
{args.model_dir}/checkpoint-*/&quot;)\n</code></pre>\n<p>The full code is available on <a href=\"https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/huggingface_byo_scripts_and_data\" target=\"_blank\">GitHub</a>.</p>\n<h3><a id=\"Deploy_the_trained_Hugging_Face_model_to_SageMaker_119\"></a><strong>Deploy the trained Hugging Face model to SageMaker</strong></h3>\n<p>Our friends at Hugging Face have made inference on SageMaker for Transformers models simpler than ever thanks to the <a href=\"https://github.com/aws/sagemaker-huggingface-inference-toolkit\" target=\"_blank\">SageMaker Hugging Face Inference Toolkit</a>. You can directly deploy the previously trained model by simply setting up the environment variable <code>&quot;HF_TASK&quot;:&quot;summarization&quot;</code> (for instructions, see <a href=\"https://huggingface.co/google/pegasus-xsum\" target=\"_blank\">Pegasus Models</a>), choosing <strong>Deploy</strong>, and then choosing <strong>Amazon SageMaker</strong>, without needing to write an inference script.</p>\n<p>However, if you need some specific way to generate or postprocess predictions, for example generating several summary suggestions based on a list of different text generation parameters, writing your own inference script can be useful and relatively straightforward:</p>\n<pre><code class=\"lang-\"># Python\n# inference.py script\n\nimport os\nimport json\nimport torch\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\ndevice = torch.device(&quot;cuda&quot; if torch.cuda.is_available() else &quot;cpu&quot;)\n\ndef model_fn(model_dir):\n    # Create the model and tokenizer and load weights\n    # from the previous training Job, passed here through &quot;model_dir&quot;\n    # that is reflected in HuggingFaceModel &quot;model_data&quot;\n    tokenizer = AutoTokenizer.from_pretrained(model_dir)\n    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir).to(device).eval()\n\n    model_dict = {'model':model, 'tokenizer':tokenizer}\n\n    return model_dict\n\n\ndef 
predict_fn(input_data, model_dict):\n    # Return predictions/generated summaries\n    # using the loaded model and tokenizer on input_data\n    text = input_data.pop('inputs')\n    parameters_list = input_data.pop('parameters_list', None)\n\n    tokenizer = model_dict['tokenizer']\n    model = model_dict['model']\n\n    # Parameters may or may not be passed\n    input_ids = tokenizer(text, truncation=True, padding='longest', return_tensors=&quot;pt&quot;).input_ids.to(device)\n\n    if parameters_list:\n        predictions = []\n        for parameters in parameters_list:\n            output = model.generate(input_ids, **parameters)\n            predictions.append(tokenizer.batch_decode(output, skip_special_tokens=True))\n    else:\n        output = model.generate(input_ids)\n        predictions = tokenizer.batch_decode(output, skip_special_tokens=True)\n\n    return predictions\n\n\ndef input_fn(request_body, request_content_type):\n    # Transform the input request to a dictionary\n    request = json.loads(request_body)\n    return request\n</code></pre>\n<p>As shown in the preceding code, such an inference script for Hugging Face on SageMaker only needs the following template functions:</p>\n<ul>\n<li><strong>model_fn()</strong> – Reads the content of what was saved at the end of the training job inside <code>SM_MODEL_DIR</code>, or from an existing model weights directory saved as a tar.gz file in <a href=\"http://aws.amazon.com/s3\" target=\"_blank\">Amazon Simple Storage Service</a> (Amazon S3). 
It’s used to load the trained model and associated tokenizer.</li>\n<li><strong>input_fn()</strong> – Formats the data received from a request made to the endpoint.</li>\n<li><strong>predict_fn()</strong> – Uses the output of <code>model_fn()</code> (the model and tokenizer) to run inference on the output of <code>input_fn()</code> (the formatted data).</li>\n</ul>\n<p>Optionally, you can create an <code>output_fn()</code> function to format the inference response, using the output of <code>predict_fn()</code>; we don’t demonstrate it in this post.</p>\n<p>We can then deploy the trained Hugging Face model with its associated inference script to SageMaker using the <a href=\"https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html#hugging-face-model\" target=\"_blank\">Hugging Face SageMaker Model</a> class:</p>\n<pre><code class=\"lang-\"># Python\nfrom sagemaker.huggingface import HuggingFaceModel\n\n# Framework versions match the ones used for the training job\nmodel = HuggingFaceModel(\n    model_data=huggingface_estimator.model_data,\n    role=role,\n    transformers_version='4.6',\n    pytorch_version='1.7',\n    py_version='py36',\n    entry_point='inference.py',\n    source_dir='code'\n)\n\npredictor = model.deploy(\n    initial_instance_count=1,\n    instance_type='ml.g4dn.xlarge'\n)\n</code></pre>\n<h3><a id=\"Test_the_deployed_model_204\"></a><strong>Test the deployed model</strong></h3>\n<p>For this demo, we trained the model on the <a href=\"https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/\" target=\"_blank\">Women’s E-Commerce Clothing Reviews dataset</a>, which contains reviews of clothing articles (which we consider as the input text) and their associated titles (which we consider as summaries). After we remove articles with missing titles, the dataset contains 19,675 reviews. Fine-tuning the Pegasus model on a training set containing 70% of those articles for five epochs took approximately 3.5 hours on an ml.p3.16xlarge instance.</p>\n<p>We can then deploy the model and test it with some example data from the test set. 
The following is an example review describing a sweater:</p>\n<pre><code class=\"lang-\"># Python\nReview Text\n&quot;I ordered this sweater in green in petite large. The color and knit is beautiful and the shoulders and body fit comfortably; however, the sleeves were very long for a petite. I roll them, and it looks okay but would have rather had a normal petite length sleeve.&quot;\n\nOriginal Title\n&quot;Long sleeves&quot;\n\nRating\n3\n</code></pre>\n<p>Thanks to our custom inference script hosted in a SageMaker endpoint, we can generate several summaries for this review with different text generation parameters. For example, we can ask the endpoint to generate a range of very short to moderately long summaries specifying different length penalties (the smaller the length penalty, the shorter the generated summary). The following are some parameter input examples, and the subsequent machine-generated summaries:</p>\n<pre><code class=\"lang-\"># Python\ninputs = {\n    &quot;inputs&quot;:[\n        &quot;I ordered this sweater in green in petite large. The color and knit is beautiful and the shoulders and body fit comfortably; however, the sleeves were very long for a petite. I roll them, and it looks okay but would have rather had a normal petite length sleeve.&quot;\n    ],\n\n    &quot;parameters_list&quot;:[\n        {\n            &quot;length_penalty&quot;:2\n        },\n        {\n            &quot;length_penalty&quot;:1\n        },\n        {\n            &quot;length_penalty&quot;:0.6\n        },\n        {\n            &quot;length_penalty&quot;:0.4\n        }\n    ]\n}\n\nresult = predictor.predict(inputs)\nprint(result)\n\n# Output:\n[\n    [&quot;Beautiful color and knit but sleeves are very long for a petite&quot;],\n    [&quot;Beautiful sweater, but sleeves are too long for a petite&quot;],\n    [&quot;Cute, but sleeves are long&quot;],\n    [&quot;Very long sleeves&quot;]\n]\n</code></pre>\n<p>Which summary do you prefer? The first generated title captures all the important facts about the review in about a quarter of the words. 
In contrast, the last one only uses three words (less than 1/10th the length of the original review) to focus on the most important feature of the sweater.</p>\n<h3><a id=\"Conclusion_259\"></a><strong>Conclusion</strong></h3>\n<p>You can fine-tune a text summarizer on your custom dataset and deploy it to production on SageMaker with this simple example available on <a href=\"https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/huggingface_byo_scripts_and_data\" target=\"_blank\">GitHub</a>. Additional <a href=\"https://github.com/huggingface/notebooks/tree/master/sagemaker\" target=\"_blank\">sample notebooks</a> to train and deploy Hugging Face models on SageMaker are also available.</p>\n<p>As always, AWS welcomes feedback. Please submit any comments or questions.</p>\n<h4><a id=\"References_265\"></a><strong>References</strong></h4>\n<p><a href=\"https://arxiv.org/pdf/1912.08777.pdf\" target=\"_blank\">[1] PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization</a></p>\n<h3><a id=\"About_the_authors_269\"></a><strong>About the authors</strong></h3>\n<p><img src=\"https://dev-media.amazoncloud.cn/afc9ee2a9cba4a0dabfa0007605a20a2_image.png\" alt=\"image.png\" /></p>\n<p><strong>Viktor Malesevic</strong> is a Machine Learning Engineer with AWS Professional Services, passionate about Natural Language Processing and MLOps. He works with customers to develop challenging deep learning models and put them into production on AWS. In his spare time, he enjoys sharing a glass of red wine and some cheese with friends.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/f3114731cc2342798a26a810a6f038f7_image.png\" alt=\"image.png\" /></p>\n<p><strong>Aamna Najmi</strong> is a Data Scientist with AWS Professional Services. She is passionate about helping customers innovate with Big Data and Artificial Intelligence technologies to tap business value and insights from data. 
In her spare time, she enjoys gardening and traveling to new places.</p>\n"}