# Amazon Web Services NLP Newsletter, June 2022

[Gilead](https://www.gilead.com/)’s pharmaceutical development and manufacturing (PDM) team chose Amazon Web Services, adopting Amazon Kendra, a highly accurate intelligent search service powered by ML. With support from Amazon Web Services, the PDM team [built a data lake within 9 months](https://aws.amazon.com/solutions/case-studies/gilead-case-study/) and went on to build a search tool in only 3 months, completing the project well within its estimated timeline of 3 years.

![image.png](https://awsdevweb.s3.cn-north-1.amazonaws.com.cn/5a98907416a24a9f955e41984fc75e08_image.png)

Since launching its enterprise search tool, users across the PDM team have substantially reduced manual data management tasks and cut the time it takes to search for information by approximately 50 percent. This has fueled research, experimentation, and pharmaceutical breakthroughs.

Amazon Kendra is a turnkey AI solution that, when configured correctly, is capable of spanning every single domain in the organization while being straightforward to implement.

-- Jeremy Zhang, Director of Data Science and Knowledge Management, Gilead Sciences Inc.

### **Latent Space**
[Latent Space](https://www.latentspace.co/) specializes in the next wave of generative models for businesses and creatives, combining fields that have long had little overlap: graphics and natural language processing (NLP).

Amazon SageMaker’s unique automated model partitioning and efficient pipelining approach made [our adoption of model parallelism](https://aws.amazon.com/blogs/machine-learning/how-latent-space-used-the-amazon-sagemaker-model-parallelism-library-to-push-the-frontiers-of-large-scale-transformers/) possible with little engineering effort, and we scaled our training of models beyond 1 billion parameters, which is an important requirement for us.
Furthermore, when training with a 16-node, 8-GPU setup with the SageMaker model parallelism library, we recorded a 38% improvement in efficiency compared to our previous training runs.

### **AI Language Services**
#### **Amazon Lex**
[Amazon Lex](https://aws.amazon.com/lex/) provides automatic speech recognition (ASR) and natural language understanding (NLU) capabilities so you can build applications and interactive voice response (IVR) solutions with engaging user experiences. Now, you can programmatically provide phrases as hints during a live interaction to influence the transcription of spoken input.

![image.png](https://awsdevweb.s3.cn-north-1.amazonaws.com.cn/0db07d0be0864d4dbf8f4451d99f06f8_image.png)

#### **Amazon Comprehend**
[Amazon Comprehend](https://aws.amazon.com/comprehend/) is a natural language processing (NLP) service that uses machine learning to find insights and relationships like people, places, sentiments, and topics in unstructured text. You can use Amazon Comprehend ML capabilities to detect and redact personally identifiable information (PII) in customer emails, support tickets, product reviews, social media, and more. Now, [Amazon Comprehend PII supports 14 new entity types](https://aws.amazon.com/about-aws/whats-new/2022/05/amazon-comprehend-detects-redacts-pll-entity-types-across-us-uk-canada-india/), with localized support for entities within the United States, Canada, the United Kingdom, and India. Customers can now detect and redact 36 entity types to protect sensitive data.

#### **Amazon Lex**
[Amazon Lex](https://aws.amazon.com/lex/) is a service for building conversational interfaces into any application using voice and text. Now, you can [give Amazon Lex additional information about how to process speech input by creating a custom vocabulary](https://aws.amazon.com/about-aws/whats-new/2022/05/amazon-lex-supports-custom-vocabulary/).
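The phrase hints for Amazon Lex described above travel inside the session state of a Lex V2 runtime request. Here is a minimal sketch in Python; the bot, intent, slot, and phrase names are hypothetical, and the exact request shape should be confirmed against the Lex V2 runtime API reference:

```python
def build_session_state(intent_name, slot_name, phrases):
    """Build a Lex V2 sessionState carrying runtime hints for one slot."""
    return {
        "runtimeHints": {
            "slotHints": {
                intent_name: {
                    slot_name: {
                        "runtimeHintValues": [{"phrase": p} for p in phrases]
                    }
                }
            }
        }
    }

# Hypothetical intent/slot names; the phrases bias ASR toward these spellings.
session_state = build_session_state(
    "CheckAccount", "LastName", ["Loreck", "Hoowi", "Smythe"]
)

# With credentials configured, the state would accompany a request such as:
# import boto3
# client = boto3.client("lexv2-runtime")
# client.recognize_text(botId="...", botAliasId="...", localeId="en_US",
#                       sessionId="user-1", text="my last name is Loreck",
#                       sessionState=session_state)
```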
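To illustrate the Amazon Comprehend detect-and-redact flow above, the following sketch applies PII entity offsets to a string. The sample entities stand in for a real `detect_pii_entities` response (a live call requires Amazon Web Services credentials, as shown in the commented lines):

```python
def redact_pii(text, entities, mask="*"):
    """Replace each detected PII span with mask characters.

    `entities` follows the shape of Amazon Comprehend's
    detect_pii_entities response: dicts with Type, BeginOffset, EndOffset.
    """
    # Redact from the end of the string so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = ent["BeginOffset"], ent["EndOffset"]
        text = text[:begin] + mask * (end - begin) + text[end:]
    return text

# Sample entities mimicking a Comprehend response for the text below.
text = "Contact Jane Doe at jane@example.com"
entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36},
]
print(redact_pii(text, entities))

# A live call would look like:
# import boto3
# resp = boto3.client("comprehend").detect_pii_entities(Text=text, LanguageCode="en")
# print(redact_pii(text, resp["Entities"]))
```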
A custom vocabulary is a list of domain-specific terms or unique words (e.g., brand names, product names) that are more difficult to recognize. You create the list and add it to the bot definition, so Amazon Lex can use these words when determining the user’s intent or collecting information in a conversation.

### **NLP on Amazon SageMaker**
- [Detect social media fake news using graph machine learning with Amazon Neptune ML](https://aws.amazon.com/blogs/machine-learning/detect-social-media-fake-news-using-graph-machine-learning-with-amazon-neptune-ml/). The spread of misinformation and fake news on social media platforms poses a major challenge to the well-being of individuals and societies, so it is imperative to develop robust, automated solutions for early detection of fake news. Traditional approaches rely purely on the news content (using natural language processing) to mark information as real or fake. However, the social context in which the news is published and shared can provide additional insights and improve the predictive capabilities of fake news detection tools.

![image.png](https://awsdevweb.s3.cn-north-1.amazonaws.com.cn/f0bb29b857fb4a51826e89a6810d7535_image.png)

- [Fine-tune transformer language models for linguistic diversity with Hugging Face on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/fine-tune-transformer-language-models-for-linguistic-diversity-with-hugging-face-on-amazon-sagemaker/). Today, natural language processing (NLP) examples are dominated by English, the native language of only 5% of the human population and spoken by only 17%.
In this post, we summarize the challenges of low-resource languages and experiment with different solution approaches covering over 100 languages using Hugging Face transformers on Amazon SageMaker.
- [Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models](https://aws.amazon.com/blogs/machine-learning/run-text-classification-with-amazon-sagemaker-jumpstart-using-tensorflow-hub-and-huggingface-models/). In this post, we provide a step-by-step walkthrough on how to fine-tune and deploy a text classification model using pretrained models from TensorFlow Hub. We explore two ways of obtaining the same result: via JumpStart’s graphical interface on Studio, and programmatically through JumpStart’s APIs.
- [Build a custom Q&A dataset using Amazon SageMaker Ground Truth to train a Hugging Face Q&A NLU model](https://aws.amazon.com/blogs/machine-learning/build-a-custom-qa-dataset-using-amazon-sagemaker-ground-truth-to-train-a-hugging-face-qa-nlu-model/). One NLU problem of particular business interest is question answering. In this post, we demonstrate how to build a custom question answering dataset using Amazon SageMaker Ground Truth to train a Hugging Face question answering NLU model.
- [Achieve hyperscale performance for model serving using NVIDIA Triton Inference Server on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/achieve-hyperscale-performance-for-model-serving-using-nvidia-triton-inference-server-on-amazon-sagemaker/). In this post, we look at best practices for deploying transformer models at scale on GPUs using Triton Inference Server on SageMaker. We start with a summary of key concepts around latency in SageMaker and an overview of performance tuning guidelines, then provide an overview of Triton and its features, along with example code for deploying on SageMaker.
Finally, we perform load tests using SageMaker Inference Recommender and summarize the insights from load testing a popular transformer model provided by Hugging Face.

![image.png](https://awsdevweb.s3.cn-north-1.amazonaws.com.cn/fdb2a94aec19419abf72241803a2bac5_image.png)

### **Content Moderation design patterns with Amazon Web Services managed AI services**
Modern web and mobile platforms, from startups to large organizations, fuel businesses and drive user engagement through social features. Online community members expect safe and inclusive experiences where they can freely consume and contribute images, videos, text, and audio. The ever-increasing volume, variety, and complexity of user-generated content (UGC) make traditional human moderation workflows challenging to scale to protect users.

![image.png](https://awsdevweb.s3.cn-north-1.amazonaws.com.cn/69ec54320b3d42f69ec61d165efc3f5c_image.png)

Watch a presentation of the demo on [YouTube](https://www.youtube.com/watch?v=yMN3Xx0DcoU).
Read more about [content moderation design patterns with Amazon Web Services managed AI services](https://aws.amazon.com/blogs/machine-learning/content-moderation-design-patterns-with-aws-managed-ai-services/).
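For the Ground Truth question-answering post listed above, the labeled output ultimately has to reach the trainer in the SQuAD-style format that Hugging Face extractive question answering models consume. A minimal sketch of building one such record (the field values are illustrative):

```python
import json

def make_qa_record(qid, context, question, answer_text):
    """Build one SQuAD-style example. answer_start is the character offset
    of the answer within the context, which extractive QA models require."""
    start = context.index(answer_text)
    return {
        "id": qid,
        "context": context,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [start]},
    }

record = make_qa_record(
    "ex-1",
    "Amazon SageMaker Ground Truth helps you build training datasets.",
    "What does Ground Truth help you build?",
    "training datasets",
)
print(json.dumps(record, indent=2))
```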
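A common pattern in the content moderation post above is to fan each piece of UGC out to per-modality detectors (for example, Amazon Rekognition for images and Amazon Comprehend for text) and gate publication on the combined confidence scores. A service-agnostic sketch of that gating logic, with illustrative thresholds and label names:

```python
def moderation_decision(detections, block_threshold=0.8, review_threshold=0.5):
    """Map per-modality moderation labels to an action.

    `detections` is a list of (label, confidence) pairs aggregated from
    whichever detectors ran on the content. The highest-severity signal
    wins: block outright above one threshold, route to a human reviewer
    above a lower one, otherwise allow automatically.
    """
    top = max((conf for _, conf in detections), default=0.0)
    if top >= block_threshold:
        return "block"
    if top >= review_threshold:
        return "human_review"
    return "allow"

# Scores as they might come back from image and text moderation APIs.
print(moderation_decision([("Explicit Nudity", 0.97)]))           # clear violation
print(moderation_decision([("Suggestive", 0.62), ("PII", 0.40)])) # borderline
print(moderation_decision([("Violence", 0.10)]))                  # benign
```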