Amazon Web Services NLP Monthly, September 2022

Artificial Intelligence
Machine Learning
Overseas Selection
Amazon Comprehend
Amazon Transcribe
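The Overseas Selection column gathers high-quality technical content related to Amazon Web Services from around the world. Where "AWS" appears in the content, it is an abbreviation of "Amazon Web Services" and is not displayed as a trademark on this site.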
Hello world. This is the September 2022 edition of the Amazon Web Services Natural Language Processing (NLP) newsletter, covering everything related to NLP at Amazon Web Services. Feel free to leave comments and share it on your social networks.

## **Amazon Web Services NLP Summit**

Only one week to go until the first-ever [Amazon Web Services NLP Summit 2022](https://awsnlpsummit.splashthat.com/) in London! We are very excited, and the team has put a huge amount of effort into making sure the agenda is packed with interesting speakers, sessions, and workshops.

The Amazon Web Services NLP Summit 2022 in London, UK, features inspiring keynotes by:

- [Craig Saunders](https://www.linkedin.com/in/craigjsaunders/), Director of Machine Learning at Alexa AI
- [Vasi Philomin](https://www.linkedin.com/in/vasi-philomin/), Vice President of Amazon Web Services AI
- [Satish Lakshmanan](https://www.linkedin.com/in/satish-lakshmanan-1a6b3a/), Director of Amazon Web Services AI/ML Worldwide Specialists

Not only will you get to hear from and interact with some of the most innovative minds in the industry, you can also see NLP in action in hands-on workshops and inspiring breakout sessions, and learn from expert panels of inventors, scientists, and pioneering startups that are shaping key NLP trends on Amazon Web Services today.
The summit will host 25 sessions covering cutting-edge innovation, the latest research, and building enterprise NLP applications at scale for popular use cases across industries.

- When: October 5 and 6, 2022
- Where: 1 Principal Place, Worship Street, London, EC2A 2FA
- Format: In-person only

Use [this link](https://awsnlpsummit.splashthat.com/) to register and save your spot.

## **NLP@Amazon Web Services Customer Success Stories**

**Savana** - [Builds World’s Only Natural Language Processing Clinical Research Network on Amazon Web Services](https://aws.amazon.com/solutions/case-studies/savana-natural-language-case-study/)

Madrid-based [Savana](https://savanamed.com/) helps healthcare providers unlock the value of their electronic medical records (EMRs) for research purposes. It combines research-grade methodology with natural language processing (NLP) and predictive analytics to obtain relevant results for healthcare and life science providers investigating disease prediction and treatment.

*"Amazon Web Services and Savana work together as trusted cloud service providers and our customers benefit from that trust as well. Hospital IT departments are realizing that the cloud is safer than the traditional on-premises approach." – said Jorge Tello, Chief Executive Officer at Savana*

**HM Land Registry** - [Cuts Document Review Time in Half Using Amazon Textract](https://aws.amazon.com/solutions/case-studies/hm-land-registry/?did=cr_card&trk=cr_card)

The UK government agency’s caseworkers had to review complex legal documents manually, which was time-consuming, so [HMLR](https://www.gov.uk/land-registry) and [Kainos](https://www.kainos.com/) built a solution powered by [Amazon Textract](https://aws.amazon.com/cn/textract/?trk=cndc-detail) and [Amazon Comprehend](https://aws.amazon.com/cn/comprehend/?trk=cndc-detail) that automatically compares documents and flags discrepancies for review.
Now, caseworkers no longer need to review thousands of documents per week, and the agency can approve property transfers faster.

*"This project has successfully shown how we can use artificial intelligence and machine learning and, more importantly, how we can strengthen that capability as we progress through our digital transformation. Using Amazon Web Services, we can improve our processes and enhance how our employees work." – said [Nick Davies](https://www.linkedin.com/in/nick-davies-279b9572/), Senior Product Manager for Internal Digital Services at HM Land Registry*

**State Auto** - [Improves Processes across the Life Cycle Using Amazon Web Services Machine Learning, Computer Vision, and Serverless Architecture](https://aws.amazon.com/solutions/case-studies/state-auto/?did=cr_card&trk=cr_card)

[State Automobile Mutual Insurance Company](https://www.stateauto.com/) (State Auto) began using technology to help its customer service representatives (CSRs) meet quality and customer satisfaction score goals when it built its SA360 solution on Amazon Web Services.
By using data-fueled insights and making them available to customers and CSRs, State Auto built a better service experience, redirecting typical customer calls to self-service channels so that CSRs could focus on customers with more complex needs.

*"Because Amazon Web Services services do their job so well out of the box, we have the flexibility to be creative and build things on top of them." – said [Uthra Ramanujam](https://www.linkedin.com/in/uthra-ramanujam-2907b0/), Vice President of Strategic Technology Research at State Automobile Mutual Insurance Company*

## **AI Language Services**

**[Amazon Lex introduces Visual Conversation Builder](https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-visual-conversation-builder/)**, a drag-and-drop interface to visualize and build conversation flows in a no-code environment. The Visual Conversation Builder greatly simplifies bot design: in addition to the existing menu-based editor and the Lex APIs, it provides a complete view of the entire conversation flow in one place, letting any user build engaging conversational experiences more quickly. Launch [blog post](https://aws.amazon.com/blogs/machine-learning/announcing-visual-conversation-builder-for-amazon-lex/).

![wjujomyovtlo60vqmslz.jpg](https://dev-media.amazoncloud.cn/12660dae30bc4997b937c503238c5abf_wjujomyovtlo60vqmslz.jpg)

**[Amazon Lex introduces the composite slot type](https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-lex-composite-slot-type/)**. A slot captures user input and provides the bot with the information it needs to fulfill a task. In some cases, the information contains multiple values, each requiring its own slot.
With the composite slot type, [Amazon Lex](https://aws.amazon.com/cn/lex/?trk=cndc-detail) can capture the full user response at once and associate each piece of information with the appropriate slot.

**[Amazon Comprehend now supports synchronous processing for targeted sentiment](https://aws.amazon.com/about-aws/whats-new/2022/09/aws-comprehend-supports-synchronous-processing-targeted-sentiment/)**. [Amazon Comprehend](https://aws.amazon.com/cn/comprehend/?trk=cndc-detail) customers can now extract the sentiment associated with entities in text documents in real time using the newly released synchronous API. The synchronous Targeted Sentiment API lets customers derive granular sentiment for specific entities of interest, such as brands or products, without waiting for batch processing. Launch blog post.

![hnlheisq70z9wob11ut0.png](https://dev-media.amazoncloud.cn/63ca372fe0de42e2b673eaaf292b3a39_hnlheisq70z9wob11ut0.png)

**[Amazon Textract announces updates to the text extraction feature](https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-textract-updates-text-extraction-feature/)**. [Amazon Textract](https://aws.amazon.com/cn/textract/?trk=cndc-detail) is a machine learning service that automatically extracts text, handwriting, and data from scanned documents and images. We are pleased to announce quality enhancements to our text extraction feature, available via the DetectDocumentText API.
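
To make the DetectDocumentText call concrete, here is a minimal Python sketch using boto3 (the AWS SDK for Python). It assumes boto3 is installed and AWS credentials and a region are configured; `detect_text` and `extract_lines` are hypothetical helper names, not part of any AWS library. The parsing step relies on the response's `Blocks` list, in which each block has a `BlockType` (such as `PAGE`, `LINE`, or `WORD`), its `Text`, and a `Confidence` score from 0 to 100.

```python
# Minimal sketch: call Amazon Textract DetectDocumentText and keep the LINE blocks.
# Assumes boto3 is installed and AWS credentials/region are configured.
# detect_text and extract_lines are hypothetical helper names.
from typing import Any, Dict, List


def detect_text(image_bytes: bytes) -> Dict[str, Any]:
    """Send raw document bytes (for example PNG or JPEG) to the synchronous API."""
    import boto3  # imported here so extract_lines works without boto3 installed

    textract = boto3.client("textract")
    return textract.detect_document_text(Document={"Bytes": image_bytes})


def extract_lines(response: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Return the text of LINE blocks, in response order, whose Confidence
    (0-100) is at least min_confidence. PAGE and WORD blocks are skipped."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]
```

For example, passing the bytes of a scanned cheque to `detect_text` and the result to `extract_lines(..., min_confidence=90.0)` keeps only the lines Textract scored at 90 or above; the threshold is purely illustrative. Working with `LINE` rather than `WORD` blocks keeps values printed in space-separated groups, such as an IBAN, together on one line.
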
The latest text detection models behind the DetectDocumentText API improve word and line extraction accuracy, particularly for the E13B font commonly found on checks (cheques), International Bank Account Numbers (IBANs) in banking documents, and long words such as email addresses.

## **NLP on SageMaker**

**[Amazon SageMaker now supports deploying large models through configurable volume size and timeout quotas](https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-sagemaker-deploying-large-models-volume-size-timeout-quotas/)**. [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) Real-time and Asynchronous Inference can now deploy large models (up to 500 GB) by configuring the maximum Amazon EBS volume size and timeout quotas. This launch lets customers use SageMaker's fully managed real-time and asynchronous inference capabilities to deploy and manage large ML models such as variants of GPT and OPT. Launch blog post: [Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference](https://aws.amazon.com/blogs/machine-learning/deploy-large-models-on-amazon-sagemaker-using-djlserving-and-deepspeed-model-parallel-inference/).

![tf2q87b6dtf069z47jx5.jpg](https://dev-media.amazoncloud.cn/4c551ffd4d8e40a99522ec25b514e8b8_tf2q87b6dtf069z47jx5.jpg)

**[Amazon SageMaker Model Monitor: A system for real-time insights into deployed machine learning models](https://www.amazon.science/publications/amazon-sagemaker-model-monitor-a-system-for-real-time-insights-into-deployed-machine-learning-models)**. This paper describes SageMaker Model Monitor, a fully managed service that continuously monitors the quality of machine learning models hosted on [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail).
It automatically detects data, concept, bias, and feature attribution drift in models in real time and raises alerts so that model owners can take corrective action and keep model quality high.

NLP encoders such as word2vec, BERT, and RoBERTa are used in a wide array of applications, from chatbots and virtual assistants to machine translation and text summarization. These encoders convert input words or word sequences into (contextual or non-contextual) word-level embeddings, which downstream task-specific models then consume. A shift in the distribution of the input text can therefore directly affect the performance of the downstream model.

However, monitoring text data is challenging because of its unstructured nature: unlike tabular data, which is usually fixed-dimensional and bounded, text is free-form. To get around this, [SageMaker Model Monitor](https://aws.amazon.com/sagemaker/model-monitor/) can be applied to the embeddings of the text rather than to the raw text itself. The following figure shows an example of configuring a custom monitoring schedule to detect drift in text data.

![gcpd858a4mpe392mlecw.png](https://dev-media.amazoncloud.cn/b18ed37be37e43fcaf224e986c6b55b9_gcpd858a4mpe392mlecw.png)

## **NLP @ Community**

### **Whisper**

Last week, [OpenAI](https://openai.com/) released and open-sourced a neural network called [Whisper](https://openai.com/blog/whisper/) that approaches human-level robustness and accuracy on English speech recognition. This automatic speech recognition (ASR) system was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Using such a large and diverse dataset improves robustness to accents, background noise, and technical language.
Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.

Launch content: [Blog](https://openai.com/blog/whisper/), [Paper](https://cdn.openai.com/papers/whisper.pdf), [Code](https://github.com/openai/whisper), [Model card](https://github.com/openai/whisper/blob/main/model-card.md)

![j246mstd7hosxfqu09gc.png](https://dev-media.amazoncloud.cn/c4e717966f0649159d31e0ce51ae0f2c_j246mstd7hosxfqu09gc.png)

## **NLP Posts from Amazon Web Services Machine Learning Blog**

### **[Improve transcription accuracy of customer-agent calls with custom vocabulary in Amazon Transcribe](https://aws.amazon.com/blogs/machine-learning/improve-transcription-accuracy-of-customer-agent-calls-with-custom-vocabulary-in-amazon-transcribe/)**

In many countries, such as India, English is not the primary language of communication. Indian customer conversations often mix regional languages such as Hindi with English words and phrases scattered throughout the calls. The source media files can also contain proper nouns, domain-specific acronyms, words, or phrases that the default [Amazon Transcribe](https://aws.amazon.com/cn/transcribe/?trk=cndc-detail) model isn’t aware of.
As a result, transcriptions of such media files can contain misspellings of those words. This post demonstrates how you can give [Amazon Transcribe](https://aws.amazon.com/cn/transcribe/?trk=cndc-detail) more information through [custom vocabularies](https://docs.aws.amazon.com/transcribe/latest/dg/custom-vocabulary.html#create-vocabulary-table), improving how it transcribes business-specific terminology in your audio files.

### **[Get better insight from reviews using Amazon Comprehend](https://aws.amazon.com/blogs/machine-learning/get-better-insight-from-reviews-using-amazon-comprehend/)**

Statistics show that the majority of shoppers use reviews to decide which products to buy and which services to use. According to the [Spiegel Research Center](https://spiegel.medill.northwestern.edu/how-online-reviews-influence-sales/), the purchase likelihood for a product with five reviews is 270% greater than for a product with no reviews.
Reviews have the power to influence consumer decisions and strengthen brand value. This post shows how to use [Amazon Comprehend](https://aws.amazon.com/cn/comprehend/?trk=cndc-detail) to extract meaningful information from product reviews, analyze how users from different demographics react to products, and discover aggregated information on user affinity toward a product.

![g64wqme29abeuv4p5a13.png](https://dev-media.amazoncloud.cn/da889bff50bc4253b5c18717b0f0f24f_g64wqme29abeuv4p5a13.png)

### **[Read webpages and highlight content using Amazon Polly](https://aws.amazon.com/blogs/machine-learning/read-webpages-and-highlight-content-using-amazon-polly/)**

This post demonstrates how to use [Amazon Polly](https://aws.amazon.com/cn/polly/?trk=cndc-detail), a leading cloud service that converts text into lifelike speech, to read the content of a webpage aloud and highlight the content as it’s being read. Adding audio playback to a webpage improves its accessibility and the visitor experience. Audio-enhanced content is more impactful and memorable, draws more traffic to the page, and taps into the spending power of visitors. It can also strengthen the brand of the company or organization that publishes the page.

![y65ylth55ij5qbngdgu2.jpg](https://dev-media.amazoncloud.cn/103304d74ce2497381091341792e3e29_y65ylth55ij5qbngdgu2.jpg)

### **[Discover insights from Zendesk with Amazon Kendra intelligent search](https://aws.amazon.com/blogs/machine-learning/discover-insights-from-zendesk-with-amazon-kendra-intelligent-search/)**

Now you can use the [Amazon Kendra](https://aws.amazon.com/cn/kendra/?trk=cndc-detail) [Zendesk connector](https://www.zendesk.com/service/) to index your Zendesk service tickets, help guides, and community posts, and perform intelligent search powered by machine learning (ML).
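
Once the connector has synced Zendesk content into an index, that content is searched through the Kendra `Query` API. The following minimal Python sketch uses boto3 and assumes an existing index and configured AWS credentials; `query_index` and `summarize_results` are hypothetical helper names. The helper flattens each entry of the response's `ResultItems` list (its `Type`, `DocumentTitle`, `DocumentExcerpt`, and `DocumentURI`) into a plain record.

```python
# Minimal sketch: query an Amazon Kendra index (for example, one populated by the
# Zendesk connector) and flatten the results.
# Assumes boto3 is installed, AWS credentials are configured, and the index exists.
# query_index and summarize_results are hypothetical helper names.
from typing import Any, Dict, List


def query_index(index_id: str, question: str) -> Dict[str, Any]:
    """Run a natural language query against the index via the Kendra Query API."""
    import boto3  # imported here so summarize_results works without boto3 installed

    kendra = boto3.client("kendra")
    return kendra.query(IndexId=index_id, QueryText=question)


def summarize_results(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Reduce each ResultItems entry to its type, title, excerpt, and URI.
    Missing fields become empty strings."""
    summary: List[Dict[str, str]] = []
    for item in response.get("ResultItems", []):
        summary.append(
            {
                "type": item.get("Type", ""),  # e.g. ANSWER, QUESTION_ANSWER, DOCUMENT
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
                "uri": item.get("DocumentURI", ""),
            }
        )
    return summary
```

A call such as `summarize_results(query_index("<index-id>", "How do I reset my password?"))` would list the matching tickets, guides, and posts; the index ID and question are placeholders.
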
Amazon Kendra answers natural language queries using advanced natural language processing (NLP) techniques, and it can learn from your Zendesk data, extracting meaning and context. This post shows how to configure the [Amazon Kendra](https://aws.amazon.com/cn/kendra/?trk=cndc-detail) Zendesk connector to index your Zendesk domain and take advantage of Amazon Kendra intelligent search.

### **[Choose the k-NN algorithm for your billion-scale use case with OpenSearch](https://aws.amazon.com/blogs/big-data/choose-the-k-nn-algorithm-for-your-billion-scale-use-case-with-opensearch/)**

When organizations build machine learning (ML) applications such as natural language processing (NLP) systems, recommendation engines, or search-based systems, k-nearest neighbor (k-NN) search is often used at some point in the workflow. As the number of data points grows into the hundreds of millions or even billions, scaling a k-NN search system can be a major challenge. Approximate nearest neighbor (ANN) search, which trades a small amount of accuracy for much faster queries, is a common way to overcome this challenge. The OpenSearch k-NN plugin makes some of these ANN algorithms available within an [OpenSearch](https://opensearch.org/) cluster. This post presents the supported algorithms and runs experiments that illustrate some of the trade-offs between them.

![fl7qxyebfw0z6sbbyh6u.jpg](https://dev-media.amazoncloud.cn/122f9a286be349068f775d6c1ca6683c_fl7qxyebfw0z6sbbyh6u.jpg)

## **Stay in touch with NLP on Amazon Web Services**

Our contact: [aws-nlp@amazon.com](mailto:aws-nlp@amazon.com)

Email us about (1) your awesome NLP projects on Amazon Web Services, (2) which post in the newsletter helped your NLP journey, or (3) anything else you would like us to cover in the newsletter.
Talk to you soon.\n\n\n\n","render":"<p>Hello world. This is the September 2022 edition of the Amazon Web Services Natural Language Processing(NLP) newsletter covering everything related to NLP at Amazon Web Services. Feel free to leave comments &amp; share it on your social network.</p>\n<h2><a id=\\"Amazon_Web_Services_NLP_Summit_4\\"></a><strong>Amazon Web Services NLP Summit</strong></h2>\\n<p>Only one week to the first ever <ins><a href=\\"https://awsnlpsummit.splashthat.com/\\" target=\\"_blank\\">Amazon Web Services NLP Summit 2022</a></ins> in London! We are so excited about this and the team has put in a huge amount of effort to make sure that the agenda is packed with interesting speakers, sessions, and workshops.</p>\n<p>The Amazon Web Services NLP Summit 2022 at London, UK, features inspiring keynotes by:<br />\\n<ins><a href=\\"https://www.linkedin.com/in/craigjsaunders/\\" target=\\"_blank\\">Craig Saunders</a></ins> Director of Machine Learning at Alexa AI<br />\\n<ins><a href=\\"https://www.linkedin.com/in/vasi-philomin/\\" target=\\"_blank\\">Vasi Philomin</a></ins> Vice President of Amazon Web Services AI<br />\\n<ins><a href=\\"https://www.linkedin.com/in/satish-lakshmanan-1a6b3a/\\" target=\\"_blank\\">Satish Lakshmanan</a></ins> Director of Amazon Web Services AI/ML Worldwide Specialists</p>\n<p>Not only do you get to interact with and hear from the most innovative minds in the industry, you can also try it in action with hands-on workshops, inspiring breakouts, and learn from expert panels of inventors, scientists, and pioneering startups that are shaping key NLP trends in Amazon Web Services today. 
The summit will host 25 sessions covering cutting-edge innovation, the hottest research and building enterprise NLP applications at scale for popular use cases across industries.</p>\n<p>When: October 5 and 6, 2022<br />\\nWhere: 1 Principal Place, Worship Street, LONDON, EC2A 2FA<br />\\nFormat: In-person only</p>\n<p>Use <ins><a href=\\"https://awsnlpsummit.splashthat.com/\\" target=\\"_blank\\">this link</a></ins> to register and save your spots.</p>\n<h2><a id=\\"NLPAmazon_Web_Services_Customer_Success_Stories_24\\"></a><strong>NLP@Amazon Web Services Customer Success Stories</strong></h2>\\n<p><strong>Savana</strong> - <ins><a href=\\"https://aws.amazon.com/solutions/case-studies/savana-natural-language-case-study/\\" target=\\"_blank\\">Builds World’s Only Natural Language Processing Clinical Research Network on Amazon Web Services</a></ins><br />\\nMadrid-based <ins><a href=\\"https://savanamed.com/\\" target=\\"_blank\\">Savana</a></ins> helps healthcare providers to unlock the value of their electronic medical records (EMRs) for research purposes. It combines research-grade methodology with natural language processing (NLP) and predictive analytics to obtain relevant results for healthcare and life science providers investigating disease prediction and treatment.</p>\n<p><em>“Amazon Web Services and Savana work together as trusted cloud service providers and our customers benefit from that trust as well. 
Hospital IT departments are realizing that the cloud is safer than the traditional on-premises approach.” – said Jorge Tello, Chief Executive Officer at Savana</em></p>\\n<p><strong>HM Land Registry</strong> - <ins><a href=\\"https://aws.amazon.com/solutions/case-studies/hm-land-registry/?did=cr_card&amp;trk=cr_card\\" target=\\"_blank\\">Cuts Document Review Time in Half Using Amazon Textract</a></ins><br />\\nThe UK government agency’s caseworkers need to review complex legal documents manually, which was time consuming, so <ins><a href=\\"https://www.gov.uk/land-registry\\" target=\\"_blank\\">HMLR</a></ins> and <ins><a href=\\"https://www.kainos.com/\\" target=\\"_blank\\">Kainos</a></ins> built a solution powered by Amazon Textract and Amazon Comprehend that automatically compares documents and flags discrepancies for review. Now, caseworkers no longer need to review thousands of documents per week, and the agency can approve property transfers faster.</p>\n<p>“This project has successfully shown how we can use artificial intelligence and machine learning and, more importantly, how we can strengthen that capability as we progress through our digital transformation. 
Using Amazon Web Services, we can improve our processes and enhance how our employees work” – said <ins><a href=\\"https://www.linkedin.com/in/nick-davies-279b9572/\\" target=\\"_blank\\">Nick Davies</a></ins>, Senior Product Manager for Internal Digital Services at HM Land Registry</p>\n<p><strong>State Auto</strong> - <ins><a href=\\"https://aws.amazon.com/solutions/case-studies/state-auto/?did=cr_card&amp;trk=cr_card\\" target=\\"_blank\\">Improves Processes across the Life Cycle Using Amazon Web Services Machine Learning, Computer Vision, and Serverless Architecture</a></ins><br />\\n<ins><a href=\\"https://www.stateauto.com/\\" target=\\"_blank\\">State Automobile Mutual Insurance Company</a></ins> (State Auto) began using technology to help its customer service representatives (CSRs) meet quality and customer satisfaction score goals when it built its SA360 solution on Amazon Web Services (Amazon Web Services). By using data-fueled insights and making these insights available to customers and CSRs, State Auto was able to build a better service experience, redirecting typical customer calls to self-service channels so that CSRs were able to focus on those customers with more complex needs.</p>\n<p><em>“Because Amazon Web Services services do their job so well out of the box, we have the flexibility to be creative and build things on top of them.” – said <ins><a href=\\"https://www.linkedin.com/in/uthra-ramanujam-2907b0/\\" target=\\"_blank\\">Uthra Ramanujam</a></ins>, Vice President of Strategic Technology Research at State Automobile Mutual Insurance Company</em></p>\\n<h2><a id=\\"AI_Language_Services_42\\"></a><strong>AI Language Services</strong></h2>\\n<p><strong><ins><a href=\\"https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-visual-conversation-builder/\\" target=\\"_blank\\">Amazon Lex introduces Visual Conversation builder</a></ins>,</strong><br />\\na drag and drop interface to visualize and build conversation flows in a no-code 
environment. The Visual Conversation Builder greatly simplifies bot design. In addition to the already available menu-based editor, and Lex APIs, the visual builder provides a complete view of the entire conversation flow in one location. It empowers any user to build engaging conversational experiences more quickly. Launch <ins><a href=\\"https://aws.amazon.com/blogs/machine-learning/announcing-visual-conversation-builder-for-amazon-lex/\\" target=\\"_blank\\">blog post</a></ins>.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/12660dae30bc4997b937c503238c5abf_wjujomyovtlo60vqmslz.jpg\\" alt=\\"wjujomyovtlo60vqmslz.jpg\\" /></p>\n<p><strong><ins><a href=\\"https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-lex-composite-slot-type/\\" target=\\"_blank\\">Amazon Lex introduces the composite slot type</a></ins></strong>. A slot is used to capture user input and provide the bot the necessary information to fulfil a task. In some cases, the information contains multiple values, each requiring its own slot. With the composite slot type, [Amazon Lex](https://aws.amazon.com/cn/lex/?trk=cndc-detail) can capture the full user response at once and associate each piece of information with the appropriate slot.</p>\\n<p><strong><ins><a href=\\"https://aws.amazon.com/about-aws/whats-new/2022/09/aws-comprehend-supports-synchronous-processing-targeted-sentiment/\\" target=\\"_blank\\">Amazon Web Services Comprehend now supports synchronous processing for targeted sentiment</a></ins></strong>. [Amazon Comprehend](https://aws.amazon.com/cn/comprehend/?trk=cndc-detail) customers can now extract the sentiments associated with entities from text documents in real-time using the newly released synchronous API. Targeted Sentiment synchronous API enables customers to derive granular sentiments associated with specific entities of interest such as brands or products without waiting for batch processing. 
Launch blog post.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/63ca372fe0de42e2b673eaaf292b3a39_hnlheisq70z9wob11ut0.png\\" alt=\\"hnlheisq70z9wob11ut0.png\\" /></p>\n<p><strong><ins><a href=\\"https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-textract-updates-text-extraction-feature/\\" target=\\"_blank\\">Amazon Textract announces updates to the text extraction feature</a></ins></strong>. [Amazon Textract](https://aws.amazon.com/cn/textract/?trk=cndc-detail) is a machine learning service that automatically extracts text, handwriting, and data from any document or image. We are pleased to announce quality enhancements to our text extraction feature available via the DetectDocumentText API. The latest Text detection models available via the DetectDocumentText API now provide improvements to word and line extraction accuracy and specifically for E13B fonts commonly found in checks/cheques, International Bank Account Numbers found in banking documents, and long words (e.g., email addresses).</p>\\n<h2><a id=\\"NLP_on_SageMaker_57\\"></a><strong>NLP on SageMaker</strong></h2>\\n<p><strong><ins><a href=\\"https://aws.amazon.com/about-aws/whats-new/2022/09/amazon-sagemaker-deploying-large-models-volume-size-timeout-quotas/\\" target=\\"_blank\\">Amazon SageMaker now supports deploying large models through configurable volume size and timeout quotas</a></ins></strong>. [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail)’s Real-time and Asynchronous Inference options can now deploy large models (up to 500GB) for inference by configuring the maximum EBS volume size and timeout quotas. This launch enables customers to leverage SageMaker’s fully managed Real-time and Asynchronous inference capabilities to deploy and manage large ML models such as variants of GPT and OPT. 
Launch blog post: <ins><a href=\\"https://aws.amazon.com/blogs/machine-learning/deploy-large-models-on-amazon-sagemaker-using-djlserving-and-deepspeed-model-parallel-inference/\\" target=\\"_blank\\">Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference.</a></ins></p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/4c551ffd4d8e40a99522ec25b514e8b8_tf2q87b6dtf069z47jx5.jpg\\" alt=\\"tf2q87b6dtf069z47jx5.jpg\\" /></p>\n<p><strong><ins><a href=\\"https://www.amazon.science/publications/amazon-sagemaker-model-monitor-a-system-for-real-time-insights-into-deployed-machine-learning-models\\" target=\\"_blank\\">Amazon SageMaker Model Monitor: A system for real-time insights into deployed machine learning models</a></ins></strong>, is a fully managed service that continuously monitors the quality of machine learning models hosted on [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail). It automatically detects data, concept, bias, and feature attribution drift in models in real-time and provides alerts so that model owners can take corrective actions and thereby maintain high quality models.</p>\\n<p>NLP encoders like word-2vec, BERT and RoBERTa are used in a wide array of applications, ranging from chat bots and virtual assistants, to machine translation and text summarization. These encoders operate by converting input words or sequences of words into (contextual or non-contextual) word-level embeddings. These embeddings are then used by downstream task-specific models. A change in the distribution of the input text can clearly impact the performance of the downstream model.</p>\n<p>However, due to specific structure of text, monitoring text data is a challenging task. Unlike tabular data which is often fixed-dimensional and bounded, text data is often free form. 
To overcome these issues, <ins><a href=\\"https://aws.amazon.com/sagemaker/model-monitor/\\" target=\\"_blank\\">SageMaker Model Monitor</a></ins> can be used on the embeddings of the text data as opposed to the raw text itself. The following figure shows an example of configuring a custom monitoring schedule to detect drifts in text data.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/b18ed37be37e43fcaf224e986c6b55b9_gcpd858a4mpe392mlecw.png\\" alt=\\"gcpd858a4mpe392mlecw.png\\" /></p>\n<h1><a id=\\"NLP__Community_71\\"></a><strong>NLP @ Community</strong></h1>\\n<h3><a id=\\"Whisper_73\\"></a><strong>Whisper</strong></h3>\\n<p>Last week <ins><a href=\\"https://openai.com/\\" target=\\"_blank\\">OpenAI</a></ins> released an open-sourcing neural net called <ins><a href=\\"https://openai.com/blog/whisper/\\" target=\\"_blank\\">Whisper</a></ins> that approaches human level robustness and accuracy on English speech recognition. This automatic speech recognition (ASR) system was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. 
Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.<br />\\nLaunch content [<ins><a href=\\"https://openai.com/blog/whisper/\\" target=\\"_blank\\">Blog</a></ins>] [<ins><a href=\\"https://cdn.openai.com/papers/whisper.pdf\\" target=\\"_blank\\">Paper</a></ins>] [<ins><a href=\\"https://github.com/openai/whisper\\" target=\\"_blank\\">Code</a></ins>] [<ins><a href=\\"https://github.com/openai/whisper/blob/main/model-card.md\\" target=\\"_blank\\">Model card</a></ins>]</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/c4e717966f0649159d31e0ce51ae0f2c_j246mstd7hosxfqu09gc.png\\" alt=\\"j246mstd7hosxfqu09gc.png\\" /></p>\n<h2><a id=\\"NLP_Posts_from_Amazon_Web_Services_Machine_Learning_Blog_80\\"></a><strong>NLP Posts from Amazon Web Services Machine Learning Blog</strong></h2>\\n<p><ins><a href=\\"https://aws.amazon.com/blogs/machine-learning/improve-transcription-accuracy-of-customer-agent-calls-with-custom-vocabulary-in-amazon-transcribe/\\" target=\\"_blank\\"><strong>Improve transcription accuracy of customer-agent calls with custom vocabulary in Amazon Transcribe</strong></a></ins><br />\\nIn many countries, such as India, English is not the primary language of communication. Indian customer conversations contain regional languages like Hindi, with English words and phrases spoken randomly throughout the calls. In the source media files, there can be proper nouns, domain-specific acronyms, words, or phrases that the default Amazon Transcribe model isn’t aware of. 
Transcriptions of such media files can misspell those words.<br />\\nThis post demonstrates how to use <ins><a href=\\"https://docs.aws.amazon.com/transcribe/latest/dg/custom-vocabulary.html#create-vocabulary-table\\" target=\\"_blank\\">custom vocabularies</a></ins> to teach Amazon Transcribe the business-specific terminology in your audio files so that it transcribes those terms correctly.</p>\n<h3><a id=\\"Get_better_insight_from_reviews_using_Amazon_Comprehendhttpsawsamazoncomblogsmachinelearninggetbetterinsightfromreviewsusingamazoncomprehend_86\\"></a><ins><a href=\\"https://aws.amazon.com/blogs/machine-learning/get-better-insight-from-reviews-using-amazon-comprehend/\\" target=\\"_blank\\"><strong>Get better insight from reviews using Amazon Comprehend</strong></a></ins></h3>\\n<p>Statistics show that the majority of shoppers use reviews to decide which products to buy and which services to use. According to the <ins><a href=\\"https://spiegel.medill.northwestern.edu/how-online-reviews-influence-sales/\\" target=\\"_blank\\">Spiegel Research Centre</a></ins>, the purchase likelihood for a product with five reviews is 270% greater than that of a product with no reviews. 
Reviews have the power to influence consumer decisions and strengthen brand value.<br />\\nThis post shows how to use Amazon Comprehend to extract meaningful information from product reviews, analyze how users in different demographic groups react to products, and surface aggregated insights into user affinity toward a product.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/da889bff50bc4253b5c18717b0f0f24f_g64wqme29abeuv4p5a13.png\\" alt=\\"g64wqme29abeuv4p5a13.png\\" /></p>\n<h3><a id=\\"Read_webpages_and_highlight_content_using_Amazon_Pollyhttpsawsamazoncomblogsmachinelearningreadwebpagesandhighlightcontentusingamazonpolly_93\\"></a><strong><ins><a href=\\"https://aws.amazon.com/blogs/machine-learning/read-webpages-and-highlight-content-using-amazon-polly/\\" target=\\"_blank\\">Read webpages and highlight content using Amazon Polly</a></ins></strong></h3>\\n<p>This post demonstrates how to use Amazon Polly, a leading cloud service that converts text into lifelike speech, to read the content of a webpage aloud and highlight the text as it is being read. Adding audio playback to a webpage improves its accessibility and the visitor experience. Audio-enhanced content is more impactful and memorable, draws more traffic to the page, and taps into the spending power of visitors. 
It also strengthens the brand of the company or organization that publishes the page.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/103304d74ce2497381091341792e3e29_y65ylth55ij5qbngdgu2.jpg\\" alt=\\"y65ylth55ij5qbngdgu2.jpg\\" /></p>\n<h3><a id=\\"Discover_insights_from_Zendesk_with_Amazon_Kendra_intelligent_searchhttpsawsamazoncomblogsmachinelearningdiscoverinsightsfromzendeskwithamazonkendraintelligentsearch_98\\"></a><strong><ins><a href=\\"https://aws.amazon.com/blogs/machine-learning/discover-insights-from-zendesk-with-amazon-kendra-intelligent-search/\\" target=\\"_blank\\">Discover insights from Zendesk with Amazon Kendra intelligent search</a></ins></strong></h3>\\n<p>You can now use the Amazon Kendra <ins><a href=\\"https://www.zendesk.com/service/\\" target=\\"_blank\\">Zendesk connector</a></ins> to index your Zendesk service tickets, help guides, and community posts, and then run intelligent, machine learning (ML)-powered searches over them. Amazon Kendra answers natural language queries using advanced natural language processing (NLP) techniques. 
It learns from your Zendesk data, extracting meaning and context.<br />\\nThis post shows how to configure the Amazon Kendra Zendesk connector to index your Zendesk domain and take advantage of Amazon Kendra intelligent search.</p>\n<h3><a id=\\"Choose_the_kNN_algorithm_for_your_billionscale_use_case_with_OpenSearchhttpsawsamazoncomblogsbigdatachoosetheknnalgorithmforyourbillionscaleusecasewithopensearch_103\\"></a><strong><ins><a href=\\"https://aws.amazon.com/blogs/big-data/choose-the-k-nn-algorithm-for-your-billion-scale-use-case-with-opensearch/\\" target=\\"_blank\\">Choose the k-NN algorithm for your billion-scale use case with OpenSearch</a></ins></strong></h3>\\n<p>When organizations build machine learning (ML) applications such as natural language processing (NLP) systems, recommendation engines, or search-based systems, k-nearest neighbor (k-NN) search is often used at some point in the workflow. As the number of data points reaches hundreds of millions or even billions, scaling a k-NN search system becomes a major challenge, because exact search must compare each query against every stored vector. Approximate nearest neighbor (ANN) search addresses this challenge by trading a small amount of accuracy for much faster queries.<br />\\nThe OpenSearch k-NN plugin lets you use several of these ANN algorithms within an <ins><a href=\\"https://opensearch.org/\\" target=\\"_blank\\">OpenSearch</a></ins> cluster. 
This post presents the supported algorithms and runs experiments that illustrate the trade-offs between them.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/122f9a286be349068f775d6c1ca6683c_fl7qxyebfw0z6sbbyh6u.jpg\\" alt=\\"fl7qxyebfw0z6sbbyh6u.jpg\\" /></p>\n<h2><a id=\\"Stay_in_touch_with_NLP_on_Amazon_Web_Services_109\\"></a><strong>Stay in touch with NLP on Amazon Web Services</strong></h2>\\n<p>Our contact: <ins><a href=\\"mailto:aws-nlp@amazon.com\\" target=\\"_blank\\">aws-nlp@amazon.com</a></ins><br />\\nEmail us about (1) your awesome NLP project on Amazon Web Services, (2) which post in the newsletter helped your NLP journey, or (3) anything else you would like us to cover in the newsletter. Talk to you soon.</p>\n"}