# New Alexa features: Natural turn-taking

{"value":"*Today in Seattle, Dave Limp, Amazon’s senior vice president for devices, unveiled the latest lineup of products and services from his organization. During the presentation, Rohit Prasad, Amazon vice president and Alexa head scientist, described three new advances from the Alexa science team. One of those is natural turn-taking.*\n\nRead Alexa head scientist Rohit Prasad's overview of today's Alexa-related announcements [on Amazon's Day One blog](https://blog.aboutamazon.com/devices/ai-advances-make-alexa-more-natural-conversational-and-useful).\n\nAlexa’s natural-turn-taking feature — which we plan to launch next year — will let customers interact with Alexa more naturally, without the need to repeat the wake word. The feature’s AI will be able to recognize when users have finished speaking, when their speech is directed at the device, when it’s not, and whether or not a reply is expected.\n\nNatural turn-taking builds on Alexa’s Follow-Up Mode, which uses acoustic cues to distinguish device-directed and non-device-directed speech. It adds other cues, such as visual information from devices with cameras. On-device algorithms process images from the camera, inferring from speakers’ body positions whether they are likely to be addressing Alexa.\n\nThe output of the computer vision algorithms is combined with the output of Alexa’s existing [acoustic algorithm](https://www.amazon.science/blog/alexa-do-i-need-to-use-your-wake-word-how-about-now) for detecting device-directed speech and fed to an on-device fusion model, which determines device directedness. This approach can distinguish device-directed speech even when multiple speakers are interacting with each other and with Alexa.\n\n![image.png](https://dev-media.amazoncloud.cn/37e85c0f437740798ae5580a9b7769a8_image.png)\n\nNatural turn-taking fuses audio and visual information to reach a final judgment about device directedness.\n\nOne key to natural turn-taking is handling barge-ins, or customer interruptions of Alexa’s output speech. When a customer barges in with a new request (“show me Italian restaurants instead”), Alexa knows to stop speaking and proceed with processing the new request.\n\nIn some cases of barge-in, Alexa also needs to know how far she got in her output speech, as that information could be useful to the dialogue manager. We call this scenario contextual barge-in. If, for instance, Alexa is returning a list of options after a customer request, and the customer interrupts to say, “That one”, Alexa knows that “that one” refers to whatever option Alexa was reading at the time of the barge-in.\n\nThis feature uses the difference between the time stamps of the commencement of the interrupted speech and the interruption itself to determine how far into the speech to look for a referent for the customer’s utterance. That information is passed to the [Alexa Conversations](https://www.amazon.science/blog/science-innovations-power-alexa-conversations-dialogue-management) dialogue manager, where it is used in determining the proper response to the customer utterance.\n\nWhen natural turn-taking launches, we also plan to beta-test a feature known as user pacing. 
When natural turn-taking launches, we also plan to beta-test a feature known as user pacing. User pacing relies on several different signals to determine whether a customer has finished speaking and whether he or she needs any additional prompting.

Those signals include space fillers, such as “um” or “uh”; the lengthening of vowels, as in, “Let me seeee … ”; and incomplete utterances, such as “I think I’m going to go with”.
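A minimal, rule-based sketch of how such signals might be checked on a transcript appears below, assuming plain-text input. The production feature would presumably rely on learned acoustic and lexical models; the word lists, regex, and function name here are illustrative only.

```python
import re

FILLERS = {"um", "uh", "er", "hmm"}

def seems_unfinished(transcript: str) -> bool:
    """Rough heuristic for the user-pacing signals described above: trailing
    fillers, lengthened vowels, or an utterance that stops mid-phrase."""
    words = transcript.lower().strip().strip(".?!").split()
    if not words:
        return True
    # Space fillers such as "um" or "uh" at the end of the utterance.
    if words[-1] in FILLERS:
        return True
    # Lengthened vowels, e.g. "let me seeee".
    if re.search(r"([aeiou])\1{2,}", words[-1]):
        return True
    # Incomplete utterances that end on a function word, e.g. "... go with".
    if words[-1] in {"with", "to", "the", "a", "and", "or"}:
        return True
    return False

for utterance in ["Play some jazz",
                  "Let me seeee",
                  "I think I'm going to go with"]:
    print(utterance, "->", "keep listening" if seems_unfinished(utterance) else "done")
```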
We are also investigating new techniques for inferring device directedness from the speech signal. Earlier this year, for instance, we [reported](https://www.amazon.science/blog/how-alexa-knows-when-youre-talking-to-her) a method that uses syntactic and semantic characteristics of customer utterances as well as the [acoustic characteristics](https://www.amazon.science/blog/alexa-do-i-need-to-use-your-wake-word-how-about-now) already employed by Follow-Up Mode.
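To make the general idea of fusing per-modality directedness evidence concrete, here is a toy late-fusion scorer. It is not the on-device fusion model described above: the logistic form, the weights, and the function name are assumptions made purely for illustration.

```python
import math

def fuse_directedness(acoustic_score: float,
                      visual_score: float,
                      lexical_score: float,
                      weights=(2.0, 1.5, 1.0),
                      bias=-2.0) -> float:
    """Combine per-modality device-directedness scores (each in [0, 1]) into a
    single probability with a tiny logistic late-fusion model. The weights and
    bias are illustrative, not trained values."""
    scores = (acoustic_score, visual_score, lexical_score)
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

# A speaker facing the device, with speech that "sounds" device directed:
print(round(fuse_directedness(0.9, 0.8, 0.7), 2))
# The same acoustics, but the camera suggests the speaker is facing away:
print(round(fuse_directedness(0.9, 0.1, 0.7), 2))
```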
**More coverage of Alexa announcements**

- [Interactive teaching by customers](https://www.amazon.science/blog/new-alexa-features-interactive-teaching-by-customers)
- [Speaking style adaptation](https://www.amazon.science/blog/new-text-to-speech-generator-and-rephraser-move-alexa-toward-concept-to-speech)
- [The science behind Echo Show 10](https://www.amazon.science/blog/the-science-behind-echo-show-10)

ABOUT THE AUTHORS

#### **[Pradeep Natarajan](https://www.amazon.science/author/pradeep-natarajan)**

Pradeep Natarajan is a principal speech scientist in the Alexa AI organization.

#### **[Arindam Mandal](https://www.amazon.science/author/arindam-mandal)**

Arindam Mandal is the director of dialogue services in the Alexa Natural Understanding group.

#### **[Nikko Ström](https://www.amazon.science/author/nikko-strom)**

Nikko Ström is a vice president and distinguished scientist in the Alexa AI organization.