A quick guide to Amazon’s 50-plus ICASSP papers

{"value":"Amazon researchers have more than 50 papers at this year’s International Conference on Acoustics, Speech, and Signal Processing (++[ICASSP](https://www.amazon.science/conferences-and-events/icassp-2022)++). A plurality of them are on automatic speech recognition and related topics, such as keyword spotting and speaker identification. But others range farther afield, to topics such as computer vision and federated learning.\n\n![image.png](https://dev-media.amazoncloud.cn/fb85afc72fca44acae450b85b99bdb01_image.png)\n\nThis year's ICASSP includes a virtual component, from May 7 to 13, and an in-person component in Singapore, May 22 to 27.\n\n#### **Acoustic-event detection**\n\n\n++[Federated self-supervised learning for acoustic event classification](https://www.amazon.science/publications/federated-self-supervised-learning-for-acoustic-event-classification)++\nMeng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang\n\n++[Improved representation learning for acoustic event classification using tree-structured ontology](https://www.amazon.science/publications/improved-representation-learning-for-acoustic-event-classification-using-tree-structured-ontology)++\nArman Zharmagambetov, Qingming Tang, Chieh-Chi Kao, Qin Zhang, Ming Sun, Viktor Rozgic, Jasha Droppo, Chao Wang\n\n++[WikiTAG: Wikipedia-based knowledge embeddings towards improved acoustic event classification](https://www.amazon.science/publications/wikitag-wikipedia-based-knowledge-embeddings-towards-improved-acoustic-event-classification)++\nQin Zhang, Qingming Tang, Chieh-Chi Kao, Ming Sun, Yang Liu, Chao Wang\n\n#### **Automatic speech recognition**\n\n\n++[A likelihood ratio-based domain adaptation method for end-to-end models](https://www.amazon.science/publications/a-likelihood-ratio-based-domain-adaptation-method-for-end-to-end-models)++\nChhavi Choudhury, Ankur Gandhe, Xiaohan Ding, Ivan Bulyko\n\n++[Being greedy does not hurt: Sampling strategies for end-to-end speech recognition](https://www.amazon.science/publications/being-greedy-does-note-hurt-sampling-strategies-for-end-to-end-speech-recognition)++\nJahn Heymann, Egor Lakomkin, Leif RādellJahn Heymann, Egor Lakomkin, Leif RādelJahn Heymann, Egor Lakomkin, Leif RādelJahn Heymann, Egor Lakomkin, Leif Rādel\n\n++[Caching networks: Capitalizing on common speech for ASR](https://www.amazon.science/publications/caching-networks-capitalizing-on-common-speech-for-asr)++\nAnastasios Alexandridis, Grant P. Strimel, Ariya Rastrow, Pavel Kveton, Jon Webb, Maurizio Omologo, Siegfried Kunzmann, Athanasios Mouchtaris\n\n![image.png](https://dev-media.amazoncloud.cn/053218358bc94e05ac9d2581cb8c9bfa_image.png)\n\nIn \"LATTENTION: Lattice attention in ASR rescoring\", Amazon researchers show that applying an attention mechanism (colored grid) to a lattice encoding multiple automatic-speech-recognition (ASR) hypotheses improves ASR performance.\n\n++[Contextual adapters for personalized speech recognition in neural transducers](https://www.amazon.science/publications/contextual-adapters-for-personalized-speech-recognition-in-neural-transducers)++\nKanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. 
[Contextual adapters for personalized speech recognition in neural transducers](https://www.amazon.science/publications/contextual-adapters-for-personalized-speech-recognition-in-neural-transducers)
Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann

[LATTENTION: Lattice attention in ASR rescoring](https://www.amazon.science/publications/lattention-lattice-attention-in-asr-rescoring)
Prabhat Pandey, Sergio Duarte Torres, Ali Orkan Bayer, Ankur Gandhe, Volker Leutnant

[Listen, know and spell: Knowledge-infused subword modeling for improving ASR performance of out-of-vocabulary (OOV) named entities](https://www.amazon.science/publications/listen-know-and-spell-knowledge-infused-subword-modeling-for-improving-asr-performance-of-out-of-vocabulary-oov-named-entities)
Nilaksh Das, Monica Sunkara, Dhanush Bekal, Duen Horng Chau, Sravan Bodapati, Katrin Kirchhoff

![image.png](https://dev-media.amazoncloud.cn/1b7c9cd279fb4d6eb86a71ec1264618f_image.png)

In "Listen, know and spell: Knowledge-infused subword modeling for improving ASR performance of OOV named entities", Amazon researchers show how to improve automatic speech recognition by incorporating information from knowledge graphs into the processing pipeline.
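The paper infuses knowledge at the subword-modeling level; the toy sketch below shows only a simpler, related idea: rewarding n-best hypotheses that spell entity names found in a knowledge graph at rescoring time. The entity list, scores, and bonus weight are all made up for illustration.

```python
KG_ENTITIES = {"santana row", "san jose"}  # hypothetical knowledge-graph entity names

def rerank(nbest, bonus=2.0):
    """nbest: list of (hypothesis_text, first_pass_log_score) pairs."""
    def score(item):
        text, base = item
        hits = sum(entity in text.lower() for entity in KG_ENTITIES)
        return base + bonus * hits  # reward hypotheses that spell known entities
    return sorted(nbest, key=score, reverse=True)

nbest = [("directions to santa now", -3.1), ("directions to santana row", -3.4)]
print(rerank(nbest)[0][0])  # -> "directions to santana row"
```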
[Mitigating closed-model adversarial examples with Bayesian neural modeling for enhanced end-to-end speech recognition](https://www.amazon.science/publications/mitigating-closed-model-adversarial-examples-with-bayesian-neural-modeling-for-enhanced-end-to-end-speech-recognition)
Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko

[Multi-modal pre-training for automated speech recognition](https://www.amazon.science/publications/multi-modal-pre-training-for-automated-speech-recognition)
David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

[Multi-turn RNN-T for streaming recognition of multi-party speech](https://www.amazon.science/publications/multi-turn-rnn-t-for-streaming-recognition-of-multi-party-speech)
Ilya Sklyar, Anna Piunova, Xianrui Zheng, Yulan Liu

[RescoreBERT: Discriminative speech recognition rescoring with BERT](https://www.amazon.science/publications/rescorebert-discriminative-speech-recognition-rescoring-with-bert)
Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

[USTED: Improving ASR with a unified speech and text encoder-decoder](https://www.amazon.science/publications/usted-improving-asr-with-a-unified-speech-and-text-encoder-decoder)
Bolaji Yusuf, Ankur Gandhe, Alex Sokolov

[VADOI: Voice-activity-detection overlapping inference for end-to-end long-form speech recognition](https://www.amazon.science/publications/vadoi-voice-activity-detection-overlapping-inference-for-end-to-end-long-form-speech-recognition)
Jinhan Wang, Xiaosu Tong, Jinxi Guo, Di He, Roland Maas

![image.png](https://dev-media.amazoncloud.cn/77917b277c67456c913d06297ee6ab6d_image.png)

The model used in "Multi-turn RNN-T for streaming recognition of multi-party speech" to disentangle overlapping speech in multi-party automatic speech recognition.

#### **Computer vision**

[ASD-transformer: Efficient active speaker detection using self and multimodal transformers](https://www.amazon.science/publications/asd-transformer-efficient-active-speaker-detection-using-self-and-multimodal-transformers)
Gourav Datta, Tyler Etchart, Vivek Yadav, Varsha Hedau, Pradeep Natarajan, Shih-Fu Chang

[Dynamically pruning SegFormer for efficient semantic segmentation](https://www.amazon.science/publications/dynamically-pruning-segformer-for-efficient-semantic-segmentation)
Haoli Bai, Hongda Mao, Dinesh Nair

[Enhancing contrastive learning with temporal cognizance for audio-visual representation generation](https://www.amazon.science/publications/enhancing-contrastive-learning-with-temporal-cognizance-for-audio-visual-representation-generation)
Chandrashekhar Lavania, Shiva Sundaram, Sundararajan Srinivasan, Katrin Kirchhoff

[Few-shot gaze estimation with model offset predictors](https://www.amazon.science/publications/few-shot-gaze-estimation-with-model-offset-predictors)
Jiawei Ma, Xu Zhang, Yue Wu, Varsha Hedau, Shih-Fu Chang

[Visual representation learning with self-supervised attention for low-label high-data regime](https://www.amazon.science/publications/visual-representation-learning-with-self-supervised-attention-for-low-label-high-data-regime)
Prarthana Bhattacharyya, Chenge Li, Xiaonan Zhao, István Fehérvári, Jason Sun

#### **Federated learning**

[Federated learning challenges and opportunities: An outlook](https://www.amazon.science/publications/federated-learning-challenges-and-opportunities-an-outlook)
Jie Ding, Eric Tramel, Anit Kumar Sahu, Shuang Wu, Salman Avestimehr, Tao Zhang

![image.png](https://dev-media.amazoncloud.cn/212a7de334994eecb57b939fd99da790_image.png)

The federated-learning scenario considered in "[Federated learning challenges and opportunities: An outlook](https://www.amazon.science/publications/federated-learning-challenges-and-opportunities-an-outlook)".

[Learnings from federated learning in the real world](https://www.amazon.science/publications/learnings-from-federated-learning-in-the-real-world)
Christophe Dupuy, Tanya G. Roosta, Leo Long, Clement Chung, Rahul Gupta, Salman Avestimehr
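For readers new to the area, the sketch below is the textbook federated-averaging (FedAvg) loop that both papers above take as a starting point: clients compute updates on private data, and a server averages them weighted by data size. It is a generic illustration with invented data, not code from either paper.

```python
import numpy as np

def local_update(w, X, y, lr=0.1):
    """One gradient step of least-squares regression on a client's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg_round(w_global, clients):
    """Clients train locally; the server averages updates weighted by data size."""
    updates = [local_update(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()
    return sum(p * u for p, u in zip(weights, updates))

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):  # four clients, each with a private local dataset
    X = rng.normal(size=(30, 3))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=30)))

w = np.zeros(3)
for _ in range(50):  # communication rounds
    w = fedavg_round(w, clients)
print(np.round(w, 2))  # approaches w_true as rounds accumulate
```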
#### **Information retrieval**

[Contrastive knowledge graph attention network for request-based recipe recommendation](https://www.amazon.science/publications/contrastive-knowledge-graph-attention-network-for-request-based-recipe-recommendation)
Xiyao Ma, Zheng Gao, Qian Hu, Mohamed Abdelhady

#### **Keyword spotting**

[Unified speculation, detection, and verification keyword spotting](https://www.amazon.science/publications/unified-speculation-detection-and-verification-keyword-spotting)
Geng-shen Fu, Thibaud Senechal, Aaron Challenner, Tao Zhang

#### **Machine translation**

[Isometric MT: Neural machine translation for automatic dubbing](https://www.amazon.science/publications/isometric-mt-neural-machine-translation-for-automatic-dubbing)
Surafel Melaku Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

#### **Natural-language understanding**

[ADVIN: Automatically discovering novel domains and intents from user text utterances](https://www.amazon.science/publications/advin-automatically-discovering-novel-domains-and-intents-from-user-text-utterances)
Nikhita Vedula, Rahul Gupta, Aman Alok, Mukund Sridhar, Shankar Ananthakrishnan

[An efficient DP-SGD mechanism for large scale NLU models](https://www.amazon.science/publications/an-efficient-dp-sgd-mechanism-for-large-scale-nlu-models)
Christophe Dupuy, Radhika Arava, Rahul Gupta, Anna Rumshisky

#### **Paralinguistics**

[Confidence estimation for speech emotion recognition based on the relationship between emotion categories and primitives](https://www.amazon.science/publications/confidence-estimation-for-speech-emotion-recognition-based-on-the-relationship-between-emotion-categories-and-primitives)
Yang Li, Constantinos Papayiannis, Viktor Rozgic, Elizabeth Shriberg, Chao Wang

[Multi-lingual multi-task speech emotion recognition using wav2vec 2.0](https://www.amazon.science/publications/multi-lingual-multi-task-speech-emotion-recognition-using-wav2vec-2-0)
Mayank Sharma

[Representation learning through cross-modal conditional teacher-student training for speech emotion recognition](https://www.amazon.science/publications/representation-learning-through-cross-modal-conditional-teacher-student-training-for-speech-emotion-recognition)
Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff

[Sentiment-aware automatic speech recognition pre-training for enhanced speech emotion recognition](https://www.amazon.science/publications/sentiment-aware-automatic-speech-recognition-pre-training-for-enhanced-speech-emotion-recognition)
Ayoub Ghriss, Bo Yang, Viktor Rozgic, Elizabeth Shriberg, Chao Wang

#### **Personalization**

[Incremental user embedding modeling for personalized text classification](https://www.amazon.science/publications/incremental-user-embedding-modeling-for-personalized-text-classification)
Ruixue Lian, Che-Wei Huang, Yuqing Tang, Qilong Gu, Chengyuan Ma, Chenlei (Edward) Guo

#### **Signal processing**

[Deep adaptive AEC: Hybrid of deep learning and adaptive acoustic echo cancellation](https://www.amazon.science/publications/deep-adaptive-aec-hybrid-of-deep-learning-and-adaptive-acoustic-echo-cancellation)
Hao Zhang, Srivatsan Kandadai, Harsha Rao, Minje Kim, Tarun Pruthi, Trausti Kristjansson

[Improved singing voice separation with chromagram-based pitch-aware remixing](https://www.amazon.science/publications/improved-singing-voice-separation-with-chromagram-based-pitch-aware-remixing)
Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy

[Sparse recovery of acoustic waves](https://www.amazon.science/publications/sparse-recovery-of-acoustic-waves)
[Mohamed Mansour](https://www.amazon.science/author/mohamed-mansour)

[Upmixing via style transfer: A variational autoencoder for disentangling spatial images and musical content](https://www.amazon.science/publications/upmixing-via-style-transfer-a-variational-autoencoder-for-disentangling-spatial-images-and-musical-content)
Haici Yang, Sanna Wager, Spencer Russell, Mike Luo, Minje Kim, Wontak Kim

#### **Sound source localization**

[End-to-end Alexa device arbitration](https://www.amazon.science/publications/end-to-end-alexa-device-arbitration)
Jarred Barber, Yifeng Fan, Tao Zhang

#### **Speaker diarization/identification/verification**

[ASR-aware end-to-end neural diarization](https://www.amazon.science/publications/asr-aware-end-to-end-neural-diarization)
Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

[Improving fairness in speaker verification via group-adapted fusion network](https://www.amazon.science/publications/improving-fairness-in-speaker-verification-via-group-adapted-fusion-network)
Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke

[OpenFEAT: Improving speaker identification by open-set few-shot embedding adaptation with Transformer](https://www.amazon.science/publications/openfeat-improving-speaker-indentification-by-open-set-few-shot-embedding-adaptation-with-transformer)
Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee

[Self-supervised speaker recognition training using human-machine dialogues](https://www.amazon.science/publications/self-supervised-speaker-recognition-training-using-human-machine-dialogues)
Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow

[Self-supervised speaker verification with simple Siamese network and self-supervised regularization](https://www.amazon.science/publications/self-supervised-speaker-verification-with-simple-siamese-network-and-self-supervised-regularization)
Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

#### **Spoken-language understanding**

[A neural prosody encoder for end-to-end dialogue act classification](https://www.amazon.science/publications/a-neural-prosody-encoder-for-end-to-end-dialogue-act-classification)
Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Mueller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo

[Multi-task RNN-T with semantic decoder for streamable spoken language understanding](https://www.amazon.science/publications/multi-task-rnn-t-with-semantic-decoder-for-streamable-spoken-language-understanding)
Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra

[Tie your embeddings down: Cross-modal latent spaces for end-to-end spoken language understanding](https://www.amazon.science/publications/tie-your-embeddings-down-cross-modal-latent-spaces-for-end-to-end-spoken-language-understanding)
Bhuvan Agrawal, Markus Mueller, Samridhi Choudhary, Martin Radfar, Athanasios Mouchtaris, Ross McGowan, Nathan Susanj, Siegfried Kunzmann

[TINYS2I: A small-footprint utterance classification model with contextual support for on-device SLU](https://www.amazon.science/publications/tinys2i-a-small-footprint-utterance-classification-model-with-contextual-support-for-on-device-slu)
Anastasios Alexandridis, Kanthashree Mysore Sathyendra, Grant P. Strimel, Pavel Kveton, Jon Webb, Athanasios Mouchtaris

#### **Text-to-speech**

[Cross-speaker style transfer for text-to-speech using data augmentation](https://www.amazon.science/publications/cross-speaker-style-transfer-for-text-to-speech-using-data-augmentation)
Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Goeric Huybrechts, Adam Gabrys, Jaime Lorenzo-Trueba

[Distribution augmentation for low-resource expressive text-to-speech](https://www.amazon.science/publications/distribution-augmentation-for-low-resource-expressive-text-to-speech)
Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova

[Duration modeling of neural TTS for automatic dubbing](https://www.amazon.science/publications/duration-modeling-of-neural-tts-for-automatic-dubbing)
Johanes Effendi, Yogesh Virkar, Roberto Barra-Chicote, Marcello Federico

[Neural speech synthesis on a shoestring: Improving the efficiency of LPCNET](https://www.amazon.science/publications/neural-speech-synthesis-on-a-shoestring-improving-the-efficiency-of-lpcnet)
Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

[Text-free non-parallel many-to-many voice conversion using normalising flows](https://www.amazon.science/publications/text-free-non-parallel-many-to-many-voice-conversion-using-normalising-flows)
Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote, Daniel Korzekwa

[VoiceFilter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module](https://www.amazon.science/publications/voicefilter-few-shot-text-to-speech-speaker-adaptation-using-voice-conversion-as-a-post-processing-module)
Adam Gabrys, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

#### **Time series forecasting**

[Robust nonparametric distribution forecast with backtest-based bootstrap and adaptive residual selection](https://www.amazon.science/publications/robust-nonparametric-distribution-forecast-with-backtest-based-bootstrap-and-adaptive-residual-selection)
Longshaokan Marshall Wang, Lingda Wang, Mina Georgieva, Paulo Machado, Abinaya Ulagappa, Safwan Ahmed, Yan Lu, Arjun Bakshi, Farhad Ghassemi

ABOUT THE AUTHOR

#### **Staff writer**