3 questions with Jeremy Holleman: How to design and develop ultra-low-power AI processors

{"value":"![image.png](https://dev-media.amazoncloud.cn/c37a8ed2e8914f57856da995583caba7_image.png)\n\nSyntiant's NDP architecture is built from the ground up to run deep learning algorithms. The company says its NDP101 neural decision processor achieves breakthrough performance by coupling computation and memory, and exploiting the inherent parallelism of deep learning and computing at only required numerical precision.\nCREDIT: SYNTIANT\n\nEditor’s Note: This article is the latest installment within a series Amazon Science is publishing related to the science behind products and services from companies in which Amazon has invested. Syntiant, founded in 2017, has shipped more than 10 million units to customers worldwide, and has obtained \$65 million in funding from leading technology companies, including the Amazon [Alexa Fund](https://developer.amazon.com/en-US/alexa/alexa-startups/alexa-fund).\\n\\nIn late July, Amazon held its [Alexa Live event](https://developer.amazon.com/en-US/alexa/alexa-live), where the company introduced more than 50 features to help developers and device makers build ambient voice-computing experiences, and drive the growth of voice computing.\\n\\nThe event included an Amazon Alexa Startups Showcase in which [Syntiant](https://www.syntiant.com/), a semiconductor company founded in 2017, and based in Irvine, California, shared its vision for making voice the computing interface of the future. \\n\\n In 2017, [Kurt Busch](https://www.linkedin.com/in/kfbusch/), Syntiant’s chief executive officer, and [Jeremy Holleman](https://scholar.google.com/citations?user=kWTef0wAAAAJ&hl=en), Syntiant’s chief scientist, and a professor of electrical and computer engineering at the University of North Carolina at Charlotte, were focused on finding an answer to the question: How do you optimize the performance of machine learning models on power- and cost-constrained hardware?\\n\\nAccording to Syntiant, they — and other members of Syntiant’s veteran management team — had the idea for a processor architecture that could deliver 200 times the efficiency, 20 times the performance, and at half the cost of existing edge processors. One key to their approach — optimizing for memory access versus traditional processors’ focus on logic.\\n\\n![image.png](https://dev-media.amazoncloud.cn/c99622766e214a2895b13de0663ee115_image.png)\\n\\nJeremy Holleman is Syntiant's chief scientist, and a professor of electrical and computer engineering at the University of North Carolina at Charlotte.\\n\\nThis insight, and others, led them to the formation of Syntiant, which for the past four years has been designing and developing ultra-low-power, high-performance, deep neural network processors for computing at the network’s edge, helping to reduce latency, and increase the privacy and security of power- and cost-constrained applications running on devices as small as earbuds, and as large as automobiles.\\n\\nSyntiant’s processors enable always-on voice (AOV) control for most battery-powered devices, from cell phones and earbuds, to drones, laptops and other voice-activated products. 
<video src="https://dev-media.amazoncloud.cn/a196cc6e9fe54243b28bc4b92cb70a68_Syntiant%20CEO%20on%20the%20future%20of%20ambient%20computing%20%EF%BD%9C%20Amazon%20Alexa%20Startups%20Showcase.mp4" controls="controls"></video>

#### **Syntiant CEO on the future of ambient computing**

During the Amazon Alexa Startups Showcase, Kurt Busch, CEO of Syntiant, an Alexa Fund company, explained how the company is using the latest in voice technology to invent the future of ambient computing, and why he thinks voice will be the next user interface.

Holleman is considered a leading authority on ultra-low-power integrated circuits and directs the [Integrated Silicon Systems Laboratory](https://coefs.uncc.edu/jhollem3/) at the University of North Carolina at Charlotte, where he is an associate professor. He is also a coauthor of the book “[Ultra Low-Power Integrated Circuit Design for Wireless Neural Interfaces](https://www.amazon.com/Low-Power-Integrated-Circuit-Wireless-Interfaces-dp-1441967265/dp/1441967265/ref=mt_other?_encoding=UTF8&me=&qid=1624477986)”, first published in 2011.

Amazon Science asked Holleman three questions about the challenges of designing and developing ultra-low-power AI processors, and why he believes voice will become the predominant user interface of the future.

#### **Q. You are one of 22 authors on a paper, “MLPerf Tiny Benchmark,” which has been accepted to the NeurIPS 2021 conference. What does this benchmark suite comprise, and why is it significant to the tinyML field?**

The [MLPerf Tiny Benchmark](https://arxiv.org/abs/2106.07597) actually includes four tests meant to measure the performance and efficiency of very small devices on ML inference: keyword spotting, person detection, image recognition, and anomaly detection. For each test, there is a reference model, and code to measure the latency and power on a reference platform.

I try to think about the benchmark from the standpoint of a system developer – someone building a device that needs some local intelligence. They have to figure out, with a given energy budget and system requirements, what solution is going to work for them. So they need to understand the power consumption and speed of different hardware. When you look at most of the information available, everyone measures their hardware on different things, so it’s really hard to compare. The benchmark makes it clear exactly what is being measured, and – in the closed division – every submission runs the exact same model, so it’s a clear apples-to-apples comparison.

Then the open division takes the same principle – every submission does the same thing – but allows for different tradeoffs by just defining the problem and allowing submitters to run different models that may take advantage of particular aspects of their hardware. So you wind up with a Pareto surface of accuracy, power, and speed. I think this last part is particularly important in the “tiny” space, because there is a lot of room to jointly optimize models, hardware, and features to get high-performing, high-efficiency end-to-end systems.
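To make the “Pareto surface” concrete: a submission belongs on the surface if no other submission beats it on every axis at once. Here is a small sketch of that filtering step, using invented device names and numbers rather than real MLPerf Tiny results.

```python
# Each hypothetical submission: (name, accuracy %, power mW, latency ms).
# Higher accuracy is better; lower power and latency are better.
submissions = [
    ("chip-A", 91.0, 0.5, 12.0),
    ("chip-B", 93.5, 4.0, 8.0),
    ("chip-C", 90.0, 0.4, 30.0),
    ("chip-D", 92.0, 5.0, 9.0),   # dominated by chip-B on all three axes
]

def dominates(a, b):
    """True if submission a is at least as good as b on every axis
    and strictly better on at least one."""
    at_least_as_good = a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]
    strictly_better = a[1] > b[1] or a[2] < b[2] or a[3] < b[3]
    return at_least_as_good and strictly_better

pareto = [s for s in submissions
          if not any(dominates(t, s) for t in submissions)]
for name, acc, mw, ms in pareto:
    print(f"{name}: {acc}% accuracy, {mw} mW, {ms} ms")
```

A system developer would then pick among the surviving points according to their own energy budget and accuracy requirements.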
#### **Q. What do you consider Syntiant’s key ingredients in your development and design of ultra-low-power AI processors, and how will your team’s work contribute to voice becoming the predominant user interface of the future?**

I would say there are two major elements that have been key to our success. The first is, as I mentioned before, that edge ML requires tight coupling between the hardware and the algorithms. From the very beginning at Syntiant, we’ve had our silicon designers and our modelers working closely together. That shows up in the office arrangement, with hardware and software groups all intermingled; in code and design reviews; really all across the company. And I think that’s paid off in outcomes. We see how easy it is to map a given algorithm to our hardware, because the hardware was designed to do all the hard work of coordinating memory access in a way that’s optimized for exactly the types of computation we see in ML workloads. And for the same reason, we see the benefits of that approach in power and performance.

The second big piece is that we realized that deep learning is still such a new field that the expertise required to deliver production-grade solutions is still very rare. It’s easy enough to download an MNIST or CIFAR demo, train it up, and think, “I’ve got this figured out!” But when you deploy a device to millions of people who interact with it on a daily basis, the job becomes much harder. You need to acquire data, validate it, and debug models; it’s a big job. We knew that for most customers, we couldn’t just toss a piece of silicon over the fence and leave the rest to them. That led us to put a lot of effort into building a complete pipeline addressing the data tasks, training, and evaluation, so we can provide a complete solution to customers who don’t have a ton of ML expertise in house.

#### **Q. What in particular makes edge processing difficult?**

On the hardware side, the big challenges are power and cost. Whether you’re talking about a watch, an earbud, or a phone, consumers have some pretty hard requirements for how long a battery needs to last – generally a day – and how much they will pay for something. And on the modeling side, edge devices find themselves in a tremendously diverse set of environments, so you need a voice assistant that recognizes you not just in the kitchen or the car, but on a factory floor, at a football game, and everywhere else you can imagine going.

Then those three things – power, cost, and model robustness – push against each other, like the classic balloon analogy. If you push down cost by choosing a lower-end processor, it may not have the throughput to run the model quickly, so you run at a lower frame rate, under-sampling the input signal, and you miss events. Or you find a model that works well, and you run it fast enough, but then the power required to run it limits battery life. This tradeoff is especially difficult for features that are always on, like a wake word detector, or person detection in a security camera. At Syntiant, we had to address all of these issues simultaneously, which is why it was so important to have all of our teams tightly connected, work through the use cases, and know how each piece affected all the other pieces.
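A rough back-of-envelope calculation shows why the always-on case is the hard one. All numbers below are illustrative assumptions, not measured figures for any particular part.

```python
def battery_life_hours(battery_mah, voltage_v, avg_power_mw):
    """Battery energy (mAh x V = mWh) divided by average draw (mW)."""
    return (battery_mah * voltage_v) / avg_power_mw

# Assumed earbud-class battery: 100 mAh at 3.7 V nominal (~370 mWh).
for label, power_mw in [("general-purpose MCU, ~10 mW always-on", 10.0),
                        ("dedicated low-power accelerator, ~0.1 mW", 0.1)]:
    hours = battery_life_hours(100, 3.7, power_mw)
    print(f"{label}: {hours:,.0f} h (~{hours / 24:,.1f} days) on detection alone")
```

At 10 mW, the detector alone empties that battery in under two days; at a tenth of a milliwatt, it becomes a negligible line item in the power budget, which is the regime an always-on wake word detector needs to live in.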
Having done that work, the result is that you get the power of modern ML in tiny devices with almost no impact on battery life. And the possibilities, especially for voice interfaces, are very exciting. We’ve all grown accustomed to interacting with our phones by voice, and we’ve seen how often we want to do something but don’t have a free hand available for a tactile interface.

Syntiant’s technology is making it possible to bring that experience to smaller and cheaper devices, with all of the processing happening locally. So many of the devices we use have useful information they can’t share with us because the interface would be too expensive. Imagine being able to say “TV remote, where are you?” or “Smoke alarm, why are you beeping?” and getting a clear, quick answer. We’ve forgotten that some of the annoying things we’ve gotten used to can be fixed. And of course, you don’t want all of the cost and privacy concerns associated with sending all of that information to the cloud.

#### **Conventional general-purpose processors don’t have the efficiency to run strong models within the constraints that edge devices have. With our new architecture, powerful machine learning can be deployed practically anywhere for the first time.**

Jeremy Holleman

So we’re focused on putting that level of intelligence right in the device. To deliver that, we need all of these pieces to come together: the data pipeline, the models, and the hardware. Conventional general-purpose processors don’t have the efficiency to run strong models within the constraints that edge devices have. With our new architecture, powerful machine learning can be deployed practically anywhere for the first time.

ABOUT THE AUTHOR

#### **Staff writer**