{"value":"We as humans take for granted our ability to operate in ever-changing home environments. Every morning, we can get from our bed to the kitchen, even after we change our furniture arrangement or move chairs around or when family members leave their shoes and bags in the middle of the hallway.\n\nThis is because humans develop a deep contextual understanding of their environments that is invariant to a variety of changes. That understanding is enabled by superior sensors (eyes, ears, and touch), a powerful computer (the brain), and vast memory. However, for a robot that has finite sensors, computational power, and memory, dealing with a challenging dynamic environment requires innovative new algorithms and representations.\n\nAt Amazon, scientists and engineers have been investigating ways to help [Astro](https://www.amazon.com/Introducing-Amazon-Astro/dp/B078NSDFSB) know where it is at all times in a customer's home with few to no assumptions about the environment. Astro’s [Intelligent Motion](https://www.amazon.science/blog/astros-intelligent-motion-brings-state-of-the-art-navigation-to-the-home) system relies on visual simultaneous localization and mapping, or V-SLAM, which enables a robot to use visual data to simultaneously construct a map of its environment and determine its position on that map.\n\n![image.png](https://dev-media.amazoncloud.cn/3be1eff9f16d43d7b479328d2fab90c8_image.png)\n\nA high-level overview of a V-SLAM system.\n\nA V-SLAM system typically consists of a visual odometry tracker, a nonlinear optimizer, a loop-closure detector, and mapping components. The front end of Astro’s system performs visual odometry by extracting visual features from sensor data, establishing correspondences between features from different sensor feeds, and tracking the features from frame to frame in order to estimate sensor movement.\n\nLoop-closure detection tries to match the features in the current frame with those previously seen to correct for accumulated inaccuracies in visual odometry. Astro then processes the visual features, estimated sensor poses, and loop-closure information and optimizes it to obtain a global motion trajectory and map.\n\nState-of-the art research on V-SLAM assumes that the robot’s environment is mostly static and rarely changes. But those assumptions can’t be expected to hold in customers’ homes.\n\n<video src=\\"https://dev-media.amazoncloud.cn/9f1b204b71ce4c1b9d59ed6b47d9ac63_astro-navigation-video.mp4\\" class=\\"manvaVedio\\" controls=\\"controls\\" style=\\"width:160px;height:160px\\"></video>\n\n\n#### **Visual odometry and loop closure**\n\n\nAn example from a mock home environment, which demonstrates how Astro connects visual features captured by two sensors (red lines) and at different times (green lines). 
State-of-the-art research on V-SLAM assumes that the robot’s environment is mostly static and rarely changes. But those assumptions can’t be expected to hold in customers’ homes.

<video src="https://dev-media.amazoncloud.cn/9f1b204b71ce4c1b9d59ed6b47d9ac63_astro-navigation-video.mp4" controls="controls" style="width:160px;height:160px"></video>

#### **Visual odometry and loop closure**

An example from a mock home environment, demonstrating how Astro connects visual features captured by two sensors (red lines) and at different times (green lines). The actual image data is discarded after the salient features (yellow circles) are extracted.

For Astro to localize robustly in home environments, we had to overcome a number of challenges, which we discuss in the following sections.

#### **Environmental dynamics**

Changes in the home happen at varying time scales: short-term changes, such as the presence of pets and people; medium-term changes, such as the appearance of boxes and bags or chairs that have been moved around; and long-term changes, such as holiday decorations, large-furniture rearrangements, or even structural changes to walls during renovations.

In addition, the lighting inside homes changes constantly as the sun moves and indoor lights are turned on and off, shading and illuminating rooms and furniture in ways that can make the same scene look very different at different times. Astro must be able to operate across all lighting conditions, including total darkness.

![image.png](https://dev-media.amazoncloud.cn/e9db3a6878a646719398ec673fbdd1e9_image.png)

Two sets of inputs from Astro’s perspective, showing how similarities between two different places in the home can lead to perceptual aliasing. Images have been adjusted for clarity.

![image.png](https://dev-media.amazoncloud.cn/ce98a5cdcc7849d9ba78031e4a594e9f_image.png)

In this sample input from a simulated home environment, Astro’s perspective on the same room at two different times demonstrates how dramatically lighting conditions can vary. Images have been adjusted for clarity.

While industrial robots can function in controlled environments whose variations are precoded as rules in software, adapting to unscripted environmental changes was one of the fundamental challenges the Astro team had to solve. The Intelligent Motion system needs a high-level visual understanding of its environment, such that invariant visual cues can be extracted and described programmatically.

Astro uses deep-learning algorithms trained on millions of image pairs, both captured and synthesized, that depict similar scenes at different times of day. Those images mimic a variety of scenarios Astro may face in a real customer’s home, such as different scene layouts, lighting and perspective changes, occlusions, object movements, and decorations.
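The post doesn’t specify the network or the training objective, so the following is only a schematic sketch of one standard way to learn a lighting-invariant embedding from such image pairs: a small siamese network trained with a contrastive loss in PyTorch. The architecture, margin, and synthetic batch are illustrative assumptions.

```python
# Schematic sketch: learning a lighting-invariant image embedding from pairs
# of images showing the same place under different conditions. Everything
# here (architecture, margin, fake data) is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Tiny CNN mapping an image to a unit-length descriptor."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, dim)

    def forward(self, x):
        z = self.head(self.features(x).flatten(1))
        return F.normalize(z, dim=1)

def contrastive_loss(z_a, z_b, same_place, margin=0.5):
    """Pull embeddings of the same place together (across lighting and
    viewpoint changes); push different places at least `margin` apart."""
    d = (z_a - z_b).norm(dim=1)
    return torch.where(same_place, d.pow(2),
                       (margin - d).clamp(min=0).pow(2)).mean()

# One hypothetical training step on a random batch of image pairs:
net = EmbeddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
img_a, img_b = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
same_place = torch.randint(0, 2, (8,)).bool()
loss = contrastive_loss(net(img_a), net(img_b), same_place)
opt.zero_grad(); loss.backward(); opt.step()
```

Trained this way, place recognition reduces to a nearest-neighbor search in the embedding space, which is what makes it robust to changes in lighting and layout.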
Astro’s algorithms also enable it to adapt to an environment it has never seen before (such as a new customer’s home). Developing those algorithms required a highly accurate and scalable ground-truth mechanism that could be conveniently deployed to homes, allowing the team to test and improve the robustness of the V-SLAM system.

In the figure below, for instance, a floor plan of the home was acquired ahead of time, and device motion was then estimated from sensor data at centimeter-level accuracy.

![image.png](https://dev-media.amazoncloud.cn/0ce4f02fea214f4a90e243dbcdbaf5ab_image.png)

A sample visualization of Astro’s ground-truth system.

#### **Using sensor fusion to improve localization**

To improve the accuracy and robustness of localization, Astro fuses data from its navigation sensors with data from wheel encoders and an inertial measurement unit (IMU), which uses gyroscopes and accelerometers to gauge motion. Each of these sensors has limitations that can affect Astro’s ability to localize, and determining which sensors can be trusted at a given time requires understanding their noise characteristics and failure modes.

For example, when Astro drives over a threshold, the IMU can saturate and give an erroneous reading. If Astro drives over a flooring surface where its wheels slip, its wheel encoders can give an inaccurate reading. Visual factors such as illumination and motion blur can also degrade sensor readings.

The Astro team also had to account for a variety of use cases that would predictably cause sensor errors. For example, the team had to ensure that when Astro is lifted off the floor, the wheel encoder data is handled appropriately, and that when the device enters low-power mode, certain sensor data is not processed.
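The fusion algorithm itself isn’t described in detail, but the gating logic above can be sketched simply. In the toy example below, the thresholds, field names, and the choice to drop (rather than down-weight) a suspect measurement are all illustrative assumptions.

```python
# Illustrative sensor-gating sketch: decide which sensor streams to trust
# before fusing them. Thresholds and field names are assumptions, not
# Astro's actual values.
from dataclasses import dataclass

IMU_SATURATION = 150.0      # deg/s; readings near the gyro's limit are suspect
MAX_PLAUSIBLE_SPEED = 1.5   # m/s; encoder speeds beyond this suggest wheel slip

@dataclass
class SensorFrame:
    gyro_z: float         # yaw rate from the IMU, deg/s
    encoder_speed: float  # speed implied by the wheel encoders, m/s
    wheels_loaded: bool   # False if the robot has been lifted off the floor
    low_power: bool       # device is in low-power mode

def usable_measurements(f: SensorFrame) -> dict:
    """Return only the measurements that pass basic sanity checks."""
    out = {}
    if f.low_power:
        return out                       # skip sensor processing entirely
    if abs(f.gyro_z) < IMU_SATURATION:   # e.g., bumping over a door threshold
        out["yaw_rate"] = f.gyro_z       # can saturate the IMU
    if f.wheels_loaded and f.encoder_speed <= MAX_PLAUSIBLE_SPEED:
        out["speed"] = f.encoder_speed   # ignore encoders when lifted/slipping
    return out
```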
![image.jpg](https://dev-media.amazoncloud.cn/830632944a3a4fbab7495990d63e3d34_%E4%B8%8B%E8%BD%BD.jpg)

A simplified overview of Astro’s SLAM system.

#### **Computational and memory limitations**

Astro has finite onboard computational capacity and memory, which must be shared among several critical systems. The Astro team developed a nonlinear optimization technique for “bundle adjustment”, the simultaneous refinement of the scene’s 3-D coordinates, the estimate of the robot’s relative motion, and the optical characteristics of the camera. The technique is computationally efficient enough to generate six-degree-of-freedom pose information multiple times per second.
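Astro’s optimizer isn’t published, so the sketch below only illustrates what bundle adjustment minimizes: the reprojection error between observed feature locations and the projections of the estimated 3-D points under the estimated camera poses. SciPy’s general-purpose least-squares solver stands in for the fast, sparsity-exploiting solver a real-time system would need, and the toy geometry is invented.

```python
# Illustrative bundle adjustment: jointly refine a camera pose and the 3-D
# points it observes by minimizing reprojection error.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Hypothetical intrinsics (focal length and principal point), for illustration.
F, CX, CY = 600.0, 320.0, 240.0

def project(points3d, rvec, tvec):
    """Pinhole projection of world points into a camera with pose (rvec, tvec)."""
    cam = Rotation.from_rotvec(rvec).apply(points3d) + tvec
    return np.column_stack((F * cam[:, 0] / cam[:, 2] + CX,
                            F * cam[:, 1] / cam[:, 2] + CY))

def residuals(params, n_pts, obs_cam0, obs_cam1):
    """Reprojection error for two views. Camera 0 is fixed at the origin to
    pin down the gauge, so params = camera 1's pose (6) + the 3-D points."""
    pose1, points = params[:6], params[6:].reshape(n_pts, 3)
    err0 = project(points, np.zeros(3), np.zeros(3)) - obs_cam0
    err1 = project(points, pose1[:3], pose1[3:]) - obs_cam1
    return np.concatenate([err0.ravel(), err1.ravel()])

# Toy problem: 8 points seen from two poses, with perturbed initial estimates.
# (In practice, observations come from feature tracks, and the initial guess
# comes from visual odometry.)
rng = np.random.default_rng(0)
true_pts = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 6.0], (8, 3))
true_pose1 = np.array([0.0, 0.05, 0.0, 0.1, 0.0, 0.0])  # small rotation + shift
obs0 = project(true_pts, np.zeros(3), np.zeros(3))
obs1 = project(true_pts, true_pose1[:3], true_pose1[3:])
x0 = np.concatenate([true_pose1 + 0.02, (true_pts + 0.1).ravel()])
sol = least_squares(residuals, x0, args=(8, obs0, obs1))  # refined pose & map
```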
Because Astro’s map of the home is constantly updated to accommodate changes in the environment, its memory footprint steadily grows, necessitating compression and pruning techniques that preserve the map’s utility while staying within on-device memory limits.

To that end, the Astro team designed a long-term-mapping system with multiple layers of contextual knowledge, from higher-level understanding, such as which rooms Astro can visit, to lower-level understanding, such as differentiating the appearance of objects lying on the floor. This multilayer approach helps Astro efficiently recognize major changes to its operating environment while remaining robust enough to disregard minor ones.
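The layered map is described only at a high level, so the toy sketch below is just one way such a structure might look. The layer names, usefulness score, and size-based pruning rule are hypothetical, not confirmed details of Astro’s system.

```python
# Toy sketch of one layer of a multilayer map, with size-based pruning.
from dataclasses import dataclass, field

@dataclass
class Landmark:
    descriptor: bytes      # compressed visual feature
    last_seen: float       # timestamp of the last successful match
    match_count: int = 0   # how often this landmark has aided localization

@dataclass
class RoomLayer:
    name: str                       # high-level context, e.g. "kitchen"
    landmarks: list[Landmark] = field(default_factory=list)  # low-level detail

    def prune(self, budget: int, now: float):
        """Keep the landmarks most useful for localization: frequently matched
        and recently seen ones survive, while stale clutter (a moved chair,
        a bag left on the floor) ages out of the map."""
        self.landmarks.sort(
            key=lambda l: (l.match_count, -(now - l.last_seen)), reverse=True)
        del self.landmarks[budget:]
```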
All of these updates happen on-device, without any cloud processing. A constantly updated representation of the customer’s home allows Astro to localize itself robustly and effectively over months of operation.

In creating this new category of home robot, the Astro team used deep learning and built on state-of-the-art computational-geometry techniques to give Astro spatial intelligence far beyond that of simpler home robots. The team will continue innovating so that Astro learns new ways to adapt to more homes, helping customers save time in their busy lives.

ABOUT THE AUTHORS

#### **[Jianbo Ye](https://www.amazon.science/author/jianbo-ye)**

Jianbo Ye is a senior applied scientist at Amazon.

#### **[Arnie Sen](https://www.amazon.science/author/arnie-sen)**

Arnie Sen is a senior manager of software development at Amazon.