Explicit control of GAN-generated synthetic images

{"value":"In recent years, generative adversarial networks (++[GANs](https://www.amazon.science/tag/generative-adversarial-networks)++) have demonstrated a remarkable ability to synthesize realistic visual images from scratch.\n\nBut controlling specific features of a GAN’s output — lighting conditions or viewing angle, for instance, or whether someone is smiling or frowning — has been difficult. Most approaches depend on trial-and-error exploration of the GAN’s parameter space. A recent approach to controlling synthesized faces involves generating 3-D archetypes with graphics software, a cumbersome process that offers limited control and is restricted to a single image category.\n\n![下载.gif](https://dev-media.amazoncloud.cn/b966c55f49804455a49c64309d8c2b0a_%E4%B8%8B%E8%BD%BD.gif)\n\nA new method enables explicit control of properties of synthetic images produced by generative adversarial networks (GANs), such as camera angle, subject age, and expression.\n\nAt this year’s International Conference on Computer Vision (++[ICCV](https://www.amazon.science/conferences-and-events/iccv-2021)++), together with Amazon distinguished scientist ++[Gérard Medioni](https://www.amazon.science/search?q=medioni)++, we presented a new approach to controlling GANs’ output that allows numerical specification of image parameters — viewing angle or the age of a human figure, for instance — and is applicable to a variety of image categories.\n\nOur approach outperformed its predecessors on several measures of control precision, but we also evaluated it in user studies. Users found images generated with our approach more realistic than images generated by its two leading predecessors, by a two-to-one margin. \n\n#### **Latent spaces**\n\nThe training setup for a GAN involves two machine learning models: the generator and the discriminator. The generator learns to produce images that will fool the discriminator, while the discriminator learns to distinguish synthetic images from real ones.\n\nDuring training, the model learns a probability distribution over a learned set of image parameters (in the StyleGAN family of models, there are 512 parameters). That distribution describes the range of parameter values that occur in real images. Synthesizing a new image is then a matter of picking a random point from that distribution and passing it to the generator.\n\n<video src=\"https://dev-media.amazoncloud.cn/3e4906936edb483e94092915961b91dd_GAN-Control%20-%20Explicitly%20Controllable%20GANs%20%28with%20dogs%29.mp4\" class=\"manvaVedio\" controls=\"controls\" style=\"width:160px;height:160px\"></video>\n\n#### **Explicitly controllable GANs**\n\nThe image parameters define a latent space (with, in StyleGAN, 512 dimensions). Variations in image properties — high to low camera angle, young to old faces, left to right lighting, and so on — might lie along particular axes of the space. But because the generator is a black-box neural network, the structure of the space is unknown.\n\nPrevious work on controllable GANs involved exploring the space in an attempt to learn its structure. But that structure can be irregular, so that learning about one property tells you little about the others. 
<video src="https://dev-media.amazoncloud.cn/3e4906936edb483e94092915961b91dd_GAN-Control%20-%20Explicitly%20Controllable%20GANs%20%28with%20dogs%29.mp4" controls></video>

#### **Explicitly controllable GANs**

The image parameters define a latent space (with, in StyleGAN, 512 dimensions). Variations in image properties — high to low camera angle, young to old faces, left to right lighting, and so on — might lie along particular axes of the space. But because the generator is a black-box neural network, the structure of the space is unknown.

Previous work on controllable GANs involved exploring the space in an attempt to learn its structure. But that structure can be irregular, so that learning about one property tells you little about the others. And properties can be entangled, so that changing one also changes others.

Recent work adopted a more principled approach, in which the input to the generator specifies properties of an image of a human face, and the generator is evaluated on how well its output matches a 3-D graphics model with the same properties.

This approach has some limitations, however. One is that it works only with faces. Another is that it can yield output images that look synthetic, since the generator learns to match properties of synthetic training targets. And finally, it’s hard to capture more holistic properties, like a person’s age, with a graphics model.

#### **Controllable GANs**

In our paper, we present an approach to controlling GANs that requires only numerical inputs, modifies a wide range of image properties, and applies to a large variety of image categories.

First, we use contrastive learning to structure the latent space so that the properties we’re interested in lie along different dimensions — that is, they’re not entangled. Then we learn a set of controllers that can modify those properties individually.

![image.png](https://dev-media.amazoncloud.cn/d0abd3241ad64e759f5dc80fbec3f8b7_image.png)

In our method, we use contrastive learning to structure the latent space (W) such that different image properties tend to fall along different dimensions. Then we learn a set of controllers (y) that map numerical values to the latent space.

To start, we select a set of image properties we wish to control and construct a representational space such that each dimension of the space corresponds to one of the properties (Z in the image above). Then we select pairs of points in that space that have the same value in one dimension but different values in the other dimensions.

During training, we pass these point pairs through a set of fully connected neural-network layers that learn to map points in our constructed space onto points in a learned latent space (W in the figure). The points in the learned space will act as controllers for our generator.

Then, in addition to the standard adversarial loss, which penalizes the generator if it fails to fool the discriminator, we compute a set of additional losses, one for each property. These are based on off-the-shelf models that compute the image properties — age, expression, lighting direction, and so on. The losses force images that share properties closer together in the latent space, while forcing apart images that don’t share properties.
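Under simplifying assumptions, the training setup might be sketched as follows: the constructed space Z is split into per-property chunks, training pairs share exactly one chunk, a small fully connected network maps Z to W, and a stand-in property predictor supplies one per-property loss that pulls a chunk-sharing pair’s predictions together and pushes an unrelated pair’s apart. The property list, chunk sizes, networks, and loss details below are placeholders, not the paper’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Constructed space Z: one chunk of dimensions per controlled property.
# The properties and chunk sizes are illustrative.
chunks = {"age": 8, "pose": 8, "illumination": 16, "rest": 480}
offsets, start = {}, 0
for name, size in chunks.items():
    offsets[name] = (start, start + size)
    start += size
z_dim, w_dim = start, 512

# Fully connected layers that map the constructed space Z into the learned latent space W.
mapping = nn.Sequential(nn.Linear(z_dim, w_dim), nn.ReLU(), nn.Linear(w_dim, w_dim))

# Stand-ins so the sketch runs: a real setup uses a StyleGAN-like generator and
# frozen, pretrained property predictors (age, expression, lighting, and so on).
generator = nn.Sequential(nn.Linear(w_dim, 3 * 64 * 64), nn.Tanh())
age_predictor = nn.Linear(3 * 64 * 64, 1)

def sample_pair_sharing(prop):
    """Two Z vectors that agree on one property's chunk and differ everywhere else."""
    z1, z2 = torch.randn(1, z_dim), torch.randn(1, z_dim)
    lo, hi = offsets[prop]
    z2[:, lo:hi] = z1[:, lo:hi]
    return z1, z2

def age_loss(margin=1.0):
    """Per-property loss: predictions should match for a pair that shares the age
    chunk and differ (up to a margin) for a pair that does not."""
    z1, z2 = sample_pair_sharing("age")
    z3 = torch.randn(1, z_dim)                       # unrelated sample
    images = generator(mapping(torch.cat([z1, z2, z3])))
    a1, a2, a3 = age_predictor(images)
    pull = F.mse_loss(a1, a2)                        # shared chunk -> similar prediction
    push = F.relu(margin - (a1 - a3).abs()).mean()   # different chunk -> dissimilar
    return pull + push                               # added to the adversarial loss
```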
Once we’ve trained the generator, we randomly select points in the latent space, generate the corresponding images, and measure their properties. Then we train a new set of controllers that take the measured properties as input and output the corresponding points in the latent space. When those controllers are trained, we have a way to map specific property measurements to points in the latent space.
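A hedged sketch of that second stage: random tensors stand in for the collected (measured properties, latent point) pairs, and a single MLP stands in for the set of controllers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

w_dim, num_properties = 512, 6

# Stand-in data: in practice these pairs come from sampling the trained generator
# and measuring each generated image with off-the-shelf property predictors.
latents = torch.randn(10_000, w_dim)                # sampled latent points
properties = torch.randn(10_000, num_properties)    # measured properties of their images

# Controller: maps desired property values to a point in the latent space.
controller = nn.Sequential(nn.Linear(num_properties, 256), nn.ReLU(), nn.Linear(256, w_dim))
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)

for step in range(1_000):
    idx = torch.randint(0, latents.size(0), (64,))
    pred_w = controller(properties[idx])            # properties -> latent point
    loss = F.mse_loss(pred_w, latents[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At inference time, numerical property values (a target age, camera angle, etc.)
# go through the controller to obtain a latent point for the generator.
```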
#### **Evaluation**

To evaluate our method, we compared it to two of the prior methods that used 3-D graphics models to train face generators. We found that faces generated using our method better matched the input parameters than faces generated with the earlier methods.

We also asked human subjects to rate the realism of images produced by our method and the two baselines. In 67% of cases, subjects found our images more natural than either baseline. The better of the two baselines had a score of 22%.

Finally, we asked our human subjects whether they agreed or disagreed that our generated human faces exhibited properties we’d controlled for. For five of those properties, the agreement ranged from 87% to 98%. On the sixth property — elevated camera angle — the agreement was only about 66%, but at low angles, the effect may have been too subtle to discern.

![image.png](https://dev-media.amazoncloud.cn/2a9364e083b245959c6f4f23690bae5b_image.png)

At low angles, camera elevation can be difficult to discern.

In these evaluations, we necessarily restricted ourselves to generating human faces, since that’s the only domain for which strong baselines were available. But we also experimented with images of dogs’ faces and synthetic paintings, neither of which prior methods could handle. The results can be judged in the images below:

![image.png](https://dev-media.amazoncloud.cn/c92e6e08aec54387bdd381ad04698492_image.png)

ABOUT THE AUTHORS

#### **[Igor Kviatkovsky](https://www.amazon.science/author/igor-kviatkovsky)**

Igor Kviatkovsky is a senior research scientist at Amazon.

#### **[Alon Shoshan](https://www.amazon.science/author/alon-shoshan)**

Alon Shoshan is an applied scientist at Amazon.

#### **[Nadav Bhonker](https://www.amazon.science/author/nadav-bhonker)**

Nadav Bhonker is an applied scientist at Amazon.