Growing generative adversarial networks, layer by layer


Generative adversarial networks (GANs) can produce remarkably realistic synthetic images. During training, a GAN pits a generator, which produces the image, against a discriminator, which tries to distinguish between real and synthetic images. The “arms race” between the two can yield a very convincing generator.

Generating high-resolution, sharp, and diverse images demands large networks. However, if the network is too big, adversarial training can fail to converge on a good generator. Researchers address this problem by starting with a small generator and a correspondingly small discriminator and gradually adding neural-network layers to both, ensuring that the generator maintains a baseline level of performance as it grows in complexity.

In the past, this approach has been deterministic: a fixed number of layers, of fixed size and predetermined type, are added on a fixed schedule. In a [paper](https://www.amazon.science/publications/dynamically-grown-generative-adversarial-networks) my colleagues and I presented at the annual meeting of the Association for the Advancement of Artificial Intelligence ([AAAI](https://www.amazon.science/conferences-and-events/aaai-2021)), we explore a more organic way of growing a GAN, computing the size, number, and type of the added layers incrementally, on the fly, based on performance during training.

![image.png](https://dev-media.amazoncloud.cn/dc520a28315c415780a6d62aa59271c1_image.png)

A comparison of bedroom interior images created by our model (top) and an earlier progressively grown GAN (bottom).

The graphic above compares images produced by our method to those produced by an earlier progressively grown GAN. We also evaluate our model’s output with two standard metrics, the sliced Wasserstein distance and the Fréchet inception distance. Both measure the difference between two probability distributions; in this case, the distributions of visual features for real and synthetic images. Better distribution matching means both higher sample fidelity and greater diversity.

We compared our model to several other GANs, including other progressively grown GANs, on several different data sets, and found that, with one exception, ours had lower distance scores on both measures. The one exception was a “part-based” GAN, which uses a fundamentally different approach, separately synthesizing segments of an image and then stitching them together. But in principle, that approach could be used in conjunction with ours.

#### **Breaking symmetry**

One distinguishing feature of our approach is that it is not constrained to symmetric architectures. With previous progressively grown GANs, the generator and discriminator grow in lockstep and end up with the same number of layers. With our approach, the number of layers in the generator and the number in the discriminator are optimized separately, and the two networks can have significantly different architectures.

Our method’s dynamic growing process turns out to allow faster generator growth, with guidance from a moderately sized discriminator; the discriminator catches up later to provide stronger criticism, helping the generator mature. This is consistent with [recent research](https://www.amazon.science/blog/the-importance-of-forgetting-in-artificial-and-animal-intelligence) on the training dynamics of neural networks, showing that “memorization” phases are followed by “consolidation” phases.

Our approach alternates between training the existing GAN and adding new layers. During each growth stage, our algorithm has the option of adding to the generator, adding to the discriminator, or both.

If a layer added to the top of the generator is larger than the layers below it, then a layer of the same size must be added to the bottom of the discriminator, as the outputs of the generator must have the same size as the inputs to the discriminator. Such additions increase the resolution of the images the generator produces.

![image.png](https://dev-media.amazoncloud.cn/a8f5142e7cd247e4b9162c1dbf9900ee_image.png)

Our protocol for alternating between growth stages and training stages in growable GANs. The generator (G) and discriminator (D) may grow asymmetrically, resulting in models of different sizes. Some growth stages increase image resolution by adding new, larger network layers to the top of the generator stack and the bottom of the discriminator stack.

CREDIT: GLYNIS CONDON

When a new layer, with randomly initialized weights, is added to either network, the existing layers keep the weights they have already learned. Subsequent training may still adjust those carried-over weights, however.

Like most AI applications that deal with images, our image discriminator uses a convolutional neural network. In a typical computer vision application, a convolutional neural network steps through an input image in fixed-size chunks (say, three-pixel-by-three-pixel squares) and applies the same bank of image filters to each chunk. The next layer of the network applies a similar bank of filters to each of the first layer’s outputs, and so on. The output of the network is a vector that characterizes the input image in some way, for example by identifying the objects it contains.

An image generator does the same thing in reverse, beginning with a high-level specification and outputting an image. But the principle of convolution is the same.

When our algorithm adds a layer to either the generator or the discriminator, it has to determine not only the size of the layer but also the scale of the convolutions: how big the filters are and how much they should overlap.

Moreover, the optimal sizes of a layer and its filters depend not just on the inputs and outputs of that layer but also on the inputs and outputs of all the layers that succeed it. Canvassing all the possibilities of layer and filter size for both the layer to be added and all its successors is computationally intractable. So instead, our algorithm considers the k best models recorded in the search history and computes all the possible next layers to add to those.

This pruned search is not guaranteed to converge on the global optimum for layer and filter size. But like most deep-learning optimization, it leads to a good-enough local optimum. And it gives the growable GAN much more flexibility than fixing the architectural parameters in advance does.

ABOUT THE AUTHOR

#### **[Yuting Zhang](https://www.amazon.science/author/yuting-zhang)**

Yuting Zhang is a senior applied scientist with Amazon Web Services.
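
The growth-stage search described in the article (take the k best models in the search history, then try every possible next layer on each of them) can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not the paper's implementation: the option lists `KERNEL_SIZES` and `CHANNEL_WIDTHS`, the `Candidate` record, and the `toy_score` function are all hypothetical, and each step here adds one layer to either the generator or the discriminator, whereas the article's algorithm may also add to both. In a real system, the scoring function would train the grown GAN for a while and return a distance such as the Fréchet inception distance.

```python
import heapq
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical search space; the article does not list the actual options.
KERNEL_SIZES = (1, 3, 5)
CHANNEL_WIDTHS = (64, 128)

# A layer is (network, kernel size, channels); network is "G" or "D".
Layer = Tuple[str, int, int]


@dataclass(frozen=True)
class Candidate:
    layers: Tuple[Layer, ...]  # layers added so far, in order
    score: float               # lower is better (a distance-style metric)


def toy_score(layers: Tuple[Layer, ...]) -> float:
    """Placeholder for 'train briefly, then measure a distance metric'.

    Purely illustrative: it rewards 3x3 kernels so the search has
    something deterministic to optimize.
    """
    return 10.0 - sum(1.0 for _, kernel, _ in layers if kernel == 3)


def expand(parent: Candidate,
           evaluate: Callable[[Tuple[Layer, ...]], float]) -> List[Candidate]:
    """Enumerate every possible next layer for one parent model."""
    children = []
    for net, kernel, channels in product(("G", "D"), KERNEL_SIZES, CHANNEL_WIDTHS):
        layers = parent.layers + ((net, kernel, channels),)
        children.append(Candidate(layers, evaluate(layers)))
    return children


def growth_stage(history: List[Candidate],
                 evaluate: Callable[[Tuple[Layer, ...]], float],
                 k: int) -> List[Candidate]:
    """Expand only the k best models recorded in the search history."""
    parents = heapq.nsmallest(k, history, key=lambda c: c.score)
    for parent in parents:
        history.extend(expand(parent, evaluate))
    return history


if __name__ == "__main__":
    history = [Candidate((), toy_score(()))]
    for _ in range(2):
        growth_stage(history, toy_score, k=2)
    best = min(history, key=lambda c: c.score)
    print(len(history), best.layers, best.score)
```

Because only k parents are expanded at each stage, the cost per stage stays at k times the number of layer options, rather than growing with every combination of layers and their successors. The trade-off is the one the article names: the search may settle on a local rather than a global optimum.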
