An Incomplete Map of GAN Models

Guo-Jun Qi

guojunq@gmail.com

March 11th 2017

 

 

In this short article, we attempt to plot an incomplete map of the GAN family of models. We focus on the “basic” GAN models that are not adapted or modified with additional features; for this reason, popular variants such as InfoGAN, conditional GAN and auto-encoder GANs are outside the scope of this article. In addition, we review a generalized GAN variant, called GLS-GAN, which unifies the Wasserstein GAN (WGAN) and the Loss-Sensitive GAN (LS-GAN), the two models that constitute the second class of regularized GAN models in the literature.

 

Before we move further, let us take a glance at the map below.

 

 

Figure: An incomplete map of GANs.

 

 

To plot this map, we need a criterion to draw the boundary between different GAN models. While there are many possible criteria, we classify GANs into two large classes based on whether a model assumes “regularized versus infinite modeling ability” [Goodfellow2014].

 

The infinite modeling ability (IMA) assumption states that the discriminator of a GAN can distinguish between real and fake examples no matter how these samples are distributed. This is essentially a non-parametric assumption under which the discriminator can have an infinite number of parameters.

 

In contrast, GANs with regularized modeling ability (RMA) resort to a regularized model to distinguish between real and fake examples whose distributions satisfy certain regularity conditions. For example, both LS-GAN [Loss-Sensitive GAN, Qi2017] and WGAN [Wasserstein GAN, Arjovsky2017b] are trained in a space of Lipschitz continuous functions; they rely on the Lipschitz regularity to distinguish between real and fake samples [Qi2017].
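
For readers who have not seen the term, a function $f$ is Lipschitz continuous with constant $k$ if it satisfies the standard condition

        $$ |f(x_1) - f(x_2)| \le k\,\|x_1 - x_2\| \quad \text{for all } x_1, x_2, $$

which bounds how fast the loss (or critic) function can change between nearby samples, and is the sense in which its modeling ability is regularized.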

 

The significance of IMA and RMA is that one of them is assumed, explicitly or implicitly, in the theoretical analysis used to prove the consistency of a proposed GAN model, namely that the density of its generated samples matches the density of the true samples.

 

It has been shown in [Qi2017] that both LS-GAN and WGAN are based on the Lipschitz regularity, i.e., the assumption that the model has regularized modeling ability, in contrast to the non-parametric assumption of infinite modeling ability used by the first class of GANs below.

 

Class 1: Infinite Modeling Ability (IMA)

 

The first class of GANs contains the classic GAN [Goodfellow2014], EBGAN [Zhao2016], the Least Squares GAN [Mao2017] and the f-divergence GAN (f-GAN) [Nowozin2016].

 

In [Arjovsky2017b], it has been shown that the classic GAN and EBGAN attempt to minimize the Jensen-Shannon (J-S) distance and the total variation distance, respectively, between the generative density and the true data density, while the Least Squares GAN is formulated to minimize a Pearson $\chi^2$ divergence [Mao2017] and the f-GAN minimizes a family of f-divergences [Nowozin2016].

 

Table: GANs with IMA and the distance between distributions they minimize to learn the generator.

    GANs with IMA        | Distance between distributions
    ---------------------+-------------------------------
    The classic GAN      | Jensen-Shannon (J-S) distance
    EBGAN                | Total variation distance
    Least Squares GAN    | Pearson $\chi^2$ divergence
    f-GAN                | f-divergence
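
For reference, the divergences listed in the table take the following standard forms, where $p$ denotes the true data density and $q$ the generative density (the exact arguments each GAN plugs into them differ slightly across the cited papers):

        $$ \mathrm{JS}(p, q) = \tfrac{1}{2}\,\mathrm{KL}\Big(p \,\Big\|\, \tfrac{p+q}{2}\Big) + \tfrac{1}{2}\,\mathrm{KL}\Big(q \,\Big\|\, \tfrac{p+q}{2}\Big), \qquad \delta(p, q) = \sup_{A} \Big| \int_A p(x)\,dx - \int_A q(x)\,dx \Big|, $$

        $$ \chi^2_{\mathrm{Pearson}}(p \,\|\, q) = \int \frac{(p(x) - q(x))^2}{q(x)}\,dx, \qquad D_f(p \,\|\, q) = \int q(x)\, f\Big(\frac{p(x)}{q(x)}\Big)\,dx, $$

with $f$ a convex function satisfying $f(1) = 0$; the first three divergences are themselves special cases of $D_f$ for particular choices of $f$.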

 

 

Vanishing Gradient: What is interesting is that these IMA-based GANs all suffer from the vanishing gradient problem, as discussed in [Arjovsky2017a].

If the manifold of real data and the manifold of generated data have no or negligible overlap, the distances these GANs are posed to minimize become constant, yielding vanishing gradients that cannot update the generator at all. This problem has been reported since the first GAN (based on the J-S distance) was proposed [Goodfellow2014].
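
A quick way to see why: if the true distribution $P$ and the generated distribution $Q_\theta$ have disjoint supports, as is typical when both concentrate on low-dimensional manifolds, then

        $$ \mathrm{JS}(P, Q_\theta) = \log 2 \qquad \text{and} \qquad \delta(P, Q_\theta) = 1, $$

no matter how far apart the two supports are. Both quantities are then locally constant in the generator parameters $\theta$, so their gradients with respect to $\theta$ vanish and the generator receives no learning signal.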

 

Open Question: Is the connection between IMA and the vanishing gradient problem only a coincidence? In other words, does IMA inevitably cause vanishing gradients, or can we find GANs that have IMA yet avoid the vanishing gradient problem? Probably.

 

Class 2: Regularized Modeling Ability (RMA)

 

On the contrary, the second class of GANs contains the recently developed WGAN and LS-GAN. By coincidence, both models are built on Lipschitz regularity [Qi2017], and neither needs infinite modeling ability to prove the consistency conjecture.

 

Moreover, in Appendix D of [Qi2017], it is shown that both WGAN and LS-GAN belong to a super class, the Generalized LS-GAN (GLS-GAN):

                1. WGAN is the GLS-GAN with the cost $C(a) = a$,

                2. LS-GAN is the GLS-GAN with the cost $C(a) = (a)_+ = \max(a, 0)$.

More GLS-GANs can be found by defining a proper cost function satisfying some conditions [Qi2017]. 

An interesting example is the L1 cost $C(a) = |a|$, which defines a GLS-GAN that minimizes the absolute difference between the loss functions of real and generated samples (up to their margin). This is quite different from the idea behind both LS-GAN and WGAN, but preliminary results show that it works!

For details, please refer to Appendix D of [Qi2017].

 

Obviously, there should exist other GANs with RMA that do not belong to the GLS-GAN family. More research effort deserves to be devoted to discovering GANs with RMA that properly regularize the data generation process by introducing priors characterizing the true data distribution beyond the Lipschitz assumption. This should be an interesting direction for expanding the territory of GANs.

   

Examples of GLS-GAN (Generalized LS-GAN)

 

We can adopt the leaky rectified linear cost $C_\nu(a) = \max(a, \nu a)$ to define a GLS-GAN with slope $\nu$. Then WGAN is the GLS-GAN with $\nu = 1$, and LS-GAN is the GLS-GAN with $\nu = 0$.
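
As a concrete illustration, here is a minimal PyTorch sketch of this cost function and its special cases (the helper name leaky_cost is ours, not part of the released code at github/glsgan):

    import torch

    def leaky_cost(a: torch.Tensor, slope: float) -> torch.Tensor:
        """Leaky rectified linear cost C_nu(a) = max(a, nu * a) of the GLS-GAN.

        slope = 1.0  ->  C(a) = a            (the WGAN cost)
        slope = 0.0  ->  C(a) = max(a, 0)    (the LS-GAN cost)
        slope = -1.0 ->  C(a) = |a|          (the L1 cost mentioned earlier)

        For slope <= 1 this equals torch.nn.functional.leaky_relu(a, negative_slope=slope).
        """
        return torch.maximum(a, slope * a)

    # Quick check of the three special cases on a few sample values.
    a = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(leaky_cost(a, 1.0))    # -2.0, -0.5, 0.0, 0.5, 2.0   (identity)
    print(leaky_cost(a, 0.0))    #  0.0,  0.0, 0.0, 0.5, 2.0   (hinge)
    print(leaky_cost(a, -1.0))   #  2.0,  0.5, 0.0, 0.5, 2.0   (absolute value)

Sweeping the slope $\nu$ thus interpolates among the L1 ($\nu = -1$), LS-GAN ($\nu = 0$) and WGAN ($\nu = 1$) costs.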

 

Similarly, more GLS-GANs with various cost functions can be defined. Below we illustrate some results generated by GLS-GANs with the leaky rectified linear cost at different slopes.

 

                            

Figure: Generated samples from (a) GLS-GAN with a different slope $\nu$, (b) LS-GAN, i.e., GLS-GAN with $\nu = 0$, and (c) WGAN, i.e., GLS-GAN with $\nu = 1$.

 

The source code of the GLS-GAN is available at github/glsgan, and the details about GLS-GAN can be found in [Qi2017].

 

 

References

[Qi2017] G.-J. Qi, Loss-Sensitive Generative Adversarial Networks, preprint, 2017.

[Goodfellow2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative Adversarial Nets, in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.

[Arjovsky2017a] M. Arjovsky and L. Bottou, Towards Principled Methods for Training Generative Adversarial Networks, 2017.

[Nowozin2016] S. Nowozin, B. Cseke, and R. Tomioka, f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016.

[Arjovsky2017b] M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein GAN, 2017.

[Zhao2016] J. Zhao, M. Mathieu, and Y. LeCun, Energy-based Generative Adversarial Network, 2016.

[Mao2017] X. Mao, Q. Li, H. Xie, R. Y.K. Lau and Z. Wang, Least Squares Generative Adversarial Networks, 2017.