An Incomplete Map of the GAN Models
Guo-Jun Qi
guojunq@gmail.com
March 11th, 2017
In this short article, we attempt to plot an incomplete map of the GAN-family models. We focus on the “basic” GAN models, which are not adapted or modified with any additional features. For this reason, popular GANs such as InfoGAN, conditional GAN and auto-encoder GANs are outside the scope of this article. In addition, we review a generalized GAN variant, called GLS-GAN, which unifies Wasserstein GAN and LS-GAN, the two models that constitute the second, regularized form of GAN models in the literature.
Before we move further, let us take a glance at the map below.
Figure: An incomplete map of GANs.
To plot this map, we need a criterion to draw the boundary between different GAN models. While there are many possible criteria, we classify the GANs into two large classes based on “Regularized versus Infinite Modeling Ability”, that is, on which of the two abilities the model assumes [Goodfellow2014].
The infinite modeling ability (IMA) assumes that the discriminator of a GAN can distinguish between real and fake examples, no matter how these samples are distributed. This is in effect a non-parametric assumption, under which the discriminator can have an infinite number of parameters.
In contrast, GANs with regularized modeling ability (RMA) resort to a regularized model to distinguish between real and fake examples whose distributions satisfy certain regularity conditions. For example, both LS-GAN [Loss-Sensitive GAN, Qi2017] and WGAN [Wasserstein GAN, Arjovsky2017] are trained in a space of Lipschitz continuous functions, and they rely on this Lipschitz regularity to distinguish between real and fake samples [Qi2017].
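To make the Lipschitz regularity concrete: a function f is Lipschitz continuous with constant k if |f(x) - f(y)| <= k |x - y| for all x and y. The sketch below is only an illustration under our own assumptions and is not taken from [Qi2017] or [Arjovsky2017]; the helper name `empirical_lipschitz` and the two example functions are hypothetical. It estimates a lower bound on the Lipschitz constant of a one-dimensional function from random sample pairs.

```python
import math
import random


def empirical_lipschitz(f, samples):
    """Largest observed |f(x) - f(y)| / |x - y| over all pairs of samples.

    This is only a lower bound on the true Lipschitz constant of f.
    """
    best = 0.0
    for i, x in enumerate(samples):
        for y in samples[i + 1:]:
            if x != y:
                best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best


random.seed(0)
points = [random.uniform(-3.0, 3.0) for _ in range(200)]

# sin is 1-Lipschitz, so the estimate should not exceed 1.
lip_sin = empirical_lipschitz(math.sin, points)

# x**3 has slope up to 3 * x**2 on [-3, 3], so its estimate is much larger.
lip_cube = empirical_lipschitz(lambda x: x ** 3, points)
```

A function whose estimate stays bounded as more samples are added is a plausible member of the regularized function space; a function such as `x ** 3` on a growing interval is not.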
The significance of IMA and RMA is that one or the other is assumed, explicitly or implicitly, in the theoretical analysis that proves the consistency conjecture of a proposed GAN model, namely that the density of its generated samples matches the density of the true samples.
It has been shown [Qi2017] that both LS-GAN and WGAN are based on the Lipschitz regularity, an assumption that the model has regularized modeling ability, in contrast to the non-parametric assumption of infinite modeling ability.
Class 1: Infinite Modeling Ability (IMA)
The first class of GANs contains the classic GAN [Goodfellow2014], EBGAN [Zhao2016], Least Square GAN [Mao2017] and f-divergence GAN [Nowozin2016].
In [Arjovsky2017], it has been shown that the classic GAN and EBGAN attempt to minimize the J-S distance and the total variation distance, respectively, between the generative density and the true data density, while the Least Square GAN is formulated to minimize their Pearson divergence and the f-divergence GAN minimizes a family of f-divergences.
Table: GANs with IMA and the distance between distributions they minimize to learn the generator.
GANs with IMA | Distance between distributions |
The classic GAN | J-S distance |
EBGAN | Total Variation |
Least Square GAN | Pearson divergence |
f-GAN | f-divergence |
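As a small numerical illustration of the distances in the table, the sketch below evaluates them on two discrete distributions. It is a minimal example under our own assumptions: the function names and the example vectors `p` and `q` are hypothetical, every entry of `q` is assumed to be positive, and the argument order of each divergence follows one common convention that may differ from the one used in the cited papers. The total variation and Pearson divergences are written as special cases of a generic f-divergence, D_f(p || q) = sum_i q_i f(p_i / q_i).

```python
import math


def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def total_variation(p, q):
    # f(t) = |t - 1| / 2 gives half the L1 distance between p and q.
    return f_divergence(p, q, lambda t: 0.5 * abs(t - 1.0))


def pearson_chi2(p, q):
    # f(t) = (t - 1)^2 gives sum_i (p_i - q_i)^2 / q_i.
    return f_divergence(p, q, lambda t: (t - 1.0) ** 2)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) via the mixture m = (p + q) / 2."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


p = [0.5, 0.4, 0.1]    # hypothetical "true" distribution
q = [0.25, 0.25, 0.5]  # hypothetical "generated" distribution

tv = total_variation(p, q)
chi2 = pearson_chi2(p, q)
js = js_divergence(p, q)
```

All three quantities are zero when the two distributions coincide, which is the property a generator trained against them tries to reach.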
Vanishing Gradient: What is interesting is that these infinite modeling ability GANs all suffer from the vanishing gradient problem, as discussed in [Arjovsky2015].
If the manifold of real data and the manifold of generated data have no or negligible overlap, the distances these GANs are supposed to minimize become constant, causing a vanishing gradient that cannot update the generator at all. This problem has been reported since the first GAN (J-S) was proposed in [Goodfellow2014].
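The effect can be seen in a toy setting. The sketch below is our own illustration rather than an experiment from the cited papers: it places the real data and the generated data as two point masses on a one-dimensional grid (the helper `point_mass` and the grid size are hypothetical choices). Once the two supports are disjoint, the J-S divergence stays at log 2 however far apart they are, so it carries no signal about which direction to move the generator. For contrast, the 1-D Wasserstein distance, computed here from the difference of cumulative sums with unit bin spacing, grows with the separation.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q):
    """Wasserstein-1 distance on a grid with unit spacing between bins."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total


def point_mass(k, n=10):
    """All probability on bin k of an n-bin grid."""
    return [1.0 if i == k else 0.0 for i in range(n)]


real = point_mass(0)
shifts = range(1, 6)

# Constant in the shift: no useful gradient for the generator.
js_values = [js_divergence(real, point_mass(k)) for k in shifts]

# Proportional to the shift: still informative when supports are disjoint.
w_values = [wasserstein_1d(real, point_mass(k)) for k in shifts]
```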
Open Question: Is the co-occurrence of IMA and the vanishing gradient problem only a coincidence? In other words, does IMA inevitably cause vanishing gradients? Can we find GANs with IMA that avoid the vanishing gradient problem? Probably.
Class 2: Regularized Modeling Ability (RMA)
On the contrary, the second class of GANs contains the recently developed WGAN and LS-GAN. By chance, both models are built on Lipschitz regularity [Qi2017], and they do not need infinite modeling ability to prove the consistency conjecture.
Moreover, in Appendix D of [Qi2017], it is shown that both WGAN and LS-GAN belong to a super class of Generalized LS-GAN (GLS-GAN):
1. WGAN is the GLS-GAN with a cost of C(a) = a,
2. LS-GAN is the GLS-GAN with a cost of C(a) = (a)_{+} = max(a, 0).
More GLS-GANs can be found by defining a proper cost function satisfying some conditions [Qi2017].
An interesting example is the cost C(a) = |a|, which defines a GLS-GAN with an L_{1} cost that minimizes the difference between the loss functions of real and generated samples (up to their margin). This is quite different from the idea behind both LS-GAN and WGAN, but preliminary results show that it works!
For details, please refer to Appendix D of [Qi2017].
Obviously, there should exist other GANs with RMA that do not belong to the GLS-GAN family. More research effort is needed to discover GANs with RMA that can properly regularize the data generation process by introducing priors that characterize the true data distribution beyond the Lipschitz assumption. This should be an interesting direction for expanding the territory of GANs.
Examples of GLS-GAN (Generalized LS-GAN)
We can adopt a Leaky Linear Rectifier cost function, C_{ν}(a) = max(a, νa), to define the GLS-GAN with a slope ν. Then WGAN is the GLS-GAN with ν = 1 and LS-GAN is the GLS-GAN with ν = 0.
Similarly, more GLS-GANs with various cost functions can be defined. Below we illustrate some results generated by the GLS-GAN with the Leaky Linear Rectifier cost of different slopes.
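The role of the slope can be checked directly. The sketch below assumes that the Leaky Linear Rectifier takes the common form C(a) = max(a, slope * a); this form, the function name `leaky_rectifier_cost`, and the sample values in `margins` are our assumptions for illustration and should be checked against Appendix D of [Qi2017]. Under this assumption, a slope of 1 yields a linear cost, a slope of 0 yields the ordinary rectifier, and a slope of -1 yields the absolute value, i.e., an L_{1} cost.

```python
def leaky_rectifier_cost(a, slope):
    """Leaky linear rectifier, assumed here to be C(a) = max(a, slope * a)."""
    return max(a, slope * a)


# Hypothetical input values to the cost function.
margins = [-2.0, -0.5, 0.0, 0.5, 2.0]

linear = [leaky_rectifier_cost(a, 1.0) for a in margins]     # C(a) = a
rectified = [leaky_rectifier_cost(a, 0.0) for a in margins]  # C(a) = max(a, 0)
absolute = [leaky_rectifier_cost(a, -1.0) for a in margins]  # C(a) = |a|
```

Intermediate slopes interpolate between these cases, which is one way to read the family of models illustrated in the results.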
Figure: Results generated by (a) GLS-GAN, (b) LS-GAN (a special case of GLS-GAN), and (c) WGAN (a special case of GLS-GAN).
The source code of the GLS-GAN is available at github/glsgan, and the details of GLS-GAN can be found in [Qi2017].
References
[Qi2017] G.-J. Qi, Loss-Sensitive Generative Adversarial Networks, preprint, 2017.
[Goodfellow2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative Adversarial Nets, in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[Arjovsky2015] M. Arjovsky and L. Bottou, Towards Principled Methods for Training Generative Adversarial Networks, 2015.
[Nowozin2016] S. Nowozin, B. Cseke and R. Tomioka, f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016.
[Arjovsky2017] M. Arjovsky, S. Chintala and L. Bottou, Wasserstein GAN, 2017.
[Zhao2016] J. Zhao, M. Mathieu and Y. LeCun, Energy-based Generative Adversarial Network, 2016.
[Mao2017] X. Mao, Q. Li, H. Xie, R. Y.K. Lau and Z. Wang, Least Squares Generative Adversarial Networks, 2017.