The StyleGAN[1][2] generator network has two parts: a fully-connected mapping network (named *mapping*) and a pyramid CNN synthesis network (named *g*). *Mapping* is a transformation from dimension 512 to dimension 512, and *g* is a transformation from dimension 512 to 1024×1024×3. The design of *mapping* is intended to disentangle the manifold mapping from the latent space to the feature variation space.
I’m interested in how exactly this learned mapping warps the manifold, so here is my experiment.
Thanks to the normalization at the beginning of the mapping network, the input z vectors lie on the regular 512-d unit sphere.
Supposing *mapping* is a continuous function, all possible w points mapped from Z will lie on an irregular closed 512-d surface.
Showing a 512-d manifold directly is difficult, since humans only have 2-d vision, but we can still reveal some local characteristics by dimension slicing.
Here is my way. Pick a geodesic on the sphere, i.e. a great circle, map it into W space, then show the warped result circle. To get a great circle of the 512-d sphere in full generality, randomly sample 2 points (each drawn from a standard normal distribution, then normalized), then slerp between and beyond them evenly until one full cycle around the sphere is completed (a code sketch follows the formula below). To show the 512-d result circle, I simply project the high-dimensional curve into multiple low-dimensional curves, i.e. for every point w on the result circle:
$$ \mathbf{w}: [w_{1}, w_{2}, w_{3}, \dots, w_{512}] \rightarrow \{[w_{1}, w_{2}, w_{3}], [w_{4}, w_{5}, w_{6}], \dots\} $$
Then plot the projections in a 3D coordinate system viewport, as you see below.
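Here is a minimal NumPy sketch of the sampling and projection steps, under stated assumptions: `sample_great_circle` and `project_to_3d` are hypothetical helper names, the step count is illustrative, and `mapping` stands for some wrapper around the trained StyleGAN mapping network that takes and returns NumPy arrays (not shown here).

```python
import numpy as np

def sample_great_circle(dim=512, steps=256, rng=None):
    """Sample `steps` evenly spaced points along a random great circle of the unit sphere."""
    rng = np.random.default_rng() if rng is None else rng
    # Two standard normal samples define the plane of the circle.
    a = rng.standard_normal(dim)
    a /= np.linalg.norm(a)
    b = rng.standard_normal(dim)
    b -= a * np.dot(a, b)          # Gram-Schmidt: make b orthogonal to a
    b /= np.linalg.norm(b)
    # cos(t)*a + sin(t)*b stays on the unit sphere; walking t over one full
    # cycle is equivalent to slerping evenly between and beyond the samples.
    t = np.linspace(0.0, 2.0 * np.pi, steps, endpoint=False)
    return np.cos(t)[:, None] * a + np.sin(t)[:, None] * b   # shape (steps, dim)

def project_to_3d(ws):
    """Slice each 512-d point into 170 consecutive 3-d projections (last 2 dims dropped)."""
    steps, dim = ws.shape
    usable = dim - dim % 3           # 512 is not divisible by 3, so usable = 510
    return ws[:, :usable].reshape(steps, usable // 3, 3)

zs = sample_great_circle()       # points on the 512-d unit sphere
# ws = mapping(zs)               # assumed wrapper around the trained network
# curves = project_to_3d(ws)     # (steps, 170, 3) polylines, one per 3-d slice
```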
As you see in the plot, the projected circles entwine in most dimensions, so the mapping from Z to W is more rugged than I expected in the conceptual illustration. The intervals between neighboring points are not very even, although a high-dimensional metric cannot really be inferred from low-dimensional projections.
When you select very many dimensions (by moving the second slider to the right), you will see the overall distribution of the points’ coordinates. It may be a significant observation that most points congregate in the first octant (+, +, +), more exactly, in the tetrahedron with vertices at about (0, 0, 0), (1.5, 0, 0), (0, 1.5, 0), (0, 0, 1.5). This phenomenon is reminiscent of the effect of ReLU activations. According to the StyleGAN source code[3], Leaky ReLU is used in the mapping network by default, which is consistent with the plotting results. In some sense, the asymmetry may be necessary for disentanglement learning. But thinking further, considering that the network is trained on a dataset from nature: why would nature need such a specific asymmetry, and where does it come from?
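To make the octant observation concrete, here is a hedged counting sketch, reusing the hypothetical `ws` array from the sketch above; if the coordinates were sign-symmetric, the expected share of 3-d projections in the (+, +, +) octant would be 1/8.

```python
import numpy as np

def first_octant_fraction(ws):
    """Fraction of 3-d projections whose coordinates are all positive."""
    triples = ws[:, :510].reshape(-1, 3)       # 170 projections per 512-d point
    return np.all(triples > 0, axis=1).mean()  # 0.125 would mean no asymmetry

# fraction = first_octant_fraction(ws)
# print(f"first-octant share: {fraction:.3f} (symmetric baseline: 0.125)")
```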
Lastly, an inspection of the features of generated images. Let’s suppose there are some hyperplanes in the Z space that split certain binary high-level semantic features, such as male/female, young/old, dark/light skin color and so on (for some features there is probably no definite boundary, but moving along some direction, i.e. the plane’s normal vector, will change that feature most rapidly)[4]. And we can safely suppose that a random great circle (with a random normal vector) on the unit sphere will intersect most feature hyperplanes. In fact, considering the high dimensionality, 2 random hyperplanes will be very close to perpendicular in most cases, as the sketch below checks numerically. So we get an interesting inference: generated images sampled along a great circle will pass through many feature variations: male/female, old/young, and anything else you can imagine. Such an experiment can therefore help to show the diversity of a GAN and to test how well the network fits the dataset.
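The near-orthogonality claim is easy to verify numerically. The self-contained sketch below (sample sizes are illustrative) draws pairs of random directions in 512-d and shows that their cosines concentrate around 0 with standard deviation about 1/√512 ≈ 0.044.

```python
import numpy as np

# Draw many pairs of random directions in 512-d and measure their cosines.
rng = np.random.default_rng(0)
dim, trials = 512, 10_000
u = rng.standard_normal((trials, dim))
v = rng.standard_normal((trials, dim))
u /= np.linalg.norm(u, axis=1, keepdims=True)
v /= np.linalg.norm(v, axis=1, keepdims=True)
cos = np.sum(u * v, axis=1)
print(f"mean |cos| = {np.abs(cos).mean():.3f}, max |cos| = {np.abs(cos).max():.3f}")
# Typically prints roughly: mean |cos| = 0.035, max |cos| = 0.18;
# two random hyperplane normals are almost always nearly perpendicular.
```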
This is my StyleGAN web porting project for research. A video demo:
paper: A Style-Based Generator Architecture for Generative Adversarial Networks ↩︎
paper: Analyzing and Improving the Image Quality of StyleGAN ↩︎
source code: https://github.com/NVlabs/stylegan ↩︎
Someone may argue that the feature space could be straighter for W than for Z, but considering its highly irregular shape and the relation between ψ and feature intensity, I think it’s an open question. ↩︎