## Image-to-Image Translation with Conditional Adversarial Networks

by BAIR | 21 NOV 2016

arXiv: 1611.07004v1

#### Architecture

- the one-to-many / many-to-one relationship for images is analogous to language translation in many senses
- ▢ identify underlying patterns of the language-modelling effort that could be application-agnostic
- U-Net helps preserve low-level info compared to an autoencoder (image pairs share lots of low-level details; an autoencoder's bottleneck would lose them)
- this advantage is not specific to cGAN; it also holds with an L1 loss

- ▢ Do sentences in two languages also share this attribute?
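The U-Net point above can be sketched numerically. This is a toy stand-in, not the paper's network: `down`/`up` are simple pooling and upsampling, and the "skip connection" is just an average of the encoder activation with the upsampled bottleneck, but it shows why passing low-level detail around the bottleneck reduces reconstruction error.

```python
import numpy as np

def down(x):
    # 2x2 average-pool "encoder" step (loses high-frequency detail)
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    # nearest-neighbour upsample "decoder" step
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)

# plain autoencoder path: everything passes through the bottleneck
auto_out = up(down(x))

# U-Net-style path: merge the encoder activation (the skip) with the
# upsampled bottleneck; here we simply average the two
unet_out = 0.5 * (x + up(down(x)))

# the skip path keeps the output closer to the input's fine detail
err_auto = np.abs(auto_out - x).mean()
err_unet = np.abs(unet_out - x).mean()
assert err_unet < err_auto
```

The real U-Net concatenates feature channels and lets learned convolutions do the merging, but the mechanism is the same: low-level information has a path that bypasses the bottleneck.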

#### Generator

$$\mathcal{L}_{\mathit{cGAN}}(G, D) = \begin{aligned} & \hspace{0.2cm}\mathbb{E}_{x,y \sim p_{data}(x,y)}[\log D(x,y)] \\ + & \hspace{0.2cm} \mathbb{E}_{x\sim p_{data}(x),z \sim p(z)}[\log(1-D(x, G(x,z)))]\end{aligned}$$

- Usage of a GAN automates the process of writing a loss function and lets developers specify only high-level goals
- cGAN gives sharper images and better segmentation than a plain GAN because it learns a structured loss
- ▢ use a predicted translation semantic tree as the backbone for prior conditionals
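The objective above can be written as a minimal numpy sketch. The discriminator scores here are toy numbers, not network outputs; the point is only the shape of the objective: D maximizes it, G minimizes it.

```python
import numpy as np

def cgan_loss(d_real, d_fake):
    # L_cGAN(G, D) = E[log D(x, y)] + E[log(1 - D(x, G(x, z)))]
    # d_real: D(x, y) on real pairs; d_fake: D(x, G(x, z)) on generated pairs
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# a confident discriminator achieves a higher (less negative) objective
confident = cgan_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
unsure = cgan_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
assert confident > unsure
```

In the paper this adversarial term is combined with an L1 term on `G(x, z)` vs `y`, weighted by a lambda hyperparameter.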

#### Discriminator

- PixelGAN cannot increase spatial sharpness; PatchGAN yields good results; the full-image GAN shows no significant improvement but adds O(n) cost
- PatchGAN is useful for scaling to large images
- it could be used as a conv. filter to generate arbitrarily large outputs
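The "conv. filter" point can be illustrated with a sketch of how a patch discriminator slides over an image. `score_patch` here is a hypothetical stand-in (in pix2pix it is a small conv net); the point is that the same fixed-size scorer applies to any image size, producing one score per patch.

```python
import numpy as np

def score_patch(p):
    # stand-in scorer: in pix2pix this would be a small conv net
    return 1.0 / (1.0 + np.exp(-p.mean()))

def patch_scores(img, patch=16, stride=16):
    # slide the fixed-size patch discriminator across the whole image
    h, w = img.shape
    scores = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            scores.append(score_patch(img[i:i + patch, j:j + patch]))
    return np.array(scores)

# the same patch discriminator applies to images of any size
small = patch_scores(np.zeros((32, 32)))
large = patch_scores(np.zeros((64, 64)))
assert small.shape == (4,) and large.shape == (16,)
```

Averaging the per-patch scores gives the final discriminator output; this is exactly the behaviour of a fully convolutional network, which is why PatchGAN scales.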

#### Loss Function

- L1 leads to a narrower distribution than the ground truth; it prefers grayish colors (the median of the color distribution)
- the discriminator can learn to identify grayish outputs as a tell-tale feature of fakes
- the adversarial loss can push the distribution closer to the ground truth; it can perform "sharpening"
- Edges -> not blurred (sharp lines)
- Colors -> not the median (more colorful)
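The median-seeking behaviour of L1 can be checked with a tiny numeric demo. The sample values are made up for illustration: for one ambiguous pixel with several plausible ground-truth values, the single output that minimizes expected L1 is the median (and for L2 it is the mean), which is why L1-trained generators drift toward "safe" mid-range colors.

```python
import numpy as np

# made-up plausible ground-truth values for one ambiguous pixel
samples = np.array([0.1, 0.2, 0.2, 0.9])
candidates = np.linspace(0, 1, 101)

l1 = np.array([np.abs(samples - c).mean() for c in candidates])
l2 = np.array([((samples - c) ** 2).mean() for c in candidates])

best_l1 = candidates[l1.argmin()]
best_l2 = candidates[l2.argmin()]

# L1 is minimized at the median, L2 at the mean
assert abs(best_l1 - np.median(samples)) < 0.01
assert abs(best_l2 - samples.mean()) < 0.01
```

The adversarial term breaks this averaging: a single sharp, colorful output can fool the discriminator, while the median output cannot.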

#### Unclassified

- Eval metrics determine the behaviours of both G and D.
- FCN-8s scores evaluate whether a common segmentation engine can identify objects in the generated images
- ▢ perplexity -> needs approximation functions of the syntax tree (real vs fake)
- image GAN is useful for ambiguous tasks (large output space - highly detailed), but simple L1 may be better when the output space is small (segmentation/classification)
- (Table 1) cGAN in general outperforms GAN. Interestingly, L1+GAN has slightly higher per-pixel accuracy.
- The paper is the first to demonstrate that GANs can generate discrete labels, not only continuous-valued outputs (images)
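The FCN-score idea reduces to: run a pretrained segmenter on the generated image and compare its labels to ground truth. A minimal sketch of the per-pixel accuracy part, with toy label maps standing in for FCN-8s output:

```python
import numpy as np

def per_pixel_accuracy(pred_labels, true_labels):
    # fraction of pixels where the segmenter's prediction matches ground truth
    return (pred_labels == true_labels).mean()

# toy 2x2 label maps standing in for FCN-8s output on a generated image
true = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
assert per_pixel_accuracy(pred, true) == 0.75
```

The intuition: if the generated image is realistic, an off-the-shelf segmenter should recognize the same objects in it as in the real image.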

#### PS

highlighted ref:

- Optimization - Instance normalization. D. Ulyanov, arXiv:1607.08022
- Conditional GAN - Conditional generative adversarial nets. M. Mirza, arXiv:1411.1784