**CMU 16-726 Learning-based Image Synthesis**
**Assignment #4**
*Title: "Neural Style Transfer"*
*Name: Soyong Shin (soyongs@andrew.cmu.edu)*
(##) Contents
* Part 1 Introduction and Overview
* Part 2 Content Reconstruction
* Part 3 Texture Synthesis
* Part 4 Neural Style Transfer
(##) Part 1 Introduction and Overview
**1.1 Introduction**
![figure [sample_transfer]: Sample image of Neural Style Transfer](report/Figure1.png)
In this assignment, I implemented an algorithm that transfers the style of one image onto another while preserving the content of the target image.
This method, called "Neural Style Transfer," consists of two parts: content reconstruction and texture synthesis.
Content reconstruction builds an image from noise that shares the content of a target image, while texture synthesis
generates an image that shares the style of a target image.
By integrating the two methods, we obtain an overall architecture that takes two images (one for the style and the other for the content) and
returns a new image combining the content of one input with the style of the other (see Figure 1).
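Concretely, following Gatys et al., the output image $x$ is obtained by directly optimizing its pixels to minimize a weighted combination of the two losses with respect to the content image $c$ and the style image $s$:

$$ x^{*} = \arg\min_{x} \; \alpha \, \mathcal{L}_{content}(x, c) + \beta \, \mathcal{L}_{style}(x, s), $$

where the weights $\alpha$ and $\beta$ control the trade-off between content fidelity and style strength.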
------------------------------------------------------------------------------------------------------------------------------------------------------------
**1.2 Neural network (VGG-19)**
This algorithm uses the feature extractor of the **VGG-19** network, which consists of 5 convolutional blocks.
The outputs of the convolutional blocks are features at different levels of abstraction ($L \in \{1, 2, 3, 4, 5\}$).
To reconstruct content and synthesize texture, the algorithm computes the content and style losses on features at selected levels, where the choice of level $L$ is one of the hyper-parameters of the algorithm.
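When multiple levels are selected, a natural choice, and the one assumed in the sketches below, is to sum the per-level terms. With $\mathcal{C}$ and $\mathcal{S}$ the sets of levels chosen for content and style respectively,

$$ \mathcal{L}_{content} = \sum_{L \in \mathcal{C}} \mathcal{L}_{content}^{L}, \qquad \mathcal{L}_{style} = \sum_{L \in \mathcal{S}} \mathcal{L}_{style}^{L}. $$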
![figure [vgg19_network]: Network Architecture of VGG-19 (Feature Extractor)](report/Figure2.png)
Figure 2 shows the architecture and diagram of the VGG-19 feature extractor.
From here on, $F^L$ denotes the feature at level $L$, i.e., the output of the intermediate convolutional block marked "Level $L$" in the diagram.
Note that this VGG-19 feature extractor is pre-trained on the ImageNet dataset.
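For reference, a minimal sketch of how such a pre-trained extractor is obtained with *torchvision* (`vgg19(pretrained=True)` and its `features` sub-module are standard torchvision API; freezing the parameters reflects that only the image, not the network, is optimized):

```python
import torchvision.models as models

# VGG-19 pre-trained on ImageNet; keep only the convolutional
# feature extractor (the `features` sub-module) in eval mode
cnn = models.vgg19(pretrained=True).features.eval()

# Freeze the network: only the input image will be optimized
for param in cnn.parameters():
    param.requires_grad_(False)
```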
------------------------------------------------------------------------------------------------------------------------------------------------------------
**1.3 Code Implementation**
To experiment with features at different levels, I duplicate the VGG-19 feature extractor while inserting content and style losses at the chosen intermediate feature levels.
Each convolutional layer (*Conv2D*) comes before its activation function (*ReLU*), and I implemented a function ***get_model_and_losses*** that inserts the
losses right after the final convolutional layer of each convolutional block (i.e., between that layer and its ReLU). The implementation is as below:
```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def get_model_and_losses(cnn, style_img, content_img,
                         content_layers=content_layers_default,
                         style_layers=style_layers_default):
    """Duplicate the CNN model while inserting losses at the given feature levels.

    Args:
        cnn: VGG-19 feature extractor pre-trained on ImageNet
        style_img: image from which the texture (style) is synthesized
        content_img: image from which the content is reconstructed
        content_layers: feature levels to use for content reconstruction
        style_layers: feature levels to use for texture synthesis

    Returns:
        model: duplicated model with losses inserted
        style_losses: loss layers for texture synthesis
        content_losses: loss layers for content reconstruction
    """
    cnn = copy.deepcopy(cnn)
    content_losses = []
    style_losses = []

    normalization = Normalization()
    model = nn.Sequential(normalization)

    block_idx, layer_idx = 1, 1
    # Final convolutional layer of each VGG-19 block (2, 2, 4, 4, 4 convs)
    end_of_blocks = ['conv_1_2', 'conv_2_2', 'conv_3_4', 'conv_4_4', 'conv_5_4']

    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            # 2D convolutional layer
            name = 'conv_%d_%d' % (block_idx, layer_idx)
        elif isinstance(layer, nn.ReLU):
            # ReLU layer; replaced with an out-of-place version so the loss
            # modules see the conv activations before they are overwritten
            name = 'relu_%d_%d' % (block_idx, layer_idx)
            layer = nn.ReLU(inplace=False)

            if 'conv_%d_%d' % (block_idx, layer_idx) in end_of_blocks:
                # The current conv layer is the final layer of its block, so
                # the content and style losses are added here (right after
                # the conv layer, before its ReLU).
                if 'conv_%d' % block_idx in content_layers:
                    # add content loss layer
                    target = model(content_img).detach()
                    content_loss = ContentLoss(target)
                    model.add_module('content_loss_%d' % block_idx, content_loss)
                    content_losses.append(content_loss)

                if 'conv_%d' % block_idx in style_layers:
                    # add style loss layer
                    target = model(style_img).detach()
                    style_loss = StyleLoss(target)
                    model.add_module('style_loss_%d' % block_idx, style_loss)
                    style_losses.append(style_loss)

                # Stop copying layers once the deepest requested level is done
                # (lexicographic max works since block indices are single digits)
                if max(content_layers + style_layers) == 'conv_%d' % block_idx:
                    break
            layer_idx += 1
        elif isinstance(layer, nn.MaxPool2d):
            # pooling layer: move on to the next block
            name = 'pool_%d' % block_idx
            block_idx += 1
            layer_idx = 1
        else:
            raise NameError("unexpected layer type: %s" % layer.__class__.__name__)

        model.add_module(name, layer)

    return model, style_losses, content_losses
```
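For context, here is a sketch of how ***get_model_and_losses*** plugs into the optimization loop. It follows the standard LBFGS closure pattern; the layer selections, loss weights, step count, and noise initialization below are illustrative assumptions, not the exact settings of my experiments:

```python
model, style_losses, content_losses = get_model_and_losses(
    cnn, style_img, content_img,
    content_layers=['conv_4'],
    style_layers=['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5'])

# Start from noise and optimize the image pixels directly
input_img = torch.randn_like(content_img).requires_grad_(True)
optimizer = torch.optim.LBFGS([input_img])

style_weight, content_weight = 1e6, 1.0  # illustrative trade-off

for _ in range(300):
    def closure():
        with torch.no_grad():
            input_img.clamp_(0, 1)  # keep pixels in a valid range
        optimizer.zero_grad()
        model(input_img)  # loss modules record their values during this pass
        loss = (style_weight * sum(sl.loss for sl in style_losses)
                + content_weight * sum(cl.loss for cl in content_losses))
        loss.backward()
        return loss
    optimizer.step(closure)
```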
```python
class ContentLoss(nn.Module):
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        # Standardize the target feature map with its own
        # per-channel mean and std
        _target = target.detach()
        self.mean = _target.mean((2, 3), keepdim=True)
        self.std = _target.std((2, 3), keepdim=True)
        self.target = (_target - self.mean) / self.std

    def forward(self, input):
        # Compute the loss against the standardized input,
        # then pass the input through unchanged
        self.loss = F.mse_loss((input - self.mean) / self.std, self.target)
        return input
```
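In equation form: with $\mu$ and $\sigma$ the per-channel mean and standard deviation of the *target* feature map, and $\hat{F}^{L}(\cdot) = (F^{L}(\cdot) - \mu)/\sigma$ the standardized features, the module computes

$$ \mathcal{L}_{content}^{L} = \frac{1}{N} \big\| \hat{F}^{L}(x) - \hat{F}^{L}(c) \big\|_{2}^{2}, $$

where $N$ is the number of elements in the feature map (the mean reduction of `F.mse_loss`). The layer stores its value in `self.loss` and returns its input unchanged, so it can be inserted into the duplicated `nn.Sequential` without altering the forward pass.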
```python
def gram_matrix(activations):
    a, b, c, d = activations.size()  # a = batch size (= 1), b = channels
    features = activations.view(a * b, c * d)

    # Gram matrix: inner products between the vectorized feature maps
    gram = torch.mm(features, features.t())

    # Normalize by the total number of elements
    normalized_gram = gram.div(a * b * c * d)
    return normalized_gram


class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        # Standardize the target feature map with its own per-channel
        # mean and std before taking the Gram matrix
        _target = target_feature.detach()
        self.mean = _target.mean((2, 3), keepdim=True)
        self.std = _target.std((2, 3), keepdim=True)
        self.target = gram_matrix((_target - self.mean) / self.std)

    def forward(self, input):
        normalized_gram = gram_matrix((input - self.mean) / self.std)
        self.loss = F.mse_loss(normalized_gram, self.target)
        return input
```
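The Gram matrix here is the standard one from Gatys et al.: with the standardized activations reshaped to $F \in \mathbb{R}^{ab \times cd}$ (rows are the $ab$ feature channels, columns the $cd$ spatial locations, with $a = 1$),

$$ G = \frac{1}{abcd} F F^{\top}, \qquad \mathcal{L}_{style}^{L} = \frac{1}{(ab)^{2}} \big\| G(x) - G(s) \big\|_{2}^{2}. $$

Because $G$ aggregates correlations between feature channels over all spatial positions, matching Gram matrices reproduces the texture statistics of the style image while discarding its spatial layout.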