Author: TeYang, Lau
Last Updated: 9 August 2020
Please refer to this notebook on Kaggle for a more detailed description, analysis and insights of the project.
For this project, I applied deep learning and convolutional neural networks (CNNs) to some of my own pictures after finishing the Convolutional Neural Network course on Coursera by Deeplearning.ai. It is mostly a fun project: creating new artistic-style pictures from my own photos while brushing up on my deep learning and CNN skills.
- Create new artistic-style photos by merging my own photos with a style photo
- Understand how CNNs can be adapted for other purposes
- Understand the core of how neural style transfer works
- Image preprocessing for input into model
- Selecting content and style layers from CNN model
- Defining the content, style and total (generated image) cost functions
- Minimizing the loss between the generated image and the content and style images
Neural Style Transfer (NST) is an optimization technique that takes two images, a content image and a style image, and blends them so that the output (the generated image) looks like the content image 'painted' in the style of the style image. The algorithm was first introduced by Gatys et al. in the paper A Neural Algorithm of Artistic Style. Its core idea is to use a deep neural network to minimize the differences between the generated image's content and style and those of the content and style images, respectively. This lets us render our own pictures in the style of any other picture, from Google Images results to famous paintings.
Here we will attempt to paint the content image in the style of the style image.
The cost function is the most important idea in neural style transfer, and the overall idea is simple: we want the generated image to have the same content as the content image and the same style as the style image. We therefore minimize the cost (loss) of the generated image, defined as the sum of its content cost and its style cost.
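Written out, the total cost might look like the expression below. The weights $\alpha$ and $\beta$ are an assumption borrowed from the weighted form in Gatys et al.; this README does not state them. $C$, $S$ and $G$ denote the content, style and generated images.

$$J(G) = \alpha \, J_{\text{content}}(C, G) + \beta \, J_{\text{style}}(S, G)$$

With $\alpha = \beta = 1$ this reduces to the plain sum described above. Raising $\beta$ relative to $\alpha$ pushes the result toward the style image.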
Content cost is defined as the squared Euclidean distance between the activations of the content image and the generated image at a chosen layer.
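As a minimal NumPy sketch of this definition: the function name `content_cost`, the argument names `a_C` and `a_G`, and the `1 / (4 * n_H * n_W * n_C)` scaling are assumptions made for illustration, not necessarily what the notebook uses. Random arrays stand in for real CNN activations.

```python
import numpy as np

def content_cost(a_C, a_G):
    """Content cost between two activation volumes of shape (n_H, n_W, n_C)."""
    a_C = np.asarray(a_C, dtype=np.float64)
    a_G = np.asarray(a_G, dtype=np.float64)
    if a_C.shape != a_G.shape:
        raise ValueError("activations must have the same shape")
    n_H, n_W, n_C = a_C.shape
    # Sum of squared differences, scaled by an assumed normalization constant.
    return np.sum((a_C - a_G) ** 2) / (4.0 * n_H * n_W * n_C)

# Toy activations standing in for the output of one CNN layer.
rng = np.random.default_rng(0)
a_C = rng.random((4, 4, 3))
a_G = rng.random((4, 4, 3))

print(content_cost(a_C, a_C))  # identical activations give a cost of 0.0
print(content_cost(a_C, a_G))  # different activations give a positive cost
```

The scaling constant only changes the magnitude of the cost, not where its minimum lies, so other normalizations are equally valid.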
Style cost is defined as the squared Frobenius norm of the difference between the Gram matrices of the style image and the generated image at a chosen layer.
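The Gram matrix and the single-layer style cost can be sketched in NumPy as follows. The names `gram_matrix` and `layer_style_cost` and the `1 / (4 * n_C^2 * (n_H * n_W)^2)` scaling are illustrative assumptions, and random arrays again stand in for real CNN activations.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix (n_C, n_C) of an activation volume of shape (n_H, n_W, n_C)."""
    a = np.asarray(a, dtype=np.float64)
    n_H, n_W, n_C = a.shape
    # Unroll spatial positions: one row per channel, one column per position.
    A = a.reshape(n_H * n_W, n_C).T
    # Entry (i, j) measures how strongly channels i and j fire together.
    return A @ A.T

def layer_style_cost(a_S, a_G):
    """Style cost for one layer from style and generated activations."""
    a_S = np.asarray(a_S, dtype=np.float64)
    a_G = np.asarray(a_G, dtype=np.float64)
    if a_S.shape != a_G.shape:
        raise ValueError("activations must have the same shape")
    n_H, n_W, n_C = a_S.shape
    diff = gram_matrix(a_S) - gram_matrix(a_G)
    # Squared Frobenius norm of the difference, with an assumed normalization.
    return np.sum(diff ** 2) / (4.0 * n_C ** 2 * (n_H * n_W) ** 2)

# Toy activations standing in for the output of one CNN layer.
rng = np.random.default_rng(0)
a_S = rng.random((4, 4, 3))
a_G = rng.random((4, 4, 3))

print(gram_matrix(a_S).shape)       # (3, 3)
print(layer_style_cost(a_S, a_S))   # identical activations give a cost of 0.0
print(layer_style_cost(a_S, a_G))
```

Because the Gram matrix sums over all spatial positions, it keeps which features co-occur but discards where they occur. This is why it captures style rather than content.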
With each optimization iteration, the generated image becomes more similar in content to the content image and in style to the style image. Refer to the notebook on Kaggle to look at this animated process!
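The toy sketch below illustrates this iterative process under strong simplifying assumptions: raw pixel values stand in for CNN activations, the gradients are derived by hand rather than by automatic differentiation, and the weights `alpha`, `beta` and the step size `lr` are arbitrary illustrative values. It is not the notebook's training loop; it only shows that gradient descent on the pixels of the generated image lowers the combined cost.

```python
import numpy as np

def gram(A):
    """Gram matrix of unrolled features A with shape (n_C, N)."""
    return A @ A.T

def total_cost(G, C, S_gram, alpha, beta):
    n_C, N = G.shape
    j_content = np.sum((G - C) ** 2) / (4.0 * n_C * N)
    j_style = np.sum((gram(G) - S_gram) ** 2) / (4.0 * n_C ** 2 * N ** 2)
    return alpha * j_content + beta * j_style

def total_grad(G, C, S_gram, alpha, beta):
    """Hand-derived gradient of total_cost with respect to G."""
    n_C, N = G.shape
    g_content = (G - C) / (2.0 * n_C * N)
    # d/dG ||G G^T - S||_F^2 = 4 (G G^T - S) G, then apply the 1/(4 n_C^2 N^2) scale.
    g_style = (gram(G) - S_gram) @ G / (n_C ** 2 * N ** 2)
    return alpha * g_content + beta * g_style

rng = np.random.default_rng(0)
C = rng.random((3, 16))   # toy "content image": 3 channels, 16 unrolled pixels
S = rng.random((3, 16))   # toy "style image"
S_gram = gram(S)
G = rng.random((3, 16))   # generated image initialized as noise

alpha, beta, lr = 10.0, 40.0, 0.5   # assumed values, chosen only for this toy
history = []
for step in range(200):
    history.append(total_cost(G, C, S_gram, alpha, beta))
    G = G - lr * total_grad(G, C, S_gram, alpha, beta)   # update the pixels
history.append(total_cost(G, C, S_gram, alpha, beta))

print(f"cost before: {history[0]:.4f}, cost after: {history[-1]:.4f}")
```

Unlike ordinary network training, the parameters being updated here are the pixels of the generated image itself, while the feature extractor stays fixed.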
Neural style transfer is a fun way to learn about deep learning and neural networks, and it makes a good change of pace from heavier study while still giving you practice. Fun aside, NST also has real-world applications. For example, it can be used as a data augmentation tool for creating new medical images, especially for positive cases, which are usually scarcer than negative cases. It has also been used to improve 3D Cardiovascular MR Image Segmentation. This is the era of deep learning and AI, and I look forward to seeing what they can contribute to the medical and health industry.