An Example of a Convolutional Neural Network for Image Super-Resolution—Tutorial
The CNN we use in this tutorial is the Fast Super-Resolution Convolutional Neural Network (FSRCNN), based on the work described in [1] and [2], whose authors proposed a new approach to performing single-image SR using CNNs. We describe this network and its predecessor, the Super-Resolution Convolutional Neural Network (SRCNN), in more detail in an associated article ("An Example of a Convolutional Neural Network for Image Super-Resolution").
FSRCNN Structure
As described in the associated article and in [2], the FSRCNN consists of the following operations:
- Feature extraction: Extracts a set of feature maps directly from the low-resolution (LR) image.
- Shrinking: Reduces dimension of feature vectors (thus decreasing the number of parameters) by using a smaller number of filters (compared to the number of filters used for feature extraction).
- Non-linear mapping: Maps feature maps representing LR patches to high-resolution (HR) ones. This step is performed using several mapping layers with a filter size smaller than the one used in the SRCNN.
- Expanding: Increases the dimension of the feature vectors. This operation is the inverse of the shrinking operation and is applied so that the HR image can be produced more accurately.
- Deconvolution: Produces the HR image from HR features.
The structure of the FSRCNN (56, 12, 4) model (the best performing model reported in [2], and described in the associated article) is shown in Figure 1. It has an LR feature dimension of 56 (the number of filters in both the first convolution and the deconvolution layer), 12 shrinking filters (the number of filters in the middle layers of the network, which perform the mapping operation), and a mapping depth of 4 (the number of convolutional layers that implement the mapping between the LR and the HR feature space).
Figure 1: Structure of the FSRCNN (56, 12, 4).
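As a compact summary of Figure 1, the layer sequence of the FSRCNN (56, 12, 4) can be written out as follows. This is a sketch based on the description in [2]; the kernel sizes (5, 1, 3, 1, 9) follow the paper, and the variable names are ours:

# FSRCNN(d=56, s=12, m=4) layer sequence: (name, kernel size, number of filters)
# Sketch based on the description in [2]; the deconvolution uses stride k
# (the scaling factor) so the output is k times larger than the input.
d, s, m = 56, 12, 4
fsrcnn_layers = (
    [('feature_extraction', 5, d)] +
    [('shrinking', 1, s)] +
    [('mapping', 3, s)] * m +
    [('expanding', 1, d)] +
    [('deconvolution', 9, 1)]
)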
Training and Testing Data Preparation
Datasets to train and test this implementation are available from the authors' [2] website. The train dataset consists of 91 images of different sizes. There are two test datasets: Set5 (containing 5 images) and Set14 (containing 14 images). In this tutorial, both train and test datasets will be packed into an HDF5* file (https://support.hdfgroup.org/), which can be used efficiently from the Caffe framework. For more information about Caffe optimized for Intel® architecture, visit Manage Deep Learning Networks with Caffe* Optimized for Intel® Architecture.
Both train and test datasets need some preprocessing, as follows:
- Train dataset: First, the images are converted to the YCrCb color space (https://en.wikipedia.org/wiki/YCbCr), and only the luminance channel Y is used in this tutorial. Each of the 91 images in the train dataset is downsampled by a factor k, where k is the scaling factor desired for super-resolution, obtaining in this way a pair of corresponding LR and HR images. Next, each image pair (LR/HR) is cropped into a subset of small subimages, using stride s, so we end up with N pairs of LR/HR subimages for each one of the 91 original train images. The reason for cropping the images for training is that we want to train the model using both LR and HR local features located in a small area. The number of subimages, N, depends on the size of the subimages and the stride s (a quick way to estimate N is sketched after this list). For their experiments, the authors of [2] define a size of 7x7 pixels for the LR subimages and 21x21 pixels for the HR subimages, which corresponds to a scaling factor k=3.
- Test dataset: Each image in the test dataset is processed in the same way as the training dataset, with the exception that the stride s can be larger than the one used for training, to accelerate the testing procedure.
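As mentioned in the first item above, a quick way to estimate N for a given image size and stride is sketched below (a hypothetical helper, not code from [2]):

# Estimate N, the number of HR/LR subimage pairs cropped from one H x W image
def num_subimages(height, width, size_ground=21, stride=19):
    rows = (height - size_ground) // stride + 1
    cols = (width - size_ground) // stride + 1
    return rows * cols

print(num_subimages(255, 255))  # a 255x255 image yields 13 * 13 = 169 pairs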
The following Python code snippets show one possible way to generate the train and test datasets. We use OpenCV* (http://opencv.org/) to handle and preprocess the images. The first snippet shows how to generate the HR and LR subimage pair set from one of the original images in the 91-image train dataset, for the specific case of scaling factor k=3 and stride s=19:
import sys
import cv2
import numpy as np

sys.path.append('$CAFFE_HOME/opencv-2.4.13/release/lib/')

scale = 3          # super-resolution scaling factor k
stride = 19        # stride used when cropping subimages
size_ground = 21   # HR subimage size (21x21 pixels)
size_input = 7     # LR subimage size (7x7 pixels)
size_pad = 2       # zero padding added around each LR subimage

# Read one training image and keep only the luminance (Y) channel
# (cv2.imread returns BGR, so we convert from BGR to YCrCb)
image = cv2.imread('<PATH TO FILES>/Train/t1.bmp')
image_ycrcb = cv2.cvtColor(image, cv2.COLOR_BGR2YCR_CB)
image_y = image_ycrcb[:, :, 0]

# Resize so the dimensions are exact multiples of the scaling factor,
# then build the HR/LR image pair
height, width = image_y.shape[:2]
height_small = int(height / scale)
width_small = int(width / scale)
image_pair_HR = cv2.resize(image_y, (width_small * scale, height_small * scale))
image_pair_LR = cv2.resize(image_y, (width_small, height_small))

# Preallocate storage for up to 1024 HR/LR subimage pairs
input_HR = np.zeros((size_ground, size_ground, 1, 1024))
input_LR = np.zeros((size_input + 2 * size_pad, size_input + 2 * size_pad, 1, 1024))

count = 0
height, width = image_pair_HR.shape[:2]
for i in range(0, height - size_ground + 1, stride):
    for j in range(0, width - size_ground + 1, stride):
        # Crop an HR subimage, then downsample it to get the matching LR subimage
        subimage_HR = image_pair_HR[i:i + size_ground, j:j + size_ground]
        subimage_LR = cv2.resize(subimage_HR, (size_input, size_input))
        count += 1
        input_HR[:, :, 0, count - 1] = subimage_HR
        input_LR[:, :, 0, count - 1] = np.lib.pad(
            subimage_LR, ((size_pad, size_pad), (size_pad, size_pad)),
            'constant', constant_values=(0.0))
The next snippet shows how to use the Python h5py module to create an HDF5 file that contains the HR and LR subimage pair set created in the previous snippet:
import h5py

# 'Input' holds the LR subimages (network input); 'Ground' holds the HR targets
with h5py.File('train1.h5', 'w') as H5:
    H5.create_dataset('Input', data=input_LR)
    H5.create_dataset('Ground', data=input_HR)
The previous two snippets can be used to create the HDF5 files containing the entire 91-image training set to be used for training in Caffe.
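For completeness, the following sketch shows one way to drive those snippets over all 91 images. It assumes the subimage-generation code above has been wrapped into a hypothetical function make_subimage_pairs(path) that returns the (input_LR, input_HR) arrays, and that Caffe will read the list of HDF5 files from a train.txt file:

import glob
import h5py

def write_training_set(image_glob='<PATH TO FILES>/Train/*.bmp'):
    h5_files = []
    for idx, path in enumerate(sorted(glob.glob(image_glob))):
        # make_subimage_pairs() is the hypothetical wrapper around the
        # subimage-generation snippet shown earlier
        input_LR, input_HR = make_subimage_pairs(path)
        h5_name = 'train%d.h5' % (idx + 1)
        with h5py.File(h5_name, 'w') as H5:
            H5.create_dataset('Input', data=input_LR)
            H5.create_dataset('Ground', data=input_HR)
        h5_files.append(h5_name)
    # Caffe's HDF5 data layer reads the names of the .h5 files from this list
    with open('train.txt', 'w') as f:
        f.write('\n'.join(h5_files) + '\n')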
FSRCNN Training
The reference model (described in the previous section) is implemented using the Intel® Distribution for Caffe, which has been optimized to run on Intel CPUs. An introduction to the basics of this framework and directions to install it can be found at the Intel® AI Academy.
In Caffe, models are defined using protobuf files. The FSRCNN model can be downloaded from the authors' [2] website. The code snippet below shows the data-source definitions from the input layer of the FSRCNN (56, 12, 4) model defined by its authors [2]. The input layer reads the train/test data from the HDF5 files listed in the source files (train.txt and test.txt) located in the $CAFFE_ROOT/examples directory. The batch size for training is 128.
source: "examples/FSRCNN/train.txt" |
include: { phase: TRAIN } |
source: "examples/FSRCNN/test.txt" |
To train the above model, the authors of [2] provide on their website a solver protobuf file containing the training parameters and the location of the protobuf network definition file:
# The train/test net protocol buffer definition
net: "examples/FSRCNN/FSRCNN.prototxt"
test_iter: 100
# Carry out testing every 5000 training iterations.
test_interval: 5000
# The base learning rate, momentum and the weight decay of the network.
#base_lr: 0.005
base_lr: 0.001
momentum: 0.9
weight_decay: 0
# Learning rate policy
lr_policy: "fixed"
# Display results every 1000 iterations
display: 1000
# Maximum number of iterations
max_iter: 1000000
# write intermediate results (snapshots)
snapshot: 5000
snapshot_prefix: "examples/FSRCNN/RESULTS/FSRCNN-56_12_4"
# solver mode: CPU or GPU
solver_mode: CPU
The solver shown above will train the network defined in the model definition file FSRCNN.prototxt using the following parameters:
- The test interval will be every 5000 iterations, and 100 is the number of forward passes each test will perform.
- The base learning rate will be 0.001 (a commented-out alternative of 0.005 is also included), and the learning rate policy is fixed, which means the learning rate will not change over time. Momentum is 0.9 (a common choice) and weight_decay is zero (no regularization to penalize large weights).
- Intermediate results (snapshots) will be written to disk every 5000 iterations, and the maximum number of iterations (when the training will stop) is 1000000.
- Snapshot results will be written to the examples/FSRCNN/RESULTS directory (assuming we run Caffe from the install directory $CAFFE_ROOT). Model files (containing the trained weights) will be prefixed with the string 'FSRCNN-56_12_4'.
The reader is encouraged to experiment with different parameters. One useful exercise is to set a small maximum number of iterations, explore how the test error decreases, and compare this rate across different sets of parameters.
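One lightweight way to make that comparison is to parse the test-loss lines out of the training log (the command shown next redirects Caffe's log to output.log). The sketch below assumes Caffe's usual log format ('Iteration N, Testing net' followed by 'Test net output #0: loss = ...'):

import re

# Extract (iteration, test loss) pairs from a Caffe training log
iter_re = re.compile(r'Iteration (\d+), Testing net')
loss_re = re.compile(r'Test net output #0: loss = ([0-9.eE+-]+)')

history = []
last_iter = None
with open('output.log') as log:
    for line in log:
        m = iter_re.search(line)
        if m:
            last_iter = int(m.group(1))
        m = loss_re.search(line)
        if m and last_iter is not None:
            history.append((last_iter, float(m.group(1))))

for it, loss in history:
    print('%8d  %.6f' % (it, loss))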
Once the network definition and solver files are ready, start training by running the caffe command located in the build/tools directory:
export CAFFE_ROOT=<path to Caffe>
$CAFFE_ROOT/build/tools/caffe train -engine "MKL2017" \
    -solver $CAFFE_ROOT/examples/FSRCNN/FSRCNN_solver.prototxt \
    2> $CAFFE_ROOT/examples/FSRCNN/output.log
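Because the command redirects Caffe's progress log (written to stderr) into output.log, training can be followed from another terminal, for example:

tail -f $CAFFE_ROOT/examples/FSRCNN/output.log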
Resume Training Using Saved Snapshots
After training the CNN, the network parameters (weights) will be written to disk according to the frequency specified by the snapshot parameter. Caffe will create two files at each snapshot:
FSRCNN-56_12_4_iter_1000000.caffemodel
FSRCNN-56_12_4_iter_1000000.solverstate
The model file contains the learned model parameters corresponding to the indicated iteration, serialized as binary protocol buffer files. The solver state file is the state snapshot containing all the necessary information to recover the solver state at the time of the snapshot. This file will let us resume training from the snapshot instead of restarting from scratch. For example, let us assume we ran training for 1 million iterations, and after that we realize that we need to run it for an extra 500K iterations to further reduce the testing error. We can restart the training using the snapshot taken after 1 million iterations:
$CAFFE_ROOT/build/tools/caffe train -engine "MKL2017" \
    -solver $CAFFE_ROOT/examples/FSRCNN/FSRCNN_solver.prototxt \
    -snapshot $CAFFE_ROOT/examples/FSRCNN/RESULTS/FSRCNN-56_12_4_iter_1000000.solverstate \
    2> $CAFFE_ROOT/examples/FSRCNN/output_resume.log
The resumed training will then run until the new maximum number of iterations specified in the solver file is reached; in this case, 1500000, so max_iter must first be updated in the solver file, as shown below.
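For example, to run the extra 500K iterations, the max_iter line in FSRCNN_solver.prototxt would be updated before resuming:

# Maximum number of iterations (raised from 1000000 to train for 500K more)
max_iter: 1500000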
FSRCNN Testing Using Pre-Trained Parameters
Once we have a trained model, we can use it to perform super-resolution on an input LR image. We can test the network at any moment during the training as long as we have model snapshots already generated.
In practice, we can use the super-resolution model we trained to increase the resolution of any image or video. However, for the purposes of this tutorial, we want to test our trained model on an LR image for which we have an HR image to compare against. To this effect, we will use a sample image from the test dataset used in [1] and [2] (from the Set5 dataset, which is also commonly used to test SR models in other publications).
To perform the test, we will use a sample image (butterfly) as the ground truth. To create the input LR image, we will blur and downsample the ground truth image, and will use it to feed the trained network. Once we forward-run the network with the input image, obtaining a super-resolved image as output, we will compare the three images (ground truth, LR, and super-resolved) to visually evaluate the performance of the SR network we trained.
The test procedure described above can be implemented in several ways. As an example, the following Python script implements the testing procedure using the OpenCV library for image handling:
import sys
import cv2
import numpy as np

caffe_root = '$APPS/caffe/'
sys.path.insert(0, caffe_root + 'python')
sys.path.append('opencv-2.4.13/release/lib/')
import caffe

scale = 3  # super-resolution scaling factor

# Load the network definition and a snapshot of the trained weights
# (caffe.TEST puts the network in inference mode)
net = caffe.Net(caffe_root + 'FSRCNN_predict.prototxt',
                caffe_root + 'examples/FSRCNN/RESULTS/FSRCNN-56_12_4_iter_300000.caffemodel',
                caffe.TEST)

# Read the ground-truth image and keep only the luminance (Y) channel
# (cv2.imread returns BGR, so we convert from BGR to YCrCb)
input_dir = caffe_root + 'examples/SRCNN/DATA/Set5/'
im_raw = cv2.imread(input_dir + 'butterfly.bmp')
im_raw = cv2.cvtColor(im_raw, cv2.COLOR_BGR2YCR_CB)[:, :, 0]

# Blur and downsample the ground truth to create the LR input image
im_blur = cv2.blur(im_raw, (4, 4))
im_small = cv2.resize(im_blur, (int(im_raw.shape[1] / scale),
                                int(im_raw.shape[0] / scale)))

# Reshape the LR image to Caffe's NCHW blob layout and run a forward pass
im_input = im_small.reshape((1, 1, im_small.shape[0], im_small.shape[1]))
net.blobs['data'].reshape(*im_input.shape)
net.blobs['data'].data[...] = im_input
net.forward()
mat = net.blobs[net.outputs[0]].data[0]

# Convert the super-resolved output back to an 8-bit image
im_comp = np.clip(mat[0, :, :], 0, 255).astype('uint8')

# Display the ground truth, the LR input, and the super-resolved result
cv2.imshow("ground truth", im_raw)
cv2.imshow("LR input", im_small)
cv2.imshow("super-resolved", im_comp)
cv2.waitKey(0)
Running the above script on the test image displays the output shown in Figure 2. Readers are encouraged to try this network and refine the parameters to obtain better super-resolution results.
Figure 2: Testing the trained FSRCNN. The left image is the ground truth. The image in the center is the ground truth after being blurred and downsampled. The image on the right is the super-resolved image using a model snapshot after 300000 iterations.
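Beyond visual inspection, the standard quantitative metric in the SR literature is the peak signal-to-noise ratio (PSNR) between the ground truth and the super-resolved image. A minimal sketch follows; it assumes both images are 8-bit arrays and crops them to a common size, since the network output can be slightly smaller than the ground truth due to border effects:

import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio, in dB, between two same-sized 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10((max_value ** 2) / mse)

# Example: compare the ground truth (im_raw) with the super-resolved output
# (im_comp), both from the test script above, cropped to a common size
h = min(im_raw.shape[0], im_comp.shape[0])
w = min(im_raw.shape[1], im_comp.shape[1])
print('PSNR: %.2f dB' % psnr(im_raw[:h, :w], im_comp[:h, :w]))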
Summary
In this short tutorial, we have shown how to train and test a CNN for super-resolution. The CNN we described is the Fast Super-Resolution Convolutional Neural Network (FSRCNN) [2], which is described in more detail in an associated article ("An Example of a Convolutional Neural Network for Image Super-Resolution"). This particular CNN was chosen for this tutorial because of its relative simplicity, good performance, and the importance of the authors' work in the area of CNNs for super-resolution. Several new CNN architectures for super-resolution have been described in the literature recently, and several of them compare their performance to the FSRCNN or its predecessor, created by the same authors: the SRCNN [1].
The training and testing in this tutorial were performed on Intel® Xeon® processors, using the Intel Distribution for Caffe deep learning framework and the Intel Distribution for Python, both optimized to run on Intel Xeon processors.
Deep learning-based image/video super-resolution is an exciting development in the field of computer vision. Readers are encouraged to experiment with this network, as well as newer architectures, and test with their own images and videos. To start using Intel’s optimized tools for machine learning and deep learning, visit Intel® Developer Zone (Intel® DZ).