Traffic Sign Recognition. Preparing the Russian Dataset for Learning. Part 2.

30.06.2017

I have previously described my unsuccessful attempts to train a CNN on the Russian traffic sign dataset – here, for example.

To address this, I first sorted all images into per-class directories and found that the image quality is very poor:

1. Many images cannot be recognized even by a human – a computer cannot learn any useful rules from such images either – for example:

WTF?!!!!

 

Of course, we need to delete all such traffic signs from the training classes to prepare a good base for learning…

2. Some images were produced from a single image that was blurred step by step…

3. About 40% of the traffic sign classes contain only a few images (from 4 to 50)…

Before training we need to fix all these problems to get proper results.

OK. What is the best way to do that?

We can use the Keras library to generate new images from the existing ones.

Below is a Python program that generates augmented images in the same directories where the original images are stored.

"""
create new image files for trainings of traffic_ru CNN
program produced new augmented files in each class directory under root directory
until reach volume (files_min_limit variable)
could be executed several times until reached this volume
"""
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import os
import pandas as pd
import numpy as np

if __name__ == '__main__':
    root = '/home/tensorflow/python_prog/traffic_ru/mod_train_cleaned_augmented/'

datagen = ImageDataGenerator(
        rotation_range=17,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.3,
        zoom_range=0.15,
        horizontal_flip=False,
        fill_mode='nearest')

# make the root path absolute
root = os.path.abspath(root)

files_min_limit = 800  # minimum number of files per class directory for training

for path, dirs, files in os.walk(root):
    # step into each directory under the root
    for d in dirs:
        print(d)
        # count the files already in this class directory
        filescount = len([name for name in os.listdir(os.path.join(path, d))
                          if os.path.isfile(os.path.join(path, d, name))])
        print(filescount)
        # if the file limit is already reached, skip to the next directory
        if filescount > files_min_limit:
            continue
        # iterate over each file in the directory
        for name in os.listdir(os.path.join(path, d)):
            if os.path.isfile(os.path.join(path, d, name)):
                if "new_0" in str(name):  # do not augment an already augmented file
                    continue
                print(name)
                img = load_img(os.path.join(path, d, name))  # this is a PIL image
                x = img_to_array(img)  # NumPy array with shape (height, width, 3)
                x = x.reshape((1,) + x.shape)  # NumPy array with shape (1, height, width, 3)
                i = 0
                for batch in datagen.flow(x, batch_size=1, save_to_dir=os.path.join(path, d),
                                          save_prefix='new', save_format='jpg'):
                    i += 1
                    if i >= 5:  # make 5 augmented copies of the file
                        break
            # how many files are there now? if the limit is reached, move on
            if len([name for name in os.listdir(os.path.join(path, d))
                    if os.path.isfile(os.path.join(path, d, name))]) > files_min_limit:
                break

The program generates 5 copies of each existing picture in a directory. How many pictures do we need to prepare for each traffic sign class? It depends on the recognition quality you need to reach. I prepared at least 800 pictures in each directory (following the Jovanny Claudio example, which I used as a template for this project).

After that we need to blur each new picture. Remember – every time this program is executed, it blurs all images that have the «new_0» string in their name… run it only once!

"""
add blur for new augmented images
which was created by image_gen.py
program make blur with all new augmented files in each class directory under root directory
could be executed only one time
"""

import os
import pandas as pd
import numpy as np
import cv2

if __name__ == '__main__':
    root = '/home/tensorflow/python_prog/traffic_ru/mod_train_cleaned_augmented/'

# make the root path absolute
root = os.path.abspath(root)

for path, dirs, files in os.walk(root):
    # step into each directory under the root
    for d in dirs:
        print(d)
        # iterate over each file in the directory
        for name in os.listdir(os.path.join(path, d)):
            if os.path.isfile(os.path.join(path, d, name)):
                if "new_0" not in str(name):  # blur only the augmented files
                    continue
                print(name)
                img = cv2.imread(os.path.join(path, d, name))  # load image
                size = 4
                # generate a horizontal motion blur kernel
                kernel_motion_blur = np.zeros((size, size))
                kernel_motion_blur[int((size - 1) / 2), :] = np.ones(size)
                kernel_motion_blur = kernel_motion_blur / size
                img_blur = cv2.filter2D(img, -1, kernel_motion_blur)
                # overwrite the augmented image with its blurred version
                cv2.imwrite(os.path.join(path, d, name), img_blur)

Examples of generated blurred pictures:

 

 

They look like real pictures… :)

OK. Now we have about 58,000 images stored in directories 0 to 66 (67 classes) for training, about 6,300 images in a separate root directory (Testing) for testing, and a special directory with the same traffic sign class structure for validation. (To build the validation set, copy the training directory and randomly delete about 80-95% of the images in each class subdirectory.) I ended up with about 3,000 images for validation.
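As an illustration, here is a minimal sketch of that random thinning step, assuming the training classes have already been copied into the validation directory (the path and the keep fraction below are my own placeholder values, not part of the original project):

"""
Randomly thin out a copied training directory to build the validation set.
Keeps roughly 10% of the images in each class subdirectory and deletes the rest.
"""
import os
import random

# assumed path: a fresh copy of the training directory
valid_root = '/home/tensorflow/python_prog/traffic_ru/mod_train_cleaned_validation/'
keep_fraction = 0.1  # keep about 10%, i.e. delete roughly 90% of the files

random.seed(42)  # make the thinning reproducible

for class_dir in os.listdir(valid_root):
    full_dir = os.path.join(valid_root, class_dir)
    if not os.path.isdir(full_dir):
        continue
    files = [f for f in os.listdir(full_dir)
             if os.path.isfile(os.path.join(full_dir, f))]
    if not files:
        continue
    keep_count = max(1, int(len(files) * keep_fraction))
    keep = set(random.sample(files, keep_count))
    for f in files:
        if f not in keep:
            os.remove(os.path.join(full_dir, f))
    print(class_dir, len(files), '->', keep_count)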

Next steps:

1. Transform the images to make learning more effective.

2. Prepare «pickled» data from the training, testing and validation pictures to speed up all further calculations.

 """
preparing pickled file from images for fast learning
and also make preprocessing of files to train
"""
from __future__ import division

import os
import cv2
import skimage.data
import skimage.exposure
import numpy as np
import pickle

def load_data(data_dir):
    """
    Loads a data set and returns two lists:

    images: a list of NumPy arrays, each representing an image.
    labels: a list of numbers that represent the images' labels.
    """
    # Get all subdirectories of data_dir. Each one represents a label.
    directories = [d for d in os.listdir(data_dir)
                   if os.path.isdir(os.path.join(data_dir, d))]
    # Loop through the label directories and collect the data in
    # two lists: labels and images.
    labels = []
    images = []
    for d in directories:
        label_dir = os.path.join(data_dir, d)
        file_names = [os.path.join(label_dir, f)
                      for f in os.listdir(label_dir) if (f.endswith(".jpg") or f.endswith(".jpeg"))]
        # For each label, load its images and add them to the images list,
        # and add the label number (i.e. the directory name) to the labels list.
        for f in file_names:
            image = skimage.data.imread(f)
            image = cv2.resize(image, (32, 32))  # resize every image to 32x32
            images.append(image)
            labels.append(int(d))
        print(d)
    images_np = np.array(images)
    return images_np, labels

##########################################################################################################
# preprocessing images
##########################################################################################################
def pre_processing_single_img(img):
    # take the Y (luminance) channel of the image
    img_y = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)[:, :, 0]
    # scale to [0, 1]
    img_y = (img_y / 255.).astype(np.float32)
    # adaptive histogram equalization, centered around zero
    img_y = (skimage.exposure.equalize_adapthist(img_y) - 0.5)
    # add a channel dimension: (32, 32) -> (32, 32, 1)
    img_y = img_y.reshape(img_y.shape + (1,))

    return img_y

def pre_processing(X):
    print(X.shape)
    X_out = np.empty((X.shape[0], X.shape[1], X.shape[2], 1)).astype(np.float32)
    print(X_out.shape)
    for idx, img in enumerate(X):
        X_out[idx] = pre_processing_single_img(img)
        if (idx + 1) % 1000 == 0:
            print(idx + 1)  # progress indicator
    return X_out

###########################################################################################################

# Load training and testing datasets.
ROOT_PATH = "/home/tensorflow/python_prog/traffic_ru"
train_data_dir = os.path.join(ROOT_PATH, "mod_train_cleaned_augmented")
validation_data_dir = os.path.join(ROOT_PATH, "mod_train_cleaned_validation")
test_data_dir = os.path.join(ROOT_PATH, "Testing")
pickle_data_dir = os.path.join(ROOT_PATH, "pickle_data")

print("load training data")
X_train, y_train = load_data(train_data_dir)
print("load validation data")
X_valid, y_valid = load_data(validation_data_dir)
print("load testing_data")
X_test, y_test = load_data(test_data_dir)

print("writing training data to pickled file")
d = {"features":X_train,"labels":y_train  }
dataset_name = os.path.join(pickle_data_dir, "train_ru.p")
with open(dataset_name, 'wb') as handle:
    pickle.dump(d, handle, protocol=pickle.HIGHEST_PROTOCOL)
del d

print("writing validation data to pickled file")
d = {"features":X_valid,"labels":y_valid  }
dataset_name = os.path.join(pickle_data_dir, "valid_ru.p")
with open(dataset_name, 'wb') as handle:
    pickle.dump(d, handle, protocol=pickle.HIGHEST_PROTOCOL)
del d

print("writing testing data to pickled file")
d = {"features":X_test,"labels":y_test  }
dataset_name = os.path.join(pickle_data_dir, "test_ru.p")
with open(dataset_name, 'wb') as handle:
    pickle.dump(d, handle, protocol=pickle.HIGHEST_PROTOCOL)
del d
print("pickled datasets saved")

# Preprocessing images and label dataset
print("starting preprocessing trained dataset")
X_train_p = pre_processing(X_train)
y_train_p = y_train
print("starting preprocessing validation dataset")
X_valid_p = pre_processing(X_valid)
y_valid_p = y_valid
print("starting preprocessing testing dataset")
X_test_p = pre_processing(X_test)
y_test_p = y_test
print("finished preprocessing procedure")
d = {"features":X_train_p.astype('float32'),"labels":y_train_p  }
dataset_name = "./pickle_data/train_ru_p.p"
with open(dataset_name, 'wb') as handle:
    pickle.dump(d, handle, protocol=pickle.HIGHEST_PROTOCOL)
del d
d = {"features":X_valid_p.astype('float32'),"labels":y_valid_p  }
dataset_name = "./pickle_data/valid_ru_p.p"
with open(dataset_name, 'wb') as handle:
    pickle.dump(d, handle, protocol=pickle.HIGHEST_PROTOCOL)
del d
d = {"features":X_test_p.astype('float32'),"labels":y_test_p  }
dataset_name = "./pickle_data/test_ru_p.p"
with open(dataset_name, 'wb') as handle:
    pickle.dump(d, handle, protocol=pickle.HIGHEST_PROTOCOL)
del d
print("preprocessed datasets saved")

After this program finishes we have six pickled files containing all the images: three raw pickled datasets (train, validation and test) and three preprocessed ones.
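Later, during training, these pickled datasets can simply be read back with pickle. A minimal usage sketch (the file names match the ones saved above):

import os
import pickle

pickle_data_dir = "/home/tensorflow/python_prog/traffic_ru/pickle_data"

# load the preprocessed training set saved by the script above
with open(os.path.join(pickle_data_dir, "train_ru_p.p"), 'rb') as handle:
    train = pickle.load(handle)

X_train = train["features"]  # shape: (num_images, 32, 32, 1), float32
y_train = train["labels"]    # list of class indices (0..66)
print(X_train.shape, len(y_train))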

What does a preprocessed image look like?

You can see how, from a bright and a dark source image, we obtain similar, well-normalized images for recognition.
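For reference, a side-by-side comparison like the one above can be produced with matplotlib – just a sketch, assuming the X_train and X_train_p arrays from the script above are still in memory:

import matplotlib.pyplot as plt

idx = 0  # index of the image to inspect

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(X_train[idx].astype('uint8'))          # original 32x32 RGB image
ax1.set_title('original')
ax2.imshow(X_train_p[idx][:, :, 0], cmap='gray')  # equalized Y channel
ax2.set_title('preprocessed')
plt.show()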

