Image Classification using Tensorflow



Classification is a type of problem where we want our model to categorize an input data sample into one of several categories. Image classification is one of the fundamental problems in computer vision, and it serves as a building block for more complicated models. In this article, we build a model that takes an image as input and predicts whether the image is of a cat or a dog. We divide this process into five steps.

Complete code can be found here.

Understand the Data

For this article, we are using the Dog vs Cat Image classification dataset available on Kaggle. You can directly download the zip file using the link or set up the Kaggle API on your system and use the following command to download the zip file.

mkdir -p ~/DogsvsCats && \
cd ~/DogsvsCats && \
mkdir -p data && \
cd data && \
kaggle competitions download -c dogs-vs-cats && \
unzip dogs-vs-cats.zip && \
unzip train.zip && \
unzip test1.zip && \
rm *.zip && \
cd ..

Here we are downloading the zip file, unzipping the training and test datasets, and then removing the zip files.

We can use the following commands to check the project structure.

tree -d

Our project structure should look like this:

└── data
    ├── test1
    └── train

We can check the image files from the train directory.

ls data/train/ | head -20

Split the Data

Let’s create a Python script (named train.py here for illustration) and open it in the Sublime editor.

echo > train.py && subl train.py

After importing the necessary packages and defining some necessary variables, we jump right into the process of splitting our data. Here our goal is to split our total dataset into training and validation sets. Our strategy is to take all the images of the cat class, randomly shuffle them, keep 80% of the images for the training set, and put the remaining 20% in the validation set. We repeat the same process for the dog class. We define the following two functions to perform that task.

def get_per_class_image_list(image_dir, image_list, class_name, split_iden=".", split_iden_index=0, shuffle=True):
	class_name_image_list = [os.path.join(image_dir, image_file) for image_file in image_list if image_file.split(split_iden)[split_iden_index] == class_name]
	if shuffle:
		random.shuffle(class_name_image_list)
	print("For Class {} Found {} Images".format(class_name, len(class_name_image_list)))
	return class_name_image_list

def split_data(image_dir, image_list, class_list, split_index):
	train_data_dict = {"images":[], "labels":[]}
	val_data_dict = {"images":[], "labels":[]}
	for i, class_name in enumerate(class_list):
		class_image_list = get_per_class_image_list(image_dir=image_dir, image_list=image_list, class_name=class_name)
		train_image_list = class_image_list[:int(len(class_image_list)*split_index)]
		train_label_list = [i for k in train_image_list]
		val_image_list = class_image_list[int(len(class_image_list)*split_index):]
		val_label_list = [i for k in val_image_list]
		train_data_dict["images"].extend(train_image_list)
		train_data_dict["labels"].extend(train_label_list)
		val_data_dict["images"].extend(val_image_list)
		val_data_dict["labels"].extend(val_label_list)
	return train_data_dict, val_data_dict

At the end of this step, we have two dictionaries containing complete image file paths under the images key and the associated labels under the labels key.

train_data_dict = {"images" : [], "labels":[]}
val_data_dict = {"images" : [], "labels":[]}
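As a quick sanity check, the 80/20 slicing used in split_data can be verified on dummy data. The filenames below are hypothetical, following the dataset's "class.id.jpg" naming convention:

```python
import random

# Hypothetical filenames following the "<class>.<id>.jpg" naming of the dataset
cat_images = ["cat.{}.jpg".format(i) for i in range(100)]
random.shuffle(cat_images)

split_index = 0.8  # 80% training, 20% validation
train_images = cat_images[:int(len(cat_images) * split_index)]
val_images = cat_images[int(len(cat_images) * split_index):]

print(len(train_images), len(val_images))  # 80 20
```

Because the list is shuffled before slicing, each run assigns a different random 80% of the images to the training set, but no image ever appears in both sets.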

Build Data Pipelines

To build our data pipeline, we use TensorFlow’s tf.data API. As per TensorFlow’s website, the tf.data API enables you to build complex input pipelines from simple, reusable pieces.

Simply put, instead of building individual custom functions to loop over our train_data_dict, shuffle the entries, read, resize, and rescale the image files, and create batches, we use building blocks from tf.data to perform those tasks for us. I will write a separate Medium post to explain the tf.data API in depth and compare it with other options.

def get_img_file(img_path, input_shape):
	image = tf.io.read_file(img_path)
	image = tf.image.decode_jpeg(image, channels=3)
	image = tf.image.resize(image, [input_shape[0], input_shape[1]], antialias=True)
	image = tf.cast(image, tf.float32)/255.0
	return image

def parse_function(ip_dict, input_shape):
	label = ip_dict["labels"]
	image = get_img_file(img_path=ip_dict["images"], input_shape=input_shape)
	return image, label

def get_data_pipeline(data_dict, batch_size, input_shape):
	total_images = len(data_dict["images"])
	with tf.device('/cpu:0'):
		dataset = tf.data.Dataset.from_tensor_slices(data_dict)
		dataset = dataset.shuffle(total_images)
		dataset = ip_dict: parse_function(ip_dict, input_shape))
		dataset = dataset.batch(batch_size)
		dataset = dataset.prefetch(buffer_size=1)
	return dataset
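If the chain of stages feels opaque, here is a minimal pure-Python sketch (not tf.data) of what shuffle, map, and batch do to a list of samples; the "decoded:" string is a stand-in for the image tensor parse_function would produce:

```python
import random

def toy_pipeline(samples, batch_size, seed=42):
    samples = list(samples)
    random.Random(seed).shuffle(samples)          # mirrors dataset.shuffle(...)
    # mirrors parse_function) -> (image, label) pairs
    parsed = [("decoded:" + s, s.split(".")[0]) for s in samples]
    # mirrors dataset.batch(batch_size); the last batch may be smaller
    return [parsed[i:i + batch_size] for i in range(0, len(parsed), batch_size)]

batches = toy_pipeline(["cat.0.jpg", "cat.1.jpg", "dog.0.jpg", "dog.1.jpg", "dog.2.jpg"], batch_size=2)
print([len(b) for b in batches])  # [2, 2, 1]
```

The real pipeline does the same thing lazily on tensors, with prefetch overlapping data preparation and model execution.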

Build the Classification Model

Define Model Architecture

Sometimes in TensorFlow, we refer to this step as graph building. In this case, we are going to create a convolutional neural network based architecture. In particular, we use the layer configuration from the paper Very Deep Convolutional Networks for Large-Scale Image Recognition.

Specifically, we are going to use their 16-weight-layer architecture, also known as the VGG-16 model, and build it with the predefined layers in tf.keras.layers. On a side note, whether you are new to deep learning or an expert working in the field, if you have not read this paper, I highly recommend it. Apart from the main contribution, the paper discusses many small details that can help you build very robust classification models.

def get_model_arch(input_shape, last_layer_nodes=1000, last_layer_activation='sigmoid'):
	input_img = Input(input_shape, name='input')
	x = Convolution2D(64, (3, 3), activation='relu', padding='same', name='fe0_conv1')(input_img)
	x = Convolution2D(64, (3, 3), activation='relu', padding='same', name='fe0_conv2')(x)
	x = MaxPooling2D((2, 2), padding='same', name='fe0_mp')(x)

	x = Convolution2D(128, (3, 3), activation='relu', padding='same', name='fe1_conv1')(x)
	x = Convolution2D(128, (3, 3), activation='relu', padding='same', name='fe1_conv2')(x)
	x = MaxPooling2D((2, 2), padding='same', name='fe1_mp')(x)
	x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='fe2_conv1')(x)
	x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='fe2_conv2')(x)
	x = Convolution2D(256, (3, 3), activation='relu', padding='same', name='fe2_conv3')(x)
	x = MaxPooling2D((2, 2), padding='same', name='fe2_mp')(x)

	x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='fe3_conv1')(x)
	x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='fe3_conv2')(x)
	x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='fe3_conv3')(x)
	x = MaxPooling2D((2, 2), padding='same', name='fe3_mp')(x)

	x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='fe4_conv1')(x)
	x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='fe4_conv2')(x)
	x = Convolution2D(512, (3, 3), activation='relu', padding='same', name='fe4_conv3')(x)
	x = MaxPooling2D((2, 2), padding='same', name='fe4_mp')(x)
	x = Flatten(name='feature')(x)
	x = Dense(4096, activation='relu', name='fc0')(x)
	x = Dense(4096, activation='relu', name='fc1')(x)
	logits = Dense(last_layer_nodes, name='logits')(x)
	probabilities = Activation(last_layer_activation)(logits)
	model_arch = Model(inputs=input_img, outputs=probabilities)
	return model_arch

The only change I would recommend making in the above code is to reduce the number of nodes in the two dense layers.

x = Dense(100, activation='relu', name='fc0')(x) 
x = Dense(100, activation='relu', name='fc1')(x)
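To see why this matters, assume a 224×224 input (the size used in the VGG paper): after five 2×2 poolings the feature map is 7×7×512, so Flatten emits 25088 values, and the first dense layer dominates the parameter count:

```python
flat = 7 * 7 * 512  # Flatten output for a 224x224 input after five 2x2 poolings

params_4096 = flat * 4096 + 4096  # weights + biases for Dense(4096)
params_100 = flat * 100 + 100     # weights + biases for Dense(100)

print(params_4096)  # 102764544
print(params_100)   # 2508900
```

Shrinking the dense layers to 100 nodes cuts over 100 million parameters from that single layer, which makes training on a modest dataset like this one much less prone to memorization.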

Loss Function

As our problem is formulated as a Binary Classification problem, we are going to use Binary Cross-Entropy loss.

loss = tf.keras.losses.BinaryCrossentropy()
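For intuition, binary cross-entropy averages -[y*log(p) + (1-y)*log(1-p)] over the samples, where y is the true label and p the predicted probability. A minimal pure-Python sketch with made-up labels and predictions:

```python
import math

def binary_cross_entropy(y_true, y_pred):
    # -[y*log(p) + (1-y)*log(1-p)], averaged over all samples
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p)) for y, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

# Mostly correct, fairly confident predictions give a small loss;
# a confident wrong prediction would blow the loss up.
loss_value = binary_cross_entropy([1, 0], [0.9, 0.2])
print(round(loss_value, 4))  # 0.1643
```

The Keras loss computes the same quantity on tensors (with numerical clipping to avoid log(0)).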


We use the Adam optimizer and set the learning rate to 0.00001.

optimizer = tf.keras.optimizers.Adam(lr = LEARNING_RATE)

Performance Metrics

Apart from the loss function, a performance metric is a function that measures how well we are doing at the classification task. One of the most common performance metrics is accuracy. In our case, we use binary accuracy as the performance metric.

metric = tf.keras.metrics.BinaryAccuracy(name="baccuracy")
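Binary accuracy simply thresholds the sigmoid output at 0.5 (the Keras default) and counts how often the thresholded prediction matches the label. A small pure-Python sketch with made-up values:

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Count predictions that land on the correct side of the threshold
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)

# 0.9 -> 1 (hit), 0.3 -> 0 (hit), 0.4 -> 0 (miss, label is 1), 0.2 -> 0 (hit)
print(binary_accuracy([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.2]))  # 0.75
```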

Compile Model

The last step in the model-building process is to compile the model. The fundamental idea is to check the compatibility and build a package of our model architecture, loss function, optimizer, and performance metric.

model = get_model_arch(input_shape=INPUT_SHAPE)
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

We can check the summary of our model using print(model.summary()).

Train Model

This part is where we actually launch the training and observe the performance of our model on the validation set. There are different options to train a model: for example, we can build a custom training function, or we can use methods like fit or fit_generator associated with the Model class. In this article, we use the fit method.

history =, epochs=EPOCHS, validation_data=val_data_pipeline, shuffle=True, verbose=1)

We can save our model to a file using the following command.


Bonus Step

If you are interested in plotting some fancy graphs to visualize the performance of your model during the training process, you can use the following function to plot the curves.

def plot_metric_curve(history, metric, title):
	plt.plot(history.history[metric])
	plt.plot(history.history["val_" + metric])
	plt.title(title)
	plt.legend(['train', 'val'], loc='upper left')

plot_metric_curve(history, metric="loss", title="Loss Comparison")
plot_metric_curve(history, metric="baccuracy", title="Binary Accuracy Comparison")


Building and training a custom image classifier on your own dataset is a simple, straightforward, and fun process. After roughly 15 epochs of training, our model starts overfitting, which is fine because we did not fine-tune the different parameters of our model to work best for this particular dataset. In the next article, we will discuss different tricks and techniques to make our model better.

Viral Thakar
Machine Learning Engineer

My research interests include machine learning, computer vision and social innovations.