ASL Hand-Shape Classification with Convolutional Neural Networks
My Role
- Sole Developer / Project Manager
Objective
This graduate capstone for the MSDA (Master of Science, Data Analytics) program at Western Governors University (WGU) integrates skills and knowledge gained across the program's domains into a single project.
Introduction
This project comprises six parts:
- Project Proposal
- Data Analysis
- Visual Aids & Diagrams for Reports and Presentations
- Analysis Report
- Presentation of Results to a Non-technical Audience (Slides)
- Executive Summary for a Technical Audience
Summary
Integrated skills and knowledge from graduate program domains into a single project:
- Created a classification model based on the 'AlexNet' convolutional neural network architecture
- Assessed 21B+ data points, achieving 93.8% accuracy on the held-out test set (see Evaluating the Model)
- Performed the analysis in R and TensorFlow, using advanced memory management and multithreading
- The project acts as a precursor to natural language processing (NLP) for ASL
Project Proposal
Project Topic: Image Classification using Convolutional Neural Networks
Research Question: To what extent can hand shape be accurately classified from images?
Hypothesis: Images can be classified with 90% accuracy using Convolutional Neural Networks.
Data Analytics Tools and Techniques: The images will be classified with a convolutional neural network (CNN) built with Keras/TensorFlow.
Justification of Tools/Techniques: CNNs are an industry standard for image classification because they perform "phenomenally well on computer vision tasks" (Rizvi, 2022). Instead of requiring the analyst to identify the key features of each image class, a CNN breaks image data down into a three-dimensional matrix of color components per pixel and "'learns' how to extract these features, and ultimately infer what object they constitute" (Google Developers, 2022). Because the network learns from the raw data rather than from features chosen by the analyst, it can determine which features matter most and even identify features a human analyst could not.
Application type: Stand-alone. The analysis can be downloaded and run locally in RStudio or viewed on the web as an HTML R Notebook.
Programming/development language and Tools:
- R version 4.2.1
- Python v3.8.13 (through R's reticulate library)
- TensorFlow v2.9.2
- Azure Virtual Machine
- RStudio Desktop
- Anaconda/Miniconda
R Library | Used for |
---|---|
caret | Classification and Regression Training |
jsonlite | A Simple and Robust JSON Parser and Generator for R |
keras | R Interface to 'Keras' |
tidyverse | R Packages for Data Science |
imager | Image data manipulation |
magick | Image data manipulation |
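A minimal environment setup sketch. The conda environment name "r-tensorflow" is an assumption; substitute whatever environment holds Python 3.8 and TensorFlow 2.9:
# Load the libraries used throughout the analysis
library(tidyverse)   # data wrangling and visualization
library(keras)       # R interface to Keras/TensorFlow
library(caret)       # confusion matrices, kappa, and related metrics
library(jsonlite)    # reading/writing JSON artifacts
library(imager)      # image inspection and manipulation
library(magick)      # image inspection and manipulation

# Point reticulate at the Python environment containing TensorFlow
reticulate::use_condaenv("r-tensorflow", required = TRUE)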
The Data Set
The data for this analysis consists of 27,000 images of hands displaying ASL alphabet shapes. This is a synthesized image set created by Lexset and released under a CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International) license to promote "Seahaven", the synthetic data generation platform used to create the set.
(image width) × (image height) × (RGB channels) × (number of images) = 512 × 512 × 3 × 27,000 ≈ 21.2 billion data points
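A quick sanity check of that arithmetic in R:
# Total data points: width x height x channels x image count
format(512 * 512 * 3 * 27000, big.mark = ",", scientific = FALSE)
# "21,233,664,000"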
Data set | Percent Distribution | Number of Images |
---|---|---|
train | 72% | 19440 |
validation | 18% | 4860 |
test | 10% | 2700 |
total | 100% | 27000 |
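Below is a sketch of how the three splits can be streamed from disk with Keras image generators. The directory paths are placeholders, the 227 × 227 target size follows from the model's input layer, and rescaling pixel values to [0, 1] is an assumed preprocessing choice. The img_size and n_classes values defined here are the globals referenced by the model-building code later on.
# Stream batches from disk so the full image set never sits in memory at once
img_size  <- 227
n_classes <- 27
datagen <- image_data_generator(rescale = 1/255)
train_images <- flow_images_from_directory(
  "data/train", generator = datagen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128)
validation_images <- flow_images_from_directory(
  "data/validation", generator = datagen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128)
test_images <- flow_images_from_directory(
  "data/test", generator = datagen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128,
  shuffle = FALSE)  # preserve order so predictions align with labels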
Convolutional Neural Network Model
Layer | Type | Kernel (k) | Stride | Padding | Output shape | Activation |
---|---|---|---|---|---|---|
0 | Input | — | — | — | 227, 227, 3 | — |
1 | Convolutional | 11 | 4 | 'VALID' | 55, 55, 96 | 'relu' |
2 | MaxPooling | 3 | 2 | 'VALID' | 27, 27, 96 | — |
3 | Convolutional | 5 | 1 | 'SAME' | 27, 27, 256 | 'relu' |
4 | MaxPooling | 3 | 2 | 'VALID' | 13, 13, 256 | — |
5 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 384 | 'relu' |
6 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 384 | 'relu' |
7 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 256 | 'relu' |
8 | MaxPooling | 3 | 2 | 'VALID' | 6, 6, 256 | — |
9 | Flatten | — | — | — | 9216 | — |
10 | Dropout | — | — | — | 9216 | — |
11 | Dense | — | — | — | 4096 | 'relu' |
12 | Dense | — | — | — | 4096 | 'relu' |
13 | Output (Dense) | — | — | — | 27 | 'softmax' |
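Each output shape in the table follows from the standard convolution/pooling size formula: with 'VALID' padding, out = floor((in − k) / stride) + 1, while 'SAME' padding at stride 1 preserves spatial size. For example, layer 1 gives floor((227 − 11) / 4) + 1 = 55, the first max-pooling layer gives floor((55 − 3) / 2) + 1 = 27, and flattening the final 6 × 6 × 256 volume produces the 9216 units in layer 9.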
Hyperparameters
Parameter | Chosen Value |
---|---|
activation functions | 'relu', 'softmax' |
loss function | 'categorical_crossentropy' |
optimizer | adam, learning_rate = 0.00001 |
dropout_rate | 0.00000001 (effectively no dropout) |
initializers | kernel & bias = 'random_uniform' |
epochs | max = 25 |
stopping criteria | monitor = val_accuracy, patience = 2 |
evaluation metric | accuracy, kappa |
# Custom Layer functions
# Convolution block: 2D convolution with ReLU activation
con_layer <- function(x, filters, ksize, strides = 1, padding = 'SAME'){
  layer_conv_2d(x, filters = filters,
                kernel_size = c(ksize, ksize),
                padding = padding,
                activation = "relu",
                strides = c(strides, strides),
                use_bias = TRUE,
                kernel_initializer = 'random_uniform',
                bias_initializer = 'random_uniform')}

# Overlapping max pooling: 3x3 window with stride 2, as in AlexNet
pool_layer <- function(x){
  layer_max_pooling_2d(x, pool_size = c(3, 3), strides = 2, padding = 'VALID')}

# Fully connected block: dense layer with configurable activation
fully_connected_layer <- function(x, units, activation = "relu", name = NULL){
  layer_dense(x, units = units,
              activation = activation,
              name = name,
              kernel_initializer = 'random_uniform',
              bias_initializer = 'random_uniform')}
# Create Model function
model_a <- function(learning_rate = 0.0001, dropout_rate = 0.00000001){
  k_clear_session()  # reset the TensorFlow session between runs
  model <- keras_model_sequential(input_shape = c(img_size, img_size, 3)) %>%
    # Feature extraction: five convolutional layers with three max-pooling layers
    con_layer(filters = 96, ksize = 11, strides = 4, padding = 'VALID') %>%
    pool_layer() %>%
    con_layer(filters = 256, ksize = 5) %>%
    pool_layer() %>%
    con_layer(filters = 384, ksize = 3) %>%
    con_layer(filters = 384, ksize = 3) %>%
    con_layer(filters = 256, ksize = 3) %>%
    pool_layer() %>%
    # Classifier head: flatten, dropout, two dense layers, 27-way softmax output
    layer_flatten() %>%
    layer_dropout(dropout_rate) %>%
    fully_connected_layer(units = 4096) %>%
    fully_connected_layer(units = 4096) %>%
    layer_dense(name = "Output", units = n_classes, activation = "softmax")
  model %>% compile(loss = "categorical_crossentropy",
                    optimizer = optimizer_adam(learning_rate = learning_rate),
                    metrics = "accuracy")
  return(model)
}
my_model <- model_a(learning_rate=0.00001)
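Printing the compiled model is a quick way to confirm the layer shapes and parameter counts against the architecture table above:
# Inspect layer output shapes and parameter counts
summary(my_model)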
Training the Model
# Train Model
batch_size <- 128
epochs <- 25
hist <- my_model %>% fit(
  train_images,                       # the generator yields images and labels together
  epochs = epochs,
  validation_data = validation_images,
  verbose = 2,
  # Stop once validation accuracy stops improving; keep the best weights seen
  callbacks = list(callback_early_stopping(monitor = "val_accuracy",
                                           min_delta = 0.001,
                                           patience = 2,
                                           restore_best_weights = TRUE))
)
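The returned history object records per-epoch loss and accuracy for both the training and validation sets, and can be plotted directly to check for over- or under-fitting:
# Plot training vs. validation loss and accuracy by epoch
plot(hist)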
Evaluating the Model
Evaluating the trained model on the 2,700-image test set produced the following results:
Evaluation Metric | Value |
---|---|
precision | 0.939569 |
recall | 0.937778 |
f1 | 0.937799 |
accuracy | 0.937778 |
rand_accuracy (chance baseline, 1/27) | 0.037037 |
kappa | 0.935385 |
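Below is a sketch of how these metrics can be derived with caret, assuming test_images is a generator created with shuffle = FALSE so that predictions line up with the stored labels:
# Predict class probabilities on the test set and take the most likely class
probs <- predict(my_model, test_images)
pred  <- factor(apply(probs, 1, which.max) - 1, levels = 0:(n_classes - 1))
truth <- factor(test_images$classes, levels = 0:(n_classes - 1))

# The confusion matrix yields accuracy and kappa directly
cm <- caret::confusionMatrix(pred, truth)
cm$overall[c("Accuracy", "Kappa")]

# Macro-averaged precision, recall, and F1 across the 27 classes
colMeans(cm$byClass[, c("Precision", "Recall", "F1")], na.rm = TRUE)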
Reporting & Presentations
Next Steps
Continued Training with More Real-World Data
- Expand Training & Testing Data: Enhance the set with real-world images, and generate additional "less-than-ideal" synthetic images for greater diversity.
Hybridize with Other Models
- Position Tracking: Add an additional network (potentially an RNN) to locate the position of the hands in the images.
- Analysis over Time: Implement a time-series component to track the location and shape of the hand over time.