ASL Hand-Shape Classification with Convolutional Neural Networks

Oct 18, 2022
Neural Networks

My Role

  • Sole Developer
  • Project Manager

Objective

This graduate capstone integrates skills and knowledge gained from graduate program domains into a single project.

MSDA Graduate Capstone, WGU

Introduction

This project comes in six parts:

  • Project Proposal
  • The Data Set
  • Convolutional Neural Network Model
  • Training the Model
  • Evaluating the Model
  • Reporting & Presentations

Summary

Integrated skills and knowledge from graduate program domains into a single project:

  • Created a classification model based on the 'AlexNet' convolutional neural network architecture
  • Assessed 21B+ data points, achieving ~94% accuracy on the held-out test set
  • Performed the analysis in R and TensorFlow, using advanced memory management and multithreading
  • The project acts as a precursor to natural language processing (NLP) for ASL

Project Proposal

Project Topic: Image Classification using Convolutional Neural Networks

Research Question: To what extent can hand shape be accurately classified from images?

Hypothesis: Hand-shape images can be classified with at least 90% accuracy using Convolutional Neural Networks.

Data Analytics Tools and Techniques: The data analysis technique that will be used to classify the images is a Convolutional Neural Network using Keras/Tensorflow.

Justification of Tools/Techniques: CNNs are an industry standard in image classification analyses because they perform "phenomenally well on computer vision tasks" (Rizvi, 2022). Instead of the analyst identifying the key features of the classes of images, a CNN breaks down image data into a three-dimensional matrix representing color components per pixel and "'learns' how to extract these features, and ultimately infer what object they constitute" (Google Developers, 2022). Because the network learns from the raw data rather than from features interpreted by the analyst, it can determine which features are most important and even identify features a human analyst could not.
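As a concrete illustration of that three-dimensional structure (a minimal sketch, not from the original notebook; the file path is hypothetical), the keras R interface's image helpers show the height × width × channel layout:

library(keras)

# Load one image and convert it to the 3-D array a CNN consumes
img <- image_load("data/train/A/example.png", target_size = c(227, 227))
x <- image_to_array(img)
dim(x)
#> [1] 227 227   3    # height x width x RGB color channels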

Application type: Stand-alone. The analysis can be downloaded and run on the desktop in RStudio, or viewed on the web as an HTML R Notebook.

Programming/development language and Tools:

  • R version 4.2.1
  • Python v3.8.13 (through R's Reticulate library)
  • Tensorflow v2.9.2
  • Azure Virtual Machine
  • RStudio Desktop
  • Anaconda/Miniconda
R Library | Used for
--- | ---
caret | Classification and Regression Training
jsonlite | A Simple and Robust JSON Parser and Generator for R
keras | R Interface to 'Keras'
tidyverse | R Packages for Data Science
imager | Image data manipulation
magick | Image data manipulation
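For reference, here is a minimal sketch of setting up this stack from R (the exact install steps are an assumption, not taken from the notebook):

# Install the R packages, then let keras create a conda/miniconda
# environment containing Python and TensorFlow via reticulate
install.packages(c("keras", "tidyverse", "caret", "jsonlite", "imager", "magick"))
keras::install_keras(version = "2.9")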

The Data Set

[Image: sample images from the data set]

The data for this analysis consists of 27,000 images of hands displaying ASL alphabet shapes. It is a synthesized image set created by Lexset with its synthetic data generation platform "Seahaven" and released under a CC BY-NC (Creative Commons Attribution-NonCommercial 4.0 International) license to promote that product.

(image width) × (image height) × (RGB channels) × (number of images)
= 512 × 512 × 3 × 27,000 ≈ 21.2 billion data points
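The same arithmetic, checked in R:

512 * 512 * 3 * 27000
#> [1] 21233664000    # roughly 21.2 billion data points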

[Image: example image from the data set]
Data set | Percent Distribution | Number of Images
--- | --- | ---
train | 72% | 19,440
validation | 18% | 4,860
test | 10% | 2,700
total | 100% | 27,000
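The notebook's data-loading code is not reproduced on this page. Below is a minimal sketch of how such a split can be fed to Keras from R, assuming the images sit in train/validation/test subdirectories (the "data/..." paths and generator settings are my assumptions); it also defines the img_size and n_classes values referenced by the model code below:

library(keras)

img_size <- 227    # images are resized from 512x512 to the 227x227 AlexNet input
n_classes <- 27    # one class per hand shape in the data set

gen <- image_data_generator(rescale = 1/255)   # scale pixel values to [0, 1]

train_images <- flow_images_from_directory(
  "data/train", generator = gen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128)

validation_images <- flow_images_from_directory(
  "data/validation", generator = gen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128)

test_images <- flow_images_from_directory(
  "data/test", generator = gen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128,
  shuffle = FALSE)   # preserve file order so predictions align with labels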

Convolutional Neural Network Model

Layer | Type | k | Stride | Padding | Output Shape | Activation
--- | --- | --- | --- | --- | --- | ---
0 | Input | | | | 227, 227, 3 |
1 | Convolutional | 11 | 4 | 'VALID' | 55, 55, 96 | 'relu'
2 | MaxPooling | 3 | 2 | 'VALID' | 27, 27, 96 |
3 | Convolutional | 5 | 1 | 'SAME' | 27, 27, 256 | 'relu'
4 | MaxPooling | 3 | 2 | 'VALID' | 13, 13, 256 |
5 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 384 | 'relu'
6 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 384 | 'relu'
7 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 256 | 'relu'
8 | MaxPooling | 3 | 2 | 'VALID' | 6, 6, 256 |
9 | Flatten | | | | 9216 |
10 | Dropout | | | | 9216 |
11 | Dense | | | | 4096 | 'relu'
12 | Dense | | | | 4096 | 'relu'
13 | Output (Dense) | | | | 27 | 'softmax'

(k is the convolution kernel or pooling window size.)
[Image: full diagram of the model]

Hyperparameters

Parameter | Chosen Value
--- | ---
activation functions | 'relu', 'softmax'
loss function | 'categorical_crossentropy'
optimizer | adam, learning_rate = 0.00001
dropout_rate | 0.00000001
initializers | kernel & bias = 'random_uniform'
epochs | max = 25
stopping criteria | monitor = val_accuracy, patience = 2
evaluation metrics | accuracy, kappa

# Custom Layer functions

# Convolutional layer with 'relu' activation and random-uniform initializers
con_layer <- function(x, filters, ksize, strides = 1, padding = 'SAME') {
  layer_conv_2d(x, filters = filters,
                kernel_size = c(ksize, ksize),
                padding = padding,
                activation = "relu",
                strides = c(strides, strides),
                use_bias = TRUE,
                kernel_initializer = 'random_uniform',
                bias_initializer = 'random_uniform')
}

# 3x3 max-pooling layer with stride 2
pool_layer <- function(x) {
  layer_max_pooling_2d(x, pool_size = c(3, 3), strides = 2, padding = 'VALID')
}

# Fully connected (dense) layer with random-uniform initializers
fully_connected_layer <- function(x, units, activation = "relu", name = NULL) {
  layer_dense(x, units = units,
              activation = activation,
              name = name,
              kernel_initializer = 'random_uniform',
              bias_initializer = 'random_uniform')
}


# Create Model function
model_a <- function(learning_rate = 0.0001, dropout_rate = 0.00000001) {
  k_clear_session()   # reset the TensorFlow session between runs

  model <- keras_model_sequential(input_shape = c(img_size, img_size, 3)) %>%
    con_layer(filters = 96, ksize = 11, strides = 4, padding = 'VALID') %>%
    pool_layer() %>%
    con_layer(filters = 256, ksize = 5) %>%
    pool_layer() %>%
    con_layer(filters = 384, ksize = 3) %>%
    con_layer(filters = 384, ksize = 3) %>%
    con_layer(filters = 256, ksize = 3) %>%
    pool_layer() %>%
    layer_flatten() %>%
    layer_dropout(dropout_rate) %>%
    fully_connected_layer(units = 4096) %>%
    fully_connected_layer(units = 4096) %>%
    layer_dense(name = "Output", units = n_classes, activation = "softmax")

  model %>% compile(loss = "categorical_crossentropy",
                    optimizer = optimizer_adam(learning_rate = learning_rate),
                    metrics = "accuracy")

  return(model)
}

my_model <- model_a(learning_rate = 0.00001)
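A quick sanity check that the constructed layers match the table above:

summary(my_model)   # prints each layer's output shape and parameter count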

Training the Model

# Train Model
batch_size <- 128   # set on the image generators; labels also come from the generators
epochs <- 25

hist <- my_model %>% fit(
  train_images,
  epochs = epochs,
  validation_data = validation_images,
  verbose = 2,
  callbacks = list(callback_early_stopping(monitor = "val_accuracy",
                                           min_delta = 0.001,
                                           patience = 2,
                                           restore_best_weights = TRUE))
)
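The training graph below can be reproduced directly from the returned history object:

plot(hist)   # accuracy and loss per epoch, for training and validation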

[Image: training graph of accuracy and loss per epoch]

Evaluating the Model

Evaluating the model on the held-out test set, we achieved the following results:

Evaluation Metric | Value
--- | ---
precision | 0.939569
recall | 0.937778
f1 | 0.937799
accuracy | 0.937778
rand_accuracy (chance level, 1/27) | 0.037037
kappa | 0.935385
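These metrics can be computed with the caret package listed earlier; a minimal sketch, assuming test_images is the unshuffled directory iterator from the loading sketch above:

library(caret)

probs <- predict(my_model, test_images)   # one row of class probabilities per image
pred <- factor(max.col(probs) - 1, levels = 0:(n_classes - 1))    # predicted class index
truth <- factor(test_images$classes, levels = 0:(n_classes - 1))  # true class index

confusionMatrix(pred, truth)   # reports accuracy and kappa, plus per-class
                               # statistics for deriving precision, recall, and F1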
View R Notebook Output

Reporting & Presentations

[Image: presentation slides cover]

  • Analysis Report
  • Presentation Slides
  • Presentation Video

Next Steps

Continued Training with More Real-World Data

  • Expand Training & Testing Data: Enhance the training and testing sets with real-world data, and augment with additional "less-than-ideal" synthetic data to increase diversity.

Hybridize with Other Models

  • Position Tracking: Add an additional network (potentially an RNN) to locate the position of the hands in the images.
  • Analysis over Time: Implement a time-series component to track the location and shape of the hand over time.