ASL Hand-Shape Classification with Convolutional Neural Networks
My Role
- Sole Developer / Project Manager
Objective
This graduate capstone for the MSDA (Master of Science, Data Analytics) program at Western Governors University (WGU) integrates skills and knowledge gained across the program's domains into a single project.
Introduction
This project comprises six parts:
- Project Proposal
- Data Analysis
- Visual Aids & Diagrams for Reports and Presentations
- Analysis Report
- Presentation of Results to a Non-technical Audience (Slides)
- Executive Summary for a Technical Audience
Summary
Integrated skills and knowledge from graduate program domains into a single project:
- Created a classification model based on the 'AlexNet' convolutional neural network architecture
- Assessed 21B+ data points, achieving 93.8% accuracy on the held-out test set (see Evaluating the Model)
- Performed the analysis in R and TensorFlow, using advanced memory management and multithreading
- The project acts as a precursor to natural language processing (NLP) for ASL
Project Proposal
Project Topic: Image Classification using Convolutional Neural Networks
Research Question: To what extent can hand shape be accurately classified from images?
Hypothesis: Images can be classified with 90% accuracy using Convolutional Neural Networks.
Data Analytics Tools and Techniques: The images will be classified with a convolutional neural network (CNN) built with Keras/TensorFlow.
Justification of Tools/Techniques: CNNs are an industry standard for image classification because they perform "phenomenally well on computer vision tasks" (Rizvi, 2022). Instead of requiring the analyst to identify the key features of each image class, a CNN breaks image data down into a three-dimensional matrix of color components per pixel and "'learns' how to extract these features, and ultimately infer what object they constitute" (Google Developers, 2022). Because the network learns from the raw data rather than from features chosen by the analyst, it can determine which features matter most and even identify features a human analyst could not.
Application type: Stand-alone. The analysis can be downloaded and run locally in RStudio or viewed on the web as an HTML R Notebook.
Programming/development language and Tools:
- R version 4.2.1
- Python v3.8.13 (through R's reticulate library)
- TensorFlow v2.9.2
- Azure Virtual Machine
- RStudio Desktop
- Anaconda/Miniconda
R Library | Used for |
---|---|
caret | Classification and Regression Training |
jsonlite | A Simple and Robust JSON Parser and Generator for R |
keras | R Interface to 'Keras' |
tidyverse | R Packages for Data Science |
imager | Image data manipulation |
magick | Image data manipulation |
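A minimal environment setup sketch. The conda environment name "r-tensorflow" is an assumption; substitute whatever environment holds Python 3.8 and TensorFlow 2.9:
# Load the libraries used throughout the analysis
library(tidyverse)   # data wrangling and visualization
library(keras)       # R interface to Keras/TensorFlow
library(caret)       # confusion matrices, kappa, and related metrics
library(jsonlite)    # reading/writing JSON artifacts
library(imager)      # image inspection and manipulation
library(magick)      # image inspection and manipulation

# Point reticulate at the Python environment containing TensorFlow
reticulate::use_condaenv("r-tensorflow", required = TRUE)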
The Data Set
The data for this analysis consists of 27,000 images of hands displaying ASL alphabet shapes. This is a synthesized image set created by Lexset and released under a CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International) license to promote "Seahaven", the synthetic data generation platform used to create the set.
(image width) × (image height) × (RGB channels) × (number of images) = 512 × 512 × 3 × 27,000 ≈ 21.2 billion data points
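A quick sanity check of that arithmetic in R:
# Total data points: width x height x channels x image count
format(512 * 512 * 3 * 27000, big.mark = ",", scientific = FALSE)
# "21,233,664,000"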
Data set | Percent Distribution | Number of Images |
---|---|---|
train | 72% | 19440 |
validation | 18% | 4860 |
test | 10% | 2700 |
total | 100% | 27000 |
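Below is a sketch of how the three splits can be streamed from disk with Keras image generators. The directory paths are placeholders, the 227 × 227 target size follows from the model's input layer, and rescaling pixel values to [0, 1] is an assumed preprocessing choice. The img_size and n_classes values defined here are the globals referenced by the model-building code later on.
# Stream batches from disk so the full image set never sits in memory at once
img_size  <- 227
n_classes <- 27
datagen <- image_data_generator(rescale = 1/255)
train_images <- flow_images_from_directory(
  "data/train", generator = datagen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128)
validation_images <- flow_images_from_directory(
  "data/validation", generator = datagen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128)
test_images <- flow_images_from_directory(
  "data/test", generator = datagen,
  target_size = c(img_size, img_size),
  class_mode = "categorical", batch_size = 128,
  shuffle = FALSE)  # preserve order so predictions align with labels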
Convolutional Neural Network Model
Layer | Type | Kernel (k) | Stride | Padding | Output shape | Activation |
---|---|---|---|---|---|---|
0 | Input | — | — | — | 227, 227, 3 | — |
1 | Convolutional | 11 | 4 | 'VALID' | 55, 55, 96 | 'relu' |
2 | MaxPooling | 3 | 2 | 'VALID' | 27, 27, 96 | — |
3 | Convolutional | 5 | 1 | 'SAME' | 27, 27, 256 | 'relu' |
4 | MaxPooling | 3 | 2 | 'VALID' | 13, 13, 256 | — |
5 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 384 | 'relu' |
6 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 384 | 'relu' |
7 | Convolutional | 3 | 1 | 'SAME' | 13, 13, 256 | 'relu' |
8 | MaxPooling | 3 | 2 | 'VALID' | 6, 6, 256 | — |
9 | Flatten | — | — | — | 9216 | — |
10 | Dropout | — | — | — | 9216 | — |
11 | Dense | — | — | — | 4096 | 'relu' |
12 | Dense | — | — | — | 4096 | 'relu' |
13 | Output (Dense) | — | — | — | 27 | 'softmax' |
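Each output shape in the table follows from the standard convolution/pooling size formula: with 'VALID' padding, out = floor((in − k) / stride) + 1, while 'SAME' padding at stride 1 preserves spatial size. For example, layer 1 gives floor((227 − 11) / 4) + 1 = 55, the first max-pooling layer gives floor((55 − 3) / 2) + 1 = 27, and flattening the final 6 × 6 × 256 volume produces the 9216 units in layer 9.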
Hyperparameters
Parameter | Chosen Value |
---|---|
activation functions | 'relu', 'softmax' |
loss function | 'categorical_crossentropy' |
optimizer | adam, learning_rate = 0.00001 |
dropout_rate | 0.00000001 (effectively no dropout) |
initializers | kernel & bias = 'random_uniform' |
epochs | max = 25 |
stopping criteria | monitor = val_accuracy, patience = 2 |
evaluation metric | accuracy, kappa |
# Custom Layer functions
# Convolution block: 2D convolution with ReLU activation
con_layer <- function(x, filters, ksize, strides = 1, padding = 'SAME'){
  layer_conv_2d(x, filters = filters,
                kernel_size = c(ksize, ksize),
                padding = padding,
                activation = "relu",
                strides = c(strides, strides),
                use_bias = TRUE,
                kernel_initializer = 'random_uniform',
                bias_initializer = 'random_uniform')}

# Overlapping max pooling: 3x3 window with stride 2, as in AlexNet
pool_layer <- function(x){
  layer_max_pooling_2d(x, pool_size = c(3, 3), strides = 2, padding = 'VALID')}

# Fully connected block: dense layer with configurable activation
fully_connected_layer <- function(x, units, activation = "relu", name = NULL){
  layer_dense(x, units = units,
              activation = activation,
              name = name,
              kernel_initializer = 'random_uniform',
              bias_initializer = 'random_uniform')}
# Create Model function
model_a <- function(learning_rate = 0.0001, dropout_rate = 0.00000001){
  k_clear_session()  # reset the TensorFlow session between runs
  model <- keras_model_sequential(input_shape = c(img_size, img_size, 3)) %>%
    # Feature extraction: five convolutional layers with three max-pooling layers
    con_layer(filters = 96, ksize = 11, strides = 4, padding = 'VALID') %>%
    pool_layer() %>%
    con_layer(filters = 256, ksize = 5) %>%
    pool_layer() %>%
    con_layer(filters = 384, ksize = 3) %>%
    con_layer(filters = 384, ksize = 3) %>%
    con_layer(filters = 256, ksize = 3) %>%
    pool_layer() %>%
    # Classifier head: flatten, dropout, two dense layers, 27-way softmax output
    layer_flatten() %>%
    layer_dropout(dropout_rate) %>%
    fully_connected_layer(units = 4096) %>%
    fully_connected_layer(units = 4096) %>%
    layer_dense(name = "Output", units = n_classes, activation = "softmax")
  model %>% compile(loss = "categorical_crossentropy",
                    optimizer = optimizer_adam(learning_rate = learning_rate),
                    metrics = "accuracy")
  return(model)
}
my_model <- model_a(learning_rate=0.00001)
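Printing the compiled model is a quick way to confirm the layer shapes and parameter counts against the architecture table above:
# Inspect layer output shapes and parameter counts
summary(my_model)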
Training the Model
# Train Model
batch_size <- 128
epochs <- 25
hist <- my_model %>% fit(
  train_images,                       # the generator yields images and labels together
  epochs = epochs,
  validation_data = validation_images,
  verbose = 2,
  # Stop once validation accuracy stops improving; keep the best weights seen
  callbacks = list(callback_early_stopping(monitor = "val_accuracy",
                                           min_delta = 0.001,
                                           patience = 2,
                                           restore_best_weights = TRUE))
)
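The returned history object records per-epoch loss and accuracy for both the training and validation sets, and can be plotted directly to check for over- or under-fitting:
# Plot training vs. validation loss and accuracy by epoch
plot(hist)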
Evaluating the Model
Evaluating the trained model on the 2,700-image test set produced the following results:
Evaluation Metric | Value |
---|---|
precision | 0.939569 |
recall | 0.937778 |
f1 | 0.937799 |
accuracy | 0.937778 |
rand_accuracy (chance baseline, 1/27) | 0.037037 |
kappa | 0.935385 |
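Below is a sketch of how these metrics can be derived with caret, assuming test_images is a generator created with shuffle = FALSE so that predictions line up with the stored labels:
# Predict class probabilities on the test set and take the most likely class
probs <- predict(my_model, test_images)
pred  <- factor(apply(probs, 1, which.max) - 1, levels = 0:(n_classes - 1))
truth <- factor(test_images$classes, levels = 0:(n_classes - 1))

# The confusion matrix yields accuracy and kappa directly
cm <- caret::confusionMatrix(pred, truth)
cm$overall[c("Accuracy", "Kappa")]

# Macro-averaged precision, recall, and F1 across the 27 classes
colMeans(cm$byClass[, c("Precision", "Recall", "F1")], na.rm = TRUE)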
Reporting & Presentations
Next Steps
Continued Training with More Real-World Data
- Expand Training & Testing Data: Enhance the set with real-world images, and generate additional "less-than-ideal" synthetic images for greater diversity.
Hybridize with Other Models
- Position Tracking: Add an additional network (potentially an RNN) to locate the position of the hands in the images.
- Analysis over Time: Implement a time-series component to track the location and shape of the hand over time.