A simple project that helps recognize sign language with torchvision.
- Kaggle American Sign Language
- Kaggle American Sign Language 1GB (Alphabets Only)
- Kaggle American Sign Language 7GB (Alphabets Only)
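
The first model summary in this README describes a small TinyVGG-style network. The block below is a minimal sketch of a module that is consistent with that summary, not necessarily the project's actual source. The block names (`conv_layer1`, `conv_layer2`, `classification_layer`), the 60 hidden units, and the 10 classes come from the summary. The `MaxPool2d` settings (`kernel_size=3`, `stride=2`) are an assumption, chosen because they reproduce the 60 -> 29 and 25 -> 12 shape changes shown there.

```python
import torch
from torch import nn


class TinyVGG(nn.Module):
    """TinyVGG-style CNN whose block names mirror the summary table."""

    def __init__(self, in_channels: int = 3, hidden_units: int = 60, num_classes: int = 10):
        super().__init__()
        self.conv_layer1 = nn.Sequential(
            nn.Conv2d(in_channels, hidden_units, kernel_size=3),   # 64 -> 62
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3),  # 62 -> 60
            nn.ReLU(),
            # Assumption: kernel 3 / stride 2 gives 60 -> 29, as in the summary.
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.conv_layer2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3),  # 29 -> 27
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3),  # 27 -> 25
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # 25 -> 12
        )
        self.classification_layer = nn.Sequential(
            nn.Flatten(),                                          # 60 * 12 * 12 = 8640
            nn.Linear(hidden_units * 12 * 12, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_layer1(x)
        x = self.conv_layer2(x)
        return self.classification_layer(x)


model = TinyVGG()
batch = torch.randn(16, 3, 64, 64)  # same input shape as in the summary
output = model(batch)
num_params = sum(p.numel() for p in model.parameters())
print(tuple(output.shape), num_params)
```

If `torchinfo` is installed, a table in the same layout can probably be produced with `torchinfo.summary(model, input_size=(16, 3, 64, 64), col_names=["input_size", "output_size", "num_params", "trainable"], row_settings=["var_names"])`. The exact options used for this README are not stated, so treat these as a guess.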
========================================================================================================================
Layer (type (var_name)) Input Shape Output Shape Param # Trainable
========================================================================================================================
TinyVGG (TinyVGG) [16, 3, 64, 64] [16, 10] -- True
├─Sequential (conv_layer1) [16, 3, 64, 64] [16, 60, 29, 29] -- True
│ └─Conv2d (0) [16, 3, 64, 64] [16, 60, 62, 62] 1,680 True
│ └─ReLU (1) [16, 60, 62, 62] [16, 60, 62, 62] -- --
│ └─Conv2d (2) [16, 60, 62, 62] [16, 60, 60, 60] 32,460 True
│ └─ReLU (3) [16, 60, 60, 60] [16, 60, 60, 60] -- --
│ └─MaxPool2d (4) [16, 60, 60, 60] [16, 60, 29, 29] -- --
├─Sequential (conv_layer2) [16, 60, 29, 29] [16, 60, 12, 12] -- True
│ └─Conv2d (0) [16, 60, 29, 29] [16, 60, 27, 27] 32,460 True
│ └─ReLU (1) [16, 60, 27, 27] [16, 60, 27, 27] -- --
│ └─Conv2d (2) [16, 60, 27, 27] [16, 60, 25, 25] 32,460 True
│ └─ReLU (3) [16, 60, 25, 25] [16, 60, 25, 25] -- --
│ └─MaxPool2d (4) [16, 60, 25, 25] [16, 60, 12, 12] -- --
├─Sequential (classification_layer) [16, 60, 12, 12] [16, 10] -- True
│ └─Flatten (0) [16, 60, 12, 12] [16, 8640] -- --
│ └─Linear (1) [16, 8640] [16, 10] 86,410 True
│ └─Softmax (2) [16, 10] [16, 10] -- --
========================================================================================================================
Total params: 185,470
Trainable params: 185,470
Non-trainable params: 0
Total mult-adds (G): 2.68
========================================================================================================================
Input size (MB): 0.79
Forward/backward pass size (MB): 67.57
Params size (MB): 0.74
Estimated Total Size (MB): 69.10
====================================================================================================================================================================================================================================================================
Layer (type (var_name)) Input Shape Output Shape Param # Trainable
============================================================================================================================================
EfficientNet (EfficientNet) [16, 3, 64, 64] [16, 10] -- Partial
├─Sequential (efficientnet_features) [16, 3, 64, 64] [16, 1280, 2, 2] -- False
│ └─Conv2dNormActivation (0) [16, 3, 64, 64] [16, 32, 32, 32] -- False
│ │ └─Conv2d (0) [16, 3, 64, 64] [16, 32, 32, 32] (864) False
│ │ └─BatchNorm2d (1) [16, 32, 32, 32] [16, 32, 32, 32] (64) False
│ │ └─SiLU (2) [16, 32, 32, 32] [16, 32, 32, 32] -- --
│ └─Sequential (1) [16, 32, 32, 32] [16, 16, 32, 32] -- False
│ │ └─MBConv (0) [16, 32, 32, 32] [16, 16, 32, 32] (1,448) False
│ └─Sequential (2) [16, 16, 32, 32] [16, 24, 16, 16] -- False
│ │ └─MBConv (0) [16, 16, 32, 32] [16, 24, 16, 16] (6,004) False
│ │ └─MBConv (1) [16, 24, 16, 16] [16, 24, 16, 16] (10,710) False
│ └─Sequential (3) [16, 24, 16, 16] [16, 40, 8, 8] -- False
│ │ └─MBConv (0) [16, 24, 16, 16] [16, 40, 8, 8] (15,350) False
│ │ └─MBConv (1) [16, 40, 8, 8] [16, 40, 8, 8] (31,290) False
│ └─Sequential (4) [16, 40, 8, 8] [16, 80, 4, 4] -- False
│ │ └─MBConv (0) [16, 40, 8, 8] [16, 80, 4, 4] (37,130) False
│ │ └─MBConv (1) [16, 80, 4, 4] [16, 80, 4, 4] (102,900) False
│ │ └─MBConv (2) [16, 80, 4, 4] [16, 80, 4, 4] (102,900) False
│ └─Sequential (5) [16, 80, 4, 4] [16, 112, 4, 4] -- False
│ │ └─MBConv (0) [16, 80, 4, 4] [16, 112, 4, 4] (126,004) False
│ │ └─MBConv (1) [16, 112, 4, 4] [16, 112, 4, 4] (208,572) False
│ │ └─MBConv (2) [16, 112, 4, 4] [16, 112, 4, 4] (208,572) False
│ └─Sequential (6) [16, 112, 4, 4] [16, 192, 2, 2] -- False
│ │ └─MBConv (0) [16, 112, 4, 4] [16, 192, 2, 2] (262,492) False
│ │ └─MBConv (1) [16, 192, 2, 2] [16, 192, 2, 2] (587,952) False
│ │ └─MBConv (2) [16, 192, 2, 2] [16, 192, 2, 2] (587,952) False
│ │ └─MBConv (3) [16, 192, 2, 2] [16, 192, 2, 2] (587,952) False
│ └─Sequential (7) [16, 192, 2, 2] [16, 320, 2, 2] -- False
│ │ └─MBConv (0) [16, 192, 2, 2] [16, 320, 2, 2] (717,232) False
│ └─Conv2dNormActivation (8) [16, 320, 2, 2] [16, 1280, 2, 2] -- False
│ │ └─Conv2d (0) [16, 320, 2, 2] [16, 1280, 2, 2] (409,600) False
│ │ └─BatchNorm2d (1) [16, 1280, 2, 2] [16, 1280, 2, 2] (2,560) False
│ │ └─SiLU (2) [16, 1280, 2, 2] [16, 1280, 2, 2] -- --
├─Sequential (classifier) [16, 1280, 2, 2] [16, 10] -- True
│ └─Flatten (0) [16, 1280, 2, 2] [16, 5120] -- --
│ └─Linear (1) [16, 5120] [16, 10] 51,210 True
│ └─Softmax (2) [16, 10] [16, 10] -- --
============================================================================================================================================
Total params: 4,058,758
Trainable params: 51,210
Non-trainable params: 4,007,548
Total mult-adds (M): 513.11
============================================================================================================================================
Input size (MB): 0.79
Forward/backward pass size (MB): 142.00
Params size (MB): 16.24
Estimated Total Size (MB): 159.02
============================================================================================================================================
(venv) navin3d@Navins-MacBook-Pro SignLanguage-Recognition % python3 train.py
Using DEVICE : cpu
Number of Epochs : 10
Batch Size : 16
Hidden Units : 60
Learning Rate : 0.001
Num of classes : 10
Epoch: 1 | train_loss: 0.1376 | train_acc: 0.2355 | test_loss: 0.1152 | test_acc: 0.5625
Epoch: 2 | train_loss: 0.1198 | train_acc: 0.5436 | test_loss: 0.1003 | test_acc: 0.7500
Epoch: 3 | train_loss: 0.1153 | train_acc: 0.6163 | test_loss: 0.0991 | test_acc: 0.8125
Epoch: 4 | train_loss: 0.1134 | train_acc: 0.6453 | test_loss: 0.1091 | test_acc: 0.6250
Epoch: 5 | train_loss: 0.1141 | train_acc: 0.6366 | test_loss: 0.1074 | test_acc: 0.6250
Epoch: 6 | train_loss: 0.1119 | train_acc: 0.6701 | test_loss: 0.0965 | test_acc: 0.8125
Epoch: 7 | train_loss: 0.1129 | train_acc: 0.6512 | test_loss: 0.1036 | test_acc: 0.6875
Epoch: 8 | train_loss: 0.1101 | train_acc: 0.7006 | test_loss: 0.0958 | test_acc: 0.8125
Epoch: 9 | train_loss: 0.1121 | train_acc: 0.6628 | test_loss: 0.1155 | test_acc: 0.5625
Epoch: 10 | train_loss: 0.1112 | train_acc: 0.6802 | test_loss: 0.0958 | test_acc: 0.8125
[INFO] Saving model to: models/TinyVGG_Epoch_10_Batch_16.pth
(venv) navin3d@Navins-MacBook-Pro SignLanguage-Recognition % python3 train.py
Using DEVICE : cpu
Number of Epochs : 10
Batch Size : 16
Hidden Units : 60
Learning Rate : 0.001
Num of classes : 10
Epoch: 1 | train_loss: 0.1313 | train_acc: 0.4157 | test_loss: 0.1195 | test_acc: 0.5625
Epoch: 2 | train_loss: 0.1182 | train_acc: 0.6279 | test_loss: 0.1146 | test_acc: 0.5625
Epoch: 3 | train_loss: 0.1138 | train_acc: 0.6817 | test_loss: 0.1167 | test_acc: 0.4375
Epoch: 4 | train_loss: 0.1129 | train_acc: 0.6860 | test_loss: 0.1110 | test_acc: 0.6250
Epoch: 5 | train_loss: 0.1122 | train_acc: 0.6860 | test_loss: 0.1055 | test_acc: 0.7500
Epoch: 6 | train_loss: 0.1105 | train_acc: 0.7180 | test_loss: 0.1165 | test_acc: 0.5000
Epoch: 7 | train_loss: 0.1104 | train_acc: 0.7180 | test_loss: 0.1063 | test_acc: 0.6250
Epoch: 8 | train_loss: 0.1102 | train_acc: 0.7195 | test_loss: 0.1022 | test_acc: 0.8125
Epoch: 9 | train_loss: 0.1088 | train_acc: 0.7384 | test_loss: 0.1050 | test_acc: 0.7500
Epoch: 10 | train_loss: 0.1091 | train_acc: 0.7413 | test_loss: 0.1125 | test_acc: 0.5625
[INFO] Saving model to: models/EfficientNet_Epoch_10_Batch_16_LR_0.001.pth