src.models package¶
Submodules¶
src.models.data_augmentation module¶
Contains functions to augment skeleton data. Skeleton data has a prior normalization step applied, in which the scene is translated from the camera coordinate system to a new local coordinate system. The translation is given by the vector from the origin of the camera to the first subject’s SPINE_MID joint in the first frame. The same vector is applied to all subsequent frames.
The skeleton data used as input to these functions is already “prior” normalized.
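For illustration, a minimal sketch of this prior normalization, assuming a skeleton array of shape (3, max_frame, num_joint, n_subjects) and that SPINE_MID is joint index 1 (its usual index in the Kinect v2 joint map):

import numpy as np

SPINE_MID = 1  # assumed index of the SPINE_MID joint in the Kinect v2 joint map

def prior_normalize(skeleton):
    # Translation vector: camera origin -> first subject's SPINE_MID, first frame
    origin = skeleton[:, 0, SPINE_MID, 0]  # shape (3,)
    # The same vector is subtracted from every frame, joint and subject
    return skeleton - origin[:, None, None, None]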
Three functions are provided.
build_rotation_matrix: Creates a 3x3 rotation matrix for a given axis
rotate_skeleton: Randomly rotates a skeleton sequence around the X, Y and Z axis.
stretched_image_from_skeleton_sequence: Creates an RGB image from a skeleton sequence
Note that the rotation angles are randomly sampled between bounds hardcoded as global variables in this file.
As additional guidelines, we add the following information.
- Kinect v2 coordinate system:
x: horizontal plane
y: height
z: depth
- NTU RGB+D sequences are acquired from -45° to 45° on the x axis.
src.models.data_augmentation.build_rotation_matrix(axis, rot_angle)¶
Builds a rotation matrix for a given axis and angle.
- Inputs:
axis (int): Axis of rotation (0: x, 1: y, 2: z)
rot_angle (float): Angle of rotation in degrees
- Outputs:
rotation_matrix (np array): 3x3 rotation matrix
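As a sketch, such a function typically looks like the following (the module itself may differ in details):

import numpy as np

def build_rotation_matrix(axis, rot_angle):
    # axis: 0 -> x, 1 -> y, 2 -> z; rot_angle given in degrees
    theta = np.radians(rot_angle)
    c, s = np.cos(theta), np.sin(theta)
    if axis == 0:
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 1:
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])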
src.models.data_augmentation.rotate_skeleton(skeleton)¶
Randomly rotates the skeleton sequence around the x, y and z axes.
- Inputs:
skeleton (np array): Skeleton sequence of shape (3 {x, y, z}, max_frame, num_joint=25, n_subjects=2)
- Outputs:
skeleton_aug (np array): Randomly rotated skeleton sequence of shape (3 {x, y, z}, max_frame, num_joint=25, n_subjects=2)
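A minimal sketch of the augmentation, reusing the build_rotation_matrix sketch above; the ROT_MIN/ROT_MAX bounds are hypothetical stand-ins for the module’s hardcoded globals:

import numpy as np

ROT_MIN, ROT_MAX = -45.0, 45.0  # hypothetical bounds; the module hardcodes its own

def rotate_skeleton(skeleton):
    # Compose one random rotation per axis (x, y, z)
    rotation = np.eye(3)
    for axis in range(3):
        angle = np.random.uniform(ROT_MIN, ROT_MAX)
        rotation = build_rotation_matrix(axis, angle) @ rotation
    # Apply the combined rotation to the coordinate axis (first dimension)
    return np.einsum('ij,jfks->ifks', rotation, skeleton)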
src.models.data_augmentation.stretched_image_from_skeleton_sequence(skeleton, c_min, c_max)¶
Creates an RGB image from a skeleton sequence.
- Inputs:
skeleton (np array): Skeleton sequence of shape (3 {x, y, z}, max_frame, num_joint=25, n_subjects=2)
c_min (int): Minimum coordinate value across all sequences, joints, subjects, frames after the prior normalization step.
c_max (int): Maximum coordinate value across all sequences, joints, subjects, frames after the prior normalization step.
- Outputs:
skeleton_image (np array): RGB image of shape (3, 224, 224)
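A sketch of the mapping under stated assumptions: coordinates are min-max normalized to [0, 255] with the global extrema, x/y/z become the R/G/B channels, and the (frames × joints) grid is stretched to 224 × 224 (the exact axis layout in the module may differ):

import numpy as np
import cv2  # assumed available for the resize step

def stretched_image_from_skeleton_sequence(skeleton, c_min, c_max):
    _, max_frame, num_joint, n_subjects = skeleton.shape
    # Min-max normalize every coordinate to [0, 255]
    scaled = (255 * (skeleton - c_min) / (c_max - c_min)).astype(np.float32)
    # Lay out both subjects' joints along one axis -> (3, max_frame, n_subjects * num_joint)
    image = scaled.transpose(0, 1, 3, 2).reshape(3, max_frame, n_subjects * num_joint)
    # Stretch the (frames x joints) grid to 224 x 224; x/y/z become R/G/B
    grid = np.ascontiguousarray(image.transpose(1, 2, 0))
    resized = cv2.resize(grid, (224, 224))
    return resized.transpose(2, 0, 1).astype(np.uint8)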
src.models.gen_data_loaders module¶
Used to create the training, validation and test PyTorch data loaders. All data loaders are created from the same custom PyTorch dataset template (h5_pytorch_dataset.py). A helper function creates 3 lists containing the sequences’ names for the 3 sets. These lists are used by the __getitem__ method of the datasets.
- The provided functions are as follows:
gen_sets_lists: Creates lists with the sequences’ names of the train-val-test splits
create_data_loaders: Creates three data loaders corresponding to the train-val-test splits
We use 5% of the training set as our validation set.
Note that because we fix the seed, the sets lists are consistent across runs. This is useful when studying the impact of a given hyperparameter for example.
src.models.gen_data_loaders.create_data_loaders(data_path, evaluation_type, model_type, use_pose, use_ir, use_cropped_IR, batch_size, sub_sequence_length, augment_data)¶
Generates three PyTorch data loaders corresponding to the train-val-test splits.
- Inputs:
data_path (str): Path containing the h5 files (default ./data/processed/).
evaluation_type (str): Benchmark evaluated. Either “cross_subject” or “cross_view”
model_type (str): “FUSION” only for now.
use_pose (bool): Include skeleton data
use_ir (bool): Include IR data
use_cropped_IR (bool): Type of IR dataset
batch_size (int): Size of batch
sub_sequence_length (int): Number of frames to subsample from full IR sequences
augment_data (bool): Choose to augment data by geometric transformation (skeleton data) or horizontal flip (IR data)
- Outputs:
training_generator (PyTorch data loader): Training PyTorch data loader
validation_generator (PyTorch data loader): Validation PyTorch data loader
testing_generator (PyTorch data loader): Testing PyTorch data loader
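An illustrative call (argument values are examples only):

from src.models.gen_data_loaders import create_data_loaders

train_loader, val_loader, test_loader = create_data_loaders(
    data_path="./data/processed/",
    evaluation_type="cross_subject",
    model_type="FUSION",
    use_pose=True,
    use_ir=True,
    use_cropped_IR=True,
    batch_size=16,
    sub_sequence_length=20,
    augment_data=True,
)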
src.models.gen_data_loaders.gen_sets_lists(data_path, evaluation_type)¶
Generates 3 lists containing the sequences’ names for the train-val-test splits.
- Inputs:
data_path (str): Path containing the h5 files (default ./data/processed/). This folder should contain the samples_names.txt file containing all the samples’ names.
evaluation_type (str): Benchmark evaluated. Either “cross_subject” or “cross_view”
- Outputs:
training_samples (list): All the training sequences’ names
validation_samples (list): All the validation sequences’ names
testing_samples (list): All the testing sequences’ names
src.models.h5_pytorch_dataset module¶
Custom PyTorch dataset that reads from the h5 datasets (see the src.data module for more information).
class src.models.h5_pytorch_dataset.TorchDataset(model_type, use_pose, use_ir, use_cropped_IR, data_path, sub_sequence_length, augment_data, samples_names)¶
Bases: torch.utils.data.dataset.Dataset
This custom PyTorch dataset lazy-loads from the h5 datasets. This means that it does not load the entire dataset into memory, which would be impossible for the IR sequences. Instead, it opens and reads from the h5 file on each access. This is a bit slower, but very memory efficient. Additionally, the lost time is mitigated when using multiple workers for the data loaders.
- Attributes:
data_path (str): Path containing the h5 files (default ./data/processed/).
model_type (str): “FUSION” only for now.
use_pose (bool): Include skeleton data
use_ir (bool): Include IR data
use_cropped_IR (bool): Type of IR dataset
sub_sequence_length (int): Number of frames to subsample from full IR sequences
augment_data (bool): Choose to augment data by geometric transformation (skeleton data) or horizontal flip (IR data)
samples_names (list): Contains the sequences’ names of the dataset (i.e. train, validation, test)
- Methods:
__getitem__(index): Returns the processed sequence (skeleton and/or IR) and its label
__len__(): Returns the number of elements in dataset.
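The lazy-loading pattern is roughly the following (a simplified sketch, not the actual TorchDataset implementation):

import h5py
import torch
from torch.utils.data import Dataset

class LazyH5Dataset(Dataset):
    def __init__(self, h5_path, samples_names):
        self.h5_path = h5_path
        self.samples_names = samples_names

    def __getitem__(self, index):
        name = self.samples_names[index]
        # Read only the requested sequence instead of holding the dataset in memory
        with h5py.File(self.h5_path, "r") as f:
            sample = torch.from_numpy(f[name][()])
        return sample

    def __len__(self):
        return len(self.samples_names)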
src.models.plot_confusion_matrix module¶
Computes the confusion matrix for a trained model. Takes as input important parameters such as the benchmark studied. A confusion matrix in .png format is saved in the trained model folder provided.
Plotting the confusion matrix is best done using the provided Makefile.
>>> make confusion_matrix \
PROCESSED_DATA_PATH=X \
MODEL_FOLDER=X \
MODEL_FILE=X \
EVALUATION_TYPE=X \
MODEL_TYPE=X \
USE_POSE=X \
USE_IR=X \
USE_CROPPED_IR=X \
BATCH_SIZE=X \
SUB_SEQUENCE_LENGTH=X \
- With the parameters taking the following values:
- PROCESSED_DATA_PATH:
Path to h5 files. Default location is ./data/processed/
- MODEL_FOLDER:
Output path to save models and log files. A folder inside that path will be automatically created. Default location is ./models/
- MODEL_FILE:
Name of the model.
- EVALUATION_TYPE:
[cross_subject | cross_view]
- MODEL_TYPE:
[FUSION]
- USE_POSE:
[True, False]
- USE_IR:
[True, False]
- USE_CROPPED_IR:
[True, False]
- BATCH_SIZE:
Whole positive number above 1.
- SUB_SEQUENCE_LENGTH:
[1 .. 20] Specifies the number of frames to take from a complete IR sequence.
src.models.pose_ir_fusion module¶
Contains a PyTorch model fusing IR and pose data for improved classification. Also contains a helper function which normalizes pose and IR tensors.
class src.models.pose_ir_fusion.FUSION(use_pose, use_ir, pretrained)¶
Bases: torch.nn.modules.module.Module
This model is built on three submodules. The first is called a “pose module”, which takes a skeleton sequence mapped to an image and outputs a 512-long feature vector. The second one is an “IR module”, which takes an IR sequence and outputs a 512-long feature vector. The third one is a “classification module”, which combines the 2 feature vectors (concatenation) and predicts a class via an MLP. This model can achieve over 90% accuracy on both benchmarks of the NTU RGB+D (60) dataset.
- Attributes:
use_pose (bool): Include skeleton data
use_ir (bool): Include IR data
pose_net (PyTorch model): Pretrained ResNet-18. Only exists if use_pose is True.
ir_net (PyTorch model): Pretrained R(2+1)D-18. Only exists if use_ir is True.
class_mlp (PyTorch model): Classification MLP. Input size is adjusted depending on the modules used. Input size is 512 if only one module is used, 1024 for two modules.
- Methods:
forward(X): Forward step. X contains pose/IR data
forward(X)¶
Forward step of the FUSION model. Input X is a list of 2 tensors holding pose and IR data. The input is normalized as specified in the PyTorch pretrained vision models documentation, using the prime_X_fusion function. Each tensor is then passed to its corresponding module. The 2 feature vectors are concatenated, then fed to the classification module (MLP), which outputs a prediction.
- Inputs:
- X (list of PyTorch tensors): Contains the following tensors:
X_skeleton (PyTorch tensor): pose images of shape (batch_size, 3, 224, 224) if use_pose is True. Else, tensor = None.
X_ir (PyTorch tensor): IR sequences of shape (batch_size, 3, seq_len, 112, 112) if use_ir is True. Else, tensor = None
- Outputs:
pred (PyTorch tensor): Contains the log-Softmax normalized predictions of shape (batch_size, n_classes=60)
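The fusion step itself reduces to a concatenation followed by the MLP. A shape-level sketch with stand-in modules (not the real ResNet-18 / R(2+1)D-18 backbones), assuming seq_len=8:

import torch
import torch.nn as nn

# Stand-ins that only reproduce the documented output shapes
pose_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
ir_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 112 * 112, 512))
class_mlp = nn.Linear(1024, 60)

X_skeleton = torch.randn(4, 3, 224, 224)   # pose images
X_ir = torch.randn(4, 3, 8, 112, 112)      # IR sub-sequences

# Concatenate the two 512-long feature vectors, then classify
fused = torch.cat([pose_net(X_skeleton), ir_net(X_ir)], dim=1)  # (4, 1024)
pred = torch.log_softmax(class_mlp(fused), dim=1)               # (4, 60)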
src.models.pose_ir_fusion.prime_X_fusion(X, use_pose, use_ir)¶
Normalizes X (list of tensors) as defined in the pretrained Torchvision models documentation. Note that X_ir is reshaped in this function.
- Inputs:
- X (list of PyTorch tensors): Contains the following tensors:
X_skeleton (PyTorch tensor): pose images of shape (batch_size, 3, 224, 224) if use_pose is True. Else, tensor = -1.
X_ir (PyTorch tensor): IR sequences of shape (batch_size, seq_len, 3, 112, 112) if use_ir is True. Else, tensor = -1
use_pose (bool): Include skeleton data
use_ir (bool): Include IR data
- Outputs:
- X (list of PyTorch tensors): Contains the following tensors:
X_skeleton (PyTorch tensor): pose images of shape (batch_size, 3, 224, 224) if use_pose is True. Else, tensor = None.
X_ir (PyTorch tensor): IR sequences of shape (batch_size, 3, seq_len, 112, 112) if use_ir is True. Else, tensor = None
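A sketch of what this normalization amounts to, using the mean/std constants from the Torchvision pretrained-model documentation (whether the module uses exactly these values is an assumption):

import torch

IMAGE_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGE_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
VIDEO_MEAN = torch.tensor([0.43216, 0.394666, 0.37645]).view(1, 3, 1, 1, 1)
VIDEO_STD = torch.tensor([0.22803, 0.22145, 0.216989]).view(1, 3, 1, 1, 1)

def prime_X_fusion_sketch(X, use_pose, use_ir):
    X_skeleton, X_ir = X
    if use_pose:
        X_skeleton = (X_skeleton - IMAGE_MEAN) / IMAGE_STD
    if use_ir:
        # Move channels first: (batch, seq_len, 3, H, W) -> (batch, 3, seq_len, H, W)
        X_ir = X_ir.permute(0, 2, 1, 3, 4)
        X_ir = (X_ir - VIDEO_MEAN) / VIDEO_STD
    return [X_skeleton if use_pose else None, X_ir if use_ir else None]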
src.models.torchvision_models module¶
This module is a copy taken from a more recent release of the official Torchvision library. It is included because we use an older version of Torchvision, the latest available on our cluster. We will update this in the future.
src.models.torchvision_models.r3d_18(pretrained=False, progress=True, **kwargs)¶
Constructs the 18-layer ResNet3D model as in https://arxiv.org/abs/1711.11248
- Args:
pretrained (bool): If True, returns a model pre-trained on Kinetics-400
progress (bool): If True, displays a progress bar of the download to stderr
- Returns:
nn.Module: R3D-18 network
src.models.torchvision_models.mc3_18(pretrained=False, progress=True, **kwargs)¶
Constructor for the 18-layer Mixed Convolution network as in https://arxiv.org/abs/1711.11248
- Args:
pretrained (bool): If True, returns a model pre-trained on Kinetics-400
progress (bool): If True, displays a progress bar of the download to stderr
- Returns:
nn.Module: MC3 Network definition
src.models.torchvision_models.r2plus1d_18(pretrained=False, progress=True, **kwargs)¶
Constructor for the 18-layer deep R(2+1)D network as in https://arxiv.org/abs/1711.11248
- Args:
pretrained (bool): If True, returns a model pre-trained on Kinetics-400
progress (bool): If True, displays a progress bar of the download to stderr
- Returns:
nn.Module: R(2+1)D-18 network
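Usage mirrors the upstream Torchvision API, e.g.:

from src.models.torchvision_models import r2plus1d_18

# R(2+1)D-18 backbone with Kinetics-400 pretrained weights
model = r2plus1d_18(pretrained=True, progress=True)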
src.models.train_model module¶
The main file for the src.models module. Takes as input the different hyperparameters and starts training the model. The model is saved after every epoch. A batch_log.txt keeps the record of the accuracy and loss of each batch. A log.txt keeps a record of the accuracy of the train-val-test sets after each epoch.
Note that although we evaluate the test set at each epoch, we take the final decision based on the validation set only.
Training is best launched using the provided Makefile.
>>> make train \
PROCESSED_DATA_PATH=X \
MODEL_FOLDER=X \
EVALUATION_TYPE=X \
MODEL_TYPE=X \
USE_POSE=X \
USE_IR=X \
PRETRAINED=X \
USE_CROPPED_IR=X \
LEARNING_RATE=X \
WEIGHT_DECAY=X \
GRADIENT_THRESHOLD=X \
EPOCHS=X \
BATCH_SIZE=X \
ACCUMULATION_STEPS=X \
SUB_SEQUENCE_LENGTH=X \
AUGMENT_DATA=X \
EVALUATE_TEST=X \
SEED=X
- With the parameters taking the following values:
- PROCESSED_DATA_PATH:
Path to h5 files. Default location is ./data/processed/
- MODEL_FOLDER:
Output path to save models and log files. A folder inside that path will be automatically created. Default location is ./models/
- EVALUATION_TYPE:
[cross_subject | cross_view]
- MODEL_TYPE:
[FUSION]
- USE_POSE:
[True, False]
- USE_IR:
[True, False]
- PRETRAINED:
[True, False]
- USE_CROPPED_IR:
[True, False]
- LEARNING_RATE:
Real positive number.
- WEIGHT_DECAY:
Real positive number. If 0, then no weight decay is applied.
- EPOCHS:
Whole positive number above 1.
- BATCH_SIZE:
Whole positive number above 1.
- GRADIENT_THRESHOLD:
Real positive number. If 0, then no threshold is applied.
- ACCUMULATION_STEPS:
Accumulate gradient across batches. This is a trick to virtually train larger batches on modest architectures.
- SUB_SEQUENCE_LENGTH:
[1 .. 20] Specifies the number of frames to take from a complete IR sequence.
- AUGMENT_DATA:
[True, False]
- EVALUATE_TEST:
[True, False]
- SEED:
Positive whole number. Used to make training replicable.
src.models.train_utils module¶
Contains helper functions to train a network, evaluate its accuracy score, and plot a confusion matrix.
- The following functions are provided:
plot_confusion_matrix: Given a prediction and a ground truth vector, returns a plot of the confusion matrix.
calculate_accuracy: Calculates accuracy score between 2 PyTorch tensors
evaluate_set: Computes accuracy for a given set (train-val-test)
train_model: Trains a model with the given hyperparameters.
src.models.train_utils.calculate_accuracy(Y_hat, Y)¶
Calculates the accuracy score for a prediction tensor given its ground truth.
- Inputs:
Y_hat (PyTorch tensor): Prediction scores (Softmax/log-Softmax) of shape (batch_size, n_classes)
Y (PyTorch tensor): Ground truth vector of shape (batch_size, n_classes)
- Outputs:
accuracy (int): Accuracy score
Y_hat (np array): Numpy version of Y_hat of shape (batch_size, n_classes)
Y (np array): Numpy version of Y of shape (batch_size, n_classes)
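A plausible sketch, assuming one-hot ground truths as documented (the accuracy is the fraction of matching argmax indices):

import torch

def calculate_accuracy_sketch(Y_hat, Y):
    # Compare predicted classes against one-hot ground truths
    correct = (Y_hat.argmax(dim=1) == Y.argmax(dim=1)).float().mean().item()
    return correct, Y_hat.detach().cpu().numpy(), Y.detach().cpu().numpy()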
src.models.train_utils.evaluate_set(model, model_type, data_loader, output_folder, set_name)¶
Calculates the accuracy score over a given set (train-val-test) and returns two vectors with all predictions and all ground truths.
- Inputs:
model (PyTorch model): Evaluated PyTorch model.
model_type (str): “FUSION” only for now.
data_loader (PyTorch data loader): Data loader of evaluated set
output_folder (str): Path of output folder
set_name (str): Name of the evaluated set [i.e. “TRAIN” | “VAL” | “TEST”]
- Outputs:
accuracy (int): Accuracy over set
y_true (list of np arrays): List of all ground-truth vectors. Each index of the list yields the ground truths for a given batch.
y_pred (list of np arrays): List of all prediction vectors. Each index of the list yields the predictions for a given batch.
src.models.train_utils.plot_confusion_matrix(y_true, y_pred, classes, normalize=False, title=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>)¶
This function is taken from the sklearn website and slightly modified. Given a prediction vector, a ground truth vector and a list containing the names of the classes, it returns a confusion matrix plot.
- Inputs:
y_true (np.int32 array): 1D array of ground truths
y_pred (np.int32 array): 1D array of predictions
classes (list): List of action names
normalize (bool): Use percentages instead of totals
title (str): Title of the plot
cmap (matplotlib cmap): Plot color style
- Outputs:
ax (matplotlib plot): Confusion matrix plot
src.models.train_utils.train_model(model, model_type, optimizer, learning_rate, weight_decay, gradient_threshold, epochs, accumulation_steps, evaluate_test, output_folder, train_generator, test_generator, validation_generator=None)¶
Trains a model in a batch-wise fashion. At each epoch, the entire training set is studied, then the validation and test sets are evaluated. Note that we only use the validation set to select which model to keep. The log.txt and batch_log.txt files are used to debug and record training progress.
- Inputs:
model (PyTorch model): Model to train.
model_type (str): “FUSION” only for now.
optimizer (str): Name of the optimizer to use (“ADAM” or “SGD” only for now)
learning_rate (float): Learning rate
weight_decay (float): Weight decay
gradient_threshold (float): Clip gradient by this value. If 0, no threshold is applied.
epochs (int): Number of epochs to train.
accumulation_steps (int): Accumulate gradient across batches. This is a trick to virtually train larger batches on modest architectures.
evaluate_test (bool): Choose to evaluate test set or not at each epoch.
output_folder (str): Entire path in which log files and models are saved. By default: ./models/automatically_created_folder/
train_generator (PyTorch data loader): Training set data loader
validation_generator (PyTorch data loader): Validation set data loader
test_generator (PyTorch data loader): Test set data loader
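The gradient accumulation trick mentioned above works roughly as follows (a sketch, assuming the loss is averaged per batch):

import torch

def train_epoch_accumulated(model, optimizer, criterion, train_generator,
                            accumulation_steps, gradient_threshold):
    optimizer.zero_grad()
    for i, (X, Y) in enumerate(train_generator):
        # Scale the loss so that k accumulated small batches act like one large batch
        loss = criterion(model(X), Y) / accumulation_steps
        loss.backward()  # gradients add up in the .grad buffers
        if (i + 1) % accumulation_steps == 0:
            if gradient_threshold > 0:  # optional gradient clipping
                torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_threshold)
            optimizer.step()
            optimizer.zero_grad()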
src.models.utils module¶
src.models.utils.set_parameter_requires_grad(model, feature_extracting)¶
Sets the model to feature extraction mode or not. If feature_extracting is True, the model’s gradients are frozen. Else, the gradients are activated.
- Inputs:
model (PyTorch model): Model to set
feature_extracting (bool): If true, freezes model gradients. If not, activates model gradients.
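This is the standard PyTorch feature-extraction idiom, which presumably reduces to:

def set_parameter_requires_grad(model, feature_extracting):
    for param in model.parameters():
        # Frozen parameters receive no gradient updates
        param.requires_grad = not feature_extracting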