src.models package

Submodules

src.models.data_augmentation module

Contains functions to augment skeleton data. Skeleton data has a prior normalization step applied, where the scene is translated from the camera coordinate system to a new local coordinate system. The translation is given by the vector from the camera origin to the first subject’s SPINE_MID joint in the first frame. The same vector is applied to all subsequent frames.

The skeleton data used as input has therefore already been “prior” normalized.

Three functions are provided.

  • build_rotation_matrix: Creates a 3x3 rotation matrix for a given axis

  • rotate_skeleton: Randomly rotates a skeleton sequence around the X, Y and Z axes.

  • stretched_image_from_skeleton_sequence: Creates an RGB image from a skeleton sequence

Note that the rotation angles are drawn at random between bounds defined by hardcoded global variables in this file.

As additional guidance, we add the following information.

Kinect v2 coordinate system:
  • x : horizontal plane

  • y : height

  • z : depth

NTU RGB+D sequences are acquired from -45° to 45° on the x axis.

src.models.data_augmentation.build_rotation_matrix(axis, rot_angle)

Builds a rotation matrix for a given axis and angle.

Inputs:
  • axis (int): Axis of rotation (0: x, 1: y, 2: z)

  • rot_angle (float): Angle of rotation in degrees

Outputs:

rotation_matrix (np array): 3x3 rotation matrix
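
For reference, here is a minimal sketch of what such a function computes, assuming standard right-handed rotation matrices (the actual implementation may differ in conventions):

    import numpy as np

    def build_rotation_matrix_sketch(axis, rot_angle):
        """Illustrative only: 3x3 rotation matrix around one axis (0: x, 1: y, 2: z)."""
        theta = np.radians(rot_angle)  # rot_angle is given in degrees
        c, s = np.cos(theta), np.sin(theta)
        if axis == 0:    # rotation around x
            return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
        elif axis == 1:  # rotation around y
            return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
        else:            # rotation around z
            return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])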

src.models.data_augmentation.rotate_skeleton(skeleton)

Randomly rotates the skeleton sequence around its different axes.

Inputs:

skeleton (np array): Skeleton sequence of shape (3 {x, y, z}, max_frame, num_joint=25, n_subjects=2)

Outputs:

skeleton_aug (np array): Randomly rotated skeleton sequence of shape (3 {x, y, z}, max_frame, num_joint=25, n_subjects=2)
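
As an illustration of the augmentation, a hedged sketch reusing build_rotation_matrix_sketch from above; ANGLE_MIN and ANGLE_MAX are placeholders for the hardcoded global variables mentioned earlier:

    import numpy as np

    ANGLE_MIN, ANGLE_MAX = -10.0, 10.0  # placeholder bounds, not the real values

    def rotate_skeleton_sketch(skeleton):
        """Illustrative only: one random rotation per axis, applied to all frames."""
        rotation = np.eye(3)
        for axis in range(3):
            angle = np.random.uniform(ANGLE_MIN, ANGLE_MAX)
            rotation = build_rotation_matrix_sketch(axis, angle) @ rotation
        # skeleton: (3, max_frame, num_joint, n_subjects); rotate every coordinate
        return np.einsum('ij,jfks->ifks', rotation, skeleton)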

src.models.data_augmentation.stretched_image_from_skeleton_sequence(skeleton, c_min, c_max)

Creates a stretched RGB image from a skeleton sequence.

Inputs:
  • skeleton (np array): Skeleton sequence of shape (3 {x, y, z}, max_frame, num_joint=25, n_subjects=2)

  • c_min (int): Minimum coordinate value across all sequences, joints, subjects, frames after the prior normalization step.

  • c_max (int): Maximum coordinate value across all sequences, joints, subjects, frames after the prior normalization step.

Outputs:

skeleton_image (np array): RGB image of shape (3, 224, 224)
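
The exact image construction is implementation-specific; the following sketch assumes a simple min-max scaling of the coordinates to [0, 255], followed by stretching the (frames, joints) grid to 224x224 (the OpenCV resize is an assumption):

    import numpy as np
    import cv2  # assumed resampler; any image library would do

    def stretched_image_sketch(skeleton, c_min, c_max):
        """Illustrative only: (3, max_frame, num_joint, n_subjects) -> (3, 224, 224)."""
        coords = skeleton[..., 0]  # first subject only, for simplicity
        # Min-max scale into [0, 255] using the dataset-wide extrema
        scaled = 255.0 * (coords - c_min) / (c_max - c_min)
        # Treat (frames, joints) as spatial dimensions and stretch to 224x224
        image = cv2.resize(scaled.transpose(1, 2, 0), (224, 224),
                           interpolation=cv2.INTER_LINEAR)
        return image.transpose(2, 0, 1).astype(np.uint8)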

src.models.gen_data_loaders module

Used to create the training, validation and test PyTorch data loaders. All data loaders are created from the same custom PyTorch dataset template (h5_pytorch_dataset.py). A helper function creates 3 lists containing the sequences’ names for the 3 sets. These lists are used by the __getitem__ method of the datasets.

The provided functions are as follows:
  • gen_sets_lists: Creates lists with the sequences’ names of the train-val-test splits

  • create_data_loaders: Creates three data loaders corresponding to the train-val-test splits

We use 5% of the training set as our validation set.

Note that because we fix the seed, the set lists are consistent across runs. This is useful when studying the impact of a given hyperparameter, for example.

src.models.gen_data_loaders.create_data_loaders(data_path, evaluation_type, model_type, use_pose, use_ir, use_cropped_IR, batch_size, sub_sequence_length, augment_data)

Generates three PyTorch data loaders corresponding to the train-val-test splits.

Inputs:
  • data_path (str): Path containing the h5 files (default ./data/processed/).

  • evaluation_type (str): Benchmark evaluated. Either “cross_subject” or “cross_view”

  • model_type (str): “FUSION” only for now.

  • use_pose (bool): Include skeleton data

  • use_ir (bool): Include IR data

  • use_cropped_IR (bool): If True, use the cropped IR dataset

  • batch_size (int): Size of batch

  • sub_sequence_length (int): Number of frames to subsample from full IR sequences

  • augment_data (bool): Choose to augment data by geometric transformation (skeleton data) or horizontal flip (IR data)

Outputs:
  • training_generator (PyTorch data loader): Training PyTorch data loader

  • validation_generator (PyTorch data loader): Validation PyTorch data loader

  • testing_generator (PyTorch data loader): Testing PyTorch data loader
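
Example call, with placeholder values for the hyperparameters:

    from src.models.gen_data_loaders import create_data_loaders

    train_loader, val_loader, test_loader = create_data_loaders(
        data_path="./data/processed/",
        evaluation_type="cross_subject",
        model_type="FUSION",
        use_pose=True,
        use_ir=True,
        use_cropped_IR=True,
        batch_size=16,
        sub_sequence_length=20,
        augment_data=True,
    )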

src.models.gen_data_loaders.gen_sets_lists(data_path, evaluation_type)

Generates 3 lists containing the sequences’ names for the train-val-test splits.

Inputs:
  • data_path (str): Path containing the h5 files (default ./data/processed/). This folder should contain the samples_names.txt file containing all the samples’ names.

  • evaluation_type (str): Benchmark evaluated. Either “cross_subject” or “cross_view”

Outputs:
  • training_samples (list): All the training sequences’ names

  • validation_samples (list): All the validation sequences’ names

  • testing_samples (list): All the testing sequences’ names
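
Example call:

    from src.models.gen_data_loaders import gen_sets_lists

    train_names, val_names, test_names = gen_sets_lists(
        data_path="./data/processed/", evaluation_type="cross_view")
    print(len(train_names), len(val_names), len(test_names))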

src.models.h5_pytorch_dataset module

Custom PyTorch dataset that reads from the h5 datasets (see the src.data module for more information).

class src.models.h5_pytorch_dataset.TorchDataset(model_type, use_pose, use_ir, use_cropped_IR, data_path, sub_sequence_length, augment_data, samples_names)

Bases: torch.utils.data.dataset.Dataset

This custom PyTorch dataset lazily loads from the h5 datasets. This means that it does not load the entire dataset into memory, which would be impossible for the IR sequences. Instead, it opens and reads from the h5 file on each access. This is a bit slower, but very memory efficient. Additionally, the lost time is mitigated when using multiple workers for the data loaders.

Attributes:
  • data_path (str): Path containing the h5 files (default ./data/processed/).

  • model_type (str): “FUSION” only for now.

  • use_pose (bool): Include skeleton data

  • use_ir (bool): Include IR data

  • use_cropped_IR (bool): If True, use the cropped IR dataset

  • sub_sequence_length (int): Number of frames to subsample from full IR sequences

  • augment_data (bool): Choose to augment data by geometric transformation (skeleton data) or horizontal flip (IR data)

  • samples_names (list): Contains the sequences’ names of the dataset split (i.e. train, validation or test)

Methods:
  • __getitem__(index): Returns the processed sequence (skeleton and/or IR) and its label

  • __len__(): Returns the number of elements in dataset.
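
The lazy-loading pattern described above boils down to the following sketch (simplified to a single modality; the h5 layout and label parsing are assumptions based on the NTU naming scheme):

    import h5py
    import torch
    from torch.utils.data import Dataset

    class LazyH5DatasetSketch(Dataset):
        """Illustrative only: read one sequence per access instead of preloading."""

        def __init__(self, h5_path, samples_names):
            self.h5_path = h5_path
            self.samples_names = samples_names

        def __len__(self):
            return len(self.samples_names)

        def __getitem__(self, index):
            name = self.samples_names[index]
            # Only this sequence is read from disk; the dataset never sits in RAM
            with h5py.File(self.h5_path, "r") as f:
                sequence = f[name][()]  # assumed layout: one h5 dataset per sample
            label = int(name[-3:]) - 1  # NTU names end with the action id (e.g. A001)
            return torch.from_numpy(sequence), label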

src.models.plot_confusion_matrix module

Computes the confusion matrix for a trained model. Takes as input important parameters such as the benchmark studied. A confusion matrix in .png format is saved in the trained model folder provided.

Plotting the confusion matrix is best done using the provided Makefile.

>>> make confusion_matrix \
    PROCESSED_DATA_PATH=X \
    MODEL_FOLDER=X \
    MODEL_FILE=X \
    EVALUATION_TYPE=X \
    MODEL_TYPE=X \
    USE_POSE=X \
    USE_IR=X \
    USE_CROPPED_IR=X \
    BATCH_SIZE=X \
    SUB_SEQUENCE_LENGTH=X
With the parameters taking the following values:
  • PROCESSED_DATA_PATH:

    Path to h5 files. Default location is ./data/processed/

  • MODEL_FOLDER:

    Output path to save models and log files. A folder inside that path will be automatically created. Default location is ./models/

  • MODEL_FILE:

    Name of the model.

  • EVALUATION_TYPE:

    [cross_subject | cross_view]

  • MODEL_TYPE:

    [FUSION]

  • USE_POSE:

    [True, False]

  • USE_IR:

    [True, False]

  • USE_CROPPED_IR:

    [True, False]

  • BATCH_SIZE:

    Whole positive number above 1.

  • SUB_SEQUENCE_LENGTH:

    [1 .. 20] Specifies the number of frames to take from a complete IR sequence.

src.models.pose_ir_fusion module

Contains a PyTorch model fusing IR and pose data for improved classification. Also contains a helper function which normalizes pose and IR tensors.

class src.models.pose_ir_fusion.FUSION(use_pose, use_ir, pretrained)

Bases: torch.nn.modules.module.Module

This model is built on three submodules. The first, a “pose module”, takes a skeleton sequence mapped to an image and outputs a 512-long feature vector. The second, an “IR module”, takes an IR sequence and outputs a 512-long feature vector. The third, a “classification module”, concatenates the 2 feature vectors and predicts a class via an MLP. This model can achieve over 90% accuracy on both benchmarks of the NTU RGB+D (60) dataset.

Attributes:
  • use_pose (bool): Include skeleton data

  • use_ir (bool): Include IR data

  • pose_net (PyTorch model): Pretrained ResNet-18. Only exists if use_pose is True.

  • ir_net (PyTorch model): Pretrained R(2+1)D-18. Only exists if use_ir is True.

  • class_mlp (PyTorch model): Classification MLP. Input size is adjusted depending on the modules used: 512 if only one module is used, 1024 for two.

Methods:

forward(X): Forward step. X contains pose/IR data

forward(X)

Forward step of the FUSION model. Input X is a list of 2 tensors containing pose and IR data. The input is already normalized, as specified in the PyTorch pretrained vision models documentation, using the prime_X_fusion function. Each tensor is then passed to its corresponding module. The 2 feature vectors are concatenated, then fed to the classification module (MLP), which outputs a prediction.

Inputs:
X (list of PyTorch tensors): Contains the following tensors:
  • X_skeleton (PyTorch tensor): pose images of shape (batch_size, 3, 224, 224) if use_pose is True. Else, tensor = None.

  • X_ir (PyTorch tensor): IR sequences of shape (batch_size, 3, seq_len, 112, 112) if use_ir is True. Else, tensor = None

Outputs:

pred (PyTorch tensor): Contains the log-Softmax normalized predictions of shape (batch_size, n_classes=60)
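
A condensed sketch of this fusion logic, with both modules enabled (the MLP hidden size is an assumption; see the class_mlp attribute above for the input sizes):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusionSketch(nn.Module):
        """Illustrative only: concatenate pose and IR features, classify with an MLP."""

        def __init__(self, pose_net, ir_net, n_classes=60):
            super().__init__()
            self.pose_net = pose_net  # e.g. ResNet-18 trunk -> 512-long features
            self.ir_net = ir_net      # e.g. R(2+1)D-18 trunk -> 512-long features
            self.class_mlp = nn.Sequential(
                nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, n_classes))

        def forward(self, X):
            X_skeleton, X_ir = X
            features = torch.cat(
                [self.pose_net(X_skeleton), self.ir_net(X_ir)], dim=1)
            return F.log_softmax(self.class_mlp(features), dim=1)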

src.models.pose_ir_fusion.prime_X_fusion(X, use_pose, use_ir)

Normalizes X (list of tensors) as defined in the pretrained Torchvision models documentation. Note that X_ir is reshaped in this function.

Inputs:
  • X (list of PyTorch tensors): Contains the following tensors:
    • X_skeleton (PyTorch tensor): pose images of shape (batch_size, 3, 224, 224) if use_pose is True. Else, tensor = -1.

    • X_ir (PyTorch tensor): IR sequences of shape (batch_size, seq_len, 3, 112, 112) if use_ir is True. Else, tensor = -1

  • use_pose (bool): Include skeleton data

  • use_ir (bool): Include IR data

Outputs:
X (list of PyTorch tensors): Contains the following tensors:
  • X_skeleton (PyTorch tensor): pose images of shape (batch_size, 3, 224, 224) if use_pose is True. Else, tensor = None.

  • X_ir (PyTorch tensor): IR sequences of shape (batch_size, 3, seq_len, 112, 112) if use_ir is True. Else, tensor = None
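
A sketch of the normalization, using the statistics published in the Torchvision documentation (ImageNet for image models, Kinetics-400 for video models); the constants actually used by the project may differ:

    import torch

    # Published Torchvision statistics; treat them as assumptions here
    IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    KINETICS_MEAN = torch.tensor([0.43216, 0.394666, 0.37645]).view(1, 3, 1, 1, 1)
    KINETICS_STD = torch.tensor([0.22803, 0.22145, 0.216989]).view(1, 3, 1, 1, 1)

    def prime_X_fusion_sketch(X, use_pose, use_ir):
        """Illustrative only: normalize pose/IR tensors, put IR channels first."""
        X_skeleton, X_ir = X
        if use_pose:
            X_skeleton = (X_skeleton - IMAGENET_MEAN) / IMAGENET_STD
        if use_ir:
            # (batch, seq_len, 3, 112, 112) -> (batch, 3, seq_len, 112, 112)
            X_ir = X_ir.permute(0, 2, 1, 3, 4)
            X_ir = (X_ir - KINETICS_MEAN) / KINETICS_STD
        return [X_skeleton, X_ir]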

src.models.torchvision_models module

This module is a copy of code from a more recent Torchvision release. It is included because we use an older version of Torchvision, the latest available on our cluster. We will update it in the future.

src.models.torchvision_models.r3d_18(pretrained=False, progress=True, **kwargs)

Constructs an 18-layer ResNet3D model as in https://arxiv.org/abs/1711.11248

Args:

pretrained (bool): If True, returns a model pre-trained on Kinetics-400

progress (bool): If True, displays a progress bar of the download to stderr

Returns:

nn.Module: R3D-18 network

src.models.torchvision_models.mc3_18(pretrained=False, progress=True, **kwargs)

Constructs an 18-layer Mixed Convolution network as in https://arxiv.org/abs/1711.11248

Args:

pretrained (bool): If True, returns a model pre-trained on Kinetics-400

progress (bool): If True, displays a progress bar of the download to stderr

Returns:

nn.Module: MC3-18 network

src.models.torchvision_models.r2plus1d_18(pretrained=False, progress=True, **kwargs)

Constructs an 18-layer R(2+1)D network as in https://arxiv.org/abs/1711.11248

Args:

pretrained (bool): If True, returns a model pre-trained on Kinetics-400

progress (bool): If True, displays a progress bar of the download to stderr

Returns:

nn.Module: R(2+1)D-18 network
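
Brief usage example; the input follows the (batch, channels, frames, height, width) convention of these video models:

    import torch
    from src.models.torchvision_models import r2plus1d_18

    model = r2plus1d_18(pretrained=False)  # True downloads Kinetics-400 weights
    clip = torch.randn(2, 3, 16, 112, 112)  # (batch, channels, frames, H, W)
    out = model(clip)
    print(out.shape)  # (2, 400) with the default Kinetics-400 head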

src.models.train_model module

The main file for the src.models module. Takes as input the different hyperparameters and starts training the model. The model is saved after every epoch. A batch_log.txt file keeps a record of the accuracy and loss of each batch. A log.txt file keeps a record of the accuracy of the train-val-test sets after each epoch.

Note that although we evaluate the test set at each epoch, we take the final decision based on the validation set only.

Training is best launched using the provided Makefile.

>>> make train \
    PROCESSED_DATA_PATH=X \
    MODEL_FOLDER=X \
    EVALUATION_TYPE=X \
    MODEL_TYPE=X \
    USE_POSE=X \
    USE_IR=X \
    PRETRAINED=X \
    USE_CROPPED_IR=X \
    LEARNING_RATE=X \
    WEIGHT_DECAY=X \
    GRADIENT_THRESHOLD=X \
    EPOCHS=X \
    BATCH_SIZE=X \
    ACCUMULATION_STEPS=X \
    SUB_SEQUENCE_LENGTH=X \
    AUGMENT_DATA=X \
    EVALUATE_TEST=X \
    SEED=X
With the parameters taking the following values:
  • PROCESSED_DATA_PATH:

    Path to h5 files. Default location is ./data/processed/

  • MODEL_FOLDER:

    Output path to save models and log files. A folder inside that path will be automatically created. Default location is ./models/

  • EVALUATION_TYPE:

    [cross_subject | cross_view]

  • MODEL_TYPE:

    [FUSION]

  • USE_POSE:

    [True, False]

  • USE_IR:

    [True, False]

  • PRETRAINED:

    [True, False]

  • USE_CROPPED_IR:

    [True, False]

  • LEARNING_RATE:

    Real positive number.

  • WEIGHT_DECAY:

    Real positive number. If 0, then no weight decay is applied.

  • EPOCHS:

    Whole positive number above 1.

  • BATCH_SIZE:

    Whole positive number above 1.

  • GRADIENT_THRESHOLD:

    Real positive number. If 0, then no threshold is applied

  • ACCUMULATION_STEPS:

    Accumulate gradient across batches. This is a trick to virtually train larger batches on modest architectures.

  • SUB_SEQUENCE_LENGTH:

    [1 .. 20] Specifies the number of frames to take from a complete IR sequence.

  • AUGMENT_DATA

    [True, False]

  • EVALUATE_TEST

    [True, False]

  • SEED

    Positive whole number. Used to make training replicable.

src.models.train_utils module

Contains helper functions to train a network, evaluate its accuracy score, and plot a confusion matrix.

The following functions are provided:
  • plot_confusion_matrix: Given a prediction and a ground truth vector, returns a plot of the confusion matrix.

  • calculate_accuracy: Calculates accuracy score between 2 PyTorch tensors

  • evaluate_set: Computes accuracy for a given set (train-val-test)

  • train_model: Trains a model with the given hyperparameters.

src.models.train_utils.calculate_accuracy(Y_hat, Y)

Calculates the accuracy score for a prediction tensor given its ground truth.

Inputs:
  • Y_hat (PyTorch tensor): Predictions scores (Softmax/log-Softmax) of shape (batch_size, n_classes)

  • Y (PyTorch tensor): Ground truth vector of shape (batch_size, n_classes)

Outputs:
  • accuracy (int): Accuracy score

  • Y_hat (np array): Numpy version of Y_hat of shape (batch_size, n_classes)

  • Y (np array): Numpy version of Y of shape (batch_size, n_classes)
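
A minimal sketch of the computation, assuming one-hot ground truth vectors as implied by the shapes above:

    import torch

    def calculate_accuracy_sketch(Y_hat, Y):
        """Illustrative only: fraction of samples whose argmax prediction matches."""
        predicted = Y_hat.argmax(dim=1)
        target = Y.argmax(dim=1)  # assumes one-hot ground truth
        accuracy = (predicted == target).float().mean().item()
        return accuracy, Y_hat.detach().cpu().numpy(), Y.detach().cpu().numpy()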

src.models.train_utils.evaluate_set(model, model_type, data_loader, output_folder, set_name)

Calculates the accuracy score over a given set (train-val-test) and returns two vectors with all predictions and all ground truths.

Inputs:
  • model (PyTorch model): Evaluated PyTorch model.

  • model_type (str): “FUSION” only for now.

  • data_loader (PyTorch data loader): Data loader of evaluated set

  • output_folder (str): Path of output folder

  • set_name (str): Name of the evaluated set [i.e. “TRAIN” | “VAL” | “TEST”]

Outputs:
  • accuracy (int): Accuracy over set

  • y_true (list of np arrays): Lists of all ground truths vectors. Each index of the list yields the ground truths for a given batch.

  • y_pred (list of np arrays): Lists of all predictions vectors. Each index of the list yields the predictions for a given batch.

src.models.train_utils.plot_confusion_matrix(y_true, y_pred, classes, normalize=False, title=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>)

This function is taken from the scikit-learn website and slightly modified. Given a prediction vector, a ground truth vector and a list of class names, it returns a confusion matrix plot.

Inputs:
  • y_true (np.int32 array): 1D array of ground truths

  • y_pred (np.int32 array): 1D array of predictions

  • classes (list): List of action names

  • normalize (bool): Use percentages instead of totals

  • title (str): Title of the plot

  • cmap (matplotlib cmap): Plot color style

Outputs:

ax (matplotlib plot): Confusion matrix plot
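
Example call, with hypothetical class names (the real class list covers all 60 NTU actions):

    import numpy as np
    from src.models.train_utils import plot_confusion_matrix

    y_true = np.array([0, 1, 1, 2], dtype=np.int32)
    y_pred = np.array([0, 1, 2, 2], dtype=np.int32)
    ax = plot_confusion_matrix(y_true, y_pred,
                               classes=np.array(["drink", "eat", "brush"]),
                               normalize=True, title="Confusion matrix")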

src.models.train_utils.train_model(model, model_type, optimizer, learning_rate, weight_decay, gradient_threshold, epochs, accumulation_steps, evaluate_test, output_folder, train_generator, test_generator, validation_generator=None)

Trains a model in a batched fashion. At each epoch, the entire training set is traversed, then the validation and test sets are evaluated. Note that we only use the validation set to select which model to keep. The files log.txt and batch_log.txt are used to debug and record training progress.

Inputs:
  • model (PyTorch model): Model to train.

  • model_type (str): “FUSION” only for now.

  • optimizer (str): Name of the optimizer to use (“ADAM” or “SGD” only for now)

  • learning_rate (float): Learning rate

  • weight_decay (float): Weight decay

  • gradient_threshold (float): Clip gradient by this value. If 0, no threshold is applied.

  • epochs (int): Number of epochs to train.

  • accumulation_steps (int): Accumulate gradient across batches. This is a trick to virtually train larger batches on modest architectures.

  • evaluate_test (bool): Choose to evaluate test set or not at each epoch.

  • output_folder (str): Entire path in which log files and models are saved. By default: ./models/automatically_created_folder/

  • train_generator (PyTorch data loader): Training set data loader

  • validation_generator (PyTorch data loader): Validation set data loader

  • test_generator (PyTorch data loader): Test set data loader
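
The accumulation_steps trick mentioned above reduces to the following pattern (a minimal sketch assuming integer class labels; the real loop also handles logging, evaluation and model saving):

    import torch.nn.functional as F

    def train_one_epoch_sketch(model, optimizer, train_generator, accumulation_steps):
        """Illustrative only: step the optimizer every accumulation_steps batches."""
        model.train()
        optimizer.zero_grad()
        for batch_idx, (X, Y) in enumerate(train_generator):
            loss = F.nll_loss(model(X), Y)  # model outputs log-Softmax scores
            (loss / accumulation_steps).backward()  # scale so gradients average
            if (batch_idx + 1) % accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()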

src.models.utils module

src.models.utils.set_parameter_requires_grad(model, feature_extracting)

Sets the model to feature extraction mode or not. If feature_extracting is True, the model’s gradients are frozen. Else, the gradients are activated.

Inputs:
  • model (PyTorch model): Model to set

  • feature_extracting (bool): If true, freezes model gradients. If not, activates model gradients.
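
This matches the common Torchvision fine-tuning idiom; a minimal sketch:

    def set_parameter_requires_grad_sketch(model, feature_extracting):
        """Illustrative only: freeze every gradient when feature extracting."""
        for param in model.parameters():
            param.requires_grad = not feature_extracting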

Module contents