DataJoint U24 - Workflow DeepLabCut¶
Interactively run the workflow¶
The workflow requires a DeepLabCut project with labeled data.
- If you don't have data, refer to 00-DataDownload and 01-Configure.
- For an overview of the schema, refer to 02-WorkflowStructure.
- For a more automated approach, refer to 03-Automate.
Let's change the directory to load the local config, dj_local_conf.json.
import os
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='workflow-deeplabcut', ("Please move to the "
+ "workflow directory")
Pipeline.py activates the DataJoint Elements and declares other required tables.
import datajoint as dj
from workflow_deeplabcut.pipeline import lab, subject, session, train, model
# Directing our pipeline to the appropriate config location
from element_interface.utils import find_full_path
from workflow_deeplabcut.paths import get_dlc_root_data_dir
config_path = find_full_path(get_dlc_root_data_dir(),
                             'from_top_tracking/config.yaml')
Connecting cbroz@dss-db.datajoint.io:3306
Manually Inserting Entries¶
Upstream tables¶
We can insert entries into dj.Manual
tables (green in diagrams) by providing values as a dictionary or a list of dictionaries.
session.Session.heading
subject              : varchar(32)   # 
session_datetime     : datetime(3)   # 
subject.Subject.insert1(dict(subject='subject6',
                             sex='F',
                             subject_birth_date='2020-01-01',
                             subject_description='hneih_E105'))
session_keys = [dict(subject='subject6', session_datetime='2021-06-02 14:04:22'),
                dict(subject='subject6', session_datetime='2021-06-03 14:43:10')]
session.Session.insert(session_keys)
We can look at the contents of this table and restrict by a value.
session.Session() & "session_datetime > '2021-06-01 12:00:00'" & "subject='subject6'"
subject | session_datetime |
---|---|
subject6 | 2021-06-02 14:04:22 |
subject6 | 2021-06-03 14:43:10 |
Total: 2
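Restriction queries compose with fetch. As a quick sketch of standard DataJoint fetch usage, we could pull the restricted entries as dictionaries, or just their primary keys:
restricted = session.Session() & "subject='subject6'"
session_dicts = restricted.fetch(as_dict=True)  # full rows as a list of dictionaries
restricted_keys = restricted.fetch('KEY')       # primary keys only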
DeepLabCut Tables¶
The VideoSet table in the train schema retains records of files generated in the video labeling process (e.g., h5, csv, png). DeepLabCut will refer to the mat file located under the training-datasets directory.
We recommend storing all paths relative to the root directory in your config.
train.VideoSet.insert1({'video_set_id': 0})
project_folder = 'from_top_tracking/'
training_files = ['labeled-data/train1/CollectedData_DJ.h5',
                  'labeled-data/train1/CollectedData_DJ.csv',
                  'labeled-data/train1/img00674.png',
                  'videos/train1.mp4']
for idx, filename in enumerate(training_files):
    train.VideoSet.File.insert1({'video_set_id': 0,
                                 'file_id': idx,
                                 'file_path': (project_folder + filename)})
train.VideoSet.File()
video_set_id | file_id | file_path |
---|---|---|
0 | 0 | from_top_tracking/labeled-data/train1/CollectedData_DJ.h5 |
0 | 1 | from_top_tracking/labeled-data/train1/CollectedData_DJ.csv |
0 | 2 | from_top_tracking/labeled-data/train1/img00674.png |
0 | 3 | from_top_tracking/videos/train1.mp4 |
Total: 4
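Since these paths are stored relative to the root, we can check that each resolves on disk. A minimal sketch using the find_full_path helper imported above, which raises an error when a file cannot be found under any configured root:
# Verify that each stored relative path resolves under a configured root
for file_path in (train.VideoSet.File & {'video_set_id': 0}).fetch('file_path'):
    print(find_full_path(get_dlc_root_data_dir(), file_path))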
Training a Network¶
First, we'll add a TrainingParamSet. This is a lookup table that we can reference when training a model.
train.TrainingParamSet.heading
paramset_idx         : smallint      # 
---
paramset_desc        : varchar(128)  # 
param_set_hash       : uuid          # hash identifying this parameterset
params               : longblob      # dictionary of all applicable parameters
The params longblob should be a dictionary that captures all items for DeepLabCut's train_network function. At minimum, this includes the contents of the project's config file, as well as shuffle and trainingsetindex, which are not included in the config.
from deeplabcut import train_network
help(train_network) # for more information on optional parameters
Loading DLC 2.2.1.1...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)
Help on function train_network in module deeplabcut.pose_estimation_tensorflow.training:

train_network(config, shuffle=1, trainingsetindex=0, max_snapshots_to_keep=5, displayiters=None, saveiters=None, maxiters=None, allow_growth=True, gputouse=None, autotune=False, keepdeconvweights=True, modelprefix='')
    Trains the network with the labels in the training dataset.

    Parameters
    ----------
    config : string
        Full path of the config.yaml file as a string.

    shuffle: int, optional, default=1
        Integer value specifying the shuffle index to select for training.

    trainingsetindex: int, optional, default=0
        Integer specifying which TrainingsetFraction to use.
        Note that TrainingFraction is a list in config.yaml.

    max_snapshots_to_keep: int or None
        Sets how many snapshots are kept, i.e. states of the trained network.
        Every saving iteration many times a snapshot is stored, however only the
        last ``max_snapshots_to_keep`` many are kept! If you change this to None,
        then all are kept.
        See: https://github.com/DeepLabCut/DeepLabCut/issues/8#issuecomment-387404835

    displayiters: optional, default=None
        This variable is actually set in ``pose_config.yaml``. However, you can
        overwrite it with this hack. Don't use this regularly, just if you are
        too lazy to dig out the ``pose_config.yaml`` file for the corresponding
        project. If ``None``, the value from there is used, otherwise it is
        overwritten!

    saveiters: optional, default=None
        This variable is actually set in ``pose_config.yaml``. However, you can
        overwrite it with this hack. Don't use this regularly, just if you are
        too lazy to dig out the ``pose_config.yaml`` file for the corresponding
        project. If ``None``, the value from there is used, otherwise it is
        overwritten!

    maxiters: optional, default=None
        This variable is actually set in ``pose_config.yaml``. However, you can
        overwrite it with this hack. Don't use this regularly, just if you are
        too lazy to dig out the ``pose_config.yaml`` file for the corresponding
        project. If ``None``, the value from there is used, otherwise it is
        overwritten!

    allow_growth: bool, optional, default=True.
        For some smaller GPUs the memory issues happen. If ``True``, the memory
        allocator does not pre-allocate the entire specified GPU memory region,
        instead starting small and growing as needed.
        See issue: https://forum.image.sc/t/how-to-stop-running-out-of-vram/30551/2

    gputouse: optional, default=None
        Natural number indicating the number of your GPU (see number in nvidia-smi).
        If you do not have a GPU put None.
        See: https://nvidia.custhelp.com/app/answers/detail/a_id/3751/~/useful-nvidia-smi-queries

    autotune: bool, optional, default=False
        Property of TensorFlow, somehow faster if ``False``
        (as Eldar found out, see https://github.com/tensorflow/tensorflow/issues/13317).

    keepdeconvweights: bool, optional, default=True
        Also restores the weights of the deconvolution layers (and the backbone)
        when training from a snapshot. Note that if you change the number of
        bodyparts, you need to set this to false for re-training.

    modelprefix: str, optional, default=""
        Directory containing the deeplabcut models to use when evaluating the
        network. By default, the models are assumed to exist in the project folder.

    Returns
    -------
    None

    Examples
    --------
    To train the network for first shuffle of the training dataset
    >>> deeplabcut.train_network('/analysis/project/reaching-task/config.yaml')

    To train the network for second shuffle of the training dataset
    >>> deeplabcut.train_network(
            '/analysis/project/reaching-task/config.yaml',
            shuffle=2,
            keepdeconvweights=True,
        )
Here, we provide these items, load the config contents, and overwrite some defaults, including maxiters, to restrict training to 5 iterations.
import yaml

paramset_idx = 0
paramset_desc = 'from_top_tracking'

with open(config_path, 'rb') as y:
    config_params = yaml.safe_load(y)
training_params = {'shuffle': '1',
                   'trainingsetindex': '0',
                   'maxiters': '5',
                   'scorer_legacy': 'False',
                   'multianimalproject': 'False'}
config_params.update(training_params)
train.TrainingParamSet.insert_new_params(paramset_idx=paramset_idx,
                                         paramset_desc=paramset_desc,
                                         params=config_params)
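To confirm what was stored, we can fetch the params longblob back with standard DataJoint fetch and inspect an entry, such as the reduced maxiters:
stored_params = (train.TrainingParamSet & {'paramset_idx': 0}).fetch1('params')
print(stored_params['maxiters'])  # '5', as set above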
Now, we add a TrainingTask. As a Computed table, ModelTraining will reference this to start training when we call populate().
train.TrainingTask.heading
video_set_id         : int           # 
paramset_idx         : smallint      # 
training_id          : int           # 
---
model_prefix=""      : varchar(32)   # 
project_path=""      : varchar(255)  # DLC's project_path in config relative to root
key = {'video_set_id': 0,
       'paramset_idx': 0,
       'training_id': 1,
       'project_path': 'from_top_tracking/'}
train.TrainingTask.insert1(key, skip_duplicates=True)
train.TrainingTask()
video_set_id | paramset_idx | training_id | model_prefix | project_path DLC's project_path in config relative to root |
---|---|---|---|---|
0 | 0 | 1 |  | from_top_tracking/ |
Total: 1
train.ModelTraining.populate()
(Output cleared for brevity)
The network is now trained and ready to evaluate. Use the function 'evaluate_network' to evaluate the network.
train.ModelTraining()
video_set_id | paramset_idx | training_id | latest_snapshot latest exact snapshot index (i.e., never -1) | config_template stored full config file |
---|---|---|---|---|
0 | 0 | 1 | 5 | =BLOB= |
Total: 1
To resume training from a checkpoint, we would need to edit the relevant config file (see also update_pose_cfg in workflow_deeplabcut.load_demo_data). Empirical work suggests 200k iterations for any true use case.
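As a rough sketch of such an edit (not the workflow's update_pose_cfg helper; the paths and snapshot below are hypothetical), one could point the shuffle's pose_cfg.yaml at an existing snapshot before re-running training:
import yaml

# Hypothetical paths -- adjust to your project's dlc-models directory
pose_cfg_path = ('from_top_tracking/dlc-models/iteration-0/'
                 'from_top_trackingFeb23-trainset95shuffle1/train/pose_cfg.yaml')
with open(pose_cfg_path) as f:
    pose_cfg = yaml.safe_load(f)
# Resume from this checkpoint (hypothetical snapshot prefix)
pose_cfg['init_weights'] = pose_cfg_path.replace('pose_cfg.yaml', 'snapshot-5')
with open(pose_cfg_path, 'w') as f:
    yaml.safe_dump(pose_cfg, f)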
For better quality predictions in this demo, we'll revert the checkpoint file and use a pretrained model.
from workflow_deeplabcut.load_demo_data import revert_checkpoint_file
revert_checkpoint_file()
Tracking Joints/Body Parts¶
The model schema uses a lookup table for managing the body parts tracked across models.
model.BodyPart.heading
body_part            : varchar(32)   # 
---
body_part_description="" : varchar(1000) # 
Helper functions allow us to first identify all new body parts from a given config, and then insert them with user-friendly descriptions.
model.BodyPart.extract_new_body_parts(config_path)
Existing body parts: ['bodycenter' 'head' 'tailbase']
New body parts: []
array([], dtype='<U10')
bp_desc=['Body Center', 'Head', 'Base of Tail']
model.BodyPart.insert_from_config(config_path,bp_desc)
Existing body parts: []
New body parts: ['bodycenter' 'head' 'tailbase']
New descriptions: ['Body Center', 'Head', 'Base of Tail']
Declaring/Evaluating a Model¶
We can insert into the Model table for automatic evaluation.
model.Model.insert_new_model(model_name='FromTop-latest', dlc_config=config_path,
                             shuffle=1, trainingsetindex=0,
                             model_description='FromTop - latest snapshot',
                             paramset_idx=0,
                             params={"snapshotindex": -1})
--- DLC Model specification to be inserted ---
model_name: FromTop-latest
model_description: FromTop - latest snapshot
scorer: DLCmobnet100fromtoptrackingFeb23shuffle1
task: from_top_tracking
date: Feb23
iteration: 0
snapshotindex: -1
shuffle: 1
trainingsetindex: 0
project_path: from_top_tracking
paramset_idx: 0

-- Template/Contents of config.yaml --
Task: from_top_tracking
scorer: DJ
date: Feb23
video_sets: {'/tmp/test_data/from_top_tracking/videos/train1.mp4': {'crop': '0, 500, 0, 500'}}
bodyparts: ['head', 'bodycenter', 'tailbase']
start: 0
stop: 1
numframes2pick: 20
pcutoff: 0.6
dotsize: 3
alphavalue: 0.7
colormap: viridis
TrainingFraction: [0.95]
iteration: 0
default_net_type: resnet_50
snapshotindex: -1
batch_size: 8
cropping: False
x1: 0
x2: 640
y1: 277
y2: 624
corner2move2: [50, 50]
move2corner: True
croppedtraining: None
default_augmenter: default
identity: None
maxiters: 5
modelprefix: 
multianimalproject: False
scorer_legacy: False
shuffle: 1
skeleton: [['bodypart1', 'bodypart2'], ['objectA', 'bodypart3']]
skeleton_color: black
train_fraction: 0.95
trainingsetindex: 0
project_path: /tmp/test_data/from_top_tracking
model.Model()
model_name User-friendly model name | task Task in the config yaml | date Date in the config yaml | iteration Iteration/version of this model | snapshotindex which snapshot for prediction (if -1, latest) | shuffle Shuffle (1) or not (0) | trainingsetindex Index of training fraction list in config.yaml | scorer Scorer/network name - DLC's GetScorerName() | config_template Dictionary of the config for analyze_videos() | project_path DLC's project_path in config relative to root | model_prefix | model_description | paramset_idx |
---|---|---|---|---|---|---|---|---|---|---|---|---|
FromTop-latest | from_top_tracking | Feb23 | 0 | -1 | 1 | 0 | DLCmobnet100fromtoptrackingFeb23shuffle1 | =BLOB= | from_top_tracking | FromTop - latest snapshot | 0 |
Total: 1
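To inspect what was stored, we can fetch the config template back from the table; a quick sketch using standard DataJoint fetch:
stored_config = (model.Model & {'model_name': 'FromTop-latest'}).fetch1('config_template')
print(stored_config['snapshotindex'])  # -1 selects the latest snapshot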
When we call its populate method, ModelEvaluation will reference the Model and insert the output of DeepLabCut's evaluate_network function.
model.ModelEvaluation.heading
model_name           : varchar(64)   # user-friendly model name
---
train_iterations     : int           # Training iterations
train_error=null     : float         # Train error (px)
test_error=null      : float         # Test error (px)
p_cutoff=null        : float         # p-cutoff used
train_error_p=null   : float         # Train error with p-cutoff
test_error_p=null    : float         # Test error with p-cutoff
model.ModelEvaluation.populate()
Config: {'all_joints': [[0], [1], [2]],
 'all_joints_names': ['head', 'bodycenter', 'tailbase'],
 'batch_size': 1,
 'crop_pad': 0,
 'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_from_top_trackingFeb23/from_top_tracking_DJ95shuffle1.mat',
 'dataset_type': 'imgaug',
 'deterministic': False,
 'fg_fraction': 0.25,
 'global_scale': 0.8,
 'init_weights': '/Volumes/GoogleDrive/My Drive/ref/DeepLabCut/deeplabcut/pose_estimation_tensorflow/models/pretrained/mobilenet_v2_1.0_224.ckpt',
 'intermediate_supervision': False,
 'intermediate_supervision_layer': 12,
 'location_refinement': True,
 'locref_huber_loss': True,
 'locref_loss_weight': 1.0,
 'locref_stdev': 7.2801,
 'log_dir': 'log',
 'mean_pixel': [123.68, 116.779, 103.939],
 'mirror': False,
 'net_type': 'mobilenet_v2_1.0',
 'num_joints': 3,
 'optimizer': 'sgd',
 'pairwise_huber_loss': True,
 'pairwise_predict': False,
 'partaffinityfield_predict': False,
 'regularize': False,
 'scoremap_dir': 'test',
 'shuffle': True,
 'snapshot_prefix': '/tmp/test_data/from_top_tracking/dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/test/snapshot',
 'stride': 8.0,
 'weigh_negatives': False,
 'weigh_only_present_joints': False,
 'weigh_part_predictions': False,
 'weight_decay': 0.0001}
/Users/cb/miniconda3/envs/ele/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py:1694: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  warnings.warn('`layer.apply` is deprecated and '
Running DLC_mobnet_100_from_top_trackingFeb23shuffle1_103000 with # of training iterations: 103000
Running evaluation ...
20it [00:06, 3.29it/s]
Analysis is done and the results are stored (see evaluation-results) for snapshot: snapshot-103000
Results for 103000 training iterations: 95 1 train error: 9.28 pixels. Test error: 9.84 pixels.
With pcutoff of 0.6 train error: 9.28 pixels. Test error: 9.84 pixels
Thereby, the errors are given by the average distances between the labels by DLC and the scorer.
The network is evaluated and the results are stored in the subdirectory 'evaluation_results'.
Please check the results, then choose the best model (snapshot) for prediction. You can update the config.yaml file with the appropriate index for the 'snapshotindex'. Use the function 'analyze_video' to make predictions on new videos. Otherwise, consider adding more labeled-data and retraining the network (see DeepLabCut workflow Fig 2, Nath 2019)
model.ModelEvaluation()
model_name User-friendly model name | train_iterations Training iterations | train_error Train error (px) | test_error Test error (px) | p_cutoff p-cutoff used | train_error_p Train error with p-cutoff | test_error_p Test error with p-cutoff |
---|---|---|---|---|---|---|
FromTop-latest | 103000 | 9.28 | 9.84 | 0.6 | 9.28 | 9.84 |
Total: 1
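For comparing across models downstream, the evaluation results can also be fetched as a pandas DataFrame; a brief sketch using DataJoint's frame format:
eval_df = model.ModelEvaluation.fetch(format='frame')
print(eval_df[['train_error', 'test_error']])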
Pose Estimation¶
To use our model, we'll first need to insert a session recording into the VideoRecording table.
model.VideoRecording()
subject | session_datetime | recording_id | device |
---|---|---|---|
Total: 0
key = {'subject': 'subject6',
       'session_datetime': '2021-06-02 14:04:22',
       'recording_id': '1', 'device': 'Camera1'}
model.VideoRecording.insert1(key)

_ = key.pop('device')  # get rid of secondary key from master table
key.update({'file_id': 1,
            'file_path': 'from_top_tracking/videos/test-2s.mp4'})
model.VideoRecording.File.insert1(key)
model.VideoRecording.File()
subject | session_datetime | recording_id | file_id | file_path filepath of video, relative to root data directory |
---|---|---|---|---|
subject6 | 2021-06-02 14:04:22 | 1 | 1 | from_top_tracking/videos/test-2s.mp4 |
Total: 1
RecordingInfo automatically populates with file information.
model.RecordingInfo.populate()
model.RecordingInfo()
subject | session_datetime | recording_id | px_height height in pixels | px_width width in pixels | nframes number of frames | fps (Hz) frames per second | recording_datetime Datetime for the start of the recording | recording_duration video duration (s) from nframes / fps |
---|---|---|---|---|---|---|---|---|
subject6 | 2021-06-02 14:04:22 | 1 | 500 | 500 | 123 | 60 | None | 2.05 |
Total: 1
Next, we specify whether the PoseEstimation table should load results from an existing file or trigger the estimation command. Here, we can also pass parameters for DeepLabCut's analyze_videos as a dictionary.
key = (model.VideoRecording & {'recording_id': '1'}).fetch1('KEY')
key.update({'model_name': 'FromTop-latest', 'task_mode': 'trigger'})
key
{'subject': 'subject6', 'session_datetime': datetime.datetime(2021, 6, 2, 14, 4, 22), 'recording_id': 1, 'model_name': 'FromTop-latest', 'task_mode': 'trigger'}
model.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True})
model.PoseEstimation.populate()
Config: {'all_joints': [[0], [1], [2]],
 'all_joints_names': ['head', 'bodycenter', 'tailbase'],
 'batch_size': 1,
 'crop_pad': 0,
 'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_from_top_trackingFeb23/from_top_tracking_DJ95shuffle1.mat',
 'dataset_type': 'imgaug',
 'deterministic': False,
 'fg_fraction': 0.25,
 'global_scale': 0.8,
 'init_weights': '/Volumes/GoogleDrive/My Drive/ref/DeepLabCut/deeplabcut/pose_estimation_tensorflow/models/pretrained/mobilenet_v2_1.0_224.ckpt',
 'intermediate_supervision': False,
 'intermediate_supervision_layer': 12,
 'location_refinement': True,
 'locref_huber_loss': True,
 'locref_loss_weight': 1.0,
 'locref_stdev': 7.2801,
 'log_dir': 'log',
 'mean_pixel': [123.68, 116.779, 103.939],
 'mirror': False,
 'net_type': 'mobilenet_v2_1.0',
 'num_joints': 3,
 'optimizer': 'sgd',
 'pairwise_huber_loss': True,
 'pairwise_predict': False,
 'partaffinityfield_predict': False,
 'regularize': False,
 'scoremap_dir': 'test',
 'shuffle': True,
 'snapshot_prefix': '/tmp/test_data/from_top_tracking/dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/test/snapshot',
 'stride': 8.0,
 'weigh_negatives': False,
 'weigh_only_present_joints': False,
 'weigh_part_predictions': False,
 'weight_decay': 0.0001}
/Users/cb/miniconda3/envs/ele/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py:1694: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
  warnings.warn('`layer.apply` is deprecated and '
Using snapshot-103000 for model /tmp/test_data/from_top_tracking/dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1
Starting to analyze % /tmp/test_data/from_top_tracking/videos/test-2s.mp4
Loading /tmp/test_data/from_top_tracking/videos/test-2s.mp4
Duration of video [s]: 2.05, recorded with 60.0 fps!
Overall # of frames: 123 found with (before cropping) frame dimensions: 500 500
Starting to extract posture
98%|█████████▊| 120/123 [00:37<00:00, 3.22it/s]
Saving results in /tmp/test_data/from_top_tracking/videos/device_Camera1_recording_1_model_FromTop-latest...
Saving csv poses!
The videos are analyzed. Now your research can truly start!
You can create labeled videos with 'create_labeled_video'
If the tracking is not satisfactory for some videos, consider expanding the training set. You can use the function 'extract_outlier_frames' to extract a few representative outlier frames.
By default, DataJoint will store results in the subdirectory <processed_dir> / videos / device_<name>_recording_<#>_model_<name>, where processed_dir is optionally specified in the DataJoint config. If unspecified, this will be the project directory. The device and model names are specified elsewhere in the schema.
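For orientation, here is a minimal sketch of assembling that directory by hand, assuming the values used in this demo (no processed_dir set, so the project directory applies, with the device and model names inserted above):
from pathlib import Path

# Assumed values for this demo; the naming pattern follows the prose above
processed_dir = Path('/tmp/test_data/from_top_tracking')
device, recording_id, model_name = 'Camera1', 1, 'FromTop-latest'
output_dir = processed_dir / 'videos' / f'device_{device}_recording_{recording_id}_model_{model_name}'
print(output_dir)  # matches the save location reported in the output above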
We can get this estimation directly as a pandas dataframe.
model.PoseEstimation.get_trajectory(key)
scorer | FromTop-latest | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
bodyparts | bodycenter | head | tailbase | |||||||||
coords | x | y | z | likelihood | x | y | z | likelihood | x | y | z | likelihood |
0 | 246.782684 | 298.728088 | 0.0 | 0.999998 | 241.036957 | 316.332489 | 0.0 | 0.999850 | 256.203064 | 278.553314 | 0.0 | 0.999998 |
1 | 246.217529 | 299.358063 | 0.0 | 0.999997 | 239.048737 | 319.177002 | 0.0 | 0.999905 | 255.819626 | 280.200745 | 0.0 | 0.999996 |
2 | 244.459579 | 301.309235 | 0.0 | 0.999999 | 240.238800 | 320.525696 | 0.0 | 0.999899 | 255.705093 | 280.939056 | 0.0 | 0.999995 |
3 | 242.014755 | 302.865204 | 0.0 | 0.999999 | 238.536774 | 322.324463 | 0.0 | 0.999941 | 254.424484 | 282.015778 | 0.0 | 0.999990 |
4 | 240.900177 | 303.459167 | 0.0 | 0.999998 | 237.967987 | 324.072327 | 0.0 | 0.999941 | 252.180603 | 280.899200 | 0.0 | 0.999977 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
118 | 248.682251 | 364.709869 | 0.0 | 0.999965 | 270.854980 | 371.893127 | 0.0 | 0.999961 | 234.899185 | 356.035583 | 0.0 | 0.999996 |
119 | 250.326385 | 366.870361 | 0.0 | 0.999972 | 271.488495 | 373.099884 | 0.0 | 0.999991 | 235.644073 | 356.815125 | 0.0 | 0.999989 |
120 | 251.634140 | 367.709198 | 0.0 | 0.999972 | 272.043884 | 373.402893 | 0.0 | 0.999995 | 236.953812 | 358.651459 | 0.0 | 0.999977 |
121 | 255.393692 | 364.111145 | 0.0 | 0.999979 | 273.417572 | 373.906799 | 0.0 | 0.999997 | 238.825363 | 361.561798 | 0.0 | 0.999885 |
122 | 257.736847 | 365.264008 | 0.0 | 0.999996 | 276.008667 | 373.901245 | 0.0 | 0.999992 | 239.148163 | 364.029297 | 0.0 | 0.999962 |
123 rows × 12 columns
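As a quick usage example (a minimal sketch, assuming matplotlib is installed), we can plot each body part's x/y trajectory using the multi-level columns shown above:
import matplotlib.pyplot as plt

df = model.PoseEstimation.get_trajectory(key)
fig, ax = plt.subplots()
for bodypart in ['head', 'bodycenter', 'tailbase']:
    # Columns are indexed by (scorer, bodypart, coordinate)
    ax.plot(df['FromTop-latest'][bodypart]['x'],
            df['FromTop-latest'][bodypart]['y'], label=bodypart)
ax.invert_yaxis()  # image coordinates: y increases downward
ax.set_xlabel('x (px)')
ax.set_ylabel('y (px)')
ax.legend()
plt.show()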
In the next notebook, we'll look at additional tools in the workflow for automating these steps.