DataJoint U24 - Workflow DeepLabCut¶
Interactively run the workflow¶
The workflow requires a DeepLabCut project with labeled data.
- If you don't have data, refer to 00-DataDownload and 01-Configure.
- For an overview of the schema, refer to 02-WorkflowStructure.
- For a more automated approach, refer to 03-Automate.
Let's change the directory to load the local config, dj_local_conf.json.
import os
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd())=='notebooks': os.chdir('..')
assert os.path.basename(os.getcwd())=='workflow-deeplabcut', ("Please move to the "
+ "workflow directory")
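The cell above assumes the notebook was launched from the notebooks folder. As a minimal sketch of a more general approach, the helper below walks up the directory tree until it finds the config file. The name find_config_dir is hypothetical and not part of the workflow; the only assumption is that dj_local_conf.json sits at or above the starting directory.

```python
from pathlib import Path
from typing import Optional


def find_config_dir(start: Path, config_name: str = "dj_local_conf.json") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `config_name`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / config_name).is_file():
            return candidate
    return None  # no config found anywhere up the tree
```

One possible use is to call find_config_dir(Path.cwd()) and pass the result to os.chdir when it is not None.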
The pipeline.py module activates the DataJoint elements and declares other required tables.
import datajoint as dj
from workflow_deeplabcut.pipeline import lab, subject, session, train, model
# Directing our pipeline to the appropriate config location
from element_interface.utils import find_full_path
from workflow_deeplabcut.paths import get_dlc_root_data_dir
config_path = find_full_path(get_dlc_root_data_dir(),
'from_top_tracking/config.yaml')
Connecting cbroz@dss-db.datajoint.io:3306
Manually Inserting Entries¶
Upstream tables¶
We can insert entries into dj.Manual tables (green in diagrams) by providing values as a dictionary or a list of dictionaries.
session.Session.heading
#
subject : varchar(32) #
session_datetime : datetime(3) #
subject.Subject.insert1(dict(subject='subject6',
sex='F',
subject_birth_date='2020-01-01',
subject_description='hneih_E105'))
session_keys = [dict(subject='subject6', session_datetime='2021-06-02 14:04:22'),
dict(subject='subject6', session_datetime='2021-06-03 14:43:10')]
session.Session.insert(session_keys)
We can look at the contents of this table and restrict by a value.
session.Session() & "session_datetime > '2021-06-01 12:00:00'" & "subject='subject6'"
| subject | session_datetime |
|---|---|
| subject6 | 2021-06-02 14:04:22 |
| subject6 | 2021-06-03 14:43:10 |
Total: 2
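The restriction above chains two conditions with &, which DataJoint applies as a logical AND. As a rough plain-Python analogy (this is not DataJoint code, and restrict is a hypothetical helper), the same selection over a list of dictionaries could be sketched as follows. The subject5 row is invented purely to show a row being filtered out.

```python
from datetime import datetime

sessions = [
    {"subject": "subject5", "session_datetime": datetime(2021, 5, 30, 9, 0, 0)},  # hypothetical row
    {"subject": "subject6", "session_datetime": datetime(2021, 6, 2, 14, 4, 22)},
    {"subject": "subject6", "session_datetime": datetime(2021, 6, 3, 14, 43, 10)},
]


def restrict(rows, subject, after):
    """Keep rows matching `subject` whose session started after `after`."""
    return [
        row for row in rows
        if row["subject"] == subject and row["session_datetime"] > after
    ]


selected = restrict(sessions, "subject6", datetime(2021, 6, 1, 12, 0, 0))
print(len(selected))  # 2
```

DataJoint also accepts a dictionary as a restriction, as in the recording_id restriction used later in this notebook.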
DeepLabCut Tables¶
The VideoSet table in the train schema retains records of files generated in the video labeling process (e.g., h5, csv, png). DeepLabCut will refer to the mat file located under the training-datasets directory.
We recommend storing all paths as relative to the root in your config.
train.VideoSet.insert1({'video_set_id': 0})
project_folder = 'from_top_tracking/'
training_files = ['labeled-data/train1/CollectedData_DJ.h5',
'labeled-data/train1/CollectedData_DJ.csv',
'labeled-data/train1/img00674.png',
'videos/train1.mp4']
for idx, filename in enumerate(training_files):
train.VideoSet.File.insert1({'video_set_id': 0,
'file_id': idx,
'file_path': (project_folder + filename)})
train.VideoSet.File()
| video_set_id | file_id | file_path |
|---|---|---|
| 0 | 0 | from_top_tracking/labeled-data/train1/CollectedData_DJ.h5 |
| 0 | 1 | from_top_tracking/labeled-data/train1/CollectedData_DJ.csv |
| 0 | 2 | from_top_tracking/labeled-data/train1/img00674.png |
| 0 | 3 | from_top_tracking/videos/train1.mp4 |
Total: 4
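Since insert accepts a list of dictionaries, the rows can also be assembled first and inserted in one call. The sketch below only builds the rows; build_file_rows is a hypothetical helper, and the paths stay relative to the root data directory as recommended above.

```python
from pathlib import PurePosixPath


def build_file_rows(video_set_id, project_folder, filenames):
    """Return one row per file, with paths kept relative to the root data directory."""
    rows = []
    for file_id, filename in enumerate(filenames):
        rows.append({
            "video_set_id": video_set_id,
            "file_id": file_id,
            "file_path": str(PurePosixPath(project_folder) / filename),
        })
    return rows


rows = build_file_rows(
    0,
    "from_top_tracking/",
    ["labeled-data/train1/CollectedData_DJ.h5", "videos/train1.mp4"],
)
```

The resulting list could then be passed to train.VideoSet.File.insert(rows) in place of the loop.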
Training a Network¶
First, we'll add a TrainingParamSet. This is a lookup table that we can reference when training a model.
train.TrainingParamSet.heading
paramset_idx : smallint #
---
paramset_desc : varchar(128) #
param_set_hash : uuid # hash identifying this parameterset
params : longblob # dictionary of all applicable parameters
The params longblob should be a dictionary that captures all items for DeepLabCut's train_network function. At minimum, this is the contents of the project's config file, as well as shuffle and trainingsetindex, which are not included in the config.
from deeplabcut import train_network
help(train_network) # for more information on optional parameters
Loading DLC 2.2.1.1...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)
Help on function train_network in module deeplabcut.pose_estimation_tensorflow.training:
train_network(config, shuffle=1, trainingsetindex=0, max_snapshots_to_keep=5, displayiters=None, saveiters=None, maxiters=None, allow_growth=True, gputouse=None, autotune=False, keepdeconvweights=True, modelprefix='')
Trains the network with the labels in the training dataset.
Parameters
----------
config : string
Full path of the config.yaml file as a string.
shuffle: int, optional, default=1
Integer value specifying the shuffle index to select for training.
trainingsetindex: int, optional, default=0
Integer specifying which TrainingsetFraction to use.
Note that TrainingFraction is a list in config.yaml.
max_snapshots_to_keep: int or None
Sets how many snapshots are kept, i.e. states of the trained network. Every
saving interation many times a snapshot is stored, however only the last
``max_snapshots_to_keep`` many are kept! If you change this to None, then all
are kept.
See: https://github.com/DeepLabCut/DeepLabCut/issues/8#issuecomment-387404835
displayiters: optional, default=None
This variable is actually set in ``pose_config.yaml``. However, you can
overwrite it with this hack. Don't use this regularly, just if you are too lazy
to dig out the ``pose_config.yaml`` file for the corresponding project. If
``None``, the value from there is used, otherwise it is overwritten!
saveiters: optional, default=None
This variable is actually set in ``pose_config.yaml``. However, you can
overwrite it with this hack. Don't use this regularly, just if you are too lazy
to dig out the ``pose_config.yaml`` file for the corresponding project.
If ``None``, the value from there is used, otherwise it is overwritten!
maxiters: optional, default=None
This variable is actually set in ``pose_config.yaml``. However, you can
overwrite it with this hack. Don't use this regularly, just if you are too lazy
to dig out the ``pose_config.yaml`` file for the corresponding project.
If ``None``, the value from there is used, otherwise it is overwritten!
allow_growth: bool, optional, default=True.
For some smaller GPUs the memory issues happen. If ``True``, the memory
allocator does not pre-allocate the entire specified GPU memory region, instead
starting small and growing as needed.
See issue: https://forum.image.sc/t/how-to-stop-running-out-of-vram/30551/2
gputouse: optional, default=None
Natural number indicating the number of your GPU (see number in nvidia-smi).
If you do not have a GPU put None.
See: https://nvidia.custhelp.com/app/answers/detail/a_id/3751/~/useful-nvidia-smi-queries
autotune: bool, optional, default=False
Property of TensorFlow, somehow faster if ``False``
(as Eldar found out, see https://github.com/tensorflow/tensorflow/issues/13317).
keepdeconvweights: bool, optional, default=True
Also restores the weights of the deconvolution layers (and the backbone) when
training from a snapshot. Note that if you change the number of bodyparts, you
need to set this to false for re-training.
modelprefix: str, optional, default=""
Directory containing the deeplabcut models to use when evaluating the network.
By default, the models are assumed to exist in the project folder.
Returns
-------
None
Examples
--------
To train the network for first shuffle of the training dataset
>>> deeplabcut.train_network('/analysis/project/reaching-task/config.yaml')
To train the network for second shuffle of the training dataset
>>> deeplabcut.train_network(
'/analysis/project/reaching-task/config.yaml',
shuffle=2,
keepdeconvweights=True,
)
Here, we give these items, load the config contents, and overwrite some defaults, including maxiters, to restrict our training iterations to 5.
import yaml
paramset_idx = 0; paramset_desc='from_top_tracking'
with open(config_path, 'rb') as y:
config_params = yaml.safe_load(y)
training_params = {'shuffle': '1',
'trainingsetindex': '0',
'maxiters': '5',
'scorer_legacy': 'False',
'multianimalproject':'False'}
config_params.update(training_params)
train.TrainingParamSet.insert_new_params(paramset_idx=paramset_idx,
paramset_desc=paramset_desc,
params=config_params)
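The merge step can be made a little more defensive. The sketch below is one possible way to combine the loaded config with the training overrides while checking that shuffle and trainingsetindex are present. REQUIRED_KEYS and merge_training_params are hypothetical names, and the small config dictionary is a stand-in for the contents loaded from config.yaml.

```python
REQUIRED_KEYS = ("shuffle", "trainingsetindex")  # not stored in config.yaml


def merge_training_params(config, overrides):
    """Return a new dict of config values updated with training overrides."""
    missing = [key for key in REQUIRED_KEYS if key not in overrides]
    if missing:
        raise ValueError(f"Missing training parameters: {missing}")
    merged = dict(config)  # copy so the loaded config is left untouched
    merged.update(overrides)
    return merged


config = {"Task": "from_top_tracking", "TrainingFraction": [0.95]}  # stand-in for the YAML contents
params = merge_training_params(
    config, {"shuffle": "1", "trainingsetindex": "0", "maxiters": "5"}
)
```

Copying before updating keeps the original config available for comparison, which may help when several parameter sets are derived from the same project.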
Now, we add a TrainingTask. As a computed table, ModelTraining will reference this to start training when calling populate().
train.TrainingTask.heading
video_set_id : int #
paramset_idx : smallint #
training_id : int #
---
model_prefix="" : varchar(32) #
project_path="" : varchar(255) # DLC's project_path in config relative to root
key={'video_set_id': 0,
'paramset_idx':0,
'training_id': 1,
'project_path':'from_top_tracking/'
}
train.TrainingTask.insert1(key, skip_duplicates=True)
train.TrainingTask()
| video_set_id | paramset_idx | training_id | model_prefix | project_path DLC's project_path in config relative to root |
|---|---|---|---|---|
| 0 | 0 | 1 | | from_top_tracking/ |
Total: 1
train.ModelTraining.populate()
(Output cleared for brevity)
The network is now trained and ready to evaluate. Use the function 'evaluate_network' to evaluate the network.
train.ModelTraining()
| video_set_id | paramset_idx | training_id | latest_snapshot latest exact snapshot index (i.e., never -1) | config_template stored full config file |
|---|---|---|---|---|
| 0 | 0 | 1 | 5 | =BLOB= |
Total: 1
To resume training from a checkpoint, we would need to edit the relevant config file (see also update_pose_cfg in workflow_deeplabcut.load_demo_data).
Empirical work suggests 200k iterations for any true use-case.
For better quality predictions in this demo, we'll revert the checkpoint file and use a pretrained model.
from workflow_deeplabcut.load_demo_data import revert_checkpoint_file
revert_checkpoint_file()
Tracking Joints/Body Parts¶
The model schema uses a lookup table for managing Body Parts tracked across models.
model.BodyPart.heading
#
body_part : varchar(32) #
---
body_part_description="" : varchar(1000) #
Helper functions allow us to first identify all the new body parts from a given config and then insert them with user-friendly descriptions.
model.BodyPart.extract_new_body_parts(config_path)
Existing body parts: ['bodycenter' 'head' 'tailbase'] New body parts: []
array([], dtype='<U10')
bp_desc=['Body Center', 'Head', 'Base of Tail']
model.BodyPart.insert_from_config(config_path,bp_desc)
Existing body parts: [] New body parts: ['bodycenter' 'head' 'tailbase'] New descriptions: ['Body Center', 'Head', 'Base of Tail']
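To illustrate the idea behind these helpers, the sketch below splits the body parts listed in a config into those already stored and those still to be added. It is not the element's implementation; split_body_parts is a hypothetical function that operates on plain lists rather than on the BodyPart table.

```python
def split_body_parts(config_body_parts, existing_body_parts):
    """Split the config's body parts into those already stored and those still to add."""
    existing = set(existing_body_parts)
    already_stored = sorted(bp for bp in config_body_parts if bp in existing)
    new = sorted(bp for bp in config_body_parts if bp not in existing)
    return already_stored, new


stored, new = split_body_parts(["head", "bodycenter", "tailbase"], existing_body_parts=[])
print(new)  # ['bodycenter', 'head', 'tailbase']
```

The descriptions passed in the notebook appear to follow the same alphabetical order as the printed body parts, so it is worth checking that order before supplying descriptions.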
Declaring/Evaluating a Model¶
We can insert into the Model table for automatic evaluation.
model.Model.insert_new_model(model_name='FromTop-latest',dlc_config=config_path,
shuffle=1,trainingsetindex=0,
model_description='FromTop - latest snapshot',
paramset_idx=0,
params={"snapshotindex":-1})
--- DLC Model specification to be inserted ---
model_name: FromTop-latest
model_description: FromTop - latest snapshot
scorer: DLCmobnet100fromtoptrackingFeb23shuffle1
task: from_top_tracking
date: Feb23
iteration: 0
snapshotindex: -1
shuffle: 1
trainingsetindex: 0
project_path: from_top_tracking
paramset_idx: 0
-- Template/Contents of config.yaml --
Task: from_top_tracking
scorer: DJ
date: Feb23
video_sets: {'/tmp/test_data/from_top_tracking/videos/train1.mp4': {'crop': '0, 500, 0, 500'}}
bodyparts: ['head', 'bodycenter', 'tailbase']
start: 0
stop: 1
numframes2pick: 20
pcutoff: 0.6
dotsize: 3
alphavalue: 0.7
colormap: viridis
TrainingFraction: [0.95]
iteration: 0
default_net_type: resnet_50
snapshotindex: -1
batch_size: 8
cropping: False
x1: 0
x2: 640
y1: 277
y2: 624
corner2move2: [50, 50]
move2corner: True
croppedtraining: None
default_augmenter: default
identity: None
maxiters: 5
modelprefix:
multianimalproject: False
scorer_legacy: False
shuffle: 1
skeleton: [['bodypart1', 'bodypart2'], ['objectA', 'bodypart3']]
skeleton_color: black
train_fraction: 0.95
trainingsetindex: 0
project_path: /tmp/test_data/from_top_tracking
model.Model()
| model_name User-friendly model name | task Task in the config yaml | date Date in the config yaml | iteration Iteration/version of this model | snapshotindex which snapshot for prediction (if -1, latest) | shuffle Shuffle (1) or not (0) | trainingsetindex Index of training fraction list in config.yaml | scorer Scorer/network name - DLC's GetScorerName() | config_template Dictionary of the config for analyze_videos() | project_path DLC's project_path in config relative to root | model_prefix | model_description | paramset_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FromTop-latest | from_top_tracking | Feb23 | 0 | -1 | 1 | 0 | DLCmobnet100fromtoptrackingFeb23shuffle1 | =BLOB= | from_top_tracking | | FromTop - latest snapshot | 0 |
Total: 1
ModelEvaluation will reference the Model using the populate method and insert the output from DeepLabCut's evaluate_network function.
model.ModelEvaluation.heading
model_name : varchar(64) # user-friendly model name
---
train_iterations : int # Training iterations
train_error=null : float # Train error (px)
test_error=null : float # Test error (px)
p_cutoff=null : float # p-cutoff used
train_error_p=null : float # Train error with p-cutoff
test_error_p=null : float # Test error with p-cutoff
model.ModelEvaluation.populate()
Config:
{'all_joints': [[0], [1], [2]],
'all_joints_names': ['head', 'bodycenter', 'tailbase'],
'batch_size': 1,
'crop_pad': 0,
'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_from_top_trackingFeb23/from_top_tracking_DJ95shuffle1.mat',
'dataset_type': 'imgaug',
'deterministic': False,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': '/Volumes/GoogleDrive/My '
'Drive/ref/DeepLabCut/deeplabcut/pose_estimation_tensorflow/models/pretrained/mobilenet_v2_1.0_224.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 1.0,
'locref_stdev': 7.2801,
'log_dir': 'log',
'mean_pixel': [123.68, 116.779, 103.939],
'mirror': False,
'net_type': 'mobilenet_v2_1.0',
'num_joints': 3,
'optimizer': 'sgd',
'pairwise_huber_loss': True,
'pairwise_predict': False,
'partaffinityfield_predict': False,
'regularize': False,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': '/tmp/test_data/from_top_tracking/dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/test/snapshot',
'stride': 8.0,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
/Users/cb/miniconda3/envs/ele/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py:1694: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
warnings.warn('`layer.apply` is deprecated and '
Running DLC_mobnet_100_from_top_trackingFeb23shuffle1_103000 with # of training iterations: 103000
Running evaluation ...
20it [00:06, 3.29it/s]
Analysis is done and the results are stored (see evaluation-results) for snapshot: snapshot-103000
Results for 103000 training iterations: 95 1 train error: 9.28 pixels. Test error: 9.84 pixels.
With pcutoff of 0.6 train error: 9.28 pixels. Test error: 9.84 pixels
Thereby, the errors are given by the average distances between the labels by DLC and the scorer.
The network is evaluated and the results are stored in the subdirectory 'evaluation_results'.
Please check the results, then choose the best model (snapshot) for prediction. You can update the config.yaml file with the appropriate index for the 'snapshotindex'.
Use the function 'analyze_video' to make predictions on new videos.
Otherwise, consider adding more labeled-data and retraining the network (see DeepLabCut workflow Fig 2, Nath 2019)
model.ModelEvaluation()
| model_name User-friendly model name | train_iterations Training iterations | train_error Train error (px) | test_error Test error (px) | p_cutoff p-cutoff used | train_error_p Train error with p-cutoff | test_error_p Test error with p-cutoff |
|---|---|---|---|---|---|---|
| FromTop-latest | 103000 | 9.28 | 9.84 | 0.6 | 9.28 | 9.84 |
Total: 1
Pose Estimation¶
To use our model, we'll first need to insert a session recording into VideoRecording.
model.VideoRecording()
| subject | session_datetime | recording_id | device |
|---|---|---|---|
Total: 0
key = {'subject': 'subject6',
'session_datetime': '2021-06-02 14:04:22',
'recording_id': '1', 'device': 'Camera1'}
model.VideoRecording.insert1(key)
_ = key.pop('device') # get rid of secondary key from master table
key.update({'file_id': 1,
'file_path': 'from_top_tracking/videos/test-2s.mp4'})
model.VideoRecording.File.insert1(key)
model.VideoRecording.File()
| subject | session_datetime | recording_id | file_id | file_path filepath of video, relative to root data directory |
|---|---|---|---|---|
| subject6 | 2021-06-02 14:04:22 | 1 | 1 | from_top_tracking/videos/test-2s.mp4 |
Total: 1
RecordingInfo automatically populates with file information.
model.RecordingInfo.populate()
model.RecordingInfo()
| subject | session_datetime | recording_id | px_height height in pixels | px_width width in pixels | nframes number of frames | fps (Hz) frames per second | recording_datetime Datetime for the start of the recording | recording_duration video duration (s) from nframes / fps |
|---|---|---|---|---|---|---|---|---|
| subject6 | 2021-06-02 14:04:22 | 1 | 500 | 500 | 123 | 60 | None | 2.05 |
Total: 1
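The recording_duration column is described as nframes / fps. A minimal check of the value in the table above, using a hypothetical helper name:

```python
def recording_duration(nframes: int, fps: float) -> float:
    """Return the video duration in seconds as nframes / fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return nframes / fps


print(recording_duration(123, 60))  # 2.05
```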
Next, we specify if the PoseEstimation table should load results from an existing file or trigger the estimation command. Here, we can also specify parameters for DeepLabCut's analyze_videos as a dictionary.
key = (model.VideoRecording & {'recording_id': '1'}).fetch1('KEY')
key.update({'model_name': 'FromTop-latest', 'task_mode': 'trigger'})
key
{'subject': 'subject6',
'session_datetime': datetime.datetime(2021, 6, 2, 14, 4, 22),
'recording_id': 1,
'model_name': 'FromTop-latest',
'task_mode': 'trigger'}
model.PoseEstimationTask.insert_estimation_task(key,params={'save_as_csv':True})
model.PoseEstimation.populate()
Config:
{'all_joints': [[0], [1], [2]],
'all_joints_names': ['head', 'bodycenter', 'tailbase'],
'batch_size': 1,
'crop_pad': 0,
'dataset': 'training-datasets/iteration-0/UnaugmentedDataSet_from_top_trackingFeb23/from_top_tracking_DJ95shuffle1.mat',
'dataset_type': 'imgaug',
'deterministic': False,
'fg_fraction': 0.25,
'global_scale': 0.8,
'init_weights': '/Volumes/GoogleDrive/My '
'Drive/ref/DeepLabCut/deeplabcut/pose_estimation_tensorflow/models/pretrained/mobilenet_v2_1.0_224.ckpt',
'intermediate_supervision': False,
'intermediate_supervision_layer': 12,
'location_refinement': True,
'locref_huber_loss': True,
'locref_loss_weight': 1.0,
'locref_stdev': 7.2801,
'log_dir': 'log',
'mean_pixel': [123.68, 116.779, 103.939],
'mirror': False,
'net_type': 'mobilenet_v2_1.0',
'num_joints': 3,
'optimizer': 'sgd',
'pairwise_huber_loss': True,
'pairwise_predict': False,
'partaffinityfield_predict': False,
'regularize': False,
'scoremap_dir': 'test',
'shuffle': True,
'snapshot_prefix': '/tmp/test_data/from_top_tracking/dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/test/snapshot',
'stride': 8.0,
'weigh_negatives': False,
'weigh_only_present_joints': False,
'weigh_part_predictions': False,
'weight_decay': 0.0001}
/Users/cb/miniconda3/envs/ele/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py:1694: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
warnings.warn('`layer.apply` is deprecated and '
Using snapshot-103000 for model /tmp/test_data/from_top_tracking/dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1
Starting to analyze % /tmp/test_data/from_top_tracking/videos/test-2s.mp4
Loading /tmp/test_data/from_top_tracking/videos/test-2s.mp4
Duration of video [s]: 2.05 , recorded with 60.0 fps!
Overall # of frames: 123 found with (before cropping) frame dimensions: 500 500
Starting to extract posture
98%|█████████▊| 120/123 [00:37<00:00, 3.22it/s]
Saving results in /tmp/test_data/from_top_tracking/videos/device_Camera1_recording_1_model_FromTop-latest...
Saving csv poses!
The videos are analyzed. Now your research can truly start!
You can create labeled videos with 'create_labeled_video'
If the tracking is not satisfactory for some videos, consider expanding the training set. You can use the function 'extract_outlier_frames' to extract a few representative outlier frames.
By default, DataJoint will store results in a subdirectory
<processed_dir> / videos / device_<name>_recording_<#>_model_<name>
where processed_dir is optionally specified in the DataJoint config. If unspecified, this will be the project directory. The device and model names are specified elsewhere in the schema.
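The naming pattern can be sketched with pathlib. This is only an illustration of the pattern described here, not the element's own code; estimation_output_dir is a hypothetical helper, and the arguments are taken from the log output above.

```python
from pathlib import Path


def estimation_output_dir(processed_dir, device, recording_id, model_name):
    """Compose <processed_dir>/videos/device_<name>_recording_<#>_model_<name>."""
    folder = f"device_{device}_recording_{recording_id}_model_{model_name}"
    return Path(processed_dir) / "videos" / folder


out = estimation_output_dir("/tmp/test_data/from_top_tracking", "Camera1", 1, "FromTop-latest")
print(out.name)  # device_Camera1_recording_1_model_FromTop-latest
```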
We can get this estimation directly as a pandas dataframe.
model.PoseEstimation.get_trajectory(key)
| scorer | FromTop-latest | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bodyparts | bodycenter | | | | head | | | | tailbase | | | |
| coords | x | y | z | likelihood | x | y | z | likelihood | x | y | z | likelihood |
| 0 | 246.782684 | 298.728088 | 0.0 | 0.999998 | 241.036957 | 316.332489 | 0.0 | 0.999850 | 256.203064 | 278.553314 | 0.0 | 0.999998 |
| 1 | 246.217529 | 299.358063 | 0.0 | 0.999997 | 239.048737 | 319.177002 | 0.0 | 0.999905 | 255.819626 | 280.200745 | 0.0 | 0.999996 |
| 2 | 244.459579 | 301.309235 | 0.0 | 0.999999 | 240.238800 | 320.525696 | 0.0 | 0.999899 | 255.705093 | 280.939056 | 0.0 | 0.999995 |
| 3 | 242.014755 | 302.865204 | 0.0 | 0.999999 | 238.536774 | 322.324463 | 0.0 | 0.999941 | 254.424484 | 282.015778 | 0.0 | 0.999990 |
| 4 | 240.900177 | 303.459167 | 0.0 | 0.999998 | 237.967987 | 324.072327 | 0.0 | 0.999941 | 252.180603 | 280.899200 | 0.0 | 0.999977 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 118 | 248.682251 | 364.709869 | 0.0 | 0.999965 | 270.854980 | 371.893127 | 0.0 | 0.999961 | 234.899185 | 356.035583 | 0.0 | 0.999996 |
| 119 | 250.326385 | 366.870361 | 0.0 | 0.999972 | 271.488495 | 373.099884 | 0.0 | 0.999991 | 235.644073 | 356.815125 | 0.0 | 0.999989 |
| 120 | 251.634140 | 367.709198 | 0.0 | 0.999972 | 272.043884 | 373.402893 | 0.0 | 0.999995 | 236.953812 | 358.651459 | 0.0 | 0.999977 |
| 121 | 255.393692 | 364.111145 | 0.0 | 0.999979 | 273.417572 | 373.906799 | 0.0 | 0.999997 | 238.825363 | 361.561798 | 0.0 | 0.999885 |
| 122 | 257.736847 | 365.264008 | 0.0 | 0.999996 | 276.008667 | 373.901245 | 0.0 | 0.999992 | 239.148163 | 364.029297 | 0.0 | 0.999962 |
123 rows × 12 columns
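A common next step is to drop low-confidence points and measure frame-to-frame movement. The sketch below does this in plain Python so that it runs without the database; the three rows are copied from the bodycenter columns above, and the cutoff reuses the pcutoff of 0.6 from the config. step_distances is a hypothetical helper, and the same idea could be applied to the returned dataframe.

```python
import math

# (frame, x, y, likelihood) for bodycenter, copied from the first rows of the table above
bodycenter = [
    (0, 246.782684, 298.728088, 0.999998),
    (1, 246.217529, 299.358063, 0.999997),
    (2, 244.459579, 301.309235, 0.999999),
]

P_CUTOFF = 0.6  # pcutoff from the config shown earlier


def step_distances(track, p_cutoff):
    """Return pixel distances between consecutive frames that pass the likelihood cutoff."""
    kept = [(x, y) for _, x, y, likelihood in track if likelihood >= p_cutoff]
    return [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(kept, kept[1:])]


distances = step_distances(bodycenter, P_CUTOFF)
```

Note that filtering before pairing means a dropped frame joins its two neighbors, so distances may span more than one frame interval when points are removed.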
In the next notebook, we'll look at additional tools in the workflow for automating these steps.