
Short snippets and how-tos:

Data Handling

Two!Ears data to HDF5 format

in MATLAB:

addpath('~/src/caffe_sandbox/datasets/twoears/'); % for calling twoears2hdf5()
% format train set
fpath_src = './train/dataStoreUni.mat';
dir_dst = './data/twoears/'; % directory must exist
twoears2hdf5(fpath_src, dir_dst);
% same for test set
fpath_src = './test/dataStoreUni.mat';
dir_dst = './data/twoears/'; % directory must exist, can be the same location as the train set
twoears2hdf5(fpath_src, dir_dst);
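
To sanity-check the conversion, you can inspect the generated file from Python with h5py (a minimal sketch; the filename data_train.h5 is an assumption, adjust it to whatever twoears2hdf5 actually writes into dir_dst):

import h5py

fpath = './data/twoears/data_train.h5' # hypothetical filename; adjust to the actual output
with h5py.File(fpath, 'r') as f:
    # assumes only datasets (no groups) at the file root
    for name in f.keys():
        print(name, f[name].shape, f[name].dtype)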

Data balancing

in Python:

from nideep.datasets.balance_hdf5 import save_balanced_class_count_hdf5 
# train set:
fpath_src = './data/data_train.h5'
fpath_dst = './data/bal/data_train.h5' # parent directory must exist
keys = ['feat1', 'feat2'] # keys to balance; make sure class names are not included among them
idxs = save_balanced_class_count_hdf5(fpath_src, keys, fpath_dst, key_label='label', other_clname='general')
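To verify the result, you can count the examples per class in the balanced file (a hedged sketch using h5py and numpy; it assumes the values under the 'label' key are either class indices or one-hot rows):

import h5py
import numpy as np

with h5py.File('./data/bal/data_train.h5', 'r') as f:
    labels = np.asarray(f['label'])
if labels.ndim > 1:
    counts = labels.reshape(labels.shape[0], -1).sum(axis=0) # one-hot rows: sum per class
else:
    counts = np.bincount(labels.astype(int)) # class indices: count per class
print(counts) # roughly equal counts indicate a balanced set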

Too much data in HDF5

Caffe has (or at least had) a cap on the size of a single HDF5 file. The workaround is to split the data into multiple smaller HDF5 files and list them in the .txt file provided to the HDF5DataLayer.

in Python:

from nideep.iow.to_hdf5 import split_hdf5
fpath_src = './data/data_train.h5'
paths = split_hdf5(fpath_src, './data/split/')
# ...then create a .txt file in which each line contains the absolute path of one of the new smaller HDF5 files (see the sketch below)
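
One way to write that list file (a minimal sketch continuing from the snippet above, where paths is the list returned by split_hdf5; the filename train_h5_list.txt is arbitrary, it just has to match what the HDF5DataLayer's source parameter points to):

import os

# one absolute path per line, as the HDF5DataLayer expects
with open('./data/split/train_h5_list.txt', 'w') as f:
    for p in paths: # paths returned by split_hdf5 above
        f.write(os.path.abspath(p) + '\n')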