
Short snippets and how-tos:

Data Handling

Two!Ears data to HDF5 format

in MATLAB:

addpath('~/src/caffe_sandbox/datasets/twoears/'); % for calling twoears2hdf5()
% format train set
fpath_src = './train/dataStoreUni.mat';
dir_dst = './data/twoears/'; % directory must exist
twoears2hdf5(fpath_src, dir_dst);
% same for test set
fpath_src = './test/dataStoreUni.mat';
dir_dst = './data/twoears/'; % directory must exist, can be the same location as the train set
twoears2hdf5(fpath_src, dir_dst);
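
To sanity-check the conversion, you can inspect the generated file from Python with h5py (a minimal sketch; the filename data_train.h5 is an assumption, adjust it to whatever twoears2hdf5 actually writes into dir_dst):

import h5py

fpath = './data/twoears/data_train.h5' # hypothetical filename; adjust to the actual output
with h5py.File(fpath, 'r') as f:
    # assumes only datasets (no groups) at the file root
    for name in f.keys():
        print(name, f[name].shape, f[name].dtype)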

Data balancing

in Python:

from nideep.datasets.balance_hdf5 import save_balanced_class_count_hdf5 
# train set:
fpath_src = './data/data_train.h5'
fpath_dst = './data/bal/data_train.h5' # parent directory must exist
keys = ['feat1', 'feat2'] # keys to balance; make sure class names are not included among them
idxs = save_balanced_class_count_hdf5(fpath_src, keys, fpath_dst, key_label='label', other_clname='general')
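To verify the result, you can count the examples per class in the balanced file (a hedged sketch using h5py and numpy; it assumes the values under the 'label' key are either class indices or one-hot rows):

import h5py
import numpy as np

with h5py.File('./data/bal/data_train.h5', 'r') as f:
    labels = np.asarray(f['label'])
if labels.ndim > 1:
    counts = labels.reshape(labels.shape[0], -1).sum(axis=0) # one-hot rows: sum per class
else:
    counts = np.bincount(labels.astype(int)) # class indices: count per class
print(counts) # roughly equal counts indicate a balanced set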

Too much data in HDF5

Caffe has (or at least had) a cap on the size of a single HDF5 file. The workaround is to split the data into multiple smaller HDF5 files and list them in the .txt file provided to the HDF5DataLayer.

in Python:

from nideep.iow.to_hdf5 import split_hdf5
fpath_src = './data/data_train.h5'
paths = split_hdf5(fpath_src, './data/split/')
# ...then create a .txt file in which each line contains the absolute path of one of the new smaller HDF5 files (see the sketch below)
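
One way to write that list file (a minimal sketch continuing from the snippet above, where paths is the list returned by split_hdf5; the filename train_h5_list.txt is arbitrary, it just has to match what the HDF5DataLayer's source parameter points to):

import os

# one absolute path per line, as the HDF5DataLayer expects
with open('./data/split/train_h5_list.txt', 'w') as f:
    for p in paths: # paths returned by split_hdf5 above
        f.write(os.path.abspath(p) + '\n')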