-
Notifications
You must be signed in to change notification settings - Fork 66
Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24
base: master
Are you sure you want to change the base?
Conversation
DeepResnet on Neon 2.3 gives the following error:
|
Thanks @tpatejko for the PR. Also, could you please take a look at https://github.com/NervanaSystems/modelzoo/tree/VGG_neon2.3 to merge the best ingredients of the two for VGG? Thank you! |
@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/} | |||
echo "Downloading weights file from ${WEIGHTS_URL}" | |||
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null | |||
|
|||
python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1 | |||
python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted/ --manifest_root /data/i1k-extracted --manifest train:/data/i1k-extracted/train-index.csv --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar -z 256 | tee output.dat 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use "/dataset/aeon/I1K/i1k-extracted/" instead of "/data/i1k-extracted/" to be consistent with validation dataset structure?
@@ -23,14 +23,14 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/} | |||
echo "Downloading weights file from ${WEIGHTS_URL}" | |||
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null | |||
|
|||
python -u $TEST_SCRIPT -i ${EXECUTOR_NUMBER} -vvv --model_file $WEIGHTS_FILE --no_progress_bar -w /usr/local/data/CIFAR10/macrobatches | tee output.dat 2>&1 | |||
python -u $TEST_SCRIPT -i ${EXECUTOR_NUMBER} -vvv --model_file $WEIGHTS_FILE --manifest val:/data/CIFAR/val-index.csv --manifest_root /data/CIFAR -b gpu -z 32 --no_progress_bar 2>&1 | tee output.dat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use "/dataset/aeon/CIFAR10" instead of "/data/CIFAR"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@baojun-nervana Out of curiosity, is there any particular reason why these paths should be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tpatejko that is the validation dataset directory, so we don't have to update those after the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/} | |||
echo "Downloading weights file from ${WEIGHTS_URL}" | |||
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null | |||
|
|||
python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1 | |||
python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted/ --manifest_root /data/i1k-extracted --manifest train:/data/i1k-extracted/train-index.csv --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar -z 256 2>&1 | tee output.dat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/} | |||
echo "Downloading weights file from ${WEIGHTS_URL}" | |||
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null | |||
|
|||
python -u googlenet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1 | |||
python -u googlenet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted --manifest_root /data/i1k-extracted --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar 2>&1 | tee output.dat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -22,7 +22,8 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/} | |||
echo "Downloading weights file from ${WEIGHTS_URL}" | |||
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null | |||
|
|||
python -u vgg_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar -z 64 --vgg_version E | tee output.dat 2>&1 | |||
python -u vgg_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted -vvv --manifest_root /data/i1k-extracted --manifest val:/data/i1k-extracted/val-index.csv --model_file $WEIGHTS_FILE --no_progress_bar -z 64 --vgg_version E 2>&1 | tee output.dat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
@wei-v-wang Thanks for your comment I tried to change the script following your remarks but I have a problem with the following part of code:
but I'm not quite sure how to initialize Model part. I tried to use code that builds model in file resnet_cifar10.py but I encountered the following error:
Another question: could you elaborate on what you mean by merging best ingredients of two for VGG? |
@@ -13,7 +13,7 @@ The trained weights file can be downloaded from AWS | |||
|
|||
|
|||
### neon version | |||
The model weight file above has been generated using neon version tag [v1.4.0]((https://github.com/NervanaSystems/neon/releases/tag/v1.4.0). | |||
The model weight file above has been generated using neon version tag [v2.3.0]((https://github.com/NervanaSystems/neon/releases/tag/v2.3.0). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The mode weight file" will probably need to be converted to new format. Have you already obtained a trained weight with neon v2.2/v2.3 format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming yes.
@@ -105,15 +135,15 @@ | |||
|
|||
# configure callbacks | |||
valmetric = TopKMisclassification(k=5) | |||
callbacks = Callbacks(model, eval_set=test, metric=valmetric, **args.callback_args) | |||
callbacks = Callbacks(model, eval_set=val, metric=valmetric, **args.callback_args) | |||
|
|||
if args.model_file is not None: | |||
model.load_params(args.model_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.load_params(args.model_file) should work. Why does resnet_eval.py in cifar10 does not support this.
We can see alexnet support this.
val_config = config(manifest_filename, manifest_root, batch_size, subset_pct) | ||
return wrap_dataloader(AeonDataLoader(val_config)) | ||
|
||
test_set = make_val_config(args.manifest["val"], args.manifest_root, batch_size=args.batch_size) | ||
|
||
model = Model(load_obj(args.model_file)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model=Model(load_obj(args.model_file)) will break the code, since neon v2.2 and v2.3 was not supposed to initialize a model using old weights. model=Model(load_obj(args.model_file) works with new weights.
So how do we get the new format of an old args.model_file.
Can we try to use model=Model(layers=cifar10_layers).
And do model.load_params(args.model_files, load_state)..
When we see there is an assert failure. This usually mean only a subset of the cifar10 layers was involved in saving the weights. So we need to print out those layer information from the old weight file and see which layers are missing. We then delete those layers (temporarily) from cifar10 layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is resnet_eval.py still functional/successful in loading weights?
model.load_params(args.model_file) | ||
model.initialize(test, cost) | ||
model.load_params(args.model_file, load_states=False) | ||
model.initialize(val, cost) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, not specific to this task,
what is the difference between validation and test?
Normally we do training + validation. And then we do test with test dataset, right?
It seems neon has only being using training dataset and validation dataset for testing, not test dataset.
model.load_params(args.model_file) | ||
mets=model.eval(test, metric=TopKMisclassification(k=5)) | ||
model.load_params(args.model_file, load_states=False) | ||
mets=model.eval(val, metric=TopKMisclassification(k=5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #25 has both training + inference and updated readme.
@tpatejko Can you please work on uploading the new formatted weight files to aws first? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Need to upload new weights to aws and verify resnet_cifar10 functionality.
All_cnn, Alexnet, Googlenet and Alexnet work with Neon 2.3 and Aeon 1.0.
DeepResnet works with Neon 2.2 due to errors related to bias fusion (?)