Skip to content
This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

tpatejko
Copy link

@tpatejko tpatejko commented Nov 7, 2017

All_cnn, Alexnet, Googlenet and Alexnet work with Neon 2.3 and Aeon 1.0.
DeepResnet works with Neon 2.2 due to errors related to bias fusion (?)

@tpatejko
Copy link
Author

tpatejko commented Nov 7, 2017

DeepResnet on Neon 2.3 gives the following error:

File"resnet_eval.py", line 63, in
model = Model(load_obj(args.model_file), weights_only=True)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/models/model.py", line 70, in init
self.deserialize(layers, load_states=(not weights_only))
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/models/model.py", line 475, in deserialize
self.layers.load_weights(model_dict['model'], load_states)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 202, in load_weights
branch.load_weights(bdict, load_states=load_states)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 202, in load_weights
branch.load_weights(bdict, load_states=load_states)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 198, in load_weights
pdict['config']['layers'] = self.fusion_pass(pdict['config']['layers'])
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 177, in fusion_pass
if any([pattern(l1, l2) for pattern in patterns]):
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 172, in
y['type'] == 'neon.layers.layer.Bias']
TypeError: 'NoneType' object has no attribute 'getitem'

@tpatejko tpatejko requested review from jennifermyers, wei-v-wang and hanlint and removed request for jennifermyers November 7, 2017 12:26
@wei-v-wang
Copy link

Thanks @tpatejko for the PR.
Regarding DeepResnet failure, could you please try the code pattern discussed in NervanaSystems/neon#415 ?
i.e. can you try to re-write Model(load_obj(args.model_file), load_states)
to
Model (...)
model.load_params(args.model_file, load_states)?

Also, could you please take a look at https://github.com/NervanaSystems/modelzoo/tree/VGG_neon2.3 to merge the best ingredients of the two for VGG?

Thank you!

@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
echo "Downloading weights file from ${WEIGHTS_URL}"
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1
python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted/ --manifest_root /data/i1k-extracted --manifest train:/data/i1k-extracted/train-index.csv --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar -z 256 | tee output.dat 2>&1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use "/dataset/aeon/I1K/i1k-extracted/" instead of "/data/i1k-extracted/" to be consistent with validation dataset structure?

@@ -23,14 +23,14 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
echo "Downloading weights file from ${WEIGHTS_URL}"
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

python -u $TEST_SCRIPT -i ${EXECUTOR_NUMBER} -vvv --model_file $WEIGHTS_FILE --no_progress_bar -w /usr/local/data/CIFAR10/macrobatches | tee output.dat 2>&1
python -u $TEST_SCRIPT -i ${EXECUTOR_NUMBER} -vvv --model_file $WEIGHTS_FILE --manifest val:/data/CIFAR/val-index.csv --manifest_root /data/CIFAR -b gpu -z 32 --no_progress_bar 2>&1 | tee output.dat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use "/dataset/aeon/CIFAR10" instead of "/data/CIFAR"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baojun-nervana Out of curiosity, is there any particular reason why these paths should be changed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpatejko that is the validation dataset directory, so we don't have to update those after the PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
echo "Downloading weights file from ${WEIGHTS_URL}"
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1
python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted/ --manifest_root /data/i1k-extracted --manifest train:/data/i1k-extracted/train-index.csv --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar -z 256 2>&1 | tee output.dat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
echo "Downloading weights file from ${WEIGHTS_URL}"
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

python -u googlenet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1
python -u googlenet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted --manifest_root /data/i1k-extracted --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar 2>&1 | tee output.dat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@@ -22,7 +22,8 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
echo "Downloading weights file from ${WEIGHTS_URL}"
curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

python -u vgg_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar -z 64 --vgg_version E | tee output.dat 2>&1
python -u vgg_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted -vvv --manifest_root /data/i1k-extracted --manifest val:/data/i1k-extracted/val-index.csv --model_file $WEIGHTS_FILE --no_progress_bar -z 64 --vgg_version E 2>&1 | tee output.dat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@tpatejko
Copy link
Author

@wei-v-wang Thanks for your comment

I tried to change the script following your remarks but I have a problem with the following part of code:

Model (...)
model.load_params(args.model_file, load_states)

but I'm not quite sure how to initialize Model part. I tried to use code that builds model in file resnet_cifar10.py but I encountered the following error:

File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 200, in load_weights
assert len(pdict['config']['layers']) == len(self.layers)

Another question: could you elaborate on what you mean by merging best ingredients of two for VGG?

@wei-v-wang
Copy link

wei-v-wang commented Nov 15, 2017

@tpatejko Sorry I must have missed your comments above.

I will take a look at the resnet_eval failure.

For VGG, I just create a PR to modelZoo as well. #25

Thanks!

@@ -13,7 +13,7 @@ The trained weights file can be downloaded from AWS


### neon version
The model weight file above has been generated using neon version tag [v1.4.0]((https://github.com/NervanaSystems/neon/releases/tag/v1.4.0).
The model weight file above has been generated using neon version tag [v2.3.0]((https://github.com/NervanaSystems/neon/releases/tag/v2.3.0).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The mode weight file" will probably need to be converted to new format. Have you already obtained a trained weight with neon v2.2/v2.3 format?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming yes.

@@ -105,15 +135,15 @@

# configure callbacks
valmetric = TopKMisclassification(k=5)
callbacks = Callbacks(model, eval_set=test, metric=valmetric, **args.callback_args)
callbacks = Callbacks(model, eval_set=val, metric=valmetric, **args.callback_args)

if args.model_file is not None:
model.load_params(args.model_file)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model.load_params(args.model_file) should work. Why does resnet_eval.py in cifar10 does not support this.
We can see alexnet support this.

val_config = config(manifest_filename, manifest_root, batch_size, subset_pct)
return wrap_dataloader(AeonDataLoader(val_config))

test_set = make_val_config(args.manifest["val"], args.manifest_root, batch_size=args.batch_size)

model = Model(load_obj(args.model_file))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model=Model(load_obj(args.model_file)) will break the code, since neon v2.2 and v2.3 was not supposed to initialize a model using old weights. model=Model(load_obj(args.model_file) works with new weights.
So how do we get the new format of an old args.model_file.
Can we try to use model=Model(layers=cifar10_layers).
And do model.load_params(args.model_files, load_state)..

When we see there is an assert failure. This usually mean only a subset of the cifar10 layers was involved in saving the weights. So we need to print out those layer information from the old weight file and see which layers are missing. We then delete those layers (temporarily) from cifar10 layers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is resnet_eval.py still functional/successful in loading weights?

model.load_params(args.model_file)
model.initialize(test, cost)
model.load_params(args.model_file, load_states=False)
model.initialize(val, cost)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, not specific to this task,
what is the difference between validation and test?
Normally we do training + validation. And then we do test with test dataset, right?
It seems neon has only being using training dataset and validation dataset for testing, not test dataset.

model.load_params(args.model_file)
mets=model.eval(test, metric=TopKMisclassification(k=5))
model.load_params(args.model_file, load_states=False)
mets=model.eval(val, metric=TopKMisclassification(k=5))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This vgg_neon.py seems to be only the inference path.
Is it OK to merge your VGG changes to #25?
Or merge changes in #25 to this PR is also OK.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #25 has both training + inference and updated readme.

@wei-v-wang wei-v-wang dismissed their stale review February 2, 2018 19:27

new commits

@wei-v-wang
Copy link

@tpatejko Can you please work on uploading the new formatted weight files to aws first?

Copy link

@wei-v-wang wei-v-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Need to upload new weights to aws and verify resnet_cifar10 functionality.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants