Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24

tpatejko · 2017-11-07T12:21:43Z

All_cnn, Alexnet, Googlenet and Alexnet work with Neon 2.3 and Aeon 1.0.
DeepResnet works with Neon 2.2 due to errors related to bias fusion (?)

…nfig

tpatejko · 2017-11-07T12:26:29Z

DeepResnet on Neon 2.3 gives the following error:

File"resnet_eval.py", line 63, in
model = Model(load_obj(args.model_file), weights_only=True)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/models/model.py", line 70, in init
self.deserialize(layers, load_states=(not weights_only))
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/models/model.py", line 475, in deserialize
self.layers.load_weights(model_dict['model'], load_states)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 202, in load_weights
branch.load_weights(bdict, load_states=load_states)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 202, in load_weights
branch.load_weights(bdict, load_states=load_states)
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 198, in load_weights
pdict['config']['layers'] = self.fusion_pass(pdict['config']['layers'])
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 177, in fusion_pass
if any([pattern(l1, l2) for pattern in patterns]):
File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 172, in
y['type'] == 'neon.layers.layer.Bias']
TypeError: 'NoneType' object has no attribute 'getitem'

wei-v-wang · 2017-11-08T06:33:01Z

Thanks @tpatejko for the PR.
Regarding DeepResnet failure, could you please try the code pattern discussed in NervanaSystems/neon#415 ?
i.e. can you try to re-write Model(load_obj(args.model_file), load_states)
to
Model (...)
model.load_params(args.model_file, load_states)?

Also, could you please take a look at https://github.com/NervanaSystems/modelzoo/tree/VGG_neon2.3 to merge the best ingredients of the two for VGG?

Thank you!

baojun-nervana · 2017-11-09T23:26:56Z

ImageClassification/ILSVRC2012/Alexnet/test.sh

@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
 echo "Downloading weights file from ${WEIGHTS_URL}"
 curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

-python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1
+python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted/ --manifest_root /data/i1k-extracted --manifest train:/data/i1k-extracted/train-index.csv --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar  -z 256 | tee output.dat 2>&1


can you use "/dataset/aeon/I1K/i1k-extracted/" instead of "/data/i1k-extracted/" to be consistent with validation dataset structure?

baojun-nervana · 2017-11-09T23:30:28Z

ImageClassification/CIFAR10/DeepResNet/test.sh

@@ -23,14 +23,14 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
 echo "Downloading weights file from ${WEIGHTS_URL}"
 curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

-python -u $TEST_SCRIPT  -i ${EXECUTOR_NUMBER} -vvv --model_file $WEIGHTS_FILE --no_progress_bar -w /usr/local/data/CIFAR10/macrobatches | tee output.dat 2>&1
+python -u $TEST_SCRIPT  -i ${EXECUTOR_NUMBER} -vvv --model_file $WEIGHTS_FILE --manifest val:/data/CIFAR/val-index.csv --manifest_root /data/CIFAR -b gpu -z 32 --no_progress_bar 2>&1 | tee output.dat


can you use "/dataset/aeon/CIFAR10" instead of "/data/CIFAR"?

@baojun-nervana Out of curiosity, is there any particular reason why these paths should be changed?

@tpatejko that is the validation dataset directory, so we don't have to update those after the PR.

baojun-nervana · 2017-11-09T23:33:39Z

ImageClassification/ILSVRC2012/Alexnet/test.sh

@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
 echo "Downloading weights file from ${WEIGHTS_URL}"
 curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

-python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1
+python -u alexnet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted/ --manifest_root /data/i1k-extracted --manifest train:/data/i1k-extracted/train-index.csv --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar  -z 256 2>&1 | tee output.dat


can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?

baojun-nervana · 2017-11-09T23:34:40Z

ImageClassification/ILSVRC2012/Googlenet/test.sh

@@ -22,7 +22,7 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
 echo "Downloading weights file from ${WEIGHTS_URL}"
 curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

-python -u googlenet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar | tee output.dat 2>&1
+python -u googlenet_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted --manifest_root /data/i1k-extracted --manifest val:/data/i1k-extracted/val-index.csv -vvv --model_file $WEIGHTS_FILE --no_progress_bar 2>&1 | tee output.dat


can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?

baojun-nervana · 2017-11-09T23:34:58Z

ImageClassification/ILSVRC2012/VGG/test.sh

@@ -22,7 +22,8 @@ WEIGHTS_FILE=${WEIGHTS_URL##*/}
 echo "Downloading weights file from ${WEIGHTS_URL}"
 curl -o $WEIGHTS_FILE $WEIGHTS_URL 2> /dev/null

-python -u vgg_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /usr/local/data/I1K/macrobatches/ -vvv --model_file $WEIGHTS_FILE --no_progress_bar  -z 64 --vgg_version E | tee output.dat 2>&1
+python -u vgg_neon.py --test_only -i ${EXECUTOR_NUMBER} -w /data/i1k-extracted -vvv --manifest_root /data/i1k-extracted --manifest val:/data/i1k-extracted/val-index.csv --model_file $WEIGHTS_FILE --no_progress_bar -z 64 --vgg_version E 2>&1 | tee output.dat


can you use "/dataset/aeon/I1K/i1k-extracted" instead of "/data/i1k-extracted"?

tpatejko · 2017-11-13T15:34:44Z

@wei-v-wang Thanks for your comment

I tried to change the script following your remarks but I have a problem with the following part of code:

Model (...)
model.load_params(args.model_file, load_states)

but I'm not quite sure how to initialize Model part. I tried to use code that builds model in file resnet_cifar10.py but I encountered the following error:

File "/home/tpatejko/_aeon/ModelZoo/neon_2.3/neon/layers/container.py", line 200, in load_weights
assert len(pdict['config']['layers']) == len(self.layers)

Another question: could you elaborate on what you mean by merging best ingredients of two for VGG?

wei-v-wang · 2017-11-15T21:18:12Z

@tpatejko Sorry I must have missed your comments above.

I will take a look at the resnet_eval failure.

For VGG, I just create a PR to modelZoo as well. #25

Thanks!

wei-v-wang · 2017-11-15T21:24:12Z

ImageClassification/CIFAR10/All_CNN/readme.md

@@ -13,7 +13,7 @@ The trained weights file can be downloaded from AWS


 ### neon version
-The model weight file above has been generated using neon version tag [v1.4.0]((https://github.com/NervanaSystems/neon/releases/tag/v1.4.0).
+The model weight file above has been generated using neon version tag [v2.3.0]((https://github.com/NervanaSystems/neon/releases/tag/v2.3.0).


"The mode weight file" will probably need to be converted to new format. Have you already obtained a trained weight with neon v2.2/v2.3 format?

Assuming yes.

wei-v-wang · 2017-11-15T21:27:18Z

ImageClassification/ILSVRC2012/Alexnet/alexnet_neon.py

@@ -105,15 +135,15 @@

 # configure callbacks
 valmetric = TopKMisclassification(k=5)
-callbacks = Callbacks(model, eval_set=test, metric=valmetric, **args.callback_args)
+callbacks = Callbacks(model, eval_set=val, metric=valmetric, **args.callback_args)

 if args.model_file is not None:
    model.load_params(args.model_file)


model.load_params(args.model_file) should work. Why does resnet_eval.py in cifar10 does not support this.
We can see alexnet support this.

wei-v-wang · 2017-11-15T21:30:40Z

ImageClassification/CIFAR10/DeepResNet/resnet_eval.py

+    val_config = config(manifest_filename, manifest_root, batch_size, subset_pct)
+    return wrap_dataloader(AeonDataLoader(val_config))
+
+test_set = make_val_config(args.manifest["val"], args.manifest_root, batch_size=args.batch_size)

 model = Model(load_obj(args.model_file))


model=Model(load_obj(args.model_file)) will break the code, since neon v2.2 and v2.3 was not supposed to initialize a model using old weights. model=Model(load_obj(args.model_file) works with new weights.
So how do we get the new format of an old args.model_file.
Can we try to use model=Model(layers=cifar10_layers).
And do model.load_params(args.model_files, load_state)..

When we see there is an assert failure. This usually mean only a subset of the cifar10 layers was involved in saving the weights. So we need to print out those layer information from the old weight file and see which layers are missing. We then delete those layers (temporarily) from cifar10 layers.

Is resnet_eval.py still functional/successful in loading weights?

wei-v-wang · 2017-11-15T21:33:14Z

ImageClassification/ILSVRC2012/Googlenet/googlenet_neon.py

-model.load_params(args.model_file)
-model.initialize(test, cost)
+model.load_params(args.model_file, load_states=False)
+model.initialize(val, cost)


In general, not specific to this task,
what is the difference between validation and test?
Normally we do training + validation. And then we do test with test dataset, right?
It seems neon has only being using training dataset and validation dataset for testing, not test dataset.

wei-v-wang · 2017-11-15T21:35:49Z

ImageClassification/ILSVRC2012/VGG/vgg_neon.py

-model.load_params(args.model_file)
-mets=model.eval(test, metric=TopKMisclassification(k=5))
+model.load_params(args.model_file, load_states=False)
+mets=model.eval(val, metric=TopKMisclassification(k=5))


This vgg_neon.py seems to be only the inference path.
Is it OK to merge your VGG changes to #25?
Or merge changes in #25 to this PR is also OK.

PR #25 has both training + inference and updated readme.

…ayers

new commits

wei-v-wang · 2018-02-02T19:27:43Z

@tpatejko Can you please work on uploading the new formatted weight files to aws first?

wei-v-wang

Overall looks good. Need to upload new weights to aws and verify resnet_cifar10 functionality.

Tomasz Patejko added 19 commits October 19, 2017 00:12

Alexnet prepared for new Aeon: call to Aeon wrapper and changes in co…

02f75df

…nfig

Version changes in readme.md file

1fc4a10

Googlenet prepared for Aeon 1.0

19926d4

Incorrect commit information in readme.md

952048d

Changes in readme.md about Neon verion

d1aedf4

VGG changed for new Neon

ae6d4b0

VGG batch size changed to 64

ed06362

VGG readme.md updated

4ad1a24

Change in how output is stored in output.dat file

3e7736f

Changing version on Neon in readme file

f3eb100

CIFAR resnet corrected for neon 2.2

839eaf5

Resnet prepared for new data loader

4690ccf

Corrected documentation for neon 2.2

0655d21

Alexnet checked for Neon v2.3

f7e6337

Changes for neon 2.3: add load_states=False

ccd9daf

Changes in neon version

14be62e

Added load_states=False for neon 2.3

e58f718

Neon version changed to 2.3

70ff8df

All_CNN: version of neon changed to 2.3

9e8c7a5

tpatejko requested review from jennifermyers, wei-v-wang and hanlint and removed request for jennifermyers November 7, 2017 12:26

baojun-nervana reviewed Nov 9, 2017

View reviewed changes

wei-v-wang previously requested changes Nov 15, 2017

View reviewed changes

Tomasz Patejko added 3 commits December 9, 2017 22:12

Changing links to weight files in format with fused conv+bias

5502ec3

Alexnet converted to the new format with fused bias and convolution l…

8b4be94

…ayers

Googlenet uses weights file in new format with fused conv+bias layers

a428edb

wei-v-wang reviewed Feb 2, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24

Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24

tpatejko commented Nov 7, 2017

tpatejko commented Nov 7, 2017

wei-v-wang commented Nov 8, 2017

baojun-nervana Nov 9, 2017

baojun-nervana Nov 9, 2017

tpatejko Nov 13, 2017

baojun-nervana Nov 13, 2017

wei-v-wang Feb 2, 2018

baojun-nervana Nov 9, 2017

wei-v-wang Feb 2, 2018

baojun-nervana Nov 9, 2017

wei-v-wang Feb 2, 2018

baojun-nervana Nov 9, 2017

wei-v-wang Feb 2, 2018

tpatejko commented Nov 13, 2017

wei-v-wang commented Nov 15, 2017 •

edited

Loading

wei-v-wang Nov 15, 2017

wei-v-wang Feb 2, 2018

wei-v-wang Nov 15, 2017

wei-v-wang Nov 15, 2017

wei-v-wang Feb 2, 2018

wei-v-wang Nov 15, 2017

wei-v-wang Nov 15, 2017

wei-v-wang Nov 15, 2017

wei-v-wang commented Feb 2, 2018

wei-v-wang left a comment

Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24

Are you sure you want to change the base?

Image classification topologies corrected for Neon 2.2 and Neon 2.3 #24

Conversation

tpatejko commented Nov 7, 2017

tpatejko commented Nov 7, 2017

wei-v-wang commented Nov 8, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tpatejko commented Nov 13, 2017

wei-v-wang commented Nov 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wei-v-wang commented Feb 2, 2018

wei-v-wang left a comment

Choose a reason for hiding this comment

wei-v-wang commented Nov 15, 2017 •

edited

Loading