This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Sparse benchmark #157

Merged: 7 commits merged into awslabs:dev on Aug 22, 2018

Conversation

@kalyc commented Aug 17, 2018

Summary

Added a primary benchmarking model for sparse data support, using synthetic data.

Related Issues

PR Overview

  • [n] This PR requires new unit tests [y/n] (make sure tests are included)
  • [y] This PR requires the documentation to be updated [y/n] (make sure the docs are up-to-date)
  • [y] This PR is backwards compatible [y/n]
  • [n] This PR changes the current API [y/n]

@kalyc force-pushed the sparse-benchmark branch from 499ed9c to 12bfbc6 on August 20, 2018 16:48
@kalyc changed the title from Sparse benchmark to [WIP] Sparse benchmark on Aug 20, 2018
@roywei left a comment

Thanks for the contribution! Reviews are inline.

# Fix random seed for reproducibility
np.random.seed(7)

start = time.time()

It's better to time only the training, since the data generation code is different in pure MXNet (there is an extra step to create iterators).
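
A minimal sketch of timing only the fit call (the helper name is hypothetical; the model and data come from the surrounding script):

```python
import time


def timed_fit(model, train_data, train_label, epochs, batch_size):
    # Data generation / iterator creation happens before this call,
    # so only the training step is measured.
    start = time.time()
    model.fit(train_data, train_label, epochs=epochs, batch_size=batch_size, verbose=0)
    return time.time() - start
```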

@kalyc (Author) replied:

Good point, fixed.

from scipy.sparse import csc_matrix

# Fix random seed for reproducibility
np.random.seed(7)

Can we create a common file to run the benchmark that generates a random dataset instead of using a fixed seed? Something like run_sparse_benchmark.py, with logic along these lines (see the sketch after this list):

1. Generate a random dataset
2. Pass the dataset to `keras_tf_model.py` to train; it returns accuracy, loss, and training time
3. Pass the dataset to `mxnet_sparse_model.py` to train; it returns accuracy, loss, and training time

This way you can make sure the same data is passed to both, and the data is randomly generated on each run.
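
A rough sketch of what such a runner could look like (the `train` entry points, data shapes, and default sizes are assumptions, not the PR's actual code):

```python
import numpy as np
from scipy import sparse

# Hypothetical per-backend entry points; each is assumed to train on the given
# data and return (accuracy, loss, training time in seconds).
from keras_tf_model import train as train_keras_tf
from mxnet_sparse_model import train as train_mxnet


def generate_data(samples=10000, features=1000, density=0.01):
    # Random sparse CSR design matrix and dense targets, regenerated on every run.
    train_data = sparse.random(samples, features, density=density, format='csr')
    train_label = np.random.random((samples, 1))
    return train_data, train_label


if __name__ == '__main__':
    data, label = generate_data()
    for name, train in [('keras-tf', train_keras_tf), ('mxnet', train_mxnet)]:
        acc, loss, train_time = train(data, label, batch_size=128, epochs=1000)
        print('%s: acc=%.4f loss=%.4f time=%.2fs' % (name, acc, loss, train_time))
```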

@kalyc (Author) replied:

Done

@roywei left a comment

A few minor comments.


start = time.time()
model.fit(train_data, train_label,
epochs=1000,

The number of epochs can be a configurable parameter.

@kalyc (Author) replied:

Done


def invoke_benchmark(batch_size, mode):
# Fix random seed for reproducibility
np.random.seed(7)

Now we can remove this fixed seed, since the same data is passed to both modes.

@kalyc (Author) replied:

Done



### Results
| Instance Type | GPUs | Batch Size | MXNet (Time/Epoch) | Keras-TensorFlow (Time/Epoch) |

We can mention here that if users want to reproduce the exact result, they can use the fixed seed 7. In general, we should observe consistent/similar results with different random datasets.
Alternatively, we can show an average of a few runs.

@kalyc (Author) replied:

Fixed

@roywei closed this Aug 21, 2018
@roywei reopened this Aug 21, 2018
@kalyc force-pushed the sparse-benchmark branch from 3932458 to 8110427 on August 21, 2018 17:34
@@ -124,4 +124,4 @@ script:
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/test_documentation.py;
else
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/ --ignore=tests/integration_tests --ignore=tests/test_documentation.py --ignore=tests/keras/legacy/layers_test.py --cov-config .coveragerc --cov=keras tests/;
fi
fi

Keep this the same as the dev branch.

@kalyc (Author) replied:

It is the same as dev; for some reason it's showing a diff.


The diff is that an extra line at the end of the file was deleted.

@roywei left a comment

LGTM! Please fix the .travis.yml diff before merging.

@sandeep-krishnamurthy left a comment

Thanks for your contributions, Kalyanee!

Is any metric on memory consumption applicable?

metric = mx.metric.MSE()
mse = model.score(eval_iter, metric)
print("Achieved {0:.6f} validation MSE".format(mse[0][1]))
assert model.score(eval_iter, metric)[0][1] < 0.01001, "Achieved MSE (%f) is larger than expected (0.01001)" % \


Nit: why assert in a benchmark script?
Same comment for the TF script above.
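
If the assert were dropped, the same check could simply be reported (a sketch only; `model` and `eval_iter` are assumed to come from the surrounding benchmark script):

```python
import mxnet as mx


def report_mse(model, eval_iter, threshold=0.01001):
    # Report validation MSE instead of asserting on it.
    metric = mx.metric.MSE()
    mse = model.score(eval_iter, metric)[0][1]
    print("Achieved {0:.6f} validation MSE".format(mse))
    if mse >= threshold:
        print("Warning: MSE is larger than expected (%.5f)" % threshold)
    return mse
```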


Also, we are generating random data, so accuracy/loss doesn't add any value here?

@kalyc (Author) replied:

For the first comment: I followed the style of the benchmarking script for the linear regression model written in MXNet - https://mxnet.incubator.apache.org/tutorials/python/linear-regression.html

As we have pre-defined train and test sets, even if they are synthetic, loss/accuracy can still help us understand how the model is performing.

@kalyc (Author) replied:

We didn't track memory usage, as we just wanted to compare the speed of MXNet vs Keras-TF on sparse tensors.


Agreed that accuracy/loss has no value for synthetic data; even if the accuracy increases, it does not mean the model is performing better.
You can extend this in the future if you want to include a real dataset, and only check accuracy/loss on the real dataset.


About memory consumption: it's a good value-add, as sparse tensors save a lot of memory, and it's very easy to report. Just run on a GPU instance and report the output of nvidia-smi.
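
One lightweight way to capture that in the runner (a sketch; requires an NVIDIA driver and assumes `nvidia-smi` is on the PATH):

```python
import subprocess


def report_gpu_memory():
    # Query per-GPU memory usage; call this while training is in progress.
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv'])
    print(out.decode())
```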

@kalyc (Author) replied Aug 22, 2018:

Yes, that's a good point. I will add memory usage in another PR; this one only tracks CPU performance.

### Configuration
| Dataset | Synthetic(Randomly generated) |
| :--------------- | :----------------------------------------------------------- |
| Keras | v2.2.0 |


2.2.2?

@kalyc (Author) replied:

updated

| C5.18X Large | 0 | 128 | 5.72 sec | 9.86 sec |

### Note
For reproducing above results set seed to `7` by adding this line in the `run_sparse_benchmark` script - `np.random.seed(7)`


Why don't we add this line ourselves, rather than asking the user to add it?

@kalyc (Author) replied:

We wanted to avoid the assumption that this benchmark is reproducible on only one dataset. So, as per comments by @roywei, we moved it into the README as details on how to reproduce these values.


Or we can report an average of multiple runs. My point is that better performance on a single fixed seed is not convincing enough.

### Note
For reproducing above results set seed to `7` by adding this line in the `run_sparse_benchmark` script - `np.random.seed(7)`

Run the file as `python run_sparse_benchmark.py --batch=128 --epochs=1000`


Why 1000 epochs? Shouldn't 25 be good enough?

@kalyc (Author) replied:

25 epochs doesn't converge well for either backend when the MSE limit is set to 0.01001: with 25 epochs the MXNet model reaches MSE 0.020521 and Keras-TF reaches MSE 0.017107.

Results below show the performance comparison of linear regression with MXNet vs Keras-Tensorflow using sparse tensors
```

### Configuration


Please add a comment about the sparse type used (csr).

@kalyc (Author) replied:

done

from scipy import sparse


def invoke_benchmark(batch_size, epochs):


Nit: it would be useful going forward to have two functions, invoke_tf_keras_benchmark() and invoke_mxnet_benchmark(), and maybe take a framework-name option from the CLI?
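
A possible shape for that (a sketch; the function bodies are placeholders, not the PR's actual code):

```python
import argparse


def invoke_tf_keras_benchmark(batch_size, epochs):
    # Placeholder: build and train the Keras-TensorFlow sparse model here.
    print('Keras-TF sparse benchmark: batch=%d, epochs=%d' % (batch_size, epochs))


def invoke_mxnet_benchmark(batch_size, epochs):
    # Placeholder: build and train the MXNet sparse model here.
    print('MXNet sparse benchmark: batch=%d, epochs=%d' % (batch_size, epochs))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--framework', choices=['tf-keras', 'mxnet', 'all'], default='all')
    parser.add_argument('--batch', type=int, default=128)
    parser.add_argument('--epochs', type=int, default=1000)
    args = parser.parse_args()

    if args.framework in ('tf-keras', 'all'):
        invoke_tf_keras_benchmark(args.batch, args.epochs)
    if args.framework in ('mxnet', 'all'):
        invoke_mxnet_benchmark(args.batch, args.epochs)
```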

@kalyc (Author) replied Aug 22, 2018:

Yes, that was in earlier revisions; we decided to go forward with a single function that invokes both benchmarks.

@kalyc merged commit cd75046 into awslabs:dev on Aug 22, 2018