This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Sparse benchmark #157

Merged: 7 commits merged into awslabs:dev on Aug 22, 2018

Conversation

@kalyc commented Aug 17, 2018

Summary

Added a primary benchmarking model for sparse data support, using synthetic data.

Related Issues

PR Overview

  • [n] This PR requires new unit tests [y/n] (make sure tests are included)
  • [y] This PR requires the documentation to be updated [y/n] (make sure the docs are up-to-date)
  • [y] This PR is backwards compatible [y/n]
  • [n] This PR changes the current API [y/n]

@kalyc force-pushed the sparse-benchmark branch from 499ed9c to 12bfbc6 on August 20, 2018 16:48
@kalyc changed the title from Sparse benchmark to [WIP] Sparse benchmark on Aug 20, 2018
@roywei left a comment

Thanks for the contribution! Reviews are inline.

# Fix random seed for reproducibility
np.random.seed(7)

start = time.time()

It's better to time only the training, since the data generation code is different in pure MXNet (there is an extra step to create iterators).
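
A minimal sketch of timing only the fit call (the helper name is hypothetical; the model and data come from the surrounding script):

```python
import time


def timed_fit(model, train_data, train_label, epochs, batch_size):
    # Data generation / iterator creation happens before this call,
    # so only the training step is measured.
    start = time.time()
    model.fit(train_data, train_label, epochs=epochs, batch_size=batch_size, verbose=0)
    return time.time() - start
```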

@kalyc (Author) replied:

Good point, fixed.

from scipy.sparse import csc_matrix

# Fix random seed for reproducibility
np.random.seed(7)

Can we create a common file to run the benchmark that generates a random dataset instead of using a fixed seed? Something like run_sparse_benchmark.py, with logic along these lines (see the sketch after this list):

1. Generate a random dataset
2. Pass the dataset to `keras_tf_model.py` to train; it returns accuracy, loss, and training time
3. Pass the dataset to `mxnet_sparse_model.py` to train; it returns accuracy, loss, and training time

This way you can make sure the same data is passed to both, and the data is randomly generated on each run.
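
A rough sketch of what such a runner could look like (the `train` entry points, data shapes, and default sizes are assumptions, not the PR's actual code):

```python
import numpy as np
from scipy import sparse

# Hypothetical per-backend entry points; each is assumed to train on the given
# data and return (accuracy, loss, training time in seconds).
from keras_tf_model import train as train_keras_tf
from mxnet_sparse_model import train as train_mxnet


def generate_data(samples=10000, features=1000, density=0.01):
    # Random sparse CSR design matrix and dense targets, regenerated on every run.
    train_data = sparse.random(samples, features, density=density, format='csr')
    train_label = np.random.random((samples, 1))
    return train_data, train_label


if __name__ == '__main__':
    data, label = generate_data()
    for name, train in [('keras-tf', train_keras_tf), ('mxnet', train_mxnet)]:
        acc, loss, train_time = train(data, label, batch_size=128, epochs=1000)
        print('%s: acc=%.4f loss=%.4f time=%.2fs' % (name, acc, loss, train_time))
```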

@kalyc (Author) replied:

Done

@roywei left a comment

A few minor comments.


start = time.time()
model.fit(train_data, train_label,
epochs=1000,

The number of epochs can be a configurable parameter.

@kalyc (Author) replied:

Done


def invoke_benchmark(batch_size, mode):
# Fix random seed for reproducibility
np.random.seed(7)

Now we can remove this fixed seed, since the same data is passed to both modes.

@kalyc (Author) replied:

Done



### Results
| Instance Type | GPUs | Batch Size | MXNet (Time/Epoch) | Keras-TensorFlow (Time/Epoch) |

We can mention here that if users want to reproduce the exact result, they can use the fixed seed 7. In general, we should observe consistent/similar results with different random datasets.
Alternatively, we can show an average of a few runs.

@kalyc (Author) replied:

Fixed

@roywei closed this Aug 21, 2018
@roywei reopened this Aug 21, 2018
@kalyc force-pushed the sparse-benchmark branch from 3932458 to 8110427 on August 21, 2018 17:34
@@ -124,4 +124,4 @@ script:
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/test_documentation.py;
else
PYTHONPATH=$PWD:$PYTHONPATH py.test tests/ --ignore=tests/integration_tests --ignore=tests/test_documentation.py --ignore=tests/keras/legacy/layers_test.py --cov-config .coveragerc --cov=keras tests/;
fi
fi

Keep this the same as the dev branch.

@kalyc (Author) replied:

It is the same as dev; for some reason it's showing a diff.


The diff is that an extra line at the end of the file was deleted.

@roywei left a comment

LGTM! Please fix the .travis.yml diff before merging.

@sandeep-krishnamurthy left a comment

Thanks for your contributions, Kalyanee!

Is any metric on memory consumption applicable?

metric = mx.metric.MSE()
mse = model.score(eval_iter, metric)
print("Achieved {0:.6f} validation MSE".format(mse[0][1]))
assert model.score(eval_iter, metric)[0][1] < 0.01001, "Achieved MSE (%f) is larger than expected (0.01001)" % \


Nit: why assert in a benchmark script?
Same comment for the TF script above.
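
If the assert were dropped, the same check could simply be reported (a sketch only; `model` and `eval_iter` are assumed to come from the surrounding benchmark script):

```python
import mxnet as mx


def report_mse(model, eval_iter, threshold=0.01001):
    # Report validation MSE instead of asserting on it.
    metric = mx.metric.MSE()
    mse = model.score(eval_iter, metric)[0][1]
    print("Achieved {0:.6f} validation MSE".format(mse))
    if mse >= threshold:
        print("Warning: MSE is larger than expected (%.5f)" % threshold)
    return mse
```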


Also, we are generating random data, so accuracy/loss doesn't add any value here?

@kalyc (Author) replied:

For the first comment: I followed the style of the benchmarking script for the linear regression model written in MXNet - https://mxnet.incubator.apache.org/tutorials/python/linear-regression.html

As we have pre-defined train and test sets, even if they are synthetic, loss/accuracy can still help us understand how the model is performing.

@kalyc (Author) replied:

We didn't track memory usage, as we just wanted to compare the speed of MXNet vs Keras-TF on sparse tensors.


Agreed that accuracy/loss has no value for synthetic data; even if the accuracy increases, it does not mean the model is performing better.
You can extend this in the future if you want to include a real dataset, and only check accuracy/loss on the real dataset.


About memory consumption: it's a good value-add, as sparse tensors save a lot of memory, and it's very easy to report. Just run on a GPU instance and report the output of nvidia-smi.
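
One lightweight way to capture that in the runner (a sketch; requires an NVIDIA driver and assumes `nvidia-smi` is on the PATH):

```python
import subprocess


def report_gpu_memory():
    # Query per-GPU memory usage; call this while training is in progress.
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv'])
    print(out.decode())
```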

@kalyc (Author) replied Aug 22, 2018:

Yes, that's a good point. I will add memory usage in another PR; this one only tracks CPU performance.

### Configuration
| Dataset | Synthetic(Randomly generated) |
| :--------------- | :----------------------------------------------------------- |
| Keras | v2.2.0 |


2.2.2?

@kalyc (Author) replied:

updated

| C5.18X Large | 0 | 128 | 5.72 sec | 9.86 sec |

### Note
For reproducing above results set seed to `7` by adding this line in the `run_sparse_benchmark` script - `np.random.seed(7)`


Why don't we add this line ourselves, rather than asking the user to add it?

@kalyc (Author) replied:

We wanted to avoid the assumption that this benchmark is reproducible on only one dataset. So, as per comments by @roywei, we moved it into the README as details on how to reproduce these values.


Or we can report an average of multiple runs. My point is that better performance on a single fixed seed is not convincing enough.

### Note
For reproducing above results set seed to `7` by adding this line in the `run_sparse_benchmark` script - `np.random.seed(7)`

Run the file as `python run_sparse_benchmark.py --batch=128 --epochs=1000`


Why 1000 epochs? Shouldn't 25 be good enough?

@kalyc (Author) replied:

25 epochs doesn't converge well for either backend when the MSE limit is set to 0.01001: with 25 epochs the MXNet model reaches MSE 0.020521 and Keras-TF reaches MSE 0.017107.

Results below show the performance comparison of linear regression with MXNet vs Keras-Tensorflow using sparse tensors
```

### Configuration


Please add a comment about the sparse type used (csr).

@kalyc (Author) replied:

done

from scipy import sparse


def invoke_benchmark(batch_size, epochs):


Nit: it would be useful going forward to have two functions, invoke_tf_keras_benchmark() and invoke_mxnet_benchmark(), and maybe take a framework-name option from the CLI?
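
A possible shape for that (a sketch; the function bodies are placeholders, not the PR's actual code):

```python
import argparse


def invoke_tf_keras_benchmark(batch_size, epochs):
    # Placeholder: build and train the Keras-TensorFlow sparse model here.
    print('Keras-TF sparse benchmark: batch=%d, epochs=%d' % (batch_size, epochs))


def invoke_mxnet_benchmark(batch_size, epochs):
    # Placeholder: build and train the MXNet sparse model here.
    print('MXNet sparse benchmark: batch=%d, epochs=%d' % (batch_size, epochs))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--framework', choices=['tf-keras', 'mxnet', 'all'], default='all')
    parser.add_argument('--batch', type=int, default=128)
    parser.add_argument('--epochs', type=int, default=1000)
    args = parser.parse_args()

    if args.framework in ('tf-keras', 'all'):
        invoke_tf_keras_benchmark(args.batch, args.epochs)
    if args.framework in ('mxnet', 'all'):
        invoke_mxnet_benchmark(args.batch, args.epochs)
```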

@kalyc (Author) replied Aug 22, 2018:

Yes, that was in earlier revisions; we decided to go forward with a single function that invokes both benchmarks.

@kalyc merged commit cd75046 into awslabs:dev on Aug 22, 2018