Register emulated kernel implementations for RandomStandardNormal and TruncatedNormal #120

Merged
merged 1 commit into from
Dec 15, 2020

Contributor

@adtsai adtsai commented Dec 12, 2020

For some reason, on some models, TensorFlow forcibly colocates kernels like ApplyAdam with RandomUniform, RandomStandardNormal, and TruncatedNormal. This may be because the weights that Adam optimizes are initialized at the start of training by one of the Random operators.

This change registers a set of kernels to "emulate" support for RandomStandardNormal and TruncatedNormal. These kernels re-use the CPU implementations and merely upload the values to a GPU tensor. This means that computation is still done on the CPU (which is okay, since it's usually only done once during initialization), but the DML registration means they can now be colocated with other operators, like ApplyAdam.
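
To illustrate the idea, here is a minimal, self-contained sketch of the pattern. It is not the code in this PR and does not use the TensorFlow or DirectML APIs. `DeviceTensor`, `KernelRegistry`, `CpuRandomStandardNormal`, `EmulatedRandomStandardNormal`, and `GroupCanRunOn` are hypothetical names. The placement rule is also an assumption of this toy model: a colocation group can only land on a device for which every op in the group has a registered kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU tensor. A real kernel would own a
// device resource; here a plain vector plays the role of device memory.
struct DeviceTensor {
    std::vector<float> data;
};

// Stand-in for the existing CPU implementation: fills a host buffer
// with samples from a standard normal distribution.
std::vector<float> CpuRandomStandardNormal(std::size_t count, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> host(count);
    for (float& value : host) {
        value = dist(rng);
    }
    return host;
}

// "Emulated" device kernel: all computation happens on the CPU, and the
// result is then copied into the device tensor. The copy below stands in
// for a host-to-GPU upload.
DeviceTensor EmulatedRandomStandardNormal(std::size_t count, unsigned seed) {
    const std::vector<float> host = CpuRandomStandardNormal(count, seed);
    DeviceTensor out;
    out.data.assign(host.begin(), host.end());
    assert(out.data.size() == count);
    return out;
}

// Toy kernel registry: op name -> devices that have a kernel for that op.
using KernelRegistry = std::map<std::string, std::set<std::string>>;

// Assumed placement rule for this sketch: a colocation group can run on
// `device` only if every op in the group has a kernel registered for it.
bool GroupCanRunOn(const KernelRegistry& registry,
                   const std::vector<std::string>& group,
                   const std::string& device) {
    for (const std::string& op : group) {
        const auto it = registry.find(op);
        if (it == registry.end() || it->second.count(device) == 0) {
            return false;
        }
    }
    return true;
}
```

In this model, a group containing ApplyAdam and RandomStandardNormal cannot be placed on "DML" while RandomStandardNormal is registered only for "CPU". Adding a "DML" entry for it, backed by something like `EmulatedRandomStandardNormal`, makes the whole group placeable on "DML" even though the random values are still generated on the host.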

This change should improve our AI-Benchmark scores by about 5-10%.

Contributor

@PatriceVignola PatriceVignola left a comment

:shipit:

@jstoecker
Contributor

We need to track that these aren't truly implemented. Can you add some metadata to the ops table to track emulated kernel implementations?

@adtsai
Contributor Author

adtsai commented Dec 15, 2020

@jstoecker I'll add something to the op report for these kernels.

@adtsai adtsai merged commit a2134ad into directml Dec 15, 2020
jstoecker pushed a commit that referenced this pull request Dec 15, 2020
jstoecker added a commit that referenced this pull request Dec 15, 2020
Merges some of the recent changes from the directml branch:
* Use compute queue for AMD devices (#102)
* Register List Kernels for DML (#95)
* Update DirectMLX to latest (#104)
* Remove extra rows from test email (#106)
* Fix DML's Select kernel for int64 (#113)
* Fix list kernels and tensor array ops registration (#114)
* Simplify CI scripts (#112)
* Fix StridedSlice's input size coalescing (#115)
* Disable int64 image test (#116)
* Fix network share copy path (#117)
* Pipeline should continue if a test job fails (#118)
* Switch network share path to use build number instead of build ID
* Add missing HostMemory int32 registrations for _Arg and _RetVal (#122)
* Implement all the arithmetic Scatter and ResourceScatter operators (#121)
* Register emulated kernel implementations for RandomStandardNormal and TruncatedNormal (#120)
@adtsai adtsai deleted the p/adtsai/normal branch December 15, 2020 22:07
3 participants