-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Update dataloaders to use Aggregation Lists #264
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far!!
I have a marked a couple of places where I think some methods can be re-written a bit cleaner, and I am asking for a pretty substantial architecture change for the TF and Torch data generators, but overall, the meat of this PR looks fantastic!
AS always feel free to lmk what you think!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments in addition to Matt's comments
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## develop #264 +/- ##
===========================================
+ Coverage 84.89% 87.53% +2.64%
===========================================
Files 60 60
Lines 3423 3386 -37
===========================================
+ Hits 2906 2964 +58
+ Misses 517 422 -95
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!! A found a couple of last minute things to address, but overall this looks about ready to go!
@@ -41,7 +41,7 @@ | |||
"outputs": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running the notebook in a new dev environment (numpy 1.24). I got the error below. Downgrading to 1.23 worked. Do you think we should pin to a version or update the heat transfer source files? It seems like the data type in question was deprecated several years ago.
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[3], line 14
11 size = 64
13 for _ in range(3):
---> 14 u_s = fd2d_heat_steady_test01 (size, size)
15 pcolor_list(u_s, "Left: initial temperature. Right: steady state.")
File ~/craylabs/SmartSim/tutorials/ml_training/surrogate/steady_state.py:270, in fd2d_heat_steady_test01(nx, ny)
267 source_centers = 0.2+np.random.rand(np.random.randint(1,6),2)*0.6
269 Xgrid, Ygrid = np.meshgrid(xvec, yvec)
--> 270 u_init = np.zeros_like(Xgrid).astype(np.bool)
271 for center in source_centers:
272 u_init |= (Xgrid-center[0])**2 + (Ygrid-center[1])**2 < 0.05**2
File ~/miniconda3/envs/ss_test/lib/python3.9/site-packages/numpy/__init__.py:305, in __getattr__(attr)
300 warnings.warn(
301 f"In the future `np.{attr}` will be defined as the "
302 "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
304 if attr in __former_attrs__:
--> 305 raise AttributeError(__former_attrs__[attr])
307 # Importing Tester requires importing all of UnitTest which is not a
308 # cheap import Since it is mainly used in test suits, we lazy import it
309 # here to save on the order of 10 ms of import time for most users
310 #
...
AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was able to confirm, as the deprecation message suggests, that swapping np.bool -> bool
appears to give the desired output with both numpy==1.23.0
and numpy==1.24.2
. I'm in favor of simply updating the simulation code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that code is old, fixing it won't hurt! Thanks for the catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!! Thanks for all the hard work on this one! The use of aggregation lists here make this whole section much easier to understand/utilize/extend imo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thanks!
This PR updates the TF and PyTorch data loaders to make use of SmartRedis's aggregation lists.
When training in parallel, we need to adopt a round-robin distribution of
Datasets
. This is OK as long as the simulation and the training have the same producing/consuming speed, but if the simulation is way faster (or there are manyDataset
s in the list when we start to train), we end up making many calls to get the interleaved batches. We could add another parameter in the future, to change the interleaving/stride across ranks, but for now, I think it will be fine.