Summary:
change `TracInCP._self_influence_batch_tracincp` and `TracInCPFast._self_influence_batch_tracincp_fast` to be named `self_influence`, which is now public, and to accept a DataLoader yielding batches (as well as a single batch, as before). The modified helper can be called by external functions to compute self influence.
The helper itself is also changed to improve efficiency by reducing the number of times checkpoints are loaded. Despite now being able to compute self influence scores for a DataLoader yielding batches, the modified helper still loads each checkpoint only once per call. This is because it now iterates over checkpoints in the outer loop and over batches in the inner loop (the order of iteration is reversed compared to before). This helper is called by `influence` when running it in self influence mode.
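The reversed iteration order can be sketched as follows (this is an illustrative outline, not the actual Captum implementation; `load_checkpoint` and `score_batch` are hypothetical stand-ins for checkpoint loading and per-batch score computation):

```python
def self_influence(batches, checkpoints, load_checkpoint, score_batch):
    """Accumulate per-example self influence scores across checkpoints.

    Outer loop over checkpoints, inner loop over batches: each checkpoint
    is loaded exactly once per call, regardless of how many batches the
    DataLoader yields.
    """
    totals = None
    for ckpt in checkpoints:            # outer loop: one load per checkpoint
        model = load_checkpoint(ckpt)
        # inner loop: score every batch under the currently loaded checkpoint
        flat = [s for batch in batches for s in score_batch(model, batch)]
        if totals is None:
            totals = flat
        else:
            totals = [t + s for t, s in zip(totals, flat)]
    return totals
```

With the previous order (outer loop over batches), each checkpoint would be reloaded once per batch, i.e. `len(batches) * len(checkpoints)` loads instead of `len(checkpoints)`.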
The reason we cannot simply increase the batch size to reduce the number of checkpoint loads is that for large models (precisely those for which loading checkpoints is expensive), the model itself takes up too much memory, so the batch size cannot be made large.
Minor change: for the `influence_src_dataset` argument of all `__init__`'s, add a description of the assumptions we make about the batches yielded by the DataLoader.
Reviewed By: NarineK
Differential Revision: D35603078
fbshipit-source-id: aff397c8278d60f1eb93f126d9703fe447c6ca71