This might be a stupid question but I couldn't find a solution anywhere.
When I run non-negative decompositions of a random tensor on the GPU, it is much slower than on the CPU, across a range of tensor sizes. For reference, a single decomposition takes about 0.4 seconds on the CPU but more than 10 seconds on the GPU (for a 3x2x2 tensor, and the same holds for 100x100x1000). I have PyTorch with CUDA 11.1 and cuDNN installed, and my GPU is an RTX 3070, so it should theoretically beat my CPU?
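For context, here is a minimal benchmarking sketch of what I am doing. It assumes TensorLy with the PyTorch backend and its `non_negative_parafac` function; the shape, rank, and iteration count are placeholders, and the warm-up/synchronization steps are just there to keep CUDA startup and asynchronous launches out of the measured time.

```python
import time

import torch
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

tl.set_backend('pytorch')

def time_decomposition(device, shape=(100, 100, 1000), rank=10):
    # Random non-negative tensor on the requested device
    tensor = torch.rand(shape, device=device)

    # Warm-up run so CUDA context setup is not included in the timing
    non_negative_parafac(tensor, rank=rank, n_iter_max=100)
    if device == 'cuda':
        torch.cuda.synchronize()

    start = time.perf_counter()
    non_negative_parafac(tensor, rank=rank, n_iter_max=100)
    if device == 'cuda':
        # Wait for all queued GPU work before stopping the clock
        torch.cuda.synchronize()
    return time.perf_counter() - start

print('cpu :', time_decomposition('cpu'))
if torch.cuda.is_available():
    print('cuda:', time_decomposition('cuda'))
```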