[CI] Framework and hardware-specific CI tests #997
Conversation
The documentation is not available anymore as the PR was closed or merged.
OMP_NUM_THREADS: 4
MKL_NUM_THREADS: 4
The CPU runner has 8 cores => 2 pytest workers * 4 cores each.
The speed isn't affected by this change (it's only faster thanks to the new Docker image).
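For reference, a small diagnostic sketch (not part of this PR) that could be run inside the container to confirm which thread caps actually took effect:

```python
# Diagnostic sketch only: print the thread caps exported by the CI job
# next to the intra-op thread count torch actually ends up using.
import os

import torch

print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))
print("MKL_NUM_THREADS =", os.environ.get("MKL_NUM_THREADS"))
print("torch intra-op threads:", torch.get_num_threads())
```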
matrix:
  config:
    - name: Fast PyTorch CPU tests on Ubuntu
      framework: pytorch
      runner: docker-cpu
      image: diffusers/diffusers-pytorch-cpu
      report: torch_cpu
    - name: Fast Flax CPU tests on Ubuntu
      framework: flax
      runner: docker-cpu
      image: diffusers/diffusers-flax-cpu
      report: flax_cpu
    - name: Fast ONNXRuntime CPU tests on Ubuntu
      framework: onnxruntime
      runner: docker-cpu
      image: diffusers/diffusers-onnxruntime-cpu
      report: onnx_cpu
This matrix defines the different combinations of frameworks, docker images and runners to test
Very nice
class OnnxStableDiffusionPipelineIntegrationTests(unittest.TestCase):
    def test_inference(self):
        provider = (
            "CUDAExecutionProvider",
            {
                "gpu_mem_limit": "17179869184",  # 16GB.
                "arena_extend_strategy": "kSameAsRequested",
            },
        )
Onnx tests now run with the CUDA provider. This enables us to add more integration tests without worrying about inference speed.
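For context, a minimal sketch (not code from this PR; "model.onnx" is a placeholder path) of how a (provider name, provider options) tuple like the one above is passed to ONNX Runtime:

```python
# Sketch only: hand a provider tuple with options to ONNX Runtime.
import onnxruntime as ort

provider = (
    "CUDAExecutionProvider",
    {
        "gpu_mem_limit": "17179869184",  # 16GB arena limit
        "arena_extend_strategy": "kSameAsRequested",
    },
)

# Providers are tried left to right, so the CPU provider acts as a fallback.
session = ort.InferenceSession("model.onnx", providers=[provider, "CPUExecutionProvider"])
print(session.get_providers())
```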
assert images.shape == (8, 1, 128, 128, 3)
assert np.abs(np.abs(images[0, 0, :2, :2, -2:], dtype=np.float32).sum() - 3.1111548) < 1e-3
assert np.abs(np.abs(images, dtype=np.float32).sum() - 199746.95) < 5e-1
Not sure why this slow (tpu) test had different values before. Updated the reference values to what I've got on the TPU runner with jax[tpu]
if jax_device == "tpu":
    assert abs(result_sum - 255.0714) < 1e-2
    assert abs(result_mean - 0.332124) < 1e-3
else:
    assert abs(result_sum - 255.1113) < 1e-2
    assert abs(result_mean - 0.332176) < 1e-3
The scheduler tests needed some adjustments for when they're running on TPU.
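As a hedged sketch (the exact helper used in the tests may differ), the jax_device switch in these snippets can be derived from JAX's default backend:

```python
# Sketch: detect the active JAX platform, which the jax_device switch keys on.
# jax.default_backend() returns "cpu", "gpu", or "tpu".
import jax

jax_device = jax.default_backend()

# Pick the platform-specific reference value, mirroring the assertion above.
expected_sum = 255.0714 if jax_device == "tpu" else 255.1113
print(jax_device, expected_sum)
```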
if jax_device == "tpu":
    pass
    # FIXME: both result_sum and result_mean are nan on TPU
    # assert jnp.isnan(result_sum)
    # assert jnp.isnan(result_mean)
else:
    assert abs(result_sum - 149.0784) < 1e-2
    assert abs(result_mean - 0.1941) < 1e-3
Probably not too urgent to fix
great! glad the tests are catching these things though!
"jax>=0.2.8,!=0.3.2,<=0.3.6", | ||
"jaxlib>=0.1.65,<=0.3.6", |
Removing the version cap from jax, as we can use the latest version now (and it's required for docker support)
The PR is now ready for review, lmk if something needs to be explained more! cc @muellerzr @ydshieh for optional reviews and/or inspiration :)
Wow - super cool! Great job :-)
Looks all good to me - happy to merge!
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
run: |
  python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
    -s -v -k "Flax" \
(nit) I think it's a bit safer/easier to work with environment variables, e.g. RUN_FLAX=True/False, and a test decorator, but ok for me for now!
Good idea, will add it soon!
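A minimal sketch of the suggested gate (RUN_FLAX is the hypothetical variable name from the comment above; diffusers may end up exposing a different helper):

```python
# Sketch of an env-var gate for Flax tests; RUN_FLAX is a hypothetical name.
import os
import unittest

_run_flax = os.environ.get("RUN_FLAX", "False").lower() in ("true", "1", "yes")


def require_flax_run(test_case):
    """Skip the decorated test unless RUN_FLAX is set to a truthy value."""
    return unittest.skipUnless(_run_flax, "set RUN_FLAX=True to run Flax tests")(test_case)


@require_flax_run
class FlaxSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)
```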
Very cool! Looks good to me
container:
-  image: python:3.7
+  image: ${{ matrix.config.image }}
+  options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
Don't we need --gpus 0 or --gpus all if we want to use the GPU inside Docker? In the transformers CI, we specify it.
Oh, these are the PR tests, and only on CPU. Sorry to bother!
env:
  HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
run: |
-  python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=tests_torch_gpu tests/
+  python -m pytest -n 0 \
Do we use -n 0 to disable xdist for Flax?
Precisely! Looks like jax[tpu] doesn't like being launched with multiprocessing at all: the TPU gets reserved by the parent process and the tests can't access it afterwards (see jax-ml/jax#10192).
Very nice and clean usage of matrix!
* [WIP][CI] Framework and hardware-specific docker images for CI tests
* username
* fix cpu
* try out the image
* push latest
* update workspace
* no root isolation for actions
* add a flax image
* flax and onnx matrix
* fix runners
* add reports
* onnxruntime image
* retry tpu
* fix
* fix
* build onnxruntime
* naming
* onnxruntime-gpu image
* onnxruntime-gpu image, slow tests
* latest jax version
* trigger flax
* run flax tests in one thread
* fast flax tests on cpu
* fast flax tests on cpu
* trigger slow tests
* rebuild torch cuda
* force cuda provider
* fix onnxruntime tests
* trigger slow
* don't specify gpu for tpu
* optimize
* memory limit
* fix flax tests
* disable docker cache
Now we have the following GitHub Actions runners as separate machines: docker-cpu, docker-gpu, docker-tpu, and apple-m1.

This PR sorts the tests to use the appropriate runners and base Docker images:

* Fast tests, not Flax or Onnx → diffusers-pytorch-cpu image on the docker-cpu runner, plus a conda env on the apple-m1 runner
* Fast Flax tests → diffusers-flax-cpu image on the docker-cpu runner
* Fast Onnx tests → diffusers-onnxruntime-cpu image on the docker-cpu runner
* Slow tests, not Flax or Onnx → diffusers-pytorch-cuda image on the docker-gpu runner
* Slow Flax tests → diffusers-flax-tpu image on the docker-tpu runner
* Slow Onnx tests → diffusers-onnxruntime-cuda image on the docker-gpu runner
* diffusers-pytorch-cuda image on the docker-gpu runner