Roadmap

This document lists the general directions that the core team would like to see developed in PyTorch-Ignite.

We use GitHub Projects to track our goals: releases, particular milestones, etc.

Principal goals

continue maintaining high-quality, well-tested and documented modules.
provide distributed framework support via ignite.distributed: XLA (e.g. TPU), Horovod
provide a new higher-level API based on Engine, as a contrib module, to simplify usage while keeping flexibility
provide helpers for data management via ignite.data: sampling, multi-dataloaders
provide more integrations with other tools to simplify end-to-end Machine/Deep Learning applications.
visibility and communications

Codebase maintenance

add typing to the whole package
adapt the code and add mypy check
merge the contrib module into the main library?

Pre-built Docker images

Provide helper Docker images to quick-start a task:
https://hub.docker.com/orgs/pytorchignite

Distributed framework support

XLA devices support via pytorch/xla
Horovod
Explore DDP + RPC
Better support for different types of parallelism: data, model, pipeline.

Metrics

All metrics work in a distributed setting
- configurable distributed metrics reduce/gather methods
Minor improvements:
- better support of sklearn metrics
- Classification metrics with micro/macro options
Metrics for NLP: to define
Metrics for GANs: FID, PPL (#998)
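
To make the micro/macro option concrete, here is a minimal sketch in plain Python of the two averaging modes for precision. It is not Ignite's metrics API: the function name precision_micro_macro and the sample labels are invented for this illustration. Because both modes are computed from per-class counts, those counts are also what a distributed implementation would sum across processes before dividing.

```python
from collections import Counter


def precision_micro_macro(y_true, y_pred):
    """Return (micro, macro) precision for non-empty multiclass label lists.

    Hypothetical helper for illustration only, not part of Ignite.
    """
    tp = Counter()  # true positives per predicted class
    fp = Counter()  # false positives per predicted class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1

    # Micro: pool the counts over all classes, then divide once.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    micro = total_tp / (total_tp + total_fp)

    # Macro: compute precision per class, then take the unweighted mean.
    # A class that is never predicted contributes 0.0 in this sketch.
    classes = sorted(set(y_true) | set(y_pred))
    per_class = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in classes
    ]
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Imbalanced toy data: class 0 dominates, class 1 is never predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 2, 2]
micro, macro = precision_micro_macro(y_true, y_pred)
print(micro, macro)
```

On this data the micro average is dominated by the frequent class, while the macro average is pulled down by the rare classes, which is why exposing both options matters.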

See also related GSoC 2021 project idea description

Higher-level API

push-button contrib trainers with AMP, distributed etc
automatic batch size via toma
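
As an illustration of what "automatic batch size" could mean, the sketch below retries a step with a halved batch size whenever it runs out of memory. This is the general idea only, written in plain Python: it does not use toma's API, the names run_with_adaptive_batch_size and fake_step are hypothetical, and MemoryError stands in for a CUDA out-of-memory error.

```python
def run_with_adaptive_batch_size(step, initial_batch_size, min_batch_size=1):
    """Call step(batch_size), halving the batch size after each MemoryError.

    Returns (batch_size_used, step_result). Re-raises the error if the
    minimum batch size still does not fit.
    """
    batch_size = initial_batch_size
    while True:
        try:
            return batch_size, step(batch_size)
        except MemoryError:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size):
    # Stand-in for a training step: pretend that more than 100 samples
    # do not fit in memory.
    if batch_size > 100:
        raise MemoryError(f"batch of {batch_size} does not fit")
    return f"ok with {batch_size}"


used, result = run_with_adaptive_batch_size(fake_step, initial_batch_size=512)
print(used, result)
```

A push-button trainer could wrap its first iterations in such a loop and then keep the batch size that succeeded.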

See also related GSoC 2021 project idea description

Refactor Engine

The Engine in the 0.4.x versions contains several major bugs related to the way event triggering and counting are implemented. In particular, event filtering requires the state and its corresponding attributes to be available, which is not a clean design. Solving the open issues labelled "module: engine" (https://github.com/pytorch/ignite/issues?q=is%3Aissue+is%3Aopen+label%3A%22module%3A+engine%22) requires a major Engine redesign while keeping as much backward compatibility as possible.

Run/Resume logic improvements

Currently, the engine's behaviour is somewhat unclear about when a run restarts from the beginning and when it continues.

Currently

# (re)start from 0 to 5
engine.run(data, max_epochs=5) -> Engine run starting with max_epochs=5 => state.epoch=5

# continue from 5 to 7
engine.run(data, max_epochs=7) -> Engine run resuming from iteration 50, epoch 5 until 7 epochs => state.epoch=7

# error
engine.run(data, max_epochs=4) -> ValueError: Argument max_epochs should be larger than the start epoch

# restart from 0 to 7 (state.epoch == max_epochs (=7); it has to work this way because we always call evaluator.run(data) repeatedly without any other instructions)
engine.run(data, max_epochs=7) -> Engine run starting with max_epochs=7 => state.epoch=7

# forced restart from 0 to 5
engine.state.max_epochs = None
engine.run(data, max_epochs=5) -> Engine run starting with max_epochs=5 => state.epoch=5

# forced restart from 0 to 9, instead of continue from state.epoch=7
engine.state.max_epochs = None
engine.run(data, max_epochs=9) -> Engine run starting with max_epochs=9 => state.epoch=9
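
The cases above can be summarised as a small decision rule. The following is a toy model inferred from those cases only, not Ignite's actual Engine code: ToyState and run_current are hypothetical names, and the "run" simply jumps to the final epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyState:
    epoch: int = 0
    max_epochs: Optional[int] = None


def run_current(state: ToyState, max_epochs: int) -> str:
    """Toy model of the current behaviour; returns "starting" or "resuming"."""
    if state.max_epochs is not None:
        # A previous run exists: a target below the reached epoch is rejected.
        if max_epochs < state.epoch:
            raise ValueError("Argument max_epochs should be larger than the start epoch")
        state.max_epochs = max_epochs
    if state.max_epochs is None or state.epoch >= state.max_epochs:
        # No previous run, or the previous run is already complete: restart from 0.
        state.epoch = 0
        state.max_epochs = max_epochs
        mode = "starting"
    else:
        mode = "resuming"
    state.epoch = state.max_epochs  # pretend the run completes without interruption
    return mode


state = ToyState()
log = [run_current(state, 5), run_current(state, 7)]  # start 0 -> 5, then resume 5 -> 7
try:
    run_current(state, 4)  # smaller than state.epoch=7
except ValueError:
    log.append("error")
log.append(run_current(state, 7))  # state.epoch == max_epochs: restart 0 -> 7
state.max_epochs = None            # the forced-restart workaround
log.append(run_current(state, 9))  # restart 0 -> 9
print(log)
```

The model makes the asymmetry visible: the same call can start, resume or fail depending on hidden state, and a forced restart requires mutating the state by hand.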

The proposal is to change this slightly, addressing the "error" case and the ugly engine.state.max_epochs = None workaround.

Proposed API

# SAME. (re)start from 0 to 5
engine.run(data, max_epochs=5) -> Engine run starting with max_epochs=5 => state.epoch=5

# SAME. continue from 5 to 7
engine.run(data, max_epochs=7) -> Engine run resuming from iteration 50, epoch 5 until 7 epochs => state.epoch=7

# As max_epochs=4 <= state.epoch=7 => restart
engine.run(data, max_epochs=4) -> Engine run starting with max_epochs=4 => state.epoch=4

# restart from 0 to 4
engine.run(data, max_epochs=4) -> Engine run starting with max_epochs=4 => state.epoch=4

# Now (not forced) restart from 0 to 3 (as max_epochs=3 <= state.epoch=4 => restart)
engine.run(data, max_epochs=3) -> Engine run starting with max_epochs=3 => state.epoch=3

# SOMETHING TO CHANGE HERE. Forced restart from 0 to 9, instead of continue from state.epoch=3
engine.state.max_epochs = None  # maybe, engine.reset() -> state.epoch=state.iteration=0,state.max_epochs=state.max_iters=None
engine.run(data, max_epochs=9) -> Engine run starting with max_epochs=9 => state.epoch=9

# In case of max_iters, we'll have to do:
engine.run(data, max_iters=100) -> Engine run starting with max_iters=100 => state.iteration=100
engine.state.max_iters = None
engine.run(data, max_iters=100) -> Engine run starting with max_iters=100 => state.iteration=100
# So there is no uniform API to restart engine...
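
A toy model of the proposed rule, under the assumption that a reset() method like the one floated in the comment above exists. ProposedToyEngine and ProposedToyState are hypothetical names, reset() is only an idea at this stage, and the sketch tracks epochs only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedToyState:
    epoch: int = 0
    iteration: int = 0                # kept only so that reset() clears it
    max_epochs: Optional[int] = None
    max_iters: Optional[int] = None   # kept only so that reset() clears it


class ProposedToyEngine:
    """Toy model of the proposed behaviour, not an Ignite class."""

    def __init__(self) -> None:
        self.state = ProposedToyState()

    def reset(self) -> None:
        # Hypothetical uniform restart: clears epoch, iteration, max_epochs, max_iters.
        self.state = ProposedToyState()

    def run(self, max_epochs: int) -> str:
        # Proposed rule: a target at or below the reached epoch means restart.
        if max_epochs <= self.state.epoch:
            self.state.epoch = 0
        mode = "starting" if self.state.epoch == 0 else "resuming"
        self.state.max_epochs = max_epochs
        self.state.epoch = max_epochs  # pretend the run completes without interruption
        return mode


engine = ProposedToyEngine()
modes = [engine.run(5), engine.run(7), engine.run(4), engine.run(4), engine.run(3)]
without_reset = engine.run(9)  # 9 > state.epoch=3, so this resumes from epoch 3
engine.run(3)                  # back to state.epoch=3
engine.reset()
with_reset = engine.run(9)     # state cleared, so this starts from 0
print(modes, without_reset, with_reset)
```

With such a reset(), the same call would force a restart for both max_epochs and max_iters, which is the uniformity the current API lacks.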

Helper on data management

better and simpler coverage of multi-dataloader use cases, e.g. GAN, SSL, etc.
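
One recurring multi-dataloader pattern is to iterate a primary loader once per epoch while restarting a shorter secondary loader as often as needed. The sketch below shows that pattern with plain Python lists standing in for dataloaders. The helper names restarting and paired_batches are hypothetical and are not part of ignite.data, and reading "SSL" as a semi-supervised setup (labelled plus unlabelled data) is an assumption of this example.

```python
def restarting(iterable):
    """Yield items from a re-iterable forever, starting a fresh pass each time.

    Unlike itertools.cycle, nothing is cached, so a shuffling loader would be
    reshuffled on every pass. Stops if the iterable turns out to be empty.
    """
    while True:
        empty = True
        for item in iterable:
            empty = False
            yield item
        if empty:
            return


def paired_batches(primary, secondary):
    """Yield (primary_batch, secondary_batch); one epoch is one pass over primary."""
    yield from zip(primary, restarting(secondary))


# Lists of batches stand in for dataloaders in this sketch.
unlabelled = [["u0"], ["u1"], ["u2"], ["u3"], ["u4"]]
labelled = [["x0", "x1"], ["x2", "x3"]]

pairs = list(paired_batches(unlabelled, labelled))
for u_batch, x_batch in pairs:
    print(u_batch, x_batch)
```

Each pair could then be handed to a single process function as one combined batch, so the Engine still sees one data iterator.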

Integrations

Verify compatibility (check that ignite is not a blocker) when writing applications for Federated Learning
Verify compatibility (check that ignite is not a blocker) when writing applications with the Distributed RPC framework

Communications

More applications and success stories with PyTorch-Ignite
Showcase via ClearML Ignite server:
- more experiments with Ignite from our users

PyTorch-Ignite presented to you with love by PyTorch community
