Changelog

v0.0.3

Released on September 30, 2019.

Featured:
torchgpipe now overlaps copy and computation using separate CUDA streams. Previously, a GPU could not compute a partition while micro-batches were being copied between GPUs, because both operations ran on the same default CUDA stream.
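The idea behind this overlap can be sketched very loosely with plain Python threads standing in for CUDA streams (the pipeline helper below is hypothetical, not torchgpipe's API): while one "stream" copies the next micro-batch, the other computes on the current one, instead of both phases serializing on a single stream.

```python
import threading
import queue
import time

def pipeline(micro_batches):
    """Hypothetical sketch: a copy "stream" and a compute "stream"
    run concurrently, so copying micro-batch i+1 overlaps with
    computing micro-batch i."""
    copied = queue.Queue()
    results = []

    def copy_stream():
        for mb in micro_batches:
            time.sleep(0.01)        # stands in for a GPU-to-GPU copy
            copied.put(mb)
        copied.put(None)            # sentinel: no more micro-batches

    def compute_stream():
        while True:
            mb = copied.get()
            if mb is None:
                break
            results.append(mb * 2)  # stands in for a partition's forward pass

    t_copy = threading.Thread(target=copy_stream)
    t_compute = threading.Thread(target=compute_stream)
    t_copy.start()
    t_compute.start()
    t_copy.join()
    t_compute.join()
    return results

print(pipeline([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

In the real implementation the two "streams" are CUDA streams rather than threads, and synchronization is expressed with stream events instead of a queue, but the scheduling principle is the same.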
Other Improvements:
  • Added support for PyTorch 1.2.
  • Redesigned the internal pipeline parallelism to represent dependencies transparently.
  • Fixed the hanging issue when an exception is raised in a partition.
  • Fixed unintended size accumulation in balance_by_size() (issue #3, reported by Shiyan Deng).
Breaking Changes:
  • Dropped support for PyTorch 1.0.
  • Changed the type of GPipe.devices from tuple to list.
  • Removed current_microbatch, which turned out to be incompatible with checkpointing.

v0.0.2

Released on June 26, 2019.

  • Added support for PyTorch 1.1.
  • Refined public APIs.
  • Added detailed documentation.
  • Raised proper exceptions for invalid usage.
  • Provided automatic balancing.
  • Provided inspection utilities: current_microbatch (do not use; removed in v0.0.3) and is_recomputing().
  • Reimplemented deferred batch normalization by subclassing.

v0.0.1

Released on May 14, 2019 to evaluate usability and efficiency internally.

  • Provided a functional GPipe implementation, including pipeline parallelism, checkpointing, and deferred batch normalization.
  • Supported Python 3.6+ and PyTorch 1.0.