Released on September 18, 2020.

Changed the license to BSD-3-Clause.


Released on July 29, 2020.

  • Updated docs.

  • Added support for PyTorch 1.5.


Released on November 29, 2019.


Added @skippable for efficient skip connections. With this interface, GPipe copies skip tensors directly to the destination device.

  • Checkpointing deterministically handles randomness managed by PyTorch.

  • balance_by_size() analyzes parameters as well.
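
The skip-tensor flow described above can be illustrated with a plain-Python sketch (no torchgpipe required; the `portal` dict and the `partition*` functions are hypothetical names, not the library's API): a value stashed in one partition bypasses the partitions in between and reappears at its destination.

```python
# Toy stand-in for skip-tensor routing: a value stashed in one
# partition skips the partitions in between and is popped at its
# destination. (In GPipe, the skip tensor is additionally copied
# straight to the destination device.)
portal = {}

def partition1(x):
    portal['skip'] = x                  # stash: skip tensor leaves the main flow
    return [v * 2 for v in x]

def partition2(x):
    return [v + 1 for v in x]           # never sees the skip tensor

def partition3(x):
    skip = portal.pop('skip')           # pop: skip tensor re-enters here
    return [a + b for a, b in zip(x, skip)]

out = partition3(partition2(partition1([1.0, 2.0])))
```

Here `out` is `[4.0, 7.0]`: the stashed input re-joins the main flow at partition 3 without ever passing through partition 2.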

Breaking Changes:


Released on October 8, 2019.

  • Reduced GPU memory fragmentation by caching the CUDA streams used for copies.

  • Fixed a potential GPU memory violation on tuples of multiple tensors.

  • Fixed a potential GPU memory violation on shifted view tensors (issue #27366 and pull request #27371 on PyTorch).


Released on September 30, 2019.


torchgpipe now overlaps copy and computation using separate CUDA streams. Previously, a GPU could not compute a partition while micro-batches were being copied between GPUs, because both operations ran on the same default CUDA stream.
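
The scheduling idea behind this overlap can be sketched with a clock-cycle generator (a hypothetical simplification, not torchgpipe's actual scheduler): at clock tick k, partition i works on micro-batch k - i, so while one partition computes, the copy for the next micro-batch can proceed on its own stream.

```python
def clock_cycles(m, n):
    """Yield, per clock tick, the (micro_batch, partition) tasks that can
    run concurrently when m micro-batches flow through n partitions."""
    for k in range(m + n - 1):
        # micro-batch k - i enters partition i at tick k
        yield [(k - i, i) for i in range(n) if 0 <= k - i < m]
```

For 3 micro-batches over 2 partitions this yields [(0, 0)], then [(1, 0), (0, 1)], then [(2, 0), (1, 1)], then [(2, 1)]: on the two middle ticks, both partitions compute at once instead of waiting on a shared stream.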

Other Improvements:
  • Added support for PyTorch 1.2.

  • Redesigned the internal pipeline parallelism to represent dependencies transparently.

  • Reduced memory usage for backpropagation by forgetting recomputation results at the right time.

  • Fixed a hang when an exception is raised in a partition.

  • Fixed the unintended size accumulation in balance_by_size() (issue #3, reported by Shiyan Deng).

Breaking Changes:
  • Dropped support for PyTorch 1.0.

  • Changed type of GPipe.devices from tuple to list.

  • Removed current_microbatch. This approach turned out to be incompatible with checkpointing.


Released on June 26, 2019.

  • Added support for PyTorch 1.1.

  • Refined public APIs.

  • Added detailed documentation.

  • Added proper exceptions for invalid usage.

  • Provided automatic balancing.

  • Provided inspection utilities: current_microbatch (deprecated since v0.0.3; do not use) and is_recomputing().

  • Reimplemented deferred batch normalization by subclassing.
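
The balancing problem that automatic balancing solves can be sketched as follows (a simplified dynamic program for illustration, not the library's algorithm): given per-layer costs, e.g. measured time or parameter size, split the layers into contiguous blocks so that the most expensive block is as cheap as possible.

```python
def balance(costs, k):
    """Return the sizes of k contiguous blocks of `costs` that minimize
    the maximum block cost (O(n^2 * k) DP; fine for typical layer counts)."""
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    INF = float('inf')
    # best[j][i]: minimal max-block cost splitting the first i layers into j blocks
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for t in range(j - 1, i):
                cost = max(best[j - 1][t], prefix[i] - prefix[t])
                if cost < best[j][i]:
                    best[j][i] = cost
                    cut[j][i] = t

    # Recover the block sizes by walking the recorded cuts backwards.
    sizes, i = [], n
    for j in range(k, 0, -1):
        t = cut[j][i]
        sizes.append(i - t)
        i = t
    return sizes[::-1]
```

For example, `balance([1, 2, 3, 4, 5, 6], 2)` returns `[4, 2]`: blocks costing 10 and 11, the most even split of these layers across two devices.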


Released on May 14, 2019 to evaluate usability and efficiency internally.

  • Provided a functional GPipe implementation, including pipeline parallelism, checkpointing, and deferred batch normalization.

  • Supported Python 3.6+ and PyTorch 1.0.