Changelog
v0.0.5
Released on November 29, 2019.
- Featured: `@skippable` for efficient skip connections. With this interface, GPipe copies skip tensors directly to the destination device.
- Improvements:
  - Checkpointing deterministically handles randomness managed by PyTorch.
  - `balance_by_size()` analyzes parameters as well.
- Breaking Changes:
  - Moved the `torchgpipe_balancing` module to `torchgpipe.balance`.
  - Redesigned the interfaces of `balance_by_time()` and `balance_by_size()`.
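The idea behind a skip connection across pipeline partitions is that a tensor produced in one partition is consumed several partitions later, so the partitions in between should not have to relay it. A minimal pure-Python sketch of that stash/pop bookkeeping follows; the `SkipTracker` name and its methods are hypothetical illustrations, not the torchgpipe API:

```python
class SkipTracker:
    """Toy registry that carries a named skip tensor from the partition
    that produces it to the partition that consumes it, so intermediate
    partitions never hold or relay it."""

    def __init__(self):
        self._stashed = {}

    def stash(self, name, tensor):
        # A real pipeline would copy the tensor straight to the
        # destination device here instead of forwarding it hop by hop.
        self._stashed[name] = tensor

    def pop(self, name):
        # The consuming partition retrieves the tensor, and the registry
        # forgets it so the reference is freed as early as possible.
        return self._stashed.pop(name)


tracker = SkipTracker()
x = [1.0, 2.0, 3.0]
tracker.stash("1to3", x)       # produced in partition 1
# ... partition 2 runs without ever holding the skip tensor ...
assert tracker.pop("1to3") is x  # consumed in partition 3
```

The key property the sketch preserves is that `pop` removes the entry, so the skip tensor's lifetime ends at its consumer rather than at the end of the pipeline.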
v0.0.4
Released on October 8, 2019.
- Reduced GPU memory fragmentation by caching CUDA streams for copy.
- Fixed a potential GPU memory violation on tuples of multiple tensors.
- Fixed a potential GPU memory violation on shifted view tensors (issue #27366 and pull request #27371 on PyTorch).
v0.0.3
Released on September 30, 2019.
- Featured: torchgpipe now overlaps copy and computation using separate CUDA streams. Previously, a GPU could not compute a partition while copying micro-batches across different GPUs, because both operations happened on the same default CUDA stream.
- Other Improvements:
  - Added support for PyTorch 1.2.
  - Redesigned the internal pipeline parallelism to represent dependencies transparently.
  - Reduced memory usage for backpropagation by forgetting recomputation results at the right time.
  - Fixed the hanging issue when an exception is raised in a partition.
  - Fixed the unintended size accumulation (issue #3 by Shiyan Deng) of `balance_by_size()`.
- Breaking Changes:
  - Dropped support for PyTorch 1.0.
  - Changed the type of `GPipe.devices` from `tuple` to `list`.
  - Removed `current_microbatch`. This approach turned out to be incompatible with checkpointing.
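The copy/compute overlap works because copies are issued on their own stream, so the compute stream can process micro-batch *i* while micro-batch *i + 1* is still in flight. A conceptual pure-Python sketch follows, with a single-worker thread pool standing in for the copy stream; all names here are illustrative, and this is not how torchgpipe schedules CUDA streams:

```python
from concurrent.futures import ThreadPoolExecutor

def copy_batch(i):
    # Stand-in for moving micro-batch i to the next GPU on a copy stream.
    return i

def compute(i):
    # Stand-in for a partition's forward pass on the compute stream.
    return i * 2

results = []
# The executor plays the role of a dedicated copy stream: copies queued on
# it proceed independently of the "compute" work in the main loop.
with ThreadPoolExecutor(max_workers=1) as copy_stream:
    pending = copy_stream.submit(copy_batch, 0)
    for i in range(1, 5):
        nxt = copy_stream.submit(copy_batch, i)   # next copy overlaps...
        results.append(compute(pending.result())) # ...with this compute
        pending = nxt
    results.append(compute(pending.result()))     # drain the last batch
# results == [0, 2, 4, 6, 8]
```

With everything on one stream (the pre-v0.0.3 situation), each compute would have to wait for its copy and block the next copy from starting, serializing the pipeline.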
v0.0.2
Released on June 26, 2019.
- Added support for PyTorch 1.1.
- Refined public APIs.
- Detailed documentation.
- Proper exceptions for invalid usage.
- Provided automatic balancing.
- Provided inspecting utilities: `current_microbatch` (DO NOT USE, deprecated since v0.0.3) and `is_recomputing()`.
- Reimplemented deferred batch normalization by subclassing.
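Deferred batch normalization addresses the fact that splitting a mini-batch into micro-batches would otherwise make batch norm update its running statistics per micro-batch, changing the numbers a non-pipelined model would produce. A toy sketch of the deferral idea follows, using a mean-only "normalization" for brevity; the `DeferredBatchNorm` name and structure here are hypothetical, not the torchgpipe implementation:

```python
class DeferredBatchNorm:
    """Toy sketch: accumulate statistics across micro-batches and fold
    them into the running mean only once per full mini-batch."""

    def __init__(self, chunks):
        self.chunks = chunks       # micro-batches per mini-batch
        self.running_mean = 0.0
        self._sum = 0.0
        self._count = 0
        self._seen = 0

    def forward(self, micro_batch):
        # Accumulate instead of updating on every micro-batch.
        self._sum += sum(micro_batch)
        self._count += len(micro_batch)
        self._seen += 1
        if self._seen == self.chunks:
            # One update for the whole mini-batch, matching what plain
            # batch norm would compute on the undivided batch.
            self.running_mean = self._sum / self._count
            self._sum, self._count, self._seen = 0.0, 0, 0
        return micro_batch


bn = DeferredBatchNorm(chunks=2)
bn.forward([1.0, 2.0])   # first micro-batch: statistics deferred
bn.forward([3.0, 4.0])   # mini-batch complete: one combined update
assert bn.running_mean == 2.5
```

Implementing this by subclassing (as the changelog entry describes) lets an existing batch-norm layer be swapped for the deferred variant without touching the rest of the model.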
v0.0.1
Released on May 14, 2019 to evaluate usability and efficiency internally.
- Provided a functional GPipe implementation, including pipeline parallelism, checkpointing, and deferred batch normalization.
- Supported Python 3.6+ and PyTorch 1.0.