Changelog
v0.0.5
Released on November 29, 2019.
- Featured: `@skippable` for efficient skip connections. With this interface, GPipe copies skip tensors directly to the destination device.
- Improvements:
  - Checkpointing deterministically handles randomness managed by PyTorch.
  - `balance_by_size()` analyzes parameters as well.
- Breaking Changes:
  - Moved the `torchgpipe_balancing` module to `torchgpipe.balance`.
  - Redesigned the interfaces of `balance_by_time()` and `balance_by_size()`.
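The idea behind a skip connection across pipeline partitions is that a tensor produced in one partition is consumed several partitions later, so the partitions in between should not have to relay it. A minimal pure-Python sketch of that stash/pop bookkeeping follows; the `SkipTracker` name and its methods are hypothetical illustrations, not the torchgpipe API:

```python
class SkipTracker:
    """Toy registry that carries a named skip tensor from the partition
    that produces it to the partition that consumes it, so intermediate
    partitions never hold or relay it."""

    def __init__(self):
        self._stashed = {}

    def stash(self, name, tensor):
        # A real pipeline would copy the tensor straight to the
        # destination device here instead of forwarding it hop by hop.
        self._stashed[name] = tensor

    def pop(self, name):
        # The consuming partition retrieves the tensor, and the registry
        # forgets it so the reference is freed as early as possible.
        return self._stashed.pop(name)


tracker = SkipTracker()
x = [1.0, 2.0, 3.0]
tracker.stash("1to3", x)       # produced in partition 1
# ... partition 2 runs without ever holding the skip tensor ...
assert tracker.pop("1to3") is x  # consumed in partition 3
```

The key property the sketch preserves is that `pop` removes the entry, so the skip tensor's lifetime ends at its consumer rather than at the end of the pipeline.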
v0.0.4
Released on October 8, 2019.
- Reduced GPU memory fragmentation by caching CUDA streams for copy.
- Fixed a potential GPU memory violation on tuples of multiple tensors.
- Fixed a potential GPU memory violation on shifted view tensors (issue #27366 and pull request #27371 on PyTorch).
v0.0.3
Released on September 30, 2019.
- Featured: torchgpipe now overlaps copy and computation using separate CUDA streams. Previously, a GPU could not compute a partition while copying micro-batches across different GPUs, because both operations happened on the same default CUDA stream.
- Other Improvements:
  - Added support for PyTorch 1.2.
  - Redesigned the internal pipeline parallelism to represent dependencies transparently.
  - Reduced memory usage for backpropagation by forgetting recomputation results at the right time.
  - Fixed the hanging issue when an exception is raised in a partition.
  - Fixed the unintended size accumulation (issue #3 by Shiyan Deng) of `balance_by_size()`.
- Breaking Changes:
  - Dropped support for PyTorch 1.0.
  - Changed the type of `GPipe.devices` from `tuple` to `list`.
  - Removed `current_microbatch`. This approach turned out to be incompatible with checkpointing.
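The copy/compute overlap works because copies are issued on their own stream, so the compute stream can process micro-batch *i* while micro-batch *i + 1* is still in flight. A conceptual pure-Python sketch follows, with a single-worker thread pool standing in for the copy stream; all names here are illustrative, and this is not how torchgpipe schedules CUDA streams:

```python
from concurrent.futures import ThreadPoolExecutor

def copy_batch(i):
    # Stand-in for moving micro-batch i to the next GPU on a copy stream.
    return i

def compute(i):
    # Stand-in for a partition's forward pass on the compute stream.
    return i * 2

results = []
# The executor plays the role of a dedicated copy stream: copies queued on
# it proceed independently of the "compute" work in the main loop.
with ThreadPoolExecutor(max_workers=1) as copy_stream:
    pending = copy_stream.submit(copy_batch, 0)
    for i in range(1, 5):
        nxt = copy_stream.submit(copy_batch, i)   # next copy overlaps...
        results.append(compute(pending.result())) # ...with this compute
        pending = nxt
    results.append(compute(pending.result()))     # drain the last batch
# results == [0, 2, 4, 6, 8]
```

With everything on one stream (the pre-v0.0.3 situation), each compute would have to wait for its copy and block the next copy from starting, serializing the pipeline.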
v0.0.2
Released on June 26, 2019.
- Added support for PyTorch 1.1.
- Refined public APIs.
- Detailed documentation.
- Proper exceptions for invalid usage.
- Provided automatic balancing.
- Provided inspecting utilities: `current_microbatch` (DO NOT USE, deprecated since v0.0.3) and `is_recomputing()`.
- Reimplemented deferred batch normalization by subclassing.
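Deferred batch normalization addresses the fact that splitting a mini-batch into micro-batches would otherwise make batch norm update its running statistics per micro-batch, changing the numbers a non-pipelined model would produce. A toy sketch of the deferral idea follows, using a mean-only "normalization" for brevity; the `DeferredBatchNorm` name and structure here are hypothetical, not the torchgpipe implementation:

```python
class DeferredBatchNorm:
    """Toy sketch: accumulate statistics across micro-batches and fold
    them into the running mean only once per full mini-batch."""

    def __init__(self, chunks):
        self.chunks = chunks       # micro-batches per mini-batch
        self.running_mean = 0.0
        self._sum = 0.0
        self._count = 0
        self._seen = 0

    def forward(self, micro_batch):
        # Accumulate instead of updating on every micro-batch.
        self._sum += sum(micro_batch)
        self._count += len(micro_batch)
        self._seen += 1
        if self._seen == self.chunks:
            # One update for the whole mini-batch, matching what plain
            # batch norm would compute on the undivided batch.
            self.running_mean = self._sum / self._count
            self._sum, self._count, self._seen = 0.0, 0, 0
        return micro_batch


bn = DeferredBatchNorm(chunks=2)
bn.forward([1.0, 2.0])   # first micro-batch: statistics deferred
bn.forward([3.0, 4.0])   # mini-batch complete: one combined update
assert bn.running_mean == 2.5
```

Implementing this by subclassing (as the changelog entry describes) lets an existing batch-norm layer be swapped for the deferred variant without touching the rest of the model.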
v0.0.1
Released on May 14, 2019 to evaluate usability and efficiency internally.
- Provided a functional GPipe implementation, including pipeline parallelism, checkpointing, and deferred batch normalization.
- Supported Python 3.6+ and PyTorch 1.0.