# Changelog

## v0.0.3
Released on September 30, 2019.
- Featured:
- torchgpipe now overlaps copy and computation using separate CUDA streams. Previously, a GPU could not compute a partition while copying micro-batches across GPUs, because both operations ran on the same default CUDA stream.
- Other Improvements:
- Added support for PyTorch 1.2.
- Redesigned the internal pipeline parallelism to represent dependencies transparently.
- Fixed the hanging issue when an exception is raised in a partition.
- Fixed the unintended size accumulation (issue #3 by Shiyan Deng) of `balance_by_size()`.
- Breaking Changes:
- Dropped support for PyTorch 1.0.
- Changed the type of `GPipe.devices` from `tuple` to `list`.
- Removed `current_microbatch`. This approach turned out to be incompatible with checkpointing.
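The benefit of the featured change, overlapping copy and compute on separate CUDA streams, can be illustrated with a toy step-count model. This is a conceptual sketch in plain Python, not torchgpipe code; the function name and the unit costs are assumptions for illustration only:

```python
def total_steps(num_microbatches, separate_copy_stream):
    """Toy cost model: each micro-batch costs 1 step to copy
    between GPUs and 1 step to compute on a partition.

    On a single default stream, copies and computations serialize.
    With a separate copy stream, the copy of micro-batch i+1
    overlaps with the computation of micro-batch i.
    """
    if num_microbatches == 0:
        return 0
    if separate_copy_stream:
        # Copy batch 0 first; afterwards each step runs
        # (compute batch i, copy batch i+1) concurrently.
        return num_microbatches + 1
    # One stream: copy, compute, copy, compute, ...
    return 2 * num_microbatches
```

Under this model, 4 micro-batches take 8 steps when serialized but only 5 when the copy stream overlaps the compute stream, which is why the separate streams keep every GPU busier.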
## v0.0.2
Released on June 26, 2019.
- Added support for PyTorch 1.1.
- Refined public APIs.
- Detailed documentation.
- Proper exceptions for invalid usage.
- Provided automatic balancing.
- Provided inspecting utilities: `current_microbatch` (deprecated since v0.0.3; do not use) and `is_recomputing()`.
- Reimplemented deferred batch normalization by subclassing.
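The subclassing pattern behind deferred batch normalization can be sketched in plain Python. This is a conceptual model with scalar statistics, not the torchgpipe implementation; the class and method names are assumptions for illustration:

```python
class SimpleBatchNorm:
    """Stand-in for batch norm: updates the running mean
    immediately after every batch it sees."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.running_mean = 0.0

    def forward(self, batch):
        mean = sum(batch) / len(batch)
        self.running_mean += self.momentum * (mean - self.running_mean)
        return [x - mean for x in batch]


class DeferredBatchNorm(SimpleBatchNorm):
    """Subclass that normalizes each micro-batch with its own
    statistics but defers the running-stats update, accumulating
    sums so one update covers the whole mini-batch."""

    def __init__(self, momentum=0.1):
        super().__init__(momentum)
        self._sum = 0.0
        self._count = 0

    def forward(self, batch):
        # Normalize with this micro-batch's mean only.
        mean = sum(batch) / len(batch)
        self._sum += sum(batch)
        self._count += len(batch)
        return [x - mean for x in batch]

    def update_running_stats(self):
        # One deferred update using the whole mini-batch's mean.
        mean = self._sum / self._count
        self.running_mean += self.momentum * (mean - self.running_mean)
        self._sum, self._count = 0.0, 0
```

Overriding `forward` in a subclass, rather than re-deriving the whole layer, keeps the deferred variant interchangeable with the plain one.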
## v0.0.1
Released on May 14, 2019 to evaluate usability and efficiency internally.
- Provided a functional GPipe implementation, including pipeline parallelism, checkpointing, and deferred batch normalization.
- Supported Python 3.6+ and PyTorch 1.0.