- Added support a different strategy for cov computations in the multi-tower scenario. In this strategy we do the cov computations locally on each tower and then sum the results, as opposed to concatenating everything onto a single device. This other strategy can be enabled by setting the global variable TOWER_STRATEGY to "separate" (default value is "concat", which implements the old strategy). We might change this to use "separate" by default if this turns out to be the best default.
- The code and documentation now no longer refer to the towers as computing different "mini-batches", since this was a confusing use of terminology. The best way to think about things is that the combine data over all the towers forms the mini-batch. Note however when factors process multiple towers using the "separate" strategy their batch_size variable will still refer to the amount of data in a single tower.
- Fixed a bug in how the "option 1" and "option 2" RNN Fisher approximations were computed in the multi-tower scenario.
- The "time-folded-into-batch" feature recently added has now changed in terms of what format it uses. Time is now the first dimension before the reshape, not the second, which is consistent with the convention used in other codebases.
PiperOrigin-RevId:
190615398