Revamp a few ScratchAllocator classes in cudnn_rnn_ops
Prepare for RNN autotune.
* Rename the scratch allocator classes so that each name reflects the lifetime of the memory it allocates:
  * CudnnReservespaceAllocator ==> CudnnRnnAllocatorInOutput
  * CudnnWorkspaceAllocator ==> CudnnRnnAllocatorInTemp
* Make the old CudnnWorkspaceAllocator (now CudnnRnnAllocatorInTemp) a template so that it works with different tensor dtypes. Autotune will rely on this later, when both the workspace (uint8) and the reserve space (input_dtype) are temp-allocated.
* Rename CudnnModelShapes ==> CudnnRnnModelShapes.
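
For illustration only, a minimal sketch of the templating idea described above. It is not the code in cudnn_rnn_ops: the class name TempScratchAllocator and its methods are hypothetical, and host memory stands in for the temp tensors the real allocator would request from the op context. The point is that one allocator type, parameterized on the element type, can back both a uint8 workspace and a reserve space in the input dtype.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a temp-memory scratch allocator.
// T is the element type of the backing buffer (e.g. uint8_t or float).
template <typename T>
class TempScratchAllocator {
 public:
  // Allocates a zero-initialized buffer holding at least byte_size bytes,
  // rounded up to a whole number of T elements. The buffer lives as long
  // as this allocator object does.
  T* AllocateBytes(std::size_t byte_size) {
    const std::size_t num_elements = (byte_size + sizeof(T) - 1) / sizeof(T);
    buffers_.emplace_back(new T[num_elements]());
    total_bytes_ += num_elements * sizeof(T);
    return buffers_.back().get();
  }

  // Total number of bytes handed out so far, including rounding.
  std::size_t TotalByteSize() const { return total_bytes_; }

 private:
  std::vector<std::unique_ptr<T[]>> buffers_;  // Owns every allocation.
  std::size_t total_bytes_ = 0;
};
```

Under these assumptions, an autotune pass could instantiate TempScratchAllocator<std::uint8_t> for the workspace and TempScratchAllocator<float> for the reserve space, and both buffers would be released when the allocators go out of scope.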
PiperOrigin-RevId: 192018334