Davinci: SPI performance enhancements
The following restructuring and optimisations increase the SPI
read performance from 1.3MiB/s (on da850) to 2.87MiB/s (on da830):
Remove continual revaluation of driver state from the core of the
copy loop. State can not change during the copy loop, so it is
possible to move these evaluations to before the copy loop.
Cost is more code space as loop variants are required for each set
of possible configurations. The loops are simpler however, so the
extra is only 128bytes on da830 with CONFIG_SPI_HALF_DUPLEX
defined.
Unrolling the first copy loop iteration allows the TX buffer to be
pre-loaded reducing SPI clock starvation.
Unrolling the last copy loop iteration removes testing for the
final loop iteration every time round the loop.
Using the RX buffer empty flag as a transfer throttle allows the
assumption that it is always safe to write to the TX buffer, so
polling of TX buffer full flag can be removed.
Signed-off-by: Nick Thompson <nick.thompson@ge.com>
Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>