crypto: arm64/chacha - simplify tail block handling
authorArd Biesheuvel <ardb@kernel.org>
Fri, 6 Nov 2020 16:39:38 +0000 (17:39 +0100)
committerHerbert Xu <herbert@gondor.apana.org.au>
Fri, 13 Nov 2020 09:38:55 +0000 (20:38 +1100)
commitc4fc6328d6c67690a7e6e03f43a5a976a13120ef
treed704af4733b928a4bf78f5afc5ec641abb78fd58
parent9c0cef2364750c00ab380cc8902dbbc91e230183
crypto: arm64/chacha - simplify tail block handling

Based on lessons learnt from optimizing the 32-bit version of this driver,
we can simplify the arm64 version considerably, by reordering the final
two stores when the last block is not a multiple of 64 bytes. This removes
the need to use permutation instructions to calculate the elements that are
clobbered by the final overlapping store, given that the store of the
penultimate block now follows it, and that one carries the correct values
for those elements already.

While at it, simplify the overlapping loads as well, by calculating the
address of the final overlapping load upfront, and switching to this
address for every load that would otherwise extend past the end of the
source buffer.

There is no impact on performance, but the resulting code is substantially
smaller and easier to follow.

Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
arch/arm64/crypto/chacha-neon-core.S