platform/kernel/linux-rpi.git
3 years agocrypto: arm64/aes-ce-mac - simplify NEON yield
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:24 +0000 (12:36 +0100)]
crypto: arm64/aes-ce-mac - simplify NEON yield

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm64/aes-neonbs - remove NEON yield calls
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:23 +0000 (12:36 +0100)]
crypto: arm64/aes-neonbs - remove NEON yield calls

There is no need for elaborate yield handling in the bit-sliced NEON
implementation of AES, given that skciphers are naturally bounded by the
size of the chunks returned by the skcipher_walk API. So remove the
yield calls from the asm code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm64/sha512-ce - simplify NEON yield
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:22 +0000 (12:36 +0100)]
crypto: arm64/sha512-ce - simplify NEON yield

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit 6caf7adc5e458f77f550b6c6ca8effa152d61b4a.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm64/sha3-ce - simplify NEON yield
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:21 +0000 (12:36 +0100)]
crypto: arm64/sha3-ce - simplify NEON yield

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit 7edc86cb1c18b4c274672232117586ea2bef1d9a.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm64/sha2-ce - simplify NEON yield
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:20 +0000 (12:36 +0100)]
crypto: arm64/sha2-ce - simplify NEON yield

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit d82f37ab5e2426287013eba38b1212e8b71e5be3.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm64/sha1-ce - simplify NEON yield
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:19 +0000 (12:36 +0100)]
crypto: arm64/sha1-ce - simplify NEON yield

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, but simply return early in this case, and let the C
code deal with the consequences.

This reverts commit 7df8d164753e6e6f229b72767595072bc6a71f48.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
Daniele Alessandrelli [Wed, 3 Feb 2021 11:28:37 +0000 (11:28 +0000)]
crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()

The length ('len' parameter) passed to crypto_ecdh_decode_key() is never
checked against the length encoded in the passed buffer ('buf'
parameter). This could lead to an out-of-bounds access when the passed
length is less than the encoded length.

Add a check to prevent that.

Fixes: 3c4b23901a0c7 ("crypto: ecdh - Add ECDH software support")
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: powerpc/sha256 - remove unneeded semicolon
Yang Li [Tue, 2 Feb 2021 03:17:30 +0000 (11:17 +0800)]
crypto: powerpc/sha256 - remove unneeded semicolon

Eliminate the following coccicheck warning:
./arch/powerpc/crypto/sha256-spe-glue.c:132:2-3: Unneeded
semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: caam - Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE
Jiapeng Chong [Tue, 2 Feb 2021 02:06:15 +0000 (10:06 +0800)]
crypto: caam - Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE

Fix the following coccicheck warning:

./drivers/crypto/caam/debugfs.c:23:0-23: WARNING: caam_fops_u64_ro
should be defined with DEFINE_DEBUGFS_ATTRIBUTE.

./drivers/crypto/caam/debugfs.c:22:0-23: WARNING: caam_fops_u32_ro
should be defined with DEFINE_DEBUGFS_ATTRIBUTE.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: twofish - use unaligned accessors instead of alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:37 +0000 (19:02 +0100)]
crypto: twofish - use unaligned accessors instead of alignmask

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Twofish input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: fcrypt - drop unneeded alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:36 +0000 (19:02 +0100)]
crypto: fcrypt - drop unneeded alignmask

The fcrypt implementation uses memcpy() to access the input and output
buffers so there is no need to set an alignmask.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: cast6 - use unaligned accessors instead of alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:35 +0000 (19:02 +0100)]
crypto: cast6 - use unaligned accessors instead of alignmask

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
CAST6 input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: cast5 - use unaligned accessors instead of alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:34 +0000 (19:02 +0100)]
crypto: cast5 - use unaligned accessors instead of alignmask

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
CAST5 input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: camellia - use unaligned accessors instead of alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:33 +0000 (19:02 +0100)]
crypto: camellia - use unaligned accessors instead of alignmask

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Camellia input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: blowfish - use unaligned accessors instead of alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:32 +0000 (19:02 +0100)]
crypto: blowfish - use unaligned accessors instead of alignmask

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of
the Blowfish input and output blocks, which propagates to mode drivers,
and results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: serpent - use unaligned accessors instead of alignmask
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:31 +0000 (19:02 +0100)]
crypto: serpent - use unaligned accessors instead of alignmask

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Serpent input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: serpent - get rid of obsolete tnepres variant
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:30 +0000 (19:02 +0100)]
crypto: serpent - get rid of obsolete tnepres variant

It is not trivial to trace back why exactly the tnepres variant of
serpent was added ~17 years ago - Google searches come up mostly empty,
but it seems to be related with the 'kerneli' version, which was based
on an incorrect interpretation of the serpent spec.

In other words, nobody is likely to care anymore today, so let's get rid
of it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: michael_mic - fix broken misalignment handling
Ard Biesheuvel [Mon, 1 Feb 2021 18:02:29 +0000 (19:02 +0100)]
crypto: michael_mic - fix broken misalignment handling

The Michael MIC driver uses the cra_alignmask to ensure that pointers
presented to its update and finup/final methods are 32-bit aligned.
However, due to the way the shash API works, this is no guarantee that
the 32-bit reads occurring in the update method are also aligned, as the
size of the buffer presented to update may be of uneven length. For
instance, an update() of 3 bytes followed by a misaligned update() of 4
or more bytes will result in a misaligned access using an accessor that
is not suitable for this.

On most architectures, this does not matter, and so setting the
cra_alignmask is pointless. On architectures where this does matter,
setting the cra_alignmask does not actually solve the problem.

So let's get rid of the cra_alignmask, and use unaligned accessors
instead, where appropriate.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agohwrng: timeriomem - Fix cooldown period calculation
Jan Henrik Weinstock [Mon, 1 Feb 2021 15:14:59 +0000 (16:14 +0100)]
hwrng: timeriomem - Fix cooldown period calculation

Ensure cooldown period tolerance of 1% is actually accounted for.

Fixes: ca3bff70ab32 ("hwrng: timeriomem - Improve performance...")
Signed-off-by: Jan Henrik Weinstock <jan.weinstock@rwth-aachen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: marvell - CRYPTO_DEV_OCTEONTX2_CPT should depend on ARCH_THUNDER2
Geert Uytterhoeven [Mon, 1 Feb 2021 13:44:31 +0000 (14:44 +0100)]
crypto: marvell - CRYPTO_DEV_OCTEONTX2_CPT should depend on ARCH_THUNDER2

The Marvell OcteonTX2 CPT physical function PCI device is present only
on OcteonTx2 SoC, and not available as an independent PCIe endpoint.
Hence add a dependency on ARCH_THUNDER2, to prevent asking the user
about this driver when configuring a kernel without OcteonTx2 platform
support.

Fixes: 5e8ce8334734c5f2 ("crypto: marvell - add Marvell OcteonTX2 CPT PF driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/crypto
Herbert Xu [Wed, 10 Feb 2021 06:20:35 +0000 (17:20 +1100)]
Merge git://git./linux/kernel/git/arm64/linux for-next/crypto

Pull change from arm64 tree that's needed for crypto arm changes.

3 years agocrypto: crypto4xx - Avoid linking failure with HW_RANDOM=m
Florian Fainelli [Sat, 30 Jan 2021 22:55:38 +0000 (14:55 -0800)]
crypto: crypto4xx - Avoid linking failure with HW_RANDOM=m

It is currently possible to build CONFIG_HW_RANDOM_PPC4XX=y with
CONFIG_HW_RANDOM=m which would lead to the inability of linking with
devm_hwrng_{register,unregister}. We cannot have the framework modular
and the consumer of that framework built-in, so make that dependency
explicit.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - Add dependency on NET_VENDOR_MARVELL
Herbert Xu [Fri, 29 Jan 2021 05:48:56 +0000 (16:48 +1100)]
crypto: octeontx2 - Add dependency on NET_VENDOR_MARVELL

The crypto octeontx2 driver depends on the mbox code in the network
tree.  It tries to select the MBOX Kconfig option but that option
itself depends on many other options which are not selected, e.g.,
CONFIG_NET_VENDOR_MARVELL.  It would be inappropriate to select them
all as randomly prompting the user for network options which would
oterhwise be disabled just because a crypto driver has been enabled
makes no sense.

This patch fixes this by adding a dependency on NET_VENDOR_MARVELL.
This makes the crypto driver invisible if the network option is off.

If the crypto driver must be visible even without the network stack
then the shared mbox code should be moved out of drivers/net.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 5e8ce8334734 ("crypto: marvell - add Marvell OcteonTX2 CPT...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - fix signedness bug in cptvf_register_interrupts()
Dan Carpenter [Wed, 27 Jan 2021 05:25:59 +0000 (08:25 +0300)]
crypto: octeontx2 - fix signedness bug in cptvf_register_interrupts()

The "num_vec" has to be signed for the error handling to work.

Fixes: 19d8e8c7be15 ("crypto: octeontx2 - add virtual function driver support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: ccree - fix spelling typo of allocated
dingsenjie [Tue, 26 Jan 2021 03:45:53 +0000 (11:45 +0800)]
crypto: ccree - fix spelling typo of allocated

allocted -> allocated

Signed-off-by: dingsenjie <dingsenjie@yulong.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agoarm64: assembler: add cond_yield macro
Ard Biesheuvel [Wed, 3 Feb 2021 11:36:18 +0000 (12:36 +0100)]
arm64: assembler: add cond_yield macro

Add a macro cond_yield that branches to a specified label when called if
the TIF_NEED_RESCHED flag is set and decreasing the preempt count would
make the task preemptible again, resulting in a schedule to occur. This
can be used by kernel mode SIMD code that keeps a lot of state in SIMD
registers, which would make chunking the input in order to perform the
cond_resched() check from C code disproportionately costly.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210203113626.220151-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agocrypto: salsa20 - remove Salsa20 stream cipher algorithm
Ard Biesheuvel [Thu, 21 Jan 2021 13:07:33 +0000 (14:07 +0100)]
crypto: salsa20 - remove Salsa20 stream cipher algorithm

Salsa20 is not used anywhere in the kernel, is not suitable for disk
encryption, and widely considered to have been superseded by ChaCha20.
So let's remove it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: tgr192 - remove Tiger 128/160/192 hash algorithms
Ard Biesheuvel [Thu, 21 Jan 2021 13:07:32 +0000 (14:07 +0100)]
crypto: tgr192 - remove Tiger 128/160/192 hash algorithms

Tiger is never referenced anywhere in the kernel, and unlikely
to be depended upon by userspace via AF_ALG. So let's remove it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: rmd320 - remove RIPE-MD 320 hash algorithm
Ard Biesheuvel [Thu, 21 Jan 2021 13:07:31 +0000 (14:07 +0100)]
crypto: rmd320 - remove RIPE-MD 320 hash algorithm

RIPE-MD 320 is never referenced anywhere in the kernel, and unlikely
to be depended upon by userspace via AF_ALG. So let's remove it

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: rmd256 - remove RIPE-MD 256 hash algorithm
Ard Biesheuvel [Thu, 21 Jan 2021 13:07:30 +0000 (14:07 +0100)]
crypto: rmd256 - remove RIPE-MD 256 hash algorithm

RIPE-MD 256 is never referenced anywhere in the kernel, and unlikely
to be depended upon by userspace via AF_ALG. So let's remove it

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: rmd128 - remove RIPE-MD 128 hash algorithm
Ard Biesheuvel [Thu, 21 Jan 2021 13:07:29 +0000 (14:07 +0100)]
crypto: rmd128 - remove RIPE-MD 128 hash algorithm

RIPE-MD 128 is never referenced anywhere in the kernel, and unlikely
to be depended upon by userspace via AF_ALG. So let's remove it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: marvell/cesa - Fix use of sg_pcopy on iomem pointer
Herbert Xu [Thu, 21 Jan 2021 05:16:46 +0000 (16:16 +1100)]
crypto: marvell/cesa - Fix use of sg_pcopy on iomem pointer

The cesa driver mixes use of iomem pointers and normal kernel
pointers.  Sometimes it uses memcpy_toio/memcpy_fromio on both
while other times it would use straight memcpy on both, through
the sg_pcopy_* helpers.

This patch fixes this by adding a new field sram_pool to the engine
for the normal pointer case which then allows us to use the right
interface depending on the value of engine->pool.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: talitos - Fix ctr(aes) on SEC1
Christophe Leroy [Wed, 20 Jan 2021 18:57:25 +0000 (18:57 +0000)]
crypto: talitos - Fix ctr(aes) on SEC1

While ctr(aes) requires the use of a special descriptor on SEC2 (see
commit 70d355ccea89 ("crypto: talitos - fix ctr-aes-talitos")), that
special descriptor doesn't work on SEC1, see commit e738c5f15562
("powerpc/8xx: Add DT node for using the SEC engine of the MPC885").

However, the common nonsnoop descriptor works properly on SEC1 for
ctr(aes).

Add a second template for ctr(aes) that will be registered
only on SEC1.

Fixes: 70d355ccea89 ("crypto: talitos - fix ctr-aes-talitos")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
Christophe Leroy [Wed, 20 Jan 2021 18:57:24 +0000 (18:57 +0000)]
crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)

Talitos Security Engine AESU considers any input
data size that is not a multiple of 16 bytes to be an error.
This is not a problem in general, except for Counter mode
that is a stream cipher and can have an input of any size.

Test Manager for ctr(aes) fails on 4th test vector which has
a length of 499 while all previous vectors which have a 16 bytes
multiple length succeed.

As suggested by Freescale, round up the input data length to the
nearest 16 bytes.

Fixes: 5e75ae1b3cef ("crypto: talitos - add new crypto modes")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon/hpre - add ecc algorithm inqury for uacce device
Hui Tang [Mon, 18 Jan 2021 08:18:19 +0000 (16:18 +0800)]
crypto: hisilicon/hpre - add ecc algorithm inqury for uacce device

Uacce SysFS support more algorithms inqury such as
'ecdh/ecdsa/sm2/x25519/x448'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon/hpre - add two RAS correctable errors processing
Hui Tang [Mon, 18 Jan 2021 08:17:25 +0000 (16:17 +0800)]
crypto: hisilicon/hpre - add two RAS correctable errors processing

1.One CE error is detecting timeout of generating a random number.
2.Another is detecting timeout of SVA prefetching address.

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon/hpre - delete ECC 1bit error reported threshold
Hui Tang [Mon, 18 Jan 2021 08:15:40 +0000 (16:15 +0800)]
crypto: hisilicon/hpre - delete ECC 1bit error reported threshold

Delete 'HPRE_RAS_ECC1BIT_TH' register setting of hpre,
since register 'QM_RAS_CE_THRESHOLD' of qm has done this work.

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - release FPU during skcipher walk API calls
Ard Biesheuvel [Sat, 16 Jan 2021 16:48:10 +0000 (17:48 +0100)]
crypto: aesni - release FPU during skcipher walk API calls

Taking ownership of the FPU in kernel mode disables preemption, and
this may result in excessive scheduling blackouts if the size of the
data being processed on the FPU is unbounded.

Given that taking and releasing the FPU is cheap these days on x86, we
can limit the impact of this issue easily for skcipher implementations,
by moving the FPU begin/end calls inside the skcipher walk processing
loop. Considering that skcipher walks operate on at most one page at a
time, doing so fully mitigates this issue.

This also permits the skcipher walk logic to use non-atomic kmalloc()
calls etc so we can change the 'atomic' bool argument in the calls to
skcipher_walk_virt() to false as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - replace CTR function pointer with static call
Ard Biesheuvel [Sat, 16 Jan 2021 16:48:09 +0000 (17:48 +0100)]
crypto: aesni - replace CTR function pointer with static call

Indirect calls are very expensive on x86, so use a static call to set
the system-wide AES-NI/CTR asm helper.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: keembay - use 64-bit arithmetic for computing bit_len
Ovidiu Panait [Fri, 15 Jan 2021 20:46:05 +0000 (22:46 +0200)]
crypto: keembay - use 64-bit arithmetic for computing bit_len

src_size and aad_size are defined as u32, so the following expressions are
currently being evaluated using 32-bit arithmetic:

bit_len = src_size * 8;
...
bit_len = aad_size * 8;

However, bit_len is used afterwards in a context that expects a valid
64-bit value (the lower and upper 32-bit words of bit_len are extracted
and written to hw).

In order to make sure the correct bit length is generated and the 32-bit
multiplication does not wrap around, cast src_size and aad_size to u64.

Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Acked-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: lib/chacha20poly1305 - define empty module exit function
Jason A. Donenfeld [Fri, 15 Jan 2021 19:30:12 +0000 (20:30 +0100)]
crypto: lib/chacha20poly1305 - define empty module exit function

With no mod_exit function, users are unable to unload the module after
use. I'm not aware of any reason why module unloading should be
prohibited for this one, so this commit simply adds an empty exit
function.

Reported-and-tested-by: John Donnelly <john.p.donnelly@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - register with linux crypto framework
Srujana Challa [Fri, 15 Jan 2021 13:52:27 +0000 (19:22 +0530)]
crypto: octeontx2 - register with linux crypto framework

CPT offload module utilises the linux crypto framework to offload
crypto processing. This patch registers supported algorithms by
calling registration functions provided by the kernel crypto API.

The module currently supports:
- AES block cipher in CBC,ECB and XTS mode.
- 3DES block cipher in CBC and ECB mode.
- AEAD algorithms.
  authenc(hmac(sha1),cbc(aes)),
  authenc(hmac(sha256),cbc(aes)),
  authenc(hmac(sha384),cbc(aes)),
  authenc(hmac(sha512),cbc(aes)),
  authenc(hmac(sha1),ecb(cipher_null)),
  authenc(hmac(sha256),ecb(cipher_null)),
  authenc(hmac(sha384),ecb(cipher_null)),
  authenc(hmac(sha512),ecb(cipher_null)),
  rfc4106(gcm(aes)).

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - add support to process the crypto request
Srujana Challa [Fri, 15 Jan 2021 13:52:26 +0000 (19:22 +0530)]
crypto: octeontx2 - add support to process the crypto request

Attach LFs to CPT VF to process the crypto requests and register
LF interrupts.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - add virtual function driver support
Srujana Challa [Fri, 15 Jan 2021 13:52:25 +0000 (19:22 +0530)]
crypto: octeontx2 - add virtual function driver support

Add support for the Marvell OcteonTX2 CPT virtual function
driver. This patch includes probe, PCI specific initialization
and interrupt handling.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - add support to get engine capabilities
Srujana Challa [Fri, 15 Jan 2021 13:52:24 +0000 (19:22 +0530)]
crypto: octeontx2 - add support to get engine capabilities

Adds support to get engine capabilities and adds a new mailbox
to share capabilities with VF driver.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - add LF framework
Srujana Challa [Fri, 15 Jan 2021 13:52:23 +0000 (19:22 +0530)]
crypto: octeontx2 - add LF framework

CPT RVU Local Functions(LFs) needs to be attached to the
PF/VF to submit the instructions to CPT.
This patch adds the interface to initialize and attach
the LFs. It also adds interface to register the LF's
interrupts.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - load microcode and create engine groups
Srujana Challa [Fri, 15 Jan 2021 13:52:22 +0000 (19:22 +0530)]
crypto: octeontx2 - load microcode and create engine groups

CPT includes microcoded GigaCypher symmetric engines(SEs), IPsec
symmetric engines(IEs), and asymmetric engines (AEs).
Each engine receives CPT instructions from the engine groups it has
subscribed to. This patch loads microcode, configures three engine
groups(one for SEs, one for IEs and one for AEs), and configures
all engines.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - enable SR-IOV and mailbox communication with VF
Srujana Challa [Fri, 15 Jan 2021 13:52:21 +0000 (19:22 +0530)]
crypto: octeontx2 - enable SR-IOV and mailbox communication with VF

Adds 'sriov_configure' to enable/disable virtual functions (VFs).
Also Initializes VF<=>PF mailbox IRQs, register handlers for
processing these mailbox messages.

Admin function (AF) handles resource allocation and configuration for
PFs and their VFs. PFs request the AF directly, via mailboxes.
Unlike PFs, VFs cannot send a mailbox request directly. A VF sends
mailbox messages to its parent PF, with which it shares a mailbox
region. The PF then forwards these messages to the AF. After handling
the request, the AF sends a response back to the VF, through the PF.

This patch adds support for this 'VF <=> PF <=> AF' mailbox
communication.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: octeontx2 - add mailbox communication with AF
Srujana Challa [Fri, 15 Jan 2021 13:52:20 +0000 (19:22 +0530)]
crypto: octeontx2 - add mailbox communication with AF

In the resource virtualization unit (RVU) each of the PF and AF
(admin function) share a 64KB of reserved memory region for
communication. This patch initializes PF <=> AF mailbox IRQs,
registers handlers for processing these communication messages.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: marvell - add Marvell OcteonTX2 CPT PF driver
Srujana Challa [Fri, 15 Jan 2021 13:52:19 +0000 (19:22 +0530)]
crypto: marvell - add Marvell OcteonTX2 CPT PF driver

Adds skeleton for the Marvell OcteonTX2 CPT physical function
driver which includes probe, PCI specific initialization and
hardware register defines.
RVU defines are present in AF driver
(drivers/net/ethernet/marvell/octeontx2/af), header files from
AF driver are included here to avoid duplication.

Signed-off-by: Suheil Chandran <schandran@marvell.com>
Signed-off-by: Lukasz Bartosik <lbartosik@marvell.com>
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm64/sha - add missing module aliases
Ard Biesheuvel [Thu, 14 Jan 2021 18:10:10 +0000 (19:10 +0100)]
crypto: arm64/sha - add missing module aliases

The accelerated, instruction based implementations of SHA1, SHA2 and
SHA3 are autoloaded based on CPU capabilities, given that the code is
modest in size, and widely used, which means that resolving the algo
name, loading all compatible modules and picking the one with the
highest priority is taken to be suboptimal.

However, if these algorithms are requested before this CPU feature
based matching and autoloading occurs, these modules are not even
considered, and we end up with suboptimal performance.

So add the missing module aliases for the various SHA implementations.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: bcm - Fix sparse warnings
Herbert Xu [Thu, 14 Jan 2021 06:39:58 +0000 (17:39 +1100)]
crypto: bcm - Fix sparse warnings

This patch fixes a number of sparse warnings in the bcm driver.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto - shash: reduce minimum alignment of shash_desc structure
Ard Biesheuvel [Wed, 13 Jan 2021 09:11:35 +0000 (10:11 +0100)]
crypto - shash: reduce minimum alignment of shash_desc structure

Unlike many other structure types defined in the crypto API, the
'shash_desc' structure is permitted to live on the stack, which
implies its contents may not be accessed by DMA masters. (This is
due to the fact that the stack may be located in the vmalloc area,
which requires a different virtual-to-physical translation than the
one implemented by the DMA subsystem)

Our definition of CRYPTO_MINALIGN_ATTR is based on ARCH_KMALLOC_MINALIGN,
which may take DMA constraints into account on architectures that support
non-cache coherent DMA such as ARM and arm64. In this case, the value is
chosen to reflect the largest cacheline size in the system, in order to
ensure that explicit cache maintenance as required by non-coherent DMA
masters does not affect adjacent, unrelated slab allocations. On arm64,
this value is currently set at 128 bytes.

This means that applying CRYPTO_MINALIGN_ATTR to struct shash_desc is both
unnecessary (as it is never used for DMA), and undesirable, given that it
wastes stack space (on arm64, performing the alignment costs 112 bytes in
the worst case, and the hole between the 'tfm' and '__ctx' members takes
up another 120 bytes, resulting in an increased stack footprint of up to
232 bytes.) So instead, let's switch to the minimum SLAB alignment, which
does not take DMA constraints into account.

Note that this is a no-op for x86.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: keembay-ocs-hcu - Add dependency on HAS_IOMEM and ARCH_KEEMBAY
Daniele Alessandrelli [Wed, 6 Jan 2021 15:27:33 +0000 (15:27 +0000)]
crypto: keembay-ocs-hcu - Add dependency on HAS_IOMEM and ARCH_KEEMBAY

Add the following additional dependencies for CRYPTO_DEV_KEEMBAY_OCS_HCU:

- HAS_IOMEM to prevent build failures

- ARCH_KEEMBAY to prevent asking the user about this driver when
  configuring a kernel without Intel Keem Bay platform support.

Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: keembay-ocs-hcu - Fix a WARN() message
Dan Carpenter [Wed, 6 Jan 2021 09:25:08 +0000 (12:25 +0300)]
crypto: keembay-ocs-hcu - Fix a WARN() message

The first argument to WARN() is a condition and the messages is the
second argument is the string, so this WARN() will only display the
__func__ part of the message.

Fixes: ae832e329a8d ("crypto: keembay-ocs-hcu - Add HMAC support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86 - use local headers for x86 specific shared declarations
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:09 +0000 (17:48 +0100)]
crypto: x86 - use local headers for x86 specific shared declarations

The Camellia, Serpent and Twofish related header files only contain
declarations that are shared between different implementations of the
respective algorithms residing under arch/x86/crypto, and none of their
contents should be used elsewhere. So move the header files into the
same location, and use local #includes instead.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86 - remove glue helper module
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:08 +0000 (17:48 +0100)]
crypto: x86 - remove glue helper module

All dependencies on the x86 glue helper module have been replaced by
local instantiations of the new ECB/CBC preprocessor helper macros, so
the glue helper module can be retired.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/twofish - drop dependency on glue helper
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:07 +0000 (17:48 +0100)]
crypto: x86/twofish - drop dependency on glue helper

Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/cast6 - drop dependency on glue helper
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:06 +0000 (17:48 +0100)]
crypto: x86/cast6 - drop dependency on glue helper

Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/cast5 - drop dependency on glue helper
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:05 +0000 (17:48 +0100)]
crypto: x86/cast5 - drop dependency on glue helper

Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/serpent - drop dependency on glue helper
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:04 +0000 (17:48 +0100)]
crypto: x86/serpent - drop dependency on glue helper

Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/camellia - drop dependency on glue helper
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:03 +0000 (17:48 +0100)]
crypto: x86/camellia - drop dependency on glue helper

Replace the glue helper dependency with implementations of ECB and CBC
based on the new CPP macros, which avoid the need for indirect calls.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86 - add some helper macros for ECB and CBC modes
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:02 +0000 (17:48 +0100)]
crypto: x86 - add some helper macros for ECB and CBC modes

The x86 glue helper module is starting to show its age:
- It relies heavily on function pointers to invoke asm helper functions that
  operate on fixed input sizes that are relatively small. This means the
  performance is severely impacted by retpolines.
- It goes to great lengths to amortize the cost of kernel_fpu_begin()/end()
  over as much work as possible, which is no longer necessary now that FPU
  save/restore is done lazily, and doing so may cause unbounded scheduling
  blackouts due to the fact that enabling the FPU in kernel mode disables
  preemption.
- The CBC mode decryption helper makes backward strides through the input, in
  order to avoid a single block size memcpy() between chunks. Consuming the
  input in this manner is highly likely to defeat any hardware prefetchers,
  so it is better to go through the data linearly, and perform the extra
  memcpy() where needed (which is turned into direct loads and stores by the
  compiler anyway). Note that benchmarks won't show this effect, given that
  the memory they use is always cache hot.
- It implements blockwise XOR in terms of le128 pointers, which imply an
  alignment that is not guaranteed by the API, violating the C standard.

GCC does not seem to be smart enough to elide the indirect calls when the
function pointers are passed as arguments to static inline helper routines
modeled after the existing ones. So instead, let's create some CPP macros
that encapsulate the core of the ECB and CBC processing, so we can wire
them up for existing users of the glue helper module, i.e., Camellia,
Serpent, Twofish and CAST6.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/blowfish - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:01 +0000 (17:48 +0100)]
crypto: x86/blowfish - drop CTR mode implementation

Blowfish in counter mode is never used in the kernel, so there
is no point in keeping an accelerated implementation around.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/des - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:48:00 +0000 (17:48 +0100)]
crypto: x86/des - drop CTR mode implementation

DES or Triple DES in counter mode is never used in the kernel, so there
is no point in keeping an accelerated implementation around.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/glue-helper - drop CTR helper routines
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:59 +0000 (17:47 +0100)]
crypto: x86/glue-helper - drop CTR helper routines

The glue helper's CTR routines are no longer used, so drop them.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/twofish - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:58 +0000 (17:47 +0100)]
crypto: x86/twofish - drop CTR mode implementation

Twofish in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/cast6 - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:57 +0000 (17:47 +0100)]
crypto: x86/cast6 - drop CTR mode implementation

CAST6 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/cast5 - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:56 +0000 (17:47 +0100)]
crypto: x86/cast5 - drop CTR mode implementation

CAST5 in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/serpent - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:55 +0000 (17:47 +0100)]
crypto: x86/serpent - drop CTR mode implementation

Serpent in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/camellia - drop CTR mode implementation
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:54 +0000 (17:47 +0100)]
crypto: x86/camellia - drop CTR mode implementation

Camellia in CTR mode is never used by the kernel directly, and is highly
unlikely to be relied upon by dm-crypt or algif_skcipher. So let's drop
the accelerated CTR mode implementation, and instead, rely on the CTR
template and the bare cipher.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/glue-helper - drop XTS helper routines
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:53 +0000 (17:47 +0100)]
crypto: x86/glue-helper - drop XTS helper routines

The glue helper's XTS routines are no longer used, so drop them.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/twofish - switch to XTS template
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:52 +0000 (17:47 +0100)]
crypto: x86/twofish - switch to XTS template

Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Twofish in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/serpent- switch to XTS template
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:51 +0000 (17:47 +0100)]
crypto: x86/serpent- switch to XTS template

Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Serpent in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/cast6 - switch to XTS template
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:50 +0000 (17:47 +0100)]
crypto: x86/cast6 - switch to XTS template

Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement CAST6 in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/camellia - switch to XTS template
Ard Biesheuvel [Tue, 5 Jan 2021 16:47:49 +0000 (17:47 +0100)]
crypto: x86/camellia - switch to XTS template

Now that the XTS template can wrap accelerated ECB modes, it can be
used to implement Camellia in XTS mode as well, which turns out to
be at least as fast, and sometimes even faster.

Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: marvell/cesa - Fix a spelling s/fautly/faultly/ in comment
Bhaskar Chowdhury [Tue, 5 Jan 2021 10:01:08 +0000 (15:31 +0530)]
crypto: marvell/cesa - Fix a spelling s/fautly/faultly/ in comment

s/fautly/faulty/p

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon/sec - register SEC device to uacce
Kai Ye [Tue, 5 Jan 2021 06:16:44 +0000 (14:16 +0800)]
crypto: hisilicon/sec - register SEC device to uacce

Register SEC device to uacce framework for user space.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon/hpre - register HPRE device to uacce
Kai Ye [Tue, 5 Jan 2021 06:16:43 +0000 (14:16 +0800)]
crypto: hisilicon/hpre - register HPRE device to uacce

Register HPRE device to uacce framework for user space.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon - add ZIP device using mode parameter
Kai Ye [Tue, 5 Jan 2021 06:16:42 +0000 (14:16 +0800)]
crypto: hisilicon - add ZIP device using mode parameter

Add 'uacce_mode' parameter for ZIP, which can be set as 0(default) or 1.
'0' means ZIP is only registered to kernel crypto, and '1' means it's
registered to both kernel crypto and UACCE.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: hisilicon/qm - SVA bugfixed on Kunpeng920
Kai Ye [Tue, 5 Jan 2021 06:12:03 +0000 (14:12 +0800)]
crypto: hisilicon/qm - SVA bugfixed on Kunpeng920

Kunpeng920 SEC/HPRE/ZIP cannot support running user space SVA and kernel
Crypto at the same time. Therefore, the algorithms should not be registered
to Crypto as user space SVA is enabled.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Reviewed-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: bcm - Rename struct device_private to bcm_device_private
Jiri Olsa [Mon, 4 Jan 2021 23:02:37 +0000 (00:02 +0100)]
crypto: bcm - Rename struct device_private to bcm_device_private

Renaming 'struct device_private' to 'struct bcm_device_private',
because it clashes with 'struct device_private' from
'drivers/base/base.h'.

While it's not a functional problem, it's causing two distinct
type hierarchies in BTF data. It also breaks build with options:
  CONFIG_DEBUG_INFO_BTF=y
  CONFIG_CRYPTO_DEV_BCM_SPU=y

as reported by Qais Yousef [1].

[1] https://lore.kernel.org/lkml/20201229151352.6hzmjvu3qh6p2qgg@e107158-lin/

Fixes: 9d12ba86f818 ("crypto: brcm - Add Broadcom SPU driver")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: qat - reduce size of mapped region
Adam Guerin [Mon, 4 Jan 2021 17:21:59 +0000 (17:21 +0000)]
crypto: qat - reduce size of mapped region

Restrict size of field to what is required by the operation.

This issue was detected by smatch:

    drivers/crypto/qat/qat_common/qat_asym_algs.c:328 qat_dh_compute_value() error: dma_map_single_attrs() '&qat_req->in.dh.in.b' too small (8 vs 64)

Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: qat - change format string and cast ring size
Adam Guerin [Mon, 4 Jan 2021 17:21:58 +0000 (17:21 +0000)]
crypto: qat - change format string and cast ring size

Cast ADF_SIZE_TO_RING_SIZE_IN_BYTES() so it can return a 64 bit value.

This issue was detected by smatch:

    drivers/crypto/qat/qat_common/adf_transport_debug.c:65 adf_ring_show() warn: should '(1 << (ring->ring_size - 1)) << 7' be a 64 bit type?

Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: qat - fix potential spectre issue
Adam Guerin [Mon, 4 Jan 2021 17:21:57 +0000 (17:21 +0000)]
crypto: qat - fix potential spectre issue

Sanitize ring_num value coming from configuration (and potentially
from user space) before it is used as index in the banks array.

This issue was detected by smatch:

    drivers/crypto/qat/qat_common/adf_transport.c:233 adf_create_ring() warn: potential spectre issue 'bank->rings' [r] (local cap)

Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: qat - configure arbiter mapping based on engines enabled
Wojciech Ziemba [Mon, 4 Jan 2021 16:55:46 +0000 (16:55 +0000)]
crypto: qat - configure arbiter mapping based on engines enabled

The hardware specific function adf_get_arbiter_mapping() modifies
the static array thrd_to_arb_map to disable mappings for AEs
that are disabled. This static array is used for each device
of the same type. If the ae mask is not identical for all devices
of the same type then the arbiter mapping returned by
adf_get_arbiter_mapping() may be wrong.

This patch fixes this problem by ensuring the static arbiter
mapping is unchanged and the device arbiter mapping is re-calculated
each time based on the static mapping.

Signed-off-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - replace function pointers with static branches
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:50 +0000 (16:55 +0100)]
crypto: aesni - replace function pointers with static branches

Replace the function pointers in the GCM implementation with static branches,
which are based on code patching, which occurs only at module load time.
This avoids the severe performance penalty caused by the use of retpolines.

In order to retain the ability to switch between different versions of the
implementation based on the input size on cores that support AVX and AVX2,
use static branches instead of static calls.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - refactor scatterlist processing
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:49 +0000 (16:55 +0100)]
crypto: aesni - refactor scatterlist processing

Currently, the gcm(aes-ni) driver open codes the scatterlist handling
that is encapsulated by the skcipher walk API. So let's switch to that
instead.

Also, move the handling at the end of gcmaes_crypt_by_sg() that is
dependent on whether we are encrypting or decrypting into the callers,
which always do one or the other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - clean up mapping of associated data
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:48 +0000 (16:55 +0100)]
crypto: aesni - clean up mapping of associated data

The gcm(aes-ni) driver is only built for x86_64, which does not make
use of highmem. So testing for PageHighMem is pointless and can be
omitted.

While at it, replace GFP_ATOMIC with the appropriate runtime decided
value based on the context.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - drop unused asm prototypes
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:47 +0000 (16:55 +0100)]
crypto: aesni - drop unused asm prototypes

Drop some prototypes that are declared but never called.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: aesni - prevent misaligned buffers on the stack
Ard Biesheuvel [Mon, 4 Jan 2021 15:55:46 +0000 (16:55 +0100)]
crypto: aesni - prevent misaligned buffers on the stack

The GCM mode driver uses 16 byte aligned buffers on the stack to pass
the IV to the asm helpers, but unfortunately, the x86 port does not
guarantee that the stack pointer is 16 byte aligned upon entry in the
first place. Since the compiler is not aware of this, it will not emit
the additional stack realignment sequence that is needed, and so the
alignment is not guaranteed to be more than 8 bytes.

So instead, allocate some padding on the stack, and realign the IV
pointer by hand.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: qat - replace CRYPTO_AES with CRYPTO_LIB_AES in Kconfig
Marco Chiappero [Mon, 4 Jan 2021 15:35:15 +0000 (15:35 +0000)]
crypto: qat - replace CRYPTO_AES with CRYPTO_LIB_AES in Kconfig

Use CRYPTO_LIB_AES in place of CRYPTO_AES in the dependences for the QAT
common code.

Fixes: c0e583ab2016 ("crypto: qat - add CRYPTO_AES to Kconfig dependencies")
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Marco Chiappero <marco.chiappero@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: stm32 - Fix last sparse warning in stm32_cryp_check_ctr_counter
Herbert Xu [Mon, 4 Jan 2021 06:15:45 +0000 (17:15 +1100)]
crypto: stm32 - Fix last sparse warning in stm32_cryp_check_ctr_counter

This patch changes the cast in stm32_cryp_check_ctr_counter from
u32 to __be32 to match the prototype of stm32_cryp_hw_write_iv
correctly.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: vmx - Move extern declarations into header file
Herbert Xu [Sat, 2 Jan 2021 21:56:18 +0000 (08:56 +1100)]
crypto: vmx - Move extern declarations into header file

This patch moves the extern algorithm declarations into a header
file so that a number of compiler warnings are silenced.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper
Ard Biesheuvel [Thu, 31 Dec 2020 16:41:55 +0000 (17:41 +0100)]
crypto: x86/aes-ni-xts - rewrite and drop indirections via glue helper

The AES-NI driver implements XTS via the glue helper, which consumes
a struct with sets of function pointers which are invoked on chunks
of input data of the appropriate size, as annotated in the struct.

Let's get rid of this indirection, so that we can perform direct calls
to the assembler helpers. Instead, let's adopt the arm64 strategy, i.e.,
provide a helper which can consume inputs of any size, provided that the
penultimate, full block is passed via the last call if ciphertext stealing
needs to be applied.

This also allows us to enable the XTS mode for i386.

Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: x86/aes-ni-xts - use direct calls to and 4-way stride
Ard Biesheuvel [Thu, 31 Dec 2020 16:41:54 +0000 (17:41 +0100)]
crypto: x86/aes-ni-xts - use direct calls to and 4-way stride

The XTS asm helper arrangement is a bit odd: the 8-way stride helper
consists of back-to-back calls to the 4-way core transforms, which
are called indirectly, based on a boolean that indicates whether we
are performing encryption or decryption.

Given how costly indirect calls are on x86, let's switch to direct
calls, and given how the 8-way stride doesn't really add anything
substantial, use a 4-way stride instead, and make the asm core
routine deal with any multiple of 4 blocks. Since 512 byte sectors
or 4 KB blocks are the typical quantities XTS operates on, increase
the stride exported to the glue helper to 512 bytes as well.

As a result, the number of indirect calls is reduced from 3 per 64 bytes
of in/output to 1 per 512 bytes of in/output, which produces a 65% speedup
when operating on 1 KB blocks (measured on a Intel(R) Core(TM) i7-8650U CPU)

Fixes: 9697fa39efd3f ("x86/retpoline/crypto: Convert crypto assembler indirect jumps")
Tested-by: Eric Biggers <ebiggers@google.com> # x86_64
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: picoxcell - Remove PicoXcell driver
Rob Herring [Thu, 10 Dec 2020 20:03:14 +0000 (14:03 -0600)]
crypto: picoxcell - Remove PicoXcell driver

PicoXcell has had nothing but treewide cleanups for at least the last 8
years and no signs of activity. The most recent activity is a yocto vendor
kernel based on v3.0 in 2015.

Cc: Jamie Iles <jamie@jamieiles.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: arm/blake2b - add NEON-accelerated BLAKE2b
Eric Biggers [Wed, 23 Dec 2020 08:10:03 +0000 (00:10 -0800)]
crypto: arm/blake2b - add NEON-accelerated BLAKE2b

Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
SHA-256, and slightly faster than SHA-1.  It is also almost three times
as fast as the generic implementation of BLAKE2b:

Algorithm            Cycles per byte (on 4096-byte messages)
===================  =======================================
blake2b-256-neon     14.0
sha1-neon            16.3
blake2s-256-arm      18.8
sha1-asm             20.8
blake2s-256-generic  26.0
sha256-neon      28.9
sha256-asm      32.0
blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S.  At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP).  It does
only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1.  On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s.  This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it.  The best I've been able to do with BLAKE2s
on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.

(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API,
since there is no library API for BLAKE2b yet.  However, I've tried to
keep things consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: blake2b - update file comment
Eric Biggers [Wed, 23 Dec 2020 08:10:02 +0000 (00:10 -0800)]
crypto: blake2b - update file comment

The file comment for blake2b_generic.c makes it sound like it's the
reference implementation of BLAKE2b with only minor changes.  But it's
actually been changed a lot.  Update the comment to make this clearer.

Reviewed-by: David Sterba <dsterba@suse.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 years agocrypto: blake2b - sync with blake2s implementation
Eric Biggers [Wed, 23 Dec 2020 08:10:01 +0000 (00:10 -0800)]
crypto: blake2b - sync with blake2s implementation

Sync the BLAKE2b code with the BLAKE2s code as much as possible:

- Move a lot of code into new headers <crypto/blake2b.h> and
  <crypto/internal/blake2b.h>, and adjust it to be like the
  corresponding BLAKE2s code, i.e. like <crypto/blake2s.h> and
  <crypto/internal/blake2s.h>.

- Rename constants, e.g. BLAKE2B_*_DIGEST_SIZE => BLAKE2B_*_HASH_SIZE.

- Use a macro BLAKE2B_ALG() to define the shash_alg structs.

- Export blake2b_compress_generic() for use as a fallback.

This makes it much easier to add optimized implementations of BLAKE2b,
as optimized implementations can use the helper functions
crypto_blake2b_{setkey,init,update,final}() and
blake2b_compress_generic().  The ARM implementation will use these.

But this change is also helpful because it eliminates unnecessary
differences between the BLAKE2b and BLAKE2s code, so that the same
improvements can easily be made to both.  (The two algorithms are
basically identical, except for the word size and constants.)  It also
makes it straightforward to add a library API for BLAKE2b in the future
if/when it's needed.

This change does make the BLAKE2b code slightly more complicated than it
needs to be, as it doesn't actually provide a library API yet.  For
example, __blake2b_update() doesn't really need to exist yet; it could
just be inlined into crypto_blake2b_update().  But I believe this is
outweighed by the benefits of keeping the code in sync.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>