The arm64 architecture can exclusively load and store a pair of
registers from and to an address (ldxp/stxp). SLUB can also take
advantage of a cmpxchg_double implementation to avoid taking some
locks.

This patch provides an implementation of cmpxchg_double for pairs of
64-bit values, and enables the Kconfig options SLUB needs in order to
use it (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE).

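To make the cmpxchg_double semantics concrete, below is a minimal,
hypothetical model in portable C11. It is not the arm64 implementation:
a spinlock stands in for the ldxp/stxp sequence, which instead retries
whenever the exclusive store fails. The names struct word_pair and
model_cmpxchg_double are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for two adjacent 64-bit words, such as the
 * freelist pointer and counters word SLUB keeps side by side. The
 * 16-byte alignment mirrors the requirement that a single exclusive
 * pair access covers both words.
 */
struct word_pair {
	_Alignas(16) uint64_t first;
	uint64_t second;
};

/* Model only: a spinlock replaces the lock-free ldxp/stxp retry loop. */
static atomic_flag pair_lock = ATOMIC_FLAG_INIT;

/*
 * If both words still hold the expected old values, replace both with
 * the new values and return 1. Otherwise leave the pair untouched and
 * return 0. The update is all-or-nothing.
 */
static int model_cmpxchg_double(struct word_pair *p,
				uint64_t old1, uint64_t old2,
				uint64_t new1, uint64_t new2)
{
	int ok = 0;

	while (atomic_flag_test_and_set_explicit(&pair_lock,
						 memory_order_acquire))
		; /* spin until this caller owns the model lock */

	if (p->first == old1 && p->second == old2) {
		p->first = new1;
		p->second = new2;
		ok = 1;
	}

	atomic_flag_clear_explicit(&pair_lock, memory_order_release);
	return ok;
}
```

For example, starting from {10, 1}, a call expecting (10, 1) stores
(20, 2) and returns 1; a following call expecting (20, 1) returns 0
and leaves the pair at (20, 2), because the second word no longer
matches. This all-or-nothing update of two words is what lets SLUB
change a slab's freelist and counters together without a lock.
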
On a Juno platform running only on the Cortex-A57 cores, I see a
noticeable performance improvement with hackbench: the run time drops
from 206.331 to 182.396, roughly 11.6% less.

Before applying the patch:
    $ ./hackbench 100 process 1000
    Running with 100*40 (== 4000) tasks.
    Time: 206.331

After applying the patch:
    $ ./hackbench 100 process 1000
    Running with 100*40 (== 4000) tasks.
    Time: 182.396

Signed-off-by: Steve Capper <steve.capper@linaro.org>