powerpc/64: Implement clear_bit_unlock_is_negative_byte()
author Nicholas Piggin <npiggin@gmail.com>
Tue, 3 Jan 2017 18:58:28 +0000 (04:58 +1000)
committer Michael Ellerman <mpe@ellerman.id.au>
Sat, 18 Feb 2017 03:40:01 +0000 (14:40 +1100)
commit d11914b21c4c21a294fe8937d66c1a192caa3cad
tree c4b6822062b0d542c772979eb05ba20abbdae3f8
parent 02983449c87b1dfd9b75af4c8a2a8057f9664c08

Commit b91e1302ad9b8 ("mm: optimize PageWaiters bit use for
unlock_page()") added a special bitop function to speed up
unlock_page(). Implement this for 64-bit powerpc.

This improves the unlock_page() core code from this:

    li 9,1
    lwsync
1:  ldarx 10,0,3,0
    andc 10,10,9
    stdcx. 10,0,3
    bne- 1b
    ori 2,2,0
    ld 9,0(3)
    andi. 10,9,0x80
    beqlr
    li 4,0
    b wake_up_page_bit

To this:

    li 10,1
    lwsync
1:  ldarx 9,0,3,0
    andc 9,9,10
    stdcx. 9,0,3
    bne- 1b
    andi. 10,9,0x80
    beqlr
    li 4,0
    b wake_up_page_bit
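
As a rough illustration of the difference between the two sequences, the
following portable C11 sketch contrasts "clear the lock bit, then reload the
word and test bit 7" with "clear the lock bit and test bit 7 of the value
already in hand". It is not the kernel implementation: the function names and
the SKETCH_WAITERS_MASK macro are hypothetical, and C11 atomics stand in for
the lwsync and ldarx/stdcx. loop shown above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical name: bit 7 of the low byte, as tested by "andi. 10,9,0x80". */
#define SKETCH_WAITERS_MASK 0x80UL

/* "Before" shape: release-clear bit 0, then load the word again to test bit 7. */
static bool sketch_unlock_then_reload(_Atomic unsigned long *word)
{
    /* Atomic clear of bit 0 with release ordering. */
    atomic_fetch_and_explicit(word, ~1UL, memory_order_release);

    /* Separate load of the same word, corresponding to "ld 9,0(3)". */
    unsigned long v = atomic_load_explicit(word, memory_order_relaxed);

    return (v & SKETCH_WAITERS_MASK) != 0;
}

/* "After" shape: reuse the value obtained by the atomic operation itself. */
static bool sketch_unlock_fused(_Atomic unsigned long *word)
{
    /* fetch_and returns the value held before the clear; bit 7 is untouched. */
    unsigned long old = atomic_fetch_and_explicit(word, ~1UL,
                                                  memory_order_release);

    return (old & SKETCH_WAITERS_MASK) != 0;
}
```

Both functions leave the word in the same state; the fused form simply avoids
the second memory access, which is the saving visible in the shorter assembly
listing.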

In a test of elapsed time for dd writing into 16GB of already-dirty
pagecache on a POWER8 with 4K pages (one unlock_page() call per 4kB
written), this patch reduced elapsed time by about 1.1%:

    N           Min           Max        Median           Avg        Stddev
x  19         2.578         2.619         2.594         2.595         0.011
+  19         2.552         2.592         2.564         2.565         0.008
Difference at 95.0% confidence
-0.030  +/- 0.006
-1.142% +/- 0.243%

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Made 64-bit only until I can test it properly on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
arch/powerpc/include/asm/bitops.h