x86/xor: Add alternative SSE implementation only prefetching once per 64-byte line
author Jan Beulich <JBeulich@suse.com>
Fri, 2 Nov 2012 14:20:24 +0000 (14:20 +0000)
committer Ingo Molnar <mingo@kernel.org>
Fri, 25 Jan 2013 08:23:50 +0000 (09:23 +0100)
commit f317820cb6ee3fb173319bf76e0e62437be78ad2
tree fc57358da4ba9f11a8d80e508d01e99c2c62c1f9
parent e8f6e3f8a14bae98197c6d9f280cd23d22eb1a33

On CPUs with 64-byte last-level cache lines, this yields roughly
10% better performance, independent of CPU vendor or specific
model (as far as I was able to test).
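The idea of one prefetch per 64-byte line can be sketched in portable C.
This is only an illustration, not the code added by this commit, which
lives in the headers listed below. The function name xor_2_pf64, the
constants LINE_BYTES and PREFETCH_AHEAD, and the use of the GCC/Clang
builtin __builtin_prefetch are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES     64   /* assumed cache-line size */
#define PREFETCH_AHEAD 256  /* hypothetical prefetch distance, in bytes */

/* XOR src into dst. bytes must be a multiple of LINE_BYTES. */
static void xor_2_pf64(size_t bytes, unsigned char *dst,
                       const unsigned char *src)
{
    assert(bytes % LINE_BYTES == 0);

    for (size_t off = 0; off < bytes; off += LINE_BYTES) {
        /*
         * Exactly one prefetch hint per 64-byte line of the source.
         * Locality 0 asks for a non-temporal hint. The bounds check
         * keeps the pointer arithmetic inside the buffer.
         */
        if (off + PREFETCH_AHEAD < bytes)
            __builtin_prefetch(src + off + PREFETCH_AHEAD, 0, 0);

        /* XOR the whole line, eight bytes at a time. */
        for (size_t i = 0; i < LINE_BYTES; i += sizeof(uint64_t)) {
            uint64_t a, b;

            memcpy(&a, dst + off + i, sizeof(a));
            memcpy(&b, src + off + i, sizeof(b));
            a ^= b;
            memcpy(dst + off + i, &a, sizeof(a));
        }
    }
}
```

A loop that issued hints more often than once per line would request
the same line repeatedly, so the extra instructions bring in no new
data. Whether that accounts for the whole gain reported above is not
something this sketch can show.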

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/5093E4B802000078000A615E@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch/x86/include/asm/xor.h
arch/x86/include/asm/xor_32.h
arch/x86/include/asm/xor_64.h