Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
author Ronald S. Bultje <rbultje@google.com>
Thu, 20 Jun 2013 16:34:25 +0000 (09:34 -0700)
committer Ronald S. Bultje <rbultje@google.com>
Thu, 20 Jun 2013 16:34:25 +0000 (09:34 -0700)
commit 8fb6c58191251792765c2910af3f9d6da22d6c11
tree 658ce312142ee7b7d3dd092beaab84e2cd301476
parent 3656835771ad338ed22cebb19311274e90efc768

Overall speedup is around 5% (bus @ 1500 kbps, first 50 frames:
4min10 -> 3min58). Per-function timings follow, compared against the
original assembly-optimized versions; where no assembly-optimized
version existed before, only the new timing is given:

sse2   4x4:    99 ->   82 cycles
sse2   4x8:           128 cycles
sse2   8x4:           121 cycles
sse2   8x8:   149 ->  129 cycles
sse2   8x16:  235 ->  245 cycles (?)
sse2  16x8:   269 ->  203 cycles
sse2  16x16:  441 ->  349 cycles
sse2  16x32:          641 cycles
sse2  32x16:          643 cycles
sse2  32x32: 1733 -> 1154 cycles
sse2  32x64:         2247 cycles
sse2  64x32:         2323 cycles
sse2  64x64: 6984 -> 4442 cycles
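
The relative change behind these figures can be checked with a small
helper. This is only an illustrative sketch; the name percent_saved is
hypothetical and not part of the commit.

```c
#include <assert.h>

/* Hypothetical helper (not part of libvpx): percentage of time or
 * cycles saved when going from `before` to `after`. A negative result
 * means the new version is slower. */
double percent_saved(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}
```

For the overall encode time, 4min10 is 250 seconds and 3min58 is 238
seconds, so percent_saved(250, 238) gives 4.8, in line with "around 5%".
Applied to the 8x16 row (235 -> 245 cycles) the result is negative, which
is presumably why that row carries a question mark.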

ssse3  4x4:           100 cycles (?)
ssse3  4x8:           103 cycles
ssse3  8x4:            71 cycles
ssse3  8x8:           147 cycles
ssse3  8x16:          158 cycles
ssse3 16x8:   188 ->  162 cycles
ssse3 16x16:  316 ->  273 cycles
ssse3 16x32:          535 cycles
ssse3 32x16:          564 cycles
ssse3 32x32:          973 cycles
ssse3 32x64:         1930 cycles
ssse3 64x32:         1922 cycles
ssse3 64x64:         3760 cycles
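
For readers unfamiliar with what these functions compute, here is a
plain C sketch of a sub-pixel variance. It is not the code from this
commit. It assumes 1/8-pel offsets (0..7) and a two-tap bilinear filter
whose weights sum to 128, applied horizontally and then vertically; the
names subpel_variance_ref and bilinear are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_DIM = 64 }; /* largest block size listed above */

/* Two-tap bilinear blend of a and b. Assumption: weights are
 * (128 - 16*offset, 16*offset) with rounding and a 7-bit shift. */
static uint8_t bilinear(uint8_t a, uint8_t b, int offset) {
  return (uint8_t)((a * (128 - 16 * offset) + b * (16 * offset) + 64) >> 7);
}

/* Hypothetical reference routine, not the libvpx API.
 * src must have (w + 1) readable columns and (h + 1) readable rows,
 * because each filter tap reads one sample past the block. */
unsigned int subpel_variance_ref(const uint8_t *src, int src_stride,
                                 int xoffset, int yoffset,
                                 const uint8_t *ref, int ref_stride,
                                 int w, int h, unsigned int *sse) {
  uint8_t tmp[(MAX_DIM + 1) * MAX_DIM];
  int64_t sum = 0;
  uint32_t sq = 0;
  int r, c;

  assert(w > 0 && w <= MAX_DIM && h > 0 && h <= MAX_DIM);
  assert(xoffset >= 0 && xoffset < 8 && yoffset >= 0 && yoffset < 8);

  /* Pass 1: horizontal filter over h + 1 rows. */
  for (r = 0; r < h + 1; ++r)
    for (c = 0; c < w; ++c)
      tmp[r * w + c] = bilinear(src[r * src_stride + c],
                                src[r * src_stride + c + 1], xoffset);

  /* Pass 2: vertical filter, then accumulate the difference to ref. */
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      int pred = bilinear(tmp[r * w + c], tmp[(r + 1) * w + c], yoffset);
      int diff = pred - ref[r * ref_stride + c];
      sum += diff;
      sq += (uint32_t)(diff * diff);
    }
  }

  *sse = sq;
  /* variance = SSE - sum^2 / N, using 64-bit math for sum^2 */
  return (unsigned int)(sq - (uint32_t)((sum * sum) / (w * h)));
}
```

A scalar routine along these lines is the kind of baseline a SIMD
version would be checked against, for example in a test such as
test/variance_test.cc.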

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
test/variance_test.cc
vp9/common/vp9_rtcd_defs.sh
vp9/encoder/x86/vp9_subpel_variance.asm [new file with mode: 0644]
vp9/encoder/x86/vp9_subpel_variance_impl_sse2.asm
vp9/encoder/x86/vp9_variance_impl_mmx.asm
vp9/encoder/x86/vp9_variance_impl_sse2.asm
vp9/encoder/x86/vp9_variance_impl_ssse3.asm [deleted file]
vp9/encoder/x86/vp9_variance_mmx.c
vp9/encoder/x86/vp9_variance_sse2.c
vp9/encoder/x86/vp9_variance_ssse3.c [deleted file]
vp9/vp9cx.mk