Add SSE4 optimization of S32A_Opaque_Blitrow
author henrik.smiding <henrik.smiding@intel.com>
Thu, 5 Jun 2014 14:50:54 +0000 (07:50 -0700)
committer Commit bot <commit-bot@chromium.org>
Thu, 5 Jun 2014 14:50:54 +0000 (07:50 -0700)
commit e2527b147679b0c43019fae7d59cc3777d2d097e
tree d08603391de4fdf674b249223cb16301affd46ef
parent 58edea89627d347010cadc26ce3c092a9265a8ee

Adds an optimization of the Skia S32A_Opaque_Blitrow blitter using the
SSE4.2 SIMD instruction set. Includes special cases for when alpha is
zero or fully opaque.
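
For readers unfamiliar with this blitter, the following is a minimal scalar
sketch of a source-over row blend with the two special cases mentioned above.
It is not the code added by this commit (which is SSE4 assembly). The names
scale_pixel and blit_row_srcover are hypothetical, and the sketch assumes
premultiplied 32-bit pixels with alpha stored in the top byte.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scale all four 8-bit channels of a packed
// 32-bit pixel by scale/256, where scale is in the range 0..256.
static inline uint32_t scale_pixel(uint32_t c, uint32_t scale) {
    const uint32_t mask = 0x00FF00FF;
    uint32_t rb = ((c & mask) * scale) >> 8;   // channels in bits 0-7 and 16-23
    uint32_t ag = ((c >> 8) & mask) * scale;   // channels in bits 8-15 and 24-31
    return (rb & mask) | (ag & ~mask);
}

// Hypothetical scalar source-over blend of one row.
// Assumption: premultiplied pixels, alpha in bits 24-31.
static void blit_row_srcover(uint32_t* dst, const uint32_t* src, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        uint32_t s = src[i];
        uint32_t a = s >> 24;
        if (a == 0) {
            continue;              // fully transparent source: dst is unchanged
        }
        if (a == 255) {
            dst[i] = s;            // fully opaque source: plain copy
            continue;
        }
        // General case: dst = src + dst * (256 - alpha) / 256
        dst[i] = s + scale_pixel(dst[i], 256 - a);
    }
}
```

A SIMD version can apply the same two tests to a whole block of pixels at
once and skip or copy the block, which would be consistent with the larger
gains reported below for the source_transparent bench.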

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on the Silvermont architecture).
Noticeable in ~25 different Skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
gyp/opts.gyp
gyp/skia_lib.gyp
src/opts/SkBlitRow_opts_SSE4.h [new file with mode: 0644]
src/opts/SkBlitRow_opts_SSE4_asm.S [new file with mode: 0644]
src/opts/SkBlitRow_opts_SSE4_x64_asm.S [new file with mode: 0644]
src/opts/opts_check_x86.cpp