author stephana <stephana@google.com>
Mon, 2 Feb 2015 17:52:43 +0000 (09:52 -0800)
committer Commit bot <commit-bot@chromium.org>
Mon, 2 Feb 2015 17:52:43 +0000 (09:52 -0800)
commit 4988891a1173cd405bf1c1dd3a3668c451f45e4c
tree 0fd4b535b7e9b57fb0b6248f69b04339bc5c7b69
parent db204e301b1320af242e1be5e477cd9453b126a6
Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #16 id:300001 of https://codereview.chromium.org/874863002/)

Reason for revert:
This causes a bug on the 'hittestpath' GM on MacMini 4,1

See:

https://gold.skia.org/#/triage/hittestpath?head=0

for details.

Original issue's description:
> SSE4 opaque blend using intrinsics instead of assembly.
>
> Since we had such a hard time with the assembly versions of this blit (to the
> point that we have them completely disabled everywhere), I thought I'd take
> a shot at writing a version of the blit using intrinsics.
>
> The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> to skip the blend when the 16 src pixels we consider each loop are all opaque
> or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> all those alphas.
>
> It's worth looking to see if we can backport this type of logic to SSE2 using
> _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
>
> My local performance testing doesn't show this to be an unambiguous win
> (there are probably microbenchmarks and SKPs where we'd be better off just
> powering through the blend rather than looking at alphas), but the potential
> does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)
>
> DM says it draws pixel perfect compared to the old code.
>
> Microbenchmarks:
>                bitmap_RGBA_8888_A_source_stripes_two   14us -> 14.4us 1.03x
>              bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
>                        bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
> bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp  102us ->  101us 0.99x
>                 bitmap_RGBA_8888_scale_rotate_bilerp  103us ->  101us 0.99x
>                               bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
>              bitmap_RGBA_8888_A_scale_rotate_bicubic   71us ->   70us 0.99x
>          bitmap_RGBA_8888_update_scale_rotate_bilerp  103us ->  101us 0.99x
>               bitmap_RGBA_8888_A_scale_rotate_bilerp  112us ->  109us 0.98x
>                     bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
>                                     bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
>                              bitmap_RGBA_8888_update 5.78us ->  5.6us 0.97x
>                      bitmap_RGBA_8888_A_scale_bilerp 70.7us ->   68us 0.96x
>                     bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
>                                   bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
>                     bitmap_RGBA_8888_A_source_opaque   14us -> 6.29us 0.45x
>                bitmap_RGBA_8888_A_source_transparent   14us -> 3.65us 0.26x
>
> Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> be a decent predictor of real-world impact.
>
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
>
> CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493
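
Illustration of the alpha test described in the original description above. This is a portable scalar sketch of the control flow only, not the reverted code: the patch did the same classification with _mm_shuffle_epi8 and _mm_test* over 16 pixels at a time, and that intrinsics code is not reproduced here. The names alpha_of, src_over, classify16 and blit_row are hypothetical, and the pixel layout (premultiplied, alpha in the top byte) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: premultiplied 32-bit pixels with alpha in the top byte.
static inline uint32_t alpha_of(uint32_t px) { return px >> 24; }

// Premultiplied src-over for one pixel: src + dst * (256 - srcAlpha) / 256,
// computed on two channels at a time.
static inline uint32_t src_over(uint32_t src, uint32_t dst) {
    const uint32_t mask  = 0x00FF00FFu;
    const uint32_t scale = 256u - alpha_of(src);  // in [1, 256]
    const uint32_t rb = (((dst & mask) * scale) >> 8) & mask;
    const uint32_t ag = (((dst >> 8) & mask) * scale) & ~mask;
    return src + (rb | ag);
}

enum class Run { kAllTransparent, kAllOpaque, kMixed };

// Scalar stand-in for the ptest step: AND and OR together the 16 alphas,
// then check whether they are all 0x00 or all 0xFF.
static Run classify16(const uint32_t* src) {
    uint32_t all_and = 0xFFu;
    uint32_t all_or  = 0x00u;
    for (int i = 0; i < 16; ++i) {
        const uint32_t a = alpha_of(src[i]);
        all_and &= a;
        all_or  |= a;
    }
    if (all_or == 0x00u)  return Run::kAllTransparent;
    if (all_and == 0xFFu) return Run::kAllOpaque;
    return Run::kMixed;
}

// Blend src over dst, skipping the per-pixel math for uniform runs of 16.
static void blit_row(uint32_t* dst, const uint32_t* src, size_t count) {
    size_t i = 0;
    for (; i + 16 <= count; i += 16) {
        switch (classify16(src + i)) {
            case Run::kAllTransparent:
                break;  // dst is unchanged, nothing to do
            case Run::kAllOpaque:
                for (size_t j = 0; j < 16; ++j) dst[i + j] = src[i + j];
                break;  // src replaces dst, no blend needed
            case Run::kMixed:
                for (size_t j = 0; j < 16; ++j)
                    dst[i + j] = src_over(src[i + j], dst[i + j]);
                break;
        }
    }
    for (; i < count; ++i) dst[i] = src_over(src[i], dst[i]);  // tail
}
```

In this sketch the two uniform cases correspond to the bitmap_RGBA_8888_A_source_opaque and bitmap_RGBA_8888_A_source_transparent rows in the table above, which is where the original description reports the largest wins; mixed runs still pay for the classification on top of the blend.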

TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842

Review URL: https://codereview.chromium.org/873553003
gyp/opts.gypi
src/opts/SkBlitRow_opts_SSE4.cpp [deleted file]
src/opts/SkBlitRow_opts_SSE4.h
src/opts/SkBlitRow_opts_SSE4_asm.S [new file with mode: 0644]
src/opts/SkBlitRow_opts_SSE4_x64_asm.S [new file with mode: 0644]
src/opts/SkColor_opts_SSE2.h
src/opts/opts_check_x86.cpp