Really use SSE4 (and SSSE3) in SkBlurImage_SSE4
author mtklein <mtklein@chromium.org>
Wed, 6 May 2015 20:22:02 +0000 (13:22 -0700)
committer Commit bot <commit-bot@chromium.org>
Wed, 6 May 2015 20:22:02 +0000 (13:22 -0700)
commit e0cab96599764f7611de015f558a6b22162c3eda
tree f08d1532ab8414bccaccfe7b298ee22c65a9927a
parent 0ce02c3ac1ff412b14c275eaf918acd88b3f0774

We don't seem to be making good use of the available instruction set.
SSE4.1 gives us an easy way to unpack a pixel into an __m128i, and
SSSE3 gives us an easy way to do the reverse.
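
A minimal sketch of the two operations described above, not the code from
this change. The helper names (expand, repack, average) and the
SKETCH_SSE41 macro are hypothetical. It assumes GCC or Clang on x86 and that
every 32-bit lane holds a value in 0..255 when repacking, since the shuffle
truncates rather than saturates.

```cpp
#include <cstdint>
#include <smmintrin.h>  // SSE4.1 intrinsics; also pulls in the SSSE3 header

// Assumption: GCC/Clang-specific attribute, so no -msse4.1 flag is needed.
// Enabling SSE4.1 also enables SSSE3.
#define SKETCH_SSE41 __attribute__((target("sse4.1")))

// Unpack (SSE4.1): zero-extend the four 8-bit channels of one 32-bit pixel
// into four 32-bit lanes of an __m128i.
SKETCH_SSE41 static inline __m128i expand(uint32_t pixel) {
    return _mm_cvtepu8_epi32(_mm_cvtsi32_si128(static_cast<int>(pixel)));
}

// Repack (SSSE3): gather the low byte of each 32-bit lane into the low
// 32 bits. Mask bytes of -1 have the high bit set, which zeroes that byte.
SKETCH_SSE41 static inline uint32_t repack(__m128i channels) {
    const __m128i mask = _mm_setr_epi8(0, 4, 8, 12,
                                       -1, -1, -1, -1,
                                       -1, -1, -1, -1,
                                       -1, -1, -1, -1);
    return static_cast<uint32_t>(
        _mm_cvtsi128_si32(_mm_shuffle_epi8(channels, mask)));
}

// Example use: a per-channel average of two pixels. The widened lanes give
// the sum headroom that 8-bit channels would not have.
SKETCH_SSE41 static inline uint32_t average(uint32_t a, uint32_t b) {
    const __m128i sum = _mm_add_epi32(expand(a), expand(b));
    return repack(_mm_srli_epi32(sum, 1));
}
```

A blur accumulates many pixels per output, so the same headroom argument
applies: channels are widened once on load, summed in 32-bit lanes, scaled,
and narrowed once on store.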

This should produce bit-identical output and run about 10% faster.

BUG=skia:

Review URL: https://codereview.chromium.org/1123263003
src/opts/SkBlurImage_opts_SSE4.cpp