I have found a more efficient way of detecting 1 and 0 alpha in SSE2. In addition...
author herb <herb@google.com>
Mon, 23 May 2016 20:50:12 +0000 (13:50 -0700)
committer Commit bot <commit-bot@chromium.org>
Mon, 23 May 2016 20:50:12 +0000 (13:50 -0700)
commit 074b48ecb5ed8f9b25039477794437ae853d85c4
tree 3188dbfc96a1e64c52c22d6c383c82f3dfd31af6
parent 1d1559620058365e0de25636f1bcf07fcc071c3d
I have found a more efficient way of detecting alpha of 1 (fully opaque) and 0 (fully transparent) in SSE2. In addition, I found a stall on an execution unit for the lea instruction and rearranged the code to avoid it.
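
For readers unfamiliar with the technique, here is a minimal sketch of how four pixels can be tested at once for all-opaque or all-transparent alpha using only SSE2 intrinsics. It is an illustration under stated assumptions, not necessarily the code in this change: the helper names all_opaque and all_transparent are hypothetical, and the pixels are assumed to be 32-bit with alpha in the top byte of each lane.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helpers for illustration; the names are not taken from
// SkBlend_opts.h. Assumption: each 32-bit lane holds one pixel with its
// alpha in the most significant byte.
static const uint32_t kAlphaMask = 0xFF000000u;

// True when all four pixels have alpha == 0.
static inline bool all_transparent(__m128i pixels) {
    const __m128i mask = _mm_set1_epi32(static_cast<int>(kAlphaMask));
    // Keep only the alpha bytes, then compare each lane with zero.
    __m128i alphas = _mm_and_si128(pixels, mask);
    __m128i isZero = _mm_cmpeq_epi32(alphas, _mm_setzero_si128());
    // movemask gathers one bit per byte; 0xFFFF means every lane matched.
    return _mm_movemask_epi8(isZero) == 0xFFFF;
}

// True when all four pixels have alpha == 0xFF.
static inline bool all_opaque(__m128i pixels) {
    const __m128i mask = _mm_set1_epi32(static_cast<int>(kAlphaMask));
    // andnot computes (~pixels) & mask, so an opaque alpha becomes zero.
    __m128i inverted = _mm_andnot_si128(pixels, mask);
    __m128i isZero = _mm_cmpeq_epi32(inverted, _mm_setzero_si128());
    return _mm_movemask_epi8(isZero) == 0xFFFF;
}
```

A blend loop could use such checks to copy source pixels directly when all four are opaque and to skip the store entirely when all four are transparent, falling back to the full blend only for mixed alpha.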

Before
 1,362.01 LinearSrcOvericonstrip.pngVSkOptsSSE41
 2,132.54 LinearSrcOvericonstrip.pngVSkOptsDefault
 1,717.77 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
 3,525.14 LinearSrcOvericonstrip.pngVSkOptsTrivial
11,181.78 LinearSrcOvericonstrip.pngVSkOptsBruteForce
   644.77 LinearSrcOvermandrill_512.pngVSkOptsSSE41
   682.51 LinearSrcOvermandrill_512.pngVSkOptsDefault
 1,169.65 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
 2,486.45 LinearSrcOvermandrill_512.pngVSkOptsTrivial
11,635.94 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
   217.76 LinearSrcOverplane.pngVSkOptsSSE41
   437.09 LinearSrcOverplane.pngVSkOptsDefault
   275.91 LinearSrcOverplane.pngVSkOptsNonSimdCore
   481.70 LinearSrcOverplane.pngVSkOptsTrivial
 1,504.66 LinearSrcOverplane.pngVSkOptsBruteForce
   323.90 LinearSrcOverbaby_tux.pngVSkOptsSSE41
   497.49 LinearSrcOverbaby_tux.pngVSkOptsDefault
   456.08 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
   786.46 LinearSrcOverbaby_tux.pngVSkOptsTrivial
 2,554.65 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
   484.83 LinearSrcOveryellow_rose.pngVSkOptsSSE41
   821.86 LinearSrcOveryellow_rose.pngVSkOptsDefault
   655.37 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
 1,323.80 LinearSrcOveryellow_rose.pngVSkOptsTrivial
 5,802.61 LinearSrcOveryellow_rose.pngVSkOptsBruteForce

After changes to SSE2 and SSE4.1
  1,343.12 LinearSrcOvericonstrip.pngVSkOptsSSE41
  1,441.17 LinearSrcOvericonstrip.pngVSkOptsDefault
  1,679.97 LinearSrcOvericonstrip.pngVSkOptsNonSimdCore
  3,481.05 LinearSrcOvericonstrip.pngVSkOptsTrivial
 10,979.99 LinearSrcOvericonstrip.pngVSkOptsBruteForce
    574.17 LinearSrcOvermandrill_512.pngVSkOptsSSE41
    641.40 LinearSrcOvermandrill_512.pngVSkOptsDefault
  1,169.44 LinearSrcOvermandrill_512.pngVSkOptsNonSimdCore
  2,359.84 LinearSrcOvermandrill_512.pngVSkOptsTrivial
 12,106.02 LinearSrcOvermandrill_512.pngVSkOptsBruteForce
    209.95 LinearSrcOverplane.pngVSkOptsSSE41
    249.12 LinearSrcOverplane.pngVSkOptsDefault
    270.36 LinearSrcOverplane.pngVSkOptsNonSimdCore
    466.30 LinearSrcOverplane.pngVSkOptsTrivial
  1,431.14 LinearSrcOverplane.pngVSkOptsBruteForce
    309.70 LinearSrcOverbaby_tux.pngVSkOptsSSE41
    354.86 LinearSrcOverbaby_tux.pngVSkOptsDefault
    442.69 LinearSrcOverbaby_tux.pngVSkOptsNonSimdCore
    764.12 LinearSrcOverbaby_tux.pngVSkOptsTrivial
  2,756.16 LinearSrcOverbaby_tux.pngVSkOptsBruteForce
    457.70 LinearSrcOveryellow_rose.pngVSkOptsSSE41
    500.50 LinearSrcOveryellow_rose.pngVSkOptsDefault
    677.84 LinearSrcOveryellow_rose.pngVSkOptsNonSimdCore
  1,301.50 LinearSrcOveryellow_rose.pngVSkOptsTrivial
  5,786.40 LinearSrcOveryellow_rose.pngVSkOptsBruteForce

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1998373002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/1998373002
src/opts/SkBlend_opts.h