SSE2 implementation of S32A_D565_Opaque_Dither
author: commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>
Fri, 7 Mar 2014 13:24:42 +0000 (13:24 +0000)
committer: commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81>
Fri, 7 Mar 2014 13:24:42 +0000 (13:24 +0000)
commit: fe089b383aeae512ee39678a667c81867f730cd0
tree: d11efcf59f44438709ed169b3f8e508a507eec76
parent: 275804782f7b752cc9c25cb556db2a0cfc711dd9

When the benchmarks are run with the command-line options "--forceDither
true --forceBlend 1", almost all of the benchmarks that exercise
S32A_D565_Opaque_Dither improve by about 20%-70%.
Here are the data on an i7-3770:
benchmark                                         before    after  improvement
verts                                            4314.81  3627.64  15.93%
constXTile_MM_filter_trans                       1434.22   432.82  69.82%
constXTile_CC_filter_trans_scale                 1440.17   437.00  69.66%
constXTile_RR_filter_trans                       1436.96   431.93  69.94%
constXTile_MM_trans_scale                        1436.33   435.77  69.66%
constXTile_CC_trans                              1433.12   431.36  69.90%
constXTile_RR_trans_scale                        1436.13   436.06  69.64%
constXTile_MM_filter                             1411.55   408.06  71.09%
constXTile_CC_filter_scale                       1416.68   414.18  70.76%
constXTile_RR_filter                             1429.46   409.81  71.33%
constXTile_MM_scale                              1415.00   412.56  70.84%
constXTile_CC                                    1410.32   408.36  71.04%
constXTile_RR_scale                              1413.26   413.16  70.77%
repeatTile_4444_A                                1922.01   879.03  54.27%
repeatTile_4444_A                                1430.68   818.34  42.80%
repeatTile_4444_X                                1817.43   816.63  55.07%
maskshader                                       5911.09  5895.46   0.26%
gradient_create_alpha                               4.41     4.41  -0.15%
gradient_conical_clamp_3color                   35298.71 27574.34  21.88%
gradient_conical_clamp_hicolor                  35262.15 27538.99  21.90%
gradient_conical_clamp                          35276.21 27599.80  21.76%
gradient_radial2_mirror                         20846.74 12969.39  37.79%
gradient_radial2_clamp_hicolor                  21848.12 13967.57  36.07%
gradient_radial2_clamp                          21829.95 13978.57  35.97%
bitmap_4444_A_scale_rotate_bicubic                105.31    87.13  17.26%
bitmap_4444_A_scale_bicubic                        73.69    47.76  35.20%
bitmap_4444_update_scale_rotate_bilerp            125.65    87.86  30.08%
bitmap_4444_update_volatile_scale_rotate_bilerp   125.50    87.65  30.16%
bitmap_4444_scale_rotate_bilerp                   124.46    87.91  29.37%
bitmap_4444_A_scale_rotate_bilerp                 105.09    87.27  16.96%
bitmap_4444_update_scale_bilerp                   106.78    63.28  40.74%
bitmap_4444_update_volatile_scale_bilerp          106.66    63.66  40.32%
bitmap_4444_scale_bilerp                          106.70    63.19  40.78%
bitmap_4444_A_scale_bilerp                         83.05    62.25  25.04%
bitmap_a8                                          98.11    52.76  46.22%
bitmap_a8_A                                        98.24    52.85  46.20%

BUG=
R=mtklein@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/179443003

git-svn-id: http://skia.googlecode.com/svn/trunk@13699 2bbb7eff-a529-9590-31e7-b0007b416f81
src/opts/SkBlitRow_opts_SSE2.cpp
src/opts/SkBlitRow_opts_SSE2.h
src/opts/opts_check_SSE2.cpp