Xfermode: SSE2 implementation of overlay_modeproc
With the SSE2 optimization, Xfermode_Overlay bench time drops by about
35% on a desktop i7-3770 (roughly 36% for 8888 and 40% for 565, per the
numbers below). Here is the data:
before:
Xfermode_Overlay 8888: cmsecs = 44.17 565: cmsecs = 59.27
after:
Xfermode_Overlay 8888: cmsecs = 28.30 565: cmsecs = 35.84
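
For reference, the per-channel math that an SSE2 path would vectorize can be
sketched in scalar form as below. This is a minimal sketch of the usual
premultiplied overlay formula, not the code from this patch; the names
clamp_div255round and overlay_channel are hypothetical, and the rounding
used here is an assumption that may differ from the real implementation.

```cpp
#include <cstdint>

// Hypothetical helper: divide by 255 with rounding, clamped to [0, 255].
static inline int clamp_div255round(int prod) {
    if (prod <= 0) return 0;
    if (prod >= 255 * 255) return 255;
    return (prod + 127) / 255;
}

// Hypothetical scalar reference for one premultiplied color channel.
// sc/dc: source/destination channel, sa/da: source/destination alpha,
// all in [0, 255] with sc <= sa and dc <= da.
static inline int overlay_channel(int sc, int dc, int sa, int da) {
    // Contribution from the parts of src and dst that do not overlap.
    int tmp = sc * (255 - da) + dc * (255 - sa);
    int rc;
    if (2 * dc <= da) {
        // Dark destination: multiply.
        rc = 2 * sc * dc;
    } else {
        // Light destination: screen.
        rc = sa * da - 2 * (da - dc) * (sa - sc);
    }
    return clamp_div255round(rc + tmp);
}
```

A vectorized version would evaluate both branches for several channels at
once and select between them with a compare mask, which is the usual way to
remove the per-channel branch.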
BUG=skia:
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/232783002
git-svn-id: http://skia.googlecode.com/svn/trunk@14370 2bbb7eff-a529-9590-31e7-b0007b416f81