Xfermode: SSE2 implementation of multiply_modeproc
This patch implements basics for Xfermode SSE optimization. Based on
these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
implementation for other modes will come in future. With this patch
performance of Xfermode_Multiply will improve about 45%. Here are the
data on desktop i7-3770.
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=14006
Committed: http://code.google.com/p/skia/source/detail?r=14050
R=mtklein@google.com, robertphillips@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/
202903004
git-svn-id: http://skia.googlecode.com/svn/trunk@14107
2bbb7eff-a529-9590-31e7-
b0007b416f81