Add AVX2 version of ConvolveVertically
ConvolveVertically time is reduced about 60% using haswell cpu.
Nanobench results:
before after
bitmap_scale_filter_64_256 611us 302us
bitmap_scale_filter_80_90 101us 64.9us
bitmap_scale_filter_30_90 82.3us 51.4us
bitmap_scale_filter_10_90 73.6us 42.4us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=
2526733002
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Review-Url: https://codereview.chromium.org/
2526733002