SkHalfToFloat_01 / SkFloatToHalf_01
author mtklein <mtklein@chromium.org>
Thu, 11 Feb 2016 13:56:08 +0000 (05:56 -0800)
committer Commit bot <commit-bot@chromium.org>
Thu, 11 Feb 2016 13:56:08 +0000 (05:56 -0800)
commit 9ea11a4235b3e3521cc8bf914a27c2d0dc062db9
tree ed525d581fb1f9e7d097315071f3552acfe48775
parent 2c89bc153b5228c6316b5cfa070cad3d6da169ca

These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].
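
For illustration only, here is a minimal scalar sketch of what an f16->f32 conversion can look like once it is restricted to [0,1]. It is not the code in this change (which works on four values at a time); the name half_to_float_01 and the assert-based precondition are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar helper, not the SkHalfToFloat_01 from this change.
// Only valid for half bit patterns in [0x0000, 0x3C00], i.e. values in [0,1].
static float half_to_float_01(uint16_t h) {
    assert(h <= 0x3C00);  // no sign bit, no inf/NaN, nothing above 1.0

    if (h < 0x0400) {
        // Zero or denormal half: the exponent field is 0, so the value is
        // simply mantissa * 2^-24.
        return static_cast<float>(h) * (1.0f / 16777216.0f);
    }

    // Normal half: shift the exponent/mantissa boundary from bit 10 up to
    // bit 23 (a shift of 13), then re-bias the exponent from 15 to 127.
    uint32_t bits = (static_cast<uint32_t>(h) << 13)
                  + (static_cast<uint32_t>(127 - 15) << 23);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With the range limited this way there is no sign handling and no inf/NaN path, which is the kind of work the restriction to [0,1] lets a conversion skip.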

Both f16 and f32 denorms should work fine, modulo the usual ARMv7 NEON caveat that denorms are flushed to zero.

In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.
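
To make the truncation concrete, here is a scalar sketch of a truncating f32->f16 conversion for inputs in [0,1]. Again this is an illustration under assumptions, not the code in this change: float_to_half_01 is a hypothetical name, and the explicit denormal branch is just one straightforward way to write it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar helper, not the SkFloatToHalf_01 from this change.
// Only valid for inputs in [0,1]. Truncates instead of rounding.
static uint16_t float_to_half_01(float f) {
    assert(f >= 0.0f && f <= 1.0f);

    if (f < 1.0f / 16384.0f) {
        // Below 2^-14 the result is zero or a denormal half, which counts
        // whole multiples of 2^-24. The cast drops any fractional part.
        return static_cast<uint16_t>(f * 16777216.0f);
    }

    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);

    // Re-bias the exponent from 127 to 15, then shift out the low 13
    // mantissa bits. Those bits are discarded, not rounded.
    return static_cast<uint16_t>((bits - (112u << 23)) >> 13);
}
```

For example, 0.99999994f (the largest float below 1) maps to 0x3BFF here, whereas round-to-nearest would give 0x3C00 (exactly 1.0). The truncated result is always either the correctly rounded value or the representable value just below it.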

Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.

Getting close to _u16 performance:
    micros    bench
    261.13   xferu64_bw_1_opaque_u16
   1833.51   xferu64_bw_1_alpha_u16
   2762.32 ? xferu64_aa_1_opaque_u16
   3334.29   xferu64_aa_1_alpha_u16
    249.78   xferu64_bw_1_opaque_f16
   3383.18   xferu64_bw_1_alpha_f16
   4214.72   xferu64_aa_1_opaque_f16
   4701.19   xferu64_aa_1_alpha_f16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005

Review URL: https://codereview.chromium.org/1685133005
src/core/SkHalf.h
src/core/SkXfermodeU64.cpp
tests/Float16Test.cpp