Flush to zero when loading f16 with sse2/sse4.1.
author Mike Klein <mtklein@chromium.org>
Thu, 16 Feb 2017 11:21:54 +0000 (06:21 -0500)
committer Mike Klein <mtklein@chromium.org>
Thu, 16 Feb 2017 12:54:04 +0000 (12:54 +0000)
commit fa64774820cb42594d3f5bc2059953510f038636
tree 46b4bdee3eaa607c5455b03da125563016d83ab0
parent 8729e5bbf7c436fd7c7c13182adbbfb419f566b5

The multiply by 0x77800000 (2^112 as a float) is quite slow when the input
is denormalized. We don't mind flushing those values (magnitudes below
2^-14, roughly 6e-5) to zero.
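
To make the trick concrete, here is a minimal scalar sketch of a flush-to-zero half-to-float load. It is an illustration only, not the code from this change: the name half_to_float_ftz is hypothetical, sign handling is an assumption of the sketch, and infinities and NaNs are deliberately left unhandled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the Skia function): convert IEEE half bits to float,
// flushing denormal halfs to zero before the exponent fix-up multiply.
// Assumption: inputs are finite; inf/NaN are not treated specially here.
float half_to_float_ftz(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;  // move sign to bit 31
    uint32_t em   = h & 0x7fffu;                               // exponent + mantissa

    // A zero exponent field means zero or denormal (magnitude below 2^-14).
    // Flushing it here keeps a denormal float out of the multiply below.
    if (em < 0x0400u) {
        em = 0;
    }

    // Line up the half mantissa with the float mantissa. The 5-bit half
    // exponent now sits in the low bits of the 8-bit float exponent field.
    uint32_t bits = em << 13;
    float f;
    std::memcpy(&f, &bits, sizeof f);

    // 0x77800000 is 2^112 as a float; it rebases the exponent from
    // bias 15 (half) to bias 127 (float).
    const uint32_t magic = 0x77800000u;
    float scale;
    std::memcpy(&scale, &magic, sizeof scale);
    float result = f * scale;

    // Reattach the sign bit.
    uint32_t rbits;
    std::memcpy(&rbits, &result, sizeof rbits);
    rbits |= sign;
    std::memcpy(&result, &rbits, sizeof result);
    return result;
}
```

With this sketch, 0x3C00 loads as 1.0f, the smallest normal half 0x0400 loads as 2^-14, and any denormal such as 0x03FF loads as zero.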

Implement portable load_f16() / store_f16() too.
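
For the store direction, a portable conversion can run the same trick in reverse. The sketch below is an assumption about how such a store could look, not the code from this change: the name float_to_half_trunc is hypothetical, the mantissa is truncated rather than rounded, and inputs are assumed to be finite and within half range.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the Skia function): convert a float to IEEE half
// bits by rebasing the exponent with a multiply and dropping low mantissa bits.
// Assumptions: |f| <= 65504, no inf/NaN, truncation instead of rounding.
uint16_t float_to_half_trunc(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint32_t sign = (bits >> 16) & 0x8000u;   // sign moved to half position
    uint32_t abs_bits = bits & 0x7fffffffu;   // clear the sign bit

    float a;
    std::memcpy(&a, &abs_bits, sizeof a);

    // 0x07800000 is 2^-112 as a float; it rebases the exponent from
    // bias 127 (float) to bias 15 (half).
    const uint32_t magic = 0x07800000u;
    float scale;
    std::memcpy(&scale, &magic, sizeof scale);
    float scaled = a * scale;

    uint32_t sbits;
    std::memcpy(&sbits, &scaled, sizeof sbits);

    // Drop the 13 mantissa bits that a half cannot hold.
    return static_cast<uint16_t>(sign | (sbits >> 13));
}
```

Under these assumptions 1.0f stores as 0x3C00 and -2.0f stores as 0xC000.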

Change-Id: I125cff1c79ca71d9abe22ac7877136d86707cb56
Reviewed-on: https://skia-review.googlesource.com/8467
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
src/jumper/SkJumper.cpp
src/jumper/SkJumper.h
src/jumper/SkJumper_generated.h
src/jumper/SkJumper_stages.cpp