review.tizen.org Git - platform/upstream/opencv.git/commit

author	Paul E. Murphy <pmur@users.noreply.github.com>
	Tue, 20 Aug 2019 16:26:38 +0000 (11:26 -0500)
committer	Paul E. Murphy <pmur@users.noreply.github.com>
	Tue, 20 Aug 2019 20:28:36 +0000 (15:28 -0500)
commit	33fb253a66275abaa5060ef318c9a5cc87c5fd6e
tree	5727f458405f37eb46cbe0cf4a243ce2e9340c28
parent	7295983964044c280484469d73d6b8f59dbc5a4f

core: vectorize dotProd_32s

Use four independent FMA chains to accumulate the sum on
128-bit SIMD targets with FP64 support. On x86 this showed
about a 1.4x speedup.
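
A minimal scalar sketch of the multi-chain FMA idea, written in plain
C++ rather than with OpenCV's universal intrinsics. The function name
dot_32s_fma4 is hypothetical and this is not the code from
matmul.simd.hpp. Each scalar accumulator stands in for one chain; on a
128-bit FP64 target each chain would be a two-lane vector register.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the OpenCV implementation): dot product of two
// int32 arrays, accumulated in double through four independent FMA chains.
double dot_32s_fma4(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    // Four separate accumulators, so consecutive FMAs do not depend on
    // each other and their latencies can overlap.
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        // int32 -> double conversion is exact; fma rounds once per step.
        s0 = std::fma(static_cast<double>(a[i]),     static_cast<double>(b[i]),     s0);
        s1 = std::fma(static_cast<double>(a[i + 1]), static_cast<double>(b[i + 1]), s1);
        s2 = std::fma(static_cast<double>(a[i + 2]), static_cast<double>(b[i + 2]), s2);
        s3 = std::fma(static_cast<double>(a[i + 3]), static_cast<double>(b[i + 3]), s3);
    }

    // Combine the chains, then handle the remaining 0..3 elements.
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        sum = std::fma(static_cast<double>(a[i]), static_cast<double>(b[i]), sum);
    return sum;
}
```

Splitting the sum across several accumulators changes the order of the
floating-point additions, so results may differ in the last bits from a
single-accumulator loop.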

For PPC, do a full multiply (32x32->64b), convert to DP,
then accumulate. This may be slightly less precise for
some inputs, but it is about 1.5x faster than the FMA
approach above, which is itself about 1.5x faster, for
a ~2.5x speedup overall.
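
The PPC variant described above can be sketched in scalar C++ as
follows. The name dot_32s_mul64 is hypothetical and the real change
uses VSX intrinsics from intrin_vsx.hpp; this only illustrates the
order of operations. A likely reason for the precision note: a 64-bit
product can need more than the 53 significand bits of a double, so the
conversion may round before the addition rounds again, whereas an FMA
rounds only once.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the OpenCV/VSX implementation): full integer
// multiply first, then convert each product to double and accumulate.
double dot_32s_mul64(const std::int32_t* a, const std::int32_t* b, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        // 32x32 -> 64-bit product; exact, cannot overflow int64.
        const std::int64_t p =
            static_cast<std::int64_t>(a[i]) * static_cast<std::int64_t>(b[i]);

        // Conversion to double rounds when |p| exceeds 2^53.
        sum += static_cast<double>(p);
    }
    return sum;
}
```

For small inputs both approaches give the same result; they can differ
only when individual products are too large to be represented exactly
as a double.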

modules/core/include/opencv2/core/hal/intrin_vsx.hpp
modules/core/src/matmul.simd.hpp

Domain: Multimedia / Media Vision;
