further improvements in split & merge; started using non-temporal store instructions...
author Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
Thu, 26 Jul 2018 09:04:28 +0000 (12:04 +0300)
committer GitHub <noreply@github.com>
Thu, 26 Jul 2018 09:04:28 +0000 (12:04 +0300)
commit 43820d89b475dd32d11b441eaeef998dcd530752
tree fe1725b939c62df5cb319d84308e931c2fe137f7
parent 5336b9ad19bdbc0f4d913933074455554351b298
further improvements in split & merge; started using non-temporal store instructions (#12063)

* 1. changed static const __m128/256 to const __m128/256 to avoid weird instructions and calls inserted by the compiler.
2. added universal intrinsics that wrap MOVNTPS and similar non-temporal ("no cache") store instructions. v_store_interleave() and v_store() got the respective flags/overloaded variants.
3. rewrote split & merge to use the "no cache" store instructions. This resulted in a dramatic performance improvement when processing big arrays.
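
For illustration only, a minimal sketch of the idea behind items 2 and 3, written against raw SSE intrinsics rather than the universal intrinsics touched by this commit. The helper name merge2_nocache and the SKETCH_HAVE_SSE2 macro are hypothetical and are not part of OpenCV; the sketch assumes dst is 16-byte aligned and falls back to plain scalar stores when SSE2 is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // SSE2; also provides _mm_stream_ps and _mm_sfence
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical helper (not OpenCV API): interleaves two float planes
// into dst as a0 b0 a1 b1 ...; dst must be 16-byte aligned.
void merge2_nocache(const float* a, const float* b, float* dst, std::size_t n)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    for (; i + 4 <= n; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 lo = _mm_unpacklo_ps(va, vb);  // a0 b0 a1 b1
        __m128 hi = _mm_unpackhi_ps(va, vb);  // a2 b2 a3 b3
        // MOVNTPS: write the results without pulling dst into the cache
        _mm_stream_ps(dst + 2 * i, lo);
        _mm_stream_ps(dst + 2 * i + 4, hi);
    }
    // make the non-temporal stores globally visible before returning
    _mm_sfence();
#endif
    // scalar tail (and the whole loop when SSE2 is unavailable)
    for (; i < n; ++i)
    {
        dst[2 * i] = a[i];
        dst[2 * i + 1] = b[i];
    }
}
```

Non-temporal stores tend to pay off only when the destination is too large to be reused from cache soon, which matches the "big arrays" case above; for small buffers ordinary stores are usually the safer choice.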

* hopefully, fixed some test failures where 4-channel v_store_interleave() is used

* added missing implementation of the new universal intrinsics (v_store_aligned_nocache() etc.)

* fixed silly typo in the new intrinsics in intrin_vsx.hpp

* still trying to fix VSX compiler errors

* still trying to fix VSX compiler errors

* still trying to fix VSX compiler errors

* still trying to fix VSX compiler errors
modules/core/include/opencv2/core/hal/intrin.hpp
modules/core/include/opencv2/core/hal/intrin_avx.hpp
modules/core/include/opencv2/core/hal/intrin_cpp.hpp
modules/core/include/opencv2/core/hal/intrin_neon.hpp
modules/core/include/opencv2/core/hal/intrin_sse.hpp
modules/core/include/opencv2/core/hal/intrin_vsx.hpp
modules/core/src/mathfuncs_core.simd.hpp
modules/core/src/merge.cpp
modules/core/src/split.cpp