Merge pull request #19486 from fpetrogalli:dotprod_fast-3.4
* [hal][neon] Optimize the v_dotprod_fast intrinsics for aarch64.
On Armv8 in AArch64 execution mode, we can skip the sequence
v<op>_<ty>(vget_high_<ty>(x), vget_high_<ty>(y))
in favour of
v<op>_high_<ty>(x, y)
This has better changes for recent compilers to use less data movement
operations and better register allocation. See for example:
https://godbolt.org/z/bPq7vd
* [hal][neon] Fix build failure on armv7.
* [hal][neon] Address review comments in PR.
PR: https://github.com/opencv/opencv/pull/19486
* [hal][neon] Define macro to check for the AArch64 execution state of Armv8.
* [hal][neon] Fix macro definition for AArch64.
The fix is needed to prevent warnings when building for Armv7.