remap/arm: Adjust inline asm constraints
gcc10 can emit single-precision register names in inline asm if the
right operand modifier is not used on an operand that must be printed
as a double-precision register.
This results in the assembler rejecting the code:
/tmp/ccEG4QpI.s:646: Error: VFP/Neon double precision register expected -- `vtbl.8 d3,{d0,d1},s8'
/tmp/ccEG4QpI.s:678: Error: invalid instruction shape -- `vmul.f32 d0,d0,s8'
Therefore add the %P operand modifier to request double-precision
registers, since the 'w' constraint alone means the variable could be
stored in s0..s14, and GCC defaults to printing s0..s14.
Note that those registers also map to d0..d7.
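For illustration only, a minimal sketch of the pattern, assuming a
NEON-capable 32-bit ARM target. The helper name mul2_f32 and the
portable fallback branch are hypothetical and are not taken from the
patched file; they only show where %P goes relative to the 'w'
constraint:

```c
#if defined(__arm__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper for illustration; not code from the patched file.
 * Multiplies two 2-lane float vectors element-wise. */
void mul2_f32(const float a[2], const float b[2], float out[2])
{
#if defined(__arm__) && defined(__ARM_NEON)
    float32x2_t va = vld1_f32(a);
    float32x2_t vb = vld1_f32(b);
    float32x2_t vr;

    /* 'w' only says "some VFP/NEON register". The P modifier asks GCC
     * to print each operand as a D register, so the assembler sees
     * "vmul.f32 dX, dY, dZ" and not a mix of dN and sN names. */
    __asm__("vmul.f32 %P0, %P1, %P2"
            : "=w"(vr)
            : "w"(va), "w"(vb));

    vst1_f32(out, vr);
#else
    /* Portable fallback so the sketch also builds on non-ARM hosts. */
    out[0] = a[0] * b[0];
    out[1] = a[1] * b[1];
#endif
}
```

Without the P modifier, the same template written as
"vmul.f32 %0, %1, %2" is the form that can trigger the errors quoted
above with gcc10.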
The generated output is exactly the same as before with gcc9, and the
code now also compiles with gcc10.
This is not well documented in the GCC docs, and there is a ticket for that:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84343
Signed-off-by: Khem Raj <raj.khem@gmail.com>