[ARM] Fix vcvtb/t.f16 input liveness
The `vcvtb.f16.f32 Sd, Sn` (and vcvtt.f16.f32) instruction convert a f32
into a f16, writing either the top or bottom halves of the register.
That means that half of the input register Sd is used in the output.
This wasn't being modelled in the instructions, leading later analyses
to believe that the registers were dead where they were not, generating
invalid scheduling
Fix that be specifying the input Sda register for the instructions too,
allowing them to be set for cases like vector inserts. Most of the
changes are plumbing through the constraint string, cstr.
Differential Revision: https://reviews.llvm.org/D126118