AMDGPU: Add more flat scratch load and store tests for 8- and 16-bit types
Add tests for more complicated scratch load and store patterns.
The new tests cover:
- sign- and zero-extending loads of i8 and i16 to i32 into a 32-bit register
- D16 instructions that affect only the high or low 16 bits of a 32-bit
  register
- D16 sign- and zero-extending loads of i8 to i16 into the high or low 16 bits
  of a 32-bit register
- D16 loads of i16 into the high or low 16 bits of a 32-bit register
- D16 stores of i8 and i16 from the high 16 bits of a 32-bit register
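
For reference, the register effects the cases above exercise can be modeled
roughly as follows. This is only an illustrative sketch, not code from the
patch: the helper names (extend, load_to_32, d16_load, d16_hi_store) are
hypothetical, and it assumes a target where D16 loads preserve the untouched
half of the destination register.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def extend(value, from_bits, to_bits, signed):
    """Zero- or sign-extend the low from_bits of value to to_bits."""
    value &= (1 << from_bits) - 1
    if signed and value & (1 << (from_bits - 1)):
        value -= 1 << from_bits  # reinterpret as a negative number
    return value & ((1 << to_bits) - 1)


def load_to_32(mem_value, from_bits, signed):
    """Plain extending load: the whole 32-bit register is overwritten."""
    return extend(mem_value, from_bits, 32, signed)


def d16_load(reg, mem_value, from_bits, signed, hi):
    """D16 load: write one 16-bit half of reg, keep the other half.

    Assumption: the other half is preserved rather than zeroed.
    """
    half = extend(mem_value, from_bits, 16, signed)
    if hi:
        return ((half << 16) | (reg & MASK16)) & MASK32
    return (reg & 0xFFFF0000) | half


def d16_hi_store(reg, to_bits):
    """D16 hi store: value written to memory comes from bits [31:16]."""
    return (reg >> 16) & ((1 << to_bits) - 1)


if __name__ == "__main__":
    print(hex(load_to_32(0x80, 8, signed=True)))
    print(hex(d16_load(0xAAAABBBB, 0x80, 8, signed=True, hi=False)))
    print(hex(d16_load(0xAAAABBBB, 0x1234, 16, signed=False, hi=True)))
    print(hex(d16_hi_store(0x12345678, 8)))
```

In this model, a sign-extending i8 D16 load into the low half only changes
bits [15:0], while an i8 store from the high half writes bits [23:16].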
Differential Revision: https://reviews.llvm.org/D145081