[RISCV] Cost model for general case of single vector permute
The cost model did not account for the fact that a general single-source permute can be lowered to a vrgather plus the index vector it consumes.
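To make that lowering concrete, here is a minimal Python model of the vrgather.vv semantics (vd[i] = vs2[vs1[i]], or 0 when the index is at or beyond VLMAX). The helper names are hypothetical, not LLVM APIs, and VLMAX is simplified to the source length. It only illustrates why any single-source mask can be served by one gather once the mask is materialized as an index vector.

```python
# Hypothetical model (not an LLVM API) of RVV vrgather.vv semantics, used to
# show how an arbitrary single-source shuffle becomes one gather plus an
# index vector.
from typing import List, Sequence


def vrgather_vv(src: Sequence[int], indices: Sequence[int], vlmax: int) -> List[int]:
    """vd[i] = src[indices[i]] if the index is in range, else 0.

    Hardware indices are unsigned, so only the upper bound matters there;
    the lower-bound check just guards against Python's negative indexing.
    """
    return [src[idx] if 0 <= idx < vlmax else 0 for idx in indices]


def permute_via_gather(src: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Lower a single-source shuffle mask to a single gather.

    The mask itself is the index vector; for a constant mask this is the
    data that would need to be materialized (e.g. from the constant pool).
    Simplification: VLMAX is taken to be len(src).
    """
    return vrgather_vv(src, mask, vlmax=len(src))


if __name__ == "__main__":
    print(permute_via_gather([10, 20, 30, 40], [3, 1, 2, 0]))  # [40, 20, 30, 10]
```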
Two caveats to call out:
1) I did not model the difference between vrgather and vrgatherei16. As a result, the constant pool cost can be slightly understated on RV32. I don't think this matters in practice, but if someone disagrees, it would be easy to add.
2) Our current codegen for i8 vectors longer than 256 elements (256 is the largest size this change costs) has some room for improvement.
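Both caveats come down to index width: vrgather.vv reads its indices at SEW, while vrgatherei16.vv always reads 16-bit indices. The rough sketch below, with hypothetical helper names, shows what that means for the addressable range (why i8 tops out at 256 elements) and for the size of the index vector, i.e. the constant pool entry for a constant mask. It deliberately does not model LLVM's actual policy for choosing between the two forms, including the RV32 case noted in 1).

```python
# Hypothetical helpers (not LLVM APIs) illustrating how index element width
# bounds what a single gather can address and sizes the index vector.


def max_index(index_bits: int) -> int:
    """Largest unsigned index representable in an index_bits-wide element."""
    return (1 << index_bits) - 1


def gather_form(sew_bits: int, num_elts: int) -> str:
    """Pick the narrowest gather whose indices can address every element.

    Simplification: the real lowering may choose vrgatherei16 in other
    situations as well; that policy is not modeled here.
    """
    if num_elts - 1 <= max_index(sew_bits):
        return "vrgather.vv"
    if num_elts - 1 <= max_index(16):
        return "vrgatherei16.vv"
    raise ValueError("indices do not fit in 16 bits")


def index_vector_bytes(sew_bits: int, num_elts: int, form: str) -> int:
    """Bytes of index data needed for num_elts indices in the given form."""
    index_bits = 16 if form == "vrgatherei16.vv" else sew_bits
    return num_elts * index_bits // 8


if __name__ == "__main__":
    print(gather_form(8, 256))  # vrgather.vv: indices 0..255 fit in i8
    print(gather_form(8, 257))  # vrgatherei16.vv: index 256 needs 16 bits
    print(index_vector_bytes(64, 8, "vrgather.vv"))      # 64
    print(index_vector_bytes(64, 8, "vrgatherei16.vv"))  # 16
```

The last two lines show why ignoring the distinction skews the constant pool estimate: for eight i64 elements the index data is 64 bytes with SEW-wide indices but only 16 bytes with 16-bit indices.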
Differential Revision: https://reviews.llvm.org/D147000