iommu/arm-smmu-v3: Avoid back-to-back CMD_SYNC operations
Putting adjacent CMD_SYNCs into the command queue is nonsensical, but
can happen when multiple CPUs are inserting commands. Rather than leave
the poor old hardware to chew through these operations, we can instead
drop the subsequent SYNCs and poll for completion of the first. This
has been shown to improve IO performance under pressure, where the
number of SYNC operations reduces by about a third:
CMD_SYNCs reduced:
19542181
CMD_SYNCs total:
58098548 (include reduced)
CMDs total:
116197099 (TLBI:SYNC about 1:1)
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>