[X86][AMX] Lower tile copy instruction.
author Luo, Yuanke <yuanke.luo@intel.com>
Sat, 20 Feb 2021 07:05:07 +0000 (15:05 +0800)
committer Luo, Yuanke <yuanke.luo@intel.com>
Mon, 22 Feb 2021 23:49:42 +0000 (07:49 +0800)
commit 8f48ddd1935831979e1d7f37e47db532534b37c4
tree a5e56979733de4ae79ba717e11fb1118ff2e1154
parent e8617f2f1870022b7dd076bf43c7aaee30831197

Since there is no tile copy instruction, we lower a tile copy by
storing the source tile register to the stack and loading it from the
stack into the destination tile register. This needs an extra
general-purpose register (GR) to hold the stride and a stack slot to
hold the tile data. We run this pass after copy propagation so that
we don't miss copy optimizations, and before prolog/epilog insertion
so that we can still allocate the stack slot.
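
The store-then-load idea described above can be modeled in plain C++. This is only an illustrative sketch, not the code in X86LowerTileCopy.cpp: `Tile`, `tileStore`, `tileLoad` and `copyTileViaStack` are hypothetical names, the tile is assumed to have the maximum AMX shape (16 rows of 64 bytes), and the stride is assumed to be 64 bytes so that the rows are packed densely in the slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed maximum AMX tile shape: 16 rows of 64 bytes (1 KiB in total).
constexpr std::size_t kRows = 16;
constexpr std::size_t kBytesPerRow = 64;
constexpr std::size_t kTileBytes = kRows * kBytesPerRow;

// Hypothetical stand-in for a tile register.
struct Tile {
  std::array<std::uint8_t, kTileBytes> data{};
};

// Models a strided tile store: row r is written to base + r * stride.
void tileStore(const Tile &src, std::uint8_t *base, std::size_t stride) {
  assert(stride >= kBytesPerRow && "rows must not overlap in memory");
  for (std::size_t r = 0; r < kRows; ++r)
    std::memcpy(base + r * stride, src.data.data() + r * kBytesPerRow,
                kBytesPerRow);
}

// Models a strided tile load: row r is read from base + r * stride.
void tileLoad(Tile &dst, const std::uint8_t *base, std::size_t stride) {
  assert(stride >= kBytesPerRow && "rows must not overlap in memory");
  for (std::size_t r = 0; r < kRows; ++r)
    std::memcpy(dst.data.data() + r * kBytesPerRow, base + r * stride,
                kBytesPerRow);
}

// Copies a tile by going through a temporary buffer that plays the role
// of the stack slot; `stride` plays the role of the extra GR.
Tile copyTileViaStack(const Tile &src) {
  std::array<std::uint8_t, kTileBytes> slot{};  // stack slot for tile data
  const std::size_t stride = kBytesPerRow;      // assumed dense layout
  tileStore(src, slot.data(), stride);
  Tile dst;
  tileLoad(dst, slot.data(), stride);
  return dst;
}
```

In this model the slot is an ordinary local array. In the pass itself the slot has to be a real stack object, which is why the pass must run before prolog/epilog insertion fixes the frame layout.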

Differential Revision: https://reviews.llvm.org/D97112
llvm/lib/Target/X86/CMakeLists.txt
llvm/lib/Target/X86/X86.h
llvm/lib/Target/X86/X86LowerTileCopy.cpp [new file with mode: 0644]
llvm/lib/Target/X86/X86RegisterInfo.cpp
llvm/lib/Target/X86/X86TargetMachine.cpp
llvm/test/CodeGen/X86/AMX/amx-lower-tile-copy.ll [new file with mode: 0644]
llvm/test/CodeGen/X86/O0-pipeline.ll
llvm/test/CodeGen/X86/opt-pipeline.ll