From: Max Kazantsev
Date: Thu, 15 Jul 2021 09:40:34 +0000 (+0700)
Subject: [Test] We can benefit from pipelining of ymm load/stores
X-Git-Tag: llvmorg-14-init~1409
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=69a3acffdf1b3f5fc040aaeafc1c77588a607d1a;p=platform%2Fupstream%2Fllvm.git

[Test] We can benefit from pipelining of ymm load/stores

This patch demonstrates a scenario where we need to load/store a single
64-byte value, which is done with two ymm loads and two ymm stores in AVX.
The current codegen chooses the following sequence:

  load ymm0
  load ymm1
  store ymm1
  store ymm0

If we instead stored ymm0 before ymm1 (load ymm0, load ymm1, store ymm0,
store ymm1), the 2nd load and the 1st store could execute in parallel.
---

diff --git a/llvm/test/CodeGen/X86/ymm-ordering.ll b/llvm/test/CodeGen/X86/ymm-ordering.ll
new file mode 100644
index 0000000..65ebbc6
--- /dev/null
+++ b/llvm/test/CodeGen/X86/ymm-ordering.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-linux -mattr=+avx | FileCheck %s
+
+; TODO: If we stored ymm0 before ymm1, then we could execute the 2nd load and the 1st store in
+; parallel.
+define void @test_01(i8* %src, i8* %dest) {
+; CHECK-LABEL: test_01:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vmovups (%rdi), %ymm0
+; CHECK-NEXT:    vmovups 32(%rdi), %ymm1
+; CHECK-NEXT:    vmovups %ymm1, 32(%rsi)
+; CHECK-NEXT:    vmovups %ymm0, (%rsi)
+; CHECK-NEXT:    vzeroupper
+; CHECK-NEXT:    retq
+entry:
+  %read = bitcast i8* %src to <64 x i8>*
+  %value = load <64 x i8>, <64 x i8>* %read, align 1
+  %write = bitcast i8* %dest to <64 x i8>*
+  store <64 x i8> %value, <64 x i8>* %write, align 1
+  ret void
+}
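
For reference, the IR in @test_01 is just an unaligned 64-byte copy, so the codegen
above can also be reached from plain C source. A minimal sketch, assuming clang with
AVX enabled (the file name, function name, and flags are illustrative and not part of
the patch):

    /* copy64.c -- hypothetical reproducer; e.g. clang -O2 -mavx -S copy64.c.
     * With AVX enabled, a 64-byte memcpy is typically lowered to two 32-byte
     * ymm loads followed by two ymm stores, the sequence discussed above. */
    #include <string.h>

    void copy64(const char *src, char *dest) {
      memcpy(dest, src, 64); /* same access as the <64 x i8> load/store in @test_01 */
    }

The asm in the CHECK lines can be reproduced directly from the test via its RUN line:
llc < ymm-ordering.ll -mtriple=x86_64-linux -mattr=+avx.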