[llvm] r222331 - [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.

Hal Finkel hfinkel at anl.gov
Thu Nov 20 20:42:11 PST 2014


----- Original Message -----
> From: "Hao Liu" <Hao.Liu at arm.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Wednesday, November 19, 2014 12:39:53 AM
> Subject: [llvm] r222331 - [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
> 
> Author: haoliu
> Date: Wed Nov 19 00:39:53 2014
> New Revision: 222331
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=222331&view=rev
> Log:
> [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes
> on AArch64 backend.
> SeparateConstOffsetFromGEP can give more optimization opportunities
> related to GEPs, which benefits EarlyCSE
> and LICM. By enabling these passes we can have better address
> calculations and generate a better addressing
> mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious
> improvements on Cortex-A57.
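
To make the log's claim concrete, here is a rough C++ model of what splitting the constant offset out of a GEP buys. It is only a sketch using the layout numbers from the test_GEP_CSE case in the quoted test (96-byte %struct, second array element, fields 2 and 3); the names fieldAddr, variablePart and the k* constants are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

// Layout numbers taken from @test_GEP_CSE in the quoted test:
// %struct is { i32, i32, i32, i32, [20 x i32] }, i.e. 24 * 4 = 96 bytes.
constexpr std::int64_t kStructSize = 96;
constexpr std::int64_t kArraySize = 240 * kStructSize;  // [240 x %struct]

// Before the pass: each GEP is a single address computation. The leading
// index is fixed to 1 as in the test; `field` selects one of the four
// leading i32 fields (0..3).
constexpr std::int64_t fieldAddr(std::int64_t base, std::int64_t idx,
                                 std::int64_t field) {
  return base + 1 * kArraySize + idx * kStructSize + field * 4;
}

// After the pass: the address is a variable part plus a constant part.
// The variable part is identical for both field accesses, so a CSE pass
// can compute it once and reuse it.
constexpr std::int64_t variablePart(std::int64_t base, std::int64_t idx) {
  return base + idx * kStructSize;
}
constexpr std::int64_t kLibertiesOffset = kArraySize + 3 * 4;  // field 3
constexpr std::int64_t kOriginOffset = kArraySize + 2 * 4;     // field 2

// These constants match the immediates the test expects in the lowered IR.
static_assert(kLibertiesOffset == 23052, "offset of field 3");
static_assert(kOriginOffset == 23048, "offset of field 2");
static_assert(fieldAddr(4096, 7, 3) == variablePart(4096, 7) + kLibertiesOffset,
              "split form computes the same address");
```

Once the shared part is a single value, the remaining constant is small enough for the backend to fold into the load or store as an immediate offset where the addressing mode allows it.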

Mirrored in the PPC backend in r222504. Thanks for all of your work on this!

 -Hal

> 
> Reviewed in http://reviews.llvm.org/D5864.
> 
> Added:
>     llvm/trunk/test/CodeGen/AArch64/aarch64-gep-opt.ll
> Modified:
>     llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp
>     llvm/trunk/test/CodeGen/AArch64/arm64-addr-mode-folding.ll
>     llvm/trunk/test/CodeGen/AArch64/arm64-cse.ll
> 
> Modified: llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp?rev=222331&r1=222330&r2=222331&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp (original)
> +++ llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp Wed Nov 19 00:39:53 2014
> @@ -81,6 +81,11 @@ EnableA53Fix835769("aarch64-fix-cortex-a
>                  cl::desc("Work around Cortex-A53 erratum 835769"),
>                  cl::init(false));
>  
> +static cl::opt<bool>
> +EnableGEPOpt("aarch64-gep-opt", cl::Hidden,
> +             cl::desc("Enable optimizations on complex GEPs"),
> +             cl::init(true));
> +
>  extern "C" void LLVMInitializeAArch64Target() {
>    // Register the target.
>    RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);
> @@ -205,6 +210,19 @@ void AArch64PassConfig::addIRPasses() {
>      addPass(createCFGSimplificationPass());
>  
>    TargetPassConfig::addIRPasses();
> +
> +  if (TM->getOptLevel() == CodeGenOpt::Aggressive && EnableGEPOpt) {
> +    // Call SeparateConstOffsetFromGEP pass to extract constants within indices
> +    // and lower a GEP with multiple indices to either arithmetic operations or
> +    // multiple GEPs with single index.
> +    addPass(createSeparateConstOffsetFromGEPPass(TM, true));
> +    // Call EarlyCSE pass to find and remove subexpressions in the lowered
> +    // result.
> +    addPass(createEarlyCSEPass());
> +    // Do loop invariant code motion in case part of the lowered result is
> +    // invariant.
> +    addPass(createLICMPass());
> +  }
>  }
>  
>  // Pass Pipeline Configuration
> 
> Added: llvm/trunk/test/CodeGen/AArch64/aarch64-gep-opt.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/aarch64-gep-opt.ll?rev=222331&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/aarch64-gep-opt.ll (added)
> +++ llvm/trunk/test/CodeGen/AArch64/aarch64-gep-opt.ll Wed Nov 19 00:39:53 2014
> @@ -0,0 +1,163 @@
> +; RUN: llc -O3 -verify-machineinstrs %s -o - | FileCheck %s
> +; RUN: llc -O3 -print-after=codegenprepare -mcpu=cyclone < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
> +; RUN: llc -O3 -print-after=codegenprepare -mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
> +target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
> +target triple = "aarch64-linux-gnueabi"
> +
> +; Following test cases test enabling SeparateConstOffsetFromGEP pass in AArch64
> +; backend. If useAA() returns true, it will lower a GEP with multiple indices
> +; into GEPs with a single index, otherwise it will lower it into a
> +; "ptrtoint+arithmetics+inttoptr" form.
> +
> +%struct = type { i32, i32, i32, i32, [20 x i32] }
> +
> +; Check that when two complex GEPs are used in two basic blocks, LLVM can
> +; eliminate the common subexpression for the second use.
> +define void @test_GEP_CSE([240 x %struct]* %string, i32* %adj, i32 %lib, i64 %idxprom) {
> +  %liberties = getelementptr [240 x %struct]* %string, i64 1, i64 %idxprom, i32 3
> +  %1 = load i32* %liberties, align 4
> +  %cmp = icmp eq i32 %1, %lib
> +  br i1 %cmp, label %if.then, label %if.end
> +
> +if.then:                                          ; preds = %entry
> +  %origin = getelementptr [240 x %struct]* %string, i64 1, i64 %idxprom, i32 2
> +  %2 = load i32* %origin, align 4
> +  store i32 %2, i32* %adj, align 4
> +  br label %if.end
> +
> +if.end:                                           ; preds = %if.then, %entry
> +  ret void
> +}
> +
> +; CHECK-LABEL: test_GEP_CSE:
> +; CHECK: madd
> +; CHECK: ldr
> +; CHECK-NOT: madd
> +; CHECK:ldr
> +
> +; CHECK-NoAA-LABEL: @test_GEP_CSE(
> +; CHECK-NoAA: [[PTR0:%[a-zA-Z0-9]+]] = ptrtoint [240 x %struct]* %string to i64
> +; CHECK-NoAA: [[PTR1:%[a-zA-Z0-9]+]] = mul i64 %idxprom, 96
> +; CHECK-NoAA: [[PTR2:%[a-zA-Z0-9]+]] = add i64 [[PTR0]], [[PTR1]]
> +; CHECK-NoAA: add i64 [[PTR2]], 23052
> +; CHECK-NoAA: inttoptr
> +; CHECK-NoAA: if.then:
> +; CHECK-NoAA-NOT: ptrtoint
> +; CHECK-NoAA-NOT: mul
> +; CHECK-NoAA: add i64 [[PTR2]], 23048
> +; CHECK-NoAA: inttoptr
> +
> +; CHECK-UseAA-LABEL: @test_GEP_CSE(
> +; CHECK-UseAA: [[PTR0:%[a-zA-Z0-9]+]] = bitcast [240 x %struct]* %string to i8*
> +; CHECK-UseAA: [[IDX:%[a-zA-Z0-9]+]] = mul i64 %idxprom, 96
> +; CHECK-UseAA: [[PTR1:%[a-zA-Z0-9]+]] = getelementptr i8* [[PTR0]], i64 [[IDX]]
> +; CHECK-UseAA: getelementptr i8* [[PTR1]], i64 23052
> +; CHECK-UseAA: bitcast
> +; CHECK-UseAA: if.then:
> +; CHECK-UseAA: getelementptr i8* [[PTR1]], i64 23048
> +; CHECK-UseAA: bitcast
> +
> +%class.my = type { i32, [128 x i32], i32, [256 x %struct.pt]}
> +%struct.pt = type { %struct.point*, i32, i32 }
> +%struct.point = type { i32, i32 }
> +
> +; Check that when a GEP is used across two basic blocks, LLVM can sink the address
> +; calculation and code gen can generate a better addressing mode for the second
> +; use.
> +define void @test_GEP_across_BB(%class.my* %this, i64 %idx) {
> +  %1 = getelementptr %class.my* %this, i64 0, i32 3, i64 %idx, i32 1
> +  %2 = load i32* %1, align 4
> +  %3 = getelementptr %class.my* %this, i64 0, i32 3, i64 %idx, i32 2
> +  %4 = load i32* %3, align 4
> +  %5 = icmp eq i32 %2, %4
> +  br i1 %5, label %if.true, label %exit
> +
> +if.true:
> +  %6 = shl i32 %4, 1
> +  store i32 %6, i32* %3, align 4
> +  br label %exit
> +
> +exit:
> +  %7 = add nsw i32 %4, 1
> +  store i32 %7, i32* %1, align 4
> +  ret void
> +}
> +; CHECK-LABEL: test_GEP_across_BB:
> +; CHECK: ldr {{w[0-9]+}}, [{{x[0-9]+}}, #528]
> +; CHECK: ldr {{w[0-9]+}}, [{{x[0-9]+}}, #532]
> +; CHECK-NOT: add
> +; CHECK: str {{w[0-9]+}}, [{{x[0-9]+}}, #532]
> +; CHECK: str {{w[0-9]+}}, [{{x[0-9]+}}, #528]
> +
> +; CHECK-NoAA-LABEL: test_GEP_across_BB(
> +; CHECK-NoAA: add i64 [[TMP:%[a-zA-Z0-9]+]], 528
> +; CHECK-NoAA: add i64 [[TMP]], 532
> +; CHECK-NoAA: if.true:
> +; CHECK-NoAA: {{%sunk[a-zA-Z0-9]+}} = add i64 [[TMP]], 532
> +; CHECK-NoAA: exit:
> +; CHECK-NoAA: {{%sunk[a-zA-Z0-9]+}} = add i64 [[TMP]], 528
> +
> +; CHECK-UseAA-LABEL: test_GEP_across_BB(
> +; CHECK-UseAA: [[PTR0:%[a-zA-Z0-9]+]] = getelementptr
> +; CHECK-UseAA: getelementptr i8* [[PTR0]], i64 528
> +; CHECK-UseAA: getelementptr i8* [[PTR0]], i64 532
> +; CHECK-UseAA: if.true:
> +; CHECK-UseAA: {{%sunk[a-zA-Z0-9]+}} = getelementptr i8* [[PTR0]], i64 532
> +; CHECK-UseAA: exit:
> +; CHECK-UseAA: {{%sunk[a-zA-Z0-9]+}} = getelementptr i8* [[PTR0]], i64 528
> +
> +%struct.S = type { float, double }
> +@struct_array = global [1024 x %struct.S] zeroinitializer, align 16
> +
> +; The following two test cases check we can extract constants from indices of
> +; struct type.
> +; The constant offsets are from indices "i64 %idxprom" and "i32 1". As the
> +; alloc size of %struct.S is 16, and "i32 1" is the 2nd element whose field
> +; offset is 8, the total constant offset is (5 * 16 + 8) = 88.
> +define double* @test-struct_1(i32 %i) {
> +entry:
> +  %add = add nsw i32 %i, 5
> +  %idxprom = sext i32 %add to i64
> +  %p = getelementptr [1024 x %struct.S]* @struct_array, i64 0, i64 %idxprom, i32 1
> +  ret double* %p
> +}
> +; CHECK-NoAA-LABEL: @test-struct_1(
> +; CHECK-NoAA-NOT: getelementptr
> +; CHECK-NoAA: add i64 %{{[a-zA-Z0-9]+}}, 88
> +
> +; CHECK-UseAA-LABEL: @test-struct_1(
> +; CHECK-UseAA: getelementptr i8* %{{[a-zA-Z0-9]+}}, i64 88
> +
> +%struct3 = type { i64, i32 }
> +%struct2 = type { %struct3, i32 }
> +%struct1 = type { i64, %struct2 }
> +%struct0 = type { i32, i32, i64*, [100 x %struct1] }
> +
> +; The constant offsets are from indices "i32 3", "i64 %arrayidx" and "i32 1".
> +; "i32 3" is the 4th element whose field offset is 16. The alloc size of
> +; %struct1 is 32. "i32 1" is the 2nd element whose field offset is 8. So the
> +; total constant offset is 16 + (-2 * 32) + 8 = -40
> +define %struct2* @test-struct_2(%struct0* %ptr, i64 %idx) {
> +entry:
> +  %arrayidx = add nsw i64 %idx, -2
> +  %ptr2 = getelementptr %struct0* %ptr, i64 0, i32 3, i64 %arrayidx, i32 1
> +  ret %struct2* %ptr2
> +}
> +; CHECK-NoAA-LABEL: @test-struct_2(
> +; CHECK-NoAA-NOT: = getelementptr
> +; CHECK-NoAA: add i64 %{{[a-zA-Z0-9]+}}, -40
> +
> +; CHECK-UseAA-LABEL: @test-struct_2(
> +; CHECK-UseAA: getelementptr i8* %{{[a-zA-Z0-9]+}}, i64 -40
> +
> +; Test that when an index is the sum of two constants, the SeparateConstOffsetFromGEP
> +; pass does not generate an incorrect result.
> +define void @test_const_add([3 x i32]* %in) {
> +  %inc = add nsw i32 2, 1
> +  %idxprom = sext i32 %inc to i64
> +  %arrayidx = getelementptr [3 x i32]* %in, i64 %idxprom, i64 2
> +  store i32 0, i32* %arrayidx, align 4
> +  ret void
> +}
> +; CHECK-LABEL: test_const_add:
> +; CHECK: str wzr, [x0, #44]
> 
> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-addr-mode-folding.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-addr-mode-folding.ll?rev=222331&r1=222330&r2=222331&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/arm64-addr-mode-folding.ll (original)
> +++ llvm/trunk/test/CodeGen/AArch64/arm64-addr-mode-folding.ll Wed Nov 19 00:39:53 2014
> @@ -1,4 +1,4 @@
> -; RUN: llc -O3 -mtriple arm64-apple-ios3 %s -o - | FileCheck %s
> +; RUN: llc -O3 -mtriple arm64-apple-ios3 -aarch64-gep-opt=false %s -o - | FileCheck %s
>  ; <rdar://problem/13621857>
>  
>  @block = common global i8* null, align 8
> 
> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-cse.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-cse.ll?rev=222331&r1=222330&r2=222331&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/arm64-cse.ll (original)
> +++ llvm/trunk/test/CodeGen/AArch64/arm64-cse.ll Wed Nov 19 00:39:53 2014
> @@ -1,4 +1,4 @@
> -; RUN: llc -O3 < %s -aarch64-atomic-cfg-tidy=0 | FileCheck %s
> +; RUN: llc -O3 < %s -aarch64-atomic-cfg-tidy=0 -aarch64-gep-opt=false | FileCheck %s
>  target triple = "arm64-apple-ios"
>  
>  ; rdar://12462006
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
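
The constant offsets quoted in the test comments (88, -40, and the #44 in test_const_add) can be cross-checked with a few lines of C++. This is only a sketch: the C++ structs are hand-written mirrors of the IR types, their names are made up here, and the numbers assume an LP64 target where i64 and pointers are 8 bytes with 8-byte alignment, which is what the test's datalayout string describes.

```cpp
#include <cstddef>
#include <cstdint>

// Hand-written mirrors of the IR types in aarch64-gep-opt.ll.
// Field names are illustrative; only the layout matters.
struct S { float f; double d; };                      // %struct.S

struct Struct3 { std::int64_t a; std::int32_t b; };   // %struct3
struct Struct2 { Struct3 s; std::int32_t c; };        // %struct2
struct Struct1 { std::int64_t a; Struct2 s; };        // %struct1
struct Struct0 {                                      // %struct0
  std::int32_t a;
  std::int32_t b;
  std::int64_t *p;
  Struct1 arr[100];
};

// Convert sizes and offsets to a signed type so that negative constant
// indices (such as -2 in @test-struct_2) do not wrap around.
constexpr std::ptrdiff_t off(std::size_t n) {
  return static_cast<std::ptrdiff_t>(n);
}

// @test-struct_1: constant index 5 into the array, then field 1 of %struct.S.
constexpr std::ptrdiff_t kTestStruct1Offset =
    5 * off(sizeof(S)) + off(offsetof(S, d));

// @test-struct_2: field 3 of %struct0, constant index -2, then field 1 of
// %struct1.
constexpr std::ptrdiff_t kTestStruct2Offset =
    off(offsetof(Struct0, arr)) + (-2) * off(sizeof(Struct1)) +
    off(offsetof(Struct1, s));

// @test_const_add: index 2 + 1 = 3 over [3 x i32], then element 2.
constexpr std::ptrdiff_t kTestConstAddOffset =
    3 * off(sizeof(std::int32_t[3])) + 2 * off(sizeof(std::int32_t));

// These hold under the LP64 layout assumption stated above.
static_assert(kTestStruct1Offset == 88, "5 * 16 + 8");
static_assert(kTestStruct2Offset == -40, "16 + (-2 * 32) + 8");
static_assert(kTestConstAddOffset == 44, "3 * 12 + 2 * 4");
```

On a target with a different layout (for example a 32-bit ABI that aligns i64 to 4 bytes) the assertions would fail, which is expected: the test pins its own datalayout for the same reason.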

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


