[PATCH] D67281: [AArch64][SimplifyCFG] Add additional cost for instructions in mergeConditionalStoreToAddress

Fri Sep 6 14:37:24 PDT 2019

efriedma added a subscriber: lebedev.ri.
efriedma added a comment.

I'm trying to understand the issue you're seeing... I guess it comes down to something like the following?

Before:

          tst     x18, x1
          b.eq    .LBB0_10
  .LBB0_9:
          orr     x16, x16, x18
          add     w0, w0, #1
          str     xzr, [x13, #56]
  .LBB0_10:
          cbz     x11, .LBB0_7

After:

  and     x2, x18, x1
  orr     x3, x16, x18
  tst     x18, x1
  orr     x2, x2, x11
  cinc    w0, w0, ne
  csel    x16, x16, x3, eq
  cbz     x2, .LBB0_7

I agree, five instructions is probably too many to speculate to eliminate a store.  But the patch doesn't really reflect the actual costs here, which largely have to do with the PHI->select transform rather than the actual arithmetic instructions.

Here's the relevant IR after the store merging transform:

  if.then:                                          ; preds = %for.body12
    %or = or i64 %res_in7.046, %bit.044
    %inc = add nsw i32 %retval1.243, 1
    br label %if.end

  if.end:                                           ; preds = %if.then, %for.body12
    %retval1.3 = phi i32 [ %inc, %if.then ], [ %retval1.243, %for.body12 ]
    %res_in7.1 = phi i64 [ %or, %if.then ], [ %res_in7.046, %for.body12 ]
    br i1 %tobool14, label %for.inc, label %if.then15

  if.then15:                                        ; preds = %if.end
    br label %for.inc

  for.inc:                                          ; preds = %if.then15, %if.end
    %simplifycfg.merge = phi i8* [ null, %if.then15 ], [ null, %if.end ]
    %4 = xor i1 %tobool, true
    %5 = xor i1 %tobool14, true
    %6 = or i1 %4, %5
    br i1 %6, label %7, label %8

  7:                                                ; preds = %for.inc
    store i8* %simplifycfg.merge, i8** %proc, align 8
    br label %8

  8:                                                ; preds = %for.inc, %7
    %inc18 = add nuw nsw i32 %j.047, 1
    %shl = shl i64 %bit.044, 1
    %exitcond = icmp eq i32 %inc18, 64
    br i1 %exitcond, label %for.cond.cleanup11, label %for.body12

Some of the branches collapse; at this point, we've basically traded a store for a branch which is likely predictable, assuming we turn the "or i1" back into a branch.   That's maybe okay, I guess?  But then we decide "if.then" is small enough to "predicate" it at the IR level, and we never reverse that decision when it turns out that doesn't simplify anything.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:179
+    case Instruction::And:
+    case Instruction::Xor:
+      return true;
----------------
This makes no sense; integer logic should be cheap.

================
Comment at: llvm/test/Transforms/SimplifyCFG/AArch64/check-instr-cost-for-folding.ll:2
+; RUN: opt < %s -mtriple=aarch64-linux-gnu -simplifycfg -check-speculation-cost=true -S >%t
+; RUN: FileCheck %s < %t
+; ModuleID = 'do_select.c'
----------------
This testcase is way too complicated given the change you're making.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67281/new/

https://reviews.llvm.org/D67281