[llvm-dev] Strange behaviour of post-legalising optimisations(?)

Wed Jun 5 08:58:01 PDT 2019

I come across a situation that I am having a hard time to understand. 

When I compile the following code :

char *tst( char *dest, const char *src, unsigned int len )
{
  for (int i=0 ; i<len ; i++) {
    dest[i] = src[i];
  } 
return dest;
}

Clang generates this for the ‘for’ body:

for.body:                                         ; preds = %for.cond
  %arrayidx = getelementptr inbounds i8, i8* %src, i16 %i.0
  %0 = load i8, i8* %arrayidx, align 1, !tbaa !2
  %arrayidx1 = getelementptr inbounds i8, i8* %dest, i16 %i.0
  store i8 %0, i8* %arrayidx1, align 1, !tbaa !2
  %inc = add nuw nsw i16 %i.0, 1
  br label %for.cond

This gets converted into this by llc:

Creating new node: t2: i16,ch = CopyFromReg t0, Register:i16 %3
Creating new node: t4: i16,ch = CopyFromReg t0, Register:i16 %0
Creating new node: t5: i16 = add t2, t4
Creating constant: t6: i16 = Constant<0>
Creating new node: t7: i16 = undef
Creating new node: t8: i8,ch = load<(load 1 from %ir.scevgep1, !tbaa !2)> t0, t5, undef:i16
Creating new node: t10: i16,ch = CopyFromReg t0, Register:i16 %2
Creating new node: t11: i16 = add t10, t4
Creating new node: t12: ch = store<(store 1 into %ir.scevgep, !tbaa !2)> t8:1, t8, t11, undef:i16
Creating constant: t13: i16 = Constant<1>
Creating new node: t14: i16 = add nuw nsw t4, Constant:i16<1>
Creating new node: t16: ch = CopyToReg t0, Register:i16 %1, t14
Creating new node: t17: ch = TokenFactor t16, t12
Creating new node: t19: ch = br t17, BasicBlock:ch<for.cond 0x10983ff38>
Initial selection DAG: %bb.3 'tst:for.body'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
  t4: i16,ch = CopyFromReg t0, Register:i16 %0
  t6: i16 = Constant<0>
      t2: i16,ch = CopyFromReg t0, Register:i16 %3
    t5: i16 = add t2, t4
  t8: i8,ch = load<(load 1 from %ir.scevgep1, !tbaa !2)> t0, t5, undef:i16
        t14: i16 = add nuw nsw t4, Constant:i16<1>
      t16: ch = CopyToReg t0, Register:i16 %1, t14
          t10: i16,ch = CopyFromReg t0, Register:i16 %2
        t11: i16 = add t10, t4
      t12: ch = store<(store 1 into %ir.scevgep, !tbaa !2)> t8:1, t8, t11, undef:i16
    t17: ch = TokenFactor t16, t12
  t19: ch = br t17, BasicBlock:ch<for.cond 0x10983ff38>

This is ok: 
The array indexes get replaced by ‘add’ instructions (see t5 and t11) that later on can be folded into appropriate load/store addressing modes.
So far so good.

However, if I replace the original code by this:

char *tst( char *dest, const char *src, unsigned int len )
{
  if ( dest  ) {
    for (int i=0 ; i<len ; i++) {
      dest[i] = src[i];
    }
  }

return dest;
}

I get IDENTICAL output from Clang for the ‘for’ body

But totally different code from llc:

Total amount of phi nodes to update: 0
Creating new node: t2: i16,ch = CopyFromReg t0, Register:i16 %1
Creating constant: t3: i16 = Constant<0>
Creating new node: t4: i16 = undef
Creating new node: t5: i8,ch = load<(load 1 from %ir.lsr.iv1, !tbaa !2)> t0, t2, undef:i16
Creating new node: t7: i16,ch = CopyFromReg t0, Register:i16 %2
Creating new node: t8: ch = store<(store 1 into %ir.lsr.iv, !tbaa !2)> t5:1, t5, t7, undef:i16
Creating constant: t9: i16 = Constant<1>
Creating new node: t10: i16 = add t7, Constant:i16<1>
Creating new node: t12: ch = CopyToReg t0, Register:i16 %3, t10
Creating new node: t13: i16 = add t2, Constant:i16<1>
Creating new node: t15: ch = CopyToReg t0, Register:i16 %4, t13
Creating new node: t17: i16,ch = CopyFromReg t0, Register:i16 %0
Creating constant: t18: i16 = Constant<-1>
Creating new node: t19: i16 = add t17, Constant:i16<-1>
Creating new node: t21: ch = CopyToReg t0, Register:i16 %5, t19
Creating new node: t22: ch = TokenFactor t12, t15, t21, t8
Creating new node: t24: ch = br t22, BasicBlock:ch<for.cond 0x10985e000>
Initial selection DAG: %bb.3 'tst:for.body'
SelectionDAG has 25 nodes:
  t0: ch = EntryToken
  t2: i16,ch = CopyFromReg t0, Register:i16 %1
  t3: i16 = Constant<0>
  t5: i8,ch = load<(load 1 from %ir.lsr.iv1, !tbaa !2)> t0, t2, undef:i16
  t7: i16,ch = CopyFromReg t0, Register:i16 %2
        t10: i16 = add t7, Constant:i16<1>
      t12: ch = CopyToReg t0, Register:i16 %3, t10
        t13: i16 = add t2, Constant:i16<1>
      t15: ch = CopyToReg t0, Register:i16 %4, t13
          t17: i16,ch = CopyFromReg t0, Register:i16 %0
        t19: i16 = add t17, Constant:i16<-1>
      t21: ch = CopyToReg t0, Register:i16 %5, t19
      t8: ch = store<(store 1 into %ir.lsr.iv, !tbaa !2)> t5:1, t5, t7, undef:i16
    t22: ch = TokenFactor t12, t15, t21, t8
  t24: ch = br t22, BasicBlock:ch<for.cond 0x10985e000>

Now, instead of using array indexes with ‘add’ instructions (which can be folded later on into load/store addressing modes), LLVM choses to directly increment the pointers!. This is suboptimal -at least on my architecture- because at this point it’s impossible to select optimal addressing modes. Instead, the pointers must be explicitly incremented with additional instructions as llvm dictates.

Any ideas about why this happens?. Particularly, what could cause this change of behaviour just by adding an ‘if’ before the ‘for’?  Note that the IR code for the for ‘for’ body is IDENTICAL in both cases, so this is definitely a LLVM thing. 

Joan Lluch

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190605/1e86f68f/attachment.html>