[PATCH] D140665: [SelectionDAG][RISCV][X86][AArch64][AMDGPU][PowerPC] Improve SimplifyDemandedBits for SHL with NUW/NSW flags.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 27 10:07:37 PST 2022


craig.topper added a comment.

In D140665#4017570 <https://reviews.llvm.org/D140665#4017570>, @lebedev.ri wrote:

> In D140665#4017550 <https://reviews.llvm.org/D140665#4017550>, @spatel wrote:
>
>> It's a bug if we are both propagating flags and not accounting for them - we've definitely caught cases like that in IR.
>>
>> I agree that it seems like the better choice would be to ignore/drop flags (the alternative is that flags could penalize optimization). But it's also possible that changing it could cause missed folds because we lost information by dropping flags. We probably just need to experiment and see what falls out from it.
>
> I'm not sure what this is trying to fix originally,
> but i think the test changes clearly show that we should be going in the opposite direction.

I'll try to fill in the original problem. See the test in aext-to-zext.ll

We initially have this DAG:

  Initial selection DAG: %bb.0 'read:entry'                                        
  SelectionDAG has 19 nodes:                                                       
    t0: ch,glue = EntryToken                                                       
    t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
    t3: i64 = Constant<0>                                                          
    t11: i16 = Constant<8>                                                         
                t8: i64 = add nuw t2, Constant:i64<1>                              
              t9: i8,ch = load<(load (s8) from %ir.arrayidx1)> t0, t8, undef:i64   
            t10: i16 = zero_extend t9                                              
          t13: i16 = shl nuw t10, Constant:i64<8>                                  
            t5: i8,ch = load<(load (s8) from %ir.adr)> t0, t2, undef:i64           
          t6: i16 = zero_extend t5                                                 
        t14: i16 = or t13, t6                                                      
      t15: i64 = zero_extend t14                                                   
    t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15                            
    t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1

Partway through the first DAGCombine, the t10 zero_extend is replaced by an any_extend because the extended bits aren't demanded: its only user is the shl by 8, which shifts bits 15:8 of t10 out of the i16 result, and the t15 zero_extend only reads bits 15:0 of the or.

The extends get combined with the loads, leaving this DAG:

  Optimized lowered selection DAG: %bb.0 'read:entry'                              
  SelectionDAG has 15 nodes:                                                       
    t0: ch,glue = EntryToken                                                       
    t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
              t8: i64 = add nuw t2, Constant:i64<1>                                
            t20: i16,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
          t13: i16 = shl nuw t20, Constant:i64<8>                                  
          t21: i16,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
        t14: i16 = or t13, t21                                                     
      t15: i64 = zero_extend t14                                                   
    t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15                            
    t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1

We type legalize and optimize to this:

  SelectionDAG has 16 nodes:                                                       
    t0: ch,glue = EntryToken                                                       
    t2: i64,ch = CopyFromReg t0, Register:i64 %0                                   
              t8: i64 = add nuw t2, Constant:i64<1>                                
            t22: i64,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
          t23: i64 = shl nuw t22, Constant:i64<8>                                  
          t24: i64,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
        t25: i64 = or t23, t24                                                     
      t27: i64 = and t25, Constant:i64<65535>                                      
    t17: ch,glue = CopyToReg t0, Register:i64 $x10, t27                            
    t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1 

The t27 'and' would be unnecessary if we had two zextloads instead of a zextload and an aextload.

This patch happens to prevent the original zero_extend from becoming an any_extend, which solves the problem, but I don't think it's robust.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140665/new/

https://reviews.llvm.org/D140665


