[PATCH] D45748: [RISCV] Add peepholes for Global Address lowering patterns

Wed May 23 23:42:41 PDT 2018

asb accepted this revision.
asb added a comment.
This revision is now accepted and ready to land.

The point I was trying to make is that even with these peepholes, there are trivial cases that aren't caught, such as the example I shared:

  @foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0], align 2                                     

  define i32 @main() nounwind {                                                                                   
  entry:                                                                                                          
    %0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]* @foo, i32 0, i32 4), align 2                
    %cmp = icmp eq i16 %0, 140                                                                                    
    br i1 %cmp, label %if.end, label %if.then                                                                     

  if.then:                                                                                                        
    tail call void @abort()                                                                                       
    unreachable                                                                                                   

  if.end:                                                                                                         
    ret i32 0                                                                                                     
  }                                                                                                               

  declare void @abort()

This generates 3 instructions for the global access when you'd obviously prefer to use two:

  lui     a0, %hi(foo)
  addi    a0, a0, %lo(foo)
  lhu     a0, 8(a0)

I proposed one way of adding a peephole, but don't see a way of differentiating between the case above and more complex cases where the peephole is counter-productive (such as in the longer IR sample I shared). Possible the solution would be to have an IR pass perform common subexpression elimination on the getelementptr calculations - possibly such a pass already exists.

Anyway, is it really important? Probably not going to register in code profiling and probably a tiny impact on code size. I mainly raise it as I know we will all be poring over assembly output to look for missed optimisation opportunities and cases like the above stick out like a sore thumb. It's also the sort of thing compiler users do, and then confidently conclude that the compiler must be dumb.

I'm happy to land this patch as-is. It generates good code in a wide variety of cases (more so than many other backends). But please add the above example to your test file with a note about the missed optimisation opportunity.

https://reviews.llvm.org/D45748