[PATCH] D45748: [RISCV] Separate base from offset in lowerGlobalAddress

Thu May 17 07:26:45 PDT 2018

asb added a comment.

In https://reviews.llvm.org/D45748#1101638, @sabuasal wrote:

> In https://reviews.llvm.org/D45748#1100079, @asb wrote:
>
> > I looked at this in some more detail. I have a few very rough statistics on the effect of the various peepholes on the GCC torture suite. I don't claim this is representative of real code, but of course global accesses are highly codebase dependent anyway. Stats are just on the total static instruction count, i.e. it doesn't take into account if instructions are moved from hot to cold basic blocks or vice versa.
> >
> > Improvement of your proposed peephole over your patch with the peephole disabled:
> >
> > - 26 files have reduced number of instructions
> > - 10 have an increased number of instructions
> > - 6 have changed output but the same static instruction count
> >
> >   If you then add my proposed peephole on top of that, then you see the following changes (baseline = this patch including your peephole)
> > - 23 files have reduced number of instructions
> > - 11 have increased number of instructions
> > - 2 have changed output but the same static instruction count
>
>
> So you are saying your addition hurts size, right?

No, the baseline is this patch and its combines, so it reduces instruction count in a range of cases. Hard to say how representative that is, but that's true of all these discussions!

>> If it were doable, the ideal starting point would be the following:
>> 
>> - If the global base has only a single reference across the whole function, or every reference has the same offset then combine the global with offset
>> - If the global base has multiple references with different offsets, then never combine the global with the offset
> 
> This is exactly what I am doing in my peephole:
> 
> - Catch a tail node (RISCV::ADDI or RISCV::ADD)
> - Look at the operands of the node t detect a global address lowering sequence (LUI  %hi followed by and ADDI %lo)
> - Look at the other operands of the tail node to catch the and reconstruct the offset. (it can be just an operand if the tail is an ADDI, or you might have to reconstruct from an LUI +  ADDI pair if the offset is big or just an LUI)
> - Merge the offset back into the GlobalAddress if the global address only had one use. We can trust that at this point because ALL the blocks have RISCV target nodes.

Sorry for the confusion. I was being quite precise in saying referring to "references to a global base" rather than discussing uses of a particular globaladdress node. As the recent example I posted shows, there are cases where there are multiple references to the same global base across a function but this isn't reflected in multiple uses of the same node. As I say, your peephole seems thorough and well written, there's just no way to capture this case with it. Maybe if someone really wanted to tackle that issue it would be done as an IR transformation, to ensure a common globaladdress reference is reused.

>> I think the above two rules would be sensible regardless of whether you are linking statically or e.g. dynamic linking with PIC. Unfortunately we don't have access to that information at the point we'd like to make a decision about combining an offset. A particular lui+addi of a global may have only a single use, but that global may be referenced many times in the function. New TargetGlobalAddress nodes are introduced each time rather than reusing them. In these cases, MachineCSE can remove redundant instructions as long as different offsets aren't folded into the global. Unless we can force the reuse of identical TargetGlobalAddress nodes we can't easily write a peephole pass that respects the two rules proposed above. This input shows the problem well:
>> 
>>   @a = internal unnamed_addr global [4 x i32] zeroinitializer, align 4
>>   
>>   ; Function Attrs: noreturn nounwind
>>   define dso_local i32 @main() local_unnamed_addr #0 {
>>   entry:
>>     %0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 0), align 4
>>     %cmp = icmp eq i32 %0, 0
>>     br i1 %cmp, label %if.end, label %if.then
>>   
>>   if.then:                                          ; preds = %entry
>>     tail call void @abort() #3
>>     unreachable
>>   
>>   if.end:                                           ; preds = %entry
>>     %1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 1), align 4
>>     %cmp1 = icmp eq i32 %1, 3
>>     br i1 %cmp1, label %if.end3, label %if.then2
>>   
>>   if.then2:                                         ; preds = %if.end
>>     tail call void @abort() #3
>>     unreachable
>>   
>>   if.end3:                                          ; preds = %if.end
>>     %2 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 2), align 4
>>     %cmp4 = icmp eq i32 %2, 2
>>     br i1 %cmp4, label %if.end6, label %if.then5
>>   
>>   if.then5:                                         ; preds = %if.end3
>>     tail call void @abort() #3
>>     unreachable
>>   
>>   if.end6:                                          ; preds = %if.end3
>>     %3 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 3), align 4
>>     %cmp7 = icmp eq i32 %3, 1
>>     br i1 %cmp7, label %if.end9, label %if.then8
>>   
>>   if.then8:                                         ; preds = %if.end6
>>     tail call void @abort() #3
>>     unreachable
>>   
>>   if.end9:                                          ; preds = %if.end6
>>     tail call void @exit(i32 0) #3
>>     unreachable
>>   }
>>   
>>   declare void @abort()
>>   
>>   declare void @exit(i32)
> 
> Let me take a closer look at this test case. But I think SelectionDAG builder should be able 
>  to merge the TargetNodes if they refer to the same address.
> 
> Update: I looked at this test case, here is the ASM we are getting with this patch (separate + peepholes I added). It appears that the lowered target was actually reused (we only have one lui).

I'm sure that's not what I was seeing, let me take another look to see if I was doing something dumb.

https://reviews.llvm.org/D45748