[LLVMdev] RFC: Missing canonicalization in LLVM

Wed Jan 21 14:18:33 PST 2015

On Wed, Jan 21, 2015 at 2:16 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> So, we've run into some test cases which are pretty alarming.
>
> When inlining code along different paths we can end up with this IR:
>
> define void @f(float* %value, i8* %b) {
> entry:
>   %0 = load float* %value, align 4
>   %1 = bitcast i8* %b to float*
>   store float %0, float* %1, align 1
>   ret void
> }
>
> define void @g(float* %value, i8* %b) {
> entry:
>   %0 = bitcast float* %value to i32*
>   %1 = load i32* %0, align 4
>   %2 = bitcast i8* %b to i32*
>   store i32 %1, i32* %2, align 1
>   ret void
> }
>
> Now, I don't really care one way or the other about these two IR inputs,
> but it's pretty concerning that we get these two equivalent bits of code
> and nothing canonicalizes to one or the other.
>
> So, the naive first blush approach here would be to canonicalize on the
> first -- it has fewer instructions after all -- but I don't think that's
> the right approach for two reasons:
>
> 1) It will be a *very* narrow canonicalization that only works with overly
> specific sets of cast pointers.
> 2) It doesn't effectively move us toward an optimizer that treats pointers
> with different pointee types indistinguishably. I continue to think that
> some day we should get rid of pointee types entirely.
>
> To see why #1 and #2 are problematic, assume another round of inlining
> took place and we suddenly had the following IR:
>
>
And the missing IR example:

define void @f(i8* %value, i8* %b) {
entry:
  %0 = bitcast i8* %value to float*
  %1 = load float* %0, align 4
  %2 = bitcast i8* %b to float*
  store float %1, float* %2, align 1
  ret void
}

define void @g(i8* %value, i8* %b) {
entry:
  %0 = bitcast i8* %value to i32*
  %1 = load i32* %0, align 4
  %2 = bitcast i8* %b to i32*
  store i32 %1, i32* %2, align 1
  ret void
}

>
> AFAICT, these two functions are still equivalent, and we still don't have
> a good canonicalization story.
>
> What seems like the obvious, important, and missing canonicalization is
> this: when a loaded value is *only* used by storing it back into memory,
> we should canonicalize the type of that *value* (ignoring the pointer
> types) to a single value type, and today we don't.
>
> So, the only really suitable type for this kind of stuff is 'iN' where N
> matches the number of bits loaded or stored.
>
> I have this change implemented. It is trivial and unsurprising. However,
> the effects of this are impossible to predict so I wanted to make sure it
> made sense to others. Essentially, I expect random and hard to track down
> performance fluctuations across the board. Some things may get better,
> others may get worse, and they will probably all be bugs elsewhere in the
> stack.
>
> So, thoughts?
> -Chandler
>
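
To make the proposed rule concrete, here is a minimal sketch of the type
selection as a toy model in Python. It is not the actual patch and does not
use any LLVM API: the TYPE_BITS table, the function name
canonical_copy_type, and the representation of uses as a list of opcode
names are all assumptions made for illustration.

```python
# Toy model only: bit widths for a few first-class types (hypothetical table,
# not taken from LLVM's DataLayout).
TYPE_BITS = {
    "i8": 8, "i16": 16, "i32": 32, "i64": 64,
    "float": 32, "double": 64,
}


def canonical_copy_type(value_type, uses):
    """Pick the type a loaded value should have under the proposed rule.

    `uses` lists the opcode names of every user of the loaded value.
    If the value is *only* stored back to memory, it is just bits in
    transit, so return 'iN' with N equal to the loaded width.
    Otherwise keep the original type, since some user cares about it.
    """
    if uses and all(use == "store" for use in uses):
        return "i%d" % TYPE_BITS[value_type]
    return value_type


# Both @f (float) and @g (i32) from the examples converge on i32.
f_type = canonical_copy_type("float", ["store"])
g_type = canonical_copy_type("i32", ["store"])

# A float that also feeds arithmetic keeps its type.
mixed_type = canonical_copy_type("float", ["store", "fadd"])
```

Under this sketch both @f and @g end up loading and storing i32, so the two
functions become textually identical regardless of which pointer types the
inliner happened to leave behind.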