[llvm-dev] Wrong code bug after GVN/PRE?

Fri Jan 13 10:31:26 PST 2017

On 1/13/2017 12:31 AM, Mikael Holmén via llvm-dev wrote:
> Hi,
>
> I've stumbled upon a case where I think gvn does a bad (wrong) 
> optimization. It's a bit messy to debug though so I'm not sure if I 
> should just write a PR about it a let someone who knows the code look 
> at it instead.
>
> Anyway, for the bug to trigger I need to run the following passes in 
> the same opt invocation:
>  -sroa -instcombine -simplifycfg -instcombine -gvn
>
> The problem seems to be that GVN::PerformLoadPRE triggers and I see a
>
> GVN REMOVING PRE LOAD:   %_tmp79 = load i16, i16* %_tmp78, align 2
>
> printout.
>
> If I instead first run
>
>  -sroa -instcombine -simplifycfg -instcombine
>
> and then
>
>  -gvn
>
> I don't get the
>
> GVN REMOVING PRE LOAD
>
> printout, and the resulting code looks ok to me.
>
> Is this expected? Should the output differ in these two cases?
>
>
> The input to gvn looks the same when running all passes and just gvn, 
> but I see a difference in how GVN::processNonLocalLoad(LoadInst *LI) 
> behaves:
>
> I get different results from
>
>   // Step 1: Find the non-local dependencies of the load.
>   LoadDepVect Deps;
>   MD->getNonLocalPointerDependency(LI, Deps);
>
> So we get different results from MemoryDependenceResults when invoking 
> gvn alone or after a bunch of other passes.
>
> Is this expected or not?
>
>
> I tried to dig into 
> MemoryDependenceResults::getNonLocalPointerDependency but I have to 
> say I'm quite lost.
>
> I ran some git bisect and as far as I can tell it starts going wrong 
> with commit
>
>  [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
>
> 2016-09-01, but that commit doesn't change GVN/MemoryDependenceResults 
> so I suppose it just changes the input to GVN so the problem suddenly 
> surfaces.
>
> Finally, the problem that I see is:
>
> In the input we have something like
>
>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>     {
>         lb[step1] = step1 + 7;
>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>
>         switch(lb[step1]) {
>         case 7:    CVAL_VERIFY(step1 == 0);
>             CVAL_VERIFY(ub[step1] == 10);
>                         __label(511);
>             break;
>         case 8:    CVAL_VERIFY(step1 == 1);
>             CVAL_VERIFY(ub[step1] == 100);
>                         __label(512);
>             break;
>         case 9:    CVAL_VERIFY(step1 == 2);
>             CVAL_VERIFY(ub[step1] == 1000);
>                         __label(513);
>             break;
>         default:;
>         }
>
>                 [...]
>     }
>
>     int i, j;
>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>     {
>         CVAL_VERIFY(ub[i] == j);
>     }
>
> So we have two loops. In the first one the array ub is written (low 
> index first, high index last), and then in the second loop the content 
> of ub is checked (low index first, high index last).
>
> After gvn this is changed into something like
>
>         int tmp;
>     for (int step1 = 0; step1 < LOOP_AMOUNT; step1++)
>     {
>         lb[step1] = step1 + 7;
>         ub[step1] = (step1 == 0 ? 10: ub[step1-1]*10);
>
>         switch(lb[step1]) {
>         case 7:    CVAL_VERIFY(step1 == 0);
>                         tmp = ub[step1];
>             CVAL_VERIFY(tmp == 10);
>                         __label(511);
>             break;
>         case 8:    CVAL_VERIFY(step1 == 1);
>                         tmp = ub[step1];
>             CVAL_VERIFY(tmp == 100);
>                         __label(512);
>             break;
>         case 9:    CVAL_VERIFY(step1 == 2);
>                         tmp = ub[step1];
>             CVAL_VERIFY(tmp == 1000);
>                         __label(513);
>             break;
>         default:;
>         }
>
>                 [...]
>     }
>
>     int i, j;
>     for ( j = 10, i = 0; j < 101; j=j*10, i++ )
>     {
>         CVAL_VERIFY((i == 0 ? tmp : ub[i]) == j);
>     }
>
> Note the introduced variable tmp.
>
> Also note that with this rewrite, after the first loop tmp will 
> contain the value of the element at the highest index in ub, and then 
> we use tmp in the first round of the second loop, but there we expect 
> the value of the lowest index element in ub.
>
> Any ideas? Should I file a PR? 

Ooh, wow, that's nasty: it looks like PRE is somehow deciding that 
ub[step1] and ub[0] are equivalent.  Please file a bug.

The reason you're having trouble splitting the "opt" invocation is that 
you aren't passing in "-preserve-ll-uselistorder"; it usually doesn't 
matter, but occasionally you'll run into an optimization which is 
sensitive to it.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project