Wrong value for sext i1 true in BasicAliasAnalysis

Tue Dec 23 11:43:38 PST 2014

----- Original Message -----
> From: "Nicholas White" <n.j.white at gmail.com>
> To: "Andrew Trick" <atrick at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Yin Ma" <yinma at codeaurora.org>, "llvm commits" <llvm-commits at cs.uiuc.edu>,
> "Arnold Schwaighofer" <aschwaighofer at apple.com>
> Sent: Monday, December 22, 2014 6:15:34 PM
> Subject: Re: Wrong value for sext i1 true in BasicAliasAnalysis
> 
> Hi Andy, Hal -
> 
> 
> > If you're only handling extension to a pointer-width, only one
> > extension can possible matter. I think handling extensions and
> > truncations to/from arbitrary width would be way too complicated.
> 
> 
> I ended up having to do this due to cases like this:
> 
> define void @test_zext_sext_amounts(i8* %mem, i8 %num) {
> %sext.1 = sext i8 %num to i16
> %sext.zext.1 = zext i16 %sext.1 to i64
> %sext.2 = sext i8 %num to i32
> %sext.zext.2 = zext i32 %sext.2 to i64
> %a = getelementptr inbounds i8* %mem, i64 %sext.zext.1
> %b = getelementptr inbounds i8* %mem, i64 %sext.zext.2
> ret void
> }
> 

It is not clear to me that this is a problem worth solving. We want BasicAA to give good answers on optimized IR. Expecting it to work just as well on un-optimized IR is not reasonable. If I run your function through opt -O3 I get this:

define void @test_zext_sext_amounts(i8* %mem, i8 %num) #0 {
  %sext.1 = sext i8 %num to i64
  %sext.zext.1 = and i64 %sext.1, 65535
  %sext.zext.2 = and i64 %sext.1, 4294967295
  %a = getelementptr inbounds i8* %mem, i64 %sext.zext.1
  %b = getelementptr inbounds i8* %mem, i64 %sext.zext.2
  ...
}

and this seems fairly general. sext(zext(y)) is not interesting because the sign bit is known to be zero, sext(sext(y)) and zext(zext(y)) are also not interesting, so only the zext(sext(y)) case is non-trivial, but the interesting path seems to have a maximum depth of two. Also, I'd concentrate on handling the IR in its optimized canonical form (which uses the 'and' instruction as above) not the zext(sext(y)) form.

> 
> The current tip returns MustAlias, a false positive, as it just looks

With TOT today, I get 'PartialAlias' for (%a <-> %b) with both your original IR and the optimized variant. I get 'MayAlias' if I run without a datalayout specified. How are you getting a MustAlias?

Thanks again,
Hal

> at the most recent extension (EK_SignExt) without regard to the
> "extension path" taken from %num to %a and %b. In this case, %a and
> %b should only PartialAlias - and I believe the complexity of the
> current state of http://reviews.llvm.org/D6682 (+277, -49 lines,
> constant runtime overhead as the SmallVector is at most length 6) is
> worth it as false positives sometimes trigger downstream
> optimisations that can result in miscompilations? If there's a
> better way to keep track of the "extension path" than a SmallVector
> of std::pairs I'd gladly use it though! Better yet, if there's a way
> of avoiding false positives without tracking all this state I'd
> definitely use it.
> 
> > I also don't understand, for example, why it is correct to truncate
> > and re-extend the already-computed Offset and Scale from the
> > caller.
> > Offset += Const->getValue().zextOrSelf(Offset.getBitWidth());
> 
> I still need to put more comments in here - this change is mostly to
> address Hal's comment. The current version of the path doesn't
> down-size the Offset and Scale APInts before passing them into the
> recursive GetLinearExpression calls - so to avoid the APInt
> arithmetic failing assertions I've used zextOrSelf (which I should
> really change to trunc, as it's always downsizing) to ensure it's
> downsized to the correct width. It's not clear whether doing it this
> way is clearer (or whether there's an even better way of doing this)
> - the BinaryOperator logic is more confusing but the SExtInst /
> ZExtInst logic is clearer.
> 
> > as this would cover the remaining cases.
> 
> I've covered this by adding an extra heuristic to aliasGEP (starting
> line 1189 in the patched file) so I don't have to pollute
> GetIndexDifference and GetLinearExpression with variable-valued
> offsets. This seems cleaner to me, but the change is definitely
> first-draft quality code (although I've tried to explain what it's
> attempting in the comments). Do you think it's a viable solution?
> 
> Thanks -
> 
> Nick
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory