[llvm-dev] inttoptr and noalias returns
    Nuno Lopes via llvm-dev 
    llvm-dev at lists.llvm.org
       
    Fri Apr 16 11:20:05 PDT 2021
    
    
  
As far as I know there's no one working on this stuff.
 
The plan is to restrict the "inttoptr(ptrtoint(x))->x" optimization to the
safe cases. And then make alias analysis less conservative when dealing with
inttoptr (instead of always giving up). Plus make sure optimizations don't
produce inttoptr (this bit has improved *a lot* in the past year).
There isn't major pressure to fix all this, as inttoptr is not very common
(in most C/C++ programs, at least).
 
That said, this year we got many Google Summer of Code applications from
students wanting to fix bugs, so this one actually came to mind. Let's
see how many slots we get.
 
Nuno
 
 
From: Joseph Tremoulet <jotrem at microsoft.com> 
Sent: 16 April 2021 18:20
To: Nuno Lopes <nunoplopes at sapo.pt>
Cc: llvm-dev at lists.llvm.org
Subject: RE: [EXTERNAL] RE: [llvm-dev] inttoptr and noalias returns
 
Thank you, that's super helpful.
 
Do we have plans/proposals for how to avoid this?  I gather it involves
stopping optimization from blindly folding inttoptr(ptrtoint(x))->x, and
you've mentioned making sure we avoid introducing inttoptr+ptrtoint
unnecessarily, is the plan just those things?  You also mentioned augmenting
inttoptr w/ inbounds and that the folding is "correct in some cases", does
that mean we have plans (or at least a desire) to formulate refined rules
for when the folding is possible that will allow more optimization?  The
slide deck discusses separating the notions of logical pointers vs physical
pointers, is that something that anybody is working on changing the code to
model?
 
For context, I'm working on a front-end for a language that I can't change
whose type system doesn't really distinguish between native pointers and
pointer-sized integers, so I can only do so much to avoid creating
ptrtoint/inttoptr in the first place.  But there are some constructs we can
recognize as allocations, I'm hoping to be able to iteratively re-type
arithmetic trees rooted at those as pointers/geps.
 
 
Thanks,
-Joseph
 
 
 
From: Nuno Lopes <nunoplopes at sapo.pt>
Sent: Friday, April 16, 2021 12:48 PM
To: Joseph Tremoulet <jotrem at microsoft.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXTERNAL] RE: [llvm-dev] inttoptr and noalias returns
 
That's a very long story.. let me try to summarize why you can't do
"inttoptr(ptrtoint(x)) -> x" *blindly* (it's correct in some cases).
 
1.	Integers carry no provenance information and can be interchanged at
will.
This means that this transformation is always correct:
if (x == y)
  f(x);
=>
if (x == y)
  f(y);
 
 
2.	There are many pointers whose addresses are equal. For example:
char p[n];
char q[m];
char r[3];
 
We may have that (int)(p+n) == (int)q == (int)(r-m).
Even if we focus just on inbounds pointers (because we e.g. augmented
inttoptr to have an inbounds tag), we can still have 2 pointers with the
same address: p+n & q.
 
 
3.	Pointers have provenance. You can't use p+n to change memory of q.
p[n] = 42; // UB, to make the life of the alias analysis easier
 
 
If we put the three pieces together, we get that it's possible for the
compiler to swap a ptrtoint of a dereferenceable pointer with something else
and then if you blindly fold the ptrtoint/inttoptr chain, you get a wrong
pointer. Something like:
 
int x = (int)(p + n);
int y = (int)q;
if (x == y)
  *(char*)y = 3;

=>  (GVN)

int x = (int)(p + n);
int y = (int)q;
if (x == y)
  *(char*)x = 3;

=>  (invalid fold of inttoptr/ptrtoint chain)

int x = (int)(p + n);
int y = (int)q;
if (x == y)
  *(p+n) = 3;

=>  (access OOB is UB)

int x = (int)(p + n);
int y = (int)q;
if (x == y)
  UB;
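
A small C++ sketch of points 1 and 2 above may make them more concrete. The
helper names (toInt, endOfFirstEqualsSecond, storeViaRoundTrip) are made up
for illustration and are not LLVM APIs. The sketch only compares addresses as
integers and round-trips a pointer through its own address, which is the case
where the fold is fine. It deliberately never writes through p+n.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the C++ analogue of ptrtoint.
inline std::uintptr_t toInt(const char *ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr);
}

// True when the one-past-the-end address of p (of length n) equals the
// address of q. The answer depends on how the compiler lays out the two
// objects, so neither outcome can be relied upon.
inline bool endOfFirstEqualsSecond(const char *p, std::size_t n,
                                   const char *q) {
  return toInt(p + n) == toInt(q);
}

// Stores through a pointer rebuilt from q's own address, i.e.
// inttoptr(ptrtoint(q)). The integer really came from q here, so the
// rebuilt pointer may be used to access q.
inline void storeViaRoundTrip(char *q, char value) {
  std::uintptr_t y = reinterpret_cast<std::uintptr_t>(q); // ptrtoint
  char *rebuilt = reinterpret_cast<char *>(y);            // inttoptr
  assert(rebuilt == q);
  *rebuilt = value;
}
```

If endOfFirstEqualsSecond(p, n, q) happens to return true, the two integers
are interchangeable, yet only a pointer derived from q may be used to write
to q.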
 
 
I've a few slides on LLVM's AA that may help:
https://web.ist.utl.pt/nuno.lopes/pres/pointers-eurollvm18.pptx
 
Nuno
 
 
From: Joseph Tremoulet <jotrem at microsoft.com>
Sent: 16 April 2021 15:48
To: Nuno Lopes <nunoplopes at sapo.pt>
Cc: llvm-dev at lists.llvm.org
Subject: RE: [EXTERNAL] RE: [llvm-dev] inttoptr and noalias returns
 
> otherwise relies on the incorrect transformation "inttoptr(ptrtoint(x)) ->
> x"
 
Could you point me to an example/explanation of why that transformation is
incorrect?  It's not clear to me from the LangRef.
 
 
> A big issue with LLVM's static analysis is caching, since everything is
> done lazily. If you want to add something more expensive to BasicAA, you
> need to make sure that information is cached somehow to avoid recomputing it
> a thousand times. Compilation time is quite sensitive to the performance of
> BasicAA.
 
The IsCapturedCache in AAQueryInfo is pretty close to what I'm after, but I
don't really understand why the code in aliasCheck is using the weaker
isEscapeSource as opposed to !isNonEscapingLocalObject.
 
 
> escape pointers just like ptrtoint does
 
Yeah, so if the rule for ptrtoint is simply that the source pointer escapes,
then I'd think we could take advantage of the flip side of that and
isEscapeSource could return true for inttoptr, without needing expensive
analysis/caching.  But I know this can be a subtle area, so I'm not sure
that's the rule.  I see [1] that Ryan Taylor added a discussion of it to the
agenda for the February AA conference call; I'm curious what the outcome of
that was.
 
 
Thanks,
-Joseph
 
1 - https://lists.llvm.org/pipermail/llvm-dev/2021-February/148671.html
 
 
From: Nuno Lopes <nunoplopes at sapo.pt>
Sent: Thursday, April 15, 2021 6:37 AM
To: Joseph Tremoulet <jotrem at microsoft.com>
Cc: llvm-dev at lists.llvm.org
Subject: [EXTERNAL] RE: [llvm-dev] inttoptr and noalias returns
 
You're right that LLVM is very conservative in handling inttoptr, and
otherwise relies on the incorrect transformation "inttoptr(ptrtoint(x)) ->
x" to get rid of inttoptr.
I agree the store should have been removed in your second example. I guess
inttoptr is not frequently used, and even less after a bunch of fixes to
prevent optimizers from creating new ones.
 
BasicAA is quite basic, but that's all LLVM has. The other alias analyses in
git are either not useful in practice, unfinished or buggy. (I haven't
looked into that dir in a couple of years, so things may have changed in the
meantime).
A big issue with LLVM's static analysis is caching, since everything is done
lazily. If you want to add something more expensive to BasicAA, you need to
make sure that information is cached somehow to avoid recomputing it a
thousand times. Compilation time is quite sensitive to the performance of
BasicAA.
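
To illustrate the caching point in isolation, here is a toy memoization
wrapper in C++. CachedQuery and its integer value IDs are invented for this
sketch. BasicAA's real caches (for example the IsCapturedCache in AAQueryInfo
mentioned elsewhere in this thread) are keyed on llvm::Value pointers and are
structured differently.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Toy memoization of an expensive yes/no query keyed by a value ID.
class CachedQuery {
public:
  explicit CachedQuery(std::function<bool(int)> compute)
      : compute_(std::move(compute)) {
    assert(compute_ && "compute must be callable");
  }

  // Returns the cached answer if present; otherwise computes and stores it.
  bool query(int valueId) {
    auto it = cache_.find(valueId);
    if (it != cache_.end())
      return it->second;
    ++misses_;
    bool result = compute_(valueId);
    cache_.emplace(valueId, result);
    return result;
  }

  // Number of times the expensive computation actually ran.
  unsigned misses() const { return misses_; }

private:
  std::function<bool(int)> compute_;
  std::unordered_map<int, bool> cache_;
  unsigned misses_ = 0;
};
```

Repeated queries for the same ID hit the map instead of re-running the
computation, which is the property an expensive addition to BasicAA would
need.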
 
Although there's no definitive semantics for pointer comparisons yet
(soonish I hope), LLVM's behavior implies that pointer comparisons indeed
escape pointers just like ptrtoint does (except if the two pointers being
compared are inbounds and point to the same object, and therefore the
comparison is only around offsets and thus their address doesn't leak).
 
Nuno
 
 
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Joseph Tremoulet via
llvm-dev
Sent: 02 April 2021 19:26
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] inttoptr and noalias returns
 
Stepping through this in the debugger, I see this code in BasicAliasAnalysis
doing a check similar to the sort that I would have expected to prove
NoAlias for this case, but it doesn't, because (ISTM) it's being pretty
conservative:
 
    // If one pointer is the result of a call/invoke or load and the other is a
    // non-escaping local object within the same function, then we know the
    // object couldn't escape to a point where the call could return it.
    //
    // Note that if the pointers are in different functions, there are a
    // variety of complications. A call with a nocapture argument may still
    // temporary store the nocapture argument's value in a temporary memory
    // location if that memory location doesn't escape. Or it may pass a
    // nocapture value to other functions as long as they don't capture it.
    if (isEscapeSource(O1) &&
        isNonEscapingLocalObject(O2, &AAQI.IsCapturedCache))
      return NoAlias;
    if (isEscapeSource(O2) &&
        isNonEscapingLocalObject(O1, &AAQI.IsCapturedCache))
      return NoAlias;
  }
 
and
 
/// Returns true if the pointer is one which would have been considered an
/// escape by isNonEscapingLocalObject.
static bool isEscapeSource(const Value *V) {
  if (isa<CallBase>(V))
    return true;
 
  if (isa<Argument>(V))
    return true;
 
  // The load case works because isNonEscapingLocalObject considers all
  // stores to be escapes (it passes true for the StoreCaptures argument
  // to PointerMayBeCaptured).
  if (isa<LoadInst>(V))
    return true;
 
  return false;
}
 
Since we have to look through all the uses of O1/O2 (including certain
transitive ones) to prove isNonEscapingLocalObject, an
expensive-but-more-precise analysis could just check if O2/O1 is in that
set, IIUC.  I get why BasicAliasAnalysis isn't the right place to do that.
Is there some more expensive alias analysis that I could opt into and get
that sort of check?
 
Alternatively, following the logic that we can assume isEscapeSource for
loads because we treat stores as escapes, is there room to assume
isEscapeSource for inttoptrs because we treat ptrtoints, and things that let
you subtly intify pointers such as certain compares, as escapes?
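
As a way to pin down what is being asked, here is a toy C++ model of that
idea. It does not use LLVM's classes: ValueKind, AliasSketch,
isEscapeSourceSketch and aliasSketch are hypothetical names, and the flag
only marks where the proposed rule would plug in. Whether the rule is sound
is exactly the open question, so this is a sketch of the proposal rather than
a description of what LLVM does.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the llvm::Value kinds that matter.
enum class ValueKind { Call, Argument, Load, IntToPtr, Alloca };

enum class AliasSketch { NoAlias, MayAlias };

// Mirrors the quoted isEscapeSource; the flag enables the proposed rule.
inline bool isEscapeSourceSketch(ValueKind kind, bool intToPtrIsEscapeSource) {
  switch (kind) {
  case ValueKind::Call:
  case ValueKind::Argument:
  case ValueKind::Load:
    return true;
  case ValueKind::IntToPtr:
    return intToPtrIsEscapeSource; // the proposal under discussion
  case ValueKind::Alloca:
    return false;
  }
  return false;
}

// One direction of the quoted check: escape source vs. non-escaping local.
inline AliasSketch aliasSketch(ValueKind o1, bool o2IsNonEscapingLocal,
                               bool intToPtrIsEscapeSource) {
  if (isEscapeSourceSketch(o1, intToPtrIsEscapeSource) && o2IsNonEscapingLocal)
    return AliasSketch::NoAlias;
  return AliasSketch::MayAlias;
}

// Small self-check: the answer for inttoptr changes only with the flag.
inline void checkSketch() {
  assert(aliasSketch(ValueKind::IntToPtr, true, false) == AliasSketch::MayAlias);
  assert(aliasSketch(ValueKind::IntToPtr, true, true) == AliasSketch::NoAlias);
}
```

In terms of test2 from the first message, %p (the inttoptr) would play the
role of o1, and %q would presumably be the non-escaping local object.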
 
Thanks,
-Joseph
 
 
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Joseph Tremoulet via
llvm-dev
Sent: Wednesday, March 31, 2021 2:09 PM
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXTERNAL] [llvm-dev] inttoptr and noalias returns
 
Hi,
 
I'm a bit confused about the interaction between inttoptr and noalias, and
would like to better understand our model.
 
I realize there's a bunch of in-flight work around restrict modeling and
that ptrtoint was on the agenda for last week's AA call.  I'm interested in
understanding both the current state and the thinking/plans for the future.
I'm happy for pointers to anywhere this is already written down. I
didn't find it from skimming the AA call minutes or the mailing list
archive, but I could easily have overlooked it, and I haven't really dug into
the set of restrict patches (nor do I know where to get a list of those).
 
I also realize that with aliasing questions there can always be a gap
between what the model says we can infer and how aggressive analyses and
optimizations are about actually making use of those inferences.  Again I'm
interested in both answers (and happy for either).
 
In the LangRef section on pointer aliasing rules [1], I see
 
An integer constant other than zero or a pointer value returned from a
function not defined within LLVM may be associated with address ranges
allocated through mechanisms other than those provided by LLVM. Such ranges
shall not overlap with any ranges of addresses allocated by mechanisms
provided by LLVM.
 
And I'm curious what "mechanisms provided by LLVM" for allocation means.
Alloca, presumably.  Global variables?  Certain intrinsics?  Any function
with a noalias return value?
 
In the LangRef description of the noalias attribute [2], I see
 
This indicates that memory locations accessed via pointer values based on
the argument or return value are not also accessed, during the execution of
the function, via pointer values not based on the argument or return value.
On function return values, the noalias attribute indicates that the function
acts like a system memory allocation function, returning a pointer to
allocated storage disjoint from the storage for any other object accessible
to the caller.
 
The phrase "the storage for any other object accessible to the caller" in
the noalias description sounds like a broader category than the phrase
"mechanisms provided by LLVM" from the pointer aliasing section, so I would
expect that if the pointer returned from a call to a function with return
attribute noalias does not escape, then loads/stores through it would not
alias loads/stores through a pointer produced by inttoptr.  Am I
interpreting that correctly?
 
I wrote some snippets [3] to see what the optimizer would do.  Each case has
a store of value 86 via pointer %p that I'd expect dead store elimination to
remove if we think it does not alias the subsequent load via pointer %q
(because immediately after that is another store to %p).
In each case, %q is the result of a call to a function whose return value is
annotated noalias.
 
When %p is a pointer parameter, I indeed see the optimizer removing the dead
store:
declare noalias i8* @allocate() ; callee used by all three tests

define i8 @test1(i8* %p) {

    %q = call i8* @allocate()
    store i8 86, i8* %p ; <-- this gets removed
    %result = load i8, i8* %q
    store i8 0, i8* %p
    ret i8 %result
}
 
When %p is the result of inttoptr, I do not see the store being removed, and
I'm wondering if this is because of a subtle aliasing rule or an intentional
conservatism in the optimizer or just a blind spot in the analysis:
define i8 @test2(i64 %p_as_int) {
    %p = inttoptr i64 %p_as_int to i8*
 
    %q = call i8* @allocate()
    store i8 86, i8* %p ; <-- this does not get removed
    %result = load i8, i8* %q
    store i8 0, i8* %p
    ret i8 %result
}
 
When I outline the inttoptr into a separate function, I again see the
optimizer remove the dead store, and again I'm wondering whether the difference
between this and the previous case is an intentional subtle point or not.
define i8* @launder(i64 %int) noinline {
  %ptr = inttoptr i64 %int to i8*
  ret i8* %ptr
}
 
define i8 @test3(i64 %p_as_int) {
    %p = call i8* @launder(i64 %p_as_int)
 
    %q = call i8* @allocate()
    store i8 86, i8* %p ; <-- this gets removed
    %result = load i8, i8* %q
    store i8 0, i8* %p
    ret i8 %result
}
 
 
 
Happy for any insights you can share.
 
Thanks,
-Joseph
 
1 - https://llvm.org/docs/LangRef.html#pointeraliasing
2 - https://llvm.org/docs/LangRef.html#parameter-attributes
3 - https://godbolt.org/z/x8e41G33Y