[llvm-dev] getelementptr inbounds with offset 0

Doerfert, Johannes via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 12 12:44:44 PDT 2019


Hi Ralf,

On 04/10, Ralf Jung wrote:
> >>> I see. Is there a quick answer to the questions why you need inbounds
> >>> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
> >>> might not have a valid base ptr and "optimize" it to inbounds once that
> >>> is proven?
> >>
> >> You mean on the Rust side?  We emit GEPi for field accesses and array indexing.
> >>  We cannot always statically determine if this is happening for a ZST or not.
> >> At the same time, given that no memory access ever happens for a ZST, allocating
> >> a ZST (Box::new in Rust, think of it like new in C++) does not actually allocate
> >> any memory, it just returns an integer (sufficiently aligned) cast to a pointer.
> > 
> > OK, but why not emit non-inbounds GEPs instead? They do not come with
> > the problems you have now, or maybe I misunderstand.
> 
> The problem is statically figuring out whether it should be inbounds or
> non-inbounds.  When we have code like `&x[n]`, this might be an offset-by-0 in
> an empty slice and hence fall into the scope of my question, or it might be a
> "normal" array access where we definitely want inbounds.

I'd argue, after all this discussion at least, that you should use
non-inbounds GEPs whenever you do not know you have a valid object (and
want to avoid undef/poison and all that it entails). This might cause
performance regressions; if you try it, it would be interesting to know
how large they are. We could even look into "inbounds" detection in the
"Attributor framework" [0] to get some of the performance back.

[0] https://reviews.llvm.org/D59919 (but see also the "Stack" tab that shows
                                     related commits)
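
To make the ZST case concrete on the Rust side, here is a minimal sketch. It
assumes (to my understanding of current rustc, not something stated in this
thread) that `wrapping_add` on a raw pointer is lowered to a plain GEP while
the unsafe `add` is lowered to a GEP inbounds. The names `Zst`,
`zst_box_address` and `offset_without_inbounds` are made up for illustration.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical zero-sized type standing in for any ZST a program might box.
struct Zst;

/// Offsets `base` by `n` elements without promising the result is in bounds.
/// Assumption: rustc lowers `wrapping_add` to a GEP without `inbounds`.
fn offset_without_inbounds<T>(base: *const T, n: usize) -> *const T {
    base.wrapping_add(n)
}

/// Boxes a ZST and returns the address stored in the box as an integer.
fn zst_box_address() -> usize {
    let raw: *mut Zst = Box::into_raw(Box::new(Zst));
    let addr = raw as usize;
    // Hand the pointer back so the box is dropped properly; for a ZST no
    // memory is released because none was allocated.
    unsafe { drop(Box::from_raw(raw)) };
    addr
}

fn main() {
    assert_eq!(size_of::<Zst>(), 0);

    // The "allocation" is just a non-null, sufficiently aligned address.
    let addr = zst_box_address();
    assert_ne!(addr, 0);
    assert_eq!(addr % align_of::<Zst>(), 0);

    // Offset-by-0 on an integer-derived pointer, done without inbounds.
    let base = addr as *const Zst;
    assert_eq!(offset_without_inbounds(base, 0), base);
}
```

The pointer is never dereferenced here; the sketch only shows that the
offset itself can be expressed without the inbounds promise.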

> >> Sure, UB is definitely *defined* in a runtime-value dependent way.  The problem
> >> here is that it is not defined in a precise way -- something where one could
> >> write an interpreter that tracks all the extra state that is needed (like
> >> poison/undef and where allocations lie) and then says precisely under which
> >> conditions we have UB and under which we do not.
> >> What I am asking here for is the exact definition of GEPi if, *at run-time*, the
> >> offset is 0, and the base pointer is (a) an integer, or (b) dangling.
> > 
> > That last part is given by the lang-ref (imo):
> >   "If the inbounds keyword is present, the result value of the
> >    getelementptr is a poison value if the base pointer is not an in
> >    bounds address of an allocated object"
> > 
> > I read this as: If you have a GEPi, you get poison if the base pointer
> > is not an allocated object. That is, a dangling pointer (b) causes the
> > GEPi to be poison, and a pointer from integer (a) may do so if the address
> > denoted by the integer is not inside, or one past, an allocated object.
> > Now any offset except 0 will add more possible ways to generate a poison
> > value.
> 
> Thanks.  That makes sense from reading the docs (though I am not convinced that
> it actually helps with optimizations to be this strict here).

I never argued it does "make sense" ;)


> For the (a) case, the question about "0-sized objects" remains, but it doesn't
> seem like the answer could affect what LLVM does.

I think I now see (maybe part of) your point.
Something like:

  x = malloc(0);
  // ... anything except free(x) or equivalent
  y = gep inbounds x, 0
  // ... anything except free(x) or equivalent
  use_but_not_dereference(y);

should be OK (= no undef/poison appears). Does that at least go in the
right direction? I think this should be OK per the IR definition;
otherwise something is broken. Obviously, there is always the possibility,
or rather the certainty, that the implementation is broken somewhere ;)
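
As a rough sketch of how I read the lang-ref rule when applied to the
example above, here is a toy checker. It is not how LLVM works and not a
reference semantics. It assumes a made-up memory model in which an
allocation is just a (base, size) pair, "in bounds" means inside or one
past the end, and `None` stands in for poison. All names and addresses are
hypothetical.

```rust
/// A live allocation in the toy model: `size` bytes starting at `base`.
#[derive(Clone, Copy, Debug)]
struct Allocation {
    base: usize,
    size: usize,
}

impl Allocation {
    /// True if `addr` is inside the allocation or one past its end.
    fn contains_or_one_past(&self, addr: usize) -> bool {
        addr >= self.base && addr - self.base <= self.size
    }
}

/// Toy `gep inbounds`: `Some(address)` on success, `None` for poison.
/// Base and result must both be in bounds of the same live allocation.
fn gep_inbounds(live: &[Allocation], base: usize, offset: isize) -> Option<usize> {
    let result = base.checked_add_signed(offset)?;
    let ok = live
        .iter()
        .any(|a| a.contains_or_one_past(base) && a.contains_or_one_past(result));
    if ok {
        Some(result)
    } else {
        None
    }
}

fn main() {
    // x = malloc(0): a zero-sized allocation at a made-up address.
    let live = vec![Allocation { base: 0x1000, size: 0 }];

    // y = gep inbounds x, 0 while x is live: no poison.
    assert_eq!(gep_inbounds(&live, 0x1000, 0), Some(0x1000));
    // Any non-zero offset leaves the zero-sized allocation: poison.
    assert_eq!(gep_inbounds(&live, 0x1000, 1), None);
    // Case (b): after free(x) the pointer dangles: poison.
    assert_eq!(gep_inbounds(&[], 0x1000, 0), None);
    // Case (a): an integer that is not inside any allocation: poison.
    assert_eq!(gep_inbounds(&live, 0x8, 0), None);
}
```

Under this reading, the zero-sized allocation is the only thing that
separates the malloc(0) example from cases (a) and (b).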


> It would be really nice to have a reference interpreter for LLVM IR that can
> explicitly check for all the UB.  Maybe, one day... ;)

Let me know once you start working on one, I'd be quite interested ;)

Cheers,
  Johannes


-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov