<div dir="ltr"><div>I'm thinking about this a bit more, and I don't know if it's something that can be used in a generic backend like LLVM. Should accessing a smaller object from a bigger pointer be considered UB?<br></div><div>In contrast, sing the assumption that bigger types can not be dereferenced from smaller types is a good idea. If this is already added 

to BasicAA, I wonder why it does not optimize this out?<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jan 28, 2020 at 1:23 PM Doerfert, Johannes <<a href="mailto:jdoerfert@anl.gov">jdoerfert@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 01/28, Karl Rehm wrote:<br>

> ><br>

> > I need to take a closer look but I would have expected BasicAA to be<br>

> > able to determine that `do_log` and `R` cannot alias. In the -Os version<br>

> > (lower right here <a href="https://gcc.godbolt.org/z/KLaeiH" rel="noreferrer" target="_blank">https://gcc.godbolt.org/z/KLaeiH</a>), the write to `R`<br>

> > clobbers the read from `do_log` which prevents us from removing the<br>

> > load/store pair. My reasoning would have been that we know the size of<br>

> > `do_log` to be less than the size accessed via `R`. What exactly goes<br>

> > wrong or if my logic is flawed needs to be examined. I would start<br>

> > looking at the debug generated by the code parts touched here:<br>

> > <a href="https://reviews.llvm.org/D66157" rel="noreferrer" target="_blank">https://reviews.llvm.org/D66157</a><br>

> ><br>

> <br>

> Well, if I recall correctly there's actually a problem with AA being *too*<br>

> aggressive for this reason: <a href="https://gcc.godbolt.org/z/CXN8Gw" rel="noreferrer" target="_blank">https://gcc.godbolt.org/z/CXN8Gw</a><br>

> There's a bug report about this somewhere, don't remember where though.<br>

<br>

That seems to be related purely to TBAA: <a href="https://gcc.godbolt.org/z/ZLSzHH" rel="noreferrer" target="_blank">https://gcc.godbolt.org/z/ZLSzHH</a><br>

<br>

What I described should fire if you see the allocation. Alternatively,<br>

we can add a new "max object size" attribute [0] which gives a size<br>

upper bound for the pointed to memory. In either case, if you know the<br>

object pointed to by A is maximal N bytes of size, and the object<br>

pointed to by B is at most M bytes of size, and N < M, A and B do not<br>

alias. As of now, we use `dereferenceable` to determine the minimum size<br>

of the underlying object* and require to see the definition (I think) to<br>

obtain a maximum size.<br>

<br>

* It's actually a lower bound on the minimum size based on the pointer<br>

  but that is conservatively fine too.<br>

<br>

[0] <a href="https://www.youtube.com/watch?v=HVvvCSSLiTw" rel="noreferrer" target="_blank">https://www.youtube.com/watch?v=HVvvCSSLiTw</a><br>

<br>

<br>

> On Tue, Jan 28, 2020 at 12:44 PM Doerfert, Johannes <<a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a>><br>

> wrote:<br>

> <br>

> > Hi Karl, Roman,<br>

> ><br>

> > On 01/28, Roman Lebedev via llvm-dev wrote:<br>

> > > On Tue, Jan 28, 2020 at 8:09 PM Karl Rehm via llvm-dev<br>

> > > <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

> > > > I was looking into how the global optimization pass fares against<br>

> > > > things like what's reported in<br>

> > > > <a href="https://bugs.llvm.org/show_bug.cgi?id=44676" rel="noreferrer" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=44676</a><br>

> ><br>

> > I need to take a closer look but I would have expected BasicAA to be<br>

> > able to determine that `do_log` and `R` cannot alias. In the -Os version<br>

> > (lower right here <a href="https://gcc.godbolt.org/z/KLaeiH" rel="noreferrer" target="_blank">https://gcc.godbolt.org/z/KLaeiH</a>), the write to `R`<br>

> > clobbers the read from `do_log` which prevents us from removing the<br>

> > load/store pair. My reasoning would have been that we know the size of<br>

> > `do_log` to be less than the size accessed via `R`. What exactly goes<br>

> > wrong or if my logic is flawed needs to be examined. I would start<br>

> > looking at the debug generated by the code parts touched here:<br>

> > <a href="https://reviews.llvm.org/D66157" rel="noreferrer" target="_blank">https://reviews.llvm.org/D66157</a><br>

> ><br>

> ><br>

> > > > Looking at this, I think it would be pretty trivial to optimize that<br>

> > > > down given that there are already threading assumptions made:<br>

> > > > <a href="https://godbolt.org/z/u6ZqoB" rel="noreferrer" target="_blank">https://godbolt.org/z/u6ZqoB</a><br>

> ><br>

> > Optimizing more aggressively based on forward process guarantees will<br>

> > get us in more trouble than we are already in. I don't have the link<br>

> > handy but as far as I remember the proposed solution was to have a<br>

> > forward process guarantee function attribute. I would recommend we look<br>

> > into that first before we start more aggressive optimizations which will<br>

> > cause problems for a lot of (non C/C++) folks.<br>

> ><br>

> ><br>

> > > > Is this something I can look into?<br>

> ><br>

> > Sure :)<br>

> ><br>

> ><br>

> > > > Another thing is that currently *all* external calls break this<br>

> > > > optimization, including calls to intrinsics that probably shouldn't:<br>

> > > > <a href="https://godbolt.org/z/pK7Cew" rel="noreferrer" target="_blank">https://godbolt.org/z/pK7Cew</a><br>

> > > I think during load propagation, there is a legality check "here's a<br>

> > > load, and here's a store.<br>

> > > Is there anything in between that may have clobbered that memory<br>

> > location?".<br>

> ><br>

> > Right now we only have `__attribute__((pure/const))` but we want to<br>

> > expose all LLVM-IR attributes to the user soon [0] which will allow way<br>

> > more fine-grained control. Intrinsics are a different story again.<br>

> ><br>

> ><br>

> > > For calls, there are some attributes that are helpful here:<br>

> > > <a href="https://llvm.org/docs/LangRef.html#function-attributes" rel="noreferrer" target="_blank">https://llvm.org/docs/LangRef.html#function-attributes</a><br>

> > > So in this case, i guess `@llvm.x86.flags.write` intrinsic maybe can<br>

> > > be annotated with readonly attribute,<br>

> > > thus signalling that it won't clobber that memory location?<br>

> ><br>

> > While target specific intrinsics are a bit more complicated we see the<br>

> > problem often with generic intrinsic already. We proposed the other day<br>

> > [1] to change the default semantics of non-target specific intrinsics<br>

> > such that you have to opt-in for certain effects.<br>

> ><br>

> > For the above example you want `llvm.x86.flags.write` to be `writeonly` and<br>

> > `inaccesiblememonly`. Also `nosync`, `willreturn`, ...<br>

> ><br>

> > Cheers,<br>

> >   Johannes<br>

> ><br>

> ><br>

> > [0] <a href="https://www.youtube.com/watch?v=elmio6AoyK0" rel="noreferrer" target="_blank">https://www.youtube.com/watch?v=elmio6AoyK0</a><br>

> > [1] <a href="http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2019-August/134404.html</a><br>

> ><br>

<br>

-- <br>

<br>

Johannes Doerfert<br>

Researcher<br>

<br>

Argonne National Laboratory<br>

Lemont, IL 60439, USA<br>

<br>

<a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a><br>

</blockquote></div>