[PATCH] D115274: [IR][RFC] Memory region declaration intrinsic

Alexander Richardson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 23 13:45:58 PDT 2022


arichardson accepted this revision.
arichardson added a comment.
This revision is now accepted and ready to land.

I am happy with this but I feel like someone else should also approve it :)



================
Comment at: llvm/docs/LangRef.rst:20876
+to be able to annotate array bounds in C family of languages,
+which may allow alloca splitting, and better alias analysis.
+
----------------
lebedev.ri wrote:
> aaron.ballman wrote:
> > lebedev.ri wrote:
> > > aaron.ballman wrote:
> > > > lebedev.ri wrote:
> > > > > arichardson wrote:
> > > > > > lebedev.ri wrote:
> > > > > > > arichardson wrote:
> > > > > > > > lebedev.ri wrote:
> > > > > > > > > arichardson wrote:
> > > > > > > > > > Do you envision this being used for all sub-object pointer creations? If so it might need a flag to disable it since it might break some C patterns such as `container_of`.
> > > > > > > > > > 
> > > > > > > > > > According to https://godbolt.org/z/evTbejaMf the container_of macro results in an inbounds GEP, so with sufficient inlining things might break?
> > > > > > > > > > 
> > > > > > > > > > About three years ago I spent quite a lot of time enforcing sub-object bounds at runtime using CHERI. Almost all code works just fine but there are things such as container_of() that require opt-out annotations. I wrote about the incompatibilities that I found in Chapter 5 of https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-949.pdf. TL;DR: not many changes needed - about 50 annotations across the entire FreeBSD source tree. Almost all annotations due to container_of or emulation of C++ inheritance in C.
> > > > > > > > > I'm not sure what you mean by "all sub-object pointer creations".
> > > > > > > > > 
> > > > > > > > > Roughly, front-ends should emit this intrinsic on some pointer with some bounds
> > > > > > > > > iff they know that it would be UB to go *from that specific pointer* (aka, as per def-use)
> > > > > > > > > outside of the specified bounds.
> > > > > > > > > 
> > > > > > > > > The one case we know of is C arrays within structs.
> > > > > > > > By sub-object pointer creations I mean something like `&obj->field`. You could conceivable treat that as declaring a new sub-object bounded to just `field`. E.g. something like this: https://godbolt.org/z/bM7j1bxqs
> > > > > > > I think the question is slightly wrong. It's up to front-ends to decide when they can and can't emit this.
> > > > > > > 
> > > > > > > If you are asking about https://godbolt.org/z/adq5EWx17,
> > > > > > > then as per previous conversations about this, i do believe that code to be well-defined and not UB.
> > > > > > > 
> > > > > > > IOW, i do **not** believe that as per the current C/C++ standards wording each data member of a struct
> > > > > > > is it's own sub-object from which you are not allowed to get to it's neighbor objects,
> > > > > > > But perhaps @aaron.ballman wants to correct me on this.
> > > > > > > 
> > > > > > Yes absolutely agreed that this is purely up to the frontend, I just assumed you had plans to update clang to emit the new intrinsic.
> > > > > > 
> > > > > > It has been a long time since I looked at the C standard with regards to subobjects but if I recall correctly you are right that this is not defined as being illegal. However, doesn't that also mean that you can access the member before/after an embedded array?
> > > > > Right. Having a pointer to the array member of a struct isn't going to do anything.
> > > > > The magic happens when you have a pointer to the *element* of the array member:
> > > > > https://godbolt.org/z/cjE5bY4G4 <- manually crafted
> > > > > IOW, i do not believe that as per the current C/C++ standards wording each data member of a struct is it's own sub-object from which you are not allowed to get to it's neighbor objects, But perhaps @aaron.ballman wants to correct me on this.
> > > > 
> > > > The rules on pointer arithmetic in C++: http://eel.is/c++draft/expr.add#4, so https://godbolt.org/z/adq5EWx17 looks like UB to me in C++. C2x 6.5.6p9 reads similarly, so it also looks like UB to me in C.
> > > > 
> > > > What language rules make you think otherwise?
> > > > 
> > > > 
> > > Hm, thank you for correcting me,  that does not match my recollection from the previous time we discussed this.
> > > What about https://godbolt.org/z/8T54zxT43, is that pointer well-defined?
> > I believe that's also UB for the same reason -- the subtraction violates http://eel.is/c++draft/expr.add#4.3, so we can't say much about the resulting pointer value. Note, a slightly different case of using `+ 1` instead of `- 1` is valid because any object can be treated as an array of one (https://eel.is/c++draft/basic.compound#3.sentence-12) and a one-past-the-end pointer is valid (http://eel.is/c++draft/expr.add#4.2) so long as it's not dereferenced.
> Right, `+1` (`end` pointer) is non-dereferenceable, and is fine.
> 
> Hmmmm, that sounds too good to be true. Then i stand corrected,
> i guess we could emit it even for such cases, although that will likely break code,
> and i'm not sure if a sanitizer could be implemented to catch the UB.
Not a sanitizer for x86, but you can use CHERI LLVM and compile with -cheri-bounds=subobject-safe to detect such violations at runtime. Code can then be run on QEMU or also on Arm's Morello boards that were recently released (if you happen to be one of the recipients).
We use that flag for the FreeBSD kernel and it works very well with only a few minor adjustments.

I believe that a sanitizer for non-CHERI hardware would require complete provenance (bounds) shadow memory for every pointer so would be a big engineering effort.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115274/new/

https://reviews.llvm.org/D115274



More information about the llvm-commits mailing list