[PATCH] D71499: Add builtins for aligning and checking alignment of pointers and integers

Alexander Richardson via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 2 07:44:56 PST 2020


arichardson added a subscriber: brooks.
arichardson added a comment.

> What i'm asking is:
> 
> - Are these builtins designed (as per `clang/docs/LanguageExtensions.rst`) to only be passed pointers in-bounds to the allocated memory chunk (`Logical pointer`*), or any random bag of bits casted to pointer type (`Physical Pointer`*)?
> - If `Logical pointer`, are they designed to also produce `Logical pointer`? Or `Physical Pointer`?

In my view you both input and output should be a `Logical pointer`. If you want random values, you should be aligning `uintptr_t` instead.
For CHERI we only need to support passing and creating valid pointers (i.e. with the capability tag bit set and valid bounds information).
Since the source pointer contains bounds information, it will never be possible to use the __builtin_align_up/down result to access memory from a different underlying allocation.

>> I am not sure the GEP can be inbounds since I have seen some cases
>>  where aligning pointers is used to get a pointer to a different object.
>>  I most cases it should be in-bounds (even when used to implement `malloc()`),
>>  but I have seen some cases where aligning pointers is used to get a pointer
>>  to a different object.
> 
> Object as in C++ object?
>  I'm specifically talking about memory region/chunk, as in what is returned by `malloc()`.
> 
>> For example, some versions WebKit align pointers down by 64k to get a pointer to a structure that holds metadata for all objects allocated inside that region.
> 
> But that entire region is still a single memory region/chunk,
>  not just some other random memory chunk that *happens* to be close nearby?

Since we can run that code on CHERI-MIPS, I realize it must be the same memory region.
I had look at the code again and the aligned structure is constructed in a 64K region that is created by a single call to `fastMalloc(blockSize)`, i.e. `malloc()/mmap()`.

> 
> 
>> I am not sure what happens for those cases if we add inbounds (miscompilation?), so I haven't added it here.
>>  I guess we could add it if alignment is a constant and is less than the object size, but there might already be a pass to infer if a GEP is inbounds?
> 
> I'm pushing on this because these intrinsics are supposed to be better than hand-rolled variants
>  (and in their current form with a single non-inbounds GEP they already are infinitly better
>  than ptrtoint+inttoptr pair), but if we go with current design (return `Physical pointer`),
>  (which is less optimizer friendly than `gep inbounds` which would requre/produce `Logical pointer`s),
>  we'll be stuck with it..
> 
> Since there is no such builtin currently (certainly not in clang,
>  i don't see one in GCC), we **do** get to dictate it's semantics.
>  We don't **have** to define it to be most lax/UB-free (`ptrtoint`+`inttoptr` or non-`inbounds` `gep`),
>  but we //can// define it to be most optimizer-friendly (`gep inbounds`).
> 
> - https://web.ist.utl.pt/nuno.lopes/pubs/llvmmem-oopsla18.pdf

It seems like we should be able to add inbounds to the GEP when aligning pointer values.

However, I think this means that for the downstream CHERI codegen there is one more case where we need to be careful (but that shouldn't matter for upstream).
For us `uintptr_t` is represented as `i8 addrspace(200)*`.
As `uintptr_t` can hold values that are plain integers and not pointers, we would need to add the inbounds only if the AST type is a pointer, and use a non-inbounds GEP for `(u)intptr_t`.
Does that sound correct or did I misunderstand something?

Note: Creating out-of-bounds pointers (capabilities) is fine for CHERI in hardware, they just can't be dereferenced without trapping. You can bring them back in bounds and then use them again (as long as they weren't *too* far out of bounds, since the compression scheme can't represent all possible values).

Adding @brooks in case there is a use case in the OS kernel that may need to create out-of-bounds pointers that I forgot about.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71499/new/

https://reviews.llvm.org/D71499





More information about the cfe-commits mailing list