[PATCH] D125009: [ArgPromotion] Use common alignment for loads in caller

Fri May 6 04:27:55 PDT 2022

psamolysov added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:521
+    Align ArgAlign = Arg->getParamAlign().valueOrOne();
+    Align PartAlign = std::max(LI->getAlign(), ArgAlign);
     auto Pair = ArgParts.try_emplace(
----------------
nikic wrote:
> psamolysov wrote:
> > nikic wrote:
> > > Isn't this incorrect for non-zero offsets?
> > > 
> > > Overall, I'm not sure I really understand what you're trying to achieve here. What we're interested here is whether loads are speculatable, which is the case when they are defereferenceable and aligned. This can be either because the load is guaranteed to execute, or because we have known dereferenceability/alignment knowledge. Byval deref/align will be taken into account for the latter check (allCallersPassValidPointerForArgument).
> > @nikic Yes, you are absolutely right, this is incorrect for non-zero offsets. I'm going to fix it ASAP.
> > 
> > Initially, my intent was to copy the byval promotion behavior where the alignment of the byval argument is taken into account because if the argument is aligned by 32, for example, we should not use less alignment for the part with zero offset loading. And because the byval argument is just a copy of the original value from the caller, it's very easy to get it's alignment: just use the value from the corresponding argument definition in the callee.
> > 
> > But after rethinking, I see that for non-byval pointers, we should take into account not the argument alignment from the callee definition (as I did here) but actual alignment value from the caller in which we generate new load instructions. It can be considered as an error in the frontend of course, but if we have loads in the callee with less alignment for the part with zero offset than the whole structure was allocated in the caller with, and we use this less alignment in the new generated load instructions, it might be a source of errors in the compiled application. What do you think? 
> I think you're mixing up two things here: Legality and optimization. I believe the current code fully handles the legality aspect for both byval and non-byval arguments. If I understand right, your proposed change here is related to optimization: If you have an `align 32` argument that is used in an `align 4` load, you want to make the promoted load use `align 32` instead.
> 
> Doing that would be fine, but I also think that this is not really the place to do it: We already have transforms that will increase load/store alignment based on known alignment (e.g. from parameter align attributes). For example, InstCombine does this (see getOrEnforceKnownAlignment). I think the alignment handling code here is already tricky enough without also trying to optimize the alignments at the same time.
@nikic Thank you very much for the good explanation. My concern was about legality, so I though this is illegal to use `align 4` loads for `align 32` arguments.

If we replace the `byval` promotion approach with the currently used `non-byval` one as is, the `non-byval` rules will be used for alignments of the generated `load` instructions too: just the maximum `align` of the present in the callee `load` instructions will be used and `align` of the `byval` argument will just be ignored. Example, the `byval.ll` test, function `@g`:

```
define internal void @g(%struct.ss* byval(%struct.ss) align 32 %b) nounwind {
...
}
```

after promotion by `non-byval` scheme, in caller:

```
define i32 @main() nounwind  {
  %S = alloca %struct.ss, align 32
...
  %S01 = getelementptr %struct.ss, %struct.ss* , i64 0, i32 0
  %S01_VAL = load i32, i32* %S01, align 4
  call void @g(i32 %S01_VAL)
...
}
```

Will this `%S01_VAL = load i32, i32* %S01, align 4`,  not `align 32` as it is now, be correct?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125009/new/

https://reviews.llvm.org/D125009