[PATCH] D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b

Wed Oct 4 07:48:23 PDT 2017

mppf added a comment.

> Hang on, there's a more fundamental problem here this is papering over. If your pointers are larger than 64 bits, those pointers can have offsets larger than 64 bits. Since BasicAA is using 64-bit integers to represent pointer offsets, the math in DecomposeGEPExpression will overflow, so you'll get incorrect results, and eventually cause a miscompile.

In my use case, I have 128-bit pointers but "adding" to one of these will only ever modify the lower 64 bits. So for my use case, offsets > 64 bits just don't matter.

More generally, in the case of BasicAA, my understanding is that these int64_ts represent *fixed* offsets in to pointed-to-data. I.e. you'd have to have a struct field or other fixed-width type that is 16 exabytes or so. Sure, you could have a getelementptr accessing i8* with whatever fixed integer we want, say 2^80 for the sake of argument. I just don't think it's likely that any front-end would generate such a thing. (But of course they *could* if somebody comes up with a reason to... I'm just explaining why I'm not particularly worried about it). In practice, the fixed GEPs our front-end generates have offsets that almost certainly fit into an int32...

>> If we switch to using APInt for offsets, adjustToPointerSize() just goes away

> Granted. There's a risk that, if we try that, we'll find an unacceptable compile-time slowdown. Nevertheless, I'll go with you on this: there are only around 40 int64_t variables in BasicAA, so it's worth trying.

I'm not sure if you're asking me to try to do that? I've never seen this code before attempting this fix... so I'm not sure I'm a good candidate for that effort. This patch request is really a bug report with a suggested fix. If you prefer, I could switch to doing a bug report and let one of you fix it. (Among other things, I don't know how to appropriately analyze the performance impact).

Anyway my recent experience with APInt and offsets in https://reviews.llvm.org/D38501 suggests to me that it's not as simple as just switching to APInt. These GEP offsets have specific overflow behavior that needs to be considered. From LangRef.html for getelementptr:

> If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer.

(As an aside, I don't know why this arithmetic overflow has anything to do with inbounds, since things can overflow and still be in an allocated object).
I think the IR is designed this way because LLVM IR doesn't generally treat signed/unsigned integers differently. So to make negative offsets work right (like -1), the definition actually relies on overflow behavior. So we'd need to be doing all of this address arithmetic with APInts with the same width as the pointer. Maybe that's what you were proposing but I don't think such things are obvious...

>> so I don't see how this is forward progress.

It's forward progress because it solves an immediately blocking issue for me, it does so in a way that has low impact on anything else (performance, correctness, ...), and it exists already.
Don't let the perfect be the enemy of the good (enough), and all that.

>> I guess the alternative is to try to add a bunch of overflow checks, but that seems a lot more complicated for no benefit.

It seems to me that we could check that the computed GEP offsets fit in to int64_t without too much trouble... but I'd do that by doing the computation with APInts, say in the code calling GetLinearExpression that hfinkel mentioned.

https://reviews.llvm.org/D38499