<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">
</head>
<body>
<div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">On 4 Jun 2021, at 15:06, Nuno Lopes wrote:</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #3983C4; color:#3983C4; margin:0 0 5px; padding-left:5px"><p dir="auto">On 4 Jun 2021, at 11:24, George Mitenkov wrote:<br>
<br>
Hi all,<br>
<br>
Together with Nuno Lopes and Juneyoung Lee we propose to add a new byte<br>
type to LLVM to fix miscompilations due to load type punning. Please see<br>
the proposal below. It would be great to hear the<br>
feedback/comments/suggestions!<br>
<br>
<br>
Motivation<br>
==========<br>
<br>
char and unsigned char are considered to be universal holders in C. They<br>
can access raw memory and are used to implement memcpy. i8 is LLVM’s<br>
counterpart, but it does not have such semantics; giving it those semantics<br>
would also be undesirable, as it would disable many optimizations.<br>
<br>
I don’t believe this is correct. LLVM does not have an innate<br>
concept of typed memory. The type of a global or local allocation<br>
is just a roundabout way of giving it a size and default alignment,<br>
and similarly the type of a load or store just determines the width<br>
and default alignment of the access. There are no restrictions on<br>
what types can be used to load or store from certain objects.<br>
<br>
C-style type aliasing restrictions are imposed using tbaa<br>
metadata, which are unrelated to the IR type of the access.<br>
<br>
<br>
<br>
It’s debatable whether LLVM considers memory to be typed or not. If we don’t consider memory to be typed, then *all* integer load operations have to be considered as potentially escaping pointers. Example:<br>
store i32* %p, i32** %q<br>
%q2 = bitcast i32** %q to i64*<br>
%v = load i64, i64* %q2<br>
<br>
This program stores a pointer and then loads it back as an integer. So there’s an implicit pointer-to-integer cast, which escapes the pointer. If we allow this situation to happen, then the alias analysis code is broken, as well as several optimizations. LLVM doesn’t consider loads as potential pointer escape sites. It would probably be a disaster (performance-wise) if it did!</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Huh? It’s not the load that’s the problem in your example, it’s the store. If we have a pointer that’s currently known to not alias anything, and that pointer gets stored somewhere, and we can’t analyze all the uses of where it’s stored, then yes, anything that could potentially load from that memory might load the pointer, and so the pointer can’t just be assumed to not alias anything anymore. We don’t need subtle reasoning about bitcasts for that, that’s just basic conservative code analysis.</p>
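<p dir="auto">A minimal sketch of that point, assuming typed-pointer IR as in the example above; the function name <code>@escape_via_store</code> is hypothetical and only for illustration:</p>
<pre><code>; Hypothetical example: the escape happens at the store, not at the load.
define i64 @escape_via_store(i32** %q) {
entry:
  ; %p is a fresh local allocation; nothing aliases it yet.
  %p = alloca i32
  ; The store publishes %p into memory the function cannot fully analyze.
  store i32* %p, i32** %q
  ; From here on, any load that may read that memory may observe %p,
  ; regardless of the type used for the access.
  %q2 = bitcast i32** %q to i64*
  %v = load i64, i64* %q2
  ret i64 %v
}
</code></pre>
<p dir="auto">A conservative analysis already has to give up on <code>%p</code> at the <code>store</code>, since <code>%q</code> comes from the caller; the type of the later load does not change that.</p>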
<p dir="auto">If you want to be doing optimizations where we opaquely assume that certain loads don’t carry pointers, then you need frontend involvement for that; you can’t just assume that some random i64 doesn’t carry a pointer. And I’ve gotta tell you, some random i64 is allowed to carry a pointer in C, so you’re not likely to get much mileage out of that. If you make this change, all that’s going to happen is that C-ish frontends will have to make all their integer types <code>b8</code>, <code>b32</code>, <code>b64</code>, etc. to indicate that they can carry pointer data, and then you’ll be right back where you were in terms of optimizability. The frontends that <em>can</em> make stronger statements about integers can probably all make much stronger promises about aliasing than you’d get from this analysis anyway.</p>
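<p dir="auto">As a rough illustration of an ordinary integer carrying a pointer, here is roughly what a C-ish frontend might emit for a round trip through a <code>uintptr_t</code>-sized slot. This is a sketch only: the function name <code>@roundtrip</code> and the parameter names are hypothetical, and it assumes 64-bit pointers and typed-pointer IR:</p>
<pre><code>; Hypothetical example: a plain i64 in memory holds pointer data.
define i32 @roundtrip(i32* %p, i64* %slot) {
entry:
  ; Explicit pointer-to-integer cast, as C permits.
  %i = ptrtoint i32* %p to i64
  ; The integer is stored and reloaded like any other i64.
  store i64 %i, i64* %slot
  %j = load i64, i64* %slot
  ; The reloaded integer is turned back into a usable pointer.
  %r = inttoptr i64 %j to i32*
  %v = load i32, i32* %r
  ret i32 %v
}
</code></pre>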
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #3983C4; color:#3983C4; margin:0 0 5px; padding-left:5px"><p dir="auto">The introduction of the byte type allows us to make all pointer &lt;-&gt; integer casts explicit, so that we don’t have to mark all integer loads as escaping. It also allows us to say that LLVM optimizations are correct, and we “just” need to create a few new optimizations to get rid of the extra bytecast instructions when they are provably not needed.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">It sounds an awful lot like you’re trying to optimize a different language than the one you’ve got. I can certainly see how this constraint on <code>inttoptr</code> might be useful, but it’s not a constraint that LLVM has historically documented or enforced, and you haven’t made a very compelling case for why it’s worth such a major and incompatible change to the language.</p>
<p dir="auto">John.</p>
</div>
</div>
</body>
</html>