<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 6/4/2021 2:25 PM, John McCall via

      cfe-dev wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:3F1E5D6A-DE05-46C9-A4E9-4C575EF66DE1@apple.com">


      <div style="font-family:sans-serif">
        <div style="white-space:normal">
          <p dir="auto">I don’t believe this is correct. LLVM does not have an innate
            concept of typed memory. The type of a global or local allocation
            is just a roundabout way of giving it a size and default alignment,
            and similarly the type of a load or store just determines the width
            and default alignment of the access. There are no restrictions on
            what types can be used to load or store from certain objects.</p>
          <p dir="auto">C-style type aliasing restrictions are imposed
            using <code>tbaa</code> metadata, which are unrelated to the IR
            type of the access.</p>

          <p dir="auto">John.</p>

        </div>

      </div>

    </blockquote>

    <p>I've never been thoroughly involved in any of the actual

      optimizations here, but it seems to me that there is a soundness

      hole in the LLVM semantics that we gloss over when we say that

      LLVM doesn't have typed memory.</p>

    <p>Working backwards from what a putative operational semantics of
      LLVM might look like (I'm going to ignore poison/undef because
      they're not relevant here), I think there is agreement that integer
      types in LLVM are purely bitvectors: any i64 5 can be replaced with
      any other i64 5, no matter where it came from. When pointers are
      involved, however, this is not true. Two pointers may have the same
      numerical value (e.g., when cast to integers), yet one might not be
      replaceable with the other, because they can differ in other data.
      So in operational terms, a pointer has both a numerical value and a
      bag of provenance data (there is probably other stuff too, but to
      keep things simple, let's call it all provenance).</p>
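<p>A minimal sketch of this distinction, written as a toy Python model of the putative abstract machine. The names <code>Ptr</code>, <code>prov</code>, and <code>interchangeable</code>, and the allocation ids, are hypothetical and are not LLVM APIs. Integers compare as plain values. A pointer is an address plus a provenance set, so two pointers with equal addresses need not be interchangeable.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """Toy pointer value: a numeric address plus provenance (hypothetical model)."""
    addr: int
    prov: frozenset  # ids of the allocations this pointer is allowed to access


def interchangeable(a, b):
    """Values are interchangeable only if every part of their state matches."""
    return a == b


# Integers are pure bitvectors: any i64 5 can stand in for any other i64 5.
five_a, five_b = 5, 5
print(interchangeable(five_a, five_b))  # True

# Two pointers with the same address but different provenance, e.g. a pointer
# one past the end of allocation "a" that happens to equal the start of "b".
p = Ptr(addr=0x1008, prov=frozenset({"a"}))
q = Ptr(addr=0x1008, prov=frozenset({"b"}))
print(p.addr == q.addr)        # True: same numerical value
print(interchangeable(p, q))   # False: the provenance differs
```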

    <p>Now we have to ask what the semantics of converting between
      integers and pointers are. Integers, as we've defined them, don't
      have provenance data, so an inttoptr instruction has to synthesize
      that provenance somehow. Ideally, we'd grab that data from the
      ptrtoint instruction that generated the integer. However, the
      semantics of integers means we can only launder that data globally:
      an inttoptr gets the union of all of the provenance data that was
      ever fed into a ptrtoint. (I suspect the semantics we actually use
      are somewhat more precise than this, in that they only consider
      pointers to still-live data, but that doesn't invalidate anything
      I'm about to say.)</p>
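<p>The global laundering described above can be sketched the same way. This is a hypothetical toy model, not LLVM's actual semantics. <code>Machine</code>, <code>exposed</code>, and the allocation ids are assumptions, and the model ignores liveness. In it, ptrtoint adds a pointer's provenance to a single global pool, and inttoptr hands back the whole pool.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: frozenset


class Machine:
    """Toy abstract machine with one global pool of exposed provenance.

    Hypothetical model for illustration only: it ignores the liveness
    refinement that the real semantics probably applies.
    """

    def __init__(self):
        self.exposed = set()

    def ptrtoint(self, p):
        # The integer result can only hold the address, so the provenance
        # is laundered into the global pool instead.
        self.exposed |= p.prov
        return p.addr

    def inttoptr(self, i):
        # The new pointer cannot know which ptrtoint produced i, so it
        # receives the union of everything exposed so far.
        return Ptr(i, frozenset(self.exposed))


m = Machine()
a = Ptr(0x1000, frozenset({"a"}))
b = Ptr(0x2000, frozenset({"b"}))
m.ptrtoint(b)                    # "b" is exposed somewhere else in the program
r = m.inttoptr(m.ptrtoint(a))    # round-trip a through an integer
print(r.addr == a.addr)          # True
print(sorted(r.prov))            # ['a', 'b']: not the same pointer as a
```

<p>In this sketch the round-tripped pointer ends up with more provenance than the original. What gets lost is the precise association with <code>a</code>.</p>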

    <p>Okay, what about memory? I believe what most people mean when they
      say that LLVM's memory is untyped is this: a store of any type is
      equivalent to first converting the value to an integer and then
      storing that integer into memory. Symmetrically, a load of any type
      is equivalent to loading an integer and converting it back. For
      example, these two functions are semantically equivalent:</p>

    <pre>define void @foo(ptr %mem, ptr %foo) {
  store ptr %foo, ptr %mem
  ret void
}

define void @bar(ptr %mem, ptr %foo) {
  %asint = ptrtoint ptr %foo to i64 ; or whatever your pointer size is
  store i64 %asint, ptr %mem
  ret void
}
</pre>
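<p>Under the "memory is just integers" reading, the two functions are equivalent by construction. Here is a minimal sketch of that reading, continuing the hypothetical toy model. Memory cells hold plain ints, and addresses are plain ints for brevity. <code>UntypedMachine</code>, <code>store_ptr</code>, and <code>store_int</code> are illustrative names only.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: frozenset


class UntypedMachine:
    """Toy 'memory is just integers' model (a hypothetical sketch).

    Every memory cell holds a plain integer, so storing a pointer implies
    an implicit ptrtoint.
    """

    def __init__(self):
        self.mem = {}          # address -> int
        self.exposed = set()   # provenance laundered by ptrtoint

    def ptrtoint(self, p):
        self.exposed |= p.prov
        return p.addr

    def store_int(self, addr, value):
        self.mem[addr] = value

    def store_ptr(self, addr, p):
        # store ptr %p, ptr %mem behaves like store i64 (ptrtoint %p), ptr %mem
        self.store_int(addr, self.ptrtoint(p))


def foo(m, mem, p):
    m.store_ptr(mem, p)


def bar(m, mem, p):
    m.store_int(mem, m.ptrtoint(p))


p = Ptr(0x1000, frozenset({"p"}))
m1, m2 = UntypedMachine(), UntypedMachine()
foo(m1, 0x2000, p)
bar(m2, 0x2000, p)
print(m1.mem == m2.mem and m1.exposed == m2.exposed)  # True
```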

    <p>In other words, we are to accept that every load of a pointer has
      an implicit inttoptr attached to it, and every store of a pointer
      has an implicit ptrtoint. But as I mentioned earlier, pointers carry
      extra provenance data that is lost when they are converted to
      integers. Under this strict interpretation of memory, we *lose* that
      data every time a pointer is stored to memory, as if we had done an
      inttoptr(ptrtoint x). Thus, the following two functions are *not*
      semantically equivalent in that model:</p>

    <pre>define ptr @basic(ptr %in) {
  ret ptr %in
}

define ptr @via_alloc(ptr %in) {
  %mem = alloca ptr
  store ptr %in, ptr %mem
  %out = load ptr, ptr %mem
  ret ptr %out
}
</pre>
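<p>The same toy model shows why <code>@via_alloc</code> differs from <code>@basic</code>: the load's implicit inttoptr returns the union of all exposed provenance. This is a hypothetical sketch. In it, the difference only becomes observable once some other pointer has been exposed, and the example arranges that explicitly. <code>alloca</code>, <code>load_ptr</code>, and the fixed addresses are illustrative assumptions.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: frozenset


class UntypedMachine:
    """Same toy 'memory holds only integers' model (hypothetical sketch)."""

    def __init__(self):
        self.mem = {}
        self.exposed = set()
        self.next_addr = 0x8000

    def alloca(self):
        # Hand out fresh 8-byte slots; provenance of the slot is ignored here.
        addr = self.next_addr
        self.next_addr += 8
        return addr

    def store_ptr(self, addr, p):
        self.exposed |= p.prov                      # implicit ptrtoint
        self.mem[addr] = p.addr

    def load_ptr(self, addr):
        return Ptr(self.mem[addr], frozenset(self.exposed))  # implicit inttoptr


def basic(m, p):
    return p


def via_alloc(m, p):
    slot = m.alloca()
    m.store_ptr(slot, p)
    return m.load_ptr(slot)


m = UntypedMachine()
other = Ptr(0x3000, frozenset({"other"}))
m.store_ptr(m.alloca(), other)   # some unrelated pointer escapes first
p = Ptr(0x1000, frozenset({"p"}))
print(basic(m, p) == p)          # True
print(via_alloc(m, p) == p)      # False: provenance became {'other', 'p'}
```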

    <p>For these two functions to be equivalent, the load of a pointer
      has to recover exactly the provenance data written by the store of
      that pointer, and nothing more general. If either of those were
      instead an integer load or store, no provenance data could be
      communicated, so the integer and pointer loads *must* be
      nonequivalent (although loading an integer instead of a pointer
      would presumably be a pessimizing transformation).</p>
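<p>For contrast, here is a sketch of a provenance-preserving memory, again as a hypothetical toy model. <code>ProvenanceMachine</code> and its methods are illustrative. Treating an integer load of a stored pointer like a ptrtoint is an assumption of this sketch. Cells remember whether a pointer or an integer was stored. A pointer store followed by a pointer load is exact, while copying the value through an integer is not.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: frozenset


class ProvenanceMachine:
    """Toy memory whose cells remember the provenance of stored pointers.

    Hypothetical sketch: a pointer store followed by a pointer load recovers
    exactly what was stored; any integer access in between strips it.
    """

    def __init__(self):
        self.mem = {}          # address -> int or Ptr
        self.exposed = set()

    def store_ptr(self, addr, p):
        self.mem[addr] = p     # keep the pointery bits

    def store_int(self, addr, i):
        self.mem[addr] = i

    def load_int(self, addr):
        v = self.mem[addr]
        if isinstance(v, Ptr):
            # An integer load sees only the address; this sketch treats it
            # like a ptrtoint and exposes the provenance.
            self.exposed |= v.prov
            return v.addr
        return v

    def load_ptr(self, addr):
        v = self.mem[addr]
        if isinstance(v, Ptr):
            return v           # pointer store then pointer load: exact
        return Ptr(v, frozenset(self.exposed))  # behaves like inttoptr


m = ProvenanceMachine()
m.exposed.add("other")         # pretend something else escaped earlier
p = Ptr(0x1000, frozenset({"p"}))

m.store_ptr(0x8000, p)
print(m.load_ptr(0x8000) == p)             # True: via_alloc now matches basic

m.store_int(0x8008, m.load_int(0x8000))    # copy the slot as an integer
print(m.load_ptr(0x8008) == p)             # False: provenance was not carried
```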

    <p>In short, pointers have pointery bits that aren't reflected in the
      bitvector representation an integer has. Some LLVM optimizations
      assume that loads and stores have only bitvector-manipulation
      semantics, while other optimizations (and most of the frontends)
      expect loads and stores to preserve the pointery bits. When these
      two kinds of optimization interact, it's undoubtedly possible for
      the pointery bits to get lost along the way.</p>

    <pre class="moz-signature" cols="72">-- 
Joshua Cranmer
</pre>

  </body>

</html>