[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 15 05:41:34 PST 2016


----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Philip Reames" <listmail at philipreames.com>
> Cc: llvm-dev at lists.llvm.org, "Daniel Sanders" <Daniel.Sanders at imgtec.com>
> Sent: Friday, January 15, 2016 2:45:32 AM
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction selection
> 
> 
> Hi,
> 
> 
> > "It is always a no-op cast because no bits change with this
> > conversion. The conversion is done as if the value had been stored
> > to memory and read back as type ty2."
> 
> 
> I think a simple "as-if" in there should be sufficient;
> 
> 
> "It is always a no-op cast because it acts as if no bits change with
> this conversion. The conversion is done as if the value had been
> stored to memory and read back as type ty2."
> 
> 
> What do you think?
> 

I think this sounds confusing (and, regardless, we always get to apply 'as if'). I see your point, however, that any changes in the bits are unobservable at the IR level. Is it true that int -> floating-point -> int bitcasts round-trip cleanly for all possible values on all hardware? I was under the impression that this was not true. I think that the best solution might just be to delete the first sentence.
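
As a rough way to probe the int -> floating-point -> int question on a given machine, here is a minimal Python sketch. It is an illustration under stated assumptions, not a statement about any particular target: the helper name int_float_int is made up for this example, and the probe relies on Python's struct module carrying the 32-bit pattern through a Python float (a double), which is only loosely analogous to a value passing through a wider floating-point register. Whether NaN payloads survive is platform- and version-dependent, so the script reports what it observes instead of asserting an outcome.

```python
import struct

def int_float_int(bits: int) -> int:
    """Reinterpret 32 bits as an IEEE-754 single, then back as an integer.

    Hypothetical helper for illustration; the intermediate value is a
    Python float (a double), so the bits pass through a wider format.
    """
    (as_float,) = struct.unpack("<f", struct.pack("<I", bits))
    (back,) = struct.unpack("<I", struct.pack("<f", as_float))
    return back

samples = {
    "1.0": 0x3F800000,
    "-0.0": 0x80000000,
    "+inf": 0x7F800000,
    "quiet NaN": 0x7FC00000,
    "signalling NaN": 0x7FA00000,  # quiet bit clear, payload non-zero
}

for name, bits in samples.items():
    back = int_float_int(bits)
    status = "preserved" if back == bits else f"changed to {back:#010x}"
    print(f"{name:>15}: {bits:#010x} {status}")
```

Ordinary values, signed zero and infinity should come back unchanged; the NaN rows are the interesting ones, and their result may differ between machines.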

 -Hal

> 
> James
> 
> 
> On Thu, 14 Jan 2016 at 22:35 Hal Finkel < hfinkel at anl.gov > wrote:
> 
> 
> ----- Original Message -----
> > From: "Philip Reames" < listmail at philipreames.com >
> > To: "James Molloy" < james at jamesmolloy.co.uk >, "Daniel Sanders" <
> > Daniel.Sanders at imgtec.com >, "Hal Finkel"
> > < hfinkel at anl.gov >
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Thursday, January 14, 2016 3:48:37 PM
> > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > instruction selection
> > 
> > This explanation makes a lot more sense to me. I think it would
> > make sense to document this mental model, but I agree that this
> > interpretation does not seem to require changes to the IR
> > semantics.
> 
> The semantics, no. But we still may want to update the language
> reference. It says, "It is always a no-op cast because no bits
> change with this conversion. The conversion is done as if the value
> had been stored to memory and read back as type ty2." And what
> we've learned is that this second sentence does not always imply
> the first (the bits might, in fact, change).
> 
> -Hal
> 
> > 
> > Just to check, this implies that DSE *is* legal, right?
> > 
> > Philip
> > 
> > 
> > On 01/14/2016 05:48 AM, James Molloy wrote:
> > 
> > 
> > 
> > Hi,
> > 
> > 
> > I've given a bit of misinformation here and have caused some
> > confusion. After talking with Tim and Mehdi last night on IRC, I
> > need to correct what I said above to fall more in line with what
> > Daniel is saying. If any of the below contradicts what I've said
> > already, please accept my apologies. This version should be right.
> > 
> > 
> > The behaviour of the code generator for big-endian NEON and MIPS is
> > derived from the fact that we did not want to change IR semantics
> > at
> > all. A fundamental property that we do not want to break is memory
> > round-tripping:
> > 
> > 
> > > %1 = load <4 x i32>, <4 x i32>* %p32
> > > %2 = bitcast <4 x i32> %1 to <2 x i64>
> > > %p64 = bitcast <4 x i32>* %p32 to <2 x i64>*
> > > store <2 x i64> %2, <2 x i64>* %p64
> > 
> > 
> > The value of memory before and after the store MUST NOT change
> > (contrary to what I said in an earlier post, I know).
> > 
> > 
> > > So in fact everything you can do in IR is valid. There are no
> > > changes to IR semantics in the slightest. However, when it comes
> > > to generating code from the IR, there are new rules:
> > > 1) Loads and stores are selected to be special loads and stores
> > > that do some transform from a canonical form in memory to a
> > > type-specific form in register.
> > > 2) Because bitcasts are semantically store/load pairs, they must
> > > behave as if a store then load was done. Specifically (bitcast TyA
> > > to TyB) must transform TyA -> canonical form -> TyB, as a store
> > > then load would. Therefore bitcasts are not no-ops during code
> > > generation (*but behave as if they are from an IR perspective!*).
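
The two rules above can be sketched as a toy model in Python. Everything here is an assumption made for illustration and none of it is LLVM code: a 128-bit vector register is modelled as a Python int in which lane i occupies bits [i*w, (i+1)*w), memory holds each lane big-endian in lane order, and the names load, store and bitcast are hypothetical helpers. The sketch checks the memory round-trip property, shows that the in-register image does change across a bitcast, and checks that A->B->C matches A->C.

```python
def load(mem: bytes, lane_bits: int) -> int:
    """Rule 1: type-specific load, canonical memory form -> register form."""
    lane_bytes = lane_bits // 8
    reg = 0
    for i in range(len(mem) // lane_bytes):
        lane = int.from_bytes(mem[i * lane_bytes:(i + 1) * lane_bytes], "big")
        reg |= lane << (i * lane_bits)  # lane i lives at bit offset i * lane_bits
    return reg

def store(reg: int, lane_bits: int, size: int = 16) -> bytes:
    """Rule 1: type-specific store, register form -> canonical memory form."""
    lane_bytes = lane_bits // 8
    mask = (1 << lane_bits) - 1
    return b"".join(
        ((reg >> (i * lane_bits)) & mask).to_bytes(lane_bytes, "big")
        for i in range(size // lane_bytes)
    )

def bitcast(reg: int, from_bits: int, to_bits: int) -> int:
    """Rule 2: behave as a store as TyA followed by a load as TyB."""
    return load(store(reg, from_bits), to_bits)

mem = bytes(range(16))            # arbitrary 16-byte memory image
v4i32 = load(mem, 32)             # load as <4 x i32>
v2i64 = bitcast(v4i32, 32, 64)    # bitcast to <2 x i64>

memory_round_trips = store(v2i64, 64) == mem   # memory must not change
register_unchanged = v2i64 == v4i32            # would hold for a true no-op
casts_compose = bitcast(bitcast(v4i32, 32, 16), 16, 64) == v2i64

print(memory_round_trips, register_unchanged, casts_compose)
```

In this model the memory image survives the load/bitcast/store sequence and chained casts collapse to a single cast, while the register contents differ between the two types, which is why the bitcast cannot simply be dropped during code generation.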
> > 
> > 
> > The reason this works neatly in IR is due to the IR's type system.
> > In
> > order to change type, a cast must be inserted or a memory round
> > trip. There is no other way. However in SDAG, things break down a
> > bit. SDAG is more weakly typed, and bitconverts are often simply
> > removed. We need that not to happen. Bitconverts are not no-ops.
> > 
> > 
> > Daniel's explanation of physical register mapping was excellent so
> > I'm not going to repeat that.
> > 
> > 
> > I apologise for the confusion and misinformation. This is quite a
> > complex topic and takes a bit of mind bending for me to understand,
> > and it was a long time ago.
> > 
> > 
> > James
> > 
> > 
> > On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <
> > Daniel.Sanders at imgtec.com > wrote:
> > 
> > 
> > 
> > 
> > 
> > 
> > > Ok. Then we need to change the LangRef as suggested. Given this
> > > is
> > > a rather important semantic change, I think you need to send a
> > > top
> > > level RFC to the list.
> > 
> > 
> > 
> > 
> > 
> > FWIW, I don't think this is a semantic change to LLVM-IR itself. I
> > think it's more clearing up the misconception that LLVM-IR
> > semantics
> > also apply to SelectionDAG's operations. That said, I do think it's
> > important to mention this in LangRef since it's very easy to make
> > this mistake and very few targets need to worry about the
> > distinction.
> > 
> > 
> > 
> > To explain why I don't think this is a semantic change to LLVM-IR,
> > let's consider this example from earlier:
> > 
> > 
> > 
> > > %0 = load <4 x i32>, <4 x i32>* %x
> > > %1 = bitcast <4 x i32> %0 to <2 x i64>
> > > store <2 x i64> %1, <2 x i64>* %y
> > 
> > 
> > 
> > 
> > 
> > In LLVM-IR terms, if the value of %0 is:
> > 
> > %0 = 0x00112233_44556677_8899aabb_ccddeeff
> > 
> > then the value of %1 is:
> > 
> > %1 = 0x0011223344556677_8899aabbccddeeff
> > 
> > which agrees with the store/load and the 'no bits change'
> > statements
> > in LangRef.
> > 
> > 
> > 
> > However, the mapping of these bits to physical register bits is not
> > consistent between types:
> > 
> > Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
> > 
> > Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
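
The Physreg values above can be reproduced with a short Python sketch. The assumptions are mine and only serve the illustration: the abstract value is read left to right with lane 0 as the leftmost chunk, the physical register places lane i at bit offset i * lane width, and physreg and swap_halves_in_64bit_lanes are hypothetical names. The second helper stands in for the kind of lane-reordering fixup a REV-style instruction would perform; it is not a claim about the exact instruction a backend selects.

```python
def physreg(abstract: int, lane_bits: int, total_bits: int = 128) -> int:
    """Map an abstract vector value to the modelled physical register bits."""
    mask = (1 << lane_bits) - 1
    reg = 0
    for i in range(total_bits // lane_bits):
        # Lane 0 is the leftmost (most significant) chunk of the abstract value.
        lane = (abstract >> (total_bits - (i + 1) * lane_bits)) & mask
        reg |= lane << (i * lane_bits)
    return reg

def swap_halves_in_64bit_lanes(reg: int) -> int:
    """Swap the two 32-bit halves inside each 64-bit chunk of a 128-bit value."""
    out = 0
    for i in range(2):
        chunk = (reg >> (64 * i)) & 0xFFFFFFFFFFFFFFFF
        swapped = ((chunk & 0xFFFFFFFF) << 32) | (chunk >> 32)
        out |= swapped << (64 * i)
    return out

value = 0x00112233_44556677_8899aabb_ccddeeff   # same abstract bits for %0 and %1

reg_v4i32 = physreg(value, 32)   # Physreg(%0)
reg_v2i64 = physreg(value, 64)   # Physreg(%1)

print(f"{reg_v4i32:#034x}")
print(f"{reg_v2i64:#034x}")
print(swap_halves_in_64bit_lanes(reg_v4i32) == reg_v2i64)
```

The abstract value is identical for both types, yet the two register images differ, and in this model a fixed lane swap turns one into the other.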
> > 
> > 
> > 
> > Essentially, I'm saying that BitCastInst and ISD::BITCAST have
> > slightly different semantics because of their different domains.
> > The
> > former is working on an abstract representation of the values where
> > both statements in LangRef are true, but the latter is closer to
> > the
> > target where the 'no bits change' statement ceases to be true in
> > some cases.
> > 
> > 
> > 
> > > A couple of points that will need clarified:
> > > - Does this only apply to vector types? It definitely doesn't
> > > apply
> > > between pointer types today. What about integer, floating point,
> > > and FCAs?
> > 
> > 
> > 
> > 
> > 
> > I've only seen it for vector types so far but in theory it could
> > happen for other types. I'd expect FCAs to encounter it since the
> > physical registers may contain padding that isn't present in the
> > LLVM-IR representation and the placement and amount of padding will
> > depend on the exact FCA.
> > 
> > I can think of cases where address space casts can encounter the
> > same
> > problem but that's already been covered in LangRef ("It can be a
> > no-op cast or a complex value modification, depending on the target
> > and the address space pair.").
> > 
> > 
> > 
> > Does anyone use FCAs directly? Most targets seem to convert them to
> > same-sized integers or bitcast an FCA* to i8*.
> > 
> > 
> > 
> > 
> > > - Is combining two casts into one a legal operation? I think it
> > > is
> > > so far, but we need to explicitly state that.
> > 
> > 
> > 
> > 
> > 
> > Yes, A->B->C and A->C are equivalent.
> > 
> > 
> > 
> > 
> > > - Do we have a predicate for identifying no-op casts that can be
> > > freely removed/combined?
> > 
> > 
> > 
> > 
> > 
> > James mentioned one in CGP but I haven't been able to find it. I
> > don't think it's necessary to have one at the LLVM-IR level but we
> > do need one in the backends. I remember adding one to the backend
> > but I can't find that either so I think I'm remembering one of my
> > patches from before I split MSA's registers into type-specific
> > classes.
> > 
> > 
> > 
> > 
> > > - Is coercing a load to the type it's immediately bitcast to
> > > legal
> > > under this model?
> > 
> > 
> > 
> > 
> > 
> > Yes.
> > 
> > 
> > 
> > 
> > 
> > 
> > > From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On Behalf
> > > Of Philip Reames via llvm-dev
> > > Sent: 13 January 2016 20:31
> > > To: James Molloy; Hal Finkel
> > > Cc: llvm-dev
> > > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > > instruction selection
> > > 
> > On 01/13/2016 12:20 PM, James Molloy wrote:
> > 
> > 
> > 
> > 
> > > (Right?)
> > 
> > 
> > 
> > 
> > 
> > Uh no, the register content explicitly does change :( We insert REV
> > instructions (byteswap) on each bitcast. Bitcasts can be merged and
> > elided etc, but conceptually there's a register content change on
> > every bitcast.
> > 
> > Ok. Then we need to change the LangRef as suggested. Given this is
> > a
> > rather important semantic change, I think you need to send a top
> > level RFC to the list.
> > 
> > A couple of points that will need clarified:
> > - Does this only apply to vector types? It definitely doesn't apply
> > between pointer types today. What about integer, floating point,
> > and
> > FCAs?
> > - Is combining two casts into one a legal operation? I think it is
> > so
> > far, but we need to explicitly state that.
> > - Do we have a predicate for identifying no-op casts that can be
> > freely removed/combined?
> > - Is coercing a load to the type it's immediately bitcast to legal
> > under this model?
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > James
> > 
> > 
> > 
> > 
> > 
> > On Wed, 13 Jan 2016 at 18:09 Philip Reames <
> > listmail at philipreames.com > wrote:
> > 
> > 
> > 
> > 
> > 
> > On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> > > ----- Original Message -----
> > >> From: "James Molloy" < james at jamesmolloy.co.uk >
> > >> To: "Hal Finkel" < hfinkel at anl.gov >
> > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >, "Quentin Colombet" <
> > >> qcolombet at apple.com >
> > >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > >> instruction selection
> > >> 
> > >> 
> > >>> I think that teaching the optimizer about big-Endian lane
> > >>> ordering
> > >>> would have been better.
> > >> 
> > >> It's certainly arguable. Even in hindsight I'm glad we didn't -
> > >> that's the approach GCC took and they've been fixing subtle bugs
> > >> in
> > >> their vectorizer ever since.
> > >> 
> > >> 
> > >>> Inserting the REV after every LDR
> > >> 
> > >> We only do this conceptually. In most cases REVs cancel out, and
> > >> we
> > >> have the LD1 instruction which is LDR+REV. With enough peepholes
> > >> there's really no need for code to run slower.
> > >> 
> > >> 
> > >>> Given what's been done, should we update the LangRef?
> > >> 
> > >> Potentially, yes. I hadn't realised quite how strongly worded it
> > >> was
> > >> with respect to this.
> > >> 
> > > Please do ;)
> > I'm not sure changing bitcast is the right place. Since the bitcast
> > is
> > representing the in-register value (which doesn't change), maybe we
> > should define it as part of the load/store instead? That's
> > essentially
> > what's going on; we're converting from a canonical register form to
> > a
> > variety of memory forms. (Right?)
> > > 
> > > -Hal
> > > 
> > >> James
> > >> 
> > >> 
> > >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at anl.gov >
> > >> wrote:
> > >> 
> > >> 
> > >> 
> > >> 
> > >> [resending so the message is smaller]
> > >> 
> > >> 
> > >> 
> > >> 
> > >> 
> > >> 
> > >> From: "James Molloy via llvm-dev" < llvm-dev at lists.llvm.org >
> > >> To: "Quentin Colombet" < qcolombet at apple.com >
> > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> > >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > >> instruction selection
> > >> 
> > >> Hi Philip,
> > >> 
> > >> 
> > >> 
> > >> 
> > >> 
> > >> store <2 x i64> %1, <2 x i64>* %y
> > >> 
> > >> Yes. The memory pattern differs. This is the first diagram on
> > >> the right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts
> > >> 
> > >> 
> > >> I think that teaching the optimizer about big-Endian lane
> > >> ordering
> > >> would have been better. Inserting the REV after every LDR sounds
> > >> very similar to what we do for VSX on little-Endian PowerPC
> > >> systems
> > >> (PowerPC may have a slight advantage here in that we don't need
> > >> to
> > >> do insertelement / extractelement / shufflevector through memory
> > >> on
> > >> systems where little-Endian mode is relevant, see
> > >> http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> > >> ).
> > >> 
> > >> Given what's been done, should we update the LangRef? It
> > >> currently reads, "The 'bitcast' instruction converts value to
> > >> type ty2. It is always a no-op cast because no bits change with
> > >> this conversion. The conversion is done as if the value had been
> > >> stored to memory and read back as type ty2." But this is now, at
> > >> the least, misleading, because this process of storing the value
> > >> as one type and reading it back in as another does, in fact,
> > >> change the bits. We need to make clear that this might change the
> > >> bits (perhaps specifically by calling out this case of vector
> > >> bitcasts on big-Endian systems?).
> > >> 
> > >> 
> > >> 
> > >> Also, regarding this, "Most operating systems however do not run
> > >> with alignment faults enabled, so this is often not an issue."
> > >> Are you saying that the processor does the correct thing in this
> > >> case (if alignment faults are not enabled, then it performs a
> > >> proper unaligned load), or that the operating-system trap handler
> > >> emulates the unaligned load should one occur?
> > >> 
> > >> Thanks again,
> > >> Hal
> > >> 
> > >> 
> > >> 
> > >> --
> > >> Hal Finkel
> > >> Assistant Computational Scientist
> > >> Leadership Computing Facility
> > >> Argonne National Laboratory
> > >> 
> > 
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

