[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Fri Jan 15 05:46:43 PST 2016

----- Original Message -----

> From: "Daniel Sanders" <Daniel.Sanders at imgtec.com>
> To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal Finkel"
> <hfinkel at anl.gov>, "Philip Reames" <listmail at philipreames.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, January 15, 2016 4:29:33 AM
> Subject: RE: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection

> Hi,

> I think we just need to draw attention to the fact that other IRs
> may vary.

> I'm thinking we should add something like this to the ISD::BITCAST
> doxygen comment:
> This is subtly different from the bitcast instruction from LLVM-IR
> since this node may change the bits
> in the register. For example, this occurs on big-endian NEON and
> big-endian MSA where the layout
> of the bits in the register depends on the vector type and this node
> acts as a shuffle operation for
> some vector type combinations.
I agree; improving the ISD::BITCAST documentation is a good idea. 

> And have LangRef say something like:
> The conversion is done as if the value had been stored to memory and
> read back as type ty2. This is equivalent to a no-op cast where no
> bits change with this conversion.
> .. caution::
> This equivalence does not necessarily apply to other IRs in LLVM.
> See ISD::BITCAST for an example.
> The '.. caution::' should render in the same way as the 'Rationale'
> box in http://llvm.org/docs/LangRef.html#volatile-memory-accesses.
We should not do this. We try to keep the LangRef as implementation-independent as possible, and thus, we don't explicitly discuss things like ISD nodes there. 

-Hal 

> From: James Molloy [mailto:james at jamesmolloy.co.uk]
> Sent: 15 January 2016 08:46
> To: Hal Finkel; Philip Reames
> Cc: llvm-dev at lists.llvm.org; Daniel Sanders
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection

> Hi,

> > "It is always a no-op cast because no bits change with this
> > conversion. The conversion is done as if the value had been stored
> > to memory and read back as type ty2."

> I think a simple "as-if" in there should be sufficient;

> "It is always a no-op cast because it acts as if no bits change with
> this conversion. The conversion is done as if the value had been
> stored to memory and read back as type ty2."

> What do you think?

> James

> On Thu, 14 Jan 2016 at 22:35 Hal Finkel < hfinkel at anl.gov > wrote:
> > ----- Original Message -----
> 

> > > From: "Philip Reames" < listmail at philipreames.com >
> 
> > > To: "James Molloy" < james at jamesmolloy.co.uk >, "Daniel Sanders"
> > > <
> > > Daniel.Sanders at imgtec.com >, "Hal Finkel"
> 
> > > < hfinkel at anl.gov >
> 
> > > Cc: llvm-dev at lists.llvm.org
> 
> > > Sent: Thursday, January 14, 2016 3:48:37 PM
> 
> > > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > > instruction selection
> 
> > >
> 
> > > This explanation makes a lot more sense to me. I think it would
> > > make
> 
> > > sense to document this mental model, but I agree that this
> 
> > > interpretation does not seem to require changes to the IR
> > > semantics.
> 

> > The semantics, no. But we still may want to update the language
> > reference. It says, "It is always a no-op cast because no bits
> > change with this conversion. The conversion is done as if the value
> > had been stored to memory and read back as type ty2." And, what
> > we've learned, is that this second sentence does not always imply
> > the first (the bits might, in fact, change).
> 

> > -Hal
> 

> > >
> 
> > > Just to check, this implies that DSE *is* legal right?
> 
> > >
> 
> > > Philip
> 
> > >
> 
> > >
> 
> > > On 01/14/2016 05:48 AM, James Molloy wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > > Hi,
> 
> > >
> 
> > >
> 
> > > I've given a bit of misinformation here and have caused some
> 
> > > confusion. After talking with Tim and Mehdi last night on IRC, I
> 
> > > need to correct what I said above to fall more in line with what
> 
> > > Daniel is saying. If any of the below contradicts what I've said
> 
> > > already, please accept my apologies. This version should be
> > > right.
> 
> > >
> 
> > >
> 
> > > The behaviour of the code generator for big-endian NEON and MIPS
> > > is
> 
> > > derived from the fact that we did not want to change IR semantics
> > > at
> 
> > > all. A fundamental property that we do not want to break is
> > > memory
> 
> > > round-tripping:
> 
> > >
> 
> > >
> 
> > > %1 = load <4 x i32>, <4 x i32>* %p32
> > > %2 = bitcast <4 x i32> %1 to <2 x i64>
> > > %p64 = bitcast <4 x i32>* %p32 to <2 x i64>*
> > > store <2 x i64> %2, <2 x i64>* %p64
> 
> > >
> 
> > >
> 
> > > The value of memory before and after the store MUST NOT change
> 
> > > (contrary to what I said in an earlier post, I know).
> 
> > >
> 
> > >
> 
> > > So in fact everything you can do in IR is valid. There are no
> > > changes
> 
> > > to IR semantics in the slightest. However, when it comes to
> 
> > > generating code from the IR, there are new rules:
> 
> > > 1) Loads and stores are selected to be special loads and stores
> > > that do some transform from a canonical form in memory to a
> > > type-specific form in register.
> > > 2) Because bitcasts are semantically store/load pairs, they must
> > > behave as if a store then a load was done. Specifically, (bitcast
> > > TyA to TyB) must transform TyA -> canonical form -> TyB, as a store
> > > then load would. Therefore bitcasts are not no-ops during code
> > > generation
> > > (*but behave as if they are from an IR perspective!*).
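A minimal sketch of the two rules above, written in Python rather than as LLVM code. It assumes a toy 16-byte vector and assumes that the "type-specific form in register" is simply the canonical memory bytes with the lane order reversed. That assumption matches the Physreg example elsewhere in this thread, but it is not a claim about what any real target does. The names load_vec, store_vec and codegen_bitcast are made up for illustration.

```python
# Toy model: memory holds the canonical form; the register form depends on
# the lane width of the vector type. ASSUMPTION: the register form is the
# canonical bytes with the lane order reversed (hypothetical target).

def _reverse_lanes(data: bytes, lane_bytes: int) -> bytes:
    """Reverse the order of the lanes, keeping the bytes inside each lane."""
    if len(data) % lane_bytes != 0:
        raise ValueError("vector size must be a multiple of the lane size")
    lanes = [data[i:i + lane_bytes] for i in range(0, len(data), lane_bytes)]
    return b"".join(reversed(lanes))

def load_vec(memory: bytes, lane_bytes: int) -> bytes:
    """Rule 1: a typed load maps canonical memory to the register form."""
    return _reverse_lanes(memory, lane_bytes)

def store_vec(register: bytes, lane_bytes: int) -> bytes:
    """Rule 1: a typed store maps the register form back to canonical memory."""
    return _reverse_lanes(register, lane_bytes)

def codegen_bitcast(register: bytes, from_lane: int, to_lane: int) -> bytes:
    """Rule 2: TyA -> canonical form -> TyB, as a store then a load would."""
    return load_vec(store_vec(register, from_lane), to_lane)

memory_before = bytes.fromhex("00112233445566778899aabbccddeeff")
r1 = load_vec(memory_before, 4)       # %1 = load <4 x i32>
r2 = codegen_bitcast(r1, 4, 8)        # %2 = bitcast %1 to <2 x i64>
memory_after = store_vec(r2, 8)       # store <2 x i64> %2

assert memory_after == memory_before  # memory round-trips unchanged
assert r2 != r1                       # but the register bits did change
```

In this model the bitcast is visibly not a no-op on the register contents, yet the store of the result leaves memory exactly as it was, which is the round-tripping property described above.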
> 
> > >
> 
> > >
> 
> > > The reason this works neatly in IR is due to the IR's type system.
> > > In order to change type, a cast or a memory round trip must be
> > > inserted. There is no other way. However, in SDAG things break down
> > > a bit. SDAG is more weakly typed, and bitconverts are often simply
> > > removed. We need that not to happen. Bitconverts are not no-ops.
> 
> > >
> 
> > >
> 
> > > Daniel's explanation of physical register mapping was excellent
> > > so
> 
> > > I'm not going to repeat that.
> 
> > >
> 
> > >
> 
> > > I apologise for the confusion and misinformation. This is quite a
> 
> > > complex topic and takes a bit of mind bending for me to
> > > understand,
> 
> > > and it was a long time ago.
> 
> > >
> 
> > >
> 
> > > James
> 
> > >
> 
> > >
> 
> > > On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <
> 
> > > Daniel.Sanders at imgtec.com > wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > Ok. Then we need to change the LangRef as suggested. Given this
> > > > is
> 
> > > > a rather important semantic change, I think you need to send a
> > > > top
> 
> > > > level RFC to the list.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > FWIW, I don't think this is a semantic change to LLVM-IR itself.
> > > I
> 
> > > think it's more clearing up the misconception that LLVM-IR
> > > semantics
> 
> > > also apply to SelectionDAG's operations. That said, I do think
> > > it's
> 
> > > important to mention this in LangRef since it's very easy to make
> 
> > > this mistake and very few targets need to worry about the
> 
> > > distinction.
> 
> > >
> 
> > >
> 
> > >
> 
> > > To explain why I don't think this is a semantic change to
> > > LLVM-IR,
> 
> > > let's consider this example from earlier:
> 
> > >
> 
> > >
> 
> > >
> 
> > > %0 = load <4 x i32>, <4 x i32>* %x
> > > %1 = bitcast <4 x i32> %0 to <2 x i64>
> > > store <2 x i64> %1, <2 x i64>* %y
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > In LLVM-IR terms, if the value of %0 is:
> 
> > >
> 
> > > %0 = 0x00112233_44556677_8899aabb_ccddeeff
> 
> > >
> 
> > > then the value of %1 is:
> 
> > >
> 
> > > %1 = 0x0011223344556677_8899aabbccddeeff
> 
> > >
> 
> > > which agrees with the store/load and the 'no bits change'
> > > statements
> 
> > > in LangRef.
> 
> > >
> 
> > >
> 
> > >
> 
> > > However, the mapping of these bits to physical register bits is
> > > not
> 
> > > consistent between types:
> 
> > >
> 
> > > Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
> 
> > >
> 
> > > Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
> 
> > >
> 
> > >
> 
> > >
> 
> > > Essentially, I'm saying that BitCastInst and ISD::BITCAST have
> 
> > > slightly different semantics because of their different domains.
> > > The
> 
> > > former is working on an abstract representation of the values
> > > where
> 
> > > both statements in LangRef are true, but the latter is closer to
> > > the
> 
> > > target where the 'no bits change' statement ceases to be true in
> 
> > > some cases.
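The two Physreg values quoted above can be reproduced with a small Python sketch. It assumes, purely for illustration, that the register mapping is "reverse the lane order for the given lane width"; that rule is inferred from the two example values and is not taken from any target's documentation. The names physreg and isd_bitcast_model are hypothetical.

```python
def physreg(value: int, lane_bits: int, total_bits: int = 128) -> int:
    """Map an abstract value to register bits by reversing the lane order.

    ASSUMPTION: lane reversal is the whole mapping (inferred from the
    example values in this thread, not from a real target description).
    """
    mask = (1 << lane_bits) - 1
    out = 0
    for i in range(total_bits // lane_bits):
        lane = (value >> (i * lane_bits)) & mask  # lane i, counted from the low end
        out = (out << lane_bits) | lane           # the lowest lane ends up highest
    return out

def isd_bitcast_model(reg: int, from_bits: int, to_bits: int) -> int:
    """Register-level cast: undo the source mapping, apply the destination one."""
    abstract_value = physreg(reg, from_bits)      # lane reversal is its own inverse
    return physreg(abstract_value, to_bits)

abstract = 0x00112233_44556677_8899aabb_ccddeeff  # same abstract bits for %0 and %1
reg_v4i32 = physreg(abstract, 32)
reg_v2i64 = physreg(abstract, 64)

assert reg_v4i32 == 0xccddeeff_8899aabb_44556677_00112233
assert reg_v2i64 == 0x8899aabbccddeeff_0011223344556677
# The abstract value never changes, but the register-level cast is a shuffle.
assert isd_bitcast_model(reg_v4i32, 32, 64) == reg_v2i64
assert reg_v4i32 != reg_v2i64
```

Under these assumptions the abstract view corresponds to BitCastInst (no bits change) and the register view corresponds to ISD::BITCAST (the bits are rearranged).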
> 
> > >
> 
> > >
> 
> > >
> 
> > > > A couple of points that will need to be clarified:
> 
> > > > - Does this only apply to vector types? It definitely doesn't
> > > > apply
> 
> > > > between pointer types today. What about integer, floating
> > > > point,
> 
> > > > and FCAs?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > I've only seen it for vector types so far but in theory it could
> 
> > > happen for other types. I'd expect FCAs to encounter it since the
> 
> > > physical registers may contain padding that isn't present in the
> 
> > > LLVM-IR representation and the placement and amount of padding
> > > will
> 
> > > depend on the exact FCA.
> 
> > >
> 
> > > I can think of cases where address space casts can encounter the
> > > same
> 
> > > problem but that's already been covered in LangRef ("It can be a
> 
> > > no-op cast or a complex value modification, depending on the
> > > target
> 
> > > and the address space pair.").
> 
> > >
> 
> > >
> 
> > >
> 
> > > Does anyone use FCAs directly? Most targets seem to convert them
> > > to
> 
> > > same-sized integers or bitcast an FCA* to i8*.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > - Is combining two casts into one a legal operation? I think it
> > > > is
> 
> > > > so far, but we need to explicitly state that.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Yes, A->B->C and A->C are equivalent.
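As a sanity check on the claim that A->B->C and A->C are equivalent, here is a small Python sketch under the same toy assumption used for the Physreg example: the register form of a 16-byte vector is its canonical bytes with the lane order reversed. The function names are hypothetical and the model is only an illustration, not a description of a real backend.

```python
from itertools import product

def reverse_lanes(data: bytes, lane_bytes: int) -> bytes:
    """Reverse the lane order, keeping the bytes inside each lane (assumed mapping)."""
    lanes = [data[i:i + lane_bytes] for i in range(0, len(data), lane_bytes)]
    return b"".join(reversed(lanes))

def bitcast(register: bytes, from_lane: int, to_lane: int) -> bytes:
    """Register-level cast: source form -> canonical form -> destination form."""
    canonical = reverse_lanes(register, from_lane)
    return reverse_lanes(canonical, to_lane)

canonical = bytes(range(16))
lane_sizes = (1, 2, 4, 8, 16)

# Casting A -> B -> C gives the same register contents as casting A -> C.
combinable = all(
    bitcast(bitcast(reverse_lanes(canonical, a), a, b), b, c)
    == bitcast(reverse_lanes(canonical, a), a, c)
    for a, b, c in product(lane_sizes, repeat=3)
)

# A cast to the same type leaves the register untouched in this model.
identity_is_noop = all(
    bitcast(reverse_lanes(canonical, a), a, a) == reverse_lanes(canonical, a)
    for a in lane_sizes
)

assert combinable and identity_is_noop
```

The reason it works in the model is that every cast passes through the same canonical form, so the intermediate type B drops out.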
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > - Do we have a predicate for identifying no-op casts that can
> > > > be
> 
> > > > freely removed/combined?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > James mentioned one in CGP but I haven't been able to find it. I
> 
> > > don't think it's necessary to have one at the LLVM-IR level but
> > > we
> 
> > > do need one in the backends. I remember adding one to the backend
> 
> > > but I can't find that either so I think I'm remembering one of my
> 
> > > patches from before I split MSA's registers into type-specific
> 
> > > classes.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > - Is coercing a load to the type it's immediately bitcast to
> > > > legal
> 
> > > > under this model?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Yes.
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > From: llvm-dev [mailto: llvm-dev-bounces at lists.llvm.org ] On
> > > Behalf Of Philip Reames via llvm-dev
> > > Sent: 13 January 2016 20:31
> > > To: James Molloy; Hal Finkel
> > > Cc: llvm-dev
> > > Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> > > instruction selection
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > On 01/13/2016 12:20 PM, James Molloy wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > > (Right?)
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > Uh no, the register content explicitly does change :( We insert
> > > REV
> 
> > > instructions (byteswap) on each bitcast. Bitcasts can be merged
> > > and
> 
> > > elided etc, but conceptually there's a register content change on
> 
> > > every bitcast.
> 
> > >
> 
> > > Ok. Then we need to change the LangRef as suggested. Given this
> > > is
> > > a
> 
> > > rather important semantic change, I think you need to send a top
> 
> > > level RFC to the list.
> 
> > >
> 
> > > A couple of points that will need to be clarified:
> 
> > > - Does this only apply to vector types? It definitely doesn't
> > > apply
> 
> > > between pointer types today. What about integer, floating point,
> > > and
> 
> > > FCAs?
> 
> > > - Is combining two casts into one a legal operation? I think it
> > > is
> > > so
> 
> > > far, but we need to explicitly state that.
> 
> > > - Do we have a predicate for identifying no-op casts that can be
> 
> > > freely removed/combined?
> 
> > > - Is coercing a load to the type it's immediately bitcast to
> > > legal
> 
> > > under this model?
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > James
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > On Wed, 13 Jan 2016 at 18:09 Philip Reames <
> 
> > > listmail at philipreames.com > wrote:
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > >
> 
> > > On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> 
> > > > ----- Original Message -----
> 
> > > >> From: "James Molloy" < james at jamesmolloy.co.uk >
> 
> > > >> To: "Hal Finkel" < hfinkel at anl.gov >
> 
> > > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >, "Quentin Colombet"
> > > >> <
> 
> > > >> qcolombet at apple.com >
> 
> > > >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> 
> > > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> 
> > > >> instruction selection
> 
> > > >>
> 
> > > >>
> 
> > > >>> I think that teaching the optimizer about big-Endian lane
> 
> > > >>> ordering
> 
> > > >>> would have been better.
> 
> > > >>
> 
> > > >> It's certainly arguable. Even in hindsight I'm glad we didn't
> > > >> -
> 
> > > >> that's the approach GCC took and they've been fixing subtle
> > > >> bugs
> 
> > > >> in
> 
> > > >> their vectorizer ever since.
> 
> > > >>
> 
> > > >>
> 
> > > >>> Inserting the REV after every LDR
> 
> > > >>
> 
> > > >> We only do this conceptually. In most cases REVs cancel out,
> > > >> and
> 
> > > >> we
> 
> > > >> have the LD1 instruction which is LDR+REV. With enough
> > > >> peepholes
> 
> > > >> there's really no need for code to run slower.
> 
> > > >>
> 
> > > >>
> 
> > > >>> Given what's been done, should we update the LangRef?
> 
> > > >>
> 
> > > >> Potentially, yes. I hadn't realised quite how strongly worded
> > > >> it
> 
> > > >> was
> 
> > > >> with respect to this.
> 
> > > >>
> 
> > > > Please do ;)
> 
> > > I'm not sure changing bitcast is the right place. Since the
> > > bitcast
> 
> > > is
> 
> > > representing the in-register value (which doesn't change), maybe
> > > we
> 
> > > should define it as part of the load/store instead? That's
> 
> > > essentially
> 
> > > what's going on; we're converting from a canonical register form
> > > to
> > > a
> 
> > > variety of memory forms. (Right?)
> 
> > > >
> 
> > > > -Hal
> 
> > > >
> 
> > > >> James
> 
> > > >>
> 
> > > >>
> 
> > > >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at anl.gov >
> > > >> wrote:
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> [resending so the message is smaller]
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> From: "James Molloy via llvm-dev" < llvm-dev at lists.llvm.org >
> 
> > > >> To: "Quentin Colombet" < qcolombet at apple.com >
> 
> > > >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> 
> > > >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> 
> > > >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> 
> > > >> instruction selection
> 
> > > >>
> 
> > > >> Hi Philip,
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> store <2 x i64> %1, <2 x i64>* %y
> 
> > > >>
> 
> > > >> Yes. The memory pattern differs. (This is the first diagram on the
> > > >> right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts )
> 
> > > >>
> 
> > > >>
> 
> > > >> I think that teaching the optimizer about big-Endian lane ordering
> > > >> would have been better. Inserting the REV after every LDR sounds
> > > >> very similar to what we do for VSX on little-Endian PowerPC systems
> > > >> (PowerPC may have a slight advantage here in that we don't need to
> > > >> do insertelement / extractelement / shufflevector through memory on
> > > >> systems where little-Endian mode is relevant, see
> > > >> http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> > > >> ).
> 
> > > >>
> 
> > > >> Given what's been done, should we update the LangRef? It currently
> > > >> reads, "The ‘bitcast’ instruction converts value to type ty2. It is
> > > >> always a no-op cast because no bits change with this conversion.
> > > >> The conversion is done as if the value had been stored to memory
> > > >> and read back as type ty2." But this is now, at the least,
> > > >> misleading, because this process of storing the value as one type
> > > >> and reading it back in as another does, in fact, change the bits.
> > > >> We need to make clear that this might change the bits (perhaps
> > > >> specifically by calling out this case of vector bitcasts on
> > > >> big-Endian systems?).
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> Also, regarding this, "Most operating systems however do not run
> > > >> with alignment faults enabled, so this is often not an issue." Are
> > > >> you saying that the processor does the correct thing in this case
> > > >> (if alignment faults are not enabled, then it performs a proper
> > > >> unaligned load), or that the operating-system trap handler emulates
> > > >> the unaligned load should one occur?
> 
> > > >>
> 
> > > >> Thanks again,
> 
> > > >> Hal
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >>
> 
> > > >> --
> 
> > > >> Hal Finkel
> 
> > > >> Assistant Computational Scientist
> 
> > > >> Leadership Computing Facility
> 
> > > >> Argonne National Laboratory
> 
> > > >>
> 
> > >
> 
> > >
> 
> > >
> 


-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 