[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Thu Jan 14 05:48:10 PST 2016

Hi,

I've given a bit of misinformation here and have caused some confusion.
After talking with Tim and Mehdi last night on IRC, I need to correct what
I said above to fall more in line with what Daniel is saying. If any of the
below contradicts what I've said already, please accept my apologies. This
version should be right.

The behaviour of the code generator for big-endian NEON and MIPS is derived
from the fact that we did not want to change IR semantics at all. A
fundamental property that we do not want to break is memory round-tripping:

%1 = load <4 x i32>, <4 x i32>* %p32
%2 = bitcast <4 x i32> %1 to <2 x i64>
%p64 = bitcast <4 x i32>* %p32 to <2 x i64>*
store <2 x i64> %2, <2 x i64>* %p64

The value of memory before and after the store MUST NOT change (contrary to
what I said in an earlier post, I know).
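
To illustrate the round trip at the IR level, here is a minimal Python
sketch. It is not LLVM code: it assumes a big-endian target, models memory
as a byte string, and the helper names (load_vec, store_vec, bitcast) are
hypothetical ones made up for this example.

```python
def load_vec(mem, lanes, lane_bytes):
    """Load a vector from big-endian memory; lane 0 is at the lowest address."""
    return [
        int.from_bytes(mem[i * lane_bytes:(i + 1) * lane_bytes], "big")
        for i in range(lanes)
    ]


def store_vec(vec, lane_bytes):
    """Store a vector to big-endian memory, lane 0 first."""
    return b"".join(lane.to_bytes(lane_bytes, "big") for lane in vec)


def bitcast(vec, src_lane_bytes, dst_lanes, dst_lane_bytes):
    """IR-level bitcast: as if stored as the source type and reloaded."""
    return load_vec(store_vec(vec, src_lane_bytes), dst_lanes, dst_lane_bytes)


mem = bytes.fromhex("00112233445566778899aabbccddeeff")
v4i32 = load_vec(mem, 4, 4)        # %1 = load <4 x i32>
v2i64 = bitcast(v4i32, 4, 2, 8)    # %2 = bitcast ... to <2 x i64>

# Storing %2 back must leave memory exactly as it was.
assert store_vec(v2i64, 8) == mem
```

The byte pattern used here is the one from Daniel's example quoted below.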

So in fact everything you can do in IR is valid. There are no changes to IR
semantics in the slightest. However, when it comes to generating code from
the IR, there are new rules:
  1) Loads and stores are selected to be special loads and stores that do
some transform from a canonical form in memory to a type-specific form in
register.
  2) Because bitcasts are semantically store/load pairs, they must behave as
if a store then a load was done. Specifically (bitcast TyA to TyB) must
transform TyA -> canonical form -> TyB, as a store then load would.
Therefore bitcasts are not no-ops during code generation (*but behave as if
they are from an IR perspective!*).
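
The two rules can be sketched as a toy model in Python. This is an
illustration only, not what the backend does internally. It assumes a
128-bit register held as a Python integer with lane i in bits
[i*W, (i+1)*W), which follows the Physreg mapping in Daniel's example
quoted below. The names mem_to_reg, reg_to_mem and bitcast_reg are
hypothetical.

```python
def mem_to_reg(mem, lane_bytes):
    """Rule 1, load: canonical memory form -> type-specific register form."""
    reg = 0
    for i in range(len(mem) // lane_bytes):
        lane = int.from_bytes(mem[i * lane_bytes:(i + 1) * lane_bytes], "big")
        reg |= lane << (8 * lane_bytes * i)
    return reg


def reg_to_mem(reg, lane_bytes, total_bytes=16):
    """Rule 1, store: type-specific register form -> canonical memory form."""
    mask = (1 << (8 * lane_bytes)) - 1
    out = b""
    for i in range(total_bytes // lane_bytes):
        lane = (reg >> (8 * lane_bytes * i)) & mask
        out += lane.to_bytes(lane_bytes, "big")
    return out


def bitcast_reg(reg, src_lane_bytes, dst_lane_bytes):
    """Rule 2: TyA -> canonical form -> TyB, as a store then a load would."""
    return mem_to_reg(reg_to_mem(reg, src_lane_bytes), dst_lane_bytes)


mem = bytes.fromhex("00112233445566778899aabbccddeeff")
r_v4i32 = mem_to_reg(mem, 4)              # register holding <4 x i32>
r_v2i64 = bitcast_reg(r_v4i32, 4, 8)      # register holding <2 x i64>

assert r_v4i32 != r_v2i64                 # the register bits do change
assert reg_to_mem(r_v2i64, 8) == mem      # but memory round-trips unchanged
```

In this model the bitcast is real work on the register contents, yet a
store of the result produces the original bytes, which is the "no-op from
an IR perspective" property.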

The reason this works neatly in IR is the IR's type system. To change a
value's type, you must either insert a cast or go through a memory round
trip. There is no other way. However, in SDAG things break down a bit. SDAG
is more weakly typed, and bitconverts are often simply removed. We need
that not to happen. Bitconverts are not no-ops.
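
As a hedged sketch of what goes wrong, the Python below uses the same toy
register model as Daniel's Physreg example (lane i in bits [i*W, (i+1)*W)
of a 128-bit integer). The helpers to_reg, to_mem and bitconvert are
hypothetical names and do not correspond to any SelectionDAG API.

```python
def to_reg(mem, lane_bytes):
    """Type-specific load: lane i goes to bits [i*W, (i+1)*W)."""
    lanes = [mem[i:i + lane_bytes] for i in range(0, len(mem), lane_bytes)]
    return sum(
        int.from_bytes(b, "big") << (8 * lane_bytes * i)
        for i, b in enumerate(lanes)
    )


def to_mem(reg, lane_bytes, size=16):
    """Type-specific store: the inverse of to_reg for the same lane size."""
    mask = (1 << (8 * lane_bytes)) - 1
    return b"".join(
        ((reg >> (8 * lane_bytes * i)) & mask).to_bytes(lane_bytes, "big")
        for i in range(size // lane_bytes)
    )


def bitconvert(reg, src_lane_bytes, dst_lane_bytes):
    """Behaves as a store of the source type then a load of the new type."""
    return to_reg(to_mem(reg, src_lane_bytes), dst_lane_bytes)


mem = bytes.fromhex("00112233445566778899aabbccddeeff")
reg = to_reg(mem, 4)                           # loaded as <4 x i32>

kept = to_mem(bitconvert(reg, 4, 8), 8)        # bitconvert preserved
dropped = to_mem(reg, 8)                       # bitconvert removed as a "no-op"

assert kept == mem        # correct: memory round-trips
assert dropped != mem     # wrong: the two 32-bit halves of each i64 swap

# Merging a chain of bitconverts is still fine: A->B->C equals A->C.
via_v8i16 = bitconvert(bitconvert(reg, 4, 2), 2, 8)
assert via_v8i16 == bitconvert(reg, 4, 8)
```

So in this model a bitconvert may be merged with another one, but it cannot
simply be deleted when the source and destination lane sizes differ.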

Daniel's explanation of physical register mapping was excellent so I'm not
going to repeat that.

I apologise for the confusion and misinformation. This is quite a complex
topic and takes a bit of mind bending for me to understand, and it was a
long time ago.

James

On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <Daniel.Sanders at imgtec.com>
wrote:

> > Ok.  Then we need to change the LangRef as suggested.  Given this is a
> rather important semantic change, I think you need to send a top level RFC
> to the list.
>
>
>
> FWIW, I don't think this is a semantic change to LLVM-IR itself. I think
> it's more clearing up the misconception that LLVM-IR semantics also apply
> to SelectionDAG's operations. That said, I do think it's important to
> mention this in LangRef since it's very easy to make this mistake and very
> few targets need to worry about the distinction.
>
>
>
> To explain why I don't think this is a semantic change to LLVM-IR, let's
> consider this example from earlier:
>
>     %0 = load <4 x i32>, <4 x i32>* %x
>     %1 = bitcast <4 x i32> %0 to <2 x i64>
>     store <2 x i64> %1, <2 x i64>* %y
>
>
>
> In LLVM-IR terms, if the value of %0 is:
>
>     %0 = 0x00112233_44556677_8899aabb_ccddeeff
>
> then the value of %1 is:
>
>     %1 = 0x0011223344556677_8899aabbccddeeff
>
> which agrees with the store/load and the 'no bits change' statements in
> LangRef.
>
>
>
> However, the mapping of these bits to physical register bits is not
> consistent between types:
>
>     Physreg(%0) = 0xccddeeff_8899aabb_44556677_00112233
>
>     Physreg(%1) = 0x8899aabbccddeeff_0011223344556677
>
>
>
> Essentially, I'm saying that BitCastInst and ISD::BITCAST have slightly
> different semantics because of their different domains. The former is
> working on an abstract representation of the values where both statements
> in LangRef are true, but the latter is closer to the target where the 'no
> bits change' statement ceases to be true in some cases.
>
> > A couple of points that will need clarified:
> > - Does this only apply to vector types?  It definitely doesn't apply
> between pointer types today.  What about integer, floating point, and FCAs?
>
>
>
> I've only seen it for vector types so far but in theory it could happen
> for other types. I'd expect FCAs to encounter it since the physical
> registers may contain padding that isn't present in the LLVM-IR
> representation and the placement and amount of padding will depend on the
> exact FCA.
>
> I can think of cases where address space casts can encounter the same
> problem but that's already been covered in LangRef ("It can be a no-op cast
> or a complex value modification, depending on the target and the address
> space pair.").
>
>
>
> Does anyone use FCAs directly? Most targets seem to convert them to
> same-sized integers or bitcast an FCA* to i8*.
>
>
> > - Is combining two casts into one a legal operation?  I think it is so
> far, but we need to explicitly state that.
>
>
>
> Yes, A->B->C and A->C are equivalent.
>
>
> > - Do we have a predicate for identifying no-op casts that can be freely
> removed/combined?
>
>
>
> James mentioned one in CGP but I haven't been able to find it. I don't
> think it's necessary to have one at the LLVM-IR level but we do need one in
> the backends. I remember adding one to the backend but I can't find that
> either so I think I'm remembering one of my patches from before I split
> MSA's registers into type-specific classes.
>
>
> > - Is coercing a load to the type it's immediately bitcast to legal under
> this model?
>
>
>
> Yes.
>
>
>
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Philip Reames via llvm-dev
> Sent: 13 January 2016 20:31
> To: James Molloy; Hal Finkel
> Cc: llvm-dev
> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
> On 01/13/2016 12:20 PM, James Molloy wrote:
>
> >  (Right?)
>
>
>
> Uh no, the register content explicitly does change :( We insert REV
> instructions (byteswap) on each bitcast. Bitcasts can be merged and elided
> etc, but conceptually there's a register content change on every bitcast.
>
> Ok.  Then we need to change the LangRef as suggested.  Given this is a
> rather important semantic change, I think you need to send a top level RFC
> to the list.
>
> A couple of points that will need clarified:
> - Does this only apply to vector types?  It definitely doesn't apply
> between pointer types today.  What about integer, floating point, and FCAs?
> - Is combining two casts into one a legal operation?  I think it is so
> far, but we need to explicitly state that.
> - Do we have a predicate for identifying no-op casts that can be freely
> removed/combined?
> - Is coercing a load to the type it's immediately bitcast to legal under
> this model?
>
>
>
> James
>
>
>
> On Wed, 13 Jan 2016 at 18:09 Philip Reames <listmail at philipreames.com>
> wrote:
>
>
>
> On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev wrote:
> > ----- Original Message -----
> >> From: "James Molloy" <james at jamesmolloy.co.uk>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Quentin Colombet" <
> qcolombet at apple.com>
> >> Sent: Wednesday, January 13, 2016 9:54:26 AM
> >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
> >>
> >>
> >>> I think that teaching the optimizer about big-Endian lane ordering
> >>> would have been better.
> >>
> >> It's certainly arguable. Even in hindsight I'm glad we didn't -
> >> that's the approach GCC took and they've been fixing subtle bugs in
> >> their vectorizer ever since.
> >>
> >>
> >>> Inserting the REV after every LDR
> >>
> >> We only do this conceptually. In most cases REVs cancel out, and we
> >> have the LD1 instruction which is LDR+REV. With enough peepholes
> >> there's really no need for code to run slower.
> >>
> >>
> >>> Given what's been done, should we update the LangRef?
> >>
> >> Potentially, yes. I hadn't realised quite how strongly worded it was
> >> with respect to this.
> >>
> > Please do ;)
> I'm not sure changing bitcast is the right place.  Since the bitcast is
> representing the in-register value (which doesn't change), maybe we
> should define it as part of the load/store instead?  That's essentially
> what's going on; we're converting from a canonical register form to a
> variety of memory forms.  (Right?)
> >
> >   -Hal
> >
> >> James
> >>
> >>
> >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel < hfinkel at anl.gov > wrote:
> >>
> >> [resending so the message is smaller]
> >>
> >> From: "James Molloy via llvm-dev" < llvm-dev at lists.llvm.org >
> >> To: "Quentin Colombet" < qcolombet at apple.com >
> >> Cc: "llvm-dev" < llvm-dev at lists.llvm.org >
> >> Sent: Wednesday, January 13, 2016 2:35:32 AM
> >> Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
> >> instruction selection
> >>
> >> Hi Philip,
> >>
> >> store <2 x i64> %1, <2 x i64>* %y
> >>
> >> Yes. The memory pattern differs. (This is the first diagram on the
> >> right at: http://llvm.org/docs/BigEndianNEON.html#bitconverts )
> >>
> >>
> >> I think that teaching the optimizer about big-Endian lane ordering
> >> would have been better. Inserting the REV after every LDR sounds
> >> very similar to what we do for VSX on little-Endian PowerPC systems
> >> (PowerPC may have a slight advantage here in that we don't need to
> >> do insertelement / extractelement / shufflevector through memory on
> >> systems where little-Endian mode is relevant, see
> >>
> http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf
> >> ).
> >>
> >> Given what's been done, should we update the LangRef? It currently
> >> reads, "The 'bitcast' instruction converts value to type ty2. It
> >> is always a no-op cast because no bits change with this conversion.
> >> The conversion is done as if the value had been stored to memory and
> >> read back as type ty2." But this is now, at the least, misleading,
> >> because this process of storing the value as one type and reading it
> >> back in as another does, in fact, change the bits. We need to make
> >> clear that this might change the bits (perhaps specifically by
> >> calling out this case of vector bitcasts on big-Endian systems?).
> >>
> >>
> >>
> >> Also, regarding this, "Most operating systems however do not run
> >> with alignment faults enabled, so this is often not an issue." Are
> >> you saying that the processor does the correct thing in this case
> >> (if alignment faults are not enabled, then it performs a proper
> >> unaligned load), or that the operating-system trap handler emulates
> >> the unaligned load should one occur?
> >>
> >> Thanks again,
> >> Hal
> >>
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >>