[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Tue Jan 12 16:11:41 PST 2016

I think after reading your link I'm actually more confused.  This might 
just be a wording problem, but let me ask a couple of clarifying questions.

1) After compiling the code sequence below (from that page), does the 
in-memory bit pattern differ?  The page seemed to contradict itself.

%0 = load <4 x i32> %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
      store <2 x i64> %1, <2 x i64>* %y

2) If so, does this mean that performing dead-store-elimination is 
illegal for ARM?

3) Are loads and stores ever allowed to fault based on the in-memory 
representation?

4) What happens if we have a load of <2 x i64> following the store above 
and we DSE the store before forwarding its value?

Philip

On 01/12/2016 05:55 AM, James Molloy via llvm-dev wrote:
> Hi,
>
> > I found this thinking quite difficult to explain. Does it make sense?
>
> It might help to link to the documentation on why bitcasts are weird 
> on big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts
>
> Cheers,
>
> James
>
> On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hi,
>
>     I haven't found much time to look into the LLVM-IR-level
>     optimizations yet so I'm not sure how they handle bitcasts. With
>     that disclaimer in mind, I expect it's fine for the LLVM-IR-level
>     optimizations to handle them using either definition since they
>     are equivalent at the LLVM-IR level. My thinking is that LLVM-IR
>     is consistent about how virtual bits are assigned to types and
>     that no-op casts needing a non-zero number of instructions arise
>     only where there is an inconsistency.
>
>     At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto
>     bits 0-127 of <2 x i64> using the identity map. It's therefore ok
>     to interpret such bitcasts as zero-instruction no-ops. As far as I
>     can tell, LLVM-IR has been defined such that the identity map can
>     be used for bitcasts between all same-sized types, and also such
>     that bitcasting between different-sized types is invalid.
>
>     Similarly, most targets have a single mapping of virtual bit
>     numbers to physical bit numbers for each size that is applied
>     consistently when mapping a type to memory. For example 32-bits
>     map like so:
>
>     Little Endian Targets: virtual register bits
>     {0..7,8..15,16..23,24..31} map to physical memory bits
>     {0..7,8..15,16..23,24..31}
>
>     Big Endian Targets: virtual register bits
>     {0..7,8..15,16..23,24..31} map to physical memory bits
>     {24..31,16..23,8..15,0..7}
>
>     regardless of whether it's a float, or an i32. We therefore need
>     zero instructions to re-map physical memory bits for one type onto
>     another type.
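
A minimal sketch of the point above, assuming Python's struct module as a
stand-in for a target's 32-bit store (the helper name memory_bytes is made
up for illustration). A float and an i32 holding the same 32 virtual bits
produce the same bytes in memory under either endianness, so reinterpreting
one as the other needs no instructions:

```python
import struct


def memory_bytes(value, fmt, endian):
    """Return the bytes a 32-bit store would write (hypothetical helper).

    fmt is a struct code ("f" for float, "I" for i32); endian is "<" for
    little-endian or ">" for big-endian.
    """
    return struct.pack(endian + fmt, value)


ONE_AS_FLOAT = 1.0
ONE_AS_I32 = 0x3F800000  # IEEE-754 single-precision bit pattern of 1.0

for endian in ("<", ">"):
    as_float = memory_bytes(ONE_AS_FLOAT, "f", endian)
    as_int = memory_bytes(ONE_AS_I32, "I", endian)
    # Same virtual bits give the same bytes, whatever the type.
    assert as_float == as_int

# The two endiannesses differ only by a byte reversal, and that reversal is
# the same for both types.
little = memory_bytes(ONE_AS_I32, "I", "<")
big = memory_bytes(ONE_AS_I32, "I", ">")
assert big == little[::-1]
```
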
>
>     The same idea holds for physical register classes. There's a
>     single consistent mapping from physical memory bits to physical
>     register bits that applies for all types that can be stored in
>     that class. As long as this is the case the load/store and
>     zero-instruction interpretation of bitcasts are equivalent.
>
>     In the case of big-endian MSA and NEON, there isn't a single
>     consistent mapping from physical memory bits to physical register
>     bits so the equivalence in the two definitions breaks down:
>
>     i128: virtual register bits {0..31, 32..63, 64..95, 96..127} map
>     to physical memory bits {96..127, 64..95, 32..63, 0..31}
>
>     <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>     map to physical memory bits {0..31, 32..63, 64..95, 96..127}
>
>     <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>     map to physical memory bits {32..63, 0..31, 96..127, 64..95}
>
>     With these inconsistent mappings we require instructions to
>     bitcast between the types.
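
The three mappings above can be reproduced with a small model. This is only
a sketch under stated assumptions: lane 0 of a vector load lands in the
lowest register bits, each lane is read big-endian, and the helper names
load_big_endian and chunk_map are hypothetical. Each 32-bit chunk of memory
is tagged with its own index, so the loaded register shows which memory
chunk ended up in which 32-bit register chunk:

```python
def load_big_endian(memory, lane_bytes):
    """Model a typed 128-bit load on a big-endian target (hypothetical).

    Lanes are read from memory in order, each lane big-endian, and lane 0
    is placed in the lowest bits of the register.
    """
    reg = 0
    for lane, offset in enumerate(range(0, len(memory), lane_bytes)):
        value = int.from_bytes(memory[offset:offset + lane_bytes], "big")
        reg |= value << (lane * lane_bytes * 8)
    return reg


def chunk_map(lane_bytes):
    """For register chunks 0..3 (low to high), return the memory chunk held."""
    # Memory chunk i (bytes 4*i .. 4*i+3) contains the value i.
    memory = b"".join(i.to_bytes(4, "big") for i in range(4))
    reg = load_big_endian(memory, lane_bytes)
    return [(reg >> (32 * k)) & 0xFFFFFFFF for k in range(4)]


assert chunk_map(16) == [3, 2, 1, 0]  # i128
assert chunk_map(4) == [0, 1, 2, 3]   # <4 x i32>
assert chunk_map(8) == [1, 0, 3, 2]   # <2 x i64>
```

Because the three lists differ, moving a value from one type's register
layout to another's takes a real shuffle in this model, which is the
inconsistency described above.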
>
>     I found this thinking quite difficult to explain. Does it make sense?
>
>     > I am fine with treating bit casts as equivalent store/load pairs
>     > in GISel, I just want to be sure we do not have a semantic gap
>     > between the LLVM-IR and the backend if we do.
>
>     I think a gap would arise from not having a GISel equivalent to
>     ISD::BITCAST (gBITCAST?) available when it's necessary for
>     correctness. However, I agree that GISel should delete bitcasts
>     for the common case where the store/load and zero-instruction
>     definitions are equivalent.
>
>     From: Quentin Colombet [mailto:qcolombet at apple.com]
>     Sent: 11 January 2016 17:23
>     To: Daniel Sanders
>     Cc: Tim Northover (t.p.northover at gmail.com); llvm-dev
>     Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
>     instruction selection
>
>     Hi Daniel,
>
>     Thanks for the pointers, I wasn’t aware of the second thread
>     you’ve mentioned.
>
>     I may be wrong but I think LLVM-IR optimizations really treat
>     bitcasts as no-op casts, in the sense that no instructions are
>     required.
>
>     Is there anyone that could chime in on that?
>
>     However, it seems SelectionDAG sticks to the load/store semantics:
>
>     "BITCAST - This operator converts between integer, vector and FP
>     values, as if the value was *stored to memory with one type and
>     loaded from the same address with the other type* (or equivalently
>     for vector format conversions, etc)."
>
>     I am fine with treating bit casts as equivalent store/load pairs
>     in GISel, I just want to be sure we do not have a semantic gap
>     between the LLVM-IR and the backend if we do.
>
>     Thanks,
>
>     -Quentin
>
>         On Jan 11, 2016, at 7:43 AM, Daniel Sanders
>         <Daniel.Sanders at imgtec.com> wrote:
>
>         Hi,
>
>         It was a comment by Tim that first made me aware of it
>         (see http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html but
>         I think he commented on one of my patches before that).
>
>         I asked about it on llvm-dev a couple weeks later
>         (http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html)
>         highlighting the contradiction and was told that 'no-op cast'
>         referred to the lack of math rather than a requirement that
>         zero instructions are used. It's therefore my understanding
>         that shuffling the bits to preserve the load/store based
>         definition isn't considered to be changing the bits.
>
>         I think the main thing the current definition is unclear on is
>         whether it refers to the bits in a physical machine register
>         or the bits in the LLVM-IR virtual register. Most of the time
>         these two views are the same but this doesn't quite work for
>         big-endian MSA/NEON. For example:
>
>         %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
>
>         %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
>         are equivalent to each other in LLVM-IR terms but the
>         constants are physically laid out in MSA registers as:
>
>         0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2,
>         i32 3, i32 4>
>
>         0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32)
>         | 2, i64 (3 << 32) | 4>
>
>         and we must therefore shuffle the bits to preserve LLVM-IR's
>         point of view.
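
The two register images above can be checked with a short model. This is a
sketch, not MSA's actual definition: it assumes a big-endian store writes
the elements in order with each element big-endian, and that lane 0 sits in
the lowest bits of the register. The helper names store_vector, load_vector
and register_image are invented for illustration.

```python
import struct


def store_vector(elems, fmt):
    """Bytes written by a big-endian vector store (hypothetical helper)."""
    return b"".join(struct.pack(">" + fmt, e) for e in elems)


def load_vector(data, fmt):
    """Elements read back by a big-endian vector load (hypothetical helper)."""
    size = struct.calcsize(">" + fmt)
    return [struct.unpack(">" + fmt, data[i:i + size])[0]
            for i in range(0, len(data), size)]


def register_image(elems, lane_bits):
    """128-bit register contents with lane 0 in the lowest bits (assumed)."""
    reg = 0
    for lane, e in enumerate(elems):
        reg |= e << (lane * lane_bits)
    return reg


v4i32 = [1, 2, 3, 4]
# LangRef-style bitcast: store as <4 x i32>, read back as <2 x i64>.
v2i64 = load_vector(store_vector(v4i32, "I"), "Q")
assert v2i64 == [(1 << 32) | 2, (3 << 32) | 4]

before = register_image(v4i32, 32)
after = register_image(v2i64, 64)
assert before == 0x00000004000000030000000200000001
assert after == 0x00000003000000040000000100000002
assert before != after  # the register bits must be shuffled
```
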
>
>         From: Quentin Colombet [mailto:qcolombet at apple.com]
>         Sent: 07 January 2016 19:58
>         To: Daniel Sanders
>         Cc: llvm-dev
>         Subject: Re: [llvm-dev] [GlobalISel] A Proposal for global
>         instruction selection
>
>         Hi Daniel,
>
>         I had a quick look at the language reference for bitcast and I
>         have a different reading than what you were pointing out.
>
>         Indeed, my takeaway is:
>
>         "It is always a *no-op cast* because no bits change with
>         this conversion."
>
>         In other words, deleting all bitcast instructions should be fine.
>
>         My understanding of the quote you’ve highlighted is that it
>         tells C programmers that this is like a memcpy, not a cast :).
>
>         Cheers,
>
>         -Quentin
>
>             On Nov 20, 2015, at 6:53 AM, Daniel Sanders
>             <Daniel.Sanders at imgtec.com> wrote:
>
>             Hi,
>
>             I haven't had a chance to read all of this yet, but one
>             minor thing occurred to me during your presentation that I
>             want to mention. At one point you mentioned deleting all
>             the bitcast instructions since they're equivalent to nops
>             but this isn't always true.
>
>             The http://llvm.org/docs/LangRef.html definition of the
>             bitcast instruction includes this sentence:
>
>             The conversion is done as if the value had been stored to
>             memory and read back as type ty2.
>
>             For big-endian MSA, this is equivalent to a shuffling of
>             the bits in the register because endianness only changes
>             the byte order within each element. The order of the
>             elements is unaffected by endianness. IIRC, big-endian
>             NEON is the same way.
>
>             From: llvm-dev
>             [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf
>             Of Quentin Colombet via llvm-dev
>             Sent: 18 November 2015 19:27
>             To: llvm-dev
>             Subject: [llvm-dev] [GlobalISel] A Proposal for global
>             instruction selection
>
>             Hi,
>
>             With this email, I would like to kick off the development
>             of the next instruction selector that I described during
>             the last LLVM Dev’ Meeting.
>             For the motivations, see Jakob’s proposal
>             (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html)
>             and for the proposal, see the slides (Keynote:
>             http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co or
>             PDF:
>             http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
>             or the talk
>             (https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2).
>
>
>             TL;DR: This is happening now, feedback invited!
>
>             *** Context ***
>
>             During the last LLVM Dev’ Meeting, I presented a
>             proposal for the next instruction selector, GlobalISel.
>             The proposal is basically summarized in “High Level
>             Prototype Design” and “Roadmap”. (If you want further
>             details, feel free to reach out to me.)
>
>             The first step of the development plan is to prototype the
>             new framework in open source. The idea is to *start
>             prototyping now(!)* and have the discussion ongoing in
>             parallel. The reason for this approach is to have code that
>             can be used to inform those discussions, e.g., by
>             collecting data and trying different design approaches.
>             Regarding the discussion, I have listed a few points where
>             your feedback would be particularly appreciated (see
>             Feedback Invite).
>
>
>             Also, as I mentioned in my talk, some issues are
>             controversial but I expect them to be resolved during
>             prototype development. Specifically, these concern aspects
>             of legalization (should parts of it be done at the LLVM IR
>             level or all at the MI level?) and code reuse for the
>             instruction combiner. Please feel free to bring up your
>             specific concerns as I move along with the development plan.
>
>             I expect the design to evolve with our experimental
>             findings and your feedback and contributions.
>             Nonetheless, we expect to nail down some design decisions
>             once and for all as the prototype progresses. I have
>             highlighted them with the following pattern *[final]*.
>
>
>
>             *** Feedback Invite ***
>
>             If you follow and support this work you need to be aware
>             of three things and I am eager to hear your feedback and
>             thoughts about them: the overall goals of Global ISel, the
>             goals of the prototype, and the impact of the prototype
>             work on backend design.
>
>             In the section “Goals”, I defined (repeated for people
>             that saw the talk) the goals for the Global ISel design.
>             - Do you see anything missing?
>             - Do you see something that should not be there?
>
>             The prototype will answer critical design questions (see
>             “Design Questions the Prototype Addresses at the End of
>             M1” for examples) before the actual design of Global ISel
>             is finalized, but it cannot cover everything.
>             Specifically we will **not** look into improving TableGen
>             or reusing InstCombine (see “Proposed Approach” for the
>             rationale). Please let me know if you see any issue with that.
>
>             There is also basic groundwork needed to prepare for
>             Global ISel and I need to extend the core
>             MachineInstr-level APIs as explained during the talk. For
>             this, I prepared sketches of patches to illustrate them
>             and describe the details in the “Implications” section
>             below. Please have a look at the patches to get a better
>             idea of the expected impact.
>
>             If there is anything else you want to discuss related to
>             Global ISel, feel free to reach out to me. In particular,
>             several people expressed interest during the LLVM Dev
>             Meeting in contributing to the project. Let me know what
>             your area of interest is, so that we can coordinate our
>             efforts.
>             Anyhow, please add [GlobalISel] in the subject line to
>             help categorize the emails.
>
>
>
>             *** Goals ***
>
>             The high level goals of the new instruction selector are:
>             - Global instruction selector.
>             - Fast instruction selector.
>             - Shared code path for fast and good instruction selection.
>             - IR that represents ISA concepts better.
>             - More flexible instruction selector.
>             - Easier to maintain/understand framework, in particular
>             legalization.
>             - Self contained machine representation, no back links to
>             LLVM IR.
>             - No change to LLVM IR.
>
>             Note:  The goals are common to all targets. In particular,
>             we do not intend to work on target-specific features for
>             the prototype.
>             The bottom line is please make sure those goals are
>             compatible with what you want to achieve for your target,
>             even if your requirement does not get listed here.
>
>
>
>             *** Proposed Approach ***
>
>             In this section, I describe the approach I plan to pursue
>             in the prototype and the roadmap to get there. The final
>             design will flow out of it.
>
>             For this prototype, we purposely exclude any work to
>             improve or use TableGen or InstCombine *[final]*. We will
>             keep in mind, however, that some of the C++ code we write
>             will be table-generated at some point.
>             The rationale is that we do not want to lay down a new
>             TableGen/InstCombine infrastructure before being able to
>             work on the ISel framework itself.
>
>             The prototype vehicle will be *AArch64*. None of the
>             changes for GlobalISel will negatively impact the existing
>             ISel.
>
>
>             ** High Level Prototype Design **
>
>             As shown in the talk, the expected pipeline for the
>             prototype is:
>             *LLVM IR* -> IRTranslator -> *Generic (G) MachineInstr* ->
>             Legalizer -> RegBankSelect -> Select -> *MachineInstr*
>
>             Where:
>             - Terms in *bold* are intermediate representations.
>             -  Generic MachineInstrs are machine instructions with a
>             generic opcode, e.g., ADD, COPY.
>
>             - IRTranslator: Translate LLVM IR to (G) MachineInstr.
>             - Legalizer: Legalize illegal (G) MachineInstr to legal
>             (G) MachineInstr.
>             - RegBankSelect: Map each virtual register with a size to
>             a virtual register with a register bank.
>             - Select: Translate the remaining (G) MachineInstr to
>             MachineInstr.
>
>
>
>             ** Implications **
>
>             As part of the bring-up of the prototype, we need to
>             extend some of the core MachineInstr-level APIs:
>               - Need to remember FastMath flags for each MachineInstr.
>               - Need to know the type of each MachineInstr. We don’t
>             want ADD8, ADD16, etc.
>               - Extend the MachineRegisterInfo to support size as well
>             as register classes for virtual registers.
>
>             I have sketched the changes in the attached patches to
>             help picturing how the changes would impact the existing APIs.
>
>             Note: I do not intend to commit those changes as they are.
>             They will go through the usual review process in due time.
>
>
>             The patches contain “// ***”-like comments that give a
>             rough explanation of why those changes are needed w.r.t.
>             the goals.
>             The order of the patches could be modified since the
>             dependencies between them are not sequential. Anyhow,
>             here are the patches:
>             1. Introduce (some of) the generic opcodes.
>             2. Make MachineFunction more independent of LLVM IR to
>             eventually be able to delete the LLVM IR instance from the
>             memory.
>             3. Extend MachineInstr to represent additional information
>             attached to generic opcodes.
>             4. Teach MachineRegisterInfo about size for virtual registers.
>             5. Introduce a helper class to build MachineInstr related
>             objects.
>             6. Add new target hooks to lower the ABI directly to
>             MachineInstr.
>             7. Introduce the IRTranslator pass.
>
>
>             ** Roadmap for the Prototype **
>
>             We plan to split the prototype into three main milestones:
>             1. Translation: LLVM IR to (G) MachineInstr translation.
>             2. Basic selector: Legal LLVM IR to target-specific
>             MachineInstr.
>             3. Simple legalization: Support scalar type legalization
>             and some vector instructions.
>
>             Notes:
>             - For #1, we will not support any fancy instructions like
>             landing pad or switch.
>             - Each milestone should take about 3-4 months.
>             - At the end of #2, we would have a FastISel-like selector.
>
>             Each milestone will be detailed right before starting it.
>             The rationale is that we want to accommodate what we
>             discovered with the prototype for the next milestone. In
>             other words, in this email, *I only describe the first
>             milestone* in detail and I will give more details on the
>             next milestone shortly before we start it and so on. For
>             your information, here is the remainder of the intended
>             roadmap for the *full* project:
>             4. Productization: Clean up implementation, stabilize the
>             APIs.
>             5. Complex legalization: Extend legalization support to
>             everything missing.
>             6. Completeness: Fill the blanks, e.g., landing pad.
>             7. Clean-up and performance: Add the necessary bits to be
>             at parity with or beat SelectionDAG generated code.
>             8. Transition: Document how to switch, provide tools to help.
>
>
>             ** Milestone 1 **
>
>             The first phase is focused on the IRTranslator pass.
>
>             The IRTranslator is responsible for translating the LLVM
>             IR into Generic MachineInstr. The IRTranslator pass uses
>             some target hooks to perform the ABI lowering. We can
>             either define a new API for them, e.g., ABILoweringInfo,
>             or extend the existing TargetLowering.
>             Moreover, the prototype will focus on simple instructions,
>             i.e., we will not support switch or landing pad for this
>             iteration.
>
>             At the end of M1, the prototype will not be able to
>             produce code, since we would only have the beginning of
>             the Global ISel pipeline. Instead, we will test the
>             IRTranslator on the generic output that is produced from
>             the tested IR.
>
>             * Design Decisions *
>
>             - The IRTranslator is a final class. Its purpose is to
>             move away from LLVM IR to MachineInstr world *[final]*.
>             - Lower the ABI as part of the translation process *[final]*.
>
>             * Design Questions the Prototype Addresses at the End of M1 *
>
>             - Handling of aggregate types during the translation.
>             - Lowering of switches.
>             - What about Module pass for Machine pass?
>             - Introduce new APIs to have a clearer separation between:
>               - Legalization (setOperationAction, etc.)
>               - Cost/Combine related (isXXXFree, etc.)
>               - Lowering related (LowerFormal, etc.)
>             - What is the contract with the backends? Is it still
>             “should be able to select any valid LLVM IR”?
>
>             Thanks,
>
>             -Quentin
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
