[llvm-dev] [GlobalISel] A Proposal for global instruction selection

Tue Jan 12 08:53:54 PST 2016

i128  =>  <16 x i8>  =>  GEP 0
i128  =>  <2 x i64>  =>  GEP 0  =>  <8 x i8>   =>  GEP 0
i128  =>  <2 x i64>  =>  GEP 0  =>  <2 x i32>  =>  GEP 0 => <4 x i8>   =>
 GEP 0

They all reference the same memory object from the same base address. If
the result is loaded, the in-register contents will differ between them
though (because there's a special "load a vector of this type" instruction
(LD1)).
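
A minimal Python sketch of that point, assuming a simplified model of an
element-wise big-endian vector load (the helper name load_vector and the
"lane 0 in the least-significant bits" convention are illustrative
assumptions, not LLVM or NEON APIs):

```python
def load_vector(memory: bytes, elem_bits: int) -> int:
    """Model an element-wise big-endian vector load.

    Each element is read big-endian from consecutive addresses and placed
    in lane i of the register, with lane 0 in the least-significant bits.
    This is a simplified model, not a description of any real instruction.
    """
    step = elem_bits // 8
    reg = 0
    for lane in range(len(memory) // step):
        elem = int.from_bytes(memory[lane * step:(lane + 1) * step], "big")
        reg |= elem << (lane * elem_bits)
    return reg


# The same 16-byte memory object, viewed through three vector types.
memory = bytes(range(16))
as_v16i8 = load_vector(memory, 8)
as_v4i32 = load_vector(memory, 32)
as_v2i64 = load_vector(memory, 64)

# The byte at offset 0 is the same for every view of the object...
first_byte = memory[0]
# ...but the register images differ with the element size.
print(hex(as_v16i8))
print(hex(as_v4i32))
print(hex(as_v2i64))
```

In this model the memory object and its base address never change; only the
element size used by the load decides where each byte lands in the register.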

On Tue, 12 Jan 2016 at 16:46 Mehdi Amini <mehdi.amini at apple.com> wrote:

> What happens when you cascade bitcast?
> Are these sequences all equivalent at the IR level (i.e. do they reference
> the same byte from the original i128)?
>
> i128  =>  <16 x i8>  =>  GEP 0
> i128  =>  <2 x i64>  =>  GEP 0  =>  <8 x i8>   =>  GEP 0
> i128  =>  <2 x i64>  =>  GEP 0  =>  <2 x i32>  =>  GEP 0 => <4 x i8>   =>
>  GEP 0
>
>
> —
> Mehdi
>
>
>
> On Jan 12, 2016, at 6:37 AM, Daniel Sanders via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Thanks, I didn't know about that page. It's a much clearer explanation of
> why the backend chooses the code it does. However, there's a bit I'm trying
> to explain that isn't covered on that page. I'm trying to explain why the
> seemingly contradictory statements at
> http://llvm.org/docs/LangRef.html#bitcast-to-instruction don't actually
> contradict each other (even for big-endian NEON/MSA) while we're at the
> LLVM-IR level and why it's safe for LLVM-IR-level optimizations to use the
> zero-instruction definition despite the backend relying on the store/load
> definition. It boils down to both definitions being equivalent until we
> specialize to a target at which point the two definitions sometimes
> diverge. They diverge when the mapping of virtual bits to physical bits
> differs between LLVM-IR types.
>
>
> *From:* James Molloy [mailto:james at jamesmolloy.co.uk
> <james at jamesmolloy.co.uk>]
> *Sent:* 12 January 2016 13:56
> *To:* Daniel Sanders; Quentin Colombet
> *Cc:* llvm-dev
> *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
> Hi,
>
> > I found this thinking quite difficult to explain. Does it make sense?
> It might help to link to the documentation on why bitcasts are weird on
> big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts
>
> Cheers,
>
> James
>
> On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> I haven't found much time to look into the LLVM-IR-level optimizations yet
> so I'm not sure how they handle bitcasts. With that disclaimer in mind, I
> expect it's fine for the LLVM-IR level optimizations to handle them using
> either definition since they are equivalent at the LLVM-IR level. My
> thinking is that LLVM-IR is consistent about how virtual bits are assigned
> to types and that non-zero instruction nops arise when there is
> inconsistency.
>
> At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto bits 0-127
> of <2 x i64> using the identity map. It's therefore ok to interpret such
> bitcasts as zero-instruction no-ops. As far as I can tell, LLVM-IR has been
> defined such that the identity map can be used for bitcasts between all
> same-sized types, and also such that bitcasting between different-sized
> types is invalid.
>
> Similarly, most targets have a single mapping of virtual bit numbers to
> physical bit numbers for each size that is applied consistently when
> mapping a type to memory. For example 32-bits map like so:
> Little Endian Targets: virtual register bits {0..7,8..15,16..23,24..31}
> map to physical memory bits {0..7,8..15,16..23,24..31}
> Big Endian Targets: virtual register bits {0..7,8..15,16..23,24..31} map
> to physical memory bits {24..31,16..23,8..15,0..7}
> regardless of whether it's a float, or an i32. We therefore need zero
> instructions to re-map physical memory bits for one type onto another type.
>
> The same idea holds for physical register classes. There's a single
> consistent mapping from physical memory bits to physical register bits that
> applies for all types that can be stored in that class. As long as this is
> the case the load/store and zero-instruction interpretation of bitcasts are
> equivalent.
> In the case of big-endian MSA and NEON, there isn't a single consistent
> mapping from physical memory bits to physical register bits so the
> equivalence in the two definitions breaks down:
>   i128:      virtual register bits {0..31, 32..63, 64..95, 96..127}
>              map to physical memory bits {96..127, 64..95, 32..63, 0..31}
>   <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>              map to physical memory bits {0..31, 32..63, 64..95, 96..127}
>   <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
>              map to physical memory bits {32..63, 0..31, 96..127, 64..95}
> with these inconsistent mappings we require instructions to bitcast
> between the types.
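
One way to read the three mappings above is sketched below in Python. It
assumes element 0 sits in the lowest register bits and that each element is
stored big-endian; the function name chunk_map and the 32-bit chunk
granularity are illustrative choices, not anything defined by LLVM.

```python
def chunk_map(elem_bits: int, total_bits: int = 128, chunk_bits: int = 32):
    """Return, for each 32-bit register chunk (lowest bits first), the index
    of the 32-bit memory chunk (lowest address first) it is stored to.

    Assumed model: element 0 occupies the lowest register bits, and within
    each element the most-significant chunk goes to the lowest address.
    """
    chunks_per_elem = elem_bits // chunk_bits
    mapping = []
    for reg_chunk in range(total_bits // chunk_bits):
        elem = reg_chunk // chunks_per_elem
        within = reg_chunk % chunks_per_elem  # 0 = least-significant chunk
        mem_chunk = elem * chunks_per_elem + (chunks_per_elem - 1 - within)
        mapping.append(mem_chunk)
    return mapping


i128_map = chunk_map(128)   # one 128-bit element
v4i32_map = chunk_map(32)   # four 32-bit elements
v2i64_map = chunk_map(64)   # two 64-bit elements
print(i128_map, v4i32_map, v2i64_map)
```

Multiplying each index by 32 gives the starting memory bit of each chunk, and
under these assumptions the three lists differ, which is why a bitcast
between these types cannot be a pure renaming of the register.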
>
> I found this thinking quite difficult to explain. Does it make sense?
>
> > I am fine with treating bit casts as equivalent store/load pairs in
> GISel, I just want to be sure we do not have a semantic gap between the
> LLVM-IR and the backend if we do.
>
> I think a gap would arise from not having a GISel equivalent to
> ISD::BITCAST (gBITCAST?) available when it's necessary for correctness.
> However, I agree that GISel should delete bitcasts for the common case
> where the store/load and zero-instruction definitions are equivalent.
>
> *From:* Quentin Colombet [mailto:qcolombet at apple.com]
> *Sent:* 11 January 2016 17:23
> *To:* Daniel Sanders
> *Cc:* Tim Northover (t.p.northover at gmail.com); llvm-dev
>
> *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
> Hi Daniel,
>
> Thanks for the pointers, I wasn’t aware of the second thread you’ve
> mentioned.
>
> I may be wrong but I think LLVM-IR optimizations really treat bitcasts as
> no-op casts, in the sense that no instructions are required.
>
> Is there anyone that could chime in on that?
>
> However, it seems SelectionDAG sticks to the load/store semantic:
> "BITCAST - This operator converts between integer, vector and FP values,
> as if the value was *stored to memory with one type and loaded from the
> same address with the other type* (or equivalently for vector format
> conversions, etc)."
>
> I am fine with treating bit casts as equivalent store/load pairs in GISel,
> I just want to be sure we do not have a semantic gap between the LLVM-IR
> and the backend if we do.
>
> Thanks,
> -Quentin
>
>
> On Jan 11, 2016, at 7:43 AM, Daniel Sanders <Daniel.Sanders at imgtec.com>
> wrote:
>
> Hi,
>
> It was a comment by Tim that first made me aware of it (see
> http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html but I
> think he commented on one of my patches before that).
>
> I asked about it on llvm-dev a couple weeks later (
> http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html)
> highlighting the contradiction and was told that 'no-op cast' referred to
> the lack of math rather than a requirement that zero instructions are used.
> It's therefore my understanding that shuffling the bits to preserve the
> load/store based definition isn't considered to be changing the bits.
>
> I think the main thing the current definition is unclear on is whether it
> refers to the bits in a physical machine register or the bits in the
> LLVM-IR virtual register. Most of the time these two views are the same but
> this doesn't quite work for big-endian MSA/NEON. For example:
> %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
> %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
> are equivalent to each other in LLVM-IR terms but the constants are
> physically laid out in MSA registers as:
> 0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
> 0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3
> << 32) | 4>
> and we must therefore shuffle the bits to preserve LLVM-IR's point of view.
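
The two register images above can be reproduced with a small Python sketch.
It assumes lane 0 occupies the least-significant register bits and that each
element is stored big-endian; register_image and store_big_endian are
illustrative helper names, not LLVM or MSA APIs.

```python
def register_image(elements, elem_bits):
    """Pack elements into one integer, lane 0 in the least-significant bits."""
    mask = (1 << elem_bits) - 1
    value = 0
    for lane, elem in enumerate(elements):
        value |= (elem & mask) << (lane * elem_bits)
    return value


def store_big_endian(elements, elem_bits):
    """Store each element big-endian at consecutive addresses."""
    return b"".join(e.to_bytes(elem_bits // 8, "big") for e in elements)


v4i32 = [1, 2, 3, 4]
v2i64 = [(1 << 32) | 2, (3 << 32) | 4]

reg_v4i32 = register_image(v4i32, 32)
reg_v2i64 = register_image(v2i64, 64)
mem_v4i32 = store_big_endian(v4i32, 32)
mem_v2i64 = store_big_endian(v2i64, 64)

print(f"{reg_v4i32:#034x}")  # register image of the <4 x i32> constant
print(f"{reg_v2i64:#034x}")  # register image of the <2 x i64> constant
print(mem_v4i32 == mem_v2i64)  # memory images compared byte for byte
```

Under these assumptions the memory images are identical while the register
images differ, so preserving the store/load definition needs a shuffle in the
register.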
>
> *From:* Quentin Colombet [mailto:qcolombet at apple.com <qcolombet at apple.com>
> ]
> *Sent:* 07 January 2016 19:58
> *To:* Daniel Sanders
> *Cc:* llvm-dev
> *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
> Hi Daniel,
>
> I had a quick look at the language reference for bitcast and I have a
> different reading than what you were pointing out.
> Indeed, my take away is:
> "It is *always a **no-op cast* because no bits change with this
> conversion."
>
> In other words, deleting all bitcast instructions should be fine.
>
> My understanding of the quote you’ve highlighted is that it tells C
> programmers that this is like a memcpy, not a cast :).
>
> Cheers,
> -Quentin
>
> On Nov 20, 2015, at 6:53 AM, Daniel Sanders <Daniel.Sanders at imgtec.com>
> wrote:
>
> Hi,
>
> I haven't had chance to read all of this yet, but one minor thing occurred
> to me during your presentation that I want to mention. At one point you
> mentioned deleting all the bitcast instructions since they're equivalent to
> nops but this isn't always true.
>
> The http://llvm.org/docs/LangRef.html definition of the bitcast
> instruction includes this sentence:
> The conversion is done as if the value had been stored to memory and read
> back as type ty2.
> For big-endian MSA, this is equivalent to a shuffling of the bits in the
> register because endianness only changes the byte order within each
> element. The order of the elements is unaffected by endianness. IIRC,
> big-endian NEON is the same way.
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
> <llvm-dev-bounces at lists.llvm.org>] *On Behalf Of *Quentin Colombet via
> llvm-dev
> *Sent:* 18 November 2015 19:27
> *To:* llvm-dev
> *Subject:* [llvm-dev] [GlobalISel] A Proposal for global instruction
> selection
>
> Hi,
>
> With this email, I would like to kick-off the development for the next
> instruction selector that I described during the last LLVM Dev’ Meeting.
> For the motivations, see Jakob’s proposal (
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html) and
> for the proposal, see the slides (Keynote:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co or
> PDF:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
> or the talk (
> https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2
> ).
>
> TL;DR This is happening now, feedback invited!
>
> *** Context ***
>
> During the last LLVM Dev’ Meeting, I presented a proposal for the
> next instruction selector, GlobalISel. The proposal is basically summarized
> in “High Level Prototype Design” and “Roadmap”. (If you want further
> details, feel free to contact me.)
>
> The first step of the development plan is to prototype the new framework
> in open source. The idea is to *start prototyping now(!)* and have the
> discussion ongoing in parallel. The reason for this approach is to have
> code that can be used to inform those discussions, e.g., by collecting data
> and trying different design approaches. Regarding the discussion, I have
> listed a few points where your feedback would be particularly appreciated
> (see Feedback Invite).
>
> Also, as I mentioned in my talk, some issues are controversial but I
> expect them to be resolved during prototype development. Specifically,
> these concern aspects of legalization (should parts of it be done at the
> LLVM IR level or all at the MI level?) and code re-use for the instruction
> combiner. Please feel free to bring up your specific concerns as I move
> along with the development plan.
>
> I expect the design to evolve with our experimental findings and your
> feedback and contributions.
> Nonetheless, we expect to nail down some design decisions once and for all
> as the prototype progresses. I have highlighted them with the following
> pattern *[final]*.
>
>
>
> *** Feedback Invite ***
>
> If you follow and support this work you need to be aware of three things
> and I am eager to hear your feedback and thoughts about them: the overall
> goals of Global ISel, the goals of the prototype, and the impact of the
> prototype work on backend design.
>
> In the section “Goals”, I defined (repeated for people who saw the talk)
> the goals for the Global ISel design.
> - Do you see anything missing?
> - Do you see something that should not be there?
>
> The prototype will answer critical design questions (see “Design Questions
> the Prototype Addresses at the End of M1” for examples) before the actual
> design of Global ISel is finalized, but it cannot cover everything.
> Specifically we will **not** look into improving TableGen or reusing
> InstCombine (see “Proposed Approach” for the rationale). Please let me know
> if you see any issue with that.
>
> There is also basic ground work needed to prepare for Global ISel and I
> need to extend the core MachineInstr-level APIs as explained during the
> talk. For this, I prepared sketches of patches to illustrate them and
> describe the details in the “Implications” section below. Please have a
> look at the patches to have a better idea of the expected impact.
>
> If there is anything else you want to discuss related to Global ISel, feel
> free to contact me. In particular, several people expressed interest
> during the LLVM Dev Meeting in contributing to the project. Let me know
> what your area of interest is, so that we can coordinate our efforts.
> In any case, please add [GlobalISel] in the subject line to help categorize
> the emails.
>
>
>
> *** Goals ***
>
> The high level goals of the new instruction selector are:
> - Global instruction selector.
> - Fast instruction selector.
> - Shared code path for fast and good instruction selection.
> - IR that represents ISA concepts better.
> - More flexible instruction selector.
> - Easier to maintain/understand framework, in particular legalization.
> - Self contained machine representation, no back links to LLVM IR.
> - No change to LLVM IR.
>
> Note: The goals are common to all targets. In particular, we do not
> intend to work on target-specific features for the prototype.
> The bottom line is: please make sure those goals are compatible with what
> you want to achieve for your target, even if your requirement is not
> listed here.
>
>
>
> *** Proposed Approach ***
>
> In this section, I describe the approach I plan to pursue in the prototype
> and the roadmap to get there. The final design will flow out of it.
>
> For this prototype, we purposely exclude any work to improve or use
> TableGen or InstCombine *[final].* We will keep in mind however, that
> some of the C++ code we write will be table-generated at some point.
> The rationale is that we do not want to lay down a new TableGen/InstCombine
> infrastructure before being able to work on the ISel framework itself.
>
> The prototype vehicle will be *AArch64*. None of the changes for
> GlobalISel will negatively impact the existing ISel.
>
>
> ** High Level Prototype Design **
>
> As shown in the talk, the expected pipeline for the prototype is:
> *LLVM IR *-> IRTranslator -> *Generic (G) MachineInstr* -> Legalizer ->
> RegBankSelect -> Select -> *MachineInstr*
>
> Where:
> - Terms in *bold* are intermediate representations.
> - Generic MachineInstrs are machine instructions with a generic opcode,
> e.g., ADD, COPY.
> - IRTranslator: Translate LLVM IR to (G) MachineInstr.
> - Legalizer: Legalize illegal (G) MachineInstr to legal (G) MachineInstr.
> - RegBankSelect: Assign virtual registers with a size to virtual registers
> with a Register Bank.
> - Select: Translate the remaining (G) MachineInstr to MachineInstr.
>
>
>
> ** Implications **
>
> As part of the bring-up of the prototype, we need to extend some of the
> core MachineInstr-level APIs:
>   - Need to remember FastMath flags for each MachineInstr.
>   - Need to know the type of each MachineInstr. We don’t want ADD8, ADD16,
> etc.
>   - Extend the MachineRegisterInfo to support size as well as register
> classes for virtual registers.
>
> I have sketched the changes in the attached patches to help picturing how
> the changes would impact the existing APIs.
>
> Note: I do not intend to commit those changes as they are. They will go
> through the usual review process in due time.
>
> The patches contain “// ***”-like comments that give a rough explanation of
> why those changes are needed w.r.t. the goals.
> The order of the patches could be modified since the dependencies between
> those are not sequential. Anyhow, here are the patches:
> 1. Introduce (some of) the generic opcodes.
> 2. Make MachineFunction more independent of LLVM IR to eventually be able
> to delete the LLVM IR instance from the memory.
> 3. Extend MachineInstr to represent additional information attached to
> generic opcodes.
> 4. Teach MachineRegisterInfo about size for virtual registers.
> 5. Introduce a helper class to build MachineInstr related objects.
> 6. Add new target hooks to lower the ABI directly to MachineInstr.
> 7. Introduce the IRTranslator pass.
>
>
> ** Roadmap for the Prototype **
>
> We plan to split the prototype in three main milestones:
> 1. Translation: LLVM IR to (G) MachineInstr translation.
> 2. Basic selector: Legal LLVM IR to target specific MachineInstr.
> 3. Simple legalization: Support scalar type legalization and some vector
> instructions.
>
> Notes:
> - For #1, we will not support any fancy instructions like landing pad or
> switch.
> - Each milestone should take about 3-4 months.
> - At the end of #2, we would have a FastISel like selector.
>
> Each milestone will be detailed right before starting it. The rationale is
> that we want to accommodate what we discovered with the prototype for the
> next milestone. In other words, in this email, *I only describe the first
> milestone* in detail and I will give more details on the next milestone
> shortly before we start it and so on. For your information, here is the
> remainder of the intended roadmap for the *full* project:
> 4. Productization: Clean up implementation, stabilize the APIs.
> 5. Complex legalization: Extend legalization support to everything missing.
> 6. Completeness: Fill the blanks, e.g., landing pad.
> 7. Clean-up and performance: Add the necessary bits to be at parity or
> beat SelectionDAG generated code.
> 8. Transition: Document how to switch, provide tools to help.
>
>
> ** Milestone 1 **
>
> The first phase is focused on the IRTranslator pass.
>
> The IRTranslator is responsible for translating the LLVM IR into Generic
> MachineInstr. The IRTranslator pass uses some target hooks to perform the
> ABI lowering. We can either define a new API for them, e.g.,
> ABILoweringInfo, or extend the existing TargetLowering.
> Moreover, the prototype will focus on simple instructions, i.e., we will
> not support switch or landing pad for this iteration.
>
> At the end of M1, the prototype will not be able to produce code, since we
> would only have the beginning of the Global ISel pipeline. Instead, we will
> test the IRTranslator on the generic output that is produced from the
> tested IR.
>
> * Design Decisions *
>
> - The IRTranslator is a final class. Its purpose is to move away from LLVM
> IR to MachineInstr world *[final]*.
> - Lower the ABI as part of the translation process *[final]*.
>
> * Design Questions the Prototype Addresses at the End of M1 *
>
> - Handling of aggregate types during the translation.
> - Lowering of switches.
> - What about Module pass for Machine pass?
> - Introduce new APIs to have a clearer separation between:
>   - Legalization (setOperationAction, etc.)
>   - Cost/Combine related (isXXXFree, etc.)
>   - Lowering related (LowerFormal, etc.)
> - What is the contract with the backends? Is it still “should be able to
> select any valid LLVM IR”?
>
> Thanks,
> -Quentin
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev