[llvm-dev] [GlobalISel] A Proposal for global instruction selection
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue Jan 12 16:11:41 PST 2016
I think after reading your link I'm actually more confused. This might
just be a wording problem, but let me ask a couple of clarifying questions.
1) After compiling the code sequence below (from that page), does the
in-memory bit pattern differ? The page seemed to contradict itself.
%0 = load <4 x i32>, <4 x i32>* %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
store <2 x i64> %1, <2 x i64>* %y
2) If so, does this mean that performing dead-store-elimination is
illegal for ARM?
3) Are loads and stores ever allowed to fault based on the in-memory
representation?
4) What happens if we have a load of <2 x i64> following the store above
and we do DSE the store before forwarding its value?
Philip
On 01/12/2016 05:55 AM, James Molloy via llvm-dev wrote:
> Hi,
>
> > I found this thinking quite difficult to explain. Does it make sense?
>
> It might help to link to the documentation on why bitcasts are weird
> on big-endian NEON: http://llvm.org/docs/BigEndianNEON.html#bitconverts
>
> Cheers,
>
> James
>
> On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
> Hi,
>
> I haven't found much time to look into the LLVM-IR-level
> optimizations yet, so I'm not sure how they handle bitcasts. With
> that disclaimer in mind, I expect it's fine for the LLVM-IR-level
> optimizations to handle them using either definition, since the two
> are equivalent at the LLVM-IR level. My thinking is that LLVM-IR
> is consistent about how virtual bits are assigned to types, and
> that no-ops needing a non-zero number of instructions only arise
> when there is an inconsistency.
>
> At the LLVM-IR level, bits 0-127 of <4 x i32> map directly onto
> bits 0-127 of <2 x i64> using the identity map. It's therefore ok
> to interpret such bitcasts as zero-instruction no-ops. As far as I
> can tell, LLVM-IR has been defined such that the identity map can
> be used for bitcasts between all same-sized types, and also such
> that bitcasting between different-sized types is invalid.
>
> Similarly, most targets have a single mapping of virtual bit
> numbers to physical bit numbers for each size that is applied
> consistently when mapping a type to memory. For example, 32 bits
> map like so:
>
> Little Endian Targets: virtual register bits
> {0..7,8..15,16..23,24..31} map to physical memory bits
> {0..7,8..15,16..23,24..31}
>
> Big Endian Targets: virtual register bits
> {0..7,8..15,16..23,24..31} map to physical memory bits
> {24..31,16..23,8..15,0..7}
>
> regardless of whether it's a float or an i32. We therefore need
> zero instructions to re-map physical memory bits for one type onto
> another type.
>
> The same idea holds for physical register classes. There's a
> single consistent mapping from physical memory bits to physical
> register bits that applies for all types that can be stored in
> that class. As long as this is the case the load/store and
> zero-instruction interpretation of bitcasts are equivalent.
>
> In the case of big-endian MSA and NEON, there isn't a single
> consistent mapping from physical memory bits to physical register
> bits so the equivalence in the two definitions breaks down:
>
> i128: virtual register bits {0..31, 32..63, 64..95, 96..127} map
> to physical memory bits {96..127, 64..95, 32..63, 0..31}
>
> <4 x i32>: virtual register bits {0..31, 32..63, 64..95, 96..127}
> map to physical memory bits {0..31, 32..63, 64..95, 96..127}
>
> <2 x i64>: virtual register bits {0..31, 32..63, 64..95, 96..127}
> map to physical memory bits {32..63, 0..31, 96..127, 64..95}
>
> With these inconsistent mappings, we require instructions to
> bitcast between the types.
>
> I found this thinking quite difficult to explain. Does it make sense?
>
> > I am fine with treating bit casts as equivalent store/load pairs
> > in GISel, I just want to be sure we do not have a semantic gap
> > between the LLVM-IR and the backend if we do.
>
> I think a gap would arise from not having a GISel equivalent to
> ISD::BITCAST (gBITCAST?) available when it's necessary for
> correctness. However, I agree that GISel should delete bitcasts
> for the common case where the store/load and zero-instruction
> definitions are equivalent.
>
> *From:*Quentin Colombet [mailto:qcolombet at apple.com
> <mailto:qcolombet at apple.com>]
> *Sent:* 11 January 2016 17:23
> *To:* Daniel Sanders
> *Cc:* Tim Northover (t.p.northover at gmail.com
> <mailto:t.p.northover at gmail.com>); llvm-dev
>
>
> *Subject:* Re: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
>
> Hi Daniel,
>
> Thanks for the pointers, I wasn’t aware of the second thread
> you’ve mentioned.
>
> I may be wrong, but I think LLVM-IR optimizations really treat
> bitcasts as no-op casts, in the sense that no instructions are
> required.
>
> Is there anyone that could chime in on that?
>
> However, it seems SelectionDAG sticks to the load/store semantics:
>
> "BITCAST - This operator converts between integer, vector and FP
> values, as if the value was *stored to memory with one type and
> loaded from the same address with the other type* (or equivalently
> for vector format conversions, etc)."
>
> I am fine with treating bit casts as equivalent store/load pairs
> in GISel, I just want to be sure we do not have a semantic gap
> between the LLVM-IR and the backend if we do.
>
> Thanks,
>
> -Quentin
>
> On Jan 11, 2016, at 7:43 AM, Daniel Sanders
> <Daniel.Sanders at imgtec.com <mailto:Daniel.Sanders at imgtec.com>>
> wrote:
>
> Hi,
>
> It was a comment by Tim that first made me aware of it
> (see http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html, but
> I think he commented on one of my patches before that).
>
> I asked about it on llvm-dev a couple of weeks later
> (http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html)
> highlighting the contradiction and was told that 'no-op cast'
> referred to the lack of math rather than a requirement that
> zero instructions are used. It's therefore my understanding
> that shuffling the bits to preserve the load/store based
> definition isn't considered to be changing the bits.
>
> I think the main thing the current definition is unclear on is
> whether it refers to the bits in a physical machine register
> or the bits in the LLVM-IR virtual register. Most of the time
> these two views are the same but this doesn't quite work for
> big-endian MSA/NEON. For example:
>
> %0 = bitcast <4 x i32> <i32 1, i32 2, i32 3, i32 4> to <2 x i64>
>
> %0 = <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
> are equivalent to each other in LLVM-IR terms but the
> constants are physically laid out in MSA registers as:
>
> 0x00000004000000030000000200000001 # <4 x i32> <i32 1, i32 2, i32 3, i32 4>
>
> 0x00000003000000040000000100000002 # <2 x i64> <i64 (1 << 32) | 2, i64 (3 << 32) | 4>
>
> and we must therefore shuffle the bits to preserve LLVM-IR's
> point of view.
>
> *From:*Quentin Colombet [mailto:qcolombet at apple.com]
> *Sent:*07 January 2016 19:58
> *To:*Daniel Sanders
> *Cc:*llvm-dev
> *Subject:*Re: [llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
>
> Hi Daniel,
>
> I had a quick look at the language reference for bitcast and I
> have a different reading than what you were pointing out.
>
> Indeed, my take away is:
>
> "It is *always a no-op cast* because no bits change with
> this conversion."
>
> In other words, deleting all bitcast instructions should be fine.
>
> My understanding of the quote you’ve highlighted is that it
> tells C programmers that this is like a memcpy, not a cast :).
>
> Cheers,
>
> -Quentin
>
> On Nov 20, 2015, at 6:53 AM, Daniel Sanders
> <Daniel.Sanders at imgtec.com
> <mailto:Daniel.Sanders at imgtec.com>> wrote:
>
> Hi,
>
> I haven't had a chance to read all of this yet, but one
> minor thing occurred to me during your presentation that I
> want to mention. At one point you mentioned deleting all
> the bitcast instructions since they're equivalent to nops,
> but this isn't always true.
>
> The http://llvm.org/docs/LangRef.html definition of the
> bitcast instruction includes this sentence:
>
> The conversion is done as if the value had been stored to
> memory and read back as type ty2.
>
> For big-endian MSA, this is equivalent to a shuffling of
> the bits in the register because endianness only changes
> the byte order within each element. The order of the
> elements is unaffected by endianness. IIRC, big-endian
> NEON is the same way.
>
> *From:* llvm-dev
> [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf
> Of* Quentin Colombet via llvm-dev
> *Sent:*18 November 2015 19:27
> *To:*llvm-dev
> *Subject:*[llvm-dev] [GlobalISel] A Proposal for global
> instruction selection
>
> Hi,
>
> With this email, I would like to kick off the development
> of the next instruction selector that I described during
> the last LLVM Dev Meeting.
> For the motivations, see Jakob’s proposal
> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html)
> and for the proposal, see the slides (Keynote:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co or
> PDF:
> http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co)
> or the talk
> (https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2).
>
>
> TL;DR: This is happening now; feedback invited!
>
> *** Context ***
>
> During the last LLVM Dev Meeting, I presented a
> proposal for the next instruction selector, GlobalISel.
> The proposal is basically summarized in “High Level
> Prototype Design” and “Roadmap”. (If you want further
> details, feel free to reach out to me.)
>
> The first step of the development plan is to prototype the
> new framework in open source. The idea is to *start
> prototyping now(!)* and have the discussion ongoing in
> parallel. The reason for this approach is to have code that
> can be used to inform those discussions, e.g., by
> collecting data and trying different design approaches.
> Regarding the discussion, I have listed a few points where
> your feedback would be particularly appreciated (see
> Feedback Invite).
>
>
> Also, as I mentioned in my talk, some issues are
> controversial, but I expect them to be resolved during
> prototype development. Specifically, these concern aspects
> of legalization (should parts of it be done at the LLVM IR
> level or all at the MI level?) and code reuse for the
> instruction combiner. Please feel free to bring up your
> specific concerns as I move along with the development plan.
>
> I expect the design to evolve with our experimental
> findings and your feedback and contributions.
> Nonetheless, we expect to nail down some design decisions
> once and for all as the prototype progresses. I have
> highlighted them with the following pattern: *[final]*.
>
>
>
> *** Feedback Invite ***
>
> If you follow and support this work, you need to be aware
> of three things, and I am eager to hear your feedback and
> thoughts about them: the overall goals of Global ISel, the
> goals of the prototype, and the impact of the prototype
> work on backend design.
>
> In the section “Goals”, I define (repeating them for people
> who saw the talk) the goals for the Global ISel design.
> - Do you see anything missing?
> - Do you see something that should not be there?
>
> The prototype will answer critical design questions (see
> “Design Questions the Prototype Addresses at the End of
> M1” for examples) before the actual design of Global ISel
> is finalized, but it cannot cover everything.
> Specifically, we will **not** look into improving TableGen
> or reusing InstCombine (see “Proposed Approach” for the
> rationale). Please let me know if you see any issue with that.
>
> There is also basic groundwork needed to prepare for
> Global ISel: I need to extend the core
> MachineInstr-level APIs as explained during the talk. For
> this, I prepared sketches of patches to illustrate them
> and describe the details in the “Implications” section
> below. Please have a look at the patches to get a better
> idea of the expected impact.
>
> If there is anything else you want to discuss related to
> Global ISel, feel free to reach out to me. In particular, several
> people expressed their interest during the LLVM Dev
> Meeting in contributing to the project. Let me know what
> your area of interest is, so that we can coordinate our
> efforts.
> Anyhow, please add [GlobalISel] to the subject line to
> help categorize the emails.
>
>
>
> *** Goals ***
>
> The high level goals of the new instruction selector are:
> - Global instruction selector.
> - Fast instruction selector.
> - Shared code path for fast and good instruction selection.
> - IR that represents ISA concepts better.
> - More flexible instruction selector.
> - Easier to maintain/understand framework, in particular
> legalization.
> - Self-contained machine representation, no back links to
> LLVM IR.
> - No change to LLVM IR.
>
> Note: The goals are common to all targets. In particular,
> we do not intend to work on target-specific features for
> the prototype.
> The bottom line is: please make sure those goals are
> compatible with what you want to achieve for your target,
> even if your requirement is not listed here.
>
>
>
> *** Proposed Approach ***
>
> In this section, I describe the approach I plan to pursue
> in the prototype and the roadmap to get there. The final
> design will flow out of it.
>
> For this prototype, we purposely exclude any work to
> improve or use TableGen or InstCombine *[final].* We will
> keep in mind, however, that some of the C++ code we write
> will be table-generated at some point.
> The rationale is that we do not want to lay down a new
> TableGen/InstCombine infrastructure before being able to
> work on the ISel framework itself.
>
> The prototype vehicle will be *AArch64*. None of the
> changes for GlobalISel will negatively impact the existing
> ISel.
>
>
> ** High Level Prototype Design **
>
> As shown in the talk, the expected pipeline for the
> prototype is:
> *LLVM IR* -> IRTranslator -> *Generic (G) MachineInstr* ->
> Legalizer -> RegBankSelect -> Select -> *MachineInstr*
>
> Where:
> - Terms in *bold* are intermediate representations.
> - Generic MachineInstrs are machine instructions with a
> generic opcode, e.g., ADD, COPY.
>
> - IRTranslator: Translate LLVM IR to (G) MachineInstr.
> - Legalizer: Legalize illegal (G) MachineInstr to legal
> (G) MachineInstr.
> - RegBankSelect: Map virtual registers that have a size to
> virtual registers with a register bank.
> - Select: Translate the remaining (G) MachineInstr to
> MachineInstr.
>
>
>
> ** Implications **
>
> As part of the bring-up of the prototype, we need to
> extend some of the core MachineInstr-level APIs:
> - Need to remember FastMath flags for each MachineInstr.
> - Need to know the type of each MachineInstr. We don’t
> want ADD8, ADD16, etc.
> - Extend the MachineRegisterInfo to support size as well
> as register classes for virtual registers.
>
> I have sketched the changes in the attached patches to
> help picture how the changes would impact the existing APIs.
>
> Note: I do not intend to commit those changes as they are.
> They will go through the usual review process in due time.
>
>
> The patches contain “// ***”-like comments that give a
> rough explanation of why those changes are needed w.r.t.
> the goals.
> The order of the patches could be modified since the
> dependencies between them are not sequential. Anyhow,
> here are the patches:
> 1. Introduce (some of) the generic opcodes.
> 2. Make MachineFunction more independent of LLVM IR to
> eventually be able to delete the LLVM IR instance from the
> memory.
> 3. Extend MachineInstr to represent additional information
> attached to generic opcodes.
> 4. Teach MachineRegisterInfo about size for virtual registers.
> 5. Introduce a helper class to build MachineInstr related
> objects.
> 6. Add new target hooks to lower the ABI directly to
> MachineInstr.
> 7. Introduce the IRTranslator pass.
>
>
> ** Roadmap for the Prototype **
>
> We plan to split the prototype into three main milestones:
> 1. Translation: LLVM IR to (G) MachineInstr translation.
> 2. Basic selector: Legal LLVM IR to target specific
> MachineInstr.
> 3. Simple legalization: Support scalar type legalization
> and some vector instructions.
>
> Notes:
> - For #1, we will not support any fancy instructions like
> landing pad or switch.
> - Each milestone should take about 3-4 months.
>
> - At the end of #2, we would have a FastISel like selector.
>
> Each milestone will be detailed right before starting it.
> The rationale is that we want to accommodate what we
> discover with the prototype in the next milestone. In
> other words, in this email, *I only describe the first
> milestone* in detail, and I will give more details on the
> next milestone shortly before we start it, and so on. For
> your information, here is the remainder of the intended
> roadmap for the *full* project:
> 4. Productization: Clean up implementation, stabilize the
> APIs.
> 5. Complex legalization: Extend legalization support to
> everything missing.
> 6. Completeness: Fill in the blanks, e.g., landing pad.
> 7. Clean-up and performance: Add the necessary bits to be
> at parity or beat SelectionDAG generated code.
> 8. Transition: Document how to switch, provide tools to help.
>
>
> ** Milestone 1 **
>
> The first phase is focused on the IRTranslator pass.
>
> The IRTranslator is responsible for translating the LLVM
> IR into Generic MachineInstr. The IRTranslator pass uses
> some target hooks to perform the ABI lowering. We can
> either define a new API for them, e.g., ABILoweringInfo,
> or extend the existing TargetLowering.
> Moreover, the prototype will focus on simple instructions,
> i.e., we will not support switch or landing pad for this
> iteration.
>
> At the end of M1, the prototype will not be able to
> produce code, since we would only have the beginning of
> the Global ISel pipeline. Instead, we will test the
> IRTranslator on the generic output that is produced from
> the tested IR.
>
> * Design Decisions *
>
> - The IRTranslator is a final class. Its purpose is to
> move away from LLVM IR to MachineInstr world *[final]*.
> - Lower the ABI as part of the translation process *[final]*.
>
> * Design Questions the Prototype Addresses at the End of M1 *
>
> - Handling of aggregate types during the translation.
> - Lowering of switches.
> - What about Module pass for Machine pass?
> - Introduce new APIs to have a clearer separation between:
> - Legalization (setOperationAction, etc.)
> - Cost/Combine related (isXXXFree, etc.)
> - Lowering related (LowerFormal, etc.)
> - What is the contract with the backends? Is it still
> “should be able to select any valid LLVM IR”?
>
> Thanks,
>
> -Quentin
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev