[LLVMdev] VLIW Ports

Tue Oct 25 11:59:23 PDT 2011

It seems to me that the concept of insn bundles or packets needs 
different characteristics depending on where it's used.

At early scheduling, when there may be no MachineInstr objects 
yet, the data structure or annotation that's needed may be quite 
different from that needed at or near code generation and emission.  I 
think that what Sergei is talking about fits well with the former case, 
while Carlos' patch, which I have yet to examine, seems to fit well with 
the latter.
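
To make the annotation idea concrete, here is a minimal sketch of 
grouping insns by a per-insn cycle number, along the lines of Sergei's 
"global cycle" marker.  The Insn struct, its GlobalCycle field and 
packetsByCycle() are hypothetical names used only for illustration; 
none of this is LLVM API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction; this is not LLVM's MachineInstr.
struct Insn {
  std::string Name;
  unsigned GlobalCycle; // cycle assigned by some (assumed) early scheduler
};

// Recover the packets from the annotation alone: every insn that carries the
// same GlobalCycle value belongs to the same packet. Program order within a
// packet is preserved.
std::map<unsigned, std::vector<std::string>>
packetsByCycle(const std::vector<Insn> &Block) {
  std::map<unsigned, std::vector<std::string>> Packets;
  for (const Insn &I : Block)
    Packets[I.GlobalCycle].push_back(I.Name);
  return Packets;
}
```

Nothing here changes the insns themselves, which is why an annotation 
like this could exist before any MI objects do.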

Since my focus is near the final phases of the compiler, Carlos' 
suggestion speaks more to me and seems to be a good starting point 
that I intend to investigate further.

One hesitation that I have about sub-classing though is that base-class 
objects may be destroyed and replaced by passes and thus the sub-class 
information may be lost.  This is not much of a problem at the final 
stages, when there are fewer passes to worry about, but it may be 
something difficult to control at early stages.
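
A toy illustration of that hazard, assuming a bundle-unaware pass that 
re-creates insns as plain base-class objects.  Instr, Bundle and 
naiveRewritePass() are made-up stand-ins, not LLVM classes and not 
Carlos' code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class, standing in for a machine instruction.
struct Instr {
  std::string Opcode;
  explicit Instr(std::string Op) : Opcode(std::move(Op)) {}
  virtual ~Instr() = default;
  virtual bool isBundle() const { return false; }
};

// Hypothetical sub-class that carries the extra per-slot information.
struct Bundle : Instr {
  std::vector<std::string> Slots;
  explicit Bundle(std::vector<std::string> S)
      : Instr("BUNDLE"), Slots(std::move(S)) {}
  bool isBundle() const override { return true; }
};

using Block = std::vector<std::unique_ptr<Instr>>;

// A pass that knows nothing about bundles: it destroys each instruction and
// replaces it with a fresh base-class object that has the same opcode.
void naiveRewritePass(Block &BB) {
  for (auto &I : BB) {
    assert(I && "null instruction in block");
    I = std::make_unique<Instr>(I->Opcode);
  }
}

// Count how many instructions in the block are still bundles.
unsigned countBundles(const Block &BB) {
  unsigned N = 0;
  for (const auto &I : BB)
    if (I->isBundle())
      ++N;
  return N;
}
```

After naiveRewritePass() runs, the opcode survives but the Slots do not, 
and countBundles() drops to zero.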

Having said this, if the bundling of insns is indeed done differently at 
different stages, I would be comfortable sub-classing MI late in the 
pipeline; and provided that the lifetime of the bundle used for 
scheduling is relatively short, sub-classing could be OK there too.

However, if different bundling representations are used, later 
scheduling would have to understand both representations.
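
As a rough sketch of what "understanding both" could amount to, here is 
a conversion between a packed form and the NEXT-separated unpacked 
stream that Timo describes further down the thread.  The string-based 
types and the packStream()/unpackStream() helpers are assumptions made 
for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical packed form: one Packet holds the insns of one bundle.
using Packet = std::vector<std::string>;

// Unpacked -> packed: a "NEXT" marker closes the current bundle.
std::vector<Packet> packStream(const std::vector<std::string> &Unpacked) {
  std::vector<Packet> Packed;
  Packet Current;
  for (const std::string &Op : Unpacked) {
    if (Op == "NEXT") {
      Packed.push_back(Current);
      Current.clear();
    } else {
      Current.push_back(Op);
    }
  }
  // Tolerate a stream whose last bundle is not terminated by NEXT.
  if (!Current.empty())
    Packed.push_back(Current);
  return Packed;
}

// Packed -> unpacked: emit each bundle's insns followed by a "NEXT" marker.
std::vector<std::string> unpackStream(const std::vector<Packet> &Packed) {
  std::vector<std::string> Unpacked;
  for (const Packet &P : Packed) {
    Unpacked.insert(Unpacked.end(), P.begin(), P.end());
    Unpacked.push_back("NEXT");
  }
  return Unpacked;
}
```

With converters like these, a late scheduler could normalize its input 
to one form first instead of handling two.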

Anyways, just thinking out loud...

-- 
Evandro Menezes        Austin, TX        emenezes at codeaurora.org
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum

On 10/25/11 09:50, Sergei Larin wrote:
>
> Carlos,
>
>    Absolutely. Live range detection would need an addition to make it aware of the global cycle, and that has to be done regardless of the representation methodology. The same goes for any pass that cares about packets. The important observation here, IMHO, is that "packetization" at an early stage (before RA) is tentative: RA can change the landscape, which must be more or less finalized in the post-RA scheduler. Nevertheless, I think one last pass, right before code emission, is still needed to "clean up" the final schedule.
>
>    I do not have a patch handy, which would have made it easier to illustrate my proposal, but the mere fact of this discussion shows growing interest in the problem. I have a feeling that we might obtain a VLIW target/back end shortly, and then it would become a real (and burning) issue. This might be our chance to outperform GCC's RISC-centric philosophy in an elegant and powerful way.
>
>    The first step to healing is to recognize that we have an issue ;)
>
> Sergei
>
> -----Original Message-----
> From: Carlos Sánchez de La Lama [mailto:carlos.delalama at urjc.es]
> Sent: Tuesday, October 25, 2011 4:24 AM
> To: Sergei Larin
> Cc: 'Evan Cheng'; 'Stripf, Timo'; 'LLVM Dev'
> Subject: RE: [LLVMdev] VLIW Ports
>
> Hi Sergei,
>
>>    What would you say to some sort of "global cycle" field/marker to
>> identify all instructions scheduled at a certain "global" cycle? That way
>> the "bundle"/packet/multiop can be identified at any time via a common
>> "global cycle" value.
>
> But RA would need to know about this global cycle field, right? Because a
> register can be reused in the same "global cycle" in which it is killed.
>
> Carlos
>
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
>> Behalf Of Carlos Sánchez de La Lama
>> Sent: Monday, October 24, 2011 4:38 PM
>> To: Evan Cheng
>> Cc: Stripf, Timo; LLVM Dev
>> Subject: Re: [LLVMdev] VLIW Ports
>>
>> Hi Evan (and all),
>>
>>> I think any implementation that makes a "bundle" a different entity from
>> MachineInstr is going to be difficult to use. All of the current backend
>> passes will have to be taught to know about bundles.
>>
>> The approach in the patch I sent (and I believe Timo's code works similarly,
>> according to his explanations) is precisely to make "bundles" no different
>> from MachineInstructions. They are MIs (a class derived from it), so all
>> other passes work transparently with them. For example, in my code the register
>> allocator does not know it is allocating regs for a bundle; it sees it just
>> as a MI using a lot of registers. Of course, normal (scalar) passes cannot
>> "inspect" inside bundles, and won't be able, for example, to put spilling code
>> into bundles or anything like that.
>>
>> But the good point is that bundles (which are MIs) and regular MIs can
>> coexist inside a MachineBasicBlock, and bundles can easily be "broken back"
>> to regular MIs when needed for some pass.
>>
>>> I think what we need is a concept of a sequence of fixed machine
>> instructions. Something that represents a number of MachineInstr's that are
>> scheduled as a unit, something that is never broken up by MI passes such as
>> branch folding. This is something that current targets can use to, for
>> example, pre-schedule instructions. This can be useful for macro-fusion
>> optimizations. It can also be used for VLIW targets.
>>
>> There might be something I am missing, but I do not see the advantage here.
>> What's more, if you use sequences you need to find a way to tell the passes
>> how long a sequence is. On the other hand, if you use a class derived from
>> MI, the passes already know (from their POV they are just dealing with
>> MIs). Of course, you have to be careful about how you build the bundles so they
>> have the right properties matching those of the inner MIs, and that is
>> where the pack/unpack methods come in.
>>
>> BR
>>
>> Carlos
>>
>>> On Oct 21, 2011, at 4:52 PM, Stripf, Timo wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have worked for the last 2 years on an LLVM back-end that supports clustered and
>> non-clustered VLIW architectures. I also wrote a paper about it that is
>> currently under review and will hopefully be accepted.
>> Here is a short summary of how I realized VLIW support with an LLVM back-end. I
>> also used packing and unpacking of VLIW bundles. My implementation does not
>> require any modification of the LLVM core.
>>>>
>>>> To support VLIW I added two representations for VLIW instructions: a packed
>> and an unpacked representation. In the unpacked representation, VLIW
>> bundles are separated by a NEXT instruction, as was done in the IA-64
>> back-end. The packed representation packs all instructions of one bundle into
>> a single PACK instruction, and I used this representation especially for
>> register allocation.
>>>>
>>>> I used the following pass order for the clustered VLIW back-end:
>>>>
>>>> DAG->DAG Pattern Instruction Selection
>>>> ...
>>>> Clustering (Not required for unicluster VLIW architectures)
>>>> Scheduling
>>>> Packing
>>>> ...
>>>> Register Allocation
>>>> ...
>>>> Prolog/Epilog Insertion & Frame Finalization
>>>> Unpacking
>>>> Reclustering
>>>> ...
>>>> Rescheduling (Splitting, Packing, Scheduling, Unpacking)
>>>> Assembly Printer
>>>>
>>>>
>>>> In principle, it is possible to use the LLVM scheduler to generate
>> parallel code by providing a custom hazard recognizer that checks true data
>> dependencies of the current bundle. The scheduler also has the capability to
>> output NEXT operations by using NoopHazard and outputting a NEXT instruction
>> instead of a NOP. However, the scheduler that is used within "DAG->DAG
>> Pattern Instruction Selection" uses the glue mechanism, and that could be
>> problematic since no NEXT instructions are issued between glued
>> instructions.
>>>>
>>>> Within my back-end I added a parallelizing scheduling pass after "DAG->DAG
>> Pattern Instruction Selection" by reusing the LLVM Post-RA scheduler
>> together with a custom hazard recognizer as explained. The Post-RA scheduler
>> also works very well before register allocation, given some small
>> modifications (special PHI instruction handling and a small performance
>> issue due to the high virtual register numbers).
>>>>
>>>> Before register allocation the Packing pass converts the unpacked
>> representation output by the scheduler into the packed representation, so
>> register allocation sees each VLIW bundle as one instruction. After
>> "Prolog/Epilog Insertion & Frame Finalization" the Unpack pass converts the
>> PACK instructions back to the unpacked representation. In doing so, instructions
>> that were added during Register Allocation and Prolog/Epilog Insertion
>> are recognized and each get a bundle of their own, since they are not parallelized.
>>>>
>>>> At the end (just before assembly output) I added several passes for
>> rescheduling. First, the splitting pass tries to split each VLIW bundle into
>> single instructions (if possible). The Packing pass packs each bundle with
>> more than one instruction into a single PACK instruction. The scheduler will
>> recognize the PACK instruction as a single scheduling unit. Scheduling is
>> nearly the same as before RA. Unpacking then re-establishes the unpacked
>> representation.
>>>>
>>>> If anyone is interested in more information please send me an email. I'm
>> also interested in increasing support for VLIW architectures within LLVM.
>>>>
>>>> Kind regards,
>>>> Timo Stripf
>>>>
>>>> -----Original Message-----
>>>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
>> Behalf Of Carlos Sánchez de La Lama
>>>> Sent: Thursday, October 6, 2011 13:14
>>>> To: LLVM Dev
>>>> Subject: Re: [LLVMdev] VLIW Ports
>>>>
>>>> Hi all,
>>>>
>>>> here is the current (unfinished) version of the VLIW support I mentioned.
>> It is a patch over svn rev 141176. It includes the MachineInstrBundle class,
>> and small required changes in a couple of outside LLVM files.
>>>>
>>>> It also includes a modification to the Mips target to simulate a 2-wide VLIW
>> MIPS. The scheduler is really silly; I did not want to implement a
>> scheduler, just the bundle class, and the test scheduler is just provided as
>> an example.
>>>>
>>>> The main thing still missing is to finish the "pack" and "unpack" methods in
>> the bundle class. Right now it manages operands, both implicit and explicit,
>> but it should also manage memory references, and update MIB flags according
>> to sub-MI flags.
>>>>
>>>> For any question I would be glad to help.
>>>>
>>>> BR
>>>>
>>>> Carlos
>>>>
>>>> On Tue, 2011-09-20 at 16:02 +0200, Carlos Sánchez de La Lama wrote:
>>>>> Hi,
>>>>>
>>>>>> Has anyone attempted the port of LLVM to a VLIW architecture?  Is
>>>>>> there any publication about it?
>>>>>
>>>>> I have developed a derivation of the MachineInstr class, called
>>>>> MachineInstrBundle, which is essentially a VLIW-style machine
>>>>> instruction which can store any MI in each "slot". After the
>>>>> scheduling phase has grouped MIs in bundles, it has to call the
>>>>> MIB->pack() method, which takes operands from the MIs in the "slots"
>>>>> and transfers them to the superinstruction. From this point on the
>>>>> bundle is a normal machine instruction which can be processed by other
>>>>> LLVM passes (such as register allocation).
>>>>>
>>>>> The idea was to make a framework on top of which VLIW/ILP scheduling
>>>>> could be studied using LLVM. It is not completely finished, but it is
>>>>> more or less usable and works with a trivial scheduler in a synthetic
>>>>> MIPS-VLIW architecture. Code emission does not work though (yet) so
>>>>> bundles have to be unpacked prior to emission.
>>>>>
>>>>> I was waiting to finish it to send a patch to the list, but if you are
>>>>> interested I can send you a patch over svn of my current code.
>>>>>
>>>>> BR
>>>>>
>>>>> Carlos
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>