[llvm-dev] Instruction Itineraries: question about operand latencies

Wed Jun 8 15:13:33 PDT 2016

I did some looking around and found this in Passes.cpp:
// Temporary option to allow experimenting with MachineScheduler as a post-RA
// scheduler. Targets can "properly" enable this with
// substitutePass(&PostRASchedulerID, &PostMachineSchedulerID); Ideally it
// wouldn't be part of the standard pass pipeline, and the target would just add
// a PostRA scheduling pass wherever it wants.
static cl::opt<bool> MISchedPostRA("misched-postra", cl::Hidden,
  cl::desc("Run MachineScheduler post regalloc (independent of preRA sched)"));

So I added this to our target's TargetPassConfig subclass:
class XSTGPassConfig : public TargetPassConfig {
public:
    XSTGPassConfig(XSTGTargetMachine *TM, PassManagerBase &PM) :
        TargetPassConfig(TM, PM) {
           if (TM->getOptLevel() != CodeGenOpt::None)
              substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
        }

Then I built and ran clang on some code. I had added some cout statements to
our getInstrLatency override to display the UseInstr. This is an example of
what I see in the output:

>>> DefInstr: %vreg34<def> = LOADI32_RI %vreg3, 268;
mem:LD4[%3](tbaa=<0x75a64d8>) R32C:%vreg34 GPRC:%vreg3
dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[
pipeline_routing_be.c:1120:1 ] ]
 Latency: 142 for: DefIdx= 0 UseIdx= 1
    UseInstr: %vreg34<def> = LOADI32_RI %vreg3, 268;
mem:LD4[%3](tbaa=<0x75a64d8>) R32C:%vreg34 GPRC:%vreg3
dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[
pipeline_routing_be.c:1120:1 ] ]
%vreg35<def> = CVT_U32_TO_U64 %vreg34; GPRC:%vreg35 R32C:%vreg34
dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[
pipeline_routing_be.c:1120:1 ] ]

...since vregs still appear there, I'm assuming this didn't run post-RA as I
had expected.

(Our code is based on LLVM 3.6 if that's relevant)
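
Seeing vregs in this output does not by itself show that the post-RA pass was skipped, since the pre-RA scheduler also queries latencies (as Ehsan notes below). One way to make a register-dependent latency safe in both cases is to fall back to the conservative number whenever the base register is still virtual. Below is a minimal standalone sketch of that idea in plain C++, not LLVM code. It assumes the LLVM 3.x convention that virtual register numbers have the top bit set; in-tree code would call TargetRegisterInfo::isVirtualRegister() instead. The helper names, the register set, and the fast latency of 4 are all hypothetical; only the 150-cycle figure comes from this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Assumption: virtual register numbers have the top bit set (LLVM 3.x
// convention). Real target code should use
// TargetRegisterInfo::isVirtualRegister() rather than this helper.
inline bool isVirtualRegModel(uint32_t Reg) {
  return (Reg & 0x80000000u) != 0;
}

// 150 is the slow-memory figure from the thread; 4 is a made-up placeholder.
constexpr unsigned SlowLoadLatency = 150;
constexpr unsigned FastLoadLatency = 4;

// Pick a load latency from the base register. FastPtrRegs is a hypothetical
// set of physical registers reserved as pointers into fast memory.
unsigned loadLatencyForBase(uint32_t BaseReg,
                            const std::set<uint32_t> &FastPtrRegs) {
  assert(BaseReg != 0 && "register number 0 means 'no register'");
  // Pre-RA: the address space is not known yet, so stay conservative.
  if (isVirtualRegModel(BaseReg))
    return SlowLoadLatency;
  // Post-RA: the physical register tells us which address space is used.
  return FastPtrRegs.count(BaseReg) ? FastLoadLatency : SlowLoadLatency;
}
```

With this shape the same getInstrLatency override can be called from either scheduler: the pre-RA call sees a vreg and gets 150, while a post-RA call can return the shorter latency for a known fast-memory pointer register.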

Phil

On Tue, Jun 7, 2016 at 8:57 PM, Ehsan Amiri <ehsanamiri at gmail.com> wrote:

> There are two scheduling passes. One is before register allocation and the
> other one is after register allocation. You probably looked at the print
> outs during first (pre-ra) scheduling pass. Start from
> TargetPassConfig::addMachinePasses to find more details about code gen
> passes.
>
> On Tue, Jun 7, 2016 at 10:02 PM, Phil Tomson <phil.a.tomson at gmail.com>
> wrote:
>
>> I overrode getInstrLatency and did some printing to see what is available
>> there. It looks like the registers are still virtual at that point when
>> getInstrLatency is called - is that correct? (we needed to make some
>> decisions based on actual registers that have been assigned since some
>> registers are reserved as address space pointers and we could vary the
>> latency based on which address space pointer register is being used - but
>> it looks like they're virtual there)
>>
>> Phil
>>
>> On Mon, Jun 6, 2016 at 3:10 PM, Ehsan Amiri <ehsanamiri at gmail.com> wrote:
>>
>>> Hi Phil
>>>
>>> There are some comments in "include/llvm/Target/TargetItinerary.td"
>>> where class InstrItinData is defined.
>>>
>>>  B is the number of cycles after issue where the first operand of the
>>> instruction is defined. A is the number of cycles that the instruction will
>>> stay in that particular stage in the pipeline. So for simple cases, like
>>> your example, one would expect that A and B should have the same value. But
>>> there are different APIs for accessing A and B.
>>>
>>> An example of accessing B in the source code can be found here:
>>> PPCInstrInfo::getInstrLatency. You can also look at getStageLatency in
>>> include/llvm/MC/MCInstrItineraries.h. From these two you can probably find
>>> other relevant places.
>>>
>>> Hope this helps
>>> Ehsan
>>>
>>>
>>> On Mon, Jun 6, 2016 at 2:37 PM, Phil Tomson via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> In our architecture loads from certain memory locations take a long
>>>> time to complete (on the order of 150 clock cycles). Since we don't have a
>>>> way to tell at compile time if the address being loaded from lies in slow
>>>> or fast memory, I've gone ahead and made all of the load numbers high, such
>>>> as:
>>>>
>>>>   InstrItinData< II_LOAD1,     [InstrStage<150, [AGU]>]>,
>>>>
>>>> However, I see that there is another field which I haven't specified
>>>> where operand latencies are specified.  Here's an example from
>>>> ARMScheduleA8.td:
>>>>
>>>>   InstrItinData<IIC_iALUi ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2,
>>>> 2]>,
>>>>
>>>> Now I'm wondering whether, instead of what I had above, I should have
>>>> specified:
>>>>
>>>>   InstrItinData< II_LOAD1,     [InstrStage<150, [AGU]>],[150,1,1]>,
>>>>
>>>> ?
>>>>
>>>> but is that first '150' parameter there redundant? Since it's specified
>>>> in the operand latency list ([150,1,1] - the first element of that array
>>>> being the latency for the output)?
>>>>
>>>>
>>>> To clarify, for values of  'A' and 'B' below:
>>>>
>>>>   InstrItinData< II_LOAD1,     [InstrStage<A, [AGU]>], [B,1,1]>,
>>>>
>>>> ...what is the difference in the meaning for 'A' and 'B'? Are they
>>>> essentially the same value since only one functional unit is specified?
>>>> ([AGU])
>>>>
>>>> Phil
>>>>
>>>
>>
>
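
On the A versus B question in the quoted thread, a small standalone model may help show how the two numbers are consumed differently. This is plain C++, not LLVM's own code. It follows what getStageLatency and getOperandLatency in MCInstrItineraries.h appear to compute (stage latency as the latest stage end; operand latency as def cycle minus use cycle plus one), and it ignores pipeline forwarding. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One InstrStage<Cycles, [units]>. NextCycles < 0 means "the next stage
// starts right after this one ends" (modelled here as the default).
struct StageModel {
  unsigned Cycles;
  int NextCycles;
};

// One InstrItinData entry: its stages plus the optional operand-cycle list.
struct ItinModel {
  std::vector<StageModel> Stages;
  std::vector<int> OperandCycles;
};

// "A" side: how long the instruction occupies its pipeline stages.
unsigned stageLatency(const ItinModel &I) {
  unsigned Latency = 0, Start = 0;
  for (const StageModel &S : I.Stages) {
    Latency = std::max(Latency, Start + S.Cycles);
    Start += S.NextCycles >= 0 ? static_cast<unsigned>(S.NextCycles) : S.Cycles;
  }
  return Latency;
}

// "B" side: def-to-use latency from the operand cycles; -1 if either
// operand has no cycle listed.
int operandLatency(const ItinModel &Def, unsigned DefIdx,
                   const ItinModel &Use, unsigned UseIdx) {
  if (DefIdx >= Def.OperandCycles.size() || UseIdx >= Use.OperandCycles.size())
    return -1;
  return Def.OperandCycles[DefIdx] - Use.OperandCycles[UseIdx] + 1;
}

// The load from the thread: InstrStage<A, [AGU]> with operand list [B, 1, 1].
ItinModel makeLoad(unsigned A, int B) {
  assert(A > 0 && B >= 0);
  ItinModel I;
  I.Stages.push_back(StageModel{A, -1});
  I.OperandCycles = {B, 1, 1};
  return I;
}
```

In this model, makeLoad(150, 150) and makeLoad(1, 150) give the same operand latency to a consumer but very different stage latencies. That suggests the two values are not redundant: A describes how long the AGU is treated as busy, while B describes when the loaded value becomes available. Which A is appropriate presumably depends on whether the hardware can have several loads in flight on that unit.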