[llvm-dev] Specify special cases of delay slots in the back end
Alex Susu via llvm-dev
llvm-dev at lists.llvm.org
Sun Mar 19 11:35:20 PDT 2017
Hello.
I would like to report that the implementation I described in the previous email has
some flaws, and I'm not sure whether they are really my fault.
For example, my back end correctly handles data hazards involving Store
MachineInstructions, but not those involving Load MachineInstructions.
More precisely, when getHazardType() is called with a Load instruction as its
parameter, and I decide it has a hazard and return NoopHazard, the HazardRecognizer
unfortunately does not generate a NOP.
Does the ScoreboardHazardRecognizer class decide whether or not to generate a NOP, when
instructed to, depending on the type of the instruction being analyzed?
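To make the question concrete, here is a minimal, self-contained C++ sketch of the
behavior discussed in this thread. It is a toy model, not LLVM code: ToyInstr,
ToyHazardRecognizer and schedule() are hypothetical names, and the loop only mimics the
idea that the scheduler asks again about the same instruction after emitting a NOP. The
recognizer keeps its own state (the register written by the last emitted instruction)
and clears it in emitNoop(), loosely analogous to the EmitInstruction()/EmitNoop()
callbacks of ScheduleHazardRecognizer. A recognizer that ignores emitted NOPs would keep
answering NoopHazard for the same instruction. In this model the answer does not depend
on whether the memory access is a load or a store.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these types are LLVM classes.
// Hazard is listed for completeness but unused in this sketch.
enum class HazardType { NoHazard, Hazard, NoopHazard };

struct ToyInstr {
  std::string Name;
  int Def;               // register written, or -1 if none
  std::vector<int> Uses; // registers read
  bool IsMemAccess;      // load/store subject to the one-cycle rule
};

// Hypothetical recognizer: remembers the register written by the most
// recently emitted instruction and forgets it when a NOP is emitted.
class ToyHazardRecognizer {
  int LastDef = -1;

public:
  HazardType getHazardType(const ToyInstr &I) const {
    if (I.IsMemAccess)
      for (int R : I.Uses)
        if (R == LastDef)
          return HazardType::NoopHazard;
    return HazardType::NoHazard;
  }
  void emitInstruction(const ToyInstr &I) { LastDef = I.Def; }
  void emitNoop() { LastDef = -1; }
};

// In-order emission: query the recognizer before each instruction and emit
// a NOP on NoopHazard. Because emitNoop() clears the state, the second
// query for the same instruction succeeds and the loop terminates.
std::vector<std::string> schedule(const std::vector<ToyInstr> &Block) {
  ToyHazardRecognizer HR;
  std::vector<std::string> Out;
  std::size_t Next = 0;
  while (Next < Block.size()) {
    const ToyInstr &I = Block[Next];
    if (HR.getHazardType(I) == HazardType::NoopHazard) {
      HR.emitNoop();
      Out.push_back("NOP");
      continue; // ask again about the same instruction
    }
    HR.emitInstruction(I);
    Out.push_back(I.Name);
    ++Next;
  }
  return Out;
}
```

Feeding this model the two-instruction example from the first email in this thread
(R1 = R2 + R3 followed by LS[R10] = R1) yields three entries, with a NOP in the middle.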
I actually found a (temporary) solution to my problem: I use the PreEmitNoops()
method instead of getHazardType(). So, I'm implementing the following simple behavior in
PreEmitNoops():
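  unsigned ConnexDispatchGroupSBHazardRecognizer::PreEmitNoops(SUnit *SU) {
    // Request one NOP before SU if it has a data hazard.
    if (isDataHazard(SU))
      return 1;
    return ScoreboardHazardRecognizer::PreEmitNoops(SU);
  }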
I guess this solution would prevent me from reordering instructions so that the delay
slots are filled with useful instructions instead of NOPs.
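A minimal sketch of what filling the slot with a useful instruction could look like,
written outside of LLVM. Everything here is an assumption made for illustration: Instr,
hasHazard(), independent() and fillDelaySlots() are hypothetical names, only register
dependences are modeled, memory instructions are never chosen as fillers (a
conservative choice), and control flow is ignored. The hazard rule is the one described
in this thread: a load or store directly after the instruction that produces one of its
operands.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Instr {
  std::string Text;
  int Def;               // register written, or -1 if none
  std::vector<int> Uses; // registers read
  bool IsMem;            // load/store subject to the one-cycle rule
};

static bool reads(const Instr &I, int Reg) {
  return Reg >= 0 &&
         std::find(I.Uses.begin(), I.Uses.end(), Reg) != I.Uses.end();
}

// A memory access directly after the producer of one of its operands.
static bool hasHazard(const Instr &Prev, const Instr &Next) {
  return Next.IsMem && reads(Next, Prev.Def);
}

// True if Cand can be hoisted above Other without changing register
// dataflow (no RAW, WAR or WAW dependence between the two).
static bool independent(const Instr &Cand, const Instr &Other) {
  return !reads(Cand, Other.Def) && !reads(Other, Cand.Def) &&
         !(Cand.Def >= 0 && Cand.Def == Other.Def);
}

// For each hazard, hoist the first later non-memory instruction that is
// independent of everything it would jump over; otherwise insert a NOP.
std::vector<Instr> fillDelaySlots(std::vector<Instr> Code) {
  for (std::size_t I = 0; I + 1 < Code.size(); ++I) {
    if (!hasHazard(Code[I], Code[I + 1]))
      continue;
    const std::size_t Slot = I + 1;
    std::size_t Filler = Code.size(); // Code.size() means "none found"
    for (std::size_t J = Slot + 1; J < Code.size() && Filler == Code.size();
         ++J) {
      if (Code[J].IsMem)
        continue; // conservative: never reorder memory accesses
      bool Ok = true;
      for (std::size_t K = Slot; K < J && Ok; ++K)
        Ok = independent(Code[J], Code[K]);
      if (Ok)
        Filler = J;
    }
    if (Filler != Code.size())
      std::rotate(Code.begin() + Slot, Code.begin() + Filler,
                  Code.begin() + Filler + 1); // move Filler into the slot
    else
      Code.insert(Code.begin() + Slot, Instr{"NOP", -1, {}, false});
  }
  return Code;
}
```

In this sketch an unrelated instruction such as R4 = R5 + R6 that follows the store is
hoisted between the add and the store, whereas a later instruction that overwrites R10
cannot be hoisted above a store that reads R10, so a NOP is inserted instead.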
Please let me know your opinion.
Thank you,
Alex
On 2/19/2017 9:29 PM, Alex Susu wrote:
> Hello.
> I wanted to report that I managed to generate NOPs to avoid data hazards (I can
> present sample code with a data hazard, if you want) with the standard
> ScoreboardHazardRecognizer, using a possibly not-so-great solution that can correctly
> generate several NOPs in a function, if required. I implement this by overriding the
> getHazardType() method with:
> ScheduleHazardRecognizer::HazardType
> ConnexDispatchGroupSBHazardRecognizer::getHazardType(SUnit *SU, int Stalls) {
>   static bool emittedNoop = false;
>
>   if (Stalls == 0 && // no (pipeline) stalls
>       !emittedNoop &&
>       isDataHazard(SU)) {
>     emittedNoop = true;
>     return NoopHazard;
>   } else {
>     emittedNoop = false;
>   }
>
>   return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
> }
>
> However, I would like to return Hazard instead of NoopHazard, so that useful
> instructions are placed in the slot instead of NOPs. But when I return Hazard, nothing
> happens and the data hazard is not actually removed.
>
> Also, I see that when using the ScoreboardHazardRecognizer with the default
> (non-specialized) getHazardType() method, the post-RA scheduler safely changes the order
> of a few instructions in the program (but it does not fix my data hazards, because it
> cannot recognize them by itself).
> As already said, my problem is that returning a Hazard value from getHazardType() when
> we find a data hazard does NOT change the code at all, which means that the data hazard
> is not resolved. (Fortunately, however, when we return NoopHazard instead of Hazard, a
> NOP is inserted at the right place, which fixes the data hazard.)
>
> Please let me know if you can help me solve my issues with the data hazards by
> employing useful instructions in the delay slots instead of NOPs.
>
> Best regards,
> Alex
>
> PS: I see there are a few back ends successfully using the HazardRecognizer: PPC, ARM,
> AMDGPU, Hexagon, SystemZ (SystemZ seems to have been added later than Jul 2016), so I
> believe it is a good idea for me to use it as well. (On the other hand, at least the Mips
> back end has its own MipsDelaySlotFiller.cpp, which handles in its own way exactly the
> problem I also want to solve, namely
> "// Simple pass to fill delay slots with useful instructions.")
>
>
>
> On 2/11/2017 2:39 PM, Alex Susu wrote:
>> Hello.
>> Hal, the problem I have is that it doesn't advance to the next available instruction -
>> it always gets the same store. This might be because I did not specify the functional
>> units, processor and instruction itineraries in a file like [Target]Schedule.td.
>> Regarding the Stalls argument to my method
>> [Target]DispatchGroupSBHazardRecognizer::getHazardType(): I always get Stalls = 0.
>> This is no surprise, since in PostRASchedulerList.cpp there is only one call to it, in
>> method SchedulePostRATDList::ListScheduleTopDown():
>> ScheduleHazardRecognizer::HazardType HT =
>> HazardRec->getHazardType(CurSUnit, 0/*no stalls*/);
>
> I am actually wrong - but getHazardType() is called only once in
> PostRASchedulerList.cpp:
> ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::getHazardType(SU = SU(5): t57:
> v128i1,ch = ST_INDIRECT_D<Mem:ST256[inttoptr (i16 52 to
> i16*)](tbaa=<0x23143b8>)(alias.scope=<0x230a920>)(noalias=<0x2307b30>,<0x2307450>)> t47,
> t104, t49, t56
> , Stalls = 0)
> isReadAfterWrite(SU = SU(5): t57: v128i1,ch = ST_INDIRECT_D<Mem:ST256[inttoptr (i16 52 to
> i16*)](tbaa=<0x23143b8>)(alias.scope=<0x230a920>)(noalias=<0x2307b30>,<0x2307450>)> t47,
> t104, t49, t56
>
> )
> isReadAfterWrite(): SU->Succs.size() = 1
> isReadAfterWrite(): (SU->getNode())->isMachineOpcode() = 1
> isReadAfterWrite(): (SU->getNode())->getOpcode() = 65430
> isReadAfterWrite(): (SU->getNode())->getMachineOpcode() = 105
> isReadAfterWrite(): SU->Succs[0] = SU(4): t73: ch = END_REPEAT_D t57:1
>
> )
> isReadAfterWrite(): (SUsucc->getNode())->getMachineOpcode() = 41
> isReadAfterWrite(): numUses = 3
> isReadAfterWrite(): MCID->getNumOperands() = 4
> isReadAfterWrite(): MCID->getNumDefs() = 1
> isReadAfterWrite(): SU->Preds.size() = 4
> isReadAfterWrite(): SU->Succs.size() = 1
> isReadAfterWrite(): SU can store
> isReadAfterWrite(): SDN->getNumOperands() = 4
> isReadAfterWrite(SU->Preds[0] = t47: v128i16 = <<Unknown Machine Node #65508>> t43, t32
> )
> isReadAfterWrite(): numDefs = 1
> isReadAfterWrite(): PredSDN->getNumOperands() = 2
> isReadAfterWrite(): SDN->getOperand(0) = t47: v128i16 = <<Unknown Machine Node #65508>>
> t43, t32
>
> isReadAfterWrite(): PredSDN->getOperand(0) = t43: v128i16,v128i1,ch = <<Unknown Machine
> Node #65465>><Mem:LD256[inttoptr (i16 51 to
> i16*)](tbaa=<0x23143b8>)(alias.scope=<0x2307450>)> t102, t104, t39, t97
>
> isReadAfterWrite(): Found PredSDN == SDN->getOperand(idUse)
> Pre-RA: getHazardType(): return NoopHazard
> ConnexDispatchGroupSBHazardRecognizerPreRAScheduler::getHazardType(SU = SU(5): t57:
> v128i1,ch = ST_INDIRECT_D<Mem:ST256[inttoptr (i16 52 to
> i16*)](tbaa=<0x23143b8>)(alias.scope=<0x230a920>)(noalias=<0x2307b30>,<0x2307450>)> t47,
> t104, t49, t56
> , Stalls = -1)
>
>
>
>
>> Let me state what I have added to my back end to enable scheduling with hazards:
>> - taking inspiration from lib/Target/PowerPC/PPCHazardRecognizers.h, I have created a
>> class [Target]DispatchGroupSBHazardRecognizer : public ScoreboardHazardRecognizer (I use
>> ScoreboardHazardRecognizer because I hope in the near future to make my class use USEFUL
>> program instructions, scheduled "out-of-order", instead of NOPs to handle my data
>> hazards), implementing only one method for it:
>>     HazardType getHazardType(SUnit *SU, int Stalls);
>> In this method I check whether the current SU is a vector store and the previous
>> instruction updates the register used by the store, which on my processor is a data
>> hazard, in which case I do:
>>     return NoopHazard;
>> and otherwise I do:
>>     return ScoreboardHazardRecognizer::getHazardType(SU, Stalls);
>>
>> - I implemented 2 more methods in [Target]InstrInfo.cpp:
>>   - CreateTargetPostRAHazardRecognizer() to register the
>>     [Target]DispatchGroupSBHazardRecognizer()
>>   - insertNoop() which inserts the target's NOP
>>
>> - note that my vector (and scalar) instructions are modeled on those of the Mips back
>> end, which has MSAInst (and MipsInst) with the NoItinerary InstrItinClass. Currently I am
>> not using a [Target]Schedule.td specifying functional units, processor and instruction
>> itineraries. This might be a problem - I guess ScoreboardHazardRecognizer relies on this
>> information.
>>
>> In principle, should I maybe use the post-RA MI-scheduler instead of the standard
>> post-RA scheduler (maybe also
>> http://llvm.org/docs/doxygen/html/classllvm_1_1MachineSchedStrategy.html ) to deal with my
>> hazards?
>> According to http://llvm.org/devmtg/2014-10/Slides/Estes-MISchedulerTutorial.pdf, the
>> MI-scheduler also handles hazards, but I guess it's less documented, although the AArch64
>> back end uses it.
>>
>> Thank you,
>> Alex
>>
>>
>> On 2/10/2017 11:33 PM, Hal Finkel wrote:
>>> Hi Alex,
>>>
>>> All of this makes sense, but are you correctly handling the Stalls argument to
>>> getHazardType? What are you doing with it?
>>>
>>> -Hal
>>>
>>>
>>> On 02/10/2017 02:42 PM, Alex Susu via llvm-dev wrote:
>>>> Hello.
>>>> I am making progress, with some difficulty, with the post-RA scheduler
>>>> (PostRASchedulerList.cpp with ScoreboardHazardRecognizer) - the problem I have is that
>>>> it doesn't advance to the next available instruction when the overridden
>>>> ScoreboardHazardRecognizer::getHazardType() method returns NoopHazard, and it gets stuck
>>>> at the same instruction (a store in my runs).
>>>>
>>>> Just to make sure: I am trying to use the post-RA (Register Allocation) scheduler to
>>>> avoid data hazards by inserting, if possible, other USEFUL instructions from the program
>>>> instead of (just) NOPs. Does this out-of-order scheduling (e.g., using the
>>>> ScoreboardHazardRecognizer), which employs useful program instructions instead of NOPs,
>>>> work well with the post-RA scheduler?
>>>> Otherwise, if the post-RA scheduler only inserts NOPs, since I have issues using it,
>>>> I could just as well insert NOPs in the [Target]AsmPrinter.cpp module.
>>>>
>>>> Thank you,
>>>> Alex
>>>>
>>>> On 2/10/2017 1:42 AM, Hal Finkel wrote:
>>>>>
>>>>> On 02/09/2017 04:46 PM, Alex Susu via llvm-dev wrote:
>>>>>> Hello.
>>>>>> Hal, thank you for the information.
>>>>>> I took inspiration from PPCHazardRecognizers.cpp and created my very simple
>>>>>> [Target]HazardRecognizers.cpp, whose class is also derived from
>>>>>> ScoreboardHazardRecognizer.
>>>>>> My class only implements the method getHazardType(), which checks whether, as stated
>>>>>> in my first email, I have for example a store instruction that stores the value updated
>>>>>> by the instruction immediately above it. This is NOT ok, since on my processor this is
>>>>>> a data hazard, and in this case I have to insert a NOP in between by making
>>>>>> getHazardType() do:
>>>>>>     return NoopHazard; // this basically emits noop
>>>>>>
>>>>>> However, to my surprise, my very simple post-RA scheduler (using my class derived
>>>>>> from ScoreboardHazardRecognizer) cycles FOREVER after this return NoopHazard,
>>>>>> calling getHazardType() again and again for the SAME store instruction in which I first
>>>>>> found the data hazard. So llc no longer finishes - I have to stop the process because
>>>>>> of this strange behavior.
>>>>>> After the first call to getHazardType() with the respective store instruction
>>>>>> (returning NoopHazard), I was expecting the scheduler to move forward to the other
>>>>>> instructions in the DAG/basic-block.
>>>>>
>>>>> It should emit a nop if all available instructions return NoopHazard.
>>>>>
>>>>>>
>>>>>> Do you have an idea what I can do to fix this problem?
>>>>>
>>>>> I'm not sure. I recall running into a situation like this years ago, but I don't recall
>>>>> now how I resolved it. Are you correctly handling the Stalls argument to getHazardType?
>>>>>
>>>>> -Hal
>>>>>
>>>>>>
>>>>>> Thank you very much,
>>>>>> Alex
>>>>>>
>>>>>> On 2/3/2017 10:25 PM, Hal Finkel wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> You can program a post-RA scheduler which will return NoopHazard in the appropriate
>>>>>>> circumstances. You can look at the PowerPC target (e.g.
>>>>>>> lib/Target/PowerPC/PPCHazardRecognizers.cpp) as an example.
>>>>>>>
>>>>>>> -Hal
>>>>>>>
>>>>>>>
>>>>>>> On 02/02/2017 05:03 PM, Alex Susu via llvm-dev wrote:
>>>>>>>> Hello.
>>>>>>>> I see there is little information on specifying instructions with delay slots.
>>>>>>>> So could you please tell me how I can insert NOPs (BEFORE or after an instruction),
>>>>>>>> or how to make the instruction scheduler aware of delay slots, in order to avoid
>>>>>>>> miscalculations due to the delay slot effect?
>>>>>>>>
>>>>>>>> More exactly, I have the following constraints on my (SIMD) processor:
>>>>>>>> - certain stores or loads must be executed 1 cycle after the instruction
>>>>>>>> generating their input operands ends. For example, if I have:
>>>>>>>>     R1 = R2 + R3
>>>>>>>>     LS[R10] = R1 // this will not produce the correct result because it does not
>>>>>>>>                  // see the updated value of R1 from the previous instruction
>>>>>>>> To make this code execute correctly we need to insert a NOP:
>>>>>>>>     R1 = R2 + R3
>>>>>>>>     NOP // or other instruction to fill the delay slot
>>>>>>>>     LS[R10] = R1
>>>>>>>>
>>>>>>>> - a compare instruction requires a NOP after it, before the predicated
>>>>>>>> block (something like a conditional JMP instruction) starts.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Alex
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>
>>>>>
>>>