[llvm-dev] Specify special cases of delay slots in the back end

Fri Feb 10 13:33:01 PST 2017

Hi Alex,

All of this makes sense, but are you correctly handling the Stalls 
argument to getHazardType? What are you doing with it?

  -Hal

On 02/10/2017 02:42 PM, Alex Susu via llvm-dev wrote:
>   Hello.
>    I am making slow progress with the post-RA scheduler
> (PostRASchedulerList.cpp with ScoreboardHazardRecognizer). The
> problem is that it does not advance to the next available
> instruction when the overridden
> ScoreboardHazardRecognizer::getHazardType() method returns NoopHazard;
> instead it gets stuck at the same instruction (a store in my runs).
>
>    Just to make sure: I am trying to use the post-RA (Register
> Allocation) scheduler to avoid data hazards by inserting, if possible,
> other USEFUL instructions from the program instead of (just) NOPs. Does
> this out-of-order scheduling (e.g., using the
> ScoreboardHazardRecognizer), which employs useful program instructions
> instead of NOPs, work well with the post-RA scheduler?
>     Otherwise, if the post-RA scheduler only inserts NOPs, then given the
> issues I have using it, I could just as well insert the NOPs in the
> [Target]AsmPrinter.cpp module.
>
>   Thank you,
>     Alex
>
> On 2/10/2017 1:42 AM, Hal Finkel wrote:
>>
>> On 02/09/2017 04:46 PM, Alex Susu via llvm-dev wrote:
>>>   Hello.
>>>     Hal, thank you for the information.
>>>     I took inspiration from PPCHazardRecognizers.cpp and
>>> created my own very simple
>>> [Target]HazardRecognizers.cpp, with a class that is also derived from
>>> ScoreboardHazardRecognizer.
>>> My class only implements the method getHazardType(), which checks
>>> whether, as in the example from my
>>> first email, a store instruction is storing
>>> the value updated
>>> by the instruction immediately above it. That is NOT ok, since for my
>>> processor this is a
>>> data hazard, and in this case I have to insert a NOP in between by
>>> making getHazardType()
>>> do:
>>>       return NoopHazard; // this basically emits noop
>>>
>>>     However, to my surprise, my very simple post-RA scheduler (using 
>>> my class derived
>>> from ScoreboardHazardRecognizer) is cycling FOREVER after this 
>>> return NoopHazard, by
>>> calling getHazardType() again and again for this SAME store 
>>> instruction I found in the
>>> first place with the data hazard problem. So llc never
>>> finishes, and I have to
>>> stop the process because of this strange behavior.
>>>     I was expecting after the first call to getHazardType() with the 
>>> respective store
>>> instruction (and return NoopHazard) that the scheduler would move 
>>> forward to the other
>>> instructions in the DAG/basic-block.
>>
>> It should emit a nop if all available instructions return NoopHazard.
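
As a rough illustration of that behaviour (a useful instruction when one is available without a hazard, a nop only when every available instruction reports NoopHazard), here is a self-contained toy model in C++. It is not LLVM code: ToyInstr, ToyHazardRecognizer and schedule() are hypothetical, and only the method names getHazardType, EmitInstruction and EmitNoop are borrowed from the ScheduleHazardRecognizer interface. The sketch assumes the recognizer remembers the register written in the previous issue slot and clears that record when a nop is emitted. A recognizer that never clears such state would keep reporting NoopHazard for the same store, which is one plausible way to get stuck, not a diagnosis of the problem in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class HazardType { NoHazard, NoopHazard };

// Hypothetical toy instruction (not an LLVM type).
struct ToyInstr {
  std::string Name;
  int DefReg;    // register written, or -1 if none
  int StoredReg; // register whose value a store writes out, or -1
  int DependsOn; // index of the producing instruction, or -1
};

// Toy recognizer: a store must not issue in the slot right after the
// instruction that defines the register it stores.
class ToyHazardRecognizer {
  int LastDefReg = -1; // register written in the previous issue slot

public:
  HazardType getHazardType(const ToyInstr &I) const {
    if (I.StoredReg >= 0 && I.StoredReg == LastDefReg)
      return HazardType::NoopHazard;
    return HazardType::NoHazard;
  }
  void EmitInstruction(const ToyInstr &I) { LastDefReg = I.DefReg; }
  // A nop occupies the slot, so the earlier definition is no longer
  // "immediately above". Without this reset the hazard never clears.
  void EmitNoop() { LastDefReg = -1; }
};

// Simplified top-down list scheduling loop. Assumes DependsOn forms no
// cycles and always refers to a valid index.
std::vector<std::string> schedule(const std::vector<ToyInstr> &Block) {
  ToyHazardRecognizer HR;
  std::vector<std::string> Sequence;
  std::vector<bool> Done(Block.size(), false);
  std::size_t Remaining = Block.size();
  while (Remaining > 0) {
    bool Issued = false;
    for (std::size_t I = 0; I < Block.size() && !Issued; ++I) {
      const ToyInstr &Cand = Block[I];
      bool Ready = !Done[I] && (Cand.DependsOn < 0 || Done[Cand.DependsOn]);
      if (!Ready || HR.getHazardType(Cand) != HazardType::NoHazard)
        continue; // try the next available instruction
      HR.EmitInstruction(Cand);
      Sequence.push_back(Cand.Name);
      Done[I] = true;
      --Remaining;
      Issued = true;
    }
    if (!Issued) { // every ready instruction reported a hazard
      HR.EmitNoop();
      Sequence.push_back("NOP");
    }
  }
  return Sequence;
}
```

In this model, a block with an add to R1, a store of R1 and an independent add to R4 is issued as add, add, store with no nop, while the same block without the independent add is issued as add, NOP, store.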
>>
>>>
>>>     Do you have an idea what I can do to fix this problem?
>>
>> I'm not sure. I recall running into a situation like this years ago, 
>> but I don't recall
>> now how I resolved it. Are you correctly handling the Stalls argument 
>> to getHazardType?
>>
>>  -Hal
>>
>>>
>>>   Thank you very much,
>>>     Alex
>>>
>>> On 2/3/2017 10:25 PM, Hal Finkel wrote:
>>>> Hi Alex,
>>>>
>>>> You can program a post-RA scheduler which will return NoopHazard in 
>>>> the appropriate
>>>> circumstances. You can look at the PowerPC target (e.g.
>>>> lib/Target/PowerPC/PPCHazardRecognizers.cpp) as an example.
>>>>
>>>>  -Hal
>>>>
>>>>
>>>> On 02/02/2017 05:03 PM, Alex Susu via llvm-dev wrote:
>>>>>   Hello.
>>>>>     I see there is little information on specifying instructions
>>>>> with delay slots.
>>>>>     So could you please tell me how I can insert NOPs (BEFORE or
>>>>> after an instruction),
>>>>> or how to make the instruction scheduler aware of delay slots, in order to avoid
>>>>> miscalculations due to
>>>>> the delay slot effect?
>>>>>
>>>>>     More exactly, I have the following constraints on my (SIMD) 
>>>>> processor:
>>>>>       - certain stores or loads must be executed 1 cycle after
>>>>> the instruction
>>>>> generating their input operands ends. For example, if I have:
>>>>>          R1 = R2 + R3
>>>>>          LS[R10] = R1 // this will not produce the correct result 
>>>>> because it does not
>>>>> see the updated value of R1 from the previous instruction
>>>>>        To make this code execute correctly we need to insert a NOP:
>>>>>          R1 = R2 + R3
>>>>>          NOP // or other instruction to fill the delay slot
>>>>>          LS[R10] = R1
>>>>>
>>>>>       - a compare instruction requires a NOP to be added after it,
>>>>> before the predicated
>>>>> block (something like a conditional JMP instruction) starts.
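
The two constraints above can be sketched as a simple linear pass that only pads with NOPs and does not try to fill the slot with useful work. This is a standalone C++ toy under stated assumptions, not LLVM code: SlotInstr and insertDelayNops() are hypothetical names, and the "CMP" spelling used with it is made up for illustration. It assumes each instruction writes at most one register and that a delay-sensitive load or store names the one register whose fresh value it needs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction (not an LLVM type).
struct SlotInstr {
  std::string Text;
  int DefReg;          // register written, or -1 if none
  int SensitiveUseReg; // register a delay-sensitive load/store reads, or -1
  bool IsCompare;      // true for a compare that needs a NOP after it
};

// Insert a NOP (1) between a producer and a delay-sensitive consumer of its
// result, and (2) after every compare instruction.
std::vector<SlotInstr> insertDelayNops(const std::vector<SlotInstr> &In) {
  const SlotInstr Nop{"NOP", -1, -1, false};
  std::vector<SlotInstr> Out;
  for (const SlotInstr &I : In) {
    if (!Out.empty()) {
      int PrevDef = Out.back().DefReg; // copy before Out may reallocate
      if (I.SensitiveUseReg >= 0 && I.SensitiveUseReg == PrevDef)
        Out.push_back(Nop); // consumer would otherwise see the stale value
    }
    Out.push_back(I);
    if (I.IsCompare)
      Out.push_back(Nop); // pad before the predicated block starts
  }
  return Out;
}
```

Applied to the example from the message followed by a compare, the pass yields R1 = R2 + R3, NOP, LS[R10] = R1, the compare, NOP. A pass like this always pays the NOP; a hazard recognizer used by the scheduler can instead pick another ready instruction for the slot when one exists.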
>>>>>
>>>>>
>>>>>   Thank you,
>>>>>     Alex
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory