[LLVMdev] how to detect data hazard in pre-RA-sched

Wed Sep 25 23:30:29 PDT 2013

On Thu, Sep 26, 2013 at 2:17 PM, Andrew Trick <atrick at apple.com> wrote:

>
> On Sep 25, 2013, at 11:03 PM, Liu Xin <navy.xliu at gmail.com> wrote:
>
>
>
>
> On Wed, Sep 25, 2013 at 1:15 PM, Andrew Trick <atrick at apple.com> wrote:
>
>>
>> On Sep 24, 2013, at 7:59 PM, Liu Xin <navy.xliu at gmail.com> wrote:
>>
>> Hi, Andrew,
>>
>> Thank you for answering my question.
>>
>> What's the status of misched? Is it experimental? I found it is disabled
>> by default for all architectures (3.4svn). I also don't understand the
>> algorithm. Could you point me to more papers or text materials about your
>> approach? It seems that you want to balance register pressure and ILP in
>> misched.
>>
>>
>> It has been used in production for a year. It’s currently enabled on
>> trunk for PPC, R600, and Hexagon. If there are no objections I’d like to
>> move x86 and armv7 ASAP. Leaving it disabled is becoming more of a
>> maintenance burden.
>>
>>
>
>
>> Please see my llvm-dev list messages to Ghassan yesterday. MI Scheduler
>> is a pass that just provides a place to do scheduling and a large toolbox to
>> do it with. ScheduleDAGMI is a list scheduler driver, and the
>> GenericScheduler strategy attempts to balance register pressure with
>> latency. In my opinion getting the right register pressure vs latency
>> balance is easy to do at a given point in time for a small benchmark suite,
>> but very, very hard to do in general with a design that works across
>> microarchitectures and is resilient to changes to incoming IR.
>> GenericScheduler doesn’t magically solve this problem, but it should never
>> do anything too terrible either.
>>
>> Sorry, I made a false statement above. I tried x86/arm/mips and found no
> misched in use. You mean misched is just like a framework. A backend can
> configure TargetPassConfig to run misched both pre-RA and post-RA, right?
>
>
> It's currently only set up to run pre-RA. I'd like to set it up for post-RA
> also. I don't expect that to be much work.
>
> Backends can configure MI scheduler differently depending on how much
> control they want. The easiest thing to do is define bool
> <My>SubTargetInfo::enableMachineScheduler() const { return true; }
>
> Sorry, did you mention which target you're developing?
>
>
We are developing a backend for our in-house processor. Currently, we are
working on the LLVM 3.2 release. I am also evaluating arm and mips for
comparison. I will tweak misched following your pointers.

Thank you for your insightful help!

--lx

>  On Tue, Sep 24, 2013 at 4:07 PM, Andrew Trick <atrick at apple.com> wrote:
>>
>>>
>>> On Sep 21, 2013, at 8:02 PM, Liu Xin <navy.xliu at gmail.com> wrote:
>>>
>>> > hi, LLVM,
>>> >
>>> > I found there is a flag DisableHazardRecognizer in
>>> TargetInstrImpl.cpp. I still don't understand how LLVM detects data hazards
>>> in pre-RA-sched. pre-RA-sched is based on SDNodes and all operands are
>>> vregs. Even if you can calculate the operators of SDNodes, data hazards on
>>> vregs are not the same as physical register data hazards. Is it useful for
>>> optimizing the processor pipeline?
>>>
>>> The hazard recognizer enforces the instruction itineraries that are
>>> defined for some subtargets. The itineraries specify resource usage at each
>>> pipeline stage and latency. The "hazards" being recognized are resource
>>> conflicts, like two independent instructions using the FP unit, or read
>>> after write latency. It does not deal with WAR physical register hazards.
>>>
>>> (Targets are migrating to a more flexible and efficient machine model
>>> now that does not use the hazard recognizer.)
>>>
>>> I don't understand this statement. What is the meaning of "more flexible
>> & efficient machine model"? I know Intel x86 processors feature
>> aggressive out-of-order execution, but arm and mips don't have it. Server
>> processors may have it; embedded processors will not. Compiler writers still
>> need to consider the instruction pipeline and multiple issue.
>>
>> Our processor still uses a mips-like multi-stage pipeline, almost the same
>> as what the textbooks taught me. We suffer from pipeline stalls and manage
>> to improve the issue rate using instruction scheduling. For now, I use
>> post-RA-sched because it can build a graph whose edges are dependencies. The
>> dependencies are real, being based on physical registers and instruction
>> attributes. Because misched happens before register allocation, I don't
>> think I can make use of it to resolve data hazards. Am I right?
>>
>>
>> The old itineraries allow specifying which resources are used in each
>> pipeline stage. It’s a full matrix.
>>
>> In the new machine model, you only specify the resources and number of
>> cycles. It can be implemented with simple counters. This works in practice
>> because it’s almost always the case that different instructions begin using
>> a given resource at the same time relative to when the instruction is
>> executed. Even the VLIW implementation I’ve seen in trunk could have used
>> the new model.
>>
>> It’s efficient because the scheduler doesn’t need to manage a reservation
>> table or build a state machine.
>>
>> It’s more flexible because predicates allow instructions to be modeled
>> differently based on opcode extensions or immediate values.
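
The counter idea described above can be illustrated with a minimal,
self-contained sketch. It is not LLVM code: ResourceUse, InstrClass, and
ResourceCounters are hypothetical names invented for this example, and it
assumes every resource an instruction needs becomes busy at the issue cycle.
Each resource keeps one "next free cycle" counter instead of a row in a
reservation table.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only; these are not LLVM classes.
struct ResourceUse {
  std::string Resource; // e.g. "ALU" or "FPU"
  unsigned Cycles;      // number of cycles the resource stays busy
};

struct InstrClass {
  std::vector<ResourceUse> Uses; // resources consumed by this kind of instruction
};

class ResourceCounters {
  // One counter per resource: the first cycle at which it is free again.
  std::map<std::string, unsigned> NextFree;

public:
  // Earliest cycle >= Cycle at which all resources needed by IC are free.
  unsigned earliestIssue(const InstrClass &IC, unsigned Cycle) const {
    unsigned Earliest = Cycle;
    for (const ResourceUse &U : IC.Uses) {
      auto It = NextFree.find(U.Resource);
      if (It != NextFree.end())
        Earliest = std::max(Earliest, It->second);
    }
    return Earliest;
  }

  // Issue IC at Cycle: each resource is busy from Cycle for U.Cycles cycles.
  void issue(const InstrClass &IC, unsigned Cycle) {
    assert(earliestIssue(IC, Cycle) == Cycle &&
           "a required resource is still busy");
    for (const ResourceUse &U : IC.Uses)
      NextFree[U.Resource] = Cycle + U.Cycles;
  }
};
```

A scheduler built this way would ask earliestIssue() for each candidate and
call issue() for the one it picks; no per-stage matrix is consulted.
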
>>
>> I got it. I do feel the scoreboard state-tracker is not that useful
> except for software pipelining. I have resolved my instruction pipeline
> stalls using the existing post-RA-sched (TD). I will investigate the new
> misched approach later. Ghassan mentioned an LLVM performance regression in
> a previous thread. Do you measure the perf impact of the compiler using
> llvm-test-suite? As you said, there is no single best algorithm for
> instruction scheduling, so I have to learn to measure my changes.
>
>
> Yes, I use the llvm test-suite, but don't tune for it at all. It's just a way
> to find bugs. We have a few other benchmark suites of course. I really
> encourage people to run their own benchmarks on their own hardware. Of
> course, if you see a problem, it's good to run -debug-only=misched to find
> out what happened. It could be a simple bug or configuration error.
>
> -Andy
>
>
>
>> The postRA hazard that you’re talking about is the job of the dependence
>> graph builder. That is the same for both post-RA and MI sched. When the DAG
>> builder runs before regalloc, it also has to handle virtual registers,
>> that’s the only difference.
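
To make the point about the dependence graph builder concrete, here is a
small sketch of how a register dependence between two instructions might be
classified. It is not the LLVM DAG builder: Instr, DepKind, and classify are
hypothetical names, registers are plain integer IDs, and the sketch assumes
it does not matter whether an ID names a virtual or a physical register.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical types for illustration only; these are not LLVM classes.
enum class DepKind { None, RAW, WAR, WAW };

struct Instr {
  std::vector<unsigned> Defs; // registers written (virtual or physical IDs)
  std::vector<unsigned> Uses; // registers read
};

// True if the two register lists share at least one register.
static bool overlaps(const std::vector<unsigned> &A,
                     const std::vector<unsigned> &B) {
  return std::any_of(A.begin(), A.end(), [&](unsigned R) {
    return std::find(B.begin(), B.end(), R) != B.end();
  });
}

// Classify the dependence of Later on Earlier (program order).
// When several kinds apply, RAW is reported first, then WAW, then WAR.
DepKind classify(const Instr &Earlier, const Instr &Later) {
  if (overlaps(Earlier.Defs, Later.Uses))
    return DepKind::RAW; // true (data) dependence
  if (overlaps(Earlier.Defs, Later.Defs))
    return DepKind::WAW; // output dependence
  if (overlaps(Earlier.Uses, Later.Defs))
    return DepKind::WAR; // anti dependence
  return DepKind::None;
}
```

A graph builder along these lines would add one edge per non-None result,
with the RAW edges carrying the producer's latency.
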
>>
>> The best way for me to explain how to define a machine model for an
>> in-order processor would be to work with someone who is ready to migrate
>> mips or a simple ppc, arm, or x86 (atom) implementation and improve the
>> docs along the way.
>>
>> We’re also lacking a model for AVX!
>>
>> -Andy
>>
>
> thanks,
> --lx
>
>
>