[LLVMdev] how to detect data hazard in pre-RA-sched

Wed Sep 25 23:03:12 PDT 2013

On Wed, Sep 25, 2013 at 1:15 PM, Andrew Trick <atrick at apple.com> wrote:

>
> On Sep 24, 2013, at 7:59 PM, Liu Xin <navy.xliu at gmail.com> wrote:
>
> Hi, Andrew,
>
> Thank you for answering my question.
>
> What's the status of misched? Is it experimental? I found it is disabled
> by default for all architectures (3.4svn). I also don't understand the
> algorithm. Could you point me to more papers or other material about your
> approach? It seems that you want to balance register pressure and ILP in
> misched.
>
>
> It has been used in production for a year. It’s currently enabled on trunk
> for PPC, R600, and Hexagon. If there are no objections, I’d like to move x86
> and armv7 over ASAP. Leaving it disabled is becoming more of a maintenance
> burden.
>
>

> Please see my llvm-dev list messages to Ghassan yesterday. The MI Scheduler
> is a pass that just provides a place to do scheduling and a large toolbox to do
> it with. ScheduleDAGMI is a list scheduler driver, and the GenericScheduler
> strategy attempts to balance register pressure with latency. In my opinion
> getting the right register pressure vs latency balance is easy to do at a
> given point in time for a small benchmark suite, but very, very hard to do
> in general with a design that works across microarchitectures and is
> resilient to changes to incoming IR. GenericScheduler doesn’t magically
> solve this problem, but it should never do anything too terrible either.
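A toy version of the trade-off GenericScheduler has to make can be sketched as a top-down list scheduler whose pick heuristic switches on register pressure. This is a hypothetical C++ illustration, not LLVM's code: the `Node` fields, the single scalar pressure estimate, and the pressure limit are all simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified DAG node (not LLVM's SUnit).
struct Node {
  int height;             // longest latency path from this node to the end of the region
  int pressureDelta;      // estimated net change in live registers when this node issues
  std::vector<int> succs; // indices of nodes that depend on this one
  int numPreds = 0;       // unscheduled predecessors; computed by listSchedule
};

// Pick one ready node. At or above the pressure limit, prefer the node that
// lowers pressure most; otherwise (or on a tie) prefer the tallest node,
// i.e. the one on the longest remaining latency path.
static std::size_t pickNode(const std::vector<Node>& dag,
                            const std::vector<int>& ready,
                            int pressure, int pressureLimit) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < ready.size(); ++i) {
    const Node& cand = dag[ready[i]];
    const Node& cur = dag[ready[best]];
    if (pressure >= pressureLimit && cand.pressureDelta != cur.pressureDelta) {
      if (cand.pressureDelta < cur.pressureDelta) best = i;
    } else if (cand.height > cur.height) {
      best = i;
    }
  }
  return best;
}

// Top-down list scheduling; returns node indices in issue order.
std::vector<int> listSchedule(std::vector<Node> dag, int pressureLimit) {
  for (const Node& n : dag)
    for (int s : n.succs) ++dag[s].numPreds;
  std::vector<int> ready, order;
  for (int i = 0; i < static_cast<int>(dag.size()); ++i)
    if (dag[i].numPreds == 0) ready.push_back(i);
  int pressure = 0;
  while (!ready.empty()) {
    std::size_t pos = pickNode(dag, ready, pressure, pressureLimit);
    int n = ready[pos];
    ready.erase(ready.begin() + static_cast<std::ptrdiff_t>(pos));
    order.push_back(n);
    pressure += dag[n].pressureDelta;
    for (int s : dag[n].succs)
      if (--dag[s].numPreds == 0) ready.push_back(s);
  }
  assert(order.size() == dag.size() && "dependence graph must be acyclic");
  return order;
}
```

The order of the two criteria is the real design decision here: swapping them turns the same driver into a latency-first scheduler, which hints at why one fixed balance is hard to get right across microarchitectures.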
>
Sorry, I made a false statement above. I tried x86, ARM, and MIPS and found
misched was not in use on any of them. So you mean misched is just a framework,
and a backend can configure TargetPassConfig to run misched both pre-RA and
post-RA, right?

 On Tue, Sep 24, 2013 at 4:07 PM, Andrew Trick <atrick at apple.com> wrote:
>
>>
>> On Sep 21, 2013, at 8:02 PM, Liu Xin <navy.xliu at gmail.com> wrote:
>>
>> > hi, LLVM,
>> >
>> > I found there is a flag DisableHazardRecognizer in TargetInstrInfo.cpp.
>> I still don't understand how LLVM detects data hazards in pre-RA-sched.
>> pre-RA-sched is based on SDNodes, and all operands are vregs. Even if you can
>> compute the operands of SDNodes, data hazards on vregs are not the same as
>> physical register data hazards. Is it useful for optimizing the processor pipeline?
>>
>> The hazard recognizer enforces the instruction itineraries that are
>> defined for some subtargets. The itineraries specify resource usage at each
>> pipeline stage and latency. The "hazards" being recognized are resource
>> conflicts, like two independent instructions using the FP unit, or read
>> after write latency. It does not deal with WAR physical register hazards.
>>
>> (Targets are migrating to a more flexible and efficient machine model now
>> that does not use the hazard recognizer.)
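To make the itinerary idea concrete: each instruction class carries a per-cycle table of the units it occupies, and the recognizer refuses to issue an instruction in a cycle where any unit it needs is already reserved. The minimal C++ sketch below shows that scoreboard check together with a read-after-write latency check. The types (`Itinerary`, `Scoreboard`) and the bitmask encoding of functional units are hypothetical and much simpler than LLVM's `InstrItineraryData` and `ScoreboardHazardRecognizer`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical functional units, one bit each.
enum : std::uint32_t { ALU = 1u << 0, FPU = 1u << 1, MEM = 1u << 2 };

// Itinerary for one instruction class: which units are busy in each cycle
// after issue, plus the latency of its result.
struct Itinerary {
  std::vector<std::uint32_t> unitsPerCycle; // [k] = units used k cycles after issue
  int latency;                              // cycles until the result can be read
};

// Reservation table indexed by absolute cycle.
class Scoreboard {
  std::vector<std::uint32_t> reserved_;
public:
  // Structural hazard: does issuing at `cycle` need a unit that is already taken?
  bool hasStructuralHazard(const Itinerary& it, int cycle) const {
    for (std::size_t k = 0; k < it.unitsPerCycle.size(); ++k) {
      std::size_t c = static_cast<std::size_t>(cycle) + k;
      if (c < reserved_.size() && (reserved_[c] & it.unitsPerCycle[k]) != 0)
        return true;
    }
    return false;
  }
  // Record the units used by an instruction issued at `cycle`.
  void reserve(const Itinerary& it, int cycle) {
    assert(!hasStructuralHazard(it, cycle) && "caller must check first");
    std::size_t end = static_cast<std::size_t>(cycle) + it.unitsPerCycle.size();
    if (reserved_.size() < end) reserved_.resize(end, 0);
    for (std::size_t k = 0; k < it.unitsPerCycle.size(); ++k)
      reserved_[static_cast<std::size_t>(cycle) + k] |= it.unitsPerCycle[k];
  }
};

// Read-after-write latency: the consumer must not issue before the
// producer's issue cycle plus the producer's latency.
bool hasRawHazard(int producerIssue, const Itinerary& producer, int consumerIssue) {
  return consumerIssue < producerIssue + producer.latency;
}

// Earliest cycle >= `from` at which `it` issues without a structural hazard.
int earliestIssue(const Scoreboard& sb, const Itinerary& it, int from) {
  int c = from;
  while (sb.hasStructuralHazard(it, c)) ++c;
  return c;
}
```

Every instruction class needs an entry per cycle offset, and the table has to be kept current as the scheduler advances; that per-cycle bookkeeping is what the newer per-resource model avoids.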
>>
> I don't understand this statement. What's the meaning of "more flexible and
> efficient machine model"? I know Intel x86 processors feature aggressive
> out-of-order execution, but many ARM and MIPS cores don't. Server processors
> may have it; embedded processors usually do not. Compiler writers still
> need to consider the instruction pipeline and multiple issue.
>
> Our processor still uses a MIPS-like multi-stage pipeline, almost the same as
> what the textbook taught me. We suffer from pipeline stalls and manage to
> improve the issue rate using instruction scheduling. For now, I use
> post-RA-sched because it can build a graph whose edges are dependencies. The
> dependencies are real, based on physical registers and instruction
> attributes. Because misched happens before register allocation, I don't
> think I can make use of it to resolve data hazards. Am I right?
>
>
> The old itineraries allow specifying which resources are used in each
> pipeline stage. It’s a full matrix.
>
> In the new machine model, you only specify the resources and number of
> cycles. It can be implemented with simple counters. This works in practice
> because it’s almost always the case that different instructions begin using
> a given resource at the same time relative to when the instruction is
> executed. Even the VLIW implementation I’ve seen in trunk could have used
> the new model.
>
> It’s efficient because the scheduler doesn’t need to manage a reservation
> table or build a state machine.
>
> It’s more flexible because predicates allow instructions to be modeled
> differently based on opcode extensions or immediate values.
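A counter-based version of the same check needs only one integer per resource. In this hypothetical C++ sketch (not LLVM's actual machine-model classes), each instruction lists the resources it occupies and for how many cycles, all starting at issue, which is the simplification described above. `mulDesc` stands in for a predicate that selects a different description based on an immediate value; the 8-bit immediate threshold and the cycle counts are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction description in the per-resource style:
// which resource, and for how many cycles; no stage-by-stage matrix.
struct ResourceUse {
  int resource; // index into the processor's resource list
  int cycles;   // cycles the resource stays busy, counted from issue
};

struct InstrDesc {
  std::vector<ResourceUse> uses;
};

// One counter per resource: the first cycle at which it is free again.
class ResourceCounters {
  std::vector<int> nextFree_;
public:
  explicit ResourceCounters(std::size_t numResources) : nextFree_(numResources, 0) {}

  // Earliest cycle >= `cycle` at which every resource the instruction needs is free.
  int earliestIssue(const InstrDesc& d, int cycle) const {
    for (const ResourceUse& u : d.uses) {
      assert(u.resource >= 0 &&
             static_cast<std::size_t>(u.resource) < nextFree_.size());
      cycle = std::max(cycle, nextFree_[u.resource]);
    }
    return cycle;
  }

  // Issue at the earliest legal cycle and advance the counters.
  int issue(const InstrDesc& d, int cycle) {
    int at = earliestIssue(d, cycle);
    for (const ResourceUse& u : d.uses)
      nextFree_[u.resource] = at + u.cycles;
    return at;
  }
};

// Stand-in for a predicated description (hypothetical): a multiply that
// occupies the multiplier for 1 cycle with a small immediate, 3 otherwise.
InstrDesc mulDesc(int imm, int mulUnit) {
  InstrDesc d;
  d.uses.push_back(ResourceUse{mulUnit, (imm >= -128 && imm < 128) ? 1 : 3});
  return d;
}
```

Compared with a reservation table, the scheduler's state here is a single integer per resource, and a predicate only has to return a different list of (resource, cycles) pairs.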
>
I got it. I do feel the scoreboard state tracker is not that useful except
for software pipelining. I have resolved my instruction pipeline stalls
using the existing post-RA-sched (TD). I will investigate the new misched
approach later. Ghassan mentioned an LLVM performance regression in a previous
thread. Do you measure the performance impact of compiler changes using
llvm-test-suite? As you said, there is no single best algorithm for
instruction scheduling, so I have to learn to measure my changes.

> The post-RA hazard that you're talking about is the job of the dependence
> graph builder. That is the same for both post-RA and MI sched. When the DAG
> builder runs before regalloc, it also has to handle virtual registers;
> that’s the only difference.
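The dependence graph builder's register handling can be sketched as follows: walking a straight-line block top-down, it records the last writer and the readers of each register and emits read-after-write (data), write-after-read (anti), and write-after-write (output) edges. This is a hypothetical toy, not LLVM's `ScheduleDAGInstrs`; it ignores memory, subregisters, and register aliasing, and it treats every register number the same way. If each virtual register is defined only once, as in SSA form, it contributes only data edges, while a reused physical register also produces anti and output edges.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <tuple>
#include <vector>

enum class DepKind { Data /* RAW */, Anti /* WAR */, Output /* WAW */ };

// Hypothetical instruction: register numbers may be virtual or physical;
// the toy does not distinguish them.
struct Instr {
  std::vector<int> defs; // registers written
  std::vector<int> uses; // registers read
};

struct Edge {
  int from, to;
  DepKind kind;
  bool operator<(const Edge& o) const {
    return std::tie(from, to, kind) < std::tie(o.from, o.to, o.kind);
  }
};

// Build register dependence edges for a straight-line block, top-down.
std::set<Edge> buildRegDeps(const std::vector<Instr>& block) {
  std::set<Edge> edges;
  std::map<int, int> lastDef;              // reg -> index of most recent writer
  std::map<int, std::vector<int>> readers; // reg -> readers since that write
  for (int i = 0; i < static_cast<int>(block.size()); ++i) {
    for (int r : block[i].uses) {
      auto d = lastDef.find(r);
      if (d != lastDef.end()) edges.insert(Edge{d->second, i, DepKind::Data});
      readers[r].push_back(i);
    }
    for (int r : block[i].defs) {
      for (int u : readers[r])
        if (u != i) edges.insert(Edge{u, i, DepKind::Anti}); // skip self (r = r op x)
      auto d = lastDef.find(r);
      if (d != lastDef.end()) edges.insert(Edge{d->second, i, DepKind::Output});
      lastDef[r] = i;
      readers[r].clear();
    }
  }
  return edges;
}
```

For the block `r1 = ...; r2 = r1 + 1; r1 = ...` this yields one data, one anti, and one output edge; renaming the second write of `r1` to a fresh register leaves only the data edge.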
>
> The best way for me to explain how to define a machine model for an
> in-order processor would be to work with someone who is ready to migrate
> mips or a simple ppc, arm, or x86 (atom) implementation and improve the
> docs along the way.
>
> We’re also lacking a model for AVX!
>
> -Andy
>

thanks,
--lx