[llvm-commits] [llvm] r85697 - in /llvm/trunk: lib/Target/ARM/ARMInstrNEON.td test/CodeGen/ARM/fmacs.ll test/CodeGen/ARM/fnmacs.ll test/CodeGen/Thumb2/cross-rc-coalescing-2.ll

Mon Nov 2 09:26:25 PST 2009

The itineraries are much more expressive now. They can model arbitrary
resource usage as well as operand use and def stages (see
ARMScheduleV7.td... don't get me started on the Cortex-A8
microarchitecture ;-). I've been comparing our Cortex-A8 scheduling
against Shark, and so far it is quite accurate. There is also a generic
callback into the subtarget that lets it adjust latency for cases that
don't fit the model, though I haven't used that yet.
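
To make the "operand use and def stages" idea concrete, here is a minimal
sketch of how a per-operand latency could be derived from such stages. It is
not the actual LLVM code: the `Itinerary` struct and `operandLatency`
function are hypothetical names, the clamp to zero is an assumption of this
sketch, and any cycle numbers fed to it are for illustration only, not real
Cortex-A8 timings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified itinerary: for each operand index, the pipeline
// cycle in which that operand is written (for a def) or read (for a use).
struct Itinerary {
    std::vector<int> operandCycles;
};

// Latency of a true dependency between two instructions. If the producer
// writes its result in cycle D and the consumer reads that value in cycle U,
// the consumer must issue at least D - U + 1 cycles after the producer.
// The result depends on both the defining instruction and the use.
// Assumption of this sketch: negative results are clamped to zero.
int operandLatency(const Itinerary &def, std::size_t defIdx,
                   const Itinerary &use, std::size_t useIdx) {
    assert(defIdx < def.operandCycles.size());
    assert(useIdx < use.operandCycles.size());
    int defCycle = def.operandCycles[defIdx];
    int useCycle = use.operandCycles[useIdx];
    int latency = defCycle - useCycle + 1;
    return latency > 0 ? latency : 0;
}
```

With made-up numbers, a producer that writes its result in cycle 9 feeding a
consumer that reads the operand in cycle 2 gives 9 - 2 + 1 = 8 cycles, while
a consumer that reads the same value later, in cycle 5, only needs 5.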

I assume this is an important case that is not being handled  
correctly. Is there an example?

David

On Nov 1, 2009, at 10:31 AM, Evan Cheng wrote:

>
> On Nov 1, 2009, at 10:22 AM, Anton Korobeynikov wrote:
>
>> Hello, Evan
>>
>>> On the other hand, a vmla.32 followed by another vmla.32 is just  
>>> fine. And
>>> it is faster than vmul + vadd. I agree we should try to solve it  
>>> better.
>>> Perhaps expanding it before or during schedule2.
>> Right, NEON scheduling is tricky. It seems that our instruction
>> itineraries are not expressive enough for such complex pipelines.
>
> I think we should be able to handle at least the true dependency
> cases. Instruction latency is a function of both the defining
> instruction and the use. cc'ing David for his comments.
>
> Evan
>
>>
>> -- 
>> With best regards, Anton Korobeynikov
>> Faculty of Mathematics and Mechanics, Saint Petersburg State  
>> University
>