[PATCH] D126700: [MachineScheduler] Order more stores by ascending address

Tue Jun 7 10:52:13 PDT 2022

dmgreen added a comment.

OK I see, thanks. In this version on godbolt I only see ordered STPQ's, perhaps you have some downstream differences that would alter the codegen?
https://godbolt.org/z/1nbr7Eh8r

With this kind of thing we always have two options. Either we can do it unconditionally for all cpus, or we can add a subtarget tuning feature to steer the codegen for specific cpus. There are a number already like FeatureFuseAES that control fusing in the machine scheduler. Adding another to control a more aggressive ordering in AArch64PostRASchedStrategy sounds perfectly sensible to me.

The other option is to do this for all cpus. The general rule there is it should be something that we expect to be beneficial or benign on all/most cpus that would run -mcpu=generic code. I'm not strongly against that for this patch, we need to check that with benchmarks and I did see a case in testing which changed the code from:

  STR      q1,[sp,#0x40]     
  LDP      q1,q2,[x25,#-0x10]
  STR      q0,[sp,#0x60]     
  LDUR     q0,[x24,#-0x10]   
  FDIV     v0.2D,v0.2D,v1.2D 
  STP      q2,q1,[sp,#0x20]  
  ...
  =>
  STR      q1,[sp,#0x40]     
  LDP      q1,q2,[x25,#-0x10]
  STP      q2,q1,[sp,#0x20]  
  STR      q0,[sp,#0x60]     
  LDUR     q0,[x24,#-0x10]   
  FDIV     v0.2D,v0.2D,v1.2D 
  ...

Those clumped-together stores were apparently a little slower. Perhaps it was just unlucky, and it is only one case. I would guess that in-order memory operations are a little easier for prefetchers than ones that bobble around, but I don't have any evidence of that, and prefetchers are usually pretty sophisticated nowadays. It would take a complex case for something like that to matter, and complex cases can easily defy expectation.

So - do you have any other benchmark results, on any other cpus? Or as an alternative, do you think that adding a subtarget feature is an OK solution?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126700/new/

https://reviews.llvm.org/D126700