[PATCH] D44428: [X86][SSE] Treat (V)MOVAPD/(V)MOVUPD + (V)MOVAPS/(V)MOVUPS reg-reg instructions as moves not shuffles

Tue Mar 13 08:30:55 PDT 2018

courbet added a comment.

Hi Simon,

Can you elaborate on how you used llvm-mca to derive this?

Using the compute_itineraries <https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc> tool that I've mentioned in the past on a haswell machine, I see that `VMOVUPSrr` and other variants use only `HWPort5`, which would make `WriteFShuffle` more accurate.
For sandybridge, results are consistent (only HWPort5 is used).
(Note that we have an LLVM version of this tool for which we'll send an RFC shortly).

Also note that most CPU models override the sched class specifically for these instructions, e.g. on haswell:

  def HWWriteResGroup4 : SchedWriteRes<[HWPort5]> {
    let Latency = 1;
    let NumMicroOps = 1;
    let ResourceCycles = [1];
  }
  def: InstRW<[HWWriteResGroup4], (instregex "VMOVUPSrr")>;

So the change here is not going to affect the CPU models that carry such a per-instruction override.
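
To illustrate the override point, here is a toy model in Python (not LLVM code, just a sketch of the lookup order as I understand it). The function `resolve_ports`, the port names `PortA`/`PortB`, the opcode `HYPOTHETICALrr` and the use of `WriteFMove` as the new default class are all assumptions made for illustration; only `VMOVUPSrr`, `HWPort5` and `WriteFShuffle` come from the above.

```python
def resolve_ports(opcode, default_class, class_ports, overrides):
    """Return the ports an opcode is modelled on.

    A per-instruction override (the InstRW analogue) wins over the
    generic class mapping (the SchedWrite analogue).
    """
    if opcode in overrides:
        return overrides[opcode]
    return class_ports[default_class]


# Hypothetical generic mappings; PortA/PortB are placeholders, not real values.
class_ports = {"WriteFShuffle": ["PortA"], "WriteFMove": ["PortB"]}

# Per-instruction override, mirroring the haswell InstRW entry quoted above.
overrides = {"VMOVUPSrr": ["HWPort5"]}

# Default class before and after the patch (the "after" name is an assumption).
before = resolve_ports("VMOVUPSrr", "WriteFShuffle", class_ports, overrides)
after = resolve_ports("VMOVUPSrr", "WriteFMove", class_ports, overrides)

# An opcode with no override does follow the default class.
plain_before = resolve_ports("HYPOTHETICALrr", "WriteFShuffle", class_ports, overrides)
plain_after = resolve_ports("HYPOTHETICALrr", "WriteFMove", class_ports, overrides)
```

In this sketch `before` and `after` are identical because the override is consulted first, whereas the opcode without an override picks up whatever the new default class maps to.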

Repository:
  rL LLVM

https://reviews.llvm.org/D44428