[PATCH] D47723: [CodeGen] print max throughput for 0-latency insts

Tue Jun 5 03:58:55 PDT 2018

andreadb added a comment.

In https://reviews.llvm.org/D47723#1121600, @spatel wrote:

> Patch updated:
>  Please have a close look at the diffs because I'm not sure how to test all of these paths.
>
> The cases that I see behaving as intended via tests are the recent variant scheduled "xorps %xmm0, %xmm0" and existing non-zero latency cases like:
>  ; ZNVER1-SSE-NEXT:    ldmxcsr -{{[0-9]+}}(%rsp) # sched: [100:?]
>  (those cases are all unchanged, as I think is expected for this patch...but let me know if we want to treat that differently)

I think that `IssueWidth/NumMicroOps` should always be the upper bound for the throughput. So, I am of the opinion that we should not restrict this change to zero latency instructions.

> The big logical difference in this patch is that instead of using an Optional, we rely on FP NaN to represent the unknown state. Then, we modify the throughput calcs to account for the cases where no execution resources are specified by using default/max issue widths.

I think it is easier to use 0.0 as the default "unknown" value.
The reciprocal throughput can never be 0.0; a RThrougput of zero effectively means that the throughput tends to infinite. This is an impossible scenario since we don't have infinite parallelism in hardware (i.e. we cannot model infinite resources), and we cannot execute an infinite number of instructions per cycle. Essentially, 0.0 is our asymptote.
So, it is safe to use 0.0 as the special "unknown" throughput. For instructions that don't consume resource cycles we would still return N/IssueWidth.
Also, you don't need to use the "unknown" value in any max computation (see comments below).

================
Comment at: lib/CodeGen/TargetSchedule.cpp:336
+
+  return std::nan("");
 }
----------------
I would just return 0.0 instead of NaN.

================
Comment at: lib/CodeGen/TargetSchedule.cpp:351
+
+  return std::nan("");
 }
----------------
Same.

================
Comment at: lib/CodeGen/TargetSubtargetInfo.cpp:74
   raw_string_ostream CS(Comment);
-  if (RThroughput.hasValue())
-    CS << SchedPrefix << Latency << format(":%2.2f", RThroughput.getValue())
+  if (!std::isnan(RThroughput))
+    CS << SchedPrefix << Latency << format(":%2.2f", RThroughput)
----------------
If we use 0.0 as the unknown value, then this would become:
```
IF (Rthroughput != 0.0)
```

================
Comment at: lib/MC/MCSchedule.cpp:91-111
+  // fmin(NaN, number) returns number. But this function can return NaN if a
+  // schedule class specifies a latency without specifying a valid throughput.
+  double Throughput = std::numeric_limits<double>::quiet_NaN();
   const MCSchedModel &SM = STI.getSchedModel();
   const MCWriteProcResEntry *I = STI.getWriteProcResBegin(&SCDesc);
   const MCWriteProcResEntry *E = STI.getWriteProcResEnd(&SCDesc);
+
----------------
My opinion is that we should not limit this to zero-latency instructions.

Here, you could have just reused the original code which computes the Throughput as a Optional<double>.

If Throughput is not defined, then simply return `SCDesc.NumMicroOps / SM.IssueWidth`.
Basically, you should be able to only change the return statement.

================
Comment at: lib/MC/MCSchedule.cpp:149
+  if (I == E)
+    Throughput = DefaultIssueWidth;
+
----------------
You can simply return `1/DefaultIssueWidth`.
This computation may not accurate; ideally, we should use the actual IssueWidth from the scheduling model. In practice, this code is not used: x86 models don't use itineraries, and - as far as I know - x86 is the only target that enables the `-print-schedule` functionality.
Fixing this would probably require a change to this API, which is probably outside of the scope of this patch. For now, I think that returning a default `1/DefaultIssueWidth` is good enough.

https://reviews.llvm.org/D47723