[PATCH] D74156: [llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE

Fri Feb 7 09:58:51 PST 2020

lebedev.ri added a comment.

In D74156#1864096 <https://reviews.llvm.org/D74156#1864096>, @gchatelet wrote:

> First, thank you for the patch, this is going in the right direction.

Thank you for taking a look!

> Now stepping back a bit there are many dimensions that we'd like to explore:

Yeah, I suspect as much :)

> - argument values (which is what you started here),

Ack.
I have started with this because this is the current itch for me;
while I'm aware of the others, this seemed the most straightforward.

> - register selection

Right. Currently, unset registers are mostly picked randomly, within constraints.

> (registers of a class are not strictly equivalent [1])

I //think// that currently can't be expressed in sched models. Is that planned to change,
or do we just want to know when we fail to model things?

> - snippet generation (we always select the same pattern but exploring them would help [2])

Which is //roughly// what `SerialSnippetGenerator::generateCodeTemplates()`/`appendCodeTemplates()` does,
but somewhat more general, correct?
This is not very useful until the analysis mode learns to deal with serially chained instructions (D60000 <https://reviews.llvm.org/D60000>, currently stuck).

> - repetition mode (to see the impact of the decoder)

This //appears// to be currently handled via the `-repetition-mode` switch (D68125 <https://reviews.llvm.org/D68125>).
I'm unfamiliar with that.
Does this really have to be accounted for (as a dimension) in the greedy approach?

> Code-wise this means it would be much better to have a global sampler object
> responsible for how to explore these dimensions rather than the greedy approach we're heading towards.
> Now I understand this is a substantial redesign and I'm not asking you to do it,
> but I just wanted to share what I believe is the right direction to lower code complexity in the long run.

For context, we kind of already explore condcodes/registers by producing them randomly,
so if we run a lot of benchmarks we're bound to explore them,
but without good coverage, reproducibility, or repeatability.
That is why I'm starting with this patch: even a greedy approach is better than what we currently have.

So it's not so much that I don't want to redesign; rather,
I'm not sure I currently fully grasp the idea behind a "global sampler object".
How would that work?
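To put a number on the coverage/reproducibility point above, here is a minimal sketch, assuming the 16 valid x86 condition codes and a uniform random pick per benchmark (a simplification of what the snippet generator actually does). The names `NumCondCodes`, `drawsUntilFullCoverage` and `enumerateCondCodes` are hypothetical and not part of llvm-exegesis.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <random>

// Stand-in for X86::CondCode: x86 has 16 valid condition codes
// (O, NO, B, AE, E, NE, BE, A, S, NS, P, NP, L, GE, LE, G).
constexpr unsigned NumCondCodes = 16;

// Random exploration (roughly the status quo): keep drawing condition codes
// until every one of them has been seen at least once, and report how many
// draws (i.e. benchmarks) that took for the given seed.
unsigned drawsUntilFullCoverage(unsigned Seed) {
  std::mt19937 Gen(Seed);
  std::uniform_int_distribution<unsigned> Dist(0, NumCondCodes - 1);
  std::bitset<NumCondCodes> Seen;
  unsigned Draws = 0;
  while (!Seen.all()) {
    Seen.set(Dist(Gen));
    ++Draws;
  }
  assert(Draws >= NumCondCodes && "cannot cover N values in fewer than N draws");
  return Draws;
}

// Exhaustive exploration (what this patch aims for): exactly one snippet per
// condition code, in a fixed, reproducible order.
std::array<unsigned, NumCondCodes> enumerateCondCodes() {
  std::array<unsigned, NumCondCodes> Codes{};
  for (unsigned CC = 0; CC < NumCondCodes; ++CC)
    Codes[CC] = CC;
  return Codes;
}
```

By the coupon-collector argument, uniform random picks need on average 16 * H(16), i.e. about 54 benchmarks, before every condition code has been seen once, and which ones are covered by a shorter run depends on the seed; enumeration covers all 16 in exactly 16 runs, in the same order every time.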

>  ---

> [1] `LEA` is known to produce different latencies <https://github.com/golang/go/issues/21735> when using EBP, RBP, or R13 as base registers

Yeah, I saw that on bdver2 too, in rG76fcf900d58826d9f21c0dd7f02b61b4d59c9193 <https://reviews.llvm.org/rG76fcf900d58826d9f21c0dd7f02b61b4d59c9193>.
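For reference, a likely contributing factor (an assumption, not something measured here) is how these registers are encoded as a base: with ModRM.mod == 00, a base field of 0b101 means "no base, disp32" (RIP-relative in 64-bit mode), so `[rbp]`/`[r13]` (and `[ebp]` in 32-bit addressing) must carry an explicit zero displacement. For an `LEA` that also has an index register, this makes it a three-component (base + index + displacement) address, which some microarchitectures execute with higher latency. The sketch below only models the encoding rule; the helper names are hypothetical.

```cpp
#include <cassert>

// x86-64 register numbers as used in ModRM/SIB encoding (low 3 bits, plus the
// REX.B bit for R8-R15).
enum Reg : unsigned { RAX = 0, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
                      R8, R9, R10, R11, R12, R13, R14, R15 };

// A base whose low 3 bits are 0b101 (RBP, R13) cannot be encoded with
// mod == 00, because that slot means disp32 / RIP-relative instead. The
// assembler therefore has to emit mod == 01 with a disp8 of 0.
bool baseNeedsExplicitDisplacement(Reg Base) {
  return (Base & 0x7u) == 0x5u;
}

// Similarly, low bits 0b100 (RSP, R12) in the r/m field mean "a SIB byte
// follows", so these bases always need a SIB byte.
bool baseNeedsSIB(Reg Base) { return (Base & 0x7u) == 0x4u; }
```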

> [2] for instance
>
>   XOR EAX, EAX, EAX
>
> is self-dependent, but it's also a zero idiom; if we had also executed the back-to-back pattern, we would have learned something new:
>
>   XOR EBX, EAX, EAX
>   XOR EAX, EBX, EBX
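A minimal sketch of the two patterns from the quoted example, assuming the same `OPCODE DST, SRC1, SRC2` notation. The helper names are hypothetical, and this is not how `SerialSnippetGenerator` actually builds its code templates.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One line of a snippet in the "XOR DST, SRC1, SRC2" notation used above,
// with both sources set to the same register.
std::string xorInst(const std::string &Dst, const std::string &Src) {
  return "XOR " + Dst + ", " + Src + ", " + Src;
}

// Pattern 1: a single self-dependent instruction (also a zero idiom).
std::vector<std::string> selfDependentPattern(const std::string &Reg) {
  return {xorInst(Reg, Reg)};
}

// Pattern 2: two instructions chained back to back, alternating between two
// registers so that each instruction consumes the previous one's result.
std::vector<std::string> backToBackPattern(const std::string &A,
                                           const std::string &B) {
  return {xorInst(B, A), xorInst(A, B)};
}
```

Measuring both and comparing them is what would expose the zero-idiom special case, which the self-dependent pattern alone hides.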

Right. For latency we see

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/Target.cpp:749

+static bool RecursiveCombinationGenerator(
+    ArrayRef<SmallVector<MCOperand, 1>> VariableChoices,
----------------
gchatelet wrote:
> This function deserves some documentation
Yeah, and unit tests. This function ended up being *too* smart,
although this is the best version I was able to come up with so far.
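As a first stab at the missing documentation, here is a minimal sketch of the behaviour the name suggests: enumerate the Cartesian product of the per-variable choices (one value per variable operand) and call back for each combination. It is hypothetical and not the implementation in the patch: `int` stands in for `MCOperand`, `std::vector` for `ArrayRef`/`SmallVector`, and reading the `bool` return as "stop early" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for MCOperand, to keep the sketch self-contained.
using Operand = int;

// Calls Callback once for every combination that picks one value from each
// entry of Choices (the Cartesian product), in lexicographic order.
// If Callback returns true, enumeration stops and true is returned;
// otherwise false is returned after all combinations have been visited.
bool forEachCombination(
    const std::vector<std::vector<Operand>> &Choices,
    const std::function<bool(const std::vector<Operand> &)> &Callback) {
  std::vector<Operand> Current; // the combination being built, one per variable
  Current.reserve(Choices.size());
  std::function<bool(std::size_t)> Recurse = [&](std::size_t Dim) -> bool {
    if (Dim == Choices.size())
      return Callback(Current); // every variable has a value
    for (Operand Op : Choices[Dim]) {
      Current.push_back(Op);
      bool Stop = Recurse(Dim + 1);
      Current.pop_back();
      if (Stop)
        return true;
    }
    return false;
  };
  return Recurse(0);
}
```

The recursion depth equals the number of variable operands, so it stays shallow; a variable with no choices yields no combinations at all, which matches the usual definition of a Cartesian product.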

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D74156