[PATCH] D22011: [SystemZ] Generate fewer instructions for (sub <constant>, x)

Ulrich Weigand via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 7 08:36:07 PDT 2016

uweigand added a comment.

By longer dependency chain I mean this:  With my new variant, the final result depends on the SUBTRACT, which has two inputs, the original input and the constant in another register.  Those two inputs do not depend on one another, which means the out-of-order logic in the processor can schedule them independently of each other, and they could in particular run in parallel (or loading the constant can happen very early).  So the overall number of cycles from the point the original input is available to the point the output is available is just the latency of the SUBTRACT, and loading the constant may not affect the overall run time at all.  [ As always, this is subject to many details, and can be wrong if the OOO logic is already heavily loaded. ]

However, with your variant, the overall result depends on the ADD IMMEDIATE, which depends on the LOAD COMPLEMENT, which depends on the original input.  This means the overall latency of the operation is always the *sum* of the latencies of the two instructions.  [ Since all instructions in question here are simple integer arithmetic, they will all have a pipelined latency of 1 cycle, so your variant in effect doubles the latency. ]
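To make the comparison concrete, here is a toy model in Python.  It is only a sketch, not LLVM code: the instruction names, the uniform 1-cycle latency, and the assumption that anything independent of the input has already executed by the time the input arrives are all illustrative simplifications.  It counts the cycles from the point the input `x` is available to the point the result is available.

```python
# Toy critical-path model (illustrative only, not a SystemZ pipeline model).
# Each instruction maps to (latency_in_cycles, [names of its inputs]).
# Assumption: an instruction that does not depend on the input "x" at all
# (e.g. loading the constant) can be executed early, so it adds no delay.

def cycles_after_input(instrs, result, source="x"):
    """Cycles from `source` becoming available until `result` is available."""
    def ready(name):
        if name == source:
            return 0
        latency, deps = instrs[name]
        times = [ready(d) for d in deps]
        on_path = [t for t in times if t is not None]
        if not on_path:
            return None  # independent of the input; may run early
        return max(on_path) + latency
    return ready(result)

# Variant 1: load the constant into a register, then SUBTRACT.
subtract_variant = {
    "load_const": (1, []),
    "subtract": (1, ["load_const", "x"]),
}

# Variant 2: LOAD COMPLEMENT of the input, then ADD IMMEDIATE.
complement_variant = {
    "load_complement": (1, ["x"]),
    "add_immediate": (1, ["load_complement"]),
}

print(cycles_after_input(subtract_variant, "subtract"))         # 1
print(cycles_after_input(complement_variant, "add_immediate"))  # 2
```

Under these assumptions the SUBTRACT form costs one cycle after the input arrives, while the LOAD COMPLEMENT / ADD IMMEDIATE form costs two, because its second instruction cannot start before the first finishes.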

[ With the original code with the additional LOAD REGISTER, there's indeed also a longer dependency chain; but that is really limited to this specific test case where input and output are forced into the same register due to the ABI constraints ... in general, if the code occurs in the middle of a larger sequence, the register allocator will usually make that LOAD REGISTER go away.  In fact, I'm wondering why the register allocator doesn't make it go away even in this case by using the three-operand instruction variant ... ]

