[llvm-dev] Reg units for unaddressable register parts?
Bruce Hoult via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 29 15:42:22 PDT 2016
On Fri, Sep 30, 2016 at 3:45 AM, Krzysztof Parzyszek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> The problem is that at the moment, the last instruction in
> EAX = ...
> AX = ...
> ... = EAX
> would seem to only use the value from the second one, since AX= defines
> all lanes/units that EAX has. This kind of inaccuracy is not just
> suboptimal, it would lead to an incorrect conclusion. Currently, only
> x86-specific knowledge would tell us that the first instruction is still
> live, and I'd like to be able to tell by examining lane masks/reg units.
Code like this does work OK to merge the top half of EAX with the new
value inserted in AX (or AL, AH), but on many CPUs it is very slow,
slower than using proper machine-independent masking operations.
This is because the CPUs *themselves* track EAX and AX separately in the
register renaming machinery, and have to wait until the write to AX has
actually retired before EAX can be read again.
On Pentium Pro, P2, and P3 this caused a stall of about half a dozen
cycles. On Core2 it was reduced to 2 or 3 cycles. I'm not sure about P4,
but I suspect it was not good :-) Sometime around Nehalem or Sandy Bridge
it was finally eliminated.