[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

Thu Mar 31 02:13:34 PDT 2022

lenary added a comment.

Thanks for the review. Lots of comments inline, hopefully Phab doesn't mangle the large one.

================
Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:145
+  case ARM::VMVNq:
+    return CondCodeIsAL(3);
+  // VMOV of 64-bit value between D registers (when condition = al)
----------------
dmgreen wrote:
> Can/should all these use findFirstPredOperandIdx?
> 
> And is it worth checking for more instruction? Anything that sets a Q register, or is that too broad?
`findFirstPredOperandIdx` doesn't work as lots of these instructions are not marked `isPredicable` in the tablegen. I'm not sure if we want to solve that in this work, or as a follow-up (I'd lean towards follow-up).

I believe "anything that sets a Q register" is too broad, as we model subregister insertion as setting the the whole register in LLVM, but I'm not sure that micro-architecturally they are actually doing that. This is why I've tried to only add 64- and 128-bit setting instructions rather than ones that are less wide. Originally I also included the `VMOVv2f32` instructions that are now at the bottom of this switch, but I felt that might have been too risky.

================
Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:250
+    // Early return if no instructions are the start of an AES Pair.
+    if (!llvm::any_of(MBB.instrs(), isFirstAESPairInstr))
+      continue;
----------------
dmgreen wrote:
> This needn't scan through checking for the instruction that the loop below is going to check for too.
Ack. Will remove. I think this is vestigal from a previous (unshared) version of the patch which was doing something more complex in the loop.

================
Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:306
+          // fixup at the start of the function.
+          LLVM_DEBUG(dbgs()
+                     << "Fixup Planned: Live-In (with safe defining instrs): "
----------------
dmgreen wrote:
> Can you explain more about the IsLiveIn && UnsafeCount==0 case. Am I understanding that correctly that it would be:
> ```
> function(q0, ...)
>   lotsofcode...
>   q0 = load
>   aes q0
> ```
> Is there a better way to detect that the live-in doesn't matter in cases like that?
I don't believe there is, and this comes down to issues with the `RDA.getGlobalReachingDefs` implementation, which I want to fix/enhance, but in a follow-up patch.

To start with, this is actually not a problem, as the pass is intended to be conservative, and we know the clobbers are a no-op at the architectural level (we insert them for their micro-architectural effects). So code will still do the right thing, but maybe with a little too much overhead in the case you showed.

However, this is necessary in some other cases, such as:

```
function(q0)
   code
   conditional branch to L2
L1:
   q0 = safe_op(…)
   branch to L3
L2:
   code without update to q0
L3:
   aes q0
```

In this case, `AllDefs` is a set containing one single defining instruction for Q0, because there is only one within the function (which is all that `RDA.getGlobalReachingDefs` can report instructions for).

But in my example, we *need* to protect q0 on the other paths, because the safe definitions of q0, when considered as a set, do not entirely dominate the AES use of q0 (this is slightly stretching the conventional definition of dominance, but think of this as "there exists a path from entry to the aes, which does not contain any of the safe instructions". Sadly, MachineDominance doesn't allow us to make this kind of query either!).

In this case though, it is safe to insert the protection at function entry, because that will (by definition) dominate all the AES uses, and the protection doesn't need to be dominated by the safe definitions, as we know they're safe.

I intend to follow-up this initial patch with an enhancement to ReachingDefAnalysis which will also provide the information that you have a set of defs inside the function, and also you're live-in, as this is required info for any conservative pass using the ReachingDefAnalysis. I felt, however, that given the pass is safe as-is, it was good to proceed without this enhancement.

================
Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:585

+  addPass(createARMFixCortexA57AES1742098Pass());
+
----------------
dmgreen wrote:
> Passes can't insert new instructions (or move things further apart) after ConstantIslandPass. The branches/constant pools it has created may go out of range of the instructions that use them. Would it be OK to move it before that?
TIL. I'll add a comment about the constant island pass as well.

Should I also look at the restrictions on the Branch Targets pass? I can imagine we also don't want to separate instructions once we've calculated their targets locally?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720