[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

Tue Apr 5 08:59:34 PDT 2022

lenary marked 6 inline comments as done.
lenary added a comment.

A few comments before I post the next version of the patch.

================
Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:145
+  case ARM::VMVNq:
+    return CondCodeIsAL(3);
+  // VMOV of 64-bit value between D registers (when condition = al)
----------------
lenary wrote:
> dmgreen wrote:
> > Can/should all these use findFirstPredOperandIdx?
> > 
> > And is it worth checking for more instruction? Anything that sets a Q register, or is that too broad?
> `findFirstPredOperandIdx` doesn't work as lots of these instructions are not marked `isPredicable` in the tablegen. I'm not sure if we want to solve that in this work, or as a follow-up (I'd lean towards follow-up).
> 
> I believe "anything that sets a Q register" is too broad, as we model subregister insertion as setting the the whole register in LLVM, but I'm not sure that micro-architecturally they are actually doing that. This is why I've tried to only add 64- and 128-bit setting instructions rather than ones that are less wide. Originally I also included the `VMOVv2f32` instructions that are now at the bottom of this switch, but I felt that might have been too risky.
I'm wrong about this. The tablegen `isPredicable` is an override, other code might also set `isPredicable` to true, so I think `findFirstPredOperandIdx` should work too.

================
Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:200-207
+  case ARM::VMOVv2f32:
+  case ARM::VMOVv4f32:
+  case ARM::VMOVv2i32:
+  case ARM::VMOVv4i32:
+  case ARM::VMOVv4i16:
+  case ARM::VMOVv8i16:
+  case ARM::VMOVv8i8:
----------------
dmgreen wrote:
> Are these vmov of an immediate? Are they not safe?
> 
> I was expecting it to be the lanes sets (VSETLNi8) and other scalar instructions that were unsafe.
I have received the information on what is safe and what is not, and the next version of the patch will have this correct.

================
Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:306
+          // fixup at the start of the function.
+          LLVM_DEBUG(dbgs()
+                     << "Fixup Planned: Live-In (with safe defining instrs): "
----------------
dmgreen wrote:
> lenary wrote:
> > dmgreen wrote:
> > > Can you explain more about the IsLiveIn && UnsafeCount==0 case. Am I understanding that correctly that it would be:
> > > ```
> > > function(q0, ...)
> > >   lotsofcode...
> > >   q0 = load
> > >   aes q0
> > > ```
> > > Is there a better way to detect that the live-in doesn't matter in cases like that?
> > I don't believe there is, and this comes down to issues with the `RDA.getGlobalReachingDefs` implementation, which I want to fix/enhance, but in a follow-up patch.
> > 
> > To start with, this is actually not a problem, as the pass is intended to be conservative, and we know the clobbers are a no-op at the architectural level (we insert them for their micro-architectural effects). So code will still do the right thing, but maybe with a little too much overhead in the case you showed.
> > 
> > However, this is necessary in some other cases, such as:
> > 
> > ```
> > function(q0)
> >    code
> >    conditional branch to L2
> > L1:
> >    q0 = safe_op(…)
> >    branch to L3
> > L2:
> >    code without update to q0
> > L3:
> >    aes q0
> > ```
> > 
> > In this case, `AllDefs` is a set containing one single defining instruction for Q0, because there is only one within the function (which is all that `RDA.getGlobalReachingDefs` can report instructions for).
> > 
> > But in my example, we *need* to protect q0 on the other paths, because the safe definitions of q0, when considered as a set, do not entirely dominate the AES use of q0 (this is slightly stretching the conventional definition of dominance, but think of this as "there exists a path from entry to the aes, which does not contain any of the safe instructions". Sadly, MachineDominance doesn't allow us to make this kind of query either!).
> > 
> > In this case though, it is safe to insert the protection at function entry, because that will (by definition) dominate all the AES uses, and the protection doesn't need to be dominated by the safe definitions, as we know they're safe.
> > 
> > I intend to follow-up this initial patch with an enhancement to ReachingDefAnalysis which will also provide the information that you have a set of defs inside the function, and also you're live-in, as this is required info for any conservative pass using the ReachingDefAnalysis. I felt, however, that given the pass is safe as-is, it was good to proceed without this enhancement.
> > 
> > 
> > 
> > 
> OK sounds good.
One note is that the exact problem you describe does show up in the tests (in `aese_set64_via_ptr`, where the `vldr` is "safe"), so when the pass is enhanced, we will notice the improvements. 

================
Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:585

+  addPass(createARMFixCortexA57AES1742098Pass());
+
----------------
dmgreen wrote:
> lenary wrote:
> > dmgreen wrote:
> > > Passes can't insert new instructions (or move things further apart) after ConstantIslandPass. The branches/constant pools it has created may go out of range of the instructions that use them. Would it be OK to move it before that?
> > TIL. I'll add a comment about the constant island pass as well.
> > 
> > Should I also look at the restrictions on the Branch Targets pass? I can imagine we also don't want to separate instructions once we've calculated their targets locally?
> Yeah - It sounds like the BTI would need to remain as the first instruction in the block.
Turns out the AES pass doesn't *have* to come before the BTI pass, because AES instructions are only available on A-profile, and BTI is M-profile. I still will move it to before all these passes anyway, just so it's clear what is going on.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720