[PATCH] D101584: [dfsan] Fix origin tracking for fast8

Fri Apr 30 14:47:02 PDT 2021

gbalats marked an inline comment as done.
gbalats added inline comments.

================
Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:505
   void injectMetadataGlobals(Module &M);
+  Value *loadNextOrigin(Instruction *Pos, Align OriginAlign,
+                        Value **OriginAddr);
----------------
stephan.yichao.zhao wrote:
> -> loadNextOriginAndIncAddr or a similar name? So a reader can follow the main code w/o looking into the method.
The "Next" part of the name alludes to a pointer advance (how can you load the next thing without advancing the pointer). Also, the fact that the `OriginAddr` param is passed by pointer (indicating that it will be updated). However, I did add some documentation to make it clearer.

================
Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:2152
+          IRB.CreateShl(CombinedWideShadow,
+                        ConstantInt::get(DFS.IntptrTy, WideShadowBitWidth / 2));
+      Value *SecondOrigin = DFS.loadNextOrigin(Pos, OriginAlign, &OriginAddr);
----------------
stephan.yichao.zhao wrote:
> The [[ https://releases.llvm.org/2.6/docs/LangRef.html#i_shl | LangRef ]] needs both operands have the same type. 
> I guess on a 64bit system, this is fine. But making this one have type WideShadowTy would make them always consistent.
Done. Thanks for catching this!

================
Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:2191
+        Shadows.push_back(NextWideShadow);
+        Origins.push_back(DFS.loadNextOrigin(Pos, OriginAlign, &OriginAddr));
+      }
----------------
stephan.yichao.zhao wrote:
> If we lift this origin load out of the if-else, this if-else could be shared with the if-else before the for-loop, so the comments and the assertion assert(BytesPerWideShadow == 8); will also be shared.
> 
> ```
> FirstOrigin = DFS.loadNextOrigin(Pos, OriginAlign, &OriginAddr);
> if () {
>    ...
>    Origins.push_back(FirstOrigin)
> } else {
>    ...
>    Origins.push_back(FirstOrigin)
> }
> 
> ```
Done. Extracted to a lambda.

================
Comment at: llvm/test/Instrumentation/DataFlowSanitizer/origin_load.ll:258
   ; CHECK-NEXT:            %[[#WIDE_SHADOW:]] = load i64, i64* %[[#WIDE_SHADOW_PTR]], align [[#SBYTES]]
+  ; CHECK8-NEXT:           %[[#WIDE_SHADOW_LO:]] = shl i64 %[[#WIDE_SHADOW]], 32
+  ; CHECK8-NEXT:           %[[#ORIGIN_PTR2:]] = getelementptr i32, i32* %[[#ORIGIN_PTR]], i64 1
----------------
stephan.yichao.zhao wrote:
> If WIDE_SHADOW_LO is named to be WIDE_SHADOW_1; and WIDE_SHADOW is named WIDE_SHADOW_12; and ORIGIN is named ORIGIN1; 
>  and we do not reuse ORIGIN, but named new assignment to ORIGIN as ORIGIN23 if it is a select from ORIGIN2 and ORIGIN3, it helps read the code. 
I think if we use WIDE_SHADOW_1 instead of _LO it will be difficult to differentiate between the different wide shadows and their left-shifted counterparts. The naming pattern I used is adding `_LO` suffix to indicate that `WIDE_SHADOW(N)_LO` is the left-shifted version of `WIDE_SHADOW(N)`. Overwriting `ORIGIN` is essential to keep the test smaller by reusing the same expression later on (see lines 312, 316). We need a way to state that, whatever path has been taken (fast8 or fast16, combine load ptr or not), the `ORIGIN` will refer to the name of the final origin that we have computed. Otherwise, we'd have to replicate those lines as well even though they are essentially the same.

However, I can partially address this (remove some of the overwriting) when removing fast16.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101584/new/

https://reviews.llvm.org/D101584