[PATCH] D92261: [dfsan] Track field/index-level shadow values in variables

Wed Dec 9 06:45:43 PST 2020

morehouse added inline comments.

================
Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:1772
     Value *Addr8 = IRB.CreateBitCast(SI.getPointerOperand(), DFSF.DFS.Int8Ptr);
-    IRB.CreateCall(DFSF.DFS.DFSanStoreCallbackFn, {Shadow, Addr8});
+    // Value *PrimitiveShadow = DFSF.collapseToPrimitiveShadow(Shadow, &SI);
+    IRB.CreateCall(DFSF.DFS.DFSanStoreCallbackFn, {PrimitiveShadow, Addr8});
----------------
Please remove this commented-out code.

================
Comment at: llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll:181
+  ; TLS_ABI: store [2 x i16] zeroinitializer, [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN]]
+  ; TLS_ABI: %_dfsret = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
----------------
stephan.yichao.zhao wrote:
> morehouse wrote:
> > I'm not familiar with the custom wrapper logic.  Why does this function store 0 to arg TLS?
> This seems the effect of the existing design.
> 
> dfs$call_custom_cb calls the customized @__dfsw_custom_cb.
> The user-provided @__dfsw_custom_cb calls @call_custom_cb.
> 
> And the instrumentation around "call %cb" inside @call_custom_cb is by this [[ https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp#L767 | visitCallInst ]]. 
> 
> This visitor is from
> ```
>    DFSanFunction DFSF(*this, F, /*IsNativeABI=*/true);
> ```
> 
> When IsNativeABI is one, getShadow always returns [[ https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp#L1153 | 0 ]] for arguments, and 
> @custom_cb [[ https://github.com/llvm/llvm-project/blame/main/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp#L1671 | does not ]] return shadow.
> 
> @__dfsw_custom_cb receives dfst0$custom_cb and @"dfs$cb" instead of @cb.
> @"dfs$cb" is a dfsan @cb that uses TLS to pass shadows at args/ret, while dfst0$custom_cb is a wrapper of @"dfs$cb", and dfst0$custom_cb passes shadows by additional arguments.
> 
> I think in @__dfsw_custom_cb, users' code is like
> 
> 
> ```
> def @__dfsw_custom_cb(@"dfst0$custom_cb, @"dfs$cb", arg1, arg2, ..., arg_shadow1, arg_shadow2, ..., ret_shadow) {
>     auto cb_ret_shadow;
>     auto my_cb = [&] (...) {
>       ...
>       ret @"dfst0$custom_cb"(@"dfs$cb", cb_arg1, cb_arg2, ..., cb_arg_shadow1, cb_arg_shadow2, ..., cb_ret_shadow);
>     }
>     auto r = @custom_cb(my_cb, ...)
>     // set ret_shadow in terms of all shadows...
> }
> ```
> 
> The puzzle is that all arguments shadow are [[ https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp#L766 | assigned ]] to this DFSF, although they should never be used. It is from the [[ https://github.com/llvm/llvm-project/commit/28a10aff4828bd07cfda28ccc6048c775b5b9f60 | first version ]].
> 
> I am not sure this is supposed to be the correct custom wrapper of a function with callbacks.
> 
> 
> 
> 
I think I understand...  Normally `custom_cb` would not be instrumented by DFSan (hence why we added it to the ABI list).  So probably whatever happens here is unimportant.  Maybe we should remove all these checks except the first line:

```
; TLS_ABI: define { i1, i7 } @custom_cb({ i1, i7 } ({ i32, i1 }, [2 x i7])* %cb, { i32, i1 } %a, [2 x i7] %b)
```

================
Comment at: llvm/test/Instrumentation/DataFlowSanitizer/array.ll:245
+  ; FAST16: [[E:%.*]] = extractvalue [17 x i16] {{.*}}, 16
+  ; FAST16: [[S:%.*]] = or i16 {{.*}}, [[E]]
+  ; FAST16: [[V1:%.*]] = insertelement <8 x i16> undef, i16 [[S]], i32 0
----------------
stephan.yichao.zhao wrote:
> morehouse wrote:
> > Shouldn't there be more ORs for each element in `%a`?
> This is where the current diff loses accuracy. 
> When saving an aggregate value into memory, we call that collapse function to convert an accurate shadow to a i16 label.
> So this diff only increases accuracy for variables, arguments and ret.
> 
> This works for O1-compiled targets, because alloca premotion removes lots of memory operations, and practice code does not save aggregate types to memory.
> If we build by O0, it does not work as those pair.cc and struct.c test.
> 
> We need to address this in the next change.
So there *should* be more ORs in the current diff, right?  But the plan is to fix this, so that's why they aren't listed here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92261/new/

https://reviews.llvm.org/D92261