[compiler-rt] ea98116 - [dfsan] Track field/index-level shadow values in variables

Jianzhou Zhao via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 9 11:39:17 PST 2020


Author: Jianzhou Zhao
Date: 2020-12-09T19:38:35Z
New Revision: ea981165a4ef2d6e8be0655f04cc4b61604db6d4

URL: https://github.com/llvm/llvm-project/commit/ea981165a4ef2d6e8be0655f04cc4b61604db6d4
DIFF: https://github.com/llvm/llvm-project/commit/ea981165a4ef2d6e8be0655f04cc4b61604db6d4.diff

LOG: [dfsan] Track field/index-level shadow values in variables

*************
* The problem
*************
See the motivating examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a single 16-bit shadow value for a variable of any type,
formed by combining the shadow values of all the variable's bytes. So it
cannot distinguish two fields of a struct: each field's shadow value equals
the combined shadow value of all fields. This introduces an overtaint issue.

Consider a parsing function

   std::pair<char*, int> get_token(char* p);

where p points to the buffer to parse; the returned pair contains the next
token and a pointer to the position in the buffer just after the token.

If the token is tainted, then both the returned pointer and the int are
tainted. If the parser keeps using get_token for the rest of the parsing,
all the following outputs are tainted because of the tainted pointer.
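
A hedged sketch of this behavior in terms of the public dfsan interface
(dfsan_read_label is from <sanitizer/dfsan_interface.h>; get_token is the
hypothetical parser above):

   std::pair<char*, int> r = get_token(p); // token bytes in *p are tainted
   dfsan_label l1 = dfsan_read_label(&r.first, sizeof(r.first));
   dfsan_label l2 = dfsan_read_label(&r.second, sizeof(r.second));
   // With one combined shadow per variable, l1 == l2: the position pointer
   // carries the token's label, so everything derived from it is tainted.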

This CL is the first change to address the issue.

**************************
* The proposed improvement
**************************
Eventually, all fields and indices will have their own shadow values in
variables and memory.

For example, variables with types {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values with types {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] respectively; variables with primitive
types still have i16 shadow values.
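
A hedged sketch of the user-visible effect, adapted from the assertions in
the updated compiler-rt/test/dfsan/pair.cpp below (fast16 labels, >= -O1;
the labels 2 and 8 follow that test's setup):

   std::pair<int *, int> pair1 = {ptr, i}; // ptr labeled 2, i labeled 8
   int i1 = pair1.second;
   int *ptr1 = pair1.first;
   assert(dfsan_read_label(&i1, sizeof(i1)) == 8);     // only i's label
   assert(dfsan_read_label(&ptr1, sizeof(ptr1)) == 2); // only ptr's label
   // At -O0, where shadows are still combined, both reads return 10 (2 | 8).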

*********************************
* A potential implementation plan
*********************************

The idea is to adopt the change incrementally.

1) This CL:
Support field-level accuracy for variables/args/ret in TLS mode;
load/store/alloca still use combined shadow values.

After the alloca promotion and SSA construction phases (>= -O1), we
assume allocas and memory operations are largely eliminated. So if struct
variables do not interact with memory, their tracking is accurate at the
field level.

2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store

These two should make O0 and real memory access work.

4) Support vector types if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.

***************
* About this CL
***************
This CL does the following:

1) extended TLS arg/ret to work with aggregate types. This is similar
to what MSan does.

2) implemented the mapping from an original type/value/zero-const to
its shadow type/value/zero-const.

3) extended (insert|extract)value to use field/index-level propagation.

4) for other instructions, the propagation rule is to combine inputs by
OR. The CL converts between aggregate and primitive shadow values in
these cases.

5) Custom function interfaces also need such a conversion because all
existing custom functions use i16. It is not yet clear whether custom
functions need more accurate shadow propagation.

6) added test cases for aggregate types.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92261

Added: 
    llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll
    llvm/test/Instrumentation/DataFlowSanitizer/array.ll
    llvm/test/Instrumentation/DataFlowSanitizer/struct.ll
    llvm/test/Instrumentation/DataFlowSanitizer/vector.ll

Modified: 
    compiler-rt/test/dfsan/pair.cpp
    compiler-rt/test/dfsan/struct.c
    llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
    llvm/test/Instrumentation/DataFlowSanitizer/phi.ll
    llvm/test/Instrumentation/DataFlowSanitizer/store.ll

Removed: 
    


################################################################################
diff  --git a/compiler-rt/test/dfsan/pair.cpp b/compiler-rt/test/dfsan/pair.cpp
index 830fe393b17b..52fa8bdde7e8 100644
--- a/compiler-rt/test/dfsan/pair.cpp
+++ b/compiler-rt/test/dfsan/pair.cpp
@@ -1,4 +1,5 @@
-// RUN: %clangxx_dfsan %s -mllvm -dfsan-fast-16-labels -mllvm -dfsan-track-select-control-flow=false -mllvm -dfsan-combine-pointer-labels-on-load=false -o %t && %run %t
+// RUN: %clangxx_dfsan %s -mllvm -dfsan-fast-16-labels -mllvm -dfsan-track-select-control-flow=false -mllvm -dfsan-combine-pointer-labels-on-load=false -O0 -DO0 -o %t && %run %t
+// RUN: %clangxx_dfsan %s -mllvm -dfsan-fast-16-labels -mllvm -dfsan-track-select-control-flow=false -mllvm -dfsan-combine-pointer-labels-on-load=false -O1 -o %t && %run %t
 
 #include <algorithm>
 #include <assert.h>
@@ -64,29 +65,49 @@ void test_simple_constructors() {
   int i1 = pair1.second;
   int *ptr1 = pair1.first;
 
+#ifdef O0
   assert(dfsan_read_label(&i1, sizeof(i1)) == 10);
   assert(dfsan_read_label(&ptr1, sizeof(ptr1)) == 10);
+#else
+  assert(dfsan_read_label(&i1, sizeof(i1)) == 8);
+  assert(dfsan_read_label(&ptr1, sizeof(ptr1)) == 2);
+#endif
 
   std::pair<int *, int> pair2 = copy_pair1(pair1);
   int i2 = pair2.second;
   int *ptr2 = pair2.first;
 
+#ifdef O0
   assert(dfsan_read_label(&i2, sizeof(i2)) == 10);
   assert(dfsan_read_label(&ptr2, sizeof(ptr2)) == 10);
+#else
+  assert(dfsan_read_label(&i2, sizeof(i2)) == 8);
+  assert(dfsan_read_label(&ptr2, sizeof(ptr2)) == 2);
+#endif
 
   std::pair<int *, int> pair3 = copy_pair2(&pair1);
   int i3 = pair3.second;
   int *ptr3 = pair3.first;
 
+#ifdef O0
   assert(dfsan_read_label(&i3, sizeof(i3)) == 10);
   assert(dfsan_read_label(&ptr3, sizeof(ptr3)) == 10);
+#else
+  assert(dfsan_read_label(&i3, sizeof(i3)) == 8);
+  assert(dfsan_read_label(&ptr3, sizeof(ptr3)) == 2);
+#endif
 
   std::pair<int *, int> pair4 = copy_pair3(std::move(pair1));
   int i4 = pair4.second;
   int *ptr4 = pair4.first;
 
+#ifdef O0
   assert(dfsan_read_label(&i4, sizeof(i4)) == 10);
   assert(dfsan_read_label(&ptr4, sizeof(ptr4)) == 10);
+#else
+  assert(dfsan_read_label(&i4, sizeof(i4)) == 8);
+  assert(dfsan_read_label(&ptr4, sizeof(ptr4)) == 2);
+#endif
 }
 
 void test_branches() {
@@ -118,14 +139,24 @@ void test_branches() {
 
     {
       std::pair<const char *, uint32_t> r = return_ptr_and_i32(q, res);
+#ifdef O0
       assert(dfsan_read_label(&r.first, sizeof(r.first)) == 10);
       assert(dfsan_read_label(&r.second, sizeof(r.second)) == 10);
+#else
+      assert(dfsan_read_label(&r.first, sizeof(r.first)) == 2);
+      assert(dfsan_read_label(&r.second, sizeof(r.second)) == 8);
+#endif
     }
 
     {
       std::pair<const char *, uint64_t> r = return_ptr_and_i64(q, res);
+#ifdef O0
       assert(dfsan_read_label(&r.first, sizeof(r.first)) == 10);
       assert(dfsan_read_label(&r.second, sizeof(r.second)) == 10);
+#else
+      assert(dfsan_read_label(&r.first, sizeof(r.first)) == 2);
+      assert(dfsan_read_label(&r.second, sizeof(r.second)) == 8);
+#endif
     }
   }
 }

diff  --git a/compiler-rt/test/dfsan/struct.c b/compiler-rt/test/dfsan/struct.c
index 6441ad4de163..db31567f584b 100644
--- a/compiler-rt/test/dfsan/struct.c
+++ b/compiler-rt/test/dfsan/struct.c
@@ -1,4 +1,7 @@
-// RUN: %clang_dfsan %s -o %t && %run %t
+// RUN: %clang_dfsan %s -O1 -mllvm -dfsan-fast-16-labels=true -DFAST16_O1 -o %t && %run %t
+// RUN: %clang_dfsan %s -O1 -DO1 -o %t && %run %t
+// RUN: %clang_dfsan %s -O0 -mllvm -dfsan-fast-16-labels=true -DFAST16_O0 -o %t && %run %t
+// RUN: %clang_dfsan %s -O0 -DO0 -o %t && %run %t
 
 #include <assert.h>
 #include <sanitizer/dfsan_interface.h>
@@ -35,9 +38,14 @@ Pair copy_pair2(const Pair pair0) {
 int main(void) {
   int i = 1;
   char *ptr = NULL;
+#if defined(FAST16_O1) || defined(FAST16_O0)
+  dfsan_label i_label = 1;
+  dfsan_label ptr_label = 2;
+#else
   dfsan_label i_label = dfsan_create_label("i", 0);
-  dfsan_set_label(i_label, &i, sizeof(i));
   dfsan_label ptr_label = dfsan_create_label("ptr", 0);
+#endif
+  dfsan_set_label(i_label, &i, sizeof(i));
   dfsan_set_label(ptr_label, &ptr, sizeof(ptr));
 
   Pair pair1 = make_pair(i, ptr);
@@ -46,10 +54,18 @@ int main(void) {
 
   dfsan_label i1_label = dfsan_read_label(&i1, sizeof(i1));
   dfsan_label ptr1_label = dfsan_read_label(&ptr1, sizeof(ptr1));
+#if defined(O0) || defined(O1)
   assert(dfsan_has_label(i1_label, i_label));
   assert(dfsan_has_label(i1_label, ptr_label));
   assert(dfsan_has_label(ptr1_label, i_label));
   assert(dfsan_has_label(ptr1_label, ptr_label));
+#elif defined(FAST16_O0)
+  assert(i1_label == (i_label | ptr_label));
+  assert(ptr1_label == (i_label | ptr_label));
+#else
+  assert(i1_label == i_label);
+  assert(ptr1_label == ptr_label);
+#endif
 
   Pair pair2 = copy_pair1(&pair1);
   int i2 = pair2.i;
@@ -57,10 +73,18 @@ int main(void) {
 
   dfsan_label i2_label = dfsan_read_label(&i2, sizeof(i2));
   dfsan_label ptr2_label = dfsan_read_label(&ptr2, sizeof(ptr2));
+#if defined(O0) || defined(O1)
   assert(dfsan_has_label(i2_label, i_label));
   assert(dfsan_has_label(i2_label, ptr_label));
   assert(dfsan_has_label(ptr2_label, i_label));
   assert(dfsan_has_label(ptr2_label, ptr_label));
+#elif defined(FAST16_O0)
+  assert(i2_label == (i_label | ptr_label));
+  assert(ptr2_label == (i_label | ptr_label));
+#else
+  assert(i2_label == i_label);
+  assert(ptr2_label == ptr_label);
+#endif
 
   Pair pair3 = copy_pair2(pair1);
   int i3 = pair3.i;
@@ -68,10 +92,19 @@ int main(void) {
 
   dfsan_label i3_label = dfsan_read_label(&i3, sizeof(i3));
   dfsan_label ptr3_label = dfsan_read_label(&ptr3, sizeof(ptr3));
+#if defined(O0) || defined(O1)
   assert(dfsan_has_label(i3_label, i_label));
   assert(dfsan_has_label(i3_label, ptr_label));
   assert(dfsan_has_label(ptr3_label, i_label));
   assert(dfsan_has_label(ptr3_label, ptr_label));
+#elif defined(FAST16_O0)
+  assert(i3_label == (i_label | ptr_label));
+  assert(ptr3_label == (i_label | ptr_label));
+#else
+  assert(i3_label == i_label);
+  assert(ptr3_label == ptr_label);
+#endif
+
 
   return 0;
 }

diff  --git a/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
index 87cf67705324..597f413e9043 100644
--- a/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
@@ -362,9 +362,7 @@ class DataFlowSanitizer {
   Module *Mod;
   LLVMContext *Ctx;
   Type *Int8Ptr;
-  /// The shadow type for all primitive types. Until we support field/index
-  /// level shadow values, aggregate and vector types also use this shadow
-  /// type.
+  /// The shadow type for all primitive types and vector types.
   IntegerType *PrimitiveShadowTy;
   PointerType *PrimitiveShadowPtrTy;
   IntegerType *IntptrTy;
@@ -419,13 +417,33 @@ class DataFlowSanitizer {
 
   bool init(Module &M);
 
-  /// Returns a zero constant with the shadow type of V's type. Until we support
-  /// field/index level shadow values, the following methods always return
-  /// primitive types, values or zero constants.
+  /// Returns whether the pass tracks labels for struct fields and array
+  /// indices. Only supported in fast16 mode with the TLS ABI.
+  bool shouldTrackFieldsAndIndices();
+
+  /// Returns a zero constant with the shadow type of OrigTy.
+  ///
+  /// getZeroShadow({T1,T2,...}) = {getZeroShadow(T1),getZeroShadow(T2),...}
+  /// getZeroShadow([n x T]) = [n x getZeroShadow(T)]
+  /// getZeroShadow(other type) = i16(0)
+  ///
+  /// Note that a zero shadow is always i16(0) when shouldTrackFieldsAndIndices
+  /// returns false.
+  Constant *getZeroShadow(Type *OrigTy);
+  /// Returns a zero constant with the shadow type of V's type.
   Constant *getZeroShadow(Value *V);
+
   /// Checks if V is a zero shadow.
   bool isZeroShadow(Value *V);
+
   /// Returns the shadow type of OrigTy.
+  ///
+  /// getShadowTy({T1,T2,...}) = {getShadowTy(T1),getShadowTy(T2),...}
+  /// getShadowTy([n x T]) = [n x getShadowTy(T)]
+  /// getShadowTy(other type) = i16
+  ///
+  /// Note that a shadow type is always i16 when shouldTrackFieldsAndIndices
+  /// returns false.
   Type *getShadowTy(Type *OrigTy);
  /// Returns the shadow type of V's type.
   Type *getShadowTy(Value *V);
@@ -456,6 +474,11 @@ struct DFSanFunction {
   };
   /// Maps a value to its latest shadow value in terms of domination tree.
   DenseMap<std::pair<Value *, Value *>, CachedShadow> CachedShadows;
+  /// Maps a value to the latest collapsed shadow value it was converted to,
+  /// in terms of the domination tree. When ClDebugNonzeroLabels is on, this
+  /// cache is used in a post-processing step where CFG blocks are split, so
+  /// unlike CachedShadows it uses domination between values, not BasicBlocks.
+  DenseMap<Value *, Value *> CachedCollapsedShadows;
   DenseMap<Value *, std::set<Value *>> ShadowElements;
 
   DFSanFunction(DataFlowSanitizer &DFS, Function *F, bool IsNativeABI)
@@ -476,14 +499,44 @@ struct DFSanFunction {
 
   Value *getShadow(Value *V);
   void setShadow(Instruction *I, Value *Shadow);
+  /// Generates IR to compute the union of the two given shadows, inserting it
+  /// before Pos. The combined value has a primitive type.
   Value *combineShadows(Value *V1, Value *V2, Instruction *Pos);
+  /// Combines the shadow values of V1 and V2, then converts the combined
+  /// primitive shadow value into a shadow value with the original type T.
+  Value *combineShadowsThenConvert(Type *T, Value *V1, Value *V2,
+                                   Instruction *Pos);
   Value *combineOperandShadows(Instruction *Inst);
   Value *loadShadow(Value *ShadowAddr, uint64_t Size, uint64_t Align,
                     Instruction *Pos);
-  void storeShadow(Value *Addr, uint64_t Size, Align Alignment, Value *Shadow,
-                   Instruction *Pos);
+  void storePrimitiveShadow(Value *Addr, uint64_t Size, Align Alignment,
+                            Value *PrimitiveShadow, Instruction *Pos);
+  /// Applies PrimitiveShadow to all primitive subtypes of T, returning
+  /// the expanded shadow value.
+  ///
+  /// EFP({T1,T2, ...}, PS) = {EFP(T1,PS),EFP(T2,PS),...}
+  /// EFP([n x T], PS) = [n x EFP(T,PS)]
+  /// EFP(other types, PS) = PS
+  Value *expandFromPrimitiveShadow(Type *T, Value *PrimitiveShadow,
+                                   Instruction *Pos);
+  /// Collapses Shadow into a single primitive shadow value, unioning all
+  /// primitive shadow values in the process. Returns the final primitive
+  /// shadow value.
+  ///
+  /// CTP({V1,V2,...}) = UNION(CTP(V1),CTP(V2),...)
+  /// CTP([V1,V2,...]) = UNION(CTP(V1),CTP(V2),...)
+  /// CTP(V of a primitive type) = V
+  Value *collapseToPrimitiveShadow(Value *Shadow, Instruction *Pos);
 
 private:
+  /// Collapses the shadow with aggregate type into a single primitive shadow
+  /// value.
+  template <class AggregateType>
+  Value *collapseAggregateShadow(AggregateType *AT, Value *Shadow,
+                                 IRBuilder<> &IRB);
+
+  Value *collapseToPrimitiveShadow(Value *Shadow, IRBuilder<> &IRB);
+
   /// Returns the shadow value of an argument A.
   Value *getShadowForTLSArgument(Argument *A);
 };
@@ -592,14 +645,156 @@ TransformedFunction DataFlowSanitizer::getCustomFunctionType(FunctionType *T) {
 }
 
 bool DataFlowSanitizer::isZeroShadow(Value *V) {
-  return ZeroPrimitiveShadow == V;
+  if (!shouldTrackFieldsAndIndices())
+    return ZeroPrimitiveShadow == V;
+
+  Type *T = V->getType();
+  if (!isa<ArrayType>(T) && !isa<StructType>(T)) {
+    if (const ConstantInt *CI = dyn_cast<ConstantInt>(V))
+      return CI->isZero();
+    return false;
+  }
+
+  return isa<ConstantAggregateZero>(V);
+}
+
+bool DataFlowSanitizer::shouldTrackFieldsAndIndices() {
+  return getInstrumentedABI() == DataFlowSanitizer::IA_TLS && ClFast16Labels;
+}
+
+Constant *DataFlowSanitizer::getZeroShadow(Type *OrigTy) {
+  if (!shouldTrackFieldsAndIndices())
+    return ZeroPrimitiveShadow;
+
+  if (!isa<ArrayType>(OrigTy) && !isa<StructType>(OrigTy))
+    return ZeroPrimitiveShadow;
+  Type *ShadowTy = getShadowTy(OrigTy);
+  return ConstantAggregateZero::get(ShadowTy);
 }
 
 Constant *DataFlowSanitizer::getZeroShadow(Value *V) {
-  return ZeroPrimitiveShadow;
+  return getZeroShadow(V->getType());
+}
+
+static Value *expandFromPrimitiveShadowRecursive(
+    Value *Shadow, SmallVector<unsigned, 4> &Indices, Type *SubShadowTy,
+    Value *PrimitiveShadow, IRBuilder<> &IRB) {
+  if (!isa<ArrayType>(SubShadowTy) && !isa<StructType>(SubShadowTy))
+    return IRB.CreateInsertValue(Shadow, PrimitiveShadow, Indices);
+
+  if (ArrayType *AT = dyn_cast<ArrayType>(SubShadowTy)) {
+    for (unsigned Idx = 0; Idx < AT->getNumElements(); Idx++) {
+      Indices.push_back(Idx);
+      Shadow = expandFromPrimitiveShadowRecursive(
+          Shadow, Indices, AT->getElementType(), PrimitiveShadow, IRB);
+      Indices.pop_back();
+    }
+    return Shadow;
+  }
+
+  if (StructType *ST = dyn_cast<StructType>(SubShadowTy)) {
+    for (unsigned Idx = 0; Idx < ST->getNumElements(); Idx++) {
+      Indices.push_back(Idx);
+      Shadow = expandFromPrimitiveShadowRecursive(
+          Shadow, Indices, ST->getElementType(Idx), PrimitiveShadow, IRB);
+      Indices.pop_back();
+    }
+    return Shadow;
+  }
+  llvm_unreachable("Unexpected shadow type");
+}
+
+Value *DFSanFunction::expandFromPrimitiveShadow(Type *T, Value *PrimitiveShadow,
+                                                Instruction *Pos) {
+  Type *ShadowTy = DFS.getShadowTy(T);
+
+  if (!isa<ArrayType>(ShadowTy) && !isa<StructType>(ShadowTy))
+    return PrimitiveShadow;
+
+  if (DFS.isZeroShadow(PrimitiveShadow))
+    return DFS.getZeroShadow(ShadowTy);
+
+  IRBuilder<> IRB(Pos);
+  SmallVector<unsigned, 4> Indices;
+  Value *Shadow = UndefValue::get(ShadowTy);
+  Shadow = expandFromPrimitiveShadowRecursive(Shadow, Indices, ShadowTy,
+                                              PrimitiveShadow, IRB);
+
+  // Cache the primitive shadow value from which this shadow value was built.
+  CachedCollapsedShadows[Shadow] = PrimitiveShadow;
+  return Shadow;
 }
 
-Type *DataFlowSanitizer::getShadowTy(Type *OrigTy) { return PrimitiveShadowTy; }
+template <class AggregateType>
+Value *DFSanFunction::collapseAggregateShadow(AggregateType *AT, Value *Shadow,
+                                              IRBuilder<> &IRB) {
+  if (!AT->getNumElements())
+    return DFS.ZeroPrimitiveShadow;
+
+  Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
+  Value *Aggregator = collapseToPrimitiveShadow(FirstItem, IRB);
+
+  for (unsigned Idx = 1; Idx < AT->getNumElements(); Idx++) {
+    Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
+    Value *ShadowInner = collapseToPrimitiveShadow(ShadowItem, IRB);
+    Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
+  }
+  return Aggregator;
+}
+
+Value *DFSanFunction::collapseToPrimitiveShadow(Value *Shadow,
+                                                IRBuilder<> &IRB) {
+  Type *ShadowTy = Shadow->getType();
+  if (!isa<ArrayType>(ShadowTy) && !isa<StructType>(ShadowTy))
+    return Shadow;
+  if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy))
+    return collapseAggregateShadow<>(AT, Shadow, IRB);
+  if (StructType *ST = dyn_cast<StructType>(ShadowTy))
+    return collapseAggregateShadow<>(ST, Shadow, IRB);
+  llvm_unreachable("Unexpected shadow type");
+}
+
+Value *DFSanFunction::collapseToPrimitiveShadow(Value *Shadow,
+                                                Instruction *Pos) {
+  Type *ShadowTy = Shadow->getType();
+  if (!isa<ArrayType>(ShadowTy) && !isa<StructType>(ShadowTy))
+    return Shadow;
+
+  assert(DFS.shouldTrackFieldsAndIndices());
+
+  // Checks if the cached collapsed shadow value dominates Pos.
+  Value *&CS = CachedCollapsedShadows[Shadow];
+  if (CS && DT.dominates(CS, Pos))
+    return CS;
+
+  IRBuilder<> IRB(Pos);
+  Value *PrimitiveShadow = collapseToPrimitiveShadow(Shadow, IRB);
+  // Caches the converted primitive shadow value.
+  CS = PrimitiveShadow;
+  return PrimitiveShadow;
+}
+
+Type *DataFlowSanitizer::getShadowTy(Type *OrigTy) {
+  if (!shouldTrackFieldsAndIndices())
+    return PrimitiveShadowTy;
+
+  if (!OrigTy->isSized())
+    return PrimitiveShadowTy;
+  if (isa<IntegerType>(OrigTy))
+    return PrimitiveShadowTy;
+  if (isa<VectorType>(OrigTy))
+    return PrimitiveShadowTy;
+  if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy))
+    return ArrayType::get(getShadowTy(AT->getElementType()),
+                          AT->getNumElements());
+  if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
+    SmallVector<Type *, 4> Elements;
+    for (unsigned I = 0, N = ST->getNumElements(); I < N; ++I)
+      Elements.push_back(getShadowTy(ST->getElementType(I)));
+    return StructType::get(*Ctx, Elements);
+  }
+  return PrimitiveShadowTy;
+}
 
 Type *DataFlowSanitizer::getShadowTy(Value *V) {
   return getShadowTy(V->getType());
@@ -760,14 +955,21 @@ Constant *DataFlowSanitizer::getOrBuildTrampolineFunction(FunctionType *FT,
     else
       RI = ReturnInst::Create(*Ctx, CI, BB);
 
+    // F is called by a wrapped custom function with primitive shadows. So
+    // its arguments and return value need conversion.
     DFSanFunction DFSF(*this, F, /*IsNativeABI=*/true);
     Function::arg_iterator ValAI = F->arg_begin(), ShadowAI = AI; ++ValAI;
-    for (unsigned N = FT->getNumParams(); N != 0; ++ValAI, ++ShadowAI, --N)
-      DFSF.ValShadowMap[&*ValAI] = &*ShadowAI;
+    for (unsigned N = FT->getNumParams(); N != 0; ++ValAI, ++ShadowAI, --N) {
+      Value *Shadow =
+          DFSF.expandFromPrimitiveShadow(ValAI->getType(), &*ShadowAI, CI);
+      DFSF.ValShadowMap[&*ValAI] = Shadow;
+    }
     DFSanVisitor(DFSF).visitCallInst(*CI);
-    if (!FT->getReturnType()->isVoidTy())
-      new StoreInst(DFSF.getShadow(RI->getReturnValue()),
-                    &*std::prev(F->arg_end()), RI);
+    if (!FT->getReturnType()->isVoidTy()) {
+      Value *PrimitiveShadow = DFSF.collapseToPrimitiveShadow(
+          DFSF.getShadow(RI->getReturnValue()), RI);
+      new StoreInst(PrimitiveShadow, &*std::prev(F->arg_end()), RI);
+    }
   }
 
   return cast<Constant>(C.getCallee());
@@ -1087,7 +1289,9 @@ bool DataFlowSanitizer::runImpl(Module &M) {
         while (isa<PHINode>(Pos) || isa<AllocaInst>(Pos))
           Pos = Pos->getNextNode();
         IRBuilder<> IRB(Pos);
-        Value *Ne = IRB.CreateICmpNE(V, DFSF.DFS.ZeroPrimitiveShadow);
+        Value *PrimitiveShadow = DFSF.collapseToPrimitiveShadow(V, Pos);
+        Value *Ne =
+            IRB.CreateICmpNE(PrimitiveShadow, DFSF.DFS.ZeroPrimitiveShadow);
         BranchInst *BI = cast<BranchInst>(SplitBlockAndInsertIfThen(
             Ne, Pos, /*Unreachable=*/false, ColdCallWeights));
         IRBuilder<> ThenIRB(BI);
@@ -1177,7 +1381,8 @@ Value *DFSanFunction::getShadow(Value *V) {
 
 void DFSanFunction::setShadow(Instruction *I, Value *Shadow) {
   assert(!ValShadowMap.count(I));
-  assert(Shadow->getType() == DFS.PrimitiveShadowTy);
+  assert(DFS.shouldTrackFieldsAndIndices() ||
+         Shadow->getType() == DFS.PrimitiveShadowTy);
   ValShadowMap[I] = Shadow;
 }
 
@@ -1197,32 +1402,38 @@ Value *DataFlowSanitizer::getShadowAddress(Value *Addr, Instruction *Pos) {
       PrimitiveShadowPtrTy);
 }
 
+Value *DFSanFunction::combineShadowsThenConvert(Type *T, Value *V1, Value *V2,
+                                                Instruction *Pos) {
+  Value *PrimitiveValue = combineShadows(V1, V2, Pos);
+  return expandFromPrimitiveShadow(T, PrimitiveValue, Pos);
+}
+
 // Generates IR to compute the union of the two given shadows, inserting it
-// before Pos.  Returns the computed union Value.
+// before Pos. The combined value has a primitive type.
 Value *DFSanFunction::combineShadows(Value *V1, Value *V2, Instruction *Pos) {
   if (DFS.isZeroShadow(V1))
-    return V2;
+    return collapseToPrimitiveShadow(V2, Pos);
   if (DFS.isZeroShadow(V2))
-    return V1;
+    return collapseToPrimitiveShadow(V1, Pos);
   if (V1 == V2)
-    return V1;
+    return collapseToPrimitiveShadow(V1, Pos);
 
   auto V1Elems = ShadowElements.find(V1);
   auto V2Elems = ShadowElements.find(V2);
   if (V1Elems != ShadowElements.end() && V2Elems != ShadowElements.end()) {
     if (std::includes(V1Elems->second.begin(), V1Elems->second.end(),
                       V2Elems->second.begin(), V2Elems->second.end())) {
-      return V1;
+      return collapseToPrimitiveShadow(V1, Pos);
     } else if (std::includes(V2Elems->second.begin(), V2Elems->second.end(),
                              V1Elems->second.begin(), V1Elems->second.end())) {
-      return V2;
+      return collapseToPrimitiveShadow(V2, Pos);
     }
   } else if (V1Elems != ShadowElements.end()) {
     if (V1Elems->second.count(V2))
-      return V1;
+      return collapseToPrimitiveShadow(V1, Pos);
   } else if (V2Elems != ShadowElements.end()) {
     if (V2Elems->second.count(V1))
-      return V2;
+      return collapseToPrimitiveShadow(V2, Pos);
   }
 
   auto Key = std::make_pair(V1, V2);
@@ -1232,12 +1443,16 @@ Value *DFSanFunction::combineShadows(Value *V1, Value *V2, Instruction *Pos) {
   if (CCS.Block && DT.dominates(CCS.Block, Pos->getParent()))
     return CCS.Shadow;
 
+  // Converts input shadows to shadows with primitive types.
+  Value *PV1 = collapseToPrimitiveShadow(V1, Pos);
+  Value *PV2 = collapseToPrimitiveShadow(V2, Pos);
+
   IRBuilder<> IRB(Pos);
   if (ClFast16Labels) {
     CCS.Block = Pos->getParent();
-    CCS.Shadow = IRB.CreateOr(V1, V2);
+    CCS.Shadow = IRB.CreateOr(PV1, PV2);
   } else if (AvoidNewBlocks) {
-    CallInst *Call = IRB.CreateCall(DFS.DFSanCheckedUnionFn, {V1, V2});
+    CallInst *Call = IRB.CreateCall(DFS.DFSanCheckedUnionFn, {PV1, PV2});
     Call->addAttribute(AttributeList::ReturnIndex, Attribute::ZExt);
     Call->addParamAttr(0, Attribute::ZExt);
     Call->addParamAttr(1, Attribute::ZExt);
@@ -1246,11 +1461,11 @@ Value *DFSanFunction::combineShadows(Value *V1, Value *V2, Instruction *Pos) {
     CCS.Shadow = Call;
   } else {
     BasicBlock *Head = Pos->getParent();
-    Value *Ne = IRB.CreateICmpNE(V1, V2);
+    Value *Ne = IRB.CreateICmpNE(PV1, PV2);
     BranchInst *BI = cast<BranchInst>(SplitBlockAndInsertIfThen(
         Ne, Pos, /*Unreachable=*/false, DFS.ColdCallWeights, &DT));
     IRBuilder<> ThenIRB(BI);
-    CallInst *Call = ThenIRB.CreateCall(DFS.DFSanUnionFn, {V1, V2});
+    CallInst *Call = ThenIRB.CreateCall(DFS.DFSanUnionFn, {PV1, PV2});
     Call->addAttribute(AttributeList::ReturnIndex, Attribute::ZExt);
     Call->addParamAttr(0, Attribute::ZExt);
     Call->addParamAttr(1, Attribute::ZExt);
@@ -1259,7 +1474,7 @@ Value *DFSanFunction::combineShadows(Value *V1, Value *V2, Instruction *Pos) {
     PHINode *Phi =
         PHINode::Create(DFS.PrimitiveShadowTy, 2, "", &Tail->front());
     Phi->addIncoming(Call, Call->getParent());
-    Phi->addIncoming(V1, Head);
+    Phi->addIncoming(PV1, Head);
 
     CCS.Block = Tail;
     CCS.Shadow = Phi;
@@ -1292,7 +1507,7 @@ Value *DFSanFunction::combineOperandShadows(Instruction *Inst) {
   for (unsigned i = 1, n = Inst->getNumOperands(); i != n; ++i) {
     Shadow = combineShadows(Shadow, getShadow(Inst->getOperand(i)), Inst);
   }
-  return Shadow;
+  return expandFromPrimitiveShadow(Inst->getType(), Shadow, Inst);
 }
 
 Value *DFSanVisitor::visitOperandShadowInst(Instruction &I) {
@@ -1302,7 +1517,8 @@ Value *DFSanVisitor::visitOperandShadowInst(Instruction &I) {
 }
 
 // Generates IR to load shadow corresponding to bytes [Addr, Addr+Size), where
-// Addr has alignment Align, and take the union of each of those shadows.
+// Addr has alignment Align, and take the union of each of those shadows. The
+// returned shadow always has primitive type.
 Value *DFSanFunction::loadShadow(Value *Addr, uint64_t Size, uint64_t Align,
                                  Instruction *Pos) {
   if (AllocaInst *AI = dyn_cast<AllocaInst>(Addr)) {
@@ -1456,30 +1672,34 @@ void DFSanVisitor::visitLoadInst(LoadInst &LI) {
   }
 
   Align Alignment = ClPreserveAlignment ? LI.getAlign() : Align(1);
-  Value *Shadow =
+  Value *PrimitiveShadow =
       DFSF.loadShadow(LI.getPointerOperand(), Size, Alignment.value(), &LI);
   if (ClCombinePointerLabelsOnLoad) {
     Value *PtrShadow = DFSF.getShadow(LI.getPointerOperand());
-    Shadow = DFSF.combineShadows(Shadow, PtrShadow, &LI);
+    PrimitiveShadow = DFSF.combineShadows(PrimitiveShadow, PtrShadow, &LI);
   }
-  if (!DFSF.DFS.isZeroShadow(Shadow))
-    DFSF.NonZeroChecks.push_back(Shadow);
+  if (!DFSF.DFS.isZeroShadow(PrimitiveShadow))
+    DFSF.NonZeroChecks.push_back(PrimitiveShadow);
 
+  Value *Shadow =
+      DFSF.expandFromPrimitiveShadow(LI.getType(), PrimitiveShadow, &LI);
   DFSF.setShadow(&LI, Shadow);
   if (ClEventCallbacks) {
     IRBuilder<> IRB(&LI);
     Value *Addr8 = IRB.CreateBitCast(LI.getPointerOperand(), DFSF.DFS.Int8Ptr);
-    IRB.CreateCall(DFSF.DFS.DFSanLoadCallbackFn, {Shadow, Addr8});
+    IRB.CreateCall(DFSF.DFS.DFSanLoadCallbackFn, {PrimitiveShadow, Addr8});
   }
 }
 
-void DFSanFunction::storeShadow(Value *Addr, uint64_t Size, Align Alignment,
-                                Value *Shadow, Instruction *Pos) {
+void DFSanFunction::storePrimitiveShadow(Value *Addr, uint64_t Size,
+                                         Align Alignment,
+                                         Value *PrimitiveShadow,
+                                         Instruction *Pos) {
   if (AllocaInst *AI = dyn_cast<AllocaInst>(Addr)) {
     const auto i = AllocaShadowMap.find(AI);
     if (i != AllocaShadowMap.end()) {
       IRBuilder<> IRB(Pos);
-      IRB.CreateStore(Shadow, i->second);
+      IRB.CreateStore(PrimitiveShadow, i->second);
       return;
     }
   }
@@ -1487,7 +1707,7 @@ void DFSanFunction::storeShadow(Value *Addr, uint64_t Size, Align Alignment,
   const Align ShadowAlign(Alignment.value() * DFS.ShadowWidthBytes);
   IRBuilder<> IRB(Pos);
   Value *ShadowAddr = DFS.getShadowAddress(Addr, Pos);
-  if (DFS.isZeroShadow(Shadow)) {
+  if (DFS.isZeroShadow(PrimitiveShadow)) {
     IntegerType *ShadowTy =
         IntegerType::get(*DFS.Ctx, Size * DFS.ShadowWidthBits);
     Value *ExtZeroShadow = ConstantInt::get(ShadowTy, 0);
@@ -1505,7 +1725,8 @@ void DFSanFunction::storeShadow(Value *Addr, uint64_t Size, Align Alignment,
     Value *ShadowVec = UndefValue::get(ShadowVecTy);
     for (unsigned i = 0; i != ShadowVecSize; ++i) {
       ShadowVec = IRB.CreateInsertElement(
-          ShadowVec, Shadow, ConstantInt::get(Type::getInt32Ty(*DFS.Ctx), i));
+          ShadowVec, PrimitiveShadow,
+          ConstantInt::get(Type::getInt32Ty(*DFS.Ctx), i));
     }
     Value *ShadowVecAddr =
         IRB.CreateBitCast(ShadowAddr, PointerType::getUnqual(ShadowVecTy));
@@ -1521,7 +1742,7 @@ void DFSanFunction::storeShadow(Value *Addr, uint64_t Size, Align Alignment,
   while (Size > 0) {
     Value *CurShadowAddr =
         IRB.CreateConstGEP1_32(DFS.PrimitiveShadowTy, ShadowAddr, Offset);
-    IRB.CreateAlignedStore(Shadow, CurShadowAddr, ShadowAlign);
+    IRB.CreateAlignedStore(PrimitiveShadow, CurShadowAddr, ShadowAlign);
     --Size;
     ++Offset;
   }
@@ -1536,15 +1757,19 @@ void DFSanVisitor::visitStoreInst(StoreInst &SI) {
   const Align Alignment = ClPreserveAlignment ? SI.getAlign() : Align(1);
 
   Value* Shadow = DFSF.getShadow(SI.getValueOperand());
+  Value *PrimitiveShadow;
   if (ClCombinePointerLabelsOnStore) {
     Value *PtrShadow = DFSF.getShadow(SI.getPointerOperand());
-    Shadow = DFSF.combineShadows(Shadow, PtrShadow, &SI);
+    PrimitiveShadow = DFSF.combineShadows(Shadow, PtrShadow, &SI);
+  } else {
+    PrimitiveShadow = DFSF.collapseToPrimitiveShadow(Shadow, &SI);
   }
-  DFSF.storeShadow(SI.getPointerOperand(), Size, Alignment, Shadow, &SI);
+  DFSF.storePrimitiveShadow(SI.getPointerOperand(), Size, Alignment,
+                            PrimitiveShadow, &SI);
   if (ClEventCallbacks) {
     IRBuilder<> IRB(&SI);
     Value *Addr8 = IRB.CreateBitCast(SI.getPointerOperand(), DFSF.DFS.Int8Ptr);
-    IRB.CreateCall(DFSF.DFS.DFSanStoreCallbackFn, {Shadow, Addr8});
+    IRB.CreateCall(DFSF.DFS.DFSanStoreCallbackFn, {PrimitiveShadow, Addr8});
   }
 }
 
@@ -1583,11 +1808,29 @@ void DFSanVisitor::visitShuffleVectorInst(ShuffleVectorInst &I) {
 }
 
 void DFSanVisitor::visitExtractValueInst(ExtractValueInst &I) {
-  visitOperandShadowInst(I);
+  if (!DFSF.DFS.shouldTrackFieldsAndIndices()) {
+    visitOperandShadowInst(I);
+    return;
+  }
+
+  IRBuilder<> IRB(&I);
+  Value *Agg = I.getAggregateOperand();
+  Value *AggShadow = DFSF.getShadow(Agg);
+  Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
+  DFSF.setShadow(&I, ResShadow);
 }
 
 void DFSanVisitor::visitInsertValueInst(InsertValueInst &I) {
-  visitOperandShadowInst(I);
+  if (!DFSF.DFS.shouldTrackFieldsAndIndices()) {
+    visitOperandShadowInst(I);
+    return;
+  }
+
+  IRBuilder<> IRB(&I);
+  Value *AggShadow = DFSF.getShadow(I.getAggregateOperand());
+  Value *InsShadow = DFSF.getShadow(I.getInsertedValueOperand());
+  Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
+  DFSF.setShadow(&I, Res);
 }
 
 void DFSanVisitor::visitAllocaInst(AllocaInst &I) {
@@ -1618,7 +1861,8 @@ void DFSanVisitor::visitSelectInst(SelectInst &I) {
   Value *ShadowSel = nullptr;
 
   if (isa<VectorType>(I.getCondition()->getType())) {
-    ShadowSel = DFSF.combineShadows(TrueShadow, FalseShadow, &I);
+    ShadowSel = DFSF.combineShadowsThenConvert(I.getType(), TrueShadow,
+                                               FalseShadow, &I);
   } else {
     if (TrueShadow == FalseShadow) {
       ShadowSel = TrueShadow;
@@ -1628,7 +1872,8 @@ void DFSanVisitor::visitSelectInst(SelectInst &I) {
     }
   }
   DFSF.setShadow(&I, ClTrackSelectControlFlow
-                         ? DFSF.combineShadows(CondShadow, ShadowSel, &I)
+                         ? DFSF.combineShadowsThenConvert(
+                               I.getType(), CondShadow, ShadowSel, &I)
                          : ShadowSel);
 }
 
@@ -1777,7 +2022,8 @@ void DFSanVisitor::visitCallBase(CallBase &CB) {
         i = CB.arg_begin();
         const unsigned ShadowArgStart = Args.size();
         for (unsigned n = FT->getNumParams(); n != 0; ++i, --n)
-          Args.push_back(DFSF.getShadow(*i));
+          Args.push_back(
+              DFSF.collapseToPrimitiveShadow(DFSF.getShadow(*i), &CB));
 
         if (FT->isVarArg()) {
           auto *LabelVATy = ArrayType::get(DFSF.DFS.PrimitiveShadowTy,
@@ -1788,7 +2034,9 @@ void DFSanVisitor::visitCallBase(CallBase &CB) {
 
           for (unsigned n = 0; i != CB.arg_end(); ++i, ++n) {
             auto LabelVAPtr = IRB.CreateStructGEP(LabelVATy, LabelVAAlloca, n);
-            IRB.CreateStore(DFSF.getShadow(*i), LabelVAPtr);
+            IRB.CreateStore(
+                DFSF.collapseToPrimitiveShadow(DFSF.getShadow(*i), &CB),
+                LabelVAPtr);
           }
 
           Args.push_back(IRB.CreateStructGEP(LabelVATy, LabelVAAlloca, 0));
@@ -1825,7 +2073,8 @@ void DFSanVisitor::visitCallBase(CallBase &CB) {
         if (!FT->getReturnType()->isVoidTy()) {
           LoadInst *LabelLoad = IRB.CreateLoad(DFSF.DFS.PrimitiveShadowTy,
                                                DFSF.LabelReturnAlloca);
-          DFSF.setShadow(CustomCI, LabelLoad);
+          DFSF.setShadow(CustomCI, DFSF.expandFromPrimitiveShadow(
+                                       FT->getReturnType(), LabelLoad, &CB));
         }
 
         CI->replaceAllUsesWith(CustomCI);

diff  --git a/llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll b/llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll
new file mode 100644
index 000000000000..d412f6d5723f
--- /dev/null
+++ b/llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll
@@ -0,0 +1,292 @@
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-abilist=%S/Inputs/abilist.txt -S | FileCheck %s --check-prefix=TLS_ABI
+; RUN: opt < %s -dfsan -dfsan-abilist=%S/Inputs/abilist.txt -S | FileCheck %s --check-prefix=LEGACY
+; RUN: opt < %s -dfsan -dfsan-args-abi -dfsan-abilist=%S/Inputs/abilist.txt -S | FileCheck %s --check-prefix=ARGS_ABI
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; TLS_ABI: define { i1, i7 } @functional({ i32, i1 } %a, [2 x i7] %b)
+; ARGS_ABI: define { i1, i7 } @functional({ i32, i1 } %a, [2 x i7] %b)
+define {i1, i7} @functional({i32, i1} %a, [2 x i7] %b) {
+  %a1 = extractvalue {i32, i1} %a, 1
+  %b0 = extractvalue [2 x i7] %b, 0
+  %r0 = insertvalue {i1, i7} undef, i1 %a1, 0
+  %r1 = insertvalue {i1, i7} %r0, i7 %b0, 1
+  ret {i1, i7} %r1
+}
+
+define {i1, i7} @call_functional({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: @"dfs$call_functional"
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]
+  ; TLS_ABI: [[U:%.*]] = or i16 [[A01]], [[B01]]
+  ; TLS_ABI: [[R0:%.*]] = insertvalue { i16, i16 } undef, i16 [[U]], 0
+  ; TLS_ABI: [[R1:%.*]] = insertvalue { i16, i16 } [[R0]], i16 [[U]], 1
+  ; TLS_ABI: store { i16, i16 } [[R1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  
+  ; LEGACY: @"dfs$call_functional"
+  ; LEGACY: [[B:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[A:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; LEGACY: [[U:%.*]] = call zeroext i16 @__dfsan_union(i16 zeroext [[A]], i16 zeroext [[B]])
+  ; LEGACY: [[PH:%.*]] = phi i16 [ [[U]], {{.*}} ], [ [[A]], {{.*}} ]
+  ; LEGACY: store i16 [[PH]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+
+  ; ARGS_ABI: @"dfs$call_functional"
+  ; ARGS_ABI: [[U:%.*]]  = call zeroext i16 @__dfsan_union(i16 zeroext %2, i16 zeroext %3)
+  ; ARGS_ABI: [[PH:%.*]] = phi i16 [ %7, {{.*}} ], [ %2, {{.*}} ]
+  ; ARGS_ABI: [[R0:%.*]] = insertvalue { { i1, i7 }, i16 } undef, { i1, i7 } %r, 0
+  ; ARGS_ABI: [[R1:%.*]] = insertvalue { { i1, i7 }, i16 } [[R0]], i16 [[PH]], 1
+  ; ARGS_ABI: ret { { i1, i7 }, i16 } [[R1]]
+  
+  %r = call {i1, i7} @functional({i32, i1} %a, [2 x i7] %b)
+  ret {i1, i7} %r  
+}
+
+; TLS_ABI: define { i1, i7 } @discard({ i32, i1 } %a, [2 x i7] %b)
+define {i1, i7} @discard({i32, i1} %a, [2 x i7] %b) {
+  %a1 = extractvalue {i32, i1} %a, 1
+  %b0 = extractvalue [2 x i7] %b, 0
+  %r0 = insertvalue {i1, i7} undef, i1 %a1, 0
+  %r1 = insertvalue {i1, i7} %r0, i7 %b0, 1
+  ret {i1, i7} %r1
+}
+
+define {i1, i7} @call_discard({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: @"dfs$call_discard"
+  ; TLS_ABI: store { i16, i16 } zeroinitializer, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align 2
+  
+  ; ARGS_ABI: @"dfs$call_discard"
+  ; ARGS_ABI: %r = call { i1, i7 } @discard({ i32, i1 } %0, [2 x i7] %1)
+  ; ARGS_ABI: [[R0:%.*]] = insertvalue { { i1, i7 }, i16 } undef, { i1, i7 } %r, 0
+  ; ARGS_ABI: [[R1:%.*]] = insertvalue { { i1, i7 }, i16 } [[R0]], i16 0, 1
+  ; ARGS_ABI: ret { { i1, i7 }, i16 } [[R1]]
+  
+  %r = call {i1, i7} @discard({i32, i1} %a, [2 x i7] %b)
+  ret {i1, i7} %r  
+}
+
+; TLS_ABI: define { i1, i7 } @uninstrumented({ i32, i1 } %a, [2 x i7] %b)
+define {i1, i7} @uninstrumented({i32, i1} %a, [2 x i7] %b) {
+  %a1 = extractvalue {i32, i1} %a, 1
+  %b0 = extractvalue [2 x i7] %b, 0
+  %r0 = insertvalue {i1, i7} undef, i1 %a1, 0
+  %r1 = insertvalue {i1, i7} %r0, i7 %b0, 1
+  ret {i1, i7} %r1
+}
+
+define {i1, i7} @call_uninstrumented({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: @"dfs$call_uninstrumented"
+  ; TLS_ABI: call void @__dfsan_unimplemented
+  ; TLS_ABI: store { i16, i16 } zeroinitializer, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align 2
+  
+  ; ARGS_ABI: @"dfs$call_uninstrumented"
+  ; ARGS_ABI: call void @__dfsan_unimplemented
+  ; ARGS_ABI: %r = call { i1, i7 } @uninstrumented({ i32, i1 } %0, [2 x i7] %1)
+  ; ARGS_ABI: [[R0:%.*]] = insertvalue { { i1, i7 }, i16 } undef, { i1, i7 } %r, 0
+  ; ARGS_ABI: [[R1:%.*]] = insertvalue { { i1, i7 }, i16 } [[R0]], i16 0, 1
+  ; ARGS_ABI: ret { { i1, i7 }, i16 } [[R1]]
+  
+  %r = call {i1, i7} @uninstrumented({i32, i1} %a, [2 x i7] %b)
+  ret {i1, i7} %r  
+}
+
+define {i1, i7} @call_custom_with_ret({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: @"dfs$call_custom_with_ret"
+  ; TLS_ABI: %labelreturn = alloca i16, align 2
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]
+  ; TLS_ABI: [[R:%.*]] = call { i1, i7 } @__dfsw_custom_with_ret({ i32, i1 } %a, [2 x i7] %b, i16 zeroext [[A01]], i16 zeroext [[B01]], i16* %labelreturn)
+  ; TLS_ABI: [[RE:%.*]] = load i16, i16* %labelreturn, align [[ALIGN]]
+  ; TLS_ABI: [[RS0:%.*]] = insertvalue { i16, i16 } undef, i16 [[RE]], 0
+  ; TLS_ABI: [[RS1:%.*]] = insertvalue { i16, i16 } [[RS0]], i16 [[RE]], 1
+  ; TLS_ABI: store { i16, i16 } [[RS1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: ret { i1, i7 } [[R]]
+  
+  %r = call {i1, i7} @custom_with_ret({i32, i1} %a, [2 x i7] %b)
+  ret {i1, i7} %r  
+}
+
+define void @call_custom_without_ret({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: @"dfs$call_custom_without_ret"
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]
+  ; TLS_ABI: call void @__dfsw_custom_without_ret({ i32, i1 } %a, [2 x i7] %b, i16 zeroext [[A01]], i16 zeroext [[B01]])
+  
+  call void @custom_without_ret({i32, i1} %a, [2 x i7] %b)
+  ret void
+}
+
+define void @call_custom_varg({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: @"dfs$call_custom_varg"
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: %labelva = alloca [1 x i16], align [[ALIGN]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[V0:%.*]] = getelementptr inbounds [1 x i16], [1 x i16]* %labelva, i32 0, i32 0
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]
+  ; TLS_ABI: store i16 [[B01]], i16* [[V0]], align 2
+  ; TLS_ABI: [[V:%.*]] = getelementptr inbounds [1 x i16], [1 x i16]* %labelva, i32 0, i32 0
+  ; TLS_ABI: call void ({ i32, i1 }, i16, i16*, ...) @__dfsw_custom_varg({ i32, i1 } %a, i16 zeroext [[A01]], i16* [[V]], [2 x i7] %b)
+
+  call void ({i32, i1}, ...) @custom_varg({i32, i1} %a, [2 x i7] %b)
+  ret void
+}
+
+define {i1, i7} @call_custom_cb({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: define { i1, i7 } @"dfs$call_custom_cb"({ i32, i1 } %a, [2 x i7] %b) {
+  ; TLS_ABI: %labelreturn = alloca i16, align 2
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]  
+  ; TLS_ABI: [[R:%.*]]  = call { i1, i7 } @__dfsw_custom_cb({ i1, i7 } ({ i1, i7 } ({ i32, i1 }, [2 x i7])*, { i32, i1 }, [2 x i7], i16, i16, i16*)* @"dfst0$custom_cb", i8* bitcast ({ i1, i7 } ({ i32, i1 }, [2 x i7])* @"dfs$cb" to i8*), { i32, i1 } %a, [2 x i7] %b, i16 zeroext 0, i16 zeroext [[A01]], i16 zeroext [[B01]], i16* %labelreturn)
+  ; TLS_ABI: [[RE:%.*]] = load i16, i16* %labelreturn, align [[ALIGN]]
+  ; TLS_ABI: [[RS0:%.*]] = insertvalue { i16, i16 } undef, i16 [[RE]], 0
+  ; TLS_ABI: [[RS1:%.*]] = insertvalue { i16, i16 } [[RS0]], i16 [[RE]], 1
+  ; TLS_ABI: store { i16, i16 } [[RS1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  %r = call {i1, i7} @custom_cb({i1, i7} ({i32, i1}, [2 x i7])* @cb, {i32, i1} %a, [2 x i7] %b)
+  ret {i1, i7} %r
+}
+
+define {i1, i7} @custom_cb({i1, i7} ({i32, i1}, [2 x i7])* %cb, {i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: define { i1, i7 } @custom_cb({ i1, i7 } ({ i32, i1 }, [2 x i7])* %cb, { i32, i1 } %a, [2 x i7] %b)
+
+  %r = call {i1, i7} %cb({i32, i1} %a, [2 x i7] %b)
+  ret {i1, i7} %r
+}
+
+define {i1, i7} @cb({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: define { i1, i7 } @"dfs$cb"({ i32, i1 } %a, [2 x i7] %b)
+  ; TLS_ABI: [[BL:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[AL:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[AL1:%.*]] = extractvalue { i16, i16 } [[AL]], 1
+  ; TLS_ABI: [[BL0:%.*]] = extractvalue [2 x i16] [[BL]], 0
+  ; TLS_ABI: [[RL0:%.*]] = insertvalue { i16, i16 } zeroinitializer, i16 [[AL1]], 0
+  ; TLS_ABI: [[RL:%.*]] = insertvalue { i16, i16 } [[RL0]], i16 [[BL0]], 1
+  ; TLS_ABI: store { i16, i16 } [[RL]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  %a1 = extractvalue {i32, i1} %a, 1
+  %b0 = extractvalue [2 x i7] %b, 0
+  %r0 = insertvalue {i1, i7} undef, i1 %a1, 0
+  %r1 = insertvalue {i1, i7} %r0, i7 %b0, 1
+  ret {i1, i7} %r1
+}
+
+define {i1, i7}  ({i32, i1}, [2 x i7])* @ret_custom() {
+  ; TLS_ABI: @"dfs$ret_custom"
+  ; TLS_ABI: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  ; TLS_ABI: ret {{.*}} @"dfsw$custom_with_ret"
+  ret {i1, i7}  ({i32, i1}, [2 x i7])* @custom_with_ret
+}
+
+; TLS_ABI: define linkonce_odr { i1, i7 } @"dfsw$custom_cb"({ i1, i7 } ({ i32, i1 }, [2 x i7])* %0, { i32, i1 } %1, [2 x i7] %2) {
+; TLS_ABI: %labelreturn = alloca i16, align 2
+; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 6) to [2 x i16]*), align [[ALIGN:2]]
+; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to { i16, i16 }*), align [[ALIGN]]
+; TLS_ABI: [[CB:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+; TLS_ABI: [[CAST:%.*]] = bitcast { i1, i7 } ({ i32, i1 }, [2 x i7])* %0 to i8*
+; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]  
+; TLS_ABI: [[R:%.*]]  = call { i1, i7 } @__dfsw_custom_cb({ i1, i7 } ({ i1, i7 } ({ i32, i1 }, [2 x i7])*, { i32, i1 }, [2 x i7], i16, i16, i16*)* @"dfst0$custom_cb", i8* [[CAST]], { i32, i1 } %1, [2 x i7] %2, i16 zeroext [[CB]], i16 zeroext [[A01]], i16 zeroext [[B01]], i16* %labelreturn)
+; TLS_ABI: [[RE:%.*]] = load i16, i16* %labelreturn, align [[ALIGN]]
+; TLS_ABI: [[RS0:%.*]] = insertvalue { i16, i16 } undef, i16 [[RE]], 0
+; TLS_ABI: [[RS1:%.*]] = insertvalue { i16, i16 } [[RS0]], i16 [[RE]], 1
+; TLS_ABI: store { i16, i16 } [[RS1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+ 
+
+define {i1, i7} @custom_with_ret({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: define linkonce_odr { i1, i7 } @"dfsw$custom_with_ret"({ i32, i1 } %0, [2 x i7] %1)
+  ; TLS_ABI: %labelreturn = alloca i16, align 2
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]
+  ; TLS_ABI: [[R:%.*]] = call { i1, i7 } @__dfsw_custom_with_ret({ i32, i1 } %0, [2 x i7] %1, i16 zeroext [[A01]], i16 zeroext [[B01]], i16* %labelreturn)
+  ; TLS_ABI: [[RE:%.*]] = load i16, i16* %labelreturn, align 2
+  ; TLS_ABI: [[RS0:%.*]] = insertvalue { i16, i16 } undef, i16 [[RE]], 0
+  ; TLS_ABI: [[RS1:%.*]] = insertvalue { i16, i16 } [[RS0]], i16 [[RE]], 1
+  ; TLS_ABI: store { i16, i16 } [[RS1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: ret { i1, i7 } [[R]]
+  %a1 = extractvalue {i32, i1} %a, 1
+  %b0 = extractvalue [2 x i7] %b, 0
+  %r0 = insertvalue {i1, i7} undef, i1 %a1, 0
+  %r1 = insertvalue {i1, i7} %r0, i7 %b0, 1
+  ret {i1, i7} %r1
+}
+
+define void @custom_without_ret({i32, i1} %a, [2 x i7] %b) {
+  ; TLS_ABI: define linkonce_odr void @"dfsw$custom_without_ret"({ i32, i1 } %0, [2 x i7] %1)
+  ; TLS_ABI: [[B:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN:2]]
+  ; TLS_ABI: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; TLS_ABI: [[A0:%.*]] = extractvalue { i16, i16 } [[A]], 0
+  ; TLS_ABI: [[A1:%.*]] = extractvalue { i16, i16 } [[A]], 1
+  ; TLS_ABI: [[A01:%.*]] = or i16 [[A0]], [[A1]]
+  ; TLS_ABI: [[B0:%.*]] = extractvalue [2 x i16] [[B]], 0
+  ; TLS_ABI: [[B1:%.*]] = extractvalue [2 x i16] [[B]], 1
+  ; TLS_ABI: [[B01:%.*]] = or i16 [[B0]], [[B1]]
+  ; TLS_ABI: call void @__dfsw_custom_without_ret({ i32, i1 } %0, [2 x i7] %1, i16 zeroext [[A01]], i16 zeroext [[B01]])
+  ; TLS_ABI: ret
+  ret void
+}
+
+define void @custom_varg({i32, i1} %a, ...) {
+  ; TLS_ABI: define linkonce_odr void @"dfsw$custom_varg"({ i32, i1 } %0, ...)
+  ; TLS_ABI: call void @__dfsan_vararg_wrapper
+  ; TLS_ABI: unreachable
+  ret void
+}
+
+; TLS_ABI: declare { i1, i7 } @__dfsw_custom_with_ret({ i32, i1 }, [2 x i7], i16, i16, i16*)
+; TLS_ABI: declare void @__dfsw_custom_without_ret({ i32, i1 }, [2 x i7], i16, i16)
+; TLS_ABI: declare void @__dfsw_custom_varg({ i32, i1 }, i16, i16*, ...)
+
+; TLS_ABI: declare { i1, i7 } @__dfsw_custom_cb({ i1, i7 } ({ i1, i7 } ({ i32, i1 }, [2 x i7])*, { i32, i1 }, [2 x i7], i16, i16, i16*)*, i8*, { i32, i1 }, [2 x i7], i16, i16, i16, i16*)
+
+; TLS_ABI: define linkonce_odr { i1, i7 } @"dfst0$custom_cb"({ i1, i7 } ({ i32, i1 }, [2 x i7])* %0, { i32, i1 } %1, [2 x i7] %2, i16 %3, i16 %4, i16* %5) {
+; TLS_ABI: [[A0:%.*]] = insertvalue { i16, i16 } undef, i16 %3, 0
+; TLS_ABI: [[A1:%.*]] = insertvalue { i16, i16 } [[A0]], i16 %3, 1
+; TLS_ABI: [[B0:%.*]] = insertvalue [2 x i16] undef, i16 %4, 0
+; TLS_ABI: [[B1:%.*]] = insertvalue [2 x i16] [[B0]], i16 %4, 1
+; TLS_ABI: store { i16, i16 } [[A1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN:2]]
+; TLS_ABI: store [2 x i16] [[B1]], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to [2 x i16]*), align [[ALIGN]]
+; TLS_ABI: [[R:%.*]] = call { i1, i7 } %0({ i32, i1 } %1, [2 x i7] %2)
+; TLS_ABI: %_dfsret = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+; TLS_ABI: [[RE0:%.*]] = extractvalue { i16, i16 } %_dfsret, 0
+; TLS_ABI: [[RE1:%.*]] = extractvalue { i16, i16 } %_dfsret, 1
+; TLS_ABI: [[RE01:%.*]] = or i16 [[RE0]], [[RE1]]
+; TLS_ABI: store i16 [[RE01]], i16* %5, align [[ALIGN]]
+; TLS_ABI: ret { i1, i7 } [[R]]

diff  --git a/llvm/test/Instrumentation/DataFlowSanitizer/array.ll b/llvm/test/Instrumentation/DataFlowSanitizer/array.ll
new file mode 100644
index 000000000000..5dd0656322cc
--- /dev/null
+++ b/llvm/test/Instrumentation/DataFlowSanitizer/array.ll
@@ -0,0 +1,345 @@
+; RUN: opt < %s -dfsan -S | FileCheck %s --check-prefix=LEGACY
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-event-callbacks=true -S | FileCheck %s --check-prefix=EVENT_CALLBACKS
+; RUN: opt < %s -dfsan -dfsan-args-abi -S | FileCheck %s --check-prefix=ARGS_ABI
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -S | FileCheck %s --check-prefix=FAST16
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-combine-pointer-labels-on-load=false -S | FileCheck %s --check-prefix=NO_COMBINE_LOAD_PTR
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-combine-pointer-labels-on-store=true -S | FileCheck %s --check-prefix=COMBINE_STORE_PTR
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-debug-nonzero-labels -S | FileCheck %s --check-prefix=DEBUG_NONZERO_LABELS
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define [4 x i8] @pass_array([4 x i8] %a) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$pass_array"
+  ; NO_COMBINE_LOAD_PTR: %1 = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: store [4 x i16] %1, [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align [[ALIGN]]
+
+  ; ARGS_ABI: @"dfs$pass_array"
+  ; ARGS_ABI: ret { [4 x i8], i16 }
+  
+  ; DEBUG_NONZERO_LABELS: @"dfs$pass_array"
+  ; DEBUG_NONZERO_LABELS: [[L:%.*]] = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN:2]]
+  ; DEBUG_NONZERO_LABELS: [[L0:%.*]] = extractvalue [4 x i16] [[L]], 0
+  ; DEBUG_NONZERO_LABELS: [[L1:%.*]] = extractvalue [4 x i16] [[L]], 1
+  ; DEBUG_NONZERO_LABELS: [[L01:%.*]] = or i16 [[L0]], [[L1]]
+  ; DEBUG_NONZERO_LABELS: [[L2:%.*]] = extractvalue [4 x i16] [[L]], 2
+  ; DEBUG_NONZERO_LABELS: [[L012:%.*]] = or i16 [[L01]], [[L2]]
+  ; DEBUG_NONZERO_LABELS: [[L3:%.*]] = extractvalue [4 x i16] [[L]], 3
+  ; DEBUG_NONZERO_LABELS: [[L0123:%.*]] = or i16 [[L012]], [[L3]]
+  ; DEBUG_NONZERO_LABELS: {{.*}} = icmp ne i16 [[L0123]], 0
+  ; DEBUG_NONZERO_LABELS: call void @__dfsan_nonzero_label()
+  
+  ret [4 x i8] %a
+}
+
+%ArrayOfStruct = type [4 x {i8*, i32}]
+
+define %ArrayOfStruct @pass_array_of_struct(%ArrayOfStruct %as) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$pass_array_of_struct"
+  ; NO_COMBINE_LOAD_PTR: %1 = load [4 x { i16, i16 }], [4 x { i16, i16 }]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x { i16, i16 }]*), align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: store [4 x { i16, i16 }] %1, [4 x { i16, i16 }]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x { i16, i16 }]*), align [[ALIGN]]
+
+  ; ARGS_ABI: @"dfs$pass_array_of_struct"
+  ; ARGS_ABI: ret { [4 x { i8*, i32 }], i16 }
+  ret %ArrayOfStruct %as
+}
+
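+; The address of a fresh alloca carries a zero shadow.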
+define [4 x i1]* @alloca_ret_array() {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$alloca_ret_array"
+  ; NO_COMBINE_LOAD_PTR: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  %p = alloca [4 x i1]
+  ret [4 x i1]* %p
+}
+
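+; A load reads the combined shadow from memory and replicates it into every
+; index of the array shadow.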
+define [4 x i1] @load_alloca_array() {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_alloca_array"
+  ; NO_COMBINE_LOAD_PTR: [[A:%.*]] = alloca i16, align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: [[M:%.*]] = load i16, i16* [[A]], align [[ALIGN]]
+  ; NO_COMBINE_LOAD_PTR: [[S0:%.*]] = insertvalue [4 x i16] undef, i16 [[M]], 0
+  ; NO_COMBINE_LOAD_PTR: [[S1:%.*]] = insertvalue [4 x i16] [[S0]], i16 [[M]], 1
+  ; NO_COMBINE_LOAD_PTR: [[S2:%.*]] = insertvalue [4 x i16] [[S1]], i16 [[M]], 2
+  ; NO_COMBINE_LOAD_PTR: [[S3:%.*]] = insertvalue [4 x i16] [[S2]], i16 [[M]], 3
+  ; NO_COMBINE_LOAD_PTR: store [4 x i16] [[S3]], [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align [[ALIGN]]
+  %p = alloca [4 x i1]
+  %a = load [4 x i1], [4 x i1]* %p
+  ret [4 x i1] %a
+}
+
+define [0 x i1] @load_array0([0 x i1]* %p) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_array0"
+  ; NO_COMBINE_LOAD_PTR: store [0 x i16] zeroinitializer, [0 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [0 x i16]*), align 2
+  %a = load [0 x i1], [0 x i1]* %p
+  ret [0 x i1] %a
+}
+
+define [1 x i1] @load_array1([1 x i1]* %p) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_array1"
+  ; NO_COMBINE_LOAD_PTR: [[L:%.*]] = load i16,
+  ; NO_COMBINE_LOAD_PTR: [[S:%.*]] = insertvalue [1 x i16] undef, i16 [[L]], 0
+  ; NO_COMBINE_LOAD_PTR: store [1 x i16] [[S]], [1 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [1 x i16]*), align 2
+
+  ; EVENT_CALLBACKS: @"dfs$load_array1"
+  ; EVENT_CALLBACKS: [[L:%.*]] = or i16
+  ; EVENT_CALLBACKS: call void @__dfsan_load_callback(i16 [[L]], i8* {{.*}})
+
+  ; FAST16: @"dfs$load_array1"
+  ; FAST16: [[P:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; FAST16: [[L:%.*]] = load i16, i16* {{.*}}, align [[ALIGN]]
+  ; FAST16: [[U:%.*]] = or i16 [[L]], [[P]]
+  ; FAST16: [[S1:%.*]] = insertvalue [1 x i16] undef, i16 [[U]], 0
+  ; FAST16: store [1 x i16] [[S1]], [1 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [1 x i16]*), align [[ALIGN]]
+  
+  ; LEGACY: @"dfs$load_array1"
+  ; LEGACY: [[P:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[L:%.*]] = load i16, i16* {{.*}}, align [[ALIGN]]
+  ; LEGACY: [[U:%.*]] = call zeroext i16 @__dfsan_union(i16 zeroext [[L]], i16 zeroext [[P]])
+  ; LEGACY: [[PH:%.*]] = phi i16 [ [[U]], {{.*}} ], [ [[L]], {{.*}} ]
+  ; LEGACY: store i16 [[PH]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+
+  %a = load [1 x i1], [1 x i1]* %p
+  ret [1 x i1] %a
+}
+
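+; A two-element load reads both shadow slots and ors them before replicating.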
+define [2 x i1] @load_array2([2 x i1]* %p) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_array2"
+  ; NO_COMBINE_LOAD_PTR: [[P1:%.*]] = getelementptr i16, i16* [[P0:%.*]], i64 1
+  ; NO_COMBINE_LOAD_PTR-DAG: [[E1:%.*]] = load i16, i16* [[P1]], align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR-DAG: [[E0:%.*]] = load i16, i16* [[P0]], align [[ALIGN]]
+  ; NO_COMBINE_LOAD_PTR: [[U:%.*]] = or i16 [[E0]], [[E1]]
+  ; NO_COMBINE_LOAD_PTR: [[S1:%.*]] = insertvalue [2 x i16] undef, i16 [[U]], 0
+  ; NO_COMBINE_LOAD_PTR: [[S2:%.*]] = insertvalue [2 x i16] [[S1]], i16 [[U]], 1
+  ; NO_COMBINE_LOAD_PTR: store [2 x i16] [[S2]], [2 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [2 x i16]*), align [[ALIGN]]
+
+  ; EVENT_CALLBACKS: @"dfs$load_array2"
+  ; EVENT_CALLBACKS: [[O1:%.*]] = or i16
+  ; EVENT_CALLBACKS: [[O2:%.*]] = or i16 [[O1]]
+  ; EVENT_CALLBACKS: call void @__dfsan_load_callback(i16 [[O2]], i8* {{.*}})
+
+  ; FAST16: @"dfs$load_array2"
+  ; FAST16: [[P:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; FAST16: [[O:%.*]] = or i16
+  ; FAST16: [[U:%.*]] = or i16 [[O]], [[P]]
+  ; FAST16: [[S:%.*]] = insertvalue [2 x i16] undef, i16 [[U]], 0
+  ; FAST16: [[S1:%.*]] = insertvalue [2 x i16] [[S]], i16 [[U]], 1
+  ; FAST16: store [2 x i16] [[S1]], [2 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [2 x i16]*), align [[ALIGN]]
+  %a = load [2 x i1], [2 x i1]* %p
+  ret [2 x i1] %a
+}
+
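+; Larger loads read the shadow in wide chunks and truncate the result to i16.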
+define [4 x i1] @load_array4([4 x i1]* %p) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_array4"
+  ; NO_COMBINE_LOAD_PTR: [[T:%.*]] = trunc i64 {{.*}} to i16
+  ; NO_COMBINE_LOAD_PTR: [[S1:%.*]] = insertvalue [4 x i16] undef, i16 [[T]], 0
+  ; NO_COMBINE_LOAD_PTR: [[S2:%.*]] = insertvalue [4 x i16] [[S1]], i16 [[T]], 1
+  ; NO_COMBINE_LOAD_PTR: [[S3:%.*]] = insertvalue [4 x i16] [[S2]], i16 [[T]], 2
+  ; NO_COMBINE_LOAD_PTR: [[S4:%.*]] = insertvalue [4 x i16] [[S3]], i16 [[T]], 3
+  ; NO_COMBINE_LOAD_PTR: store [4 x i16] [[S4]], [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align 2
+
+  ; EVENT_CALLBACKS: @"dfs$load_array4"
+  ; EVENT_CALLBACKS: [[O0:%.*]] = or i64
+  ; EVENT_CALLBACKS: [[O1:%.*]] = or i64 [[O0]]
+  ; EVENT_CALLBACKS: [[O2:%.*]] = trunc i64 [[O1]] to i16
+  ; EVENT_CALLBACKS: [[O3:%.*]] = or i16 [[O2]]
+  ; EVENT_CALLBACKS: call void @__dfsan_load_callback(i16 [[O3]], i8* {{.*}})
+
+  ; FAST16: @"dfs$load_array4"
+  ; FAST16: [[T:%.*]] = trunc i64 {{.*}} to i16
+  ; FAST16: [[O:%.*]] = or i16 [[T]]
+  ; FAST16: [[S1:%.*]] = insertvalue [4 x i16] undef, i16 [[O]], 0
+  ; FAST16: [[S2:%.*]] = insertvalue [4 x i16] [[S1]], i16 [[O]], 1
+  ; FAST16: [[S3:%.*]] = insertvalue [4 x i16] [[S2]], i16 [[O]], 2
+  ; FAST16: [[S4:%.*]] = insertvalue [4 x i16] [[S3]], i16 [[O]], 3
+  ; FAST16: store [4 x i16] [[S4]], [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align 2
+
+  ; LEGACY: @"dfs$load_array4"
+  ; LEGACY: [[P:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[PH1:%.*]] = phi i16
+  ; LEGACY: [[U:%.*]] = call zeroext i16 @__dfsan_union(i16 zeroext [[PH1]], i16 zeroext [[P]])
+  ; LEGACY: [[PH:%.*]] = phi i16 [ [[U]], {{.*}} ], [ [[PH1]], {{.*}} ]
+  ; LEGACY: store i16 [[PH]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+
+  %a = load [4 x i1], [4 x i1]* %p
+  ret [4 x i1] %a
+}
+
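+; extractvalue propagates only the shadow of the selected index.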
+define i1 @extract_array([4 x i1] %a) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$extract_array"
+  ; NO_COMBINE_LOAD_PTR: [[AM:%.*]] = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: [[EM:%.*]] = extractvalue [4 x i16] [[AM]], 2
+  ; NO_COMBINE_LOAD_PTR: store i16 [[EM]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  %e2 = extractvalue [4 x i1] %a, 2
+  ret i1 %e2
+}
+
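+; insertvalue updates only the shadow of the written index.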
+define [4 x i1] @insert_array([4 x i1] %a, i1 %e2) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$insert_array"
+  ; NO_COMBINE_LOAD_PTR: [[EM:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 8) to i16*), align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: [[AM:%.*]] = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN]]
+  ; NO_COMBINE_LOAD_PTR: [[AM1:%.*]] = insertvalue [4 x i16] [[AM]], i16 [[EM]], 0
+  ; NO_COMBINE_LOAD_PTR: store [4 x i16] [[AM1]], [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align [[ALIGN]]
+  %a1 = insertvalue [4 x i1] %a, i1 %e2, 0
+  ret [4 x i1] %a1
+}
+
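+; Stores collapse the per-index shadows into one i16 by or-ing them together.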
+define void @store_alloca_array([4 x i1] %a) {
+  ; FAST16: @"dfs$store_alloca_array"
+  ; FAST16: [[S:%.*]] = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN:2]]
+  ; FAST16: [[SP:%.*]] = alloca i16, align [[ALIGN]]
+  ; FAST16: [[E0:%.*]] = extractvalue [4 x i16] [[S]], 0
+  ; FAST16: [[E1:%.*]] = extractvalue [4 x i16] [[S]], 1
+  ; FAST16: [[E01:%.*]] = or i16 [[E0]], [[E1]]
+  ; FAST16: [[E2:%.*]] = extractvalue [4 x i16] [[S]], 2
+  ; FAST16: [[E012:%.*]] = or i16 [[E01]], [[E2]]
+  ; FAST16: [[E3:%.*]] = extractvalue [4 x i16] [[S]], 3
+  ; FAST16: [[E0123:%.*]] = or i16 [[E012]], [[E3]]
+  ; FAST16: store i16 [[E0123]], i16* [[SP]], align [[ALIGN]]
+  %p = alloca [4 x i1]
+  store [4 x i1] %a, [4 x i1]* %p
+  ret void
+}
+
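+; A zero aggregate shadow is written back with a single widened store.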
+define void @store_zero_array([4 x i1]* %p) {
+  ; FAST16: @"dfs$store_zero_array"
+  ; FAST16: store i64 0, i64* {{.*}}, align 2
+  store [4 x i1] zeroinitializer, [4 x i1]* %p
+  ret void
+}
+
+define void @store_array2([2 x i1] %a, [2 x i1]* %p) {
+  ; LEGACY: @"dfs$store_array2"
+  ; LEGACY: [[S:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[SP0:%.*]] = getelementptr i16, i16* [[SP:%.*]], i32 0
+  ; LEGACY: store i16 [[S]], i16* [[SP0]], align [[ALIGN]]
+  ; LEGACY: [[SP1:%.*]] = getelementptr i16, i16* [[SP]], i32 1
+  ; LEGACY: store i16 [[S]], i16* [[SP1]], align [[ALIGN]]
+  
+  ; EVENT_CALLBACKS: @"dfs$store_array2"
+  ; EVENT_CALLBACKS: [[E12:%.*]] = or i16
+  ; EVENT_CALLBACKS: [[P:%.*]] = bitcast [2 x i1]* %p to i8*
+  ; EVENT_CALLBACKS: call void @__dfsan_store_callback(i16 [[E12]], i8* [[P]])
+  
+  ; FAST16: @"dfs$store_array2"
+  ; FAST16: [[S:%.*]] = load [2 x i16], [2 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [2 x i16]*), align [[ALIGN:2]]
+  ; FAST16: [[E1:%.*]] = extractvalue [2 x i16] [[S]], 0
+  ; FAST16: [[E2:%.*]] = extractvalue [2 x i16] [[S]], 1
+  ; FAST16: [[E12:%.*]] = or i16 [[E1]], [[E2]]
+  ; FAST16: [[SP0:%.*]] = getelementptr i16, i16* [[SP:%.*]], i32 0
+  ; FAST16: store i16 [[E12]], i16* [[SP0]], align [[ALIGN]]
+  ; FAST16: [[SP1:%.*]] = getelementptr i16, i16* [[SP]], i32 1
+  ; FAST16: store i16 [[E12]], i16* [[SP1]], align [[ALIGN]]
+
+  ; COMBINE_STORE_PTR: @"dfs$store_array2"
+  ; COMBINE_STORE_PTR: [[O:%.*]] = or i16
+  ; COMBINE_STORE_PTR: [[U:%.*]] = or i16 [[O]]
+  ; COMBINE_STORE_PTR: [[P1:%.*]] = getelementptr i16, i16* [[P:%.*]], i32 0
+  ; COMBINE_STORE_PTR: store i16 [[U]], i16* [[P1]], align 2
+  ; COMBINE_STORE_PTR: [[P2:%.*]] = getelementptr i16, i16* [[P]], i32 1
+  ; COMBINE_STORE_PTR: store i16 [[U]], i16* [[P2]], align 2
+  
+  store [2 x i1] %a, [2 x i1]* %p
+  ret void
+}
+
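+; For 17 elements the combined shadow is broadcast and written back as two
+; <8 x i16> vector stores plus one trailing i16 store.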
+define void @store_array17([17 x i1] %a, [17 x i1]* %p) {
+  ; FAST16: @"dfs$store_array17"
+  ; FAST16: [[AL:%.*]] = load [17 x i16], [17 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [17 x i16]*), align 2
+  ; FAST16: [[AL0:%.*]] = extractvalue [17 x i16] [[AL]], 0
+  ; FAST16: [[AL1:%.*]] = extractvalue [17 x i16] [[AL]], 1
+  ; FAST16: [[AL_0_1:%.*]] = or i16 [[AL0]], [[AL1]]
+  ; FAST16: [[AL2:%.*]] = extractvalue [17 x i16] [[AL]], 2
+  ; FAST16: [[AL_0_2:%.*]] = or i16 [[AL_0_1]], [[AL2]]
+  ; FAST16: [[AL3:%.*]] = extractvalue [17 x i16] [[AL]], 3
+  ; FAST16: [[AL_0_3:%.*]] = or i16 [[AL_0_2]], [[AL3]]
+  ; FAST16: [[AL4:%.*]] = extractvalue [17 x i16] [[AL]], 4
+  ; FAST16: [[AL_0_4:%.*]] = or i16 [[AL_0_3]], [[AL4]]
+  ; FAST16: [[AL5:%.*]] = extractvalue [17 x i16] [[AL]], 5
+  ; FAST16: [[AL_0_5:%.*]] = or i16 [[AL_0_4]], [[AL5]]
+  ; FAST16: [[AL6:%.*]] = extractvalue [17 x i16] [[AL]], 6
+  ; FAST16: [[AL_0_6:%.*]] = or i16 [[AL_0_5]], [[AL6]]
+  ; FAST16: [[AL7:%.*]] = extractvalue [17 x i16] [[AL]], 7
+  ; FAST16: [[AL_0_7:%.*]] = or i16 [[AL_0_6]], [[AL7]]
+  ; FAST16: [[AL8:%.*]] = extractvalue [17 x i16] [[AL]], 8
+  ; FAST16: [[AL_0_8:%.*]] = or i16 [[AL_0_7]], [[AL8]]
+  ; FAST16: [[AL9:%.*]] = extractvalue [17 x i16] [[AL]], 9
+  ; FAST16: [[AL_0_9:%.*]] = or i16 [[AL_0_8]], [[AL9]]
+  ; FAST16: [[AL10:%.*]] = extractvalue [17 x i16] [[AL]], 10
+  ; FAST16: [[AL_0_10:%.*]] = or i16 [[AL_0_9]], [[AL10]]
+  ; FAST16: [[AL11:%.*]] = extractvalue [17 x i16] [[AL]], 11
+  ; FAST16: [[AL_0_11:%.*]] = or i16 [[AL_0_10]], [[AL11]]
+  ; FAST16: [[AL12:%.*]] = extractvalue [17 x i16] [[AL]], 12
+  ; FAST16: [[AL_0_12:%.*]] = or i16 [[AL_0_11]], [[AL12]]
+  ; FAST16: [[AL13:%.*]] = extractvalue [17 x i16] [[AL]], 13
+  ; FAST16: [[AL_0_13:%.*]] = or i16 [[AL_0_12]], [[AL13]]
+  ; FAST16: [[AL14:%.*]] = extractvalue [17 x i16] [[AL]], 14
+  ; FAST16: [[AL_0_14:%.*]] = or i16 [[AL_0_13]], [[AL14]]
+  ; FAST16: [[AL15:%.*]] = extractvalue [17 x i16] [[AL]], 15
+  ; FAST16: [[AL_0_15:%.*]] = or i16 [[AL_0_14]], [[AL15]]
+  ; FAST16: [[AL16:%.*]] = extractvalue [17 x i16] [[AL]], 16
+  ; FAST16: [[AL_0_16:%.*]] = or i16 [[AL_0_15]], [[AL16]]
+  ; FAST16: [[V1:%.*]] = insertelement <8 x i16> undef, i16 [[AL_0_16]], i32 0
+  ; FAST16: [[V2:%.*]] = insertelement <8 x i16> [[V1]], i16 [[AL_0_16]], i32 1
+  ; FAST16: [[V3:%.*]] = insertelement <8 x i16> [[V2]], i16 [[AL_0_16]], i32 2
+  ; FAST16: [[V4:%.*]] = insertelement <8 x i16> [[V3]], i16 [[AL_0_16]], i32 3
+  ; FAST16: [[V5:%.*]] = insertelement <8 x i16> [[V4]], i16 [[AL_0_16]], i32 4
+  ; FAST16: [[V6:%.*]] = insertelement <8 x i16> [[V5]], i16 [[AL_0_16]], i32 5
+  ; FAST16: [[V7:%.*]] = insertelement <8 x i16> [[V6]], i16 [[AL_0_16]], i32 6
+  ; FAST16: [[V8:%.*]] = insertelement <8 x i16> [[V7]], i16 [[AL_0_16]], i32 7
+  ; FAST16: [[VP:%.*]] = bitcast i16* [[P:%.*]] to <8 x i16>*
+  ; FAST16: [[VP1:%.*]] = getelementptr <8 x i16>, <8 x i16>* [[VP]], i32 0
+  ; FAST16: store <8 x i16> [[V8]], <8 x i16>* [[VP1]], align [[ALIGN:2]]
+  ; FAST16: [[VP2:%.*]] = getelementptr <8 x i16>, <8 x i16>* [[VP]], i32 1
+  ; FAST16: store <8 x i16> [[V8]], <8 x i16>* [[VP2]], align [[ALIGN]]
+  ; FAST16: [[P3:%.*]] = getelementptr i16, i16* [[P]], i32 16
+  ; FAST16: store i16 [[AL_0_16]], i16* [[P3]], align [[ALIGN]]
+  store [17 x i1] %a, [17 x i1]* %p
+  ret void
+}
+
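+; Constant aggregates get a zeroinitializer shadow of the aggregate shadow type.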
+define [2 x i32] @const_array() {
+  ; FAST16: @"dfs$const_array"
+  ; FAST16: store [2 x i16] zeroinitializer, [2 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [2 x i16]*), align 2
+  ret [2 x i32] [ i32 42, i32 11 ]
+}
+
+define [4 x i8] @call_array([4 x i8] %a) {
+  ; FAST16: @"dfs$call_array"
+  ; FAST16: [[A:%.*]] = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN:2]]
+  ; FAST16: store [4 x i16] [[A]], [4 x i16]* bitcast ([100 x i64]* @__dfsan_arg_tls to [4 x i16]*), align [[ALIGN]]
+  ; FAST16: %_dfsret = load [4 x i16], [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align [[ALIGN]]
+  ; FAST16: store [4 x i16] %_dfsret, [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align [[ALIGN]]
+
+  %r = call [4 x i8] @pass_array([4 x i8] %a)
+  ret [4 x i8] %r
+}
+
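+; A [1000 x i16] shadow does not fit the [100 x i64] TLS buffers (800 bytes),
+; so oversized arguments and return values get zero shadows.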
+%LargeArr = type [1000 x i8]
+
+define i8 @fun_with_large_args(i1 %i, %LargeArr %a) {
+  ; FAST16: @"dfs$fun_with_large_args"
+  ; FAST16: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  %r = extractvalue %LargeArr %a, 0
+  ret i8 %r
+}
+
+define %LargeArr @fun_with_large_ret() {
+  ; FAST16: @"dfs$fun_with_large_ret"
+  ; FAST16-NEXT: ret  [1000 x i8] zeroinitializer
+  ret %LargeArr zeroinitializer
+}
+
+define i8 @call_fun_with_large_ret() {
+  ; FAST16: @"dfs$call_fun_with_large_ret"
+  ; FAST16: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  %r = call %LargeArr @fun_with_large_ret()
+  %e = extractvalue %LargeArr %r, 0
+  ret i8 %e
+}
+
+define i8 @call_fun_with_large_args(i1 %i, %LargeArr %a) {
+  ; FAST16: @"dfs$call_fun_with_large_args"
+  ; FAST16: [[I:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; FAST16: store i16 [[I]], i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; FAST16: %r = call i8 @"dfs$fun_with_large_args"(i1 %i, [1000 x i8] %a)
+  
+  %r = call i8 @fun_with_large_args(i1 %i, %LargeArr %a)
+  ret i8 %r
+}

diff --git a/llvm/test/Instrumentation/DataFlowSanitizer/phi.ll b/llvm/test/Instrumentation/DataFlowSanitizer/phi.ll
index 6ef8fef85de1..f0df19ff932c 100644
--- a/llvm/test/Instrumentation/DataFlowSanitizer/phi.ll
+++ b/llvm/test/Instrumentation/DataFlowSanitizer/phi.ll
@@ -1,11 +1,18 @@
-; RUN: opt < %s -dfsan -S | FileCheck %s
+; RUN: opt < %s -dfsan -S | FileCheck %s --check-prefix=LEGACY
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -S | FileCheck %s --check-prefix=FAST16
 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 define {i32, i32} @test({i32, i32} %a, i1 %c) {
-  ; CHECK: [[E0:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
-  ; CHECK: [[E3:%.*]] = phi i16 [ [[E0]], %T ], [ [[E0]], %F ]
-  ; CHECK: store i16 [[E3]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  ; LEGACY: [[AL:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[PL:%.*]] = phi i16 [ [[AL]], %T ], [ [[AL]], %F ]
+  ; LEGACY: store i16 [[PL]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+
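+  ; With fast16 labels the phi carries the whole { i16, i16 } shadow; the
+  ; incoming shadow on each path has one field zeroed where a constant was
+  ; inserted.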
+  ; FAST16: [[AL:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN:2]]
+  ; FAST16: [[AL0:%.*]] = insertvalue { i16, i16 } [[AL]], i16 0, 0
+  ; FAST16: [[AL1:%.*]] = insertvalue { i16, i16 } [[AL]], i16 0, 1
+  ; FAST16: [[PL:%.*]] = phi { i16, i16 } [ [[AL0]], %T ], [ [[AL1]], %F ]
+  ; FAST16: store { i16, i16 } [[PL]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
 
 entry:
   br i1 %c, label %T, label %F

diff --git a/llvm/test/Instrumentation/DataFlowSanitizer/store.ll b/llvm/test/Instrumentation/DataFlowSanitizer/store.ll
index 4560c3d7fb8f..2701f968f93d 100644
--- a/llvm/test/Instrumentation/DataFlowSanitizer/store.ll
+++ b/llvm/test/Instrumentation/DataFlowSanitizer/store.ll
@@ -163,4 +163,4 @@ define void @store_zero(i32* %p) {
   ;  NO_COMBINE_PTR_LABEL: store i64 0, i64* {{.*}}, align 2
   store i32 0, i32* %p
   ret void
-}
+}
\ No newline at end of file

diff --git a/llvm/test/Instrumentation/DataFlowSanitizer/struct.ll b/llvm/test/Instrumentation/DataFlowSanitizer/struct.ll
new file mode 100644
index 000000000000..f2279d088339
--- /dev/null
+++ b/llvm/test/Instrumentation/DataFlowSanitizer/struct.ll
@@ -0,0 +1,283 @@
+; RUN: opt < %s -dfsan -S | FileCheck %s --check-prefix=LEGACY
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-event-callbacks=true -S | FileCheck %s --check-prefix=EVENT_CALLBACKS
+; RUN: opt < %s -dfsan -dfsan-args-abi -S | FileCheck %s --check-prefix=ARGS_ABI
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -S | FileCheck %s --check-prefix=FAST16
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-combine-pointer-labels-on-load=false -S | FileCheck %s --check-prefix=NO_COMBINE_LOAD_PTR
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-combine-pointer-labels-on-store=true -S | FileCheck %s --check-prefix=COMBINE_STORE_PTR
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-track-select-control-flow=false -S | FileCheck %s --check-prefix=NO_SELECT_CONTROL
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -dfsan-debug-nonzero-labels -S | FileCheck %s --check-prefix=DEBUG_NONZERO_LABELS
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
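+; These tests check shadow propagation for struct types: in TLS mode each
+; field gets its own i16 shadow, while memory accesses and most other
+; instructions still combine field shadows into a single i16.
+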
+define {i8*, i32} @pass_struct({i8*, i32} %s) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$pass_struct"
+  ; NO_COMBINE_LOAD_PTR: [[L:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: store { i16, i16 } [[L]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  ; ARGS_ABI: @"dfs$pass_struct"
+  ; ARGS_ABI: ret { { i8*, i32 }, i16 }
+  
+  ; DEBUG_NONZERO_LABELS: @"dfs$pass_struct"
+  ; DEBUG_NONZERO_LABELS: [[L:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN:2]]
+  ; DEBUG_NONZERO_LABELS: [[L0:%.*]] = extractvalue { i16, i16 } [[L]], 0
+  ; DEBUG_NONZERO_LABELS: [[L1:%.*]] = extractvalue { i16, i16 } [[L]], 1
+  ; DEBUG_NONZERO_LABELS: [[L01:%.*]] = or i16 [[L0]], [[L1]]
+  ; DEBUG_NONZERO_LABELS: {{.*}} = icmp ne i16 [[L01]], 0
+  ; DEBUG_NONZERO_LABELS: call void @__dfsan_nonzero_label()
+  ; DEBUG_NONZERO_LABELS: store { i16, i16 } [[L]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  
+  ret {i8*, i32} %s
+}
+
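+; A struct mixing pointer, array, vector, and nested-struct fields; note that
+; the vector field still gets a single i16 shadow.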
+%StructOfAggr = type {i8*, [4 x i2], <4 x i3>, {i1, i1}}
+
+define %StructOfAggr @pass_struct_of_aggregate(%StructOfAggr %s) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$pass_struct_of_aggregate"
+  ; NO_COMBINE_LOAD_PTR: %1 = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN:2]]
+  ; NO_COMBINE_LOAD_PTR: store { i16, [4 x i16], i16, { i16, i16 } } %1, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN]]
+
+  ; ARGS_ABI: @"dfs$pass_struct_of_aggregate"
+  ; ARGS_ABI: ret { %StructOfAggr, i16 }
+  ret %StructOfAggr %s
+}
+
+define {} @load_empty_struct({}* %p) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_empty_struct"
+  ; NO_COMBINE_LOAD_PTR: store {} zeroinitializer, {}* bitcast ([100 x i64]* @__dfsan_retval_tls to {}*), align 2
+
+  %a = load {}, {}* %p
+  ret {} %a
+}
+
+@Y = constant {i1, i32} {i1 1, i32 1}
+
+define {i1, i32} @load_global_struct() {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_global_struct"
+  ; NO_COMBINE_LOAD_PTR: store { i16, i16 } zeroinitializer, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align 2
+
+  %a = load {i1, i32}, {i1, i32}* @Y
+  ret {i1, i32} %a
+}
+
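+; By default the condition's shadow is folded into every field of the result;
+; with -dfsan-track-select-control-flow=false the field shadows are selected
+; unchanged.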
+define {i1, i32} @select_struct(i1 %c, {i1, i32} %a, {i1, i32} %b) {
+  ; NO_SELECT_CONTROL: @"dfs$select_struct"
+  ; NO_SELECT_CONTROL: [[B:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 6) to { i16, i16 }*), align [[ALIGN:2]]
+  ; NO_SELECT_CONTROL: [[A:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to { i16, i16 }*), align [[ALIGN]]
+  ; NO_SELECT_CONTROL: [[C:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; NO_SELECT_CONTROL: [[S:%.*]] = select i1 %c, { i16, i16 } [[A]], { i16, i16 } [[B]]
+  ; NO_SELECT_CONTROL: store { i16, i16 } [[S]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  ; FAST16: @"dfs$select_struct"
+  ; FAST16: [[B_S:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 6) to { i16, i16 }*), align [[ALIGN:2]]
+  ; FAST16: [[A_S:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to { i16, i16 }*), align [[ALIGN]]
+  ; FAST16: [[C_S:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; FAST16: [[S_S:%.*]] = select i1 %c, { i16, i16 } [[A_S]], { i16, i16 } [[B_S]]
+  ; FAST16: [[S0_S:%.*]] = extractvalue { i16, i16 } [[S_S]], 0
+  ; FAST16: [[S1_S:%.*]] = extractvalue { i16, i16 } [[S_S]], 1
+  ; FAST16: [[S01_S:%.*]] = or i16 [[S0_S]], [[S1_S]]
+  ; FAST16: [[CS_S:%.*]] = or i16 [[C_S]], [[S01_S]]
+  ; FAST16: [[S1:%.*]] = insertvalue { i16, i16 } undef, i16 [[CS_S]], 0
+  ; FAST16: [[S2:%.*]] = insertvalue { i16, i16 } [[S1]], i16 [[CS_S]], 1
+  ; FAST16: store { i16, i16 } [[S2]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  ; LEGACY: @"dfs$select_struct"
+  ; LEGACY: [[U:%.*]] = call zeroext i16 @__dfsan_union
+  ; LEGACY: [[P:%.*]] = phi i16 [ [[U]],
+  ; LEGACY: store i16 [[P]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+
+  %s = select i1 %c, {i1, i32} %a, {i1, i32} %b
+  ret {i1, i32} %s
+}
+
+define { i32, i32 } @asm_struct(i32 %0, i32 %1) {
+  ; FAST16: @"dfs$asm_struct"
+  ; FAST16: [[E1:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to i16*), align [[ALIGN:2]]
+  ; FAST16: [[E0:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; FAST16: [[E01:%.*]] = or i16 [[E0]], [[E1]]
+  ; FAST16: [[S0:%.*]] = insertvalue { i16, i16 } undef, i16 [[E01]], 0
+  ; FAST16: [[S1:%.*]] = insertvalue { i16, i16 } [[S0]], i16 [[E01]], 1
+  ; FAST16: store { i16, i16 } [[S1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  ; LEGACY: @"dfs$asm_struct"
+  ; LEGACY: [[E1:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[E0:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; LEGACY: [[E01:%.*]] = call zeroext i16 @__dfsan_union(i16 zeroext [[E0]], i16 zeroext [[E1]])
+  ; LEGACY: [[P:%.*]] = phi i16 [ [[E01]], {{.*}} ], [ [[E0]], {{.*}} ]
+  ; LEGACY: store i16 [[P]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  
+entry:
+  %a = call { i32, i32 } asm "", "=r,=r,r,r,~{dirflag},~{fpsr},~{flags}"(i32 %0, i32 %1)
+  ret { i32, i32 } %a
+}
+
+define {i32, i32} @const_struct() {
+  ; FAST16: @"dfs$const_struct"
+  ; FAST16: store { i16, i16 } zeroinitializer, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align 2
+  
+  ; LEGACY: @"dfs$const_struct"
+  ; LEGACY: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  ret {i32, i32} { i32 42, i32 11 }
+}
+
+define i1 @extract_struct({i1, i5} %s) {
+  ; FAST16: @"dfs$extract_struct"
+  ; FAST16: [[SM:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN:2]]
+  ; FAST16: [[EM:%.*]] = extractvalue { i16, i16 } [[SM]], 0
+  ; FAST16: store i16 [[EM]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  
+  ; LEGACY: @"dfs$extract_struct"
+  ; LEGACY: [[SM:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; LEGACY: store i16 [[SM]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  %e2 = extractvalue {i1, i5} %s, 0
+  ret i1 %e2
+}
+
+define {i1, i5} @insert_struct({i1, i5} %s, i5 %e1) {
+  ; FAST16: @"dfs$insert_struct"
+  ; FAST16: [[EM:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 4) to i16*), align [[ALIGN:2]]
+  ; FAST16: [[SM:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; FAST16: [[SM1:%.*]] = insertvalue { i16, i16 } [[SM]], i16 [[EM]], 1
+  ; FAST16: store { i16, i16 } [[SM1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  
+  ; LEGACY: @"dfs$insert_struct"
+  ; LEGACY: [[EM:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[SM:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; LEGACY: [[U:%.*]] = call zeroext i16 @__dfsan_union(i16 zeroext [[SM]], i16 zeroext [[EM]])
+  ; LEGACY: [[P:%.*]] = phi i16 [ [[U]], {{.*}} ], [ [[SM]], {{.*}} ]
+  ; LEGACY: store i16 [[P]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  %s1 = insertvalue {i1, i5} %s, i5 %e1, 1
+  ret {i1, i5} %s1
+}
+
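+; A struct load produces a combined shadow replicated into each field.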
+define {i1, i1} @load_struct({i1, i1}* %p) {
+  ; NO_COMBINE_LOAD_PTR: @"dfs$load_struct"
+  ; NO_COMBINE_LOAD_PTR: [[OL:%.*]] = or i16
+  ; NO_COMBINE_LOAD_PTR: [[S0:%.*]] = insertvalue { i16, i16 } undef, i16 [[OL]], 0
+  ; NO_COMBINE_LOAD_PTR: [[S1:%.*]] = insertvalue { i16, i16 } [[S0]], i16 [[OL]], 1
+  ; NO_COMBINE_LOAD_PTR: store { i16, i16 } [[S1]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align 2
+  
+  ; EVENT_CALLBACKS: @"dfs$load_struct"
+  ; EVENT_CALLBACKS: [[OL0:%.*]] = or i16
+  ; EVENT_CALLBACKS: [[OL1:%.*]] = or i16 [[OL0]],
+  ; EVENT_CALLBACKS: [[S0:%.*]] = insertvalue { i16, i16 } undef, i16 [[OL1]], 0
+  ; EVENT_CALLBACKS: call void @__dfsan_load_callback(i16 [[OL1]]
+  
+  %s = load {i1, i1}, {i1, i1}* %p
+  ret {i1, i1} %s
+}
+
+define void @store_struct({i1, i1}* %p, {i1, i1} %s) {
+  ; FAST16: @"dfs$store_struct"
+  ; FAST16: [[S:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to { i16, i16 }*), align [[ALIGN:2]]
+  ; FAST16: [[E0:%.*]] = extractvalue { i16, i16 } [[S]], 0
+  ; FAST16: [[E1:%.*]] = extractvalue { i16, i16 } [[S]], 1
+  ; FAST16: [[E:%.*]] = or i16 [[E0]], [[E1]]
+  ; FAST16: [[P0:%.*]] = getelementptr i16, i16* [[P:%.*]], i32 0
+  ; FAST16: store i16 [[E]], i16* [[P0]], align [[ALIGN]]
+  ; FAST16: [[P1:%.*]] = getelementptr i16, i16* [[P]], i32 1
+  ; FAST16: store i16 [[E]], i16* [[P1]], align [[ALIGN]]
+  
+  ; EVENT_CALLBACKS: @"dfs$store_struct"
+  ; EVENT_CALLBACKS: [[OL:%.*]] = or i16
+  ; EVENT_CALLBACKS: call void @__dfsan_store_callback(i16 [[OL]]
+  
+  ; COMBINE_STORE_PTR: @"dfs$store_struct"
+  ; COMBINE_STORE_PTR: [[PL:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; COMBINE_STORE_PTR: [[SL:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to { i16, i16 }*), align [[ALIGN]]
+  ; COMBINE_STORE_PTR: [[SL0:%.*]] = extractvalue { i16, i16 } [[SL]], 0
+  ; COMBINE_STORE_PTR: [[SL1:%.*]] = extractvalue { i16, i16 } [[SL]], 1
+  ; COMBINE_STORE_PTR: [[SL01:%.*]] = or i16 [[SL0]], [[SL1]]
+  ; COMBINE_STORE_PTR: [[E:%.*]] = or i16 [[SL01]], [[PL]]
+  ; COMBINE_STORE_PTR: [[P0:%.*]] = getelementptr i16, i16* [[P:%.*]], i32 0
+  ; COMBINE_STORE_PTR: store i16 [[E]], i16* [[P0]], align [[ALIGN]]
+  ; COMBINE_STORE_PTR: [[P1:%.*]] = getelementptr i16, i16* [[P]], i32 1
+  ; COMBINE_STORE_PTR: store i16 [[E]], i16* [[P1]], align [[ALIGN]]
+  
+  store {i1, i1} %s, {i1, i1}* %p
+  ret void
+}
+
+define i2 @extract_struct_of_aggregate11(%StructOfAggr %s) {
+  ; FAST16: @"dfs$extract_struct_of_aggregate11"
+  ; FAST16: [[E:%.*]] = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN:2]]
+  ; FAST16: [[E11:%.*]] = extractvalue { i16, [4 x i16], i16, { i16, i16 } } [[E]], 1, 1
+  ; FAST16: store i16 [[E11]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+
+  %e11 = extractvalue %StructOfAggr %s, 1, 1
+  ret i2 %e11
+}
+
+define [4 x i2] @extract_struct_of_aggregate1(%StructOfAggr %s) {
+  ; FAST16: @"dfs$extract_struct_of_aggregate1"
+  ; FAST16: [[E:%.*]] = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN:2]]
+  ; FAST16: [[E1:%.*]] = extractvalue { i16, [4 x i16], i16, { i16, i16 } } [[E]], 1
+  ; FAST16: store [4 x i16] [[E1]], [4 x i16]* bitcast ([100 x i64]* @__dfsan_retval_tls to [4 x i16]*), align [[ALIGN]]
+  %e1 = extractvalue %StructOfAggr %s, 1
+  ret [4 x i2] %e1
+}
+
+define <4 x i3> @extract_struct_of_aggregate2(%StructOfAggr %s) {
+  ; FAST16: @"dfs$extract_struct_of_aggregate2"
+  ; FAST16: [[E:%.*]] = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN:2]]
+  ; FAST16: [[E2:%.*]] = extractvalue { i16, [4 x i16], i16, { i16, i16 } } [[E]], 2
+  ; FAST16: store i16 [[E2]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  %e2 = extractvalue %StructOfAggr %s, 2
+  ret <4 x i3> %e2
+}
+
+define { i1, i1 } @extract_struct_of_aggregate3(%StructOfAggr %s) {
+  ; FAST16: @"dfs$extract_struct_of_aggregate3"
+  ; FAST16: [[E:%.*]] = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN:2]]
+  ; FAST16: [[E3:%.*]] = extractvalue { i16, [4 x i16], i16, { i16, i16 } } [[E]], 3
+  ; FAST16: store { i16, i16 } [[E3]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  %e3 = extractvalue %StructOfAggr %s, 3
+  ret { i1, i1 } %e3
+}
+
+define i1 @extract_struct_of_aggregate31(%StructOfAggr %s) {
+  ; FAST16: @"dfs$extract_struct_of_aggregate31"
+  ; FAST16: [[E:%.*]] = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN:2]]
+  ; FAST16: [[E31:%.*]] = extractvalue { i16, [4 x i16], i16, { i16, i16 } } [[E]], 3, 1
+  ; FAST16: store i16 [[E31]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  %e31 = extractvalue %StructOfAggr %s, 3, 1
+  ret i1 %e31
+}
+
+define %StructOfAggr @insert_struct_of_aggregate11(%StructOfAggr %s, i2 %e11) {
+  ; FAST16: @"dfs$insert_struct_of_aggregate11"
+  ; FAST16: [[E11:%.*]]  = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 16) to i16*), align [[ALIGN:2]]
+  ; FAST16: [[S:%.*]]  = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN]]
+  ; FAST16: [[S1:%.*]]  = insertvalue { i16, [4 x i16], i16, { i16, i16 } } [[S]], i16 [[E11]], 1, 1
+  ; FAST16: store { i16, [4 x i16], i16, { i16, i16 } } [[S1]], { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN]]
+
+  %s1 = insertvalue %StructOfAggr %s, i2 %e11, 1, 1
+  ret %StructOfAggr %s1
+}
+
+define {i8*, i32} @call_struct({i8*, i32} %s) {
+  ; FAST16: @"dfs$call_struct"
+  ; FAST16: [[S:%.*]] = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN:2]]
+  ; FAST16: store { i16, i16 } [[S]], { i16, i16 }* bitcast ([100 x i64]* @__dfsan_arg_tls to { i16, i16 }*), align [[ALIGN]]
+  ; FAST16: %_dfsret = load { i16, i16 }, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+  ; FAST16: store { i16, i16 } %_dfsret, { i16, i16 }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, i16 }*), align [[ALIGN]]
+
+  %r = call {i8*, i32} @pass_struct({i8*, i32} %s)
+  ret {i8*, i32} %r
+}
+
+declare %StructOfAggr @fun_with_many_aggr_args(<2 x i7> %v, [2 x i5] %a, {i3, i3} %s)
+
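+; Shadows of multiple aggregate arguments are passed at their own arg TLS
+; offsets, and the aggregate return shadow comes back via retval TLS.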
+define %StructOfAggr @call_many_aggr_args(<2 x i7> %v, [2 x i5] %a, {i3, i3} %s) {
+  ; FAST16: @"dfs$call_many_aggr_args"
+  ; FAST16: [[S:%.*]] = load { i16, i16 }, { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 6) to { i16, i16 }*), align [[ALIGN:2]]
+  ; FAST16: [[A:%.*]] = load [2 x i16], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to [2 x i16]*), align [[ALIGN]]
+  ; FAST16: [[V:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; FAST16: store i16 [[V]], i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; FAST16: store [2 x i16] [[A]], [2 x i16]* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to [2 x i16]*), align [[ALIGN]]
+  ; FAST16: store { i16, i16 } [[S]], { i16, i16 }* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 6) to { i16, i16 }*), align [[ALIGN]]
+  ; FAST16: %_dfsret = load { i16, [4 x i16], i16, { i16, i16 } }, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN]]
+  ; FAST16: store { i16, [4 x i16], i16, { i16, i16 } } %_dfsret, { i16, [4 x i16], i16, { i16, i16 } }* bitcast ([100 x i64]* @__dfsan_retval_tls to { i16, [4 x i16], i16, { i16, i16 } }*), align [[ALIGN]]
+
+  %r = call %StructOfAggr @fun_with_many_aggr_args(<2 x i7> %v, [2 x i5] %a, {i3, i3} %s)
+  ret %StructOfAggr %r
+}
\ No newline at end of file

diff --git a/llvm/test/Instrumentation/DataFlowSanitizer/vector.ll b/llvm/test/Instrumentation/DataFlowSanitizer/vector.ll
new file mode 100644
index 000000000000..98f57a2cbbac
--- /dev/null
+++ b/llvm/test/Instrumentation/DataFlowSanitizer/vector.ll
@@ -0,0 +1,60 @@
+; RUN: opt < %s -dfsan -S | FileCheck %s --check-prefix=LEGACY
+; RUN: opt < %s -dfsan -dfsan-args-abi -S | FileCheck %s --check-prefix=ARGS_ABI
+; RUN: opt < %s -dfsan -dfsan-fast-16-labels=true -S | FileCheck %s --check-prefix=FAST16
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
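+; Vectors are not tracked per element yet: even with fast16 labels they use a
+; single combined i16 shadow.
+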
+define <4 x i4> @pass_vector(<4 x i4> %v) {
+  ; ARGS_ABI: @"dfs$pass_vector"
+  ; ARGS_ABI: ret { <4 x i4>, i16 }
+  
+  ; FAST16: @"dfs$pass_vector"
+  ; FAST16: [[L:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; FAST16: store i16 [[L]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  ret <4 x i4> %v
+}
+
+define void @load_update_store_vector(<4 x i4>* %p) {
+  ; FAST16: @"dfs$load_update_store_vector"
+  ; FAST16: {{.*}} = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align 2
+
+  %v = load <4 x i4>, <4 x i4>* %p
+  %e2 = extractelement <4 x i4> %v, i32 2
+  %v1 = insertelement <4 x i4> %v, i4 %e2, i32 0
+  store <4 x i4> %v1, <4 x i4>* %p
+  ret void
+}
+
+define <4 x i1> @icmp_vector(<4 x i8> %a, <4 x i8> %b) {
+  ; LEGACY: @"dfs$icmp_vector"
+  ; LEGACY: [[B:%.*]] = load i16, i16* inttoptr (i64 add (i64 ptrtoint ([100 x i64]* @__dfsan_arg_tls to i64), i64 2) to i16*), align [[ALIGN:2]]
+  ; LEGACY: [[A:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; LEGACY: [[U:%.*]] = call zeroext i16 @__dfsan_union(i16 zeroext [[A]], i16 zeroext [[B]])
+  ; LEGACY: [[PH:%.*]] = phi i16 [ [[U]], {{.*}} ], [ [[A]], {{.*}} ]
+  ; LEGACY: store i16 [[PH]], i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  
+  %r = icmp eq <4 x i8> %a, %b
+  ret <4 x i1> %r
+}
+
+define <2 x i32> @const_vector() {
+  ; LEGACY: @"dfs$const_vector"
+  ; LEGACY: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  
+  ; FAST16: @"dfs$const_vector"
+  ; FAST16: store i16 0, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align 2
+  ret <2 x i32> < i32 42, i32 11 >
+}
+
+define <4 x i4> @call_vector(<4 x i4> %v) {
+  ; LEGACY: @"dfs$call_vector"
+  ; LEGACY: [[V:%.*]] = load i16, i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN:2]]
+  ; LEGACY: store i16 [[V]], i16* bitcast ([100 x i64]* @__dfsan_arg_tls to i16*), align [[ALIGN]]
+  ; LEGACY: %_dfsret = load i16, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+  ; LEGACY: store i16 %_dfsret, i16* bitcast ([100 x i64]* @__dfsan_retval_tls to i16*), align [[ALIGN]]
+
+  %r = call <4 x i4> @pass_vector(<4 x i4> %v)
+  ret <4 x i4> %r
+}
+
+