[PATCH] D46018: [GlobalISel][IRTranslator] Split aggregates during IR translation

Wed May 9 00:04:17 PDT 2018

rtereshin added inline comments.

================
Comment at: include/llvm/CodeGen/GlobalISel/IRTranslator.h:78
+    using const_offset_iterator =
+        DenseMap<const Value *, OffsetListT *>::const_iterator;
+
----------------
aemerson wrote:
> rtereshin wrote:
> > aemerson wrote:
> > > rtereshin wrote:
> > > > This looks like over 128 bytes per `Value` or more. How does memory consumption change with this patch?
> > > > 
> > > > If it ends up being a problem, we might reuse the `MachineInstr`s approach with storing machine memory operands. It's a similar pattern - in absolute majority of the cases we have just one to one (in case of value to vregs, or one to zero in case of the machine instruction to memory operands) mapping, but only sometimes its one to many.
> > > I haven't measured memory consumption, but changing the SmallVector capacities to 1 has no impact on the compile time on CTMark so I'll do that.
> > > 
> > > Can you be more specific about the MachineInstr thing? To which bit of code are you referring to? I think at capacities of 1 this should be fine.
> > > I haven't measured memory consumption, but changing the SmallVector capacities to 1 has no impact on the compile time on CTMark so I'll do that.
> > 
> > Makes sense.
> > 
> > I came up with 128+ for bytes as follows:
> > 1. every SmallVector has `3 * sizeof(intptr_t)` worth of overhead (size, capacity, pointer to the underlying array), looks like we have 2 of them per `Value`, so 6 `intptr_t`s of overhead.
> > 2. We also have 2 separate maps that in very best case consume 4 `intptr_t`s per `Value` on top of that, more likely about twice that if `DenseMap`s keep the load factor around 0.5, I didn't check that, but AFAIK it's a common value.
> > 3. The actual data consume 2 `intptr_t`s for 4 `unsigned`s and 4 `intptr_t`s for 4 `uint64_t` on 64-bit platforms.
> > Overall it's optimistically `(10 + 6) * sizeof(intptr_t)` = 128+ bytes per `Value` on 64-bit platforms.
> > 
> > Reducing the default `SmallVector` sizes to 1 changes only (3) to 1 (due to alignment) + 1 `intptr_t`s with overall memory consumption `(10 + 2) * sizeof(intptr_t)` = 96+ bytes per `Value` on 64-bit platforms.
> > 
> > If we put a vreg and its offset in the same `pair` and have a single `SmallVector` and a single map it will go down like follows:
> > 1. 3 `intptr_t`s of overhead per `Value` due to `SmallVector`
> > 2. 2+ `intptr_t`s of overhead per `Value` due to the pointer to pointer map.
> > 3. 2 `intptr_t` of actual payload assuming the default size of `SmallVector` being 1.
> > overall `(5 + 2) * sizeof(intptr_t)` = 56+ bytes per `Value` on 64-bit platforms.
> > 
> > If the memory consumption is not a problem, that should be more than enough I think.
> > 
> > The memory operand's trick I was referring to lives roughly here:
> > https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/include/llvm/CodeGen/MachineInstr.h#L107-L114
> > https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/lib/CodeGen/MachineInstr.cpp#L318-L332
> > https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/include/llvm/CodeGen/MachineInstr.h#L1304-L1313
> > 
> > It won't save much here on top of the "just one SmallVector and one map" suggestion, basically just the `capacity` part of `SmallVector`, AFAICT, we definitely don't need it if the memory consumption is not an issue here.
> > 
> > All of this is just a mere suggestion, we can keep it as it is and worry about it later if it becomes an apparent problem, I think.
> > 
> > UPD. I took a look at `DenseMap` implementation, and it appears to me that the expected value of it's load factor is 9/16, so it takes approximately twice as much memory as estimated above. Therefore estimations change from 128 -> 96 -> 56 to 160 -> 128 -> 72.
> Thanks for the analysis. I don't think we can have just a single SmallVector because we need the ability to dynamically grow each set of vregs and/or offsets, so storing one SmallVector for each Value is better despite the extra memory cost in terms of the design/
I didn't mean the same `SmallVector` for all values in the function, I meant one `SmallVector` for each `Value`, just as you say, but //one// (of `std::pair<unsigned, int64_t>`s, for instance), instead of two per each value (for offsets and vregs themselves). 

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:467

   uint64_t Offset = 8 * DL->getIndexedOffsetInType(Src->getType(), Indices);
+  ArrayRef<unsigned> SrcRegs = getOrCreateVRegs(*Src);
----------------
For better or for worse, `getIndexedOffsetInType` returns `int64_t`, not `uint64_t`.

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:494
     for (unsigned i = 2; i < U.getNumOperands(); ++i)
       Indices.push_back(U.getOperand(i));
   }
----------------
This logic of getting indices from values is quite repeated here and in `extractvalue`, do you think it makes sense to refactor it out as a separate function?

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:1275
+    auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_PHI);
+    MIB.addDef(Reg);
+    Insts.push_back(MIB.getInstr());
----------------
This could be inlined as `MIRBuilder.buildInstr(TargetOpcode::G_PHI, Reg);`

Repository:
  rL LLVM

https://reviews.llvm.org/D46018