[PATCH] D46018: [GlobalISel][IRTranslator] Split aggregates during IR translation

Mon Apr 30 11:58:38 PDT 2018

rtereshin added inline comments.

================
Comment at: include/llvm/CodeGen/GlobalISel/IRTranslator.h:78
+    using const_offset_iterator =
+        DenseMap<const Value *, OffsetListT *>::const_iterator;
+
----------------
aemerson wrote:
> rtereshin wrote:
> > This looks like over 128 bytes per `Value` or more. How does memory consumption change with this patch?
> > 
> > If it ends up being a problem, we might reuse the `MachineInstr`s approach with storing machine memory operands. It's a similar pattern - in absolute majority of the cases we have just one to one (in case of value to vregs, or one to zero in case of the machine instruction to memory operands) mapping, but only sometimes its one to many.
> I haven't measured memory consumption, but changing the SmallVector capacities to 1 has no impact on the compile time on CTMark so I'll do that.
> 
> Can you be more specific about the MachineInstr thing? To which bit of code are you referring to? I think at capacities of 1 this should be fine.
> I haven't measured memory consumption, but changing the SmallVector capacities to 1 has no impact on the compile time on CTMark so I'll do that.

Makes sense.

I came up with 128+ for bytes as follows:
1. every SmallVector has `3 * sizeof(intptr_t)` worth of overhead (size, capacity, pointer to the underlying array), looks like we have 2 of them per `Value`, so 6 `intptr_t`s of overhead.
2. We also have 2 separate maps that in very best case consume 4 `intptr_t`s per `Value` on top of that, more likely about twice that if `DenseMap`s keep the load factor around 0.5, I didn't check that, but AFAIK it's a common value.
3. The actual data consume 2 `intptr_t`s for 4 `unsigned`s and 4 `intptr_t`s for 4 `uint64_t` on 64-bit platforms.
Overall it's optimistically `(10 + 6) * sizeof(intptr_t)` = 128+ bytes per `Value` on 64-bit platforms.

Reducing the default `SmallVector` sizes to 1 changes only (3) to 1 (due to alignment) + 1 `intptr_t`s with overall memory consumption `(10 + 2) * sizeof(intptr_t)` = 96+ bytes per `Value` on 64-bit platforms.

If we put a vreg and its offset in the same `pair` and have a single `SmallVector` and a single map it will go down like follows:
1. 3 `intptr_t`s of overhead per `Value` due to `SmallVector`
2. 2+ `intptr_t`s of overhead per `Value` due to the pointer to pointer map.
3. 2 `intptr_t` of actual payload assuming the default size of `SmallVector` being 1.
overall `(5 + 2) * sizeof(intptr_t)` = 56+ bytes per `Value` on 64-bit platforms.

If the memory consumption is not a problem, that should be more than enough I think.

The memory operand's trick I was referring to lives roughly here:
https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/include/llvm/CodeGen/MachineInstr.h#L107-L114
https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/lib/CodeGen/MachineInstr.cpp#L318-L332
https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/include/llvm/CodeGen/MachineInstr.h#L1304-L1313

It won't save much here on top of the "just one SmallVector and one map" suggestion, basically just the `capacity` part of `SmallVector`, AFAICT, we definitely don't need it if the memory consumption is not an issue here.

All of this is just a mere suggestion, we can keep it as it is and worry about it later if it becomes an apparent problem, I think.

UPD. I took a look at `DenseMap` implementation, and it appears to me that the expected value of it's load factor is 9/16, so it takes approximately twice as much memory as estimated above. Therefore estimations change from 128 -> 96 -> 56 to 160 -> 128 -> 72.

Repository:
  rL LLVM

https://reviews.llvm.org/D46018