[PATCH] D46018: [GlobalISel][IRTranslator] Split aggregates during IR translation

Thu May 10 13:23:02 PDT 2018

aemerson added inline comments.

================
Comment at: include/llvm/CodeGen/GlobalISel/IRTranslator.h:78
+    using const_offset_iterator =
+        DenseMap<const Value *, OffsetListT *>::const_iterator;
+
----------------
rtereshin wrote:
> aemerson wrote:
> > rtereshin wrote:
> > > aemerson wrote:
> > > > rtereshin wrote:
> > > > > This looks like over 128 bytes per `Value` or more. How does memory consumption change with this patch?
> > > > > 
> > > > > If it ends up being a problem, we might reuse the `MachineInstr`s approach with storing machine memory operands. It's a similar pattern - in absolute majority of the cases we have just one to one (in case of value to vregs, or one to zero in case of the machine instruction to memory operands) mapping, but only sometimes its one to many.
> > > > I haven't measured memory consumption, but changing the SmallVector capacities to 1 has no impact on the compile time on CTMark so I'll do that.
> > > > 
> > > > Can you be more specific about the MachineInstr thing? To which bit of code are you referring to? I think at capacities of 1 this should be fine.
> > > > I haven't measured memory consumption, but changing the SmallVector capacities to 1 has no impact on the compile time on CTMark so I'll do that.
> > > 
> > > Makes sense.
> > > 
> > > I came up with 128+ for bytes as follows:
> > > 1. every SmallVector has `3 * sizeof(intptr_t)` worth of overhead (size, capacity, pointer to the underlying array), looks like we have 2 of them per `Value`, so 6 `intptr_t`s of overhead.
> > > 2. We also have 2 separate maps that in very best case consume 4 `intptr_t`s per `Value` on top of that, more likely about twice that if `DenseMap`s keep the load factor around 0.5, I didn't check that, but AFAIK it's a common value.
> > > 3. The actual data consume 2 `intptr_t`s for 4 `unsigned`s and 4 `intptr_t`s for 4 `uint64_t` on 64-bit platforms.
> > > Overall it's optimistically `(10 + 6) * sizeof(intptr_t)` = 128+ bytes per `Value` on 64-bit platforms.
> > > 
> > > Reducing the default `SmallVector` sizes to 1 changes only (3) to 1 (due to alignment) + 1 `intptr_t`s with overall memory consumption `(10 + 2) * sizeof(intptr_t)` = 96+ bytes per `Value` on 64-bit platforms.
> > > 
> > > If we put a vreg and its offset in the same `pair` and have a single `SmallVector` and a single map it will go down like follows:
> > > 1. 3 `intptr_t`s of overhead per `Value` due to `SmallVector`
> > > 2. 2+ `intptr_t`s of overhead per `Value` due to the pointer to pointer map.
> > > 3. 2 `intptr_t` of actual payload assuming the default size of `SmallVector` being 1.
> > > overall `(5 + 2) * sizeof(intptr_t)` = 56+ bytes per `Value` on 64-bit platforms.
> > > 
> > > If the memory consumption is not a problem, that should be more than enough I think.
> > > 
> > > The memory operand's trick I was referring to lives roughly here:
> > > https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/include/llvm/CodeGen/MachineInstr.h#L107-L114
> > > https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/lib/CodeGen/MachineInstr.cpp#L318-L332
> > > https://github.com/llvm-mirror/llvm/blob/2a793f6500a1a77cb7186549a6f9245bea847cf5/include/llvm/CodeGen/MachineInstr.h#L1304-L1313
> > > 
> > > It won't save much here on top of the "just one SmallVector and one map" suggestion, basically just the `capacity` part of `SmallVector`, AFAICT, we definitely don't need it if the memory consumption is not an issue here.
> > > 
> > > All of this is just a mere suggestion, we can keep it as it is and worry about it later if it becomes an apparent problem, I think.
> > > 
> > > UPD. I took a look at `DenseMap` implementation, and it appears to me that the expected value of it's load factor is 9/16, so it takes approximately twice as much memory as estimated above. Therefore estimations change from 128 -> 96 -> 56 to 160 -> 128 -> 72.
> > Thanks for the analysis. I don't think we can have just a single SmallVector because we need the ability to dynamically grow each set of vregs and/or offsets, so storing one SmallVector for each Value is better despite the extra memory cost in terms of the design/
> I didn't mean the same `SmallVector` for all values in the function, I meant one `SmallVector` for each `Value`, just as you say, but //one// (of `std::pair<unsigned, int64_t>`s, for instance), instead of two per each value (for offsets and vregs themselves). 
That's possible, I'd rather do it in a follow up patch.

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:128
+    uint64_t EltSize = DL.getTypeAllocSize(EltTy);
+    for (unsigned i = 0, e = ATy->getNumElements(); i != e; ++i)
+      computeValueLLTs(DL, *EltTy, ValueTys, Offsets,
----------------
rtereshin wrote:
> I think we're trying to stick to capitalized local variables' names, including loop induction variables.
To me using a capital I normally indicates an iterator while a lower case 'i' is assumed to be an integer type, and we already use lower case 'i' idiomatically everywhere including GISel.

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:494
     for (unsigned i = 2; i < U.getNumOperands(); ++i)
       Indices.push_back(U.getOperand(i));
   }
----------------
rtereshin wrote:
> This logic of getting indices from values is quite repeated here and in `extractvalue`, do you think it makes sense to refactor it out as a separate function?
Could do.

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:508
+    else
+      MIRBuilder.buildCopy(DstRegs[i], SrcRegs[i]);
+  }
----------------
rtereshin wrote:
> Let's say we have a large aggregate that eventually maps to N vregs, and also N `insertvalue`s each of which replaces just a single item (vreg). Will we end up generating N^2 `COPY`s for the IR of initial size `O(N)` here?
> 
> Granted, this probably happens very rarely in practice, but what if we don't `getOrCreateVRegs(U)`, but rather just `get(U)` (no implicit `MIR.createGenericRegister` calls), resize, and assign either `*InsertedIt++` or `SrcRegs[i]` to it directly w/o building any explicit `COPY`s at all? We're just remapping values back and forth here during translation, with always constant indices, so the whole thing could be copy-propagated on the fly avoiding quite a bit of MIR-churn.
I don't know why you mean by `get(U)`. What would it return?

Repository:
  rL LLVM

https://reviews.llvm.org/D46018