[PATCH] D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.

Fri May 11 08:57:03 PDT 2018

aprantl added a comment.

In https://reviews.llvm.org/D44338#1094746, @hsaito wrote:

> In https://reviews.llvm.org/D44338#1094601, @aprantl wrote:
>
> > Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?
>
>
> I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Assuming widening means putting a smaller value into a larger register at offset 0, then it is safe to just point the dbg.value to the new larger register. If the offset is nonzero, you'll need to generate a DW_OP_shr DIExpression to shift the value into place in the debugger.
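
To make the nonzero-offset case concrete, here is a minimal standalone sketch of the arithmetic a debugger would perform when evaluating a shift-based expression. It is not LLVM code: `extractScalar`, the 64-bit stand-in for a wider register, and the explicit mask are assumptions for illustration only. In a real DIExpression the shift amount would be pushed first (e.g. with DW_OP_constu) before DW_OP_shr, and the variable's type size would determine how many bits are read.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM or DWARF API): models shifting a wide
// register value right by `bit_offset` and then reading the low `width_bits`
// bits as the variable's value.
// Precondition: bit_offset < 64 and width_bits >= 1.
constexpr std::uint64_t extractScalar(std::uint64_t wide_reg,
                                      unsigned bit_offset,
                                      unsigned width_bits) {
  // Logical shift right moves the scalar down to bit 0.
  const std::uint64_t shifted = wide_reg >> bit_offset;
  // Keep only the scalar's bits; a full 64-bit width needs no mask.
  const std::uint64_t mask = width_bits >= 64
                                 ? ~std::uint64_t{0}
                                 : ((std::uint64_t{1} << width_bits) - 1);
  return shifted & mask;
}

// Offset 0: the value is already in place, so no shift is needed.
static_assert(extractScalar(0xAAAABBBBCCCCDDDDull, 0, 16) == 0xDDDDull,
              "offset 0 reads the low bits directly");
// Nonzero offset: the shift brings the value into place.
static_assert(extractScalar(0xAAAABBBBCCCCDDDDull, 16, 16) == 0xCCCCull,
              "offset 16 needs a 16-bit right shift");
```

The offset-0 case is why simply repointing the dbg.value is enough there: the shift is a no-op.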

> 
> 
>>> Great, thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.
>> 
>> Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)
> 
> Aside from widening, what the vectorizer does is not much different from expression tree rewriting (in LLVM, the 3-address code equivalent of that). We need to educate ourselves about what scalar optimizers are doing in those circumstances and do the same. What could be tricky is the handling of interleaved memory access optimization, e.g., where multiple strided vector memrefs are converted into unit-stride vector memrefs and shuffles. Even then, I'd hope there are some existing memory access optimizations doing something comparable enough that we can learn from them.

Great! Let me know if you come across any concrete questions.

https://reviews.llvm.org/D44338