PATCH: WIP SLPVectorize: Enable vectorization of allocas

Fri Oct 25 11:50:58 PDT 2013

----- Original Message -----
> 
> I see the point that Hal and Chandler made about ExtractElement being
> the canonical form. If we can cannonicalize this pattern (in
> InstCombine) into ExtractElement with a dynamic index then we should
> probably do it.

I'm in complete agreement, but I'm just not sure that we currently do canonicalize this way. Do we?

> We can probably undo this canonicalization during
> isle without generating worse code. I prefer to perform this kind of
> canonicalization without TTI inside InstCombine. Hal, why do you
> suspect that lowering dynamic extract element would result in worse
> code had we not canonicalized it ?

My fear, based on zero data, was that we might find cases where the build_vector is only partially loop dependent. In that case, if we leave things as is, then those non-loop-dependent stores can be hoisted out of the loop. If we canonicalize that way, then the build_vector, if we're expanding it, will be expanded completely in the loop. Maybe MI-LICM will compensate for this in all relevant cases (as I said, it was based on zero data). I'm comfortable canonicalizing this way, and then fixing any regressions as they appear (if they appear).

 -Hal

> 
> 
> 
> 
> On Oct 25, 2013, at 1:14 AM, Hal Finkel < hfinkel at anl.gov > wrote:
> 
> 
> 
> ----- Original Message -----
> 
> 
> Hi Tom,
> 
> I am not sure that it is a good idea to generate ExtractElement
> instructions with dynamic indices at IR level.
> 
> I think that we should handle this decision using TTI. ExtractElement
> with a dynamic index, in theory, seems like it should be the
> preferred canonical form. In practice, I suspect that preferring it
> without target input will lead to hard-to-fix regressions. On my
> target, I can generate dynamic EEs in a relatively inexpensive way
> (one integer op, one shuffle generation and one shuffle
> instruction), but for other targets, dynamic EEs are always expanded
> through stack-slot load/stores. In the latter case, these extra
> load/stores may not be removable by DAGCombine (even if they might
> otherwise be in this case) because the EE's input may have been
> moved into some predecessor BB.
> 
> -Hal
> 
> 
> 
> I think that this
> kind of patterns should be matched during instruction selection.
> Assuming that you can pattern match it during isel, you are still
> left with the alloca. Allocas are lowered into stack slots and I
> think that we should be able to move stack slots that are only
> stored to.
> 
> Thanks,
> Nadav
> 
> 
> On Oct 24, 2013, at 9:02 PM, Tom Stellard < tom at stellard.net > wrote:
> 
> 
> 
> Hi,
> 
> As a follow up to this discussion:
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066780.html
> 
> I put together a very simple patch that begins to implement the
> transformation
> mentioned in the llvm-dev thread. The patch is incomplete and is
> mostly comments
> with question about how to do certain things, but it does work for
> the
> simple test case included in the patch.
> 
> I'd appreciate any feedback people can give me on this patch and
> the
> questions posed in the comments.
> 
> Thanks,
> Tom
> <vectorize-alloca.diff>
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory