[PATCH] Optimize insertqi when we copy all the lower 64 bits.

Wed Apr 23 17:00:16 PDT 2014

On Apr 23, 2014, at 2:19 PM, Filipe Cabecinhas <filcab+llvm.phabricator at gmail.com> wrote:

> How could we use insertelement here?
> Should we just (on the clang side) bitcast the vectors to <128 x i1> and use extractelement + insertelement?
> That seems… hard to match on the output side.

After reading your comment I realized that insertqi does not just insert elements. It actually copies n bits at location m into the destination. I think we can represent this as a sequence of logical operations with a constant mask. Later we would need to pattern-match that sequence back into the instruction in isel.
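
As a rough illustration of the "logical operations with a constant mask" idea, here is a minimal C++ sketch of the low 64 bits of such a bit-field insert. The helper name insertBits and its parameters are hypothetical (not an LLVM or intrinsic API), and it assumes the n bits come from the low end of the source, that 1 <= len <= 64, and that idx + len <= 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): models the low 64 bits of a
// bit-field insert as plain mask arithmetic.
// Assumptions: the field is taken from the low `len` bits of `src`,
// 1 <= len <= 64, and idx + len <= 64.
uint64_t insertBits(uint64_t dst, uint64_t src, unsigned len, unsigned idx) {
  assert(len >= 1 && len <= 64 && idx + len <= 64 && "field out of range");
  // Mask of `len` low bits. Shifting a 64-bit value by 64 is undefined
  // in C++, so the full-width case is handled separately.
  uint64_t lowMask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);
  // The same mask moved to the destination position.
  uint64_t fieldMask = lowMask << idx;
  // Clear the destination field, then OR in the shifted source bits.
  return (dst & ~fieldMask) | ((src & lowMask) << idx);
}
```

With len = 64 and idx = 0 the masks cover the whole word and the result is simply src, which is the "copy all the lower 64 bits" case the patch title refers to.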

I think that eventually we should lower this intrinsic to IR, but until we do I am okay with the current patch.

Thanks,
Nadav

> 
> The instruction copies _bits_ from one vector to another. We can special-case it when we're copying from an 8/16/32/64 bit boundary, for a multiple of that amount of bits, but doing the generic instruction in IR doesn't seem that easy to do.
> 
> I've also looked a bit more at SelectionDAG and I think it might be worth it to move this optimization there. But I don't know if it could handle the sequence of insertqi -> copy source vector, when the insertqi ranges add up to [0,64).
> 
> Which of these would be better?
> 
> Regards,
>   Filipe
> 
> 
> On Wed, Apr 23, 2014 at 9:16 AM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:
> 
> 
> > On Apr 23, 2014, at 12:08, Nadav Rotem <nrotem at apple.com> wrote:
> >
> >
> >> On Apr 23, 2014, at 6:56 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> >>
> >>> On 15 April 2014 14:04, Nadav Rotem <nrotem at apple.com> wrote:
> >>> Hi Filipe,
> >>>
> >>> Why is this an IR-level transform? Could you implement this in SelectionDAG?
> >>
> >> What is the advantage of doing this at SelectionDAG? Since this is an
> >> intrinsic, we know all that we need at the IR level already. IR also
> >> has the advantage of opening the potential for further optimizations
> > It is not clear to me why we represent this intrinsic as an IR-level intrinsic and not as a regular insertelement instruction. We already have IR-level optimizations on insertelement and I prefer not to duplicate all of them.
> >
> 
> That I fully agree with. If the operation can be represented with generic IR, that is by far the best solution.
> 
> 
> >> and has much better testing than SelectionDAG.
> >>
> >> Cheers,
> >> Rafael
> >
> 
