[PATCH] improve vectorizers by removing cost of unnecessary truncs and exts.

James Molloy james at jamesmolloy.co.uk
Tue May 12 06:48:32 PDT 2015


s/<4 x i16>/<4 x i8>/ above- typo.

On Tue, 12 May 2015 at 14:48 James Molloy <james at jamesmolloy.co.uk> wrote:

> Hi Sam,
>
> Thanks for making those changes.
>
>   * Don't use a newline between } and else. Good: "} else {" Bad:
> "}\nelse{".
>   * Trivial one-liners should not have braces.
>   * Other vectorizer testcases check the debug output to check the correct
> costs are being considered internally. It would be a good idea to do that
> here too.
>   * I think both your patches have an issue: all other instructions should
> be clamped to the widest type before cost analysis. Consider:
>
> getWidestType() == i8
> %1 = zext i8 %foo to i32
> %2 = add i32 %1, 5
>
> You will query TTI for the %2 instruction and ask "what is the cost of a
> <4 x i32> add?". You should be asking "What is the cost of an <4 x i16>
> add?".
>
> This is going to be more difficult for the SLP case, so I suggest you peel
> the SLP stuff out into a separate patch.
>
> Cheers,
>
> James
>
> On Tue, 12 May 2015 at 13:05 Sam Parker <sam.parker at arm.com> wrote:
>
>> Hi James,
>>
>>
>>
>> I’ve added the two test files for loop- and slp-vectorize. I have also
>> added an additional check so that if the widest type is less than the
>> target ‘VectorTy’, the cost is actually calculated as a cast to a vector of
>> the widest type instead.
>>
>>
>>
>> Cheers,
>>
>> Sam
>>
>>
>>
>>
>>
>> *From:* James Molloy [mailto:james at jamesmolloy.co.uk]
>> *Sent:* 11 May 2015 16:34
>> *To:* Sam Parker; Demikhovsky, Elena
>>
>>
>> *Cc:* llvm-commits at cs.uiuc.edu
>> *Subject:* Re: [PATCH] improve vectorizers by removing cost of
>> unnecessary truncs and exts.
>>
>>
>>
>> Hi Sam,
>>
>> It's a mismodelling, where you'll say something is cheaper than it
>> actually is.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> James
>>
>>
>>
>> On Mon, 11 May 2015 at 15:46 Sam Parker <sam.parker at arm.com> wrote:
>>
>> Hi James,
>>
>>
>>
>> By not totally sound, do you mean incorrect or that I’m just missing an
>> extra opportunity and I should check that extensions aren’t targeting a
>> larger than necessary bitwidth? As for the test file, I didn’t know if I
>> was supposed to add it along with the patch or not, and will attach it in
>> the next version.
>>
>>
>>
>> Cheers,
>>
>> Sam
>>
>>
>>
>>
>>
>> *From:* James Molloy [mailto:james at jamesmolloy.co.uk]
>> *Sent:* 11 May 2015 13:49
>> *To:* Demikhovsky, Elena; Sam Parker
>> *Cc:* llvm-commits at cs.uiuc.edu
>> *Subject:* Re: [PATCH] improve vectorizers by removing cost of
>> unnecessary truncs and exts.
>>
>>
>>
>> Hi Sam,
>>
>> Thanks for working on this! This has been a running sore in ARM
>> vectorization for a long time now.
>>
>>
>>
>> Initial comments:
>>
>>    * In the future, could you please use Phabricator (
>> http://reviews.llvm.org) to upload patches to; it makes it much easier
>> to review.
>>
>>    * There are no regression tests included in this patch - did you
>> forget to git add them?
>>
>>    * I'm not sure the logic is totally sound. Consider this:
>>
>>
>>
>> loop:
>>
>>   %1 = load i8* %foo
>>
>>   %2 = zext i8 %1 to i32
>>
>>   %3 = add i32 %2, 42
>>
>>   %4 = trunc i32 %3 to i16
>>
>>   %5 = add i16 %4, 42
>>
>>   store i16* %5, %bar
>>
>>   br loop
>>
>>
>>
>> getWidestType() will return i16. So the first cast is not free, but
>> neither is it a cast to i32, it's a cast to i16.
>>
>>
>>
>> I'm not exactly sure what Elena's query was; it looks like the
>> implementation here should be architecture-agnostic as it's just modelling
>> changes to the IR (truncate nodes disappear) and the rest is the TTI's
>> responsibility.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> James
>>
>>
>>
>> On Mon, 11 May 2015 at 12:17 Demikhovsky, Elena <
>> elena.demikhovsky at intel.com> wrote:
>>
>> +        if (Opcode == Instruction::Trunc) {
>>
>> +          if (TTI->isTypeLegal(DstVecTy)) {
>>
>> +            VecCost = 0;
>>
>> +          }
>>
>>
>>
>> On AVX-512 the “truncate” is usually one instruction, the VecCost should
>> be 1.
>>
>> On AVX the type may be legal, but “truncate” is more than one instruction.
>>
>>
>>
>> -          * Elena*
>>
>>
>>
>> *From:* llvm-commits-bounces at cs.uiuc.edu [mailto:
>> llvm-commits-bounces at cs.uiuc.edu] *On Behalf Of *Sam Parker
>> *Sent:* Monday, May 11, 2015 13:57
>> *To:* llvm-commits at cs.uiuc.edu
>> *Subject:* [PATCH] improve vectorizers by removing cost of unnecessary
>> truncs and exts.
>>
>>
>>
>> Hi,
>>
>>
>>
>> I’ve attached a patch to both the loop vectorizer and slp-vectorizer
>> which checks to see whether truncs and extensions would actually be
>> required if the code was vectorized. This is so that the vectorizers
>> understand that the cost of these instructions is effectively zero if
>> vectorization happens. This is helpful when working on smaller data types,
>> such as i8 and i16, that do not have native support in general purpose
>> registers, but are supported in vector register files.
>>
>>
>>
>> Regards,
>>
>> Sam
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150512/2dd74200/attachment.html>


More information about the llvm-commits mailing list