[cfe-dev] Disable integer promotion (Dilan Manatunga via cfe-dev)

Tue May 31 05:22:30 PDT 2016

Hi Norman,

The main impact of the implicit promotions that we observed was that
vectorisation was taking place using vectors of a wider type than was
necessary and hence used more of them.  For example:

char* __restrict dst;
const char* __restrict src1;
const char* __restrict src2;

for (int i = 0; i < 256; ++i)
  dst[i] = src1[i] + src2[i];

would vectorise using v4i32 vectors instead of v16i8 vectors, resulting in
4 times as many instructions and increased vector register pressure, as well
as unnecessary extend and truncate operations.
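For illustration only, the sketch below is portable C rather than target code, and its function names are hypothetical. `add_promoted` spells out what the promoted loop computes per element (widen to int, add, truncate). `add_narrow` instead packs four 8-bit lanes into one 32-bit word and adds them with the well-known SWAR (SIMD-within-a-register) trick, standing in for a v16i8 operation: each 32-bit add handles four elements instead of one. It uses `unsigned char` to avoid the implementation-defined conversion of out-of-range values to signed `char`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-element semantics of dst[i] = src1[i] + src2[i]: both operands are
   promoted to int, added at 32 bits, and the result truncated to 8 bits. */
static void add_promoted(unsigned char *restrict dst,
                         const unsigned char *restrict src1,
                         const unsigned char *restrict src2, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int wide = (int)src1[i] + (int)src2[i]; /* i32 add */
        dst[i] = (unsigned char)wide;           /* trunc to i8 */
    }
}

/* Adds four independent 8-bit lanes packed in a 32-bit word.  Bit 7 of
   every lane is cleared before the add so no carry can cross into the
   neighbouring lane; the correct bit 7 is then restored with XOR. */
static uint32_t add_4xi8(uint32_t x, uint32_t y) {
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* Narrow version: one 32-bit operation per four elements.  The lane
   operation is byte-local, so byte order does not affect the result. */
static void add_narrow(unsigned char *restrict dst,
                       const unsigned char *restrict src1,
                       const unsigned char *restrict src2, size_t n) {
    assert(n % 4 == 0); /* keeps the sketch free of a scalar tail loop */
    for (size_t i = 0; i < n; i += 4) {
        uint32_t x, y, r;
        memcpy(&x, src1 + i, sizeof x);
        memcpy(&y, src2 + i, sizeof y);
        r = add_4xi8(x, y);
        memcpy(dst + i, &r, sizeof r);
    }
}
```

Both functions produce identical bytes; for a 256-element loop, `add_narrow` issues a quarter as many adds as the widened form, the same factor of 4 described above for v16i8 versus v4i32.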

            MartinO

From: Norman Rink [mailto:norman.rink at tu-dresden.de] 
Sent: 30 May 2016 15:11
To: Martin J. O'Riordan; 'James Molloy'; 'David Majnemer'; 'Dilan Manatunga'
Cc: 'Clang Dev'
Subject: Re: [cfe-dev] Disable integer promotion (Dilan Manatunga via
cfe-dev)

Hi all,

I realize this is potentially only tangential to the ongoing discussion, but
does anyone have significant experience with how integer promotion interacts
with vectorization? When I looked into this interaction, I did not have the
time to conduct a careful analysis, but I have reason to believe that
integer promotion can get in the way of vectorization, thereby limiting its
benefits. Can anyone comment? Thanks.

Best,

Norman

From: "Martin J. O'Riordan" <martin.oriordan at movidius.com>
Organization: Movidius Ltd.
Date: Monday 30 May 2016 14:53
To: 'James Molloy' <james at jamesmolloy.co.uk>, 'David Majnemer'
<david.majnemer at gmail.com>, 'Dilan Manatunga' <manatunga at gmail.com>
Cc: 'Clang Dev' <cfe-dev at lists.llvm.org>, Norman Rink
<norman.rink at tu-dresden.de>
Subject: RE: [cfe-dev] Disable integer promotion (Dilan Manatunga via
cfe-dev)

Hi James, and thanks for pointing out the existence of this transformation;
we were quite unaware of it.

As it happens, I am highly allergic to re-invention and avoid it whenever
possible; the only reason an already overburdened team of 2 developers would
re-invent something is that they are unaware of an existing solution, which
is easy to be given the scope and complexity of LLVM.

So far as I can tell, truncateToMinimalBitwidths is always enabled, so it
is not a target-specific selection and our target should automatically reap
the rewards of this optimisation.  I certainly cannot find a switch to
enable or disable it.  But in fact we are not seeing anywhere near the
benefits we would expect.

void InnerLoopVectorizer::truncateToMinimalBitwidths() {
  // For every instruction `I` in MinBWs, truncate the operands, create a
  // truncated version of `I` and reextend its result. InstCombine runs
  // later and will remove any ext/trunc pairs.

This appears to run only on inner loops, and it appears to insert
narrowings/truncations and subsequent widenings/extensions into the IR
chains.

The DataLayout for our target includes -n8:16:32, so it should see the
benefits of optimisations that exploit multiple native integer widths.  We
also provide native support for both 32-bit and 128-bit SIMD.
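In an LLVM DataLayout string, the `n` component lists the target's native integer widths in bits, so `n8:16:32` declares 8-, 16- and 32-bit integers native. LLVM itself answers this through `DataLayout::isLegalInteger`; the toy C helper below (the name `is_native_int_width` is hypothetical, and this is not LLVM's parser) only illustrates what that component encodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy check: does `bits` appear in the native-integer-width component
   ("n<w1>:<w2>:...") of a DataLayout string such as "e-m:e-n8:16:32-S64"?
   Components are separated by '-'.  The "ni:..." component (non-integral
   address spaces) also starts with 'n' and is skipped. */
static bool is_native_int_width(const char *layout, unsigned long bits) {
    assert(layout != NULL);
    const char *p = layout;
    while (*p != '\0') {
        const char *end = strchr(p, '-');
        if (end == NULL)
            end = p + strlen(p);
        if (p[0] == 'n' && p + 1 < end && p[1] >= '0' && p[1] <= '9') {
            const char *q = p + 1;
            while (q < end) {
                char *next;
                unsigned long w = strtoul(q, &next, 10);
                if (next == q)
                    break;          /* malformed: no digits here */
                if (w == bits)
                    return true;
                q = next;
                if (q < end && *q == ':')
                    ++q;            /* step over the separator */
            }
        }
        p = (*end == '-') ? end + 1 : end;
    }
    return false;
}
```

For a layout containing `n8:16:32`, widths 8, 16 and 32 are reported native and 64 is not.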

The pass that we wrote is quite different.  It is run as a machine pass
prior to loop-unrolling and vectorisation, and instead of pre-truncating and
post-extending IR chains, it removes the existing pre-extensions and
post-truncations that bracket a sequence of IR operations if it can prove
that the outcome is the same.  The results are actually very good and match
our expectations for such a transformation, which makes me wonder why
truncateToMinimalBitwidths does not already produce comparable results.

Our observations are that with the new pass, a significant majority of
vectorised code showed some improvement, with speed-ups as high as 40x.  Of
the small number of tests that regressed in performance, adding a
#pragma clang loop unroll_count(N) eliminated the loss.  This could probably
be eliminated too by better tuning of the cost models.
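For reference, Clang spells this loop hint `#pragma clang loop unroll_count(N)`, placed immediately before the loop. A minimal sketch applied to a byte-add loop like the earlier example follows; the function name and the count of 4 are arbitrary choices for illustration, and the `__clang__` guard only keeps other compilers from warning about an unknown pragma.

```c
#include <assert.h>
#include <stddef.h>

static void add_bytes_unrolled(unsigned char *restrict dst,
                               const unsigned char *restrict src1,
                               const unsigned char *restrict src2, size_t n) {
    assert(n == 0 || (dst != NULL && src1 != NULL && src2 != NULL));
    /* Ask Clang to unroll the following loop by a factor of 4. */
#if defined(__clang__)
#pragma clang loop unroll_count(4)
#endif
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)(src1[i] + src2[i]);
}
```

The hint changes only how the loop is scheduled, not what it computes.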

The re-invention is inadvertent, but in any event our new pass appears to
provide considerable additional performance improvements that are not
currently happening with the stock LLVM transformations.

I will have to contrive some tests to see why truncateToMinimalBitwidths
is not already doing this, and if there is something that we have done wrong
in our target that is breaking it, I will happily revert to an existing
solution.

            MartinO

From: James Molloy [mailto:james at jamesmolloy.co.uk] 
Sent: 28 May 2016 19:58
To: Martin.ORiordan at movidius.com; David Majnemer; Dilan Manatunga
Cc: Clang Dev; Norman Rink
Subject: Re: [cfe-dev] Disable integer promotion (Dilan Manatunga via
cfe-dev)

Hi,

X86 has native support for i8 and i16. AArch64 and ARM have native i8 and
i16 vector operations that are lowered and analysed using
truncateToMinimalBitwidths in LoopVectorize. Similarly, for scalar code on
x86, truncation is done in InstCombine.

Why do you need to reinvent this?

Cheers,

James

On Sat, 28 May 2016 at 19:02, Martin J. O'Riordan via cfe-dev
<cfe-dev at lists.llvm.org> wrote:

Instead of suppressing the integer promotion rules, which are part of the
ISO C and C++ Standards, we wrote a new pass that analyses the IR to see
whether the input values and the output value are of an integer type
narrower than the promoted type used in the IR; if we can prove that the
outcome would be identical without the promotion, the pass reduces the IR
to use the narrower form.

In our case the motive was to enhance vectorisation, because our vector ALU
can work with 8-, 16- and 32-bit integers natively, but vXi8 vectors were
actually being promoted to multiple v4i32 vectors, requiring 4 times as many
instructions as necessary, or worse still, being fully scalarized.

This pass was presented by my colleague Stephen Rogers in a Lightning Talk
at the October 2015 LLVM Conference in San Jose, titled "Integer Vector
Optimizations and Usual Arithmetic Conversions".  I can't find the paper
or slides on the LLVM Meetings page; perhaps these are not archived for
Lightning Talks, but as they are not large I have attached them here.

This approach allowed us to gain the optimisations that are possible with
our architecture, which supports 8-, 16- and 32-bit native integer
computations (scalar and vector), while also respecting the ISO C and C++
Standards.  I am a lot more nervous about a front-end switch for this, as it
will lead to non-compliant programs, and in the presence of overloading and
template instantiation it could also lead to very different programs.  I
would recommend that we do not add a front-end switch which alters the
semantics of the language in this way.
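The overloading concern can be illustrated even in C: C11 `_Generic` selects an association based on the static type of its controlling expression, much as C++ overload resolution picks a function. Because both `int8_t` operands are promoted, `a + b` has type `int`, so any code keyed on that type would silently select something different if a front-end switch left the sum as an 8-bit type. The `TYPE_ID` macro and `type_id` enum are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Identifies the static type of an expression with C11 _Generic, which
   chooses an association by type, much as C++ overload resolution does. */
enum type_id { TYPE_SCHAR, TYPE_SHORT, TYPE_INT, TYPE_OTHER };

#define TYPE_ID(x) _Generic((x),   \
    signed char: TYPE_SCHAR,       \
    short: TYPE_SHORT,             \
    int: TYPE_INT,                 \
    default: TYPE_OTHER)

/* Integer promotion converts both int8_t operands to int, so the sum
   selects the int association.  The promoted sum of two int8_t values
   always lies in [-256, 254] and cannot overflow int. */
static enum type_id type_of_int8_sum(int8_t a, int8_t b) {
    assert(a + b >= -256 && a + b <= 254);
    return TYPE_ID(a + b);
}
```

An operand on its own is classified as `signed char`, but the sum of two of them, and likewise the sum of two `short` values, is classified as `int`.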

It is my intention to publish this pass if it is of general interest, and
since it is target independent there are no particular blocking issues for
me (patents, IP, etc.) in doing so.  I do have to catch up with the HEAD
revision to ensure that it still works correctly, but it was working
perfectly at SVN #262824, and it will be a month before I have enough time
to catch up with HEAD, as we are busy with a product release that takes
precedence.

All the best,

            MartinO

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of David
Majnemer via cfe-dev
Sent: 27 May 2016 19:55
To: Dilan Manatunga <manatunga at gmail.com>
Cc: clang developer list <cfe-dev at lists.llvm.org>; Norman Rink
<norman.rink at tu-dresden.de>; cfe-dev-request at lists.llvm.org
Subject: Re: [cfe-dev] Disable integer promotion (Dilan Manatunga via
cfe-dev)

You could set IntWidth to 16 or 8 in clang, not unlike what MSP430 does:

https://github.com/llvm-mirror/clang/blob/3317d0fa0bd1f5c5adc14bcc6adc2a38acc9064b/lib/Basic/Targets.cpp#L6823

On Fri, May 27, 2016 at 10:32 AM, Dilan Manatunga via cfe-dev
<cfe-dev at lists.llvm.org> wrote:

I need to disable this feature because I am researching architectures where
8-bit or 16-bit adds are preferred to 32-bit ones, so integer promotion
kinda mucks everything up. I was hoping there was a way in clang to disable
it, instead of having to implement an LLVM pass to coalesce unnecessary
promotions.

Thanks for catching the IR mistake; I should have double-checked that. This
should be the correct version:

int8_t a = 1;
int8_t b = 2;
int8_t c = a + b;

The LLVM IR will be:

%x = sext i8 %a to i32
%y = sext i8 %b to i32
%z = add nsw i32 %x, %y
%c = trunc i32 %z to i8

Instead, I would like it to simply compile to:

%c = add i8 %a, %b
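Narrowing this add is only sound if the 8-bit add is allowed to wrap. The i32 add can legitimately carry `nsw`, because two promoted `int8_t` values can never overflow 32 bits, but an `add nsw i8` would turn inputs such as 100 + 100 into poison instead of wrapping. The C sketch below checks this, assuming two's complement and the modulo behaviour GCC and Clang give out-of-range conversions to `int8_t` (implementation-defined in C); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What C specifies for `int8_t c = a + b;`: sext to int, an add that
   cannot overflow (hence `nsw` is valid at i32), then a truncating
   conversion back to int8_t (GCC and Clang wrap modulo 2^8). */
static int8_t add_as_specified(int8_t a, int8_t b) {
    int wide = a + b;
    return (int8_t)wide;
}

/* Model of a plain `add i8` (no nsw): two's-complement wraparound,
   computed on the unsigned bit patterns. */
static int8_t add_i8_wrap(int8_t a, int8_t b) {
    uint8_t bits = (uint8_t)((uint8_t)a + (uint8_t)b);
    return (int8_t)bits;
}

/* True if a + b overflows an 8-bit signed add, i.e. the case that an
   `nsw` flag on `add i8` would declare impossible. */
static bool add_i8_overflows(int8_t a, int8_t b) {
    int wide = a + b;
    return wide < INT8_MIN || wide > INT8_MAX;
}

/* Exhaustively confirms the two forms agree for every int8_t pair. */
static void check_all_pairs(void) {
    for (int a = INT8_MIN; a <= INT8_MAX; ++a)
        for (int b = INT8_MIN; b <= INT8_MAX; ++b)
            assert(add_as_specified((int8_t)a, (int8_t)b) ==
                   add_i8_wrap((int8_t)a, (int8_t)b));
}
```

The two forms agree everywhere, and 100 + 100 wraps to -56, which is exactly an input that `nsw` at i8 would rule out.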

-Dilan

On Fri, May 27, 2016 at 5:30 AM Norman Rink via cfe-dev
<cfe-dev at lists.llvm.org> wrote:

Hi Dilan,

I would like to second your request for an option to disable integer
promotion. What do you need it for?

As far as I am aware, there is no such option, and the code that implements
integer promotion is somewhat scattered across "SemaExpr.cpp".

Also, I think your example code snippet contains a few "i32"s too many. It
will be clearer to people what you are looking for if your code example is
consistent with your question.

Best,

Norman

>Message: 1
>Date: Fri, 27 May 2016 01:50:12 +0000
>From: Dilan Manatunga via cfe-dev <cfe-dev at lists.llvm.org>
>To: cfe-dev at lists.llvm.org
>Subject: [cfe-dev] Disable integer promotion
>Message-ID:
>       <CAHpgGu4=jFC9ohQQZZMp2NMG3Hw0sE5U4-Lqrgb+6gcXv9SEtQ at mail.gmail.com>
>Content-Type: text/plain; charset="utf-8"
>
>Is there a way to disable integer promotion when performing math
>operations? For example, when compiling a statement such as this:
>int8_t a  = 1;
>int8_t b = 2;
>int8_t c = a + b;
>
>The LLVM IR will be:
>%x = sext i32 %a to i32
>%y = sext i32 %b to i32
>%z = add nsw i32 %x, %y
>%c = trunc i32 %z to i16
>
>Instead, it would simply compile to:
>$c = add nsw i32 %z, $y
>
>-Dilan Manatunga

_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
