[cfe-dev] Disable integer promotion (Dilan Manatunga via cfe-dev)

Nemanja Ivanovic via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 1 10:57:40 PDT 2016


Martin,
I am interested to know as well. Perhaps it is just that your target's
TargetLowering constructor has a call to addRegisterClass() for that value
type, thereby making it a legal type. Looking through the code, it appears
this is the mechanism for TTI to enquire about the legality of a type.
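
To make that relationship concrete, here is a minimal sketch of how registering a register class can be what makes a type legal. It is a toy model and not LLVM's code: ValueType, RegisterClass, ToyTargetLowering and makeToyTarget() are made-up names for illustration, and only the method names addRegisterClass() and isTypeLegal() echo the TargetLowering interface (the real signatures differ).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy stand-ins, NOT LLVM's real types: a few value types and an opaque
// register-class record.
enum class ValueType : std::size_t { i8, i16, i32, v16i8, v4i32, Count };
struct RegisterClass { const char *Name; };

class ToyTargetLowering {
  // One slot per value type; a null slot means "no register class".
  std::array<const RegisterClass *, static_cast<std::size_t>(ValueType::Count)>
      RegClassForVT{};

public:
  // Echoes the role of addRegisterClass(): tie a value type to a class.
  void addRegisterClass(ValueType VT, const RegisterClass *RC) {
    RegClassForVT[static_cast<std::size_t>(VT)] = RC;
  }

  // A type counts as legal once some register class was registered for it.
  bool isTypeLegal(ValueType VT) const {
    return RegClassForVT[static_cast<std::size_t>(VT)] != nullptr;
  }
};

// Hypothetical target: 32-bit scalar registers plus 128-bit vector registers.
inline ToyTargetLowering makeToyTarget() {
  static const RegisterClass GPR32{"GPR32"}, VR128{"VR128"};
  ToyTargetLowering TL;
  TL.addRegisterClass(ValueType::i32, &GPR32);
  TL.addRegisterClass(ValueType::v16i8, &VR128);
  TL.addRegisterClass(ValueType::v4i32, &VR128);
  return TL;
}

inline void checkToyTarget() {
  ToyTargetLowering TL = makeToyTarget();
  assert(TL.isTypeLegal(ValueType::v16i8)); // registered, so legal
  assert(!TL.isTypeLegal(ValueType::i8));   // never registered, so illegal
}
```

In this model a scalar i8 is illegal while v16i8 is legal, simply because of which calls the (hypothetical) constructor made.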

Nemanja

On Tue, May 31, 2016 at 2:49 PM, Martin J. O'Riordan via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> No rush on an answer, James; I won’t get a chance to follow up on this for
> a couple of weeks anyway, so enjoy your vacation.
>
>
>
> I think that we have the TTI correct, but there are other problems.  For
> instance ‘getNumberRegisters(true)’ is awkward because we have both
> 32-bit SIMD and 128-bit SIMD registers.  Similarly ‘
> getRegisterBitWidth(true)’.  We just return ‘32’ and ‘128’ respectively,
> because the TTI interface does not allow us to discriminate for 32-bit
> versus 128-bit vectors.
>
>
>
> The other hooks are for costs, and for the most part they look reasonable,
> though I delegate to the ‘BasicTTIImpl’ implementation for the
> interleaved memory cost because I have not yet measured the impact of
> changing this.  Is there any particular cost hook that is more likely than
> another to influence ‘truncateToMinimalBitwidths’?
>
>
>
> Regarding:
>
>
>
> Both of these should be kicking in already if your target advertises i8 as
> being legal for scalars or vectors
>
>
>
> I am only aware of the DataLayout ‘-n8:16:32’ for this; is there an
> equivalent for vectors?  I also have ‘-v16:16-v32:32-v128:64’, but these
> only deal with the aggregate size of the vector and not its element type.
> Am I missing a hook in the TTI or STI perhaps?
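
For reference, the "n" component of a data layout string lists the native integer widths, and it says nothing about vector element types. The following is a minimal sketch of reading that component. It is a toy parser written for illustration, not LLVM's DataLayout class, and nativeIntegerWidths() and isNativeWidth() are hypothetical helper names.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Collect the widths listed in the "n<w1>:<w2>:..." component of a
// '-'-separated data layout string. Toy parser for illustration only.
inline std::vector<unsigned> nativeIntegerWidths(const std::string &Layout) {
  std::vector<unsigned> Widths;
  std::istringstream Specs(Layout);
  std::string Spec;
  while (std::getline(Specs, Spec, '-')) {
    // Only components of the form 'n' followed by a digit are of interest.
    if (Spec.size() < 2 || Spec[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(Spec[1])))
      continue;
    std::istringstream Fields(Spec.substr(1));
    std::string Field;
    while (std::getline(Fields, Field, ':'))
      if (!Field.empty())
        Widths.push_back(static_cast<unsigned>(std::stoul(Field)));
  }
  return Widths;
}

// True if Width appears in the native integer width list.
inline bool isNativeWidth(const std::string &Layout, unsigned Width) {
  for (unsigned W : nativeIntegerWidths(Layout))
    if (W == Width)
      return true;
  return false;
}

inline void checkNativeWidths() {
  const std::string Layout = "n8:16:32-v16:16-v32:32-v128:64";
  assert(isNativeWidth(Layout, 8));
  assert(!isNativeWidth(Layout, 128)); // v128 is a vector alignment entry
}
```

With the fragments quoted above, the sketch reports 8, 16 and 32 as native widths; the "v" entries only contribute alignment information and are skipped.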
>
>
>
> Thanks,
>
>
>
>             MartinO
>
>
>
> *From:* James Molloy [mailto:james at jamesmolloy.co.uk]
> *Sent:* 30 May 2016 21:06
> *To:* Norman Rink; Martin J. O'Riordan; David Majnemer; Dilan Manatunga
> *Cc:* Clang Dev
>
> *Subject:* Re: [cfe-dev] Disable integer promotion (Dilan Manatunga via
> cfe-dev)
>
>
>
> Hi,
>
> I'm on vacation at the moment with only a phone to reply on but...
>
> TruncateToMinimalBitwidths is, as you point out, only for vectorisation.
> There are three cases:
>
> 1. i8 and i16 are never supported on the target.
> 2. i8 and i16 are supported for vectors but not for scalars (ARM).
> 3. i8 and i16 are supported for scalars and vectors (x86).
>
> (3) is handled by simplifyDemandedBits in the instruction combiner, so it
> will work on scalars and vectors but will only ever truncate promotions if
> the smaller integer operation is valid on the target.
>
> (2) is handled by truncateToMinimalBitwidths where we need to, as part of
> the vectorisation profitability analysis, determine what the loop will look
> like after vectorisation. We insert trunc and ext nodes simply as a
> shortcut - we could elide the promotions as you do, but there are corner
> cases that make it a bit awkward so we just add more casts and let a
> cleanup pass (instcombine) remove them intelligently.
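
The arithmetic fact that makes this kind of truncation safe can be sketched as follows. This is an illustration of the underlying argument only, not of the LLVM implementation: for add, sub and mul, the low 8 bits of the result depend only on the low 8 bits of the operands, so truncating after every step gives the same answer as computing in 32 bits and truncating once. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: extend the operands to 32 bits, compute, truncate the result.
inline std::uint8_t wideThenTrunc(std::uint8_t A, std::uint8_t B,
                                  std::uint8_t C) {
  std::uint32_t X = A, Y = B, Z = C;                 // zext i8 -> i32
  return static_cast<std::uint8_t>((X + Y) * Z - Y); // trunc i32 -> i8
}

// Narrow form: every intermediate value is truncated back to 8 bits.
inline std::uint8_t narrowEachStep(std::uint8_t A, std::uint8_t B,
                                   std::uint8_t C) {
  std::uint8_t Sum = static_cast<std::uint8_t>(A + B);
  std::uint8_t Prod = static_cast<std::uint8_t>(Sum * C);
  return static_cast<std::uint8_t>(Prod - B);
}

// Exhaustively compare both forms over all 8-bit inputs.
inline void checkNarrowingIsExact() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C)
        assert(wideThenTrunc(static_cast<std::uint8_t>(A),
                             static_cast<std::uint8_t>(B),
                             static_cast<std::uint8_t>(C)) ==
               narrowEachStep(static_cast<std::uint8_t>(A),
                              static_cast<std::uint8_t>(B),
                              static_cast<std::uint8_t>(C)));
}
```

The check uses unsigned arithmetic so that wrap-around is well defined; operations such as right shifts, division and comparisons do not have this property.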
>
> I hope this answers your queries a bit more? Both of these should be
> kicking in already if your target advertises i8 as being legal for scalars
> or vectors, so I would check your target transform info to ensure the
> legality hook is returning true when it should.
>
> Cheers
>
> James
>
> On Mon, 30 May 2016 at 16:11, Norman Rink <norman.rink at tu-dresden.de>
> wrote:
>
> Hi all,
>
>
>
> I realize this is potentially only tangential to the ongoing discussion, but
> does anyone have significant experience with how integer promotion
> interacts with vectorization? When I looked into this interaction, I did
> not have the time to conduct a careful analysis, but I have reason to
> believe that integer promotion can get in the way of vectorization, thereby
> limiting its benefits. Can anyone comment? Thanks.
>
>
>
> Best,
>
>
>
> Norman
>
>
>
>
>
> *From: *"Martin J. O'Riordan" <martin.oriordan at movidius.com>
> *Organization: *Movidius Ltd.
> *Date: *Monday 30 May 2016 14:53
> *To: *'James Molloy' <james at jamesmolloy.co.uk>, 'David Majnemer' <
> david.majnemer at gmail.com>, 'Dilan Manatunga' <manatunga at gmail.com>
> *Cc: *'Clang Dev' <cfe-dev at lists.llvm.org>, Norman Rink <
> norman.rink at tu-dresden.de>
> *Subject: *RE: [cfe-dev] Disable integer promotion (Dilan Manatunga via
> cfe-dev)
>
>
>
> Hi James and thanks for pointing out the existence of this transformation,
> we were quite unaware of it.
>
>
>
> As it happens, I am highly allergic to re-invention and avoid doing so
> whenever possible; the only reason an already overburdened team of 2
> developers will re-invent is because they are unaware of an existing
> solution, which is not difficult given the scope and complexity of LLVM.
>
>
>
> So far as I can tell, ‘truncateToMinimalBitwidths’ is always enabled, so
> it is not a target specific selection and our target should automatically
> reap the rewards of this optimisation pass.  I certainly cannot find a
> switch to enable or disable it.  But in fact we are not seeing anywhere
> near the benefits we would expect.
>
>
>
> void InnerLoopVectorizer::truncateToMinimalBitwidths() {
>   // For every instruction `I` in MinBWs, truncate the operands, create a
>   // truncated version of `I` and reextend its result. InstCombine runs
>   // later and will remove any ext/trunc pairs.
>
>
>
> This appears to run only on inner loops, and it appears to insert
> narrowings/truncations and subsequent widenings/extendings into the IR
> chains.
>
>
>
> The DataLayout for our target includes “-n8:16:32”, so it should see the
> benefits of optimisations for multiple native integer support.  We also
> provide both 32-bit SIMD and 128-bit SIMD native support.
>
>
>
> The pass that we wrote is quite different.  It is run as a machine pass
> prior to loop-unrolling and vectorisation, and instead of pre-truncating
> and post-extending IR chains, it removes the existing pre-extending and
> post-truncating that brackets a sequence of IR operations if it can prove
> that the outcome is the same.  The results are actually very good and match
> what our expectations are from such a transformation, which makes me wonder
> “why does ‘truncateToMinimalBitwidths’ not already produce comparable
> results?”.
>
>
>
> Our observations are that with the new pass, a significant majority of
> vectorised code showed some improvement, with results as high as 40X faster
> than without.  Of the small number of tests that regressed in performance,
> adding a ‘#pragma clang loop unroll_count(N)’ eliminated the loss.  This
> could probably be eliminated too by better tuning of the cost-models.
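
As a reminder of what such a hint looks like in source, here is a minimal sketch. Clang's documented spelling is "#pragma clang loop unroll_count(N)", placed immediately before the loop. The function addBytes() and the count of 4 are arbitrary choices for illustration, not taken from the tests mentioned above; compilers that do not recognise the pragma are expected to ignore it, possibly with a warning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element-wise 8-bit addition with wrap-around, plus an unroll hint.
inline void addBytes(const std::uint8_t *A, const std::uint8_t *B,
                     std::uint8_t *Out, std::size_t N) {
#pragma clang loop unroll_count(4)
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = static_cast<std::uint8_t>(A[I] + B[I]); // sum is formed in int
}

inline void checkAddBytes() {
  const std::uint8_t A[3] = {1, 250, 7};
  const std::uint8_t B[3] = {2, 10, 8};
  std::uint8_t Out[3] = {};
  addBytes(A, B, Out, 3);
  assert(Out[0] == 3 && Out[1] == 4 && Out[2] == 15); // 260 wraps to 4
}
```

The pragma only changes how the loop is unrolled; the results are the same with or without it.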
>
>
>
> The re-invention is inadvertent, but in any event our new pass appears to
> provide considerable additional performance improvements that are not
> currently happening with the stock LLVM transformations.
>
>
>
> I will have to contrive some tests to see why ‘truncateToMinimalBitwidths’
> is not already doing this, and if there is something that we have done
> wrong in our target that is breaking it, I will happily revert to an
> existing solution.
>
>
>
>             MartinO
>
>
>
> *From:* James Molloy [mailto:james at jamesmolloy.co.uk
> <james at jamesmolloy.co.uk>]
> *Sent:* 28 May 2016 19:58
> *To:* Martin.ORiordan at movidius.com; David Majnemer; Dilan Manatunga
> *Cc:* Clang Dev; Norman Rink
> *Subject:* Re: [cfe-dev] Disable integer promotion (Dilan Manatunga via
> cfe-dev)
>
>
>
> Hi,
>
> X86 has native support for i8 and i16. AArch64 and ARM have native i8 and
> i16 vector operations that are lowered and analysed using
> truncateToMinimalBitwidths in LoopVectorize. Similarly, for scalar code on
> x86, truncation is done in instcombine.
>
> Why do you need to reinvent this?
>
> Cheers,
>
> James
>
> On Sat, 28 May 2016 at 19:02, Martin J. O'Riordan via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Instead of suppressing the integer promotion rules which are part of the
> ISO C/C++ Standards, we wrote a new pass that analyses the IR to see if the
> input values and output value were of an integer type that was narrower
> than the promoted types used in the IR, and if we could prove that the
> outcome would be identical if the type was unpromoted, then we reduced the
> IR to use the narrower form.
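
The proof obligation matters because narrowing is not always safe. Here is a minimal sketch of a case where it is not, using made-up function names: an 8-bit average in which the promoted sum keeps the carry out of bit 7, while a naively narrowed sum loses it before the shift.

```cpp
#include <cassert>
#include <cstdint>

// Promoted form, as the C and C++ rules require: the sum is formed in int,
// so the carry out of bit 7 survives and is shifted into the result.
inline std::uint8_t averagePromoted(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((A + B) >> 1);
}

// Naively narrowed form: the sum is truncated to 8 bits before the shift,
// which discards the carry.
inline std::uint8_t averageNarrowed(std::uint8_t A, std::uint8_t B) {
  std::uint8_t Sum = static_cast<std::uint8_t>(A + B);
  return static_cast<std::uint8_t>(Sum >> 1);
}

inline void checkAverage() {
  // No carry: both forms agree.
  assert(averagePromoted(20, 30) == averageNarrowed(20, 30));
  // Carry out of bit 7: 300 >> 1 is 150, but (300 mod 256) >> 1 is 22.
  assert(averagePromoted(200, 100) == 150);
  assert(averageNarrowed(200, 100) == 22);
}
```

A pass that removes the extensions therefore has to show either that no such carry can occur or that the discarded high bits never reach the result.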
>
>
>
> In our case the motive was to enhance vectorisation because our vector ALU
> can work with 8-, 16- and 32-bit integers natively, and ‘vXi8’ vectors
> were actually being promoted to multiple ‘v4i32’ vectors, requiring 4 times
> as many instructions as were necessary, or worse still, being fully
> scalarized.
>
>
>
> This pass was presented by my colleague Stephen Rogers in a “Lightning
> Talk” at the October 2015 LLVM Conference in San Jose and titled “Integer
> Vector Optimizations and “Usual Arithmetic Conversions””.  I can’t find
> the paper or slides on the LLVM Meetings page, perhaps these are not
> archived for Lightning Talks (?), but as they are not large I have attached
> them here.
>
>
>
> This approach allowed us to gain the optimisations that are possible with
> our architecture which supports 8-, 16- and 32-bit native integer
> computations (scalar and vector), while also respecting the ISO C and C++
> Standards.  I am a lot more nervous of a front-end switch for this, as it
> will lead to non-compliant programs, and in the presence of overloading and
> template-instantiation it could also lead to very different programs, and
> would recommend that we do not add a front-end switch which alters the
> semantics of the language in this way.
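
The overloading concern can be sketched with a small example. The names Picked, which(), selectedForSum() and selectedForOperand() are hypothetical; the point is only that the type of "a + b" is int under the standard rules, so a switch that disabled promotion would select a different overload for the same source.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum class Picked { Int, Int8 };

// Two overloads that differ only in the integer type they accept.
inline Picked which(int) { return Picked::Int; }
inline Picked which(std::int8_t) { return Picked::Int8; }

// The sum of two int8_t values has type int, so which(int) is selected.
inline Picked selectedForSum() {
  std::int8_t A = 1, B = 2;
  static_assert(std::is_same<decltype(A + B), int>::value,
                "operands of + undergo integer promotion");
  return which(A + B);
}

// A bare int8_t operand is an exact match for which(std::int8_t).
inline Picked selectedForOperand() {
  std::int8_t A = 1;
  return which(A);
}

inline void checkOverloads() {
  assert(selectedForSum() == Picked::Int);
  assert(selectedForOperand() == Picked::Int8);
}
```

Without promotion, selectedForSum() would presumably pick the int8_t overload instead, which is the kind of silent behavioural change described above.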
>
>
>
> It is my intention to publish this pass if it is of general interest, and
> since it is target independent there are no particular blocking issues for
> me (Patents, IP, etc.) to doing so.  I do have to catch-up on the HEAD
> revision to ensure that it still works correctly, but it was working
> perfectly at SVN #262824 and it will be a month before I have enough time
> to catch up on the HEAD revision as we are busy with a product release that
> takes precedence.
>
>
>
> All the best,
>
>
>
>             MartinO
>
>
>
> *From:* cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] *On Behalf Of *David
> Majnemer via cfe-dev
> *Sent:* 27 May 2016 19:55
> *To:* Dilan Manatunga <manatunga at gmail.com>
> *Cc:* clang developer list <cfe-dev at lists.llvm.org>; Norman Rink <
> norman.rink at tu-dresden.de>; cfe-dev-request at lists.llvm.org
> *Subject:* Re: [cfe-dev] Disable integer promotion (Dilan Manatunga via
> cfe-dev)
>
>
>
> You could set IntWidth to 16 or 8 in clang, not unlike what MSP430 does:
>
>
> https://github.com/llvm-mirror/clang/blob/3317d0fa0bd1f5c5adc14bcc6adc2a38acc9064b/lib/Basic/Targets.cpp#L6823
>
>
>
> On Fri, May 27, 2016 at 10:32 AM, Dilan Manatunga via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> I need to disable this feature because I am researching architectures where
> 8-bit or 16-bit adds are preferred to 32-bit. So, integer promotion kinda
> mucks everything up. I was hoping there was a way in clang to disable it,
> instead of having to implement an LLVM pass to coalesce unnecessary
> promotions.
>
>
>
> Thanks for catching the IR mistake. Should have double checked that. This
> should be the correct version:
>
> int8_t a = 1;
> int8_t b = 2;
> int8_t c = a + b;
>
>
>
> The LLVM IR will be:
>
> %x = sext i8 %a to i32
>
> %y = sext i8 %b to i32
>
> %z = add nsw i32 %x, %y
>
> %c = trunc i32 %z to i8
>
>
>
> Instead, I would like it to compile simply to:
>
> %c = add nsw i8 %a, %b
>
>
>
> -Dilan
>
>
>
>
>
> On Fri, May 27, 2016 at 5:30 AM Norman Rink via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Hi Dilan,
>
> I would like to second your request for an option to disable integer
> promotion. What do you need it for?
>
> As far as I am aware, there is no such option and the code that implements
> integer promotion is somewhat scattered across “SemaExpr.cpp”.
>
> Also, I think your example code snippet contains a few “i32”s too many. It
> will be clearer to people what you are looking for if your code example is
> consistent with your question.
>
> Best,
>
> Norman
>
>
> >Message: 1
> >Date: Fri, 27 May 2016 01:50:12 +0000
> >From: Dilan Manatunga via cfe-dev <cfe-dev at lists.llvm.org>
> >To: cfe-dev at lists.llvm.org
> >Subject: [cfe-dev] Disable integer promotion
> >Message-ID:
> >       <CAHpgGu4=jFC9ohQQZZMp2NMG3Hw0sE5U4-Lqrgb+6gcXv9SEtQ at mail.gmail.com>
> >Content-Type: text/plain; charset="utf-8"
> >
> >Is there a way to disable integer promotion when performing math
> >operations? For example, when compiling a statement such as this:
> >int8_t a = 1;
> >int8_t b = 2;
> >int8_t c = a + b;
> >
> >The LLVM IR will be:
> >%x = sext i32 %a to i32
> >%y = sext i32 %b to i32
> >%z = add nsw i32 %x, %y
> >%c = trunc i32 %z to i16
> >
> >Instead, it would simply compile to:
> >$c = add nsw i32 %z, $y
> >
> >-Dilan Manatunga
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev