[llvm-dev] Handling native i16 types in clang and opt
Alex Susu via llvm-dev
llvm-dev at lists.llvm.org
Wed Jul 25 08:12:19 PDT 2018
Hello.
I'm coming back to this older thread.
I'd also like to thank Peter Lawrence for the insightful answer (see his email below,
if interested). I would also like to add that the C11 standard, Section 6.3.1.1, talks
about integer promotions, which explains why the C language requires short arithmetic
to be promoted to the size of int. See also
https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules .
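As a minimal, self-contained illustration of the rule (similar in spirit to the
examples on the pages linked above and below; the variable names are mine):

#include <stdio.h>

int main(void) {
    short a = 1000, b = 2000;
    /* Per C11 6.3.1.1p2, both operands of '+' are promoted to int before the
       addition, so the expression a + b has type int; sizeof therefore reports
       the size of int (typically 4), not the size of short. */
    printf("sizeof(a) = %zu, sizeof(a + b) = %zu\n", sizeof(a), sizeof(a + b));
    return 0;
}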
To give an answer to Craig Topper: I do have a simple and quite interesting case
where these promotions happen - the Floyd-Warshall algorithm, in the program below
(also try the example at https://www.geeksforgeeks.org/integer-promotions-in-c/).
In all cases, pass -O0 to clang so that it emits unoptimized LLVM IR.
#define SIZE 256

short path[SIZE][SIZE];

void FloydWarshall() {
    int i, j, k;

    for (k = 0; k < SIZE; k++) {
        for (i = 0; i < SIZE; ++i) {
            short pik = path[i][k];
            for (j = 0; j < SIZE; j++) {
                path[i][j] = path[i][j] < pik + path[k][j] ?
                                 path[i][j] : pik + path[k][j];
            }
        }
    }
}
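(For completeness: the unoptimized IR below can be reproduced with something along
the lines of

    clang -O0 -S -emit-llvm FloydWarshall.c -o FloydWarshall.ll

where the file names are just examples. Depending on the LLVM version, adding
-Xclang -disable-O0-optnone may also be needed if you later want to run individual
opt passes, e.g. -instcombine, on the -O0 output.)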
The body of the innermost loop is translated to the following unoptimized LLVM IR
code - see the lines marked with the comment "IMPORTANT":
for.body8:                                        ; preds = %for.cond6
  %6 = load i32, i32* %j, align 4
  %idxprom9 = sext i32 %6 to i64
  %7 = load i32, i32* %i, align 4
  %idxprom10 = sext i32 %7 to i64
  %arrayidx11 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom10
  %arrayidx12 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx11, i64 0, i64 %idxprom9
  %8 = load i16, i16* %arrayidx12
  %conv = sext i16 %8 to i32                      ; IMPORTANT
  %9 = load i16, i16* %pik
  %conv13 = sext i16 %9 to i32                    ; IMPORTANT
  %10 = load i32, i32* %j, align 4
  %idxprom14 = sext i32 %10 to i64
  %11 = load i32, i32* %k, align 4
  %idxprom15 = sext i32 %11 to i64
  %arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom15
  %arrayidx17 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx16, i64 0, i64 %idxprom14
  %12 = load i16, i16* %arrayidx17, align 2, !dbg !61
  %conv18 = sext i16 %12 to i32
  %add = add nsw i32 %conv13, %conv18
  ; IMPORTANT - the i16 equivalent would be: %add = add nsw i16 %9, %12
  %cmp19 = icmp slt i32 %conv, %add
  ; IMPORTANT - the i16 equivalent would be: %cmp19 = icmp slt i16 %8, %add
  br i1 %cmp19, label %cond.true, label %cond.false
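As a side note, the sdiv case from the older messages below can be reproduced with
a much smaller function - a minimal sketch (the function name is just for
illustration):

/* At -O0 clang emits sext i16 -> i32, sdiv i32, trunc i32 -> i16 for this
   division and, as reported in my original message below, instcombine keeps
   the i32 sdiv, unlike the add/mul/icmp cases above. Narrowing a signed
   division is not always safe anyway: (short)-32768 / (short)-1 is 32768,
   which fits in int but not in a 16-bit result. */
short sdiv16(short a, short b) {
    return a / b;
}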
Best regards,
Alex
On 5/21/2017 11:40 AM, Craig Topper wrote:
> Do you have a simple test case you can send? I'm having trouble replicating this on
> x86-64 with the simplest possible test.
>
> unsigned short foo(unsigned short a, unsigned short b) {
> return a + b;
> }
>
> This gives IR with no mention of i32. Maybe there's something misconfigured for your
> target, or I need a more complex test case.
>
> ~Craig
On 5/31/2017 11:04 PM, Peter Lawrence via llvm-dev wrote:
> Alex,
> The C language requires “short” arithmetic to be promoted to the size
> of “int”, hence the conversions to “int” and later the optimizations back to “short”
> but only when the optimizer can prove that the result will be the same.
>
> If your machine has only 16-bit registers and arithmetic, then you should
> change clang. There won’t be any conversions in the IR (but there are
> a variety of problems with LLVM’s optimizations that you will run into!).
>
> If your machine has both 16-bit and 32-bit registers and arithmetic, then
> you probably must leave clang alone. I am inclined to read your email
> as implying this is the case for you.
>
> Do you really need signed div and rem? Usually people don’t need the
> quirky results of signed div and rem (in fact, more often than not they
> need results consistent with two’s-complement shifts and masks).
>
> If unsigned is OK then IC should (?) transform unsigned 32-bit div
> and rem of unsigned short into 16-bit unsigned div and rem. (Can someone
> verify / confirm that I’m thinking correctly here?)
>
>
> The only thing I can think of off the top of my head for getting 16-bit sdiv
> and srem instructions emitted on a 32-bit machine is inline-asm?
>
>
> BTW, IIRC sdiv and srem also inhibit vectorization to 16-bit SIMD
> instructions for the same reason (similarly, shifts become undef for different
> shift amounts in 16-bit). I wonder what work-arounds folks use in
> this context; perhaps someone else on this list can chime in?
>
>
> -Peter Lawrence.
>
>
> On Sun, May 21, 2017 at 1:22 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org
> <mailto:llvm-dev at lists.llvm.org>> wrote:
>
> Hello.
> My target architecture natively supports 16-bit integers (i16).
>
> Whenever I write C programs using only short types, clang compiles the program
> to LLVM IR and converts the i16 data to i32 to perform arithmetic operations, then
> truncates the results back to i16. The InstructionCombining (INSTCOMBINE or IC) pass
> then removes these conversions back and forth from i16, except for the (s)div LLVM IR
> operation.
>
> Is there a way to avoid these conversions made by clang back and forth between i16
> and i32, if my source program uses only short types?
> Otherwise, how can I make the IC pass handle sdiv the way it handles add (sub) and
> mul? (That is, if the input operands are i16, the add/mul operation will eventually be
> i16, with any unnecessary conversions back and forth from i32 removed.)
>
> Thank you,
> Alex