[llvm-dev] Handling native i16 types in clang and opt
Alex Susu via llvm-dev
llvm-dev at lists.llvm.org
Wed Jul 25 08:12:19 PDT 2018
Hello.
I'm coming back to this older thread.
I'd also like to thank Peter Lawrence for the insightful answer (see his email below,
if interested). I would also like to add that the C11 standard, Section 6.3.1.1, talks
about integer promotions, which explains why the C language requires short arithmetic
to be promoted to the size of int. See also
https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules .
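As a minimal, self-contained illustration of the rule (similar in spirit to the
examples on the pages linked above and below; the variable names are mine):

#include <stdio.h>

int main(void) {
    short a = 1000, b = 2000;
    /* Per C11 6.3.1.1p2, both operands of '+' are promoted to int before the
       addition, so the expression a + b has type int; sizeof therefore reports
       the size of int (typically 4), not the size of short. */
    printf("sizeof(a) = %zu, sizeof(a + b) = %zu\n", sizeof(a), sizeof(a + b));
    return 0;
}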
To give an answer to Craig Topper: I do have a simple and quite interesting case
where these promotions happen - the Floyd-Warshall algorithm, in the program below
(also try the example at https://www.geeksforgeeks.org/integer-promotions-in-c/).
In all cases, pass -O0 to clang so that it emits unoptimized LLVM IR.
#define SIZE 256

short path[SIZE][SIZE];

void FloydWarshall() {
    int i, j, k;

    for (k = 0; k < SIZE; k++) {
        for (i = 0; i < SIZE; ++i) {
            short pik = path[i][k];
            for (j = 0; j < SIZE; j++) {
                path[i][j] = path[i][j] < pik + path[k][j] ?
                                 path[i][j] : pik + path[k][j];
            }
        }
    }
}
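(For completeness: the unoptimized IR below can be reproduced with something along
the lines of

    clang -O0 -S -emit-llvm FloydWarshall.c -o FloydWarshall.ll

where the file names are just examples. Depending on the LLVM version, adding
-Xclang -disable-O0-optnone may also be needed if you later want to run individual
opt passes, e.g. -instcombine, on the -O0 output.)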
The body of the innermost loop is translated to the following unoptimized LLVM IR
code - see the lines marked with the comment "IMPORTANT":
for.body8:                                        ; preds = %for.cond6
  %6 = load i32, i32* %j, align 4
  %idxprom9 = sext i32 %6 to i64
  %7 = load i32, i32* %i, align 4
  %idxprom10 = sext i32 %7 to i64
  %arrayidx11 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom10
  %arrayidx12 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx11, i64 0, i64 %idxprom9
  %8 = load i16, i16* %arrayidx12
  %conv = sext i16 %8 to i32                      ; IMPORTANT
  %9 = load i16, i16* %pik
  %conv13 = sext i16 %9 to i32                    ; IMPORTANT
  %10 = load i32, i32* %j, align 4
  %idxprom14 = sext i32 %10 to i64
  %11 = load i32, i32* %k, align 4
  %idxprom15 = sext i32 %11 to i64
  %arrayidx16 = getelementptr inbounds [256 x [256 x i16]], [256 x [256 x i16]]* @path, i64 0, i64 %idxprom15
  %arrayidx17 = getelementptr inbounds [256 x i16], [256 x i16]* %arrayidx16, i64 0, i64 %idxprom14
  %12 = load i16, i16* %arrayidx17, align 2, !dbg !61
  %conv18 = sext i16 %12 to i32
  %add = add nsw i32 %conv13, %conv18
  ; IMPORTANT - the i16 equivalent would be: %add = add nsw i16 %9, %12
  %cmp19 = icmp slt i32 %conv, %add
  ; IMPORTANT - the i16 equivalent would be: %cmp19 = icmp slt i16 %8, %add
  br i1 %cmp19, label %cond.true, label %cond.false
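As a side note, the sdiv case from the older messages below can be reproduced with
a much smaller function - a minimal sketch (the function name is just for
illustration):

/* At -O0 clang emits sext i16 -> i32, sdiv i32, trunc i32 -> i16 for this
   division and, as reported in my original message below, instcombine keeps
   the i32 sdiv, unlike the add/mul/icmp cases above. Narrowing a signed
   division is not always safe anyway: (short)-32768 / (short)-1 is 32768,
   which fits in int but not in a 16-bit result. */
short sdiv16(short a, short b) {
    return a / b;
}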
Best regards,
Alex
On 5/21/2017 11:40 AM, Craig Topper wrote:
> Do you have a simple test case you can send? I'm having trouble replicating this on
> x86-64 with the simplest possible test.
>
> unsigned short foo(unsigned short a, unsigned short b) {
> return a + b;
> }
>
> This gives IR with no mention of i32. Maybe there's something misconfigured for your
> target, or I need a more complex test case.
>
> ~Craig
On 5/31/2017 11:04 PM, Peter Lawrence via llvm-dev wrote:
> Alex,
> The C language requires “short” arithmetic to be promoted to the size
> of “int”, hence the conversions to “int” and later the optimizations back to “short”
> but only when the optimizer can prove that the result will be the same.
>
> If your machine has only 16-bit registers and arithmetic, then you should
> change clang. There won’t be any conversions in the IR (but there are
> a variety of problems with LLVM’s optimizations that you will run into!).
>
> If your machine has both 16-bit and 32-bit registers and arithmetic, then
> you probably must leave clang alone. I am inclined to read your email
> as implying this is the case for you.
>
> Do you really need signed div and rem? Usually people don’t need the
> quirky results of signed div and rem (in fact, more often than not they
> need results consistent with two’s-complement shifts and masks).
>
> If unsigned is OK then IC should (?) transform unsigned 32-bit div
> and rem of unsigned short into 16-bit unsigned div and rem. (Can someone
> verify / confirm that I’m thinking correctly here?)
>
>
> The only thing I can think of off the top of my head for getting 16-bit sdiv
> and srem instructions emitted on a 32-bit machine is inline-asm?
>
>
> BTW, IIRC sdiv and srem also inhibit vectorization to 16-bit SIMD
> instructions for the same reason (similarly, shifts become undef for different
> shift amounts in 16-bit). I wonder what work-arounds folks use in
> this context; perhaps someone else on this list can chime in?
>
>
> -Peter Lawrence.
>
>
> On Sun, May 21, 2017 at 1:22 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org
> <mailto:llvm-dev at lists.llvm.org>> wrote:
>
> Hello.
> My target architecture natively supports 16-bit integers (i16).
>
> Whenever I write C programs using only short types, clang compiles the program
> to LLVM IR and converts the i16 data to i32 to perform arithmetic operations, then
> truncates the results back to i16. The InstructionCombining (INSTCOMBINE or IC) pass
> then removes these conversions back and forth from i16, except for the (s)div LLVM IR
> operation.
>
> Is there a way to avoid these conversions made by clang back and forth between i16
> and i32, if my source program uses only short types?
> Otherwise, how can I make the IC pass handle sdiv the way it handles add (sub) and
> mul? (That is, if the input operands are i16, the add/mul operation will eventually be
> i16, with any unnecessary conversions back and forth from i32 removed.)
>
> Thank you,
> Alex