<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Malgun Gothic";
panose-1:2 11 5 3 2 0 0 2 0 4;}
@font-face
{font-family:"\@Malgun Gothic";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi All,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I am looking at the simple example below.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"<o:p></o:p></p>
<p class="MsoNormal">target triple = "aarch64-unknown-linux-gnu"<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">%struct.base_s = type { %struct.range, i64, i64, i64*, i32, [4 x i32], [274 x %struct.match], i32, i32, i8, i8, i8, i32, i32, i32, [16 x [768 x i16]], [12 x [16 x i16]], [12 x i16], [12 x i16], [12 x i16], [12 x i16], [12 x [16 x i16]],
[4 x [64 x i16]], [114 x i16], [16 x i16], %struct.length, %struct.length, [4 x [64 x i32]], [4 x [128 x i32]], i32, i32, [16 x i32], i32, i32, i32, [4096 x %struct.opt] }<o:p></o:p></p>
<p class="MsoNormal">%struct.range = type { i64, i64, i32, i8, i64, i32, i32, [53 x i32], [53 x i16*] }<o:p></o:p></p>
<p class="MsoNormal">%struct.match = type { i32, i32 }<o:p></o:p></p>
<p class="MsoNormal">%struct.length = type { i16, i16, [16 x [8 x i16]], [16 x [8 x i16]], [256 x i16], [16 x [272 x i32]], i32, [16 x i32] }<o:p></o:p></p>
<p class="MsoNormal">%struct.opt = type { i32, i8, i8, i32, i32, i32, i32, i32, [4 x i32] }<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">define i32 @test(i32 %len, %struct.base_s* nocapture readonly %obj) {<o:p></o:p></p>
<p class="MsoNormal">entry:<o:p></o:p></p>
<p class="MsoNormal"> br label %while.cond<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">while.cond: ; preds = %while.cond, %entry<o:p></o:p></p>
<p class="MsoNormal"> %i.0 = phi i32 [ 0, %entry ], [ %inc, %while.cond ]<o:p></o:p></p>
<p class="MsoNormal"> %idxprom = zext i32 %i.0 to i64<o:p></o:p></p>
<p class="MsoNormal"> %len1 = getelementptr inbounds %struct.base_s, %struct.base_s* %obj, i64 0, i32 6, i64 %idxprom, i32 0<o:p></o:p></p>
<p class="MsoNormal"> %0 = load i32, i32* %len1, align 4<o:p></o:p></p>
<p class="MsoNormal"> %cmp = icmp ult i32 %0, %len<o:p></o:p></p>
<p class="MsoNormal"> %inc = add i32 %i.0, 1<o:p></o:p></p>
<p class="MsoNormal"> br i1 %cmp, label %while.cond, label %while.end<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">while.end: ; preds = %while.cond<o:p></o:p></p>
<p class="MsoNormal"> ret i32 %i.0<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I expected the LSR pass to extract the loop-invariant part of `%len1 = getelementptr` and hoist it to the preheader. That would create a new IV for the loop-dependent part of the GEP inside the loop, which `%0 = load` could then use. However, it looks like `IVUsers` does not process `%idxprom = zext`. I can see that `SCEVAddRecExpr` and `SCEVAddExpr` are handled in the `isInteresting` function. It seems the LSR pass also does not handle the `zext` for `IVChain`. If I remove `%idxprom = zext` manually from the above example, LSR works as expected. Does anyone know why the `zext` is not supported in IVUsers and LSR? Does it make it difficult for LSR to construct formulas and compare them? If I have missed something, please let me know.<o:p></o:p></p>
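<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">To make the expectation concrete, here is a hand-written sketch of roughly the loop shape I had in mind. It is not compiler output. The names `@test_expected`, `%base`, `%p` and `%p.next` are made up for illustration, and it reuses the type definitions above. It matches the original only if `%i.0` never wraps around in 32 bits, which this sketch simply assumes.<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">define i32 @test_expected(i32 %len, %struct.base_s* nocapture readonly %obj) {<o:p></o:p></p>
<p class="MsoNormal">entry:<o:p></o:p></p>
<p class="MsoNormal"> ; loop-invariant part of the address, hoisted to the preheader<o:p></o:p></p>
<p class="MsoNormal"> %base = getelementptr inbounds %struct.base_s, %struct.base_s* %obj, i64 0, i32 6, i64 0, i32 0<o:p></o:p></p>
<p class="MsoNormal"> br label %while.cond<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">while.cond: ; preds = %while.cond, %entry<o:p></o:p></p>
<p class="MsoNormal"> %i.0 = phi i32 [ 0, %entry ], [ %inc, %while.cond ]<o:p></o:p></p>
<p class="MsoNormal"> ; new pointer IV used directly by the load<o:p></o:p></p>
<p class="MsoNormal"> %p = phi i32* [ %base, %entry ], [ %p.next, %while.cond ]<o:p></o:p></p>
<p class="MsoNormal"> %0 = load i32, i32* %p, align 4<o:p></o:p></p>
<p class="MsoNormal"> %cmp = icmp ult i32 %0, %len<o:p></o:p></p>
<p class="MsoNormal"> %inc = add i32 %i.0, 1<o:p></o:p></p>
<p class="MsoNormal"> ; step by one %struct.match, i.e. two i32 elements (8 bytes)<o:p></o:p></p>
<p class="MsoNormal"> %p.next = getelementptr i32, i32* %p, i64 2<o:p></o:p></p>
<p class="MsoNormal"> br i1 %cmp, label %while.cond, label %while.end<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">while.end: ; preds = %while.cond<o:p></o:p></p>
<p class="MsoNormal"> ret i32 %i.0<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>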
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">For reference, the assembly output for the above example with `-O3` is below.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">test:<o:p></o:p></p>
<p class="MsoNormal"> mov w8, w0<o:p></o:p></p>
<p class="MsoNormal"> mov w0, #-1<o:p></o:p></p>
<p class="MsoNormal">.LBB0_1:<o:p></o:p></p>
<p class="MsoNormal"> add w0, w0, #1<o:p></o:p></p>
<p class="MsoNormal"> add x9, x1, w0, uxtw #3<o:p></o:p></p>
<p class="MsoNormal"> ldr w9, [x9, #724]<o:p></o:p></p>
<p class="MsoNormal"> cmp w9, w8<o:p></o:p></p>
<p class="MsoNormal"> b.lo .LBB0_1<o:p></o:p></p>
<p class="MsoNormal"> ret<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If I remove the `zext`, the output is as below, and the loop has one fewer instruction than in the output above.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">test:<o:p></o:p></p>
<p class="MsoNormal"> add x9, x1, #724<o:p></o:p></p>
<p class="MsoNormal"> mov x8, #-1<o:p></o:p></p>
<p class="MsoNormal">.LBB0_1:<o:p></o:p></p>
<p class="MsoNormal"> ldr w10, [x9], #8<o:p></o:p></p>
<p class="MsoNormal"> add x8, x8, #1<o:p></o:p></p>
<p class="MsoNormal"> cmp w10, w0<o:p></o:p></p>
<p class="MsoNormal"> b.lo .LBB0_1<o:p></o:p></p>
<p class="MsoNormal"> mov x0, x8<o:p></o:p></p>
<p class="MsoNormal"> ret<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The IR with the `zext` removed is below.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"<o:p></o:p></p>
<p class="MsoNormal">target triple = "aarch64-unknown-linux-gnu"<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">%struct.base_s = type { %struct.range, i64, i64, i64*, i32, [4 x i32], [274 x %struct.match], i32, i32, i8, i8, i8, i32, i32, i32, [16 x [768 x i16]], [12 x [16 x i16]], [12 x i16], [12 x i16], [12 x i16], [12 x i16], [12 x [16 x i16]],
[4 x [64 x i16]], [114 x i16], [16 x i16], %struct.length, %struct.length, [4 x [64 x i32]], [4 x [128 x i32]], i32, i32, [16 x i32], i32, i32, i32, [4096 x %struct.opt] }<o:p></o:p></p>
<p class="MsoNormal">%struct.range = type { i64, i64, i32, i8, i64, i32, i32, [53 x i32], [53 x i16*] }<o:p></o:p></p>
<p class="MsoNormal">%struct.match = type { i32, i32 }<o:p></o:p></p>
<p class="MsoNormal">%struct.length = type { i16, i16, [16 x [8 x i16]], [16 x [8 x i16]], [256 x i16], [16 x [272 x i32]], i32, [16 x i32] }<o:p></o:p></p>
<p class="MsoNormal">%struct.opt = type { i32, i8, i8, i32, i32, i32, i32, i32, [4 x i32] }<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">;define i32 @test(i32 %len, %struct.base_s* nocapture readonly %obj) {<o:p></o:p></p>
<p class="MsoNormal">define i64 @test(i32 %len, %struct.base_s* nocapture readonly %obj) {<o:p></o:p></p>
<p class="MsoNormal">entry:<o:p></o:p></p>
<p class="MsoNormal"> br label %while.cond<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">while.cond: ; preds = %while.cond, %entry<o:p></o:p></p>
<p class="MsoNormal">; %i.0 = phi i32 [ 0, %entry ], [ %inc, %while.cond ]<o:p></o:p></p>
<p class="MsoNormal"> %i.0 = phi i64 [ 0, %entry ], [ %inc, %while.cond ]<o:p></o:p></p>
<p class="MsoNormal">; %idxprom = zext i32 %i.0 to i64<o:p></o:p></p>
<p class="MsoNormal">; %len1 = getelementptr inbounds %struct.base_s, %struct.base_s* %obj, i64 0, i32 6, i64 %idxprom, i32 0<o:p></o:p></p>
<p class="MsoNormal"> %len1 = getelementptr inbounds %struct.base_s, %struct.base_s* %obj, i64 0, i32 6, i64 %i.0, i32 0<o:p></o:p></p>
<p class="MsoNormal"> %0 = load i32, i32* %len1, align 4<o:p></o:p></p>
<p class="MsoNormal"> %cmp = icmp ult i32 %0, %len<o:p></o:p></p>
<p class="MsoNormal">; %inc = add i32 %i.0, 1<o:p></o:p></p>
<p class="MsoNormal"> %inc = add i64 %i.0, 1<o:p></o:p></p>
<p class="MsoNormal"> br i1 %cmp, label %while.cond, label %while.end<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">while.end: ; preds = %while.cond<o:p></o:p></p>
<p class="MsoNormal">; ret i32 %i.0<o:p></o:p></p>
<p class="MsoNormal"> ret i64 %i.0<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks<o:p></o:p></p>
<p class="MsoNormal">JinGu Kang<o:p></o:p></p>
</div>
</body>
</html>