<p><br>

On Oct 27, 2013 2:16 PM, "David Nadlinger" <<a href="mailto:code@klickverbot.at">code@klickverbot.at</a>> wrote:<br>

><br>

> The following piece of IR is a fixed point for opt -std-compile-opts/-O3:<br>

><br>

> ---<br>

> target datalayout =<br>

> "e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"<br>

> target triple = "x86_64-unknown-linux-gnu"<br>

><br>

> ; Function Attrs: nounwind readonly<br>

> define i32 @get32Bits(i8* inreg nocapture readonly %x_arg) #0 {<br>

>   %tmp1 = getelementptr inbounds i8* %x_arg, i64 3<br>

>   %tmp2 = load i8* %tmp1, align 1<br>

>   %tmp3 = zext i8 %tmp2 to i32<br>

>   %tmp4 = shl nuw nsw i32 %tmp3, 24<br>

>   %tmp6 = getelementptr inbounds i8* %x_arg, i64 2<br>

>   %tmp7 = load i8* %tmp6, align 1<br>

>   %tmp8 = zext i8 %tmp7 to i32<br>

>   %tmp9 = shl nuw nsw i32 %tmp8, 16<br>

>   %tmp10 = or i32 %tmp9, %tmp4<br>

>   %tmp12 = getelementptr inbounds i8* %x_arg, i64 1<br>

>   %tmp13 = load i8* %tmp12, align 1<br>

>   %tmp14 = zext i8 %tmp13 to i32<br>

>   %tmp15 = shl nuw nsw i32 %tmp14, 8<br>

>   %tmp16 = or i32 %tmp10, %tmp15<br>

>   %tmp19 = load i8* %x_arg, align 4<br>

>   %tmp20 = zext i8 %tmp19 to i32<br>

>   %tmp21 = or i32 %tmp16, %tmp20<br>

>   ret i32 %tmp21<br>

> }<br>

><br>

> attributes #0 = { nounwind readonly }<br>

> ---<br>

><br>

> Is there a reason why this can't be optimized down to a single i32<br>

> load based on the IR semantics, or is this just a missed optimization<br>

> opportunity?<br>

><br>

My guess is that this is a missed optimization, but in real life, every project I have worked on fixes this in the C or C++ code, using macros that select which instructions are used based on the target platform and its endianness.<br>
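
For illustration, here is a rough sketch of the kind of macro-based workaround I mean. The names GET32_NATIVE_LOAD, get32bits and get32bits_bytewise are made up for this example, and it assumes a GCC- or Clang-style compiler that predefines __BYTE_ORDER__; other compilers simply fall back to the byte-wise path.<br>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical switch: use a single native load only when the compiler
 * tells us the target is little-endian (GCC/Clang predefined macros).
 * Anything else takes the portable byte-wise path. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define GET32_NATIVE_LOAD 1
#else
#define GET32_NATIVE_LOAD 0
#endif

/* Portable version: the same shift-and-or pattern as the IR above,
 * assembling a little-endian 32-bit value one byte at a time. */
static uint32_t get32bits_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Dispatching version: on little-endian targets, copy the four bytes
 * straight into a uint32_t. memcpy avoids alignment and aliasing
 * problems, and compilers typically lower it to one 32-bit load. */
static uint32_t get32bits(const unsigned char *p)
{
#if GET32_NATIVE_LOAD
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    return get32bits_bytewise(p);
#endif
}
```

Both paths return the same value for the same input; the macro only changes which instructions the compiler is likely to emit.<br>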


James<br>

</p>