<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Apr 17, 2012, at 6:09 AM, Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-family: Optima; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; ">I really dislike hasOneUse-based "solutions" at the IR / InstCombine layer. They result in strange artifacts during optimization: places where adding similar code turns off optimizations because we fold the similar bits together and reuse parts of the computation.</div></blockquote></div><br><div>In my example, it won't even work. The second use of 'x+x' only emerges after GVN, after the harm has been done.</div><div><br></div><div>However, I am more concerned about hiding things from scalar evolution than from InstCombine itself. Here is an example I was expecting to fail:</div><div><br></div><div><div>$ cat small2.c </div><div>struct desc {</div><div>  unsigned size : 2;</div><div>  unsigned skip : 6;</div><div>  unsigned value;</div><div>};</div><div><br></div><div>void f(struct desc d, unsigned *p) {</div><div>  for (unsigned i = 0; i != 64; ++i) {</div><div>    *p = 0; p += d.skip;</div><div>    *p = 1; p += d.skip;</div><div>  }</div><div>}</div></div><div><br></div><div>It does the right thing, though:</div><div><br></div><div><div>define void @f(i64 %d.coerce, i32* nocapture %p) nounwind uwtable ssp {</div><div>entry:</div><div>  %0 = lshr i64 %d.coerce, 2</div><div>  %bf.clear = and i64 %0, 63</div><div>  %add.ptr.sum = shl nuw nsw i64 %bf.clear, 1</div><div>  br label %for.body</div><div><br></div><div>for.body:                                         ; preds = %entry, %for.body</div><div>  %p.addr.06 = phi i32* [ %p, %entry ], [ %add.ptr3, %for.body ]</div><div>  %i.05 = phi i32 [ 0, %entry ], [ %inc, %for.body ]</div><div>  store i32 0, i32* %p.addr.06, align 4, !tbaa !0</div><div>  %add.ptr = getelementptr inbounds i32* %p.addr.06, i64 %bf.clear</div><div>  store i32 1, i32* %add.ptr, align 4, !tbaa !0</div><div>  %add.ptr3 = getelementptr inbounds i32* %p.addr.06, i64 %add.ptr.sum</div><div>  %inc = add i32 %i.05, 1</div><div>  %cmp = icmp eq i32 %inc, 64</div><div>  br i1 %cmp, label %for.end, label %for.body</div><div><br></div><div>for.end:                                          ; preds = %for.body</div><div>  ret void</div><div>}</div></div><div><br></div><div>Notice how the two gep offsets %bf.clear and %add.ptr.sum have a simple arithmetic relationship that scalar evolution can figure out.</div><div><br></div><div>With a very small change, that breaks:</div><div><br></div><div><div>$ cat small3.c </div><div>void f(unsigned long d, unsigned *p) {</div><div>  unsigned long skip = d/4;</div><div>  for (unsigned i = 0; i != 64; ++i) {</div><div>    *p = 0; p += skip;</div><div>    *p = 1; p += skip;</div><div>  }</div><div>}</div></div><div><br></div><div><div>define void @f(i64 %d, i32* nocapture %p) nounwind uwtable ssp {</div><div>entry:</div><div>  %div = lshr i64 %d, 2</div><div>  %0 = lshr i64 %d, 1</div><div>  %add.ptr.sum = and i64 %0, 9223372036854775806</div><div>  br label %for.body</div><div><br></div><div>for.body:                                         ; preds = %entry, %for.body</div><div>  %i.03 = phi i32 [ 0, %entry ], [ %inc, %for.body ]</div><div>  %p.addr.02 = phi i32* [ %p, %entry ], [ %add.ptr1, %for.body ]</div><div>  store i32 0, i32* %p.addr.02, align 4, !tbaa !0</div><div>  %add.ptr = getelementptr inbounds i32* %p.addr.02, i64 %div</div><div>  store i32 1, i32* %add.ptr, align 4, !tbaa !0</div><div>  %add.ptr1 = getelementptr inbounds i32* %p.addr.02, i64 %add.ptr.sum</div><div>  %inc = add i32 %i.03, 1</div><div>  %cmp = icmp eq i32 %inc, 64</div><div>  br i1 %cmp, label %for.end, label %for.body</div><div><br></div><div>for.end:                                          ; preds = %for.body</div><div>  ret void</div><div>}</div></div><div><br></div><div>Now it suddenly becomes very difficult for scalar evolution to figure out the relationship between %div and %add.ptr.sum.</div><div><br></div><div>I think the canonical form should look more like the bit field example - don't hide the shl behind a bit mask.</div><div><br></div><div>I'll try deferring the transforms that hide an shl instruction to DAGCombine. I'll run some benchmarks.</div><div><br></div><div>/jakob</div><div><br></div></body></html>