<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 22, 2014 at 5:54 PM, Louis Gerbarg <span dir="ltr"><<a href="mailto:lgg@apple.com" target="_blank">lgg@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Hmm… you are correct, it turns out you are correct, I only have to worry if it is constant after the second arg of the GEP. That should allow my case to continue to work. I am currently building a stage2 now to see if works cleanly.</div>
</blockquote><div><br></div><div>Cool. Hopefully we can make incremental progress on the problem while figuring out the right thing to do in the fully general IR case...</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><br></div><div>Having said that, there still exist cases where you are indexing into a homogenous struct like so. Consider the following two functions, which are the same except one is written using structs and the other is written using arrays:</div>
</blockquote><div><br></div><div>Ah, delightful. I was pretty sure there was a more complex example that still exhibited the problem, but wanted to see it to think through things properly.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div>%struct1 = type { i32, i32 }                                                    </div><div>%struct2 = type { %struct1, %struct1 }                                          </div>
<div>; Function Attrs: ssp uwtable                                                   </div><div>define i32 @test1(%struct2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) {             </div><div>bb:                                                                             </div>
<div>  br i1 %tmp4, label %bb1, label %bb2                                           </div><div>bb1:                                              ; preds = %bb5                </div><div>  %tmp10 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 0               </div>
<div>  br label %bb3                                                                 </div><div>                                                                                </div><div>bb2:                                             ; preds = %.lr.ph.i.i          </div>
<div>  %tmp20 = getelementptr inbounds %struct2* %dm, i64 %tmp9, i32 1              </div><div>  br label %bb3                                                                 </div><div>                                                                                </div>
<div>bb3:                                      ; preds = %bb2, %bb1                  </div><div>  %phi = phi %struct1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ]                       </div><div>  %tmp24 = getelementptr inbounds %struct1* %phi, i64 0, i32 1                  </div>
<div class=""><div>  %tmp25 = load i32* %tmp24, align 4                                            </div></div><div>  ret i32 %tmp25                                                                </div><div>}                                                                               </div>
<div>                                                                                </div><div>                                                                                </div><div>%array1 = type [2 x i32]                                                        </div>
<div>%array2 = type [2 x %array1]                                                    </div><div>                                                                                </div><div>; Function Attrs: ssp uwtable                                                   </div>
<div>define i32 @test2(%array2* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) {              </div><div>bb:                                                                             </div><div>  br i1 %tmp4, label %bb1, label %bb2                                           </div>
<div>bb1:                                              ; preds = %bb5                </div><div>  %tmp10 = getelementptr inbounds %array2* %dm, i64 %tmp9, i32 0                </div><div>  br label %bb3                                                                 </div>
<div>                                                                                </div><div>bb2:                                             ; preds = %.lr.ph.i.i          </div><div>  %tmp20 = getelementptr inbounds %array2* %dm, i64 %tmp19, i32 1               </div>
<div>  br label %bb3                                                                 </div><div>                                                                                </div><div>bb3:                                      ; preds = %bb2, %bb1                  </div>
<div>  %phi = phi %array1* [ %tmp10, %bb1 ], [ %tmp20, %bb2 ]                        </div><div>  %tmp24 = getelementptr inbounds %array1* %phi, i64 0, i32 1                   </div><div class=""><div>  %tmp25 = load i32* %tmp24, align 4                                            </div>
</div><div>  ret i32 %tmp25                                                                </div><div>}</div></blockquote><div><br></div><div>The @test1 function cannot have the optimization applied because the %tmp10 and %tmp20 GEPs vary by a field index into the struct, whereas @test2 can be optimized to:</div>
<div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div>define i32 @test2([2 x [2 x i32]]* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19) {</div><div>bb:</div><div>  br i1 %tmp4, label %bb1, label %bb2</div>
<div><br></div><div>bb1:                                              ; preds = %bb</div><div>  br label %bb3</div><div><br></div><div>bb2:                                              ; preds = %bb</div><div>  br label %bb3</div>
<div><br></div><div>bb3:                                              ; preds = %bb2, %bb1</div><div>  %0 = phi i32 [ 0, %bb1 ], [ 1, %bb2 ]</div><div>  %tmp24 = getelementptr inbounds [2 x [2 x i32]]* %dm, i64 %tmp9, i32 %0, i32 1</div>
<div class=""><div>  %tmp25 = load i32* %tmp24, align 4</div></div><div>  ret i32 %tmp25</div><div>}</div></blockquote></blockquote></div><br></div><div class="gmail_extra">So, here is why relaxing this rule is hard:</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">Without a constant index, we don't know what sub-struct to use for interpreting the next index in the event of heterogeneity. Without this, we can't do anything, and so we definitionally preclude the transform.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">While clearly we can make this transform safely for homogenous structs, doing so seriously weakens the IR's guarantees. I'd not like to see us go that direction.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">Instead, if this transformation is indeed important (and it sounds like it is), I have a somewhat crazier idea: fold homogeneous struct type layers into arrays. Thoughts? (I've not thought about this much, there might be dragons here...)</div>
</div>