[PATCH] D34726: AMDGPU/SI: Do not insert an instruction into worklist twice in movetovalu

Sun Jul 9 15:05:52 PDT 2017

alfred.j.huang added a comment.

Thanks for mentioning this.  I have reservation for the order myself.  I was thinking reverse might work due to https://reviews.llvm.org/D26718.   There was also this  "LLVM_ENABLE_REVERSE_ITERATION" define and -mllvm reverse-iterate that when used in a single compilation, would make the SmallPtrSet iteration in reverse order.  But I think you are right, SmallPtrSet is actually an unordered containiner, anything larger than the small 32 entries are hashed, so there is really no order, talking of iteration order is a bit strange here.

So back to the original question, when the original SmallPtrVector was used, we were traversing the "USE" chain for a previous dest register, we inserted them into the Worklist in order of the USErs and processed them by popping, so they are in the reverse order.   I'm not really sure if it matters at all which order they were pushed and popped,   I realized the difference only when one llvm lit test shows a difference of " v_add_i32_e32 v0, vcc, v1, v0" versus " v_add_i32_e32 v0, vcc, v0, v1", which in theory are the same.   If testing with ocl test, conformance and llvm lit show there is no difference in runtime, can we actually ignore the ordering?

As a last resort, I can reuse my original changes which it is the most secure, but probably not with the best performance in mind by using 2 containers; a SmallPtrVector for push and pop, plus a SmallPtrSet to avoid inserting duplicates.   In this case the result will guarantee to be the same as before.

https://reviews.llvm.org/D34726