<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">r164089. Thanks.<div><br></div><div>Evan</div><div><br><div><div>On Sep 17, 2012, at 11:11 AM, Evan Cheng &lt;<a href="mailto:evan.cheng@apple.com">evan.cheng@apple.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi David,<div><br></div><div>Thanks for working on this. This is a big omission that I was planning to look at. It's good you got to it first. Some comments though:</div><div><br></div><div><div> bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {</div><div>-  if (!Subtarget->allowsUnalignedMem())</div><div>-    return false;</div><div>+  // The AllowsUnaligned flag models the SCTLR.A setting in ARM CPUs</div><div>+  bool AllowsUnaligned = Subtarget->allowsUnalignedMem();</div><div><br></div><div>   switch (VT.getSimpleVT().SimpleTy) {</div><div>   default:</div><div>@@ -9034,10 +9034,15 @@ bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const {</div><div>   case MVT::i8:</div><div>   case MVT::i16:</div><div>   case MVT::i32:
</div><div>-    return true;</div><div>+    // Unaligned access can use (for example) LDRB, LDRH, LDR</div><div>+    return AllowsUnaligned;</div><div>   case MVT::f64:</div><div>-    return Subtarget->hasNEON();</div><div>-  // FIXME: VLD1 etc with standard alignment is legal.
</div><div>+  case MVT::v2f64:</div><div>+    // For any little-endian targets with NEON, we can support unaligned ld/st</div><div>+    // of D and Q (e.g. {D0,D1}) registers by using vld1.i8/vst1.i8.</div><div>+    // A big-endian target may also explicitly support unaligned accesses</div><div>+    return Subtarget->hasNEON() &&</div><div>+           (getTargetData()->isLittleEndian() || AllowsUnaligned);</div><div>   }</div><div> }
</div><div><br></div><div>This part is not quite right:</div><div><div>+    return Subtarget->hasNEON() &&</div><div>+           (getTargetData()->isLittleEndian() || AllowsUnaligned);</div></div><div><br></div><div>vld1 / vst1 require alignment to the element size. If the address is not aligned, the access faults unless SCTLR.A is 0 (alignment checking disabled). This should not return true for all little-endian CPUs with NEON. It should still be controlled by the subtarget feature.</div><div><br></div><div>I'll fix up your patch and commit it for you. Thanks.</div><div><br></div><div>Evan</div><div><br></div><div><br></div><div><br></div><div><div>On Sep 13, 2012, at 5:53 PM, David Peixotto &lt;<a href="mailto:dpeixott@codeaurora.org">dpeixott@codeaurora.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div lang="EN-US" link="blue" vlink="purple" style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div class="WordSection1" style="page: WordSection1; "><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">This patch is the result of a discussion of unaligned vector loads/stores on llvmdev:<span class="Apple-converted-space"> </span><a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/053082.html" style="color: purple; text-decoration: underline; ">http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-September/053082.html</a>.</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: 
<o:p>">
Calibri, sans-serif; "><br></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">The vld1 and vst1 variants in ARMv7 NEON only require memory</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">alignment to the element size of the vector. Because of this</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">property, we can use vld1.8 and vst1.8 to load/store f64 and v2f64</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">vectors to unaligned addresses on little-endian targets. This should</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">be faster than the target-independent codegen lowering that does an</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">aligned load/store to the stack and unaligned load/store of each</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">element of the vector.</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; "><br></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">This patch includes two changes:</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">  1. Add new patterns for selecting vld1/vst1 for byte- and half-word-</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">     aligned vector stores of v2f64 vectors.</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">  2. Allow unaligned load/store using vld1/vst1 for little-endian</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">     ARM targets that support NEON. The vld1/vst1 instructions will</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">     be used to load/store f64 and v2f64 values at byte- and</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">     half-word-aligned addresses.</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; "><br></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; ">-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif; "><br></div></div><span>&lt;0001-Use-vld1-vst1-for-unaligned-load-store.patch&gt;</span>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu" style="color: purple; text-decoration: underline; ">llvm-commits@cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" style="color: purple; text-decoration: underline; ">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br></div></blockquote></div><br></div></div></blockquote></div><br></div></body></html>