<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div><br></div><blockquote type="cite"><div class="gmail_extra" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">I'm not sure I understand the full impact of this example, and I would like to.</div><div class="gmail_extra" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br></div><div class="gmail_extra" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">What are the desired memory model semantics for a masked store? Specifically, let me suppose a simplified vector model of <2 x i64> on an i64-word-size platform.</div><div class="gmail_extra" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br></div></blockquote><br></div><div>Hi Chandler, </div><div><br></div><div>I brought the example in this email thread to show that the optimizations that we currently have won't work on masked load/store operations because they don't take the mask into consideration. The memory model interesting question but I am not sure how it is related. In our example you can see the problem with a single thread. Both MIC and AVX[1] have masked stores operations and they have a different memory model. </div><div><br></div><div>Thanks,</div><div>Nadav</div><div><br></div><div>[1]  <a href="http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_maskstore_pd.htm">http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_maskstore_pd.htm</a></div><br></body></html>