<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/96502>96502</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [LSV] Enhance LoadStoreVectorizer to Handle Disjoint Flag in OR Instructions and Restore Vectorization Opportunities 
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          LiHao217
      </td>
    </tr>
</table>

<pre>
### Summary
This issue proposes an enhancement to the LoadStoreVectorizer (LSV) pass to account for the way LLVM now treats `or` instructions with the `disjoint` flag. The goal is to restore vectorization opportunities that were lost after a recent patch introduced the `disjoint` flag for `or` instructions.
### Background
In the previous implementation, `haveNoCommonBitsSet()` was used to detect `or` instructions whose operands share no set bits and to convert them to `add` instructions, a transformation that LoadStoreVectorizer relies on for vectorization. A recent patch changed this behavior by introducing a `disjoint` flag, which serves the same purpose but is more reliable and easier to revert.
As a result, LoadStoreVectorizer no longer recognizes the affected `or` instructions, and certain vectorization opportunities are missed.
### Problem
Consider the following IR:
```
%linear_index3 = or i32 %linear_index_plus_base.fr, 3
%linear_index2 = or i32 %linear_index_plus_base.fr, 2
%linear_index1 = or i32 %linear_index_plus_base.fr, 1
```

In LLVM versions prior to patch [74467](https://github.com/llvm/llvm-project/pull/74467), the `haveNoCommonBitsSet()` function determined whether the operands of these `or` operations had any set bits in common. If they did not, the `or` operations were converted to `add` operations, which allowed the LoadStoreVectorizer (LSV) to vectorize the loads and stores addressed through them.
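
The equivalence behind that conversion can be checked with plain integer arithmetic: when two values share no set bits, no bit position can produce a carry, so `or` and `add` give the same result. The following is a minimal standalone C++ sketch, not LLVM code. The helper names `noCommonBits` and `orMatchesAddForOffsets` are hypothetical, and the base index is assumed to be a multiple of 4 so that its two low bits are zero.

```cpp
// Standalone illustration, not LLVM code. All helper names are hypothetical.

// Returns true when no bit position is set in both A and B.
bool noCommonBits(unsigned A, unsigned B) { return (A & B) == 0; }

// Checks, for the offsets 1, 2 and 3 used in the IR above, that Base and the
// offset share no set bits and that (Base | Offset) equals (Base + Offset).
// Returns false as soon as one offset overlaps with Base.
bool orMatchesAddForOffsets(unsigned Base) {
  for (unsigned Offset = 1; Offset != 4; ++Offset) {
    if (!noCommonBits(Base, Offset))
      return false; // Overlapping bits: `or` and `add` may differ.
    if ((Base | Offset) != Base + Offset)
      return false; // Not reachable when the bits are disjoint.
  }
  return true;
}
```

For instance, `orMatchesAddForOffsets(64)` holds because 64 has its two low bits clear, whereas a base of 6 fails the check because 6 and 2 share a set bit.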

In LLVM versions after patch [74467](https://github.com/llvm/llvm-project/pull/74467), the new `disjoint` flag is used in place of the `haveNoCommonBitsSet()` function. However, the `or` instructions above were not recognized as disjoint and thus were not transformed into `add` operations, so LSV failed to vectorize them.
Specifically:
    1. The previous implementation used `haveNoCommonBitsSet()` to detect and convert specific `or` instructions to `add` instructions. LSV could then recognize and vectorize these `add` instructions.
    2. In the new implementation, the `or` instructions are not identified as disjoint and thus are not converted to `add` instructions. This causes LSV to miss the opportunity to vectorize them.
To solve this problem, we need to determine whether the `or` instructions in the IR are disjoint. If they are, the IR should be transformed to the following form to enable vectorization:
```
%linear_index3 = or disjoint i32 %linear_index_plus_base.fr, 3
%linear_index2 = or disjoint i32 %linear_index_plus_base.fr, 2
%linear_index1 = or disjoint i32 %linear_index_plus_base.fr, 1
```
This modification will allow LSV to recognize these instructions and restore vectorization opportunities.
### Solution
To resolve this, we propose updating the LoadStoreVectorizer pass to check each `or` instruction and set the `disjoint` flag when `haveNoCommonBitsSet()` shows that its operands share no set bits. This will ensure that vectorization opportunities are restored.
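The idea of proving disjointness from known bits can be sketched outside LLVM as follows. This is a simplified model under stated assumptions, not LLVM's actual `KnownBits` class or its `haveNoCommonBitsSet()` implementation: `SimpleKnownBits`, `knownBitsOfConstant`, `knownBitsOfAligned` and `provablyDisjoint` are hypothetical names, and values are assumed to fit in a 32-bit `unsigned`.

```cpp
// Simplified model of a known-bits query. Not LLVM code; names are hypothetical.
// Zero has a bit set where the value is known to be 0.
// One has a bit set where the value is known to be 1.
struct SimpleKnownBits {
  unsigned Zero;
  unsigned One;
};

// Every bit of a compile-time constant is known.
SimpleKnownBits knownBitsOfConstant(unsigned C) { return {~C, C}; }

// An otherwise unknown value whose LowZeroBits lowest bits are known to be 0,
// for example an index that is a multiple of 4 when LowZeroBits is 2.
// LowZeroBits is assumed to be smaller than the bit width of unsigned.
SimpleKnownBits knownBitsOfAligned(unsigned LowZeroBits) {
  unsigned Mask = (1u << LowZeroBits) - 1u;
  return {Mask, 0u};
}

// Two values are provably disjoint when, at every bit position, at least one
// of them is known to be 0. Only then is it safe to mark the `or` as disjoint.
bool provablyDisjoint(SimpleKnownBits L, SimpleKnownBits R) {
  return (L.Zero | R.Zero) == ~0u;
}
```

With this model, an index known to be a multiple of 4 is provably disjoint from the constants 1, 2 and 3, while an index only known to be a multiple of 2 is not provably disjoint from 3, so the flag would not be set in that case.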
### Implementation
In the `Vectorizer::run` function, we iterate over the instructions of each basic block and check for `or` instructions. If the operands of an `or` instruction have no common bits set, we set the `disjoint` flag. This preprocessing step ensures that LSV recognizes the pattern and can perform the vectorization.
```
bool Vectorizer::run() {
  bool Changed = false;
  for (BasicBlock *BB : post_order(&F)) {
    assert(!BB->empty());

    SmallVector<BasicBlock::iterator, 8> Barriers;
    Barriers.push_back(BB->begin());
    for (Instruction &I : *BB) {
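      // Preprocessing: mark `or` instructions whose operands provably share
      // no set bits as disjoint, so that LSV can treat them like `add`.
      if (auto *OrInst = dyn_cast<PossiblyDisjointInst>(&I)) {
        Value *Op0 = OrInst->getOperand(0);
        Value *Op1 = OrInst->getOperand(1);
        if (!OrInst->isDisjoint() && haveNoCommonBitsSet(Op0, Op1, DL)) {
          OrInst->setIsDisjoint(true);
          Changed = true; // The IR was modified, so report the change.
        }
      }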
      if (!isGuaranteedToTransferExecutionToSuccessor(&I))
        Barriers.push_back(I.getIterator());
    }
    Barriers.push_back(BB->end());

    for (auto It = Barriers.begin(), End = std::prev(Barriers.end()); It != End;
         ++It)
      Changed |= runOnPseudoBB(*It, *std::next(It));

    for (Instruction *I : ToErase) {
      auto *PtrOperand = getLoadStorePointerOperand(I);
      if (I->use_empty())
        I->eraseFromParent();
      RecursivelyDeleteTriviallyDeadInstructions(PtrOperand);
    }
    ToErase.clear();
  }
  return Changed;
}
```
### Testing and Validation
The modified implementation has been tested with the existing suite of vectorization benchmarks and specific cases that regressed due to the introduction of the disjoint flag. The results showed that vectorization opportunities originally intended to be captured by LSV were successfully restored.
### Discussion
This enhancement is expected to bring back the performance gains lost with the new `disjoint` flag handling. It is a minor change, but it is expected to have a significant impact on GPU-targeted vectorization, especially for AMD and NVIDIA architectures.
The community’s feedback on this approach is highly appreciated. We would particularly like to know if there are any potential edge cases or alternate methods to handle similar scenarios in the future.
</pre>