<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56269>56269</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            LinalgOdsGen target causing build-time race conditions that fails build
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jerryyin
      </td>
    </tr>
</table>

<pre>
    Note: This is mirrored from the MLIR Discourse for tracking purposes. The original discussion is [here](https://discourse.llvm.org/t/linalgodsgen-target-causing-build-time-race-conditions-that-fails-build/63334).

CC: @stellaraccident  @joker-eph 

Original description is attached below.

------


## Summary

We discovered a hard-to-reproduce race condition when building the Linalg dialect. It only happens on our highest-core-count (256) build machines with `make -j$(nproc)`. I have never been able to reproduce it on my dev machine. It fails roughly 1 out of ~50 builds of the same code base, and it has persisted for roughly the past half year.

## Environment
Build tool: Make (instead of ninja)
OS: Multiple OSes, including ubuntu, centos
Machine nproc: 256

## Failure Signature

2022-04-19T04:42:06.794272790Z [ 47%] Built target MLIRLinalgOpsIncGen
2022-04-19T04:42:06.794202669Z Included from /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/cget/build/tmp-fa6ce9b8cc6a4fda9ed7d678dcc4b43a/llvm-project-mlir-release-rocm-5.2/external/llvm-project/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td:274:
2022-04-19T04:42:06.794290490Z /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/cget/build/tmp-fa6ce9b8cc6a4fda9ed7d678dcc4b43a/build/external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yamlgen.td:3735:20: error: Couldn’t find class ‘LinalgStructur’
2022-04-19T04:42:06.794297140Z def SoftPlus2DOp : LinalgStructur
2022-04-19T04:42:06.794302770Z ^
2022-04-19T04:42:06.794308300Z gmake[2]: *** [external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/CMakeFiles/MLIRLinalgStructuredOpsIncGen.dir/build.make:123: external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.cpp.inc] Error 1
2022-04-19T04:42:06.794314851Z gmake[2]: *** Waiting for unfinished jobs…
2022-04-19T04:42:06.794320581Z Included from /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/cget/build/tmp-fa6ce9b8cc6a4fda9ed7d678dcc4b43a/llvm-project-mlir-release-rocm-5.2/external/llvm-project/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td:274:
2022-04-19T04:42:06.794327081Z /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps/cget/build/tmp-fa6ce9b8cc6a4fda9ed7d678dcc4b43a/build/external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOps.yamlgen.td:3735:20: error: Couldn’t find class ‘LinalgStructur’
2022-04-19T04:42:06.794346731Z def SoftPlus2DOp : LinalgStructur
2022-04-19T04:42:06.794352501Z ^
2022-04-19T04:42:06.794358121Z gmake[2]: *** [external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/CMakeFiles/MLIRLinalgStructuredOpsIncGen.dir/build.make:174: external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.h.inc] Error 1
2022-04-19T04:42:06.794364351Z gmake[1]: *** [CMakeFiles/Makefile2:26904: external/llvm-project/llvm/tools/mlir/include/mlir/Dialect/Linalg/IR/CMakeFiles/MLIRLinalgStructuredOpsIncGen.dir/all] Error 2
2022-04-19T04:42:06.794370471Z gmake[1]: *** Waiting for unfinished jobs…

## Analysis
There are a couple of targets involved:

- LinalgOdsGen: generates the tablegen file from the yaml file (LinalgNamedStructuredOps.yaml)
- MLIRLinalgStructuredOpsIncGen: generates a header from the tablegen file produced by LinalgOdsGen

What likely happened is:

- On a machine that cannot do a highly parallelized build, the two targets run in the right order:
  1. LinalgOdsGen finishes and the tablegen file is written out to disk
  2. The generated tablegen file is picked up by MLIRLinalgStructuredOpsIncGen
- On a machine that can do a highly parallelized build, there’s a race condition:
  1. LinalgOdsGen runs; the output file is created and the tablegen is only partially written to disk
  2. Before step 1 is completely done, MLIRLinalgStructuredOpsIncGen picks up the partially written tablegen file, finds that its content doesn’t make sense, and fails
  3. The build proceeds with the unfinished jobs and only gives up after step 1 has completed. Therefore, in an environment where the build failed, I see a fully constructed tablegen file and have no direct evidence of the partially written one.

I think the underlying problem is that the LinalgOdsGen target ([here](https://github.com/llvm/llvm-project/blob/1004d6e7e2eb24edc01d7e330c538ce5cb590001/mlir/include/mlir/Dialect/Linalg/IR/CMakeLists.txt#L37-L43)) uses the paradigm below to try to enforce an ordering:

```cmake
add_custom_target(
    myCustomTarget
    COMMAND foo outfile
)
add_dependencies(myTarget myCustomTarget)
```

The ordering is set up correctly. However, later build stages can still use the incompletely written outfile. I can’t figure out a way to force later build stages to wait until the custom target has run to completion.
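
For discussion, here is a minimal sketch of one arrangement that might narrow the window, not a confirmed fix. It rests on two assumptions. First, that the generator can write to a temporary file which is then renamed into place, so a reader never sees a half-written outfile. Second, that the generated file is driven by exactly one target. The CMake documentation for `add_custom_command` warns against listing the same output in more than one independent target that may build in parallel, because the rule may then run twice and conflict. All names here (`gen_tool`, `ops.yaml`, `ops.yamlgen.td`, `GenTd`, `ConsumerIncGen`, and the `-o` flag) are hypothetical.

```cmake
# Hypothetical names throughout; gen_tool is assumed to be an executable
# target defined elsewhere, and its -o flag is an assumption.
set(GEN_TD ${CMAKE_CURRENT_BINARY_DIR}/ops.yamlgen.td)

# Declare the generated file as an OUTPUT so the build system tracks it.
# Write to a temporary name first, then rename, so that the final path
# only ever appears once the generator has finished writing.
add_custom_command(
    OUTPUT ${GEN_TD}
    COMMAND gen_tool ${CMAKE_CURRENT_SOURCE_DIR}/ops.yaml -o ${GEN_TD}.tmp
    COMMAND ${CMAKE_COMMAND} -E rename ${GEN_TD}.tmp ${GEN_TD}
    DEPENDS gen_tool ${CMAKE_CURRENT_SOURCE_DIR}/ops.yaml
    VERBATIM
)

# Exactly one target drives the rule above.
add_custom_target(GenTd DEPENDS ${GEN_TD})

# Consumers depend on the driver target rather than listing the
# generated file in their own rules.
add_custom_target(ConsumerIncGen
    COMMAND ${CMAKE_COMMAND} -E echo "consuming ${GEN_TD}"
    VERBATIM
)
add_dependencies(ConsumerIncGen GenTd)
```

Whether this applies to LinalgOdsGen depends on how many targets end up referencing the generated file, which I have not verified.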

## Suggestion
Commit the yamlgen.td and yamlgen.cpp.inc files instead of generating them dynamically, and make their generation a manual target, so that a developer who updates the yaml file has to run the codegen and commit the result together with it. This way the custom command would no longer need to run before tablegen, which avoids the race condition.
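
A rough sketch of what such a manual target could look like, assuming the generated file lives in the source tree. The names `gen_tool`, `ops.yaml`, `ops.yamlgen.td` and `update-generated-td`, as well as the `-o` flag, are placeholders for illustration and not the actual MLIR names.

```cmake
# Hypothetical names throughout; gen_tool is assumed to be an executable
# target defined elsewhere, and its -o flag is an assumption.
# The generated file is committed, so regular builds only read it.
set(COMMITTED_TD ${CMAKE_CURRENT_SOURCE_DIR}/ops.yamlgen.td)

# No ALL keyword: the target is excluded from the default build and
# runs only when requested explicitly.
add_custom_target(update-generated-td
    COMMAND gen_tool ${CMAKE_CURRENT_SOURCE_DIR}/ops.yaml -o ${COMMITTED_TD}
    COMMENT "Regenerating ops.yamlgen.td; commit it together with the yaml change"
    VERBATIM
)
```

With the Makefile generator this would be invoked as `make update-generated-td`. The trade-off is that the committed file can go stale if someone edits the yaml and forgets to regenerate, so a CI check comparing the two would probably be needed.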
</pre>