<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/63989>63989</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[flang] fir.alloca without stack clean-up inside hlfir.elemental
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
vzakhari
</td>
</tr>
</table>
<pre>
Noticed in the LIT test added by https://reviews.llvm.org/D155778
```
subroutine test_multiple_expr_uses_inside_elemental(o, r)
  character(*,1) :: o(:)
  integer :: r
  r = 0
  if (any(o/=repeat('*',len(o)))) r = 1
end subroutine test_multiple_expr_uses_inside_elemental
```
HLFIR expression bufferization introduces a temporary inside `hlfir.elemental` for an expression that is defined outside of `hlfir.elemental`. The temporary is created with `fir.alloca` inside `hlfir.elemental`, which is then transformed into a do-loop. Since there is no stacksave/stackrestore around the loop body, every iteration allocates a new temporary on the stack, and the stack may eventually overflow.
Before bufferization:
```
%23 = hlfir.elemental %22 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.logical<4>> {
^bb0(%arg2: index):
%32 = fir.box_elesize %1#1 : (!fir.box<!fir.array<?x!fir.char<1,?>>>) -> index
%33 = hlfir.designate %1#0 (%arg2) typeparams %32 : (!fir.box<!fir.array<?x!fir.char<1,?>>>, index, index) -> !fir.boxchar<1>
%34:2 = fir.unboxchar %33 : (!fir.boxchar<1>) -> (!fir.ref<!fir.char<1,?>>, index)
%35:3 = hlfir.associate %20 typeparams %17 {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>, i1)
%36 = fir.convert %34#0 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
%37 = fir.convert %35#1 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
%38 = fir.convert %32 : (index) -> i64
%39 = fir.convert %17 : (index) -> i64
%40 = fir.call @_FortranACharacterCompareScalar1(%36, %37, %38, %39) fastmath<contract> : (!fir.ref<i8>, !fir.ref<i8>, i64, i64) -> i32
%41 = arith.cmpi ne, %40, %c0_i32 : i32
hlfir.end_associate %35#1, %35#2 : !fir.ref<!fir.char<1,?>>, i1
%42 = fir.convert %41 : (i1) -> !fir.logical<4>
hlfir.yield_element %42 : !fir.logical<4>
}
```
After bufferization:
```
fir.do_loop %arg2 = %c1_1 to %23#1 step %c1_1 unordered {
%39 = fir.box_elesize %1#1 : (!fir.box<!fir.array<?x!fir.char<1,?>>>) -> index
%40 = hlfir.designate %1#0 (%arg2) typeparams %39 : (!fir.box<!fir.array<?x!fir.char<1,?>>>, index, index) -> !fir.boxchar<1>
%41:2 = fir.unboxchar %40 : (!fir.boxchar<1>) -> (!fir.ref<!fir.char<1,?>>, index)
%42 = fir.alloca !fir.char<1,?>(%17 : index) {bindc_name = ".tmp"}
%false = arith.constant false
%43:2 = hlfir.declare %42 typeparams %17 {uniq_name = ".tmp"} : (!fir.ref<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>)
hlfir.assign %19#0 to %43#0 temporary_lhs : !fir.boxchar<1>, !fir.boxchar<1>
%44 = fir.undefined tuple<!fir.boxchar<1>, i1>
%45 = fir.insert_value %44, %false, [1 : index] : (tuple<!fir.boxchar<1>, i1>, i1) -> tuple<!fir.boxchar<1>, i1>
%46 = fir.insert_value %45, %43#0, [0 : index] : (tuple<!fir.boxchar<1>, i1>, !fir.boxchar<1>) -> tuple<!fir.boxchar<1>, i1>
%47 = fir.convert %41#0 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
%48 = fir.convert %43#1 : (!fir.ref<!fir.char<1,?>>) -> !fir.ref<i8>
%49 = fir.convert %39 : (index) -> i64
%50 = fir.convert %17 : (index) -> i64
%51 = fir.call @_FortranACharacterCompareScalar1(%47, %48, %49, %50) fastmath<contract> : (!fir.ref<i8>, !fir.ref<i8>, i64, i64) -> i32
%52 = arith.cmpi ne, %51, %c0_i32 : i32
%53 = fir.convert %52 : (i1) -> !fir.logical<4>
%54 = hlfir.designate %26#0 (%arg2) : (!fir.box<!fir.array<?x!fir.logical<4>>>, index) -> !fir.ref<!fir.logical<4>>
hlfir.assign %53 to %54 temporary_lhs : !fir.logical<4>, !fir.ref<!fir.logical<4>>
}
```
We should probably analyze `hlfir.elemental` before cloning it into a do-loop: if its body contains any `fir.alloca` operations, then conservatively insert stacksave/stackrestore around the cloned body.
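A rough sketch of what the conservative approach might produce for the loop above. The spelling of the stack save/restore operations (`fir.call` to `@llvm.stacksave`/`@llvm.stackrestore` with a `!fir.ref<i8>` result) and the name `%sp` are assumptions for illustration; the actual operations emitted by flang may differ.
```
fir.do_loop %arg2 = %c1_1 to %23#1 step %c1_1 unordered {
  // Assumed spelling: save the stack pointer at the start of each iteration.
  %sp = fir.call @llvm.stacksave() : () -> !fir.ref<i8>
  // ... cloned hlfir.elemental body, unchanged, including:
  %42 = fir.alloca !fir.char<1,?>(%17 : index) {bindc_name = ".tmp"}
  // ... compare, convert, and store of the element result ...
  hlfir.assign %53 to %54 temporary_lhs : !fir.logical<4>, !fir.ref<!fir.logical<4>>
  // Assumed spelling: release everything allocated by this iteration.
  fir.call @llvm.stackrestore(%sp) : (!fir.ref<i8>) -> ()
}
```
With this shape the stack usage of the loop is bounded by a single iteration, at the cost of a save/restore pair per element.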
In this particular example it is possible to hoist `fir.alloca` out of the do-loop, but the inserted stacksave/stackrestore may complicate the hoisting. At the same time, both `fir.alloca` and the do-loop only appear during bufferization, and embedding the hoisting into bufferization does not feel right.
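For illustration only, the hoisted form for this example might look roughly like the following. It relies on `%17` and `%19#0` being defined outside of the loop, which holds here because the expression is defined outside of `hlfir.elemental`. This is a hand-written sketch, not output of any existing pass.
```
// Allocate and initialize the temporary once, before the loop.
%42 = fir.alloca !fir.char<1,?>(%17 : index) {bindc_name = ".tmp"}
%43:2 = hlfir.declare %42 typeparams %17 {uniq_name = ".tmp"} : (!fir.ref<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>)
hlfir.assign %19#0 to %43#0 temporary_lhs : !fir.boxchar<1>, !fir.boxchar<1>
fir.do_loop %arg2 = %c1_1 to %23#1 step %c1_1 unordered {
  // ... per-element comparison reads the temporary through %43#1;
  // no stack allocation remains inside the loop ...
}
```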
We could consider hoisting `hlfir.associate` and sinking `hlfir.end_associate` out of `hlfir.elemental` before bufferization. This could be done in an optimization pass and would not interfere with bufferization. In any case, it looks like the `hlfir.associate` codegen must be aware of where the temporaries are inserted and must insert stacksave/stackrestore when needed.
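A sketch of the IR before bufferization with `hlfir.associate` hoisted and `hlfir.end_associate` sunk, reusing the value names from the dump above. Placing `hlfir.end_associate` after the last use of `%23` is an assumption made here so that the association stays live wherever the elemental body ends up being evaluated.
```
%35:3 = hlfir.associate %20 typeparams %17 {uniq_name = "adapt.valuebyref"} : (!hlfir.expr<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>, i1)
%23 = hlfir.elemental %22 unordered : (!fir.shape<1>) -> !hlfir.expr<?x!fir.logical<4>> {
^bb0(%arg2: index):
  // ... per-element comparison uses %35#1; no hlfir.associate or
  // hlfir.end_associate remains in the body ...
  hlfir.yield_element %42 : !fir.logical<4>
}
// ... uses of %23 ...
hlfir.end_associate %35#1, %35#2 : !fir.ref<!fir.char<1,?>>, i1
```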
</pre>