<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/77803>77803</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[flang] Async Execution Termination Issue with Zombie Process in EXECUTE_COMMAND_LINE
</td>
</tr>
<tr>
<th>Labels</th>
<td>
flang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
yi-wu-arm
</td>
</tr>
</table>
<pre>
### The Problem
Harvesting zombie processes (finished child processes created by `EXECUTE_COMMAND_LINE` in async mode) is an issue. The current implementation calls `signal(SIGCHLD, SIG_IGN)` to ignore the signal sent by the child process created by `fork()`, which prevents the child from becoming a zombie.
https://man7.org/linux/man-pages/man2/sigaction.2.html
```
POSIX.1-1990 disallowed setting the action for SIGCHLD to
SIG_IGN. POSIX.1-2001 and later allow this possibility, so that
ignoring SIGCHLD can be used to prevent the creation of zombies
(see [wait(2)](https://man7.org/linux/man-pages/man2/wait.2.html)).
```
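This causes a problem: once one async `EXECUTE_COMMAND_LINE` has been called, every later `EXECUTE_COMMAND_LINE` gets a command status of -1 (the return value of `std::system()`, in both async and sync mode). With `SIGCHLD` ignored, terminated children are reaped automatically, so the `waitpid()` inside `std::system()` finds no child to wait for and fails with `ECHILD`.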
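The mechanism can be observed in isolation. The following is a minimal sketch, assuming Linux with glibc; the helper name `run_with_sigchld` is made up for illustration and is not part of the flang runtime. It runs a command through `std::system()` under a chosen `SIGCHLD` disposition and reports the return value together with the `errno` seen right after the call.

```cpp
#include <cerrno>
#include <cstdlib>
#include <signal.h>
#include <utility>

// Hypothetical helper (not flang code): run `cmd` while SIGCHLD is set to
// `disposition`, then restore the previous disposition.
// Returns {return value of std::system, errno observed right after the call}.
std::pair<int, int> run_with_sigchld(const char *cmd,
                                     void (*disposition)(int)) {
  void (*previous)(int) = signal(SIGCHLD, disposition);
  errno = 0;
  int rc = std::system(cmd);
  int err = errno; // only meaningful when rc == -1
  signal(SIGCHLD, previous);
  return {rc, err};
}
```

Calling `run_with_sigchld("exit 3", SIG_IGN)` is expected to return -1 with `ECHILD`, while the same call with `SIG_DFL` should return a normal wait status that can be decoded with `WEXITSTATUS`.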
### Simplified Implementation and Reproducer
Simplified current implementation of `EXECUTE_COMMAND_LINE`:
```cpp
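// Ignore SIGCHLD so that the finished child is not left as a zombie.
signal(SIGCHLD, SIG_IGN);
pid_t pid{fork()};
if (pid < 0) {
  // fork failed: report the error
} else if (pid == 0) {
  // Child: run the command and exit with its status.
  int val = std::system(cmd);
  exit(val);
}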
```
Reproducer in C++ (aarch64, Linux, clang++):
```cpp
#include <unistd.h>
#include <signal.h>
#include <cstdlib>
#include <iostream>

void execute_command_line(const char *cmd, bool isWait, int &exitstatus) {
  if (isWait) {
    exitstatus = std::system(cmd);
  } else {
    // Ignore SIGCHLD so the async child does not become a zombie.
    signal(SIGCHLD, SIG_IGN);
    pid_t pid = fork();
    if (pid < 0) {
      exitstatus = -999;
    } else if (pid == 0) {
      int status = std::system(cmd);
      exitstatus = status; // only changes the child's copy
      exit(status);
    }
  }
}

int main() {
  int exitstatus = 404;
  execute_command_line("InvalidCommand", false, exitstatus);
  std::cout << "exitstatus async: " << exitstatus << std::endl;
  execute_command_line("InvalidCommand", true, exitstatus);
  std::cout << "exitstatus sync: " << exitstatus << std::endl;
  return 0;
}
```
Console output:
```
exitstatus async: 404
sh: 1: InvalidCommand: not found
exitstatus sync: -1
sh: 1: InvalidCommand: not found
```
Console output when `signal(SIGCHLD, SIG_IGN);` is commented out:
```
exitstatus async: 404
sh: 1: InvalidCommand: not found
exitstatus sync: 32512
sh: 1: InvalidCommand: not found
```
*Note: this is not a very good representation, because the child's `exitstatus` is never passed back to the parent. The async call therefore still prints the initial value, 404.
### Buggy Solution
One way to solve this is to reset the signal disposition in the child process using `signal(SIGCHLD, SIG_DFL)`, but it didn't work. (?)
A similar approach is to install a `SIGCHLD` handler that calls a non-blocking `waitpid`. Collecting the child's exit code this way clears up the zombie.
```cpp
// Reap a terminated child so it does not linger as a zombie.
void sigchld_handler(int signo) {
  int status;
  waitpid(-1, &status, WNOHANG); // -1 means any child, WNOHANG means non-blocking
}
// ...
signal(SIGCHLD, sigchld_handler);
pid_t pid{fork()};
if (pid < 0) {
  // fork failed: report the error
} else if (pid == 0) {
  std::system(cmd);
}
```
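A somewhat more defensive variant of the handler approach is sketched below, assuming a POSIX system. The names `reap_children`, `install_reaper` and `spawn_async` are hypothetical and do not come from the flang runtime. It differs from the snippet above in three ways: it loops over `waitpid` because several child exits can be delivered as a single `SIGCHLD`, it preserves `errno` inside the handler, and it uses `sigaction` with `SA_RESTART` so interrupted system calls are restarted. It still installs a process-wide handler, so it does not avoid the concern about global state.

```cpp
#include <cerrno>
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Reap every child that has already terminated, without blocking.
// errno is saved and restored because the handler may interrupt code that
// is about to inspect it.
extern "C" void reap_children(int /*signo*/) {
  int saved_errno = errno;
  while (waitpid(-1, nullptr, WNOHANG) > 0) {
  }
  errno = saved_errno;
}

// Hypothetical helper: install the handler; returns 0 on success.
int install_reaper() {
  struct sigaction sa {};
  sa.sa_handler = reap_children;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  return sigaction(SIGCHLD, &sa, nullptr);
}

// Hypothetical helper: run `cmd` in a forked child without waiting for it.
// Returns the child's pid in the parent, or -1 if fork failed.
pid_t spawn_async(const char *cmd) {
  pid_t pid = fork();
  if (pid == 0) {
    int status = std::system(cmd);
    _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 127);
  }
  return pid;
}
```

A caller would invoke `install_reaper()` once and then `spawn_async("some command")` for each async command. Because `std::system()` blocks `SIGCHLD` while it waits for its own child, a later synchronous call should still be able to collect its exit status.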
### The Risks
However, both approaches change the `SIGCHLD` disposition, which has potential side effects. In an offline discussion, @tblah and others pointed out: "The problem is that the signal handler(s) are global state for the process, not just for the fortran runtime library. So if some fortran code calls `EXECUTE_COMMAND_LINE` and then afterwards runs some C code of its own which expects the default handler for sigchld, it will be surprised by the signal handler installed by `EXECUTE_COMMAND_LINE`."
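The global nature of the disposition can be illustrated with a small sketch, assuming a POSIX system. All names here are hypothetical: `library_routine_that_ignores_sigchld` merely stands in for a runtime routine with this side effect, and `SigchldGuard` is one possible way a caller could save and restore the disposition, not something the issue proposes.

```cpp
#include <signal.h>

// Stand-in for a runtime routine that changes the SIGCHLD disposition as a
// side effect (hypothetical name, not the flang runtime entry point).
void library_routine_that_ignores_sigchld() {
  signal(SIGCHLD, SIG_IGN);
}

// Returns true if SIGCHLD currently has its default disposition.
bool sigchld_is_default() {
  struct sigaction current {};
  sigaction(SIGCHLD, nullptr, &current);
  return current.sa_handler == SIG_DFL;
}

// RAII guard: saves the SIGCHLD disposition on construction and restores
// it on destruction.
class SigchldGuard {
public:
  SigchldGuard() { sigaction(SIGCHLD, nullptr, &saved_); }
  ~SigchldGuard() { sigaction(SIGCHLD, &saved_, nullptr); }
  SigchldGuard(const SigchldGuard &) = delete;
  SigchldGuard &operator=(const SigchldGuard &) = delete;

private:
  struct sigaction saved_ {};
};
```

After `library_routine_that_ignores_sigchld()` returns, unrelated code in the same process sees `sigchld_is_default() == false`. Wrapping the call in a `SigchldGuard` scope restores the old disposition, but if an async child is still running when the guard is destroyed, that child can become a zombie again, so the guard alone does not solve the problem.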
### References
https://github.com/llvm/llvm-project/pull/74077
gfortran docs: https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gfortran/EXECUTE_005fCOMMAND_005fLINE.html
Fortran standard: 16.9.73
classic-flang implementation: https://github.com/flang-compiler/flang/blob/2693fbaff2a67ccb64d8e6800e8d63cfc8180a4f/runtime/flang/rdst.c#L2802
- Note: classic-flang does not deal with zombie processes.
llvm-test-suite:
- https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/execute_command_line_1.f90
- Note: this test calls an unimplemented GNU extension, `call sleep()`.
- https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/execute_command_line_2.f90
- https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/execute_command_line_3.f90
- Note: the standard requires that, if no `command status` is present and it would be assigned a non-zero value, execution terminates immediately. gfortran does not do this when executing async, so the test runs to completion there, whereas flang-new would terminate.
</pre>