Sending signals to Aspherix®
Aspherix® employs so-called signal handlers to catch signals sent through inter-process communication. A classic example of a signal is SIGINT, which is typically sent to a program by pressing CTRL+C.
Aspherix® catches several different signals. What follows is a list of the different signals and their effects.
SIGINT: writes a restart file and terminates Aspherix® after the current time step is completed. If it is sent a second time before the current time step is completed, then Aspherix® is stopped immediately without writing a restart file.
SIGTERM: same as SIGINT
SIGUSR1: writes a restart file when the next time step is completed; the run continues
SIGTSTP: halts, but does not terminate, Aspherix® after the current time step is completed
SIGCONT: continues a simulation that was halted with SIGTSTP
By default, the restart files are written to the current working directory. Their file name is aspherix_auto_restart_%.basx, where % is either base or the ID of the processor that wrote that part of the restart file.
If you wish to restart a simulation from this file you can either use the read_restart command to do a classical restart or use the -rs command line option in Aspherix®. The latter will only work if your input script contains only one simulate command (always true for GUI cases) and will continue the simulation from where it was interrupted.
To send a signal to Aspherix®, you can use a command line tool like htop, where the F9 key lets you select the specific signal, or the kill command:
$ kill -SIGNAL ${PID}
where SIGNAL is any of the signals listed above and ${PID} is the process id of the Aspherix® command. An example to halt and continue Aspherix® at PID 1234 is:
$ kill -SIGTSTP 1234
$ kill -SIGCONT 1234
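The kill calls above can be wrapped in a small helper that rejects any signal not listed on this page. This is only a sketch: send_aspherix_signal is a hypothetical name and not part of Aspherix®, and it uses the POSIX form kill -s NAME, which is equivalent to the kill -SIGNAL form shown above.

```shell
# Send one of the signals listed above to a process.
# Usage: send_aspherix_signal SIGNAL PID
# (hypothetical helper, not shipped with Aspherix)
send_aspherix_signal() {
    sig="${1#SIG}"   # accept both "SIGUSR1" and "USR1"
    pid="$2"
    case "$sig" in
        INT|TERM|USR1|TSTP|CONT) ;;
        *)
            echo "signal not in the list handled by Aspherix: $1" >&2
            return 2
            ;;
    esac
    # POSIX form of "kill -SIGNAL PID"
    kill -s "$sig" "$pid"
}
```

For example, send_aspherix_signal SIGUSR1 1234 would request a restart file from the process with PID 1234 without stopping the run.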
Getting the PID from the logfile
The PID of the first Aspherix process is written to the log file for your convenience. In your log_aspherix.txt file you should find a line similar to the following:
INFO: PID for signal handling is: 1234
which you can then use for the kill command or the filename described below.
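If you want to script this lookup, the PID can be extracted from the log line with standard tools. The following is a minimal sketch that assumes the log line has exactly the form shown above; aspherix_pid_from_log is a hypothetical helper name.

```shell
# Print the PID found in an Aspherix log file.
# Usage: aspherix_pid_from_log [LOGFILE]
# (hypothetical helper; assumes the log line format shown above)
aspherix_pid_from_log() {
    log_file="${1:-log_aspherix.txt}"
    # Keep only the digits that follow the marker text and
    # print the first match in case the line appears more than once.
    sed -n 's/^.*PID for signal handling is:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' \
        "$log_file" | head -n 1
}
```

The result can be passed straight to kill, for example kill -SIGUSR1 "$(aspherix_pid_from_log)".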
Sending signals on Windows or large clusters
On Windows, such fine-grained signal handling is not possible, and on large compute clusters you might not have direct access to the compute node.
To circumvent these issues Aspherix also reacts to special files in the working directory. Two different files can be used:
_KILL_PID_${PID}_.asx to write a restart file and terminate Aspherix after the current time step
_RESTART_PID_${PID}_.asx to write a restart file; the run continues
Note that ${PID} must be replaced with the PID of any Aspherix process or with that of the parent MPI process.
The files are removed automatically once Aspherix has detected them. Note that Aspherix checks for these files at most once per time step, and only if at least 10 seconds have passed since the last check, so in large simulations it might not react immediately.
This feature can be fully disabled by adding the command disable_file_check to your input script.
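On systems with a POSIX shell, creating a trigger file and waiting for Aspherix to pick it up could be scripted as in the sketch below. It relies only on the file names and the automatic removal described above; request_aspherix_action, its arguments and the timeout handling are hypothetical and not part of Aspherix.

```shell
# Create a trigger file and wait until it has been removed.
# Usage: request_aspherix_action ACTION PID [DIR] [TIMEOUT_SECONDS]
#   ACTION  KILL or RESTART
#   DIR     working directory of the simulation (default: .)
# Returns 0 once the file is gone, 1 on timeout, 2 on a bad ACTION.
# (hypothetical helper, not shipped with Aspherix)
request_aspherix_action() {
    action="$1"
    pid="$2"
    dir="${3:-.}"
    timeout="${4:-60}"
    case "$action" in
        KILL|RESTART) ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    trigger="$dir/_${action}_PID_${pid}_.asx"
    # An empty file is used here, assuming only its existence matters.
    : > "$trigger" || return 1
    waited=0
    while [ -e "$trigger" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, request_aspherix_action RESTART 1234 . 120 would ask the run with PID 1234 for a restart file and wait up to two minutes for the trigger file to disappear. Because the check happens at most every 10 seconds and only at the end of a time step, the timeout should be chosen generously.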