7. Signal Handling
Simframe can be controlled from outside its own process. This can be used to trigger the writing of dump and/or data files of a running simulation or to stop the simulation entirely. Typical use cases are inspecting the current state of a simulation before the next snapshot is written, or writing dump files and terminating a simulation before the time limit of a SLURM job is reached.
In this notebook we set up a simple mock simulation without any integration instructions that simply advances the integration variable.
[1]:
import numpy as np
from simframe import Frame
from simframe import Integrator
from simframe import writers
[2]:
sim = Frame()
[3]:
sim.writer = writers.hdf5writer(datadir="7_data", overwrite=True)
[4]:
sim.addintegrationvariable("x", 0.)
sim.x.updater = lambda sim: 0.15
sim.x.snapshots = np.arange(0., 6., 1.)
[5]:
sim.integrator = Integrator(sim.x)
[6]:
sim.run()
Writing file 7_data/data0000.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0001.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0002.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0003.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0004.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0005.hdf5
Writing dump file 7_data/frame.dmp
Execution time: 0:00:00
File Signals
Dump Files
Now we want to trigger the writing of a dump file immediately, before the next one would be scheduled by the simulation. This can be done by simply creating a file called DUMP (capital letters for case-sensitive operating systems) in the data directory of the simulation.
In this example we create a systole that touches this file as soon as \(\large x = \pi\) is reached, to simulate the creation of the file by the user. We add a flag to the simulation frame to make sure the file is only created once.
[7]:
sim.writeflag = True
sim.writefile = "DUMP"
[8]:
def systole(sim):
    if sim.writeflag and sim.x > np.pi:
        file = sim.writer.datadir / sim.writefile
        file.touch()
        sim.writeflag = False
sim.updater.systole = systole
We reset the integration variable and run the simulation again.
[9]:
sim.x = 0.
[10]:
sim.run()
Writing file 7_data/data0000.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0001.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0002.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0003.hdf5
Writing dump file 7_data/frame.dmp
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0004.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0005.hdf5
Writing dump file 7_data/frame.dmp
Execution time: 0:00:00
As can be seen, two consecutive dump files are written after output 0003, because we triggered the event at \(\large x = \pi\).
Note: This only works if the Frame has a writer with a data directory assigned, since Simframe needs to know where to look for the signal files and how to write dump or output files. After the event has been triggered, the triggering file is automatically deleted.
[11]:
(sim.writer.datadir / sim.writefile).is_file()
[11]:
False
Output Files
In the next example we want Simframe to write unscheduled output and dump files. This can be achieved by creating the file WRITE in the data directory of the writer.
To demonstrate this, we change the file name, activate the flag, and reset the integration variable.
[12]:
sim.writeflag = True
sim.writefile = "WRITE"
sim.x = 0.
Now we can run the simulation again.
[13]:
sim.run()
Writing file 7_data/data0000.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0001.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0002.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0003.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/__OUTPUT__
Writing dump file 7_data/frame.dmp
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0004.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0005.hdf5
Writing dump file 7_data/frame.dmp
Execution time: 0:00:00
As can be seen, after output 0003 an unscheduled output file __OUTPUT__ has been written to the data directory. Afterwards two dump files were created. The first one was created by the writer after writing the output file __OUTPUT__, since the writing of dump files at outputs was activated (sim.writer.dumping = True). The second one was triggered by the WRITE file to make sure that a dump file is always written by this event, even if the writer has dumping deactivated.
Note: The __OUTPUT__ file has the same file format as defined by the writer and will always be forcefully overwritten in this event, even if sim.writer.overwrite = False.
Stopping the Simulation
In this example, we want to trigger the writing of data and dump files and stop the simulation. This can be achieved by creating the file STOP in the data directory of the writer.
To demonstrate this, we change the file name, activate the flag, and reset the integration variable.
[14]:
sim.writeflag = True
sim.writefile = "STOP"
sim.x = 0.
Now we can run the simulation again. However, since this will trigger a SystemExit exception we need to catch it to keep this example notebook going.
[15]:
try:
    sim.run()
except SystemExit:
    print("-----> SystemExit")
Writing file 7_data/data0000.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0001.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0002.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/data0003.hdf5
Writing dump file 7_data/frame.dmp
Writing file 7_data/__OUTPUT__
Writing dump file 7_data/frame.dmp
Writing dump file 7_data/frame.dmp
-----> SystemExit
As can be seen, after output 0003 the additional output file __OUTPUT__ was written and the dump file was written twice, as was the case for the WRITE file previously. Then the SystemExit exception was raised and the simulation stopped.
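In practice the signal file is created from outside the running simulation, for example with `touch` on the command line or from a second Python session. The following is a minimal sketch of such a helper; the name `send_file_signal` is hypothetical and not part of Simframe, and it relies only on the three file names DUMP, WRITE, and STOP described above.

```python
from pathlib import Path

# File names that Simframe looks for in the writer's data directory,
# as described in this section.
FILE_SIGNALS = ("DUMP", "WRITE", "STOP")


def send_file_signal(datadir, name):
    """Create an empty signal file in `datadir` and return its path.

    Hypothetical helper for illustration, not part of Simframe.
    """
    name = name.upper()
    if name not in FILE_SIGNALS:
        raise ValueError(f"Unknown file signal: {name!r}")
    datadir = Path(datadir)
    if not datadir.is_dir():
        raise FileNotFoundError(f"Data directory does not exist: {datadir}")
    path = datadir / name
    path.touch()  # the file content is irrelevant, only its existence matters
    return path
```

While a simulation is running with its writer pointing to 7_data, calling `send_file_signal("7_data", "WRITE")` from another process would request an unscheduled output in the same way as the systole used in this notebook.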
System Signals
In addition to file signals in the simulation’s data directory, system signals sent to the simulation’s process can trigger the writing of data and dump files and stop the simulation. By default Simframe is listening for the abort signal SIGABRT (6).
[16]:
import signal
[17]:
try:
    signal.raise_signal(signal.SIGABRT)
except SystemExit:
    print("-----> SystemExit")
Signal detected: SIGABRT (6)
Writing file 7_data/__OUTPUT__
Writing dump file 7_data/frame.dmp
Writing dump file 7_data/frame.dmp
-----> SystemExit
The SIGABRT signal triggered the writing of dump and output files and exited the simulation with a SystemExit exception. This even worked without the simulation running, since signal.raise_signal() signals its own process. A signal can also be sent to another process if its pid is known. This can be done with the os module:
import signal
import os
pid = 12345
os.kill(pid, signal.SIGABRT)
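To illustrate signalling a separate process, the sketch below starts a child Python process, waits until it is ready, and sends it SIGABRT with os.kill. The child is only a stand-in for a running simulation: it installs its own handler with signal.signal and exits cleanly. This is an assumption made for illustration and does not reproduce Simframe's actual signal handler. The sketch assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a long-running simulation (not Simframe itself).
CHILD = r"""
import signal, sys, time

def handler(signum, frame):
    # Report the signal and turn it into a clean exit via SystemExit.
    print(f"Signal detected: {signal.Signals(signum).name} ({signum})", flush=True)
    sys.exit(0)

signal.signal(signal.SIGABRT, handler)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def signal_child():
    """Start the child, send it SIGABRT, and return (returncode, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the child has installed its handler before signalling it.
    ready = proc.stdout.readline().strip()
    if ready != "ready":
        proc.kill()
        raise RuntimeError("Child process did not start correctly")
    os.kill(proc.pid, signal.SIGABRT)
    output, _ = proc.communicate(timeout=10)
    return proc.returncode, output.strip()
```

Calling `signal_child()` returns the exit code of the child and the line it printed when the signal arrived. Waiting for the "ready" line matters: a SIGABRT delivered before a handler is installed would terminate the process with the default action instead.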
SLURM Jobs
This functionality of Simframe can be used to trigger the writing of dump and output files just before a SLURM job would hit its time limit. This is especially useful to restart a simulation right from the last state and not from the last scheduled snapshot, which may have been written minutes or hours earlier.
A typical SLURM script could look as follows:
#!/bin/bash
#SBATCH --job-name=my_job
# .
# .
# .
#SBATCH --time=24:00:00
#SBATCH --signal=B:SIGABRT@300
# .
# .
# .
exec python -u start.py
This job will run for a maximum of 24 hours. 300 seconds before the time limit is reached, the SIGABRT signal is sent to the process, which triggers Simframe to write output and dump files and terminate the simulation. Note that the signal may not be sent exactly at the requested time and that, depending on the size of the simulation, the writing of the files can take some time. It is therefore recommended to add some safety margin.
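As a small worked example of the timing, the following sketch computes the job runtime at which the signal is nominally sent, given the --time limit and the lead time from --signal. The function names are hypothetical, and only the HH:MM:SS form of the time limit is handled. According to the sbatch documentation, SLURM may send the signal up to 60 seconds earlier than specified, so the result is only a nominal value.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse a SLURM-style 'HH:MM:SS' time limit into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def signal_runtime(time_limit, lead_seconds):
    """Return the job runtime at which the signal is nominally sent."""
    return parse_hms(time_limit) - timedelta(seconds=lead_seconds)


# Values from the job script above: --time=24:00:00 and @300
print(signal_runtime("24:00:00", 300))  # 23:55:00
```

If writing the output and dump files is expected to take longer than the remaining five minutes, the value after the @ in the --signal option should be increased accordingly.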
Note: SLURM is signalling the main batch script of the job. Therefore, the script with the Simframe simulation has to be invoked with exec to receive the signal.
If Simframe catches the SIGABRT signal and detects a SLURM job by checking for the environment variable $SLURM_JOB_ID, it will automatically requeue the job by running the shell command scontrol requeue incomplete $SLURM_JOB_ID. Therefore, the Simframe simulation should be written in such a way that it automatically restarts from a dump file, if one is present. For example:
import numpy as np
from simframe import Frame
from simframe import writers
from simframe.io import readdump

sim = Frame()
sim.writer = writers.hdf5writer(overwrite=True)

#
# Your setup here
#

if __name__=="__main__":

    # Checking for existing dump file
    dumpfile = sim.writer.datadir / "frame.dmp"
    if dumpfile.is_file():
        print("Loading from dump")
        sim = readdump(dumpfile)

    # Start/resume simulation
    print("Running simulation")
    sim.run()
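The SLURM detection described above can be pictured with the following sketch. It is not Simframe's actual implementation; the function name `requeue_command` is hypothetical, and the sketch only builds the command from the text above without executing it.

```python
import os


def requeue_command(environ=None):
    """Return the requeue command as a list, or None outside of a SLURM job.

    Hypothetical illustration, not taken from the Simframe source.
    """
    environ = os.environ if environ is None else environ
    job_id = environ.get("SLURM_JOB_ID")
    if not job_id:
        # Not running inside a SLURM job: nothing to requeue.
        return None
    return ["scontrol", "requeue", "incomplete", job_id]
```

Passing a dictionary instead of the real environment makes the logic easy to check, for example `requeue_command({"SLURM_JOB_ID": "4242"})`. The resulting list could be handed to `subprocess.run` on a system where scontrol is available.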
To turn off the automatic requeuing we need to remove the requeue action from the stop signal event.
[18]:
from simframe.utils.signalhandler import events
[19]:
events.STOPSIGNALEVENT.actions
[19]:
[<simframe.utils.signalhandler.actions.writeaction.WriteAction at 0x7a21bd9dfb60>,
<simframe.utils.signalhandler.actions.dumpaction.DumpAction at 0x7a21bd9dfcb0>,
<simframe.utils.signalhandler.actions.slurmrequeueaction.SlurmRequeueAction at 0x7a21bd9df770>,
<simframe.utils.signalhandler.actions.stopaction.StopAction at 0x7a21bd9df8c0>]
[20]:
del events.STOPSIGNALEVENT.actions[2]
[21]:
events.STOPSIGNALEVENT.actions
[21]:
[<simframe.utils.signalhandler.actions.writeaction.WriteAction at 0x7a21bd9dfb60>,
<simframe.utils.signalhandler.actions.dumpaction.DumpAction at 0x7a21bd9dfcb0>,
<simframe.utils.signalhandler.actions.stopaction.StopAction at 0x7a21bd9df8c0>]
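Deleting by index assumes that the requeue action sits at position 2. A more defensive variant filters by class name instead. The sketch below uses stand-in classes so that it runs without Simframe; the helper `remove_actions` is hypothetical, and it assumes that the actions container behaves like a plain Python list, as the output above suggests.

```python
def remove_actions(actions, class_name):
    """Remove all entries whose class is named `class_name`, in place.

    Returns the number of removed entries.
    """
    kept = [a for a in actions if type(a).__name__ != class_name]
    removed = len(actions) - len(kept)
    actions[:] = kept  # modify the existing list instead of rebinding it
    return removed


# Stand-in classes mirroring the class names shown in the output above.
class WriteAction: ...
class DumpAction: ...
class SlurmRequeueAction: ...
class StopAction: ...


actions = [WriteAction(), DumpAction(), SlurmRequeueAction(), StopAction()]
remove_actions(actions, "SlurmRequeueAction")
print([type(a).__name__ for a in actions])
# ['WriteAction', 'DumpAction', 'StopAction']
```

Applied to the real event, this would correspond to `remove_actions(events.STOPSIGNALEVENT.actions, "SlurmRequeueAction")`, which leaves the list unchanged if the action has already been removed.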