.. _job_submission:

How do you submit a job?
========================

.. secauthor:: Thomas Alexander Gaël Donval

You should currently be on one of the two *front* (or login) nodes on
|Balena|, named either ``balena-01`` or ``balena-02``, as you can see on your
prompt or by typing:

.. code-block:: console

    $ hostname

These login nodes are only intended as entry points to the cluster: running
simulations or other long-running/demanding programs on them is **strictly
forbidden**. The correct way to leverage |Balena|'s power is by submitting
|Bash| *scripts* to the job scheduler.

Writing a submission script
---------------------------

These scripts are simple text files containing information about resource
allocation and program execution. A typical script looks like this:

.. code-block:: bash

    #!/usr/bin/env bash
    #SBATCH --job-name=test
    #SBATCH --partition=batch
    #SBATCH --account=free
    #SBATCH --time=06:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16

    module purge
    module load group ce-molsim stack
    module load taskfarmer

    cd ~/scratch
    cd Simulation
    mpirun -np 16 taskfarmer -f task_list -v -r

The first line, starting with ``#!``, tells the job scheduler what kind of
script is going to be provided (here a |Bash| script).

The lines starting with ``#SBATCH`` are resource requests, granted at the sole
discretion of the scheduler: the job name is set to *test*, the scheduler can
choose computing nodes out of a partition called *batch*, we are asking for
free nodes rather than paid-for priority nodes, the maximum execution time is
6 hours, we only want one node and we request all 16 |cores| of it.

The lines starting with ``module`` are used to make programs available on the
computing nodes as described in :ref:`sec_module_management`: in this case we
load ``taskfarmer``, but it could be any program you need in your simulations,
such as ``gnuplot``, ``music``, ``raspa`` or ``gromacs/2019.2``.

The rest of the lines constitute the script itself: instructions executed in
order, one after the other, with the requested resources. Such a script can be
submitted to the scheduler using commands described later in this document.

Submitting a job
----------------

Given a working submission script called :file:`test_job.sub`, available in
the directory :file:`~/scratch/simulations/` for instance, you can submit it
to the scheduler by invoking:

.. code-block:: console

    $ sbatch ~/scratch/simulations/test_job.sub

The scheduler should reply that the job was submitted and give you its job
number for future reference. You don't really need to store that number
anywhere as you can get it back at any time.

Probing the state of your jobs
------------------------------

To get the status of your jobs, type the following, where ``$USER`` expands to
your user name:

.. code-block:: console

    $ squeue -u $USER

The columns ``ST`` (state) and ``START_TIME`` are the most interesting ones.
State ``R`` means that the job is currently *running* and ``PD`` means that it
is *pending*, in which case you may want to look at the estimated starting
time to know when your calculation is scheduled to start.

For more details, have a look at ``squeue``'s manual page:

.. code-block:: console

    $ man squeue

You can search for the job state codes, for instance, by typing
:kbd:`/JOB STATE CODES` and pressing :kbd:`Enter`, then :kbd:`n` or
:kbd:`Shift+n` to jump to the next or previous match, respectively, until you
reach the right section.

Getting the status of your calculation
--------------------------------------

Most programs output data every so often so that you can follow the status of
the calculation. You can access that output by looking at the files named
after the job, :file:`{job_name}.out` and :file:`{job_name}.err`. Given the
example script provided at the beginning of this section, you should look for
:file:`test.out` and :file:`test.err`.

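
Before following the output, it can be handy to check whether the job has
written anything yet. The sketch below is only an illustration:
``check_job_output`` is a hypothetical helper name, and it assumes the output
files are named after the job (as with :file:`test.out` and :file:`test.err`
above) and sit in the directory you pass as second argument (the current
directory by default).

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper (not part of Slurm): report whether the output
    # files of a job exist and whether they contain any data yet.
    # Assumption: files are named <job_name>.out and <job_name>.err.
    check_job_output() {
        local job_name="$1" dir="${2:-.}" ext file lines
        for ext in out err; do
            file="$dir/$job_name.$ext"
            if [ ! -e "$file" ]; then
                echo "$file: not created yet"
            elif [ -s "$file" ]; then
                # Arithmetic expansion strips any padding added by wc
                lines=$(wc -l < "$file")
                echo "$file: $((lines)) line(s)"
            else
                echo "$file: empty"
            fi
        done
    }

For the example job, you would call ``check_job_output test`` from the
directory holding the files. An empty :file:`test.err` together with a growing
:file:`test.out` usually suggests that the calculation is progressing.
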
The two usual ways to *follow* the output of a calculation in a file are
either using ``less`` and pressing :kbd:`Shift-f` (use :kbd:`Ctrl-c` to get
back to normal mode and :kbd:`q` to quit), or using ``tail``, which prints the
last lines of a file (10 by default) but can be made to follow a file with the
``-f`` flag. Either way, the file is displayed in its current state and new
lines are shown as they are added. This is helpful for following outputs
during the course of a simulation.

Cancelling a job
----------------

If you want to cancel a running or pending calculation, you can get its ``id``
by using ``squeue`` as described above and then cancel it by typing the
following, where ``<job_id>`` stands for that ``id``:

.. code-block:: console

    $ scancel <job_id>

Once the job is cancelled, you can move or delete whatever job-related files
or directories you want: never change files in a folder where a calculation is
still running.

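
The submit-then-cancel cycle described in this section can be sketched with
two small shell functions. This is a minimal illustration, not a site-provided
tool: ``submit_job`` and ``cancel_job`` are hypothetical names. The sketch
relies on ``sbatch --parsable``, which prints only the job number (optionally
followed by ``;`` and a cluster name), and it only accepts plain numeric job
numbers, so job-array task ids are deliberately not handled.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helpers: defining them submits or cancels nothing.

    # Submit a script and print only the job number.
    submit_job() {
        local reply
        # --parsable prints "jobid" or "jobid;cluster"
        reply="$(sbatch --parsable "$1")" || return 1
        echo "${reply%%;*}"
    }

    # Cancel a job, refusing anything that is not a plain number.
    cancel_job() {
        local job_id="$1"
        case "$job_id" in
            ''|*[!0-9]*)
                echo "not a job number: $job_id" >&2
                return 1
                ;;
        esac
        scancel "$job_id"
    }

A possible use would be ``job_id="$(submit_job ~/scratch/simulations/test_job.sub)"``
followed later by ``cancel_job "$job_id"``. Keeping the number in a variable
saves a trip through ``squeue``, and the numeric check guards against calling
``scancel`` with an empty argument.
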