.. _tnt_in_gromacs:

******************************************
Calculating |TNT| solvation with |Gromacs|
******************************************

.. secauthor:: Carmelo Herdes <c.e.herdes.moreno@bath.ac.uk>
               Claire Hobday <c.l.hobday@bath.ac.uk>
               Gaël Donval <g.donval@bath.ac.uk>

.. |MarvinSketch| replace:: :name:`MarvinSketch`
.. |MKTOP| replace:: :name:`MkTop`


.. .. This tutorial makes use of free access software, specifically: 
.. .. |MarvinSketch| under *End User License Agreement*, |MKTOP|,  
.. .. |Gromacs| and |Vmd| under *GNU Lesser General Public License*. 
.. .. Access to |Balena| should be requested to and handled by
.. .. unidesk.campus.bath.ac.uk

In this tutorial you will build on your molecular dynamics simulations skills
and use |Gromacs| in conjuction with other computational tools to build up a
simulation cell from scratch to investigate the effects of temperature on the
solvation of |chemTNT| (|TNT|) in water.

.. |chemTNT| replace:: :math:`2,4,6\mathrm{-trinitrotoluene}`

.. admonition:: Objectives
   :class: hint

   #. Enhancing practical understanding of molecular dynamics simulations.
   #. Learning how to use |MarvinSketch| to build 3-D models of molecules for |MD|
      input.
   #. Learning how to use |MKTOP| to create a topology and force field input for
      |MD|.
   #. Learning how to run |MD| simulations using the program |Gromacs|
      on the university's high performance computing cluster, |Balena|.
   #. Learning how to visualize results using |Vmd|.
   #. Using |MD| to work in different statistical ensembles, e.g. 
      :smallcaps:`nvt` and :smallcaps:`npt` simulations.


**Before you start:**

#. Review the slides on |MD| simulations.
#. Be sure you can :ref:`access <connect_to_Balena>` |Balena|.


Background
----------

Nowadays complicated problems are often simulated using computers for three
basic reasons: to minimize human error, to avoid time constraints and to save
money. Different programs are used everyday and everywhere depending on the
field, application, and desired simulation, ranging from the quantum level up
to the cosmological scale (e.g.  the |Illustris|_ project is a large
cosmological simulation of galaxy formation).

.. |Illustris| replace:: :name:`Illustris`
.. _Illustris: https://www.youtube.com/watch?v=NjSFR40SY58

Over the course of this tutorial, you will be introduced to some of the (basic)
features of |MarvinSketch|, |MKTOP|, |Gromacs| and |Vmd| to build up a
simulation cell from scratch and to investigate the effects of temperature 
on the solvation of |TNT| in water.  These off-the-shelf programs are very 
powerful common tools in the field of molecular simulation.

.. .. |RoEtAl| replace:: :name:`Ro` *et al.* 
   .. _RoEtAl: https://doi.org/10.1021/je950322w 

   |RoEtAl|_ studied |TNT| in 1996. They were constantly mentioning experimental
   difficulties and concerns:
  
   * "|chemTNT| (|TNT|) is one of many explosives contaminating soils in munitions
    facilities and military bases. It is toxic to humans, causing liver damage
    and anemia, inhibits growth of many fungi, yeasts, actinomycetes, and
    gram-positive bacteria, and exhibits mutagenicity in the Ames screening
    test..."
  
   * "Large chunks of neat (pure) |TNT| materials pose a detonation hazard..."
  
   * "|TNT| crystals in the soil matrix serve as a continued contamination source
    for the munitions facilities and military bases even long after activities,
    which led to soil contamination cease.  Knowledge of environmental fate and
    mass transport characteristics of |TNT| in the subsurface environment is
    essential for risk assessment..."
  
   These points define areas where simulation typically shines: 
  
   * Modelled |TNT| molecules won't cause any liver damage;
   * A molecular model that "explodes" wouldn't do any sort of damage;
   * You are going to use one of the *best* laboratories in the world in terms of
    control of experimental conditions: |Balena|, a computer cluster is at your
    service. In |Balena|, your |TNT| and water model molecules are as pure as one
    can ask for!

Let’s start!

Drawing your |TNT| molecule
---------------------------

|MarvinSketch| is an advanced chemical editor for drawing chemical structures,
queries and reactions.  It allows you to quickly draw molecules through basic
functions on the screen and advanced functionalities such as sprout drawing,
customisable shortcuts, abbreviated groups, default and user-defined templates
and context-sensitive popup menus.

It should be available on the university computers.  To install it on own your
computer, go to the |ChemAxon|_ website.  Click on download and select the
relevant version for you (note that a portable version is provided under the
*Platform Independent* section).

.. |ChemAxon| replace:: :name:`ChemAxon`
.. _ChemAxon: https://chemaxon.com/products/marvin


Once installed/copied over, open |MarvinSketch| and familiarise yourself with
the canvas window (see :numref:`marvin_canvas`); hover over the various items
and toolbars to see what you can do!

.. _marvin_canvas:

.. figure:: _img/TNT2.*
    :width: 70%
   
    |MarvinSketch|'s canvas is where you can draw molecules.

Once you are familiar with the interface, you can start building your target
molecule, |TNT|, following its representations in :numref:`marvin_draw_TNT`. 

.. _marvin_draw_TNT:

.. figure:: _img/TNT3.*
    :width: 70%
   
    |TNT| chemical structure and 3D atomistic model. 
    `Source <https://en.wikipedia.org/wiki/TNT>`_.

There are actually many ways of drawing that molecule in |MarvinSketch|.  None
of them are intrinsically better but we suggest that you start by using building
elements that are the closest possible to the final structure you want to
represent. In this case, we could start by inserting a benzene ring by clicking
on the benzene ring symbol at the left of the bottom toolbar. Then place it on
the canvas by clicking anywhere and press :kbd:`Esc` (to avoid placing any more
of them next time you click on the canvas).

Then create new methyl groups (:math:`\mathrm{CH}_3`) by selecting and using
the *Single bond* tool (top of the left toolbar) until you get a structure
similar to what is depicted in :numref:`marvin_draw_CH3`. Once finished,
remember to press :kbd:`Esc` to unselect the current tool.

.. _marvin_draw_CH3:

.. figure:: _img/TNT5.*
   :width: 70%
   
   Benzene ring with four extra methyl groups ready to be edited.
   Note that, by default, hydrogen atoms are not displayed: each
   dangling bond is actually a :math:`\mathrm{CH}_3` in disguise!

But as you can see back in :numref:`marvin_draw_TNT`, only one group should
actually be a :math:`\mathrm{CH}_3`. The rest should be made of nitro groups
(:math:`\mathrm{NO}_2`) instead! Fortunately, |MarvinSketch| allows you to
change the chemical nature of any atom you want. Have a look at the toolbar at
the right of your screen, click on ``N``. Now hover your pointer over the
carbon atoms you would like to replace by nitrogen and click to make the
substitution. Repeat until you get the result shown in :numref:`marvin_N_subst`.

.. _marvin_N_subst:

.. figure:: _img/TNT7.*
   :width: 70%
   
   The result of the substitution of the methyl groups by nitrogen.

Now it will be easier if we could see all the hydrogen atoms that were kept
hidden until now. To do so, click :menuselection:`Structure --> Add --> H
Explicit Hydrogens`. Then perform the required substitution of hydrogens to
oxygens to transform the amino groups (:math:`\mathrm{NH}_2`) to nitro groups.

Finally, you must convert the single bonds between the oxygens and nitrogens
into double bonds. To do so, click on the small arrow at the right of the
*Single bond* tool (top of the left toolbar), select the *Double bond* tool
and click on the single bonds you need to convert.

The final structure is shown in :numref:`marvin_TNT_final2D`. Check carefully
that you obtained the same result.

.. _marvin_TNT_final2D:

.. figure:: _img/TNT9.*
   :width: 70%
   
   TNT 2D model.

The very last step before saving that structure is to make it 3D!  Select
:menuselection:`Structure --> Clean 3D` and you should get a 3D structure that
you must save (:menuselection:`File --> Save`) as a protein data bank
(:file:`.pdb`) file: :file:`TNT.pdb`. We suggest that you try opening this
structure in |Vmd| to really see that the structure you generated seems proper.

Congratulations! You have just created your first structure from scratch.
You are now ready to set up your |MD| simulation.


Converting |TNT| into the |Gromacs| format
------------------------------------------

First we need to convert the structure file :file:`TNT.pdb` into a structure
file readable by |Gromacs|: :file:`TNT.gro`. As before, we are providing you
with a few files to help you setting your working environment up.
Download, copy and extract them into a relevant directory (e.g.
:file:`~/scratch/tutorials/gromacs`) from this link: :dl:`TNT.tar.xz <_data/TNT.tar.xz>`.

.. code-block:: console

    $ tar -xf TNT.tar.xz

.. only:: latex

    .. hint:: The reference to ``TNT.tar.xz`` above is in fact a 
      :smallcaps:`pdf` attachment; right-click on it and select the option
      to save the file in the drop-down menu.

Locate the newly created directory and create two copies of that folder:
:file:`TNT_NVT` and :file:`TNT_NPT`:

.. code-block:: console

    $ cp -r TNT TNT_NPT
    $ mv TNT TNT_NVT


Go into :file:`TNT_NVT` and copy over the file :file:`TNT.pdb` that you have
just created with |MarvinSketch| into that directory. As usual when we need to
do any actual work, we need to log into a computing node and load |Gromacs|:

.. code-block:: console

    $ sint
    $ module load gromacs/2019.2

Then edit the :file:`TNT.pdb` file you have just created and replace all
the instances of ``UNK`` by ``TNT``: this is a way of specifying that all
the atoms in that file belong to a |Gromacs| group (here, a molecule) called 
``TNT``. That name would usually be arbitrary: you would just need to find
something descriptive. In this specific case though, ``TNT`` is also used in
other files we provided (most notably :file:`system.top`): so you need to
stick to it.

.. hint:: The easiest way to do so in ``nano`` is to use its find-and-replace
    feature: press :kbd:`Ctrl-\\` then type ``UNK`` as a search token,
    replace with ``TNT`` and press :kbd:`a` to replace all occurrences at once
    or :kbd:`y` to validate each line.

Then convert that :file:`.pdb` file to a :file:`.gro` file:

.. code-block:: console

    $ gmx editconf -f TNT.pdb -o TNT.gro

You can have a look at the generated :file:`TNT.gro` file. It contains
information about the type and position of each atom.


Defining force-fields
---------------------

|Gromacs| now has a structure to work with but if you remember the
`introductory lecture <http://people.bath.ac.uk/td222/research_projects/ResearchProjects_Intermolecular_Interactions.pdf>`_ 
about molecular simulations, you also need to provide |Gromacs| with 
information about how each atom interacts with each other: you need to define
force-fields.

|Gromacs| already contains extensive lists of parameters. You are going to use
the |OPLS-AA|_ set (optimized potential for liquid simulations all atoms).
However |Gromacs| does not know that. Neither does it know how you intend to
use them exactly: you need to write a topology file (:file:`.itp`) to do that.

.. _OPLS-AA: https://doi.org/10.1021/ja9621760
.. |OPLS-AA| replace:: :name:`opls-aa`

Such a file must contain:

* A list of all the atoms in the molecule, their atom type, their mass and charge.
* A list of bonds: a list of what atom numbers are bonded to each other plus a
  flag to describe the kind of bond (i.e. which sort of force-field to use,
  e.g. ``1`` for a harmonic potential).
* A list of angles: like bonds, but three atom numbers are needed.
* A list of dihedrals (including improper dihedrals where appropriate):
  four atoms numbers must be specified.
* The 1,4 pairs: a list of pairs of atoms numbers. 

Defining such a file would most likely take days. In practice, :file:`.itp`
topology files can often be readily found alongside publications or in
databases. In our case we are going to create that file nonetheless with the
help of a small program, |MKTOP|_ (make topology).

.. _MKTOP: http://www.aribeiro.net.br/mktop/

That program should already be in your tutorial folder:

.. code-block:: console

    $ ./mktop -i TNT.pdb -o TNT.itp -ff opls -conect no

The ``-ff opls`` flag specifies that the |OPLS-AA| force-field should be used;
The ``-conect no`` flag tells |MKTOP| that our :file:`.pdb` file  
does not contain any bond information (this is not included
in output files from |MarvinSketch|).

Getting your topology file right
--------------------------------

In a perfect world, the ``./mktop`` command above would be enough to get a
working :file:`TNT.itp` file... but it isn't. The file :file:`TNT.itp` needs
some editing:

* Under ``[ moleculetype ]``, change the name of the molecule from 
  ``MOL`` to ``TNT``.

* Under ``[ atoms ]``, remove the numbers after each atom symbol. For
  instance ``C1`` should become ``C``, ``C2`` should become ``C`` as well, 
  etc.

* Still in the ``[ atoms ]`` section, you must set each of the atom's charge
  by its proper value (currently they are all set to ``0.00000``).
  To do so,

    * make a note of each atom type (i.e. the names starting by ``opls_``); 
    * open :file:`oplsaa.ff/ffnonbonded.itp` (with ``less``): that file contains
      all the force-fields parameters |Gromacs| will be using;
    * find each entry corresponding to each atom type you identified in
      :file:`TNT.itp`;
    * find the charge corresponding to each entry;
    * replace each charge in :file:`TNT.itp` by its proper value.

  .. hint:: Use :kbd:`/` to search for something ``less``. Use :kbd:`n` to jump
    to the next matching element or :kbd:`Shift-n` to get back to the previous
    one.

* Remove the following unneeded lines from the end of :file:`TNT.itp`:

  .. code-block:: none

    [system]
    ; Name
    MKTOP

    [ molecules ]
    ; Compound       #mols
    MOL           1

From |Gromacs|'s perspective, the atom type (starting with ``opls_``) is a
reference to the force-field parameters found in the database file 
:file:`oplsaa.ff/ffnonbonded.itp`. |MKTOP| cannot read that file though to
copy charges over: this is why you had to do it yourself.

A human-readable description of each type is also provided in
:file:`oplsaa.ff/atomtypes.atp`: have a look at it and make sure the atom types
in :file:`TNT.itp` look right.


Checking your topology
----------------------

The best way to quickly find if there is something really wrong in your system
is to calculate its total charge: just sum all the partial charges you have
set in :file:`TNT.itp`. Do they add up to zero? Why do you expect that system
to be neutral? What would happen if the system weren't neutral (think about
periodic boundary conditions)?

In principle, you should also check for consistency using indications in
:file:`oplsaa.ff/atomtypes.atp`: e.g. if one of your nitrogens were of a type 
``opls_145`` described as ``Benzene C`` in :file:`oplsaa.ff/atomtypes.atp`,
something would be really wrong somewhere. However doing a full manual 
check with :file:`oplsaa.ff/atomtypes.atp` and |Vmd| would take too much
time in the span of this tutorial: we have already checked that |MKTOP|
behaves correctly with |TNT|.

.. attention:: Should you want to do that consistency check yourself,
    be aware that |Vmd| start counting from 0 while atoms are referenced from
    1 in :file:`TNT.itp`. This means that what |Vmd| identifies as ``atom 0``
    is actually the atom of index 1 in :file:`TNT.itp` while ``atom 1`` in 
    |Vmd| is actually at index 2 in :file:`TNT.itp`, etc.


Preparing the simulation box
----------------------------

If we really wanted to follow the aforementioned paper we would need to
simulate the behaviour of :math:`50\,\mathrm{mg}` of |TNT| into
:math:`1\,\mathrm{L}` of water at :math:`35\,\mathrm{ºC}` and
:math:`1\,\mathrm{bar}`. What usually really matters though is not absolute 
quantity of components but concentrations. So the real question is how many 
molecules of |TNT| and water do we need to place in a given simulation cell 
to reach such a concentration?

Knowing that |TNT| molecular weight is :math:`227.13\,\mathrm{g/mol}` and
water density in these conditions is :math:`55.177\,\mathrm{mol/L}` we find
that we need about 250 thousand water molecules per |TNT| molecule. 
Do you think this kind of system size can be reached in practice? 
Is it worth trying it? If you have no idea, try to roughly extrapolate
from the execution time you had when calculating bulk water properties.

Anyway, such a system cannot be studied in the remaining time of this tutorial.
We are going to create a much simpler system instead: 10 |TNT| molecules 
in a simulation box containing 1000 water molecules. We are going to proceed
in two steps:

#. We need to insert our |TNT| molecules into a simulation box 
   (here :math:`5^3\,\mathrm{nm}^3`):

   .. code-block:: console

       $ gmx insert-molecules -ci TNT.gro -box 5 -nmol 10 -o 10TNT.gro

   Here, ``gmx insert-molecules`` is the |Gromacs| sub-command used to
   insert molecules in a simulation box; ``-ci`` specifies the structure of
   the molecule that must be inserted; ``-box`` sets the size of the 
   simulation box in nanometres (if only one number is given, |Gromacs|
   just assumes the system is cubic); ``-nmol`` sets the number of molecules
   to add and ``-o`` specifies the name of the output file.

   By all mean, visualise your 10 |TNT| molecules (:file:`10TNT.gro`) in |Vmd|!

#. Now we need to add the 1000 water molecules in the box as well:

   .. code-block:: console
   
       $ gmx solvate -cp 10TNT.gro -cs -maxsol 1000 -o TNT_Water.gro

   Here, ``gmx solvate`` is the solvation sub-command; ``-cp``
   specifies the box we want to solvate; ``-cs`` specifies the the solvent
   we want to insert, ``-maxsol`` sets the number of solvent molecules to add
   and ``-o`` redirects the output into a file called :file:`TNT_Water.gro`.

Visualise the output in |Vmd| and type the following (in |Vmd| terminal)
to visualise periodic boundary conditions on your simulation cell:

.. code-block:: console

   $ pbc box

Do you see why we did not start by adding the water and then add the |TNT| 
molecules in?


Running the simulation
----------------------

We are finally ready to run the simulation! We *could* execute
the following if we are still on a computing node (but wait, don't
do it):

.. code-block:: console

    $ gmx grompp -v -f NVT.mdp -c TNT_Water.gro -o topo.tpr -p system.top
    $ gmx mdrun -v -s topo.tpr -c TNT_Water.gro

.. only:: latex

    .. hint:: Mind the little red arrow that means the line continues 
        (without an actual line break).

However this is going to take hours! Since the beginning, we have been using
computing nodes directly: they are nice but limited (you can't ask for more
than one node at once, you can only stay on it 6 hours at a time, you
cannot disconnect without killing your job). We should instead employ an 
automated approach to running our jobs.

The *usual* way of requesting computing resources on |Balena| consists in
submitting so-called submission scripts. In our case, such a submission script
has been provided: :file:`TNT.sbatch`. You can read more about that file in
:ref:`job_submission`.

That file can be submitted to the job scheduler using:

.. code-block:: console

    $ sbatch TNT.sbatch

.. note:: If you are a project student in you training week, you can change
   the following lines:

   .. code-block:: bash
   
      #SBATCH --account=free
      #SBATCH --partition=batch

   into those ones to get a faster access to resources:

   .. code-block:: bash
   
      #SBATCH --account=ce30122
      #SBATCH --partition=teaching

.. warning:: **Don't** play with :file:`TNT.sbatch` in general: computing 
    resources are accounted for and an overuse would make further submissions
    more difficult.


That file is going to request resources for your job and execute |Gromacs| in
parallel on multiple |cores|: instead of having to wait for
hours, the calculation should terminate within 30 minutes.

You can review what you have submitted using:

.. code-block:: console

    $ squeue -u <username>

Once your job is running (i.e. the ``ST`` column should show ``R``),
you can follow its state in :file:`slurm-*.out` (again ``*`` in that context
is a wildcard). You can open it with ``less`` and press :kbd:`Shift-f` to
display the data as it comes in.

One a job is submitted like this, don't change anything related
to its working directory anymore: don't rename, move, edit, change or copy
anything.


Exercises
---------

The simulation you are running is performed in the :smallcaps:`nvt` 
ensemble (you should see that a thermostat is set in :file:`NVT.mdp`, 
``Tcoupl``). It is going to take around 20 minutes to finish (unless you
made a mistake somewhere).

In the meantime, go in the other copy of the directory you created right at
the beginning:

.. code-block:: console

    $ cd ../TNT_NPT

Rename (``mv``) :file:`NVT.mdp` to :file:`NPT.mdp` and edit the file to use
``berendsen`` barostat (``Pcoupl``). Edit :file:`TNT.sbatch` to use
:file:`NPT.mdp` instead of :file:`NVT.mdp`. Copy the required files from
:file:`../TNT_NVT` to avoid having to go through the whole manual editing
process again.  Finally submit that new job to |Balena|.

Once the :smallcaps:`nvt` simulation is run, visualise the |MD| trajectory
(with |Vmd|) and use the post-processing commands you saw in
:ref:`water_in_gromacs`. When is your system equilibrated? Have a look at the
time evolution of temperature: is it what you expected?

Finally, calculate the density of your system using the results from the
:smallcaps:`npt` simulation.

Have a nice simulation!