.. _tnt_in_gromacs: ****************************************** Calculating |TNT| solvation with |Gromacs| ****************************************** .. secauthor:: Carmelo Herdes <c.e.herdes.moreno@bath.ac.uk> Claire Hobday <c.l.hobday@bath.ac.uk> Gaël Donval <g.donval@bath.ac.uk> .. |MarvinSketch| replace:: :name:`MarvinSketch` .. |MKTOP| replace:: :name:`MkTop` .. .. This tutorial makes use of free access software, specifically: .. .. |MarvinSketch| under *End User License Agreement*, |MKTOP|, .. .. |Gromacs| and |Vmd| under *GNU Lesser General Public License*. .. .. Access to |Balena| should be requested to and handled by .. .. unidesk.campus.bath.ac.uk In this tutorial you will build on your molecular dynamics simulations skills and use |Gromacs| in conjuction with other computational tools to build up a simulation cell from scratch to investigate the effects of temperature on the solvation of |chemTNT| (|TNT|) in water. .. |chemTNT| replace:: :math:`2,4,6\mathrm{-trinitrotoluene}` .. admonition:: Objectives :class: hint #. Enhancing practical understanding of molecular dynamics simulations. #. Learning how to use |MarvinSketch| to build 3-D models of molecules for |MD| input. #. Learning how to use |MKTOP| to create a topology and force field input for |MD|. #. Learning how to run |MD| simulations using the program |Gromacs| on the university's high performance computing cluster, |Balena|. #. Learning how to visualize results using |Vmd|. #. Using |MD| to work in different statistical ensembles, e.g. :smallcaps:`nvt` and :smallcaps:`npt` simulations. **Before you start:** #. Review the slides on |MD| simulations. #. Be sure you can :ref:`access <connect_to_Balena>` |Balena|. Background ---------- Nowadays complicated problems are often simulated using computers for three basic reasons: to minimize human error, to avoid time constraints and to save money. Different programs are used everyday and everywhere depending on the field, application, and desired simulation, ranging from the quantum level up to the cosmological scale (e.g. the |Illustris|_ project is a large cosmological simulation of galaxy formation). .. |Illustris| replace:: :name:`Illustris` .. _Illustris: https://www.youtube.com/watch?v=NjSFR40SY58 Over the course of this tutorial, you will be introduced to some of the (basic) features of |MarvinSketch|, |MKTOP|, |Gromacs| and |Vmd| to build up a simulation cell from scratch and to investigate the effects of temperature on the solvation of |TNT| in water. These off-the-shelf programs are very powerful common tools in the field of molecular simulation. .. .. |RoEtAl| replace:: :name:`Ro` *et al.* .. _RoEtAl: https://doi.org/10.1021/je950322w |RoEtAl|_ studied |TNT| in 1996. They were constantly mentioning experimental difficulties and concerns: * "|chemTNT| (|TNT|) is one of many explosives contaminating soils in munitions facilities and military bases. It is toxic to humans, causing liver damage and anemia, inhibits growth of many fungi, yeasts, actinomycetes, and gram-positive bacteria, and exhibits mutagenicity in the Ames screening test..." * "Large chunks of neat (pure) |TNT| materials pose a detonation hazard..." * "|TNT| crystals in the soil matrix serve as a continued contamination source for the munitions facilities and military bases even long after activities, which led to soil contamination cease. Knowledge of environmental fate and mass transport characteristics of |TNT| in the subsurface environment is essential for risk assessment..." These points define areas where simulation typically shines: * Modelled |TNT| molecules won't cause any liver damage; * A molecular model that "explodes" wouldn't do any sort of damage; * You are going to use one of the *best* laboratories in the world in terms of control of experimental conditions: |Balena|, a computer cluster is at your service. In |Balena|, your |TNT| and water model molecules are as pure as one can ask for! Let’s start! Drawing your |TNT| molecule --------------------------- |MarvinSketch| is an advanced chemical editor for drawing chemical structures, queries and reactions. It allows you to quickly draw molecules through basic functions on the screen and advanced functionalities such as sprout drawing, customisable shortcuts, abbreviated groups, default and user-defined templates and context-sensitive popup menus. It should be available on the university computers. To install it on own your computer, go to the |ChemAxon|_ website. Click on download and select the relevant version for you (note that a portable version is provided under the *Platform Independent* section). .. |ChemAxon| replace:: :name:`ChemAxon` .. _ChemAxon: https://chemaxon.com/products/marvin Once installed/copied over, open |MarvinSketch| and familiarise yourself with the canvas window (see :numref:`marvin_canvas`); hover over the various items and toolbars to see what you can do! .. _marvin_canvas: .. figure:: _img/TNT2.* :width: 70% |MarvinSketch|'s canvas is where you can draw molecules. Once you are familiar with the interface, you can start building your target molecule, |TNT|, following its representations in :numref:`marvin_draw_TNT`. .. _marvin_draw_TNT: .. figure:: _img/TNT3.* :width: 70% |TNT| chemical structure and 3D atomistic model. `Source <https://en.wikipedia.org/wiki/TNT>`_. There are actually many ways of drawing that molecule in |MarvinSketch|. None of them are intrinsically better but we suggest that you start by using building elements that are the closest possible to the final structure you want to represent. In this case, we could start by inserting a benzene ring by clicking on the benzene ring symbol at the left of the bottom toolbar. Then place it on the canvas by clicking anywhere and press :kbd:`Esc` (to avoid placing any more of them next time you click on the canvas). Then create new methyl groups (:math:`\mathrm{CH}_3`) by selecting and using the *Single bond* tool (top of the left toolbar) until you get a structure similar to what is depicted in :numref:`marvin_draw_CH3`. Once finished, remember to press :kbd:`Esc` to unselect the current tool. .. _marvin_draw_CH3: .. figure:: _img/TNT5.* :width: 70% Benzene ring with four extra methyl groups ready to be edited. Note that, by default, hydrogen atoms are not displayed: each dangling bond is actually a :math:`\mathrm{CH}_3` in disguise! But as you can see back in :numref:`marvin_draw_TNT`, only one group should actually be a :math:`\mathrm{CH}_3`. The rest should be made of nitro groups (:math:`\mathrm{NO}_2`) instead! Fortunately, |MarvinSketch| allows you to change the chemical nature of any atom you want. Have a look at the toolbar at the right of your screen, click on ``N``. Now hover your pointer over the carbon atoms you would like to replace by nitrogen and click to make the substitution. Repeat until you get the result shown in :numref:`marvin_N_subst`. .. _marvin_N_subst: .. figure:: _img/TNT7.* :width: 70% The result of the substitution of the methyl groups by nitrogen. Now it will be easier if we could see all the hydrogen atoms that were kept hidden until now. To do so, click :menuselection:`Structure --> Add --> H Explicit Hydrogens`. Then perform the required substitution of hydrogens to oxygens to transform the amino groups (:math:`\mathrm{NH}_2`) to nitro groups. Finally, you must convert the single bonds between the oxygens and nitrogens into double bonds. To do so, click on the small arrow at the right of the *Single bond* tool (top of the left toolbar), select the *Double bond* tool and click on the single bonds you need to convert. The final structure is shown in :numref:`marvin_TNT_final2D`. Check carefully that you obtained the same result. .. _marvin_TNT_final2D: .. figure:: _img/TNT9.* :width: 70% TNT 2D model. The very last step before saving that structure is to make it 3D! Select :menuselection:`Structure --> Clean 3D` and you should get a 3D structure that you must save (:menuselection:`File --> Save`) as a protein data bank (:file:`.pdb`) file: :file:`TNT.pdb`. We suggest that you try opening this structure in |Vmd| to really see that the structure you generated seems proper. Congratulations! You have just created your first structure from scratch. You are now ready to set up your |MD| simulation. Converting |TNT| into the |Gromacs| format ------------------------------------------ First we need to convert the structure file :file:`TNT.pdb` into a structure file readable by |Gromacs|: :file:`TNT.gro`. As before, we are providing you with a few files to help you setting your working environment up. Download, copy and extract them into a relevant directory (e.g. :file:`~/scratch/tutorials/gromacs`) from this link: :dl:`TNT.tar.xz <_data/TNT.tar.xz>`. .. code-block:: console $ tar -xf TNT.tar.xz .. only:: latex .. hint:: The reference to ``TNT.tar.xz`` above is in fact a :smallcaps:`pdf` attachment; right-click on it and select the option to save the file in the drop-down menu. Locate the newly created directory and create two copies of that folder: :file:`TNT_NVT` and :file:`TNT_NPT`: .. code-block:: console $ cp -r TNT TNT_NPT $ mv TNT TNT_NVT Go into :file:`TNT_NVT` and copy over the file :file:`TNT.pdb` that you have just created with |MarvinSketch| into that directory. As usual when we need to do any actual work, we need to log into a computing node and load |Gromacs|: .. code-block:: console $ sint $ module load gromacs/2019.2 Then edit the :file:`TNT.pdb` file you have just created and replace all the instances of ``UNK`` by ``TNT``: this is a way of specifying that all the atoms in that file belong to a |Gromacs| group (here, a molecule) called ``TNT``. That name would usually be arbitrary: you would just need to find something descriptive. In this specific case though, ``TNT`` is also used in other files we provided (most notably :file:`system.top`): so you need to stick to it. .. hint:: The easiest way to do so in ``nano`` is to use its find-and-replace feature: press :kbd:`Ctrl-\\` then type ``UNK`` as a search token, replace with ``TNT`` and press :kbd:`a` to replace all occurrences at once or :kbd:`y` to validate each line. Then convert that :file:`.pdb` file to a :file:`.gro` file: .. code-block:: console $ gmx editconf -f TNT.pdb -o TNT.gro You can have a look at the generated :file:`TNT.gro` file. It contains information about the type and position of each atom. Defining force-fields --------------------- |Gromacs| now has a structure to work with but if you remember the `introductory lecture <http://people.bath.ac.uk/td222/research_projects/ResearchProjects_Intermolecular_Interactions.pdf>`_ about molecular simulations, you also need to provide |Gromacs| with information about how each atom interacts with each other: you need to define force-fields. |Gromacs| already contains extensive lists of parameters. You are going to use the |OPLS-AA|_ set (optimized potential for liquid simulations all atoms). However |Gromacs| does not know that. Neither does it know how you intend to use them exactly: you need to write a topology file (:file:`.itp`) to do that. .. _OPLS-AA: https://doi.org/10.1021/ja9621760 .. |OPLS-AA| replace:: :name:`opls-aa` Such a file must contain: * A list of all the atoms in the molecule, their atom type, their mass and charge. * A list of bonds: a list of what atom numbers are bonded to each other plus a flag to describe the kind of bond (i.e. which sort of force-field to use, e.g. ``1`` for a harmonic potential). * A list of angles: like bonds, but three atom numbers are needed. * A list of dihedrals (including improper dihedrals where appropriate): four atoms numbers must be specified. * The 1,4 pairs: a list of pairs of atoms numbers. Defining such a file would most likely take days. In practice, :file:`.itp` topology files can often be readily found alongside publications or in databases. In our case we are going to create that file nonetheless with the help of a small program, |MKTOP|_ (make topology). .. _MKTOP: http://www.aribeiro.net.br/mktop/ That program should already be in your tutorial folder: .. code-block:: console $ ./mktop -i TNT.pdb -o TNT.itp -ff opls -conect no The ``-ff opls`` flag specifies that the |OPLS-AA| force-field should be used; The ``-conect no`` flag tells |MKTOP| that our :file:`.pdb` file does not contain any bond information (this is not included in output files from |MarvinSketch|). Getting your topology file right -------------------------------- In a perfect world, the ``./mktop`` command above would be enough to get a working :file:`TNT.itp` file... but it isn't. The file :file:`TNT.itp` needs some editing: * Under ``[ moleculetype ]``, change the name of the molecule from ``MOL`` to ``TNT``. * Under ``[ atoms ]``, remove the numbers after each atom symbol. For instance ``C1`` should become ``C``, ``C2`` should become ``C`` as well, etc. * Still in the ``[ atoms ]`` section, you must set each of the atom's charge by its proper value (currently they are all set to ``0.00000``). To do so, * make a note of each atom type (i.e. the names starting by ``opls_``); * open :file:`oplsaa.ff/ffnonbonded.itp` (with ``less``): that file contains all the force-fields parameters |Gromacs| will be using; * find each entry corresponding to each atom type you identified in :file:`TNT.itp`; * find the charge corresponding to each entry; * replace each charge in :file:`TNT.itp` by its proper value. .. hint:: Use :kbd:`/` to search for something ``less``. Use :kbd:`n` to jump to the next matching element or :kbd:`Shift-n` to get back to the previous one. * Remove the following unneeded lines from the end of :file:`TNT.itp`: .. code-block:: none [system] ; Name MKTOP [ molecules ] ; Compound #mols MOL 1 From |Gromacs|'s perspective, the atom type (starting with ``opls_``) is a reference to the force-field parameters found in the database file :file:`oplsaa.ff/ffnonbonded.itp`. |MKTOP| cannot read that file though to copy charges over: this is why you had to do it yourself. A human-readable description of each type is also provided in :file:`oplsaa.ff/atomtypes.atp`: have a look at it and make sure the atom types in :file:`TNT.itp` look right. Checking your topology ---------------------- The best way to quickly find if there is something really wrong in your system is to calculate its total charge: just sum all the partial charges you have set in :file:`TNT.itp`. Do they add up to zero? Why do you expect that system to be neutral? What would happen if the system weren't neutral (think about periodic boundary conditions)? In principle, you should also check for consistency using indications in :file:`oplsaa.ff/atomtypes.atp`: e.g. if one of your nitrogens were of a type ``opls_145`` described as ``Benzene C`` in :file:`oplsaa.ff/atomtypes.atp`, something would be really wrong somewhere. However doing a full manual check with :file:`oplsaa.ff/atomtypes.atp` and |Vmd| would take too much time in the span of this tutorial: we have already checked that |MKTOP| behaves correctly with |TNT|. .. attention:: Should you want to do that consistency check yourself, be aware that |Vmd| start counting from 0 while atoms are referenced from 1 in :file:`TNT.itp`. This means that what |Vmd| identifies as ``atom 0`` is actually the atom of index 1 in :file:`TNT.itp` while ``atom 1`` in |Vmd| is actually at index 2 in :file:`TNT.itp`, etc. Preparing the simulation box ---------------------------- If we really wanted to follow the aforementioned paper we would need to simulate the behaviour of :math:`50\,\mathrm{mg}` of |TNT| into :math:`1\,\mathrm{L}` of water at :math:`35\,\mathrm{ºC}` and :math:`1\,\mathrm{bar}`. What usually really matters though is not absolute quantity of components but concentrations. So the real question is how many molecules of |TNT| and water do we need to place in a given simulation cell to reach such a concentration? Knowing that |TNT| molecular weight is :math:`227.13\,\mathrm{g/mol}` and water density in these conditions is :math:`55.177\,\mathrm{mol/L}` we find that we need about 250 thousand water molecules per |TNT| molecule. Do you think this kind of system size can be reached in practice? Is it worth trying it? If you have no idea, try to roughly extrapolate from the execution time you had when calculating bulk water properties. Anyway, such a system cannot be studied in the remaining time of this tutorial. We are going to create a much simpler system instead: 10 |TNT| molecules in a simulation box containing 1000 water molecules. We are going to proceed in two steps: #. We need to insert our |TNT| molecules into a simulation box (here :math:`5^3\,\mathrm{nm}^3`): .. code-block:: console $ gmx insert-molecules -ci TNT.gro -box 5 -nmol 10 -o 10TNT.gro Here, ``gmx insert-molecules`` is the |Gromacs| sub-command used to insert molecules in a simulation box; ``-ci`` specifies the structure of the molecule that must be inserted; ``-box`` sets the size of the simulation box in nanometres (if only one number is given, |Gromacs| just assumes the system is cubic); ``-nmol`` sets the number of molecules to add and ``-o`` specifies the name of the output file. By all mean, visualise your 10 |TNT| molecules (:file:`10TNT.gro`) in |Vmd|! #. Now we need to add the 1000 water molecules in the box as well: .. code-block:: console $ gmx solvate -cp 10TNT.gro -cs -maxsol 1000 -o TNT_Water.gro Here, ``gmx solvate`` is the solvation sub-command; ``-cp`` specifies the box we want to solvate; ``-cs`` specifies the the solvent we want to insert, ``-maxsol`` sets the number of solvent molecules to add and ``-o`` redirects the output into a file called :file:`TNT_Water.gro`. Visualise the output in |Vmd| and type the following (in |Vmd| terminal) to visualise periodic boundary conditions on your simulation cell: .. code-block:: console $ pbc box Do you see why we did not start by adding the water and then add the |TNT| molecules in? Running the simulation ---------------------- We are finally ready to run the simulation! We *could* execute the following if we are still on a computing node (but wait, don't do it): .. code-block:: console $ gmx grompp -v -f NVT.mdp -c TNT_Water.gro -o topo.tpr -p system.top $ gmx mdrun -v -s topo.tpr -c TNT_Water.gro .. only:: latex .. hint:: Mind the little red arrow that means the line continues (without an actual line break). However this is going to take hours! Since the beginning, we have been using computing nodes directly: they are nice but limited (you can't ask for more than one node at once, you can only stay on it 6 hours at a time, you cannot disconnect without killing your job). We should instead employ an automated approach to running our jobs. The *usual* way of requesting computing resources on |Balena| consists in submitting so-called submission scripts. In our case, such a submission script has been provided: :file:`TNT.sbatch`. You can read more about that file in :ref:`job_submission`. That file can be submitted to the job scheduler using: .. code-block:: console $ sbatch TNT.sbatch .. note:: If you are a project student in you training week, you can change the following lines: .. code-block:: bash #SBATCH --account=free #SBATCH --partition=batch into those ones to get a faster access to resources: .. code-block:: bash #SBATCH --account=ce30122 #SBATCH --partition=teaching .. warning:: **Don't** play with :file:`TNT.sbatch` in general: computing resources are accounted for and an overuse would make further submissions more difficult. That file is going to request resources for your job and execute |Gromacs| in parallel on multiple |cores|: instead of having to wait for hours, the calculation should terminate within 30 minutes. You can review what you have submitted using: .. code-block:: console $ squeue -u <username> Once your job is running (i.e. the ``ST`` column should show ``R``), you can follow its state in :file:`slurm-*.out` (again ``*`` in that context is a wildcard). You can open it with ``less`` and press :kbd:`Shift-f` to display the data as it comes in. One a job is submitted like this, don't change anything related to its working directory anymore: don't rename, move, edit, change or copy anything. Exercises --------- The simulation you are running is performed in the :smallcaps:`nvt` ensemble (you should see that a thermostat is set in :file:`NVT.mdp`, ``Tcoupl``). It is going to take around 20 minutes to finish (unless you made a mistake somewhere). In the meantime, go in the other copy of the directory you created right at the beginning: .. code-block:: console $ cd ../TNT_NPT Rename (``mv``) :file:`NVT.mdp` to :file:`NPT.mdp` and edit the file to use ``berendsen`` barostat (``Pcoupl``). Edit :file:`TNT.sbatch` to use :file:`NPT.mdp` instead of :file:`NVT.mdp`. Copy the required files from :file:`../TNT_NVT` to avoid having to go through the whole manual editing process again. Finally submit that new job to |Balena|. Once the :smallcaps:`nvt` simulation is run, visualise the |MD| trajectory (with |Vmd|) and use the post-processing commands you saw in :ref:`water_in_gromacs`. When is your system equilibrated? Have a look at the time evolution of temperature: is it what you expected? Finally, calculate the density of your system using the results from the :smallcaps:`npt` simulation. Have a nice simulation!