Readme file for the CDNN program Version 2
==========================================

Please read this information carefully before proceeding to install
and/or use the program.

Operating system:
+++++++++++++++++

This program works only with Microsoft Windows 95 or Windows NT. Do
not try to use it with Microsoft Windows 3.x; a Win16 version is not
planned. Macintosh, Unix, and other platforms are not supported either.

However, we are about to install a Java-based version of the software
on our web server
     http://bioinformatik.biochemtech.uni-halle.de/cdnn
You will be able to run it from any platform via a web browser such as
Netscape or Internet Explorer, or to copy the applet version of the
software and run it locally on your machine. In the latter case you
need an applet viewer, which is usually available free of charge for
any computer platform. Expect the Java-based version to be finished in
summer 1998.



Setup:
++++++

No setup problems are known so far. If you encounter any, please report
them to the author.

To install the software, download the executable INSTALL.EXE from the
web server, copy it to a temporary directory on your hard disk, e.g.
"C:\TEMP\", and start it. It will then self-extract the necessary
installation files into that directory.

Among the extracted files you will find a program called SETUP.EXE.
This is the program that actually installs the CDNN software, including
the trained networks, documentation, the freeware license agreement,
sample files, and other items, in a directory of your choice.
Installation is straightforward.

To uninstall, use the standard Windows 95/NT procedure: open
[Add/Remove Programs] in the Control Panel ([Software] on German-language
Windows), select the CDNN entry, and press the [Add/Remove...] button.


Command-line argument:
++++++++++++++++++++++

You may start the program with one argument that selects the networks
to be used. Possible values are NNET_13, NNET_23, and NNET_33. For an
explanation of these three "strategies", see the program documentation.
The default is NNET_13, which may be used to deconvolute (most) simple
CD spectra, i.e. spectra that are dominated by secondary structure
effects and in which other contributions, such as those of aromatic
amino acids and disulphide bonds, are negligible.



Training:
+++++++++

It is NOT necessary to train the networks again. The network files,
which reside in three different subdirectories, are already trained.
IT IS STRONGLY RECOMMENDED NOT TO CHANGE THESE TRAINED FILES!!!!
On a high-end Pentium Pro 200 MHz Win95 workstation, training one
of the 21 neural network files used by the program with 300,000 cycles
takes about 3 to 10 hours.

However, you may also define your own set of training files. These are
located in the USER subdirectory of the CDNN program. You need to
concatenate the text files containing the data, which must be in a
special format, into a single file ("basespec.txt") and then convert
this file into a single binary file ("basespec.bin"). The CDNN program
reads the binary file upon startup (command-line parameter "USER") or
when it is selected via [Options][Preferences].

The [Options] menu has a feature named "LiveUpdate NetFiles". Computers
that are connected to the internet can use it to download and install
the latest neural network files automatically. Research on the neural
network strategies and topologies is continuing; expect new network
files approximately twice a year.

For more information, see the program documentation.



Program usage:
++++++++++++++


The program is simple to use. First, you need your data in a format
that the program can read. The preferred file format is a plain-text
file with two columns, wavelength (nm) and CD signal, as in the
following example (see also the files in the "Sample Files" directory):

 180.0      0.4000
 181.0      0.7771
 182.0      1.2200
 183.0      1.7353
 184.0      2.3000

 .....

 258.0      0.0000
 259.0      0.0000
 260.0      0.0000

It is ABSOLUTELY necessary to measure the spectra up to 260 nm in order
to verify that the buffer has been subtracted correctly; the signal
should be zero at this wavelength. Of course, the spectrum should also
be measured down to the lowest possible wavelength, e.g. 180 nm. If the
minimum wavelength is higher than 210.0 nm, or the maximum wavelength is
lower than 260.0 nm, the data are rejected and the program will not use
them.
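
The file format and the range check described above can be illustrated
with a minimal Python sketch. This is not the code used by CDNN; the
function names parse_spectrum and check_range are hypothetical, and
silently skipping lines that are not two numeric columns is an
assumption made for this example.

```python
def parse_spectrum(text):
    """Parse two-column text (wavelength in nm, CD signal) into sorted points."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: skip blank lines and anything not two columns
        try:
            wavelength, signal = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # assumption: skip non-numeric lines
        points.append((wavelength, signal))
    points.sort()  # ascending wavelength
    return points


def check_range(points, max_lowest=210.0, min_highest=260.0):
    """Return (lowest, highest) wavelength, or raise ValueError if rejected."""
    if not points:
        raise ValueError("no data points found")
    lowest, highest = points[0][0], points[-1][0]
    if lowest > max_lowest:
        raise ValueError("minimum wavelength %.1f nm is above 210 nm" % lowest)
    if highest < min_highest:
        raise ValueError("maximum wavelength %.1f nm is below 260 nm" % highest)
    return lowest, highest


if __name__ == "__main__":
    sample = " 180.0      0.4000\n 181.0      0.7771\n\n 260.0      0.0000\n"
    spectrum = parse_spectrum(sample)
    print(check_range(spectrum))
```

A spectrum that starts at 215 nm or ends at 255 nm would raise a
ValueError here, mirroring the rejection rule stated above.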

The data are then sorted, and the minimum wavelength used for selecting
the neural network(s) is determined (180 nm, 185 nm, 190 nm, 195 nm,
200 nm, 205 nm, or 210 nm). A spline interpolation is used to obtain
equally spaced data values (1.0 nm steps). If your data are measured
down to 185.0 nm, the 185-nm-net is used as the lowest-limit-net; if
your data are measured down to 185.1, 187.0, or 189.7 nm, the 190-nm-net
will be the "lowest".
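
The selection rule amounts to rounding the lowest measured wavelength up
to the next multiple of 5 nm. A minimal Python sketch of this rule
follows; the names NET_LIMITS, lowest_net, and usable_nets are
hypothetical, and two points are assumptions rather than documented
behaviour: that data measured below 180 nm simply use the 180-nm-net,
and that every net from the lowest one up to 210 nm yields a prediction.
The spline interpolation itself is not shown.

```python
import math

# Lower wavelength limits (nm) of the available networks, as listed above.
NET_LIMITS = (180, 185, 190, 195, 200, 205, 210)


def lowest_net(min_wavelength):
    """Return the lower limit (nm) of the lowest network usable for the data."""
    if min_wavelength > NET_LIMITS[-1]:
        raise ValueError("minimum wavelength above 210 nm: data are rejected")
    limit = 5 * math.ceil(min_wavelength / 5.0)  # round up to a multiple of 5
    return max(limit, NET_LIMITS[0])  # assumption: clamp to the 180-nm-net


def usable_nets(min_wavelength):
    """Assumption: all nets from the lowest usable one up to 210 nm are run."""
    start = lowest_net(min_wavelength)
    return [limit for limit in NET_LIMITS if limit >= start]


if __name__ == "__main__":
    for wavelength in (185.0, 185.1, 187.0, 189.7):
        print(wavelength, lowest_net(wavelength))
```

With the four wavelengths from the text, the sketch selects 185 for
185.0 nm and 190 for 185.1, 187.0, and 189.7 nm.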

Generally, all networks recognize helices quite well. To have
beta-sheets (parallel and/or antiparallel) predicted correctly, you
should measure down to 190 or 185 nm. The best results are, of course,
obtained with data down to 180 nm.


You have to choose the dimension (units) of your CD signal data when
loading them into the program.

 * Delta Epsilon
 * Molar Ellipticity
 * Milli-Degrees (you then need protein concentration, path length, and
   molecular mass to allow calculation of normalized Delta Epsilon values)

The program works with Delta Epsilon; if you supply your data in this
unit, they are not converted. Otherwise, standard equations published
in the literature are used to convert your units to Delta Epsilon.
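
For orientation, the textbook relations between these units can be
written as a short Python sketch. It is not taken from the CDNN source:
the function names are hypothetical, the concentration is assumed to be
given in mg/ml and the path length in cm, and the sketch does not
address whether CDNN additionally normalizes per residue. It uses the
common relations [theta] = 3298.2 * Delta Epsilon and
[theta] = 100 * theta(deg) / (c(mol/l) * l(cm)).

```python
# Molar ellipticity [theta] (deg cm^2 dmol^-1) per unit of
# Delta Epsilon (l mol^-1 cm^-1): [theta] = 3298.2 * delta_epsilon.
THETA_PER_DELTA_EPSILON = 3298.2


def molar_ellipticity_to_delta_epsilon(theta_molar):
    """Convert molar ellipticity to Delta Epsilon."""
    return theta_molar / THETA_PER_DELTA_EPSILON


def millidegrees_to_delta_epsilon(theta_mdeg, conc_mg_per_ml, path_cm,
                                  molecular_mass):
    """Convert a raw signal in millidegrees to Delta Epsilon.

    Assumed units: concentration in mg/ml, path length in cm,
    molecular mass in g/mol.
    """
    molar_conc = conc_mg_per_ml / molecular_mass  # mg/ml equals g/l -> mol/l
    # [theta] = 100 * theta_deg / (c * l), and theta_deg = theta_mdeg / 1000
    theta_molar = 0.1 * theta_mdeg / (molar_conc * path_cm)
    return molar_ellipticity_to_delta_epsilon(theta_molar)


if __name__ == "__main__":
    # Made-up example values: 10 mdeg, 1 mg/ml, 0.1 cm cell, 10,000 g/mol.
    print(millidegrees_to_delta_epsilon(10.0, 1.0, 0.1, 10000.0))
```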

You can verify that the data were read correctly by inspecting them via
[Edit] - [Edit data...] from the main menu.

As soon as your data are loaded, click [Deconvolution]. That's all.



Error of prediction:
++++++++++++++++++++

There will be several predictions, starting with the lowest wavelength
limit possible for your data. The most reliable is, of course, the
lowest-wavelength prediction. The other predictions may give you a
feeling for the reliability of the predictions; they should not differ
too much from each other.

Also, the sum of the secondary structure elements should be as close as
possible to 100 %. If the sum deviates by more than 5-10 %, then either
the data are incorrect (for example, the units were converted wrongly),
or the network structure / base spectra are not suitable for solving
your problem.
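
The sum check is easy to apply by hand or in a few lines of Python. The
sketch below is only an illustration: the function names, the element
names, and the numbers are made up for this example, and the default
tolerance of 5 % is the stricter end of the 5-10 % range given above.

```python
def structure_sum(fractions):
    """Sum of the predicted secondary structure fractions (in %)."""
    return sum(fractions.values())


def sum_is_plausible(fractions, tolerance=5.0):
    """True if the sum deviates from 100 % by at most `tolerance` percent."""
    return abs(structure_sum(fractions) - 100.0) <= tolerance


if __name__ == "__main__":
    # Hypothetical prediction; element names and values are invented.
    prediction = {
        "helix": 35.0,
        "antiparallel": 12.0,
        "parallel": 8.0,
        "turn": 17.0,
        "coil": 30.0,
    }
    print(structure_sum(prediction), sum_is_plausible(prediction))
```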

TABLE: Current state of the trained networks
Average error (%) of the predicted secondary structures, per network
and wavelength range (nm)

Network | 180-260 | 185-260 | 190-260 | 195-260 | 200-260 | 205-260 | 210-260
-----------------------------------------------------------------------------
NNET_13 |  4.32 % |  4.37 % |  4.35 % |  4.51 % |  4.63 % |  4.84 % |  4.91 %
NNET_23 |  4.87 % |  5.06 % |  5.03 % |  5.16 % |  5.38 % |  5.47 % |  5.44 %
NNET_33 |  3.98 % |  5.92 % |  5.57 % |  6.20 % |  6.45 % |  6.39 % |  6.60 %




Known bugs and limitations:
+++++++++++++++++++++++++++

Clearly the biggest limitation at the moment is the lack of high-quality
documentation. Do not try to open the help file; you would be
disappointed. I have not yet managed to set up the documentation.
On the other hand, the program has been very stable in my hands, and it
is quite simple to use, so I do not see an immediate need for lengthy
documentation at the moment.

The numeric display shown during the training cycles (counting the
learning progress) has ugly bands running through the upper part of
the bitmap(s). This is a problem of one of the run-time components in
combination with the latest release of Borland's Delphi (3), which I
have not been able to debug yet (it worked fine with Delphi 2). Since
it is only a cosmetic effect, you need not worry about it.

Print preview sometimes did not work and was therefore disabled in
the currently distributed version.

The Internet LiveUpdate for the network files does not check for all
error conditions that could occur. It works fine when the computer is
connected correctly to the internet and the update server is up
(141.48.11.210, 'bioinformatik.biochemtech.uni-halle.de').
If you encounter problems with LiveUpdate, please report them to the
author.

If you find bugs, please report them to the author (see below),
preferably by e-mail. Please indicate the conditions under which the
error appeared, the error details, and whether it is reproducible. I
also need the exact version and build info of the software. To find it,
start the Windows Explorer, right-click on the program file CDNN.EXE,
and choose "Properties...". Go to the tab named "Version" and look up
the version and build number of the software; the number looks like
"2.0.3.188", which means version 2.0.3 (or 2.0c), build 188. For
debugging your software version, I need the exact build number.
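
How the four-part number breaks down can be shown with a small Python
sketch. The function name parse_build_info is hypothetical, and the
mapping of the third number to a letter (1 = a, 2 = b, 3 = c) is only
inferred from the single example above, so treat it as an assumption.

```python
def parse_build_info(version_string):
    """Split a string like "2.0.3.188" into version, letter form, and build."""
    major, minor, patch, build = (int(part) for part in version_string.split("."))
    # Assumption: patch number n corresponds to the n-th letter (3 -> "c").
    letter = chr(ord("a") + patch - 1) if 1 <= patch <= 26 else ""
    return {
        "version": "%d.%d.%d" % (major, minor, patch),
        "letter_form": "%d.%d%s" % (major, minor, letter),
        "build": build,
    }


if __name__ == "__main__":
    print(parse_build_info("2.0.3.188"))
```

For "2.0.3.188" this yields version "2.0.3", letter form "2.0c", and
build 188, matching the example in the text.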



Future Developments:
++++++++++++++++++++

All printing will have to be reprogrammed in order to print multi-column 
tables. The QuickReport system will be used in the next version which 
will give much nicer printouts. Also, printing should be compressed to
save paper, e.g. printing data and figures on a single page.

More documentation options for the user are necessary, e.g. for
indicating a protein name, experimental details, etc., which will also
appear on the printouts. This will help experimental logging.

More data formats for input files should be supported (Jasco, ...).
Since I do not have the corresponding sample files, this is not possible
yet. If you are interested in loading files directly into the program
using your preferred file format, please contact the author.

My colleague, Lars Waldmann, is about to prepare a large set of
high-resolution spectra of reference proteins on our brand-new AVIV CD
spectrometer. These files will replace the current database for
training, and all networks will then be retrained with these spectra.
Major benefits:
 o Proteins are highly purified with our new Pharmacia ÄKTA system.
 o Protein concentrations (a major source of error) are determined
   with the highest achievable precision.
 o Secondary structure fractions are calculated by a single program
   and are therefore of comparable precision.
 o Measurements are made with the greatest care: very low scattering of
   data (16 - 32 accumulations), and highly purified, low-absorption
   buffers.

Expect the next major program version in spring or summer 1998.


REVISION HISTORY
================

Version 2.0c
First Major Release

Version 2.0d
- bugfixes in LED display of the numbers during training
- bugfix when starting program from a write-protected disk

Version 2.1
- bugfix: 2.0d had several run-time libraries not linked in the binary
  and ran successfully only when Delphi 3.0 libs were installed
- Program was completely re-ported to Delphi 4.0
- several libraries had to be changed
- new data format for basespectra (basespec.bin)
- new version of the conversion program cdconv.exe
- Major improvement to printing of tables: Times New Roman 12 pt as
  standard font




##################################################################
Gerald Böhm
Institut für Biotechnologie
Martin-Luther-Universität Halle-Wittenberg
Germany

http://bioinformatik.biochemtech.uni-halle.de/cdnn
##################################################################
