We strongly recommend that readers at least skim the guidance on document production and transform methods from the main site before reading this section. Without some insight into the restrictions and requirements of the methods, much of this section may not make sense.
We recommend that separate Linux and Windows 7 machines be used (the Linux software requires only a command line interface). We used a dual boot machine and spent quite a lot of time rebooting...
The following is a list of necessary software on the two operating systems. Please see appendix A for links to further information about the software.
Linux:
TeXLive
ImageMagick
C, Python and Flex.
Stix fonts or other fonts to enable a visual check of MathML output in Linux.
A PDF viewer and web browser.
TeX4ht.
PlasTeX compiled from source with our changed files incorporated — see the transformations document for instructions.
Our proof of concept code for producing LaTeX for large print PDF and for input to TeX4ht in order to produce LibreOffice documents. Requires Flex and C to compile — see the transformations document for instructions.
None of the above software has a direct cost; however, the installation and maintenance of the system will require staff time.
Windows:
Software which must be purchased:
Office 2010.
MathType (minimum version 6.8 required).
Word2TeX.
Software for testing purposes only:
Multiple browsers.
Screenreader and literacy support software from the list: http://www.dessci.com/en/solutions/access/atsupport.htm#Reader_Tools
BaKoMa.
ChattyInfty (if exploring options for speech and Braille).
Software which has no direct cost but will require staff time to install and maintain:
LibreOffice.
MathPlayer (version 2.2 required).
LaTeX-access and LaTeXLex (if exploring options for speech and Braille).
Time for an expert user to gain a basic understanding of the key restrictions is approximately 1 hour but from experience they will need to produce and transform several practice documents to reach a full understanding. Once this has occurred the only remaining time overhead in the creation of the master documents is the extra work required to incorporate EPS diagrams into documents.
Transformation of the documents cannot be truly automated, as a single transform may rely on two operating systems and on software with a graphical user interface. The transformations are completed without change to the master document. With careful use of make files, scripts, macros and file storage, many steps can be completed with limited user intervention. Some steps appear to scale badly with the number of equations in a document but we have not quantified this.
We recommend that support staff focused on the task produce masters, transform from masters to other formats as required and advise staff updating master documents. Specialist staff time is likely to be the most substantial cost. Assuming such staff are already employed to produce accessible formats this cost will diminish over time provided that master documents are updated.
All transforms from LaTeX rely on open source, free software. There remains the risk that this software is not updated in the future and ceases to be viable on a modern computer. This is a wider question for the mathematical community which uses these tools. To attempt to quantify this risk we provide factual information about some of the tools, as of 31st May 2012, which may allow a reader to investigate further.
TeX4ht: was created by Eitan Gurari at Ohio State University, who died unexpectedly in June 2009. It is stated on the new TeX4ht website, hosted by TUG (the TeX Users Group), that “With the encouragement and support of Eitan’s family, CV Radhakrishnan and Karl Berry are now working on the package. Involvement by other volunteers, from bug reports to major new development, is welcome and needed”; “No full post-Eitan release has been made to date”; “In TeX Live, we have installed small updates to hyperref.4ht, biblatex.4ht, and nameref.4ht (and no other files). All other development changes remain solely in the source repository.” The source repository was last updated on 25th April 2012.
PlasTeX: Kevin Smith, the contact name, registered on sourceforge in October 2004 and last updated the source on 30th April 2012. The sourceforge site notes that it remains a “beta” release.
LibreOffice: is a project of the not-for-profit organisation, “The Document Foundation” http://www.documentfoundation.org/. Full information about the foundation can be found at that page.
MathJax: is a joint project of the American Mathematical Society, Design Science, Inc., and the Society for Industrial and Applied Mathematics. These organisations also provide major funding. The list of MathJax sponsors can be found at http://www.mathjax.org/sponsors/
LaTeX-access: was created by Alastair Irving and Robin Williams. They ask anyone interested in assisting with the project, even in a small way, to e-mail them. The last update to their sourceforge repository was on 27th May 2012.
Of course, some of the above appear to have substantial funding and community support. It is possible that their future is more secure than some of the commercial products we have used.
We attempt to capture some of the more specific remaining risks and barriers noted throughout the project. We have undoubtedly missed something and will probably update this section in the future.
Some methods are currently reliant on our own proof of concept work rounds. This is a major risk. Our proof of concept work rounds may not be reliable, may not be updated and ultimately may not be the correct approaches. They were created to overcome known bugs, issues or missing functionality in software.
Unclear situation with respect to native MathML support in browsers: the lack of native browser support for MathML rendering led us to MathJax as an interim solution for large print/small screen devices while retaining the possibility of speech in IE with MathPlayer. We found that a full LaTeX to MathML converter was needed for completely correct speech. However, this format cannot be rendered by most browsers. MathJax is not the work round — the particular manner in which we are producing files which render using MathJax is a work round.
We have provided a proof of concept renderer for PlasTeX which retains the LaTeX of equation environments within the HTML and incorporates MathJax into the header. This renderer should be completed and submitted to PlasTeX. PlasTeX was chosen because it produces an interim document model which would allow rendering to multiple formats, perhaps including mind-maps or other visually organised notes. It was also the simplest method to create a proof of concept of MathJax. In this respect we believe that improvements to PlasTeX offer significant gains.
Unclear situation with respect to linebreaking: the breqn package cannot be used in master documents as no LaTeX to MathML converter, including those we use (TeX4ht and MathJax), supports it. Automated linebreaking is not available in all output formats. The breqn package is the only available method to produce hard copy reflowed, large print mathematical documents. MathML3 supports linebreaking and MathJax (with the HTML-CSS renderer) supports this. However, MathJax webpages do not currently produce stable enough hard copy print output. TeX4ht and similar do not yet support MathML3. The TeX4ht developers are currently undecided whether to support the breqn package or to use the automated linebreaking in MathML3 http://www.cvr.cc/mathml-3-and-tex4ht/.
The lack of support for any automated linebreaking in TeX4ht, along with the lack of browser support for MathML, is what led us to MathJax as an interim solution for large print/small screen devices with some possibility of speech. However, TeX4ht remains the only software which produces a screenreader ready version. Automated linebreaking for mathematics is a fairly young technology; it is hoped that the correct method becomes clearer with time.
Knowledge that automated linebreaking might occur is required for authors to protect subparts of equations from linebreaking or to ensure that they have a full understanding of the implications of linebreaking for their encoding of content. In the present methods authors are divorced from direct experience of this when compiling their master as breqn cannot be used directly.
PlasTeX missing functionality: code to deal correctly with newtheorem commands in LaTeX appears to be incomplete. The work round provided in our additional files is basic and itself incomplete: it does not encompass numbering, or labels and references to any environments produced by newtheorem. The code is an incorrect approach put in place to avoid loss of the environment title. The missing functionality is reported at http://sourceforge.net/tracker/?group_id=120835 (bug 3061855, reported 2010).
It does not seem possible to insert alternative text for included images using PlasTeX. This is a key barrier to using this format with a screenreader regardless of the accessibility of the renderer used.
The renderers provided by PlasTeX are not aware of all equation environments. However, we do not recommend use of these in any case.
MathJax vs PlasTeX numbering: We have encoded a requirement that equation numbering is by subsection and that the equation counter is reset by hand at the start of all sections and subsections and altered nowhere else. (The tag and notag commands also cannot be used, but this is a stricter requirement as some software cannot transform them.) This soft requirement on the numbering mode is very restrictive and is applied solely to ensure that the numbering in all other formats is equivalent, though not identical, to that of the PlasTeX-produced HTML+MathJax rendering. The HTML+MathJax format is intended for large print/small screens and the output is broken into pages of a crosslinked website. This will greatly improve navigation for some students. Currently we “ask” MathJax to number the equations on each subpage.
Requiring PlasTeX to produce the numbers may be possible and would be a more appropriate method. The current risk is that authors forget to reset the equation counter, choose not to, or ask PlasTeX to break the document in a non-default manner. All formats will then have internally consistent but non-equivalent equation numbering.
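The numbering convention described above lends itself to a mechanical check before any transformation is run. The following is a minimal sketch in Python and is not part of our proof of concept code. It assumes that the reset is written literally as \setcounter{equation}{0}, that it appears on the first non-blank line after each \section or \subsection (a \label line may come between), and that stripping comments line by line is adequate. The function name check_numbering is hypothetical.

```python
import re

# Matches \section and \subsection (starred or not), but not \subsubsection.
HEADING_RE = re.compile(r"\\(?:sub)?section\*?\s*[\[{]")
# Assumption: the hand reset is always written exactly like this.
RESET = r"\setcounter{equation}{0}"
FORBIDDEN_RE = re.compile(r"\\(?:tag|notag)\b")
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def check_numbering(tex):
    """Return a list of problems with the equation numbering convention."""
    problems = []
    pending = None  # line number of a heading still waiting for its reset
    for number, line in enumerate(tex.splitlines(), start=1):
        code = COMMENT_RE.sub("", line).strip()
        if not code:
            continue
        if FORBIDDEN_RE.search(code):
            problems.append(f"line {number}: \\tag or \\notag used")
        is_heading = bool(HEADING_RE.search(code))
        has_reset = RESET in code
        if is_heading:
            if pending is not None:
                problems.append(f"line {pending}: heading without counter reset")
            pending = None if has_reset else number
        elif has_reset:
            if pending is None:
                problems.append(f"line {number}: counter reset away from a heading")
            pending = None
        elif pending is not None and not code.startswith("\\label"):
            problems.append(f"line {pending}: heading without counter reset")
            pending = None
    if pending is not None:
        problems.append(f"line {pending}: heading without counter reset")
    return problems
```

An empty list means the source follows the convention as this sketch understands it. It does not confirm that the numbering in the transformed outputs is equivalent.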
Requirement to use local Unicode transformation files with TeX4ht: In order to produce completely correct speech in the XHTML+MathML format for every symbol permitted, unicode fonts had to be used. However, this produces unicode ligatures within the English text and these cannot be spoken correctly. This is a known behaviour http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=307647 and the recommended approach is to alter the unicode transformation files. This was completed for some but not all unicode.4hf files in the system. Rather than forcing a user/maintainer to alter parts of TeX4ht, we have introduced local versions for the particular command we recommend. This adds a risk: if this file is not present in the directory, the transformation will appear visually correct but will not be read correctly by a screenreader or text-to-speech software.
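Because a missing local file leaves output that looks correct, a check of the output text itself may help. The sketch below scans a string for ligature characters. It assumes that the ligatures of concern are the Latin ones in the Unicode Alphabetic Presentation Forms block (U+FB00 to U+FB06); we have not confirmed that this is the complete set TeX4ht can emit. The function name find_ligatures is hypothetical.

```python
import unicodedata

# Assumption: only the Latin ligatures U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, ...)
# are of concern in the English text.
LIGATURES = {chr(code_point) for code_point in range(0xFB00, 0xFB07)}


def find_ligatures(text):
    """Return (offset, ligature, plain-letter expansion) for each ligature found."""
    return [
        # NFKC normalisation expands a ligature into its separate letters.
        (offset, char, unicodedata.normalize("NFKC", char))
        for offset, char in enumerate(text)
        if char in LIGATURES
    ]
```

Running this over the text of an XHTML+MathML output file, read as UTF-8, and finding any hits would suggest that the local unicode.4hf file was not picked up.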
In order to produce correct rendering of the various permitted symbols in the transformation to Word+MathType we must use the unicode expected by MathType (some are in the Unicode Private Use Area, see http://www.dessci.com/en/support/mathtype/tech/encodings/mtcode.htm). A second unicode.4hf file for the oolatex command was produced. The unicode produced by this command will not render successfully in the interim office formats, only once the full sequence of transformations to MathType format has been completed. There is a risk that interim formats will be used in the belief they are correct as this does not affect all symbols. There is a risk that if the unicode.4hf file is missing from the directory that the transformation to Word+MathType will be incorrect.
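One way to reduce the risk of an interim file being mistaken for a finished one is to flag text that contains Private Use Area characters. The following is a minimal sketch which assumes that the text has already been extracted from the interim file as a Python string; extracting it from an office format is not shown. The function name private_use_characters is hypothetical.

```python
import unicodedata


def private_use_characters(text):
    """Return the distinct private-use code points in text, as U+XXXX labels."""
    # Unicode general category "Co" marks private-use characters.
    found = {char for char in text if unicodedata.category(char) == "Co"}
    return sorted(f"U+{ord(char):04X}" for char in found)
```

A non-empty result indicates that the text relies on private-use code points, so it may render correctly only after the full sequence of transformations to MathType format. An empty result does not show that the file is correct.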
Considering both of the above factors, there is a risk that the two different unicode.4hf files, for XHTML and office transformations will be mistaken for each other. The files cannot be renamed. Due to this the two TeX4ht transformations must take place in different directories. This is an additional risk.
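The separation into directories can be scripted so that each working directory only ever receives one variant. The sketch below assumes that the two variants are kept in storage under distinguishing names and are copied into place as unicode.4hf. The function name stage_transform and all file names in the usage note are hypothetical.

```python
from pathlib import Path
import shutil


def stage_transform(master, unicode_file, workdir):
    """Copy a master document and one unicode.4hf variant into their own directory."""
    master, unicode_file, workdir = Path(master), Path(unicode_file), Path(workdir)
    if not master.is_file():
        raise FileNotFoundError(f"master document not found: {master}")
    if not unicode_file.is_file():
        raise FileNotFoundError(f"unicode transformation file not found: {unicode_file}")
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / "unicode.4hf"  # the file cannot be renamed in place
    # Refuse to overwrite a different variant already staged in this directory.
    if target.exists() and target.read_bytes() != unicode_file.read_bytes():
        raise FileExistsError(f"{target} already holds a different variant")
    shutil.copy2(master, workdir / master.name)
    shutil.copy2(unicode_file, target)
    return workdir
```

For example, stage_transform("notes.tex", "store/xhtml-unicode.4hf", "build/xhtml") and stage_transform("notes.tex", "store/office-unicode.4hf", "build/office") would prepare the two directories. Images, configuration files and anything else the master document needs would have to be copied as well; that is omitted here.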
Despite the above, TeX4ht was retained as the most stable and configurable LaTeX to MathML transformation experienced and the one in which additional LaTeX packages are likely to be supported without production of configuration files (TeX4ht was the only transform to be successful on a test document using author provided documentclass and packages). This is the only method by which correct speech has been produced for mathematics alphabets such as blackboard bold, calligraphic etc. in XHTML and MathML.
Requirement to use local configuration files with TeX4ht: the local configuration files provided allow included images to be dealt with and alternative text for these to be inserted into the XHTML+MathML format. The configuration files are also necessary to combat difficulties with sub- and super-scripts explained at http://www.tug.org/applications/tex4ht/mn3.html#QQ1-3-14. Three possible work rounds are advised on that page but only the one we have used functions with the office transform as well as the XHTML+MathML transform. The main risk is that the configuration files are not present in the directory. This will cause unpredictable behaviour depending on the content of the LaTeX file. The secondary risk is that users do not note that they must update the configuration file as per the content of their preamble.
LibreOffice bugs with MathML import: This bug is reported at https://bugs.freedesktop.org/show_bug.cgi?id=47414. The bug also exists in OpenOffice and dates back to at least 2006. Our proof of concept work round for this produces a variant of the LaTeX file which forces sub- and super-scripts to appear not to be a single symbol and all array cells to be non-empty (by effectively introducing "" in both cases). This forces the output from TeX4ht to produce MathML within the ODT format document which avoids the known cases of the bugs. LibreOffice is used as an interim format (after use of TeX4ht) to enable the ultimate production of Word documents for use with MathType. Multiple transformations of this type hold their own risk as errors may accumulate. It is possible that different software would be more appropriate.
Other risks and barriers include:
LaTeX is both more “relaxed” and provides far greater flexibility than the transformation methods. All of the following may result in failed or no output:
Common typos;
Common author practices which are not recommended but apparently tolerated by LaTeX;
Incorporating untested LaTeX packages;
Change in preamble ordering.
The guidance on writing LaTeX documents must be followed precisely and documents checked carefully for the above if output fails. Further information on the above is given in the guidance document.
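Of the causes listed above, untested packages are the easiest to detect mechanically. The following is a minimal sketch which lists every package loaded with \usepackage that is absent from a set of tested packages. The set of tested packages must come from the guidance document; the names used in the usage note are placeholders only. The function name untested_packages is hypothetical.

```python
import re

USEPACKAGE_RE = re.compile(r"\\usepackage\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def untested_packages(tex, tested):
    """Return the sorted names of loaded packages that are not in `tested`."""
    loaded = set()
    for line in tex.splitlines():
        code = COMMENT_RE.sub("", line)  # ignore commented-out packages
        for group in USEPACKAGE_RE.findall(code):
            # One \usepackage may load several comma-separated packages.
            loaded.update(name.strip() for name in group.split(",") if name.strip())
    return sorted(loaded - set(tested))
```

For example, with tested = {"amsmath", "amssymb"} a preamble that also loads breqn would be reported as ["breqn"]. The sketch reads only the text it is given, so packages loaded from included files or through \RequirePackage are not seen.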
Known bugs, issues or missing functionality for which there is no work round (tolerances):
The symbol, command and environment set permitted by all transforms is relatively limited. Some authors may have difficulty encoding their learning resources within the restrictions.
In LaTeX certain types of environment cannot be permitted within lists.
The method for working with EPS diagrams and psfrag may be a restriction which some authors are unwilling to work within.
Lists of theorems, figures, glossaries and indexes cannot be produced from LaTeX.
The eqnarray environment is not numbered by MathJax https://github.com/mathjax/MathJax/issues/229 though see above risks associated with the current approach to numbering in this format.
Mathematics alphabets (e.g. blackboard bold, calligraphic) appear not to be voiced correctly when published to MathPage by MathType and read by MathPlayer (and hence any speech technology reading the page). This is also true for MathJax pages rendered using MathPlayer but not the case for TeX4ht output.
Some two dimensional layouts simply cannot be accommodated beyond a certain font size. This requires the document author to consider elision but they may have difficulty ascertaining what types of structure or sizes of structure may cause reading difficulties. See LaTeX guidance notes for comments on this.
URLs in particular still sometimes overrun the right-hand side, as they contain no natural breaking points and these cannot be introduced while retaining the live link in the HTML/XHTML versions.
URLs containing ampersands cannot be included in documents which are transformed to XHTML formats.
Commuting diagrams with diagonal arrows cannot be produced without working in a separate document, compiling it to a PDF and including the PDF as an image. All available packages were tested and all fail to be incorporated (or disrupt the transformations).
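The restriction on URLs containing ampersands, noted in the list above, can be checked before a transformation to XHTML is attempted. The following is a minimal sketch which assumes that links in the master document are written with \url{...} or \href{...}{...}; other link commands would need to be added to the pattern. The function name urls_with_ampersands is hypothetical.

```python
import re

# Captures the address argument of \url{...} and the first argument of \href{...}{...}.
URL_RE = re.compile(r"\\(?:url|href)\s*\{([^}]*)\}")


def urls_with_ampersands(tex):
    """Return every linked address in the LaTeX source that contains an ampersand."""
    return [address for address in URL_RE.findall(tex) if "&" in address]
```

Any address returned would need to be replaced, for instance by a shorter address without a query string, before the document is transformed.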
Missing output formats:
A clear process for producing correct UK Braille did not emerge: all of the tested software was either too constrictive for incorporation in the general method, too unstable or too difficult to realistically evaluate in the time frame. Various possibilities for working with the LaTeX source exist and were trialled. None have a symbol set as large as the general method but the focus on software which retains the LaTeX commands which are not transformed is a possible work round. It is recommended that departments work closely with any student using Braille as a reading format in exploring these methods; brief information about them is in the transform guidance document.
The process for working with LaTeX beamer documents is not yet stable but we hope to deliver some guidance on this in the future as it is a commonly required set up.
A clear process for working with PowerPoints did not emerge although some guidance as to how to improve the accessibility of PowerPoint documents containing equations can be given and we hope to provide this in future.
Clear guidance on using handwriting as an input method did not emerge but some guidance on this can be given and we hope to provide this in future.
The production of a process for creating a visual organiser as an interface to long documents was not stable. It seems likely that this should be approached as a renderer from the PlasTeX document model but this was not considered at the time.
From a staff point of view the methods require a change in mindset away from typesetting or “What you see is what you get” (WYSIWYG) approaches to document production. This in itself may be a barrier.
Finally, a remaining barrier is that students and their support staff may not know how to use the software or resources effectively. For instance, using text-to-speech software with XHTML+MathML requires installation of MathPlayer and knowledge of how to request the text-to-speech software to read the resource. Guidance and possibly training for students and staff may be required.