Haiola Documentation

What is Haiola?

Haiola is free and open source software that assists people in publishing the Holy Bible or portions thereof in many formats. It converts Scriptures in a source format to many output formats. The initial source format supported is Unicode USFM. USFX is also supported. We plan future support for importing USX files, as well. Currently, Haiola supports one style of HTML output, plus USFX as an output. In the future, it will support conversions to multiple styles of HTML, PDF, and formats used by various Bible study programs and electronic book reader devices and software.

Notice

You are responsible to make sure that you abide by all applicable copyright laws and the rules of the applicable Bible translation agencies when publishing Scripture. More importantly, due care appropriate to handling God's Word is your responsibility.

What is the status of Haiola?

This program is under construction. It does useful things for the authors, and we have reason to believe that it will do so for you, too, but please use it at your own risk, and only with data that you back up frequently. This is a sort of a partial alpha release, but the parts that are done are useful and in operational use. We believe that it is better to build incrementally, and release frequently as new features are added or other improvements are made. This allows for early feedback as to what works, what doesn’t work so well, and what would be best to work on, next. However, this also means that you need to use this program with due caution. Back up your important data frequently. I never try to make mistakes, and I test this software on my system, but I never tried it on your system with your data. This is not consumer-level software. There aren’t many error traps. Strange data will cause very strange results and/or strange messages, so please watch for error messages and check your outputs to make sure they look reasonable. In particular, please note that bad markup that does not conform exactly to the USFM specification WILL cause problems. Customizations of the style sheet (custom.sty) in Paratext are NOT honored by Haiola, and will confuse Haiola.

Haiola is a free and open source collection of programs. It inherits major portions of Prophero and WordSend.

What is new in Haiola?

See the change log for the list of recent changes.

What is required to run Haiola?

The following prerequisites must be met to run Haiola:

How does Haiola work?

Haiola is a file format conversion program. It works with a mandatory directory structure and some auxiliary files. Rather than doing the traditional thing and specifying input file locations one by one, using this directory structure saves a lot of time for both the user and the programmer, at least when processing large numbers of language projects. For each Bible translation project, you enter some meta-data (information about the project), then run the conversions on that project or on all selected projects. Some of the details as to what to do with the outputs are left up to you, either to process manually or to supply a script or batch file to process them.

How do I install Haiola on Microsoft Windows?

Method 1: Standard Windows install

  1. Install prerequisites: Java, XeTeX (MiKTeX is a nice way to do that in Windows), SIL Andika Font, SIL Gentium Plus Font, and Microsoft .NET Framework 4.5 SP 1 and/or Mono for Windows.
  2. Download the Haiola Windows installation program, setuphaiola.exe.
  3. Run setuphaiola.exe.
  4. Run Haiola. On the first run, you will be asked to specify where you want to put your data files. Normally, this is in the BibleConv folder in your Documents folder. NOTE: This must be a directory used JUST for Haiola. Do not make this your Paratext directory.
  5. If you wish to use direct Paratext project data import (without having to manually copy files over or make a backup file and unzip it to the Source directory), then you must tell Haiola where your "My Paratext Projects" directory is (whatever you called it when you installed Paratext) by pressing the Paratext button and navigating to it.

Method 2: Portable install (for installation on removable media)

  1. Install prerequisites: Java, XeTeX (MiKTeX is a nice way to do that in Windows), SIL Andika Font, SIL Gentium Plus Font, and Microsoft .NET Framework 4.5 and/or Mono for Windows.
  2. Download the Haiola program image, haiola.zip.
  3. Unzip haiola.zip to the directory you want to run it from (which may be on a portable device).
  4. Optional: create a shortcut to start the program from your menu and/or desktop. (Otherwise, just double-click on haiola.exe or execute it from the command line to start it.)
  5. Run Haiola. On the first run, you will be asked to specify where you want to put your data files. Normally, this is in the BibleConv folder in your Documents folder. This should be a directory that is used exclusively for Haiola data.
  6. If you wish to use direct Paratext project data import (without having to manually copy files over or make a backup file and unzip it to the Source directory), then you must tell Haiola where your "My Paratext Projects" directory is (whatever you called it when you installed Paratext) by pressing the Paratext button and navigating to it.

How do I install Haiola on Linux?

  1. Install prerequisites: XeTeX (texlive-xetex package in Ubuntu Linux), SIL Andika Font, SIL Gentium Plus Font, and Mono for Linux. (These may be available in your distribution's package management system. For Ubuntu, you can get the latest SIL fonts from the apt source "deb http://packages.sil.org/ubuntu distro main", where "distro" is the code name of your current distribution, e.g. xenial.)
    In other words, in /etc/apt/sources.list, add a line like:
    deb http://packages.sil.org/ubuntu distributionName main

    Then issue the following commands:
    wget -O - http://packages.sil.org/sil.gpg | sudo apt-key add -
    sudo apt-get update
    sudo apt-get install fonts-sil-andika
    sudo apt-get install fonts-sil-gentium-plus
    sudo apt-get install mono-complete
    sudo apt-get install libmono-winforms2.0-cil

  2. Download the Haiola distribution .zip file and unzip it to a suitable location in your file system. Note that there are several files in the haiola.zip file that haiola.exe must be able to find for things to work right. The best way to ensure that is to place them all together in the same directory.
  3. Create start-up scripts or icons to start Haiola with mono. If you installed Haiola in ~/haiola/, the command to start it would be "mono ~/haiola/haiola.exe".
  4. Run Haiola. On the first run, you will be asked to specify where you want to put your data files. Normally, this is in "~/BibleConv".
  5. If you wish to use direct Paratext project data import (without having to manually copy files over or make a backup file and unzip it to the Source directory), then you must tell Haiola where your "My Paratext Projects" directory is (whatever you called it when you installed Paratext) by pressing the Paratext button and navigating to it.

What is the required directory structure?

The root of the directory structure can be anywhere in the file system that you have read and write permissions. It is normally a directory called “BibleConv” in your Documents directory, on an external device, or on a network drive. You choose which directory you want to contain your Haiola project data during your first run of Haiola, or at any later time using the “Set data directory” button. Under the BibleConv directory, Haiola will create Work, Site, and FilesToCopyToOutput directories.

Note that Haiola is designed to work on multiple operating systems. Most Linux file systems are case sensitive. Windows is not. MacOS is normally not case sensitive, but supports case sensitive file systems, too. Linux and Mac OS use “/” to separate directories, and Windows uses “\”. In my examples, I'll use one or the other style, and leave it to you to adjust when necessary to fit your operating system.

BibleConv
   input — Note: it is your responsibility to create the input project folders within this directory.
      project folders, which must be named the same as the short translation identifier
         Source
           Unicode USFM files for this project (if direct Paratext import is not used)
         -or-
         usfx
            USFX file for import
         -or-
         usx
            USX files named with .usx suffix (e.g. an unzipped Digital Bible Library bundle). The .usx files may be in a subdirectory (e.g. USX_1) of this directory.
         -and (optional)-
         htmlextras
            Files to copy directly to the html output directory, such as images, introduction files, etc.
   output — Note: output directories will be created and filled automatically as needed.
         cover — cover image(s)
         epub
         extendedusfm — may contain nonstandard markup, e.g. for Strong's numbers. This format will be phased out as USFM gains those features.
         html — your configured HTML output
         search — text and XML files for searching text or importing to limited Bible study apps
         sql — Structured Query Language files for building a Bible contents database
         usfm — Unified Standard Format Markup, great for interchange
         usfx — Unified Standard Format XML, equivalent to extended USFM in features, but easier to process
         WordML — Microsoft Word 2003 (and later) XML document format
         xetex — XeTeX format (an intermediate step to producing PDFs)
   sword — output directory for a local Sword module repository
   swordRestricted — output directory for a local Sword module repository for modules with restricted rights

Of the folders above, the ones you create are the project folders and ONE of Source, usfx, or usx folder for input under each project folder, unless you are reading the data directly from a Paratext project. In that case, you must configure the Paratext data directory (commonly c:\My Paratext Projects, but it might be any folder that you configured in Paratext). Haiola checks for a specified Paratext project first, then Source (USFM files), then if that directory doesn't exist, it checks for usfx, then if that doesn't exist, it checks for usx. You then fill the Source, usfx, or usx folder with the appropriate input files for that Bible translation. For example, if you have a Bible translation (or portion of a translation) for a language with Ethnologue code "abc" (Ambala Ayta), you would create a folder like BibleConv/Work/abc/Source/ and then put the USFM files in there. The USFM files should use Unicode UTF-8 text encoding.
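The folder lookup order described above can be sketched in a few lines of Python. This is only an illustration of the documented order (Source, then usfx, then usx), not Haiola's own code. The function names are hypothetical, and the Paratext check that comes first is left out because it depends on your Paratext configuration.

```python
from pathlib import Path
from typing import Optional

# Lookup order described above, after any configured Paratext project.
INPUT_FOLDERS = ("Source", "usfx", "usx")


def pick_input_folder(project_dir: Path) -> Optional[Path]:
    """Return the first input folder that exists, or None if there is none."""
    for name in INPUT_FOLDERS:
        candidate = project_dir / name
        if candidate.is_dir():
            return candidate
    return None


def create_project(projects_root: Path, project_id: str, kind: str = "Source") -> Path:
    """Create <projects_root>/<project_id>/<kind> and return the new folder.

    projects_root is whichever directory holds your project folders;
    it is a parameter here because this sketch does not assume its name.
    """
    if kind not in INPUT_FOLDERS:
        raise ValueError(f"kind must be one of {INPUT_FOLDERS}")
    folder = projects_root / project_id / kind
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For the "abc" example above, create_project(root, "abc") would create the Source folder, ready to receive the UTF-8 USFM files.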

If you have source books that you don't want to publish, like unfinished books in a Paratext project, you may omit them by deleting their abbreviations from a custom bookorder.txt file in the input/project directory. The bookorder.txt file is simply a list of SIL/UBS 3-letter abbreviations of the books that are to be included (if present), in the order that they are to be presented to the user. This file is also useful if you would like to reorder the books to fit different traditions, such as Messianic Jewish tradition or Roman Catholic tradition. The default book order is a common traditional order that is usually appropriate, but this gives you control when you need to exclude or reorder books in a project.

Including Illustrations

To include illustrations in lightweight HTML output, include suitably-sized illustrations in a browser-compatible format (like .jpg). Place these files in the htmlextras directory. Include the exact case-sensitive file name of those illustrations, without path information, in the "catalog" or "file name" field of the \fig ...\fig* tag. Filling in the copyright field is strongly recommended, and required for some illustrations. The reference and caption, if present, will appear below the illustration on the web page.

If illustrations have a different suffix, e.g. ".jpg" instead of ".TIF" in the htmlextras directory, but the base file name matches, the extension will automatically be changed in the HTML output. If an illustration is missing in the htmlextras directory, no <IMG> tag will be generated. Thus, if you have copyright permission to include only selected image files, just put those files for which you have permission in the htmlextras directory for that project.
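To check ahead of time which illustrations would appear, the matching rule above can be approximated like this. It is a rough sketch, not Haiola's implementation: the function name is hypothetical, and how Haiola chooses among several files that share a base name is not documented here, so this version simply takes the first one in sorted order.

```python
from pathlib import Path
from typing import Optional


def resolve_illustration(htmlextras: Path, fig_name: str) -> Optional[str]:
    """Return the file name to use in the HTML, or None to skip the image."""
    if not htmlextras.is_dir():
        return None
    wanted = Path(fig_name)
    if (htmlextras / wanted.name).is_file():
        return wanted.name
    # Fall back to a file with the same base name but a different suffix.
    for candidate in sorted(htmlextras.iterdir()):
        if candidate.is_file() and candidate.stem == wanted.stem:
            return candidate.name
    return None
```

A result of None corresponds to the case where no image tag would be generated.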

ePub generation

When generating an ePub, make sure that there is a cover image named cover.jpg or cover.png in the input project directory.

Note: cover art placed in BibleConv/covers and named with the FCBHID of a project will override and overwrite cover art in the input project directory.

HTML Output Options

Haiola currently offers 4 options for HTML generation, selected on the "Advanced" tab:

Note: please use the latest haiola.css or a derivative of it when generating Mobile HTML sites, and the latest prophero.css or a derivative of it when generating Classic HTML or concordance sites, since there are slight differences in the tag sets used. If you customize either of these, it is best to rename your resulting .css file and use the CSS name option on the advanced tab to use yours to prevent future overwrites.

See the options on the "Concordance" and "Frames" tabs of the Haiola user interface if you are using one of the two latter formats. The most-tested HTML output is with concordance and frames options turned off. Only the two simple HTML options (Mobile and Classic) support sparse books with not all chapters present. If you use the framed concordance option, the navigation generation fails if the project does not include non-canonical section titles. Therefore, I recommend that you use either or both of these options only with projects that have no incomplete books, which have introduction files, and which have \s section headers in all books. The static concordance option with frame-based navigation is a slight improvement over the original Prophero HTML output. It works well on larger screens (not smart phones). This is a stop-gap option. A better search option is now available with inScript output, but the generator for that format is currently a proprietary plugin for Haiola. Also, please note that when generating concordance files, the process takes a LONG time, and may appear frozen for several minutes at a time. The process can be sped up some by deleting the output html directory before starting. If it already exists, it will be deleted anyway and replaced. That is why web page elements that you want to persist between runs must be put in the input/project/htmlextras directory.

Files placed in the input/project/htmlextras directory and ending in "_Introduction.htm" are used in the navigational structure of frame-based HTML.

What auxiliary files does Haiola use?

Besides the files in the installation bundle, you provide any regular expression substitution files to operate on the input files to convert them to USFM. This is a systematic way to consistently change a marker that isn't consistent with the USFM standard, or possibly clean up some encoding issues. By default, fixquotes.re is called, which turns << and >> into typographic quotes, etc. The regular expression files are UTF-8 text files whose file name must end in ".re". On each line, the first character is taken as a delimiter that separates the "find" portion from the "replace" portion of the line and also ends the "replace" portion of the line. For example:

/speeling/spelling/

See fixquotes.re for another example. See http://en.wikipedia.org/wiki/Regular_expression for more about regular expressions. It is wise to test your regular expressions to make sure they do what you think they should do before trusting the transformation.
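One way to test a substitution file before trusting it is to parse it by the line format described above and try the rules on sample text. The sketch below does that with Python's re module. This is an approximation: Haiola runs on .NET/Mono, whose regular expression and replacement syntax differ from Python's in places (for example, group references in the replacement), so the sketch is only reliable for simple patterns. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_re_line(line: str) -> Tuple[str, str]:
    """Split one rule line into (find, replace) using its first character."""
    line = line.rstrip("\r\n")
    if not line:
        raise ValueError("empty rule line")
    delimiter = line[0]
    parts = line[1:].split(delimiter)
    # Expect find, replace, and a closing delimiter (an empty last part).
    if len(parts) < 3:
        raise ValueError(f"malformed rule line: {line!r}")
    return parts[0], parts[1]


def load_rules(text: str) -> List[Tuple[str, str]]:
    """Parse every non-blank line of a .re file's contents."""
    return [parse_re_line(line) for line in text.splitlines() if line.strip()]


def apply_rules(rules: List[Tuple[str, str]], text: str) -> str:
    """Apply each rule in order, as Python regular expressions."""
    for find, replace in rules:
        text = re.sub(find, replace, text)
    return text
```

Because the delimiter is whatever character starts the line, a rule whose pattern contains "/" can use another delimiter, such as "#".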

For HTML output, prophero.css is copied from the project directory, if it is there, otherwise it is copied from the input directory to the output/html directory.

On the processes tab, you may specify additional programs or scripts to run after Haiola does its transformations. Use of these transformations is optional. I use them to create digital signatures and zip files of the various output formats, then to copy the output files to a local image of a file server in the appropriate places for each project. It is up to you if you want to automate that stuff or just use a graphical file manager to do all of that.

In addition to the processes in the processes tab for each project, you may specify one process to run after all projects in a run are done. To do that, just name it "postprocess.bat" and place it in the main input directory.

If you want to present the books of the Bible in other than the default order, or if you wish to exclude certain books from publication, you may create a bookorder.txt file in a project input directory that specifies the standard 3-letter abbreviation of each book to include, one abbreviation per line at the beginning of the line, in the order the books are to be presented. Abbreviations not included are omitted from the output, so this can also be used to generate a subset project (e.g. just the New Testament, or just the books that have been cleared for publication).
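If you maintain many projects, a small script can write bookorder.txt for you. The following is a minimal sketch under the format described above (one abbreviation per line, starting at the beginning of the line). The function name is hypothetical, and the book codes you pass in are your own choice.

```python
from pathlib import Path
from typing import Iterable, List


def write_bookorder(project_dir: Path, books: Iterable[str],
                    exclude: Iterable[str] = ()) -> List[str]:
    """Write bookorder.txt listing books in order, minus any excluded ones."""
    skip = {code.strip() for code in exclude}
    kept = [code.strip() for code in books if code.strip() not in skip]
    # One abbreviation per line, with nothing before it on the line.
    (project_dir / "bookorder.txt").write_text(
        "\n".join(kept) + "\n", encoding="utf-8")
    return kept
```

For example, calling it with ["MAT", "MRK", "LUK", "JHN"] and exclude=["LUK"] writes a file listing MAT, MRK, and JHN, which leaves Luke out of the output.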

If you want to merge a standard crossreference list to the published output, you may put a file named xref.xml in the project directory. This file must be formatted like the crossreferences.xml sample file in the distribution. The "xlat" entries allow translation of book names or abbreviations to the local language, and the "xref" entries are the actual crossreference notes.

The Books List

The book name and abbreviation list (BookNames.xml) as generated by Paratext (or manually constructed) will be read to help parse references and make links from them.

Header items

Identification tab

Copyright tab

Processes tab

The upper section is for specifying the name(s) of regular expression files used to preprocess USFM files, e.g. to make them proper USFM when they start with some consistent variation on SFM-style markup, or possibly to correct a deviation from the Unicode standard.

The lower section is for running external processes on a project, possibly including steps like adding digital signatures, prestaging them for publication on a web site, running additional transformations, etc.

Media tab

Links to related media... under construction.

Messages

Progress and error messages appear here when Haiola processes are run.

Stats tab

The upper section gives statistics about the currently-selected project.

Command Line Options

If you specify "haiola -a" (without the quotes) on the command line, that is the same as pressing the "Run marked" button on startup, then closing the program when that run is completed.
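For unattended runs (for instance from a scheduled job), the command can be assembled in a script. This sketch only builds the argument list. It uses the -a option described above and the mono launcher mentioned in the Linux installation steps. The function name and the example paths are hypothetical.

```python
import sys
from typing import List


def haiola_batch_command(haiola_exe: str, platform: str = sys.platform) -> List[str]:
    """Build the argument list for an unattended "Run marked" pass."""
    command = [haiola_exe, "-a"]
    if not platform.startswith("win"):
        # Outside Windows, the .exe is started through mono.
        command.insert(0, "mono")
    return command
```

Passing the result to subprocess.run(..., check=True) would start the run and raise an error if the process exits with a nonzero status. Whether Haiola reports failures through its exit status is not stated here, so check the Messages output as well.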

Companion Command Line Utilities

The following command line utilities come with Haiola. Run one without parameters to get the syntax for its use. When running on Linux or Mac OS X, invoke them with mono, like mono massregex.exe.

USFM and USFX Tag Support

Haiola supports most of the Unified Standard Format Marker (USFM) standard, version 2.4. Study Bible sidebars and content category tags are not supported, nor are \periph section markers. Extensions to USFM are not supported except for the ones listed below, and then only for the stated use. (We only reluctantly added those markers, because there were real needs that the base standard couldn't handle.) Haiola does not read Paratext style sheets, and does not support any changes or additions to markers made there. It does read its own sfminfo.xml file, but please note that changes or additions to that file are not supported by Haiola's authors. For information about supported USFX tags, please see the USFX documentation.

Tag | End tag | Use
\ztoc4 | none | Alternate book name used for detecting references in footnotes, cross references, etc. This is normally used for "Psalm" (singular) or equivalent. This tag must appear between \toc3 and \mt# at the beginning of a book.
\zw | \zw* | Encloses a Strong's number pertaining to the following word or phrase (that is, the text between \zw* and \zx).
\zx | \zx* | Marks the end of the word or phrase that the previous Strong's number applies to. Nothing is allowed between \zx and \zx*. (Paratext hates that, but it will do it.) For example, \zw H7225\zw*beginning\zx \zx* (which is equivalent to the USFX <w s="H7225">beginning</w>). Note that the XML looks a lot more logical, but the extended USFM markers are exactly equivalent in meaning. A pair of markers is used for the end marker to avoid some of the ambiguity of how to handle such markers.
\zref | \zref* | Start of a reference hyperlink, equivalent to USFX ref element
\zsrc | \zsrc* | Source Bible reference, equivalent to USFX src attribute of the ref element
\ztgt | \ztgt* | Target Bible reference, equivalent to USFX tgt attribute of the ref element
\zweb | \zweb* | Internet URL reference, equivalent to USFX web attribute of the ref element
\zrefend | \zrefend* | Marks end of text linked by ref element; equivalent to USFX </ref>. This pair of markers is normally empty.

Note: to remove the above tags and the extra features they convey, delete everything between a tag and its own end tag, including those tags, except for \ztoc4. For that one, remove that tag and its text up to and not including the next tag (which would normally be \mt).
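The removal rule in the note above can be expressed as two regular expression substitutions. The sketch below is one possible reading of that rule, not a tool shipped with Haiola. It assumes each paired tag and its end tag sit on the same line, and the function name is hypothetical. Test it on a copy of your files first.

```python
import re

# Paired extension tags: remove the tag, its end tag, and everything between.
# "zrefend" is listed before "zref" so the longer name is tried first.
PAIRED = re.compile(r"\\(zrefend|zref|zsrc|ztgt|zweb|zw|zx)(?![\w*]).*?\\\1\*")

# \ztoc4 has no end tag: remove it and its text up to the next backslash.
ZTOC4 = re.compile(r"\\ztoc4(?!\w)[^\\]*")


def strip_extended_tags(usfm: str) -> str:
    """Return the USFM text with the z-prefixed extension tags removed."""
    usfm = ZTOC4.sub("", usfm)
    return PAIRED.sub("", usfm)
```

Applied to the Strong's number example in the table above, this leaves only the word "beginning".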

Output Formats

The following output formats are supported:

If the Sword utility osis2mod (not included in this distribution) is in your executable path, Haiola will call it with an OSIS file fragment and Sword configuration file to create a local Sword module repository.
Format | Directory | Comment
Simple HTML | html | Simple HTML optimized for just plain reading, one chapter at a time, on almost any browser or device. (Only one of the HTML options can be chosen in one run of Haiola.)
HTML with concordance | html | Includes a concordance for finding where any one word is in the Bible. (Only one of the HTML options can be chosen in one run of Haiola.)
Framed HTML with concordance | html | May not work, as frames have been deprecated in HTML 5. (Only one of the HTML options can be chosen in one run of Haiola.)
USFX | usfx | Normalized USFX. USFX is not a display format, but it is the hub format that Haiola uses to make other formats from. It is also very useful for importing into other Bible study programs, etc.
Modified OSIS | mosis | Recommended only as a step in conversion to The Crosswire Bible Society's Sword format. See more information on creating Sword modules. Note that OSIS is not necessarily the best archival or interchange format, but it is good for this application.
BibleWorks import | search/verseText.vpltxt | This format includes just the canonical text, with no formatting other than italics for "add" markers converted to square brackets [].
Verse-oriented XML | search/verseText.xml | This format includes just the canonical text, with no formatting, with each verse contained in an XML element. May be useful for import into simple Bible study programs. (Haiola uses this for creation of search indexes.)
USFM | usfm | Normalized USFM with most comments stripped out, suitable for import into Paratext, Bibledit, or Adapt It.
Extended USFM | extendedusfm | The same as the USFM files, except that these may have nonstandard, extended tags for Strong's numbers and morphological data, and maybe a tag for an additional book format in references.
ePub | epub | An ePub version 3 file with some backwards compatibility with ePub 2 readers.
Microsoft Word | WordML | This is the XML document format introduced with Microsoft Word 2003, sometimes called WordML. Since then, Microsoft has dropped support for embedding your own XML documents and schema within a Microsoft Word document. Therefore the option to embed USFX in WordML is no longer supported (but you can use an old copy of WordSend and an old copy of Microsoft Word if you really need this). Current versions of Microsoft Word can read these files if they are properly generated. (Minor USFM markup errors tend to make the WordML files not come out well-formed, so check the input well.) The WordML generation code in Haiola is extracted from WordSend. Any further updates to WordML generation will occur here in Haiola, and not in WordSend. Currently, the WordML generator does not support tables, and may not support the full range of newer USFM tags and features. Put WordSend-style seed files in the project input directory to customize the output.
Sword modules | sword or swordRestricted |
XeTeX | xetex | Under construction; not yet implemented.
PDF | pdf | Under construction; not yet implemented.
inScript | inscript | Requires proprietary extension to Haiola. See eBible.org/study for a sample. This extension is not available to the general public, but free conversion services are available for Scriptures that are Public Domain or licensed under an acceptable license that allows free redistribution.

Who is writing Haiola?

Haiola uses pieces written by Dave van Grootheest, John Duerkson, John Thomson, Nathan Miles, Michael Paul Johnson, and possibly some other people. (Kahunapule) Michael Paul Johnson is currently the only active programmer working on this project. Haiola inherits open source code from Onyx, WordSend, and Prophero. (Prophero was called SEPP at one time, and that name lives on in the source code, too.)

Where did the name “Haiola” come from?

Haiola is derived from the Hawaiian phrase “ha’i ola”, which means “preach salvation”. It is really Prophero with a new user interface and several updates. We (mostly) left the name “Prophero” behind because of confusion with a similar product. We have prior claim to the name, but it wasn’t worth the hassle and confusion of fighting for it internationally.

Support

I don't promise support with this program, but I do want it to work for you. I also want to make this program better. If you have read this document and still need help, you may send email via a secure web form at https://cryptography.org/cgi-bin/contact.cgi with enough information that I don't have to guess what the problem is. Since this document is published on the open Internet, I don't want to include a plain reference to any of my current email addresses, but if you concatenate "Michael", the usual email separator between name and domain, and eBible.org, that should work. Or, just use the contact URL above. Remember the part about reading this document first. Also, make sure you have the most recent version of this program if you have a problem, just to make sure I haven't already fixed whatever is bugging you.

If you want to get announcements of what is new with Haiola, please sign up for the announcement email list at http://groups.google.com/group/haiola?hl=en.

Copyright and Permission to Copy

Copyright © 2012-2016 SIL, EBT, DBS, eBible.org and Michael Paul Johnson.

This program is free software: you may redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License and GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Source Code

Current source code for Haiola is available at https://github.com/kahunapule/haiola. An older copy of the source code, from when we were using Mercurial, is at Palaso.