Haiola is free and open source software that assists people in publishing the Holy Bible or portions thereof in many formats. It converts Scriptures from a source format to many output formats. Supported source formats are Unicode USFM, USFX, and USX. Output formats include several styles of HTML, USFX, USFM, ePub, Microsoft WordML, Sword modules, and the others listed in the table of supported output formats below. In the future, Haiola will support conversions to PDF and to more formats used by various Bible study programs and electronic book reader devices and software.
You are responsible for making sure that you abide by all applicable copyright laws and the rules of the applicable Bible translation agencies when publishing Scripture. More importantly, due care appropriate to handling God's Word is your responsibility.
This program is under construction.
It does useful things for the authors, and we have reason to believe that it will do so for you, too, but please use it at your own risk, and only with data that you back up frequently. This is a partial alpha release, but the parts that are done are useful and in operational use. We believe that it is better to build incrementally and release frequently as new features are added or other improvements are made. This allows for early feedback as to what works, what doesn’t work so well, and what would be best to work on next. However, this also means that you need to use this program with due caution. Back up your important data frequently. I try not to make mistakes, and I test this software on my system, but I have never tried it on your system with your data. This is not consumer-level software. There aren’t many error traps. Strange data will cause very strange results and/or strange messages, so please watch for error messages and check your outputs to make sure they look reasonable. In particular, please note that bad markup that does not conform exactly to the USFM specification WILL cause problems. Customizations of the style sheet (custom.sty) in Paratext are NOT honored by Haiola, and will confuse it.
Haiola is a free and open source collection of programs. It inherits major portions of Prophero and WordSend.
See the change log for the list of recent changes.
The following prerequisites must be met to run Haiola:
Haiola is a file format conversion program. It works with a mandatory directory structure and some auxiliary files. Rather than specifying input file locations one by one, as is traditionally done, Haiola uses this directory structure, which saves a lot of time for both the user and the programmer, at least when processing large numbers of language projects. For each Bible translation project, you enter some metadata (information about the project), then run the conversions on that project or on all selected projects. Some of the details as to what to do with the outputs are left up to you, either to process manually or to supply a script or batch file to process them.
Method 1: Standard windows install
Method 2: Portable install (for installation on removable media)
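On Ubuntu Linux, add the following line to your APT sources, replacing distributionName with the code name of your Ubuntu release:
deb http://packages.sil.org/ubuntu distributionName main
Then add the SIL package signing key and install the fonts and Mono packages:
wget -O - http://packages.sil.org/sil.gpg | sudo apt-key add -
sudo apt-get install fonts-sil-andika
sudo apt-get install fonts-sil-gentium-plus
sudo apt-get install mono-complete
sudo apt-get install libmono-winforms2.0-cil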
The root of the directory structure can be anywhere in the file system where you have read and write permission. It is normally a directory called “BibleConv” in your Documents directory, on an external device, or on a network drive. You choose which directory you want to contain your Haiola project data during your first run of Haiola, or at any later time using the “Set data directory” button. Under the BibleConv directory, Haiola will create Work, Site, and FilesToCopyToOutput directories.
Note that Haiola is designed to work on multiple operating systems. Most Linux file systems are case sensitive. Windows is not. Mac OS is normally not case sensitive, but supports case-sensitive file systems, too. Linux and Mac OS use “/” to separate directories, and Windows uses “\”. In my examples, I'll use one style or the other, and leave it to you to adjust them when necessary to fit your operating system.
BibleConv
input — Note: it is your responsibility to create the input project folders within this directory.
project folders, which must be named the same as the short translation identifier
Source
Unicode USFM files for this project (if direct Paratext import is not used)
-or-
usfx
USFX file for import
-or-
usx
USX files named with the .usx suffix (e.g. an unzipped Digital Bible Library bundle). The .usx files may be in a subdirectory (e.g. USX_1) of this directory.
-and (optional)-
htmlextras
Files to copy directly to the html output directory, such as images, introduction files, etc.
output — Note: output directories will be created and filled automatically as needed.
browserBible — Browser Bible module
cover — cover image(s)
epub
html — your configured HTML output
search — text and XML files for searching text or importing to limited Bible study apps
sql — Structured Query Language files for building a Bible contents database
usfm — Unified Standard Format Markers, great for interchange
usfx — Unified Standard Format XML, equivalent to USFM in features, but easier to process
WordML — Microsoft Word 2003 (and later) XML document format
xetex — XeTeX format (an intermediate step to producing PDFs)
browserBiblecss — CSS file input for Browser Bible modules
sword — output directory for a local Sword module repository
swordRestricted — output directory for a local Sword module repository for modules with restricted rights
Of the folders above, the ones you create are the project folders and, under each project folder, ONE input folder (Source, usfx, or usx), unless you are reading the data directly from a Paratext project. In that case, you may configure a custom source directory that reads directly from the USFM files of a Paratext project. Haiola checks for a custom source directory first, then Source (USFM files); if that directory doesn't exist, it checks for usfx, and if that doesn't exist, it checks for usx. You then fill the Source, usfx, or usx folder with the appropriate input files for that Bible translation. For example, if you have a Bible translation (or portion of a translation) for a language with Ethnologue code "abc" (Ambala Ayta), you would create a folder like BibleConv/input/abc/Source/ and then put the USFM files in there. The USFM files should use Unicode UTF-8 text encoding.
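The lookup order described above can be sketched in a few lines of Python. This is a minimal illustration, not Haiola's code (Haiola is written in C#); the function name find_source_dir and its custom_source parameter are hypothetical. Using pathlib sidesteps the “/” versus “\” separator difference, but note that "Source" and "source" are different directories on case-sensitive Linux file systems.

```python
from pathlib import Path
from typing import Optional


def find_source_dir(project_dir: Path, custom_source: Optional[Path] = None) -> Optional[Path]:
    """Return the input directory to read for one project.

    Hypothetical helper illustrating the documented order:
    custom source directory, then Source, then usfx, then usx.
    """
    # 1. A configured custom source directory (e.g. a Paratext project) wins.
    if custom_source is not None and custom_source.is_dir():
        return custom_source
    # 2-4. Otherwise try Source (USFM), then usfx, then usx, in that order.
    for name in ("Source", "usfx", "usx"):
        candidate = project_dir / name
        if candidate.is_dir():
            return candidate
    # No usable input folder exists for this project.
    return None
```

For example, find_source_dir(Path.home() / "Documents" / "BibleConv" / "input" / "abc") returns the abc/Source directory if it exists, and otherwise falls back to usfx and then usx.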
If you have source books that you don't want to publish, like unfinished books in a Paratext project, you may omit them by deleting their abbreviations from a custom bookorder.txt file in the input/project directory. The bookorder.txt file is simply a list of SIL/UBS 3-letter abbreviations of the books that are to be included (if present), in the order that they are to be presented to the user. This file is also useful if you would like to reorder the books to fit different traditions, such as Messianic Jewish tradition or Roman Catholic tradition. The default book order is a common traditional order that is usually appropriate, but this gives you control when you need to exclude or reorder books in a project.
To include illustrations in lightweight HTML output, include suitably-sized illustrations in a browser-compatible format (like .jpg). Place these files in the htmlextras directory. Include the exact case-sensitive file name of those illustrations, without path information, in the "catalog" or "file name" field of the \fig ...\fig* tag. Filling in the copyright field is strongly recommended, and required for some illustrations. The reference and caption, if present, will appear below the illustration on the web page.
If an illustration file in the htmlextras directory has a different extension from the one named in the \fig tag (e.g. ".jpg" instead of ".TIF") but the base file name matches, the extension will automatically be changed in the HTML output. If an illustration is missing from the htmlextras directory, no <IMG> tag will be generated. Thus, if you have copyright permission to include only selected image files, just put the files for which you have permission in the htmlextras directory for that project.
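A minimal Python sketch of the extension-matching rule described above follows. The function name resolve_illustration is hypothetical, and two details are assumptions rather than documented behavior: an exact name match is preferred, and if several files share the same base name, the first one listed wins.

```python
from pathlib import PurePath
from typing import Iterable, Optional


def resolve_illustration(fig_name: str, htmlextras_files: Iterable[str]) -> Optional[str]:
    """Pick the htmlextras file to use for a \\fig file name, or None (no image tag)."""
    files = list(htmlextras_files)
    # An exact, case-sensitive match is used as-is.
    if fig_name in files:
        return fig_name
    # Otherwise accept a file whose base name matches exactly (case-sensitive),
    # e.g. "map1.TIF" in the \fig tag resolves to "map1.jpg".
    stem = PurePath(fig_name).stem
    for candidate in files:
        if PurePath(candidate).stem == stem:
            return candidate
    # Missing illustration: no image tag should be generated.
    return None
```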
When generating an ePub, make sure that there is a cover image named cover.jpg or cover.png in the input project directory.
Note: cover art placed in BibleConv/covers and named with the FCBHID of a project will override and overwrite cover art in the input project directory.
Haiola currently offers 4 options for HTML generation, selected on the "Advanced" tab:
Note: please use the latest haiola.css or a derivative of it when generating Mobile HTML sites, and the latest prophero.css or a derivative of it when generating Classic HTML or concordance sites, since the two use slightly different tag sets. If you customize either of these, it is best to rename your resulting .css file and select it with the CSS name option on the Advanced tab, so that future updates don't overwrite it.
See the options on the "Concordance" and "Frames" tabs of the Haiola user interface if you are using one of the latter two formats. The most-tested HTML output is with the concordance and frames options turned off. Only the two simple HTML options (Mobile and Classic) support sparse books in which not all chapters are present. If you use the framed concordance option, navigation generation fails if the project does not include non-canonical section titles. Therefore, I recommend that you use either or both of these options only with projects that have no incomplete books, that have introduction files, and that have \s section headers in all books. The static concordance option with frame-based navigation is a slight improvement over the original Prophero HTML output. It works well on larger screens (not smartphones). This is a stop-gap option. A better search option is now available with inScript output, but the generator for that format is currently a proprietary plugin for Haiola. Also, please note that generating concordance files takes a LONG time, and the process may appear frozen for several minutes at a time. You can speed it up somewhat by deleting the output html directory before starting. If that directory already exists, it will be deleted and replaced anyway. That is why web page elements that you want to persist between runs must be put in the input/project/htmlextras directory.
Files placed in the input/project/htmlextras directory and ending in "_Introduction.htm" are used in the navigational structure of frame-based HTML.
Besides the files in the installation bundle, you may provide regular expression substitution files that operate on the input files to convert them to USFM. This is a systematic way to consistently change a marker that isn't consistent with the USFM standard, or possibly to clean up some encoding issues. By default, fixquotes.re is called, which turns << and >> into typographic quotes, etc. The regular expression files are UTF-8 text files whose file names must end in ".re". On each line, the first character is taken as a delimiter that separates the "find" portion from the "replace" portion of the line and also ends the "replace" portion of the line. For example:
/speeling/spelling/
See fixquotes.re for another example. See http://en.wikipedia.org/wiki/Regular_expression for more about regular expressions. It is wise to test your regular expressions to make sure they do what you think they should do before trusting the transformation.
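To illustrate the line format, here is a hypothetical Python sketch that parses .re lines into find/replace pairs and applies them in order. It is a simplification under stated assumptions: blank and malformed lines are skipped, and the split does not handle a delimiter character escaped inside a pattern. Haiola is a .NET program, so its regular expression syntax follows .NET conventions (for example, $1 for a captured group in the replacement), whereas this sketch uses Python's re module, where the equivalent is \1.

```python
import re
from typing import List, Tuple


def parse_re_lines(lines: List[str]) -> List[Tuple[str, str]]:
    """Turn .re file lines such as "/speeling/spelling/" into (find, replace) pairs."""
    rules = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        delim = line[0]  # the first character is the delimiter
        parts = line[1:].split(delim)
        if len(parts) < 2:
            continue  # assumption: lines without a replace portion are ignored
        rules.append((parts[0], parts[1]))
    return rules


def apply_rules(text: str, rules: List[Tuple[str, str]]) -> str:
    """Apply each substitution, in file order, to the whole text."""
    for find, replace in rules:
        text = re.sub(find, replace, text)
    return text
```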
For HTML output, prophero.css is copied to the output/html directory from the project directory if it is there, or otherwise from the input directory.
On the Processes tab, you may specify additional programs or scripts to run after Haiola does its transformations. Use of these processes is optional. I use them to create digital signatures and zip files of the various output formats, then to copy the output files to the appropriate places for each project in a local image of a file server. It is up to you whether to automate those steps or just do them with a graphical file manager.
In addition to the processes in the processes tab for each project, you may specify one process to run after all projects in a run are done. To do that, just name it "postprocess.bat" and place it in the main input directory.
If you want to present the books of the Bible in other than the default order, or if you wish to exclude certain books from publication, you may create a bookorder.txt file in a project input directory that specifies the standard 3-letter abbreviation of each book to include, one abbreviation per line at the beginning of the line, in the order the books are to be presented. Abbreviations not included are omitted from the output, so this can also be used to generate a subset project (e.g. just the New Testament, or just the books that have been cleared for publication).
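The selection and ordering rule for bookorder.txt amounts to a simple filter. In this hypothetical Python sketch, order_books takes the lines of bookorder.txt and the set of book codes present in the project. It assumes upper-case codes such as GEN and MAT, and it ignores duplicate lines; both are assumptions, not documented Haiola behavior.

```python
from typing import Iterable, List, Set


def order_books(bookorder_lines: Iterable[str], available: Set[str]) -> List[str]:
    """Return the book codes to publish, in bookorder.txt order.

    Books present in the project but not listed are omitted; listed
    books that are absent from the project are skipped.
    """
    ordered: List[str] = []
    for line in bookorder_lines:
        code = line[:3]  # the abbreviation starts at the beginning of the line
        if code in available and code not in ordered:
            ordered.append(code)
    return ordered
```

For a New Testament subset, a bookorder.txt listing only MAT through REV would drop any Old Testament books found in the source folder.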
If you want to merge a standard cross-reference list into the published output, you may put a file named xref.xml in the project directory. This file must be formatted like the crossreferences.xml sample file in the distribution. The "xlat" entries allow translation of book names or abbreviations to the local language, and the "xref" entries are the actual cross-reference notes. The book name and abbreviation list (BookNames.xml) as generated by Paratext (or manually constructed) will be read to help parse references and make links from them.
The upper section is for specifying the name(s) of regular expression files used to preprocess USFM files, i.e. to make them proper USFM when they start with some consistent variation on SFM-style markup, or possibly to correct a deviation from the Unicode standard.
The lower section is for running external processes on a project, possibly including steps like adding digital signatures, prestaging them for publication on a web site, running additional transformations, etc.
Links to related media... under construction.
Progress and error messages appear here when Haiola processes are run.
The upper section gives statistics about the currently-selected project.
If you specify "haiola -a" (without the quotes) on the command line, that is the same as pressing the "Run marked" button on startup, then closing the program when that run is completed.
The following command line utilities come with Haiola. Run one without parameters to get the syntax for its use. When running on Linux or Mac OS X, invoke them with mono, like mono massregex.exe.
Haiola supports most of the Unified Standard Format Markers (USFM) standard, version 3. Study Bible sidebars and content category tags are not supported, nor are \periph section markers. Extensions to USFM are not supported except for the ones listed below, and then only for the stated use. (We only reluctantly added those markers, because there were real needs that the base standard couldn't handle.) Haiola does not read Paratext style sheets, and does not support any changes or additions to markers made there. It does read its own sfminfo.xml file, but please note that changes or additions to that file are not supported by Haiola's authors. For information about supported USFX tags, please see the USFX documentation.
tag | end tag | use |
\ztoc4 | none | Alternate book name used for detecting references in footnotes, cross references, etc. This is normally used for "Psalm" (singular) or equivalent. This tag must appear between \toc3 and \mt# at the beginning of a book. |
\zw | \zw* | Encloses a Strong's number pertaining to the following word or phrase (that is between \zw* and \zx). |
\zx | \zx* | Marks the end of the word or phrase that the previous Strong's number applies to. Nothing is allowed between \zx and \zx*. (Paratext hates that, but it will do it.) For example, \zw H7225\zw*beginning\zx \zx* is equivalent to the USFX <w s="H7225">beginning</w>. Note that the XML looks a lot more logical, but the extended USFM markers are exactly equivalent in meaning. A pair of markers is used for the end marker to avoid some of the ambiguity of how to handle such markers. |
\zref | \zref* | Start of a reference hyperlink, equivalent to USFX ref element |
\zsrc | \zsrc* | Source Bible reference, equivalent to USFX src attribute of the ref element |
\ztgt | \ztgt* | Target Bible reference, equivalent to USFX tgt attribute of the ref element |
\zweb | \zweb* | Internet URL reference, equivalent to USFX web attribute of the ref element |
\zrefend | \zrefend* | Marks end of text linked by ref element; equivalent to USFX </ref>. This pair of markers is normally empty. |
Note: to remove the above tags and the extra features they convey, delete everything between a tag and its own end tag, including those tags, except for \ztoc4. For that one, remove that tag and its text up to and not including the next tag (which would normally be \mt).
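Following the note above, the removal can be automated with regular expressions. This Python sketch (the function name strip_haiola_extensions is hypothetical) deletes each paired extension marker together with everything up to its own end marker, and deletes \ztoc4 with its text up to the next marker. It assumes the markers are not nested within themselves and that they are written exactly as in the table; check the results before relying on them.

```python
import re

# Haiola extension markers that have a matching end marker (see the table above).
PAIRED_TAGS = ["zw", "zx", "zref", "zsrc", "ztgt", "zweb", "zrefend"]


def strip_haiola_extensions(usfm: str) -> str:
    """Remove Haiola's \\z extension markers, leaving standard USFM."""
    for tag in PAIRED_TAGS:
        # The start marker must not continue into a longer name (\zref vs \zrefend)
        # or be the end marker itself; delete up to the first matching end marker.
        pattern = r"\\" + tag + r"(?![A-Za-z0-9*]).*?\\" + tag + r"\*"
        usfm = re.sub(pattern, "", usfm, flags=re.DOTALL)
    # \ztoc4 has no end marker: remove it and its text up to the next marker.
    usfm = re.sub(r"\\ztoc4(?![A-Za-z0-9*])[^\\]*", "", usfm)
    return usfm
```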
The following output formats are supported:
Format | Directory | Comment |
Simple HTML | html | Simple HTML optimized for just plain reading, one chapter at a time, on almost any browser or device. (Only one of the HTML options can be chosen in one run of Haiola.) |
HTML with concordance | html | Includes a concordance for finding where any one word is in the Bible. (Only one of the HTML options can be chosen in one run of Haiola.) |
Framed HTML with concordance | html | May not work, as frames have been deprecated in HTML 5. (Only one of the HTML options can be chosen in one run of Haiola.) |
USFX | usfx | Normalized USFX. USFX is not a display format, but it is the hub format that Haiola uses to make other formats from. It is also very useful for importing into other Bible study programs, etc. |
Modified OSIS | mosis | Recommended only as a step in conversion to The Crosswire Bible Society's Sword format. See more information on creating Sword modules. Note that OSIS is not necessarily the best archival or interchange format, but it is good for this application. |
BibleWorks import | search/verseText.vpltxt | This format includes just the canonical text, with no formatting, except that text marked with "add" markers (normally shown in italics) is enclosed in square brackets []. |
Verse-oriented XML | search/verseText.xml | This format includes just the canonical text, with no formatting, with each verse contained in an XML element. May be useful for import into simple Bible study programs. (Haiola uses this for creation of search indexes.) |
USFM | usfm | Normalized USFM with most comments stripped out, suitable for import into Paratext, Bibledit, or Adapt It. |
ePub | epub | An ePub version 3 file with some backwards compatibility with ePub 2 readers. |
Microsoft Word | WordML | This is the XML document format introduced with Microsoft Word 2003, sometimes called WordML. Since then, Microsoft has dropped support for embedding your own XML documents and schema within a Microsoft Word document. Therefore the option to embed USFX in WordML is no longer supported (but you can use an old copy of WordSend and an old copy of Microsoft Word if you really need this.) Current versions of Microsoft Word can read these files if they are properly generated. (Minor USFM markup errors tend to make the WordML files not come out well-formed, so check the input well.) The WordML generation code in Haiola is extracted from WordSend. Any further updates to WordML generation will occur here in Haiola, and not in WordSend. Currently, the WordML generator does not support tables, and may not support the full range of newer USFM tags and features. Put WordSend-style seed files in the project input directory to customize the output. |
readaloud | readaloud | Plain text files for reading out loud, with verse numbers and notes removed. |
Sword modules | sword or swordRestricted | If the Sword utility osis2mod (not included in this distribution) is in your executable path, Haiola will call it with an OSIS file fragment and Sword configuration file to create a local Sword module repository. |
XeTeX | xetex | Under construction—not yet implemented. |
Under construction—not yet implemented. | ||
inScript | inscript | Requires proprietary extension to Haiola. See eBible.org/study for a sample. This extension is not available to the general public, but free conversion services are available for Scriptures that are Public Domain or licensed under an acceptable license that allows free redistribution. |
Haiola uses pieces written by Dave van Grootheest, John Duerkson, John Thomson, Nathan Miles, Michael Paul Johnson, and possibly some other people. (Kahunapule) Michael Paul Johnson is currently the only active programmer working on this project. Haiola inherits open source code from Onyx, WordSend, and Prophero. (Prophero was called SEPP at one time, and that name lives on in the source code, too.)
Haiola is derived from the Hawaiian phrase “ha’i ola”, which means “preach salvation”. It is really Prophero with a new user interface and several updates. We (mostly) left the name “Prophero” behind because of confusion with a similar product. We have prior claim to the name, but it wasn’t worth the hassle and confusion of fighting for it internationally.
Copyright © 2012-2016 SIL, EBT, DBS, eBible.org and Michael Paul Johnson.
This program is free software: you may redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License and GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Current source code for Haiola is available at https://github.com/kahunapule/haiola. An older copy of the source code from when we were using Mercurial is at Palaso.