Download: docbook-sample.zip
This sample is an old version.
The newer version that also supports Epub and Kindle is here: DocBook Sample #2
The HtmlHelp output problem described below has been solved with a new sample.
I created Japanese documents in the DocBook format as an experiment. I wrote a manuscript following the DocBook XML vocabulary. From that single source, I was able to generate a Windows Help (HtmlHelp) file as well as HTML and PDF documents, using the DocBook XSL stylesheets and Apache's conversion tools.
Unfortunately, the resulting HtmlHelp file could not display Japanese characters, and searches for Japanese text failed. I had to fix the project source to work around this problem, after which the help file worked fine. This phenomenon is probably common across the CJK area, so Korean and Chinese users may experience the same problem, and I believe the same solution applies to those languages too.
DocBook is an XML format designed mainly for writing technical documents. I created HTML, PDF and Windows Help (HtmlHelp) documents from a DocBook source document in Japanese.
Here are tools I used:
This test was done under Windows XP.
In the following sections, I briefly explain the conversion procedures, the source and generated files, and the tools used to generate the output. Finally, I describe the fix applied to the HtmlHelp project source.
You will find the following folder when you unzip the sample zip file.
src/
  sample.xml <- Click to view
  toolchain.gif
The DocBook source is sample.xml and it uses a GIF graphics file.
A design principle of DocBook is to get multiple output formats from a single source. I generated sample documents in various formats from the single manuscript.
By the way, you can preview a DocBook source document directly in a Web browser with a DocBook CSS, without first converting it with an XSLT or XSL-FO processor.
Preview with DocBook CSS (WYSIWYGdocbook) -> sample.xml
DocBook XSL and an XSLT processor are used to generate HTML files from a DocBook source. DocBook XSL can generate the resulting HTML in two different configurations: single and chunked.
html/
  single.html <- Click to view
  run.bat
  chunk/
    index.html <- Click to view
    ch*.html
    ix01.html
    run.bat
The html folder contains single.html, which holds the entire contents in a single HTML file.
The chunk folder contains multiple HTML files, each holding the contents of an individual chapter or section. index.html is the table of contents. HTML files starting with ch are chapter or section text. ix01.html is a back-of-the-book index.
The run.bat found in both folders is a DOS batch file that copies the source files from the src folder to the current folder and executes the conversion tools to generate the final output.
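The contents of run.bat are not reproduced on this page. As a rough sketch only, the two HTML conversions could be expressed as Xalan command lines like the ones built below in Python. The stylesheet names html/docbook.xsl and html/chunk.xsl are the usual DocBook XSL entry points. The function name, the install directory and the assumption that the Java classpath is already configured are illustrative, not taken from the sample.

```python
from pathlib import PureWindowsPath

XALAN_MAIN = "org.apache.xalan.xslt.Process"  # Xalan-J command-line entry point


def xalan_html_command(docbook_xsl_dir, source, chunked=False, output="single.html"):
    """Return an argument list that runs Xalan on a DocBook source.

    docbook_xsl_dir is an assumed install location such as c:\\docbook-xsl-1.75.2.
    The Java classpath (Xalan and its companion jars) is assumed to be set already.
    """
    stylesheet = "chunk.xsl" if chunked else "docbook.xsl"
    xsl_path = PureWindowsPath(docbook_xsl_dir) / "html" / stylesheet
    command = ["java", XALAN_MAIN, "-IN", source, "-XSL", str(xsl_path)]
    if not chunked:
        # The chunking stylesheet writes its own files, so -OUT is only
        # passed for the single-file case.
        command += ["-OUT", output]
    return command


if __name__ == "__main__":
    xsl_dir = r"c:\docbook-xsl-1.75.2"
    print(" ".join(xalan_html_command(xsl_dir, "sample.xml")))
    print(" ".join(xalan_html_command(xsl_dir, "sample.xml", chunked=True)))
```

The returned list can be handed to subprocess.run once the tools are installed; printing it first makes it easy to compare against your own run.bat.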
HTML output was created in one pass by passing a DocBook source and DocBook XSL to the XSLT processor. Generation of a PDF document requires two conversion steps:
However, the Apache FOP XSL-FO processor includes the Apache Xalan XSLT processor and completes the two conversions in one step.
Here are the contents of the sample folder.
pdf/
  mydocbook.xsl
  sample.pdf <- Click to view
  sample.fo
  run.bat
sample.pdf is the final output PDF file. sample.fo is the intermediate FO file. You do not see this file if you use Apache FOP in the normal way; I created it with the -foout option. mydocbook.xsl is an XSL file used to pass some parameters to DocBook XSL.
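To illustrate the difference between the normal run and the -foout run, here is a minimal sketch that builds the two Apache FOP command lines in Python. The -xml, -xsl, -pdf and -foout options are FOP command-line options. The launcher name "fop", the helper name and the file arguments are assumptions, and any font configuration the real run.bat passes is left out.

```python
def fop_command(source, stylesheet, output, fo_only=False):
    """Return an Apache FOP argument list.

    With fo_only=False, FOP transforms the XML and renders a PDF in one step.
    With fo_only=True, FOP only runs the XSLT step and saves the FO file.
    The launcher name "fop" is an assumption (fop.bat on Windows).
    """
    target_flag = "-foout" if fo_only else "-pdf"
    return ["fop", "-xml", source, "-xsl", stylesheet, target_flag, output]


if __name__ == "__main__":
    print(" ".join(fop_command("sample.xml", "mydocbook.xsl", "sample.pdf")))
    print(" ".join(fop_command("sample.xml", "mydocbook.xsl", "sample.fo", fo_only=True)))
```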
The run.bat is a batch file that copies the source files from the src folder and executes the conversion tools. Here are tools used in run.bat.
Generating Windows Help (HtmlHelp) also requires two processing steps, as PDF generation does. In the first step, multiple HTML files are generated with DocBook XSL, as in the chunked HTML format. The same step also generates the project files that are input to HtmlHelp Workshop (the help compiler). In the second step, the generated project source and HTML files are compiled into a single binary help file.
htmlhelp/
  sample.chm
  index.html
  ch*.html
  htmlhelp.hhp
  toc.hhc
  index.hhk
  mydbk2hh.xsl
  run.bat
  textcook-docbook-htmlhelp-fixer.jar <- described later
sample.chm is the final output HtmlHelp file. index.html is the first page displayed. HTML files starting with ch are chapter or section text. htmlhelp.hhp, toc.hhc and index.hhk together make up the project source passed to HtmlHelp Workshop: they are the project file, the table-of-contents file and the index (keyword) file, respectively. mydbk2hh.xsl is an XSL file used to pass some parameters to DocBook XSL.
Here are tools used in run.bat.
However, the resulting HtmlHelp file is not usable as Japanese-language online help.
The problems with the HtmlHelp file created above are:
I expect that these problems will be fixed in future versions of DocBook XML and the XSLT processor. As a temporary solution, I converted the character set specification and the character codes output by DocBook XSL to those of the platform character set (Shift_JIS in the Japanese case) expected by HtmlHelp Workshop. This solution was successful.
For this purpose, I used a tool named TextCooker, developed and used at Kobu.Com for character set conversion and global search and replace.
The sample folder contains a limited version of TextCooker (textcook-docbook-htmlhelp-fixer.jar) which can be used to fix the HtmlHelp source generated by DocBook XSL. You need to install Java 1.4 or later to run this program.
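How TextCooker works internally is not described here. The following Python sketch only illustrates the general idea of the fix: rewrite the charset declaration and re-encode the text in the platform character set. It assumes the generated files declare their charset in the form "charset=NAME" and may contain numeric character references for Japanese text; both are assumptions, as are the function names. Python's cp932 codec is used as the Windows flavour of Shift_JIS.

```python
import re


def decode_high_char_refs(text):
    """Replace numeric character references above ASCII with literal characters."""
    def replace(match):
        digits = match.group(1)
        code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
        # Keep ASCII references such as &#60; so the markup is not broken.
        return chr(code) if code > 127 else match.group(0)

    return re.sub(r"&#([xX][0-9a-fA-F]+|[0-9]+);", replace, text)


def to_platform_charset(text, declared="Shift_JIS", codec="cp932"):
    """Rewrite the charset declaration and encode the text in the platform charset.

    Raises UnicodeEncodeError if a character has no mapping in the target codec.
    """
    text = re.sub(
        r"(charset=)[\w-]+",
        lambda m: m.group(1) + declared,
        text,
        flags=re.IGNORECASE,
    )
    return decode_high_char_refs(text).encode(codec)


if __name__ == "__main__":
    page = (
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        "<p>&#26085;&#x672C;</p>"
    )
    print(to_platform_charset(page).decode("cp932"))
```

For Korean or Chinese, the declared name and codec would have to be replaced with the corresponding platform character set; that variation is untested here.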
I suspect this problem also occurs in Korean and Chinese, and I expect it can be fixed with the same technique described here. I would like to hear the results from anyone who experiments with Korean or Chinese.
If you want to try generating documents from the sample source yourself, you need to set up an XSLT processor, an XSL-FO processor and DocBook XSL, edit the batch file (run.bat) in each folder to suit your environment, and run the batch file.
Here is my configuration.
set XSL=c:\docbook-xsl-1.75.2
set XLN=c:\xalan-2.7.1
set FOP=c:\fop-0.95
set XLNEXT=%XSL%\xalan-extension-1.00 <- DocBook XSL extension for Xalan
set FOPFNT=%fonts%\fonts <- contains font metrics files for Windows fonts
Kobu.Com welcomes questions and comments about this DocBook sample. Please contact us if you need help with text- and XML-based document creation or with digital publishing in general.
Presented by: Kobu.Com
Written: 2009/08/07
Updated: 2010/03/01
The published sample code is a prototype and is not complete.
Please refrain from duplicating the sample code and its document in another place.
This page is link-free. We welcome your questions and comments.