Draft; Nov 6, 1997 - was 'Convention for adding Yomi to Index Entries'
Update; Nov 9, 1997 - I now understand that yomi is a special case of a sorting key.
Comments and suggestions are welcome.
Corrections to mistakes in English are also welcome.
This file can be freely distributed if you don't modify it.
There are several ongoing projects for adding a Table of Contents and an Index to HTML-based documents (such as the HTML-based help systems Microsoft HTML Help, Netscape NetHelp 2, and Sun JavaHelp). They define TOC and Index file formats, and some of them also define the format of TOC and Index entries. To assist these efforts, I would like to present a simple and general way of adding a sorting key to each index entry.
Briefly, the format of an index entry will be:
word-spelling ~ sort-key
where the tilde (~) and sort-key are optional. When sort-key is present, it must be used to determine the entry's order; otherwise word-spelling is used, as always.
I know there are many good authoring technologies and tools created in the United States that support double-byte character sets (DBCS) so that Japanese authors can also use them. Unfortunately, these technologies and tools are not fully useful to Japanese authors if they cannot create a proper Index. Authors need extra tools or manual labor to do the work, or they have to abandon such tools and technologies altogether.
By supporting this or a similar sorting key entry format, authoring technologies and tools created outside Japan will become fully usable by Japanese authors as well. I believe this also applies to other Far East countries where characters of Chinese origin are used. Writers and programmers in these countries can then become good customers for authoring tool developers worldwide.
In English, most words consist only of letters of the Latin alphabet and are ordered by their spelling. This is not true of the Japanese language. I don't know much about other languages, but I believe this is also the case in Korea and China. I imagine it is also the case for European languages that require accented letters.
Japanese words containing kanji are ordered by their yomi, not by their spelling. Yomi is the pronunciation of a kanji word and is traditionally used for arranging words in order.
Japanese characters are divided into two groups: kana (phonetic characters) and kanji (ideographic symbols imported from ancient China).
Of course, we sometimes mix in Latin and Greek letters too.
Here is an example.
The kanji for mountain is 山. River is 河. Ancient Chinese people pronounced 山 like 'san' (さん in kana pronunciation) and pronounced 河 like 'ga' (が in kana). The Japanese kana word for mountain is やま (pronounced like 'yama'). River is かわ ('kawa'). We read 山 as 'yama' and 河 as 'kawa' when each stands alone in a sentence. However, if 山 and 河 are concatenated to form the single word 山河, meaning a landscape with mountains and rivers, they are pronounced as 'sanga' (さんが in kana).
Yomi is a kana word associated with a kanji word for pronunciation and sorting purposes. Japanese authors need some mechanism for adding yomi to every word that contains at least one kanji when they create an Index. However, yomi need not be added to words consisting only of kana and Latin letters.
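As a rough illustration of the rule above, a tool could flag the entries that still need yomi by looking for kanji in the word spelling. This is only a sketch: the function name `contains_kanji` is hypothetical, and it assumes that checking the main CJK Unified Ideographs block (U+4E00 to U+9FFF) is good enough, which ignores the extension blocks and other special cases.

```python
def contains_kanji(word: str) -> bool:
    """Return True if any character lies in the main CJK ideograph block.

    Assumption: only U+4E00..U+9FFF is checked; rarer kanji outside
    this block are not detected by this sketch.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


# Entries that would need a sorting key (yomi) under the rule above.
words = ["mountain", "やま", "山"]
needs_yomi = [w for w in words if contains_kanji(w)]
```

Kana-only and Latin-only words pass through without a sorting key; only the kanji word is flagged.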
<A NAME=... INDEXSTRING="index-entry^index-entry^...">
Multiple entries are delimited by a circumflex (^).
An example:
<A NAME=... INDEXSTRING="mountain^river">
My suggestion is to provide an optional place for adding a sorting key to an index entry. The sorting key is optional: English and kana-only words don't need one. Only words containing kanji (or perhaps accented characters) need sorting keys.
index-entry = word-spelling
index-entry-with-sort-key = word-spelling ~ sort-key
When you add a sorting key to an index entry, append a tilde (~) followed by the sorting key. One or more spaces may be inserted before and/or after the delimiting tilde for readability. If an index entry contains the entry delimiter (^), the sort key delimiter (~), or a backslash (\), put a backslash in front of that character.
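The entry format and the backslash rule can be sketched as a small parser. The names `parse_index_string` and `escape_field` are hypothetical, not part of any existing tool. The sketch also makes two assumptions that this memo does not settle: a second unescaped tilde inside an entry is kept as part of the sorting key, and empty entries are skipped.

```python
from typing import List, Optional, Tuple

ENTRY_DELIM = "^"   # separates index entries
KEY_DELIM = "~"     # separates word spelling from sorting key
ESCAPE = "\\"       # makes the next character literal


def escape_field(text: str) -> str:
    """Put a backslash in front of ^, ~ and \\ so they survive parsing."""
    special = (ENTRY_DELIM, KEY_DELIM, ESCAPE)
    return "".join(ESCAPE + ch if ch in special else ch for ch in text)


def _finish(fields: List[List[str]]) -> Tuple[str, Optional[str]]:
    # Spaces around the tilde are allowed for readability, so strip them.
    spelling = "".join(fields[0]).strip()
    key = "".join(fields[1]).strip() if len(fields) > 1 else ""
    return spelling, (key or None)


def parse_index_string(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split an INDEXSTRING value into (word_spelling, sort_key) pairs."""
    entries = []
    fields: List[List[str]] = [[]]
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == ESCAPE and i + 1 < len(value):
            fields[-1].append(value[i + 1])   # escaped character is literal
            i += 2
            continue
        if ch == ENTRY_DELIM:
            entries.append(_finish(fields))
            fields = [[]]
        elif ch == KEY_DELIM and len(fields) == 1:
            fields.append([])                 # start of the sorting key
        else:
            fields[-1].append(ch)
        i += 1
    entries.append(_finish(fields))
    # Assumption: entries with an empty spelling are dropped.
    return [e for e in entries if e[0]]
```

For instance, `parse_index_string("mountain^river")` yields two entries with no sorting key, while `parse_index_string("#include ~ include")` yields one entry whose sorting key is `include`.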
Although not related to yomi, I would like to point out that some mechanism is necessary to indicate subordinate (or indented) entries. WinHelp uses the comma (,) and colon (:) for this purpose by default.
An example:
<A NAME=... INDEXSTRING="山~やま^河~かわ^山河~さんが">
The sorting key feature is critical for Japanese authors, who need it to add yomi to entries. It should be useful to everyone, because it gives finer control over the arrangement of index entries. For example, the C preprocessor directive '#include' can be placed in the I section rather than the Symbols section if you prefer.
<A NAME=... INDEXSTRING="#include~include">
The point of this convention is this: if the authoring tool sees a tilde in the middle of an index entry, the words following the tilde should be treated as the sorting key for the words in front of the tilde.
Note that once the index entries have been sorted, the sorting keys are no longer necessary, and you don't have to keep them. For example, Microsoft adopts the W3C Site Map format for holding sorted index entries in HTML Help. The Site Map file doesn't have to keep sorting keys in it.
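The sorting step itself can be sketched as follows. The function name `sort_index` is hypothetical, and the comparison is a simplification: it compares case-folded strings by code point, whereas a real authoring tool would more likely use a locale-aware collation. Each entry is a pair of word spelling and optional sorting key.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, Optional[str]]


def sort_index(entries: List[Entry]) -> List[str]:
    """Order entries by sorting key when present, else by word spelling.

    Only the word spellings are returned, because the keys are no
    longer needed once the order has been decided.
    """
    def effective_key(entry: Entry) -> str:
        spelling, key = entry
        # Assumption: simple case-folded code point comparison.
        return (key if key is not None else spelling).casefold()

    return [spelling for spelling, _ in sorted(entries, key=effective_key)]


entries = [("main", None), ("#include", "include"), ("argv", None)]
ordered = sort_index(entries)
```

With the sorting key, '#include' lands between 'argv' and 'main' instead of at the front among the symbols, and the output carries no keys.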
I hope the reader understands that this simple addition is very useful to authors who work in countries where word spelling simply does not determine word order. I would be glad if architects of authoring systems could spend some time designing such a feature, and if programmers of authoring tools could spend some time implementing it for us.
Thank you for reading this memo.
I am ready to answer questions if you need more information about this subject. I welcome input from non-English writers about the 'yomi' of their languages and their sorting requirements for an Index.
About the author