*
University of Library and Information Science
1-2 Kasuga, Tsukuba, Ibaraki 305, Japan
**
Nara Institute of Science and Technology
8916-5 Takayama, Ikoma, Nara 630-01 Japan
***
Kyoritsu Women's University
3-27 Kanda-Jinbocho, Chiyoda-ku, Tokyo 101, Japan
{myriam, saka, sugimoto, tabata}@ulis.ac.jp,
aki-mae@is.aist-nara.ac.jp,
take@bungei.kyoritsu-wu.ac.jp
http://www.DL.ulis.ac.jp/oldtales
Another important aspect of multilingual e-texts is access to foreign languages and cultures. With the expansion of the WWW, the general public and even children have become an important category of users. For example, we can find a number of home pages which provide folk tales for children as well as grown-ups. Folk tales can serve as an introduction to the culture of a country and as enjoyable reading through which children develop an interest in foreign cultures. Since every nation has tales, they are fruitful material for a multilingual collection on the global network for world-wide users. As a matter of fact, some folk tale home pages provide multilingual access [1][2], but they contain distinct versions of the same stories in different languages.
Web pages that provide multilingual texts have to give users access to their texts through off-the-shelf browsers. However, these browsers cannot display all languages, because they are limited by the character code sets they accept and the font sets available to them. For example, if you are in Europe or in the United States, your browser may not be able to display documents written in an Asian language that does not employ the Roman script, e.g. Japanese, Thai, and Korean. One way of solving this problem is for users to install different sets of fonts on their machines. However, it is not practical to install fonts for all languages used on the Internet.
Unicode will ease the multilinguality problem, but users still have to load fonts for the complete set of Unicode characters. We believe that the key point for multilingual information distribution is simple and inexpensive technology for users, especially those who casually access multilingual information. To that end, we have developed a multilingual browser, called the Multilingual HTML Browser (MHTML Browser), which allows users to view documents written in foreign languages without installing any fonts on their machines in advance. The MHTML browser is implemented in Java, so the user needs only an off-the-shelf WWW browser capable of running Java applets.
This article presents a multilingual e-text collection of Japanese old tales which is provided to users through the MHTML technology. The collection contains e-texts written in English, French, and Japanese. This article briefly describes the MHTML technology; details are given elsewhere [3][4][5].
2.1 Folk Tales as Multilingual Information Resource
Folk tales are a world-wide cultural heritage. Each nation has its own folk tales which have been transmitted from generation to generation. Tales were transmitted orally and were written down only rather recently in our history. A folk tale has a number of variations and shows us the rich cultural background of a nation. On the one hand, folk tales are regional; on the other hand, they are universal, and we can find many themes in them which are common among nations. They tell us stories about humankind and the rules of the human community; for example, you must not judge a man on his appearance or on his size. This is the theme of the Japanese tale "Issunboshi" and of the French tale "Le Petit Poucet". Both of the heroes are small men, but both of them are courageous and clever. Another common feature is the diversity of protagonists and leading characters, e.g., human beings, animals, imaginary beings and things common in our lives. We can also find cultural diversity in folk tales, e.g. in descriptions of everyday life and habits, the characteristics of leading characters and the background of a story. Thus, folk tales are very rich material for sharing cultural heritage across the world and for understanding cultural diversity.
Folk tales have been transmitted from parents to children, and to grandchildren, as an enjoyable medium for learning what they need to live in their community. In some cases there are several variants of a tale which differ region by region, or even village by village. Thus, folk tales are an important part of the culture of a nation, a region or a village. This fact implies that folk tales give people a common background as members of a community. For the people of a community, which may be nation-wide, region-wide, or village-wide, a leading character of a tale represents a certain meaning, e.g., bravery, gentleness, beauty and so on. For example, the leading character of the Japanese folk tale "Momotaro" is a young boy called Momotaro, and he is a symbol of strength and bravery for the Japanese people.
Folk tales have been orally transmitted from one generation to the next within a community. Such a community is, in a sense, quite different from the communities on the WWW and the Internet. The traditional community is built on geographical proximity, but geographical distance is meaningless on the Internet, which provides us with a global communications infrastructure. This means that cross-lingual and cross-cultural information is crucial on the Internet, since users can access foreign information very easily. In addition to researchers and students, members of the general public and children are an important part of the user communities on the Internet. We have been working on the multilingual folk tale collection in order to create a shareable information resource based on multiple cultures for various users with different cultural backgrounds.
2.2 Building the Folk Tales Collection
The multilingual e-text collection of Japanese old tales contains ten tales chosen from well-known Japanese tales. Every tale is written in Japanese, English, and French. The collection has three parallel entrance pages (i.e. home pages), written in these three languages and linked from the primary entrance [http://www.DL.ulis.ac.jp/oldtales/]. Maintaining entrance pages for multiple languages is important for users, but it makes our maintenance procedure a little complicated. All three home pages are organized in the same structure. In this section, we explain the structure using the English pages. Three pages are linked from the home page: a multilingual page, a monolingual page with MHTML support, and a monolingual page without MHTML support.
The multilingual page contains a multilingual text viewer implemented as a Java applet based on the MHTML technology. The first page contains a table of contents written in English, French, and Japanese. The table of contents is displayed in parallel as illustrated in Figure 1.
This multilingual table of contents is implemented as a table in which each element is an applet that displays multilingual text. Since font glyphs are supplied from the folk tales server, a user need not install fonts in advance. A click on the title of a story, in whichever language, leads the user to the chosen tale. The texts of the tale in the three languages are then displayed in parallel as shown in Figure 2.
Illustrations are added to the texts as an important component to make the texts attractive for readers, especially for children. Every document linked from the table of contents is organized in the same way. This use of multilingual e-text is reminiscent of bilingual books, which allow the reader to switch immediately from one language to the other. The advantage of this system is its flexibility; it is possible to present the same text in three or even more languages in parallel. Moreover, this system has the potential to offer users a more flexible service, such as choosing the set of languages in which to read the texts.
In the monolingual pages, readers can get a text written in English, French or Japanese. The monolingual page without MHTML provides the texts in HTML; readers are assumed to have Japanese and Latin-1 fonts to read these texts. The monolingual page with MHTML offers another presentation of the multilingual texts for users who do not want to browse several languages in parallel. This access point to the Japanese old tales collection provides the user with the text of a tale in one language only. The text is displayed using the same applet as used in the multilingual page; the difference is that, in this case, only one panel is displayed on the screen. Figure 3 shows a page displaying one text.
As shown in the next section, the multilingual document panel implemented as an applet receives an object which contains a source text string and the minimum set of font glyphs required to display the text. Since the object is automatically created from the source HTML text by an MHTML server, the tales are encoded purely in HTML. The same HTML text is used for these three different interfaces.
The MHTML browser system is composed of two components: an MHTML server and an MHTML viewer applet. The MHTML server converts an HTML document into an MHTML document object on the fly and sends the object and the applet to a client. The applet running on the client receives the object and displays the text encapsulated in it. Figure 5 shows an overview of the MHTML browser system. (The details of MHTML are described in [3][4][5].)
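The idea of a document object that carries a source text string together with the minimum set of font glyphs can be illustrated with a short Java sketch. This is not the actual MHTML implementation: the class names `MhtmlSketch` and `DocumentObject`, the `GlyphSource` interface, and the 32-byte dummy bitmaps are all assumptions made for illustration. The sketch only shows how each distinct character in a text might be collected once, so that no more glyphs than necessary are sent to the client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; these are not the actual MHTML classes. */
final class MhtmlSketch {

    /** A document object: the source text plus only the glyphs it needs. */
    static final class DocumentObject {
        final String text;
        final Map<Integer, byte[]> glyphs; // code point -> glyph bitmap

        DocumentObject(String text, Map<Integer, byte[]> glyphs) {
            this.text = text;
            this.glyphs = glyphs;
        }
    }

    /** Stand-in for a server-side font that yields a bitmap per code point. */
    interface GlyphSource {
        byte[] glyphFor(int codePoint);
    }

    /** Collects each distinct non-whitespace code point once, in order of first use. */
    static DocumentObject build(String text, GlyphSource font) {
        Map<Integer, byte[]> glyphs = new LinkedHashMap<>();
        text.codePoints().forEach(cp -> {
            if (!Character.isWhitespace(cp)) {
                glyphs.computeIfAbsent(cp, font::glyphFor);
            }
        });
        return new DocumentObject(text, glyphs);
    }

    public static void main(String[] args) {
        // Dummy 16x16 one-bit bitmaps (32 bytes each); a real server
        // would read the bitmaps from font files instead.
        GlyphSource dummyFont = cp -> new byte[32];

        // "Momotaro" written in five phonetic characters (as Unicode
        // escapes), followed by the same name in Roman script.
        String text = "\u3082\u3082\u305f\u308d\u3046 Momotaro";
        DocumentObject doc = build(text, dummyFont);

        System.out.println("characters: " + text.codePointCount(0, text.length()));
        System.out.println("glyphs sent: " + doc.glyphs.size());
    }
}
```

In this sketch repeated characters cost nothing extra, which is why a short text in any script needs only a small fraction of a full font.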
The MHTML technology is advantageous for browsing multilingual documents in
the following aspects.
Building a multilingual folk tales collection involves other problems as well, including collecting tales and translating them into other languages. First, let us describe our strategy for collecting tales. We chose ten famous folk tales. Those tales are very old and have no copyright restrictions. (Books containing those tales are subject to copyright, but the tales themselves are in the public domain.) We wrote our own versions of the tales in Japanese and translated them into French and English.
This adaptation was necessary not only to resolve the copyright problems but also to cope with variants of the tales. Each tale has several variants; some of them were created by the authors of books and some are regional variants. The range of variations of a tale is quite large, from the characters' names to key elements of the story. For example, the Japanese tale "Cracking Mountain" has what we could call a "hard" and a "soft" version. In the "hard" one, the wicked badger kills an old woman, cooks her, and after taking the appearance of the old woman, makes the husband eat the stew made out of his own wife. In the "soft" version, there is no human stew. So we had to cope with the variants of each tale and, just as other authors did, write our own adaptation. For this purpose, we examined and compared at least four or five variants, and also read documents on the sources of the tale. We then selected elements and wrote our own version of the folk tale.
The adaptation was also required to deal with the characters which appear in a tale and their properties. Quite a few common characters appear in the tales of different countries; however, they represent different meanings. For example, the tortoise is a symbol of longevity and mutual love in Japanese tales (e.g. "Urashimataro") but is a symbol of tenacity in Europe (e.g. "The Hare and the Tortoise"). This point is quite important for translation from Japanese into English and French.
4.2 Issues in Translation of Folk Tales
We first wrote a Japanese text of each tale, which serves as the source text in our collection. Then we translated it into English and French. At this point, we had to deal with the translation of concepts. The basic concept of an ogre is common to both Japan and Europe, but there are several differences. In Europe, an ogre is usually a nearly human character, a man who eats little children (e.g. "Le Petit Poucet", a French tale). Sometimes it may be an old woman, as in "Hansel and Gretel", but this is not common. In Japan, an ogre ("Oni" in Japanese) is not a human being; it is much more a monster which behaves like a robber and usually does not eat people (e.g. "Momotaro the Peach-boy", "Issunboshi"). So it was quite difficult to choose a term for the Japanese "Oni", because the word "ogre" conveys a different meaning in Europe and there is no appropriate word representing "Oni" in the European cultural context. On the other hand, the term "ogre" seems appropriate as a translation of the Japanese "Yamamba", a character who is usually an old woman who eats people. But "ogre" generally refers to a man, and an old woman like "Yamamba" would rather be a witch. Translation therefore raises many problems, which have to be solved by more or less free interpretation in order to preserve the integrity of the original tales.
4.3 Issues of Human Resource
Human resources, i.e., native speakers and/or language specialists, are crucial to developing a multilingual system. The translated texts of the e-text collection have been checked by native speakers, which is necessary to guarantee the quality of the texts. This check addresses not only grammatical accuracy but also the selection of words and phrases, which requires cultural sensitivity. The MHTML technology is advantageous in this respect because we can distribute MHTML servers to locations where the human resources are available to build collections for local languages, and so create a multilingual collection as a whole.
4.4 Issues Specific to Japanese e-texts: Multiple Character Sets
With respect to the Japanese texts, we had to deal with the problem of Chinese characters, which are called "Kanji". The number and difficulty of the characters used affect the difficulty of the text. As we wanted to provide texts that are easy for children to read, we had to use only characters that children can read, i.e. "Hiragana", which is a set of about 50 phonetic characters, and a limited number of Kanji which children learn in the lower grades of elementary school. However, a text written mainly in Hiragana is not easy for an adult to read. The usual solution in Japanese books is to write the text using Kanji where adults expect them and to add a Hiragana transcription for children as a superscript on the Kanji, which is called "Furigana" in Japanese. This additional text would be useful not only for children but also for foreign readers. However, it is difficult to add Furigana because HTML has no function to add such superscripts.
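The restriction to a limited set of Kanji lends itself to a mechanical check. The Java sketch below is only an illustration and is not part of the collection's actual tooling: the class name `KanjiLevelSketch`, the method `kanjiOutsideList`, and the two-character allowed list are assumptions. It uses the standard `Character.UnicodeScript` classification to find the Kanji in a text that fall outside a given list, leaving phonetic characters and Roman letters alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of a check for Kanji outside an allowed list. */
final class KanjiLevelSketch {

    /** Returns the Kanji in text that are not in allowedKanji, in order of first use. */
    static Set<String> kanjiOutsideList(String text, Set<String> allowedKanji) {
        Set<String> found = new LinkedHashSet<>();
        text.codePoints()
            // Kanji belong to the Unicode script HAN; phonetic characters do not.
            .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
            .mapToObj(cp -> new String(Character.toChars(cp)))
            .filter(kanji -> !allowedKanji.contains(kanji))
            .forEach(found::add);
        return found;
    }

    public static void main(String[] args) {
        // Illustrative allowed list only: U+7537 ("man") and U+5B50 ("child").
        // A real list would come from the school curriculum.
        Set<String> allowed = new HashSet<>(Arrays.asList("\u7537", "\u5b50"));

        // "A boy was born from a peach", written with three Kanji
        // (U+6843 "peach", U+7537, U+5B50) and phonetic characters.
        String sentence =
            "\u6843\u304b\u3089\u7537\u306e\u5b50\u304c\u3046\u307e\u308c\u307e\u3057\u305f";

        for (String kanji : kanjiOutsideList(sentence, allowed)) {
            System.out.printf("not in list: U+%04X%n", kanji.codePointAt(0));
        }
    }
}
```

Characters reported by such a check are the candidates for rewriting in phonetic characters or, where the display technology allows it, for a phonetic annotation.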
We hope to extend the collection to other languages in order to fully use the multilingual environment and its capacities. We also plan to extend it to the tales of other countries in the future.
In addition to the e-texts, our future work also includes defining metadata for multilingual texts and creating a flexible user interface for children. We believe that metadata for coexisting multilingual texts have to be provided to extend the collection. Since the old tales collection is created not only for adult readers but also for children, we are working on a user interface designed for children making use of images and animations.
The authors would like to thank Frances Marr, who was an English teacher at ULIS. We could not have translated the tales into English without her help.
[4] Tetsuo Sakaguchi, Akira Maeda, Takehisa Fujita, Shigeo Sugimoto, and Koichi Tabata: A Browsing Tool for Multi-lingual Documents for Users without Multi-lingual Fonts. Proceedings of the 1st ACM International Conference on Digital Libraries, pp. 63-71, Mar. 1996.
hdl:cnri.dlib/october97-sugimoto