20-865 Language Technologies in e-Commerce: Homework #2

Due by e-mail by noon, May 11, 2001.
(please send only your numbered answers)

A. Internationalization

As discussed in class, the Google search engine provides pages in multiple languages, automatically selected according to the user's preference (set either in the browser's options or via a cookie from a "Preferences" page accessible to the right of the search box). But Google also provides a number of language-specific sites that default to languages other than English.

1. Compare the German version of www.google.com with www.google.de, which defaults to German. How do the main pages differ?

Since the various elements of Google's pages are in the same positions regardless of language, one can easily determine how each English item has been translated into another language. Think of the following as a case of Example-Based Translation:

2. What Serbian term does Google use for "similar pages"? (To help you return to English, the Serbian for English is "Engleski".)

Google is working on a large number of languages for use in its interface texts. Not all of them are entirely serious. On the Preferences page, select "Bork, Bork, Bork" to get the Swedish Chef.

3. What is the Swedish-Chef text for "These search terms have been highlighted" in the header of a cached page? (Perform a search, and select the link to the cached copy instead of the direct link.)

Various other portal sites such as Lycos and Yahoo! have opted for localized sites (www.yahoo.de, www.lycos.it, etc.) instead of an internationalized site.

4. Why would one want to have multiple sites, each specific to one country?

B. Machine Translation on the Web

Compare the following free translation servers available on the web:

    SRV1  babelfish.altavista.com
    SRV2  www.freetranslation.com
    SRV3  http://www.tranexp.com:2000/InterTran
    SRV4  www.worldlingo.com/products_services/worldlingo_translator.html

5. Which translator offers the most distinct languages?

6. Which translator offers the most options for adjusting the translation or getting alternative translations?

Now, translate each of the following URLs:

    URL1 (German)   http://archiv.mopo.de/archiv/2001/20010424/dpa/onl11_3_2404_0424074242.html
    URL2 (Spanish)  http://www.abc.es/Economia/noticia.asp?id=28480&dia=03052001
    URL3 (Chinese)  http://www.e-multiweb.com/chgb/chgb.htm

into English using each of the translation services that is capable of translating from the page's source language into English.

First, compare the results without looking at the original web page (simulating a user who is completely unable to gain any information from the original text, such as a user without Asian fonts viewing Chinese pages).

7. Rate the quality of each translation. Is it understandable? Are there errors that are obvious even without having seen the original text, such as untranslated words?

    SRV1+URL1:
    SRV1+URL2:
    etc.

Now, compare the original page to the translations.

8. How well do the translators preserve the formatting of the original page? Are any of them particularly good or bad?

Finally, give your general impressions of machine translation.

9. Is MT usable for gleaning information from other languages? Explain in a few sentences.

10. Is the output of any of these systems of publishable quality? Explain in a few sentences.

=========================================