OVERVIEW | CRCL | CALLS | DICTS | FONTS | SOFTWARE | PAPERS | PROJECTS | WHO?... | LISTS | SPOKEN... | REF CARDS | SEEKING | BASICS... | HOW?... | CLOCKS | LOCAL | CONTENTS...

COMPUTING AND LINGUISTICS IN THAILAND

ABOUT THE CENTER FOR RESEARCH IN
COMPUTATIONAL LINGUISTICS, BANGKOK

WHO WE ARE

The Center was established in 1994 to do research in computing and languages in the Mekong Valley region -- primarily Thai, Lao, Khmer, and Burmese. Our work focuses on computational linguistics -- using computers to study language, primarily by analyzing large text corpora. In the process, we do basic research that underlies applications in software development, education, and linguistics.

The Center is modeled on organizations like the GNU Project and the Free Software Foundation. We think that for research and industry to flourish, a wide range of tools and data have to be available in the public domain. It's good science -- performance must be measured against the same raw data to be meaningful; and it's good business -- companies can focus on adding value, rather than reinventing the wheel.

The Center, which is presently being organized as a nonprofit Thai foundation, gives away everything it produces, including data and source code for all software. Anybody can use our results for any purpose -- other scholars can improve them, and private companies can use them to improve their products. Our only restriction is that the free version must always be available as well.

WHAT WE DO

Research in linguistics and commercial software development both depend on a variety of resources: word lists, published algorithms, on-line text data, and so on. Such resources are produced by long-term cooperative effort; research teams make use of what's there, and contribute what they can.

In Southeast Asia, however, such resources are rarely available, and when they have been collected, they are usually regarded as 'trade secrets.' Ironically, more publicly available software and data involving this region can be found in the US and Europe than at home.

The Center was founded to help change this situation. We seek support from a variety of sources: companies in high-tech industries, and individuals, companies, and foundations who wish to help Thailand take the lead in technological research in Southeast Asia. In return, we do the kind of basic research that may not be profitable for individual companies, but which benefits everybody.

FOR EXAMPLE . . .

A typical research issue we're interested in is the problem of segmentation in the languages of mainland Southeast Asia. These languages, which include Thai, Burmese, Lao, and Khmer, don't ordinarily use spaces to separate words. Here's an example from Thai:

A program that breaks a sentence into alternatives is easy (see segmenting text). But choosing the right one is an extremely difficult proposition -- partly because doing it correctly requires statistical data that doesn't exist, and grammatical analysis that hasn't yet been done.
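
To make the "easy" half concrete, here is a minimal sketch in Python of a program that lists every way an unspaced string can be split into dictionary words. It is an illustration only, not the Center's program: the function name `segmentations` and the four-word `TOY_LEXICON` are assumptions made for this example, and a real system would need a full word list. The sample string is a commonly cited ambiguous Thai string that can be read as "round eye" or as "exposed to the wind".

```python
from functools import lru_cache


def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    words = frozenset(lexicon)
    longest = max(map(len, words), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], each as a tuple of words.
        if start == len(text):
            return ((),)  # exactly one way to segment nothing
        results = []
        last_end = min(len(text), start + longest)
        for end in range(start + 1, last_end + 1):
            word = text[start:end]
            if word in words:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(s) for s in split_from(0)]


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

for candidate in segmentations("ตากลม", TOY_LEXICON):
    print(" | ".join(candidate))
```

The sketch only enumerates candidates. It has no way to say which reading is right, which is exactly the hard part: ranking the alternatives would need the word statistics and grammatical analysis mentioned above.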

Solving problems like this is terribly important for future research and development. Almost any kind of investigation or application that involves text -- from spell checking, to fast database searching, to optical character recognition, speech synthesis, and beyond -- requires analysis of vast amounts of segmented text. We think it's time to step back from short-term product development, and look for solutions to the hard underlying problems.

WHY GET INVOLVED?

The Center can be a model for cooperative research in Thailand. We believe that the best way to encourage change is by example -- to show that there is more to gain by sharing results openly than by keeping secrets.

Our work has both scholarly and practical benefits. Published research in international journals and conferences is the strongest possible advertisement for Thailand as an intellectual destination, and not merely a low-wage manufacturing site.

In practical terms, we are able to contribute to the development of a regional software industry. We can do research that small companies cannot afford, and that multinational companies, usually only interested in minimal 'localization' of their products, are not willing to invest in.

FOR MORE INFORMATION

To find out more about the Center for Research in Computational Linguistics, please contact doug@nwg.nectec.or.th

We are currently attempting to establish a project-oriented internship program in Thailand. Until that becomes a reality, we are more than happy to discuss cooperation on specific projects via the Internet.

Coming to Thailand? Please visit our office in Bangkok's Pratunaam district.

ABOUT OUR STAFF

Doug Cooper is the Chief Research Scientist at the Center. He is well known in the computer science community; his nine college-level textbooks (which include Oh! Pascal!) have been adopted at over 1,000 universities around the world. As a faculty member in the Department of Electrical Engineering and Computer Science, UC Berkeley, and the Department of Computer Science, Smith College, Prof. Cooper acquired considerable experience in getting projects done ;-}

His most recent papers, "Font Design for Thai/English Typesetting," and "Fuzzy Letters and Thai Optical Character Recognition," were presented at the Symposium on Natural Language Processing in Thailand '95, sponsored by Kasetsart University and NECTEC. Jump to PAPERS for these and other recent works.

All original work © 1995 Doug Cooper. Please see this disclaimer, which takes responsibility for content, and the release notice, which gives you the right to copy it. We believe that all files referenced by these pages may be distributed for research / educational purposes. If any file should not be distributed, please let us know and we will remove it.