

WHO WE ARE





The Center was established in 1994 to do research in computing and languages in the Mekong Valley
region -- primarily Thai, Lao, Khmer, and Burmese. Our work focuses on computational linguistics -- using computers to study language, primarily by analyzing large text corpora. In the process, we do basic
research that underlies applications in software development, education, and linguistics.
The Center is modeled on organizations like the GNU Project and the Free Software Foundation.
We think that for research and industry to flourish, a wide range of tools and data
have to be available in the public domain. It's good science -- performance must be measured
against the same raw data to be meaningful; and it's good business -- companies can focus on
adding value, rather than reinventing the wheel.
The Center, which is presently being organized as a nonprofit Thai foundation, gives away everything
it produces, including data and source code for all software. Anybody can use our
results for any purpose -- other scholars can improve them, and private companies can use them to
improve their products. Our only restriction is that the free version must always be available as well.


WHAT WE DO




Research in linguistics and commercial software development both depend on
a variety of resources: word lists, published algorithms, on-line text data, and so on.
Such resources are produced by long-term cooperative effort; research teams make use
of what's there, and contribute what they can.
In Southeast Asia, however, such resources are rarely available, and when they have been collected,
they are usually regarded as 'trade secrets.' Ironically, more publicly available software and data involving
this region can be found in the US and Europe than at home.
The Center was founded to help change this situation.
We seek support from a variety of sources: companies in high-tech industries, as well as individuals,
companies, and foundations that wish to help Thailand take the lead
in technological research in Southeast Asia. In return, we do the kind of basic research that may
not be profitable for individual companies, but which benefits everybody.


FOR EXAMPLE . . .





A typical research issue we're interested in is the problem of segmentation in the languages of mainland Southeast Asia.
These languages, which include Thai, Burmese, Lao, and Khmer, don't ordinarily use spaces to separate words. Here's an
example from Thai:
A program that breaks a sentence into alternatives is easy (see segmenting text).
But choosing the right one is an extremely difficult
proposition -- partly because doing it correctly requires statistical data that
doesn't exist, and grammatical analysis that hasn't yet been done.
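To make the "easy" half concrete, here is a minimal sketch of a program that lists every way to split an unspaced string into dictionary words. It is an illustration only, not the Center's software: the function name, the four-word lexicon, and the sample string are all assumptions chosen for the demonstration. The sample is a commonly cited Thai ambiguity that can be read either as "round eyes" or as "exposed to the wind."

```python
def segmentations(text, lexicon):
    """Return every way to split text into words drawn from lexicon.

    Illustrative sketch: it enumerates alternatives by brute force and
    makes no attempt to choose among them.
    """
    if not text:
        # An empty string has exactly one segmentation: no words at all.
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word, then segment whatever follows it.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon; a real one would hold many thousands of entries.
LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

alternatives = segmentations("ตากลม", LEXICON)
for words in alternatives:
    print(" ".join(words))
```

The sketch returns both readings and has no basis for preferring one. Ranking them is the hard part, and it is where the missing statistical data and grammatical analysis would come in.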
Solving problems like this is terribly important for future research and development.
Almost any kind of investigation or application that involves text -- from spell checking, to fast database searching, to
optical character recognition, speech synthesis, and beyond -- requires analysis of vast amounts of segmented text.
We think it's time to step back from short-term product development, and look for solutions to the hard underlying problems.


WHY GET INVOLVED?





The Center can be a model for cooperative research in Thailand. We believe that the best way to
encourage change is by example -- to show that there is more to gain by sharing results openly
than by keeping secrets.
Our work has both scholarly and practical benefits. Published research in international journals and
conferences is the strongest possible advertisement for Thailand as an intellectual destination, and
not merely a low-wage manufacturing site.
In practical terms, we are able to contribute to the development of a regional software industry. We can do
research that small companies cannot afford, and that multinational companies, usually only interested in
minimal 'localization' of their products, are not willing to invest in.


FOR MORE INFORMATION





To find out more about the Center for Research in Computational Linguistics, please contact
doug@nwg.nectec.or.th.
We are currently attempting to establish a project-oriented internship program in Thailand.
Until that becomes a reality, we are more than happy to discuss cooperation on specific
projects via the Internet.
Coming to Thailand? Please visit our office in Bangkok's Pratunaam district.


ABOUT OUR STAFF





Doug Cooper is the Chief Research Scientist at the Center. He is well-known in the Computer Science
community; his nine college-level textbooks (which include Oh! Pascal!)
have been adopted at over 1,000 universities around the world. As a
faculty member in the Department of Electrical Engineering and Computer Science, UC Berkeley, and the Department
of Computer Science, Smith College, Prof. Cooper acquired considerable experience in getting projects done ;-}
His most recent papers, "Font Design for Thai/English Typesetting," and "Fuzzy Letters and Thai Optical Character
Recognition," were presented at the Symposium on Natural Language Processing in Thailand '95, sponsored by
Kasetsart University and NECTEC. Jump to
PAPERS for these and other recent works.
All original work © 1995 Doug Cooper. Please see this
disclaimer, which takes responsibility for content, and the
release notice, which gives you the right to copy it.
We believe that all files referenced by these pages may be distributed for research / educational purposes.
If any file should not be distributed, please let us know and we will remove it.