|
Maintained by Doug Cooper (bugs to doug@th.net), Center for Research in Computational Linguistics, Bangkok
|
| Query Expansion: The engine has an experimental query expansion feature; your mileage may vary. If both boxes are checked, terms that are successfully expanded are not also derived: this implementation only goes to the well once per word. |
| Expand query semantics draws on a hand-written list of likely suspects. For example, 'numeral' incorporates 'number' and 'digit', while 'Thai' includes 'Siam' and 'Siamese'. The query expansion file can be inspected at keyword.txt, and additions are welcome. |
| Derive forms draws on a machine-generated list (based on Kevin Atkinson's AGID, rev. 4). Although not all forms are merged under one lemma, expansions like 'write', 'wrote', 'written', 'writ', 'writing', 'writes' are still convenient. We derive a 110K subset (of the full 3.5 meg file) from all titles, keywords, and notes; it is at deriv.txt. |
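The interaction between the two boxes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the engine's actual code: the in-memory tables stand in for keyword.txt and deriv.txt (whose file formats are not described here), the function name `expand_query` and its parameters are hypothetical, and the entry deriving 'numerals' from 'numeral' is invented purely to show the once-per-word rule.

```python
# Hypothetical stand-in for keyword.txt (hand-written semantic expansions).
SEMANTIC = {
    "numeral": ["number", "digit"],
    "thai": ["siam", "siamese"],
}

# Hypothetical stand-in for deriv.txt (machine-generated derived forms).
DERIVED = {
    "write": ["wrote", "written", "writ", "writing", "writes"],
    "numeral": ["numerals"],  # invented entry, for illustration only
}


def expand_query(words, use_semantics=True, use_derivation=True):
    """Return one list of search terms per query word.

    A word that is successfully expanded semantically is not also
    derived: each word goes to the well only once.
    """
    expanded = []
    for word in words:
        key = word.lower()
        terms = [key]
        if use_semantics and key in SEMANTIC:
            terms += SEMANTIC[key]
        elif use_derivation and key in DERIVED:
            terms += DERIVED[key]
        expanded.append(terms)
    return expanded


if __name__ == "__main__":
    for terms in expand_query(["numeral", "write"]):
        print(" OR ".join(terms))
```

With both options on, 'numeral' picks up only its semantic expansions, while 'write', which has no semantic entry, falls through to its derived forms.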
| Thanks ... For Thai L/CL data: all contributors, particularly Judith Henchy and Eric Pawley; librarians at the Thai National Library; unsung Web-site maintainers (particularly at Chula and ANU); and the newly minted PhDs (especially those of Stan Starosta) who put the full texts of their theses on-line. |
| Mainland SEA data are derived from Franklin Huffman's Bibliography and Index of Mainland SE Asian Languages and Linguistics (1986), made available by David Stampe on his Austroasiatic page. |
| Munda data are derived from David Stampe's Munda Bibliography (to 1983), found on the same page. |