Unsupervised Morphological Segmentation Output

This page distributes unsupervised morphological segmentation output for Bengali. The output was generated by the system introduced in the following paper:

High-Performance, Language-Independent Morphological Segmentation.
Sajib Dasgupta and Vincent Ng.
In Proceedings of NAACL HLT, Rochester, New York, 2007.

Here are the files:

Segmented Output: 143.8K segmented words.

Mapping: The transliteration scheme we used to map Bengali script to English characters.