Per-Language Sentence-Breaking Files
For languages in which words are not delimited by spaces (Japanese, Chinese, Thai, and Korean), the Content component uses sentence-breaking libraries. In a default Content component installation, these files are stored in the langfiles directory.
If you run Content on a UNIX platform, set the LD_LIBRARY_PATH environment variable so that Content can find the sentence-breaking files that it requires.
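As an illustration of the UNIX setup, the following Python sketch builds an environment in which a directory is placed first on LD_LIBRARY_PATH before Content is started. It rests on assumptions that this document does not confirm: that the langfiles directory is the one that must be on the path, and that Content is launched from a script. The function name and the paths in the comments are hypothetical.

```python
import os


def env_with_library_path(langfiles_dir, base_env=None):
    """Return a copy of the environment with langfiles_dir first on LD_LIBRARY_PATH.

    Assumption: langfiles_dir is the directory that holds the sentence-breaking
    libraries. The caller's environment mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Keep the existing entries, dropping empty ones and any duplicate of langfiles_dir.
    others = [p for p in existing.split(":") if p and p != langfiles_dir]
    env["LD_LIBRARY_PATH"] = ":".join([langfiles_dir] + others)
    return env


# Hypothetical usage (the install path and executable name are placeholders):
#   import subprocess
#   env = env_with_library_path("/opt/content/langfiles")
#   subprocess.run(["/opt/content/content.exe"], env=env)
```

Returning a modified copy rather than changing the current process environment keeps the setting limited to the process that is launched with it.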
The following tables list the files that the individual languages require.
- Japanese

  NT                      UNIX
  japanesebreaking.dll    japanesebreaking.so
  \jpn-cha\cforms.cha     /jpn-cha/cforms.cha
  \jpn-cha\chadic.da      /jpn-cha/chadic.da
  \jpn-cha\chadic.lex     /jpn-cha/chadic.lex
  \jpn-cha\chasenrc       /jpn-cha/chasenrc
  \jpn-cha\connect.cha    /jpn-cha/connect.cha
  \jpn-cha\ctypes.cha     /jpn-cha/ctypes.cha
  \jpn-cha\grammar.cha    /jpn-cha/grammar.cha
  \jpn-cha\makeda.exe     /jpn-cha/makeda.exe
  \jpn-cha\matrix.cha     /jpn-cha/matrix.cha
  \jpn-cha\table.cha      /jpn-cha/table.cha
  libchasen.dll

- Traditional Chinese

  NT                      UNIX
  chinesebreaking.dll     chinesebreaking.so
  big5togb.txt            big5togb.txt
  wordlist.txt            wordlist.txt
  chineseconvlist.txt     chineseconvlist.txt

- Simplified Chinese

  NT                      UNIX
  chinesebreaking.dll     chinesebreaking.so
  big5togb.txt            big5togb.txt
  wordlist.txt            wordlist.txt
  chineseconvlist.txt     chineseconvlist.txt

- Thai

  NT                      UNIX
  thaibreaking.dll        thaibreaking.so
  thaidict.txt            thaidict.txt
  thaiconvlist.txt        thaiconvlist.txt

- Korean

  NT                      UNIX
  koreanbreaking.dll      koreanbreaking.so
  main.dat                main.dat
  prob.dat                prob.dat
  main.fst                main.fst
  prob.fst                prob.fst
  pos.nam                 pos.nam
  tag.nam                 tag.nam
  tagout.nam              tagout.nam
  connection.txt          connection.txt
  StopPosNam.txt          StopPosNam.txt
  TagName.txt             TagName.txt
  koreanconvlist.txt      koreanconvlist.txt
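The file lists above lend themselves to a quick installation check. The following Python sketch reports which of the listed files are absent from a given directory. The file names are transcribed from the tables; the function names, the "nt" and "unix" platform keys, and the assumption that the jpn-cha paths are relative to the langfiles directory are illustrative and are not part of the Content component.

```python
import os

# Data files common to both platforms, transcribed from the tables above.
_JPN_CHA = ["cforms.cha", "chadic.da", "chadic.lex", "chasenrc", "connect.cha",
            "ctypes.cha", "grammar.cha", "makeda.exe", "matrix.cha", "table.cha"]
_CHINESE = ["big5togb.txt", "wordlist.txt", "chineseconvlist.txt"]

DATA_FILES = {
    # Assumption: the jpn-cha paths are relative to the langfiles directory.
    "japanese": [os.path.join("jpn-cha", name) for name in _JPN_CHA],
    "traditional chinese": _CHINESE,
    "simplified chinese": _CHINESE,
    "thai": ["thaidict.txt", "thaiconvlist.txt"],
    "korean": ["main.dat", "prob.dat", "main.fst", "prob.fst", "pos.nam",
               "tag.nam", "tagout.nam", "connection.txt", "StopPosNam.txt",
               "TagName.txt", "koreanconvlist.txt"],
}

# Library name without its extension (.dll on NT, .so on UNIX).
LIBRARY_STEM = {
    "japanese": "japanesebreaking",
    "traditional chinese": "chinesebreaking",
    "simplified chinese": "chinesebreaking",
    "thai": "thaibreaking",
    "korean": "koreanbreaking",
}

# Files that the tables list for one platform only.
PLATFORM_ONLY = {("japanese", "nt"): ["libchasen.dll"]}


def required_files(language, platform):
    """Return the relative paths listed for a language on "nt" or "unix"."""
    extension = {"nt": ".dll", "unix": ".so"}[platform]
    files = [LIBRARY_STEM[language] + extension]
    files += DATA_FILES[language]
    files += PLATFORM_ONLY.get((language, platform), [])
    return files


def missing_files(langfiles_dir, language, platform):
    """Return the listed files that do not exist under langfiles_dir."""
    return [path for path in required_files(language, platform)
            if not os.path.isfile(os.path.join(langfiles_dir, path))]


# Hypothetical usage (the directory is a placeholder):
#   for path in missing_files("/opt/content/langfiles", "korean", "unix"):
#       print("missing:", path)
```

An empty result means only that every listed file is present; it does not confirm that the libraries load or that the data files are valid.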