April 18, 2007

Top Information Retrieval Components for the Chinese Language

The Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) was recently updated to version 3.0. ICTCLAS provides word segmentation, part-of-speech tagging, unknown-word recognition, and support for user-defined dictionaries.

"With ICTCLAS 3.0, the speed of word segmentation can reach 996 KB/s and precision can reach 98.45%, yet the API package is only 200 KB. After compression, all the dictionaries together take up only 3 MB. It must be the best Chinese lexical analysis system," according to the official website of the component.

You can integrate the API into your Java, C#, or C/C++ applications. The downloaded package contains samples for these languages on both Linux and Windows.

A trial version of the package can be downloaded from the official site. The site also offers several other Chinese language processing tools for download. I am interested in one of them, a component that extracts the body text of web pages.

Finally, here is the website address: http://www.i3s.ac.cn
By the way, i3s stands for "Division of Information Intelligence and Information Security".






