Thesis title: Research on an Automatic Word Segmentation System Based on Ontology and Fuzzy Mathematics
Advisor: Liu Wanchun. Degree: Master's. Major: Computer Application Technology. Institution: Beijing Institute of Technology. Length: 61 pages. File size: 1404K. Date: 2008-06-05. Source: 论文网 http://www.lw23.com/lunwen_68445572/

Keywords: Chinese automatic word segmentation; ambiguity; HowNet; fuzzy mathematics; word building

Abstract

Chinese automatic word segmentation is a foundational task in Chinese information processing and one of the bottlenecks in the field's development. The handling of ambiguous fields is the key factor in segmentation accuracy. Many researchers in China and abroad have studied this problem in depth, but the current state of the art still does not meet the needs of practical applications.

This thesis reviews the current state of Chinese automatic word segmentation and the basic segmentation methods, and analyzes the difficulties that segmentation systems face. After an in-depth study of the HowNet semantic network and fuzzy mathematics theory, it identifies the points at which they can be combined with automatic segmentation and proposes a segmentation scheme based on HowNet and fuzzy mathematics in which the input Chinese text is scanned twice. The first scan finds the words in each sentence that are certain to be unambiguous, together with information about the ambiguous fields. The second scan finds the correct segmentation of each ambiguous field, which yields the segmentation of the whole sentence.

HowNet serves as the main knowledge source. A HowNet sememe network and a concept network are built to store the entries for sememes and concepts and the relationships among them. The HowNet concept knowledge dictionary is used as the basic segmentation dictionary, and disambiguation rules are formulated according to fuzzy mathematics theory. After a sentence receives a preliminary segmentation based on the "word building" (砌词) idea, knowledge from the concept network and the disambiguation rules are applied to resolve the ambiguous cases.

To address ambiguity in Chinese automatic word segmentation, the thesis sets out the design goals for a segmentation system based on the HowNet semantic relation network and a fuzzy reasoning mechanism, explains its working principle and design in combination with the "word building" idea, and presents an initial prototype implementation.
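The two-scan scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy lexicon, the membership degrees, the `UNKNOWN` value, and the function names `first_scan`, `second_scan`, and `segment` are all assumptions made for illustration, and a simple fuzzy "min" score stands in for the HowNet-based disambiguation rules, whose details the abstract does not give. Only overlapping (crossing) ambiguity is detected here; everything else falls back to the longest dictionary match.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> assumed membership degree in [0, 1].
# The thesis uses the HowNet concept dictionary; these values are made up.
LEXICON: Dict[str, float] = {
    "我们": 0.9, "研究": 0.8, "研究生": 0.6, "生命": 0.8, "起源": 0.9,
}
MAX_LEN = max(len(w) for w in LEXICON)
UNKNOWN = 0.1  # assumed membership of a single character outside the lexicon


def first_scan(text: str) -> List[Tuple[str, str]]:
    """Scan 1: split text into unambiguous words and ambiguous fields."""
    n = len(text)
    # All (start, end) spans of the text that are lexicon words.
    spans = [(i, j) for i in range(n)
             for j in range(i + 1, min(n, i + MAX_LEN) + 1)
             if text[i:j] in LEXICON]
    fields = []
    pos = 0
    while pos < n:
        ends = [j for i, j in spans if i == pos]
        end = max(ends) if ends else pos + 1  # longest match, else one char
        ambiguous = False
        grew = True
        while grew:  # extend the field while a word crosses its right edge
            grew = False
            for i, j in spans:
                if pos <= i < end < j:
                    end, ambiguous, grew = j, True, True
        fields.append(("ambiguous" if ambiguous else "word", text[pos:end]))
        pos = end
    return fields


def candidates(field: str) -> List[List[str]]:
    """Enumerate splits of a field into lexicon words or single characters."""
    if not field:
        return [[]]
    result = []
    for k in range(1, min(len(field), MAX_LEN) + 1):
        head = field[:k]
        if k == 1 or head in LEXICON:
            result.extend([head] + rest for rest in candidates(field[k:]))
    return result


def second_scan(field: str) -> List[str]:
    """Scan 2: pick the candidate whose weakest piece has the highest degree."""
    return max(candidates(field),
               key=lambda seg: min(LEXICON.get(w, UNKNOWN) for w in seg))


def segment(text: str) -> List[str]:
    out: List[str] = []
    for kind, chunk in first_scan(text):
        out.extend(second_scan(chunk) if kind == "ambiguous" else [chunk])
    return out


print(segment("我们研究生命起源"))
```

With this toy lexicon, the first scan marks "研究生命" as an ambiguous field because "研究生" and "生命" overlap, and the second scan prefers "研究 / 生命" over "研究生 / 命" because "命" is outside the lexicon and lowers that candidate's minimum membership.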