论文标题:EM算法及其改进在混合模型参数估计中的应用研究
论文作者 论文导师 郭元术,论文学位 硕士,论文专业 交通信息工程及控制 论文单位 长安大学,点击次数 99,论文页数 68页File Size2255K 2006-05-15论文网 http://www.lw23.com/lunwen_379928617/ mixture model; Maximum likelihood estimation; EM algorithm; optimal number of components; Genetic algorithm 有限混合模型是分析复杂现象的一个灵活而强有力的建模工具,它提供了用简单结构模拟复杂密度的一个有效方法,给出了模拟同质性和异质性的一个自然框架和半参数结构。 EM算法为有限混合模型的极大似然估计提供了一个标准框架。本文简单推导了有限混合高斯分布的EM算法,并针对其收敛速度慢的缺点设计了一种有效选取参数初始值的方法,数值实验表明,该方法有助于EM算法以较快的速度在参数真值附近收敛。 EM算法思想简单,易于实现。但是,EM算法往往获得的只是一个局部最优解,这是因为它本质上是一个迭代算法,只能保证达到局部最优,而遗传算法具有强大的全局搜索能力,因此,本文将采用遗传算法来改进EM算法,提出一种以遗传算法为主、结合EM迭代算法的混合算法(即GA_EM算法)。在前面的研究中,我们总是提前定义一个混合模型的分支数,但在大多实际应用中,最优分支数是未知的,所以为混合模型选择一个最优分支数是一个相当重要又困难的问题。本文将使用GA_EM算法来学习未知支数的多元高斯混合模型,算法中使用MDL准则作为选择最优分支数的信息准则。采用EM算法和遗传算法混合编程的目的是为了更好地利用两种算法各自的优点,基于随机搜索优化技术遗传算法的种群极大地拓展了EM算法的搜索空间,有效地降低了基本EM算法对初始值的依赖程度,改善了其收敛到局部最大值的缺陷。数值实验也表明,GA_EM算法不仅继承了EM算法的单调收敛性,对模型参数初始值也更加稳健:1)在同样的迭代终止条件下,GA_EM算法能够得到比EM算法更好的MDL值。2)GA_EM算法克服了EM算法选择最优分支数的正确率会随着分支数的增加而迅速降低的缺陷。 Finite mixture model is a flexible and powerful tool for analyzing complicated phenomena, which provide an efficient method of simulating complicated density with simple structures and present a natural frame and semi-parameter structure of modeling unobserved population homogeneity and heterogeneity.The expectation-maximization (EM) algorithm is a standard frame for maximum likelihood estimation in finite mixture models. The EM iterative formulas for Gaussian mixtures are derived in the present paper, and we propose an efficient initialization method for these iterative formulas to accelerate the EM algorithm" speed of convergence. Experiences of numerical simulation show that the proposed initial method contributes to accelerate the algorithm"s speed of convergence, the more accurate results can be obtained.The EM algorithm is realized easily because of it"s simplify. Unfortunately, optimal results are not always achieved because the EM algorithm, iterative in nature, is only guaranteed to produce a local maximum. As we known, the Genetic algorithm (GA) has much stronger global searching ability, we import the GA algorithm for improving the EM algorithm, and propose a genetic-based expectation-maximization (GA-EM) algorithm. The EM algorithm is restricted to a predefined number of components. However, in many practical applications, the optimal number of components is unknown, therefore, finding the optimum number of components in the mixture is an important but very difficult problem. We"ll learn Gaussian mixture models from multivariate data the GA-EM algorithm . This algorithm is caballing of selecting the number of components of the model using the minimum description length (MDL) criterion. Our algorithm benefits from the properties of Genetic algorithm (GA) and EM algorithm by combination of both into a single procedure. The population-based stochastic search of the GA explores the search space more thoroughly than the EM algorithm, therefore, our algorithm enables escaping from local optimal solutions since the algorithm becomes less sensitive to its initialization. The experiments on simulated data show that the GA-EM algorithm maintains the monotonic convergence property of the EM algorithm, taking on a good robustness to initial values of parameters: 1) we have obtained a better MDL score while using exactly the same termination condition for both algorithm. 2) our approach identifies the number of components which were used to generate the underlying data more often than the EM algorithm.
|