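Thesis title: Research on Text Detection in Images and Video Frames (图像和视频文字检测技术研究)
Advisor: Gao Wen (高文); Degree: Ph.D.; Major: Computer Applications; Institution: Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology); Length: 122 pages; Date: 2006-02-01
Source: 论文网 http://www.lw23.com/lunwen_81475592/
Keywords: Text detection; Text recognition; Video content analysis; Wavelet feature; SVM classification; Image region segmentation

Text in images and video is an information-rich object that plays an important role in research areas such as video content analysis and retrieval and image content understanding. Unlike other typical patterns (such as single Chinese characters or human faces), text lines are highly inconsistent in attributes such as size, gray level, shape, and color, and in many cases the text also lies in a complex background; this makes text detection and recognition very difficult. Moreover, the traditional approach of applying machine learning directly to image-block patterns is not suited to the text detection problem. This thesis therefore builds on the coarse-to-fine detection idea and proposes a general detection framework applicable to several typical kinds of text: (1) overlay text in video, (2) text in natural scene images, and (3) single digit characters. During detection, we always rely on the most reliable feature for coarse text localization and then fuse other features to verify the candidate text. This both increases detection speed and preserves high detection precision. On three concrete examples, we analyze in detail the effectiveness and importance of the coarse-to-fine idea for text detection. In the summary and extensions of the thesis, we discuss the feasibility of generalizing the coarse-to-fine method to the detection of other texture objects in images.

For overlay text in video frames, the author uses multiscale wavelet features for detection. This study focuses on how to fuse and select effective low-level features for distinguishing text-line patterns from non-text-line patterns. First, the coarse localization procedure detects candidate text pixels using the wavelet energy feature with a threshold determined by global histogram analysis, and a "density-based" region growing method is then proposed to connect the scattered pixels into candidate text regions. The detected candidate text regions are split into individual text lines using structural features. In the fine classification stage, three texture features and one structural feature are fused to represent the text-line pattern, and a forward feature selection algorithm is used to screen the fused features. Finally, based on the selected texture features, a support vector machine (SVM) classifies text-line and non-text-line patterns. Experiments show that the algorithm detects video overlay text quickly and robustly. The background of video text is often complex, so OCR software based on image gray-level information cannot achieve good recognition results. To address this, the author proposes an algorithm that segments the text foreground from a complex background. In this method, the author proposes a sampling rule based on Canny edge detection results, uses Gaussian mixture models (GMMs) to build a color model of the sampled pixels in a two-dimensional chrominance-luminance feature space, and then uses the color model to accurately detect all foreground pixels. This sample-then-detect approach makes text segmentation fully automatic and gives good results.

For text in natural scene images, the author fuses color, wavelet histogram, and OCR recognition-result statistical features within the coarse-to-fine detection framework, and studies how to segment and locate text-line patterns in complex images and how to rectify text lines that have undergone affine deformation. In the process, the roles of image segmentation and region layout analysis in text-line localization are studied in depth, and the coarse-localization-to-fine-classification framework proposed in this thesis is deepened and validated. Rectification of affinely deformed text lines uses the homography between planes and requires no camera parameters.

Finally, the author studies a text pattern that is even harder to detect: characters with non-rigid deformation (jersey numbers). In detecting such characters, the main difficulty comes from the warping of the text, a deformation that is non-rigid ...

Text in images and video frames carries important information for visual content understanding and video retrieval. Unlike traditional patterns (single characters, human faces, etc.), text lines vary in size, gray level, shape, and color. Furthermore, some text lines are embedded in complex backgrounds. These factors make text detection and recognition difficult. Traditional learning-based classification of image blocks is unsuitable for text detection.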
In this thesis, a general framework for text detection is proposed, based on the coarse-to-fine object detection idea. We demonstrate the effectiveness and importance of the framework on three representative kinds of text pattern: (1) video overlay text, (2) text in natural scene images, and (3) single digit characters. In the detection process, text is first located by its most reliable feature and then verified by other effective features. This scheme achieves both fast detection and high precision. In the summary and extensions, we discuss the feasibility of applying the coarse-to-fine method to other texture object detection tasks.

For overlay text in video frames, multiscale wavelet features are extracted for text detection. This algorithm emphasizes feature combination and selection for text/non-text discrimination. First, in the coarse detection stage, the wavelet energy feature is computed to locate all possible text pixels, and a density-based region growing method then connects these pixels into regions, which are further separated into candidate text lines using structural information. Second, in the fine detection stage, three kinds of texture feature and one kind of structural feature are extracted to represent the texture pattern of a text line, and a forward search algorithm selects the most effective features. Finally, an SVM classifier identifies true text among the candidates based on the selected features. Experimental results show that this approach detects text lines quickly and robustly under various conditions.

Text often lies in a complex background, so OCR software based on gray values gives poor recognition results. We propose an automatic method for segmenting text from complex backgrounds for the recognition task. A rule-based sampling method is proposed to obtain a portion of the text pixels.
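As an illustration of the coarse detection stage described above, the following minimal Python sketch shows one way a density-based region growing step could connect scattered candidate text pixels into regions. The abstract does not state the actual rule used in the thesis, so everything specific here is an assumption made for illustration: the density definition (fraction of candidate pixels in a square window), the `radius` and `min_density` parameters, the use of 4-connectivity, and the function names `density_map` and `grow_regions`.

```python
from collections import deque


def density_map(mask, radius):
    """Return, for each pixel, the fraction of candidate pixels in the
    (2*radius+1) x (2*radius+1) window centred on it (assumed density)."""
    h, w = len(mask), len(mask[0])
    window_area = (2 * radius + 1) ** 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hits = sum(
                mask[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            # Pixels outside the image count as non-candidates.
            out[y][x] = hits / window_area
    return out


def grow_regions(mask, radius=1, min_density=0.4):
    """Group pixels whose local density reaches min_density into
    4-connected regions; return bounding boxes (x0, y0, x1, y1)."""
    if not mask or not mask[0]:
        return []
    h, w = len(mask), len(mask[0])
    dens = density_map(mask, radius)
    dense = [[dens[y][x] >= min_density for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not dense[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first growth from an unvisited dense seed pixel.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and dense[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

On a small binary mask containing a 3x3 block of candidate pixels and one isolated pixel, `grow_regions` with the default parameters returns a single bounding box around the block and drops the isolated pixel, which is the kind of noise suppression a density criterion is meant to provide. In a fuller pipeline the mask would come from thresholding a wavelet energy map, a step this sketch does not attempt.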
Then, the sampled pixels are used to train Gaussian mixture models (GMMs) of the intensity and hue components in HSI color space. Finally, the trained GMMs, together with spatial connectivity information, are used to segment all text pixels from their background. Experimental results show that the proposed algorithm works fully automatically and performs much better than traditional methods.

In the scene text location algorithm, color, wavelet histogram, and OCR feedback features are combined at different detection stages. Image pixels are first grouped into regions by