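Thesis title: The Research of Intrusion Detection System Based on Machine Learning Algorithm (基于机器学习的入侵检测系统研究)
Author: (not given)
Advisor: Han Zongfen (韩宗芬)
Degree: Master's
Major: Computer Software and Theory
Institution: Huazhong University of Science and Technology
Length: 63 pages
Date: 2006-05-01
Source: 论文网 http://www.lw23.com/lunwen_540255092/
Keywords: Intrusion Detection; Machine Learning; Elman Neural Network; Robust SVM

Abstract (translated from the Chinese):

Traditional neural-network-based intrusion detection systems use feed-forward neural networks to analyze the header information of network packets. They can effectively detect anomalous behavior within individual packets, but they ignore the dynamic sequential statistics of packets over time and do not analyze packet payloads. As a result, they have difficulty finding anomalies across packet sequences and cannot detect network anomalies at the application layer. Meanwhile, existing intrusion detection based on host system logs is hampered by noisy data in the training phase and suffers from a high false alarm rate.

The machine-learning-based intrusion detection system addresses these problems with two approaches: detection based on an Elman neural network and detection based on robust SVM nearest neighbor classification. The Elman-based detector applies a clustering algorithm to packet payloads, so payload information is no longer discarded. It also uses the recurrent mechanism of the Elman network to remember the dynamic sequential statistics of packets, which improves detection of anomalous behavior across packet sequences. The robust-SVM-based detector uses the optimal separating hyperplane of the robust SVM to weight the feature space of host system logs, yielding a variable-scale nearest neighbor classifier that removes the negative effect of noisy data and lowers the false alarm rate. Weighting the feature space also counters the curse of dimensionality in nearest neighbor classification and improves detection accuracy.

The system was implemented in C and C++ on Linux and tested on the Lincoln Laboratory DARPA test data at both the network level and the host level. With the false alarm rate held at 0, Elman-based detection reaches a detection rate of 92.7%; at a false alarm rate of 2.3%, the detection rate is 96.2%. With the false alarm rate held at 0, detection based on robust SVM nearest neighbor classification reaches a detection rate of 87.3%; at a false alarm rate of 2.8%, the detection rate is 100%.

Abstract (English):

Traditional intrusion detection systems employ feed-forward neural networks to analyze network packet headers. Current studies have shown that packet inter-arrival times follow a packet-train model, yet traditional mechanisms neglect this dynamic characteristic. Furthermore, currently available mechanisms discard the payload and retain only the header of each packet for data analysis. As a result, these systems cannot detect inter-packet sequence anomalies, cannot detect anomalous network traffic at the application level, and cannot detect complicated and distributed intrusions. On the other hand, host-based intrusion detection systems using machine learning algorithms are limited by noise in the training data, which leads to over-fitting. In real-time detection, these systems face high false positive rates, which make it difficult for the administrator to analyze intrusions accurately and to configure security policies in time.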
To overcome these limitations, we implemented an intrusion detection system based on machine learning algorithms. The system has two subsystems: a network-based intrusion detection subsystem using an Elman network, and a host-based intrusion detection subsystem using a robust SVM nearest neighbor classifier. In the former, a clustering algorithm is applied to the packet payload to extract useful information beyond the packet header. The Elman network is trained with the backpropagation through time (BPTT) algorithm to obtain an efficient real-time anomaly detector. Because the Elman network is dynamic, the resulting network detector can also detect inter-packet anomalies. In the latter subsystem, a gradient-based weighting scheme is proposed to overcome over-fitting. This weighting scheme also mitigates the curse of dimensionality, which improves detection performance.

The system is implemented on the Linux platform in C and C++. To evaluate its performance, we ran experiments on the DARPA dataset for network-based and host-based intrusion detection separately. Results indicate that the network-based subsystem attains a detection rate of 92.7% with a zero false positive rate, and reaches 100% with a false positive rate of 2.3%. The host-based subsystem attains a detection rate of 87.3% with a zero false positive rate, and reaches 100% with a false positive rate of 2.8%.
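
The recurrence that lets an Elman network react to packet sequences rather than to single packets can be illustrated with a minimal forward-pass sketch. This is not the thesis implementation (which is in C and C++ and trained with BPTT): the function names, the two-dimensional packet features, and all weight values below are hypothetical toy choices, and no training is shown.

```python
import math

def elman_step(x, context, w_in, w_ctx, b_hidden, w_out, b_out):
    """One time step of an Elman network (illustrative, untrained).

    x        : feature vector for the current packet (hypothetical features)
    context  : hidden activations from the previous step (the context units)
    Returns (score, hidden), where score is a sigmoid output in (0, 1).
    """
    hidden = []
    for j in range(len(b_hidden)):
        z = b_hidden[j]
        z += sum(w_in[j][i] * x[i] for i in range(len(x)))
        # The context term is what makes the network depend on earlier packets.
        z += sum(w_ctx[j][k] * context[k] for k in range(len(context)))
        hidden.append(math.tanh(z))
    z_out = b_out + sum(w_out[j] * hidden[j] for j in range(len(hidden)))
    score = 1.0 / (1.0 + math.exp(-z_out))
    return score, hidden

def score_sequence(packets, params, n_hidden):
    """Score each packet in order, carrying the hidden state forward."""
    context = [0.0] * n_hidden  # context units start at zero
    scores = []
    for x in packets:
        score, context = elman_step(x, context, *params)
        scores.append(score)
    return scores

# Toy parameters (assumed values, not taken from the thesis).
w_in = [[0.5, -0.3], [0.2, 0.8]]    # input -> hidden
w_ctx = [[0.1, 0.4], [-0.6, 0.3]]   # context -> hidden
b_hidden = [0.0, 0.1]
w_out = [1.0, -1.0]                 # hidden -> output
b_out = 0.0
params = (w_in, w_ctx, b_hidden, w_out, b_out)

# The same packet features presented twice in a row.
packets = [[1.0, 0.0], [1.0, 0.0]]
scores = score_sequence(packets, params, n_hidden=2)
print(scores)
```

The two scores differ even though the two inputs are identical, because the second step also sees the hidden state left by the first. A feed-forward network, which has no context term, would return the same score for both packets.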