Stellar Spectra Classification Using a Twin Support Vector Machine with Unlabeled Data (利用带无标签数据的双支持向量机对恒星光谱分类)
Alternative Title: Stellar Spectra Classification by Support Vector Machine with Unlabeled Data
Liu Zhongbao (刘忠宝)1,2; Lei Yufei (雷宇飞)1; Song Wen'ai (宋文爱)2; Zhang Jing (张静)2; Wang Jie (王杰)3; Tu Liangping (屠良平)4
2019-03-01
Source Publication: Spectroscopy and Spectral Analysis (光谱学与光谱分析)
ISSN: 1000-0593
Volume: 39  Issue: 3  Pages: 948-952
Contribution Rank: 3
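Abstract: Stellar spectra classification has long been one of the hot topics in astronomical techniques and methods. As observation facilities continue to operate and improve, the number of spectra available keeps growing, and these massive volumes of spectra pose a great challenge to manual processing. Researchers have therefore turned to data mining algorithms and have attempted to mine these spectra. In recent years, data mining methods such as neural networks, self-organizing maps and association rules have been widely applied to stellar spectra classification. Among these methods, the support vector machine (SVM) is highly regarded for its strong learning capability and efficient classification performance. The basic idea of SVM is to find an optimal separating hyperplane between two classes of samples. SVM converts its optimization problem into a convex problem in quadratic programming (QP) form and thereby obtains a globally optimal solution. Although SVM performs well in practice, some researchers proposed the twin support vector machine (TSVM) to further improve its classification capability. TSVM separates the two classes by constructing two non-parallel hyperplanes, such that each class is close to one hyperplane and far from the other. TSVM is nearly four times as computationally efficient as the conventional SVM; since its introduction it has therefore received continuous attention from researchers, and a number of improved algorithms have appeared. In stellar spectra classification, classification algorithms generally build their models from historical observed spectra, and the most critical step is manually labeling the spectra, which is extremely tedious and error-prone. How to build a classification model from labeled spectra together with some unlabeled spectra is therefore particularly important. To this end, this paper proposes the twin support vector machine with unlabeled data (TSVMUD) for intelligent classification of stellar spectra. The method first divides the spectra into a training set and a test set; it then learns on the training set to obtain a classification rule; finally, it uses this rule to verify the spectra in the test set. TSVMUD inherits the advantages of TSVM; more importantly, when learning the classification model on the training set, it considers not only the labeled training samples but also some unlabeled samples. This both improves learning efficiency and yields a better classification model. Comparative experiments on the SDSS DR8 stellar spectra dataset show that TSVMUD classifies better than traditional methods such as SVM, TSVM and K-nearest neighbors (KNN). However, the method also has limitations, a major one being that it cannot handle massive spectral datasets. Future work will draw on the idea of random sampling for massive data and use big data processing techniques to further study the adaptability of the proposed method in big data environments.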
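The "nearly four times" speed-up can be made plausible with a back-of-the-envelope count. This rests on assumptions not stated in this record: that solving a QP with m dual variables costs on the order of m^3 operations, that the standard SVM dual has one variable per training sample, that each of TSVM's two QPs has one variable per sample of the opposite class, and that the two classes split the m training samples evenly:

```latex
T_{\mathrm{SVM}} \propto m^{3}, \qquad
T_{\mathrm{TSVM}} \propto \left(\frac{m}{2}\right)^{3} + \left(\frac{m}{2}\right)^{3} = \frac{m^{3}}{4},
\qquad \frac{T_{\mathrm{SVM}}}{T_{\mathrm{TSVM}}} \approx 4 .
```

For example, with m = 1000 training spectra this is about 10^9 versus 2 × 500^3 = 2.5 × 10^8 operations. Unbalanced classes or a solver with a different cost exponent would change the ratio.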
Other Abstract: Stellar spectra classification is one of the hot topics in astronomical techniques and methods. With the continuous operation and improvement of observation facilities, researchers have obtained an ever-growing number of spectra, which are difficult to process manually. In view of this, data mining algorithms have attracted increasing attention and have been used to process the spectra. Neural networks, self-organizing maps, association rules and other data mining algorithms have been used to classify stellar spectra in recent years. Among these algorithms, the Support Vector Machine (SVM) is especially popular owing to its good learning capability and excellent classification performance. The basic idea of the standard SVM is to find an optimal separating hyperplane between the positive and negative samples. The SVM problem can be posed as a quadratic programming (QP) problem; because this problem is convex, it has a globally optimal solution. To further improve classification efficiency, the Twin Support Vector Machine (TSVM) has been proposed. It generates two non-parallel hyperplanes such that each plane is as close as possible to one class and as far as possible from the other. TSVM learns approximately four times faster than the classical SVM. Owing to its low computational complexity, TSVM has received much attention, and many variants of it have been proposed in the literature. In stellar spectra classification, the classification model is built from observed data. The key step is manually labeling the spectra, which is time-consuming and laborious. Therefore, how to construct a spectra classification model from both labeled and unlabeled spectra is a problem deserving study. To classify stellar spectra effectively, this paper proposes the Twin Support Vector Machine with Unlabeled Data (TSVMUD).
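To make the "two non-parallel hyperplanes" idea concrete, here is a minimal NumPy sketch. It implements the least-squares twin SVM (LSTSVM) variant, which replaces TSVM's inequality constraints with equalities and a squared penalty so that each plane comes from a small linear system instead of a QP. It is not the TSVM or TSVMUD formulation used in the paper; the function names, the penalties `c1` and `c2`, the ridge term `eps`, and the synthetic two-cluster data are illustrative assumptions.

```python
import numpy as np

# Sketch of the LSTSVM variant (swapped in for the QP-based TSVM; not the paper's method).


def fit_lstsvm(A, B, c1=1.0, c2=1.0, eps=1e-8):
    """Fit two non-parallel planes: plane 1 hugs class +1 (rows of A),
    plane 2 hugs class -1 (rows of B). Returns (w1, b1, w2, b2)."""
    E = np.hstack([A, np.ones((A.shape[0], 1))])  # augmented class +1 matrix [A e1]
    F = np.hstack([B, np.ones((B.shape[0], 1))])  # augmented class -1 matrix [B e2]
    reg = eps * np.eye(E.shape[1])                # small ridge term for numerical stability
    # Plane 1: near class +1, while class -1 is pushed toward function value -1.
    z1 = -np.linalg.solve(F.T @ F + E.T @ E / c1 + reg, F.T @ np.ones(F.shape[0]))
    # Plane 2: near class -1, while class +1 is pushed toward function value +1.
    z2 = np.linalg.solve(E.T @ E + F.T @ F / c2 + reg, E.T @ np.ones(E.shape[0]))
    return z1[:-1], z1[-1], z2[:-1], z2[-1]


def predict_lstsvm(X, w1, b1, w2, b2):
    """Assign each row of X to the class whose plane is nearer (perpendicular distance)."""
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)


# Usage on synthetic 2-D data (hypothetical stand-ins for two spectral types).
rng = np.random.default_rng(0)
A = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))  # class +1
B = rng.normal(loc=2.0, scale=0.5, size=(50, 2))   # class -1
w1, b1, w2, b2 = fit_lstsvm(A, B)
X = np.vstack([A, B])
y = np.r_[np.ones(50), -np.ones(50)]
acc = np.mean(predict_lstsvm(X, w1, b1, w2, b2) == y)
print(f"training accuracy: {acc:.2f}")
```

Each plane is fitted to one class while the other class is pushed to a function value of about ±1; a new sample goes to the class whose plane it lies nearer, which is the decision rule TSVM-style classifiers share.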
In TSVMUD, the stellar spectra are first divided into two parts: one for training and the other for testing. TSVMUD is then applied to the training data to obtain the classification model. Finally, the spectra in the test dataset are verified with this classification model. TSVMUD not only preserves the low computational complexity of TSVM but also improves classification performance by taking both labeled and unlabeled data into consideration. Comparative experiments on SDSS DR8 stellar spectra verify that TSVMUD performs better than traditional classifiers such as SVM, TSVM and KNN (K-Nearest Neighbors). However, TSVMUD has some limitations; for example, handling massive spectral datasets remains difficult. Inspired by random sampling, we will study the adaptability of the proposed method in big data environments using big data technologies.
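The three-step procedure (split the spectra, learn from labeled plus unlabeled training samples, verify on the test set) can be sketched with generic self-training, i.e. pseudo-labeling, wrapped around LSTSVM-style planes. This is a stand-in chosen for illustration only: the record does not say how TSVMUD actually incorporates unlabeled samples, and `fit_planes`, `plane_distances`, `self_train`, `rounds`, `per_round`, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical self-training sketch around LSTSVM-style planes; NOT the TSVMUD formulation.


def fit_planes(X, y, c=1.0, eps=1e-8):
    """Fit least-squares twin-SVM planes for labels y in {+1, -1}; returns augmented [w; b]."""
    E = np.hstack([X[y == 1], np.ones((np.sum(y == 1), 1))])    # [A e1], class +1
    F = np.hstack([X[y == -1], np.ones((np.sum(y == -1), 1))])  # [B e2], class -1
    reg = eps * np.eye(E.shape[1])                              # small ridge for stability
    z1 = -np.linalg.solve(F.T @ F + E.T @ E / c + reg, F.T @ np.ones(len(F)))
    z2 = np.linalg.solve(E.T @ E + F.T @ F / c + reg, E.T @ np.ones(len(E)))
    return z1, z2


def plane_distances(X, z1, z2):
    """Perpendicular distance of each row of X to plane 1 and to plane 2."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return (np.abs(Xa @ z1) / np.linalg.norm(z1[:-1]),
            np.abs(Xa @ z2) / np.linalg.norm(z2[:-1]))


def self_train(X_lab, y_lab, X_unl, rounds=5, per_round=10):
    """Repeatedly pseudo-label the most confident unlabeled samples, then refit."""
    for _ in range(rounds):
        if len(X_unl) == 0:
            break
        z1, z2 = fit_planes(X_lab, y_lab)
        d1, d2 = plane_distances(X_unl, z1, z2)
        pick = np.argsort(-np.abs(d1 - d2))[:per_round]    # largest distance gap = most confident
        pseudo = np.where(d1[pick] <= d2[pick], 1.0, -1.0)  # nearer plane gives the pseudo-label
        X_lab = np.vstack([X_lab, X_unl[pick]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unl = np.delete(X_unl, pick, axis=0)
    return fit_planes(X_lab, y_lab)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.6, (100, 2)), rng.normal(2.0, 0.6, (100, 2))])
y = np.r_[np.ones(100), -np.ones(100)]

# Step 1: split into training and test sets.
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]
# Keep labels for only 10 training samples per class; the rest act as unlabeled data.
labeled = np.r_[train[y[train] == 1][:10], train[y[train] == -1][:10]]
unlabeled = np.setdiff1d(train, labeled)

# Step 2: learn from labeled plus unlabeled training samples.
z1, z2 = self_train(X[labeled], y[labeled], X[unlabeled])

# Step 3: verify on the held-out test set.
d1, d2 = plane_distances(X[test], z1, z2)
acc = np.mean(np.where(d1 <= d2, 1.0, -1.0) == y[test])
print(f"test accuracy: {acc:.2f}")
```

Ranking unlabeled samples by the gap between their two plane distances is only one simple confidence rule; other semi-supervised schemes, such as adding a manifold regularizer to the twin-SVM objective, bring unlabeled data in quite differently.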
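Keywords: stellar spectra; intelligent classification; twin support vector machine; unlabeled data
DOI: 10.3964/j.issn.1000-0593(2019)03-0948-05
Indexed By: SCI; EI; CSCD; A Guide to the Core Journals of China (中文核心期刊要目总览)
Language: Chinese
WOS ID: WOS:000463846600048
CSCD ID: CSCD:6456809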
Document Type: Journal article
Identifier: http://ir.xao.ac.cn/handle/45760611-7/3342
Collection: Computer Technology Application Laboratory (计算机技术应用研究室)
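Affiliation: 1. School of Software, Quanzhou University of Information Engineering, Quanzhou, Fujian 362000;
2. School of Software, North University of China, Taiyuan, Shanxi 030051;
3. Xinjiang Astronomical Observatory, Chinese Academy of Sciences, Urumqi, Xinjiang 830011;
4. School of Science, University of Science and Technology Liaoning, Anshan, Liaoning 114051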
Recommended Citation
GB/T 7714
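Liu Zhongbao, Lei Yufei, Song Wen'ai, et al. Stellar spectra classification by support vector machine with unlabeled data[J]. Spectroscopy and Spectral Analysis, 2019, 39(3): 948-952.
APA: Liu, Z., Lei, Y., Song, W., Zhang, J., Wang, J., & Tu, L. (2019). Stellar spectra classification by support vector machine with unlabeled data. Spectroscopy and Spectral Analysis, 39(3), 948-952.
MLA: Liu, Zhongbao, et al. "Stellar Spectra Classification by Support Vector Machine with Unlabeled Data." Spectroscopy and Spectral Analysis 39.3 (2019): 948-952.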
Files in This Item:
File Name/Size | DocType | Version | Access | License
刘忠宝-2019-利用带无标签数据的双支 (245KB) | Journal article | Published version | Open access | CC BY-NC-SA
File name: 刘忠宝-2019-利用带无标签数据的双支持向量机对恒星光谱分类.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.