基于GPU集群的相干消色散算法研究
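Alternative Title: Research on Coherent De-dispersion Algorithm Based on GPU Cluster
王博群
Subtype: Master's
Thesis Advisor: 张海龙
2021-06-01
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Beijing
Degree Name: Master of Science
Degree Discipline: Astronomical Technology and Methods
Keyword: coherent de-dispersion; zero copy; GPU cluster; multi-process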
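Abstract: Pulsar signals are affected by the interstellar medium as they propagate, which broadens and distorts the pulse profile, so de-dispersion is required. De-dispersion methods fall into two classes, incoherent and coherent. Coherent de-dispersion removes the dispersive effect of the interstellar medium as completely as possible, which benefits subsequent scientific analysis. However, it requires intensive floating-point computation and consumes substantial computing resources, so improving the efficiency of the algorithm has become a pressing problem. Internationally, running coherent de-dispersion on GPU clusters has become mainstream: exploiting the multithreading of GPUs lowers the CPU load and greatly improves the data-processing performance of the system.
Motivated by problems encountered in operating the 26-meter radio telescope (NSRT) at the Nanshan Station of Xinjiang Astronomical Observatory, and by the requirements of real-time pulsar de-dispersion systems for future large-aperture radio telescopes, and taking into account the current state of the available hardware, this thesis analyzes and studies coherent de-dispersion algorithms based on GPU clusters. The main work is as follows:
(1) The dispersion effect of the interstellar medium is analyzed, and the principles, advantages, and disadvantages of incoherent and coherent de-dispersion are discussed.
(2) A zero-copy GPU coherent de-dispersion algorithm is designed and implemented. Device memory mapping eliminates the host-to-device copy overhead, batched Fourier transforms with CUDA's cuFFT library improve DFT efficiency, and multithreading accelerates the computation of the transfer function.
(3) A cluster-based coherent de-dispersion algorithm is designed and implemented. Shell multi-processing provides parallel task distribution and flow control, and the algorithm is tested on the Taurus high-performance computing cluster of Xinjiang Astronomical Observatory.
The main innovations of the thesis are as follows:
(1) A zero-copy GPU coherent de-dispersion algorithm is designed and implemented. Zero-copy eliminates the copy overhead between host and device and improves GPU memory utilization.
(2) The use of the GPU Fourier transform library cuFFT is optimized. The data are truncated so that their length is a power of 2, which speeds up execution. Single-precision transforms are used and the DFT results are written to a separate buffer, which improves efficiency. Batched parallel DFTs avoid repeated calls to cufftPlan1d, which shortens the running time of the algorithm.
(3) A coherent de-dispersion algorithm based on a GPU cluster is designed and implemented, using shell multi-processing for parallel task distribution and flow control.
In summary, this thesis analyzes the dispersion effect of the interstellar medium and introduces the technical principles of coherent de-dispersion. It designs and implements a zero-copy GPU coherent de-dispersion algorithm that improves the processing efficiency of the GPU. On this basis it designs and implements a coherent de-dispersion algorithm for GPU clusters, which is tested on the Taurus high-performance computing cluster of Xinjiang Astronomical Observatory. The experimental results show that the cluster algorithm is considerably more efficient than the serial algorithm.
Keywords: coherent de-dispersion, zero copy, GPU cluster, multi-process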
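For orientation, the dispersion effect and the transfer function mentioned in the abstract are commonly written as shown here. This is the standard textbook form of the cold-plasma dispersion law, not an expression taken from the thesis. The symbols \mathcal{D}, DM, \nu_0, and \nu are names assumed for this sketch, and the sign of the exponent depends on the Fourier-transform and sideband conventions in use.

```latex
% Arrival-time delay between the low and high edges of a band
% (\nu in MHz, DM in pc cm^{-3}, \Delta t in seconds)
\Delta t \simeq \mathcal{D}\,\mathrm{DM}\left(\nu_{\mathrm{lo}}^{-2} - \nu_{\mathrm{hi}}^{-2}\right),
\qquad \mathcal{D} \approx 4.15 \times 10^{3}\ \mathrm{s\,MHz^{2}\,pc^{-1}\,cm^{3}}

% Transfer function ("chirp") of the interstellar medium for a band centered on \nu_0,
% where \nu is the offset from \nu_0. All quantities must be in mutually consistent
% units so that the phase is dimensionless, e.g. frequencies in Hz with
% \mathcal{D} \approx 4.15 \times 10^{15}\ \mathrm{s\,Hz^{2}\,pc^{-1}\,cm^{3}}.
H(\nu_0 + \nu) = \exp\!\left[\, i\,\frac{2\pi\,\mathcal{D}\,\mathrm{DM}\,\nu^{2}}{\nu_0^{2}\,(\nu_0 + \nu)} \right]
```

Incoherent de-dispersion only shifts already-detected frequency channels by \Delta t, whereas coherent de-dispersion multiplies the spectrum of the complex voltage data by H^{-1} and transforms back. This is why the method depends on large forward and inverse FFTs.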
Other Abstract: Pulsar signals are affected by the interstellar medium during propagation, which broadens and distorts the pulse profile, so de-dispersion is required. De-dispersion methods include incoherent and coherent de-dispersion. Coherent de-dispersion removes the dispersion effect of the interstellar medium to the greatest extent, which benefits subsequent scientific research. However, coherent de-dispersion requires intensive floating-point operations and consumes a large amount of computing resources, so improving the efficiency of the algorithm has become an urgent problem. The use of GPU clusters for coherent de-dispersion has become mainstream internationally. Taking advantage of the multithreading of GPUs reduces the load on the CPU and greatly enhances the data-processing performance of the system.
To address the problems encountered in the operation of the 26-meter radio telescope (NSRT) at the Nanshan Station of Xinjiang Astronomical Observatory, as well as the development needs of real-time pulsar de-dispersion systems for future large-aperture radio telescopes, and considering the current state of the available hardware, this thesis studies coherent de-dispersion algorithms based on GPU clusters. The main results are as follows:
(1) The dispersion effect of the interstellar medium was analyzed, and the principles, advantages, and disadvantages of incoherent and coherent de-dispersion were studied.
(2) A zero-copy GPU coherent de-dispersion algorithm was designed and implemented. It uses device memory mapping to eliminate the host-to-device copy cost, uses CUDA's cuFFT library for multi-batch Fourier transforms to improve DFT efficiency, and uses multithreading to accelerate the computation of the transfer function.
(3) A pulsar coherent de-dispersion algorithm based on a GPU cluster was designed and implemented. Shell multi-processing is used for parallel distribution of tasks, and named pipes are used for flow control.
The main innovations of the thesis are as follows:
(1) A zero-copy GPU coherent de-dispersion algorithm was designed and implemented. Zero-copy eliminates the copy overhead between the host and the device and improves GPU memory utilization.
(2) The use of the GPU Fourier transform library cuFFT was optimized. Truncating the data guarantees that the data length is a power of 2, which speeds up the operation. Using single-precision transforms and extra space for caching the DFT results (out of place) improves efficiency. Using multi-batch DFTs avoids repeated calls to cufftPlan1d and shortens the running time.
(3) A pulsar coherent de-dispersion algorithm based on a GPU cluster was designed and implemented. Shell multi-processing is used for parallel distribution of tasks, and named pipes are used for flow control.
In summary, this thesis analyzed the dispersion effect of the interstellar medium and introduced the technical principles of coherent de-dispersion. A zero-copy GPU coherent de-dispersion algorithm was designed and implemented to improve the processing efficiency of the GPU. On this basis, a cluster-based pulsar coherent de-dispersion algorithm was designed and implemented, and it was tested on the Taurus high-performance computing cluster of Xinjiang Astronomical Observatory. Experimental results show that the efficiency of the cluster algorithm is up to 5 times higher than that of the serial algorithm.
Keywords: Coherent De-dispersion, Zero Copy, GPU Cluster, Multi-process
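The abstract states that tasks are distributed by shell multi-processing with named pipes for flow control, but gives no further detail. One common way to get that behaviour is a token pool: the pipe is pre-loaded with N tokens, each task takes a token before it starts and puts it back when it finishes, so at most N tasks run at once. The sketch below illustrates that idea only. It swaps the shell processes and named pipe for Python threads and a queue.Queue, and the names run_with_tokens, max_parallel, and the file labels are hypothetical, not taken from the thesis.

```python
import queue
import threading


def run_with_tokens(tasks, max_parallel):
    """Run (name, callable) tasks with at most max_parallel active at once.

    Assumption: a queue.Queue stands in for the named pipe, and each queued
    item is one token. Returns (results, peak), where peak is the largest
    number of tasks observed running at the same time.
    """
    tokens = queue.Queue()
    for _ in range(max_parallel):
        tokens.put(None)          # pre-load the pool with N tokens

    lock = threading.Lock()
    running = 0
    peak = 0
    results = {}

    def worker(name, func):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            value = func()
            with lock:
                results[name] = value
        finally:
            with lock:
                running -= 1
            tokens.put(None)      # hand the token back so the next task can start

    threads = []
    for name, func in tasks:
        tokens.get()              # blocks while all tokens are in use
        thread = threading.Thread(target=worker, args=(name, func))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
    return results, peak


if __name__ == "__main__":
    # Hypothetical workload: eight placeholder "data files", three at a time.
    demo_tasks = [("file%d" % i, (lambda i=i: i * i)) for i in range(8)]
    demo_results, demo_peak = run_with_tokens(demo_tasks, max_parallel=3)
    print(sorted(demo_results.items()), demo_peak)
```

The dispatcher blocks on the token pool rather than polling, so the concurrency limit holds without any bookkeeping in the tasks themselves. In a cluster setting the limit would typically be chosen to match the number of available GPUs.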
Pages: 57
Language: Chinese
Document Type: Thesis
Identifier: http://ir.xao.ac.cn/handle/45760611-7/4741
Collection: Graduate Theses
Affiliation: Xinjiang Astronomical Observatory, Chinese Academy of Sciences
First Author Affiliation: Xinjiang Astronomical Observatory, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
王博群. 基于GPU集群的相干消色散算法研究[D]. 北京. 中国科学院大学,2021.
Files in This Item:
File Name/Size: 基于GPU集群的相干消色散算法研究_王博 (1683KB)
DocType: Thesis
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.