CPLSTool: A Framework to Generate Automatic Bioinformatics Pipelines

Journal Title: Biomedical Journal of Scientific & Technical Research (BJSTR) - Year 2018, Vol 11, Issue 5

Abstract

Many bioinformatics tools have been developed for data analysis, each focusing on a specific problem. However, a single program is rarely enough to complete a full data-mining task. We developed CPLSTool (https://github.com/maoshanchen/CPLSTool), which combines multiple bioinformatics tools into a pipeline that can be reused for repeated data analysis. The most significant advantage of CPLSTool over step-by-step analysis is that it saves waiting time. In addition, some analysis steps can be run in parallel to reduce the running time. We used CPLSTool to build an automatic pipeline based on QIIME and analyzed skin 16S rRNA data. The results showed that a total of 102 minutes can be saved using CPLSTool, and that the visualization of the results improves our understanding of them. CPLSTool can be applied to any kind of data analysis, including genomic, transcriptomic, proteomic and metagenomic data analysis. The use of CPLSTool will improve our understanding of data analysis and save time and computing resources.

The last decade has witnessed the rapid development of Next-Generation Sequencing (NGS) technologies, including Transcriptome Sequencing (RNA-Seq), Whole-Genome and Whole-Exome Sequencing (WGS/WXS), Metagenomics, Chromatin Immunoprecipitation or Methylated DNA Immunoprecipitation followed by Sequencing (ChIP-Seq or MeDIP-Seq), and a multitude of more specialized protocols, such as Cross-Linking Immunoprecipitation (CLIP-Seq), Assay for Transposase-Accessible Chromatin Using Sequencing (ATAC-Seq), and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE-Seq) [1]. Each NGS technology was introduced together with one or more analysis applications, and many bioinformatics tools now exist for general and specialized research purposes, such as BWA [2], ExScalibur [3], Chipster [4], Churchill [5], NEAT [6], MG-RAST [7], TopHat [8] and QIIME [9]. However, these tools have some drawbacks. For example, i) some tools concentrate on a single analysis step instead of covering the whole analysis, such as BWA and TopHat; ii) it is difficult to add new analysis steps to existing integrated pipelines, such as NEAT; iii) some tools run on a web server, so the analysis is sometimes limited by internet speed, such as MG-RAST; and iv) some tools require step-by-step operation, whereas an automatic pipeline is needed for the whole analysis, such as QIIME.

Moreover, the tremendous amount of NGS output calls for a way to speed up the analysis. Thus, it is important to develop a way to organize the related tools and software into automatic pipelines within a reasonable time, and to speed up the overall procedure using parallelization and acceleration technologies [10]. To address this need, a program should be designed with the following features in mind: i) management of related tools and programs regardless of their programming language and input file formats, ii) flexibility to add new analysis steps, iii) generation of an automatic pipeline instead of step-by-step operations, and iv) use of parallelization and acceleration technologies. We developed CPLSTool, which provides all of the above features. CPLSTool is freely available from https://github.com/maoshanchen/CPLSTool.
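The idea of an automatic pipeline with parallel steps can be illustrated with a minimal Python sketch. This is not CPLSTool's code or configuration format, which the text above does not describe. The function name `run_pipeline`, the step names and the placeholder commands are all hypothetical. The sketch assumes each step is an external command with a list of prerequisite steps, and it launches together all steps whose prerequisites have finished.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(steps, max_workers=4):
    """Run external commands in dependency order (hypothetical helper).

    `steps` maps a step name to (command, dependencies): `command` is an
    argument list for subprocess.run, `dependencies` a list of step names.
    Steps whose dependencies are all finished are launched together.
    Returns the "waves" of step names that were run concurrently.
    """
    done = set()
    waves = []
    remaining = dict(steps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # A step is ready once every step it depends on has finished.
            ready = sorted(
                name for name, (_, deps) in remaining.items()
                if all(dep in done for dep in deps)
            )
            if not ready:
                raise ValueError(
                    "circular or missing dependency among: %s" % sorted(remaining)
                )
            futures = [
                pool.submit(subprocess.run, remaining[name][0], check=True)
                for name in ready
            ]
            for future in futures:
                future.result()  # re-raises if a command failed
            done.update(ready)
            waves.append(ready)
            for name in ready:
                del remaining[name]
    return waves


if __name__ == "__main__":
    # Placeholder command standing in for a real tool invocation.
    placeholder = [sys.executable, "-c", "pass"]
    # Hypothetical step names, loosely modelled on a 16S rRNA analysis.
    example_steps = {
        "quality_filter": (placeholder, []),
        "pick_otus": (placeholder, ["quality_filter"]),
        "alpha_diversity": (placeholder, ["pick_otus"]),
        "beta_diversity": (placeholder, ["pick_otus"]),
    }
    for wave in run_pipeline(example_steps):
        print("ran together:", ", ".join(wave))
```

Because each step is just an argument list handed to the operating system, the runner does not depend on the language a tool is written in. Adding a new analysis step means adding one entry to the dictionary.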

Authors and Affiliations

Sifen Lu, Jing Song, Maoshan Chen

  • EP ID EP592718
  • DOI 10.26717/BJSTR.2018.11.002172

How To Cite

Sifen Lu, Jing Song, Maoshan Chen (2018). CPLSTool: A Framework to Generate Automatic Bioinformatics Pipelines. Biomedical Journal of Scientific & Technical Research (BJSTR), 11(5), 8863-8867. https://europub.co.uk/articles/-A-592718