Encryption and Sharing of Genomic Data Across Servers

Abstract

Contemporary human genetics has become a ‘Big Data Science’, because recent advances in sequencing technologies have reduced the cost and improved the accuracy of sequencing. This high-volume ‘Big Data’ consists of genomics data, clinical data, electronic health records (EHR) and physical activity data from personal health apps, and it presents an unprecedented opportunity to combine them for integrated ‘Big Data’ healthcare analytics and applied knowledge. There are legitimate privacy and security concerns over indiscriminate open access to, and use of, biomedical ‘Big Data’, while the creation of barriers to data access is feared to hinder research into human diseases. In this brief commentary, the present status of genetic studies and data security is discussed (Figure 1).

How is Genetics or DNA Sequence Linked with Disease?

Over the past few decades, analysis of the human genome, or DNA sequence, to identify disease-causing mutations has been one of the major focuses of genomics research. Identification of disease-associated single nucleotide polymorphisms (SNPs) in DNA is based on the ‘common disease common variant’ hypothesis. The assumption behind this hypothesis is that a common disease must have a different genetic structure than rare diseases. This assumption was supported by the discovery of susceptibility variants, or SNPs with high minor allele frequency, in the APOE gene and the PPAR gamma gene for the common diseases Alzheimer’s disease and Type II diabetes, respectively Blacker et al. [1]. These successes, combined with the evolution of genomic technology, have ushered in an age of genomics with tremendous growth in genotype (DNA sequence and variation) and phenotype (disease manifestation) data. In Genome-Wide Association Studies (GWAS), researchers have identified SNPs in the human genome that are associated with diseases.

What is the Technology that is Used to Collect DNA Sequence and Genetic Variation Data?

Identification of genetic variations (genotype) associated with disease (phenotype), also called association studies, requires determination of the DNA sequence. Two technologies are available to do this at a genome-wide level: array-based technology Distefano et al. [2] and next-generation sequencing (NGS) technology Goodwin et al. [3]. Array-based technology relies on DNA probes, each ~70 base pairs in size, which recognize specific sites on the genome and emit a measurable signal when there is a match. Arrays are available either as an Affymetrix platform, with DNA probes printed on a spot of a chip, or as an Illumina platform, with DNA probes on beads. The human genome is made up of 3.3 billion bases, while an array covers only ~500,000 or 1,000,000 different genetic variations or SNPs (< 0.1% of the genome). Thus, the success of array-based GWAS in identifying disease-associated variants came from careful selection of sites (SNPs or markers) that are known to vary across a human population and/or that prior studies have shown to be disease-associated loci. Over 25,000 significant disease-associated genetic loci have been identified so far with the help of array-based GWAS MacArthur et al. [4]. With NGS-based technology it is not necessary to have prior knowledge of genetic variants in the population, because NGS-based Whole Genome Sequencing (WGS) covers all 3.3 billion bases or sites on the genome without any bias of site or marker selection. As the cost of sequencing with NGS continues to decrease, this technology is poised to be widely used to identify not only common variants, as array-based technology does, but also rare variants associated with diseases Bennette et al. [5].
Given the cost of WGS, the most widely used NGS-based technology at present is Whole Exome Sequencing (WES), which sequences, base by base, the entire protein-coding region (the exons) of the genome (~1% of the entire genome). NGS-based technology has revealed that an estimated 100 loss-of-function variants, or 100 non-functional genes, occur per human, with around 20 of them being homozygous (completely gene-inactivating) in each person, and that most occur population-wide at a frequency of >1% MacArthur et al. [6]. Thus, WGS and WES have the potential to reveal novel genetic variations associated with diseases.
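
The coverage figures quoted above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming only the approximate numbers given in the text (a 3.3 billion base genome, arrays of ~500,000 or ~1,000,000 sites, and an exome of ~1% of the genome); the function and variable names are illustrative and do not come from the article.

```python
# Approximate human genome size quoted in the text (assumption: 3.3 billion bases).
GENOME_BASES = 3.3e9


def percent_of_genome(sites: float, genome_bases: float = GENOME_BASES) -> float:
    """Return the percentage of genome positions represented by `sites`."""
    return 100.0 * sites / genome_bases


# Array-based platforms interrogate roughly 500,000 or 1,000,000 preselected SNPs.
array_small = percent_of_genome(500_000)
array_large = percent_of_genome(1_000_000)

# Whole Exome Sequencing covers about 1% of the genome (figure from the text).
exome_bases = 0.01 * GENOME_BASES

print(f"500,000-site array:   {array_small:.4f}% of the genome")
print(f"1,000,000-site array: {array_large:.4f}% of the genome")
print(f"Exome (~1%):          {exome_bases:,.0f} bases")
print(f"WGS:                  {GENOME_BASES:,.0f} bases (100%)")
```

Under these assumptions both array sizes fall well below the 0.1% bound stated in the text, which is why site selection matters so much for array-based GWAS, whereas WES and WGS read every base in their target regions.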

Authors and Affiliations

Shradha Mukherjee

  • EP ID EP592749
  • DOI 10.26717/BJSTR.2018.07.001479

How To Cite

Shradha Mukherjee (2018). Encryption and Sharing of Genomic Data Across Servers. Biomedical Journal of Scientific & Technical Research (BJSTR), 7(2), 5809-5812. https://europub.co.uk/articles/-A-592749