LINUX Cluster Project

Life science data management


  • Name: Fachgebiet Bioinformatik
  • Address: Maximus-von-Imhof Forum 3, 85354 Freising
  • Project Proposal Date: 2018-04-24 19:40:37


The overall idea of the project is to apply DeepVariant and GATK4 to public sequencing data sets in order to establish a standardized, accurate, reliable and fast pipeline for the analysis of WES data generated in clinics. First, we want to determine whether the promised performance and accuracy of both pipelines can be reproduced. We define the following five milestones to accomplish this aim:

1. Implement the DeepVariant pipeline on the LRZ cluster and apply it to the WGS data set of the individual NA12878, which is widely used for sequencing benchmarks. The primary interest of this milestone is the performance that can be achieved by our installation of DeepVariant on the LRZ cluster. The accuracy of the variant calls should match the accuracy reported in the DeepVariant publication.
2. Apply DeepVariant to the WES data set of the same individual (NA12878) generated by the Care-for-Rare Laboratories at the Dr. von Hauner Children's Hospital, LMU Munich. First, we want to measure the performance of DeepVariant on the LRZ cluster for WES data whose quality is representative of clinical exome sequencing. Second, we want to compare the accuracy of DeepVariant depending on the model used for variant calling. Currently, only models for WGS data are available. Here, we want to train a model for WES data and compare the accuracy of variant calling between the WES model and the WGS model.
3. Implement the GATK4 variant calling pipeline on the LRZ cluster and apply it to WES as well as WGS data of NA12878. This milestone aims to compare both the performance and the accuracy of GATK4 and DeepVariant.
4. Apply GATK4 and DeepVariant to WES/WGS data of NA12878 mapped to GRCh38. Currently, most sequencing studies are based on the human reference genome assembly GRCh37. Here, we want to compare GATK4 and DeepVariant using the alignment of the reads to the more accurate assembly GRCh38. We do not expect differences in performance, but one of the two callers may tend to be more accurate on the latest assembly.
5. Finally, the best-performing pipeline will be used for the LRZ project “NG-Sequenzierung bei Patienten mit angeborenen Immundefekten” (pr58gu).
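
The proposal does not state how the accuracy of the two callers will be scored. One common approach is to compare each caller's variants against a trusted reference call set and report precision, recall and F1. The following is a minimal sketch under that assumption. The function `score_calls`, the `Accuracy` tuple and all variant records are hypothetical, illustrative names and toy data, not part of DeepVariant or GATK4, and exact matching on (chromosome, position, ref, alt) ignores the variant normalization that a real comparison would need.

```python
from typing import NamedTuple, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# Assumption: exact matching on this tuple; real comparisons need normalization.
Variant = Tuple[str, int, str, str]


class Accuracy(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(calls: Set[Variant], truth: Set[Variant]) -> Accuracy:
    """Score a call set against a truth set (hypothetical helper)."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Accuracy(precision, recall, f1)


# Toy data only: these records are invented for illustration.
truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"),
         ("2", 50, "G", "A"), ("2", 75, "T", "C")}
caller_a = {("1", 100, "A", "G"), ("1", 200, "C", "T"),
            ("2", 50, "G", "A"), ("3", 10, "A", "C")}
caller_b = {("1", 100, "A", "G"), ("2", 75, "T", "C")}

for name, calls in (("caller_a", caller_a), ("caller_b", caller_b)):
    acc = score_calls(calls, truth)
    print(f"{name}: precision={acc.precision:.2f} "
          f"recall={acc.recall:.2f} f1={acc.f1:.2f}")
```

Reporting precision and recall separately, rather than a single number, makes it visible whether a caller trades missed variants for fewer false calls, which is relevant when choosing a pipeline for clinical data.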