Instructor: Hemant Kelkar, Ph.D., UNC Center for Bioinformatics
Location: HSL 307
Genome-wide analysis of gene expression has been a popular technique since the advent of microarrays. Next Generation Sequencing (NGS) technology was applied to this purpose early in its life cycle, and gene expression analysis has since become one of the most popular uses of NGS. We will start with a general overview of NGS technologies and consider commonly used data formats. Because of time and space constraints, we will use a small subset of sequences derived from a real dataset. I will introduce FastQC, a popular application for quality control (QC) of NGS data. We will then look at example FastQC results, in addition to the report we will generate from our test data. This will be followed by an overview of data scanning/trimming programs. These programs look for contaminating sequences (e.g. adapters) and can also trim data based on various other criteria.
Account Requirement: You will need an account on the “Longleaf” Linux cluster managed by ITS-Research Computing. Instructions for requesting a “Longleaf” account via the ONYEN Subscribe to Services utility are available at this link: http://help.unc.edu/help/how-do-i-get-an-account-on-killdevil-research-computing-cluster/. Please submit your account request at least 3 days before class. Account creation generally takes 24-48 hours.
Prerequisites: For those unfamiliar with UNIX or with using the Longleaf/UNC computer cluster, UNIX for Biologists, Part 1 and UNIX for Biologists, Part 2 (from this series or prior offerings) are prerequisites for this class.