California State University, Chico
Department of Computer Science
Syllabus
Spring 2015 – CSCI 598 Advanced Topics in Computer Science
Bioinformatics: Computational Methods for Next Generation Sequencing Data Analysis
Course Title: Advanced Topics in Computer Science
Bioinformatics: Computational Methods for Next Generation Sequencing Data Analysis
Course Description: An introduction to computational methods for Next Generation
Sequencing data analysis. Topics include mapping sequenced reads back to a reference
genome; approximate string matching; an introduction to biostatistics (probability
distributions, hypothesis testing); identification of SNPs (single-nucleotide
polymorphisms); analysis of RNA-seq data (mapping RNA-seq reads, identification of
splice junctions, analyzing gene expression); and analysis of methylation (mapping
bisulfite-treated reads, estimation of methylation level, genome-wide associative
analysis of methylation and gene expression).
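As a rough illustration of the first topic, exact read mapping with a hash table can be sketched as follows. This is a minimal sketch, not course material: the function names `build_index` and `map_read`, the seed length `k`, and the toy reference string are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return all 0-based positions where the read matches the reference exactly."""
    if len(read) < k:
        return []
    # Look up the read's first k-mer (the seed), then verify the whole read.
    candidates = index.get(read[:k], [])
    return [p for p in candidates if reference[p:p + len(read)] == read]


reference = "ACGTACGTTAGC"  # toy reference, not real sequencing data
index = build_index(reference, k=4)
print(map_read("ACGTT", reference, index, k=4))  # prints [4]
```

The seed "ACGT" occurs at positions 0 and 4 of the toy reference, but only position 4 passes the full-read check, which is why the verification step is needed after the hash lookup.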
Prerequisites: CSCI-311
Course Times and Location: TR 11:00am – 12:15pm in OCNL 254
Instructor: Dr. Elena Harris
E-mail: eyharris@csuchico.edu
Website: http://www.cs.ucr.edu/~elenah/
Office Phone: (530) 898-4304
Office: OCNL 221
Office Hours:
MW 1:00pm-2:00pm and 3:00pm-3:45pm
TR 12:30pm-1:30pm
Or by appointment
Recommended Textbook:
An Introduction to Bioinformatics Algorithms (Computational
Molecular Biology), First Edition by Neil C. Jones and Pavel A.
Pevzner, Copyright 2004 by MIT.
ISBN-10: 0262101068
Course objectives:
The goal of this course is to introduce students to the basic algorithms and techniques used to
analyze Next Generation Sequencing (NGS) data. A major component of this course is
collaboration between Computer Science students and Biology students.
After completing this course, Computer Science students will be able:
- To program basic algorithms for NGS data, including but not limited to
  - Pattern search (mapping sequenced reads back to a reference genome)
  - Local approximate string matching
- To analyze NGS data in order to identify single-nucleotide polymorphisms
- To use state-of-the-art tools for NGS data analysis
  - BRAT-bw for methylation analysis
  - TopHat for gene expression analysis
- To apply biostatistics to carry out genome-wide associative analysis
- To work efficiently with biologists to
  - Discuss biological problems
  - Choose the appropriate tools and/or techniques for solving problems
  - Convey the results and discuss the biological importance of the findings
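For the local approximate string matching objective, the schedule names the Smith-Waterman algorithm. A minimal sketch of its scoring recurrence is shown below. The scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration, and the sketch returns only the best score, not the alignment itself.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("GGACGTCC", "TTACGTAA"))  # shared substring ACGT scores 8
```

Keeping only two rows of the table is enough when just the score is needed; recovering the aligned substrings would require storing the full table and tracing back from the best cell.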
Attendance and Deportment Policy: Lecture attendance is mandatory. A positive, professional
attitude and interaction are mandatory. You are expected to participate actively during lectures
and discussions: ask questions, answer questions when called on, and participate in group
work. You are required to arrive at class on time.
Course Assignments: There will be programming or data analysis assignments on a weekly
basis. Some of these assignments will be joint assignments with students from the Biology
Department. The course will culminate in a joint project between CSCI and BIOL students.
Policy on Turning in Homework Assignments: Due dates are firm. No late assignments
will be accepted unless serious illness or other excused absences merit allowances in the
judgment of the instructor.
Exams: There will be no exams.
Group work: Group work (including collaborative work with BIOL students) will be offered as
needed to improve your understanding of the material presented and to develop your teamwork
skills. There will be no make-up for missed group work (no exceptions).
Grade Evaluation Procedures: Students will be graded based on their performance in the
following course components (grades will NOT be curved):
Programming or data analysis assignments (7-11)   50%
Group work (as needed)                            20%
Project (one)                                     30%
Final Grades: Final grades will be expressed as a percentage of the maximum possible score
of all evaluated materials. I will round decimal values to the nearest integer. Final grades
will not be curved. Letter grades will be given according to the following:
Scale (inclusive)   Letter Grade   University Definition
93-100              A              Superior Work
90-92               A-             Superior Work
87-89               B+             Very Good Work
83-86               B              Very Good Work
80-82               B-             Very Good Work
77-79               C+             Adequate Work
73-76               C              Adequate Work
70-72               C-             Adequate Work
67-69               D+             Minimally Acceptable Work
60-66               D              Minimally Acceptable Work
0-59                F              Unacceptable
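The grade arithmetic can be illustrated with a short sketch. It uses the component weights and scale stated in this syllabus, reading the letter column as A, A-, B+, B, B-, C+, C, C-, D+, D, F. The function names and the sample scores are hypothetical, and rounding halves upward is an assumption about how "nearest integer" is applied.

```python
import math

# Component weights in percent, as listed under Grade Evaluation Procedures.
WEIGHTS = {"assignments": 50, "group_work": 20, "project": 30}

# (lowest percentage for the grade, letter), highest grade first.
SCALE = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
         (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (60, "D"), (0, "F")]


def final_percentage(scores: dict) -> int:
    """Weighted average of component percentages (0-100), rounded to an integer."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    return math.floor(total + 0.5)  # assumption: halves round up


def letter_grade(percentage: int) -> str:
    """Return the letter whose inclusive range contains the percentage."""
    for lower, letter in SCALE:
        if percentage >= lower:
            return letter
    return "F"


# Hypothetical student: 92% on assignments, 85% on group work, 90% on the project.
pct = final_percentage({"assignments": 92, "group_work": 85, "project": 90})
print(pct, letter_grade(pct))  # prints: 90 A-
```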
Plagiarism/Cheating Policy: Students who are in violation of the University’s policy on
academic honesty and integrity will be reported to the Campus Student Judicial Affairs. Such
violations include copying of other students’ work.
University Policies and Campus Resources
Academic integrity
Students are expected to be familiar with the University’s Academic Integrity Policy. Your
own commitment to learning, as evidenced by your enrollment at California State University,
Chico, and the University’s Academic Integrity Policy require you to be honest in all your
academic course work. Faculty members are required to report all infractions to the Office of
Student Judicial Affairs. The policy on academic integrity and other resources related to
student conduct can be found at: http://www.csuchico.edu/sjd/integrity.shtml
Campus Policy in Compliance with the Americans with Disabilities Act
If you need course adaptations or accommodations because of a disability, or if you need to
make special arrangements in case the building must be evacuated, please make an
appointment with me as soon as possible, or see me during office hours. Students with
disabilities requesting accommodations must register with the DSS Office (Disability Support
Services) to establish a record of their disability. Special accommodations for exams require
ample notice to the testing office and must be submitted to the instructor well in advance of
the exam date.
Student Computing
Computer labs for student use (http://www.csuchico.edu/stcp) are located on the 1st
floor of the Merriam Library (Rm 116 and 450), in Tehama Hall (Rm 131), and in the BMU (Rm 301).
Student Services
Student services are designed to assist students in the development of their full academic
potential and to motivate them to become self-directed learners. Students can find support for
services such as skills assessment, individual or group tutorials, subject advising, learning
assistance, summer academic preparation and basic skills development. Student services
information can be found at: http://www.csuchico.edu/5.-studentservices.html.
Disability Services
If you feel you may need an accommodation based on the impact of a disability,
please contact me privately to discuss your specific needs. Please also contact the Disability
Support Services office to coordinate reasonable accommodations for students with
documented disabilities. Disability Support Services online:
http://www.csuchico.edu/dss/studentServices/.
Student Learning Center
The mission of the Student Learning Center (SLC) is to provide services that help CSU,
Chico students become independent learners. The SLC prepares and supports students in
their college course work by offering a variety of programs and resources to meet student
needs. The SLC facilitates the academic transition and retention of students from high schools
and community colleges by providing study strategy information, content subject tutoring,
and supplemental instruction. The SLC is online at http://www.csuchico.edu/slc/. The
University Writing Center has been combined with the Student Learning Center.
Tentative Schedule of Topics
Week          Dates               HW     Topics covered
1             Jan 20, Jan 22             Biology background: DNA, gene structure (exons and introns), ORFs,
                                         transcription and translation. Next Generation Sequencing technology.
2             Jan 27, Jan 29      HW1    Pattern search problem (string matching problem).
                                         Hash tables for exact read mapping.
3             Feb 3, Feb 5        HW2    Approximate string matching (mismatches).
                                         Approximate string matching (indels). Smith-Waterman algorithm for
                                         local sequence alignment.
4             Feb 10, Feb 12      HW3    Biostatistics: Intro to probability. Probability distributions
                                         (binomial, hypergeometric, normal). Intro to data analysis.
5             Feb 17, Feb 19      HW4    Hypothesis testing. Identification of SNPs. Associative analysis of
                                         indels and disease.
6             Feb 24, Feb 26      HW5    RNA-Seq. Alignment of RNA-Seq reads to the transcriptome. Updating the
                                         hash table with sequence ID. Ambiguous reads.
7             March 3, March 5    HW6    RNA-Seq. Identification of splice junctions. TopHat.
8             March 10, March 12  HW7    Analyzing RNA-Seq data: identifying differential gene expression.
Spring Break  March 17, March 19
9             March 24, March 26         Short coding ORFs from RNA-Seq data.
10            March 31, Apr 2     HW8    Epigenomics: Methylation. Mapping bisulfite-treated reads.
11            Apr 7, Apr 9        HW9    Methylation level. Differentially methylated regions (DMRs).
                                         Hypomethylated regions (HMRs).
12            Apr 14, Apr 16      HW10   Associative analysis of methylation level and gene expression.
13            Apr 21, Apr 23      HW11   microRNA mapping. Associative analysis of miRNA and mRNA.
14            Apr 28, Apr 30             Project Presentations
15            May 5, May 7               Project Presentations