COMPUTATIONAL BIOPHYSICS SYLLABUS (modelled on the 2017-18 course, but with possible openings towards machine learning, active inference and atomistic simulations of biomolecules)

DOING BIOLOGY WITH COMPUTERS AND MODELS: FROM THE BORN-OPPENHEIMER APPROXIMATION TO THE SPACE OF BIOLOGICAL SEQUENCES AND THE INTEGRATIVE MODELLING OF SYSTEMS BIOLOGY

 

Academic Year 2018-2019, 6 ECTS.

LOCATION/TIME: Room 5, Fermi Building: Mon, 10:00 am - 12:00 pm.

INSTRUCTOR:
Dr. Andrea Giansanti

Office: 211 Marconi Building, near the Touschek room.
Office Hours: Mon. 2:00 - 4:00 pm, Wed. 11:00 am - 1:00 pm.
Phone: +390649914367, +393385075611
E-Mail: andrea.giansanti@roma1.infn.it

 

This is a course for the Master Program in Physics, given in the 2nd semester and, looking ahead, closely related to other courses in the BIOSYSTEMS program, namely: BIOCHEMISTRY, MOLECULAR BIOLOGY, BIOPHYSICS, THEORETICAL BIOPHYSICS and SOFT & BIOLOGICAL MATTER.

Description/Objectives. This course is intended as an introduction to computational (in silico, as opposed to in vivo/in vitro) biophysics/biology, from an evolutionary perspective. Teaching proceeds by active illustration rather than by exhaustive passive demonstration. Students are required to participate actively through questions, statements, and written essays. Extensive references and critical introductions to the literature and to many specialized texts will be offered as a thread for personal study. An effort will be made to place each topic in a clear scheme of references, useful for preparing the final exam. The objective of this course is, in a nutshell, to narrow the gap between the institutional level of training and that of research. Guest lecturers will occasionally present specialized views and lines of contemporary research of impact and interest to an audience of students enrolled in the biosystems and theoretical curricula.

Requirements. Enrolled students should have taken the basic courses of the BA program. In particular, basic competence in classical mechanics, thermodynamics, chemical equilibrium and quantum mechanics is required, together with basic programming skills. Biological facts will be discussed as needed.

Evaluation: written essays, written tests, homework and participation in discussions: 40%. Final oral exam: 60%.

 

Recommended texts. 

For a quantitative assessment of modern biology

[MP] Ron Milo and Rob Phillips, Cell Biology by the Numbers, Garland, New York, 2016.

Reference textbook for the entire course

[HA] PG Higgs, TK Attwood, Bioinformatics and Molecular Evolution, Blackwell, 2006.

Mathematical backbone and proofs

[DU] R Durbin, SR Eddy, A Krogh, G Mitchison, Biological Sequence Analysis, CUP, 1999.

 

General readings, to be discussed along the course:

Paolo Zellini,  La dittatura del calcolo, Adelphi.

Siri Hustvedt, Le illusioni della certezza, Einaudi.

Dennis Bray, Wetware, Yale University Press.

 

 

 

A RECORD OF THE LECTURES AND THE STUDY MATERIALS CAN BE FOUND on the e-learning MOODLE platform of Sapienza University of Rome (course: BIOFISICA COMPUTAZIONALE; password available from the instructor)

https://elearning2.uniroma1.it/course/view.php?id=5626

 

  1. BASIC CELL BIOLOGY BY THE NUMBERS (A)

What is life (HA 3.1, Weinberg_chap1.pdf)

Living cells: prokaryotes/eukaryotes (basic_biology_2017)

The genetic code/genomes (HA 2.2, 2.3)

Noise in gene expression (P) (see, e.g., Chalancon’s webpage: https://www.mrc-lmb.cam.ac.uk/genomes/guilhem/noise.html)

What is Darwinian evolution (HA 3.1, 3.2, 3.3 (mutations), 3.4 (coalescence), 3.6 (neutral evolution and adaptation, codon bias, P))

Biology by numbers (P) (Phillips & Milo’s quantitative approach: see their book online and practise at will: https://www.weizmann.ac.il/plants/Milo/ and http://book.bionumbers.org)

 

  2. SEQUENCING (A, M, P)

Libraries

Sanger sequencing vs Next Generation Sequencing (NGS)

Pipelines for ChIP-seq

 

  3. ELEMENTS OF PROTEIN STRUCTURES AND DATABASES (A)

Central dogma of molecular biology, peptide bond formation and planarity, protein synthesis on the ribosome, genetic code and its degeneracy (HA 2.3).

Standard form of the 20 naturally occurring amino acids: special cases of glycine, histidine, proline, cysteine (disulphide bridges).

Dihedral angles and Ramachandran plots: regions of standard secondary structures. 

Supersecondary structures and structural motifs.

Packing of secondary structures (Levitt’s rules for anti-parallel alpha-helices and Crick’s knobs-into-holes for parallel helices; see slides: Introduzione alla struttura delle proteine (Forcelloni).pdf).

Stabilizing forces in protein structure: electrostatic, van der Waals, hydrogen bonds, hydrophobic (for this section of the program see the first two chapters of Tom Waigh’s Applied Biophysics in folder n. 3 of the attached STUDY MATERIALS).

Primary, secondary, tertiary and quaternary protein structures. 

Databases for nucleic acids and proteins: formats (HA 5.1, 5.2). GenBank (HA 5.3.3, 5.3.4), UniProt/Swiss-Prot (HA 5.4.4, 5.4.5, 5.4.6, 5.4.8; see updates on the websites).

Structural and functional protein domains (e.g. PRODOM classification, see also Pawson’s interaction domains).

Sequences determine protein structures (Anfinsen’s studies on ribonuclease) and structures are more conserved than sequences (foldability).

Protein databases (HA 5.5.1): PROSITE (regex, HA 5.5.2), PRINTS (HA 5.5.3), Pfam (HA 5.5.8).

Protein Data Bank (PDB): resolution. Visualizing structures: PyMOL, Chimera and LiteMol. Protein structural classification (SCOP, CATH; see, e.g., HA 5.7.3).

Intrinsically disordered proteins (P) (DisProt, MobiDB)

 

Protein Molecular Dynamics 

Initial conditions, integration scheme, force fields, Verlet’s algorithm, thermostats.

“Essential Dynamics” as a Principal Component Analysis (PCA, see sect. 9 below) in the space of protein atomic motions.
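As an illustration of the integration scheme listed above, here is a minimal sketch of the velocity form of Verlet's algorithm for a single particle in one dimension. The harmonic force, the unit mass and the time step are assumptions chosen only to make the example checkable; a real protein simulation would use a molecular force field and a thermostat.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with the velocity Verlet scheme."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

def harmonic_force(x, k=1.0):
    """Toy force (assumption): a harmonic spring with constant k."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    return 0.5 * mass * v * v + 0.5 * k * x * x

# Start at x = 1 with zero velocity and integrate up to t = 10.
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
energies = [total_energy(x, v) for x, v in traj]
energy_drift = max(energies) - min(energies)
```

With these settings the total energy should fluctuate only slightly around its initial value, and the final position should stay close to the exact solution cos(t). This long-time stability is the usual reason for preferring Verlet-type integrators in molecular dynamics.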

 

Direct coupling analysis for protein contact prediction

 

  4. WHAT IS COMPUTATIONAL BIOPHYSICS (A,M)

Big data and the problem of “law without law” (P) (Wheeler1983)

Modelling and computation (deterministic/probabilistic, see Bray2014)

Patterns, signals, noise

Probability distributions and Kullback-Leibler distances

Simplicity/complexity: objects in diluted isolation (in vitro) and connected in networks of relationships (in vivo)

Physics of living systems vs Physics of parts of living systems (Ageno’s [integrative] vs Careri’s [molecular] approach; see: BC_2017_L1.pdf)

The space of biological sequences as the archive of evolution (Molecules as documents of Evolutionary History, see: ZuckerkandlandPauling1965)
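The Kullback-Leibler "distance" between probability distributions, listed above, can be sketched in a few lines. The two nucleotide compositions below are invented numbers used only for illustration.

```python
import math

def kl_divergence(p, q):
    """D(p||q) in bits for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue            # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf     # p puts weight where q has none
        total += pi * math.log2(pi / qi)
    return total

# Hypothetical compositions over (A, C, G, T): uniform vs AT-rich.
uniform = [0.25, 0.25, 0.25, 0.25]
at_rich = [0.35, 0.15, 0.15, 0.35]

d_pq = kl_divergence(at_rich, uniform)
d_qp = kl_divergence(uniform, at_rich)
```

The two values differ, so the quantity is not symmetric and is not a distance in the metric sense. It is better read as the information lost when q is used to approximate p.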

 

  5. PROBABILISTIC REASONING (BAYES THEOREM) (A,M)

Relevance of Bayes’ theorem in computational biology (Durbin_INTRO.pdf, see also: Puga2015a and Puga2015b)

Events, trials, uncertainty, probability

There are only conditional probabilities: P(H|I) 

Inferential machine learning: maximum a posteriori (MAP) and maximum likelihood criteria in pattern recognition

Philosophical digression: Popper (intrinsic physical propensity of events) vs De Finetti (subjective degree of belief) 

Probability distributions: discrete/continuous. Moments and estimators.

Example: arithmetic mean as a maximum likelihood estimator of the first moment of a Gaussian distribution
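The last example can be checked numerically. The sketch below draws a synthetic Gaussian sample and scans candidate values of the mean; the true mean, the standard deviation, the seed and the scan grid are all arbitrary choices made for the illustration.

```python
import math
import random

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. Gaussian data for given mu and sigma."""
    n = len(data)
    squared_deviations = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - squared_deviations / (2.0 * sigma ** 2)

random.seed(0)  # fixed seed so the run is reproducible
sample = [random.gauss(5.0, 2.0) for _ in range(200)]
sample_mean = sum(sample) / len(sample)

# Scan candidate means on a grid centred on the arithmetic mean.
candidates = [sample_mean + 0.01 * k for k in range(-100, 101)]
best_mu = max(candidates, key=lambda mu: gaussian_log_likelihood(sample, mu, sigma=2.0))
```

The candidate that maximizes the log-likelihood is the arithmetic mean itself. This agrees with the analytic result obtained by setting the derivative of the log-likelihood with respect to mu to zero, and the result does not depend on the value assumed for sigma.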

 

  6. MODELS OF SEQUENCE EVOLUTION (M)

Mutations (HA 3.2), sequence variation within and between species (HA 3.3)

Negative purifying vs positive Darwinian selection 

Evolutionary pressure on a site through dN/dS [Ka/Ks] (HA 11.2.3)

Evolutionary distances between orthologous genes (HA 4.1.1)

Probabilistic models of evolution (DU chap 8.2)

The solution of the Jukes-Cantor model (HA 4.1.2 and Box 4.1)

Substitution matrices (DU 2.2)

PAM model of protein sequence evolution (HA 4.2)

PAM distances (HA box 4.2)

Log-odds scoring PAM matrices (HA 4.3.1, DU chap. 2)

BLOSUM scoring matrices (HA 4.3.3, DU chap. 2)
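The solution of the Jukes-Cantor model listed above leads to the well-known distance correction d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of differing sites between two aligned sequences. Here is a minimal sketch; the two short sequences are invented for the example.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of sites that differ between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)

def jukes_cantor_distance(p):
    """Expected number of substitutions per site under the Jukes-Cantor model."""
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined beyond p = 3/4
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences differing at 2 sites out of 20.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGT"

p = p_distance(seq_a, seq_b)
d = jukes_cantor_distance(p)
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that leave no visible trace in the comparison.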

 

  7. SEQUENCE ALIGNMENT ALGORITHMS (M)

Attwood’s video on alignments

Scaling of algorithms with the dimension of a problem (HA 6.1)

Pairwise sequence alignment: gap cost functions, dynamic programming

Global alignment: Needleman-Wunsch algorithm (HA 6.2, 6.3, DU chap 2)

Local alignment: Smith-Waterman algorithm (HA 6.3.3, DU chap 2)

Effects of scoring parameters on the alignment (HA 6.4)

Multiple sequence alignment (HA 6.5, optional)
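The dynamic programming idea behind global alignment can be sketched as follows. This is a simplified Needleman-Wunsch implementation with a linear gap cost; the match, mismatch and gap values are arbitrary assumptions, whereas a realistic application would use a substitution matrix such as PAM or BLOSUM and possibly affine gaps.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap cost; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner, preferring diagonal moves on ties.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

best_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
```

Filling the table costs a number of operations proportional to the product of the two sequence lengths. The local (Smith-Waterman) variant differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell.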

 

  8. SEARCHING SEQUENCE DATABASES (M)

FASTA/BLAST (HA 7.1.2)

PSI-BLAST (HA 7.1.3)

Measuring the performance of algorithms: sensitivity, selectivity, ROC curves (HA 7.1.4, Deiana2013)

Alignment statistics, extreme value distribution: p-values (HA 7.2, Boxes 7.1 and 7.2), E-values (HA 7.3).

Rooted trees and the molecular clock hypothesis. Bootstrapping

 

  9. CLUSTERING AND CLASSIFICATION (M,P)

Hierarchical clustering (HA 2.6.2) / non-hierarchical clustering (HA 2.6.3, 2.6.4).

Principal components analysis (PCA) (HA 2.5, box 2.2).

k-means.

Superparamagnetic unsupervised clustering (P) (Blatt1996, Tetko2005; see also http://www.vcclab.org/lab/spc/ and the very useful link to Rudy Stoop’s computational biology clustering page: http://stoop.ini.uzh.ch/research/clustering).

 

  10. PHYLOGENETIC METHODS (M)

Phylogenetic trees as graphs of evolutionary relationships based on evolutionary distances (HA 8.1, 8.2 and DU chap. 7.1 and 7.2).

Additive trees: additivity and ultrametricity (P) (Rammal1986; pattern recognition in sequences: Bayesian classifiers).

Distance methods as clustering methods (HA 8.3).

UPGMA method (HA 8.3.2).

Neighbor joining algorithm (HA box 8.1).

Bootstrapping (HA 8.4)

Cavalli-Sforza & Edwards theorem on the number of distinct rooted and unrooted phylogenetic trees (see slides, and DU chap. 7 p. 163-164).
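The counting result mentioned above can be made concrete with the standard formulas for bifurcating trees with n labelled leaves: (2n-5)!! unrooted trees for n >= 3 and (2n-3)!! rooted trees for n >= 2. The function names in this sketch are illustrative only.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def count_unrooted_trees(n_leaves):
    """Number of distinct unrooted bifurcating trees with n labelled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n_leaves - 5)

def count_rooted_trees(n_leaves):
    """Number of distinct rooted bifurcating trees with n labelled leaves."""
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n_leaves - 3)

# Tabulate (n, unrooted, rooted) for a few sizes to show the growth.
table = [(n, count_unrooted_trees(n), count_rooted_trees(n)) for n in (3, 4, 5, 10)]
```

Already for 10 leaves there are millions of topologies, which is why exhaustive tree search is impractical and heuristic or distance-based methods are used. A rooted tree on n leaves corresponds to an unrooted tree on n+1 leaves, the extra leaf marking the position of the root.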

 

  11. MACHINE LEARNING METHODS (M)

Pattern recognition in sequences of symbols from alphabets: Bayesian classifiers (HA 10.1, 10.2 and Bulashevska2008)

Prior and posterior probabilities (key formula 10.3), pseudocounts (key formula 10.11)

Hidden Markov Models (HMM): basic structure (HA 10.3, see also Chap3_Durbin_Biological_Sequence_Analysis)

HMM Problems: Evaluation, Decoding, Learning (see slides: Introduction_Hidden_Markov_Models.pdf)

Decoding problem: the Viterbi algorithm (HA box 10.2)

Supervised/unsupervised training of an HMM on a gapless profile associated with a protein family: Viterbi (minimum action path) vs Baum-Welch (path integral) method (HA 10.3.3). Forward/Backward algorithms (HA box 10.3, see also materials in HMM_VITERBI_BAUM_WELSCH.zip).

Introduction to Artificial Neural Networks as general methods to recognize and classify patterns in data (Papert & Minsky, pioneers of Artificial Intelligence; see, e.g., Perceptrons, MIT Press 1988):

Formal neurons and perceptrons: logic gates, input/output transfer functions, weights, thresholds.

Basic structure of a neural network: layers (HA 10.5).

Multi-layer networks: the backpropagation algorithm (HA 10.5.5; box 10.4)

Deep learning (see the materials in the online book by Michael Nielsen at http://neuralnetworksanddeeplearning.com and the two reviews: Angemueller2016, Jones2017).
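For the HMM decoding problem listed in this section, here is a minimal sketch of the Viterbi algorithm in log space. The two-state model (AT-rich vs GC-rich regions) and all its probabilities are invented for illustration and are not taken from the course materials.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path; assumes all probabilities are strictly positive."""
    if not observations:
        return []
    log = math.log
    # best[t][s] = log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for symbol in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][symbol])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical model: sticky states with different base compositions.
states = ["AT", "GC"]
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {"AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
          "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}

segmented = viterbi("AAAAGGGG", states, start_p, trans_p, emit_p)
smoothed = viterbi("AAGAA", states, start_p, trans_p, emit_p)
```

In the first call the path switches once from the AT-rich to the GC-rich state. In the second, the single G is not enough to justify two costly transitions, so the whole sequence is assigned to the AT-rich state. Working with logarithms avoids numerical underflow on long sequences.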