DATA ANALYSIS (DA_2024) SYLLABUS
Academic Year 2023-2024, 6 ECTS.
A course for the Master's programs in Genetics and Molecular Biology and in Neurobiology, taught in the 2nd semester.
Instructor: Prof. Andrea Giansanti, Department of Physics, Sapienza University of Rome (Marconi Building (CU013), 2nd floor, room 211), tel. 0649914367 (3385075611)
andrea.giansanti@roma1.infn.it
Evaluation: discussion of a research/methodological paper, written essays, written tests, homework and participation in discussions: 40%. Final oral exam: 60%.
Recommended textbooks
[R] Bernard Rosner, Fundamentals of Biostatistics, Brooks/Cole (2015).
[WS] Michael C. Whitlock and Dolph Schluter, The Analysis of Biological Data, W. H. Freeman and Company (2015).
Note: study materials, lecture outlines and notes, and slides can be found on the Sapienza e-learning platform (course DA_2024).
Access to the lecture recordings, strictly for personal use, can be requested from the instructor.
0. INTRODUCTION (see also the inaugural lecture DA_2024_inaugural_lecture.pdf)
Data, Metadata, Ontologies.
Data/Models/Computation/Simulation
Facts/things (Wittgenstein redux, Tractatus 1.1: "the world is the totality of facts, not of things")
Data tables (objects/descriptors)
Galilei's removal of the animal at the origin of modern science, based on physics
Styles of physics and styles of biology: "geometry vs. stamp collection"
Randomness, noise
Evolution and randomness: the case of surviving war planes
Elements of the scientific method: observations/models
The universal structure of a scientific paper
Forms of reasoning: deduction, induction, abduction.
The issue of reproducibility
Science before and after the computer era
The birth of computational (in silico) biology
Computational systems biology/medicine
Numeracy (Cell Biology by the Numbers)
The Biological Bayesian revolution.
2. DATA AND THEIR REPRESENTATION (WS chap. 2, R 2.8)
types of data and variables (categorical, numerical)
displaying the data (scatter plots, bar graphs, pie charts, strip charts, box plots, frequency tables, histograms: binning and resolution)
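As a minimal Python sketch of how the bin width sets the resolution of a histogram, the same data can be tabulated with two bin widths. The helper `frequency_table` and the height values are hypothetical, for illustration only.

```python
import math
from collections import Counter


def frequency_table(values, bin_width, start):
    """Count values in half-open bins [start + k*w, start + (k+1)*w)."""
    counts = Counter(math.floor((v - start) / bin_width) for v in values)
    first, last = min(counts), max(counts)
    # One row (lower edge, upper edge, count) per bin, empty bins included
    return [(start + k * bin_width, start + (k + 1) * bin_width, counts[k])
            for k in range(first, last + 1)]


# Hypothetical heights in cm (illustrative values only)
heights = [151, 154, 158, 160, 161, 163, 165, 166, 168, 172, 175, 181]

for width in (5, 10):
    print(f"bin width {width} cm")
    for lo, hi, n in frequency_table(heights, width, start=150):
        print(f"  [{lo}, {hi}): {'#' * n}")
```

Wider bins give smoother but coarser tables; narrower bins show more detail but fewer counts per bin, so the choice of width is a trade-off rather than a single correct value.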
3. DESCRIPTIVE STATISTICS (R 2.1-2.6; WS chap. 3 as review reading)
measures of location (arithmetic mean, median, mode; mean vs. median)
measures of spread (range, quantiles, variance and standard deviation, coefficient of variation)
Chisini's principles for the means [Graziani2009]
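A minimal Python sketch using the standard-library `statistics` module computes the measures of location and spread listed above on a hypothetical sample, then illustrates Chisini's idea on a textbook case: for equal distances covered at different speeds, the mean that preserves the total travel time is the harmonic mean. All values are illustrative assumptions.

```python
import math
import statistics

# Hypothetical sample with one large value (illustrative only)
x = [2, 3, 3, 4, 5, 6, 9, 24]

# Measures of location
mean = statistics.mean(x)        # arithmetic mean, pulled up by the value 24
median = statistics.median(x)    # average of the two middle values here
mode = statistics.mode(x)        # most frequent value

# Measures of spread
value_range = max(x) - min(x)
q1, q2, q3 = statistics.quantiles(x, n=4)   # quartiles (default 'exclusive' method)
variance = statistics.variance(x)           # sample variance, n - 1 denominator
sd = statistics.stdev(x)
cv = sd / mean                              # coefficient of variation, dimensionless

print(mean, median, mode, value_range, (q1, q2, q3), round(sd, 3), round(cv, 3))

# Chisini's idea: a mean M of x_1..x_n with respect to f satisfies
# f(M, ..., M) = f(x_1, ..., x_n). For two legs of equal length d driven at
# different speeds, f is the total time sum(d / v_i), which forces the harmonic mean.
speeds = [30, 60]   # km/h (illustrative)
d = 60              # km per leg (illustrative)
total_time = sum(d / v for v in speeds)        # 2 h + 1 h = 3 h
M = statistics.harmonic_mean(speeds)           # 40 km/h, not the arithmetic 45
print(M, math.isclose(len(speeds) * d / M, total_time))
```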
4. PROBABILISTIC REASONING (R 3.1-3.8, WS chap. 5)
uncertainty and decisions
events, trials, uncertainty, probability
definitions of probabilities: classic, frequentist, subjective
graphical representations through Venn diagrams
dependent/independent events
addition and multiplication formulas
conditional probabilities
Bayes’ theorem and its relevance (subjective/objective, a fair representation of knowledge accumulation) [Puga2015]
Eikosograms (R.W. Oldford) [https://cran.r-project.org/web/packages/eikosograms/vignettes/Introduction.html]
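To make these rules concrete, here is a minimal Python sketch working on a hypothetical 2x2 table of counts (labels and numbers are invented for illustration). `Fraction` keeps the probabilities exact, so the identities can be checked with `==`.

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts (illustrative, not real data):
#                 B: exposed   not B
#   A: affected        30         20
#   not A              70        380
n_AB, n_A_notB, n_notA_B, n_notA_notB = 30, 20, 70, 380
N = n_AB + n_A_notB + n_notA_B + n_notA_notB            # 500 trials

P_A = Fraction(n_AB + n_A_notB, N)        # marginal probability of A
P_B = Fraction(n_AB + n_notA_B, N)        # marginal probability of B
P_AB = Fraction(n_AB, N)                  # joint probability P(A and B)

P_A_given_B = P_AB / P_B                  # conditional probability
P_A_or_B = P_A + P_B - P_AB               # addition rule
independent = P_AB == P_A * P_B           # multiplication rule holds only if independent
P_B_given_A = P_A_given_B * P_B / P_A     # Bayes' theorem

print(P_A_given_B, P_A_or_B, independent, P_B_given_A)
```

Here P(A | B) = 3/10 is three times P(A) = 1/10, so A and B are dependent, and Bayes' theorem recovers P(B | A) = 3/5, the same value read directly off the table as 30/50.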
5. BAYES’ THEOREM AND TESTS (see lecture no. 10)
Clinical tests as binary classifications (R 3.7)
Clinical tests and conditional probabilities (R 3.8)
A Bayesian classifier of protein sequences [Bulashevska2008]
ROC curves (R 3.9, see also [Fawcett2006])
Confusion matrices
Relative risk
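The link between a confusion matrix and Bayes' theorem can be sketched in a few lines of Python. The function names and all counts below are hypothetical, chosen to show how the positive predictive value depends on prevalence even when sensitivity and specificity are fixed.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity P(T+ | D+) and specificity P(T- | D-) from a confusion matrix."""
    return tp / (tp + fn), tn / (tn + fp)


def predictive_values(sens, spec, prevalence):
    """PPV = P(D+ | T+) and NPV = P(D- | T-) via Bayes' theorem."""
    p_pos = sens * prevalence + (1 - spec) * (1 - prevalence)   # P(T+)
    ppv = sens * prevalence / p_pos
    npv = spec * (1 - prevalence) / (1 - p_pos)
    return ppv, npv


def relative_risk(a, b, c, d):
    """RR = P(disease | exposed) / P(disease | unexposed).
    a, b: exposed with/without disease; c, d: unexposed with/without disease."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical validation study: 100 diseased and 1000 healthy subjects
sens, spec = sensitivity_specificity(tp=90, fp=50, fn=10, tn=950)   # 0.90, 0.95
for prev in (0.01, 0.20):
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"prevalence {prev:.2f}: PPV = {ppv:.3f}, NPV = {npv:.4f}")

print(relative_risk(20, 80, 10, 190))   # hypothetical cohort counts
```

With these assumed numbers the PPV is about 0.15 at 1% prevalence and about 0.82 at 20% prevalence: the same test reads very differently in screening and in clinical settings.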
6. PROBABILITY DISTRIBUTIONS
random variables (R 4.1-4.6)
Probability distributions: discrete/continuous (R 5.1-5.4)
Moments of a probability distribution
The normal distribution (R 5.3-5.4; WS chap. 10 as suggested review reading)
Linear combinations of random variables (R 5.6)
Z-transform and percentiles of N(0,1) (R 5.5)
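A minimal sketch with `statistics.NormalDist` from the Python standard library shows the Z-transform, a percentile of N(0,1), and a linear combination of two independent normal variables. The blood-pressure parameters are assumed, illustrative values, not course data.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(0, 1)

# Hypothetical example: systolic blood pressure assumed ~ N(120, 15^2) mmHg
mu, sigma = 120.0, 15.0
x = 150.0
z = (x - mu) / sigma                   # Z-transform: z = (x - mu) / sigma = 2.0
p_below = std_normal.cdf(z)            # P(X <= 150) = Phi(2), about 0.977
z_975 = std_normal.inv_cdf(0.975)      # 97.5th percentile of N(0, 1), about 1.96

# Linear combination of independent normals, L = X1 - X2:
# E[L] = mu1 - mu2 and Var[L] = sigma1^2 + sigma2^2 (variances add even for a difference)
mu1, sigma1 = 120.0, 15.0              # hypothetical group 1
mu2, sigma2 = 110.0, 20.0              # hypothetical group 2
L = NormalDist(mu1 - mu2, math.hypot(sigma1, sigma2))   # N(10, 25^2)
p_L_positive = 1 - L.cdf(0)            # P(X1 > X2), about 0.655

print(z, round(p_below, 4), round(z_975, 3), L.mean, L.stdev, round(p_L_positive, 3))
```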
7. INFERENTIAL STATISTICS AND ESTIMATION (R 6.1-6.7)
Population and sample moments; estimators
The arithmetic mean as maximum likelihood estimator of the first moment of a Gaussian distribution [see notes DA_2020_L12_notes]
Random samples (tables of random numbers)
Biased/unbiased estimators
Point estimation of the mean
Standard error of the mean
Error bars [https://www.nature.com/articles/nmeth.2659]
Central limit theorem
Interval estimation
t-distribution, degrees of freedom
Confidence intervals for the mean of a normal distribution
Point estimator of the variance
Chi-square distribution, degrees of freedom
Interval estimation for the variance of a normal distribution
The bootstrap (R 6.11, [https://www.nature.com/articles/nmeth.3414])
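A minimal Python sketch of the interval estimates listed above, on a hypothetical sample of n = 10. The standard library has no t or chi-square quantile functions, so the critical values t(9; 0.975) = 2.262, chi2(9; 0.025) = 2.700 and chi2(9; 0.975) = 19.023 are copied from standard tables. The bootstrap part uses the simple percentile method, one of several bootstrap interval constructions.

```python
import math
import random
import statistics

# Hypothetical sample of n = 10 measurements (illustrative values)
sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.7, 5.2, 4.6, 5.4]
n = len(sample)

xbar = statistics.mean(sample)          # point estimate of the mean
s2 = statistics.variance(sample)        # unbiased estimator of the variance (n - 1)
sem = math.sqrt(s2 / n)                 # standard error of the mean

# 95% CI for the mean of a normal population: xbar +/- t(n-1; 0.975) * SEM
t_crit = 2.262                          # t(9; 0.975), from a t table
ci_mean = (xbar - t_crit * sem, xbar + t_crit * sem)

# 95% CI for the variance: ((n-1) s^2 / chi2(9; 0.975), (n-1) s^2 / chi2(9; 0.025))
chi2_lo, chi2_hi = 2.700, 19.023        # chi-square quantiles with 9 df, from a table
ci_var = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)

# Percentile bootstrap for the mean: resample with replacement, recompute the mean,
# and take the 2.5th and 97.5th percentiles of the bootstrap distribution
rng = random.Random(2024)               # fixed seed so the result is reproducible
boot_means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(5000)]
cuts = statistics.quantiles(boot_means, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
ci_boot = (cuts[0], cuts[-1])

print(round(xbar, 3), [round(v, 3) for v in ci_mean],
      [round(v, 4) for v in ci_var], [round(v, 3) for v in ci_boot])
```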
8. HYPOTHESIS TESTING (one/two samples) (R 7.1-7.6, 8.1-8.4, 8.6; WS chap. 6)
Formal aspects (see the lecture notes included in the L14 slides)
Null hypothesis and alternative hypothesis.
Errors of Type I and Type II.
Confidence levels
Acceptance-rejection regions
p-value.
Power
One sample/two sample tests
One sided test/two sided test
Longitudinal/cross-sectional studies
Testing for the equality of two variances
F distribution
F test
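A minimal sketch of the formal machinery (test statistic, one- and two-sided p-values, rejection region, power), written in Python for a one-sample z-test. The known-sigma assumption and all numbers are hypothetical simplifications; in practice sigma is usually estimated from the data and the t-test is used instead.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf              # standard normal CDF
Phi_inv = NormalDist().inv_cdf      # standard normal quantile function

# Hypothetical setting: H0: mu = 100 vs H1: mu > 100, sigma = 15 assumed known, n = 25
mu0, sigma, n = 100.0, 15.0, 25
xbar = 106.0                        # observed sample mean (illustrative)

z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic, N(0, 1) under H0
p_one_sided = 1 - Phi(z)                    # P(Z >= z | H0)
p_two_sided = 2 * (1 - Phi(abs(z)))         # for H1: mu != 100

alpha = 0.05                                # type I error rate (significance level)
z_crit = Phi_inv(1 - alpha)                 # one-sided rejection region: z > 1.645
reject_H0 = z > z_crit

# Power against a specific alternative mu1: P(reject H0 | mu = mu1) = 1 - beta
mu1 = 105.0
power = 1 - Phi(z_crit - (mu1 - mu0) * math.sqrt(n) / sigma)

print(z, round(p_one_sided, 4), round(p_two_sided, 4), reject_H0, round(power, 3))
```

The power line follows from rejecting H0 when xbar > mu0 + z_crit * sigma / sqrt(n) and evaluating that probability under mu = mu1; with these assumed values the power is only about 0.51, so a type II error is almost as likely as not.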
9. NONPARAMETRIC TESTS (R 9.1-9.2, 9.4; WS chap. 13)
The problem of non-normally distributed data
Lognormal distribution
Tests of normality
Ranks
Sign test
Wilcoxon rank-sum test (Mann-Whitney U test)
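A minimal Python sketch of ranks (tied values share the average rank), the exact sign test, and the Wilcoxon rank-sum statistic with its Mann-Whitney U counterpart. The function names and data are hypothetical, and no normal approximation or tie correction is attempted.

```python
from math import comb


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def sign_test_p(differences):
    """Exact two-sided sign test: under H0, #positive ~ Binomial(n, 1/2)."""
    nonzero = [d for d in differences if d != 0]    # zero differences are dropped
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def rank_sum(x, y):
    """Wilcoxon rank-sum W (sum of x's ranks in the pooled sample) and Mann-Whitney U."""
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:len(x)])
    u = w - len(x) * (len(x) + 1) / 2
    return w, u


# Hypothetical paired differences (after - before) and two small independent samples
print(average_ranks([10, 20, 20, 30]))                                    # [1.0, 2.5, 2.5, 4.0]
print(sign_test_p([1.2, 0.8, -0.3, 2.1, 0.5, 1.0, 0.0, 0.7, 1.5, 0.9]))  # 0.0390625
print(rank_sum([1.1, 2.3, 3.0], [3.0, 4.2, 5.5, 6.1]))                   # (6.5, 0.5)
```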
10. MULTIPLE PARAMETRIC/NONPARAMETRIC TESTS (R 12.1-12.4, 12.7)
Within-group/between-group variability
One-way ANOVA
Bonferroni correction
Kruskal-Wallis test
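A minimal Python sketch of the one-way ANOVA decomposition into between-group and within-group sums of squares, plus the Bonferroni-adjusted per-comparison level. The groups are hypothetical, and the resulting F would still have to be compared with a critical value of F(k - 1, N - k) from a table.

```python
from math import comb
import statistics


def one_way_anova_F(groups):
    """F = MS_between / MS_within, with (k - 1, N - k) degrees of freedom."""
    values = [v for g in groups for v in g]
    k, N = len(groups), len(values)
    grand_mean = statistics.mean(values)
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    F = (ss_between / (k - 1)) / (ss_within / (N - k))
    return F, (k - 1, N - k)


# Hypothetical measurements in three groups (illustrative values)
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
F, df = one_way_anova_F(groups)
print(F, df)   # 12.0 (2, 6)

# Bonferroni: with m pairwise comparisons, test each at alpha / m
alpha = 0.05
m = comb(len(groups), 2)          # 3 pairwise comparisons among 3 groups
alpha_per_test = alpha / m
print(m, round(alpha_per_test, 4))
```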
11. CORRELATION AND REGRESSION (R 11.1-11.5, 11.7; WS chaps. 16 and 17)
Association of variables, correlations/dependencies
Correlation is not causation [https://www.nature.com/articles/nmeth.3587]
The mathematics of linearity: vectors, matrices, linear transformations
Linear regression analysis: independent/dependent variables
Least squares method [see the handout by S.J. Miller]
The correlation coefficient
Principal component analysis (PCA) [see Higgs & Attwood, chap. 2]
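A minimal Python sketch of the least squares fit of y = a + b x and of Pearson's correlation coefficient, written out from the sums of squares and cross-products. The function names and the dose/response data are hypothetical.

```python
import math
import statistics


def least_squares(x, y):
    """Intercept a and slope b minimising sum((y_i - a - b * x_i) ** 2)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx                 # slope
    a = ybar - b * xbar           # the fitted line passes through (xbar, ybar)
    return a, b


def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical dose (x) and response (y) values, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(x, y)
r = pearson_r(x, y)
print(f"y = {a:.2f} + {b:.2f} x,  r = {r:.4f},  r^2 = {r * r:.4f}")
```

Note that r is symmetric in x and y while the slope b is not: swapping the roles of dependent and independent variable changes the fitted line but not the correlation. Python 3.10 and later also provide `statistics.linear_regression` and `statistics.correlation`, which can serve as a cross-check.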