Index of topics

  • DATA METADATA ONTOLOGIES

    LECTURE N.1 MON MARCH 2   2022 Auletta 2 CU026 11am-1pm NO_RECORD

    to set the stage: Inaugural lecture (DA_2022_inaugural_lecture.pdf)

    data, metadata, ontologies

    facts/things (Wittgenstein redux: 1.1 the world is the totality of facts not of things)

    data tables (objects/descriptors)

    Galilei's removal of the animal at the origin of modern science, based on physics

    styles of physics and styles of biology: "geometry vs. stamp collection" (?)

    randomness, noise

    evolution and randomness: the case of surviving war planes

    elements of the scientific method

    the universal structure of a scientific paper





  • RANDOMNESS COMPLEXITY CORRELATIONS

    LECTURE N.2 MON MARCH 7  2022 Auletta 2 CU026 11am-1pm

    structure of a scientific paper (2), peer review, open access

    repositories: bioRxiv, PubMed

    complexity/simplicity: from Galilei’s “simplicity” to Parisi’s “complexity”

    evolutionary time, randomness (again)

    the computational dimension (advent of computers. Moore’s law)

    computational medicine/computational psychiatry

    cell biology by the numbers (Milo & Phillips)

    Points of significance (Naomi Altman): Association, correlation and causation

  • POPULATIONS AND SAMPLES

    LECTURE N.3  WED MARCH 9  2022 Auletta 2 CU026 11am-1pm REC

    observations/ models / facts

    stochastic/deterministic models

    parameters

    correlation is not causation

    population/sample

    random samples

    Histograms, probability distributions


  • DESCRIPTIVE STATISTICS I

    LECTURE N.4 MON MARCH 14 2022 Auletta 2 CU026 11am-1pm REC

    samples, statistics, estimation (biased/unbiased)

    indexes of localisation/dispersion

    random samples

    pseudoreplication (Lazic2010)

    means: Chisini’s criterion (Graziani2009)

    median, mode

    symmetric, asymmetric distributions

  • DESCRIPTIVE STATISTICS II

    LECTURE N.5 WED MARCH 16 Auletta 2 CU026 11am-1pm REC

    modes of science: deduction, induction, abduction

    measures of spread of sampled variables

    range

    quantiles

    interquartile range

    sample variance and standard deviation

    coefficient of variation

    variance and standard deviation

    modes of organising data: scatter plots, bar graphs, pie charts, strip charts, box plots

    frequency tables/histograms/binning/resolution

    sampling the distribution of estimates (statistics)

    the mean of means

    self-averaging/non self-averaging quantities


  • FROM HISTOGRAMS TO PROBABILITY

    LECTURE N.6  MON  MARCH 21 Auletta 2 CU026 11am-1pm REC

    anatomy of a box-plot

    strip charts

    shapes of histograms (optimize: binning/resolution)

    the sampling distribution of a mean (mean of means)

    the distribution of human genes (see W&S chap4)

    events, experiment, probability

    definitions of probabilities: classic, frequentist, subjective

    axioms of probabilities

    events, sets, propositions (logic)

    conditional probabilities

  • PROBABILITIES

    LECTURE N.7 Wed MARCH 23 Auletta 2 CU026 11am-1pm REC

    conditional probabilities, correlated (overlapping, interfering) events

    algebra of events (dependent/independent)

    total probability rule

    Bayes’ formula

  • BAYES THEOREM I

    LECTURE N.8 MON  MARCH 28 11am-1pm ON LINE REC

    Bayes’ formula, partitions of events as cognitive bases

    Bayes’ Theorem

    example: application to a classification problem on protein sequences

    sensitivity and selectivity of a classification
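
    As a minimal sketch of how Bayes' theorem and the total probability rule combine in a two-class classification problem, the snippet below computes the probability that a sequence really belongs to a class given a positive call. The function name, the use of "specificity" for the true-negative rate, and all numbers are assumptions made for illustration; they are not taken from the lecture.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(class | positive call) from Bayes' theorem.

    prior       : P(class), the fraction of sequences that belong to the class
    sensitivity : P(positive call | class)
    specificity : P(negative call | not class)
    """
    # Total probability rule over the partition {class, not class}
    p_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    # Bayes' formula
    return sensitivity * prior / p_positive


# Hypothetical numbers, chosen only to make the arithmetic easy to follow
p = posterior_given_positive(prior=0.1, sensitivity=0.9, specificity=0.8)
print(f"P(class | positive call) = {p:.3f}")
```

    With a rare class, even a fairly sensitive classifier yields a posterior well below its sensitivity, because false positives from the large "not class" group dominate the denominator.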

  • BAYES THEOREM II

    LECTURE N.9 WED MARCH 30 11am-1pm ON LINE REC

    discussion of Bulashevska2008

    intrinsically disordered proteins

    Discussion of Puga2015a (Altman series) presented by Martina Roiati

    Confusion Matrices

    ROC curves (Fawcett2006)

    Jackknife

  • INTRODUCTION TO THE TEST OF HYPOTHESES

    LECTURE N.10  MON APRIL 4 ONLINE 11am-1pm REC

    Null and alternative hypotheses

    Test statistic and the null distribution

    p-value

    level of statistical significance: errors of type I and II

    Bilateral/unilateral tests

    (see Whitlock&Schluter chap. 6, see also Rosner 7.1,7.2)

  • ANALYSIS OF PROPORTIONS

    LECTURE N.11 WED APRIL 6 11am-1pm ONLINE REC
    Proportions (and the binomial distribution)
    The Binomial distribution
    Sampling the proportions: parameter estimates, uncertainty
    Testing proportions: the binomial test

    Study materials: Whitlock and Schluter chap.7

    Further reading: W&S Interleaf 3: Why statistical significance is not the same as biological importance

    W&S Interleaf 4: Correlation does not require causation (see also the website “Spurious Correlations”):

    http://www.tylervigen.com/spurious-correlations
    Problem 13 from W&S
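
    A minimal sketch of an exact two-sided binomial test, using only the Python standard library. The two-sided P-value here sums the probabilities of all outcomes that are no more probable than the observed one under the null hypothesis; this is one common convention, and other textbooks double the one-tailed probability instead. The function names and the counts are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_test(k, n, p0=0.5):
    """Exact two-sided P-value for H0: proportion == p0."""
    observed = binomial_pmf(k, n, p0)
    # Sum over outcomes at least as extreme (no more probable) than the observed one;
    # the small tolerance guards against floating-point ties
    total = sum(
        pr
        for i in range(n + 1)
        if (pr := binomial_pmf(i, n, p0)) <= observed * (1.0 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts: 14 successes in 18 trials, tested against p0 = 0.5
p_value = binomial_test(14, 18, p0=0.5)
print(f"estimated proportion = {14 / 18:.3f}, two-sided P = {p_value:.4f}")
```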

  • ESTIMATION

    LECTURE N.12 MON APRIL 11 Auletta 2 CU026 11am-1pm REC

    Estimators vs parameters

    Point estimates vs interval estimation

    Random numbers and random samples

    Standard error of the mean

    Central limit theorem

    Interval estimation

    t-distribution (Student)
    Chi-square distribution
    Percentiles and confidence intervals

    The study material for this lecture can be found in chap. 6 of Rosner’s textbook
    and chap.10 and  11 of W&S

  • INTERVAL ESTIMATION, THE BOOTSTRAP AND ERROR BARS

    DA_2022 LECTURE 13 WED APR 13 11-13 auletta 2 CU026

    -The bootstrap: a Monte Carlo method to evaluate estimates and confidence intervals

    from just one sample (see R chap 6.7, and W&S chap 19)

    -Confidence intervals: definition and evaluation using the t-distribution and the chi-square distribution; formulas and examples.

    HOMEWORK to be done on the LOG-BOOK

    REVIEW QUESTIONS 6B AND 6C in Rosner’s textbook

    The study material for this lecture can be found in chap. 6 of Rosner’s textbook
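
    The bootstrap idea listed above can be sketched in a few lines: resample the one available sample with replacement many times, recompute the estimate on each resample, and read a confidence interval off the percentiles of the resulting estimates. This is a percentile-bootstrap sketch under stated assumptions; the function name, the number of resamples, the seed, and the data are all hypothetical and not taken from the lecture.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`, from a single sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Each resample has the same size as the original and is drawn with replacement
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical measurements, invented for illustration only
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
low, high = bootstrap_ci(sample)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

    Passing `stat=statistics.median` gives an interval for the median with no further changes, which is the main practical appeal of the method.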


  • TESTING HYPOTHESES WITH ONE SAMPLE

    LECTURE N.14 WED APRIL 20 On line REC

    One sample/two sample tests
    One sided test/two sided test
    Parametric/non parametric tests
    Type I and Type II errors, Power of a test
    Test flowchart (R p.268)
    One sample test for the mean of a normal variable (one sided/two sided test, R 7.3/R 7.4)
    Acceptance/rejection regions
    P-values
    Rosner’s chapter 7 and Whitlock’s chap. 6
    As homework for next Monday, please complete on your own the study of the first sections of Rosner (7.1, 7.2, 7.3 and 7.4)
    In particular, consider the review questions 7A on p.222 of R, to be recorded IN THE LOGBOOK

  • FORMAL REMARKS ON THE TEST OF HYPOTHESES

    LECTURE N.15 WED APRIL 27 11am-1pm auletta 2 CU026 REC

    Parametric vs. non parametric  tests
    Formal structure of a test: parameter space and the space of samples
    Role of H0
    Critical regions
    Mapping between parameter and sample space: type I and Type II errors
    Size (significance level) and power of a test
    Optimal critical region

    Study materials:

    Rosner’s chap. 7 par. 7.1- and lecture notes DA_2020_L18_notes


  • TESTING HYPOTHESES WITH TWO SAMPLES

    LECTURES N 16 (02 May2022) and N17 (04 May 2022) AULETTA 2 11am-1pm

    One sample vs two sample tests MOTIVATION

    General reference for this topic: Rosner’s chapter 8 and Whitlock’s chap. 12

    Assumption of normality (see Whitlock chap.13)

    Two independent samples (cross-sectional, synchronic studies) vs paired samples (longitudinal, diachronic studies)

    The paired t-test (R 8.2) [paired t-test statistic, acceptance region, p-value, interval estimation (R.8.3)]

    Two sample test for independent samples with equal variances: acceptance region, p-value (R. 8.4), interval estimation (R.8.5)

    Testing for the equality of variances (R section 8.6): the F distribution, The F-test

    Two sample t test for independent samples with different variances (R. 8.7) (self study)



  • NON PARAMETRIC TESTS I

    DA_2022 LECTURE N.18 Monday may 9 auletta 2 11am-1pm

    •The problem of non-normal distributed data
    •Transformations: lognormal distributions
    •Tests of normality
    •Non parametric tests
    •Sign test
    •Binomial distribution

    Study materials:

    Rosner’s chapter 9 and Whitlock’s chap. 13 (very good)

    MOREOVER: As an exercise (to be recorded in the logbook), I suggest that you watch the very good scholarly lecture by Professor Francesco Pauli of Trieste (in Italian) on the Neyman-Pearson paradigm of hypothesis testing, published on YouTube


  • NON PARAMETRIC TESTS II

    DA_2022 L 19 11 may 2022 auletta 2 11am 1pm

    •Basic distinctions and concepts in inference: one sample/two samples/Multiple samples; power/significance (type I/type II errors) sample size/ experimental design

    •Overview of statistical tests (see Rosner’s general flow-chart, p.895)

    See also the W&S interleaf on p. 465: Which test should I use?

    •The Wilcoxon rank sum test (Mann-Whitney U test)


  • MULTIPLE PARAMETRIC NON PARAMETRIC TESTS I

    DA_2022 L 20 16 may 2022 auletta 2 CU026 11am 1pm

    MULTIPLE PARAMETRIC/NON PARAMETRIC TESTS (R 12.1-12.4; 12.7)

    Within-group/between-group variability

    One-way ANOVA

    Bonferroni correction

    Kruskal-Wallis test


  • MULTIPLE PARAMETRIC NON PARAMETRIC TESTS II

    DA 2022 L21 18 may 2022 auletta 2 Cu026 11am 1pm

    •F test statistics (see Rosner’s chapter 12; see also R chap. 8.6)
    •Graded homework: formally derive equation 12.4 and equation 12.5 in Rosner’s textbook
    •A glance at the jungle… (Rosner’s road map)
    •Further discussion of the ANOVA one way test
    •Kruskal-Wallis test

  • CORRELATION/REGRESSION I

    DA_2022_L22 Monday May 23 auletta 2 11am-1pm REC

    •General notion of correlated events
    •General notion of correlation
    •Pearson’s Correlation Coefficient (sample)
    •Population Correlation coefficient

    •The mathematics of linearity: linear spaces of finite dimension, vectors
    •Linear transformations, matrices
    •Discussion of geometric data analysis (dimensional reduction/classification/clustering)
    •Intro to Principal Component analysis (PCA)
    See Rosner 11.7, 11.8,11.14 W&S 16.1,16.2,16.3,16.4

  • INTRODUCTION TO PRINCIPAL COMPONENT ANALYSIS

    DA_2022_L23 wed may 25 auletta 2 11 am-1 pm

    Search for multivariate correlations in an object-descriptor data table

    The language of linear mathematics: symbolic operators and numerical representatives

    What is a vector? What is a linear transformation of a vector?

    What is a change of reference frame (basis)?

    The eigenvalue problem

    Z-transform of an object-descriptor table

    Find eigenvalues and eigenvectors of the covariance matrix so as to maximize variance

    The (ordered) eigenvalues of the covariance matrix encode the variance contained in the original data.
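
    The steps above (Z-transform, covariance matrix, eigenvalue problem, ordered eigenvalues as variances) can be sketched for the smallest possible case: a table with only two descriptors, where the eigenvalue problem has a closed-form solution and no linear-algebra library is needed. This restriction, the function names, and the data are assumptions made for illustration; a real analysis would use a general eigensolver.

```python
import math
import statistics


def z_transform(column):
    """Center a descriptor on its mean and scale it by its sample standard deviation."""
    m = statistics.mean(column)
    s = statistics.stdev(column)
    return [(x - m) / s for x in column]


def covariance(u, v):
    """Sample covariance (n - 1 denominator) of two equally long columns."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def pca_2d(x, y):
    """Eigenvalues (descending) and unit eigenvectors of the 2x2 covariance matrix."""
    zx, zy = z_transform(x), z_transform(y)
    a, b, d = covariance(zx, zx), covariance(zx, zy), covariance(zy, zy)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]]
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    eigenvalues = [half_trace + radius, half_trace - radius]
    if b == 0:
        # Uncorrelated descriptors: the original axes are already principal axes
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        eigenvectors = []
        for lam in eigenvalues:
            # (b, lam - a) solves (C - lam * I) v = 0; normalize it to unit length
            norm = math.hypot(b, lam - a)
            eigenvectors.append((b / norm, (lam - a) / norm))
    return eigenvalues, eigenvectors


# Hypothetical object-descriptor table: five objects, two descriptors
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
eigenvalues, eigenvectors = pca_2d(x, y)
total = sum(eigenvalues)
for lam, vec in zip(eigenvalues, eigenvectors):
    print(f"eigenvalue {lam:.3f} ({lam / total:.0%} of variance), direction {vec}")
```

    After the Z-transform each descriptor has unit variance, so the eigenvalues sum to the number of descriptors, and each eigenvalue divided by that sum is the fraction of total variance carried by its component.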

  • CORRELATION/REGRESSION II

    DA_2022_L24 monday may 30th auletta 2 11am-1pm


    Linear regression analysis (independent/dependent variables)
    Least squares method
    W&S 17.1,17.2
    Rosner 11.1, 11.3, 11,11.4
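
    A minimal sketch of the least squares method for a straight line: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of the independent variable, and the intercept follows from requiring the line to pass through the point of means. The function names and the data are hypothetical, chosen only for illustration.

```python
import statistics


def least_squares_line(x, y):
    """Slope and intercept of the line minimizing the sum of squared residuals in y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)                   # squared deviations of x
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean x, mean y)
    return slope, intercept


def residual_sum_of_squares(x, y, slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


# Hypothetical data: x is the independent variable, y the dependent one
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(x, y)
rss = residual_sum_of_squares(x, y, slope, intercept)
print(f"y = {intercept:.2f} + {slope:.2f} x, residual sum of squares = {rss:.3f}")
```

    Perturbing the fitted slope or intercept by any small amount and recomputing `residual_sum_of_squares` gives a larger value, which is a quick numerical check of the least squares property.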

  • Topic 24