Course: Probability for Data Science

Brief course description: The course covers the main topics in high-dimensional probability
theory, the area of probability theory that studies random objects in R^n where the dimension n
can be very large. Applications of these theoretical tools in data science are also discussed.

Course topics:

Basic tail and concentration bounds: from Markov to Chernoff, sub-Gaussian variables
and Hoeffding bounds, sub-exponential variables and Bernstein bounds, some one-sided
results, uniform laws of large numbers.
Sparse linear models in high dimensions: problem formulation and applications,
penalized estimators, recovery in the noiseless setting, estimation in noisy settings
and the LASSO estimator, bounds on prediction error, variable or subset selection.
Nonparametric least squares: problem set-up, oracle inequalities, regularized estimators.

Textbooks:

R. Vershynin (2018) High-dimensional probability: An introduction with applications in
Data Science. Cambridge University Press.
M. J. Wainwright (2019) High-dimensional statistics: A non-asymptotic viewpoint.
Cambridge University Press.