Course: Big Data Computing 2021-22

Welcome to the Moodle page for Big Data Computing 2021-22!

Class Schedule

Tuesday from 5:00 p.m. to 7:00 p.m.
Wednesday from 8:00 a.m. to 11:00 a.m.

How to Attend Classes

In accordance with the guidelines issued by Sapienza University to counter the COVID-19 pandemic, the course will be held both in person and remotely. For further information, students should refer to the official documentation available on the Sapienza website.

Attending Classes in Person: Room 1L - Via del Castro Laurenziano 7a

Students who wish to attend classes in person must submit a booking request through the Infostud Lab App or the Prodigit Sapienza online booking system, according to the established rules (please see here). Once the booking is confirmed, students must go to Room 1L, located at Via del Castro Laurenziano 7a, at the times given in the class schedule above.

Attending Classes Remotely: Zoom

Students who wish to attend classes remotely must register for the dedicated Zoom conference using the following link: https://uniroma1.zoom.us/meeting/register/tZAkdOysqjkiG9SU5I1rG-oENGV-RIfCxLwv

Course website: Click here

Description and Goals

The amount, variety, and rate at which data is being generated nowadays, by both humans and machines, are unprecedented. This raises a number of challenges in dealing with such data, as traditional computing paradigms were not designed to operate at this scale.

"Big Data" is the umbrella term that has rapidly become popular to describe methodologies and tools specifically designed for collecting, storing, and processing very large or complex data sets. In addition to addressing foundational computer science problems, such as searching and sorting, big data computing mainly focuses on extracting knowledge, and thereby value, from large-scale data sets using advanced data analysis techniques, such as machine learning.

This course is intended to provide graduate-level students with a deep understanding of programming models and tools suited to the large-scale analysis of data distributed across clusters of computers. More specifically, the course will enable students to develop big data/machine learning solutions on top of industry-standard frameworks, such as Hadoop and Spark, to tackle real-world problems faced by the so-called "Big Five" tech companies (i.e., Apple, Amazon, Google, Microsoft, and Facebook): text/graph analysis, classification/regression, and recommendation, to name just a few.

Prerequisites

The course assumes that students are familiar with the basics of data analysis and machine learning, supported by a strong knowledge of the foundational concepts of calculus, linear algebra, probability, and statistics. In addition, students must have non-trivial computer programming skills (preferably in the Python programming language). Previous experience with Hadoop, Spark, or distributed computing is not required.

Instructor: GABRIELE TOLOMEI