Distributed Data Acquisition and Processing Infrastructure

Entry requirements: Basic experience with databases and computer systems; programming and web technology skills

Credits: 5

Course: Core

Language of the course: Russian

Lecturer

Denis Nasonov

Objectives

Learning outcomes:

  • Identification of the key reasons behind the emergence of Big Data; Big Data definition and identification
  • Introduction to Big Data processing technologies: Grid, WMS, and MapReduce
  • MapReduce basics and Apache Hadoop technology
  • HDFS and the basic Apache Hadoop infrastructure
  • Introduction to Apache Spark and Spark Streaming

Contents

Big Data technologies are crucial to the software solutions of large companies. Today, efficient data processing and analysis not only form a basis for successful business development but can also become a decisive competitive edge. For this reason, the course focuses on mastering skills for handling and analyzing Big Data. The course covers a brief history of Big Data, as well as its definitions and identification. Students will learn the basics of working with the distributed file system HDFS and Apache Hadoop, the basic functioning of MapReduce, and Apache Spark together with Spark Streaming. As a result, students will be proficient in the main Big Data technologies, Apache Hadoop and Apache Spark.
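To give a flavour of the MapReduce model that the course builds on, here is a minimal word-count sketch using Spark's RDD API in PySpark; the local master setting and the input file name "input.txt" are illustrative assumptions, not part of the course material.

    # Word count: the canonical MapReduce example, expressed with Spark's RDD API.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "WordCount")  # assumed: Spark running locally

    counts = (
        sc.textFile("input.txt")                # read lines from local disk or HDFS
          .flatMap(lambda line: line.split())   # map: split each line into words
          .map(lambda word: (word, 1))          # map: pair each word with a count of 1
          .reduceByKey(lambda a, b: a + b)      # reduce: sum the counts per word
    )

    for word, count in counts.collect():
        print(word, count)

    sc.stop()

The same word-count computation is what a classic Hadoop MapReduce job expresses with separate mapper and reducer classes; Spark's functional API collapses it into a short pipeline.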

Format

Practical sessions

Assessment

Attendance is mandatory. Students must complete all practical assignments. The final grade is based on the student's performance throughout the course.