Big Data Technologies and Infrastructure
Entry requirements: Basic programming skills, knowledge of web technologies, SQL and DBMS. Experience in working with databases.
Language of the course: Russian
- Identification of the key reasons behind the emergence of Big Data; Big Data definition and identification
- Introduction to Big Data processing technologies, Grid, WMS
- MapReduce basics and Apache Hadoop technology
- HDFS, Apache Hadoop basic infrastructure
- Introduction to Apache Spark and Spark Streaming technology
- Review of tools for extracting information and its properties through data processing
- Review of data gathering techniques based on intelligent data analysis (data mining)
- Ability to find open data for research
- Ability to gather specific information from a variety of sources
- Understanding of how to store and work with very large amounts of data
Big Data technologies are crucial to the software solutions of large companies. Today, efficient data processing and analysis not only form the basis for successful business development but can also provide a decisive competitive edge. For this reason, the course focuses on mastering the skills needed to handle and analyze Big Data. The course covers a brief history of Big Data, as well as its definition and identification. Students will learn the basics of working with the distributed file system HDFS and Apache Hadoop, the fundamentals of MapReduce, and Apache Spark with Spark Streaming. As a result, students will be proficient in the main Big Data technologies, such as Apache Hadoop and Apache Spark.
Lectures and practical sessions
Attendance is mandatory. Students should complete all assignments and course papers. The final grade is based on the student's performance throughout the course: 60% course paper (of which 20% data retrieval, 20% data processing with Big Data technologies, 20% analysis and report); 20% seminar work; 20% test.