Course information

  • Title: Data Science Computer Lab
  • Neptun code:
  • Instructor: László Oroszlány
  • Semester: 3
  • Type: Computer Lab
  • Credit points: 5
  • Prerequisites: -

Course description

The goal of the course is to instil the practical skills needed for exploratory data analysis. With the acquired knowledge, students should be able to perform independent research that requires handling big data. To this end, students will work on a couple of longer-running projects inspired by data-intensive problems drawn from multiple fields, such as astronomy, genomics and social networks. Students will familiarize themselves with a wide skill set, ranging from software engineering techniques to presenting their well-distilled research in a manner accessible to the general public.

Topics

  • Designing data models, databases and file standards
  • Implementing a data extraction, transformation and loading (ETL) framework
  • Data acquisition and processing techniques and technologies
  • Data processing pipelines
  • Parallel data processing
  • Web-based data presentation technologies
  • Presenting distilled reports in a variety of formats, from text-based documents to interactive infographics
  • Practical issues of data science

Recommended readings

  • Joel Grus: Data Science from Scratch (O’Reilly, 2015)
  • Željko Ivezić, Andrew J. Connolly, Jacob T. VanderPlas & Alexander Gray: Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data (Princeton University Press, 2014; ISBN: 978-0691151687)