SBWL 1: Data Processing 1 (PI2.0)
Winter term 2019
Axel Polleres, Stefan Sobernig
Table of contents
Schedule
Organisational
Unit details
Jupyter Notebook
Supplemental Reading
Syllabus
In this course, students gain fundamental knowledge of different data formats and of the methods and tools used to integrate data from various sources. In particular, they acquire:
- Hands-on experience in processing and preparing data for data science tasks with Python.
- An understanding of how to use Python's standard libraries to write programs and to access various data science tools.
- Working knowledge of how to solve basic data (pre-)processing tasks, including:
- Finding and accessing data (e.g., tabular (CSV), tree-shaped (JSON or XML), and graph-shaped (RDF) data, as well as databases)
- Cleansing and normalizing data
- Sorting, filtering and grouping data
- Tools and algorithms for data transformation
- Connecting to a database system, loading data into it, and using indexing techniques for faster data access
Schedule
Organisational
Instructor(s)
axel.polleres@wu.ac.at
stefan.sobernig@wu.ac.at
Rositsa Ivanova (Tutor)
rositsa.ivanova@wu.ac.at
Grading
See the authoritative details at Learn@WU.
Course Material
Unit details
- Introduction
- Motivation & expected learning outcome
- Course structure
- Grading
- What is Data Science and how does it work? (theory)
- Course tools and materials (practice)
- Python & Jupyter Coding Environment (practice)
Slides: This unit is also available in PDF format and as a single HTML page.
Readings:
Notebook of Unit1
- Data encoding and exchange formats, standards (JSON, CSV, XML, RDF)
- How and where to find data?
- Data access and parsing
- Encoding (conversion of encodings)
- Data format specific parsing in Python
Slides: This unit is also available in PDF format and as a single HTML page.
Readings:
Notebook of Unit2
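As a rough illustration of the encoding conversion and format-specific parsing covered in this unit, the following minimal sketch uses only Python's standard library. The sample strings and the variable names (`csv_text`, `json_text`, `rows`, `record`) are made up for illustration and are not taken from the course notebook.

```python
import csv
import io
import json

# Made-up sample data; in practice this would be read from a file or a URL.
csv_text = "city,population\nVienna,1900000\nGraz,290000\n"
json_text = '{"city": "Vienna", "districts": 23}'

# Encoding conversion: decode Latin-1 bytes to text, then re-encode as UTF-8.
latin1_bytes = "Österreich".encode("latin-1")
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

# CSV: each data row becomes a dict keyed by the header line.
# Note that all values are strings until converted explicitly.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: the document becomes nested Python dicts, lists, and scalars.
record = json.loads(json_text)

print(rows[0]["city"], int(rows[0]["population"]), record["districts"])
```

XML and RDF need their own parsers (for example `xml.etree.ElementTree` from the standard library for XML); they are left out here to keep the sketch short.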
- Data inspection/reshaping
- Data filtering
- Data sorting
- Data aggregation (grouping)
Slides: This unit is also available in PDF format and as a single HTML page.
Notebook of Unit3
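The filtering, sorting, and grouping steps named in this unit can be sketched with plain Python as follows. The records and the threshold of 50 are hypothetical; the course itself may use pandas or other tools for the same tasks.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records, invented for this sketch.
records = [
    {"dept": "A", "name": "x", "score": 70},
    {"dept": "B", "name": "y", "score": 40},
    {"dept": "A", "name": "z", "score": 90},
    {"dept": "B", "name": "w", "score": 60},
]

# Filtering: keep only records that satisfy a condition.
filtered = [r for r in records if r["score"] >= 50]

# Sorting: order by department first, then by score.
ordered = sorted(filtered, key=itemgetter("dept", "score"))

# Grouping: groupby only merges adjacent items, so the input must
# already be sorted by the grouping key (done in the step above).
totals = {
    dept: sum(r["score"] for r in group)
    for dept, group in groupby(ordered, key=itemgetter("dept"))
}

print(totals)
```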
- Missing data
- Data duplicates
- Data outliers (incl. outlier exploration, removal)
Slides: This unit is also available in PDF format and as a single HTML page.
Notebooks of Unit 4
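One possible way to handle the three data-quality issues of this unit is sketched below. It assumes that missing values are represented as `None` and that outliers are flagged with the common 1.5 × IQR rule; both are assumptions made for this example, not methods prescribed by the course, and the sample rows are invented.

```python
import statistics

# Invented (id, value) rows: one missing value, one duplicate, one extreme value.
rows = [
    ("a", 10.0), ("b", 12.0), ("c", None), ("b", 12.0), ("d", 11.0),
    ("e", 9.0), ("f", 13.0), ("g", 10.5), ("h", 95.0),
]

# Missing data: drop rows without a value (imputation would be an alternative).
complete = [row for row in rows if row[1] is not None]

# Duplicates: keep the first occurrence of each row, preserving order.
unique = list(dict.fromkeys(complete))

# Outliers: flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
values = [value for _, value in unique]
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [row for row in unique if low <= row[1] <= high]

print(len(rows), len(complete), len(unique), len(cleaned))
```

Whether an extreme value should be removed or merely inspected depends on the data set; the sketch only shows the mechanics.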
Storing/loading data to/from a file vs. connecting to a database system and loading data into and from it
- Python and Persistence:
- Persisting objects in files: Pickle
- Relational Database Systems: SQLite
- Querying data from a Relational Database
- Persisting objects in a Relational Database
Slides: This unit is also available in PDF format and as a single HTML page.
Readings:
Notebook of Unit5
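The persistence topics of this unit can be illustrated with the standard-library modules `pickle` and `sqlite3`. The sketch below is a minimal example under stated assumptions: the table names (`unit`, `snapshot`), the column names, and the sample object are hypothetical, and an in-memory database is used so that no file is created.

```python
import pickle
import sqlite3

# Hypothetical object to persist.
course = {"title": "Data Processing 1", "units": [1, 2, 3]}

# Pickle: serialize the object to bytes and restore it again.
# (pickle.dump / pickle.load do the same against an open binary file.)
blob = pickle.dumps(course)
restored = pickle.loads(blob)

# SQLite: ":memory:" creates a temporary database that lives in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (id INTEGER PRIMARY KEY, topic TEXT)")
conn.executemany(
    "INSERT INTO unit (id, topic) VALUES (?, ?)",
    [(1, "Introduction"), (2, "Data formats"), (3, "Data inspection")],
)

# Persisting an object in a relational database: store the pickled bytes as a BLOB.
conn.execute("CREATE TABLE snapshot (name TEXT PRIMARY KEY, payload BLOB)")
conn.execute("INSERT INTO snapshot (name, payload) VALUES (?, ?)", ("course", blob))
conn.commit()

# Querying: placeholders (?) keep values separate from the SQL text.
topics = [
    row[0]
    for row in conn.execute("SELECT topic FROM unit WHERE id >= ? ORDER BY id", (2,))
]
payload = conn.execute(
    "SELECT payload FROM snapshot WHERE name = ?", ("course",)
).fetchone()[0]
from_db = pickle.loads(payload)
conn.close()

print(topics, from_db == course)
```

Unpickling executes code embedded in the byte stream, so pickled data should only be loaded from trusted sources.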
- Basic analysis of algorithms: The Big O
- (Library support):
- High-level libraries: pandas (cont'd)
- Low-level libraries: numpy, scipy
- Plotting (cont'd): seaborn, bokeh
- Parsing
- Visualization primer: matplotlib, pandas
Slides: This unit is also available in PDF format and as a single HTML page.
Readings:
Notebooks of Unit 6
Jupyter Notebook
The theoretical part of the course is accompanied by practical code examples and hands-on exercises in Jupyter, an interactive Python environment.
Supplemental Reading
Coding