#SLIDE STYLE
#!/usr/bin/env python3
from pathlib import Path

from traitlets.config.manager import BaseJSONConfigManager

# Store the RISE slideshow settings in the user's Jupyter nbconfig directory
path = Path.home() / ".jupyter" / "nbconfig"
cm = BaseJSONConfigManager(config_dir=str(path))
cm.update(
    "rise",
    {
        "theme": "serif",
        "transition": "slide",
    },
)
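#CSS MARKDOWN STYLES
from IPython.display import HTML

# Heading colours and sizes for the rendered markdown cells
style = """<style>
h1 {color: darkred; font-size: 28px;}
h2 {color: darkred; font-size: 22px;}
h3 {color: darkred; font-size: 18px;}
</style>"""
HTML(style)  # as the last expression in the cell, this injects the CSS into the notebook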
... bottom line: there is no single definition, but there are some main recurring terms:
A growing area of private and social life becomes reflected in computerised data, which is then turned into "valuable" insights.
... plus some recurring mention of common skills...
| | Data analyst | Data scientist |
|---|---|---|
| Analytical skills | Analytical thinking | Excellent in math and statistics |
| | Apply established analysis methods | Visualisation, new approaches |
| Technical skills | Data modelling, databases | Data modelling, databases |
| | Use of analysis tools | Data mining |
| | Programming skills an advantage | Algorithm development, method abstraction |
| Domain knowledge | Detailed domain knowledge | Background domain knowledge |
| | Project management | Creativity |
| | Communication skills | Team work |
"3 sexy skills of data geeks" (Nathan Yau, Rise of the Data Scientist, 2009)
Example for data journalism
Dataset for published articles
Example of a "classic" data-driven process: ETL in data warehousing
See, e.g., Matteo Golfarelli, Stefano Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.
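To make the three ETL stages concrete, here is a minimal sketch using only the Python standard library. The CSV content, the `orders` table, and the function names `extract`, `transform`, and `load` are all hypothetical and chosen for illustration; a real warehouse pipeline would involve far more sources, checks, and tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (illustrative data only).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,n/a
3,Alice,4.50
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise customer names, parse amounts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # note: dropping rows is itself a filtering decision
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the (in-memory) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Analysis happens only after loading, on the cleaned and integrated data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 15.0)]
```

Note that the row with the unparseable amount silently disappears in the transform step, so the customer "bob" never reaches the warehouse.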
"Classic" views are challenged by datafication:
"Knowledge Discovery in Databases (KDD)" process (often used in the course of Data Mining)
Source: Howard Hamilton
Towards a "Data Science workflow"
Cathy O'Neil, Rachel Schutt. Doing Data Science: Straight Talk from the Frontline (O'Reilly, 2013) (Chapter 2)
Danyel Fisher, Miriah Meyer. Making Data Visual (O'Reilly, 2018) (Chapter 2)
WARNING: At each stage, things can go wrong! Any filtering/aggregation may bias the data!
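A small sketch of how an innocent-looking cleaning step can bias a result. The survey records below are invented for illustration, and the helper `share` is hypothetical; the point is only that dropping incomplete records changes who is represented in the data.

```python
from statistics import mean

# Hypothetical survey: (age_group, daily_screen_hours); values are illustrative only.
responses = [
    ("18-29", 6.0), ("18-29", 7.0), ("18-29", 5.0),
    ("60+", 2.0), ("60+", None), ("60+", None),
]

# Naive cleaning step: drop every record with a missing value.
complete = [(group, hours) for group, hours in responses if hours is not None]


def share(records, group):
    """Fraction of records that belong to the given age group."""
    return sum(1 for g, _ in records if g == group) / len(records)


overall_after_filter = mean(hours for _, hours in complete)

print(share(responses, "60+"))   # 0.5  (half of all respondents)
print(share(complete, "60+"))    # 0.25 (a quarter after filtering)
print(overall_after_filter)      # 5.0  (dominated by the younger group)
```

Because the missing values are concentrated in one group, the filtered average mostly describes the other group, even though nothing in the code looks wrong.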
"Data wrangling is a huge — and surprisingly so — part of the job,” said Monica Rogati, vice president for data science at Jawbone, whose sensor-filled wristband and software track activity, sleep and food consumption, and suggest dietary and health tips based on the numbers. “It’s something that is not appreciated by data civilians. At times, it feels like everything we do."
Again, not a single definition, but some recurring terms:
Source: http://www.responsibledatascience.org/
NOTE:
Source: https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
"The Python vs R debate confines you to one programming language. You should look beyond it and embrace both tools for their respective strengths. Using more tools will only make you better as a data scientist."