Slides: This unit is also available in a PDF format and as a single HTML Page
Readings:
and memory consumption.
What would you consider "rapidly", "slowly"?
Commonly found types of time growth for some input n:
increases the required time by a constant amount, i.e. logarithmic (example: binary search).
How could we sort it by a different column? e.g., how could we sort countries by population?
Let's look at the excerpts from the following notebook
haystack = [('BE', 10839905),
('BG', 7563710),
('CZ', 10532770),
('DE', 81802257),
('EE', 1365275),
('ES', 47021031),
('FR', 64611814),
('IT', 60340328),
('CY', 819100),
('HU', 10014324),
('NL', 16574989),
('PL', 38529866),
('PT', 10573479),
('RO', 22480599),
('SK', 5435273),
('FI', 5351427),
('SE', 9415570),
('NO', 4858199),
('CH', 7877571)]
haystack.sort() # by country code
haystack.sort(key=lambda x:x[1]) # by population count
Note: if you know that a file is sorted, then searching in that file becomes easier/cheaper!
Bottomline: (pre-)sorting can be costly, but might speed up other operations... another example: grouping!
# Search for first entry bigger than number in a sorted
# list of lists of length 2:
def binary_search(number, array, lo, hi):
if hi < lo: return array[lo] # no more numbers
mid = (lo + hi) // 2 # midpoint in array
if number == array[mid][0]:
return array[mid] # number found here
elif number < array[mid][0]:
# try left of here
return binary_search(number, array, lo, mid - 1)
else:
# try above here
return binary_search(number, array, mid + 1, hi)
# Sample call: Find me a country with a pop. > 5m people?
binary_search(5000000, haystack, 0, len(haystack))
sort
applies Timsort: \( O(n\log{}n) \) (worst case).
import pandas as pd
contains high-level data structures and tools designed to make data analysis fast and easy. Pandas are built on top of NumPy, and makes it easy to use in NumPy-centric applications.
Pandas is well suited for many different kinds of data:
Here are just a few of the things that pandas does well:
It takes a while to get used to pandas. The documentation is exhaustive and there exists hundreds of tutorials and use cases
Checkout the notebook pandas.ipynb
import numpy as np
Numpy the fundamental package for scientific computing with Python. It contains among other things:
Check out this tutorial or this one (includes also scipy and matplotlib)
NumPy does not provide high-level data analysis functionality, having an understanding of NumPy arrays and array-oriented computing will help you use tools like Pandas much more effectively.
SciPy is open-source software for mathematics, science, and engineering
The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines , such as routines for numerical integration and optimization.
from scipy import linalg, optimize
Again, check out the official tutorials
Some examples:
There exists many libraries for plotting: