Overview#

This notebook contains content from A Whirlwind Tour of Python by Jake VanderPlas.

Python is so useful in large part because its standard library contains tools for a wide range of tasks. On top of this, there is a broad ecosystem of third-party modules that offer more specialized functionality.

Note

A Python module is a collection of useful functions and other definitions that saves you from writing code from scratch.

A Python module is a reusable chunk of code that you can import into your own projects so you don’t have to write all the code yourself. The Python Package Index (PyPI) hosts well over 140,000 Python projects and is one way to discover and install them. Note that we will install Python modules a different way, since we work with the open source data science platform Anaconda (as a general rule, you shouldn’t install packages from PyPI in Anaconda).

Built-in modules#

Python’s standard library contains many useful built-in modules, which you can read about fully in Python’s documentation. Any of these can be imported with the import statement. Here is a short list of some of the modules you might wish to explore and learn about:

  • os and sys: Tools for interfacing with the operating system, including navigating file directory structures and executing shell commands

  • math and cmath: Mathematical functions and operations on real and complex numbers

  • itertools: Tools for constructing and interacting with iterators and generators

  • functools: Tools that assist with functional programming

  • random: Tools for generating pseudorandom numbers

  • pickle: Tools for object persistence: saving objects to and loading objects from disk

  • json and csv: Tools for reading JSON-formatted and CSV-formatted files.

  • urllib: Tools for doing HTTP and other web requests.

You can find information on these, and many more, in the Python standard library documentation: https://docs.python.org/3/library/.
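As a small illustration of how several of these modules work together, here is a minimal sketch that uses random, math, and json. The variable names and the choice of five random integers are only assumptions made for the example.

```python
import json
import math
import random

# Seed the generator so the pseudorandom numbers are reproducible.
random.seed(0)
values = [random.randint(1, 10) for _ in range(5)]

# Summarize the values with a function from the math module.
summary = {"count": len(values), "root_of_sum": math.sqrt(sum(values))}

# Convert the summary to a JSON string and back again.
text = json.dumps(summary)
restored = json.loads(text)
print(restored == summary)
```

Each module is imported once at the top and then used through its own name, which is the pattern recommended in the sections below.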

Important modules#

Here is a list of some of the modules we will use frequently:

  • pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool. We will use pandas regularly in our course, and you will find all relevant content in the introduction to pandas.

  • NumPy offers tools for scientific computing, such as multidimensional arrays, mathematical functions and random number generators.

  • SciPy contains algorithms for scientific computing.

  • matplotlib is a library for creating data visualizations.

  • Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.

  • Altair is a simple, friendly and consistent API built on top of the powerful Vega-Lite visualization grammar of interactive graphics. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

  • plotly is a graphing library to make interactive, publication-quality graphs.

  • statsmodels includes statistical models, hypothesis tests, and data exploration.

  • scikit-learn provides a toolkit for applying common machine learning algorithms to data.

  • TensorFlow is an end-to-end open source platform for machine learning.
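Before importing a third-party module, it can help to check whether it is installed in your environment. The following is a minimal sketch using importlib.util.find_spec from the standard library. The helper name is_installed is hypothetical, and the list holds the import names of the modules above, which can differ from the project names.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module with this import name can be found."""
    return find_spec(module_name) is not None


# Import names can differ from project names:
# scikit-learn, for example, is imported as sklearn.
import_names = ["pandas", "numpy", "scipy", "matplotlib", "seaborn",
                "altair", "plotly", "statsmodels", "sklearn", "tensorflow"]

for name in import_names:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

The output depends on your own environment, so a module reported as missing simply has not been installed there yet.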

Import modules#

For loading built-in and third-party modules, Python provides the import statement. There are a few ways to use the statement, which we will briefly mention in this chapter, from most recommended to least recommended.

Explicit module import#

Explicit import of a module preserves the module’s content in a namespace. You then refer to that content by writing the module name, a “.”, and the name you want. For example, here we’ll import the built-in math module and compute the cosine of pi:

import math
math.cos(math.pi)
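One benefit of keeping each module in its own namespace is that two modules can define the same name without conflict. As a small sketch, both math and cmath provide a sqrt function, and the prefix makes clear which one is called:

```python
import cmath
import math

# Both modules define sqrt, but each one lives in its own namespace.
real_root = math.sqrt(4)        # 2.0
complex_root = cmath.sqrt(-4)   # 2j
print(real_root, complex_root)
```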

Explicit module import by alias#

For longer module names, it’s not convenient to use the full module name each time you access some element. For this reason, we’ll commonly use the "import ... as ..." pattern to create a shorter alias for the namespace.

For example, the NumPy (Numerical Python) package, a popular third-party package useful for data science, is by convention imported under the alias np:

import numpy as np
np.cos(np.pi)
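The same pattern works for any module, including those in the standard library. Here is a minimal sketch with the datetime module. The alias dt is just an illustrative choice, not a requirement.

```python
import datetime as dt

# The alias dt stands in for the full module name datetime.
start = dt.date(2024, 1, 31)
next_day = start + dt.timedelta(days=1)
print(next_day)  # 2024-02-01
```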

Explicit import of module contents#

Sometimes rather than importing the module namespace, you would just like to import a few particular items from the module. This can be done with the "from ... import ..." pattern.

For example, we can import just the cos function and the pi constant from the math module:

from math import cos, pi
cos(pi)
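You can import several names at once, and you can also give an imported name its own alias with "as". The sketch below assumes a small made-up data list and uses the hypothetical alias square_root for math.sqrt:

```python
from math import sqrt as square_root
from statistics import mean, median

data = [2, 4, 4, 10]

print(mean(data))        # arithmetic mean of the list
print(median(data))      # middle value of the sorted list
print(square_root(16))   # 4.0
```

Only the imported names are available here. The modules math and statistics themselves are not bound to any name, so an expression such as math.pi would fail unless you also run import math.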

Implicit import#

Finally, it is sometimes useful to import the entirety of the module contents into the local namespace. This can be done with the "from ... import *" pattern:

from math import *
sin(pi) ** 2 + cos(pi) ** 2

This pattern should be used sparingly, if at all. The problem is that such imports can sometimes overwrite function names that you do not intend to overwrite, and the implicitness of the statement makes it difficult to determine what has changed.
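To see which names are at risk, you can compare what a star import would bring in with the names Python already provides. The following is a rough sketch. The helper star_import_names is hypothetical and only approximates the rule Python applies: use the module's __all__ list if it has one, otherwise take every name that does not start with an underscore.

```python
import builtins
import math


def star_import_names(module):
    """Names that `from module import *` would bring in."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, a star import takes every name
        # that does not start with an underscore.
        names = [n for n in dir(module) if not n.startswith("_")]
    return set(names)


# Names that exist both in math and among Python's built-ins.
shadowed = star_import_names(math) & set(dir(builtins))
print(sorted(shadowed))
```

The result includes pow: after "from math import *", the name pow refers to math.pow, which always returns a float and does not accept the optional third argument of the built-in pow.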