The Ultimate Data Science Setup for Mac (2022 update)

February 15, 2022

You may think I am crazy, but I really enjoy setting up my computer. I am the kind of person who likes to wipe their computer clean every so often and start fresh with a clean slate. This blog post is an update to an earlier blog post from 2020. A lot has changed since then, so here is a look at my current setup.

Feedback?

Like what you see here? Or have some tools you think that I should have included? I would love to hear from you on twitter (https://twitter.com/TheReaLSamlam).

TL/DR

Feeling bold and want to completely clone my setup? Just run the following script and install everything at your own risk ☠️:

setup.sh
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install python tools
curl https://pyenv.run | bash
curl -sSL https://install.python-poetry.org | python3 -

# Command line tools
brew install just
brew install starship
brew install gh
brew install mas
brew install exa
brew install bat
brew install xclip
pipx install pls
pipx install bpytop

# Install others tools using Homebrew cask
brew install --cask docker
brew install --cask iterm2

# Fonts
brew tap homebrew/cask-fonts
brew install --cask font-fira-code-nerd-font

# R
brew install --cask xquartz

# Install apps via the app store
mas install 937984704  # Amphetamine
mas install 441258766  # Magnet

Homebrew

Using a package manager is an easy way to keep tools up to date, and forces a consistent approach for downloading and installing new tools. When ever possible, I try to brew install tools / software using Homebrew. To install homebrew run the following command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Throughout this blog post we will use brew install where ever possible to install our tools.

~/.zshrc

If you are using the zsh shell (now the default on mac) you will save your configurations and environment variables in the ~/.zshrc file. Below is my ~/.zshrc file. If there are any programs that you choose to no install just delete the associated lines from this file.

# ------------------------------------------------------------------------------
# General alias
# ------------------------------------------------------------------------------
alias ll="exa -la --created --header --no-permissions --no-user --no-time --binary"
alias tree="exa --tree"
alias b="bat --paging never"

# ------------------------------------------------------------------------------
# Python
# ------------------------------------------------------------------------------
alias py310="source ~/.venv/py310/bin/activate"

py_new_env() {
    rich "Creating a new virtual envrionment" --rule
    python -m venv venv
    source venv/bin/activate
    python -m pip install --upgrade pip wheel
}

# pyenv
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"

# poetry
export PATH="$HOME/.local/bin:$PATH"

# ------------------------------------------------------------------------------
# Other
# ------------------------------------------------------------------------------
# starship
eval "$(starship init zsh)"

Python

Search “how to install python” on Google and you will find many differing opinions. My current approach is:

  • Use pyenv to manage my python versions (e.g. enables me to have both Python 3.9 and 3.10 on my computer).
  • Use venv for creating virtual environments.
  • Use poetry for managing dependencies in projects.
  • Use pipx to install system wide packages and command line tools.

pyenv

Installation

pyenv (https://github.com/pyenv/pyenv>) allows you to manage multiple versions of Python on your computer. To install I like to use the pyenv install script from https://github.com/pyenv/pyenv-installer:

curl https://pyenv.run | bash

The script installs “pyenv and friends” which includes:

  • pyenv
  • pyenv-doctor
  • pyenv-installer
  • pyenv-update
  • pyenv-virtualenv
  • pyenv-which-ext

Follow the instructions from the terminal output to configure pyenv. For me, I added the following line to my ~/.zshrc file:

export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"

Using pyenv

Now that pyenv is installed, you can start to install different versions of python. Check all of the available versions using the pyenv install --list command:

pyenv install --list
show output
Available versions:
  2.1.3
  2.2.3
  ...
  3.9.1
  3.9.2
  3.9.4
  3.9.5
  3.9.6
  3.9.7
  3.10.0
  ...

As of writing this blog post 3.10.0 is the most current version of Python. You can install it using:

pyenv install 3.10.0

And then validate that the installation worked:

pyenv versions
show output
  system
  3.10.0

Global pyenv

I want to make the latest version of Python as my default. To do so I run the following command:

pyenv global 3.10.0
pyenv versions
show output
  system
* 3.10.0 (set by /Users/samedwardes/.pyenv/version)

Local pyenv

For some projects I may want to use a different version of Python then my default 3.10.0. To change the default version for a specific project I can use the local command. First install version a new version:

pyenv install 3.7.10

Then set it as the default python version for a new project:

mkdir old-py-project
cd old-py-project
pyenv local 3.7.10
pyenv versions
show output
  system
  3.10.0
* 3.7.10 (set by /Users/samedwardes/tmp/old-py-project/.python-version)

As soon as I navigate away from the project, my python version changes back to my global default.

cd ..
pyenv versions
show output
  system
* 3.10.0 (set by /Users/samedwardes/.pyenv/version)
  3.7.10

venv

Creating a new virtual environment

venv comes built into Python. It is used to create virtual environments. Every time I start a new project I create a new virtual environment. Lets demonstrate by creating a new project:

mkdir toy-project
cd toy-project

To create a new virtual environment run the following command:

python -m venv venv

Lets break down the command above:

  • python -m venv runs the venv program from the command line.
  • The last part of the command, the second venv is the positional argument for ENV_DIR. Run python -m venv --help to see all the arguments and options for the `venv program.
  • By convention, I always name the ENV_DIR as venv. However, you can name it anything you like (e.g. python -m venv my-virtual-environment would also work).

The python -m venv venv create a new directory in our current project named venv. Lets take a look inside:

tree venv -L 2
show output
venv
├── bin
│   ├── Activate.ps1
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── pip
│   ├── pip3
│   ├── pip3.10
│   ├── python -> /Users/samedwardes/.pyenv/versions/3.10.0/bin/python
│   ├── python3 -> python
│   └── python3.10 -> python
├── include
├── lib
│   └── python3.10
└── pyvenv.cfg

Inside the venv/bin/ directory are several files and scripts. These are used to activate the virtual environment. Run the following command to activate your virtual environment:

source venv/bin/activate

If you are using starship your command line will now have a nice indicator letting you know that you are using a virtual environment:

~/tmp/toy-project via 🐍 pyenv 3.10.0 (venv)

You can prove to yourself that you are in a brand new isolated Python environment by running:

pip list
show output
Package    Version
---------- -------
pip        21.2.3
setuptools 57.4.0
WARNING: You are using pip version 21.2.3; however, version 22.0.3 is available.
You should consider upgrading via the '/Users/samedwardes/tmp/toy-project/venv/bin/python -m pip install --upgrade pip' command.

Nice! You have a brand new canvas to start your next python project on. While the virtual environment is activated, anything you pip install will only be installed into the virtual environment.

First I will update pip to the most current version.

python -m pip install --upgrade pip

Then lets install a package:

pip install wheel
pip list
show output
Package    Version
---------- -------
pip        22.0.3
setuptools 57.4.0
wheel      0.37.1

Since I do this often, I have included a little helper function in my ~/.zshrc file to automate the creation and updating of new virtual environments.

py_new_env() {
    rich "Creating a new virtual envrionment" --rule
    python -m venv venv
    source venv/bin/activate
    python -m pip install --upgrade pip wheel
}

With this function you can call py_new_env() from the command line at any time to create, activate, and then update a new virtual environment 🎉!

Using with pyenv

When you run the command python -m venv venv the virtual environment will automatically be created using which ever version of python you currently have activated. If you are unsure, run the following command to check before creating a new virtual environment:

pyenv versions
show output
  system
* 3.10.0 (set by /Users/samedwardes/.pyenv/version)
  3.7.10

I can see that Python 3.10.0 is currently active. I can double check by just running:

python --version
show output
Python 3.10.0

If I want to create my virtual environment using a different python version I must first activate the other version using pyenv:

pyenv local 3.9.4

Then I can create my virtual environment:

python -m venv venv-394

Lets compare the two different virtual environments we created:

tree -L 3
show output
.
├── venv
│   ├── bin
│   │   ├── Activate.ps1
│   │   ├── activate
│   │   ├── activate.csh
│   │   ├── activate.fish
│   │   ├── pip
│   │   ├── pip3
│   │   ├── pip3.10
│   │   ├── python -> /Users/samedwardes/.pyenv/versions/3.10.0/bin/python
│   │   ├── python3 -> python
│   │   ├── python3.10 -> python
│   │   └── wheel
│   ├── include
│   ├── lib
│   │   └── python3.10
│   └── pyvenv.cfg
└── venv-394
    ├── bin
    │   ├── Activate.ps1
    │   ├── activate
    │   ├── activate.csh
    │   ├── activate.fish
    │   ├── easy_install
    │   ├── easy_install-3.9
    │   ├── pip
    │   ├── pip3
    │   ├── pip3.9
    │   ├── python -> /Users/samedwardes/.pyenv/versions/3.9.4/bin/python
    │   ├── python3 -> python
    │   └── python3.9 -> python
    ├── include
    ├── lib
    │   └── python3.9
    └── pyvenv.cfg

10 directories, 25 files

You can see that the first one we created (venv) is using python3.10, and the second one we created (venv-394) is using python3.9.

venv vs. pyenv

When I first started using these tools I would often get them mixed up. What is venv doing? What is pyenv doing? Do I need both?

  • pyenv controls your python version (e.g. 3.10 vs. 3.9).
  • venv isolates your project dependencies (the things you pip install).

poetry

poetry is a tool for python dependency management and packaging. From their website https://python-poetry.org/:

Python packaging and dependency management made easy

Installation

To install poetry run the following command:

curl -sSL https://install.python-poetry.org | python3 -

Follow the instructions from the terminal output to configure poetry. For me, I added the following line to my ~/.zshrc file:

export PATH="$HOME/.poetry/bin:$PATH"

Verify that the installation worked by running:

poetry --version
show output
Poetry version 1.1.12

Starting a new project with poetry

I use poetry for almost every new project that I start. Lets create an example project to demonstrate:

mkdir new-project
cd new-project
poetry init -n    # The `-n` flag tells poetry to not ask any questions interactively.
tree
show output
.
└── pyproject.toml

0 directories, 1 file

Your directory now has a new file, pyproject.toml. This file is used to keep track of the dependencies required for your project.

cat pyproject.toml
show output
[tool.poetry]
name = "new-project"
version = "0.1.0"
description = ""

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Go ahead and manually update the projects description in pyproject.toml. Too add a dependency use the poetry add command. This command essentially replaces pip install. When you use poetry add a few things happen:

  • The new package is added to your pyproject.toml file.
  • The poetry dependency resolver verifies the version requirements.
  • A virtual environment is created by poetry.
poetry add requests
show output
Creating virtualenv new-project-ubTVXCow-py3.9 in /Users/samedwardes/Library/Caches/pypoetry/virtualenvs
Using version ^2.27.1 for requests

Updating dependencies
Resolving dependencies... (3.1s)

Writing lock file

Package operations: 5 installs, 0 updates, 0 removals

 Installing certifi (2021.10.8)
 Installing charset-normalizer (2.0.12)
 Installing idna (3.3)
 Installing urllib3 (1.26.8)
 Installing requests (2.27.1)

Take a look at your project, you will notice a new file:

tree
show output
.
├── poetry.lock
└── pyproject.toml

0 directories, 2 files

The new poetry.lock file contains a detailed description of all the packages you are using (this includes requests, and all of the packages that requests depends on). The pyproject.toml file has been automatically updated to include requests as a dependency.

cat pyproject.toml
show output
[tool.poetry]
name = "new-project"
version = "0.1.0"
description = ""
authors = ["SamEdwardes <edwardes.s@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.27.1"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Using the poetry virtual environment

In order to isolate your package poetry creates a virtual environment. To access this virtual environment you can prefix any command with poetry run and it will run inside the virtual environment.

poetry run pip list
show output
Package            Version
------------------ ---------
certifi            2021.10.8
charset-normalizer 2.0.12
idna               3.3
pip                21.3.1
requests           2.27.1
setuptools         58.3.0
urllib3            1.26.8
wheel              0.37.0

It can be annoying to prefix every command with poetry run. Alternatively, you can run all commands in the poetry virtual environment with poetry shell.

poetry shell
pip list
show output
Package            Version
------------------ ---------
certifi            2021.10.8
charset-normalizer 2.0.12
idna               3.3
pip                21.3.1
requests           2.27.1
setuptools         58.3.0
urllib3            1.26.8
wheel              0.37.0

Adding development dependencies

When creating a python package it is common that you as the developer will need a particular package (e.g. like black for code formatting), but that the end user will not require that same package to run the program. poetry has a mechanism to handle this by allowing you to specify development dependencies with the --dev option.

poetry add --dev black
cat pyproject.toml
show output
[tool.poetry]
name = "new-project"
version = "0.1.0"
description = ""
authors = ["SamEdwardes <edwardes.s@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"
requests = "^2.27.1"

[tool.poetry.dev-dependencies]
black = "^22.1.0"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

As you can see, black has been added as dev-dependency. When you publish your package to PyPi it will not include black as a requirement, but it will include requests.

poetry vs venv

It may not be clear when you should use poetry, vs. when you should us venv. There is not correct answer, but in general I use poetry when:

  • I am creating a python package that I will share with others via PyPi and/or GitHub.

I use venv when:

  • I do not think that I will be sharing the code with anyone.
  • I want to quickly experiment

If you are unsure about which tool to use, just choose one and get started! You can always change your mind later on the. The important thing is that you are using a virtual environment!

pipx

On no, another python tool 🤯!

Yes, it is a lot to take in. But hopefully you will begin to find these tools useful as you continue your python journey. From the pipx website (https://pypa.github.io/pipx/):

pipx — Install and Run Python Applications in Isolated Environments

pipx-gif

To install pipx run the following:

brew install pipx
pipx ensurepath

I use pipx to install command line tools that I always want to be available to me:

pipx install cowsay
pipx install rich
pipx install jupyter-lab

Python packages

Below is a collection of my favourite Python packages. I will not go into detail here, but these are the packages that I use on the most regular basis.

# Data
pip install pandas

# Data science
pip install spacy
pip install -U scikit-learn

# Visualization
pip install plotly
pip install dash
pip install altair

# Web
pip install requests
pip install flask
pip install fastapi

# Fun stuff
pip install pydantic     # Data validation
pip install rich         # Beautiful command line outputs
pip install typer        # Create command line applications

R

Installing R

Install the latest version of R from CRAN:

cran-homepage

Select “Download R for macOS.

download-correct-version-of-r

If you are using a newer Mac with an M1 chip select the second option for Apple silicon arm64. Otherwise, choose the first option for Intel 64-bit.

installer-r

Click on the link and follow the instructions as prompted. Select all of the default configuration options.

Installing XQaurtz

In order to use R on your Mac OS you will also need to install XQuartz (https://www.xquartz.org/). From R for macOS page on CRAN:

Note: the use of X11 (including tcltk) requires XQuartz to be installed since it is no longer part of OS X. Always re-install XQuartz when upgrading your macOS to a new major version.

You can install XQuartz using homebrew:

brew install --cask xquartz

RStudio Desktop

If you use R, you are probably already using RStudio Desktop. You can install RStudio from here: https://www.rstudio.com/products/rstudio/download/#download.

R packages

When ever I am starting with a fresh R install the first thing I do is install the tidyverse 🤓. It include 90% of what I use on a regular basis. The tidyverse is actually not one package, but a collection of packages that follow a common design language. The tidyverse is my favourite part of R!

install.packages("tidyverse")

Terminal

Every nerds favourite place to be… the terminal. As a data scientist / developer your terminal setup is a great way to express yourself. Do you like to get crazy and customize everything? Do you keep it simple and stick to the defaults? I like to take a middle ground approach. I want things to look pretty, but I also do not want to waste too much time configuring things.

iTerm2

iTerm2 is a replacement for the default terminal app that comes with your mac. It includes some nice features such as tabs and split panes.

iterm2-screenshot

Image from https://iterm2.com/features.html

To install run the following command:

brew install --cask iterm2

Starship

starship is a cross-shell prompt. According to their website:

The minimal, blazing-fast, and infinitely customizable prompt for any shell!

starship-gif

Gif from https://starship.rs/guide/

I really like starship because:

  • I think the defaults look good and are reasonable.
  • I can easily have a consistent prompt across all of my devices.
  • When needed, starship also has lots of customization options.

To install starship run the following command:

brew install starship

Command line tools

exa

From the exa website (https://the.exa.website/)

A modern replacement for ls.

exa-image

Image from https://github.com/ogham/exa

It has defaults that I prefer, and has a nice coloured output. Run the following to install:

brew install exa

bat

From the bat repo (https://github.com/sharkdp/bat):

A cat clone with wings.

bat screenshot

Run the following to install:

brew install bat

just

From the just repo (https://github.com/casey/just):

just is a handy way to save and run project-specific commands.

just is very similar to the ubiquitous make command. It is inspired by make, but focuses on just being a command runner as opposed to a build tool. To install just run the following command:

brew install just

bpytop

From the bpytop GitHub repo (https://github.com/aristocratos/bpytop):

Resource monitor that shows usage and stats for processor, memory, disks, network and processes.

bpytop-screenshot

To install bytop use homebrew:

brew install bpytop

VS Code

Installation

VS Code is quickly becoming the editor of choice for many languages, including Python. For me, I use VS Code for most Python projects. I will also use VS Code when I want to quickly view a text based file (e.g. .json, .txt, .R, etc.).

Install VS Code directly from the VS Code Website and follow in the installation instructions:

https://code.visualstudio.com/download

Plugins

Out of the box VS Code comes with many great features, but it will not have everything you need. One of VS Codes greatest strengths is the strong eco system of plugins that are available. My favourites are:

Apps

Docker desktop

Very helpful for creating reproducible environments and for deploying applications to production (https://www.docker.com/products/docker-desktop/).

brew install --cask docker

Typora

A nice text editor for markdown (https://typora.io/). Typora costs $14.99 but is worth it if you are writing a lot of markdown.

Fonts

You can manage and install fonts using homebrew. The website https://www.nerdfonts.com/ provides many fun fonts that include glyphs (icons). When you use these fonts with tools like starship your terminal output will look much better!

I use the Fira Code Nerd Font which is recommended by starship:

brew tap homebrew/cask-fonts
brew install --cask font-fira-code-nerd-font

Remember to update iTerm2 (and any other terminals you may be using) to use your new font.

iterm-2-font-screenshot

Inspiration and reference

I have referenced and used the below guides many times. Check them out for additional inspiration and ideas on how to create your perfect data science setup.