Tools
Jupyter Notebooks
Jupyter Notebooks are a popular tool in the data science community because they can mix code and explanatory text. In fact, this entire course book is written as a series of Jupyter Notebooks and Markdown text files. As with any tool, the Jupyter Notebook has some drawbacks, and these are a popular subject for blog posts.
Personally, I (Andrew McCluskey) like Notebooks as a place for preparing code and text that describe the work completed. For example, they are well suited to sharing some analysis with a colleague who is less comfortable with programming, or to prototyping a solution to a problem before building it into a larger piece of code, such as a Python package.
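To make the mix of code and text concrete, the sketch below builds a tiny notebook by hand. It assumes the version 4 notebook layout, in which an .ipynb file is a JSON document holding a list of cells; the cell contents and the cell_types helper are illustrative only and are not taken from this course book.

```python
import json

# A minimal notebook in the nbformat 4 layout: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example analysis\n", "Explanatory text lives here."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}


def cell_types(nb):
    """Return the type of each cell, in order (illustrative helper)."""
    return [cell["cell_type"] for cell in nb["cells"]]


# Serialising to JSON gives the kind of text that is saved in an .ipynb file.
text = json.dumps(notebook, indent=1)
print(cell_types(json.loads(text)))
```

In practice the Jupyter application or VSCode writes this file for you; the point is only that text and code sit side by side in one document.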
Bristol Only
In the SCIFM0002 unit, you are expected to be comfortable working with Jupyter Notebooks.
How you work with them is up to you: for example, you may want to install the Jupyter application or run your .ipynb files through VSCode.
Python Environments and Package Management
The Python ecosystem we will be using is vibrant; having so many packages with such a vast range of functionality can be extremely powerful. However, it comes with drawbacks, specifically around the compatibility of different packages. Therefore, the concept of Python environments has become popular. A Python environment is a sandbox containing a specific set of packages, aiming to ensure package compatibility and reproducibility of experience.
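One way to see what "a sandbox containing a specific set of packages" means in practice is to ask the running interpreter where it lives and what it can see. The following is a minimal sketch using only the standard library; the describe_environment name is hypothetical, chosen for this example.

```python
import sys
from importlib import metadata


def describe_environment():
    """Summarise the Python environment this interpreter belongs to."""
    # Collect the names of every installed distribution visible to this interpreter.
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    installed = sorted({name for name in names if name})
    return {
        "prefix": sys.prefix,          # root folder of the active environment
        "executable": sys.executable,  # the Python binary being run
        "n_packages": len(installed),
        "packages": installed,
    }


if __name__ == "__main__":
    info = describe_environment()
    print(info["prefix"])
    print(info["n_packages"], "packages installed")
```

Running this in two different environments should report different prefixes and, usually, different package lists.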
There are a few different ways that we can work with Python environments.
Python’s built-in approach uses a tool called venv.
However, this is limited to using packages distributed through the Python Package Index (PyPI), i.e., with pip.
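As a rough illustration of the built-in approach, the standard library venv module can create an environment from Python itself. The sketch below is an assumption-laden example rather than a recommended workflow: the make_environment helper and the demo-env folder name are hypothetical, and pip is deliberately left out to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def make_environment(path):
    """Create a bare virtual environment at `path` and return its config file."""
    # with_pip=False keeps the example fast; pass with_pip=True to have pip installed too.
    venv.create(path, with_pip=False)
    return Path(path) / "pyvenv.cfg"


if __name__ == "__main__":
    # Build the environment in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_environment(Path(tmp) / "demo-env")
        print(cfg.read_text())
```

The more common route is the command-line form, python -m venv followed by a directory name, which does the same job.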
A popular alternative is conda (or mamba).
conda and mamba can also install conda packages, which are more flexible than PyPI packages: it is easier to build a conda package that includes non-Python code.
Conda Cheatsheet
It can, at times, be hard to remember all the different conda commands.
Access to a cheatsheet can be valuable in times like these.
To ensure that you have all of the necessary packages to work effectively with this course book, we provide the following environment file: special-topics.yml. To install this environment, you should install conda or mamba, and then create the environment by downloading the environment file and running the following command in the bash terminal:
conda env create -f special-topics.yml
To activate this conda environment, the following command can be used:
conda activate special-topics
You should be able to access all the necessary packages to work through this course book from this environment. To remove the environment, use the following command:
conda env remove -n special-topics
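After activating the environment, one quick way to confirm that packages are available is to check whether Python can find them. This is only a sketch: the EXPECTED list below is hypothetical and should be replaced with the packages actually named in special-topics.yml, and the missing_packages helper is an illustrative name.

```python
from importlib.util import find_spec

# Hypothetical list: replace with the packages named in special-topics.yml.
EXPECTED = ["numpy", "scipy", "matplotlib"]


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED)
    print("All found" if not missing else f"Missing: {missing}")
```

Note that the check uses import names, which sometimes differ from the package names used when installing (for example, scikit-learn is imported as sklearn).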
Bristol Only
As with Jupyter Notebooks, you are expected to be able to work within a conda environment. If you are running on Windows, it is best to install conda within the Windows Subsystem for Linux (WSL). You should work out how to run Jupyter Notebooks from WSL to ensure you are accessing the relevant conda environment. If you have any trouble, speak to the unit director or one of the PhD student demonstrators.
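To check which environment a Notebook is really using, a cell along the following lines can help. It is a sketch that relies on one assumption about conda's layout, namely that every conda environment has a conda-meta folder at its root; the conda_env_name helper is a hypothetical name for this example.

```python
import sys
from pathlib import Path


def conda_env_name(prefix=None):
    """Return the folder name of the conda environment at `prefix`, or None.

    Assumes conda keeps a `conda-meta` directory at the root of each
    environment, and uses its presence as the marker.
    """
    root = Path(prefix if prefix is not None else sys.prefix)
    return root.name if (root / "conda-meta").is_dir() else None


if __name__ == "__main__":
    print(sys.executable)    # the Python binary behind this kernel
    print(conda_env_name())  # expected to be special-topics if all is well
```

If this prints None, or the folder name of the base installation instead of special-topics, the Notebook is probably running on a different Python from the one in the course environment.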