@marc31
Last active July 27, 2024 11:35
Python: Setting Up a Reproducible, Isolated, and Easily Installable Work Environment

The goal is to create a work environment that is easy to install, reproducible, and isolated. By reproducible, we mean an environment that you can reinstall yourself and share with someone else, who can then recreate it without issues.

Pip

Pip has several limitations:

  • Does not manage environments by itself; a separate tool such as venv is needed.
  • Is slow.
  • By default, only one repository (PyPI) is used.
  • Cannot manage system dependencies, such as installing Python itself or other system libraries like netCDF.

Conda, Mamba, Micromamba

Conda, Mamba, and Micromamba are compatible with each other: they use the same packages, channels, and environment files. They can install dependencies other than Python packages, including Python itself, netCDF, Fortran compilers, and even Pip.

However, using their env.yml files does not guarantee a 100% reproducible environment, especially across different platforms (e.g., combinations of processor types and operating systems).

To maximize compatibility across different environments, follow these best practices:

For multi-system compatibility:

  • Avoid exact pins with == and prefer the compatible-release operator ~=.
  • Do not include build strings (the hash-like suffix after the second =) at the end of dependency specifications.

Examples:

  • Incorrect (pins the build string):

    - numpy=1.26.4=py310h055cbcc_0
  • Incorrect (exact pin):

    - numpy==1.26.4
  • Correct (compatible release):

    - numpy~=1.26.4
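  • Do not export with conda env export, which records every installed package with its exact version and build string. Use conda env export --from-history instead, which records only the packages you explicitly requested.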

  • Ideally, manually write and maintain the dependencies in the environment file, removing unnecessary dependencies.
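As a rough illustration of the pinning rules above, the sketch below rewrites a fully pinned line, as found in an exported environment file, into the looser ~= form. The function name loosen_spec and the regular expression are hypothetical helpers written for this note, not part of Conda. The sketch assumes fully pinned versions with at least two components (such as 1.26.4) and leaves anything else untouched.

```python
import re

# Hypothetical helper: matches "name=version", "name==version" or
# "name=version=build", e.g. "numpy=1.26.4=py310h055cbcc_0".
_PINNED = re.compile(
    r"^(?P<name>[A-Za-z0-9_.\-]+)={1,2}(?P<version>[0-9][^=\s]*)(?:=(?P<build>\S+))?$"
)


def loosen_spec(spec: str) -> str:
    """Drop the build string and relax an exact pin to a ~= spec."""
    spec = spec.strip()
    match = _PINNED.match(spec)
    if match is None:
        # Already loose ("numpy~=1.26.4") or unpinned ("numpy"): keep as is.
        return spec
    return f"{match['name']}~={match['version']}"


if __name__ == "__main__":
    for line in ["numpy=1.26.4=py310h055cbcc_0", "numpy==1.26.4", "numpy~=1.26.4"]:
        print(line, "->", loosen_spec(line))
```

All three example lines end up as numpy~=1.26.4. The result still needs a manual review, since an exported file usually also lists transitive dependencies that do not belong in a hand-maintained file.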

Minimize the number of channels in the environment file:

channels:
  - conda-forge
  - nodefaults

Avoid:

channels:
  - anaconda
  - defaults
  - conda-forge

Refer to the Conda package match specifications for more details.
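To make the ~= operator more concrete, here is a minimal sketch that expands a compatible-release version into an explicit range. It follows the PEP 440 definition of ~= and assumes Conda interprets the operator the same way for plain numeric versions. The function name compatible_release_range is hypothetical, and pre-release or non-numeric components are not handled.

```python
def compatible_release_range(version: str) -> str:
    """Expand the version of a "~=" spec into an explicit ">=lower,<upper" range."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("~= needs at least two version components, e.g. 1.26")
    # Drop the last component, then bump the new last component by one.
    prefix = parts[:-1]
    upper = prefix[:-1] + [str(int(prefix[-1]) + 1)]
    return f">={version},<{'.'.join(upper)}"


if __name__ == "__main__":
    for v in ["1.26.4", "3.10.14", "24.0"]:
        print(f"~={v} is roughly {compatible_release_range(v)}")
```

Under these assumptions, numpy~=1.26.4 accepts later 1.26.x patch releases but not 1.27, which is why it tends to resolve on more platforms than an exact pin.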

Example of an env.yml

name: test_environment
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python~=3.10.14
  - pip~=24.0
  - numpy~=1.26.4
  - xarray~=2023.6.0
  - pip:
      - faicons~=0.2.0
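The recommendations in this section can also be checked mechanically. The sketch below is one possible way to do it, assuming the environment file has already been loaded into a Python dict (for example with a YAML parser, which is not shown here). The function lint_environment and its warning messages are hypothetical, not an existing Conda feature.

```python
import re

# Hypothetical pattern for "name=version=build" specs.
_BUILD_PIN = re.compile(r"^[^=<>!~\s]+=[^=\s]+=[^=\s]+$")
# Channels this note recommends avoiding in the environment file.
_DISCOURAGED_CHANNELS = ("anaconda", "defaults")


def lint_environment(env: dict) -> list:
    """Return human-readable warnings for an already-parsed environment file."""
    warnings = []
    for channel in env.get("channels", []):
        if channel in _DISCOURAGED_CHANNELS:
            warnings.append(f"channel '{channel}': prefer conda-forge with nodefaults")

    # Flatten top-level specs and the nested "pip:" section into one list.
    specs = []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            specs.extend(dep.get("pip", []))
        else:
            specs.append(dep)

    for spec in specs:
        if "==" in spec:
            warnings.append(f"{spec}: exact pin, prefer ~=")
        elif _BUILD_PIN.match(spec):
            warnings.append(f"{spec}: build string pinned, remove it")
    return warnings


if __name__ == "__main__":
    example = {
        "name": "test_environment",
        "channels": ["conda-forge"],
        "dependencies": [
            "python~=3.10.14",
            "pip~=24.0",
            "numpy~=1.26.4",
            "xarray~=2023.6.0",
            {"pip": ["faicons~=0.2.0"]},
        ],
    }
    print(lint_environment(example) or "no warnings")
```

The example environment from this note produces no warnings, while a file listing the anaconda or defaults channels, or specs such as numpy==1.26.4, would be flagged.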

Poetry

Poetry can generate a project structure directly and resolves dependencies into a lock file (poetry.lock) that is meant to work across platforms. However, it installs only Python packages: it does not support conda channels such as conda-forge or non-Python dependencies.

Pixi

Pixi is a newer tool that combines the advantages of Poetry and Conda: it installs conda packages (for example from conda-forge) alongside PyPI packages and records the resolved versions in a lock file. It is written in Rust and is fast at resolving and installing environments.

Note: Conda vs Mamba vs Micromamba

Conda

Conda is the default package and environment manager provided by Anaconda and Miniconda. Key features include:

  • Dependency Management: Conda manages libraries and their dependencies, making it easier to install complex packages.
  • Environments: Allows the creation of isolated environments to avoid conflicts between libraries.
  • Multi-language Support: Although primarily used for Python, Conda can also manage packages for other languages like R.
  • Large Ecosystem: A vast number of packages are available in the conda-forge and anaconda channels.

Mamba

Mamba is a project designed to enhance the Conda experience by improving speed and efficiency. Key features include:

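  • Performance: Its core is written in C++ and it resolves dependencies with the libsolv library, which made it much faster than classic Conda. Since version 23.10, Conda uses the libmamba solver by default, which narrows the gap.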
  • Compatibility: Fully compatible with Conda and can be used as a direct replacement. Basic commands remain the same, facilitating an easy transition for Conda users.
  • Package Management: Like Conda, Mamba manages installations, updates, and removals of packages, as well as environment management.
  • User Output: Provides clearer and more informative error messages and user outputs.

Micromamba

Micromamba is a lightweight version of the Mamba package manager. It is a standalone C++ executable with a separate command-line interface. It does not require a base environment and does not come with a default version of Python, making it a self-contained solution.
