Python Conda Environments — Core Concepts

Manage Python and non-Python dependencies in isolated conda environments with reproducible environment files.

Why this topic matters

Python virtual environments (venv) handle Python packages, but many data science and scientific computing workflows also depend on non-Python libraries: CUDA for GPU computing, GDAL for geospatial work, or MKL for optimized linear algebra. Conda manages both Python packages and these non-Python dependencies in one system, which is why it is so widely used as an environment manager on data science teams.

How it works

Creating and using environments

# Create environment with specific Python version
conda create -n myproject python=3.11

# Activate it
conda activate myproject

# Install packages
conda install numpy pandas scikit-learn

# See what's installed
conda list

# Deactivate
conda deactivate

Each environment is a self-contained directory with its own Python interpreter, libraries, and binaries. Activating an environment adjusts your PATH so commands use that environment’s tools.
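To check which environment a given interpreter actually belongs to, it can help to print the values that activation changes. The following is a minimal sketch: the function name describe_environment is made up for illustration, while CONDA_DEFAULT_ENV and CONDA_PREFIX are the variables conda sets on activation (they will be None outside a conda environment).

```python
import os
import shutil
import sys


def describe_environment():
    """Collect the values that show which environment this interpreter belongs to."""
    return {
        # Root directory of the running interpreter's environment.
        "prefix": sys.prefix,
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # The `python` that a shell command would resolve to via PATH.
        "python_on_path": shutil.which("python"),
        # Activation typically puts the environment's bin directory first.
        "first_path_entry": os.environ.get("PATH", "").split(os.pathsep)[0],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If prefix and conda_prefix disagree, the script is probably running under a different interpreter than the activated environment provides.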

Environment files

For reproducibility, define environments in YAML:

# environment.yml
name: ml-project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy=1.26
  - pandas>=2.0
  - scikit-learn
  - jupyter
  - cudatoolkit=11.8  # the cudatoolkit package tops out at 11.8; CUDA 12.x ships under other package names
  - pip:
    - transformers
    - wandb

Create from the file:

conda env create -f environment.yml

Update an existing environment:

conda env update -f environment.yml --prune

The --prune flag removes packages that were dropped from the YAML.
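The version pins in an environment file (numpy=1.26, pandas>=2.0, or a bare scikit-learn) behave differently: a single = is a fuzzy match, so numpy=1.26 accepts 1.26.4, while == asks for exactly that version. The sketch below is a simplified Python model of that behavior, not conda's own parser. The helper names are hypothetical, and it assumes plain dotted-integer versions and only the three operators shown.

```python
def parse_version(text):
    # Simplification: only plain dotted integers such as "1.26.4" are handled.
    return tuple(int(part) for part in text.split("."))


def split_spec(spec):
    """Split 'numpy=1.26' into (name, operator, version); operator is '' if unpinned."""
    # Check the two-character operators first so '=' does not match inside them.
    for op in ("==", ">=", "="):
        if op in spec:
            name, version = spec.split(op, 1)
            return name.strip(), op, version.strip()
    return spec.strip(), "", ""


def satisfies(spec, candidate):
    """Return True if the candidate version string is acceptable for the spec."""
    _, op, version = split_spec(spec)
    if op == "":
        return True  # no pin: any version is acceptable
    if op == "==":
        return candidate == version  # simplified exact match on the string
    if op == ">=":
        return parse_version(candidate) >= parse_version(version)
    # op == "=": fuzzy match, i.e. the pinned version or anything below it
    return candidate == version or candidate.startswith(version + ".")


if __name__ == "__main__":
    for spec, candidate in [("numpy=1.26", "1.26.4"), ("numpy==1.26", "1.26.4")]:
        print(spec, candidate, satisfies(spec, candidate))
```

Pinning with a single = keeps patch updates available, which is usually what a shared environment.yml wants.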

Key concepts

Channels

Channels are repositories where conda finds packages. The most important ones:

defaults: Anaconda’s curated channel (comes with Anaconda/Miniconda)
conda-forge: Community-maintained, largest selection, most up-to-date
pytorch: Official PyTorch builds with CUDA support
nvidia: NVIDIA’s GPU tools

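Channel priority matters. With strict priority, conda takes a package from the highest-priority channel that carries it. With the default flexible priority, the solver prefers higher-priority channels but may fall back to lower ones to satisfy a constraint: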

# Set conda-forge as highest priority
conda config --prepend channels conda-forge
conda config --set channel_priority strict

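Strict channel priority stops conda from pulling a package from a lower-priority channel when a higher-priority channel also carries it. Packages that exist only in a lower-priority channel can still be installed, but shared libraries come from one source wherever possible, which avoids subtle binary incompatibilities.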
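The difference between strict and flexible priority is easier to see in a toy model. The sketch below is not how conda is implemented; pick_channel, the index dictionary, and the version tuples are all invented for illustration. It only captures the rule that strict priority ignores lower channels once a higher one carries the package name.

```python
def pick_channel(package, channels, index, wanted, strict):
    """Return (channel, version) for `package`, or None if nothing qualifies.

    channels: channel names, highest priority first.
    index:    {channel: {package: [versions]}}, a stand-in for channel metadata.
    wanted:   predicate that says whether a version satisfies the request.
    """
    for channel in channels:
        versions = index.get(channel, {}).get(package)
        if versions is None:
            continue  # this channel does not carry the package at all
        matching = [v for v in versions if wanted(v)]
        if matching:
            return channel, max(matching)
        if strict:
            # A higher-priority channel carries the name, so lower ones are ignored.
            return None
    return None


# Hypothetical channel contents, versions written as tuples for easy comparison.
channels = ["conda-forge", "defaults"]
index = {
    "conda-forge": {"numpy": [(1, 26, 4)]},
    "defaults": {"numpy": [(1, 24, 3), (1, 26, 4)], "mkl": [(2023, 1, 0)]},
}


def needs_numpy_1_24(version):
    return version[:2] == (1, 24)
```

In this model, asking for NumPy 1.24 under strict priority returns None because conda-forge carries numpy but not that version, while flexible priority falls back to defaults. A package found only in defaults, such as mkl here, resolves under both settings.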

Conda vs pip

Aspect	Conda	Pip
Package scope	Any language (Python, C, R, CUDA)	Python only
Dependency solver	SAT-based, checks all constraints before installing	Backtracking resolver (since pip 20.3)
Environment management	Built-in	Separate tool (venv)
Package format	.conda / .tar.bz2	.whl / .tar.gz
Package source	Conda channels	PyPI
Binary packages	Pre-built for each platform	Wheels (some need compilation)

They can coexist: install conda packages first, then use pip for packages not available on conda channels. The environment.yml format supports this with the pip: section.

Solving and dependency resolution

Conda’s solver examines the entire dependency graph before installing anything. If package A needs NumPy 1.24 and package B needs NumPy 1.26, conda tells you about the conflict upfront rather than installing one and breaking the other.
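The idea of checking every constraint before touching the environment can be sketched in a few lines. This is a deliberately tiny model, assuming each requirement is given as a set of acceptable versions. Conda's real solver works on a much richer problem, and the name resolve and the data layout are hypothetical.

```python
def resolve(requirements, available):
    """Check all constraints up front; nothing is installed here.

    requirements: list of (requester, package, set_of_acceptable_versions).
    available:    {package: set_of_versions_that_exist}.
    Returns (chosen, conflicts): chosen maps package -> newest version that
    satisfies everyone; conflicts maps package -> the requesters that disagree.
    """
    allowed = {}
    requesters = {}
    for requester, package, acceptable in requirements:
        # Start from everything that exists, then narrow with each requirement.
        current = allowed.setdefault(package, set(available.get(package, ())))
        current &= acceptable
        requesters.setdefault(package, []).append(requester)

    chosen = {pkg: max(versions) for pkg, versions in allowed.items() if versions}
    conflicts = {pkg: requesters[pkg] for pkg, versions in allowed.items() if not versions}
    return chosen, conflicts


# The NumPy example from the text, with versions as tuples.
available = {"numpy": {(1, 24), (1, 26)}}
clashing = [
    ("package-a", "numpy", {(1, 24)}),
    ("package-b", "numpy", {(1, 26)}),
]
compatible = [
    ("package-a", "numpy", {(1, 24), (1, 26)}),
    ("package-b", "numpy", {(1, 26)}),
]
```

Calling resolve(clashing, available) reports numpy as a conflict between package-a and package-b and chooses nothing, while resolve(compatible, available) settles on (1, 26).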

This thoroughness comes at a cost: solving can be slow for large environments. The libmamba solver, the default since conda 23.10, is significantly faster than the original (classic) solver.

Once an environment resolves, you can export it so others can recreate it:

# Full export (platform-specific, exact versions)
conda env export > environment-lock.yml

# More portable export (exact versions, no build strings)
conda env export --no-builds > environment.yml

# Minimal export (only explicitly installed)
conda env export --from-history > environment-minimal.yml

The --from-history export is the most portable: it lists only the specs you explicitly requested, letting conda resolve platform-appropriate versions on the target machine. It does not record packages installed with pip, so add those to the pip: section by hand.
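To make the difference between the full export and the --no-builds export concrete, the sketch below removes the build string from dependency entries of the form name=version=build, which is how a full export writes them. The function name strip_build and the sample build string are made up for illustration.

```python
def strip_build(dependency):
    """Turn 'name=version=build' into 'name=version'; leave other forms untouched."""
    parts = dependency.split("=")
    if len(parts) == 3:
        return "=".join(parts[:2])
    return dependency


# Hypothetical entries as they might appear in a full export.
full_export = [
    "numpy=1.26.4=py311h64a7726_0",
    "python=3.11",
    "scikit-learn",
]

if __name__ == "__main__":
    for entry in full_export:
        print(strip_build(entry))
```

The build string encodes the platform and the Python build a package was compiled for, which is why dropping it makes a file easier to reuse on another machine.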

Miniconda vs Anaconda

Miniconda: Minimal installer — just conda, Python, and essential packages (~80 MB). Install what you need.

Anaconda: Full distribution with 250+ pre-installed scientific packages (~3 GB). Ready to use immediately but heavy.

For most workflows, Miniconda with conda-forge is the recommended approach — you get exactly what you need without bloat.

Common misconception

“Conda replaces pip entirely.” Many Python packages exist only on PyPI, not on conda channels. The practical approach is conda-first for packages available there (especially those with C dependencies), then pip for the rest. The key rule: install conda packages first, pip packages second — pip installations don’t register with conda’s solver.
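One way to see which packages conda's solver does not manage is to look for entries that conda list attributes to PyPI. The sketch below assumes the output of conda list --json is a list of records with name and channel fields, and that pip-installed packages carry the channel pypi. Treat that layout as an assumption to verify against your conda version. The sample records are invented.

```python
import json


def pip_installed(conda_list_json):
    """Names of packages whose channel is reported as 'pypi'."""
    records = json.loads(conda_list_json)
    return sorted(r["name"] for r in records if r.get("channel") == "pypi")


# Invented sample shaped like the assumed `conda list --json` output.
sample = json.dumps([
    {"name": "numpy", "version": "1.26.4", "channel": "conda-forge"},
    {"name": "wandb", "version": "0.16.0", "channel": "pypi"},
    {"name": "transformers", "version": "4.38.0", "channel": "pypi"},
])

if __name__ == "__main__":
    print(pip_installed(sample))
```

In practice the JSON text would come from running conda list --json in the activated environment. Packages on the resulting list are the ones to reinstall with pip after any conda install that changes their dependencies.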

One thing to remember

Conda environments isolate entire software stacks — Python, C libraries, CUDA, and more — making them essential for data science workflows where pip and venv can’t manage the full dependency chain.
