Neural Architecture Search with Python — Core Concepts

Understand NAS search spaces, strategies (reinforcement learning, evolutionary, one-shot), and how Python frameworks like NNI and Optuna enable automated architecture design.

The Three Components of NAS

Every NAS system has three parts:

1. Search Space

The set of all possible architectures the algorithm can explore. A search space defines:

Available operations — convolution (3×3, 5×5), pooling, skip connections, attention layers
How operations connect — sequential, branching, parallel paths
Constraints — max layers, parameter budget, latency target

A larger search space has more potential but is harder to explore. Most practical NAS systems use a cell-based search space — find a good small building block (cell), then stack copies of it to build the full network.

2. Search Strategy

How the algorithm navigates the search space:

Strategy	How It Works	Compute Cost	Quality
Random Search	Try random architectures	Baseline	Surprisingly good
Reinforcement Learning	Train a controller network that proposes architectures	Very high	High
Evolutionary	Mutate and recombine good architectures	High	High
Bayesian Optimization	Build a model of architecture → performance mapping	Moderate	Good
One-Shot / Supernet	Train one giant network containing all candidates	Low	Good
Differentiable (DARTS)	Make architecture choices continuous and differentiable	Low	Good

Early NAS (2017) used reinforcement learning and required 450 GPU-days to find a single architecture. Modern one-shot and differentiable methods achieve comparable results in 1-4 GPU-days.

3. Performance Estimation

Evaluating each candidate architecture is the bottleneck. Full training takes hours. Estimation shortcuts:

Early stopping — train for a few epochs, extrapolate final performance
Weight sharing — all candidates share weights from a single trained supernet
Predictor models — train a regression model that predicts architecture quality from its structure
Zero-cost proxies — estimate performance without any training (using gradient statistics at initialization)

Cell-Based Search

Instead of searching for an entire network, NAS finds a repeating cell — a small computational graph with ~7-10 operations:

Input → [Choose operations and connections] → Output

The full network stacks these cells. Two cell types are typical:

Normal cell — maintains spatial dimensions
Reduction cell — reduces spatial dimensions (like a pooling layer)

This approach dramatically shrinks the search space. Instead of searching over billions of possible full networks, you search over thousands of possible cell designs.

Key NAS-Designed Architectures

Architecture	Found By	Method	Impact
NASNet	Google Brain	RL + cell-based	First competitive NAS result
EfficientNet	Google	RL + compound scaling	State-of-art efficiency
DARTS cells	CMU	Differentiable	Made NAS accessible
MnasNet	Google	RL + latency target	Mobile-optimized
Once-for-All	MIT	Progressive shrinking	One training, many deployments

Hardware-Aware NAS

Modern NAS doesn’t just optimize accuracy — it targets a specific hardware budget:

Objective = Accuracy × (Latency / Target_Latency)^w

Where w is a penalty weight. If an architecture is too slow, its score drops even if it’s accurate. This produces networks that are Pareto-optimal: the best accuracy possible at a given speed.

MnasNet (used in MobileNet V3) was designed this way — the NAS algorithm searched for architectures that run in under 80ms on a Pixel phone.

Common Misconception

“NAS makes human architecture design obsolete.” Not yet. NAS works within human-defined search spaces — the human still decides what operations are available, how cells connect, and what constraints apply. A poorly designed search space produces poor results regardless of the search strategy. NAS is a powerful tool that augments human expertise; it doesn’t replace the intuition needed to define good building blocks.

When to Use NAS

Good fit: You have a clear performance target (accuracy, latency, model size), enough compute for the search (at least a few GPU-days), and a task where small architectural differences matter (competitive benchmarks, production deployment at scale).

Overkill: Prototyping, tasks where a standard architecture (ResNet, ViT) already works well enough, limited compute budget, or rapidly changing requirements where the search would be outdated before it finishes.

The one thing to remember: NAS automates architecture design by defining a search space of possible building blocks, using strategies like evolutionary search or differentiable optimization to explore candidates, and estimating performance efficiently — with hardware-aware methods producing networks specifically optimized for your target device’s speed and memory constraints.

pythonmachine-learningmodel-optimization