Neural Architecture Search with Python — Core Concepts

The Three Components of NAS

Every NAS system has three parts:

1. Search Space

The set of all possible architectures the algorithm can explore. A search space defines:

  • Available operations — convolution (3×3, 5×5), pooling, skip connections, attention layers
  • How operations connect — sequential, branching, parallel paths
  • Constraints — max layers, parameter budget, latency target

A larger search space has more potential but is harder to explore. Most practical NAS systems use a cell-based search space — find a good small building block (cell), then stack copies of it to build the full network.

2. Search Strategy

How the algorithm navigates the search space:

StrategyHow It WorksCompute CostQuality
Random SearchTry random architecturesBaselineSurprisingly good
Reinforcement LearningTrain a controller network that proposes architecturesVery highHigh
EvolutionaryMutate and recombine good architecturesHighHigh
Bayesian OptimizationBuild a model of architecture → performance mappingModerateGood
One-Shot / SupernetTrain one giant network containing all candidatesLowGood
Differentiable (DARTS)Make architecture choices continuous and differentiableLowGood

Early NAS (2017) used reinforcement learning and required 450 GPU-days to find a single architecture. Modern one-shot and differentiable methods achieve comparable results in 1-4 GPU-days.

3. Performance Estimation

Evaluating each candidate architecture is the bottleneck. Full training takes hours. Estimation shortcuts:

  • Early stopping — train for a few epochs, extrapolate final performance
  • Weight sharing — all candidates share weights from a single trained supernet
  • Predictor models — train a regression model that predicts architecture quality from its structure
  • Zero-cost proxies — estimate performance without any training (using gradient statistics at initialization)

Instead of searching for an entire network, NAS finds a repeating cell — a small computational graph with ~7-10 operations:

Input → [Choose operations and connections] → Output

The full network stacks these cells. Two cell types are typical:

  • Normal cell — maintains spatial dimensions
  • Reduction cell — reduces spatial dimensions (like a pooling layer)

This approach dramatically shrinks the search space. Instead of searching over billions of possible full networks, you search over thousands of possible cell designs.

Key NAS-Designed Architectures

ArchitectureFound ByMethodImpact
NASNetGoogle BrainRL + cell-basedFirst competitive NAS result
EfficientNetGoogleRL + compound scalingState-of-art efficiency
DARTS cellsCMUDifferentiableMade NAS accessible
MnasNetGoogleRL + latency targetMobile-optimized
Once-for-AllMITProgressive shrinkingOne training, many deployments

Hardware-Aware NAS

Modern NAS doesn’t just optimize accuracy — it targets a specific hardware budget:

Objective = Accuracy × (Latency / Target_Latency)^w

Where w is a penalty weight. If an architecture is too slow, its score drops even if it’s accurate. This produces networks that are Pareto-optimal: the best accuracy possible at a given speed.

MnasNet (used in MobileNet V3) was designed this way — the NAS algorithm searched for architectures that run in under 80ms on a Pixel phone.

Common Misconception

“NAS makes human architecture design obsolete.” Not yet. NAS works within human-defined search spaces — the human still decides what operations are available, how cells connect, and what constraints apply. A poorly designed search space produces poor results regardless of the search strategy. NAS is a powerful tool that augments human expertise; it doesn’t replace the intuition needed to define good building blocks.

When to Use NAS

Good fit: You have a clear performance target (accuracy, latency, model size), enough compute for the search (at least a few GPU-days), and a task where small architectural differences matter (competitive benchmarks, production deployment at scale).

Overkill: Prototyping, tasks where a standard architecture (ResNet, ViT) already works well enough, limited compute budget, or rapidly changing requirements where the search would be outdated before it finishes.

The one thing to remember: NAS automates architecture design by defining a search space of possible building blocks, using strategies like evolutionary search or differentiable optimization to explore candidates, and estimating performance efficiently — with hardware-aware methods producing networks specifically optimized for your target device’s speed and memory constraints.

pythonmachine-learningmodel-optimization

See Also