Neural Networks — Core Concepts

How layers of math can teach a computer to see, speak, and reason — the architecture behind GPT, image recognition, and AlphaGo explained without the PhD.

What Is a Neural Network?

A neural network is a type of machine learning model loosely inspired by how biological brains process information. It consists of many simple computing units — called neurons or nodes — organized into layers. Data flows through these layers, getting transformed at each step, until the network produces an output: a classification, a prediction, a generated image, or a piece of text.

The “learning” part happens during training: the network adjusts its internal settings (called weights) by repeatedly comparing its guesses to correct answers, shrinking the error over time.

Architecture: Layers, Neurons, and Weights

A typical neural network has three kinds of layers:

Layer	Role
Input layer	Receives raw data (pixel values, words, sensor readings)
Hidden layers	Transform data through learned patterns
Output layer	Produces the final answer (probability, class, value)

Each neuron takes in several numbers, multiplies each by a weight, adds the results together (usually along with a constant called a bias), and passes the sum through an activation function: a mathematical curve that decides whether the neuron “fires” and how strongly. Common activation functions include ReLU (which zeroes out negative values and passes positive ones through unchanged) and sigmoid (which squashes values into the range between 0 and 1).
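To make the single-neuron arithmetic concrete, here is a minimal sketch in plain Python. The input values, weights, and the bias term are made up for illustration, and the function names are hypothetical rather than taken from any library.

```python
import math

def relu(x):
    # ReLU: negative inputs become 0, positive inputs pass through unchanged.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus a bias, passed through the activation.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total)

inputs = [0.5, -1.0, 2.0]    # illustrative input values
weights = [0.5, 1.0, -1.0]   # illustrative weights, not learned ones
bias = 0.5                   # assumed bias term

# The weighted sum here is 0.25 - 1.0 - 2.0 + 0.5 = -2.25.
relu_out = neuron(inputs, weights, bias, relu)        # negative sum, so ReLU gives 0.0
sigmoid_out = neuron(inputs, weights, bias, sigmoid)  # a small value between 0 and 0.5
```

The same weighted sum gives different outputs depending on the activation: ReLU silences the neuron entirely, while sigmoid reports a weak but nonzero response.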

The weights are the network’s “knowledge.” Untrained, they’re random noise. After training, they encode everything the network has learned about the problem.
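As a rough illustration of layers and untrained weights, the sketch below builds a tiny network with 3 inputs, 4 hidden neurons, and 2 outputs, fills it with random numbers, and runs one input through it. The layer sizes, the random range, and all helper names are assumptions chosen for the example.

```python
import random

def relu(x):
    return max(0.0, x)

def make_layer(n_inputs, n_neurons, rng):
    # One list of weights per neuron plus one bias per neuron, all random at first.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_neurons)]
    return weights, biases

def forward_layer(inputs, layer, activation):
    # Every neuron in the layer sees the same inputs but applies its own weights.
    weights, biases = layer
    return [activation(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

def count_parameters(layer):
    weights, biases = layer
    return sum(len(ws) for ws in weights) + len(biases)

rng = random.Random(0)               # fixed seed so the run is repeatable
hidden = make_layer(3, 4, rng)       # input layer (3 values) -> hidden layer (4 neurons)
output = make_layer(4, 2, rng)       # hidden layer -> output layer (2 neurons)

x = [0.2, 0.7, 0.1]                           # raw input values
h = forward_layer(x, hidden, relu)            # hidden activations
y = forward_layer(h, output, lambda v: v)     # raw outputs, no activation applied

total_parameters = count_parameters(hidden) + count_parameters(output)  # 16 + 10 = 26
```

With random weights the outputs in `y` carry no meaning yet. Training changes only the numbers stored in `hidden` and `output`, not the structure of the code.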

How Training Works

Training is a feedback loop in four steps:

Forward pass — Data flows through the network; the output is a prediction.
Loss calculation — A loss function measures how wrong the prediction was (e.g., the difference between predicted price and actual price).
Backpropagation — The error signal flows backward through the network. Each weight gets a small adjustment proportional to how much it contributed to the error.
Repeat — This cycle runs for thousands or millions of examples. Each pass, the network edges closer to correct.

The adjustment process is called gradient descent. Backpropagation tells the network, for each weight, which direction would increase the error (the gradient), and the weight is nudged a small step the opposite way, like water finding the downhill path. With modern datasets, this can mean processing billions of examples over days or weeks of computation on specialized hardware such as GPUs.
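The four steps and the gradient descent update can be sketched on the smallest possible model: one weight and one bias fitting a straight line. The toy data (generated from y = 2x + 1), the squared-error loss, the learning rate, and the epoch count are all assumptions picked so the example stays readable.

```python
# Toy data generated from y = 2x + 1 (an assumption for this sketch).
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w, b = 0.0, 0.0          # untrained parameters
learning_rate = 0.01     # step size, chosen by hand

def mean_loss(w, b):
    # Average squared error over the whole toy dataset.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_loss(w, b)

for epoch in range(2000):                 # 4. repeat the cycle many times
    for x, y in data:
        prediction = w * x + b            # 1. forward pass
        error = prediction - y
        loss = error ** 2                 # 2. loss calculation (squared error)
        grad_w = 2.0 * error * x          # 3. backpropagation: d(loss)/dw by the chain rule
        grad_b = 2.0 * error              #    d(loss)/db
        w -= learning_rate * grad_w       # gradient descent: step against the gradient
        b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
```

After training, `w` and `b` end up very close to 2 and 1, the values used to generate the data. A real network runs the same loop, only with far more weights and with the chain rule applied layer by layer.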

Deep Networks: Why Depth Matters

The word “deep” in deep learning just means the network has many hidden layers — sometimes dozens or hundreds. Each layer learns increasingly abstract features.

For an image-recognition network:

Layer 1 might detect edges and basic contrasts
Layer 5 might detect shapes like circles or corners
Layer 15 might recognize specific textures like fur or brick
Final layers combine these into “this is a dog” vs. “this is a cat”

This hierarchical feature learning is why deep networks dramatically outperform shallow ones on tasks like speech recognition, image classification, and language understanding. A two-layer network has to work with raw pixels; a 50-layer network works with fur and ears.

Common Types

Convolutional Neural Networks (CNNs)

Designed for grid-like data (images, video). Instead of connecting every neuron to every value in the layer before it, CNNs use small filters that scan across the image looking for local patterns such as edges, corners, and shapes. Efficient and powerful for anything visual. Behind: face unlock on your phone, medical image analysis, autonomous vehicles.
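The filter-scanning idea can be shown with a hand-written vertical-edge filter sliding over a tiny image. This is a sketch under simplifying assumptions: one channel, stride 1, no padding, and filter values fixed by hand, whereas a real CNN learns them. The function name `convolve2d` is hypothetical.

```python
def convolve2d(image, kernel):
    # Slide the kernel over every position where it fits fully inside the image
    # and record the weighted sum at each position. (Strictly this is
    # cross-correlation, which is what deep learning usually calls convolution.)
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# 4x6 image: dark (0) on the left half, bright (1) on the right half.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Vertical-edge filter: responds where the right side is brighter than the left.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
# feature_map == [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The output is zero over the flat regions and peaks where the window straddles the dark-to-bright boundary. The same nine filter values are reused at every position, which is why this is far cheaper than wiring every pixel to every neuron.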

Recurrent Neural Networks (RNNs) and LSTMs

Handle sequential data — text, audio, time series. Neurons feed their output back as input on the next step, giving the network a short-term memory of what came before. Long Short-Term Memory (LSTM) networks are an improved version that can remember context across longer sequences. Largely replaced by transformers for language tasks but still used in audio and forecasting.
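The feedback loop that gives a recurrent network its memory can be sketched with a single recurrent unit. This is a deliberately reduced example: the hidden state is one number instead of a vector, the weights are illustrative rather than learned, and the gating that LSTMs add is not shown.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    # The new hidden state mixes the current input with the previous hidden state.
    return math.tanh(w_x * x + w_h * h_prev + b)

def run(sequence, w_x=0.8, w_h=0.5, b=0.0):
    # Illustrative weights; a trained network would have learned these.
    h = 0.0                      # the memory starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

states_a = run([1.0, 0.0, 0.0])  # a signal at step 1, then silence
states_b = run([0.0, 0.0, 0.0])  # silence throughout
```

In `states_a` the hidden state stays above zero at steps 2 and 3 even though the input is zero there, so the unit still “remembers” the first input. The value shrinks at each step, which hints at why plain RNNs lose context over long sequences.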

Transformers

The architecture behind GPT, BERT, and most modern AI language tools. Instead of processing sequences step-by-step, transformers read the entire input at once and use an “attention” mechanism to weigh how relevant each part of the input is to every other part. This parallelism made training much faster, enabling the scale of models like GPT-4.
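A stripped-down sketch of the attention calculation follows, using the common scaled dot-product form. Real transformers first pass each token through learned projections to get separate query, key, and value vectors. Here the same toy token vectors stand in for all three, which is an assumption made to keep the example short.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token in the input, all at once.
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is a relevance-weighted blend of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "token" vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` says how much one token attends to every token in the input. No row depends on the result of another, so all of them can be computed in parallel, unlike the step-by-step loop of an RNN.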

Common Misconception: Neural Networks “Think” Like Brains

The biological metaphor is useful for intuition but misleading at the detail level. Biological neurons have complex electrochemical dynamics; artificial neurons are just weighted sums with a simple function on top. As far as neuroscientists can tell, real brains don’t learn by backpropagation. The visual cortex doesn’t work quite like a CNN.

Neural networks are mathematical tools that were inspired by neuroscience — the way a submarine was inspired by fish. Submarines don’t swim; neural networks don’t think.

The Cost of Going Deep

Scale has a price. Training GPT-4 reportedly cost over $100 million in compute. Running large language models requires energy at a scale that has real climate implications. Small edge devices like phones still struggle to run large models in real time.

There’s also the problem of data: a network trained on photos of cats in North American homes may struggle with cats photographed in different lighting, from different angles, or in unfamiliar settings, because those conditions weren’t in the training set.

One Thing to Remember

Neural networks learn by turning raw data into increasingly abstract patterns across many layers — and every layer’s knowledge is locked inside millions of numbers that even their creators can’t fully read.
