TensorFlow TensorBoard — Core Concepts

Navigate TensorBoard's dashboards — scalars, histograms, images, graphs, and profiler — to diagnose training problems and compare experiments.

What TensorBoard Does

TensorBoard is a visualization toolkit for machine learning experiments. It reads log files generated during training and renders interactive dashboards in your web browser. You use it to:

Track metrics (loss, accuracy) across training epochs
Compare multiple training runs side by side
Inspect model architecture
Profile performance bottlenecks
Visualize data samples and model predictions

Getting Started

With Keras, the simplest way to write logs is the built-in TensorBoard callback:

import tensorflow as tf

# model, train_data, and val_data are assumed to be defined elsewhere.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir="./logs/experiment_01"
)

model.fit(
    train_data,
    epochs=50,
    validation_data=val_data,
    callbacks=[tensorboard_callback]
)

Then launch the dashboard:

tensorboard --logdir=./logs
# Open http://localhost:6006 in your browser

The Core Dashboards

Scalars — Metric Curves

The most-used dashboard. It plots numeric metrics over training steps or epochs:

Training loss: Should decrease steadily. If it oscillates wildly, the learning rate is often too high (a very small batch size can cause similar noise).
Validation loss: Should track training loss. If it starts increasing while training loss keeps decreasing, the model is overfitting.
Accuracy: Shows learning progress in human-readable terms.
Learning rate: Track scheduled changes to verify they happen as expected.

The scalars dashboard is where you diagnose the three most common training problems: learning rate too high (unstable loss), overfitting (validation divergence), and underfitting (both losses stay high).
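
The three patterns above can also be checked roughly in code once you have per-epoch loss values. The sketch below is an illustration only: the function name diagnose, the window size, and the high_loss threshold are assumptions made for this example, not part of TensorBoard.

```python
from statistics import mean


def diagnose(train_loss, val_loss, window=3, high_loss=1.0):
    """Return a rough label for a pair of per-epoch loss curves.

    `window` and `high_loss` are illustrative thresholds, not TensorBoard settings.
    """
    if len(train_loss) != len(val_loss) or len(train_loss) < 2 * window:
        raise ValueError("need two equal-length curves of at least 2 * window epochs")

    # Compare the average of the last `window` epochs with the window before it.
    train_prev = mean(train_loss[-2 * window:-window])
    train_last = mean(train_loss[-window:])
    val_prev = mean(val_loss[-2 * window:-window])
    val_last = mean(val_loss[-window:])

    # Count direction reversals in the training curve as a proxy for oscillation.
    deltas = [b - a for a, b in zip(train_loss, train_loss[1:])]
    flips = sum(1 for a, b in zip(deltas, deltas[1:]) if a * b < 0)

    if flips > len(deltas) // 2:
        return "unstable: try a lower learning rate"
    if train_last < train_prev and val_last > val_prev:
        return "overfitting: validation loss is diverging"
    if train_last > high_loss and val_last > high_loss:
        return "underfitting: both losses remain high"
    return "no obvious problem"
```

A heuristic like this is no substitute for looking at the curves, but it makes explicit what each visual pattern means: many direction reversals, diverging trends, or two curves that both stay high.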

Histograms — Weight Distributions

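Shows how weight distributions change across training; gradients appear only if you log them yourself. With the Keras callback, set histogram_freq to a nonzero value, because the default of 0 writes no histograms. Key signals: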

Pattern	Meaning
Weights collapsing to zero	Dying neurons, possible vanishing gradients
Gradients exploding	Unstable training, need gradient clipping
Bimodal weight distribution	Layer may have learned a binary decision
Static distributions	Layer is not learning, may be frozen unintentionally
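
Gradient histograms have to be written by hand. The following is a minimal sketch, assuming a tiny throwaway model, random data, and a hypothetical log directory; it computes one set of gradients with tf.GradientTape and logs each as a histogram.

```python
import tensorflow as tf

# Tiny stand-in model and random data, used only to produce some gradients.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

# Hypothetical run directory for this demo.
writer = tf.summary.create_file_writer("./logs/gradient_demo")

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
grads = tape.gradient(loss, model.trainable_variables)

with writer.as_default():
    # One histogram per trainable variable (kernel and bias here).
    for index, grad in enumerate(grads):
        tf.summary.histogram(f"gradients/var_{index}", grad, step=0)
writer.flush()
```

In a real training loop you would pass the current step instead of 0, so the Histograms dashboard can show how the distributions drift over time.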

Images — Visual Samples

Log sample inputs, model predictions, or intermediate feature maps:

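import tensorflow as tf

# Log a batch of images; call this once per epoch (for example from a custom callback).
# sample_images: tensor of shape [batch, height, width, channels]; epoch: current epoch index.
file_writer = tf.summary.create_file_writer("./logs/images")
with file_writer.as_default():
    tf.summary.image("Training samples", sample_images, step=epoch)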

Useful for computer vision tasks to verify that augmentations look correct and predictions make visual sense.
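
One way to run the image logging once per epoch is a small Keras callback. This is a sketch under stated assumptions: ImageLogger is a hypothetical class written for this example, and random noise stands in for real training images.

```python
import numpy as np
import tensorflow as tf


class ImageLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback that logs a fixed batch of images each epoch."""

    def __init__(self, log_dir, images, max_outputs=4):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)
        # Float values in [0, 1], shape [batch, height, width, channels].
        self.images = images
        self.max_outputs = max_outputs

    def on_epoch_end(self, epoch, logs=None):
        with self.writer.as_default():
            tf.summary.image(
                "Training samples",
                self.images,
                step=epoch,
                max_outputs=self.max_outputs,
            )
        self.writer.flush()


# Random noise stands in for real training images.
sample_images = np.random.rand(8, 28, 28, 1).astype("float32")
image_logger = ImageLogger("./logs/images", sample_images)
```

Passing image_logger in the callbacks list of model.fit, next to the TensorBoard callback, would add one image summary per epoch to the Images dashboard.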

Graphs — Model Architecture

Visualizes the computation graph — layers, connections, and data flow. Helps verify that your model architecture matches your intention, especially for complex Functional API or subclassed models.
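
The Keras callback writes the model graph by default (write_graph=True). For code outside Keras, a graph can be exported by tracing a tf.function, as in this minimal sketch; the function forward and the log directory are made up for the example.

```python
import tensorflow as tf


@tf.function
def forward(x, w):
    # A toy computation standing in for a real model's forward pass.
    return tf.nn.relu(tf.matmul(x, w))


writer = tf.summary.create_file_writer("./logs/graph_demo")

# Start recording before the function is first called, so its trace is captured.
tf.summary.trace_on(graph=True)
result = forward(tf.ones((2, 3)), tf.ones((3, 4)))

with writer.as_default():
    tf.summary.trace_export(name="forward_trace", step=0)
writer.flush()
```

After this runs, the Graphs dashboard should list the exported trace for the graph_demo run.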

Profiler — Performance Analysis

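The Profile dashboard comes from a separate plugin (the tensorboard-plugin-profile package), so install that alongside TensorBoard. It shows: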

GPU utilization percentage
Time breakdown by operation type
Data pipeline bottlenecks (input vs. compute bound)
Memory allocation timeline

This dashboard answers the critical question: “Is my GPU actually busy, or is it waiting for data?”
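
To make the input-bound versus compute-bound distinction concrete, here is a rough wall-clock check in plain Python. It is not the profiler and not a TensorBoard API: input_wait_fraction is a hypothetical helper, and because GPU work often runs asynchronously, the numbers are only a first hint.

```python
import time


def input_wait_fraction(batches, train_step):
    """Return the share of wall-clock time spent fetching batches.

    `batches` is any iterable and `train_step` any callable; both are
    placeholders for a real input pipeline and training step.
    """
    wait = 0.0
    compute = 0.0
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)  # time spent waiting for data
        except StopIteration:
            break
        fetched = time.perf_counter()
        train_step(batch)  # time spent computing
        wait += fetched - start
        compute += time.perf_counter() - fetched
    total = wait + compute
    return wait / total if total else 0.0
```

A fraction close to 1 suggests the loop is mostly waiting for data, which is the situation the profiler's input pipeline view is designed to confirm in detail.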

Comparing Experiments

TensorBoard’s killer feature is experiment comparison. Each run logged to a different subdirectory appears as a separate curve:

logs/
├── run_lr_001/     # Learning rate 0.01
├── run_lr_0001/    # Learning rate 0.001
└── run_lr_00001/   # Learning rate 0.0001

All three appear in the same scalars chart with different colors. You can instantly see which learning rate converges fastest, which overshoots, and which is too slow.
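
A layout like this is easiest to keep consistent when the directory name is derived from the hyperparameter. The helper below is a small sketch; run_dir and its naming scheme are assumptions for illustration and differ slightly from the directory names shown above.

```python
from pathlib import Path


def run_dir(base, learning_rate):
    """Build a per-run log directory name from the learning rate."""
    return Path(base) / f"run_lr_{learning_rate:g}"


# One subdirectory per learning rate, all under the same parent.
for lr in (0.01, 0.001, 0.0001):
    print(run_dir("logs", lr))
```

Each path would be passed as log_dir (converted with str) to that run's TensorBoard callback, and launching tensorboard with --logdir=logs then shows all runs together.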

Custom Metrics and Logging

Beyond the automatic callback, you can log anything:

import tensorflow as tf

# grad_norm, activations, config_string, and step come from your own training loop.
file_writer = tf.summary.create_file_writer("./logs/custom")

with file_writer.as_default():
    tf.summary.scalar("custom/gradient_norm", grad_norm, step=step)
    tf.summary.histogram("custom/activations", activations, step=step)
    tf.summary.text("custom/config", config_string, step=0)

Log hyperparameters, evaluation results on specific test subsets, or any metric that helps you understand your experiment.
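
For hyperparameters specifically, TensorBoard ships an HParams plugin with its own logging call. The sketch below assumes a hypothetical run directory, made-up hyperparameter values, and a placeholder accuracy; only the hp.hparams and tf.summary calls are real API.

```python
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# Made-up hyperparameter values for a single trial.
hparams = {"learning_rate": 0.001, "batch_size": 32, "optimizer": "adam"}

writer = tf.summary.create_file_writer("./logs/hparam_demo/run_0")
with writer.as_default():
    # Record the hyperparameters used for this run.
    hp.hparams(hparams)
    # Placeholder metric; a real run would log the measured value.
    tf.summary.scalar("eval/accuracy", 0.5, step=1)
writer.flush()
```

With several runs logged this way, the HParams dashboard can relate each run's hyperparameters to its logged metrics.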

Common Misconception

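“TensorBoard is only for TensorFlow.” Although it was designed for TensorFlow, TensorBoard also works with PyTorch (via torch.utils.tensorboard), JAX (for example through tensorboardX), and any framework that writes TensorBoard-compatible event files. The format is not a formal standard, but TensorBoard is open source and its event files are protocol-buffer records that other libraries can write without TensorFlow. Many teams use TensorBoard regardless of their training framework.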
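
As an illustration of the PyTorch route, the sketch below writes a few scalar values with torch.utils.tensorboard. The run directory and the loss values are placeholders invented for the example, and it assumes both the torch and tensorboard packages are installed.

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical run directory; any path under the TensorBoard --logdir works.
writer = SummaryWriter(log_dir="./logs/pytorch_demo")

for step in range(5):
    placeholder_loss = 1.0 / (step + 1)  # stand-in for a real training loss
    writer.add_scalar("train/loss", placeholder_loss, global_step=step)

writer.close()
```

The resulting run appears in the Scalars dashboard next to any TensorFlow runs stored under the same parent directory.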

Integration with Experiment Tracking

TensorBoard can be used standalone or alongside experiment tracking tools:

TensorBoard.dev: Formerly a free hosted service for sharing dashboards publicly; it was shut down at the start of 2024, so it is no longer an option for sharing logs
Weights & Biases: More features (artifact tracking, sweeps), but typically logs to their cloud
MLflow: Open-source alternative with broader experiment management

For quick local experiments, plain TensorBoard is fastest. For team collaboration, pairing it with a tracking tool adds reproducibility and sharing capabilities.

The one thing to remember: TensorBoard turns opaque training runs into visual dashboards — use scalars to catch overfitting, the profiler to find bottlenecks, and experiment comparison to make better hyperparameter decisions.
