TensorFlow TensorBoard — Core Concepts
What TensorBoard Does
TensorBoard is a visualization toolkit for machine learning experiments. It reads log files generated during training and renders interactive dashboards in your web browser. You use it to:
- Track metrics (loss, accuracy) across training epochs
- Compare multiple training runs side by side
- Inspect model architecture
- Profile performance bottlenecks
- Visualize data samples and model predictions
Getting Started
With Keras, the easiest way to produce TensorBoard logs is the built-in TensorBoard callback:
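```python
import tensorflow as tf

# Writes event files (scalars and the model graph by default) under log_dir
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir="./logs/experiment_01"
)

# model, train_data and val_data are your compiled model and datasets
model.fit(
    train_data,
    epochs=50,
    validation_data=val_data,
    callbacks=[tensorboard_callback],
)
```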
Then launch the dashboard:
```shell
tensorboard --logdir=./logs
# Then open http://localhost:6006 in your browser
```
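For a fully self-contained starting point, here is a minimal sketch that trains a tiny model end to end. The toy dataset, layer sizes and the timestamped `logs/fit/...` directory layout are illustrative assumptions, not requirements. It also sets `histogram_freq=1`, since weight histograms are off by default.

```python
import datetime
import os

import numpy as np
import tensorflow as tf

# Hypothetical toy data: 512 random feature vectors with a simple binary label.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 20)).astype("float32")
y = (x[:, 0] + x[:, 1] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A timestamped run directory keeps every run separate under ./logs
log_dir = os.path.join("logs", "fit", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,  # log weight histograms every epoch (default 0 = off)
)

model.fit(
    x, y,
    epochs=5,
    validation_split=0.2,
    callbacks=[tensorboard_callback],
    verbose=0,
)
print("Logs written to", log_dir)
```

The callback creates `train/` and `validation/` subdirectories inside `log_dir`, so `tensorboard --logdir=./logs` shows both curves for this run.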
The Core Dashboards
Scalars — Metric Curves
The most-used dashboard. It plots numeric metrics over training steps or epochs:
- Training loss — Should generally decrease. If it oscillates wildly, the learning rate may be too high.
- Validation loss — Should track training loss. If it starts increasing while training loss keeps decreasing, the model is likely overfitting.
- Accuracy — Shows learning progress in human-readable terms.
- Learning rate — Track scheduled changes to verify they happen as expected.
The scalars dashboard is where you diagnose the three most common training problems: learning rate too high (unstable loss), overfitting (validation divergence), and underfitting (both losses stay high).
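Training and validation curves land on the same chart because they are written as two sibling runs under one directory, using the same tag. A minimal sketch of that layout from a hand-written loop, assuming a hypothetical toy regression problem (y = 3x + 1 plus noise) fitted with plain gradient descent:

```python
import tensorflow as tf

# Hypothetical toy regression task: y = 3x + 1 plus a little noise.
tf.random.set_seed(0)
x = tf.random.uniform([256, 1])
y = 3.0 * x + 1.0 + tf.random.normal([256, 1], stddev=0.1)
x_train, y_train = x[:200], y[:200]
x_val, y_val = x[200:], y[200:]

w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.5

def mse(xs, ys):
    return tf.reduce_mean(tf.square(w * xs + b - ys))

# Sibling "train" and "validation" runs; writing the same tag to both
# puts the two curves on one chart in the Scalars dashboard.
train_writer = tf.summary.create_file_writer("logs/overlay_demo/train")
val_writer = tf.summary.create_file_writer("logs/overlay_demo/validation")

for epoch in range(50):
    with tf.GradientTape() as tape:
        train_loss = mse(x_train, y_train)
    dw, db = tape.gradient(train_loss, [w, b])
    w.assign_sub(learning_rate * dw)  # plain gradient descent step
    b.assign_sub(learning_rate * db)
    val_loss = mse(x_val, y_val)

    with train_writer.as_default():
        tf.summary.scalar("epoch_loss", train_loss, step=epoch)
    with val_writer.as_default():
        tf.summary.scalar("epoch_loss", val_loss, step=epoch)

train_writer.flush()
val_writer.flush()
```

On this easy problem both curves should fall together; a validation curve that turns upward while the training curve keeps falling is the overfitting signature described above.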
Histograms — Weight Distributions
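Shows how weight distributions (and, if you log them yourself, gradient distributions) change across training. The Keras callback records weight histograms only when histogram_freq is set above 0; the default is 0. Key signals: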
| Pattern | Meaning |
|---|---|
| Weights collapsing to zero | Possible vanishing gradients, dead units, or overly strong weight decay |
| Gradients exploding | Unstable training; consider gradient clipping or a lower learning rate |
| Bimodal weight distribution | Layer may have learned a binary decision |
| Static distributions | Layer is not learning, may be frozen unintentionally |
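In TF 2.x the built-in callback logs weight histograms but not gradients, so gradient histograms need a custom training step. A minimal sketch, assuming a hypothetical two-layer model and random data that exist only to produce weights and gradients:

```python
import tensorflow as tf

# Hypothetical small model and random batch, just to generate weights and gradients.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal([64, 8])
y = tf.random.normal([64, 1])

writer = tf.summary.create_file_writer("logs/histogram_demo")

for step in range(10):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    with writer.as_default():
        for i, (var, grad) in enumerate(zip(model.trainable_variables, grads)):
            # Variable names differ across Keras versions (e.g. "kernel" vs
            # "dense/kernel:0"), so prefix an index to keep tags unique.
            name = f"{i}_{var.name.replace(':', '_')}"
            tf.summary.histogram(f"weights/{name}", var, step=step)
            tf.summary.histogram(f"gradients/{name}", grad, step=step)

writer.flush()
```

Each variable gets a `weights/...` and a `gradients/...` histogram, so the patterns in the table can be checked side by side for the same layer.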
Images — Visual Samples
Log sample inputs, model predictions, or intermediate feature maps:
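```python
import tensorflow as tf

# Log images during training (typically from inside a custom callback)
file_writer = tf.summary.create_file_writer("./logs/images")
with file_writer.as_default():
    # sample_images: 4-D tensor [k, height, width, channels], uint8 or float in [0, 1]
    # epoch: the current epoch index, used as the step (x-axis)
    tf.summary.image("Training samples", sample_images, step=epoch)
```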
Useful for computer vision tasks to verify that augmentations look correct and predictions make visual sense.
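One way to run that snippet once per epoch is a small Keras callback. This is a sketch with a hypothetical `ImageLoggingCallback` class; the random images below stand in for real (for example, augmented) training samples.

```python
import numpy as np
import tensorflow as tf

class ImageLoggingCallback(tf.keras.callbacks.Callback):
    """Hypothetical helper: logs a fixed batch of images at the end of each epoch."""

    def __init__(self, images, log_dir="logs/images", max_outputs=4):
        super().__init__()
        self.images = images  # [k, height, width, channels], float in [0, 1] or uint8
        self.max_outputs = max_outputs
        self.writer = tf.summary.create_file_writer(log_dir)

    def on_epoch_end(self, epoch, logs=None):
        with self.writer.as_default():
            tf.summary.image("Training samples", self.images,
                             step=epoch, max_outputs=self.max_outputs)
        self.writer.flush()

# Random noise stands in for real training images.
sample_images = np.random.rand(4, 28, 28, 1).astype("float32")
image_callback = ImageLoggingCallback(sample_images)

# In training: model.fit(..., callbacks=[tensorboard_callback, image_callback])
image_callback.on_epoch_end(0)  # trigger one write manually for demonstration
```

To log predictions instead of inputs, `on_epoch_end` could call `self.model(...)` on the stored batch and render the outputs as images; how to visualize predictions depends on the task.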
Graphs — Model Architecture
Visualizes the computation graph — layers, connections, and data flow. The Keras callback writes it when write_graph=True, which is the default. Helps verify that your model architecture matches your intention, especially for complex Functional API or subclassed models.
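Outside Keras, the graph of any `tf.function` can be recorded with the tracing API. A minimal sketch, where `dense_layer` is a hypothetical function chosen only to have something small to trace:

```python
import tensorflow as tf

@tf.function
def dense_layer(x, w, b):
    # Tiny hypothetical computation to trace: relu(xW + b)
    return tf.nn.relu(tf.matmul(x, w) + b)

writer = tf.summary.create_file_writer("logs/graph_demo")
x = tf.random.normal([2, 3])
w = tf.random.normal([3, 4])
b = tf.zeros([4])

tf.summary.trace_on(graph=True)  # start recording graph information
y = dense_layer(x, w, b)         # the first call traces the function
with writer.as_default():
    tf.summary.trace_export(name="dense_layer_trace", step=0)
writer.flush()
```

`trace_on` must be called before the function is first traced; otherwise there is no new graph to capture.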
Profiler — Performance Analysis
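The profiler is off by default. With Keras, enable it through the callback's profile_batch argument (for example, profile_batch=(10, 20) profiles batches 10 through 20), and install the tensorboard-plugin-profile package to view the results. It shows:
- GPU utilization percentage
- Time breakdown by operation type
- Data pipeline bottlenecks (whether training is input-bound or compute-bound)
- Memory allocation timeline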
This dashboard answers the critical question: “Is my GPU actually busy, or is it waiting for data?”
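Outside `model.fit`, the profiler can be started and stopped programmatically. A sketch, assuming repeated matrix multiplications as a hypothetical stand-in for real training steps; the `Trace` context marks each step so the profile can be broken down per step.

```python
import tensorflow as tf

log_dir = "logs/profile_demo"

# Hypothetical workload standing in for training steps.
a = tf.random.normal([256, 256])

tf.profiler.experimental.start(log_dir)
for step in range(20):
    # Mark each iteration as a step so the profiler can group work per step
    with tf.profiler.experimental.Trace("train", step_num=step, _r=1):
        result = tf.matmul(a, a)
tf.profiler.experimental.stop()
```

Profile only a short window of steps: profiling adds overhead and produces large trace files.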
Comparing Experiments
TensorBoard’s killer feature is experiment comparison. Each run logged to a different subdirectory appears as a separate curve:
```text
logs/
├── run_lr_001/     # Learning rate 0.01
├── run_lr_0001/    # Learning rate 0.001
└── run_lr_00001/   # Learning rate 0.0001
```
All three appear in the same scalars chart with different colors. You can instantly see which learning rate converges fastest, which overshoots, and which is too slow.
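A sketch of how such a sweep could produce that directory layout, assuming a hypothetical linear-regression dataset and a one-layer model trained with SGD; only the per-run `log_dir` matters for TensorBoard.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy regression data shared by every run.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 10)).astype("float32")
y = (x @ rng.normal(size=(10, 1))).astype("float32")

def build_model(learning_rate):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate), loss="mse")
    return model

for lr in [0.01, 0.001, 0.0001]:
    # 0.01 -> "run_lr_001", 0.001 -> "run_lr_0001", 0.0001 -> "run_lr_00001"
    run_dir = "logs/run_lr_" + str(lr).replace(".", "")
    model = build_model(lr)
    model.fit(
        x, y,
        epochs=5,
        validation_split=0.2,
        verbose=0,
        callbacks=[tf.keras.callbacks.TensorBoard(log_dir=run_dir)],
    )
```

Launching `tensorboard --logdir=logs` then lists each `run_lr_*` directory as its own run.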
Custom Metrics and Logging
Beyond the automatic callback, the tf.summary API lets you log almost anything:
```python
import tensorflow as tf

file_writer = tf.summary.create_file_writer("./logs/custom")
# grad_norm, activations, config_string and step come from your own training loop
with file_writer.as_default():
    tf.summary.scalar("custom/gradient_norm", grad_norm, step=step)
    tf.summary.histogram("custom/activations", activations, step=step)
    tf.summary.text("custom/config", config_string, step=0)
```
Log hyperparameters, evaluation results on specific test subsets, or any metric that helps you understand your experiment.
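A sketch that combines both ideas: it records one trial's hyperparameters for the HParams dashboard and logs a real per-step gradient norm computed with `tf.linalg.global_norm`. The hyperparameter values, model and toy data are hypothetical.

```python
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# Hypothetical hyperparameters for a single trial.
hparams = {"learning_rate": 0.05, "hidden_units": 16}

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(hparams["hidden_units"], activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=hparams["learning_rate"])
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal([128, 4])               # toy inputs
y = tf.reduce_sum(x, axis=1, keepdims=True)  # toy target

writer = tf.summary.create_file_writer("logs/custom/trial_0")
with writer.as_default():
    hp.hparams(hparams)  # records this run's hyperparameters
    for step in range(20):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        tf.summary.scalar("custom/loss", loss, step=step)
        tf.summary.scalar("custom/gradient_norm", tf.linalg.global_norm(grads), step=step)
writer.flush()
```

Writing each trial to its own subdirectory (`trial_0`, `trial_1`, ...) lets the HParams dashboard line trials up against their logged metrics.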
Common Misconception
“TensorBoard is only for TensorFlow.” While designed for TensorFlow, TensorBoard works with PyTorch (via torch.utils.tensorboard), JAX (through third-party writers), and any framework that writes TensorBoard-compatible event files. TensorBoard reads a documented, protocol-buffer-based event-file format that any library can write; PyTorch's SummaryWriter is one such writer. Many teams use TensorBoard regardless of their training framework.
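For illustration, logging from PyTorch needs only `SummaryWriter`. This sketch assumes both `torch` and `tensorboard` are installed; the exponential curve is a placeholder, not a real training loss.

```python
import math

from torch.utils.tensorboard import SummaryWriter

# PyTorch's writer produces event files that `tensorboard --logdir=logs` can read.
writer = SummaryWriter(log_dir="logs/pytorch_demo")
for step in range(100):
    # Placeholder curve standing in for a real training loss
    writer.add_scalar("train/loss", math.exp(-step / 20), global_step=step)
writer.close()
```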
Integration with Experiment Tracking
TensorBoard can be used standalone or alongside experiment tracking tools:
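- TensorBoard.dev — Formerly a free hosted service for public, shareable dashboards; it was shut down at the end of 2023, so sharing now requires one of the tools below or a self-hosted TensorBoard
- Weights & Biases — More features (artifact tracking, sweeps); hosted in their cloud by default, and it can sync existing TensorBoard logs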
- MLflow — Open-source alternative with broader experiment management
For quick local experiments, plain TensorBoard is fastest. For team collaboration, pairing it with a tracking tool adds reproducibility and sharing capabilities.
The one thing to remember: TensorBoard turns opaque training runs into visual dashboards — use scalars to catch overfitting, the profiler to find bottlenecks, and experiment comparison to make better hyperparameter decisions.