Adaptive Learning Systems in Python — Deep Dive

Build adaptive learning engines with BKT, deep knowledge tracing, IRT, and bandit-based content selection in Python with full implementations.

Building an adaptive learning system requires combining student modeling, content selection, and a feedback loop that improves over time. This guide implements each component in Python and addresses the engineering challenges of production deployment.

Bayesian Knowledge Tracing

BKT models each skill as a two-state hidden Markov model. The hidden state is binary (mastered or not), and the observation is the student’s response (correct or incorrect).

from dataclasses import dataclass

@dataclass
class BKTParams:
    p_init: float = 0.3    # Prior probability of mastery
    p_learn: float = 0.1   # Probability of learning per opportunity
    p_guess: float = 0.2   # Probability of correct answer when unmastered
    p_slip: float = 0.1    # Probability of incorrect answer when mastered

def bkt_update(p_mastered: float, correct: bool, params: BKTParams) -> float:
    """Update mastery probability after observing a response."""
    if correct:
        # P(mastered | correct) via Bayes' theorem
        p_correct_given_mastered = 1 - params.p_slip
        p_correct_given_unmastered = params.p_guess
        p_correct = (p_mastered * p_correct_given_mastered +
                    (1 - p_mastered) * p_correct_given_unmastered)
        p_mastered_posterior = (p_mastered * p_correct_given_mastered) / p_correct
    else:
        p_incorrect_given_mastered = params.p_slip
        p_incorrect_given_unmastered = 1 - params.p_guess
        p_incorrect = (p_mastered * p_incorrect_given_mastered +
                      (1 - p_mastered) * p_incorrect_given_unmastered)
        p_mastered_posterior = (p_mastered * p_incorrect_given_mastered) / p_incorrect

    # Account for learning transition
    p_mastered_new = p_mastered_posterior + (1 - p_mastered_posterior) * params.p_learn
    return p_mastered_new
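
To make the update concrete, the sketch below traces the mastery estimate over a short response sequence. It restates the same update rule in a standalone function using the default parameter values shown above; the name trace_mastery and the example responses are illustrative assumptions rather than part of the implementation.

```python
def trace_mastery(responses, p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """Return the mastery estimate after each response (same rule as bkt_update)."""
    p = p_init
    trajectory = []
    for correct in responses:
        if correct:
            numerator = p * (1 - p_slip)
            denominator = numerator + (1 - p) * p_guess
        else:
            numerator = p * p_slip
            denominator = numerator + (1 - p) * (1 - p_guess)
        posterior = numerator / denominator
        # Learning transition: an unmastered student may learn on this opportunity
        p = posterior + (1 - posterior) * p_learn
        trajectory.append(p)
    return trajectory

# Hypothetical sequence: two correct answers, one mistake, one correct answer
trajectory = trace_mastery([True, True, False, True])
print([round(p, 3) for p in trajectory])
```

With these defaults a single correct answer lifts the estimate from 0.30 to roughly 0.69. The later mistake pulls the estimate back down, but not to zero, because p_slip allows for errors by students who have already mastered the skill.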

To fit BKT parameters from data, use expectation-maximization. The E-step computes expected mastery states given current parameters, and the M-step updates parameters to maximize the likelihood of observed responses.

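import numpy as np

def fit_bkt_em(sequences: list[list[bool]], max_iter: int = 100, tol: float = 1e-4):
    """Fit BKT parameters using EM (forward-backward) on student response sequences."""
    params = BKTParams()
    prev_ll = -np.inf

    for iteration in range(max_iter):
        trans = np.array([[1 - params.p_learn, params.p_learn], [0.0, 1.0]])
        p_correct = np.array([params.p_guess, 1 - params.p_slip])  # [unmastered, mastered]
        init_mastered = learn_num = learn_den = 0.0
        correct_counts, state_counts = np.zeros(2), np.zeros(2)
        n_seqs, log_likelihood = 0, 0.0

        for seq in sequences:
            if not seq:
                continue
            obs = np.asarray(seq, dtype=bool)
            n = len(obs)
            emit = np.where(obs[:, None], p_correct, 1 - p_correct)  # shape (n, 2)

            # E-step, forward pass: scaled P(state_t | responses up to t)
            alpha, scale = np.zeros((n, 2)), np.zeros(n)
            for t in range(n):
                prior = (alpha[t - 1] @ trans if t else
                         np.array([1 - params.p_init, params.p_init]))
                joint = prior * emit[t]
                scale[t] = joint.sum()  # P(response_t | earlier responses)
                alpha[t] = joint / scale[t]
            log_likelihood += np.log(scale).sum()

            # E-step, backward pass, then smoothed state posteriors
            beta = np.ones((n, 2))
            for t in range(n - 2, -1, -1):
                beta[t] = trans @ (emit[t + 1] * beta[t + 1]) / scale[t + 1]
            gamma = alpha * beta  # P(state_t | all responses)

            # Expected counts for the M-step
            init_mastered += gamma[0, 1]
            learn_num += np.sum(alpha[:-1, 0] * params.p_learn
                                * emit[1:, 1] * beta[1:, 1] / scale[1:])
            learn_den += gamma[:-1, 0].sum()
            correct_counts += gamma[obs].sum(axis=0)
            state_counts += gamma.sum(axis=0)
            n_seqs += 1

        if n_seqs == 0:
            break  # No data to fit

        # M-step: re-estimate every parameter from the expected counts
        eps = 1e-10
        params.p_init = float(np.clip(init_mastered / n_seqs, 0.01, 0.99))
        params.p_learn = float(np.clip(learn_num / max(learn_den, eps), 0.01, 0.99))
        params.p_guess = float(np.clip(correct_counts[0] / max(state_counts[0], eps), 0.01, 0.99))
        params.p_slip = float(np.clip(1 - correct_counts[1] / max(state_counts[1], eps), 0.01, 0.99))

        # Converged once the log-likelihood stops changing between iterations
        if abs(log_likelihood - prev_ll) < tol:
            break
        prev_ll = log_likelihood

    return params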

Deep Knowledge Tracing with PyTorch

DKT uses a recurrent neural network to predict future performance from the full interaction history:

import torch
import torch.nn as nn

class DKTModel(nn.Module):
    def __init__(self, num_skills: int, hidden_size: int = 128):
        super().__init__()
        # Input: one-hot encoding of (skill, correctness) pairs
        self.input_size = num_skills * 2
        self.lstm = nn.LSTM(self.input_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, num_skills)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        x: (batch, seq_len, num_skills * 2)
        Returns: (batch, seq_len, num_skills) — predicted P(correct) per skill
        """
        lstm_out, _ = self.lstm(x)
        logits = self.output(lstm_out)
        return self.sigmoid(logits)

def encode_interaction(skill_id: int, correct: bool, num_skills: int) -> list[float]:
    """One-hot encode a single interaction."""
    vec = [0.0] * (num_skills * 2)
    idx = skill_id + (num_skills if correct else 0)
    vec[idx] = 1.0
    return vec

Training uses binary cross-entropy loss between predicted and actual correctness on the next interaction. The key preprocessing step is grouping interactions by student and sorting by timestamp, then padding sequences to uniform length within each batch.
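
The preprocessing step described above can be sketched with the standard library alone. The field names user_id, timestamp, skill_id, and correct are assumptions about the interaction log, and build_padded_batch is a hypothetical helper; the one-hot layout mirrors encode_interaction.

```python
from collections import defaultdict

def one_hot(skill_id: int, correct: bool, num_skills: int) -> list[float]:
    """Same layout as encode_interaction: incorrect in the first half, correct in the second."""
    vec = [0.0] * (num_skills * 2)
    vec[skill_id + (num_skills if correct else 0)] = 1.0
    return vec

def build_padded_batch(interactions: list[dict], num_skills: int):
    """Group by student, sort by time, and right-pad to a common length.

    Returns (inputs, mask): inputs is [student][step][num_skills * 2],
    mask is [student][step] with 1.0 for real steps and 0.0 for padding.
    """
    by_student = defaultdict(list)
    for row in interactions:
        by_student[row["user_id"]].append(row)

    sequences = []
    for user_id in sorted(by_student):
        rows = sorted(by_student[user_id], key=lambda r: r["timestamp"])
        sequences.append([one_hot(r["skill_id"], r["correct"], num_skills) for r in rows])
    if not sequences:
        return [], []

    width = num_skills * 2
    max_len = max(len(seq) for seq in sequences)
    inputs = [seq + [[0.0] * width for _ in range(max_len - len(seq))] for seq in sequences]
    mask = [[1.0] * len(seq) + [0.0] * (max_len - len(seq)) for seq in sequences]
    return inputs, mask

# Hypothetical log, deliberately out of order
log = [
    {"user_id": "u2", "timestamp": 5, "skill_id": 1, "correct": False},
    {"user_id": "u1", "timestamp": 9, "skill_id": 0, "correct": True},
    {"user_id": "u1", "timestamp": 3, "skill_id": 1, "correct": True},
]
inputs, mask = build_padded_batch(log, num_skills=2)
```

The nested lists can be passed to torch.tensor to obtain the (batch, seq_len, num_skills * 2) input. The mask is there so that padded steps can be excluded from the loss; how it is applied depends on the training loop.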

Item Response Theory Implementation

The 2PL IRT model estimates student ability (θ) and item parameters (difficulty b, discrimination a) jointly:

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function

def irt_2pl_probability(theta: float, a: float, b: float) -> float:
    """P(correct | ability, discrimination, difficulty)."""
    return expit(a * (theta - b))

def estimate_ability(responses: list[dict], items: dict[str, dict]) -> float:
    """
    Maximum likelihood estimation of student ability.
    responses: [{'item_id': str, 'correct': bool}, ...]
    items: {item_id: {'a': float, 'b': float}}
    """
    def neg_log_likelihood(theta):
        ll = 0.0
        for r in responses:
            item = items[r['item_id']]
            p = irt_2pl_probability(theta[0], item['a'], item['b'])
            p = max(1e-10, min(1 - 1e-10, p))
            ll += r['correct'] * np.log(p) + (1 - r['correct']) * np.log(1 - p)
        return -ll

    result = minimize(neg_log_likelihood, [0.0], method='L-BFGS-B',
                     bounds=[(-4, 4)])
    return result.x[0]

For computerized adaptive testing, the next item is selected to maximize Fisher information at the current ability estimate:

def fisher_information(theta: float, a: float, b: float) -> float:
    """Information this item provides about ability at theta."""
    p = irt_2pl_probability(theta, a, b)
    return a**2 * p * (1 - p)

def select_next_item(theta: float, available_items: dict, used_ids: set) -> str | None:
    """Select the most informative unused item, or None if every item has been used."""
    best_id = None
    best_info = -1
    for item_id, params in available_items.items():
        if item_id in used_ids:
            continue
        info = fisher_information(theta, params['a'], params['b'])
        if info > best_info:
            best_info = info
            best_id = item_id
    return best_id
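
As a quick check of the selection rule, the sketch below evaluates Fisher information for a small item bank using only the standard library. Under the 2PL model an item's information peaks where theta equals its difficulty b, at a value of a²/4, so the selected item tracks the ability estimate. The item names and parameters are invented for illustration.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1 - p)

# Hypothetical item bank
item_bank = {
    "easy": {"a": 1.0, "b": -1.5},
    "medium": {"a": 1.2, "b": 0.0},
    "hard": {"a": 1.5, "b": 1.5},
}

def most_informative(theta: float, bank: dict) -> str:
    return max(bank, key=lambda i: item_information(theta, bank[i]["a"], bank[i]["b"]))

for theta in (-2.0, 0.0, 1.5):
    print(theta, most_informative(theta, item_bank))
```

Because information scales with a², a highly discriminating item can outrank a better-targeted one, which is why production CAT systems often add exposure control on top of pure information maximization.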

Bandit-Based Content Selection

When learning outcomes are uncertain, Thompson sampling explores different content items efficiently:

import numpy as np

class ThompsonContentSelector:
    """Select learning content using Thompson sampling."""

    def __init__(self, item_ids: list[str]):
        # Beta distribution parameters per item
        self.alpha = {item: 1.0 for item in item_ids}  # successes + 1
        self.beta = {item: 1.0 for item in item_ids}   # failures + 1

    def select(self, eligible_items: list[str]) -> str:
        """Draw one sample per item from its posterior and pick the highest draw."""
        samples = {
            item: np.random.beta(self.alpha[item], self.beta[item])
            for item in eligible_items
        }
        return max(samples, key=samples.get)

    def update(self, item_id: str, learning_gain: bool):
        """Update posterior after observing learning outcome."""
        if learning_gain:
            self.alpha[item_id] += 1
        else:
            self.beta[item_id] += 1

The “reward” can be defined as: did the student demonstrate mastery of the target skill within N interactions after this content? This turns content selection into an online optimization problem that adapts to student populations over time.
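
One way to turn that definition into a binary reward is sketched below. It assumes the mastery estimates for the target skill are logged after each interaction that follows the content, uses a window of N = 5 and a threshold of 0.9 as placeholder values, and keeps the Beta counts in a plain dict; the item names are hypothetical.

```python
def mastered_within(mastery_after: list[float], window: int = 5,
                    threshold: float = 0.9) -> bool:
    """True if the mastery estimate reaches the threshold within `window` interactions."""
    return any(p >= threshold for p in mastery_after[:window])

# Beta(1, 1) priors stored as [alpha, beta] per item
posterior = {"video_intro": [1.0, 1.0], "worked_example": [1.0, 1.0]}

def record_outcome(item_id: str, mastery_after: list[float], window: int = 5) -> bool:
    """Convert a logged mastery trajectory into a reward and update the counts."""
    gain = mastered_within(mastery_after, window)
    posterior[item_id][0 if gain else 1] += 1
    return gain

# Mastery reached on the third interaction: counts as a success
record_outcome("worked_example", [0.55, 0.78, 0.93])
# Mastery reached only on the sixth interaction, outside the window: counts as a failure
record_outcome("video_intro", [0.40, 0.45, 0.52, 0.60, 0.66, 0.95])
```

The window length is a real design choice: a short window attributes gains more cleanly to the item shown, while a long window captures delayed effects at the cost of crediting the item for learning caused by later content.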

Putting It Together: Adaptive Engine

class AdaptiveEngine:
    def __init__(self, curriculum_graph: dict, content_catalog: list[dict]):
        self.graph = curriculum_graph  # skill -> [prerequisite skills]
        self.catalog = content_catalog  # [{id, skills_assessed, skills_taught, type}]
        self.student_models = {}  # user_id -> {skill: mastery_probability}

    def get_student_model(self, user_id: str) -> dict:
        if user_id not in self.student_models:
            self.student_models[user_id] = {
                skill: 0.3 for skill in self.graph
            }
        return self.student_models[user_id]

    def get_ready_skills(self, user_id: str) -> list[str]:
        """Skills whose prerequisites are mastered but skill itself is not."""
        model = self.get_student_model(user_id)
        ready = []
        for skill, prereqs in self.graph.items():
            if model[skill] >= 0.9:
                continue  # Already mastered
            if all(model.get(p, 0) >= 0.9 for p in prereqs):
                ready.append(skill)
        return ready

    def recommend(self, user_id: str, n: int = 5) -> list[dict]:
        """Recommend next content items."""
        model = self.get_student_model(user_id)
        ready_skills = self.get_ready_skills(user_id)

        scored_items = []
        for item in self.catalog:
            relevance = sum(
                1 - model.get(s, 0) for s in item['skills_taught']
                if s in ready_skills
            )
            if relevance > 0:
                scored_items.append((relevance, item))

        scored_items.sort(key=lambda x: -x[0])
        return [item for _, item in scored_items[:n]]

    def process_response(self, user_id: str, item_id: str, correct: bool):
        """Update student model after a response."""
        model = self.get_student_model(user_id)
        item = next(i for i in self.catalog if i['id'] == item_id)
        params = BKTParams()

        for skill in item['skills_assessed']:
            current = model.get(skill, 0.3)
            model[skill] = bkt_update(current, correct, params)
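
To illustrate the input shapes the engine expects and how prerequisite gating behaves, here is a standalone sketch with an invented three-skill curriculum. The function ready_skills mirrors the logic of get_ready_skills; the skill names, catalog entry, and mastery values are assumptions made for the example.

```python
MASTERY_THRESHOLD = 0.9  # same cutoff the engine uses

def ready_skills(graph: dict[str, list[str]], mastery: dict[str, float]) -> list[str]:
    """Skills that are not yet mastered but whose prerequisites all are."""
    ready = []
    for skill, prereqs in graph.items():
        if mastery.get(skill, 0.0) >= MASTERY_THRESHOLD:
            continue
        if all(mastery.get(p, 0.0) >= MASTERY_THRESHOLD for p in prereqs):
            ready.append(skill)
    return ready

# Hypothetical curriculum: skill -> prerequisite skills
graph = {"counting": [], "addition": ["counting"], "multiplication": ["addition"]}

# Hypothetical catalog entry in the shape the engine reads
catalog = [
    {"id": "add-01", "skills_assessed": ["addition"],
     "skills_taught": ["addition"], "type": "exercise"},
]

mastery = {"counting": 0.95, "addition": 0.4, "multiplication": 0.3}
print(ready_skills(graph, mastery))  # ['addition']

mastery["addition"] = 0.92
print(ready_skills(graph, mastery))  # ['multiplication']
```

Multiplication stays locked until addition crosses the threshold, so a single hard cutoff controls pacing for the whole graph.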

Production Considerations

Latency: Content selection must complete in under 100ms for interactive use. BKT and IRT updates are sub-millisecond. DKT inference takes 5-20ms on CPU for typical sequence lengths. Pre-compute candidate sets and cache student models in Redis.

A/B testing: Always run new algorithms against an existing baseline. The primary metric is learning efficiency: how many interactions does a student need to reach mastery? Secondary metrics include engagement (session length, return rate) and student satisfaction.
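
The learning-efficiency metric can be computed from logged mastery trajectories, as in the sketch below. It assumes one trajectory per student per experiment arm and a mastery threshold of 0.9; the function names and sample data are hypothetical.

```python
from statistics import median

def interactions_to_mastery(mastery_trace: list[float], threshold: float = 0.9):
    """1-based index of the first interaction at or above the threshold, else None."""
    for i, p in enumerate(mastery_trace, start=1):
        if p >= threshold:
            return i
    return None

def summarize_arm(traces: list[list[float]], threshold: float = 0.9) -> dict:
    counts = [interactions_to_mastery(t, threshold) for t in traces]
    reached = [c for c in counts if c is not None]
    return {
        "students": len(traces),
        "mastery_rate": len(reached) / len(traces) if traces else 0.0,
        "median_interactions": median(reached) if reached else None,
    }

# Invented trajectories for two arms
baseline = [[0.3, 0.5, 0.7, 0.91], [0.3, 0.4, 0.5, 0.6], [0.3, 0.6, 0.92]]
treatment = [[0.3, 0.7, 0.93], [0.3, 0.8, 0.95], [0.3, 0.5, 0.6, 0.9]]

baseline_summary = summarize_arm(baseline)
treatment_summary = summarize_arm(treatment)
print(baseline_summary)
print(treatment_summary)
```

Students who never reach mastery are excluded from the median, so the mastery rate is reported alongside it. An arm that loses struggling students early would otherwise look more efficient than it is.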

Cold start mitigation: For new students, use a short diagnostic quiz (5-10 IRT-selected items) to quickly estimate ability. For new content, use expert-provided difficulty estimates until sufficient response data accumulates.
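
A diagnostic quiz of this kind might look like the following sketch. To stay dependency-free it swaps the L-BFGS-B optimizer for a grid search over ability values in [-4, 4], which is a simplification rather than the estimator shown earlier. The item bank, the run_diagnostic helper, and the answer function standing in for a real student are all hypothetical.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta: float, item: dict) -> float:
    p = p_correct(theta, item["a"], item["b"])
    return item["a"] ** 2 * p * (1 - p)

GRID = [x / 10 for x in range(-40, 41)]  # candidate abilities from -4.0 to 4.0

def grid_mle(responses: list[tuple[str, bool]], bank: dict) -> float:
    """Ability on the grid that maximizes the 2PL log-likelihood."""
    def log_lik(theta: float) -> float:
        total = 0.0
        for item_id, correct in responses:
            p = p_correct(theta, bank[item_id]["a"], bank[item_id]["b"])
            total += math.log(p if correct else 1 - p)
        return total
    return max(GRID, key=log_lik)

def run_diagnostic(bank: dict, answer_fn, n_items: int = 5):
    """Ask the most informative unused item, re-estimate ability, repeat."""
    theta, responses, used = 0.0, [], set()
    for _ in range(min(n_items, len(bank))):
        item_id = max((i for i in bank if i not in used),
                      key=lambda i: information(theta, bank[i]))
        used.add(item_id)
        responses.append((item_id, answer_fn(item_id)))
        theta = grid_mle(responses, bank)
    return theta, responses

# Hypothetical bank with difficulties spread across the ability range
bank = {f"q{i}": {"a": 1.0, "b": b}
        for i, b in enumerate([-2.0, -1.0, 0.0, 1.0, 2.0, -0.5, 0.5])}

# Stand-in student who answers correctly whenever difficulty is at most 0
theta_hat, asked = run_diagnostic(bank, lambda item_id: bank[item_id]["b"] <= 0.0)
print(theta_hat, [item_id for item_id, _ in asked])
```

Until the student has produced at least one correct and one incorrect answer, the maximum-likelihood estimate sits at the edge of the grid, which is one reason short diagnostics often use a Bayesian prior on ability instead.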

Fairness: Monitor mastery rates across demographic groups. If the system consistently underestimates ability for certain groups, the item parameters or student model may carry bias from training data. Regular audits and parameter recalibration help.
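
A minimal monitoring check along these lines is sketched below. It assumes each record carries a group label and a final mastery estimate; the field names, the 0.9 threshold, and the sample records are illustrative.

```python
from collections import defaultdict

def mastery_rate_by_group(records: list[dict], threshold: float = 0.9) -> dict[str, float]:
    """Share of students per group whose mastery estimate reaches the threshold."""
    totals = defaultdict(int)
    mastered = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        if r["mastery"] >= threshold:
            mastered[r["group"]] += 1
    return {group: mastered[group] / totals[group] for group in totals}

def largest_gap(rates: dict[str, float]) -> float:
    """Difference between the highest and lowest group mastery rates."""
    return max(rates.values()) - min(rates.values())

# Invented records for two groups
records = [
    {"group": "A", "mastery": 0.95}, {"group": "A", "mastery": 0.92},
    {"group": "A", "mastery": 0.60}, {"group": "A", "mastery": 0.91},
    {"group": "B", "mastery": 0.93}, {"group": "B", "mastery": 0.50},
    {"group": "B", "mastery": 0.40}, {"group": "B", "mastery": 0.70},
]
rates = mastery_rate_by_group(records)
print(rates, largest_gap(rates))
```

A gap on its own does not show that the model is biased, since groups may differ in prior exposure or time on task. It is a signal for deciding where to audit item parameters first.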

The one thing to remember: An adaptive learning system is fundamentally two models working in concert — one that estimates what the student knows, and one that selects what they should encounter next — connected by a feedback loop that gets smarter with every interaction.
