Debugging

Dynamic programming models are complex, and most computation happens inside JIT-compiled functions. This page covers practical strategies for diagnosing problems.

Disable JIT for readable tracebacks

By default, pylcm JIT-compiles internal functions for performance. When something goes wrong inside a JIT-compiled function, the traceback is often unhelpful. Disable JIT at model creation time to get standard Python tracebacks:

model = Model(
    regimes={...},
    ages=ages,
    regime_id_class=RegimeId,
    enable_jit=False,  # readable tracebacks, but slower
)

This does not affect correctness --- the same functions run, just without compilation. Re-enable JIT once the issue is resolved.
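Independently of pylcm's enable_jit flag, JAX itself offers a jax.disable_jit() context manager that runs jitted functions eagerly, which can help when debugging your own model functions (the utility function below is a hypothetical example, not part of pylcm):

```python
import jax
import jax.numpy as jnp

@jax.jit
def utility(consumption):
    # Log utility; produces NaN for non-positive consumption.
    return jnp.log(consumption)

# Inside this context, jitted functions run eagerly, so errors surface
# with ordinary Python tracebacks pointing at the offending line.
with jax.disable_jit():
    value = utility(jnp.array(1.0))

print(float(value))  # 0.0
```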

Log levels

The log_level parameter controls both console output and disk persistence:

| Level | Output | Persistence |
| --- | --- | --- |
| "off" | Nothing (good for HPC batch jobs) | No |
| "warning" | NaN/Inf warnings in value functions | No |
| "progress" (default) | Progress and timing per period, total elapsed time | No |
| "debug" | All of the above, plus V_arr statistics per regime and regime transition counts | Yes (requires log_path) |

# Silent — no console output at all
V_arr_dict = model.solve(params, log_level="off")

# Warnings only — alerts on NaN/Inf but no progress output
V_arr_dict = model.solve(params, log_level="warning")

# Progress (default) — timing per period
V_arr_dict = model.solve(params)  # log_level="progress"

# Debug — full diagnostics + snapshot persistence
V_arr_dict = model.solve(params, log_level="debug", log_path="./debug/")

Using log_level="debug" without providing log_path raises a ValueError.
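The validation rule can be illustrated with a small standalone sketch (a hypothetical helper mirroring the documented behavior, not pylcm's actual code):

```python
def validate_logging(log_level, log_path=None):
    # Hypothetical sketch of the documented rule: "debug" requires log_path.
    allowed = {"off", "warning", "progress", "debug"}
    if log_level not in allowed:
        raise ValueError(f"Unknown log_level: {log_level!r}")
    if log_level == "debug" and log_path is None:
        raise ValueError('log_level="debug" requires log_path')

validate_logging("progress")            # fine
validate_logging("debug", "./debug/")   # fine
try:
    validate_logging("debug")           # missing log_path
except ValueError as err:
    print(err)
```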

Debug snapshots

When log_level="debug" and log_path is provided, pylcm saves a snapshot directory containing all inputs and outputs. This lets you reconstruct a failed run on a different machine.

What’s saved

Each snapshot is a directory (e.g. solve_snapshot_001/) containing:

| File | Contents |
| --- | --- |
| arrays.h5 | Value function arrays in HDF5 (datasets at /V_arr/{period}/{regime}) |
| model.pkl | The Model instance (cloudpickle) |
| params.pkl | User parameters (cloudpickle) |
| initial_states.pkl | Initial state arrays (simulate/solve_and_simulate only) |
| initial_regimes.pkl | Initial regime assignments (simulate/solve_and_simulate only) |
| result.pkl | SimulationResult (simulate/solve_and_simulate only) |
| metadata.json | Snapshot type, platform string, field manifest |
| pixi.lock | Lock file from the project root |
| pyproject.toml | Project file from the project root |
| REPRODUCE.md | Step-by-step reconstruction recipe |
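Given the documented layout of arrays.h5 (datasets at /V_arr/{period}/{regime}), the arrays can also be inspected directly with h5py, independent of pylcm. A sketch that writes synthetic data in that layout and reads it back into the nested dict structure:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "arrays.h5")

# Build a tiny file with the documented layout (contents are synthetic).
with h5py.File(path, "w") as f:
    f.create_dataset("/V_arr/0/working", data=np.zeros(3))
    f.create_dataset("/V_arr/0/retired", data=np.ones(3))

# Read it back into the period -> regime_name -> array structure.
V_arr_dict = {}
with h5py.File(path, "r") as f:
    for period in f["V_arr"]:
        V_arr_dict[int(period)] = {
            regime: f[f"V_arr/{period}/{regime}"][...]
            for regime in f[f"V_arr/{period}"]
        }

print(V_arr_dict[0]["retired"])
```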

Creating snapshots

# Solve snapshot
V_arr_dict = model.solve(
    params, log_level="debug", log_path="./debug/"
)
# Creates: ./debug/solve_snapshot_001/

# Simulate snapshot
result = model.simulate(
    params, initial_states, initial_regimes, V_arr_dict,
    log_level="debug", log_path="./debug/",
)
# Creates: ./debug/simulate_snapshot_001/

# Solve-and-simulate snapshot (includes everything)
result = model.solve_and_simulate(
    params, initial_states, initial_regimes,
    log_level="debug", log_path="./debug/",
)
# Creates: ./debug/solve_and_simulate_snapshot_001/

Loading snapshots

from lcm import load_snapshot

# Load the full snapshot
snapshot = load_snapshot("./debug/solve_snapshot_001")
snapshot.model       # the Model instance
snapshot.params      # the user parameters
snapshot.V_arr_dict  # value function arrays (loaded from HDF5)

# Re-run the solve to reproduce the result
V_arr_dict = snapshot.model.solve(snapshot.params)

For large snapshots, skip fields you don’t need:

# Load without the (potentially large) value function arrays
snapshot = load_snapshot("./debug/solve_snapshot_001", exclude=["V_arr_dict"])
snapshot.V_arr_dict  # None
snapshot.model       # still available

Platform mismatch

Each snapshot records the platform it was created on (e.g. x86_64-Linux). When loading on a different platform, a warning is emitted:

WARNING  Snapshot created on x86_64-Linux but loading on arm64-Darwin
         — environment may not match
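The platform string has the form machine-system. A sketch of how such a string can be derived with the standard library (illustrative, not necessarily pylcm's exact code):

```python
import platform

# e.g. "x86_64-Linux" on a Linux workstation or "arm64-Darwin" on Apple Silicon
platform_string = f"{platform.machine()}-{platform.system()}"
print(platform_string)
```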

To reproduce the environment exactly, use the bundled lock file:

cp ./debug/solve_snapshot_001/pixi.lock .
cp ./debug/solve_snapshot_001/pyproject.toml .
pixi install --frozen

Snapshot retention

Snapshots accumulate when running inside an optimization loop. The log_keep_n_latest parameter (default 3) limits how many snapshot directories are kept per type:

V_arr_dict = model.solve(
    params, log_level="debug", log_path="./debug/", log_keep_n_latest=5
)

After each write, the oldest directories beyond the limit are deleted automatically.
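The retention mechanism can be sketched in plain Python (illustrative of the documented behavior, not pylcm's implementation): after each write, sort the snapshot directories of one type and delete all but the newest N.

```python
import shutil
from pathlib import Path

def prune_snapshots(log_path, prefix, keep_n_latest=3):
    # Snapshot directories are zero-padded (..._001, ..._002, ...), so
    # lexicographic order matches creation order.
    dirs = sorted(
        p for p in Path(log_path).iterdir()
        if p.is_dir() and p.name.startswith(prefix)
    )
    excess = dirs[:-keep_n_latest] if keep_n_latest > 0 else dirs
    for old in excess:
        shutil.rmtree(old)

# Usage sketch:
# prune_snapshots("./debug", "solve_snapshot_", keep_n_latest=5)
```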

Recipe: Debugging NaN in parameter estimation with optimagic

A common scenario: you are estimating model parameters with optimagic, and at some iteration the criterion function returns NaN. Here is how to diagnose the problem.

1. Enable optimagic logging

import optimagic as om

result = om.minimize(
    fun=criterion,
    params=start_params,
    algorithm="scipy_lbfgsb",
    logging="my_log.db",
)

2. Find the problematic parameters

import numpy as np

reader = om.SQLiteLogReader("my_log.db")
history = reader.read_history()

# history["fun"] contains criterion values, history["params"] the parameter vectors
fun_values = np.asarray(history["fun"], dtype=float)
nan_mask = np.isnan(fun_values)
if nan_mask.any():
    first_nan_idx = int(np.argmax(nan_mask))  # index of the first NaN
    bad_params = history["params"].iloc[first_nan_idx]
    print(f"First NaN at iteration {first_nan_idx}")
    print(f"Parameters: {bad_params}")

3. Re-run with JIT disabled

# Re-create the model without JIT
model = Model(
    regimes={...},
    ages=ages,
    regime_id_class=RegimeId,
    enable_jit=False,
)

# Call solve with the bad parameters --- the traceback will be readable
V_arr_dict = model.solve(bad_params)

The traceback now points to the exact line in your user-defined functions where the NaN originates.
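A complementary JAX tool: with the jax_debug_nans flag enabled, JAX raises FloatingPointError at the first operation that produces a NaN, even inside jitted code. A minimal self-contained sketch (the utility function is a hypothetical stand-in for a user-defined model function):

```python
import jax
import jax.numpy as jnp

# Tell JAX to raise as soon as any computation produces a NaN.
jax.config.update("jax_debug_nans", True)

@jax.jit
def utility(consumption):
    return jnp.log(consumption)  # NaN for negative consumption

nan_detected = False
try:
    utility(jnp.array(-1.0))
except FloatingPointError as err:
    nan_detected = True
    print("NaN detected:", err)

jax.config.update("jax_debug_nans", False)
```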

Inspecting value function arrays

The solution V_arr_dict is a nested mapping: period -> regime_name -> array. You can iterate over it to check shapes, look for NaN/inf, or plot slices:

import jax.numpy as jnp
import plotly.graph_objects as go
from plotly.subplots import make_subplots

V_arr_dict = model.solve(params)

# Check for issues
for period, regimes in V_arr_dict.items():
    for regime_name, V_arr in regimes.items():
        n_nan = int(jnp.sum(jnp.isnan(V_arr)))
        n_inf = int(jnp.sum(jnp.isinf(V_arr)))
        if n_nan > 0 or n_inf > 0:
            print(f"Period {period}, regime '{regime_name}': "
                  f"shape={V_arr.shape}, NaN={n_nan}, Inf={n_inf}")

# Plot a 1D slice (e.g. value over the wealth grid for the first period).
# This assumes wealth is the only continuous state, so V_arr is one-dimensional;
# for higher-dimensional arrays, index out a 1D slice first.
period = 0
regime_name = "working"
V_arr = V_arr_dict[period][regime_name]

fig = go.Figure()
fig.add_trace(go.Scatter(y=V_arr.tolist(), mode="lines", name="V(wealth)"))
fig.update_layout(title=f"Value function, period {period}, regime '{regime_name}'")
fig.show()

Understanding error messages

pylcm raises specific exceptions to help you diagnose problems: