Model transparency
Model card (BETA)
What goes into the probability-of-default score, and the honest limits of what it can do.
Intended use
AgentCredit produces a probability-of-default (PD), a 12-grade letter scale (A+ to F) anchored to S&P long-run corporate-default rates, and an AI-generated credit narrative for UK SMEs in the £100k–£10M total-assets band. It is designed for fast, automated credit decisions — credit triage, exposure-limit sizing, portfolio monitoring, and sales-ops qualification at scale.
Recommended human review when (a) the counterparty lands in grade C or weaker, (b) the facility size exceeds £250k, or (c) the counterparty's balance-sheet leverage is high while their profit performance is unknown. In these cases a credit officer should cross-check the counterparty's profitability and cash flow alongside the model output.
Mandatory dual review for any facility above £1M per standard UK SME lending practice. Not intended for regulated consumer credit (Consumer Credit Act 1974) or any decision with direct legal effect on the assessed party without an audit trail.
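The review rules above can be sketched as a small routing function. This is only an illustration: the `Assessment` fields and the `high_leverage` flag are hypothetical names, and the card does not define a numeric cut-off for "high" leverage, so that judgment is passed in as a boolean.

```python
from dataclasses import dataclass

WEAK_BANDS = {"C", "D", "E", "F"}        # "grade C or weaker"
HUMAN_REVIEW_FACILITY_GBP = 250_000      # threshold stated in the card
DUAL_REVIEW_FACILITY_GBP = 1_000_000     # threshold stated in the card


@dataclass
class Assessment:
    grade: str            # letter grade; only the leading letter is inspected
    facility_gbp: float   # requested facility size in GBP
    high_leverage: bool   # hypothetical flag: the card gives no numeric cut-off
    profit_known: bool    # False when no P&L data was supplied


def review_level(a: Assessment) -> str:
    """Return 'dual_review', 'human_review' or 'automated'."""
    if a.facility_gbp > DUAL_REVIEW_FACILITY_GBP:
        return "dual_review"
    band = a.grade[0].upper()
    if (
        band in WEAK_BANDS
        or a.facility_gbp > HUMAN_REVIEW_FACILITY_GBP
        or (a.high_leverage and not a.profit_known)
    ):
        return "human_review"
    return "automated"
```

For instance, `review_level(Assessment("A", 50_000, True, False))` routes to human review because leverage is high while profitability is unknown, even though the grade and facility size alone would not trigger it.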
Training data
| Source | Records | What it contributes |
|---|---|---|
| UK Companies House bulk filings | 2,989,715 UK companies | Balance sheets + limited P&L + insolvency-derived default labels |
| UCI Polish bankruptcy dataset | 10,503 Polish firms | Full P&L + confirmed bankruptcy labels — the primary P&L training signal |
| The Gazette (UK) | ~5,000 notices/year | Insolvency notices providing default labels for CH matches |
| SEC EDGAR (US public filers) | Growing (weekly refresh) | Full P&L from 10-K filings — feeds the P&L specialist head |
| EBA Transparency Exercise | ~130 EU banks | Bank financials (used only for FI scoring endpoint) |
| EIOPA Solvency II | ~1,500 EU insurers | Insurance financials (FI scoring endpoint) |
| NayaOne synthetic UK SME | ~25,000 synthetic businesses | Covers income-statement feature gaps in UK CH data |
More than 12,000 confirmed defaults across the training data. Data is refreshed daily, and the full model is retrained monthly.
Algorithms
AgentCredit uses an ensemble of statistical credit models with adaptive weights:
- Monotone-spline logistic-regression scorecard — a machine-learning credit scorecard trained on UK SME defaulters from The Gazette insolvency notices, with non-defaulters matched on vintage and total-assets distribution. Each feature passes through a PCHIP monotonic-spline transform, sign-constrained to credit-officer priors, before entering a non-negative-coefficient logistic regression. Out-of-time AUC 0.78 on a true temporal hold-out; Kolmogorov-Smirnov 0.51. Population-prior anchored at UK SME 4.8% one-year default rate (Insolvency Service Dec-2025 commentary scaled to the active-full-accounts £100k–£10M segment).
- Altman Z-score: the validated multivariate discriminant model for corporate distress prediction (Altman 1968, Altman & Sabato 2007 SME refit). We use Z' for private firms and Z'' for non-manufacturing firms, with a startup adjustment for firms with fewer than 3 years in business. Acts as a corroborating distress signal: a Z-score in the “distress zone” applies a 30% PD floor when corroborated by the scorecard or ML head.
- Heuristic ML overlay — a hand-coded SHAP-like signal that runs alongside the trained scorecard. Returns explainable per-feature contributions.
Ensemble weights adapt at scoring time based on data completeness (lower weight on the ML overlay when P&L is absent). The trained scorecard is the dominant signal when present; Altman Z provides a corroborating distress floor.
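A minimal sketch of how a completeness-dependent blend and a corroborated distress floor might fit together. The Z'' coefficients and the 1.1 distress cut-off are Altman's published values; the blend weights, the `corroboration_pd` threshold, and all function names are assumptions made for illustration, not the production values.

```python
Z2_DISTRESS_CUTOFF = 1.1   # published Z'' distress-zone boundary
PD_FLOOR = 0.30            # policy floor stated in the card


def altman_z_double_prime(working_capital, retained_earnings, ebit,
                          book_equity, total_assets, total_liabilities):
    """Altman Z'' (four-ratio variant) from balance-sheet and EBIT inputs."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = book_equity / total_liabilities
    return 6.56 * x1 + 3.26 * x2 + 6.72 * x3 + 1.05 * x4


def ensemble_pd(scorecard_pd, overlay_pd, z_score, has_pnl,
                corroboration_pd=0.12):
    """Blend the two PD heads, then apply the distress floor if corroborated."""
    # Hypothetical weights: the overlay counts for less when P&L is absent.
    w_overlay = 0.3 if has_pnl else 0.1
    pd = (1.0 - w_overlay) * scorecard_pd + w_overlay * overlay_pd

    in_distress = z_score < Z2_DISTRESS_CUTOFF
    # Assumption: "corroborated" means either head already reports an elevated PD.
    corroborated = max(scorecard_pd, overlay_pd) >= corroboration_pd
    if in_distress and corroborated:
        pd = max(pd, PD_FLOOR)
    return pd
```

With these assumed settings, a distress-zone Z-score alone does not move the PD; the floor only applies once one of the heads agrees that risk is elevated.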
Adjustments on top of the ensemble
- Country risk adjustment — sovereign rating + OECD tier for 30+ countries.
- Industry risk adjustment — UK SIC 2007 section (A–U), 21 sectors with calibrated default rates.
- Data completeness adjustment — conservative PD uplift and wider confidence intervals when balance-sheet-only data is supplied.
- Loan-to-value / collateral adjustment — LGD derived from margin type and coverage ratio.
Peer benchmarks
Financial ratios are benchmarked against 110 populated peer buckets: 21 SIC sections + 88 divisions + 1 portfolio-wide fallback. Percentile rank (P10 / P25 / P50 / P75 / P90) returned with every assessment. Benchmarks computed from 2.97M UK company filings, regenerated on every full training run.
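As a rough illustration of the benchmarking step, the sketch below picks a peer bucket and ranks a ratio within it. The lookup order (division, then section, then portfolio-wide), the `min_peers` threshold, and the bucket key layout are assumptions; the card only states which bucket types exist.

```python
from bisect import bisect_right
from statistics import quantiles


def pick_bucket(buckets, division, section, min_peers=30):
    """Return the most specific bucket that has enough peers (assumed order)."""
    for key in (("division", division), ("section", section), ("portfolio", None)):
        peers = buckets.get(key)
        if peers and len(peers) >= min_peers:
            return key, peers
    raise LookupError("no populated peer bucket")


def percentile_rank(value, peers):
    """Share of peer values at or below `value`, on a 0-100 scale."""
    ordered = sorted(peers)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def markers(peers):
    """P10 / P25 / P50 / P75 / P90 cut points for a peer bucket."""
    cuts = quantiles(peers, n=100, method="inclusive")  # 99 cut points
    return {f"P{p}": cuts[p - 1] for p in (10, 25, 50, 75, 90)}
```

A thinly populated division falls through to its SIC section, and the portfolio-wide bucket is used only when neither has enough peers.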
Known limitations
- UK micro-entity filings are balance-sheet-only (no P&L) for ~99.5% of Companies House records. The data-completeness grading flags this and the P&L specialist head is only used when P&L is present.
- Training data is ~99% UK + Polish. Model may be less reliable for non-UK, non-Polish SMEs until more regional data is ingested.
- Default labels from insolvency notices have a ~6–12 month lag. Very recent deteriorations may not be reflected.
- AgentCredit does not access bank-transaction data or HMRC filings. If you have richer signals (Open Banking, cashflow forecasts), feed them via the API — the ML head can use them.
- The AI narrative report is generated by Claude (Anthropic). It is a qualitative summary of the quantitative model output — it does not change the risk grade.
Performance
We validate the model at each training run and publish a qualitative performance summary with every monthly retrain. Detailed backtest reports (discrimination, calibration, default-rate tracking by grade) are available to Growth-tier customers on request. We commit to publishing a plain-language performance update on the blog each quarter.
What the model produces
AgentCredit produces a 24-month probability of default (PD), a 12-grade letter scale (A+ through F) anchored to S&P long-run corporate-default rates, and a calibrated risk rating for UK SMEs in the £100k–£10M total-assets range. Under the hood:
- Methodology: trained monotone-spline logistic-regression scorecard, with Altman Z-score as a corroborating distress signal and a heuristic ML overlay.
- Calibration anchor: 4.8% UK SME one-year default rate (Insolvency Service Dec-2025 commentary scaled to our deployed segment; 95% band [3.5%, 6.5%]).
- Grade scale: 12 PD-anchored letter grades, with thresholds cross-walked to S&P long-run rates (A+ < 0.1% PD ↔ S&P AA-equivalent, B-mid ↔ S&P BB, C-low ↔ S&P B, etc.).
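One common way to anchor a score trained on a matched defaulter / non-defaulter sample to a population default rate is a prior-shift correction on the odds. The sketch below shows that standard technique; it is not confirmed to be the exact procedure used here, and `sample_rate` (the share of defaulters in the training sample) is a hypothetical input.

```python
def recalibrate_to_prior(p_sample, sample_rate, target_rate=0.048):
    """Shift a sample-calibrated PD to a population with default rate `target_rate`.

    The odds are multiplied by the ratio of target prior odds to sample prior odds.
    """
    for name, v in (("p_sample", p_sample), ("sample_rate", sample_rate),
                    ("target_rate", target_rate)):
        if not 0.0 < v < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    odds = p_sample / (1.0 - p_sample)
    shift = (target_rate / (1.0 - target_rate)) / (sample_rate / (1.0 - sample_rate))
    adjusted = odds * shift
    return adjusted / (1.0 + adjusted)
```

On a 50/50 matched sample, a raw score of 0.5 maps back to the 4.8% anchor, and scores above or below 0.5 land proportionally above or below it.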
Real-population grade distribution
On 200 randomly sampled UK companies (full-accounts filers, £100k–£10M TA, not in the training set), the model produces this grade distribution:
| Grade band | Count | % of sample | Interpretation |
|---|---|---|---|
| A (very safe — PD < 1.5%) | 23 | 11.5% | Top decile of SMEs by balance-sheet health |
| B (safe to average — PD 1.5–4%) | 97 | 48.5% | Around population mean |
| C (elevated — PD 4–12%) | 70 | 35.0% | Above-average risk; human review recommended |
| D (high — PD 12–30%) | 8 | 4.0% | Distress markers present |
| E / F (distressed — PD ≥ 30%) | 2 | 1.0% | Z-distress override may apply |
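The band boundaries in the table can be expressed as a simple lookup. This sketch covers only the five coarse bands shown above, not the full 12-grade scale, and it assumes each boundary belongs to the riskier band (so a PD of exactly 4% falls in C).

```python
# Upper PD bound (exclusive) for each band, taken from the table above.
BAND_UPPER_BOUNDS = [("A", 0.015), ("B", 0.04), ("C", 0.12), ("D", 0.30)]


def grade_band(pd: float) -> str:
    """Map a probability of default in [0, 1] to a coarse grade band."""
    if not 0.0 <= pd <= 1.0:
        raise ValueError("pd must be between 0 and 1")
    for band, upper in BAND_UPPER_BOUNDS:
        if pd < upper:
            return band
    return "E/F"
```

Under this mapping, a counterparty scored exactly at the 4.8% calibration anchor sits in band C.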
How it’s trained
- Defaulter sample: UK SME insolvencies recovered via the Companies House API and cross-referenced with The Gazette insolvency notices. Filings sourced from the 6-to-24-month pre-insolvency window.
- Non-defaulter sample: UK SMEs in the same total-assets range (£100k–£10M) extracted from the Companies House iXBRL bulk data dump, stratified by vintage quarter and total-assets decile to match the defaulter distribution (eliminates both size and vintage confounding).
- Horizon: 24 months forward from the balance-sheet date.
- Out-of-time validation: AUC 0.78 on a true temporal hold-out (train on filings ≤ 2024-12-31, test on ≥ 2025-03-31 with a 3-month gap); Kolmogorov-Smirnov 0.51.
- Four automated guards must pass before any artifact ships: (a) composition balance — defaulter rate per TA decile within 0.5×–2× of population; (b) sign consistency — every feature's composite matches the credit-officer prior (more debt → higher risk; more equity → lower risk); (c) behavioural separation — synthetic distressed/healthy PD lift ≥ 2.5×; (d) out-of-time discrimination — AUC ≥ 0.65. Artifacts that fail any guard are not saved.
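The four guards amount to a gate that a candidate artifact must clear. A minimal sketch follows, assuming the guard inputs have already been computed upstream; the `Candidate` container and its field names are hypothetical, while the thresholds are those listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    decile_rate_ratios: List[float]  # defaulter rate / population rate, per TA decile
    feature_signs_ok: List[bool]     # True where a feature matches its prior sign
    distressed_pd: float             # PD for the synthetic distressed profile
    healthy_pd: float                # PD for the synthetic healthy profile
    oot_auc: float                   # out-of-time AUC


def failed_guards(c: Candidate) -> List[str]:
    """Return the names of guards that fail; an empty list means the artifact may ship."""
    failures = []
    if not all(0.5 <= r <= 2.0 for r in c.decile_rate_ratios):
        failures.append("composition_balance")
    if not all(c.feature_signs_ok):
        failures.append("sign_consistency")
    if c.healthy_pd <= 0 or c.distressed_pd / c.healthy_pd < 2.5:
        failures.append("behavioural_separation")
    if c.oot_auc < 0.65:
        failures.append("oot_discrimination")
    return failures
```

A training pipeline would save the artifact only when `failed_guards(candidate)` returns an empty list.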
Where it has limits
- Balance-sheet-only training. The scorecard sees equity, leverage, and working capital — not profitability or cash flow. The model therefore over-penalises capital-intensive or thin-working-capital businesses that may be profitable. Profitable but high-leverage firms (e.g. a mature professional-services firm with a debt-funded partner buyout, generating 30%+ operating margin) will be graded conservatively. Customers should interpret the scorecard alongside qualitative judgment on profitability, market position, and management quality. A P&L-aware head is on the roadmap.
- Training scope: UK SMEs with total assets £100k–£10M. Outside this range (micro-entities below £100k TA and mid-caps above £10M), model output is advisory only, because the scorecard was not calibrated on those populations.
- Single-vintage data. Defaulter training data covers 2024–2026 (post-COVID, post-energy-shock). A 2008-style financial-system shock or a different macro regime may not be captured. Multi-vintage retraining (2015–2026) is in progress.
- No CCJ or payment-behaviour signal. Credit bureaus include county-court judgments and supplier-payment data; we don't. We are integrating the Companies House charges register and director-change frequency as partial substitutes.
- AUC confidence interval. OOT AUC 0.78 with 95% CI [0.71, 0.86] (counterparty-clustered bootstrap p5 = 0.73). The discrimination claim should be read as “AUC ≥ 0.70 on out-of-time validation,” not a point estimate.
- Fixed-pp policy floors. The Z-distress override applies a 30%/10% PD floor when an Altman Z “distress” signal is corroborated by the scorecard or ML head. These are policy statements, not statistical claims — “a counterparty in Altman distress surfaces as E-band regardless of statistical PD.”
- Recalibration triggers. We re-anchor (a) quarterly as new defaults emerge, (b) if the observed point-in-time default rate diverges >50% from 4.8% over a 12-month rolling window, or (c) if Population Stability Index drift on any input feature exceeds 0.25.
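The recalibration triggers can be sketched as follows. The Population Stability Index uses its standard definition over binned shares; the function names are hypothetical, and the ">50%" divergence is read here as a relative deviation from the 4.8% anchor, which is an assumption about the intended meaning.

```python
import math


def psi(expected_shares, actual_shares, eps=1e-6):
    """Population Stability Index between two binned distributions (shares sum to 1)."""
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e, a = max(e, eps), max(a, eps)   # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total


def recalibration_reasons(observed_rate, feature_psi, anchor=0.048,
                          quarter_elapsed=False):
    """List which of the three triggers currently fire."""
    reasons = []
    if quarter_elapsed:
        reasons.append("quarterly")
    if abs(observed_rate - anchor) / anchor > 0.5:
        reasons.append("default_rate_divergence")
    drifted = sorted(name for name, value in feature_psi.items() if value > 0.25)
    if drifted:
        reasons.append("psi_drift:" + ",".join(drifted))
    return reasons
```

Here `observed_rate` would be the 12-month rolling default rate and `feature_psi` a mapping from feature name to its PSI against the training distribution.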
How it’s monitored
- SLO: re-validate quarterly.
- Tripwires: flag a model review if AUC drops below 0.65 OR mean predicted PD diverges >50% from realised default rate over 12 months.
- Version pinning: every score is stamped with the engine version (scoring_engine_version) and grandfathered, so historical scores are never silently re-graded. Users re-score explicitly when the engine bumps.
- Regression suite: 5 golden borrowers locked at expected grade bands; any future engine change that drifts these trips a CI failure before merge.
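A minimal sketch of version stamping and a golden-borrower check, under stated assumptions: `scoring_engine_version` is the field named above, while the record layout, the borrower identifiers, and the `score_band` callable are hypothetical stand-ins for the real engine.

```python
from typing import Callable, Dict, List, Set


def stamp(pd: float, band: str, engine_version: str) -> dict:
    """Attach the engine version so a stored score is never silently re-graded."""
    return {"pd": pd, "grade_band": band, "scoring_engine_version": engine_version}


def golden_drift(score_band: Callable[[str], str],
                 expected: Dict[str, Set[str]]) -> List[str]:
    """Return golden borrowers whose band falls outside the locked expectation."""
    return sorted(
        borrower
        for borrower, allowed in expected.items()
        if score_band(borrower) not in allowed
    )
```

In a CI job, a non-empty result from `golden_drift` would fail the build before the engine change is merged.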
Fairness & responsible use
AgentCredit operates on company financials, not personal attributes. No protected-characteristic data is used as input. That said, any credit-scoring system can reinforce historical patterns — we commit to reviewing scoring outputs by sector and region annually for unwarranted differential impact, and publishing findings.