Intelligence Suite · shipped in v1.5.0

Seven advisors.
One database fleet.

DBHelm turns raw telemetry into decisions: what to fix, what to upgrade, what to right-size, what's running hot right now. No extra agents. No PromQL. No cloud roundtrip.

7
Built-in advisors
0
Extra configuration
56
Supported engines
100%
Runs offline
Runbook automation · K8s-discovered DBs

Incident Playbooks

Your on-call runbooks, as code.

The problem

Every team has the same 3am incidents — high CPU, replication lag, connection exhaustion — and every team writes a private wiki page nobody reads at 3am.

What DBHelm does

Five built-in diagnostic playbooks that auto-run against any K8s-discovered database via kubectl exec, collect findings, rank by severity, and hand you an incident report with recommendations and a session-scoped execution history.

  • Five pre-built playbooks

    High CPU, Replication Lag, Connection Exhaustion, Disk Space, Slow Queries: ready on day one. Direct-connect DBs are not supported today (kubectl exec required).

  • Auto-run diagnostic steps

    Each playbook executes a fixed sequence of checks inside the pod — no click-through, no copy-paste.

  • Severity-ranked findings

    Critical / Warning / Info, with root-cause hints from the actual engine output.

  • Execution history

    Every run is stored in-process with timestamp, database, outcome, and findings. Session-scoped today; persistent audit trail is on the roadmap.
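
    The severity ranking described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DBHelm's implementation: the `Finding` type and the `rank_findings` helper are hypothetical names, and the severities attached to the sample findings are invented for the example. Only the three levels (Critical / Warning / Info) come from the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value sorts first, so CRITICAL leads the report.
    CRITICAL = 0
    WARNING = 1
    INFO = 2


@dataclass(frozen=True)
class Finding:
    step: str           # diagnostic step that produced the finding
    severity: Severity  # hypothetical assignment for this sketch
    detail: str         # hint taken from the engine output


def rank_findings(findings):
    """Order findings Critical, Warning, Info; ties keep their step order."""
    return sorted(findings, key=lambda f: f.severity)  # sorted() is stable


findings = [
    Finding("Check streaming replication status", Severity.INFO, "walsender active"),
    Finding("Inspect recent error logs", Severity.CRITICAL, '3 "could not receive" in last 10m'),
    Finding("Measure current lag in WAL bytes", Severity.WARNING, "48.2 MB behind"),
]

for f in rank_findings(findings):
    print(f"[{f.severity.name}] {f.step}: {f.detail}")
```

    Using an integer enum keeps the sort key trivial, and the stable sort preserves the playbook's step order within each severity level.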

DBHelm · Incident Playbooks
< 60s
Playbook
Replication Lag · postgres-prod-02
Running · step 4/6
1. Check streaming replication status: walsender active
2. Measure current lag in WAL bytes: 48.2 MB behind
3. Verify replication slot integrity: active, flush_lsn caught up
4. Inspect recent error logs: 3 "could not receive" in last 10m
5. Check network path to primary
6. Summarize & recommend
Failover simulation · K8s-discovered DBs

DR Testing

Can you actually fail over? Score it.

The problem

"We have backups" is not a DR plan. Most teams discover at the wrong moment that backups are stale, replicas are lagging, or nobody knows which host is the primary.

What DBHelm does

A readiness score from 0 to 100 for every K8s-discovered database, fed by backup freshness, replica health, and failover configuration, plus a dry-run failover simulation that proves the plan actually works. Honest about the math: RPO is derived from snapshot age, and RTO is a heuristic based on replica count and snapshot size.

  • DR readiness score (0–100)

    A single number per database, rolled up by cluster and fleet — driven by backup freshness, replica count, and probe success.

  • RPO from snapshot age

    Computed as time since the most recent successful backup / snapshot. Not engine-level transactional RPO.

  • RTO heuristic from topology

    About 120s when a database has two or more replicas; otherwise a size-based formula, falling back to a 3600s default. This is not a measured failover time; the dry-run simulation surfaces real readiness.

  • Failover simulation (dry-run)

    Run the failover plan without touching production; get a pass/fail and the exact steps that would execute.
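
    The RPO and RTO rules above translate into a short Python sketch. The 120s and 3600s values come from the text; the size-based formula is not specified on this page, so the version here (snapshot size divided by an assumed restore throughput of 50 MB/s) is a placeholder, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ASSUMED_RESTORE_MB_PER_S = 50.0  # assumption: stands in for the unspecified size-based formula
REPLICA_RTO_S = 120.0            # from the text: about 120s with two or more replicas
DEFAULT_RTO_S = 3600.0           # from the text: default when nothing else applies


def rpo_seconds(last_backup_at, now):
    """RPO as time since the most recent successful backup or snapshot."""
    return max(0.0, (now - last_backup_at).total_seconds())


def rto_seconds(replicas, snapshot_size_mb=None):
    """Heuristic RTO: fixed value with >= 2 replicas, else size-based, else default."""
    if replicas >= 2:
        return REPLICA_RTO_S
    if snapshot_size_mb is not None:
        return snapshot_size_mb / ASSUMED_RESTORE_MB_PER_S
    return DEFAULT_RTO_S


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
last_backup = now - timedelta(minutes=6, seconds=12)

print(rpo_seconds(last_backup, now))                     # snapshot age in seconds
print(rto_seconds(replicas=2))                           # replica-based heuristic
print(rto_seconds(replicas=1, snapshot_size_mb=10_000))  # size-based placeholder
print(rto_seconds(replicas=0))                           # default
```

    Neither number is measured: the RPO is only as fresh as the last snapshot, and the RTO is an estimate until a failover simulation confirms it.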

DBHelm · DR Testing
RPO/RTO
DR readiness
postgres-prod-01
Ready
82
of 100
RPO 6m 12s
RTO (heuristic) < 90s
Last failover sim 2d ago · pass
Backup freshness 95%
Replica health 88%
Failover config 70%
Cross-region 45%
Stop paying for idle pods · K8s-discovered DBs

Right-Sizing Advisor

Find the money you're leaving on the table.

The problem

Everyone "rightsized" during a migration once. Two years later, half the fleet is over-provisioned and a quarter is being throttled — but nobody can prove which pods, so nobody changes anything.

What DBHelm does

Per-pod CPU and memory analysis: requests vs actual usage from the K8s metrics-server, with a concrete recommendation per pod. Pair with the /cost dashboard to translate the waste percentage into a monthly dollar figure using built-in cloud rate cards.

  • Fleet efficiency score

    One number for your whole fleet — weighted CPU + memory utilization across all K8s-discovered DBs.

  • CPU & memory waste %

    See exactly which pods are overprovisioned, by how much, and which are being throttled.

  • Per-pod recommendations

    Exact request/limit values to set — copy straight into your manifest.

  • $ math via /cost dashboard

    The advisor returns waste in millicores + bytes; the cost dashboard converts to monthly $ using the AWS/GCP/Azure rate cards.
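
    A minimal Python sketch of the requests-versus-usage comparison, using CPU figures from the sample panel in this section. The `PodUsage` type, the 70% over-provisioning threshold, and the rule that usage above the request counts as "throttled" are assumptions chosen so the sketch reproduces the sample labels; they are not DBHelm's documented thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodUsage:
    name: str
    cpu_request_m: int  # requested CPU in millicores
    cpu_used_m: int     # observed CPU in millicores


def cpu_waste_pct(pod):
    """Unused share of the CPU request; negative when usage exceeds the request."""
    return 100.0 * (pod.cpu_request_m - pod.cpu_used_m) / pod.cpu_request_m


def classify(pod, over_threshold_pct=70.0):
    # Hypothetical cut-offs: usage above the request is flagged "throttled",
    # and at least 70% unused is flagged "over".
    waste = cpu_waste_pct(pod)
    if waste < 0:
        return "throttled"
    if waste >= over_threshold_pct:
        return "over"
    return "ok"


pods = [
    PodUsage("mongo-prod-0", 2000, 310),
    PodUsage("mongo-prod-1", 2000, 820),
    PodUsage("redis-cache-0", 500, 120),
    PodUsage("pg-analytics-0", 1000, 1190),
    PodUsage("kafka-broker-2", 1500, 980),
]

for pod in pods:
    print(f"{pod.name}: {cpu_waste_pct(pod):.1f}% unused -> {classify(pod)}")
```

    The same arithmetic applies to memory by swapping millicores for bytes; converting the result to dollars is left to the rate cards mentioned above.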

DBHelm · Right-Sizing Advisor
Waste %
Fleet efficiency
62% utilized · 4 over-provisioned
Est. savings
$384/mo
Pod              CPU req / used   Mem req / used   Action
mongo-prod-0     2000m / 310m     4Gi / 1.2Gi      over
mongo-prod-1     2000m / 820m     4Gi / 2.8Gi      ok
redis-cache-0    500m / 120m      1Gi / 340Mi      over
pg-analytics-0   1000m / 1190m    8Gi / 7.4Gi      throttled
kafka-broker-2   1500m / 980m     6Gi / 4.1Gi      ok
Recommendation: drop mongo-prod-0 cpu request to 500m — save $94/mo.
Policy-as-code for data · 7 checks today

Compliance Engine

Prove the fleet meets the policy.

The problem

"All our databases use SSL" is an assertion until you can show it. Audit season turns into a three-week scavenger hunt across clusters, consoles, and tribal knowledge.

What DBHelm does

Define policies as code from the seven built-in check types and evaluate them against the fleet, with per-DB pass/fail and severity. Honest about scope: policies are session-scoped today; persistent storage and a wider check catalog (encryption-at-rest, audit-log-enabled, framework tags) are on the roadmap.

  • Custom policy builder

    Compose rules from the seven built-in check types; tag by severity. Framework tags (SOC2 / HIPAA) on the roadmap.

  • Seven built-in check types

    backup_configured, alerting_configured, replica_count, ssl_enabled, max_connections, resource_limits, password_policy. Encryption-at-rest, audit-log-enabled, version-min are on the roadmap.

  • Severity levels

    Critical, high, medium, low — with a per-policy and per-database view.

  • Per-DB pass/fail reports

    Exportable, timestamped. Persistent policy storage (today: in-memory per session) is on the roadmap.
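
    To make "policies as code" concrete, here is a small Python sketch of a policy evaluated against a fleet. The check type names (`ssl_enabled`, `backup_configured`, `replica_count`) are from the list above; the rule schema, the `min` parameter, the evaluator table, and the sample facts are hypothetical and do not describe DBHelm's actual policy format.

```python
# Hypothetical policy: each rule names one built-in check type and a severity.
POLICY = [
    {"check": "ssl_enabled", "severity": "critical"},
    {"check": "backup_configured", "severity": "high"},
    {"check": "replica_count", "severity": "medium", "min": 2},
]

# Hypothetical evaluators keyed by check type; each returns True on pass.
CHECKS = {
    "ssl_enabled": lambda db, rule: db.get("ssl_enabled") is True,
    "backup_configured": lambda db, rule: db.get("backup_configured") is True,
    "replica_count": lambda db, rule: db.get("replica_count", 0) >= rule["min"],
}


def evaluate(policy, fleet):
    """Return one result row per (database, rule) pair."""
    results = []
    for name, facts in fleet.items():
        for rule in policy:
            passed = CHECKS[rule["check"]](facts, rule)
            results.append({
                "db": name,
                "check": rule["check"],
                "severity": rule["severity"],
                "passed": passed,
            })
    return results


# Invented sample facts for two databases.
fleet = {
    "pg-01": {"ssl_enabled": True, "backup_configured": True, "replica_count": 2},
    "redis": {"ssl_enabled": False, "backup_configured": True, "replica_count": 1},
}

report = evaluate(POLICY, fleet)
failures = [(r["db"], r["check"], r["severity"]) for r in report if not r["passed"]]
print(failures)
```

    Keeping rules as plain data and evaluators as a lookup table means a new check type only needs one new entry in `CHECKS`.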

DBHelm · Compliance Engine
7 checks
Compliance report
SOC2 · prod fleet
Pass rate
17/20
Policy
pg-01
pg-02
mongo
redis
kafka
TLS required
Auth enabled
Encryption at rest
Backup within 24h
Version supported
No public exposure
Same-engine version upgrades without surprises

Upgrade Advisor

Know what's about to be end-of-life.

The problem

Postgres 11 goes EOL. Someone has to know. Someone has to map every 11.x database in the fleet. Someone has to check every app for breaking changes. That someone writes it in a spreadsheet that's out of date two weeks later.

What DBHelm does

A live EOL radar across the fleet. For every database, see current version, EOL date, recommended upgrade path, and a pre-flight check that flags breaking changes and deprecated features before you touch a pod.

  • EOL & nearing-EOL detection

    Every supported engine mapped to its official EOL calendar.

  • Upgrade path recommendations

    Target version per database, with notes on whether it needs an in-place upgrade or a dump-and-restore.

  • Pre-flight compatibility checks

    Breaking changes, deprecated features, driver incompatibilities — surfaced before the maintenance window.

  • Breaking-change catalog

    Curated per engine and version — not a link to the release notes.
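
    EOL and nearing-EOL detection reduces to a date comparison against a calendar. The Python sketch below is illustrative only: the two PostgreSQL dates are included as examples and should be verified against the official schedule, the 180-day "nearing" window is an assumption, and keying the calendar on the first version component is a simplification that does not fit every engine's versioning scheme.

```python
from datetime import date

# Illustrative entries only; a real calendar would be sourced from each
# engine's official EOL schedule.
EOL_CALENDAR = {
    ("PostgreSQL", "11"): date(2023, 11, 9),
    ("PostgreSQL", "15"): date(2027, 11, 11),
}


def eol_status(engine, version, today, nearing_days=180):
    """Classify a database as 'eol', 'nearing', 'ok', or 'unknown'."""
    major = version.split(".")[0]  # simplification: first component only
    eol = EOL_CALENDAR.get((engine, major))
    if eol is None:
        return "unknown"
    days_left = (eol - today).days
    if days_left < 0:
        return "eol"
    if days_left <= nearing_days:
        return "nearing"
    return "ok"


today = date(2024, 6, 1)
print(eol_status("PostgreSQL", "11.19", today))
print(eol_status("PostgreSQL", "15.5", today))
print(eol_status("MySQL", "5.7.41", today))  # not in the sample calendar
```

    Returning "unknown" for engines missing from the calendar avoids silently reporting an unmapped database as healthy.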

DBHelm · Upgrade Advisor
EOL radar
2 critical · 3 nearing
Action required
pg-reporting-01   PostgreSQL   11.19    16.x   critical
mysql-legacy-02   MySQL        5.7.41   8.0    critical
mongo-prod        MongoDB      4.4.29   7.0    high
redis-cache       Redis        6.2.13   7.2    medium
pg-app            PostgreSQL   15.5            ok
See the wall before you hit it · K8s for live metrics

Capacity Forecasting

"Days until full" — for every database, every day.

The problem

Disk-full alerts fire on a Saturday night. Connection-cap alerts fire during a launch. Capacity problems are predictable; you just have to actually look at the trend.

What DBHelm does

Linear-regression projections for storage, memory, and connections — computed on a 5-minute cache from the K8s metrics-server + engine queries. Every database gets a days-until-full estimate, a growth rate, and an urgency tier. Trend history is session-scoped per process today; persistent history is on the roadmap.

  • Storage, memory & connections

    Three dimensions tracked per database, not just disk.

  • Days-until-full per metric

    Linear regression on the in-process trend window. You fix the ones under 30 days; you plan the ones under 90.

  • Growth rate per day

    Slope of the regression. Useful for capacity planning and cost forecasts.

  • Urgency tiers

    Critical / warning / healthy — with a fleet-level summary. Direct-connect DBs need K8s metrics-server access for full forecasts.
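
    The days-until-full estimate is an ordinary least-squares line extended to the capacity limit. The Python sketch below shows that calculation under stated assumptions: the function names, the sample points, and the 144 GB capacity are invented for illustration, and the trend window is simply whatever samples are passed in.

```python
def fit_slope_intercept(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_gb, capacity_gb):
    """Days from the latest sample until the fitted line reaches capacity."""
    slope, intercept = fit_slope_intercept(days, used_gb)
    if slope <= 0:
        return None  # flat or shrinking usage: no projected fill date
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - days[-1])


# Invented samples: one reading per day, growing 2 GB/day.
days = [0, 1, 2, 3, 4]
used_gb = [100, 102, 104, 106, 108]

slope, _ = fit_slope_intercept(days, used_gb)
print(f"growth: {slope:.1f} GB/day")
print(f"days until full: {days_until_full(days, used_gb, capacity_gb=144):.0f}")
```

    The slope doubles as the growth rate per day, so both figures come from a single fit. A linear model will understate urgency when growth is accelerating, which is one reason to treat the estimate as a planning signal rather than a guarantee.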

DBHelm · Capacity Forecasting
Forecast
Storage forecast
pg-events-01 · 18 days until full
Warning
[Chart: storage used (25% to 100%, FULL) projected from today to +18d]
Growth
+2.1 GB/d
Current
62%
Until full
18 days
PostgreSQL · MySQL · MongoDB · Redis · K8s only

Transaction Tracer

The first place to look when it's slow right now.

The problem

"The database is slow" — and now you need to know who's connected, what they're running, how long it's been running, and whether anyone is blocked. Ten different tools, four different engines.

What DBHelm does

A unified live view of active queries, slow queries, and client connections across PostgreSQL, MySQL, MongoDB, and Redis. Lock contention coverage varies by engine: full for PostgreSQL and MySQL (waits and a blocked-by graph), aggregate global lock stats for MongoDB, and none for Redis, which is single-threaded and has no lock graph to show. Auto-refreshing, filterable, K8s-discovered DBs only.

  • Active queries & PIDs

    Who's connected, what they're running, how long it's been running. All four engines.

  • Slow-query stream

    Live feed of queries over a threshold you set per engine. PG / MySQL slow log + Mongo currentOp + Redis SLOWLOG.

  • Lock contention graph (PG + MySQL)

    Who's blocking whom — with the blocking statement surfaced. MongoDB shows global lock stats; Redis is single-threaded so there is no lock graph.

  • Per-connection state

    Idle, active, waiting — with app name, client IP, and session duration. All four engines.
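
    The blocked-by graph can be modeled as a map from each waiting session to the session holding the lock. The Python sketch below walks that map to find the session at the head of a chain. It is a simplified illustration: the function names and data shapes are hypothetical, each waiter is assumed to have a single blocker, and on PostgreSQL the edges could come from a function such as `pg_blocking_pids()`, though this page does not say how DBHelm collects them.

```python
def root_blocker(pid, blocked_by):
    """Follow blocked-by edges from pid to the session at the head of the chain."""
    seen = set()
    while pid in blocked_by:
        if pid in seen:
            raise ValueError(f"lock cycle involving PID {pid}")
        seen.add(pid)
        pid = blocked_by[pid]
    return pid


def blocking_summary(sessions, blocked_by):
    """One line per waiting session, naming the root blocker and its state."""
    lines = []
    for pid in sorted(blocked_by):
        root = root_blocker(pid, blocked_by)
        s = sessions[root]
        lines.append(f"PID {pid} is blocked by {root} ({s['state']} for {s['duration']}).")
    return lines


# Sample data mirroring the session panel in this section.
sessions = {
    28194: {"state": "blocked", "duration": "42s"},
    28188: {"state": "idle-in-tx", "duration": "2m14s"},
}
blocked_by = {28194: 28188}  # waiter -> blocker

for line in blocking_summary(sessions, blocked_by):
    print(line)
```

    Reporting the root blocker rather than the immediate one points at the session worth investigating first when several waiters queue behind each other.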

DBHelm · Transaction Tracer
Real-time
Active sessions
pg-orders-01 · 14 active · 2 blocked
Live
PID     Query                                                State        Dur
28194   UPDATE orders SET status=$1 WHERE id=$2              blocked      42s
28193   SELECT … FROM orders o JOIN line_items li…           active       0.8s
28188   BEGIN; SELECT * FROM orders WHERE id=$1 FOR UPDATE   idle-in-tx   2m14s
28191   INSERT INTO events (…) VALUES ($1,$2,…)              active       0.1s
28170   VACUUM ANALYZE orders                                active       18s
Lock: PID 28194 is blocked by 28188 (idle-in-tx for 2m14s).

Why this matters

Zero configuration

Every advisor activates automatically when you connect a database. No agents to install. No scrape configs. No PromQL to write. Connect a cluster and all seven are on.

Engine-aware

Every recommendation understands the engine — Galera wsrep, MongoDB oplog, Patroni HA, Raft consensus, Kafka consumer-group lag — not just generic pod CPU and memory.

100% offline

All analysis runs locally on your machine. No telemetry. No cloud roundtrip. No procurement ticket. Your database credentials never leave your laptop.

Connect a cluster. All seven advisors are on.

DBHelm is a free desktop app for macOS, Windows, and Linux. Install it, point it at a kubeconfig or a database endpoint, and the Intelligence Suite works from minute one.