Prometheus Metrics for Kubernetes CronJobs: A Complete Guide

Prometheus is the standard for Kubernetes monitoring, but CronJob observability is one of its weak spots out of the box. This guide covers monitoring CronJobs with Prometheus, from the metrics kube-state-metrics already exposes and the PromQL needed to use them to a purpose-built exporter.

What kube-state-metrics Gives You

If you’re running Prometheus on Kubernetes, you almost certainly have kube-state-metrics installed. It provides these CronJob-related metrics:

CronJob Metrics

Metric	Type	Description
`kube_cronjob_info`	Gauge	Information about a CronJob, with the schedule and concurrency policy as labels
`kube_cronjob_labels`	Gauge	CronJob labels
`kube_cronjob_created`	Gauge	Creation timestamp
`kube_cronjob_next_schedule_time`	Gauge	Next scheduled run (Unix timestamp)
`kube_cronjob_status_active`	Gauge	Number of currently running jobs
`kube_cronjob_status_last_schedule_time`	Gauge	Last scheduled time (not last success!)
`kube_cronjob_spec_suspend`	Gauge	Whether the CronJob is suspended

Job Metrics

Metric	Type	Description
`kube_job_info`	Gauge	Information about a Job
`kube_job_owner`	Gauge	Job’s owner reference (links to CronJob)
`kube_job_status_succeeded`	Gauge	Number of succeeded pods
`kube_job_status_failed`	Gauge	Number of failed pods
`kube_job_status_active`	Gauge	Number of active pods
`kube_job_complete`	Gauge	Whether the job completed
`kube_job_status_completion_time`	Gauge	Completion timestamp
`kube_job_status_start_time`	Gauge	Start timestamp

Useful PromQL Queries

Last Execution Status per CronJob

This requires joining CronJob and Job metrics, which is the biggest pain point:

# Succeeded-pod count for every Job owned by a CronJob, labeled with the CronJob name.
# Note: this returns all retained Jobs, not only the most recent one per CronJob.
kube_job_status_succeeded
  * on(namespace, job_name) group_left(owner_name)
  kube_job_owner{owner_kind="CronJob"}
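The query above returns a series for every Job that still exists, so picking the newest run per CronJob takes one more step. That selection is often easier to reason about outside PromQL. Below is a minimal Python sketch of the logic, assuming the samples have already been fetched and flattened; the `JobSample` shape and its field names are hypothetical, not something kube-state-metrics provides.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class JobSample:
    """Hypothetical flattened view of one Job, assembled from kube_job_owner,
    kube_job_status_start_time and kube_job_status_succeeded."""
    namespace: str
    job_name: str
    owner_name: str      # owner_name label from kube_job_owner
    start_time: float    # value of kube_job_status_start_time
    succeeded: int       # value of kube_job_status_succeeded


def latest_job_per_cronjob(
    samples: Iterable[JobSample],
) -> Dict[Tuple[str, str], JobSample]:
    """Return the most recently started Job for each (namespace, CronJob)."""
    latest: Dict[Tuple[str, str], JobSample] = {}
    for s in samples:
        key = (s.namespace, s.owner_name)
        if key not in latest or s.start_time > latest[key].start_time:
            latest[key] = s
    return latest


# Illustrative data only: two runs of one CronJob, one run of another.
samples = [
    JobSample("default", "nightly-backup-28486760", "nightly-backup", 1709205600.0, 0),
    JobSample("default", "nightly-backup-28488200", "nightly-backup", 1709292000.0, 1),
    JobSample("default", "report-28488200", "report", 1709292000.0, 0),
]
for (namespace, cronjob), job in sorted(latest_job_per_cronjob(samples).items()):
    status = "succeeded" if job.succeeded > 0 else "not succeeded"
    print(f"{namespace}/{cronjob}: {job.job_name} {status}")
```

Keying on namespace as well as CronJob name keeps identically named CronJobs in different namespaces apart, which is the same reason to match on `namespace` in the PromQL join.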

CronJob Execution Duration

# Duration in seconds of each completed Job owned by a CronJob
(kube_job_status_completion_time - kube_job_status_start_time)
  * on(namespace, job_name) group_left(owner_name)
  kube_job_owner{owner_kind="CronJob"}

Detect Missed Schedules

# CronJobs where the last schedule time is older than expected
time() - kube_cronjob_status_last_schedule_time > 86400

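This is approximate: it flags any CronJob whose last schedule time is more than 24 hours (86400 seconds) old, and it doesn’t account for the CronJob’s actual schedule interval. An hourly job can miss 23 runs before it matches, while a weekly job matches for most of every week.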
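A schedule-aware check needs each CronJob’s expected interval. The sketch below shows the comparison in Python, assuming the interval in seconds has already been derived for each CronJob (parsing the cron expression is left out). The function name and the grace period are illustrative choices, not part of any exporter.

```python
def missed_schedule(now: float, last_schedule_time: float,
                    interval_seconds: float, grace_seconds: float = 300.0) -> bool:
    """True if more than one interval (plus a grace period) has passed
    since the CronJob was last scheduled."""
    return now - last_schedule_time > interval_seconds + grace_seconds


# Both CronJobs were last scheduled 7200 seconds (2 hours) ago.
hourly = missed_schedule(now=10_000.0, last_schedule_time=2_800.0,
                         interval_seconds=3_600.0)
daily = missed_schedule(now=10_000.0, last_schedule_time=2_800.0,
                        interval_seconds=86_400.0)
print(hourly, daily)
```

With a per-CronJob interval, the hourly job is flagged after two hours of silence while the daily job is not, which a single fixed threshold cannot express.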

Failed CronJobs in the Last Hour

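# Jobs owned by CronJobs that started in the last hour and have failed pods but no succeeded pods.
# The outer parentheses are required: * binds tighter than unless.
(
  (kube_job_status_failed > 0)
    unless on(namespace, job_name) (kube_job_status_succeeded > 0)
)
  * on(namespace, job_name) group_left(owner_name)
  kube_job_owner{owner_kind="CronJob"}
  and on(namespace, job_name) (time() - kube_job_status_start_time < 3600)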

The Gaps in kube-state-metrics

While kube-state-metrics provides raw data, it has significant limitations for CronJob monitoring:

No direct CronJob success/failure metric: you have to join Job metrics to their owning CronJob
No execution counter: you can’t easily count total executions over time
No duration metric: duration has to be computed from the start and completion timestamps, then joined to the owning CronJob
No true missed schedule detection: you can approximate it with a fixed threshold, but the result ignores each CronJob’s real schedule
High cardinality: Job names include a timestamp-derived suffix (e.g., nightly-backup-28486760), so every run creates a new set of time series
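The cardinality problem comes from the numeric suffix in each Job name. The Python sketch below splits such a name back into the CronJob name and a scheduled time. It assumes the suffix is the scheduled time in minutes since the Unix epoch, which is how recent Kubernetes CronJob controllers name Jobs; that is an implementation detail rather than an API guarantee, so treat the decoded time as a hint. The helper name `split_job_name` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def split_job_name(job_name: str) -> Optional[Tuple[str, datetime]]:
    """Split '<cronjob>-<minutes since epoch>' into (cronjob name, scheduled time).

    Returns None when the name has no numeric suffix.
    """
    base, sep, suffix = job_name.rpartition("-")
    if not sep or not base or not (suffix.isascii() and suffix.isdigit()):
        return None
    # Assumption: the suffix counts minutes since the Unix epoch.
    scheduled = datetime.fromtimestamp(int(suffix) * 60, tz=timezone.utc)
    return base, scheduled


print(split_job_name("nightly-backup-28486760"))
print(split_job_name("manual-run"))
```

Grouping by the recovered CronJob name instead of the raw `job_name` is what collapses the per-run series into one series per CronJob.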

Purpose-Built CronJob Metrics with Varax Monitor

Varax Monitor was built to fill exactly these gaps. It watches CronJob executions directly and exports clean, purpose-built metrics:

Metric	Type	Description
`cronjob_last_execution_status`	Gauge	1=success, 0=failure per CronJob
`cronjob_last_execution_duration_seconds`	Gauge	Duration of last execution
`cronjob_execution_total`	Counter	Total executions by success/failure
`cronjob_missed_schedules_total`	Counter	Total missed schedules
`cronjob_next_schedule_time`	Gauge	Next expected execution time
`cronjob_is_suspended`	Gauge	Suspension status

These metrics are labeled by CronJob name and namespace — no complex joins required.

Example: Alert on Any Failed CronJob

With kube-state-metrics:

# Complex multi-metric join (the outer parentheses are needed because * binds tighter than unless)
(
  (kube_job_status_failed > 0)
    unless on(namespace, job_name) (kube_job_status_succeeded > 0)
)
  * on(namespace, job_name) group_left(owner_name)
  kube_job_owner{owner_kind="CronJob"}

With Varax Monitor:

# Direct, simple query
cronjob_last_execution_status == 0
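To act on this query outside Grafana or Alertmanager, it can be run through the Prometheus HTTP API (`/api/v1/query`). The following Python sketch uses only the standard library. The label names `namespace` and `cronjob` are assumptions about what the exporter emits, and the base URL passed to `fetch` is whatever your Prometheus server is reachable at.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

QUERY = "cronjob_last_execution_status == 0"


def failed_cronjobs(response: dict) -> List[Tuple[str, str]]:
    """Extract (namespace, cronjob) pairs from an instant-query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    pairs = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        # Label names are assumptions; adjust them to the exporter's real labels.
        pairs.append((labels.get("namespace", ""), labels.get("cronjob", "")))
    return sorted(pairs)


def fetch(base_url: str, query: str = QUERY) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Canned response in the shape Prometheus returns for an instant vector,
# so the parsing can be exercised without a running server.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "default", "cronjob": "nightly-backup"},
             "value": [1709292000.0, "0"]},
        ],
    },
}
print(failed_cronjobs(sample))
```

Against a live server the call would look like `failed_cronjobs(fetch("http://localhost:9090"))`, where the address is only an example.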

Example: CronJobs Running Longer Than Usual

With kube-state-metrics:

# Requires a join; flags runs longer than a fixed 300 seconds (no comparison against history)
(kube_job_status_completion_time - kube_job_status_start_time)
  * on(namespace, job_name) group_left(owner_name)
  kube_job_owner{owner_kind="CronJob"}
  > 300

With Varax Monitor:

cronjob_last_execution_duration_seconds > 300
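Both queries use a fixed 300-second threshold, which only approximates "longer than usual". One way to express "usual" is to compare the latest run against the median of earlier runs. The Python sketch below does that over a list of past durations; the factor of 2 and the minimum of 5 samples are arbitrary illustrative defaults, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def is_slower_than_usual(history: Sequence[float], latest: float,
                         factor: float = 2.0, min_samples: int = 5) -> bool:
    """True if `latest` exceeds `factor` times the median of past durations."""
    if len(history) < min_samples:
        return False  # not enough history to define "usual"
    return latest > factor * median(history)


past = [110.0, 120.0, 115.0, 130.0, 125.0]  # seconds, previous runs
print(is_slower_than_usual(past, 128.0))
print(is_slower_than_usual(past, 400.0))
```

The median is used rather than the mean so that a single earlier outlier does not shift the baseline.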

Setting Up a Complete Dashboard

Whether you use kube-state-metrics alone or Varax Monitor, your Grafana dashboard should include:

Overview table — all CronJobs with last status, last run time, and next scheduled time
Failure timeline — when and which CronJobs have failed
Duration trends — execution time over the last 7 days per CronJob
Missed schedule alerts — any CronJobs that didn’t fire when expected
Active jobs — currently running CronJob executions

Varax Monitor includes a pre-built Grafana dashboard with all of these panels. Install it with one command and customize as needed.

Getting Started

If you want the simplest path to CronJob observability:

helm repo add varaxlabs https://charts.varax.io
helm install varax-monitor varaxlabs/varax-monitor

It’s free, open-source (Apache 2.0), and deploys in under 60 seconds. Read the full quickstart.