Quickstart

Install Varax Monitor and start monitoring every CronJob in your cluster in 60 seconds.

Prerequisites

  • A running Kubernetes cluster (v1.21+)
  • Helm v3 installed
  • kubectl configured to access your cluster
  • Prometheus installed in your cluster (e.g., via kube-prometheus-stack)
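
The first three prerequisites can be checked from a terminal before installing. The sketch below is illustrative only: the helper names check_tools and k8s_at_least_1_21 are hypothetical and not part of Varax Monitor, and the version string is assumed to look like v1.27.3.

```shell
# Report whether each required CLI is on PATH; returns non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Succeed when a version string such as "v1.27.3" is at least 1.21.
k8s_at_least_1_21() {
  v=${1#v}                   # drop the leading "v"
  major=${v%%.*}             # text before the first dot
  rest=${v#*.}               # text after the first dot
  minor=${rest%%.*}          # second version component
  minor=${minor%%[!0-9]*}    # strip suffixes such as "+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 21 ]; }
}
```

For example, running check_tools helm kubectl reports any missing CLI, and the server version printed by kubectl version can be passed to k8s_at_least_1_21.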

Install Varax Monitor

Add the Varax Helm repository and install the chart:

helm repo add varaxlabs https://charts.varax.io
helm repo update
helm install varax-monitor varaxlabs/varax-monitor

That’s it. Varax Monitor will automatically discover every CronJob in your cluster and start exporting Prometheus metrics.
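
To know how many CronJobs to expect in the exported metrics, you can count them with kubectl first. This is a minimal sketch; the function name count_cronjobs is hypothetical.

```shell
# Count CronJobs across all namespaces. With no CronJobs, kubectl writes
# its "No resources found" notice to stderr, so the count is 0.
count_cronjobs() {
  kubectl get cronjobs --all-namespaces --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

If you later restrict monitoring to specific namespaces, the number of monitored CronJobs will be lower than this cluster-wide count.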

Verify Installation

Check that the pod is running:

kubectl get pods -l app.kubernetes.io/name=varax-monitor

You should see output like:

NAME                             READY   STATUS    RESTARTS   AGE
varax-monitor-7f8b9c6d4f-x2k9p   1/1     Running   0          30s
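
In scripts, it can be more convenient to block until the pod is Ready than to poll kubectl get pods. The sketch below uses kubectl wait with the same label selector; the function name wait_for_varax is hypothetical, and the namespace is assumed to be default unless you installed the chart elsewhere.

```shell
# Block until the Varax Monitor pod reports Ready, or fail after the timeout.
# $1: namespace (assumed "default"), $2: timeout (assumed 120s).
wait_for_varax() {
  kubectl wait pod \
    --for=condition=Ready \
    -l app.kubernetes.io/name=varax-monitor \
    -n "${1:-default}" \
    --timeout="${2:-120s}"
}
```

The command exits non-zero if the pod is not Ready within the timeout, which makes it usable as a gate in CI pipelines.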

Verify that metrics are being exported. The port-forward command runs in the foreground, so run curl from a second terminal:

kubectl port-forward svc/varax-monitor 9090:9090
curl http://localhost:9090/metrics | grep cronjob_

You should see metrics like cronjob_last_execution_status, cronjob_execution_total, and others for each CronJob in your cluster.
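
The raw output contains one line per CronJob and label set, which can get long. As a rough sketch, the filter below reduces it to the distinct metric names; the function name cronjob_metric_names is hypothetical.

```shell
# Read Prometheus text-format metrics on stdin and print the distinct
# metric names that start with "cronjob_". Comment lines (# HELP, # TYPE)
# are skipped because they do not start with the metric name.
cronjob_metric_names() {
  grep -o '^cronjob_[A-Za-z0-9_]*' | sort -u
}
```

With the port-forward still running, pipe the endpoint into it: curl -s http://localhost:9090/metrics | cronjob_metric_names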

Import the Grafana Dashboard

If you’re running Grafana (included with kube-prometheus-stack), import the pre-built dashboard:

  1. Open Grafana in your browser
  2. Go to Dashboards > Import
  3. Enter dashboard ID: varax-monitor (or paste the JSON from the GitHub repo)
  4. Select your Prometheus data source
  5. Click Import

You’ll see a dashboard showing all CronJobs with execution history, success rates, and duration trends.

Set Up Alerts

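Varax Monitor includes pre-configured Prometheus alerting rules. Alerting rules are evaluated by Prometheus, which then sends firing alerts to Alertmanager, so add the rules below to a Prometheus rule file rather than to the Alertmanager configuration: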

groups:
  - name: varax-monitor
    rules:
      - alert: CronJobFailed
        expr: cronjob_last_execution_status == 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "CronJob {{ $labels.cronjob }} failed"
          description: "CronJob {{ $labels.cronjob }} in namespace {{ $labels.namespace }} has failed its last execution."

      - alert: CronJobMissedSchedule
        expr: increase(cronjob_missed_schedules_total[1h]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "CronJob {{ $labels.cronjob }} missed schedule"
          description: "CronJob {{ $labels.cronjob }} in namespace {{ $labels.namespace }} has missed one or more scheduled executions."

      - alert: CronJobSlowExecution
        expr: cronjob_last_execution_duration_seconds > 300
        for: 0m
        labels:
          severity: info
        annotations:
          summary: "CronJob {{ $labels.cronjob }} running slowly"
          description: "CronJob {{ $labels.cronjob }} took {{ $value }}s to execute (threshold: 300s)."
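
With kube-prometheus-stack, rules are usually loaded through a PrometheusRule resource instead of a rule file on disk. The sketch below wraps the first rule in such a resource. The function name, the resource name varax-monitor-rules, and the release label value are assumptions; check which labels your Prometheus Operator installation selects on.

```shell
# Print a PrometheusRule manifest containing the CronJobFailed rule.
# $1: value of the "release" label the Prometheus Operator selects on
#     (assumed here: kube-prometheus-stack).
varax_prometheus_rule() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: varax-monitor-rules
  labels:
    release: ${1:-kube-prometheus-stack}
spec:
  groups:
    - name: varax-monitor
      rules:
        - alert: CronJobFailed
          expr: cronjob_last_execution_status == 0
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "CronJob {{ \$labels.cronjob }} failed"
EOF
}
```

Apply it with varax_prometheus_rule | kubectl apply -f - in the namespace where Prometheus looks for rules. The other two rules can be added to the same group in the same way.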

Configuration Options

Varax Monitor works with zero configuration, but you can customize its behavior via Helm values:

# values.yaml
namespaces: []          # Monitor all namespaces (default) or specify a list
metricsPort: 9090       # Port for the metrics endpoint
logLevel: info          # Log verbosity: debug, info, warn, error
resources:
  requests:
    memory: 32Mi
    cpu: 10m
  limits:
    memory: 64Mi
    cpu: 50m

Install with custom values:

helm install varax-monitor varaxlabs/varax-monitor -f values.yaml
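
As an illustration of the options above, the sketch below generates a values file that sets the log level and limits monitoring to a list of namespaces. The function name varax_values is hypothetical; the keys logLevel and namespaces are the ones shown in values.yaml above.

```shell
# Print Helm values for Varax Monitor.
# $1: log level; remaining arguments: namespaces to monitor (none = all).
varax_values() {
  level=$1
  shift
  echo "logLevel: $level"
  if [ "$#" -eq 0 ]; then
    echo "namespaces: []"
  else
    echo "namespaces:"
    for ns in "$@"; do
      echo "  - $ns"
    done
  fi
}
```

For example, varax_values debug batch ops > values.yaml writes a file that can be passed to helm install with -f. For an existing release, helm upgrade varax-monitor varaxlabs/varax-monitor -f values.yaml applies the new values.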

Next Steps