GreenKube

Recommendations


GreenKube analyzes your cluster metrics to generate actionable recommendations that reduce both costs and carbon emissions.

The recommendation engine reads stored metrics over a configurable lookback window controlled by RECOMMENDATION_LOOKBACK_DAYS (default: 7 days) and applies threshold-based detection algorithms. Results are deduplicated at the Deployment level — pods belonging to the same Deployment are grouped so you see one recommendation per workload, not one per replica.
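The grouping step can be pictured roughly as follows. This is a minimal sketch, not GreenKube's actual code: the `PodFinding` type, its field names, and the choice to sum savings across replicas are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PodFinding:
    """Hypothetical per-pod finding; field names are illustrative only."""
    namespace: str
    deployment: str  # owning Deployment, assumed to be resolved already
    pod: str
    rec_type: str
    savings_cost: float


def group_by_deployment(findings):
    """Collapse per-pod findings into one entry per (namespace, deployment, type)."""
    grouped = defaultdict(list)
    for finding in findings:
        key = (finding.namespace, finding.deployment, finding.rec_type)
        grouped[key].append(finding)
    # One summary per workload: replica count plus summed savings (an assumption).
    return {
        key: {"pods": len(items), "savings_cost": sum(i.savings_cost for i in items)}
        for key, items in grouped.items()
    }


findings = [
    PodFinding("shop", "web", "web-7d9f-abcde", "RIGHTSIZING_MEMORY", 1.5),
    PodFinding("shop", "web", "web-7d9f-fghij", "RIGHTSIZING_MEMORY", 2.5),
    PodFinding("shop", "api", "api-5c6b-klmno", "RIGHTSIZING_MEMORY", 4.0),
]
summary = group_by_deployment(findings)
```

Here the two `web` replicas collapse into a single entry, so the workload appears once regardless of its replica count.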

Each recommendation includes:

  • Type: one of 9 detection categories
  • Priority: high, medium, or low
  • Scope: pod, workload, namespace, or node
  • Projected annual savings: potential_savings_co2e_grams and potential_savings_cost

When the API or startup scan runs and the cluster API is reachable, metrics from namespaces that no longer exist in Kubernetes are filtered out. GreenKube can then reconcile old active recommendations as stale instead of regenerating them indefinitely.
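A sketch of that filtering step, under the assumption that metrics are plain dictionaries with a `namespace` key and that an unreachable cluster API is represented by `None` (both are illustrative choices, not GreenKube's data model):

```python
def filter_live_namespaces(metrics, live_namespaces):
    """Drop metrics that belong to namespaces no longer present in the cluster.

    `live_namespaces` is None when the cluster API could not be reached;
    in that case nothing is filtered, matching the behaviour described above.
    """
    if live_namespaces is None:
        return list(metrics)
    return [m for m in metrics if m["namespace"] in live_namespaces]


metrics = [
    {"namespace": "prod", "pod": "web-1"},
    {"namespace": "old-demo", "pod": "demo-1"},
]
kept = filter_live_namespaces(metrics, {"prod", "kube-system"})
unfiltered = filter_live_namespaces(metrics, None)
```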

  • Identifies pods consuming resources but producing little or no value: idle workloads that should be terminated.
  • Finds pods with CPU or memory requests significantly higher than actual usage, suggesting smaller resource allocations.
  • Detects workloads with high usage variability that would benefit from Horizontal Pod Autoscalers.
  • Identifies workloads that could be time-shifted to periods of lower grid carbon intensity.
  • Spots namespaces with minimal activity that may contain forgotten resources consuming energy and cost.
  • Suggests scaling down workloads during off-peak hours with a generated CronJob schedule.
  • Identifies nodes with far more capacity than their scheduled pods require.
  • Flags nodes running at very low CPU and memory utilization (consolidation candidates).

Zombie Pods

What: Pods that are running and consuming resources but show near-zero energy usage.

Detection:

  • Energy consumption < ZOMBIE_ENERGY_THRESHOLD (default: 1,000 J)
  • Cost > ZOMBIE_COST_THRESHOLD (default: $0.01)

Scope: pod
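As a rough illustration, the two conditions can be combined into a single predicate. This sketch assumes both conditions must hold together and uses the documented defaults; the function name and signature are hypothetical.

```python
def is_zombie(energy_joules, cost, energy_threshold=1000.0, cost_threshold=0.01):
    """True when a pod still costs money but uses almost no energy."""
    return energy_joules < energy_threshold and cost > cost_threshold


quiet_but_costly = is_zombie(energy_joules=200.0, cost=0.05)
busy_pod = is_zombie(energy_joules=5000.0, cost=0.05)
negligible_cost = is_zombie(energy_joules=200.0, cost=0.001)
```

Only the first pod is flagged: the second uses real energy, and the third costs too little to be worth reporting.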


Rightsizing CPU

What: Pods with CPU requests significantly higher than actual utilization.

Detection:

  • Average CPU utilization < RIGHTSIZING_CPU_THRESHOLD × CPU request (default: 30%)
  • The recommended request is derived from P95 usage, observed max, average usage, and a headroom multiplier (rightsizingHeadroom, default: 1.2) so that reductions stay safe
  • Only reductions are surfaced; recommendations that would increase a request are discarded

Scope: workload (grouped per Deployment)
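The exact sizing formula is not spelled out here, so the following is only one plausible reading: detect on average usage, size on the larger of P95 times headroom and the observed max, apply a floor, and drop anything that is not a reduction. The function name and the sizing rule are assumptions.

```python
import math


def suggest_cpu_request(request_m, avg_m, p95_m, max_m,
                        threshold=0.3, headroom=1.2, floor_m=10):
    """Return a smaller CPU request in millicores, or None if nothing is suggested."""
    if avg_m >= threshold * request_m:
        return None  # average usage is not far enough below the request
    # Assumed sizing rule: cover P95 plus headroom and the observed max,
    # never going below the configured floor.
    candidate = math.ceil(max(p95_m * headroom, max_m, floor_m))
    if candidate >= request_m:
        return None  # only reductions are surfaced
    return candidate


reduced = suggest_cpu_request(request_m=1000, avg_m=100, p95_m=200, max_m=260)
not_idle = suggest_cpu_request(request_m=1000, avg_m=400, p95_m=500, max_m=600)
would_increase = suggest_cpu_request(request_m=250, avg_m=50, p95_m=200, max_m=300)
```

The third call shows the reductions-only rule: average usage is low, but the observed max exceeds the current request, so no recommendation is produced.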


📏 Rightsizing Memory (RIGHTSIZING_MEMORY)


What: Pods with memory requests significantly higher than actual utilization.

Detection:

  • Average memory utilization < RIGHTSIZING_MEMORY_THRESHOLD × memory request (default: 30%)

Scope: workload (grouped per Deployment)


📈 Autoscaling Candidates (AUTOSCALING_CANDIDATE)


What: Workloads with high CPU usage variability that would benefit from autoscaling.

Detection:

  • Coefficient of Variation > AUTOSCALING_CV_THRESHOLD (default: 0.7)
  • Max/min ratio > AUTOSCALING_SPIKE_RATIO (default: 3.0)
  • No existing HPA detected

Scope: workload
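Both variability signals are simple to compute from a series of CPU samples. The sketch below uses the population standard deviation and reports the two signals separately, because how GreenKube combines them is not stated here; the function name and return shape are hypothetical.

```python
from statistics import mean, pstdev


def variability_signals(cpu_samples, cv_threshold=0.7, spike_ratio=3.0):
    """Compute coefficient of variation and max/min ratio for CPU samples."""
    avg = mean(cpu_samples)
    low, high = min(cpu_samples), max(cpu_samples)
    cv = pstdev(cpu_samples) / avg if avg > 0 else 0.0
    if low > 0:
        ratio = high / low
    else:
        # A zero minimum makes the ratio unbounded unless the series is all zeros.
        ratio = float("inf") if high > 0 else 1.0
    return {
        "cv": cv,
        "ratio": ratio,
        "high_cv": cv > cv_threshold,
        "spiky": ratio > spike_ratio,
    }


steady = variability_signals([100, 110, 90, 105])
bursty = variability_signals([10, 10, 10, 200])
```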


🌍 Carbon-Aware Scheduling (CARBON_AWARE_SCHEDULING)


What: Workloads running during high-carbon-intensity windows that could be shifted to cleaner periods.

Detection: Grid intensity > zone average × CARBON_AWARE_THRESHOLD (default: 1.5×)

Scope: pod / workload
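To make the threshold concrete, here is a small sketch that flags the hours whose grid intensity exceeds the average by the configured factor. It assumes the "zone average" is the mean of the supplied samples; the function name and input shape are illustrative.

```python
def high_carbon_hours(intensity_by_hour, threshold=1.5):
    """Return the hours whose intensity exceeds threshold times the average."""
    average = sum(intensity_by_hour.values()) / len(intensity_by_hour)
    return sorted(
        hour for hour, value in intensity_by_hour.items()
        if value > average * threshold
    )


# Intensity samples keyed by hour of day; the values are made up for the example.
sample = {0: 100, 6: 100, 12: 100, 18: 500}
flagged = high_carbon_hours(sample)
```

With an average of 200 and a factor of 1.5, only hour 18 crosses the cutoff of 300.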


🗂️ Idle Namespace Cleanup (IDLE_NAMESPACE)


What: Namespaces with minimal energy consumption.

Detection: Total namespace energy < IDLE_NAMESPACE_ENERGY_THRESHOLD (default: 1,000 J)

Scope: namespace
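A minimal sketch of the namespace check, assuming per-pod energy readings arrive as `(namespace, joules)` pairs (an illustrative input format):

```python
from collections import Counter


def idle_namespaces(pod_energy, threshold_joules=1000.0):
    """Sum energy per namespace and return those below the threshold."""
    totals = Counter()
    for namespace, joules in pod_energy:
        totals[namespace] += joules
    return sorted(ns for ns, total in totals.items() if total < threshold_joules)


readings = [
    ("prod", 50000.0),
    ("prod", 2000.0),
    ("sandbox", 300.0),
    ("sandbox", 250.0),
]
idle = idle_namespaces(readings)
```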


Off-Peak Scaling

What: Workloads with sustained idle periods during consistent time windows.

Detection: Usage stays below OFF_PEAK_IDLE_THRESHOLD (default: 5% of the daily peak) for at least OFF_PEAK_MIN_IDLE_HOURS (default: 4h) at consistent hours.

Output: Suggested CronJob/KEDA scale-down + scale-up schedule.

Scope: workload (grouped per Deployment)
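One way to picture the detection and the generated schedule: take 24 hourly usage values for a typical day, find the longest contiguous idle window (allowing it to wrap past midnight), and emit a pair of cron expressions. The input shape, the function name, and the cron output format are assumptions, not GreenKube's actual output.

```python
def off_peak_schedule(hourly_usage, idle_threshold=0.05, min_idle_hours=4):
    """Return (scale_down_cron, scale_up_cron), or None if no window qualifies.

    `hourly_usage` holds 24 values, one per hour of a typical day.
    """
    peak = max(hourly_usage)
    if peak <= 0:
        return None
    idle = [usage < idle_threshold * peak for usage in hourly_usage]

    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i in range(48):  # scan the day twice so a window can wrap past midnight
        if idle[i % 24]:
            if run_len == 0:
                run_start = i % 24
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0

    if best_len < min_idle_hours:
        return None
    end = (best_start + best_len) % 24
    return (f"0 {best_start} * * *", f"0 {end} * * *")


# Idle from 22:00 to 06:00, busy during the day.
usage = [1] * 6 + [100] * 16 + [1] * 2
schedule = off_peak_schedule(usage)
```

For this profile the sketch yields a scale-down at 22:00 and a scale-up at 06:00.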


🖥️ Overprovisioned Node (OVERPROVISIONED_NODE)


What: Nodes with far more capacity than their scheduled workloads require.

Detection: Node CPU utilization < NODE_UTILIZATION_THRESHOLD (default: 20%)

Scope: node


💤 Underutilized Node (UNDERUTILIZED_NODE)


What: Nodes running at very low CPU and memory utilization — consolidation candidates.

Detection: Node has fewer than 3 pods and average CPU utilization below 15%.

Scope: node
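The two node-level checks can be sketched side by side. The function below is hypothetical and evaluates each rule independently; whether a single node can receive both recommendation types at once is not stated here.

```python
def classify_node(cpu_utilization, pod_count,
                  overprovisioned_below=0.20, underutilized_below=0.15, max_pods=3):
    """Return the node-level recommendation types that match a node."""
    labels = []
    if cpu_utilization < overprovisioned_below:
        labels.append("OVERPROVISIONED_NODE")
    if pod_count < max_pods and cpu_utilization < underutilized_below:
        labels.append("UNDERUTILIZED_NODE")
    return labels


nearly_empty = classify_node(cpu_utilization=0.10, pod_count=2)
crowded_but_idle = classify_node(cpu_utilization=0.18, pod_count=10)
healthy = classify_node(cpu_utilization=0.50, pod_count=1)
```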


Each recommendation is persisted in the database and reconciled across scans.

active -> applied
  |
  +-> ignored -> active
  |
  +-> stale
| Status | Meaning |
| --- | --- |
| active | Current valid recommendation, shown in active lists, top recommendations, Prometheus active gauges, Grafana cards, and the frontend Active tab |
| applied | Recommendation explicitly implemented through the API; excluded from active lists and included in realized savings |
| ignored | Recommendation intentionally hidden with an audit reason; preserved for review and can be restored |
| stale | Previously active recommendation that no longer appears in the latest generated set |
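The lifecycle can be encoded as a small transition table. This sketch includes only the transitions drawn in the diagram; the `ALLOWED` mapping and `transition` helper are illustrative, and GreenKube's reconciliation may permit other moves.

```python
# Allowed status changes, taken from the lifecycle diagram.
ALLOWED = {
    "active": {"applied", "ignored", "stale"},
    "ignored": {"active"},
    "applied": set(),
    "stale": set(),
}


def transition(current, target):
    """Return the new status, or raise if the move is not in the diagram."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move {current!r} -> {target!r}")
    return target


status = transition("active", "ignored")
status = transition(status, "active")  # restore an ignored recommendation
```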

On the /recommendations page:

  • Active tab — current actionable recommendations with type filter and annual savings summary
  • Ignored tab — previously ignored recommendations with restore action
  • Realized Savings tab — applied recommendations and cumulative realized savings
  • Ignore flow — ignore requires a reason; restore is available from the Ignored tab

The current frontend does not expose an Apply button yet, even though the API supports applying recommendations.

# Mark a recommendation as applied
PATCH /api/v1/recommendations/{id}/apply
# Ignore with a reason (can be restored later)
PATCH /api/v1/recommendations/{id}/ignore
# Restore an ignored recommendation
DELETE /api/v1/recommendations/{id}/ignore
# Get active recommendations (optionally trigger a live refresh)
GET /api/v1/recommendations/active?refresh=true
# Get applied recommendations
GET /api/v1/recommendations/applied
# Get ranked top recommendations
GET /api/v1/recommendations/top?limit=5&metric=co2
# Get savings summary
GET /api/v1/recommendations/savings
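For scripting against these endpoints, a thin request builder using only the standard library might look like the sketch below. The base URL, the recommendation ID `42`, and the `reason` field in the ignore payload are assumptions; the payload shape is not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def build_request(method, path, params=None, body=None):
    """Build (but do not send) a request against the recommendations API."""
    url = f"{BASE_URL}/api/v1/recommendations{path}"
    if params:
        url += "?" + urlencode(params)
    data = json.dumps(body).encode() if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


top = build_request("GET", "/top", params={"limit": 5, "metric": "co2"})
# Hypothetical payload: the field name "reason" is a guess.
ignore = build_request("PATCH", "/42/ignore", body={"reason": "planned migration"})
```

Pass the resulting object to `urllib.request.urlopen` to send it.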

When a recommendation is marked applied, GreenKube estimates realized annual savings on the recommendation row, then the SavingsAttributor converts those annual values into per-period ledger rows. Over time this accumulates into:

  • greenkube_co2e_savings_attributed_grams_total — cumulative CO₂e savings (Prometheus gauge)
  • greenkube_cost_savings_attributed_dollars_total — cumulative cost savings (Prometheus gauge)
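The conversion from an annual estimate to per-period ledger rows can be thought of as proportional scaling. The sketch below assumes a 365-day year and linear attribution; the real SavingsAttributor may differ.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def attribute_period(annual_savings, period_seconds):
    """Share of an annual savings estimate that falls into one period."""
    return annual_savings * period_seconds / SECONDS_PER_YEAR


# 8,760 g CO2e per year spread over hourly periods.
hourly = attribute_period(8760.0, 3600)
first_day = sum(attribute_period(8760.0, 3600) for _ in range(24))
```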

The savings ledger is visible in:

  • The Grafana dashboard’s Impact Command Center section (attributed savings timeline)
  • The /api/v1/recommendations/savings endpoint

For fixed Grafana windows, prefer the pre-computed dashboard gauges:

  • greenkube_dashboard_savings_co2e_grams_total
  • greenkube_dashboard_savings_cost_dollars_total

GreenKube also exposes a ranked view of active recommendations with positive projected savings:

GET /api/v1/recommendations/top?limit=5&metric=co2
GET /api/v1/recommendations/top?limit=10&metric=cost&refresh=true
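Conceptually, the ranking filters out recommendations without positive projected savings, sorts by the chosen metric, and truncates to the limit. A sketch using the savings field names listed earlier, with recommendations as plain dictionaries (an illustrative representation):

```python
def top_recommendations(recs, metric="co2", limit=5):
    """Rank recommendations with positive projected savings by the chosen metric."""
    key = "potential_savings_co2e_grams" if metric == "co2" else "potential_savings_cost"
    positive = [r for r in recs if r.get(key, 0) > 0]
    return sorted(positive, key=lambda r: r[key], reverse=True)[:limit]


recs = [
    {"id": "a", "potential_savings_co2e_grams": 500, "potential_savings_cost": 1.0},
    {"id": "b", "potential_savings_co2e_grams": 0, "potential_savings_cost": 9.0},
    {"id": "c", "potential_savings_co2e_grams": 1200, "potential_savings_cost": 0.5},
]
by_co2 = [r["id"] for r in top_recommendations(recs, metric="co2")]
by_cost = [r["id"] for r in top_recommendations(recs, metric="cost", limit=1)]
```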

This API powers:

  • The Grafana Actionable Recommendations row
  • The greenkube_top_recommendations Prometheus gauge
  • CO₂e-first or cost-first ranking via metric=co2|cost

CLI usage:
# Get all recommendations (reads from database)
greenkube recommend
# Filter by namespace
greenkube recommend -n production
# Explicit lookback window
greenkube recommend --last 7d
# Live collection (re-runs the full collection pipeline)
greenkube recommend --live
# CI/CD gate: exit 1 if any recommendations exist
greenkube recommend --fail-on-recommendations

API endpoints:
# Live recommendations (runs the recommender)
GET /api/v1/recommendations
# Persisted active recommendations
GET /api/v1/recommendations/active
# Persisted applied recommendations
GET /api/v1/recommendations/applied
# Ranked active recommendations by projected annual savings
GET /api/v1/recommendations/top
# History
GET /api/v1/recommendations/history

All thresholds are configurable via Helm values.yaml or environment variables:

config:
  recommendations:
    lookbackDays: 7                  # Analysis window (days) used by API/startup scan
    rightsizingCpuThreshold: 0.3     # Usage below 30% of request triggers CPU rightsizing
    rightsizingMemoryThreshold: 0.3  # Usage below 30% of request triggers memory rightsizing
    rightsizingHeadroom: 1.2         # 20% safety margin on new request
    zombieCostThreshold: 0.01        # $0.01/day minimum cost to flag
    zombieEnergyThreshold: 1000      # Energy below 1,000 J flags a zombie pod
    autoscalingCvThreshold: 0.7      # Coefficient of variation above this is high variability
    autoscalingSpikeRatio: 3.0       # 3× max/min ratio
    carbonAwareThreshold: 1.5        # 1.5× average intensity
    nodeUtilizationThreshold: 0.2    # CPU below 20% flags an overprovisioned node
    offPeakIdleThreshold: 0.05       # 5% of daily peak
    offPeakMinIdleHours: 4           # 4h idle to suggest off-peak scaling
    minCpuMillicores: 10             # Floor for generated CPU requests
    minMemoryBytes: 16777216         # Floor for generated memory requests (16 MiB)
    applyTolerance: 0.25             # Reserved for future auto-apply detection

Adjust these based on your cluster size, workload patterns, and organizational priorities.
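To illustrate how environment-variable overrides typically layer on top of defaults, here is a small sketch using a few of the variable names mentioned on this page. The loader itself is hypothetical and is not how GreenKube reads its configuration.

```python
import os

# Defaults as documented on this page; only a subset is shown.
DEFAULTS = {
    "RECOMMENDATION_LOOKBACK_DAYS": 7,
    "RIGHTSIZING_CPU_THRESHOLD": 0.3,
    "AUTOSCALING_CV_THRESHOLD": 0.7,
    "OFF_PEAK_MIN_IDLE_HOURS": 4,
}


def load_thresholds(env=os.environ):
    """Read overrides from the environment, keeping each default's type."""
    return {
        name: type(default)(env.get(name, default))
        for name, default in DEFAULTS.items()
    }


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_thresholds({"RIGHTSIZING_CPU_THRESHOLD": "0.25"})
```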