Your CAPTCHA solving pipeline targets a 95% success rate. Last week it came in at 94.2%. Is that a problem? Without an error budget, you cannot answer that question quantitatively. An error budget tells you exactly how many failures you can tolerate before reliability drops below your SLO, and what to do once the budget is exhausted.
Error Budget Basics
| Concept | Definition | Example |
|---|---|---|
| SLO | Target success rate | 95% of solves succeed |
| Error budget | Allowed failure rate | 5% of all solves may fail |
| Burn rate | How fast the budget is consumed | 2× means the budget runs out halfway through the window |
| Window | Measurement period | Rolling 24 hours or 7 days |
If your SLO is 95% over a 24-hour window with 10,000 solves, your error budget is 500 failures (5% of 10,000). Once you reach 500 failures, new deployments and risky changes are put on hold.
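The arithmetic above can be sketched in a few lines of Python. The function names `allowed_failures` and `budget_remaining` are illustrative only, and the 580-failure figure assumes last week's 94.2% was measured over 10,000 solves, which the text does not state.

```python
def allowed_failures(slo_target: float, total_solves: int) -> float:
    """Error budget expressed as a failure count for a given volume."""
    return total_solves * (1.0 - slo_target)


def budget_remaining(slo_target: float, total_solves: int, failures: int) -> float:
    """Failures still tolerable before the SLO is breached (never negative)."""
    return max(0.0, allowed_failures(slo_target, total_solves) - failures)


# 95% SLO over 10,000 solves: the budget is about 500 failures.
print(round(allowed_failures(0.95, 10_000)))

# Assumed volume: 94.2% success over 10,000 solves is 580 failures,
# which is more than the budget allows, so nothing remains.
print(budget_remaining(0.95, 10_000, 580))
```

Under that assumption, a 94.2% week overspends the budget by roughly 80 failures, which is the quantitative answer the opening question asks for.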
Python: Error Budget Tracker
```python
import threading
import time
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

import requests

API_KEY = "YOUR_API_KEY"


class BudgetStatus(Enum):
    HEALTHY = "healthy"      # Budget >= 50% remaining
    WARNING = "warning"      # Budget 10-50% remaining
    CRITICAL = "critical"    # Budget < 10% remaining
    EXHAUSTED = "exhausted"  # Budget depleted


@dataclass
class SLOConfig:
    """Service Level Objective configuration."""
    target_success_rate: float = 0.95  # 95%
    window_seconds: int = 86400        # 24 hours
    warning_threshold: float = 0.50    # Alert below 50% budget remaining
    critical_threshold: float = 0.10   # Alert below 10% budget remaining


@dataclass
class ErrorBudgetEvent:
    timestamp: float
    success: bool


class ErrorBudgetTracker:
    """Tracks error budget consumption for CAPTCHA solving."""

    def __init__(self, config: Optional[SLOConfig] = None):
        # Create a fresh config per tracker instead of sharing one default instance.
        self.config = config or SLOConfig()
        self._events: deque[ErrorBudgetEvent] = deque()
        # Reentrant lock: record() reads properties that acquire the lock again.
        self._lock = threading.RLock()
        self._callbacks: dict[BudgetStatus, list[Callable[[dict], None]]] = {
            status: [] for status in BudgetStatus
        }
        self._last_status = BudgetStatus.HEALTHY

    def on_status_change(self, status: BudgetStatus, callback: Callable[[dict], None]):
        """Register a callback for status transitions."""
        self._callbacks[status].append(callback)

    def record(self, success: bool):
        """Record a solve attempt and fire callbacks on a status transition."""
        now = time.monotonic()
        with self._lock:
            self._events.append(ErrorBudgetEvent(timestamp=now, success=success))
            self._prune(now)
            new_status = self._compute_status()
            changed = new_status != self._last_status
            self._last_status = new_status
        if not changed:
            return
        # Run callbacks outside the lock so a slow handler cannot block recorders.
        report = self.get_report()
        for cb in self._callbacks.get(new_status, []):
            try:
                cb(report)
            except Exception as e:
                print(f"[BUDGET] Callback error: {e}")

    def _prune(self, now: float):
        """Remove events outside the window."""
        cutoff = now - self.config.window_seconds
        while self._events and self._events[0].timestamp < cutoff:
            self._events.popleft()

    def _compute_status(self) -> BudgetStatus:
        remaining = self.remaining_fraction
        if remaining <= 0:
            return BudgetStatus.EXHAUSTED
        if remaining < self.config.critical_threshold:
            return BudgetStatus.CRITICAL
        if remaining < self.config.warning_threshold:
            return BudgetStatus.WARNING
        return BudgetStatus.HEALTHY

    @property
    def status(self) -> BudgetStatus:
        """Current status, recomputed after dropping expired events.

        Without this, an exhausted tracker that receives no new events
        would stay exhausted even after the window has rolled over.
        """
        with self._lock:
            self._prune(time.monotonic())
            return self._compute_status()

    @property
    def total_events(self) -> int:
        with self._lock:
            return len(self._events)

    @property
    def success_count(self) -> int:
        with self._lock:
            return sum(1 for e in self._events if e.success)

    @property
    def failure_count(self) -> int:
        with self._lock:
            return sum(1 for e in self._events if not e.success)

    @property
    def current_success_rate(self) -> float:
        total = self.total_events
        return self.success_count / total if total > 0 else 1.0

    @property
    def error_budget_total(self) -> float:
        """Total allowed failures in the window."""
        return self.total_events * (1 - self.config.target_success_rate)

    @property
    def error_budget_remaining(self) -> float:
        """Remaining failure allowance."""
        return max(0.0, self.error_budget_total - self.failure_count)

    @property
    def remaining_fraction(self) -> float:
        """Fraction of error budget remaining (0.0 to 1.0)."""
        budget = self.error_budget_total
        if budget <= 0:
            return 1.0 if self.failure_count == 0 else 0.0
        return max(0.0, self.error_budget_remaining / budget)

    @property
    def burn_rate(self) -> float:
        """Observed failures divided by allowed failures (1.0 = exactly on budget)."""
        expected_failures = self.total_events * (1 - self.config.target_success_rate)
        if expected_failures <= 0:
            return 0.0
        return self.failure_count / expected_failures

    def get_report(self) -> dict:
        return {
            "status": self._last_status.value,
            "slo_target": self.config.target_success_rate,
            "current_rate": round(self.current_success_rate, 4),
            "total_events": self.total_events,
            "successes": self.success_count,
            "failures": self.failure_count,
            "budget_total": round(self.error_budget_total, 1),
            "budget_remaining": round(self.error_budget_remaining, 1),
            "budget_remaining_pct": round(self.remaining_fraction * 100, 1),
            "burn_rate": round(self.burn_rate, 2),
        }


# --- Integration with solver ---

budget = ErrorBudgetTracker(SLOConfig(
    target_success_rate=0.95,
    window_seconds=3600,  # 1-hour window for demo
))

# Register alerts
budget.on_status_change(
    BudgetStatus.WARNING,
    lambda r: print(f"[ALERT] Budget warning: {r['budget_remaining_pct']}% remaining"),
)
budget.on_status_change(
    BudgetStatus.CRITICAL,
    lambda r: print(f"[ALERT] Budget critical: {r['budget_remaining_pct']}% remaining"),
)
budget.on_status_change(
    BudgetStatus.EXHAUSTED,
    lambda r: print("[ALERT] Budget EXHAUSTED: throttle new requests"),
)


def solve_with_budget(params: dict) -> str:
    """Solve CAPTCHA while tracking error budget."""
    if budget.status == BudgetStatus.EXHAUSTED:
        raise RuntimeError("Error budget exhausted: solving paused")

    try:
        submit_params = {**params, "key": API_KEY, "json": 1}
        resp = requests.post(
            "https://ocr.captchaai.com/in.php", data=submit_params, timeout=30
        ).json()
        if resp.get("status") != 1:
            raise RuntimeError(f"Submit: {resp.get('request')}")

        task_id = resp["request"]
        start = time.monotonic()
        while time.monotonic() - start < 180:
            time.sleep(5)
            poll = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": API_KEY, "action": "get", "id": task_id, "json": 1,
            }, timeout=15).json()
            if poll.get("request") == "CAPCHA_NOT_READY":
                continue
            if poll.get("status") == 1:
                budget.record(True)
                return poll["request"]
            raise RuntimeError(f"Solve: {poll.get('request')}")
        raise RuntimeError("Timeout")
    except Exception:
        # The only place failures are counted, so each one is recorded exactly once.
        budget.record(False)
        raise


# Usage
for i in range(100):
    try:
        token = solve_with_budget({
            "method": "turnstile",
            "sitekey": "0x4XXXXXXXXXXXXXXXXX",
            "pageurl": "https://example.com",
        })
    except RuntimeError as e:
        if "exhausted" in str(e):
            print(f"Stopped at iteration {i}")
            break
    except requests.RequestException as e:
        # Network failures are already recorded against the budget above.
        print(f"Network error: {e}")

print(budget.get_report())
```
JavaScript: Error Budget Tracker
```javascript
class ErrorBudgetTracker {
  #events = [];
  #config;
  #callbacks = {};

  constructor(config = {}) {
    // ?? keeps an explicit 0 (e.g. criticalThreshold: 0) instead of replacing it.
    this.#config = {
      targetRate: config.targetRate ?? 0.95,
      windowMs: config.windowMs ?? 3600_000,
      warningThreshold: config.warningThreshold ?? 0.5,
      criticalThreshold: config.criticalThreshold ?? 0.1,
    };
    this.lastStatus = "healthy";
  }

  // Register a callback for a status transition.
  on(status, callback) {
    this.#callbacks[status] = this.#callbacks[status] || [];
    this.#callbacks[status].push(callback);
  }

  // Record one solve attempt and fire callbacks if the status changed.
  record(success) {
    const now = Date.now();
    this.#events.push({ time: now, success });
    this.#prune(now);

    const newStatus = this.#computeStatus();
    if (newStatus !== this.lastStatus) {
      this.lastStatus = newStatus;
      for (const cb of this.#callbacks[newStatus] || []) {
        cb(this.report());
      }
    }
  }

  // Drop events that have fallen out of the rolling window.
  #prune(now) {
    const cutoff = now - this.#config.windowMs;
    while (this.#events.length && this.#events[0].time < cutoff) {
      this.#events.shift();
    }
  }

  #computeStatus() {
    const frac = this.remainingFraction;
    if (frac <= 0) return "exhausted";
    if (frac < this.#config.criticalThreshold) return "critical";
    if (frac < this.#config.warningThreshold) return "warning";
    return "healthy";
  }

  get total() { return this.#events.length; }
  get successes() { return this.#events.filter((e) => e.success).length; }
  get failures() { return this.#events.filter((e) => !e.success).length; }
  get currentRate() { return this.total ? this.successes / this.total : 1; }

  // Total allowed failures for the events currently in the window.
  get budgetTotal() {
    return this.total * (1 - this.#config.targetRate);
  }

  get budgetRemaining() {
    return Math.max(0, this.budgetTotal - this.failures);
  }

  get remainingFraction() {
    const bt = this.budgetTotal;
    if (bt <= 0) return this.failures === 0 ? 1 : 0;
    return Math.max(0, this.budgetRemaining / bt);
  }

  // Observed failures divided by allowed failures (1.0 = exactly on budget).
  get burnRate() {
    const expected = this.total * (1 - this.#config.targetRate);
    return expected > 0 ? this.failures / expected : 0;
  }

  report() {
    return {
      status: this.lastStatus,
      currentRate: Math.round(this.currentRate * 10000) / 10000,
      total: this.total,
      failures: this.failures,
      budgetRemainingPct: Math.round(this.remainingFraction * 1000) / 10,
      burnRate: Math.round(this.burnRate * 100) / 100,
    };
  }
}

// Usage
const budget = new ErrorBudgetTracker({ targetRate: 0.95, windowMs: 3600_000 });

budget.on("warning", (r) => console.log(`[WARN] ${r.budgetRemainingPct}% budget left`));
budget.on("exhausted", () => console.log("[ALERT] Budget exhausted!"));

// Record results from your solver
budget.record(true);  // success
budget.record(false); // failure

console.log(budget.report());
```
Burn Rate Alerts
| Burn rate | Meaning | Action |
|---|---|---|
| < 1.0 | Budget is consumed more slowly than expected | No action needed |
| 1.0 | On track to exhaust the budget exactly at the end of the window | Monitor closely |
| 2.0 | Budget runs out halfway through the window | Investigate and slow down |
| 5.0+ | Budget is being consumed very quickly | Pause non-critical solves |
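The table can be turned into a small lookup. This is a sketch, not a prescribed policy: it assumes each row is a lower bound (so a burn rate of 3.0 falls under the 2.0 row), and the names `hours_to_exhaustion` and `action_for` are hypothetical.

```python
def hours_to_exhaustion(burn_rate: float, window_hours: float) -> float:
    """Hours until a full budget is spent if the burn rate stays constant."""
    if burn_rate <= 0:
        return float("inf")  # nothing is being consumed
    return window_hours / burn_rate


def action_for(burn_rate: float) -> str:
    """Map a burn rate to the action in the table (rows treated as lower bounds)."""
    if burn_rate >= 5.0:
        return "pause non-critical solves"
    if burn_rate >= 2.0:
        return "investigate and slow down"
    if burn_rate >= 1.0:
        return "monitor closely"
    return "no action needed"


# A 2x burn rate on a 24-hour window spends the budget in half the window.
print(hours_to_exhaustion(2.0, 24), action_for(2.0))
```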
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Budget runs out too quickly | SLO is too strict for actual conditions | Set a realistic SLO based on historical data |
| Budget is never exhausted | SLO is too loose | Tighten the SLO to drive reliability improvements |
| Status fluctuates | Window is too short | Use a longer measurement window (24 hours instead of 1 hour) |
| Burn rate is misleading at low volume | A handful of events skews the calculation | Require a minimum event count before computing burn rate |
| Tracker memory keeps growing | Events are not pruned | Make sure _prune runs on every record() call |
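The low-volume row is worth a concrete look: with a 95% SLO, one failed solve out of one gives a burn rate of about 20. A minimal guard, assuming a hypothetical `min_events` threshold of 100 that you would tune to your own traffic, could look like this:

```python
from typing import Optional


def guarded_burn_rate(
    failures: int,
    total: int,
    slo_target: float = 0.95,
    min_events: int = 100,  # assumed threshold, not from the article
) -> Optional[float]:
    """Return the burn rate, or None when there is too little data to trust it."""
    if total < min_events:
        return None
    allowed = total * (1.0 - slo_target)
    if allowed <= 0:
        return None  # a 100% SLO leaves no budget to burn
    return failures / allowed


print(guarded_burn_rate(1, 1))     # too few events: no burn rate reported
print(guarded_burn_rate(20, 200))  # enough events: roughly 2x burn
```

Returning `None` rather than `0.0` lets the caller tell "no data yet" apart from "no failures".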
Frequently Asked Questions
What is a realistic SLO for CAPTCHA solving?
It depends on the CAPTCHA type. reCAPTCHA v2 typically reaches a 90–95% success rate. Turnstile may be higher. Image CAPTCHAs vary. Start by measuring your current success rate, then set the SLO 2–3 percentage points below that baseline to create a meaningful error budget.
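As a rough illustration of that rule of thumb, the sketch below derives an SLO from measured daily success rates. Using the mean as the baseline and the name `suggest_slo` are assumptions made for the example; a percentile or a worst-day figure may suit your data better.

```python
def suggest_slo(daily_success_rates: list[float], margin_points: float = 2.5) -> float:
    """Suggest an SLO a few percentage points below the measured baseline."""
    if not daily_success_rates:
        raise ValueError("need at least one measurement")
    baseline = sum(daily_success_rates) / len(daily_success_rates)
    # margin_points is in percentage points, so 2.5 means 0.025.
    return round(max(0.0, baseline - margin_points / 100.0), 4)


# Hypothetical measurements averaging 96%; a 2-point margin suggests about 94%.
print(suggest_slo([0.96, 0.97, 0.95], margin_points=2.0))
```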
What should you do when the error budget is exhausted?
Options from mildest to most aggressive: alert the team, throttle new requests, pause non-critical solves, switch to manual CAPTCHA handling. Never silently ignore an exhausted budget.
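One possible way to wire the last two options into a request path is sketched below. The function `route_request`, its return values, and the idea of tagging requests as critical are all hypothetical; the point is only that an exhausted budget changes what happens to each request instead of being ignored.

```python
def route_request(remaining_fraction: float, is_critical: bool) -> str:
    """Return 'solve', 'defer', or 'manual' for one incoming CAPTCHA."""
    if remaining_fraction > 0.0:
        return "solve"
    # Budget exhausted: critical work goes to manual handling,
    # everything else waits until the budget recovers.
    return "manual" if is_critical else "defer"


print(route_request(0.4, is_critical=False))
print(route_request(0.0, is_critical=True))
print(route_request(0.0, is_critical=False))
```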
How do you handle error budgets for multiple CAPTCHA types?
Track a separate budget per type. reCAPTCHA might have a 93% SLO while Turnstile has 97%. Combining them into a single budget hides type-specific problems.
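A minimal sketch of per-type tracking follows. It uses the 93% and 97% targets mentioned above, but the dictionary keys and the `PerTypeBudget` class are illustrative, and it omits the rolling window for brevity.

```python
from collections import defaultdict

# Targets from the text; the key names are hypothetical labels.
SLO_BY_TYPE = {"recaptcha_v2": 0.93, "turnstile": 0.97}


class PerTypeBudget:
    """Keeps one failure allowance per CAPTCHA type (no time window)."""

    def __init__(self, slo_by_type: dict[str, float]):
        self.slo_by_type = slo_by_type
        self.totals: defaultdict[str, int] = defaultdict(int)
        self.failures: defaultdict[str, int] = defaultdict(int)

    def record(self, captcha_type: str, success: bool) -> None:
        self.totals[captcha_type] += 1
        if not success:
            self.failures[captcha_type] += 1

    def remaining(self, captcha_type: str) -> float:
        """Failures still allowed for this type before its SLO is breached."""
        allowed = self.totals[captcha_type] * (1.0 - self.slo_by_type[captcha_type])
        return max(0.0, allowed - self.failures[captcha_type])


budgets = PerTypeBudget(SLO_BY_TYPE)
for i in range(100):
    budgets.record("turnstile", success=i >= 2)      # 2 failures in 100
    budgets.record("recaptcha_v2", success=i >= 10)  # 10 failures in 100

# Turnstile still has headroom while reCAPTCHA v2 is already over budget,
# a difference a single combined budget would blur.
print(budgets.remaining("turnstile"), budgets.remaining("recaptcha_v2"))
```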
Related Articles
- Serverless CAPTCHA Solve Tracking with DynamoDB
Next Steps
Track your CAPTCHA solving reliability quantitatively: get your CaptchaAI API key and implement error budget tracking.
Related guides:
- Circuit Breaker Pattern for CAPTCHA API Calls
- Health Check Endpoints for CAPTCHA Workers
- Monitoring CAPTCHA Solve Rate with Prometheus and Grafana