Tutorial

Pelacakan Error Budget untuk Keandalan Solve CAPTCHA

Pipeline solve CAPTCHA Anda menargetkan success rate 95%. Minggu lalu hasilnya 94,2%. Apakah itu masalah? Tanpa error budget, Anda tidak dapat menjawab pertanyaan itu secara kuantitatif. Error budget memberi tahu Anda tepat berapa banyak kegagalan yang bisa Anda toleransi sebelum keandalan berada di bawah SLO Anda — dan apa yang harus dilakukan saat budget habis.

Dasar-dasar Error Budget

Konsep Definisi Contoh
SLO Target success rate 95% solve berhasil
Error budget Tingkat kegagalan yang diizinkan 5% dari total solve boleh gagal
Burn rate Seberapa cepat budget dikonsumsi 2× berarti budget habis di setengah window
Window Periode pengukuran Rolling 24 jam atau 7 hari

Jika SLO Anda 95% dalam window 24 jam dengan 10.000 solve, error budget Anda adalah 500 kegagalan. Setelah mencapai 500 kegagalan, deployment baru atau perubahan berisiko dihentikan.

Python: Error Budget Tracker

import time
import threading
from dataclasses import dataclass, field
from collections import deque
from enum import Enum

API_KEY = "YOUR_API_KEY"


class BudgetStatus(Enum):
    HEALTHY = "healthy"          # Budget > 50% remaining
    WARNING = "warning"          # Budget 10-50% remaining
    CRITICAL = "critical"        # Budget < 10% remaining
    EXHAUSTED = "exhausted"      # Budget depleted


@dataclass
class SLOConfig:
    """Service Level Objective configuration."""
    target_success_rate: float = 0.95  # 95%
    window_seconds: int = 86400        # 24 hours
    warning_threshold: float = 0.50    # Alert at 50% budget
    critical_threshold: float = 0.10   # Alert at 10% budget


@dataclass
class ErrorBudgetEvent:
    timestamp: float
    success: bool


class ErrorBudgetTracker:
    """Tracks error budget consumption for CAPTCHA solving."""

    def __init__(self, config: SLOConfig = SLOConfig()):
        self.config = config
        self._events: deque[ErrorBudgetEvent] = deque()
        self._lock = threading.Lock()
        self._callbacks: dict[BudgetStatus, list[callable]] = {
            status: [] for status in BudgetStatus
        }
        self._last_status = BudgetStatus.HEALTHY

    def on_status_change(self, status: BudgetStatus, callback: callable):
        """Register a callback for status transitions."""
        self._callbacks[status].append(callback)

    def record(self, success: bool):
        """Record a solve attempt."""
        now = time.monotonic()
        event = ErrorBudgetEvent(timestamp=now, success=success)

        with self._lock:
            self._events.append(event)
            self._prune(now)
            new_status = self._compute_status()

            if new_status != self._last_status:
                self._last_status = new_status
                for cb in self._callbacks.get(new_status, []):
                    try:
                        cb(self.get_report())
                    except Exception as e:
                        print(f"[BUDGET] Callback error: {e}")

    def _prune(self, now: float):
        """Remove events outside the window."""
        cutoff = now - self.config.window_seconds
        while self._events and self._events[0].timestamp < cutoff:
            self._events.popleft()

    def _compute_status(self) -> BudgetStatus:
        remaining = self.remaining_fraction
        if remaining <= 0:
            return BudgetStatus.EXHAUSTED
        if remaining < self.config.critical_threshold:
            return BudgetStatus.CRITICAL
        if remaining < self.config.warning_threshold:
            return BudgetStatus.WARNING
        return BudgetStatus.HEALTHY

    @property
    def total_events(self) -> int:
        with self._lock:
            return len(self._events)

    @property
    def success_count(self) -> int:
        with self._lock:
            return sum(1 for e in self._events if e.success)

    @property
    def failure_count(self) -> int:
        with self._lock:
            return sum(1 for e in self._events if not e.success)

    @property
    def current_success_rate(self) -> float:
        total = self.total_events
        return self.success_count / total if total > 0 else 1.0

    @property
    def error_budget_total(self) -> float:
        """Total allowed failures in the window."""
        total = self.total_events
        if total == 0:
            return 0
        return total * (1 - self.config.target_success_rate)

    @property
    def error_budget_remaining(self) -> float:
        """Remaining failure allowance."""
        return max(0, self.error_budget_total - self.failure_count)

    @property
    def remaining_fraction(self) -> float:
        """Fraction of error budget remaining (0.0 to 1.0)."""
        budget = self.error_budget_total
        if budget <= 0:
            return 1.0 if self.failure_count == 0 else 0.0
        return max(0, self.error_budget_remaining / budget)

    @property
    def burn_rate(self) -> float:
        """How fast the budget is being consumed (1.0 = normal, 2.0 = 2× faster)."""
        total = self.total_events
        if total == 0:
            return 0.0
        expected_failures = total * (1 - self.config.target_success_rate)
        if expected_failures == 0:
            return 0.0
        return self.failure_count / expected_failures

    def get_report(self) -> dict:
        return {
            "status": self._last_status.value,
            "slo_target": self.config.target_success_rate,
            "current_rate": round(self.current_success_rate, 4),
            "total_events": self.total_events,
            "successes": self.success_count,
            "failures": self.failure_count,
            "budget_total": round(self.error_budget_total, 1),
            "budget_remaining": round(self.error_budget_remaining, 1),
            "budget_remaining_pct": round(self.remaining_fraction * 100, 1),
            "burn_rate": round(self.burn_rate, 2),
        }


# --- Integration with solver ---

budget = ErrorBudgetTracker(SLOConfig(
    target_success_rate=0.95,
    window_seconds=3600,  # 1-hour window for demo
))

# Register alerts
budget.on_status_change(BudgetStatus.WARNING, lambda r:
    print(f"[ALERT] Budget warning: {r['budget_remaining_pct']}% remaining"))

budget.on_status_change(BudgetStatus.CRITICAL, lambda r:
    print(f"[ALERT] Budget critical: {r['budget_remaining_pct']}% remaining"))

budget.on_status_change(BudgetStatus.EXHAUSTED, lambda r:
    print(f"[ALERT] Budget EXHAUSTED — throttle new requests"))


def solve_with_budget(params: dict) -> str:
    """Solve CAPTCHA while tracking error budget."""
    import requests

    if budget._last_status == BudgetStatus.EXHAUSTED:
        raise RuntimeError("Error budget exhausted — solving paused")

    try:
        submit_params = {**params, "key": API_KEY, "json": 1}
        resp = requests.post(
            "https://ocr.captchaai.com/in.php", data=submit_params, timeout=30
        ).json()
        if resp.get("status") != 1:
            budget.record(False)
            raise RuntimeError(f"Submit: {resp.get('request')}")

        task_id = resp["request"]
        start = time.monotonic()
        while time.monotonic() - start < 180:
            time.sleep(5)
            poll = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": API_KEY, "action": "get", "id": task_id, "json": 1,
            }, timeout=15).json()

            if poll.get("request") == "CAPCHA_NOT_READY":
                continue
            if poll.get("status") == 1:
                budget.record(True)
                return poll["request"]

            budget.record(False)
            raise RuntimeError(f"Solve: {poll.get('request')}")

        budget.record(False)
        raise RuntimeError("Timeout")

    except Exception:
        budget.record(False)
        raise


# Usage
for i in range(100):
    try:
        token = solve_with_budget({
            "method": "turnstile",
            "sitekey": "0x4XXXXXXXXXXXXXXXXX",
            "pageurl": "https://example.com",
        })
    except RuntimeError as e:
        if "exhausted" in str(e):
            print(f"Stopped at iteration {i}")
            break

print(budget.get_report())

JavaScript: Error Budget Tracker

class ErrorBudgetTracker {
  #events = [];
  #config;
  #callbacks = {};

  constructor(config = {}) {
    this.#config = {
      targetRate: config.targetRate || 0.95,
      windowMs: config.windowMs || 3600_000,
      warningThreshold: config.warningThreshold || 0.5,
      criticalThreshold: config.criticalThreshold || 0.1,
    };
    this.lastStatus = "healthy";
  }

  on(status, callback) {
    this.#callbacks[status] = this.#callbacks[status] || [];
    this.#callbacks[status].push(callback);
  }

  record(success) {
    const now = Date.now();
    this.#events.push({ time: now, success });
    this.#prune(now);

    const newStatus = this.#computeStatus();
    if (newStatus !== this.lastStatus) {
      this.lastStatus = newStatus;
      for (const cb of this.#callbacks[newStatus] || []) {
        cb(this.report());
      }
    }
  }

  #prune(now) {
    const cutoff = now - this.#config.windowMs;
    while (this.#events.length && this.#events[0].time < cutoff) {
      this.#events.shift();
    }
  }

  #computeStatus() {
    const frac = this.remainingFraction;
    if (frac <= 0) return "exhausted";
    if (frac < this.#config.criticalThreshold) return "critical";
    if (frac < this.#config.warningThreshold) return "warning";
    return "healthy";
  }

  get total() { return this.#events.length; }
  get successes() { return this.#events.filter((e) => e.success).length; }
  get failures() { return this.#events.filter((e) => !e.success).length; }
  get currentRate() { return this.total ? this.successes / this.total : 1; }

  get budgetTotal() {
    return this.total * (1 - this.#config.targetRate);
  }

  get budgetRemaining() {
    return Math.max(0, this.budgetTotal - this.failures);
  }

  get remainingFraction() {
    const bt = this.budgetTotal;
    if (bt <= 0) return this.failures === 0 ? 1 : 0;
    return Math.max(0, this.budgetRemaining / bt);
  }

  get burnRate() {
    const expected = this.total * (1 - this.#config.targetRate);
    return expected > 0 ? this.failures / expected : 0;
  }

  report() {
    return {
      status: this.lastStatus,
      currentRate: Math.round(this.currentRate * 10000) / 10000,
      total: this.total,
      failures: this.failures,
      budgetRemainingPct: Math.round(this.remainingFraction * 1000) / 10,
      burnRate: Math.round(this.burnRate * 100) / 100,
    };
  }
}

// Usage
const budget = new ErrorBudgetTracker({ targetRate: 0.95, windowMs: 3600_000 });

budget.on("warning", (r) => console.log(`[WARN] ${r.budgetRemainingPct}% budget left`));
budget.on("exhausted", (r) => console.log("[ALERT] Budget exhausted!"));

// Record results from your solver
budget.record(true);   // success
budget.record(false);  // failure
console.log(budget.report());

Alert Burn Rate

Burn rate Artinya Tindakan
< 1.0 Konsumsi lebih lambat dari yang diharapkan Tidak perlu tindakan
1.0 Menuju habis di akhir window Pantau dengan cermat
2.0 Budget habis di setengah window Selidiki dan perlambat
5.0+ Konsumsi budget sangat cepat Pause solve yang tidak kritis

Pemecahan Masalah

Masalah Penyebab Perbaikan
Budget habis terlalu cepat SLO terlalu ketat untuk kondisi aktual Tetapkan SLO realistis berdasarkan data historis
Budget tidak pernah habis SLO terlalu longgar Perketat SLO untuk mendorong peningkatan keandalan
Status berfluktuasi Window terlalu pendek Gunakan window pengukuran lebih panjang (24 jam vs 1 jam)
Burn rate menyesatkan pada volume rendah Sedikit event menyimpangkan perhitungan Wajibkan jumlah event minimum sebelum menghitung burn rate
Memory tracker terus bertambah Event tidak di-prune Pastikan _prune berjalan setiap kali record() dipanggil

Pertanyaan Umum

Apa SLO yang realistis untuk solve CAPTCHA?

Tergantung tipe CAPTCHA. reCAPTCHA v2 biasanya mencapai success rate 90–95%. Turnstile mungkin lebih tinggi. Image CAPTCHA bervariasi. Mulailah dengan mengukur success rate Anda saat ini, lalu tetapkan SLO 2–3% di bawah baseline tersebut untuk membuat error budget yang bermakna.

Apa yang harus dilakukan saat error budget habis?

Opsi dari paling ringan hingga paling agresif: alert tim, throttle request baru, pause solve yang tidak kritis, beralih ke penanganan CAPTCHA manual. Jangan pernah diam-diam mengabaikan budget yang sudah habis.

Bagaimana cara menangani error budget untuk beberapa tipe CAPTCHA?

Lacak budget terpisah per tipe. reCAPTCHA mungkin memiliki SLO 93% sementara Turnstile memiliki 97%. Menggabungkannya ke dalam satu budget menyembunyikan masalah yang spesifik per tipe.

Artikel Terkait

  • Pelacakan Solve CAPTCHA Serverless dengan DynamoDB

Langkah Selanjutnya

Lacak keandalan solve CAPTCHA Anda secara kuantitatif — dapatkan API key CaptchaAI Anda dan implementasikan error budget tracking.

Panduan terkait:

Komentar dinonaktifkan untuk artikel ini.