Your CAPTCHA solver worker looks alive (the process is running), but it has not completed a task in 10 minutes. The API key may have run out of balance, or the worker may be stuck in a loop. Without health checks, the orchestrator keeps routing work to a worker that is effectively dead. Health endpoints let load balancers and Kubernetes detect the problem and reroute traffic.
Three Types of Health Checks
| Check | Question | Response to failure |
|---|---|---|
| Liveness | Is the process running? | Restart the container |
| Readiness | Can it accept work? | Stop routing traffic to it |
| Dependency | Are upstream services healthy? | Degrade gracefully |
Python: Flask Health Check Endpoint
```python
import threading
import time
from dataclasses import dataclass, field

import requests
from flask import Flask, jsonify

API_KEY = "YOUR_API_KEY"
RESULT_URL = "https://ocr.captchaai.com/res.php"

app = Flask(__name__)


@dataclass
class WorkerHealth:
    """Tracks worker health metrics."""

    started_at: float = field(default_factory=time.monotonic)
    last_solve_at: float = 0.0
    total_solved: int = 0
    total_failed: int = 0
    consecutive_failures: int = 0
    balance: float | None = None  # union syntax requires Python 3.10+
    balance_checked_at: float = 0.0
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def record_success(self):
        with self._lock:
            self.total_solved += 1
            self.last_solve_at = time.monotonic()
            self.consecutive_failures = 0

    def record_failure(self):
        with self._lock:
            self.total_failed += 1
            self.consecutive_failures += 1

    @property
    def success_rate(self) -> float:
        total = self.total_solved + self.total_failed
        return self.total_solved / total if total > 0 else 1.0

    @property
    def seconds_since_last_solve(self) -> float:
        # Before the first solve, measure from process start instead.
        if self.last_solve_at == 0:
            return time.monotonic() - self.started_at
        return time.monotonic() - self.last_solve_at


health = WorkerHealth()

# Thresholds
MAX_CONSECUTIVE_FAILURES = 10
MAX_SECONDS_WITHOUT_SOLVE = 600  # 10 minutes
MIN_BALANCE = 1.0


def check_balance() -> float | None:
    """Check CaptchaAI balance."""
    now = time.monotonic()
    # Cache balance for 60 seconds
    if health.balance is not None and now - health.balance_checked_at < 60:
        return health.balance
    try:
        resp = requests.get(RESULT_URL, params={
            "key": API_KEY, "action": "getbalance", "json": 1,
        }, timeout=10).json()
        # A non-numeric reply raises ValueError and falls through to the cache.
        health.balance = float(resp.get("request", 0))
        health.balance_checked_at = now
        return health.balance
    except Exception:
        return health.balance  # Return cached value on error


@app.route("/health/live")
def liveness():
    """Liveness probe: is the process responsive?"""
    return jsonify({"status": "ok", "uptime_s": int(time.monotonic() - health.started_at)}), 200


@app.route("/health/ready")
def readiness():
    """Readiness probe: can the worker accept tasks?"""
    issues = []

    # Check consecutive failures
    if health.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        issues.append(f"consecutive_failures={health.consecutive_failures}")

    # Check time since last solve (only once at least one solve has happened)
    if health.total_solved > 0 and health.seconds_since_last_solve > MAX_SECONDS_WITHOUT_SOLVE:
        issues.append(f"no_solve_for={int(health.seconds_since_last_solve)}s")

    # Check balance; None means "unknown" and does not fail the probe
    balance = check_balance()
    if balance is not None and balance < MIN_BALANCE:
        issues.append(f"low_balance=${balance:.2f}")

    if issues:
        return jsonify({
            "status": "not_ready",
            "issues": issues,
            "stats": {
                "solved": health.total_solved,
                "failed": health.total_failed,
                "success_rate": round(health.success_rate, 3),
            },
        }), 503

    return jsonify({
        "status": "ready",
        "stats": {
            "solved": health.total_solved,
            "failed": health.total_failed,
            "success_rate": round(health.success_rate, 3),
            "balance": balance,
        },
    }), 200


@app.route("/health/dependencies")
def dependencies():
    """Check upstream dependencies."""
    checks = {}

    # CaptchaAI API reachability
    try:
        resp = requests.get(RESULT_URL, params={
            "key": API_KEY, "action": "getbalance", "json": 1,
        }, timeout=10)
        checks["captchaai_api"] = {
            "status": "ok" if resp.status_code == 200 else "degraded",
            "response_ms": int(resp.elapsed.total_seconds() * 1000),
        }
    except Exception as e:
        checks["captchaai_api"] = {"status": "down", "error": str(e)}

    all_ok = all(c["status"] == "ok" for c in checks.values())
    return jsonify({
        "status": "ok" if all_ok else "degraded",
        "checks": checks,
    }), 200 if all_ok else 503


# --- Worker loop (runs in background) ---
def worker_loop():
    """Simulated CAPTCHA solving worker."""
    while True:
        try:
            # ... solve CAPTCHA logic ...
            health.record_success()
        except Exception:
            health.record_failure()
        time.sleep(1)


threading.Thread(target=worker_loop, daemon=True).start()

if __name__ == "__main__":
    # Port 8080 matches the probe port in the Kubernetes configuration.
    app.run(host="0.0.0.0", port=8080)
```
JavaScript: Express Health Check Endpoint
```javascript
// Requires Node 18+ (global fetch and AbortSignal.timeout).
const express = require("express");

const API_KEY = "YOUR_API_KEY";
const RESULT_URL = "https://ocr.captchaai.com/res.php";

const app = express();

const health = {
  startedAt: Date.now(),
  lastSolveAt: 0,
  totalSolved: 0,
  totalFailed: 0,
  consecutiveFailures: 0,
  balance: null,
  balanceCheckedAt: 0,

  recordSuccess() {
    this.totalSolved++;
    this.lastSolveAt = Date.now();
    this.consecutiveFailures = 0;
  },

  recordFailure() {
    this.totalFailed++;
    this.consecutiveFailures++;
  },

  get successRate() {
    const total = this.totalSolved + this.totalFailed;
    return total > 0 ? this.totalSolved / total : 1;
  },
};

async function checkBalance() {
  // Cache balance for 60 seconds
  if (health.balance !== null && Date.now() - health.balanceCheckedAt < 60000) {
    return health.balance;
  }
  try {
    const url = `${RESULT_URL}?key=${API_KEY}&action=getbalance&json=1`;
    const resp = await (await fetch(url, { signal: AbortSignal.timeout(10_000) })).json();
    const parsed = parseFloat(resp.request);
    // A non-numeric reply parses to NaN; keep the cached value in that case.
    if (Number.isFinite(parsed)) {
      health.balance = parsed;
      health.balanceCheckedAt = Date.now();
    }
    return health.balance;
  } catch {
    return health.balance; // Return cached value on error
  }
}

// Liveness probe: is the process responsive?
app.get("/health/live", (req, res) => {
  res.json({ status: "ok", uptimeMs: Date.now() - health.startedAt });
});

// Readiness probe: can the worker accept tasks?
app.get("/health/ready", async (req, res) => {
  const issues = [];

  if (health.consecutiveFailures >= 10) {
    issues.push(`consecutive_failures=${health.consecutiveFailures}`);
  }

  // Only check solve silence once at least one solve has happened
  if (health.totalSolved > 0) {
    const silentMs = Date.now() - health.lastSolveAt;
    if (silentMs > 600_000) {
      issues.push(`no_solve_for=${Math.round(silentMs / 1000)}s`);
    }
  }

  const balance = await checkBalance();
  if (balance !== null && balance < 1.0) {
    issues.push(`low_balance=$${balance.toFixed(2)}`);
  }

  const stats = {
    solved: health.totalSolved,
    failed: health.totalFailed,
    successRate: Math.round(health.successRate * 1000) / 1000,
    balance,
  };

  if (issues.length > 0) {
    return res.status(503).json({ status: "not_ready", issues, stats });
  }
  res.json({ status: "ready", stats });
});

// Dependency check: is the upstream API reachable?
app.get("/health/dependencies", async (req, res) => {
  const checks = {};

  try {
    const start = Date.now();
    const url = `${RESULT_URL}?key=${API_KEY}&action=getbalance&json=1`;
    const resp = await fetch(url, { signal: AbortSignal.timeout(10_000) });
    checks.captchaaiApi = {
      status: resp.ok ? "ok" : "degraded",
      responseMs: Date.now() - start,
    };
  } catch (e) {
    checks.captchaaiApi = { status: "down", error: e.message };
  }

  const allOk = Object.values(checks).every((c) => c.status === "ok");
  res.status(allOk ? 200 : 503).json({
    status: allOk ? "ok" : "degraded",
    checks,
  });
});

app.listen(8080, () => console.log("Health server on :8080"));
```
Kubernetes Configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: captcha-worker
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value here is a placeholder.
  selector:
    matchLabels:
      app: captcha-worker
  template:
    metadata:
      labels:
        app: captcha-worker
    spec:
      containers:
        - name: worker
          image: captcha-worker:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2
```
Health Check Response Codes
| Endpoint | 200 | 503 |
|---|---|---|
| /health/live | Process is responsive | Process is hung; restart it |
| /health/ready | Can accept work | Stop sending tasks |
| /health/dependencies | All dependencies OK | Upstream is degraded |
Operator Thresholds
- Use readiness to block new work, liveness to trigger restarts, and balance alerts to catch drops in throughput.
- Tie health checks to queue depth, recent error rate, and dependency reachability, not just process uptime.
- Keep threshold values visible to on-call engineers so that a change in health is actionable.
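The points above can be sketched as follows. This is a minimal illustration, not part of the worker shown earlier: the `HEALTH_*` environment variable names, the default values, and the helper names are all assumptions. Thresholds are read from the environment, the recent error rate is computed over a sliding time window, and the verdict echoes the thresholds back so an on-call engineer can see what tripped. Queue depth is passed in as a plain integer because how it is measured depends on the queue in use.

```python
import os
import time
from collections import deque
from typing import Optional

# Hypothetical environment variable names and defaults; adjust to your deployment.
THRESHOLDS = {
    "max_error_rate": float(os.environ.get("HEALTH_MAX_ERROR_RATE", "0.5")),
    "max_queue_depth": int(os.environ.get("HEALTH_MAX_QUEUE_DEPTH", "100")),
    "window_seconds": float(os.environ.get("HEALTH_WINDOW_SECONDS", "300")),
}


class RecentErrorRate:
    """Error rate over a sliding time window of solve attempts."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, succeeded: bool, now: Optional[float] = None) -> None:
        timestamp = time.monotonic() if now is None else now
        self._events.append((timestamp, succeeded))

    def rate(self, now: Optional[float] = None) -> float:
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)


def evaluate_readiness(error_rate: float, queue_depth: int, thresholds: dict) -> dict:
    """Build a readiness verdict that also reports the thresholds in force."""
    issues = []
    if error_rate > thresholds["max_error_rate"]:
        issues.append(f"error_rate={error_rate:.2f}")
    if queue_depth > thresholds["max_queue_depth"]:
        issues.append(f"queue_depth={queue_depth}")
    return {
        "status": "not_ready" if issues else "ready",
        "issues": issues,
        "thresholds": thresholds,
    }
```

A readiness handler could call `evaluate_readiness(tracker.rate(), current_queue_depth, THRESHOLDS)` and return 503 whenever the status is `not_ready`. Unlike a lifetime success rate, the windowed rate recovers on its own once the worker starts solving again.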
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Worker restarts constantly | Liveness threshold is too low | Increase failureThreshold or periodSeconds |
| Worker marked not ready at startup | Having no solve yet is treated as "too long since the last solve" | Only check seconds_since_last_solve after the first solve |
| Balance check slows down the health endpoint | An API call on every request | Cache the balance with a TTL (60 seconds recommended) |
| Health endpoint itself errors | Unhandled exception inside a check | Wrap each check in try/except; return degraded, not 500 |
| False negatives from the dependency check | Network dropped during the balance check | Serve the cached value with stale-while-revalidate |
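The "wrap each check in try/except" row can be sketched as a small helper. This is an illustrative pattern, not code from the worker above: `run_check` and `aggregate` are hypothetical names, and a check is assumed to signal failure by raising.

```python
import time
from typing import Callable


def run_check(name: str, check: Callable[[], None]) -> dict:
    """Run one dependency check and never let its exception escape."""
    started = time.monotonic()
    try:
        check()  # a check signals failure by raising
        status, error = "ok", None
    except Exception as exc:
        status, error = "degraded", f"{type(exc).__name__}: {exc}"
    result = {
        "name": name,
        "status": status,
        "elapsed_ms": int((time.monotonic() - started) * 1000),
    }
    if error is not None:
        result["error"] = error
    return result


def aggregate(results: list) -> tuple:
    """Combine check results into a response body and an HTTP status code."""
    all_ok = all(r["status"] == "ok" for r in results)
    body = {"status": "ok" if all_ok else "degraded", "checks": results}
    return body, 200 if all_ok else 503
```

Because every exception is converted into a `degraded` entry, a bug in one check produces a 503 with a readable reason instead of an opaque 500 from the framework.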
Frequently Asked Questions
How often should Kubernetes probe the health endpoints?
Liveness: every 10–30 seconds with a failure threshold of 3. Readiness: every 5–10 seconds with a failure threshold of 2. More frequent probes detect problems sooner but add overhead.
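To see what these settings mean in practice, here is a rough estimate of how long a fault can go unnoticed. It assumes Kubernetes acts after `failureThreshold` consecutive failed probes spaced `periodSeconds` apart, and it ignores probe timeouts and scheduling jitter, so treat the numbers as approximations. The function name is hypothetical.

```python
def detection_window_seconds(period_seconds: int, failure_threshold: int) -> tuple:
    """Return (best_case, worst_case) seconds from fault to probe verdict."""
    # Best case: the fault starts just before a probe fires.
    best = (failure_threshold - 1) * period_seconds
    # Worst case: the fault starts just after a probe succeeded.
    worst = failure_threshold * period_seconds
    return best, worst


for name, period, threshold in [("liveness", 15, 3), ("readiness", 10, 2)]:
    low, high = detection_window_seconds(period, threshold)
    print(f"{name}: acted on after roughly {low}-{high}s")
```

With the values from the Deployment above (15 s × 3 for liveness, 10 s × 2 for readiness), a hung process is restarted after roughly 30 to 45 seconds, and an unready worker stops receiving traffic after roughly 10 to 20 seconds.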
Should health endpoints call the CaptchaAI API?
Only for readiness and dependency checks, and always cache the result. A liveness probe must never make external calls: it has to respond instantly to prove the process is running.
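One way to keep probes fast while still reflecting the upstream API is a stale-while-revalidate cache: the probe always gets the last known value immediately, and a background thread refreshes it once it is older than the TTL. The class below is a minimal sketch under that assumption. `StaleWhileRevalidate` is a hypothetical name, and `fetch` stands in for whatever function performs the real balance request.

```python
import threading
import time
from typing import Callable, Optional


class StaleWhileRevalidate:
    """Serve the cached value immediately; refresh in the background once stale."""

    def __init__(self, fetch: Callable[[], float], ttl_seconds: float = 60.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value: Optional[float] = None
        self._fetched_at = 0.0
        self._refreshing = False
        self._refresh_thread: Optional[threading.Thread] = None
        self._lock = threading.Lock()

    def get(self) -> Optional[float]:
        """Return the cached value without blocking on the network."""
        with self._lock:
            stale = (
                self._value is None
                or time.monotonic() - self._fetched_at >= self._ttl
            )
            start_refresh = stale and not self._refreshing
            if start_refresh:
                self._refreshing = True
            value = self._value
        if start_refresh:
            thread = threading.Thread(target=self._refresh, daemon=True)
            self._refresh_thread = thread
            thread.start()
        return value  # None until the first fetch completes

    def _refresh(self) -> None:
        try:
            fresh = self._fetch()
        except Exception:
            fresh = None  # keep serving the previous value on failure
        with self._lock:
            if fresh is not None:
                self._value = fresh
                self._fetched_at = time.monotonic()
            self._refreshing = False
```

The trade-off is that the first call returns `None` because no value has been fetched yet. The readiness handlers in this article already treat an unknown balance as "do not fail the probe", so that case is harmless there.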
How do I monitor the health of multiple workers?
Expose health metrics in Prometheus format (/metrics) alongside the health endpoints. Aggregate all workers in a Grafana dashboard to see the health of the whole fleet.
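As a rough sketch of what such a /metrics payload could look like, the function below renders the worker counters in the Prometheus text exposition format using only the standard library. The metric names are hypothetical, and in production the official `prometheus_client` library is the more usual choice.

```python
def render_metrics(solved: int, failed: int, consecutive_failures: int, balance=None) -> str:
    """Render worker counters in the Prometheus text exposition format."""
    lines = [
        "# HELP captcha_solves_total CAPTCHA solve attempts by outcome.",
        "# TYPE captcha_solves_total counter",
        f'captcha_solves_total{{outcome="success"}} {solved}',
        f'captcha_solves_total{{outcome="failure"}} {failed}',
        "# HELP captcha_consecutive_failures Current run of consecutive failures.",
        "# TYPE captcha_consecutive_failures gauge",
        f"captcha_consecutive_failures {consecutive_failures}",
    ]
    # The balance is omitted while it is still unknown.
    if balance is not None:
        lines += [
            "# HELP captcha_account_balance Last known account balance.",
            "# TYPE captcha_account_balance gauge",
            f"captcha_account_balance {balance}",
        ]
    return "\n".join(lines) + "\n"
```

A /metrics route would return this string with the content type `text/plain; version=0.0.4`. Prometheus can then scrape each worker, and a Grafana panel can sum `captcha_solves_total` across the fleet.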
Next Steps
Make your CAPTCHA workers production-ready: get your CaptchaAI API key and add health check endpoints.
Related guides:
- Circuit Breaker Pattern for CAPTCHA API Calls
- Bulkhead Pattern for CAPTCHA Solving
- Monitoring CAPTCHA Solve Rates with Prometheus and Grafana