Tutorial

Managing CAPTCHA Session State Across Distributed Workers

When multiple workers solve CAPTCHAs for the same site, they all run into the same problem: each worker has its own session. The target site sees different cookies, different IPs, and different browser fingerprints. Session state management synchronizes context across workers so that solves are consistent and the target site sees one coherent session.

The Session State Problem

Worker 1 → Login → Solve CAPTCHA → Get cookie A → Submit form ✅
Worker 2 → New session → Solve CAPTCHA → Get cookie B → Submit form ✅
Worker 3 → Reuse cookie A? → Cookie expired → Solve CAPTCHA → Fail ❌

Without shared state, workers waste solves on sessions that have already expired and produce inconsistent behavior that the target site can detect.

What Belongs in Session State

| State Component | Lifetime | Sharing Strategy |
|---|---|---|
| Authentication cookies | Minutes to hours | Redis with TTL |
| CAPTCHA tokens | 90–300 seconds | Redis list (short TTL) |
| qa_validation_cookie cookie | ~30 minutes | Redis hash |
| CSRF tokens | Per page load | Do not share; each worker fetches its own |
| Browser fingerprint | Permanent | Configuration, not runtime state |
| Proxy assignment | Per session | Redis-backed proxy pool |
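
The table can also be captured as a small lookup that a worker consults before publishing a piece of state. This is only a sketch: `StatePolicy`, `POLICIES`, and `should_share` are hypothetical names, and the TTL numbers are assumptions that mirror the defaults used in the code on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatePolicy:
    """How one kind of session state is shared (hypothetical helper)."""
    shared: bool                 # publish to Redis at all?
    redis_type: Optional[str]    # "hash", "list", "set", or None if not shared
    ttl_seconds: Optional[int]   # None means no expiry is applied


# TTL values are assumptions, not measured lifetimes for any specific site.
POLICIES = {
    "auth_cookies": StatePolicy(True, "hash", 1800),
    "captcha_tokens": StatePolicy(True, "list", 80),
    "csrf_token": StatePolicy(False, None, None),    # fetched per page load
    "fingerprint": StatePolicy(False, None, None),   # static configuration
    "proxy_assignment": StatePolicy(True, "set", None),
}


def should_share(component: str) -> bool:
    """Return True if this component belongs in the shared store."""
    return POLICIES[component].shared
```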

Architecture

┌──────────────────────────────────────┐
│          Session State Store          │
│              (Redis)                  │
│                                      │
│  cookies:{domain} → Hash             │
│  tokens:{sitekey} → List             │
│  proxies:pool → Set                  │
│  locks:{domain}:{worker} → String    │
└─────┬──────────┬──────────┬──────────┘
      │          │          │
  ┌───▼───┐  ┌──▼────┐  ┌──▼────┐
  │Worker1│  │Worker2│  │Worker3│
  └───────┘  └───────┘  └───────┘
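
One way to keep every worker on the same key layout is to build keys through shared helper functions. The sketch below follows the key names in the diagram; the function names and the domain normalization step are assumptions. Note that the Python classes later on this page use a `session:` prefix instead, and either layout works as long as all workers agree on it.

```python
PROXY_POOL_KEY = "proxies:pool"  # Redis Set of available proxies


def _normalize(domain: str) -> str:
    """Lowercase and trim so ' Example.COM ' and 'example.com' share a key."""
    return domain.strip().lower()


def cookie_key(domain: str) -> str:
    """Key of the Redis Hash holding shared cookies for a domain."""
    return f"cookies:{_normalize(domain)}"


def token_key(sitekey: str) -> str:
    """Key of the Redis List holding solved tokens for a sitekey."""
    return f"tokens:{sitekey}"


def lock_key(domain: str, worker_id: str) -> str:
    """Key of the Redis String used as a lock for a domain and worker."""
    return f"locks:{_normalize(domain)}:{worker_id}"
```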

Python Implementation

Session Store

import os
import time
import redis
import requests

r = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", 6379)),
    decode_responses=True
)

API_KEY = os.environ["CAPTCHAAI_API_KEY"]


class SessionStore:
    """Shared session state across distributed workers."""

    def __init__(self, domain):
        self.domain = domain
        self.cookie_key = f"session:cookies:{domain}"
        self.token_key = f"session:tokens:{domain}"

    def save_cookies(self, cookies, ttl=1800):
        """Store cookies from a successful session."""
        cookie_data = dict(cookies)
        if not cookie_data:
            return  # redis-py raises DataError on HSET with an empty mapping
        r.hset(self.cookie_key, mapping=cookie_data)
        r.expire(self.cookie_key, ttl)

    def get_cookies(self):
        """Retrieve shared cookies."""
        cookies = r.hgetall(self.cookie_key)
        return cookies if cookies else None

    def save_token(self, sitekey, token, ttl=80):
        """Store a solved CAPTCHA token."""
        key = f"{self.token_key}:{sitekey}"
        r.rpush(key, token)
        r.expire(key, ttl)

    def get_token(self, sitekey):
        """Pop a cached CAPTCHA token."""
        key = f"{self.token_key}:{sitekey}"
        return r.lpop(key)

    def acquire_session_lock(self, worker_id, ttl=300):
        """Ensure only one worker manages the session at a time."""
        lock_key = f"session:lock:{self.domain}"
        return r.set(lock_key, worker_id, nx=True, ex=ttl)

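    def release_session_lock(self, worker_id):
        """Release session lock if this worker holds it."""
        lock_key = f"session:lock:{self.domain}"
        # Compare-and-delete in one Lua script so the check and the delete
        # are atomic; a separate GET then DEL could remove a lock that
        # expired and was re-acquired by another worker in between.
        script = (
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
            "return redis.call('del', KEYS[1]) else return 0 end"
        )
        return r.eval(script, 1, lock_key, worker_id) == 1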
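
One caveat with keeping tokens in a Redis list: `EXPIRE` applies to the whole key, so every `save_token` call resets the TTL for all tokens already in the list, and an older token can outlive its validity window. A possible workaround, sketched here, is to stamp each token with its own deadline and discard stale entries on read. `pack_token`, `unpack_token`, and the JSON layout are hypothetical helpers, not part of redis-py or the CaptchaAI API.

```python
import json
import time
from typing import Optional


def pack_token(token: str, ttl: int = 80, now: Optional[float] = None) -> str:
    """Serialize a token together with its absolute expiry time."""
    now = time.time() if now is None else now
    return json.dumps({"token": token, "expires_at": now + ttl})


def unpack_token(raw: str, now: Optional[float] = None) -> Optional[str]:
    """Return the token if it is still valid, otherwise None."""
    now = time.time() if now is None else now
    entry = json.loads(raw)
    return entry["token"] if entry["expires_at"] > now else None
```

A store built this way would `rpush` the packed string and keep calling `lpop` until `unpack_token` returns a token or the list is empty.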

Worker with Shared State

class CaptchaWorker:
    def __init__(self, worker_id, domain):
        self.worker_id = worker_id
        self.store = SessionStore(domain)
        self.session = requests.Session()

    def setup_session(self):
        """Load shared cookies into this worker's session."""
        cookies = self.store.get_cookies()
        if cookies:
            for name, value in cookies.items():
                self.session.cookies.set(name, value)
            return True
        return False

    def solve_captcha(self, sitekey, pageurl):
        """Solve with token cache and session sharing."""
        # Check for cached token
        cached = self.store.get_token(sitekey)
        if cached:
            return {"solution": cached, "source": "cache"}

        # Solve via CaptchaAI
        resp = requests.post("https://ocr.captchaai.com/in.php", data={
            "key": API_KEY,
            "method": "userrecaptcha",
            "googlekey": sitekey,
            "pageurl": pageurl,
            "json": 1
        }, timeout=30)  # avoid hanging forever on a stalled connection
        data = resp.json()
        if data.get("status") != 1:
            return {"error": data.get("request")}

        captcha_id = data["request"]

        for _ in range(60):
            time.sleep(5)
            result = requests.get("https://ocr.captchaai.com/res.php", params={
                "key": API_KEY, "action": "get",
                "id": captcha_id, "json": 1
            }, timeout=30).json()

            if result.get("status") == 1:
                # Return the token for immediate use only. reCAPTCHA tokens
                # can be verified once, so caching this same token would hand
                # an already-used token to another worker.
                return {"solution": result["request"], "source": "api"}

            if result.get("request") != "CAPCHA_NOT_READY":
                return {"error": result.get("request")}

        return {"error": "TIMEOUT"}

    def process_page(self, url, sitekey):
        """Full workflow: setup session → solve CAPTCHA → submit."""
        # Load shared session
        self.setup_session()

        # Solve CAPTCHA
        result = self.solve_captcha(sitekey, url)
        if "error" in result:
            return result

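        # Submit form with token
        response = self.session.post(url, data={
            "g-recaptcha-response": result["solution"]
        }, timeout=30)

        # Share resulting cookies only after a successful request, so a
        # failed or blocked response does not overwrite good shared state
        cookies = self.session.cookies.get_dict()
        if response.ok and cookies:
            self.store.save_cookies(cookies)

        return {"status": response.status_code, "source": result["source"]}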

Proxy Pool Management

class ProxyPool:
    """Distribute proxies across workers to avoid IP conflicts."""

    def __init__(self, proxies):
        self.pool_key = "session:proxy_pool"
        self.assigned_key = "session:proxy_assigned"
        # Initialize pool
        for proxy in proxies:
            r.sadd(self.pool_key, proxy)

    def acquire_proxy(self, worker_id, ttl=600):
        """Assign an unused proxy to a worker."""
        # Check if worker already has one
        existing = r.hget(self.assigned_key, worker_id)
        if existing:
            return existing

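        # Pop from available pool
        proxy = r.spop(self.pool_key)
        if proxy:
            r.hset(self.assigned_key, worker_id, proxy)
            # Caveat: EXPIRE applies to the whole hash, so this resets the TTL
            # for every worker's assignment, and proxies still assigned when
            # the hash expires are not returned to the pool automatically.
            r.expire(self.assigned_key, ttl)
            return proxy
        return None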

    def release_proxy(self, worker_id):
        """Return proxy to the pool."""
        proxy = r.hget(self.assigned_key, worker_id)
        if proxy:
            r.sadd(self.pool_key, proxy)
            r.hdel(self.assigned_key, worker_id)
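
Because a proxy that is never released stays out of the pool, it helps to tie the release to a `finally` block. The context manager below is a minimal sketch of that idea; `leased_proxy` is a hypothetical helper, and it only assumes that `pool` exposes `acquire_proxy(worker_id)` and `release_proxy(worker_id)` like the `ProxyPool` class above.

```python
from contextlib import contextmanager


@contextmanager
def leased_proxy(pool, worker_id):
    """Acquire a proxy for the duration of a block and always release it."""
    proxy = pool.acquire_proxy(worker_id)
    if proxy is None:
        raise RuntimeError("no proxy available in the pool")
    try:
        yield proxy
    finally:
        # Runs on normal exit and on exceptions raised inside the block
        pool.release_proxy(worker_id)
```

A worker could then wrap its requests in `with leased_proxy(pool, "worker-1") as proxy:` and set `session.proxies = {"http": proxy, "https": proxy}` inside the block.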

JavaScript Implementation

const Redis = require("ioredis");
const axios = require("axios");

const redis = new Redis(process.env.REDIS_URL || "redis://localhost:6379");
const API_KEY = process.env.CAPTCHAAI_API_KEY;

class SessionStore {
  constructor(domain) {
    this.domain = domain;
    this.cookieKey = `session:cookies:${domain}`;
    this.tokenKey = `session:tokens:${domain}`;
  }

  async saveCookies(cookies, ttl = 1800) {
    const entries = Object.entries(cookies).flat();
    if (entries.length > 0) {
      await redis.hset(this.cookieKey, ...entries);
      await redis.expire(this.cookieKey, ttl);
    }
  }

  async getCookies() {
    return await redis.hgetall(this.cookieKey);
  }

  async saveToken(sitekey, token, ttl = 80) {
    const key = `${this.tokenKey}:${sitekey}`;
    await redis.rpush(key, token);
    await redis.expire(key, ttl);
  }

  async getToken(sitekey) {
    return await redis.lpop(`${this.tokenKey}:${sitekey}`);
  }

  async acquireLock(workerId, ttl = 300) {
    const result = await redis.set(`session:lock:${this.domain}`, workerId, "NX", "EX", ttl);
    return result === "OK";
  }

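  async releaseLock(workerId) {
    // Compare-and-delete in a single Lua script so the check and the delete
    // are atomic; GET followed by DEL could remove another worker's lock.
    const script =
      "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
    return (await redis.eval(script, 1, `session:lock:${this.domain}`, workerId)) === 1;
  }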
}

async function workerSolve(store, sitekey, pageurl) {
  const cached = await store.getToken(sitekey);
  if (cached) return { solution: cached, source: "cache" };

  const submit = await axios.post("https://ocr.captchaai.com/in.php", null, {
    params: { key: API_KEY, method: "userrecaptcha", googlekey: sitekey, pageurl, json: 1 },
  });
  if (submit.data.status !== 1) return { error: submit.data.request };

  const captchaId = submit.data.request;
  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const poll = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: captchaId, json: 1 },
    });
    if (poll.data.status === 1) {
      // Return the token for immediate use only; reCAPTCHA tokens can be
      // verified once, so the same token must not also be cached.
      return { solution: poll.data.request, source: "api" };
    }
    if (poll.data.request !== "CAPCHA_NOT_READY") return { error: poll.data.request };
  }
  return { error: "TIMEOUT" };
}

State Management Patterns

| Pattern | When to Use |
|---|---|
| Session lock | One worker manages login; the other workers use its cookies |
| Token pool | High throughput: solve ahead of time and distribute tokens |
| Cookie sharing | Workers need an authenticated session |
| Proxy affinity | The target site tracks session-to-IP binding |
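
The token pool pattern can be sketched as a pre-solver that tops the pool up to a target size. Since a reCAPTCHA token can be verified only once, tokens in the pool should come from solves made specifically for the pool, not from tokens a worker is also about to submit. In this sketch `prefill_tokens` and the `solve` callable are hypothetical, and `store` is assumed to expose `save_token(sitekey, token)` like the `SessionStore` class on this page.

```python
from typing import Callable, Optional


def prefill_tokens(store, sitekey: str, solve: Callable[[], Optional[str]],
                   target: int, current: int) -> int:
    """Solve ahead of demand until the pool holds `target` tokens.

    `solve` returns a fresh token or None on failure; `current` is the
    present pool size (for a Redis list, LLEN on the token key).
    Returns the number of tokens added.
    """
    added = 0
    for _ in range(max(0, target - current)):
        token = solve()
        if token is None:
            break  # stop on the first failure instead of retrying blindly
        store.save_token(sitekey, token)
        added += 1
    return added
```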

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| Workers get different sessions | Cookies are not shared via Redis | Verify that save_cookies is called after a successful request |
| Token expires before another worker uses it | TTL too long, or network delay | Shorten the TTL to leave a safety margin; use tokens within 10 seconds of retrieving them |
| Session lock is never released | Worker crashed | The TTL on the lock key releases it automatically (default 300 seconds) |
| Target site blocks workers | All workers use the same proxy | Use a proxy pool with per-worker affinity |

Frequently Asked Questions

Shared cookies are needed only for sites that require an authenticated session. For stateless CAPTCHA solves (submit a sitekey, get a token back), workers do not need shared cookies; sharing tokens is enough.

How do I handle session expiry?

Set the Redis TTL slightly shorter than the session lifetime. When the cookies expire, one worker takes the session lock, re-authenticates, and stores the new cookies for the other workers.
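
That flow can be sketched as a single function. It assumes a `store` with the `SessionStore` methods from this page; `refresh_cookies`, the `login` callable (which returns a dict of fresh cookies), and the 60-second margin are illustrative assumptions.

```python
def refresh_cookies(store, worker_id, login,
                    session_lifetime=1800, ttl_margin=60):
    """Re-authenticate once on behalf of all workers.

    Returns the cookies to use, or None if another worker holds the lock
    and has not published cookies yet (the caller should retry shortly).
    """
    cookies = store.get_cookies()
    if cookies:
        return cookies  # shared cookies are still alive
    if not store.acquire_session_lock(worker_id):
        return None  # someone else is already logging in
    try:
        # Re-check: another worker may have refreshed just before we locked
        cookies = store.get_cookies()
        if not cookies:
            cookies = login()
            # Expire the shared copy slightly before the real session does
            store.save_cookies(cookies, ttl=max(1, session_lifetime - ttl_margin))
        return cookies
    finally:
        store.release_session_lock(worker_id)
```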

What about browser-based sessions (Puppeteer/Playwright)?

Serialize the browser cookies with page.cookies() and store them in Redis. Other workers load them with page.setCookie(). These are the Puppeteer calls; in Playwright the equivalents are context.cookies() and context.addCookies(). This works across different machines and browser instances.
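
For Python workers, the same idea might look like the sketch below. It assumes cookie dicts in the shape that Playwright and Puppeteer return (`name`, `value`, `domain`, `path`, `expires`, and so on, with `expires` set to -1 for session cookies). `dump_browser_cookies` and `load_browser_cookies` are hypothetical helpers; with Playwright for Python the list would come from `context.cookies()` and go back in through `context.add_cookies(...)`.

```python
import json
import time
from typing import Optional


def dump_browser_cookies(cookies: list) -> str:
    """Serialize a list of cookie dicts to a JSON string for Redis."""
    return json.dumps(cookies)


def load_browser_cookies(raw: str, now: Optional[float] = None) -> list:
    """Deserialize cookies, dropping any whose `expires` time has passed.

    Session cookies (expires == -1 or missing) are kept.
    """
    now = time.time() if now is None else now
    fresh = []
    for cookie in json.loads(raw):
        expires = cookie.get("expires", -1)
        if expires == -1 or expires > now:
            fresh.append(cookie)
    return fresh
```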

Next Steps

Coordinate your distributed CAPTCHA workers efficiently: get your CaptchaAI API key.

Related guides:

  • Redis Token TTL Management
  • Browser Session Persistence