Tutorial

Deduplikasi Request Solve CAPTCHA dengan Database Locking

Ketika beberapa worker atau retry mengirimkan CAPTCHA yang sama untuk di-solve, Anda membayar untuk setiap duplikat. Lapisan deduplikasi menangkap request yang identik dan mengembalikan hasil yang sama — menghemat kredit API dan mengurangi latensi.

Bagaimana Duplikat Terjadi

Skenario Penyebab Pemborosan
Retry sebelum hasil tiba Retry logic terlalu agresif Biaya 2–5x per CAPTCHA
Banyak worker, target sama Tidak ada koordinasi antar worker Solve terbuang secara paralel
Trigger ulang refresh halaman Retry frontend saat timeout Solve ekstra per refresh
Pesan queue di-replay Jaminan at-least-once delivery Solve duplikat per replay

Desain Dedup Key

Buat key unik dari parameter request:

import hashlib


def dedup_key(method, sitekey, pageurl):
    """Generate a deduplication key for a CAPTCHA solve request."""
    raw = f"{method}:{sitekey}:{pageurl}"
    return f"captcha:dedup:{hashlib.sha256(raw.encode()).hexdigest()[:16]}"

Komposisi key:

Jenis CAPTCHA Komponen Key
reCAPTCHA v2 method + sitekey + pageurl
reCAPTCHA v3 method + sitekey + pageurl + action
hCaptcha method + sitekey + pageurl
Turnstile method + sitekey + pageurl
Image CAPTCHA method + hash body (konten gambar)

Deduplikasi Berbasis Redis

Implementasi Python

import os
import time
import json
import hashlib
import redis
import requests

r = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", 6379)),
    decode_responses=True
)

API_KEY = os.environ["CAPTCHAAI_API_KEY"]

# Dedup window: how long to consider a request "in progress"
DEDUP_TTL = 180  # seconds


def dedup_key(method, sitekey, pageurl, extra=""):
    raw = f"{method}:{sitekey}:{pageurl}:{extra}"
    return f"captcha:dedup:{hashlib.sha256(raw.encode()).hexdigest()[:16]}"


def solve_with_dedup(sitekey, pageurl, method="userrecaptcha"):
    key = dedup_key(method, sitekey, pageurl)

    # Check if this request is already being solved
    existing = r.get(key)
    if existing:
        state = json.loads(existing)
        if state["status"] == "solving":
            # Wait for the result
            return wait_for_result(key)
        elif state["status"] == "solved":
            return {"solution": state["solution"], "source": "dedup_cache"}
        elif state["status"] == "error":
            pass  # Allow retry on error

    # Mark as solving
    r.set(key, json.dumps({"status": "solving", "started": time.time()}), ex=DEDUP_TTL)

    # Submit to CaptchaAI
    resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": method,
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": 1
    })
    data = resp.json()

    if data.get("status") != 1:
        r.set(key, json.dumps({"status": "error", "error": data.get("request")}), ex=30)
        return {"error": data.get("request")}

    captcha_id = data["request"]

    # Poll for result
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": captcha_id, "json": 1
        }).json()

        if result.get("status") == 1:
            solution = result["request"]
            # Cache the result for other workers (short TTL since tokens expire)
            r.set(key, json.dumps({
                "status": "solved",
                "solution": solution,
                "solved_at": time.time()
            }), ex=60)  # Cache result for 60 seconds
            return {"solution": solution, "source": "api"}

        if result.get("request") != "CAPCHA_NOT_READY":
            r.set(key, json.dumps({
                "status": "error", "error": result.get("request")
            }), ex=30)
            return {"error": result.get("request")}

    r.set(key, json.dumps({"status": "error", "error": "TIMEOUT"}), ex=30)
    return {"error": "TIMEOUT"}


def wait_for_result(key, timeout=120):
    """Wait for another worker to finish solving."""
    start = time.time()
    while time.time() - start < timeout:
        data = r.get(key)
        if data:
            state = json.loads(data)
            if state["status"] == "solved":
                return {"solution": state["solution"], "source": "dedup_wait"}
            if state["status"] == "error":
                return {"error": state.get("error", "UNKNOWN")}
        time.sleep(2)
    return {"error": "DEDUP_WAIT_TIMEOUT"}

Implementasi JavaScript

const Redis = require("ioredis");
const axios = require("axios");
const crypto = require("crypto");

const redis = new Redis(process.env.REDIS_URL || "redis://localhost:6379");
const API_KEY = process.env.CAPTCHAAI_API_KEY;
const DEDUP_TTL = 180;

function dedupKey(method, sitekey, pageurl) {
  const raw = `${method}:${sitekey}:${pageurl}`;
  const hash = crypto.createHash("sha256").update(raw).digest("hex").slice(0, 16);
  return `captcha:dedup:${hash}`;
}

async function solveWithDedup(sitekey, pageurl, method = "userrecaptcha") {
  const key = dedupKey(method, sitekey, pageurl);

  // Check existing
  const existing = await redis.get(key);
  if (existing) {
    const state = JSON.parse(existing);
    if (state.status === "solving") return await waitForResult(key);
    if (state.status === "solved") return { solution: state.solution, source: "dedup_cache" };
  }

  // Mark as solving
  await redis.set(key, JSON.stringify({ status: "solving", started: Date.now() }), "EX", DEDUP_TTL);

  // Submit
  const submit = await axios.post("https://ocr.captchaai.com/in.php", null, {
    params: { key: API_KEY, method, googlekey: sitekey, pageurl, json: 1 },
  });

  if (submit.data.status !== 1) {
    await redis.set(key, JSON.stringify({ status: "error", error: submit.data.request }), "EX", 30);
    return { error: submit.data.request };
  }

  const captchaId = submit.data.request;

  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const poll = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: captchaId, json: 1 },
    });

    if (poll.data.status === 1) {
      await redis.set(key, JSON.stringify({ status: "solved", solution: poll.data.request }), "EX", 60);
      return { solution: poll.data.request, source: "api" };
    }
    if (poll.data.request !== "CAPCHA_NOT_READY") {
      await redis.set(key, JSON.stringify({ status: "error", error: poll.data.request }), "EX", 30);
      return { error: poll.data.request };
    }
  }

  await redis.set(key, JSON.stringify({ status: "error", error: "TIMEOUT" }), "EX", 30);
  return { error: "TIMEOUT" };
}

async function waitForResult(key, timeout = 120000) {
  const start = Date.now();
  while (Date.now() - start < timeout) {
    const data = await redis.get(key);
    if (data) {
      const state = JSON.parse(data);
      if (state.status === "solved") return { solution: state.solution, source: "dedup_wait" };
      if (state.status === "error") return { error: state.error };
    }
    await new Promise((r) => setTimeout(r, 2000));
  }
  return { error: "DEDUP_WAIT_TIMEOUT" };
}

Alternatif Database Locking

Untuk dedup berbasis PostgreSQL tanpa Redis:

import psycopg2


def solve_with_pg_dedup(conn, sitekey, pageurl):
    """Use PostgreSQL advisory locks for deduplication."""
    # Generate a numeric lock key from the dedup key
    lock_id = hash(f"{sitekey}:{pageurl}") & 0x7FFFFFFF

    cursor = conn.cursor()

    # Try to acquire advisory lock (non-blocking)
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (lock_id,))
    acquired = cursor.fetchone()[0]

    if not acquired:
        # Another worker is solving — wait for result
        cursor.execute("SELECT pg_advisory_lock(%s)", (lock_id,))
        # Lock acquired means other worker finished — check cache
        cursor.execute(
            "SELECT solution FROM captcha_cache "
            "WHERE sitekey = %s AND pageurl = %s "
            "AND created_at > NOW() - INTERVAL '60 seconds'",
            (sitekey, pageurl)
        )
        row = cursor.fetchone()
        cursor.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))
        if row:
            return {"solution": row[0], "source": "pg_cache"}
        return {"error": "NO_CACHED_RESULT"}

    try:
        # Solve the CAPTCHA
        solution = solve_via_api(sitekey, pageurl)
        if solution:
            cursor.execute(
                "INSERT INTO captcha_cache (sitekey, pageurl, solution) "
                "VALUES (%s, %s, %s)",
                (sitekey, pageurl, solution)
            )
            conn.commit()
        return {"solution": solution} if solution else {"error": "SOLVE_FAILED"}
    finally:
        cursor.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))

Metrik Efektivitas Dedup

Lacak penghematan deduplikasi:

def track_dedup_stats(source):
    """Increment counters for dedup tracking."""
    today = time.strftime("%Y-%m-%d")
    r.hincrby(f"dedup:stats:{today}", source, 1)
    r.expire(f"dedup:stats:{today}", 7 * 86400)


def get_dedup_report():
    today = time.strftime("%Y-%m-%d")
    stats = r.hgetall(f"dedup:stats:{today}")
    total = sum(int(v) for v in stats.values())
    saved = int(stats.get("dedup_cache", 0)) + int(stats.get("dedup_wait", 0))
    return {
        "total_requests": total,
        "deduplicated": saved,
        "savings_pct": f"{saved / total * 100:.1f}%" if total else "0%",
        "breakdown": stats
    }

Pemecahan Masalah

Masalah Penyebab Perbaikan
Collision dedup key Hash terlalu pendek atau parameter kurang Sertakan semua parameter khusus CAPTCHA dalam key; naikkan panjang hash
Worker menunggu timeout Worker solver crash TTL pada status solving auto-expire (180 detik)
Hasil cache basi Token kedaluwarsa tapi cache masih valid Set TTL cache hasil lebih pendek dari lifetime token (60 detik untuk reCAPTCHA)
Race condition saat submit Dua worker memeriksa secara bersamaan Gunakan SET NX (set-if-not-exists) untuk akuisisi lock atomik

Pertanyaan Umum

Kapan deduplikasi sepadan dengan kerumitannya?

Ketika Anda memiliki beberapa worker yang menargetkan kombinasi sitekey/pageurl yang sama. Bahkan tingkat dedup 10% menghemat kredit API dalam skala besar — dan menghilangkan waktu solve yang terbuang.

Haruskah saya mendeduplikasi image CAPTCHA?

Ya, tapi gunakan hash konten gambar sebagai bagian dari dedup key. Gambar yang identik menghasilkan teks yang sama, sehingga dedup efektif.

Bagaimana dengan proxy yang berbeda untuk CAPTCHA yang sama?

Jangan sertakan proxy di dedup key. Token solve berfungsi terlepas dari proxy mana yang digunakan. Memasukkan proxy akan menggagalkan deduplikasi.

Langkah Selanjutnya

Berhenti membayar untuk solve CAPTCHA duplikat — dapatkan API key CaptchaAI Anda dan terapkan dedup hari ini.

Panduan terkait:

Komentar dinonaktifkan untuk artikel ini.