
CAPTCHA Solving Throughput: How to Process 10,000 Tasks per Hour

Processing 10,000 CAPTCHAs per hour means sustaining about 2.8 solves per second. That is achievable with the right architecture. This guide covers the math, code, and tuning needed to get there with CaptchaAI.

The Math

If a single reCAPTCHA v2 solve takes 15 seconds (median):

  • Sequential: 3,600 seconds / 15 seconds = 240 solves/hour
  • To reach 10,000/hour: 10,000 / 240 ≈ 42 concurrent solves in flight at any moment

The key insight: you are not waiting for CaptchaAI to get faster. You overlap requests so that 42 solves complete within the same 15-second window.
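
The arithmetic above is an instance of Little's law (items in flight = arrival rate × time in the system). A minimal sketch that reproduces the two numbers; the function names are illustrative and not part of any CaptchaAI library:

```python
import math

def sequential_per_hour(solve_seconds: float) -> float:
    """Solves per hour when tasks run strictly one after another."""
    return 3600 / solve_seconds

def required_concurrency(target_per_hour: float, solve_seconds: float) -> int:
    """In-flight solves needed to sustain the target rate (Little's law)."""
    solves_per_second = target_per_hour / 3600
    return math.ceil(solves_per_second * solve_seconds)

print(sequential_per_hour(15))            # 240.0
print(required_concurrency(10_000, 15))   # 42
```

If your measured median differs from 15 seconds, rerun the calculation with your own number before choosing a concurrency limit.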

Architecture

┌──────────┐     ┌────────────┐     ┌─────────────┐     ┌──────────┐
│  Task     │────▶│  Submit    │────▶│  CaptchaAI  │────▶│  Result  │
│  Queue    │     │  Workers   │     │  API        │     │  Store   │
│  (Redis)  │     │  (async)   │     │             │     │  (DB)    │
└──────────┘     └────────────┘     └─────────────┘     └──────────┘
                       │                    ▲
                       │    ┌──────────┐    │
                       └───▶│  Poll    │────┘
                            │  Workers │
                            └──────────┘

Components:

  1. Task queue – holds pending CAPTCHA tasks with their sitekey and URL
  2. Submit workers – submit tasks to the CaptchaAI API concurrently
  3. Poll workers – check for results at an optimal interval
  4. Result store – saves tokens as they arrive
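
The flow between these components can be sketched in a single process. This is a simplified illustration under stated assumptions: asyncio.Queue stands in for the Redis task queue, a plain dict stands in for the result database, fake_solve is a hypothetical placeholder for the real submit-and-poll round trip, and submit and poll are merged into one worker rather than split as in the diagram.

```python
import asyncio

async def fake_solve(task: dict) -> str:
    """Hypothetical stand-in for submitting a task and polling for its token."""
    await asyncio.sleep(0.01)
    return f"token-for-{task['id']}"

async def worker(queue: asyncio.Queue, store: dict) -> None:
    """Pull tasks until cancelled, storing each token as soon as it arrives."""
    while True:
        task = await queue.get()
        try:
            store[task["id"]] = await fake_solve(task)
        finally:
            queue.task_done()

async def run_pipeline(tasks: list, num_workers: int) -> dict:
    queue: asyncio.Queue = asyncio.Queue()  # stand-in for the Redis task queue
    store: dict = {}                         # stand-in for the result DB
    for task in tasks:
        queue.put_nowait(task)
    workers = [asyncio.create_task(worker(queue, store)) for _ in range(num_workers)]
    await queue.join()                       # block until every task is processed
    for w in workers:
        w.cancel()                           # workers loop forever, so stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return store

demo_tasks = [{"id": i, "sitekey": "demo", "pageurl": "https://example.com"}
              for i in range(10)]
results = asyncio.run(run_pipeline(demo_tasks, num_workers=4))
print(len(results))
```

Because each worker writes its token to the store immediately, nothing accumulates in memory beyond the tasks currently in flight.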

Python: Async Pipeline

# high_throughput_solver.py
import os
import asyncio
import time
import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")
BASE_URL = "https://ocr.captchaai.com"
MAX_CONCURRENT = 50  # Max simultaneous solves
POLL_INTERVAL = 5    # Seconds between polls
INITIAL_WAIT = 12    # Seconds before first poll

semaphore = asyncio.Semaphore(MAX_CONCURRENT)
stats = {"submitted": 0, "solved": 0, "failed": 0, "start": 0}

async def solve_one(session, sitekey, pageurl, task_num):
    """Submit and poll a single CAPTCHA."""
    async with semaphore:
        try:
            # Submit
            async with session.get(f"{BASE_URL}/in.php", params={
                "key": API_KEY, "method": "userrecaptcha",
                "googlekey": sitekey, "pageurl": pageurl, "json": "1",
            }) as resp:
                result = await resp.json(content_type=None)

            if result.get("status") != 1:
                stats["failed"] += 1
                return None

            stats["submitted"] += 1
            task_id = result["request"]

            # Wait before first poll
            await asyncio.sleep(INITIAL_WAIT)

            # Poll
            for _ in range(25):
                async with session.get(f"{BASE_URL}/res.php", params={
                    "key": API_KEY, "action": "get",
                    "id": task_id, "json": "1",
                }) as resp:
                    poll_result = await resp.json(content_type=None)

                if poll_result.get("status") == 1:
                    stats["solved"] += 1
                    return poll_result["request"]

                if poll_result.get("request") != "CAPCHA_NOT_READY":
                    stats["failed"] += 1
                    return None

                await asyncio.sleep(POLL_INTERVAL)

            stats["failed"] += 1
            return None

        except Exception:
            # Network errors, timeouts, and malformed responses all count as failures
            stats["failed"] += 1
            return None

async def run_batch(tasks):
    """Process a batch of CAPTCHA tasks concurrently."""
    connector = aiohttp.TCPConnector(
        limit=MAX_CONCURRENT,
        keepalive_timeout=60,
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        coros = [
            solve_one(session, task["sitekey"], task["pageurl"], i)
            for i, task in enumerate(tasks)
        ]
        results = await asyncio.gather(*coros)
    return results

async def main():
    # Generate test tasks (replace with your task source)
    tasks = [
        {
            "sitekey": "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
            "pageurl": "https://www.google.com/recaptcha/api2/demo",
        }
        for _ in range(100)  # Start with 100 tasks
    ]

    stats["start"] = time.time()
    print(f"Processing {len(tasks)} tasks with {MAX_CONCURRENT} concurrent workers")

    results = await run_batch(tasks)
    elapsed = time.time() - stats["start"]

    print(f"\nCompleted in {elapsed:.0f}s")
    print(f"Submitted: {stats['submitted']}")
    print(f"Solved: {stats['solved']}")
    print(f"Failed: {stats['failed']}")
    print(f"Throughput: {stats['solved'] / (elapsed / 3600):.0f} solves/hour")

asyncio.run(main())

JavaScript: Concurrent Flow

// high_throughput_solver.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';
const BASE = 'https://ocr.captchaai.com';
const MAX_CONCURRENT = 50;

const agent = new https.Agent({ keepAlive: true, maxSockets: MAX_CONCURRENT });
const api = axios.create({ baseURL: BASE, httpsAgent: agent, timeout: 30000 });

const stats = { submitted: 0, solved: 0, failed: 0 };

async function solveOne(sitekey, pageurl) {
  try {
    const submit = await api.get('/in.php', {
      params: { key: API_KEY, method: 'userrecaptcha', googlekey: sitekey, pageurl, json: '1' },
    });
    if (submit.data.status !== 1) { stats.failed++; return null; }
    stats.submitted++;

    await new Promise(r => setTimeout(r, 12000));

    for (let i = 0; i < 25; i++) {
      const poll = await api.get('/res.php', {
        params: { key: API_KEY, action: 'get', id: submit.data.request, json: '1' },
      });
      if (poll.data.status === 1) { stats.solved++; return poll.data.request; }
      if (poll.data.request !== 'CAPCHA_NOT_READY') { stats.failed++; return null; }
      await new Promise(r => setTimeout(r, 5000));
    }
    stats.failed++;
    return null;
  } catch { stats.failed++; return null; }
}

async function runWithConcurrency(tasks, limit) {
  const results = [];
  const executing = new Set();

  for (const task of tasks) {
    const p = solveOne(task.sitekey, task.pageurl).then(r => {
      executing.delete(p);
      return r;
    });
    executing.add(p);
    results.push(p);

    if (executing.size >= limit) {
      await Promise.race(executing);
    }
  }
  return Promise.all(results);
}

(async () => {
  const tasks = Array.from({ length: 100 }, () => ({
    sitekey: '6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-',
    pageurl: 'https://www.google.com/recaptcha/api2/demo',
  }));

  const start = Date.now();
  console.log(`Processing ${tasks.length} tasks, ${MAX_CONCURRENT} concurrent`);

  await runWithConcurrency(tasks, MAX_CONCURRENT);
  const elapsed = (Date.now() - start) / 1000;

  console.log(`\nDone in ${elapsed.toFixed(0)}s`);
  console.log(`Solved: ${stats.solved}, Failed: ${stats.failed}`);
  console.log(`Throughput: ${(stats.solved / (elapsed / 3600)).toFixed(0)} solves/hour`);

  agent.destroy();
})();

Tuning Parameters

Parameter            Conservative  Balanced      Aggressive
MAX_CONCURRENT       20            50            100
INITIAL_WAIT         15 s          12 s          10 s
POLL_INTERVAL        7 s           5 s           3 s
MAX_POLL_ATTEMPTS    30            25            20
Expected throughput  ~4,800/hour   ~10,000/hour  ~18,000/hour

MAX_POLL_ATTEMPTS corresponds to the poll loop limit, which is hard-coded as 25 in the examples above. Start conservative and raise MAX_CONCURRENT until you see diminishing returns or a rising error rate.
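
To compare the three profiles, it can help to compute how long each one lets a task wait before giving up, and what the theoretical throughput ceiling would be. A minimal sketch: the Profile class and its method names are hypothetical, and the ceiling assumes the 15-second median solve time used earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    max_concurrent: int
    initial_wait: int        # seconds before the first poll
    poll_interval: int       # seconds between polls
    max_poll_attempts: int

    def max_wait_seconds(self) -> int:
        """Longest time a task sleeps before it is counted as failed."""
        return self.initial_wait + self.max_poll_attempts * self.poll_interval

    def ceiling_per_hour(self, median_solve_seconds: float = 15) -> float:
        """Upper bound on throughput if every slot stays busy."""
        return self.max_concurrent * 3600 / median_solve_seconds

PROFILES = {
    "conservative": Profile(20, 15, 7, 30),
    "balanced": Profile(50, 12, 5, 25),
    "aggressive": Profile(100, 10, 3, 20),
}

for name, profile in PROFILES.items():
    print(name, profile.max_wait_seconds(), round(profile.ceiling_per_hour()))
```

The expected throughput figures in the table sit at or below these ceilings, since a token is only picked up at the next poll and failed solves still occupy a slot.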

Monitoring Throughput

Track these metrics in real time:

  • Solves per minute – should hold at ~167 for the 10K/hour target
  • Error rate – keep it under 5%. If it spikes, reduce concurrency
  • Queue depth – if it grows, add workers. If it sits empty, you are over-provisioned
  • P90 solve time – if it rises, CaptchaAI may be rate-limiting you
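
Three of these metrics (solve rate, error rate, and P90 solve time) can be derived from a rolling window of completed tasks. The sketch below is one possible approach; ThroughputMonitor is a hypothetical helper, and the optional `now` argument exists only so the example is deterministic.

```python
import time
from collections import deque
from statistics import quantiles
from typing import Optional

class ThroughputMonitor:
    """Rolling-window tracker for solve rate, error rate, and P90 solve time."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.events = deque()  # entries are (timestamp, ok, solve_seconds)

    def record(self, ok: bool, solve_seconds: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, ok, solve_seconds))
        self._trim(now)

    def _trim(self, now: float) -> None:
        # Drop events that have aged out of the window
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def snapshot(self, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        self._trim(now)
        durations = [d for _, ok, d in self.events if ok]
        total, solved = len(self.events), len(durations)
        # quantiles(n=10) returns nine cut points; the last one is the P90
        p90 = quantiles(durations, n=10)[-1] if solved >= 2 else None
        return {
            "solves_per_minute": solved * 60 / self.window,
            "error_rate": (total - solved) / total if total else 0.0,
            "p90_solve_seconds": p90,
        }

monitor = ThroughputMonitor(window_seconds=60)
for i in range(10):
    # one failure followed by nine successes, one event per second
    monitor.record(ok=(i != 0), solve_seconds=10 + i, now=float(i))
print(monitor.snapshot(now=10.0))
```

In a real pipeline you would call record() wherever a task finishes and compare the snapshot against the thresholds listed above.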

Troubleshooting

Problem                          Cause                          Solution
Throughput plateaus at ~5K/hour  Insufficient concurrency       Raise MAX_CONCURRENT to 80–100
Error rate > 10%                 API overloaded or bad proxies  Reduce concurrency; check proxy health
Memory growth                    Unbounded task accumulation    Process results as they arrive; do not buffer them
ERROR_NO_SLOT_AVAILABLE          CaptchaAI queue is full        Back off and retry after 5 seconds

Frequently Asked Questions

What is CaptchaAI's concurrency limit?

There is no hard limit on concurrent requests, but very high concurrency (500+) can trigger rate limiting. Start at 50 and increase gradually.

Can I run this across multiple machines?

Yes. Use a shared queue (Redis, RabbitMQ) and run workers on several servers. Each worker processes tasks independently.

How should I monitor my balance at this volume?

At 10,000 solves/hour, watch your balance closely. Use the balance check endpoint (res.php?action=getbalance) and set up alerts.
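
A balance alert can be reduced to one question: how many hours of solving does the current balance cover? The sketch below builds the balance check URL and estimates the remaining runway. Everything beyond the res.php?action=getbalance endpoint is an assumption: the json=1 parameter is borrowed from the other calls in this guide, and price_per_solve is a hypothetical figure you would replace with your actual rate.

```python
from urllib.parse import urlencode

BASE_URL = "https://ocr.captchaai.com"

def balance_url(api_key: str) -> str:
    """Build the balance check URL (json=1 is an assumption, see above)."""
    query = urlencode({"key": api_key, "action": "getbalance", "json": "1"})
    return f"{BASE_URL}/res.php?{query}"

def hours_remaining(balance: float, solves_per_hour: float, price_per_solve: float) -> float:
    """Hours of solving the balance covers at the current burn rate."""
    return balance / (solves_per_hour * price_per_solve)

def should_alert(balance: float, solves_per_hour: float,
                 price_per_solve: float, min_hours: float = 2.0) -> bool:
    """True when the balance covers less than min_hours of work."""
    return hours_remaining(balance, solves_per_hour, price_per_solve) < min_hours

# Placeholder numbers for illustration only; not real CaptchaAI pricing
print(balance_url("YOUR_API_KEY"))
print(should_alert(balance=30.0, solves_per_hour=10_000, price_per_solve=0.001))
```

Polling the balance every few minutes is usually enough, since one extra request barely registers next to the solve traffic.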

Next Steps

Build your high-throughput CAPTCHA pipeline: get your CaptchaAI API key.

Related guides:

  • HTTP/2 Keep-Alive Connections for CAPTCHA APIs
  • Benchmarking CAPTCHA Solve Times