
ELK Stack for CAPTCHA Solve Log Analysis

Once your CAPTCHA pipeline processes thousands of tasks, grep no longer scales. The ELK Stack (Elasticsearch, Logstash, Kibana) lets you search, aggregate, and visualize solve logs, so you can find error patterns, track latency trends, and diagnose problems in seconds.

Architecture

[CAPTCHA Workers] → JSON logs → [Filebeat] → [Logstash] → [Elasticsearch]
                                                                ↓
                                                           [Kibana]

Structured Logging

Python: JSON Log Output

import os
import json
import time
import logging
import sys
from datetime import datetime, timezone

import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]

# Optional fields copied from `extra=` into the JSON entry when present
EXTRA_FIELDS = (
    "captcha_id", "captcha_type", "solve_time",
    "error_code", "target_url", "poll_count",
)


class JSONFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        log_entry = {
            # ISO 8601 in UTC so the Logstash date filter can parse it
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(timespec="milliseconds"),
            # Lowercase so keyword queries such as level:error match
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Add extra fields
        for field in EXTRA_FIELDS:
            if hasattr(record, field):
                log_entry[field] = getattr(record, field)
        return json.dumps(log_entry)


# Configure logger
logger = logging.getLogger("captchaai")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)

session = requests.Session()


def solve_captcha(sitekey, pageurl, captcha_type="recaptcha_v2"):
    extra = {"captcha_type": captcha_type, "target_url": pageurl}

    # Submit
    resp = session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": 1
    }, timeout=30)  # do not hang forever on a stalled connection
    data = resp.json()

    if data.get("status") != 1:
        logger.error("Submit failed", extra={
            **extra, "error_code": data.get("request")
        })
        return {"error": data.get("request")}

    captcha_id = data["request"]
    extra["captcha_id"] = captcha_id
    logger.info("Task submitted", extra=extra)

    # Poll
    start = time.time()
    poll_count = 0
    for _ in range(60):
        time.sleep(5)
        poll_count += 1
        result = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": captcha_id, "json": 1
        }, timeout=30).json()

        if result.get("status") == 1:
            elapsed = round(time.time() - start, 2)
            logger.info("Solve success", extra={
                **extra,
                "solve_time": elapsed,
                "poll_count": poll_count
            })
            return {"solution": result["request"]}

        if result.get("request") != "CAPCHA_NOT_READY":
            logger.error("Solve failed", extra={
                **extra,
                "error_code": result.get("request"),
                "poll_count": poll_count
            })
            return {"error": result.get("request")}

    logger.error("Solve timeout", extra={
        **extra,
        "error_code": "TIMEOUT",
        "poll_count": poll_count
    })
    return {"error": "TIMEOUT"}

JavaScript: Structured Logging

const axios = require("axios");

const API_KEY = process.env.CAPTCHAAI_API_KEY;

function log(level, message, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    service: "captcha-worker",
    ...fields,
  };
  console.log(JSON.stringify(entry));
}

async function solveCaptcha(sitekey, pageurl, captchaType = "recaptcha_v2") {
  // Log fields use snake_case so they match the Python worker and the index mapping
  const fields = { captcha_type: captchaType, target_url: pageurl };

  // Submit the task
  const submitResp = await axios.post("https://ocr.captchaai.com/in.php", null, {
    params: {
      key: API_KEY, method: "userrecaptcha",
      googlekey: sitekey, pageurl, json: 1,
    },
  });

  if (submitResp.data.status !== 1) {
    log("error", "Submit failed", { ...fields, error_code: submitResp.data.request });
    return { error: submitResp.data.request };
  }

  const captchaId = submitResp.data.request;
  fields.captcha_id = captchaId;
  log("info", "Task submitted", fields);

  const startTime = Date.now();
  let pollCount = 0;

  // Poll every 5 seconds, up to 60 attempts
  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    pollCount++;

    const pollResp = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: captchaId, json: 1 },
    });

    if (pollResp.data.status === 1) {
      const solveTime = Number(((Date.now() - startTime) / 1000).toFixed(2));
      log("info", "Solve success", { ...fields, solve_time: solveTime, poll_count: pollCount });
      return { solution: pollResp.data.request };
    }

    if (pollResp.data.request !== "CAPCHA_NOT_READY") {
      log("error", "Solve failed", { ...fields, error_code: pollResp.data.request, poll_count: pollCount });
      return { error: pollResp.data.request };
    }
  }

  log("error", "Solve timeout", { ...fields, error_code: "TIMEOUT", poll_count: pollCount });
  return { error: "TIMEOUT" };
}

module.exports = { solveCaptcha };

Filebeat Configuration

# filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/captcha-worker/*.log
    json:
      keys_under_root: true
      add_error_key: true
      message_key: message

output.logstash:
  hosts: ["logstash:5044"]
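
The Filebeat input above tails /var/log/captcha-worker/*.log, while the Python example earlier writes to stdout. One way to bridge the two, sketched here under assumptions, is a rotating file handler. The function name attach_file_handler, the file name worker.log, the rotation sizes, and the stand-in formatter are illustrative choices, not part of the original setup.

```python
import json
import logging
import os
from logging.handlers import RotatingFileHandler


class MinimalJSONFormatter(logging.Formatter):
    """Stand-in formatter so this sketch runs on its own; in the worker you
    would pass the JSONFormatter from the Python example instead."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        })


def attach_file_handler(logger, log_dir, formatter=None,
                        max_bytes=10 * 1024 * 1024, backups=5):
    """Write one JSON object per line to <log_dir>/worker.log.

    The file name and rotation limits are hypothetical defaults.
    Returns the path of the active log file.
    """
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, "worker.log")
    handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                  backupCount=backups, encoding="utf-8")
    handler.setFormatter(formatter or MinimalJSONFormatter())
    logger.addHandler(handler)
    return path
```

In the worker this would be called once at startup, for example attach_file_handler(logging.getLogger("captchaai"), "/var/log/captcha-worker", JSONFormatter()). Running the worker under a process manager that redirects stdout to a file in that directory would be an equally valid alternative.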

Logstash Pipeline

# logstash-captcha.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Filebeat already decodes each JSON line (json.keys_under_root: true),
  # so fields such as solve_time arrive at the root of the event and
  # match the index template and queries without a second json filter.

  # Add computed fields
  if [solve_time] {
    if [solve_time] > 90 {
      mutate { add_field => { "solve_time_bucket" => "slow" } }
    } else if [solve_time] > 30 {
      mutate { add_field => { "solve_time_bucket" => "medium" } }
    } else {
      mutate { add_field => { "solve_time_bucket" => "fast" } }
    }
  }

  # Use the log's own timestamp as the event time
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "captcha-logs-%{+YYYY.MM.dd}"
  }
}
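
To sanity-check the bucketing thresholds before deploying the pipeline, the same rule can be mirrored in a few lines of Python. This is a sketch for local testing only: the function name solve_time_bucket is hypothetical, and the thresholds (over 30 seconds is medium, over 90 seconds is slow) are copied from the filter above.

```python
from collections import Counter


def solve_time_bucket(solve_time, medium_after=30.0, slow_after=90.0):
    """Mirror the Logstash rule: > 90 s is slow, > 30 s is medium, else fast.

    Returns None when the event has no solve_time (for example an error log).
    """
    if solve_time is None:
        return None
    if solve_time > slow_after:
        return "slow"
    if solve_time > medium_after:
        return "medium"
    return "fast"


def bucket_counts(solve_times):
    """Count how many solves fall into each bucket."""
    return Counter(solve_time_bucket(t) for t in solve_times if t is not None)
```

Both comparisons are strict, so a solve of exactly 30 seconds still counts as fast and one of exactly 90 seconds as medium, matching the conditionals in the pipeline.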

Elasticsearch Index Template

{
  "index_patterns": ["captcha-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "captcha_type": { "type": "keyword" },
        "captcha_id": { "type": "keyword" },
        "error_code": { "type": "keyword" },
        "solve_time": { "type": "float" },
        "poll_count": { "type": "integer" },
        "target_url": { "type": "keyword" },
        "level": { "type": "keyword" },
        "message": { "type": "text" }
      }
    }
  }
}
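
The template only takes effect once it is registered with the cluster, before the first captcha-logs-* index is created. A minimal sketch using the composable index template endpoint (PUT /_index_template/<name>) follows. The template name captcha-logs and the helper names are assumptions, and the code assumes an unsecured cluster at the elasticsearch:9200 host used in the Logstash output; a secured cluster would also need credentials and TLS.

```python
import json
import urllib.request


def build_template_request(template, name="captcha-logs",
                           es_url="http://elasticsearch:9200"):
    """Build (but do not send) a PUT /_index_template/<name> request.

    `template` is the JSON document shown above, loaded as a dict.
    """
    return urllib.request.Request(
        url=f"{es_url}/_index_template/{name}",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def install_template(template, **kwargs):
    """Send the request; urllib raises HTTPError on a non-2xx response."""
    request = build_template_request(template, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read())
```

Splitting request construction from sending keeps the first function easy to test without a running cluster.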

Kibana Dashboard Panels

Panel              | Visualization | Query
Solve success rate | Metric        | level:info AND message:"Solve success" / total
Error breakdown    | Pie chart     | level:error grouped by error_code
Latency over time  | Line chart    | Average solve_time over time
Errors over time   | Bar chart     | Count of level:error per 5-minute bucket
Slowest solves     | Data table    | Top 10 by solve_time, descending
Queue activity     | Area chart    | Count by message ("Task submitted" vs "Solve success")
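
The success-rate panel can also be computed outside Kibana with a single filters aggregation. The sketch below builds the search body and reads the rate from the response. It assumes that "total" means the number of "Task submitted" events in the window, which is one reasonable reading of the table rather than something the article specifies, and the function names are hypothetical.

```python
def success_rate_query(window="now-1h"):
    """Search body that counts submitted and successfully solved tasks."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"@timestamp": {"gte": window}}},
        "aggs": {
            "outcomes": {
                "filters": {
                    "filters": {
                        "submitted": {"match_phrase": {"message": "Task submitted"}},
                        "success": {"match_phrase": {"message": "Solve success"}},
                    }
                }
            }
        },
    }


def success_rate(response):
    """Return success / submitted, or None when nothing was submitted."""
    buckets = response["aggregations"]["outcomes"]["buckets"]
    submitted = buckets["submitted"]["doc_count"]
    if submitted == 0:
        return None
    return buckets["success"]["doc_count"] / submitted
```

The body would be posted to captcha-logs-*/_search. Tasks still in flight at the edge of the window skew the ratio slightly, so a longer window gives a steadier number.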

Useful Queries

# All errors in the last hour
level:error AND @timestamp:[now-1h TO now]

# Timeout errors for reCAPTCHA
error_code:TIMEOUT AND captcha_type:recaptcha_v2

# Slow solves (> 60 seconds)
solve_time:>60

# Errors for a specific target URL (wildcards, because target_url is a keyword field)
level:error AND target_url:*example.com*

# Investigate a specific CAPTCHA ID
captcha_id:"73519847"
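
The same Lucene-style strings can be reused from a script through the query_string query, which is handy for scheduled reports. A minimal sketch, with a hypothetical helper name and an arbitrary default page size:

```python
def lucene_search_body(query, size=50):
    """Wrap a Lucene-style query string in a search body, newest events first."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"query_string": {"query": query}},
    }


# Example: the slow-solve and timeout queries from the list above
slow_solves = lucene_search_body("solve_time:>60", size=10)
recaptcha_timeouts = lucene_search_body(
    "error_code:TIMEOUT AND captcha_type:recaptcha_v2"
)
```

Each body would be posted to captcha-logs-*/_search in the same way as any other search request.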

Troubleshooting

Problem                      | Cause                          | Fix
Logs do not appear in Kibana | Filebeat is not shipping logs  | Check the Filebeat logs; verify that the path pattern matches
JSON parse errors            | Non-JSON lines in the log file | Add json.keys_under_root to the Filebeat config; fix the logger output
Too many indices             | Daily indices without ILM      | Set up Index Lifecycle Management with 30-day retention
Slow queries                 | Missing keyword mappings       | Use the keyword type, not text, for fields you filter on
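
For the "too many indices" row, a 30-day retention policy might look like the sketch below. It builds the body for PUT /_ilm/policy/<name> and a copy of the index template with the policy attached through the index.lifecycle.name setting. The policy name captcha-logs-retention and both helper names are assumptions made for illustration.

```python
import copy


def retention_policy(days=30):
    """ILM policy body that deletes an index `days` after it was created."""
    return {
        "policy": {
            "phases": {
                "delete": {
                    "min_age": f"{days}d",
                    "actions": {"delete": {}},
                }
            }
        }
    }


def with_lifecycle(template, policy_name="captcha-logs-retention"):
    """Return a copy of an index template that applies the named ILM policy."""
    updated = copy.deepcopy(template)
    settings = updated.setdefault("template", {}).setdefault("settings", {})
    settings["index.lifecycle.name"] = policy_name
    return updated
```

With daily indices and no rollover, min_age is measured from index creation, so each day's index is removed roughly 30 days after it was first written.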

FAQ

How long should I keep CAPTCHA logs?

Keep operational logs for 30 days, or 90 days if you need trend analysis. Use Elasticsearch ILM to delete old indices automatically.

Can I use OpenSearch instead of Elasticsearch?

Yes. OpenSearch is largely API-compatible with Elasticsearch. The Logstash output plugin, Filebeat, and OpenSearch Dashboards (the Kibana replacement) work the same way.

Should I log the CAPTCHA solution text?

No. The solution is a single-use token with no diagnostic value. Logging it adds storage cost and can create security problems. Log only metadata (ID, type, latency, status).

Next Steps

Search and analyze your CAPTCHA logs: get your CaptchaAI API key and set up ELK.

Related guides:

  • Structured Logging
  • Datadog Monitoring
  • OpenTelemetry Tracing