Browser Session Persistence for CAPTCHA Workflows

Every new browser session starts from zero: no cookies, no history, no trust. CAPTCHA systems treat a fresh session as higher risk and challenge it more often. A session that persists between runs builds trust, lowers CAPTCHA frequency, and avoids solving the same challenge again and again.


Why Session Persistence Reduces CAPTCHAs

| Session state | CAPTCHA frequency | Reason |
| --- | --- | --- |
| Fresh session (no cookies) | High | No trust history; the user is unknown |
| Session with Google cookies | Medium | reCAPTCHA recognizes the Google login |
| Warm session (browsing history) | Low | Organic behavioral signals |
| Persistent profile (days old) | Very low | An established trust score |

import json
import os
import time
from selenium import webdriver


class PersistentSession:
    def __init__(self, profile_name="default", cookie_dir="./sessions"):
        self.profile_name = profile_name
        self.cookie_dir = cookie_dir
        self.cookie_file = os.path.join(cookie_dir, f"{profile_name}_cookies.json")
        os.makedirs(cookie_dir, exist_ok=True)

    def create_driver(self):
        options = webdriver.ChromeOptions()
        options.add_argument("--window-size=1920,1080")
        return webdriver.Chrome(options=options)

    def save_cookies(self, driver):
        """Save all cookies to disk."""
        cookies = driver.get_cookies()
        with open(self.cookie_file, "w") as f:
            json.dump(cookies, f, indent=2)
        print(f"Saved {len(cookies)} cookies to {self.cookie_file}")

    def load_cookies(self, driver, domain=None):
        """Restore cookies from disk."""
        if not os.path.exists(self.cookie_file):
            print("No saved cookies found")
            return False

        with open(self.cookie_file) as f:
            cookies = json.load(f)

        loaded = 0
        for cookie in cookies:
            # Filter by domain if specified
            if domain and domain not in cookie.get("domain", ""):
                continue

            # Remove problematic fields
            cookie.pop("sameSite", None)
            cookie.pop("storeId", None)

            try:
                driver.add_cookie(cookie)
                loaded += 1
            except Exception as e:
                print(f"Skip cookie {cookie.get('name')}: {e}")

        print(f"Loaded {loaded}/{len(cookies)} cookies")
        return loaded > 0

    def run_with_session(self, url, callback):
        """Run a task with persistent session."""
        driver = self.create_driver()

        try:
            # Navigate to domain first (required for cookie loading)
            driver.get(url)
            time.sleep(1)

            # Load saved cookies
            self.load_cookies(driver)

            # Refresh to apply cookies
            driver.get(url)
            time.sleep(2)

            # Execute task
            result = callback(driver)

            # Save updated cookies
            self.save_cookies(driver)

            return result

        finally:
            driver.quit()


# Usage
session = PersistentSession("target-site")

def my_task(driver):
    # Check if already logged in
    if "dashboard" in driver.current_url:
        print("Session restored — no login needed")
        return driver.page_source
    else:
        print("Need to login + solve CAPTCHA")
        # Solve CAPTCHA with CaptchaAI...
        return None

result = session.run_with_session("https://example.com", my_task)

Chrome User Data Directory (Full Profile Persistence)

This is the most complete form of persistence. It keeps cookies, localStorage, the cache, history, and the rest of the browser state:

import os
from selenium import webdriver

PROFILE_DIR = os.path.abspath("./chrome-profiles/profile-1")


def create_persistent_driver():
    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={PROFILE_DIR}")
    options.add_argument("--profile-directory=Default")
    options.add_argument("--no-sandbox")
    return webdriver.Chrome(options=options)


# First run: builds fresh profile
driver = create_persistent_driver()
driver.get("https://example.com")
# ... solve CAPTCHA, login, etc.
driver.quit()

# Second run: same profile, cookies and state preserved
driver = create_persistent_driver()
driver.get("https://example.com")
# Often skips CAPTCHA because session is recognized
driver.quit()

Benefits of the User Data Directory

| What is stored | Effect on CAPTCHAs |
| --- | --- |
| Cookies | Session tokens, Google NID cookie |
| localStorage | Site-specific trust tokens |
| IndexedDB | reCAPTCHA internal state |
| Cache | Faster page loads |
| History | Browsing-pattern signals |
| Service workers | Background CAPTCHA checks |

Puppeteer Persistent Context

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
const path = require("path");

puppeteer.use(StealthPlugin());

const USER_DATA_DIR = path.resolve("./chrome-profiles/profile-1");

async function runWithPersistentProfile() {
  const browser = await puppeteer.launch({
    headless: false,
    userDataDir: USER_DATA_DIR,
    args: [
      "--no-sandbox",
      "--window-size=1920,1080",
    ],
  });

  const page = await browser.newPage();
  await page.goto("https://example.com", { waitUntil: "networkidle0" });

  // Check if session is active
  const isLoggedIn = await page.evaluate(() =>
    document.querySelector(".user-menu") !== null
  );

  if (isLoggedIn) {
    console.log("Session active — no CAPTCHA needed");
  } else {
    console.log("Session expired — solving CAPTCHA");
    // Solve with CaptchaAI...
  }

  await browser.close();
}

localStorage and sessionStorage

import json
import os


def save_storage(driver, filepath):
    """Save localStorage and sessionStorage."""
    storage = driver.execute_script("""
        return {
            localStorage: Object.fromEntries(
                Object.entries(localStorage)
            ),
            sessionStorage: Object.fromEntries(
                Object.entries(sessionStorage)
            ),
        };
    """)

    with open(filepath, "w") as f:
        json.dump(storage, f, indent=2)


def restore_storage(driver, filepath):
    """Restore localStorage and sessionStorage."""
    if not os.path.exists(filepath):
        return

    with open(filepath) as f:
        storage = json.load(f)

    # Pass keys and values as script arguments so quotes cannot break the JS
    for key, value in storage.get("localStorage", {}).items():
        driver.execute_script(
            "localStorage.setItem(arguments[0], arguments[1])", key, value
        )

    for key, value in storage.get("sessionStorage", {}).items():
        driver.execute_script(
            "sessionStorage.setItem(arguments[0], arguments[1])", key, value
        )

Session Warming Strategy

Fresh sessions trigger more CAPTCHAs. "Warming" a session with organic-looking behavior builds trust before you visit the target:

import random
import time

def warm_session(driver, warm_urls=None):
    """Simulate organic browsing to build session trust."""
    default_urls = [
        "https://www.google.com",
        "https://www.google.com/search?q=weather",
        "https://www.wikipedia.org",
    ]

    urls = warm_urls or default_urls

    for url in urls:
        driver.get(url)
        time.sleep(random.uniform(2, 5))

        # Simulate scroll
        driver.execute_script(
            f"window.scrollTo(0, {random.randint(200, 800)})"
        )
        time.sleep(random.uniform(1, 3))

    print(f"Session warmed with {len(urls)} pages")


# Usage
driver = create_persistent_driver()
warm_session(driver)

# Now navigate to target — lower CAPTCHA chance
driver.get("https://staging.example.com/form")

Multi-Profile Session Manager

import os
import json
import time
from datetime import datetime

from selenium import webdriver


class SessionManager:
    """Manage multiple persistent browser profiles."""

    def __init__(self, base_dir="./sessions"):
        self.base_dir = base_dir
        self.meta_file = os.path.join(base_dir, "profiles.json")
        os.makedirs(base_dir, exist_ok=True)

        if os.path.exists(self.meta_file):
            with open(self.meta_file) as f:
                self.profiles = json.load(f)
        else:
            self.profiles = {}

    def _save_meta(self):
        with open(self.meta_file, "w") as f:
            json.dump(self.profiles, f, indent=2)

    def get_profile_dir(self, name):
        return os.path.join(self.base_dir, f"profile-{name}")

    def create_profile(self, name, proxy=None):
        """Create a new browser profile."""
        profile_dir = self.get_profile_dir(name)
        os.makedirs(profile_dir, exist_ok=True)

        self.profiles[name] = {
            "created": datetime.now().isoformat(),
            "last_used": None,
            "use_count": 0,
            "proxy": proxy,
            "captcha_solves": 0,
        }
        self._save_meta()
        return profile_dir

    def get_least_used_profile(self):
        """Return the least recently used profile; never-used profiles come first."""
        if not self.profiles:
            return None

        return min(
            self.profiles.items(),
            key=lambda x: x[1].get("last_used") or ""
        )[0]

    def record_use(self, name, solved_captcha=False):
        """Record profile usage."""
        if name in self.profiles:
            self.profiles[name]["last_used"] = datetime.now().isoformat()
            self.profiles[name]["use_count"] += 1
            if solved_captcha:
                self.profiles[name]["captcha_solves"] += 1
            self._save_meta()

    def get_stats(self):
        """Print profile statistics."""
        for name, meta in self.profiles.items():
            print(f"Profile: {name}")
            print(f"  Uses: {meta['use_count']}")
            print(f"  CAPTCHAs: {meta['captcha_solves']}")
            print(f"  Last used: {meta.get('last_used') or 'never'}")
            print()


# Usage
manager = SessionManager()

# Create 5 rotating profiles (skip names that exist so their stats survive reruns)
for i in range(5):
    if f"worker-{i}" not in manager.profiles:
        manager.create_profile(f"worker-{i}")

# Get next profile to use
profile_name = manager.get_least_used_profile()
profile_dir = manager.get_profile_dir(profile_name)

# Use with Selenium
options = webdriver.ChromeOptions()
options.add_argument(f"--user-data-dir={os.path.abspath(profile_dir)}")
driver = webdriver.Chrome(options=options)

# After task
manager.record_use(profile_name, solved_captcha=True)
manager.get_stats()
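
SessionManager records a proxy for each profile, but the usage code above never hands it to Chrome. The sketch below shows one way to turn a profile entry into launch arguments. `build_chrome_args` is a hypothetical helper name, and the proxy is assumed to be stored as a plain `scheme://host:port` string; `--user-data-dir` and `--proxy-server` are existing Chrome command-line switches.

```python
import os


def build_chrome_args(profile_dir, proxy=None):
    """Return Chrome command-line arguments for one profile.

    build_chrome_args is a hypothetical helper, not part of Selenium.
    proxy is assumed to look like "http://host:port".
    """
    args = [f"--user-data-dir={os.path.abspath(profile_dir)}"]
    if proxy:
        # Pin the profile to its own exit IP so cookies and IP stay consistent
        args.append(f"--proxy-server={proxy}")
    return args


# Example values in the shape a SessionManager profile entry would supply
args = build_chrome_args("./sessions/profile-worker-0", "http://127.0.0.1:8080")
for arg in args:
    print(arg)
```

Each returned string can be passed to `options.add_argument(...)` before creating the driver. Proxies that need a username and password generally require extra handling that is out of scope for this sketch.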

import json
import os
from datetime import datetime, timezone


def clean_expired_cookies(cookie_file):
    """Remove expired cookies from saved file."""
    if not os.path.exists(cookie_file):
        return

    with open(cookie_file) as f:
        cookies = json.load(f)

    now = datetime.now(timezone.utc).timestamp()
    valid = [c for c in cookies if c.get("expiry", float("inf")) > now]

    removed = len(cookies) - len(valid)
    if removed > 0:
        with open(cookie_file, "w") as f:
            json.dump(valid, f, indent=2)
        print(f"Removed {removed} expired cookies")


def merge_cookies(existing_file, new_cookies):
    """Merge new cookies with existing, preferring newer values."""
    existing = []
    if os.path.exists(existing_file):
        with open(existing_file) as f:
            existing = json.load(f)

    # Index by (name, domain)
    cookie_map = {}
    for c in existing:
        key = (c["name"], c.get("domain", ""))
        cookie_map[key] = c

    for c in new_cookies:
        key = (c["name"], c.get("domain", ""))
        cookie_map[key] = c  # Newer overwrites

    merged = list(cookie_map.values())
    with open(existing_file, "w") as f:
        json.dump(merged, f, indent=2)

    return len(merged)

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| Cookies do not load | The driver has not navigated to the domain yet | Call driver.get(url) before add_cookie |
| Profile-locked error | A previous Chrome instance was not closed | Kill the Chrome process, then delete SingletonLock |
| Session still expires | sameSite cookie mismatch | Remove sameSite before loading |
| Storage blocked | CORS/security context | Load storage after navigating to the correct origin |
| CAPTCHA rate rises over time | IP has been flagged | Rotate proxies per profile |
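
The profile-locked fix can be scripted. The following is a minimal sketch, assuming the Linux/macOS lock file names (`SingletonLock`, `SingletonSocket`, `SingletonCookie`); `clear_stale_locks` is a hypothetical helper, and it should only run once no Chrome process is still using the profile.

```python
import os
import tempfile

# Files Chrome leaves in the user data directory while a profile is in use
# (Linux/macOS naming; treat the exact list as an assumption for your platform)
LOCK_FILES = ("SingletonLock", "SingletonSocket", "SingletonCookie")


def clear_stale_locks(profile_dir):
    """Delete leftover lock files; call only when no Chrome uses this profile."""
    removed = []
    for name in LOCK_FILES:
        path = os.path.join(profile_dir, name)
        # lexists also catches dangling symlinks, which a stale lock can be
        if os.path.lexists(path):
            os.remove(path)
            removed.append(name)
    return removed


# Demo against a throwaway directory that contains a fake lock file
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "SingletonLock"), "w").close()
removed = clear_stale_locks(demo_dir)
print(removed)
```

Calling it right before `create_persistent_driver()` keeps a crashed earlier run from blocking the next one.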

Frequently Asked Questions

How long does a browser session keep CAPTCHA frequency down?

Google's NID cookie lasts 6 months. Cloudflare's cf_clearance cookie typically lasts 15 minutes to 1 hour. Persist both and refresh them periodically.
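
To decide when a refresh is due, the saved cookie list can be checked for entries that are about to expire. This is a sketch under assumptions: `cookies_expiring_within` is a hypothetical helper, the input is a list of dicts in the shape `driver.get_cookies()` returns (with an optional `expiry` Unix timestamp), and the sample values are made up.

```python
import time


def cookies_expiring_within(cookies, seconds, now=None):
    """Return names of cookies whose 'expiry' falls inside the next `seconds`.

    Session cookies carry no 'expiry' key and are skipped.
    Already-expired cookies are not reported either.
    """
    now = time.time() if now is None else now
    soon = []
    for cookie in cookies:
        expiry = cookie.get("expiry")
        if expiry is not None and now < expiry <= now + seconds:
            soon.append(cookie["name"])
    return soon


# Illustrative data only; a fixed "now" keeps the example deterministic
now = 1_700_000_000
sample = [
    {"name": "cf_clearance", "expiry": now + 20 * 60},   # 20 minutes left
    {"name": "NID", "expiry": now + 180 * 24 * 3600},    # roughly 6 months left
    {"name": "session_only"},                            # no expiry field
]
due = cookies_expiring_within(sample, seconds=30 * 60, now=now)
print(due)
```

A scheduler could revisit the site whenever the returned list is non-empty, so short-lived cookies are renewed before they lapse.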

Can I share a session between machines?

Yes. Export the cookie file and the user data directory. Match the time zone and proxy of the original session for best results.
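
One way to package a profile for another machine is to zip a copy of the directory without the lock files. This is a sketch only: `export_profile` is a hypothetical helper, the demo profile is a throwaway directory with placeholder files, and the browser should be closed before exporting a real profile.

```python
import os
import shutil
import tempfile
import zipfile


def export_profile(profile_dir, archive_base):
    """Zip a profile directory and return the path of the archive.

    Singleton* lock files are left out because they only describe the
    Chrome instance that was running on the source machine.
    """
    staging = tempfile.mkdtemp()
    copy_dir = os.path.join(staging, "profile")
    shutil.copytree(
        profile_dir,
        copy_dir,
        symlinks=True,
        ignore=shutil.ignore_patterns("Singleton*"),
    )
    archive = shutil.make_archive(os.path.abspath(archive_base), "zip", root_dir=copy_dir)
    shutil.rmtree(staging)
    return archive


# Demo with a fake profile: one data file and one lock file
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "Default"))
with open(os.path.join(src, "Default", "Cookies"), "w") as f:
    f.write("placeholder")
open(os.path.join(src, "SingletonLock"), "w").close()

out_dir = tempfile.mkdtemp()
archive_path = export_profile(src, os.path.join(out_dir, "profile-1"))
with zipfile.ZipFile(archive_path) as zf:
    names = zf.namelist()
print(archive_path)
print(names)
```

Chrome may encrypt its cookie database with a key tied to the operating system account, so a copied profile does not always bring usable cookies with it. The exported JSON cookie file is likely the more portable of the two.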

Does session persistence work with headless Chrome?

Yes. The user data directory and cookie files work the same way in headless mode, and the stored cookies carry the same trust signals.

How many profiles should I maintain?

For rotating use, maintain 5–10 profiles per target site. Rotate usage to avoid rate limiting on any single profile.
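
When rotating a pool, it helps to see which profiles keep hitting CAPTCHAs. The sketch below reads metadata in the shape SessionManager writes to profiles.json (`use_count`, `captcha_solves`). `captcha_rate` and `flag_noisy_profiles` are hypothetical helpers, and the threshold and minimum-use values are illustrative, not recommendations.

```python
def captcha_rate(meta):
    """Fraction of recorded uses that needed a CAPTCHA solve (0.0 if unused)."""
    uses = meta.get("use_count", 0)
    if uses == 0:
        return 0.0
    return meta.get("captcha_solves", 0) / uses


def flag_noisy_profiles(profiles, threshold=0.5, min_uses=5):
    """Return profile names whose CAPTCHA rate exceeds `threshold`.

    Profiles with fewer than `min_uses` runs are ignored because a rate
    computed from one or two runs says very little.
    """
    return sorted(
        name
        for name, meta in profiles.items()
        if meta.get("use_count", 0) >= min_uses and captcha_rate(meta) > threshold
    )


# Made-up metadata for illustration
profiles = {
    "worker-0": {"use_count": 20, "captcha_solves": 2},
    "worker-1": {"use_count": 10, "captcha_solves": 8},
    "worker-2": {"use_count": 1, "captcha_solves": 1},
}
flagged = flag_noisy_profiles(profiles)
print(flagged)
```

A flagged profile is a candidate for a new proxy or for retirement, which ties in with the troubleshooting advice about rising CAPTCHA rates.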

Does CaptchaAI benefit from session persistence?

Indirectly. Session persistence lowers CAPTCHA frequency, which reduces the number of CaptchaAI calls you need and so saves cost. When a CAPTCHA does appear, CaptchaAI solves it as usual.


Related Guides

  • Browser Profile Isolation + CaptchaAI Integration
  • Puppeteer + CaptchaAI for QA

Build persistent browser sessions that reduce CAPTCHA challenges, and get your CaptchaAI key for the challenges that still appear.