E-commerce sites protect product pages with CAPTCHAs to prevent automated price scraping. CaptchaAI lets you build a reliable price-monitoring system that handles these challenges automatically.
Platforms That Use CAPTCHAs
| Platform | CAPTCHA type | Trigger |
|---|---|---|
| Amazon | Image CAPTCHA, reCAPTCHA | High request volume |
| Walmart | Cloudflare Turnstile | Bot detection |
| eBay | reCAPTCHA v2 | Suspicious patterns |
| Best Buy | Cloudflare Challenge | All automated traffic |
| Shopify stores | reCAPTCHA v3 | Varies by store configuration |
Without CAPTCHA handling, your monitoring pipeline fails silently: a CAPTCHA page parses as "no price found", leaving gaps in your price data.
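One way to turn silent failures into loud ones is to check every fetched page for a leftover CAPTCHA widget before parsing it. The sketch below assumes the widgets can be recognized by the `g-recaptcha` and `cf-turnstile` class names (the same markers the Turnstile regex and the Node.js example in this guide look for). `detect_captcha` and `check_page` are hypothetical helper names.

```python
import re

# Marker patterns are assumptions based on common widget class names;
# real sites may render challenges differently.
CAPTCHA_MARKERS = {
    "recaptcha": re.compile(r'class="[^"]*\bg-recaptcha\b'),
    "turnstile": re.compile(r'class="[^"]*\bcf-turnstile\b'),
}


def detect_captcha(html):
    """Return the name of the first CAPTCHA widget found in html, or None."""
    for name, pattern in CAPTCHA_MARKERS.items():
        if pattern.search(html):
            return name
    return None


def check_page(url, html):
    """Raise instead of returning silently when a page is still a CAPTCHA wall."""
    kind = detect_captcha(html)
    if kind is not None:
        raise RuntimeError(f"{url}: page still shows a {kind} challenge")
    return html
```

Calling `check_page` right before price extraction makes a failed solve show up as an error in your results instead of a missing price.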
Architecture
```
Scheduler (every 30 min)
  → URL Queue
    → Scraper Workers (5-10 concurrent)
      → Fetch page
        → CAPTCHA detected?
          → Yes → CaptchaAI → Solve → Retry page
          → No  → Parse prices
                  → Store in database
                  → Alert on price changes
```
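The "Scraper Workers" stage can be sketched with Python's `concurrent.futures`. This is an illustration under stated assumptions, not CaptchaAI code: `run_workers` and the `fetch_and_parse` callback are hypothetical names, and the default pool size of 5 is the low end of the 5-10 range in the diagram. Threads suit this workload because each worker spends most of its time waiting on HTTP responses and CAPTCHA polling.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_workers(urls, fetch_and_parse, max_workers=5):
    """Process a queue of URLs with a fixed pool of concurrent workers.

    fetch_and_parse is a caller-supplied (hypothetical) function that fetches
    one URL, solves any CAPTCHA, and returns the parsed price.
    Returns a dict mapping each URL to its result or to the raised exception.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_and_parse, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

A `requests.Session` is not guaranteed to be thread-safe, so in this design each call to `fetch_and_parse` should create or reuse its own session rather than share one across workers.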
Implementation
Price Monitor (Python)
```python
import json
import os
import re
import time
from datetime import datetime, timezone

import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]
BASE_URL = "https://ocr.captchaai.com"


def solve_captcha(method, params):
    """Submit a CAPTCHA to CaptchaAI and poll until a solution is ready."""
    params = {**params, "key": API_KEY, "method": method}
    resp = requests.get(f"{BASE_URL}/in.php", params=params, timeout=30)
    if not resp.text.startswith("OK|"):
        raise Exception(f"Submit failed: {resp.text}")
    task_id = resp.text.split("|")[1]

    # Poll every 5 seconds, for at most 60 attempts (about 5 minutes).
    for _ in range(60):
        time.sleep(5)
        result = requests.get(f"{BASE_URL}/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id,
        }, timeout=30)
        # The API spells this status "CAPCHA_NOT_READY"; keep it as-is.
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise Exception(f"Solve failed: {result.text}")
    raise TimeoutError("CAPTCHA solve timed out")


def fetch_with_captcha(url, session):
    """Fetch a page, solving a Turnstile or reCAPTCHA challenge if one appears."""
    resp = session.get(url, timeout=30)

    # Check for Cloudflare Turnstile first; its widget also has a data-sitekey.
    match = re.search(
        r'class="cf-turnstile"[^>]*data-sitekey=["\']([^"\']+)', resp.text
    )
    if match:
        token = solve_captcha("turnstile", {
            "sitekey": match.group(1),
            "pageurl": url,
        })
        # How the token must be submitted varies by site; posting it back
        # to the page URL is a simplification.
        return session.post(url, data={"cf-turnstile-response": token}, timeout=30)

    # Check for a reCAPTCHA v2 widget
    match = re.search(
        r'class="g-recaptcha"[^>]*data-sitekey=["\']([^"\']+)', resp.text
    )
    if match:
        token = solve_captcha("userrecaptcha", {
            "googlekey": match.group(1),
            "pageurl": url,
        })
        return session.post(url, data={"g-recaptcha-response": token}, timeout=30)

    return resp


def extract_price(html, selectors):
    """Extract a price from HTML using the first regex pattern that matches."""
    for pattern in selectors:
        match = re.search(pattern, html)
        if match:
            # Assumes US-style formatting such as "1,299.00".
            price_str = match.group(1).replace(",", "")
            return float(price_str)
    return None


def monitor_prices(products):
    """Check prices for a list of products and return one result per product."""
    session = requests.Session()
    session.headers["User-Agent"] = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
    results = []
    for product in products:
        timestamp = datetime.now(timezone.utc).isoformat()
        try:
            resp = fetch_with_captcha(product["url"], session)
            price = extract_price(resp.text, product["selectors"])
            results.append({
                "name": product["name"],
                "url": product["url"],
                "price": price,
                "timestamp": timestamp,
                # A page with no matching price is a failure, not a success.
                "status": "ok" if price is not None else "error: price not found",
            })
            print(f"  {product['name']}: ${price}")
        except Exception as e:
            results.append({
                "name": product["name"],
                "url": product["url"],
                "price": None,
                "timestamp": timestamp,
                "status": f"error: {e}",
            })
            print(f"  {product['name']}: ERROR - {e}")
    return results


# Define products to monitor
products = [
    {
        "name": "Wireless Headphones",
        "url": "https://example.com/product/headphones",
        "selectors": [
            r'class="price"[^>]*>\$?([\d,]+\.?\d*)',
            r'itemprop="price" content="([\d.]+)"',
        ],
    },
    {
        "name": "Bluetooth Speaker",
        "url": "https://example.com/product/speaker",
        "selectors": [
            r'class="price"[^>]*>\$?([\d,]+\.?\d*)',
        ],
    },
]

if __name__ == "__main__":
    print("Starting price check...")
    results = monitor_prices(products)

    # Save results
    with open("prices.json", "w") as f:
        json.dump(results, f, indent=2)
```
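The architecture stores results in a database, while the script above writes a JSON file. Below is a minimal sketch of the database step using Python's built-in `sqlite3` module. The `prices` table schema, the `prices.db` filename, and the helper names `save_results` and `last_price` are assumptions, chosen to match the result dictionaries that `monitor_prices` builds.

```python
import sqlite3


def save_results(results, db_path="prices.db"):
    """Append price-check results to a SQLite table (schema is hypothetical)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS prices (
                       name TEXT, url TEXT, price REAL,
                       timestamp TEXT, status TEXT)"""
            )
            conn.executemany(
                "INSERT INTO prices VALUES (:name, :url, :price, :timestamp, :status)",
                results,
            )
    finally:
        conn.close()


def last_price(url, db_path="prices.db"):
    """Return the most recent successful price stored for url, or None."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT price FROM prices WHERE url = ? AND status = 'ok' "
            "ORDER BY timestamp DESC LIMIT 1",
            (url,),
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None
```

Ordering by `timestamp` works here because ISO 8601 strings in one consistent format sort chronologically as plain text.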
Node.js Implementation
```javascript
const axios = require("axios");
const cheerio = require("cheerio");

const API_KEY = process.env.CAPTCHAAI_API_KEY;

async function solveCaptcha(method, params) {
  const submit = await axios.get("https://ocr.captchaai.com/in.php", {
    params: { ...params, key: API_KEY, method },
  });
  const submitText = String(submit.data);
  if (!submitText.startsWith("OK|")) {
    throw new Error(`Submit failed: ${submitText}`);
  }
  const taskId = submitText.split("|")[1];

  // Poll every 5 seconds, for at most 60 attempts (about 5 minutes).
  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    const poll = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: taskId },
    });
    const text = String(poll.data);
    if (text === "CAPCHA_NOT_READY") continue;
    if (text.startsWith("OK|")) return text.split("|").slice(1).join("|");
    throw new Error(`Solve failed: ${text}`);
  }
  throw new Error("CAPTCHA solve timed out");
}

function extractPrice($) {
  const raw =
    $('[itemprop="price"]').attr("content") || $(".price").first().text();
  const price = parseFloat(raw.replace(/[^0-9.]/g, ""));
  return Number.isNaN(price) ? null : price;
}

async function monitorPrice(url) {
  const resp = await axios.get(url);
  let $ = cheerio.load(resp.data);

  // Check for reCAPTCHA
  const siteKey = $(".g-recaptcha").attr("data-sitekey");
  if (siteKey) {
    const token = await solveCaptcha("userrecaptcha", {
      googlekey: siteKey,
      pageurl: url,
    });
    // Re-submit as a form post with the token (the exact endpoint varies by site)
    const formResp = await axios.post(
      url,
      new URLSearchParams({ "g-recaptcha-response": token })
    );
    $ = cheerio.load(formResp.data);
  }

  // Always return a number (or null), whether or not a CAPTCHA was solved
  return extractPrice($);
}
```
Scheduling
Run the check every 30 minutes with cron:
```
# crontab -e
*/30 * * * * cd /opt/monitor && python price_monitor.py >> /var/log/prices.log 2>&1
```
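Or use the Python schedule library (`pip install schedule`):
```python
import time
import schedule

# Assumes the script above is saved as price_monitor.py (as in the cron example)
from price_monitor import monitor_prices, products

schedule.every(30).minutes.do(monitor_prices, products)
while True:
    schedule.run_pending()
    time.sleep(60)
```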
Cost Estimate
| Volume | CAPTCHAs/day | Est. daily cost |
|---|---|---|
| 50 products, every 30 minutes | ~2,400 | ~$2-5 |
| 200 products, every 15 minutes | ~19,200 | ~$15-30 |
| 1,000 products, every hour | ~24,000 | ~$20-40 |
These figures assume every page load triggers a CAPTCHA. In practice not all of them do, so actual costs may be 50-70% lower.
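The CAPTCHAs/day column is simply products × checks per day (for example, 50 × 48 = 2,400), assuming one CAPTCHA per page load. A small calculator lets you plug in your own numbers. `captcha_rate` and `price_per_1000` are hypothetical inputs; the per-1,000 price should come from your own CaptchaAI plan, not from this sketch.

```python
def daily_captchas(products, interval_minutes, captcha_rate=1.0):
    """Expected CAPTCHAs per day.

    captcha_rate is the fraction of page loads that hit a CAPTCHA;
    1.0 reproduces the table above, and real rates are usually lower.
    """
    checks_per_day = (24 * 60) // interval_minutes
    return products * checks_per_day * captcha_rate


def daily_cost(captchas, price_per_1000):
    """price_per_1000 is your plan's rate, supplied by you."""
    return captchas / 1000 * price_per_1000
```

For example, `daily_captchas(50, 30)` returns 2400.0, matching the first row. A `captcha_rate` between 0.3 and 0.5 models the 50-70% reduction described above.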
Frequently Asked Questions
How do I detect price changes?
Compare the current price with the stored value. Alerting only on changes larger than 5% filters out noise from small fluctuations.
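One way to implement that comparison is sketched below. It assumes the previous prices are available as a `url -> price` dictionary, for example loaded from the last run's `prices.json`. The 5% default mirrors the threshold suggested above, and `price_change_alert` and `find_alerts` are hypothetical names.

```python
def price_change_alert(old_price, new_price, threshold=0.05):
    """Return the relative change if it exceeds threshold, else None."""
    if old_price is None or new_price is None or old_price == 0:
        return None  # nothing meaningful to compare against
    change = (new_price - old_price) / old_price
    return change if abs(change) > threshold else None


def find_alerts(previous, results, threshold=0.05):
    """previous maps url -> last stored price; results come from monitor_prices."""
    alerts = []
    for r in results:
        change = price_change_alert(previous.get(r["url"]), r["price"], threshold)
        if change is not None:
            alerts.append((r["name"], previous[r["url"]], r["price"], change))
    return alerts
```

Using a relative threshold rather than an absolute one means the same rule works for a $5 accessory and a $1,500 laptop.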
Will I get blocked even after the CAPTCHA is solved?
It can happen. Rotate proxies and User-Agent strings to minimize blocking, and spread requests out over time instead of fetching pages back to back.
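Here is a minimal sketch of both ideas: pick a random User-Agent and proxy for each request, and sleep for a randomized interval between requests. The proxy list, the User-Agent strings, and the 20-second base delay are placeholder assumptions; `next_request_settings` and `polite_sleep` are hypothetical helpers. The returned dict uses the `headers` and `proxies` keyword arguments that `requests.get` accepts.

```python
import random
import time

# Illustrative User-Agent pool; replace with strings that match your setup.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def next_request_settings(proxies, rng=random):
    """Pick a random User-Agent and proxy URL (proxies are supplied by you)."""
    proxy = rng.choice(proxies)
    return {
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
    }


def polite_sleep(base_seconds=20.0, jitter=0.5, rng=random, sleep=time.sleep):
    """Sleep base_seconds, scaled by a random factor in [1 - jitter, 1 + jitter]."""
    delay = base_seconds * rng.uniform(1 - jitter, 1 + jitter)
    sleep(delay)
    return delay
```

Usage might look like `requests.get(url, timeout=30, **next_request_settings(my_proxies))`, followed by `polite_sleep()` before the next product. Randomized delays avoid the fixed-interval request pattern that is easy to flag as automated.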
Can I monitor prices in multiple currencies?
Yes. Parse the currency symbol along with the price. CaptchaAI works globally, regardless of where the target site is located.
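Currencies also differ in number formatting. The `extract_price` function above strips commas, which assumes US-style "1,299.00" and misreads European-style "1.299,00". The sketch below parses the symbol and amount together. The symbol table is a small illustrative subset, and the separator rule is a heuristic assumption: the last `.` or `,` is treated as the decimal point unless exactly three digits follow it, which treats an amount like "$1.234" as 1234.

```python
import re

# Small illustrative symbol map, not an exhaustive list.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP", "¥": "JPY", "Rp": "IDR"}

PRICE_RE = re.compile(r"(Rp|[$€£¥])\s?(\d[\d.,]*)")


def parse_price(text):
    """Return (currency_code, amount) from text like '€1.299,00', or None."""
    match = PRICE_RE.search(text)
    if not match:
        return None
    symbol, number = match.groups()
    number = number.rstrip(".,")  # drop trailing punctuation such as "$19.99."
    last_sep = max(number.rfind(","), number.rfind("."))
    if last_sep != -1 and len(number) - last_sep - 1 != 3:
        # Last separator is the decimal point; the others group thousands.
        integer = re.sub(r"[.,]", "", number[:last_sep])
        amount = float(f"{integer}.{number[last_sep + 1:]}")
    else:
        # No decimal part: every separator groups thousands.
        amount = float(re.sub(r"[.,]", "", number))
    return CURRENCY_SYMBOLS[symbol], amount
```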
Related Guides
- Handling CAPTCHAs in Web Scraping
- Market Research Data Collection