Trading Systems

Inside Our $10K Paper Trading Bot: 30 Days of Real Data

Twenty-five trades. Multiple strategies. LLM-augmented decisions. Here's exactly what happened when we ran an autonomous trading bot on crypto and Polymarket — the wins, the losses, and what we learned.

We said we'd build in public. Not the curated kind — the real kind. So here it is: a complete look at our trading bot's performance over its first 30 days of operation, with every decision logged and every outcome tracked.

No hype. No cherry-picked screenshots. Just data.

The Numbers at a Glance

Total Trades: 25
Strategies: 9
Starting Capital: $10K
Mode: Paper

What the Bot Actually Did

Over 30 days, the bot executed 25 closed trades across two markets: cryptocurrency spot trading (BTC/USDT, ETH/USDT, SOL/USDT) and Polymarket prediction contracts. Every single decision went through the same pipeline:

  1. Market Data Collection — Price, volume, order book depth, funding rates, open interest, derivatives data (put/call ratios, implied volatility), and on-chain metrics for crypto. Polymarket odds, volume, and resolution timelines for prediction markets.
  2. Strategy Scoring — Nine strategies evaluated in parallel: mean reversion, RSI swing trading, MACD crossovers, Bollinger Bounce, volume breakouts, DCA laddering, minute scalping, agentic LLM analysis, and Polymarket market making.
  3. LLM Reasoning — Claude Opus (primary) or Qwen3.5-plus (fallback) analyzes the market regime, evaluates confluence between strategies, and produces a decision with a confidence score.
  4. Critic Review — A separate critic model checks for contradictions, excessive risk, consecutive losses, and drawdown thresholds. Hard rules can veto trades automatically.
  5. Execution or Queue — High-confidence actions (85%+) execute immediately. Lower-confidence actions queue for human review via Telegram.
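Steps 2 through 5 can be sketched as a single function. This is a minimal illustration, not the bot's actual code: the `Decision` type, the `run_pipeline` name, and the callable signatures are all assumptions, and the strategies are scored one after another here rather than in parallel. Only the 85% auto-execute threshold comes from the article.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    action: str        # "buy", "sell", "close", or "hold"
    confidence: float  # 0.0 to 1.0


AUTO_EXECUTE_CONFIDENCE = 0.85  # from the article: 85%+ executes immediately


def run_pipeline(
    snapshot: dict,
    strategies: dict[str, Callable[[dict], float]],
    reason: Callable[[dict, dict[str, float]], Decision],
    critic_vetoes: Callable[[dict, Decision], bool],
) -> str:
    """Run one decision cycle; returns 'hold', 'vetoed', 'execute', or 'queue'."""
    # Step 2: score every strategy against the same market snapshot.
    scores = {name: score(snapshot) for name, score in strategies.items()}
    # Step 3: the reasoning model turns snapshot + scores into a decision.
    decision = reason(snapshot, scores)
    if decision.action == "hold":
        return "hold"
    # Step 4: the critic (including its hard rules) can veto outright.
    if critic_vetoes(snapshot, decision):
        return "vetoed"
    # Step 5: execute immediately, or queue for human review.
    if decision.confidence >= AUTO_EXECUTE_CONFIDENCE:
        return "execute"
    return "queue"
```

Passing the reasoning model and the critic in as plain callables keeps the sketch testable with stubs, without any network access.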

The result: 24 close actions, 6 buy orders, 1 sell order, and 1,550+ hold decisions where the bot correctly chose to do nothing.

What Worked

1. Mean Reversion in Ranging Markets

When the market chops sideways, mean reversion prints. The bot learned to identify ranging regimes using ADX, Bollinger Band width, and price distance from EMA50. In these conditions, it successfully faded extremes:

  • ETH/USDT long at -0.9% — Closed at +1.3% profit when price reverted to the mean. Reasoning: extreme fear reading (0.64) provided contrarian entry edge.
  • ETH/USDT long at +2.1% — Closed for $3.23 profit. Put/call ratio of 0.845 signaled hedging activity — exit before reversion risk materialized.

The key insight: regime matters more than strategy. A strategy that works beautifully in a ranging market gets crushed in a breakout. We built regime awareness into the core decision loop.
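A rough sketch of how a ranging regime could be detected from the three inputs named above. The cutoffs (`adx_range_max`, `bb_width_max`, `ema_distance_max`) and the function names are illustrative assumptions, not the bot's tuned values. The ADX > 25 trend cutoff mirrors the downtrend rule quoted later in this post.

```python
from statistics import mean, pstdev


def bollinger_width(closes: list[float], period: int = 20, k: float = 2.0) -> float:
    """Relative Bollinger Band width: (upper - lower) / middle."""
    window = closes[-period:]
    if len(window) < period:
        raise ValueError("not enough closes for the chosen period")
    middle = mean(window)
    # upper - lower = 2 * k * standard deviation of the window
    return 2 * k * pstdev(window) / middle


def classify_regime(
    adx: float,
    bb_width: float,
    price: float,
    ema50: float,
    adx_range_max: float = 20.0,     # assumed cutoff for "no clear trend"
    bb_width_max: float = 0.06,      # assumed cutoff for "narrow bands"
    ema_distance_max: float = 0.02,  # assumed cutoff: within 2% of EMA50
) -> str:
    """Return 'ranging', 'trending', or 'uncertain'."""
    ema_distance = abs(price - ema50) / ema50
    if adx < adx_range_max and bb_width <= bb_width_max and ema_distance <= ema_distance_max:
        return "ranging"
    if adx > 25:
        return "trending"
    return "uncertain"
```

In a setup like this, the mean reversion strategy would only be allowed to fade extremes when `classify_regime` returns "ranging".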

2. LLM Critic Prevents Dumb Mistakes

The critic model isn't just a checkbox. It actively prevented bad trades:

  • Consecutive loss veto — After 4 consecutive losses, the hard rule kicked in and vetoed further buys until human review.
  • Drawdown circuit breaker — When max drawdown reached 5%, past the 4% threshold, the critic automatically vetoed new positions.
  • Regime mismatch detection — When the LLM suggested buys in a confirmed downtrend (price below EMA50, ADX > 25, DI- > DI+), the probabilistic veto flagged it with 90% confidence.

The critic logged 126,915 signals over 30 days. Most were "monitor" recommendations — the bot correctly choosing to watch rather than act.
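The rules above are simple enough to express directly. The sketch below is an assumption about how they might look in code: the dataclasses and function names are hypothetical, and "new positions" is modeled as the "buy" action. The regime mismatch check is kept separate because the article describes it as a probabilistic flag rather than a hard rule.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_LOSSES = 4  # from the article
MAX_DRAWDOWN = 0.04         # from the article: 4% threshold


@dataclass
class AccountState:
    consecutive_losses: int
    drawdown: float  # fraction of peak equity, e.g. 0.05 for 5%


@dataclass
class TrendState:
    price: float
    ema50: float
    adx: float
    di_plus: float
    di_minus: float


def hard_rule_vetoes(action: str, account: AccountState) -> list[str]:
    """Return the hard rules that veto this action (empty list means allowed)."""
    if action != "buy":
        return []  # in this sketch, hard rules only block new buys
    reasons = []
    if account.consecutive_losses >= MAX_CONSECUTIVE_LOSSES:
        reasons.append("consecutive_losses")
    if account.drawdown >= MAX_DRAWDOWN:
        reasons.append("max_drawdown")
    return reasons


def regime_mismatch(action: str, trend: TrendState) -> bool:
    """Flag a buy in a confirmed downtrend: price < EMA50, ADX > 25, DI- > DI+."""
    downtrend = (
        trend.price < trend.ema50
        and trend.adx > 25
        and trend.di_minus > trend.di_plus
    )
    return action == "buy" and downtrend
```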

3. Confidence Thresholds Work

We set auto-execute at 85% confidence (0.85 on the 0 to 1 scale used below). Everything else queues for human review. This prevented several potential drawdowns:

  • Trades with confidence < 0.65 were automatically downgraded to "hold" or "queue" — 1,550+ instances where the bot correctly did nothing.
  • When confidence dropped below 0.45, the fallback mechanism kicked in — defaulting to hold rather than forcing a low-conviction trade.
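The three cutoffs can be read as a small routing function. The article does not say what decides between "hold" and "queue" below 0.65, so this sketch assumes "hold" for that band. The `route` function and its reason labels are hypothetical names.

```python
AUTO_EXECUTE = 0.85     # at or above: execute immediately
DOWNGRADE_BELOW = 0.65  # below: downgraded (this sketch assumes "hold")
FALLBACK_BELOW = 0.45   # below: fallback mechanism forces "hold"


def route(action: str, confidence: float) -> tuple[str, str]:
    """Map a proposed action to (destination, reason)."""
    if action == "hold":
        return ("hold", "model_chose_hold")
    if confidence < FALLBACK_BELOW:
        return ("hold", "fallback")
    if confidence < DOWNGRADE_BELOW:
        return ("hold", "downgraded")
    if confidence < AUTO_EXECUTE:
        return ("queue", "human_review")
    return ("execute", "high_confidence")
```

Returning a reason alongside the destination makes it possible to count how many holds came from the fallback rather than from the model's own choice.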

What Broke (and What We Learned)

1. LLM Hallucination Risk

Early in the test, we saw the LLM make confident but wrong calls. Example: suggesting buys when derivatives data showed bearish hedging flow (put/call ratio > 0.85). The fix:

  • Sanitized prompts — We strip out any reasoning that could be manipulated by injected content.
  • Contradiction checks — The LLM must explicitly confirm its reasoning doesn't conflict with the data.
  • Critic disagreement escalation — When the critic disagrees twice in a row, the trade escalates to human review.

2. Provider Outages

We logged 427 "error" actions — mostly provider unreachability. When Claude Opus timed out, the fallback to Qwen3.5-plus via DashScope kicked in. When both failed, the bot defaulted to hold.

Lesson: never depend on a single LLM provider. We now have three tiers: Claude Opus (primary), Qwen3.5-plus (secondary), and local Ollama models (emergency fallback).
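A tiered fallback like this can be sketched with `asyncio.wait_for`. The provider callables and the `decide_with_fallback` name are hypothetical stand-ins for real API clients, and the 30-second timeout is an assumed value. The final "hold" default matches the behavior described above.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# A provider takes a prompt and returns the model's reply (hypothetical signature).
Provider = Callable[[str], Awaitable[str]]


async def decide_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Provider]],
    timeout_s: float = 30.0,  # assumed value, not from the article
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply).

    If every tier times out or is unreachable, default to "hold".
    """
    for name, call in providers:
        try:
            reply = await asyncio.wait_for(call(prompt), timeout=timeout_s)
            return name, reply
        except (asyncio.TimeoutError, OSError):
            # Timeout or connection failure: move on to the next tier.
            continue
    return "none", "hold"
```

The order of the `providers` sequence is the tier order, so adding a fourth tier is just appending another `(name, callable)` pair.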

3. Overtrading in the First Week

The bot initially wanted to trade too frequently. We added:

  • Cluster detection — If 3+ actions occur within 15 minutes, the bot pauses and requires review.
  • Same-symbol cooldown — 45-minute minimum between trades on the same pair.
  • Daily budget cap — $1.50 maximum in simulated trading fees per day.
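The three guards can live in one small throttle object. This is a sketch under assumptions: the `TradeThrottle` class and its method names are hypothetical, timestamps are plain seconds, the third action in a cluster is allowed to complete before the pause takes effect, and resetting the fee counter at the day boundary is left to the caller.

```python
from collections import deque


class TradeThrottle:
    def __init__(self, cluster_size=3, cluster_window_s=15 * 60,
                 cooldown_s=45 * 60, daily_fee_cap=1.50):
        self.cluster_size = cluster_size
        self.cluster_window_s = cluster_window_s
        self.cooldown_s = cooldown_s
        self.daily_fee_cap = daily_fee_cap
        self.paused = False          # set once a cluster is detected
        self._recent = deque()       # timestamps of recent actions
        self._last_by_symbol = {}    # symbol -> timestamp of last trade
        self._fees_today = 0.0

    def check(self, symbol: str, now: float, est_fee: float) -> str:
        """Return 'ok', or the name of the guard that blocks this trade."""
        if self.paused:
            return "paused_for_review"
        last = self._last_by_symbol.get(symbol)
        if last is not None and now - last < self.cooldown_s:
            return "symbol_cooldown"
        if self._fees_today + est_fee > self.daily_fee_cap:
            return "daily_fee_cap"
        return "ok"

    def record(self, symbol: str, now: float, fee: float) -> None:
        """Record an executed action and pause if it completes a cluster."""
        self._last_by_symbol[symbol] = now
        self._fees_today += fee
        self._recent.append(now)
        # Drop timestamps that have fallen out of the cluster window.
        while self._recent and now - self._recent[0] > self.cluster_window_s:
            self._recent.popleft()
        if len(self._recent) >= self.cluster_size:
            self.paused = True

    def resume_after_review(self) -> None:
        """Called once a human has reviewed the cluster."""
        self.paused = False
        self._recent.clear()
```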

The Tech Stack (What Actually Runs It)

Here's what keeps the bot running 24/7:

  • Python + asyncio — 15-minute polling loop (900 seconds), with fast mode at 300 seconds during high volatility.
  • LLM Stack — Claude Opus via API (primary), Qwen3.5-plus via DashScope (secondary), Qwen2:7b and Gemma3:4b via Ollama (local fallback).
  • Data Sources — 12+ APIs: crypto exchanges, Polymarket, Deribit for options flow, Fear & Greed Index, LI.FI for cross-chain arbitrage scans.
  • Infrastructure — Runs on WSL (Windows Subsystem for Linux) on a laptop. No AWS. No Kubernetes. Systemd user service with automatic restart on crash.
  • Monitoring — Telegram bot for alerts, Next.js dashboard at hub.tacavar.com for real-time visibility.
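The polling loop from the first bullet might look roughly like this. The function names and the injectable `sleep` and `max_cycles` parameters are assumptions added so the loop can be exercised without waiting; only the 900 and 300 second intervals come from the list above.

```python
import asyncio

NORMAL_INTERVAL_S = 900  # 15-minute polling loop
FAST_INTERVAL_S = 300    # fast mode during high volatility


def next_interval(high_volatility: bool) -> int:
    return FAST_INTERVAL_S if high_volatility else NORMAL_INTERVAL_S


async def poll_forever(run_cycle, is_high_volatility,
                       sleep=asyncio.sleep, max_cycles=None):
    """Run decision cycles until max_cycles is reached (None means forever)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            await run_cycle()
        except Exception as exc:
            # One failed cycle should not kill the loop; log and keep polling.
            print(f"cycle failed: {exc!r}")
        await sleep(next_interval(is_high_volatility()))
        cycles += 1
```

With a loop like this, an in-process failure costs one cycle, while the systemd restart covers the case where the whole process dies.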

Total infrastructure cost: $0/month (existing hardware). The bot runs on the same machine Josh uses for development.

Why Paper Trading Matters

Every trade in this report is simulated. The bot operates with dry_run = true — a flag hardcoded at multiple levels:

  • Config file: dry_run is true
  • Execution layer: refuses to send real orders if dry_run is false
  • Prompt sanitization: removes any instructions that could override the paper trading constraint
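One reading of the execution-layer guard: even if the config flag were flipped, the execution layer itself re-checks `dry_run` and refuses to proceed. The class and exception names below are hypothetical, and the sketch has no live order path at all.

```python
class PaperTradingViolation(RuntimeError):
    """Raised when something tries to place an order outside paper mode."""


class ExecutionLayer:
    def __init__(self, dry_run: bool = True):
        self.dry_run = dry_run
        self.simulated_fills: list[dict] = []

    def submit(self, symbol: str, side: str, qty: float) -> dict:
        # Independent check: do not trust that the config layer got it right.
        if not self.dry_run:
            raise PaperTradingViolation("dry_run is false; refusing to send real orders")
        fill = {"symbol": symbol, "side": side, "qty": qty, "simulated": True}
        self.simulated_fills.append(fill)
        return fill
```

Checking the flag in more than one place means a single misconfiguration is not enough to reach a real exchange.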

We believe in earning the right to trade with real capital. Before we flip the switch, we need to see:

  • Consistent profitability over 6-12 months
  • Max drawdown < 15%
  • Win rate > 45% with positive risk/reward
  • No catastrophic failures (circuit breakers must hold)
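Two of these criteria are easy to compute from an equity curve and a list of per-trade PnL values. The sketch below interprets "positive risk/reward" as positive average PnL per trade, which is an assumption. It does not cover the 6-12 month track record or the circuit breaker criterion, and the function names are hypothetical.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def win_rate(pnls: list[float]) -> float:
    """Share of closed trades with positive PnL."""
    if not pnls:
        raise ValueError("no closed trades")
    return sum(1 for p in pnls if p > 0) / len(pnls)


def meets_numeric_criteria(equity: list[float], pnls: list[float]) -> bool:
    """Max drawdown < 15%, win rate > 45%, positive average PnL per trade."""
    expectancy = sum(pnls) / len(pnls) if pnls else 0.0
    return (
        max_drawdown(equity) < 0.15
        and win_rate(pnls) > 0.45
        and expectancy > 0
    )
```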

We're not there yet. But we're learning. And we're not risking anyone's capital while we figure it out.

What's Next

The roadmap for the next 30 days:

  • Auto-tuning — Strategies that underperform get automatically deprioritized. We're building a decay mechanism that reduces position sizing for weak strategies.
  • Polymarket Market Making — We've enabled MM logic with spread targeting, inventory management, and toxic flow detection. Testing in paper mode first.
  • Copycat Strategies — Tracking top Polymarket traders and mimicking their positions (with LLM validation).
  • Better Visualizations — The dashboard needs equity curves, win/loss heatmaps, and strategy attribution.

The Point of Building in Public

Transparency builds trust. In an industry full of black boxes and unverifiable claims, we're showing exactly how our system works.

We're also holding ourselves accountable. When you build in public, you can't hide failures. Every loss is logged. Every decision is recorded. This discipline makes us better traders.

And honestly? We hope this inspires others. The trading bot space doesn't need more hype. It needs more transparency, more open source collaboration, and more honest conversations about what works and what doesn't.

We'll keep sharing updates. The decisions log is public. The insights are logged. When we win, you'll see it. When we lose, you'll see that too.

That's the point. Building in the open, one trade at a time.


Where We Trade

Every simulated trade in this report ran through Kraken's API: regulated, reliable, and the exchange we trust with our bot's execution. Our derivatives strategies run through Bybit.

Want to see the data yourself?

The trading bot is one of Tacavar's four verticals. Visit the live dashboard or learn more about what we're building.

Affiliate disclosure: This article contains affiliate links to Kraken and Bybit. If you sign up through our links, we may earn a commission at no additional cost to you. We only link to exchanges we actively use.