Use arXiv to Research the Science Behind Competitor Products with Claude Code
When a competitor launches a feature that seems technically impressive (better search results, faster inference, more accurate recommendations), there’s often a published paper behind it. Companies publish research to attract talent, and that research can reveal a great deal about how their systems work. AgentPatch’s arXiv Search tool lets Claude Code find those papers so you can understand what you’re competing against.
Why This Matters
Competitive research in tech often stops at the product level: what features they shipped, what their pricing looks like, what users say in reviews. But for engineering teams, the more useful question is how they built it. If a competitor’s search relevance just got noticeably better, knowing they adopted a specific retrieval architecture tells you something actionable.
arXiv is where much of this information lives. Most major AI labs and many tech companies publish their methods there. The challenge is finding the right papers among the millions on arXiv. Claude Code with arXiv Search can do that lookup for you as part of a broader research session.
Setup
The AgentPatch CLI is designed for AI agents to use via shell access. Install it, and your agent can discover and invoke any tool on the marketplace.
Install (zero dependencies, Python 3.10+):
pip install agentpatch
Set your API key:
export AGENTPATCH_API_KEY=your_api_key
Example commands your agent will use:
ap search "web search"
ap run google-search --input '{"query": "test"}'
Get your API key from the AgentPatch dashboard.
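If you want to script the same CLI calls outside of an agent session, a minimal Python sketch might look like the following. It only uses the `ap run <tool> --input <json>` form shown above; the helper names `build_ap_run` and `run_tool` are hypothetical, and the shape of the CLI's output is not specified here, so the sketch returns it as raw text.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_ap_run(tool: str, payload: Dict[str, Any]) -> List[str]:
    """Build the argv list for `ap run <tool> --input '<json>'`.

    Passing a list (rather than a shell string) avoids quoting problems
    when the JSON payload contains spaces or quotes.
    """
    return ["ap", "run", tool, "--input", json.dumps(payload)]


def run_tool(tool: str, payload: Dict[str, Any]) -> str:
    """Invoke the CLI and return its stdout as raw text.

    Assumes `ap` is on PATH and AGENTPATCH_API_KEY is set in the
    environment; the output format is not assumed.
    """
    result = subprocess.run(
        build_ap_run(tool, payload),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    # Mirrors the example command above without actually running it.
    print(build_ap_run("google-search", {"query": "test"}))
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.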
Skill (Recommended)
Install the AgentPatch skill — it teaches Claude Code when to use AgentPatch and how to use the CLI:
/plugin marketplace add fullthom/agentpatch-claude-skill
/plugin install agentpatch@agentpatch
MCP Server (Alternative)
If you prefer raw MCP tool access instead of the skill:
claude mcp add -s user --transport http agentpatch https://agentpatch.ai/mcp \
--header "Authorization: Bearer YOUR_API_KEY"
Replace YOUR_API_KEY with your actual key from the AgentPatch dashboard.
Example
Your team noticed that a competitor’s code completion tool got significantly faster without losing quality. You want to understand what changed:
“Search arXiv for papers published by [competitor] on code completion or code generation in the last 6 months. Try cs.SE and cs.CL categories.”
Claude Code runs the search:
Searching arXiv for "[competitor] code completion" in cs.SE, sorted by
submittedDate descending, limit 20...
Also searching in cs.CL...
Found 4 papers from authors affiliated with [competitor]:
1. "Efficient Code Generation via Cascaded Model Routing" (2026-01-22)
Authors: [competitor research team]
Category: cs.SE, cs.CL
Abstract: We propose a cascaded approach in which a small model handles
"easy" completions and only complex predictions are routed to a larger
model. We show a 4x latency reduction with <2% quality loss on
HumanEval.
2. "FIM-Turbo: Faster Fill-in-the-Middle with Speculative Decoding" (2025-12-08)
Authors: [competitor research team]
Category: cs.CL
Abstract: Applies speculative decoding specifically to fill-in-the-middle
tasks. Uses a 150M parameter draft model trained on code...
3. "Repository-Level Context Compression for Code LLMs" (2025-11-19)
Authors: [competitor research team]
Category: cs.SE
Abstract: Proposes compressing repository-level context into a fixed-size
representation, reducing prompt length by 85% while retaining retrieval
accuracy for relevant code context.
4. "Evaluating Code Completion in Production: Metrics Beyond HumanEval" (2025-10-30)
Authors: [competitor research team]
Category: cs.SE
Abstract: Introduces production-oriented metrics for code completion
quality...
Paper #1 likely explains the speed improvement you noticed. They're using model
cascading — a small model handles routine completions, and the large model only
activates for harder cases. Combined with speculative decoding from paper #2,
that's a significant latency reduction.
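As an aside, the search parameters in the session above (a category filter, sorting by submission date, a result limit) correspond closely to what the public arXiv API accepts. The sketch below builds such a query URL. It targets arXiv's own API at `export.arxiv.org`, not AgentPatch's tool, whose internals are not described here, and the function name `build_arxiv_query` is hypothetical.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(terms: str, category: str, limit: int = 20) -> str:
    """Build a public arXiv API URL: phrase search within one category,
    newest submissions first."""
    # all:"..." searches every field for the phrase; cat: restricts the category.
    search_query = f'all:"{terms}" AND cat:{category}'
    params = {
        "search_query": search_query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": limit,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    # One query per category, as in the session above.
    for cat in ("cs.SE", "cs.CL"):
        print(build_arxiv_query("code completion", cat))
```

The API returns an Atom feed, so filtering by author affiliation or by a six-month window would still need a parsing step after the request.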
“Can we replicate the cascaded routing approach? What would we need?”
Based on paper #1:
- A small model (they used ~1B parameters) fine-tuned on code completion
- A routing mechanism that classifies completion difficulty — they use the
small model's confidence score as the signal
- A threshold that determines when to escalate to the large model
The core idea is straightforward. The hard part is training the small model to be
good enough on "easy" completions and calibrating the confidence threshold.
From paper #3, they also reduce context length before sending to the large model,
which further cuts latency.
Want me to search for open-source implementations of model cascading for
inference? There may be related work from other groups that includes code.
“Yes, search for that.”
Searching arXiv for "model cascading inference routing" in cs.LG, sorted by
relevance, limit 10...
Found relevant open work:
- "FrugalGPT: How to Use Large Language Models While Reducing Cost and
Improving Performance" has an open implementation of cascaded LLM routing...
In three turns, you went from “why is their product faster?” to understanding the published technique, assessing feasibility, and finding reference implementations. That’s competitive intelligence grounded in actual research, not speculation.
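To make the cascaded routing idea from the session concrete, here is a minimal sketch of confidence-threshold routing: try the small model first and escalate to the large one only when its confidence is low. It illustrates the general pattern, not the competitor's actual system. The `Cascade` class, the toy models, and the 0.8 threshold are all assumptions; a real threshold would need calibrating on held-out data.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A "model" here is any callable returning (completion, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Cascade:
    small: Model
    large: Model
    threshold: float = 0.8  # hypothetical value; calibrate on held-out data

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Return (completion, name of the model that answered)."""
        text, confidence = self.small(prompt)
        if confidence >= self.threshold:
            return text, "small"  # cheap path: small model is confident
        text, _ = self.large(prompt)  # escalate only the hard cases
        return text, "large"


# Toy stand-ins for illustration only: short prompts count as "easy".
def toy_small(prompt: str) -> Tuple[str, float]:
    confidence = 0.95 if len(prompt) < 20 else 0.4
    return prompt + " [small]", confidence


def toy_large(prompt: str) -> Tuple[str, float]:
    return prompt + " [large]", 0.99


if __name__ == "__main__":
    cascade = Cascade(toy_small, toy_large)
    for p in ("x = ", "def parse_repository_config(path):"):
        print(cascade.complete(p))
```

The latency benefit depends on how often the small model's confidence clears the threshold, which is why calibration is the hard part in practice.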
Wrapping Up
AgentPatch gives Claude Code access to arXiv Search so you can investigate the research behind any product or technique. Combine it with other marketplace tools — Google Search for product announcements, Hacker News Search for community reactions — to build a complete picture. Explore the full toolkit at agentpatch.ai.