AI for Competitive & Strategic Analysis — What It Can and Can't Do

TL;DR: AI can do competitive and strategic analysis at a genuinely useful level: peer-reviewed studies find that LLMs generate and evaluate business strategies at a level comparable to entrepreneurs and investors. But three disciplines separate signal from confident noise.
(1) Aggregate, don’t single-shot. One LLM evaluation is inconsistent and order-biased; pooling many evaluations (varying models, roles, and prompts) produces rankings that correlate with human experts’ rankings (Pearson r=0.675). Source: Doshi et al. 2025, Strategic Management Journal.
(2) Keep humans on the judgment dimensions. LLM evaluations track expert scores overall (r=0.52) but are weakest exactly where it matters: innovation (r=0.21) and product-market fit (r=0.24). Source: Csaszar et al. 2024, Strategy Science.
(3) Sequence it right. Using an LLM for both problem framing and idea generation lowers the likelihood of choosing a strategic option by 15 percentage points, via anchoring; let humans frame the competitive question and use AI to generate options. Source: Wu, Kim & Lin 2025 (INSEAD RCT, n=305).
In monitoring, AI triages signal but doesn’t decide: it does coarse relevance triage well (up to ~92% agreement, within ±1, with expert ratings) and nuanced editorial judgment poorly.

What it means

“AI for competitive analysis” covers using LLMs to do the analytical work of competitive and strategic intelligence: evaluating competitors’ positions, generating strategic options, scoring business models, and triaging the firehose of competitor signals. The 2024–2026 research is unusually clear that this works — and unusually clear about the three places it breaks. This page is the evidence base for how to deploy AI in the competitor-analysis/overview cluster without getting confident-but-wrong output.

It’s a direct application of glossary/appropriate-reliance: the goal isn’t maximal AI use, it’s calibrated use — heavy where AI is strong (volume, aggregation, triage), light where it’s weak (judgment, framing).

Finding 1 — A single LLM is a biased evaluator; aggregate many

Doshi, Bell, Mirzayev & Vanneste (2025, Strategic Management Journal; FT50, Tier 1) tested LLMs as evaluators of business models. A single LLM evaluation is inconsistent (changing the order in which options are presented changes the verdict; consistency ranged from 29.9% to 80.9%) and biased (a systematic preference for the first or second option shown). So a one-prompt “ask ChatGPT to rank these competitors” is unreliable.

But aggregating many evaluations — varying the model, the assigned role, and the prompt, then pooling — produces rankings that resemble human-expert rankings (Pearson 0.675, Spearman 0.463), agreeing with experts more than human non-experts do. Aggregation is the validity mechanism. This is the “LLM jury” discipline, and it’s the same statistical move that underpins glossary/share-of-model measurement (run the prompt set many times; report the distribution, not one answer).
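
A minimal sketch of the pooling step, assuming a hypothetical `evaluate` callable that wraps whatever LLM API is in use and returns the options ranked best-first. The mean-rank pooling rule, the function names, and the number of shuffles per configuration are illustrative assumptions, not the procedure from Doshi et al. The stub evaluator stands in for a real model and simply echoes the presentation order, which is the worst case for position bias.

```python
import random
from collections import defaultdict
from itertools import product
from typing import Callable, Sequence

# Hypothetical signature: (model, role, prompt, options as shown) -> options ranked best-first.
Evaluator = Callable[[str, str, str, Sequence[str]], list]


def aggregate_rankings(
    options: Sequence[str],
    evaluate: Evaluator,
    models: Sequence[str],
    roles: Sequence[str],
    prompts: Sequence[str],
    orderings_per_config: int = 4,
    seed: int = 0,
) -> list:
    """Pool many evaluations into (option, mean rank) pairs, best first."""
    rng = random.Random(seed)
    rank_sums = defaultdict(float)
    runs = 0
    for model, role, prompt in product(models, roles, prompts):
        for _ in range(orderings_per_config):
            shown = list(options)
            rng.shuffle(shown)  # vary presentation order so position bias averages out
            ranked = evaluate(model, role, prompt, shown)
            if sorted(ranked) != sorted(shown):
                raise ValueError("evaluator must rank every option exactly once")
            for position, option in enumerate(ranked, start=1):
                rank_sums[option] += position
            runs += 1
    mean_ranks = [(option, rank_sums[option] / runs) for option in options]
    return sorted(mean_ranks, key=lambda pair: pair[1])


def position_biased_stub(model, role, prompt, shown):
    """Stand-in for a real LLM call: always prefers whatever is shown first."""
    return list(shown)


if __name__ == "__main__":
    pooled = aggregate_rankings(
        ["Competitor A", "Competitor B", "Competitor C"],
        position_biased_stub,
        models=["model-1", "model-2"],          # placeholder names
        roles=["investor", "industry analyst"],  # placeholder roles
        prompts=["prompt-v1", "prompt-v2"],      # placeholder prompt variants
        orderings_per_config=50,
    )
    for option, mean_rank in pooled:
        print(f"{option}: mean rank {mean_rank:.2f}")
```

With the position-biased stub, shuffling the presentation order pulls every option’s mean rank toward the middle, so the bias cancels instead of deciding the result. With a real evaluator it would also be worth reporting the spread of ranks per option, not only the mean.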

Finding 2 — LLMs match experts on average, but are weakest on judgment

Csaszar, Ketkar & Kim (2024, Strategy Science; INFORMS, Tier 1) ran two studies with entrepreneurs and investors. LLMs generated business plans rated 0.14 SD higher than entrepreneurs’ own (judged by 250 experienced investors) and evaluated plans in line with VCs/angels (r=0.52, which corresponds to roughly 27% of variance since r² ≈ 0.27, across 138 plans / 541 evaluations).

The load-bearing caveat: LLMs were weakest precisely on the judgment-heavy dimensions — innovation (r=0.21) and product-market fit (r=0.24). These are the glossary/jagged-frontier outside-frontier zones: pattern-matching handles structured evaluation well and genuine novelty-assessment poorly. The competitive-analysis implication: trust AI for structured scoring (feature comparisons, positioning maps, completeness checks); keep humans on “is this actually differentiated?” and “will the market want it?”
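
To see how weak those two correlations are, it can help to square them: under the usual linear reading, r² is the share of variance in expert scores that the LLM’s scores account for. A quick worked calculation, using only the two figures quoted above:

```latex
\begin{align*}
r_{\text{innovation}}^{2} &= 0.21^{2} \approx 0.044 \\
r_{\text{PMF}}^{2} &= 0.24^{2} \approx 0.058
\end{align*}
```

On that reading, LLM scores account for only about 4% of the variation in expert scores on innovation and about 6% on product-market fit.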

Finding 3 — Sequence matters: humans frame, AI generates

Wu, Kim & Lin (2025, INSEAD working paper; pre-registered RCT, n=305 MBA students, three arms) found a sharp sequencing effect. Using an LLM for both problem formulation and idea generation reduced strategic quality: the proportion of strategic options dropped 7 percentage points and the likelihood of choosing a strategic option fell 15 percentage points, an effect attributed to cognitive anchoring on the AI’s framing. Critically, that drop was absent when the LLM was used for idea generation only.

The rule for CI workflows: let humans frame the competitive question; use AI to generate and expand options against that frame. Outsourcing the framing to AI quietly narrows the strategy space. (Working paper, MBA subjects — directional, not yet peer-reviewed.)
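
One way to enforce that ordering in a workflow is to make the human-written frame a required input to the ideation step. The sketch below is illustrative only: the `Frame` type, its fields, and the `llm` callable are hypothetical names, and the prompt wording is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Frame:
    question: str      # the competitive question to answer
    author: str        # the person who wrote it
    ai_assisted: bool  # True if an LLM helped draft the question


def generate_options(frame: Frame, llm: Callable[[str], List[str]]) -> List[str]:
    """Expand options against a human-written frame; refuse AI-drafted frames."""
    if frame.ai_assisted:
        raise ValueError("frame must be written by a human before AI ideation")
    prompt = (
        "Competitive question (fixed, do not reframe): "
        f"{frame.question}\n"
        "List distinct strategic options that answer this question."
    )
    return llm(prompt)
```

The guard is deliberately blunt: it does not stop anyone from asking an LLM for framing help elsewhere, it only keeps that step out of the pipeline that feeds idea generation.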

Finding 4 — AI triages signal; humans decide

For the continuous-monitoring layer, the evidence draws a clean boundary. LLMs do coarse relevance triage well (up to ~92% agreement, within ±1, with expert newsworthiness ratings in a high-volume news-stream study; Hagar et al. 2025) but consistently struggle with nuanced judgments requiring domain expertise. And raw LLMs lack current market knowledge (a knowledge cutoff plus an incomplete picture of the competitive landscape; Hadifar et al. 2025, Nokia Bell Labs), so they need retrieval/grounding for anything time-sensitive.

The boundary: AI filters the firehose; humans make the call. Use AI to triage competitor signals (which of these 500 alerts matter), not to decide what they mean. This is exactly the glossary/continuous-monitoring discipline — the 2026 bottleneck is signal-to-noise triage, and that’s the part AI does well.
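
A small sketch of what “agreement within ±1” measures and how a triage threshold might be applied. It assumes integer relevance ratings on a shared scale (the example uses 1 to 5, which is an assumption; the source only states the ±1 tolerance), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def within_one_agreement(llm_scores: Sequence[int], expert_scores: Sequence[int]) -> float:
    """Share of items where the LLM rating is within one point of the expert rating."""
    if len(llm_scores) != len(expert_scores) or not llm_scores:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(abs(a - b) <= 1 for a, b in zip(llm_scores, expert_scores))
    return hits / len(llm_scores)


def triage(alerts: Sequence[Tuple[str, int]], keep_at_or_above: int) -> List[str]:
    """Return the alert ids worth a human look; interpretation stays with the reviewer."""
    return [alert_id for alert_id, score in alerts if score >= keep_at_or_above]


if __name__ == "__main__":
    llm = [5, 4, 2, 1, 3]
    expert = [4, 4, 4, 1, 2]
    print(within_one_agreement(llm, expert))  # 4 of 5 pairs differ by at most 1 -> 0.8
    print(triage([("a1", 5), ("a2", 2), ("a3", 4)], keep_at_or_above=4))
```

Note that a high within-±1 figure is a coarse measure: two ratings one point apart still count as agreement, which fits filtering but says little about fine-grained judgment.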

(Honesty note: both monitoring papers had their stronger method claims — “multi-aspect cueing consistently improves performance,” “F1=0.94 first-pass filter” — refuted in independent verification. Only the limitation/triage-ceiling framings above survived. Cite accordingly.)

The integrated playbook

Aggregate, never single-shot. For any AI competitive evaluation, run it many times across models/roles/prompts and pool — a lone answer is biased and order-dependent.
Humans frame the question; AI expands the options. Don’t let the AI set the competitive frame — anchoring narrows your strategy space.
Keep judgment human. Differentiation, novelty, product-market fit — the dimensions where AI scores worst — stay with people.
AI triages, humans decide. Use AI to filter competitor-signal volume; reserve interpretation and action for humans.
Ground anything time-sensitive. Raw LLMs don’t know current market reality — pair with retrieval, or the output is confidently stale.

Honest limits

The two Tier-1 strategic-decision papers (Doshi; Csaszar) are about evaluating business models/strategies, not CI or win-loss directly — their CI relevance is by extension (the AI-as-evaluator discipline transfers; the studies don’t test competitor analysis per se).
Wu et al. is a working paper with MBA-student subjects on a single task — directional.
The monitoring evidence (Nokia, Hagar) is non-peer-reviewed / domain-transferred (news → CI), and both papers’ method claims were refuted — only the limitation framings are citable.
All findings are on 2024–2026-era models and will shift as capability moves the frontier.
“Aggregate many runs” raises cost — the validity gain is real but isn’t free.
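
The cost side of aggregation is simple multiplication, and it grows quickly. A rough sketch, assuming every combination of model, role, prompt, and presentation order is run once; the function names and the per-call price are placeholders, not real pricing.

```python
def aggregation_calls(models: int, roles: int, prompts: int, orderings: int = 1) -> int:
    """Number of LLM calls needed to cover every configuration once."""
    for name, value in (("models", models), ("roles", roles),
                        ("prompts", prompts), ("orderings", orderings)):
        if value < 1:
            raise ValueError(f"{name} must be at least 1")
    return models * roles * prompts * orderings


def estimated_cost(calls: int, cost_per_call: float) -> float:
    """Total spend for a given call count at a flat, assumed price per call."""
    return calls * cost_per_call


if __name__ == "__main__":
    calls = aggregation_calls(models=3, roles=4, prompts=5, orderings=2)
    print(calls)  # 3 * 4 * 5 * 2 = 120 calls for one evaluation task
    print(estimated_cost(calls, cost_per_call=0.01))  # placeholder price
```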

Related

competitor-analysis/overview — the five-layer CI methodology this page tells you how to AI-augment
glossary/appropriate-reliance — the parent principle: calibrated, not maximal, AI use
glossary/jagged-frontier — why AI is weakest on the judgment dimensions (innovation, PMF)
glossary/share-of-model — aggregation-across-runs is the same discipline that makes share-of-model measurement valid
glossary/continuous-monitoring — the triage-not-decision boundary applied to signal monitoring
glossary/win-loss-analysis — the judgment-heavy CI layer where human interviewing stays load-bearing
glossary/automation-eats-execution — AI compresses CI execution (triage, scoring); strategy/judgment stays human
competitor-analysis/dna-beauty-paid-social-whitespace — a worked sweep output: the indirect ad-sweep method with calibrated whitespace confidence
glossary/creative-reverse-engineering — Layer 5 of the CI methodology: deconstructing competitors’ winning ad creative into transferable formulas (the layer this page’s other links omit)

Sources

Doshi, A. R., Bell, J. J., Mirzayev, E. & Vanneste, B. S. (2025). Generative artificial intelligence and evaluating strategic decisions. Strategic Management Journal, 46(3), 583–610. DOI: 10.1002/smj.3677 (open access). Single LLM inconsistent/biased; aggregated evaluations match experts (r=0.675). [verified CONFIRMED, Tier 1, 3-0]
Csaszar, F. A., Ketkar, H. & Kim, H. (2024). Artificial Intelligence and Strategic Decision-Making: Evidence from Entrepreneurs and Investors. Strategy Science, 9(4), 322–345 (INFORMS). arXiv:2408.08811. LLMs ~ experts on average (r=0.52); weakest on innovation (r=0.21) and PMF (r=0.24). [verified CONFIRMED, Tier 1, 3-0]
Wu, N., Kim, H. & Lin, C. (2025). From Problems to Solutions in Strategic Decision-Making: The Effects of Generative AI on Problem Formulation. INSEAD Working Paper 2025/42/STR. DOI: 10.2139/ssrn.5456494. Pre-registered RCT, n=305: using AI for both framing and ideation lowers the likelihood of choosing a strategic option by 15pp via anchoring. [verified CONFIRMED, Tier 2 working paper, 3-0]
Hagar, N., Silver, B., Spencer, J. & Diakopoulos, N. (2025). LLM-Assisted News Discovery in High-Volume Information Streams. Computation + Journalism Symposium 2025. arXiv:2509.25491. Coarse relevance triage ~92%; struggles on nuanced judgment. [limitation/triage framing only; F1=0.94 method claim refuted 0-3]
Hadifar, A., Ochs, M. & Van Ewijk, S. (2025). Language Models Guidance with Multi-Aspect-Cueing: A Case Study for Competitor Analysis. Nokia Bell Labs. arXiv:2504.02984. Raw LLMs lack current market knowledge. [limitation framing only; multi-aspect-cueing method claim refuted 0-3]

By Andrej Ruckij