Traditional scrapers depend on CSS selectors or XPath expressions, so they break whenever a site changes its HTML structure. Firecrawl addresses this by combining web crawling with LLM-powered extraction: you define what you want, not where to find it in the DOM.
In this tutorial, you'll build a Competitor Intelligence Dashboard using Firecrawl and Next.js 15 that:
- Scrapes any web page and returns clean Markdown
- Extracts structured data (product names, pricing, features) using AI and Zod schemas
- Crawls entire documentation sites or product catalogs
- Displays real-time results in a production-ready dashboard
Prerequisites
Before starting, ensure you have:
- Node.js 20+ installed
- Basic knowledge of Next.js App Router and TypeScript
- A Firecrawl account and API key (free tier: 500 credits/month at firecrawl.dev)
- Familiarity with Zod for schema validation
What You'll Build
A Next.js 15 application with:
- API routes that interface with the Firecrawl SDK
- Zod-validated extraction for structured competitor product data
- Async crawl jobs with status polling for large sites
- A dashboard UI displaying intelligence cards with pricing and features
Step 1: Project Setup
Create a new Next.js 15 project with TypeScript:
npx create-next-app@latest competitor-intel --typescript --tailwind --app
cd competitor-intel
Install the Firecrawl JavaScript SDK and Zod:
npm install @mendable/firecrawl-js zod
Add your API key to .env.local:
FIRECRAWL_API_KEY=fc-your-api-key-here
Step 2: Configure the Firecrawl Client
Create a reusable client module at lib/firecrawl.ts:
import FirecrawlApp from '@mendable/firecrawl-js';
// Fail fast at import time if the key is missing
if (!process.env.FIRECRAWL_API_KEY) {
  throw new Error('FIRECRAWL_API_KEY is not defined');
}
export const firecrawl = new FirecrawlApp({
  apiKey: process.env.FIRECRAWL_API_KEY,
});
Exporting a single module-level client lets every route import the same instance rather than constructing a new one on each request.
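In development, hot reloading can re-evaluate a module and recreate its module-level instances. One common workaround is to cache the instance on globalThis. The sketch below is a generic illustration of that idea; getSingleton and the __appSingletons property are hypothetical names, not part of Next.js or the Firecrawl SDK.

```typescript
// Hypothetical helper: keeps one instance per key on globalThis so that
// re-evaluating the module does not build a second instance.
type SingletonStore = { __appSingletons?: Map<string, unknown> };

function getSingleton<T>(key: string, factory: () => T): T {
  const holder = globalThis as typeof globalThis & SingletonStore;
  if (!holder.__appSingletons) {
    holder.__appSingletons = new Map<string, unknown>();
  }
  const store = holder.__appSingletons;
  if (!store.has(key)) {
    // The factory runs only the first time a key is requested
    store.set(key, factory());
  }
  return store.get(key) as T;
}

// Possible use in lib/firecrawl.ts (assumes the client module shown above):
// export const firecrawl = getSingleton('firecrawl', () =>
//   new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY })
// );
```

For a lightweight API client this is optional; it matters more when constructing the instance is expensive or holds open connections.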
Step 3: Scraping a Single Page
Firecrawl's scrape endpoint fetches a URL and returns clean data in multiple formats. Create app/api/scrape/route.ts:
import { NextRequest, NextResponse } from 'next/server';
import { firecrawl } from '@/lib/firecrawl';
export async function POST(request: NextRequest) {
  const { url } = await request.json();
  if (!url || typeof url !== 'string') {
    return NextResponse.json({ error: 'URL is required' }, { status: 400 });
  }
  try {
    const result = await firecrawl.scrapeUrl(url, {
      formats: ['markdown', 'html'],
    });
    if (!result.success) {
      return NextResponse.json({ error: 'Scrape failed' }, { status: 500 });
    }
    return NextResponse.json({
      markdown: result.markdown,
      title: result.metadata?.title,
      description: result.metadata?.description,
    });
  } catch (error) {
    return NextResponse.json(
      { error: 'Failed to scrape URL' },
      { status: 500 }
    );
  }
}
The formats array controls what Firecrawl returns. markdown gives you clean, readable text (well suited to LLMs), while html returns the page's HTML markup. This route requests both but only returns the Markdown and metadata, so you can drop 'html' from the array if you don't need it.
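The route above only checks that url is a non-empty string. Since every scrape spends credits, it can be worth rejecting malformed or non-HTTP URLs before calling Firecrawl. A minimal sketch of such a check follows; isHttpUrl is a hypothetical helper name, and it relies only on the standard URL class.

```typescript
// Hypothetical helper: true only for strings that parse as http(s) URLs.
function isHttpUrl(input: unknown): input is string {
  if (typeof input !== 'string') return false;
  try {
    const parsed = new URL(input);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    // new URL() throws on strings it cannot parse
    return false;
  }
}

// Possible use inside the route handler:
// if (!isHttpUrl(url)) {
//   return NextResponse.json({ error: 'A valid http(s) URL is required' }, { status: 400 });
// }
```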
Step 4: LLM-Structured Extraction with Zod
The real power of Firecrawl comes from schema-driven extraction — you define a Zod schema, and Firecrawl uses an LLM to extract matching fields from any page, regardless of HTML structure.
Define your product schema at lib/schemas.ts:
import { z } from 'zod';
export const ProductSchema = z.object({
  name: z.string().describe('Product or service name'),
  tagline: z.string().optional().describe('Main marketing tagline'),
  pricing: z
    .array(
      z.object({
        plan: z.string(),
        price: z.string(),
        features: z.array(z.string()),
      })
    )
    .optional()
    .describe('Pricing tiers with features'),
  mainFeatures: z.array(z.string()).describe('Top 5 key features'),
  targetAudience: z.string().optional().describe('Who the product is for'),
  techStack: z.array(z.string()).optional().describe('Technologies mentioned'),
});
export type Product = z.infer<typeof ProductSchema>;
Now create the extraction API route at app/api/extract/route.ts:
import { NextRequest, NextResponse } from 'next/server';
import { firecrawl } from '@/lib/firecrawl';
import { ProductSchema } from '@/lib/schemas';
export async function POST(request: NextRequest) {
  const { url } = await request.json();
  if (!url || typeof url !== 'string') {
    return NextResponse.json({ error: 'URL is required' }, { status: 400 });
  }
  try {
    const result = await firecrawl.scrapeUrl(url, {
      formats: ['extract'],
      extract: {
        schema: ProductSchema,
        prompt:
          'Extract product information, pricing tiers, and key features from this page.',
      },
    });
    if (!result.success || !result.extract) {
      return NextResponse.json({ error: 'Extraction failed' }, { status: 500 });
    }
    return NextResponse.json({ data: result.extract });
  } catch (error) {
    return NextResponse.json(
      { error: 'Failed to extract data' },
      { status: 500 }
    );
  }
}
The extract format sends the scraped content through an LLM and returns data matching your Zod schema. Because fields are located by meaning rather than by DOM position, extraction generally keeps working after a site redesigns its layout.
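The schema types price as a free-form string, so values such as "$20/month" or "Contact sales" come back as-is. To compare plans across competitors you may want a numeric value. The sketch below shows one way to derive it; parsePrice is a hypothetical helper, and it assumes US-style number formatting (comma as thousands separator).

```typescript
// Hypothetical helper: pulls the first number out of a price string.
// Returns 0 for "Free", null when no amount is present (e.g. "Contact sales").
function parsePrice(raw: string): number | null {
  const text = raw.trim().toLowerCase();
  if (text === 'free' || text.startsWith('free ')) return 0;
  // First run of digits, allowing thousands separators and a decimal part
  const match = text.match(/(\d[\d,]*(?:\.\d+)?)/);
  if (!match) return null;
  return Number(match[1].replace(/,/g, ''));
}

// Example: order plans from cheapest to most expensive, unknown prices last
function sortPlansByPrice<T extends { price: string }>(plans: T[]): T[] {
  return [...plans].sort(
    (a, b) => (parsePrice(a.price) ?? Infinity) - (parsePrice(b.price) ?? Infinity)
  );
}
```

The helper ignores currency and billing period, so it is only meaningful when the compared plans share both.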
Step 5: Crawling Entire Websites
For crawling multiple pages (a documentation site or product catalog), use asyncCrawlUrl. Create app/api/crawl/route.ts:
import { NextRequest, NextResponse } from 'next/server';
import { firecrawl } from '@/lib/firecrawl';
export async function POST(request: NextRequest) {
  const { url, limit = 10 } = await request.json();
  if (!url || typeof url !== 'string') {
    return NextResponse.json({ error: 'URL is required' }, { status: 400 });
  }
  try {
    const crawlResponse = await firecrawl.asyncCrawlUrl(url, {
      limit,
      scrapeOptions: {
        formats: ['markdown'],
      },
      excludePaths: ['/blog/*', '/news/*'],
    });
    if (!crawlResponse.success) {
      return NextResponse.json(
        { error: 'Crawl failed to start' },
        { status: 500 }
      );
    }
    return NextResponse.json({ jobId: crawlResponse.id });
  } catch (error) {
    return NextResponse.json(
      { error: 'Failed to start crawl' },
      { status: 500 }
    );
  }
}
asyncCrawlUrl returns a job ID immediately. Poll for completion:
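// app/api/crawl/status/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { firecrawl } from '@/lib/firecrawl';
export async function GET(request: NextRequest) {
  const jobId = request.nextUrl.searchParams.get('jobId');
  if (!jobId) {
    return NextResponse.json({ error: 'jobId is required' }, { status: 400 });
  }
  try {
    const status = await firecrawl.checkCrawlStatus(jobId);
    // The call can resolve to an error response, so check success first
    if (!status.success) {
      return NextResponse.json(
        { error: 'Failed to fetch crawl status' },
        { status: 500 }
      );
    }
    return NextResponse.json({
      status: status.status,
      completed: status.completed,
      total: status.total,
      pages: status.status === 'completed' ? status.data : [],
    });
  } catch (error) {
    return NextResponse.json(
      { error: 'Failed to fetch crawl status' },
      { status: 500 }
    );
  }
}
Step 6: URL Discovery with Map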
Before committing crawl credits to an entire site, use mapUrl to discover available pages:
const siteMap = await firecrawl.mapUrl('https://competitor.com', {
  search: 'pricing', // Filter URLs containing this keyword
  limit: 50,
});
console.log(siteMap.links);
// ['https://competitor.com/pricing', 'https://competitor.com/pricing/enterprise', ...]
This is ideal for targeted crawling: discover the relevant pages first, then crawl only those.
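Once you have the list of links, you still need to decide which ones to process. The sketch below shows one possible selection step: it deduplicates the links, keeps those whose path contains a keyword, and caps the result. selectCrawlTargets is a hypothetical helper, and the normalization rules (dropping fragments and a trailing slash) are assumptions you may want to adjust.

```typescript
// Hypothetical helper: picks a bounded, deduplicated set of URLs to process.
function selectCrawlTargets(links: string[], keywords: string[], max = 10): string[] {
  const seen = new Set<string>();
  const targets: string[] = [];
  for (const link of links) {
    let url: URL;
    try {
      url = new URL(link);
    } catch {
      continue; // skip anything that is not an absolute URL
    }
    url.hash = ''; // fragments point at the same page
    const normalized = url.toString().replace(/\/$/, '');
    if (seen.has(normalized)) continue;
    seen.add(normalized);
    const path = url.pathname.toLowerCase();
    if (keywords.some((k) => path.includes(k.toLowerCase()))) {
      targets.push(normalized);
    }
    if (targets.length >= max) break;
  }
  return targets;
}

// Example with the kind of list shown above:
// selectCrawlTargets(siteMap.links ?? [], ['pricing', 'plans'], 5)
```

Each selected URL can then be sent through the scrape or extract route from the earlier steps, which keeps credit usage proportional to the pages you actually care about.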
Step 7: Building the Dashboard UI
Create the main page at app/page.tsx:
'use client';
import { useState } from 'react';
import type { Product } from '@/lib/schemas';
export default function IntelligenceDashboard() {
  const [url, setUrl] = useState('');
  const [loading, setLoading] = useState(false);
  const [product, setProduct] = useState<Product | null>(null);
  const [error, setError] = useState<string | null>(null);
  async function handleExtract() {
    if (!url) return;
    setLoading(true);
    setError(null);
    try {
      const res = await fetch('/api/extract', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url }),
      });
      const json = await res.json();
      if (!res.ok) {
        setError(json.error ?? 'Extraction failed');
        return;
      }
      setProduct(json.data);
    } catch {
      setError('Network error, please try again');
    } finally {
      setLoading(false);
    }
  }
  return (
    <div className="max-w-4xl mx-auto p-8">
      <h1 className="text-3xl font-bold mb-2">Competitor Intelligence</h1>
      <p className="text-gray-600 mb-8">
        Enter any competitor URL to extract structured product data with AI.
      </p>
      <div className="flex gap-2 mb-8">
        <input
          type="url"
          value={url}
          onChange={(e) => setUrl(e.target.value)}
          placeholder="https://competitor.com/pricing"
          className="flex-1 border rounded-lg px-4 py-2 text-sm"
        />
        <button
          onClick={handleExtract}
          disabled={loading || !url}
          className="bg-orange-500 text-white px-6 py-2 rounded-lg disabled:opacity-50"
        >
          {loading ? 'Extracting...' : 'Extract'}
        </button>
      </div>
      {error && (
        <div className="bg-red-50 border border-red-200 rounded-lg p-4 mb-6 text-red-700">
          {error}
        </div>
      )}
      {product && <ProductCard product={product} />}
    </div>
  );
}
function ProductCard({ product }: { product: Product }) {
  return (
    <div className="border rounded-xl p-6 space-y-4">
      <div>
        <h2 className="text-2xl font-bold">{product.name}</h2>
        {product.tagline && (
          <p className="text-gray-600 mt-1">{product.tagline}</p>
        )}
        {product.targetAudience && (
          <p className="text-sm text-orange-600 mt-1">
            For: {product.targetAudience}
          </p>
        )}
      </div>
      <div>
        <h3 className="font-semibold mb-2">Key Features</h3>
        <ul className="list-disc list-inside space-y-1 text-sm text-gray-700">
          {product.mainFeatures.map((f, i) => (
            <li key={i}>{f}</li>
          ))}
        </ul>
      </div>
      {product.pricing && product.pricing.length > 0 && (
        <div>
          <h3 className="font-semibold mb-2">Pricing Plans</h3>
          <div className="grid grid-cols-1 md:grid-cols-3 gap-3">
            {product.pricing.map((plan, i) => (
              <div key={i} className="border rounded-lg p-3 text-sm">
                <div className="font-medium">{plan.plan}</div>
                <div className="text-orange-600 font-bold">{plan.price}</div>
                <ul className="mt-2 space-y-1 text-gray-600">
                  {plan.features.slice(0, 3).map((f, j) => (
                    <li key={j} className="truncate">
                      • {f}
                    </li>
                  ))}
                </ul>
              </div>
            ))}
          </div>
        </div>
      )}
      {product.techStack && product.techStack.length > 0 && (
        <div>
          <h3 className="font-semibold mb-2">Tech Stack Mentioned</h3>
          <div className="flex flex-wrap gap-2">
            {product.techStack.map((tech, i) => (
              <span key={i} className="bg-gray-100 px-2 py-1 rounded text-xs">
                {tech}
              </span>
            ))}
          </div>
        </div>
      )}
    </div>
  );
}
Step 8: Rate Limiting and Retry Logic
Firecrawl enforces rate limits based on your plan. Add exponential backoff for resilience:
// lib/firecrawl-retry.ts
import { firecrawl } from './firecrawl';
export async function scrapeWithRetry(
  url: string,
  options = {},
  maxRetries = 3
) {
  let lastError: Error | null = null;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await firecrawl.scrapeUrl(url, options);
    } catch (error) {
      lastError = error as Error;
      if (attempt < maxRetries) {
        // Exponential backoff: 1s, 2s, 4s
        await new Promise((resolve) =>
          setTimeout(resolve, Math.pow(2, attempt - 1) * 1000)
        );
      }
    }
  }
  throw lastError;
}
When processing multiple URLs in batch, add a small delay between requests:
async function batchScrape(urls: string[]) {
  const results = [];
  for (const url of urls) {
    const result = await scrapeWithRetry(url);
    results.push(result);
    // A 500ms pause between requests lowers the chance of hitting rate limits
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
  return results;
}
Step 9: Caching Results with Next.js
Firecrawl credits are limited, so cache extraction results to avoid redundant API calls:
import { unstable_cache } from 'next/cache';
import { firecrawl } from '@/lib/firecrawl';
import { ProductSchema } from '@/lib/schemas';
export const getCachedProductData = unstable_cache(
  async (url: string) => {
    const result = await firecrawl.scrapeUrl(url, {
      formats: ['extract'],
      extract: { schema: ProductSchema },
    });
    return result.extract;
  },
  ['product-data'],
  { revalidate: 3600 } // Cache for 1 hour
);
Testing Your Implementation
- Start the dev server: npm run dev
- Navigate to http://localhost:3000
- Enter a competitor pricing URL (try https://vercel.com/pricing)
- Click Extract and wait for the AI extraction (typically 3-8 seconds)
- Verify the structured data matches what is displayed on the page
A successful extraction for a pricing page should return plan names, prices, and feature lists — all without any CSS selectors.
Troubleshooting
"FIRECRAWL_API_KEY is not defined": Ensure FIRECRAWL_API_KEY is set in .env.local and restart the dev server after adding it.
"Scrape failed" on certain sites: Some sites aggressively block scrapers. Firecrawl has built-in anti-bot bypass for most sites. For JavaScript-heavy SPAs, add a waitFor option:
const result = await firecrawl.scrapeUrl(url, {
  formats: ['extract'],
  waitFor: 2000, // Wait 2 seconds for JS rendering
  extract: { schema: ProductSchema },
});
Empty extraction results: The LLM extraction works best with content-rich pages. Ensure the target page has sufficient visible text about the product.
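A request can succeed and still return a result with little usable content, which the dashboard would render as a nearly blank card. The sketch below shows one way to detect that case so the UI can show a message instead. isThinExtraction and the ExtractedProduct type are hypothetical; the type mirrors a subset of the Product schema from Step 4, and the threshold (a name plus either features or pricing) is an assumption.

```typescript
// Hypothetical local type mirroring part of the Product schema from Step 4
type ExtractedProduct = {
  name?: string;
  mainFeatures?: string[];
  pricing?: { plan: string; price: string; features: string[] }[];
};

// Returns true when the extraction has no name, or neither features nor pricing
function isThinExtraction(product: ExtractedProduct | null | undefined): boolean {
  if (!product) return true;
  const hasName = typeof product.name === 'string' && product.name.trim().length > 0;
  const hasFeatures = Array.isArray(product.mainFeatures) && product.mainFeatures.length > 0;
  const hasPricing = Array.isArray(product.pricing) && product.pricing.length > 0;
  return !hasName || (!hasFeatures && !hasPricing);
}

// Possible use in the extract route before returning data:
// if (isThinExtraction(result.extract)) {
//   return NextResponse.json({ error: 'Page had too little product content' }, { status: 422 });
// }
```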
Function timeout in production: For large crawl jobs, increase the maximum function duration in your route:
// Top of app/api/crawl/route.ts
export const maxDuration = 60;
Deployment on Vercel
- Add FIRECRAWL_API_KEY to your Vercel project environment variables
- For crawl routes, increase maxDuration to 60 seconds as shown above
- Always use asyncCrawlUrl (not crawlUrl) in production to avoid synchronous timeouts
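Because asyncCrawlUrl only returns a job ID, something has to keep checking the status route from Step 5 until the crawl finishes. A minimal sketch of such a polling loop follows. pollUntilDone is a hypothetical helper that takes any function returning a status object, and treating 'failed' as a terminal state alongside 'completed' is an assumption about the status values.

```typescript
type CrawlStatus = { status: string; completed?: number; total?: number };

// Hypothetical helper: calls check() until the job reaches a terminal state
// or maxAttempts is exhausted, pausing intervalMs between checks.
async function pollUntilDone<T extends CrawlStatus>(
  check: () => Promise<T>,
  { intervalMs = 2000, maxAttempts = 30 }: { intervalMs?: number; maxAttempts?: number } = {}
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = await check();
    if (current.status === 'completed' || current.status === 'failed') {
      return current;
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Crawl did not finish after ${maxAttempts} checks`);
}

// Possible client-side use with the status route from Step 5:
// const final = await pollUntilDone(() =>
//   fetch(`/api/crawl/status?jobId=${jobId}`).then((res) => res.json())
// );
```

Polling from the browser keeps each serverless invocation short, so the status route itself never needs a long maxDuration.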
Next Steps
- Add a database (Neon + Drizzle) to persist extracted competitor data over time
- Schedule weekly re-scrapes with Trigger.dev for freshness monitoring
- Combine with the Vercel AI SDK to auto-generate competitive analysis reports
- Explore Firecrawl's Agent API for fully autonomous data gathering tasks
Conclusion
Firecrawl transforms web scraping from a fragile HTML-parsing exercise into a resilient AI-powered pipeline. The combination of scrapeUrl for single pages, asyncCrawlUrl for entire sites, and schema-based extract for structured data covers the vast majority of web data extraction needs in modern AI applications. With Zod schema validation and Next.js caching, you get a production-ready pipeline that delivers structured competitor intelligence without maintaining brittle CSS selectors.