Lightpanda: An AI Scraping Browser 11x Faster Than Chrome

Repository: https://github.com/lightpanda-io/browser

Medium difficulty, approximately 20 minutes. Learn how to use Lightpanda, a lightweight headless browser for AI data collection that, according to the project's own benchmarks, runs 11x faster than Chrome and uses 9x less memory.

Target Audience

Engineers requiring large-scale web scraping
AI/LLM data collection developers
Automation test engineers
Tech enthusiasts interested in high-performance browser technology

Core Dependencies and Environment

Linux x86_64 or macOS aarch64
Windows requires WSL2
Docker (optional, recommended for production)
Node.js 18+ (to run Puppeteer/Playwright scripts)

TIP

If you are on Windows, install Lightpanda directly in WSL2 and run Puppeteer from the Windows host.

Complete Project Structure

lightpanda-browser/
├── lightpanda              # Main program binary
├── LICENSE                # AGPL-3.0 License
├── README.md              # Project description
├── CONTRIBUTING.md        # Contribution guide
├── CLA.md                 # Contributor License Agreement
├── docker/
│   └── Dockerfile         # Docker build file
├── src/                   # Zig source code (if you want to compile from source)
└── docs/                  # Documentation directory

Step-by-Step Tutorial

Step 1: Download and Install Lightpanda

We can download the binaries directly from nightly builds.

Linux installation:

curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux && \
chmod a+x ./lightpanda

macOS installation:

curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-aarch64-macos && \
chmod a+x ./lightpanda

Verify installation:

./lightpanda --version

WARNING

Currently, official binaries are only available for Linux x86_64 and macOS aarch64. Windows users must use WSL2.

Step 2: Start with Docker (Recommended)

If you don't want to download the binary directly, you can get started faster with Docker:

docker run -d --name lightpanda -p 9222:9222 lightpanda/browser:nightly

This starts the CDP server directly, listening on port 9222.

Verify container status:

docker ps | grep lightpanda

Step 3: Launch the CDP Server

If we are not using Docker, we need to start the CDP server manually:

./lightpanda serve --obey_robots --log_format pretty --log_level info --host 127.0.0.1 --port 9222

Output will look similar to:

INFO  telemetry : telemetry status . . . . . . . . . . . . .  [+0ms]
      disabled = false

INFO  app : server running . . . . . . . . . . . . . . . .  [+0ms]
      address = 127.0.0.1:9222

TIP

The --obey_robots parameter ensures Lightpanda respects robots.txt, which is basic etiquette for polite crawlers.

Step 4: Write a Puppeteer Script

Now let's write our first crawler script. First, install puppeteer-core in your project directory:

npm install puppeteer-core

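Create a crawler.js file. The script uses ES module imports and top-level await, so either set "type": "module" in package.json or name the file crawler.mjs: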

'use strict'

import puppeteer from 'puppeteer-core';

// Connect to Lightpanda's CDP server via WebSocket
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

// Create browser context and page
const context = await browser.createBrowserContext();
const page = await context.newPage();

// Navigate to target page
await page.goto('https://demo-browser.lightpanda.io/amiibo/', {waitUntil: "networkidle0"});

// Extract all links from the page
const links = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('a')).map(row => {
    return row.getAttribute('href');
  });
});

console.log('Scraped links:');
links.forEach(link => console.log(link));

// Page load statistics (Puppeteer reports ScriptDuration in seconds)
const metrics = await page.metrics();
console.log('\nPage metrics:');
console.log('Script Duration:', metrics.ScriptDuration, 's');
console.log('DOM Nodes:', metrics.Nodes);

// Cleanup resources
await page.close();
await context.close();
await browser.disconnect();

Run the script:

node crawler.js
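
The href values returned by getAttribute can be relative, missing, or repeated, so they usually need cleaning before being fed back into a crawler. The helper below is a minimal sketch of that step, not part of Lightpanda or Puppeteer; the name normalizeLinks is our own, and dropping fragments and non-HTTP(S) schemes is an assumption about what a typical crawl wants.

```javascript
// Resolve scraped href values against the page URL, dropping empty,
// unparsable, non-HTTP(S), and duplicate entries. No browser is required.
function normalizeLinks(hrefs, baseUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    if (!href) continue; // getAttribute('href') returns null when the attribute is missing
    let url;
    try {
      url = new URL(href, baseUrl); // resolves relative paths against the page URL
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    url.hash = ''; // treat in-page anchors as the same document
    seen.add(url.href);
  }
  return [...seen];
}
```

In crawler.js this could be called as normalizeLinks(links, page.url()) right after the page.evaluate step.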

TIP

If you started Lightpanda using Docker, change browserWSEndpoint in the script to ws://localhost:9222.
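
Rather than editing the endpoint by hand each time the host or port changes, we can derive it from the environment. This is a small sketch under our own assumptions: the variable names LIGHTPANDA_HOST and LIGHTPANDA_PORT are hypothetical and not something Lightpanda itself reads.

```javascript
// Build the CDP WebSocket endpoint from environment variables.
// LIGHTPANDA_HOST and LIGHTPANDA_PORT are hypothetical names chosen for this sketch.
function cdpEndpoint(env = process.env) {
  const host = env.LIGHTPANDA_HOST || '127.0.0.1';
  const port = Number(env.LIGHTPANDA_PORT || 9222);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid CDP port: ${env.LIGHTPANDA_PORT}`);
  }
  return `ws://${host}:${port}`;
}
```

The result can then be passed as browserWSEndpoint: cdpEndpoint() in the puppeteer.connect call.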

Step 5: Experience Extreme Scraping Speed

Official Lightpanda data shows:

Speed: 11x faster than Chrome
Memory: 9x less than Chrome
Startup: Instant startup (Headless Chrome takes several seconds)

Let's run a simple comparison test. First, ensure Lightpanda is running:

./lightpanda serve --host 127.0.0.1 --port 9222

Then write a batch scraping script and save it as batch-crawler.js:

'use strict'

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

const context = await browser.createBrowserContext();
const page = await context.newPage();

// Batch scrape multiple pages
const urls = [
  'https://demo-browser.lightpanda.io/amiibo/',
  'https://demo-browser.lightpanda.io/campfire-commerce/',
  'https://demo-browser.lightpanda.io/hacker-news-top-stories/',
];

const startTime = Date.now();

for (const url of urls) {
  console.log(`\nScraping: ${url}`);
  const pageStart = Date.now();

  await page.goto(url, {waitUntil: "networkidle0"});

  const title = await page.title();
  console.log(`Title: ${title}`);
  console.log(`Time taken: ${Date.now() - pageStart}ms`);
}

console.log(`\nTotal Time: ${Date.now() - startTime}ms`);

await browser.disconnect();

Run it:

node batch-crawler.js

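The per-page and total timings printed by the script give you a baseline. For a meaningful comparison, run the same navigation steps against headless Chrome on the same machine and network, and repeat each run several times, since a single run is dominated by network variance.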
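
To turn raw per-page timings into something comparable, it helps to summarize them and compute a speedup ratio. The following is a minimal sketch with helper names of our own choosing (summarizeTimings, speedup); it assumes you collect the "Time taken" values in milliseconds into an array yourself.

```javascript
// Summarize a list of per-page timings (in milliseconds).
function summarizeTimings(timingsMs) {
  if (timingsMs.length === 0) throw new Error('No timings recorded');
  const sorted = [...timingsMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, t) => sum + t, 0);
  const mid = Math.floor(sorted.length / 2);
  // Median: middle value, or the average of the two middle values for even counts
  const median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    count: sorted.length,
    total,
    mean: total / sorted.length,
    median,
    min: sorted[0],
    max: sorted[sorted.length - 1],
  };
}

// Speedup of a candidate over a baseline: values above 1 mean the candidate is faster.
function speedup(baselineMs, candidateMs) {
  if (candidateMs <= 0) throw new RangeError('candidateMs must be positive');
  return baselineMs / candidateMs;
}
```

For example, speedup(chrome.total, lightpanda.total) gives the ratio to set against the "11x" figure, where both arguments are totals returned by summarizeTimings for the same URL list.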

Step 6: Advanced Feature - Page Screenshots

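Lightpanda is built for headless data extraction and does not perform graphical rendering, so screenshot support may be missing or limited in your build. The script below uses the standard Puppeteer screenshot API; verify that it works against your Lightpanda version before relying on it: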

'use strict'

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

const context = await browser.createBrowserContext();
const page = await context.newPage();

// Set viewport size
await page.setViewport({ width: 1280, height: 720 });

await page.goto('https://demo-browser.lightpanda.io/campfire-commerce/', {waitUntil: "networkidle0"});

// Save screenshot
await page.screenshot({ path: 'screenshot.png', fullPage: true });

console.log('Screenshot saved to screenshot.png');

await browser.disconnect();

Troubleshooting Common Issues

Q1: Port 9222 is already in use

Symptom: Error "Address already in use" at startup.

Solution:

# Check what is using this port
lsof -i :9222

# Or change the port
./lightpanda serve --port 9223
# Then update the script to use ws://127.0.0.1:9223

Q2: Web API Not Supported Error

Symptom: "XXX is not defined" error when running scripts.

Solution: Lightpanda is currently in Beta, and Web API coverage is incomplete. File an issue on GitHub; the team usually responds quickly.
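
Before filing an issue, it helps to confirm which global APIs are actually missing. The function below is a hedged sketch: findMissingApis is our own name, and it only checks for top-level globals, not for individual methods or behavior.

```javascript
// Return the names from `names` that are not defined as globals in `scope`.
// `scope` defaults to globalThis, so the same function works in Node and in a page.
function findMissingApis(names, scope = globalThis) {
  return names.filter((name) => typeof scope[name] === 'undefined');
}
```

Because it has no outside dependencies, it can be passed to Puppeteer for execution inside the page, for example await page.evaluate(findMissingApis, ['IntersectionObserver', 'fetch']); the API names here are only illustrations.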

Q3: Docker Container Fails to Start

Symptom: docker run error or container exits immediately.

Solution:

# Check container logs
docker logs lightpanda

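# If the port conflicts, remove the old container and map a different host port
docker rm -f lightpanda
docker run -d --name lightpanda -p 9322:9222 lightpanda/browser:nightly
# Then update the script to use ws://127.0.0.1:9322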

Q4: Puppeteer Fails to Connect

Symptom: Error: Protocol error (Target.attachToTarget): No target with given id.

Solution: Ensure the Lightpanda CDP server is started and the version is compatible with Puppeteer. Try restarting:

# Kill old process
pkill lightpanda
# Restart
./lightpanda serve --port 9222
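
If the script starts before the server is ready to accept connections, the first connection attempt can fail. A minimal sketch of a retry wrapper with exponential backoff follows; retry is our own helper, and the attempt count and delay are arbitrary defaults rather than recommended values.

```javascript
// Run an async task, retrying on failure with exponential backoff.
// Rethrows the last error once all attempts are used up.
async function retry(task, { attempts = 5, delayMs = 200 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait delayMs, 2*delayMs, 4*delayMs, ... between attempts
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

In the crawler script it could wrap the connection step: const browser = await retry(() => puppeteer.connect({ browserWSEndpoint: "ws://127.0.0.1:9222" }));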

Q5: Page Load Timeout

Symptom: TimeoutError: Navigation timeout.

Solution:

// Increase the timeout (in milliseconds)
await page.goto(url, { timeout: 60000 });
// Or wait for domcontentloaded instead of networkidle0
await page.goto(url, { waitUntil: "domcontentloaded" });
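
The two options can also be combined: try the strict wait first and fall back to the looser one only when navigation times out. This is a sketch under the assumption that the timeout error carries the name "TimeoutError", as Puppeteer's TimeoutError class does; gotoWithFallback is our own helper name.

```javascript
// Navigate with networkidle0 first; on a timeout, retry once with domcontentloaded.
// Any other error is rethrown unchanged.
async function gotoWithFallback(page, url, { timeout = 30000 } = {}) {
  try {
    return await page.goto(url, { waitUntil: 'networkidle0', timeout });
  } catch (err) {
    if (err.name !== 'TimeoutError') throw err;
    return await page.goto(url, { waitUntil: 'domcontentloaded', timeout });
  }
}
```

Note that the fallback returns as soon as the DOM is parsed, so content loaded later by scripts may not be present yet.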

Q6: Using Playwright instead of Puppeteer

Symptom: Unsure how to integrate.

Solution: Playwright connection is similar to Puppeteer:

import { chromium } from 'playwright';

const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
// Subsequent usage is the same

Extended Reading / Advanced Directions

1. Compile from Source

If you want to dive deep into Lightpanda's internal implementation or contribute code, you can compile from source:

# Install Zig 0.15.2 (recent Zig releases name the archive <arch>-<os>)
curl -L https://ziglang.org/download/0.15.2/zig-x86_64-linux-0.15.2.tar.xz | tar xJ

# Clone the project
git clone https://github.com/lightpanda-io/browser.git
cd browser

# Compile and run (see the README for additional build dependencies)
zig build run

2. Playwright Integration

Lightpanda officially supports Playwright. Usage:

npm install playwright

import { chromium } from 'playwright';

// connectOverCDP is only available on Playwright's chromium browser type
const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
// Subsequent usage is the same as standard Playwright

WARNING

Playwright support has limitations. Playwright chooses its execution path based on the browser features it detects, so as Lightpanda adds new Web APIs, a script that worked before may take a different path and fail.

3. Proxy and Network Interception

Lightpanda supports proxy and network request interception:

# Specify an HTTP proxy at startup
./lightpanda serve --http_proxy http://proxy:8080

4. Custom HTTP Headers

await page.setExtraHTTPHeaders({
  'X-Custom-Header': 'value'
});
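
Puppeteer expects every header value passed to setExtraHTTPHeaders to be a string, so values coming from configuration (numbers, undefined) are worth normalizing first. The helper below is a minimal sketch; toHeaderMap is a name of our own, and lowercasing header names is a convenience choice, not a requirement.

```javascript
// Convert a loose object into a header map with lowercase names and string values.
// Entries whose value is undefined or null are dropped.
function toHeaderMap(headers) {
  const result = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined || value === null) continue;
    result[name.toLowerCase()] = String(value);
  }
  return result;
}
```

Usage would look like await page.setExtraHTTPHeaders(toHeaderMap({ 'X-Custom-Header': 'value', 'X-Retry': 3 }));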

5. Web Platform Tests

The Lightpanda team tracks Web API compatibility against the Web Platform Tests suite. You can exercise individual API tests at wpt.live.