Repository: https://github.com/lightpanda-io/browser
Medium difficulty, approximately 20 minutes. Learn how to use Lightpanda, a lightweight headless browser for AI data collection that, according to the project's own benchmarks, runs 11x faster than Chrome and uses 9x less memory.
Target Audience
- Engineers requiring large-scale web scraping
- AI/LLM data collection developers
- Automation test engineers
- Tech enthusiasts interested in high-performance browser technology
Core Dependencies and Environment
- Linux x86_64 or macOS aarch64
- Windows requires WSL2
- Docker (optional, recommended for production)
- Node.js 18+ (to run Puppeteer/Playwright scripts)
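The scripts below use ES module imports and top-level await, so an older Node.js will fail in confusing ways. Here is a minimal pre-flight sketch that enforces the 18+ floor listed above. The helper name `meetsNodeFloor` is hypothetical and is not part of Lightpanda or Puppeteer.

```javascript
// Hypothetical pre-flight check: does the running Node.js meet a major-version floor?
// The default floor of 18 mirrors the "Node.js 18+" requirement above.
function meetsNodeFloor(versionString, minMajor = 18) {
  // process.versions.node looks like "18.19.0"; tolerate a leading "v" as well
  const major = Number.parseInt(String(versionString).replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!meetsNodeFloor(process.versions.node)) {
  console.error(`Node.js ${process.versions.node} detected; 18 or newer is required.`);
  process.exitCode = 1;
} else {
  console.log(`Node.js ${process.versions.node} OK`);
}
```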
TIP
If you are on Windows, install Lightpanda directly in WSL2 and run Puppeteer from the Windows host.
Complete Project Structure
lightpanda-browser/
├── lightpanda        # Main program binary
├── LICENSE           # AGPL-3.0 license
├── README.md         # Project description
├── CONTRIBUTING.md   # Contribution guide
├── CLA.md            # Contributor License Agreement
├── docker/
│   └── Dockerfile    # Docker build file
├── src/              # Zig source code (if you want to compile from source)
└── docs/             # Documentation directory
Step-by-Step Tutorial
Step 1: Download and Install Lightpanda
We can download the binaries directly from nightly builds.
Linux installation:
curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux && \
chmod a+x ./lightpanda
macOS installation:
curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-aarch64-macos && \
chmod a+x ./lightpanda
Verify installation:
./lightpanda --version
WARNING
Currently, official binaries are only available for Linux x86_64 and macOS aarch64. Windows users must use WSL2.
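If you script the download step, the asset name follows directly from the two curl commands above. A small sketch mapping Node's `process.platform`/`process.arch` values to those nightly URLs; the function name `nightlyBinaryUrl` is hypothetical, and only the two officially published targets are mapped.

```javascript
// Sketch: map Node's platform/arch identifiers to the nightly asset URLs used
// in the curl commands above. Anything else (including Windows) is rejected.
const NIGHTLY_BASE = 'https://github.com/lightpanda-io/browser/releases/download/nightly';
const ASSETS = {
  'linux-x64': 'lightpanda-x86_64-linux',
  'darwin-arm64': 'lightpanda-aarch64-macos',
};

function nightlyBinaryUrl(platform = process.platform, arch = process.arch) {
  const asset = ASSETS[`${platform}-${arch}`];
  if (!asset) {
    // No official binary for this target; on Windows, run the Linux binary in WSL2.
    throw new Error(`No official Lightpanda binary for ${platform}/${arch}`);
  }
  return `${NIGHTLY_BASE}/${asset}`;
}

try {
  console.log(nightlyBinaryUrl());
} catch (err) {
  console.error(err.message);
}
```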
Step 2: Start with Docker (Recommended)
If you don't want to download the binary directly, you can get started faster with Docker:
docker run -d --name lightpanda -p 9222:9222 lightpanda/browser:nightly
This starts the CDP server directly, listening on port 9222.
Verify container status:
docker ps | grep lightpanda
Step 3: Launch the CDP Server
If we are not using Docker, we need to start the CDP server manually:
./lightpanda serve --obey_robots --log_format pretty --log_level info --host 127.0.0.1 --port 9222
Output will look similar to:
INFO telemetry : telemetry status . . . . . . . . . . . . . [+0ms]
disabled = false
INFO app : server running . . . . . . . . . . . . . . . . [+0ms]
address = 127.0.0.1:9222
TIP
The --obey_robots parameter ensures Lightpanda respects robots.txt, which is basic etiquette for polite crawlers.
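To make "respecting robots.txt" concrete, here is a simplified sketch of the kind of check a polite crawler performs. It is an illustration only, not Lightpanda's implementation (with --obey_robots the browser handles this itself). It assumes a reduced rule set: only the `User-agent: *` group, plain `Allow`/`Disallow` path prefixes, and longest-match precedence with Allow winning ties, in the style of RFC 9309; wildcards and `$` are ignored. The names `parseRobots` and `isAllowed` are hypothetical.

```javascript
// Simplified robots.txt evaluator (illustration only, NOT Lightpanda's implementation).
// Handles only the "User-agent: *" group and plain path prefixes; "*" and "$" are not supported.
function parseRobots(text) {
  const rules = [];
  let groupAgents = [];
  let inRules = false; // true once the current group has seen an Allow/Disallow line
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === 'user-agent') {
      // A User-agent line that follows rules starts a new group.
      if (inRules) {
        groupAgents = [];
        inRules = false;
      }
      groupAgents.push(value.toLowerCase());
    } else if (field === 'allow' || field === 'disallow') {
      inRules = true;
      if (value === '') continue; // an empty Disallow allows everything, so it adds no rule
      if (groupAgents.includes('*')) {
        rules.push({ allow: field === 'allow', path: value });
      }
    }
  }
  return rules;
}

// The longest matching rule decides; Allow wins a tie; no match means allowed.
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}

const rules = parseRobots(`
User-agent: *
Disallow: /private/
Allow: /private/public-page
`);
console.log(isAllowed(rules, '/private/secret'));      // false
console.log(isAllowed(rules, '/private/public-page')); // true
console.log(isAllowed(rules, '/amiibo/'));             // true
```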
Step 4: Write a Puppeteer Script
Now let's write our first crawler script. First, install puppeteer-core in your project directory:
npm install puppeteer-core
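Create a crawler.js file. The script uses ES module imports and top-level await, so add "type": "module" to your package.json (or save the file as crawler.mjs and adjust the run command accordingly):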
'use strict'
import puppeteer from 'puppeteer-core';
// Connect to Lightpanda's CDP server via WebSocket
const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222',
});
// Create a browser context and a page
const context = await browser.createBrowserContext();
const page = await context.newPage();
// Navigate to the target page and wait until the network is idle
await page.goto('https://demo-browser.lightpanda.io/amiibo/', { waitUntil: 'networkidle0' });
// Extract the href attribute of every link on the page
const links = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('a')).map((row) => row.getAttribute('href'));
});
console.log('Scraped links:');
links.forEach((link) => console.log(link));
// Page metrics: Puppeteer reports durations such as ScriptDuration in seconds.
// Lightpanda may not implement every CDP Performance metric, so treat a failure as non-fatal.
try {
  const metrics = await page.metrics();
  console.log('\nPage metrics:');
  console.log('Script Duration:', metrics.ScriptDuration, 's');
  console.log('DOM Nodes:', metrics.Nodes);
} catch (err) {
  console.log('\nPage metrics unavailable:', err.message);
}
// Clean up resources
await page.close();
await context.close();
await browser.disconnect();
Run the script:
node crawler.js
TIP
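If you started Lightpanda with Docker as in Step 2, the container's port 9222 is published on the host, so ws://127.0.0.1:9222 (equivalently ws://localhost:9222) works unchanged. If you mapped a different host port, use that port in browserWSEndpoint.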
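The script prints raw href attribute values, which can be relative paths, bare fragments, or non-HTTP schemes such as mailto:. Before feeding them into a crawl queue you will usually want absolute, de-duplicated URLs. A minimal post-processing sketch; the helper name `normalizeLinks` is hypothetical, and you could call it as `normalizeLinks(links, page.url())`.

```javascript
// Turn raw href values into absolute, de-duplicated http(s) URLs.
// Fragments are dropped, so "/page#a" and "/page#b" count as the same URL.
function normalizeLinks(hrefs, baseUrl) {
  const seen = new Set(); // a Set keeps first-seen order while removing duplicates
  for (const href of hrefs) {
    if (!href) continue; // <a> elements without an href attribute yield null
    let url;
    try {
      url = new URL(href, baseUrl); // resolve relative paths against the page URL
    } catch {
      continue; // skip values that cannot be parsed as URLs
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // mailto:, javascript:, ...
    url.hash = '';
    seen.add(url.href);
  }
  return [...seen];
}

const base = 'https://demo-browser.lightpanda.io/amiibo/';
console.log(normalizeLinks(['/', 'item/1', '#top', 'mailto:hi@example.com', null, 'item/1'], base));
```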
Step 5: Experience Extreme Scraping Speed
Lightpanda's own published benchmarks report the following (actual results depend on the pages and hardware):
- Speed: 11x faster than Chrome
- Memory: 9x less than Chrome
- Startup: near-instant (Headless Chrome takes noticeably longer to start)
Let's time a small batch of page loads. First, make sure Lightpanda is running:
./lightpanda serve --host 127.0.0.1 --port 9222
Then create a batch scraping script, batch-crawler.js (it uses ES modules, like crawler.js):
'use strict'
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
  browserWSEndpoint: 'ws://127.0.0.1:9222',
});
const context = await browser.createBrowserContext();
const page = await context.newPage();
// Scrape several pages sequentially, reusing the same page
const urls = [
  'https://demo-browser.lightpanda.io/amiibo/',
  'https://demo-browser.lightpanda.io/campfire-commerce/',
  'https://demo-browser.lightpanda.io/hacker-news-top-stories/',
];
const startTime = Date.now();
for (const url of urls) {
  console.log(`\nScraping: ${url}`);
  const pageStart = Date.now();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const title = await page.title();
  console.log(`Title: ${title}`);
  console.log(`Time taken: ${Date.now() - pageStart}ms`);
}
console.log(`\nTotal time: ${Date.now() - startTime}ms`);
// Clean up resources
await page.close();
await context.close();
await browser.disconnect();
Run it:
node batch-crawler.js
The script prints the load time of each page and the total. For a like-for-like comparison, run the same script against Headless Chrome on the same machine and compare the timings.
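Single timings are noisy, so if you compare Lightpanda with Headless Chrome, record several runs per browser and summarize them the same way. A minimal sketch, assuming you push each `Date.now() - pageStart` value from the batch loop into an array; the helper name `summarize` is hypothetical and the sample numbers are made up, not measured results.

```javascript
// Summarize per-page load times (milliseconds): count, min, median, mean, max.
function summarize(durationsMs) {
  if (durationsMs.length === 0) throw new Error('no samples');
  const sorted = [...durationsMs].sort((a, b) => a - b); // numeric sort, input left untouched
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  // Median: middle value for odd n, average of the two middle values for even n
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, d) => sum + d, 0) / n;
  return { n, min: sorted[0], median, mean, max: sorted[n - 1] };
}

// Illustrative input only
console.log(summarize([120, 95, 310, 101]));
```

Reporting the median alongside the mean keeps one slow outlier page from dominating the comparison.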
Step 6: Advanced Feature - Page Screenshots
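Puppeteer's screenshot API can also be pointed at Lightpanda. Note, however, that Lightpanda does not perform graphical rendering, so check what your Lightpanda version actually returns before relying on screenshots: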
'use strict'
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
browserWSEndpoint: "ws://127.0.0.1:9222",
});
const context = await browser.createBrowserContext();
const page = await context.newPage();
// Set viewport size
await page.setViewport({ width: 1280, height: 720 });
await page.goto('https://demo-browser.lightpanda.io/campfire-commerce/', {waitUntil: "networkidle0"});
// Save screenshot
await page.screenshot({ path: 'screenshot.png', fullPage: true });
console.log('Screenshot saved to screenshot.png');
await browser.disconnect();
Troubleshooting Common Issues
Q1: Port 9222 is already in use
Symptom: Error "Address already in use" at startup.
Solution:
# Check what is using this port
lsof -i :9222
# Or change the port
./lightpanda serve --port 9223
# Then update the script to use ws://127.0.0.1:9223
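To avoid editing scripts every time the port changes, the endpoint can be read from the environment. A sketch, assuming the hypothetical variable names LIGHTPANDA_HOST and LIGHTPANDA_PORT; these are a convention for your own scripts, not settings that Lightpanda reads.

```javascript
// Build the CDP WebSocket endpoint from environment variables, falling back to the
// defaults used in this tutorial. LIGHTPANDA_HOST / LIGHTPANDA_PORT are hypothetical
// names read only by this script, not options understood by Lightpanda itself.
function cdpEndpoint(env = process.env) {
  const host = env.LIGHTPANDA_HOST || '127.0.0.1';
  const port = Number(env.LIGHTPANDA_PORT || 9222);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid LIGHTPANDA_PORT: ${env.LIGHTPANDA_PORT}`);
  }
  return `ws://${host}:${port}`;
}

// Usage: LIGHTPANDA_PORT=9223 node crawler.js
// with puppeteer.connect({ browserWSEndpoint: cdpEndpoint() }) in the script
console.log(cdpEndpoint());
```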
Q2: Web API Not Supported Error
Symptom: "XXX is not defined" error when running scripts.
Solution: Lightpanda is still in Beta and its Web API coverage is incomplete. If a script fails because an API is missing, open an issue on the GitHub repository with a minimal reproduction.
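Before filing an issue, it helps to list exactly which globals are missing. A sketch of such a check; the helper name `findMissingGlobals` is hypothetical and the API names are only examples. Puppeteer serializes functions passed to `page.evaluate`, so inside the page the same logic is written inline, as shown in the comment.

```javascript
// Return the names from `names` that are not defined on `scope`.
// Inside a Lightpanda page, the equivalent inline call would be:
//   const missing = await page.evaluate(
//     (names) => names.filter((name) => typeof globalThis[name] === 'undefined'),
//     ['fetch', 'IntersectionObserver', 'structuredClone'],
//   );
function findMissingGlobals(scope, names) {
  return names.filter((name) => typeof scope[name] === 'undefined');
}

// Run here against Node's own globals just to show the output shape.
console.log(findMissingGlobals(globalThis, ['fetch', 'IntersectionObserver', 'structuredClone']));
```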
Q3: Docker Container Fails to Start
Symptom: docker run error or container exits immediately.
Solution:
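# Check container logs
docker logs lightpanda
# If the port is taken, remove the old container and publish a different host port
docker rm -f lightpanda
docker run -d --name lightpanda -p 9322:9222 lightpanda/browser:nightly
# Then update the script to use ws://127.0.0.1:9322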
Q4: Puppeteer Fails to Connect
Symptom: Error: Protocol error (Target.attachToTarget): No target with given id.
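Solution: Make sure the Lightpanda CDP server is running and that your Puppeteer version is compatible with your Lightpanda build. Try restarting the server:
# Kill the old process
pkill lightpanda
# Restart
./lightpanda serve --port 9222
# If Lightpanda runs in Docker, restart the container instead
docker restart lightpanda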
Q5: Page Load Timeout
Symptom: TimeoutError: Navigation timeout.
Solution:
// Increase the navigation timeout (in milliseconds)
await page.goto(url, { timeout: 60000 });
// Or wait for domcontentloaded instead of networkidle0
await page.goto(url, { waitUntil: 'domcontentloaded' });
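If timeouts are intermittent rather than systematic, a bounded retry around navigation can help. A sketch of a generic retry helper with a linearly growing delay; the name `withRetry` and its options are hypothetical, and you would wrap navigation with it, for example `await withRetry(() => page.goto(url, { timeout: 60000 }))`.

```javascript
// Retry an async operation up to `attempts` times, waiting delayMs * attempt
// between tries. Rethrows the last error once all attempts are used up.
async function withRetry(fn, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}

// Demo with a stand-in operation that fails twice, then succeeds.
let calls = 0;
withRetry(async () => {
  calls += 1;
  if (calls < 3) throw new Error(`simulated timeout #${calls}`);
  return 'loaded';
}, { delayMs: 10 }).then((result) => console.log(result, 'after', calls, 'calls'));
```

Keep the attempt count small: a page that times out every time usually needs a different waitUntil strategy rather than more retries.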
Q6: Using Playwright instead of Puppeteer
Symptom: Unsure how to integrate.
Solution: Playwright connection is similar to Puppeteer:
import { chromium } from 'playwright';
const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
// Subsequent usage is the same
Extended Reading / Advanced Directions
1. Compile from Source
If you want to dive deep into Lightpanda's internal implementation or contribute code, you can compile from source:
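# Install Zig 0.15.2 (check the README for the exact Zig version the project currently requires)
curl -L https://ziglang.org/download/0.15.2/zig-x86_64-linux-0.15.2.tar.xz | tar xJ
export PATH="$PWD/zig-x86_64-linux-0.15.2:$PATH"
# Clone the project
git clone https://github.com/lightpanda-io/browser.git
cd browser
# Build and run (the README lists additional system dependencies)
zig build run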
2. Playwright Integration
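Lightpanda officially supports Playwright. Playwright's connectOverCDP is only available for Chromium, so connect through the chromium browser type:
npm install playwright
import { chromium } from 'playwright';
const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');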
// Subsequent usage is the same as standard Playwright
WARNING
Playwright support has limitations. Playwright chooses its execution path based on the Web APIs a browser exposes, so as Lightpanda adds new APIs, Playwright may switch to a different path and a script that works today may fail with a later version.
3. Proxy and Network Interception
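Lightpanda supports routing traffic through a proxy as well as network request interception. To use a proxy, specify it at startup:
# Specify the proxy at startup (confirm the exact flag name in the CLI help of your build)
./lightpanda serve --http_proxy http://proxy:8080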
4. Custom HTTP Headers
await page.setExtraHTTPHeaders({
'X-Custom-Header': 'value'
});
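Puppeteer's setExtraHTTPHeaders expects every header value to be a string, so numeric values taken from configuration cause an error. A small normalization sketch; the helper name `toHeaderRecord` is hypothetical, and header names are checked against the RFC 9110 token characters.

```javascript
// Normalize a plain object into a header record for page.setExtraHTTPHeaders:
// names must be valid HTTP tokens, values are converted to strings,
// and null/undefined values are dropped.
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/; // RFC 9110 "token" characters

function toHeaderRecord(input) {
  const headers = {};
  for (const [name, value] of Object.entries(input)) {
    if (!TOKEN.test(name)) throw new Error(`Invalid header name: ${JSON.stringify(name)}`);
    if (value === null || value === undefined) continue;
    headers[name] = String(value);
  }
  return headers;
}

// Usage: await page.setExtraHTTPHeaders(toHeaderRecord({ 'X-Custom-Header': 'value', 'X-Run-Id': 42 }));
console.log(toHeaderRecord({ 'X-Custom-Header': 'value', 'X-Run-Id': 42, 'X-Unused': null }));
```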
5. Web Platform Tests
The Lightpanda team continuously runs the Web Platform Tests to track Web API compatibility. You can load individual tests from wpt.live to check support for a specific API.