SAFI.DEV
Tailwindcss Typescript VS Code Extension React SQL.js zustand Playwright

SafiCrawl: In-Editor Website Crawler & SEO Auditor for VS Code

A self-contained VS Code extension that crawls any website, audits SEO across 11 categories, and reports Core Web Vitals without leaving the editor. No Python, no Docker, no hosted service.


Project Overview

SafiCrawl is a Visual Studio Code extension that turns the editor into a full-featured SEO auditing workstation. It crawls a site from a seed URL, follows internal links, respects robots.txt, discovers sitemaps, and streams results live into a nine-tab React dashboard. The crawler runs in the extension host and the dashboard renders in a webview, so the whole workflow stays inside VS Code.

Core capabilities

  • Crawler engine: BFS with a Promise-based worker pool, token-bucket rate limiting, exponential-backoff retries, and gzipped sitemap-index recursion.
  • SEO analysis: 11 categories covering title tags, meta descriptions, headings, content, canonicals, Open Graph, Twitter Cards, JSON-LD/microdata, hreflang, analytics, and indexability.
  • JavaScript rendering: optional Playwright integration that renders React, Vue, Angular, or any other SPA so the crawler sees fully hydrated HTML.
  • Core Web Vitals: post-crawl PageSpeed Insights lookups reporting LCP, CLS, FCP, INP, and TTFB medians per URL.
  • Persistence: every crawl is saved to SQLite (via sql.js WASM) and can be loaded, resumed, archived, or deleted from the activity bar.
  • Visualization: interactive force-directed graph of the site structure with status-code coloring.
  • Exports: CSV, JSON, and XML for external reporting.
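The crawler engine is described as breadth-first traversal driven by a Promise-based worker pool. The sketch below illustrates that general idea only; it is not SafiCrawl's code. The `fetchLinks` callback, the `crawlBfs` name, and the `maxPages` budget are hypothetical, and the real engine's fetching (undici), parsing (cheerio), rate limiting, and retries are left out.

```typescript
// Hypothetical callback: fetch one page and return the hrefs found on it.
type FetchLinks = (url: string) => Promise<string[]>;

interface CrawlResult {
  visited: string[]; // URLs in the order workers dequeued them
  failed: string[]; // URLs whose fetch threw
}

// Breadth-first crawl: `concurrency` async workers share one FIFO queue.
async function crawlBfs(
  seed: string,
  fetchLinks: FetchLinks,
  concurrency = 4,
  maxPages = 5000,
): Promise<CrawlResult> {
  const start = new URL(seed);
  start.hash = "";
  const origin = start.origin;
  const seen = new Set<string>([start.href]);
  const queue: string[] = [start.href];
  const visited: string[] = [];
  const failed: string[] = [];
  let active = 0; // workers currently waiting on fetchLinks

  // Resolve a link against its page, keep same-origin URLs only, dedupe, respect the budget.
  function enqueue(href: string, base: string): void {
    let u: URL;
    try {
      u = new URL(href, base);
    } catch {
      return; // skip malformed hrefs
    }
    u.hash = ""; // "#section" links point at the same page
    if (u.origin !== origin || seen.has(u.href) || seen.size >= maxPages) return;
    seen.add(u.href);
    queue.push(u.href);
  }

  async function worker(): Promise<void> {
    for (;;) {
      const url = queue.shift();
      if (url === undefined) {
        if (active === 0) return; // nothing queued and no worker can add more
        await new Promise((resolve) => setTimeout(resolve, 5));
        continue;
      }
      visited.push(url);
      active++;
      try {
        for (const href of await fetchLinks(url)) enqueue(href, url);
      } catch {
        failed.push(url);
      } finally {
        active--;
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return { visited, failed };
}
```

Because JavaScript runs one callback at a time, the pool can use a plain array as its queue. The `active` counter tells idle workers whether another worker might still add URLs, so they know when to wait and when to exit.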

The Challenge

Existing SEO crawlers (Screaming Frog, Sitebulb, Ahrefs Site Audit) are powerful but live outside the developer's workflow. They require a separate desktop app, a paid subscription, or a hosted service — and none of them let you audit a staging site against a pull request or validate a docs change before it ships.

The problem was compounded by specific technical constraints:

  1. No native modules allowed. VS Code ships monthly and regularly upgrades its bundled Electron runtime. A native SQLite binding (better-sqlite3, node-sqlite3) is compiled against one Electron ABI and one OS/architecture, so it would need electron-rebuild after every Electron upgrade and a separate binary per platform. That makes a single .vsix impossible to distribute.
  2. Playwright with its browser binaries is ~300 MB. Bundling a headless browser would exceed the marketplace's 100 MB size cap and make installs painful on slow connections.
  3. Web VS Code / Codespaces support. In browser-based VS Code, web extensions run without a Node.js process, direct filesystem access, or child processes, so the extension had to degrade gracefully.
  4. Live streaming and persistence without blocking the UI. A 5,000-URL crawl produces tens of thousands of edges and issues. Posting one message per event and writing each row synchronously would freeze the extension host.
  5. Secret storage. A Google PageSpeed API key can't live in settings.json (it syncs, it leaks, it ends up in dotfiles repos).

The Solution

SafiCrawl was built as a pure TypeScript VS Code extension — no external server, no language bridge, no Docker — with every architectural decision driven by portability and performance.

1. SQLite without native modules

Persistence uses sql.js — SQLite compiled to WebAssembly. A single .vsix works on every VS Code / Electron version with zero rebuilds. The database is a standard SQLite file stored in VS Code's globalStorage, openable with any SQLite GUI for ad-hoc analysis.

2. Bring-your-own Playwright

Instead of bundling a 300 MB headless browser, SafiCrawl auto-detects Playwright from three locations: an explicit path in settings, the workspace's node_modules, or the global npm root. Users install once with npm i -g playwright and the extension stays under the marketplace size cap. JS rendering is off by default and force-disabled in web contexts with a one-time notification.
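The lookup order described above (setting, then workspace, then global npm root), together with the web-context cutoff, can be sketched as a pure function. This is an illustration under stated assumptions, not the extension's implementation. The `PlaywrightLookup` shape, the `resolvePlaywright` name, and the injected `exists` check are hypothetical. In a real extension, the setting would come from `vscode.workspace.getConfiguration`, the web check from `vscode.env.uiKind === vscode.UIKind.Web`, the global root from running `npm root -g`, and `exists` could be `fs.existsSync`.

```typescript
import * as path from "node:path";

type Source = "setting" | "workspace" | "global";

// Hypothetical inputs gathered by the extension before enabling JS rendering.
interface PlaywrightLookup {
  settingPath?: string; // explicit path from the extension's settings, if any
  workspaceRoots: string[]; // folders open in the current workspace
  globalNpmRoot?: string; // output of `npm root -g`, if npm is available
  isWeb: boolean; // true when running in browser-hosted VS Code
}

type Resolution =
  | { kind: "found"; modulePath: string; source: Source }
  | { kind: "disabled"; reason: string };

// Returns the first Playwright install found, checking setting > workspace > global.
function resolvePlaywright(input: PlaywrightLookup, exists: (p: string) => boolean): Resolution {
  // No Node process or child processes on the web, so never try to launch a browser there.
  if (input.isWeb) {
    return { kind: "disabled", reason: "JavaScript rendering is unavailable in web contexts" };
  }
  const candidates: Array<[string, Source]> = [];
  if (input.settingPath) candidates.push([input.settingPath, "setting"]);
  for (const root of input.workspaceRoots) {
    candidates.push([path.join(root, "node_modules", "playwright"), "workspace"]);
  }
  if (input.globalNpmRoot) {
    candidates.push([path.join(input.globalNpmRoot, "playwright"), "global"]);
  }
  for (const [modulePath, source] of candidates) {
    if (exists(modulePath)) return { kind: "found", modulePath, source };
  }
  return { kind: "disabled", reason: "Playwright not found; install it with `npm i -g playwright`" };
}
```

Returning a tagged result instead of throwing lets the caller show the one-time notification from the `reason` field.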

3. Batched IPC for live streaming

The controller bridges the crawler engine and the React webview by buffering events and posting them in batches of up to 50 rows, or every 100 ms, whichever comes first. Every batch is also written through to sql.js, so if the window closes mid-crawl, the crawl is marked interrupted on the next activation and the user is offered Resume from the exact checkpoint.
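The batching rule (flush at 50 events, or 100 ms after the first buffered event, whichever comes first) fits in a small class. This is a minimal sketch, not SafiCrawl's controller. The `EventBatcher` name and the `sink` callback are hypothetical; in the extension, the sink would presumably call `webview.postMessage` and write the same rows to sql.js.

```typescript
// Collects events and hands them to `sink` in batches: when `maxSize` items
// have accumulated, or `maxWaitMs` after the first item of a batch arrived.
class EventBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly sink: (batch: T[]) => void;
  private readonly maxSize: number;
  private readonly maxWaitMs: number;

  constructor(sink: (batch: T[]) => void, maxSize = 50, maxWaitMs = 100) {
    this.sink = sink;
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
  }

  push(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxSize) {
      this.flush(); // size limit reached: send immediately
    } else if (this.timer === undefined) {
      // First event of a new batch: start the latency timer.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  // Sends whatever is buffered; also call this when a crawl ends or pauses.
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch);
  }
}
```

The size limit bounds the payload of each message. The timer bounds latency, so a slow crawl still updates the dashboard promptly.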

4. Nine-tab React dashboard

A React 19 + Tailwind v4 webview with TanStack Virtual tables (handles 10k+ rows without frame drops), a vis-network force-directed site graph, and a zustand store driven by typed postMessage events. The settings form is live-synced two-way with VS Code's settings.json.
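Typed postMessage events are usually modeled as a discriminated union, with one pure function that applies each message to state. The sketch below assumes hypothetical message names (`crawlStarted`, `rowsAdded`, `crawlFinished`) and a hypothetical state shape. It is not the dashboard's actual protocol.

```typescript
// Hypothetical row shape streamed from the extension host.
interface CrawlRow {
  url: string;
  status: number;
}

// Every message the extension host may post to the webview.
type HostMessage =
  | { type: "crawlStarted"; crawlId: string; seed: string }
  | { type: "rowsAdded"; crawlId: string; rows: CrawlRow[] }
  | { type: "crawlFinished"; crawlId: string; interrupted: boolean };

interface DashboardState {
  crawlId: string | null;
  status: "idle" | "running" | "done" | "interrupted";
  rows: CrawlRow[];
}

const initialState: DashboardState = { crawlId: null, status: "idle", rows: [] };

// Pure transition function; the `never` check makes the compiler flag unhandled message types.
function applyMessage(state: DashboardState, msg: HostMessage): DashboardState {
  switch (msg.type) {
    case "crawlStarted":
      return { crawlId: msg.crawlId, status: "running", rows: [] };
    case "rowsAdded":
      if (msg.crawlId !== state.crawlId) return state; // ignore late batches from an old crawl
      return { ...state, rows: state.rows.concat(msg.rows) };
    case "crawlFinished":
      if (msg.crawlId !== state.crawlId) return state;
      return { ...state, status: msg.interrupted ? "interrupted" : "done" };
    default: {
      const unhandled: never = msg;
      return unhandled;
    }
  }
}
```

In a zustand store, a `message` event listener could call `set((s) => applyMessage(s, event.data as HostMessage))`. Keeping the transition pure makes it testable outside the webview.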

5. Secure key storage

The PageSpeed API key lives in the OS keychain (macOS Keychain / Windows Credential Vault / libsecret) via VS Code's SecretStorage API — never in settings.json, never sent to the webview, never in the SQLite DB.
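A sketch of a read-or-prompt flow for the key, written against a minimal interface that VS Code's `SecretStorage` (exposed as `context.secrets`) satisfies. The key name `safiCrawl.pageSpeedApiKey`, the `getPageSpeedKey` function, and the `promptForKey` callback are hypothetical. In the extension, the prompt could be `vscode.window.showInputBox({ password: true })`.

```typescript
// The subset of vscode.SecretStorage used here (VS Code returns Thenables).
interface SecretStore {
  get(key: string): PromiseLike<string | undefined>;
  store(key: string, value: string): PromiseLike<void>;
}

// Hypothetical storage key for the PageSpeed Insights API key.
const PSI_KEY = "safiCrawl.pageSpeedApiKey";

// Returns the stored key, or asks the user once and persists the answer.
// Callers should keep the result in the extension host and never post it to the webview.
async function getPageSpeedKey(
  secrets: SecretStore,
  promptForKey: () => PromiseLike<string | undefined>,
): Promise<string | undefined> {
  const existing = await secrets.get(PSI_KEY);
  if (existing) return existing;
  const entered = (await promptForKey())?.trim();
  if (!entered) return undefined; // user cancelled or entered nothing
  await secrets.store(PSI_KEY, entered);
  return entered;
}
```

Depending on a narrow interface instead of the full `vscode` module also lets the flow be tested with an in-memory fake.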

6. Robust crawler engine

  • undici for HTTP/2 with pooled connections and configurable timeouts.
  • cheerio for fast HTML parsing and SEO extraction.
  • robots-parser + fast-xml-parser for sitemap and robots.txt compliance (including gzipped sitemap-index recursion).
  • Token-bucket rate limiting with per-domain concurrency control.
  • Exponential-backoff retries and 429 handling for the PSI client.
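Two items from the list above can be sketched briefly: a per-domain token bucket, which covers the rate-limiting half but not the concurrency cap, and exponential-backoff retries that honor `Retry-After` on HTTP 429. The class and function names, default rates, the `SimpleResponse` shape, and the injected `now`/`sleep` helpers are assumptions for illustration, not SafiCrawl's code. The `send` callback stands in for an undici request or a PSI call.

```typescript
// Per-domain token bucket: each host holds up to `capacity` tokens refilled at `ratePerSec`.
class DomainRateLimiter {
  private buckets = new Map<string, { tokens: number; last: number }>();
  private readonly capacity: number;
  private readonly ratePerSec: number;
  private readonly now: () => number;

  constructor(capacity = 5, ratePerSec = 2, now: () => number = Date.now) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.now = now;
  }

  // Returns 0 and consumes a token if a request may go now, else the ms to wait.
  tryAcquire(host: string): number {
    const t = this.now();
    const b = this.buckets.get(host) ?? { tokens: this.capacity, last: t };
    // Refill in proportion to elapsed time, capped at capacity.
    b.tokens = Math.min(this.capacity, b.tokens + ((t - b.last) / 1000) * this.ratePerSec);
    b.last = t;
    this.buckets.set(host, b);
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - b.tokens) / this.ratePerSec) * 1000);
  }
}

interface SimpleResponse {
  status: number;
  retryAfterSec?: number; // parsed Retry-After header, if present
}

// Retries 429 and 5xx with delays of base * 2^attempt, preferring Retry-After when given.
async function withRetry(
  send: () => Promise<SimpleResponse>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<SimpleResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxRetries) return res;
    const delay = res.retryAfterSec !== undefined ? res.retryAfterSec * 1000 : baseDelayMs * 2 ** attempt;
    await sleep(delay);
  }
}
```

Injecting the clock and `sleep` keeps both pieces deterministic under test. A production client would also add random jitter to the backoff delay.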

Results

A single .vsix runs on every VS Code version and every OS, crawls 5,000 URLs with live streaming and full persistence, and adds SEO + Core Web Vitals auditing to the editor workflow — no servers, no subscriptions, no context switching.

Gallery

[Three screenshots of SafiCrawl]

Project Info

ROLE Lead Developer
TIMELINE 1 Month
YEAR 2026
STATUS Completed

Tech Stack

Tailwindcss
Typescript
VS Code
Extension
React
SQL.js
zustand
Playwright
GET IN TOUCH

Let's Build Something Together

Whether you have a project idea, want to collaborate on a game, or just want to say hello, get in touch.

Email: safi.abdulkader@gmail.com