The Archaeologist
A field journal for codebases and engineering blogs.
What this is
The Archaeologist is a curation tool. It digs into open-source repositories and engineering blogs — the kind of technical writing that shapes how the industry thinks — and surfaces what’s actually inside. Not a README scrape. Not a star count. A genuine read.
Every entry is analyzed at three depths: Tourist (plain English — what is this and why does it exist?), Engineer (a technical architectural tour), and Architect (a critical audit — complexity, trade-offs, when to use it and when to walk away). You pick your posture on arrival and the whole page reshapes itself.
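One way to picture the three depths is as a small data model: each entry holds one narrative per depth, and the page renders whichever the reader picked. This is only an illustrative sketch. The names `Depth`, `Entry`, and `render`, and the sample entry, are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass
from enum import Enum


class Depth(Enum):
    TOURIST = "tourist"      # plain English: what is this and why does it exist?
    ENGINEER = "engineer"    # technical architectural tour
    ARCHITECT = "architect"  # critical audit: complexity and trade-offs


@dataclass
class Entry:
    name: str
    narratives: dict[Depth, str]  # one narrative per depth

    def render(self, depth: Depth) -> str:
        """Return the narrative matching the reader's chosen depth."""
        return self.narratives[depth]


# Hypothetical entry, for illustration only.
entry = Entry(
    name="example-repo",
    narratives={
        Depth.TOURIST: "What this is, in plain terms.",
        Depth.ENGINEER: "Modules, data flow, and key interfaces.",
        Depth.ARCHITECT: "Trade-offs, and when to walk away.",
    },
)
print(entry.render(Depth.TOURIST))
```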
Why it exists
The explosion of OSS tooling and engineering content has made the discovery problem worse, not better. Stars and HN points tell you what’s popular today, not what’s architecturally interesting or actually worth reading. Most engineers have a mental stack of repos and blogs they’ve been meaning to understand properly for months.
The Archaeologist is an attempt at a different kind of signal: editorial, depth-aware, opinionated. Each entry gets a verdict — USE, CONSIDER, or SKIP — not because any tool is objectively bad, but because context matters and someone has to say it plainly.
How the pipeline works
For repositories: the pipeline clones the codebase, runs tree-sitter to build an import graph across every file, clusters files into logical modules, then sends the structural skeleton (not the raw code) to an LLM for narrative synthesis. The result is three parallel narratives, a dependency diagram, a glossary of domain terms, and a curated reading path through the codebase.
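The repository steps above can be sketched roughly as follows. To stay self-contained, this sketch swaps tree-sitter for Python's standard `ast` module, so it only understands Python sources. It also assumes that clustering means taking connected components of the import graph, which the description above does not specify. The function names and the toy `repo` are hypothetical.

```python
import ast
from collections import defaultdict


def imports_of(source: str) -> set[str]:
    """Return the top-level module names imported by one Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def build_import_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the in-repo modules it imports (external imports dropped)."""
    return {name: imports_of(src) & (files.keys() - {name}) for name, src in files.items()}


def cluster_modules(graph: dict[str, set[str]]) -> list[list[str]]:
    """Assumed clustering rule: connected components of the undirected import graph."""
    neighbours = defaultdict(set)
    for src, targets in graph.items():
        neighbours[src]  # touch the key so isolated files still form a cluster
        for dst in targets:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Toy repository: module name -> source text (hypothetical).
repo = {
    "api": "import storage\nimport json\n",
    "storage": "import os\n",
    "cli": "from api import handler\n",
    "docs_tool": "import sys\n",
}
graph = build_import_graph(repo)
clusters = cluster_modules(graph)

# The structural skeleton: edges and clusters only, no raw code.
skeleton = {"imports": {k: sorted(v) for k, v in graph.items()}, "clusters": clusters}
print(skeleton)
```

Only `skeleton` would be handed to the model in this sketch, which mirrors the point above: the LLM sees structure, not source.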
For engineering blogs: the pipeline discovers the RSS feed, fetches and extracts full post text with trafilatura, sends posts in batches for per-post analysis (summary, topics, read time), clusters them into recurring theme groups, then generates an editorial verdict on the blog’s overall voice and stance. The activity timeline shows posting cadence over time.
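A minimal sketch of the batching and theme-grouping steps, under stated assumptions. The posts below carry hard-coded topics as a stand-in for the per-post analysis. The batch size, the `min_posts` threshold, and the rule that a theme is any topic label shared by several posts are all illustrative guesses, not the pipeline's actual method. Feed discovery and text extraction are left out.

```python
from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` posts."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def group_by_topic(posts: list[dict], min_posts: int = 2) -> dict[str, list[str]]:
    """Keep only topics that recur across at least `min_posts` posts."""
    themes = defaultdict(list)
    for post in posts:
        for topic in post["topics"]:
            themes[topic].append(post["title"])
    return {t: titles for t, titles in themes.items() if len(titles) >= min_posts}


# Hypothetical analyzed posts; in the pipeline the topics come from per-post analysis.
posts = [
    {"title": "Sharding at scale", "topics": ["databases", "scaling"]},
    {"title": "Our queue rewrite", "topics": ["scaling", "messaging"]},
    {"title": "Postgres tuning", "topics": ["databases"]},
    {"title": "Hiring notes", "topics": ["culture"]},
    {"title": "Cache stampedes", "topics": ["scaling"]},
]

batches = list(batched(posts, 2))  # each batch would be one analysis request
themes = group_by_topic(posts)     # topics seen only once are dropped
print([len(b) for b in batches], sorted(themes))
```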
Blogs are refreshed automatically every Monday via GitHub Actions. The cron job checks the latest RSS date against the stored analyzedAt timestamp and re-runs the pipeline only when new posts exist, which keeps API costs in check.
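The freshness check might look something like the sketch below. It assumes RSS dates arrive in the usual RFC 822 `pubDate` form and that the stored timestamp is an ISO 8601 string; neither format is confirmed here. The function name `needs_refresh` and the sample dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _as_utc_aware(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC so comparisons never mix naive and aware."""
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def needs_refresh(rss_pub_dates: list[str], analyzed_at: str) -> bool:
    """True when the newest RSS pubDate is later than the stored timestamp."""
    if not rss_pub_dates:
        return False  # empty feed: nothing new to analyze
    latest = max(_as_utc_aware(parsedate_to_datetime(d)) for d in rss_pub_dates)
    stored = _as_utc_aware(datetime.fromisoformat(analyzed_at))
    return latest > stored


# Hypothetical feed dates and stored timestamps.
feed = [
    "Wed, 28 May 2025 10:00:00 +0000",
    "Mon, 02 Jun 2025 09:00:00 +0000",
]
stale = needs_refresh(feed, "2025-06-01T00:00:00+00:00")  # a post landed after this
fresh = needs_refresh(feed, "2025-06-02T12:00:00+00:00")  # already covers the feed
print(stale, fresh)
```

Skipping the run when `needs_refresh` is false is what avoids paying for analysis on a blog that has not posted since the last pass.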
The catalog
Current coverage: 22 open-source repositories and 14 engineering blogs, with more added in batches. The selection skews toward infrastructure, distributed systems, ML tooling, and developer platform work — the engineering writing that tends to have a long half-life.
The repos section is a snapshot; the blogs section is a living catalog updated weekly. Together they form something like a curated reading list for engineers who want to understand the systems they work with and the thinking behind them.
The aesthetic
Field journal, not dashboard. The typography (Source Serif 4, IBM Plex Mono) and layout borrow from technical documents and academic dispatches — the kind of thing you’d find in a well-worn notebook. The oxblood accent is deliberate: it suggests annotation, not alert.
The ⁂ is an asterism — a typographic marker that something interesting just ended and something else is about to begin.