What Andrej Karpathy Published
Andrej Karpathy, former Tesla AI Director, founding member of OpenAI, and one of the most influential minds in the field, published an unusual document last week: a Markdown file on GitHub describing how he personally uses AI for knowledge management.
No new model. No tool. No code. Just an idea — and within days it racked up millions of views. Dozens of developers built their own implementations, each tailored to their own stack.
Karpathy calls the document an “Idea File” — a deliberately abstract description that you hand to your AI agent so it can build its own version. The idea behind it: the LLM Wiki.
The Problem with RAG
Most people's experience of AI with documents looks like this: you upload files, the AI searches through them anew for each question, and it generates an answer. That is RAG (Retrieval-Augmented Generation), and it is how NotebookLM, ChatGPT with file upload, and most knowledge-base solutions work.
The problem: every request starts from zero. There is no accumulation. Ask a complex question that connects five documents, and the AI has to find and reassemble the same fragments every single time. Nothing gets built. Nothing connects over time.
Karpathy’s Solution: The LLM Wiki
Karpathy’s approach is fundamentally different. Instead of starting from scratch with every query, the AI builds a persistent, growing knowledge base — a collection of linked Markdown files that gets better with every new source.
The architecture has three layers:
1. The Sources — the Raw Material
Immutable original documents: articles, papers, notes, files. They are the source of truth and do not get modified. Karpathy uses the Obsidian Web Clipper to save web pages directly as Markdown.
2. The Wiki — the Knowledge Base
LLM-generated Markdown files with summaries, entity pages, concept pages, and cross-references. The LLM builds this structure incrementally — not anew with every query, but as a persistent, growing artifact. Karpathy describes it like this: Obsidian is the development environment, the LLM is the programmer, the Wiki is the codebase.
3. The Schema — the Rules of the Game
A configuration file (for example a CLAUDE.md) that tells the AI agent how the Wiki is structured, which conventions apply, and how new content gets classified. Human and AI evolve these rules together.
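To make the schema layer concrete, here is a hypothetical sketch of what such a CLAUDE.md might contain. The directory names and conventions are illustrative assumptions, not Karpathy's actual file:

```markdown
# Wiki conventions (read before touching any file)

- `sources/` is read-only: never edit the original documents.
- Every wiki page lives in `wiki/`, one topic per file, lowercase-hyphenated names.
- Link pages with `[[wikilinks]]`; every new page must be reachable from `wiki/index.md`.
- Each page starts with a one-paragraph summary, then details, then a "Sources"
  section citing the files in `sources/` it draws on.
- When a new source contradicts an existing page, flag the conflict on the page
  instead of silently overwriting.
```

Because the agent rereads this file on every run, human and AI can refine the conventions over time without rebuilding anything.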
Three Operations That Cover Everything
Karpathy defines three core operations:
Ingest: A new source gets added. The LLM reads it, discusses the key findings, writes a summary, updates the index, and amends relevant pages in the Wiki. A single source can touch 10 to 15 existing pages.
Query: You ask a question. The LLM searches the Wiki, reads relevant pages, and synthesizes an answer with source attributions. The critical point: good answers get saved as new Wiki pages. An analysis you requested does not disappear into the chat history — it becomes part of the knowledge base. That way knowledge grows through your questions as well.
Lint: A periodic health check. The LLM looks for contradictions between pages, outdated statements, orphan pages without incoming links, and missing cross-references. It proposes new questions and identifies knowledge gaps.
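Of the three operations, lint is mechanical enough to sketch in code. The following is a minimal illustration, not Karpathy's implementation; the flat file layout and the `[[wikilink]]` convention are assumptions. It flags orphan pages, i.e. pages no other page links to:

```python
import re
import tempfile
from pathlib import Path

# Captures the target of [[Page]] or [[Page|alias]] links.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: str) -> set[str]:
    """Return wiki pages that no other page links to (the index is exempt)."""
    pages = {p.stem: p for p in Path(wiki_dir).glob("*.md")}
    linked: set[str] = set()
    for page in pages.values():
        for target in WIKILINK.findall(page.read_text(encoding="utf-8")):
            linked.add(target.strip())
    return {name for name in pages if name not in linked and name != "index"}

# Tiny demo wiki: one well-linked page, one orphan.
with tempfile.TemporaryDirectory() as wiki:
    Path(wiki, "index.md").write_text("Start at [[transformers]].", encoding="utf-8")
    Path(wiki, "transformers.md").write_text("See also [[index]].", encoding="utf-8")
    Path(wiki, "stray-note.md").write_text("No page links here.", encoding="utf-8")
    print(find_orphans(wiki))  # {'stray-note'}
```

Contradiction and staleness checks need the LLM itself, but orphan detection and broken-link checks are exactly the kind of bookkeeping a lint pass can automate deterministically.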
Why It Works
Anyone who has ever maintained a wiki, a knowledge base, or even just a well-organized folder knows the problem: at the start everything is tidy. After three months the first entries go stale. After six months nobody dares change anything anymore because it is no longer clear what is still correct and what is not.
Karpathy nails it:
“The tedious part of maintaining a knowledge base is not the reading or the thinking — it’s the bookkeeping.”
The hard part of a knowledge base is not reading new articles or thinking about connections. The hard part is maintenance: updating cross-references, keeping summaries current, spotting contradictions between pages. People give up on wikis because the maintenance load grows faster than the benefit. AI agents do not have this problem — they do not get tired, do not forget a cross-link, and can work through 15 files in a single pass.
The result: using this method Karpathy has built around 100 articles with over 400,000 words on a single research topic — without typing a line himself. He describes the shift like this:
“A large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge.”
His AI usage is shifting: away from programming, toward curating knowledge. The human curates the sources, asks the right questions, and steers the analysis. The AI does the rest.
What the Idea File Itself Shows
Regardless of the content, the form is remarkable. Karpathy did not build a tool and did not share any code. He wrote a Markdown file — an abstract description of a pattern. At the end it explicitly says:
“The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document’s only job is to communicate the pattern.”
The document describes the what and the why, not the how in detail. Karpathy deliberately leaves the concrete implementation (directory structure, page formats, tooling) to the agent and the user. And that is exactly what worked: within hours, dozens of developers built their own versions, each tailored to their stack.
This shows something fundamental about how software gets built in the AI era: LLM agents like Claude Code, Codex, or Cursor can today read an abstract description, understand the underlying pattern, and generate a working implementation. Code becomes a disposable instance — what remains is the mental model behind it.
What This Means for Businesses
Karpathy’s concept is personal, but the implications reach much further:
Finally Making Company Knowledge Usable
In most companies, knowledge lives in people’s heads, email inboxes, Slack threads, and chaotic folders. Confluence wikis go stale after months because nobody takes ownership of maintenance. An LLM-Wiki approach solves exactly this problem: the AI maintains what no one on the team wants to maintain permanently. Meeting notes, customer calls, and project documents turn into a living knowledge base — one that maintains itself.
Knowledge Accumulates Instead of Decaying
The crucial difference compared with classic RAG systems: RAG starts from zero with every request. An LLM Wiki accumulates. Every new source makes the knowledge base richer. Every question can extend it. Cross-references are already there. Contradictions are already flagged. After six months the Wiki has a deeper understanding of the topic than any single team member.
Consulting and Client Work
For consultants, agencies, and freelancers this opens up a new possibility: building a dedicated knowledge base for each client. Every analysis, every report, every recommendation flows in. The agent recognizes patterns, identifies gaps, and proposes next steps — based on the entire project history, not just the latest message.
Content Strategy
Keyword research, competitor analyses, ranking data — feed all of it into an LLM Wiki as sources. The agent automatically spots content gaps, outdated pieces, and new opportunities. Instead of running a quarterly SEO audit, you have a knowledge base that thinks along continuously.
The Connection to Vannevar Bush
In his Idea File, Karpathy references Vannevar Bush’s “Memex” from 1945 — the vision of a personal, associative knowledge store. Bush described a machine that stores documents and connects them via cross-references so that knowledge becomes accessible as a network rather than linearly.
Bush had the right idea, but an unsolvable problem: who maintains all this? Eighty years later, AI provides the answer. The LLM agent does exactly what Bush described — only faster, cheaper, and without physical limits.
What You Can Do Now
You do not need to be a developer to benefit from this approach:
- Collect your sources in one place. Articles, notes, reports — everything as files, ideally Markdown or plain text.
- Give an AI agent the task of building a structured knowledge base from them. Summaries, cross-references, an index — the agent handles the grunt work.
- Ask questions and save the good answers. Every analysis you request becomes part of the knowledge base. That way the system grows along with your work.
- Run regular clean-ups. The agent finds outdated entries, missing links, and contradictions — maintenance you would never keep up with yourself.
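If you do want to automate a slice of this yourself, the bookkeeping really is small. A minimal sketch, assuming a `sources/` folder of Markdown files and a `[[wikilink]]`-based index page (my assumptions, not part of Karpathy's document), that regenerates the wiki's index from whatever sources exist:

```python
import tempfile
from pathlib import Path

def rebuild_index(sources_dir: str, wiki_dir: str) -> str:
    """List every source on the wiki's index page; summarizing stays the agent's job."""
    lines = ["# Index", ""]
    for src in sorted(Path(sources_dir).glob("*.md")):
        lines.append(f"- [[{src.stem}]] (from {src.name})")
    index_page = Path(wiki_dir) / "index.md"
    index_page.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_page.read_text(encoding="utf-8")

# Demo with two throwaway sources in a temp directory.
with tempfile.TemporaryDirectory() as root:
    sources, wiki = Path(root, "sources"), Path(root, "wiki")
    sources.mkdir()
    wiki.mkdir()
    (sources / "memex-essay.md").write_text("notes", encoding="utf-8")
    (sources / "karpathy-idea-file.md").write_text("notes", encoding="utf-8")
    print(rebuild_index(str(sources), str(wiki)))
```

Everything that requires judgment (summaries, cross-references, spotting contradictions) goes to the agent; scripts like this just keep the scaffolding consistent.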
My Own Idea Files on GitHub
Inspired by Karpathy’s approach I have published my own mental models as Idea Files — patterns from my day-to-day work as a freelancer that any AI agent can read and implement:
- FAQ-First Chatbot — A chatbot architecture that answers exclusively from verified FAQs. Zero hallucination, full control.
- MCP Server as Agent Tooling — How to give an AI agent access to real data sources — using SEO data as an example.
- AI Video Production — Automated videos with voice cloning and programmatic rendering, optimized for TikTok and Reels.
- One-Person Agency Operating System — Running a full agency solo — with an AI agent as the operating system.
The full repository is on my GitHub page.
Conclusion: Knowledge Instead of Code
Andrej Karpathy has shown that the most important application of AI might not be programming — but organizing and connecting knowledge. An LLM Wiki is not a futuristic concept. It is a Markdown folder, an AI agent, and a clear structure. The technology is here. The only question is who uses it first.
If you want to know what AI-assisted knowledge management could concretely look like in your company, reach out via my free initial call.
Need support?
As an SEO freelancer with over 20 years of experience, I help you grow your online visibility sustainably.
About the Author
Christian Synoradzki, SEO Freelancer
More than 20 years of experience in digital marketing. Fair hourly rate, no contract lock-in, a single direct point of contact.