Building The Brain

A personal knowledge base that uses AI, automation, and real-world experience to turn static information into actionable insights.

Dec 21, 2024

A Personal Knowledge Base for my AI projects

When I started working on "The Brain," it was to tackle a recurring issue: every new AI project required me to manually recreate prompts and reintroduce context about myself—my history, preferences, and style. This repetitive process was not only inefficient but also incredibly frustrating, like reinventing the wheel each time.

I thought The Brain could act as a central hub: a single place where all my personal knowledge, insights, and context could live. Building it would not only help with that problem but also serve as a good forcing function for reflection as I take a small break between startups and projects.

It’s designed to make sure every AI project I work on has access to the same consistent foundation. By streamlining this process, I’ve saved myself the hassle of repeating the same setup work while also giving these agents a deeper integration, so they understand "me" better.


Why Build “The Brain”?

In building "The Brain," I wanted to create a system that allows LLMs to access and use my personal knowledge. My AI projects can then draw on my insights and experiences, making information from my mind readily available and actionable. The Brain acts as an extension of my own knowledge, so that LLMs can respond as if they were me, bringing a more personalized and authentic touch to user interactions.

The Brain is more than just a personal knowledge base. It’s a dynamic system combining OpenAI embeddings, vector databases, and automation to transform static information into something searchable, contextual, and impactful. By linking my outputs with real-time applications, "The Brain" ensures my knowledge is both preserved and easily accessible, enabling AI projects to efficiently use and adapt my personal insights without repetitive setup.

To address the repeated-setup problem, The Brain provides a system that:

  1. Automates Updates: Automatically processes new content from various sources to a GitHub repository, keeping the system up to date.
  2. Enables Contextual Search: Uses state-of-the-art embeddings to make all content searchable and interconnected.
  3. Provides an API for Agents: Offers a best-practice API for seamless integration with external systems, enabling agentic interactions.
  4. Visualizes AI Understanding: Renders patterns and connections in a 3D space for a deeper understanding of knowledge relationships.

How It Works: The Technical Blueprint

1. Organizing Knowledge with GitHub

All content resides in a GitHub repository, organized into directories like career/, publications/, and research/. Files are stored in Markdown or MDX format, ensuring:

  • Ease of Updates: Adding or editing content is as simple as committing a new file.
  • Scalability: Hierarchical organization keeps retrieval efficient, no matter the size of the repository.
  • Portability: No reliance on proprietary software, ensuring long-term accessibility.

GitHub repository structure for The Brain

2. Automating Knowledge Updates and Embedding

To keep the system current, I set up a GitHub Action that triggers on every commit to the main branch. The process includes:

  1. Check File Changes: The system identifies modified files to prevent redundant processing.
  2. Chunking Content: Files are divided into 400-token segments with overlapping boundaries to preserve context.
  3. Embedding Content: Each chunk is embedded using OpenAI’s text-embedding-3-large model.
  4. Storing Embeddings: Embeddings are saved in a Supabase Postgres database equipped with the pgvector extension, enabling vector similarity search natively.
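
As a rough illustration of the chunking step, here is a minimal Python sketch. It assumes whitespace-separated words as a stand-in for real model tokens, and the 50-token overlap and the name `chunk_tokens` are hypothetical; only the 400-token size comes from the description above.

```python
def chunk_tokens(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    tokens = text.split()  # assumption: whitespace words approximate tokens
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

Because each window starts `size - overlap` tokens after the previous one, the tail of one chunk is repeated at the head of the next, which is what keeps a sentence that straddles a boundary retrievable from either side.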

This workflow processes updates automatically, reducing manual effort and making new content available for querying soon after each commit.
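
The "Check File Changes" step could be done in several ways; a GitHub Action might simply diff the commit. The sketch below shows one alternative under stated assumptions: it compares a SHA-256 hash of each file against a manifest saved by the previous run. The function names and the manifest format are hypothetical, not taken from the project.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_process(current: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return paths whose contents differ from the last recorded hash.

    `current` maps path -> file text; `manifest` maps path -> hash from the
    previous run (hypothetical format). New files are included because they
    have no manifest entry.
    """
    return sorted(
        path for path, text in current.items()
        if manifest.get(path) != content_hash(text)
    )
```

Only the returned paths would need to be re-chunked and re-embedded, which is where the savings come from.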

The GitHub Action for The Brain

3. Automated Ingestion from RSS Feeds

Another critical component of the system is automated ingestion, which expands its capabilities beyond manual updates. Using RSS feeds, new content is fetched and converted into Markdown files for integration. For example:

  • Blog Post Conversion: A GitHub Action pulls entries from an RSS feed (e.g., https://brennanmceachran.com/rss), converts them using a predefined template, and saves them to the knowledge base.
  • Seamless Integration: The system commits these changes and triggers the embedding workflow automatically.

This ensures a steady stream of updated content—like blog posts, newsletters, or external contributions—is continuously added and ready for use.
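
To make the conversion step concrete, here is a small Python sketch that turns RSS 2.0 items into Markdown files with front matter. The template, the front-matter fields, and the helper names (`slugify`, `rss_to_markdown`) are assumptions for illustration; the actual template used by the project is not shown in this post.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical template: front matter followed by the item body.
TEMPLATE = """---
title: {title}
source: {link}
published: {date}
---

{body}
"""


def slugify(title: str) -> str:
    """Lowercase the title and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def rss_to_markdown(rss_xml: str) -> dict[str, str]:
    """Return {filename: markdown} for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    files = {}
    for item in root.iter("item"):
        title = (item.findtext("title") or "Untitled").strip()
        files[f"{slugify(title)}.md"] = TEMPLATE.format(
            title=json.dumps(title),  # quoted so special characters stay valid
            link=(item.findtext("link") or "").strip(),
            date=(item.findtext("pubDate") or "").strip(),
            body=(item.findtext("description") or "").strip(),
        )
    return files
```

In a workflow like the one described, each returned file would be written into the repository and committed, and that commit would in turn trigger the embedding workflow.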

4. Search API and Query Rewriting

The Brain’s API is at the heart of its functionality, enabling advanced, AI-driven search capabilities. A key feature is query rewriting, which refines user inputs for greater accuracy and relevance.

Workflow:

  1. Input Parsing and Rewriting: Queries are analyzed to clarify intent and resolve ambiguities, producing a rewritten query alongside the original.
  2. Moderation: Inputs are assessed for safety, fairness, and compliance. Malicious or inappropriate queries are flagged or rejected.
  3. Embedding and Search: Both original and rewritten queries are embedded, searched independently, and merged for comprehensive results.
  4. Response Generation: Results are returned with metadata, including source context and similarity scores.
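
The "search independently, then merge" step can be sketched as follows. This is a simplified stand-in, not the project's API: it takes already-embedded query vectors, scores them with plain cosine similarity over an in-memory dictionary instead of pgvector, and leaves out rewriting and moderation. All names here are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, k=3):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def merged_search(original_vec, rewritten_vec, index, k=3):
    """Search with both query vectors and keep each chunk's best score."""
    best = {}
    for cid, score in search(original_vec, index, k) + search(rewritten_vec, index, k):
        best[cid] = max(score, best.get(cid, -1.0))
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

Keeping the maximum score per chunk means a chunk that matches only the rewritten query still surfaces, while a chunk found by both queries is not counted twice.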

Why It Matters:

  • Accuracy: Refined queries better capture user intent, ensuring relevant results.
  • Efficiency: Optimized workflows reduce computational overhead, even in large datasets.
  • Safety: Built-in moderation ensures ethical and secure operations.

Query rewriting process in The Brain

5. Visualization and Agent Integration

A standout feature of The Brain is its 3D visualization, powered by Three.js. This tool offers an interactive way to explore the knowledge base.

Features:

  • Dimensionality Reduction: High-dimensional embeddings are reduced to 3D space using PCA, facilitating intuitive exploration.
  • Clustering: Thematic connections are highlighted as related ideas cluster together.
  • Color-Coding: Categories are visually differentiated with unique colors for quick identification.
  • Interactivity: Users can click on points to access metadata, source text, and related chunks.
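
For readers curious about the dimensionality-reduction step, the following is a dependency-free Python sketch of PCA using power iteration with orthogonalization. It is meant for small toy vectors only: building a full covariance matrix this way would be impractical for real embedding sizes, where a numeric library would normally be used. The function name and parameters are hypothetical.

```python
import math
import random


def pca_project(vectors, n_components=3, iters=200, seed=0):
    """Project row vectors onto their top principal components."""
    rng = random.Random(seed)
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / max(n - 1, 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _ in range(n_components):
        vec = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            # Keep the new direction orthogonal to the components found so far.
            for comp in components:
                proj = sum(x * c for x, c in zip(vec, comp))
                vec = [x - proj * c for x, c in zip(vec, comp)]
            norm = math.sqrt(sum(x * x for x in vec))
            if norm < 1e-12:
                vec = [0.0] * d  # no variance left in any remaining direction
                break
            vec = [x / norm for x in vec]
        components.append(vec)

    # Coordinates of each centered point along each component.
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]
```

Each returned triple can be used directly as an (x, y, z) position for a point in the 3D scene, with the first axis carrying the most variance.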

Next.js 15’s "use cache" directive improves performance by caching the results of server-side work, which reduces redundant computation. In practice this means faster load times and smoother interactions when exploring the 3D visualization or querying the knowledge base. The visualization isn’t just visually appealing; it’s a practical tool for uncovering hidden relationships in the data.

3D visualization of The Brain

Real-World Applications and Insights

AI, Powered by Personal Knowledge

Unlike traditional knowledge management systems, which emphasize external inputs, The Brain focuses on leveraging personal outputs. This approach prioritizes the unique value of the knowledge I’ve created, such as talks, specs, pitch decks, and research insights, so that these outputs are not only preserved but also readily accessible and actionable. That focus sets it apart from systems designed for general content aggregation. Acting as an agent, it can access and use that knowledge autonomously.

Future Use

  1. Personal Website Chatbot: Visitors can ask questions and receive personalized, contextual responses based on my corpus.
  2. Creative Writing Support: The system injects relevant insights into scripts or documents, enriching the creative process.
  3. Collaborative Knowledge Bases: Expanding to support multi-user inputs for team-based knowledge integration.
  4. Agent Augmentation: Providing a foundation for AI agents to learn, adapt, and execute tasks autonomously.

The Brain in use on my personal website

Conclusion

The Brain helps me streamline my work and make better use of what I’ve already created. It’s not a grand reinvention—just a practical tool to reduce repetition, keep my projects consistent, and give me a clearer view of how my knowledge connects. At its core, it’s about making things easier and more efficient for the work I actually care about.