This article was co-written with Claude AI for translation, structure, and technical documentation. It is the fruit of real dialogue and hands-on implementation, not automated content. Happy reading.
A Note on Humility and Learning
After spending considerable time, energy, and resources learning and experimenting with AI, I still feel like a beginner. This is my humble experience report: I want to share perspectives, hopes, major mistakes, and the moments of foolishness I’m still discovering.
I am much more a learner than a knower, much more a student than a lesson-giver. I’m sharing lessons I’ve learned for myself rather than pushing lessons others should apply without considering their own context. It is in this spirit of humble sharing that I write (or co-write, with Claude Code) this article.
I hope you’ll still find my personal touch in it, despite the LLM’s reworking for proper English and readability, which I could not have achieved alone.
Coming Back from the Data Lake: Why I’m Reconsidering the “Everything in One Place” Approach
The Original Vision: Personal Data Lake Paradise
Just a few weeks ago, I was completely sold on the personal data lake concept. Following Damien’s vision and my own article “What I Learned About Data Strategy Before Going AI-First,” I believed the answer was simple: put everything in Markdown, centralize all knowledge in one massive repository, and train AI models on this unified dataset.
The logic was compelling: “Nothing we do can be solid without a massive repository where we can tap into all our accumulated knowledge and reflections.” One vault, one truth, one place to search everything I’ve ever thought or written.
I implemented this vision enthusiastically, building my Obsidian-Git hybrid architecture, migrating years of PowerPoints to Markdown, centralizing everything from personal notes to professional projects. It felt like digital enlightenment.
The Cybersecurity Wake-Up Call
Then I started the MIT AI Strategy course, and my perspective shifted dramatically.
The cybersecurity reality: Having everything in one place is a massive security vulnerability. All my eggs in one digital basket—personal thoughts, professional strategies, family information, project details, sensitive conversations. One breach, one compromised token, one security failure, and everything becomes exposed.
The irony wasn’t lost on me: while building what I thought was a robust personal knowledge system, I was creating the perfect attack surface.
Beyond Security: The Fundamental Weighting Problem
But the cybersecurity concerns were just the beginning. As I dove deeper into federated learning concepts, I uncovered a more fundamental issue with the data lake approach—the problem of relative importance.
The Quantity vs. Quality Dilemma
In federated learning, gradients are typically weighted by the number of data points. If entity A has 100,000 data points and entity B has 10,000, entity A gets 10x more influence in the final model—regardless of the actual business value or strategic importance of each entity.
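To make the mechanics concrete, here is a minimal sketch of volume-based aggregation in the FedAvg style; the entity names and update values are illustrative assumptions, not a real training run.

```python
import numpy as np

def aggregate_by_volume(updates: dict[str, np.ndarray],
                        counts: dict[str, int]) -> np.ndarray:
    """FedAvg-style aggregation: each entity's update is weighted
    by its share of the total data points."""
    total = sum(counts.values())
    return sum((counts[name] / total) * update
               for name, update in updates.items())

# Illustrative numbers: entity A has 10x the data, so it gets
# 10x the influence, regardless of strategic value.
updates = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}
counts = {"A": 100_000, "B": 10_000}
print(aggregate_by_volume(updates, counts))  # ~[0.909, 0.091]
```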
This same problem exists in my personal data lake. Consider this scenario:
My data distribution:
- Articles from 2013: 50 documents
- Articles from 2022: 80 documents
- Articles from 2024: 40 documents
- Articles from 2025: 25 documents
In my mind, the 2024 and 2025 content is far more valuable: it represents more experienced thinking, is more relevant to my current life constraints, and is more strategically important for my future direction. But if I train an AI model on this data lake without weighting, the 2022 content will have the strongest influence simply due to volume: its 80 documents make up about 41% of the 195 total, while the 25 documents from 2025 carry only about 13%.
The problem: The AI doesn’t know what I consider important. It only sees quantities, not qualities.
The Solution: Distributed Intelligence with Dynamic Context
I now believe the future lies not in massive centralization, but in federated applications that communicate dynamically.
The New Architecture Vision
Instead of one monolithic data lake, imagine:
- Multiple specialized repositories (personal thoughts, professional projects, family information, code, research)
- Each with its own security perimeter and access controls
- Dynamic communication via MCP (Model Context Protocol) or APIs
- Context-aware weighting based on request type and user intent
This approach offers several advantages:
- Security compartmentalization: a breach of one repository doesn’t compromise everything (see the sketch after this list)
- Selective sharing: Different security levels for different types of information
- Dynamic relevance: AI can access the most relevant repositories for each specific query
- O(N) integration complexity: with a shared protocol like MCP, N applications need roughly N protocol integrations instead of N² pairwise API ones
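To make the compartmentalization concrete, here is a minimal sketch of a repository registry with separate security perimeters; it is a plain-Python stand-in for what MCP servers or APIs would actually expose, and every name and level in it is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Repository:
    name: str
    domain: str       # e.g. "personal", "professional", "code"
    sensitivity: str  # its own security perimeter

# Multiple specialized repositories instead of one monolithic lake.
REGISTRY = [
    Repository("personal-notes", "personal", "secret"),
    Repository("business-strategy", "professional", "confidential"),
    Repository("code-snippets", "code", "internal"),
]

LEVELS = ["public", "internal", "confidential", "secret"]

def reachable(query_domain: str, clearance: str) -> list[Repository]:
    """Each query only touches repositories it is cleared for, so a
    breach of one perimeter never exposes the others."""
    return [r for r in REGISTRY
            if r.domain == query_domain
            and LEVELS.index(r.sensitivity) <= LEVELS.index(clearance)]

print(reachable("professional", clearance="confidential"))
```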
The Importance Weighting Revolution
This distributed approach opens up fascinating possibilities for context engineering:
Dynamic importance based on query context:
- Family-related queries prioritize family-tagged repositories
- Professional strategy queries weight recent business thinking more heavily
- Technical questions favor code repositories and recent learning
Request-time context injection:
```
When answering this question about career strategy,
prioritize 2024-2025 content 3x over earlier years,
and weight professional repositories 2x over personal reflections
```
User-defined importance hierarchies that can evolve over time without retraining models.
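Here is a minimal sketch of such request-time weighting; the query types, multipliers, and repository labels are assumptions chosen to mirror the injection example above.

```python
# Hypothetical per-query-type weight profiles: recent years 3x,
# professional repositories 2x, exactly as in the injection above.
WEIGHT_PROFILES = {
    "career_strategy": {"years": {2024: 3.0, 2025: 3.0},
                        "repos": {"professional": 2.0, "personal": 1.0}},
    "family": {"years": {}, "repos": {"family": 3.0, "personal": 1.0}},
}

def score(chunk_year: int, chunk_repo: str, query_type: str) -> float:
    """Combine user-defined multipliers at request time; the hierarchy
    can evolve over time without retraining any model."""
    profile = WEIGHT_PROFILES.get(query_type, {})
    year_w = profile.get("years", {}).get(chunk_year, 1.0)
    repo_w = profile.get("repos", {}).get(chunk_repo, 1.0)
    return year_w * repo_w

print(score(2025, "professional", "career_strategy"))  # 6.0
```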
The Federated Learning Inspiration
What excites me most is how federated learning research is making this human context problem explicit. When we see gradient aggregation weighted purely by data volume, we realize something fundamental: the human judgment about relative importance still matters.
This isn’t just a technical challenge—it’s a philosophical statement about human-AI collaboration. Pure quantity-based weighting assumes all data points are created equal, which is rarely true from a human strategic perspective.
The Broader Implications
This shift from data lake to federated intelligence has implications beyond personal knowledge management:
For Collective Intelligence
Organizations could maintain data sovereignty while still enabling collaborative AI training. Teams could share insights without exposing sensitive data, creating new forms of collective learning that preserve individual autonomy.
For AI Augmentation vs Replacement
The need for human context weighting reinforces why I believe in augmentation over replacement. Humans bring strategic judgment about relative importance that pure quantity-based systems miss. The AI handles processing and pattern recognition; humans provide contextual wisdom.
For Personal Digital Strategy
We’re moving toward a world where multiple specialized AI applications work together rather than one monolithic system trying to do everything. Each application maintains its expertise while collaborating when needed.
The Practical Implementation Roadmap
What I’m planning to do in the coming months represents a complete shift from the monolithic approach to a federated personal AI strategy:
Phase 1: Repository Mapping and Conversion
Step 1: Inventory existing repositories
- List all my current data sources and repositories
- Categorize by domain, sensitivity level, and strategic importance
- Assess security requirements for each category
Step 2: Local markdown conversion
- Convert each repository to lightweight, uniform markdown formats
- The graphics challenge: Instead of heavy OCR on image files, convert diagram logic to Mermaid code (a small sketch follows this list)
- This preserves the structural meaning while keeping everything LLM-readable
- Maintain repository-specific formatting standards
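As promised above, a small sketch of the graphics conversion: it renders a diagram’s logic (nodes and edges) as Mermaid code; the diagram content itself is a made-up example.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render a diagram's logical structure as a Mermaid flowchart,
    keeping the meaning LLM-readable without any OCR."""
    lines = ["flowchart TD"]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)

# A slide diagram reduced to its logic:
print(to_mermaid([("DataLake", "SingleAttackSurface"),
                  ("FederatedRepos", "Compartmentalized")]))
# flowchart TD
#     DataLake --> SingleAttackSurface
#     FederatedRepos --> Compartmentalized
```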
Phase 2: Federated Fine-tuning Strategy
The granularity problem: Repository-level weighting will likely be too coarse. I’m considering:
- Semantic chunking with weighting scores: Break content into meaningful units with explicit importance ratings
- Temporal weighting: More recent insights weighted higher by default
- Context-specific scoring: Different weights for professional vs personal queries
The reality check: Manually rating every semantic chunk will be impractical. I’ll probably have to accept that this weighting is a snapshot in time and will influence each fine-tuning cycle differently. The key is making the weighting explicit rather than invisible.
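A minimal sketch of what explicit chunk weighting could look like, assuming a two-year half-life for the temporal decay and a user-assigned importance score per chunk (both numbers are arbitrary choices of mine).

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    year: int
    importance: float = 1.0  # explicit, user-assigned rating

def training_weight(chunk: Chunk, current_year: int = 2025,
                    half_life: float = 2.0) -> float:
    """Recency decay times explicit importance: the weighting is
    visible and adjustable, not buried in raw document counts."""
    age = current_year - chunk.year
    recency = 0.5 ** (age / half_life)
    return recency * chunk.importance

print(training_weight(Chunk("2022 article", 2022)))       # ~0.35
print(training_weight(Chunk("2025 insight", 2025, 2.0)))  # 2.0
```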
The technical reality: Dynamic fine-tuning at query time makes no practical sense—you can’t retrain a model for every request. This constraint pushes us toward smarter context engineering solutions where the intelligence lies in how we compose and weight information at inference time rather than training time.
Phase 3: Personal Model Training
The goal isn’t to create a “digital double” (a concept I now find philosophically naive), but rather to fine-tune models that can work as assistants informed by my context.
Important distinction:
- What I used to think: “My personal data = me, so training on it creates a version of me”
- What I understand now: My personal data are traces of my thinking, not my thinking itself
- The realistic goal: An AI that can access and reason about my accumulated knowledge while preserving the irreplaceable human elements
Phase 4: Federated Aggregation
Gradient aggregation with explicit weighting rules:
- Strategic importance scores for different knowledge domains
- Recency multipliers for time-sensitive insights
- Context-aware importance that varies by query type
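Here is a minimal sketch of aggregation driven by explicit rules instead of volume; the entities, scores, and multipliers are hypothetical. It also captures the governance rule discussed below: equal strategic scores mean equal weight, whatever the data volume.

```python
import numpy as np

def aggregate_by_strategy(updates: dict[str, np.ndarray],
                          strategic: dict[str, float],
                          recency: dict[str, float]) -> np.ndarray:
    """Weight each entity's update by strategic importance times a
    recency multiplier, ignoring raw data volume entirely."""
    weights = {k: strategic[k] * recency[k] for k in updates}
    total = sum(weights.values())
    return sum((weights[k] / total) * u for k, u in updates.items())

# Two entities with the same strategic score get equal weight,
# no matter how many data points each one contributed.
updates = {"sales": np.array([1.0, 0.0]), "r_and_d": np.array([0.0, 1.0])}
print(aggregate_by_strategy(updates,
                            strategic={"sales": 1.0, "r_and_d": 1.0},
                            recency={"sales": 1.0, "r_and_d": 1.0}))
# [0.5 0.5]
```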
The philosophical challenge: Even with perfect data and metadata, the resulting model won’t capture the full complexity of human identity—the intuitions, emotions, values, and experiences that can’t be reduced to data points.
The accessibility reality: I’m pursuing this experiment because I find it fascinating to play with the real limits of AI and LLMs—limits that many people discuss but few actually face in practice. But let’s be honest: personal model retraining isn’t feasible for the average individual. The “digital double” concept would remain inaccessible to most people anyway.
The enterprise opportunity: Where this becomes genuinely realistic is for internal AI representatives within organizations—AI systems trained via federated learning, accessible only internally, with use cases spanning departments and cross-functional initiatives. Here, you can actually implement governance rules like “two entities with the same strategic score get equal weight regardless of their data volume.”
The complexity ahead: This reveals that we’ll face even more decision points requiring courage and risk-taking. Uncertainty, irreversibility, and imperfection remain central challenges. If LLMs have a temperature parameter to control randomness, I’d say reality has one too—and it seems pretty high! 😄
This roadmap acknowledges both the technical possibilities and the philosophical limitations of personal AI systems, while recognizing where these concepts become more practically viable.
The Human Factor: Evolution, Not Replacement
Perhaps the most important insight from this journey: expert human knowledge and judgment remain difficult to replace. Whether we’re talking about importance weighting, strategic priorities, or relevance assessment, the human perspective adds a layer of wisdom that pure algorithmic approaches miss.
But here’s where it gets interesting—I want to leave room for doubt and evolution.
The accessibility challenge: We can work on making this finesse of judgment and reflection more widespread, training more people to effectively modulate AI-proposed responses. The goal isn’t to keep expertise locked in a few hands, but to democratize sophisticated thinking.
The system design frontier: While fine-tuning at query time makes no practical sense, we can become much better at transmitting contextual data during the fine-tuning process itself. Imagine automatically calculating relative weights for different local training databases based on criteria derived from other databases: meta-learning about our own data patterns.
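A minimal sketch of that meta-weighting, assuming one hypothetical criterion: weight each local database by how often the other databases link into it.

```python
def cross_reference_weights(links: dict[str, set[str]]) -> dict[str, float]:
    """Derive a database's training weight from criteria found in the
    other databases: here, how many of them reference it."""
    inbound = {db: sum(db in targets
                       for src, targets in links.items() if src != db)
               for db in links}
    total = sum(inbound.values()) or 1
    return {db: count / total for db, count in inbound.items()}

# Hypothetical link structure between three local databases:
# "strategy" gets the largest weight because the others point at it.
print(cross_reference_weights({
    "notes":    {"strategy"},
    "strategy": {"notes", "research"},
    "research": {"strategy"},
}))  # {'notes': 0.25, 'strategy': 0.5, 'research': 0.25}
```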
Mixture of Context Knowledge: What excites me most is the potential for dynamic context engineering in RAG systems—a “Mixture of Context Knowledge” that intelligently selects relevant user context pieces based on the query. These context elements wouldn’t live in the model or directly in the prompt, but would be dynamically composed. This pairs beautifully with Mixture of Experts: “the domain expert + the field expert where the domain applies.”
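And a minimal sketch of the composition step, with a toy word-overlap score standing in for real retrieval; every name and number here is an assumption.

```python
def compose_context(query: str, chunks: list[dict], budget: int = 2) -> str:
    """Mixture of Context Knowledge: pick the context pieces most
    relevant to this query and compose them at inference time. They
    live neither in the model nor permanently in the prompt."""
    def relevance(chunk: dict) -> float:
        # Toy stand-in for an embedding similarity score.
        overlap = set(query.lower().split()) & set(chunk["text"].lower().split())
        return len(overlap) * chunk.get("weight", 1.0)
    best = sorted(chunks, key=relevance, reverse=True)[:budget]
    return "\n".join(c["text"] for c in best)

chunks = [
    {"text": "career strategy notes from 2025", "weight": 3.0},
    {"text": "old travel journal", "weight": 1.0},
    {"text": "career pivot reflections", "weight": 2.0},
]
print(compose_context("career strategy for next year", chunks))
```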
The collective intelligence horizon: I’m genuinely excited about all this because I sense we’re advancing toward hybrid human-AI collective intelligence. I still lack the orders of magnitude, tools, and practical experience on certain aspects, but each day brings incremental progress.
This doesn’t diminish the power of AI—it clarifies AI’s role as an incredibly sophisticated tool that becomes most powerful when guided by evolving human strategic thinking.
The future isn’t about replacing human judgment with AI processing power. It’s about creating systems sophisticated enough to incorporate human wisdom while handling the computational complexity we could never manage alone—and perhaps democratizing that wisdom in the process.
What’s Next: The Infrastructure Challenge
My next exploration targets the infrastructure layer that makes all this possible: OpenRouter for model routing, Crush CLI for local orchestration, and the hybrid cloud management across Azure and AWS. Because ultimately, federated learning and context engineering need robust, flexible infrastructure to move from concept to reality.
From monolithic data lake to federated intelligence: Lessons learned from the MIT AI Strategy course and federated learning research
“The best AI systems don’t replace human judgment—they amplify it at scale while preserving what makes human thinking irreplaceable.”