Abhay Bhat

Category: Technology Page 1 of 3

The Memory Wall: How AI Accelerators Are Solving the SRAM vs HBM Tradeoff — and Why KV Cache Innovation Is the Next Revolution

Everyone’s obsessing over FLOPs. Benchmarks, leaderboards, token throughput. But here’s the dirty secret nobody in AI infrastructure wants to admit: the memory wall is the real bottleneck, and we’ve been pretending it doesn’t exist. While GPU vendors print money selling chips with ever-fatter HBM stacks, a quiet revolution is happening in how we think about the memory hierarchy, and it’s about to reshape the entire inference stack.
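
To make the memory wall concrete, here is a back-of-the-envelope sketch of how large a transformer’s KV cache gets. It uses the standard accounting: every layer stores one key vector and one value vector per KV head for every token. The model configuration (80 layers, 8 KV heads, head dimension 128, FP16) and the batch of 16 requests are hypothetical values chosen for illustration, not the spec of any particular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    num_layers: int
    num_kv_heads: int    # KV heads (fewer than query heads under grouped-query attention)
    head_dim: int
    bytes_per_elem: int  # 2 for FP16/BF16, 1 for FP8


def kv_cache_bytes(cfg: ModelConfig, seq_len: int, batch_size: int) -> int:
    """Bytes needed to cache keys and values for every token of every sequence in a batch."""
    # Factor of 2: each layer stores one key vector and one value vector per KV head per token.
    per_token = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Hypothetical 70B-class configuration (assumption, not a published spec).
    cfg = ModelConfig(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
    for seq_len in (4_096, 32_768, 131_072):
        gib = kv_cache_bytes(cfg, seq_len, batch_size=16) / 2**30
        print(f"{seq_len:>7,} tokens x 16 requests: {gib:,.1f} GiB")
```

Under these assumptions the cache alone needs 20 GiB at 4,096 tokens and 640 GiB at 131,072 tokens, before a single weight is loaded. The second figure is well beyond the HBM on a single accelerator, which is the pressure that paging, quantizing, and offloading the KV cache are meant to relieve.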

The Infrastructure Reckoning: What LLM Inference Actually Costs at Scale

Two things the enterprise AI world hasn’t fully woken up to yet:

First, a Fortune 500 enterprise running a dedicated GPU cluster with a custom-trained model for targeted use cases, alongside a frontier closed model like Claude Opus 4.6, is a business model the frontier labs haven’t fully explored. The labs still own the model and weights. The enterprise owns the security and data-sovereignty layer. That’s a powerful, underexplored wedge.

Second, most enterprises have an untapped opportunity to build an intelligent routing layer on top of multiple models hosted by leading cloud providers like Oracle. Think of something like OpenRouter sitting on top of 4–5 models the enterprise leases or licenses, a mix of closed and open-source. This routing layer is the enterprise superpower that almost no one is building yet.
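
The core of such a routing layer can be sketched in a few lines. This is a rule-based router under stated assumptions: the model names, per-token prices, quality tiers, and the rule that sensitive data may only go to enterprise-hosted models are all hypothetical placeholders. A production router would add fallbacks, quotas, and possibly learned routing; the sketch only shows the central idea of matching each request to the cheapest model that satisfies its constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    name: str                # hypothetical model identifier
    cost_per_mtok: float     # hypothetical USD per million tokens
    quality_tier: int        # 1 = basic, 3 = frontier
    enterprise_hosted: bool  # True if the model runs inside the enterprise boundary


@dataclass(frozen=True)
class Request:
    prompt: str
    min_quality: int
    contains_sensitive_data: bool


# Hypothetical catalog: two open-weight models hosted in-house, one closed frontier API.
CATALOG = (
    ModelEndpoint("open-small", 0.2, 1, True),
    ModelEndpoint("open-large", 0.9, 2, True),
    ModelEndpoint("closed-frontier", 15.0, 3, False),
)


def route(req: Request, catalog: tuple[ModelEndpoint, ...] = CATALOG) -> ModelEndpoint:
    """Return the cheapest model that meets the quality bar and the data-residency rule."""
    eligible = [
        m for m in catalog
        if m.quality_tier >= req.min_quality
        and (m.enterprise_hosted or not req.contains_sensitive_data)
    ]
    if not eligible:
        raise LookupError("no model satisfies this request's constraints")
    return min(eligible, key=lambda m: m.cost_per_mtok)


if __name__ == "__main__":
    print(route(Request("Summarise this contract", 2, True)).name)
    print(route(Request("Draft a board memo", 3, False)).name)
```

Here the sensitive contract summary lands on open-large, while the non-sensitive task that needs frontier quality goes to closed-frontier. A sensitive request that demands frontier quality raises LookupError, which is exactly where a real router needs an explicit policy decision rather than a silent fallback.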

The Infrastructure Layer Nobody Saw Coming: Agent Harnesses Are Eating Software

The smartest bet in AI right now isn’t the model. It’s the plumbing around it.

We spent 2025 arguing about which LLM was better. GPT-5 vs Claude vs Gemini. Benchmarks, leaderboards, Reddit fights. Underneath all that noise, engineers quietly stopped asking “which model?” and started asking “how do I run 20 of them without losing my mind?” That question is now an infrastructure category. Agent harnesses. And nobody has clean answers yet.
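
At its simplest, “run 20 of them” is a bounded fan-out problem. The asyncio sketch below is not the API of any particular harness: call_agent is a hypothetical stand-in for a real model or agent call (it just sleeps, and agent 13 fails on purpose), and the concurrency limit of 5 and the 2-second timeout are arbitrary assumptions. It shows the basic pattern a harness builds on: cap in-flight calls with a semaphore, enforce a per-call timeout, and record failures instead of letting one agent sink the whole batch.

```python
import asyncio


async def call_agent(agent_id: int, task: str) -> str:
    """Hypothetical stand-in for a real agent/model call."""
    await asyncio.sleep(0.01 * (agent_id % 3))  # simulate variable latency
    if agent_id == 13:
        raise RuntimeError("simulated agent failure")
    return f"agent-{agent_id} finished: {task}"


async def run_agents(tasks: list[str], max_concurrency: int = 5, timeout_s: float = 2.0) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(agent_id: int, task: str) -> str:
        async with sem:  # at most max_concurrency calls in flight at once
            try:
                return await asyncio.wait_for(call_agent(agent_id, task), timeout=timeout_s)
            except Exception as exc:  # includes timeouts; one failure must not cancel the batch
                return f"agent-{agent_id} failed: {exc!r}"

    # gather preserves input order, so results[i] corresponds to tasks[i]
    return await asyncio.gather(*(guarded(i, t) for i, t in enumerate(tasks)))


if __name__ == "__main__":
    for line in asyncio.run(run_agents([f"task {i}" for i in range(20)])):
        print(line)
```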

Eight Cron Jobs, One AI Agent, and a Mantra for My Daughter

I’ve been running a personal AI agent named Sun on my personal MacBook for several weeks. It manages my morning news briefing, auto-categorises my Notion inbox, tracks my portfolio across four markets, and every evening at 7:30 PM — without fail — plays Hanuman Chalisa on the Sonos in my living room so my daughter Anaya grows up hearing it. This is what that actually looks like.
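
In cron notation, the 7:30 PM job corresponds to the schedule "30 19 * * *". The sketch below does not use openclaw’s scheduler, whose interface isn’t described here; it is a plain-Python illustration of what a daily cron entry computes, namely the next fire time after a given moment. The function name next_daily_run is hypothetical, and the sketch assumes naive local datetimes.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next occurrence of wall-clock time `at` strictly after `now`, like cron's "30 19 * * *"."""
    candidate = now.replace(hour=at.hour, minute=at.minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now), so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    evening_slot = time(19, 30)  # 7:30 PM local time
    print("Next evening run:", next_daily_run(datetime.now(), evening_slot))
```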

The AI Chip Wars: GPU vs TPU vs LPU

Why the future of AI isn’t one chip to rule them all


The 60-Second Primer

Three chips are fighting for AI’s soul. GPUs (Graphics Processing Units): the Swiss Army knife that trains most AI models today. TPUs (Tensor Processing Units): Google’s secret weapon, mostly kept in its own data centers and rented out through Google Cloud. And LPUs (Language Processing Units): Groq’s newcomer, optimized purely for inference speed. Knowing which chip wins where isn’t just hardware trivia. It’s the difference between a startup burning cash on the wrong infrastructure and an enterprise shipping AI that actually responds in real time.

From Windows 3.1 to Open Source: A Journey of Discovery

The Spark That Ignited Everything

I still remember the day my world changed forever. I was fifteen, maybe sixteen, when my father walked through our front door carrying what seemed like a treasure chest: a brand-new PC running Windows 3.1. The beige tower hummed with promise, and I was absolutely starstruck. While other teenagers were out playing cricket or watching movies, I found myself drawn to this magical box like a moth to a flame.

Autonomous RAG with llama3 on Groq

Building a RAG pipeline has been on my to-do list for a while. Lately, I’ve been experimenting with Groq, which has consistently impressed me with its inference speed. It’s been over four months since I cancelled my ChatGPT Plus subscription and switched to BoltAI, where I can access leading models like GPT-4, Llama 3 70B, Claude Opus, and Mixtral 8x7B at a fraction of the cost. What I appreciate most about BoltAI is its well-designed graphical interface, which lets me easily connect to multiple foundation models via their APIs.
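
For readers new to RAG, the retrieval step at its heart can be sketched with the standard library alone. This is not the author’s pipeline: it swaps in bag-of-words vectors and cosine similarity as a stand-in for a real embedding model, and the three documents are made up for illustration. In a setup like the one described here, the prompt produced by build_prompt would then be sent to llama3 on Groq; that call is omitted rather than guessed at.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Ground the model by placing retrieved passages ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Groq runs model inference on its own LPU chips.",
        "HBM stacks sit beside the accelerator die to supply memory bandwidth.",
        "A RAG pipeline retrieves relevant passages before the model generates an answer.",
    ]
    print(build_prompt("How does a RAG pipeline work?", docs, k=1))
```

Retrieval and generation stay decoupled in this shape, so the toy embed function can be replaced with a real embedding model without touching the prompt-building step.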

Payments 🕸 Web 🕸

As a payments enthusiast, I’ve been researching what it takes to build a digital wallet that stands out in a crowded market. One thing has become clear: payments themselves are a commodity. Everyone wants to offer them, and they’re an essential part of any business transaction. The real value lies in what you can build on top of them.

COSS Revolution: Empowering Small Teams to Build Full Stack Web Apps

In recent times, SaaS products built on open-source software (OSS), commonly referred to as commercial OSS (COSS), have garnered significant attention. This trend has made it much easier to build a comprehensive, scalable web application without resorting to expensive SaaS offerings. All that’s required now is a team of 3–5 skilled engineers who can navigate large codebases. With tools such as cursor.sh or GitHub Copilot, these engineers can focus on the business or technical problem at hand without getting fixated on opinionated technology stacks. As someone who tends to get caught up in such stacks, I find this shift lets me focus on the task itself and avoid unnecessary distractions.
