Technology – Abhay Bhat

The Router Is the Product: MoE at application Infra layer

On July 3, 2026

Every time you send a prompt to an AI app, something behind the scenes has to decide which model inference server actually answers it. Most people never think about this. But that decision — the “router” — is quietly turning into one of the most important pieces of the AI stack. This is a walkthrough of what routers actually do today, what’s still shallow about them, and where the smart money says they’re headed next.

The Memory Wall: How AI Accelerators Are Solving the SRAM vs HBM Tradeoff — and Why KV Cache Innovation Is the Next Revolution

By Abhay

On May 26, 2026

In Technology

Everyone’s obsessing over FLOPs. Benchmarks, leaderboards, token throughput. But here’s the dirty secret nobody in AI infrastructure wants to admit: the memory wall is the real bottleneck, and we’ve been pretending it doesn’t exist. While GPU suppliers print money selling GPUs with ever-fatter HBM stacks, a quiet revolution is happening in how we think about memory hierarchy—and it’s about to reshape the entire inference stack.

The Infrastructure Reckoning: What LLM Inference Actually Costs at Scale

By Abhay

On March 21, 2026

In AI Infrastructure, Blogs, Technology

Two things the enterprise AI world hasn’t fully woken up to yet:

First: F500 enterprises setting up a dedicated GPU cluster with a custom-trained model for targeted use cases, running alongside a frontier closed model like Claude Opus 4.6, is a business model the frontier labs haven’t fully explored. The labs still own the model and weights. The enterprise owns the security and data sovereignty layer. That’s a powerful, underexplored wedge.

Second: most enterprises have an untapped ability to build an intelligent routing layer on top of multiple models hosted by leading cloud providers like Oracle. Think OpenRouter sitting on top of 4–5 models leased or licensed by the enterprise, a mix of closed and open source. This intelligent routing layer is the enterprise superpower that almost no one is building yet.
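
To make the routing idea concrete, here is a minimal Python sketch of what such a layer could look like. Everything in it is an assumption for illustration: the endpoint names, prices, quality scores, and the keyword-based difficulty heuristic are made up, and a real router would likely use a trained classifier and live cost and latency data instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    name: str                  # hypothetical label, not a real deployment
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    quality: int               # made-up capability score from 1 to 5
    self_hosted: bool          # True if it runs on the enterprise's own cluster


def estimate_difficulty(prompt: str) -> int:
    """Crude stand-in for a real classifier: long or analytical prompts score higher."""
    score = 1
    if len(prompt.split()) > 50:
        score += 2
    if any(k in prompt.lower() for k in ("prove", "refactor", "debug", "analyze")):
        score += 2
    return min(score, 5)


def route(prompt: str, pool: list[ModelEndpoint], sensitive: bool = False) -> ModelEndpoint:
    """Pick the cheapest endpoint that is good enough for the prompt.

    Sensitive prompts are restricted to self-hosted endpoints (data sovereignty).
    If nothing is good enough, fall back to the most capable eligible endpoint.
    """
    eligible = [m for m in pool if m.self_hosted] if sensitive else list(pool)
    if not eligible:
        raise ValueError("no eligible endpoint for this request")
    needed = estimate_difficulty(prompt)
    good_enough = [m for m in eligible if m.quality >= needed]
    if good_enough:
        return min(good_enough, key=lambda m: m.cost_per_1k_tokens)
    return max(eligible, key=lambda m: m.quality)


# Hypothetical pool: two self-hosted open models plus one leased frontier model.
POOL = [
    ModelEndpoint("small-open-model", 0.1, 2, True),
    ModelEndpoint("mid-open-model", 0.5, 3, True),
    ModelEndpoint("frontier-closed-model", 3.0, 5, False),
]

if __name__ == "__main__":
    print(route("What is the capital of France?", POOL).name)
    print(route("Please debug this function", POOL).name)
    print(route("Please debug this function", POOL, sensitive=True).name)
```

The design choice worth noticing is the ordering of concerns: the sovereignty filter runs first, then capability, then cost. Swapping that order would let a cheap external model win over a compliant internal one.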

The Infrastructure Layer Nobody Saw Coming: Agent Harnesses Are Eating Software

By Abhay

On March 14, 2026

In Blogs, Technology

The smartest bet in AI right now isn’t the model. It’s the plumbing around it.

We spent 2025 arguing about which LLM was better. GPT-5 vs Claude vs Gemini. Benchmarks, leaderboards, Reddit fights. Underneath all that noise, engineers quietly stopped asking “which model?” and started asking “how do I run 20 of them without losing my mind?” That question is now an infrastructure category. Agent harnesses. And nobody has clean answers yet.

Eight Cron Jobs, One AI Agent, and a mantra for my daughter

By Abhay

On February 28, 2026

In Blogs, Open Source, Technology

I’ve been running a personal AI agent named Sun on my personal MacBook for several weeks. It manages my morning news briefing, auto-categorises my Notion inbox, tracks my portfolio across four markets, and every evening at 7:30 PM — without fail — plays Hanuman Chalisa on the Sonos in my living room so my daughter Anaya grows up hearing it. This is what that actually looks like.

The AI Chip Wars: GPU vs TPU vs LPU https://t.co/7XCQkGZkGY

By Abhay

On January 25, 2026

In Blogs, Technology, Twitter

The AI Chip Wars: GPU vs TPU vs LPU

By Abhay

On January 24, 2026

In Blogs, Technology, Twitter

Why the future of AI isn’t one chip to rule them all

The 60-Second Primer

Three chips are fighting for AI’s soul. GPUs (Graphics Processing Units) — the Swiss Army knife that trains most AI models today. TPUs (Tensor Processing Units) — Google’s secret weapon, hoarded for its own data centers. And LPUs (Language Processing Units) — the new kid optimized purely for inference speed. Understanding which chip wins where isn’t just hardware trivia — it’s the difference between a startup burning cash on the wrong infrastructure and an enterprise shipping AI that actually responds in real-time.

From Windows 3.1 to Open Source: A Journey of Discovery

By Abhay

On July 5, 2025

In Blogs, Life, Open Source

The Spark That Ignited Everything

I still remember the day my world changed forever. I was fifteen, maybe sixteen, when my father walked through our front door carrying what seemed like a treasure chest – a brand new Windows 3.1 system. The beige tower hummed with promise, and I was absolutely starstruck. While other teenagers were out playing cricket or watching movies, I found myself drawn to this magical box like a moth to a flame.

Autonomous RAG with llama3 on Groq

By Abhay

On May 13, 2024

In Open Source, Technology, Twitter

Building a RAG has been on my to-do list for a while. Lately, I’ve been experimenting with Groq, which has consistently impressed me with its inference speed. It’s been over four months since I cancelled my ChatGPT Plus subscription and switched to BoltAI, where I can access leading models like GPT-4, Llama-3-70B, Claude Opus, and Mixtral-8x7B at a fraction of the cost. What I appreciate about BoltAI is its well-designed graphical user interface, which lets me easily connect to multiple foundation models via API.

Payments 🕸 Web 🕸

By Abhay

On February 13, 2024

In Blogs, Payments, Startups, Technology

As a payment enthusiast, I’ve been researching the ins and outs of building a digital wallet that stands out in a crowded market. One thing that’s become clear is that payments themselves are a commodity – everyone wants to offer them, and they’re an essential part of any business transaction. However, the real value lies in what you can build on top of those payments.

Life | Technology | Investing

Category: Technology Page 1 of 3
