Abhay Bhat

Life | Technology | Investing


LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar https://t.co/ekUuDIqPEI


@TheMattBerman @openclaw UGC Thanks


Blown away by the first 30 minutes of Qwen3.5-35B running locally on my 16 GB M4 mini. Swapped my default model for Sun ☀️, 28 tok/s is excellent! Just go with an MoE agent running natively with llama.cpp https://t.co/oyoJX1xHiM


Excellent read on how Singapore is navigating the current situation https://t.co/QWA0FLG1q9


If you have ever been a developer, it’s a must-listen 🚴🎧 https://t.co/1AYicOb9pC


A deeper dive into inference engineering https://t.co/WQy5TZKpa4


The Infrastructure Reckoning: What LLM Inference Actually Costs at Scale https://t.co/TVAGpJEvFe


@Teknium @morganlinton Peters time


RT @teortaxesTex: might be the most expensive published scale sweep in history https://t.co/46QeNQ9Vjt


The Infrastructure Reckoning: What LLM Inference Actually Costs at Scale

Two things the enterprise AI world hasn’t fully woken up to yet:

First, F500 enterprises setting up a dedicated GPU cluster with a custom-trained model for targeted use cases, running alongside a frontier closed model like Claude Opus 4.6, is a business model the frontier labs haven’t fully explored. The labs still own the model and weights. The enterprise owns the security and data sovereignty layer. That’s a powerful, underexplored wedge.

Second, most enterprises have an untapped ability to build an intelligent routing layer on top of multiple models hosted by leading cloud providers like Oracle. Think OpenRouter sitting on top of 4–5 models leased or licensed by the enterprise, a mix of closed and open-source models. This intelligent routing layer is the enterprise superpower that almost no one is building yet.
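
To make the routing idea concrete, here is a rough sketch of what such a layer might look like. Everything in it is an assumption for illustration only: the endpoint names, the cost and quality numbers, and the `route` function are hypothetical and do not describe OpenRouter or any vendor's API. The policy shown (cheapest model that meets a quality floor, with sensitive data kept on-prem) is just one of many possible routing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    name: str                  # hypothetical label, not a real deployment
    cost_per_1k_tokens: float  # illustrative number, not real pricing
    quality: int               # illustrative tier: 1 (basic) to 5 (frontier)
    on_prem: bool              # True if served from the enterprise's own cluster


@dataclass(frozen=True)
class Request:
    prompt: str
    min_quality: int  # lowest quality tier the caller will accept
    sensitive: bool   # True if the data must stay inside the enterprise boundary


def route(request: Request, endpoints: list[ModelEndpoint]) -> ModelEndpoint:
    """Pick the cheapest endpoint that satisfies the request's constraints."""
    candidates = [
        e for e in endpoints
        if e.quality >= request.min_quality
        and (e.on_prem or not request.sensitive)  # sensitive data stays on-prem
    ]
    if not candidates:
        raise LookupError("no endpoint satisfies the request constraints")
    return min(candidates, key=lambda e: e.cost_per_1k_tokens)


# A hypothetical pool of four models: a mix of open and closed, on-prem and cloud.
POOL = [
    ModelEndpoint("open-small-onprem", 0.2, 2, True),
    ModelEndpoint("custom-tuned-onprem", 0.6, 3, True),
    ModelEndpoint("open-large-cloud", 1.0, 4, False),
    ModelEndpoint("closed-frontier-cloud", 5.0, 5, False),
]

if __name__ == "__main__":
    easy = Request("Summarize this ticket", min_quality=2, sensitive=False)
    hard = Request("Draft a migration plan", min_quality=5, sensitive=False)
    private = Request("Classify this contract", min_quality=3, sensitive=True)
    for req in (easy, hard, private):
        print(req.prompt, "->", route(req, POOL).name)
```

A real routing layer would also need to handle latency, fallbacks, and per-model rate limits; the point here is only that the policy itself can be small once the enterprise controls the pool of models.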
