@pbteja1998 just use qwen3.5-27B-GGUF locally as a primary model and routing rules on top if you are doing anything fancy…works like a charm and your wallet will thank you
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 5, 2026
Guess Claude code proved this way earlier when it was more performant with sonnet 4.6 than other frontiers, it’s the harness fgs!! https://t.co/3dl7NMODr7
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 5, 2026
Not impacted with anthropic cutting subs from claws. If you have been using your agents regularly, using anthropic 100% was never a sustainable option as it was clearly $200++ bill a month, so I have been using locally hosted Qwen3.5 with other options. No complaints. Kimi2.5 has https://t.co/53quTIemdT
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 4, 2026
OSS models are closing the gap, and with GGUF, NVFP and MXL, local inference is gaining wider adoption. Frontier models will still be the apple of the industry https://t.co/uH662icIlC
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 3, 2026
Classic troll https://t.co/s3PhfDmPw3
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 3, 2026
BitTorrent meets Napster for LLM inferencing
https://t.co/bgFnO1s870
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 3, 2026
RT @hqmank: If you’re using Claude Code, this is worth knowing.
Instead of worrying about whether Opus 4.6 or GPT 5.4 is better, it’s more…
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Apr 3, 2026
@ivanfioravanti Running GLM 5.1 and M2.7 via Ollama Claude code? Global settings won’t work right for 3 separate configs?
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Mar 30, 2026
Implementation of Google’s TurboQuant (ICLR 2026): KV cache compression for local LLM inference, with planned extensions beyond the paper https://t.co/raO8puxH9B
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Mar 30, 2026
@thekitze @Lovable 🤣
— Abhay 🇸🇬🇮🇳 (@Abhay08)
Mar 28, 2026