Oliver Ng. Ai experiments.

Category: Uncategorized

Gemma4 12b

With Nvidia announcing the RTX Spark to challenge Apple’s MX reign on local LLMs, it is appropriate that Google dumped some toys on us the next day.

Gemma4 12B is out today, in a sweet spot between their prior 8B and 26B I’m super interested to see how it stands up. 8B was a great chatbot, but a terrible tool bot. 26B seems much better but I could never run it on my M4 Mac mini 16GB. Does Google see 12B is “about right” for most spec’d out systems today – it runs on 16GB! Downloading now….

Unexpectedly, Google added to LiteRT-LM a local LLM server / CLI!

The LiteRT-LM CLI provides a lightweight, zero-code tool for running language models locally. We are now expanding the tool with the serve command, letting the CLI act as a drop-in local LLM server. Use this functionality with Gemma 4 12B to point any standard tool, SDK, or framework (such as OpenClaw, Hermes, OpenCode, Pi, or popular extensions like Continue and Aider) directly to your local endpoint.

Aside from the fact that the internets are saying RTX spark is going to cost $5000+ it does not seem far away where local models, even on edge devices (looking at you iOS27),may start to be competitive with cloud for basic stuff. Not coding, not tool use, not yet. But everyday things, like voice typing, opening apps, regular analysis, asking offline questions, could be done in the very near future all on local LLM.

June 4, 2026
4.8

New Opus 4.8 came out. I saw the video announcement on YouTube but what wasn’t captured was this gem in the release notes.

One of the most prominent improvements in Opus 4.8 is its honesty. We train all our models to be honest—for instance, to avoid making claims that they can’t support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in our evaluations, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.

It’s nice to see a feature update focused on improving LLM alignments.

May 30, 2026
Into the rabbit hole of a rabbit hole
If you are a Claude Code regular you know that using it in terminal is the way to go. What seems intimidating becomes second nature over the desktop app. But unlike the desktop app which organizes your session views nicely, Code has always been at the mercy of your tmux client and terminal sessions. Super easy to start scrambling for screen real estate with all your terminal windows and tabs.

This week, Claude released Agent View for Claude Code. By hitting the left arrow on your keyboard in a Code session, you can get into an overview screen of your Claude agents. From here, you can spawn agents to do simultaneous work. Your agents can start working on features, planning enhancements, building tests, or bug fixing – all at once. Without opening another Claude Code terminal window. It leverages this by spawning multiple git worktrees on the current working folder so that agents do not interfere with each other.

If you’re in an existing Claude Code window, you can even background that that so that the session stays alive once you close terminal. Even crazier, is that you can have multiple terminal windows open and go from terminal (1), jump into agent view, to jump into the Claude session from terminal (2).

What manages this, is what Anthropic is calling the Supervisor process. It’s like a motherboard for your agents that remembers state. The whole thing is very freeing as where you once needed to have multiple terminal windows open to work on the same tasks, you now can have just one terminal and agent view.

But can I just say with a chuckle that, I feel terminal was not designed for being blown into a multi-windowed-multi-paned orchestration engine. It’s getting a bit out of hand. I often get lost in what exactly this window is showing me as I fall into a rabbit hole of terminals within terminals.

I have by my count lost my way many times staring at a black terminal window trying to recollect where in the matrix I am. Because I have:
- Claude Code running in every terminal window.
- cmux managing my terminal workspaces where some workspaces have multiple 2-up or 3-up panes.
- Claude agent view allowing me to spawn 3X more sessions in the background hidden from view.
- Claude agent view allowing me to move within different terminal windows without leaving my terminal window
- git branches where I have to remember what branch I had pulled and what working folder I’m in because every feature is usually on a new branch
- git worktrees, which the agent view will use because agent view has to use worktrees so as not to conflict in the working folder
- git worktree branches which… same as above
Insane.
May 20, 2026
Claude Routines

Since Claude released routines, I’ve had a blast finding ways to automate my code. I have a laundry list of personal projects in my GitHub repository that I work on with Claude, daily. I regularly have 5+ Claude code sessions coding away on random thoughts and prototypes all at once. So when routines released, I wondered how different it would be over scheduled tasks for cowork. The beauty for me is in the tool calls.

I use Claude Cowork to run a daily financial market analysis and refine a daily thesis that’s ready in the morning. Scheduled runs prompting at a given time and day are a part of cowork that I love. Cronjob with more intelligence.

Claude routines can use my GitHub integration to run everything in the cloud without my machine being on. It pulls my project code, uses a managed instance to run everything and pushes it back to Git after it’s done. It operates like a coding partner for me at night.

Just like that, I now have a bunch of Claude Code routines that trigger nightly, review my project codebase for issues, propose enhancements. Every morning I end up with a list of bug fixes and proposed enhancements. I recently changed my routine to straight up pick an enhancement to work on so when I get up the feature is ready for PR. Claude documents the change, performs security and dependency reviews, and summarizes the change every night. My projects are slowly building themselves as I direct projects rather than code them.

May 13, 2026
Faster! Faster!
Having played with LLMs for a few years now I’ve had various stages of appreciation for its efficiency.
1. Tell me some jokes..
2. You coded a debugging nightmare.
3. Hey this is kinda neat.
4. Think for me. I’m too lazy to look it up.
5. Spawn five of yourself and wire it up.
It happened so fast. For me, tool use has been the most eye opening. To see Claude computer use, review functionality, that it just implemented, by itself, by literally clicking around the iOS app it just built, is astounding.

I dove into an article on Sherwood News, Test time. It made me think about hiring the best people for tomorrow. Imagine you are looking at candidates. How can one justify hiring someone who has no experience with the potential of LLMs?

Instead of simply talking through strategy, some CMOs, investors, and operators are now being asked to use AI tools live — or during a tight take-home window — to create something in front of interviewers. A number of other firms do the same, while Nicole DeTommaso, a principal at Harlem Capital, says that anecdotally, she’s seen practically every potential candidate looking to join a venture capital firm being asked to show their prowess with AI coding tools.

DeTommaso wrote that one candidate was asked to build an AI agent that could produce automated research about industries within a working week that could reliably brief partners on a sector before they invested. Another needed to use the likes of Claude Code and Codex to vibe code a dashboard to show information about portfolio companies.

“You are not told which tools to use or how to go about it. You are just expected to figure it out,” she wrote. “And increasingly, what you can actually show in an interview matters more than what’s on your resume.”

At an individual contributor level it seems risky to hire someone who would be doing things “the old way”. It’s like signing a flat footed defence man in the world of Cale Makars. Speed is the game now. And at a leader level, Arguably it applies too where the best managers should excel at delegating to LLMs. It’s easier than ever to test and prototype. At a fraction of the cost before AI.
May 7, 2026