Introducing VaSE: Value-Aware Stochastic KV Cache Eviction.
Reasoning models think in CoT, bloating the KV cache. Eviction caps memory but costs capability. VaSE is a training-free recipe that cuts that cost: keep large-magnitude value states, evict stochastically.
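A minimal sketch of the two ingredients named above (favor large-magnitude value states, evict stochastically). This is not the VaSE algorithm itself; the post does not specify it. The L2-norm score, the weighted sampling scheme, and the function name `value_aware_stochastic_evict` are all assumptions made for illustration.

```python
import math
import random


def value_aware_stochastic_evict(values, budget, rng=None):
    """Return sorted indices of cached tokens to keep.

    values: list of per-token value vectors (lists of floats).
    budget: maximum number of tokens to retain.
    Hypothetical helper, not the published method.
    """
    rng = rng or random.Random()
    n = len(values)
    if budget >= n:
        return list(range(n))

    # Assumption: score each token by the L2 norm of its value state.
    norms = [math.sqrt(sum(x * x for x in v)) for v in values]

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key = u ** (1 / w) per token and keep the largest keys, so
    # large-norm tokens are likely, but not guaranteed, to survive.
    keys = []
    for i, w in enumerate(norms):
        key = rng.random() ** (1.0 / w) if w > 0 else 0.0
        keys.append((key, i))
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:budget])


if __name__ == "__main__":
    cache_values = [[0.1, 0.0], [3.0, 4.0], [0.5, 0.5], [2.0, 1.0]]
    print(value_aware_stochastic_evict(cache_values, budget=2, rng=random.Random(0)))
```

Sampling instead of taking a hard top-k means low-norm tokens still have some chance of surviving, which is one way to read "evict stochastically".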
Excited to share that I’ve started at @GoogleResearch as a student researcher today. I'll be working on tabular foundation models. Come and chat if you are around at Google or in the Bay Area.
Excited to share that I've started my summer internship at SystemsResearch@Google in Sunnyvale, working on agentic environment generation!
Always happy to chat about coding agents or LLM memory too. If you're around the Bay Area, would love to meet up.
The future risk of computer-use agents won’t come only from malicious prompts. It will come from agents that can flawlessly follow normal instructions straight into harm.
Introducing 𝐎𝐒-𝐁𝐥𝐢𝐧𝐝: a realistic but overlooked setting where every task begins with a benign user instruction, yet the harmfulness only emerges as the agent acts in the environment.
New paper: Convergent Evolution: How Different Language Models Learn Similar Number Representations.
Language models, classical word embeddings, and even raw token frequencies all develop the same Fourier features for numbers. But only some develop the underlying structure. 🧵
After three papers on Fourier features in LLMs, I think there's a principle worth naming. How should we do science on an LLM?
It corresponds to the existential questions:
> who am I? ↔ the phenomenon.
> where do I come from? ↔ the emergence.
> where am I going? ↔ the use.
🧵
8/8 Both artifacts may find use beyond this paper.
🦣 BEHEMOTH as a testbed for diverse memory extraction approaches (self-evolving, routing-based, skill-based, and beyond).
🌱 CluE for any setting where one agent must handle heterogeneous demands, e.g. serving users with distinct habits.
w/ @TengxiaoLiu, @BillJohn1235813, @taiwei_shi, @linxins2, @robinomial
Check out the paper & code if this resonates!
🧵 1/8
What should an LLM assistant remember across conversations?
Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem.
Introducing BEHEMOTH 🦣 + CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction.
📄 Paper: arxiv.org/abs/2604.11610
Frontier LLMs don't debug, they regenerate.
We built PDB to measure that gap: GPT-5.1-Codex passes unit tests >76% of the time, but touches only <45% of the right lines.
Even Claude Code touches only ~50%.
📄 Paper: arxiv.org/abs/2604.17338
🌐 Project: precise-debugging-benchmark.github.io
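One plausible reading of "touching the right lines", sketched as line-level precision and recall between the lines a model edits and the lines that actually contain the bug. The function name `line_overlap` and both definitions are assumptions; PDB's actual metric may differ.

```python
def line_overlap(edited_lines, buggy_lines):
    """Compare edited line numbers against ground-truth buggy lines.

    Returns (precision, recall):
      precision = share of edited lines that were actually buggy
      recall    = share of buggy lines that were edited
    Hypothetical metric, not taken from the PDB paper.
    """
    edited, buggy = set(edited_lines), set(buggy_lines)
    hits = edited & buggy
    # Assumption: an empty set on either side scores 1.0 by convention.
    precision = len(hits) / len(edited) if edited else 1.0
    recall = len(hits) / len(buggy) if buggy else 1.0
    return precision, recall


if __name__ == "__main__":
    # A model that rewrites a whole 20-line function to fix a bug on line 7.
    print(line_overlap(range(1, 21), [7]))
    # A model that edits only the buggy line.
    print(line_overlap([7], [7]))
```

Under these definitions, regenerating a whole function can still cover the bug while most of the edit lands on lines that were fine.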
Coding agents running 24/7 will unlock a lot of breakthroughs 🚀. Easy to feel like we're being replaced 😨. But the real question:
What can we learn from this, and where do they still fall short?
New blog ⬇️
Auto research is on 🔥
We give algorithmic problems (like circle packing) to general coding agents and let them run overnight. 🌙
Agents reach SoTA. But more importantly: we analyze 100+ hours of trajectories to understand how they get there 🧵
🏧Giving your agent unlimited tool calls doesn't make it smarter.
💡Why? It lacks 'Budget Awareness'!
Introducing Budget Tracker, a simple plug-in that enables more effective scaling behaviors: higher performance, lower cost.
Paper: arxiv.org/pdf/2511.17006
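A rough sketch of what budget awareness could look like in an agent loop: count tool calls and surface the remaining budget as text the agent can read. The class below, its method names, and the status wording are assumptions for illustration, not the plug-in from the paper.

```python
class BudgetTracker:
    """Counts tool calls against a fixed cap (hypothetical sketch)."""

    def __init__(self, max_calls):
        if max_calls < 0:
            raise ValueError("max_calls must be non-negative")
        self.max_calls = max_calls
        self.used = 0

    @property
    def remaining(self):
        return self.max_calls - self.used

    def record_call(self):
        # Refuse to go over budget instead of silently overspending.
        if self.remaining <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.used += 1

    def status(self):
        # Text that could be appended to the agent's context each turn.
        return (
            f"Tool calls used: {self.used}/{self.max_calls} "
            f"({self.remaining} remaining)"
        )


if __name__ == "__main__":
    tracker = BudgetTracker(max_calls=3)
    tracker.record_call()
    print(tracker.status())
```

The idea is that the agent sees its remaining budget on every turn and can plan around it, rather than discovering the cap only when calls start failing.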
Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization!
Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵