memory-arena: 20 agent-memory strategies, one eval
Twenty memory strategies through one lifecycle, one judge, one model. A 30-line vector store outranks every funded vendor SDK.
- ai
- memory
- benchmark
- case-study
Case studies and notes on the tools I build.
A human-calibrated Sentinel-2 canopy model for 17 NCR LGUs, 2019 to 2026, scored F1 0.78 against hand-labeled pixels.
Land subsidence for seven cities from Sentinel-1 InSAR, 2016 to 2025, mapped next to where the floods actually hit.
Animated bubble charts joining PHP 5 trillion of contract awards to poverty and GDP across 82 areas, 2014 to 2024.
Turn any installed Python package into a coding-assistant skill from inspect.signature. Offline, no API key, no LLM in the loop.
Why I keep building benchmark harnesses before I build the feature.
Where the per-country plumbing goes, and how to keep the core pipeline portable.
Sentinel-1 SAR for what flooded, AlphaEarth for what recurs, reported as separate metrics.
Three civic-tech projects on the same substrate: Earth Engine, a frozen encoder or spectral index, a small classifier, Platt calibration, deterministic Docker.
Frozen CLIP-ViT-L plus a logistic-regression head plus Platt calibration. F1 0.87 on a held-out NCR split.
Three projects in, here is the scope of what I claim for Sentinel-1, Sentinel-2, and AlphaEarth.
About me.
What cloudwright generates from a one-line prompt and why it matters.
100 tasks, sigmoid scoring, 12 capability dimensions. What the workflow benchmark actually measures.
A side-by-side benchmark of seven retrieval strategies on user-supplied corpora.
What you can see in a graph that you can't see across four separate systems.
DPWH says the project is done. Sentinel-2 says nothing was built.
An MCP server that puts Philippine civic data inside any agent.
Why on-device LLMs make sense for regulated enterprise support, and what shipping one looks like.