The satellite-tech stack I keep reaching for

  • civic-tech
  • satellite
  • patterns

Three projects in the last year have shipped on roughly the same stack: ghostwatch, solar-map-ph, and floodwatch-ph. They answer different questions (was this infrastructure built, is there solar on this roof, did this place flood) but the layer cake under each is almost identical. This post is the layer cake.

The substrate is always Earth Engine

Every project pulls from Google Earth Engine, reduces server-side, and downloads only small results to the client. By design there is no bulk raw-imagery fetch, and so no tile-throttling failure mode. Sentinel-2 (10 m optical, 5-day revisit) feeds ghostwatch and the solar quarterly pipeline. Sentinel-1 (C-band SAR, 10 m, sees through cloud) feeds floodwatch Track A. AlphaEarth Satellite Embedding V1 (a frozen 64-dimensional per-pixel embedding) feeds floodwatch Track B. The point of pulling from Earth Engine is the zero marginal cost per project; the point of reducing server-side is that you never pay for tiles you are not going to use.

The model is always small

ghostwatch has no trained parameters: spectral indices (NDBI, NDVI, BSI), fixed thresholds, and a five-bin classifier. solar-map-ph is a frozen CLIP-ViT-L encoder feeding a scikit-learn logistic regression head. floodwatch Track A is Otsu thresholding plus permanent-water and terrain masks. floodwatch Track B is the AlphaEarth embedding feeding a scikit-learn logistic regression head.

None of these is a trained CNN. None of them needs a GPU at inference time. The reason this works is that the encoder (when there is one) is the part that captures the visual prior, and a frozen pretrained encoder with a tiny classifier on top is usually competitive with a domain-tuned CNN at a fraction of the engineering cost. solar-map-ph’s encoder ablation has the receipts: CLIP-ViT-L beats dinov2-large by 4 F1 points and beats satlas-pretrain by 14 F1 points on rooftop solar.
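A sketch of the frozen-encoder pattern, assuming the embeddings have already been computed once and cached (64 dimensions matches AlphaEarth; CLIP-ViT-L features would simply be wider). The data here is synthetic, so the F1 it prints says nothing about the real projects; what it shows is how small the trained part is.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Stand-in for cached embeddings from a frozen encoder (64-dim, like AlphaEarth).
# The encoder is never updated; here the vectors are synthetic, with a
# class-dependent shift along one random direction.
n, dim = 2000, 64
labels = rng.integers(0, 2, size=n)
signal = rng.normal(size=dim)
embeddings = rng.normal(size=(n, dim)) + 0.3 * np.outer(2 * labels - 1, signal)

# The only trained component: a linear head on top of the frozen features.
# A plain index split is acceptable only because these synthetic rows are independent.
train, test = slice(0, 1500), slice(1500, n)
head = LogisticRegression(max_iter=1000)
head.fit(embeddings[train], labels[train])

n_params = head.coef_.size + head.intercept_.size  # 64 weights + 1 bias
test_f1 = f1_score(labels[test], head.predict(embeddings[test]))
print(f"head parameters: {n_params}, held-out F1: {test_f1:.3f}")
```

Swapping encoders (as in the CLIP-ViT-L vs dinov2-large vs satlas-pretrain ablation) changes only the cached matrix; the head and its training code stay the same.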

Calibration is always Platt sigmoid

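Every project that produces a probability calibrates it. Platt scaling (a two-parameter sigmoid fitted to the classifier's scores) is the production choice in every case, with isotonic regression run alongside for comparison. Platt wins on monotonicity and parameter count: the sigmoid is strictly increasing, so it never collapses scores the classifier ranked apart into ties, whereas isotonic regression fits a step function. Isotonic sometimes wins on Brier score, but the practical gap is small and its calibration plots are messier.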
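A sketch of the Platt-vs-isotonic comparison using scikit-learn's CalibratedClassifierCV, where method="sigmoid" is Platt scaling. The post does not say which implementation the projects use, so treat this as one plausible setup; make_classification stands in for real embedding features.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for embedding features and labels.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=5, random_state=0)
X_fit, X_eval, y_fit, y_eval = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

brier = {}
for method in ("sigmoid", "isotonic"):  # "sigmoid" = Platt scaling
    # For each of the 5 folds, the head is fit on 4/5 of X_fit and the
    # calibrator on the remaining fold; predictions are averaged across folds.
    model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method=method, cv=5)
    model.fit(X_fit, y_fit)
    brier[method] = brier_score_loss(y_eval, model.predict_proba(X_eval)[:, 1])

for method, score in brier.items():
    print(f"{method:>8}: Brier {score:.4f}")
```

Lower Brier is better. Which method comes out ahead varies with the dataset, which is why running both and shipping the simpler one is a reasonable default.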

Calibration matters because every project has at least one downstream consumer (a journalist, a planner, a homeowner, a public dashboard) that needs to make a decision at a chosen threshold. An uncalibrated score that ranks tiles correctly but does not map to “fraction of tiles at this score that are real positives” is not enough.
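What "calibrated" means operationally can be checked with scikit-learn's calibration_curve: bin the scores and compare each bin's mean score with the fraction of true positives in it. The scores below are synthetic and calibrated by construction, so the two columns should agree; a consumer choosing a threshold can then read expected precision straight off the scores.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Synthetic scores that are calibrated by construction:
# a tile with score s is a real positive with probability s.
p = rng.uniform(size=100_000)
y = (rng.uniform(size=p.size) < p).astype(int)

# Reliability table: per score bin, mean score vs. observed fraction positive.
frac_pos, mean_score = calibration_curve(y, p, n_bins=10)
for s, f in zip(mean_score, frac_pos):
    print(f"mean score {s:.2f} -> {f:.2f} of tiles positive")

# A consumer thresholding at 0.8 can read expected precision off the scores.
threshold = 0.8
above = p >= threshold
expected_precision = p[above].mean()   # what the calibrated scores promise
observed_precision = y[above].mean()   # what the labels deliver
print(f"score >= {threshold}: expected {expected_precision:.3f}, "
      f"observed {observed_precision:.3f}")
```

With an uncalibrated score the ranking would be the same, but expected_precision would no longer predict observed_precision.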

Validation is always event-disjoint, source-disjoint, or held-out

The single most consequential decision in each project is the split. Random pixel splits inflate every metric because adjacent satellite pixels are near-duplicates. solar-map-ph holds out 20% of NCR source documents. floodwatch holds out 17 whole flood events, and drops any pixel that appeared in both a train and a holdout event. ghostwatch is unsupervised but reports its classifier accuracy on a hand-labeled set of project sites with the same care.
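A sketch of an event-disjoint split with scikit-learn's GroupShuffleSplit, assuming a long table where each row is one (pixel, event) observation with hypothetical event_id and pixel_id arrays. The 17 held-out events follow the post; the table itself is random, and dropping shared pixels from both sides is one reading of "drops any pixel that appeared in both".

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Hypothetical long table: one row per (pixel, flood event) observation.
n_rows, n_events, n_pixels = 5000, 60, 20000
event_id = rng.integers(0, n_events, size=n_rows)
pixel_id = rng.integers(0, n_pixels, size=n_rows)
X = rng.normal(size=(n_rows, 8))
y = rng.integers(0, 2, size=n_rows)

# Hold out whole events. An integer test_size counts groups, not rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=17, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=event_id))

# A pixel observed in both a training and a holdout event would leak location
# information across the split, so this sketch drops it from both sides.
shared = np.intersect1d(pixel_id[train_idx], pixel_id[test_idx])
train_idx = train_idx[~np.isin(pixel_id[train_idx], shared)]
test_idx = test_idx[~np.isin(pixel_id[test_idx], shared)]

print(f"train rows {train_idx.size}, holdout rows {test_idx.size}, "
      f"holdout events {np.unique(event_id[test_idx]).size}, dropped pixels {shared.size}")
```

The same pattern covers the source-disjoint case: pass source-document IDs as groups instead of event IDs.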

If you find yourself splitting at the pixel level, you are about to publish numbers you cannot defend.

Reproducibility is always bit-exact

Each project ships a Makefile, a Dockerfile, pinned dependencies, and an embeddings cache committed to git (solar-map-ph: ~11 MB, floodwatch-ph: ~1 MB). make train && make hash-verify retrains the classifier and asserts that its sha256 digest starts with a specific prefix. A scikit-learn upgrade can change the serialized bytes and therefore the hash; when it does, EXPECTED_HASH in the Makefile is bumped deliberately as part of the upgrade.

The reason this matters: pickle deserialization, which joblib.load relies on, executes arbitrary code. If someone sends you a .joblib file, hash-verify it against a digest you trust before calling joblib.load. Both repos ship scripts/verify_clf.py for exactly that.
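A sketch of the verify-then-load step, assuming the trusted digest prefix lives somewhere you control (in these repos, EXPECTED_HASH in the Makefile). It is not a copy of scripts/verify_clf.py; the function names and the 12-character prefix are illustrative, and a longer prefix is a stronger check.

```python
import hashlib
import tempfile
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression


def sha256_hex(path):
    """Stream the file through sha256 so large artifacts are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_load(path, expected_prefix):
    """Unpickle only if the file's sha256 starts with expected_prefix.

    The hash is checked before joblib.load, because loading is the step
    that can execute code embedded in the file. An empty prefix is rejected.
    """
    actual = sha256_hex(path)
    if not expected_prefix or not actual.startswith(expected_prefix.lower()):
        raise ValueError(f"{path}: sha256 {actual[:16]}... does not start with {expected_prefix!r}")
    return joblib.load(path)


# Demo: save a toy classifier, record its hash prefix, then load it through the check.
demo_path = str(Path(tempfile.mkdtemp()) / "clf.joblib")
joblib.dump(LogisticRegression().fit([[0.0], [0.1], [0.9], [1.0]], [0, 0, 1, 1]), demo_path)
demo_prefix = sha256_hex(demo_path)[:12]
demo_clf = verified_load(demo_path, demo_prefix)
```

The check only helps if the expected prefix comes from a different, trusted channel than the file itself; a hash shipped alongside a tampered file proves nothing.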

Privacy is always at publication, not at scan

Each project decides where the publication boundary sits, and enforces it with a CI gate that fails the build if the boundary is crossed. solar-map-ph publishes per-building polygons only for commercial, industrial, and public-purpose roofs; residential is aggregate-count only. floodwatch publishes flood exposure at province granularity only. ghostwatch publishes per-project flags but only against project coordinates the contracting agency itself published.
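A sketch of what such a CI gate could check, using a hypothetical record schema (id, level, roof_class, admin_level, project_id); the repos' real output formats are not described in this post. Each rule mirrors one sentence above, and gate() returns a process exit code so a CI step can fail the build on any violation.

```python
import sys

# Hypothetical record schema; field names are illustrative, not the repos' formats.
PUBLISHABLE_ROOF_CLASSES = {"commercial", "industrial", "public"}


def solar_violations(features):
    """Per-building features are allowed only for non-residential roof classes."""
    return [f["id"] for f in features
            if f.get("level") == "building"
            and f.get("roof_class") not in PUBLISHABLE_ROOF_CLASSES]


def flood_violations(rows):
    """Flood exposure may be published at province granularity only."""
    return [r["id"] for r in rows if r.get("admin_level") != "province"]


def ghost_violations(flags, agency_published_ids):
    """Each flag must reference a project the contracting agency itself published."""
    return [f["id"] for f in flags if f.get("project_id") not in agency_published_ids]


def gate(violations):
    """Return a process exit code: non-zero fails the CI job."""
    if violations:
        print(f"publication boundary crossed by: {violations}", file=sys.stderr)
        return 1
    return 0


# Example: one residential per-building record slips through and fails the gate.
example = [
    {"id": "b1", "level": "building", "roof_class": "industrial"},
    {"id": "b2", "level": "building", "roof_class": "residential"},
    {"id": "a1", "level": "aggregate", "roof_class": "residential"},
]
example_exit_code = gate(solar_violations(example))
```

Running the check on the artifacts that are about to be deployed, rather than on model outputs, keeps the boundary a publication decision independent of the model.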

The boundary is not a model decision. It is a publication decision, made with the lawyer and the journalist in mind, and enforced before deploy.

What it costs you

This stack has a ceiling. At 10 m Sentinel resolution, a 30 m footbridge spans about three pixels end to end and fills only a fraction of each, because it is much narrower than 10 m. A 100 m² covered court has the area of a single pixel, and since it rarely lines up with the pixel grid it usually smears across a two-by-two block of partly covered pixels. Underground works (drainage rehab, sewer projects) are invisible. Anything completed outside the cloud-free acquisition window is invisible. SAR sees through cloud, but it sees backscatter, not pictures, and its false-positive classes are different (smooth wet rice paddy, calm water, certain metal roofs).
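The pixel arithmetic can be made concrete with a small helper that reports how much of each 10 m pixel an axis-aligned rectangle covers. The footprints are idealized assumptions: a 30 m × 3 m footbridge and a 10 m × 10 m court, both perfect rectangles aligned with the grid axes.

```python
import math


def pixel_coverage(x0, y0, width, height, pixel=10.0):
    """Covered fraction of each grid pixel touched by an axis-aligned rectangle.

    Coordinates are in metres; pixel (col, row) spans
    [col*pixel, (col+1)*pixel) x [row*pixel, (row+1)*pixel).
    """
    x1, y1 = x0 + width, y0 + height
    cover = {}
    for col in range(math.floor(x0 / pixel), math.ceil(x1 / pixel)):
        for row in range(math.floor(y0 / pixel), math.ceil(y1 / pixel)):
            dx = min(x1, (col + 1) * pixel) - max(x0, col * pixel)
            dy = min(y1, (row + 1) * pixel) - max(y0, row * pixel)
            if dx > 0 and dy > 0:
                cover[(col, row)] = dx * dy / pixel ** 2
    return cover


bridge = pixel_coverage(0, 0, 30, 3)          # hypothetical 30 m x 3 m footbridge
court_aligned = pixel_coverage(0, 0, 10, 10)  # 100 m2 court, perfectly aligned
court_offset = pixel_coverage(5, 5, 10, 10)   # same court, shifted half a pixel
for name, cover in [("bridge", bridge), ("court aligned", court_aligned),
                    ("court offset", court_offset)]:
    print(name, {k: round(v, 2) for k, v in cover.items()})
```

The bridge never fills more than 30% of any pixel, and the half-pixel shift turns one pure court pixel into four pixels that are each only a quarter court, which is why small structures blend into their surroundings.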

Higher-resolution commercial imagery would catch more. It is not free. The point of building on Sentinel and AlphaEarth is the zero marginal cost per project. Once you accept that ceiling, the same stack keeps working.

What this is, what it is not

This is a working pattern for civic-tech satellite projects when the goal is reproducibility, a small per-project cost, and a defensible publication boundary. It is not the stack you reach for if you are doing high-resolution single-site verification (commercial imagery and a domain-tuned CNN), or active monitoring with sub-day latency (commercial constellations and an alerting pipeline), or anything that needs to identify individual people (this stack is explicitly designed not to).

Repos: ghostwatch, solar-map-ph, floodwatch-ph.