# BincioActivity — Context for Claude
## What this project is
BincioActivity is a federated, open-source, self-hosted activity stats platform
(think personal Strava). Two-stage pipeline:
1. **`bincio extract`** (Python): GPX/FIT/TCX → BAS JSON data store
2. **`bincio render`** (Astro/Node): BAS data store → static website
The BAS (BincioActivity Schema) JSON files are the federation protocol.
Anyone can publish their data as BAS JSON and others can include it.
## Key design decisions
- **No database, no server** — everything is static files
- **Python with uv** for the extract stage
- **Astro + Svelte + Tailwind + MapLibre GL + Observable Plot** for the site
- **Haversine** (not geopy) for distance calculations (10x faster)
- **Worker initializer pattern** for ProcessPoolExecutor — large shared data
(strava_lookup dict, known_hashes frozenset) is sent once per worker via
`initializer=`, not once per task
- **BAS activity IDs** always use UTC with Z suffix for URL safety
- **TCX files** from Garmin use both `http://` and `https://` namespace URIs —
parser handles both
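
The worker initializer pattern above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`init_worker`, `process_file`, `extract_all`) and the shape of the returned dict are assumptions; only `strava_lookup`, `known_hashes`, and the use of `initializer=` come from the notes above.

```python
from concurrent.futures import ProcessPoolExecutor

# Per-process globals, filled once by the initializer (names are illustrative).
_strava_lookup: dict = {}
_known_hashes: frozenset = frozenset()


def init_worker(strava_lookup, known_hashes):
    """Runs once in each worker process; stores the shared data in module globals."""
    global _strava_lookup, _known_hashes
    _strava_lookup = strava_lookup
    _known_hashes = known_hashes


def process_file(filename, file_hash):
    """Per-task work: only these two small arguments are pickled for each call."""
    if file_hash in _known_hashes:
        return None  # exact duplicate, skip
    meta = _strava_lookup.get(filename, {})
    return {"file": filename, "title": meta.get("name")}


def extract_all(files, strava_lookup, known_hashes, workers=4):
    """files: list of (filename, file_hash) pairs, fanned out over primed workers."""
    with ProcessPoolExecutor(
        max_workers=workers,
        initializer=init_worker,
        initargs=(strava_lookup, known_hashes),  # sent once per worker, not per task
    ) as pool:
        names, hashes = zip(*files) if files else ((), ())
        return [r for r in pool.map(process_file, names, hashes) if r is not None]
```

On platforms that start workers with `spawn`, `extract_all` should be called from under an `if __name__ == "__main__":` guard so the module can be re-imported safely in each worker.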
## User's data
- Source: `~/src/cycling_data_davide/`
- `activities/` — Strava export (GPX, FIT, TCX, all with .gz variants)
- `Karoo_2026/` — recent Karoo device FIT files
- `Karoo/` — older Karoo FIT files
- `activities.csv` — Strava metadata (names, descriptions, gear)
- Extracted output: `~/bincio_data/` (or `/tmp/bincio_test/` for testing)
- ~3,200 input files → ~2,082 unique activities after dedup
- Date range: 2014–2026
## Project structure
```
bincio/                        Python package
  extract/
    models.py                  DataPoint, ParsedActivity, LapData
    parsers/                   GPX, FIT, TCX parsers + factory
    sport.py                   sport name normalisation
    metrics.py                 haversine-based stats computation (single pass)
    timeseries.py              downsample to 1Hz, build BAS timeseries object
    simplify.py                RDP track simplification → GeoJSON
    dedup.py                   exact (hash) + near-duplicate detection
    strava_csv.py              Strava activities.csv importer
    writer.py                  BAS JSON + GeoJSON writer
    config.py                  extract_config.yaml loader
    cli.py                     `bincio extract` CLI
  render/
    cli.py                     `bincio render` CLI (symlinks data, runs astro build/dev)
schema/
  bas-v1.schema.json           JSON Schema for BAS
  SCHEMA.md                    Human-readable BAS spec
site/                          Astro project
  src/
    layouts/Base.astro
    pages/
      index.astro              Activity feed (loads index.json client-side)
      activity/[id].astro      Single activity (SSG, loads detail JSON client-side)
      stats/index.astro        Heatmap + year totals
    components/
      ActivityFeed.svelte      Card grid, sport filter, pagination
      ActivityDetail.svelte    Map + stats + charts wrapper
      ActivityMap.svelte       MapLibre GL (gradient track, linked hover dot)
      ActivityCharts.svelte    Observable Plot (elevation/speed/HR/cadence tabs)
      StatsView.svelte         Yearly heatmap + totals
    lib/
      types.ts                 BAS TypeScript types
      format.ts                formatDistance, formatDuration, sportIcon, etc.
```
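
For orientation, here is a rough sketch of what a haversine-based, single-pass computation like the one attributed to `metrics.py` might look like. The function names, the `(lat, lon, elevation)` tuple layout, and the Earth radius constant are assumptions for illustration; the real module may track more fields and filter elevation noise.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin().
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def track_stats(points):
    """Single pass over (lat, lon, elevation_m) tuples: distance and elevation gain."""
    distance = gain = 0.0
    prev = None
    for lat, lon, ele in points:
        if prev is not None:
            distance += haversine_m(prev[0], prev[1], lat, lon)
            gain += max(ele - prev[2], 0.0)  # only count climbing
        prev = (lat, lon, ele)
    return {"distance_m": distance, "elevation_gain_m": gain}
```

Keeping everything in one loop means each track point is visited once, which matters when processing thousands of files.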
## How to run
```bash
# Extract
cd ~/src/bincio_activity
uv run bincio extract --input ~/src/cycling_data_davide/activities --output /tmp/bincio_test
# Site dev server
cd site
ln -sf /tmp/bincio_test public/data # symlink data
BINCIO_DATA_DIR=/tmp/bincio_test npm run dev
# Tests
uv run pytest
```
## MapLibre GL + Vite/Astro — known gotchas
Learnt the hard way during debugging (March 2026):
- **`maplibregl.workerUrl = ...` is the v3 API and silently no-ops in v4+.**
The v5 API is `maplibregl.setWorkerUrl(url)`, but you don't need it at all in a
normal Vite environment — MapLibre handles the blob worker automatically.
- **`optimizeDeps: { exclude: ['maplibre-gl'] }` breaks tile loading.**
It prevents Vite from converting MapLibre's UMD bundle to ESM. The UMD bundle
uses AMD `define()` internally; served raw, the tile worker blob fails silently →
black map, no tiles. The correct setting is `include: ['maplibre-gl']`.
- **`build.target: 'es2022'` (and `optimizeDeps.esbuildOptions.target`) is required.**
MapLibre's dependencies use ES2022 class field syntax. If esbuild downgrades it,
helpers like `__publicField` aren't available inside the serialised worker blob
scope → tile loading fails. This is a known upstream issue (maplibre-gl-js #6680).
- **Use static imports, not dynamic `await import('maplibre-gl')`, when possible.**
With `client:only="svelte"` in Astro, SSR never runs for the component so there is
no `window is not defined` risk. Static import lets Vite pre-bundle correctly.
- **Use `client:only="svelte"` (not `client:load`) for the activity detail page.**
`client:load` does SSR + hydration; complex interactive components with MapLibre
can hit hydration mismatch issues. `client:only` mounts fresh on the client only.
- **MapLibre v5 requires explicit `center` and `zoom` in the Map constructor.**
v4 silently defaulted to `center: [0,0], zoom: 0`. v5 leaves internal projection
state undefined → `Cannot read properties of undefined (reading 'lng')` crashes
on any operation that touches coordinates (markers, resize, render). Always pass
`center` and `zoom` even if you plan to `fitBounds` later.
- **MapLibre v5 requires `setLngLat()` on markers before `.addTo(map)`.**
v4 tolerated markers without coordinates. v5 calls `Marker._update()` inside
`addTo()`, which needs valid lngLat → same `'lng'` crash. Set a dummy `[0, 0]`
if the real position arrives later (e.g. hover markers).
## Observable Plot — known gotchas
- **Curve names are hyphenated, not camelCase.**
Use `"monotone-x"`, not `"monotoneX"`. Plot uses its own curve name registry
(not raw d3 identifiers). Wrong names throw `unknown curve` at runtime.

The working `astro.config.mjs` Vite section for the MapLibre settings above:
```js
vite: {
  optimizeDeps: {
    include: ['maplibre-gl'],
    esbuildOptions: { target: 'es2022' },
  },
  build: { target: 'es2022' },
},
```
## StatsView heatmap — colour intensity scaling
Two approaches have been tried. The **active one is percentile-based** (preferred for now).
### Option A — Linear / max-relative (simpler, currently inactive)
```ts
$: maxDailyKm = Math.max(...[...byDate.values()].map(v => v / 1000), 1);
// inside cellColors loop:
const km = total / 1000;
const intensity = Math.min(0.12 + (km / maxDailyKm) * 0.88, 1.0);
```
- Busiest day = full brightness; all others scale linearly against it.
- Intuitive: you can visually read "this day was ~50% of my biggest day".
- Downside: one outlier (e.g. a 250 km day) compresses everything else into
near-darkness. Cross-sport comparison is unfair (10 km run vs 10 km cycling
look very different even when filtered to a single sport).
- Legend shows actual max km: `More (X km max)`.
### Option B — Percentile rank (active)
```ts
$: sortedDaily = [...byDate.values()].sort((a, b) => a - b);

// Fraction of entries in `sorted` that are <= value (binary search for the upper bound).
function pctRank(value: number, sorted: number[]): number {
  if (!sorted.length) return 0;
  let lo = 0, hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo / sorted.length;
}

// inside cellColors loop:
const intensity = 0.12 + pctRank(total, sortedDaily) * 0.88;
```
- Each day is ranked against all active days. With `n` active days, the laziest
one gets intensity of about 0.12 + 0.88/n (just above the 0.12 floor) and the
busiest gets 1.0. The colour scale spreads evenly regardless of km gaps.
- GitHub-contribution-graph style: easy to see "busy vs quiet" relative to
your own habits.
- Downside: absolute effort is not visible. A 5 km walk and a 200 km ride can
look the same if they're both 95th-percentile days for their respective sports.
- Legend says `More (percentile · max X km)` to hint at both dimensions.
### Shared infrastructure
- Blended colours: in "All" sport view, each cell's RGB is a weighted average
of sport colours by distance that day.
- `applyIntensity(hex, t)`: lerps from zinc-800 (#27272a = 39,39,42) to the
target colour, so dim cells fade into the background rather than going black.
- `$: cellColors = Map<string, string>` — precomputed reactively so Svelte
detects the dependency change when the sport filter or scale method changes
(plain function calls with static args don't trigger Svelte re-renders).
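
The colour arithmetic described above is easy to check by hand. The following Python sketch only illustrates the maths; the real helpers live in the Svelte component, and the function names and the sport colours used in the usage note are hypothetical.

```python
ZINC_800 = (39, 39, 42)  # #27272a, the background that dim cells fade into


def hex_to_rgb(hex_colour):
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#" + "".join(f"{round(c):02x}" for c in rgb)


def blend_by_distance(distance_by_colour):
    """Weighted average of sport colours; weights are that day's metres per sport.

    Assumes at least one sport with a non-zero distance.
    """
    total = sum(distance_by_colour.values())
    channels = [0.0, 0.0, 0.0]
    for colour, metres in distance_by_colour.items():
        for i, c in enumerate(hex_to_rgb(colour)):
            channels[i] += c * metres / total
    return rgb_to_hex(channels)


def apply_intensity(hex_colour, t):
    """Linear interpolation from zinc-800 (t=0) to the target colour (t=1)."""
    target = hex_to_rgb(hex_colour)
    return rgb_to_hex(b + (c - b) * t for b, c in zip(ZINC_800, target))
```

For example, a day with 3 km under a red sport colour and 1 km under a blue one blends to `#bf0040`, and `apply_intensity(colour, 0)` always returns the zinc-800 background.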
## Activity sidecar edits — design spec
Users edit activities via **sidecar markdown files** that live in the `edits/` subdirectory of the data dir, separate from the BAS JSON.
No database, no server: consistent with the project's static-files-only philosophy.
### File naming
```
~/bincio_data/
  2024-05-15T10:30:00Z_cycling.json      ← immutable extract output (never touched)
  edits/
    2024-05-15T10:30:00Z_cycling.md      ← user edits (sidecar)
```
Same stem as the JSON, `.md` extension. `bincio extract` never writes `.md` files,
so re-running extract is always safe and will never clobber user edits.
### Sidecar format
YAML frontmatter + optional Markdown body:
```markdown
---
title: "Epic climb up Monte Grappa"
sport: cycling # override detected sport
hide_stats: [cadence] # suppress specific stat panels in detail view
highlight: true # pin/feature in feed (shown first, maybe badged)
private: false # exclude from public feed
gear: "Trek Domane" # freeform gear note
---
Rode with Marco and Giulia. Legs felt great after the rest week...
```
- All frontmatter keys are optional; omitting a key means "keep the extracted value"
- The Markdown body becomes the activity's `description`, rendered as HTML in the detail page
- `hide_stats` takes stat panel names: `elevation`, `speed`, `heart_rate`, `cadence`, `power`
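
To make the format concrete, here is a dependency-free sketch of splitting a sidecar into frontmatter fields and body. It is an assumption-laden stand-in: a real implementation would more likely use a YAML library, and this hand-rolled parser (`parse_sidecar` and its helpers are hypothetical names) only understands the value shapes shown in the example above: quoted or bare strings, `true`/`false`, inline lists, and trailing `#` comments.

```python
def _parse_scalar(raw):
    """Handle only the value shapes shown in the sidecar example."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        return [_parse_scalar(item) for item in raw[1:-1].split(",") if item.strip()]
    if raw in ("true", "false"):
        return raw == "true"
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]
    return raw


def _strip_comment(value):
    """Drop a trailing ' # comment' unless the '#' sits inside quotes."""
    quote = None
    for i, ch in enumerate(value):
        if ch in "\"'":
            quote = None if quote == ch else (quote or ch)
        elif ch == "#" and quote is None and (i == 0 or value[i - 1].isspace()):
            return value[:i]
    return value


def parse_sidecar(text):
    """Split '---' frontmatter from the Markdown body; return (fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text.strip()
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip() and not key.lstrip().startswith("#"):
            fields[key.strip()] = _parse_scalar(_strip_comment(value))
    return fields, "\n".join(lines[end + 1:]).strip()
```

A file without frontmatter is treated as body-only, which matches the rule that every key is optional.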
### Where overrides are applied: the render stage
The **render stage** (`bincio render`) is the right place — not extract, not the browser.
- Extract → clean BAS JSON (immutable)
- Render → merges sidecars → Astro build consumes enriched data
A `bincio.render.merge` module walks the data dir, finds `*.md` sidecars,
and produces either enriched JSON files or a separate `overrides/index.json`
that Astro reads at build time. The site never needs to fetch a `.md` file
at runtime — all merging is build-time, keeping the static-first guarantee.
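
The merge step itself could be as small as the sketch below. The `bincio.render.merge` module does not exist yet, so everything here is hypothetical: the function names, the assumption that BAS activities carry `title`, `sport` and `gear` fields under those names, and the choice to pass `hide_stats`, `highlight` and `private` straight through for the site to interpret.

```python
import copy

# Frontmatter keys that replace a same-named BAS field (field names are assumed).
OVERRIDE_KEYS = ("title", "sport", "gear")
# Keys that only exist in sidecars and are passed through for the site to use.
PASSTHROUGH_KEYS = ("hide_stats", "highlight", "private")


def merge_activity(activity, fields, body):
    """Return an enriched copy; the extracted activity dict is never mutated."""
    merged = copy.deepcopy(activity)
    for key in OVERRIDE_KEYS + PASSTHROUGH_KEYS:
        if key in fields:  # omitted key means "keep extracted value"
            merged[key] = fields[key]
    if body:
        merged["description"] = body
    return merged


def merge_all(activities, sidecars):
    """activities: {id: dict}; sidecars: {id: (fields, body)} for edited IDs only."""
    return {
        aid: merge_activity(act, *sidecars.get(aid, ({}, "")))
        for aid, act in activities.items()
    }
```

Working on a deep copy keeps the extract output immutable, in line with the split between the two stages.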
### Federation angle
Sidecars work for *remote* activities too: if you include someone else's BAS feed,
you can write local `.md` sidecars for their activity IDs. Your render stage applies
your overrides on top of their data. This is a natural extension of the local case.
### Editing UX: `bincio edit --serve`
A separate FastAPI server (`bincio edit --serve`, default port 4041) handles all writes.
The static site and Astro are untouched — no hybrid mode, no dead-code API routes in prod.
**How it works:**
```
bincio edit --serve --data ~/bincio_data # starts on :4041
```
- Serves a bundled Svelte UI (single compiled HTML, reuses existing Svelte investment)
- `GET /api/activity/{id}` — returns merged BAS JSON + existing sidecar fields
- `POST /api/activity/{id}` — writes `edits/{id}.md` to the data dir
- `POST /api/activity/{id}/images` — multipart upload → `edits/images/{id}/{filename}`
- The Astro dev server's file watcher picks up `.md` writes → incremental rebuild
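
Leaving the FastAPI wiring aside, the core of the `POST /api/activity/{id}` handler is writing `edits/{id}.md`. A possible sketch, using only the standard library; `write_sidecar` and `FRONTMATTER_KEYS` are hypothetical names, and serialising values with `json.dumps` relies on JSON scalars and arrays also being valid YAML flow syntax.

```python
import json
from pathlib import Path

FRONTMATTER_KEYS = ("title", "sport", "hide_stats", "highlight", "private", "gear")


def write_sidecar(data_dir, activity_id, fields, body=""):
    """Serialise the edit form into edits/{id}.md and return the written path."""
    lines = ["---"]
    for key in FRONTMATTER_KEYS:
        if key in fields:
            # JSON output such as "text", true or ["cadence"] parses as YAML too.
            lines.append(f"{key}: {json.dumps(fields[key], ensure_ascii=False)}")
    lines.append("---")
    path = Path(data_dir) / "edits" / f"{activity_id}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n" + body.strip() + "\n", encoding="utf-8")
    return path
```

Note that activity IDs contain colons, which are fine in filenames on Linux and macOS but not on Windows.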
**Edit UI features:**
- Title text input (pre-filled from BAS JSON)
- Sport dropdown (pre-filled, shows all known sport types)
- Markdown textarea for description, with minimal toolbar (bold, italic, link, image insert)
- Live markdown preview panel
- `hide_stats` checkbox group: elevation, speed, heart_rate, cadence, power
- `highlight` toggle (feature in feed)
- `private` toggle (suppress from feed at render time)
- Image drag-and-drop zone → uploads to `edits/images/{id}/`, inserts `![]()` into textarea
- Save button → POST to API → success toast
**Workflow (typical):**
1. User browses the Astro dev server on :4040
2. Activity detail page has an "Edit" button (rendered only when `PUBLIC_EDIT_URL` env var is set)
3. Button links to `:4041/edit/{id}` — opens the FastAPI-served edit UI
4. User fills in form, saves → sidecar written → Astro rebuilds → refreshing :4040 shows changes
The `PUBLIC_EDIT_URL` env var in `.env` controls whether the Edit button appears;
leave it unset for production builds, set to `http://localhost:4041` for local dev.
### Image storage
```
~/bincio_data/
edits/
2024-05-15T10:30:00Z_cycling.md
images/
2024-05-15T10:30:00Z_cycling/
col-summit.jpg
group-photo.jpg
```
Images are referenced in the markdown body with relative paths: `![Summit](col-summit.jpg)`.
The render stage resolves relative image paths against `edits/images/{id}/` and copies them
to `site/public/images/activities/{id}/` so they're served from the static site.
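
One way the render stage might resolve and copy images is sketched below. The function name `publish_images` is hypothetical, and rewriting the Markdown link to an absolute `/images/activities/{id}/...` URL is an assumption about how the site would reference the copied files; the source and destination directories follow the layout described above.

```python
import re
import shutil
from pathlib import Path

IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")  # ![alt](target)


def publish_images(data_dir, public_dir, activity_id, body):
    """Copy locally referenced images and return the body with rewritten URLs."""
    src_dir = Path(data_dir) / "edits" / "images" / activity_id
    dst_dir = Path(public_dir) / "images" / "activities" / activity_id

    def rewrite(match):
        alt, target = match.group(1), match.group(2)
        source = src_dir / target
        # Leave absolute URLs, nested paths and missing files untouched.
        if "/" in target or not source.is_file():
            return match.group(0)
        dst_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dst_dir / target)
        return f"![{alt}](/images/activities/{activity_id}/{target})"

    return IMAGE_REF.sub(rewrite, body)
```

Here `public_dir` would be `site/public`, so the copied files end up under `site/public/images/activities/{id}/`.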
### Decided
- **Sidecar location**: `edits/` subdirectory (not co-located with JSON) — cleaner, easier to
backup/sync just your customisations independently of the extracted data
- **`private: true`**: suppresses from `index.json` at render time (not client-side hide) —
safer for public hosting
- **`highlight`**: visual badge in feed + sorted before non-highlighted activities
- **Edit UI**: `bincio edit --serve` FastAPI server (Option B) — not integrated into Astro
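
The `private` and `highlight` decisions translate into a small filter-and-sort step when the feed index is built. This is a sketch under assumptions: `build_feed` is a hypothetical name, each entry is assumed to be a dict with an `id` key, and "newest first" within each group is a guess at the intended ordering.

```python
def build_feed(activities):
    """Drop private activities, then list highlighted ones first, newest first."""
    public = [a for a in activities if not a.get("private", False)]
    # IDs start with a UTC timestamp, so sorting by ID sorts by start time.
    newest_first = sorted(public, key=lambda a: a["id"], reverse=True)
    # sorted() is stable, so newest-first order is kept inside each group.
    return sorted(newest_first, key=lambda a: not a.get("highlight", False))
```

Because private entries are removed before `index.json` is written, they never reach the published site at all.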
## Known issues / next steps
- `bincio render` Python CLI is a stub — site is built via `npm run build` directly
- Activity IDs in existing test data still use `+0000` format (pre-fix); re-run extract to get `Z` format
- Some activities appear with both untitled and titled IDs (near-dedup timing race)
- Stats page heatmap month labels are embedded in the week-column flex grid (fixed March 2026); `getWeeks` uses `localISO()` not `toISOString()` to avoid UTC/local date mismatch
- Federation (remote data sources) not yet implemented in site
- Friends pages (`/friends/{handle}/`) not yet implemented
- `bincio render` should automate: symlink data → `astro build`
- The `site/.env` file is gitignored — document the setup for new users
- Add `--workers` benchmark: on 8 cores, ~7 min for 3,200 activities first run
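
For reference, the `Z`-suffixed ID format mentioned above can be produced as in this sketch. The `{timestamp}_{sport}` layout is inferred from the example filenames elsewhere in this file, and `activity_id` is a hypothetical helper name, not necessarily what the extract stage calls it.

```python
from datetime import datetime, timedelta, timezone


def activity_id(start, sport):
    """Build an ID like 2024-05-15T10:30:00Z_cycling from an aware datetime."""
    if start.tzinfo is None:
        raise ValueError("start time must be timezone-aware")
    utc = start.astimezone(timezone.utc)
    # strftime with a literal 'Z' avoids the '+0000' / '+00:00' offset forms.
    return f"{utc.strftime('%Y-%m-%dT%H:%M:%SZ')}_{sport}"


# A ride that started at 12:30 local time in a UTC+2 zone:
local_start = datetime(2024, 5, 15, 12, 30, tzinfo=timezone(timedelta(hours=2)))
example_id = activity_id(local_start, "cycling")
```

Converting to UTC first means the same activity gets the same ID regardless of the timezone recorded by the device.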
## What "good" looks like (not yet done)
- [ ] `bincio render` Python CLI wraps `astro build` properly
- [ ] Friends/federation pages in site
- [ ] Personal records page
- [ ] Activity search / full-text filter in feed
- [ ] Map thumbnail in activity cards (SVG path from GeoJSON)
- [ ] GitHub Actions template for auto-publish
- [ ] Karoo/Garmin Connect importers beyond Strava
- [ ] `bincio.render.merge` module: walk `edits/`, parse sidecars, produce enriched data for Astro
- [ ] `bincio render --watch` incremental rebuild on sidecar changes
- [ ] Sidecar `.md` format: title, sport, description, hide_stats, highlight, private, images
- [ ] `bincio edit --serve` FastAPI server with Svelte edit UI (port 4041)
- [ ] Edit button on activity detail pages (visible when `PUBLIC_EDIT_URL` env var set)
- [ ] Image upload → `edits/images/{id}/`, render stage copies to `public/images/activities/{id}/`