From 77e7b1dbec1c27365f9ac1cdb6ae315c0baa62f9 Mon Sep 17 00:00:00 2001
From: Davide Scaini
Date: Mon, 30 Mar 2026 11:43:01 +0200
Subject: [PATCH] data ingestion plan

---
 CLAUDE.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

diff --git a/CLAUDE.md b/CLAUDE.md
index 96df3dc..e320ced 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -489,6 +489,119 @@ to power-having activities, pull their `mmp` arrays, take element-wise max per s
 7. `AthleteDrawer.svelte` — zones + gear editing form
 8. Season config in `extract_config.yaml` / `edits/athlete.yaml`
 
+## Data ingestion — design plan
+
+How activity data gets into BincioActivity. Three orthogonal vectors.
+
+### Vector 1 — Web file upload (extends existing edit server)
+
+The lowest-friction path: drag a FIT/GPX/TCX file onto the site, it appears immediately.
+
+```
+POST /api/upload multipart FIT/GPX/TCX
+→ saves to staging dir
+→ bincio extract on just that file
+→ merge_all()
+→ returns { id, redirect: "/activity/{id}/" }
+```
+
+An "Upload activity" button in the nav (gated behind `PUBLIC_EDIT_URL` like the edit drawer).
+No CLI needed. Preserves static-site output — the server only exists in local editing mode.
+
+### Vector 2 — Platform importers
+
+#### `bincio import strava` — pull your Strava history
+
+```bash
+bincio import strava \
+  --client-id 12345 \
+  --client-secret abc... \
+  --output ~/bincio_data \
+  --since 2024-01-01  # optional, default: all-time
+```
+
+**How Strava API access works:**
+
+Every Strava user can register an API app at `strava.com/settings/api` — no review,
+no approval, no fees. Fill in a name, website (`localhost` is fine), and callback
+domain (`localhost` for local use). You instantly get a Client ID and Client Secret.
+
+Strava's "developer" label is misleading: formal review is only required for
+commercial apps used by *other* people.
+For a self-hosted personal tool, each user brings their own credentials and authenticates
+their own account. Rate limits are generous for personal use: **100 requests / 15 min, 1000 / day**.
+
+The importer:
+1. Opens a local OAuth2 callback server (like `gh auth login`)
+2. Pops a browser to `strava.com/oauth/authorize?scope=activity:read_all`
+3. User clicks Authorize → callback receives the code → exchanges for tokens
+4. Tokens saved to `~/.config/bincio/strava_tokens.json`
+5. Fetches paginated activity list → for each, fetches streams (lat/lng, time,
+   altitude, HR, cadence, power, velocity) → converts to BAS JSON
+6. Idempotent: existing IDs (matched by Strava activity ID embedded in BAS metadata)
+   are skipped. Safe to re-run for incremental sync.
+
+Strava streams give the same data as FIT files at ~1 Hz (GPS, power meter, HR strap).
+
+#### `bincio import garmin` — Garmin Connect
+
+No official public API. Options:
+- **`garminconnect` Python library** — unofficial but widely used (same approach as
+  tapiriik, garmin-connect-export). Works with email/password or session cookies.
+- **FIT file sync** — Garmin Express / Tapiriik sync FIT files to a local folder;
+  `bincio extract` picks them up normally. Simplest.
+
+#### Watch mode — for ongoing device sync
+
+```bash
+bincio extract --watch ~/Dropbox/Garmin/Activities --output ~/bincio_data
+```
+
+Watches a directory for new FIT/GPX/TCX files (using `watchfiles` or `inotify`).
+New file dropped → auto-extract → `merge_all()` → site reflects it on next reload.
+Zero friction for users who already sync files from Garmin/Karoo/Wahoo to a folder
+via Dropbox, Syncthing, or Garmin Express.
+
+#### Strava webhook — real-time push (advanced)
+
+```bash
+bincio edit --data-dir ~/bincio_data --webhook-strava
+```
+
+Registers a Strava webhook subscription. When you finish a ride, Strava pushes a
+notification → server fetches streams → extract → merge.
+Requires a **publicly accessible URL** (works with Tailscale, a VPS, or ngrok). Not needed for
+most self-hosters; polling via `bincio import strava --since yesterday` is simpler.
+
+### Vector 3 — Federation (pull remote BAS feeds)
+
+The cleanest "data in from the web" path for the self-hosted model:
+anyone who publishes an `index.json` at a public URL is a data source.
+
+```yaml
+# extract_config.yaml
+sources:
+  - url: https://alice.example.com/data/index.json
+    handle: alice
+  - url: https://bob.example.com/data/index.json
+    handle: bob
+```
+
+`bincio render` fetches remote index files at build time, merges them into the
+site. No API keys, no OAuth. Local `.md` sidecars can annotate remote activities.
+Not yet implemented — see friends/federation items in the checklist below.
+
+### Implementation priority
+
+1. **Web file upload** — trivial to build, highest immediate UX value
+2. **`bincio import strava`** — covers historical migration and incremental sync;
+   most cyclists already have years of data there
+3. **Watch mode** — covers ongoing FIT-file-based workflows (Karoo, Garmin)
+4. **Garmin Connect importer** — second most common platform
+5. 
**Federation** — longer term; enables the "personal Strava" social layer
+
+---
+
 ## Known issues / next steps
 
 - `bincio render` Python CLI is a stub — site is built via `npm run build` directly
@@ -512,6 +625,11 @@ to power-having activities, pull their `mmp` arrays, take element-wise max per s
 - [ ] Activity search / full-text filter in feed
 - [ ] Map thumbnail in activity cards (SVG path from GeoJSON)
 - [ ] GitHub Actions template for auto-publish
+- [ ] **Ingestion: web file upload** — `POST /api/upload` in edit server, drag-and-drop in nav
+- [ ] **Ingestion: `bincio import strava`** — OAuth2 + streams API, idempotent incremental sync
+- [ ] **Ingestion: `bincio extract --watch`** — directory watcher for ongoing FIT sync
+- [ ] **Ingestion: `bincio import garmin`** — garminconnect library or FIT folder sync
+- [ ] **Ingestion: federation** — `sources:` in config, remote BAS index pull at render time
 - [ ] Karoo/Garmin Connect importers beyond Strava
 - [x] `bincio.render.merge` — sidecar parser, `_merged/` output, private filter, highlight sort
 - [x] `bincio edit` FastAPI write API (GET/POST activity, image upload/delete, triggers merge)