data ingestion plan

7. `AthleteDrawer.svelte`: zones + gear editing form
8. Season config in `extract_config.yaml` / `edits/athlete.yaml`

## Data ingestion: design plan

How activity data gets into BincioActivity, through three independent ingestion vectors.
|
### Vector 1: Web file upload (extends the existing edit server)

The lowest-friction path: drag a FIT/GPX/TCX file onto the site and it appears immediately.
|
```
POST /api/upload   (multipart FIT/GPX/TCX)
  → saves to staging dir
  → bincio extract on just that file
  → merge_all()
  → returns { id, redirect: "/activity/{id}/" }
```
|
An "Upload activity" button in the nav, gated behind `PUBLIC_EDIT_URL` like the edit drawer. No CLI needed, and the static-site output is preserved: the upload server only exists in local editing mode.

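
A minimal sketch of the logic behind `POST /api/upload`, kept framework-free so it could sit inside a route of the existing FastAPI edit server. `handle_upload` is a hypothetical name, and `extract` and `merge_all` are passed in as callables whose signatures (a staged file path in, an activity ID out) are assumptions rather than the actual `bincio` API.

```python
import re
import uuid
from pathlib import Path
from typing import Callable

# File types the upload endpoint accepts, per the plan.
ALLOWED_SUFFIXES = {".fit", ".gpx", ".tcx"}


def handle_upload(
    filename: str,
    data: bytes,
    staging_dir: Path,
    extract: Callable[[Path], str],  # hypothetical: extracts one file, returns its activity ID
    merge_all: Callable[[], None],   # stand-in for the existing merge step
) -> dict:
    """Stage one uploaded file, extract it, re-merge, and build the JSON reply."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")

    # Never trust the client-supplied name: keep a sanitized stem only, and add
    # a random prefix so two uploads with the same name cannot overwrite each other.
    stem = re.sub(r"[^A-Za-z0-9_-]", "_", Path(filename).stem)[:64] or "upload"
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / f"{uuid.uuid4().hex[:8]}_{stem}{suffix}"
    staged.write_bytes(data)

    activity_id = extract(staged)  # extract just this file
    merge_all()                    # refresh the merged output
    return {"id": activity_id, "redirect": f"/activity/{activity_id}/"}
```

In the edit server, a route would read the multipart `UploadFile`, pass its name and bytes here, and turn `ValueError` into a 400 response. Renaming on the server side matters because otherwise the staging dir would be written with whatever name the browser sends.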
|
### Vector 2: Platform importers

#### `bincio import strava`: pull your Strava history
|
```bash
bincio import strava \
  --client-id 12345 \
  --client-secret abc... \
  --output ~/bincio_data \
  --since 2024-01-01    # optional; default: all-time
```

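
The `--since` flag would most naturally map onto the `after` parameter of Strava's `GET /athlete/activities` endpoint, which takes Unix epoch seconds and pages results with `page` and `per_page` (at most 200 per page). A sketch under that assumption: `since_to_after`, `iter_activities`, and the `get_page` callable that performs the authenticated HTTP request are all hypothetical, and treating `--since` dates as UTC midnight is a choice the plan does not specify.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Callable, Iterator, Optional

PER_PAGE = 200  # largest page size Strava allows


def since_to_after(since: Optional[str], today: Optional[date] = None) -> Optional[int]:
    """Map a --since value to Strava's `after` parameter (Unix seconds, UTC midnight)."""
    if since is None:
        return None  # no lower bound: all-time
    if since == "yesterday":
        day = (today or date.today()) - timedelta(days=1)
    else:
        day = date.fromisoformat(since)  # e.g. "2024-01-01"
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def iter_activities(get_page: Callable[[dict], list], after: Optional[int]) -> Iterator[dict]:
    """Yield activity summaries page by page from GET /athlete/activities."""
    page = 1
    while True:
        params = {"page": page, "per_page": PER_PAGE}
        if after is not None:
            params["after"] = after
        batch = get_page(params)  # hypothetical: performs the authenticated request
        yield from batch
        if len(batch) < PER_PAGE:
            return  # a short (or empty) page is the last one, so skip the extra request
        page += 1
```

Stopping on a short page saves one request per import, and every request counts against the rate limits discussed below.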
|
**How Strava API access works:**
|
Every Strava user can register an API app at `strava.com/settings/api`: no review, no approval, no fees. Fill in a name, a website (`localhost` is fine), and an authorization callback domain (`localhost` for local use), and you instantly get a Client ID and Client Secret.
|
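Strava's "developer" label is misleading: formal review is only required before an app can be used by *other* people. For a self-hosted personal tool, each user brings their own credentials and authenticates their own account. Rate limits are ample for personal use: **100 requests / 15 min, 1000 / day**. The daily cap still shapes the first import: every activity needs at least one streams request, so a history of a few thousand activities takes several days of runs (re-running simply continues, since already-imported activities are skipped).
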
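
Each activity costs at least one streams request, so the daily cap is what limits a long import and the importer has to throttle itself. A minimal sketch, assuming the importer calls `acquire()` before every API request (`RateLimiter` is a hypothetical helper, not part of any library). It enforces sliding windows, which is slightly stricter than Strava's own accounting (the 15-minute window resets at :00, :15, :30 and :45, the daily one at midnight UTC), so it should never trigger a 429 but may occasionally wait longer than strictly necessary.

```python
import time
from collections import deque
from typing import Callable, Deque, Sequence, Tuple

# (window in seconds, max requests), from the limits quoted above.
STRAVA_LIMITS: Sequence[Tuple[int, int]] = ((15 * 60, 100), (24 * 60 * 60, 1000))


class RateLimiter:
    """Client-side throttle that keeps every sliding window under its cap."""

    def __init__(self, limits: Sequence[Tuple[int, int]] = STRAVA_LIMITS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self.clock = clock
        self.sent: Deque[float] = deque()  # request times, oldest first
        self.horizon = max(window for window, _ in limits)

    def wait_time(self) -> float:
        """Seconds until one more request fits in every window (0.0 if it fits now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.horizon:
            self.sent.popleft()  # outside even the longest window
        wait = 0.0
        for window, limit in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= limit:
                # Room opens when recent[-limit] leaves the window; after that
                # only limit - 1 requests remain inside it.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a request is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.sent.append(self.clock())
```

A fuller version could also read the `X-RateLimit-Usage` header Strava returns on each response, which reflects usage from any other tool sharing the same app credentials.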
|
The importer:

1. Starts a local OAuth2 callback server (the same browser-based login pattern as `gh auth login`)
2. Opens a browser at `strava.com/oauth/authorize?scope=activity:read_all`
3. User clicks Authorize → the callback receives the code → exchanges it for tokens
4. Saves the tokens to `~/.config/bincio/strava_tokens.json`
5. Fetches the paginated activity list → for each activity, fetches streams (lat/lng, time, altitude, HR, cadence, power, velocity) → converts them to BAS JSON
6. Is idempotent: activities whose Strava activity ID is already embedded in existing BAS metadata are skipped, so it is safe to re-run for incremental sync

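
Steps 2 to 4 as code: a sketch using only the standard library, with the local callback server itself (for example an `http.server` handler that passes its request path to `code_from_callback`) left out. The endpoints and parameter names follow Strava's OAuth documentation; the function names are hypothetical. One detail the list above does not spell out: Strava access tokens expire after six hours, so the saved file must keep the `refresh_token`, and the importer should refresh before each run.

```python
import json
import time
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/oauth/token"  # POST target for token_request_body()
TOKEN_FILE = Path.home() / ".config" / "bincio" / "strava_tokens.json"


def authorize_url(client_id: str, redirect_uri: str) -> str:
    """Step 2: where the browser is sent."""
    return AUTHORIZE_URL + "?" + urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # e.g. http://localhost:8765/callback
        "response_type": "code",
        "approval_prompt": "auto",
        "scope": "activity:read_all",
    })


def code_from_callback(path: str) -> str:
    """Step 3: read the one-time code from the callback request path."""
    params = parse_qs(urlsplit(path).query)
    if "error" in params:  # e.g. the user denied access
        raise PermissionError(params["error"][0])
    return params["code"][0]


def token_request_body(client_id: str, client_secret: str,
                       code: Optional[str] = None,
                       refresh_token: Optional[str] = None) -> dict:
    """Form fields for TOKEN_URL: the first exchange, or a later refresh."""
    body = {"client_id": client_id, "client_secret": client_secret}
    if code is not None:
        body.update(code=code, grant_type="authorization_code")
    else:
        body.update(refresh_token=refresh_token, grant_type="refresh_token")
    return body


def save_tokens(tokens: dict, path: Path = TOKEN_FILE) -> None:
    """Step 4: persist the token response where only the owner can read it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(mode=0o600, exist_ok=True)
    path.chmod(0o600)  # also tightens a file left over from an earlier run
    path.write_text(json.dumps(tokens))


def needs_refresh(tokens: dict, now: Optional[float] = None, margin: int = 300) -> bool:
    """True when the access token expires within `margin` seconds."""
    now = time.time() if now is None else now
    return tokens["expires_at"] - margin <= now
```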
|
Strava streams give the same per-sample data as the original FIT file (GPS, power meter, HR strap), usually at ~1 Hz; laps and device details come from separate endpoints rather than the streams.

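
Steps 5 and 6 as a sketch, assuming streams are requested with `key_by_type=true`, which makes Strava return a dict keyed by stream type (`time`, `latlng`, `altitude`, `heartrate`, `cadence`, `watts`, `velocity_smooth`), each holding a `data` array. The output field names and the `strava_id` metadata key are hypothetical placeholders for whatever the BAS schema actually uses.

```python
from typing import Iterable, List

# Strava stream type -> per-sample field. The field names on the right are
# placeholders; the real BAS schema defines its own.
STREAM_FIELDS = {
    "time": "t",                 # seconds since start
    "altitude": "ele",           # metres
    "heartrate": "hr",           # bpm
    "cadence": "cad",            # rpm
    "watts": "power",            # W
    "velocity_smooth": "speed",  # m/s
}


def streams_to_samples(streams: dict) -> List[dict]:
    """Turn a key_by_type=true streams response into one dict per sample."""
    n = len(streams["time"]["data"])  # streams in one response share this length
    samples = []
    for i in range(n):
        sample = {field: streams[key]["data"][i]
                  for key, field in STREAM_FIELDS.items() if key in streams}
        if "latlng" in streams:
            sample["lat"], sample["lon"] = streams["latlng"]["data"][i]
        samples.append(sample)
    return samples


def ids_to_import(summaries: Iterable[dict], existing_meta: Iterable[dict]) -> List[int]:
    """Step 6: Strava activity IDs not already present in local BAS metadata."""
    seen = {m["strava_id"] for m in existing_meta if "strava_id" in m}
    return [a["id"] for a in summaries if a["id"] not in seen]
```

Only the IDs returned by `ids_to_import` need a streams request, which is what makes re-runs cheap.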
|
#### `bincio import garmin`: Garmin Connect
|
No official public API for personal use. Options:

- **`garminconnect` Python library**: unofficial but widely used (the same approach as tapiriik and garmin-connect-export). Works with email/password or session cookies.
- **FIT file sync**: Garmin Express / Tapiriik sync FIT files to a local folder, and `bincio extract` picks them up normally. Simplest.
|
#### Watch mode: for ongoing device sync
|
```bash
bincio extract --watch ~/Dropbox/Garmin/Activities --output ~/bincio_data
```
|
Watches a directory for new FIT/GPX/TCX files (using `watchfiles` or `inotify`). A new file dropped in → auto-extract → `merge_all()` → the site reflects it on the next reload. Zero friction for users who already sync files from Garmin/Karoo/Wahoo devices to a folder via Dropbox, Syncthing, or Garmin Express.

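
`watchfiles` and `inotify` deliver change events; the sketch below swaps in plain standard-library polling instead, which works anywhere (including network mounts, where inotify may not see remote changes) at the cost of a few seconds of latency. One detail matters for synced folders: Dropbox or Syncthing may still be writing a file when it first appears, so a file is only handed to extraction once its size is unchanged across two scans. `poll_once` and `watch` are hypothetical names, and `extract` and `merge_all` are injected stand-ins whose signatures are assumptions.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Set

SUFFIXES = {".fit", ".gpx", ".tcx"}


def poll_once(folder: Path, pending: Dict[Path, int], done: Set[Path],
              extract: Callable[[Path], None], merge_all: Callable[[], None]) -> List[Path]:
    """One scan: extract files whose size did not change since the previous scan."""
    ready = []
    for path in sorted(folder.iterdir()):
        if path in done or not path.is_file() or path.suffix.lower() not in SUFFIXES:
            continue  # also skips sync tools' temp names such as *.tmp
        size = path.stat().st_size
        if pending.get(path) == size:
            ready.append(path)    # stable across two scans: the sync has finished writing
        else:
            pending[path] = size  # new or still growing: look again next scan
    for path in ready:
        extract(path)
        done.add(path)
        del pending[path]
    if ready:
        merge_all()  # one merge per batch rather than one per file
    return ready


def watch(folder: Path, extract: Callable[[Path], None],
          merge_all: Callable[[], None], interval: float = 5.0) -> None:
    # Files already present are treated as handled; only new arrivals are extracted.
    done = {p for p in folder.iterdir() if p.is_file()}
    pending: Dict[Path, int] = {}
    while True:
        poll_once(folder, pending, done, extract, merge_all)
        time.sleep(interval)
```

Seeding `done` with existing files means anything that arrived while the watcher was stopped is skipped; a one-off `bincio extract` over the folder covers that case.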
|
#### Strava webhook: real-time push (advanced)
|
```bash
bincio edit --data-dir ~/bincio_data --webhook-strava
```
|
Registers a Strava webhook subscription. When you finish a ride, Strava pushes a notification → the server fetches the streams → extract → merge. Requires a **publicly accessible URL** that Strava's servers can reach (e.g. Tailscale Funnel, a VPS, or ngrok). Not needed for most self-hosters; polling via `bincio import strava --since yesterday` is simpler.

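
Strava's webhook protocol has two halves: a one-time GET that validates the callback URL (the server must echo `hub.challenge` back as JSON), and POST events that carry only IDs (`object_type`, `object_id`, `aspect_type`), never activity data, so the streams still have to be fetched afterwards. Strava expects each POST to be acknowledged with a 200 within two seconds, which is why this sketch only enqueues work. The handler names, the verify token value, and the queue-plus-worker design are assumptions.

```python
import queue
from typing import Optional, Tuple

VERIFY_TOKEN = "change-me"  # hypothetical: must match the token sent when subscribing

jobs: queue.Queue = queue.Queue()  # drained by a worker: fetch streams → extract → merge


def validate_subscription(query: dict) -> Optional[dict]:
    """GET callback: echo hub.challenge if the verify token matches, else None (answer 403)."""
    if query.get("hub.mode") == "subscribe" and query.get("hub.verify_token") == VERIFY_TOKEN:
        return {"hub.challenge": query["hub.challenge"]}
    return None


def handle_event(event: dict) -> Optional[Tuple[str, int]]:
    """POST callback: enqueue activity events and return immediately."""
    if event.get("object_type") != "activity":
        return None  # athlete events (e.g. deauthorization) are ignored in this sketch
    job = (event["aspect_type"], event["object_id"])  # aspect: create / update / delete
    jobs.put(job)
    return job
```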
|
### Vector 3: Federation (pull remote BAS feeds)
|
The cleanest "data in from the web" path for the self-hosted model: anyone who publishes an `index.json` at a public URL is a data source.
|
```yaml
# extract_config.yaml
sources:
  - url: https://alice.example.com/data/index.json
    handle: alice
  - url: https://bob.example.com/data/index.json
    handle: bob
```
|
`bincio render` fetches the remote index files at build time and merges them into the site. No API keys, no OAuth. Local `.md` sidecars can annotate remote activities. Not yet implemented; see the friends/federation items in the checklist below.

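
A sketch of the render-time pull, assuming each remote `index.json` holds an `activities` list whose entries have `id` and `started_at` fields; the real BAS index layout may differ, and `merge_sources` and the `owner` field are hypothetical. Two choices are worth keeping whatever the schema: remote IDs are prefixed with the source's `handle` so they stay distinct from local and other remote IDs (which also gives local `.md` sidecars a stable key to annotate), and an unreachable source is skipped instead of failing the build.

```python
import json
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def merge_sources(local: Iterable[dict], sources: Iterable[dict],
                  fetch: Callable[[str], dict] = fetch_json) -> List[dict]:
    """Merge remote index entries into the local ones, namespaced by handle."""
    merged = [dict(a, owner=None) for a in local]
    for src in sources:  # entries of `sources:` in extract_config.yaml
        try:
            remote = fetch(src["url"])
        except (OSError, ValueError) as exc:  # network error or invalid JSON
            print(f"skipping {src['handle']}: {exc}")  # one dead feed must not break the build
            continue
        for a in remote.get("activities", []):
            # "alice/<id>" keeps alice's IDs apart from local ones and from bob's.
            merged.append(dict(a, id=f"{src['handle']}/{a['id']}", owner=src["handle"]))
    return sorted(merged, key=lambda a: a.get("started_at", ""), reverse=True)
```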
|
### Implementation priority
|
1. **Web file upload**: small to build on top of the existing edit server; highest immediate UX value
2. **`bincio import strava`**: covers historical migration and incremental sync; most cyclists already have years of data there
3. **Watch mode**: covers ongoing FIT-file-based workflows (Karoo, Garmin)
4. **Garmin Connect importer**: second most common platform
5. **Federation**: longer term; enables the "personal Strava" social layer

---

## Known issues / next steps

- `bincio render` Python CLI is a stub: the site is built via `npm run build` directly

- [ ] Activity search / full-text filter in feed
- [ ] Map thumbnail in activity cards (SVG path from GeoJSON)
- [ ] GitHub Actions template for auto-publish
- [ ] **Ingestion: web file upload**: `POST /api/upload` in edit server, drag-and-drop in nav
- [ ] **Ingestion: `bincio import strava`**: OAuth2 + streams API, idempotent incremental sync
- [ ] **Ingestion: `bincio extract --watch`**: directory watcher for ongoing FIT sync
- [ ] **Ingestion: `bincio import garmin`**: garminconnect library or FIT folder sync
- [ ] **Ingestion: federation**: `sources:` in config, remote BAS index pull at render time
- [ ] Karoo/Garmin Connect importers beyond Strava
- [x] `bincio.render.merge`: sidecar parser, `_merged/` output, private filter, highlight sort
- [x] `bincio edit` FastAPI write API (GET/POST activity, image upload/delete, triggers merge)