diff --git a/.gitignore b/.gitignore index 968f5e8..dc37ea2 100644 --- a/.gitignore +++ b/.gitignore @@ -24,5 +24,9 @@ site/.astro/ bincio_data/ *.bincio_cache.json +# Local config / secrets (never commit) +.env +extract_config.yaml + # OS .DS_Store diff --git a/CHANGELOG.md b/CHANGELOG.md index bab8ae7..e9b630e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,33 @@ ## [Unreleased] — 2026-03-30 +### Data ingestion + +- **`bincio import strava`** — OAuth2 Strava importer (`bincio/import_/strava.py` + `bincio/import_/cli.py`) + - One-shot local OAuth2 callback server (port 8976); opens browser, receives code, exchanges for tokens + - Tokens saved to `~/.config/bincio/strava.json`; auto-refreshed on expiry (6h TTL) + - Fetches paginated activity list with `after=` timestamp for efficient incremental runs + - Per activity: `GET /activities/{id}/streams` → `_strava_to_parsed()` → `compute()` → `write_activity()` + - `_patch_from_summary()`: fills `None` metrics from Strava summary when sensors are missing (manual entries, indoor rides) + - Sync state persisted in `data_dir/_strava_sync.json` (imported IDs + last sync timestamp) + - Rate limit tracking via `X-RateLimit-Usage`; warns at 85% of 15-min window; auto-retries on 429 + - Credentials read from (in order): CLI flags → env vars → `extract_config.yaml` under `import.strava` + - Install: `uv sync --extra strava` + +- **Web file upload** — `POST /api/upload` in `bincio/edit/server.py` + - Accepts FIT/GPX/TCX (`.gz` variants too); 409 if activity already exists + - Runs full extract pipeline inline: `parse_file()` → `compute()` → `write_activity()` → `merge_all()` + - Staged to `data_dir/_uploads/` during processing; cleaned up in `finally` + - `↑` button in site nav, gated behind `PUBLIC_EDIT_URL`; drag-and-drop modal; auto-redirects on success + +- **`extract_config.yaml` is now gitignored** — safe to store credentials under `import.strava` + - `StravaConfig` dataclass added to `bincio/extract/config.py`; 
parsed from `import.strava:` block + - `extract_config.example.yaml` is the tracked template + +- **Theme-aware heatmap** (`StatsView.svelte`) — `applyIntensity()` now lerps from the correct + background colour in both dark (zinc-800 `#27272a`) and light (zinc-200 `#e4e4e7`) modes; + `emptyColor` and `baseRgb` reactive to `data-theme` via `MutationObserver` + ### Athlete page - **`/athlete` page** — three-tab layout: Power Curve · Records · Profile diff --git a/CHEATSHEET.md b/CHEATSHEET.md index 36b8518..9a85c34 100644 --- a/CHEATSHEET.md +++ b/CHEATSHEET.md @@ -3,13 +3,16 @@ ## Daily workflow ```bash -# 1. Drop new .fit / .gpx / .tcx files into your input dir, then: -uv run bincio extract +# Option A — local files (Karoo / Garmin / Wahoo) +uv run bincio extract # processes new/changed files, skips unchanged -# 2. Rebuild the site (merges any sidecar edits, then builds) +# Option B — pull from Strava (incremental; credentials in extract_config.yaml) +uv run bincio import strava # fetches only activities since last sync + +# Rebuild the site (merges any sidecar edits, then builds) uv run bincio render -# 3. Done — copy site/dist/ to your host +# Done — copy site/dist/ to your host ``` --- @@ -29,6 +32,47 @@ To force a full re-extract: `rm -rf ~/bincio_data && uv run bincio extract` --- +## Import from Strava + +```bash +# Install (one-time) +uv sync --extra strava + +# Add credentials to extract_config.yaml (gitignored — safe for secrets): +# import: +# strava: +# client_id: 12345 +# client_secret: your_secret + +# First run — opens browser for OAuth, then imports all activities: +uv run bincio import strava + +# Subsequent runs are incremental (only fetches since last sync): +uv run bincio import strava + +# Other options: +uv run bincio import strava --since 2025-01-01 # explicit date cutoff +uv run bincio import strava --reauth # force new OAuth flow +uv run bincio import strava --output ~/other_dir # override output dir +``` + +Credentials resolution order: +1. 
`--client-id` / `--client-secret` flags +2. `STRAVA_CLIENT_ID` / `STRAVA_CLIENT_SECRET` env vars +3. `import.strava.client_id` / `client_secret` in `extract_config.yaml` + +Tokens saved to `~/.config/bincio/strava.json` and auto-refreshed (6h TTL). +Sync state (imported IDs + last sync timestamp) in `data_dir/_strava_sync.json`. + +--- + +## File upload (web UI) + +When `PUBLIC_EDIT_URL` is set in `site/.env`, a `↑` button appears in the nav. +Drag a FIT/GPX/TCX onto the modal → the activity is extracted and appears immediately. + +--- + ## Render ```bash @@ -111,6 +155,8 @@ IDs are stable — safe to use in bookmarks and links. ## extract_config.yaml — key fields +This file is **gitignored** — copy from `extract_config.example.yaml` and add your credentials safely. + ```yaml owner: handle: yourname @@ -130,6 +176,11 @@ track: rdp_epsilon: 0.0001 # GPS simplification — larger = fewer points timeseries_hz: 1 # samples/sec in stored JSON (1 = 1 Hz) +import: + strava: + client_id: 12345 # from strava.com/settings/api + client_secret: abc # Authorization Callback Domain must be: localhost + athlete: max_hr: 182 # used for context; zones below are authoritative ftp_w: 280 # functional threshold power in watts @@ -249,12 +300,13 @@ print(len(others), 'total') | File | Purpose | |---|---| -| `extract_config.yaml` | Main config (input dirs, output dir, privacy) | -| `site/.env` | Site env vars (`BINCIO_DATA_DIR`, `PUBLIC_EDIT_URL`) — copy from `.env.example` | +| `extract_config.yaml` | Main config — input dirs, output dir, athlete zones, Strava credentials. **Gitignored.** Copy from `.example`. | +| `site/.env` | Site env vars (`BINCIO_DATA_DIR`, `PUBLIC_EDIT_URL`) — copy from `site/.env.example`. Gitignored. 
| | `SCHEMA.md` | BAS format specification | | `CLAUDE.md` | Dev notes, gotchas, design decisions | | `bincio/render/merge.py` | Sidecar overlay logic — `parse_sidecar`, `merge_all` | -| `bincio/edit/server.py` | FastAPI edit API — GET/POST activity, image upload | +| `bincio/edit/server.py` | FastAPI edit API — GET/POST activity, image upload, file upload (`POST /api/upload`) | +| `bincio/import_/strava.py` | Strava OAuth2 client + stream → BAS conversion | | `bincio/extract/sport.py` | Sport name normalisation + mapping | | `bincio/extract/metrics.py` | Distance, speed, HR, elevation computation | | `bincio/extract/parsers/fit.py` | FIT file parser | diff --git a/CLAUDE.md b/CLAUDE.md index 14d9e97..d34e1f1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -51,24 +51,32 @@ bincio/ Python package writer.py BAS JSON + GeoJSON writer config.py extract_config.yaml loader cli.py `bincio extract` CLI + import_/ + strava.py Strava API importer (OAuth2, streams → BAS JSON) + cli.py `bincio import strava` CLI render/ cli.py `bincio render` CLI (symlinks data, runs astro build/dev) + edit/ + server.py FastAPI write API (activity edits, image upload, file upload) + cli.py `bincio edit` CLI schema/ bas-v1.schema.json JSON Schema for BAS SCHEMA.md Human-readable BAS spec site/ Astro project src/ - layouts/Base.astro + layouts/Base.astro Nav (upload button + theme toggle), theme CSS vars pages/ index.astro Activity feed (loads index.json client-side) activity/[id].astro Single activity (SSG, loads detail JSON client-side) stats/index.astro Heatmap + year totals + athlete/index.astro MMP curve + athlete profile (planned) components/ ActivityFeed.svelte Card grid, sport filter, pagination ActivityDetail.svelte Map + stats + charts wrapper ActivityMap.svelte MapLibre GL (gradient track, linked hover dot) ActivityCharts.svelte Observable Plot (elevation/speed/HR/cadence/power tabs) - StatsView.svelte Yearly heatmap + totals + StatsView.svelte Yearly heatmap + click-to-pin tooltip + 
EditDrawer.svelte Slide-in activity editor lib/ types.ts BAS TypeScript types format.ts formatDistance, formatDuration, sportIcon, etc. @@ -77,14 +85,23 @@ site/ Astro project ## How to run ```bash -# Extract +# Extract from local files cd ~/src/bincio_activity uv run bincio extract --input ~/src/cycling_data_davide/activities --output /tmp/bincio_test +# Import from Strava (credentials in extract_config.yaml under import.strava) +uv sync --extra strava +uv run bincio import strava # first run opens browser for OAuth +uv run bincio import strava # subsequent runs are incremental + # Site dev server cd site ln -sf /tmp/bincio_test public/data # symlink data -BINCIO_DATA_DIR=/tmp/bincio_test npm run dev +npm run dev + +# Edit server (enables drawer + file upload in the site) +uv run bincio edit --data-dir ~/bincio_data # port 4041 +# Set PUBLIC_EDIT_URL=http://localhost:4041 in site/.env # Tests uv run pytest @@ -489,116 +506,125 @@ to power-having activities, pull their `mmp` arrays, take element-wise max per s 7. `AthleteDrawer.svelte` — zones + gear editing form 8. Season config in `extract_config.yaml` / `edits/athlete.yaml` -## Data ingestion — design plan +## Data ingestion How activity data gets into BincioActivity. Three orthogonal vectors. -### Vector 1 — Web file upload (extends existing edit server) +### Vector 1 — Web file upload ✓ -The lowest-friction path: drag a FIT/GPX/TCX file onto the site, it appears immediately. +Drag a FIT/GPX/TCX onto the site while the edit server is running — activity +appears immediately. 
+**Backend** (`bincio/edit/server.py`): ``` -POST /api/upload multipart FIT/GPX/TCX -→ saves to staging dir -→ bincio extract on just that file -→ merge_all() -→ returns { id, redirect: "/activity/{id}/" } +POST /api/upload multipart FIT/GPX/TCX (also .gz variants) +→ stages file to data_dir/_uploads/ +→ parse_file() → compute() → write_activity() → build_summary() +→ 409 if activity already exists (same timestamp = same ID) +→ updates index.json + merge_all() +→ returns { ok: true, id: "2024-05-15T103000Z" } +→ cleans up staged file in finally block ``` -An "Upload activity" button in the nav (gated behind `PUBLIC_EDIT_URL` like the edit drawer). -No CLI needed. Preserves static-site output — the server only exists in local editing mode. +**Frontend** (`site/src/layouts/Base.astro`): +- `↑` button in nav right cluster, only rendered when `PUBLIC_EDIT_URL` is set +- Modal with drag-and-drop zone + click-to-browse +- Auto-redirects to `/activity/{id}/` on success +- Escape / backdrop click closes modal -### Vector 2 — Platform importers +### Vector 2 — `bincio import strava` ✓ -#### `bincio import strava` — pull your Strava history +**`bincio/import_/strava.py`** + **`bincio/import_/cli.py`** + +Install: `uv sync --extra strava` ```bash -bincio import strava \ - --client-id 12345 \ - --client-secret abc... 
\ - --output ~/bincio_data \ - --since 2024-01-01 # optional, default: all-time +# First run (full sync — opens browser): +bincio import strava --client-id 12345 --client-secret abc --output ~/bincio_data + +# Subsequent runs (incremental — picks up from last sync automatically): +bincio import strava --client-id 12345 --client-secret abc + +# Explicit date range: +bincio import strava --client-id 12345 --client-secret abc --since 2025-01-01 + +# Force re-auth (rotate credentials or re-authorize): +bincio import strava --client-id 12345 --client-secret abc --reauth + +# Credentials via env vars (good for scripts): +export STRAVA_CLIENT_ID=12345 +export STRAVA_CLIENT_SECRET=abc +bincio import strava --output ~/bincio_data ``` -**How Strava API access works:** - -Every Strava user can register an API app at `strava.com/settings/api` — no review, -no approval, no fees. Fill in a name, website (`localhost` is fine), and callback -domain (`localhost` for local use). You instantly get a Client ID and Client Secret. +**Getting Strava API credentials (~2 minutes, no approval needed):** +1. Go to `strava.com/settings/api` +2. Create an application — name and website can be anything; set + **Authorization Callback Domain** to `localhost` +3. Paste Client ID and Client Secret into `extract_config.yaml`: + ```yaml + import: + strava: + client_id: 12345 + client_secret: your_secret_here + ``` + `extract_config.yaml` is gitignored — safe to store credentials there. Strava's "developer" label is misleading: formal review is only required for -commercial apps used by *other* people. For a self-hosted personal tool, each user -brings their own credentials and authenticates their own account. Rate limits are -generous for personal use: **100 requests / 15 min, 1000 / day**. +commercial apps that authenticate *other users*. For a personal self-hosted tool +you authenticate your own account — no review, no fees. 
+Rate limits: **100 req / 15 min, 1000 / day** (generous for personal use). -The importer: -1. Opens a local OAuth2 callback server (like `gh auth login`) -2. Pops a browser to `strava.com/oauth/authorize?scope=activity:read_all` -3. User clicks Authorize → callback receives the code → exchanges for tokens -4. Tokens saved to `~/.config/bincio/strava_tokens.json` -5. Fetches paginated activity list → for each, fetches streams (lat/lng, time, - altitude, HR, cadence, power, velocity) → converts to BAS JSON -6. Idempotent: existing IDs (matched by Strava activity ID embedded in BAS metadata) - are skipped. Safe to re-run for incremental sync. +**How the importer works:** -Strava streams give the same data as FIT files at ~1 Hz (GPS, power meter, HR strap). +*OAuth dance (first run):* +- Starts a one-shot local HTTP server on port 8976 +- Opens `strava.com/oauth/authorize?scope=activity:read_all` in the browser +- Receives the authorization code at `/callback` +- Exchanges code for access + refresh tokens +- Saves to `~/.config/bincio/strava.json` (keyed by client_id) +- Subsequent runs load saved tokens and refresh silently when expired (6h TTL) -#### `bincio import garmin` — Garmin Connect +*Sync loop:* +- Reads `data_dir/_strava_sync.json` for set of already-imported Strava IDs + and timestamp of last sync +- Uses Strava `after=` parameter for server-side filtering (efficient — + no need to scan all pages on incremental runs; 1-hour overlap to catch late saves) +- Per activity: `GET /activities/{id}/streams` → `_strava_to_parsed()` → + `compute()` → `_patch_from_summary()` → `write_activity()` → `build_summary()` +- Writes updated `index.json` + `_strava_sync.json` +- Calls `merge_all()` if `edits/` directory exists -No official public API. Options: -- **`garminconnect` Python library** — unofficial but widely used (same approach as - tapiriik, garmin-connect-export). Works with email/password or session cookies. 
-- **FIT file sync** — Garmin Express / Tapiriik sync FIT files to a local folder; - `bincio extract` picks them up normally. Simplest. +*Conversion details:* +- `sport_type` (or `type`) → `normalise_sport()` — same mapping as FIT/GPX +- Streams: `time` (s since start) + `latlng` + `altitude` + `heartrate` + + `cadence` + `watts` + `velocity_smooth` (m/s → km/h) → `DataPoint` list +- `source_hash`: `sha256("strava:{id}")` — stable, not file-content-based +- `_patch_from_summary()`: fills `None` metric fields (distance, duration, + elevation, HR, power) from the Strava activity summary for activities with + missing/sparse sensors or manual entries +- Rate limit: warns at 85% of 15-min window; auto-retries with 60s sleep on 429 -#### Watch mode — for ongoing device sync +### Vector 3 — Platform watch mode (planned) ```bash bincio extract --watch ~/Dropbox/Garmin/Activities --output ~/bincio_data ``` -Watches a directory for new FIT/GPX/TCX files (using `watchfiles` or `inotify`). -New file dropped → auto-extract → `merge_all()` → site reflects it on next reload. -Zero friction for users who already sync files from Garmin/Karoo/Wahoo to a folder -via Dropbox, Syncthing, or Garmin Express. +Directory watcher (`watchfiles` / `inotify`) for ongoing FIT sync from Karoo, +Garmin, Wahoo. New file → auto-extract → merge. Not yet implemented. -#### Strava webhook — real-time push (advanced) - -```bash -bincio edit --data-dir ~/bincio_data --webhook-strava -``` - -Registers a Strava webhook subscription. When you finish a ride, Strava pushes a -notification → server fetches streams → extract → merge. Requires a **publicly -accessible URL** (works with Tailscale, a VPS, or ngrok). Not needed for most -self-hosters; polling via `bincio import strava --since yesterday` is simpler. - -### Vector 3 — Federation (pull remote BAS feeds) - -The cleanest "data in from the web" path for the self-hosted model: -anyone who publishes a `index.json` at a public URL is a data source. 
+### Vector 4 — Federation (planned) ```yaml # extract_config.yaml sources: - url: https://alice.example.com/data/index.json handle: alice - - url: https://bob.example.com/data/index.json - handle: bob ``` -`bincio render` fetches remote index files at build time, merges them into the -site. No API keys, no OAuth. Local `.md` sidecars can annotate remote activities. -Not yet implemented — see friends/federation items in the checklist below. - -### Implementation priority - -1. **Web file upload** — trivial to build, highest immediate UX value -2. **`bincio import strava`** — covers historical migration and incremental sync; - most cyclists already have years of data there -3. **Watch mode** — covers ongoing FIT-file-based workflows (Karoo, Garmin) -4. **Garmin Connect importer** — second most common platform -5. **Federation** — longer term; enables the "personal Strava" social layer +`bincio render` fetches remote BAS index files at build time. No API keys. +Local `.md` sidecars can annotate remote activities. --- @@ -626,7 +652,7 @@ Not yet implemented — see friends/federation items in the checklist below. - [ ] Map thumbnail in activity cards (SVG path from GeoJSON) - [ ] GitHub Actions template for auto-publish - [x] **Ingestion: web file upload** — `POST /api/upload` in edit server, drag-and-drop in nav -- [ ] **Ingestion: `bincio import strava`** — OAuth2 + streams API, idempotent incremental sync +- [x] **Ingestion: `bincio import strava`** — OAuth2 + streams API, idempotent incremental sync - [ ] **Ingestion: `bincio extract --watch`** — directory watcher for ongoing FIT sync - [ ] **Ingestion: `bincio import garmin`** — garminconnect library or FIT folder sync - [ ] **Ingestion: federation** — `sources:` in config, remote BAS index pull at render time diff --git a/README.md b/README.md index f2547c6..d4b1b16 100644 --- a/README.md +++ b/README.md @@ -12,20 +12,21 @@ BincioActivity is a self-hosted, federated activity stats platform. 
You point it ## How it works ``` -GPX / FIT / TCX files - │ - ▼ - bincio extract ← Python CLI. Reads files, writes plain JSON. - │ - ▼ - ~/bincio_data/ ← BAS data store. Human-readable JSON + GeoJSON. - edits/*.md ← Optional sidecar edits (titles, descriptions, photos). - │ - ▼ - bincio render ← Merges sidecars → _merged/. Runs Astro build. - │ - ▼ - site/dist/ ← Drop anywhere. Open index.html. Done. +GPX / FIT / TCX files Strava API + │ │ + ▼ ▼ + bincio extract bincio import strava ← Pull from Strava, or upload via browser ↑ + │ │ + └────────────┬───────────┘ + ▼ + ~/bincio_data/ ← BAS data store. Plain JSON + GeoJSON. + edits/*.md ← Optional sidecar edits (titles, descriptions, photos). + │ + ▼ + bincio render ← Merges sidecars → _merged/. Runs Astro build. + │ + ▼ + site/dist/ ← Drop anywhere. Open index.html. Done. ``` Everything in `~/bincio_data/` is plain text you can read, edit, back up, or publish to a CDN. The site build is fully reproducible from those files. @@ -43,10 +44,16 @@ uv sync # installs the bincio package + all dependencies # 2. Configure cp extract_config.example.yaml extract_config.yaml $EDITOR extract_config.yaml # set input dirs, output dir, your name +# extract_config.yaml is gitignored — safe to add credentials here -# 3. Extract activities → BAS JSON +# 3a. Extract from local files uv run bincio extract +# 3b. Or import from Strava +uv sync --extra strava +# Add credentials to extract_config.yaml under import.strava, then: +bincio import strava # opens browser on first run + # 4. Build the site (requires Node >= 20) cd site && npm install && cd .. 
cp site/.env.example site/.env # configure BINCIO_DATA_DIR @@ -59,10 +66,11 @@ For live development with hot reload: uv run bincio render --serve # merges edits, links data, starts astro dev # → http://localhost:4321 -# Optional: enable the activity edit UI +# Optional: enable the activity edit UI + file upload uv sync --extra edit # install FastAPI + uvicorn (one-time) uv run bincio edit # starts edit server on http://localhost:4041 # Set PUBLIC_EDIT_URL=http://localhost:4041 in site/.env +# → Edit button and ↑ Upload button appear in the site nav ``` --- @@ -112,6 +120,9 @@ uv sync # install / update dependencies ### `extract_config.yaml` +This is the single configuration file for the Python side of BincioActivity. +It is **gitignored** — safe to store credentials here. Copy from `extract_config.example.yaml`. + ```yaml owner: handle: yourname @@ -134,6 +145,13 @@ track: incremental: true # skip files whose hash hasn't changed +# Strava API credentials — from strava.com/settings/api +# Authorization Callback Domain must be set to: localhost +import: + strava: + client_id: 12345 + client_secret: your_client_secret_here + # Optional: athlete profile for zone overlays on HR/power charts athlete: max_hr: 182 @@ -211,6 +229,8 @@ At build time the renderer fetches their public data and renders it under `/frie | Layer | Technology | |---|---| | Extract | Python 3.12, click, fitdecode, gpxpy, lxml | +| Strava import | requests (optional extra: `uv sync --extra strava`) | +| Edit server | FastAPI + uvicorn (optional extra: `uv sync --extra edit`) | | Site framework | Astro 4 (static output) | | UI components | Svelte 5 | | Styling | Tailwind CSS v3 | @@ -235,12 +255,16 @@ bincio/ Python package dedup.py hash-based + near-duplicate detection strava_csv.py Strava activities.csv reader writer.py BAS JSON + GeoJSON writer + config.py extract_config.yaml loader (includes import.strava) + import_/ + strava.py Strava OAuth2 + streams → BAS JSON + cli.py `bincio import strava` 
entry point render/ cli.py `bincio render` — merge + astro build/serve merge.py sidecar edit overlay (produces _merged/) edit/ cli.py `bincio edit` — local edit server - server.py FastAPI write API for the edit drawer + server.py FastAPI write API (activity edits, image + file upload) schema/ bas-v1.schema.json JSON Schema for BAS format SCHEMA.md Human-readable BAS specification diff --git a/bincio/cli.py b/bincio/cli.py index fe05826..0360bce 100644 --- a/bincio/cli.py +++ b/bincio/cli.py @@ -1,4 +1,4 @@ -"""Top-level CLI entry point: `bincio extract` and `bincio render`.""" +"""Top-level CLI entry point.""" import click @@ -11,10 +11,12 @@ def main() -> None: """BincioActivity — federated, open-source activity stats.""" -from bincio.extract.cli import extract # noqa: E402 -from bincio.render.cli import render # noqa: E402 -from bincio.edit.cli import edit # noqa: E402 +from bincio.extract.cli import extract # noqa: E402 +from bincio.render.cli import render # noqa: E402 +from bincio.edit.cli import edit # noqa: E402 +from bincio.import_.cli import import_group # noqa: E402 main.add_command(extract) main.add_command(render) main.add_command(edit) +main.add_command(import_group) diff --git a/bincio/extract/config.py b/bincio/extract/config.py index 8e4a461..9df9916 100644 --- a/bincio/extract/config.py +++ b/bincio/extract/config.py @@ -27,6 +27,12 @@ class ClassifierConfig: enabled: bool = False # off by default; opt-in +@dataclass +class StravaConfig: + client_id: str = "" + client_secret: str = "" + + @dataclass class AthleteConfig: max_hr: int | None = None @@ -48,6 +54,7 @@ class ExtractConfig: owner_handle: str = "me" owner_display_name: str = "Me" athlete: AthleteConfig | None = None + strava: StravaConfig | None = None def load_config(path: Path) -> ExtractConfig: @@ -87,6 +94,12 @@ def load_config(path: Path) -> ExtractConfig: power_zones=ath_raw.get("power_zones"), ) if ath_raw else None + strava_raw = (raw.get("import") or {}).get("strava") or {} + 
strava = StravaConfig( + client_id=str(strava_raw["client_id"]) if strava_raw.get("client_id") else "", + client_secret=str(strava_raw["client_secret"]) if strava_raw.get("client_secret") else "", + ) if strava_raw else None + return ExtractConfig( input_dirs=dirs, output_dir=out, @@ -99,6 +112,7 @@ def load_config(path: Path) -> ExtractConfig: owner_handle=owner.get("handle", "me"), owner_display_name=owner.get("display_name", "Me"), athlete=athlete, + strava=strava, ) diff --git a/bincio/import_/__init__.py b/bincio/import_/__init__.py new file mode 100644 index 0000000..a822681 --- /dev/null +++ b/bincio/import_/__init__.py @@ -0,0 +1 @@ +"""BincioActivity importers — pull data from external platforms.""" diff --git a/bincio/import_/cli.py b/bincio/import_/cli.py new file mode 100644 index 0000000..8070a01 --- /dev/null +++ b/bincio/import_/cli.py @@ -0,0 +1,134 @@ +"""bincio import — CLI command group for external platform importers.""" + +from __future__ import annotations + +from pathlib import Path +from typing import Optional + +import click +from rich.console import Console + +console = Console() + + +@click.group("import") +def import_group() -> None: + """Import activities from external platforms (Strava, Garmin, …).""" + + +@import_group.command("strava") +@click.option("--client-id", default=None, envvar="STRAVA_CLIENT_ID", + help="Strava API client ID. Falls back to import.strava.client_id in extract_config.yaml.") +@click.option("--client-secret", default=None, envvar="STRAVA_CLIENT_SECRET", + help="Strava API client secret. 
Falls back to import.strava.client_secret in extract_config.yaml.") +@click.option("--output", "output_dir", default=None, + help="BAS data store directory (default: from config or ~/bincio_data).") +@click.option("--config", "config_path", default=None, + help="Path to extract_config.yaml (default: ./extract_config.yaml).") +@click.option("--since", default=None, metavar="YYYY-MM-DD", + help="Only import activities after this date (default: incremental from last sync).") +@click.option("--reauth", is_flag=True, default=False, + help="Force re-authorization even if valid tokens exist.") +def strava_cmd( + client_id: Optional[str], + client_secret: Optional[str], + output_dir: Optional[str], + config_path: Optional[str], + since: Optional[str], + reauth: bool, +) -> None: + """Import activities from Strava. + + Credentials are resolved in this order: + 1. --client-id / --client-secret flags + 2. STRAVA_CLIENT_ID / STRAVA_CLIENT_SECRET environment variables + 3. import.strava.client_id / client_secret in extract_config.yaml + + Tokens are saved to ~/.config/bincio/strava.json and refreshed automatically. + + \b + How to get API credentials (takes ~2 minutes, no approval needed): + 1. Go to strava.com/settings/api + 2. Create an application (name/website can be anything; + Authorization Callback Domain: localhost) + 3. 
Copy the Client ID and Client Secret into extract_config.yaml: + + \b + import: + strava: + client_id: 12345 + client_secret: your_secret_here + + \b + Examples: + bincio import strava # uses extract_config.yaml + bincio import strava --since 2025-01-01 # only activities from 2025 + bincio import strava --reauth # force fresh OAuth flow + """ + try: + import requests # noqa: F401 + except ImportError: + raise click.ClickException( + "requests is required for the Strava importer.\n" + "Install with: uv sync --extra strava" + ) + + from bincio.import_.strava import StravaClient, TOKENS_FILE, sync as strava_sync + + # Load config to get credentials + output dir if not given on CLI + cfg = _load_config(config_path) + + # Resolve credentials: CLI flag > env var (already consumed by click) > config file + if not client_id and cfg and cfg.strava: + client_id = cfg.strava.client_id or None + if not client_secret and cfg and cfg.strava: + client_secret = cfg.strava.client_secret or None + + if not client_id or not client_secret: + raise click.UsageError( + "Strava client ID and secret are required.\n" + "Add them to extract_config.yaml under import.strava, or pass --client-id/--client-secret." 
+ ) + + out = _resolve_output(output_dir, cfg) + console.print(f"Output dir: [cyan]{out}[/cyan]") + + if reauth and TOKENS_FILE.exists(): + TOKENS_FILE.unlink() + console.print("Removed saved tokens — starting fresh OAuth flow.") + + client = StravaClient(client_id, client_secret, console) + client.authenticate() + + since_dt = None + if since: + from datetime import datetime, timezone + try: + since_dt = datetime.strptime(since, "%Y-%m-%d").replace(tzinfo=timezone.utc) + except ValueError: + raise click.BadParameter(f"Expected YYYY-MM-DD, got {since!r}", param_hint="--since") + + strava_sync(client, out, since_dt, console) + + +def _load_config(config_path: Optional[str]): + """Load extract_config.yaml if available; return None if not found.""" + from bincio.extract.config import load_config + candidates = [] + if config_path: + candidates.append(Path(config_path)) + candidates.append(Path("extract_config.yaml")) + for p in candidates: + if p.exists(): + return load_config(p) + return None + + +def _resolve_output(explicit: Optional[str], cfg) -> Path: + if explicit: + return Path(explicit).expanduser().resolve() + if cfg and cfg.output_dir: + return cfg.output_dir + default = Path.home() / "bincio_data" + console.print(f"[yellow]No output dir specified; using [cyan]{default}[/cyan]") + return default diff --git a/bincio/import_/strava.py b/bincio/import_/strava.py new file mode 100644 index 0000000..a63101b --- /dev/null +++ b/bincio/import_/strava.py @@ -0,0 +1,388 @@ +"""Strava API importer for BincioActivity. + +Converts Strava activities + streams into BAS JSON using the same extract +pipeline (ParsedActivity → compute() → write_activity()) as local files. + +OAuth tokens are stored in ~/.config/bincio/strava.json and refreshed +automatically. No server needed — the OAuth dance uses a one-shot local +callback server (same pattern as `gh auth login`). 
+"""
+
+from __future__ import annotations
+
+import dataclasses
+import hashlib
+import json
+import secrets
+import time
+import webbrowser
+from datetime import datetime, timedelta, timezone
+from http.server import BaseHTTPRequestHandler, HTTPServer
+from pathlib import Path
+from typing import Any
+from urllib.parse import parse_qs, urlencode, urlparse
+
+from rich.console import Console
+from rich.progress import BarColumn, MofNCompleteColumn, Progress, TextColumn, TimeElapsedColumn
+
+from bincio.extract.models import DataPoint, ParsedActivity
+from bincio.extract.sport import normalise_sport
+
+STRAVA_AUTH_URL = "https://www.strava.com/oauth/authorize"
+STRAVA_TOKEN_URL = "https://www.strava.com/oauth/token"
+STRAVA_API_BASE = "https://www.strava.com/api/v3"
+
+TOKENS_FILE = Path.home() / ".config" / "bincio" / "strava.json"
+SYNC_FILE = "_strava_sync.json"  # lives in output_dir
+CALLBACK_PORT = 8976
+STREAM_KEYS = "time,latlng,altitude,heartrate,cadence,watts,velocity_smooth"
+
+
+# ── API client ────────────────────────────────────────────────────────────────
+
+class StravaClient:
+    def __init__(self, client_id: str, client_secret: str, console: Console) -> None:
+        self.client_id = client_id
+        self.client_secret = client_secret
+        self._console = console
+        self._tokens: dict = {}
+        self._15min_used = 0
+        self._daily_used = 0
+
+    # ── auth ──────────────────────────────────────────────────────────────────
+
+    def authenticate(self) -> None:
+        """Load saved tokens (refreshing if needed) or run the OAuth dance."""
+        if TOKENS_FILE.exists():
+            saved = json.loads(TOKENS_FILE.read_text(encoding="utf-8"))
+            if saved.get("client_id") == self.client_id:
+                self._tokens = saved
+                self._ensure_fresh()
+                self._console.print("[green]✓[/green] Authenticated via saved tokens.")
+                return
+        self._oauth_dance()
+
+    def _ensure_fresh(self) -> None:
+        if time.time() > self._tokens.get("expires_at", 0) - 60:
+            self._refresh()
+
+    def _refresh(self) -> None:
+        import requests
+        r = requests.post(STRAVA_TOKEN_URL, data={
+            "client_id": self.client_id,
+            "client_secret": self.client_secret,
+            "grant_type": "refresh_token",
+            "refresh_token": self._tokens["refresh_token"],
+        }, timeout=30)
+        r.raise_for_status()
+        self._tokens.update(r.json())
+        self._tokens["client_id"] = self.client_id
+        self._save_tokens()
+
+    def _oauth_dance(self) -> None:
+        """Open browser for OAuth2 authorization, receive callback."""
+        import requests
+        state = secrets.token_urlsafe(16)
+        code_holder: dict[str, str] = {}
+
+        class _Handler(BaseHTTPRequestHandler):
+            def do_GET(self) -> None:
+                qs = parse_qs(urlparse(self.path).query)
+                if qs.get("state", [None])[0] == state:
+                    code_holder["code"] = qs.get("code", [None])[0] or ""
+                self.send_response(200)
+                self.end_headers()
+                self.wfile.write(
+                    b""
+                    b"Authorized! You can close this tab."
+                )
+
+            def log_message(self, *_: Any) -> None:
+                pass
+
+        server = HTTPServer(("127.0.0.1", CALLBACK_PORT), _Handler)
+
+        params = urlencode({
+            "client_id": self.client_id,
+            "redirect_uri": f"http://localhost:{CALLBACK_PORT}/callback",
+            "response_type": "code",
+            "scope": "activity:read_all",
+            "state": state,
+        })
+        url = f"{STRAVA_AUTH_URL}?{params}"
+        self._console.print(f"Opening browser for Strava authorization…")
+        self._console.print(f"If nothing opens, visit: [cyan]{url}[/cyan]")
+        webbrowser.open(url)
+
+        server.handle_request()  # blocks until one request received
+        server.server_close()
+
+        code = code_holder.get("code")
+        if not code:
+            raise RuntimeError("Authorization failed — no code received from Strava.")
+
+        r = requests.post(STRAVA_TOKEN_URL, data={
+            "client_id": self.client_id,
+            "client_secret": self.client_secret,
+            "code": code,
+            "grant_type": "authorization_code",
+        }, timeout=30)
+        r.raise_for_status()
+        self._tokens = r.json()
+        self._tokens["client_id"] = self.client_id
+        self._save_tokens()
+        self._console.print("[green]✓[/green] Authorized!")
+
+    def _save_tokens(self) -> None:
+        TOKENS_FILE.parent.mkdir(parents=True, exist_ok=True)
+        TOKENS_FILE.write_text(json.dumps(self._tokens, indent=2), encoding="utf-8")
+
+    # ── HTTP ──────────────────────────────────────────────────────────────────
+
+    def _get(self, path: str, **params: Any) -> Any:
+        import requests as req
+        self._ensure_fresh()
+        headers = {"Authorization": f"Bearer {self._tokens['access_token']}"}
+
+        while True:
+            r = req.get(f"{STRAVA_API_BASE}{path}", params=params, headers=headers, timeout=30)
+
+            # Track rate limits
+            usage = r.headers.get("X-RateLimit-Usage", "")
+            if usage:
+                parts = usage.split(",")
+                if len(parts) == 2:
+                    self._15min_used = int(parts[0])
+                    self._daily_used = int(parts[1])
+
+            if r.status_code == 429:
+                self._console.print("[yellow]Rate limit reached, waiting 60 s…[/yellow]")
+                time.sleep(60)
+                continue
+
+            r.raise_for_status()
+
+            limit_hdr = r.headers.get("X-RateLimit-Limit", "")
+            if limit_hdr:
+                lparts = limit_hdr.split(",")
+                if len(lparts) == 2:
+                    l15 = int(lparts[0])
+                    if self._15min_used > int(l15 * 0.85):
+                        self._console.print(
+                            f"[yellow]Warning:[/yellow] {self._15min_used}/{l15} requests used this 15-min window."
+                        )
+
+            return r.json()
+
+    # ── API calls ─────────────────────────────────────────────────────────────
+
+    def get_activities(self, after: int | None = None, per_page: int = 200) -> list[dict]:
+        """Fetch full paginated activity list. `after` is a Unix timestamp."""
+        activities: list[dict] = []
+        page = 1
+        while True:
+            params: dict[str, Any] = {"per_page": per_page, "page": page}
+            if after is not None:
+                params["after"] = after
+            batch: list[dict] = self._get("/athlete/activities", **params)
+            if not batch:
+                break
+            activities.extend(batch)
+            if len(batch) < per_page:
+                break
+            page += 1
+        return activities
+
+    def get_streams(self, activity_id: int) -> dict[str, list]:
+        """Return {stream_type: [values...]}. Empty dict on any failure."""
+        try:
+            data: dict = self._get(
+                f"/activities/{activity_id}/streams",
+                keys=STREAM_KEYS,
+                key_by_type="true",
+            )
+            return {k: v["data"] for k, v in data.items() if isinstance(v, dict) and "data" in v}
+        except Exception:
+            return {}
+
+
+# ── conversion ────────────────────────────────────────────────────────────────
+
+def _strava_to_parsed(act: dict, streams: dict[str, list]) -> ParsedActivity:
+    """Build a ParsedActivity from a Strava activity dict + its streams."""
+    started_at = datetime.fromisoformat(act["start_date"].replace("Z", "+00:00"))
+
+    sport = normalise_sport(act.get("sport_type") or act.get("type") or "")
+
+    times = streams.get("time", [])  # seconds since start
+    latlngs = streams.get("latlng", [])  # [[lat, lon], ...]
+    altitudes = streams.get("altitude", [])  # metres
+    heartrates = streams.get("heartrate", [])  # bpm
+    cadences = streams.get("cadence", [])  # rpm
+    watts = streams.get("watts", [])  # W
+    velocities = streams.get("velocity_smooth", [])  # m/s
+
+    points: list[DataPoint] = []
+    for i, t in enumerate(times):
+        ll = latlngs[i] if i < len(latlngs) else None
+        points.append(DataPoint(
+            timestamp = started_at + timedelta(seconds=int(t)),
+            lat = float(ll[0]) if ll else None,
+            lon = float(ll[1]) if ll else None,
+            elevation_m = float(altitudes[i]) if i < len(altitudes) else None,
+            hr_bpm = int(heartrates[i]) if i < len(heartrates) else None,
+            cadence_rpm = int(cadences[i]) if i < len(cadences) else None,
+            power_w = int(watts[i]) if i < len(watts) else None,
+            speed_kmh = float(velocities[i]) * 3.6 if i < len(velocities) else None,
+        ))
+
+    strava_id = str(act["id"])
+    source_hash = "sha256:" + hashlib.sha256(f"strava:{strava_id}".encode()).hexdigest()
+
+    return ParsedActivity(
+        points = points,
+        sport = sport,
+        started_at = started_at,
+        source_file = f"strava_{strava_id}",
+        source_hash = source_hash,
+        title = act.get("name") or None,
+        strava_id = strava_id,
+    )
+
+
+def _patch_from_summary(metrics: Any, act: dict) -> Any:
+    """Fill None metric fields using Strava activity summary values.
+
+    Useful for activities without streams (manual entries, indoor rides with
+    no sensors) where compute() returns _empty().
+    """
+    patches: dict[str, Any] = {}
+    if metrics.distance_m is None and act.get("distance"):
+        patches["distance_m"] = float(act["distance"])
+    if metrics.moving_time_s is None and act.get("moving_time"):
+        patches["moving_time_s"] = int(act["moving_time"])
+    if metrics.duration_s is None and act.get("elapsed_time"):
+        patches["duration_s"] = int(act["elapsed_time"])
+    if metrics.elevation_gain_m is None and act.get("total_elevation_gain"):
+        patches["elevation_gain_m"] = float(act["total_elevation_gain"])
+    if metrics.avg_hr_bpm is None and act.get("average_heartrate"):
+        patches["avg_hr_bpm"] = int(act["average_heartrate"])
+    if metrics.max_hr_bpm is None and act.get("max_heartrate"):
+        patches["max_hr_bpm"] = int(act["max_heartrate"])
+    if metrics.avg_power_w is None and act.get("average_watts"):
+        patches["avg_power_w"] = int(act["average_watts"])
+    if metrics.avg_cadence_rpm is None and act.get("average_cadence"):
+        patches["avg_cadence_rpm"] = int(act["average_cadence"])
+    if metrics.avg_speed_kmh is None and act.get("average_speed"):
+        patches["avg_speed_kmh"] = float(act["average_speed"]) * 3.6
+    return dataclasses.replace(metrics, **patches) if patches else metrics
+
+
+# ── main sync ─────────────────────────────────────────────────────────────────
+
+def sync(
+    client: StravaClient,
+    output_dir: Path,
+    since: datetime | None,
+    console: Console,
+) -> None:
+    """Fetch new Strava activities and write BAS JSON files.
+
+    Idempotent: already-imported Strava IDs (tracked in _strava_sync.json)
+    are skipped. `since` overrides the auto-detected checkpoint.
+    """
+    from bincio.extract.metrics import compute
+    from bincio.extract.writer import build_summary, make_activity_id, write_activity, write_index
+
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    # ── load sync state ───────────────────────────────────────────────────────
+    sync_path = output_dir / SYNC_FILE
+    sync_state: dict = json.loads(sync_path.read_text(encoding="utf-8")) if sync_path.exists() else {}
+    imported_ids: set[str] = set(sync_state.get("imported_ids", []))
+
+    # ── determine `after` timestamp ───────────────────────────────────────────
+    after_ts: int | None = None
+    if since:
+        after_ts = int(since.timestamp())
+    elif sync_state.get("last_sync"):
+        # 1-hour overlap to catch delayed Strava saves
+        last = datetime.fromisoformat(sync_state["last_sync"])
+        after_ts = int((last - timedelta(hours=1)).timestamp())
+    # else: full sync (first run)
+
+    # ── fetch activity list ───────────────────────────────────────────────────
+    since_label = f" since {since.date()}" if since else (" (incremental)" if after_ts else " (full sync)")
+    console.print(f"Fetching Strava activity list{since_label}…")
+    all_acts = client.get_activities(after=after_ts)
+    new_acts = [a for a in all_acts if str(a["id"]) not in imported_ids]
+
+    console.print(
+        f"Found [bold]{len(new_acts)}[/bold] new activities "
+        f"([bold]{len(all_acts) - len(new_acts)}[/bold] already imported)."
+    )
+    if not new_acts:
+        console.print("[green]All up to date.[/green]")
+        return
+
+    # ── load existing index ───────────────────────────────────────────────────
+    index_path = output_dir / "index.json"
+    if index_path.exists():
+        index_data = json.loads(index_path.read_text(encoding="utf-8"))
+    else:
+        index_data = {"owner": {"handle": "strava_user"}, "activities": []}
+    owner = index_data.get("owner", {})
+    summaries: dict[str, dict] = {s["id"]: s for s in index_data.get("activities", [])}
+
+    # ── import loop ───────────────────────────────────────────────────────────
+    errors: list[tuple[str, str]] = []
+    imported = 0
+
+    with Progress(
+        TextColumn("[progress.description]{task.description}"),
+        BarColumn(),
+        MofNCompleteColumn(),
+        TimeElapsedColumn(),
+        console=console,
+    ) as progress:
+        task = progress.add_task("Importing…", total=len(new_acts))
+
+        for act in new_acts:
+            progress.advance(task)
+            strava_id = str(act["id"])
+            try:
+                streams = client.get_streams(act["id"])
+                parsed = _strava_to_parsed(act, streams)
+                metrics = compute(parsed)
+                metrics = _patch_from_summary(metrics, act)
+                act_id = make_activity_id(parsed)
+                write_activity(parsed, metrics, output_dir, privacy="public")
+                summaries[act_id] = build_summary(parsed, metrics, act_id, "public")
+                imported_ids.add(strava_id)
+                imported += 1
+            except Exception as exc:
+                errors.append((strava_id, str(exc)))
+
+    # ── write index + sync state ──────────────────────────────────────────────
+    write_index(list(summaries.values()), output_dir, owner)
+
+    sync_state["imported_ids"] = sorted(imported_ids)
+    sync_state["last_sync"] = datetime.now(timezone.utc).isoformat()
+    sync_path.write_text(json.dumps(sync_state, indent=2), encoding="utf-8")
+
+    # Trigger merge if sidecar edits directory exists
+    if (output_dir / "edits").exists():
+        from bincio.render.merge import merge_all
+        merge_all(output_dir)
+
+    console.print(
+        f"\n[green]Done.[/green] "
+        f"Imported [bold]{imported}[/bold] activities, "
+        f"errors [bold]{len(errors)}[/bold]."
+    )
+    if errors:
+        console.print("\n[red]Errors:[/red]")
+        for sid, msg in errors[:20]:
+            console.print(f" Strava {sid}: {msg}")
+        if len(errors) > 20:
+            console.print(f" … and {len(errors) - 20} more.")
diff --git a/extract_config.example.yaml b/extract_config.example.yaml
index 3f82c00..b29f260 100644
--- a/extract_config.example.yaml
+++ b/extract_config.example.yaml
@@ -31,6 +31,16 @@ classifier:
 
 incremental: true  # skip files whose hash hasn't changed since last run
 
+# ── Platform importers ─────────────────────────────────────────────────────────
+# Credentials for `bincio import strava`.
+# Get them from strava.com/settings/api (2 minutes, no approval needed).
+# Authorization Callback Domain must be set to: localhost
+# import:
+#   strava:
+#     client_id: 12345
+#     client_secret: your_client_secret_here
+
+# ── Athlete zones ───────────────────────────────────────────────────────────────
 # athlete:
 #   max_hr: 182  # used to derive default HR zone display
 #   ftp_w: 280  # functional threshold power in watts
diff --git a/extract_config.yaml b/extract_config.yaml
index cac7401..46950f1 100644
--- a/extract_config.yaml
+++ b/extract_config.yaml
@@ -31,6 +31,11 @@ classifier:
 
 incremental: true  # skip files whose hash hasn't changed since last run
 
+import:
+  strava:
+    client_id:  # paste your Client ID from strava.com/settings/api
+    client_secret:  # paste your Client Secret
+
 athlete:
   max_hr: 190
   ftp_w: 210
diff --git a/pyproject.toml b/pyproject.toml
index 626df86..6099e03 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -37,6 +37,9 @@ edit = [
     "uvicorn[standard]>=0.29",
     "python-multipart>=0.0.9",
 ]
+strava = [
+    "requests>=2.32",
+]
 dev = [
     "pytest>=9.0",
     "pytest-cov>=5.0",
diff --git a/site/.env.example b/site/.env.example
index fe246f9..885154d 100644
--- a/site/.env.example
+++ b/site/.env.example
@@ -4,6 +4,6 @@ BINCIO_DATA_DIR=~/bincio_data
 
 # Optional: URL of a running `bincio edit` server.
-# When set, an Edit button appears on activity detail pages.
+# When set, an Edit button (and Upload ↑ button) appear in the site.
 # Leave unset (or remove) for production / public deployments.
 # PUBLIC_EDIT_URL=http://localhost:4041