Commit Graph

62 Commits

Author SHA1 Message Date
Davide Scaini 84eff1f3b0 perf: spatial 10 m downsampling for timeseries
Extract _haversine_m from the inline block in _gps_speed_kmh, add
_spatial_downsample (keep one sample per 10 m traveled, GPS haversine
primary / speed×Δt fallback, indoor activities unchanged), and wire it
into build_timeseries() after the 1 s dedup loop.

Add --downsample-timeseries migration flag to bincio render that applies
the same downsampling to existing stored timeseries files without
re-extracting from original FIT/GPX files.
2026-05-19 20:11:00 +02:00
Davide Scaini 766da0320b NerdCorner VAM: filter short climbs, opacity-encode confidence, add climbing time to tooltip
- Exclude per-activity VAM contributions where climbing_time_s < 10 min; short
  punchy efforts don't represent aerobic fitness and were skewing monthly averages
- Store climbing_time_s alongside climbing_vam_mh in metrics, detail JSON, and
  summary JSON so the frontend has the data to reason about confidence
- Accumulate total climbing time per period; opacity scales from 0.25 (10 min,
  minimum threshold) to 1.0 (≥ 1 h) so thin-evidence months read as faint dots
- Render VAM as dots only (no lines) since each period is an independent average,
  not a cumulative — lines implied continuity that isn't there
- Tooltip now shows "1060 m/h · 38 min climbing"
2026-05-17 10:13:39 +02:00
Davide Scaini 14a4a0b994 Activity detail: layout refactor + GPS-derived speed for map coloring
Layout: map + charts stacked left, stats panel (2-col) on the right.
Cadence moved to last stat. Charts sit directly below the map.

Speed coloring: most FIT files don't record per-second speed, leaving
timeseries speed_kmh all-null and the hover link dead. Fix: derive speed
from consecutive GPS coordinates (haversine + 5-pt moving average) when
the device didn't record it. Add --backfill-speed render flag to retrofit
existing timeseries files.
2026-05-16 23:24:29 +02:00
Davide Scaini 003b540481 VAM: drop duration curve, show avg climbing VAM in Nerd Corner
Remove the per-duration VAM curve everywhere (metrics, summaries, detail
JSON, athlete.json, VamChart.svelte, AthleteView VAM tab). Keep only
climbing_vam_mh per activity. Add it to activity summaries so NerdCorner
can plot average climbing VAM per week/month year-over-year alongside
distance/elevation/time. Add --backfill-vam-summary flag to copy the
field from existing detail JSONs into index.json without re-extracting.
2026-05-16 22:03:40 +02:00
Davide Scaini 7cd8a6b030 VAM: add --recompute-vam flag and compute_vam_from_timeseries helper
Refactors core VAM logic into _vam_from_ele_1hz() and _build_ele_1hz()
so both the DataPoint-based extract path and the timeseries-based backfill
path share the same implementation.

render --recompute-vam reads stored *.timeseries.json files and updates
climbing_vam_mh + vam_curve in activities/*.json and index.json in-place,
without re-parsing the original FIT/GPX files.
2026-05-16 21:37:51 +02:00
Davide Scaini baf20b51ba Add VAM (climbing velocity) metric and per-duration curve
Extract pipeline now computes two VAM metrics per activity (cycling,
running, hiking, walking):
- climbing_vam_mh: VAM on ascending segments only, using 30 s forward
  lookahead to classify climbing vs. flat/descent (stored in detail JSON)
- vam_curve: [[duration_s, vam_mh], ...] best VAM per standard duration
  (60 s – 1 h), sliding window on 30 s smoothed elevation, only windows
  with ≥ 10 m net gain count (stored in summary + detail)

Athlete JSON aggregates vam_curve across all activities (all_time,
last_365d, last_90d), same structure as power_curve.

Frontend:
- ActivityDetail shows "Climbing VAM" stat (grouped with elevation)
- AthleteView adds a "VAM Curve" tab that appears only when the athlete
  has climbing data; renders VamChart (new component, mirrors MmpChart)

vam_curve stripped from combined global feed; kept in user year shards
for season-based on-the-fly aggregation in VamChart.

Requires bincio reextract to backfill existing activities.
2026-05-16 21:34:06 +02:00
Davide Scaini 3b675a68b0 Elevation: skip near-zero dropout values mid-recording
Devices (Apple Watch, some GPS units) record 0.0 when they lose barometric/GPS
lock mid-activity. The old accumulation committed these as real sea-level points,
inflating both gain and loss by the current elevation (e.g. 792m dropout on the
Cosmo Walk added ~1584m of phantom gain+loss).

Fix: skip any elevation value < 1.0m when the current committed elevation is
significantly above zero (> threshold). Gradual legitimate descents to sea level
are unaffected because intermediate values are committed along the way.

Add --recompute-elevation flag to bincio render to backfill existing activities.
2026-05-15 01:21:34 +02:00
Davide Scaini 4ea2292e2b Indoor detection: title-based inference in merge layer + fix _merge_all_locked
- Add _INDOOR_TITLE_RE / _infer_indoor_title() to writer.py (matches zwift,
  ftp-builder, turbo-trainer, rodillo); replaces the narrower zwift-only regex
  that was local to write_athlete_json
- _is_outdoor now delegates to _infer_indoor_title so all four keywords are
  excluded from records and MMP aggregation
- apply_sidecar and _apply_sidecar_summary both set sub_sport=indoor when the
  title matches and no explicit sub_sport is already present
- _merge_one_locked: detect title-inferred activities as needs_merge and call
  apply_sidecar({},{}) so the _merged copy gets sub_sport=indoor written
- _merge_all_locked: read index upfront to populate to_merge with title-inferred
  IDs; call apply_sidecar({},{}) for activities in to_merge without sidecars;
  apply _apply_sidecar_summary to ALL summary entries (not only sidecar ones)
2026-05-15 01:03:17 +02:00
Davide Scaini a863cdd663 Records: exclude Zwift activities by title; show Saved confirmation before closing drawer
- _is_outdoor now also excludes activities whose title matches /\bzwift\b/i,
  covering the ~50 Strava-imported Zwift rides that lack sub_sport metadata.

- EditDrawer waits 900ms after a successful save before dispatching 'saved'
  (which closes the drawer), so the green "Saved" confirmation is visible.
2026-05-15 00:49:46 +02:00
Davide Scaini 9f1e9e4d3b Records: apply sidecars before computing; fix best_climb_m for long mountain climbs
- _rebuild_athlete_json now applies sidecar edits (sub_sport, sport, etc.)
  in-memory before passing summaries to write_athlete_json, so activities
  marked indoor via sidecar are correctly excluded from records.

- _best_climb now runs Kadane's over cumulative distance (not 1Hz dense
  time) so recording pauses don't create None gaps that falsely reset the
  climbing window. Grappa: 811m→1603m; Nivolet: 311m→2009m.

- Add bincio render --recompute-climbs to backfill existing activities
  from their stored timeseries.
2026-05-15 00:30:58 +02:00
Davide Scaini 1ce94b8536 records: exclude indoor/treadmill/virtual sub_sport; rebuild athlete.json on bake
- fit.py: map FIT sub_sport 'treadmill' and 'virtual' to 'indoor'
- writer.py: broaden _is_outdoor to catch all indoor sub_sport variants
- render/cli.py: rebuild athlete.json from index.json on every bake so
  records never go stale when the exclusion logic changes
2026-05-14 23:37:54 +02:00
Davide Scaini a10164b932 metrics: fall back to GPS-derived speed in compute_best_efforts when device speed absent
FIT files from older devices (and GPX/TCX files) often omit the speed field.
The sliding-window best-effort algorithm was treating all such points as speed=0,
so no records were ever produced for these activities.

Fix: when p.speed_kmh is None but consecutive lat/lon are available, compute
haversine segment speed and spread it evenly across the 1Hz interval slots.
This mirrors what _gps_stats already does for avg/max speed computation.
2026-05-14 09:16:01 +02:00
Davide Scaini 9dd533825f Fix pre-existing test failures in test_writer and test_metrics
test_writer: _dummy_metrics() and test_build_summary_required_fields were
missing np_power_w=None after the field was added to ComputedMetrics.

test_metrics: the leading-zero elevation heuristic fired on a single 0.0
start value, incorrectly skipping the first legitimate elevation step.
Guard now requires at least 2 consecutive near-zero leading values before
activating the Apple Watch lock-acquisition workaround.
2026-05-13 23:15:26 +02:00
Davide Scaini c837464a28 Exclude indoor/virtual activities from records and power curve 2026-05-13 16:05:26 +02:00
Davide Scaini 4d2df860ce Segments Phase 3: detection algorithm, CLI, ingest hook, and efforts API
- detect.py: ActivityTrack + detect_one/detect_all (bbox pre-filter →
  start/end proximity 25m → path conformance 50m/30% → effort extraction
  with avg speed/HR/power and Coggan NP)
- cli.py: `bincio segments detect` for retroactive detection over stored
  timeseries JSONs, with optional --activity-id / --segment-id filters
- ingest.py: non-fatal hook at end of ingest_parsed runs detect_all
- server.py: GET /api/segments/{id}/efforts and POST /api/segments/{id}/detect
2026-05-13 00:50:39 +02:00
Davide Scaini bd0595ee79 Add avg power and NP to activity summary; NP uses Coggan 30s rolling-average method 2026-05-12 23:47:06 +02:00
Davide Scaini 0b266d208c Strip pre-2000 leading points to prevent epoch-zero start time and absurd duration 2026-05-12 23:11:33 +02:00
Davide Scaini 867da767eb Add sub_sport editing to activity edit drawer 2026-05-12 23:01:12 +02:00
Davide Scaini 8f028101c7 Fix elevation gain inflation from device no-fix leading zeros
Apple Watch and similar devices record exactly 0.0 for elevation while
waiting for barometric/GPS lock, then jump to the real altitude. The
hysteresis accumulator was seeding from 0.0, counting the full jump as
ascent. Fix: detect a leading near-zero run followed by a large jump
and seed the accumulator from the first real value instead.

Applied in both _elevation() (fresh extractions) and
recalculate_elevation_hysteresis() (recompute path). Added a bulk
admin endpoint POST /api/admin/users/{handle}/recompute-elevation and
corresponding button to fix existing stored activities.
2026-05-10 16:21:24 +02:00
Davide Scaini c077fceba6 fix: validate activity file exists before treating cached hash as known
The dedup cache (.bincio_cache.json) persists source hashes across runs.
If the activities/ directory is wiped (--fresh in another session, or macOS
clearing /tmp after a restart) while the cache file survives at the user dir
level, re-running extract skips all files as "already extracted" and leaves
activities/ empty. Now only hashes whose corresponding .json file is present
on disk are treated as known, so missing files are always re-extracted.
2026-04-25 09:54:38 +02:00
Davide Scaini df496a017f fix: refine hysteresis recalculation with MA pre-smoothing and lower thresholds
- dem.py: pre-smooth elevation with 30s moving average before hysteresis
  in recalculate_elevation_hysteresis(); thresholds drop from 5m/10m to
  1m (barometric) / 3m (GPS) — accurate after noise is smoothed out
- dem.py: widen DEM median-filter window 45s → 60s
- dem.py: rename response key source → altitude_source for consistency
- writer.py: write altitude_source into detail JSON at extract time
- tests/test_dem.py: 21 unit tests for pure functions and file-level hysteresis
- tests/test_edit_server.py: 11 TestClient API tests for both recalculate endpoints
- add httpx as dev dependency (required by FastAPI TestClient)
2026-04-22 10:57:28 +02:00
Davide Scaini ebac3f50f4 fix: DEM elevation overcounting and add hysteresis-only recalculation button
- dem.py: apply 45s median filter before hysteresis to suppress SRTM
  tile-boundary steps that were accumulating through the 5m threshold;
  raise DEM hysteresis threshold from 5m to 10m
- dem.py: back up elevation_m as elevation_m_original in timeseries
  before the first DEM overwrite, so original sensor data is preserved
- dem.py: add recalculate_elevation_hysteresis() — recomputes gain/loss
  from original recorded elevation (reads elevation_m_original if a DEM
  run already replaced elevation_m) using source-aware thresholds
  (5m barometric, 10m GPS/unknown); does not touch the elevation array
- edit/server.py, serve/server.py: split /recalculate-elevation into
  two endpoints: /recalculate-elevation/dem and
  /recalculate-elevation/hysteresis
- EditDrawer.svelte: replace single DEM button with two side-by-side
  buttons — "Recalculate (hysteresis)" (fast, offline) and
  "Recalculate (DEM)" (SRTM lookup)
2026-04-20 21:41:23 +02:00
Davide Scaini 1940e2409b feat: DEM-based elevation recalculation via edit drawer button
Adds a "Recalculate from terrain map (DEM)" button to the activity edit
drawer. On click it queries an Open-Elevation-compatible API to replace
GPS altitude with SRTM terrain data, applies 5m hysteresis, and updates
the activity's elevation stats and timeseries chart in place.

- bincio/extract/dem.py: lookup_elevations() (batched HTTP POST) +
  recalculate_elevation() (subsample → DEM → interpolate → hysteresis →
  patch activity JSON, timeseries JSON, index.json)
- POST /api/activity/{id}/recalculate-elevation on both serve and edit
  servers; serve endpoint is auth-gated and triggers merge + rebuild
- --dem-url flag (also DEM_URL env var) on bincio serve and bincio edit;
  logged at startup; missing URL returns a clear 503 with setup instructions
- /api/me response gains dem_configured bool
- EditDrawer: button with loading state, shows new ↑/↓ values on success
2026-04-20 20:45:06 +02:00
Davide Scaini 872651f471 metrics: replace naive elevation accumulation with hysteresis dead-band
GPS jitter and barometric quantization noise caused systematic overestimation
of elevation gain — in extreme cases 100% of reported gain was sub-1m noise.

Implements source-aware hysteresis: elevation is only committed when it
deviates from the last committed value by ≥5m (barometric) or ≥10m (GPS/GPX/TCX).

- ParsedActivity gains `altitude_source` field ("barometric"/"gps"/"unknown")
- FIT parser sets "barometric" when enhanced_altitude is present, else "gps"
- GPX and TCX parsers always set "gps"
- metrics._elevation() uses the threshold matching the source
- 5 new parametric tests covering flat GPS noise, threshold differences, and real climbs
2026-04-20 20:29:20 +02:00
Davide Scaini cd1cdca33b extract: auto-detect gzip by magic bytes, not just .gz extension
Files compressed with gzip but named without .gz (e.g. activity.gpx
containing gzip data) now decompress transparently.
2026-04-16 18:49:01 +02:00
Davide Scaini 290eef6c72 metrics: guard against corrupted time streams causing OOM
Strava originals with absolute Unix timestamps stored as elapsed-second
offsets produce a t_max of ~1.6 billion. compute_mmp and compute_best_efforts
both create dense 1Hz arrays via range(t_min, t_max+1), which for a 1.6B
span allocates 44+ GB and OOM-kills the process. Add a >1-week sanity
check and return None early for corrupt streams.

Root cause: old Strava activities (seen from 1970-epoch start_date)
where the time stream contains absolute Unix timestamps instead of
elapsed seconds.
2026-04-15 14:06:20 +02:00
Davide Scaini e6bb6e61a2 fix elevation_gain_m null for modern Garmin FIT files; fix map flash
FIT parser: try enhanced_altitude before altitude. Barometric altimeters
on modern Garmins (Edge 540, 840, etc.) write enhanced_altitude in
record messages and total_ascent in lap messages. The old code read only
altitude, producing null elevation_m per point → null elevation_gain_m
at the activity root while laps had correct values from total_ascent.

ActivityMap: use preview_coords (passed from ActivitySummary) to
initialise the map at the activity's location on mount, eliminating the
flash of world-view before the async detail JSON / bbox arrives.
2026-04-13 19:18:37 +02:00
Davide Scaini 5ad3aee8f6 rename privacy "private" → "unlisted"; enable GPS for unlisted
- "unlisted" = not shown in the public feed, but GPS track, timeseries
  and detail JSON are all accessible by direct URL (security by obscurity)
- "private" accepted as legacy alias everywhere (backward compat with
  existing data on disk)
- New writes from Strava sync / ZIP upload / sidecar use "unlisted"
- Only "no_gps" now suppresses the GPS track
- isUnlisted() helper in format.ts used by all Svelte/Astro components
- SCHEMA.md and CLAUDE.md document the privacy model and the distinction
  between "unlisted" and "no_gps"
2026-04-13 18:49:20 +02:00
Davide Scaini f003fdd89f garmin sync first attempt 2026-04-12 15:36:21 +02:00
Davide Scaini 6c431e8821 Here's what was built and why each decision was made:
Key at data_dir.parent/.garmin_key — nginx serves location /data/ { alias /var/bincio/data/; } so
  anything inside that dir is reachable. The key lives one level up at /var/bincio/.garmin_key,
  outside nginx's reach.

  Two-layer storage — garmin_creds.json holds the encrypted email+password (needed for re-login when
  tokens expire); garmin_session/ holds the garth OAuth tokens in plain JSON (short-lived, not the
  user's actual password).

  test_login() — called by the connect endpoint before saving anything, so credentials are only
  persisted if they actually work.

  get_client() — tries the session first (fast, no network), falls back to full re-login
  transparently. The caller never needs to think about whether the session is fresh.
2026-04-12 15:12:20 +02:00
Davide Scaini 01db4eb9ae ingest activities.csv 2026-04-11 08:13:27 +02:00
Davide Scaini bc30e0a2fc option to keep all activities private from strava zip, fix copy of register link 2026-04-10 22:51:29 +02:00
Davide Scaini 3b8bc159c5 upload strava zip 2026-04-10 22:01:44 +02:00
Davide Scaini 3e4ff4019b limit number of workers 2026-04-10 18:13:49 +02:00
Davide Scaini cf414a08ad fix strava import? 2026-04-10 18:13:32 +02:00
Davide Scaini ae883a7dba fix: rebuild athlete.json on every ingest; remove bincio-extract references from UI 2026-04-10 15:47:50 +02:00
Davide Scaini 6d3673b2f7 1. Image upload size limit — _MAX_IMAGE_BYTES = 10 MB in both serve/server.py and edit/server.py
2. Image MIME type whitelist — _ALLOWED_IMAGE_TYPES blocks SVG XSS in both servers
  3. Filename collision safety — _unique_image_name() helper in both servers
  4. OAuth CSRF — state token generated in edit/server.py auth-url, stored in _oauth_states, validated and discarded in callback; strava_api.auth_url() accepts optional state param
  5. Error message leak — upload processing errors now return generic "Processing failed" instead of exception type/message
  6. Handle injection in subprocess — _trigger_rebuild now asserts handle matches _VALID_HANDLE before passing to subprocess
2026-04-10 13:56:39 +02:00
Davide Scaini 469a5954cc "keep data on the server" opt-in/out 2026-04-10 13:01:21 +02:00
Davide Scaini 084c652fdd fixing stuff after splitting jsons 2026-04-09 15:27:00 +02:00
Davide Scaini 8118f6f316 1 — Timeseries split
- writer.py: timeseries is now written to {id}.timeseries.json as a separate file. The detail JSON gets a timeseries_url field instead. finalize_pending and cleanup_pending handle the extra file.
  - merge.py (merge_one): symlinks the .timeseries.json file alongside the detail JSON. merge_all already handles it transparently (the .timeseries.json stem doesn't match any activity
  ID in to_merge, so it falls through to the symlink branch).
  - types.ts: timeseries is now timeseries?: Timeseries | null, and timeseries_url?: string | null added.
  - dataloader.ts: new loadTimeseries(url, detailUrl, base) function that resolves paths correctly in both single- and multi-user modes (uses the fetched detail URL's directory as the base).
  - ActivityDetail.svelte: loads timeseries separately after detail loads; uses detail.timeseries for IDB activities (embedded) or fetches via detail.timeseries_url for server activities. Charts show a pulse placeholder while loading.

 2 — GZip

  - GZipMiddleware (min 1 KB) added to both bincio/serve/server.py and bincio/edit/server.py — all API JSON responses are now gzip-compressed.
  - For static files (the big timeseries JSONs), nginx should be configured with gzip on; gzip_types application/json application/geo+json; — no code change needed on the server side.

  Net effect: opening an activity page now fetches ~1.4 KB (detail) instead of ~586 KB. The timeseries fetches ~60–150 KB gzip-compressed shortly after (it loads concurrently with the map rendering).
2026-04-09 14:01:02 +02:00
Davide Scaini 7dcb1e6dd0 refactor: extract/ingest facade, merge_one, deduplicate ops constants
- Add bincio/extract/ingest.py as a facade over the extract internals (ingest_parsed, strava_sync), reducing coupling from 6+ imports to one
  - Add merge_one() to merge.py — fast single-activity path for interactive edits (rewrites one file + index, skips full directory rebuild)
  - Rewrite edit/ops.py to delegate to the new facade; fix broken run_strava_sync return (was referencing undefined locals)
  - Remove duplicated SPORTS, STAT_PANELS, VALID_ACTIVITY_ID from edit/server.py — now imported from ops.py
2026-04-09 12:05:01 +02:00
Davide Scaini 98c42dc443 unify single user and multi user behaviour 2026-04-09 08:59:40 +02:00
Davide Scaini 5bf0f3636c local conversion 2026-04-06 22:25:57 +02:00
Davide Scaini 17f36889f3 sync strava data from web ui 2026-04-06 12:38:41 +02:00
Davide Scaini bd5831c2fd second pass. low 2026-04-01 19:00:28 +02:00
Davide Scaini 3d364c3992 second pass. medium 2026-04-01 11:05:00 +02:00
Davide Scaini 94369606a4 second pass at issues. critical ones. 2026-04-01 10:58:45 +02:00
Davide Scaini 81438231b4 fix low level issues 2026-03-31 23:22:12 +02:00
Davide Scaini 8f91503cf7 fix mid level issues. updated changelog 2026-03-31 23:00:39 +02:00
Davide Scaini f8abab2c23 fix high priority issues 2026-03-31 22:53:50 +02:00