Indoor detection: title-based inference in merge layer + fix _merge_all_locked

- Add _INDOOR_TITLE_RE / _infer_indoor_title() to writer.py (matches zwift,
  ftp-builder, turbo-trainer, rodillo); replaces the narrower zwift-only regex
  that was local to write_athlete_json
- _is_outdoor now delegates to _infer_indoor_title, so activities whose titles
  match any of the four keywords are excluded from records and MMP aggregation
- apply_sidecar and _apply_sidecar_summary both set sub_sport=indoor when the
  title matches and no explicit sub_sport is already present
- _merge_one_locked: detect title-inferred activities as needs_merge and call
  apply_sidecar({},{}) so the _merged copy gets sub_sport=indoor written
- _merge_all_locked: read index upfront to populate to_merge with title-inferred
  IDs; call apply_sidecar({},{}) for activities in to_merge without sidecars;
  apply _apply_sidecar_summary to ALL summary entries (not only sidecar ones)
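
The sub_sport rule described in the bullets above can be sketched as follows. This is a minimal illustration, not the code of apply_sidecar or _apply_sidecar_summary: the helper name `with_inferred_sub_sport` is hypothetical, and it assumes that "no explicit sub_sport" means the key is missing or empty.

```python
import re

# Same pattern as _INDOOR_TITLE_RE introduced in writer.py by this commit.
_INDOOR_TITLE_RE = re.compile(
    r'\b(zwift|ftp[\s\-]builder|turbo[\s\-]?trainer|rodillo)\b',
    re.IGNORECASE,
)


def with_inferred_sub_sport(summary: dict) -> dict:
    """Hypothetical helper: return a copy with sub_sport=indoor when inferred.

    Assumption: an explicit sub_sport (any non-empty value) always wins
    over the title-based guess.
    """
    out = dict(summary)  # do not mutate the caller's dict
    title = out.get("title") or ""
    if not out.get("sub_sport") and _INDOOR_TITLE_RE.search(title):
        out["sub_sport"] = "indoor"
    return out


inferred = with_inferred_sub_sport({"title": "Zwift - Watopia"})
explicit = with_inferred_sub_sport({"title": "Zwift race", "sub_sport": "virtual"})
plain = with_inferred_sub_sport({"title": "Sunday ride"})
```

Here `inferred` gains `sub_sport == "indoor"`, `explicit` keeps `"virtual"`, and `plain` is left without a sub_sport key.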
Davide Scaini
2026-05-15 01:03:17 +02:00
parent 0fbb7822df
commit 4ea2292e2b
2 changed files with 49 additions and 12 deletions
+13 -4
@@ -10,6 +10,18 @@ from bincio.extract.models import LapData, ParsedActivity
from bincio.extract.simplify import build_geojson, preview_coords
from bincio.extract.timeseries import build_timeseries
+# Titles that reliably identify indoor/virtual activities regardless of sub_sport metadata.
+# Strava imports from Zwift and FTP-builder platforms lose sub_sport on export.
+_INDOOR_TITLE_RE = re.compile(
+    r'\b(zwift|ftp[\s\-]builder|turbo[\s\-]?trainer|rodillo)\b',
+    re.IGNORECASE,
+)
+def _infer_indoor_title(title: str) -> bool:
+    """Return True if the title reliably identifies an indoor/virtual activity."""
+    return bool(_INDOOR_TITLE_RE.search(title))
def make_activity_id(activity: ParsedActivity) -> str:
"""Generate a BAS activity ID from started_at + optional title slug.
@@ -278,14 +290,11 @@ def write_athlete_json(summaries: list[dict], output_dir: Path, athlete_config:
         return [[d, w] for d, w in sorted(best.items())]
     _INDOOR_SUB_SPORTS = {"indoor", "treadmill", "virtual"}
-    _INDOOR_TITLE_RE = re.compile(r'\bzwift\b', re.IGNORECASE)
     def _is_outdoor(s: dict) -> bool:
         if s.get("sub_sport") in _INDOOR_SUB_SPORTS:
             return False
-        if _INDOOR_TITLE_RE.search(s.get("title") or ""):
-            return False
-        return True
+        return not _infer_indoor_title(s.get("title") or "")
     all_mmps = [s["mmp"] for s in summaries if s.get("mmp") and _is_outdoor(s)]
     mmps_365 = [s["mmp"] for s in summaries if s.get("mmp") and _is_outdoor(s) and s["started_at"] >= cutoff_365]