fix(upload): prevent false 422s and EMFILE crash during bulk uploads

Four related issues made uploading 271+ activities unreliable:

1. merge_all/write_combined_feed were inside the extraction try/except —
   any merge race returned 422 even though the file was on disk, causing
   the mobile app to permanently mark the upload as failed.  Fixed by
   moving them to a separate best-effort try/except after the extraction
   block and switching from merge_all (full rebuild) to merge_one
   (single-activity symlink), so each upload is O(1) FS ops, not O(N).

2. The dev watcher fired merge_all for every activity .json write AND the
   upload endpoint also ran merge_all — O(N²) symlink operations during
   bulk uploads.  Watcher now skips activities/*.json changes (upload
   endpoint handles those directly).

3. Vite/Chokidar followed the public/data symlink and opened a handle per
   activity file; constant merge rebuilds exhausted file descriptors and
   crashed the Astro dev server.  Fixed with watch.ignored on public/data.

4. _write_year_shards and write_combined_feed used f.unlink() without
   missing_ok=True, so concurrent callers racing the same file threw
   FileNotFoundError, which propagated as a false extraction failure.
   Fixed by passing missing_ok=True to both calls.
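
The split described in item 1 can be sketched as follows. This is a minimal illustration, not the endpoint's actual code: `handle_upload`, its injected callables, the logger name, and the 200 success status are all assumptions made so the example is self-contained; only the 422 status and the names `merge_one` and `write_combined_feed` come from the message above.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("upload")  # hypothetical logger name


def handle_upload(
    data_dir: Path,
    activity_id: str,
    payload: bytes,
    extract: Callable[[Path, str, bytes], None],
    merge_one: Callable[[Path, str], None],
    write_combined_feed: Callable[[Path], object],
) -> int:
    """Return an HTTP-style status code for one uploaded activity.

    The three callables are injected only to keep this sketch runnable;
    a real endpoint would call its own functions directly.
    """
    # Phase 1: extraction. Only a failure here means the upload is bad.
    try:
        extract(data_dir, activity_id, payload)
    except Exception:
        log.exception("extraction failed for %s", activity_id)
        return 422

    # Phase 2: best-effort merge. The activity file is already on disk,
    # so a merge race is logged and swallowed instead of failing the upload.
    try:
        merge_one(data_dir, activity_id)
        write_combined_feed(data_dir)
    except Exception:
        log.warning("merge failed for %s (upload kept)", activity_id, exc_info=True)

    return 200  # assumed success status
```

With this shape, a `FileNotFoundError` raised during the merge no longer reaches the client; only an extraction failure produces a 422.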
Davide Scaini
2026-04-27 14:33:05 +02:00
parent 7a65ed2078
commit be772bd3df
4 changed files with 34 additions and 9 deletions
+7 -2
@@ -98,6 +98,11 @@ def merge_one(data_dir: Path, activity_id: str) -> None:
     Use merge_all() for bulk operations (first run, Strava sync, etc.).
     """
+    with _merge_lock(data_dir):
+        _merge_one_locked(data_dir, activity_id)
+
+
+def _merge_one_locked(data_dir: Path, activity_id: str) -> None:
     edits_dir = data_dir / "edits"
     acts_dir = data_dir / "activities"
     merged_dir = data_dir / "_merged"
@@ -311,7 +316,7 @@ def _write_year_shards(merged_dir: Path, activities: list[dict], index_meta: dic
     # Remove stale year shard files from previous runs
     for f in merged_dir.glob("index-*.json"):
-        f.unlink()
+        f.unlink(missing_ok=True)
     by_year: dict[str, list[dict]] = defaultdict(list)
     for a in activities:
@@ -398,7 +403,7 @@ def write_combined_feed(data_dir: Path) -> int:
     # Remove stale feed pages
     for f in data_dir.glob("feed*.json"):
-        f.unlink()
+        f.unlink(missing_ok=True)
     if not all_activities:
         return 0
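
The effect of `missing_ok=True` in the two hunks above can be reproduced in isolation. The sketch below simulates the race sequentially: the file is deleted once by a "first caller", then a "second caller" tries again. `unlink_twice` is a hypothetical helper written for this illustration; `Path.unlink(missing_ok=...)` itself is standard library and requires Python 3.8 or later.

```python
import tempfile
from pathlib import Path


def unlink_twice(path: Path, missing_ok: bool) -> bool:
    """Delete path twice; return True if the second delete raised."""
    path.write_text("{}")
    path.unlink()  # first caller wins the race
    try:
        path.unlink(missing_ok=missing_ok)  # second caller finds nothing
    except FileNotFoundError:
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp) / "feed-1.json"
        print("old call raised:", unlink_twice(stale, missing_ok=False))
        print("new call raised:", unlink_twice(stale, missing_ok=True))
```

This only removes the error on deletion; it does not serialize the callers, so it complements the merge lock rather than replacing it.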