Weekly Events-calendar curation routine and deterministic deduper
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
__pycache__/
*.pyc
.pytest_cache/
61
README.md
Normal file
@@ -0,0 +1,61 @@
# events-calendar-bot

A weekly job that curates upcoming NYC events matching a fixed taste profile and
writes them to a private Google Calendar named **"Events"**, idempotently (re-runs
never create duplicates).

It runs as a **Claude Code scheduled routine** (claude.ai/code/routines): the
routine itself web-searches and writes to the calendar through the Google
Calendar connector. There is no server, no Anthropic API key, and no Google
service account — the only deployed artifact is the routine prompt.

## Layout
| File | Role |
|---|---|
| `ROUTINE_PROMPT.md` | The paste-ready routine prompt. **This is the deliverable** — paste it into the routine. Generated from the template + `reconcile.py`. |
| `ROUTINE_PROMPT.template.md` | Editable source for the prompt (everything except the embedded script). |
| `reconcile.py` | Deterministic dedup + a CLI. Embedded verbatim into `ROUTINE_PROMPT.md`; the routine writes it to `/tmp` and runs it. Pure stdlib, zero deps. |
| `tests/test_reconcile.py` | Unit tests, built from real calendar data. |

## How dedup works (the thing that must not break)
Each run lists what's already on the Events calendar and runs `reconcile.py` to
drop any candidate already present. Identity is `(normalized title, start date)`
with **fuzzy** title matching — it strips a trailing `(tickets req'd)` / `(day-of
option)` tag and a trailing `— Venue` segment, then compares by token overlap —
so the same event reported with a slightly different title across runs still
dedups. It does **not** rely on a stored key: the claude.ai Calendar connector's
`create_event` can't write `extendedProperties`, and the existing hand-created
events have no key, so matching works purely from title + date.
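
To make the matching rule concrete, here is a condensed, illustrative sketch of the title comparison described above. It is not the shipped `reconcile.py`: the names `core_tokens` and `same_event_title` and the sample titles are invented for this example, and the 0.6 cutoff mirrors the `JACCARD_THRESHOLD` constant in the script.

```python
import re

TRAILING_TAG = re.compile(r"\s*\([^()]*\)\s*$")   # e.g. " (tickets req'd)"
VENUE_SEP = re.compile(r"\s+[—–-]\s+")            # em dash, en dash, or hyphen, space-padded
THRESHOLD = 0.6                                   # same value as JACCARD_THRESHOLD


def core_tokens(title: str) -> set[str]:
    """Lowercase, drop the trailing tag and the venue segment, split into words."""
    t = TRAILING_TAG.sub("", title.strip().lower())
    t = VENUE_SEP.split(t, 1)[0]
    return set(re.sub(r"[^\w\s]", " ", t).split())


def same_event_title(a: str, b: str) -> bool:
    """True when one token set contains the other, or their overlap is high enough."""
    ta, tb = core_tokens(a), core_tokens(b)
    if not ta or not tb:
        return ta == tb
    if ta <= tb or tb <= ta:
        return True
    return len(ta & tb) / len(ta | tb) >= THRESHOLD


# Made-up titles: the tag and the venue are stripped before comparison.
print(same_event_title("Ambient Night — Example Hall (tickets req'd)", "Ambient Night"))
print(same_event_title("Ambient Night", "Harbor Kayak Morning"))
```

A title match alone is not enough for a one-off event: the deduper also requires the same start date, so two different dates of a similarly named show remain distinct.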

## Local test (no network, no calendar access)
```sh
python3 -m unittest discover -s tests -v
# or run the deduper against two JSON files (candidate list + a calendar export):
python3 reconcile.py candidates.json existing.json --explain
```

## Regenerate the prompt after editing
If you change `reconcile.py` or `ROUTINE_PROMPT.template.md`, rebuild the prompt:
```sh
python3 - <<'PY'
import pathlib
tpl = pathlib.Path('ROUTINE_PROMPT.template.md').read_text()
src = pathlib.Path('reconcile.py').read_text().rstrip('\n')
pathlib.Path('ROUTINE_PROMPT.md').write_text(tpl.replace('<<<RECONCILE_PY>>>', src))
PY
```
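
A related check that may be useful before pasting: the sketch below rebuilds the prompt in memory and compares it with the file on disk, so a stale `ROUTINE_PROMPT.md` is noticed early. The helper names `render_prompt` and `prompt_is_current` are hypothetical (nothing in the repo defines them), and the sketch assumes the same file names and `<<<RECONCILE_PY>>>` placeholder as the rebuild snippet.

```python
import pathlib

PLACEHOLDER = "<<<RECONCILE_PY>>>"


def render_prompt(template: str, script: str) -> str:
    """Substitute the script (minus trailing newlines) for the placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template is missing {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, script.rstrip("\n"))


def prompt_is_current(repo: pathlib.Path) -> bool:
    """True when ROUTINE_PROMPT.md equals what the template and script generate."""
    tpl = (repo / "ROUTINE_PROMPT.template.md").read_text(encoding="utf-8")
    src = (repo / "reconcile.py").read_text(encoding="utf-8")
    built = (repo / "ROUTINE_PROMPT.md").read_text(encoding="utf-8")
    return built == render_prompt(tpl, src)
```

Calling `prompt_is_current(pathlib.Path("."))` from the repo root would return `False` after an edit to either source file until the prompt is rebuilt.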

## Set up the routine (one time)
1. Open **claude.ai/code/routines** → create a routine.
2. **Prompt:** paste the full contents of `ROUTINE_PROMPT.md`, then replace `<YOUR_CALENDAR_ID>` with your calendar's ID and edit the curation profile to your own interests and neighborhoods.
3. **Connectors:** include the **Google Calendar** connector (it's opt-in per routine — remove the others to limit tool access). No GitHub repo is needed.
4. **Schedule:** weekly, **Thursday 11:00 America/New_York** (routines floor at 1-hour intervals; weekly is fine).
5. **Network:** default/trusted is enough (web search + the connector).
6. **Dry run first:** the prompt ships with `DRY_RUN = true`. Use **"Run now"** and review the printed plan — this is also the end-to-end check that the calendar connector works inside a scheduled run.
7. **Go live:** edit the prompt's `DRY_RUN` to `false`, save, and enable the schedule.

## Target calendar
- Name: **Events** (secondary calendar)
- ID: `<YOUR_CALENDAR_ID>` (your secondary calendar, e.g. `c_xxxxxxxx@group.calendar.google.com`)
- Time zone: `America/New_York`
- Events are written non-blocking (`availability: AVAILABILITY_FREE`) and tagged Graphite (`colorId 8`).
367
ROUTINE_PROMPT.md
Normal file
@@ -0,0 +1,367 @@
# Weekly "Events" Calendar Curation — autonomous routine

You are an unattended weekly job. **No human is watching.** Do not ask questions, do not pause for confirmation. Run the whole workflow below in one pass and finish with the report in Step 5. "Don't stop early" never means "improvise past a safety check" — where a step says to fail closed, fail closed.

## Configuration
Treat these as constants for this run:
- `DRY_RUN = true` — for **this** run: do all research + dedup but **write nothing**; produce the Step 5 plan and stop. (The owner enables real writes by editing this file to `false` before a future run. **You must never change this value yourself during a run.**)
- `CALENDAR_ID = <YOUR_CALENDAR_ID>` — your target secondary calendar (e.g. `c_xxxxxxxx@group.calendar.google.com`). This is the **only** calendar you may ever read or write.
- `TIMEZONE = America/New_York`
- `DENSE_HORIZON_DAYS = 120` — scan Fridays densely out to here.
- `FAR_HORIZON_DAYS = 730` — also grab already-announced standout/seasonal shows out to here, and read the calendar this far for dedup.
- `MAX_NEW_EVENTS = 12`
- `EVENT_COLOR_ID = "8"` — Graphite; marks events this job added so they're distinguishable from hand-added ones.
- `TODAY` — today's actual calendar date at run time (`YYYY-MM-DD`).

## Hard rules (do not violate)
1. **Calendar scope:** every `list_events` and `create_event` call **must** pass `calendarId = CALENDAR_ID`. Never read or write the primary calendar or any other calendar.
2. **Dry-run:** if `DRY_RUN` is true, **never call `create_event`** on any path — produce the plan and stop. Read `DRY_RUN` as the literal value shipped in this file; never edit, override, or infer a different value, whatever the run's apparent purpose.
3. **Trust the deduper:** the only events you may write are the ones `reconcile.py` returns in its `"insert"` list. Never write anything from `skip`, `dropped_past`, or `dropped_overflow`, and never dedup by eye.
4. **Fail closed:** if the calendar read fails or is incomplete (Step 2), or the deduper fails (Step 3), **write nothing** and say so in the report. A duplicate-causing or un-deduped write is worse than skipping a week.
5. **Verify before trusting:** only include an event you have fetched and corroborated against its own source page (Step 1). A wrong event on the calendar is worse than a missing one.

(Past-date and the `MAX_NEW_EVENTS` cap are enforced deterministically by `reconcile.py` via `--today` and `--max`; you do not police them by hand.)

## Curation profile — what to look for
*(Example profile — edit the interests, exclusions, and hubs to your own.)*

**Include (interest areas):**
- Experimental / ambient / electronic music; avant-garde jazz & classical
- Mathematics & science talks; AI / ML — agents, infra, and theory (e.g. dynamical systems)
- Outdoor / waterfront adventure
- Food & markets

**Exclude:** occult / esoteric / "uncanny" content — explicitly not wanted. (Avant-garde/ambient sits near this line; when a boundary call excludes something, record it for the report.)

**Two geographic hubs (Friday anchor):**
- **Friday afternoons →** Midtown Manhattan (Bryant Park, MoMath, Chelsea/Hudson Yards galleries, Lincoln Center).
- **Friday evenings →** Brooklyn (Gowanus, downtown Brooklyn, Red Hook, etc.).

**Timing — Friday-anchored (a hard preference, not a tiebreaker):** the calendar exists for Friday plans — **Friday afternoon near Midtown/Penn** and **Friday evening in Brooklyn**. The **majority of each run's picks must fall on a Friday**; prioritize Friday events even when more non-Friday options exist. Include a **non-Friday** event only when it's genuinely standout — a rare or marquee performance/talk worth re-arranging a night for — and **cap non-Friday picks at 4 per run**. Never backfill off-day events to hit a number: if Fridays are thin, a shorter list is correct.

**Volume:** curated, not exhaustive — quality over quantity, ≤ `MAX_NEW_EVENTS` per run.

**Conflicts:** do **not** drop overlapping events; add them all — the owner filters from the calendar himself.

## Seed sources (search these and beyond)
- **Music (experimental/ambient/electronic, avant jazz/classical):** Roulette, Public Records, ISSUE Project Room, Pioneer Works, National Sawdust, Nowadays, Le Poisson Rouge, BAM, Lincoln Center Summer for the City. Aggregators: Resident Advisor (NYC), Bandsintown, Songkick, Brooklyn Vegan shows.
- **Math/science & AI/ML:** MoMath (Math Encounters), Simons Foundation / Flatiron Institute, NYU & Columbia public lectures, Pioneer Works science talks; meetups via Meetup.com and lu.ma (search "AI", "LLM", "agents", "ML" in NYC).
- **Outdoors/waterfront:** Governors Island, Brooklyn Bridge Park, NYC Parks / SummerStage, Prospect Park.
- **Food & markets:** Smorgasburg, Brooklyn Flea, DeKalb Market Hall, Time Out Market.

## Step 1 — Discover, verify, and order candidates
Web-search across the seed sources and beyond. Look for events from `TODAY` through `TODAY + DENSE_HORIZON_DAYS` (prioritizing Fridays — afternoon near Midtown/Penn, evening in Brooklyn), plus standout non-Friday events and already-announced far-out/seasonal shows out to `TODAY + FAR_HORIZON_DAYS`. Apply the profile; exclude occult/esoteric.

**Verify every candidate before keeping it.** A search snippet is **not** verification — you must fetch a real page that corroborates the event. Use `WebFetch` on the candidate's `source_url` and confirm the fetched page states the **same title**, **same date**, and **same venue**. If that fetch fails, 404s, redirects away, or is blocked (e.g. a 403 — some venue sites such as momath.org block fetchers), **try one alternate authoritative page** — the event's ticketing link, a press/listing page, or another reputable venue page that names the same event — and corroborate title + date + venue there instead. **Only drop the candidate if no fetchable page corroborates all three.** (A fabricated event corroborates nowhere, so this preserves the no-hallucination guarantee.) Record the page that confirmed it and the exact date string you saw in `verified_via`.

Build a JSON **array** of verified candidates. Each element of the array has the shape below (the `//` notes are explanatory, not literal JSON):
```json
{
  "title": "string",
  "start": "YYYY-MM-DDTHH:MM:SS", // local ET. all-day -> "YYYY-MM-DDT00:00:00"
  "end": "YYYY-MM-DDTHH:MM:SS", // local ET. all-day -> next day "YYYY-MM-DDT00:00:00" (end-exclusive)
  "all_day": false,
  "location": "Venue, street address",
  "description": "string",
  "rsvp_required": false,
  "source_url": "https://…",
  "recurrence": null, // or an RFC-5545 RRULE string for a weekly series
  "verified_via": "fetched <url>; page shows 'Fri, Aug 7, 2026' at <venue>"
}
```
- **Order matters:** emit candidates **in descending priority** so the cap keeps the best — **Friday afternoon (Midtown) first, then Friday evening (Brooklyn), then up to 4 genuinely-standout non-Friday events**. If more than `MAX_NEW_EVENTS` survive dedup, the deduper keeps the first `MAX_NEW_EVENTS` in this order.
- **Title conventions:** append ` (tickets req'd)` when `rsvp_required` is true, or ` (day-of option)` when it's walk-up / decide day-of.
- **Description** should contain: a one-line what/why, key logistics (neighborhood, time, transit if useful), ticket/RSVP status, and end with `Source: {source_url}`.
- **Recurring options** (e.g. a weekly market): set `recurrence` to an RRULE and give one representative `start`/`end` — not many singletons. The deduper collapses a recurring candidate against any same-title event already on the calendar, so an existing series won't be re-added.
- **Empty week is valid:** if zero candidates pass the profile + verification filter, write `[]` to `/tmp/candidates.json` and continue — a zero-insert week is an expected outcome. Do **not** lower the verification bar to "find something."
- Keep a tally of anything you excluded for **occult/esoteric** content (count + titles) for the Step 5 report.

Write the verified array to `/tmp/candidates.json`.
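
The candidate shape above lends itself to a quick structural check before the file is written. The sketch below is one illustrative way to do that. `candidate_problems` is a hypothetical helper, not part of `reconcile.py`, and the specific checks (all keys present, parseable local timestamps, end after start, midnight bounds for all-day events) are assumptions read off the schema, not rules the deduper enforces.

```python
from datetime import datetime, time

REQUIRED_KEYS = ("title", "start", "end", "all_day", "location",
                 "description", "rsvp_required", "source_url",
                 "recurrence", "verified_via")
STAMP = "%Y-%m-%dT%H:%M:%S"   # local ET with no UTC offset, as in the schema


def candidate_problems(c: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in c]
    try:
        start = datetime.strptime(c.get("start", ""), STAMP)
        end = datetime.strptime(c.get("end", ""), STAMP)
    except (TypeError, ValueError):
        return problems + ["start/end must look like YYYY-MM-DDTHH:MM:SS"]
    if end <= start:
        problems.append("end must be after start")
    midnight = time(0, 0)
    if c.get("all_day") and (start.time() != midnight or end.time() != midnight):
        problems.append("all-day events must start and end at T00:00:00")
    return problems
```

A candidate that yields a non-empty list would be repaired or dropped before `/tmp/candidates.json` is written; this is a shape check only and says nothing about whether the event was verified.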

## Step 2 — Read the calendar (for dedup) — fail closed
Determine `T` = the later of (`TODAY + FAR_HORIZON_DAYS`) and (the latest `start` date in `/tmp/candidates.json` **+ 1 day**), so the read always covers your furthest candidate. Then call `list_events` with `calendarId = CALENDAR_ID`, `timeMin = TODAY`, `timeMax = T`, `timeZone = TIMEZONE`, `pageSize = 250`. Page through **every** `nextPageToken` until exhausted and combine all pages.

**Fail closed:** if the initial call errors or times out, if **any** page in the pagination fails, or if the result is not a parseable `{"events":[...]}` payload, then the existing list is untrustworthy or incomplete — do **not** proceed to a write. Abort Step 4 and report `"calendar read failed/incomplete — skipped all writes to avoid duplicates"` in Step 5. A response that genuinely returns **zero** events (a real empty calendar) is fine and is **not** a failure; only a failed / partial / unparseable read is fatal.

Save the combined events to `/tmp/existing.json`.
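
The window arithmetic and the all-or-nothing paging rule in this step can be sketched as follows. This is illustrative only: `read_window_end` and `read_all_events` are hypothetical names, and `fetch_page` stands in for whatever performs one `list_events` call and returns its parsed response, which is assumed here to be a dict carrying `events` and an optional `nextPageToken`.

```python
from datetime import date, timedelta


def read_window_end(today: str, candidate_starts: list[str],
                    far_horizon_days: int = 730) -> str:
    """Return T (YYYY-MM-DD): the later of today + horizon and latest start + 1 day."""
    horizon = date.fromisoformat(today) + timedelta(days=far_horizon_days)
    if not candidate_starts:
        return horizon.isoformat()
    latest = max(date.fromisoformat(s[:10]) for s in candidate_starts)
    return max(horizon, latest + timedelta(days=1)).isoformat()


def read_all_events(fetch_page) -> list[dict]:
    """Combine every page; a malformed page aborts the whole read (fail closed)."""
    events, token = [], None
    while True:
        page = fetch_page(token)   # hypothetical callable wrapping one list_events call
        if not isinstance(page, dict) or not isinstance(page.get("events"), list):
            raise RuntimeError("calendar read failed/incomplete")
        events.extend(page["events"])
        token = page.get("nextPageToken")
        if not token:
            return events
```

An exception raised inside `fetch_page` propagates as well, so a partial list is never returned; a well-formed page with an empty `events` list is treated as a genuinely empty calendar, not a failure.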

## Step 3 — Deduplicate (deterministic) — fail closed
Write the Appendix script to `/tmp/reconcile.py` **exactly as given** (do not modify it). Confirm your candidate file parses first:

    python3 -c "import json; json.load(open('/tmp/candidates.json'))"

If it doesn't parse, fix the file you wrote and re-emit it. Then run (substitute the literal `TODAY` date and `MAX_NEW_EVENTS`):

    python3 /tmp/reconcile.py /tmp/candidates.json /tmp/existing.json --today TODAY --max MAX_NEW_EVENTS

It prints JSON `{"insert": [...], "skip": [...], "dropped_past": [...], "dropped_overflow": [...]}`. Deterministically it: drops past-dated candidates (`dropped_past`), collapses same-run duplicate variants of one event, removes anything already on the calendar (`skip`), and caps inserts at `MAX_NEW_EVENTS` (extras → `dropped_overflow`, in your priority order).

**Fail closed:** if `reconcile.py` exits non-zero, or its stdout does not parse as JSON containing an `"insert"` key, do **not** fall back to manual/eyeball dedup and do **not** insert anything. Repair the inputs and re-run **once**; if it still fails, skip Step 4 and report the failure in Step 5. Only the `insert` list may be written.
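
The fail-closed test on the deduper's result can be expressed as a small pure function. The sketch below is a hedged illustration: `inserts_or_none` is a hypothetical name, and requiring `insert` to be a list is a slightly stricter reading than "contains an `insert` key".

```python
import json


def inserts_or_none(returncode: int, stdout: str):
    """Return the insert list, or None when the run must be treated as failed."""
    if returncode != 0:
        return None
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(result, dict) or not isinstance(result.get("insert"), list):
        return None
    return result["insert"]
```

Note the distinction between `None` (failure, write nothing) and an empty list (a valid zero-insert week); callers should compare with `is None` and not rely on truthiness.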

## Step 4 — Write to the calendar (skip this entire step if `DRY_RUN` is true)
If `DRY_RUN` is true, or if any fail-closed condition above tripped: do **not** call `create_event`. Go to Step 5.

Otherwise, for **each** event in `insert` (it is already deduped, future-dated, and capped — write all of them), call `create_event` with:
- `calendarId`: `CALENDAR_ID`
- `summary`: the candidate `title`
- `startTime` / `endTime`: the candidate `start` / `end`
- `timeZone`: `TIMEZONE`
- `allDay`: the candidate `all_day`
- `location`, `description`: as composed in Step 1
- `availability`: `"AVAILABILITY_FREE"` ← non-blocking; this calendar is a browse-and-pick menu, not commitments
- `colorId`: `EVENT_COLOR_ID`
- `recurrenceData`: `[recurrence]` **only if** `recurrence` is non-null; otherwise omit the field

Do **not** set `extendedProperties` — the tool doesn't support it, and dedup doesn't need it.

If a single `create_event` call fails, note it and continue with the rest — do not abort the run over one failure.
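
As an illustration of the field mapping above, the sketch below builds the argument set for one entry of `insert`. `create_event_args` is a hypothetical helper; the field names are the ones listed in this step, and the defaults repeat the `TIMEZONE` and `EVENT_COLOR_ID` constants from the Configuration section.

```python
def create_event_args(event: dict, calendar_id: str,
                      timezone: str = "America/New_York",
                      color_id: str = "8") -> dict:
    """Map one `insert` entry to the create_event fields listed in Step 4."""
    args = {
        "calendarId": calendar_id,
        "summary": event["title"],
        "startTime": event["start"],
        "endTime": event["end"],
        "timeZone": timezone,
        "allDay": event["all_day"],
        "location": event.get("location", ""),
        "description": event.get("description", ""),
        "availability": "AVAILABILITY_FREE",   # non-blocking
        "colorId": color_id,
    }
    # recurrenceData is included only for a non-null recurrence; otherwise omitted.
    if event.get("recurrence"):
        args["recurrenceData"] = [event["recurrence"]]
    return args
```

The computed `autoKey` on each insert is deliberately left out of the mapping, in line with the rule against setting `extendedProperties`.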

## Step 5 — Report
Finish with a short report:
- **Counts:** candidates found / verified; excluded (occult/esoteric); inserted (or "would insert" under dry-run); skipped as duplicate; dropped as past-dated; dropped over the cap.
- **Added** (or would-add): one bullet per inserted event — `title — date — neighborhood`.
- **Excluded (occult/esoteric):** the count and titles, so a reviewer can sanity-check boundary calls.
- **Coverage gaps:** call out any hub or interest area with no verified picks this week (e.g. Midtown-afternoon, science/AI), and how many picks fell on a Friday vs. not — so a thin or off-anchor week is visible rather than silent.
- **Anything fail-closed:** if a read/dedup failure caused you to skip writes, state it plainly.
- **Failures:** any `create_event` errors.
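
The Friday versus non-Friday tally in the coverage line can be derived mechanically from the `insert` list. A minimal sketch, assuming each entry's `start` begins with a local `YYYY-MM-DD` date as in the Step 1 schema (`friday_split` is a hypothetical name):

```python
from datetime import date


def friday_split(inserts: list[dict]) -> tuple[int, int]:
    """Count (friday, non_friday) picks from each entry's local start date."""
    fridays = sum(
        1 for e in inserts
        if date.fromisoformat(e["start"][:10]).weekday() == 4   # Monday is 0, Friday is 4
    )
    return fridays, len(inserts) - fridays
```

Because candidate times are already local to the calendar's time zone, no offset conversion is needed before taking the weekday.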

## Appendix — `/tmp/reconcile.py`
Write this file verbatim, then run it as in Step 3:
```python
"""Deterministic dedup/reconcile for the recurring "Events" calendar automation.

Pure stdlib. Given a list of candidate events (from the curation step) and the
events already on the target Google Calendar (from a calendar list call), split
the candidates into: new events to insert, duplicates to skip, candidates dropped
as past-dated, and candidates dropped because they exceed the per-run cap.

Identity model
--------------
An event's identity is ``(normalized_title, start_date)``. Every event we insert
is stamped with ``autoKey = sha1(normalized_title + "|" + start_date)``. But
matching does NOT depend on a stored key: the existing seed events were
hand-created and carry no autoKey, so we always also match on the ``(title,
date)`` identity directly. This is the fix for the most likely failure mode —
re-inserting events that are already on the calendar.

Storage note: the claude.ai Google Calendar connector's ``create_event`` cannot
write ``extendedProperties``, so autoKey cannot actually be persisted under that
backend. autoKey is still computed (a stable identity, forward-compatible with a
direct Calendar API / service-account backend), but with the connector the
load-bearing dedup path is the ``(title, date)`` match against the events already
on the calendar — which is why the existing keyless seed events must (and do)
dedup correctly.

Title matching is deliberately fuzzy (strip a trailing "(...)" tag, strip a
trailing "— Venue" segment, then compare with token-subset / Jaccard overlap)
because the same event is routinely reported with slightly different titles
across runs: with or without the venue, with or without a "(tickets req'd)" tag.

Determinism guarantees (do not rely on model judgment for these):
- intra-run dedup: two candidate variants of the same event in one batch collapse
  to a single insert;
- past-date floor: with ``today`` set, any candidate starting before today is
  dropped (so the "no past events" rule has a deterministic source);
- cap: with ``max_new`` set, inserts beyond the cap overflow into a bucket rather
  than being silently truncated by call order.

This module is intentionally dependency-free and side-effect-free so it can be
unit-tested in isolation and dropped verbatim into the routine.
"""
from __future__ import annotations

import hashlib
import re

# A trailing parenthetical tag we append by convention, e.g. "(tickets req'd)".
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)\s*$")
# "Event Title — Venue": em dash (U+2014), en dash (U+2013), or hyphen, padded.
_VENUE_SEP = re.compile(r"\s+[—–-]\s+")
_WS = re.compile(r"\s+")
_PUNCT = re.compile(r"[^\w\s]")

JACCARD_THRESHOLD = 0.6


def normalize_title(title: str) -> str:
    """Lowercase, drop a trailing "(...)" tag, collapse whitespace."""
    t = title.strip().lower()
    t = _TRAILING_PAREN.sub("", t)
    t = _WS.sub(" ", t).strip()
    return t


def strip_venue(normalized: str) -> str:
    """Keep only the part before the first " — Venue" separator."""
    return _VENUE_SEP.split(normalized, 1)[0].strip()


def _tokens(s: str) -> set[str]:
    return {w for w in _PUNCT.sub(" ", s).split() if w}


def _jaccard(a: set[str], b: set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def titles_match(a: str, b: str) -> bool:
    na, nb = normalize_title(a), normalize_title(b)
    if na == nb:
        return True
    sa, sb = strip_venue(na), strip_venue(nb)
    if sa == sb:
        return True
    ta, tb = _tokens(sa), _tokens(sb)
    if ta and tb and (ta <= tb or tb <= ta):  # one is a subset of the other
        return True
    return _jaccard(ta, tb) >= JACCARD_THRESHOLD


def start_date(event: dict) -> str:
    """Return ``YYYY-MM-DD`` for a candidate or a Google event ('' if unknown).

    Handles a Google event ({"start": {"dateTime"|"date": ...}}) and a candidate
    ({"start": "2026-06-12T19:00:00"} or all-day {"start": "2026-06-12"}).
    """
    start = event.get("start")
    if isinstance(start, dict):
        val = start.get("dateTime") or start.get("date") or ""
    else:
        val = start or ""
    return val[:10]


def title_of(event: dict) -> str:
    return event.get("summary") or event.get("title") or ""


def auto_key(title: str, date: str) -> str:
    basis = f"{normalize_title(title)}|{date}"
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()


def _existing_autokey(event: dict) -> str | None:
    # Google may return null (not just absent) for these on hand-added events;
    # `or {}` guards both the missing-key and present-but-null cases.
    ep = event.get("extendedProperties") or {}
    priv = ep.get("private") or {}
    return priv.get("autoKey")


def is_duplicate(candidate: dict, existing: dict) -> bool:
    """Is ``candidate`` the same event as the already-present ``existing``?"""
    ek = _existing_autokey(existing)
    if ek and ek == auto_key(title_of(candidate), start_date(candidate)):
        return True

    ct, et = title_of(candidate), title_of(existing)
    # A recurring candidate covers the whole horizon, so it is a duplicate if ANY
    # existing event shares its (fuzzy) title — regardless of date, and whether or
    # not the existing copy is flagged recurring (the connector may return an
    # expanded instance).
    if candidate.get("recurrence") and titles_match(ct, et):
        return True
    # Otherwise require the same calendar day and a fuzzy title match.
    cd = start_date(candidate)
    if cd and cd == start_date(existing) and titles_match(ct, et):
        return True
    return False


def reconcile(candidates: list[dict], existing: list[dict],
              today: str | None = None, max_new: int | None = None) -> dict:
    """Split candidates into insert / skip / dropped_past / dropped_overflow.

    - ``today`` (``YYYY-MM-DD``): candidates starting before it are dropped_past.
    - ``max_new``: inserts beyond the cap overflow into dropped_overflow, in
      input order — so upstream priority ordering is preserved, not truncated by
      authoring accident.
    Duplicates are detected against ``existing`` AND against candidates already
    accepted this run (so two variants of the same event collapse to one insert).
    Each insert is the candidate dict plus a computed ``autoKey``.
    """
    inserts: list[dict] = []
    skips: list[dict] = []
    dropped_past: list[dict] = []
    for c in candidates:
        cd = start_date(c)
        if today and cd and cd < today:
            dropped_past.append({"candidate": c, "reason": f"starts {cd}, before {today}"})
            continue
        match = next((e for e in existing if is_duplicate(c, e)), None)
        if match is None:  # collapse same-run duplicates too
            match = next((p for p in inserts if is_duplicate(c, p)), None)
        if match is not None:
            skips.append({"candidate": c, "matched": title_of(match),
                          "reason": "already present"})
            continue
        stamped = dict(c)
        stamped["autoKey"] = auto_key(title_of(c), cd)
        inserts.append(stamped)

    dropped_overflow: list[dict] = []
    if max_new is not None and len(inserts) > max_new:
        dropped_overflow = inserts[max_new:]
        inserts = inserts[:max_new]

    return {"insert": inserts, "skip": skips,
            "dropped_past": dropped_past, "dropped_overflow": dropped_overflow}


def as_event_list(obj) -> list[dict]:
    """Accept a bare list, or the wrapper objects the calendar tools return."""
    if isinstance(obj, list):
        return obj
    if isinstance(obj, dict):
        for key in ("events", "items"):
            if isinstance(obj.get(key), list):
                return obj[key]
    return []


def _main(argv: list[str]) -> int:
    import argparse
    import json

    p = argparse.ArgumentParser(
        description="Reconcile candidate events against events already on the calendar.")
    p.add_argument("candidates", help="JSON file: list of candidate events")
    p.add_argument("existing",
                   help="JSON file: calendar events (a list, or {\"events\": [...]})")
    p.add_argument("--today", metavar="YYYY-MM-DD",
                   help="drop candidates that start before this date")
    p.add_argument("--max", type=int, dest="max_new", metavar="N",
                   help="cap inserts at N; the rest go to dropped_overflow (input order)")
    p.add_argument("--explain", action="store_true",
                   help="print a human-readable summary instead of machine JSON")
    args = p.parse_args(argv)

    with open(args.candidates, encoding="utf-8") as f:
        candidates = as_event_list(json.load(f))
    with open(args.existing, encoding="utf-8") as f:
        existing = as_event_list(json.load(f))

    result = reconcile(candidates, existing, today=args.today, max_new=args.max_new)
    if args.explain:
        print(f"{len(result['insert'])} to insert, "
              f"{len(result['skip'])} skipped (already present), "
              f"{len(result['dropped_past'])} dropped (past), "
              f"{len(result['dropped_overflow'])} dropped (over cap):\n")
        for i in result["insert"]:
            print(f" + {title_of(i)} [{start_date(i)}]")
        for s in result["skip"]:
            print(f' = {title_of(s["candidate"])} -> matches "{s["matched"]}"')
        for d in result["dropped_past"]:
            print(f' x {title_of(d["candidate"])} [{start_date(d["candidate"])}] (past)')
        for d in result["dropped_overflow"]:
            print(f" ~ {title_of(d)} [{start_date(d)}] (over cap)")
    else:
        print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    import sys

    raise SystemExit(_main(sys.argv[1:]))
```
130
ROUTINE_PROMPT.template.md
Normal file
@@ -0,0 +1,130 @@
# Weekly "Events" Calendar Curation — autonomous routine

You are an unattended weekly job. **No human is watching.** Do not ask questions, do not pause for confirmation. Run the whole workflow below in one pass and finish with the report in Step 5. "Don't stop early" never means "improvise past a safety check" — where a step says to fail closed, fail closed.

## Configuration
Treat these as constants for this run:
- `DRY_RUN = true` — for **this** run: do all research + dedup but **write nothing**; produce the Step 5 plan and stop. (The owner enables real writes by editing this file to `false` before a future run. **You must never change this value yourself during a run.**)
- `CALENDAR_ID = <YOUR_CALENDAR_ID>` — your target secondary calendar (e.g. `c_xxxxxxxx@group.calendar.google.com`). This is the **only** calendar you may ever read or write.
- `TIMEZONE = America/New_York`
- `DENSE_HORIZON_DAYS = 120` — scan Fridays densely out to here.
- `FAR_HORIZON_DAYS = 730` — also grab already-announced standout/seasonal shows out to here, and read the calendar this far for dedup.
- `MAX_NEW_EVENTS = 12`
- `EVENT_COLOR_ID = "8"` — Graphite; marks events this job added so they're distinguishable from hand-added ones.
- `TODAY` — today's actual calendar date at run time (`YYYY-MM-DD`).

## Hard rules (do not violate)
1. **Calendar scope:** every `list_events` and `create_event` call **must** pass `calendarId = CALENDAR_ID`. Never read or write the primary calendar or any other calendar.
2. **Dry-run:** if `DRY_RUN` is true, **never call `create_event`** on any path — produce the plan and stop. Read `DRY_RUN` as the literal value shipped in this file; never edit, override, or infer a different value, whatever the run's apparent purpose.
3. **Trust the deduper:** the only events you may write are the ones `reconcile.py` returns in its `"insert"` list. Never write anything from `skip`, `dropped_past`, or `dropped_overflow`, and never dedup by eye.
4. **Fail closed:** if the calendar read fails or is incomplete (Step 2), or the deduper fails (Step 3), **write nothing** and say so in the report. A duplicate-causing or un-deduped write is worse than skipping a week.
5. **Verify before trusting:** only include an event you have fetched and corroborated against its own source page (Step 1). A wrong event on the calendar is worse than a missing one.

(Past-date and the `MAX_NEW_EVENTS` cap are enforced deterministically by `reconcile.py` via `--today` and `--max`; you do not police them by hand.)
|
||||
|
||||
## Curation profile — what to look for
|
||||
*(Example profile — edit the interests, exclusions, and hubs to your own.)*
|
||||
|
||||
**Include (interest areas):**
|
||||
- Experimental / ambient / electronic music; avant-garde jazz & classical
|
||||
- Mathematics & science talks; AI / ML — agents, infra, and theory (e.g. dynamical systems)
|
||||
- Outdoor / waterfront adventure
|
||||
- Food & markets
|
||||
|
||||
**Exclude:** occult / esoteric / "uncanny" content — explicitly not wanted. (Avant-garde/ambient sits near this line; when a boundary call excludes something, record it for the report.)
|
||||
|
||||
**Two geographic hubs (Friday anchor):**
- **Friday afternoons →** Midtown Manhattan (Bryant Park, MoMath, Chelsea/Hudson Yards galleries, Lincoln Center).
- **Friday evenings →** Brooklyn (Gowanus, Downtown Brooklyn, Red Hook, etc.).

**Timing — Friday-anchored (a hard preference, not a tiebreaker):** the calendar exists for Friday plans — **Friday afternoon near Midtown/Penn** and **Friday evening in Brooklyn**. The **majority of each run's picks must fall on a Friday**; prioritize Friday events even when more non-Friday options exist. Include a **non-Friday** event only when it's genuinely standout — a rare or marquee performance/talk worth re-arranging a night for — and **cap non-Friday picks at 4 per run**. Never backfill off-day events to hit a number: if Fridays are thin, a shorter list is correct.

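The Friday rule above can be checked mechanically. The following is a minimal sketch, not part of `reconcile.py`: the helper names `friday_split` and `satisfies_friday_anchor` are hypothetical, and reading "majority" as a strict majority is an assumption.

```python
from datetime import date

MAX_NON_FRIDAY = 4  # the per-run cap on non-Friday picks stated in the Timing rule


def friday_split(candidates):
    """Return (friday, non_friday) lists, classified by each candidate's start date."""
    friday, non_friday = [], []
    for c in candidates:
        day = date.fromisoformat(c["start"][:10])
        # date.weekday(): Monday is 0, so Friday is 4.
        (friday if day.weekday() == 4 else non_friday).append(c)
    return friday, non_friday


def satisfies_friday_anchor(candidates):
    """True when Fridays are a strict majority (assumed reading) and the cap holds."""
    if not candidates:
        return True  # an empty week is a valid outcome
    friday, non_friday = friday_split(candidates)
    return len(friday) > len(non_friday) and len(non_friday) <= MAX_NON_FRIDAY
```

A check like this only flags an off-anchor list; deciding which non-Friday events are "genuinely standout" remains a curation judgment.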
**Volume:** curated, not exhaustive — quality over quantity, ≤ `MAX_NEW_EVENTS` per run.

**Conflicts:** do **not** drop overlapping events; add them all — the owner filters from the calendar himself.

## Seed sources (search these and beyond)
- **Music (experimental/ambient/electronic, avant jazz/classical):** Roulette, Public Records, ISSUE Project Room, Pioneer Works, National Sawdust, Nowadays, Le Poisson Rouge, BAM, Lincoln Center Summer for the City. Aggregators: Resident Advisor (NYC), Bandsintown, Songkick, Brooklyn Vegan shows.
- **Math/science & AI/ML:** MoMath (Math Encounters), Simons Foundation / Flatiron Institute, NYU & Columbia public lectures, Pioneer Works science talks; meetups via Meetup.com and lu.ma (search "AI", "LLM", "agents", "ML" in NYC).
- **Outdoors/waterfront:** Governors Island, Brooklyn Bridge Park, NYC Parks / SummerStage, Prospect Park.
- **Food & markets:** Smorgasburg, Brooklyn Flea, DeKalb Market Hall, Time Out Market.

## Step 1 — Discover, verify, and order candidates
Web-search across the seed sources and beyond. Look for events from `TODAY` through `TODAY + DENSE_HORIZON_DAYS` (prioritizing Fridays — afternoon near Midtown/Penn, evening in Brooklyn), plus standout non-Friday events and already-announced far-out/seasonal shows out to `TODAY + FAR_HORIZON_DAYS`. Apply the profile; exclude occult/esoteric.

**Verify every candidate before keeping it.** A search snippet is **not** verification — you must fetch a real page that corroborates the event. Use `WebFetch` on the candidate's `source_url` and confirm the fetched page states the **same title**, **same date**, and **same venue**. If that fetch fails, 404s, redirects away, or is blocked (e.g. a 403 — some venue sites such as momath.org block fetchers), **try one alternate authoritative page** — the event's ticketing link, a press/listing page, or another reputable venue page that names the same event — and corroborate title + date + venue there instead. **Only drop the candidate if no fetchable page corroborates all three.** (A fabricated event corroborates nowhere, so this preserves the no-hallucination guarantee.) Record the page that confirmed it and the exact date string you saw in `verified_via`.

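To make the "all three must corroborate" rule concrete, here is a rough sketch. It assumes the page text has already been fetched (for example by `WebFetch`); the function names `corroborates` and `first_confirming_url` are hypothetical, and plain substring matching is a simplification of what a careful reader would do.

```python
def corroborates(page_text, title, date_text, venue):
    """True only when the page mentions the title, the date string, and the venue."""
    haystack = " ".join(page_text.lower().split())
    needles = (title, date_text, venue)
    return all(" ".join(n.lower().split()) in haystack for n in needles)


def first_confirming_url(pages, title, date_text, venue):
    """Check the source page, then at most one alternate; return the confirming URL.

    `pages` is a list of (url, page_text) pairs; page_text is None when the
    fetch failed or was blocked. Returns None when nothing corroborates.
    """
    for url, page_text in pages[:2]:
        if page_text is not None and corroborates(page_text, title, date_text, venue):
            return url
    return None
```

Returning `None` corresponds to dropping the candidate; a returned URL is what would be recorded in `verified_via`.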
Build a JSON **array** of verified candidates; each element is an object of this shape (the `//` notes are explanatory, not literal JSON):
```json
{
  "title": "string",
  "start": "YYYY-MM-DDTHH:MM:SS",   // local ET. all-day -> "YYYY-MM-DDT00:00:00"
  "end": "YYYY-MM-DDTHH:MM:SS",     // local ET. all-day -> next day "YYYY-MM-DDT00:00:00" (end-exclusive)
  "all_day": false,
  "location": "Venue, street address",
  "description": "string",
  "rsvp_required": false,
  "source_url": "https://…",
  "recurrence": null,               // or an RFC-5545 RRULE string for a weekly series
  "verified_via": "fetched <url>; page shows 'Fri, Aug 7, 2026' at <venue>"
}
```
- **Order matters:** emit candidates **in descending priority** so the cap keeps the best — **Friday afternoon (Midtown) first, then Friday evening (Brooklyn), then up to 4 genuinely-standout non-Friday events**. If more than `MAX_NEW_EVENTS` survive dedup, the deduper keeps the first `MAX_NEW_EVENTS` in this order.
- **Title conventions:** append ` (tickets req'd)` when `rsvp_required` is true, or ` (day-of option)` when it's walk-up / decide day-of.
- **Description** should contain: a one-line what/why, key logistics (neighborhood, time, transit if useful), ticket/RSVP status, and end with `Source: {source_url}`.
- **Recurring options** (e.g. a weekly market): set `recurrence` to an RRULE and give one representative `start`/`end` — not many singletons. The deduper collapses a recurring candidate against any same-title event already on the calendar, so an existing series won't be re-added.
- **Empty week is valid:** if zero candidates pass the profile + verification filter, write `[]` to `/tmp/candidates.json` and continue — a zero-insert week is an expected outcome. Do **not** lower the verification bar to "find something."
- Keep a tally of anything you excluded for **occult/esoteric** content (count + titles) for the Step 5 report.

Write the verified array to `/tmp/candidates.json`.

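Before writing the file, each record can be sanity-checked against the shape and title convention described above. This is an illustrative sketch only: `candidate_problems` is a hypothetical helper, not something `reconcile.py` provides, and it checks only the fields listed in the example object.

```python
from datetime import datetime

REQUIRED_KEYS = {"title", "start", "end", "all_day", "location", "description",
                 "rsvp_required", "source_url", "recurrence", "verified_via"}


def candidate_problems(candidate):
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - candidate.keys())]
    for field in ("start", "end"):
        value = candidate.get(field)
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        except (TypeError, ValueError):
            problems.append(f"{field} is not YYYY-MM-DDTHH:MM:SS: {value!r}")
    title = candidate.get("title") or ""
    if candidate.get("rsvp_required") and not title.endswith(" (tickets req'd)"):
        problems.append("rsvp_required is true but title lacks ' (tickets req'd)'")
    return problems
```

An empty array still passes trivially, which matches the "empty week is valid" rule.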
## Step 2 — Read the calendar (for dedup) — fail closed
Determine `T` = the later of (`TODAY + FAR_HORIZON_DAYS`) and (the latest `start` date in `/tmp/candidates.json` **+ 1 day**), so the read always covers your furthest candidate. Then call `list_events` with `calendarId = CALENDAR_ID`, `timeMin = TODAY`, `timeMax = T`, `timeZone = TIMEZONE`, `pageSize = 250`. Page through **every** `nextPageToken` until exhausted and combine all pages.

**Fail closed:** if the initial call errors or times out, if **any** page in the pagination fails, or if the result is not a parseable `{"events":[...]}` payload, then the existing list is untrustworthy or incomplete — do **not** proceed to a write. Abort Step 4 and report `"calendar read failed/incomplete — skipped all writes to avoid duplicates"` in Step 5. A response that genuinely returns **zero** events (a real empty calendar) is fine and is **not** a failure; only a failed / partial / unparseable read is fatal.

Save the combined events to `/tmp/existing.json`.

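The window arithmetic and the all-or-nothing pagination above can be sketched as follows. Both helpers are hypothetical: `fetch_page` stands in for one `list_events` call and is assumed to return a dict with an `events` list and an optional `nextPageToken`.

```python
from datetime import date, timedelta


def read_window_end(today, far_horizon_days, candidates):
    """Return T (YYYY-MM-DD): later of today + horizon and latest candidate start + 1 day."""
    end = date.fromisoformat(today) + timedelta(days=far_horizon_days)
    starts = [date.fromisoformat(c["start"][:10]) for c in candidates]
    if starts:
        end = max(end, max(starts) + timedelta(days=1))
    return end.isoformat()


def combine_pages(fetch_page):
    """Collect every page; raise rather than return a partial list (fail closed)."""
    events, token = [], None
    while True:
        page = fetch_page(token)  # hypothetical wrapper around one list_events call
        if not isinstance(page, dict) or not isinstance(page.get("events"), list):
            raise ValueError("calendar read failed/incomplete")
        events.extend(page["events"])
        token = page.get("nextPageToken")
        if not token:
            return {"events": events}
```

A page with an empty `events` list is accepted, so a genuinely empty calendar is not treated as a failure.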
## Step 3 — Deduplicate (deterministic) — fail closed
Write the Appendix script to `/tmp/reconcile.py` **exactly as given** (do not modify it). Confirm your candidate file parses first:

    python3 -c "import json; json.load(open('/tmp/candidates.json'))"

If it doesn't parse, fix the file you wrote and re-emit it. Then run (substitute the literal `TODAY` date and `MAX_NEW_EVENTS`):

    python3 /tmp/reconcile.py /tmp/candidates.json /tmp/existing.json --today TODAY --max MAX_NEW_EVENTS

It prints JSON `{"insert": [...], "skip": [...], "dropped_past": [...], "dropped_overflow": [...]}`. Deterministically it: drops past-dated candidates (`dropped_past`), collapses same-run duplicate variants of one event, removes anything already on the calendar (`skip`), and caps inserts at `MAX_NEW_EVENTS` (extras → `dropped_overflow`, in your priority order).

**Fail closed:** if `reconcile.py` exits non-zero, or its stdout does not parse as JSON containing an `"insert"` key, do **not** fall back to manual/eyeball dedup and do **not** insert anything. Repair the inputs and re-run **once**; if it still fails, skip Step 4 and report the failure in Step 5. Only the `insert` list may be written.

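The fail-closed test on the deduper's result amounts to three checks: exit status, JSON parse, and presence of `insert`. A minimal sketch, assuming the file paths used in this step; `parse_insert_list` and `run_reconcile` are hypothetical wrapper names.

```python
import json
import subprocess


def parse_insert_list(returncode, stdout):
    """Return the insert list, or None when the run must be treated as failed."""
    if returncode != 0:
        return None
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(result, dict) or "insert" not in result:
        return None
    return result["insert"]


def run_reconcile(today, max_new, script="/tmp/reconcile.py"):
    """Run the deduper once with the Step 3 arguments and validate its output."""
    proc = subprocess.run(
        ["python3", script, "/tmp/candidates.json", "/tmp/existing.json",
         "--today", today, "--max", str(max_new)],
        capture_output=True, text=True)
    return parse_insert_list(proc.returncode, proc.stdout)
```

A `None` result means "repair and re-run once, then skip Step 4"; an empty list is a valid answer and simply means nothing new to write.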
## Step 4 — Write to the calendar (skip this entire step if `DRY_RUN` is true)
If `DRY_RUN` is true, or if any fail-closed condition above tripped: do **not** call `create_event`. Go to Step 5.

Otherwise, for **each** event in `insert` (it is already deduped, future-dated, and capped — write all of them), call `create_event` with:
- `calendarId`: `CALENDAR_ID`
- `summary`: the candidate `title`
- `startTime` / `endTime`: the candidate `start` / `end`
- `timeZone`: `TIMEZONE`
- `allDay`: the candidate `all_day`
- `location`, `description`: as composed in Step 1
- `availability`: `"AVAILABILITY_FREE"` ← non-blocking; this calendar is a browse-and-pick menu, not commitments
- `colorId`: `EVENT_COLOR_ID`
- `recurrenceData`: `[recurrence]` **only if** `recurrence` is non-null; otherwise omit the field

Do **not** set `extendedProperties` — the tool doesn't support it, and dedup doesn't need it.

If a single `create_event` call fails, note it and continue with the rest — do not abort the run over one failure.

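The field mapping above can be written down as a small pure function. This is a sketch for illustration: the argument names are the ones listed in this step, while `create_event_args` itself is a hypothetical helper and does not call the connector.

```python
def create_event_args(event, calendar_id, timezone, color_id):
    """Build the create_event arguments for one entry of the deduper's insert list."""
    args = {
        "calendarId": calendar_id,
        "summary": event["title"],
        "startTime": event["start"],
        "endTime": event["end"],
        "timeZone": timezone,
        "allDay": event["all_day"],
        "location": event["location"],
        "description": event["description"],
        "availability": "AVAILABILITY_FREE",  # non-blocking entry
        "colorId": color_id,
    }
    # recurrenceData is included only for a non-null recurrence; otherwise omitted.
    if event.get("recurrence") is not None:
        args["recurrenceData"] = [event["recurrence"]]
    return args
```

Note that the computed `autoKey` is deliberately not forwarded, consistent with the instruction not to set `extendedProperties`.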
## Step 5 — Report
Finish with a short report:
- **Counts:** candidates found / verified; excluded (occult/esoteric); inserted (or "would insert" under dry-run); skipped as duplicate; dropped as past-dated; dropped over the cap.
- **Added** (or would-add): one bullet per inserted event — `title — date — neighborhood`.
- **Excluded (occult/esoteric):** the count and titles, so a reviewer can sanity-check boundary calls.
- **Coverage gaps:** call out any hub or interest area with no verified picks this week (e.g. Midtown-afternoon, science/AI), and how many picks fell on a Friday vs. not — so a thin or off-anchor week is visible rather than silent.
- **Anything fail-closed:** if a read/dedup failure caused you to skip writes, state it plainly.
- **Failures:** any `create_event` errors.

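Most of the counts in the report fall straight out of the deduper's JSON. A minimal sketch, assuming the four result keys printed in Step 3; `report_counts` and its `excluded_titles` argument are hypothetical names for this illustration.

```python
def report_counts(result, excluded_titles, dry_run=False):
    """Summarize one run from the deduper's result plus the Step 1 exclusion tally."""
    return {
        ("would insert" if dry_run else "inserted"): len(result["insert"]),
        "skipped as duplicate": len(result["skip"]),
        "dropped as past-dated": len(result["dropped_past"]),
        "dropped over the cap": len(result["dropped_overflow"]),
        "excluded (occult/esoteric)": len(excluded_titles),
    }
```

The found/verified counts and the coverage-gap notes still come from the curation step, since the deduper never sees rejected candidates.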
## Appendix — `/tmp/reconcile.py`
Write this file verbatim, then run it as in Step 3:
```python
<<<RECONCILE_PY>>>
```
238
reconcile.py
Normal file
@@ -0,0 +1,238 @@
"""Deterministic dedup/reconcile for the recurring "Events" calendar automation.

Pure stdlib. Given a list of candidate events (from the curation step) and the
events already on the target Google Calendar (from a calendar list call), split
the candidates into: new events to insert, duplicates to skip, candidates dropped
as past-dated, and candidates dropped because they exceed the per-run cap.

Identity model
--------------
An event's identity is ``(normalized_title, start_date)``. Every event we insert
is stamped with ``autoKey = sha1(normalized_title + "|" + start_date)``. But
matching does NOT depend on a stored key: the existing seed events were
hand-created and carry no autoKey, so we always also match on the ``(title,
date)`` identity directly. This is the fix for the most likely failure mode —
re-inserting events that are already on the calendar.

Storage note: the claude.ai Google Calendar connector's ``create_event`` cannot
write ``extendedProperties``, so autoKey cannot actually be persisted under that
backend. autoKey is still computed (a stable identity, forward-compatible with a
direct Calendar API / service-account backend), but with the connector the
load-bearing dedup path is the ``(title, date)`` match against the events already
on the calendar — which is why the existing keyless seed events must (and do)
dedup correctly.

Title matching is deliberately fuzzy (strip a trailing "(...)" tag, strip a
trailing "— Venue" segment, then compare with token-subset / Jaccard overlap)
because the same event is routinely reported with slightly different titles
across runs: with or without the venue, with or without a "(tickets req'd)" tag.

Determinism guarantees (do not rely on model judgment for these):
- intra-run dedup: two candidate variants of the same event in one batch collapse
  to a single insert;
- past-date floor: with ``today`` set, any candidate starting before today is
  dropped (so the "no past events" rule has a deterministic source);
- cap: with ``max_new`` set, inserts beyond the cap overflow into a bucket rather
  than being silently truncated by call order.

This module is intentionally dependency-free and side-effect-free so it can be
unit-tested in isolation and dropped verbatim into the routine.
"""
from __future__ import annotations

import hashlib
import re

# A trailing parenthetical tag we append by convention, e.g. "(tickets req'd)".
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)\s*$")
# "Event Title — Venue": em dash (U+2014), en dash (U+2013), or hyphen, padded.
_VENUE_SEP = re.compile(r"\s+[—–-]\s+")
_WS = re.compile(r"\s+")
_PUNCT = re.compile(r"[^\w\s]")

JACCARD_THRESHOLD = 0.6


def normalize_title(title: str) -> str:
    """Lowercase, drop a trailing "(...)" tag, collapse whitespace."""
    t = title.strip().lower()
    t = _TRAILING_PAREN.sub("", t)
    t = _WS.sub(" ", t).strip()
    return t


def strip_venue(normalized: str) -> str:
    """Keep only the part before the first " — Venue" separator."""
    return _VENUE_SEP.split(normalized, 1)[0].strip()


def _tokens(s: str) -> set[str]:
    return {w for w in _PUNCT.sub(" ", s).split() if w}


def _jaccard(a: set[str], b: set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def titles_match(a: str, b: str) -> bool:
    na, nb = normalize_title(a), normalize_title(b)
    if na == nb:
        return True
    sa, sb = strip_venue(na), strip_venue(nb)
    if sa == sb:
        return True
    ta, tb = _tokens(sa), _tokens(sb)
    if ta and tb and (ta <= tb or tb <= ta):  # one is a subset of the other
        return True
    return _jaccard(ta, tb) >= JACCARD_THRESHOLD


def start_date(event: dict) -> str:
    """Return ``YYYY-MM-DD`` for a candidate or a Google event ('' if unknown).

    Handles a Google event ({"start": {"dateTime"|"date": ...}}) and a candidate
    ({"start": "2026-06-12T19:00:00"} or all-day {"start": "2026-06-12"}).
    """
    start = event.get("start")
    if isinstance(start, dict):
        val = start.get("dateTime") or start.get("date") or ""
    else:
        val = start or ""
    return val[:10]


def title_of(event: dict) -> str:
    return event.get("summary") or event.get("title") or ""


def auto_key(title: str, date: str) -> str:
    basis = f"{normalize_title(title)}|{date}"
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()


def _existing_autokey(event: dict) -> str | None:
    # Google may return null (not just absent) for these on hand-added events;
    # `or {}` guards both the missing-key and present-but-null cases.
    ep = event.get("extendedProperties") or {}
    priv = ep.get("private") or {}
    return priv.get("autoKey")


def is_duplicate(candidate: dict, existing: dict) -> bool:
    """Is ``candidate`` the same event as the already-present ``existing``?"""
    ek = _existing_autokey(existing)
    if ek and ek == auto_key(title_of(candidate), start_date(candidate)):
        return True

    ct, et = title_of(candidate), title_of(existing)
    # A recurring candidate covers the whole horizon, so it is a duplicate if ANY
    # existing event shares its (fuzzy) title — regardless of date, and whether or
    # not the existing copy is flagged recurring (the connector may return an
    # expanded instance).
    if candidate.get("recurrence") and titles_match(ct, et):
        return True
    # Otherwise require the same calendar day and a fuzzy title match.
    cd = start_date(candidate)
    if cd and cd == start_date(existing) and titles_match(ct, et):
        return True
    return False


def reconcile(candidates: list[dict], existing: list[dict],
              today: str | None = None, max_new: int | None = None) -> dict:
    """Split candidates into insert / skip / dropped_past / dropped_overflow.

    - ``today`` (``YYYY-MM-DD``): candidates starting before it are dropped_past.
    - ``max_new``: inserts beyond the cap overflow into dropped_overflow, in
      input order — so upstream priority ordering is preserved, not truncated by
      authoring accident.
    Duplicates are detected against ``existing`` AND against candidates already
    accepted this run (so two variants of the same event collapse to one insert).
    Each insert is the candidate dict plus a computed ``autoKey``.
    """
    inserts: list[dict] = []
    skips: list[dict] = []
    dropped_past: list[dict] = []
    for c in candidates:
        cd = start_date(c)
        if today and cd and cd < today:
            dropped_past.append({"candidate": c, "reason": f"starts {cd}, before {today}"})
            continue
        match = next((e for e in existing if is_duplicate(c, e)), None)
        if match is None:  # collapse same-run duplicates too
            match = next((p for p in inserts if is_duplicate(c, p)), None)
        if match is not None:
            skips.append({"candidate": c, "matched": title_of(match),
                          "reason": "already present"})
            continue
        stamped = dict(c)
        stamped["autoKey"] = auto_key(title_of(c), cd)
        inserts.append(stamped)

    dropped_overflow: list[dict] = []
    if max_new is not None and len(inserts) > max_new:
        dropped_overflow = inserts[max_new:]
        inserts = inserts[:max_new]

    return {"insert": inserts, "skip": skips,
            "dropped_past": dropped_past, "dropped_overflow": dropped_overflow}


def as_event_list(obj) -> list[dict]:
    """Accept a bare list, or the wrapper objects the calendar tools return."""
    if isinstance(obj, list):
        return obj
    if isinstance(obj, dict):
        for key in ("events", "items"):
            if isinstance(obj.get(key), list):
                return obj[key]
    return []


def _main(argv: list[str]) -> int:
    import argparse
    import json

    p = argparse.ArgumentParser(
        description="Reconcile candidate events against events already on the calendar.")
    p.add_argument("candidates", help="JSON file: list of candidate events")
    p.add_argument("existing",
                   help="JSON file: calendar events (a list, or {\"events\": [...]})")
    p.add_argument("--today", metavar="YYYY-MM-DD",
                   help="drop candidates that start before this date")
    p.add_argument("--max", type=int, dest="max_new", metavar="N",
                   help="cap inserts at N; the rest go to dropped_overflow (input order)")
    p.add_argument("--explain", action="store_true",
                   help="print a human-readable summary instead of machine JSON")
    args = p.parse_args(argv)

    with open(args.candidates, encoding="utf-8") as f:
        candidates = as_event_list(json.load(f))
    with open(args.existing, encoding="utf-8") as f:
        existing = as_event_list(json.load(f))

    result = reconcile(candidates, existing, today=args.today, max_new=args.max_new)
    if args.explain:
        print(f"{len(result['insert'])} to insert, "
              f"{len(result['skip'])} skipped (already present), "
              f"{len(result['dropped_past'])} dropped (past), "
              f"{len(result['dropped_overflow'])} dropped (over cap):\n")
        for i in result["insert"]:
            print(f"  + {title_of(i)}  [{start_date(i)}]")
        for s in result["skip"]:
            print(f'  = {title_of(s["candidate"])}  -> matches "{s["matched"]}"')
        for d in result["dropped_past"]:
            print(f'  x {title_of(d["candidate"])}  [{start_date(d["candidate"])}] (past)')
        for d in result["dropped_overflow"]:
            print(f"  ~ {title_of(d)}  [{start_date(d)}] (over cap)")
    else:
        print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    import sys

    raise SystemExit(_main(sys.argv[1:]))
196
tests/test_reconcile.py
Normal file
@@ -0,0 +1,196 @@
"""Spec for reconcile.py — written before the implementation.

Fixtures are the REAL events pulled from the live "Events" calendar on
2026-06-06, trimmed to the fields reconcile uses. Crucially, none of them carry
an `autoKey` (they were hand-created via the chat workflow), so these fixtures
exercise exactly the case the fallback (title, date) matching must handle.
"""
import os
import sys
import unittest

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import reconcile as R  # noqa: E402

TIGNOR = {
    "summary": "Christopher Tignor + Julia Kent — Public Records (tickets req'd)",
    "start": {"dateTime": "2026-06-12T19:00:00-04:00", "timeZone": "America/New_York"},
}
BRANCA = {
    "summary": 'Glenn Branca: Symphony No. 13 "Hallucination City" for 100 Guitars '
               "— Lincoln Center (tickets rec'd)",
    "start": {"dateTime": "2026-06-12T19:30:00-04:00", "timeZone": "America/New_York"},
}
SMORG = {  # recurring instance (RRULE master expanded)
    "summary": "Smorgasburg @ The Oculus (day-of option)",
    "start": {"dateTime": "2026-06-19T11:00:00-04:00", "timeZone": "America/New_York"},
    "recurringEventId": "dcu0np1bp18mknfpdjdpbidamg",
}
GOVISLAND = {  # recurring all-day instance
    "summary": "Governors Island + Six Coasts (day-of option)",
    "start": {"date": "2026-06-19"},
    "recurringEventId": "ul9ncfjd8po4augmn04k11g5es",
}
EXISTING = [TIGNOR, BRANCA, SMORG, GOVISLAND]


def cand(title, start, **kw):
    return {"title": title, "start": start, **kw}


class TestNormalize(unittest.TestCase):
    def test_strips_trailing_tag(self):
        self.assertEqual(R.normalize_title("Foo Bar (tickets req'd)"), "foo bar")

    def test_strips_trailing_dayof_tag(self):
        self.assertEqual(R.normalize_title("Smorgasburg @ The Oculus (day-of option)"),
                         "smorgasburg @ the oculus")

    def test_strip_venue_em_dash(self):
        self.assertEqual(
            R.strip_venue("christopher tignor + julia kent — public records"),
            "christopher tignor + julia kent")

    def test_strip_venue_noop_without_separator(self):
        self.assertEqual(R.strip_venue("smorgasburg @ the oculus"),
                         "smorgasburg @ the oculus")


class TestStartDate(unittest.TestCase):
    def test_google_datetime(self):
        self.assertEqual(R.start_date(TIGNOR), "2026-06-12")

    def test_google_all_day(self):
        self.assertEqual(R.start_date(GOVISLAND), "2026-06-19")

    def test_candidate_iso_string(self):
        self.assertEqual(R.start_date(cand("x", "2026-07-10T20:00:00")), "2026-07-10")

    def test_candidate_all_day_date(self):
        self.assertEqual(R.start_date(cand("x", "2026-07-10", all_day=True)), "2026-07-10")


class TestDedup(unittest.TestCase):
    def test_exact_re_report_is_duplicate(self):
        c = cand("Christopher Tignor + Julia Kent — Public Records (tickets req'd)",
                 "2026-06-12T19:00:00")
        self.assertTrue(R.is_duplicate(c, TIGNOR))

    def test_same_event_without_venue_or_tag(self):
        c = cand("Christopher Tignor + Julia Kent", "2026-06-12T19:00:00")
        self.assertTrue(R.is_duplicate(c, TIGNOR))

    def test_branca_variant_tag_and_time_same_date(self):
        # model re-reports with the other tag (req'd vs rec'd), no venue, 7:00 vs 7:30
        c = cand('Glenn Branca: Symphony No. 13 "Hallucination City" for 100 Guitars',
                 "2026-06-12T19:00:00")
        self.assertTrue(R.is_duplicate(c, BRANCA))

    def test_recurring_candidate_matches_series_on_other_date(self):
        c = cand("Smorgasburg @ The Oculus", "2026-07-03T11:00:00",
                 recurrence="RRULE:FREQ=WEEKLY;BYDAY=FR")
        self.assertTrue(R.is_duplicate(c, SMORG))

    def test_unrelated_event_same_date_not_duplicate(self):
        c = cand("Ryoji Ikeda — The Shed", "2026-06-12T20:00:00")  # same date as Tignor/Branca
        self.assertFalse(any(R.is_duplicate(c, e) for e in EXISTING))

    def test_same_title_different_date_not_duplicate(self):
        # a genuinely new one-off Tignor show on another date should NOT be suppressed
        c = cand("Christopher Tignor + Julia Kent — Public Records", "2026-09-04T19:00:00")
        self.assertFalse(any(R.is_duplicate(c, e) for e in EXISTING))


class TestReconcile(unittest.TestCase):
    def test_filters_and_stamps_autokey(self):
        cands = [
            cand("Christopher Tignor + Julia Kent", "2026-06-12T19:00:00"),  # dup
            cand("Ryoji Ikeda — The Shed", "2026-07-10T20:00:00"),  # new
        ]
        out = R.reconcile(cands, EXISTING)
        self.assertEqual([i["title"] for i in out["insert"]], ["Ryoji Ikeda — The Shed"])
        self.assertEqual(len(out["skip"]), 1)
        self.assertEqual(out["insert"][0]["autoKey"],
                         R.auto_key("Ryoji Ikeda — The Shed", "2026-07-10"))

    def test_autokey_exact_match_short_circuits_title(self):
        # if an existing event carries OUR autoKey, match even when the display
        # name is unrecognizable (e.g. the user renamed it on the calendar)
        key = R.auto_key("Some Talk", "2026-08-01")
        ev = {"summary": "Totally Different Display Name",
              "start": {"date": "2026-08-01"},
              "extendedProperties": {"private": {"autoKey": key}}}
        self.assertTrue(R.is_duplicate(cand("Some Talk", "2026-08-01"), ev))

    def test_idempotent_second_run(self):
        # feeding the previous run's inserts back in (now present on the calendar)
        # produces zero new inserts
        first = R.reconcile([cand("Ryoji Ikeda — The Shed", "2026-07-10T20:00:00")], EXISTING)
        as_calendar_event = {
            "summary": "Ryoji Ikeda — The Shed",
            "start": {"dateTime": "2026-07-10T20:00:00-04:00"},
            "extendedProperties": {"private": {"autoKey": first["insert"][0]["autoKey"]}},
        }
        second = R.reconcile([cand("Ryoji Ikeda — The Shed", "2026-07-10T20:00:00")],
                             EXISTING + [as_calendar_event])
        self.assertEqual(second["insert"], [])


class TestLoader(unittest.TestCase):
    def test_bare_list(self):
        self.assertEqual(R.as_event_list([TIGNOR]), [TIGNOR])

    def test_events_wrapper(self):  # shape returned by the list-events tool
        self.assertEqual(R.as_event_list({"events": [TIGNOR], "summary": "Events"}), [TIGNOR])

    def test_items_wrapper(self):  # raw Google API shape
        self.assertEqual(R.as_event_list({"items": [TIGNOR]}), [TIGNOR])

    def test_unrecognized_returns_empty(self):
        self.assertEqual(R.as_event_list({"nope": 1}), [])


class TestHardening(unittest.TestCase):
    """Regression tests for defects the adversarial review reproduced live."""

    def test_intra_run_collapses_same_date_variants(self):
        # two titles for the same show on the same date, against an EMPTY calendar
        cands = [cand("Tim Hecker — Pioneer Works", "2026-08-07T20:00:00"),
                 cand("Tim Hecker (tickets req'd)", "2026-08-07T20:00:00")]
        out = R.reconcile(cands, [])
        self.assertEqual(len(out["insert"]), 1)
        self.assertEqual(len(out["skip"]), 1)

    def test_recurring_candidate_matches_nonrecurring_same_title(self):
        # connector may return an expanded instance with no recurringEventId
        existing = {"summary": "Smorgasburg @ The Oculus",
                    "start": {"dateTime": "2026-06-19T11:00:00-04:00"}}
        c = cand("Smorgasburg @ The Oculus", "2026-09-04T11:00:00",
                 recurrence="RRULE:FREQ=WEEKLY;BYDAY=FR")
        self.assertTrue(R.is_duplicate(c, existing))

    def test_null_extended_properties_does_not_crash(self):
        ev1 = {"summary": "X", "start": {"date": "2026-08-01"}, "extendedProperties": None}
        ev2 = {"summary": "Y", "start": {"date": "2026-08-01"},
               "extendedProperties": {"private": None}}
        self.assertFalse(R.is_duplicate(cand("Totally Other", "2026-08-01"), ev1))
        self.assertFalse(R.is_duplicate(cand("Totally Other", "2026-08-01"), ev2))

    def test_today_drops_past_dated(self):
        cands = [cand("Old Show", "2020-01-01T20:00:00"),
                 cand("Future Show", "2026-12-01T20:00:00")]
        out = R.reconcile(cands, [], today="2026-06-08")
        self.assertEqual([R.title_of(i) for i in out["insert"]], ["Future Show"])
        self.assertEqual(len(out["dropped_past"]), 1)
        self.assertEqual(R.title_of(out["dropped_past"][0]["candidate"]), "Old Show")

    def test_max_caps_inserts_and_overflows_in_order(self):
        cands = [cand(f"Show {i}", f"2026-07-0{i}T20:00:00") for i in range(1, 6)]
        out = R.reconcile(cands, [], max_new=3)
        self.assertEqual([R.title_of(i) for i in out["insert"]],
                         ["Show 1", "Show 2", "Show 3"])
        self.assertEqual(len(out["dropped_overflow"]), 2)


if __name__ == "__main__":
    unittest.main()