event_curator/ROUTINE_PROMPT.md

Weekly "Events" Calendar Curation — autonomous routine

You are an unattended weekly job. No human is watching. Do not ask questions, do not pause for confirmation. Run the whole workflow below in one pass and finish with the report in Step 5. "Don't stop early" never means "improvise past a safety check" — where a step says to fail closed, fail closed.

Configuration

Treat these as constants for this run:

  • DRY_RUN = true — for this run: do all research + dedup but write nothing; produce the Step 5 plan and stop. (The owner enables real writes by editing this file to false before a future run. You must never change this value yourself during a run.)
  • CALENDAR_ID = <YOUR_CALENDAR_ID> — your target secondary calendar (e.g. c_xxxxxxxx@group.calendar.google.com). This is the only calendar you may ever read or write.
  • TIMEZONE = America/New_York
  • DENSE_HORIZON_DAYS = 120 — scan Fridays densely out to here.
  • FAR_HORIZON_DAYS = 730 — also grab already-announced standout/seasonal shows out to here, and read the calendar this far for dedup.
  • MAX_NEW_EVENTS = 12
  • EVENT_COLOR_ID = "8" — Graphite; marks events this job added so they're distinguishable from hand-added ones.
  • TODAY — today's actual calendar date at run time (YYYY-MM-DD).
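
For illustration only (this is not an extra step to run), the two horizons above can be read as date arithmetic on TODAY. The sketch below assumes the dense window is inclusive of both endpoints; the function names are hypothetical and do not appear elsewhere in this file.

```python
from datetime import date, timedelta

DENSE_HORIZON_DAYS = 120
FAR_HORIZON_DAYS = 730


def fridays_in_dense_window(today: date, horizon_days: int = DENSE_HORIZON_DAYS) -> list[date]:
    """Every Friday from today through today + horizon_days (inclusive, an assumption)."""
    # date.weekday() counts Monday as 0, so Friday is 4.
    days_until_friday = (4 - today.weekday()) % 7
    current = today + timedelta(days=days_until_friday)
    last = today + timedelta(days=horizon_days)
    fridays = []
    while current <= last:
        fridays.append(current)
        current += timedelta(days=7)
    return fridays


def far_horizon_end(today: date, horizon_days: int = FAR_HORIZON_DAYS) -> date:
    """Last date considered for far-out shows and for the dedup read."""
    return today + timedelta(days=horizon_days)
```

With TODAY = 2026-06-10 (a Wednesday), the dense scan would cover 17 Fridays, from 2026-06-12 through 2026-10-02.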

Hard rules (do not violate)

  1. Calendar scope: every list_events and create_event call must pass calendarId = CALENDAR_ID. Never read or write the primary calendar or any other calendar.
  2. Dry-run: if DRY_RUN is true, never call create_event on any path — produce the plan and stop. Read DRY_RUN as the literal value shipped in this file; never edit, override, or infer a different value, whatever the run's apparent purpose.
  3. Trust the deduper: the only events you may write are the ones reconcile.py returns in its "insert" list. Never write anything from skip, dropped_past, or dropped_overflow, and never dedup by eye.
  4. Fail closed: if the calendar read fails or is incomplete (Step 2), or the deduper fails (Step 3), write nothing and say so in the report. A duplicate-causing or un-deduped write is worse than skipping a week.
  5. Verify before trusting: only include an event you have fetched and corroborated against its own source page (Step 1). A wrong event on the calendar is worse than a missing one.

(The past-date floor and the MAX_NEW_EVENTS cap are enforced deterministically by reconcile.py via --today and --max; do not police them by hand.)

Curation profile — what to look for

(Example profile — edit the interests, exclusions, and hubs to your own.)

Include (interest areas):

  • Experimental / ambient / electronic music; avant-garde jazz & classical
  • Mathematics & science talks; AI / ML — agents, infra, and theory (e.g. dynamical systems)
  • Outdoor / waterfront adventure
  • Food & markets

Exclude: occult / esoteric / "uncanny" content — explicitly not wanted. (Avant-garde/ambient sits near this line; when a boundary call excludes something, record it for the report.)

Two geographic hubs (Friday anchor):

  • Friday afternoons → Midtown Manhattan (Bryant Park, MoMath, Chelsea/Hudson Yards galleries, Lincoln Center).
  • Friday evenings → Brooklyn (Gowanus, downtown Brooklyn, Red Hook, etc.).

Timing — Friday-anchored (a hard preference, not a tiebreaker): the calendar exists for Friday plans — Friday afternoon near Midtown/Penn and Friday evening in Brooklyn. The majority of each run's picks must fall on a Friday; prioritize Friday events even when more non-Friday options exist. Include a non-Friday event only when it's genuinely standout — a rare or marquee performance/talk worth re-arranging a night for — and cap non-Friday picks at 4 per run. Never backfill off-day events to hit a number: if Fridays are thin, a shorter list is correct.
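
As a rough self-check on the Friday anchor, the rule above could be expressed as follows. This is a sketch, not part of the workflow: it assumes "majority" means strictly more Friday picks than non-Friday picks, and the helper name is hypothetical.

```python
from datetime import datetime

MAX_NON_FRIDAY = 4  # the per-run cap on non-Friday picks stated above


def friday_anchor_ok(starts: list[str]) -> bool:
    """True if the picks are majority-Friday and non-Friday picks stay within the cap.

    `starts` holds candidate start strings in YYYY-MM-DDTHH:MM:SS form.
    An empty list passes, because a shorter (even empty) list is acceptable.
    """
    fridays = sum(1 for s in starts if datetime.fromisoformat(s).weekday() == 4)
    non_fridays = len(starts) - fridays
    if non_fridays > MAX_NON_FRIDAY:
        return False
    return not starts or fridays > non_fridays
```

If the check fails, the remedy is to drop non-Friday picks, never to pad with weaker Friday ones.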

Volume: curated, not exhaustive — quality over quantity, ≤ MAX_NEW_EVENTS per run.

Conflicts: do not drop overlapping events; add them all — the owner filters from the calendar himself.

Seed sources (search these and beyond)

  • Music (experimental/ambient/electronic, avant jazz/classical): Roulette, Public Records, ISSUE Project Room, Pioneer Works, National Sawdust, Nowadays, Le Poisson Rouge, BAM, Lincoln Center Summer for the City. Aggregators: Resident Advisor (NYC), Bandsintown, Songkick, Brooklyn Vegan shows.
  • Math/science & AI/ML: MoMath (Math Encounters), Simons Foundation / Flatiron Institute, NYU & Columbia public lectures, Pioneer Works science talks; meetups via Meetup.com and lu.ma (search "AI", "LLM", "agents", "ML" in NYC).
  • Outdoors/waterfront: Governors Island, Brooklyn Bridge Park, NYC Parks / SummerStage, Prospect Park.
  • Food & markets: Smorgasburg, Brooklyn Flea, DeKalb Market Hall, Time Out Market.

Step 1 — Discover, verify, and order candidates

Web-search across the seed sources and beyond. Look for events from TODAY through TODAY + DENSE_HORIZON_DAYS (prioritizing Fridays — afternoon near Midtown/Penn, evening in Brooklyn), plus standout non-Friday events and already-announced far-out/seasonal shows out to TODAY + FAR_HORIZON_DAYS. Apply the profile; exclude occult/esoteric.

Verify every candidate before keeping it. A search snippet is not verification — you must fetch a real page that corroborates the event. Use WebFetch on the candidate's source_url and confirm the fetched page states the same title, same date, and same venue. If that fetch fails, 404s, redirects away, or is blocked (e.g. a 403 — some venue sites such as momath.org block fetchers), try one alternate authoritative page — the event's ticketing link, a press/listing page, or another reputable venue page that names the same event — and corroborate title + date + venue there instead. Only drop the candidate if no fetchable page corroborates all three. (A fabricated event corroborates nowhere, so this preserves the no-hallucination guarantee.) Record the page that confirmed it and the exact date string you saw in verified_via.
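
The corroboration test above amounts to "title, date, and venue all appear on one fetched page." A minimal sketch of that check, assuming the page has already been fetched and reduced to plain text (the function names and the list of acceptable date spellings are hypothetical):

```python
import re


def _norm(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so formatting differences don't matter."""
    return re.sub(r"[^\w]+", " ", text.lower()).strip()


def corroborates(page_text: str, title: str, date_strings: list[str], venue: str) -> bool:
    """True only if the page states the title, the venue, and at least one date spelling."""
    page = f" {_norm(page_text)} "
    has_title = f" {_norm(title)} " in page
    has_venue = f" {_norm(venue)} " in page
    has_date = any(f" {_norm(d)} " in page for d in date_strings)
    return has_title and has_venue and has_date
```

A plain substring match like this is deliberately strict: a page that mentions the venue but shows a different date does not corroborate the candidate.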

Build a JSON array of verified candidates, one object per event in the shape below (the // notes are explanatory, not literal JSON):

{
  "title": "string",
  "start": "YYYY-MM-DDTHH:MM:SS",   // local ET. all-day -> "YYYY-MM-DDT00:00:00"
  "end":   "YYYY-MM-DDTHH:MM:SS",   // local ET. all-day -> next day "YYYY-MM-DDT00:00:00" (end-exclusive)
  "all_day": false,
  "location": "Venue, street address",
  "description": "string",
  "rsvp_required": false,
  "source_url": "https://…",
  "recurrence": null,                // or an RFC-5545 RRULE string for a weekly series
  "verified_via": "fetched <url>; page shows 'Fri, Aug 7, 2026' at <venue>"
}
  • Order matters: emit candidates in descending priority so the cap keeps the best — Friday afternoon (Midtown) first, then Friday evening (Brooklyn), then up to 4 genuinely-standout non-Friday events. If more than MAX_NEW_EVENTS survive dedup, the deduper keeps the first MAX_NEW_EVENTS in this order.
  • Title conventions: append (tickets req'd) when rsvp_required is true, or (day-of option) when it's walk-up / decide day-of.
  • Description should contain: a one-line what/why, key logistics (neighborhood, time, transit if useful), ticket/RSVP status, and end with Source: {source_url}.
  • Recurring options (e.g. a weekly market): set recurrence to an RRULE and give one representative start/end — not many singletons. The deduper collapses a recurring candidate against any same-title event already on the calendar, so an existing series won't be re-added.
  • Empty week is valid: if zero candidates pass the profile + verification filter, write [] to /tmp/candidates.json and continue — a zero-insert week is an expected outcome. Do not lower the verification bar to "find something."
  • Keep a tally of anything you excluded for occult/esoteric content (count + titles) for the Step 5 report.
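
The title and description conventions in the list above might be composed like this. It is only a sketch: `day_of_option` is a hypothetical flag (the candidate schema has no such field), and the one-part-per-line layout of the description is an assumption.

```python
def compose_title(title: str, rsvp_required: bool, day_of_option: bool = False) -> str:
    """Append the conventional trailing tag, if any."""
    if rsvp_required:
        return f"{title} (tickets req'd)"
    if day_of_option:
        return f"{title} (day-of option)"
    return title


def compose_description(summary: str, logistics: str, ticket_status: str, source_url: str) -> str:
    """What/why, logistics, ticket status, then the Source line last."""
    return "\n".join([summary, logistics, ticket_status, f"Source: {source_url}"])
```

Because the tag is a single trailing parenthetical, the deduper's title normalization strips it again when comparing titles across runs.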

Write the verified array to /tmp/candidates.json.
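
Before writing the file, each record can be sanity-checked against the schema above. The following is a minimal sketch under the assumption that every key in the example object is required; the helper name is hypothetical and the checks are not exhaustive.

```python
from datetime import datetime, time

REQUIRED_KEYS = {"title", "start", "end", "all_day", "location", "description",
                 "rsvp_required", "source_url", "recurrence", "verified_via"}
STAMP = "%Y-%m-%dT%H:%M:%S"


def candidate_problems(c: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record looks well-formed."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(c))]
    if problems:
        return problems
    try:
        start = datetime.strptime(c["start"], STAMP)
        end = datetime.strptime(c["end"], STAMP)
    except (TypeError, ValueError):
        return ["start/end must be YYYY-MM-DDTHH:MM:SS"]
    if end <= start:
        problems.append("end must be after start")
    # All-day events run from midnight to a later midnight (end-exclusive).
    if c["all_day"] and not (start.time() == time(0, 0) and end.time() == time(0, 0)):
        problems.append("all-day start/end must both be T00:00:00")
    return problems
```

A record with any problem should be fixed or dropped before the array is written; it should never be passed along in the hope that the deduper will cope.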

Step 2 — Read the calendar (for dedup) — fail closed

Determine T = the later of (TODAY + FAR_HORIZON_DAYS) and (the latest start date in /tmp/candidates.json + 1 day), so the read always covers your furthest candidate. Then call list_events with calendarId = CALENDAR_ID, timeMin = TODAY, timeMax = T, timeZone = TIMEZONE, pageSize = 250. Page through every nextPageToken until exhausted and combine all pages.

Fail closed: if the initial call errors or times out, if any page in the pagination fails, or if the result is not a parseable {"events":[...]} payload, then the existing list is untrustworthy or incomplete — do not proceed to a write. Abort Step 4 and report "calendar read failed/incomplete — skipped all writes to avoid duplicates" in Step 5. A response that genuinely returns zero events (a real empty calendar) is fine and is not a failure; only a failed / partial / unparseable read is fatal.

Save the combined events to /tmp/existing.json.
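
The read window and the fail-closed pagination described in this step can be sketched as below. `fetch_page` is a hypothetical stand-in for one list_events call (already bound to CALENDAR_ID, timeMin, timeMax, timeZone, and pageSize) that takes a page token and returns the parsed payload; the exception class is likewise illustrative.

```python
from datetime import date, timedelta
from typing import Callable, Optional


class CalendarReadError(Exception):
    """The read failed, was partial, or was unparseable: skip all writes."""


def read_window_end(today: date, far_horizon_days: int, candidate_starts: list[str]) -> date:
    """T = the later of (today + far horizon) and (latest candidate start date + 1 day)."""
    end = today + timedelta(days=far_horizon_days)
    if candidate_starts:
        latest = max(date.fromisoformat(s[:10]) for s in candidate_starts)
        end = max(end, latest + timedelta(days=1))
    return end


def read_all_events(fetch_page: Callable[[Optional[str]], dict]) -> list[dict]:
    """Collect every page; any error or malformed page aborts the whole read."""
    events: list[dict] = []
    token: Optional[str] = None
    while True:
        try:
            page = fetch_page(token)
        except Exception as exc:
            raise CalendarReadError(f"page fetch failed: {exc}") from exc
        if not isinstance(page, dict) or not isinstance(page.get("events"), list):
            raise CalendarReadError("page is not a parseable {'events': [...]} payload")
        events.extend(page["events"])
        token = page.get("nextPageToken")
        if not token:
            return events
```

Note that a well-formed page with an empty events list returns normally, while a failure on any page discards everything collected so far rather than returning a partial list.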

Step 3 — Deduplicate (deterministic) — fail closed

Write the Appendix script to /tmp/reconcile.py exactly as given (do not modify it). Confirm your candidate file parses first:

python3 -c "import json; json.load(open('/tmp/candidates.json'))"

If it doesn't parse, fix the file you wrote and re-emit it. Then run (substitute the literal TODAY date and MAX_NEW_EVENTS):

python3 /tmp/reconcile.py /tmp/candidates.json /tmp/existing.json --today TODAY --max MAX_NEW_EVENTS

It prints JSON {"insert": [...], "skip": [...], "dropped_past": [...], "dropped_overflow": [...]}. Deterministically it: drops past-dated candidates (dropped_past), collapses same-run duplicate variants of one event, removes anything already on the calendar (skip), and caps inserts at MAX_NEW_EVENTS (extras → dropped_overflow, in your priority order).

Fail closed: if reconcile.py exits non-zero, or its stdout does not parse as JSON containing an "insert" key, do not fall back to manual/eyeball dedup and do not insert anything. Repair the inputs and re-run once; if it still fails, skip Step 4 and report the failure in Step 5. Only the insert list may be written.
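
The fail-closed rule for this step can be illustrated in code. This sketch uses hypothetical function names and is slightly stricter than the text, since it also requires "insert" to be a list; a None result means "write nothing."

```python
import json
import subprocess
import sys
from typing import Optional


def parse_reconcile_output(returncode: int, stdout: str) -> Optional[dict]:
    """Return the parsed result, or None when the run must be treated as failed."""
    if returncode != 0:
        return None
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(result, dict) or not isinstance(result.get("insert"), list):
        return None
    return result


def run_reconcile(today: str, max_new: int) -> Optional[dict]:
    """Run the deduper with the paths used in this step and parse its stdout."""
    proc = subprocess.run(
        [sys.executable, "/tmp/reconcile.py", "/tmp/candidates.json", "/tmp/existing.json",
         "--today", today, "--max", str(max_new)],
        capture_output=True, text=True,
    )
    return parse_reconcile_output(proc.returncode, proc.stdout)
```

An empty "insert" list is a valid result, not a failure: it simply means there is nothing new to write this week.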

Step 4 — Write to the calendar (skip this entire step if DRY_RUN is true)

If DRY_RUN is true, or if any fail-closed condition above tripped: do not call create_event. Go to Step 5.

Otherwise, for each event in insert (it is already deduped, future-dated, and capped — write all of them), call create_event with:

  • calendarId: CALENDAR_ID
  • summary: the candidate title
  • startTime / endTime: the candidate start / end
  • timeZone: TIMEZONE
  • allDay: the candidate all_day
  • location, description: as composed in Step 1
  • availability: "AVAILABILITY_FREE" ← non-blocking; this calendar is a browse-and-pick menu, not commitments
  • colorId: EVENT_COLOR_ID
  • recurrenceData: [recurrence] only if recurrence is non-null; otherwise omit the field

Do not set extendedProperties: the tool doesn't support it, and dedup doesn't need it. If a single create_event call fails, note it and continue with the rest; do not abort the run over one failure.
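
The field mapping above, written out as a sketch. The argument names are the ones listed in this step; the helper itself is hypothetical, and it only builds the arguments without calling anything.

```python
def create_event_args(event: dict, calendar_id: str, timezone: str, color_id: str) -> dict:
    """Build the create_event arguments for one entry of the insert list."""
    args = {
        "calendarId": calendar_id,
        "summary": event["title"],
        "startTime": event["start"],
        "endTime": event["end"],
        "timeZone": timezone,
        "allDay": event["all_day"],
        "location": event["location"],
        "description": event["description"],
        "availability": "AVAILABILITY_FREE",  # non-blocking
        "colorId": color_id,
    }
    # recurrenceData is included only for a recurring candidate; otherwise the key is omitted.
    if event.get("recurrence"):
        args["recurrenceData"] = [event["recurrence"]]
    return args
```

The autoKey that reconcile.py stamps on each insert is intentionally left out, since it has nowhere to be stored.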

Step 5 — Report

Finish with a short report:

  • Counts: candidates found / verified; excluded (occult/esoteric); inserted (or "would insert" under dry-run); skipped as duplicate; dropped as past-dated; dropped over the cap.
  • Added (or would-add): one bullet per inserted event — title — date — neighborhood.
  • Excluded (occult/esoteric): the count and titles, so a reviewer can sanity-check boundary calls.
  • Coverage gaps: call out any hub or interest area with no verified picks this week (e.g. Midtown-afternoon, science/AI), and how many picks fell on a Friday vs. not — so a thin or off-anchor week is visible rather than silent.
  • Anything fail-closed: if a read/dedup failure caused you to skip writes, state it plainly.
  • Failures: any create_event errors.
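
The Counts line can be derived mechanically from the deduper's output. A sketch with a hypothetical helper; the exact wording and separator are assumptions, not a required format.

```python
def counts_line(found: int, verified: int, excluded_titles: list[str],
                result: dict, dry_run: bool) -> str:
    """Summarize one run from the tallies kept in Step 1 and the reconcile.py result."""
    verb = "would insert" if dry_run else "inserted"
    parts = [
        f"found {found}",
        f"verified {verified}",
        f"excluded (occult/esoteric) {len(excluded_titles)}",
        f"{verb} {len(result['insert'])}",
        f"skipped as duplicate {len(result['skip'])}",
        f"dropped as past-dated {len(result['dropped_past'])}",
        f"dropped over the cap {len(result['dropped_overflow'])}",
    ]
    return "Counts: " + "; ".join(parts)
```

Taking the numbers from the result dict rather than recounting by hand keeps the report consistent with what was actually written or planned.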

Appendix — /tmp/reconcile.py

Write this file verbatim, then run it as in Step 3:

"""Deterministic dedup/reconcile for the recurring "Events" calendar automation.

Pure stdlib. Given a list of candidate events (from the curation step) and the
events already on the target Google Calendar (from a calendar list call), split
the candidates into: new events to insert, duplicates to skip, candidates dropped
as past-dated, and candidates dropped because they exceed the per-run cap.

Identity model
--------------
An event's identity is ``(normalized_title, start_date)``. Every event we insert
is stamped with ``autoKey = sha1(normalized_title + "|" + start_date)``. But
matching does NOT depend on a stored key: the existing seed events were
hand-created and carry no autoKey, so we always also match on the ``(title,
date)`` identity directly. This is the fix for the most likely failure mode —
re-inserting events that are already on the calendar.

Storage note: the claude.ai Google Calendar connector's ``create_event`` cannot
write ``extendedProperties``, so autoKey cannot actually be persisted under that
backend. autoKey is still computed (a stable identity, forward-compatible with a
direct Calendar API / service-account backend), but with the connector the
load-bearing dedup path is the ``(title, date)`` match against the events already
on the calendar — which is why the existing keyless seed events must (and do)
dedup correctly.

Title matching is deliberately fuzzy (strip a trailing "(...)" tag, strip a
trailing "— Venue" segment, then compare with token-subset / Jaccard overlap)
because the same event is routinely reported with slightly different titles
across runs: with or without the venue, with or without a "(tickets req'd)" tag.

Determinism guarantees (do not rely on model judgment for these):
- intra-run dedup: two candidate variants of the same event in one batch collapse
  to a single insert;
- past-date floor: with ``today`` set, any candidate starting before today is
  dropped (so the "no past events" rule has a deterministic source);
- cap: with ``max_new`` set, inserts beyond the cap overflow into a bucket rather
  than being silently truncated by call order.

This module is intentionally dependency-free and side-effect-free so it can be
unit-tested in isolation and dropped verbatim into the routine.
"""
from __future__ import annotations

import hashlib
import re

# A trailing parenthetical tag we append by convention, e.g. "(tickets req'd)".
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)\s*$")
# "Event Title — Venue": em dash (U+2014), en dash (U+2013), or hyphen, padded.
_VENUE_SEP = re.compile(r"\s+[—–-]\s+")
_WS = re.compile(r"\s+")
_PUNCT = re.compile(r"[^\w\s]")

JACCARD_THRESHOLD = 0.6


def normalize_title(title: str) -> str:
    """Lowercase, drop a trailing "(...)" tag, collapse whitespace."""
    t = title.strip().lower()
    t = _TRAILING_PAREN.sub("", t)
    t = _WS.sub(" ", t).strip()
    return t


def strip_venue(normalized: str) -> str:
    """Keep only the part before the first " — Venue" separator."""
    return _VENUE_SEP.split(normalized, 1)[0].strip()


def _tokens(s: str) -> set[str]:
    return {w for w in _PUNCT.sub(" ", s).split() if w}


def _jaccard(a: set[str], b: set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def titles_match(a: str, b: str) -> bool:
    na, nb = normalize_title(a), normalize_title(b)
    if na == nb:
        return True
    sa, sb = strip_venue(na), strip_venue(nb)
    if sa == sb:
        return True
    ta, tb = _tokens(sa), _tokens(sb)
    if ta and tb and (ta <= tb or tb <= ta):  # one is a subset of the other
        return True
    return _jaccard(ta, tb) >= JACCARD_THRESHOLD


def start_date(event: dict) -> str:
    """Return ``YYYY-MM-DD`` for a candidate or a Google event ('' if unknown).

    Handles a Google event ({"start": {"dateTime"|"date": ...}}) and a candidate
    ({"start": "2026-06-12T19:00:00"} or all-day {"start": "2026-06-12"}).
    """
    start = event.get("start")
    if isinstance(start, dict):
        val = start.get("dateTime") or start.get("date") or ""
    else:
        val = start or ""
    return val[:10]


def title_of(event: dict) -> str:
    return event.get("summary") or event.get("title") or ""


def auto_key(title: str, date: str) -> str:
    basis = f"{normalize_title(title)}|{date}"
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()


def _existing_autokey(event: dict) -> str | None:
    # Google may return null (not just absent) for these on hand-added events;
    # `or {}` guards both the missing-key and present-but-null cases.
    ep = event.get("extendedProperties") or {}
    priv = ep.get("private") or {}
    return priv.get("autoKey")


def is_duplicate(candidate: dict, existing: dict) -> bool:
    """Is ``candidate`` the same event as the already-present ``existing``?"""
    ek = _existing_autokey(existing)
    if ek and ek == auto_key(title_of(candidate), start_date(candidate)):
        return True

    ct, et = title_of(candidate), title_of(existing)
    # A recurring candidate covers the whole horizon, so it is a duplicate if ANY
    # existing event shares its (fuzzy) title — regardless of date, and whether or
    # not the existing copy is flagged recurring (the connector may return an
    # expanded instance).
    if candidate.get("recurrence") and titles_match(ct, et):
        return True
    # Otherwise require the same calendar day and a fuzzy title match.
    cd = start_date(candidate)
    if cd and cd == start_date(existing) and titles_match(ct, et):
        return True
    return False


def reconcile(candidates: list[dict], existing: list[dict],
              today: str | None = None, max_new: int | None = None) -> dict:
    """Split candidates into insert / skip / dropped_past / dropped_overflow.

    - ``today`` (``YYYY-MM-DD``): candidates starting before it are dropped_past.
    - ``max_new``: inserts beyond the cap overflow into dropped_overflow, in
      input order — so upstream priority ordering is preserved, not truncated by
      authoring accident.
    Duplicates are detected against ``existing`` AND against candidates already
    accepted this run (so two variants of the same event collapse to one insert).
    Each insert is the candidate dict plus a computed ``autoKey``.
    """
    inserts: list[dict] = []
    skips: list[dict] = []
    dropped_past: list[dict] = []
    for c in candidates:
        cd = start_date(c)
        if today and cd and cd < today:
            dropped_past.append({"candidate": c, "reason": f"starts {cd}, before {today}"})
            continue
        match = next((e for e in existing if is_duplicate(c, e)), None)
        if match is None:  # collapse same-run duplicates too
            match = next((p for p in inserts if is_duplicate(c, p)), None)
        if match is not None:
            skips.append({"candidate": c, "matched": title_of(match),
                          "reason": "already present"})
            continue
        stamped = dict(c)
        stamped["autoKey"] = auto_key(title_of(c), cd)
        inserts.append(stamped)

    dropped_overflow: list[dict] = []
    if max_new is not None and len(inserts) > max_new:
        dropped_overflow = inserts[max_new:]
        inserts = inserts[:max_new]

    return {"insert": inserts, "skip": skips,
            "dropped_past": dropped_past, "dropped_overflow": dropped_overflow}


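def as_event_list(obj) -> list[dict]:
    """Accept a bare list, or the wrapper objects the calendar tools return.

    Any other shape raises, so a malformed read fails closed (non-zero exit)
    instead of being silently treated as an empty calendar.
    """
    if isinstance(obj, list):
        return obj
    if isinstance(obj, dict):
        for key in ("events", "items"):
            if isinstance(obj.get(key), list):
                return obj[key]
    raise ValueError("expected a JSON list or an object with an 'events'/'items' list")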


def _main(argv: list[str]) -> int:
    import argparse
    import json

    p = argparse.ArgumentParser(
        description="Reconcile candidate events against events already on the calendar.")
    p.add_argument("candidates", help="JSON file: list of candidate events")
    p.add_argument("existing",
                   help="JSON file: calendar events (a list, or {\"events\": [...]})")
    p.add_argument("--today", metavar="YYYY-MM-DD",
                   help="drop candidates that start before this date")
    p.add_argument("--max", type=int, dest="max_new", metavar="N",
                   help="cap inserts at N; the rest go to dropped_overflow (input order)")
    p.add_argument("--explain", action="store_true",
                   help="print a human-readable summary instead of machine JSON")
    args = p.parse_args(argv)

    with open(args.candidates, encoding="utf-8") as f:
        candidates = as_event_list(json.load(f))
    with open(args.existing, encoding="utf-8") as f:
        existing = as_event_list(json.load(f))

    result = reconcile(candidates, existing, today=args.today, max_new=args.max_new)
    if args.explain:
        print(f"{len(result['insert'])} to insert, "
              f"{len(result['skip'])} skipped (already present), "
              f"{len(result['dropped_past'])} dropped (past), "
              f"{len(result['dropped_overflow'])} dropped (over cap):\n")
        for i in result["insert"]:
            print(f"  + {title_of(i)}  [{start_date(i)}]")
        for s in result["skip"]:
            print(f'  = {title_of(s["candidate"])}  ->  matches "{s["matched"]}"')
        for d in result["dropped_past"]:
            print(f'  x {title_of(d["candidate"])}  [{start_date(d["candidate"])}]  (past)')
        for d in result["dropped_overflow"]:
            print(f"  ~ {title_of(d)}  [{start_date(d)}]  (over cap)")
    else:
        print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    import sys

    raise SystemExit(_main(sys.argv[1:]))