What this is
A searchable full-text database of every collective bargaining agreement currently in force for New York City municipal employees, sourced from the Office of Labor Relations.
Why it exists
The City publishes its collective bargaining agreements as scattered PDFs. Many are scans of paper documents, so the text isn't selectable and Ctrl-F doesn't work inside them. Comparing how, say, the discipline process works for sanitation workers versus teachers requires opening dozens of PDFs side-by-side. This site puts every clause from every contract in one searchable, taggable, citable place.
Sources
- NYC OLR — Recent Agreements — the authoritative City list of currently in-force agreements. We do not include the OLR "Past Agreements" archive page (those contracts have superseding successors).
If a stated term appears expired (e.g. "2017–2025" PBA), the contract is still in force under New York State's Triborough Amendment, which keeps public-sector contracts in effect until a successor is signed. Expirations are flagged but do not mean a contract has lapsed.
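A minimal sketch of how such an expiration flag could be derived from a stated term. The helper name, the accepted term format ("2017–2025" or "2017-2025"), and the flag wording are assumptions for illustration, not the site's actual code.

```python
import re
from datetime import date
from typing import Optional

# Matches a stated term such as "2017–2025" or "2017-2025" (en dash or hyphen).
TERM_RE = re.compile(r"(\d{4})\s*[–-]\s*(\d{4})")

def term_status(term: str, today: Optional[date] = None) -> str:
    """Return a display flag for a stated term (hypothetical helper)."""
    today = today or date.today()
    match = TERM_RE.search(term)
    if match is None:
        return "term unknown"
    end_year = int(match.group(2))
    # Only years are known, so the end year itself still counts as within term.
    if end_year < today.year:
        # A past end year is only a flag: the contract remains in force
        # under the Triborough Amendment until a successor is signed.
        return "stated term expired (still in force)"
    return "within stated term"
```

The point of the design is that an expired term changes the label, never the contract's inclusion in the corpus.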
Pipeline
- Inventory. scripts/inventory.py scrapes the OLR Recent Agreements page, finds every PDF link, and assigns each one an ID, label, source URL, and stated term years.
- Download. scripts/scrape.py downloads each PDF to data/pdfs/<id>.pdf.
- Extract. scripts/extract.py processes each PDF page-by-page:
  - Tables are detected and rendered as pipe-delimited markdown so column boundaries survive into the search corpus. (Naive PDF text extraction tends to flatten columns into a left-to-right wall of words; this preserves wage schedules, longevity tables, etc.)
  - Multi-column pages are detected via character-position analysis and split into left/right columns before extraction, so reading order is preserved.
  - Each page is scored on word count and alphabetic-character ratio. If a page falls below threshold (because the source PDF has no text layer), the page is rendered at 300 dpi and run through macOS Vision OCR via the ocrmac Python binding. Vision-based OCR handles real-world contract pages well, including stamps, signatures, and side-by-side tables.
  - OCR'd pages are flagged with the OCR badge in the UI so users know the text was reconstructed and may have minor errors.
- Segment. scripts/segment.py splits each contract into clauses using article/section heading regexes (Article I, Section 1, all-caps headings, and numbered headings). Each clause carries an article, section, heading, page number, and OCR flag.
- Tag. scripts/tag.py applies a topic taxonomy via keyword regex against each clause: wages, longevity, overtime, holidays, vacation, sick leave, parental leave, health & welfare, pension, grievance, discipline, layoff, hours, shift differential, uniform allowance, training, safety, no-strike, management rights, work rules, union security, recognition, promotion, telework, anti-discrimination, workforce composition. A clause can carry multiple topics.
- Index. The frontend loads data/clauses.json and builds a FlexSearch full-text index in the browser. No server-side query: everything runs client-side.
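The page-quality check in the Extract step can be sketched roughly as follows. This is an illustration only: the function names and both threshold values are assumptions, since the thresholds used by scripts/extract.py are not stated here.

```python
from typing import Tuple

# Hypothetical thresholds, chosen only for illustration.
MIN_WORDS = 20
MIN_ALPHA_RATIO = 0.6

def page_quality(text: str) -> Tuple[int, float]:
    """Return (word count, share of non-whitespace characters that are letters)."""
    words = text.split()
    chars = [c for c in text if not c.isspace()]
    alpha_ratio = sum(c.isalpha() for c in chars) / len(chars) if chars else 0.0
    return len(words), alpha_ratio

def needs_ocr(text: str) -> bool:
    """True when the extracted text looks too thin or too noisy to trust."""
    word_count, alpha_ratio = page_quality(text)
    return word_count < MIN_WORDS or alpha_ratio < MIN_ALPHA_RATIO
```

A page with no text layer yields an empty string and fails the word-count test; a page whose text layer is mostly stray symbols fails the ratio test. Either result would send the page to the OCR path.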
Coverage beyond the OLR Recent Agreements page
The bulk of this corpus comes from OLR's "Recent Agreements" page, which is the authoritative list of contracts where the City of New York is the direct employer. Several major NYC public-sector unions whose contracts are not on that page are also included, sourced directly from OLR's contract download server, the City University of New York's labor relations page, and the unions themselves:
- UFA — Uniformed Firefighters Association (~9,500 FDNY firefighters, IAFF Local 94). The most recent UFA contract on OLR is the 2017-2020 MOA; subsequent rounds are in arbitration. Economic terms for the 2022-2027 cycle are set by the Uniformed Coalition Economic Agreement.
- UFOA — Uniformed Fire Officers Association (~2,500 FDNY captains, lieutenants, battalion chiefs, IAFF Local 854). The most recent contract on OLR is the 2018-2021 Fire Officers Agreement.
- NYSNA — New York State Nurses Association (~8,500 registered nurses at NYC Health + Hospitals). Most recent published contract is the 2019-2023 Staff Nurses Agreement on OLR; the 2023+ pay-parity successor is not yet posted there.
- PSC — Professional Staff Congress (~30,000 faculty and professional staff at CUNY, AFT Local 2334). Includes both the 2023-2027 MOA and the underlying 2017-2023 agreement. Sourced from CUNY's labor relations page, since PSC bargains with CUNY (state-affiliated) rather than NYC OLR.
These five contracts cover roughly 50,000 additional NYC public-sector unionized employees. Each is flagged in the contracts directory with its source ("olr-direct" or "cuny-direct") to distinguish from the OLR Recent Agreements set.
Still missing: the post-2023 NYSNA H+H pay-parity successor agreement, and any UFA/UFOA award from the current arbitration round. Both are tracked here for future ingestion when published.
Limitations
- OCR is not perfect. Scanned pages may have spelling errors that would not appear in a clean text PDF. The OCR flag tells you when to be cautious. Always verify by clicking through to the source PDF.
- Heading detection is heuristic. If a contract uses unusual numbering or all-caps body text, segmentation may merge or split clauses incorrectly. The page number on each clause anchors it back to the source PDF for verification.
- Topic tags are keyword-based. A clause that mentions "discipline" in passing while really being about safety will get a discipline tag. Tags are a navigation aid, not a legal classification.
- Side letters. Some agreements have side letters or memoranda of understanding that aren't in the linked PDF. Where OLR publishes these as separate documents, they appear as separate entries.
- Workforce summaries are extractive (taken from the contract's recognition clause) plus headcount drawn from the New York City Comptroller's bargaining unit roster. Not all contracts have a published headcount; some show "n/a."
- DOE / NYCHA / HHC / CUNY. The OLR Recent Agreements page now consolidates major agreements across these employers. If a unit's contract isn't on OLR, it isn't here either, apart from the additions listed under "Coverage beyond the OLR Recent Agreements page."
- NYC Health + Hospitals (H+H) is partially covered. H+H is a public benefit corporation that sometimes negotiates contracts directly with unions instead of through OLR, and not all H+H agreements get posted to OLR. The corpus includes Doctors Council (~3,000 physicians), 1199 SEIU (~12,000 non-physician healthcare workers), and CIR (resident physicians). DC 37 affiliated locals such as Local 420 (Hospital Care Employees) and Local 768 (Health Services) cover several thousand more H+H workers under the citywide DC 37 economic agreement, which is in the corpus. For NYSNA (New York State Nurses Association), which represents the ~8,000–9,000 registered nurses at H+H, the corpus has the 2019-2023 Staff Nurses Agreement; the post-2023 successor is a known gap because it is not yet published on OLR.
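The heading-detection and keyword-tagging caveats above are easier to see in a small sketch. The heading pattern, the two-topic taxonomy, the sample text, and the function names are all illustrative assumptions, far simpler than whatever scripts/segment.py and scripts/tag.py actually contain.

```python
import re
from typing import Dict, List

# Hypothetical heading pattern: "ARTICLE IX ..." or "Section 2" at the start of a line.
HEADING_RE = re.compile(
    r"^(?:ARTICLE|Article)[ \t]+[IVXLC\d]+\b.*$|^Section[ \t]+\d+\b.*$",
    re.MULTILINE,
)

# Hypothetical two-topic slice of a keyword taxonomy.
TOPIC_PATTERNS = {
    "discipline": re.compile(r"\bdisciplin\w*", re.IGNORECASE),
    "safety": re.compile(r"\b(?:safety|protective equipment|hazard\w*)", re.IGNORECASE),
}

def segment(text: str) -> List[Dict[str, str]]:
    """Start a new clause at every detected heading line."""
    matches = list(HEADING_RE.finditer(text))
    clauses = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        clauses.append({"heading": m.group(0).strip(), "body": text[m.end():end].strip()})
    return clauses

def tag(clause_text: str) -> List[str]:
    """Return every topic whose keyword pattern matches the clause."""
    return sorted(t for t, pat in TOPIC_PATTERNS.items() if pat.search(clause_text))

# Invented sample text, not quoted from any contract.
sample = """ARTICLE IX - SAFETY
Section 1
The Employer shall provide protective equipment. Failure to wear it may result in discipline.
Section 2
Hazardous conditions shall be reported to the supervisor.
"""

for clause in segment(sample):
    print(clause["heading"], tag(clause["body"]))
```

In this sample, Section 1 is a safety clause, yet it also picks up the discipline tag from a single passing mention. That is the false-positive behavior described above, and it is why tags are best treated as a navigation aid.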
Wage tracker — verification
The wage pattern tracker shows curated general wage increase (GWI) schedules for the 12 largest contracts. Percentages and effective dates were checked directly against the OCR'd contract text, and each entry on that page carries a badge showing how far that check went:
- Verified — both percentages and dates were quoted from the contract.
- Partially verified — percentages quoted; some dates inferred from term plus visible step-spacing.
- Set by parent (UOCEA) — SBA / DEA incorporate the Uniformed Coalition Economic Agreement; percentages are the parent's.
- PDF appendix — UFT's salary tables live in Appendix A; we link to the source PDF rather than fabricate percentages.
The $3,000 ratification bonus appears only in the civilian-pattern contracts; the uniformed pattern doesn't include it. PBA's prior 2017-2025 settlement also has no $3,000 bonus.
Natural-language Q&A
For natural-language questions across the corpus — "compare the discipline procedures for sanitation workers and teachers," "which contracts have parental leave," "what's the longest grievance timeline" — use the companion NotebookLM notebook, which has been loaded with the same Markdown corpus this site is built from. NotebookLM provides citations back to the source documents and runs on Google's infrastructure (free for users with a Google account).
This site itself is search-only by design — it doesn't call any LLM API at query time, so it stays free to host and free to use. The two tools complement each other: use the search/topic-pivot here for keyword-precise lookups and citations, use NotebookLM for cross-document synthesis questions.
Markdown export
Every contract is also published as a standalone Markdown file under /data/markdown/<contract-id>.md, with YAML frontmatter for metadata, page numbers and OCR flags inline, and tables preserved as pipe-delimited rows. A single ZIP bundle of all 94 contracts (~750 KB) is also available, plus a per-contract download button on every contract detail page. The files are built by scripts/export_markdown.py from the same source data the search index uses.
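For anyone consuming the export, a minimal sketch of separating the frontmatter from the body using only the standard library. The field names, the "## Page 1" marker, and the table values in the example are hypothetical; the real files may differ, and nested YAML would need a proper parser such as PyYAML's yaml.safe_load.

```python
from typing import Dict, Tuple

def split_frontmatter(markdown: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (metadata, body); handles flat `key: value` pairs only."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown  # no frontmatter block
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, markdown  # opening delimiter without a closing one
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta, "\n".join(lines[end + 1:])

# Hypothetical export; field names and values are invented for illustration.
example = """---
id: example-contract
term: "2021-2026"
source: olr-direct
---
## Page 1

| Step | Salary |
| --- | --- |
| 1 | 50000 |
"""

meta, body = split_frontmatter(example)
print(meta["source"], len(body.splitlines()))
```

Because tables are kept as pipe-delimited rows, the body can be passed unchanged to any Markdown-aware tool.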
Refresh
The corpus is regenerated by re-running the inventory + scrape + extract + segment + tag pipeline. The footer of the home page shows the build timestamp.
Reproduce locally
git clone https://github.com/joshgreenman1973/nyc-labor-contracts
cd nyc-labor-contracts
python3 -m venv .venv && source .venv/bin/activate
pip install pypdf pdfplumber requests beautifulsoup4 lxml pypdfium2 ocrmac
python scripts/inventory.py
python scripts/scrape.py
python scripts/extract.py
python scripts/segment.py
python scripts/tag.py
python scripts/build_manifest.py
python -m http.server 8000  # then open http://localhost:8000