Methodology

Where crime and darkness overlap

Methodology, data sources, and implementation details for the NYC street-lighting and violent-crime overlay tool.

Contents

Purpose and analytical framing
Data
Defining nighttime
Crime inclusion criteria
Spatial unit of analysis
Processing nighttime lighting data
Calculating crime intensity at the grid level
Threshold selection and overlay logic
Aggregation to administrative geographies
Interactive mapping and technical implementation
Limitations
Data sources

Purpose and analytical framing

This project analyses the spatial relationship between nighttime lighting conditions and nighttime violent crime across New York City to identify places where relatively low ambient lighting and elevated crime overlap. The aim is practical rather than causal: the project takes a descriptive overlay approach. It does not attempt to estimate the causal effect of lighting on crime, nor does it claim that lighting improvements alone would reduce violence in any given location.

Data

The analysis integrates multiple datasets that capture nighttime lighting, violent crime, administrative geography, land use, housing context, and resident-reported lighting failures. The core lighting input comes from NASA's VIIRS Black Marble Level-3 product, VNP46A1, which provides cloud-filtered and radiance-corrected measures of nighttime brightness at roughly 500-meter resolution. These monthly files were compiled across the 2022 to 2024 period and used to create a stable multi-year measure of ambient nighttime lighting across the city. Because detailed public streetlight inventory data were not available at the spatial resolution needed for a citywide block-sensitive analysis, the VIIRS product served as the best available source for consistent large-area measurement of nighttime brightness.

The crime component of the analysis comes from NYPD complaint data covering 2022 through 2024. The analysis focuses on four offense categories that are both serious and plausibly related to nighttime conditions in public space: misdemeanor assault, robbery, felony assault, and homicide. Rape was also considered but excluded for the geocoding reason described under Crime inclusion criteria. Each complaint record includes the timing and location of occurrence, which allows incidents to be filtered both temporally and spatially. Only incidents occurring during nighttime hours and in outdoor settings were retained.

In addition to the two primary analytic datasets, the project incorporated several contextual layers. Land use data from PLUTO were used to assess whether priority areas were concentrated in particular built environments, such as one- and two-family residential areas, mixed-use corridors, industrial areas, or open space. Neighborhood Tabulation Areas, police precincts, community districts, and council districts were included to support aggregation into administratively meaningful geographies. Business Improvement Districts, NYCHA housing developments, Cure Violence program sites, and scaffolding locations were incorporated as additional overlays that may help users interpret local conditions or existing interventions.

The tool also includes 311 complaints from 2022 through 2024 reporting streetlights out. These complaints were geocoded and mapped as a resident-reported signal of lighting problems on the ground. Rather than display every complaint individually, the map highlights locations with twenty or more complaints, identifying areas with a persistent concentration of reported outages. A table with all data sources is available at the end of this methodology.
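The 311 filtering step can be sketched in pandas. This is a minimal illustration rather than the project's code: the column names (`descriptor`, `latitude`, `longitude`), the descriptor value "Street Light Out", and the choice to define a "location" by rounding coordinates to four decimal places are all assumptions made for the example.

```python
import pandas as pd

# Threshold from the methodology: highlight locations with 20+ outage complaints.
MIN_COMPLAINTS = 20

# Hypothetical descriptor value used to identify "streetlight out" reports.
OUTAGE_DESCRIPTOR = "Street Light Out"


def persistent_outage_locations(complaints: pd.DataFrame,
                                min_count: int = MIN_COMPLAINTS) -> pd.DataFrame:
    """Return locations with at least `min_count` streetlight-outage complaints.

    Assumes hypothetical columns 'descriptor', 'latitude', and 'longitude'.
    A "location" is approximated by rounding coordinates to 4 decimal places
    (roughly 10 m); this grouping rule is illustrative only.
    """
    outages = complaints[complaints["descriptor"] == OUTAGE_DESCRIPTOR]
    outages = outages.dropna(subset=["latitude", "longitude"])
    keyed = outages.assign(
        lat_key=outages["latitude"].round(4),
        lon_key=outages["longitude"].round(4),
    )
    counts = (
        keyed.groupby(["lat_key", "lon_key"])
        .size()
        .reset_index(name="n_complaints")
    )
    return counts[counts["n_complaints"] >= min_count].reset_index(drop=True)
```

A coarser or finer rounding (or grouping by street address instead) would change which complaints are pooled into one location, so the grouping key matters as much as the threshold of twenty.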

Defining nighttime

Rather than using a fixed set of clock hours across the calendar year, the analysis uses civil twilight to account for seasonal variation in daylight. For each month from 2022 through 2024, the average times of evening civil twilight end and morning civil twilight begin in New York City were calculated. Civil twilight end, when the sun is 6 degrees below the horizon, marks the point in the evening when natural light has faded enough that artificial lighting becomes operationally relevant. Civil twilight begin marks the point in the morning when daylight returns. Only incidents occurring between evening civil twilight end and the following morning's civil twilight begin were classified as nighttime incidents and retained for analysis.
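Because night spans midnight, an incident counts as nighttime if it occurs after that month's average civil twilight end or before that month's average civil twilight begin. A minimal pandas sketch of that test follows; the twilight lookup table and its column names (`year`, `month`, `dusk_min`, `dawn_min`) are hypothetical, with times expressed as minutes after local midnight.

```python
import pandas as pd


def is_nighttime(occurred: pd.Series, twilight: pd.DataFrame) -> pd.Series:
    """Flag timestamps between evening civil twilight end and morning civil
    twilight begin, using monthly average twilight times.

    `twilight` is a hypothetical table with one row per (year, month) and
    columns 'dusk_min' (civil twilight end) and 'dawn_min' (civil twilight
    begin), both in minutes after local midnight.
    """
    # Integer key like 202301 for January 2023, built the same way on both sides.
    key = occurred.dt.year.astype("int64") * 100 + occurred.dt.month.astype("int64")
    table_key = (twilight["year"].astype("int64") * 100
                 + twilight["month"].astype("int64")).to_numpy()
    dusk = key.map(pd.Series(twilight["dusk_min"].to_numpy(), index=table_key))
    dawn = key.map(pd.Series(twilight["dawn_min"].to_numpy(), index=table_key))

    minute = occurred.dt.hour * 60 + occurred.dt.minute
    # Night wraps past midnight: after dusk OR before dawn. Months missing from
    # the lookup yield NaN, and NaN comparisons are False (treated as daytime).
    return ((minute >= dusk) | (minute < dawn)).rename("is_night")
```

Using each incident's own calendar month means an incident shortly after midnight on the first of a month is compared against that month's averages, which is a simplification alongside the use of monthly averages themselves.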

Crime inclusion criteria

Misdemeanor assault, robbery, felony assault, and homicide were selected because they represent serious violence or high-harm interpersonal offenses that commonly occur in public space. Rape was excluded because these complaints are often geocoded to the police precinct rather than to the location where the incident occurred, which would misplace them on the grid. After restricting incidents to nighttime, the crime dataset was further narrowed to events occurring outdoors, and incidents reported as occurring inside buildings were excluded. The included outdoor location types covered a range of settings such as streets, highways, bridges, tunnels, open lots, parks, playgrounds, cemeteries, parking lots, bus stops, bus terminals, ferry terminals, taxis, marinas, piers, mobile food locations, outdoor mailboxes, and construction sites.
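The offense and location filters can be expressed as set-membership tests on the complaint records. In this pandas sketch the column names (`OFNS_DESC`, `PREM_TYP_DESC`, `LOC_OF_OCCUR_DESC`) are assumed to match the NYPD complaint extract, the exact offense and premise strings are illustrative and should be checked against the values actually present in the data, and the outdoor premise set is only a partial rendering of the settings listed above. Using `LOC_OF_OCCUR_DESC` to drop "inside" records is likewise an assumption about how the exclusion was applied.

```python
import pandas as pd

# Offense descriptions assumed to correspond to the four retained categories.
INCLUDED_OFFENSES = {
    "FELONY ASSAULT",
    "ROBBERY",
    "ASSAULT 3 & RELATED OFFENSES",      # misdemeanor assault (assumed label)
    "MURDER & NON-NEGL. MANSLAUGHTER",   # homicide (assumed label)
}

# Illustrative, partial set of outdoor premise types.
OUTDOOR_PREMISES = {
    "STREET", "HIGHWAY/PARKWAY", "BRIDGE", "TUNNEL", "OPEN AREAS (OPEN LOTS)",
    "PARK/PLAYGROUND", "CEMETERY", "BUS STOP", "BUS TERMINAL", "MARINA/PIER",
    "CONSTRUCTION SITE", "MAILBOX OUTSIDE",
}


def filter_outdoor_violence(complaints: pd.DataFrame) -> pd.DataFrame:
    """Keep retained offenses at outdoor premise types, dropping any record
    whose location of occurrence is reported as inside."""
    offense_ok = complaints["OFNS_DESC"].isin(INCLUDED_OFFENSES)
    premise_ok = complaints["PREM_TYP_DESC"].isin(OUTDOOR_PREMISES)
    # Missing location descriptors are not treated as "inside".
    not_inside = complaints["LOC_OF_OCCUR_DESC"].fillna("").ne("INSIDE")
    return complaints[offense_ok & premise_ok & not_inside]
```

Rape complaints drop out automatically here because they are not in the included offense set.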

Spatial unit of analysis

The spatial backbone of the project is a uniform grid derived from the resolution of the VIIRS satellite product. New York City was divided into 5,559 grid cells, each approximately 500 meters by 500 meters. Each cell's lighting value therefore represents average brightness across several city blocks rather than the condition of any specific lamp, corner, or street segment. Crime incidents were spatially joined to these cells so that each grid could be assigned a total number of nighttime violent crimes over the study period.
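A regular 500-meter lattice can be generated with numpy. This is a simplified sketch: it assumes coordinates are already in a projected CRS measured in meters (for example UTM zone 18N, EPSG:32618) and builds square cells over a bounding box, whereas the project's grid is derived from the VIIRS product's resolution. The function name and column layout are hypothetical. In practice the cells would then be restricted to those intersecting the city boundary.

```python
import numpy as np
import pandas as pd

CELL_SIZE_M = 500.0  # nominal cell edge length, approximating the VIIRS pixel size


def build_grid(xmin: float, ymin: float, xmax: float, ymax: float,
               cell: float = CELL_SIZE_M) -> pd.DataFrame:
    """Return one row per square cell covering the bounding box.

    Cells are numbered row-major from the lower-left corner, and each row
    stores the cell's bounds (x0, y0) to (x1, y1) in projected meters.
    """
    n_cols = int(np.ceil((xmax - xmin) / cell))
    n_rows = int(np.ceil((ymax - ymin) / cell))
    cols, rows = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
    cols, rows = cols.ravel(), rows.ravel()
    return pd.DataFrame({
        "grid_id": rows * n_cols + cols,
        "row": rows,
        "col": cols,
        "x0": xmin + cols * cell,
        "y0": ymin + rows * cell,
        "x1": xmin + (cols + 1) * cell,
        "y1": ymin + (rows + 1) * cell,
    })
```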

Processing nighttime lighting data

For the lighting analysis, monthly VIIRS Black Marble VNP46A1 composites were compiled from 2022 through 2024. These files contain radiance values representing nighttime surface brightness. Observations flagged for cloud cover or low quality were removed before aggregation, and the data were clipped to the New York City boundary so that only relevant grid cells were retained. For each 500-meter grid cell, the monthly average radiance was calculated from all valid nighttime observations. The monthly files were then combined to create a unified dataset containing a stable three-year average lighting estimate for every grid in the city. The final lighting field therefore reflects persistent ambient brightness rather than momentary fluctuation.
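The two-stage averaging (valid observations within each month, then the monthly means across the study period) can be sketched with numpy. The array layout, the boolean quality mask, and the month labels are assumptions for illustration; the actual cloud and quality flags come from the VIIRS product and are not reproduced here.

```python
import warnings

import numpy as np


def multi_year_mean_radiance(radiance: np.ndarray, bad: np.ndarray,
                             month_of_obs: np.ndarray) -> np.ndarray:
    """Average radiance per cell: first within each month, then across months.

    radiance: (n_obs, rows, cols) radiance observations.
    bad: boolean array of the same shape, True where an observation is
         flagged for cloud cover or low quality (flagging rule assumed).
    month_of_obs: (n_obs,) label identifying the month of each observation.
    Cells with no valid observation in any month come back as NaN.
    """
    # Flagged observations become NaN so nanmean skips them.
    clean = np.where(bad, np.nan, radiance.astype(float))
    months = np.unique(month_of_obs)
    with warnings.catch_warnings():
        # All-NaN slices (no valid data) would otherwise warn "Mean of empty slice".
        warnings.simplefilter("ignore", category=RuntimeWarning)
        monthly = np.stack([
            np.nanmean(clean[month_of_obs == m], axis=0) for m in months
        ])
        # Averaging monthly means gives every month equal weight,
        # regardless of how many valid observations it contributed.
        return np.nanmean(monthly, axis=0)
```

Weighting months equally is one reasonable reading of "combining the monthly files"; pooling all valid observations instead would let months with clearer skies dominate.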

Calculating crime intensity at the grid level

After temporal and spatial filtering, crime incidents were assigned to grid cells through a spatial join. For each cell, the total number of nighttime violent crimes across the full 2022 to 2024 period was calculated.
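For a regular square grid, finding the cell that contains each incident reduces to floor division on projected coordinates, a lightweight stand-in for the geopandas spatial join used in the project. The sketch assumes metric coordinates and a hypothetical grid described by its origin, dimensions, and cell size, numbered row-major from the lower-left corner. It matches a point-in-polygon join for points in cell interiors; a point on a shared edge is assigned to exactly one cell (the one above or to the right). Cells with no incidents are kept with a count of zero.

```python
import numpy as np
import pandas as pd


def crime_counts_per_cell(x, y, xmin: float, ymin: float,
                          n_cols: int, n_rows: int, cell: float = 500.0) -> pd.Series:
    """Count incidents per cell of a regular square grid (row-major ids)."""
    col = np.floor((np.asarray(x, dtype=float) - xmin) / cell).astype(int)
    row = np.floor((np.asarray(y, dtype=float) - ymin) / cell).astype(int)
    # Drop incidents that fall outside the grid extent.
    inside = (col >= 0) & (col < n_cols) & (row >= 0) & (row < n_rows)
    grid_id = row[inside] * n_cols + col[inside]
    counts = pd.Series(grid_id).value_counts()
    # Reindex so every cell appears, with zero where no incidents occurred.
    return counts.reindex(np.arange(n_rows * n_cols), fill_value=0).rename("crime_count")
```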

Threshold selection and overlay logic

The project classifies a grid as low lighting if its average radiance falls below the 50th percentile (the median) of citywide lighting, and as high crime if its nighttime violent crime count falls above the 80th percentile. The lighting threshold reflects a deliberate decision to cast a broader net. Restricting low lighting to only the bottom 20 percent of grids would have yielded too few cases for meaningful citywide analysis. Using the median instead identifies places that are dimmer than at least half of the city and therefore may still warrant attention from an infrastructure perspective.

The 80th percentile cutoff for crime, which isolates the top fifth of grids by nighttime violent crime, is grounded in the well-established empirical finding that crime is disproportionately concentrated in a relatively small number of micro-places. This follows the general logic of the 80/20 rule and long-standing criminological research on spatial concentration. Priority grids are then defined as those that satisfy both conditions simultaneously.
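The threshold and overlay logic amounts to two quantile cutoffs and a logical AND. In this pandas sketch the column names `radiance` and `crime_count` are assumed, and treating a grid exactly at a cutoff as low lighting or high crime (inclusive comparisons) is an illustrative choice. When many grids share the same low crime counts, that choice can change how many grids qualify.

```python
import pandas as pd


def flag_priority(grid: pd.DataFrame, light_pct: float = 0.50,
                  crime_pct: float = 0.80) -> pd.DataFrame:
    """Flag grids that are both low lighting and high crime.

    Low lighting: radiance at or below the `light_pct` quantile of all grids.
    High crime: crime count at or above the `crime_pct` quantile of all grids.
    """
    light_cut = grid["radiance"].quantile(light_pct)
    crime_cut = grid["crime_count"].quantile(crime_pct)
    out = grid.copy()
    out["low_light"] = out["radiance"] <= light_cut
    out["high_crime"] = out["crime_count"] >= crime_cut
    # Priority grids satisfy both conditions at once.
    out["priority"] = out["low_light"] & out["high_crime"]
    return out
```

pandas interpolates linearly between observations by default when a quantile falls between two grid values, so the cutoff need not equal any single grid's value.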

Aggregation to administrative geographies

Although the grid is the core analytic unit, the project also aggregates results to larger geographies so that findings can be translated into forms more useful for agency planning and policy discussion. Grid-level results were summarized to Neighborhood Tabulation Areas, police precincts, community districts, and council districts using spatial joins. For each geography, the analysis reports both the number of priority grids and the total number of nighttime violent crimes contained within those grids.
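Once each grid has been assigned to an administrative area, the summaries reduce to a grouped count and sum. This pandas sketch assumes a hypothetical grid-to-area crosswalk with exactly one area per grid (for example, the area containing the grid's centroid); that rule for grids straddling a boundary is an assumption of the sketch. Areas with no priority grids are kept with zeros.

```python
import pandas as pd


def summarize_by_geography(grid: pd.DataFrame, crosswalk: pd.DataFrame,
                           geo_col: str) -> pd.DataFrame:
    """Per area: number of priority grids and nighttime violent crimes in them.

    `grid` needs 'grid_id', 'crime_count', and a boolean 'priority' column;
    `crosswalk` is a hypothetical grid_id -> area lookup with column `geo_col`.
    """
    merged = grid.merge(crosswalk[["grid_id", geo_col]], on="grid_id", how="inner")
    # Count crimes only in priority grids; non-priority grids contribute zero.
    merged = merged.assign(
        priority_crimes=merged["crime_count"].where(merged["priority"], 0)
    )
    return (
        merged.groupby(geo_col)
        .agg(priority_grids=("priority", "sum"),
             priority_crimes=("priority_crimes", "sum"))
        .reset_index()
    )
```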

Interactive mapping and technical implementation

The interactive dashboard is built using Mapbox GL JS and Mapbox Studio tilesets. The grid polygons, crime points, and boundary layers are hosted as vector tilesets, while interactivity is handled in the browser by Mapbox GL JS. Python, including pandas, geopandas, and numpy, was used for data cleaning, grid construction, spatial joins, and percentile calculations. The interface is organized into logical layer groups that separate core grid-based outputs from contextual and boundary layers. This structure keeps the default view focused on the main analytical output while allowing deeper exploration as needed.

The dashboard includes dynamic sliders for both the lighting and crime thresholds. The lighting slider sets the percentile below which a grid is considered low lighting, and the crime slider sets the percentile above which a grid is considered high crime. Both controls are linked to the overlay layer through a filter expression that updates in real time. The tool also displays the radiance value and crime count corresponding to the selected percentiles, which makes the thresholds interpretable for technical and nontechnical users alike.
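One way to support the sliders is to precompute the data value at each percentile in Python and express the overlay as a filter in the Mapbox GL style-specification expression syntax. The sketch below is illustrative: the tile property names `radiance` and `crime_count`, the 5-point slider step, and the inclusive comparisons are assumptions. In the browser, an expression of this shape would be applied to the overlay layer with Mapbox GL JS's `map.setFilter`.

```python
import json

import numpy as np


def percentile_lookup(values, step: int = 5) -> dict:
    """Map slider positions (percentiles) to the underlying data values."""
    arr = np.asarray(values, dtype=float)
    return {int(p): float(np.percentile(arr, p)) for p in range(0, 101, step)}


def overlay_filter(light_value: float, crime_value: float) -> list:
    """Build a style-spec filter selecting low-light, high-crime grids.

    Property names 'radiance' and 'crime_count' are assumed tile attributes.
    """
    return [
        "all",
        ["<=", ["get", "radiance"], light_value],
        [">=", ["get", "crime_count"], crime_value],
    ]


if __name__ == "__main__":
    light = percentile_lookup([0.5, 1.2, 2.0, 3.1, 4.8, 7.5])
    crime = percentile_lookup([0, 0, 1, 3, 8, 15])
    # Serialized for the front end, e.g. for the 50th/80th percentile defaults.
    print(json.dumps(overlay_filter(light[50], crime[80])))
```

Precomputing the lookup keeps the browser from recalculating percentiles on every slider move and gives the tool the values it displays next to each slider.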

Hover tooltips provide core information for each grid, including the grid ID, total crime count, average lighting value, and Moran's I category where applicable. Boundary tooltips display the area name. A hover-priority system ensures that the most relevant layer is surfaced when multiple layers overlap. Layer toggles, collapsible control groups, and a dynamic legend were all included to reduce clutter and make the interface easier to navigate.

Limitations

First, the analysis is descriptive and cannot support causal claims. The fact that low lighting and high crime co-occur in some places does not mean that low lighting caused those crimes, nor does it mean that improving lighting alone would reduce future violence. Many other social, environmental, and institutional factors shape crime patterns.

Second, the lighting data operate at a 500-meter spatial resolution. This is appropriate for citywide comparative analysis, but it is not precise enough to identify individual malfunctioning fixtures, midblock dark spots, or uneven illumination within a single grid cell.

Third, the crime data reflect reported incidents only and may undercount violence in some places. The indoor and outdoor classification also depends on reporting accuracy.

Fourth, some contextual layers are time-bound and may not fully reflect current conditions. For example, the Cure Violence data in the source material extend only through 2021.

Finally, the absence of detailed public streetlight infrastructure data means the project cannot directly assess fixture-level conditions or maintenance patterns.

For all of these reasons, the tool should be understood as a screening and prioritization mechanism to be followed by field assessment, not as a stand-alone basis for capital allocation.

Data sources

Data	Origin	Scope	Data type	Notes
NYPD Crime Complaints	NYC Open Data	2022–2024	Geocoded	Felony assault, robbery, homicide and misdemeanor assault
Satellite Imaging	NASA Earth Data	2022–2024	Level-3 image files	VIIRS Black Marble VNP46A1
Land Use Data (PLUTO)	Primary Land Use Tax Lot Output	2025	Geocoded	Parcel-level zoning
Neighborhood Tabulation Areas (NTAs)	NYC Dept. of City Planning	2020	Geocoded	Neighborhood aggregation
Police Precincts	NYC Open Data	2025	Geocoded	—
Community Districts	NYC Open Data	2025	Geocoded	—
Council Districts	NYC DCP Mapping Portal	2025	Geocoded	—
Business Improvement Districts (BIDs)	NYC Open Data	2025	Geocoded	—
Cure Violence Program	NYC Council	2012–2021	Geocoded	Community safety catchments
Scaffolding Sites	NYC Open Data	Current	Geocoded	Contextual overlay
NYCHA Housing Developments	NYC Open Data	2025	Polygon shapefile	—
311 Streetlight Outage Complaints	NYC Open Data	2022–2024	Geocoded	Locations with 20 or more complaints