I noticed some funny stuff about the data, but I haven't tracked down the source. I'd expect the entry/exit counts to increase over the course of the day, and they generally do. Sometimes, though, the latest record doesn't have the greatest number. I suspect it either odd cutoff times, i.e., not aligned to the day.
If anyone wants to play with the data, here's some stuff to start with PostgreSQL:
create table stats (
ca varchar(10),
unit varchar(10),
scp varchar(10),
dt timestamp,
"desc" varchar(20),
entries integer,
exits integer
);
copy stats from '<path>/output.csv' delimiter ',' csv header;
Here's query that will show the entire set of exit counts align with both their greatest and latest values:
select
unit, scp, dt, exits,
max(exits) over (
partition by unit, scp, date_trunc('day', dt)
rows between unbounded preceding and unbounded following
) as largest_exits,
last_value(exits) over (
partition by unit, scp, date_trunc('day', dt)
order by dt
rows between unbounded preceding and unbounded following
) as latest_exits
from stats
order by 1, 2, 3;
If you want to see the discrepancies I described above, just wrap it up and find where latest <> greatest:
with x as (
select
unit, scp, dt, exits,
max(exits) over (
partition by unit, scp, date_trunc('day', dt)
rows between unbounded preceding and unbounded following
) as largest_exits,
last_value(exits) over (
partition by unit, scp, date_trunc('day', dt)
order by dt
rows between unbounded preceding and unbounded following
) as latest_exits
from stats
)
select *
from x
where largest_exits <> latest_exits
order by unit, scp, dt;
I'm a bit surprised to see Penn Station below Grand Central.
Penn Station is the most trafficked train station in North America[0], which I would imagine would lead to more subway entrances/exits, especially during rush hours.
Also, Penn Station has the A,C,E, 1, 2, and 3. Grand Central only has the 4/5/6[1]. The 4/5/6 are the only lines on the east side and are therefore fairly busy, but I find this surprising nonetheless.
[0] This includes non-subway trains,
[1] Don't even get me started about the T (ie, the Second Ave. Line)! :)
Not every commuter takes the subway to and from Penn Station. The surrounding area has a lot of offices, so many people take light rail there and then just walk to and from work.
As others mention, Grand Central has the S and 7 (crosstown) which connect to Times Square.
Grand Central gets 800,000 visitors per day, getting many commuters from Long Island and Connecticut.
Grand Central is also a much bigger stopping point. It's near many more office buildings. Penn Station is in an urban area, and next to the Garden, but I don't think it's quite as dense. Many folks would exit at Times Square.
The 4/5/6 line is the busiest line in the country. It gets more traffic than the entirety of Washington's, Chicago's, Boston's and San Francisco's lines.
If anyone wants to play with the data, here's some stuff to start with PostgreSQL:
Here's query that will show the entire set of exit counts align with both their greatest and latest values: If you want to see the discrepancies I described above, just wrap it up and find where latest <> greatest: