Hacker News new | past | comments | ask | show | jobs | submit login
Crunching subway data – a New Yorker’s busiest stations (alastair.is)
43 points by danso on June 5, 2013 | hide | past | favorite | 8 comments



I noticed some funny stuff about the data, but I haven't tracked down the source. I'd expect the entry/exit counts to increase over the course of the day, and they generally do. Sometimes, though, the latest record doesn't have the greatest number. I suspect it either odd cutoff times, i.e., not aligned to the day.

If anyone wants to play with the data, here's some stuff to start with PostgreSQL:

  create table stats (
      ca varchar(10),
      unit varchar(10),
      scp varchar(10),
      dt timestamp,
      "desc" varchar(20),
      entries integer,
      exits integer
  );

  copy stats from '<path>/output.csv' delimiter ',' csv header;
Here's query that will show the entire set of exit counts align with both their greatest and latest values:

  select
      unit, scp, dt, exits,
      max(exits) over (
          partition by unit, scp, date_trunc('day', dt)
          rows between unbounded preceding and unbounded following
      ) as largest_exits,
      last_value(exits) over (
          partition by unit, scp, date_trunc('day', dt)
          order by dt 
          rows between unbounded preceding and unbounded following
      ) as latest_exits
  from stats
  order by 1, 2, 3;
If you want to see the discrepancies I described above, just wrap it up and find where latest <> greatest:

  with x as (
  select
      unit, scp, dt, exits,
      max(exits) over (
          partition by unit, scp, date_trunc('day', dt)
          rows between unbounded preceding and unbounded following
      ) as largest_exits,
      last_value(exits) over (
          partition by unit, scp, date_trunc('day', dt)
          order by dt 
          rows between unbounded preceding and unbounded following
      ) as latest_exits
  from stats
  )
  select *
  from x
  where largest_exits <> latest_exits
  order by unit, scp, dt;


I'm a bit surprised to see Penn Station below Grand Central.

Penn Station is the most trafficked train station in North America[0], which I would imagine would lead to more subway entrances/exits, especially during rush hours.

Also, Penn Station has the A,C,E, 1, 2, and 3. Grand Central only has the 4/5/6[1]. The 4/5/6 are the only lines on the east side and are therefore fairly busy, but I find this surprising nonetheless.

[0] This includes non-subway trains, [1] Don't even get me started about the T (ie, the Second Ave. Line)! :)


Not every commuter takes the subway to and from Penn Station. The surrounding area has a lot of offices, so many people take light rail there and then just walk to and from work.


NJ Transit and the LIRR are heavy rail (as are the subway and the PATH train).


As others mention, Grand Central has the S and 7 (crosstown) which connect to Times Square.

Grand Central gets 800,000 visitors per day, getting many commuters from Long Island and Connecticut.

Grand Central is also a much bigger stopping point. It's near many more office buildings. Penn Station is in an urban area, and next to the Garden, but I don't think it's quite as dense. Many folks would exit at Times Square.

edit: And OP - Thank you for sharing the data!


Grand Central also has the 7 and S.


The 4/5/6 line is the busiest line in the country. It gets more traffic than the entirety of Washington's, Chicago's, Boston's and San Francisco's lines.


I didn't know this data was available! I wish we had the same breadth available through the MBTA. I'd be interested in using this data to:

- plan travel directions based on past congestion patterns

- pair this with any data found for NY's bus system or taxi system to map out hubs vs destination stations

- examine stations' frequency of repair and see if congestion at a station correlates with the frequency of repair; try to predict dates of repair

very cool!




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: