Can we predict ridership?

To predict daily ridership (number of entries) per station as accurately as possible, it is crucial to choose the right features and a sound method for model selection. Intuitively, we know that the recent past goes a long way toward predicting the near future. We augmented that insight with data about weather patterns and sporting events, in hopes of capturing their variability in our models and determining the incremental benefit each of these provides. Python’s toolset allows us to easily fit and tune multiple regression models. This lets us select, for each T station, the model that best predicts daily entries when evaluated against held-out test data.
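To make the idea concrete, here is a minimal sketch of that workflow for a single station. It is not the notebook's code: the data is synthetic, the function names (`make_synthetic_station`, `build_features`, `select_model`) are hypothetical, and closed-form ridge regression in NumPy stands in for whatever regression models the full analysis actually uses. It builds lagged-entry features, optionally adds assumed rain and game-day flags, and keeps the candidate with the lowest error on the most recent held-out days.

```python
import numpy as np

def make_synthetic_station(n_days=400, seed=0):
    """Synthetic daily entries for one hypothetical station (not real ridership data)."""
    rng = np.random.default_rng(seed)
    rain = rng.random(n_days) < 0.3                  # assumed weather flag
    game = rng.random(n_days) < 0.1                  # assumed sporting-event flag
    weekday_bump = 1000 * (np.arange(n_days) % 7 < 5)
    entries = (3000 + weekday_bump - 400 * rain + 800 * game
               + rng.normal(0, 100, n_days))
    return entries, rain.astype(float), game.astype(float)

def build_features(entries, rain, game, use_extras):
    """Predict day t from entries on days t-1 and t-7, optionally plus day-t flags."""
    t = np.arange(7, len(entries))
    cols = [np.ones(len(t)), entries[t - 1], entries[t - 7]]
    if use_extras:
        cols += [rain[t], game[t]]
    return np.column_stack(cols), entries[t]

def standardize(X_train, X_test):
    """Scale non-intercept columns using training-set statistics only."""
    mean = X_train[:, 1:].mean(axis=0)
    std = X_train[:, 1:].std(axis=0)
    X_train, X_test = X_train.copy(), X_test.copy()
    X_train[:, 1:] = (X_train[:, 1:] - mean) / std
    X_test[:, 1:] = (X_test[:, 1:] - mean) / std
    return X_train, X_test

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; the intercept (column 0) is not penalized."""
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def rmse(X, y, w):
    """Root-mean-squared error of the linear prediction X @ w."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

def select_model(entries, rain, game, alphas=(0.0, 1.0, 100.0), test_days=60):
    """Score every (feature set, alpha) pair on the last `test_days` days."""
    results = {}
    for use_extras in (False, True):
        X, y = build_features(entries, rain, game, use_extras)
        # Time-ordered split: train on the past, test on the most recent days.
        X_train, X_test = standardize(X[:-test_days], X[-test_days:])
        y_train, y_test = y[:-test_days], y[-test_days:]
        for alpha in alphas:
            w = fit_ridge(X_train, y_train, alpha)
            results[(use_extras, alpha)] = rmse(X_test, y_test, w)
    best = min(results, key=results.get)
    return best, results

if __name__ == "__main__":
    entries, rain, game = make_synthetic_station()
    best, results = select_model(entries, rain, game)
    for (use_extras, alpha), err in sorted(results.items()):
        print(f"extras={use_extras!s:5} alpha={alpha:6.1f} held-out RMSE={err:8.1f}")
    print("selected:", best)
```

Comparing the held-out error with and without the extra flags is one simple way to gauge the incremental benefit of the weather and event data. Repeating the selection per station would give each station its own best model.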


Below is an embedded IPython Notebook. Please download it or scroll through it for the detailed analysis.