Citibike: Analyze yourself!

In the 5 years that I have lived in NYC, the biggest improvement anything’s made to my quality of life has been Citibike. I like being on wheels, I like being energy efficient, I like the wind in my face, I like my freedom of commute (a regular bike binds itself to you), and I like that it saves my trip data! I like to think I am a Fastridahhh, but we are not always what we think we are. So by baselining myself against this analysis, I wanted to see where I actually truly lie.

The task was simple, get the lat/long of my start and end citibike stations, use Google Distance Matrix API to figure out the biking distance between the two locations, use the time from my citibike data, and get my average speed and see how it changes with distance etc.

Once the stew was cooked, the results were fairly interesting.
My average speed was 13.7 kmph (8.5 mph), but I have done no outlier removal or anything because lazy. The median was pretty much exactly the same as average (13.56 kmph), indicating an even distro. This was a little disappointing frankly, I was only slightly faster than the overall average according to the wider analysis (8.3 mph) and my self image rightly got punched back into its place. I can make up excuses for why it is slower than I feel I ride at, but that is just being dishonest (I have already thought of three, including shifting the blame on someone else).

Most of my trips are short distances that are annoying to walk (1-2 km), with nearly 50% falling in that range, the rest are equally distributed across the other buckets.

Distance Range # Trips Average time (min) Average Speed
0 to 0.5K 8% 1.53 8.24
0.5K to 1K 21% 3.87 11.90
1K to 2K 48% 5.89 13.09
2K to 5K 22% 11.31 14.15
Greater than 5K 1% 16.12 18.84
Grand Total 100% 6.38 13.36

I also wanted to look at how speed varies with distance, as you would expect it to be highly correlated (they were, with a correlation coefficient of 0.4). But I also created a polynomial best fit line with an order of two (anything higher is just overfitting and the data seems to follow a parabolic pattern rather than linear). The R-squared sits nicely (for such unclean data with so few datapoints, especially for higher distances) at 0.265, which vindicates my hunch about it being a parabolic curve. Here’s the pretty chart for your pretty eyes to peruse:

Speed v Distance

If you want to do it too:
Please do, and let me know what your results look like.

Getting the lat/longs:
Surprisingly, Citibike doesn’t have any place that you can get the lat/longs of stations from unless you distill it from their trip data. So I did that. And here’s the file for all you guys (updated till March 2017, includes Jersey City citibike locations)
Citibike Stations Lat Long

Using the Google Distance Matrix API:
I prefer Excel for small analysis, so wanted to write a VBA code that would get me the distance between two lat/long pairs. Here’s what I made, with a little help from my friends over at StackOverflow given that I don’t really know VBA. You can create a new module, paste this code there save it, and then use the function called getDist to get the distance between the lat/long of the locations (which you’ll need to vlookup from the station lat/long map that I shared earlier)

(WordPress is being a prick and not letting me format this properly, so please insert tabs at appropriate places)

Public Function getDist(orig_lat As Double, orig_Double As Double, end_lat As Double, end_long As Double) As String

Dim xhrRequest As XMLHTTP60
Dim domDoc As DOMDocument60
Dim domDoc2 As DOMDocument60
Dim nodes As IXMLDOMNodeList
Dim node As IXMLDOMNode

'You must acquire a google api key and enter it here
Dim googleKey As String
googleKey = "XXX" 'your api key here

'Send a "GET" request for place/textsearch
Set xhrRequest = New XMLHTTP60

Dim requrl As String
requrl = "" & orig_lat & "," & orig_Double & _
"&destinations=" & end_lat & "," & end_long & "&mode=bicycling" & "&key=" & googleKey

xhrRequest.Open "GET", requrl

'Save the response into a document
Set domDoc = New DOMDocument60
domDoc.LoadXML xhrRequest.responseText

'Find the first node that is called "distance" and is the child of the "result" node. The SelectSingleNode will pick the first result, as google can return multiple options
Set ixnlDistanceNodes = domDoc.SelectSingleNode("//distance/value")

If Not ixnlDistanceNodes Is Nothing Then
getDist = ixnlDistanceNodes.Text
getDist = -1
End If

Set domDoc = Nothing
Set xhrRequest = Nothing

End Function