Historical realtime data for a specific route
Hello Trafiklab team,
I would like to plot the history (going back 2-3 years) of scheduled vs. actual departures for a few (around 5) selected routes.
After reading the description, I decided to use the KoDA database, as it is the one that contains historical real-time data, while GTFS only has the data at the moment it is queried.
Example:
url = f"https://api.koda.trafiklab.se/KoDa/api/v2/gtfs-static/{COMPANY}?date={DATE}&key={KEY}"
I used KoDA static to get route_id, trip_id, stop_id and associated planned arrival/departure times.
To get the real departure times, I used the KoDA real-time data:
url = f"https://api.koda.trafiklab.se/KoDa/api/v2/gtfs-rt/{OPERATOR}/{FEED}?date={DATE}&key={KEY}&hour={TIME}", where TIME is an hour in the range 0-23 and FEED = "TripUpdates"
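A minimal sketch of how the real-time URL above could be assembled so that no stray spaces end up in the query string. It only uses the endpoint and the date, key and hour parameters quoted in this post; the function name koda_rt_url, the operator value "sl", the date format and the key in the usage line are placeholders of my own, not confirmed values.

```python
from urllib.parse import urlencode

# Base path as quoted in this post.
BASE = "https://api.koda.trafiklab.se/KoDa/api/v2"


def koda_rt_url(operator, feed, date, key, hour=None):
    """Build a KoDA gtfs-rt URL (hypothetical helper, not part of any SDK)."""
    params = {"date": date}
    if hour is not None:
        params["hour"] = hour  # 0-23, as described above
    params["key"] = key
    # urlencode joins the pairs with "&" and escapes them, so no spaces remain.
    return f"{BASE}/gtfs-rt/{operator}/{feed}?{urlencode(params)}"


# Placeholder values for illustration only.
url = koda_rt_url("sl", "TripUpdates", "2022-03-01", "MY_KEY", hour=7)
```

Building the query string with urlencode rather than by hand avoids the " & hour=" spacing problem, which could otherwise cause a parameter to be ignored.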
Even for a single operator and day, this query results in very large unzipped files, as the data contains multiple updates per hour for all stations and routes.
I tried adding &route_id={ROUTE_ID} to the URL, but it didn't seem to work.
My questions are:
0) Is KoDA the optimal dataset to achieve my goal?
1) Is it possible to limit the queried data by only including specific stops and trip ids? It didn't seem to work for me.
2) Is it possible to query just the last update for each hour (as it should be the closest to the real departure) instead of all the available data?
Thank you for your attention.
KoDA is the right (and only) source for this kind of data. KoDA archives all realtime data, which for delay information means a snapshot is taken every 15 seconds. This way, you can see exactly how much delay a trip had at a given time.
You need to download the complete dataset for a given day; then you can process it and filter out the data you are interested in for your use case. In your case you can choose to read only one realtime file for each hour, but since passed points are not kept in the realtime data for more than 10 minutes, you risk not having the actual passing time at all points.
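As a rough illustration of the filtering step described above, the sketch below keeps, for each trip and stop, only the most recent snapshot in which that stop still appears, which should be the value closest to the actual passing time. It assumes the TripUpdates files have already been decoded into plain dicts; the record layout (trip_id, stop_id, snapshot_time, departure_time) and the function name are hypothetical, and decoding the downloaded files is not shown.

```python
def last_observation_per_stop(records, wanted_trip_ids):
    """Keep the latest snapshot for every (trip_id, stop_id) pair.

    records: iterable of dicts with the assumed keys trip_id, stop_id,
             snapshot_time and departure_time (both Unix seconds).
    wanted_trip_ids: set of trip_ids on the routes of interest,
             e.g. looked up from the static data for the same day.
    """
    latest = {}
    for rec in records:
        if rec["trip_id"] not in wanted_trip_ids:
            continue  # drop trips on routes we do not care about
        key = (rec["trip_id"], rec["stop_id"])
        # A later snapshot replaces an earlier one for the same stop.
        if key not in latest or rec["snapshot_time"] >= latest[key]["snapshot_time"]:
            latest[key] = rec
    return latest


# Made-up sample records, for illustration only.
sample = [
    {"trip_id": "t1", "stop_id": "s1", "snapshot_time": 100, "departure_time": 160},
    {"trip_id": "t1", "stop_id": "s1", "snapshot_time": 115, "departure_time": 175},
    {"trip_id": "t2", "stop_id": "s1", "snapshot_time": 115, "departure_time": 300},
]
result = last_observation_per_stop(sample, {"t1"})
```

Processing every snapshot this way, rather than one file per hour, avoids losing stops that drop out of the feed within 10 minutes of being passed.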
Regards,
Bert
Thanks for the information.
As mentioned, my aim is to get some delay statistics for a few route_ids over the past 1-2 years.
When I tried to download the regional realtime traffic data from KoDA gtfs-rt for a certain day, the query took 1 hour before it was ready for download. Do I need to run a query for each day, or can I query multiple days at the same time?
Best regards