Potentially missing data in GTFS Sverige 2
Hello,
I am working for an airline, and since we use trains in Sweden a lot to move our crew around, I'm working on transforming your data into SSIM format. This is the only format our crew planning system accepts.
While doing some spot checking to see if I got the correct data, I noticed that there were far fewer connections than I would have anticipated.
I tried doing some statistics on the dataset, but since this is my first time working with GTFS data, I might have overlooked something. What I found surprised me: on June 15, I counted 63 departures on sj.se between Malmö C and Copenhagen Airport, but in the data, I am only able to find a fraction of that number.
Can you please verify that my approach is sound, and if it is, try to have a look at the data to see if my findings are correct?
My approach was:
- Filter stop_times.txt to identify trip_id where the stop_sequence for stop_id 740000003 (Malmö C) was lower than the stop_sequence for stop_id 860000858 (Copenhagen Airport).
- In trips.txt, find the service_id for the identified trip_ids.
- In calendar_dates.txt, identify the dates associated with the said service_id.
- Filter for departures between midnight and midnight to capture all departures on June 15, even if the trip started on June 14.
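As a rough sketch, the four steps above could look like this in plain Python. It assumes the standard GTFS column names, that service days are listed only in calendar_dates.txt with exception_type 1 meaning "service runs", and that each trip calls at each stop at most once. The function names are made up for illustration. Departure times are treated as offsets from midnight of the service day, so a time such as 24:10:00 on a June 14 service day counts as 00:10 on June 15.

```python
import csv
from datetime import datetime, timedelta

def read_gtfs(path):
    """Load one GTFS text file into a list of dicts (column name -> string)."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))

def parse_gtfs_time(text):
    """GTFS times are HH:MM:SS and may exceed 24:00:00 after midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)

def direct_departures(stop_times, trips, calendar_dates, origin, dest, day):
    """Return sorted (departure datetime, trip_id) pairs for direct trips
    from origin to dest that leave origin on the calendar date `day`."""
    # Step 1: keep trips where origin has a lower stop_sequence than dest.
    calls = {}  # trip_id -> {stop_id: (stop_sequence, departure_time)}
    for row in stop_times:
        if row["stop_id"] in (origin, dest):
            calls.setdefault(row["trip_id"], {})[row["stop_id"]] = (
                int(row["stop_sequence"]), row["departure_time"])
    candidates = {
        trip_id: stops[origin][1]
        for trip_id, stops in calls.items()
        if origin in stops and dest in stops
        and stops[origin][0] < stops[dest][0]
    }
    # Step 2: look up the service_id of each trip.
    service_of = {row["trip_id"]: row["service_id"] for row in trips}
    # Step 3: collect the service days of each service_id.
    days_of = {}
    for row in calendar_dates:
        if row["exception_type"] == "1":
            days_of.setdefault(row["service_id"], []).append(
                datetime.strptime(row["date"], "%Y%m%d"))
    # Step 4: keep departures that fall between midnight and midnight.
    start = datetime.combine(day, datetime.min.time())
    end = start + timedelta(days=1)
    result = []
    for trip_id, departure in candidates.items():
        for service_day in days_of.get(service_of.get(trip_id), []):
            when = service_day + parse_gtfs_time(departure)
            if start <= when < end:
                result.append((when, trip_id))
    return sorted(result)
```

With the feed unpacked into the working directory, this would be called as `direct_departures(read_gtfs("stop_times.txt"), read_gtfs("trips.txt"), read_gtfs("calendar_dates.txt"), "740000003", "860000858", day)`, where `day` is a `datetime.date` for the day being checked.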
Best regards,
Claus
Your approach seems to be correct, but I can only find a few direct connections on sj.se when searching for departures on June 15. Most connections have a transfer in Hyllie, meaning the journey consists of two different trips. On the 14th, traffic seems to be as usual, with a lot of direct trains.
Regards,
Bert
Thanks a lot for taking the time to look into this and to answer me.
I see, that makes a lot of sense. If I wanted to include trips like these, with let's say a maximum of one change of transportation, but I don't know where the change will take place, is there a brilliant way to find these connections? Or is an exhaustive brute force on the data necessary?
Thanks again.
Claus
This would require implementing a route-planning algorithm, which can be simplified if you are only interested in routes between the two given stations with at most one transfer.
If you want all routes from A to B with exactly one transfer, you could proceed as follows: look at the trips departing from A and, for each of them, keep all the stops it serves after A in memory. Then look at all departures from these stops that leave after the first trip has arrived and there has been enough time for a transfer, and check whether B is among the later stops of these possible second trips.
If you know there will be only one change, you can optimize this process by only ever checking trips which include either the origin or the destination in their stops.
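A minimal sketch of this search in plain Python could look like the following. It assumes `stop_times` has already been filtered to trips that run on the day of interest, that each trip calls at each stop at most once, and that 300 seconds is an acceptable minimum transfer time; that default and the function names are placeholders, not values taken from the feed. Only trips that serve the origin or the destination are ever considered.

```python
from collections import defaultdict

def to_seconds(text):
    """Convert a GTFS HH:MM:SS time (may exceed 24:00:00) to seconds."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s

def one_transfer_journeys(stop_times, origin, dest, min_transfer=300):
    """Return sorted tuples (departure at origin, first trip_id,
    transfer stop_id, second trip_id, arrival at dest), times in seconds."""
    # Group the calls of each trip and order them by stop_sequence.
    trips = defaultdict(list)
    for row in stop_times:
        trips[row["trip_id"]].append((
            int(row["stop_sequence"]), row["stop_id"],
            to_seconds(row["arrival_time"]),
            to_seconds(row["departure_time"])))
    for calls in trips.values():
        calls.sort()

    # Second legs: for every stop a trip serves before dest, remember
    # when it leaves that stop and when it reaches dest.
    second_legs = defaultdict(list)  # stop_id -> [(departure, trip_id, arrival)]
    for trip_id, calls in trips.items():
        stops = [call[1] for call in calls]
        if dest not in stops:
            continue
        d = stops.index(dest)
        for _, stop_id, _, departure in calls[:d]:
            second_legs[stop_id].append((departure, trip_id, calls[d][2]))

    # First legs: trips leaving origin, matched against second legs at
    # every stop they serve after origin.
    journeys = []
    for trip_id, calls in trips.items():
        stops = [call[1] for call in calls]
        if origin not in stops:
            continue
        o = stops.index(origin)
        if dest in stops[o + 1:]:
            continue  # direct trip, no transfer needed
        for _, stop_id, arrival, _ in calls[o + 1:]:
            for departure, next_trip, arrival_at_dest in second_legs.get(stop_id, []):
                if next_trip != trip_id and departure >= arrival + min_transfer:
                    journeys.append((calls[o][3], trip_id, stop_id,
                                     next_trip, arrival_at_dest))
    return sorted(journeys)
```

Every feasible combination is returned, so one first trip can appear several times with different second trips. Keeping only the earliest arrival per first trip would be a natural next filter.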
Regards,
Bert