Rikshallplats data quality between Stop data and Trafiklab Realtime APIs
Hi,
I am trying to build a solution for stops that combines data from two of your APIs: NeTEx Datasets > Stops data and Trafiklab Realtime APIs > Trafiklab stop lookup. The particular use case is Samtrafiken ticket data.
I chose to map the Trafiklab stop lookup data against Samtrafiken, since that is the level at which passengers know the stations, and Samtrafiken provides its data by the national stop identifier (rikshållplats) number. I would then link down to the Stops data only by mapping the Trafiklab stop lookup JSON field stops > code to Stops data > PrivateCode.
In doing so, I found mismatching data between the APIs. Could you please verify whether this is a data quality issue on Trafiklab's side, or whether I am making false assumptions when interpreting the data?
The data is attached in the Excel file "Trafiklab_data_differences_20260209.xlsx". It contains only the faulty lines, with a comment on the probable mismatch in column Q (Mismatching values).
On the left is the data from Trafiklab Realtime APIs > Trafiklab stop lookup, with a blue header, and on the right the NeTEx Datasets > Stops data, in green.
These are followed by the column checks (L to P) and the mismatching values (Q).
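The mapping described above could be sketched roughly as follows. This is only an illustration: the record shapes, the function name find_unmatched and the sample values are made up, and only the field names code and PrivateCode come from the description.

```python
def find_unmatched(lookup_stops, netex_stops):
    """Return stop lookup records whose code has no matching PrivateCode.

    Both arguments are assumed to be lists of dicts; this is a
    hypothetical shape, not the actual API response format.
    """
    # Compare as strings so "1001" and 1001 are treated as the same code.
    netex_codes = {str(stop["PrivateCode"]) for stop in netex_stops}
    return [stop for stop in lookup_stops if str(stop["code"]) not in netex_codes]


# Made-up sample records, for illustration only.
lookup_stops = [{"code": "1001"}, {"code": "1002"}, {"code": "1003"}]
netex_stops = [{"PrivateCode": "1001"}, {"PrivateCode": 1003}]

unmatched = find_unmatched(lookup_stops, netex_stops)
```

Here unmatched holds the stop lookup records that have no counterpart in the Stops data, which is the kind of row listed in the attached file.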
Best regards,
Girts Pulins
Some of these have probably been taken out of use, while many others may have been created by accident when merging the different data sources and have since been corrected.
Secondly, you don't differentiate between rikshållplatser and meta stops. Meta stops contain rikshållplatser, and their rikshållplats numbers all start with 740098. For some reason these are not in the Stops dataset (it would be really nice if they were), but they are available in the Sales dataset and GTFS Sweden 3. The meta stops are, however, exported at the same level as rikshållplatser in GTFS Sweden 3 and the realtime APIs.
Uppsala (meta stop) => Uppsala Centralstation (rhp) => Uppsala Centralstation (stopplace/stop) => children for different modes
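As a rough sketch, the two kinds of entry could be separated by that prefix before comparing against the Stops dataset. The function names and the sample numbers below are made up for illustration; only the 740098 prefix comes from the explanation above.

```python
META_STOP_PREFIX = "740098"  # prefix of meta stop numbers, as described above


def is_meta_stop(stop_number):
    """Return True if the number looks like a meta stop number."""
    return str(stop_number).startswith(META_STOP_PREFIX)


def split_stops(stop_numbers):
    """Split numbers into (meta stops, other stops), preserving order."""
    meta, regular = [], []
    for number in stop_numbers:
        (meta if is_meta_stop(number) else regular).append(number)
    return meta, regular


# Made-up numbers, for illustration only.
meta, regular = split_stops(["740098000", "740000001", 740098123])
```

Only the regular list would then be expected to have counterparts in the Stops dataset.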
Kind regards,
Sigurd Stenberg
Thank you for the prompt answer and for pointing out the flaws in my comparison.
The <ToDate> was something I had totally missed in the file's schema; I am currently adding it.
Thank you also for the explanation of why meta stops are missing from the Stops dataset.
I agree that it would be very nice to have them. Still, it is good to know this.
Accounting for meta stops / rikshållplatser immediately shrank the difference to 61 records (from more than 1000).
A few checks indicated that some of the remaining ones will be the <ToDate> case, and some could be due to me not comparing data sets with the latest data (I happened to fetch the data just after a sync from one source and before the daily update of the other).
I will do the final comparison once the <ToDate> is properly populated.
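A minimal sketch of the kind of <ToDate> check meant here, assuming the value arrives as an ISO 8601 date or date-time string and that a stop with no <ToDate> is still in use. The function name is_active is made up, and whether the <ToDate> day itself should count as valid is an assumption (here it does not).

```python
from datetime import date


def is_active(to_date_text, today):
    """Return True if the stop is still valid on the given day.

    to_date_text is the raw <ToDate> text, or None/"" when it is absent.
    Only the date part (first 10 characters, YYYY-MM-DD) is used.
    """
    if not to_date_text:
        return True  # assumption: no <ToDate> means no end of validity
    return date.fromisoformat(to_date_text[:10]) > today


# Made-up values, for illustration only.
comparison_day = date(2026, 2, 9)
still_valid = is_active("2026-12-31T00:00:00", comparison_day)
expired = not is_active("2025-12-31T00:00:00", comparison_day)
```

Records for which is_active returns False would be dropped from the Stops data before comparing.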
Best regards,
Girts Pulins