Preprocessing

Introduction

In this project, we demonstrate an Advanced Urban Public Transportation System with two applications: automatic bus-stop detection and bus arrival time prediction. The demonstration includes the preprocessing steps that address GPS outage and GPS unavailability, followed by an interactive implementation of automatic bus-stop detection and bus arrival time prediction. It is based on our work “Advanced Urban Public Transportation System for Indian Scenarios” [1].

System model

Figure: Generic architecture of the system (GenericArchitecture.jpg)

The system model consists of three modules: 1) the bus module, 2) the server module, and 3) the commuter module. We describe each of them below.

Bus module: The bus module is an application installed on an Android smartphone carried in the bus. It periodically (every second) publishes the real-time location of the bus to a server through a Message Queuing Telemetry Transport (MQTT) broker, which provides a publish-subscribe mechanism.

Server module: The server module receives the real-time location updates of all the buses through the *EMQ* publish-subscribe broker. It applies the preprocessing steps to clean the data, stores the location data in a MongoDB database, and applies the bus-stop detection algorithm and the arrival time prediction algorithm, the latter based on travel time estimates computed from historical trips.

Commuter module: The commuter module lets a commuter subscribe to real-time updates from one or more ongoing trips. All interactions between the commuter and server modules use the MQTT messaging protocol.

In this project, we begin by preprocessing the bus location records collected using the bus module. We then apply automatic bus-stop detection, travel time estimation, and bus arrival time prediction, and develop an interactive demonstration of the bus-stop detector and the arrival time predictor.

Store raw location records into MongoDB

[1]:
'''Import and initialize MongoClient'''
from pymongo import MongoClient
con = MongoClient()

Bound definition

The data were collected on the college shuttle bus plying between ISCON, Ahmedabad and PDPU, Gandhinagar during morning and evening trips. The morning trip usually begins between 7:15 and 7:30 from ISCON towards PDPU; this direction is termed *North bound* in our application. Likewise, the evening trip usually begins between 18:15 and 18:30 from PDPU towards ISCON; this direction is termed *South bound*.

We start by storing the raw location records (available as .txt files) from the bus module in a MongoDB database. By “raw” we mean that the records are exactly as stored by the bus module Android application, with no preprocessing applied. Three versions of the bus module application, differing in minor ways, were used to record location traces. In the first version, RawRecords, the time was recorded in the ‘dd Month YYYY hh mm ss’ string format (e.g., 8 Jan 2018 07:41:43). In the second version, RawRecordEpoch, the time was recorded in epoch format. In the third version, RawRecordEpochSpeed, the GPS speed was recorded as an additional parameter. The raw location records of each version are stored in a separate folder named after the version.
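The string time format of the first version can be brought onto the same scale as the epoch format of the later versions. The following is a minimal sketch, not the project's own parser: the helper name time_string_to_epoch_ms is hypothetical, the colon-separated layout is taken from the example above, and both the Indian Standard Time offset (UTC+05:30) and the millisecond unit are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps were recorded in Indian Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))


def time_string_to_epoch_ms(text: str) -> int:
    """Convert a string such as '8 Jan 2018 07:41:43' to epoch milliseconds."""
    # %b matches abbreviated month names in the default (English) locale.
    parsed = datetime.strptime(text, "%d %b %Y %H:%M:%S").replace(tzinfo=IST)
    return int(parsed.timestamp() * 1000)


print(time_string_to_epoch_ms("8 Jan 2018 07:41:43"))  # 1515377503000
```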

A location records file can contain one or more trips, so the records must first be separated by trip. We check the time difference between two consecutive records: if it exceeds 30 minutes, the two records are considered to belong to different trips. The function ReadLocationRecordsAndSeparateIntoSegement reads the raw location records and separates them into trip records using this logic.
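The 30-minute rule can be sketched as follows. This is an illustration only and not the code of ReadLocationRecordsAndSeparateIntoSegement; the function name split_into_trips, the sample records, and the assumption that records are time-ordered dictionaries with an epoch field in milliseconds are all hypothetical.

```python
from typing import Dict, List

# Threshold from the text: a gap of more than 30 minutes starts a new trip.
TRIP_GAP_MS = 30 * 60 * 1000


def split_into_trips(records: List[Dict], gap_ms: int = TRIP_GAP_MS) -> List[List[Dict]]:
    """Group time-ordered records into trips, splitting on large time gaps."""
    trips: List[List[Dict]] = []
    for rec in records:
        # Start a new trip for the first record or when the gap exceeds the threshold.
        if not trips or rec["epoch"] - trips[-1][-1]["epoch"] > gap_ms:
            trips.append([])
        trips[-1].append(rec)
    return trips


# Hypothetical records: two one-second updates, then a 31-minute pause.
records = [
    {"epoch": 0},
    {"epoch": 1000},
    {"epoch": 1000 + 31 * 60 * 1000},
]
trips = split_into_trips(records)
print([len(t) for t in trips])  # [2, 1]
```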

Subsequently, it saves the separated trip records into MongoDB, using dd_mm_yyyy__hh_mm_ss.RawRecords as the collection name. Here, dd_mm_yyyy__hh_mm_ss represents the start time of the trip and RawRecords indicates that the collection holds raw location records.
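As an illustration of this naming scheme, a collection name can be derived from the epoch time of the first record of a trip. This is a sketch under assumptions: the helper raw_collection_name is hypothetical, the epoch is assumed to be in milliseconds, and the trip names are assumed to be in Indian Standard Time (UTC+05:30). The epoch value used here is the one from the sample raw record shown later in this notebook.

```python
from datetime import datetime, timedelta, timezone

# Assumption: trip names are expressed in Indian Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))


def raw_collection_name(start_epoch_ms: float) -> str:
    """Build the dd_mm_yyyy__hh_mm_ss.RawRecords name for a trip start time."""
    start = datetime.fromtimestamp(start_epoch_ms / 1000.0, tz=IST)
    return start.strftime("%d_%m_%Y__%H_%M_%S") + ".RawRecords"


print(raw_collection_name(1517191787000.0))  # 29_01_2018__07_39_47.RawRecords
```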

Further, status information for every trip is maintained in the TripInfo collection and updated after every operation. Each stage of execution uses it to extract the relevant records.

[2]:
RouteName='Git_ISCON_PDPU'

If you have executed this notebook before and have already created the MongoDB database, the following code needs to be executed (after uncommenting it) to start from a fresh MongoDB database.

[3]:
'''
Used for deleting the location record database from MongoDB
in case one has created the database earlier by executing the code below.'''
#con.drop_database(RouteName)

'''
In the same way remove the Processed location record with GPS speed
'''
#import os
#path = "/".join(os.getcwd().split('/')) + "/LocationRecords/RawRecordEpochSpeedProcessed"
#for file in [f for f in os.listdir(path)]:
#    os.remove(path+"/"+file)
[3]:
'\nIn the same way remove the Processed location record with GPS speed\n'
[4]:
import os
import sys

# Make the project-specific library folder importable
sys.path.append(os.path.join(os.getcwd(), 'Codes', 'LibCodes'))

'''Import project specific library'''
import ReadSeparateTripMongo

path = os.path.join(os.getcwd(), "LocationRecords")

'''Read location records folders'''
BusModuleVersion = [f for f in os.listdir(path) if '.md' not in f]
[5]:
'''For updating the lib changes effects'''
'''
import importlib
importlib.reload(ReadSeparateTripMongo)
'''
[5]:
'\nimport importlib\nimportlib.reload(ReadSeparateTripMongo)\n'
[6]:
'''Read location records and separate them into trips. Subsequently store them into MongoDB'''
for RecordType in BusModuleVersion:
    LocationRecordDir = os.path.join(path, RecordType)
    for fileName in os.listdir(LocationRecordDir):

        if RecordType == 'RawRecordEpochSpeed':

            ReadSeparateTripMongo.HandlerForNALocation(fileName,
                                                       LocationRecordDir,
                                                       LocationRecordDir + 'Processed')

            ReadSeparateTripMongo.ReadLocationRecordsAndSeparateIntoSegement(RouteName,
                                                                             fileName,
                                                                             LocationRecordDir + 'Processed',
                                                                             RecordType)

        else:
            ReadSeparateTripMongo.ReadLocationRecordsAndSeparateIntoSegement(RouteName,
                                                                             fileName,
                                                                             LocationRecordDir,
                                                                             RecordType)

Reading file: ISCON_PDPU+1?29_01_2018_16_03_04.txt
Reading file: ISCON_PDPU+1?07_02_2018_09_29_00
Reading file: ISCON_PDPU+1?18_01_2018_07_38_10
Reading file: ISCON_PDPU+1?22_12_2017_07_38_21
Reading file: ISCON_PDPU+1?19_12_2017_18_41_16
Reading file: ISCON_PDPU+1?08_01_2018_07_41_43
Reading file: ISCON_PDPU+1?27_12_2017_07_55_49
Reading file: ISCON_PDPU+1?12_02_2018_08_47_22.txt
Reading file: ISCON_PDPU+1?15_02_2018_16_08_07.txt
Reading file: ISCON_PDPU+1?21_02_2018_16_49_58.txt
Reading file: ISCON_PDPU+1?23_03_2018_08_47_23
Reading file: ISCON_PDPU+1?02_04_18_01_51_00
Reading file: ISCON_PDPU+1?23_03_2018_08_47_22
Reading file: ISCON_PDPU+1?14_02_2018_12_39_44.txt
Reading file: ISCON_PDPU+1?22_02_2018_12_06_55.txt
Reading file: ISCON_PDPU+1?12_02_2018_08_47_22.txt
Reading file: ISCON_PDPU+1?15_02_2018_16_08_07.txt
Reading file: ISCON_PDPU+1?21_02_2018_16_49_58.txt
Reading file: ISCON_PDPU+1?23_03_2018_08_47_23
Reading file: ISCON_PDPU+1?02_04_18_01_51_00
Reading file: ISCON_PDPU+1?23_03_2018_08_47_22
Reading file: ISCON_PDPU+1?14_02_2018_12_39_44.txt
Reading file: ISCON_PDPU+1?22_02_2018_12_06_55.txt

Note that ReadSeparateTripMongo.HandlerForNALocation and ReadSeparateTripMongo.ReadLocationRecordsAndSeparateIntoSegement are project-specific library functions. Help for any project-specific function can be displayed by executing FunctionName?. For instance, executing the cell below opens the help window for the function. For further reference, one can also look at the library source file in the LibCodes directory, whose path is shown in the File field of the help output.

[7]:
ReadSeparateTripMongo.ReadLocationRecordsAndSeparateIntoSegement?

MongoDB collection record and its representation

Let us now look at the TripInfo collection record for one of the trips (say, trip *29_12_2017__07_37_27*).

[8]:
[rec for rec in con[RouteName]['TripInfo'].find({'SingleTripInfo':'29_12_2017__07_37_27'})]
[8]:
[{'_id': ObjectId('5da547d552d4e70f7c4d3f28'),
  'SingleTripInfo': '29_12_2017__07_37_27',
  'filteredLocationRecord': False,
  'DBSCANOp': False,
  'segments': -1,
  'segmentsTimeStamp': []}]

Here, the keys represent the status of the operations applied to the location records. The key filteredLocationRecord indicates whether the location records have been filtered, and DBSCANOp indicates whether the DBSCAN-based stoppage detection algorithm has been applied to the collection records. The key segments holds the number of segments in the location records after the interpolation and segmentation procedure. Concretely, the procedure splits the location records into segments wherever it finds a GPS outage (the procedure is described in a subsequent subsection). Likewise, segmentsTimeStamp stores the timestamps corresponding to the segments, so that GPS outage periods are excluded from all subsequent procedures. At different stages, the modules update these status flags for all trips to keep track of the operations applied to each trip. Subsequently, the modules extract the location records by querying the MongoDB collection TripInfo with the status flag values. For instance, to extract the trips on which filtering has not been applied, we query MongoDB as follows:

[9]:
SingleTripsInfo = [rec['SingleTripInfo'] for rec in con[RouteName]['TripInfo'].find({'filteredLocationRecord': False})]
[10]:
print(SingleTripsInfo)
['29_01_2018__07_39_47', '30_01_2018__07_42_30', '01_02_2018__07_39_12', '02_02_2018__07_38_50', '18_01_2018__07_38_10', '19_01_2018__07_38_47', '22_01_2018__07_41_04', '22_12_2017__07_38_21', '22_12_2017__18_38_34', '26_12_2017__07_32_35', '19_12_2017__18_41_16', '20_12_2017__07_38_14', '20_12_2017__18_31_19', '21_12_2017__07_52_59', '08_01_2018__07_41_43', '08_01_2018__18_37_49', '09_01_2018__07_40_01', '27_12_2017__07_55_48', '29_12_2017__07_37_27', '01_01_2018__07_38_27', '12_02_2018__07_40_14', '14_02_2018__18_30_22', '15_02_2018__07_45_52', '15_02_2018__16_08_22', '15_02_2018__18_33_19', '16_02_2018__07_45_41', '19_02_2018__07_46_19', '20_02_2018__07_41_48', '20_02_2018__18_31_07', '21_02_2018__07_42_42', '13_03_2018__07_29_52', '14_03_2018__07_35_46', '20_03_2018__07_28_45', '28_03_2018__18_39_21', '21_03_2018__07_32_39', '21_03_2018__18_32_40', '22_03_2018__07_38_43', '14_02_2018__07_41_04', '21_02_2018__18_28_29', '22_02_2018__07_42_45', '12_02_2018__07_40_14', '14_02_2018__18_30_22', '15_02_2018__07_45_52', '15_02_2018__16_08_22', '15_02_2018__18_33_19', '16_02_2018__07_45_41', '19_02_2018__07_46_19', '20_02_2018__07_41_48', '20_02_2018__18_31_07', '21_02_2018__07_42_42', '13_03_2018__07_29_52', '14_03_2018__07_35_46', '20_03_2018__07_28_45', '28_03_2018__18_39_21', '21_03_2018__07_32_39', '21_03_2018__18_32_40', '22_03_2018__07_38_43', '14_02_2018__07_41_04', '21_02_2018__18_28_29', '22_02_2018__07_42_45']
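The printed list contains some trip names more than once. If each trip should be handled only once in later steps, an order-preserving de-duplication is one option. The sketch below is an optional suggestion and not part of the project library; the helper name unique_in_order and the shortened sample list are hypothetical.

```python
def unique_in_order(names):
    """Drop repeated names while keeping the first occurrence of each."""
    return list(dict.fromkeys(names))


# Shortened, hypothetical stand-in for SingleTripsInfo.
sample = ['12_02_2018__07_40_14', '14_02_2018__18_30_22', '12_02_2018__07_40_14']
print(unique_in_order(sample))
# ['12_02_2018__07_40_14', '14_02_2018__18_30_22']
```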

Let us now look at the fields of a collection record for a raw location record.

[11]:
[rec for rec in con[RouteName][SingleTripsInfo[0]+'.RawRecords'].find().limit(1)]
[11]:
[{'_id': ObjectId('5da547ce52d4e70f7c4ce0ea'),
  'epoch': 1517191787000.0,
  'Longitude': 72.508215,
  'Latitude': 23.03014,
  'Accuracy': 6.599999904632568}]

Here, _id is the unique object id assigned by MongoDB, epoch is the timestamp of the record in milliseconds since the Unix epoch, Longitude and Latitude are the location attributes, and Accuracy is the accuracy of the location record in meters.

Filtering

Now that the location records are stored in MongoDB collections, let us look at the filtering preprocessing steps. We will remove the outlier location records from the trip records, and then apply segmentation and interpolation procedures to detect and handle GPS outage and unavailability. Concretely, we will interpolate the location records if the unavailability spans a short duration and a short interval; otherwise, we will separate the location records into different segments.

Outlier removal

A given location record is considered as an outlier and removed if

\[ac > \bar{ac} + 2 \times \sigma_{ac}\]

where \(ac\) is the accuracy of the location record, and \(\bar{ac}\) and \(\sigma_{ac}\) are the mean and standard deviation of the accuracy over all the location records of a trip, respectively.
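The rule above can be sketched in a few lines. This is an illustration and not the code of the Preprocessing library: the function name remove_accuracy_outliers and the sample values are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev
from typing import Dict, List


def remove_accuracy_outliers(records: List[Dict]) -> List[Dict]:
    """Keep records whose Accuracy is at most mean + 2 * standard deviation."""
    accuracies = [rec["Accuracy"] for rec in records]
    # Assumption: population standard deviation over all records of the trip.
    threshold = mean(accuracies) + 2 * pstdev(accuracies)
    return [rec for rec in records if rec["Accuracy"] <= threshold]


# Hypothetical trip: nine records with 5 m accuracy and one with 50 m.
records = [{"Accuracy": 5.0} for _ in range(9)] + [{"Accuracy": 50.0}]
kept = remove_accuracy_outliers(records)
print(len(records), len(kept))  # 10 9
```

In this example the mean is 9.5 m and the standard deviation is 13.5 m, so the threshold is 36.5 m and only the 50 m record is removed.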

Segmentation and interpolation

If consecutive location records are separated by a short duration (\(<15\) seconds) or a short distance (\(<50\) m), we apply interpolation. Otherwise, we separate the location records into different segments and update the segment information in the trip status record of the TripInfo collection.
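A minimal sketch of this rule is given below. It is not the implementation used by Preprocessing.ApplyFiltering: the function names, the linear interpolation at one-second steps, the haversine distance, and the sample coordinates are all assumptions made for illustration. The thresholds and the "or" condition follow the text above.

```python
import math
from typing import Dict, List

MAX_GAP_S = 15.0   # duration threshold from the text
MAX_GAP_M = 50.0   # distance threshold from the text
STEP_MS = 1000     # assumed one-second sampling period


def haversine_m(a: Dict, b: Dict) -> float:
    """Great-circle distance in meters between two records."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(a["Latitude"]), math.radians(b["Latitude"])
    dl = math.radians(b["Longitude"] - a["Longitude"])
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def segment_and_interpolate(records: List[Dict]) -> List[List[Dict]]:
    """Fill short gaps by interpolation and split the trip at long gaps."""
    segments: List[List[Dict]] = []
    for rec in records:
        if not segments:
            segments.append([rec])
            continue
        prev = segments[-1][-1]
        gap_ms = rec["epoch"] - prev["epoch"]
        if gap_ms / 1000.0 < MAX_GAP_S or haversine_m(prev, rec) < MAX_GAP_M:
            # Short gap: fill the missing seconds by linear interpolation.
            t = prev["epoch"] + STEP_MS
            while t < rec["epoch"]:
                f = (t - prev["epoch"]) / gap_ms
                segments[-1].append({
                    "epoch": t,
                    "Latitude": prev["Latitude"] + f * (rec["Latitude"] - prev["Latitude"]),
                    "Longitude": prev["Longitude"] + f * (rec["Longitude"] - prev["Longitude"]),
                })
                t += STEP_MS
            segments[-1].append(rec)
        else:
            # Long gap (GPS outage): start a new segment.
            segments.append([rec])
    return segments


# Hypothetical records: a 3 s gap (about 33 m), then a 60 s gap (about 1 km).
records = [
    {"epoch": 0, "Latitude": 23.0300, "Longitude": 72.5000},
    {"epoch": 3000, "Latitude": 23.0303, "Longitude": 72.5000},
    {"epoch": 63000, "Latitude": 23.0400, "Longitude": 72.5000},
]
segments = segment_and_interpolate(records)
print([len(seg) for seg in segments])  # [4, 1]
print([[seg[0]["epoch"], seg[-1]["epoch"]] for seg in segments])  # [[0, 3000], [63000, 63000]]
```

The second printed list gives the first and last timestamp of each segment, which is one plausible way to derive a value such as segmentsTimeStamp.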

Note that we have already extracted the trips on which filtering has not been applied, using the code:

SingleTripsInfo = [rec['SingleTripInfo'] for rec in
                   con[RouteName]['TripInfo'].find({'filteredLocationRecord': False})]

Now, we shall apply filtering to the trips in SingleTripsInfo.

[12]:
import Preprocessing
[13]:
'''For updating the lib changes effects'''
#importlib.reload(Preprocessing)
[13]:
'For updating the lib changes effects'
[14]:
for SingleTripInfo in SingleTripsInfo:
    Preprocessing.ApplyFiltering(RouteName,SingleTripInfo)
Executing filtering on 29_01_2018__07_39_47
Executing filtering on 30_01_2018__07_42_30
Executing filtering on 01_02_2018__07_39_12
Executing filtering on 02_02_2018__07_38_50
Executing filtering on 18_01_2018__07_38_10
Executing filtering on 19_01_2018__07_38_47
Executing filtering on 22_01_2018__07_41_04
Executing filtering on 22_12_2017__07_38_21
Executing filtering on 22_12_2017__18_38_34
Executing filtering on 26_12_2017__07_32_35
Executing filtering on 19_12_2017__18_41_16
Executing filtering on 20_12_2017__07_38_14
Executing filtering on 20_12_2017__18_31_19
Executing filtering on 21_12_2017__07_52_59
Executing filtering on 08_01_2018__07_41_43
Executing filtering on 08_01_2018__18_37_49
Executing filtering on 09_01_2018__07_40_01
Executing filtering on 27_12_2017__07_55_48
Executing filtering on 29_12_2017__07_37_27
Executing filtering on 01_01_2018__07_38_27
Executing filtering on 12_02_2018__07_40_14
Executing filtering on 14_02_2018__18_30_22
Executing filtering on 15_02_2018__07_45_52
Executing filtering on 15_02_2018__16_08_22
Executing filtering on 15_02_2018__18_33_19
Executing filtering on 16_02_2018__07_45_41
Executing filtering on 19_02_2018__07_46_19
Executing filtering on 20_02_2018__07_41_48
Executing filtering on 20_02_2018__18_31_07
Executing filtering on 21_02_2018__07_42_42
Executing filtering on 13_03_2018__07_29_52
Executing filtering on 14_03_2018__07_35_46
Executing filtering on 20_03_2018__07_28_45
Executing filtering on 28_03_2018__18_39_21
Executing filtering on 21_03_2018__07_32_39
Executing filtering on 21_03_2018__18_32_40
Executing filtering on 22_03_2018__07_38_43
Executing filtering on 14_02_2018__07_41_04
Executing filtering on 21_02_2018__18_28_29
Executing filtering on 22_02_2018__07_42_45
Executing filtering on 12_02_2018__07_40_14
Executing filtering on 14_02_2018__18_30_22
Executing filtering on 15_02_2018__07_45_52
Executing filtering on 15_02_2018__16_08_22
Executing filtering on 15_02_2018__18_33_19
Executing filtering on 16_02_2018__07_45_41
Executing filtering on 19_02_2018__07_46_19
Executing filtering on 20_02_2018__07_41_48
Executing filtering on 20_02_2018__18_31_07
Executing filtering on 21_02_2018__07_42_42
Executing filtering on 13_03_2018__07_29_52
Executing filtering on 14_03_2018__07_35_46
Executing filtering on 20_03_2018__07_28_45
Executing filtering on 28_03_2018__18_39_21
Executing filtering on 21_03_2018__07_32_39
Executing filtering on 21_03_2018__18_32_40
Executing filtering on 22_03_2018__07_38_43
Executing filtering on 14_02_2018__07_41_04
Executing filtering on 21_02_2018__18_28_29
Executing filtering on 22_02_2018__07_42_45

Besides applying the filtering procedure, Preprocessing.ApplyFiltering computes the relative standard deviation of the accuracy of the location records and the starting hour of the trip. Further, it updates the segmentation information, the mean and standard deviation of the accuracy of the location records, and the trip starting hour in the record of the trip in the TripInfo collection. For instance, let us look again at the TripInfo collection record for one of the trips (say, trip 29_12_2017__07_37_27), as we did earlier.

[15]:
[rec for rec in con[RouteName]['TripInfo'].find({'SingleTripInfo':'29_12_2017__07_37_27'})]
[15]:
[{'_id': ObjectId('5da547d552d4e70f7c4d3f28'),
  'SingleTripInfo': '29_12_2017__07_37_27',
  'filteredLocationRecord': True,
  'DBSCANOp': False,
  'segments': 3,
  'segmentsTimeStamp': [[1514513265000.0, 1514513272000.0],
   [1514513294000.0, 1514516030000.0],
   [1514516107000.0, 1514516189000.0]],
  'RelativeSTDAccuracy': 10.19637724205251,
  'TripStartHour': '07',
  'meanAccuracy': 4.13410543598751,
  'stdAccuracy': 0.42152898583748616}]

One may replace the trip name (i.e., *29_12_2017__07_37_27*) with any other trip in SingleTripsInfo to display the status of that trip.

We would like to draw the attention of readers to the fields of the TripInfo collection record. Preprocessing.ApplyFiltering has updated the field filteredLocationRecord to True, as it has applied the filtering process to the trip. Likewise, the other fields have also been updated with the computed values.
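The derived fields in the record above can be cross-checked by hand. The sketch below assumes that RelativeSTDAccuracy is the standard deviation expressed as a percentage of the mean, and that TripStartHour is the hour field of the trip name; both are inferences from the stored values rather than statements about the library code.

```python
# Values copied from the TripInfo record shown above.
mean_accuracy = 4.13410543598751
std_accuracy = 0.42152898583748616
trip_name = '29_12_2017__07_37_27'

# Assumption: relative standard deviation = 100 * std / mean.
relative_std = 100.0 * std_accuracy / mean_accuracy

# Assumption: the start hour is the first time field of the trip name.
trip_start_hour = trip_name.split('__')[1].split('_')[0]

print(round(relative_std, 4), trip_start_hour)  # 10.1964 07
```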

References

[1] P. Rajput, M. Chaturvedi, and P. Patel, “Advanced urban public transportation system for Indian scenarios,” in Proceedings of the 20th International Conference on Distributed Computing and Networking, ICDCN, India, January 04-07, 2019, pp. 327–336. doi: 10.1145/3288599.3288624.