Inferring individuals’ activities and trip purposes is critical for transportation planning and travel behavior analysis. State-of-the-art trip purpose inference relies on GIS and land use data. However, two major challenges remain: (1) how to accurately identify trip purposes in high-density business areas, where many different activities are possible; and (2) how to recognize high-resolution activities, which are far more numerous than the typical trip purposes (home, work, recreation, personal business, education, etc.) considered in the existing literature.
The rapid growth of social media platforms such as Twitter and Facebook provides a new opportunity to collect crowdsourced data. Transportation authorities have also begun to recognize social media as an additional data source for transportation informatics. The advantage of social media is that activity information, together with time and location, can be passively retrieved in real time at relatively low development and maintenance cost. As a first attempt, the objective of this project is to provide a proof of concept that social media, combined with existing land use data and Connected Vehicle (CV) trajectories, can be used to infer individuals’ high-resolution activities and trip purposes. To accomplish this goal, this proposal defines a 15-month project carried out by a multidisciplinary team led by two PIs, one from transportation engineering and one from computer science.
To accomplish this objective, the study will proceed in five steps. First, it will conduct a comprehensive review of previous studies in social media analytics and trip purpose inference. Second, it will develop machine learning models to retrieve travel-related tweets and to infer locations for tweets that lack geo-tags. Third, it will use a keyword-search approach to identify major public events and public gatherings. Fourth, individuals’ activities will be derived using deep learning and topic modeling. Fifth, trip segments will be extracted from CV trajectories and labeled with activities from both social media and land use data, using topic modeling and WordNet, a widely used lexical database. Finally, the proposed models will be evaluated with the recently released two-month CV dataset from 3,000 equipped vehicles in the Michigan CV safety pilot, together with the land use data and the corresponding Twitter data.
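As an illustration of the third step, the minimal Python sketch below shows one way a keyword-search approach might flag public gatherings: tweets containing event-related keywords are counted per spatial grid cell and hour, and cell-hours whose counts reach a threshold are reported. The keyword list, the 0.01-degree grid size, the threshold, and the names `Tweet`, `mentions_event`, and `detect_gatherings` are hypothetical assumptions for illustration, not the project's specified design.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical event keywords; the actual keyword list is not specified here.
EVENT_KEYWORDS = {"concert", "game", "festival", "parade", "tailgate"}


@dataclass
class Tweet:
    text: str
    lat: float
    lon: float
    hour: int  # hour index since the start of the study period (assumed)


def cell_of(lat: float, lon: float, size: float = 0.01) -> tuple:
    """Snap a coordinate to a square lat/lon grid cell (about 1 km at this size)."""
    return (int(lat // size), int(lon // size))


def mentions_event(text: str) -> bool:
    """True if any lowercase word token in the text is an event keyword."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return not tokens.isdisjoint(EVENT_KEYWORDS)


def detect_gatherings(tweets, min_count: int = 3, size: float = 0.01) -> dict:
    """Count keyword-matching tweets per (grid cell, hour) and keep busy ones."""
    counts = Counter(
        (cell_of(t.lat, t.lon, size), t.hour)
        for t in tweets
        if mentions_event(t.text)
    )
    return {key: n for key, n in counts.items() if n >= min_count}
```

In practice, the grid size and threshold would need tuning against baseline tweet volume, since busy areas can produce many keyword matches even without a special event.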