Date of Completion

7-11-2016

Embargo Period

5-3-2016

Advisors

Sanguthevar Rajasekaran, Karthik Konduri, Nicholas Lownes, Maifi Khan

Field of Study

Computer Science and Engineering

Degree

Master of Science

Open Access

Open Access

Abstract

This paper describes an approach to infer the location of a social media post at a hyper-local scale based on its content, conditional on the knowledge that the post originates from a larger area such as a city or even a state. The approach comprises three components: (i) a discriminative classifier, namely Logistic Regression (LR), which selects from a set of the most probable sub-regions from which a post might have originated; (ii) a clustering technique, namely k-means, that adaptively partitions the larger geographic region into sub-regions based on the density of the posts; and (iii) a range of techniques to extract a set of hyper-local words from the posts, which are fed as features to the LR classifier. The approach is evaluated on a large corpus of tweets collected from Twitter over the NYC, Washington DC, and state of Connecticut regions. The results show that our approach can geo-locate tweets within 1.72 km for NYC, 12.5 km for DC, and 37.00 km for CT. These results from three geographically and socially diverse regions suggest that our approach outperforms contemporary methods, which estimate locations within ranges of hundreds of kilometers. It can thus support a wide array of services such as location-based advertising and disaster and emergency response.
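
The three components described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the four toy posts, the use of every word as a feature (in place of the hyper-local word extraction), the seeding of k-means with the first k points, and the function names `kmeans`, `train_lr`, and `locate` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy posts: (latitude, longitude, text).
posts = [
    (40.7580, -73.9855, "times square lights"),
    (40.7061, -73.9969, "brooklyn bridge walk"),
    (40.7590, -73.9845, "broadway show times square"),
    (40.7050, -73.9975, "sunset brooklyn bridge"),
]

def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; seeding with the first k points is an assumption."""
    centers = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            (sum(q[0] for q in g) / len(g), sum(q[1] for q in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_lr(X, y, n_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression fitted by stochastic gradient descent."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax([sum(w * v for w, v in zip(W[c], x)) for c in range(n_classes)])
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for i, v in enumerate(x):
                    if v:
                        W[c][i] -= lr * grad * v
    return W

# (ii) Partition the region into sub-regions from the post coordinates.
coords = [(lat, lon) for lat, lon, _ in posts]
centers = kmeans(coords, k=2)
labels = [nearest(p, centers) for p in coords]

# (iii) Stand-in for hyper-local word extraction: use every word as a feature.
vocab = sorted({w for _, _, text in posts for w in text.split()})

def featurize(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

# (i) Train the classifier to map words to sub-regions.
W = train_lr([featurize(t) for _, _, t in posts], labels, n_classes=len(centers))

def locate(text):
    """Return the center of the most probable sub-region for a new post."""
    x = featurize(text)
    scores = [sum(w * v for w, v in zip(W[c], x)) for c in range(len(centers))]
    return centers[max(range(len(centers)), key=lambda c: scores[c])]

print(locate("broadway lights"))
print(locate("sunset over the bridge"))
```

In this sketch a post is placed at the center of its predicted sub-region, so the attainable accuracy depends on how finely the clustering step divides the larger area.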

Major Advisor

Swapna Gokhale
