Representing the Center for Government Excellence (GovEx), we recently attended the 4th Annual International Conference on Computational Social Science (IC2S2), hosted by Northwestern University's Kellogg School of Management, an immaculate, futuristic building encased almost entirely in glass. The four-day conference drew students, academics, technology professionals, and non-profit organizations from across the United States and abroad.
Just as GovEx works to help local governments use data to improve their residents' quality of life, the field of computational social science applies quantitative methods to human behavior in order to enact positive societal change. The conference offered attendees skills workshops and fascinating talks covering research topics ranging from politics and social movements to research methods and news and media consumption.
We were excited to present our own original research project. After a three-month selection process and a round of feedback from the raters of our abstract, we were selected as one of 104 poster presentations.
Originally tasked with creating a national taxonomy of 311 service request types (SRTs) and categories for a GovEx project, we quickly grew disenchanted by the lack of metadata describing how service requests were currently being interpreted and categorized. What was happening between the moment a resident called 311 to make a service request and the moment their words were translated into an SRT and ultimately categorized? Our fear was that if 311 requests are translated and categorized in a biased way, the result could be negative outcomes for the very residents the system is meant to help:
What if a request was denied simply because it was miscategorized at the outset? What if residents' requests are ignored because of improper or ineffective coding schemes? What if certain requests are disproportionately overlooked on the basis of a caller's speech?
Impassioned by our mutual love for text analysis, open data, R, and difficult questions, we extended the project to explore our research question:
Is there a way to create an automated national coding scheme for 311 service requests that eliminates the potential for human bias?
While researching text automation, we learned that sentiment analysis of text data relies on specific dictionaries of words. We considered what language might currently be excluded when such dictionaries are used in text or sentiment analyses, and whether Natural Language Processing (NLP) algorithms reflect certain dialects, idioms, speech patterns, or words.
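To illustrate the coverage problem, here is a minimal sketch of dictionary-based keyword matching. (Our project used R; this sketch is in Python, and the tiny dictionary and example requests are invented for illustration; real 311 coding schemes are far larger.)

```python
# Minimal sketch: dictionary-based categorization of 311 request text.
# The keyword dictionary below is a hypothetical, toy example.
category_dictionary = {
    "pothole": "Street Repair",
    "stray dog": "Animal Control",
    "graffiti": "Vandalism",
}

def categorize(request_text):
    """Return the first matching category, or None if no keyword matches."""
    text = request_text.lower()
    for keyword, category in category_dictionary.items():
        if keyword in text:
            return category
    return None

requests = [
    "There is a huge pothole on Main St",    # standard phrasing: matched
    "big ol' hole in the road by my house",  # informal phrasing: missed
]

for r in requests:
    print(r, "->", categorize(r))
```

Both residents are reporting the same road hazard, but only the standard-English phrasing matches the dictionary; the informal phrasing falls through uncategorized, which is exactly the kind of gap that concerned us.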
We concluded that, for our project, "311 service request dictionaries" must include diverse, non-standard English text to reflect diverse cities and diverse speech. After all, our society has become more diverse, language and speech have changed, and new words have entered the lexicon. What worked at one time does not necessarily work today or tomorrow. So, with our guiding questions in mind, we analyzed free-response text in the open 311 datasets of ten U.S. cities.
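One simple way to probe how well a lexicon covers free-response text is to count the words it does not recognize. The sketch below (again in Python, with an invented word list and sample requests, since our actual data and methods are not reproduced here) shows the idea:

```python
import re
from collections import Counter

# Hypothetical lexicon of terms a coding scheme recognizes.
known_terms = {"pothole", "streetlight", "graffiti", "noise", "trash"}

def out_of_vocabulary(texts):
    """Count words in free-response text that the lexicon does not cover."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in known_terms:
                counts[word] += 1
    return counts

sample = [
    "Trash piled up by the stoop",
    "hella loud noise all nite",
]
oov = out_of_vocabulary(sample)
print(oov.most_common())
```

Running a check like this across many cities' datasets surfaces regional and informal vocabulary (here, invented examples like "stoop", "hella", and "nite") that a standard-English dictionary would silently ignore.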
Our initial findings offered some evidence for our theory that miscategorization of requests stems, in part, from a lack of attention to informal and diverse speech. One example from our analysis: some animal concerns were categorized as traffic violations. While this first finding was well received by our audience at IC2S2, a great deal of work still lies ahead of us.
Moving forward, we hope to publish our methods, data, algorithm, and detailed instructions for implementation. We eventually plan to release our machine learning algorithm as an evidence-based tool for categorizing 311 service requests, so that cities are better able to serve the needs of all of their residents. Rather than replacing the work of a 311 responder, the algorithm would supplement that responder's process.