The proliferation of open-source projects has produced large amounts of source code and related artifacts. Arguably, the open resources associated with software, including open-source repositories, Q/A sites, change histories, and communications between developers, form the richest and most detailed information resource for any technical area. Recently it has been discovered that “natural”, human-produced software has many interesting statistical regularities. As a consequence, code corpora, just like natural language corpora, are amenable to statistical modeling, and a number of software tasks such as coding, testing, porting, and bug-patching are potentially enhanced by the use of these statistical models.

This interdisciplinary workshop will explore issues related to the statistical modeling of software corpora, including topics such as: modeling repetitiveness in source code; using language models for code suggestion in IDEs; using probabilistic grammars to mine programming idioms; statistical methods for type inference in dynamically typed languages; statistical machine translation for porting applications between programming languages, or “mini-fying” JavaScript; using statistical language models to find bugs; and statistical methods for automatic code patching, code summarization, code retrieval, code annotation, or test generation.

The workshop follows several earlier workshops on this topic at Microsoft Research, Dagstuhl, SIGSOFT FSE, and AAAI.

Call for participation

We invite you to join us in Lake Buena Vista. We have a great schedule of two keynote presentations and a collection of presentations showcasing the latest work in this area.


Program overview

Nov 4, 2018

Welcome and Introductions

Rock Lake

- Keynote 1: Marc Brockschmidt, Microsoft Research (Title: "Learning from Code with Graphs")

Lakes Foyer Social Coffee Break

Rock Lake

- Total Recall, Language Processing, and Software Engineering (Long)

- Is "Naturalness" a Result of Deliberate Choice? (Long)

- A Fine-Grained Approach for Automated Conversion of JUnit Assertions to English (Long)

- TestNMT: Function-to-Test Neural Machine Translation (Short)

- 3CAP: Categorizing the Cognitive Capabilities of Alzheimer's Patients in a Smart Home Environment (Short)

- Generating Comments from Source Code with CCGs (Short)

Lakeview Restaurant (West) Lunch

Rock Lake

- Towards Understanding Code Readability and its Impact on Design Quality (Long)

- Cleaning StackOverflow for use in Machine Translation (Long)

- LinkSO: A Dataset for Learning to Retrieve Similar Question Answer Pairs on Software Development Forums (Long)

- Natural Language Processing (NLP) Applied on Issue Trackers as Eclipse Bugzilla (Short)

- Mining Monitoring Concerns Implementation in Java-based Software Systems (Short)

- Two Perspectives on Software Documentation Quality in StackOverflow (Short)

Lakes Foyer Social Coffee Break

Rock Lake

- Keynote 2: Satish Chandra, Facebook (Title: "Bringing ML to the Developer")

Rock Lake

NL4SE Workshop wrap up

Program Committee

Organizing Committee
Yijun Yu, The Open University (UK)
Erik Fredericks, Oakland University
Prem Devanbu, University of California, Davis

Program Committee
Miltos Allamanis, Microsoft Cambridge (UK)
Marc Brockschmidt, Microsoft Cambridge (UK)
Satish Chandra, Facebook, Inc. (USA)
Premkumar Devanbu, University of California, Davis (USA)
Erik Fredericks, Oakland University (USA)
Reihaneh Hariri, Oakland University (USA)
Abram Hindle, University of Alberta (Canada)
Mark Marron, MSR, WA (USA)
Fayola Peters, Lero (Ireland)
Michael Pradel, TU Darmstadt (Germany)
Guangzhi Qu, Oakland University (USA)
Baishakhi Ray, Virginia (USA)
Thein Than Tun, Open University (UK)
Bogdan Vasilescu, Carnegie Mellon University (USA)
Martin Vechev, ETH Zurich (Switzerland)
Xiaoyin Wang, University of Texas at San Antonio (USA)
Alistair Willis, Open University (UK)
Yijun Yu, Open University (UK)