The theme asks when large language models actually help SE, when classical ML still wins, and, now that major benchmarks (Defects4J, SStuBs, HumanEval, etc.) have leaked into LLM training corpora, how we can tell the difference honestly.
This will be a LiveWriting meeting (arrive at 9am with a one-page abstract, leave at 5pm as a co-author on a finished paper).
Participants will be pre-clustered into small writing teams around shared abstracts. The day will alternate between structured discussion and drafting sprints, with LLMs welcome as tools but not as primary authors. Authorship will be verified live, in the room, by the Editor-in-Chief of the ASE journal. Qualifying papers will be fast-tracked to a special issue of the Automated SE journal, with expedited review. No homework before, no homework after.
Extended abstracts
one (track description)
LiveWriting is a proof-of-presence workshop on rethinking AI in software engineering. Arrive at 9am with a one-page abstract, leave at 5pm as a co-author on a finished paper. The theme asks when large language models actually help SE, when classical ML still wins, and — now that major benchmarks (Defects4J, SStuBs, HumanEval, etc.) have leaked into LLM training corpora — how we can tell the difference honestly. Participants are pre-clustered into small writing teams around shared abstracts; the day alternates structured discussion with drafting sprints, with LLMs welcome as tools but not as authors. Authorship is verified live, in the room, by the Editor-in-Chief of the ASE journal, and qualifying papers are fast-tracked to a special issue with expedited review. Real venue, real impact factor, witnessed provenance. No homework before, no homework after.
two (call for papers)
1st International Workshop on LiveWriting
A Proof-of-Presence Workshop
Rethinking AI in SE: When Do LLMs Help, When Does Classical ML Win,
and How Do We Know?
Arrive with an abstract. Leave with a paper.
-------------------------------------------------------------------
THE DOMAIN
-------------------------------------------------------------------
Software engineering has absorbed two waves of AI. Classical machine
learning — decision trees, Naive Bayes, random forests, search-based
methods — gave us artifacts that are small, auditable, reproducible,
and trainable on a laptop from project-specific data. Large language
models gave us something different: massive, opaque, trained on
corpora we cannot inspect, yet claiming (in some settings) to need no
project-specific training at all.
The two waves pull in opposite directions along nearly every
dimension that matters for SE: interpretability vs. predictive
ceiling, compute cost per decision vs. decision quality,
reproducibility vs. zero-shot convenience, in-domain calibration vs.
cross-project generality, correlation vs. causation, energy per
result vs. results per dollar — and, critically, whether "the model"
is a stable artifact a practitioner ships and maintains, or a remote
service that silently changes under them every quarter.
The old picture is cracking. Traditional SE analytics assumed the
model was the deliverable: train a defect predictor, cost estimator,
bug localizer, or test prioritizer on a project's own history, ship
it, recalibrate over time. That pipeline is now haunted by benchmark
contamination — Defects4J, SStuBs, HumanEval and their siblings are
all in LLM training corpora, so "accuracy on held-out data" no
longer measures what we once thought. Classical baselines are
quietly underperforming their own historical numbers for reasons the
leaderboard does not explain. "Vibe-coded" systems pass functional
tests while failing on accessibility, security, and maintainability.
At the same time, LLM-based approaches buckle at real deployment:
they do not meet the latency and cost envelope of a service running
a billion times a day, they cannot be audited for safety-critical
use, and their numbers do not reproduce across model versions.
The open question is no longer "does AI help SE?" but "which AI, at
what cost, with what evidence, for which task — and how do we
evaluate that honestly once the benchmarks have leaked?"
-------------------------------------------------------------------
WHAT IS DIFFERENT ABOUT THIS MEETING
-------------------------------------------------------------------
This is not a normal workshop. Four things set it apart:
1. You write the paper in the room.
Arrive at 9am with a one-page abstract. Leave at 5pm as a
co-author on a finished paper. No homework before. No homework
after.
2. Proof-of-Presence authorship.
The Editor-in-Chief of the ASE journal sits in all day and
verifies, in real time, that every listed author contributed. In
an era of LLM-assisted paper mills, this is the strongest
authorship signal our field can offer.
3. Teams, not solo submissions.
Abstracts are pre-clustered into three writing teams. You are
selected not only for what you know but for who you will
productively write with. The day is structured around
collaboration, not parallel monologues.
4. Fast-track to journal.
Papers that clear the editor's bar are invited to a special issue
of the ASE journal with an expedited review pipeline. Real venue,
real impact factor, witnessed provenance.
-------------------------------------------------------------------
SCHEDULE
-------------------------------------------------------------------
09:00 teams form around clustered abstracts
09:30 morning writing workshops (results, questions, scope)
12:00 round-robin panel across teams
13:00 drafting sprint (LLMs welcome as tools)
14:00 complete drafts
14:00 – 17:00 revision squads polish in Overleaf
17:00 papers done
-------------------------------------------------------------------
TOPICS
-------------------------------------------------------------------
Topics include but are not limited to:
- when classical ML beats LLMs on SE tasks, and why
- benchmark contamination and what replaces held-out accuracy
- cost, latency, and energy budgets for AI in production SE
- reproducibility across silently-changing model versions
- non-functional failures of "vibe-coded" systems
- costs and benefits of neurosymbolic systems
- interpretability and auditability for safety-critical SE
- GenAI: limits and the way ahead
-------------------------------------------------------------------
SUBMISSION GUIDELINES
-------------------------------------------------------------------
Format: one page, ACM sigconf two-column (references don't count)
Content: a claim, a method, and what you'd bring to a writing team.
Preliminary results and sharp open questions both welcome.
Submit: [HotCRP link TBD]
-------------------------------------------------------------------
IMPORTANT DATES (all deadlines 23:59:59 AoE)
-------------------------------------------------------------------
Abstract submission: [DATE]
Notification + team assignment: [DATE]
Workshop (paper written on day): [EASTER DATE]
Fast-track journal submission: within 2 weeks of the workshop
-------------------------------------------------------------------
ORGANIZATION
-------------------------------------------------------------------
Organizing Committee:
Matteo Esposito, University of Oulu (Finland)
Valentina Lenarduzzi, University of Southern Denmark (Denmark)
and University of Oulu (Finland)
Tim Menzies, NC State University (USA)
Dario Di Nucci, University of Salerno (Italy)
Klaus Schmid, University of Hildesheim (Germany)
Editor-in-Residence: Tim Menzies, ASE journal
Program Committee: [TBD]