From Zero to Sixty at the Speed of RAG: Improving YAML Recipe Generation via Retrieval
LLMs have been shown to match or even exceed the performance of specialized deep learning models on code generation tasks for general-purpose imperative languages such as Python, Java, C++, and Rust. In contrast, there is only limited work investigating whether this impressive out-of-the-box generalization transfers to less ubiquitous domain-specific languages, which are often declarative and based on XML, JSON, or YAML. To bridge this gap, we explore the capabilities of LLMs for composing code automation recipes without resorting to any form of task-specific finetuning. We experiment with two GPT versions and CodeLLaMA-13b and find that, even after extensive prompt engineering and chain-of-thought prompting, these models' performance in recipe selection is ≈ 30%, while their performance in parameter filling of YAML recipes remains below ≈ 50%. However, by decomposing the task into two stages, dense retrieval and generative slot filling, while still keeping our setup training-free, the models attain a performance of ≈ 50% to ≈ 67% in recipe selection and ≈ 60% to ≈ 76% in parameter filling. Our study sheds light on the capabilities of LLMs in generating scripts for less widespread languages and opens up avenues for future research.
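To make the two-stage decomposition concrete, the following is a minimal Python sketch, not the paper's actual implementation: Stage 1 performs dense retrieval over recipe descriptions with a sentence-transformers encoder, and Stage 2 prompts an LLM to fill the retrieved template's parameter slots. The toy recipe catalog, the {slot} placeholder convention, and the llm_complete callable are illustrative assumptions standing in for the study's real recipe corpus and model API.

```python
# Sketch of a training-free, two-stage pipeline:
#   Stage 1: dense retrieval selects the best-matching YAML recipe template.
#   Stage 2: an LLM fills the template's parameter slots from the user request.
# The catalog, slot format, and `llm_complete` are hypothetical placeholders.

from sentence_transformers import SentenceTransformer, util

# Toy catalog: each recipe pairs a natural-language description
# with a YAML template containing {slot} placeholders.
CATALOG = [
    {
        "description": "Rename a method across the codebase",
        "template": "type: rename-method\nold_name: {old_name}\nnew_name: {new_name}\n",
    },
    {
        "description": "Upgrade a dependency to a given version",
        "template": "type: upgrade-dependency\npackage: {package}\nversion: {version}\n",
    },
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
catalog_emb = encoder.encode(
    [r["description"] for r in CATALOG], convert_to_tensor=True
)


def select_recipe(query: str) -> dict:
    """Stage 1: cosine similarity between the query and recipe descriptions."""
    query_emb = encoder.encode(query, convert_to_tensor=True)
    best = int(util.cos_sim(query_emb, catalog_emb).argmax())
    return CATALOG[best]


def fill_parameters(query: str, recipe: dict, llm_complete) -> str:
    """Stage 2: generative slot filling; `llm_complete` is any chat-model call."""
    prompt = (
        "Fill every {slot} in the YAML template using only the user request.\n"
        f"User request: {query}\n"
        f"Template:\n{recipe['template']}\n"
        "Return only the completed YAML."
    )
    return llm_complete(prompt)
```

Because retrieval narrows generation to one known-good template, the LLM only has to copy parameter values out of the request rather than synthesize a full recipe, which is consistent with the accuracy gains the abstract reports for both recipe selection and parameter filling.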