Wrapping the Python REPL with Large Language Models
The code accompanying this post is at https://gitlab.com/da_doomer/natural-python.
In some cases it is now possible to write computer code mostly by prompting a Large Language Model (LLM). There seem to be two common modalities for this: (1) integrating the LLM call directly into an IDE (e.g., Copilot in VSCode), or (2) using something like ChatGPT or a playground and copying the generated code into your editor.
Often, the top guess from the LLM (which is what is shown to the programmer) is almost correct, but still requires the programmer to fix small errors. At the same time, fully correct code is likely to appear somewhere in the first 20 or so LLM guesses.
I propose[0] a new modality that addresses this issue: write natural language and constraints in a read-eval-print loop (REPL) whose ‘evaluate’ step is formulated as a program synthesis problem over the given constraints. This lets the programmer provide pass/fail criteria that filter the LLM's guesses.
[0] Perhaps others have proposed this before. I came up with this idea this morning and have not looked it up.
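Concretely, the ‘evaluate’ step can be viewed as rejection sampling: build a prompt from the session so far plus the new instruction, draw a batch of completions, and keep the first one that passes every assert. The sketch below only illustrates that idea and is not the actual Natural Python implementation; the ‘sample’ callable, the prompt format, and the function names are assumptions.

from typing import Callable


def synthesize(
    instruction: str,
    constraints: list[str],
    sample: Callable[[str, int], list[str]],
    history: str = "",
    n: int = 20,
) -> str:
    """Return the first of n sampled completions that satisfies every constraint."""
    # Build a prompt from the session so far plus the new instruction.
    prompt = f"{history}\n# {instruction}\n"
    for candidate in sample(prompt, n):  # 'sample' wraps whatever LLM API is used
        if passes(candidate, constraints, history):
            return candidate
    raise RuntimeError("no candidate satisfied the constraints")


def passes(candidate: str, constraints: list[str], history: str) -> bool:
    """Run the candidate, then the asserts, in a scratch namespace."""
    namespace: dict = {}
    try:
        # Warning: this executes untrusted LLM output in-process.
        exec(history + "\n" + candidate, namespace)
        for constraint in constraints:
            exec(constraint, namespace)
    except Exception:
        return False
    return True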
As an example, consider the task of creating a list with the days of the week, starting on Monday. Assume it is hard for the programmer to convey the property of ‘starting on Monday’ through the natural-language description alone. In this case, the programmer would like the ability to filter the LLM output with a hard constraint, like so:
>>> # Create a list with the days of the week, call it 'days'
>>> # finally:
+++ assert days[0] == 'Monday'
The REPL then forms a prompt, samples completions from the LLM, filters the candidates against the constraint, and responds with the following:
>>> days = [
>>>     'Monday',
>>>     'Tuesday',
>>>     'Wednesday',
>>>     'Thursday',
>>>     'Friday',
>>>     'Saturday',
>>>     'Sunday',
>>> ]
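In terms of the sketch above, this exchange corresponds roughly to a call like the following, where my_llm_sampler is a hypothetical wrapper around an LLM API:

code = synthesize(
    instruction="Create a list with the days of the week, call it 'days'",
    constraints=["assert days[0] == 'Monday'"],
    sample=my_llm_sampler,  # hypothetical: returns candidate snippets for a prompt
)
print(code)  # the code echoed back into the session above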
After the user is satisfied with the session, the Python code can be saved to a script for future use.
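That last step amounts to replaying the accepted snippets into a file. A minimal sketch, assuming the REPL keeps the accepted code in a list:

from pathlib import Path

# Assumption: 'accepted_code' holds the snippets accepted during the session.
accepted_code = [
    "days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']",
]
Path("days_session.py").write_text("\n".join(accepted_code) + "\n")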
I implemented this approach in a tool called Natural Python. Check it out if you are interested.
Naturally, from a safety perspective the output of an LLM should be treated as adversarial. Executing LLM output is therefore an inherently dangerous way to implement a REPL.
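A partial mitigation, sketched below and not part of the tool itself, is to run each candidate in a separate subprocess with a timeout. This contains hangs and crashes, but it is not a security boundary: the candidate still runs with the user's privileges.

import os
import subprocess
import sys
import tempfile


def run_isolated(candidate_source: str, timeout_seconds: float = 5.0) -> bool:
    """Run a candidate snippet (with its asserts already appended) in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
        script.write(candidate_source)
        path = script.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout_seconds,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # treat a hung candidate as a failed one
    finally:
        os.unlink(path)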
An interesting extension would be to allow the constraints to be specified in natural language.