Programming with Large Language Models: Establishing Context
I have been programming heavily with GitHub Copilot for over a year, and with ChatGPT since it came out. It's the biggest paradigm shift in programming I have personally experienced, and I haven't shied away from learning wild stuff.
Programming with a Large Language Model (LLM for short) feels to me like a very “weird” way to program. It is neither a conversation with a human (because of course it isn't), nor is it programming as I know it, because I am using imprecise natural language to get fuzzy answers. The best way I can describe it: it feels like writing a Google search query vs. writing a regexp. Not only because of the language aspect, but because it is hard to tell “what” the algorithm is latching on to, and why it is giving the answers it gives.
I will be writing up quite a few of these smaller articles, and keep each of them short. I have started 10k-word odysseys multiple times and never gone back to them.
Technique number 1: Establishing context
Because the LLM starts “from scratch” with each API call (this is both its strength and its weakness), you have to provide context with each query. With straight GPT API usage, this is entirely up to you. With Copilot, a much more complex context is sent to the model (here's a write-up reverse engineering an earlier version of Copilot: copilot-explorer | Hacky repo to see what the Copilot extension sends to the server). As for ChatGPT, I have more studying to do before I can give a decent answer, but it is obviously able to take previous messages into consideration.
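To make this concrete, here is a minimal sketch of what “providing context with each query” means in practice, using the official OpenAI Python client. The model name and the system prompt are placeholders of my choosing; the point is that the model only ever sees what is inside the messages list:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The model has no memory between calls: every bit of context
# we want it to "know" has to travel inside this messages list.
history = [
    {"role": "system", "content": "You are a senior shell-scripting assistant."},
    {"role": "user", "content": "Write a one-liner that lists the 5 largest files here."},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=history,
)
answer = response.choices[0].message.content
print(answer)

# To continue the "conversation", we append the reply plus the next
# question and send the *whole* history again -- that is the context.
history.append({"role": "assistant", "content": answer})
history.append({"role": "user", "content": "Now make it follow symlinks."})
```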
A language model is a mathematical construct that provides the probability of a certain word (or token, to be more precise) given a certain context. This allows it to accomplish a variety of tasks, for example by selecting the most probable token (or one of the most probable tokens) at each step and building sentences that way. In the case of a large language model, this probability is computed by an immense deep learning network that has been trained on a gargantuan amount of data. There is no realistic way to say why a given probability was computed, but there is definitely structure to it.
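A toy illustration of that token-selection step (the distribution here is made up for the sake of the example, not anything a real model produced):

```python
import random

# Invented next-token distribution for the context
# "when I go to town I use my".
next_token_probs = {
    "car": 0.55,
    "bike": 0.25,
    "wits": 0.19,
    "spaceship": 0.01,
}

# Greedy decoding: always pick the single most probable token.
greedy = max(next_token_probs, key=next_token_probs.get)

# Sampling: pick among tokens in proportion to their probability,
# which is why the same context can yield different completions.
tokens, weights = zip(*next_token_probs.items())
sampled = random.choices(tokens, weights=weights, k=1)[0]

print(greedy, sampled)
```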
I think of these models as cars that have been engineered to follow a certain set of roads, at a certain speed, and with certain capabilities to navigate their environment; it is now our job to direct them where to go. I like to use the following example, and have a human answer it. Complete the following sentences:
– As a suburban dad, when I go to town I use my ...
– As a race car engineer, when I go to town I use my ...
– As a 12 year old, when I go to town I use my ...
– As an alien from outer space, when I go to town I use my ...
Think of prompting as a similar exercise, except that you often want the model to complete with much more useful information: either new facts, or longer sequences of words that save on typing and research time (say, scaffolding out a shell script, a SQL schema, etc.).
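Here is a sketch of that same persona exercise done programmatically, to show how the established context steers the completion. It assumes the same hypothetical client setup and placeholder model name as the earlier snippet:

```python
from openai import OpenAI

client = OpenAI()

question = "When I go to town I use my ..."

# Same request, steered by different established contexts, mirroring
# the suburban dad / race car engineer exercise above.
for persona in ["a suburban dad", "a race car engineer", "an alien from outer space"]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: placeholder model name
        messages=[
            {"role": "system", "content": f"You are {persona}. Complete the user's sentence."},
            {"role": "user", "content": question},
        ],
    )
    print(persona, "->", response.choices[0].message.content)
```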