Building a Piano Training App with o1 Models
I get a lot of value out of using LLMs for code. I have written before about my efforts to stay (somewhat) sharp at writing code without assistance, but ultimately it takes a certain degree of humility to recognize that frontier language models tend to write cleaner and prettier code than I do when given the same problem. Luckily[1], something that LLMs are less good at is navigating large code repos. Writing code is a comparatively small[2] part of "being a programmer", so I'm not shy about admitting the machines have eclipsed me at it[3].
It feels relatively important to have a clear picture of what these models are capable of accomplishing. After o1-preview and o1-mini were released, I thought it would be useful to build something simple with their help, in order to get a good sense of what they were capable of. I also happened to want to get back into sightreading sheet music recently, so I settled on a simple virtual piano app which trains you to read sheet music.
You can find the resulting application here. For now, I'm fairly comfortable calling something at this level of sophistication roughly the limit of what o1-mini and o1-preview can cook up by themselves, without much human help. The app runs about 1300 lines of code, and beyond this point the models start to struggle to understand all the plumbing between functions well enough to make effective changes. I think this is pretty good – if I have a clear vision of a roughly-1000-LOC-sized subproject, I can mentally expect the o1 models to be useful in helping me figure things out.
The Piano App
The app has the following simple features:
- Completely client-side JavaScript
- A virtual piano spanning 5 octaves, playable from the keyboard
- Ability to change the waveform which generates the sound (see the sketch after this list)
- Support for MIDI controllers (!)
- Draws sheet music via the VexFlow library
- Randomly generates notes, including sharps and flats, in an 8-note queue
- Supports both treble clef and bass clef
- A scales trainer for major / natural minor / harmonic minor / melodic minor from any root note
- An ear training mode which will play a note before prompting you to play it
- A metronome with customizable bpm
- Visual feedback for correct / incorrect responses
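For the curious, here is a minimal sketch of how the waveform switching can work with the Web Audio API (the natural choice for sound synthesis in a fully client-side app). The playNote helper and its parameters are my own illustration, not the app's actual code.

```javascript
// Minimal sketch of waveform-selectable playback via the Web Audio API.
// playNote() and its parameters are illustrative, not the app's real code.
const audioCtx = new (window.AudioContext || window.webkitAudioContext)();

function playNote(frequency, waveform = 'sine', duration = 0.5) {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();

  osc.type = waveform;             // 'sine' | 'square' | 'sawtooth' | 'triangle'
  osc.frequency.value = frequency; // in Hz, e.g. 440 for A4

  // Short fade-out so the note doesn't end with an audible click.
  gain.gain.setValueAtTime(0.3, audioCtx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.001, audioCtx.currentTime + duration);

  osc.connect(gain);
  gain.connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + duration);
}

// e.g. play middle C (~261.63 Hz) as a square wave:
playNote(261.63, 'square');
```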
In general I think this is a fairly substantial number of features – I didn't do very much "programming" in this particular project beyond prompting, but I did occasionally manually fix bugs when they were simple and easy to understand. The support for MIDI controllers was a nice surprise for me! I bought a cheap one for about $30 and I've been having a good time playing around with it.
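Browser MIDI support comes via the Web MIDI API, and listening for key presses looks roughly like the sketch below; the handleNoteOn helper is a placeholder of mine for whatever the app actually does with a pressed key.

```javascript
// Rough sketch of receiving MIDI key presses via the Web MIDI API.
// handleNoteOn() is a placeholder for whatever the app does with a pressed key.
function handleNoteOn(midiNote, velocity) {
  // MIDI note 69 is A4 = 440 Hz; each semitone is a factor of 2^(1/12).
  const freq = 440 * Math.pow(2, (midiNote - 69) / 12);
  console.log(`note ${midiNote} (${freq.toFixed(2)} Hz), velocity ${velocity}`);
}

navigator.requestMIDIAccess().then((midiAccess) => {
  for (const input of midiAccess.inputs.values()) {
    input.onmidimessage = (msg) => {
      const [status, note, velocity] = msg.data;
      const isNoteOn = (status & 0xf0) === 0x90 && velocity > 0;
      if (isNoteOn) handleNoteOn(note, velocity);
    };
  }
});
```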
Thoughts on o1 models
The smaller your code is, the easier it will generally be for LLMs to understand. This project took ~20-30 prompts in total across o1-mini and o1-preview (I tended to use the former for coding tasks and the latter for understanding tasks, since that split is what the benchmarks suggested was most effective). Overall, my impression of these models is that the extra thinking time helps them logically follow through longer files a lot better than other very good coding models (e.g. Sonnet 3.5), but that they were not much better on function-level changes.
For example, I ran into a bug at one point where VexFlow wanted the chords to be presented in sorted order, but sorting the notes would shuffle the randomly generated accidentals around (since those were not sorted along with the notes). This is not a terribly horrific bug, as far as bugs go, but it was embedded inside a 1300 LOC file. Both o1 models kept trying to essentially rewrite the entire app in order to solve it, proposing several wonky alternative solutions which all slightly did not work. I ended up handing just that function to Sonnet 3.5 after it became clear that the full-file task was too confusing.
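To make the bug concrete: the fix amounts to sorting each note together with its accidental as a single unit, instead of sorting the pitch array on its own. The sketch below is my reconstruction of the idea rather than the app's actual code; the "c/4"-style keys and the staffPosition helper are illustrative.

```javascript
// Buggy version (roughly): the pitches get sorted, but the parallel array of
// accidentals keeps its original order, so accidentals land on the wrong notes.
//   notes.sort(byPitch);   // accidentals[] no longer lines up with notes[]

// Sketch of the fix: sort each note together with its accidental as one unit.
const LETTERS = ['c', 'd', 'e', 'f', 'g', 'a', 'b'];

function staffPosition(key) {
  const [letter, octave] = key.split('/');   // e.g. "c/4" -> ["c", "4"]
  return Number(octave) * 7 + LETTERS.indexOf(letter);
}

function sortChord(notes, accidentals) {
  const paired = notes.map((note, i) => ({ note, accidental: accidentals[i] }));
  paired.sort((a, b) => staffPosition(a.note) - staffPosition(b.note));
  return {
    notes: paired.map((p) => p.note),             // e.g. ["c/4", "e/4", "g/4"]
    accidentals: paired.map((p) => p.accidental), // stays aligned with notes
  };
}
```

Keeping the pairs together is the whole fix; the rest of the chord-drawing plumbing can stay the same.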
I also echo some of the sentiments I've heard online about the o1 models wanting to make you go away with their responses. The o1 models tend to give extremely long, comprehensive outputs with a clear intent to end the conversation after the first turn whenever possible. This makes the pair programming experience a little unpleasant, since they will usually ask you to make several large changes at once in hopes of just fixing everything immediately. I tend to ask them specifically to only do one thing at a time, but I think this is a fair enough tradeoff for the more accurate longer-context understanding these models are capable of.
Overall I came away from this project with a pretty positive opinion of what the o1 models are capable of, and more importantly, with appreciation that they're good at a different class of things compared to the other existing top coding language models. I think there's a pretty clear ceiling still on what these models can do while in the driver's seat of a software project, but they are pretty impressive if you have some sort of clearly defined "thing" you need done.
Thoughts on LLM-driven Coding Projects
Using this as an "I'm not going to code this, I'm going to mostly let the LLM code it and just supervise" sort of project was admittedly just a tad unnerving for me. There were some spots where trying to make changes via natural language was too annoying and I just went in and fixed a bug myself, but for the most part this is a rare sort of project for me where I did it mostly just by asking o1-mini to do things. I could have tried to make it prettier, but I felt like adding as many features as possible was a better way to get a handle on what it was really capable of.
I've seen all sorts of these "7 year old child makes a webapp using Cursor" type posts over the last few years, but this was the first time I've genuinely thought that someone with roughly no technical knowledge could make a simple application I would actually use purely by interacting with a language model (or, perhaps, just that I'm now the skill level that passes for "7 year old child"). This is a departure from my previous mental model, which felt more like I was armed with a precocious intern. I don't necessarily think this is going to eat all software engineering jobs (I think there's a good chance it just makes it much easier to make software and as a result significantly increases the amount of software in the world), but I do feel like a lot about the practice of programming feels likely to change relatively soon.
Maybe I'm wrong and this thing will swallow my field and spit me back out into the job market. But, well, if that happens at least maybe I'll stand a chance at reading bass clef.
Footnotes:
[1] That is, luckily for me, since I can probably keep my job a bit longer.
[2] And, arguably, easy.
[3] This is a story that comes for us all someday, so I've been mentally preparing myself for quite some time now. Lee Sedol famously quit Go after losing the match to AlphaGo, saying "There is an entity that cannot be defeated". I am certain that language models will soon achieve skill at programming contests which I couldn't dream of achieving (if they haven't already), so I've been trying to optimize towards not having that crush my self-worth.