Four years ago, when I was about 12 years old, my brother played a song that I really liked. I wanted to add it to my playlist, but I had no clue what I was listening to. Then I heard about this new app called Shazam, which could listen to music and actually identify the song. I waited, and waited, and the next time I heard that song in the car, I pulled out my crusty iPad 4 and Shazamed!
Holy sh*t! This thing works?
It worked like a charm; I got the song I'd been wanting to listen to forever. It was unbelievable. At 12 years old, I thought there was a genius inside my iPad who knew every single song in the world. It was a ridiculous thought, but my 12-year-old brain had an even better idea: what if this guy could get into my brain and figure out what song I was thinking of? That way, I wouldn't even need to wait for the song to come on; I could just imagine the melody in my head and instantly be told what the song was. But even I knew that was a bit ridiculous… because even if they got into my brain, I could just be imagining a random song.
Four years later, I've learned how Shazam works, a bit more about how the brain works, and about the technology used to interface with it. So I said, "screw it, what could go wrong?" Turns out… a lot.
A Leap Of Faith
The scale of this project was unlike anything I'd ever done before. My previous projects were always super simple, like eye-blink detection or live neurofeedback. Decoding imagined music, and actually classifying it, was on a whole different level, especially since at the time there wasn't much documentation on the topic; it hadn't been done before.
It was clear in the research phase that I was going to have to figure out basically everything by myself and have a lot of meetings with my mentors. I had absolutely no clue what I was doing, so I decided the best thing I could do was move forward with a little bit of faith. The first thing I needed was a dataset that fit the context of my project: music imagery.
A Perfect* Dataset
After a little bit of research, I found a dataset online prepared by the Owen Lab, called OpenMIIR (Music Imagery Information Retrieval). The data was perfect* for my project except for one thing: it's old.
The last relevant commit on the repository was six years ago. It isn't maintained very well, and the pipeline they provided doesn't work anymore without some heavy reconstruction and converting the Python 2 code to Python 3.
Otherwise, the dataset gave me everything I needed. Here’s an overview of the experiment:
- 12 audio recordings of 8 songs: 4 songs recorded both with and without lyrics, plus 4 instrumental songs
- Each audio recording was about 10 seconds long on average.
- The experiment was split into 2 blocks. The first block had 3 conditions, in order: listening to music, imagining music after cue beats, and imagining music without any cue
- Participants wore a 64-channel EEG cap during the experiment.
Great! This dataset has EEG data of people listening to and imagining music. So far so good, it’s time to preprocess the data.
The dataset came with a Python pipeline built in. It's old, rusty, and there are definitely some cracks in some of the pipes. But with enough determination and elbow grease, you can fix virtually anything.
I had to convert the pipeline to Python 3, along with all the helper files it imports functions from, and cut out all the old code that doesn't work or no longer serves a purpose. Cleaning up the code was a nightmare; it was just one bug after another.
After the pipeline was clean, I was able to import all the metadata and beat events (with the help of my mentor Sean Wood) into the stimulus channel by taking each individual beat and adding it to the trial events (the dataset already had all the beat timings, don't worry). I performed some basic preprocessing: a 1–30 Hz band-pass filter, Independent Component Analysis (ICA), and downsampling to a quarter of the original sampling rate to speed up computation. Now that my data was ready, I needed to figure out how I was going to identify and classify each song in the EEG data.
Into the rabbit hole we go!
I started off by reading about a technique called stimulus reconstruction, which is mainly used to decode imagined speech envelopes. Around this time, I heard about a research paper, published August 5th, 2021, that was trying to achieve something similar to what I was doing.
The paper was called "Accurate Decoding of Imagined and Heard Melodies." Literally gold. I bunkered down for the next couple of days and did everything in my power to understand this research paper that's way above my pay grade. Essentially, the paper describes using a backward temporal response function (a fancy way to say stimulus reconstruction) to extract the beat onsets, meaning that they were able to find the timing of each individual beat in the EEG data.
They also used another method of extraction (the main focus of the paper) to decode the actual melody from the EEG data, which is significantly harder, and (to my knowledge) they don't go too deep into the methods they used to achieve this.
Regardless, the paper answered a lot of questions that I didn't have the resources to answer myself. I chose to go with stimulus reconstruction, but edited my code to better fit the context of the research paper. Stimulus reconstruction is essentially what it sounds like: you take EEG data recorded while the brain was responding to some stimulus, and you perform various calculations to reconstruct the stimulus that produced that response.
In terms of the actual code, I just used the ReceptiveField class from MNE, which can be used similarly to the mTRF MATLAB toolbox for stimulus reconstruction. If you want to learn more about the methods used in my project, read this paper, which describes the functions really well.
The accuracy of the stimulus reconstruction was then calculated by taking the Pearson correlation between the estimated note onsets and the actual note onsets.
The Bittersweet Ending
The first time I ran it, I got an accuracy of 3%… Alright, it's a start. Afterwards, I got an accuracy of 20%! Still terrible, so I tried really, really hard to optimize my code and make sure the model trained correctly for the stimulus reconstruction, but I could only get an accuracy of around 35%… Although I was disappointed that it barely worked in the end, I'm proud that I tried.
If Ph.Ds can barely get 80%+ accuracy on their models, I can live with 35% and say that I’m about 35/80 as smart as a Ph.D in this specific context…
I learned a lot from this project, and would love to take on another neurotech project later. Even though it was difficult and didn't succeed in the traditional sense, this project existed so I could learn and follow my curiosity. In the end, that's exactly what I did.
PS: The song that I shazamed was Otis by Kanye West ;)