Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.
Today, we will listen to a new AI from DeepMind that is capable of creating beautiful piano
music.
Since there are many other algorithms that do this, to put things into perspective, let's
talk about the two key factors that set this method apart from previously
existing techniques.
One, music is typically learned from high-level representations, such as the score or MIDI
data.
These are precise representations of what needs to be played, but they don't tell us how to
play it.
These small nuances are what make the music come alive, and this is exactly what is missing
from most synthesis techniques.
This new method is able to learn these structures and generates not MIDI signals but raw audio
waveforms.
And two, it is better at retaining stylistic consistency.
Most previous techniques create music that is consistent on a shorter time-scale, but
do not take into consideration what was played 30 seconds ago, and therefore they lack the
high-level structure that is the hallmark of quality songwriting.
However, this new method shows stylistic consistency over longer time periods.
Let's give it a quick listen and talk about the architecture of this learning algorithm
after that.
While we listen, I'll show you the composers it has learned from to produce this.
I have never heard any AI-generated music before with such articulation and the harmonies
are also absolutely amazing.
Truly stunning results.
It uses an architecture that goes by the name autoregressive discrete autoencoder.
This contains an encoder module that takes a raw audio waveform and compresses it down
into an internal representation, while the decoder part is responsible for reconstructing
the raw audio from this internal representation.
Both of them are neural networks.
The autoregressive part means that the algorithm looks at previous time steps of the
audio signal when producing new notes, and it is implemented in the decoder module.
Essentially, this is what gives the algorithm the longer-term memory to remember what it played
earlier.
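To make this a bit more concrete, here is a minimal sketch of such an autoregressive discrete autoencoder in PyTorch. This is not DeepMind's implementation: the ADASketch module, the layer sizes, the codebook size, and the small causal convolutional decoder standing in for a full WaveNet-style decoder are all assumptions made purely for illustration.

```python
# A minimal, hypothetical sketch of an autoregressive discrete autoencoder
# (not DeepMind's implementation): an encoder compresses raw audio into a
# short sequence of discrete codes, and a causal decoder reconstructs the
# waveform from those codes plus past audio samples. Training details such
# as straight-through gradients and codebook losses are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConv1d(nn.Module):
    """1-D convolution that only sees the current and past time steps."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)

    def forward(self, x):
        return self.conv(F.pad(x, (self.pad, 0)))  # pad on the left only


class ADASketch(nn.Module):
    def __init__(self, num_codes=256, code_dim=64):
        super().__init__()
        # Encoder: strided convolutions compress the waveform 4x in time.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, code_dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(code_dim, code_dim, kernel_size=4, stride=2, padding=1),
        )
        # Learned codebook: each encoded step is snapped to its nearest code,
        # giving the compact internal representation described above.
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.upsample = nn.Upsample(scale_factor=4, mode="nearest")
        # Decoder: causal convolutions make it autoregressive, so every output
        # sample depends only on the codes and on previously produced audio.
        self.decoder = nn.Sequential(
            CausalConv1d(code_dim + 1, code_dim, kernel_size=2),
            nn.ReLU(),
            CausalConv1d(code_dim, 1, kernel_size=2),
        )

    def quantize(self, z):
        # z: (batch, code_dim, time) -> nearest codebook entry per time step
        flat = z.permute(0, 2, 1)                            # (batch, time, code_dim)
        dists = (flat.pow(2).sum(-1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(-1))      # squared distances
        indices = dists.argmin(dim=-1)                       # discrete codes
        return self.codebook(indices).permute(0, 2, 1), indices

    def forward(self, audio):
        # audio: (batch, 1, samples), raw waveform in [-1, 1]
        quantized, indices = self.quantize(self.encoder(audio))
        cond = self.upsample(quantized)                      # codes at sample rate
        # Teacher forcing: shift the audio by one sample so the decoder only
        # ever conditions on the past, never on the sample it must predict.
        past = F.pad(audio, (1, 0))[:, :, :-1]
        reconstruction = self.decoder(torch.cat([cond, past], dim=1))
        return reconstruction, indices


# Usage: encode and reconstruct half a second of 16 kHz "audio".
model = ADASketch()
waveform = torch.randn(1, 1, 8000)           # stand-in for real piano audio
reconstruction, codes = model(waveform)
print(reconstruction.shape, codes.shape)     # (1, 1, 8000) and (1, 2000)
```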
Since you saw the dataset the algorithm learned from while the music was playing, I am
also really curious how we can exert artistic control over the output by changing that dataset.
Essentially, you can likely change what the student learns by changing the textbooks used
to teach them.
For now, let's marvel at one more sound sample.
This is already incredible, and I can only imagine what we will be able to do not ten
years from now, but just a year from now.
Thanks for watching and for your generous support, and I'll see you next time!