Neuro-Symbolic Music Models | EleutherAI | Page 2

tiny coral Jun 8, 2023, 3:48 PM

#

as per Oore et al 2018, iirc

hearty flicker Jun 8, 2023, 3:57 PM

#

Ah yeah that is my approach too

ruby frost Jun 9, 2023, 3:32 AM

#

is there a centralized doc where y'all are keeping track of progress somewhere? scattered messages can be hard to follow sometimes 😅

hearty flicker Jun 9, 2023, 10:54 AM

#

ruby frost is there a centralized doc where y'all are keeping track of progress somewhere? ...

Me and @timber talon are pretty busy with other obligations atm, but I've been slowly pushing some code to the repo - https://github.com/EleutherAI/music-transformer

#

It would probably be a good idea for me to add the progress write-ups to the discussion tab on there.

#

I've nearly finished building tokenizer/training infrastructure. I'm trying to build it up properly, it's important to me that people can easily fine-tune on their own datasets if they wish.

#

As for the direction of the paper, I'm interested in doing a comprehensive review outlining how different factors impact the quality of musical transformer models.

#

For example: composition of the pre-training data, model size, format of the tokenizer.

#

A side effect would be creating a SOTA open source pre-trained general purpose musical transformer model (MuseNet+ but open and apache 2.0 licensed).

#

I would also like to include some research on ways of aligning the model to produce piano music (I have a very good fine-tuning dataset here)

#

I'll update you @ruby frost when things are moving along properly! Like I said, I'm currently doing this in my spare time. The majority of the code should be done by mid next week.

timber talon Jun 9, 2023, 1:07 PM

#

this is a great overview @hearty flicker 🙂
i'd just add to that and say that I've been pretty wrapped up in the EMNLP deadline, but once it's passed, I'll be focusing on evaluation, specifically whether there are ways of analyzing the structure of the output, as in: https://arxiv.org/pdf/2210.08444.pdf

hearty flicker Jun 9, 2023, 1:54 PM

#

Btw if anyone is interested in genre fine-tuning, I have about 600 hours of high quality midi paired with genre tags

#

By scraping this website - https://midifiles.co.uk/

#

Unfortunately they are all demos, only 1 minute long

sand nymph Jun 9, 2023, 4:44 PM

#

https://arxiv.org/abs/2306.05284

arXiv.org

Simple and Controllable Music Generation

We tackle the task of conditional music generation. We introduce MusicGen, a
single Language Model (LM) that operates over several streams of compressed
discrete music representation, i.e., tokens. Unlike prior work, MusicGen is
comprised of a single-stage transformer LM together with efficient token
interleaving patterns, which eliminates the n...

raven kettle Jun 9, 2023, 4:59 PM

#

The names for all these absolutely suck

#

MusicGen, MusicLM, AudioLM...

#

But the work itself is fantastic

#

And this from Facebook is MIT, code, models...

sand nymph Jun 9, 2023, 5:11 PM

#

NC license for the models I think

raven kettle Jun 9, 2023, 5:20 PM

#

Ah yeah MIT if only for the code

hearty flicker Jun 9, 2023, 6:00 PM

#

sand nymph https://arxiv.org/abs/2306.05284

I was reading this earlier, seems like a response to the audioLM/musicLM models by Google

#

Cool stuff

timber talon Jun 13, 2023, 7:11 PM

#

has anyone done audio diffusion models?

raven kettle Jun 13, 2023, 7:17 PM

#

In this channel, or in general

#

Because there's a lot of audio diffusion papers

timber talon Jun 13, 2023, 7:20 PM

#

in general... wondering why the current trend is for autoregressive transformers

#

when I feel like musical phrases lend themselves well to diffusion setups

raven kettle Jun 13, 2023, 7:35 PM

#

Current sota compression methods seem to lend themselves to discrete tokens

hearty flicker Jun 13, 2023, 8:44 PM

#

Tokenizer is working on my end now

#

Just need to fiddle with the API a bit to make it easier to use. Should be very easy for people to build their own datasets (with custom hooks for pre-processing, filtering, etc.) & tokenizers.

#

Will push everything and add a HOWTO when the API is finalized

hearty flicker Jun 13, 2023, 8:49 PM

#

timber talon has anyone done audio diffusion models?

People have done this

timber talon Jun 13, 2023, 8:49 PM

#

nice 🙂

hearty flicker Jun 13, 2023, 8:50 PM

#

Interestingly people have even trained image diffusion models on spectrograms to create audio

#

According to my supervisor the results are surprisingly good.

#

One of the things I'm aiming to start working on soon are non-auto-regressive symbolic (i.e. tokenized) music models

#

Could make a better foundation for compositional tools

timber talon Jun 13, 2023, 9:33 PM

#

That makes sense. I’m curious how they handle long sequences

#

Is the length of the output constrained by the length of the diffusion window?

raven kettle Jun 13, 2023, 9:43 PM

#

timber talon That makes sense. I’m curious how they handle long sequences

They don't

hearty flicker Jun 14, 2023, 8:55 PM

#

Tokenizer is up (https://github.com/EleutherAI/music-transformer/pull/5). Should have the model / train loop up by the weekend.

#

The logic was a little tedious, it could probably do with a refactor

#

The bright side is that it supports multi-track, pedal, and drums. It is also customisable with a config.json file (for quantization).

#

I will add a 'howto' to the README after the training logic is done.

raven kettle Jun 15, 2023, 5:54 PM

#

https://twitter.com/fjord41/status/1564347901031043072

Curtis Hawthorne (@fjord41)

Diffusion for music synthesis!

We trained a “notes2audio” pipeline to synthesize audio from multi-instrument MIDI notes.

Listen 🔊: https://t.co/keM3PgK0bC
Play 🎼: https://t.co/KeuRwZfJAh
Code 👩‍💻: https://t.co/mczOUi8r6b
Read 📝 : https://t.co/hSFZePbLrc

1/

#

This might have been mentioned before - it's relatively old anyway

hearty flicker Jun 15, 2023, 6:24 PM

#

Btw guys here is an example of what Arabesque (Debussy) looks like with the current version of the lazy tokenizer with quantization. It has been truncated to 1000 tokens.

#

📎 message.txt

#

And this is what the midi sounds like after it has been converted back

#

For now I'm going to implement the model in PyTorch/Lightning for initial experiments. At some point it would be good to rewrite it using DeepSpeed.

quasi wren Jun 15, 2023, 7:03 PM

#

hearty flicker For now I'm going to implement the model in PyTorch/Lightning for initial experi...

Cool

#

Could you add my 60k midi dataset to the datasets

hearty flicker Jun 16, 2023, 1:47 PM

#

Beethoven's entire sonata number 8 is only 25k tokens lol

quasi wren Jun 16, 2023, 2:10 PM

#

hearty flicker Beethoven's entire sonata number 8 is only 25k tokens lol

Lol

hearty flicker Jun 16, 2023, 8:39 PM

#

On sunday I'm going to post a writeup on the Github repo, discussing where things currently stand. The implementation is basically completed at this point, time to start experimenting.

#

I've just seen this fresh pre-preprint out of Stanford https://twitter.com/jwthickstun/status/1669726326956371971

John Thickstun (@jwthickstun)

We’re releasing the Anticipatory Music Transformer: a controllable generative model for symbolic music (like MIDI). Read about the model on the CRFM blog:

https://t.co/IBN7K8GTB1

🧵👇

#

Very similar to the non-auto-regressive stuff I was working on a few months ago.

quasi wren Jun 16, 2023, 8:46 PM

#

hearty flicker Very similar to the non-auto-regressive stuff I was working on a few months ago.

cool

timber talon Jun 17, 2023, 2:53 PM

#

sounds great man

#

wow, the samples in this Stanford demo are really impressive

#

what's the difference between "anticipation" and "masked infilling", though? is this not just "BERT for long sequences of masks"?

fallow birch Jun 18, 2023, 4:18 PM

#

not bad (melody is new after 5 seconds, accompaniment is the original song)

somewhat in need of an actual interface

hearty flicker Jun 21, 2023, 6:03 PM

#

Hey guys, I've been really busy. It's probably gonna be a while before I've got any spare time. I'll update everyone, perhaps we can start things up again in a few weeks : )

quasi wren Jun 21, 2023, 6:21 PM

#

hearty flicker Hey guys, I've been really busy. It's probably gonna be a while before I've got ...

K

quasi wren Jun 21, 2023, 6:42 PM

#

I would love to get this started back up

timber talon Jun 26, 2023, 4:15 PM

#

no problem man!! I'm a lot freer, too, after EMNLP. maybe we can meet again to re-kick stuff off and talk about what a paper would look like? especially in light of some of these recent publications that we've chatted about in the past few weeks

hearty flicker Jun 28, 2023, 8:55 AM

#

Hey guys. Things are still slowly plugging along! I'm currently writing tests in my spare time. I will make a post/roadmap on the GitHub page (I'll also link that here) when the repo is fully functional.

hearty flicker Jul 24, 2023, 12:19 PM

#

Hey guys, as some of you know, I've been really busy recently (at my internship/job). Having said that I'm happy to announce that the code is pretty much done, and the project will be moving into the next phase now! I ran an initial test last night, training on ~200 bach midi files ($1 of compute cost). Here are some of the samples. I'm just happy that everything (code-wise) is working correctly.

In total I have gathered a significant amount of data (200k+ files). I believe that most of these are high quality, however to be safe I have also included support for data cleaning in the repo.

Later this week I'm going to post the roadmap I have in mind for the future of this project. One area that I haven't touched is evals, perhaps @timber talon has some ideas on this topic.

@here

sand nymph Jul 24, 2023, 12:44 PM

#

@trim solstice

warm tangle Jul 24, 2023, 2:42 PM

#

This is fantastic! How can I contribute to the project from here? It’s been quite a busy summer for me too!

timber talon Jul 25, 2023, 5:10 AM

#

Yah I have some ideas on evaluation. Does anyone in this channel have a background in graphical models?

#

Would love to chat with you if so

hearty flicker Jul 25, 2023, 2:55 PM

#

warm tangle This is fantastic! How can I contribute to the project from here? It’s been quit...

Best bet would be to wait for the roadmap, I'm working on it right now! I just merged the final pr into repo so theoretically you could play around training a model of your own if you like. The API (run.py) should be pretty easy to understand, however I'm also going to write a HOWTO.md sometime this week.

#

As a sidenote, would you mind changing the repo name to 'aria' if you have time @sand nymph? I don't have the permission to do it myself.

sand nymph Jul 25, 2023, 4:07 PM

#

hearty flicker As a sidenote, would you mind changing the repo name to 'aria' if you have time ...

Done https://github.com/EleutherAI/aria

ruby frost Jul 25, 2023, 4:13 PM

#

timber talon Yah I have some ideas on evaluation. Does anyone in this channel have a backgrou...

I took a course on it once, can try to help but no guarantees 😅

timber talon Jul 26, 2023, 3:19 PM

#

cool! I'm playing around with ideas on how to model musical structure. The idea was to do latent criticism on the structure of the piece, a la: https://arxiv.org/pdf/2210.08444.pdf and then evaluate musical generations by whether they have structures that look like human-generated structures. I've made graphical models in the past, but they do require planning/etc. while designing them, since they're non-trivial to implement once the design is established

#

I was hoping to be able to toss around some ideas if you wanna chat about it some more

hearty flicker Jul 26, 2023, 3:59 PM

#

Currently running a test using the GiantMidi dataset - https://github.com/bytedance/GiantMIDI-Piano. The tokenized data for this one is over 2gb so it should be a lot better. Will post a follow up later if all goes well. Still only training a ~60m model with 512 max_seq for testing purposes.

tiny coral Jul 26, 2023, 4:57 PM

#

FYI blinkdl has trained some larger RWKV music models using his compute and my tokenizer https://huggingface.co/BlinkDL/rwkv-4-music

hearty flicker Jul 26, 2023, 4:58 PM

#

tiny coral FYI blinkdl has trained some larger RWKV music models using his compute and my t...

Very cool!

ruby frost Jul 28, 2023, 6:19 PM

#

timber talon I was hoping to be able to toss around some ideas if you wanna chat about it som...

just checked this paper out, down to brainstorm some more

timber talon Jul 28, 2023, 8:33 PM

#

ok cool. maybe i'll message you privately? unless others want to brainstorm as well

raven kettle Jul 28, 2023, 8:38 PM

#

timber talon ok cool. maybe i'll message you privately? unless others want to brainstorm as w...

If you don't have a reason you'd like to move it to DM, might as well keep it public.
People might have interesting thoughts

hearty flicker Jul 30, 2023, 12:45 PM

#

First 'real' experiment has finished training: ~60m model with 512 sequence length. The following samples are not cherry picked, have a listen!

#

I think this is a really good sign that things are working correctly

#

Roadmap/HOWTO should be posted today if I can finish it before the evening

#

@here

hearty flicker Jul 30, 2023, 1:45 PM

#

There is still a small bug (some edge case to do with the sustain pedal) somewhere in the decoding, however this one is still quite good

#

Btw guys if you like you can follow me on twitter - https://twitter.com/loubbrad

ruby frost Jul 30, 2023, 2:11 PM

#

alberti bass is all you need haha, but seriously sounds awesome!

hearty flicker Jul 30, 2023, 2:53 PM

#

I'm glad with how the experiment turned out, and this is just a test using a subset of the data. I think the final product will be pretty cool.

timber talon Aug 1, 2023, 5:15 PM

#

hey man, these are really cool!!

#

great work, super stoked man, and super impressed you're able to do this while also juggling what sounds like a super intense internship!

timber talon Aug 1, 2023, 5:16 PM

#

hearty flicker Btw guys if you like you can follow me on twitter - https://twitter.com/loubbrad

followed!!

hearty flicker Aug 1, 2023, 5:18 PM

#

If anyone wants to play around, dm me and I can give dl link to the model checkpoint. It's very easy to generate some samples from the cli

sand nymph Aug 1, 2023, 5:19 PM

#

hearty flicker If anyone wants to play around, dm me and I can give dl link to the model checkp...

Is there a reason to not host it on HF?

hearty flicker Aug 1, 2023, 5:20 PM

#

Well this is just a test/experiment, far from the final product

#

I'm writing the readme/howto/roadmap atm

#

I will put it on HF eventually

timber talon Aug 1, 2023, 5:21 PM

#

is there a way we can help you write a model-card? I know the training details, architecture, etc. have been discussed in this thread but i kinda lost them. Or is that already intuitive in the repo?

hearty flicker Aug 1, 2023, 5:22 PM

#

The thing is, the architecture is probably gonna change quite soon.

timber talon Aug 1, 2023, 5:22 PM

#

i see!

hearty flicker Aug 1, 2023, 5:22 PM

#

Like I said, this was just an experiment I ran to make sure it was not just producing noise.

#

The experiment just worked out well haha

timber talon Aug 1, 2023, 5:23 PM

#

ah great 🙂

hearty flicker Aug 1, 2023, 5:24 PM

#

I am writing the roadmap now, that should explain my ideas for how to continue

sand nymph Aug 1, 2023, 5:24 PM

#

@subtle lance wears a lot of hats, but one of them is helping people with dataset and model documentation FYI 🙂

timber talon Aug 1, 2023, 5:24 PM

#

that sounds really great, really appreciate the organizational work you're doing here! maybe we can have another sync meeting?

hearty flicker Aug 1, 2023, 5:24 PM

#

There are various things (such as data augmentation) which are half implemented

#

Good idea, I'll try to set that up after I've finished up the roadmap

#

The models also scale really well (in a musicality sense) with transformer context length. The experiment I ran was very limited in that respect.

#

With 2048 cl it should be able to produce 2-3min pieces

rustic dirge Aug 3, 2023, 4:56 AM

#

hearty flicker First 'real' experiment has finished training: ~60m model with 512 sequence leng...

You can try rwkv MIDI models too 🙂 https://huggingface.co/BlinkDL/rwkv-4-music
Can generate MIDI for your prompts so I can test them

hearty flicker Aug 6, 2023, 3:17 PM

#

@everyone

Hey guys, just a small update here. I've added a roadmap (https://github.com/EleutherAI/aria/blob/main/ROADMAP.md) and how-to (https://github.com/EleutherAI/aria/blob/main/HOWTO.md) to the repository. This roadmap is essentially a list of things I will be working on over the next month. I don't expect anyone else to get involved, but it's nice to have a document to refer people to now! The howto is still a work in progress; as it stands it should be enough to fully grok the repo if you have typical Python/PyTorch (and some basic MIDI) knowledge.

I'm going to try to blitz through as many of the issues as I can over the next week, as I will have more free time. If you have any questions, feel free to ask me : )

sharp quiver Aug 11, 2023, 5:16 PM

#

might be interesting?
https://arxiv.org/abs/2301.11975
https://github.com/ugtqphgirx/bpe-symbolic-music

arXiv.org

Byte Pair Encoding for Symbolic Music

The symbolic music modality is nowadays mostly represented as discrete and
used with sequential models such as Transformers, for deep learning tasks.
Recent research put efforts on the tokenization, i.e. the conversion of data
into sequences of integers intelligible to such models. This can be achieved by
many ways as music can be composed of si...

hearty flicker Aug 11, 2023, 9:21 PM

#

I read this paper a few months ago. I think BPE would be nice to add way down the line (as it is essentially automatic/free) however it's not a priority of mine at all.

#

My goal for Aria is to make the best possible pre-trained symbolic music model with the currently available data and known LLM improvements. I'm currently meticulously min-maxing every aspect that I can think of. I suppose BPE fits that criteria but I don't think it will make enough of a difference to warrant including at this stage. In my experience the biggest factors are data quality and the tokenizer itself (inc the kinds of data augmentation that it enables).

#

I'm currently working on the functionality for collecting metadata and adding relevant bits as prefix tokens. MuseNet does this, it's actually been surprisingly straightforward to implement.

hearty flicker Aug 12, 2023, 11:38 AM

#

Hey guys. I'm aiming to do a full training run (full context length, full dataset) during the first week of September if the cluster is available. Will be exciting to see how it turns out. After that, my rough idea is to start work on a paper about scaling pre-trained symbolic music models. This should align with when I'm back at school which will be nice!

timber talon Aug 13, 2023, 3:30 PM

#

What can we do to help?

#

@ruby frost and i met to discuss 2 novel evaluation approaches, but they will be papers in-and-of-themselves and thus might not be ready for an arxiv release of this paper if you're already close to being done with experiments

hearty flicker Aug 13, 2023, 4:41 PM

#

I've got some thoughts on the future that I need to go over with Stella first. Will update soon.

#

Should be fairly easy to get you guys involved with what I have planned. Pretty exciting stuff imo!

timber talon Aug 15, 2023, 6:20 PM

#

ok sounds good man! just let me know. Starting to become a lot more flexible as the semester starts so whatever you need

hearty flicker Aug 18, 2023, 8:21 PM

#

Just a quick update guys : )

#

I'm currently working on squeezing all the perf I can out of the model. I'm also rewriting the pretraining loop using hf-accelerate to implement proper checkpointing/logging/experiment tracking.

#

As for my ideas for the future direction, I really want to get a sense of how well nlp-style alignment (pretrain -> finetune) techniques can be used with musical transformer models.

#

I have two basic research directions in mind:

#

(1) How to scale symbolic transformer models. Considerations such as architectural details, allocation of the parameter budget, dataset weightings, data augmentation, and tokenizer differences, etc. There aren't any papers that delve deeply into these aspects that I have seen.

#

(2) How to align pretrained models. A good example for the sort of question I'm interested in: If I finetune Aria (pretrained on ~350k midi files) on this (https://bushgrafts.com/midi/) amazing (but small) jazz dataset, how much better is the resulting model than one that wasn't pretrained? How about other alignment techniques?

#

From my experience of fine-tuning music models, (2) makes a massive difference. In some (unpublished) work I was doing a few months ago, I observed that pretraining helped a massive amount when training a symbolic music model on J.S. Bach's Fugues (from the WTC Book 1&2).

#

However I haven't seen any papers on it.

timber talon Aug 18, 2023, 8:35 PM

#

Yah I’m really into this. We can also do a kinda “linguistic blood bank” kind of analysis for which styles help the most.. lemme find the paper

#

https://aclanthology.org/2022.naacl-main.361/

ACL Anthology

A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Map...

Dan Malkin, Tomasz Limisiewicz, Gabriel Stanovsky. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

#

Ex. Pretraining in classical might help when fine tuning on jazz, but it might not help for fine tuning on pop, for instance

hearty flicker Aug 18, 2023, 8:38 PM

#

timber talon Ex. Pretraining in classical might help when fine tuning on jazz, but it might n...

This is exactly the sort of stuff I'm interested in

#

I think it's a cool research idea because it links very well with music stuff, nlp stuff, and transformer stuff!

timber talon Aug 18, 2023, 8:39 PM

#

That linguistic blood bank paper won a naacl best paper award!

hearty flicker Aug 18, 2023, 8:40 PM

#

Btw I reached out to the website Classical Archives and they agreed to donate their library of over 15k classical MIDI files to our pretraining dataset.

#

However this data is sensitive so it won't be available for uses other than pretraining.

hearty flicker Aug 18, 2023, 8:43 PM

#

timber talon That linguistic blood bank paper won a naacl best paper award!

It might be good to investigate the effect of including pop songs in the pretraining dataset.

#

I have a feeling that although it's pretty far away from classical/jazz, it will still increase the quality of the aligned models.

#

I really want to do a proper training run soon, it would be nice to have a proper pretrained model to start experimenting on.

carmine musk Aug 21, 2023, 5:17 AM

#

Hi @hearty flicker, I would like to contribute to the project. Would you please tell me more about it?

sand nymph Sep 20, 2023, 1:32 PM

#

@hearty flicker So the next step, now that I've gotten you access to the HPC cluster, is to test the efficiency of the code at the 16 GPU (2 node) scale right?

hearty flicker Sep 20, 2023, 1:33 PM

#

One node will be fine. I don't want to complicate things more than I need to right now!

#

I'm still working on making sure the training is as efficient as possible.

sand nymph Sep 20, 2023, 1:33 PM

#

Ah I forgot if you had done the 1 GPU -> 1 node jump yet

hearty flicker Sep 20, 2023, 1:34 PM

#

I've done 1x8 before for my training

#

I'm using accelerate for training so configuring for multiple nodes shouldn't be hard, however I haven't looked into it yet

sand nymph Sep 20, 2023, 1:35 PM

#

The goal should be 120-140 TFLOP/s/A100, though if you're using Flash Attention you can add 30-ish to those numbers. If you're getting below 120 and your code doesn't totally suck you're probably doing something wrong in the configuration
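
A rough way to check a run against that target, as a sketch only: it relies on the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token (forward plus backward, ignoring the attention term that grows with sequence length), so it gives an estimate rather than a profiler measurement. The function name and the example throughput numbers are made up for illustration.

```python
def achieved_tflops_per_gpu(n_params, tokens_per_sec, n_gpus):
    """Estimate per-GPU training throughput in TFLOP/s.

    Uses the ~6 * N FLOPs-per-token approximation for dense transformers;
    it ignores the attention cost that scales with sequence length.
    """
    flops_per_sec = 6 * n_params * tokens_per_sec
    return flops_per_sec / n_gpus / 1e12


# Hypothetical numbers: a 1B-parameter model at 160k tokens/s on one 8-GPU node.
tflops = achieved_tflops_per_gpu(n_params=1_000_000_000,
                                 tokens_per_sec=160_000,
                                 n_gpus=8)
print(f"{tflops:.1f} TFLOP/s per GPU")
```

Measuring `tokens_per_sec` over a few hundred steps after warm-up gives a steadier figure than timing a single step.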

hearty flicker Sep 20, 2023, 1:36 PM

#

The training code will be pushed to the main repo soon. I'm pretty sure I've got everything configured efficiently however I will need to do some flop tests as you say. I am using flash attention btw.

sand nymph Sep 20, 2023, 1:42 PM

#

Awesome

#

So the next update we'll be looking for is efficiency confirmation and an estimate for the amount of compute required to train the models you'd like to train

timber talon Sep 21, 2023, 6:39 AM

#

very exciting!

hearty flicker Sep 25, 2023, 12:18 PM

#

Hey, @everyone! I wanted to provide a small update: I'm now officially back at university, so I'll be working on this pretty much full time. Consequently, this channel should become a lot more active! Here's a brief overview of what I'll be working on in the short-term and long-term.

The v1 version of Aria is essentially ready to be trained. However, there are a few things that need to be looked at. As @sand nymph mentioned, I need to ensure that I'm utilizing the compute resources efficiently. I will conduct some tests this week. Additionally, I need to explore modern ways of extending context windows, as this is important for music generation. I'm not very familiar with this area, but I've heard that this paper describes the current best way of doing it:

https://arxiv.org/abs/2108.12409

If anyone has any thoughts on this area, please let me know. Once Aria has finished pre-training, I have several areas that I'll be looking into in the short term:

1. Fine-tuning Aria on some small high-quality datasets (think jazz/classical) to create high-quality (possibly SOTA) generative models for symbolic music.
2. @timber talon, among others, is interested in evaluations for AI-generated symbolic music. I'll be looking into this by fine-tuning Aria on some well-known AI-music datasets that have additional meta information, which I will interpret as additional meta-tokens.
3. My supervisor is currently interested in and working on a paper on 'bum' (i.e., incorrect) note detection in symbolic music. I have an idea for a different approach that would utilize Aria. The general idea would be to run Aria over real music and flag notes when Aria assigns them very little probability. Due to the tokenization scheme I'm using, this should be easy to implement.
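
The flagging idea in the last item could be sketched roughly like this. It assumes the per-token probabilities have already been read off the model's output for the notes that actually occur; the function name, the threshold, and the example probabilities are placeholders, not anything from the Aria repo.

```python
def flag_unlikely_notes(token_probs, threshold=0.01):
    """Return the positions of tokens the model found very unlikely.

    token_probs[i] is the probability the model assigned to the token
    that actually occurs at position i, given everything before it.
    """
    return [i for i, p in enumerate(token_probs) if p < threshold]


# Made-up probabilities for six consecutive note tokens.
probs = [0.42, 0.31, 0.004, 0.55, 0.009, 0.27]
print(flag_unlikely_notes(probs))  # → [2, 4]
```

The threshold here is arbitrary and would need tuning against labelled examples of wrong notes.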

#

In the longer term, I'm really interested in scaling up Aria once again. Recently, audio to MIDI conversion (using deep learning) has become very good. For an example of this, you can see the GiantMIDI dataset, which itself will likely be largely responsible for the quality of Aria's generative output. I want to continue in this direction, building an even larger (and more diverse) dataset of MIDI that we can add to Aria's pre-training dataset in a future version. From the experiments I've done, I've observed that generative output just keeps getting better and more convincing the larger and higher quality the dataset. Since it is theoretically possible to systematically download (using tools like spotifydl) and process a large proportion of recorded piano works, I think this is a very promising direction for the long-term continuation of this project.

If anyone has any questions or ideas, feel free to message me!

sand nymph Sep 25, 2023, 2:17 PM

#

hearty flicker Hey, @everyone! I wanted to provide a small update: I'm now officially back at u...

We actually collaborated on a method for context length extension that has some notable advantages over ALiBi cc: @deft hedge @robust juniper @quasi steppe https://arxiv.org/abs/2309.00071

arXiv.org

YaRN: Efficient Context Window Extension of Large Language Models

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, r...

hearty flicker Sep 25, 2023, 2:19 PM

#

Do you have a paper or preprint?

sand nymph Sep 25, 2023, 2:24 PM

#

Added to my reply; I was off looking for it

hearty flicker Sep 25, 2023, 2:32 PM

#

Thank you Stella, I will give this a read after ALiBi

quasi steppe Sep 25, 2023, 3:50 PM

#

Let us know if you have questions.
I have also been curious, do you guys have the full freedom to choose transformer architecture and directly train on some huge music dataset, or have to finetune on some large models available so far?

timber talon Sep 25, 2023, 4:11 PM

#

Hey loubb, welcome back to uni — and your progress here sounds really exciting!

As for context window extension, here’s another technique worth looking at as it also enforces some locality bias: https://arxiv.org/abs/2209.10655

On point #3, that sounds like a fascinating direction. These kinds of analyses can be a bit difficult because of the noisiness inherent in these kinds of measures at the word/note level. See Figures 4 and 16 in this paper that many of us were involved in: https://arxiv.org/pdf/2306.17806.pdf for a sense of what word-by-word analyses can look like. As you can see, they’re very noisy and also not uniform throughout (way more variation in the beginning). The context of these experiments is a little different from what your advisor is going for, because we were comparing two model variations with these metrics, but perplexity curves tend to look similar. I think maybe because some words are “correct” but still surprising… taking the sentence in a different direction or introducing a prepositional phrase or something. Not to say it’s impossible, but some way of separating a true “wrong” note from a merely unlikely “correct” note (maybe self-supervised perturbed data?) could be useful here.

There’s another approach for #3 that I think would be really, really fascinating to compare to, which detects jargon terms:

https://blog.allenai.org/words-as-gatekeepers-measuring-discipline-specific-terms-and-meanings-in-scholarly-publications-718dc56d08a5?gi=024bad4293a6

This approach wouldn’t be for detecting “wrong” notes, but maybe a robust way to identify correct notes that are unlikely but specific to a genre — say, blues notes in jazz.

Curious to chat more about your other directions and take on some tasks if you need help with them!

#

I know we had a whole discussion about tokenization schemes… this might be a dumb question but can you humor me.. the compound word tokenization approach to me seems like it comes closest to actual sheet music representation: https://arxiv.org/pdf/2101.02402v1.pdf

Is there a reason why people don’t typically use this one?

hearty flicker Sep 26, 2023, 9:23 AM

#

quasi steppe Let us know if you have questions. I have also been curious, do you guys have th...

We are doing the pretraining from scratch so have full freedom - I'm trying to incorporate most of the modern transformer advancements. I'll let you know if I have any questions after I've read the paper. I'm a big fan of the original RoFormer paper so this looks really interesting.

hearty flicker Sep 26, 2023, 9:31 AM

#

timber talon Hey loubb, welcome back to uni — and your progress here sounds really exciting! ...

Alex, these references are brilliant! Thanks for sharing. I've had ideas along the same lines as the compound word tokenization approach, however I somehow missed this paper. I've only been working in this area for ~6 months so haven't had time to read everything yet haha.

#

My rough idea was to add the embeddings in the input (embedding) layer for the different quantities that need to be accounted for (instrument, pitch, velocity, duration)

#

Which would be fairly straightforward to implement

#

However I was unsure about how I could do the decoding (LM head) to get a probability distribution to compute the loss against. I foresaw some complications due to the fact that I would still need 'wait' tokens as well as these compounded note tokens. The obvious solution would be to use different LM-heads for the different quantities, and compute the total loss as the sum of the CEL derived from each LM-head, however it wasn't obvious to me how exactly to implement this in a way that would work well with the wait tokens.

#

The approach I took with Aria was to instead just compound as much as I can without explicitly adding representations together. Instrument, pitch and velocity are compounded into a single token. When quantizing the possible values for the velocity into, say, 10 different volume levels, the resulting vocab size is about 15,000. This feels reasonable to me.

#

I did have to make the sacrifice of separating out duration tokens though. Excluding meta and special tokens, Aria therefore has three types of tokens: combined tokens (instrument, pitch, velocity), duration tokens (in ms), and wait tokens (in ms).

#

A typical sequence might look something like: '(piano, pitch=62, velocity=70)', '(duration, 100ms)', '(wait, 100ms)'

#

The advantage of this approach is that I can do a very straightforward decoding stage (linear layer: d_model -> vocab_size, followed by a softmax, followed by CEL loss). Having said that, it does require more tokens than it 'should' in theory.
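To make the three token types concrete, here is a minimal sketch of how a list of notes could be flattened into combined, duration, and wait tokens. The `Note` class, `quantize_velocity`, and `tokenize` are hypothetical names made up for illustration, and the equal-width velocity buckets are an assumption; none of this is taken from the Aria codebase.

```python
from dataclasses import dataclass


@dataclass
class Note:
    instrument: str
    pitch: int        # MIDI pitch, 0-127
    velocity: int     # MIDI velocity, 0-127
    start_ms: int
    duration_ms: int


def quantize_velocity(velocity, levels=10):
    """Snap a 0-127 velocity to the centre of one of `levels` equal-width buckets."""
    step = 128 / levels
    bucket = min(int(velocity / step), levels - 1)
    return int(bucket * step + step / 2)


def tokenize(notes):
    """Flatten notes into (instrument, pitch, velocity), duration and wait tokens."""
    tokens = []
    clock_ms = 0
    for note in sorted(notes, key=lambda n: n.start_ms):
        gap = note.start_ms - clock_ms
        if gap > 0:
            # Wait tokens track the flow of time between note onsets.
            tokens.append(("wait", gap))
            clock_ms = note.start_ms
        tokens.append((note.instrument, note.pitch, quantize_velocity(note.velocity)))
        tokens.append(("duration", note.duration_ms))
    return tokens
```

Every token here comes from one flat vocabulary, which is why a single linear layer plus softmax is enough for decoding. The cost is that each note takes two tokens, plus a wait token whenever time advances.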

#

I'm really interested in how they solve the problem that I mentioned, I'll give this paper a read!

hearty flicker Sep 26, 2023, 10:12 AM

#

hearty flicker However I was unsure about how I could do the decoding (LM head) to get a probab...

To add to this, I was also skeptical about classifying the different aspects of a note as being 'independent' via the multiple lm-head approach. For instance if I had a quickly ascending c-major scale, then it's natural that a model might suggest either the next note in the scale (let's say it is a b) for 50ms, or a low C for 500ms as a harmonic.

#

By splitting the predictions apart we could imagine the separate lm-heads suggesting something like:

duration lm-head: 50% chance that duration is 50ms, 50% chance it is 500ms
pitch lm-head: 50% chance that pitch is a b, 50% chance that pitch is a low c

However we surely don't want the model to suggest the compound token (pitch=b, duration=500ms) in this context, as that would sound bad over a c-major scale.
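The worry above can be checked with a few lines of arithmetic. This is a minimal sketch using the exact numbers from the example; the dictionary layout and the `marginal` helper are made up for illustration. It compares the joint distribution we want with what two heads sampled independently would produce.

```python
from itertools import product

# The distribution we would like the model to express: two plausible continuations.
joint = {("b", 50): 0.5, ("low_c", 500): 0.5}


def marginal(dist, axis):
    """Sum a joint distribution over everything except one component."""
    out = {}
    for key, p in dist.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


pitch_head = marginal(joint, 0)      # what a pitch-only lm-head would learn
duration_head = marginal(joint, 1)   # what a duration-only lm-head would learn

# Sampling each head independently corresponds to the product of the marginals.
factorized = {
    (p, d): pitch_head[p] * duration_head[d]
    for p, d in product(pitch_head, duration_head)
}
```

Under these assumptions `factorized[("b", 500)]` is 0.25 even though the joint distribution gives that combination zero probability, so a quarter of the samples would be the unwanted long b. Conditioning one head on the sampled output of another is one possible way around this.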

#

I'm not sure how much of an issue this would be in practice, it is just the line of thinking that made me go with a different approach. I'm looking forward to reading this paper to see how they deal with this issue.

timber talon Sep 27, 2023, 12:26 AM

#

Ahh man it really makes one appreciate how linear text is! I think i pretty much follow your approach. Just to clarify, the "wait" token is the equivalent to a rest in sheet music, correct?

#

Ultimately, I think this is an empirical question and probably, with enough data, it doesn't matter. And you've experimented with this for a while, so you have more intuition here, slash the tokenization scheme has been set for a while so i'm sorry to keep questioning it

My only intuition is that the model's internal representation should be close to sheet music. There's some interesting neuropsychology research that indicates that rhythm and pitch comprehension are actually different cognitive processes (https://pubmed.ncbi.nlm.nih.gov/14681127/), so maybe the LM head approach isn't a bad one to represent that

That being said, I definitely think you're right that rhythm and pitch are correlated. However, even in the other compound word approach, it's pretty trivial to add cross attention between the multiple LM-heads to model that correlation, isn't it?

hearty flicker Sep 27, 2023, 10:23 AM

#

timber talon Ahh man it really makes one appreciate how linear text is! I think i pretty much...

The 'wait' tokens are there to represent the amount of time to wait until the next note. The sequences need a way of tracking the flow of time, if that makes sense.

hearty flicker Sep 27, 2023, 10:26 AM

#

timber talon Ultimately, I think this is an empirical question and probably, with enough data...

I definitely agree that the compound representation is closer to sheet music. I look forward to reading the paper to see how they solve the problem I mentioned! There is probably a smart solution that I haven't thought of.

hearty flicker Sep 28, 2023, 3:49 PM

#

Just a quick update - the training loop is finished. I still have to test that it will work correctly in a distributed setup and verify that it's utilizing hardware properly (e.g. measure the flops). I'm going to add some multiprocessing to the data processing and hopefully run a training test overnight on my home 1x4090 server.

hearty flicker Sep 29, 2023, 11:44 AM

#

The dataset preprocessing utilities I've written have been incredibly enlightening. It turns out that in most music datasets (the commonly used MIDI ones) there are a ton of subtly duplicated files with different names / meta information.

#

The unimportant differences let them pass a file-level hash deduplication check, even though they are duplicates in the eyes of a tokenizer. This definitely has the potential to leak duplicates into validation sets.
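One way to catch these is to hash only what the tokenizer would actually see, rather than the raw file bytes. The sketch below assumes each file has already been parsed into a dict with `"meta"` and `"notes"` entries; that layout and the function names are hypothetical, not the repo's actual data format.

```python
import hashlib


def content_hash(notes):
    """Hash the musical content only, ignoring filenames and metadata."""
    canonical = sorted(
        (n["start_ms"], n["pitch"], n["duration_ms"], n["instrument"]) for n in notes
    )
    return hashlib.sha256(repr(canonical).encode("utf-8")).hexdigest()


def deduplicate(files):
    """files maps filename -> {"meta": ..., "notes": [...]}; keeps the first file per hash."""
    seen = {}
    for name, parsed in files.items():
        seen.setdefault(content_hash(parsed["notes"]), name)
    return sorted(seen.values())
```

This only catches exact matches at the note level. Recreations of the same piece with different timing or voicing would still slip through.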

tiny coral Sep 29, 2023, 4:00 PM

#

I had thought about the possibility of creating an embedding model to try to detect duplicates at a more semantic level. How many MIDI files are recreations of the same original piece of music, I wonder?

hearty flicker Sep 29, 2023, 4:12 PM

#

That's a cool idea

hearty flicker Oct 2, 2023, 3:46 PM

#

Currently running an experiment on the home gpu server in my flat - will post some samples later

hearty flicker Oct 2, 2023, 4:37 PM

#

Everything is essentially ready now for a full scale training run. The data library is all properly multithreaded now too, it no longer takes a few hours to build the full dataset lol

hearty flicker Oct 4, 2023, 12:42 PM

#

With the data augmentation that I've added, the experiment I'm training is still decreasing in val and train loss after 75 epochs

#

This is only training on about 5% of the total dataset, and the model has only ~30m parameters

#

Here are some samples prompted with Bach's Cmajor prelude

#

All of these are non-cherry picked btw

quasi steppe Oct 4, 2023, 1:25 PM

#

hearty flicker Here are some samples prompted with Bach's Cmajor prelude

is the C major prelude part of the training data or does the end of prompts slightly vary across those generations?

hearty flicker Oct 4, 2023, 1:27 PM

#

This particular prompt isn't (this exact MIDI file), however I'm sure a different version of the piece is in the training data

#

The prompts for each of these is identical, the only difference is the randomness inherent to the sampling process

#

If there are any pieces you would be interested in prompting with, let me know and I'll give it a spin

#

I should reiterate that this experiment is not scaled properly (the model seems to be unable to overfit the train set). I'm currently preparing for a full scale training run

quasi steppe Oct 4, 2023, 1:35 PM

#

hearty flicker The prompts for each of these is identical, the only difference is the randomnes...

gotcha. I was just curious because 3 and 10 got bars 17-18 right and the rest don't 😂

#

but really fascinating results!

quasi steppe Oct 4, 2023, 1:37 PM

#

hearty flicker This is only training on about 5% of the total dataset, and the model has only ~...

It's really amazing that ~30m model can do this. When I have time I will play with it for sure

hearty flicker Oct 4, 2023, 1:38 PM

#

The full-scale version will be somewhere in the range 200m-500m

#

This is a continuation of my favourite Fugue by Bach

sharp quiver Oct 4, 2023, 1:49 PM

#

hearty flicker If there are any pieces you would be interested in prompting with, let me know a...

Could you try Hungarian Rhapsody nr2?

hearty flicker Oct 4, 2023, 2:04 PM

#

I think this one would really require a model with longer context to give it a chance haha

#

The full-scale version will have 4x-8x the context length

#

I will give it a go anyway though

#

Ok this is a non cherry picked example and it is wild lmao

#

Doesn't get to generate for very long since it runs out of context

#

another @sharp quiver

sharp quiver Oct 4, 2023, 2:38 PM

#

❤️

quasi steppe Oct 4, 2023, 3:06 PM

#

hearty flicker The full-scale version will have 4x-8x the context length

if you use RoPE you might get 2x more without finetuning for free by doing some dynamic scaling technique

hearty flicker Oct 4, 2023, 3:07 PM

#

quasi steppe if you use RoPE you might get 2x more without finetuning for free by doing some ...

I am using RoPE

#

Although I'm pretty sure I'm going to implement ALiBi (or YaRN) before doing the actual train run

quasi steppe Oct 4, 2023, 3:21 PM

#

hearty flicker Although I'm pretty sure I'm going to implement ALiBi (or YaRN) before doing the...

yeah dynamic YaRN can directly patch on RoPE without finetuning for this 2x I talked about

hearty flicker Oct 4, 2023, 3:22 PM

#

Right now I've got RoPE implemented in the exact same way as it is for neox

#

Idk if this is ideal to be honest, the position embeddings for the dynamic sequence lengths are handled by the following code

```
def forward(self, x, seq_dim=1, seq_len=None):
    """Returns tuple cos, sin"""
    # Comment out bfloat16() specific code for now
    if seq_len is None:
        seq_len = x.shape[seq_dim]
    if seq_len != self.seq_len_cached:
        self.seq_len_cached = seq_len
        t = torch.arange(seq_len, device=x.device).type_as(self.inv_freq)
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
        # if self.precision == torch.bfloat16:
        #     emb = emb.float()
        self.cos_cached = emb.cos()[:, None, None, :]
        self.sin_cached = emb.sin()[:, None, None, :]
        # if self.precision == torch.bfloat16:
        #     self.cos_cached = self.cos_cached.bfloat16()
        #     self.sin_cached = self.sin_cached.bfloat16()

    return self.cos_cached, self.sin_cached
```

#

Due to the nature of the data, during training the model will only see contexts of length max_seq_len, so the re-calculation is only ever used when sampling

#

I definitely want support for longer contexts, it's the next thing to cross off the list

hearty flicker Oct 4, 2023, 4:50 PM

#

warm tangle Oct 4, 2023, 4:55 PM

#

Wow! How many parameters are there for this ^^?

hearty flicker Oct 4, 2023, 5:10 PM

#

30m

sand nymph Oct 4, 2023, 5:26 PM

#

@hearty flicker How do you plan on evaluating the models? Are there musicians who you can have listen to samples and guess which famous person is being mimicked?

hearty flicker Oct 4, 2023, 5:27 PM

#

There is a large ai-music/ai-audio group in my department that I plan to use for this

#

Me and @timber talon are also working on some symbolic-music eval stuff

hearty flicker Oct 4, 2023, 6:18 PM

#

Last ones for tonight, I think these both sound pretty nice

quasi steppe Oct 4, 2023, 8:00 PM

#

hearty flicker Idk if this is ideal to be honest, the position embeddings for the dynamic seque...

no I meant dynamic scaling in the interpolation, not dynamic length.
https://github.com/jquesnelle/yarn/blob/master/scaled_rope/GPTNeoXDynamicScaledRotaryEmbedding.py
Better to also include YaRN.

#

In particular I meant what's in the following chart. It's the vanilla Llama-2 without any finetuning and we only modified the model code. The model only has 4096 ctx but dynamic-yarn keeps the ppl stable out to ~8000 tokens.

The reason I'm really curious is that we didn't test it on a lot of downstream tasks and it would be extremely interesting to "hear" whether there is a difference for the first half and the interpolated second half in this case.

hearty flicker Oct 4, 2023, 8:46 PM

#

I'm definitely up for this, I'll start looking into it tomorrow

timber talon Oct 4, 2023, 8:48 PM

#

hearty flicker If there are any pieces you would be interested in prompting with, let me know a...

i wonder how it'll do with Philip Glass? e.g. Piano etude no. 12?

hearty flicker Oct 4, 2023, 8:51 PM

#

timber talon i wonder how it'll do with Philip Glass? e.g. Piano etude no. 12?

I'll give this a go tomorrow : )

timber talon Oct 4, 2023, 8:52 PM

#

hearty flicker

lol truly wild. I love that secondary melody that it just, like, randomly introduces lol

quasi steppe Oct 5, 2023, 12:00 PM

#

there was that "world model" paper a couple days ago, but I'm thinking maybe music model is an even better one for that kind of experiment since every piece has a much more natural geographical/temporal label (as opposed to "ask the LM for latitude/longitude").

timber talon Oct 5, 2023, 8:06 PM

#

quasi steppe there was that "world model" paper a couple days ago, but I'm thinking maybe mus...

sorry can you say more about this? you're not talking about this "world model" paper, are you https://robotics-transformer-x.github.io/ ?

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Project page for Open X-Embodiment: Robotic Learning Datasets and RT-X Models.

sand nymph Oct 5, 2023, 8:09 PM

#

timber talon sorry can you say more about this? you're not talking about this "world model" p...

This one, I suspect https://x.com/wesg52/status/1709551516577902782

timber talon Oct 5, 2023, 8:15 PM

#

ohh thanks!!
@quasi steppe by "every piece has a much more natural geographical/temporal label", do you mean the geography of the composer and the time period it was written in?

quasi steppe Oct 5, 2023, 8:18 PM

#

sand nymph This one, I suspect https://x.com/wesg52/status/1709551516577902782

Yeah this one

quasi steppe Oct 5, 2023, 8:18 PM

#

timber talon ohh thanks!! @quasi steppe by "every piece has a much more natural geog...

Exactly. Like whether you can use a linear probe to see whether the songs are embedded in a way that respects the time

timber talon Oct 5, 2023, 8:26 PM

#

could be interesting - I'm aware of some classical work looking at geographic differences in music/language: https://lchc.ucsd.edu/MCA/Mail/xmcamail.2009_11.dir/pdfZz8vEGN9aS.pdf

but in these cases they had to heavily restrict the analysis to pieces "of nationalistic character" because the "average" piece contains a good deal of cross-reference

sand nymph Oct 5, 2023, 8:26 PM

#

The more I read the world model paper the less compelling I find it tbh

timber talon Oct 5, 2023, 8:27 PM

#

i mean what you're getting at is broader than music, I think... you're getting at any kind of "style" differences that exist based on geography/time. Dialects in language, artistic movements in visual art, etc.

timber talon Oct 5, 2023, 8:27 PM

#

sand nymph The more I read the world model paper the less compelling I find it tbh

haha i agree :/ but that's most papers these days

sand nymph Oct 5, 2023, 8:28 PM

#

Carlini told me that he liked the Pythia paper more each time he read it which is basically my goal with writing papers now

timber talon Oct 5, 2023, 8:31 PM

#

wow! high praise.

The "world model" paper basically reminds me of what Jason Wei had to say recently: We've seen a transition along the "gradient" of publishing: conference papers -> arxiv/tech reports -> blog posts -> (tweet + code), and I expect the trend to continue. (https://twitter.com/_jasonwei/status/1709634375233716591). As in, it was written for the tweet

hearty flicker Oct 9, 2023, 8:35 AM

#

Hey guys. My current plan is to run a scaled up test over the weekend using the HPC. Before that I need to implement and test the context length extension, along with doing some refactoring.

#

I'm going to send Stella some more detailed information about the paper that I plan to write. The general idea will be 1/2 on scaling music transformers, and 1/2 on applications of transfer learning (e.g. fine-tuning the model) for generative and MIR tasks. On the generative side, I'm especially looking forward to fine-tuning on this dataset and tying in the LIMA paper since the dataset is quite small. I can't find the 2024 ICML deadline, however I might aim to get this work submitted there. Music transformer papers have done well there in the past.

#

If anyone wants to contribute, a good idea would be to experiment with your own fine-tuning and see if you can come up with anything interesting. I'd obviously add anyone who contributes something that makes it into the paper as an author. As I said, I'm refactoring the fine-tuning code this week so it should be fairly straightforward to use.

#

Just a reminder that you can find the repo here - https://github.com/EleutherAI/aria. The HOWTO might be slightly outdated, however I can also update that this week too (after the refactor).

hearty flicker Oct 9, 2023, 2:14 PM

#

Hey @quasi steppe I'm curious what method you would choose for positional embeddings / context extension if you were given complete freedom over the architecture. I'm tempted to just stick with rotary embeddings and train with 2048 or 4096 context.

#

A context of 2048 corresponds to about 2-3mins of music with one instrument.

quasi steppe Oct 9, 2023, 2:16 PM

#

hearty flicker Hey @quasi steppe I'm curious what method you would choose for positiona...

because we wrote the YaRN paper so I definitely pick RoPE haha

hearty flicker Oct 9, 2023, 2:16 PM

#

I've been reading the yarn paper this afternoon.

#

Is there anything that I'd need to change for pretraining?

quasi steppe Oct 9, 2023, 2:17 PM

#

nope. The whole point of that paper is that you can extend it afterwards in a data-efficient way

hearty flicker Oct 9, 2023, 2:17 PM

#

I could just carry out the fine-tuning procedure that you describe

quasi steppe Oct 9, 2023, 2:18 PM

#

so far it seems better to worry about context length in finetuning, unless you really really really go for long-range qualities and have tons of budget

hearty flicker Oct 9, 2023, 2:18 PM

#

I think it will be a pretty interesting test of your ideas : )

#

Long context music generation has been a fairly well researched area in the past too (for symbolic music).

quasi steppe Oct 9, 2023, 2:20 PM

#

if you try to put in YaRN, make sure to try the dynamic YaRN for extension without finetuning. I really want to see how it is in concrete tasks

hearty flicker Oct 9, 2023, 2:20 PM

#

Since aria will be in the range of 200m - 800m I think I'll be able to fit 4096 with flash attention on an A100 with ddp.

quasi steppe Oct 9, 2023, 2:21 PM

#

I'm still locked out of SAI infra for GPUs, otherwise I might have time to help you integrating YaRN in your stuff

#

I only have tpu pods and I'm working on adapting models I want to test in JAX which is really taking a lot of time 😂

hearty flicker Oct 9, 2023, 2:22 PM

#

Do you have any code that accompanies the paper?

#

If not, I have a mathematical background (the same as you I think - algebraic geometry) so I should be able to do it by myself ha

quasi steppe Oct 9, 2023, 2:24 PM

#

hearty flicker If not, I have a mathematical background (the same as you I think - algebraic ge...

Wow nice!! Fellow algebraic geometer! There are a few others here but we are rare lol

hearty flicker Oct 9, 2023, 2:24 PM

#

I ended up pivoting into ML instead for my PhD ha

quasi steppe Oct 9, 2023, 2:25 PM

#

me too and I'm doing it post-PhD 😂

hearty flicker Oct 9, 2023, 2:25 PM

#

Anyways I'll give the yarn stuff a proper read now and get it in my codebase

#

It's probably good to make sure it works before I do the pretraining

#

I was pretty skeptical about using ALiBi due to how much I like rotary embeddings

hearty flicker Oct 9, 2023, 2:30 PM

#

timber talon i wonder how it'll do with Philip Glass? e.g. Piano etude no. 12?

If you can find a MIDI file for this, I'd love to try this by the way

#

I had some trouble finding it myself

hearty flicker Oct 9, 2023, 2:42 PM

#

quasi steppe if you try to put in YaRN, make sure to try the dynamic YaRN for extension witho...

If I'm reading the code right, the only difference between dynamic yarn and normal rotary embeddings is that when scaling above the context length, we multiply t by self.max_position_embeddings / seq_len ?

#

From here - https://github.com/jquesnelle/yarn/blob/master/scaled_rope/GPTNeoXDynamicScaledRotaryEmbedding.py

#

That should be very easy to implement

quasi steppe Oct 9, 2023, 2:58 PM

#

hearty flicker If I'm reading the code right, the only difference between dynamic yarn and norm...

yes it's a super simple idea. Just dynamically change the scaling factor as the generation grows

hearty flicker Oct 9, 2023, 2:59 PM

#

I can try this now for you then

#

Current mini experiment is trained with 512 so I can try to double it and see what happens

quasi steppe Oct 9, 2023, 2:59 PM

#

gotcha. Basically when the ctx length is < 512 nothing should change. When it gets bigger we just gradually scale up the factor from 1.0 to 2.0
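A minimal sketch of the dynamic scaling as described in the messages above: below the trained length nothing changes, and past it the positions are multiplied by `max_position_embeddings / seq_len` so they stay inside the range seen in training. The function names are hypothetical, and this is only the simple linear variant discussed here, written in plain Python rather than torch. The YaRN paper's own method does more than this.

```python
def dynamic_scaled_positions(seq_len, max_position_embeddings):
    """Positions fed to RoPE, squeezed back into the trained range when needed."""
    # scale is 1.0 up to the trained length, then grows with the sequence
    # (for example 2.0 at twice the trained length).
    scale = max(1.0, seq_len / max_position_embeddings)
    return [i / scale for i in range(seq_len)]


def rope_angles(positions, dim, base=10000.0):
    """Rotation angles per position and frequency pair, as in standard RoPE."""
    inv_freq = [base ** (-(2 * k) / dim) for k in range(dim // 2)]
    return [[p * f for f in inv_freq] for p in positions]
```

For a model trained at 512, `dynamic_scaled_positions(1024, 512)` ends at 511.5, so every position still falls inside what the model saw during training. Because the scale depends on the current length, cached cos/sin values need recomputing as generation grows.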

quasi steppe Oct 9, 2023, 3:03 PM

#

hearty flicker Anyways I'll give the yarn stuff a proper read now and get it in my codebase

by the way, below is a draft for the EAI blog about YaRN. It's mostly similar but I tried to make it easier to follow while cutting down some technical details. Feel free to read and give comments
https://docs.google.com/document/d/1Vozhiuv2EV5zKD_iskExHTvm04sOPG7-e7vPg12eTyA/edit

hearty flicker Oct 9, 2023, 3:27 PM

#

Hmmn I think due to the way I'm tokenizing my sequences, there is an issue with just changing the code

#

Unless I'm making an obvious mistake

#

During pre-training the sequences normally have some padding on the end, I think around ~512 the model puts in a padding token and then fills the rest of the additional context with it too.

#

Not always immediately, however it kinda makes sense since <P> is a fairly common token at the end of a sequence

#

I can try setting the weight for the padding token to 0 and see if that helps

#

This is actually a pretty good find tbh, I should change this in my tokenization

sand nymph Oct 9, 2023, 3:36 PM

#

@hearty flicker you should be masking the loss for the padding tokens, so that the model doesn't learn to generate them

hearty flicker Oct 9, 2023, 3:37 PM

#

Yeah this is something I've just overlooked, super glad that I found it to be honest

#

It doesn't matter in the context of generating sequences < max_seq_len

sand nymph Oct 9, 2023, 3:38 PM

#

Well

hearty flicker Oct 9, 2023, 3:38 PM

#

But in this specific case it does matter a lot

sand nymph Oct 9, 2023, 3:39 PM

#

That makes it sound like you're misconfiguring it

#

If a sequence has max sequence length, why is there a padding token at all?

hearty flicker Oct 9, 2023, 3:40 PM

#

<P> tokens can only be found either directly after a <E> token, or within the range max_seq_len - 3 < x < max_seq_len, so as not to truncate a note midway

hearty flicker Oct 9, 2023, 3:40 PM

#

sand nymph If a sequence has max sequence length, why is there a padding token at all?

This is exactly what I overlooked

#

I originally implemented it so that a note wasn't truncated midway (as it is described by 2 tokens)

quasi steppe Oct 9, 2023, 3:41 PM

#

hearty flicker <P> tokens can only be found either directly after a <E> token, or within the ra...

I used to pad every single short sample and I vaguely remember the result was slightly worse than just concatenating everything (separated by eos) and truncating. Training was MUCH slower also.
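The concatenate-and-truncate alternative mentioned above can be sketched as follows. `pack_sequences` is a hypothetical helper; dropping the incomplete final chunk is a choice made for this sketch, not something stated in the chat.

```python
def pack_sequences(sequences, max_seq_len, eos_id):
    """Join tokenized pieces into one stream (separated by eos) and cut fixed-size chunks."""
    stream = []
    for seq in sequences:
        stream.extend(seq)
        stream.append(eos_id)
    n_full = len(stream) // max_seq_len
    # Any leftover tokens that do not fill a whole chunk are dropped in this sketch.
    return [stream[i * max_seq_len:(i + 1) * max_seq_len] for i in range(n_full)]
```

Every chunk is exactly `max_seq_len` tokens, so no padding is needed. The trade-off for a note-pair tokenization is that a chunk boundary can land between a note token and its duration token.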

hearty flicker Oct 9, 2023, 3:42 PM

#

Yah this is just something I've overlooked, glad I found it now

hearty flicker Oct 9, 2023, 3:47 PM

#

sand nymph @hearty flicker you should be masking the loss for the padding tokens, so ...

Do you know if I can do this by setting ignore_index in nn.CrossEntropyLoss, to the id of the padding token?

sand nymph Oct 9, 2023, 3:49 PM

#

hearty flicker Do you know if I can do this by setting `ignore_index` in nn.CrossEntropyLoss,...

Yup! That's exactly what it's for.
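For anyone following along, here is what masking the padding positions does to the loss, written in plain Python so it runs without torch. In PyTorch the same effect comes from passing `ignore_index=pad_id` to `nn.CrossEntropyLoss`, which with the default mean reduction averages over the non-ignored targets. The function below is a hypothetical stand-in for illustration, not the library implementation.

```python
import math


def cross_entropy(logits, targets, ignore_index=None):
    """Mean negative log-likelihood over positions whose target != ignore_index."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padding positions contribute neither loss nor gradient
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
        count += 1
    # Returning 0.0 when every position is ignored is a choice made for this sketch.
    return total / count if count else 0.0
```

With the padding id ignored, the model is never rewarded for predicting `<P>`, so it has no reason to start emitting it near `max_seq_len`.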

hearty flicker Oct 9, 2023, 4:23 PM

#

quasi steppe gotcha. Basically when the ctx length is < 512 nothing should change. When it ge...

I'll test this for you when the next test is done, need to retrain with the padding fixed. I'll let you know

hearty flicker Oct 9, 2023, 5:00 PM

#

sand nymph Yup! That's exactly what it's for.

Did you say I should aim for 170tflops on an A100 (with flash attention) at bfloat16?

quasi steppe Oct 9, 2023, 5:09 PM

#

hearty flicker Did you say I should aim for 170tflops on an A100 (with flash attention) at bfloa...

the number looks right

sand nymph Oct 9, 2023, 5:48 PM

#

hearty flicker Did you say I should aim for 170tflops on an A100 (with flash attention) at bfloa...

Yup
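To put the 170 TFLOPs target in context, here is a rough back-of-the-envelope sketch. It assumes the A100's dense bf16 tensor-core peak of 312 TFLOPs from NVIDIA's datasheet, and the common 6 x parameters x tokens approximation for training FLOPs, which ignores the attention term. The function names and the throughput figure in the usage note are made up for illustration.

```python
A100_BF16_PEAK_TFLOPS = 312.0  # dense tensor-core peak, per NVIDIA's A100 datasheet


def achieved_tflops(n_params, tokens_per_second):
    """Approximate training throughput using the 6 * N * tokens rule of thumb."""
    return 6 * n_params * tokens_per_second / 1e12


def utilization(tflops, peak=A100_BF16_PEAK_TFLOPS):
    """Fraction of the hardware peak actually being used."""
    return tflops / peak
```

Under these assumptions, 170 TFLOPs is roughly 54% of peak, and a hypothetical 500m parameter model processing 50,000 tokens per second on one GPU would land around 150 TFLOPs.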

hearty flicker Oct 12, 2023, 9:40 AM

#

I just finished training another small version of the model on a slightly larger dataset. The new data is hand sequenced and has made a big difference - I'll upload some more samples throughout the day

hearty flicker Oct 12, 2023, 10:13 AM

#

It seems to do a good job understanding musical form without being explicitly told about it, which is quite surprising. For those familiar with the fugue form, check out this non-cherrypicked sample. Still quite short since I'm using 512 sequence length. The real piece is Bach's C major fugue BWV 846 - https://www.youtube.com/watch?t=131&v=_3qnL9ddHuw&feature=youtu.be

timber talon Oct 13, 2023, 8:21 PM

#

hey @hearty flicker this sounds exciting and i'm into it, apologies for the delay. I really like the idea of 50% of the paper being focused on fine-tuning... there are so many tasks already out there that can test for musical comprehension, even beyond the ones we've already discussed for eval, but they don't really get incorporated into GenAI papers at all.

Can I ask a more basic/fundamental question though, that's been on my mind?

Applying generative LLMs to other benchmark tasks usually works because the tasks can be reformatted into seq2seq tasks. Like, for a document classification task, instead of calculating probability vector p(y | x), we can ask the LLM to just generate the name of the class.

It's less clear to me how to transform many of the MIREX tasks into seq2seq musical representations that a generative music model could output.

The alternative is to use the output embeddings of the decoder, and then put a linear classifier layer on top of that. However, in text stuff i've done, I've found that doing this with autoregressive GPT models produces a worse classifier than using a bidirectional encoder model like Roberta.

So my question is — do you think it's worth it to also think about masked modeling, or encoder/decoder setups for this part of the paper?

hearty flicker Oct 13, 2023, 8:23 PM

#

I'll get back to you properly tomorrow, you might be interested in this paper in the meantime - https://arxiv.org/abs/2106.05630

timber talon Oct 13, 2023, 8:24 PM

#

octupleMIDI encoding nice lol

#

this is a really cool paper, exactly what i was curious about. Their tasks are also lacking .. in theory we could improve every aspect of this paper.

I guess a counter argument to my question is that maybe there is a way to format a lot of tasks as seq2seq tasks, would just require some thought

hearty flicker Oct 13, 2023, 8:28 PM

#

That is a big motivator for me, all symbolic music research I've seen lacks in one way or another. The library I've built is 'basically' perfect as I've not been lazy anywhere. My hope is that this pays off when it comes to downstream tasks.

#

Generatively speaking, I think it's pretty evident. My supervisor today wanted to submit some of the fugues I've generated, he was pretty shocked by how well they adhered to the musical form.

timber talon Oct 13, 2023, 8:30 PM

#

here is a philip glass midi, btw — would be a great "easy" test to see if form-understanding is possible

#

hearty flicker Oct 13, 2023, 8:30 PM

#

For more MIR stuff, I think that there are ways to introduce seq-to-seq. I haven't looked into it too much yet but I'm quite excited.

#

There are ways to use autoregressive models to do information retrieval, although you are right that the BERT-style encoders are normally better for obvious reasons.

timber talon Oct 13, 2023, 8:31 PM

#

it's actually not uncommon in NLP to have a generative model generate stuff, and have a separate encoder model evaluate it. BERTScore for translation and BARTScore for summarization are two examples of that

hearty flicker Oct 13, 2023, 8:31 PM

#

I am very interested in training a MLM version of my current model, it would only take a few hours to implement but I haven't done it yet.

#

I'm currently training a context length 2048 version of aria on the small classical dataset I've been using. I'll try the Philip Glass midi on that when it is done!

#

I'm about to go to the pub with my friend, but I will get back to you tomorrow about this! Maybe we should setup a meeting sometime soon : )

timber talon Oct 13, 2023, 8:36 PM

#

hearty flicker I am very interested in training a MLM version of my current model, it would onl...

yeah that's exactly what I'm thinking... unless the Midi representation is so different for MLMs vs. generative models?

I'm sorry i don't mean to put stuff on your plate. i'm super down to chat and also take on a task instead of just spitballing lol — maybe i can do some of the MLM stuff if that would be helpful. enjoy the pub! ttyl

hearty flicker Oct 13, 2023, 8:40 PM

#

I actually started off doing music stuff from the angle of non-autoregressive BERT style stuff! You can think of aria as pretty much being 1-1 with something like gpt. It really wouldn't be much effort to fork a version that does MLM instead.

#

Just need to change how the loss is calculated, the format of the (src, target) tensors, and the causal mask
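A rough sketch of how the (src, target) pairs differ between the two setups. The function names, the 15% masking rate, and the use of -100 as the ignored target (PyTorch's default `ignore_index`) are assumptions for illustration and not taken from the aria repo. In the MLM case the attention mask would also become bidirectional, which is not shown here.

```python
import random


def causal_pair(tokens):
    """Next-token prediction: the target is the input shifted left by one."""
    return tokens[:-1], tokens[1:]


def mlm_pair(tokens, mask_id, ignore_id=-100, mask_prob=0.15, rng=None):
    """Masked modelling: hide some tokens in src and score only those positions."""
    rng = rng or random.Random(0)
    src, target = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            src.append(mask_id)       # the model sees the mask token...
            target.append(tok)        # ...and must recover the original
        else:
            src.append(tok)
            target.append(ignore_id)  # unmasked positions are excluded from the loss
    return src, target
```

Because each note spans two tokens in this tokenization, masking the note token and its duration token together might be a more natural unit than masking single tokens, though that is speculation.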

hearty flicker Oct 15, 2023, 5:39 PM

#

When including hand sequenced data in the training set, the generative quality of the model degenerates in some sense.

#

It seems to grasp onto patterns of repetition and repeat them over and over again. Makes sense why it would do that.

#

I think it kind of confirms my suspicion that scaling up the dataset increases its quality as a pretrained model, however if you want to use it for generative stuff you need to finetune away some of those unwanted behaviors.

hearty flicker Oct 16, 2023, 11:08 AM

#

Hey guys, give this a listen. This is from a 2048 context length model. AI generated after ~12 seconds in

#

Unfortunately when training with longer contexts, the model seems to gravitate toward repeating itself over and over again when using normal sampling temps (0.6 - 0.8). As a result, I have to bump the temperature up to above 0.9 to get music with some variation in it. This results in some bad samples which you can hear in the music.

#

In this sample towards the end it degenerates too.

sharp quiver Oct 16, 2023, 12:04 PM

#

Have you tried top p? https://twitter.com/finbarrtimbers/status/1713230709312389346?t=eooZht02cDx5dFybiCw7TQ&s=19

sand nymph Oct 16, 2023, 12:05 PM

#

I can tell what you mean about the repetition

hearty flicker Oct 16, 2023, 12:08 PM

#

I use top-p already. I think there must be a sweet spot top-p and temperature setting that will do the trick.

#

If I reduce the temp too much it will just repeat the same bar over and over.
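For reference, this is roughly how temperature and top-p interact when picking the next token. It is a minimal pure-Python sketch with hypothetical function names, not the sampling code used for these samples.

```python
import math
import random


def top_p_filter(logits, temperature=1.0, top_p=0.95):
    """Return {token_id: prob} for the smallest set of tokens whose mass reaches top_p."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=1.0, top_p=0.95, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_p_filter(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

Lowering the temperature sharpens the distribution before the top-p cut, so fewer tokens survive. With a peaked enough distribution only one token is left, which is consistent with the "same bar over and over" behaviour at low temperatures.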

#

Since notes are encoded over two subsequent tokens (pitch and duration), I think that beam search might be a good idea. It's a bit of a pain to implement though...

sharp quiver Oct 16, 2023, 12:14 PM

#

I don't think beam search is a good idea for music generation

sand nymph Oct 16, 2023, 12:16 PM

#

My mother is a classical pianist and she said

They are pretty songs but the playing lacks emotion and nuance.
[I told her it was AI]
Interesting because I didn’t think that at first.

#

Regarding repetition

Sometimes songs do that but the emotion of the player helps. It was particularly noticeable towards the end with the repeated high note.

quasi steppe Oct 16, 2023, 12:20 PM

#

@hearty flicker have you tried CFG?

hearty flicker Oct 16, 2023, 12:34 PM

#

sand nymph My mother is a classical pianist and she said > They are pretty songs but the pl...

It's very interesting that it's not immediately obvious!

#

If you guys wanna get a sense for other work on this topic btw, you can check out Google's stuff - https://magenta.github.io/listen-to-transformer/#a1_94913.mid

hearty flicker Oct 16, 2023, 12:37 PM

#

sharp quiver I don't think beam search is a good idea for music generation

I've heard bad things about beam search in general. Atm I'm just using greedy decoding with top-p=0.95

#

It is also interesting to note that the dataset I'm using for tests comes from a transcription model (audio -> midi). So in some sense the only primitive for that sample is audio

hearty flicker Oct 16, 2023, 1:32 PM

#

quasi steppe @hearty flicker have you tried CFG?

Is CFG the name for the context length extension that you were talking about

quasi steppe Oct 16, 2023, 1:36 PM

#

hearty flicker Is CFG the name for the context length extension that you were talking about

No, it was classifier free guidance.

#

oh nvm, CFG might actually make it worse (sticking more to the prompt)

hearty flicker Oct 16, 2023, 1:39 PM

#

I haven't heard of that but I'll have a look

hearty flicker Oct 16, 2023, 3:03 PM

#

It's possible that the regularization I'm using is causing the 'repeating' problem. I'm going to run another experiment with double model size and a previous value I was using for weight_decay in AdamW

timber talon Oct 16, 2023, 10:16 PM

#

CFG: https://arxiv.org/pdf/2306.17806.pdf

timber talon Oct 16, 2023, 10:18 PM

#

hearty flicker

lol it goes from debussy to chopin
i keep wanting it to resolve and it doesn't lol

#

well at least we know it definitely understands chords, even if the progressions themselves are kinda atypical to say the least

hearty flicker Oct 17, 2023, 9:19 AM

#

CFG sounds pretty promising especially for my use case. Will implement this in the sampling library
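For reference, the core of the CFG update from the paper linked above is one line on next-token log-probabilities: start from the unconditional prediction and move `gamma` times as far toward the conditional one. A minimal sketch in plain Python; `cfg_logprobs` and the way the two sets of logits are obtained are assumptions here, not the sampling library's API.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cfg_logprobs(cond_logits, uncond_logits, gamma=1.5):
    """Push the conditional distribution away from the unconditional one."""
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    mixed = [u + gamma * (c - u) for c, u in zip(cond, uncond)]
    return log_softmax(mixed)  # renormalise so it is a distribution again
```

With `gamma=1.0` this reduces to ordinary conditional sampling; values above 1 favour tokens that the prompt makes more likely than they would be without it.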

timber talon Oct 17, 2023, 6:12 PM

#

yeah i was thinking about it more last night. the debussy example you gave is definitely repeating a lot, but it's also repeating a phrase that's pretty far away from the prompt — like, it never really returns to the prompt after drifting away. "it goes from debussy to chopin" is real but it feels a little more pop/video game music, even, it seems like it just reverts towards some central mean in the dataset

quasi steppe Oct 17, 2023, 6:14 PM

#

CFG also reduces diversity so I actually see the arguments for both sides. Would be very interesting to find out exactly what will happen

tender dragon Oct 18, 2023, 2:36 AM

#

hearty flicker

I have a background in this field, but I'm more of a math-and-music-cognition person, and less skilled in ML.

I have a music generation project, but it's algorithmic rather than NN-based. My experience is that using temperature to avoid simple repetitions seems ill-founded (I'm using it too). I know more about the harmony aspect of temperature, but let's focus on phrasing since that is what seems to be giving trouble at the end of this piece. I'll talk about the algorithmic approach with naive temperature, what succeeds and falls down, then how these lessons transfer over to ML.

So let's consider the setting of my algorithmic generator. First, we have an existing phrase, and we generate another phrase that is similar to it. With temperature, the most likely result is simply a duplicate of that phrase. And this sounds bad. With sufficient grasp of harmony and rhythm, we can create a large body of good possibilities and then turn the temperature up. Fine - now there's a large diversity of sounds, and repetitions are infrequent. But when a repetition appears, it still sounds bad, most of the time.

What if we simply disallow repetitions? Now things are always new, but it still sounds bad. All the notes are consonant and novel, but it seems unstructured/meaningless. And it has no coherence.

Generating a stochastic mixture of new and not-new phrases, with the right proportions, occasionally sounds good, but it's clear that's by luck rather than intelligence. Usually it sounds bad.

So something about our approach is not working. The reason is that there's a hidden problem here, of tension and resolution, that this algorithmic approach is not modeling. Apparently the new and not-new phrases need to be in the appropriate places: sometimes the listener wants repetition and sometimes she doesn't. And randomness is the wrong approach.

#

So now what? For my algorithmic approach, I already identified the tension issue and solved about 3/4 of it, and I need to solve the remaining 1/4 to get good results. But for ML, the issue is different. Can the NN understand tension and resolution from existing compositions? I haven't seen ML music generators do well with it. But the answer should be yes: if the NN attends to the right things, it should be able to perceive tension and resolution easily.

The solution is to look at how humans perceive tension: it's caused by how human music memory works. And human music memory has two steps:

The first step is feature-matching, as in Feature Integration Theory. When comparing to a phrase in the past, the first few notes are compared to each other. This is captured decently by the attention of a transformer, although labeling metric stresses (as a bit of feature engineering) should make this attention more accurate. The second step in human music memory is retrieval, which is sequential in the forward direction. Which means that in human attention, if note A retrieves note B from the past as salient, then the note after A will retrieve the note after B as salient too. And a "resolution" happens when there is a strong connection between the present note and what the human's memory is attending to - usually a repetition in features. In ML terms, this means attention for phrase retrieval should also be sequential after the initial search, to accord with human music memory. So if a note has high attention, the next note should have its attention boosted in the next step. (There's annoying details with stream segregation and rhythmic stress letting memory skip notes, so it's probably safer to let the NN decide which of the next notes to boost.)

#

Being able to recognize and use tension requires the NN to see both the current and past phrases together. Then repetition becomes an active choice, conditioned on whether the NN wants tension or not. The highest probability choices shouldn't be dominated by repeated sequences, because that is a "bad" choice, not just an un-diverse choice. Then the purpose of temperature would not be to avoid repetition of phrases, but to avoid repetition of pieces.

It's not clear to me that as-is transformers can perform this comparison of phrases, because the attention of the current note depends on "what was the attention of the past note". That seems to require some sort of recurrence or information passing.

tender dragon Oct 18, 2023, 5:06 AM

#

I looked at OctupleMIDI encoding. Its position information actually captures the "labeling metric stresses" issue I mentioned, absent syncopation, but a transformer can already spot syncopation. So no feature engineering for stresses is necessary, OctupleMIDI solves it automatically.

hearty flicker Oct 18, 2023, 12:11 PM

#

Messing around with high values for cfg_gamma and temp produces some very weird stuff haha

#

This prompt is quite out of distribution for the training data (since the training data is all live performance), so it often produces weird results anyway

hearty flicker Oct 18, 2023, 12:15 PM

#

tender dragon I have a background in this field, but I'm more of a math-and-music-cognition pe...

One of my supervisors has a lot of experience with this algorithmic stuff. I suppose I'm more interested in pure DL methods as they produced more varied and weirder results.

#

It's also got a lot to do with the data. If I train/ft a transformer on some very structural music with well defined cadences, it doesn't really have a problem perfectly reproducing it.

#

The data I'm currently testing on is very very varied and all from live performance (only 5000 recordings), so the model has a harder time producing very structural music. I do wonder if/how much structure would emerge when scaling the dataset 100x.

quasi steppe Oct 18, 2023, 12:31 PM

#

hearty flicker Messing around with high values for cfg_gamma and temp produces some very weird ...

Man I actually like it a lot 😂

hearty flicker Oct 18, 2023, 12:37 PM

#

This is with cfg of 1.5

#

It still has the repeating problem, however part of me thinks that this sort of issue is unavoidable when you are working with smaller datasets.

#

Luckily it's straightforward (in theory) to scale it up, but will take some engineering effort on my end.

#

I'm actually super interested in trying to apply some of the ideas they used in alphafold to improve the audio -> midi conversion models.

#

To my ear, the cfg stuff is making a big difference. Even if the model is not perfect.

hearty flicker Oct 18, 2023, 1:43 PM

#

Giving no prompt apart from the composer name & high temp gives weird results too...

quasi steppe Oct 18, 2023, 1:45 PM

#

hearty flicker To my ear, the cfg stuff is making a big difference. Even if the model is not pe...

Wow

#

Yeah, the repetition might be a separate issue from CFG.

hearty flicker Oct 18, 2023, 1:52 PM

#

The repeating problem has gotten a lot better in the version I just trained, I think it was especially bad before because I cranked up the weight decay in AdamW by 10x

quasi steppe Oct 18, 2023, 1:55 PM

#

So the lower weight decay the better, even for small models?

When I trained 3-13B models I set weight decay to 0 and saw no difference in eval 😂

hearty flicker Oct 18, 2023, 1:56 PM

#

I think the model was too small to have the large value I used

tender dragon Oct 18, 2023, 3:51 PM

#

hearty flicker One of my supervisors has a lot of experience with this algorithmic stuff. I sup...

I agree with your perspective and don't encourage people to generate music algorithmically. The main barrier that I see is that music cognition science is strongly necessary for algorithmic composition, but this field is not well-developed. I was forced into music cognition research myself to cover the missing theories; it's not something I intrinsically care about. It's much easier to let a NN figure out things by training on compositions.

tender dragon Oct 18, 2023, 3:52 PM

#

hearty flicker The data I'm currently testing on is very very varied and all from live performa...

I wasn't aware your dataset was so small; I agree that's a more fundamental issue than NN topology, and should be solved before experimentation in less-easy aspects. Are you planning on scaling to modern music (such as pop, EDM, atonal), or expanding within classical?

hearty flicker Oct 18, 2023, 4:01 PM

#

tender dragon I wasn't aware your dataset was so small; I agree that's a more fundamental issu...

One of the goals is to scale everything up by expanding the dataset massively. Right now I'm experimenting with different generative stuff, I'm particularly interested in style transfer. I'm about to push some fine-tuning code to the repo so I can get a sense of how sensitive it is to pretraining.

sand nymph Oct 22, 2023, 2:05 PM

#

@hearty flicker How is the research question formulation coming along? Like, concretely what are you hoping to show with this model?

hearty flicker Oct 22, 2023, 2:06 PM

#

sand nymph @hearty flicker How is the research question formulation coming along? Lik...

I was planning on updating tomorrow. I'll let you know then. I have a pretty concrete plan for the next couple of months : )

hearty flicker Oct 24, 2023, 3:42 PM

#

Hey guys, I've been extremely busy this week. I'll update when I get a chance.

#

Btw I have made a colab notebook if anyone is interested! I'm a bit wary of publicly sharing it as I don't want the model weights to get downloaded from GCP lots and lots (leaving me with a large bill). If you want a link, dm me : ) It's super easy to use!

hearty flicker Oct 30, 2023, 11:32 AM

#

Does anyone have a sense of how hard it is to build something on top of ggml? I've got a little C experience but not too much. I think it would be cool to build an applet to generate and play music in real time on ARM Macs.

hearty flicker Oct 30, 2023, 12:32 PM

#

@here

Hey guys, update here - I'm planning to have this project released in mid-December. I have this entire month free to dedicate, so I reckon it is a realistic timeframe.

The first thing I'm going to work on is improving the Audio->MIDI transcription models and subsequently using this to significantly expand the live performance dataset I'm using (by 3-5x). If I can't make any improvements, I will use the current SOTA (Kong et al.) for this purpose. Either way, this should make a huge difference. Hopefully, I can get some GPUs for this purpose, as the actual transcription and audio processing is quite intensive.

I'm going to write a pre-print focused on generative controllability and the impact of pre-training scale. My supervisor wants me to submit it to ICML/NeurIPS; however, who knows if it will get accepted. If that fails, I can also submit to ISMIR in Summer 2024. He reckons that both the generative capabilities and controllability (using fine-tuning/CFG/meta-tokens) are SOTA (though it is hard to measure quantitatively). There are quite a few models I can compare this against. I have access to ~50 music-related researchers that I can use for a qualitative study, so I should be able to get some data regarding this. While the audio files are being transcribed, I'm going to build on the inference library; there should be some really cool possibilities for controllability.

#

If possible, I'd like to produce two sets of pre-trained models. The first will only be trained on files with an open license (in line with EleutherAI's policy). As most of the good data falls under this license, this model should be powerful. I have explicit permission to use the MIDI dataset from ClassicalArchives too, which I would love to include in this version. I'd like to release these checkpoints publicly and use them for the analysis in the aforementioned paper. The second set will be trained on all the data I have access to. I understand that I might not be able to use Eleuther's resources here (or release the weights); however, I am actually really interested in how much of a difference using the bigger pre-training dataset will make. This is super valuable research IMO; for me, it would strongly inform the correct direction of further improving models like Aria. I have also curated a few high-quality fine-tuning datasets which should be really interesting to experiment with (and release checkpoints of). There are a bunch of improvements to the model that I have planned; I'm hoping to do this while the audio files are being transcribed.

I'm planning to release a blog post, similar to the MusicLM one (https://google-research.github.io/seanet/musiclm/examples/), along with the model weights (in Dec). Since Eleuther is involved in this project, I'd love to release it with their affiliation; however, I can also release it on my personal website.

quasi steppe Oct 30, 2023, 12:50 PM

#

hearty flicker @here Hey guys, update here - I'm planning to have this project released in mi...

Yeah evaluating any Gen AI is hard. Was watching James Betker's talk this weekend and even OAI only does human evaluation.
I guess EAI might have some resources to help organize human evaluations, though I'm not super familiar with the details.
When we wrote the CFG paper we made a simple UI and collected some preference data from the public. Was really nice to have some real world numbers in our paper and I guess you can do a similar thing too.

hearty flicker Oct 30, 2023, 12:51 PM

#

quasi steppe Yeah evaluating any Gen AI is hard. Was watching James Betker's talk this weeken...

Luckily I can basically use my entire department for a qualitative study as there are loads of music people. I also think a more public survey is a great idea!

sand nymph Oct 30, 2023, 1:01 PM

#

@subtle lance can help with licensing & policy questions (just not today, as they're traveling to the UK for the AI Summit)

@timber talon is also very interested in helping I know.

hearty flicker Oct 30, 2023, 2:34 PM

#

Hey guys, I'd just like to make it explicit that I'm more than happy to add anyone who has been contributing to the discussion in this Discord as a co-author (if you would like). This is particularly relevant for @quasi steppe, @timber talon, and @sand nymph. Going forward, there are some other areas which I see as ripe for collaboration:

Fixing/improving the repetition problem via different prompting/sampling methods (such as CFG)
Exploration of other prompting/sampling methods
Improvements to the data preprocessing and augmentation
Improvements to the model architecture (there are definitely changes to be made here)
Optimization of the training process and regularization
Expansion of pre-training/fine-tuning datasets
Generating ideas about improving the tokenizer and the format of the meta-tokens

There are lots of other topics as well; I'll mention them as they come to mind. As I said, I'm going to focus on Audio->MIDI for the next few weeks, if anyone has any thoughts on that area.

#

Also if anyone wants to try out the colab notebook, you can have it here. Both the checkpoint it uses and the sampling hyperparams are far from ideal, so beware 👻

#

https://colab.research.google.com/drive/18MN7uzBghLQ9T_HvBw81JMlsRsV2ZiDe?usp=sharing

#

Here are some MIDIs you can use as prompts

📎 prompts.zip

#

Repo link - https://github.com/EleutherAI/aria

tender dragon Oct 30, 2023, 3:54 PM

#

https://github.com/EleutherAI/aria/blob/8a9b40814d5fc358254d311a84675ad227ffc26a/aria/tokenizer/tokenizer.py#L355
You are encoding the time of messages as offsets between start times, rather than absolute times.
I think this is wrong. Human memory follows the grid of the meter: so suppose a meter is 128 midi ticks. Then at the beginning of a new meter, long-term memory only retrieves certain notes: 256 ticks back, 384 ticks, 512 ticks, etc. This requires it be able to see the absolute time difference between two notes (and also know the meter distance, which you might want to do some feature engineering on). I can provide some demonstrations of this effect. With Wait messages, the transformer is unable to see absolute time offsets because it would need to sum the Waits of all the notes in-between.

That means here: https://github.com/EleutherAI/aria/blob/8a9b40814d5fc358254d311a84675ad227ffc26a/aria/tokenizer/tokenizer.py#L348
I recommend you append timestamps to tokens, and remove all the "wait" tokens. So the token becomes (_onset_time, _instrument, _pitch, _velocity). The attention head will want to see some sine-like timestamp difference.
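To make the proposal concrete, here is a small sketch of converting a wait-based event stream into notes that carry their own onset time. The event tuples are a hypothetical stand-in and do not reproduce the actual token layout in `tokenizer.py`.

```python
def to_absolute_onsets(events):
    """Turn ('wait', ticks) / ('note', instrument, pitch, velocity) events
    into (onset_time, instrument, pitch, velocity) tuples."""
    now, notes = 0, []
    for ev in events:
        if ev[0] == "wait":
            now += ev[1]  # the running sum the model would otherwise have to do itself
        else:
            _, instrument, pitch, velocity = ev
            notes.append((now, instrument, pitch, velocity))
    return notes


events = [
    ("note", "piano", 60, 80),
    ("wait", 128),
    ("note", "piano", 64, 70),
    ("wait", 64),
    ("wait", 64),
    ("note", "piano", 67, 90),
]
print(to_absolute_onsets(events))
```

With onsets attached, the time difference between any two notes is a single subtraction instead of a sum over every wait in between.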

Duration is a different matter; I also recommend you append the duration to the token, but this is much less clear. It doesn't actually belong in either place, and although I understand how duration affects harmony, I don't have a good idea of where the transformer wants it to be, because there is an architecture mismatch. I don't have a demonstration for this, and you should think twice before following this recommendation on duration. You could just leave it as-is for now since it is less effort.

https://github.com/EleutherAI/aria/blob/8a9b40814d5fc358254d311a84675ad227ffc26a/aria/tokenizer/tokenizer.py#L343
I'm not sure why you are quantizing the velocities and times. If it's for regularization, I think data augmentation would be better, just like you are augmenting pitch and velocity.

#

Be careful about quantizing times too much; swing is very delicate on time. https://www.nature.com/articles/s42005-022-00995-z

It probably doesn't matter for your current dataset though.

Nature

Downbeat delays are a key component of swing in jazz

Communications Physics - To which extent and how do jazz musicians synchronize their timing to create swing? By analyzing jazz musical recordings and carrying out psychoacoustical experiments, the...

warm tangle Oct 30, 2023, 3:58 PM

#

hearty flicker Hey guys, I'd just like to make it explicit that I'm more than happy to add anyo...

Carlos and I have been busy with our own research but I would like to give the code a read and see if there’s anything we can contribute - even an architecture change or something like you said

tender dragon Oct 30, 2023, 4:10 PM

#

https://github.com/EleutherAI/aria/blob/8a9b40814d5fc358254d311a84675ad227ffc26a/aria/model/model.py#L74
here, I expect the position embedding to be by time, not by token position. the attention head should see two position values: one is absolute position difference / meter, the other is exp(2πi * absolute position difference / meter)
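A small sketch of those two values for a pair of notes, in plain Python. `time_features` and the 128-tick meter are illustrative assumptions (the 128 comes from the earlier example), and the complex exponential is returned as its real and imaginary parts.

```python
import cmath
import math


def time_features(t_query, t_key, meter=128):
    """Relative-time features between two notes, with times in MIDI ticks."""
    bars = (t_query - t_key) / meter        # linear distance, in meters
    phase = cmath.exp(2j * math.pi * bars)  # 1+0j when a whole number of meters apart
    return bars, phase.real, phase.imag
```

The phase term is the same for notes 256, 384 or 512 ticks back, which is the grid-aligned retrieval described above, while the linear term still tells those notes apart.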

hearty flicker Oct 30, 2023, 4:12 PM

#

tender dragon https://github.com/EleutherAI/aria/blob/8a9b40814d5fc358254d311a84675ad227ffc26a...

These are good observations however you have to consider the vocabulary explosion in the embedding/lm-head stages. This is also the reason for my choice of quantization. If I'm understanding your idea about the meter correctly, I think it might be a good idea for very structured MIDI files, however I'm not sure it's really applicable to the sort of data I'm using.

tender dragon Oct 30, 2023, 4:13 PM

#

hearty flicker These are good observations however you have to consider the vocabulary explosio...

by vocab explosion, you mean you are passing the velocity number in as an Int, rather than a Float?

hearty flicker Oct 30, 2023, 4:14 PM

#

I mean that I don't want the size of the set of possible tokens to get too large.

tender dragon Oct 30, 2023, 4:14 PM

#

I understand that you are constructing MIDI from performances rather than from scores. however, I think meter detection should still be possible. and over a short range, even an inaccurate meter guess would be better than no meter guess

#

for these 4 features, I expect the embedding to look like this: (float of onset_time, category of instrument, category of pitch, float of velocity). a 4D vector, where only the middle two dimensions are categorical. so I don't see how quantization is reducing the vocab

hearty flicker Oct 30, 2023, 4:18 PM

#

tender dragon for these 4 features, I expect the embedding to look like this: (float of onset_...

What sort of loss would this be measured against in the lm-head decoding? If you were to use the standard method for training transformers, at least as far as I am aware, you would need quantization.

#

I do agree with your point about the arithmetic possibly being a problem, however I'm not sure that it isn't practically solved by having multiple attention heads (as is standard). I've not noticed timing issues to be one of the problems with the models as it currently is.

#

When I fine-tune on chorales for instance, it perfectly reconstructs the beat-bar structure.

#

I do think it would be a cool idea to try out though, see if it makes any difference.

tender dragon Oct 30, 2023, 4:24 PM

#

hearty flicker What sort of loss would this be measured against in the lm-head decoding? If you...

maybe I'm not understanding this point, but what I mean by embedding is a direct map. like if you have (note time 100, instrument 2, pitch 99, velocity 60), the embedding is simply (100.0+augmentation noise, 2, 99, 60.0+augmentation noise)

hearty flicker Oct 30, 2023, 4:25 PM

#

So what is the loss function used to train the model? What does the output layer look like?

tender dragon Oct 30, 2023, 4:32 PM

#

hearty flicker I do agree with your point about the arithmetic possibly being a problem, howeve...

here's an incomplete specification. for output, you can have (token type which can be note/end of song/start of song/etc, four dimensions for a note). so I've put 5 dimensions in a vector. the loss is case-wise. first, cross-entropy on the token type. suppose the token is a note - then in the four dimensions of the note, loss is squared error in first and fourth dimensions, cross-entropy in second and third dimensions. if the token type is not "note", such as end-of-song, the loss ignores those four dimensions. or if the token is something else which uses one dimension, then append one dimension to the output layer to have 6 dimensions, and apply the relevant loss to that dimension when the token indicates the dimension is relevant

hearty flicker Oct 30, 2023, 4:34 PM

#

I'm quite skeptical of approaches like this in general, as it decouples information that should be coupled

tender dragon Oct 30, 2023, 4:34 PM

#

hearty flicker When I fine-tune on chorales for instance, it perfectly reconstructs the beat-ba...

I don't expect the timing information I'm describing to make the model follow the meter better; its intention is for long-range structure

quasi steppe Oct 30, 2023, 4:35 PM

#

hearty flicker Hey guys, I'd just like to make it explicit that I'm more than happy to add anyo...

Thanks! I would love to help out. CFG is definitely something super relevant to Alex and me. I could also try to experiment with YaRN and its variants as well as some other stuff on the list

hearty flicker Oct 30, 2023, 4:36 PM

#

Timing, pitch, duration, and velocity are all highly connected when it comes to predicting the next note. Doing a linear sum over the losses does not accurately represent the situation. This is why most transformer models (and all music transformer models as far as I am aware) are trained with CEL where the token space is a product of sets.

#

If you want to try out your approach, I do invite you to. You can fork the repo and adjust the tokenizer and loss function.

tender dragon Oct 30, 2023, 4:43 PM

#

hearty flicker Timing, pitch, duration, and velocity are all highly connected when it comes to ...

ok, I see what you mean. it doesn't make sense to add cross-entropy of pitch to the other information. in that case, I'll adjust my model. so my changed proposal is: remove velocity from the vocab, express it as an output float attached to each possible categorical output. so for an output token (time, instrument, pitch), the output is (probability, velocity). and the loss is cross-entropy on probability, squared-error on velocity. this is because I do expect velocity to add linearly

hearty flicker Oct 30, 2023, 4:45 PM

#

tender dragon ok, I see what you mean. it doesn't make sense to add cross-entropy of pitch to ...

I actually also think even this is bad and degrades the outputs. Velocity is very much tied to the note itself.

#

A loud melody over a quiet chord for instance.

#

I do think your idea about the meter has some merit, but I foresee a bunch of complications when implementing it that really really complicate things.

#

And I'm really not sure that it would actually help anything. With just eight attention heads I haven't noticed timing being an issue, I'm really not sure that introducing a small meter would actually improve anything. If you want to experiment with it though, I would be interested in the results.

tender dragon Oct 30, 2023, 4:49 PM

#

hearty flicker I actually also think even this is bad and degrades the outputs. Velocity is ver...

is the concern here that velocity could be bimodal/trimodal, or that it could be not all Gaussians with the same variance?

hearty flicker Oct 30, 2023, 4:51 PM

#

tender dragon is the concern here that velocity could be bimodal/trimodal, or that it could be...

The issue is that if the next note is either a loud C or a quiet A with equal probability, then a model trained with that loss function would predict a medium C or medium A with equal probability. Which is a degradation.
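The averaging effect is easy to check numerically: a single velocity output trained with squared error settles on the mean of the targets it sees. A tiny sketch, with velocities 100 and 20 picked arbitrarily to stand for "loud" and "quiet" and both helper names made up here:

```python
def squared_error(prediction, targets):
    return sum((prediction - t) ** 2 for t in targets)


def best_shared_velocity(targets):
    """The value that minimises total squared error is the mean of the targets."""
    return sum(targets) / len(targets)


targets = [100, 20]  # loud C and quiet A, equally likely
v = best_shared_velocity(targets)
print(v, squared_error(v, targets), squared_error(100, targets))
```

This only applies when one velocity output is shared across the candidate notes.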

#

Anyway, with the quantization I'm using I don't need to use this trick anyway. It's not really perceptible (both for timing and velocity) so why change it?

tender dragon Oct 30, 2023, 4:53 PM

#

the second proposal has one velocity per possible token. so it has separate predictions for loud C and quiet A.

changing is to reduce vocab by representing the continuous dimension as a continuous rather than discrete variable. but it's your decision.

hearty flicker Oct 30, 2023, 4:55 PM

#

tender dragon the second proposal has one velocity per possible token. so it has separate pred...

I don't really follow this.

#

If the output space is (p, v) (p ∈ R^n, v ∈ R_{>0}), are you suggesting to train on the loss L_1(p) + L_2(v) where L_1 is CEL?

tender dragon Oct 30, 2023, 4:59 PM

#

in your existing formulation in your code, I assume that for each possible output (velocity, pitch, instrument), you output a probability through softmax. is that correct? so it's a ~127 x 127 x 10 vector

hearty flicker Oct 30, 2023, 5:00 PM

#

Velocity is quantized so it's roughly 127x10x10

tender dragon Oct 30, 2023, 5:01 PM

#

in my formulation, it's a 2 x 127 x 10 vector. the 2 is for (probability, velocity). it's CEL on probability as you describe. and the L_2(velocity) only kicks in if the note matches
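A sketch of that loss for a single prediction step, in plain Python. `note_loss` is a hypothetical name, the classes stand for flattened (pitch, instrument) pairs, and the two terms are added without any weighting, which is an assumption.

```python
import math


def note_loss(logits, velocities, target_idx, target_velocity):
    """Cross-entropy over note classes plus squared error on the velocity
    predicted for the target class only."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    ce = lse - logits[target_idx]
    se = (velocities[target_idx] - target_velocity) ** 2
    return ce + se
```

Only the velocity slot of the class that actually occurred receives a gradient; the velocity predictions for every other class are left untouched on that step.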

hearty flicker Oct 30, 2023, 5:03 PM

#

I don't think that is true. I think instead of 2 you need 127

tender dragon Oct 30, 2023, 5:03 PM

#

for each (pitch, instrument), we output (probability, velocity)

#

and velocity is a float

hearty flicker Oct 30, 2023, 5:09 PM

#

Oh I think I see what you mean now! I still have doubts though, you would need to use the velocity loss function in quite a weird way since you don't have targets for all bar one of the predictions. I really don't think that vocab size is a problem anyhow, if you want to try it out I'd be interested to know if it does end up working.

#

I have tried a variety of ways of messing with the loss function in the past, and unfortunately all of them have gone badly. Turns out a bog standard transformer is pretty good at learning the structure of sequences, and is able to overfit the train set even with extreme data augmentation.

tender dragon Oct 30, 2023, 5:18 PM

#

I'm very unlikely to try out model changes before your project finishes in December, but here's the extension of that loss function to time that I'm envisioning. first, time tokens are removed. then, for an output token (instrument, pitch), the output is (probability1, time1, velocity1, probability2, time2, velocity2, ...,). and let's say there are 4 of these triplets of (probability, time, velocity). what this does is mix the time in with the other note properties, so that it's no longer independent. the output dimension becomes 4 x 3 x 127 x 10. but it becomes complicated: it would probably generate notes out of order, and handling the loss function is probably not worthwhile in terms of improvement-to-effort ratio for your project. it's not a free win, and now that I know more about your project, it isn't something I necessarily encourage.

#

as for your advisor's opinion that your model is SOTA, I agree with it. I see some important medium-range correlational qualities in your generated music that I don't see in other models. I expect a large reason for this is the advantage of attending to simple MIDI messages, rather than attending to waveforms or waveform transformations.

hearty flicker Oct 30, 2023, 5:25 PM

#

I really appreciate your input by the way, definitely good to have original ideas swimming around in my head

tender dragon Oct 30, 2023, 5:56 PM

#

hearty flicker Oh I think I see what you mean now! I still have doubts though, you would need t...

you would need to use the velocity loss function in quite a weird way since you don't have targets for all bar one of the predictions

For the loss function specifically, using velocity as a float is better than quantizing to 10 positions. The issue with float velocity is that only one predicted velocity is being trained. But with 10 quantized velocities, the same problem applies, only 10x more - you still only get training for one prediction, but now the prediction is split into 10 velocities, and only one of these 10 velocities is being trained. It's like fitting a histogram. Whereas with a float velocity, the gradient pushes the velocity to the right value directly.
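
To make the comparison above concrete, here is a minimal sketch of the two velocity losses being discussed. It assumes a squared-error loss for the float velocity and a 10-bin cross-entropy for the quantized one; the function names, the bin layout, and the choice of squared error are illustrative assumptions, not anything taken from the project's code.

```python
import math

def float_velocity_loss(pred_velocity, target_velocity):
    """Squared error on the single prediction that has a target."""
    return (pred_velocity - target_velocity) ** 2

def float_velocity_grad(pred_velocity, target_velocity):
    """Gradient of the squared error: points straight at the target value."""
    return 2.0 * (pred_velocity - target_velocity)

def velocity_to_bin(velocity, n_bins=10, max_velocity=127):
    """Map a MIDI velocity (0..127) to one of n_bins equal-width bins (assumed layout)."""
    return min(n_bins - 1, velocity * n_bins // (max_velocity + 1))

def quantized_velocity_loss(logits, target_bin):
    """Cross-entropy over velocity bins, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target_bin]
```

With the float loss, a prediction of 0.3 against a target of 0.5 gets a gradient that directly says "move up by this much"; with the quantized loss, the model instead has to shift probability mass between bins.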

hearty flicker Oct 30, 2023, 6:25 PM

#

I haven't noticed any problems related to velocity to be honest. Having said that we could run an experiment with it completely omitted from the tokenizer, and compare the generative results (as far as pitch and timing is concerned). Could be a good way to tell if the way velocity is currently being handled is causing any degradation. My gut feeling is that it isn't, but it's good to be safe.

timber talon Oct 30, 2023, 6:30 PM

#

hearty flicker Hey guys, I'd just like to make it explicit that I'm more than happy to add anyo...

@hearty flicker sounds great! I'm excited to get things working on my end, and see what kinds of hidden-layers get generated.

@quasi steppe feel free to message me about CFG or let's chat in this thread if we have interesting examples.

Regarding Audio->MIDI, @hearty flicker , can you say more? This is something I've been thinking about, too. Is the idea to be able to generate MIDI from raw audio files? I'm sure this is a whole line of work. Is the idea to do this for training data creation, or just for general interest?

hearty flicker Oct 30, 2023, 6:34 PM

#

timber talon @hearty flicker sounds great! I'm excited to get things working on my end,...

The training dataset I've been using is largely comprised of data obtained via modern Audio->MIDI transcription.

#

It's incredibly promising and can in theory be scaled up to as much high (piano) quality audio as one can obtain.

#

Here is the SOTA as far as I am aware - https://arxiv.org/pdf/2010.07061

#

I'm personally going to work on this specifically over the next few weeks. Hopefully we can then introduce some of the advancements in an async way whilst the expanded dataset is being built.

timber talon Oct 30, 2023, 9:03 PM

#

ahh i remember we talked about this before

One comment on that: I think we should assume the transcription function will be noisy but possibly in a biased way, like certain elements (e.g. trills) will be consistently mistranscribed. In order to prevent our model from overfitting to that bias, it would be great to measure what kinds of errors transcribe(audio) = midi makes.

Trying to think how best to measure that: possibly for a gold set of pieces that we have real audio AND real MIDI for (i.e. with the MAESTRO dataset), it would be interesting to see a list of the errors. And then, we'd have to think about how to de-bias. One way would simply be to warm up the model with pairs of clean/noisy MIDI, and then freeze parts of the model before training on the audio-transcribed data for which we don't have any gold MIDI.

In other words:

input: MIDI dataset X, audio for X (X_audio), audio dataset y

// warmup
noisy_X = transcribe(X_audio)
model.train(concat([X, noisy_X]))
model.freeze(layers=[0, 1, 2...])

// training
model.train(transcribe(y))
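
For the "measure what kinds of errors the transcription makes" step, a minimal sketch of a gold-vs-transcribed comparison could look like the following. It assumes notes are represented as (onset_seconds, pitch) pairs and that a 50 ms onset tolerance counts as a match; both the representation and the tolerance are assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter

def compare_notes(gold, transcribed, onset_tol=0.05):
    """Greedily match notes by pitch and onset; return (matched, missed, extra)."""
    unmatched = sorted(transcribed)
    matched, missed = [], []
    for onset, pitch in sorted(gold):
        hit = next(
            (n for n in unmatched if n[1] == pitch and abs(n[0] - onset) <= onset_tol),
            None,
        )
        if hit is None:
            missed.append((onset, pitch))   # in the gold MIDI, not in the transcription
        else:
            unmatched.remove(hit)           # each transcribed note matches at most once
            matched.append((onset, pitch))
    return matched, missed, unmatched       # leftovers are spurious transcribed notes

def missed_pitch_histogram(missed):
    """Count missed notes per pitch to spot systematic (biased) errors."""
    return Counter(pitch for _, pitch in missed)

# A fast trill-like figure where the transcription drops the inner notes.
gold = [(0.0, 60), (0.1, 62), (0.2, 60), (0.3, 62)]
noisy = [(0.01, 60), (0.31, 62), (0.5, 64)]
matched, missed, extra = compare_notes(gold, noisy)
```

Aggregating the `missed` and `extra` lists over a gold set such as MAESTRO would give the list of errors mentioned above, which could then be grouped by pitch, register, or note density.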

Another comment — if we're looking for alternative sources of data, how about running OMR on IMSLP, or other freely available PDFs out there? Apologies if we talked about this before, I'm remembering we've definitely been down this route before...

quasi steppe Oct 31, 2023, 12:44 AM

#

@hearty flicker
https://gist.github.com/honglu2875/f3a1c78970ad055e758d0a9fa8e09e47
I implemented kv caching here. Only renamed the model.py and sample.py and put them as gist because I haven't carefully written unit tests for the logits with/without kv-caching. But I generated a couple samples and they sound alright so this is likely correct. Also did some other minor optimizations. They speed up the generation by quite a lot.
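
As a sketch of the kind of unit test mentioned above (outputs with and without kv-caching should agree), here is a toy single-head causal attention in pure Python. The projection matrices, function names, and dimensions are all made up for illustration; this is not the code from the gist.

```python
import math

# Toy 2x2 projection matrices; a real model learns these.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.0, 1.0]]
W_V = [[1.0, 0.2], [0.3, 1.0]]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def attend(query, keys, values):
    """Softmax attention of one query over all keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]

def full_recompute(xs):
    """Re-project every prefix token at every step (no cache)."""
    outputs = []
    for t in range(len(xs)):
        keys = [matvec(W_K, x) for x in xs[: t + 1]]
        values = [matvec(W_V, x) for x in xs[: t + 1]]
        outputs.append(attend(matvec(W_Q, xs[t]), keys, values))
    return outputs

def with_kv_cache(xs):
    """Project each token once and append its key/value to a cache."""
    key_cache, value_cache, outputs = [], [], []
    for x in xs:
        key_cache.append(matvec(W_K, x))
        value_cache.append(matvec(W_V, x))
        outputs.append(attend(matvec(W_Q, x), key_cache, value_cache))
    return outputs
```

The cached version does one projection per new token instead of re-projecting the whole prefix, which is where the generation speed-up comes from; the test is simply that both paths return the same outputs.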

I also replaced the RoPE by the one from huggingface NeoX codes. To apply PI or NTK it should be a trivial swap using those in the same file in huggingface repo.
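
For readers unfamiliar with the two context-extension tricks named above, here is a minimal sketch of how PI (Position Interpolation) and NTK-aware scaling change the RoPE rotation angles. It is written from the commonly cited formulas rather than from the Hugging Face code, and the function names are hypothetical.

```python
def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies, one per pair of channels."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rope_angles(position, dim, base=10000.0, pi_scale=1.0):
    """Rotation angles for one position; pi_scale > 1 applies Position Interpolation."""
    return [(position / pi_scale) * f for f in rope_inv_freq(dim, base)]

def ntk_scaled_base(dim, scale, base=10000.0):
    """NTK-aware scaling: enlarge the base instead of shrinking the positions."""
    return base * scale ** (dim / (dim - 2))
```

PI with `pi_scale=2` makes position 4096 look like position 2048 to the model, while NTK-aware scaling leaves the highest frequency untouched and stretches only the low-frequency channels.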

Need to go to bed now but if everything is totally alright I can submit a PR tomorrow.

#

@timber talon
some generations with different CFG

It's really hard to evaluate those......

#

Here is one with CFG < 1 🤔

quasi steppe Oct 31, 2023, 8:58 AM

#

@hearty flicker @timber talon Listen to this one! Make sure you stay until the end! I just interpolated chopin with bach, not perfect but kinda interesting lol
@timber talon relevant to us because I used bach as "negative prompt" but made CFG < 1 (CFG=0.8) so negative of negative is positive (=interpolation)

quasi steppe Oct 31, 2023, 9:14 AM

#

CFG=0.5 interpolation (at the end the logits are only the mean of bach and chopin)
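
A minimal sketch of why CFG < 1 with a negative prompt behaves like interpolation, assuming the usual guidance mix `negative + gamma * (positive - negative)` applied directly to logits. Whether the real script mixes logits or log-probabilities is not stated here, so treat that as an assumption; the function name is hypothetical.

```python
def cfg_logits(positive, negative, gamma):
    """Guidance mix: gamma=1 gives the positive prompt, gamma=0 the negative one."""
    return [n + gamma * (p - n) for p, n in zip(positive, negative)]

chopin = [2.0, 0.0]   # toy logits conditioned on the Chopin prompt
bach = [0.0, 4.0]     # toy logits conditioned on the Bach ("negative") prompt

halfway = cfg_logits(chopin, bach, 0.5)   # element-wise mean of the two
mostly_chopin = cfg_logits(chopin, bach, 0.8)
```

With gamma > 1 the output is pushed away from the negative prompt; with 0 < gamma < 1 it lands between the two, which matches the "mean of bach and chopin" description at gamma = 0.5.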

#

another way of interpolation allowing the end not to be exactly bach. This one is interesting (where is bach lol)

#

a degenerated sample..... empties

#

hmmm this one is quite cool. It somehow stuck to E flat until the end (that bach prelude is C major) but it tries to repeat a chord that relates them

#

This guy has gone mad lol

#

Here are all the samples I tried.
The interpolation script is the following (where I introduced another parameter alpha)
https://gist.github.com/honglu2875/f3a1c78970ad055e758d0a9fa8e09e47#file-sample_interpolate-py
I have some really great ones in it (esp when alpha=0.4) but I don't want to spam the channel too much.

📎 chopin_bach_interpolate.tar

#

This really made me feel that to generate great music we should traverse through the latent space along a path

quasi steppe Oct 31, 2023, 12:02 PM

#

Dynamic YaRN s=8 (scale factor), GEN_LEN=8192. The attention scale uses llama2's parameter so likely not optimal.
The first sample completely broke down somehow...

hearty flicker Oct 31, 2023, 12:02 PM

#

quasi steppe @hearty flicker https://gist.github.com/honglu2875/f3a1c78970ad055e758d0a9...

Great! Maybe we should work off of dev branch instead of main?

#

I don't want to break the notebook if we are going to make architecture changes

quasi steppe Oct 31, 2023, 12:04 PM

#

hearty flicker I don't want to break the notebook if we are going to make architecture changes

yep we should test it on notebook to make sure nothing breaks before merging to main if we do that at all

hearty flicker Oct 31, 2023, 12:04 PM

#

While we are at it we should probably make other changes in this vein

#

Any thoughts on Swiglu and mqa/gqa?

#

Also if there are any other issues with the implementation. I'm pretty sure it's up to date but idk

quasi steppe Oct 31, 2023, 12:06 PM

#

hearty flicker Any thoughts on Swiglu and mqa/gqa?

no idea.. My general feeling is that these architecture changes only have marginal effect when scaling

hearty flicker Oct 31, 2023, 12:06 PM

#

I might as well change swiglu I think, I don't see a reason not to

#

I originally just implemented the changes from llama1

#

@timber talon could we have a meeting about experiment planning? I've got some ideas already

#

If I want to use my department for a survey, I probably have to submit an ethics proposal or something along those lines

#

@quasi steppe have you got the context length extension stuff implemented?

#

I'll create a dev branch and we can merge prs into that

quasi steppe Oct 31, 2023, 12:09 PM

#

hearty flicker @quasi steppe have you got the context length extension stuff implemente...

yes and I'm running it

hearty flicker Oct 31, 2023, 12:09 PM

#

Does it work? lol?

quasi steppe Oct 31, 2023, 12:10 PM

#

quasi steppe Dynamic YaRN s=8 (scale factor), GEN_LEN=8192. The attention scale uses llama2's...

see this lol

#

not really working haha

hearty flicker Oct 31, 2023, 12:10 PM

#

Current checkpoint is 2048 and about 100m params

quasi steppe Oct 31, 2023, 12:10 PM

#

but I haven't double-checked carefully. Also the parameters are for llama2 and likely need tuning

hearty flicker Oct 31, 2023, 12:10 PM

#

I think final version could be 4k/8k context with 200m - 800m params

quasi steppe Oct 31, 2023, 12:11 PM

#

hearty flicker I think final version could be 4k/8k context with 200m - 800m params

Yeah this would be great. You could do a larger one (I can use SAI external cluster) but only if you have enough data to make it worthwhile

#

@hearty flicker what do you think about the interpolation experiments? I feel the quality really improved and it doesn't do boring stuff

hearty flicker Oct 31, 2023, 12:12 PM

#

I think it's very very cool lol

#

My supervisor was going to work on exactly what you implemented, seems you have beaten him to it lol

quasi steppe Oct 31, 2023, 12:13 PM

#

if we have enough context length, we can use like 4-5 prompts and interpolate them in different intervals and eventually come back to the first prompt. That could end up with a complete piece of music

hearty flicker Oct 31, 2023, 12:14 PM

#

Yes and btw in the lit, this is a big unsolved problem.

#

Controllability of these types of models, that is.

#

So these results are very cool

#

I'm going to refactor the inference part of the lib so we keep better track of stuff.

hearty flicker Oct 31, 2023, 12:19 PM

#

quasi steppe if we have enough context length, we can use like 4-5 prompts and interpolate th...

If you want, I could train a 4k version for the time being. Just need access to an A100 really.

#

I have some at my university too but their system is really really annoying to use, so I mainly just use my own server (which has a 4090)

quasi steppe Oct 31, 2023, 12:22 PM

#

hearty flicker If you want, I could train a 4k version for the time being. Just need access to ...

Give me your training codes and scripts and I can throw it into SAI external cluster. We have a lot of A100

hearty flicker Oct 31, 2023, 12:22 PM

#

It's all in the repo actually

quasi steppe Oct 31, 2023, 12:22 PM

#

right now there are 80x A100 idling lol

hearty flicker Oct 31, 2023, 12:22 PM

#

All you need to do is build the datasets, I suppose I could also just give you a dl script for it too

#

I also have access to the SAI cluster, I didn't realise that there was that much idle compute wow

quasi steppe Oct 31, 2023, 12:23 PM

#

hearty flicker I also have access to the SAI cluster, I didn't realise that there was that much...

10 nodes so 80 A100 available

hearty flicker Oct 31, 2023, 12:24 PM

#

Ok I'll tell you what I'll do. I'll update the HOWTO to explain how to train and finetune stuff

#

it's like 3 cli commands so not hard

#

I haven't tested that my train script (accelerate) implementation works in a distributed setting.

quasi steppe Oct 31, 2023, 12:25 PM

#

how many tokens does your dataset have?

hearty flicker Oct 31, 2023, 12:25 PM

#

idk it's like 3gb?

quasi steppe Oct 31, 2023, 12:26 PM

#

if it's usual gpt2 tokenizer in typical LLM that's about 500M tokens I think but I don't know about your tokenizer

#

at this scale it's probably only gonna take 1 single node

hearty flicker Oct 31, 2023, 12:26 PM

#

Nah it's way less than that because of how inefficiently the data is stored.

#

If you install the repo and run python tests/test_data.py you will see what the dataset files look like in tests/test_results

#

The checkpoint you are currently using took 24 hours on a 4090, so I doubt we would ever need more than one node.

#

If you really want to run a full experiment (for pretraining), we could run an experiment on the full dataset (about 50x larger than I've used for tests).

#

That does include non-classical data though, so it would be more of a pretraining checkpoint than anything else

#

I actually have class from 1pm today, but I'll work on this stuff now and get back to you in a few hours.

quasi steppe Oct 31, 2023, 12:31 PM

#

yeah I will play with it later as well. Need to grab lunch

#

Oh you are in Europe as well?

hearty flicker Oct 31, 2023, 12:31 PM

#

I'm in London : )

quasi steppe Oct 31, 2023, 12:31 PM

#

Awesome

quasi steppe Oct 31, 2023, 1:21 PM

#

I'm spending my lunch time listening to all my interpolations.
Man this one is like a real improvisation...

hearty flicker Oct 31, 2023, 1:26 PM

#

It's cool that you are enjoying it lol, I also think it's pretty fun. I really want to create the best possible version of this stuff.

#

I think expanding the dataset will really make a huge difference. That is why I'm so focused on the audio->MIDI stuff rn

#

In theory you could just scale this up to all recorded piano on the internet

#

I mean, even with just 5000 recordings you can get pretty cool stuff

quasi steppe Oct 31, 2023, 1:33 PM

#

hearty flicker If I want to use my department for a survey, I probably have to submit an ethics...

which department are you in by the way?

hearty flicker Oct 31, 2023, 1:33 PM

#

I'm on an AI-Music CDT

#

https://www.aim.qmul.ac.uk/

quasi steppe Oct 31, 2023, 1:34 PM

#

Oh Queen Mary, I have a professor friend there

hearty flicker Oct 31, 2023, 1:35 PM

#

Only a month in really, before this I was working on this stuff whilst being a software engineer / data scientist

#

Before that I did a ug/masters in mathematics (geometry) at Imperial

quasi steppe Oct 31, 2023, 1:36 PM

#

cool!

timber talon Oct 31, 2023, 2:34 PM

#

Oh man @hearty flicker you got @quasi steppe interested, this project will probably wrap up in a few weeks lol 😂 just like CFG — that took 3 weeks total

#

Sure, I can meet whenever. I’m pretty free today, tomorrow. Only thing is I’m in California

hearty flicker Oct 31, 2023, 2:47 PM

#

timber talon Sure, I can meet whenever. I’m pretty free today, tomorrow. Only thing is I’m in...

Could we do tomorrow? I'm good for a meeting at anytime 6pm-9pm GMT? If that is good with you

#

@quasi steppe I should probably merge the fine-tuning code before running another experiment, should be sometime today I'll let you know

hearty flicker Oct 31, 2023, 4:10 PM

#

fine-tuning code is merged, you can also find a decent explanation of how to train a model in the 'quick overview' section in HOWTO.md

#

This is a good dataset - https://github.com/bytedance/GiantMIDI-Piano

#

I can also share others on the gcp bucket if people want : )

#

In order to not break the notebook until the next checkpoint is available, it's probably best to merge prs into the dev branch for now - https://github.com/EleutherAI/aria/tree/dev

quasi steppe Oct 31, 2023, 5:13 PM

#

Submitted a PR for all my stuff. It doesn't have to merge right away. I can test a bit more later.
Also added a bit of CFG variants that @timber talon we never tried in our paper such as letting gamma vary across the autoregressive decoding (See https://github.com/EleutherAI/aria/pull/55/files#diff-8fc4d42f9a791e15c020814665ac07b89c72055cb542291ad12f728eada25ffdR22-R38). It leads to my interpolation experiments by setting cfg_gamma < 1.
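
To illustrate what "letting gamma vary across the autoregressive decoding" can mean, here is a minimal sketch using a linear schedule. The linear shape, the default endpoints, and all names are assumptions for illustration; the schedule actually used in the PR is not reproduced here.

```python
def linear_gamma(step, total_steps, gamma_start=1.0, gamma_end=0.0):
    """Hypothetical schedule: move gamma linearly from start to end over decoding."""
    if total_steps <= 1:
        return gamma_end
    frac = step / (total_steps - 1)
    return gamma_start + frac * (gamma_end - gamma_start)

def mix_logits(positive, negative, gamma):
    """Guidance mix on logits: gamma=1 is the positive prompt, gamma=0 the negative."""
    return [n + gamma * (p - n) for p, n in zip(positive, negative)]

def scheduled_logits(positive, negative, step, total_steps):
    """Logits for one decoding step under the step-dependent gamma."""
    return mix_logits(positive, negative, linear_gamma(step, total_steps))
```

Under this schedule the first generated tokens follow the positive prompt and the last ones follow the other prompt, with a gradual hand-over in between.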

timber talon Oct 31, 2023, 5:59 PM

#

I know!! I’ve been thinking about that

#

I wonder how well we could learn the autoregressive hyperparameter

#

Or whether there’s a parameter for the a section and b section

#

@hearty flicker I’m free any time in that window. 6pm gmt is fine. @quasi steppe are you free to join?

quasi steppe Oct 31, 2023, 6:32 PM

#

timber talon @hearty flicker I’m free any time in that window. 6pm gmt is fine. <@82312...

When? Yeah I'm free tonight

timber talon Oct 31, 2023, 6:40 PM

#

hearty flicker Could we do tomorrow? I'm good for a meeting at anytime 6pm-9pm GMT? If that is ...

I think @hearty flicker said tomorrow?

hearty flicker Oct 31, 2023, 6:49 PM

#

I'm off tonight actually, I was planning on tomorrow. I'll merge the pr tomorrow @quasi steppe

quasi steppe Oct 31, 2023, 7:23 PM

#

oh sorry I didn't check. Yeah tomorrow sounds good

hearty flicker Nov 1, 2023, 12:23 PM

#

Ok guys let's do 6pm today? lmk if 7 is better for you @timber talon

#

I'm writing a small tool on top of spotify_dl for downloading and keeping track of all of the music (for the transcription stuff).

hearty flicker Nov 1, 2023, 3:13 PM

#

Actually guys let's reschedule for another day if possible. Some irl stuff has come up that I need to deal with tonight...

timber talon Nov 1, 2023, 3:33 PM

#

oh shoot ok no worries

#

let's see. I'm probably doing an all-day train trip either tmrw or Friday. But I'm free whatever day I don't do it. And can do earlier too, like 5pm GMT

hearty flicker Nov 1, 2023, 3:36 PM

#

Ok that sounds good, let us know which day you have free and we can try to schedule a time that works for @quasi steppe as well. I'm free pretty much every evening apart from tonight.

quasi steppe Nov 1, 2023, 3:38 PM

#

yeah me too. Evening is good with me

timber talon Nov 1, 2023, 3:47 PM

#

ok i'll let you know as soon as I decide. Will likely be in the next few hours

hearty flicker Nov 1, 2023, 3:49 PM

#

Right now I'm compiling a huge list of spotify links to various classical piano recordings

#

I reckon I should be able to pretty easily 5x the GiantMIDI dataset. Really depends how much of this mind numbing work I can take lol

timber talon Nov 1, 2023, 3:50 PM

#

what does it entail?

#

I'm happy to split some mind-numbing work

hearty flicker Nov 1, 2023, 3:50 PM

#

I actually am a bit burnt out on coding rn, so it's good for me haha

timber talon Nov 1, 2023, 3:50 PM

#

are you literally clicking "download" for classical piano albums lol?

hearty flicker Nov 1, 2023, 3:52 PM

#

Basically the pipeline will be: compile a huge list of links to spotify albums -> use spotify_dl to download as many as possible -> transcribe them to MIDI using (Kong et al.)

timber talon Nov 1, 2023, 3:52 PM

#

ok cool

#

lmk if you need help, whether you wanna set up a google doc or something

hearty flicker Nov 1, 2023, 3:52 PM

#

I'm going to write a proper script so that it can be managed properly and we won't have to redownload from scratch etc.
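
One way such a script could avoid re-downloading is a small JSON manifest keyed by link. The sketch below is only an illustration under that assumption: the file layout, field names, and helper names are hypothetical, and the actual download and transcription steps are left out.

```python
import json
from pathlib import Path

def load_manifest(path):
    """Read the manifest if it exists, otherwise start with an empty one."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

def save_manifest(path, manifest):
    """Write the manifest back to disk in a stable, diff-friendly order."""
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True))

def pending_links(links, manifest):
    """Links that have not been marked as done yet."""
    return [u for u in links if manifest.get(u, {}).get("status") != "done"]

def mark_done(manifest, link, **metadata):
    """Record a finished download together with any tags (pianist, composer, ...)."""
    manifest[link] = {"status": "done", **metadata}
```

A run would load the manifest, loop over `pending_links(...)`, call the downloader for each link, then `mark_done` and `save_manifest` after every success so an interrupted run can resume where it stopped.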

#

I'm a bit burnt from coding today though, so I may as well start compiling the list haha

#

I think the best way to be systematic and efficient is to go by pianist instead of by composer. Easier to navigate that way I think.

timber talon Nov 1, 2023, 3:54 PM

#

aw man i know that feeling haha

hearty flicker Nov 1, 2023, 3:54 PM

#

But you have to be careful because you can only use piano-only recordings, no concertos or anything

timber talon Nov 1, 2023, 3:55 PM

#

tricky, yeah... links can be whole albums, or just songs?

hearty flicker Nov 1, 2023, 3:55 PM

#

With the GiantMIDI dataset, they took a lot of care to only have 1 version of each piece. For our purposes, this is actually not what we want at all

#

As many different versions of each piece as possible!

#

That's the best version of data-augmentation lol

timber talon Nov 1, 2023, 3:55 PM

#

yeah... Yuja Wang has so many recordings of the same Rachmaninoff pieces lol

hearty flicker Nov 1, 2023, 3:56 PM

#

I think that actually transcribing these recordings might be very GPU intensive, I might be able to persuade the head audio guy at SAI to help out with compute if it's too demanding for EAI.

#

My supervisor knows him decently well apparently

timber talon Nov 1, 2023, 3:57 PM

#

cool. so why don't you shoot me a google docs link with what you have already, we can throw composers on there and divvy it up. I need a few hours this morning to get some models running for an unrelated thing, but should be free soon

#

ok

hearty flicker Nov 1, 2023, 3:57 PM

#

Cool : )

#

We just have to make sure that it's piano only, no rach2 or anything like that

timber talon Nov 1, 2023, 3:58 PM

#

for sure

hearty flicker Nov 1, 2023, 3:58 PM

#

because then the models will try to transcribe the other instruments as piano too lmao

timber talon Nov 1, 2023, 3:59 PM

#

hmm. I know we can find underutilized clusters.... what memory requirements are we talking about here? Just because of high-batch? or even batch size=1 is high mem?

hearty flicker Nov 1, 2023, 3:59 PM

#

I'm not even sure yet. Apparently the GiantMIDI dataset took 300 GPU hours or something like that.

#

Tbh I'm not super familiar with audio stuff yet so I'm not sure.

#

I'm going to be researching this over the next few weeks pretty hardcore though

sand nymph Nov 1, 2023, 4:00 PM

#

What are these 300 GPU hours for? Data augmentation?

hearty flicker Nov 1, 2023, 4:00 PM

#

sand nymph What are these 300 GPU hours for? Data augmentation?

For transcribing an (audio) .mp3 file into a (symbolic) MIDI file

tender dragon Nov 1, 2023, 4:01 PM

#

hearty flicker As many different versions of each piece as possible!

the great thing about this procedure is that re-weighting the pieces to de-dupe isn't even necessary. more famous (better) pieces are played more often, so they deserve to be predicted better. the weighting bias gets you a little bit of quality information

hearty flicker Nov 1, 2023, 4:02 PM

#

The SOTA for doing this transcription uses a neural net

timber talon Nov 1, 2023, 4:02 PM

#

tender dragon the great thing about this procedure is that re-weighting the pieces to de-dupe ...

I don't know that I agree with this

sand nymph Nov 1, 2023, 4:02 PM

#

300 what hours? A100? A40? 2080 Ti?

hearty flicker Nov 1, 2023, 4:03 PM

#

I'm not sure, I'll look into it. Here is the reference if anyone is interested - https://arxiv.org/abs/2010.07061

timber talon Nov 1, 2023, 4:04 PM

#

i would bet it's smaller GPUs. audio-models tend to be smaller CNNs

sand nymph Nov 1, 2023, 4:04 PM

#

I expect this to be non-problematic to run

timber talon Nov 1, 2023, 4:04 PM

#

whisper for instance can run on 12GB gpus

hearty flicker Nov 1, 2023, 4:04 PM

#

I'm aiming to get this setup in an async way, so we can transcribe the audio while doing other stuff that needs to get done

timber talon Nov 1, 2023, 4:04 PM

#

we have loads and loads of those just sitting around

hearty flicker Nov 1, 2023, 4:04 PM

#

That's why I'm working on it now

#

Yeah I'm pretty sure vram isn't really an issue, so a100s are not really needed.

sand nymph Nov 1, 2023, 4:05 PM

#

Cool, we can easily throw 16 2080s at this and do it in a day
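
The back-of-the-envelope behind "do it in a day", as a tiny sketch. It assumes the roughly 300 GPU hours quoted above, perfect parallelism, and that a 2080 is about as fast as whatever GPU that figure was measured on; none of these is confirmed in the thread, and `relative_speed` is a hypothetical knob for correcting the last assumption.

```python
def wall_clock_hours(gpu_hours, n_gpus, relative_speed=1.0):
    """Ideal wall-clock time when the work splits evenly across GPUs."""
    return gpu_hours / (n_gpus * relative_speed)

hours = wall_clock_hours(300, 16)   # 18.75 hours under the stated assumptions
```

A dataset 5x the size of GiantMIDI would scale this linearly, so the same 16 GPUs would need roughly four days instead of one.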

hearty flicker Nov 1, 2023, 4:05 PM

#

Yeah that would be cool, thanks @sand nymph

timber talon Nov 1, 2023, 4:06 PM

#

definitely.... I'm sure we all have so many of those clusters just sitting around. another lab group at uni tried to donate some 2080s to the general university cluster, and no one even wanted to maintain them

timber talon Nov 1, 2023, 4:08 PM

#

tender dragon the great thing about this procedure is that re-weighting the pieces to de-dupe ...

this is a statement that needs to be unpacked. famous = better? better according to whom? "deserve"? lots of value-driven words here. And totally neglects the idea of more niche styles. Obviously, in the extreme case, a HEAVILY duplicated dataset would lead to a degenerate network that just generates a few of the same pieces

hearty flicker Nov 1, 2023, 4:09 PM

#

I find it funny that there are only like 50-100 Chopin recordings in the current testing dataset

timber talon Nov 1, 2023, 4:09 PM

#

clearly with a large enough dataset, we're not gonna be too bad, but there is the risk that everything we generate will start to sound like rachmaninoff, no matter what the prompt is

hearty flicker Nov 1, 2023, 4:09 PM

#

Of just the etudes alone, imagine how many recordings exist lol

#

That's definitely an issue, the nice thing about using spotify_dl is that I can tag everything with meta-data

#

So we will have a rough sense of the composition of the dataset. In Kong et al. they do this type of analysis too I believe.

timber talon Nov 1, 2023, 4:12 PM

#

i agree there's a lot of potential, here! i worry a bit about transcription noise, but i'm looking at the Giant-MIDI paper and those eval numbers are pretty impressive

#

here's the github of the transcription system they use fyi: https://github.com/bytedance/piano_transcription

#

idk if you already posted it, if you did, my bad

#

and the paper specifically describing the transcription model: https://arxiv.org/pdf/2010.01815.pdf

quasi steppe Nov 1, 2023, 4:17 PM

#

it works for any mp3? I'm thinking about some non-public sources

timber talon Nov 1, 2023, 4:17 PM

#

there's nothing in the paper about inference memory requirements. they just say training takes 1 V100 card, 32GB

#

and the model is 20m params

tiny coral Nov 1, 2023, 4:22 PM

#

Since you're looking at gathering data from more sources, is there any interest in exploring semantic data deduplication? I.e. create an embedding model trained on tokenized midi to measure similarity
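
A minimal sketch of what embedding-based deduplication could look like once such an embedding model exists. The embeddings here are toy vectors, the 0.95 cosine threshold is an arbitrary assumption, and the greedy keep-first strategy is just one simple option.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(embeddings, threshold=0.95):
    """Greedy: keep an item only if it is not too similar to any item already kept."""
    kept = []
    for name, vec in embeddings.items():
        if all(cosine(vec, embeddings[k]) < threshold for k in kept):
            kept.append(name)
    return kept

toy = {"take_1": [1.0, 0.0], "take_2": [0.99, 0.01], "other_piece": [0.0, 1.0]}
unique = dedupe(toy)
```

This compares every new item against all kept items, so it is quadratic in the worst case; a larger dataset would likely need an approximate nearest-neighbour index instead.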

hearty flicker Nov 1, 2023, 4:22 PM

#

I mean, if nothing else we should be able to just expand GiantMIDI

hearty flicker Nov 1, 2023, 4:23 PM

#

timber talon i agree there's a lot of potential, here! i worry a bit about transcription nois...

I worry about this too, but modern transcription models are actually surprisingly good

tender dragon Nov 1, 2023, 4:24 PM

#

timber talon this is a statement that needs to be unpacked. famous = better? better according...

I won't address the first half of this, but I agree with the second half. thinking back on what I heard from spotify, some of the more famous pieces are heavily over-represented in recordings. so I believe the duplication is enough to cause problems and re-weighting is necessary, contrary to my original opinion
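
One simple form the re-weighting could take is to down-weight each recording by how many recordings of the same piece exist. This is a sketch under the assumption that every recording carries a piece identifier; the `alpha` exponent and the function name are hypothetical.

```python
from collections import Counter

def recording_weights(piece_ids, alpha=1.0):
    """Weight each recording by count(piece) ** -alpha.

    alpha=0 keeps the raw frequencies; alpha=1 gives every piece the same total weight.
    """
    counts = Counter(piece_ids)
    return [counts[p] ** -alpha for p in piece_ids]

weights = recording_weights(["etude_a", "etude_a", "sonata_b"])
```

Values of alpha between 0 and 1 give a compromise: heavily recorded pieces still count for more, but not in proportion to how often they were recorded.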

tiny coral Nov 1, 2023, 5:13 PM

#

I recently noticed a data quality issue in GiantMIDI. Some youtube channels that are included in the data have outro tracks unrelated to the main piece. In retrospect, these explain a common strange behavior from the RWKV models I trained some time ago. For example, this is one of the source audios for the dataset: https://www.youtube.com/watch?v=Zj_psrTUW_w
I have confirmed that this is all transcribed into the raw midi file.

#

I suspect we will also find single files containing multiple movements in series. I'm going to add a step in my own preprocessor that splits large delays into multiple documents, and add a minimum document length filter.
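
A minimal sketch of that preprocessing step, assuming notes are (onset_seconds, offset_seconds, pitch) tuples sorted by onset. The 10 second gap and the minimum note count are placeholder values, not the ones used in the actual preprocessor.

```python
def split_on_gaps(notes, max_gap=10.0, min_notes=4):
    """Split a note list wherever silence exceeds max_gap, then drop short documents."""
    documents, current, last_end = [], [], None
    for note in notes:
        onset, offset, _pitch = note
        if current and onset - last_end > max_gap:
            documents.append(current)   # long silence: close the current document
            current = []
        # Track the latest sounding note end within the current document.
        last_end = offset if not current else max(last_end, offset)
        current.append(note)
    if current:
        documents.append(current)
    return [d for d in documents if len(d) >= min_notes]

# Three notes of a piece, 27 seconds of silence, then a two-note outro.
track = [(0, 1, 60), (1, 2, 62), (2, 3, 64), (30, 31, 60), (31, 32, 62)]
docs = split_on_gaps(track, max_gap=10.0, min_notes=3)
```

The gap is measured from the end of the last sounding note rather than from its onset, so a long held chord is not mistaken for silence.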

hearty flicker Nov 1, 2023, 5:35 PM

#

tiny coral I recently noticed a data quality issue in GiantMIDI. Some youtube channels that...

There is actually lots of duplication in GiantMIDI too. I'm not exactly sure why, but many files with different titles are actually the same.

timber talon Nov 1, 2023, 6:35 PM

#

i think it's because they download from Youtube, through some complicated pipeline where they first scrape IMSLP and then query youtube. they have a whole section on their error rate... section 5.2

#

i have a feeling that @hearty flicker 's approach with spotify will address both the outro issue and the mislabeled titles issue

#

I'm betting spotify's data is better quality than Youtube

hearty flicker Nov 1, 2023, 6:36 PM

#

Yeah I also skimmed the paper a few hours ago

#

I feel like spotify_dl is a much better approach

timber talon Nov 1, 2023, 6:37 PM

#

for sure

#

(what's the licensing if we do that? it's probably like, don't release anything, right?)

hearty flicker Nov 1, 2023, 6:37 PM

#

I've just tested it out and it's surprisingly good, it gets the correct mp3 like 99% of the time

timber talon Nov 1, 2023, 6:37 PM

#

cool!

hearty flicker Nov 1, 2023, 6:38 PM

#

I mean it's all technically downloaded from youtube too, it's just done in a way where it only downloads if there is a complete match

#

They released the mp3 files for GiantMIDI, however I would be skeptical doing that myself.

timber talon Nov 1, 2023, 6:40 PM

#

spotify_dl downloads from youtube? oh sorry, my bad

hearty flicker Nov 1, 2023, 6:41 PM

#

Yeah it does, but only complete matches. Whoever made it did a really good job

timber talon Nov 1, 2023, 6:41 PM

#

gotcha, sorry was misunderstanding, but yeah... it sounds better than GiantMIDI's approach

hearty flicker Nov 1, 2023, 7:06 PM

#

I also think it's good to remember that ideally this stuff will be used for pre-training, higher quality datasets can be used to tune the specifics. My experience is that this sort of approach works in practice.

timber talon Nov 1, 2023, 8:23 PM

#

got it, that makes sense

sharp quiver Nov 1, 2023, 8:26 PM

#

Why use spotify_dl rather than anything else? From reading the conversation, I'm assuming piracy is not an issue, so why not just download the actual rips from CDs?

quasi steppe Nov 1, 2023, 8:32 PM

#

sharp quiver Why use spotify_dl rather than anything else? From reading the conversation, I'm...

I actually have a lot of those rips but mostly for my own convenience

#

My guess is that there might be legal risks to circulate those data empties

sharp quiver Nov 1, 2023, 8:37 PM

#

I mean, if spotify_dl gets exact matches, it's probably getting official uploads which are probably identical to rips except degraded

#

XD

sharp quiver Nov 1, 2023, 8:39 PM

#

quasi steppe I actually have a lot of those rips but mostly for my own convenience

Me too 😉

quasi steppe Nov 1, 2023, 8:41 PM

#

sharp quiver XD

hmmmm that looks very familiar thinkies

#

I bought a 1T hard drive 11 years ago just to save my collections

sharp quiver Nov 1, 2023, 8:46 PM

#

Reminds me I need to rip the Bach collection someone gave me a while ago...
It's the full set, 155 CDs berk I'm not even a Bach kind of guy

hearty flicker Nov 2, 2023, 12:02 AM

#

sharp quiver I mean, if spotify_dl gets exact matches, it's probably getting official uploads...

I think it would be less convenient to download rips. The reason I'm using spotify_dl atm is because it's surprisingly good lol, I'm not entirely sure how/why. I highly encourage other people try it for themselves ha

#

Whoever made this tool was pretty smart, I might look into the src tomorrow to see how exactly it's finding these matches

minor hamlet Nov 2, 2023, 10:14 AM

#

Thought i would ask here, whats the best way to get midi embeddings? Similar to say a t5 encoder module for text.

#

could I get some from the rwkv model?

#

im not too sure how that works however

versed trench Nov 2, 2023, 3:00 PM

#

@quasi steppe hey there! Do you have any other opensource implementations for your cfg experiments? Can take other discussions to DMs if needed

quasi steppe Nov 2, 2023, 3:25 PM

#

versed trench @quasi steppe hey there! Do you have any other opensource implementation...

Hi! Do you look for the code or some concrete experiments?
See DM

hearty flicker Nov 3, 2023, 3:01 PM

#

@timber talon nice job on the albums : )

timber talon Nov 3, 2023, 3:09 PM

#

Oh thanks haha WIP

hearty flicker Nov 3, 2023, 3:22 PM

#

I like this piece, it might make an interesting prompt

#

There is also audio post-processing which makes everything sound way better.

#

@here This sample is pretty cool! This prompt works well.

#

The audio post-processing also goes a long way

hearty flicker Nov 3, 2023, 4:46 PM

#

https://colab.research.google.com/drive/1SmwmsSf92Bv30algvZ-D4rW8dtH0kJNL?usp=sharing

#

Here is an updated version of the notebook that will automatically convert to mp3 (you can listen inside the notebook)

hearty flicker Nov 3, 2023, 6:27 PM

#

Ok I just put the orchestral version of Rach 2 (2nd movement) and the results are weird as hell haha

timber talon Nov 3, 2023, 6:35 PM

#

Darn, i mainly listen to music by composer, not by performer.
But compiling this spotify list, it makes sense to be performer-centric. It's CRAZY how many of the biggest performers all play the same few composers

#

Like, Chopin over and over again. A bit of Schubert. Rachmaninoff. Repeat. lol

hearty flicker Nov 3, 2023, 6:35 PM

#

timber talon Darn, i mainly listen to music by composer, not by performer. But compiling thi...

This is exactly what I was saying last night too haha, I found this too

timber talon Nov 3, 2023, 6:36 PM

#

I'm gonna try to get some Bach on there. Glenn Gould is the undisputed master but he hums along with most of his tracks......

hearty flicker Nov 3, 2023, 6:36 PM

#

btw do not under any circumstances put Autumn Leaves into the model, if you want to keep your ears

timber talon Nov 3, 2023, 6:36 PM

#

lmao

hearty flicker Nov 3, 2023, 6:37 PM

#

To be fair, it does surprisingly well given the circumstances lol

hearty flicker Nov 5, 2023, 12:50 AM

#

Once we get the inference library built out and the context length extended, I think my dream of a never ending fugue might become a reality ha!

#

hearty flicker Nov 5, 2023, 12:15 PM

#

There is a bug with the MIDI processing which is resulting in some notes being played as staccato when they shouldn't be. Gonna look into this tomorrow morning...

#

I thought I fixed this bug ages ago but obviously not...

hearty flicker Nov 5, 2023, 2:38 PM

#

These are nice - https://twitter.com/loubbrad/status/1721167745679552851

hearty flicker Nov 5, 2023, 3:07 PM

#

Schubert, this one has pretty decent long term structure before the end

hearty flicker Nov 6, 2023, 12:46 PM

#

Hey @here, the following is a rough list of things for the next week:

- I'm going to fix some bugs related to MIDI conversion, which occasionally cause weird staccato notes.
- Adding automatic audio processing to the inference library. Additionally, I'll refactor this part of the codebase to make it more sustainable.
- @timber talon and I are going to work on experiment planning. I'll try to create a first draft (Google Doc) today or tomorrow.
- @quasi steppe and I are going to conduct a pre-training experiment. I need to make some changes to the TokenizedDataset code first. After that, we'll aim to run an experiment on the SAI cluster this week. It might be cool to use a slightly larger dataset for this purpose. While we're at it, we might as well conduct some experiments with fine-tuning.
- @timber talon and I are going to continue compiling the list of Spotify links. Great work on this, by the way, @timber talon! 🙂
- I'm going to write the code for downloading the audio (spotify_dl) and the code for performing the audio->MIDI transcription in parallel on multiple GPUs.
- There are several composer people that I'm going to reach out to, to see if they are interested in the project. There is one guy in particular (https://www.youtube.com/@cedarvillemusic) who might be interested in the Fugue stuff. This might also tie into the paper, as @timber talon suggested in the meeting.

hearty flicker Nov 6, 2023, 1:55 PM

#

Chopin - minute waltz

quasi steppe Nov 6, 2023, 2:24 PM

#

hearty flicker Hey @here, the following is a rough list of things for the next week: - I'm goi...

I made the dataset according to what I proposed, using the Huggingface datasets API. It's a fixed parquet file with samples duplicated 20x and concatenated, regrouped (2048 length) and shuffled. I also applied your data augmentations. It's only 2 GB thanks to the parquet format, and we have about 1.6B tokens in it.

But I really like your dataset code and we don't have to make drastic changes. I can use the Huggingface API to quickly train the medium model using my dataset, and let's see if it solves the quality degradation problem towards the end.
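
A rough sketch of the duplicate, concatenate, regroup and shuffle construction described here, in plain Python. The function name, the `augment` hook and the toy token lists are placeholders for illustration; this is not the actual Huggingface pipeline or the aria code:

```python
import random


def build_packed_dataset(samples, seq_len=2048, num_copies=20, augment=None, seed=0):
    """Duplicate, concatenate, regroup into fixed-length chunks, then shuffle.

    samples: list of token lists (one per piece).
    augment: optional callable applied to each copy of each sample
             (a stand-in for the tokenizer's augmentation functions).
    """
    rng = random.Random(seed)
    stream = []
    for _ in range(num_copies):
        for tokens in samples:
            copy = list(tokens)
            if augment is not None:
                copy = augment(copy)
            stream.extend(copy)

    # Regroup the flat stream into chunks of exactly seq_len tokens;
    # the incomplete tail is dropped.
    num_chunks = len(stream) // seq_len
    chunks = [stream[i * seq_len:(i + 1) * seq_len] for i in range(num_chunks)]
    rng.shuffle(chunks)
    return chunks
```

Because every chunk has the same length, no padding is needed at training time, at the cost of chunks that can straddle piece boundaries.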

hearty flicker Nov 6, 2023, 2:25 PM

#

If you can do it without code changes, then we can just use that!

#

The main reason I have MidiDataset and TokenizedDataset separated out is so that we have freedom to change TokenizedDataset

quasi steppe Nov 6, 2023, 2:26 PM

#

hearty flicker If you can do it without code changes, then we can just use that!

Basically hugging face dataset is just a generator. You can plug it into torch.utils.data.DataLoader and get the one for training.

hearty flicker Nov 6, 2023, 2:26 PM

#

Since TokenizedDataset is only used for generating training datasets

quasi steppe Nov 6, 2023, 2:27 PM

#

hearty flicker Since TokenizedDataset is only used for generating training datasets

Yeah I actually like the idea of doing all the stuff on-the-fly

#

it's just that hugging face api is good for quick-and-dirty stuff 😂
But their datasets api is solid. The huggingface trainer is incredibly slow but with our scale it will be alright.

hearty flicker Nov 6, 2023, 2:28 PM

#

TokenizedDataset is pretty similar to the hf datasets API.

#

It's actually in the backlog to make it inherit from that class

#

I just haven't gotten around to it yet haha

#

Ok cool : )

#

1.6b tokens is quite a lot haha

#

How many sets of data augmentation did you try?

#

Or is the 1.6b without any data augmentation in there

quasi steppe Nov 6, 2023, 2:30 PM

#

I just followed the

[
  tokenizer.export_chord_mixup(),
  tokenizer.export_velocity_aug(1),
  tokenizer.export_pitch_aug(5),
  tokenizer.export_tempo_aug(0.2),
]

in your code

#

Did it for every sample

hearty flicker Nov 6, 2023, 2:31 PM

#

Fyi I think if you are using hugging face's dataset class you can use this https://huggingface.co/docs/datasets/process.html#format-transform

#

I've replicated it in TokenizedDatasets with the same name

quasi steppe Nov 6, 2023, 2:32 PM

#

I actually had to write a little custom mp code for these. Somehow huggingface doesn't like the nested tuples and lists (after tokenize)

hearty flicker Nov 6, 2023, 2:32 PM

#

Datasets.set_transform()

quasi steppe Nov 6, 2023, 2:32 PM

#

oh those don't work after tokens are encoded into numbers

#

and huggingface datasets doesn't like nested list/tuples

#

😂

hearty flicker Nov 6, 2023, 2:33 PM

#

No they don't! That's kind of why I made my own class instead of using HF

#

I'm still trying to fix the staccato note bug, so we should wait to build the pre-training dataset until that is fixed

#

I'll push the fix to dev

quasi steppe Nov 6, 2023, 2:34 PM

#

awesome!

hearty flicker Nov 6, 2023, 2:34 PM

#

I think I have it figured out, but it's not quite working yet.

#

MIDI is such a shitty file format hahaah

#

full of eccentricities : )

quasi steppe Nov 6, 2023, 2:36 PM

#

I will figure out training in the meantime. Hugging face trainer works with deepspeed and I should be able to set up something quick for multiple gpus

#

we can eventually improve the code base with or without those stuff but I'm really curious and want some quick-and-dirty results first 😂

quasi steppe Nov 6, 2023, 3:39 PM

#

Wow it's fast. Got a few checkpoints already

hearty flicker Nov 6, 2023, 3:43 PM

#

quasi steppe Wow it's fast. Got a few checkpoints already

What model size / how many gpus are you training on?

quasi steppe Nov 6, 2023, 3:46 PM

#

hearty flicker What model size / how many gpus are you training on?

8xA100, using exactly your medium config (so that we can compare)

hearty flicker Nov 6, 2023, 3:46 PM

#

It should train incredibly fast on tht

#

that*

#

The last checkpoint I did was 24hrs on 1x4090

quasi steppe Nov 6, 2023, 3:47 PM

#

the tflops stat is bad... I never had good experiences with huggingface's stuff

hearty flicker Nov 6, 2023, 3:47 PM

#

In my training code?

quasi steppe Nov 6, 2023, 3:47 PM

#

no

hearty flicker Nov 6, 2023, 3:47 PM

#

Yeah I think it's not correct

#

Oh

quasi steppe Nov 6, 2023, 3:47 PM

#

huggingface is bad

hearty flicker Nov 6, 2023, 3:47 PM

#

Ah

#

How many tflops are you getting?

quasi steppe Nov 6, 2023, 3:48 PM

#

Also the model is very small. This could also impact the efficiency

#

I did it manually and got about 100+ tflops but we got 8xA100 lol

hearty flicker Nov 6, 2023, 3:48 PM

#

I think it's measuring per gpu btw

quasi steppe Nov 6, 2023, 3:48 PM

#

no I calculate by hand haha

hearty flicker Nov 6, 2023, 3:48 PM

#

100tflops on 8xa100 must be wrong

#

Because I was getting like 80 on my 4090

quasi steppe Nov 6, 2023, 3:49 PM

#

I'm using huggingface Trainer class + accelerator to do the training.

hearty flicker Nov 6, 2023, 3:49 PM

#

It must be per gpu, if not that is insanely low lol

#

There must be something seriously wrong in the code

#

We should defoo fix all of this before we do the proper training run...

quasi steppe Nov 6, 2023, 3:50 PM

#

ok the batch size per device is 16, and we get 2 batches per second

#

I will check my math haha
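
For anyone wanting to redo this arithmetic, here is a back-of-the-envelope sketch using the common "about 6 x parameters FLOPs per token" rule of thumb for a forward plus backward pass. The parameter count below is a made-up placeholder (the size of the medium config is not stated in this thread), and it assumes the "2 batches per second" figure is per device:

```python
def estimate_tflops(n_params, batch_per_device, seq_len, batches_per_sec, n_devices=1):
    """Approximate training throughput in TFLOP/s.

    Uses the ~6 * N FLOPs-per-token approximation for a dense transformer
    (forward + backward); attention FLOPs are ignored.
    """
    tokens_per_sec = batch_per_device * seq_len * batches_per_sec * n_devices
    return 6 * n_params * tokens_per_sec / 1e12


# Hypothetical 100M-parameter model, batch 16 per device, 2048-token
# sequences, 2 batches/s on each of 8 devices.
total = estimate_tflops(100e6, 16, 2048, 2, n_devices=8)
per_gpu = total / 8
print(f"total: {total:.1f} TFLOP/s, per GPU: {per_gpu:.1f} TFLOP/s")
```

Whether a reported number is total or per GPU is exactly a factor of `n_devices`, which is easy to lose when computing by hand.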

hearty flicker Nov 6, 2023, 3:50 PM

#

I think with medium on my 4090 with batch size 8 I was getting 5its/s

quasi steppe Nov 6, 2023, 3:50 PM

#

can easily forget a zero or something

#

what? LOL

#

last year when I used huggingface Trainer I actually got something similar and never figured out what's wrong

hearty flicker Nov 6, 2023, 3:52 PM

#

Yeah I don't use trainer, only accelerate

#

I'll sort this out when I figure out how to use SAI cluster lol

#

gotta read the docs

quasi steppe Nov 6, 2023, 3:52 PM

#

to carefully do it we should just use torchrun FSDP

#

I had good experiences with that combo.

#

I don't know what benefit accelerate has

hearty flicker Nov 6, 2023, 3:53 PM

#

Because the model size is so low, I don't actually think any sharding is required

#

Can just use DP

quasi steppe Nov 6, 2023, 3:53 PM

#

hearty flicker Because the model size is so low, I don't actually think any shardings is requir...

no we will do data parallel

#

I vaguely remember in torch docs they say everything can be wrapped with FSDP

#

even if you want to do dp only

#

I could be wrong though. Or we use DDP. I remember that DP is not supposed to be used.

#

~~hmm the model still gets repetitive towards the end~~ maybe not that bad....

hearty flicker Nov 6, 2023, 4:06 PM

#

I've realised what my bug is too, but it's actually pretty annoying to fix

hearty flicker Nov 6, 2023, 6:00 PM

#

The staccato bug is fixed now, it wasn't a problem with the tokenizer, just with midi.dict_to_midi

#

So it should not affect the model in any way @quasi steppe

quasi steppe Nov 6, 2023, 6:01 PM

#

This is awesome

#

I just submitted a couple jobs for 400m and 2b models. I will play a bit more with hparams

hearty flicker Nov 6, 2023, 6:02 PM

#

If you wana do bigger jobs, do you want some more data?

#

I can give you access to the gcp bucket holding the data

quasi steppe Nov 6, 2023, 6:03 PM

#

hearty flicker If you wana do bigger jobs, do you want some more data?

With small model I suspect the data is already not enough

hearty flicker Nov 6, 2023, 6:03 PM

#

There are some larger collections I have already zipped

#

But they contain other data such as pop etc...

quasi steppe Nov 6, 2023, 6:03 PM

#

2B tokens is very, very small for pretraining, plus it's already like 20 epochs

quasi steppe Nov 6, 2023, 6:04 PM

#

hearty flicker But they contain other data such as pop etc...

How much more do you have compared to that giant midi v1.2 dataset?

hearty flicker Nov 6, 2023, 6:05 PM

#

Giant is 10k files

#

I have over 200k files

#

But a lot of it is pop music

quasi steppe Nov 6, 2023, 6:05 PM

#

Oh that's nice

hearty flicker Nov 6, 2023, 6:05 PM

#

Randomly scraped from the net

#

so lower quality

quasi steppe Nov 6, 2023, 6:05 PM

#

Gotcha

hearty flicker Nov 6, 2023, 6:05 PM

#

For pretraining purposes it should still be good though

#

Might need to ft the model after if you want classical

quasi steppe Nov 6, 2023, 6:05 PM

#

Boarding a flight now. Will talk later

hearty flicker Nov 6, 2023, 6:05 PM

#

Have a safe flight

hearty flicker Nov 6, 2023, 7:07 PM

#

I can always add it back in later if we want to use the old method for ft.

#

@quasi steppe The dataset building you wanted is now the default behavior, you should be able to do the dataset building and truncation with the relevant code on dev now

tiny coral Nov 6, 2023, 7:21 PM

#

have you done any data augmentation?

hearty flicker Nov 6, 2023, 7:28 PM

#

There are a few forms currently implemented

hearty flicker Nov 6, 2023, 8:25 PM

#

There is also now a MIDI->mp3/wav converter in aria.utils. It will run automatically if you use the aria sample cli entrypoint after pip installing the package.

quasi steppe Nov 6, 2023, 11:58 PM

#

Wow I got cuda oom even with batch size 1 for a 2B model 🤦‍♂️. Ok my hacky solution using huggingface Trainer doesn't scale. Will work on your training code later this week.

#

Got a checkpoint for a "large" model. It still prefers to repeat toward the end though

quasi steppe Nov 7, 2023, 7:39 AM

#

Made a 100x augmented dataset. Gonna see if it overfits or gets better 🤔

quasi steppe Nov 7, 2023, 8:29 AM

#

oh wow @hearty flicker I think the model just needs more epochs. I'm listening to the completions by a large one trained on 2x augmented data

hearty flicker Nov 7, 2023, 8:30 AM

#

I think the checkpoint currently on the colab is epoch 50 lol

#

With data aug being applied differently each time haha

#

And it didn't start overfitting

quasi steppe Nov 7, 2023, 8:32 AM

#

yeah. It's half way through my 100-epoch dataset. The log-loss curve is showing signs of either overfitting or double-descent

#

this is really nice

hearty flicker Nov 7, 2023, 8:42 AM

#

quasi steppe this is really nice

Pull the latest version of dev and use utils.midi_to_audio

#

You will need to install fluid synth (apt install fluidsynth)

#

Makes a huge difference to the quality

quasi steppe Nov 7, 2023, 10:15 AM

#

I put some converted mp3 here
https://honglu.fan/files/
It's the 400M model (large.json) with about 3B tokens.
I used some other soundfont but will try yours later

#

they are small so I will keep updating

hearty flicker Nov 7, 2023, 10:16 AM

#

Btw @quasi steppe I've found that when sampling, the better the prompt you use, the better the results

#

That chopin MIDI is a little off beat so the model continues that off beat nature

#

If you find a better recording you might get a better result

#

This website is pretty good

#

http://www.piano-midi.de/chopin.htm

Classical Piano Midi Page - Chopin

Classical piano MIDI and MP3 sequences by Chopin for free download

#

Try using this

#

Make sure that the composer name is in the name of the file too, so that the model pulls the composer meta information

quasi steppe Nov 7, 2023, 10:21 AM

#

hearty flicker That chopin MIDI is a little off beat so the model continues that off beat natur...

I feel that there are quite a lot of those tiny off-beat from romantic era pieces but they are probably the performer's preference. The model might have learned that lol.

#

gonna try bach and beethoven haha

hearty flicker Nov 7, 2023, 10:21 AM

#

the chopin you are prompting with is from audio-> MIDI

#

So it's a little weird

#

Honestly I've been experimenting with high quality MIDI files for prompts, the results are way way way better

#

using the fluidsynth soundfont that I included in aria makes it sound so much better too

#

There is actually a bug in dev that means it's not using the correct soundfont, gonna push the fix npw

#

now*

#

Here is your chopin sample with the proper soundfont

#

Almost sounds like a real perf

quasi steppe Nov 7, 2023, 10:24 AM

#

indeed

hearty flicker Nov 7, 2023, 10:28 AM

#

The change is in dev now

quasi steppe Nov 7, 2023, 10:38 AM

#

hmm

hearty flicker Nov 7, 2023, 10:41 AM

#

Weird lol

#

I always liked this piece by chopin

#

But the repeated note might make the model do some weird stuff

#

Also this is a nice etude

#

quasi steppe Nov 7, 2023, 11:02 AM

#

hearty flicker

here we go this one is weird though

#

hmm this one is fun

hearty flicker Nov 7, 2023, 11:03 AM

#

Very weird, maybe the staccato notes stuff isn't completely fixed yet

#

It's actually such an annoying bug to fix

quasi steppe Nov 7, 2023, 11:04 AM

#

Overall these are actually pretty good.

#

wow empties . I haven't even started trying style interpolation yet.

hearty flicker Nov 7, 2023, 11:18 AM

#

Ok, I also merged dev into the main branch

#

with kv-caching the code currently does 20 toks/sec on a t4

#

Pretty decent

#

@here Here is a notebook if anyone is interested - https://colab.research.google.com/drive/1SmwmsSf92Bv30algvZ-D4rW8dtH0kJNL?usp=sharing

#

It will work very well even with the free version of colab

#

I like this

#

I think there is still a slight issue with staccato notes which is frustrating... The way that MIDI deals with the sustain pedal is very strange

quasi steppe Nov 7, 2023, 11:37 AM

#

Style interpolation. Sounds more interesting than before. It's clearly trying to mix them instead of gluing the two pieces together

hearty flicker Nov 7, 2023, 11:55 AM

#

This prompt works really well for some reason

hearty flicker Nov 7, 2023, 2:48 PM

#

I might add a key-meta token to the tokenizer

#

Could be good for controllability stuffs

quasi steppe Nov 7, 2023, 2:58 PM

#

@hearty flicker what does this message mean? WARNING:root:Tried to decode unexpected note message

hearty flicker Nov 7, 2023, 2:59 PM

#

That happens when detokenizing into a MidiDict object

#

It can happen when tokens are not in the expected order

#

For instance a note token not followed by a dur token

#

It's normally a sign of degenerate behaviour of the model

#

It should always get the order of the tokens correct
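
To make the "note token not followed by a dur token" case concrete, here is a toy validity check. The `(kind, value)` tuple layout is a simplification invented for this example and is not the real aria token format:

```python
def find_order_errors(tokens):
    """Return indices of note tokens not immediately followed by a dur token.

    Tokens are modelled as (kind, value) tuples, e.g. ("note", 60), ("dur", 480).
    This layout is hypothetical and only meant to illustrate the check.
    """
    errors = []
    for i, (kind, _value) in enumerate(tokens):
        if kind != "note":
            continue
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if next_kind != "dur":
            errors.append(i)
    return errors
```

Counting such errors per generated sample could serve as a cheap signal for how often the model degenerates.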

quasi steppe Nov 7, 2023, 3:11 PM

#

I was trying to go for 4096. It's actually not too crazy when going out-of-bound. It might have potential for YaRN. Gonna try it out later

#

4096 completions. Second half does sound weird but better than random tokens or repeating tokens

hearty flicker Nov 7, 2023, 3:18 PM

#

I mean in theory we could train 8k right?

#

With flash attention it would fit on a 80gb a100 pretty easily

#

If you find any that you like in particular lmk, because I'm compiling some good samples for an email rn

#

@quasi steppe Can I send you dataset files for a dataset larger than just GiantMIDI

#

I can build it without the padding as that is now the default behaviour

quasi steppe Nov 7, 2023, 3:32 PM

#

hearty flicker @quasi steppe Can I send you dataset files for a dataset larger than jus...

yeah feel free to dm me.
Like the bucket name and the credential file (or make it public for a short while) or you directly send to me by magic-wormhole

quasi steppe Nov 7, 2023, 3:34 PM

#

hearty flicker I mean in theory we could train 8k right?

yeah when we do the training seriously we should do either 4k or 8k.
Or a gradual scaling plan like 1/2 2k, 1/4 yarn s=2 4k, 1/8 yarn s=4 8k and more
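
One way to write the gradual scaling plan down, as a sketch only. The function name and tuple layout are made up for illustration, and the last 1/8 of the budget is left unassigned because the "and more" part is not specified:

```python
def context_schedule(total_tokens):
    """Split a token budget across stages of increasing context length.

    Each stage is (max_seq_len, yarn_scale, tokens_for_stage), following
    "1/2 at 2k, 1/4 at 4k with s=2, 1/8 at 8k with s=4".
    """
    plan = [(2048, 1, 1 / 2), (4096, 2, 1 / 4), (8192, 4, 1 / 8)]
    return [(seq_len, scale, int(total_tokens * frac)) for seq_len, scale, frac in plan]


for seq_len, scale, tokens in context_schedule(3_000_000_000):
    print(f"seq_len={seq_len} yarn_s={scale} tokens={tokens}")
```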

hearty flicker Nov 7, 2023, 5:39 PM

#

I like this one

quasi steppe Nov 7, 2023, 7:30 PM

#

Oooooh I made a mistake in the yarn parameter. Also found a bug in my code in sample.py (not affecting us because it bugs out only when cfg is turned off). Will push the fix.
I just tested the ppl and YaRN should be working fine with our models. Will try some 4096 generations and see how it looks.

#

@hearty flicker Can I put use_yarn and some other parameters directly into ModelConfig?

hearty flicker Nov 7, 2023, 7:35 PM

#

Yeah sure but you need to change all the relevant stuff

#

If you make a pr I'll review to make sure nothing breaks

quasi steppe Nov 7, 2023, 7:36 PM

#

Oh by the way, the ModelConfig could have been a dataclass.
I think I could define a new YaRN config as an optional param (because there are quite a lot of params and I don't want it to overcomplicate the original ModelConfig)
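
A minimal sketch of what that could look like with dataclasses. All field names here are assumptions for illustration, not the actual aria `ModelConfig` or any particular YaRN implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class YaRNConfig:
    # Hypothetical fields; real YaRN implementations expose more knobs.
    scale: float = 1.0
    original_max_seq_len: int = 2048
    dynamic: bool = False


@dataclass
class ModelConfig:
    # Illustrative subset of fields, not the real aria ModelConfig.
    d_model: int
    n_heads: int
    n_layers: int
    max_seq_len: int
    yarn: Optional[YaRNConfig] = None  # None means YaRN is disabled

    @property
    def use_yarn(self) -> bool:
        return self.yarn is not None
```

Keeping the YaRN parameters in their own optional object means existing config files without a `yarn` entry keep working unchanged.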

hearty flicker Nov 7, 2023, 7:48 PM

#

I think I originally had it as a dataclass but I changed it for some reason

#

You can change it back if you like

quasi steppe Nov 7, 2023, 8:22 PM

#

hmm I start to feel that dynamic yarn is only making ppl artificially low. The stuff sounds weird

hearty flicker Nov 7, 2023, 8:28 PM

#

What do you mean by people

#

We should also do a proper yarn ft

quasi steppe Nov 7, 2023, 8:30 PM

#

hearty flicker What do you mean by people

oh I meant the perplexity 😂

hearty flicker Nov 7, 2023, 8:30 PM

#

ha!

quasi steppe Nov 7, 2023, 8:30 PM

#

dynamic yarn has the advantage of not blowing up perplexity until 2x original max_length without finetuning

#

I'm testing this

#

but yeah with finetuning it should be all good because it at least sees some actual long samples
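
For reference, perplexity is the exponential of the mean per-token negative log-likelihood, so one quick check for whether it blows up past the original context is to compute it separately for positions inside and beyond `max_length`. A minimal sketch, assuming per-token log-probabilities are already available from the model (function and argument names are made up for illustration):

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def split_perplexity(token_logprobs, original_max_len):
    """Perplexity for positions inside vs. beyond the original context length."""
    inside = token_logprobs[:original_max_len]
    beyond = token_logprobs[original_max_len:]
    return perplexity(inside), (perplexity(beyond) if beyond else None)
```

As noted in the thread, a low number here does not guarantee that the samples sound good, so listening tests are still needed.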

timber talon Nov 8, 2023, 9:18 AM

#

oh man there's a better way to do the spotify-dl... if you go composer first, and type "Prokofiev complete piano works", or something, into spotify, then you'll often get these massive albums with like 200 songs — every piano piece ever written by that one composer

#

anyway, I have ~20 of those sitting in a commit on my branch, waiting for the first PR to be merged

hearty flicker Nov 8, 2023, 10:24 AM

#

timber talon oh man there's a better way to do the spotify-dl... if you go composer first, an...

My only worry is that it will include non solo-piano works

#

I'm going to try to include some solo-piano detection in the pipeline like GiantMIDI, but it's best to be as safe as possible

#

That is a really good idea though Alex 🙂

#

I will merge your pr! Just got held up yesterday ha

hearty flicker Nov 8, 2023, 12:29 PM

#

This is a great strategy btw, the nice thing about spotify_dl is that it can work using any playlist

#

I'm gonna add some regex stuff I think too, scraping the titles to exclude all other instruments apart from piano

#

Also to exclude other bad keywords like concerto etc.

#

A good strategy might also be to automatically skip any spotify tracks that have more than two artist credits
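
A sketch of how the title regex and artist-count heuristics could be combined. The keyword lists are illustrative guesses rather than a vetted vocabulary, and the function takes plain strings instead of any real Spotify API object:

```python
import re

# Keyword lists are illustrative guesses, not a vetted vocabulary.
OTHER_INSTRUMENTS = re.compile(
    r"\b(violin|viola|cello|flute|clarinet|oboe|guitar|orchestra|quartet|trio)\b",
    re.IGNORECASE,
)
BAD_KEYWORDS = re.compile(r"\b(concerto|symphony|duo|four hands)\b", re.IGNORECASE)


def looks_like_solo_piano(title, artists, max_artists=2):
    """Heuristic filter on a track's title and its list of artist credits."""
    if len(artists) > max_artists:
        return False
    if OTHER_INSTRUMENTS.search(title) or BAD_KEYWORDS.search(title):
        return False
    return True
```

Title-based filtering like this will miss unlabeled ensemble tracks, so it works best as a first pass before an audio-based solo-piano detector.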

#

21st goldberg variation

#

Ok I think instead of rejecting these albums based off of there being a single bad track, I'm going to use other methods to filter out non solo-piano

#

I'm going to get the code for downloading the tracks and processing the metadata done this weekend. Then next week I'll try to get the transcription pipeline up and running.

timber talon Nov 8, 2023, 8:16 PM

#

hearty flicker My only worry is that it will include non solo-piano works

yeah wait until you see the block of albums at the bottom... they're literally called "Complete Solo Piano works of <Composer X>" and some of those albums have 100s of songs

#

i'm asking some of my friends about more niche piano composers, to continue diversifying a bit

hearty flicker Nov 8, 2023, 8:17 PM

#

I tried it out btw, works well on these compilations

#

I ran spotify_dl over one of them and got a ton of hits

#

About 50%

timber talon Nov 8, 2023, 8:18 PM

#

oh great!

#

i wonder if this is something @sand nymph would consider tweeting out from the EleutherAI account?

"Who is your favorite classical piano composer (bonus points if they're niche or underrepresented, e.g. Clara Schumann, Scott Joplin, Fanny Mendelssohn)"

#

so we could crowdsource a wider range of piano composers for this dataset?

sharp quiver Nov 8, 2023, 11:20 PM

#

Do you already have a list of composers you have data from or that you are planning to get data from?

ruby frost Nov 9, 2023, 12:12 AM

#

can i beg for some scriabin

hearty flicker Nov 9, 2023, 12:24 AM

#

sharp quiver Do you already have a list of composers you have data from or that you are plann...

everything and anything (within reason)!

timber talon Nov 9, 2023, 5:29 AM

#

oh yes, we have a lottt of scriabin

sharp quiver Nov 9, 2023, 9:51 AM

#

hearty flicker everything and anything (within reason)!

I meant to ask this to avoid asking for composers that are already taken care of

hearty flicker Nov 9, 2023, 9:58 AM

#

We are mostly going by pianist as it's easier when gathering links

#

Most of the popular composers are well represented

hearty flicker Nov 9, 2023, 1:48 PM

#

Idk if this is open source @timber talon https://arxiv.org/pdf/2107.09142.pdf

#

But this could be a more promising direction for doing the transcription.

#

This is the same method I had in mind to improve over Kong et al.

#

However I've just found it during lit review lol

#

If this model isn't open source, I think I could pretty easily reimplement it in the aria repo

#

Might even work well with the aria tokenizer for the decoder, instead of using MIDI representation

#

I think there are other improvements possible too, like using synthetic data and more data augmentation

sharp quiver Nov 9, 2023, 2:06 PM

#

it looks like sony has a (SotA?) open source piano transcription model
https://arxiv.org/abs/2307.04305
https://github.com/sony/hFT-Transformer

arXiv.org

Automatic Piano Transcription with Hierarchical Frequency-Time Tran...

Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capability of self-attention mechanism in Transformers to capture these long-term...

GitHub

GitHub - sony/hFT-Transformer: Pytorch implementation of automatic ...

Pytorch implementation of automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture (hFT-Transformer). - GitHub - sony/hFT-Transformer: Pytorch...

hearty flicker Nov 9, 2023, 2:07 PM

#

Good find !

#

I wonder if this was presented at ismir this year, I wasn't there (it's going on rn)

sharp quiver Nov 9, 2023, 2:08 PM

#

yeah I've seen some people tweet about it, it looks pretty fun xD
https://fixupx.com/is_s_yun/status/1722416557559583231?s=20

FixTweet / FixupX

Amazing night at @ISMIRConf - thanks to Uri, Romain, Genís, and André for playing with me. Also, "Fear of MATLAB" was the best thing I heard in years! 😭😭

Sihun Lee (@is_s_yun)

Amazing night at @ISMIRConf - thanks to Uri, Romain, Genís, and André for playing with me. Also, "Fear of MATLAB" was the best thing I heard in years! 😭😭

▶ Play video

hearty flicker Nov 9, 2023, 2:09 PM

#

I know the guy playing guitar ha (well I met him once)

quasi steppe Nov 9, 2023, 3:28 PM

#

@hearty flicker do messages like
[2023-11-09 15:28:02,465] aria.data.datasets: [INFO] MIDI at bigger_data/bitmidi/bitmidi/96166.mid failed preprocessing tests: [('max_programs', 12), ('max_instruments', 8), ('total_note_frequency', 37.87731885463125)]
matter?

hearty flicker Nov 9, 2023, 3:29 PM

#

It's performing some filtering as part of the data lib

#

You can adjust them in the config file to your liking

#

Might be worth tuning for bitmidi

quasi steppe Nov 9, 2023, 3:29 PM

#

I'm running it across bitmidi and got my terminal screen full of these scrolling nonstop lol

hearty flicker Nov 9, 2023, 3:30 PM

#

Yeah that makes sense, I'd adjust the values to whatever you think would work

#

max_programs is the number of different instruments in the midi file

#

max_instruments is the number of different instruments (e.g. saxophone and clarinet would both be classified as woodwind)

#

total_note_frequency is max midi notes per second

#

you can adjust in config/config.json under preprocessing tests I think

#

btw the code for building MidiDatasets is multithreaded so I'd suggest running it on a machine with multiple cores or it will take ages

quasi steppe Nov 9, 2023, 3:33 PM

#

do "dataset_gen_args" matter any more?

quasi steppe Nov 9, 2023, 3:33 PM

#

hearty flicker btw the code for building MidiDatasets is multithreaded so I'd suggest running i...

no problem SAI machine has 96 cores

hearty flicker Nov 9, 2023, 3:33 PM

#

lmao it should take no time then

#

Dataset gen args are for making TokenizedDatasets

#

All you need to do is set the max_seq_len you want

#

So if you use aria tokenized-dataset then it will make the tokenized dataset based on dataset_gen_args in the config

#

That code is also multithreaded so should be fast

quasi steppe Nov 9, 2023, 3:36 PM

#

~~how many processes by default? Is there an argument for that?~~ Oh nvm, looks like it's max cpu count.
But it's still running 🤔

hearty flicker Nov 9, 2023, 3:42 PM

#

It should just use all available cpu cores

#

On 16 cores I think GiantMIDI takes about 5-10mins

quasi steppe Nov 9, 2023, 3:42 PM

#

strange... it just froze. The log should keep scrolling

hearty flicker Nov 9, 2023, 3:44 PM

#

Hmmn, it shouldn't run out of ram or anything

#

weird

#

Maybe try setting the pool size to smaller than the max

#

https://github.com/EleutherAI/aria/blob/540b3b308ee560b58d172e18cda7a2a8caacffdf/aria/data/datasets.py#L248C20-L248C20

GitHub

aria/aria/data/datasets.py at 540b3b308ee560b58d172e18cda7a2a8caacf...

Contribute to EleutherAI/aria development by creating an account on GitHub.

#

Change Pool() to Pool(32) or something

#

I've never had it freeze on me before

quasi steppe Nov 9, 2023, 3:45 PM

#

oh one process got an error

Traceback (most recent call last):
  File "/admin/home-honglu/miniconda/envs/aria/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/fsx/home-honglu/aria/aria/data/datasets.py", line 212, in _get_mididict
    failed_tests = _run_tests(mid_dict)
  File "/fsx/home-honglu/aria/aria/data/datasets.py", line 180, in _run_tests
    test_res, val = test_fn(_mid_dict, **test_args)
  File "/fsx/home-honglu/aria/aria/data/midi.py", line 660, in test_note_frequency
    notes_per_second = (num_notes * 1e3) / total_duration_ms
ZeroDivisionError: float division by zero

hearty flicker Nov 9, 2023, 3:48 PM

#

Eek, I'll patch this

quasi steppe Nov 9, 2023, 3:48 PM

#

got this when hitting ctrl-c. Maybe the pool got some deadlock caused by this error in one process

hearty flicker Nov 9, 2023, 3:48 PM

#

Leme have a look

#

Must be an edge case
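
The actual patch is not shown in this thread. One plausible guard for a zero-length MIDI file, written as a standalone function that mirrors the `(test_res, val)` return shape visible in the traceback (the signature and the choice to fail the test are assumptions):

```python
def note_frequency_test(num_notes, total_duration_ms, max_per_second):
    """Return (passed, notes_per_second).

    Files with zero (or negative) duration fail the test instead of raising
    ZeroDivisionError.
    """
    if total_duration_ms <= 0:
        return False, 0.0
    notes_per_second = (num_notes * 1e3) / total_duration_ms
    return notes_per_second <= max_per_second, notes_per_second
```

Returning a failed result rather than raising also keeps a single bad file from taking down a worker in the pool.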

#

pull main and try again? I patched it I think.

#

@quasi steppe

quasi steppe Nov 9, 2023, 9:13 PM

#

@hearty flicker Got

[2023-11-09 21:12:11,906] aria.data.datasets: [ERROR] Failed to tokenize midi_dict: note_msgs is empty after ignoring instruments
[2023-11-09 21:12:12,063] aria.data.datasets: [ERROR] Failed to tokenize midi_dict: note_msgs is empty after ignoring instruments
[2023-11-09 21:12:12,432] aria.data.datasets: [ERROR] Failed to tokenize midi_dict: note_msgs is empty after ignoring instruments
Traceback (most recent call last):
  File "/admin/home-honglu/miniconda/envs/aria/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/admin/home-honglu/miniconda/envs/aria/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/fsx/home-honglu/aria/aria/run.py", line 179, in <module>
    main()
  File "/fsx/home-honglu/aria/aria/run.py", line 171, in main
    build_tokenized_dataset(args=_parse_tokenized_dataset_args())
  File "/fsx/home-honglu/aria/aria/run.py", line 138, in build_tokenized_dataset
    dataset = TokenizedDataset.build(
  File "/fsx/home-honglu/aria/aria/data/datasets.py", line 613, in build
    buffer += entry
TypeError: 'NoneType' object is not iterable

When running tokenized-dataset

#

but the jsonl dataset is done alright
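
The `TypeError` comes from `buffer += entry` receiving `None` for a file that failed to tokenize. A sketch of a packing loop that simply skips such entries (a simplified stand-in, not the real `TokenizedDataset.build`):

```python
def pack_entries(entries, seq_len):
    """Yield fixed-length token sequences, skipping failed (None or empty) entries."""
    buffer = []
    for entry in entries:
        if not entry:
            # Tokenization failed or produced nothing for this file.
            continue
        buffer += entry
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
```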

quasi steppe Nov 9, 2023, 9:49 PM

#

also json.decode on a long dict seems incredibly slow. The MidiDataset.load is taking an unbearable amount of time to load stuff

quasi steppe Nov 9, 2023, 10:27 PM

#

The bitmidi jsonl file is huge. Parsing json is so slow that I got fluctuating performances between 20 - 100 lines/sec but we have 90k there lol. Probably the MidiDataset needs to do lazy loading in order to hide the latency🤦‍♂️.

hearty flicker Nov 9, 2023, 10:32 PM

#

Ok let me look at this tomorrow...

#

That bug should be an easy fix too

#

MIDI is a messed up file format tbh, you can't imagine the amount of random bugs related to it

hearty flicker Nov 9, 2023, 10:40 PM

#

quasi steppe also json.decode long dict seems incredibly slow. The `MidiDataset.load` is taki...

This too I'll fix tomorrow

quasi steppe Nov 9, 2023, 11:42 PM

#

hearty flicker This too I'll fix tomorrow

haven't tested it yet but my guess is that json format might just not be good enough if we have super long pieces

hearty flicker Nov 9, 2023, 11:43 PM

#

There should be a way around it

quasi steppe Nov 9, 2023, 11:43 PM

#

MidiDataset lazy loading seems like a small fix. Let me submit a PR and you check if it makes sense
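
A minimal sketch of what lazy loading of a jsonl file could look like, assuming one JSON object per line. `iter_jsonl` is a hypothetical helper, not the `MidiDataset` API:

```python
import json
from typing import Any, Dict, Iterator


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-empty line, reading lazily."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Decoding happens only when the caller asks for the next item.
                yield json.loads(line)
```

Reading this way keeps memory flat and lets work start on the first entry immediately, but it only defers the JSON decoding cost; the total decoding time stays the same.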

hearty flicker Nov 9, 2023, 11:44 PM

#

Thanks : )

#

I can profile it tomorrow

quasi steppe Nov 10, 2023, 12:01 AM

#

+12 -10 such a small change lol.
Hmm I tried it but it doesn't seem to help much. Guess json decoding is a real bottleneck but I need to go to bed now.

hearty flicker Nov 10, 2023, 9:50 AM

#

I haven't checked yet, but it could be due to having to load the json and then pickle it to be sent to the process

#

It's slightly annoying that the json loading has to run on the main process though

#

Once you have a tokenized dataset, using that should be really fast since the json loading doesn't run on the main process

hearty flicker Nov 10, 2023, 9:53 AM

#

hearty flicker I haven't checked yet, but it could be due to having to load the json and then p...

An alternative in the mp code could be to just send the raw string to the process, and then do the json.load in there

#

Would be an easy fix
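
A rough sketch of that idea: the main process only reads raw lines, and each worker does the JSON decoding itself. The function names are hypothetical, and the `note_msgs` key is an assumption taken from the error message earlier in the thread; the per-line work is a placeholder for tokenization:

```python
import json
from multiprocessing import Pool
from typing import Iterator, List


def process_line(raw_line: str) -> int:
    """Decode one jsonl line inside the worker and return a small result.

    Placeholder work: counts the entries under the assumed "note_msgs" key.
    """
    midi_dict = json.loads(raw_line)
    return len(midi_dict.get("note_msgs", []))


def read_lines(path: str) -> Iterator[str]:
    """Yield raw, undecoded lines so the main process never parses JSON."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield line


def process_file(path: str, workers: int = 4, chunksize: int = 16) -> List[int]:
    """Send raw strings to a worker pool; results come back in input order."""
    with Pool(workers) as pool:
        return list(pool.imap(process_line, read_lines(path), chunksize=chunksize))
```

`process_file` should be called under an `if __name__ == "__main__":` guard. The raw strings still have to be pickled to reach the workers, so this moves the decoding cost out of the main process without removing the transfer cost.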

#

I'm super busy during the day today but I'll push these fixes in the evening

hearty flicker Nov 10, 2023, 11:44 AM

#

@quasi steppe I pushed a fix, can you see if this speeds it up before I look at your pr?

#

It should move the json loading into the multiprocessing

#

Also the bug with `buffer += entry` should be fixed but I haven't tested it

#

Oh btw if you are building a large tokenized dataset from a mididict one, you should definitely use the 'aria tokenized-dataset' cli as it doesn't load the entire thing into memory

quasi steppe Nov 10, 2023, 12:05 PM

#

hearty flicker Oh btw if you are building a large tokenized dataset from a mididict one, you sh...

yeah this is what I was using

#

hmm the speed feels similar

hearty flicker Nov 10, 2023, 12:08 PM

#

The bottleneck might be the pickle then

quasi steppe Nov 10, 2023, 12:08 PM

#

I ran the cProfile and I'm still trying to understand the result.
In my own script I tokenize, transform, and encode right away and write the result to a parquet file. I run this routine 100 times asynchronously, saving to 100 different parquet files, then read and combine them as the last step.

hearty flicker Nov 10, 2023, 12:08 PM

#

There is no way around that to my knowledge

#

If it's not the json load then it's probably the pickle

quasi steppe Nov 10, 2023, 12:09 PM

#

yeah

hearty flicker Nov 10, 2023, 12:09 PM

#

I can't think of what else it could be

#

That's a bottleneck for mp in general, esp for this sort of thing where you are sending input to each process

quasi steppe Nov 10, 2023, 12:09 PM

#

cProfile result

📎 profile

#

the biggest bottleneck was.... acquire Lock 🤔

#

wonder which thread did that
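
cProfile only records the thread it was enabled in, so time the main thread spends blocked waiting for results from other threads or processes tends to be reported as lock acquisition. Whether that explains the profile above is not confirmed. A small sketch, with a thread and a queue standing in for the worker pool, showing how `pstats` can list the callers of `acquire`:

```python
import cProfile
import io
import pstats
import queue
import threading
import time
from typing import Callable


def wait_for_result() -> str:
    """Block on a queue until a background worker delivers a result."""
    results: "queue.Queue[str]" = queue.Queue()

    def worker() -> None:
        time.sleep(0.2)  # stand-in for slow work done elsewhere
        results.put("done")

    thread = threading.Thread(target=worker)
    thread.start()
    value = results.get()  # the main thread waits here on an internal lock
    thread.join()
    return value


def profile_report(func: Callable[[], object]) -> str:
    """Profile func in the current thread and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(5)
    stats.print_callers("acquire")  # which functions ended up waiting on a lock
    return out.getvalue()


print(profile_report(wait_for_result))
```

If the callers turn out to be the pool's result handling, the lock time is mostly the main process idling while workers run, rather than a cost in its own right.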

hearty flicker Nov 10, 2023, 12:11 PM

#

Hmmmn

#

If you are getting 20/s that should be about 1.25 hours for the 90k lines

#

Kind of sucks but idk

#

It's doable

quasi steppe Nov 10, 2023, 12:12 PM

#

yeah totally

#

I'm just curious what was going on

hearty flicker Nov 10, 2023, 12:12 PM

#

It's defo related to mp btw

#

In the mp function, the tokenizer and the mididict are passed in

quasi steppe Nov 10, 2023, 12:13 PM

#

yeah mp has a lot of overhead and I'm never fully comfortable with what it's doing
