#Myna: Small Music Representation Model


rigid oxide

If anyone wants to join the project, write something here!

twilit ruin

I don't think I'm in a position to pick up more projects at this moment, but I'm interested in seeing this move forward


It seems you and the other collaborator have already done some work - maybe you could share more details about the project plan at a high level, including concrete things potential contributors could look at/do?

rigid oxide

I would love to share some details! We are training with a contrastive objective, and the model would probably benefit a lot just from scaling up and from some architecture improvements. We currently use a variant of Minz Won's self-attention architecture (https://arxiv.org/abs/1906.04972) as the base model. I'm a bit hesitant to share more details publicly, since we've been working on this for roughly a year and I don't know how these things usually go in the research world.
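For readers unfamiliar with contrastive training, here is a minimal sketch of one common form of the objective, an InfoNCE-style loss with in-batch negatives. It is illustrative only and not necessarily the loss Myna uses: the function names, the temperature value, and the choice of in-batch negatives are all assumptions. It is written in plain Python so it runs without dependencies; a real training loop would compute the same thing as a batched tensor operation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not Myna's code).

    anchors[i] and positives[i] are embeddings of two views of the same
    audio clip. For anchors[i], every positives[j] with j != i is treated
    as a negative.
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and equal in length")

    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]

    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity to every candidate, sharpened by the temperature.
        logits = [dot(a_i, p_j) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    views_b = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce_loss(views_a, views_b))
```

The loss is low when each anchor is most similar to its own positive and high when it is more similar to another clip's view, which is what pushes the encoder toward clip-discriminative representations.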

As for contributors and what you could do:

  1. We could benefit from some architecture changes to make the model more capable.
  2. There are parts of my methodology that could be improved.
  3. We should evaluate on the MARBLE benchmark (https://marble-bm.shef.ac.uk/); the evaluation code for this still needs to be written.
  4. General ideas to improve performance. Everyone else in the field seems to be training on at least 16 GPUs right now, while I have a single RTX 4080. The results are good for a single GPU, and I wonder whether scaling up correctly would yield a competitive model that is an order of magnitude smaller and faster.

Would love to hear your thoughts!