#Model Adaptation with Diffusion
@latent dagger I forget, are you a jax user at all?
if so, I'm willing to help out on code for this
as in give advice/debugging support, I don't have bandwidth to do the bulk of the work
I think it's pretty simple to implement this in jax if you have a diffusion model w/ the appropriate output shape
I am a PyTorch user.
Also preparing papers for NeurIPS and I have some other deadlines coming up. But can start later.
Is this project still active? I’d love to contribute to this.
Now that I am done with NeurIPS, I'll start working on this.
Great, where can we discuss? Here? And is there any code that I can start with?
I am also interested in contributing to this project if it's still active.
@latent dagger is your goal to only reproduce their results and extend to a wider variety of tasks? What are your expected outcomes from doing this?
In any case, I have a set of close to a thousand checkpoints of different architectures for fine-grained image recognition on a wide variety of datasets. I expect to release them along with a paper, hopefully in the next few weeks. Do you think this would help with your task?
We aren't interested in reproducing their results. Instead, we want to use diffusion models as hypernetworks for model adaptation.
We can approach this problem in different ways. One challenge is generating the weight data for the diffusion model (which you might be able to help with). However, I think we can avoid the requirement of weight data and train these models end-to-end. Another thing to consider is which architectures/problems to test this on.
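A rough sketch of what the "weight data" route could look like, in case it helps the discussion: each checkpoint is flattened into one vector, and the diffusion model is trained to denoise noised copies of it. The thread does not fix a noise schedule, so the DDPM-style linear schedule here is an assumption, and `flatten_params`, `make_alpha_bar`, `noise_weights`, and the toy checkpoint are hypothetical names for illustration only. The end-to-end route would skip this step and backpropagate a task loss through the generated weights instead.

```python
import numpy as np

def flatten_params(params):
    """Concatenate a dict of weight arrays into one flat vector.

    Keys are sorted so every checkpoint is flattened in the same order.
    """
    return np.concatenate([params[k].ravel() for k in sorted(params)])

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_weights(w0, t, alpha_bar, rng):
    """Sample w_t = sqrt(alpha_bar_t) * w_0 + sqrt(1 - alpha_bar_t) * eps.

    Returns the noised weights and the noise, which is the regression
    target for a noise-predicting denoiser.
    """
    eps = rng.standard_normal(w0.shape)
    wt = np.sqrt(alpha_bar[t]) * w0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return wt, eps

rng = np.random.default_rng(0)

# Hypothetical toy "checkpoint": a single 4 -> 3 linear layer.
checkpoint = {
    "linear/w": rng.standard_normal((3, 4)),
    "linear/b": np.zeros(3),
}

w0 = flatten_params(checkpoint)      # shape (15,): 3 biases, then 12 weights
alpha_bar = make_alpha_bar(1000)
wt, eps = noise_weights(w0, t=500, alpha_bar=alpha_bar, rng=rng)
```

The denoiser's output shape has to match `w0.shape`, which is the "appropriate output shape" constraint mentioned earlier in the thread.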
Since there appears to be some interest, I will write a more detailed plan this weekend—if anyone has any ideas or suggestions, please share them here.
@latent dagger if you are still planning to continue with this project, then ig if everyone agrees, we can schedule a meet to further discuss your idea?
Yes. I wrote a mini proposal and will post it when I have time, but I am happy to schedule a meeting or discuss here.
@latent dagger any updates?
This has some surface similarities to ideas I had a few years ago about building an artificial analogue of the LC-NE complex in the human brain for rapid task adaptation. I chose the system as a relatively tractable-looking case where we might be able to build artificial neurotransmitters. My interpretation was that the LC-NE optimizes the covariance of errors in subcircuits of tasks in order to balance between exploration and exploitation.
But I was being pretty hasty at the time and I'm not confident my understanding of the neuroscience was right; I couldn't find anyone using the word covariance. The idea is that you want to hedge so that if subcircuit 1 is wrong, subcircuit 2 is wrong in a way that counterbalances it, and how aggressively you exploit the covariance depends on whether you're already good at a task or bad at it.
Transfer learning has gained significant attention in recent deep learning research due to its ability to accelerate convergence and enhance performance on new tasks. However, its success is often contingent on the similarity between source and target data, and training on numerous datasets can be costly, leading to blind selection of pretrained...
related paper that came out around the same time
I wonder if you could condition the diffusion model on, say, 1-10 images and have it output a LoRA of the character in the images
could be a way to generate much more data-efficient LoRAs
Yes. This would be similar to the original idea.
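To make the LoRA idea concrete, here is a minimal sketch of the output side only: a flat vector (standing in for a sample from the image-conditioned diffusion model) is split into the two low-rank factors and added to a frozen weight matrix as W + (alpha / r) * B A. The random vector, the layer size, and the helper names `lora_size`, `unflatten_lora`, and `apply_lora` are all hypothetical; nothing here is taken from an existing implementation.

```python
import numpy as np

def lora_size(d_out, d_in, rank):
    """Number of values the generator must output for one adapted layer."""
    return rank * (d_out + d_in)

def unflatten_lora(flat, d_out, d_in, rank):
    """Split a flat vector into B (d_out x rank) and A (rank x d_in)."""
    flat = np.asarray(flat)
    if flat.size != lora_size(d_out, d_in, rank):
        raise ValueError("flat vector does not match the requested LoRA shape")
    B = flat[: d_out * rank].reshape(d_out, rank)
    A = flat[d_out * rank:].reshape(rank, d_in)
    return B, A

def apply_lora(W, B, A, alpha=1.0):
    """Return the adapted weight W + (alpha / rank) * B @ A."""
    rank = A.shape[0]
    return W + (alpha / rank) * (B @ A)

d_out, d_in, rank = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))                            # frozen base weight
generated = rng.standard_normal(lora_size(d_out, d_in, rank))     # stand-in for a diffusion sample

B, A = unflatten_lora(generated, d_out, d_in, rank)
W_adapted = apply_lora(W, B, A)
```

For this toy layer the generator only has to produce 512 values instead of the 4096 in the full matrix, which is why a LoRA is a friendlier diffusion target than raw weights.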
https://weight-space-learning.github.io/
This whole workshop is based around this topic in case anyone is interested
Proposal PDF with more technical details on the topic:
https://openreview.net/pdf?id=Bz6wEdobY7