#stanford-rna-3d-folding


silent delta
#

Hi

coarse bobcat
#

Hi

silent delta
#

ok... looks like someone slept on the keyboard

fickle nebula
#

I'm a bit confused by this challenge. On one hand, training a model from scratch would be very expensive, and require lots of training data. On the other hand, one could simply run AlphaFold 3, which has been shown to work well with RNA structures as well, and get a good score. Not sure how we could do better than the current state of the art.

#

One idea I have is to use one of the new foundation models published recently, and start from there. For example, Evo2 is a foundation model trained on DNA sequences that was published a couple of weeks ago. Figure 4d of the paper indicates that the model is able to learn sequences associated with specific 2D/3D structures from DNA (even though the challenge is based on RNA sequences).

#

Does anyone want to create a team to work on this challenge?

undone dust
fickle nebula
tacit tundra
#

@fickle nebula
Using Grokfast (https://arxiv.org/abs/2405.20233) and BitNet (https://arxiv.org/pdf/2310.11453) for speed and effectiveness, and Paperspace (https://www.paperspace.com/) for training GPUs, has been my go-to.

This was my initial idea, but since I got distracted with a different idea, I'll give out this idea for RNA‑FoldNet.

Included is a comprehensive outline of the idea, what sets it apart from AlphaFold, and an iterative development plan.

lapis mason
#

@fickle nebula are you still looking for teammates?

solemn grail
#

Hihi

#

this looks fun 🙂

solemn grail
tacit tundra
# solemn grail what prompt/LLM did you use for that?

https://chatgpt.com/share/67c99743-6568-8004-bc80-7f0e0586d81c

here's the full chat for that one, mainly using o3-mini-high I believe (with search at times).

the txt file has three sections; the first two were generated within the chat linked above.

here is my prompt for finalizing plans (requires a conversation first):
[ please take all of the provided papers, and create a comprehensive plan outlining in entire detail the concept, idea, and flowchart of this model. It's purpose is for RNA 3d Folding, taking the sequence of 4 values and predicting the full 3d structure of that sequence. we want to go into no specific coding or implementation details within the plan, but we REQUIRE A COMPREHENSIVE amount of required knowledge in order to create full fledged model. the input is supposed to be a string that is representing a 2d input of 4 possible values: A, C, G, and U. the output is supposed to be a list of tuples, representing the x y and z values of the item in the corresponding position of the input sequence. ] (papers are pasted below)

For programming I utilize these prompts (With O1 Pro): https://docs.google.com/document/d/1wlC7-k7VCJqJvTcFTwXeFbbaVuC7lcEHGjlzdcUXsHk/edit?usp=sharing

the first one is only used for initial generation, you just paste in the full generated plan.
the second and third one are the meat and potatoes, and are both required each time you want to make a modification.

it's oriented towards jupyter notebook usage (bc I use paperspace)
the only problem I have on a regular basis is that it needs more context about what it means for a segment to be modified or not, but that's pretty minor (it sometimes marks functionally unmodified segments as modified).

I almost exclusively use o1 Pro to program. While o3-mini-high is really close, I'd rather wait the extra 5 minutes for o1 Pro than have to iterate on a problem over and over with o3-mini-high.

fickle nebula
tacit tundra
# fickle nebula Thank you, that's quite cool. Are you still participating to the challenge? I'v...

I am participating still; Just working on a different approach.
for whatever reason, I've been having trouble extracting a list of tuples from a string, which has taken up the entire past week for me.
I actually hadn't heard of arnie before, looks interesting.

although, the Multi-Scale High-Resolution Feature Extraction of RNA‑FoldNet should be capable of organically learning the information contained in the secondary structures, as well as other relevant context.

also, whatever you end up doing, it may be worth trying it on the new BYU - Locating Bacterial Flagellar Motors 2025 competition as well. The needs of the AI are surprisingly similar between the two, the main exception being that the BYU input data is images instead of sequences.

zinc hare
#

Hi guys, I don't really understand this challenge much but would really love to do something here. Can anyone explain what we are trying to do here?

#

Thanks a lot

zinc hare
#

Nvm, I think I understand a little more about this competition now

left cobalt
#

Adenine guanine cytosine thymine

#

If I remember correctly

#

RNA has no thymine but it has uracil

silent delta
#

sequences of nucleotides; the bases are just one part of them

#

but you're almost right: the letters refer to the corresponding bases in them, but for RNA, not DNA

winged ether
#

Looking for a team to join, thank you

zinc hare
#

Someone has already developed a model that beats vfold_human_expert.csv

coarse bobcat
#

Yeah, he is a good AI coder.

#

Hengck23 has good data analysis and knowledge of how to use it.

silent delta
#

I'm curious about vfold_human_expert.csv: they gave a sequence to a human and asked them to model the 3D structure from it? Imagine manually modeling a 4k-long sequence.

coarse bobcat
#

I think the data was manually modeled by humans. Looking at the GitHub code, it seems there's a pattern of repeatedly analyzing and submitting results for a single RNA sequence at a time. Of course, I might be wrong.

silent delta
#

thx

coral furnace
#

Hey all, when it comes to training your model how have you set up your data when it comes to batches? Sequences have variable length and I'm unsure of how to batch them

long laurel
coral furnace
#

One more question, when asked to submit 5 sets of coordinates does this mean we submit 5 different inference passes for the same nucleotide? Train data only has 1 set of 3D coords

coral furnace
#

Furthermore, from what I'm seeing there is a way to set up external data. For the competition's sake, should all data preparation, training, etc. be done in a single notebook when using the additional external data?

coral furnace
silent delta
#

just submit the same prediction 5 times; you only need to run it once

#

run it 5 times only if your model is not deterministic
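A minimal sketch of the "same prediction five times" approach, assuming the submission has one row per nucleotide with an ID of the form <target_id>_<resid> (1-based resid) and columns x_1, y_1, z_1 through x_5, y_5, z_5. Check this against the actual sample_submission.csv; the helper name to_submission_rows is hypothetical.

```python
import numpy as np
import pandas as pd


def to_submission_rows(target_id, sequence, coords, n_copies=5):
    """Hypothetical helper: one row per nucleotide, with a single deterministic
    prediction copied into all n_copies coordinate slots.

    ASSUMPTION: ID = f"{target_id}_{resid}" with 1-based resid, and columns
    x_k, y_k, z_k for k = 1..n_copies, as in the sample submission.
    coords: array-like of shape (L, 3), one C1' position per nucleotide.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.shape != (len(sequence), 3):
        raise ValueError("need exactly one xyz triple per nucleotide")
    rows = []
    for i, base in enumerate(sequence):
        x, y, z = coords[i]
        row = {"ID": f"{target_id}_{i + 1}", "resname": base, "resid": i + 1}
        for k in range(1, n_copies + 1):
            # the same structure fills every slot; a stochastic model could
            # instead be sampled n_copies times and fill the slots separately
            row[f"x_{k}"], row[f"y_{k}"], row[f"z_{k}"] = x, y, z
        rows.append(row)
    return pd.DataFrame(rows)
```

If the model is non-deterministic, filling the five slots with five different samples is the variant described in the message above.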

dire mortar
#

Have any of you had any success pretraining the model on the 400k samples of synthetic data? I'm about to try a transformer-GNN, but I doubt it'll be any good compared to the already existing SOTA models for RNA folding

coral furnace
dire mortar
#

And did you have any success? I did not.

winged ether
#

Does anyone need a team member? I'm open to joining a team.

plucky tiger
#

Can anyone clarify how to account for rotations in your predictions? Couldn't I have infinitely many valid RNA structures that are just rotated differently than the one provided as label?

feral yarrow
#

I'm looking for experienced and active team members for this competition; highly interested people, DM me, thanks

silent delta
#

USalign should find the best translated/rotated match between your predictions and the labels

coral furnace
coral furnace
plucky tiger
#

I mean the 3D positions of the C1' (C1-prime) atoms of the individual nucleotides, basically what you want to predict, and I mean you can just rotate the entire structure in 3D space (or so I would imagine) without changing its function. Do you get what I mean by that?

silent delta
#

I haven't found public information about USalign, but we can trust it: a simple Kabsch algorithm can find the best rotation assuming coincident centroids, so if USalign can do it better, perfect. You can always submit A, rot90xy(A), rot90xz(A), rot90yz(A)... and as many experiments as you want, and check whether your score changes.
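The Kabsch step mentioned above can be sketched in NumPy like this (kabsch_align is a hypothetical name). It minimizes RMSD after centering both structures; US-align instead searches for the superposition that maximizes TM-score, so its alignment can differ, but in both cases rigidly rotating your whole prediction should not change the aligned result.

```python
import numpy as np


def kabsch_align(P, Q):
    """Superpose P onto Q (both (N, 3), residues paired by index) using the
    Kabsch algorithm. Returns the aligned copy of P and the RMSD."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean        # make the centroids coincide
    H = Pc.T @ Qc                          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if needed so R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_aligned = Pc @ R.T + q_mean          # rotate each row of Pc, move onto Q
    rmsd = np.sqrt(((P_aligned - Q) ** 2).sum(axis=1).mean())
    return P_aligned, rmsd
```

Running this offline on A and on a rotated copy of A is a cheap way to do the "does rotation change my score" experiment without spending submissions.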

plucky tiger
#

I mean that might work for training, but for submitting I'd have to rotate all my samples individually and check if that affects my score? That seems tedious, plus there aren't unlimited submissions every day

silent delta
#

No, the scoring should automatically align your predictions, only experiment if you don't trust automatic alignment

plucky tiger
#

ahh I get it, thanks a lot man 🙏

coral furnace
plucky tiger
#

yeah, but you'll learn nothing new that way I would think, because the input sequence is the same and only the output changes; your network should inherently be invariant to rotation

coral furnace
#

How would it be invariant to rotation if you do not train it for the task? Similar to vertical/horizontal flips in images I'm presuming
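One way to reconcile both points: if only the labels are randomly rotated while the input sequence stays the same, a plain coordinate MSE gives the model conflicting targets for identical inputs. A rotation-invariant loss avoids this without any augmentation. The sketch below (hypothetical function names) compares pairwise-distance matrices, and also includes a uniform random-rotation generator for testing invariance or for augmentation with an alignment-based loss.

```python
import numpy as np


def random_rotation(rng):
    """Uniformly random 3x3 rotation matrix via QR of a Gaussian matrix."""
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))      # fix column signs so Q is Haar-distributed
    if np.linalg.det(Q) < 0:         # turn a reflection into a proper rotation
        Q[:, 0] = -Q[:, 0]
    return Q


def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_loss(pred, target):
    """MSE between pairwise-distance matrices. Unchanged if either structure
    is rotated or translated, so no rotation augmentation is needed."""
    return np.mean((pairwise_distances(pred) - pairwise_distances(target)) ** 2)
```

Note that distances are also unchanged by mirroring, so a distance-only loss cannot tell a structure from its mirror image.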

solemn grotto
#

For all interested in joining this competition I strongly recommend looking at the pinned discussion posts

coral furnace
#

Hey all, quick question: is it OK to build my dataset in another notebook so that building it doesn't eat into my available training time? The resulting dataset would be public, of course. I'm taking the synthetic data and converting it into a dict for faster training, most likely saved as a .parquet

coral furnace
#

On that note, in a few hours I'll be uploading .parquet files of the sequences in the synthetic data (https://www.kaggle.com/datasets/andrewfavor/uw-synthetic-rna-final) formatted for use in the competition. My initial attempt was just reading straight from the dataframes, but finding the corresponding labels for a given sequence took O(N); with a dict it's now O(1). Shaving time is important when notebook executions are timed, so hopefully this will help anyone trying to extend their model training time
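A sketch of the one-time dict build, assuming a long-format labels table with columns target_id, resid, x_1, y_1, z_1 (build_label_lookup is a hypothetical name). A single groupby pass costs O(N), after which each lookup is O(1) on average.

```python
import pandas as pd


def build_label_lookup(labels: pd.DataFrame) -> dict:
    """Map target_id -> (L, 3) coordinate array, ordered by resid.

    ASSUMPTION: columns 'target_id', 'resid', 'x_1', 'y_1', 'z_1'."""
    lookup = {}
    for target_id, group in labels.groupby("target_id", sort=False):
        group = group.sort_values("resid")
        lookup[target_id] = group[["x_1", "y_1", "z_1"]].to_numpy()
    return lookup
```

Keying by target ID rather than by raw sequence string avoids silently overwriting entries when the same sequence appears with more than one structure.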

cunning moat
#

But usually, if we're comparing two structures, you don't try to predict the exact X,Y,Z; you usually predict structural features.

#

Then figure out how to map the X,Y,Z to the structural feature if you need, but anything trying to predict the exact XYZ directly is usually complete trash for generalization.

solemn grotto
solemn grotto
plucky tiger
cunning moat
# solemn grotto Is this too computationally expensive for a loss function though?

Not at all. A lot of them have mild compute times. It amounts to a change of variables; we did these on CPUs before. The hardest part is if you don't have a clean inverse function. That's where it gets a bit troublesome.

Even AlphaFold works by predicting a pair representation first, which is effectively another symmetry function.

plucky tiger
#

I'm currently predicting the EDM (Euclidean distance matrix) of the 3D structure
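If a network predicts a distance matrix, one standard way back to coordinates (the "clean inverse" discussed above) is classical multidimensional scaling. A minimal NumPy sketch with hypothetical function names: it recovers coordinates only up to rotation, translation and reflection. TM-score superposition uses rotations only, so the mirror image has to be resolved separately.

```python
import numpy as np


def edm(X):
    """Squared Euclidean distance matrix of an (N, 3) coordinate array."""
    G = X @ X.T
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2.0 * G


def coords_from_edm(D2, dim=3):
    """Classical MDS: coordinates (up to rigid motion and reflection)
    from a squared distance matrix."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ D2 @ J                   # Gram matrix of centered points
    w, V = np.linalg.eigh(B)                # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]         # keep the `dim` largest
    w_top = np.clip(w[idx], 0.0, None)      # a predicted EDM may be invalid
    return V[:, idx] * np.sqrt(w_top)
```

For a predicted (noisy) matrix this gives the best rank-3 fit of the implied Gram matrix, so distances of the recovered structure will only approximately match the prediction.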

solemn grotto
#

Given two sets of XYZs, what's the best way to see if they "structurally" compare? Is there a name I could search up
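The usual names to search for are RMSD and TM-score; the competition scores with TM-score via US-align. Below is a sketch of the TM-score sum for two structures that are already superposed and paired residue by residue. US-align additionally searches over superpositions, which this sketch does not do. The d0 formula is the nucleic-acid one from RNA-align/US-align for chains of 30+ nucleotides; US-align special-cases shorter chains, so this sketch requires d0 to be passed explicitly there. Function names are hypothetical.

```python
import numpy as np


def rna_d0(L):
    """Distance scale d0 for nucleic acids, valid for L >= 30 nucleotides."""
    if L < 30:
        raise ValueError("US-align special-cases short chains; pass d0 explicitly")
    return 0.6 * np.sqrt(L - 0.5) - 2.5


def tm_score(pred, ref, d0=None):
    """TM-score of already-superposed coordinates (one C1' per residue,
    residues paired by index), normalized by the reference length."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    L = len(ref)
    if d0 is None:
        d0 = rna_d0(L)
    d = np.linalg.norm(pred - ref, axis=1)      # per-residue deviation
    return float(np.mean(1.0 / (1.0 + (d / d0) ** 2)))
```

Unlike RMSD, each residue contributes at most 1 and far-off residues contribute almost nothing, so a few badly placed nucleotides do not dominate the score.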

cunning moat
coral furnace
tribal fossil
#

hey how long does submission scoring take?

coarse bobcat
empty frigate
#

Hi everyone, I’m Sam and I’m a Bioengineering student. There are explainers and tutorials for beginners in biology who are participating in this competition, but I can’t seem to find any resources for someone who understands the biology but not the programming part. Where do I start? Any help would be much appreciated! Thanks for your time 🙂

coral furnace
empty frigate
coral furnace
#

Since a lot of the approaches are multimodal (aka using multiple NN architectures at once), go over the rest of the basic NN architectures. What is an RNN? LSTM? Graph NN?

#

See how an RNN, a CNN and a graph NN can be used to tackle the problem

#

Once you have that down you can experiment with combining them for a multimodal solution

#

Or looking at what state of the art models are doing

#

Attention mechanisms would be something worth looking into as well

#

Imo read up on all those simpler iterations and then move to sota

#

If anyone is more experienced and has more to add please feel free to do so

empty frigate
#

Thanks for your guidance! 🙏

coral furnace
coral furnace
silent delta
#

Or at least try it to see if it makes a difference.

#

I'm talking about RibonanzaNet of course.

coral furnace
#

So I've been trying to tackle this competition task for a while and I'm starting to have a few questions, I have experience with ML but this is the first time I'm handling RNA and the data and task are completely new to me

#

First of all regarding the data, does sequence length matter that much when training? In the proposed synthetic data each sequence is about 5k characters long while in the competition itself the sequences are about 200 chars max, how does this affect training? Given that currently the training dataset isn't huge per se

#

Secondly how much does padding the sequences to a common length (with a value that doesn't affect the loss function) affect training itself? Is this one of the tasks where the bigger the batch the better the outcome or is training with batch size 1 preferred to avoid padding altogether?
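Padding by itself does not bias training as long as padded positions are masked out of both the loss and any attention. A minimal sketch of a collate step, assuming sequences contain only A/C/G/U and using hypothetical names (NUC_TO_IDX, PAD_IDX, collate). The same mask also excludes residues whose labels are NaN.

```python
import numpy as np

NUC_TO_IDX = {"A": 0, "C": 1, "G": 2, "U": 3}
PAD_IDX = 4  # hypothetical padding token id


def collate(sequences, coords_list):
    """Pad a batch of variable-length sequences to the longest one.

    Returns token ids (B, Lmax), padded coords (B, Lmax, 3) and a boolean
    mask (B, Lmax) that is True only for real, labelled positions."""
    B = len(sequences)
    Lmax = max(len(s) for s in sequences)
    tokens = np.full((B, Lmax), PAD_IDX, dtype=np.int64)
    coords = np.zeros((B, Lmax, 3), dtype=np.float32)
    mask = np.zeros((B, Lmax), dtype=bool)
    for b, (seq, xyz) in enumerate(zip(sequences, coords_list)):
        L = len(seq)
        tokens[b, :L] = [NUC_TO_IDX[c] for c in seq]
        coords[b, :L] = xyz
        # padding and NaN labels are both excluded from the loss
        mask[b, :L] = ~np.isnan(np.asarray(xyz, dtype=float)).any(axis=1)
    coords = np.nan_to_num(coords)   # masked slots hold 0 instead of NaN
    return tokens, coords, mask
```

The loss is then averaged as (err * mask).sum() / mask.sum(). Sorting or bucketing sequences by length before batching keeps the amount of padding (and wasted compute) small, which matters more here than the padding value itself.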

coral furnace
cunning moat
# coral furnace First of all regarding the data, does sequence length matter that much when trai...

Yes. The sequence length and position both matter because these things interact with each other. Think of it like putting magnets on a stiff rope. If the magnets are too close, they may not be able to bend the rope to interact. If you change the location of a magnet, you change the length of the curled region between magnets. If you have multiple magnets, you have multiple pairs that can interact.

#

Short ropes can't loop on themselves while long ones can.

coral furnace
#

So training on 5k long sequences while the ones in the valid set are 200 chars long isn't a great idea

#

Can a parallel be drawn to image resolutions as an example I'm guessing? Training on 4k images and having 720p on inference

#

But if that's the case how useful is the proposed synthetic data for the task?

#

Since afaik you can't exactly crop RNA sequences since that massively changes their structure

cunning moat
coral furnace
#

So if that's the case the synthetic data isn't of much use for the task

#

So from the original dataset when any series that has NaN is removed we only get about 600 sequences if I remember correctly

#

Yes more data will be added in about April but I'm really trying to make something work and haven't gotten past 0.11 on TM score

#

I would guess using a pretrained model and just fine-tuning it is the only viable option right now

cunning moat
#

But yes traditional data science approaches don't work as well because it's got a high cross correlation factor

#

It's a similar problem to LLMs where your synthetic data is synthetic prompts.

#

And the response prompts still need to make logical sense

coral furnace
#

So I'm guessing a way to have the best of both worlds is to create synthetic data from the current data in the dataset

silent delta
novel bough
#

Has anyone tried using actual RNA sequence data from RCSB? I'm building a parser that follows the contest's format (extracting coords from C1' atoms from nucleic acid residues).
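A minimal sketch of such a parser for legacy fixed-column PDB files (RCSB also serves mmCIF, which needs a different reader, e.g. Biopython's MMCIFParser). parse_c1_prime is a hypothetical name. It reads only the first model, keeps blank or 'A' alternate locations, and includes HETATM records so modified nucleotides are not dropped, which may also pick up non-RNA HETATM entries that happen to have a C1' atom.

```python
def parse_c1_prime(pdb_text, chain_id=None):
    """Extract C1' coordinates from fixed-column PDB text.

    Returns a list of (resname, resseq, x, y, z) tuples in file order."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                                    # first model only
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom = line[12:16].strip()                   # columns 13-16
        altloc = line[16]                            # column 17
        if atom != "C1'" or altloc not in (" ", "A"):
            continue
        if chain_id is not None and line[21] != chain_id:   # column 22
            continue
        resname = line[17:20].strip()                # columns 18-20
        resseq = int(line[22:26])                    # columns 23-26
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))  # columns 31-54
        out.append((resname, resseq, x, y, z))
    return out
```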

coral furnace
novel bough
#

Wdym 200 max? Some sequences in the training set reach 4k

coral furnace
#

really? must have gotten my datasets confused

#

i'll take a look

novel bough
#

The lengths in the training set are kinda weird; some sequences are very short

plucky tiger
#

i mean yeah, there are some longer sequences in the training data, but only like 6% are longer than 300 nucleotides and 3.5% longer than 1000 nucleotides. Not really what I would call a balanced dataset.

serene sundial
#

Anyone want to collaborate?

shadow cypress
#

Hi All, I am looking for Kaggle Grandmasters who have won competition who can mentor me. I am willing to pay for mentorship. Thank you!

plucky tiger
#

Do y'all have some thoughts on a viable approach to normalizing the 3d structure labels?

novel bough
#

Not sure if it'll work though

plucky tiger
#

Yeah, I had something similar in mind, which would keep the relative distances of the nucleotides but scale them down/up accordingly. Unfortunately, US-Align doesn't account for scaling, only rotation and translation, which makes training easy but submissions a pain because of the TM-score.
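One way around the scaling problem is to use a single global scale fitted on the training set rather than a per-structure one, so the exact inverse is known at submission time. A minimal sketch with hypothetical function names: centering is harmless because US-align removes translation, but the scale must be multiplied back before writing coordinates.

```python
import numpy as np


def fit_global_scale(structures):
    """One scale for the whole training set: RMS distance of C1' atoms
    from their own structure's centroid."""
    sq = [((X - X.mean(axis=0)) ** 2).sum(axis=1) for X in structures]
    return float(np.sqrt(np.concatenate(sq).mean()))


def normalize(X, scale):
    """Center one structure and divide by the global scale (training targets)."""
    return (X - X.mean(axis=0)) / scale


def denormalize(Xn, scale):
    """Undo the scaling before submission; translation does not matter for
    US-align, but the scale does."""
    return Xn * scale
```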

gritty garden
#

Would anyone want to work on the Stanford RNA 3D competition together? If so, DM me, I’m down to work together and work on improving our accuracy and learning a lot from this competition and growing our AI and ML and Data science skills

tidal spruce
tidal spruce
plucky tiger
#

This is my first competition, so I don't quite understand how the submission notebooks work. It kinda confuses me

turbid lava
#

That is why the train is so small. You don't have all of the values.

tidal spruce
#

Is everyone going for deep learning methods, or are template-based and ab initio approaches worth trying too?

flint socket
# plucky tiger This is my first competition, so I don't quite understand how the submussion not...

I’m new to competitions and am simplifying, but basically your notebook does the following:

load the data,

train a model that predicts the targets,

feed the predictions into a submission file (formatted according to competition details).

Then they run your code to predict targets not only for the test cases whose sequences you can already see in the test file; they may add more (if I remember correctly, they will in this competition).

So your code needs to be resilient, able to predict test sequences you haven’t seen.

plucky tiger
tidal spruce
#

There's no such restriction in the comp

plucky tiger
#

perfect, thanks a lot guys

wheat galleon
#

This is very exciting! However, the submission deadline is in May, and the competition ends in September. I was wondering when the winners will be announced.

tidal spruce
flint socket
solemn grotto
plucky tiger
#

This is a valid approach. I opted to train with a batch_size of one and accumulate the gradients before updating the model to get more stable training.
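Gradient accumulation can be sketched without a framework. Below, a toy linear model (a stand-in, not anyone's actual network; names are hypothetical) runs one variable-length sample at a time, sums the averaged gradients, and updates once per group. In PyTorch the same pattern is calling (loss / K).backward() per sample, since gradients add up in .grad, then optimizer.step() and optimizer.zero_grad() every K samples.

```python
import numpy as np


def grad_mse_linear(W, x, y):
    """Gradient of mean((x @ W - y) ** 2) w.r.t. W for a single sample.
    x: (L, F) per-residue features, y: (L, 3) targets; L varies per sample."""
    r = x @ W - y
    return 2.0 * x.T @ r / r.size


def train_step_accumulated(W, samples, lr=0.1):
    """Batch-size-1 forward/backward passes, one update per len(samples)."""
    g = np.zeros_like(W)
    for x, y in samples:
        g += grad_mse_linear(W, x, y) / len(samples)   # average, like a batch
    return W - lr * g
```

Because each sample keeps its own length, no padding is involved; the update equals the one a padded-and-masked batch would give when each sample's loss is averaged over its own residues.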

solemn grotto
#

How long does a whole training session take

tidal spruce
#

Does anyone know any GNN-based approaches?

solemn grotto
#

There are a few but idk the names

solemn grotto
#

just search on google

tidal spruce
bleak cedar
#

Hello Everyone!
I am Shashank, with 3+ years of experience in the domain of data - I am very much interested in creating and deploying end-to-end machine learning models.
Going deeper, I'm also fascinated by working on AI models and building proper solutions for businesses.

Students and professionals with the same interests can connect with me on LinkedIn: https://www.linkedin.com/in/snkp0018

Happy Learning!
Best,
Shashank Pandey

plucky tiger
# solemn grotto How long does a whole training session take

That largely depends on the model and training data, and hardware you use. My initial model was a rather simple CNN, utilizing only the training data provided by the competition and training on my RTX 4090, each epoch took like 10-20 seconds.

solemn grotto
#

Holy shit

#

That's crazy

#

I'm trying to use paperspace for cuda and stuff but there's a bottleneck I can't fix and it's slowing it down so much

plucky tiger
#

have you tried using torch.utils.bottleneck (python -m torch.utils.bottleneck your_script.py)? That might give you some further insights

#

I mean it shows where exactly your pipeline spends the most time, like data loading etc.
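A lighter-weight alternative to a full profiler is to time each pipeline stage by hand. A minimal sketch using only the standard library; timed and report are hypothetical helpers. When timing GPU work, call torch.cuda.synchronize() before leaving the block, because CUDA kernels run asynchronously and the timer would otherwise stop too early.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)


@contextmanager
def timed(stage):
    """Accumulate wall-clock seconds spent inside `with timed(stage):`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def report():
    """Print stages sorted by total time, with their share of the total."""
    total = sum(timings.values()) or 1.0
    for stage, sec in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{stage:>12}: {sec:8.3f}s ({100 * sec / total:5.1f}%)")
```

In a training loop, wrapping the batch fetch in timed("data") and the forward/backward pass in timed("step") quickly shows whether the bottleneck is data loading or compute.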

pine harbor
#

Any issue on notebook submissions?

tidal spruce
plucky tiger
silent delta
#

I was curious: over which dimensions do you apply convolutions?

pine harbor
#

anyone having trouble with memory on the long sequences when submitting with DL models?

atomic wing
#

Scaler. joblib file?

solemn grotto
#

looking for someone with biochem knowledge to collaborate

#

Just shoot me a DM

solemn grotto
plucky tiger
# silent delta I was curious. Over what dimensions you apply convolutions?

Given the one-hot encoded input tensor of shape LxM (L = length of sequence; M = encoding dimensions [4 -> G, A, U, C]), I perform the Kronecker product to get to shape (L, L, M^2), which is square, with M^2 channels that I can then process with a CNN. Does that clarify my approach?

#

```python
import torch


# (a method on the model class in the original; shown standalone here)
def kronecker_product(one_hot_encoding):
    """
    Computes the Kronecker product of the one-hot encoded sequence.

    Args:
        one_hot_encoding (torch.Tensor): One-hot encoding of the sequence (L x 4).

    Returns:
        torch.Tensor: Pairwise Kronecker product (L, L, 16).
    """
    # Compute the outer product (Kronecker product) for all pairs of positions
    L = one_hot_encoding.shape[0]
    kron_product = torch.einsum('ik,jm->ijkm', one_hot_encoding, one_hot_encoding)
    # reshape rather than view: the einsum output is not guaranteed to be contiguous
    kron_product = kron_product.reshape(L, L, -1)  # (L, L, 16)
    return kron_product
```
#


plucky tiger
silent delta
#

but doesn't the Kronecker product of orthonormal vectors produce unnecessarily large and sparse new vectors?
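A quick way to see what the 16 channels contain: the outer product of two one-hot 4-vectors is itself a one-hot 16-vector whose hot index is 4*a + b, so the (L, L, 16) tensor carries exactly the same information as an (L, L) grid of integer pair ids. The NumPy sketch below (hypothetical function names) checks this.

```python
import numpy as np


def pair_one_hot_outer(onehot):
    """(L, 4) one-hot -> (L, L, 16) via per-pair outer product."""
    L = onehot.shape[0]
    return np.einsum("ik,jm->ijkm", onehot, onehot).reshape(L, L, 16)


def pair_ids(tokens):
    """Same information as integer pair ids in [0, 16): id = 4 * a + b."""
    return 4 * tokens[:, None] + tokens[None, :]
```

So yes, each pair vector is sparse (one nonzero out of 16). Since a 1x1 convolution without bias on a one-hot input just selects one weight row, a learned embedding table indexed by pair_ids gives the same first layer with less memory.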

solemn grotto
#

Just adam or fused or 8bit or lion

plucky tiger
plucky tiger
silent delta
#

My teammate told me about an RNA model using 2D images. What I first thought of was base pair probabilities, but personally I'm skeptical about any CNN for this task.

solemn grotto
#

A CNN would work well with it; it just needs to be in a transformer

#

That's how their RibonanzaNet model works; it's provided to you and works kinda well

tidal spruce
solemn grotto
#

No

#

1d convolutional i think

#

it uses a different vector for each possible nucleotide, so 4 in total, and treats each sequence as an input

#

so it runs self attention on the sequence

tidal spruce
solemn grotto
#

I'm not too familiar with ViTs; how would it be used here?

plucky tiger
solemn grotto
#

Ah really

#

Interesting

#

So no encoding whatsoever?

#

That's surprising. With nucleotide sequences you'd think you'd need attention, y'know, for long-range interactions like folding

plucky tiger
#

the model converges sufficiently well, but the resulting structures are somewhat off, so I'll try training with the diffusion data and making the model bigger, and if that doesn't improve the results I'll probably start working with attention

solemn grotto
#

sounds good

#

wdym diffusion data

solemn grotto
#

Oh and if you're interested in using attention the ribonanzanet model they provide has a bunch of stuff setup

tidal spruce
#

Try differential transformer

plucky tiger
#

listed under „additional files“ for the competition

plucky tiger
solemn grotto
#

Yeah you'll have to peek into their premade network.py

solemn grotto
turbid lava
#

"Potentially Scam"

cinder raptor
#

Hey guys im new to the competition, what are some good scores you've seen on the TM scale for the public leaderboard?

#

Or have found yourself

tacit kernel
cinder raptor
cinder raptor
flint socket
cinder raptor
tacit kernel
cinder raptor
#

cus i can see there's a significant amount of missing coordinates

flint socket
#

I merged train_seq with train_lab, then dropped the rows with missing values for x_1, y_1, z_1

I couldn’t think of an alternative
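A sketch of that merge in pandas, assuming (not confirmed here) that label IDs look like <target_id>_<resid> and that train_sequences has target_id and sequence columns; merge_labels is a hypothetical name.

```python
import pandas as pd


def merge_labels(train_seq: pd.DataFrame, train_lab: pd.DataFrame) -> pd.DataFrame:
    """Attach each labelled residue to its parent sequence row, then drop
    residues whose x_1/y_1/z_1 are missing.

    ASSUMPTION: label IDs look like '<target_id>_<resid>'."""
    lab = train_lab.copy()
    # split off the trailing residue number; target ids may contain '_'
    lab["target_id"] = lab["ID"].str.rsplit("_", n=1).str[0]
    merged = lab.merge(train_seq, on="target_id", how="left", validate="many_to_one")
    return merged.dropna(subset=["x_1", "y_1", "z_1"]).reset_index(drop=True)
```

One alternative to dropping rows: keep them and mask the NaN positions in the loss instead, so each chain keeps its full length and residue numbering.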

teal lava
#

Is the current top 1 legit or data leakage?

flint socket
# teal lava Is the current top 1 legit or data leakage?

If a solution ends up being too tailored to the existing scoring dataset, there can be a big shake up in the leaderboard when a new scoring dataset is introduced. The introduction of a new scoring set will happen two times in this competition. So we’ll find out soon I guess. Don’t forget to check the discussions on Kaggle. You can search for terms like leaderboard and see what folks have been speculating about. There’s a lot more discussion going on there.

teal lava
#

I checked it already; there's just a lot going on there so it's hard to find the gist

small dune
#

Hey, is anyone else getting this issue?

When I try to submit to the Stanford RNA 3D Folding competition, it says:

Cannot submit — Submissions have been disabled for this competition.

But the competition deadline is still a month away. Just wanted to check if it’s a platform issue or something temporary. Let me know if you’re seeing the same thing.

silent delta
#

disabled while rescoring

#

should finish in a few hours

sly sinew
#

Submission failed! Can anyone tell me why this is happening?

silent delta
#

It depends on your code. I think some of the most promising public notebooks have errors when executed on the new test set. You should check your code carefully or debug it step by step, submitting incremental pieces of the code to find what crashes it.

#

Short answer: the test samples have changed.

flint socket
#

If your notebook is failing, in addition to checking if you have the right columns, you might want to check the order of rows.

The order of rows needs to be exactly as in the sample submission file. And the order is not always perfectly sequential in that file (at least it wasn’t before the leaderboard pause).

#

And the indexes and IDs also need to match those of the sample submission.

#

At least, that has been my experience. Changed everything: columns, data types. Only worked when indexes and IDs were aligned in the same way as in the sample submission.
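A defensive last step that enforces this: reindex the predictions to the sample submission's ID order and column order, and fail loudly if any ID has no prediction. A pandas sketch; align_to_sample is a hypothetical name.

```python
import pandas as pd


def align_to_sample(pred: pd.DataFrame, sample: pd.DataFrame) -> pd.DataFrame:
    """Return predictions with exactly the rows (in the same order) and the
    columns of the sample submission; raise if any ID is missing."""
    out = pred.set_index("ID").reindex(sample["ID"].tolist()).rename_axis("ID")
    value_cols = [c for c in sample.columns if c != "ID"]
    missing = out[value_cols].isna().any(axis=1)
    if missing.any():
        raise ValueError(f"{int(missing.sum())} IDs have no prediction, e.g. {missing.idxmax()!r}")
    return out.reset_index()[list(sample.columns)]
```

Writing the result with to_csv("submission.csv", index=False) keeps pandas from adding an extra index column.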

cinder raptor
#

guys do these values in the validation labels have any meaning? Or should I just delete them

silent delta
#

NaN

barren crag
#

Hello everyone!
I'm a B.Tech undergraduate currently looking to join a team. If any team has an open spot and is looking for a dedicated member, I’d love to be a part of it!

Alternatively, if you’re also looking for a team, feel free to join mine— I’m open to collaborating with like-minded people. Let’s connect!

vivid iris
#

Hi guys; quick question: is the foundational ribonanzanet model trained on data that isn't publicly available/posted within the comp? Would I be losing out on a lot of data if I'm not using ribonanzanet? Thanks in advance

cinder raptor
#

Hey guys, pardon me if I ask dumb questions. The sample submission asks for 5 sets of coordinates. I've gotten one set of coordinates by using a model. How do I get the other 4? Should I use different models in the same notebook to generate 4 other coordinate sets and then put all of them together?

silent delta
#

you have 5 guesses: you can use five models, postprocess the output of a single model 5 different ways, or just repeat a single prediction five times; it's up to you

mental mountain
#

Hey, I'm getting the submission file not found error when I submit. I've checked that the notebook runs through and generates the attached submission file format.

I'm wondering if it could be a dependency thing? I'm !pip install-ing two modules at the top of my notebook with internet enabled, and I have them listed in my dependency file and turn internet off when I submit. The submission process seems to get past the dependency installation and moves on to actually running the notebook, but I'm not sure this necessarily means the dependencies were installed successfully.

Is there any way to get more information on what might be going wrong? How do people typically debug submission errors-- just add/remove components until it starts submitting properly?

silent delta
#

does your code produce extra files on disk? Often that causes this error, so if it does you should clean everything from disk before to_csv().

#

Ah, and unrelated to this error, saving the CSV with index=False will avoid future format errors.

mental mountain
#

I don't think it produces extra files, it does download a model and tokenizer though

silent delta
#

so yes, those are files other than the CSV; try leaving only submission.csv on disk

mental mountain
#

Thank you, realized also I was attempting to download said model/tokenizer with internet off 😂 😭

silent delta
#

lol, I missed that too, but that shouldn't produce a "submission.csv not found" error, should it?

mental mountain
#

I figured if any cell errors out, it stops notebook execution, so the CSV file saved in a later cell just won't get created?

cinder raptor
#

Hey guys there's about 15 missing values in the validation labels. I need to predict them to make a submission file. How am I supposed to predict values for them if my test set doesn't have data for them?

silent delta
#

The local test is just an example and you should not use it to train. Anyway, if for any reason you have missing values in your training data, you can either ignore them in the loss calculation or impute them; in this case, I suggest just ignoring them.

#

I'm not sure if I understand your question correctly.

cinder raptor
#

cus the submission file demands it

silent delta
#

Yes, but that's the key. For a test, you don't need to know the labels, just the input (-CAU-). If you want to use them as training data, you can since it's only a toy test; just mask the unknown positions and don't count them in your loss calculation. Same for local testing: process the full chain, but when you compute the score, remove the unknown positions.

#

You won't know any labels for the actual hidden test.
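The masking described above can be sketched in a few lines (masked_mse is a hypothetical name): the whole chain goes through the model, and residues with NaN labels are simply left out of the loss or score instead of being imputed or deleted.

```python
import numpy as np


def masked_mse(pred, target):
    """Mean squared per-residue distance over residues whose label is fully
    known; NaN labels are skipped rather than imputed or dropped from the chain."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    known = ~np.isnan(target).any(axis=-1)       # (L,) True where xyz is labelled
    if not known.any():
        return float("nan")
    diff = pred[known] - target[known]
    return float(np.mean(np.sum(diff ** 2, axis=-1)))
```

For local scoring, the same known mask can be used to select residues from both structures before computing TM-score.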

cinder raptor
# silent delta You wont know any label for actual hidden test.

But for the hidden test, the format of the file will be the same. My code will pick up the y_1 and z_1 labels even for the hidden set and predict the x_1 labels, which is also what I'm doing here. And if the y_1 and z_1 are faulty in the test set here, I'm not sure how to get x_1

silent delta
#

Your submission code has to read only the input sequences of a general test set and produce five xyz coordinates for every position in each sequence. You don't need any labels for that; you only need them to train and score.

#

That should be done separately, with properly labeled or masked data.

cinder raptor
#

I ran a different model and it got rid of the missing labels; it now gives proper labels. But the submission still shows an error and I can't seem to get a score. I don't understand why :((

cinder raptor
#

If anyone would like to help check my submission file, I'd be very grateful

#

can't seem to get a score

silent delta
#

you can share the code in Kaggle and ask for help

cinder raptor
#

btw, are too many decimal places (e.g. 7) a problem for the predictions in my submission file?

silent delta
#

not at all, but I think your code is not general and is producing submission for the toy test example rather than a general unknown test.

cinder raptor
paper magnet
#

Hello guys, has anyone used ESM2/3 (protein language models) to tackle this competition?

thin ruin
#

This is more of a general Kaggle question than specific to this competition--but when I edit one of the provided starter notebooks, my new "forked" notebook seems to get saved in a different "folder" or "area" on Kaggle where it doesn't show up at all in "Code" under "Your work". I'm wondering why this is--I'm guessing that "competition notebooks" are different from general Kaggle notebooks and that only the former show up under "Your work".

#

Going off of that, I was wondering if there's an easy way to copy the dependencies from an existing notebook into your own fresh competition notebook (not an "edit" of that other notebook). The RibonanzaNet secondary structure inference works beautifully in the example notebook, but if I just copy the notebook code to my own new notebook, it doesn't run, at least in part because RibonanzaNet is not included in the inputs directory. But even if I download and re-upload the .pt files into my own notebook, I'm not sure if it will work because there are other files in the directory of that starter notebook than just the weights.

silent delta
#

No idea what are you talking about

#

forks should show up in Your Work with all inputs from the original attached; refresh if not

thin ruin
thin ruin
#

I'm wondering if it's possibly because it says "Draft session" at the top--maybe that's a kind of temporary notebook rather than a "full" one? It's still saved, because as I said I've been working on it for like a month, and I was able to rename it and everything but it still doesn't show in "Your Work".

silent delta
#

if it's from another competition, you will need to upload the notebook and set all inputs manually, I think

#

so try to search for that fork in Your Work, under that competition, if possible

#

OK, it's not a competition, but it's about what I was thinking: to submit in this competition you'll need to upload it manually in this competition's Code section and set all inputs there manually (or maybe that will be automatic, I'm not sure)

winter geyser
#

I am wondering. The competition is “closed” but the timeline says it has 4 months to go. Does this mean I still have time to make a submission?

silent delta
#

I don't think so, those 4 months are to get a final safe test of true new structures

winter geyser
#

Any way we can get confirmation on that?

silent delta
#

probably they will reopen them at the end

winter geyser
#

Bummer

solemn grotto
#

it's closed

bright violet
thin ruin
#

This is the only one I've tried forking.

thin ruin
#

By the way, I am still looking for people who want to collaborate on RNA structure prediction in the long term. I presented some ideas to the Eterna game people, I don't know if anyone from here is on there too.