I want to train RT-DETR on PubTables and have created a small subset: https://www.kaggle.com/datasets/sreesankar711/pubtables-img-detect-test. I want to train on it briefly first to check that everything works, and then train fully on something like GCS. (I need help with this part too, as I lack computational resources; RAM seems to fill up after about 10k images with DETR.)
The main problems I seem to have are:
-
How do I give the data as input? The labels are in XML format, and supposedly RT-DETR only takes .txt files.
-
The sample .txt file looks like:
"""
0 0.445688 0.480615 0.075125 0.117295
0 0.640086 0.471742 0.0508281 0.0814344
20 0.643211 0.558852 0.129828 0.097623
20 0.459703 0.592121 0.22175 0.159242
0 0.435383 0.45832 0.0534531 0.111025
""" -
So are the coordinates xmin, ymin, width, height, normalized to the image width and height?
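For context, here is a rough sketch of the XML-to-.txt conversion I think I need. It assumes the XML is Pascal VOC style (a `size` element plus `object`/`bndbox` entries with `xmin`, `ymin`, `xmax`, `ymax`) and that each .txt row follows the YOLO convention `class x_center y_center width height`, normalized to the image size. Both of those are assumptions on my part, and `voc_xml_to_yolo_lines` and `class_to_id` are just names I made up for illustration.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo_lines(xml_text, class_to_id):
    """Convert one Pascal VOC style XML annotation into YOLO-style label rows.

    Assumed output row: "class_id x_center y_center width height",
    with all four box values divided by the image width/height.
    """
    root = ET.fromstring(xml_text)

    # Image size is needed to normalize the pixel coordinates.
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_to_id:
            continue  # skip classes that are not in the (hypothetical) mapping

        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)

        # Corner coordinates -> normalized center and size.
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h

        lines.append(
            f"{class_to_id[name]} {x_center:.6f} {y_center:.6f} "
            f"{width:.6f} {height:.6f}"
        )
    return lines
```

The idea would be to call this once per XML file and write the returned rows to a .txt file with the same base name as the image.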
-
Lastly, how can I train on a large dataset (about 100 GB) in the cloud? Can I do it on a GCS VM?