#Guidelines on how to train https://docs.ultralytics.com/models/rtdetr/ on a large dataset


quiet moon
#

So I want to train RT-DETR on PubTables and have created a small subset: https://www.kaggle.com/datasets/sreesankar711/pubtables-img-detect-test. I want to first train on it briefly to see if it works, and then train fully on something like GCS. (I also need help with this part, as I lack computational resources. RAM seems to fill up after about 10k images on DETR.)

The main problems I seem to have are:

  1. How do I give the data as input? The labels are in XML format, and supposedly RT-DETR only takes .txt.

  2. The sample .txt file looks like
    """
    0 0.445688 0.480615 0.075125 0.117295
    0 0.640086 0.471742 0.0508281 0.0814344
    20 0.643211 0.558852 0.129828 0.097623
    20 0.459703 0.592121 0.22175 0.159242
    0 0.435383 0.45832 0.0534531 0.111025
    """

  3. So the coordinates are xmin ymin width height, normalized to image width and height?

  4. Finally, how can I train on a large dataset of about 100 GB in the cloud? Can I do it on a GCS VM?
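
On points 1 to 3: in the Ultralytics detection label format, each line is `class x_center y_center width height`, all normalized to the image size, so the first two numbers are the box center rather than xmin/ymin. Below is a minimal sketch of an XML-to-txt conversion, assuming the XML files are Pascal VOC style (a `<size>` element with `<width>`/`<height>`, and `<object>` elements with `<name>` and a `<bndbox>` holding `xmin`, `ymin`, `xmax`, `ymax`). The function name `voc_xml_to_yolo_lines` and the `class_names` list are hypothetical and only for illustration; adjust the tag names if the actual files differ.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo_lines(xml_text, class_names):
    """Convert one Pascal VOC style annotation (given as a string) to YOLO label lines.

    class_names is a hypothetical ordered list of class names; the index of
    each name in the list becomes the integer class id in the output.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_names:
            continue  # skip classes that are not being trained on
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)

        # Box center and size, normalized by the image dimensions.
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h

        class_id = class_names.index(name)
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines
```

The returned lines would then be written to one `.txt` file per image, using the same file stem as the image.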

quiet moon
#

Hi all,

So I solved the training part,

but this warning comes up:

train: Scanning /kaggle/working/datasets/tables/labels/train... 0 images, 5000 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:02<00:00, 2146.68it/s]
train: WARNING ⚠️ No labels found in /kaggle/working/datasets/tables/labels/train.cache. See https://docs.ultralytics.com/datasets/detect for dataset formatting guidance.
train: WARNING ⚠️ Cache directory /kaggle/working/datasets/tables/labels is not writeable, cache not saved.
WARNING ⚠️ No labels found in /kaggle/working/datasets/tables/labels/train.cache, training may not work correctly. See https://docs.ultralytics.com/datasets/detect for dataset formatting guidance.
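
The "0 images, 5000 backgrounds" line suggests that none of the 5000 images was matched to a label file. To my understanding, Ultralytics looks for each label by swapping the `images` directory in the image path for `labels` and the image extension for `.txt`, so a mismatch in folder layout or file stems would produce exactly this. A minimal sketch for checking that, assuming the images live in a directory parallel to the label directory from the log (for example `/kaggle/working/datasets/tables/images/train`, which is an assumption); the helper name `find_images_without_labels` is hypothetical:

```python
from pathlib import Path

# Image extensions to consider; extend this set if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_images_without_labels(images_dir, labels_dir):
    """Return names of images that have no non-empty <stem>.txt in labels_dir."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    missing = []
    for img in sorted(images_dir.iterdir()):
        if img.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        label = labels_dir / (img.stem + ".txt")
        # A missing or empty label file means the image is treated as background.
        if not label.is_file() or label.stat().st_size == 0:
            missing.append(img.name)
    return missing
```

If this returns every image name, the label files are probably in a different folder than expected, or their stems do not match the image names.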
