#make-data-count-finding-data-references
Hello.
Can you clarify, in the examples in the data tab, what evidence was used to determine that the secondary sources are indeed secondary?
For PDB 5yfp, is it because the text was in the introduction?
Although the referring paper is far from my field, E-MTAB-10217 seems primary to me.
From https://doi.org/10.3389/fimmu.2021.690817 (Papoutsopoulou, Stamatia, et al. "Impact of interleukin 10 deficiency on intestinal epithelium responses to inflammatory signals." Frontiers in immunology 12 (2021): 690817.):
RNA Sequencing
Host transcriptome analysis was performed by RNA sequencing of unstimulated and TNF (40 ng/ml) stimulated enteroid cultures from C57BL/6J mice (N = 3). RNA extraction and purification from enteroids were performed using the RNeasy mini kit (Qiagen), as per manufacturer's instructions. Strand-specific sequencing libraries were prepared with the TruSeq stranded Total RNA kit (Illumina) from 1 µg total RNA of each sample and sequenced on an Illumina HiSeq2000 (100-nucleotide paired-end reads)
It reads to me like the RNA sequencing results were explicitly generated from mouse intestinal enteroids that were experimentally stimulated with TNF specifically for this study.
Thanks
I could make a similar argument for the other data source in the Papoutsopoulou, Stamatia, et al. paper (PRJE43395):
Assay for Transposase-Accessible Chromatin Sequencing
The ATAC sequencing protocol was based on the protocol of Buenrostro et al. (31) and modified for the specific cell type, as described below. Enteroid cultures were maintained in 24-well plates, as described above, and they were either left unstimulated or they were treated with 40 ng/ml TNF for 2 h (four wells per condition). At the end of stimulation, the medium was removed, the plate was transferred on ice, and 1 ml cold PBS was added in each well. [...]
It reads to me like the authors similarly generated the data directly, by treating intestinal organoids with TNF to identify chromatin accessibility regions within the scope of this specific investigation.
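To make the question concrete, here is a minimal sketch of the kind of section-based heuristic I am guessing at: an accession found under a methods-style heading is flagged as possibly primary, anything else as possibly secondary. This is an assumption about how the labels could have been produced, not the organizers' actual method. The regex patterns, the `METHODS_HINTS` list, the `find_references` helper, and the example sentences are all hypothetical and are not quotes from the paper.

```python
import re

# Hypothetical patterns for two identifier styles mentioned in this thread.
ACCESSION_PATTERNS = {
    "ArrayExpress": re.compile(r"\bE-[A-Z]{4}-\d+\b"),
    "PDB": re.compile(r"\bpdb\s+(\d[a-z0-9]{3})\b", re.IGNORECASE),
}

# Assumption: headings containing these words describe data generated in the study.
METHODS_HINTS = ("method", "rna sequencing", "assay")


def find_references(sections):
    """Return (accession, heading, guess) for each identifier found.

    `sections` maps a section heading to that section's text.
    """
    results = []
    for heading, text in sections.items():
        in_methods = any(hint in heading.lower() for hint in METHODS_HINTS)
        for pattern in ACCESSION_PATTERNS.values():
            for match in pattern.finditer(text):
                # Use the captured group when the pattern has one, else the whole match.
                accession = match.group(match.lastindex or 0)
                guess = "primary?" if in_methods else "secondary?"
                results.append((accession, heading, guess))
    return results


# Invented example text, for illustration only.
example = {
    "Introduction": "The structure (pdb 5yfp) was described previously.",
    "RNA Sequencing": "Data were deposited under E-MTAB-10217.",
}
for row in find_references(example):
    print(row)
```

A rule this crude would mislabel any dataset the authors generated themselves but only mention outside the methods, which is why I am asking what evidence was actually used.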
Also, what PDF-reading library is available to us if we have to turn off the internet for submission?
I found the answer to this other, more basic question in the discussion here
I guess the discussion is more active than the Discord. I will ask my previous question in the discussion.
Do we need to use the models listed under the models section, which are mostly Qwen versions, or can we use other models like Gemma or Llama?
nah, you can use any freely & publicly accessible model
Is there any update regarding the dataset?
Here are the new modified training labels
Btw i'm looking for someone to team up with
Can you share your Kaggle profile?
hello
I use MinerU to extract text from PDFs; it might help to have good data
https://www.kaggle.com/datasets/omiderfanmanesh/make-data-count-dataset-mineru-extraction
you can download it from here
Hi there, has anyone run into issues with their notebook being marked as failed even though their code ran successfully? It seems to be running into a papermill exception when converting the notebook after code execution. Unfortunately, it is marking all of my notebook versions as failed.
I updated one of my projects today and it worked, but I couldn't identify what you mentioned in my projects.
