hey guys,
I'm currently reading a lot about self-supervised graph representation learning.
Authors here often evaluate how expressive the representations learned by their methods are (e.g. representations of nodes in a graph) by training linear SVMs on them and reporting the classification accuracy.
For example, the much-cited paper that introduced InfoGraph reports a downstream classification accuracy of 49.69% on the IMDB-M dataset.
I replicated their evaluation setup but used the raw data from the dataset without any pretraining and got an accuracy of 50.2%, which is slightly higher than what they reported.
I observed this behavior on several datasets like MUTAG, Cora and Amazon Photo with other state-of-the-art SSL methods: the pretrained representations usually outperform the raw data only by a small margin, if at all.
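To make the comparison concrete, here is a minimal sketch of the kind of linear-SVM evaluation described above, assuming scikit-learn. It is not the exact protocol of any paper. The data is synthetic, the "embeddings" are just a random linear projection standing in for the output of a pretrained encoder, and `linear_probe_accuracy` is a hypothetical helper name. The same function is applied once to the raw features and once to the embeddings, so the two numbers are directly comparable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def linear_probe_accuracy(features, labels, n_splits=10, seed=0):
    """Mean cross-validated accuracy of a linear SVM on fixed features."""
    # Standardize inside the pipeline so scaling is fit on training folds only.
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, features, labels, cv=cv, scoring="accuracy")
    return float(scores.mean())


# Assumption: synthetic placeholder data, NOT a real graph dataset.
raw_x, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Assumption: a random projection stands in for pretrained embeddings.
rng = np.random.default_rng(0)
emb_x = raw_x @ rng.normal(size=(20, 16))

raw_acc = linear_probe_accuracy(raw_x, y)
emb_acc = linear_probe_accuracy(emb_x, y)
print(f"raw features: {raw_acc:.3f}  embeddings: {emb_acc:.3f}")
```

With real data, `raw_x` would be whatever input features the dataset provides and `emb_x` the frozen encoder outputs. Keeping the classifier, folds and seed identical for both is what makes the gap (or lack of one) meaningful.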
Does anybody know why that is? Like, what's the point of the whole research field if the computationally costly training of a DL model yields such negligible benefits? What am I missing?