#Why are SSL GRL baselines already so strong?


shy gull

hey guys,

I'm currently reading a lot about self-supervised graph representation learning.
Authors often evaluate how expressive the representations learned by their methods are (e.g. representations of nodes in a graph) by training a linear SVM on them and reporting the classification accuracy.

For example, the much-cited paper that introduced InfoGraph reports a downstream classification accuracy of 49.69% on the IMDB-M dataset.

I replicated their evaluation setup but used the raw dataset features without any pretraining and got an accuracy of 50.2%, which is slightly higher than what they report.
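For reference, here is a minimal sketch of the kind of raw-feature baseline I mean. It assumes scikit-learn, 10-fold stratified cross-validation, and a fixed `C=1.0`. None of those are taken from the paper's code. The helper name `linear_svm_cv_accuracy` is made up for illustration, and the data is a synthetic stand-in, not IMDB-M.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def linear_svm_cv_accuracy(X, y, n_splits=10, seed=0):
    """Mean and std of k-fold accuracy for a linear SVM trained on X.

    Hypothetical helper: X can be raw features or pretrained embeddings,
    so both can be scored with exactly the same protocol.
    """
    # Scaling sits inside the pipeline so it is fit on the training folds only.
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return float(scores.mean()), float(scores.std())


# Synthetic stand-in for a raw feature matrix (assumption: NOT a real graph dataset).
X_raw, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_classes=3,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

mean_acc, std_acc = linear_svm_cv_accuracy(X_raw, y)
print(f"raw features: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Passing the pretrained embeddings through the same function instead of `X_raw` gives a like-for-like comparison, since the classifier, the folds and the seed stay identical.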

I observed the same behavior on several datasets, like MUTAG, Cora and Amazon Photo, with other state-of-the-art SSL methods: the pretrained representations usually outperform the raw data by only a small margin, if at all.

Does anybody know why that is? Like, what's the point of the whole research field if computationally costly training of a DL model yields such negligible benefits? What am I missing?

shy gull

it seems here that using the original data for the downstream task yields a similar result
then why pretrain at all?