# Conflicting class labels with Naive Bayes text classification
I found these rows where the abstracts are the same but the labels are different (B vs E)
So I should replace the B with E, right? 'Cause 3 entries are E and only 1 is B
@coral vale
I'm not sure if I get the second image. It looks like you performed a pandas merge, but I'm kinda lost.
Regardless, the correct approach would be to investigate the reason for this inconsistency in the dataset. Is it possible it was annotated automatically and the system assigned both classes? If you read the abstract, can you identify the "correct" class? Like I've said before, some datasets have multiple annotators and therefore conflicting entries. But assuming that's the reason and just going with the majority can cause you problems in the long run
you gotta do a bit of detective work. Tbh, anything I advise you to do without knowing why you have conflicting entries in the dataset would be irresponsible
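If it helps with the detective work, here's a minimal sketch of how you could list every abstract that has more than one label, along with how many times each label shows up. The column names `abstract` and `label` and the toy rows are assumptions on my part, so swap in your own.

```python
import pandas as pd

# Toy data standing in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "abstract": ["a1", "a1", "a1", "a1", "a2", "a3", "a3"],
    "label":    ["E",  "E",  "E",  "B",  "B",  "E",  "E"],
})

# One row per abstract, one column per label, cells hold occurrence counts.
counts = pd.crosstab(df["abstract"], df["label"])

# Keep only abstracts that appear with more than one distinct label.
conflicting = counts[(counts > 0).sum(axis=1) > 1]
print(conflicting)

# Pull the full rows for those abstracts so they can be read by hand.
to_review = df[df["abstract"].isin(conflicting.index)]
print(to_review)
```

That only tells you where the conflicts are and how lopsided they are. Why they exist is still something you'd need to work out by reading the abstracts.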
Apologies for not explaining the second image well enough. What I did was number the rows with the same abstract into clusters (cluster_index), so all rows with the same cluster index have the same abstract. I then joined the table with itself to find the pairwise combinations where the cluster is the same but the class labels are different
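For anyone following along, the cluster-and-self-join step could look roughly like this. It's a sketch, not the exact code behind the screenshot: the column names `abstract` and `label` and the toy rows are assumed, and only `cluster_index` comes from the description above.

```python
import pandas as pd

# Toy data standing in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "abstract": ["a1", "a1", "a1", "a1", "a2", "a3", "a3"],
    "label":    ["E",  "E",  "E",  "B",  "B",  "E",  "E"],
})

# Give every distinct abstract an integer id, in order of first appearance.
df["cluster_index"] = df.groupby("abstract", sort=False).ngroup()

# Self-join on the cluster id; reset_index keeps the original row number
# as a column named "index" so each pair can be traced back to its rows.
rows = df.reset_index()
pairs = rows.merge(rows, on="cluster_index", suffixes=("_left", "_right"))

# Keep each unordered pair once (left row before right row) and only
# pairs whose labels disagree.
conflicts = pairs[
    (pairs["index_left"] < pairs["index_right"])
    & (pairs["label_left"] != pairs["label_right"])
]
print(conflicts[["cluster_index", "index_left", "label_left",
                 "index_right", "label_right"]])
```

The `index_left < index_right` filter is there so a pair isn't reported twice and a row isn't matched with itself.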