#Machine Learning - Dummy variable trap & feature selection


pseudo thunder
#

First, I know that to avoid the dummy variable trap I should drop one column from the dummy variables. But now I see some people saying you don't need to drop a column because sklearn will do it automatically.
I read some articles that said, "You don't have to do this because the sci-kit learn library automatically removes one of the variables for you in the following code."

# X is the training dataset and we are using scikit-learn
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# One-hot encode column 3; drop is left at its default here
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)

Other people said that you need to do it manually, and when I searched and looked at the OneHotEncoder parameters on the Sklearn website, I found that
drop : {‘first’, ‘if_binary’} or an array-like of shape (n_features,), default=None
so the default for drop is None, not 'first'.
So does sklearn actually take care of the dummy variable trap and remove a column, or should I do it manually? I'm confused.
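
For context on why the trap matters, here is a minimal sketch using plain NumPy and made-up category values (the city codes are hypothetical, not from the thread). It assumes a linear model with an intercept: when every dummy column is kept, the dummies sum to the intercept column, so the design matrix loses a rank; dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical toy data: one categorical feature with three levels.
categories = np.array(["NY", "CA", "FL", "CA", "NY", "FL"])
levels = np.unique(categories)  # sorted: CA, FL, NY

# One dummy column per level (nothing dropped).
dummies = (categories[:, None] == levels[None, :]).astype(float)

# Design matrix with an intercept column plus ALL dummy columns.
intercept = np.ones((len(categories), 1))
X_full = np.hstack([intercept, dummies])

# Same matrix with the first dummy column dropped.
X_dropped = np.hstack([intercept, dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_dropped = np.linalg.matrix_rank(X_dropped)

print(X_full.shape[1], rank_full)        # 4 columns, rank 3
print(X_dropped.shape[1], rank_dropped)  # 3 columns, rank 3
```

With the full set of dummies the coefficients are not uniquely determined, which is the problem the "drop one column" advice is meant to avoid.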

Second, I read an article that says, "Backward Elimination is irrelevant in Python, because the Scikit-Learn library automatically takes care of selecting the statistically significant features when training the model to make accurate predictions."
So again, should I do feature selection manually, or does sklearn take care of it? Idk, it doesn't make sense to me.

#

@topaz tusk

mystic trellis
#

r u ai developer?

pseudo thunder
topaz tusk
#

Also, what article says feature selection is not needed with sklearn tf

pseudo thunder
topaz tusk
#

Yeah and keep reading

pseudo thunder
pseudo thunder
topaz tusk
#

And what does None do

pseudo thunder
#

nothin'

topaz tusk
topaz tusk
pseudo thunder
topaz tusk
#

All you had to do is read the documentation lol. Could have also just tried running it and seeing what you got

pseudo thunder
topaz tusk
topaz tusk
pseudo thunder
pseudo thunder
#

well yh i tried it, no column has been dropped lol
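
A quick way to reproduce that check, as a sketch on a made-up single-column array (the values are hypothetical): with the default `drop=None` the encoder keeps one column per category, and a column is only removed when `drop='first'` is passed explicitly.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column with three distinct categories.
X_cat = np.array([["NY"], ["CA"], ["FL"], ["CA"]])

# Default: drop=None, so every category gets its own column.
enc_default = OneHotEncoder()
out_default = enc_default.fit_transform(X_cat).toarray()

# Explicit: drop='first' removes the first category (in sorted order).
enc_drop = OneHotEncoder(drop="first")
out_drop = enc_drop.fit_transform(X_cat).toarray()

print(out_default.shape)  # (4, 3)
print(out_drop.shape)     # (4, 2)
```

The same `drop="first"` argument can be passed to the `OneHotEncoder()` inside a `ColumnTransformer`.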

topaz tusk
pseudo thunder
#
(replying to topaz tusk: "There ya go lmaooo")

i thought like oh maybe there's some magic happening behind the scenes, so when this categorical data is fit to the model a column will be dropped 😂

topaz tusk
pseudo thunder
pseudo thunder
#

n now u said there r some models who can actually do it automatically so..

topaz tusk
#

They don't

#

It's still important to learn and implement feature selection yourself

pseudo thunder
topaz tusk
#

For the others, yes, 100%: you should always check your features and make your decisions based on domain knowledge, metadata, etc.
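
To illustrate the point that fitting a model does not select features for you, here is a rough sketch on synthetic data (the data and the choice of two features are assumptions for the example). A plain `LinearRegression` keeps a coefficient for every column; selection only happens when you add an explicit step such as `RFE`. Note that `RFE` ranks features by coefficient magnitude, which is a different criterion from p-value-based backward elimination.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: only columns 0 and 2 affect the target; the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Plain fit: the model keeps a coefficient for every column, noise included.
model = LinearRegression().fit(X, y)
print(model.coef_.shape)  # (5,)

# Explicit selection step: recursively drop the weakest feature until 2 remain.
selector = RFE(LinearRegression(), n_features_to_select=2).fit(X, y)
print(selector.support_)  # boolean mask of the kept columns

X_selected = selector.transform(X)
print(X_selected.shape)  # (200, 2)
```

Automated selection like this is a helper, not a replacement for checking the features against domain knowledge.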