First, I know that to avoid the dummy variable trap, I should drop one column from the dummy variables. However, some people say you don't need to drop a column yourself because scikit-learn will do it automatically.
I read some articles that said, "You don't have to do this because the sci-kit learn library automatically removes one of the variables for you in the following code."
# X is the training dataset; column index 3 holds the categorical feature
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode column 3 and pass the remaining columns through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)
Other people said that you need to do it manually. When I looked up the OneHotEncoder parameters in the scikit-learn documentation, I found this:
drop : {‘first’, ‘if_binary’} or an array-like of shape (n_features,), default=None
So the default for drop is None, not 'first'.
So, does scikit-learn actually take care of the dummy variable trap and remove a column, or should I do it manually? I'm confused.
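To make the question concrete, here is a minimal check on made-up data, as a sketch only. The array `X_cat` and the names `default_enc` and `dropped_enc` are my own, not from the articles. It compares how many columns OneHotEncoder produces with its default settings and with drop='first':

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column with three categories (made-up data)
X_cat = np.array([["A"], ["B"], ["C"], ["A"]])

# Default encoder: drop=None, so every category gets its own column
default_enc = OneHotEncoder()
default_out = default_enc.fit_transform(X_cat).toarray()

# Explicitly drop the first category to avoid perfectly collinear dummies
dropped_enc = OneHotEncoder(drop="first")
dropped_out = dropped_enc.fit_transform(X_cat).toarray()

print(default_out.shape)  # (4, 3): all three dummy columns kept
print(dropped_out.shape)  # (4, 2): one dummy column removed
```

With the default settings all three dummy columns are kept, so at least at the encoder level nothing is removed unless drop is set explicitly.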
Second, I read an article that says, "Backward Elimination is irrelevant in Python, because the Scikit-Learn library automatically takes care of selecting the statistically significant features when training the model to make accurate predictions."
So again, should I do feature selection manually, or does scikit-learn take care of it? It doesn't make sense to me.
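Here is a similar sketch for the second point, again on synthetic data that I made up (`X_demo`, `y_demo`, and the choice of `SelectKBest` with `f_regression` are my own assumptions, not what the article used). It fits a plain LinearRegression on three features, only one of which actually drives the target, and then runs an explicit selection step for comparison:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: only the first of three features influences the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

# Plain linear regression fits a coefficient for every input feature
model = LinearRegression().fit(X_demo, y_demo)
print(model.coef_.shape)  # (3,): no feature was removed during fitting

# An explicit selection step, here univariate F-test scoring, keeping one feature
selector = SelectKBest(score_func=f_regression, k=1).fit(X_demo, y_demo)
print(selector.get_support())  # boolean mask of the features that were kept
```

As far as I can tell from this, LinearRegression keeps a coefficient for every column I give it, and features are only removed when I add a selection step like the one above myself.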