#🚑┊nlp-with-disaster-tweets
I am currently working on the Kaggle NLP Twitter disaster project. I have hit rock bottom with my code and I am not sure what I am doing wrong. Please, I need help.
The error seems to be clear: "NameError: name 'preprocess' is not defined"
Thank you for this reply. I had to use finalprocessing(string) but it threw a 'RecursionError: maximum recursion depth exceeded'
Sorry, I meant I had used ''return lemmatizer(stopword(finalpreprocess(string)))'' and it threw the RecursionError
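A RecursionError like this usually means a function ends up calling itself with no stopping condition. Below is a minimal sketch, assuming (this is a guess, since the notebook itself is not shown) that the return line sat inside finalpreprocess itself. The helper bodies are simplified stand-ins, not the real NLTK-based ones:

```python
import re

STOPWORDS = {"the", "a", "is", "in", "on", "of"}  # tiny illustrative list


def preprocess(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def stopword(text):
    """Drop the words listed in STOPWORDS."""
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def lemmatizer(text):
    """Stand-in lemmatizer: only strips a trailing plural 's'."""
    return " ".join(
        w[:-1] if w.endswith("s") and len(w) > 3 else w for w in text.split()
    )


def broken_finalpreprocess(string):
    # Bug: the function calls itself, so it never reaches a result.
    return lemmatizer(stopword(broken_finalpreprocess(string)))


def finalpreprocess(string):
    # Fix: call the separately defined base step instead of itself.
    return lemmatizer(stopword(preprocess(string)))


print(finalpreprocess("The Fires in the Forest!"))
```

The same pattern also explains the earlier NameError: the innermost call has to name a function that is actually defined, and it must not be the wrapper itself.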
"keras_nlp does not have attribute version" - what is unclear in that message? Unlike most Python packages, you can't find out the version by using this line. You can either dig deeper to find out where this package stores its version, or simply delete that line, as it doesn't contain anything that is truly necessary for the script to run.
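If the version is still wanted, one option that does not depend on the package exposing a version attribute is the standard library's importlib.metadata. This is a small sketch: the distribution name "keras-nlp" is an assumption about how the package was installed, and installed_version is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "keras-nlp" is the assumed distribution name; prints None if it is not installed.
print("keras-nlp:", installed_version("keras-nlp"))
```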
since there is no longer free/cheap access to tweets, what do people usually use instead of them as a source of real-time text stream?
https://twitter.com/XDevelopers/status/1641222782594990080
Today we are launching our new Twitter API access tiers! We’re excited to share more details about our self-serve access. 🧵
import keras_nlp

# Load a DistilBERT model.
preset = "distil_bert_base_en_uncased"

# Use a shorter sequence length.
preprocessor = keras_nlp.models.DistilBertPreprocessor.from_preset(
    preset,
    sequence_length=160,
    name="preprocessor_4_tweets",
)

# Pretrained classifier.
classifier = keras_nlp.models.DistilBertClassifier.from_preset(
    preset,
    preprocessor=preprocessor,
    num_classes=2,
)
classifier.summary()
Can anyone help me with this error msg?
Hello, I am training a model with X_train.shape = (60, 40) and y_train.shape = (60,). My model code is as follows:
model = Sequential()
###first layer
model.add(Dense(32,input_shape=(60,)))
model.add(Activation('relu'))
###second layer
model.add(Dense(10))
model.add(Activation('relu'))
###third layer
model.add(Dense(10))
model.add(Activation('relu'))
###final layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))
but it is giving this error.
What should I do? Please help.
read the error. it's expecting a different number of inputs than you're giving it
you need to update the layer architecture to fix the bug
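To spell the shape problem out: with X_train.shape = (60, 40) there are 60 samples with 40 features each, and a Dense layer's input_shape describes a single sample, without the sample count. So the first layer would take input_shape=(40,) rather than (60,). Here is a NumPy-only sketch of what a dense layer does with those shapes; the random data and the dense helper are illustrative and are not Keras code:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 40))  # 60 samples, 40 features each


def dense(x, n_inputs, units):
    """Toy dense layer: x @ W + b, with W of shape (n_inputs, units)."""
    W = rng.normal(size=(n_inputs, units))
    b = np.zeros(units)
    return x @ W + b


# Correct: the layer is sized by the number of features per sample.
out = dense(X_train, n_inputs=X_train.shape[1], units=32)
print(out.shape)  # (60, 32)

# Wrong: sizing the layer by the number of samples cannot be multiplied.
try:
    dense(X_train, n_inputs=X_train.shape[0], units=32)
except ValueError as err:
    print("shape mismatch:", err)
```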
I am extracting features from audio signals and now want to compare them with the real voice input. But I am unable to figure out how to do it. If anyone can guide me for this I will be really grateful.
issue resolved
Can somebody guide me with this?
Voice sounds unrelated to disaster tweets, which is what this channel is for
Hi, I want to join this competition
Hello! I was wondering, in the getting started notebook for this competition, what are the three numbers within the array when finding your score?
It's the individual score for each cross-validation run. Since you chose cv = 3, it returns 3 outputs.
Why does this only show 5 of the predicted data points? How do we see the submission file? I can’t find the viewer it mentions.
The function "head" returns the first 5 rows of a DataFrame by default; that is why it only shows the first 5.
You are currently looking at the code editor. Once you click "save version" in the top right you can look at the notebook in the viewer and then under the files tab can submit the output to the competition.
Alternatively, you should see options on the right hand side of the screen in the editor to submit to the competition.
Thank you! What is clf? For example: scores = model_selection.cross_val_score(clf, train_vectors, train_ …)
Is it the model being used for the estimator? What does it stand for?
anyone interested in doing this project together
Yes, it's usually the instance of the model. It stands for something like 'CLassiFier' I guess
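To make both answers concrete: clf is just a variable holding a model object with fit and predict, and cross-validation with cv = 3 trains and scores it three times, once per held-out fold. Below is a hand-rolled sketch of that idea. MajorityClassifier and simple_cross_val_scores are hypothetical stand-ins written for this illustration, not scikit-learn's implementation, which splits folds differently and can report other metrics:

```python
from collections import Counter


class MajorityClassifier:
    """Toy model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def simple_cross_val_scores(clf, X, y, cv=3):
    """Return one accuracy score per fold, holding out every cv-th sample."""
    scores = []
    for fold in range(cv):
        test_idx = [i for i in range(len(X)) if i % cv == fold]
        train_idx = [i for i in range(len(X)) if i % cv != fold]
        clf.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = clf.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores


X = [[i] for i in range(9)]          # made-up features
y = [1, 1, 0, 1, 0, 0, 1, 1, 1]      # made-up labels
clf = MajorityClassifier()           # "clf" is the model instance
scores = simple_cross_val_scores(clf, X, y, cv=3)
print(len(scores), scores)           # three scores, one per fold
```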
Anyone interested to do this project together?
I am looking for a partner!
Me
anyone up for discussion? I am at EDA
Hey all, this is my first NLP project and I am very excited to join you! I have already spent a week on this competition and I'm trying to improve my score, so I'll be checking this constantly ☺️
Hi, I wanted some help with the DistilBERT model. I was referring to a notebook: https://www.kaggle.com/code/alexia/kerasnlp-starter-notebook-disaster-tweets/notebook#Load-a-DistilBERT-model-from-Keras-NLP. But I don't understand the reason for the parameters used in
preprocessor = keras_nlp.models.DistilBertPreprocessor.from_preset(preset,
    sequence_length=160,
    name="preprocessor_4_tweets")
Why did they take the sequence length as 160?
Do we have to figure that out ourselves, or is a reference given?
It is in the Load a DistilBERT model from Keras NLP section of the notebook
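The thread does not quote the notebook's reasoning beyond its comment "Use a shorter sequence length", so the following is only one way to sanity-check a value like 160. It is a rough sketch that measures how long the texts are in whitespace-separated words and picks a length that covers most of them. The sample tweets are made up, word counts are only a proxy for the subword tokens DistilBERT actually uses, and length_covering is a hypothetical helper:

```python
import math


def length_covering(texts, fraction=0.99):
    """Smallest word count that is >= the length of `fraction` of the texts."""
    lengths = sorted(len(t.split()) for t in texts)
    index = max(0, math.ceil(fraction * len(lengths)) - 1)
    return lengths[index]


sample_tweets = [
    "Flood warning issued for the river valley tonight",
    "What a lovely sunny day",
    "Power lines down after the storm, stay safe everyone",
]
print(length_covering(sample_tweets, fraction=0.99))  # 9
print(length_covering(sample_tweets, fraction=0.5))   # 8
```

Running the same idea with the model's own tokenizer on the real training tweets would show how much headroom a given sequence length leaves.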
anyone interested in learning nlp through competition? PyTorch person please
Hey, I'm a beginner in NLP. I wanted to ask whether I should go with an LSTM model, or are there any suggestions for alternative models?
Also, since I'm using PyTorch, I'd like to know some tokenization techniques, because torchtext doesn't seem stable. Any suggestions?
If anyone is interested in improving the accuracy, I got 0.82
Hi guys, when does the competition start and end?
It's always active
The getting started competitions don't have a start or end time; they are always open
And is there an evaluation?
FOR BEGINNERS:
https://www.kaggle.com/code/vishalyginny/natural-language-processing
Here is my code if anyone is interested for starting out this competition. My code is quite simple and easy to understand.
Hey vishaly, thanks for the code!! I went through it, but why are we imputing keyword with fatality? It has only 61 missing values, so isn't it better to drop them? Treating every missing keyword as fatality might cause a bias. I'm new to this so I'm not sure, but I would love to discuss it.
Hey! Just like you said, it only has 61 missing values. We can't always drop columns; if the number of missing values isn't large, it's better to impute them with the mean or mode. This way, if the column is relevant to the target column, its values are still available to the model.
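For a text column like keyword there is no mean, so the mode (the most frequent value) is the option that applies. Here is a minimal sketch with made-up values, where None marks a missing keyword and impute_with_mode is a hypothetical helper:

```python
from collections import Counter


def impute_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


keywords = ["fatalities", "flood", None, "fatalities", "wildfire", None]
print(impute_with_mode(keywords))
# ['fatalities', 'flood', 'fatalities', 'fatalities', 'wildfire', 'fatalities']
```

The bias concern raised above is real: every missing row takes the same label. A separate placeholder value such as "missing" is a common alternative that keeps those rows distinguishable.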
Oh okay vishaly!! Will keep this in mind!!
Hey guys, is there any YouTube video you recommend about this competition? I'm really struggling with this one
Hello guys, I have just started this competition. Does anyone want to collaborate?
Are you still interested?
Link to the competition?
Hi Amit, I am interested in a collaboration. But I am kind of a novice.
Hi everyone. I'm looking for suggestions on tackling this problem. I have about 100,000 unlabeled job descriptions that I'm trying to use to determine the job category. For example, from a job description text I want to know if it's in IT, Software, Admin/Clerical etc. I tried using pretrained models from Hugging Face Transformers but they didn't work well. I have thought about labeling the data, but that would take a long time for 100,000 descriptions.
Has anybody tried to include the "keyword" and "location" columns into your model? All the notebooks I looked at so far didn't include these columns. Anyway, if you did include them, how did you encode them? The "keyword" column has ~222 unique values and the "location" column has ~3341 unique values. I don't think one-hot-encoding makes sense in this case. Any thoughts?
@tawny shore please don't share irrelevant content
you can try target encoding
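Target encoding replaces each category with the average target value seen for it in the training data, so a column with hundreds of distinct values becomes a single numeric feature. Below is a minimal sketch with smoothing toward the global mean. The data is made up, the smoothing weight is an arbitrary choice, and target_encode is a hypothetical helper:

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=2.0):
    """Map each category to a smoothed mean of its target values."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, t in zip(categories, targets):
        sums[cat] += t
        counts[cat] += 1
    # Rare categories are pulled toward the global mean by the smoothing term.
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


keywords = ["flood", "flood", "flood", "party", "party", "wildfire"]
targets = [1, 1, 0, 0, 0, 1]
encoding = target_encode(keywords, targets)
print(encoding)
```

In practice the encoding should be computed on the training folds only and then applied to the held-out data, otherwise the target leaks into the feature.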
is there anyone doing this project right now? I wanna join.
Hii