#data-science-and-ml

exotic maple
#

Thanks @grave frost you are awesome

grave frost
exotic maple
#

I dread the struggle, but it's 100x the learning experience of anything else

#

Bonus if I can get her or the CEO to vouch for my implementation on my resume lol

grave frost
#

personal experience - the grind is usually only for a few days

grave frost
exotic maple
#

No data pipelines and stuff? I doubt I can just toss raw CVs there lol

grave frost
#

you're pretty lucky to have such an opportunity - make the most of it

grave frost
exotic maple
grave frost
#

all in all, a simple text classification takes 60 lines

#

I doubt NER would push it any more

#

^ including data preprocessing BTW

exotic maple
#

Some simple lemmatization and tokenizing wouldn't be much code yeah

#

I'll look into it once we formally define the project. Thanks a lot man 😊

serene scaffold
#

ner
that is a thing that I know about
what is happen

grave frost
#

what is happen?

candid basin
#

so wait..
is the hidden layer in a neural network just taking the values from the input layer, and repeating the same sigmoid process?
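
Roughly, yes: each hidden unit takes a weighted sum of the previous layer's values plus a bias and passes it through the activation, and the output layer then does the same thing with the hidden values. A minimal NumPy sketch, where the layer sizes and the random weights are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])      # input layer: 3 values (illustrative)

# Each layer has its own weights and biases (random here, learned in practice).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden (4 units)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden -> output (1 unit)

hidden = sigmoid(W1 @ x + b1)       # same "weighted sum, then sigmoid" step
output = sigmoid(W2 @ hidden + b2)  # repeated on the hidden layer's values
print(hidden, output)
```

The process is the same at every layer; what differs is that each layer has its own weight matrix.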

worn tusk
#

Anyone good with NLTK?

gray arch
worn tusk
#

I was actually going to ask a question about it lol

gray arch
#

Oh okay lol

grave frost
worn tusk
#

Well I posted it in a help channel but:

I've got a question - I'm trying to evaluate some sets of ngrams using a training set and a test set from a corpus and write some statistics about it. I have to include two measures: accuracy (which is easy using the ".evaluate" method in NLTK), but I also have to find the "words/error" rate. So for example I'm:

using the Brown corpus' subcorpus "news"
this subcorpus has 100,554 words in total
I've split the corpus into a training set of 500 sentences, and the rest are the testing set (the subcorpus has 4,623 sentences in all).
I have a partially filled chart with an example:
The default ngram tagger has an accuracy of 30.41% (when comparing the training and testing sets) and an error rate of 1.4 words/error
But I can't figure out how this 1.4 words/error was calculated. Anyone have any ideas? I think, and I may be wrong here, that it's calculated as a function of the rest of the corpus that is NOT accurate - that is, 100% - 30.41% = 69.59% of the corpus. That number is then related to the total number of words somehow I think?
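
One reading that reproduces the chart's number, assuming "words/error" means the average number of tagged words per tagging mistake: it is just the reciprocal of the error fraction, so the total word count cancels out. This is a guess at the chart's definition, not something confirmed by the assignment.

```python
accuracy = 0.3041                     # from the chart

error_fraction = 1.0 - accuracy       # share of words tagged wrongly
words_per_error = 1.0 / error_fraction

print(f"error fraction: {error_fraction:.2%}")
print(f"words per error: {words_per_error:.1f}")
```

With N words and N * (1 - accuracy) errors, words per error is N / (N * (1 - accuracy)) = 1 / (1 - accuracy), which rounds to 1.4 here.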

worn tusk
#

...yeah, I have no idea either...lol...

exotic maple
#

@grave frost can I pm you man?

cinder crow
#

What is the best way to convert an excel .xlsx file to csv? Or alternatively, how to properly read in an excel .xlsx file into a python program so that you can parse it and write what you want to a separate csv?

exotic maple
#

and you can export csv with pandas as well

cinder crow
#

So, is there a way to have pandas library work with a script that I run, or do I need to interactively be in a pandas notebook and run the commands?

exotic maple
#

notebook is just convenient for testing

cinder crow
#

Do you happen to have a good link to something explaining this that worked for you? I tried following a couple links I found but was getting errors. These posters were using a combination of openpyxl and pandas tho, so not sure if that had something to do with it.

exotic maple
#

honestly it's difficult without actually having the files

#

pandas has a native excel reader

#

pd.read_excel

cinder crow
#

TY

#

I shoulda started here lol

austere swift
#

you can do it in one line too

#

pandas.read_excel("filename.xlsx").to_csv("out.csv")

cinder crow
#

Since this file has multiple sheets, would I just do pandas.read_excel("filename.xlsx", sheet_name=0).to_csv("out.csv") if I wanted first sheet, or is that inferred if I don't specify sheet?
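
For what it's worth, `sheet_name` defaults to `0` in `pandas.read_excel`, so the first sheet is what you get when it is left out; passing it explicitly just makes the intent obvious. A small sketch with placeholder file names (the helper name `sheet_to_csv` is made up, and reading .xlsx assumes openpyxl is installed):

```python
import pandas as pd

def sheet_to_csv(xlsx_path, csv_path, sheet=0):
    """Read one sheet (by position or by name) and write it out as CSV."""
    df = pd.read_excel(xlsx_path, sheet_name=sheet)
    df.to_csv(csv_path, index=False)   # index=False drops the row-number column
    return df

# Example calls (file and sheet names are placeholders):
# sheet_to_csv("filename.xlsx", "out.csv")              # first sheet
# sheet_to_csv("filename.xlsx", "out.csv", "Sheet2")    # a sheet picked by name
```

Passing `sheet_name=None` instead returns a dict of DataFrames, one per sheet, which is handy when every sheet needs its own CSV.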

mortal trout
#

hi can someone tell me after training cnn model how to predict for custom input images

#

im using tensorflow

cinder crow
#

Hmm, so I'm getting an error with several lines in the traceback showing directories of python packages and at the end it says TypeError: expected string or bytes-like object

exotic maple
#

or something

cinder crow
#

Ohh. Do I need xlrd or openpyxl installed or should this work with just pandas?

exotic maple
#

xlrd is usually needed

#

as the backend

#

pandas abstracts its usage if im not mistaken

dapper halo
#

So I honestly have no freakin clue on how to pick an adequate model. Any resources anyone would recommend?

exotic maple
#

your model depends on your problem and your data

cinder crow
#

I installed openpyxl as well as xlrd after running with just xlrd returned a message saying I needed openpyxl for .xlsx files

dapper halo
#

Tbh not sure a good general way to describe it. I’ve got a bunch of simulation data that changes four different initializing parameters for each generation of a grid. I want to train a model to predict two of the input parameters based on the other two input parameters and the output parameter of the simulation.

Two relations appear to be somewhat linear. Maybe another is a broken or segmented power law, no clue the shape of the final parameter.

#

I also know that is a piss poor explanation too

dapper halo
exotic maple
dapper halo
mortal trout
lapis sequoia
#

Hey 👋 I’m coming from web dev background how to get started in machine learning and ai??

gray arch
lapis sequoia
#

Okay can you list the math needed?

iron basalt
#

It's in the pins.

lapis sequoia
#

Where though

#

Seen ...

#

Thanks

#

Similar to what I found just wanted to confirm... and so my journey begins ...

marsh gale
#

Hey all!
I'm very much an AI beginner (if even) and I had a small project in mind for a browser game.
Basically I want to teach an AI to generate a "good Unit composition"
But I'm wondering what the cost function would look like.

A battle works as follows: both opponents start with a unit composition (e.g.: Player1: 10 light fighters, 5 heavy fighters, 2 battleships & Player2: 15 light fighters, 2 heavy fighters, 2 battleships). Some units are better against certain units than others, and some units cost more than others.

So.. after the battle you can simply sum up the units that u lost, and that is an indicator of how well your "team" performed.
Moreover you can sum the "damage" that u dealt, which is another indicator.

But how do I get value out of this?

I imagine my AI to gimme some Unit composition.
Then I calculate the battle.
And then feed it with what? I mean I can't tell what's "the best" Unit composition

pulsar badger
#

I have a question, where and how can I learn AI for a beginner?

lapis sequoia
#

What's the most respected data science coursera course

grave frost
uncut barn
#

is this the right channel to ask stats question?

tidal bough
lean ledge
#

Do you mean reward function? Value function has a specific meaning

tidal bough
#

Oh yeah, I actually meant utility/reward.

grave frost
#

I read a research paper where they split the dataset into 2 parts to test out different types of model philosophies and in their testing, they proved their point via train and validation accuracy. Do you think I should consider their results to be genuine, even though their testing methods seem sketchy? I would have expected some amount of CV at least in such a paper.

marsh gale
tidal bough
#

Hmm, how are you going to generate win-loss data?

#

If you have a way to determine how likely a configuration is to win, then you basically already have the important half of a model to generate best configurations, don't you?

marsh gale
#

Yeah I think I just realised that AI likely won’t solve my problem.. lol.

Well I can just generate two random unit compositions (of equal „price“) let them battle a hundred times, avg the data and there I am.

My problem is, that idk what’s „the best“ unit composition against something.
So I don’t know how to train the AI

I want to input one unit composition and the AI should tell me what it would use to fight it, but since I can’t train it that’s probably not going to work by just telling it „this wins against that“ right?

Sorry for the bad language only typing this on my phone rn

lean ledge
#

2 splits end up okay

shadow quiver
#

I have just a list of 43 texts. I have encoded them via the bert-base model from Hugging Face, limited to 512 tokens.

tokenized = df.text.apply(lambda text: tokenizer.tokenize(text)).to_list()
inputs = tokenizer(tokenized, is_split_into_words=True, truncation=True, max_length=512, padding='max_length', return_tensors='pt')
outputs = model(**inputs)  # this line fills my ram

I have 16 GB of RAM and another 16 GB of swap. When the third line runs, RAM usage goes beyond my machine's limits. I have tried many different things but none of them helped.

What could be done? My texts have long sentences, and some of them may contain no sentences at all; could this be the problem? I have limited them to 512 tokens though.
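
A guess, since the full script isn't shown: running all 43 x 512 tokens through the model in one call, with gradient tracking on, keeps every intermediate activation in memory. A sketch of the usual workaround, feeding small batches under `torch.no_grad()`. The helper name `encode_in_batches` and the choice to keep only the first token's vector per text are assumptions for illustration, not something from the original code:

```python
import torch

def encode_in_batches(model, inputs, batch_size=4):
    """Run a model over tokenizer output a few rows at a time, without autograd."""
    model.eval()                                   # switch off dropout etc.
    n_rows = inputs["input_ids"].shape[0]
    pooled = []
    with torch.no_grad():                          # do not store activations for backprop
        for start in range(0, n_rows, batch_size):
            batch = {k: v[start:start + batch_size] for k, v in inputs.items()}
            out = model(**batch)
            # Assumption: keep only the first token's vector for each text.
            pooled.append(out.last_hidden_state[:, 0].cpu())
    return torch.cat(pooled)

# embeddings = encode_in_batches(model, inputs, batch_size=4)
```

If memory is still tight, a smaller `batch_size` or a shorter `max_length` are the next knobs to try.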

tidal bough
shadow quiver
brave owl
#

The Problem is about Logistic Regression

grave frost
brave owl
#

Hello @grave frost

#

can you hop on #help-cheese if you know about Logistic Regression

grave frost
brave owl
#

Okay, thanks for the recommendation

agile sky
#

hey guys

#

i am implementing siamese network

#

and i am trying to improve the accuracy of my model

#

currently i am using single-channel(b&w) images

#

of size (32*32)

#

and my training is using the Adam optimizer with learning rate = 0.001

#

can someone help me try improving the model

#

i can use transforms and optimization using optim

#

but i dont know what or how to use them

molten hamlet
#

hey, does anyone know where I could find some English texts to classify?

#

i want longer than 1 page.

tidal bough
#

you mean a dataset, or just some texts without labels?

molten hamlet
#

does not matter, I can make labels, I need just 4 or 5 texts long enough to build markov chains

#

checking kaggle now

lapis sequoia
#

Hi

molten hamlet
#

data set is ok, if it's big enough, I mean there are enough words and I can see where a sentence ends and starts

lapis sequoia
molten hamlet
#

WIKI, there is a huge amount of data

#

🤦‍♂️

tidal bough
#

because if you don't need labels, you could maybe get free articles from arxiv or something

#

yeah, wiki would work too

molten hamlet
# tidal bough because if you don't need labels, you could maybe get free articles from arxiv o...

These famous speeches lifted hearts in dark times, gave hope in despair, refined the characters of men, inspired brave feats & changed the course of history

weak marten
#

hi guys, i have a question that i hope someone can help me solve this problem.
I'm doing my capstone project, the project is about smart door lock using facial recognition. I have already built the API using Flask and OpenCV for capturing faces and training. After that, I don't know how to put the model to the Raspberry Pi for facial recognition. If you have any ideas, please let me know. Thanks for your help guys.

serene scaffold
weak marten
weak marten
#

Okay thanks

misty flint
#

any resources for chatbots? thats going to be our next project

serene scaffold
misty flint
#

not my idea but were doing it

grave frost
lapis sequoia
#

Hey anyone knows where I can find pandas.to_datetime sub-functions? like .month_name()?
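
Those methods live on `Timestamp` for single values and under the `.dt` accessor for a datetime Series, so the pandas docs for `Series.dt` are the place to browse them. A short sketch with made-up dates:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2021-01-15", "2021-03-02"]))

month_names = dates.dt.month_name()   # Series -> use the .dt accessor
day_names = dates.dt.day_name()
print(month_names.tolist(), day_names.tolist())

single = pd.to_datetime("2021-03-02")  # a single Timestamp has the methods directly
print(single.month_name())
```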

sonic raft
#

Can I use sigmoid as a normalization function if I only have one output activation? In theory it does the same as softmax in case of multiple output activation
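
A quick numerical check of that intuition: a sigmoid on one logit `z` gives the same probability as a two-class softmax over `[z, 0]`, whereas a softmax over a single logit always returns 1 and so carries no information. The value of `z` below is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(v):
    e = np.exp(v - np.max(v))   # subtract the max for numerical stability
    return e / e.sum()

z = 1.7
p_sigmoid = sigmoid(z)
p_softmax_two = softmax(np.array([z, 0.0]))[0]   # class "1" vs an implicit class "0"
p_softmax_one = softmax(np.array([z]))[0]        # degenerate: a single class

print(p_sigmoid, p_softmax_two, p_softmax_one)
```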

marsh gale
low lintel
#

Hey, I'm a PhD student and I've learnt Python basics few months ago and I'd like to develop deep learning models. From what I've read, Keras is easy to use but PyTorch can make more complex and flexible models. As I'm not really good at programming with Python (like creating classes), is it worth to use PyTorch or should I stick with Keras ? Thanks for the help

serene scaffold
#

I have an array of N strings, and I want an encoder that can take a string and return an array of all 1s and 0s where a 1 indicates that the string at the ith position in the original array is a substring of the string being encoded

#

I don't believe this is one hot encoding. I'm pretty sure that would return arrays of N^2 elements

#

What is this called?

grave frost
severe rover
serene scaffold
#

Suppose N = 3 and my tokens are "cat", "dog", and "mouse", and when I encode a string, I want an array of ['cat' in string, 'dog' in string, 'mouse' in string]
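
Taken literally, that encoder is a one-liner in plain Python. A sketch, with the function name `substring_indicator` made up for illustration:

```python
def substring_indicator(tokens, text):
    """Return 1/0 per token, depending on whether it occurs as a substring of text."""
    return [int(token in text) for token in tokens]

tokens = ["cat", "dog", "mouse"]
encoded = substring_indicator(tokens, "the cat chased the mouse")
print(encoded)
```

Note that this is a substring test, so "concatenate" would also count as containing "cat"; splitting the string into words first would avoid that.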

severe rover
#

so multilabel binarizer?

serene scaffold
#

So that's what that's called? Thanks!

severe rover
#

hopefully this is what you want

low lintel
serene scaffold
#

@severe rover

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit(['x', 'y', 'z'])
mlb.transform(['a', 'b', 'x', 'z', 'c'])

# output

array([[0, 0, 0],
       [0, 0, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 0, 0]])

The desired output is array([1, 0, 1])--is taking the columnwise sum the only way to get that?

#

The problem must be that it's treating every string as its own iterable
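
If that diagnosis is right, wrapping the labels in an outer list (one inner list per sample) should give a single row instead of a column-wise sum. A sketch that fixes the class order with the `classes` argument; labels outside that list are ignored with a warning, which is silenced here:

```python
import warnings
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=["x", "y", "z"])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # 'a', 'b', 'c' are unknown classes and get dropped
    encoded = mlb.fit_transform([["a", "b", "x", "z", "c"]])   # one sample = one inner list

print(encoded)
```

This matches labels exactly rather than by substring, so the input string has to be split into tokens first.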

#

("problem" as in "incongruity between my expectations and the spec")

#

This is great. Thanks!

vivid cape
#

hi, I want to work with the Instagram Graph API, can anyone help me?

serene scaffold
#

I don't know anything about instagramgraph, but it's not likely that anyone who does will volunteer themselves until they know what the question is.

full shore
#

hi all, I have a matrix that shows the rate of agreement between different participants in a survey based on how many times they voted the same in some questions. What would be a good algorithm to cluster them based on similarity?

pure quiver
#

Knn?

#

No I mean, I'm suggesting knn 😅

grave frost
grave frost
full shore
#

@pure quiver ty

cursive peak
#

hey what is best course for learn data science

#

actually need advance data science for compete in kaggle

hollow sentinel
#

if you have the math background for it

#

the Stanford course by Andrew Ng is good

cursive peak
#

have background

hollow sentinel
#

there is also a columbia coursera course

cursive peak
#

andrew is not advance i think

hollow sentinel
#

in Python

#

andrew is pretty good

cursive peak
#

sure andrew is legend

#

i mean advance for compete

#

i think andrew teach

#

how it works u understand

hollow sentinel
#

oh

#

well

#

idk

cursive peak
#

i saw ibm advance DS in coursera

#

but i think its beginner

tidal bough
# marsh gale Pretty much both yeah, there is a website where players upload their „combat log...

In this case, yeah, you can do a lot of nice things with this data, and especially the simulator. You can have a utility function of, say, average winrate against a representative sample of teams or (harder, and might not be a good idea if there's a counter to any team) minimaxed winrate: that is, make the utility function the lowest winrate among all possible enemy teams of the same cost, so that an optimal team would be one that doesn't lose too badly against any enemy team (even if that makes it mediocre against any team too).

After designing such a function, you can start with just a metaheuristic optimization algorithm like simulated annealing, rather than a neural network or something.

hollow sentinel
#

maybe you should try looking at some advanced books for ML

#

I don't know about advanced courses

grave frost
cursive peak
#

Yes i try this actually i can say i know in intermediate degree

#

Just i have lack is english but i try hard to this too

grave frost
#

you don't need to know advanced DS to win at kaggle

grave frost
cursive peak
#

Azerbaijani

cursive peak
#

Xgboost :D

grave frost
#

learning english is a pretty important skill. a lot of the stuff online is in english, so I recommend you also try to improve it little by little. google translate is always there

cursive peak
#

Hey i know english just u know

#

This is technical works

grave frost
cursive peak
#

For read papers need advance english technical based

cursive peak
#

Actually important things is data analysis

#

Or feature engineering

grave frost
#

every time I read some new paper, its like a bucket of fresh water on my head

cursive peak
#

What u think about it

grave frost
cursive peak
#

Yes it's most important thing

#

Other things is common

grave frost
#

feature engineering is easily the best way. model techniques come second

rancid zealot
#

where would I get help with a small cython problem? i'm so lost on this discord

grave frost
#

I find feature engineering interesting because a lot of it is common sense and thinking out-of-the-box

cursive peak
#

Ml models is easy for adjust hyperparameters

#

Actually i not like feature engineering

#

Now i m interesting much in DL

#

Cnns object detection

#

But i think i should do something in kaggle for show myself that i can do this

#

:)))

grave frost
#

kaggle is very hard

#

not to discourage you or anything, but you would have to put significant amount of effort to do anything

#

since there are a lot of people too, competition also increases. but it's fun too

cursive peak
#

Sure it's not deal to get 1 st in world

#

But not deal get last position in world

#

It's important just u know that u competitive and u can push limits u can make hard day for master's of kaggle jajaja

lean ledge
#

There are entire fields of ML that have been shown to be bogus because of hyperparameter tuning

cursive peak
#

Example which models

#

I think it's hard if i do really serious work

lean ledge
cursive peak
#

Sometimes u need 99.9 accuracy then it must be hard to tune model

#

I ll read this paper

lean ledge
cursive peak
#

This paper looks like high level :)

grave frost
lean ledge
#

Hm?

grave frost
# lean ledge Hm?

the one where a paper just used test set (no val) and no CV on a Low resource language haha

#

and their proposal was that CNN+embedding is always better than traditional ML methods

lean ledge
#

You sure it's training + test no val instead of training + val no test?

#

...Isn't CNN+embedded a traditional ML technique?

grave frost
grave frost
cursive peak
#

What is NB

grave frost
#

Naive Bayes

#

they also used stuff like Decision trees, SVM etc.

cursive peak
#

Ah i know but i know briefly

#

Not theory behind Bayes

#

Now you saying cnn is traditional ml technique?

grave frost
cursive peak
#

Ah yes i understand

#

How d are u brother?

grave frost
#

I really hate papers behind paywalls

#

42,99 € are you kidding me?!

cursive peak
#

Try sci hub

native vault
#

Hey guys, can you recommend a book to learn complex AI math? I want a book that'll help me understand the complex math used in AI (especially natural language processing) that is related to programming (python preferred)

cursive peak
#

Ah nlp i Really hate nlp :(

#

Lstm gru is really i afraid so much of rnn s

gray arch
#

I am working on a project of NLP with LSTM right now
My head is being burnt with encoder/decoder

cursive peak
#

Yes it's really hard

#

I really read watch many things about it about RNN when i went to drink tea and came back

#

I understand i didn't understand anything :?*

gray arch
cursive peak
#

Hahahaahha

#

Anyway i love CNN s hahah

gray arch
#

CNN is much better and much more popular too
I used VGG16 for captioning images
Kinda easier to understand

cursive peak
#

Yes sure i m not pro

#

But for me cnn is sooo much easy than rnn

gray arch
cursive peak
#

Yes maybe one day i ll learn those too

grave frost
#

I think I would try CNN + Embedding for classification

#

logically, it does seem like a great choice

velvet thorn
gray arch
# velvet thorn what do you want to understand

I am just working on a Kaggle project to make a customer service chatbot based on Twitter data using LSTM and NLP, read docs and do not understand much...
I will reach you if I have specific high-level questions

velvet thorn
#

I might not be around 😉

cursive peak
#

Because i know embedding only in nlp

grave frost
gray arch
velvet thorn
#

no, I mean, in the future

gray arch
#

I will remember that, thank you!

velvet thorn
#

🙂

cursive peak
#

If you have projects send links

#

I would like to look and learn smth

hollow sentinel
#

you can learn on Kaggle 🙂

#

there are a lot of projects on Kaggle

cursive peak
#

Hahha i know ;)))

hollow sentinel
#

also github

cursive peak
#

I mean learn from our friend's projects

#

Example from u hahah

grave frost
#

when you want to build something quick: our code

cursive peak
#

What

shut apex
#

Hi, I have a shp file with street names and info, and a csv file with 3 million addresses. I want to know the coordinates of those addresses. Does anyone know how I should start? I don't want to use the Google Maps API or the HERE API because they are too slow and have a monthly request limit

lapis sequoia
#

If anyone has experience with solving systems of ODEs using SciPy's solve_ivp, I would love to get some feedback on this project: https://github.com/wigging/bfb-gasifier. It takes about 16 minutes to run the dynamic model but I get overflow warnings while the solver is running (see below). See the README in the repo for usage instructions and a reference for the differential equations. Please let me know if you have suggestions to fix the warnings and if there is a way to speed up the solver for this particular problem. Thanks for the help.

dyn-bfbgasf/solid_phase.py:133: RuntimeWarning: overflow encountered in power
  + hps * (Tp - Ts)
dyn-bfbgasf/solid_phase.py:309: RuntimeWarning: overflow encountered in power
  - hps * (Tp - Ts)
dyn-bfbgasf/solid_phase.py:393: RuntimeWarning: overflow encountered in power
  qwr = np.pi * Dwi * epb / ((1 - ep) / (ep * epb) + (1 - ew) / ew + 1) * sc * (Tw**4 - Tp**4)
dyn-bfbgasf/gas_phase.py:230: RuntimeWarning: overflow encountered in multiply
  SmgV = SmgG + Smgs * (ug + v) - (Smgp - Smgg) * ug - SmgF

----------------------- Solver Info ------------------------

message The solver successfully reached the end of the integration interval.
success True
nfev    63758
njev    291
nlu     1188

----------------------- Results Info -----------------------

t0      0.0
tf      1000.0
len t   8707
y shape (1600, 8707)

elapsed time 16m 22s
thin prism
#

hello guys

#

I have left the question at the help-croissant channel but let me leave the question here as well.

so the data contains name age city country course.

If I use .drop column age, normalize the data with rest of the columns... how can I re add the column age back to the dataset?

shut apex
#

Do you have that column else where?

serene scaffold
#

you can save the column that you want to exclude from the operations you're going to do and join them back together later

#

there's probably a lot of ways you can do it.
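
One of those ways, sketched with made-up numeric columns (the column names and the min-max scaling are assumptions, since the real data wasn't shown): set the column aside, transform the rest, then join it back on the index.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [23, 35, 41],
    "height_cm": [160.0, 175.0, 182.0],
    "income": [30000.0, 52000.0, 61000.0],
})

age = df["age"]                          # keep the excluded column aside
features = df.drop(columns=["age"])      # everything that should be normalized

# Min-max normalization, column by column
normalized = (features - features.min()) / (features.max() - features.min())

result = normalized.join(age)            # rows line up because the index is unchanged
print(result)
```

Since `drop` returns a new DataFrame, the original `df` still has `age` in it too; the join only matters when the normalized frame is what gets used afterwards.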

thin prism
thin prism
#

The data I gave you guys are just examples.

lavish laurel
#

HEL

#

P

#

please

#

help

thin prism
#

okay...?

hollow sentinel
#

Yeah idk seems like they said it in multiple channels

thin prism
hollow sentinel
#

@thin prism no not you

thin prism
#

ohhh gotcha I was like confused with that person

#

haha

cinder crow
#

Was running a script using pandas fine and then started getting exceptions related to read_excel()

#

Has anyone had issues related to read_excel() for opening .xlsx files recently? Weirdest thing is it wasn't working yesterday, then it was for most of today. Have tried like uninstall/reinstalling openpyxl and xlrd bc it keeps asking me to.

warm wharf
#

hi i have a question about k means clustering

#

i have an assignment that wants me to implement k means based on the following strategy:
Strategy 2: pick the first center randomly; for the i-th center (i>1), choose a sample (among all possible samples) such that the average distance of this chosen one to all previous (i-1) centers is maximal.

#

i don't understand what this even means

#

so I pick the first point randomly from the data set and set it as the first centroid, then for the second one I find the point farthest from the first centroid, and for the third one I find the point whose average distance to the first two centroids is the largest?

#

is that correct?
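
That reading matches the strategy as quoted. A NumPy sketch of just the initialization step; the function name is made up, and skipping samples that are already centers is an added assumption the assignment text doesn't spell out:

```python
import numpy as np

def farthest_average_init(X, k, seed=None):
    """Pick k initial centers: first at random, then max average distance to earlier ones."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(X.shape[0]))]          # first center: a random sample
    for _ in range(1, k):
        centers = X[chosen]                           # shape (i-1, d)
        # Distance from every sample to every center chosen so far: shape (n, i-1)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        avg = dists.mean(axis=1)                      # average distance per sample
        avg[chosen] = -np.inf                         # assumption: never reuse a sample
        chosen.append(int(np.argmax(avg)))
    return X[chosen]

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
centers = farthest_average_init(X, k=3, seed=0)
print(centers)
```

After this, the usual k-means loop (assign points to the nearest center, recompute centers) runs unchanged.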

#

please tag me if you respond thanks :)

marsh gale
# tidal bough In this case, yeah, you can do a lot of nice things with this data, and especial...

Well.. the thing is.. I can simulate a battle. But it's really difficult to figure out what works BEST against a team.
It's fairly easy to see this Unit1 is good against Unit2
But there are ~10 different Units, and the composition/relative number of Units of each type in the comp makes it really hard to tell what it's struggling against the most.
So the only thing I have in mind to do that would be to brute force it.
100% Unit1 -> simulate, store data
99% Unit1 + 1% Unit2 -> simulate, if better result store the data.
etc. That's like.. 100^10 Simulations if I'm not mistaken; with ~1-3s per Simulation that would be 3.170.979.198.376+++ years.
But that would be a monstrous task.
Especially because that's only one "perfect Unit composition against one very specific Team"
Again: I'm by no means professional but that's what I came up with.
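
A side note on the count: 100^10 treats each unit's percentage as independent, but the shares have to add up to 100%, which cuts the number down a lot. A stars-and-bars sketch, assuming 10 unit types, 1% steps, and 1 second per simulation:

```python
import math

unit_types = 10
steps = 100                 # 1% granularity

# Ways to split 100 percentage points across 10 unit types (stars and bars)
compositions = math.comb(steps + unit_types - 1, unit_types - 1)
naive = 100 ** unit_types   # the 100^10 estimate, which ignores the sum constraint

seconds_per_sim = 1
years = compositions * seconds_per_sim / (3600 * 24 * 365)

print(f"{compositions:.3e} valid compositions vs {naive:.0e} naive")
print(f"about {years:,.0f} years at {seconds_per_sim} s per simulation")
```

Still far too many to brute force against even one enemy team, so the conclusion stands.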

€dit: Thats why I thought hey, maybe AI is the way to go.
Could I Input Data of "an enemy Team" (or Batches of it) (E.g.: Input Neuron1 = 3 light fighters) let the AI come up with a Team itself (Input Neurons == Output Neurons), then simulate a battle and teach it via the results of that battle?
E.g. Minimize the Amount of Ships that it lost?/Maximize the amount of Ships that the enemy (Input) lost? I'm scared that this would result in the AI just sending an "empty" team all the time because then the losses would be 0.

Again apologies for my bad language, kinda hard to get this difficult topic into proper words

tidal bough
#

100% Unit1 -> simulate, store data
99% Unit1 + 1% Unit2 -> simulate, if better result store the data.
That's brute force, but it's not the only way to find extrema of functions. Simulated annealing is more like a weird form of gradient descent - it calculates small changes from the current configuration, calculates their utility, and accepts or rejects them based on whether they are better or worse and on the current "temperature" (that's where the method's magic is).
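
A bare-bones sketch of that loop. Everything game-specific here is a stand-in: `toy_utility` pretends a known split of 10 ships is ideal (in practice it would call the battle simulator), and `move_one_ship` is just one possible way to make a small change.

```python
import math
import random

def simulated_annealing(initial, utility, neighbor, steps=5000,
                        t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current, current_u = initial, utility(initial)
    best, best_u = current, current_u
    for step in range(steps):
        # Temperature cools geometrically from t_start down to t_end
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = neighbor(current, rng)
        delta = utility(candidate) - current_u
        # Always accept improvements; accept worse moves with probability exp(delta / t)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_u = candidate, current_u + delta
            if current_u > best_u:
                best, best_u = current, current_u
    return best, best_u

TARGET = (6, 3, 1)   # hypothetical "ideal" team, only used by the toy utility

def toy_utility(team):
    return -sum((a - b) ** 2 for a, b in zip(team, TARGET))

def move_one_ship(team, rng):
    team = list(team)
    src = rng.choice([i for i, n in enumerate(team) if n > 0])
    dst = rng.choice([i for i in range(len(team)) if i != src])
    team[src] -= 1
    team[dst] += 1
    return tuple(team)

best, score = simulated_annealing((10, 0, 0), toy_utility, move_one_ship)
print(best, score)
```

Early on the high temperature lets it wander through worse teams; as it cools it behaves more and more like plain hill climbing.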

#

Could I Input Data of "an enemy Team" (or Batches of it) (E.g.: Input Neuron1 = 3 light fighters) let the AI come up with a Team itself (Input Neurons == Output Neurons), then simulate a battle and teach it via the results of that battle?
Yup, if you want to teach a model to generate teams to counter a specific team, that would probably require ML.

inland sky
#

hi all ^^, I would like to make a "simple" AI to act like a ant, but I don't know how to start and where to start

marsh gale
raven knoll
#

Hey I am in need of some advice. This is probably a simple question but I am doing text sentiment analysis and I need 3 different classifiers. I currently have SGDClassifier from linear_model and MultinomialNB from naive_bayes.

I tried KNN but my input variables become inconsistent and I ran SVM for like 2 hours but It's still loading. Anyone any advice about this topic?

summer pier
#

Hello, has anyone tried to make a poker AI with reinforcement learning? Would love some guidance

full shore
#

hi everyone. This probably is a really basic question, but I have 0 training in statistics or anything similar.

#

I have this matrix that shows the rate agreement of participants' ratings on images. So for each pair of participants, I have a number between 0 and 1. Can these numbers be used to somehow cluster participants that all tend to agree with each other? I have ~70 participants

#

this is the matrix

primal tulip
#

If you know the sum of each row and each column, you could know which is the most agreeable participant.

full shore
#

this matrix is symmetrical along the diagonal, so I guess either summing rows or columns would tell me that, right? Also, I think that is a good metric, thank you. But I'm also interested in clustering participants that voted similarly, because that may help me better detect commonalities on the stuff they voted on (images). I may be wrong but "agreeability" probably doesn't help me learn much about the images themselves?
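
Since the matrix is already pairwise and symmetric, one option is hierarchical clustering on `1 - agreement` as a distance. A SciPy sketch with a made-up 4 x 4 matrix; the average linkage and the choice of 2 clusters are assumptions to tune:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy agreement rates between 4 participants (symmetric, 1.0 on the diagonal)
agreement = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])

distance = 1.0 - agreement                # high agreement -> small distance
np.fill_diagonal(distance, 0.0)

condensed = squareform(distance, checks=False)     # the form linkage() expects
tree = linkage(condensed, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 groups
print(labels)
```

Plotting `scipy.cluster.hierarchy.dendrogram(tree)` is a quick way to see whether ~70 participants fall into obvious groups before committing to a cluster count.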

primal tulip
#

It won't help you directly but you'll understand better the relationship between participants. I can't think of a better approach so far, so sorry I can't be of much help, but try to sort the data in different ways to check for patterns. I would do what I said before as a first step.

full shore
#

yes, absolutely. And I hadn't thought of that at all. Thank you @primal tulip

marsh gale
summer pier
#

that's my uni topic, well to get the hang of it I could do blackjack and then holdem

grave frost
#

searching playing poker with ai might give some video with a github repo

silver oak
#

how to convert this to string

silver widget
#

It is already a string. You need to change it to integers or floats

#

Is it a categorical data?

silver oak
#

this the data

#

I want to make sure we can put in age and gender and then it will tell you the hobby

silver widget
#

OrdinalEncoder will work well for you.

silver oak
silver widget
#

pre = model.... line, you forgot to add ) at the end

silver oak
#

i add this

silver widget
#

Hi all. For the past few days I've been working on forage.com's ANZ data, task 2. My task is to predict the annual salary of the customers. The data includes current balance, amount withdrawn, age and date (there are other variables such as transaction location etc., which I don't think will be useful for prediction).
Anyways, the transaction data has categorical data such as 'credit card', 'salary/pay check'. So I gathered the monthly salaries of the customers. Then I got stuck here. For the annual salary, should I be using time series analysis? The task requires a simple regression model to predict annual salary.
After getting stuck, I looked at YouTube; the videos I found just used regression with x = amount, age and y = balance.
Am I doing sth wrong or is my approach good to go?

silver widget
# silver oak i add this

You need to change the genre data into numeric data first. Easiest way to do this is with OrdinalEncoder.

#

Of course ,guys with way too much experience and knowledge will answer more detailed than me. I'm just a learner too.

sudden panther
#

Hello everyone, I have a problem with Twitter sentiment analysis. I want to search for a hashtag in a specific location, but I still get an error. Can anyone help solve the error? Here are the code and the error. Thanks in advance🙏🏻

shut valve
#

['covid-19'] is a list

#

'covid-19' is a string / data[0] is a string for the string concatenation
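
The posted code isn't visible here, so this is only a sketch of the kind of mismatch being described, with a made-up variable name: concatenating a string with a list fails, while indexing into the list first works.

```python
hashtags = ["covid-19"]          # a list holding one string

try:
    query = "#" + hashtags       # str + list raises TypeError
except TypeError as err:
    print("TypeError:", err)

query = "#" + hashtags[0]        # take the string out of the list first
print(query)
```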

tidal bronze
#

import pandas as pd
import numpy as np

df_sample =\
pd.DataFrame([["day1","day2","day1","day2","day1","day2"],
              ["A","B","A","B","C","C"],
              [100,150,200,150,100],
              [120,160,100,180,110]] ).T  

df_sample.columns = ["day_no","class","score1","score2"]  
df_sample.index   = [11,12,13,14,15,16]  

agg = df_sample.groupby(["day_no","class"]).sum()

so for day 1 class B it currently displays no value how can I make it so it will fill in a 0 when no value is observed? Same for A on day 2
(e.g. I need a value for each possibilities no matter what)

#

This is currently what it outputs

serene scaffold
#

@static owl That's where we are

static owl
#

i-

#

my bad

serene scaffold
#

I actually did that a few days ago

#

here

static owl
#

haha i didn't realize i'd switched here lemon_sweat

#

gonna blame discord, it's totally not my fault

serene scaffold
tidal bronze
#

you can see the current output there, no value for ["day1", "B"] and ["day2", "A"] and I'd like to have 0s instead of nothing @serene scaffold

native patrol
serene scaffold
native patrol
#

TIL df.reindex has a fill_value param
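
A sketch of how that could apply to the groupby question above: build the full day x class index and reindex onto it with `fill_value=0`. The frame is rebuilt here with named columns, and the sixth score in each list is a placeholder since the original lists only had five values.

```python
import pandas as pd

df = pd.DataFrame({
    "day_no": ["day1", "day2", "day1", "day2", "day1", "day2"],
    "class":  ["A", "B", "A", "B", "C", "C"],
    "score1": [100, 150, 200, 150, 100, 130],   # last value is a placeholder
    "score2": [120, 160, 100, 180, 110, 140],   # last value is a placeholder
})

agg = df.groupby(["day_no", "class"]).sum()

# Every (day, class) combination, whether or not it was observed
full_index = pd.MultiIndex.from_product(
    [df["day_no"].unique(), df["class"].unique()],
    names=["day_no", "class"],
)
agg_full = agg.reindex(full_index, fill_value=0)
print(agg_full)
```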

shell summit
#

Is there any downside to using replit to run my AI and get it through the first couple generations?

grave frost
#

and replit is shit anyway

shell summit
#

It would be, I just wanna make sure it works

grave frost
#

just run it on your laptop then

shell summit
#

Don’t want to let it populate for a week only for it to break without noticing

#

Fair enough.

#

Thanks

grave frost
#

or use colab if you want to run for a few hours

shell summit
#

Colab?

grave frost
uncut barn
#

what plots are useful for time series data apart from linegraphs?

tidal bronze
uncut barn
tidal bronze
#

one boxplot for each time period if it's not too long

#

for example @uncut barn

#

otherwise heatmaps or auto-correlation plots come to mind

#

depends on what your goal and data are

uncut barn
#

@tidal bronze so this is my task and I just used a line graph

#

also I did a bar plot for this task, but I don't know if I should add any other visualisations that wouldn't be redundant

twin moth
#

Hey guys,
I was instructed to replace all outliers in my DF with np.NaNs.
Outliers should be numerical values which were greater than Q3 + 1.5 * IQR OR lesser than Q1 - 1.5 * IQR

For some reason it does not work as intended.

Can you spot any issues in my code?

def outlier_detection_iqr(df):
    n_df = df.select_dtypes(include=np.number)
    
    q1, q3 = n_df.quantile(numeric_only=True, q=[0.25, 0.75]).iloc
    iqr = q3 - q1
    
    n_df, iqr = n_df.align(iqr, axis=1, copy=False)
    
    return n_df.where((n_df > (q1 - iqr * 1.5)) & (n_df < (q3 + iqr * 1.5)), other=np.NaN)\
    .join(df.select_dtypes(exclude=np.number))\
    .reindex(columns=df.columns.tolist())
#

I should add that it does return a DF but it doesn't contain the expected NaNs, at least for some of the columns

twin moth
#

Works now, I forgot to change > and < to >= and <=
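For anyone landing here later, a more explicit sketch of the same idea, assuming values exactly on the fences should be kept (hence `ge` / `le`). `mask_iqr_outliers` is a made-up name, not the function above:

```python
import numpy as np
import pandas as pd

def mask_iqr_outliers(df):
    """Replace numeric values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with NaN."""
    numeric = df.select_dtypes(include=np.number)
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Compare column-wise; values on the fences count as inside.
    inside = numeric.ge(lower, axis=1) & numeric.le(upper, axis=1)
    out = df.copy()
    out[numeric.columns] = numeric.where(inside)  # outliers become NaN
    return out

example = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": list("vwxyz")})
cleaned = mask_iqr_outliers(example)
print(cleaned)
```

Non-numeric columns are copied through unchanged and the column order is preserved.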

grave frost
#

Does anyone have any specific architecture recommendations for classification on a vector as an input? CNN + MLP is the standard one, but there might be some that I have missed

serene scaffold
#

I've gotten as far as deciding on how to encode objects for two classes that I want to classify, but the representations end up being pretty sparse. Is there an algorithm that's known to be good at binary classification on sparse vectors?

I could provide more context about what I'm doing and how I decided to encode everything, but I kind of want to see how what I came up with turns out.

serene scaffold
mortal pendant
#

I've finally gotten back into AI and I'm wanting to try again at this project. I've managed to get the vanilla GPT2 model working in huggingface, but I'm unsure how to fine-tune it. Any ideas? #data-science-and-ml message

#

Currently, my dataset is just a bunch of text files for each user where every message is on a new line, with a few filters to avoid spam (I've also considered using list(set(...)) to remove duplicate messages from each user, but decided against it) https://paste.pythondiscord.com/gucurocago.py

fickle sinew
fickle sinew
mortal pendant
mortal pendant
fickle sinew
#

Efficiency shouldn't really matter
suggest you rethink that mindset, this would not fly in most professional settings.

mortal pendant
#

I am usually all about efficiency, but for a small hobbyist project, something that already only takes 2 seconds probably doesn't need to be any more efficient

fickle sinew
#

....you also posted it to a public chat and asked for feedback.....

#

sorry, i dont mean to be too harsh, i do get what you are saying, just realize where you're coming from

mortal pendant
#

I wasn't asking for feedback- I was just showing how I produce my dataset to make it clearer what I'm working with so my question about fine-tuning GPT2 could be answered 😅

young dock
#

So this might be a patzer question, but I just did some quantile regression, and I'm confused why each equation for each quantile has its own 95% confidence interval, I'm having trouble wrapping my head around this

mortal pendant
#

I appreciate feedback, but that isn't actually what I posted it for as the code I posted works fine

lapis sequoia
#

hmm

mortal pendant
heavy tundra
#

are there any packages that can transform your image dataset?

#

I have been using https://app.roboflow.com/ to add a grayscale and horizontal flip but there's a limit to how much data you can upload

#

is there something else that can create augmented images while keeping the labels for object detection

mortal pendant
heavy tundra
#

maybe

#

It needs to keep the labels is the main thing

#

or move the labels accordingly

fickle sinew
heavy tundra
#

yeah I guess so

fickle sinew
#

PIL can do most of that pretty easily

heavy tundra
#

with roboflow I could upload an image dataset and then generate "augmented" images that were resized, grayscaled, flip etc.

#

but also it needs to generate labels for the new images

bronze skiff
swift kettle
#

Folks, could I please ask you about the Python packages and libraries you are using for data science and ML? In our app (shameless promotion) Devbook (https://usedevbook.com/) we added support for NumPy and PyTorch docs and right now I'm working on Pandas, but we do not have that much data about what people really use and metrics like GitHub stars seem dangerous to rely on.

Well, maybe I should rephrase the question, being in the data science channel - how do you think I should estimate what packages and libraries should I add first?

bronze skiff
#

tbf, this product sounds cool, but ultimately doesn't give much lift if it's just python packages like numpy, pandas, etc

#

a lot of these libraries have actually fairly decent docs to begin with-- unless your product does more (isolated testing repls for tricky functions, maybe integration of blog resources)

#

for a lot of us, if it makes looking up pyspark docs, or pyarrow docs easier then this would be great

#

considering those projects have notoriously bad docs

swift kettle
#

Huh, that actually sounds more helpful than a list of the most used libraries, thank you 😄

bronze skiff
#

the stack overflow integration is 😘 tho

iron basalt
#

Something that you could have, which I don't see much of, is a concept graph. When you search for something it shows the concept you searched for as a node, and all the surrounding nodes are related concepts and prerequisite knowledge. Navigation can be done by clicking on nodes.

swift kettle
# bronze skiff the stack overflow integration is 😘 tho

Apart from the docs you mentioned and the SO :D, are there any other resources that you look up when you are working? Maybe some private solutions, infrastructure dashboards,...I've never done data science professionally, but is there anything to view/browse dataset repositories online?

iron basalt
#

The reason for this addition is that finding a specific function is not really the problem, it's understanding its context that lets you understand when to use it and how.

swift kettle
#

Huh, that makes sense - do you have this problem with specific libraries or languages, or do you think it is a general problem?

iron basalt
#

It's a problem for all libraries and programming projects. Many functions in a project are really just built on top of the core part of the project and tailored for specific tasks, understanding why those specific functions were added, what the specific use case was, and what it was built on is key. Following a graph backwards from the searched function to the core of the project will give a deeper understanding of it and an automatic understanding of other functions in the project (can already guess what they do).

#

Ideally one does not need to search anything, so getting that deeper understanding pays off in the long run by reducing lookup.

exotic maple
swift kettle
swift kettle
iron basalt
swift kettle
#

Huh, can you think of a reason why such graphs are not more widely used?

iron basalt
#

It takes effort to make. Most people (sadly) can't render a graph.

#

And in this case it's a simple static setup. Yours would need to dynamically add stuff.

#

Over time.

#

Since libraries change rapidly, some kind of web crawler or maybe even a code parser for GitHub would be ideal.

swift kettle
#

Yeah. I'm thinking that it would be a little easier with a typed language though. The graph of ALL the packages' interconnectivity would be something else.

#

It starts to morph into a new "IntelliSense" - you have a data/object and you can browse the graphs of all the transformations that all the packages for the language provide. Ideally you would then just select the final form.
Of course this is all like 100x harder than this.

iron basalt
#

It would probably involve a graph database and maybe even some ML.

#

Basically the stuff social media websites work on all day.

#

You might want to ask in #databases about that though.

swift kettle
#

All right, thanks. I will think about the concept first and if I decide to explore it more I will talk about it somewhere here.
Any cool project on your side, btw?

iron basalt
#

Right now i'm making a little game with Ursina. Not data-science related.

mortal pendant
#

It's also worth noting I'd like to live fine-tune it, so I can actively receive data to add to it, while still generating new data from the model

warm wharf
#

i have a question regarding calculating error functions for k means clustering, i am trying to implement this objective function and either i don't understand it completely or i'm implementing it wrong

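```py
errors = []
for i in range(2, 11):
    clf = K_Means()
    print("\n\nK = ", i)
    clf.fit(X, i)
    error = 0
    # clf.classifications already covers every cluster, so an extra
    # loop over range(i) would count each point i times.
    for classification in clf.classifications:
        for point in clf.classifications[classification]:
            # squared distance from the point to its own centroid
            error += np.linalg.norm(point - clf.centroids[classification])**2

    errors.append(error)
```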
so clf has centroid attribute that are the means in the picture above
and then it has classification which contains the x's
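For comparison, a self-contained sketch of the same objective (sum of squared distances from each point to its assigned centroid), assuming plain NumPy arrays instead of the K_Means class above. `kmeans_sse` and its arguments are made-up names:

```python
import numpy as np

def kmeans_sse(points, labels, centroids):
    """Sum over all points of ||x - mu_k||^2, where k is the point's cluster."""
    diffs = points - centroids[labels]  # each point minus its own centroid
    return float(np.sum(diffs ** 2))

points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])                    # cluster index per point
centroids = np.array([[0.0, 1.0], [10.0, 0.0]])
sse = kmeans_sse(points, labels, centroids)
print(sse)
```

Each point contributes exactly once, which is a quick sanity check against a hand-rolled loop.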
arctic wedgeBOT
warm wharf
grave frost
#

you could parallelize it though. one node to keep fine-tuning model and another to keep serving predictions. thus, the only delay would be the time to fine-tune (i.e about 15 mins or so)

mortal pendant
#

Huh, how not? So, for example, YouTube has to redo their model constantly for recommendations? Surely they'd just add to the model?

grave frost
mortal pendant
#

Then what do they do? Surely that's live fine-tuning?

grave frost
#

Google usually retrains its models every month. like if you own a google assistant, you might notice it having some new features every month

#

and your voice becomes easier for the device to understand since now it has more data

mortal pendant
#

But then, how come if you make a new account, watch a video then go back to the home screen, your recommendations are already similar to what you just watched? Surely by the logic of re-training every month, it would take a month before you'd find your recommendations being more like your tastes?

grave frost
#

no. we do not have the exact specifics of their models, but I would guess its mostly transfer learning fine-tuned for particular demographic categories that would be decided by another model. this is still a pretty naive approach (this being a guess) but theoretically it would do the trick

mortal pendant
#

Hm I suppose that makes sense

#

So how exactly would I go about this? Would I add data to a queue and once the current fine-tuning is done, it restarts the fine-tuning again with the data in the queue? Or would I start a new fine-tuning process with every new piece of data received?

grave frost
#

see my previous message ^ parallelization seems to be the best option IMO

mortal pendant
#

That's what I'm referring to- the node that adds data to the queue would be the same one generating new data from the model, waiting for the other node to finish fine-tuning, or telling the other node to fine-tune every time new data is received

exotic maple
mortal pendant
#

Either way, I still don't know how to fine-tune in the first place

balmy junco
#

Hey, does anyone know of a function in python for computing the linear transformation from one matrix to another?
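If "linear transformation" here means a matrix T such that A @ T is as close as possible to B, one option is `np.linalg.lstsq`. A minimal sketch under that assumption (the matrices are made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
T_true = np.array([[2.0, 0.0], [1.0, 1.0]])
B = A @ T_true  # B is A pushed through a known transformation

# Least-squares solution of A @ T = B; exact here because B was built from A.
T, residuals, rank, singular_values = np.linalg.lstsq(A, B, rcond=None)
print(T)
```

When no exact T exists, the same call returns the least-squares best fit instead.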

mortal pendant
shell summit
#

Just realized wrong channel, sorry!

serene scaffold
#
>>> b = MultiLabelBinarizer().fit([{'a', 'b'}, {'c', 'd'}])
>>> b.transform(['a', 'c', 'd'])
[[1, 0],
 [0, 1],
 [0, 1]]

This is not the actual output. This is the desired output. What's the right encoder to encode different objects the same way?
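One possible workaround, in case no built-in encoder does this directly: map every item to the index of its group, then one-hot that index. This is only a sketch with made-up names (`item_to_group`, `encode`), not an sklearn API:

```python
import numpy as np

groups = [{"a", "b"}, {"c", "d"}]

# Every item points at the index of the group it belongs to.
item_to_group = {item: idx for idx, group in enumerate(groups) for item in group}

def encode(items):
    """One-hot encode each item by its group, so 'a' and 'b' share a row pattern."""
    indices = [item_to_group[item] for item in items]
    return np.eye(len(groups), dtype=int)[indices]

encoded = encode(["a", "c", "d"])
print(encoded)
```

Unknown items raise a KeyError here, which may or may not be the behaviour you want.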

slate hollow
#

so i was wondering why for gradient descent

#

we decrease each weight

#

by its partial derivative * the learning rate

#

shouldn't we just add the learning rate if the pd (partial derivative) is positive, and subtract it if negative?

#

wouldn't that work as well?

#

(ping 2 reply thx)

sour beacon
#

I think this might go here idk really. how would I find the percentage of a number between 2 numbers? like the percentage of 10 from 5 to 20.

austere swift
#

this channel is for data science, scientific python, and AI

dapper halo
#

Dumb question here...is the target distribution here the same as a prior?

sharp prairie
#

Can someone please explain this to me?

#
df1 = df.melt('EST').dropna(subset=['value'])
d = {k: dict(zip(v['EST'], v['value'])) for k, v in df1.groupby('variable')}
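A small worked example may help, assuming a made-up frame with an 'EST' column plus two measurement columns. `melt` turns the wide table into long rows of (EST, variable, value), `dropna` removes rows with no value, and the dict comprehension builds one {EST: value} mapping per original column:

```python
import pandas as pd

df = pd.DataFrame({
    "EST": ["09:00", "10:00"],
    "temp": [20.5, None],   # missing reading at 10:00
    "wind": [3.0, 4.0],
})

# Long format: one row per (EST, column name, value); drop the missing ones.
df1 = df.melt('EST').dropna(subset=['value'])

# Group the long rows by the original column name and zip EST with value.
d = {k: dict(zip(v['EST'], v['value'])) for k, v in df1.groupby('variable')}
print(d)
```

So the result is a nested dictionary keyed first by column name and then by the EST value.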
analog cave
#

hi i was wondering if someone could please explain the open-cv distance measurement? i'm trying to measure the distance from the centroid of one's face to the camera in real-time, however i haven't found a way to calculate the distance correctly.. any ideas?

short heart
#

is it possible to upload lstm keras model into a file

grave frost
#

and can you elaborate with an example what you are trying to do?

ruby ermine
#

Are there any downsides to using ensure_ascii=False when dumping a dictionary to a JSON file?
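A quick sketch of what the flag changes, using made-up data. The main thing to watch, as far as I know, is that the file then contains non-ASCII characters, so it should be opened with an explicit encoding (`dump_readable` is a hypothetical helper name):

```python
import json

data = {"city": "São Paulo", "greeting": "こんにちは"}

escaped = json.dumps(data)                       # non-ASCII becomes \uXXXX escapes
readable = json.dumps(data, ensure_ascii=False)  # characters are written as-is

def dump_readable(obj, path):
    """Write JSON with real characters; the encoding is set explicitly."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(obj, fh, ensure_ascii=False, indent=2)

print(escaped)
print(readable)
```

Both strings load back to the same dictionary, so the choice mostly affects readability and which encodings can store the file.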

grave frost
#

ooh, did anyone see the updated colab with the status bar? it tracks all the function calls performed and the lines being executed. noice for a jupyter notebook

brave owl
#

Hello, I'm pretty new to Machine Learning and started following Andrew Ng's Machine Learning course on Coursera. As many of you may know, it is based on MATLAB and Octave, so I researched a bit and found it would be better to use NumPy instead of MATLAB. I started implementing the assignments with NumPy, but because of some datatype restrictions, my NumPy programs, which were written as almost direct translations of the MATLAB programs, weren't working right. I also searched about this on YouTube and found that the algorithms those guys were implementing were hella small, i.e. fewer lines of code and no complex maths, and still gave great accuracy for logistic regression. I'm in two minds about whether to keep following Andrew Ng's course. Can anyone help me with this?

solid quest
#

Hi, could any expert tell me, as a starting point, what I could expect to try doing with a clustering algorithm on a database that has these features?

primal tulip
brave owl
primal tulip
#

I wouldn't say outdated. Matlab, R are used a lot in the Data Science fields. You might encounter them more commonly at investigation and research or medicine related fields for example and less in banking fields which leans more towards python. But it's not at all outdated. I haven't used Octave so I can't say much about it.

brave owl
#

only because of the datatype restrictions, or are there other reasons too?

primal tulip
#

I have no idea without looking at the code. Either way, try to find the part that's being weird and share it. Even if I can't help you, there are bright heads here that could.

brave owl
#

okay

arctic wedgeBOT
lapis sequoia
#
crisp vapor
#

I am trying to learn, encoder decoder for NLP by training a model that can generate docstrings for small code snippets ( Java ).

The model has 93% accuracy, but it always predicts 0 ( padding token). With mask_zero = True

If you can make a good model or tell me what's wrong, it would be of great help.

Model link: https://filetransfer.io/data-package/XneDbGur#link

brave owl
serene scaffold
lapis sequoia
#
```py
import xlrd

excel_workbook = xlrd.open_workbook("data.xlsx")
excel_worksheet = excel_workbook.sheet_by_index(0)


#reading data

for row in range(excel_worksheet.nrows):
    for col in range(excel_worksheet.ncols):
        if col == 0 and row !=0:
            print(excel_worksheet.cell_value(row,col), end='')
        print('\t', end='')
    print()
```

it gives me an error saying

```py
Exception has occurred: XLRDError
Excel xlsx file; not supported
  File "D:\everything\Legacy\Game_currency_stats\code.py", line 5, in <module>
    excel_workbook = xlrd.open_workbook("data.xlsx")```
dense cosmos
#

Try openpyxl (XlsxWriter can only write files, not read them)

lapis sequoia
#

ok

#

thank u

lapis sequoia
dense cosmos
#

Both are libraries

lapis sequoia
#

ohk

dense cosmos
lapis sequoia
#

hmm

lapis sequoia
dense cosmos
#

In Excel

lapis sequoia
#

ok

#

ok wait let me try

lapis sequoia
#

right?

dense cosmos
#

Should work yes

lapis sequoia
#

just like tht?

#

@dense cosmos

dense cosmos
#

Did you save it via office or just rename the extension?

lapis sequoia
#

just rename

dense cosmos
#

Not sure if it's gonna work or not. You can try but office might need to convert some stuff depending on what's inside.

lapis sequoia
lapis sequoia
#

didnt work

#

:/

#

@dense cosmos

severe python
#

hi all, have a question. so i've created a script that prompts the user what column they would like to search in aka account ID, parent ID, etc. then it prompts what they are searching for. but, i've added a column that the data is all numerical, and it doesn't find them ex 188991, but it can find M188991. any idea how to fix this? coding in python using pandas, code is on a separate comp

sharp prairie
severe python
#
```py
while True:
    variable = input(f"{bcolors.WARNING}Search by Acronym / Parent / Alert ID / Account?    {bcolors.ENDC}")
    if variable == "Exit":
        sys.exit(0)

    if variable not in df.columns:
        print(f"{bcolors.FAIL}Error: Invalid Input{bcolors.ENDC}")
        continue

    if variable == "Acronym":
        while True:
            input1 = input("Please provide an Acronym:   ")
            result1 = df.loc[df[variable] == input1]
            if input1 == "Back":
                break
            if len(result1) == 0:
                print(f"{bcolors.FAIL}Acronym not found. Please try again{bcolors.ENDC}")
            else:
                print(tabulate(result1, headers='keys', tablefmt='psql'))
            continue
```
#

here is one part of the code, the rest is the same as if variable == "Acronym": just with Parent ID, Alert ID, etc @sharp prairie
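A likely cause, assuming the numeric column was read by pandas as integers: `input()` always returns a string, and `"188991"` never equals the integer `188991`, while `"M188991"` matches because that column holds strings. A minimal sketch of one way around it, comparing everything as text (`search_column` and the sample frame are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "Account": [188991, 200145],         # read as integers
    "Alert ID": ["M188991", "M200145"],  # read as strings
})

def search_column(frame, column, query):
    """Compare the column as text so numeric and string columns behave alike."""
    as_text = frame[column].astype(str).str.strip()
    return frame.loc[as_text == query.strip()]

print(search_column(df, "Account", "188991"))
```

Another option would be reading the file with `dtype=str` so every column is text from the start.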

lapis sequoia
#

is there any NN that given an image can output the object on the image on different views?

#

Like if it makes a 3D composition and returns u the object from different angles

severe python
#

@iron basalt @exotic maple

granite wolf
#

Anyone know how to search and replace within an entire data frame based on a different data frame?

#

I have one data frame full of email addresses and another data frame with a column for email addresses and a column for names, I’m wanting to search the first dataframe for any email addresses which match an email address in the second dataframe, and then replace that email with the persons name

deft ruin
#

You can use the apply method on the first df with a custom function that looks up each value in the second df and replaces it based on that

#

Alternatively, you could do it with merges
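A third option that may be simpler, assuming exact matches on the whole cell are enough: turn the second frame into an email-to-name dict and pass it to `DataFrame.replace`. The frames and column names below are made up:

```python
import pandas as pd

emails = pd.DataFrame({
    "to": ["a@x.com", "b@x.com"],
    "cc": ["c@x.com", "a@x.com"],
})
people = pd.DataFrame({
    "email": ["a@x.com", "b@x.com"],
    "name": ["Ann", "Bob"],
})

# Build a lookup of email -> name, then replace matching cells everywhere.
lookup = dict(zip(people["email"], people["name"]))
named = emails.replace(lookup)
print(named)
```

Addresses without an entry in the lookup (like c@x.com here) are left as they are.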

slate hollow
#

so i was wondering why for gradient descent
we decrease each weight
by its partial derivative * the learning rate
shouldn't we just add the learning rate if the pd (partial derivative) is positive, and subtract it if negative?
wouldn't that work as well?
(ping 2 reply thx)

tidal bough
#

(on the assumption that closer to the goal, the gradient is lower)

lapis sequoia
#

imagine u are close to the goal. If u move a lot, u may exit it. Is like if u never reach the goal cuz u move a lot

waxen girder
#

The gradient supplies the direction to the goal, the learning rate determines the step towards that goal.

lapis sequoia
#

imagine u are at point 1/2. And goal is 0. If ur steps are length 1, u will never reach 0

waxen girder
#

If your learning rate is too high you may either overshoot the goal and bounce back and forth across the minimum, or spiral out of control. If your learning rate is too small, you will move toward the minimum very slowly.

tidal bough
waxen girder
#

It's the magnitude of the gradient times the learning rate, I don't see what you are trying to say here?

tidal bough
#

I'm saying that they aren't asking why the learning rate is involved/what it does, they are asking why the magnitude of the gradient is also used, instead of only its direction.
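A toy sketch of the difference, assuming the loss f(w) = w² (so the gradient is 2w) and reusing the "start at 1/2, goal at 0" example from above. The function names are made up:

```python
def grad(w):
    return 2 * w  # derivative of f(w) = w**2

def descend_with_gradient(w, lr, steps):
    """Standard update: step size shrinks as the gradient shrinks."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def descend_with_sign(w, step, steps):
    """Fixed-size update: only the sign of the gradient is used."""
    for _ in range(steps):
        g = grad(w)
        if g > 0:
            w -= step
        elif g < 0:
            w += step
    return w

scaled = descend_with_gradient(0.5, lr=0.25, steps=20)
fixed = descend_with_sign(0.5, step=1.0, steps=20)
print(scaled, fixed)
```

With the gradient's magnitude, the steps shrink automatically near the minimum; with fixed steps of length 1, the weight just hops between 0.5 and -0.5. A sign-only rule can still work if the step size itself is decayed over time, but that is a different scheme from plain gradient descent.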

slate hollow
#

yeah

#

but like say hypothetically the gradient is low for a very long period of time

#
```
          ___----O
min___---
```
just hypothetically
#

low gradient, but it'd take a very long time

grave frost
#

well, network does not converge on my dataset, but an SKlearn algo works

#

assuming the dataset is linearly separable, why can't the model capture that relationship?

distant hedge
#
```py
import os
import pandas as pd
cwd = os.path.abspath('C:\\Temp\\Reports\\Combine') 
files = os.listdir(cwd)

df = pd.DataFrame()
for file in files:
    if file.endswith('.csv'):
        df = df.append(pd.read_csv(file), ignore_index=True) 
df.head() 
df.to_csv('C:/Temp/Reports/Combine/combined_file.csv')
```

Error: FileNotFoundError: [Errno 2] No such file or directory: 'members_LISTS_NO_CONTRACTS_SENT.csv'

```FileNotFoundError                         Traceback (most recent call last)
<ipython-input-3-59d50ed3d83a> in <module>
      3 for file in files:
      4     if file.endswith('.csv'):
----> 5         df = df.append(pd.read_excel(file), ignore_index=True)
      6 df.head()
      7 df.to_csv('C:/Temp/Reports/Combine/combined_file.csv')```

The file is in the directory as per cwd. I am not sure what I am doing wrong :/ Can someone help?
What I am trying to achieve: Combine all CSV files into one.
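A probable cause: `os.listdir` returns bare file names, so `pd.read_csv(file)` looks in the notebook's working directory rather than in the folder that was listed. A minimal sketch of a fix, joining the folder onto each name and using `pd.concat` (the function name `combine_csvs` is made up):

```python
import os
import pandas as pd

def combine_csvs(folder, output_name="combined_file.csv"):
    """Read every CSV in `folder` and write them out as one combined file."""
    frames = []
    for name in sorted(os.listdir(folder)):
        # Skip the output file so re-running does not duplicate rows.
        if name.endswith(".csv") and name != output_name:
            frames.append(pd.read_csv(os.path.join(folder, name)))
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(os.path.join(folder, output_name), index=False)
    return combined
```

Usage would be something like `combine_csvs(r"C:\Temp\Reports\Combine")`. Collecting the frames in a list and concatenating once also avoids relying on `DataFrame.append`, which newer pandas versions no longer provide.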
lapis sequoia
#

How to display graph on my website?

hollow sentinel
#

I was about to say

sonic raft
#

Hi! I have a question about Pytorch tensors.
So, Is there any difference between

   parameters.grad.zero_()

and

```py
p.data -= p.grad*lr
p.grad.zero_()
```
The params is a tuple containing two tensors: a params tensor and a bias tensor.
lapis sequoia
#

better? @exotic maple

exotic maple
#

no, worse.

#

What libraries are you using?
Is it an interactive chart, or just a Python-generated PNG?
Are you working on the front end or the back end?
etc

lapis sequoia
#

so

exotic maple
#

you can't just say "I want to put my line chart in my webpage, how to?" and expect people to know what you're talking about lol

lapis sequoia
#

flask. values will be added to graph every day

bronze skiff
#

so that you don't add to the computational graph

lapis sequoia
bronze skiff
#

for example, if you look at the source code for torch.optim.Adam they call p.data directly to detach it from the backprop graph

lapis sequoia
#

and matplotlib library for graphs

bronze skiff
exotic maple
#

but answering @lapis sequoia: sorry, I don't have experience with Flask

lapis sequoia
#

aight

grave frost
#

well, my network does not converge on my dataset, but an SKlearn algo works
assuming the dataset is linearly separable, why can't the model capture that relationship?

waxen girder
#

There's documentation on the matplotlib website that explains how to integrate matplotlib plots into your web application @lapis sequoia

#

Let me see if I can find it.

lapis sequoia
#

aight

waxen girder
#

Embedding in a web application server (Flask)

#

Look at that.

lapis sequoia
#

I will! Thanks

sour abyss
#

im trying out minimax, and this is what i have so far. been following a course on it, but i can't find what's causing this to go on forever. any ideas?

lapis sequoia
#

THANK YOU

#

i have been suffering with this for like 2weeks

#

< 3

wind bobcat
#

Can anyone recommend keyboards below 100 USD (tenkeyless is fine) for a programming beginner?

twin mantle
#

I have a question

#

What's the best way to improve OCR?

#

Resize, then preprocess or Preprocess, then resize?

severe python
#

@exotic maple can you check out my q above when you're free?

severe python
#

hi all, have a question. so i've created a script that prompts the user what column they would like to search in aka account ID, parent ID, etc. then it prompts what they are searching for. but, i've added a column that the data is all numerical, and it doesn't find them ex 188991, but it can find M188991. any idea how to fix this? coding in python using pandas, code is on a separate comp

```py
while True:
    variable = input(f"{bcolors.WARNING}Search by Acronym / Parent / Alert ID / Account?    {bcolors.ENDC}")
    if variable == "Exit":
        sys.exit(0)

    if variable not in df.columns:
        print(f"{bcolors.FAIL}Error: Invalid Input{bcolors.ENDC}")
        continue

    if variable == "Acronym":
        while True:
            input1 = input("Please provide an Acronym:   ")
            result1 = df.loc[df[variable] == input1]
            if input1 == "Back":
                break
            if len(result1) == 0:
                print(f"{bcolors.FAIL}Acronym not found. Please try again{bcolors.ENDC}")
            else:
                print(tabulate(result1, headers='keys', tablefmt='psql'))
            continue
```
slate hollow
#

maybe just

#

open

#

a help channel for this?

severe python
#

here is one part of the code, the rest is the same as
if variable == "Acronym":
just with Parent ID, Alert ID, etc
@exotic maple

slate hollow
#

oh wait nvm

twin mantle
exotic maple
#

aside from that, i cant think of anything else

turbid drift
#

Why do I get cv2.cv2 Even though I didn't even use it

#

Same on my notebook too

robust granite
#

any powerBI user?

exotic robin
#

Hi. Trying to use/debug an add-on used for open source Python based flashcard app Anki. Uses a python link to communicate with Spacy, python based open source NLP processor. the add-on ,Morphman, communicates with Spacy and gets POS tagging and dependency labels. Get this exception when I attempt to pass certain fields of a flashcard through the language processor through a "recalc", which computes this for all user-specified flashcards.

Asking here because it seems the project has been vacated and devs aren't actively supporting it.

Add-on
https://github.com/rteabeault/MorphMan/tree/rteabeault/spacy_support

Debug info:
Anki 2.1.35 (84dcaa86) Python 3.8.0 Qt 5.14.2 PyQt 5.14.2
Platform: Windows 10
Flags: frz=True ao=True sv=1
Add-ons, last update check: 2021-03-28 23:12:08

Caught exception:
Traceback (most recent call last):
  File "C:\Users\AppData\Roaming\Anki2\addons21\Morphman__init__.py", line 17, in onMorphManRecalc
    main.main()
  File "C:\Users\AppData\Roaming\Anki2\addons21\Morphman\morph\main.py", line 573, in main
    allDb = mkAllDb(cur)
  File "C:\Users\AppData\Roaming\Anki2\addons21\Morphman\morph\main.py", line 195, in mkAllDb
    ms = getMorphemes(morphemizer, fieldValue, ts)
  File "C:\Users\AppData\Roaming\Anki2\addons21\Morphman\morph\morphemes.py", line 166, in getMorphemes
    ms = morphemizer.getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\Morphman\morph\morphemizer.py", line 51, in getMorphemesFromExpr
    morphs = self._getMorphemesFromExpr(expression)
  File "C:\Users\AppData\Roaming\Anki2\addons21\Morphman\morph\deps\spacy\morphemizer.py", line 40, in _getMorphemesFromExpr
    self.proc.stdin.flush()
OSError: [Errno 22] Invalid argument

carmine iron
#

this seems rather simple but I believe I am getting an infinite loop
```py
nums = [1,0,4,10,14]
for i in range(len(nums)):
    if nums[i] == 0:
        nums.insert(i,99)
```

grave frost
abstract zealot
#

If i scale my features before sklearn logisticregression, i should get different confusion matrices for scaled and unscaled data and yet im not. Does anyone know why this might be?

exotic maple
#

I feel inclined to test it

carmine iron
#

well I'm just trying to add a certain number before each element in a list if the element meets a certain condition, like element == 0

exotic maple
#

AHHH

#

I see what happens

#

It doesn't ever delete the 0, and the loop's length is reset

#

interesting

#

Is there a replace method for lists?

#

ah yes, i forgot lmao

#

the recommended method is list comprehension duh

#

@carmine iron do this

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

exotic maple
#
lista2 = [n for n in nums if n != 0 else 99]
carmine iron
#

well I dont want to replace 0

#

just add a 99 before 0

exotic maple
#

oh

#

mmm

#

but that....why...you know what, nvm lol

#

I...dont see a good way to do that. Anytime you add an element to a Python list every successive element's index change, so you'll have to do massive iteration loops

#

for 1 element it wont be hard, but if you have large or dynamic lists...-shivers-

#

Maybe you can create a dummy index variable and use range dynamically with it? so instead of just range(len(list)) you can do range('dummy index here', len(list))

#

basically, skip all the previously seen elements

cyan shard
#

there is a problem with your syntax

#

[x if x != 0 else 99 for x in nums]

cyan shard
#

this leads to infinite loop

carmine iron
#

yeah i am not understanding why it creates an indefinite loop

cyan shard
#

because every time you add 99 length is increasing 1 and for loop still takes 0 on the 1 index

carmine iron
#

ah,

cyan shard
#

0 goes nowhere and for loop detects it and adds 99 infinitely

#

take a look at that

#

every time it adds 99 it goes right before 0

exotic maple
cyan shard
#

it looks so much like if else in other statements

carmine iron
#

is there a clean way to do this
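One fairly clean approach, sketched here with a made-up helper name, is to build a new list instead of inserting into the one being looped over, so the indexes never shift underneath the loop:

```python
def insert_before(values, target, marker):
    """Return a new list with `marker` placed before every `target`."""
    result = []
    for value in values:
        if value == target:
            result.append(marker)  # marker goes in first...
        result.append(value)       # ...then the original element
    return result

nums = [1, 0, 4, 10, 14]
with_markers = insert_before(nums, 0, 99)
print(with_markers)
```

The original list is left untouched, and it works the same way when there are several zeros.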

exotic maple
tidal bough
#

What's a nice way to get the first row of a DataFrame matching some predicate as a Series?

def get_first_row_cond(df:pd.DataFrame,predicate):
    rows = predicate(df.index)
    first = np.where(rows)[0][0] # np.where here will return a 1-element tuple, the first element of which are the indexes that matched the predicate
    return df.iloc[first]

first_row_with_arg = get_first_row_cond(weapons,lambda rows:rows.str.contains("arg")) # for example
#

looking for an easier solution than this

#

also interested in the specific case when the predicate is just "the index of the row (the dataframe uses a string index) contains some substring"
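For the substring case, one shorter sketch is to filter with a boolean mask and take `.iloc[0]`. The frame and the helper name `first_row_where` below are made up for illustration:

```python
import pandas as pd

weapons = pd.DataFrame(
    {"damage": [10, 15, 30, 5]},
    index=["sword", "bow", "argent blade", "cargo hook"],
)

def first_row_where(df, mask):
    """Return the first row selected by a boolean mask, as a Series."""
    matches = df.loc[mask]
    if matches.empty:
        raise LookupError("no row matches the predicate")
    return matches.iloc[0]

first_row_with_arg = first_row_where(weapons, weapons.index.str.contains("arg"))
print(first_row_with_arg)
```

This filters the whole frame before taking the first row, so for very large frames the `np.where` version above may still be the cheaper one.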

#

And another question: is there a way to hide some columns of a DataFrame so that they are accessible, but not shown by default when printing (well, or whatever the fancy output Jupyter shows dataframes is is called) the dataframe?

grave frost
#

Does anyone have any idea why a network cannot grasp a linearly separable dataset that Gaussian Naive Bayes can handle?

lapis sequoia
#

Anyone can help me

#

why i can't change ["Status"] HU for 1

#

using pandas

grave frost
#

you can debug that yourself. for example, what does tornados['Status'] return?

#

do you think your replace function applies to each element?

grave frost
#

Damn, chat's dead 😞

serene scaffold
ruby magnet
#

Hello everyone, currently trying to understand why this error keeps popping up. Any help?

AttributeError: 'int' object has no attribute 'lower'

The code is:
`import pandas as pd
from stop_words import get_stop_words
my_list=get_stop_words('english')

df=pd.read_csv("C:/Users/ymaxn/Documents/Python Data Mining/yelpreviews.csv")

#separate x and y

x=df["stars"]
y=df["text"]

#convert x into document term matrix
from sklearn.feature_extraction.text import CountVectorizer
cv=CountVectorizer(stop_words=my_list)

cv.fit(df) ##fitting, getting features

features=cv.get_feature_names()`

lapis sequoia
#

Apparently SymPy can't solve x^2 = y^4 (1 - y^3) in terms of y. Anyone know how to use SymPy to solve this equation for y? The range for x and y is -1 to 1.

import sympy
x, y = sympy.symbols('x y')
expr = x**2 - y**4 * (1 - y**3)
sympy.solve(expr, y)
# this returns []
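#

a possible workaround, as a sketch: moved to one side this is a degree-7 polynomial in y, and those generally have no closed form in radicals (my guess at why `solve` comes back empty). for a fixed x you can still get the roots numerically. `x = 0.5` below is just an assumed sample value, and `numpy.roots` is swapped in for SymPy here

```py
import numpy as np

x = 0.5  # assumed sample value in [-1, 1]

# x**2 = y**4 * (1 - y**3)  <=>  y**7 - y**4 + x**2 = 0
# coefficients from y**7 down to the constant term
coeffs = [1, 0, 0, -1, 0, 0, 0, x**2]
roots = np.roots(coeffs)

# keep only the real roots inside the stated range for y
real_roots = sorted(
    r.real for r in roots
    if abs(r.imag) < 1e-9 and -1 <= r.real <= 1
)
print(real_roots)
```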
spare arch
#

hello

#

Should I install jupyternotebook per conda virtual env

#

or should I install it globally?

#

anyone have any opinions?

serene scaffold
restive scroll
#

Is there a standard library for solving system of equations?

serene scaffold
river fog
#

Where can I get started with machine learning?

restive scroll
trim oar
trim oar
trim oar
#

So that you find out which variable is integer

#

An integer doesn’t have attribute lower() because that’s for string only

#

Are you sure this is all the codes?

trim oar
balmy junco
#

in spark I am reading in a csv file with column headers

#

I want one of the columns to be an index

#

is there a way i can do that?

spare arch
#

what does df.iloc[:,4] actually do?

#

I've never seen the comma before

#

oops to be more specific

#

df.iloc[:,4].rolling(window=ma).mean()

#

where df is dataframe ofc

indigo obsidian
#

i'm trying to use pplot from seaborn-qqplot to graph a set of data and fit regression lines. however of the 5 sets of data, it's only fitting 2 regression lines. any help would be appreciated

#

the code i'm using:

pplot(car_data, x = "horsepower",
  y = "price-sq",
  hue = "body-style",
  kind = 'qq',
  height = 4, aspect = 2,
  display_kws = {"identity":False, "fit":True})
plt.show()
serene scaffold
indigo obsidian
#

and what i get is

spare arch
#

@serene scaffold yes but idk what the comma does

serene scaffold
#

and you separate the two with the comma

spare arch
#

so [:,4] is saying every row and col 0 to 3

serene scaffold
#

because 4 is an int, not a slice
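#

quick sketch of the difference on a made-up frame (the `c0`..`c5` column names are just for illustration): an int picks the single column at that position, a slice picks a range of columns

```py
import pandas as pd

# Hypothetical 3x6 frame, columns named c0..c5
df = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(3)],
    columns=[f"c{c}" for c in range(6)],
)

col4 = df.iloc[:, 4]      # int   -> every row, the one column at position 4 (a Series)
first4 = df.iloc[:, :4]   # slice -> every row, columns at positions 0..3 (a DataFrame)

# rolling mean over that single column, window of 2 rows
rolled = df.iloc[:, 4].rolling(window=2).mean()

print(col4.name)
print(list(first4.columns))
print(rolled.tolist())
```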

spare arch
#

ooo right

#

ty @serene scaffold

lapis sequoia
#

What's the difference between Scikit Learn, PyTorch, and TensorFlow? I can't find a clear answer online. I'm just getting my hands dirty with machine learning and wanna know what would be best to start off with. What would be the best to create something like a chat bot that can have full on conversations?

exotic maple
velvet thorn
#

if you do, then TensorFlow or PyTorch would be more appropriate

#

that said

serene scaffold
velvet thorn
#

if you're just getting started

#

I'd say a chatbot is kinda over your head

#

I would strongly suggest you begin with something simpler

#

very strongly

lapis sequoia
serene scaffold
#

They could always make an Eliza bot, but I don't think that's what was wanted.

velvet thorn
lapis sequoia
velvet thorn
lapis sequoia
#

Since it’s just taking in input and responding to it

velvet thorn
#

you spend some time reading up on the history of machine learning

#

and deep learning in particular

#

natural language processing is extremely complex

velvet thorn
#

but think about it...

serene scaffold
velvet thorn
#

how many animals do you know that can understand natural language? 🙂

lapis sequoia
#

Not many lol

velvet thorn
#

and think about

#

how many billion years

#

it took those animals to evolve

#

natural language processing with neural networks is nowhere near a century old

#

let it suffice to say

#

that a human-level chatbot is still far beyond our abilities @ the moment

#

we can get close in restricted situations, yes

#

but only with state of the art techniques

serene scaffold
#

a lot of NLP tasks have pretty narrow scopes

velvet thorn
#

think about how much $$ Google pours into its AI

#

and look @ Google Translate

serene scaffold
#

like, "identify all the words in this document that belongs to a certain category, even if you've never seen that word before"

velvet thorn
#

it's a lot better than it was 10 years ago, yes, but it's still nowhere near perfect

#

so...if you want a general chatbot? you might need to wait a while

#

of course, if your chatbot's scope is restricted, the problem becomes more tractable

#

but anyway, I would say...work on your fundamentals first.

lapis sequoia
#

So what would be a good library for me, a noobie in machine learning but pretty experienced in Python. And where should I start off with learning it?

velvet thorn
lapis sequoia
#

Alr

velvet thorn
#

really...?

lapis sequoia
#

?

velvet thorn
#

?

lapis sequoia
#

Nvm lol

#

But thank you

#

I’ll start off with that

velvet thorn
#

yeah

#

you should

#

minimally

#

be able to write a simple deep learning library

#

IMO

#

backpropagation, feedforward, gradient descent, fully connected layers, all the basic stuff

#

you could pick up something already in existence and make a basic chatbot, but it'd probz suck

#

oh, and don't forget your linear algebra, graph theory, calculus, statistics, probability, etc. etc.

lapis sequoia
#

Damn alr

velvet thorn
#

foundation is everything in DL

#

oh

#

alr = "alright"?

lapis sequoia
#

Yeah

velvet thorn
#

in this part of the world "alr" = "already"

#

hence my confusion

lapis sequoia
#

Yeah lol

#

Is there a difference between deep learning and machine learning

velvet thorn
#

yes

#

machine learning is more general

#

it's a bit fuzzy

#

but basically

#

machine learning refers to any technique that allows a computer to, based on an algorithm, modify its responses to incoming data based on already-seen data

#

deep learning is a specific type of machine learning that uses neural networks with many layers

#

the "depth" of a neural network refers to the number of layers it has

#

with computational power having become increasingly cheap in the last 20 years

spare arch
#

What's the best way to get the min value across columns ?

velvet thorn
#

more complex (deeper) neural networks became increasingly viable

spare arch
#

but if I only want to compare certain columns

lapis sequoia
velvet thorn
spare arch
#

I have to type every single one out

velvet thorn
#

what rule do you have

spare arch
#

what if there are many columns

velvet thorn
#

for identifying those columns

spare arch
#

is there a better way?

velvet thorn
#

that depends

#

on how you determine that subset

#

you can't assume your program automagically knows which columns you want

spare arch
#

I know but I was looking for a better way than just listing all of them

#

my columns look like this

#

High Low Open Close Volume Adj Close Sma_50 Ema_3 Ema_5 Ema_8 Ema_10 Ema_15 Ema_30 Ema_35 Ema_40 Ema_45 Ema_50 Ema_60

#

I wanted to compare across my emas

#

but obv not across high, low, open etc

velvet thorn
#

df[[col for col in df.columns if col.startswith('Ema')]].min()?
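#

spelled out a bit more, as a sketch on made-up numbers: plain `.min()` gives one minimum per Ema column, while `.min(axis=1)` gives the smallest Ema on each row, so pick whichever you meant by "across columns"

```py
import pandas as pd

# Hypothetical prices; only the Ema_* columns should be compared
df = pd.DataFrame({
    "High":  [10.0, 11.0],
    "Ema_3": [9.5, 10.8],
    "Ema_5": [9.7, 10.4],
    "Ema_8": [9.9, 10.1],
})

ema_cols = [col for col in df.columns if col.startswith("Ema")]

per_column_min = df[ema_cols].min()      # one value per Ema column
per_row_min = df[ema_cols].min(axis=1)   # smallest Ema on each row
```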

spare arch
#

oo

velvet thorn
#

I reco this book

#

it's not as digestible as a lot of the material you will find online

#

but it is good

lapis sequoia
#

Ima check it out, thanks for all the advice 😅

velvet thorn
#

yw 👋

exotic maple
#

can anyone think of a programmatically better way to add values to a pandas column? Let me explain:
My data looks like this

#

I want to be able to programmatically modify the movies column IF both names have movies in common (duh)

#

this data is held in a list

#

right now im doing:

  1. find name in either column
  2. obtain index location
  3. create copy df
  4. find 2nd name in either column of mirror
  5. obtain index location
  6. add value in original df at the specified iloc
#

but this doesn't sound very efficient to me... any hints?

#

It works, and I get what I want, but God save me it's... messy

#

ok so...I did it, much better and efficient, but It's the most disgusting piece of code my eyes have ever witnessed

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

exotic maple
#
relationships.index[((relationships["Name1"] == weights[0][0]) | (relationships["Name2"] == weights[0][0])) & ((relationships["Name1"] == weights[0][1]) | (relationships["Name2"] == weights[0][1]))]
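#

a slightly more readable take on the same lookup, as a sketch: name the two people once and match the pair in either order. as long as the two names differ this selects the same rows as the expression above. the sample frame, the `Movies` column name and the shape of `weights` are assumptions for illustration

```py
import pandas as pd

# Hypothetical data: one row per pair of names
relationships = pd.DataFrame({
    "Name1": ["Ann", "Bob", "Cat"],
    "Name2": ["Bob", "Cat", "Ann"],
    "Movies": [0, 0, 0],
})
weights = [("Cat", "Bob")]  # hypothetical (name, name) entries

a, b = weights[0][0], weights[0][1]

# a row matches when the pair appears in either order
mask = (
    ((relationships["Name1"] == a) & (relationships["Name2"] == b))
    | ((relationships["Name1"] == b) & (relationships["Name2"] == a))
)

# update the matching rows in place, no copy of the frame needed
relationships.loc[mask, "Movies"] += 1
```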
turbid drift
#

Hey everyone 👋 I have an issue here:

#

I get this error, and It says op error, but why is it happening?

wintry sapphire
#

@velvet thorn hi there I have a question also regarding pandas

#

if I want to filter all the results from my dataframe in that column and add a new column, how do I do it

#

so for example, in my column names, there is a lot of names, some also repeated

#

but I want to filter all the names that are repeated into 1 new column

tropic junco
#

can anyone help i want to make an ai i am new to python

#

so just wanted someone to help build an ai

#

@hasty grail

#

i want to make an ai so can anyone help with the code

vivid wren
#

I'm using a numpy array to track snowflakes in a grid, is there a way to create an editable "window" array that affects the bigger array?
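#

for what it's worth, basic slicing already behaves that way: a slice of an array is a view, so writing into it changes the big array too. a small sketch (grid size and window position are made up). note that fancy indexing with lists or boolean masks gives a copy instead

```py
import numpy as np

grid = np.zeros((6, 6), dtype=int)   # hypothetical snowflake grid

window = grid[2:4, 1:5]   # basic slicing returns a view, not a copy
window[:] = 1             # writes through to grid
window[0, 0] = 7          # same cell as grid[2, 1]

print(np.shares_memory(grid, window))
print(grid[2, 1])
```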

tropic junco
#

#i need help

#

can anyone help

#

helpers

spare arch
#

does anyone know how to use the index as the column for the plot?

#

for instance my data is like this

#

I wanna plot the index against the impliedVolatility

dense cosmos
#

You can use df.index to access it

wintry sapphire
#

@dense cosmos hey

#

do you know how to see what are the duplicates in pandas

dense cosmos
#

df.duplicated()

wintry sapphire
#

yeah so like

#

what I wanna do is to

#

group the data I have in my dataframe

#

and create a new column

#

do you know how to do that @dense cosmos

#

so in my dataframe I have date, stocks, prices

#

in the stocks column, it has not been cleaned so I have a bunch of different stocks

#

I want to be able to group the same stock name together

dense cosmos
#

Are the stock different datapoints for dates or just duplicate rows?

spare arch
#

I saw this plot from the data that I have

#

its so nice

wintry sapphire
#

they are not duplicated

spare arch
#

im trying to do this right now ❤️

wintry sapphire
#

but the stock name is duplicated

dense cosmos
#

Probably want to use df.groupby("stock")

wintry sapphire
#

so when I use that

#

how then do I retrieve a certain stock

#

like how do i retrieve how many groups I have

dense cosmos
#

You can do it with len(df.groupby("stock").groups.keys())

#

And getting a certain stock can be done with get_group, e.g. df.groupby("stock").get_group("stockname") (square brackets on a groupby select a column, not a group)
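#

small sketch of both on a made-up date/stock/price frame. `ngroups` gives the group count directly, and `get_group` pulls the rows for one stock name (indexing a groupby with square brackets picks a column rather than a group)

```py
import pandas as pd

# Hypothetical date/stock/price frame
df = pd.DataFrame({
    "date":  ["2020-01-02", "2020-01-02", "2020-01-03"],
    "stock": ["AAPL", "MSFT", "AAPL"],
    "price": [75.0, 160.0, 74.0],
})

grouped = df.groupby("stock")

n_groups = grouped.ngroups         # same count as len(grouped.groups.keys())
aapl = grouped.get_group("AAPL")   # all rows for one stock
```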

spare arch
#

does anyone else feel like the plotting libraries are a lot to get used to?

#

D:

wintry sapphire
#

@dense cosmos thanks sir

#

let me play around with it

#

I have another question like

#

so like I have a list of dates --- all last dates for every month for every year

#

so in the my dataframe I have various dates

#

1 Jan, 5 Jan etc etc

#

I want to convert all these dates in the dataframe to the ones in the list

#

so for example , 1 Jan will be converted to 31 Jan 20XX

#

same for the rest of the dates and the corresponding years

#

I will then add it into a new column of the dataframe

#

how should I go about doing that? @dense cosmos

dense cosmos
#

Are you trying to aggregate by month?

wintry sapphire
#

so like

#

month and year

#

meaning like for every date in the dataframe that is in Jan 2020, I will convert it to 31 Jan 2020

#

if it is in Jan 2019 then I will convert it to 31 Jan 2019

#

@dense cosmos

spare arch
#

im a little bit confused on how to recreate this plot

#

I'm currently doing something like this df.plot_bokeh(kind = 'scatter', x = 'dte', y = 'impliedVolatility')

#

I'm gonna switch to line graph

#

but I have the x and y axis

#

how do I make it graph every different date?

#

like in the one above

velvet thorn
#

like 10

#

in text

spare arch
wintry sapphire
#

@velvet thorn so these are the dates in my dataframe

#

ideally what I want is another column

#

that will be

2008-04-31
2009-03-31
2008-07-31

velvet thorn
#

so

#

you want

#

to turn that

#

into the last day

#

of the month?

#

@wintry sapphire

spare arch
#

oh you weren't talking to me

wintry sapphire
#
```py
trading_dates = []
trading_month = []

date_ranges = pd.date_range(pf_clean['PC Date'].iloc[0], pf_clean['PC Date'].iloc[-1], freq='BM')
for td in date_ranges:
    trading_dates.append(td)
```
spare arch
#

😅

wintry sapphire
#

@velvet thorn in the trading_dates list

#

I have the dates that will be used for the respective dates

velvet thorn
#

in general

#

for data manipulation questions

#

the simplest way to get the answer you want

#

is to show a text example of input data and the expected result

wintry sapphire
#

ok hold on

#

PC Date
2008 - 01 - 12
2009 - 05 - 30

Trading Dates = [2008-01-31, 2008-02-28,2009-05-31]

#

so I want a new column - new dates

#

such that it will input the dates according to the months

#

and year

#

so the New Dates column will be:

2008-01-31
2009-05-31

#

@velvet thorn

velvet thorn
#

but what I'm guessing is

#

you have two Series of datetimes

#

the second Series is unique on a month level

#

so you want to match the two Series on their months and get the corresponding value from the second Series

#

is that right?

wintry sapphire
#

yes

#

that's what I meant 😆

velvet thorn
#

let me think about this for a bit

wintry sapphire
velvet thorn
# wintry sapphire alright, thanks!
>>> left = pd.to_datetime(['13/01/2010', '05/05/2010']).to_series().reset_index(drop=True)
>>> left
0   2010-01-13
1   2010-05-05
dtype: datetime64[ns]
>>> right = pd.to_datetime(['16/01/2010', '18/02/2010', '24/05/2010']).to_series().reset_index(drop=True)
>>> right
0   2010-01-16
1   2010-02-18
2   2010-05-24
dtype: datetime64[ns]
>>> pd.concat([left, left.dt.month.rename('month'), left.dt.year.rename('year')], axis=1).merge(pd.concat([right, right.dt.month.rename('month'), right.dt.year.rename('year')], axis=1), on=['month', 'year'], suffixes=['', '_new']).drop(columns=['month', 'year'])
           0      0_new
0 2010-01-13 2010-01-16
1 2010-05-05 2010-05-24
#

real quick

#

you can, of course, clean it up and spread it out

#

but I believe this is what you want
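#

spread out, it could look something like this sketch. it keys both sides on a "YYYY-MM" string instead of separate month and year columns, and uses `how="left"` so dates without a match stay in as NaT; both of those are my choices here, not what the one-liner above does. the column names are made up

```py
import pandas as pd

left = pd.Series(pd.to_datetime(["2010-01-13", "2010-05-05"]), name="date")
right = pd.Series(
    pd.to_datetime(["2010-01-16", "2010-02-18", "2010-05-24"]), name="date_new"
)

# add a year-month key to each side
left_df = left.to_frame().assign(key=left.dt.strftime("%Y-%m"))
right_df = right.to_frame().assign(key=right.dt.strftime("%Y-%m"))

# match on the key, then drop the helper column
result = left_df.merge(right_df, on="key", how="left").drop(columns="key")
print(result)
```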

spare arch
#

@velvet thorn

#

can u help me too?

#

🙂

wintry sapphire
#

trying to understand the code haha

#

@velvet thorn `on=['month', 'year'], suffixes=['', '_new']).drop(columns=['month', 'year'])`

#

what does this line do

uncut barn
#

anyone got any tips on which algorithms I should use for audio classification:
My project consists of audio files which are 32 numerals and each with a label of the style of voice i.e. bored, excited, neutral etc..

vivid wren
#

is it possible to roll a 3d array containing an image in numpy?
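#

as far as I know `np.roll` takes an `axis` argument, so you can shift along height and/or width and leave the channel axis alone. a sketch assuming the image is laid out as (height, width, channels); the tiny array is made up

```py
import numpy as np

# Hypothetical 4x5 RGB image, shape (height, width, channels)
img = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# shift 2 pixels to the right, wrapping around the edge
right = np.roll(img, shift=2, axis=1)

# shift 1 pixel down and 1 pixel left in a single call
diag = np.roll(img, shift=(1, -1), axis=(0, 1))
```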

short heart
#

uh

#
```
runcell(0, 'D:/!Code/папкипитона/!!!Project_currency/spyd.py')
Traceback (most recent call last):

  File "D:\!Code\папкипитона\!!!Project_currency\spyd.py", line 55, in <module>
    from prophet import Prophet

  File "D:\!Code\папкипитона\!!!Project_currency\prophet.py", line 56, in <module>
    m = Prophet()

NameError: name 'Prophet' is not defined
```
#

tf is this

#
```py
import fbprophet
from prophet import Prophet
m = Prophet()
```
#

im importing fbprophet and it says that Prophet isnt defined

#

tho i very clearly imported it

#

or can it be something with spyder ide?

short heart
#

how do i pass a time series column to lstm. For example i've got a column with dates and column with values. Do I need to pass both?

buoyant haven
#

can someone who has any experience with matplotlib please look at #☕help-coffee

brave owl
#

Update on the problem I was facing yesterday: I found out it was all my fault for bad code. I've recoded everything and now it's working with 91% accuracy. Plugging the code - the code isn't optimized cause my main focus was just to make it work

https://ideone.com/Rpjj0v

raven knoll
#

Could anyone show me a example of a parameterized stored procedure used in big data?

tidal bough
short heart
#

how do i feed several datasets into keras lstm

#

i dont think just uniting them is good since its time series

grave frost
# trim oar Because it’s not linear by nature?

well, I tried a stack of dense layers with relu activation and it didn't work - not to mention CNNs, RNNs and a transformer architecture. none of them converge (hell, they get 5% accuracy) while Naive Bayes gets near top results.

#

it's a complete mystery - my network seems to overfit to simple data, so I doubt there is a problem there

grave frost
#

the structure of the two frameworks has become very similar now, so not much difference exists except the ease-of-use they offer

serene scaffold
#

I'm not referring to any implementation details and how optimized they might be.

still karma
#

Hi everyone,
I am a high school student. I have a question: what math do i need for ai topics and how can i learn it quickly? btw i have intermediate level python

grave frost
# still karma Hi everyone, I am a high school student. I have a question, what math do i need ...

you can't learn anything quickly - building knowledge takes time. I recommend you keep learning your school level math and try to learn the AI concepts intuitively rather than focusing on the math at first.

you can always dive into the math later (like in college where they would teach you everything from the ground up). you would be surprised how far an intuitive understanding can get you

serene scaffold
winged yew
#

can anyone explain me about OneHotEncoding ?

serene scaffold
still karma
grave frost
winged yew
grave frost
serene scaffold
serene scaffold
# winged yew yes

so you have a vector of all zeros, except one element is 1. that's a one hot vector.
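#

tiny sketch of that with numpy, one row per sample and one column per category. the labels are made up, and libraries like scikit-learn have their own encoders for this, this is just the idea by hand

```py
import numpy as np

labels = ["cat", "dog", "bird", "dog"]   # hypothetical categories
classes = sorted(set(labels))            # fixed column order: bird, cat, dog
index = {name: i for i, name in enumerate(classes)}

# start from all zeros, then set a single 1 per row at that sample's category
one_hot = np.zeros((len(labels), len(classes)), dtype=int)
one_hot[np.arange(len(labels)), [index[name] for name in labels]] = 1

print(classes)
print(one_hot)
```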

still karma
grave frost
serene scaffold
#

The CS program that I'm in currently requires you to do well in math courses or they won't let you in.

winged yew
still karma
winged yew
#

i got this error

#

how can i solve this

grave frost
#

well leave that for a moment

grave frost
#

I would say that you learn your school-level math - and learn the deep concepts in the college because you would have to do them anyways

serene scaffold
grave frost
#

but make sure your school math is exceptionally strong