#data-science-and-ml
did you try using the KFold generator?
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
will this work for you?
you can't copy and paste it as text in a code block?
```py
from sklearn.utils import shuffle
col_num = X.shape[1]
new_Ind = []
cur_MaxScore = 0
col_Ind_Random = shuffle(range(0,col_num), random_state=1)
for cur_f in range(0, col_num):
    new_Ind.append(col_Ind_Random[cur_f])
    newData = X.iloc[:, new_Ind]
    clf=DecisionTreeClassifier(max_features="sqrt", max_depth=4, min_samples_split=22, min_samples_leaf=9, random_state=0)
    cur_Score = cross_val_score(clf, X, targets, cv=5).mean()
    if cur_Score < cur_MaxScore:
        new_Ind.remove(col_Ind_Random[cur_f])
    else:
        cur_MaxScore = cur_Score
        print("Score with " + str(len(new_Ind)) + " selected features: " + str(cur_Score))
```
that would not show you outputs
Oh. Should I use a k-fold generator and loop through it and do feature scaling before each iteration?
you can copy and paste that as well. remember that you're more likely to get help when you make it as easy as possible for answerers, and text is easier to work with than screenshots/pictures.
anyway, try using the KFold generator to make the folds, and then evaluate during each iteration of the KFold generator.
sounds like you understand what I'm saying 😄
I studied k fold generator. I just forgot about it
Is it not possible to scale the whole dataset though?
sc.fit_transform(X)
you could do that before starting the kfold train/eval loop, I suppose. I've never actually worked with decision trees.
Then there would be no need to do the loop at all
do you know the difference between fit and fit_transform?
cross_val_score(clf, X, targets, cv=5).mean()
no
I just discovered feature scaling today
I believe it doesn't matter in decision trees though.
fit and transform are two operations. fit adjusts the parameters of an sklearn object, and transform modifies data in some way. whereas predict passes X data through a model and returns the predictions.
so, fit_transform isn't going to be a method of the decision tree classifier, because it's a model.
the decision tree classifier is intended to return decisions
but you have clf=DecisionTreeClassifier(max_features="sqrt", max_depth=4, min_samples_split=22, min_samples_leaf=9, random_state=0)
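To make the fit / transform split concrete, here is a minimal sketch with `StandardScaler`. The arrays are made up for illustration (they are not the dataset from this thread), and `sc` is just the scaler name used earlier in the chat.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration only.
X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

sc = StandardScaler()
sc.fit(X_train)                          # fit: learn mean and std from X_train
X_train_scaled = sc.transform(X_train)   # transform: apply the learned mean/std
X_new_scaled = sc.transform(X_new)       # reuse the same parameters on new data

# fit_transform is fit followed by transform on the same data.
same = StandardScaler().fit_transform(X_train)
```

A classifier such as `DecisionTreeClassifier` has `fit` and `predict` instead, which is why `fit_transform` is not one of its methods.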
Could you solve this problem first. Why isn't my accuracy getting improved even by doing feature selection.
what are the features
Index(['age', 'anaemia', 'creatinine_phosphokinase', 'diabetes',
'ejection_fraction', 'high_blood_pressure', 'platelets',
'serum_creatinine', 'serum_sodium', 'sex', 'smoking', 'time']
can you show me a few lines of the CSV as text?
75,0,582,0,20,1,265000,1.9,130,1,0,4,1
55,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
65,0,146,0,20,0,162000,1.3,129,1,1,7,1
50,1,111,0,20,0,210000,1.9,137,1,0,7,1
works?
in this code
I don't get how it's doing feature selection
It's taking in a single feature at a time. Checking if it's increasing the accuracy of the model. If yes, we take it in. If no then we discard it.
well, it doesn't look like your newData variable ever gets used.
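For reference, a sketch of what the loop might look like if the selected columns are the ones actually scored. The data here is synthetic (`make_classification` stands in for `X` and `targets`), and the names `selected`, `candidate` and `best_score` are hypothetical, so treat this as an illustration of the idea rather than the original author's code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import shuffle

# Synthetic stand-in for X / targets (assumption: the real X is a DataFrame).
X_arr, targets = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

col_order = shuffle(list(range(X.shape[1])), random_state=1)
selected = []
best_score = 0.0
for col in col_order:
    candidate = selected + [col]
    new_data = X.iloc[:, candidate]  # only the candidate columns
    clf = DecisionTreeClassifier(
        max_features="sqrt", max_depth=4,
        min_samples_split=22, min_samples_leaf=9, random_state=0,
    )
    # Score the candidate subset, not the full X.
    score = cross_val_score(clf, new_data, targets, cv=5).mean()
    if score >= best_score:
        selected = candidate
        best_score = score

print("Selected columns:", selected, "score:", best_score)
```

The only substantive change from the pasted loop is that `cross_val_score` receives the column subset instead of the full `X`.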
now you know how I feel every day
Man! That's what happens when you try to reuse someone else's code 😭😭😭
this just goes to show that anyone can betray you
anyone
you can never let your guard down
professors can't write python code if their life depended on it.
Btw. I will modify it. Back to feature scaling. If I don't do a train test split, and do fit_transform(X) and pass that transformed X into cross_val, will it be right?
```py
X_transformed=sc.fit_transform(X)
cross_val_score(clf, X_transformed, targets, cv=5).mean()
```
only one way to find out 😄
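One caveat worth knowing: scaling all of `X` before `cross_val_score` lets the scaler see the rows that later serve as validation folds. A common way to avoid that is to put the scaler and the model in a pipeline, so the scaler is refit on the training part of each fold. A minimal sketch on synthetic data (the dataset and the name `model` are assumptions, not from the chat):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for X / targets.
X, targets = make_classification(n_samples=200, n_features=6, random_state=0)

# The scaler is fit inside each fold, on that fold's training rows only.
model = make_pipeline(
    StandardScaler(),
    DecisionTreeClassifier(max_depth=4, random_state=0),
)
scores = cross_val_score(model, X, targets, cv=5)
mean_score = scores.mean()
print(mean_score)
```

For tree models the scaling step usually changes little, so the difference matters more for distance-based or linear models.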
Like this
on it
Worked
Good job kolv
Kolv and stelercus partners
https://c.tenor.com/7Ypq9_9najcAAAAM/thumbs-up-double-thumbs-up.gif
What's Is/eum
"he/him" but in latin
Okay. So i corrected the code. It's still giving me lower accuracy than what I got by selecting all features.
🔥 🔥 🔥 🔥
it might be that decision trees can learn to ignore features that aren't helpful.
(though it would likely take additional computation time to figure out which ones those are.)
tysm
yw
copying that to put it my report. Haha
fax

Anyone good with TFX?
I am getting an accuracy boost taking p=2 instead of p=3 in knn, in the Minkowski distance. But what effect would increasing p have on the generalisation of the model? Does taking the p with the highest validation score suffice?
The boost is 78.94 to 79.60
can anyone help me with this
output=[]
for i in range(int(input(""))):
    a,b,c,d=map(int,input("").split(" "))
    if (a+b+c)<=d:
        output.append(1)
    elif (d*2)>=(a+b+c)>=d:
        output.append(2)
    else:
        output.append(3)
for i in output:
    print(i)
that's my program and code chef is not accepting it
Is this a test?
No
Ok. I will still wait 1h to help, just in case 😉
if it's not a test, then there is nothing wrong waiting a bit
Just help me it’s not a test
I will help you in 1h
Okay u will give me the answer in 1 hour?
I will help you figure it out in 1h.
It's better to help you understand it than spoon feed you an answer that you don't understand
!rule 8
8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.
It’s not an exam
What makes u think that?
Like I said it’s unlimited tries
I just don’t have hours to wait for 1 problem
It’s 12 o clock where I live
Had an experience with that he asked me to code in Matlab lmao after I helped him extend his PHD thesis with a Python script...other profs knew only Matlab so Matlab it was..
Ok very quick question. So i have like 10 graphs of same event but they differ slightly. I wanted to somehow use graph neural network to get approximation of fitting function, is it a right approach? Maybe some libs for that exist alr? Thanks.
Or it will be easier to get all functions from each graph and get median?
that depends on what knowledge you have about the graphs. if they are different because they are afflicted with 0-mean noise, you can simply average them. if the noise is impulsive, median filtering could work. if you already know a model for what the graph should look like but not the parameter models, you can optimize for parameters (e.g. with machine learning or classical gradient/newton methods)
if you know the type of noise but not the model, you could try and make a denoising network that takes in several noisy versions of a signal you made yourself synthetically, and give the network knowledge of the clean version, too
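A small numerical sketch of the mean-versus-median point above. Everything here is synthetic: the sine curve, the noise level and the spike size are invented just to show how the two estimates behave.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
clean = np.sin(x)  # hypothetical "true" curve

# 10 noisy measurements with zero-mean Gaussian noise.
curves = clean + rng.normal(0.0, 0.2, size=(10, x.size))
# Add impulsive outliers to one of the measurements.
curves[3, ::20] += 5.0

mean_curve = curves.mean(axis=0)          # good for zero-mean noise
median_curve = np.median(curves, axis=0)  # robust to the spikes

mean_err = np.abs(mean_curve - clean).max()
median_err = np.abs(median_curve - clean).max()
print(mean_err, median_err)
```

The spikes pull the pointwise mean away from the true curve, while the pointwise median mostly ignores them.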
Hello. I was wondering, do we typically only use cross_val_score on training data?
do you understand what cross validation involves?
Yes, it is already splitting the data
Is there someone here who is currently exploring Object detection in python?

one of our profs "preferred" people to do their projects in matlab
everyone ended up choosing python instead

Nice go python
Is that a test??
does anyone used ACO,PSO algorithms for classification ?
😂 The kind of rebellion I love to see
Guys, im new to data science and im currently out of ideas, so need help 😰 . I found a solution with iterrows, but the script would run for 2 weeks, so i need a better solution
# DataFrame A
# ID | NAME | CITY | STATE |
# 1 | Jeff | New York | "" |
# 2 |Harold| Dallas | |
#DataFrame B
# ID | CITY | STATE |
# 1 | New York | NYC |
# 2 | Houston | TX |
# 3 | Dallas | TX |
I need to check for every row if CITY from DataFrame A is in CITY of DataFrame B if it matches then return value from STATE column in matched row and assign to the DataFrame A
so in the end it would be
# DataFrame A
# ID | NAME | CITY | STATE |
# 1 | Jeff | New York | NYC |
# 2 |Harold| Dallas | TX |
atm i have
dfA = dfA.assign(state=dfA["CITY"].isin(dfB["CITY"]).astype(str))
but it only returns boolean. How would i combine it to return those states?
that is just a left join (in sql terms), you can use the merge or join method for that
!d pandas.merge
```py
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
```
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column...
Confirm if there's any word token in your corpus. If your corpus list isn't empty, then the problem is from your wordcloud code
!d pandas.DataFrame.join
```py
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
```
Join columns of another DataFrame.
Join columns with other DataFrame either on index or on a key
column. Efficiently join multiple DataFrame objects by index at once by
passing a list.
😮 i'll check it, brb
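For the two example frames in the question, a left merge on `CITY` could look like the sketch below. The frames are rebuilt from the tables in the question; the `lookup` name and the `drop_duplicates` step are additions, on the assumption that a city may appear more than once in DataFrame B.

```python
import pandas as pd

df_a = pd.DataFrame({
    "ID": [1, 2],
    "NAME": ["Jeff", "Harold"],
    "CITY": ["New York", "Dallas"],
    "STATE": ["", ""],
})
df_b = pd.DataFrame({
    "ID": [1, 2, 3],
    "CITY": ["New York", "Houston", "Dallas"],
    "STATE": ["NYC", "TX", "TX"],
})

# One row per city in the lookup, so the left join cannot multiply rows.
lookup = df_b[["CITY", "STATE"]].drop_duplicates(subset="CITY")

# Drop the empty STATE column, then pull STATE in from the lookup.
result = df_a.drop(columns="STATE").merge(lookup, on="CITY", how="left")
print(result)
```

Cities with no match in the lookup end up with `NaN` in `STATE`, and no Python-level loop over rows is needed.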
I want to start learning AI ML, so from where should I start? Suggest some good resources to start with
do you have a solid basis in numpy ?
nope I have knowledge of python, so from here... in what direction should i move? I mean what should be my learning path?
would be great , if u give some advice
in short: numpy -> pandas -> matplotlib (optionally seaborn & more) -> AI ML
have you seen the resources section ?
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
nope I'll check it out thankss 🙂
Hi everyone, my pytorch ai won't work and i've narrowed its issue down to the fact that it doesn't respond to reward. i've attached the py file which i use to calculate its next move and how it uses reward for that
https://paste.pythondiscord.com/hekedehepe
and im wondering what i can do to make it respond to reward and calculate a move based on its reward
Hello, I just created the bar plot in the following picture. It turns out that a text label hits the top spine (the red sign), and I want the bars to sit close to the side spines (the green sign). Does anyone know the solution for this problem? Thank you
This is the code
can you share the code as text and (preferably) the data ?
Hey @lapis sequoia!
It looks like you tried to attach file type(s) that we do not allow (.zip). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Hey @lapis sequoia!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
all set-up
increase ylim and decrease* xlim
yeah that should work (though I haven't been able to so far)
Oh yeah, i forgot about that, thank youu:D
and try not to hardcode the limits, just add like a proportion from the highest bar
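A sketch of the "proportion of the highest bar" idea. The bar heights and labels are made up, the 15% headroom is an arbitrary choice, and `ax.bar_label` assumes matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration.
labels = ["a", "b", "c", "d"]
heights = [12, 30, 41, 25]

fig, ax = plt.subplots()
bars = ax.bar(labels, heights)
ax.bar_label(bars)      # value text on top of each bar (matplotlib >= 3.4)

ax.margins(x=0.01)      # keep the outer bars close to the side spines
# Leave 15% headroom above the tallest bar instead of a hardcoded limit.
ax.set_ylim(top=max(heights) * 1.15)
```

Because the upper limit is computed from the data, the same code keeps working when the values change.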
lmk if it works, I kept getting 'int object is not callable', even when using an example straight from the matplotlib docs
can anyone suggest any full length pandas tutorial which goes from beginner to advanced topics?
hi, i had this code that can scrape data from multiple pages. i tried to run it on google colab but it couldn't because of the RAM somehow. i tried to upgrade it but still couldn't run it, so i decided to run the same code in a Jupyter notebook and it has been running for 4 hours now. i don't understand the issue, can someone help me out with it? thank you in advance.
How fast it supposed to run?
it's worked, I added this
plt.xlim(xmin=-0.8, xmax=22)
plt.ylim(ymax = 45)
well it depends on the code and the output but usually it takes seconds and maybe minutes
So hours doesn't seem right
does anyone know how to set the linewidth of the spines?
Where did you run it when it took minutes?
not the same code tho ... at first i tried on google collab but then the RAM wasnt sufficient so it didnt run and now on jupyter it takes hours
How do you know it should be minutes?
If it's not the same code
Difficult to investigate this with so little details
i did extract data but it was from one page
so when i try to do the pagination it stuck
maybe there is a problem in your code
well yeah it can be but in this case it should run and get error right ?
infinite loop can be the problem too
set_linewidth, so
```py
plt.gca().spines[:].set_linewidth(4)
```
to have a slightly thick border around the graph (all the spines)
right it does make sense.. thanks
thankss
hi i want to develop a fintech do you ahve some code
can anyone help me with a bayesian hidden markov chain in rule?
in chain is the thingh
with stan
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

ok apparently if I send a json in this format:
{'symbol': 'AVAXUSDT', 'timeframe': '1', 'side': 'SHORT', 'order_type': 'Market', 'order_price': '52.535', 'tp_min': '52.535', 'tps': '[52.515, 52.51, 52.5, 52.475, 52.45]', 'sl': '52.565', 'rr': '2.8483333333'}
the error appears, but if I send the json like this:
{
"symbol":"AVAXUSDT",
"timeframe":"1",
"side":"SHORT",
"order_type":"Market",
"order_price":"52.535",
"tp_min":"52.535",
"tps":"[52.515, 52.51, 52.5, 52.475, 52.45]",
"sl":"52.565",
"rr":"2.8483333333"
}
the error doesnt appear
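The difference is the quoting: JSON only accepts double-quoted strings, so the first payload is not valid JSON. It looks like the `str()` of a Python dict (that part is a guess). If so, one way to recover it is `ast.literal_eval` followed by `json.dumps`. A sketch with a shortened version of the payload:

```python
import ast
import json

# Shortened copy of the single-quoted payload from the message above.
payload = "{'symbol': 'AVAXUSDT', 'side': 'SHORT', 'sl': '52.565'}"

try:
    json.loads(payload)
    parsed_directly = True
except json.JSONDecodeError:
    parsed_directly = False  # single quotes are not valid JSON

data = ast.literal_eval(payload)     # safely parse the Python-literal text
valid_json = json.dumps(data)        # re-serialise with double quotes
round_trip = json.loads(valid_json)  # now parses without error
print(valid_json)
```

If you control the sender, it is cleaner to send `json.dumps(the_dict)` in the first place rather than `str(the_dict)`.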
nice, i had some troubles with duplicates, but finally it worked, so thanks!
Getting started with ML and AI, would love to see what you guys use for predicting/forecasting sales. Anyone able to help?
is this real or one of those fake news articles https://www.techradar.com/news/your-mechanical-keyboard-isnt-just-annoying-its-also-a-security-risk
bc idk if you can train a feasible model using speech recognition and word probabilities
with NLP
like
it sounds like the accuracy wouldnt be good but thats just my intuition
Sounds feasible different areas on the keyboard does yield different sounds...maybe even fft and signal processing can do it without resorting to nn..try tapping your keyboard an listen while drinking coffee lmao
like the kind people are getting for their computers so they can work remotely?
In paper they say detect 41.8% of keystrokes and 27% of typed words correctly in such a noisy environment---even without user specific training. To investigate the pote
Be noisy lmao
yeah, keyboards are rather awful security wise, but the vast majority of exploits require physical access, and if a bad guy can touch your computer, the keyboard is not your biggest problem

There are more fun physical hacks, like using a laser to shine into a printer's LED through the window and uploading a virus.
(LEDs are inputs too)
It's a non issue though. Just an interesting research project.
Hey @barren wedge!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
How to make my model run faster in google colab?
https://paste.pythondiscord.com/pagorozeyi
how do I locate my ipynb file, my os is windows 10
You can copy paste it
Maybe because json doesn't allow single quote
import os
os.getcwd()
This will tell you your current work directory. If the iPython-Notebook file you're looking for is different from the one you're currently working on, just search on your pc / try looking in the right folder.
anyone know how to do this
Hello, how do I reach the hours in this column?
df['hour'] = [x[11:13] for x in df.index]
It doesnt work.. it says this
You must slice the timeperiod column, not the index of the dataframe
so it should say df.Timeperiod?
Yes
does this mean they're in zip files now?
Anyone here familiar with YOLO? I've used inference in DarkNet and OpenCV on the same image, but the inference on OpenCV has some slightly more detections. Wonder if anyone has experienced something similar.
You could easily convert the column to datetime object and extract the hour.
pd.to_datetime(df['time_period']).dt.hour
You could also try using the apply function + lambda on the column to extract the hour as well.
df['time_period'].apply(lambda x: x[10:12])
always the one that uses dt 😄
also wouldn't df['time_period'].str[10:12] be more idiomatic?
That's the one I mostly use when working with a time column 😃
well, if they have timestamps encoded as strings, they should really convert those to a proper datetime
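A quick sketch of both approaches side by side. The column name `time_period` is the one used above; the timestamp layout `YYYY-MM-DD HH:MM:SS` is an assumption, and the right slice indices depend on it.

```python
import pandas as pd

# Assumed layout: "YYYY-MM-DD HH:MM:SS" stored as strings.
df = pd.DataFrame({"time_period": ["2022-05-10 09:15:00", "2022-05-10 14:30:00"]})

# Preferred: parse to datetime, then read the hour as an integer.
df["hour"] = pd.to_datetime(df["time_period"]).dt.hour

# String slicing works too, but only for this exact layout,
# where the hour sits at character positions 11 and 12.
df["hour_str"] = df["time_period"].str[11:13]
print(df)
```

The datetime route keeps working if the string layout changes slightly, while a fixed slice silently returns the wrong characters.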
Big thanks guys, will try it now
Hey, thanks for your answer. Sorry I haven't replied till now.
I am having some trouble with the .rank() function. Can't seem to figure it out. also the code you shared is not filtering the first 2 distinct IDs but its only returning 1 unique value
I'll try to use dense instead of min
seems to have a better result but it's still not quite right
How do I approach explaining Graph Neural Networks to a group of people who have absolutely no idea about neural networks let alone programming?
if they're good at math, you can explain the math
compare it to our brain?
but what if they're not good at math
they won't understand a thing either 😛
you just explain all the necessary prereqs
if you haven't practiced explaining things enough times it will be hard to explain it all concisely enough
and that's why people love cheatsheets
there's one on graph theory for nns, i bet
@tough bolt have you seen this site: https://distill.pub/2021/gnn-intro/ condensing it for nonexperts may be difficult
and i agree with @urban lance's idea that the human brain is a good analog for graph nns because we have compartmentalized brains that operate like them.
does anyone here have the time to review a brief colab notebook
like i'm talking 20 cells
half of which are just things you can ignore like setting up the dataset
i just need feedback and to see what i'm doing right/wrong
notebook in question
anyone familiar with lstm? i have a question
I haven't but it might be a good starting point. Thank you
Not a bad idea
Any must-read books about DS, AI or smth?
To understand the basics,to extend knowledge, to see how to implement effective algorithms, projects for AI on Python?
recently did some citizen datascience using polars
One of the most expensive hospitals in America may actually be a nonprofit. Insurers pay the hospital, Mary Lanning Healthcare in Nebraska…
here is the notebook, if anyone would like to give feedback
i'm not a pro data scientist, but i'm doing my best
Calculus is necessary in programming after all
Not everything but some parts like integrals are very useful
And matrix math
Oh
Can polars be used as an alternative for pandas? Nice project BTW
absolutely. i think of it as the spiritual successor to pandas
it's the future imo
i like pandas fine, but i find polars cleaner as an API and faster
and the code that i write is just way more elegant and readable
Oh
I'm looking at your notebook. the most obvious difference is col, which must involve some delayed execution. it's an interesting solution to all the self-referencing indexing one often sees in pandas.
yep. polars expressions are awesome
imo
are there any other similarly significant differences?
Sure is
also, does it support sets?
select/filter are cleaner
Okay thanks 🙂
i mean these are judgments
but what i'd recommend is just rewriting one of your existing notebooks in polars
i had a "wow" moment where i was like holy shit, this is just a better way
I actually avoid notebooks as much as possible 👀
if you want to comment here https://www.reddit.com/r/Python/comments/ululk1/i_used_a_new_dataframe_library_polars_to_wrangle/
i can basically give you examples, if you want specifics
Read through it,understood nothing,can you explain what is this you write code in?
assuming other people have the same questions as you
thanks for letting me know!
Yeah looks like a deep dark forest
Can u explain for us in plain words which instruments did you use?

thats fair
theyve def been abused
but they do have their use cases
such as rapid experimentation
once the rapid experimentation is done, it all has to get composed into py files, and the notebook must be destroyed
U say that python > jupyter?
Then why there are a lot more jupyter ai projects,then python?
jupyter runs python code. it's just that it also allows you to put in tex, markdown, and images all in the same place, too, but you could just as easily make a script in a file (or a few) and run it from the terminal or your favorite IDE. jupyter is one way of doing that, too
Oh so jupyter isn't necessary
Out of curiosity why destroy the notebook? I use the markdown in it heavily as a notebook, and I find it good to refer back to
And I can use a pure python
Community version is enough for AI stuff, or I'l have to buy a professional?
community version of? python is free
Pycharm
you don't need pycharm either. the community version would be fine, if you like it. you can pay for the pro version if it has anything you like. technically, all you need is notepad or any other text editor, and to download python
There are IDEs better than pycharm as far as I know
So embedded notepad for editing is enough?
sure. i don't think windows' notepad supports syntax highlight though, which is imo the core thing you'd want. you could use, for instance, notepad++ or something like that
or spyder, if you like it. many people like vscode
on linux, your default text editor supports syntax highlighting
or you can use vim, emacs, or (the one i use) micro
You also are on linux?😎
I heard that vim takes ages to learn how to use properly and to get accustomed to its features
Because I'm a destructive person
I probably should take some of that energy.. *Stares at years old py2.7 files I refuse to delete, but never use*
in plain words: i used this dataframe library
curious as to how you would do data science without notebooks
Regular py files and running them from the command line.
what's the advantage there?
Linear execution order, being able to import stuff, modularity
Notebooks are fine for exploratory analysis. But they're bad for production.
I mostly use an IPython repl for quick experimentation though.
I mean being able to import stuff that you create
if you define a function in a notebook, how do you import that into something else?
idk. it seems like you have a strong opinion about this, and i won't try to change your mind, but it is possible to import functions defined in notebooks
we used notebooks in a pretty serious cross-lab collaboration, it didn't seem to be an issue tbh
but yea, everyone has their tastes
how do you import functions defined in a notebook?
i mean i just googled this: https://duckduckgo.com/?q=importing+functions+from+another+jupyter+notebook&t=newext&atb=v312-1&ia=web
if that was a serious, non-rhetorical question
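For what it's worth, the usual route is to convert the notebook to a script (`jupyter nbconvert --to script notebook.ipynb`) and import that. Since an .ipynb file is just JSON, a crude standard-library-only loader is also possible. The helper name `load_notebook_as_module` and the demo notebook below are hypothetical, and this is a rough sketch rather than a recommended tool.

```python
import json
import tempfile
import types
from pathlib import Path


def load_notebook_as_module(path, name="notebook_module"):
    """Run the code cells of an .ipynb file and return them as a module."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    module = types.ModuleType(name)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # Crude filter: skip IPython magics and shell escapes.
        lines = [
            ln for ln in source.splitlines()
            if not ln.lstrip().startswith(("%", "!"))
        ]
        exec("\n".join(lines), module.__dict__)
    return module


# Tiny made-up notebook to show the idea.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# notes"]},
        {"cell_type": "code", "source": ["def double(x):\n", "    return 2 * x\n"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}
with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "helpers.ipynb"
    nb_path.write_text(json.dumps(demo), encoding="utf-8")
    helpers = load_notebook_as_module(nb_path)

result = helpers.double(21)
```

Note that this executes every code cell, including any plotting or training cells, which is one reason people move reusable functions into .py files.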
I for one use Atom+Hydrogen, which allows for me to run python code somewhat like IPython but in normal .py files - that way I can test stuff while still keeping it well formatted
I admit that importing stuff can be a pain though, so breakpoints + to/from clipboard is useful
What's hydrogen
I use atom. And ran it earlier in the regular python console. Now I do it with vscode.
Damn. Looks like it is a transformation of atom into vscode
What's better in life 😁
!e
import pandas as pd
df = pd.DataFrame([[1, 1, 'a'], [1, 2, 'b'], [1, 3, 'c'], [1, 4, 'd'], [2, 7, 'e'], [2, 10, 'f'], [3, 0, 'g']], columns = ['user_id', 'timestamp', 'auxillary'])
print(f"Before:\n {df}")
df['rank'] = df.groupby('user_id')['timestamp'].rank(method='min', ascending=True)
df = df[df['rank'] <= 2]
print(f"After:\n {df[['user_id', 'timestamp', 'auxillary']]}")
@echo vigil :white_check_mark: Your eval job has completed with return code 0.
001 | Before:
002 | user_id timestamp auxillary
003 | 0 1 1 a
004 | 1 1 2 b
005 | 2 1 3 c
006 | 3 1 4 d
007 | 4 2 7 e
008 | 5 2 10 f
009 | 6 3 0 g
010 | After:
011 | user_id timestamp auxillary
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/xajiseride.txt?noredirect
Correct me if I'm wrong but that seems to be working fine -- what should be different about the output on this test case?
Is there way I can make changes on installed libraries on Kaggle?
if i had to do this in a prod environment, i would use a tool that allows you to track model data
nbconvert 🙂
dvc is an excellent tool for this kind of thing
ive heard great things about it. the creator spoke about it on a podcast once lol
i can highly recommend it, I've used it quite a bit
I can't speak to how it would scale across a large team, but I've used it in a small team and it worked pretty well
It also made sharing data and trained models pretty easy, less time wasted re-computing things
i can't wait for https://mlem.ai/
it's a project related to dvc. a trained model database, if i understand correctly
can someone help with an exe of a bot i built please?
an exe of a bot. doesn't sound like a data science question
okay, so it's not a discord bot? what is the problem?
no not a discord bot
a chess bot
trying to figure out its language, i tried like 3 decompilers
maybe its py
not sure tho
its an program but i forgot what language it is written in
if the question is really just about decompiling, try a general help channel. see #❓|how-to-get-help
mhmh well not what im looking exactly for but i'll give a shot ok
How would I fix this?
The code:
```py
from bitcoin import *
import bitcoinlib
def new():
    public_key=privtopub(private_key)
    private_key = random_key()
    public_key=private_key.to_public()
    address=public_key.to_address('P2PKH')
    P2WPKH=public_key.to_address('P2WPKH')
    print(f'Priv key>{private_key}\n Pub key> {public_key}\n Address>{address}\n P2WPKH> {P2WPKH}')
new()
```
Not sure if this would be the right channel but
from ib_insync import *
from numpy import *
import yfinance as yf
import time
import pandas as pd
ib = IB()
#vars
contract = Stock('MSFT', 'SMART', 'USD')
nextOperation = True #True = buy, False = sell
#Buys the asset if its price decreased by more than the threshold.
dipThreshold = -2.25
#Buys the asset if its price increased by more than the threshold.
upTrendThreshold = 1.5
#Sells the asset if its price has increased above the threshold since bought.
profitThreshold = 1.25
#Sells if its price has decrease by more than this threshold to stop loses.
stopLossThreshold = -2
def returnCurrentPrice():
    stock = yf.Ticker('MSFT')
    return stock.info["regularMarketPrice"]
def returnHistoricalPrice():
    stock = yf.Ticker('AMD')
    return stock.history(period='7d', interval='1d')
lastOpPrice = 90
buyOrder = LimitOrder('BUY', 1, returnCurrentPrice())
sellOrder = LimitOrder('SELL', 1, returnCurrentPrice())
def returnBalance():
    accountBalanceString = ib.accountSummary()
    for a in accountBalanceString:
        if a.tag=="CashBalance":
            return double(a.value)
#funcs
def attemptToMakeTrade():
    precentageDiff = (returnCurrentPrice() - lastOpPrice)/lastOpPrice*100
    if nextOperation == True:
        tryToBuy(precentageDiff)
    else:
        tryToSell(precentageDiff)
def placeBuyOrder():
    print("Placing buy order")
    ib.placeOrder(contract, buyOrder)
    lastOpPrice = returnCurrentPrice()
def placeSellOrder():
    print("Placing sell order")
    ib.placeOrder(contract, sellOrder)
    lastOpPrice = returnCurrentPrice()
def tryToBuy(precentageDiff):
    if precentageDiff >= upTrendThreshold or precentageDiff <= dipThreshold:
        placeBuyOrder()
        nextOperation = False
        print("Next operation is", nextOperation)
def tryToSell(precentageDiff):
    if precentageDiff >= profitThreshold or precentageDiff <= stopLossThreshold:
        placeSellOrder()
        nextOperation = True
        print("Next operation is", nextOperation)
def startBot():
    print("Starting...")
    ib.connect('127.0.0.1', 7497, clientId=1)
    lastOpPrice = returnCurrentPrice()
    nextOperation = True
    while(1 > 0):
        attemptToMakeTrade()
        time.sleep(30)
startBot()
Its expected to change the nextOperation bool but instead just repeats the tryToBuy and placeBuyOrder
and it just repeats that non stop
sorry if its a bit of a text wall
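Going only by the code as pasted, one likely cause is variable scope: an assignment such as `nextOperation = False` inside a function creates a new local variable, so the module-level `nextOperation` (and `lastOpPrice`) never changes. Adding `global nextOperation` inside each function that assigns it is the smallest fix. Another option is to keep the state on one object, sketched below with hypothetical names (`BotState`, `try_to_buy`) and no broker calls.

```python
from dataclasses import dataclass

next_operation = True  # True = buy, False = sell


def try_to_buy_broken():
    # This assignment creates a LOCAL name; the outer variable is untouched.
    next_operation = False
    return next_operation


@dataclass
class BotState:
    """Hypothetical container for the bot's mutable state."""
    next_operation: bool = True
    last_op_price: float = 90.0


def try_to_buy(state, current_price):
    # Mutating an attribute changes the shared object, no 'global' needed.
    state.next_operation = False
    state.last_op_price = current_price


try_to_buy_broken()
after_broken = next_operation  # still True

state = BotState()
try_to_buy(state, current_price=91.5)
print(after_broken, state.next_operation, state.last_op_price)
```

Passing one `state` object around also makes it easier to see which functions change the bot's state.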
:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1652142105:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).
lmao
`pytorch_grad_cam/base_cam.py` line 62
```py
def forward(self,
```
`pytorch_grad_cam/base_cam.py` line 74
```py
outputs = self.activations_and_grads(input_tensor)
```
`pytorch_grad_cam/activations_and_gradients.py` line 39
```py
def __call__(self, x):
```
This bot is amazing.
No, you are 
looking for some help with ray and custom environments
code is bare-bones and very very simple, and the environment is as simplest ive ever seen
entire traceback is also included
https://bpa.st/F5SA
hmm, seems i needed to follow the directions on the ray website and add the env_config param to the env __init__
however, getting some numpy errors now 😕
Guys, how do you change the figsize for jointplots and pairplots with Seaborn? I tried going for the usual plt.figure(figsize=(12,6)) but it clearly doesn't work, couldn't find anywhere how.
im trying to figure out how i would get started with artificial intelligence, say for example with text. how might i go about doing that?
like is there any good tutorials, websites anything reallly
would you mind expanding on that?
This is great notebook, which you can run on GPU for free on kaggle https://www.kaggle.com/code/jhoward/getting-started-with-nlp-for-absolute-beginners
fastai courses are great for deep learning as well course.fast.ai
So this is one of the ways and it's a practical way, you learn stats, and all else on a learn as you need basis. So you can start playing the real game, NLP in your case as soon as possible. With the current libraries like hugging face you can get a lot done with minimal code or stats. Then you learn as you go.
Hugging face NLP course is great too https://huggingface.co/course/chapter1/1
is python 3.10 supported by major DS libs? like pandas, scikit-learn, matplotlib, pytorch, etc.?
Sadly not yet by pytorch (soon!), but most others do.
Heya, so this code is giving me the error below
```py
from bitcoin import *
import bitcoinlib
def new():
    private_key = random_key()
    public_key=private_key.to_public()
    address=public_key.to_address('P2PKH')
    P2WPKH=public_key.to_address('P2WPKH')
    print(f'Priv key>{private_key}\n Pub key> {public_key}\n Address>{address}\n P2WPKH> {P2WPKH}')
new()
```
Error:
Desktop>python main2.py
Traceback (most recent call last):
  File "main2.py", line 12, in <module>
    new()
  File "main2.py", line 8, in new
    public_key=private_key.to_public()
AttributeError: 'str' object has no attribute 'to_public'
How can I fix that?
what is the best ML library that is user friendly that is not tensorflow or keras
@eager cloak what is the best ML library that is user friendly that is not tensorflow or keras
hi
hi, I get a warning when i do this but it works, I dont know the right way to do this
for col in df.columns[1:]:
df[col] = pd.to_numeric(df[col], errors='coerce')
I get this warning. Could not find a way to solve this on Stack Overflow
/local_disk0/tmp/1652161177867-0/PythonShell.py:147: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.io.write(decoded)
please @ me
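This warning usually shows up when `df` was itself created by filtering or slicing another DataFrame, so pandas cannot tell whether you mean to modify the original. That step is not shown above, so it is an assumption here. If it applies, taking an explicit `.copy()` when creating the subset is a common fix. A sketch with made-up data (`raw` is a hypothetical name for the source frame):

```python
import pandas as pd

# Made-up source frame with numbers stored as strings.
raw = pd.DataFrame({
    "id": ["a", "b", "c"],
    "x": ["1", "2", "oops"],
    "y": ["4.5", "n/a", "6"],
})

# Explicit copy: df now owns its data, so assigning columns is unambiguous.
df = raw[raw["id"] != "b"].copy()

for col in df.columns[1:]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
print(df)
```

The original `raw` frame stays untouched, and values that cannot be parsed become `NaN` because of `errors="coerce"`.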
I have a supervised classification problem (yes,no), but the data is highly imbalanced. Yes - 5% and No - 95%.
Is it possible to make a good prediction model using this data?
pytorch with fastai
I don't understand why it is deleting the 3rd and 4th row 🤔
this is the result I'm looking for:
a | 1.0
a | 1.0
a | 1.0
b | 2.0
c | 3.0
c | 3.0
d | 4.0
a count of distinct values
why u asking me lol
seems like random_key returns string, did you expect something else?
use metric for imbalanced data, you could also use training techniques for imbalanced data, like undersampling, oversampling.
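A rough sketch of both suggestions (oversampling plus a metric that isn't fooled by imbalance), using only numpy. The data, the 5% rate, and the helper names `random_oversample` and `precision_recall` are made up for illustration; in a real project the oversampling would normally be applied to the training split only.

```py
import numpy as np

rng = np.random.default_rng(0)

# hypothetical imbalanced labels: roughly 5% "yes" (1), 95% "no" (0)
y = (rng.random(1000) < 0.05).astype(int)
X = rng.normal(size=(1000, 3))

def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until both classes have the same count."""
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    minority, majority = (idx_pos, idx_neg) if len(idx_pos) < len(idx_neg) else (idx_neg, idx_pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    return X[keep], y[keep]

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

X_bal, y_bal = random_oversample(X, y, rng)
print(np.bincount(y), np.bincount(y_bal))

# a model that always says "no" gets high accuracy here but zero recall
always_no = np.zeros_like(y)
print((always_no == y).mean(), precision_recall(y, always_no))
```

The last two lines are the reason plain accuracy is a poor metric for this kind of data.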
i mean.. no, tho why is it giving an error?
Okay, thank you!
Hi, I have a question. Is that possible to create product recommender systems based on their phone number instead of their username on website since
my sales dataset mostly came from whatsapp
I think I got it to work
Has anyone worked on Time Series Classification? I am working on a problem now where I have to forecast the change in insurance carrier.
The inputs I have so far are the Insurance Carrier Name, Companies and Premium amount of each company.
So the goal is to build a model that takes these inputs and forecasts whether the company will change their carrier in the future or not.
So far I have tried windowing techniques but I feel like I'm missing something. There are almost 99 different companies.
Any help or tidbits to get started on this would be highly appreciated.
This is how the sample data looks.
how to create an ai voice assistant
ask siri
yes, you can use the phone number like an id
Hi, has anyone here used stable-baselines3?
using py bruh
yes
Hi im training a chess AI using self play. I have a couple questions about rewards and training if you have the time
hey guys, How to combine multiple hyperspectral images into one?
```py
# Example: Iterate over data set
for sample in dataset:
    datacube, labelmap = sample
    print(datacube.shape, labelmap.shape)
```
output -
(240,520,16)(240,520)
(240,520,16)(240,520)
(240,520,16)(240,520)
(240,520,16)(240,520)
Hi, I'm currently working on Time Series. I can look into this if you provide me a few more details.
Sure what other information you need? or else you want me to DM?
yes you can send the info over DM.
does anyone have good references for document verification methods? i wonder if non-CNN approach is available somewhere
document verification. what are you trying to verify?
just if a word or sentence is in the document
like let said the sender name should be inside the document
i know there is possibility that the name in the document will be abbreviated, so i will try to check the similarity of the phrase in the document with the sender name
If you already knew the specific words/sentence you wanna search for in the document, you could do something like this
list_of_words = ['Money Laundering', 'Police', 'Weed', 'Cocaine', 'Drugs', 'Gun']
df.loc[df["document_message"].str.contains('|'.join(list_of_words), na=False)]
'''
If you wanna flag 🚩 document where such words appeared you could do this
'''
df['flag'] = np.where(df['document'].str.contains('|'.join(list_of_words), na=False), 1, 0)
why not
df['flag'] = df["document_message"].str.contains('|'.join(list_of_words), na=False)
The one that comes to mind first. This is shorter and it works as well.
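A small self-contained sketch of the flagging idea, for anyone who wants to try it. The sample documents and word list are made up; `re.escape` and `case=False` are additions not in the messages above, included because names or phrases may contain regex characters or differ in capitalisation.

```py
import re
import pandas as pd

# hypothetical documents; the column name follows the messages above
df = pd.DataFrame({"document_message": ["Payment from John Smith", None, "call the police now"]})
list_of_words = ["Police", "John Smith"]

# escape each phrase so characters like "." or "(" are matched literally
pattern = "|".join(re.escape(w) for w in list_of_words)

# case=False ignores capitalisation, na=False treats missing text as "no match"
df["flag"] = df["document_message"].str.contains(pattern, case=False, na=False)
print(df)
```

This only finds exact phrases. Abbreviated sender names would still need some kind of similarity matching on top.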
!docs
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
can anyone give me ML docs
for what library?
Does someone here have experience with OpenMMLabs?
will take note this
👍
This might be a dumb question but can someone please explain me the use of 'passthrough' in this code and in general?
A bit. What's your question
Do you know how img_metas should be formed? I'm using another library and it sends the image to mmdet without that, which causes an error. I changed some lines in the libraries to get past it but now I have to manually declare it in the library's code
Planing to declare it here https://github.com/open-mmlab/mmdetection/blob/73b4e65a6a30435ef6a35f405e3474a4d9cfb234/mmdet/models/detectors/base.py#L112
Right now img_metas is NoneType
mmdet/models/detectors/base.py line 112
```py
def forward_test(self, imgs, img_metas, **kwargs):
```
Try paddle ocr, light but powerful
Python machine learning
What ML library? There are many of them @lapis sequoia
There are a lot of those. You have to know which library you want before you can look for the docs of it.
oh idk about ml lib let me check it out
It sounds like there's a misunderstanding here. What are you trying to do? Just learn about machine learning?
i want to learn machine learning
im confused
um these?
can u suggest which one should i use? irdk
Not sure this is the channel, but anyone done any quantum computing work?
We don't really have a channel for that. You can try a general help channel, but ask your actual question.
Sounds like you should read a machine learning book. All those libraries solve different problems. Learning individual libraries is not a viable way to learn ml
okay
@spare briar i remember you were into bandits https://eugeneyan.com/writing/bandits/
Yeah I was just wondering. Quantum has a lot of overlap with Data Science stuff in my head, was wondering if some people here did it.
Just because Quantum is 100% about state following a probability distribution.
Everywhere I go I see his(rex) face.

No don't leave
I need some help. I have created a handwritten data of 1000 png images and now I'm stuck with the part of pre-processing and feature engineering. I mean I am facing problems with how to import, load, and process the data for model training with scikit-learn.
someone help me in finding the syntax error in this?
How do you know you have syntax error here?
Whats your end goal? Image classification?
i run the code n i got the error
is there no error in the code?
What's the error message?
SYNTAX ERROR : invalid syntax in line 129 which is the 1st line of "Output(...)"
is this plotly?
from plotly website i see this should be the syntax?:
@app.callback(
Output(component_id='body-div', component_property='children'),
Input(component_id='show-secret', component_property='n_clicks')
)
yup
oh thanks a lot
Guys can you help me
Hey I am looking for some amazing people to colab on a project, anyone's interested.
I am a beginner in ML and wanted to begin with some good project if we could make up a team or someone could add to his project would love to help
Me too buddy. Add me to your team
That's cool, let's just wait for others to reply, then we'll form a team
@gray swallow y ( the label) should be a 1dimension array
How can I turn a P2PKH BTC address into a P2SH address?
do you want to use multi output ? https://scikit-learn.org/stable/modules/multiclass.html
This section of the user guide covers functionality related to multi-learning problems, including multiclass, multilabel, and multioutput classification and regression. The modules in this section ...
Thanks let me check it
but figure out first if you want to use multioutput or if single output is sufficient for you
I used list of tuples
what dataset are you using
Here is the dataset. I combined risk_score and default values as outputs . So I zipped them into a list of tuples and did the same for the X values
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
import numpy as np
from sympy import together
import matplotlib.pyplot as pyplot
import pickle
from matplotlib import style
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics
import data
dataset = pd.read_csv('financial_data.csv')
# now we start the Exploratory Data Analysis process
print(dataset.head())
# response = dataset["default", "risk_score"]
# response values
default_values = dataset["default"]
risk_score_values = dataset["risk_score"]
place_holders = np.zeros( len(risk_score_values))
print(place_holders)
# exit()
# Y_values is a combination of risk score and default values
# Y_values = list(zip(risk_score_values, default_values, place_holders, place_holders, place_holders, place_holders, place_holders, place_holders))
Y_values = list(zip(risk_score_values, default_values))
# X values defined here
age = dataset["age"]
pay_schedule = dataset["pay_schedule"]
home_owner = dataset["home_owner"]
income = dataset["income"]
months_employed = dataset["months_employed"]
years_employed = dataset["years_employed"]
has_debt = dataset["has_debt"]
amount_requested = dataset["amount_requested"]
# zipping X values together
X_values = list(zip(age, pay_schedule, home_owner, income, months_employed, years_employed, has_debt,amount_requested ))
# drop columns
dataset = dataset.drop(columns=["default","entry_id", "risk_score"])
X_train,X_test, y_train, y_test = train_test_split(dataset, Y_values, test_size=0.2, random_state=0)
clf = svm.SVC()
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)
acc = metrics.accuracy_score(y_test, y_predict)
print(acc)
That was the code
No bro that's not how svm works. It requires the label to be a 1d array, not a combination of several columns
choose one feature as your label and then make predictions
I didn't know about that. Thanks bro.
Here I can see risk_score is the label, therefore y = dataset["risk_score"]
Your code would work more nice if you use jupyter notebook or google colab
you can do the process step by step though jupyter notebook
Ok thanks bro
I'm creating a system to predict whether any applicant would default or not.
Will it work if I use default as my label here
yes definitely
How to solve this?
RuntimeError: CUDA out of memory. Tried to allocate 5.56 GiB (GPU 0; 11.17 GiB total capacity; 6.59 GiB already allocated; 3.88 GiB free; 6.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
All the text is styled in yellow, is there a way for me to seperate it?
Like make the unblurred text yellow and the other text green
This is essentially the code
```py
from bitcoinaddress import Wallet
info=input(f"{bcolors.RESET}Private key>{bcolors.YELLOW} ")
wallet = Wallet(info)
print(wallet)
```
does anyone here know how i could add a tensor of shape (3,4) to a specific part of a larger tensor for example a tensor with shape (40,10) and add it to the point of (20:3,3:4) of the large tensor
with tensorflow
it is a small part of a school assignment. the entire assignment is making a tetris ai and this is just for the tetris game part
pls ping me if you reply to me
what do you mean by the point (20:3,3:4)? you mean (20:23,3:7)?
you are trying to fit more stuff into your gpu than it can handle. use a smaller model, a smaller batch size, or, if these are images, a lower image size...
Hey guys, can someone give me some help with Gym-Retro? I didn't edit not even a single thing in the files, yet I still get this error.
This is (literally) the entire code(excluding imports):
env = retro.make(game="StreetFighterIISpecialChampionEdition-Genesis")
obs = env.reset()
env.render(close=True) # enables closing the window
env.close()
checkpoint = CheckpointCallback(save_freq=100000, save_path="D:/Python/Projects/Hakisa")
model = PPO2(policy="CnnPolicy", env=env, gamma=0.99, n_steps=64, learning_rate=3e-9, vf_coef=0.5, verbose=1)
start = time.time()
model.learn(total_timesteps=1000000, log_interval=100, reset_num_timesteps=True, callback=checkpoint)
end = time.time()
print("Duration: ", (end-start)/3600)
PS: Ping me if you can help.
hi, if noone here knows you could keep an eye on this thread post https://github.com/openai/large-scale-curiosity/issues/8 (the one from yesterday)
Oh yes, I've looked in retro_env.py and there's nothing unusual there. There's a self.em attribute correctly assigned, etc. There's only a garbage collector to ensure there'll be only a single emulator running at a time.
Unless that garbage collector is deleting the current emulator I'm running, but this shouldn't be happening, right?
this is my code
inputs = self.tokenizer([first_token] + name.tolist(),
max_length=self.max_length,
truncation=self.truncation,
padding=self.padding,
return_tensors='pt').to(device)
attention = inputs.attention_mask
outputs = self.model(**inputs)
len of the name = 121401
Hi Yall, I'm working on a project where we are trying to optimize some cooling patterns for a specific manufacturing process and the simulation result will be similar to this image and I also have the data with coordinates and temperatures at those certain points during that process. So I was wondering what sort of ML algorithms I would need to optimize the cooling patterns
no i mean the second one, i didn't think much when i said it, so (20:23,3:7)
all right. well the main issue is that tensorflow likes keeping things const
ik so is there a way to add it to it
one work around is to circumvent it entirely: cast your tensors to numpy nd arrays, do the math there, and then make a new tensorflow tensor and assign it to the original one
alternatively, you can make tensorflow modify the entries of the tensor, but i don't recall how to do that in any easy way
sure, that's also an option, but that's very inefficient
well it is only for very small tensors so memory isn't a issue
it would probably be like 10x faster than the numpy solution
probably, yeah
because the data doesn't have to be moved from gpu to cpu and then back to the gpu
not that that would be a solution that i would want because i dont know how it will pad the image
how about this
Scatter updates into an existing tensor according to indices.
Adds sparse updates to an existing tensor according to indices.
this works, for example:
import tensorflow as tf
import numpy as np
x = tf.constant( [[1,2,3],
[4,5,6],
[7,8,9]])
y = tf.constant( [[1,1],
[3,3]] )
indices = np.zeros((2,2,2), dtype=int) #2 x num rows in y x num cols in y
indices[0,:,:] = np.arange(0,2).reshape(-1,1) #row indices
indices[1,:,:] = np.arange(1,3).reshape(1,-1) #col indices
indices = indices.reshape(2,-1).T #reshaped into size #inds x 2
print(indices)
print(tf.tensor_scatter_nd_add(x, indices, tf.reshape(y, [4])))
pretty convoluted though, and does not add in-place. it makes a new tensor
Hellow everyone
I tried a lot of times to learn reinforcement learning, but I always find myself lost or I don't know what's the next step or what to learn.
I would like to put a plan for myself to learn so I don't find myself one day lost and without a task.
Can you guys help me with that, would appreciate it.
Like with resources, a link, a plan or anything.
Thank you 
what's your batch size?
you get this error when you train or make prediction?
i was thinking more something like https://www.tensorflow.org/api_docs/python/tf/pad
it seems to support a bit more things
and no numpy needed
although i dont know if i could make a solution that jit compiles well
no numpy was needed in what i did either. this thing also makes the tensor sparse, so the 0 entries don't waste more memory
what i did was just a makeshift cartesian product. there are other ways of doing it without numpy
and it was only to make the indices, not the actual tf tensors
oh to place the thing at the right place i think it will be just some simple math
probably the same math i put there 😛
numpy and tensorflow (probably) should have a meshgrid function that does the same thing
Broadcasts parameters for evaluation on an N-D grid.
there you go
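The meshgrid idea can be sketched in plain numpy for the (40,10) board and the (3,4) piece from the question. The names `big`, `small`, `row0` and `col0` are made up for illustration. The slice version is only there to check that the index pairs hit the right cells.

```py
import numpy as np

big = np.zeros((40, 10), dtype=int)      # e.g. the board
small = np.arange(12).reshape(3, 4) + 1  # the (3, 4) piece to add
row0, col0 = 20, 3                       # top-left corner, i.e. the slice [20:23, 3:7]

# meshgrid with indexing="ij" gives row/col index grids shaped like `small`
rows, cols = np.meshgrid(np.arange(row0, row0 + 3), np.arange(col0, col0 + 4), indexing="ij")
indices = np.stack([rows.ravel(), cols.ravel()], axis=1)  # shape (12, 2), one (row, col) per entry

# in plain numpy the same update is just an in-place add on a slice
expected = big.copy()
expected[row0:row0 + 3, col0:col0 + 4] += small

# scatter-add using the index pairs, mirroring the indices/updates layout used above
result = big.copy()
np.add.at(result, (indices[:, 0], indices[:, 1]), small.ravel())
print(np.array_equal(result, expected))
```

An `indices` array of this shape, together with the flattened piece, should be usable as the indices and updates arguments of tf.tensor_scatter_nd_add, like in the earlier example. Moving the piece only means changing row0 and col0.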
but this will make the index array in gpu, which i thought was kinda pointless
i dont think that a index array would be possible because each game will be slightly different
you need the index array both for padding and for the function i suggested
otherwise you don't know where in the larger tensor to add the smaller one
it is always at the same place at the start
sure, and then it will presumably change as time goes on. you have to track it some how
either in gpu or memory
well my solution for that was to make it 0.5 instead of 1
make what 0.5?
the value of the one that can move
idk what you mean by that. if it works for you, cool 😛
but in any case, you always need to know WHERE to add the smaller tensor, otherwise the operation is not well defined. neither in math nor in code
ik and i also know that i should try to avoid stupid mistakes like #data-science-and-ml message
Make prediction in model(**inputs)
Throw me error
is this the channel to talk about pandas in?
the animal? no. the package? yes
excellent
i have a query,
i had three dataframes that i believe i correctly processed so that their columns were identical and any that were not present in all three were removed
as you can see they were all reduced to 116
but i used the concatenate command on them, expecting them all to merge seamlessly and remain at 116 columns but i have experienced this unexpected result and i am not sure why
( i added one extra column)
the display comes out with 233 columns and i am not sure why
I don't remember how concat works but I think it might be because the column names aren't the same
but i think i used the column names to filter them out initially
what is the contents of q_set?
so basically all three data frames had a number of columns, i then reduced them down so that the only columns left in each were columns that were present in all three dfs
for df in df_array:
    for question in df.loc[0]:
        map[question] = map.get(question, 0) + 1
q_set = set()
for key, val in map.items():
    if val == 3:
        q_set.add(key)
print(len(q_set))
116
Are there null values in the columns of the final frame?
You might need to specify axis=1
i assume you mean in the newly created df?
one moment
Also I'd try if set(df1.columns) == set(df2.columns)
what will that do, sorry my brain is a bit fried atm from something else
creating two sets and comparing them?
Yes, just testing that the columns in each of the data frame are the same
Also I'd try adding axis=1 in your statement and see if that fixes it
Either your problem is that the columns aren't the same names or something else I don't know
this actually increases the columns from 233 to 351, but 351 is 3*117 so that makes some kind of sense i suppose
ill try the set
i guess the problem is here somewhere
Looks like it, try getting q_set by doing the intersections of those sets
Not sure if I'm remembering right, but I think it may only allow you to do 1 intersection at a time
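A minimal sketch of the intersection approach. It assumes the question names are the column labels of each frame (in the code above they seem to live in row 0 instead, so that would need adjusting), and the three tiny frames are made up. Python's set.intersection does accept several sets in one call.

```py
import pandas as pd

# hypothetical frames; the real survey data is not shown in the chat
df1 = pd.DataFrame({"q1": [1], "q2": [2], "q3": [3]})
df2 = pd.DataFrame({"q2": [4], "q3": [5], "q4": [6]})
df3 = pd.DataFrame({"q3": [7], "q1": [8], "q2": [9]})
df_array = [df1, df2, df3]

# set.intersection accepts any number of sets, so one call covers all frames
common = set.intersection(*(set(df.columns) for df in df_array))
common = sorted(common)  # fix a column order so every frame lines up

# keep only the shared columns, then stack the rows (axis=0 is the default)
combined = pd.concat([df[common] for df in df_array], ignore_index=True)
print(combined.shape)
```

If the result still has more columns than len(common), or is full of NaN, the column labels probably differ between frames in some small way (whitespace, capitalisation).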
Hi does anyone know how to overlay 2 images over one another in python i.e. the mask onto the original image using the numpy arrays?
What's the model size? And input size? Does these both fit into GPU memory?
you could, for instance, use matplotlib to plot one of the images first, then plot the other on top with transparency and a possibly different color. alternatively, you could use the images in gray-scale to define different color layers of an RGB(a) image
Looks like there's only 31 common columns. Is concat working now?
with transparency and different colour, how would I do this?
do you mean doing concat as per before or changing the formulae
instead of me coding it, you can check out this example on stackoverflow. i think it shows exactly what you want: https://stackoverflow.com/questions/31877353/overlay-an-image-segmentation-with-numpy-and-matplotlib
namely, the accepted response
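For the second option (building an RGB image from the arrays directly), a rough numpy-only sketch. The tiny image and mask, the colour, and the alpha value are made up; it assumes a grayscale image scaled to [0, 1] and a boolean mask of the same shape.

```py
import numpy as np

# hypothetical inputs: a grayscale image in [0, 1] and a boolean mask of the same shape
image = np.linspace(0.0, 1.0, 16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

alpha = 0.5                        # 0 = mask invisible, 1 = mask fully opaque
color = np.array([1.0, 0.0, 0.0])  # overlay colour (red)

# repeat the grayscale values across three channels to get an RGB image
rgb = np.stack([image, image, image], axis=-1)

# blend the overlay colour into the masked pixels only
overlay = rgb.copy()
overlay[mask] = (1 - alpha) * rgb[mask] + alpha * color
print(overlay.shape)
```

The resulting array can be passed to matplotlib's imshow like any other RGB image.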
I think axis=1 was wrong, try without
thanks
Make sure you update the frames with the function you wrote before to drop everything not in the set
So i have to switch rows and columns without using the transpose function. This code is somehow not working, and is giving errors because I use matrix[y] and matrix[x][y]. I dont get why, does someone know how to fix this code?
ah. sadly i've never used pandas :x hopefully someone else can help you out
thank you anyways!
hello, I might be able to help, but I won't look at screenshots of code. If you have a numpy array named arr, arr.T should be sufficient to transpose it most of the time. There are other options if you have more than two dimensions and you don't just want to reverse them.
for this function i'm not allowed to use df.T. I am trying to do it with a for loop but it's giving errors
why are you not allowed to use .T? is this part of an assignment?
yes it's an assignment
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
^ this is the only format I will accept. I will also not accept screenshots of error messages.
def transpose(matrix):
    transposed = pd.DataFrame()
    for y in range(len(matrix)):
        for x in range(len(matrix[y])):
            transposed[y][x] = matrix[x][y]
    return transposed
my impression is that, instead of a double loop, you'd wanna iterate through the rows or columns only. pick one, not the other. read the lists, and put them into another list of lists. then make a new dataframe from that. i don't know the exact functions to do that though
There are a few problems here:
- You would need to initialize the DataFrame with empty cells for the desired shape
- You need to also transpose the columns and indices of the original DataFrame and put those in the transposed DataFrame
- You index DataFrames with the .iloc accessor, which means "index location". See the next point:
- DataFrames are one data structure. They are not a list of lists. Expressions like transposed[y][x] should be transposed.iloc[x, y] because you are indexing the one data structure, not two.
Keep in mind that you would never do this in a real situation, and this assignment is almost completely pointless.
any "real" code that involves allocating empty space in a DataFrame and then putting stuff into it later is wrong.
oh, you also need to be using .iloc
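Putting those points together, one possible sketch that avoids .T and also avoids filling an empty DataFrame cell by cell. It reads values with .iloc into a list of lists and builds the result once; the example frame is made up, and this is only one way to satisfy the assignment.

```py
import pandas as pd

def transpose(matrix: pd.DataFrame) -> pd.DataFrame:
    """Swap rows and columns of a DataFrame without calling .T or .transpose()."""
    n_rows, n_cols = matrix.shape
    # each original column becomes one row of the result
    rows = [[matrix.iloc[r, c] for r in range(n_rows)] for c in range(n_cols)]
    # the old column labels become the index and the old index becomes the columns
    return pd.DataFrame(rows, index=matrix.columns, columns=matrix.index)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
print(transpose(df))
```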
okay, i guess that should help me for now, thank you! It's indeed a really annoying assignment since df.T would be so much easier and faster ;/
as long as you understand why you should never actually write code like this, I suppose it might be useful for helping you understand what all .T does.
Sorry to barge in but I was told I should ask here instead of the general question channel
Let's say I'm trying to predict a category (decided using arbitrary statistical cutoffs) using these same cutoffs with other additional statistics. How do I determine how well my regression predicts a category without arbitrarily assigning them beforehand?
Or is it just impossible?
I'm not sure I follow. do you mind making your question less abstract? what are the categories, and what are the "statistical cutoffs"?
I'm working on a project on categorizing poker player types. I use 3 statistics, VPIP, PFR, and Limping, to decide which player is in which category. If these 3 values are within certain ranges they are marked as that player type. Then I run an aggregate on each category over a larger set of statistics. Then I use both these aggregate statistics and the 3 statistics I used to sort category to predict player category
I'm trying to see if this makes sense or I'm doing something really wrong
so you are trying to decide what category a poker player belongs to. in data science we call this "classification". so you are trying to predict the classes of poker players.
Yes
well, I guess it might not be classification per se if the class is determined by properties of the player that are always known
anyway
it does make sense. you could take the statistics as a vector of numbers input into your classifier, and the output is the class the player belongs to (encoded in some way. probabilities, most likely)
nothing about it immediately jumps out at me as terrible
The idea is to first create broad categorizations of poker players to create strong counter strategies
and then try to predict which category a random player is given their stats (to choose the proper strategy)
the only thing is that depending on how the aggregates are computed, some might be redundant. not a huge problem if it's not that many
I'm trying to do this in python but I'm not sure how exactly I would structure this
you're essentially trying to pair up vectors of these statistics with labels. how you do that is a bit more freehand, though. for example, the labels could be encoded as a single number or as a vector. i also wouldn't know what the best architecture for this is. a support vector machine can split up the parameter space with hyperplanes, but you probably don't know a priori that this is the best way of doing it
it feels like it would just predict with 100% accuracy if I include the 3 main stats I used to categorize in the first place
if you generate the numbers yourself perfectly and without any noise, and this model perfectly describes what you are analyzing, yes, that would be the case (and also if the phenomenon can be measured without noise). this is why results that are entirely synthetic should be taken with a grain of salt if you don't put some work into noise statistics
so I need to add some randomization into how I categorize?
you'd probably wanna contaminate the measured numbers on purpose, or as you say, label some data incorrectly, or something like that. if you're making everything up and there is no noise or anything, and you know the model perfectly, there's hardly any need to use ML in the first place
you could use classical optimization methods on your known model to learn the parameters and that's all
there's a lot of noise in poker statistics
since it takes an obnoxious amount of hands to be certain about stats
that's not really noise, just missing info. the thing is, if you are making up a definition yourself in which you assign a label by hand to some set of vectors using something simple like inequalities, the problem is very easy and you don't need ML 😛
so id just use a regression?
if you only have 3 parameters, you wouldn't even need that
so you think I should go the opposite direction and see how the aggregate stats NOT included in the 3 parameters predicts class?
but for the sake of argument, let's say these 3 numbers are indeed noisy with unknown noise stats. and that maybe the labelling process is a bit more complicated. then it makes sense to use ML.
well, why would you not include the 3 params?
the question i'm asking you is, is the problem so easy that you don't really need ML? there's no point in hiding the parameters "just because"
wouldn't they perfectly predict class? since those are the only inputs
if there is no noise, yes
there is lots of noise
so my question goes kind of like this
let's say we have numbers a and b in a vector, [a,b]. if a > c_0 and b > c_1, then we assign class A
so on and so forth. this gives us 4 classes.
next we receive the numbers [a,b] from some scenario we've never seen before. do we know anything about a and b a priori? can we trust the numbers, or do they have mistakes?
these numbers have mistakes
and going a step further, do we know c_0 and c_1, or not?
we define c_0 and c_1
all right. then it does make sense. it's a probability estimation problem. how certain one is that the underlying a > c_0 given only a noisy a, and having learned noise statistics from previous examples
alright so how would I go about coding that?
should I use sample size to determine how noisy a/b are expected to be?
and then take into account that noise when assigning class?
so instead of saying this player is 100% class Y or class Z
we say there's a 40% chance of them being class Y and a 60% chance of them being class Z?
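One very rough way to turn "noisy stat plus sample size" into a probability, as a sketch. It treats a stat like VPIP as a proportion measured over n_hands independent hands and uses a normal approximation; the function name `prob_above` and all numbers are made up, and this is not an established poker model.

```py
import math

def prob_above(observed: float, cutoff: float, n_hands: int) -> float:
    """P(true rate > cutoff) under a normal approximation around the observed rate."""
    # standard error of a proportion measured over n_hands (assumes independent hands)
    se = math.sqrt(observed * (1 - observed) / n_hands)
    if se == 0:
        return float(observed > cutoff)
    z = (cutoff - observed) / se
    # 1 - Phi(z), with the normal CDF written via the error function
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# hypothetical numbers: observed VPIP of 28% against a cutoff of 25%
for n in (50, 500, 5000):
    print(n, round(prob_above(0.28, 0.25, n), 3))
```

The same observed value gives a probability close to a coin flip with few hands and close to certainty with many, which is the "level of certainty" being asked about.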
i don't think you have an appropriate model to say that though, unless you have good reason to make a probability distribution dependent on those 3 numbers
if most of the data is labelled correctly, it should be ok as is
poker standard deviation is generally agreed upon? although depending on playstyle it ranges
i can't comment on that, idk
but if it isn't needed I won't bother with it lol
so how do we decide when the model incorrectly labels a player?
vs correctly
yes there;s noise in the input stats but if the input stats are all we use to classsify wouldn't it necessarily be 100% accurate?
the only way is to have some ground truth you labeled by hand, then
since the labelled data using only thresholds can be incorrect
this feels like a chicken and egg problem
you can't verify it because you don't know the ground truth
this is more a clustering problem than a classification one, then
there is nothing to verify against
yeah actual player type is impossible to be certain about
we can only have a level of certainty over the stats we have
then maybe approach it as clustering instead
clustering would be letting the data categorize itself?
pretty much
though in fairness, even if the data has a few mistakes in the labelling, it should still work
what if it just says nothing though? that there's too much noise to be certain that any player type exists
that's a result in itself, then, isn't it?
yeah but that's pretty obviously incorrect
what i mean is that it would mean the data has no inner structure, and the categories were arbitrary from the get-go
if that's the case, there is no difference from simply doing a set of inequalities yourself, no network needed
what do you mean by this?
alright Ill do this as well I guess
don't worry about it. go ahead and try it out
label the data somehow and see if the network learns anything interesting
so first id
label player type based on those 3 stats
let machine algorithm try to create its own player types
see the correlation between the 2?
alright that's something I probably know how to do
in that case which clustering method should I use
I plan on having 5 player types
that's one thing. the other is to also treat it as a classifier problem and see what the network does. you'd expect it modifies the threshold values in some sense. idk how well it will work out with so few dimensions though.
how would I do that
Ill just do both tbh
that would be the usual thing of taking an image and saying it's a dog, take another image and say it's a cat
and feed them to the network
i would say to try both approaches, yeah
see if you find anything interesting
so id include the aggregate statistics into the machine learning model right
alongside the main 3 classifiers?
you can if you want, why not. you can also test if including them makes any difference, too
yeah tbh the one thing ive learned in my econometrics class is that data science is basically trying a bunch of random shit and seeing if it tells you anything
😂
especially with this type of black box ML you're dealing with here, where the model is based on... not much
it gets pretty artistic
there are other problems where the architecture and target functions have very good foundations, but i don't think this is one of them 😛 play around and see if anything interesting comes out
yeah I'm just trying to see if I can get something useful using hud stats
and potentially classify players on the fly
using limited/noisy information
causal stats is making a rise since its more explainable than DL

however you usually need an SME for such methods
what's an SME?
subject matter expert
for the domain you are working in
this is crucial for making sure you are looking at the right features and your assumptions are correct
right, that's why i was asking you so much annoying stuff about the stats
if you could incorporate that, it'd be better
how would I incorporate that though?
you could turn it into a parametric estimation problem where you update your priors on the distributions
use a SME to manually classify each player?
basically have ML-powered bayesian estimation
if you could assign a parametric family to how each of the numbers you feed the network are distributed, that'd be a good start
you mentioned variance earlier, but my guess is that that is precisely... a guess. that the numbers were assumed to be gaussian distributed
the variance will be immensely different
ah I use poker variance calculator
the idea is to find the parametric model that best explains the observed data
you can read into maximum likelihood and bayesian estimation later
for now i'd suggest to try what we discussed now, and see if it does what you wanted
wave your hands wildly while yelling central limit theorem
then the name of the game is hyperparameters
each of your 3 statistics is contaminated by noise, and so it has its own mean and variance for what you say is usually regarded a normal distribution
ah yep
the idea is to have the network estimate the mean somehow
and then to realize that both the mean and variance have their own... mean and variance
if the variance is low, you can trust the result (these are the so-called "error bars" and are related to p values, as some people call them)
then you can attach a certainty to your classification
you can do that with usual classification btw, but that does not consider the statistics of the measured parameters explicitly
i would still recommend you begin with what we discussed earlier because this other stuff can get hairy. if you have time constraints, gotta use that time wisely
yeah I have literally 24 hours
but this 3rd thing seems super interesting
how would I go about doing it? ill do it last if I have time
you would need to make a custom cost function based on the maximum likelihood estimation. since this is gaussian, funnily enough, it's the same as doing least squares. however, the knowledge that each parameter is gaussian distributed with its own statistics, instead of all of them sharing one fixed variance, means there is an extra term multiplied to it (added, after taking the log-likelihood) that depends on the statistics of each parameter
yeah I have zero idea how to do that lol
I've learned about maximum likelihood estimation but I thought it was for binary prediction
i would more call it a "philosophy" or solution approach. make a cost function that searches for the parameters that maximize the probability of observing the given data
it can be done in many different settings
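A minimal numpy sketch of the idea above, assuming each observation is Gaussian with its own known standard deviation. The names `gaussian_nll`, `mle_mean`, `y` and the sigma arrays are made up for illustration, and the numbers are placeholders. With equal sigmas the maximum-likelihood mean is the least-squares answer (the plain average); with unequal sigmas each squared error gets weighted by 1/sigma^2 and a log(sigma) term is added.

```python
import numpy as np

def gaussian_nll(mu, y, sigma):
    """Negative log-likelihood of observations y_i ~ N(mu, sigma_i^2)."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2))

def mle_mean(y, sigma):
    """Closed-form minimizer of gaussian_nll over mu: the inverse-variance weighted mean."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    return np.sum(w * y) / np.sum(w)

# placeholder noisy measurements of one statistic
y = np.array([0.20, 0.26, 0.31, 0.23])
equal_sigma = np.full_like(y, 0.05)
unequal_sigma = np.array([0.01, 0.05, 0.20, 0.05])

print(mle_mean(y, equal_sigma))    # same as y.mean(), i.e. least squares
print(mle_mean(y, unequal_sigma))  # pulled toward the low-noise measurements
```

In a network you would minimize the same kind of cost with the predicted mean (and possibly variance) coming from the model instead of a closed form.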
shit if only I had an extra week to work on this
ah well I can always do it as like a curiosity after the due date
and for labeling I should only have one column for it right?
rather than 5 different columns with binary 0/1?
either is ok. the difference is whether you use categorical or sparse categorical cross entropy
what does that mean
the cost function to minimize is different, but the problems are equivalent
ah if they can have multiple classes at once?
ah nvm
they legit are just the same
yep
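A small numpy check of why the two encodings are equivalent, written from the textbook definition of cross entropy rather than any framework's implementation. `probs`, `labels` and the two function names are hypothetical; the probabilities are placeholders for 3 players and 5 classes.

```python
import numpy as np

def categorical_ce(probs, onehot):
    """Cross entropy with one-hot labels (one 0/1 column per class)."""
    return -np.sum(onehot * np.log(probs), axis=1)

def sparse_categorical_ce(probs, labels):
    """Cross entropy with integer labels (a single column)."""
    return -np.log(probs[np.arange(len(labels)), labels])

# placeholder predicted probabilities, one row per player
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.20, 0.20, 0.20, 0.20, 0.20],
])
labels = np.array([0, 2, 4])   # single-column encoding
onehot = np.eye(5)[labels]     # five-column 0/1 encoding of the same labels

print(categorical_ce(probs, onehot))
print(sparse_categorical_ce(probs, labels))
```

Both print the same per-sample losses, so the single-column version only saves you from building the one-hot matrix.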
im lazy so id rather have 1 column only
Hi i'm trying to create an algorithm to bin my data but i'm not sure how to build it, like i could create a for loop and an if statement for each but that would be very long
Does anyone know if Python has any deconvolution algorithms for image preprocessing/processing and if so could you share the links to them? I've only found the wiener and richardson_lucy ones.
Heya!
How can I make it so that it just prints the response? (in this case, Hi, my friend, what can I do for you today?)
if you already have the values of t in a vector and have chosen T, you can compute all the values of tau. then, once you have the tau values, apply to all of them the procedure described in the text. in reality, splitting tau into bins follows the same formula that one uses to find the "phase" of the observations. having chosen M, you divide the period T by M, let's call this w = T/M. this is the width of each bin. then to find the bin index, you take tau/w and round down, i.e. floor(tau/w). if you have the values of t and/or tau in a vector, this requires no loops at all if using numpy arrays, for example.
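The loop-free recipe above can be sketched with numpy roughly like this. `phase_bins` and `bin_means` are hypothetical helper names, `t0` is an assumed reference time, and the sample values are placeholders.

```python
import numpy as np

def phase_bins(t, T, M, t0=0.0):
    """Fold times t on period T and return (tau, bin index in 0..M-1)."""
    t = np.asarray(t, dtype=float)
    tau = np.mod(t - t0, T)               # time since the start of the current cycle, in [0, T)
    w = T / M                             # width of each bin
    idx = np.floor(tau / w).astype(int)   # bin index = floor(tau / w)
    idx = np.minimum(idx, M - 1)          # guard against round-off when tau is almost T
    return tau, idx

def bin_means(values, idx, M):
    """Count and average the values that fall in each of the M bins."""
    counts = np.bincount(idx, minlength=M)
    sums = np.bincount(idx, weights=values, minlength=M)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts             # NaN for empty bins
    return counts, means

t = np.array([0.1, 0.6, 1.1, 1.6, 2.35])       # placeholder observation times
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # placeholder intensities
tau, idx = phase_bins(t, T=1.0, M=4)
counts, means = bin_means(values, idx, M=4)
print(idx)      # [0 2 0 2 1]
print(counts)   # [2 1 2 0]
```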
The response is a dictionary, so to refer to the part of the response you want to print you would refer to it as response.json()['cnt']
okay thank you
❤️ 

Hi so I have the values of tau (phases) in an array and I tried using pandas to bin it but i'm not sure it's working right
```
Tp=1/fr#time period
print(Tp)
phases = foldAt(t, Tp, T0=0)
plt.figure()
#data is the intensity
x7, y7 = zip(*sorted(zip(phases, data)))#ensures x and y values correspond to each others in pairs when sorted
plt.plot(x7, y7)
df4 = pd.DataFrame({'X' : x7, 'Y' : y7}) #we build a dataframe from the data
M=20#no of bins
bins=np.arange(0, max(phases), step=Tp/M)
categorical_object = pd.cut(x7, bins)
count=pd.value_counts(categorical_object)
grp = df4.groupby(by = categorical_object) #we group the data by the cut
ret = grp.aggregate(np.mean)
```
pd.cut looks like it should do what you ask it to, yeah
pd is what you imported pandas as, right? should it be count = x7.value_counts(...) ?
I am new to data science and AI and I am a self-taught learner.
What are the basic topics to learn?
you wanna check out stuff like numpy, pandas, tensorflow, pytorch, jax on the python side
and at least some multivariable calculus, linear algebra, statistics, and optimization on the math side
when i do that it says ``` count=x7.value_counts(categorical_object)
AttributeError: 'tuple' object has no attribute 'value_counts'```
its giving weird values like just 2's in a small amount of bins and zero in others
seems about right. it'll depend on your actual data though, you'd have to plot it and see
try df4.value_counts, then? i'm not sure what a good way to do this with pandas is, tbh. i'd just do it with numpy
yeah still gives an error
how would i do it with numpy
just do the math operations on the phases. or easier, use numpy histogram, since it seems that's what you're trying to make
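A rough sketch of the numpy histogram route, assuming the folded phases lie in [0, 1) so that M equal bins cover that range. `phases` and `data` here are random placeholders standing in for the real arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 1.0, size=200)   # placeholder for the folded phases
data = np.sin(2 * np.pi * phases)          # placeholder for the intensities

M = 20  # number of bins
# how many points land in each bin
counts, edges = np.histogram(phases, bins=M, range=(0.0, 1.0))
# sum of the intensities in each bin (weights are added up instead of counted)
sums, _ = np.histogram(phases, bins=M, range=(0.0, 1.0), weights=data)
# mean intensity per bin, NaN where a bin is empty
means = np.divide(sums, counts, out=np.full(M, np.nan), where=counts > 0)
centers = 0.5 * (edges[:-1] + edges[1:])   # x positions for plotting the binned curve
```

`np.histogram` does the "putting values into bins" step internally, so `centers` against `means` is already the binned curve.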
but i still would have to put the values into the bins right
to do what?
!Resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Regression, Classification, Dimensionality Reduction (LDA, PCA), Clustering. Those are the basic topics
You'd need to feed your tokenized text to your NN. So apply TF-IDF before feeding it to GRU. Or better still, use the tokenizer function in Keras directly.
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(['pass your input text here'])
If you want to pad the sequence (text to sequence), then you need to also import the pad_sequences function from Keras
Sorry, I omitted 's'. It should be fit_on_texts... It's been fixed now.
let me check
should I use density-based, hierarchical or partitional clustering
when I expect the different groups to have some overlap
What's your X?
vector
using tf idf vectorizing
i think this method is not working on tf idf
its working on word2vec
what's a good resource for how to best structure my feature set for a binary classification problem of predicting an event 2 months in the future? .. not sure how my features should be built . ... one column for rolling 12 month average for sales vs 12 columns, one for every month in the past 12 months. ... or a mixture of both. ... looking for some examples
Replace where you have X with a random text, say, "I love cats" to see how it works. The problem is from your 'X'
obviously, other measures than sales, but looking for structure
It ought to be a text or list of texts.
Text tokenization utility class.
What are Data Science projects that are good for a resume?
The one you'll definitely have lots of fun in doing. Personalised projects are the best
ayo what up anyone got any resources on cnn?
how do you determine cluster std for kmean
So in normalization we should subtract the dataset's mean from it, then divide by the standard deviation, and here's the code
x -= x.mean(axis=0)
x /= x.std(axis=0)
where std from the documentation is simply applying the standard deviation formula with subtracts included:
std = sqrt(mean(x)), where x = abs(a - a.mean())**2.
The question is why should I subtract with the mean before dividing by the standard deviation?
you mean why can't you do it in one line?
Nope
since the std formula already subtracts the mean
Why should I still subtract by the mean before dividing by the standard deviation?
Shouldn't I just do x /= x.std(axis=0) instead?
isn't x the value from the random sample?
x is the sample
yeah
so you subtract mean to figure out how much this specific value differs from the mean
then divide it by the standard deviation to see how extreme that deviation is
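A quick numpy check of the point above, on made-up numbers. The std formula only uses the mean internally to measure spread; it does not shift the data, so dividing by std alone rescales the values but leaves them off-center.

```python
import numpy as np

x = np.array([[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]])  # made-up samples, one column per feature

only_scaled = x / x.std(axis=0)                       # scaled, but not centered
standardized = (x - x.mean(axis=0)) / x.std(axis=0)   # centered and scaled

print(only_scaled.mean(axis=0))    # far from zero
print(standardized.mean(axis=0))   # approximately zero
print(standardized.std(axis=0))    # one in every column
```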
yeah it's very logical
Thanks a lot!
np lol
hello guys , hope you doing well . pls I need help !
I need how to retrieve the number of clusters for the kmeans with elbow curve
you pick how many clusters there are, yes?
yes
did you use sklearn to do it?
no, i used the "elbow curve" for that, but that obliges me to read the number off the curve
You'd have to get the number of clusters from the plot. And because of that, we'll only be able to know how many clusters are present in your data if you post the Elbow Plot.
alright , I thought that it was another way to return it , thanks
another qst pls , how to plot the fitness curve vs Kmeans ?
There's actually another method to find the number of cluster(s) but Elbow plot is kinda more popular. You can get the number of clusters using KElbowVisualizer. You could do something like this
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer
print('Elbow Method to determine the number of clusters to be formed:')
Elbow_ = KElbowVisualizer(KMeans(), k=10, timings=False)
Elbow_.fit(train_df)
Elbow_.show()
If you don't fancy this method, you can use Silhouette plot as well.
What I usually do is, first use Elbow Plot, then validate the number of clusters using KElbowVisualizer.
I'm not sure I understand this Ques. Care to elucidate?
it makes error : No module named 'yellowbrick' !
You need to pip install the yellowbrick package on your machine first.
So.. I used this dataset I don't remember from where of weather pictures (literal sunshine, cloud, etc.)
What I did was use the KNN algorithm to classify the weather type. I am getting near 79% accuracy
The "features" I used (don't know exactly how it works yet) was the mean of the R, G, and B values of every image's pixels.
So all Red values of the pixels summed and divided by the amount of pixels.
Does this make sense?
It is analyzing the color context of the image, so as much as it ain't capable of identifying "clouds" or "sunrays" it can identify the color scheme of a "sunshine colored image" per se, I think
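The feature described above (sum each channel over all pixels, divide by the pixel count) can be sketched in numpy like this. `mean_rgb` is a hypothetical helper and the 2x2 "image" is synthetic; a real image would be loaded into an (H, W, 3) array first.

```python
import numpy as np

def mean_rgb(image):
    """Average R, G and B over all pixels of an (H, W, 3) array."""
    image = np.asarray(image, dtype=float)
    return image.reshape(-1, 3).mean(axis=0)

# tiny synthetic 2x2 "sunny" image
image = np.array([[[255, 200, 0], [255, 220, 0]],
                  [[245, 210, 10], [255, 210, 30]]], dtype=np.uint8)

features = mean_rgb(image)
print(features)   # [252.5 210.   10. ]
```

Each image becomes a 3-number feature vector, which is what a KNN classifier would then compare between images.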
aaaah okay , thaaanks
I have no idea about KNN 😅
:p
any one here did a genetic algorithm for clustering ?
Since you didn't use a NN to do the image classification, what it means is, KNN used the pixels in the image as features. RGB with/without their specific location in each respective image for each class was used as a feature.
Yeah. By NN you mean Neural Networks right
Can anyone help, I have three dataframes that I want to perform difflib.get_close_matches on before merging them into one single dataframe but I am a bit stuck
@odd meteor can u help me also this time pls 🥺 I want to plot execution time vs number of clusters (execution time on the Y axis and number of clusters on the X axis) I don't know how
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
num_clusters = range(1, 11)
inertia = []
for i in num_clusters:
    model = KMeans(n_clusters = i, random_state=42, init='k-means++')
    model.fit(train_df)
    inertia.append(model.inertia_)
plt.figure(figsize=(10,6))
plt.plot(num_clusters, inertia, marker='o')  # x-axis: the k values tried in the loop
plt.title('Elbow Plot: Number of Clusters vs. Inertia')
plt.ylabel('Inertia')
plt.xlabel('Number of Clusters (K)')
plt.show()
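The loop above records inertia; for execution time vs number of clusters, the same loop can record how long each fit takes. A minimal sketch, assuming scikit-learn and matplotlib are installed. `train_df` here is random placeholder data and the output file name is made up.

```python
import time

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
train_df = rng.normal(size=(300, 2))   # placeholder; use your own data here

num_clusters = range(1, 11)
fit_times = []
for k in num_clusters:
    model = KMeans(n_clusters=k, random_state=42, init='k-means++', n_init=10)
    start = time.perf_counter()
    model.fit(train_df)
    fit_times.append(time.perf_counter() - start)   # seconds spent fitting with k clusters

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(list(num_clusters), fit_times, marker='o')
ax.set_title('Number of Clusters vs. Fit Time')
ax.set_xlabel('Number of Clusters (K)')
ax.set_ylabel('Fit time (seconds)')
fig.savefig('kmeans_fit_time.png')   # or plt.show() in an interactive session
```

If you already have the times and cluster counts in two lists, only the plotting part is needed.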
max_temp = weather_df.groupby(['Station.State'])[['Station.City', 'Data.Temperature.Max Temp']]
I want to fetch the only city with max temperature
Is it possible to scrape only those twitter profiles which have a certain profile pictures?
Suppose I want to scrape profiles with Bored Apes as their profile pic.
```
max_temp = weather_df[weather_df['temperature'] == weather_df['temperature'].max()]
```
first i want to group by state
then i want to find the city with max temperature in that state
So, you want to find the city with max temperature for each state?
try weather_df.groupby('Station.State')['temperature'].max()
how do you run dbscan machine learning with a ton of Nan values
I'm trying to use several columns in my prediction so naturally there's gonna be a lot of observations with at least 1 Nan
do I just restrict the model to columns where I expect Nan to be the lowest and drop em or is there a better way to deal with missing values
replacing everything with 0 will fuck up a lot of stuff
ill just restrict the model to columns where I expect Nan to be lowest and drop Nan for now but hopefully someone comes up with a better solution lol
what did I do wrong here lol
city with max temperature for each state
nvm figured out the graph
ah, maybe weather_df.loc[weather_df.groupby('Station.State')['temperature'].idxmax()]['city']
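A small pandas sketch of the idxmax approach, using the column names from the groupby earlier in the thread and made-up rows. `idxmax` gives the row label of the hottest reading in each state, and `.loc` pulls those full rows back out so the city comes along.

```python
import pandas as pd

# made-up rows; column names follow the ones used in the thread
weather_df = pd.DataFrame({
    'Station.State': ['TX', 'TX', 'CA', 'CA', 'CA'],
    'Station.City': ['Austin', 'Dallas', 'Fresno', 'San Diego', 'Eureka'],
    'Data.Temperature.Max Temp': [98, 101, 104, 85, 64],
})

# row label of the maximum temperature within each state
hottest_rows = weather_df.groupby('Station.State')['Data.Temperature.Max Temp'].idxmax()
# the full rows for those labels: one city per state
hottest = weather_df.loc[
    hottest_rows, ['Station.State', 'Station.City', 'Data.Temperature.Max Temp']
]
print(hottest)
```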
is anyone able to tell me how i would be able to keep the 0 index when i am filtering like this please?
i would like to keep all the questions intact
I cant understand the error
@safe moss
maybe add "or year of the answer == 'Year'" in the inequality?
you have to explicitly single out the first row that's the only way I think
That helped me thank you 😊
for some reason it does this i have no idea why but i agree with your premise
ah yes
i have tried OR and '|' but neither has worked
same result
wait maybe i need to reload the data
thank you that worked, with | rather than or
ah np
is there any way to update sklearn models to the newest version of sklearn?, so it works with the newer versions?
yeah you should use the -U flag (pip install -U scikit-learn)
i mean i have the model dumped already, im just trying to load it with pickle and gives error
i dumped it with, 0.19.0 and im now on 1.0.2
ModuleNotFoundError: No module named 'sklearn.feature_extraction.dict_vectorizer'
you tried uninstalling and reinstalling already?
i mean im on a separate work environment rn so ye i had to install everything from scratch
in what ive seen online it might just be misnamed
you checked the folder to make sure you're typing it right?
ye, thats not the issue. basically the issue is a sklearn version mismatch: for example, trying to deserialize with sklearn >= 0.22.X an object that was dumped with sklearn < 0.22.X, since sklearn introduced a change between those versions
yes
other than redoing the code the new way idk what fixes there are
pain, i dont have the dataset to train again
;-;
dont even remember how i found it
or maybe you can like manually install an old version of the packages
after uninstalling the new one
if you remember which version it was that is
ye it was 0.19.0
i remember that
but i need some dependencies that need higher versions of scikit-learn
so i cant even downgrade it
if this has any solution it's gonna be hella bullshit is all I know
ima google around more to see if there is a way
@wooden sail
you got any clue how to interpret this?
this is optics modeling of my poker data using 5 different variables normalized
very cool stackoverflow 👍
https://stackoverflow.com/questions/62283893/scikit-learn-upgrade-from-0-19-1
guess gotta find the training data again, rip, they should have maintained model persistence for older versions as well
;-;
they dont care lmao
can anyone tell me what the point of doing something like .fillna(0) to get rid of NaN is because as soon as I do something else to the dataframe the NaNs just come back?
LMAO, i hate my brain got an idea
ima just edit the pickle with the newer module names
hopefully that works ?
idk, thats the only hack which comes in my mind rn
anddddd it workeddddd
bruhhhhhhh
nice
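Instead of editing the pickle bytes by hand, the same renaming can be done while loading. A stdlib-only sketch: `RenamingUnpickler` and `loads_with_renames` are hypothetical names, the `_dict_vectorizer` target path is an assumption that should be checked against the installed scikit-learn, and `model.pkl` is a made-up file name. This only fixes the import path; scikit-learn does not promise that models pickled with one version behave correctly in another, so retraining is still the safe route when possible.

```python
import io
import pickle

class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects old module paths to new ones while loading."""

    def __init__(self, file, renames):
        super().__init__(file)
        self._renames = renames

    def find_class(self, module, name):
        # swap the module path if it is one of the renamed ones
        module = self._renames.get(module, module)
        return super().find_class(module, name)

def loads_with_renames(data, renames):
    """Load pickle bytes, applying the given {old_module: new_module} mapping."""
    return RenamingUnpickler(io.BytesIO(data), renames).load()

# Assumed mapping for the error in this thread; verify it for your version.
SKLEARN_RENAMES = {
    'sklearn.feature_extraction.dict_vectorizer':
        'sklearn.feature_extraction._dict_vectorizer',
}

# with open('model.pkl', 'rb') as f:   # hypothetical file name
#     model = loads_with_renames(f.read(), SKLEARN_RENAMES)
```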
You need to save it once you've done so. You can either use inplace = True or df = df.fillna(0)
thanks for the reply bro ill use that
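A tiny pandas illustration of the difference, on a made-up frame. `fillna` returns a new object unless the result is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0]})   # made-up frame with one NaN

df.fillna(0)                    # returns a new frame; df itself is unchanged
print(df['a'].isna().sum())     # 1

df = df.fillna(0)               # keep the result by assigning it back
print(df['a'].isna().sum())     # 0
```

Later operations such as merges or reindexing can still introduce fresh NaNs, so the fill may need to happen after those steps.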
is anyone able to assist with seaborn
Any idea how to convert a 1gb json file to csv?
!pypi json-stream You can use something like that to not have to hold it all in memory
also, don't know your situation, but it may be worth looking into parquet or some filetype with a splittable compression algorithm
I have converted it using to_csv how do I download it?
I'm trying to get an animation of filling in a 3d line (to visually represent a line integral)
Effectively this: https://pythonguides.com/matplotlib-fill_between/#Matplotlib_fill_between_animation
It's not working though
```
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.animation as mtA
import numpy as np
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
nsteps = 10000
t_init = 0.0
t_final = 20.0
t=np.linspace(t_init,t_final, nsteps)
rectangles=30
val = np.empty(rectangles + 1)
ts = np.empty(rectangles + 1)
tp = np.empty(rectangles + 1)
#parameterization of curve
def x_param ():
    #x=np.cos(t)
    x=t
    return x
def y_param ():
    #y = np.sqrt(t)
    y=t
    return y
def z_param ():
    z=t
    return z
def reinman ():
    pass
def main ():
    fig =plt.figure()
    ax=fig.add_subplot(projection='3d')
    #line, = ax.plot([], [], [])
    ax.plot(x_param(),y_param(),z_param(),color="orange")
    set01xtoline = [x_param(), y_param(), np.zeros(len(t))]
    set1 = [x_param(), y_param(), z_param()]
    zZ=np.zeros(len(t))
    def animate(i):
        x3=x_param()
        y3=y_param()
        z3=z_param()
        xA=x3[0:(i-1)]
        yA=y3[0:(i-1)]
        zA=z3[0:(i-1)]
        p = fill_between_3d(ax, xA,yA,zZ, xA,yA,zA, mode=1, c="C0")
        return p
    ani = mtA.FuncAnimation(fig,animate, frames=1000, interval=50000)
    #fill_between_3d(ax, *set01xtoline, *set1, mode=1, c="C0")
    plt.show()
if __name__ == '__main__':
    main()
```
Fill 3d function I'm using
```
def fill_between_3d(ax, x1, y1, z1, x2, y2, z2, mode=1, c='steelblue', alpha=0.4):
    if mode == 1:
        for i in range(len(x1) - 1):
            verts = [(x1[i], y1[i], z1[i]), (x1[i + 1], y1[i + 1], z1[i + 1])] + \
                    [(x2[i + 1], y2[i + 1], z2[i + 1]), (x2[i], y2[i], z2[i])]
            ax.add_collection3d(Poly3DCollection([verts],
                                                 alpha=alpha,
                                                 linewidths=0,
                                                 color=c))
    if mode == 2:
        verts = [(x1[i], y1[i], z1[i]) for i in range(len(x1))] + \
                [(x2[i], y2[i], z2[i]) for i in range(len(x2))]
        ax.add_collection3d(Poly3DCollection([verts], alpha=alpha, color=c))
```
Andrew has more than one course :) which one you are referring to?
hello guys , I want to plot time execution ( in y label ) vs clusters number ( in x label ) , how can I do it pls ? PS : I have the time execution and clusters number
Might be able to, what the question


