#data-science-and-ml
If the min and max of that column are very far apart, with lots of values in between, it's better to use StandardScaler.
for context, var a is a %, var b, is a number between 0-5000, var c is a value between 0-10000
ultimately joining the 3 to create a 'score' from 0-100
I haven't used them much beyond the basics, I'm also still learning. But I know that MinMaxScaler squeezes values together when the range is very large. For instance, 0.15 and 2.4 could end up almost indistinguishable after scaling, which might affect results.
Try both and see which gives better results.
Yea, you can wait for the other experienced guys to come give you more insights or Google stuff on your own. Happy coding!
*More.
TY! Google has led to a few, but it's always nice to ask you guys and get a bit of the expert opinion
The scaling shouldn't change in deployment with MinMaxScaler, because you should have fit it on the training data set and are only applying it to unseen data. I assume it will give values outside of 0 and 1 if your data goes outside the range of the training data, but I am not sure. Maybe it just caps it.
Do you guys know any websites that give away free data? I want to practice my data science skills
Can anyone guide me on how to set up the IPython kernel in VS Code? I'm having a really hard time doing it, even with the main website and after downloading conda
@somber torrent https://scikit-learn.org/stable/datasets/index.html
thanks bro
Hey all what's a way to convert a series of lists in a column of a groupby object to one giant list for each group?
ie.
a [1, 2]
a [2, 3]
b [1, 2]
b [2, 3]
into
a [1, 2, 2, 3]
b [1, 2, 2, 3]
@somber torrent try kaggle or data.world
@hollow gull by default, it can give values outside the range of [0, 1]
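To make that concrete, here is a minimal NumPy sketch of min-max scaling done by hand. It is not scikit-learn's own code, and the numbers and the `scale` helper are made up for illustration. The min and max are learned from the training data only, so unseen values beyond that range land outside [0, 1] unless you clip them yourself.

```python
import numpy as np

train = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # hypothetical training column
unseen = np.array([0.0, 25.0, 60.0])                # new data, partly out of range

# "Fit": remember the min and max seen during training
lo, hi = train.min(), train.max()

def scale(values):
    """Min-max scale using the training min/max (hypothetical helper)."""
    return (values - lo) / (hi - lo)

scaled = scale(unseen)                 # can fall below 0 or above 1
clipped = np.clip(scaled, 0.0, 1.0)    # explicit capping, if that is what you want
print(scaled, clipped)
```

Recent scikit-learn releases also expose a `clip` option on `MinMaxScaler`; check the docs for the version you have installed.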
@heady hatch write a custom aggregation function
and pass that into .agg
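A small sketch of that suggestion, assuming the data lives in two columns named `key` and `vals` (both names are made up here): the custom aggregation just flattens every list in the group.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "vals": [[1, 2], [2, 3], [1, 2], [2, 3]],
})

def flatten(lists):
    """Concatenate every list in the group into one list."""
    return [item for sub in lists for item in sub]

merged = df.groupby("key")["vals"].agg(flatten)
print(merged)
```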
Hey guys, i have a question around dataset and what actions to take. I have 3 variables that are not at all on the same scale. I need to normalize or standardize them so they result in something i can weight and then pull into a single score as a result.
@small tartan where do the weights come from
To clarify, i am not deploying this in a ML model. But a dashboard
I will manually adjust the weights to achieve my desired rack and stack
i have 50 records (with about 5 being added quarterly)
I've picked the top 10 and bottom 10 based on understanding the data and what it represents. I need to apply the weights to basically get those top and bottom 10 in the correct area and let the rest work within the scale
I'm just getting really caught up in the standardizing of the data so its on an equal playing field
Since its not exactly dealing with metrics that are easy to just add together, hence standardizing. I did standardize so 95% is within 1 Standard deviation. and the outputs are mostly between -1 and 1
since the first value is a percentage 🤷♂️
Well i can build that metric to not be a percentage. It just comes that way raw
but yeah the first value being a percent is nice since thats inherently a 0-100 already ha
I'm building a score for spaces where content is held. The content age, usage, and some data about it is what is driving the variables. I'll have this score updated monthly with the backwards rolling 180 days worth of info
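One possible way to get a 0-100 score for a dashboard like this, sketched under assumptions: the records, the weights, and the idea of scaling by the known bounds (0-100, 0-5000, 0-10000) rather than by the observed min and max are all illustrative, not something confirmed in the thread. Fixed bounds mean new quarterly rows do not shift everyone else's score.

```python
import pandas as pd

# Hypothetical records: a is a percentage, b is 0-5000, c is 0-10000
df = pd.DataFrame({
    "a": [10.0, 55.0, 90.0],
    "b": [500.0, 2500.0, 5000.0],
    "c": [0.0, 4000.0, 10000.0],
})

bounds = {"a": (0, 100), "b": (0, 5000), "c": (0, 10000)}
weights = {"a": 0.5, "b": 0.3, "c": 0.2}   # made-up weights that sum to 1

# Scale each variable to 0-1 using its known bounds
scaled = pd.DataFrame({
    col: (df[col] - lo) / (hi - lo) for col, (lo, hi) in bounds.items()
})

# Weighted sum of the scaled variables, stretched to 0-100
df["score"] = 100 * sum(weights[col] * scaled[col] for col in weights)
print(df)
```

Adjusting the weights by hand then only moves records relative to each other; the score itself stays inside 0-100 as long as the weights sum to 1.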
anyone have any luck using to_sql in pandas to load data into Snowflake when the dataframe has a datetime field? It keeps giving me this error: Failed processing pyformat-parameters; 255001: Binding data in type (timestamp) is not supported. and I haven't been able to find a solution online that doesn't involve manually converting each column that is a date, which isn't feasible for my use case
or is there any way to dynamically convert all datetime columns to strings in pandas without knowing every single column name that is a datetime?
you can always loop after getting df.dtypes
and convert to string every column whose data type is datetime
i cant even get the string conversion to work using astype(str) or unicode due to ascii-unicode errors
really wonder why pandas apparently had this working in 0.15 but then broke it in 0.24 🤷♂️
@deep spire .select_dtypes
into .apply
into .dt.strftime
I'm at beginning of my ML learning path, so it'd be nice of you if you help me clarify one question: is PCA the same as estimator?
@velvet thorn whats the proper way to call this and set those columns in the df? I'm trying df.select_dtypes(include='datetime64') = df.select_dtypes(include='datetime64').apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S')) which fails (as does apply(dt.strftime('%Y-%m-%d %H:%M:%S'))
But I think the error is with setting the columns to be the new column with converted type. getting SyntaxError: can't assign to function call
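A sketch of how the select_dtypes / apply / dt.strftime chain could be wired together; the DataFrame and its column names are made up. The key point is to assign back through the column labels, since a function call cannot be the target of an assignment.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "created": pd.to_datetime(["2020-01-02 03:04:05", "2020-06-07 08:09:10"]),
    "updated": pd.to_datetime(["2020-02-01", "2020-07-01"]),
})

# Pick up every datetime column without naming it
dt_cols = df.select_dtypes(include="datetime64").columns

# .dt.strftime is applied per column (each column arrives as a Series)
df[dt_cols] = df[dt_cols].apply(lambda col: col.dt.strftime("%Y-%m-%d %H:%M:%S"))
print(df.dtypes)
```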
hi :)
how to correctly install vaex lib
I mean, that i downloaded vaex by
pip install vaex, but when i want to start it in shell
shell throw ModuleNotFoundError: No module named 'vaex.remote'.
Then I tried to install just that one module...
ERROR: Could not find a version that satisfies the requirement vaex.remote (from versions: none) ERROR: No matching distribution found for vaex.remote
how to increase the spacing between each x axis element?
or how to set x-axis each element text on multiple line?
@lavish zinc you better rotate them 90°
plt.xticks(rotation=90)
Conversely, you can set huge width in plt.figure(figsize=(30, 5))
But second option is not recommended, of course
My validation accuracy and training accuracy print their own values. Am I meant to make them print together as an average of both?
hi 🙂
for any JupyterLab users who like a good dark theme, and maybe want something that looks a bit more modern than what JupyterLab offers, I just published a build of One Dark Pro
which can be installed in the Extension manager 😊
@somber torrent try out kaggle.com for loads of datasets. You'll also find ML and data analysis examples of those datasets.
Hi, someone knows how can I use TPU on google colab to compute a ANN please?
Hi people I made a paper for my school project on AI and ML
Can you please check it out and tell me if it’s good or not
I have a very good math background, and a lot of experience in swe, but not data sci. Are there any intermediate projects you guys could recommend? I want something that can allow me to get a feel for this field
@hoary sluice thank you kind stranger for your suggestion to use HDBSCAN it worked nicely!
Hi, does anyone have an idea about calculating the reading time of an article by considering the syllables of the words as well, apart from considering the number of words and words per minute? or does anyone know how the reading time and speaking time is calculated in Grammarly?
Hi guys, I have a question about NNs. I created a custom loss function which works on an ANN but doesn't when I put it in an RNN. Why?
def correlation(y_true, y_pred):
    corr = tfp.stats.correlation(y_true, y_pred, sample_axis=0, event_axis=None)
    return corr
Hi guys, any scrapers here? Just looking for opinions on Scrapy vs BS
@torpid cave what you mean by BS?
Hmm I've never used BeautifulSoup as a scraper but more as a parser and I've never used Scrapy as a parser but as a scraper.
Often I just use request + bs instead of Scrapy unless I need something heavy duty.
What are you scraping?
Ah thats BeautifulSoup
they are not comparable
BS is an HTML parser, Scrapy is a web crawler framework which includes an HTML parser too
using scrapy just to parse HTML does not make sense
you can use BS to parse the HTML pages crawled by Scrapy
hi how can i convert this in python
```r
set.seed(1)
x <- w <- rnorm(100)
for (t in 3:100) x[t] <- 0.666*x[t-1] - 0.333*x[t-2] + w[t]
layout(1:2)
plot(x, type="l")
acf(x)
```
it is R
What's x <- w <- rnorm(100)?
x=w=np.random.normal(100 values)
And what's x[t]? is that accessing x at index t?
if so
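import numpy as np

np.random.seed(1)  # reproducible, but not the same draws as R's set.seed(1)
w = np.random.normal(size=100)
x = w.copy()  # R copies on assignment; x = w would alias the same array
for t in range(2, 100):  # R's 3:100 is 1-based, so start at index 2
    x[t] = 0.666 * x[t-1] - 0.333 * x[t-2] + w[t]
# ... plotting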
thank you master @heady hatch
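The R snippet ends with `acf(x)`, which the translation leaves out. As a rough stand-in, here is a NumPy-only sketch of the sample autocorrelation; the `acf` helper and its `max_lag` parameter are made up for this example and this is not the R implementation.

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[:len(x) - lag]) / denom
        for lag in range(max_lag + 1)
    ])

# Same AR(2) simulation as in the question, with NumPy's own random draws
rng = np.random.default_rng(1)
w = rng.normal(size=100)
x = w.copy()
for t in range(2, 100):
    x[t] = 0.666 * x[t - 1] - 0.333 * x[t - 2] + w[t]

rho = acf(x)
print(rho[:5])
```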
Hello guys, I'm new to data science and python, however i have some experience with languages like C#, java or js. I have to do some tasks, is it a good place to ask some questions?
Right now i have to complete some programming tasks using pandas module
Sounds relevant to data science, shoot your questions.
okay so, i have a dataframe with columns (ID;Country;owns_car;gender;Age) and i have to create a new df that has columns Country, average goods, minimalAge and %ofWomen
so i don't know how to create a new df with the given columns and then populate the columns
i am a total pandas noob so maybe it is a simple task but i don't know the tool to achieve my goal 😄
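For a task shaped like this, a groupby with named aggregations is one way to build the new frame in a single step. The sketch below uses made-up rows, and since no "goods" column appears in the listed columns, `car_share` is only a stand-in for the "average goods" output.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Country": ["PL", "PL", "DE", "DE"],
    "owns_car": [True, False, True, True],
    "gender": ["F", "M", "F", "F"],
    "Age": [30, 22, 41, 35],
})

summary = (
    df.assign(is_woman=df["gender"].eq("F"))   # boolean helper column
      .groupby("Country")
      .agg(
          car_share=("owns_car", "mean"),      # stand-in for the unclear "average goods"
          minimalAge=("Age", "min"),
          pct_women=("is_woman", lambda s: 100 * s.mean()),
      )
      .reset_index()
)
print(summary)
```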
I used to use jupyter notebook with VScode. It was really slow and sometimes made really weird errors (not on me). Did anyone have the same problem? Does anyone recommend alternatives?
@waxen birch
We'll work on it one at a time, but I do recommend reading up on basic Pandas first then we'll break down the problem at hand.
@cerulean spindle
Hmm what do you mean by really slow? Comparing it to regular Jupyter instances?
@cerulean spindle i usually use a docker image to run jupyter notebooks and just pass the url to localhost
@heady hatch okay, do you know maybe some good source of basic pandas? Maybe some tutorial which is valuable? ;)
@waxen birch Here are some to get started.
https://github.com/ajcr/100-pandas-puzzles
They also have links on how to get started.
I recommend getting the basics of pandas down first because otherwise you have to think about data transformation and pandas syntax at the same time.
Unless you feel comfortable enough to dive right in, then show us your data and we can go straight in.
Ooo! This looks great! Thank you so much. I've read in one of the O'Reilly book that python community is really nice. I guess they were right :)
@glad mulch I don't know if this will work, but you can try df.T
hahah
Another way would be
df.index = df.columns
Though I'm unsure how that will go.
I guess you can do a temp or xor exchange.
df.columns, df.index = df.index, df.columns
I think you can just
pd.DataFrame or
pd.concat(list)
I might need more information.
What do you mean by same indexes and how do you want the dataframe to look?
Hey, how could I update a complete Pandas column following a condition? For example: update Sex: male, female, male, male, female... to Sex: 0, 1, 0, 0, 1...?
Yeah but df["Sex"].map(...)?
It did, thanks
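Spelled out, the `.map` approach could look like this; the 0/1 assignment follows the example in the question.

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "male", "male", "female"]})

# Values missing from the dict would become NaN, so list every category
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
print(df["Sex"].tolist())
```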
@glad mulch oh 6 of those have the same indexes?
Anybody else staring at some code not knowing where to start or is that just me?? (I'm still fairly new to python, but I spent some time away from it so I'm picking it back up and trying to learn ML 😓 )
@bronze barn no problem, I suggested HDBSCAN because I found myself with a similar problem... HDBSCAN improves the way density clustering works by building a hierarchical structure. It's also very good at finding outliers... and together with UMAP it's great
Trying to remove all columns that have a last row element of NaN.
started by trying to create a droplist but i'm running into problems
droplist = [col for col in df.columns if ((df.loc[df['date'] == today][col]).isna()) == True]
@cyan sun
You can try
df.iloc[-1].isna()
That should give you all the columns that have nan in the last row.
Hello, I was wondering if anyone could explain this piecewise fit function code using numpy: https://paste.pythondiscord.com/yisowirelu.properties as I am struggling to grasp the idea
@heady hatch thanks for the help but the drop method doesn't allow for boolean arrays - any suggestion on how to handle that?
@cyan sun You don't need to use drop. You can just filter it out using boolean indexing.
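A short sketch of that boolean-indexing idea, with a made-up frame: the mask built from the last row is indexed by column name, so it can be passed straight to `.loc` on the column axis.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02"],
    "x": [1.0, np.nan],
    "y": [np.nan, 2.0],
})

keep = df.iloc[-1].notna()      # boolean Series indexed by column name
filtered = df.loc[:, keep]      # keep only columns whose last row is not NaN
print(filtered.columns.tolist())
```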
Yo so i have a csv file with a list of multipolygons that are 'community areas' of chicago. I am trying to find the center coordinate of each polygon, how should i go about doing this?
Make a 10X4 dataframe with random numbers, you can use any names for columns names.
Use one easy built in function to show the basic statistics of all the columns such as count, mean, std and percentiles.
Transpose your dataframe.
Print the 3rd row and 5th and 6th columns from the transposed dataframe.
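The four steps above can be sketched roughly as follows. The column names and the seed are arbitrary, and "3rd row, 5th and 6th columns" is read as zero-based positions 2 and 4, 5.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

stats = df.describe()           # count, mean, std, min, percentiles, max per column
transposed = df.T               # now 4 rows x 10 columns

# 3rd row, 5th and 6th columns of the transposed frame
picked = transposed.iloc[2, [4, 5]]
print(stats)
print(picked)
```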
can someone help with this?
if anyone could take a look at help-copper aswell that would be great
@heady hatch got it working. Thanks again for your help 👍
Nice nice.
@shell berry What do you need help with in pt?
Hye everyone
hi, does anyone know how to plot eeg using csv files?
and willing to help me with a projects?
hi all, i have posted my issue in help-nickle but no one replied. I will summarize the essay i have written there in one sentence. any one expert in machine learning in python can give me a private tutoring to guide me in my project. I don't think i can learn everything in 10 days and submit my project.
@pliant kestrel honestly
your problem
isn't really that advanced
but you're asking for quite a big commitment.
i know man, i know, but i just feel totally alone in this shit that is driving me into despair
@heady hatch Are you familiar with pytorch lightning
Nope.
It looks super clean though.
@pliant kestrel you can ask specific questions here
and have a reasonably high chance of an answer.
If I have a tensor of input tensors and a tensor of outputs tensors, how exactly should I feed it into the model
but I think your problems run deeper than that
and well
maybe this isn't exactly the place
[inputs] + [labels]
or
[(input, label), (input, label)]
I know I have to feed it into a DataLoader
@velvet thorn if this is not the place, then where is the proper place? i tried asking on reddit, but nobody responded
i will try to see the codes that have been used in this course, and try to figure out as much as i can
like I said
this isn't really the place to find someone who is willing to commit to that long term
you might, but it's really unlikely.
especially for free
what if not for free
Interestingly @shell berry I might actually look into pytorch lightning. hahaha Thank you for this.
In terms of your question. Depends on your model.
@pliant kestrel then you need to take into account that you get what you pay for
Haha np, it is pretty clean
what do u think the prices might be
```py
class MultiLayerPerceptron(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout=False, dropout_p=0.1):
        super(MultiLayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size, bias=True)
        self.fc2 = nn.Linear(hidden_size, output_size, bias=True)
        self.add_dropout = dropout
        self.dropout = nn.Dropout(dropout_p)
```
Super basic
i have been laid off since july, so I can barely offer much
@pliant kestrel If you're using sklearn I can help you for free tomorrow.
@shell berry So I see you have your layers set up.
For PT, you can either define a full on loop or a
def forward(self, x):
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    return x
Yes, thank you
And then when you're calling the model in your loop
I have the model done and working with basic pytorch
Im just using lightning now and trying to wrap my head around datamodule and dataloader haha
DataLoader takes in one input corresponding to data, so I assume it's a list of tuples of inputs and labels?
So I think you use a dataloader with a dataset.
ie make your dataset subclass, then create a dataloader based on that dataset.
Yup, so I created my dataset with another class
Now I just want to put it into a dataloader
But my subclass just returns a list of inputs and outputs
should I zip them into tuples and then into the dataloader?
I want to say yes, but if you don't mind let me read it up real quick.
Been working with tf largely recently.
Sure, thank you very much
Ahh okay. Yea you bundle the two up together.
And when you're loading it, you would unpack it.
@shell berry kind sir, what is in sklearn that you are willing to explain
So let's say you have your dataset.
iterating through your dataset object from the dataloader
for idx, data in enumerate(dataloader):
    image, label = data[0], data[1]
    ...
the file is in csv file
I can help reading it in, cleaning it, augmenting/tokenizing/etc., then putting it into a format sklearn can read, then using a SVC or decision tree or whatever you need on it
Also by the way @heady hatch
For my labels
Does it matter if I'm using a label indexer or a MultiLabelBinarizer
Like let's say I have 5 labels
I can represent them as [0], [1], etc
or [1 0 0 0 0], [0 1 0 0 0]
I'm doing multilabel classification so I'm using the latter, but I see some people use the former
Just message me whenever tomorrow and Ill let you know, but please don't hedge your entire project on me helping, I don't want you to get screwed if I'm busy or something
I have no idea what either of those are. hahaha
In terms of deep learning, you would end up using different losses.
If you have 5 labels, and depending on what the output is like you would either use sparse categorical crossentropy or categorical crossentropy.
oh lol my bad
Ive been doing alot of classical machine learning with sklearn before this so Im thinking of everything like that lol
No worries, I had a similar transition too.
no man, i just want the push to start, since i am lost and don't know where to go
What does your data look like @pliant kestrel
the professor is supposed to post the project today, but he did not yet for some reason
it is due in two weeks
Yeah u havent even looked at it man
it is supposed to be posted already
atleast look at it before you give up 😛
since the begining, we had miniprojects, a guy in our group was doing the coding while the rest did the exercises in excel for better understanding of the concepts
the final project is individual work. i haven't looked at any code for ml, since the code our professor gave us was so old that the confusion matrix wasn't working, due to an update in panda_ml that conflicts with something else. since our beloved instructor did not update it, and me being a total noob in python, i gave up on learning
@shell berry in short, not really.
but
okay wait do you understand the difference between multi-class and multi-label?
Yup
@pliant kestrel So you're a complete python noob?
I recommend learning python before you start doing ML then
Also don't let one guy in the group do all the coding
but now this individual project came like a 12 inch stick in the .. and now i need to do the following, 1-import the data ( i believe it is cleaned, since we are given training and test data) 2- do classification ( various classifiers) 3- using the confusion matrix, 4- write a report about each observation
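For step 3 of that list, a confusion matrix can be tabulated with pandas alone once you have predictions. This is only a sketch with made-up labels; the classifier that would produce `y_pred` is not shown.

```python
import pandas as pd

# Hypothetical true labels and classifier predictions
y_true = pd.Series(["cat", "dog", "dog", "cat", "dog"], name="actual")
y_pred = pd.Series(["cat", "dog", "cat", "cat", "dog"], name="predicted")

confusion = pd.crosstab(y_true, y_pred)      # rows: actual, columns: predicted
accuracy = (y_true == y_pred).mean()         # share of correct predictions
print(confusion)
print(accuracy)
```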
well, i don't know any more man
How much python do you know?
Because all the stuff you told me can be done in like 50 lines lol
i know, that is why i am a bit confused in this discussion 😄
Yup
@shell berry yeah so the latter representation can handle multi-label classification
since you can have multiple 1s
@glad mulch how much of a pain was it
[5, 16, 20] for each label
@shell berry then you'd have a variable-length target
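To illustrate the two representations being compared, here is a NumPy sketch that turns variable-length lists of label indices into fixed-width multi-hot rows. The sample data and the five-label setup are made up.

```python
import numpy as np

num_labels = 5
# Variable-length label lists, one per example (made-up data)
samples = [[0], [1, 3], [0, 2, 4]]

multi_hot = np.zeros((len(samples), num_labels), dtype=int)
for row, labels in enumerate(samples):
    multi_hot[row, labels] = 1      # set a 1 for every label the example has
print(multi_hot)
```

Every row has the same width regardless of how many labels the example carries, which is what makes this form convenient as a fixed-size target.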
yw
@pliant kestrel you're getting ahead of yourself
no wait man, i am not, am i?
if you don't know programming then don't worry about the data yet
tell me whatever u want
When did you start learning python?
yesterday
yikes
i had some circumstances this semester
being laid off and such
it was not funny
you get destroyed when u are alone
Yeah man Im not judging you or anything, I said that with kind intentions
I just dont know how you can do ML in python if you dont even know python
But your project seems pretty basic, you could look up tutorials and put together the pieces
that is what i am trying to do at the moment.
You have two weeks?
that is a ray of hope
What is your masters degree in?
well, i have another project in regression analysis that i am trying to solve also using JMP
engineering management
the course i am taking is called : data mining
They didn't have prereqs for that course?
statistical learning in machine learning or close enough
I have never seen a course like this without programming courses as a prerequisite
the prerequisite was that u should have taken a programming course in ur undergrad
i did, that was 9 years ago in Java
so i am a bit flexed on some concepts, but that is that, i have never dealt with python past this point
should i neglect the course i am taking in datacamp?
So for handwritten dataset. Is it true that sprite sheet is more common than csv?
Practice cant hurt
That is up to you
ok i see, but can i continue to ask more questions on what to do in the future?
Sure
sorry man i laughed a bit
are you sure you didnt open ur data incorrectly cause wtf
ok guys, see you, going to watch some lectures. Thanks kali, gm, light
good luck, sorry to hear about ur lay off
thanks, it is ok, hopefully things will be resolved soon
I'm in grad school for NLP 😛
how about you
oh nice
you look like a finance student with your suit 😛
pct?
Can't you just parse through and check for what you need at each row
Sorry, I know literally nothing about sql 😛
Parse through the dataframe and read each row
each row will have the values of each column
so itll be like
[date, ticker, price], [date, ticker, price]
How can I find the Geolocation and Geocoding limits for API usage? I have a dataset of approximately 300,000 entries it needs to run on. Will I be able to run it on all of them?
Google APIs
How much is 100QPS?
if i wanted to do pct changes in price for each ticker based on the date, how would i do that
@glad mulch groupby
then diff
oh, no, not diff
pct_change
yeah
so what's wrong with groupby
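A sketch of the groupby + pct_change suggestion, assuming columns named `date`, `ticker` and `price` as in the row layout mentioned earlier; the prices are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]),
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "price": [100.0, 110.0, 50.0, 45.0],
})

# Sort so that each ticker's rows are in date order before differencing
df = df.sort_values(["ticker", "date"])
df["pct"] = df.groupby("ticker")["price"].pct_change()
print(df)
```

The first row of each ticker comes out as NaN because there is no earlier price to compare against.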
In a saved keras model (via model.save()), there are two keys in the .h5 file: model_weights and optimizer_weights. Am I gonna ever have to use optimizer_weights If I'm never gonna continue the training on the model? I'm willing to use it for prediction only
Getting a loss of almost 0 after only 200 epochs.. Is this fishy? Something is wrong, right?
For a multilabel dataset with 3k examples and ~40 labels
hello
Can anyone help me on how to load multiple models from checkpoints in TensorFlow 2.1?
Have two checkpoint directories, and I need to load in a model for each.
🙂
@shell berry do you have a test dataset or validation dataset? its likely that your model has overfitted/overfit (unsure of correct grammar here)
if you can run a test with your model using the test or validation dataset and see what the loss there is then you could potentially get an answer
if your testing loss is super high and ur test accuracy is low then your model has overfit
if your model has overfit, well it depends on the model and what you want to actually do because there's various ways to counter overfitting but if it's a CNN for example you can add dropout layers
Hello, so i simply want to calculate the percentage of something, yet i keep getting a divide by zero error, when i'm not dividing by 0
that equals 0
you wouldn't get the error otherwise
oh no actually a = 0, you don't have any parentheses so I'm assuming it's doing ((number/a) + sa + nad + d+ sd)
oh no...i removed the brackets when i moved it to a function, thank you
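A tiny sketch of the precedence problem described above. The variable names follow the message; the counts are made up, and guarding against an empty total is an extra assumption.

```python
# Made-up survey counts; a is zero, as in the failing case
number, a, sa, nad, d, sd = 3, 0, 3, 2, 1, 4

total = a + sa + nad + d + sd

# number / a + sa + nad + d + sd would divide by a alone and fail when a == 0;
# dividing by the parenthesised total is what was intended
pct = 100 * number / total if total else 0.0
print(pct)
```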
i am saving an image at path_resources folder and then i am deleting it os.remove(path_resources/"im.png") this way
i am getting error at test_img = cv2.imread(path_resources/"im.png") this line
```
Traceback (most recent call last):
  File "E:\demo3\modules\recDoc1.py", line 211, in post
    test_img = cv2.imread(path_resources/"im.png")
SystemError: <built-in function imread> returned NULL without setting an error
```
CAN I use Conda with the PyCharm Community Edition?
hey, can anyone help me with my scatterplot, im trying to change the values of the x axis as my graph is coming out like this https://gyazo.com/68542f25c58330b3c0db3fed929eb9e4
what is the module for calculus?
@agile pollen scipy's got quite a bit for it, as well as sympy
I am trying to apply a function to all columns in a df without typing out the column names. Is there a faster way to do this? I've looked to see if I could use the column numbers, but this hasn't worked.
What function?
.astype(int)
I've been calling .astype(int) on individual columns, but I need to do it to all 16 in my df. The column titles are long, and I want to save time.
The columns are objects.
import pandas as pd
d = {'A':['1','2','3','4','5'], "B":['2','3','4','5','6'], 'C':['3','4','5','6','7']}
df = pd.DataFrame(d)
df = df.astype(int)
type(df.loc[1,'A'])
Just tried that and it worked
by numbers I meant, whatever they are, at the bottom of their hearts they are numbers
Other way is creating a column index and then changing the type to that column index
col_index = df.columns[0:2]
df[col_index] = df[col_index].astype(int)
Then you can use slicers
To select your columns
Or use a loop
for col in df.columns:
    df[col] = df[col].astype(int)
Last one I don't approve as it is not vectorized
There are at least 3 other ways I can think of doing this, let me know if what I did earlier works or if I misunderstood your problem
Nww
col_index = df.columns[0:2] df[col_index] = df[col_index].astype(int) Worked like a charm. Thanks again.
hi, how can i convert this code to python? it's matlab
```matlab
%Simulate AR(3)
T = 1000; %Set how many observations you need
y = ones(T,1); %Create a vector of dim Tx1 to store the simulations in
y(1) = 1; %Set the first obs. to 1
y(2) = 0.5; %Set the second obs. to 0.5
y(3) = 1.5; %Set the third obs. to 1.5
rho1 = 0.2; %Set the value of rho1 (coefficient on y(t-1))
rho2 = 0.2; %Set the value of rho2 (coefficient on y(t-2))
rho3 = 0.1; %Set the value of rho3 (coefficient on y(t-3))
sigma = 1; %Set the value of the s.d. of the error term
mu_e = 0; %Set the value of the mean of the error term
eps = normrnd(mu_e, sigma, T, 1); %Create a vector of normal random numbers with mean mu_e and s.d. sigma. Dimension is Tx1
for t=4:1000; %Start the loop running from obs. 4 to 1000
y(t) = rho1*y(t-1) + rho2*y(t-2) + rho3*y(t-3) + eps(t); %The AR(3) model
end```
Well up to mu_e is exactly the same
After that I think it is better if you explain what you want to do
Nevermind, I see what you are trying to do
It sucks that I can do it in R but not Python
I think you could either work out the equation to get Y
And loop y
it would be easier to ignore y1, y2, y3.. and just do an AR(3) simulation
import statsmodels.api as sm
import numpy as np
arparams = np.array([0.2, 0.2, 0.1])
ar = np.r_[1, -arparams]
arma_process =sm.tsa.ArmaProcess(ar=ar, nobs=1000)
Something along those lines
arparams is nothing like the example i posted. you cannot put the 3 y values in the same array
i already tried it. i did like you. i took it from forums
cross_val_score of sklearn returning list of nan values...any help guys?
I mean you are simulating an AR process
@torpid cave any help?
I am thinking but it is 230 am here
I havent done much calculus in python tbh
I guess I would try to get Y on one side and roll the equations
Ok got it I think
I am using my tablet so I cant test this but I think the idea is quite clear
import numpy as np

rho_1, rho_2, rho_3 = 0.2, 0.2, 0.1
mu, sigma = 0, 1
T = 1000
error = np.random.normal(mu, sigma, T)
y = np.ones(T)
y[0], y[1], y[2] = 1, 0.5, 1.5  # first three observations, as in the MATLAB code
for t in range(3, T):  # MATLAB's 4:1000 is 1-based, so start at index 3
    # AR(3): each value depends on the three previous ones plus noise
    y[t] = rho_1 * y[t-1] + rho_2 * y[t-2] + rho_3 * y[t-3] + error[t]
Damn
I forgot index starts at 0
Just fix that and you should be good I guess
*fixed
In Poisson regression, what does "deviance" mean?
https://gyazo.com/c5a698d166fe7a44e867e23b12cad4e3
@light warren
I think your data has infs or nans in them.
Check your dataframe.
df.info()
yeah it does, do u know how i can make it skip those data rows?
You can drop the na via df.dropna().
would i just add that to the above code?
You're going to need to reassign it.
df = df.dropna()
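Since the data may contain infs as well as NaNs, note that `dropna()` on its own leaves infinities in place. A small sketch with made-up values that handles both:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.inf, 3.0, np.nan], "y": [1.0, 2.0, -np.inf, 4.0]})

# Turn +/-inf into NaN first, then drop every row that has a NaN
clean = df.replace([np.inf, -np.inf], np.nan).dropna()
print(clean)
```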
Ok so I have a pandas dataframe i need to split into quintiles as I need to get the average of the top/bottom 20% of the rows in it by a given key (INDEX, an integer that's a calculated score)
having difficulty finding the function i need in the docs
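`pd.qcut` is probably the function being looked for here. A sketch with made-up data, assuming the score column is called `INDEX` as described and that the thing being averaged is a column named `value`:

```python
import pandas as pd

df = pd.DataFrame({"INDEX": range(1, 21), "value": range(100, 120)})

# Rank first so ties cannot produce duplicate bin edges, then cut into 5 equal groups
df["quintile"] = pd.qcut(df["INDEX"].rank(method="first"), 5, labels=False)

bottom_mean = df.loc[df["quintile"] == 0, "value"].mean()   # lowest 20%
top_mean = df.loc[df["quintile"] == 4, "value"].mean()      # highest 20%
print(bottom_mean, top_mean)
```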
Is there a way to do the following transformation without that for loop? I am spliting rows and changing column names
def df_split_rows(df: pd.DataFrame):
    raw_df = {'Attacker': [], 'Defender': [], 'AttackerAdvantage': [], 'Damage': []}
    for _, row in df.iterrows():
        raw_df['Attacker'].append(row['Player1'])
        raw_df['Defender'].append(row['Player2'])
        raw_df['AttackerAdvantage'].append(1)
        raw_df['Damage'].append(row['Player1_score'])
        raw_df['Attacker'].append(row['Player2'])
        raw_df['Defender'].append(row['Player1'])
        raw_df['AttackerAdvantage'].append(0)
        raw_df['Damage'].append(row['Player2_score'])
    return pd.DataFrame(raw_df)
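One loop-free way to get the same output is to build the two "views" of each row as separate frames and concatenate them. This is a sketch under the assumption that the input index is unique; the function name and sample data are made up.

```python
import pandas as pd

def df_split_rows_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Loop-free variant of df_split_rows (hypothetical name)."""
    first = pd.DataFrame({
        "Attacker": df["Player1"],
        "Defender": df["Player2"],
        "AttackerAdvantage": 1,
        "Damage": df["Player1_score"],
    })
    second = pd.DataFrame({
        "Attacker": df["Player2"],
        "Defender": df["Player1"],
        "AttackerAdvantage": 0,
        "Damage": df["Player2_score"],
    })
    # A stable sort on the original index interleaves the two halves row by row
    return (
        pd.concat([first, second])
          .sort_index(kind="mergesort")
          .reset_index(drop=True)
    )

games = pd.DataFrame({
    "Player1": ["ann", "bob"],
    "Player2": ["cat", "dan"],
    "Player1_score": [3, 5],
    "Player2_score": [1, 2],
})
result = df_split_rows_vectorized(games)
print(result)
```

If row order does not matter, the sort can be dropped and `pd.concat(..., ignore_index=True)` is enough.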
Hey guys, does anyone know of a way to dynamically create, from a list of numerical data, a dataframe with 2 columns (Bins, Frequency), AKA a frequency table, with the only arguments being the list with the data and the number of desired bins?
The bins should be of equal size. For ease here is a random list:
lst = [111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139]
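A possible sketch for this, using `np.histogram` for the equal-width bins; the `frequency_table` helper and the label format are made up for illustration.

```python
import numpy as np
import pandas as pd

def frequency_table(data, n_bins):
    """Equal-width frequency table (hypothetical helper)."""
    counts, edges = np.histogram(data, bins=n_bins)
    # One text label per bin, built from consecutive edges
    labels = [f"{lo:g} - {hi:g}" for lo, hi in zip(edges[:-1], edges[1:])]
    return pd.DataFrame({"Bins": labels, "Frequency": counts})

lst = list(range(111, 140))
table = frequency_table(lst, 5)
print(table)
```

`np.histogram` treats every bin as half-open except the last, which also includes the maximum, so each value is counted exactly once.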
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
@heady hatch thanks sooo much
@rich silo range(111, 140)?
can anyone give me some help with running this code im on help-aluminium?
https://gyazo.com/a5c28edcbd41640f4f3e8c003247b0e3 what does this error mean?
Can someone point me out if I am wrong here. I am making an image recognition application (for verifying signatures).
I am currently looking at Tensorflow to get the work done but there are just so many libraries such as OpenCV and LSH.
I was hoping someone can point out what I should implement in my code. PS- I have to make a check for mirrored images as well
I just joined the industry so go easy on me. Cheers!
I have faced this problem before, solution is simple: update matplotlib
hi, im struggling to start this data science question that i have been given. i tried doing it on my own, though im getting more confused
this is my guide line
@deep ingot What part are you struggling with?
@neon shard i am new to data science, basically i started reading my notes and now application is calling and i dont know where to start. i have knowledge on the topic but dont know how to apply it
Specifically, what in the above document are you stuck on?
what i have done
i just like to get a whole example of how i can do this question so i can do it by myself again if that makes sense
I don't think anyone is just going to do your whole homework assignment for you. You need to break it down into chunks, try it, and then ask for help on parts that you're stuck on
It looks like you need to generate the student number with range() or a loop. Do you know how to do that?
That will require the user to input 150 numbers. I think it's asking you to just generate the IDs
So, for each loop you can just automatically generate an ID
You could do this if the IDs can be 1-150
ids = []
for x in range(1, 151):
    ids.append(x)
Or this which does the same thing in a cleaner way. It's a list comprehension
ids = [x for x in range(1, 151)]
i perfer the 1st way u did it cause i have done that prev
Anyone know if you can use NumPy to solve algebra problems?
Hello does anyone know how I can get this code to run struggling at the minute?
You can run it on any real IDE
sorry my coding language isn the best fairly new to the game lol what do you mean
do u mean a debugger
Integrated Development Environment. A place to write and run code
Like pycharm
If you really wanna do it in the web you can use google colab or kaggle notebooks
Or the best thing to use when using matplotlib and numpy is Jupyter notebook
ahhh he sent me the code just on an email and I copied it
Ok maybe in the terminal you can try doing pip install matplotlib
Or !pip install matplotlib
Not really sure about this one
Just some advice:
Do yourself a favour and start running code on an IDE rather than using these web based editors
could it be I am using the wrong code for matplotlib?
It doesn’t look like it is
def plot_it(x,y,p): #(uncomment to start working on this function - optional)
plot_it(x,y,p)
are those plot commands needed
No these are just functions and the error is with matplotlib before anything else
Python runs from top to bottom, and when it encounters an error it stops and displays only that one; it doesn’t show any later errors
yeah I noticed that first error it sees it just tells you that one when you could have 5 more
What do you think is the best way round this problem?
Honestly, just use google colab instead
Much better
Before you copy and paste the code
Write !pip install matplotlib in the first code cell
and then just copy the rest
Yeah try that
right I will try that and get back to you,thanks @blazing bridge ,
Ok gl
That’s because the code has to be separate from the installation
With cells you only get one output
Break the code up to different outputs
How do I do that lol ?
How is the cost function minimized in a neural network?
hahha my code skills aint the best, got this off a mate, I just need to know which variables to change so that I get an output
I have a dataframe like this:
2 DNA False
3 DNA False
4 DNA False
... ... ...
8790 nonDRNA False
I need to get the percentage of rows where the boolean value is True, grouped by the first column.
df.groupby('class').count() is a great start but that counts every row; I could divide another dataframe by this
I want to say it's x.groupby('class').sum() / x.groupby('class').count() but idk what is being summed
anyone utilize pdftotext often? trying to pull out data from a huge PDF but having trouble with it
I successfully made a script to load npz files. I need some advices about how can I extract datetime corresponding to some of the vars of the npz. Thanks for the help
i have a dataframe where i want to calculate the returns for each ticker
@glad mulch what do you mean not working
@serene scaffold the boolean value
@velvet thorn so it's just summing all numeric-like values in the dataframe?
Hey, I need a little help... Anyone here use jupyter on AWS?
@low oracle go ahead and ask what you would ask if someone said yes
@serene scaffold lol alrighty then. So I'm trying to utilize a tsv file into jupyter, I have looked around (YouTube, sof, etc...) and cant figure out how to properly work with my tsv data file
are you using pandas?
Attempting to yes
@low oracle if you're using pandas, it's pd.read_csv but you have to specify that tabs are the delimiter
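A minimal sketch of that, assuming a tab-separated file with a header row. The file contents here are made up and fed through `io.StringIO` so the snippet runs on its own; with a real file you'd pass the path (e.g. `'data.tsv'`) instead:

```python
import io

import pandas as pd

# Stand-in for a real file such as 'data.tsv' (hypothetical contents).
tsv_text = "name\tscore\nalice\t3\nbob\t5\n"

# sep='\t' tells read_csv that columns are separated by tabs, not commas.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df)
```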
!docs pandas.read_csv
```
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, [...]
```
Read a comma-separated values (csv) file into DataFrame.
Also supports optionally iterating or breaking of the file into chunks.
Additional help can be found in the online docs for [IO Tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
Parameters:
**filepath_or_buffer**: str, path object or file-like object. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: <file://localhost/path/to/table.csv>.
If you want to pass in a path object, pandas accepts any `os.PathLike`.
By file-like object, we refer to objects with a `read()` method, such as a file handler (e.g. via builtin `open` function) or `StringIO`.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)
and then you get to experience the joy of learning how to use pandas, which as you can see from my earlier question is not something I've fully accomplished myself.
but there are people who hang out in this channel who have
This is what I have so far
@low oracle those seem like really strange errors. Are you creating a spark session somewhere? Are you trying to use pyspark?
Yeah I made that mistake and changed to conda-python3
but I'm still having issues
<ipython-input-12-530695f4cce5> in <module>
2 import matplotlib.pyplot as plt
3 tsv_file = open('data.tsv')
----> 4 read_tsv = csv.reader(tsv_file, delimiter='\t')
NameError: name 'csv' is not defined
you just haven't imported csv it looks like
okay, maybe your path is wrong.
guys i need some help : well i need to do a list like y = 5 ; y = 0.98 *y; y = 0.90 *y
```py
for i in range(1, 20):
    np.random.seed(1000)
    y[i] = y[i-1] + np.random.normal(0, 1, size=200)
```
after i need to use them here
someone can help me?
@glad mulch there is an argument in df.dropna that lets you specify the axis: axis=1 drops columns and axis=0 drops rows, so you want axis=1.
@slender nymph I don't really understand your question based on what you have said.
!docs pandas.DataFrame.dropna
```
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```
Remove missing values.
See the [User Guide](../../user_guide/missing_data.html#missing-data) for more on which values are considered missing, and how to work with missing data.
Parameters:
**axis**: {0 or ‘index’, 1 or ‘columns’}, default 0. Determine if rows or columns which contain missing values are removed.
• 0, or ‘index’ : Drop rows which contain missing values.
• 1, or ‘columns’ : Drop columns which contain missing value.
Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.
**how**: {‘any’, ‘all’}, default ‘any’. Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
• ‘any’ : If any NA values are present, drop that row or column.
• ‘all’ : If all values are NA, drop that row or column.
**thresh**: int, optional. Require that many non-NA values.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna)
Can anyone help me out with this question?
https://stackoverflow.com/questions/64904587/how-to-generate-a-list-of-tokens-that-are-most-likely-to-occupy-the-place-of-a-m
Basically, I've found a StackOverflow answer, but this did not answer the question.
Mehn, I was practicing and stuff, training a heavy dataset on my PC, now my PC is not responding. This is the first time. 😂
The activity light is on ATM. Normally, it blinks once every minute but now it's just on. Don't know if I should shut down and restart or leave it.
I first did some processing like oneHotEncoder and stuff, then I scaled it using standard scaler, used isomap to reduce it to 3 components, fit it using linear regression, make predictions and check the mean accuracy score. That's basically what I was doing.
Even the clock on my PC is not working anymore. I'll leave it for 10 more minutes.
@glad mulch not sure if the first guy answered your questions well enough. If you use axis=1, that means it should drop columns with null values, there's other parameters which you can use to fine-tune this also like threshold, how etc. If you use axis=0, that means it would drop any row that contains NaN, you can also set threshold and stuff for this argument also. You can read the documentation to get insights about the other parameters.
@velvet thorn so it's just summing all numeric-like values in the dataframe?
@serene scaffold yup
count, on the other hand, counts non-null values
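To make the sum / count idea concrete, here's a small sketch. The boolean column name `flag` is made up (the real one isn't shown above), and selecting it explicitly avoids summing anything else. Since True counts as 1 and False as 0, the mean of a boolean column is the fraction of True values:

```python
import pandas as pd

# Toy data; 'flag' is a hypothetical name for the boolean column.
df = pd.DataFrame({
    "class": ["DNA", "DNA", "DNA", "nonDRNA", "nonDRNA"],
    "flag": [False, True, False, True, True],
})

grouped = df.groupby("class")["flag"]

# sum() counts the True values per group, count() counts the non-null rows.
pct_via_ratio = grouped.sum() / grouped.count() * 100

# mean() gives the same ratio in a single step.
pct_true = grouped.mean() * 100
print(pct_true)
```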
@velvet thorn ok, lemme do a different question. if i wanted to skip the first date in my data frame how would i do that in multiindex
@glad mulch are you thinking of .iloc
how is iloc different from loc?
@serene scaffold .loc takes boolean series or label indexers (strings, for example)
.iloc takes boolean series or positional indexers
so one common pattern is
selecting a subset of a DataFrame by applying a condition to the rows and taking only some columns
e.g. df.loc[df['value'] > 3000, ['colour', 'model']]
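A quick sketch of the difference, on a made-up DataFrame (the data and index labels are invented for illustration). Both selections below pick the same rows and columns, one by label and condition, the other by integer position:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green"],
        "model": ["A", "B", "C"],
        "value": [2500, 3200, 4100],
    },
    index=["x", "y", "z"],
)

# .loc: boolean mask for the rows, column labels for the columns.
by_label = df.loc[df["value"] > 3000, ["colour", "model"]]

# .iloc: integer positions for both rows and columns.
by_position = df.iloc[[1, 2], [0, 1]]

print(by_label)
print(by_position)
```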
huh, I didn't think that worked
I didn't think that worked, either
but the first one doesn't? (without the .loc)
@serene scaffold nope
but with .loc it does
!e
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
print(df.loc[df['a'] > 2, ['b']], end='\n\n')
print(df[df['a'] > 2, ['b']])
@velvet thorn :x: Your eval job has completed with return code 1.
001 | b
002 | 1 4
003 |
004 | Traceback (most recent call last):
005 | File "<string>", line 5, in <module>
006 | File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
007 | indexer = self.columns.get_loc(key)
008 | File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
009 | return self._engine.get_loc(casted_key)
010 | File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
011 | File "pandas/_libs/index.pyx", line 75, in pandas._libs.index.IndexEngine.get_loc
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/ipoquzinuz.txt
@velvet thorn I'm learning 
h
OwO snekbox back
(Noob) I successfully made a script to load npz files. I need some advices about how can I extract datetime corresponding to some of the vars of the npz. Thanks for the help
if the position of the dates are the same in the files, can't you loop through it and record them in a list?
the datetime module would be helpful if you need to reformat those dates later on.
What do you mean by loop through it and record them in a list? How can I capture those data
it seems that you have an array.
So your array must have items like rows? Using a for loop wouldn't you be able to go through that array?
Yes, I thought the same, the thing is that I have no clue about how to extract some of the columns of the npz
just regular slicing isn't it?
where can I learn data science
@dim moss datacamp
is it free
If you're using python, Nass, and you assigned that array to a variable name, type
type(var_name)
to see what you got.
I'm not asking about what is in the file. I'm asking about the type of your data structure in your python shell.
cuz the type/structure will impact what you can do with it.
is data camp free
The best way to pick up data science is you build yourself your own project, cloneb.
Practice is important
@dim moss check maybe some pinned messages
@molten hamlet nah
what is kaggle
oh it is google some community
I need a free data science learning source
I need help with an array, is someone available ?
what?
yo can i DM anyone with my colab link my shit is crashing
like im tryna run this GAN but its crashing
@cobalt jetty sorry to bother you would you be able to help me
Hey, I'm still in class for the next 5 hours. Maybe then. But understand I've never implemented a GAN.
no worries, is it cool if i DM you and you can look at it when you're available?
I'll ping you here when I'm available.
Hi I have to calculate the percentage for maths score but with a maximum of 130 can someone assist me
I have faced this problem before, solution is simple: update matplotlib
@earnest forge This didnt work
ImportError: DLL load failed while importing ft2font: The specified procedure could not be found.
Still getting this
if anyone could help me out
@dim moss first, if you know Python basics and stuff, you should learn data analysis before moving to data science.
I could help you with resources to learn data analysis and data science, PDF files.
Is there a faster way to find a linear combination of several large numpy memmaps?
I can't load them all into memory
@sage rock try to import matplotlib without %matplotlib inline
can anyone help me figure this error out
ValueError: Dimensions must be equal, but are 16 and 60000 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](generator/activation_17/Relu, mean_squared_error/Cast/x)' with input shapes: [16,28,28,1], [60000,28,28,1].
im inputting a dataset with batch size of 16
but i cant seem to batch the mnist dataset to the same size
@lapis sequoia it'd be better if you provided an excerpt from your code
i can paste it one sec
@earnest forge
i just changed the batch size to 16
oh
@lapis sequoia anyway, it says your second input is the size of 60000, not 16
lol
make sure you pass it the right data
just wanted to say that i think ive gotten my model working and i want to thank everyone in here for all their help because i think i was on the verge of a mental breakdown i have a very good feeling that this shit will look like hot ass but we made it
is anyone good with nltk that I could ask a beginner question?
When was save_fig() introduced in pyplot? Is it recent?
Does anybody here know good dimensionality reduction techniques for binary data? I'm working on a data science project, with a dataset of 130 binary attributes. I'm looking for something that could be easily implemented in Python using sklearn or similar
@tribal wind I have some experience with nltk, so shoot your shot
Anyone do Monte Carlo sim?
They are very easy to find through Google Scholar
You should increase your batch size if your hardware allows it, @lapis sequoia
You're also missing
```
validation_split=0.2,
subset="validation"
```
in your preprocessing function to create test_ds
Hi, I am very new to Python. By that I mean, I only started learning it about 1 month ago as part of my university physics course.
I have been given a csv file with wavelength (x) and 31 sets of observations (y)
I need to fit a linear + gaussian model for each observation (though if I do it for one I can probably repeat a similar thing for the remaining observations)
I have fit a linear model to the first observation data but I am struggling with fitting a gaussian fit
We are using the numpy and matplotlib packages
As you can see the linear fit is there
I am supposed to fit a gaussian to it, similar to how it is done in this example
If someone could help me with using the scipy.optimize.curve_fit() function, it would be appreciated
I need to make initial guesses for the peak and width of the gaussian, and that should hopefully be automated so that I can repeat it for the other 31 observations
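Not a full answer, but here's a rough sketch of how the linear + gaussian fit with automated initial guesses could look. Everything here is an assumption for illustration: the data is synthetic, the function names are made up, and the guesses (straight-line fit first, then the largest residual as the peak, a tenth of the x range as the width) are just one simple heuristic that may need tuning for real spectra:

```python
import numpy as np
from scipy.optimize import curve_fit


def linear_plus_gaussian(x, m, c, amp, mu, sigma):
    """Straight line m*x + c with a gaussian bump of height amp at mu."""
    return m * x + c + amp * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def initial_guess(x, y):
    """Heuristic starting values (an assumption, not the only way to do it)."""
    m, c = np.polyfit(x, y, 1)                  # rough baseline
    residual = y - (m * x + c)                  # what the line does not explain
    peak = np.argmax(np.abs(residual))          # largest deviation ~ peak position
    width = (x.max() - x.min()) / 10            # crude width guess
    return [m, c, residual[peak], x[peak], width]


# Synthetic stand-in for one observation column from the csv.
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 200)
y = linear_plus_gaussian(x, 0.05, 5.0, 20.0, 60.0, 3.0) + rng.normal(0, 0.5, x.size)

popt, pcov = curve_fit(linear_plus_gaussian, x, y, p0=initial_guess(x, y))
m_fit, c_fit, amp_fit, mu_fit, sigma_fit = popt
print(f"peak at {mu_fit:.2f}, width {abs(sigma_fit):.2f}, height {amp_fit:.2f}")
```

For the other observations you'd call `initial_guess` and `curve_fit` in a loop over the y columns.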
Hello everybody, is there a Discord channel or a forum focused on Python pandas?
create a Figure with multiple Axes and pass them manually
this is it, unless you mean a different server
a long time ago
you're missing a dependency. did you try Googling?
Actually I think it is a function I was supposed to create... not sure if there is a save fig method?
Was sorta testing the new reply feature
Hey. I need some help with data science within 2 hours. Can I pay somebody here to help me 1:1?
Sorry if the wrong place. Have yet to find anyone who wants to help me 😦
Paying $60 for some quick q&a
I know python basics
Can anybody tell me how feed forward works in a conv model
what do you mean?
a convolutional layer?
what dimension?
Conv2d
So we have 32 filters
That mean if i enter 1 img that img will be 32 imgs after passing the first conv layer
Isnt it?
uh
no?
okay, purely in the abstract sense
and assuming you're using same padding
say each image is of shape (w, h, c) (width, height, channels) and you pass it into a layer with k filters, the output will be of shape (w, h, k).
Okay that makes some sense
I thought each filter will be applied on the img and get a new img
no
But it is not that simple
each filter spans all c input channels and produces one output channel.
Thanks dude
therefore, the number of parameters for a layer with k filters of size (fw, fh) on an input with c channels is (fw * fh) * c * k + k (the last k being one bias per filter)
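To put numbers on that, a small sketch in plain Python (the helper names are made up; it assumes same padding, stride 1 and one bias per filter, as discussed above):

```python
def conv2d_output_shape(w, h, c, k):
    """With 'same' padding and stride 1 the spatial size is kept; channels become k."""
    return (w, h, k)


def conv2d_param_count(fw, fh, c, k):
    """Each of the k filters has fw * fh * c weights plus one bias."""
    return fw * fh * c * k + k


# One 28x28 grayscale image through a layer with 32 filters of size 3x3.
print(conv2d_output_shape(28, 28, 1, 32))  # (28, 28, 32)
print(conv2d_param_count(3, 3, 1, 32))     # 320

# Same layer on an RGB input: more weights per filter, same output depth.
print(conv2d_param_count(3, 3, 3, 32))     # 896
```

So one image in gives one tensor out, with 32 channels.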
yw
Hey guys!
How are yall?
is there anyone here who is really good with neural networking?
just ask your question
Ok ok. So i am wanting to build a game bot that plays through Pokemon Emerald all on its own. But i am having trouble getting it set up
well
maybe you could elaborate on what trouble you're having
in general it would be better to ask a question
which people can answer
without having to ask for more info.
just about everything. I cannot find any resources on how to create a game bot.
well
then this probably isn't a good place to start
it's not as difficult or specialised as it was
but it's still a fair bit of work.
I'm wanting it to play on an emulator and i want the bot to recognize the emulator and play through it
how much Python and DL experience do you have
So what is ur problem
how to i get it to recognize the emulator as a whole?
What u r talking bro
Take a screen of ur game
Feed it into ur classifier
Predict an action
uh
gonna be honest
do not really know how the classifier works
just kinda built it
It doesnt predict the right?
I am going to predict that this will not work
at all
Or got error during training
Y bro?
i think itll work
many reasons, but the simplest one is
not much of the gamestate is exposed by the screen
Yaaa
Is it a strategy game?
its pokemon
Never tried it out.. I've built a bot, but it was for Dino Run
which is a totally different game from Pokemon
Which worked pretty nicely, as the action can be easily predicted from screen data
doing this would be like creating a poker AI that looks only at what cards you have
well if you have extensive knowledge of the game it should not be hard to know the game state, just coding it is the hard part for me
hm are you sure you understand what my problem is
i think i do, but i am not sure
okay.
so
you said you think
it's enough to predict the next action
given a screencap.
that doesn't make sense for multiple reasons.
the first is that a screencap won't, for example, take into account what Pokemon you have
or where in the story you are (because you might need to backtrack, for example)
the second is that you're going to need a ton of training data that more or less requires you to play the game yourself
where will you get that?
well, if i play the game will that work?
what if i get a few people to play it?
you could try it
but my guess is not enough data
if you said
write AI to handle a subset of the game
like battling
that would be much simpler
to play the whole game?
I think you underestimate the scope of that project.
by a lot.
unless you hardcode a ton of stuff, but even then
Pokemon vs Dino Run is like chess vs tic-tac-toe.
how are you going to translate your knowledge into code?
it's not that it wouldn't help
that is the very least you need
and it's nowhere near enough
i do not know. That is why i need help.
Translate the image data into feature set as u know bout the game
I would suggest you build an AI for a much simpler game first
The reason i want to do this big of a project is because i want to learn as i go
then you can properly appreciate how difficult this is
also it wouldn't be efficient to do this IMO
I have a decently simple data science question that I posted in #help-croissant , if anyone would be able to help me out it would be much appreciated 🙂
would probably make more sense to just pull data from the game's memory
don't advertise your help channels please
but you can just post here
the question
@rotund sail pandas methods in general make copies; they do not modify inplace.
anyway, that's a bad way to do things
you should use vectorised filtering
penguin_data = penguins.loc[penguins['species'] != 'Chinstrap', ['species', 'flipper_length_mm']]
look up the .loc indexer.
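Here's a runnable version on a tiny made-up `penguins` frame (the rows and the extra `island` column are invented; the real dataset isn't shown here). The boolean mask picks the rows and the list picks the columns, with no loop:

```python
import pandas as pd

# Hypothetical stand-in for the real penguins dataset.
penguins = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo", "Chinstrap"],
    "flipper_length_mm": [181, 195, 217, 201],
    "island": ["Torgersen", "Dream", "Biscoe", "Dream"],
})

# Keep every non-Chinstrap row, and only the two columns of interest.
penguin_data = penguins.loc[
    penguins["species"] != "Chinstrap",
    ["species", "flipper_length_mm"],
]
print(penguin_data)
```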
in general, if you have a for loop in pandas code, you're doing things wrong.
well gm, what do you think would be easier?
also look up the inplace parameter.
Oh really? I took a data science course at my university last year and they wanted it
there is no easy way.
nope, it's wrong
100%
Essentially every project we did utilized a for loop lol
200%, actually
but I would suggest at least learning the basics of DL (and in particular RL)
and then
writing an AI for a simpler game.
well I mean
you don't have to take what I say as the truth
no! i mean an easier game
Tic tac toe should be used to learn RL
ok cool! and yes im gonna take advice. I do not know what im doing haha !
Cos that would be too easy for DL
But eventually u will know
im trying, i dont have the money to go to school so im trying to learn online
not really?
just for fun
or rather
not right now
Nice, was just curious since you seemed rather knowledgeable about it
So why are for loops bad usage in DS?
no, not in DS
just in pandas (and not all the time, but in general)
okay, pandas DataFrames use numpy arrays for storage
these arrays have fixed sizes.
what do you do for work?
so every time you remove or add a column/row, you're actually creating a whole new array (and DataFrame wrapping it).
in that for loop, therefore
for every row that doesn't satisfy your condition, you create a new DataFrame
oh, so it's just very inefficient
so say there are N such rows; you end up creating N - 1 throwaway DataFrames
yup
that's the first thing
secondly
modern processors have something called SIMD instructions
which basically let them perform the same arithmetic operation on several values at once
if you use a for loop, this optimisation isn't triggered
So, for efficiency purposes, I should read up on vectorization
as an illustration
import numpy as np; a = np.arange(10000000); b = [v + 1 for v in a]; print('comprehension done'); c = a + 1; print('vectorised done')
run this
and watch the prints
backend engineering, mostly
some frontend
I'm looking at going back into DS/ML though
so there are two problems with the for loop.
one, it's not vectorised (slow)
two, it creates throwaway objects (slow AND wastes memory)
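If you'd rather see numbers than watch prints, here's a rough timing sketch of the same comparison (array size picked arbitrarily; the timings depend on the machine, so no particular speedup is promised):

```python
import time

import numpy as np

a = np.arange(1_000_000)

# Python-level loop: one element at a time, building a list of objects.
start = time.perf_counter()
looped = [v + 1 for v in a]
loop_seconds = time.perf_counter() - start

# Vectorised: a single call that runs in compiled code over the whole array.
start = time.perf_counter()
vectorised = a + 1
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorised: {vector_seconds:.4f}s")
```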
well is it ok if i keep asking ya questions?
don't ask me specifically, but you can post stuff here and whomever will answer
well its about jobs
try #career-advice
okie!
so, using that book you gave me should help a lot right?
I really wanna get to make a A.I to play pokemon
it's a start.
and tic-tac-toe is a good game to start with?
for the 12 people who haven't seen this yet https://www.youtube.com/watch?v=aircAruvnKk&ab_channel=3Blue1Brown
are there any resources for converting excel spreadsheets to python
Do you mean read excel files? Or is it something else I'm too inexperienced to understand? If it is the former, there is a read_excel() function in pandas.
Hello guys, do you happen to know any free resources for learning statistics and probability? What I want to do is supplement a course I'm learning on statistics to have a better intuitive understanding of the things, with some interactivity and stuff. I have in mind jupyter notebooks or webpages or books or anything at all. One I tried is Think Stats by Allen Downey but he doesn't delve too much into the mathematics so it isn't that helpful to me.
Hello (:
I am looking for a decent image comparison algorithm. I have looked into MSE and SSIM so far. Can someone recommend another algorithm except LSH or using OpenCV?
My end goal is to create a large scale image comparison (hand written signatures) algorithm most likely using Tensorflow.
Thanks
Ping me (:
if i don't get it wrong, you want to use deep learning to compare images. a very fundamental way is that you can use a pretrained VGG model to generate features for two input images and calculate cos similarity
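The similarity part is simple enough to sketch with numpy. The short vectors below are made-up stand-ins for the feature vectors a pretrained model would produce (the model itself is left out here), and `cosine_similarity` is just a helper name:

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Stand-ins for image features; real ones would be much longer.
feat_a = [1.0, 2.0, 3.0]
feat_b = [2.0, 4.0, 6.0]   # same direction as feat_a
feat_c = [3.0, 0.0, -1.0]  # orthogonal to feat_a

sim_ab = cosine_similarity(feat_a, feat_b)
sim_ac = cosine_similarity(feat_a, feat_c)
print(sim_ab, sim_ac)
```

You'd then pick a threshold on the similarity to decide whether two signatures match, which has to be tuned on real data.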
Sweet! I'll look into VGG right away.
There was also this one thing I couldn't find on the internet. How do I store multiple images in a file and import it in python rather than using img.open everytime
Thanks a lot for VGG heads up by the way (:
in tensorflow there is a class called tf.data.Dataset where you can store and read all the images in the form of arrays (or tensors)
you would encounter it anyway since you are going to use VGG
Awesome! This makes things a lot easier for me. Thanks bud
you are welcome
Hello,
Can someone help me in the help-nitrogen channel? I've described the problem I am facing in that channel
Thank you!
Is there a way to make a nested dataframe accessor?
e.g.
@pd.api.extensions.register_dataframe_accessor("base.second.third")
Can someone help me? I have a list of times it takes my server to process requests and I want to find anomalies, but for some reason it's not working.
I'm doing the following check:
stdev = statistics.stdev(times)
if min(currentTime) + stdev < statistics.mean(times):
But it's not working: it either doesn't find the anomalies or marks normal values as anomalies
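One common way to do this (not necessarily what the code above was going for) is to flag each value by how many standard deviations it sits from the mean, instead of comparing the minimum against the mean. A sketch with made-up timings; `find_anomalies` and the threshold of 2 are assumptions you'd tune:

```python
import statistics


def find_anomalies(times, threshold=2.0):
    """Return the values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    return [t for t in times if abs(t - mean) > threshold * stdev]


# Hypothetical processing times in ms, with one obvious outlier.
times = [100, 102, 98, 101, 99, 103, 97, 100, 250]
anomalies = find_anomalies(times)
print(anomalies)
```

Keep in mind a big outlier inflates the mean and stdev itself, so for noisy data a median-based measure may behave better.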
Hi, can someone help me in the -krypton channel? I'm new to python and it's just some simple excel manipulation, but I don't start python in my grad program until next year.
Hello, I am looking for a way to automatically classify JSON data that may or may not have headers, using an external site like Wikipedia to determine context and collect tags. Is there a script I can look at?
Please ping me if anyone knows. Thanks.
Heya anyone here familiar with protobuf and parsing tfrecords?
I'm having issue parsing tfrecords and not too sure how to work with it since I'm not familiar with protobufs at all.
so what is up
Hello guys, I am looking to create a function that takes continuous data and a bin number as arguments and creates a frequency table with 2 columns (pandas): the bin ranges and the frequency count.
The data input should be a list like range(0,1500).
Anyone has any ideas about this?
do you need to create function @rich silo , or can you use function like pd.qcut()? (assuming thats what you meant by binning continous value)
That's what i have tried at first but i couldnt get it to the format that i wanted meaning the 2 column table
Maybe i am missing something obvious......
how would output look like? can you give an example, will make it easier to understand exact objective
Yeah kinda. can this be converted into a pandas table?
definetely
you can also perform some cleanup, to format the data, incase you need
Something else can the bound of the bins be open?
For example , to also have less than -0.05 and more than 50
your min/max values becomes the bound, when you use cut
A i see. i think i got it from here
Thanks a lot for your helps
help
I am quite new to this
sure, no problem
@green hemlock its seems that now it creates the bins but all of them have the same number of observations.
Could i perhaps use numpy linspace to try and sort this?
they are not same
Last 2 values are 386, rest are 387
And yeah, you can use linspace, but your problems should be possible to be solved by cut/qcut.
@rich silo
I have just done it using linspace now it looks like this:
Is there anyway to sort the bins from lowest to greatest and also can those be formatted as percentage for example
Can you try qcut and see the results
And by sorting lowest to greatest, do you mean frequency or 1st value of tuple?
Your best bet would be to extract the numbers by treating it as a string, or replacing the last ] with ), and then use ast.literal_eval to convert it into a python tuple. That will make it easier to sort
Hm that's what i was thinking as well although string manipulation is always painful to me
I am not sure, if there is any other easier way, but if there is, let me know too
On classification reports for sklearn, I'm finding it hard to wrap my head around what recall actually means. I know support is the total number of samples of that class present in the dataset, f1 is the harmonic mean between precision and recall, and precision is the percentage of right predictions out of everything predicted as that class, but I can't seem to understand recall.
I'd appreciate if someone explains it to me like a 10 years old. Tha is.
*thanks.
im trying to find a way to convert latex expressions into images, however i cant find a way to do this, any help would be appreciated
say I have 10 dogs and 10 cats in a room
and I tell you "go into the room and get me all the dogs".
Listening...
but you're not very good at differentiating dogs and cats
so you can make the very safe play
and bring out every single animal
or
you could choose only those you are very sure are dogs.
in the first case, you get 10 dogs and 10 cats.
which is good in one sense, because you got every dog that was there to get
but you also have lots of stuff that I didn't ask for (cats)
in the second case, maybe you only got 2 dogs?
but you didn't get any cats
which is good in a different sense, because you didn't get any extraneous rubbish.
precision measures the second sense of goodness: of the stuff you got, how much was actually relevant?
recall measures the first sense of goodness: of the relevant stuff that was available, how much did you get?
and that's generally why you want measures that combine both, like f1 score: because you usually want results that are largely complete (get most of what you're looking for) and relevant (don't contain much of what you're not looking for)
make sense?
in what context?
like you pass a LaTeX expression to code and it generates a .png or .jpg or something like that?
So recall is like the percentage of right predictions you got (dogs) over all the stuff I brought out (dogs and cats). So recall is 0.5 in the first instance?
@green hemlock yes and yes.
for the percentage thing, pass normalize=True to value_counts
to sort, call .sort_index().
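Putting the pieces together, a possible sketch of the frequency table function. The function name and column names are made up. It uses `pd.cut` because that gives equal-width bins (`pd.qcut` gives equal-frequency bins, which is why every bin had nearly the same count earlier), and the bin labels are turned into plain strings to keep the table simple:

```python
import pandas as pd


def frequency_table(data, n_bins):
    """Equal-width bins with counts and percentages, sorted from lowest bin up."""
    binned = pd.cut(pd.Series(list(data)), bins=n_bins)
    counts = binned.value_counts().sort_index()
    table = pd.DataFrame({
        "Bins": counts.index.astype(str),
        "Frequency": counts.to_numpy(),
    })
    table["Percent"] = table["Frequency"] / table["Frequency"].sum() * 100
    return table


table = frequency_table(range(111, 140), n_bins=5)
print(table)
```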
no, that's precision
recall is the percentage of correct predictions over all the correct predictions there are to make
so there were 10 dogs and you got all 10
recall is 1
in the second case, there were 10 dogs and you got 2
So recall in the second case is 0.2?
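The dogs-and-cats numbers, worked through in a small sketch (the helper name is made up; "positive" here means "dog"):

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of what you brought out that was dogs.
    Recall: share of the dogs in the room that you brought out."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Case 1: bring out every animal -> 10 dogs, 10 cats, no dog left behind.
print(precision_recall(10, 10, 0))  # (0.5, 1.0)

# Case 2: bring out only 2 sure dogs -> no cats, but 8 dogs left behind.
print(precision_recall(2, 0, 8))    # (1.0, 0.2)
```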
yes exactly, i give it a latex expression and then a png is rendered showing it eg:
sin(sqrt(x**2 + 20)) + 1
did you Google "render latex with Python"?
ye
p sure MPL can do that @blissful pendant
Ooh, okay. Thanks. I really appreciate @velvet thorn 🙏🏿
yes but sympy wasnt working correctly, prob should have specified that
can you save directly from it?
probably need to do some stuff
but yeah just go try it out
ok thx
has anybody heard about geopandas?
!d pandas.DataFrame.dropna
@glad mulch show code
you still have a threshold on it, so those are probably just nans that werent dropped since it was below the threshold