#data-science-and-ml

serene scaffold
#

you're not supposed to.

violet gull
#

why not

serene scaffold
#

because each prediction counts towards tp, fp, tn, or fn

violet gull
#

what

serene scaffold
#

and then you pick a performance metric that best reflects what you want to know about the model's performance

#

true positive, false positive, true negative, false negative

#

if there's only two classes, you can basically just look at true positives and false negatives.

#

and then the performance is tp / (tp + fn)

violet gull
#

i not understand

serene scaffold
#

do you understand what classification is? and what a prediction is? if not, I can explain it to you.

violet gull
#

classification does the maffs to see what it looks most like

serene scaffold
#

sort of. classification is when you have categories ("classes") and you have a program that looks at data points and decides ("predicts") which category they belong to.

#

so you're making a classifier that predicts if something "is square" or "is not square"

#

make sense?

violet gull
#

ye

serene scaffold
#

so if your model says its a square, and it is a square, that's a true positive

violet gull
#

yes

serene scaffold
#

if it says it's a square, but it's not a square, that's a false positive

violet gull
#

ye

serene scaffold
#

so, why don't you rewrite your code so that it counts tp, fp, tn, and fn

#

and then reports (tp + tn) / (tp + fp + tn + fn) at the end
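
A minimal sketch of the counting described above, assuming the labels are plain booleans where True means "is square". The function names are made up for illustration. Note that (tp + tn) / (tp + fp + tn + fn) is usually called accuracy, while the earlier tp / (tp + fn) is recall, so the two formulas answer different questions.

```python
def confusion_counts(y_true, y_pred):
    """Count tp, fp, tn, fn for boolean labels (True = positive class)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and not truth:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    # Share of all predictions that were correct.
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    # Share of the actual positives that the model found.
    return tp / (tp + fn)


y_true = [True, True, True, False, False, False]
y_pred = [True, True, False, True, False, False]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)
print(accuracy(tp, fp, tn, fn), recall(tp, fn))
```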

violet gull
#

so just redo the score function

serene scaffold
#

sure, start with that.

#

and see what the score is

violet gull
#

@serene scaffold ```py
def score(self):
    tp = 0
    tn = 0
    fp = 0
    fn = 0
    for square in squares:
        if self.classify(square) == True:
            tp += 1
        else:
            fn += 1
    for notSquare in notSquares:
        if self.classify(notSquare) == False:
            tn += 1
        else:
            fp += 1
    return (tp + tn) / (tp + fp + tn + fn)
```

#
Generation: 1 Score 0.5!
Generation: 2 Score 0.5!
Generation: 3 Score 0.5!
Generation: 4 Score 0.5!
Generation: 5 Score 0.5!
Generation: 6 Score 0.5!
Generation: 7 Score 0.5!
Generation: 8 Score 0.5!
Generation: 9 Score 0.5!
Generation: 10 Score 0.5!
Generation: 11 Score 0.501!
Generation: 12 Score 0.5!
Generation: 13 Score 0.503!
Generation: 14 Score 0.501!
Generation: 15 Score 0.501!
Generation: 16 Score 0.501!
Generation: 17 Score 0.501!
Generation: 18 Score 0.502!
Generation: 19 Score 0.503!
Generation: 20 Score 0.503!
serene scaffold
#

replace `if self.classify(square) == True:` with `if self.classify(square):`, and the other one with `if not self.classify(notSquare):`

#

so that I can be happy

violet gull
#

ok

serene scaffold
#

anyway, since there are only two classes, this means that your model is pretty much random.

violet gull
#

two classes?

#

meaning not square or square?

serene scaffold
#

right

violet gull
#

ok

#

what does this decimal made by (tp + tn) / (tp + fp + tn + fn) mean btw?

#

its up to 0.55

serene scaffold
#

and 45% bad.

violet gull
#

thats better than 50% good and 50% bad

serene scaffold
#

well, if there are only two classes, and the chances of it being in either class are 50/50, then 50% isn't really good.

iron basalt
violet gull
#

yes

#

but its beating the average

serene scaffold
#

by 5% 😛

iron basalt
#

It is, but you want to beat it by a lot.

violet gull
#

i would bet on those odds

serene scaffold
#

have fun losing all your money 😛

violet gull
#

casinos bet on 1% and get rich

#

i have a whole 7%

serene scaffold
#

except for the people who lose everything

iron basalt
#

The casinos have a lot of money, on average over time they win, but it takes a while and a bunch of money.

violet gull
#

alright how i make my thingy more accurate

iron basalt
#

(And they do way more than 1% for other "games")

serene scaffold
violet gull
#

so it makes a neural net that has 3 doing thingy matrixes

#

it puts in the 121 data points

#

it does the first doeey thingy matrix and brings it to 80 data points

#

it does it again and brings it to 40

#

then 2

#

it does 100 of these

#

it takes the best 50 and copies them over the bad 50

#

changes one of the matrix numbers by a tiny amount

#

and repeat
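
A rough sketch of the loop described in the messages above: score every candidate, keep the better half, copy it over the worse half, and nudge one number in each copy. The toy fitness function and the flat list of weights are assumptions for the demo; the real setup uses three matrices and the square classifier's score. Mutating only the copies (so the best candidate is never lost) is also a choice made here, not something stated in the chat.

```python
import random


def evolve(population, fitness, generations, rng, step=0.01):
    """Keep the better half, copy it over the worse half, mutate the copies."""
    half = len(population) // 2
    for _ in range(generations):
        # Rank candidates from best to worst.
        population.sort(key=fitness, reverse=True)
        survivors = population[:half]
        children = []
        for parent in survivors:
            child = list(parent)
            # Nudge a single randomly chosen weight by a small amount.
            i = rng.randrange(len(child))
            child[i] += rng.uniform(-step, step)
            children.append(child)
        population[:] = survivors + children
    population.sort(key=fitness, reverse=True)
    return population


def toy_fitness(weights):
    # Hypothetical stand-in for score(): closer to all-ones is better.
    return -sum((w - 1.0) ** 2 for w in weights)


rng = random.Random(0)
start = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(100)]
best_before = max(toy_fitness(w) for w in start)
final = evolve([list(w) for w in start], toy_fitness, generations=50, rng=rng)
best_after = toy_fitness(final[0])
print(best_before, best_after)
```

With a step this small, progress per generation is slow, which may be part of why the scores above creep up by only a fraction of a percent.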

#

@serene scaffold

serene scaffold
#

I don't really know about neural architectures for identifying shapes, esp when they're clearly rule-based. but it sounds like you're on the right track.

violet gull
#

rip

#

maybe there is a better model?

#

idk why this one has limits

serene scaffold
#

all models have limits 😛 but I'm sure there's one that's suited to this task.

urban prism
#

I'm trying to calculate the jaccard score between two values that are like:
[[[0]
[0]
[0]
...
[1]
[0]
[0]]

[[0]
[0]
[0]
...
[1]
[1]
[1]]
(dims=(256, 256, 1))
Though sklearn.metrics.jaccard_score seems to only compare lists with structures like [0,0,0,1,1,0,1]
Any ideas on how can I calculate this?

iron basalt
urban prism
#

ValueError: Classification metrics can't handle a mix of continuous
and binary targets

#

Alright .astype("uint8") worked
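
A small sketch of what the Jaccard score computes for two binary masks, written with NumPy only so the shapes are explicit. The random masks are made-up stand-ins for a true and a predicted (256, 256, 1) mask. With scikit-learn, flattening both arrays with `.ravel()` and casting them to an integer dtype before calling `jaccard_score` should give the same number for the positive class; the error above suggests one of the arrays was still floating point.

```python
import numpy as np


def jaccard(mask_a, mask_b):
    """Intersection over union of two binary masks of the same shape."""
    a = np.asarray(mask_a).astype(bool).ravel()
    b = np.asarray(mask_b).astype(bool).ravel()
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks are empty; calling that a perfect match is a choice.
        return 1.0
    return np.logical_and(a, b).sum() / union


rng = np.random.default_rng(0)
true_mask = rng.integers(0, 2, size=(256, 256, 1))
pred_mask = true_mask.astype("float32")  # e.g. a thresholded model output
pred_mask[:128] = 0.0                    # pretend the top half was missed
score = jaccard(true_mask, pred_mask)
print(score)
```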

violet gull
#

is there one of these that would work best for my squares and not squares?

```
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting algorithms
GBM
XGBoost
LightGBM
CatBoost
```
urban prism
serene scaffold
violet gull
#

@serene scaffold logistic regression looks good

serene scaffold
violet gull
#

so good?

serene scaffold
#

can you explain what logistic regression is, according to your understanding?

violet gull
#
Let's say your friend gives you a puzzle to solve. There are only 2 outcome scenarios – either you solve it or you don't. Now imagine, that you are being given wide range of puzzles / quizzes in an attempt to understand which subjects you are good at. The outcome to this study would be something like this – if you are given a trignometry based tenth grade problem, you are 70% likely to solve it. On the other hand, if it is grade fifth history question, the probability of getting an answer is only 30%. This is what Logistic Regression provides you.
#

thats from the website i look at

serene scaffold
#

sure, but you just copied that. that doesn't tell me if you understand it

violet gull
#

it has 2 outcomes and uses a big data set to increase the probability of getting it right

#

@serene scaffold

urban prism
#

I have a custom data generator that I feed into model.fit(). Is there a way for me to access to some of that data outside of .fit()?

misty flint
#

just create an intermediate object like a pandas dataframe

#

then you can access said dataframe later

#

if its something like randomly generated values, store them into a list, a np array, etc.

urban prism
#

Memory is an issue tho

serene scaffold
#

@urban prism does the algorithm you're using support partial_fit? Because if you're passing a generator to a function, the things that generator generates doesn't get saved anywhere else, no.

#

But with partial_fit, you don't have to have every training instance in memory at once.
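
One memory-friendly way to peek at generator data outside of training is to wrap the generator so it remembers only the last few batches. This is a sketch with plain Python; `fake_batches` is a hypothetical stand-in for the custom data generator, and whether a given `fit()` accepts a plain generator depends on the framework.

```python
from collections import deque


def keep_recent(batches, store):
    """Yield batches unchanged while appending each one to `store`."""
    for batch in batches:
        store.append(batch)
        yield batch


def fake_batches(n):
    # Hypothetical stand-in for the real data generator.
    for i in range(n):
        yield ([i, i + 1], i % 2)


# A deque with maxlen drops the oldest batch once it is full,
# so memory use stays bounded no matter how long training runs.
recent = deque(maxlen=3)
consumed = [batch for batch in keep_recent(fake_batches(10), recent)]
print(list(recent))
```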

urban prism
#

Thanks for the idea. I'll check on it

urban prism
#

Is there a guide for postprocessing? Been trying to apply morphological operations to some semantic segmentation outputs to no avail

serene scaffold
urban prism
#

Like applying closing, opening, erode and such to the predicted masks to make them better

#

Make them look closer to the actual masks
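
For reference, a NumPy-only sketch of what opening and closing do to a binary mask with a 3x3 structuring element. It assumes a 2-D boolean mask (a (256, 256, 1) prediction would need the last axis squeezed away first). In OpenCV the usual route is `cv2.morphologyEx` with `cv2.MORPH_OPEN` or `cv2.MORPH_CLOSE`; the helper names below are made up.

```python
import numpy as np


def dilate(mask):
    """3x3 binary dilation: a pixel turns on if any neighbour is on."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask):
    """3x3 binary erosion: a pixel stays on only if all neighbours are on."""
    mask = np.asarray(mask, dtype=bool)
    # Padding with True means the image border itself does not erode.
    padded = np.pad(mask, 1, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def opening(mask):
    # Erode then dilate: removes specks smaller than the 3x3 element.
    return dilate(erode(mask))


def closing(mask):
    # Dilate then erode: fills holes smaller than the 3x3 element.
    return erode(dilate(mask))
```

Opening tends to help with isolated false-positive pixels, closing with small gaps inside a predicted region; which one helps depends on how the predictions differ from the true masks.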

serene scaffold
#

erode?

urban prism
#

cv2.erode

serene scaffold
#

are you basically trying to take a sequence that represents some passage of text, and break it down into subsequences that correspond to sentences and sub-sentence units?

urban prism
#

No

#

I'm talking about masks that are predicted after image segmentation

urban prism
serene scaffold
#

why do you need open cv for a natural language problem?

urban prism
#

Its an image segmentation problem

frank moth
#

Hello, I'm trying to use pmdarima in my jupyter notebook. I've tried uninstalling and using conda, uninstalling and using pip but I can't seem to import pmdarima. When trying to install it from conda it keeps going through:

Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
serene scaffold
frank moth
#

I've tried uninstalling conda and using pip as well but it won't seem to import

serene scaffold
#

conda sucks you in to an approach to dependency management that no one outside of data science uses, yet data science instructors tell people to use it before those people even know when using it might be advantageous.

serene scaffold
frank moth
#

Should i try uninstalling everything again and doing pip?

#

I'll do that rn and get the error message

serene scaffold
#

if there's an error message, I need to see the whole thing.

#

oh, you're talking about the "solving environment" one

#

that means you're still using conda.

#

before I tell you to uninstall anaconda, who told you that you need to use it? an instructor for a class that you're taking?

frank moth
#

Yeah, I'm getting rid of conda atm to try another way

#

yeah the previous classes told me to use conda and when i use the school VM it's all conda

serene scaffold
#

then I guess you have to stick with it. but I can't help, in that case.

#

for what it's worth, I work for a research company, and we're moving away from anaconda.

frank moth
#

We don't have to use it, it's just recommended. I just really need to get my code to work in the end, so I'll try any other way. Do you have another recommended way?

serene scaffold
#

and the python channel on our slack is pretty much always people complaining about conda.

#

Do you have another recommended way?
just making a virtual environment (which is a feature that comes with python) and using actual pip, not the pip that interacts with conda.

#

I mean I guess it's the same pip under the hood, but if you use a normal python virtual environment without touching conda, pip won't install it to a conda-based environment
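
When an install seems to succeed but the import still fails in a notebook, it can help to check which interpreter the notebook is actually running. A small sketch using only the standard library; the `conda-meta` check is a heuristic, and `describe_environment` is a made-up helper name.

```python
import os
import sys


def describe_environment():
    """Report which interpreter is running and what kind of environment it is."""
    return {
        "executable": sys.executable,
        # True inside an environment created with `python -m venv`.
        "in_venv": sys.prefix != sys.base_prefix,
        # conda environments contain a conda-meta directory.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
# To install into this exact interpreter, one option is to run
#   <executable> -m pip install pmdarima
# with <executable> replaced by the path printed above.
```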

frank moth
#

You mean after uninstalling everything, reinstall python and using jupyter separately or something like pycharm then doing pip install pmdarima from cmd?

misty flint
#

@serene scaffold have you worked with knowledge graphs before stelercus

serene scaffold
misty flint
#

or just formal ontologies?

misty flint
serene scaffold
#

still learning more about them though.

misty flint
#

ah, just wanted to ask about any resources

#

you found useful or texts

serene scaffold
misty flint
#

oh man this looks super promising

#

thanks bud. definitely going to look through this one

serene scaffold
#

let me know what you think. I think I saw that they use TSV files pretty extensively, and that makes one wonder how it performs as compared to a "proper" graph database like neo4j.

misty flint
#

yeah i was also looking at neo4j

serene scaffold
#

the neo4j query language, cypher, is pretty fun

iron basalt
#

You may want to ask in #databases for advice on this.

serene scaffold
misty flint
misty flint
#

I do have a DS question tho

#

im being asked if i could "use ML to improve search results" for this company platform thing

#

and im like...idek where to look to solve that type of problem

#

me: just throw elasticsearch at it

frank moth
#

is there a python version that definitely works with pmdarima?

serene scaffold
iron basalt
misty flint
frank moth
graceful glacier
#

given the first column

#

what would be the best way to get the second column

lapis sequoia
graceful glacier
#

a for loop lol

#

it works

lapis sequoia
#

can you show the code?

graceful glacier
#

but inefficient

#

sure

#

{'Age Group': {0: '13-14', 1: '15-16', 2: '17-18'}, 'Hours teaching per week': {0: 1, 1: 2, 2: 4}, 'Start_age': {0: 13, 1: 15, 2: 17}, 'End_age': {0: 14, 1: 16, 2: 18}}

#
df_hours['Range'] = [list(range(i, j+1)) for i, j in df_hours[['Start_age', 'End_age']].values]
lapis sequoia
#

i think .apply would be faster.

#

but I'm trying to find something better than that.

#

but yeah .apply would be faster than this.

#

!d pandas.Series.apply

arctic wedgeBOT
#

```py
Series.apply(func, convert_dtype=True, args=(), **kwargs)
```
Invoke function on values of Series.

Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
#

@lapis sequoia :x: Your eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 4, in <module>
003 |   File "/snekbox/user_base/lib/python3.10/site-packages/pandas/core/frame.py", line 8740, in apply
004 |     return op.apply()
005 |   File "/snekbox/user_base/lib/python3.10/site-packages/pandas/core/apply.py", line 688, in apply
006 |     return self.apply_standard()
007 |   File "/snekbox/user_base/lib/python3.10/site-packages/pandas/core/apply.py", line 812, in apply_standard
008 |     results, res_index = self.apply_series_generator()
009 |   File "/snekbox/user_base/lib/python3.10/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
010 |     results[i] = self.f(v)
011 | TypeError: <lambda>() missing 1 required positional argument: 'y'
lapis sequoia
#

will mess in bot commands

#

!e

import pandas as pd
d = {'Age Group': {0: '13-14', 1: '15-16', 2: '17-18'}, 'Hours teaching per week': {0: 1, 1: 2, 2: 4}, 'Start_age': {0: 13, 1: 15, 2: 17}, 'End_age': {0: 14, 1: 16, 2: 18}}
df = pd.DataFrame(d)
df['lst'] = df[['Start_age', 'End_age']].apply(lambda x: list(range(x.Start_age, x.End_age+1)), axis=1)
print(df.lst)
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 | 0    [13, 14]
002 | 1    [15, 16]
003 | 2    [17, 18]
004 | Name: lst, dtype: object
lapis sequoia
#

@graceful glacier this would be faster

#

i can't seem to find a better solution right now since this is not the usual operation we do in pandas.
there are hella vectorized methods but can't find one for this.
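
For comparison, a sketch that builds the same lists two ways on a cut-down copy of the frame from the chat. Row-wise `.apply(axis=1)` still calls a Python function once per row, so it is not vectorized either; whether it beats a list comprehension over `zip` is worth timing on the real data rather than assuming.

```python
import pandas as pd

df = pd.DataFrame({"Start_age": [13, 15, 17], "End_age": [14, 16, 18]})

# Plain list comprehension over the two columns; no per-row Series is built.
df["Range"] = [list(range(a, b + 1)) for a, b in zip(df["Start_age"], df["End_age"])]

# Row-wise apply produces the same lists.
via_apply = df[["Start_age", "End_age"]].apply(
    lambda r: list(range(r.Start_age, r.End_age + 1)), axis=1
)
print(df["Range"].tolist())
```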

graceful glacier
#

thanks for helping

solar phoenix
#

Hi
I started exploring concepts in AI and machine learning. I watched a workshop about Reinforcement Learning where we used the OpenAI Gym and used Q-Learning. I did some reading on approaching the MountainCar-v0 environment as well.

If I would like to do a side project at some point using an ML concept, do you advise me to continue to explore other topics before starting or do you think that I should try to do something with what I've learned about Reinforcement Learning as the base?

prime hearth
#

Are these the only algorithms you know for ML? Also side projects are to target a field of ML and showcase skills related to jobs in area

worldly dawn
solar phoenix
solar phoenix
prime hearth
#

Oh okay, i mean i never learned reinforcement learning but like a lot of the problems on Kaggle use other ML algos

#

Like NN or regression classification etc

#

And if building a ML project, the solution may require to try other ML algos for best accuracy

solar phoenix
#

Oh that's good to know

#

There are other workshops that I have access to about other topics

prime hearth
#

So you are on right track , yeah try learning bit more not too much though like there no need to learn all ML algos just a few so you have knowledge

#

This is just my opinion, but if you want to build an ML project, you need to find a problem (one you know, or one from Kaggle, etc.), and part of the ML lifecycle is trying other solutions that are appropriate

solar phoenix
frank moth
#

Hello, I've differenced a timeseries so that I could put it in an arima, after I got the fitted prediction I am trying to reverse the differencing by using cumsum but it does not seem to work, the prediction is shifted downward and I'm not sure how it became that way. Here's an image of my fitted model on top of the differenced original data

datatimeseries.diff(1)
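
A likely cause of the downward shift is that a cumulative sum of first differences only recovers the series relative to its first value, so that first value has to be added back. A small sketch in plain Python with made-up numbers; in pandas the equivalent would be roughly the cumulative sum of the differenced series plus the first original observation.

```python
from itertools import accumulate


def difference(series):
    """First differences: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Rebuild the original scale: x[t] = x[0] + sum of the diffs up to t."""
    return list(accumulate(diffs, initial=first_value))


original = [10.0, 12.0, 11.0, 15.0, 18.0]
diffs = difference(original)
only_cumsum = list(accumulate(diffs))      # offset by the missing first value
restored = undifference(original[0], diffs)
print(only_cumsum)
print(restored)
```

The same idea applies to fitted values: the cumulative sum of predicted differences needs a starting level from the original series.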
prime hearth
#

No problem and also when building ML project it be good idea to learn best practices and how to deploy ML model, usually this involves Docker and frameworks with python…

iron basalt
# solar phoenix Hi I started exploring concepts in AI and machine learning. I watched a workshop...

Reinforcement learning is one of the approaches. For AI I recommend knowing at least one algorithm from each of these: https://en.wikipedia.org/wiki/Machine_learning#Approaches

#

With that knowledge you should be able to come up with some ideas on how to tackle just about any problem.

#

There are others but these are the most commonly known / used.

#

Combining approaches is common and often required (for getting any decent results).

#

Note that in the models section of this page artificial neural networks seem to have the most stuff going on but that is in large part due to many models being labelled as such even though their connection to actual neural networks is near non-existent in many cases (other than nodes that feed into each other with some kind of "activation" value and parameters).

#

(Which is kind of cheating since pretty much all algorithms can be represented in some way by a (compute) graph (a good way to visualize it though))

solar phoenix
#

Thanks @iron basalt for this. Appreciate your help

iron basalt
#

Reinforcement learning on its own (tabular) does not give you much, it always needs something else to support it.

solar phoenix
#

Oh I see

iron basalt
#

It's in part due to RL being flawed (for another discussion though), and also because RL is kind of high level thing.

#

It being high level means it needs some other parts to do a bunch of work for it to make the problem more approachable.

#

And that typically involves one or more of the other approaches to ML/AI (some of them produce intermediate results which act as a simplified "view" of the problem and/or give some structure to make it easier (may even be problem-specific structure knowledge for best results)).

night gorge
#
```py
#Scaling data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# transform data
dfScaled = scaler.fit_transform(df[["satisfaction_level","last_evaluation","average_montly_hours","Age"]])

dfScaled =pd.DataFrame(dfScaled,columns=list(df[["satisfaction_level","last_evaluation","average_montly_hours","Age"]]))

dfScaled = df.drop(["satisfaction_level","last_evaluation","average_montly_hours","Age"],axis=1).append(dfScaled)

dfScaled.head()
```

All 4 columns are giving NaN when appending. Why is that and how to avoid it?
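
A likely explanation: `append` stacks rows, so the original rows end up with no values in the four scaled columns and the appended rows have no values in the remaining columns, which shows up as NaN. Joining column-wise on a shared index avoids that. The sketch below uses a tiny made-up frame and a hand-rolled standardization in place of `StandardScaler`; also note that `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is the safer choice either way.

```python
import pandas as pd

df = pd.DataFrame({
    "satisfaction_level": [0.2, 0.8, 0.5],
    "Age": [30, 40, 50],
    "left": [0, 1, 0],
})
num_cols = ["satisfaction_level", "Age"]

# Stand-in for scaler.fit_transform: an array with one row per original row.
scaled_values = (
    (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)
).to_numpy()

# Reuse the original index so rows line up, then join column-wise.
scaled = pd.DataFrame(scaled_values, columns=num_cols, index=df.index)
combined = pd.concat([df.drop(columns=num_cols), scaled], axis=1)
print(combined)
```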
lapis sequoia
#

in simple words?

tacit basin
lapis sequoia
#

here energies are scalar values, what should i use to show it? the boxes are simply matrices and last circles are neurons, but how to show energies part?

desert minnow
#

Hello all, I'm trying to build an ordinal classification model (basically ranking prediction). Is there a python library that has ordinal regression (rank)?

misty flint
#

ah i figured out my search issue

#

i think im going to take my web search/info retrieval class first

#

to understand more of the fundamentals in the field

#

before looking at state-of-the-art

pastel valley
#

earlystopping by definition is stopping the training if it observes that the performance of the model is not improving, right?
are there any drawbacks to using this? for example, i trained with 100 epochs and used earlystopping, and the training stopped at the 23rd epoch with an accuracy of 89%. is there a possibility that if i did not use earlystopping i could gain more accuracy?

agile cobalt
#

probably depends on which model you're using?
if it has reached the local (or global) minimum/maximum, further training wouldn't help much, but you could try modifying the learning rate

pastel valley
#

it means there is still that possibility that if you dont use earlystopping your model will learn more?

misty flint
#

earlystopping is used to mitigate overfitting. thats why we use it in the first place. if you think your model wont overfit, i guess you could not use earlystopping.
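
To make the trade-off concrete, here is a sketch of a patience-based stopping rule in plain Python, with made-up validation losses. It shows that a later improvement can be missed if patience is short. In Keras the corresponding knobs on the `EarlyStopping` callback are `patience` and `restore_best_weights`; the function below is only an illustration of the rule, not Keras code.

```python
def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, for a patience rule."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the waiting period.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.40, 0.35]
stop, best = early_stopping(losses, patience=3)
print(stop, best)
```

With `patience=3` the run stops before the later drop in loss is ever seen; a larger patience would have reached it, at the cost of more training time.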

pastel valley
#

how is this performance? is it over fitting weird bad or ok?

#

based on the tutorials i see the patterns they get is pretty good like steady increase no big spikes like that

tacit basin
pastel valley
tacit basin
pastel valley
tacit basin
pastel valley
#

yes

#

this one gives me the best state of the model when earlystopped? did i understand it correctly?

tacit basin
pastel valley
#

btw also another question aside from earlystopping
resnet final layer is 1000 fc softmax layer so if i plan to add another layer then the number of units should be less than 1k?

tacit basin
pastel valley
tacit basin
pastel valley
pastel valley
#

btw i just use summary() in the resnet50 on keras

#

it doesnt include the fc on keras right?

#

the architecture i see on google it has 1000 fc

#

this is the resnet50 right? what flops mean?

neat anvil
#

In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate measure than measuring instructions per second.

#

I'm guessing they're saying that's the required FLOPs to perform some constant number of predictions with that model architecture per second

#

over the different model architectures

pastel valley
#

oh thats for efficiency maybe

pastel valley
tacit basin
lapis sequoia
#

is this the right place to ask query on .fits image

serene scaffold
#

@lapis sequoia I don't know what that is, but you can ask and cut/paste your question to the correct channel once we have enough information to ascertain what it's about.

lapis sequoia
#

I have a folder in which I have 500 .FITS images. These images are opened using `from astropy.io import fits` (just for information). I have written code which reads through the header of each image and measures a desired angle parameter. The range of the angle needed for my study is 60 deg. My question is: how can I delete the image if the condition is not met?

#

I can write an if condition in a loop. But how exactly can the .FITS image be deleted?

chrome ferry
#

Hi, what degree you need to become a data scientist?

lapis sequoia
#

is this question meant for me?

#

well im just a beginner.

tacit basin
chrome ferry
#

A question for anybody

serene scaffold
#

@lapis sequoia you want to actually delete the file from your computer's hard drive? you can `import os` and use `os.remove("path/to/file")`.

lapis sequoia
tall crest
#

Hi, i was bored this afternoon and started making chess in order to make an ai for it, i am a beginner at python but i still really would love to code a genetically improving ai, i know i will probably fail terribly but does anyone have some tips for me? (i am not looking to use libraries)

serene scaffold
tall crest
#

yes, of course i use numpy and stuff if i were to need it

#

i meant i did not want to use neat or something

serene scaffold
#

anyway, what do you mean by "genetically improving"?

tacit basin
lapis sequoia
#

Those files do contain some other important information which I may need in case. So wanted to save them separately

tacit basin
lapis sequoia
#

delete and save them in a different location

tacit basin
lapis sequoia
#

my aim is to remove the un-necessary image which does not fulfil the condition, so that the main folder consists of only the correct images which I will use for the next image processing. But we need few information from the deleted image, which will be useful in future.

lapis sequoia
tacit basin
lapis sequoia
tacit basin
stone marlin
#

Yeah, follow miwojc's code to move them, then maybe use something like regex or some kind of subsetting to get the information out that you want. Either way, you might want to ask this in one of the other help rooms since this isn't related to data science.
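
A sketch of the "move instead of delete" idea using only the standard library. The `keep` argument is a hypothetical callable that would open the file's header (for example with astropy) and return True when the angle condition is met; the folder names are placeholders too.

```python
import shutil
from pathlib import Path


def move_rejected(folder, rejected_folder, keep):
    """Move every .fits file for which keep(path) is False into rejected_folder."""
    folder, rejected_folder = Path(folder), Path(rejected_folder)
    rejected_folder.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(folder.glob("*.fits")):
        if not keep(path):
            # The file stays available in the other folder for later use.
            shutil.move(str(path), str(rejected_folder / path.name))
            moved.append(path.name)
    return moved
```

Note that the `*.fits` pattern is case-sensitive on some systems, so files ending in `.FITS` may need their own pattern.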

fierce dawn
#

hi guys, do you know if it's possible to create something which is able to detect a hand? for example i pause a video and the software is able to take characteristics of the hand and recognise that it's my hand?

mild dirge
#

Depends on how many different hands it must be able to distinguish

#

If it's between a black and a white hand, and the lighting conditions don't change, maybe 😛

#

But think that it would be quite hard to classify hands

#

@fierce dawn

misty flint
#

very interesting

iron basalt
#

In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random access memory.
The difference between the orders lies in which elements of an array are contiguous in memory. In row-major order, the consecutive elements of a row reside next to each other, whereas the same hold...

#

In that image the row major is stored as [a_11, a_12, a_13, a_21, a_22, a_23, a_31, a_32, a_33].

#

Accessing any (row, col) is index = col + row * num_columns (num_columns may also be called row_length).

#

(So the "stride" is (3, 1))

#

((num_columns, 1))
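
The index formula above, written out as a sketch in plain Python. Strides are counted in elements here (not bytes), and the helper names are made up; the general N-dimensional case is the sum of each index times its stride.

```python
def flat_index(index, strides):
    """Offset into the flat storage: sum of index[k] * strides[k]."""
    return sum(i * s for i, s in zip(index, strides))


def row_major_strides(shape):
    """Strides in elements for row-major storage of the given shape."""
    strides = []
    step = 1
    # The last axis is contiguous; each earlier axis skips a whole block.
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


storage = ["a_11", "a_12", "a_13", "a_21", "a_22", "a_23", "a_31", "a_32", "a_33"]
strides = row_major_strides((3, 3))
element = storage[flat_index((1, 2), strides)]  # row 1, col 2, zero-based
print(strides, element)
```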

misty flint
#

yeah its interesting

serene scaffold
#

what is a stride?

iron basalt
#

For N-dimensions see the bottom of the wikipedia page.

#

"Address calculation in general"

misty flint
#

we have to implement these functions for minitorch

iron basalt
#

So you can either do something like two for loops for row, col, and compute the index in the inner most loop based on row and col (above eq). Or you can create a pointer (or index) pointing to the first element and in the outer loop you would do +3 to the pointer while in the inner +1. Hence the "strides" of (3, 1).

#

Well it's two pointers.

#

One points to the start of each row.

#

The inner one then gets set to that and goes +1 each iteration.
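
Both iteration styles from the messages above, as a sketch in plain Python with integer positions standing in for pointers. The function names are made up for illustration.

```python
def iterate_by_formula(storage, num_rows, num_columns):
    """Compute the flat index from (row, col) in the innermost loop."""
    out = []
    for row in range(num_rows):
        for col in range(num_columns):
            out.append(storage[col + row * num_columns])
    return out


def iterate_by_offsets(storage, num_rows, num_columns):
    """Advance two positions instead: +num_columns per row, +1 per column."""
    out = []
    row_start = 0
    for _ in range(num_rows):
        position = row_start
        for _ in range(num_columns):
            out.append(storage[position])
            position += 1
        row_start += num_columns
    return out


storage = list(range(9))
print(iterate_by_formula(storage, 3, 3) == iterate_by_offsets(storage, 3, 3))
```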

misty flint
iron basalt
#

They are equivalent.

misty flint
#

yeah

iron basalt
#

If you think that the first is more computation you need only realize that you can move the row * num_columns to the outer loop and it's the same then.

#

(Optimizer will probably do that)

misty flint
#

interesting

iron basalt
#

The difference between a pointer and index is that an index is relative to the start of the array while a pointer is relative to "0".

#

Both are "pointers".

#

However, depending on what you are doing the pointer method can be nicer.

#

But it's still the same.

#

In numpy and pytorch, etc it can be sometimes nice to hack the stride values to do some other computation (aka stride_tricks).

misty flint
#

it wants us to do tensor map, zip, and reduce functions

neat anvil
#

I'd be careful throwing the word pointer around when discussing low level data structures

#

You're gonna hurt someone's brain

misty flint
neat anvil
#

If you're not talking about actual pointers to locations in RAM

misty flint
#

dw my brain is already broken

iron basalt
misty flint
#

anyway i guess its interesting learning how these libraries kinda work

#

not that ill be really using that knowledge i guess

iron basalt
#

Numpy is implemented in this way. I have read its source code to confirm.

misty flint
#

i thought the visuals were super helpful

#

havent actually looked at its source code tho

#

so i will have to check

iron basalt
#

Yea, broadcasting made sense for me when I read the pair-wise distance calculation (I think it was that IDR).

misty flint
#

yeah def gonna read up on all this again before trying to implement

iron basalt
#

(Pro-tip fast k-trees are implemented with a contiguous array also, but the indexing is a bit more complicated)

#

(Using nodes that are made separately rather than all in one array is slow (but still the way it's often taught to beginners))

#

(So you could store a binary tree where each node contains an int in a numpy array (and make a fast search, etc with numba))
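
One simple way to picture a tree in a contiguous array is the implicit layout where the children of slot `i` live at `2*i + 1` and `2*i + 2`. The sketch below uses a plain Python list and a binary search tree built from sorted values; it illustrates the indexing idea only and is not the specific structure mentioned above.

```python
def build_bst_array(sorted_values):
    """Lay out sorted values as a binary search tree inside one flat list."""
    n = len(sorted_values)
    tree = [None] * n
    values = iter(sorted_values)

    def fill(i):
        if i >= n:
            return
        fill(2 * i + 1)          # left subtree gets the smaller values
        tree[i] = next(values)   # in-order position receives the next value
        fill(2 * i + 2)          # right subtree gets the larger values

    fill(0)
    return tree


def contains(tree, target):
    """Walk from the root using index arithmetic instead of node objects."""
    i = 0
    while i < len(tree):
        if tree[i] == target:
            return True
        i = 2 * i + 1 if target < tree[i] else 2 * i + 2
    return False


tree = build_bst_array([1, 2, 3, 4, 5, 6, 7])
print(tree)
```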

iron basalt
# serene scaffold what is a stride?
>>> a = np.arange(16).reshape((4, 4))
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
>>> a.shape
(4, 4)
>>> a.strides
(32, 8)
>>> a.dtype
dtype('int64')
>>> b = a[:, ::2]
>>> b
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10],
       [12, 14]])
>>> b.shape
(4, 2)
>>> b.strides
(32, 16)
>>> b.dtype
dtype('int64')
>>> 
#

Note that numpy reports strides in bytes, so divide by the item size to get element steps (8 // sizeof(int64) = 1).

#

So it's actually (4, 1) and (4, 2).

#

When I slice the first array with a step of 2 (aka stride of 2), the shape gets smaller but the stride gets bigger because the slice is still referencing the original, no copy was made. It's just stepping/striding across it differently (skipping some).
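
The same session as a short script, with the byte strides converted to element steps and a check that the slice is a view on the original data. The dtype is set explicitly so the numbers match on platforms where the default integer is not 64-bit.

```python
import numpy as np

a = np.arange(16, dtype=np.int64).reshape((4, 4))
b = a[:, ::2]

# NumPy reports strides in bytes; dividing by the item size gives element steps.
a_steps = tuple(s // a.itemsize for s in a.strides)
b_steps = tuple(s // b.itemsize for s in b.strides)

# The slice references the same buffer, so no copy was made.
shares = np.shares_memory(a, b)
b[0, 1] = -1  # writes through to a[0, 2]
print(a_steps, b_steps, shares, a[0, 2])
```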

#

(When any numpy function operates on a numpy array, in its loops it uses the array's (broadcasted) strides)

#

(Written in a way where the algorithm does not need to worry / care about how to correctly iterate over the arrays, it's encapsulated in an iterator / generator which makes use of the shape and stride information)

#
>>> for v in np.nditer(a):
...     print(v)
... 
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#

The C code internally looks kind of like that for a lot of the operations (it does not need to know / need to deal with the shape/strides directly which lets a single function work for N dimensions / no duplicate code (one impl for 1D, 2D, 3D, slice with step>1, etc)).

marble tulip
#

I am using the Titanic dataset and I want to see whether the people with the highest fare survived or not. How can I check that? df.Fare.value_counts().max() gives 43
How can I check this with the Survived column?
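
A sketch of one way to look at this, on a tiny made-up frame with the same `Fare` and `Survived` column names. Note that `value_counts().max()` returns how many passengers share the most common fare, not the highest fare itself, so it does not answer the question directly.

```python
import pandas as pd

# Tiny stand-in for the Titanic table; the real one has more rows and columns.
df = pd.DataFrame({
    "Fare": [7.25, 512.33, 8.05, 512.33, 71.28],
    "Survived": [0, 1, 0, 1, 1],
})

# Passengers who paid the single highest fare, and their survival rate.
top = df[df["Fare"] == df["Fare"].max()]
survival_rate_top = top["Survived"].mean()

# Alternatively, the n highest-paying passengers.
top3 = df.nlargest(3, "Fare")
print(top)
print(survival_rate_top)
```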

tacit basin
tacit basin
short heart
#

is it useless to look at correlation between categorical variables?

pastel valley
#

the top of the models are the inputs and outputs?

tacit basin
tacit basin
pastel valley
tacit basin
gloomy anvil
#

I have a question regarding single-step and multi-step predictions in the SARIMAX model. I posted my questions here to stackoverflow: https://stackoverflow.com/questions/71392886/legacy-code-is-this-one-step-ahead-prediction-can-i-turn-it-into-multistep-pre
My question is if this is in fact a single step prediction and how to interpret the model.predict() parameters

lone drum
#

my current code
```python
mydb = MySQLdb.connect( host="localhost", user="root", password="covid2020",database= f"{db_name}")
query = 'INSERT INTO `2020_1_min_8_noida_data` (unique_id_for_symbol ,timestamp, open, high, low, close, volume, full_candle, value) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)'
mycursor = mydb.cursor()
for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}' , engine='python', chunksize=5000000 , iterator=True)):
    print('i=', i)
    # all_value = []
    for row in chunk.iterrows():
        print('row\n', row)
        value=(row[1][0], row[1][1], row[1][2], row[1][3],row[1][4], row[1][5], row[1][6], row[1][7], row[1][8])
        # all_value.append(value)
        mycursor.execute(query, value)
mydb.commit()
```
i am not getting data inserted in mysql table

#

i am getting empty rows in database table

serene scaffold
solar phoenix
#

Hi, I want to take a standard deviation from a pandas dataframe, and then perform an action if it is larger than another value. When i do this i get an error- "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." I have a feeling that i need to define the standard deviation as a number of some kind, at the moment i do it like this: stand=lastinframe.std(axis=1)
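
The error usually means an `if` statement received a whole Series rather than a single True or False. With `axis=1`, `std` returns one value per row, so the comparison has to be reduced to a single boolean or used as a row filter. A sketch with made-up numbers and a hypothetical `threshold`:

```python
import pandas as pd

lastinframe = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 6.0]})
threshold = 1.5

stand = lastinframe.std(axis=1)          # a Series: one value per row

any_large = (stand > threshold).any()    # True if at least one row exceeds it
row0_large = stand.iloc[0] > threshold   # compare a single row's value
large_rows = lastinframe[stand > threshold]  # or keep only the matching rows

if any_large:
    print(large_rows)
```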

serene scaffold
#

I'll probably need additional material to answer it, but those two are the bare minimum.

#

Please ping me if you decide to show that information.

#

!traceback

arctic wedgeBOT
#

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

serene scaffold
#

please don't ping people to draw attention to your question.

pastel valley
#

thank you sir

pastel valley
#

in this code, are model A and model B just identical models? or if i train modelA, will modelB also learn?

#

or they will be identical models all from architecture weights and compile info ?

#

do these 2 snippets produce the same outcome?

resnet50_model = ResNet50(include_top=False,
                   input_shape=(144,144,3),
                   pooling='max',classes=6,
                   weights=None)

modelA = Sequential()

modelA.add(resnet50_model)

modelA.add(Flatten())
modelA.add(Dense(1024, activation='relu'))
modelA.add(Dropout(0.5))
modelA.add(Dense(512, activation='relu'))
                   
modelA.add(Dense(6, activation='softmax'))

modelA.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)



modelB = Sequential()

modelB.add(resnet50_model)

modelB.add(Flatten())
modelB.add(Dense(1024, activation='relu'))
modelB.add(Dropout(0.5))
modelB.add(Dense(512, activation='relu'))
                   
modelB.add(Dense(6, activation='softmax'))

modelB.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)


modelB.set_weights(modelA.get_weights())
tidal bough
pastel valley
#

its like i can just save the model and try to clone it and do something and if its better then i save it then create a clone again

#

its like i always keep a copy that i can retrieve whenever something bad happened

tidal bough
#

oh thats the term "linked"
not a technical term, I made it up

tidal bough
pastel valley
pastel valley
stone marlin
#

No question, just excited because my new gig got me a jetbrains license so now I get to try out PyCharm, and use DataGrip. :'] Exciting.

tacit basin
#

Something to remember though is when you use any schedulers for training, like learning rate for example: make sure the scheduler state is saved and loaded with the checkpoint, otherwise you will start with a high lr if something like cosine annealing is used.
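
A framework-free sketch of why that matters. The `cosine_lr` helper, the JSON checkpoint layout and the step counts are all assumptions for illustration; a real checkpoint would hold model and optimizer state too.

```python
import json
import math
import os
import tempfile

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine annealing: lr_max at step 0, decaying to lr_min at total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

total_steps = 100
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

# After 60 steps, store the schedule position alongside everything else.
step = 60
with open(path, "w") as f:
    json.dump({"step": step, "total_steps": total_steps}, f)

# Resuming WITHOUT the saved step restarts the schedule at the high initial rate.
lr_fresh = cosine_lr(0, total_steps)

# Resuming WITH the saved step continues from the already-decayed rate.
with open(path) as f:
    state = json.load(f)
lr_resumed = cosine_lr(state["step"], state["total_steps"])

print(lr_fresh, lr_resumed)
```

The resumed rate is roughly a third of the fresh one here, which is the jump the message above warns about.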

tacit basin
stone marlin
#

I mainly use it as a "database IDE" --- it scans our DBs and allows for autocompletes, common (macro) query completions, and a bunch of other cool stuff.

#

I don't think it would read jupyter notebooks --- that might be PyCharm's thing, but I honestly have no idea, I've only used the Jupyter notebook as a standalone. :']

brave sand
#

so I did this tutorial, are there any project ideas that use the same concepts?

misty flint
#

it almost sounds like a data engineering tool

stone marlin
#

Haha, I'm in the data engineering dept, technically, so that tracks. I'd say it's exactly that.

lapis sequoia
#

Hello I have a conda environment that contains the following: https://www.toptal.com/developers/hastebin/ujarotijeg.md

I would like to install my existing project as a package in development mode within my conda environment by running python setup.py develop. The thing is that I'm new to packaging and I'm a bit lost on how to create setup.py containing all the information of my conda environment (dependencies, name, etc...).

#

Once I installed conda-build, conda comes in with a command called conda develop which is supposed to do the exact same thing I describe above, but according to what I read it has not seen any development lately. I'm trying to figure out the best way to have a properly set up package within my environment that allows me to keep developing and reflect new changes.

stone marlin
#

This is not really ds and ai, you may want to post in the standard help rooms.

lapis sequoia
#

Thank you

craggy tiger
#

Hey folks, looking for interesting projects. Anyone on something?

vague kindle
#

How would one go about saving a model and using that model for other projects?

merry ridge
#

Does taking 30 seconds to load in 27045 rows and 99 columns of an .xlsx into a dataframe by calling pd.read_excel sound reasonable? I mainly ask because this workstation has been having a lot of other computer issues and I have lost all sense for if this is within acceptable bounds or not.

urban prism
#

I wanna calculate my metrics after post processing my initial predictions
I have a validation data generator that reads the files from the disk and returns X,Y. Normally I use model.evaluate(validation_generator) in order to calculate my metrics though now I have a function post_process() that is supposed to take in X and return the processed, new X (And later on calculate metric(X_new, Y)). How can I go with this?

agile cobalt
merry ridge
#

That's not really a hill I am willing to die on. I get what I get and have to deal with it

merry ridge
iron basalt
merry ridge
#

I have no idea how I would even check that

iron basalt
merry ridge
#

Windows 10

merry ridge
#

I can't run that command because of a lack of administrative privileges. It takes about 4 seconds from excel closed to fully opening the file and being able to manipulate it, for whatever that is worth

iron basalt
#
Remarks

    Membership in the local Administrators group, or equivalent, is the minimum required to use winsat. The command must be executed from an elevated command prompt window.

    To open an elevated command prompt window, click Start, click Accessories, right-click Command Prompt, and click Run as administrator.
merry ridge
#

I do not have access to any administrative credentials

iron basalt
#

Time to get an admin then.

iron basalt
#

Anything above is very slow.

merry ridge
#

Alright, that's helpful thank you

hardy jetty
#

Is there a way to easily turn a 3d plot into a 2d plot with matplotlib / seaborn? (e.g. top down or side view)?

merry ridge
#

You could just set whatever coordinate you want to 0, but if you want to do a projection onto an arbitrary plane I would imagine there would be work involved.

hardy jetty
#

hmm

merry ridge
#

In a usual right hand coordinate system a top down view would just be setting the z coordinate to be 0 for all your data
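
A small sketch of both routes, assuming a made-up helix as the curve: either point the 3D camera straight down with `view_init`, or drop one coordinate and draw an ordinary 2D plot. The `Agg` backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 4 * np.pi, 200)
x, y, z = np.cos(t), np.sin(t), t   # hypothetical curve in 3D

fig = plt.figure(figsize=(9, 3))

# Route 1: keep the 3D axes but look straight down the z axis.
ax3d = fig.add_subplot(1, 3, 1, projection="3d")
ax3d.plot(x, y, z)
ax3d.view_init(elev=90, azim=-90)

# Route 2: drop a coordinate and plot the remaining two in 2D.
ax_top = fig.add_subplot(1, 3, 2)
ax_top.plot(x, y)    # top-down view: z ignored
ax_side = fig.add_subplot(1, 3, 3)
ax_side.plot(x, z)   # side view: y ignored
```

Route 2 is the "set the coordinate to 0" idea from the message above, just drawn on flat axes.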

iron basalt
#

(convert to csv, pandas reads csv files much faster)

hardy jetty
iron basalt
hardy jetty
iron basalt
hardy jetty
#

im unable to view from the side or top, trying to figure that out, its just 1 function plotted in 3d space atm

iron basalt
hardy jetty
#

with matplotlib?

#

really?

iron basalt
#

When you call show.

hardy jetty
#

im calling plt.show() but its just an png ;p

iron basalt
#

Show code.

misty flint
stone marlin
#

Yeah, if you can try it out, do so. It's way better than pgadmin, and I'm a fan of pgadmin, haha.

misty flint
#

oh yeah? i def want to take a looksee

#

but you know how data engineering is, so many tools and toys

#

coming out all the time

stone marlin
#

Yeeeeeeep. It's honestly very difficult to keep track of. We've got some tooling that's only 3 years old and it's already deprecated.

misty flint
#

oh no

#

sounds about right tho

stone marlin
#

But the gist of all the stuff is usually the same. Might not be using kafka, but it's always some kind'a streaming thing; might not be using k8s but some kind of container orchestration thing --- so it's not so bad, but, man, is it intimidating at first.

misty flint
#

yeah its def interesting from an outsiders perspective; im still mostly in DS world

#

but i like looking at adjacent fields

#

and exploring to see/gauge my interest

stone marlin
#

Yep! I'm still doing DS stuff, but I'm mostly in an adjacent field now that "enables" DS to do their work better ("Machine Learning Engineering") and it's pretty cool. Part DS, part DE.

misty flint
#

nice nice

#

have you seen the MLOps stuff

#

by whats his name

#

demetrios brinkmann

stone marlin
#

The practice of MLOps, or is there a tool called MLOps?

misty flint
#

practice

#

he just hosts the MLOps community meetups

stone marlin
#

Haha, yeah, that's essentially my job. So, we work in a similar way to the standard google whitepaper. I haven't actually seen the community meetups, but I'll check them out now!

misty flint
#

interesting interesting

#

yeah he has a podcast that ive also been listening to

#

the podcasts are basically past meetup speakers

stone marlin
#

This looks very nice! There's not as many resources for MLOps as there are for Devops (even if many overlap) so it would be nice to join up and see.

misty flint
#

they do have a huge community tho

#

with tons of MLOps peeps

#

at least thats what he said on Ken Jee's podcast

#

they even have a section where they discuss various tools and comparing them

#

which honestly sounds super useful

#

lol

#

i imagine if i was in that world

#

def something i want to explore but i def need some cloud experience first

stone marlin
#

Huh, well, they have a slack, so I'll check that out and see. On one hand I was surprised they didn't have a discord, but --- on the other hand, maybe it makes sense, haha.

misty flint
#

haha yeah

stone marlin
#

Oh, def. My recommendation for that, and I feel like a popular rec, is the Cloud Guru series for AWS Practitioner. It took --- a fairly long time to go through, but it was 100% worth it.

#

It's fairly hands-on, but you learn a ton about AWS services (which are basically the same, modulo the names, as GCP and Azure ones --- you can pick those up on the job if you got AWS) and, maybe more important, the terms to communicate with devops people about what you might need, haha.

misty flint
#

more on the dev side

stone marlin
#

There are serverless things, but I'd start on the general Cloud Practitioner one asap, then either at the same time or after, check out the serverless dealies. Your job might even let you expense the monthly fee for A Cloud Guru for a few months.

#

There might be free resources of the same quality, but I've not found them yet. :''[

misty flint
#

interesting interesting

#

yeah they actually seem amenable to that idea

#

well at least i think so

#

yeah ill def check it out

#

thanks bud

#

if i break into ML Engineering, ill let you know lol

#

come back in 5 years 💀

stone marlin
#

Haha, no problem, def check out the AWS stuff (there might be a free month? idk.) since that's stuff I wish that I had done earlier. :']

glass minnow
#

🔴 what is similarity score(XgBoost) and why we use it can someone explain ?

stone marlin
#

Also, this MLOps slack channel is super professionally done. Thanks for pointing me in this direction, it's something I'm gonna chat around in and check out.

misty flint
pastel valley
lone drum
serene scaffold
#

@lone drum I'm not interested to help if you're not going to use the method I referred you to.

lone drum
# serene scaffold <@680099760836968475> I'm not interested to help if you're not going to use the ...

i tried the code u shared ```python
Traceback (most recent call last):

File "C:\Users\Admin\AppData\Local\Temp/ipykernel_11872/466920832.py", line 9, in <module>
df.to_sql('2020_1_min_8_noida_data_new', con=engine)

File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\core\generic.py", line 2963, in to_sql
return sql.to_sql(

File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\sql.py", line 697, in to_sql
return pandas_sql.to_sql(

File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\sql.py", line 1726, in to_sql
table = self.prep_table(

File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\sql.py", line 1625, in prep_table
table.create()

File "C:\Users\Admin\anaconda3\lib\site-packages\pandas\io\sql.py", line 830, in create
raise ValueError(f"Table '{self.name}' already exists.")

ValueError: Table '2020_1_min_8_noida_data_new' already exists.```

#

my code ```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite://', echo=False)

for i, chunk in enumerate(pd.read_csv('E:/latest_data_noida/2020_1_min_8-Dec.csv', engine='python', chunksize=5000000, iterator=True)):
    print('i=', i)
    df = chunk
    df.to_sql('2020_1_min_8_noida_data_new', con=engine)``` this way

pastel valley
#

is there a keras way to output total training time?

lone drum
pastel valley
#

what is the difference here?

#

i know average and max pooling

#

but in order to use the output for dense layer is should be 1d right?

#

what is global pooling?

#

flatten makes it 1d but the global pooling?

stone marlin
#

For Maddy, I think it might be the case (?? maybe? I couldn't track down the error.) that if you're running in a jupyter notebook, you're accidentally trying to remake a table in memory that you already have by running the .to_sql command. You may be able to reset the jupyter notebook and try again.

Either way, here's some example code that shows how to make a table and query it. This is like one single chunk of a bigger df.

import numpy as np
from sqlalchemy import create_engine
import pandas as pd

# Sample dataframe.
a = np.random.rand(1000)
b = np.random.randint(-100, 100, size=1000)
c = np.random.choice(list("abcdefghij"), size=1000)

data_bundle = {"a": a, "b": b, "c": c}
df = pd.DataFrame(data_bundle)

# Create the engine in memory.
engine = create_engine('sqlite://', echo=False)

# Create the table using this context.
with engine.connect() as con:
    df.to_sql("cool_table", con=con)

# Sample query using this context.
with engine.connect() as con:
    df_results = pd.read_sql("select * from cool_table where c = 'j'", con=con)
lone drum
# stone marlin For Maddy, I think it might be the case (?? maybe? I couldn't track down the e...

can u please help me in my code of inserting dataframe in table ```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite://', echo=False)

for i, chunk in enumerate(pd.read_csv('E:/latest_data_noida/2020_1_min_8-Dec.csv', engine='python', chunksize=5000000, iterator=True)):
    print('i=', i)
    df = chunk
    df.to_sql('2020_1_min_8_noida_data_new', con=engine, if_exists='append')```

stone marlin
#

What is the error you're getting now?

lone drum
stone marlin
#

The code above is all you have? So, this doesn't even get to print i=?

#

My gut here is telling me that if you change the name of the table to start with a letter instead of a number, this error will go away. For example, noida_data_new_2020_1_min_8.

lone drum
#

so i terminated the code

#

and run the above code which gives error

stone marlin
#

What above code?

lone drum
#

this gave an error

stone marlin
#

Alright, so --- I know you said that you queried this above, but when asking a question, please try to tell the person what you're doing exactly and what the error is from. For example, I had no idea that 1) you terminated the script in the middle of it running, 2) what code you ran to execute your SQL, 3) any context for the error you ran into and what script it came from.

This is so we can answer your questions easier.

#

If you could, try engine.execute("SELECT * FROM '2020_1_min_8_noida_data_new'").fetchall()

lone drum
# stone marlin If you could, try `engine.execute("SELECT * FROM '2020_1_min_8_noida_data_new'")...

i am getting ```python
Traceback (most recent call last):

File "C:\Users\Admin\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1771, in _execute_context
self.dialect.do_execute(

File "C:\Users\Admin\anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 717, in do_execute
cursor.execute(statement, parameters)

OperationalError: no such table: 2020_1_min_8_noida_data_new```

stone marlin
#

When you create the table in the script, it only lasts in memory for the duration of the script. So, you might want to create a named db that isn't in memory so you can access that.

#

This will save it as a file and you'll be able to access it, even if you interrupt the script.
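
A sketch of the difference using only the standard-library `sqlite3` module. The file name and the two-column schema are made up for illustration. With SQLAlchemy the equivalent would be an engine URL that names a file, such as `sqlite:///some_file.db`, instead of the bare `sqlite://`.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "noida_data.db")  # hypothetical file name

# First "run": create the table and commit one chunk of rows.
con = sqlite3.connect(db_path)
con.execute('CREATE TABLE IF NOT EXISTS "2020_1_min_8_noida_data_new" (ts TEXT, close REAL)')
con.executemany(
    'INSERT INTO "2020_1_min_8_noida_data_new" VALUES (?, ?)',
    [("2020-01-01 09:15", 101.5), ("2020-01-01 09:16", 101.7)],
)
con.commit()   # rows committed before this point survive an interrupted script
con.close()

# Second "run": a brand-new connection still sees the committed rows.
con = sqlite3.connect(db_path)
rows = con.execute('SELECT COUNT(*) FROM "2020_1_min_8_noida_data_new"').fetchone()[0]
con.close()

# An in-memory database, by contrast, starts empty on every new connection.
mem = sqlite3.connect(":memory:")
tables = mem.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
mem.close()

print(rows, tables)
```

Note the double quotes around the table name: they let SQLite accept a name that starts with a digit.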

#

Do not DM, please keep things in public chat.

#

@lone drum Please, do not DM, keep things in public chat.

somber prism
#

guys will the pytorch pretrained object detection give a good accuracy if i use it to fine tune on pascal voc format cuz it says its trained on coco dataset ? coco format : xmin ymin H W pasvoc format : xmin ymin xmax ymax

lone drum
# stone marlin This will save it as a file and you'll be able to access it, even if you interru...

my code this way python db_name = 'backtest_data' table_name = '2020_1_min_8_noida_data' mydb = MySQLdb.connect( host="localhost", user="root", password="covid2020",database= f"{db_name}") query = 'INSERT INTO `2020_1_min_8_noida_data` (unique_id_for_symbol ,timestamp, open, high, low, close, volume, full_candle, value) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)' mycursor = mydb.cursor() for i, chunk in enumerate(pd.read_csv(f'{path}{file_name}{extension}' , engine='python', chunksize=5000000 , iterator=True)): print('i=', i) for row in chunk.iterrows(): value=(row[1][0], row[1][1], row[1][2], row[1][3],row[1][4], row[1][5], row[1][6], row[1][7], row[1][8]) mycursor.execute(query, value) mydb.commit()
i terminated code in middle to check data is inserted in database table or not but when i use select * from table i am not getting rows data

#

select query gives connection error

#

plz check this

stone marlin
#

Okay, so now you're doing it a different way not using pandas to construct the db. I recommend using pandas for this --- for example, like this https://pythontic.com/pandas/serialization/mysql --- since it'll be a lot easier. I cannot read that error, and I've got to go to bed. Perhaps someone else here can help out.

lone drum
#

do u get my point here @stone marlin

tacit basin
# pastel valley how to decide learning rate? is it also trial and error ? is it ok to use the de...

you can use a learning rate finder to get a better idea of which learning rate to use. Adam is an optimizer, so like Gradient Descent but 'improved'. There are others like AdamW etc. Depends on data and problem; for image classification with resnet i think adam or adamw are still good choices.
Regarding learning rate, you can choose to use a constant learning rate and then manually lower it, or you can use learning rate schedulers, for example cosine annealing, which will apply a different learning rate (lower after warmup) with each epoch.
these optimizers are available in keras:
SGD
RMSprop
Adam
Adadelta
Adagrad
Adamax
Nadam
Ftrl

tacit basin
# pastel valley what is the difference here?

Global Average Pooling is a pooling operation designed to replace fully connected layers in classical CNNs. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and...
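
To make the flatten vs. global pooling difference concrete, here is a NumPy sketch. The 5x5x256 feature-map shape is an assumption, not taken from the model discussed above. Both operations produce a 1D vector that a dense layer can take; they differ in how many values survive.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((5, 5, 256))   # assumed (height, width, channels) conv output

flat = feature_maps.reshape(-1)          # Flatten: keeps every value -> 5*5*256 numbers
gap = feature_maps.mean(axis=(0, 1))     # global average pooling: one mean per channel
gmp = feature_maps.max(axis=(0, 1))      # global max pooling: one max per channel

print(flat.shape, gap.shape, gmp.shape)
```

So global pooling hands the dense layer 256 inputs instead of 6400, which means far fewer weights in that first dense layer.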

tacit basin
naive relic
#

Output:-
"'Led by Woody", " Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart", ' Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner'," the duo eventually learns to put aside their differences.'"

#

Help me

river maple
#

I've made a counting object program using yolov4 but it counts in real time. I want to keep the counting number even after the object moves away from the camera.

terse oracle
#

Hello, I have to build a machine learning classifier for this data, any tips on how to begin on what classifier should I use?

tacit basin
tacit basin
tacit basin
terse oracle
#

I am kinda confused in which steps do I need to start with, so it would be great if you could maybe recommend what steps should I do first.

tacit basin
#

let me know if you need help with it?

tacit basin
terse oracle
lapis sequoia
#

multilabel naive bayesian

terse oracle
lapis sequoia
#

just see which one is most probable.

terse oracle
lapis sequoia
#

this place is not advertisement, kindly remove the post.

zenith bison
#

sorry

lapis sequoia
#

probabilities are things you can count.
like if
the word good is in 10 records and 3 out of them are class 0, then p(class0 | good) is 3/10
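
A counting sketch on made-up records, to keep two quantities apart. The 3/10 in the example is the share of "good" records that belong to class 0. The likelihood a naive Bayes classifier multiplies together divides by the number of class 0 records instead.

```python
# Hypothetical (text, label) records: 10 contain "good", 3 of those are class 0.
records = (
    [("good", 0)] * 3 + [("good", 1)] * 7 +
    [("bad", 0)] * 3 + [("bad", 1)] * 1
)

labels_with_good = [label for text, label in records if "good" in text.split()]
class0_texts = [text for text, label in records if label == 0]

# Share of "good" records that are class 0: 3 / 10
p_class0_given_good = labels_with_good.count(0) / len(labels_with_good)

# Share of class 0 records that contain "good": 3 / 6
p_good_given_class0 = sum("good" in t.split() for t in class0_texts) / len(class0_texts)

print(p_class0_given_good, p_good_given_class0)
```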

terse oracle
modest shuttle
#

What is Pose Estimation?

gray iron
#

Anyone interested in Google Summer of Code and that too in ML Based Open Source Organizations

#

Happy to have you there, and feel free to ask me anything.

thorn venture
#

I need to concat/append csv data in a single xl file.
df = pd.concat(map(pd.read_csv, ['file1.csv','file1.csv','file1.csv'])) helped me in this; but I need to add another column in the XL which contains the file name from where the data is coming. Can anyone help me please?

serene scaffold
#

and then the name will be an additional level of indexing in the concatted df.
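
One way this could look, using in-memory stand-ins for the CSV files (real code would pass the file paths). The `source_file` column name is an assumption.

```python
import io
import pandas as pd

# In-memory stand-ins for the CSV files on disk.
files = {
    "file1.csv": io.StringIO("x,y\n1,2\n3,4\n"),
    "file2.csv": io.StringIO("x,y\n5,6\n"),
}

frames = {name: pd.read_csv(src) for name, src in files.items()}

# Passing a dict to concat makes its keys (the file names) the outer index level.
df = pd.concat(frames, names=["source_file", "row"])

# Move that index level into an ordinary column, ready for an Excel export.
df = df.reset_index(level="source_file").reset_index(drop=True)
print(df)
```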

tacit basin
tacit basin
gray iron
tacit basin
gray iron
#

Check the project ideas page

#

@tacit basin

thorn venture
serene scaffold
#

@gray iron I had to delete your message, as it constitutes advertising

gray iron
#

It's a genuine message about Google Summer of Code and a Python + ML Based Project.

serene scaffold
serene scaffold
gray iron
#

People can benefit from it and I'm just spreading awareness

serene scaffold
gray iron
#

Oh okay! Sure! NP

frosty flower
#

How do I shift all the data points to their right? i.e. making m[i][j] = m[i][j-1] for all data in the matrix

#

For j=0, pad the result with 0

serene scaffold
#

or are you using nested lists?

frosty flower
#

Figured.

serene scaffold
#
In [7]: arr
Out[7]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [8]: np.roll(arr, -1, axis=1)
Out[8]:
array([[ 1,  2,  0],
       [ 4,  5,  3],
       [ 7,  8,  6],
       [10, 11,  9]])
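
Note that `np.roll` wraps values around, and a shift of -1 moves them left. For the right shift with zero padding that was asked about, a slice assignment is one possible sketch:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

shifted = np.zeros_like(arr)
shifted[:, 1:] = arr[:, :-1]   # m[i][j] = m[i][j-1]; column 0 stays 0
print(shifted)
```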
frosty flower
#

Thanks for the help tho

serene scaffold
somber prism
#

guys i am following this tutorial to build the pretrained faster rcnn in pytorch, but dont you think we need to call model.eval() for the validation data since we dont need to use batch norm , dropouts and other steps and only required for the training or i am just wrong ????

arctic wedgeBOT
#

torchvision/models/detection/generalized_rcnn.py line 15

class GeneralizedRCNN(nn.Module):
twin willow
#

Hey guys, anyone here have a solid knowledge in NLP and Text Mining ? i need help.

tacit basin
frosty flower
#

How do I find the tangent line of a curve using opencv?

twin willow
#

Okay i work on a project where i have to extract some measurement information like size and volume from a description field, example of the description: " a bottle of 1.5L of Water"
i need to extract the "1.5L" but in other example it is "2 Liter" or "250mL".

frosty flower
#

The image is represented as a 2d np array, with black dots represented as 1 and white dots 0

#

Given an arbitrary point on the curve I want to find its tangent

iron basalt
terse hare
#

i am doing project where i have a huge list where i have to (Print a data frame with only two columns item_name and item_price ) can anyone help me with this?
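
Assuming the data is already in a pandas DataFrame (the frame below is made up, including the extra `quantity` column), selecting two columns is a matter of indexing with a list of names:

```python
import pandas as pd

# Hypothetical data; only the two column names come from the question.
df = pd.DataFrame({
    "item_name": ["Izze", "Chips", "Burrito"],
    "quantity": [1, 2, 1],
    "item_price": [3.39, 2.15, 8.99],
})

subset = df[["item_name", "item_price"]]   # a list of names returns a DataFrame
print(subset)
```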

acoustic forge
#

Is it accurate to say that there are three primary categories of NLP? Sentence Classification
Token Classification
and Sequence to Sequence.
Tasks in NLP can be put under one of these categories?

frosty flower
#

I have trouble understanding what it actually does

iron basalt
frosty flower
#

Oooh i see.

#

So for my image, it's still the same points, drawing them out won't help

#

But in contours there's this sequential structure for the pixels on the curve

#

So I can use that

#

I see, thanks!
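
A NumPy sketch of that idea. It assumes the curve points are already in order, for example the points of one OpenCV contour reshaped to (N, 2), and that the contour is closed. The `tangent_at` helper and the test circle are made up for illustration.

```python
import numpy as np

def tangent_at(points, i, k=2):
    """Unit tangent at ordered contour point i, from neighbours k steps away."""
    n = len(points)
    prev_pt = points[(i - k) % n]   # index wraps around: assumes a closed contour
    next_pt = points[(i + k) % n]
    d = (next_pt - prev_pt).astype(float)
    return d / np.linalg.norm(d)

# Hypothetical ordered contour: points on a circle of radius 50.
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
contour = np.stack([50 * np.cos(theta), 50 * np.sin(theta)], axis=1)

t = tangent_at(contour, 0)   # at (50, 0) the tangent points along +y
print(t)
```

A larger `k` smooths out pixel-level jaggedness at the cost of blurring sharp corners.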

graceful glacier
#

i have the following table

#

{'Organic Coffee': {0: 'Americano', 1: 'Latte', 2: 'Cappuccino', 3: 'Espresso', 4: 'Filter Coffee', 5: 'Flat White', 6: 'Mocha', 7: 'Macchiato'}, 'Organic Coffee Price': {0: 2.3, 1: 2.65, 2: 2.65, 3: 1.75, 4: 0.99, 5: 2.65, 6: 2.65, 7: 1.75}, 'Iced Coffee': {0: 'Iced Americano', 1: 'Iced Latte', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Iced Coffee Price': {0: 2.2, 1: 2.65, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Organic Tea': {0: 'Earl Grey', 1: 'English Breakfast', 2: 'Peppermint', 3: 'Tropical Green Tea', 4: nan, 5: nan, 6: nan, 7: nan}, 'Organic Tea Price': {0: 1.99, 1: 1.99, 2: 1.99, 3: 1.99, 4: nan, 5: nan, 6: nan, 7: nan}, 'Fruit Infusions': {0: 'Lemon & Ginger', 1: 'Raspberry & Pomegranate', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Fruit Infusions Price': {0: 1.99, 1: 1.99, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Other Beverages': {0: 'Chai Latte', 1: 'Hot Chocolate', 2: 'Matcha Latte', 3: 'Miso Soup', 4: 'Tumeric Latte', 5: nan, 6: nan, 7: nan}, 'Other Beverages Price': {0: 2.65, 1: 2.65, 2: 2.65, 3: 1.6, 4: 2.65, 5: nan, 6: nan, 7: nan}, 'Frappés': {0: 'Chocolate Frappé', 1: 'Classic Frappé', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Frappés Price': {0: 3.35, 1: 3.35, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Fruit Smoothies': {0: 'Berry Blast', 1: 'Strawberry & Banana', 2: 'Mango & Raspberry', 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Fruit Smoothies Price': {0: 3.35, 1: 3.35, 2: 3.35, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Extras': {0: 'Syrup', 1: 'Extra Shot', 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}, 'Extras Price': {0: 0.45, 1: 0.45, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan}}

#

and i want to transform it into this

#

{'Category': {0: 'Extras', 1: 'Frappés', 2: 'Fruit Infusions', 3: 'Fruit Smoothies', 4: 'Iced Coffee', 5: 'Organic Coffee', 6: 'Organic Tea', 7: 'Other Beverages', 8: 'Extras', 9: 'Frappés', 10: 'Fruit Infusions', 11: 'Fruit Smoothies', 12: 'Iced Coffee', 13: 'Organic Coffee', 14: 'Organic Tea', 15: 'Other Beverages', 16: 'Fruit Smoothies', 17: 'Organic Coffee', 18: 'Organic Tea', 19: 'Other Beverages', 20: 'Organic Coffee', 21: 'Organic Tea', 22: 'Other Beverages', 23: 'Organic Coffee', 24: 'Other Beverages', 25: 'Organic Coffee', 26: 'Organic Coffee', 27: 'Organic Coffee'}, 'Subcategory': {0: 'Syrup', 1: 'Chocolate Frappé', 2: 'Lemon & Ginger', 3: 'Berry Blast', 4: 'Iced Americano', 5: 'Americano', 6: 'Earl Grey', 7: 'Chai Latte', 8: 'Extra Shot', 9: 'Classic Frappé', 10: 'Raspberry & Pomegranate', 11: 'Strawberry & Banana', 12: 'Iced Latte', 13: 'Latte', 14: 'English Breakfast', 15: 'Hot Chocolate', 16: 'Mango & Raspberry', 17: 'Cappuccino', 18: 'Peppermint', 19: 'Matcha Latte', 20: 'Espresso', 21: 'Tropical Green Tea', 22: 'Miso Soup', 23: 'Filter Coffee', 24: 'Tumeric Latte', 25: 'Flat White', 26: 'Mocha', 27: 'Macchiato'}, 'price': {0: 0.45, 1: 3.35, 2: 1.99, 3: 3.35, 4: 2.2, 5: 2.3, 6: 1.99, 7: 2.65, 8: 0.45, 9: 3.35, 10: 1.99, 11: 3.35, 12: 2.65, 13: 2.65, 14: 1.99, 15: 2.65, 16: 3.35, 17: 2.65, 18: 1.99, 19: 2.65, 20: 1.75, 21: 1.99, 22: 1.6, 23: 0.99, 24: 2.65, 25: 2.65, 26: 2.65, 27: 1.75}}

#

what would be the best way to get this? i personally ziped every two columns and then unpivoted(melted)

serene scaffold
#

let me take a crack at it

serene scaffold
graceful glacier
#

yes thats the original shape unfortunately

agile cobalt
#

you could try doing something like pd.concat([df.iloc[:, ::2].melt(), df.iloc[:, 1::2].melt()], axis=1) but that data format looks sooooo weird
like, how do they even decide what goes into each row?

misty flint
graceful glacier
#

^i like that solution

misty flint
#

and nowadays modern models like transformers are state of the art

tacit basin
# twin willow Okay i work on a project where i have to extract some measures informations like...

I think named entity recognition should help https://paperswithcode.com/task/named-entity-recognition-ner

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type.
Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities.
O is used for non-entity tokens.

Example:

Mark Watney visited Mars
B-PER I-PER O B-LOC

...
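
For strings as regular as the examples in the question, a regular expression may be enough as a first pass before training an NER model. The unit list below is an assumption and would need extending for the real data.

```python
import re

# Number (optionally decimal), optional whitespace, then a volume unit.
# The units covered here are an assumption; add more as the data requires.
MEASURE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(ml|l|liters?|litres?)\b", re.IGNORECASE)

descriptions = [
    "a bottle of 1.5L of Water",
    "Cola 2 Liter",
    "juice 250mL pack",
]

found = [MEASURE.search(d).groups() for d in descriptions]
print(found)
```

Each result is a (value, unit) pair, so normalising "L", "Liter" and "mL" to one unit can happen in a separate step.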

graceful glacier
acoustic forge
serene scaffold
acoustic forge
acoustic forge
serene scaffold
#

@graceful glacier I haven't come up with a more elegant solution yet, but I'll keep it open and come back to it later

misty flint
#

i think this is a naive estimate that underestimates the broad field of NLP tbh

acoustic forge
misty flint
#

where do you place speech recognition

graceful glacier
acoustic forge
#

Yeah - That is true. I wasn't including speech synthesis and audio etc

#

If we talk purely text based NLP

misty flint
#

what about information retrieval topics?

#

theres a ton in there

acoustic forge
tacit basin
serene scaffold
#

Rex has conveniently raised both of the points I was going to raise

acoustic forge
serene scaffold
#

Also, what about information extraction?

misty flint
acoustic forge
tacit basin
acoustic forge
acoustic forge
#

Not trying to be annoying, writing my thesis and trying to figure out what the best structure is

misty flint
#

where do you place the concept of TF-IDF? since that's extremely important in info retrieval and search engines

acoustic forge
#

Good point

misty flint
#

honestly, im taking my info retrieval and web search class next semester so i actually dont know as much about info retrieval rn

#

other than these adjacent concepts

#

maybe something to look into when writing your thesis

acoustic forge
misty flint
#

just to make sure you cover stuff

#

that you need to

#

other than that, i think maybe you got a decent argument for at least a significant amount of NLP tasks

#

i just would stay away from the word "all" or you might get some pushback

#

from your committee

acoustic forge
misty flint
#

thats good you could probably group the stuff mentioned above in an other category or something and get away with it

#

i actually heard about an interesting NLP model the other day

acoustic forge
misty flint
#

"zero-shot multilingual neural machine translation"

#

a mouthful but its actually pretty cool concept

acoustic forge
#

Interesting - Will check that out 😮

misty flint
#

yeah maybe you can include it in your recent developments if you find it interesting/relevant enough

#

how i understood it is you have a neural network that instead of translating from French -> English and then English -> German. It goes straight from French -> German with only training data of (FR->EN. And EN->GR).

#

"Zero-shot" since you try to do it all in one go

agile cobalt
# graceful glacier i think its based on index

that seems to work? pd.concat([data.iloc[:, ::2].melt(), data.iloc[:, 1::2].melt()], axis=1).set_axis(['Category', 'Subcategory', '_', 'Price'], axis='columns').dropna().drop(columns="_").sort_values(["Category", "Subcategory"]) (if so, have fun cleaning it up)
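
A cleaned-up sketch of the same idea on a small made-up subset of the table. It assumes name and price columns strictly alternate, which is what the `::2` and `1::2` slices rely on.

```python
import numpy as np
import pandas as pd

# Small stand-in for the wide menu table: name column, then its price column.
data = pd.DataFrame({
    "Organic Tea": ["Earl Grey", "Peppermint"],
    "Organic Tea Price": [1.99, 1.99],
    "Extras": ["Syrup", np.nan],
    "Extras Price": [0.45, np.nan],
})

# Melt the name columns and the price columns separately; rows line up by position.
names = data.iloc[:, ::2].melt(var_name="Category", value_name="Subcategory")
prices = data.iloc[:, 1::2].melt(value_name="price")["price"]

tidy = (
    pd.concat([names, prices], axis=1)
    .dropna()                                  # drop the padding rows
    .sort_values(["Category", "Subcategory"])
    .reset_index(drop=True)
)
print(tidy)
```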

misty flint
#
acoustic forge
misty flint
#

yeah def check out that episode and let me know your thoughts

#

bc it sounds like an awesome NLP model

acoustic forge
misty flint
#

interesting yeah maybe not as applicable but who knows

acoustic forge
#

You working on any fun projects? 🙂

misty flint
#

me?

#

idk about fun

#

but our group is probably going to try to do something with recommender systems

#

for our DL class

#

that or GANs

#

lol

#

we havent come to a consensus yet

acoustic forge
#

Both sounds fun though! Do you know in what context you'd want to do something with rec. systems or GANs?

misty flint
#

not too sure about the rec systems yet

#

but for the GANs we would probs try to extend this paper

#

Given more time, we would’ve liked to explore a generative application [6] capable of producing a new Moonboard problem, given a user-specified difficulty.

#

so kinda the opposite of the problem they solved in their paper

acoustic forge
neat anvil
#

It's called bouldering in the US as well

acoustic forge
#

Ah - Alright. Every time I have talked to people from other countries about bouldering they have been like what

misty flint
#

yeah basically

misty flint
#

thats how you can tell

misty flint
acoustic forge
misty flint
#

ill let you know if we end up deciding that one

#

ill have to take a look at the current rec system models first tho

#

and test them out

modern cypress
#

Hey guys, could anyone lead me in the right direction on how I can improve my accuracy? I am giving it 6000 pictures of 7 different classes

#

I've tried all kinds of filters and epochs, but I think I'm missing something

#

Maybe it's a problem within my data?

#

red arrow is what the model predicts

#

motorcycle lemon_angrysad

iron basalt
#

Making use of knowledge that is not currently being learned / touched (and may never be if it's just some static rules)?

#

"Additional structure"?

quiet vault
#

You need to add dropout layers

#

Honestly, I would look into using architectures that have already proved to be very successful such as GoogleNet

modern cypress
modern cypress
quiet vault
modern cypress
#

This?

quiet vault
#

yes

#

It basically randomly sets a fraction of the layer's outputs (activations, not weights) to 0 during training, which reduces overfitting (iirc)
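To make the dropout idea concrete, here is a minimal NumPy sketch of "inverted" dropout, the variant most frameworks use as far as I know. The function name `dropout` and the toy input are made up for illustration; this is not Keras's actual implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction of activations and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # at inference time values pass through unchanged
    keep_prob = 1.0 - rate
    # Boolean mask: True where a unit survives this training step.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected value of each unit the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 5))                      # stand-in for a layer's output
dropped = dropout(x, rate=0.3, rng=rng)  # each entry is either 0 or 1 / 0.7
print(dropped)
```

Because a different random subset of units is silenced on every step, the network cannot lean too heavily on any single unit, which is the usual explanation for why it helps against overfitting.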

#

I also see that you only have 1 conv2d layer

#

you should use more

#

with the number of filters increasing and the filter/kernel size decreasing in odd numbers

#

(7, 5, 3) for size and amount of filters like this (64, 128, 256)

#

This is very expensive to train so im not sure if you can

#

But if you can, it improves results

#

and add some more dense layers

modern cypress
#

Where should I add the dropout layer?

quiet vault
#

add keras.layers.Dropout(0.3) before the final dense layer

#

and increase it if overfitting is still bad

#

typical ranges are from 0.3-0.8 iirc

#

one more thing

#

add a pooling layer

#

it decreases the amount of memory being used in the conv2d layers

modern cypress
#

Ahhh I see

#

I didn't know so many layers existed

quiet vault
#

yea there are a lot

modern cypress
#

i just tried running this with 2 epochs just to see how it goes

#

should be done in around 5 mins

quiet vault
#

alright

#

surprised you have enough memory ngl

#

oh

#

i forgot to mention

#

what is the shape of your y?

modern cypress
#

y is just an int

#

x is the image with shape 400, 400, 3

#

I have 32gb ram

grave frost
modern cypress
#

I tried testing out the max images I could train on the old model I had I reached about 80k

modern cypress
#

Nice nice a lot better than before

#

So I should add a pooling layer

quiet vault
quiet vault
modern cypress
#

Oh, what do you mean?

quiet vault
#

how many classes do you have?

modern cypress
#

umm 7

#

Or 6 and a default class

#

where I have like a bunch of non-class pictures

quiet vault
#

for an output of the first class, the output has to be [1, 0, 0, 0, 0, 0, 0]

#

and for 2 it would be

#

[0, 1, 0, 0, 0, 0, 0]

#

if that makes sense

modern cypress
#

Oh hmmmm

quiet vault
#

to do this, you can easily use the to_categorical function on your y dataset
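A rough sketch of what one-hot encoding does, written in plain NumPy so it runs without Keras. The helper `one_hot` is hypothetical; in Keras the equivalent call would be `keras.utils.to_categorical(y, num_classes=7)`. Both assume the labels start at 0, so if the classes are numbered from 1, subtract 1 first.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels in 0..num_classes-1 into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in the column that matches each row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y = [0, 1, 6]                          # three example labels out of 7 classes
y_one_hot = one_hot(y, num_classes=7)  # shape (3, 7)
print(y_one_hot)
```

With `y` in this shape, the targets line up with a 7-unit output layer, which is what `categorical_crossentropy` expects.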

modern cypress
#

When I print the prediction

#

OH WAIT

#

I UNDERSTAND YOU

#

I've been

quiet vault
#

print the prediction variable

modern cypress
quiet vault
#

yep thats good

modern cypress
#

I understand you though

quiet vault
#

yea good

#

you see the values are negative? they shouldnt be

modern cypress
#

Right now each class has an index, so class 1 is 1 and so on, but you're saying it's better for class 1 to be [1, 0, 0, 0, 0, 0, 0]

quiet vault
#

always with multi class classification, use the softmax activation function on the final dense layer

#

this will make all the outputs add up to 1
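For reference, softmax itself is only a few lines. This NumPy sketch (the `logits` values are made up) shows how raw scores, including negative ones, become positive values that sum to 1:

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities along the last axis."""
    # Subtracting the max is a standard trick to avoid overflow in exp().
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5, 0.0, -3.0, 1.0, -0.5]])  # one sample, 7 classes
probs = softmax(logits)
print(probs, probs.sum())
```

The predicted class is then just `probs.argmax(axis=-1)`.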

modern cypress
#

keras.layers.Dense(7, activation = 'softmax')

quiet vault
#

yes

misty flint
modern cypress
#

Oh hmm

#

Change it to this?

quiet vault
#

interesting

modern cypress
#

Hmm, this is a lot deeper into stuff like this than i've ever gone

quiet vault
#

you dont want that

#

for the loss just use 'categorical_crossentropy' and see what happens

#

with the softmax activation

misty flint
#

metaverse?

#

jk

modern cypress
#

ValueError: Shapes (None, 1) and (None, 7) are incompatible

#

Oh

#

Do I need to fix my y?

quiet vault
#

Did you change anything else or just the loss function?

modern cypress
#

and the accuracy

#

tf.keras.metrics.CategoricalAccuracy()

quiet vault
#

Not sure what this could be

#

If it doesn't work, just go back

modern cypress
#

Ahh fixed my y and I think it's working?

#

I still hadn't changed the 1 to [1,0,0,0,0,0,0]

quiet vault
#

ah

iron basalt
modern cypress
quiet vault
#

share the full training

modern cypress
quiet vault
#

increase epochs

#

also

#

you can watch the model test on the validation data after every epoch if you want

modern cypress
#

oh for real?

quiet vault
#

if you do use validation_data=(x_train, y_train) in the fit function

#

Do this and increase epochs to 10

#

see if the validation accuracy decreases at all

iron basalt
#

The real life equivalent is making use of evolved knowledge / not learned during life. And so associating things with it gives "zero-shot" learning. It's the additional structure or biases provided to make learning much faster / accurate (from scratch is really hard). These biases are not completely immutable though.

modern cypress
#

I appreciate your help so much

quiet vault
#

no problem

#

wait

#

i wanted to say validation_data=(x_test, y_test)

#

not x_train and y_train

#

If you used train just restart training

#

its useless
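The point is that validation data must be images the model never trained on, otherwise the validation score only measures memorization. A minimal sketch of carving out such a split with NumPy; `train_val_split` and the toy arrays are placeholders, not anyone's actual code:

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.2, seed=0):
    """Shuffle once, then hold out a fraction of the samples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

x = np.arange(20).reshape(10, 2)  # stand-in for 10 images
y = np.arange(10)                 # stand-in for their labels
x_train, y_train, x_val, y_val = train_val_split(x, y)
print(len(x_train), len(x_val))
```

The held-out arrays would then go into something like `model.fit(x_train, y_train, epochs=10, validation_data=(x_val, y_val))`. Keras's `fit` also accepts a `validation_split` argument that does a similar hold-out for you.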

modern cypress
#

Ahh thought so

#

retrying

#

it takes about 90 seconds per epoch

quiet vault
#

alright

#

im gonna eat dinner now but ill get back to you once i get back

modern cypress
#

Alright tysm

brave sand
#

could basic machine learning be learned by doing countless projects?

iron basalt
brave sand
brave sand
brave sand
#

would I need one for every agent?

#

but wouldn't that be super slow?

iron basalt
brave sand
# iron basalt Depends on what you are doing. IDK

This is a report of a software project that created the conditions for evolution in an attempt to learn something about how evolution works in nature. This is for the programmer looking for ideas for interdisciplinary programming projects, or for anyone interested in how evolution and natural selection work.

Before commenting on the religious/t...

#

this is just like an example of an outcome I want to achieve

iron basalt
#

You can evolve reinforcement learners.

brave sand
#

Could you elaborate?

iron basalt
#

I recommend learning how genetic algorithms work, it should be pretty obvious then.

brave sand
#

A project like that would take too long with basic q learning correct?

iron basalt
#

The project you gave me is about virtually evolved creatures, it does not require RL.

brave sand
#

Gotcha thanks

#

I think I'm mixing things up lol

iron basalt
#

Adding RL into the mix would probably give better results, but also take much more compute.

brave sand
#

Yeah, because wouldn't a genetic algorithm be inferior to an RL algorithm?

iron basalt
#

No.

#

Genetic algorithms are incredibly good. Their downside is that they require multiple agents.

brave sand
#

Well then I will give that a shot and try that

#

thanks

iron basalt
#

Also genetic algorithms only learn between generations, not during.

#

They can't adapt on the fly during the agent's life.

#

Combining genetic algorithms and RL gives you both, but you still need multiple agents and those just got much more expensive to compute.

brave sand
#

Expensive to compute as in time and speed?

iron basalt
#

Yes, you need to simulate each agent in its environment and that now includes an RL model.

#

If the RL model is efficient enough it can be worth it.

brave sand
#

So genetic algorithms are easier to start with so I'll start there

#

So the video above is just using a basic genetic algorithm?

iron basalt
#

Yeah genetic algorithms in their most basic form are stupidly simple.

brave sand
#

I thought they would be using a neural network

iron basalt
#

It does not require a neural network.

#

It does not require much of anything.

#

(Which is why it can easily show up in nature)

brave sand
#

Ohh, but in the video they used a neural network, why is that? Especially when a genetic algorithm is much easier to implement?

iron basalt
#

I recommend just learning about genetic algorithms and all will become clear.

brave sand
#

Alright thank you

clear vale
#

hello guys... I am going to start studying ML.. just finished a pandas playlist in Youtube... any ideas of cool projects for beginners?

modern cypress
quiet vault
#

Thats weird

#

it says val_categorical_accuracy is 95% tho

modern cypress
#

Mhmm I am confused too

#

flicking through I can see some errors

iron basalt
# clear vale hello guys... I am going to start studying ML.. just finished a pandas playlist ...

By the 1950s, science fiction was beginning to become reality: machines didn’t just calculate; they began to learn. Machine calculating was out. Machine learning was in. But we had to start small.

Donald Michie’s “Machine Educable Noughts And Crosses Engine” -- MENACE -- was composed of 304 separate matchboxes that each depicted a possible stat...

#

Fun physical interactive ML project (also analogous to genetic algorithms (losing genes removed from gene pool)) . You can implement it in Python later if you want maybe with a GUI. @brave sand

modern cypress
#

wait let me look at confusion matrix

#

because I'm seeing a lot of fire

#

maybe im overfitting on that specific class?

#

that class has 1.7k/6k images
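When one class makes up a large share of the data, overall accuracy can hide poor results on the smaller classes, so a confusion matrix plus per-class recall is a useful check. A small NumPy sketch with made-up labels (3 classes instead of 7 to keep it short):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Add 1 at (true, predicted) for every sample, counting repeats correctly.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

y_true = [0, 0, 1, 1, 2, 2]  # toy ground truth
y_pred = [0, 0, 0, 1, 0, 2]  # toy predictions that lean towards class 0
cm = confusion_matrix(y_true, y_pred, n_classes=3)
recall = cm.diagonal() / cm.sum(axis=1)  # fraction of each true class found
print(cm)
print(recall)
```

A column with far more counts than the others (class 0 here) is the sign that the model over-predicts that class.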

quiet vault
#

perhaps

#

honestly it might be an error in the code

clear vale
modern cypress
quiet vault
#

wait

#

whats the code u have for fit

modern cypress
iron basalt
brave sand
quiet vault
#

ive never seen the val accuracy not line up with score when evaluating model

iron basalt
# brave sand So it keeps on going till there’s none left?

No, because the genetic algorithms produce more new agents and the population can even grow over time to have even more parallel strategy search (on a computer you need to limit it for performance reasons and IRL it's limited by resources available too, like food).

modern cypress
brave sand
iron basalt
brave sand
iron basalt
# iron basalt https://www.youtube.com/watch?v=sw7UAZNgGg8

Also maybe read the wiki page on genetic algorithms to start: https://en.wikipedia.org/wiki/Genetic_algorithm .

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired ope...

#

Genetic algorithms can be used to evolve arbitrary parameters (can be applied on top of an existing algorithm that has parameters in need of tweaking). There is a hello world program for genetic algorithms and that is to evolve a population of strings into "Hello, World!".
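A minimal sketch of that "Hello, World!" exercise, using only the standard library. Population size, mutation rate, the top-20% selection rule and all function names are arbitrary choices made for illustration, not a canonical version:

```python
import random
import string

TARGET = "Hello, World!"
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "

def fitness(candidate):
    """Number of characters that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng, rate):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in candidate)

def crossover(mom, dad, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, len(TARGET))
    return mom[:cut] + dad[cut:]

def evolve(pop_size=200, max_generations=5000, seed=0):
    rng = random.Random(seed)
    rate = 1.0 / len(TARGET)  # on average one mutation per child
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        population.sort(key=fitness, reverse=True)
        best = population[0]
        if best == TARGET:
            return best, generation
        parents = population[: pop_size // 5]  # the fittest 20% get to breed
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                           rng, rate)
                    for _ in range(pop_size - 1)]
        population = [best] + children  # elitism: the best string always survives
    population.sort(key=fitness, reverse=True)
    return population[0], max_generations

best, generations = evolve()
print(best, "after", generations, "generations")
```

Note that there is no neural network and no learning within a generation: the only "knowledge" is which strings survive and get copied with small changes.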

spiral furnace
#

sup folks?
I'm trying to count unique values in a df column, I do len() but sometimes there is nan that I want to exclude. Do you know of any fast method on counting the number of values minus the nan or should I go barbarian?

spiral furnace
#

I do this-- if df.var.isnull().values.any(): len(df.var.unique())-1
but is there any method already in pandas for that?
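pandas does have this built in: `Series.nunique()` skips NaN by default (`dropna=True`), so no manual `-1` is needed. A small sketch with a made-up column:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", np.nan, "a", np.nan])  # toy column with missing values

n_with_nan = len(s.unique())     # unique() keeps NaN as one of the values
n_without_nan = s.nunique()      # nunique() ignores NaN by default
n_counting_nan = s.nunique(dropna=False)

print(n_with_nan, n_without_nan, n_counting_nan)
```

For a DataFrame column the call would look like `df["some_column"].nunique()`, where `some_column` stands in for the real column name.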

spiral furnace
#

really I need to learn better about how to search in the documentation

fading gate
#

when doing a df.groupby("x").apply(lambda x: x["y"] - x.loc[0, "y"]) it seems the indices in my groupby aren't reset. Is this expected?

#

what I'd like to do is subtract the first row's y from each row's y

#

do I really need to do df.groupby("x").apply(lambda x: x.reset_index()["y"] - x.reset_index().loc[0, "y"]) ?

spiral furnace
fading gate
#

my df index is reset it appears, index goes from 0 - N

spiral furnace
#

are you on Kaggle?

fading gate
#

what is kaggle?

#

I think this might work: df.groupby("x").apply(lambda x: x["y"] - x.reset_index().loc[:0, "y"])

#

I guess there's no way to just say get me the first row irrespective of the index values except maybe with iloc but then iloc wants a numeric column offset

spiral furnace
#

what is your "y"

#

a column?

#

cause I cannot understand loc[:0,"y"]

fading gate
#

yeah y is a column

#

df["marginal_y"] = df.groupby("x").apply(lambda x: x["y"] - x.reset_index().loc[:0, "y"]) is really what I'm trying to achieve

#

yup that works actually
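An alternative that avoids `apply` and `reset_index` altogether, sketched on a made-up frame: `transform("first")` broadcasts each group's first `y` back onto the original rows, so the subtraction lines up by index whatever the index values are.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["a", "a", "b", "b", "b"],
    "y": [10, 13, 5, 4, 9],
})

# First y within each x group, repeated for every row of that group.
first_y = df.groupby("x")["y"].transform("first")
df["marginal_y"] = df["y"] - first_y
print(df)
```

One caveat: as far as I know, `"first"` picks the first non-null value in each group, so if `y` can start with NaN the result differs from taking the literal first row.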

frosty flower
#

Hey quick question

#

Is there a quick way to turn a binary image into points?

#

Like if I have:

[[ 0, 1, 1 ],
 [ 1, 0, 0 ], 
 [ 0, 0, 0 ]]
#

And I want to turn it into:

#
[[0, 1], [0, 2], [1, 0]]
#

Of course I can do it with a double loop

steep lotus
#

guys if i have question about social media mining is this the right place to go to

misty flint
steep lotus
#

well currently for now its just questions that i have while i run the code that im getting from the book.

#

like for example import requests

#

is there any practical applications to this while analyzing soccer games and its live game data?

#

and thank for answering @misty flint

misty flint
#

lets zoom out and think about the bigger picture first

#

say you have a soccer game

#

and people are live-tweeting about it using a certain hashtag

#

what type of questions do you want answered if you had access to that aggregated information?

#

how do people feel about the game as a whole? about a certain player? (sentiment analysis)

#

you could probably see more tweets that happen right after a goal is scored

#

stuff like that

iron basalt
# frosty flower Hey quick question
>>> a = np.array([[0, 1, 1], [1, 0, 0], [0, 0, 0]])
>>> a
array([[0, 1, 1],
       [1, 0, 0],
       [0, 0, 0]])
>>> np.argwhere(a > 0)
array([[0, 1],
       [0, 2],
       [1, 0]])
>>> 
misty flint
#

scientific method and all that etc.

charred light
#

Why does pyspark's df.count() return a different # of rows compared to pyspark df.toPandas() and then using panda's .shape? I'm seeing a difference of ~30 rows.

For example:

df1 = df['col1', 'col2'].dropDuplicates(['col1', 'col2'])
df1.count() #Returns 81049
df2 = df['col1', 'col2'].dropDuplicates(['col1', 'col2']).toPandas()
df2.shape #Returns 81077 ???
lapis sequoia
#

I can see you did try to drop it but may be...

charred light
lapis sequoia
#

I'm not sure. I have never personally used pyspark.

#

Lemme dig in a lil bit if i can find

#

Hold on

#

It may be NA rows

charred light
#

df2.isna().sum() each col* returns 0
I'm internally screaming...

I think I'm stuck working within pyspark's dataframe.

lapis sequoia
#

lets see what values each col has

charred light
#

df2.col1.value_counts(dropna=False) returns 1 of each value (This is a column of unique ids, len same as shape)
df2.col2.value_counts(dropna=False) returns 81077 of val1

df.count() returns each col matching shape as well
df.count returns some individual rows, 81077 rows.

lapis sequoia
#

so shape gives more only
edit: oh nono sorry count also returns 81077

charred light
#

lol, pyspark is cancer. I tried df3 code and it returned entirely different count

df1 = df['col1', 'col2'].dropDuplicates(['col1', 'col2'])
df1.count() #Returns 81049
df3 = df1.toPandas()
df3.shape #Returns 81054
df2 = df['col1', 'col2'].dropDuplicates(['col1', 'col2']).toPandas()
df2.shape #Returns 81077 ???
lapis sequoia
#

Jesus lol

charred light
#

I found the problem.

I tested out a query with just 20 rows.
The df1 = df['col1', 'col2'].dropDuplicates(['col1', 'col2']) IDs are different from the original query, which is different from the IDs in df2 = df['col1', 'col2'].dropDuplicates(['col1', 'col2']).toPandas()

#

But I have no idea why

#

"Pyspark similar to pandas" yea, ok

lyric tartan
#

can anybody help me with OpenCV And CSV file?

lapis sequoia
lyric tartan
#

i am working on a face recognition project, i want to display details from a csv file

#

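import csv
import os
from pathlib import Path

# Raw strings stop the backslashes in Windows paths being read as escape sequences.
faces_path = r"C:\Users\kingm\Desktop\pythonProject\faces"
csv_path = r"C:\Users\kingm\Desktop\test.csv"

def search():
    # Read the CSV once instead of reopening it for every image.
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    for name in os.listdir(faces_path):
        num = Path(name).stem  # file name without extension, e.g. "12" for "12.jpg"
        for row in rows:
            # Skip empty lines; print the rows whose first column matches the image number.
            if row and num == row[0]:
                print(row)

search()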

#

i used this for getting number as name of jpg and print same number details in csv file

lyric tartan
graceful glacier
#

hello

#

maybe not the right channel to ask this but

#

i need to know how to extract the poem part of this html file

arctic wedgeBOT
#

Hey @graceful glacier!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.