#data-science-and-ml


prime hearth
#

with tfidf when i print it, it shows like a dataframe where the columns name is the word representing that column

#

i am able to get the features from the tfidfvectorizer after fitting my dataset

#

i was thinking, do you think it's a good idea, for any new data, to run tfidfvectorizer on it, then check which columns or features exist in my model, drop any that aren't part of it, and then predict with that data?

queen cradle
#

To compare your models, you need to pick the one loss function and use it for all your data. That can be mean absolute price change, root mean squared price change, mean relative price change, root mean squared relative price change, or lots of other things. But as long as you pick one loss function and use it consistently, the results are comparable. (R^2 is not helpful for this purpose. It's frequently misinterpreted; in fact, in almost all cases you should just ignore it and use something like mean squared error instead.)
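A minimal sketch of the point about sticking to one loss function, using NumPy. The prices and the two sets of predictions are made up for illustration, and the helper names are not from any library. It shows that the two hypothetical models rank differently under different losses, which is why the choice has to be made once and kept.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average size of the error, in the same units as the target.
    return float(np.mean(np.abs(y_pred - y_true)))

def root_mean_squared_error(y_true, y_pred):
    # Like MAE, but large errors are penalised more heavily.
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mean_relative_error(y_true, y_pred):
    # Error as a fraction of the true value (assumes no zeros in y_true).
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# Made-up prices and predictions from two hypothetical models.
y_true = np.array([100.0, 200.0, 400.0])
pred_a = np.array([110.0, 190.0, 400.0])
pred_b = np.array([100.0, 200.0, 430.0])

for name, pred in [("model A", pred_a), ("model B", pred_b)]:
    print(
        name,
        "MAE:", mean_absolute_error(y_true, pred),
        "RMSE:", root_mean_squared_error(y_true, pred),
        "relative:", mean_relative_error(y_true, pred),
    )
```

With these numbers, model A has the lower absolute error while model B has the lower relative error, so a comparison only means something if the same loss is used throughout.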

serene scaffold
#

@prime hearth you can add new columns for words that you don't already account for, but for arrays made with the old vectorizer, I guess you'd have to pad them

prime hearth
#

hello, i would like to ask something. i'm trying to improve the accuracy of my model, which tries to classify whether a business is good or not based on reviews. My accuracy came out to 0.75 when predicting new data. However, my model uses 700 features (each column represents a word) and i noticed it seems to be overfitting, as the features it is using are not meaningful. Do you think it would be a good idea to extract the most popular features my model is using and then keep only the adjectives or nouns?

stuck shard
prime hearth
#

@stuck shard yes. So for example:

#

"food", "quick", "ugly"

#

those would be the column after performing bag of words or tfidfvectorizer

#

However, tfidf and bag of words have a built-in method that shows the top features, so to speak, and I chose the top 100 to see what my model is using. This is what it shows:

['score' 'serving' 'product' 'pork' 'portion' 'portland' 'pretty' 'price'
 'priced' 'pro' 'probably' 'problem' 'professional' 'point' 'pub' 'put'
 'quality' 'question' 'quick' 'quickly' 'quiet' 'quite' 'ramen' 'rating'
 'poor' 'pm' 'ready' 'plus' 'move' 'moved' 'moving' 'much' 'multiple'
 'music' 'must' 'nail' 'name' 'near' 'need' 'needed' 'photo' 'pick'
 'piece' 'pizza' 'place' 'plate' 'pleasant' 'please' 'plenty' 'read'
 'really' 'service' 'sandwich' 'saturday' 'sauce' 'saw' 'say' 'saying'
 'seafood' 'seat' 'seated' 'seating' 'second' 'see' 'seeing' 'seem'
 'seemed' 'seems' 'seen' 'selection' 'sell' 'serve' 'served' 'server'
 'sat' 'salon' 'reason' 'salmon' 'reasonable' 'received' 'recent'
 'recently' 'recommend' 'recommended' 'red' 'regular' 'reservation'
 'restaurant' 'return' 'review' 'rice' 'right' 'road' 'roll' 'room' 'rude'
 'run' 'said' 'salad' 'able']
#

you can see it's not very good. if i use google reviews on this model it could do badly, as the words are too fitted to my dataset

#

so i was thinking of getting the most meaningful words such as adjectives and nouns maybe

#

what do you think?

stuck shard
#

have you tried using stemming or lemmatization to condense words? for example, "recommended" would become "recommend" with a stemmer such as Porter or Lancaster stemmer (available in the NLTK library)

#

and "moved" and "moving" would become "move"

prime hearth
#

yes i did do text cleaning-stopwords, lemmatization

stuck shard
#

so you would create more room for more unique words in the top 100. additionally, consider using the TextBlob library to do sentiment analysis, such as a polarity measure (-1 being very negative sentiment, +1 being very positive sentiment), as an additional feature in your dataset

prime hearth
#

thats a good idea , my dataset also has the rating review (from 0-5 )

#

so do you think i should still do this?

#

i use rating reviews and all the word features in my model

stuck shard
#

what is the criteria for "good" versus "bad"? did you label each point as good/bad or is this a dataset you obtained from somewhere?

prime hearth
#

oh yeah, i have the y-label, another feature that says whether the business is bad or not

#

so i have 3 features in original dataset, rating review, the review, and if business is successful or not

#

after performing tfidf or bag of words i get lots of features (like 700) which i use in my model

#

i'm using a classification model like SVM, as that's the one with the best accuracy so far, and KNN is close

stuck shard
#

have you tried random forest?

prime hearth
#

no i havent, you think i should?

stuck shard
#

it is probably worth a shot

#

although i think you would have to represent your data as tfidf numpy vectors for random forest to work appropriately

prime hearth
#

so random forest wont work with tfidf?

#

oh sorry i see what you mean

stuck shard
#

the data has to be the numbers generated by representing your data as tfidf vectors (if i recall correctly)

#

it's been a while since i did text-based classification

prime hearth
#

oh okay thanks, i'll try that. do you think i should keep trying what i'm doing right now

#

like getting the most popular words in the dataset that are meaningful (adjectives and nouns)

#

and use those features and try doing feature selection again and repeat?

#

i notice my model does well with thousands of features because some words dont exist in other reviews. But i know it not good to have that many features so trying to reduce it
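One common way to cut a TF-IDF feature set down, sketched here with scikit-learn. This is a swapped-in technique (chi-squared feature selection against the label), not the adjective/noun filtering proposed above, and the reviews and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Made-up reviews and labels (1 = successful business, 0 = not).
reviews = [
    "great food and friendly server",
    "rude server and poor food",
    "pleasant place with great pizza",
    "poor quality and rude staff",
    "friendly staff and great ramen",
    "poor service and rude manager",
]
labels = [1, 0, 1, 0, 1, 0]

# Build the TF-IDF matrix; English stop words are dropped up front.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# Keep the 5 columns whose values depend most on the label (chi-squared score).
selector = SelectKBest(chi2, k=5)
X_small = selector.fit_transform(X, labels)

# Recover the words behind the kept columns, in column order.
names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
kept = [name for name, keep in zip(names, selector.get_support()) if keep]

print(X.shape, "->", X_small.shape)
print(kept)
```

The value of `k` is a tuning choice; comparing held-out accuracy for a few values of `k` is one way to see whether the smaller feature set generalises better.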

stable sierra
#

Hello
I have a problem
TypeError: 'NoneType' object is not iterable
when I'm trying to scrape information from amazon using for loop with requests html library
anyone knows the solution ?

undone fiber
#

suggestions with type of ml to use or example.. say i want to try to build an ml program that can diagnose windows errors.. and the dataset would be errors, logs, current config, golden config, common solutions..
where it reads the errors, logs, current config and compares against golden config, and common solutions for those errors.

lapis sequoia
eternal hull
#

Anyone has idea how to build retention model

violet monolith
#

Hi

lapis sequoia
#

hi guys, i would like to create a text-to-image ai, do you have any suggestions for starting this project?

lusty sail
mild dirge
#

I just made a neural network that performs regression on some numerical data. The neural network has 2 hidden nodes with tanh activation. And the result of these nodes are simply added up for the final output of the model. The input is about 100 data points with 50 features each.

Now what is interesting is that (after training) these two hidden nodes seem to have mirrored/flipped weights. So they both have weights from each feature to the node, but the weight from feature 1 to node 1 (call it w11) is approx equal to the flipped weight from feature 1 to node 2 -w12. I'm not really sure what this means or how to interpret this result.

wooden sail
#

i would start by saying that, with no further context, neural networks and their parameters are in general not interpretable

mild dirge
#

Yeah, but 50 weights per node, almost exactly equal but flipped. Seems to be something going on.

wooden sail
#

it could be that what you're computing has some symmetry

#

you would expect something like this if what you're computing behaves, e.g., like a distance (minus the negative sign)

tidal bough
#

some points of view:

  • it seems to me that having w12 = -w11 is about equivalent to having them both equal to zero, so I guess the output just doesn't depend on feature1 much
  • on the other hand, it's not entirely equivalent unless the other weights are also equal, and neural networks can stuff a lot of computation into tiny portions of their activations like that, so it may be meaningful after all.
    i don't have much of a grasp on which POV is better here 🥴
mild dirge
#

I can show it in a bit, it's on diff computer.

wooden sail
#

if you can rewrite it explicitly as matrices and vectors, i can revise what i said. cuz on second thought i'm not sure what you call w11 and w12 in the first place 😛

mild dirge
#

The weight from input 1 (so feature 1) to node 1 or 2 respectively for w11 and w12

wooden sail
#

so something like y = sum(tanh(MX)), where M is of size 2 x 50 and X is of size 50 x 100? then i'd agree with reptile

#

but "almost exactly equal" and getting "small outputs" is a hard thing to quantify without a proper metric in the output space of that layer

#

and also whether 0 means something in the output space of the layer

#

(that meaning and metric are usually absent)

#

if you wanna corroborate whether it's that the feature does not matter, you can do some sort of sensitivity analysis, e.g. by now freezing the network weights and differentiating w.r.t. the input
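A rough sketch of that sensitivity check in NumPy, assuming the model is exactly y = sum(tanh(MX)) with no biases, as written above. The weights and inputs are random stand-ins, not the trained values, and the gradient is written out by hand rather than taken from an autodiff library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples = 50, 100

# Stand-in weights: the second node roughly mirrors the first (hypothetical values).
w1 = rng.normal(size=n_features)
M = np.stack([w1, -w1 + 0.05 * rng.normal(size=n_features)])  # shape (2, 50)
X = rng.normal(size=(n_features, n_samples))                   # shape (50, 100)

def model(M, X):
    # One output per sample: the sum of the two tanh hidden nodes.
    return np.tanh(M @ X).sum(axis=0)

def input_gradient(M, X):
    # d/dx sum_j tanh(m_j . x) = sum_j (1 - tanh^2(m_j . x)) * m_j, weights held fixed.
    H = np.tanh(M @ X)            # shape (2, 100)
    return M.T @ (1.0 - H ** 2)   # shape (50, 100): one gradient column per sample

# Average absolute gradient per feature: larger means the output reacts more to it.
sensitivity = np.abs(input_gradient(M, X)).mean(axis=1)
print(np.argsort(sensitivity)[::-1][:5])  # indices of the five most influential features
```

Features whose average gradient magnitude stays near zero are the ones the output barely depends on, at least around the data points used.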

mild dirge
#

Yeah i'll take a look at the data, maybe there's some interesting pattern

tidal bough
#

here's an example where there's two output neurons and weights of a for them are opposing

#

as you can see, when b=0, the output is 0 regardless of a - but when it's not, it does still depend on a.

#

so that may be what it's doing, very roughly.
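The behaviour described above can be checked numerically. This is a small sketch under an assumption: both nodes see a second input `b` with weight 1, while their weights on `a` are `+w` and `-w`. The function name is made up for the example.

```python
import numpy as np

def two_node_output(a, b, w):
    # Sum of two tanh nodes whose weights on `a` are +w and -w.
    # Both nodes are assumed to use weight 1.0 on the second input `b`.
    return np.tanh(w * a + b) + np.tanh(-w * a + b)

a = np.linspace(-2.0, 2.0, 5)

# b = 0: tanh is odd, so tanh(w*a) + tanh(-w*a) cancels for every a.
print(two_node_output(a, b=0.0, w=1.5))

# b != 0: the cancellation breaks and the output varies with a.
print(two_node_output(a, b=0.5, w=1.5))
```

So opposing weights on one feature only cancel exactly when everything else feeding the two nodes is zero; otherwise the feature still shapes the output.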

leaden cosmos
#

Anyone familiar with tensorflow-gpu?

prime hearth
#

is it valid to transform new text with tfidfvectorizer in order to predict on a model that was trained on a dataset using tfidfvectorizer? My concern is that, after performing tfidf on the new data, i would need to drop any features in the new data that don't exist in my model
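A minimal sketch of the usual scikit-learn pattern for this: fit the vectorizer once on the training text, keep it, and call `transform` (not `fit_transform`) on new text. Words that were not seen during fitting are ignored, so the new matrix has the same columns as the training matrix and nothing has to be dropped by hand. The texts, labels and the choice of `LinearSVC` are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Made-up training reviews and labels (1 = good business, 0 = not).
train_texts = [
    "great food and quick service",
    "rude staff and poor food",
    "pleasant place with great pizza",
    "poor service and rude server",
]
train_labels = [1, 0, 1, 0]

# Fit the vocabulary and IDF weights on the training text only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
model = LinearSVC().fit(X_train, train_labels)

# New text reuses the fitted vectorizer; "ramen", "parking", "awful" are unseen words.
new_texts = ["great ramen but the parking was awful"]
X_new = vectorizer.transform(new_texts)

print(X_train.shape, X_new.shape)   # same number of columns
print(model.predict(X_new))
```

In practice the fitted vectorizer is saved alongside the model (for example inside a scikit-learn `Pipeline`) so that prediction always uses the training vocabulary.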

mild dirge
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

mild dirge
#

Made this code to try and visualize the output of my model for just 1 feature for different weight values for the 2 hidden nodes. It's an interactive plot using just numpy and matplotlib. I guess the weights having opposing sign makes it so more complex functions can be approximated, as when they have the same sign, it is basically the same as having just 1 hidden node. Still not sure how this would generalize to more features though.
https://paste.pythondiscord.com/akasotunor.py

#

It's an interactive plot. If you guys (@tidal bough @wooden sail) were still interested ^^

tidal bough
#

when there's only one feature, it's just tanh(w1 x) + tanh(w2 x)

#

though, hmm, it doesn't really simplify

ripe sapphire
#

what does tanh mean?

delicate apex
#

hyperbolic tangent. defined as hyperbolic sine over hyperbolic cosine (sinh/cosh), or:

#

.latex $\frac{e^x - e^{-x}}{e^x + e^{-x}}$

strange elbowBOT
wooden sail
# mild dirge

doesn't really simplify much, as reptile says. your plot just added in an extra compression/dilation to yield a family of plots like the surface reptile shared, just stretched along both axes depending on x. the next question is whether the sum means anything. you can study the weights assigned to each input parameter and see which ones will have which effect

#

but trying to interpret a network in general is not very meaningful unless the network enforces some sort of prior knowledge

#

is there a special reason to use a sum of hyperbolic tangents instead of just one tanh or more?

mild dirge
#

No, it's just some assignment for uni, but they specifically want us to plot the weights, which looks like this

wooden sail
#

you can also make a plot of wi1 + wi2 to get a feel for which params do what

mild dirge
#

And ig they expect us to say something "useful" about that

wooden sail
#

ah, lmao

#

plot the sum too

mild dirge
#

Left is original, right is sum. Still not too much to say I think 😛

wooden sail
#

not quite

#

the abs of the right plot tells you which parameters weigh into the sum

#

though one also has to consider the magnitude of the input, so that alone also doesn't tell the whole story

mild dirge
#

Yeah

#

It depends on the sum of the rest too

#

Its not that two opposing weights cancel out

wooden sail
#

i would claim 2abs(input_i)/abs(input_i (wi1 + wi2)) is meaningful, but the input cancels out

mild dirge
wooden sail
#

aight

#

if you look at 2 / abs(wi1 + wi2)?

#

wait ugh, the tanh

mild dirge
#

2/abs(tanh(w1+w2)) ?

wooden sail
#

no

#

2/(abs( tanh(mean_i * wi1) + tanh(mean_i * wi2) ))

mild dirge
#

The mean is 0

wooden sail
#

byotiful

mild dirge
#

Maybe instead of looking for a reason they are opposed, is there a reason that they never have the same sign?

#

Because there is no exception (except for a single weight, but that could be due to too few epochs)

tidal bough
#

hmm, the tactic it uses to analyze weights is to look at W.T@W

wooden sail
#

i was about to mention looking at the correlation among the weights, yeah

#

you'd expect this to be a 2x2 near-rank-one matrix in your case, which would mean you asked for too high dimensional a vector space given your data

mild dirge
#

Or did I have the wrong shape?

wooden sail
#

i would have expected a 2x2x50 array

mild dirge
wooden sail
#

at any rate though, you can tell the matrix is highly structured regardless of how you choose to represent it

#

high structure goes hand in hand with low dimensionality

mild dirge
#

So high correlation between the weights of the same features means that too many hidden nodes are used or am I misinterpreting?

wooden sail
#

that would be my take

#

you can try using a 1 x 50 matrix instead and see if there's any noticeable difference in the training error at the end

mild dirge
#

Yeah I could try that. thx for the help btw

wooden sail
# mild dirge

this one can be interpreted as the average correlation between wi1 and wi2, btw. you can see that one column is -1 times the other, so it's rank deficient (or close to)

#

importantly, the off diagonals are about as large as the main diagonals, telling you that one weight explains the other
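A small NumPy sketch of the `W.T @ W` check discussed here. The weights are synthetic: one column is built as minus the other plus a little noise, to mimic the mirrored pattern. It is an illustration, not the trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic weights, shape (50 features, 2 hidden nodes); column 2 is about -column 1.
w1 = rng.normal(size=50)
W = np.column_stack([w1, -w1 + 0.01 * rng.normal(size=50)])

# 2 x 2 Gram matrix: diagonals are the squared norms, off-diagonals the dot product.
gram = W.T @ W
print(gram)

# Singular values of W: one large, one tiny means the two columns span about one direction.
singular_values = np.linalg.svd(W, compute_uv=False)
print(singular_values)
```

Off-diagonal entries about as large in magnitude as the diagonal ones, together with one singular value much smaller than the other, are the "near rank one" signature mentioned above.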

mild dirge
#

That's pretty good to know. I think in this case it does not give us much more info than what we already know (a high correlation between the weights) right?

#

I think I'm probably just going to keep the discussion of the plot simple, since I don't seem to understand very much why it is doing what it is doing with the opposing weight values.

wooden sail
#

high correlation/low rank are indicators of the possibility to exploit a low dimensional subspace, usually

#

the question of "why" is a more difficult one 😛

#

but the way i would put it is like this

mild dirge
#

Reducing the hidden nodes by 1 would probably give poor results, since that would make the model a bit too simple

#

So I'm not sure if they expect us to give that conclusion

#

I could still test it out

wooden sail
#

idk, i'd be curious 😛

#

i would put it like this. imagine you know your model is a + b = c, but you can only observe c

#

finding a and b requires making some sort of assumption or introducing prior knowledge, in general there's infinitely many solutions

#

there should be a clever transformation one can apply to your model to turn it into this kind of problem. i think? idk

#

if you feel like doing some asymptotics, you can also notice that tanh is odd, and so you have tanh(a) - tanh(a + delta), which looks like a finite difference. we can then ask the question of how close this is to a 1st order taylor series expansion. or also what happens when you expand tanh(a + delta), so that we can see exactly what a - b is doing

#

the first order taylor approx is x i think? which already turns it into a sort of a + b = c problem if you expand around a or b

#

assuming my handwaving isn't too far off the mark, this gives you an alternative interpretation of something like "the parameters cannot be distinguished from each other" in the asymptotic regime
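The handwaving above can be written out for a single input. This is a sketch under assumptions that are not from the chat: one feature x, no biases, and weights w_1 = w and w_2 = -(w + \delta), with u = w x and \varepsilon = \delta x.

```latex
\begin{aligned}
\tanh(w_1 x) + \tanh(w_2 x)
  &= \tanh(u) - \tanh(u + \varepsilon)
  && \text{(tanh is odd)} \\
  &\approx -\varepsilon \,\bigl(1 - \tanh^2(u)\bigr)
  && \text{(first order in } \varepsilon\text{)} \\[4pt]
\tanh(w_1 x) + \tanh(w_2 x)
  &\approx (w_1 + w_2)\, x
  && \text{(small } |w_1 x| \text{ and } |w_2 x|\text{, since } \tanh(z) \approx z\text{)}
\end{aligned}
```

In the small-argument regime only the sum w_1 + w_2 enters the output, which is the "a + b = c" situation: many pairs of weights give the same result, so the two parameters cannot be told apart from the output alone.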

#

👋

mild dirge
#

I will probably discuss some of this stuff with my partner to see what he thinks. But I really appreciate all the input

finite sierra
#

not sure if this is the right place to ask but I want to create a mobile app that detects playing cards with a phone camera using some sort of AI, I've already seen a few videos of people training a model to be able to do that but I want to go through that process myself.. I don't know what course to start with because there are different libs and idk which one I should be using

#

basically I want to know which python libraries I should be learning (with the math that comes with it) or course recommendations...

serene scaffold
finite sierra
#

I already know how to make a mobile app using flutter but I am clueless about the first part that you mentioned

serene scaffold
#

look into image classification for playing cards

#

you might also need object detection, to figure out where the cards are in a given image.

lapis sequoia
#

Does anyone know how to create a fuzzymatch using pandas?

#

A fuzzy match is performed to determine matching street instances, [LFA1_Street] -> [EA_Street and House Number (emp)], with a 90% threshold to produce a table with 2 columns, [Test3_RecordID] with their corresponding matching [Test3_RecordID2] called FM
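A minimal sketch of one way to do this with pandas and the standard library's `difflib.SequenceMatcher`. This is a stand-in similarity measure, not necessarily the fuzzy-matching tool the task description has in mind, and its 0.9 ratio is only assumed to correspond to the "90% threshold". The sample rows are invented and the right-hand column is shortened to `EA_Street` for the example.

```python
from difflib import SequenceMatcher

import pandas as pd

# Invented sample data; the column names follow the description above.
left = pd.DataFrame({
    "Test3_RecordID": [1, 2],
    "LFA1_Street": ["Main Street 12", "Oak Avenue 7"],
})
right = pd.DataFrame({
    "Test3_RecordID2": [10, 11, 12],
    "EA_Street": ["Main Street 12a", "Elm Road 3", "Oak Avenue 7"],
})

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the lowercased strings are identical.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Compare every left row with every right row (fine for small tables only).
pairs = left.merge(right, how="cross")
pairs["score"] = [
    similarity(a, b) for a, b in zip(pairs["LFA1_Street"], pairs["EA_Street"])
]

# Keep pairs at or above the 90% threshold; FM holds the two ID columns.
FM = pairs.loc[pairs["score"] >= 0.9, ["Test3_RecordID", "Test3_RecordID2"]]
FM = FM.reset_index(drop=True)
print(FM)
```

The cross join grows as rows(left) times rows(right), so for large tables some blocking step (for example, only comparing streets that share a postcode) would be needed before scoring.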

serene scaffold
#

I think you've asked about this before?

lapis sequoia
thin palm
#

If a company provided previous quarters of their revenue and profit, basically their historical data. How can I estimate what their Q1 for the new year will be?

arctic crown
#

please help in sklearn why does .fit take a 2d array

serene scaffold
arctic crown
arctic crown
serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

anyway

#

!docs sklearn.linear_model.LinearRegression

arctic wedgeBOT
#

class sklearn.linear_model.LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
Ordinary least squares Linear Regression.

LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
serene scaffold
#

try doing dataset[['area']]. please also copy the code into the chat as shown above ^

#

@arctic crown

arctic crown
#
```py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

dataset = pd.read_csv("homeprices.csv")

model = linear_model.LinearRegression()
model.fit(dataset[["area"]], dataset.price)
print(model.predict(3000))
```
serene scaffold
#

I will not look at screenshots of text. Please do text as text.

arctic crown
#

error: ValueError: Expected 2D array, got scalar array instead:
array=3000.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

serene scaffold
#

try this instead

model.fit(dataset["area"], dataset.price)
print(model.predict([3000]))
arctic crown
#

ValueError: Expected 2D array, got 1D array instead:
array=[2600 3000 3200 3600 4000 4100].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

#

i am getting this error because my x value is not a 2d array in this line: model.fit(dataset["area"], dataset.price)

serene scaffold
#

where is array=[2600 3000 3200 3600 4000 4100]?

arctic crown
#

which is a csv file

#

area,price
2600,550000
3000,565000
3200,610000
3600,595000
4000,760000
4100,810000

serene scaffold
#

is it dataset.price, or what?

#

okay, thanks

#

keep in mind that I don't know any of this unless you show me

arctic crown
#

okay

serene scaffold
#

dataset['area'] is your X, and dataset.price is your y. and this is what the docs say

X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data.

y : array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary.

#

so X needs to be two dimensional, and y can be one dimensional or two dimensional

arctic crown
#

yea thats my question why cant it accept a 1d array for the x

#

why does it have to be a 2d array

serene scaffold
#

so try this

model.fit(dataset[["area"]], dataset['price'])
print(model.predict([[3000]]))
serene scaffold
#

a one-dimensional array doesn't have rows or columns. it's just a flat sequence.

#

so instead of it being (n,) shaped, which is one dimensional, you need it to be (n, 1) shaped. which is basically the same

#

but if your X array is just (n,)-shaped, sklearn doesn't know if that's one row with n features, or n rows with one feature each.

#

@arctic crown make sense?

arctic crown
#

a little bit

arctic crown
serene scaffold
arctic crown
serene scaffold
#

in fact, data points are sometimes called objects

#

but a data point is a "thing", and a feature is a type of information you can have about that thing

#

like a person can be a data point, and their features would be things like height, weight, age, etc.

#

and every value for a given feature should be the same type/unit.

arctic crown
#

can you please give like a visual array example

#

i think you are talking about multiple linear regression

serene scaffold
serene scaffold
#

each row is a data point representing a person. the types of each element in a row are different.

each column is a feature. each column represents a specific kind of information that we can know about a person. the types and what they represent are the same.
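A small visual example of that layout, sketched with pandas. The people and their measurements are made up, and the column names are just illustrative.

```python
import pandas as pd

# Each row is one data point (a person); each column is one feature.
people = pd.DataFrame(
    {
        "height_cm": [170.0, 182.5, 158.0],
        "weight_kg": [65.0, 80.2, 52.3],
        "age": [34, 27, 45],
    },
    index=["person_0", "person_1", "person_2"],
)

print(people)
print(people.shape)            # (3, 3): 3 data points, 3 features

print(people.loc["person_1"])  # one row: mixed kinds of information about one person
print(people["age"])           # one column: the same kind of information for everyone
```

A table like `people` is what goes in as X. The values to predict (y) are a separate array with one entry per row, so rows are never "x" and columns "y".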

arctic crown
#

wait so when we are doing .fit the x values are the rows and y values are the columns?

#

@serene scaffold

serene scaffold
arctic crown
mild dirge
#

Because you supply the data for the model to train on. The data consists of multiple rows, each being 1 datapoint. And each datapoint has multiple values, 1 for each feature.

#

Even if you only have 1 feature, then the libraries still expect it to be in the 2d format (like a table). But then it is just a table with 1 column, for the single feature that your dataset has.
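Putting the thread together, a minimal sketch with the six rows from the CSV above typed in directly so it runs on its own. It shows the difference between single and double brackets and passes 2D data to both `fit` and `predict`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The rows from homeprices.csv as posted above.
dataset = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

X = dataset[["area"]]   # double brackets: a DataFrame, shape (6, 1), i.e. 6 rows x 1 feature
y = dataset["price"]    # single brackets: a Series, shape (6,)
print(dataset["area"].shape, X.shape)

model = LinearRegression().fit(X, y)

# New data must be 2D as well: here one row with one feature.
new = pd.DataFrame({"area": [3000]})
print(model.predict(new))

# The NumPy equivalent of the double brackets: reshape (n,) into (n, 1).
areas = dataset["area"].to_numpy()
print(areas.reshape(-1, 1).shape)
```

`reshape(-1, 1)` means "as many rows as needed, one column", which is the single-feature case the error message refers to.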

hasty mountain
#

Interesting... Using a VAE with a DCGAN, and making it not only try to generate new images from random noise but also try to mimic the real images from their latent vectors, seems to make Generator-Discriminator convergence way easier.
Too bad DCGAN is too simple for my data... so I don't know if it might eventually cause mode collapse because of that

#

I know there's a VAE-GAN model, but I'm doing something slightly different... I guess

#

I just hope that, when I change the architecture from a DCGAN to a more robust one, this helps the model converge...I almost got crazy because it wouldn't converge

hasty mountain
#

My code is a Frankenstein's monster already

hasty mountain
#

Yes, but for my GAN it's different

#

I've got some commented-out code for a UNet discriminator, commented-out code for a vanilla discriminator, commented-out code for duelling discriminators, and there's also some code for Progressive Growing GANs...

#

Someday I might organize it...but only when I can make a decent GAN...which isn't today...

iron basalt
hasty mountain
#

But there's some things that I feel that might be promising, like UNet and Relativistic Discriminator.

#

And duelling discriminators...though with less interest than UNet and Relativistic

merry fern
sick fern
#

Hey guys, I have a question regarding neural networks in TensorFlow

#

||cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx']
DataFrame = pd.read_csv('letter-recognition.data' , names=cols)
Scaler = StandardScaler()

X = DataFrame[DataFrame.columns[1:-1]].values
X = Scaler.fit_transform(X)

Y = DataFrame[DataFrame.columns[0]].values

nn_model = tf.keras.Sequential([

tf.keras.layers.Flatten(input_shape = (1 , )),
tf.keras.layers.Dense(128 , activation="relu"),
tf.keras.layers.Dense(128 , activation="relu"),
tf.keras.layers.Dense(26 , activation="relu")

])

nn_model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
nn_model.predict(X)

||

#

This is the code I've written. It's a neural network that attempts to match features to letters, and I've made it predict.

#

But, I've been getting this bug: Input 0 of layer "dense_42" is incompatible with the layer: expected axis -1 of input shape to have value 1, but received input with shape (32, 15)

#

I'm sure it's because of the input_shape but I don't know how to fix it. If someone could help me with this, that'd be great.

#

I'm also available to VC

arctic wedgeBOT
#

Hey @sick fern!

It looks like you tried to attach file type(s) that we do not allow (.data). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

clever owl
#

I've currently got a df that looks like this
Item  January  February  March
car   10       NaN       NaN
bike  NaN      20        NaN


import pandas as pd

data = {
        "Item": ['car', 'bike', 'house'],
        "Price": [10,20,30],
        "Months": ["January", "February", "March"]
}

df = pd.DataFrame(data)

df = df.pivot(index="Item", columns="Months", values="Price")

print(df)

How can I make it to be like this
      January       February      March
Item  Price  Empty  Price  Empty  Price  Empty
car   10     NaN    NaN    NaN    NaN    NaN
bike  NaN    NaN    20     NaN    NaN    NaN

Where the Price columns contain the prices of the items, and the Empty columns are just completely empty

serene scaffold
#

you pretty much never want to allocate empty cells to put data in later. Once you've computed the data that those empty cells are actually for, then you can create a dataframe that has all of them.
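If the two-level layout is still wanted once the values are known, one possible way to build it in a single step is sketched below. It reuses the sample data from the question and `pd.concat` with a dict, which puts the dict keys on the outer column level. The "Empty" column is filled with NaN here only as a placeholder for whatever would be computed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Item": ["car", "bike", "house"],
    "Price": [10, 20, 30],
    "Months": ["January", "February", "March"],
})

months = ["January", "February", "March"]

# Pivot as in the question, then put the month columns in calendar order.
prices = df.pivot(index="Item", columns="Months", values="Price")[months]

# One small frame per month with "Price" and "Empty"; the dict keys become
# the outer level of a two-level column index.
wide = pd.concat(
    {m: pd.DataFrame({"Price": prices[m], "Empty": np.nan}) for m in months},
    axis=1,
)
print(wide)
```

Replacing `np.nan` with the real per-month values at this point avoids creating empty cells and filling them in later.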

sick fern
serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

sick fern
#

Thank you

#
cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx']
DataFrame = pd.read_csv('letter-recognition.data' , names=cols)
Scaler = StandardScaler()


X = DataFrame[DataFrame.columns[1:-1]].values
X = Scaler.fit_transform(X)

Y = DataFrame[DataFrame.columns[0]].values

nn_model = tf.keras.Sequential([

    tf.keras.layers.Flatten(input_shape = (1 , )),
    tf.keras.layers.Dense(128 , activation="relu"),
    tf.keras.layers.Dense(128 , activation="relu"),
    tf.keras.layers.Dense(26 , activation="relu")

])

nn_model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
nn_model.predict(X)

serene scaffold
sick fern
sick fern
serene scaffold
# sick fern

please never ask people to read screenshots of text. it's a waste of everyone's time

sick fern
#

Okay, my bad.

#
ValueError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 248, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer "sequential_16" (type Sequential).
    
    Input 0 of layer "dense_58" is incompatible with the layer: expected axis -1 of input shape to have value 1, but received input with shape (32, 15)
    
    Call arguments received by layer "sequential_16" (type Sequential):
      • inputs=tf.Tensor(shape=(32, 15), dtype=float32)
      • training=False
      • mask=None

serene scaffold
#

@sick fern thank you. please remember these two things (showing the code with markdown formatting, and the whole error message as text) for all future questions

sick fern
serene scaffold
#

@sick fern the input that has a shape of (32, 15) -- what does that represent?

serene scaffold
#

what is the input?

sick fern
#

The input is the X values of the DataFrame

serene scaffold
#

why is the shape (32, 15)?

sick fern
#

I'm not really sure. I don't think I kept that in my code.

#

It was there in the error message, though.

#

Let me send the dataset

#

"letter-recognition.data" is the file

serene scaffold
#

@sick fern so, it appears that the first column is a label, and the other 16 columns are features of some kind. it isn't immediately obvious to me what they mean

#

but it appears that you're trying to run 32 instances (ie, 32 rows) through the model at once

sick fern
#

oh

#

I'd like to run an instance at a time

#

How would I do that?

serene scaffold
#

try changing tf.keras.layers.Flatten(input_shape = (1 , )), to just tf.keras.layers.Flatten(),

sick fern
#

Wow

#

It worked

serene scaffold
#

cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx'] -- I guess this is what each column means

sick fern
#

However, I don't understand what it outputs

#

array([[0. , 0. , 0.37864068, ..., 0. , 0. ,
0.22023912],
[0. , 0. , 0.10195272, ..., 0.4670404 , 0. ,
0.54725075],
[0.04015716, 0. , 0.20881076, ..., 0.38116002, 0. ,
0.41306818],
...,
[0. , 0. , 0.40097547, ..., 0.00563453, 0. ,
0.6130048 ],
[0. , 0. , 0.24123913, ..., 0.16478615, 0. ,
0.01898614],
[0. , 0.12412387, 0.3103178 , ..., 0.17968003, 0.05001948,
0.53410375]], dtype=float32)

serene scaffold
#

that's the thing with neural networks, isn't it?

sick fern
#

This is the array it outputs

serene scaffold
#

well
welcome to ML

sick fern
#

lmao

serene scaffold
#

it looks like you're not encoding the letters numerically. which you should probably do. and it's not clear where in here you're training the model

#

it looks like you start predicting with it right away
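On reading the array itself: each row holds 26 numbers, one per output unit. Because the model was never fitted, they come from the random initial weights and carry no meaning yet. After training, a common way to read such an output is to take the position of the largest value in each row as the predicted class. A NumPy sketch, with random numbers standing in for `nn_model.predict(X)` and assuming class `i` corresponds to the `i`-th letter of the alphabet:

```python
import string

import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the first 4 rows of nn_model.predict(X): 26 scores per row.
scores = rng.random((4, 26))

# Index of the largest score in each row = predicted class.
class_index = scores.argmax(axis=1)

# Assumed mapping: class 0 -> "A", class 1 -> "B", ..., class 25 -> "Z".
letters = np.array(list(string.ascii_uppercase))
print(class_index)
print(letters[class_index])
```

For a 26-way classification like this, a softmax output layer with a categorical cross-entropy loss is the more usual pairing than `relu` with `binary_crossentropy`, so that each row can be read as class probabilities.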

sick fern
#

oh wait

#

yeah I didn't mean to do that

#

Do i train the model by fitting it?

serene scaffold
#

ya

sick fern
#

Okay, and how do I convert the letters to numbers again?

clever owl
sick fern
#

I know I need to typecast it in some sort of way

serene scaffold
serene scaffold
clever owl
#

Alright thanks

sick fern
serene scaffold
arctic wedgeBOT
#

Series.replace(to_replace=None, value=_NoDefault.no_default, *, inplace=False, limit=None, regex=False, method=_NoDefault.no_default)
Replace values given in to_replace with value.

Values of the Series are replaced with other values dynamically.

This differs from updating with `.loc` or `.iloc`, which require you to specify a location to update with some value.
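A short sketch of turning the letter labels into integers with pandas. It uses `Series.map` with a dict, a close relative of the `Series.replace` call documented above. The five sample letters are made up, and the mapping A -> 0 ... Z -> 25 is an assumption about how the classes should be numbered.

```python
import string

import pandas as pd

# Made-up sample of the "letter" column.
letters = pd.Series(["T", "I", "D", "N", "A"], name="letter")

# Dict from each capital letter to its position in the alphabet: A -> 0, ..., Z -> 25.
letter_to_index = {ch: i for i, ch in enumerate(string.ascii_uppercase)}

# Look every letter up in the dict; the result is an integer Series.
y = letters.map(letter_to_index)
print(y.tolist())  # [19, 8, 3, 13, 0]
```

The resulting integer array can be passed as the target when fitting; keeping `letter_to_index` around makes it possible to translate predictions back to letters.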
sick fern
echo halo
#

Hey, I want to start doing Data Science from scratch how do i start?

azure tinsel
#

Hello, I am going over a tutorial about Data Science and ML and encountered an error message, which I can not solve. I am new to DS and ML...

serene scaffold
azure tinsel
#

The task is to calculate the survival rate survival_rate = passenger_age.groupby("Age")["Survived"].mean()

#

And I get:

#

AttributeError Traceback (most recent call last)
Cell In[13], line 1
----> 1 survival_rate = passenger_age.groupby("Age")["Survived"].mean()

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

serene scaffold
#

I'm going to sleep though. Good luck!

#

So passenger age is an array. Not a DataFrame

azure tinsel
serene scaffold
#

So, look into the differences between those two types.

#

You overwrote the passenger age variable
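
A small sketch of the difference, using a made-up stand-in for the tutorial's passenger table (the tutorial's own data and cell contents are not shown here). A NumPy array has no `groupby`, while a DataFrame does.

```python
import pandas as pd

# Made-up stand-in for the tutorial's passenger data.
passengers = pd.DataFrame({"Age": [22, 22, 35, 35], "Survived": [1, 0, 1, 1]})

# Converting to a NumPy array loses the DataFrame methods.
as_array = passengers.to_numpy()
print(type(as_array).__name__, hasattr(as_array, "groupby"))

# On the DataFrame itself the groupby call works.
survival_rate = passengers.groupby("Age")["Survived"].mean()
print(survival_rate)
```

So if a later cell assigns an array to the same variable name, any `groupby` call after that raises the `AttributeError` shown above.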

azure tinsel
#

You mean in cell 3?

#

Thanks @serene scaffold! I commented it out and could proceed with the tutorial. I will write the author a message about this. I definitely will have to learn more about this. This tutorial is just a primer. Thank you again and have a well deserved sleep.

serene scaffold
#

But I haven't seen the one you're referring to

#

Also I'm not sleeping

azure tinsel
#

@serene scaffold I found out, that there appears to be a typo. In cell 3 you probably have to calculate passenger_ages

serene scaffold
#

I have issues

azure tinsel
#

It is by Michael Hartl. He is a great educator, so the tutorial is OK. The point is that it came out a month ago, and this probable typo is almost on the last page

thin palm
#

how do I tell pandas to read this? I'm not even sure i know how to arrange this data

wooden sail
#

<@&831776746206265384>

#

this is being spammed in many channels

drowsy viper
#

Hi there, I have a question about training an AI model in PyTorch.
I have about 30 classes (soon about 180, since I am crawling more images), but I am getting multiple errors on my tensors at the criterion step. Is there a simple notebook I can follow to understand PyTorch better? By the way, my images are in 30 different directories for now. Is this how my folder structure should be? What is your general comment on this issue?

dusk musk
#

does anyone have a suggestion for a library that handles units well?
kind of like astropy's?

i wasn't sure if there were any other really good suggestions (especially one that works with scipy's methods like integration/diffeqs)

wooden sail
dusk musk
#

i'll check it out :) thanks

wooden sail
#

lol

#

it does explicitly mention numpy support, so i'm optimistic

spiral pasture
#

Heyo, I'm new to the whole AI thing and want to try out something easy (I hope it is).
I'd like to code and train an AI for spam detection in my Discord bot. So this "free nitro" stuff or invite links separated with [dot] or something should be learned and deleted automatically.
I would use TensorFlow for deep learning because I like the idea of deep learning.
Do I have the right approach for my idea?

#

Why TensorFlow?
Because I want to implement a system in this bot so special users can upvote or downvote the bot's decision and it gets a live review

magic swallow
#

Maybe think about what sort of model would fit your needs and your training data

late shell
#

Hello. I'm a final-year undergraduate in India, and I want to pursue my Master's in Japan through the MEXT scholarship. The scholarship requires the applicant to submit an SOP explaining the project/study they will be doing during their stay in Japan. I don't have any great ideas for the scholarship yet except this one. Can you guys please let me know your thoughts on it? Also, I'm not very confident about this one since the release of ChatGPT, because of its amazing capabilities.

My Idea: I plan to make a tool that generates meaningful stories using machine learning techniques like NLP by taking a particular set of keywords as input and other factors depending on the user.

When learning Japanese, particularly during my revision of previously learned words/kanji (a system of Japanese writing using Chinese characters, used primarily for content words.), I had to read the word/kanji, its meaning and an example sentence in which the word had been used. I had to do this over and over again for every single word. This method of revising felt inefficient, slow and boring after a certain amount of time. I also came to know through social media platforms and communities (such as Nihongo VR on discord) that a huge percentage of people learn words in a similar way. I wanted to make this part interesting and fun.

So, I intend to make a model/tool that will help international students learn the Japanese language in an easy and fun way. After inputting the words already learned (his vocab) by the user in the past, the model will generate an interesting story that would use the words inputted and some other unknown/new words depending on other parameters set by the user. The user can then read the story which will help him/her revise previously learned words in an interesting, fun and efficient manner compared to the usual robotic way of revising/learning words.

The model/tool will presumably have a few requirements from the users such as:

#

• The words already learnt by the user in the past
• The length of the story
• The genre of the story
• The age group for which the story is intended
• The variability of new words to introduce in the story

#

Please let me know your thoughts on this.

obsidian peak
spring sphinx
#

Hi guys, I'm a web developer and have tried my hand at pandas and numpy to learn a little bit of data science..

I don't have any definite guide or pathway to follow, (my aim is to learn Machine learning (Deep Learning))

Can y'all guide me and recommend some resources, give me sort of a path that I can follow?

rancid quartz
#

I am trying to create an algorithm that is trained on 100s of photos and learns to read the emotions displayed in them. Should I use PyTorch or TensorFlow? (I'm dipping my toes into DL for the first time btw)

#

im going offline so just ping me if u know thanks!

stone glacier
#

hi, would anyone have a good source link where I check out how to tune/optimize hyperparameters for a TFIDF model?

lapis sequoia
#

why does my dataframe have a second header Date

agile cobalt
#

that's the name of the index

lapis sequoia
#

then what is the above header row

agile cobalt
#

the names of the columns

lapis sequoia
#

how is Date accessed

agile cobalt
#

df.index

lapis sequoia
#

can i delete this index?

#

index column

agile cobalt
#

do you want to get rid of the date column or just turn it into a normal column?

lapis sequoia
#

yeah

agile cobalt
#

which of the two options

lapis sequoia
#

oh either works

agile cobalt
#

you can use reset_index(), see the parameters it supports

#

!d pandas.DataFrame.reset_index

arctic wedgeBOT
#

`DataFrame.reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=_NoDefault.no_default, names=None)`
Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.
agile cobalt
#

drop=True if you want to get rid of it
drop=False if you want to turn it into a normal column

#

dataframes must have an index though - using reset index just gives it a default unnamed numerical index

lapis sequoia
#

tried now, didnt work, its still there

agile cobalt
#

pandas methods just about never modify in-place, they all return a new dataframe/series instead
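
A short sketch of both `drop` options and of the reassignment step, on a made-up frame with an index named `Date` (the real dataframe is not shown in the chat).

```python
import pandas as pd

# Made-up frame whose index is named "Date".
df = pd.DataFrame(
    {"Close": [10.0, 11.0]},
    index=pd.Index(["2023-01-02", "2023-01-03"], name="Date"),
)

as_column = df.reset_index()          # drop=False: Date becomes a normal column
dropped = df.reset_index(drop=True)   # drop=True: Date is discarded

df.reset_index()                      # result is thrown away, df is unchanged
print(df.index.name)

df = df.reset_index()                 # reassign to keep the new frame
print(list(df.columns))
```

Calling the method without assigning the result is the usual reason it looks like nothing happened.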

lapis sequoia
#

oh

#

awesome worked thank you

#

different problem the rows get shuffled

#

whats going wrong

agile cobalt
#

isn't that because of the train/test split?

lapis sequoia
#

probably

agile cobalt
#

!d sklearn.model_selection.train_test_split

arctic wedgeBOT
#

`sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)`
Split arrays or matrices into random train and test subsets.

Quick utility that wraps input validation, `next(ShuffleSplit().split(X, y))`, and application to input data into a single call for splitting (and optionally subsampling) data into a one-liner.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation).
agile cobalt
#

shuffle=True
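
A minimal sketch showing that the default `shuffle=True` is what reorders the rows, and that `shuffle=False` keeps the original order. The ten integers stand in for dataframe rows.

```python
from sklearn.model_selection import train_test_split

# Ten ordered "rows" standing in for a time-ordered dataframe.
rows = list(range(10))

# shuffle=False keeps the order: the last 20% becomes the test set.
train, test = train_test_split(rows, test_size=0.2, shuffle=False)

print(train)
print(test)
```

Keeping the order this way is typical for time series, where the test set should come after the training period.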

thin palm
#

how do I read this as a dataframe for my pandas?

lapis sequoia
agile cobalt
#

I recommend taking a better look at the documentation of the functions and modules you're using 😅

lapis sequoia
#

didnt cross my mind that it could be found in docs

#

or are you telling me i should read docs for every single method, whether i use it or not

hard birch
#

Guys quick question. I'm practicing Codility's sample coding tests for a job application. I keep running into time problems when they ask me array questions i.e. return minimal elements of an array, find which array element is unique, etc. Of course these are really easy questions if you do not care about code efficiency but my question is how can I make my code run faster in these cases. For example if I have to return the max element of an array I would just use max() i.e. my code is literally just 1 line. But even then they say it's not efficient enough

#

but surely python must have some fairly efficient list indexing algorithms built into it, no?

#

how would it even be possible to write a faster code outside of producing a research paper

agile cobalt
nocturne eagle
#

honestly, even for very large libraries, reading the tutorials and docs takes a day or two, max

eternal hull
#

Does anyone use pyspark

serene scaffold
eternal hull
#

I am getting this error the input column index should have at least two distinct values

#

I am using vectorindexer,stringindexer for onehotencoding

torpid pecan
#

hi! someone here knows about a data science/data analytics case using heuristics algorithms?

serene scaffold
torpid pecan
#

that's the point. I just wanted a link, .pdf or something to start studying. If anyone (or everyone) has a case to share with me, I'll be grateful 🙂

hot hearth
#

so i'm trying to do something slightly strange with matplotlib

#

this is a bit of an xy problem, but i really wanna solve x to improve my understanding

dawn urchin
#

send your problem

#

the code

hot hearth
#

i want to set the exact size for a figure, and not have it autoadjust

#

when i draw a pie chart, no matter how i configure everything, i can't seem to get it to cut off

#

here's a contrived example. the blue line is where i want the left edge of the image to be (yes, really)

#

but it's automatically growing to avoid cutting anything off. no matter what, i can't seem to disable that behavior
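
One possible explanation, assuming the figure is displayed in a Jupyter notebook or saved with `bbox_inches="tight"`: both crop or grow the output to fit every artist. A sketch that saves without that option, with a made-up axes rectangle that deliberately pushes the pie past the left edge:

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 3), dpi=100)

# Axes rectangle in figure fractions (left, bottom, width, height).
# The negative left value is an arbitrary choice that moves part of the pie off-canvas.
ax = fig.add_axes([-0.3, 0.0, 1.0, 1.0])
ax.pie([3, 2, 1])

# Saving without bbox_inches="tight" keeps the canvas at figsize * dpi.
buf = io.BytesIO()
fig.savefig(buf, format="png")

# Read the pixel size back from the PNG header.
width, height = struct.unpack(">II", buf.getvalue()[16:24])
print(width, height)
```

If the notebook display still regrows the figure, saving to a file and opening that file is a way to check what matplotlib itself produced.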

vast lintel
#

Is it possible to use BERT's small model for organising and expanding some text input?

dawn urchin
# hot hearth here's a contrived example. the blue line is where i _want_ the left edge of the...
freeCodeCamp.org

When creating plots using Matplotlib, you get a default figure size of 6.4 for the width and 4.8 for the height (in inches). In this article, you'll learn how to change the plot size using the following: * The figsize() attribute. * The set_figwidth() method.

mint palm
#

i might have asked before, but i'm still confused
Actually i couldn't find a good source to "ACTUALLY" understand "HOW" zero shot learning works
I have read Medium articles and watched many videos, but i don't understand how it outputs a label at prediction time

woven coral
potent sleet
#

Would it be Ok if I ask for feedback about the work I have published on my youtube channel here? I have composed all music in this channel using my own AI model and would appreciate constructive feedback

hot hearth
sick fern
#

Hello everyone. This is my code to recognize digits from 0-9.

from tensorflow.python.ops.logging_ops import image_summary
#DataSet
mnist = tf.keras.datasets.mnist
(x_train, y_train) , (x_test, y_test) = mnist.load_data()

#Normalization
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)


#Model
nn_model = tf.keras.models.Sequential()
nn_model.add(tf.keras.layers.Flatten(input_shape = (28,28)))
nn_model.add(tf.keras.layers.Dense(128, activation="relu"))
nn_model.add(tf.keras.layers.Dense(128, activation="relu"))
nn_model.add(tf.keras.layers.Dense(10, activation="softmax"))

nn_model.compile(optimizer="adam" , loss="sparse_categorical_crossentropy" , metrics=['accuracy'])
nn_model.fit(x_train, y_train, epochs=1)
nn_model.save('recognition.model')

image = cv2.imread('/1.png')[:,:,0]
image = np.invert(np.array(image))
image = image.reshape(1,28,28)
prediction = nn_model.predict(image)
print(np.argmax(prediction))

#

However, every time I try to print the prediction, it doesn't give the right digit even though the model has an accuracy of 98%. I also checked the image for distortion and it looks fine. Is something wrong with the code?
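
One likely cause, offered as a guess: the training images go through `tf.keras.utils.normalize`, but the image passed to `predict` keeps its raw 0-255 values. The sketch below uses a NumPy stand-in for that L2 scaling (the helper `l2_normalize` and the random image are hypothetical) to show the same preprocessing applied to a single image.

```python
import numpy as np

def l2_normalize(x, axis=1):
    """NumPy stand-in for the L2 scaling applied to x_train in the code above."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    norms[norms == 0] = 1  # leave all-zero slices unchanged instead of dividing by zero
    return x / norms

# Random stand-in for the 28x28 grayscale image loaded with cv2.imread.
raw = np.random.default_rng(0).integers(0, 256, size=(28, 28)).astype(np.uint8)

# Invert, add the batch axis, convert to float, then scale like the training set.
image = np.invert(raw).reshape(1, 28, 28).astype("float32")
image = l2_normalize(image, axis=1)

print(image.shape, float(image.max()))
```

With TensorFlow available, calling `tf.keras.utils.normalize(image, axis=1)` on the float image before `predict` would be the direct equivalent. Training for more than one epoch may also help.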

lapis sequoia
agile cobalt
lapis sequoia
#

yep and when you read the docs you feel like where am i even going after reading all these

mint palm
#

I am getting very confused with zero-shot learning. Can anyone please comment on the following, and say whether each point is true, false, or needs correction:

  1. "attention is all you need" was actually zero-shot learning
  2. for zero-shot learning, the input might be novel, but it is represented in a common learned embedding space, and hence after representation the model gets an idea of what it actually is.
  3. I don't see how zero-shot learning is possible without having "embeddings" and an "embedding space"
grand belfry
#

Is there any way to feed ai music theory and ask it to compose a song?

serene scaffold
#

or do you mean hard-coding rules about music theory (no parallel fifths, go from the dominant to the tonic, etc.) into it, and then having it generate new music that follows those rules?

ocean swallow
#

How do you draw this in R? Circle's radius is a continous data and center is just a coordinate.

#

yeah sry about sending to python channel but was kind of urgent

prime hearth
#

hello, i tried SVM and random forest and got an accuracy of 0.76 for validation.
However, i tried to improve my model, so i did some feature engineering and kept only meaningful column names, and got an accuracy of 0.73 instead.
My question is, is it okay to choose a model with lower accuracy in this case, since the 0.76 accuracy uses 700 features while the 0.73 one uses around 300?
i was afraid the 0.76 validation score is because my model is overfitting

lapis sequoia
#

I only add one element but the shape ends up like this, why

mild dirge
#

So in your case you want to reshape the second array such that it is (1, 30, 1) (Might not be necessary)

#

And then append along axis=0
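
A sketch of the suggestion above with made-up shapes, since the screenshot of the actual arrays is not available. Without `axis`, `np.append` flattens both inputs, and that can produce an unexpected shape.

```python
import numpy as np

stack = np.zeros((5, 30, 1))   # hypothetical existing array of 5 items
new_item = np.ones(30)         # hypothetical new element

# No axis given: both arrays are flattened before joining.
flat = np.append(stack, new_item)

# Reshape the new element to (1, 30, 1) and append along axis 0.
grown = np.append(stack, new_item.reshape(1, 30, 1), axis=0)

print(flat.shape)
print(grown.shape)
```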

lapis sequoia
#

i would have no clue even if i read the docs

prime hearth
#

hello, i would like to ask how many features is okay for training a model in NLP?

#

im unsure if i have too little features or too many

#

im doing a classification problem

hasty mountain
prime hearth
#

@hasty mountain i know the idea behind PCA but im still new to it. Also , what im unsure of is how would I find out what those new features are with PCA?

#

Thanks for answering - my model predicts yes or no based on the text from user reviews on apps they use and their rating for the app

#

so my model has around right now 300 features, i think it makes sense since it is using the most popular words to predict something

#

but if i add more features where the new features are new popular words i noticed an increase in accuracy

#

but again im unsure if that is okay- im really new to NLP and this is like my first big project so just trying to make sure im doing things right

hasty mountain
hasty mountain
#

Perhaps you could use Grid Search using different number of features and check how your model will perform.

prime hearth
#

the features i chose are like the most impactful with polarity (sentiment value).

mild dirge
#

Popular words are also not always useful words

#

and f.e.

prime hearth
#

yeah when i mean popular i mean like "worst" "hate" "love"

#

since those impact whether the user likes the app or not

hasty mountain
#

Sentiment "remark words"

#

There might be a specific term for that, but I don't know

prime hearth
#

yeah i used spacy library to get adjectives from my features - im also doing feature selection right now but i guess my original question is how many features is okay for my SVM model

#

thank you i can try that grid search- it just it takes a long time to run and my computer slows down if i do it for all features and models

hasty mountain
#

I think there's better options...I just don't remember.

#

There's Grid Search, Randomized Search...

prime hearth
#

yeah i used randomize search too

hasty mountain
#

Then you might already have an idea on how many features you should use.

prime hearth
#

i guess i'm not sure if 700 features is okay - my model gives like 0.75 accuracy after performing metric testing with sklearn

#

so how would i do randomize search with each feature? You mean like forward feature selection process?

#

what you mentioned earlier

hasty mountain
#

I guess you can try making randomized search with 700 features, then try with 600, 500...

#

Or even Grid Search with 700, 600, 500...and then you can filter the best number.
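
A rough sketch of that idea, assuming a TF-IDF plus linear SVM pipeline. The tiny review set and the candidate sizes are invented for illustration; with real data the grid would be something like 700, 600, 500.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny made-up review set; real data would be the app reviews discussed above.
texts = [
    "love this app", "great and fast", "really love it", "best app ever",
    "hate this app", "worst app ever", "slow and ugly", "really hate it",
] * 3
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 3

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# Try several vocabulary sizes; each one is scored with 3-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"tfidf__max_features": [5, 10, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(texts, labels)

print(search.best_params_, search.best_score_)
```

Because the vectorizer sits inside the pipeline, it is refit on each training fold, so the validation folds do not leak into the vocabulary. Note that `max_features` keeps the most frequent terms, which is not the same as the most informative ones.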

prime hearth
#

oh ok thanks that make sense. I guess il try that then whatever gives the best accuracy or metrics i will try forward feature selection to choose the optimal features

hasty mountain
#

I guess some of those cross-validation methods might help

#

K-Fold...

prime hearth
#

thank you

tardy summit
#

Pandas. I have a tiny data set read from a csv. The first step I need to do is map the contents of one column to two columns, and the value of the second column will depend of the predecessor.

df = {
    "Accounts": [
        "Assets",
        "Current Assets",
        "- Checking",
        "Total Current Assets",
        "Total Assets",
    ],
    "12/31/2016": ["NaN", "NaN", 1000.00, 1000.00, 1000.00],
}

Desired

new_df = {
    "Accounts": ["Assets", "Current Assets", "Checking"],
    "Parent": ["root", "Assets", "Assets:Current Assets"],
    "12/31/2016": ["NaN", "NaN", 1000.00],
}

I'm not clear on the best way to approach this. Seems one option would be to build a new column first by iterating over the "Accounts" column, and then filtering out the lines that start with "Total".

I think that would work, but is there a better way?
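
One way the iteration could look, as a sketch under two assumptions: every group is closed by a matching "Total <name>" row, and the "NaN" entries are missing values. A stack tracks the currently open groups.

```python
import pandas as pd

df = pd.DataFrame({
    "Accounts": ["Assets", "Current Assets", "- Checking",
                 "Total Current Assets", "Total Assets"],
    "12/31/2016": [float("nan"), float("nan"), 1000.00, 1000.00, 1000.00],
})

# Names of all "Total ..." rows; they mark where a group ends.
totals = {name for name in df["Accounts"] if name.startswith("Total ")}

stack, rows = [], []
for name, value in zip(df["Accounts"], df["12/31/2016"]):
    if name in totals:
        stack.pop()          # "Total X" closes group X and is not kept
        continue
    parent = ":".join(stack) or "root"
    rows.append({"Accounts": name.lstrip("- "), "Parent": parent, "12/31/2016": value})
    if f"Total {name}" in totals:
        stack.append(name)   # this row opens a group that a Total row will close

new_df = pd.DataFrame(rows)
print(new_df)
```

A plain loop is reasonable here because each parent depends on the rows before it, which does not vectorize naturally.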

pulsar ether
#

I finally got it installed on WSL2 / Ubuntu from scratch. Haha what a PITA

lapis sequoia
#

is Seq2Seq model the same as iterating a sequential model several times

mint palm
#

I dont see how zero-shot learning is possible without having "embeddings" and "embedding space"
Can we do zero shot learning when labels are like orange, banana, papaya, etc. (i mean they aren't descriptive but specific)

grand belfry
#

is there any snapshot of a musical composition ai?

mint palm
#

It doesn't make sense to fine-tune a model (pre-trained on a large dataset) that was trained for zero-shot learning, unless we are making changes to the architecture. Right?

young terrace
#

what's everyone's opinion on Data camp?

lapis sequoia
#

im interested in ai what programming language should i start , ik the basics of python but i dont know why im so bad.

sick fern
#

Hey guys, I really need help with a convolutional neural network I'm trying to work with. This is the code:

#MNIST
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()


train_images = train_images.reshape((60000,28,28))
train_images = train_images.astype('float32')/255

test_images = test_images.reshape((10000,28,28))
test_images = test_images.astype('float32')/255

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32,(3,3), activation='relu', input_shape = (28,28,1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64,activation = 'relu'))
model.add(layers.Dense(10, activation= 'softmax'))

model.compile(optimizer = "Adam" , metrics = ["accuracy"] , loss = "categorical_crossentropy")
model.fit(train_images, train_labels, epochs=1)

Prediction = model.predict(test_images[0])
print(Prediction, test_labels[0])
#

This is the error code:

ValueError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 228, in assert_input_compatibility
        raise ValueError(f'Input {input_index} of layer "{layer_name}" '

    ValueError: Exception encountered when calling layer "sequential" (type Sequential).
    
    Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=2. Full shape received: (None, 28)
    
    Call arguments received by layer "sequential" (type Sequential):
      • inputs=tf.Tensor(shape=(None, 28), dtype=float32)
      • training=False
      • mask=None

#

If anyone can help me, that'd be great. Also I'd like to talk to experienced ML learners to better understand this field, so if you're fine with me asking questions to u in DMS react to this message.

#

Thanks a lot!
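
A NumPy-only sketch of the shapes involved, assuming the cause is the missing batch and channel axes. The zero-filled array is a stand-in for the MNIST test set. `test_images[0]` drops the batch axis, while a `Conv2D` model built with `input_shape=(28,28,1)` expects input shaped `(batch, 28, 28, 1)`.

```python
import numpy as np

# Zero-filled stand-in for a few MNIST test images; no Keras needed for the shape logic.
test_images = np.zeros((100, 28, 28), dtype="float32")

# Add the channel axis that input_shape=(28, 28, 1) implies.
with_channel = test_images.reshape((100, 28, 28, 1))

single = with_channel[0]                    # indexing removes the batch axis
batch_of_one = with_channel[:1]             # slicing keeps it
also_ok = np.expand_dims(single, axis=0)    # or add it back explicitly

print(single.shape, batch_of_one.shape, also_ok.shape)
```

Under that assumption, `model.predict(test_images[:1])` on data reshaped with the trailing `1` should avoid the `ndim` error.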

lapis sequoia
#

im interested in ai what programming language should i start , ik the basics of python but i dont know why im so bad.

lapis sequoia
#

ik the basics

sick fern
#

Whoever said that is definitely wrong

lapis sequoia
#

mybe u can help me

sick fern
#

ML Engineers prefer Python because it's easy and accessible

lapis sequoia
#

im 14 years old

sick fern
sick fern
#

I think it's also important to learn math skills like calculus, and linear algebra if you want to master neural networks

sick fern
lapis sequoia
#

is it really programming when u using libraries why cant u do it without them??

sick fern
#

I used to think that too

lapis sequoia
#

reply

sick fern
#

But the truth is, libraries just make it easier. For example, making a neural network without a library is next to impossible.

sick fern
lapis sequoia
#

got it

#

ik the basics but im like saying corrupted

#

should i relearn

sick fern
sick fern
# lapis sequoia ik the basics but im like saying corrupted

If you don't understand it too well, and think you still need some relearning, do this crash course: https://www.youtube.com/watch?v=_uQrJ0TkZlc

#

Get my Free NumPy Handbook:
https://www.python-engineer.com/numpybook

Learn NumPy in this complete 60 minutes Crash Course! I show you all the essential functions of NumPy, and some tricks and useful methods. NumPy is the core library for scientific computing in Python. It is essential for any data science or machine learning algorithms.

Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith

Data & code used in this Tutorial: https://github.com/KeithGalli/pandas
Python Pandas Documentation: http://pandas.pydata.org/pandas-docs/stable/

Let me know if you have any questions!

In this video we walk through many of the ...
#

And then, I'd suggest you do an ML course to get the hang of tensorflow. I did a course called Machine Learning for Everybody and it was a 3 hour course which helped me a lot

pseudo aspen
#

how can you find the index of a specific tensor in a list of tensors?

lapis sequoia
#

difference between ml and data science

agile cobalt
# lapis sequoia difference between ml and data science

machine learning is about creating, evaluating, deploying and maintaining models, ranging from some traditional methods like decision trees to neural networks

data science covers almost everything that regards working with data - collection, transformation, analysis, storage (to a lesser degree), presentation etc

young terrace
#

is it worth it? it appears to be on a small sale and i was considering it

sick fern
#

Any people experienced in ML who I can DM for help?

serene scaffold
snow thicket
#

Hi everyone,

I'm trying to use MONAI TorchVisionFCModel to include inception_v3.
I'm getting error
Calculated padded input size per channel: (1 x 1). Kernel size: (5 x 5). Kernel size can't be greater than actual input size
looks like i'm trying to run input size (1x1) with larger kernel size (5x5). I'm not sure what I should add to make changes to my input size.

sick fern
#

Hey guys, I have this code:

#MNIST
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()


#Normalization
train_images = train_images.reshape((60000,28,28,1))
train_images = train_images.astype('float32')/255

test_images = test_images.reshape((10000,28,28,1))
test_images = test_images.astype('float32')/255

#Labels --> Numerical Values
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

#Model
NN_Model = models.Sequential()
NN_Model.add(layers.Flatten(input_shape = (28,28,1)))
NN_Model.add(layers.Dense(128, activation='relu'))
NN_Model.add(layers.Dense(128, activation='relu'))
NN_Model.add(layers.Dense(10, activation='softmax'))

#Compilation, Fitting
NN_Model.compile(optimizer = "Adam" , metrics = ["accuracy"] , loss = "categorical_crossentropy")
NN_Model.fit(train_images, train_labels, epochs=1)

#Prediction, Display
Prediction = NN_Model.predict(test_images[0])
print(np.argmax(Prediction), test_labels[0])
#

but it's giving this error:


    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 248, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer "sequential_23" (type Sequential).
    
    Input 0 of layer "dense_57" is incompatible with the layer: expected axis -1 of input shape to have value 784, but received input with shape (None, 28)
    
    Call arguments received by layer "sequential_23" (type Sequential):
      • inputs=tf.Tensor(shape=(None, 28, 1), dtype=float32)
      • training=False
      • mask=None

#

Can someone please help me? I don't know why it's not working.

weary rock
#

what i see from the error is output from flatten layer is not compatible with the input to dense layer

#

check that out
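
A NumPy sketch of why the dense layer sees 28 values instead of 784, assuming the single image was passed without a batch axis. The helper `flatten_like_keras` is a hypothetical stand-in that keeps axis 0 and merges the rest, which is how a Flatten layer treats its input.

```python
import numpy as np

def flatten_like_keras(batch):
    """Keep axis 0 as the batch axis and merge every other axis."""
    return batch.reshape(batch.shape[0], -1)

# Stand-in for test_images[0] after the reshape in the code above.
sample = np.zeros((28, 28, 1), dtype="float32")

# Without a batch axis, the 28 rows are treated as 28 separate samples.
print(flatten_like_keras(sample).shape)

# With a batch axis in front, the whole image becomes one row of 784 values.
print(flatten_like_keras(sample[np.newaxis]).shape)
```

So the model itself may be fine, and passing `test_images[:1]` rather than `test_images[0]` to `predict` is worth trying first.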

serene scaffold
#

@sick fern someone answered you ^

sick fern
weary vigil
#

Could someone with a good understanding of Pandas please help me:

steady basalt
#

is it worth learning how to build generative models or language models due to their increasing popularity? every interview I had these days asked me about chatgpt for some dumb reason

woven coral
fiery dust
#

I don't want to spam the channel with a huge message, so if someone's willing to read it, plz ping me and I'll post the message. It's a somewhat big question since I give context and some more stuff 🙂

tranquil sand
#

guys, i'm reading "deep learning with python" by chollet, what should i do after i finish it?

slender sand
#

Can anyone recommend a good Udemy (or other) course for someone who wants to learn how to perform various business data analysis tasks (such as forecasting) and already has an intermediate understanding of Python and SQL? I only have time to take probably 10-15 hours of classes before Monday and I don't want to pick a lemon, been burned on bad classes before

tranquil sand
#

and possibly work at DeepMind or OpenAI

fiery dust
#

Guys I think I'm investigating a lot on where to study from, what's the best course and bla bla bla. I want to start somewhere. I thought FreeCodeCamp PyTorch course would be ok. But I dont know yet, where should I learn from? Also it seems like PyTorch can't do everything, we also need SciKit, SciPy and more things to work with different AI/ML models.

#

I just want to start I'm just in a loop of investigating where to learn from but I never start

#

btw I like learning with videos, and for the moment I'm just interested in predicting values. For example, I have a list of values that, when plugged into a function, returns a percentage. I test different lists, then give the model the tested lists with their percentages, and I want it to suggest which lists are possibly better (better = higher percentage)

serene scaffold
tranquil sand
#

i'm gonna start to see calculus this year

serene scaffold
tranquil sand
#

yep

serene scaffold
# tranquil sand become good at NLP

Speech and text are sequences of symbols, so if you've learned the basics of neural networks (like feed forward ones), you should start looking into ones that involve sequences. Like LSTMs.

tranquil sand
#

but it just makes an overview

serene scaffold
#

which is fine

serene scaffold
#

well, my roadmap for you is to implement something that involves a feed-forward neural network (even if it isn't NLP related), and then do something that involves an LSTM.

hasty mountain
supple terrace
#

import requests
import json

# Replace with your own OpenAI API credentials
openai_api_key = "your_openai_api_key"
openai_model = "your_openai_model"

# Replace with your own Zendesk API credentials
zendesk_subdomain = "your_zendesk_subdomain"
zendesk_username = "your_zendesk_email"
zendesk_token = "your_zendesk_token"

# Function to generate response from OpenAI API
def generate_response(text):
    headers = {
        "Authorization": f"Bearer {openai_api_key}",
        "Content-Type": "application/json",
    }
    data = {
        "model": openai_model,
        "prompt": text,
    }
    response = requests.post("https://api.openai.com/v1/engines/davinci/completions", headers=headers, data=json.dumps(data))
    response_json = response.json()
    return response_json["choices"][0]["text"]

# Function to retrieve previous tickets from Zendesk API
def get_previous_tickets():
    headers = {
        "Content-Type": "application/json",
    }
    response = requests.get(f"https://{zendesk_subdomain}.zendesk.com/api/v2/tickets.json", auth=(zendesk_username, zendesk_token), headers=headers)
    response_json = response.json()
    return response_json["tickets"]

# Main code
previous_tickets = get_previous_tickets()
for ticket in previous_tickets:
    text = ticket["description"]
    response = generate_response(text)
    print(f"Generated response for ticket {ticket['id']}: {response}")

#

does this look right? im trying to use openai to read past tickets in zendesk to answer new tickets

weary vigil
strong sedge
#

I have some data where regular forecasting algos are not applicable (data is not at all periodic, it's all over the place), is there a way to do forecasting with a collaborative recommendation system?
Making a recommendation system was part of the assignment.
I am just looking for articles that go over this process, I wasn't able to find something like this.
(Not looking for code, just a high level explanation will suffice)

hoary wigeon
#

Hi, this is the input_dataset of the time_series problem that I'm working on..

#

If anyone can help me in building a time series model using the above data.. that would be great.

#

I want to automate the value selection of p,d,q

strong sedge
mint palm
#

can you do zero shot learning when labels are like orange, banana, papaya, etc. (i mean they arent descriptive but specific)

considering i am doing zero shot text2video retrieval and training included examples of
running,
jumping,
catching,
kicking

(one word only)

now on testing if i insert two new class:
skipping
swimming

how will my model even with learnt embedding know the difference in "skipping" and "swimming"?

In descriptive text i do understand model might learn from auxiliary words too which are kind of 'describing' the activity and might help understand, decipher when new text is given because it might have descriptive auxiliary word too.

hoary wigeon
#

How can I reduce the gap between actual and forecast?

fiery adder
celest vine
#

I don't understand which columns need scaling and which needs one hot encoding

#

I am aware numerical columns need scaling and categorical need one hot encoding

#

But a column with 0 and 1 (boolean values) considered numerical or categorical?

mild dirge
#

If it is either 0 or 1, then it's probably categorical

#

Either belongs to class A, or class B

#

In that case you also don't need to 1-hot encode it

#

@celest vine

celest vine
#

Also, how big should be the difference between two numerical columns to perform scaling?
Suppose column A has a highest numerical value of 5 and column B has a highest numerical value of 50. Do these require scaling? @mild dirge

mild dirge
#

always

#

Just always normalize the data

#

make sure the values are between 0 and 1. Or if the data is approx. normal, standardize it
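A minimal sketch of the two options just mentioned, assuming the columns are NumPy arrays. The helper names `min_max_scale` and `standardize` are made up for illustration, and the two example columns mirror the "max 5 vs max 50" question. In practice the min/max or mean/std would usually be computed on the training data only and reused for new data.

```python
import numpy as np

def min_max_scale(column):
    """Rescale a 1-D array linearly so its values lie in [0, 1]."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        # A constant column has no spread; map it to zeros.
        return np.zeros_like(column)
    return (column - column.min()) / span

def standardize(column):
    """Shift and scale a 1-D array to zero mean and unit standard deviation."""
    column = np.asarray(column, dtype=float)
    std = column.std()
    if std == 0:
        return np.zeros_like(column)
    return (column - column.mean()) / std

col_a = np.array([1.0, 2.0, 5.0])      # highest value 5
col_b = np.array([10.0, 20.0, 50.0])   # highest value 50
scaled_a = min_max_scale(col_a)
scaled_b = min_max_scale(col_b)
```

After scaling, both columns cover the same 0 to 1 range, so neither dominates just because its raw numbers are larger.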

warm pike
#

Hello, I am using Alteryx to load data in from a private website but when beginning the API calls using download tool, it says no data received.

displays HTTP/1.1 400 BAD REQUESTDate: Thu, 02 Feb 2023 10:06:26 GMTContent-Type: text/plainContent-Length: 0Connection: keep-alive

celest vine
#

Also, should I always normalize all numerical columns?

mild dirge
#

normalizing is scaling all values to be between 0 and 1

#

And you generally want to do that for all columns

#

Such that the model will not be biased towards columns with higher values f.e.

celest vine
#

Got it. Thanks

hot slate
#

Hi guys, I have a question about memory optimization in parallel training.

I have a dataset D, and I want to train a model M that has several parameters that I want to modify. When I run the training in a parallel manner with multiprocessing (using multiprocessing.pool), my machine often run out of memory. Because the dataset is not that large to overwhelm my machine, I believe that each process makes a copy of the dataset on its own memory zone. So the dataset is loaded multiple times onto the memory and hence overwhelming my machine.

My current naive solution is that I create a class to store the dataset as an internal property self.dataset. My training procedure will read the dataset from self.dataset instead of reading from a function's parameter). However, it doesn't seem to work.

Is there any way that I can load the dataset onto the memory once only, and have all the training procedures in all processes read from that?

young granite
#

Hey guys is there a way to adjust height and width of the legend box in plotly?
If i use yanchor="bottom" and xanchor="center" the legend box (in horizontal mode) is way too high.
Thanks in advance!

mild dirge
#

That means if you have enough memory to load your dataset twice, there is no issue

hot slate
#

Yes, I'm using pool.map and there are roughly 30 child processes, which will load my dataset 30 times. That's why I'm looking for a solution for this issue

mild dirge
#

I don't think it makes 30 copies of the entire dataset

#

As a side note, how do you parallelize the model training by splitting up the data? Training the model is normally sequential. How do you combine the resulting partially trained models?

hot slate
#

As I initialize the Pool object without any argument, I believe that the max_workers will be equal to the number of CPU. So when I run the code, I can see that all of my 32 CPUs are utilized. However, when the memory runs out, some processes are killed.

hot slate
mild dirge
#

Big models?

hollow citrus
#

If I train my model on multiple datasets without making changes or recompiling the nn does the nn retain information or does it get replaced?

mild dirge
vestal lagoon
#

best library to learn machine learning anyone help

hollow citrus
hollow citrus
vestal lagoon
hollow citrus
#

I would say initially pandas, numpy, matplotlib, then later on scikit-learn, or xgboost, etc.

mild dirge
hollow citrus
vestal lagoon
#

thanks gotta watch some documentation and tutorials then

hollow citrus
#

no problem!

hot slate
mild dirge
#

Could you try simply replacing map with imap ?

#

This returns an iterator instead of a copy of the chunk from what it seems

#

I've also read a post on SO, and some guy explains that when the worker processes are forked (as on Linux), memory is shared copy-on-write, so unless you modify the data itself, pool.map shouldn't make a copy for every child process.
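A rough sketch of the copy-on-write idea, assuming a platform where the `fork` start method is available (Linux, for example). The names `_DATASET`, `train_one` and `run_grid` are hypothetical, and `train_one` is only a stand-in for a real training run. The dataset is stored in a module-level variable before the pool is created, so forked workers inherit it instead of receiving a pickled copy per task.

```python
import multiprocessing as mp
import numpy as np

_DATASET = None  # set once in the parent, inherited by forked workers

def train_one(learning_rate):
    """Hypothetical stand-in for a training run; it only reads _DATASET."""
    return learning_rate * float(_DATASET.sum())

def run_grid(dataset, learning_rates, processes=2):
    """Run train_one for each learning rate in a fork-based pool."""
    global _DATASET
    _DATASET = dataset              # must happen before the pool is created
    ctx = mp.get_context("fork")    # assumption: the platform supports fork
    with ctx.Pool(processes=processes) as pool:
        # Only the small learning_rate values are pickled and sent to workers.
        return pool.map(train_one, learning_rates)

if __name__ == "__main__":
    data = np.ones((1000, 10))
    print(run_grid(data, [0.1, 0.01]))
```

If memory still grows with this layout, the per-process cost is probably coming from the models or intermediate arrays rather than from the dataset itself, and lowering `processes` is the simplest lever.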

mild dirge
fallow frost
#

pls someone show him how to format the code in Discord

mild dirge
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

hollow citrus
#

Is it possible to get my code reviewed here? I have a bit of confusion in NLP with GloVe.

supple terrace
#

!code

hollow citrus
#

can I just share repo link?

#

it keeps doing random guesses at the sentiments

#

so 50% train acc and ~60% train loss

hot slate
neon obsidian
#

Hello, I am trying to convert a txt file into a csv file, using pandas, but having some trouble

#

Does anyone know much about writing csv files? I have more detail in #1070706066523951156 , or you could ask me here ... thanks in advance 🌞

tacit galleon
#

Hi everyone.
I'm working on a university project for image recognition. So the idea is if a take a photo of a building I can locate the building, so I want to train a network to recognize the buildings from my university and I was testing mobileNet (https://keras.io/api/applications/mobilenet/) but I think I'm not doing it right. The learning curves are weird and I don't know what can I do to improve my model, so I'm open to suggestions if anyone can help me

#

This are the curves from the training process

hardy kernel
#

Sorry for interrupting the ongoing conversation but what's the most popular/intuitive NLP library? Pypi.org doesn't let me sort based on popularity

hard rover
#

df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=True)
is this right? it's not doing what I want to do

#

so , I have one column called STRIKE , and I want rows which have values between 17000-19000 but in return it gives me a row which has STRIKE = 20300

#

some stupid logical error IG

fiery dust
#

does someone find this video good? https://www.youtube.com/watch?v=pqNCD_5r0IU, the guy just codes and literally narrates what he is coding, he never explains the functionality of each function or even why he is doing each thing. I'm finding it really hard to learn scikit like this. What can you guys recommend me? Thanks

hollow citrus
upbeat dagger
#

Anyone know of a good tool to use to for speech recognition? I.E. A model that can tell that it's a specific person's voice?

celest vine
hard rover
hard rover
hollow citrus
coral trout
#

what do i type to get between the numbers
like
if the temperature is between 20-30 i want it to say "its sunny"
between 10-20 "its a nice day"

hard rover
#

there is some logical error and I can't figure it out, maybe im using drop wrong, I can't tell

charred light
hard rover
charred light
hard rover
#

i will try again

charred light
# hard rover this is my main problem

Also this would work too. In this case, I'm assuming between includes the edges. You can adjust if you don't want the edges.
df = df[(df['STRIKE'] >= 17000) & (df['STRIKE'] <= 19000)]

#

Although above statement would require you to rerun your original code where you read in your dataset.
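A small self-contained sketch of that filter on made-up STRIKE values. It also shows `Series.between`, which is inclusive on both ends by default, and why the condition from the original question never matches anything.

```python
import pandas as pd

df = pd.DataFrame({"STRIKE": [16500, 17000, 18250, 19000, 20300]})

# Boolean mask: keep rows whose STRIKE lies in [17000, 19000].
in_range = (df["STRIKE"] >= 17000) & (df["STRIKE"] <= 19000)
kept = df[in_range].reset_index(drop=True)

# Series.between is inclusive on both ends by default, so this is equivalent.
kept_between = df[df["STRIKE"].between(17000, 19000)].reset_index(drop=True)

# The condition from the question can never be true: no number is both
# below 17000 and above 19000, so nothing is selected and nothing is dropped.
never_true = (df["STRIKE"] < 17000) & (df["STRIKE"] > 19000)
```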

proud solstice
#

So I just learned python like two weeks and I'm trying to evoke Artificial intelligence as far as I know
Basically I'm trying to make a code that understands a topic or a scripted text, like I'm making a bot understand what's being said to it

For that.....
I made a list of adjectives as they usually add nothing to the conversation's theme in English and then I made the code remove whatever adjective comes on its way, good

Then I'll probably make another list with keywords which usually indicate a certain topic like "the" "is" "on"......... etc.

And I'll make a greater list with all the weird words that indicate a certain topic to determine what's the text input is about, how'd you rate this stupid idea as a star?

proud solstice
#

Why I imported random? I really don't remember lol

umbral charm
#

In the library matplotlib, is there a way to put the y axis at x = 0 instead of at the start, just so it looks nice when my graph is symmetric about x = 0

umbral charm
#

Like i moved the spine from the right to the center

#

but how would i get it to display exactly at 0

charred light
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

charred light
umbral charm
#
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()
#

swear that use to colour code it

#

!paste fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

charred light
#

That's fine. You need to add {py} after the ```

umbral charm
#
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()
#

💀

#
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()
charred light
#
import numpy as np
#

oh, you need the dot

umbral charm
#

Oh do u want the full code?

charred light
#

No, that's fine

umbral charm
#

Ive just sent the plot the other is just functions imports and unpacking

umbral charm
charred light
#

It shows up fine to me.

umbral charm
#

ax.spines['left'].set_position('zero') instead of 'center'
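For reference, a headless sketch of that fix with made-up data that is symmetric about x = 0. As far as I know, `'zero'` is shorthand for the tuple form `('data', 0)`, while `'center'` means the middle of the axes regardless of the data values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10, 10, 201)   # made-up data, symmetric about x = 0
y = 1.0 / (1.0 + x**2)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.spines["left"].set_position("zero")   # draw the y axis at data value x = 0
ax.spines["top"].set_visible(False)      # hide the unused spines
ax.spines["right"].set_visible(False)
```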

umbral charm
charred light
umbral charm
#

i just put 'zero' in it but i didnt even know i could

charred light
umbral charm
#

i cant find the docs on that set_position

charred light
#

#ReadTheDocs

umbral charm
#

i did

#

it said nothing useful

charred light
umbral charm
#

na

#

wtf

#

THank you tho :O

charred light
charred light
plush jungle
#

I'm training a q learning reinforcement agent and I'm trying to start really simple, so the game is just a stationary agent (blue) trying to aim a laser at a stationary target (red), by rotating clockwise or counterclockwise

#

it gets a reward whenever the laser is on the target and gets a slight negative reward whenever it's not on the target

#

the state space is {0,360}, with each state being a scalar representing a different angle the laser could be at

#

in theory, the q learning algorithm should generate a q value for each angle, learning to highly rate the couple of angles that are on target and rate all the other ones with a low q value

#

the problem is that all the q values are changing together for some reason

#

these are visualizations of each angle's q value as predicted by the neural net

#

red means high and blue means low

#

purple means close to the center of the distribution

#

what it should look like is the angles that are pointed at the target should get more and more red and everything else should get more and more blue

hard rover
charred light
river sapphire
plush jungle
#

by reward shaping do you mean making the reward continuous? so it gets higher the closer to the target the laser is?

river sapphire
#

yes I mean making some sort of function

river sapphire
#

doesn't have to necessarily be just continuous

#

could be a combination of continuous and discrete rewards

#

in this case what I would do is give it a reward based on some sort of formula if it chose an action that points it closer to the target and a punishment if it points away

#

and it gets more reward if it chooses an action that points it towards the target and it's already close to pointing towards the target

#

basically creates a gradient
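One possible shape for such a reward, as a sketch only. The function names, the 0.1 scale factor, the 2 degree tolerance and the bonus value are all made-up assumptions, not anything from the actual project.

```python
def angular_error(angle_deg, target_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = abs(angle_deg - target_deg) % 360
    return min(diff, 360 - diff)

def shaped_reward(prev_angle, new_angle, target_angle,
                  on_target_tol=2.0, hit_bonus=1.0):
    """Hypothetical shaped reward for the laser-aiming task.

    Positive when the action reduced the angular error, negative when it
    increased it, with a larger magnitude the closer the laser already is.
    A discrete bonus is added when the laser is on the target.
    """
    prev_err = angular_error(prev_angle, target_angle)
    new_err = angular_error(new_angle, target_angle)
    closeness = 1.0 - new_err / 180.0   # 1 at the target, 0 when pointing away
    progress = prev_err - new_err       # > 0 when the action moved closer
    reward = 0.1 * progress * closeness
    if new_err <= on_target_tol:
        reward += hit_bonus             # discrete part of the reward
    return reward
```

This combines a continuous term with a discrete bonus, so every state gives some learning signal instead of only the few angles that hit the target.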

plush jungle
#

and all of the q values still change together as one

river sapphire
#

wdym by that?

#

you're using a neural network right

#

each time the neural network updates, the q-values for all states should update as well

plush jungle
#

yes, and I've tried a bunch of different layer/neuron combinations

river sapphire
#

unless it updates some weight in the last layer or something

#

no wait even so it should backpropagate and update a large majority of the weights
just realized I remembered how backpropagation works wrong lol it should theoretically update all the weights unless you have an issue like dying relu

#

is your issue that it's predicting the same q-value for all states?

plush jungle
#

not the same no

river sapphire
#

what's the issue then

plush jungle
#

but I'm pretty sure the distribution is the same

#

I was trying to normalize the values between 0 and 1 so I could visualize the q values with color

#

and I was doing it like this

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
#

and all the the values were the same after normalization

river sapphire
#

all the q-values were the same?

plush jungle
#

yes

#

either 0 or 1

river sapphire
#

do you have a reward graph or a loss graph

plush jungle
#

no, but I have the numbers, I just haven't graphed them

river sapphire
#

the graph helps a lot imo

#

can you send the code idk if I can help because this seems like a bug in the implementation but I just wanna see

plush jungle
#

the reward stuff is at the bottom of top_down.py

river sapphire
#

shoot really gotta learn python lol

#

understood about 10% of it

charred light
hard rover
#

i had one, df.drop and if i remove it, i still get the same error

charred light
river sapphire
hard rover
#

wanna see the code?

charred light
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hard rover
charred light
hard rover
#

if u open the sheet you can still see all those columns in it

charred light
#

You mean the url?

hard rover
charred light
hard rover
#

i mean, your code doesn't have it, i just did it

charred light
#

Paste a version w/ your changes then.

hard rover
#

if u r comfortable hop on VC , I'll explain
totally understandable if you don't, I still appreciate it.

charred light
# hard rover https://paste.pythondiscord.com/avezediqat

I don't see any errors coming from the code itself.

Try this and let me know what prints out:

df = pd.DataFrame(l_OC)
df.columns = OC_col
print(len(df))
print(df.columns)
print(len(df.columns))
#df1=df.drop(columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK'])
df1=df.drop(['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK'], axis = 1)
print(df1.columns)
print(len(df1.columns))
marks_data = pd.DataFrame(df1)
print(len(df1))
file_name = 'OI.xlsx'
marks_data.to_excel(file_name)
print("done")

hard rover
#

where can I learn this? so that I don't have ask stupid questions?

#

i mean what should I search? i tried searching web scaping using python

charred light
#

Technically, df1=df.drop(columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK']) this should work, although it's not preferred method.

hard rover
#

or just learn pandas completely , cuz I need graphical representation as well , will have to compare daily end of the day sheets

charred light
#

Yea, I'm not sure why exactly it didn't.

hard rover
#

thanks tho, i really appreciate it, if only I could be useful to you as well

charred light
#

Then generally, I just click through all the stack overflow links, and skim to see if the original problem is similar to the problem at hand. If so, take a quick look at the solution to see if it'll work or it's over complex/too niche.

hard rover
charred light
hard rover
#

it doesn't gives the same error

charred light
hard rover
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hard rover
charred light
#

No, the error not the code.

hard rover
#

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10504\1824382283.py in <module>
5 headers={'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36'}
6 response = requests.get(url, headers=headers, timeout=10)
----> 7 json_object = response.json()
8 expiry_date = '09-Feb-2023'
9 def set_decimal(x):

~\anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
976
977 @tribal fog

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

charred light
#

The JSON decodeError is basically saying you have something invalid from your scraping.

hard rover
#

but it was scraping just fine right now

charred light
hard rover
#

how can I know if they banned me?

#

i can access the site in my browser

charred light
hard rover
#

401

#

unauthorized

hard rover
#

sorry

tribal fog
charred light
#

That's actually pretty funny

hard rover
#

yea xD

#

it says 401 for u as well

tribal fog
charred light
#

But yea, 401 means you didn't scrape successfully. And your response is empty.

So your next line of code json_object = response.json() is complaining that your input is essentially empty.
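A small sketch of checking the response before decoding it, so a 401 or an empty body gives a readable message instead of a JSONDecodeError. `parse_json_body` is a hypothetical helper that takes plain values, so it can be tried without any network access.

```python
import json

def parse_json_body(status_code, body_text):
    """Return the decoded JSON body, or raise with a readable message."""
    if status_code != 200:
        raise RuntimeError(f"request failed with HTTP status {status_code}")
    if not body_text.strip():
        raise RuntimeError("response body is empty, nothing to decode")
    try:
        return json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"body is not valid JSON: {exc}") from exc
```

With requests this would be called as `parse_json_body(response.status_code, response.text)`. Calling `response.raise_for_status()` before `response.json()` is another way to surface the 401 early.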

hard rover
#

omg, so tiring I just fixed it... it was working fine
I add one drop command and shit goes south

#

i hope i don't get 403

#

i removed that drop, now everything is working

#

is my jupyter working fine???

charred light
hard rover
#

see now it works fine i even get 200

charred light
hard rover
#

can you teach me, i'll pay you

charred light
#

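Oh, if you're going to set the results to df2, then you can't use inplace. Also the rows to drop are the ones below 17000 or above 19000, so the condition needs | instead of &.
df2=df.drop(df1[ (df1['STRIKE'] <17000) | (df1['STRIKE'] > 19000)].index)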

#

It's either df_new = df.drop(something)
or df.drop(something, inplace=True)
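A short sketch of the difference on a made-up frame (the column values are invented). `drop` without `inplace` returns a new frame, while `inplace=True` changes the frame and returns None.

```python
import pandas as pd

df = pd.DataFrame({"STRIKE": [16500, 18000, 20300], "c_OI": [1, 2, 3]})
out_of_range = df[(df["STRIKE"] < 17000) | (df["STRIKE"] > 19000)].index

# Option 1: drop returns a new frame; the original is left untouched.
df_new = df.drop(out_of_range)

# Option 2: inplace=True modifies the frame and returns None,
# so assigning its result would leave you holding None.
df_copy = df.copy()
result = df_copy.drop(out_of_range, inplace=True)
```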

hard rover
#

why is it giving an error even in comment?

charred light
hard rover
#

remove that line it will work fine

#

no error

charred light
#

Also, there's not really a need for .drop in the first place.
df2 = df1[(df1['STRIKE'] >= 17000) & (df1['STRIKE'] <= 19000)].reset_index(drop=True)

charred light
hard rover
#

oof

#

in stead of wasting time like this, I might as well learn this stuff

#

I am not understanding anything

charred light
hard rover
#

im trying to change someone else's coding with zero knowledge of scraping & pandas

charred light
#

Yea, I can tell lol

hard rover
#

xD

#

I really appreciate your patience with me, if I were u I wouldn't be able tolerate this
you are such a nice person

charred light
#

But just as a warning, most websites will be able to tell that you're scraping and can ban you on the spot. Or, if you access too fast they will ban you too. (Humans can't read 100 web pages in under 5 seconds, but a program can)

hard rover
#

will definitely learn it all, meanwhile can you do me a favor??
i only want ;
'c_OI', 'c_CHNG_IN_OI', 'c_VOLUME', 'c_IV', 'c_LTP', 'c_CHNG', 'STRIKE', 'p_LTP', 'p_IV', 'p_VOLUME', 'p_CHNG_IN_OI', 'p_OI'
can you do it?

hard rover
charred light
charred light
#

There's a save option

hard rover
#

oh

#

from the API link? im using the code

charred light
#

Yea, it bypasses the 401 errors from the requests.

hard rover
charred light
#

Yea, I only added comments

hard rover
#

oh I see ,thanks
will get back to you after learning pandas & lil bit of scraping or maybe getting data from local json.

charred light
#

But, if you rerun the same code again without restarting your notebook, you will run into the JSON issue you had earlier.

#

This only occurs if I rerun the cell without restarting my notebook.

hard rover
#

too much to learn, kinda overwhelming

#

thanks for everything , it's already 6:30 AM here, lost sleep due to it
gotta go to work soon
bye

tropic matrix
#

is there a place that lets me create such images, or do I have to handmake them in like photoshop

lapis sequoia
#

hello, someone can help me with a problem that i have using library scikit learn, numpy and pandas?

drowsy timber
#

do you guys have a rule of thumb when deciding what dimensionality reduction technique to go for?

I have this dataset thats just a list of subjects in a school. So rows are students, and columns are subjects then its just denoted with either 1 or 0 if the student is taking that course in the current semester.

celest vine
#

LabelEncoder Or one hot encoding for categorical features?
Which should I do?

sturdy breach
#

i have a q

#

how do i print specific val from pandas?

#

i tried it but it just prints .... at the end

celest vine
sturdy breach
#

i selected value from df using [][] and it just prints something and in the middle of it it just stops and prints ...

#

df[df["Počet tlačítek"]>0].nlargest(1,"Počet tlačítek")["url"]

#

".to_string()" does not help

fringe bay
#

hey guys
I have a list of gps coordinates. They are plotted in this image.
The coordinates are from the route of my place to a shop in my town.
Sometimes there are lots of points in the same relative small location, sometimes points are missing for a longer distance.
I'm looking for a way to draw spline or something similar from start point to endpoint.
Ultimately I would need to be able to split the spline into X parts, and get those coordinates.
I was looking at scipy with splrep, but I was getting an error "Error on input data", apparently the coordinates have to be sorted and unique. Not sure if that would work in my case though.
What do you guys suggest?

wooden sail
# fringe bay hey guys I have a list of gps coordinates. They are plotted in this image. The c...

so, the issue with splines is exactly as you said. you specify the endpoints that each spline exactly passes through, so the points have to be sorted and unique. these are easy problems to solve tbh, but there are alternatives. one of them is to use some form of fitting function instead. however, these do not pass through the data points in general, only "close" to them. they also don't join nicely to each other, which is the main feature of splines: you specify continuity up to some degree of derivative

#

so my answer would be "pick your poison", because there is no nice way of doing this with all the properties you might want 😛

fringe bay
wooden sail
#

i would decide based on how important accuracy and smoothness are to you

#

if you need the curve to actually pass through the points, it'll have to be splines and you will need to preprocess the data

fringe bay
#

I need them to be kind of accurate
I like the idea of splines, but my block is here: I have the coordinates which might be forward, backward, forward, backward, and since I need to sort them and make them unique, I don't want to mess it in those situations

#

but I assume that's what you mean by preprocessing it

wooden sail
#

yep

#

you can sort by time stamp if you have that

#

and discard points with the same coords if they have consecutive time stamps

fringe bay
#

if I have for eg a consecutive longitude of 45.4, let's say I have 3 consecutive ones, I keep only one

#

I would need to remove the latitude from the other list as well, from the same positions of those 2 removed from longitude

#

I'm just thinking out loud, lol, please correct me if I'm wrong

wooden sail
#

that sounds right

#

but only if the latitudes are also repeated

fringe bay
wooden sail
#

you either remove both or neither

#

but a point is only equal to another if both lat and long are the same

fringe bay
#

I agree

wooden sail
#

ah, i'm dumb i see what you mean. you wanna avoid a multivalued point

#

yes, you're right

fringe bay
#

no worries, this is the part that was unclear to me too

wooden sail
#

so yeah, if an "x coordinate" is repeated in consecutive points, you have to pick one. keeping the first is the easiest
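A plain-Python sketch of that preprocessing step, assuming the points are already in travel order (for example sorted by time stamp). `drop_consecutive_repeats` and the sample track are made up for illustration.

```python
def drop_consecutive_repeats(points):
    """Keep the first of each run of consecutive points sharing the same x.

    `points` is a list of (x, y) tuples already in travel order
    (for example sorted by time stamp).
    """
    cleaned = []
    for x, y in points:
        if cleaned and cleaned[-1][0] == x:
            continue  # same x as the previous kept point: skip it
        cleaned.append((x, y))
    return cleaned

track = [(45.4, 9.10), (45.4, 9.10), (45.4, 9.12), (45.5, 9.15), (45.4, 9.20)]
cleaned = drop_consecutive_repeats(track)
```

Note that this only removes consecutive repeats. A route that goes forward and then back can still revisit an earlier x value later on, so the x values are not guaranteed to be strictly increasing afterwards.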

#

tensorflow and other tools like it (e.g. pytorch) are necessary in machine learning because of a handful of reasons:

  • they construct a computation graph
  • that graph allows the computation to be optimized, making it faster and use less memory
  • it also allows automatic differentiation, so it can do most of the math for you
  • they use extremely efficient math libraries
  • you can use accelerating hardware without changing much of your code at all
  • they make parallelization over devices very easy

you can do most of this stuff by hand too, but the libs make it automatic. it's not realistic to do these things yourself for large scale models, but it's fine for small ones

fringe bay
#

let's say this might work, this gets me to the 2nd question
once I have the spline, how can split it into let's say 30 equal points / distances and get the points ?

wooden sail
#

what the splines give you is the coefficients of polynomials

#

once you have something of the form f(x) = a_nx^n + ... + a_0 x^0, you can evaluate this at whatever value of x you want
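A tiny sketch of that evaluation step with NumPy, using made-up coefficients for a single polynomial piece. The 30 sample points here are evenly spaced in x, which is not the same as evenly spaced along the length of the curve.

```python
import numpy as np

# Hypothetical coefficients a_n ... a_0, highest power first: 2x^2 - 3x + 1
coeffs = [2.0, -3.0, 1.0]

# 30 evenly spaced x values between the start and end of the interval
x_new = np.linspace(0.0, 4.0, 30)
y_new = np.polyval(coeffs, x_new)
```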

fringe bay
#

that makes sense
appreciate it @wooden sail !

steady basalt
#

i just saw some guy build a website in 15 minutes using chat gpt.....

#

tf does this mean for alot of coding jobs

#

barrier to entry falls, salary plummets

scarlet zealot
#

Hello. What statistical tests can we use to find the relationship of one stock to another? For example, if I have a Stock A (say Apple) and a Stock B (say Microsoft), I’m looking to test if the movement of Apple has any impact of the movement on Microsoft.

#

I was just thinking a Pearson correlation between the change in daily / weekly or monthly prices between the 2 stocks.
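A minimal sketch of that idea with NumPy on made-up closing prices. It correlates relative daily changes rather than raw price levels. A same-day correlation like this only measures how the two series move together, it says nothing about which one affects the other.

```python
import numpy as np

# Made-up daily closing prices for two stocks
prices_a = np.array([100.0, 101.0, 103.0, 102.0, 105.0, 107.0])
prices_b = np.array([50.0, 50.4, 51.5, 51.0, 52.4, 53.6])

# Correlate relative price changes rather than raw price levels
returns_a = np.diff(prices_a) / prices_a[:-1]
returns_b = np.diff(prices_b) / prices_b[:-1]

# corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
pearson_r = np.corrcoef(returns_a, returns_b)[0, 1]
```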

eternal hull
#

Hi. How do I learn to do eda

eternal hull
#

I have to build a retention model can anyone help me

#

I don't know how to begin

celest vine
#

Today I developed my first Machine learning model and got an accuracy of 90%. Am I a data scientist now?

wooden sail
#

yes, you will receive your degree by mail in the next two weeks.

hoary wigeon
#

Hi guys, I need help in time series..

In my problem there is a trend which repeats after every quarter. There's peak on first day of quarter and decreases as keep moving close to quarter end.

But, after every year the height of peak keeps decreasing in the actual data. But, I'm not able to catch that in my model. So, can anyone say me what could be wrong in modeling?

celest vine
versed gulch
#

does anyone know how to extract a smaller random 3D array from a larger 3D array (in an image sense)
i.e. extracting a (242, 256, 256) array from a (242, 512, 512) image?

wooden sail
#

you can generate index arrays and use them to index the large image. the catch is that they need the right shape!

#

i.e., bigarray[:, v[:, np.newaxis], u[np.newaxis, :]]

#

generate v and u however you like, e.g. by doing a random permutation of arange(512) followed by [0:252]

versed gulch
#

im not sure about the 2nd two indexes,of what they mean

wooden sail
#

hmm?

versed gulch
#

with v and u

wooden sail
#

they are arrays of indices, where i used np.newaxis to add extra dimensions

#

you can use reshape if you prefer

versed gulch
wooden sail
#

that's a special case of the solution i gave you

#

u and v are just np.arange(start, stop)[...] in that case

#

you originally said a random 3d array, which is why i wrote the example that way

versed gulch
wooden sail
#

in your case, it only makes sense to pick random values from 0 to 252, and the slice goes up to that value plus 252

wooden sail
#

because you said you want the images to have size 252

versed gulch
#

242

#

but okay may be mistake on both sides

wooden sail
#

ah, 242 then lol, i misread

versed gulch
#

242 is the number of slices and 256 are the height and width

wooden sail
#

ah, then i meant 256, not 242

versed gulch
#

so i want the number of slices fixed but changing the height and width accordingly to the sub block extracted

wooden sail
#

the 242 doesn't matter here since you're selecting from all 242 slices, that'll just be a :

#

hmm i'm not sure i get your definition of subblock, that's not a well-defined term

queen cradle
# fringe bay hey guys I have a list of gps coordinates. They are plotted in this image. The c...

You need to distinguish between "interpolation splines" and "smoothing splines". The former are easier to implement and are widely available (e.g., in SciPy). Interpolation splines are guaranteed to pass through your data points. Smoothing splines are not guaranteed to pass through your data points; their use is to smooth out noisy data. Generally, in applications where there's noise involved, smoothing splines will give better results than interpolation splines. However, if the amount of noise is pretty low, then they're not necessary. They might be useful for you if there is too much noise in your samples and interpolation splines give you bad results.

Also, to avoid issues with duplication, I suggest making your data points 3-dimensional: (Lat, long, time). Make your splines with all three; if you only care about (lat, long), then just drop time after making the splines.
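A hedged sketch of the difference, assuming SciPy's `splprep`/`splev` and a made-up noisy track (the coordinates, noise level, and smoothing value `s=0.4` are assumptions for illustration). With `s=0` the spline interpolates the points; with `s > 0` it is allowed to miss them in order to smooth. Time is included as a third coordinate, as suggested above, and dropped after evaluation.

```python
import numpy as np
from scipy.interpolate import splprep, splev

rng = np.random.default_rng(0)

# Made-up noisy track in arbitrary units, with a strictly increasing time stamp.
t = np.arange(20, dtype=float)
lat = np.sin(t / 3.0) + rng.normal(0.0, 0.1, t.size)
lon = np.cos(t / 3.0) + rng.normal(0.0, 0.1, t.size)

# Interpolation spline: s=0 forces the curve through every data point.
tck_interp, u = splprep([lat, lon, t], s=0)

# Smoothing spline: s > 0 bounds the total squared deviation from the data.
# The value 0.4 is a guess based on the noise level used above.
tck_smooth, _ = splprep([lat, lon, t], s=0.4)

# Evaluate both at the original parameter values and drop the time component.
lat_i, lon_i, _ = splev(u, tck_interp)
lat_s, lon_s, _ = splev(u, tck_smooth)

print("max deviation, interpolating:", np.max(np.abs(lat_i - lat)))
print("max deviation, smoothing:    ", np.max(np.abs(lat_s - lat)))
```

Because time always increases, two samples at the same (lat, long) are no longer duplicate points in the three-dimensional input.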

wooden sail
#

do you want a block of size 242 x 256 x 256, or a block whose shape depends on the index

versed gulch
wooden sail
#

yes, but what size

#

earlier you gave a fixed size, but now you said the height and width change

versed gulch
wooden sail
#

ok, so the size is fixed

versed gulch
wooden sail
#

all right

#

so you can draw the indices as random integers between 0 and 512 - 256

versed gulch
wooden sail
#

fixed h and w for all slices

#

we can just formulate it in terms of h and w

versed gulch
wooden sail
#

as starting points, yes

#

but you need to define a slice

#

if you pick a value larger than 252 as the starting slice, you'll get index out of bounds

queen cradle
# scarlet zealot Hello. What statistical tests can we use to find the relationship of one stock t...

This is a question of determining the covariance of the two stock prices. It's been extensively studied in mathematical finance. You can find plenty of information if you Google "stock covariance". If you want more sophisticated techniques, there is a series of really good papers by Ledoit and Wolf on shrinkage estimation of covariance matrices and their applications to finance. (Ledoit's website will come up if you Google him, and all the papers are there.)

wooden sail
#

picking 252 gives you the lower right quadrant of the image

versed gulch
wooden sail
#

256*, i keep mixing it up

#

lemme write it more generally

versed gulch
#

unless we also do random values of 1 and -1 for the lower half

wooden sail
#

!e

import numpy as np

x = np.arange(75).reshape(3,5,5)
print(x)

h = 2 #desired height
w = 3 #desired width

row = np.random.randint(low=0, high=x.shape[1]-h+1) #pick a random row
col = np.random.randint(low=0, high=x.shape[2]-w+1) #pick a random col

y = x[:, row:row+h, col:col+w]
print("")
print(y)
arctic wedge BOT
#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[[ 0  1  2  3  4]
002 |   [ 5  6  7  8  9]
003 |   [10 11 12 13 14]
004 |   [15 16 17 18 19]
005 |   [20 21 22 23 24]]
006 | 
007 |  [[25 26 27 28 29]
008 |   [30 31 32 33 34]
009 |   [35 36 37 38 39]
010 |   [40 41 42 43 44]
011 |   [45 46 47 48 49]]
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/anatokohiv.txt?noredirect

wooden sail
#

this does what you want in general for any h and w. just replace x with whatever 3D data you have

#

careful with the indices, i may or may not be off by 1. i think this is right though, running it several times hasn't given any errors and does seem to go all the way to the end of the array

queen cradle
# hoary wigeon Hi guys, I need help in time series.. In my problem there is a trend which repe...

Here's an idea; I don't know if it will work. There's a pretty obvious downward trend among the peaks; but the troughs are all about the same height of zero. So maybe a good model would have some kind of steadily decreasing multiplicative factor. And maybe that suggests that you might have some luck fitting the logarithm of the data. (or probably log(1 + y) if y is your data). Like I said, I don't know if it will work, but it might be worth a try.
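A minimal sketch of the transform suggested above, assuming NumPy and a synthetic series (the 90-day quarter, the decay rates, and the names `shape` and `amplitude` are made up for illustration). It only shows the forward and inverse transform; whether a model fits better on the transformed data is something to check on the real series.

```python
import numpy as np

days_per_quarter = 90  # simplified quarter length (assumption)
n_years = 3

# Position within the quarter and the year index for every day.
day_in_quarter = np.tile(np.arange(days_per_quarter), 4 * n_years)
year = np.repeat(np.arange(n_years), 4 * days_per_quarter)

# Made-up series: a peak on the first day of each quarter that decays
# towards zero, with the peak height shrinking by 30% every year.
shape = np.exp(-day_in_quarter / 15.0)
amplitude = 100.0 * 0.7 ** year
y = amplitude * shape

# Fit the model on z instead of y; log1p is safe for values at or near zero.
z = np.log1p(y)

# After forecasting on the log scale, map predictions back with expm1.
y_back = np.expm1(z)
```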

wooden sail
versed gulch
versed gulch
wooden sail
#

let's see. your array has size 512, and we want a window of size 256. 512 - 256 is 256, and the +1 leaves us at 257. then randint generates a number between 0 and 257 exclusive, so in the interval [0,256]. in the worst case scenario, we draw 256. then the slice is 256:256+256 = 256:512. but the upper limit of the slice is also exclusive, so that gives the indices from 256 to 511

#

i think it should be ok, or?
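The bounds reasoning above can be checked directly. This is a small sketch assuming NumPy, with only 2 slices instead of 242 to keep the array light. One detail worth knowing: a slice that runs past the end of the array does not raise an error in NumPy, it silently returns a smaller block.

```python
import numpy as np

image = np.zeros((2, 512, 512), dtype=np.uint8)  # 2 slices instead of 242
h = w = 256

# Largest valid starting index: 512 - 256 = 256.
max_start = image.shape[1] - h

# randint's upper bound is exclusive, so high=max_start + 1 yields 0..256.
starts = np.random.randint(low=0, high=max_start + 1, size=1000)
assert starts.min() >= 0 and starts.max() <= max_start

# Both extreme starting points give a full-size window.
for start in (0, max_start):
    crop = image[:, start:start + h, start:start + w]
    assert crop.shape == (2, h, w)

# One step too far: no exception, but the block comes back one row short.
too_far = image[:, max_start + 1:max_start + 1 + h, :w]
print(too_far.shape)  # (2, 255, 256)
```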

lapis sequoia
#

does anyone know how to perform a fuzzy match with a 90% threshold using pandas?
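One possible sketch, assuming pandas 1.2 or newer (for `how="cross"`) and using the standard library's `difflib.SequenceMatcher` as the similarity measure. That choice is an assumption; dedicated fuzzy-matching libraries score strings differently, so a "90%" there may not mean the same thing. The frames and the `similarity` helper are made up for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Made-up example frames.
left = pd.DataFrame({"name": ["Apple Inc.", "Microsoft Corp", "Alphabet"]})
right = pd.DataFrame({"name": ["Apple Inc", "Microsoft Corp.", "Amazon"]})

# Compare every left name with every right name.
pairs = left.merge(right, how="cross", suffixes=("_left", "_right"))
pairs["score"] = [
    similarity(a, b) for a, b in zip(pairs["name_left"], pairs["name_right"])
]

# Keep only pairs at or above the 90% threshold.
matches = pairs[pairs["score"] >= 0.90]
print(matches)
```

The cross join compares all pairs, so it grows as rows(left) x rows(right); for large tables some form of blocking (for example, only comparing names with the same first letter) is usually needed.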

prime hearth
#

Hello, I have a dataset with 700 features. I did PCA and got around 428 components, but I don't think I can interpret 428 of them

#

what's a good number of components to have with PCA? Like should I have 2 up to 10 so it's easily interpretable?

#

actually now that I think about it, my features are uncorrelated: I'm doing NLP where most of the features are individual words

wooden sail
#

what do you mean by "interpret"? if your data is inherently high dimensional, there's not much you can do about it

prime hearth
#

@wooden sail oh okay, and yeah i mean like to know what i should label those new components or features made by the PCA

#

but i just realized my data has no correlation between variables

#

so i dont think PCA will help much

vast lintel
#

If I wanted to use a neural network to take relatively simple user input (say, describes the results of some analysis, perhaps it describes the parameters of a simulation and how some variables relate to one another) and segments the user input into categories on which it expands on for you (with relatively meaning word filler), would I use something like a transformer to do that?

I have been looking at something like BERT but I have no experience with this type of thing.

fringe bay
#

@wooden sail I managed to get the spline working with your help and some from stackoverflow
Now that I have the spline, discarding the initial coordinates which generated the line, it would look like the 2nd pic
based on the code snippet how would you get the 9 points from the spline with same distance between them (same distance in terms of distance as it would be visible with the eye on the plot) pic3 [I tried lol] ?

wooden sail
#

there should be a way to get the polynomial coefficients from the spline function you used

fringe bay
#

that would have something to do with tckp, u or xnew,ynew,znew ?

wooden sail
#

ah, on the line you do xnew, etc

#

change the linspace

#

set it to 9 steps and you're done

#

you have 400 atm

#

also plot with "*" instead of "r" or "bo"

fringe bay
#

made changes to linspace from 400 to 9 and changed plot

#

this looks a bit strange though, the initial one looks good
the thing is, now that I have the exact spline, I just need to grab the points, don't need to change the spline anymore

wooden sail
#

use * instead of r on the second plotting line

#

you can keep bo for the original points

fringe bay
#

there are a couple new ones, but not really filling the gaps

wooden sail
#

you're getting there, but sadly i have to get going. i'm sure someone else will pick it up where we left off
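One way to pick points that are evenly spaced along the curve as drawn, sketched under assumptions: it uses only NumPy, works in 2D, and takes a densely sampled curve as input. The arrays `xnew` and `ynew` here are a made-up half circle standing in for the dense spline samples, and `resample_equidistant` is a hypothetical helper, not part of SciPy. The idea is to measure the cumulative length along the dense samples and then interpolate at equally spaced lengths.

```python
import numpy as np


def resample_equidistant(x, y, n_points):
    """Return n_points on the polyline (x, y), equally spaced by arc length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Length of each small segment between consecutive dense samples.
    seg = np.hypot(np.diff(x), np.diff(y))
    # Cumulative distance travelled along the curve, starting at 0.
    dist = np.concatenate(([0.0], np.cumsum(seg)))
    # Equally spaced target distances from the start to the end of the curve.
    targets = np.linspace(0.0, dist[-1], n_points)
    return np.interp(targets, dist, x), np.interp(targets, dist, y)


# Stand-in for the dense spline samples: a half circle whose samples are
# bunched up near the start, so plain subsampling would be uneven.
theta = np.pi * np.linspace(0.0, 1.0, 400) ** 2
xnew, ynew = np.cos(theta), np.sin(theta)

px, py = resample_equidistant(xnew, ynew, 9)
gaps = np.hypot(np.diff(px), np.diff(py))
print(gaps)  # roughly equal distances between neighbouring points
```

If the plot axes use different scales, the points will only look evenly spaced after the coordinates are scaled the same way as the axes.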

fringe bay
lapis sequoia
#

does anyone know how to perform a fuzzy match with a 90% threshold using pandas?

prime hearth
#

hello, I would like to ask: should it be a red flag if my training accuracy for KNN is 75% and validation is around the same?

I do expect some drop in accuracy, like for SVM I got 0.9 for training but 0.73 for testing.

charred light
prime hearth
#

i did train_test_split and thats what i got, is this what you mean? sorry

#

also I did cross validation and that's what I got too

charred light
prime hearth
#

oh okay thanks. I'm still a beginner with NLP, and I noticed my model performs really well when I have 1200 features (where each feature is an individual word)

#

is this okay? Someone told me as long as the features make sense and im not overfitting

charred light
charred light
prime hearth
#

oh okay thanks. I'm aware that more features can mean less accuracy, and I tried PCA, but PCA doesn't improve the accuracy of my model. The only thing I plan to do now is feature selection: remove features that are not meaningful and check the accuracy again

#

I don't have any more data, I would have to web scrape it. Would that be considered a validation set, or should I just use real-world inputs?

charred light
charred light
prime hearth
#

yeah my data is like mostly from an API and webscraped data

#

okay thanks.

tropic niche
#

I'm trying to parse through data collected from an OCR using pytesseract. The data is pretty simple and comes out quite clean. It is tabular data, and I need an easy and reliable way of classifying the data back into rows and columns. So far I have been taking the x_pos and y_pos for each word, but there is variability in that so I find myself trying to manage that with if statements and what not. I was thinking, is it feasible to use nearest-neighbor to classify the data into rows and then columns?
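A simpler alternative to a nearest-neighbor classifier, offered as a sketch and not as what pytesseract provides: sort the positions along one axis and start a new group whenever the gap to the previous position exceeds a tolerance. The word list, the tolerance of 10 pixels, and the helper `group_by_gap` are all made up for illustration. Grouping columns by x position like this assumes left-aligned cells.

```python
def group_by_gap(values, tol):
    """Label each value; sorted neighbours within tol share the same label."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, idx in zip(order, order[1:]):
        # A jump larger than tol starts a new row or column.
        if values[idx] - values[prev] > tol:
            current += 1
        labels[idx] = current
    return labels


# Made-up OCR output: (text, x_pos, y_pos) with a little jitter.
words = [
    ("Name", 10, 12), ("Qty", 120, 11),
    ("Apple", 11, 40), ("3", 121, 42),
    ("Pear", 9, 71), ("7", 119, 69),
]

rows = group_by_gap([y for _, _, y in words], tol=10)
cols = group_by_gap([x for _, x, _ in words], tol=10)

# Place each word into its (row, column) cell.
table = [[None] * (max(cols) + 1) for _ in range(max(rows) + 1)]
for (text, _, _), r, c in zip(words, rows, cols):
    table[r][c] = text

for line in table:
    print(line)
```

The tolerance should be smaller than the spacing between rows (or columns) but larger than the jitter within one; if those ranges overlap, a clustering method may be the better tool.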

lapis sequoia
#

Has anyone incorporated AI with hacking?

sick fern
#

Hey guys

#

I want to create a program that converts python code to c++ and whatever