prime hearth Jan 29, 2023, 11:02 PM

#

with tfidf when i print it, it shows like a dataframe where the columns name is the word representing that column

#

i am able to get the features from the tfidfvectorizer after fitting my dataset

#

i was thinking, do you think it's a good idea, for any new data, to run tfidfvectorizer on it, check which columns or features exist in my model, drop any that aren't part of it, and then predict with that data?

queen cradle Jan 29, 2023, 11:23 PM

#

To compare your models, you need to pick the one loss function and use it for all your data. That can be mean absolute price change, root mean squared price change, mean relative price change, root mean squared relative price change, or lots of other things. But as long as you pick one loss function and use it consistently, the results are comparable. (R^2 is not helpful for this purpose. It's frequently misinterpreted; in fact, in almost all cases you should just ignore it and use something like mean squared error instead.)
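A minimal sketch of the point above, assuming two hypothetical models and made-up price changes (none of these numbers come from the conversation). It shows that the ranking of models can flip depending on the loss, which is why one loss should be fixed up front and used throughout.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return float(np.mean(np.abs(y_true - y_pred)))


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up price changes and predictions from two hypothetical models.
y_true = np.array([1.0, -2.0, 0.5, 3.0])
pred_a = np.array([0.5, -1.5, 0.0, 2.0])  # small errors everywhere
pred_b = np.array([1.0, -2.0, 0.5, 1.0])  # perfect except one large miss

for name, pred in [("model A", pred_a), ("model B", pred_b)]:
    print(
        name,
        "MAE:", mean_absolute_error(y_true, pred),
        "RMSE:", root_mean_squared_error(y_true, pred),
    )
```

With these numbers model B has the lower MAE while model A has the lower RMSE, so comparing one model's MAE against another's RMSE would say nothing useful.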

serene scaffold Jan 29, 2023, 11:47 PM

#

@prime hearth you can add new columns for words that you don't already account for, but for arrays made with the old vectorizer, I guess you'd have to pad them

prime hearth Jan 29, 2023, 11:51 PM

#

hello, i would like to please ask, im trying to improve the accuracy of my model. my model is trying to classify if a business is good or not based on reviews. My accuracy came to be 0.75 for predicting new data. However my model uses 700 features (each column represents a word) and i noticed my model seems to be overfitting, as the features it is using are not meaningful. Do you think it would be a good idea to extract the most popular features my model is using and then keep only the adjectives or nouns?

stuck shard Jan 30, 2023, 1:14 AM

#

prime hearth hello, i would like to please ask, im trying to improve the accuracy of my model...

so each column of your data is a unique word from the review of the business?

prime hearth Jan 30, 2023, 1:15 AM

#

@stuck shard yes. So for example:

#

"food", "quick", "ugly"

#

those would be the column after performing bag of words or tfidfvectorizer

#

However tfidf and bag of words have a built-in method that shows the top features, so to speak, and I chose the top 100 to see what my model is using. this is what it shows:

['score' 'serving' 'product' 'pork' 'portion' 'portland' 'pretty' 'price'
 'priced' 'pro' 'probably' 'problem' 'professional' 'point' 'pub' 'put'
 'quality' 'question' 'quick' 'quickly' 'quiet' 'quite' 'ramen' 'rating'
 'poor' 'pm' 'ready' 'plus' 'move' 'moved' 'moving' 'much' 'multiple'
 'music' 'must' 'nail' 'name' 'near' 'need' 'needed' 'photo' 'pick'
 'piece' 'pizza' 'place' 'plate' 'pleasant' 'please' 'plenty' 'read'
 'really' 'service' 'sandwich' 'saturday' 'sauce' 'saw' 'say' 'saying'
 'seafood' 'seat' 'seated' 'seating' 'second' 'see' 'seeing' 'seem'
 'seemed' 'seems' 'seen' 'selection' 'sell' 'serve' 'served' 'server'
 'sat' 'salon' 'reason' 'salmon' 'reasonable' 'received' 'recent'
 'recently' 'recommend' 'recommended' 'red' 'regular' 'reservation'
 'restaurant' 'return' 'review' 'rice' 'right' 'road' 'roll' 'room' 'rude'
 'run' 'said' 'salad' 'able']

#

you can see it's not very good. if i use google reviews on this model it could do badly, as the words are fit too closely to my dataset

#

so i was thinking of getting the most meaningful words such as adjectives and nouns maybe

#

what do you think?

stuck shard Jan 30, 2023, 1:18 AM

#

have you tried using stemming or lemmatization to condense words? for example, "recommended" would become "recommend" with a stemmer such as Porter or Lancaster stemmer (available in the NLTK library)

#

and "moved" and "moving" would become "move"

prime hearth Jan 30, 2023, 1:19 AM

#

yes i did do text cleaning-stopwords, lemmatization

stuck shard Jan 30, 2023, 1:19 AM

#

so you would create more room for more unique words in the top 100. additionally, consider using the TextBlob library to do sentiment analysis, such as a polarity measure (-1 being very negative sentiment, +1 being very positive sentiment), as an additional feature in your dataset

prime hearth Jan 30, 2023, 1:20 AM

#

thats a good idea , my dataset also has the rating review (from 0-5 )

#

so do you think i should still do this?

#

i use rating reviews and all the word features in my model

stuck shard Jan 30, 2023, 1:21 AM

#

what is the criteria for "good" versus "bad"? did you label each point as good/bad or is this a dataset you obtained from somewhere?

prime hearth Jan 30, 2023, 1:23 AM

#

oh yeah i have the y-label, another column that says whether the business is bad or not

#

so i have 3 features in original dataset, rating review, the review, and if business is successful or not

#

after performing tfidf or bag of words i get lots of features (like 700) which i use in my model

#

i'm using a classification model like SVM as that's the one with the best accuracy so far, and KNN is close

stuck shard Jan 30, 2023, 1:24 AM

#

have you tried random forest?

prime hearth Jan 30, 2023, 1:28 AM

#

no i havent, you think i should?

stuck shard Jan 30, 2023, 1:32 AM

#

it is probably worth a shot

#

although i think you would have to represent your data as tfidf numpy vectors for random forest to work appropriately

prime hearth Jan 30, 2023, 1:33 AM

#

so random forest wont work with tfidf?

#

oh sorry i see what you mean

stuck shard Jan 30, 2023, 1:33 AM

#

the data has to be the numbers generated by representing your data as tfidf vectors (if i recall correctly)

#

it's been a while since i did text-based classification
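A small sketch of trying a random forest on TF-IDF features, assuming scikit-learn and a handful of made-up reviews and labels (both are placeholders, not the dataset discussed here). scikit-learn's `RandomForestClassifier` accepts the sparse matrix that `TfidfVectorizer` produces, so no manual conversion to dense NumPy arrays should be needed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up reviews; 1 = successful business, 0 = not (illustrative labels only).
reviews = [
    "great food and friendly staff",
    "quick service and great price",
    "friendly staff, pleasant place",
    "rude server and poor food",
    "poor service, rude staff",
    "ugly room and poor quality",
]
labels = [1, 1, 1, 0, 0, 0]

# The pipeline vectorizes the text and feeds the sparse TF-IDF matrix to the forest.
clf = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
clf.fit(reviews, labels)

preds = clf.predict(["quick and friendly service"])
print(preds)
```

Keeping the vectorizer and the classifier in one pipeline also means new text is transformed with the vocabulary learned during training.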

prime hearth Jan 30, 2023, 1:36 AM

#

oh okay thanks il try that, do you think i should try what im doing right now

#

like getting the most popular words in the dataset that are meaningful (adjectives and nouns)

#

and use those features and try doing feature selection again and repeat?

#

i notice my model does well with thousands of features because some words dont exist in other reviews. But i know it not good to have that many features so trying to reduce it

stable sierra Jan 30, 2023, 8:35 AM

#

Hello
I have a problem
TypeError: 'NoneType' object is not iterable
when I'm trying to scrape information from amazon using for loop with requests html library
anyone knows the solution ?

undone fiber Jan 30, 2023, 10:59 AM

#

suggestions with type of ml to use or example.. say i want to try to build an ml program that can diagnose windows errors.. and the dataset would be errors, logs, current config, golden config, common solutions..
where it reads the errors, logs, current config and compares against golden config, and common solutions for those errors.

calm ore Jan 30, 2023, 11:19 AM

#

stable sierra Hello I have a problem TypeError: 'NoneType' object is not iterable when I'm ...

Are you using bs4

lapis sequoia Jan 30, 2023, 11:26 AM

#

stable sierra Hello I have a problem TypeError: 'NoneType' object is not iterable when I'm ...

does not seem like a #data-science-and-ml problem, kindly post in appropriate place. also, I think you gotta give us tiny snippet describing your problem to help us understand it better.

eternal hull Jan 30, 2023, 1:06 PM

#

Anyone has idea how to build retention model

violet monolith Jan 30, 2023, 1:07 PM

#

Hi

lapis sequoia Jan 30, 2023, 1:25 PM

#

hi guys, i would like to create a text-to-image ai, do you have some suggestions for start this project?

lusty sail Jan 30, 2023, 1:38 PM

#

lapis sequoia hi guys, i would like to create a text-to-image ai, do you have some suggestions...

Depending on how good the model should be, training from scratch could be infeasible. Perhaps look at finetuning stable diffusion to your needs https://huggingface.co/docs/diffusers/training/text2image

Stable Diffusion text-to-image fine-tuning

mild dirge Jan 30, 2023, 2:07 PM

#

I just made a neural network that performs regression on some numerical data. The neural network has 2 hidden nodes with tanh activation. And the result of these nodes are simply added up for the final output of the model. The input is about 100 data points with 50 features each.

Now what is interesting is that (after training) these two hidden nodes seem to have mirrored/flipped weights. So they both have weights from each feature to the node, but the weight from feature 1 to node 1 (call it w11) is approx equal to the flipped weight from feature 1 to node 2 -w12. I'm not really sure what this means or how to interpret this result.

wooden sail Jan 30, 2023, 2:11 PM

#

i would start by saying that, with no further context, neural networks and their parameters are in general not interpretable

mild dirge Jan 30, 2023, 2:12 PM

#

Yeah, but 50 weights per node, almost exactly equal but flipped. Seems to be something going on.

wooden sail Jan 30, 2023, 2:12 PM

#

it could be that what you're computing has some symmetry

#

you would expect something like this if what you're computing behaves, e.g., like a distance (minus the negative sign)

tidal bough Jan 30, 2023, 2:13 PM

#

some points of view:

it seems to me that having w12 = -w11 is about equivalent to having them both equal to zero, so I guess the output just doesn't depend on feature1 much
on the other hand, it's not entirely equivalent unless the other weights are also equal, and neural networks can stuff a lot of computation into tiny portions of their activations like that, so it may be meaningful after all.
i don't have much of a grasp on which POV is better here 🥴

mild dirge Jan 30, 2023, 2:14 PM

#

I can show it in a bit, it's on diff computer.

wooden sail Jan 30, 2023, 2:15 PM

#

if you can rewrite it explicitly as matrices and vectors, i can revise what i said. cuz on second thought i'm not sure what you call w11 and w12 in the first place 😛

mild dirge Jan 30, 2023, 2:15 PM

#

The weight from input 1 (so feature 1) to node 1 or 2 respectively for w11 and w12

wooden sail Jan 30, 2023, 2:18 PM

#

so something like y = sum(tanh(MX)), where M is of size 2 x 50 and X is of size 50 x 100? then i'd agree with reptile

#

but "almost exactly equal" and getting "small outputs" is a hard thing to quantify without a proper metric in the output space of that layer

#

and also whether 0 means something in the output space of the layer

#

(that meaning and metric are usually absent)

#

if you wanna corroborate whether it's that the feature does not matter, you can do some sort of sensitivity analysis, e.g. by now freezing the network weights and differentiating w.r.t. the input
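A rough sketch of that sensitivity analysis in plain NumPy, assuming the model really is y = sum(tanh(MX)) with no bias terms, and using random stand-in weights and inputs (the real trained weights are not in this conversation). The gradient with respect to the input has a closed form here, so no autodiff library is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples = 50, 100

# Stand-in weights: second hidden node is roughly the negative of the first.
w = 0.1 * rng.normal(size=n_features)
M = np.vstack([w, -w + 0.01 * rng.normal(size=n_features)])  # shape (2, 50)
X = rng.normal(size=(n_features, n_samples))                 # shape (50, 100)


def model(M, X):
    """y = sum over hidden nodes of tanh(M @ X); one output per sample."""
    return np.tanh(M @ X).sum(axis=0)


def input_gradient(M, X):
    """dy/dx for every sample: M^T (1 - tanh^2(M x)), shape (50, 100)."""
    return M.T @ (1.0 - np.tanh(M @ X) ** 2)


# Mean absolute gradient per feature: larger means the output reacts more
# strongly to that feature, with the weights held fixed.
sensitivity = np.abs(input_gradient(M, X)).mean(axis=1)
print(sensitivity.round(4))
```

Features whose sensitivity stays near zero across the data are candidates for "does not matter"; with exactly mirrored weights and no bias, every entry would be zero.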

mild dirge Jan 30, 2023, 2:24 PM

#

Yeah i'll take a look at the data, maybe there's some interesting pattern

tidal bough Jan 30, 2023, 2:24 PM

#

here's an example where there's two output neurons and weights of a for them are opposing

#

as you can see, when b=0, the output is 0 regardless of a - but when it's not, it does still depend on a.

#

so that may be what it's doing, very roughly.
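The plot referred to above is not preserved in this log. One plausible reading, stated here as an assumption, is a function of the form tanh(a + b) + tanh(-a + b), where `a` is the input that enters the two nodes with opposite weights and `b` is everything else. A quick numeric check of that reading:

```python
import numpy as np


def two_node_output(a, b):
    """Sum of two tanh nodes whose weights on `a` have opposite signs."""
    return np.tanh(a + b) + np.tanh(-a + b)


a = np.linspace(-2.0, 2.0, 5)

# With b = 0 the two nodes cancel for every value of a.
print(two_node_output(a, 0.0))

# With b != 0 the output still changes as a changes.
print(two_node_output(a, 1.0))
```

So opposing weights only cancel completely when the remaining contribution is zero; otherwise the nonlinearity keeps the dependence on `a` alive.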

undone fiber Jan 30, 2023, 2:32 PM

#

undone fiber suggestions with type of ml to use or example.. say i want to try to build an ml...

bump 🙂

leaden cosmos Jan 30, 2023, 2:56 PM

#

Anyone familiar with tensor flow-gpu

prime hearth Jan 30, 2023, 3:12 PM

#

is it valid to transform new text with tfidfvectorizer in order to predict on a model that was trained on a dataset using tfidfvectorizer? My concern is that i would need to drop any features in the new data that don't exist in my model after performing tfidf on the new data
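For what it's worth, the usual scikit-learn pattern avoids the column-dropping step entirely: fit the vectorizer once on the training text, then call `transform` (not `fit_transform`) on new text. A minimal sketch with made-up reviews:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training and new reviews, for illustration only.
train_reviews = [
    "great food and quick service",
    "rude staff and ugly room",
    "quick service, great price",
]
new_reviews = ["great sushi but rude waiter"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_reviews)  # learns the vocabulary

# transform() reuses the training vocabulary: same columns, same order.
# Words never seen during fitting ("sushi", "waiter", "but") are ignored.
X_new = vectorizer.transform(new_reviews)

print(X_train.shape, X_new.shape)
print("sushi" in vectorizer.vocabulary_)
```

Because `X_new` has exactly the columns the model was trained on, it can be passed straight to `model.predict`. Refitting a new vectorizer on the new data would produce a different vocabulary and different IDF weights, which is what creates the mismatch in the first place.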

mild dirge Jan 30, 2023, 3:23 PM

#

!paste

arctic wedgeBOT Jan 30, 2023, 3:23 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

mild dirge Jan 30, 2023, 3:25 PM

#

Made this code to try and visualize the output of my model for just 1 feature for different weight values for the 2 hidden nodes. It's an interactive plot using just numpy and matplotlib. I guess the weights having opposing sign makes it so more complex functions can be approximated, as when they have the same sign, it is basically the same as having just 1 hidden node. Still not sure how this would generalize to more features though.
https://paste.pythondiscord.com/akasotunor.py

#

It's an interactive plot. If you guys (@tidal bough @wooden sail) were still interested ^^

#

tidal bough Jan 30, 2023, 3:32 PM

#

when there's only one feature, it's just tanh(w1 x) + tanh(w2 x)

#

though, hmm, it doesn't really simplify

ripe sapphire Jan 30, 2023, 3:34 PM

#

what does tanh mean?

delicate apex Jan 30, 2023, 3:41 PM

#

hyperbolic tangent. defined as hyperbolic sine over hyperbolic cosine (sinh/cosh), or:

#

.latex $\frac{e^x - e^{-x}}{e^x + e^{-x}}$

strange elbowBOT Jan 30, 2023, 3:41 PM

#

$latex.png$

wooden sail Jan 30, 2023, 3:57 PM

#

mild dirge

doesn't really simplify much, as reptile says. your plot just added in an extra compression/dilation to yield a family of plots like the surface reptile shared, just stretched along both axes depending on x. the next question is whether the sum means anything. you can study the weights assigned to each input parameter and see which ones will have which effect

#

but trying to interpret a network in general is not very meaningful unless the network enforces some sort of prior knowledge

#

is there a special reason to use a sum of hyperbolic tangents instead of just one tanh or more?

mild dirge Jan 30, 2023, 4:00 PM

#

No, it's just some assignment for uni, but they specifically want us to plot the weights, which looks like this

wooden sail Jan 30, 2023, 4:00 PM

#

you can also make a plot of wi1 + wi2 to get a feel for which params do what

mild dirge Jan 30, 2023, 4:00 PM

#

And ig they expect us to say something "useful" about that

wooden sail Jan 30, 2023, 4:00 PM

#

ah, lmao

#

plot the sum too

mild dirge Jan 30, 2023, 4:01 PM

#

#

Left is original, right is sum, Still not too much to say I think 😛

wooden sail Jan 30, 2023, 4:01 PM

#

not quite

#

the abs of the right plot tells you which parameters weigh into the sum

#

though one also has to consider the magnitude of the input, so that alone also doesn't tell the whole story

mild dirge Jan 30, 2023, 4:03 PM

#

Yeah

#

It depends on the sum of the rest too

#

Its not that two opposing weights cancel out

wooden sail Jan 30, 2023, 4:03 PM

#

i would claim 2abs(input_i)/abs(input_i (wi1 + wi2)) is meaningful, but the input cancels out

mild dirge Jan 30, 2023, 4:04 PM

#

wooden sail though one also has to consider the magnitude of the input, so that alone also d...

The inputs are all standardized btw if that's what you mean

wooden sail Jan 30, 2023, 4:04 PM

#

aight

#

if you look at 2 / abs(wi1 + wi2)?

#

wait ugh, the tanh

mild dirge Jan 30, 2023, 4:06 PM

#

2/abs(tanh(w1+w2)) ?

wooden sail Jan 30, 2023, 4:06 PM

#

no

#

2/(abs( tanh(mean_i * wi1) + tanh(mean_i * wi2) ))

mild dirge Jan 30, 2023, 4:07 PM

#

The mean is 0

wooden sail Jan 30, 2023, 4:07 PM

#

byotiful

mild dirge Jan 30, 2023, 4:08 PM

#

Maybe instead of looking for a reason they are opposed, is there a reason that they never have the same sign?

#

Because there is no exception (except for a single weight, but that could be due to too few epochs)

tidal bough Jan 30, 2023, 4:10 PM

#

I wonder if this can be explained by https://transformer-circuits.pub/2022/toy_model/index.html somehow

#

hmm, the tactic it uses to analyze weights is to look at W.T@W

#

wooden sail Jan 30, 2023, 4:11 PM

#

i was about to mention looking at the correlation among the weights, yeah

#

you'd expect this to be a 2x2 near-rank-one matrix in your case, which would mean you asked for too high dimensional a vector space given your data

mild dirge Jan 30, 2023, 4:14 PM

#

#

Or did I have the wrong shape?

wooden sail Jan 30, 2023, 4:14 PM

#

i would have expected a 2x2x50 array

mild dirge Jan 30, 2023, 4:14 PM

#

wooden sail Jan 30, 2023, 4:16 PM

#

at any rate though, you can tell the matrix is highly structured regardless of how you choose to represent it

#

high structure goes hand in hand with low dimensionality

mild dirge Jan 30, 2023, 4:16 PM

#

So high correlation between the weights of the same features means that too many hidden nodes are used or am I misinterpreting?

wooden sail Jan 30, 2023, 4:17 PM

#

that would be my take

#

you can try using a 1 x 50 matrix instead and see if there's any noticeable difference in the training error at the end

mild dirge Jan 30, 2023, 4:18 PM

#

Yeah I could try that. thx for the help btw

wooden sail Jan 30, 2023, 4:23 PM

#

mild dirge

this one can be interpreted as the average correlation between wi1 and wi2, btw. you can see that one column is -1 times the other, so it's rank deficient (or close to)

#

importantly, the off diagonals are about as large as the main diagonals, telling you that one weight explains the other
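A sketch of that check in NumPy, assuming the weights are stored as a 50 x 2 matrix `W` (one column per hidden node) and using synthetic, nearly mirrored weights in place of the real ones. The names and the noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: node 2's weights are almost exactly -1 times node 1's.
w1 = rng.normal(size=50)
w2 = -w1 + 0.05 * rng.normal(size=50)
W = np.column_stack([w1, w2])  # shape (50, 2)

# 2 x 2 Gram matrix: off-diagonals about as large (in magnitude) as the
# diagonals means one column is nearly a multiple of the other.
gram = W.T @ W
print(gram.round(2))

# Correlation between the two weight vectors (close to -1 here).
corr = np.corrcoef(w1, w2)[0, 1]
print(round(corr, 4))

# Singular values: one large and one tiny value signals near rank one.
singular_values = np.linalg.svd(W, compute_uv=False)
print(singular_values.round(3))
```

A very small second singular value is the numerical version of "rank deficient (or close to)".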

mild dirge Jan 30, 2023, 4:24 PM

#

That's pretty good to know. I think in this case it does not give us much more info than what we already know (a high correlation between the weights) right?

#

I think I'm probably just going to keep the discussion of the plot simple, since I don't seem to understand very much why it is doing what it is doing with the opposing weight values.

wooden sail Jan 30, 2023, 4:25 PM

#

high correlation/low rank are indicators of the possibility to exploit a low dimensional subspace, usually

#

the question of "why" is a more difficult one 😛

#

but the way i would put it is like this

mild dirge Jan 30, 2023, 4:26 PM

#

Reducing the hidden nodes by 1 would probably give poor results, since that would make the model a bit too simple

#

So I'm not sure if they expect us to give that conclusion

#

I could still test it out

wooden sail Jan 30, 2023, 4:26 PM

#

idk, i'd be curious 😛

#

i would put it like this. imagine you know your model is a + b = c, but you can only observe c

#

finding a and b requires making some sort of assumption or introducing prior knowledge, in general there's infinitely many solutions

#

there should be a clever transformation one can apply to your model to turn it into this kind of problem. i think? idk

#

if you feel like doing some asymptotics, you can also notice that tanh is odd, and so you have tanh(a) - tanh(a + delta), which looks like a finite difference. we can then ask the question of how close this is to a 1st order taylor series expansion. or also what happens when you expand tanh(a + delta), so that we can see exactly what a - b is doing

#

the first order taylor approx is x i think? which already turns it into a sort of a + b = c problem if you expand around a or b

#

assuming my handwaving isn't too far off the mark, this gives you an alternative interpretation of something like "the parameters cannot be distinguished from each other" in the asymptotic regime
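The hand-waved argument can be written out as a short sketch. It assumes a single input $x$ and weights $w_1$ and $w_2 = -(w_1 + \delta)$, where the symbol $\delta$ for the small mismatch is introduced here only for illustration.

```latex
% tanh is odd, so with a = w_1 x and w_2 = -(w_1 + \delta):
\tanh(w_1 x) + \tanh(w_2 x) = \tanh(a) - \tanh(a + \delta x)

% First-order Taylor expansion around a, using d/da \tanh(a) = 1 - \tanh^2(a):
\tanh(a) - \tanh(a + \delta x) \approx -\delta x \,\bigl(1 - \tanh^2(a)\bigr)

% Near the origin, \tanh(z) \approx z, so the whole model collapses to
\tanh(w_1 x) + \tanh(w_2 x) \approx (w_1 + w_2)\, x = -\delta x
```

In that regime only the sum $w_1 + w_2$ is pinned down by the data, which is the "a + b = c, but you can only observe c" situation: the two weights cannot be told apart individually.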

#

👋

mild dirge Jan 30, 2023, 4:57 PM

#

I will probably discuss some of this stuff with my partner to see what he thinks. But I really appreciate all the input

finite sierra Jan 30, 2023, 5:08 PM

#

not sure if this is the right place to ask but I want to create a mobile app that detects playing cards with a phone camera using some sort of AI, I've already seen a few videos of people training a model to be able to do that but I want to go through that process myself.. I don't know what course to start with because there are different libs and idk which one I should be using

#

basically I want to know which python libraries I should be learning (with the math that comes with it) or course recommendations...

serene scaffold Jan 30, 2023, 5:33 PM

#

finite sierra not sure if this is the right place to ask but I want to create a mobile app tha...

there are two separate concerns here: having an image classifier, and having a mobile app. the app part is off-topic for this channel, but the rest is on-topic.

finite sierra Jan 30, 2023, 5:51 PM

#

I already know how to make a mobile app using flutter but I am clueless about the first part that you mentioned

serene scaffold Jan 30, 2023, 5:52 PM

#

look into image classification for playing cards

#

you might also need object detection, to figure out where the cards are in a given image.

lapis sequoia Jan 30, 2023, 6:24 PM

#

Does anyone know how to create a fuzzymatch using pandas?

#

A fuzzy match is performed to determine matching street instances, [LFA1_Street] -> [EA_Street and House Number (emp)], with a 90% threshold to produce a table with 2 columns, [Test3_RecordID] with their corresponding matching [Test3_RecordID2] called FM

serene scaffold Jan 30, 2023, 6:57 PM

#

lapis sequoia Does anyone know how to create a fuzzymatch using pandas?

you can't. you have to use a separate library.

#

I think you've asked about this before?

lapis sequoia Jan 30, 2023, 6:58 PM

#

serene scaffold you can't. you have to use a separate library.

you mean fuzzywuzzy?
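A dedicated fuzzy-matching library is the usual route, but a rough version can be sketched with only pandas and the standard library's `difflib`. Everything below is an assumption for illustration: the toy rows, the shortened column name `EA_Street`, and the use of `SequenceMatcher.ratio()` as the similarity score for the 90% threshold.

```python
from difflib import SequenceMatcher

import pandas as pd

# Toy tables standing in for the two real sources.
left = pd.DataFrame({
    "Test3_RecordID": [1, 2],
    "LFA1_Street": ["12 Main Street", "5 Oak Avenue"],
})
right = pd.DataFrame({
    "Test3_RecordID2": [10, 11, 12],
    "EA_Street": ["12 Main Street.", "5 Oak Avenu", "99 Pine Road"],
})


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Compare every left row with every right row, then keep pairs at or above 0.90.
pairs = left.merge(right, how="cross")
pairs["score"] = [
    similarity(a, b) for a, b in zip(pairs["LFA1_Street"], pairs["EA_Street"])
]
FM = pairs.loc[
    pairs["score"] >= 0.90, ["Test3_RecordID", "Test3_RecordID2"]
].reset_index(drop=True)
print(FM)
```

The cross join grows as rows-left times rows-right, so on large tables a purpose-built library or some blocking step (for example, only comparing rows that share a postcode) would be needed.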

thin palm Jan 30, 2023, 8:59 PM

#

If a company provided previous quarters of their revenue and profit, basically their historical data. How can I estimate what their Q1 for the new year will be?

arctic crown Jan 30, 2023, 9:01 PM

#

please help in sklearn why does .fit take a 2d array

serene scaffold Jan 30, 2023, 9:02 PM

#

arctic crown please help in sklearn why does .fit take a 2d array

you have to say what you're trying to .fit, because tons of things in sklearn have that

arctic crown Jan 30, 2023, 9:02 PM

#

serene scaffold you have to say what you're trying to `.fit`, because tons of things in sklearn ...

for example if i try to fit a 1d x value it gives me an error

serene scaffold Jan 30, 2023, 9:02 PM

#

arctic crown for example if i try to fit a 1d x value it gives me an error

show code

arctic crown Jan 30, 2023, 9:04 PM

#

serene scaffold Jan 30, 2023, 9:04 PM

#

arctic crown

in the future, please always do code as text

#

!code

arctic wedgeBOT Jan 30, 2023, 9:04 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold Jan 30, 2023, 9:04 PM

#

anyway

#

!docs sklearn.linear_model.LinearRegression

arctic wedgeBOT Jan 30, 2023, 9:04 PM

#

sklearn.linear_model.LinearRegression

```py
class sklearn.linear_model.LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
```
Ordinary least squares Linear Regression.

LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

serene scaffold Jan 30, 2023, 9:05 PM

#

try doing dataset[['area']]. please also copy the code into the chat as shown above ^

#

@arctic crown

arctic crown Jan 30, 2023, 9:05 PM

#

```py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

dataset = pd.read_csv("homeprices.csv")

model = linear_model.LinearRegression()
model.fit(dataset[["area"]], dataset.price)
print(model.predict(3000))  # scalar input: this is the call that raises the ValueError
```

#

serene scaffold Jan 30, 2023, 9:05 PM

#

I will not look at screenshots of text. Please do text as text.

arctic crown Jan 30, 2023, 9:06 PM

#

error: ValueError: Expected 2D array, got scalar array instead:
array=3000.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

serene scaffold Jan 30, 2023, 9:06 PM

#

try this instead

model.fit(dataset["area"], dataset.price)
print(model.predict([3000]))

arctic crown Jan 30, 2023, 9:07 PM

#

ValueError: Expected 2D array, got 1D array instead:
array=[2600 3000 3200 3600 4000 4100].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

#

i am getting this error because my x value is not a 2d array in this line: model.fit(dataset["area"], dataset.price)

serene scaffold Jan 30, 2023, 9:08 PM

#

where is array=[2600 3000 3200 3600 4000 4100]?

arctic crown Jan 30, 2023, 9:08 PM

#

serene scaffold where is `array=[2600 3000 3200 3600 4000 4100]`?

that in my dataset

#

which is a csv file

#

area,price
2600,550000
3000,565000
3200,610000
3600,595000
4000,760000
4100,810000

serene scaffold Jan 30, 2023, 9:08 PM

#

is it dataset.price, or what?

#

okay, thanks

#

keep in mind that I don't know any of this unless you show me

arctic crown Jan 30, 2023, 9:09 PM

#

okay

serene scaffold Jan 30, 2023, 9:10 PM

#

dataset['area'] is your X, and dataset.price is your y. and this is what the docs say

X : {array-like, sparse matrix} of shape (n_samples, n_features)
    Training data.

y : array-like of shape (n_samples,) or (n_samples, n_targets)
    Target values. Will be cast to X’s dtype if necessary.

#

so X needs to be two dimensional, and y can be one dimensional or two dimensional

arctic crown Jan 30, 2023, 9:11 PM

#

yea thats my question why cant it accept a 1d array for the x

#

why does it have to be a 2d array

serene scaffold Jan 30, 2023, 9:11 PM

#

so try this

model.fit(dataset[["area"]], dataset['price'])
print(model.predict([[3000]]))

serene scaffold Jan 30, 2023, 9:11 PM

#

arctic crown why does it have to be a 2d array

because each row represents a data point, and each column represents a feature

#

a one-dimensional array doesn't have rows or columns. it's just a flat sequence.

#

so instead of it being (n,) shaped, which is one dimensional, you need it to be (n, 1) shaped. which is basically the same

#

but if your X array is just (n,)-shaped, sklearn doesn't know if that's one row with n features, or n rows with one feature each.

#

@arctic crown make sense?

arctic crown Jan 30, 2023, 9:13 PM

#

a little bit

arctic crown Jan 30, 2023, 9:13 PM

#

serene scaffold because each row represents a data point, and each column represents a feature

why does that matter for x and not y?

serene scaffold Jan 30, 2023, 9:14 PM

#

arctic crown why does that matter for x and not y?

good question! the y value is pretty much always one feature anyway, so it can just be assumed.

arctic crown Jan 30, 2023, 9:15 PM

#

serene scaffold because each row represents a data point, and each column represents a feature

whats the difference between data point and feature

serene scaffold Jan 30, 2023, 9:15 PM

#

arctic crown whats the difference between data point and feature

it's like the difference between an object and an attribute

#

in fact, data points are sometimes called objects

#

but a data point is a "thing", and a feature is a type of information you can have about that thing

#

like a person can be a data point, and their features would be things like height, weight, age, etc.

#

and every value for a given feature should be the same type/unit.

arctic crown Jan 30, 2023, 9:18 PM

#

can you please give like a visual array example

#

i think you are talking about multiple linear regression

serene scaffold Jan 30, 2023, 9:19 PM

#

arctic crown i think you are talking about multiple linear regression

this isn't about linear regression. it's just data science in general.

serene scaffold Jan 30, 2023, 9:21 PM

#

arctic crown can you please give like a visual array example

   name  age  height gender
0   bob   25      42      M
1  jane   30      29      F
2  carl   60      40      M

#

each row is a data point representing a person. the types of each element in a row are different.

each column is a feature. each column represents a specific kind of information that we can know about a person. the types and what they represent are the same.

arctic crown Jan 30, 2023, 9:30 PM

#

wait so when we are doing .fit the x values are the rows and y values are the columns?

#

@serene scaffold

serene scaffold Jan 30, 2023, 9:36 PM

#

arctic crown wait so when we are doing .fit the x values are the rows and y values are the co...

no. you can't have rows and columns as separate things. every element belongs to a row and to a column.

arctic crown Jan 30, 2023, 9:42 PM

#

serene scaffold no. you can't have rows and columns as separate things. every element belongs to...

okay now back to my original question why does .fit and .predict require a 2d array?

mild dirge Jan 30, 2023, 9:53 PM

#

Because you supply the data for the model to train on. The data consists of multiple rows, each being 1 datapoint. And each datapoint has multiple values, 1 for each feature.

#

Even if you only have 1 feature, then the libraries still expect it to be in the 2d format (like a table). But then it is just a table with 1 column, for the single feature that your dataset has.
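To make the "table with 1 column" idea concrete, here is a small sketch using the six rows from the homeprices CSV quoted earlier. The only assumption is building the arrays by hand instead of reading the file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The six rows of the CSV shown above.
area = np.array([2600, 3000, 3200, 3600, 4000, 4100])
price = np.array([550000, 565000, 610000, 595000, 760000, 810000])

print(area.shape)          # one-dimensional: just a flat sequence of 6 numbers

# reshape(-1, 1) says "as many rows as needed, exactly one column":
# 6 data points, 1 feature each.
X = area.reshape(-1, 1)
print(X.shape)

model = LinearRegression().fit(X, price)

# predict() expects the same layout: rows are data points, columns are features.
# One house with one feature is therefore a 1 x 1 table.
prediction = model.predict(np.array([[3000]]))
print(prediction)
```

With a DataFrame, `dataset[["area"]]` (double brackets) gives the same 6 x 1 layout, while `dataset["area"]` (single brackets) gives the flat one-dimensional version that triggers the error.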

hasty mountain Jan 31, 2023, 12:23 AM

#

Interesting... Using a VAE with a DCGAN, and making it not only try to generate new images from random noise but also try to mimic the real images from their latent vectors, seems to make Generator-Discriminator convergence much easier.
Too bad DCGAN is too simple for my data... so I don't know if it might cause mode collapse eventually because of that

#

I know there's a VAE-GAN model, but I'm doing something slightly different... I guess

#

I just hope that, when I change the architecture from a DCGAN to a more robust one, this helps the model converge...I almost got crazy because it wouldn't converge

hasty mountain Jan 31, 2023, 12:43 AM

#

My code is a Frankenstein's Monster already

serene scaffold Jan 31, 2023, 1:09 AM

#

hasty mountain My code is a Frankenstein's Monster already

that's all code.

hasty mountain Jan 31, 2023, 1:10 AM

#

Yes, but for my GAN it's different

#

I've got some commented-out code for a UNet discriminator, commented-out code for a vanilla discriminator, commented-out code for duelling discriminators, and there's also some code for Progressive Growing GANs...

#

Someday I might organize it... but only when I can make a decent GAN... which isn't today...

iron basalt Jan 31, 2023, 1:21 AM

#

hasty mountain Someday I might organize it...but only when I can make a decent GAN...*which isn...

I recommend writing something about it before moving on, even if it is a failed attempt. If you keep at it, you may end up with thousands of Git branches after some number of years and need some way to search through them...

hasty mountain Jan 31, 2023, 1:22 AM

#

iron basalt I recommend writing something about it before moving on, even if it is a failed ...

Oh, but I do that... It's only that everything works until I get out of DCGAN architecture

#

But there's some things that I feel that might be promising, like UNet and Relativistic Discriminator.

#

And duelling discriminators... though with less interest than UNet and Relativistic

merry fern Jan 31, 2023, 1:34 AM

#

https://stackoverflow.com/questions/75291837/projecting-future-balances-over-x-periods-ex-12-months-in-pandas

Stack Overflow

Projecting future balances over X periods (ex: 12 months) in Pandas

df has ~140 lines, each with its own Balance, Latest_Date and Category (15 types)

Account Name  Balance  Latest_Date  Category
ABC           1,000    2025-0315    TypeA
DEF           4,000    2026-0131    TypeB
GHI           10,000   2024-0...

sick fern Jan 31, 2023, 3:30 AM

#

Hey guys, I have a question regarding neural networks in TensorFlow

#

||cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx']
DataFrame = pd.read_csv('letter-recognition.data' , names=cols)
Scaler = StandardScaler()

X = DataFrame[DataFrame.columns[1:-1]].values
X = Scaler.fit_transform(X)

Y = DataFrame[DataFrame.columns[0]].values

nn_model = tf.keras.Sequential([

tf.keras.layers.Flatten(input_shape = (1 , )),
tf.keras.layers.Dense(128 , activation="relu"),
tf.keras.layers.Dense(128 , activation="relu"),
tf.keras.layers.Dense(26 , activation="relu")

])

nn_model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
nn_model.predict(X)

||

#

This is my code that I've written. It's a neural network that is attempting to match features to letters, and I've made it predict.

#

But, I've been getting this bug: Input 0 of layer "dense_42" is incompatible with the layer: expected axis -1 of input shape to have value 1, but received input with shape (32, 15)

#

I'm sure it's because of the input_shape but I don't know how to fix it. If someone could help me with this, that'd be great.

#

I'm also available to VC

arctic wedgeBOT Jan 31, 2023, 3:42 AM

#

Hey @sick fern!

It looks like you tried to attach file type(s) that we do not allow (.data). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

clever owl Jan 31, 2023, 3:52 AM

#

I've currently got a df that looks like this
Item January February March
car 10 NaN NaN
bike NaN 20 NaN


import pandas as pd

data = {
        "Item": ['car', 'bike', 'house'],
        "Price": [10,20,30],
        "Months": ["January", "February", "March"]
}

df = pd.DataFrame(data)

df = df.pivot(index="Item", columns="Months", values="Price")

print(df)

How can I make it to be like this
January February March
Item Price Empty Price Empty Price Empty
car 10 NaN NaN NaN NaN NaN
bike NaN NaN 20 NaN NaN NaN

Where the Price columns contain the price, of the items, and the Empty columns is just full empty

serene scaffold Jan 31, 2023, 3:59 AM

#

clever owl I've currently got a df that looks like this Item January February March car ...

why do you want the Empty columns?

#

you pretty much never want to allocate empty cells to put data in later. Once you've computed the data that those empty cells are actually for, then you can create a dataframe that has all of them.

sick fern Jan 31, 2023, 4:06 AM

#

sick fern I'm sure it's because of the input_shape but I don't know how to fix it. If some...

^ Can anyone help with this issue? Sorry, I'd like to get this issue down ASAP.

serene scaffold Jan 31, 2023, 4:07 AM

#

sick fern ^ Can anyone help with this issue? Sorry, I'd like to get this issue down ASAP.

there are no guarantees about how quickly you'll get help, or if you will get help at all. but you can maximize your chances by providing the code in an easily-readable way

#

!code

arctic wedgeBOT Jan 31, 2023, 4:07 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

sick fern Jan 31, 2023, 4:07 AM

#

serene scaffold there are no guarantees about how quickly you'll get help, or if you will get he...

Okay, will do.

#

Thank you

#

cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx']
DataFrame = pd.read_csv('letter-recognition.data' , names=cols)
Scaler = StandardScaler()


X = DataFrame[DataFrame.columns[1:-1]].values
X = Scaler.fit_transform(X)

Y = DataFrame[DataFrame.columns[0]].values

nn_model = tf.keras.Sequential([

    tf.keras.layers.Flatten(input_shape = (1 , )),
    tf.keras.layers.Dense(128 , activation="relu"),
    tf.keras.layers.Dense(128 , activation="relu"),
    tf.keras.layers.Dense(26 , activation="relu")

])

nn_model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
nn_model.predict(X)

serene scaffold Jan 31, 2023, 4:08 AM

#

sick fern But, I've been getting this bug: Input 0 of layer "dense_42" is incompatible wi...

also, always show the whole error message, starting from Traceback

sick fern Jan 31, 2023, 4:08 AM

#

sick fern

This is the error code.

serene scaffold Jan 31, 2023, 4:09 AM

#

sick fern

please never ask people to read screenshots of text. it's a waste of everyone's time

sick fern Jan 31, 2023, 4:10 AM

#

Okay, my bad.

#

ValueError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 248, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer "sequential_16" (type Sequential).
    
    Input 0 of layer "dense_58" is incompatible with the layer: expected axis -1 of input shape to have value 1, but received input with shape (32, 15)
    
    Call arguments received by layer "sequential_16" (type Sequential):
      • inputs=tf.Tensor(shape=(32, 15), dtype=float32)
      • training=False
      • mask=None

sick fern Jan 31, 2023, 4:10 AM

#

sick fern ``` ValueError: in user code: File "/usr/local/lib/python3.8/dist-packages/...

Error message

serene scaffold Jan 31, 2023, 4:10 AM

#

@sick fern thank you. please remember these two things (showing the code with markdown formatting, and the whole error message as text) for all future questions

sick fern Jan 31, 2023, 4:11 AM

#

serene scaffold @sick fern thank you. please remember these two things (showing the c...

Yes, will do. Sorry, I haven't spoken in this server in a long time.

serene scaffold Jan 31, 2023, 4:12 AM

#

@sick fern the input that has a shape of (32, 15) -- what does that represent?

sick fern Jan 31, 2023, 4:13 AM

#

serene scaffold @sick fern the input that has a shape of `(32, 15)` -- what does that...

What do you mean?

serene scaffold Jan 31, 2023, 4:13 AM

#

what is the input?

sick fern Jan 31, 2023, 4:13 AM

#

The input is the X values of the DataFrame

serene scaffold Jan 31, 2023, 4:13 AM

#

why is the shape (32, 15)?

sick fern Jan 31, 2023, 4:14 AM

#

I'm not really sure. I don't think I kept that in my code.

#

It was there in the error message, though.

#

Let me send the dataset

#

https://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/

#

"letter-recognition.data" is the file

serene scaffold Jan 31, 2023, 4:17 AM

#

@sick fern so, it appears that the first column is a label, and the other 16 columns are features of some kind. it isn't immediately obvious to me what they mean

#

but it appears that you're trying to run 32 instances (ie, 32 rows) through the model at once

sick fern Jan 31, 2023, 4:17 AM

#

oh

#

I'd like to run an instance at a time

#

How would I do that?

serene scaffold Jan 31, 2023, 4:18 AM

#

try changing tf.keras.layers.Flatten(input_shape = (1 , )), to just tf.keras.layers.Flatten(),
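For context on why the error appears: a `Dense` layer is essentially a matrix multiplication, so its weight matrix needs as many rows as the input has features. The sketch below reproduces the mismatch with plain NumPy instead of Keras (a simplification, not what Keras does internally line for line); the names `w_built_for_1` and `w_built_for_15` are hypothetical. The 32 in `(32, 15)` is the default batch size of `predict`, and the 15 comes from `columns[1:-1]`, which drops the last of the 16 feature columns.

```python
import numpy as np

cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar',
        'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege',
        'xegvy', 'y-ege', 'yegvx']

features_used = cols[1:-1]  # what the posted code selects: 15 columns
features_all = cols[1:]     # every column except the label: 16 columns

# One batch as predict() would feed it: 32 rows, 15 features each.
batch = np.zeros((32, len(features_used)), dtype=np.float32)

# Weights sized for input_shape=(1,): one input feature, 128 units.
w_built_for_1 = np.zeros((1, 128), dtype=np.float32)
try:
    batch @ w_built_for_1
    mismatch = False
except ValueError:
    mismatch = True  # 15 features cannot be multiplied by a (1, 128) matrix

# Weights sized for the real feature count work fine.
w_built_for_15 = np.zeros((len(features_used), 128), dtype=np.float32)
hidden = batch @ w_built_for_15
print(mismatch, hidden.shape)
```

Under the same assumptions, declaring `input_shape=(15,)` (or `(16,)` after switching the slice to `columns[1:]`) would state the expected size explicitly instead of leaving Keras to infer it.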

sick fern Jan 31, 2023, 4:19 AM

#

Wow

#

It worked

serene scaffold Jan 31, 2023, 4:19 AM

#

cols = ['letter', 'x-box', 'y-box', 'width', 'height', 'onpix', 'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar', 'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx'] -- I guess this is what each column means

sick fern Jan 31, 2023, 4:19 AM

#

However, I don't understand what it outputs

#

array([[0. , 0. , 0.37864068, ..., 0. , 0. ,
0.22023912],
[0. , 0. , 0.10195272, ..., 0.4670404 , 0. ,
0.54725075],
[0.04015716, 0. , 0.20881076, ..., 0.38116002, 0. ,
0.41306818],
...,
[0. , 0. , 0.40097547, ..., 0.00563453, 0. ,
0.6130048 ],
[0. , 0. , 0.24123913, ..., 0.16478615, 0. ,
0.01898614],
[0. , 0.12412387, 0.3103178 , ..., 0.17968003, 0.05001948,
0.53410375]], dtype=float32)
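One way to read an array like this: each row belongs to one input row, and each of the 26 columns to one output unit. If the labels were encoded as A=0 through Z=25 and the model had been trained (both are assumptions, and neither holds for the code above yet), the column with the largest value would be the predicted letter. A minimal NumPy sketch with made-up scores:

```python
import string
import numpy as np

# Made-up scores for 3 inputs and 26 output units (not real model output).
scores = np.zeros((3, 26), dtype=np.float32)
scores[0, 19] = 0.9
scores[1, 8] = 0.7
scores[2, 3] = 0.4

# Index of the largest score in each row, then back to a letter.
predicted_index = scores.argmax(axis=1)
predicted_letters = [string.ascii_uppercase[i] for i in predicted_index]
print(predicted_letters)
```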

serene scaffold Jan 31, 2023, 4:19 AM

#

that's the thing with neural networks, isn't it?

sick fern Jan 31, 2023, 4:19 AM

#

This is the array it outputs

serene scaffold Jan 31, 2023, 4:19 AM

#

Pepega

#

well
welcome to ML

sick fern Jan 31, 2023, 4:20 AM

#

lmao

serene scaffold Jan 31, 2023, 4:20 AM

#

it looks like you're not encoding the letters numerically. which you should probably do. and it's not clear where in here you're training the model

#

it looks like you start predicting with it right away

sick fern Jan 31, 2023, 4:20 AM

#

oh wait

#

yeah I didn't mean to do that

#

Do i train the model by fitting it?

serene scaffold Jan 31, 2023, 4:21 AM

#

ya

sick fern Jan 31, 2023, 4:23 AM

#

Okay, and how do I convert the letters to numbers again?

clever owl Jan 31, 2023, 4:23 AM

#

serene scaffold why do you want the Empty columns?

I want to output to an excel file. The requirements for the excel file is that there's an empty column after every price with a title "Bank" (I called it empty there just as an example)

sick fern Jan 31, 2023, 4:23 AM

#

I know I need to typecast it in some sort of way

serene scaffold Jan 31, 2023, 4:23 AM

#

clever owl I want to output to an excel file. The requirements for the excel file is that t...

I'm running out of time, so I won't be able to help with that right now. but you might look into reindexing the pivoted table.
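A rough sketch of the reindexing idea, assuming the goal is a two-row header with a `Price` column followed by an empty `Bank` column for each month. The month order and the names `wide`, `full_cols` and `result` are choices made for this example, not something from the question.

```python
import pandas as pd

data = {
    "Item": ["car", "bike", "house"],
    "Price": [10, 20, 30],
    "Months": ["January", "February", "March"],
}
df = pd.DataFrame(data)
months = ["January", "February", "March"]

# Pivot, then put the months in calendar order (pivot sorts them alphabetically).
wide = df.pivot(index="Item", columns="Months", values="Price").reindex(columns=months)

# Turn the single header row into (month, "Price") pairs.
wide.columns = pd.MultiIndex.from_product([months, ["Price"]])

# Reindex against the full set of (month, sub-column) pairs;
# the "Bank" columns do not exist yet, so they come out as all NaN.
full_cols = pd.MultiIndex.from_product([months, ["Price", "Bank"]])
result = wide.reindex(columns=full_cols)
print(result)
```

This only allocates the empty columns at the very end, right before export, which fits the earlier advice about not creating empty cells up front.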

serene scaffold Jan 31, 2023, 4:24 AM

#

sick fern I know I need to typecast it in some sort of way

not exactly. if you assign consecutive numbers to each letter (the relationship between a letter and the number you assign to it is arbitrary), you can replace each letter with the assigned index.

clever owl Jan 31, 2023, 4:25 AM

#

Alright thanks

sick fern Jan 31, 2023, 4:25 AM

#

serene scaffold not exactly. if you assign consecutive numbers to each letter (the relationship ...

Well, if A is 1, how do I convert all As to 1s in the DataFrame?

serene scaffold Jan 31, 2023, 4:26 AM

#

sick fern Well, if A is 1, how do I convert all As to 1s in the DataFrame?

!docs pandas.Series.replace

arctic wedgeBOT Jan 31, 2023, 4:26 AM

#

pandas.Series.replace


Series.replace(to_replace=None, value=_NoDefault.no_default, *, inplace=False, limit=None, regex=False, method=_NoDefault.no_default)
Replace values given in to_replace with value.

Values of the Series are replaced with other values dynamically.

This differs from updating with `.loc` or `.iloc`, which require you to specify a location to update with some value.

sick fern Jan 31, 2023, 4:26 AM

#

serene scaffold !docs pandas.Series.replace

Okay, thank you.
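A small sketch of the letter-to-index idea. It uses `Series.map` with a dictionary rather than `Series.replace` (a swap made here because `map` always returns the mapped values directly; `replace` with the same dictionary is the route suggested above). The sample letters are made up, and A is assigned 0 instead of 1, which is an arbitrary choice that happens to match how sparse class labels are usually numbered.

```python
import string
import pandas as pd

# Made-up sample of the 'letter' column.
letters = pd.Series(list("TIDNA"), name="letter")

# Consecutive index for every capital letter: A=0, B=1, ..., Z=25.
letter_to_index = {ch: i for i, ch in enumerate(string.ascii_uppercase)}

encoded = letters.map(letter_to_index)
print(encoded.tolist())
```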

echo halo Jan 31, 2023, 5:18 AM

#

Hey, I want to start doing Data Science from scratch how do i start?

azure tinsel Jan 31, 2023, 5:27 AM

#

Hello, I am going over a tutorial about Data Science and ML and encountered an error message, which I can not solve. I am new to DS and ML...

serene scaffold Jan 31, 2023, 5:27 AM

#

azure tinsel Hello, I am going over a tutorial about Data Science and ML and encountered an e...

If there's an error message, always always show the error message right away.

azure tinsel Jan 31, 2023, 5:27 AM

#

The task is to calculate the survival rate survival_rate = passenger_age.groupby("Age")["Survived"].mean()

#

And I get:

#

AttributeError Traceback (most recent call last)
Cell In[13], line 1
----> 1 survival_rate = passenger_age.groupby("Age")["Survived"].mean()

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

serene scaffold Jan 31, 2023, 5:27 AM

#

I'm going to sleep though. Good luck!

#

So passenger age is an array. Not a DataFrame

azure tinsel Jan 31, 2023, 5:28 AM

#

serene scaffold Jan 31, 2023, 5:28 AM

#

So, look into the differences between those two types.

#

You overwrote the passenger age variable
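To illustrate the difference with made-up data (the column names follow the error message; the values and the name `ages_array` are invented for this sketch): `groupby` exists on a DataFrame, but not on the NumPy array you get once the variable has been overwritten with raw values.

```python
import pandas as pd

# Tiny made-up stand-in for the tutorial's data.
passenger_age = pd.DataFrame({
    "Age": [22, 22, 38, 38, 26],
    "Survived": [0, 1, 1, 1, 0],
})

# Works: passenger_age is a DataFrame.
survival_rate = passenger_age.groupby("Age")["Survived"].mean()

# Converting to a NumPy array loses the DataFrame methods.
ages_array = passenger_age["Age"].to_numpy()
has_groupby = hasattr(ages_array, "groupby")
print(survival_rate, has_groupby)
```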

azure tinsel Jan 31, 2023, 5:34 AM

#

You mean in cell 3?

#

Thanks @serene scaffold! I commented it out and could proceed with the tutorial. I will write the author a message about this. I definitely will have to learn more about this. This tutorial is just a primer. Thank you again and have a well deserved sleep.

serene scaffold Jan 31, 2023, 5:46 AM

#

azure tinsel Thanks @serene scaffold! I commented it out and could proceed with the tuto...

A lot of tutorials are trash tier, and are just fodder for the author's linkedin

#

But I haven't seen the one you're referring to

#

Also I'm not sleeping

azure tinsel Jan 31, 2023, 5:46 AM

#

@serene scaffold I found out, that there appears to be a typo. In cell 3 you probably have to calculate passenger_ages

serene scaffold Jan 31, 2023, 5:46 AM

#

I have issues

azure tinsel Jan 31, 2023, 5:48 AM

#

It is by Michael Hartl. And he is a great educator, so the tutorial is ok. The point is, that it has come out 1 month ago and this probable typo is almost at the last page

thin palm Jan 31, 2023, 6:31 AM

#

how do I tell pandas to read this? I'm not even sure i know how to arrange this data

Screenshot_2023-01-30_at_11.30.29_PM.png

wooden sail Jan 31, 2023, 7:17 AM

#

<@&831776746206265384>

#

this is being spammed in many channels

drowsy viper Jan 31, 2023, 8:22 AM

#

Hi there I have a question about training an AI model in pytorch ,
I have about 30 classes (soon that will be about 180) since I am crawling more images, but the thing is I am getting multiple errors on my tensors at the criterion part. Is there a simple notebook I can follow to understand pytorch better. Btw my images in 30 different directories for now. Is this how my folder structure should be ? What is your general comment on this issue?

dusk musk Jan 31, 2023, 8:38 AM

#

does anyone have a suggestion for a library that handles units well?
kind of like astropy's?

i wasn't sure if there was any other really good suggestions (especially if there's one that works with scipy's methods like integration/diffeqs)

wooden sail Jan 31, 2023, 8:44 AM

#

dusk musk does anyone have a suggestion for a library that handles units well? kind of lik...

i know of pint, but idk if it plays well with scipy

dusk musk Jan 31, 2023, 8:45 AM

#

wooden sail i know of pint, but idk if it plays well with scipy

it horrifies me that they named it pint of all things, but i suppose it gets the message across as to what it does deepkek

#

i'll check it out :) thanks

wooden sail Jan 31, 2023, 8:45 AM

#

lol

#

it does explicitly mention numpy support, so i'm optimistic

spiral pasture Jan 31, 2023, 9:19 AM

#

Heyo, I'm new to the whole AI thing and want to try out something easy (I hope it is).
I'd like to code and train an AI for spam detection in my Discord bot. So this „free nitro“ stuff or invite links separated with [dot] or something should be learned and deleted automatically.
I would use TensorFlow for deep learning because I like the idea of deep learning.
Do I have the right approach for my idea?

#

Why tensorflow?
Bc i want to implement a system in this bot so special useres can upvote or downvote the decision of the bot and it gets a live review

magic swallow Jan 31, 2023, 9:31 AM

#

Maybe think about what sort of model would fit your needs and your training data

late shell Jan 31, 2023, 10:01 AM

#

Hello. I'm a final year under graduate in India. And I want to pursue my Masters from Japan through the MEXT scholarship. The scholarship requires that the applicant proposes a SOP explaining the project/study he will be doing while his stay in Japan. I don't have any great ideas for the scholarship as of yet except this one. Can you guys please let me know what are your thoughts on this? Also I'm not very confident with this one since the release of Chat GPT because of its amazing capabilities.

My Idea: I plan to make a tool that generates meaningful stories using machine learning techniques like NLP by taking a particular set of keywords as input and other factors depending on the user.

When learning Japanese, particularly during my revision of previously learned words/kanji (a system of Japanese writing using Chinese characters, used primarily for content words.), I had to read the word/kanji, its meaning and an example sentence in which the word had been used. I had to do this over and over again for every single word. This method of revising felt inefficient, slow and boring after a certain amount of time. I also came to know through social media platforms and communities (such as Nihongo VR on discord) that a huge percentage of people learn words in a similar way. I wanted to make this part interesting and fun.

So, I intend to make a model/tool that will help international students learn the Japanese language in an easy and fun way. After inputting the words already learned (his vocab) by the user in the past, the model will generate an interesting story that would use the words inputted and some other unknown/new words depending on other parameters set by the user. The user can then read the story which will help him/her revise previously learned words in an interesting, fun and efficient manner compared to the usual robotic way of revising/learning words.

The model/tool will presumably have a few requirements from the users such as:

#

• The words already learnt by the user in the past
• The length of the story
• The genre of the story
• The age group for which the story is intended
• The variability of new words to introduce in the story

#

Please let me know your thoughts on this.

obsidian peak Jan 31, 2023, 10:56 AM

#

Scan the number plate and get all the details of the vehicle! 🚘, https://github.com/YashIndane/platefetcher

GitHub

GitHub - YashIndane/platefetcher: Scan the number plate and get all...

Scan the number plate and get all the details of the vehicle! 🚘 - GitHub - YashIndane/platefetcher: Scan the number plate and get all the details of the vehicle! 🚘

spring sphinx Jan 31, 2023, 12:07 PM

#

Hi guys, I'm a web developer and have tried my hand at pandas and numpy to learn a little bit of data science..

I don't have any definite guide or pathway to follow, (my aim is to learn Machine learning (Deep Learning))

Can y'all guide me and recommend some resources, give me sort of a path that I can follow?

rancid quartz Jan 31, 2023, 3:10 PM

#

I am trying to create an algorithm that is trained on 100s of photos and learns to read the emotions displayed in photos. Should I use PyTorch or TensorFlow? (I'm dipping my toes into DL for the first time btw)

#

im going offline so just ping me if u know thanks!

stone glacier Jan 31, 2023, 4:42 PM

#

hi, would anyone have a good source link where I check out how to tune/optimize hyperparameters for a TFIDF model?

lapis sequoia Jan 31, 2023, 4:52 PM

#

why does my dataframe has a second header Date

agile cobalt Jan 31, 2023, 4:52 PM

#

that's the name of the index

lapis sequoia Jan 31, 2023, 4:52 PM

#

then what is the above header row

agile cobalt Jan 31, 2023, 4:52 PM

#

the names of the columns

lapis sequoia Jan 31, 2023, 4:53 PM

#

how is Date accesed

agile cobalt Jan 31, 2023, 4:53 PM

#

df.index

lapis sequoia Jan 31, 2023, 4:53 PM

#

can i delete this index?

#

index column

agile cobalt Jan 31, 2023, 4:53 PM

#

do you want to get rid of the date column or just turn it into a normal column?

lapis sequoia Jan 31, 2023, 4:54 PM

#

yeah

agile cobalt Jan 31, 2023, 4:54 PM

#

which of the two options

lapis sequoia Jan 31, 2023, 4:54 PM

#

oh either works

agile cobalt Jan 31, 2023, 4:54 PM

#

you can use reset_index(), see the parameters it supports

#

!d pandas.DataFrame.reset_index

arctic wedgeBOT Jan 31, 2023, 4:54 PM

#

pandas.DataFrame.reset_index


DataFrame.reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=_NoDefault.no_default, names=None)
Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

agile cobalt Jan 31, 2023, 4:54 PM

#

drop=True if you want to get rid of it
drop=False if you want to turn it into a normal column

#

dataframes must have an index though - using reset index just gives it a default unnamed numerical index

lapis sequoia Jan 31, 2023, 4:55 PM

#

tried now, didnt work, its still there

#

agile cobalt Jan 31, 2023, 4:56 PM

#

pandas methods just about never modify in-place, they all return a new dataframe/series instead
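A quick sketch of what that means for `reset_index`, using a made-up frame with a `Date` index (the `Close` column and the values are invented): calling the method without keeping the result leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {"Close": [1.0, 2.0]},
    index=pd.Index(["2023-01-30", "2023-01-31"], name="Date"),
)

# Result is thrown away, so df still has the Date index.
df.reset_index()
still_indexed = df.index.name == "Date"

# Keep the result instead.
as_column = df.reset_index()          # Date becomes a normal column
dropped = df.reset_index(drop=True)   # Date is discarded entirely
print(still_indexed, list(as_column.columns), list(dropped.columns))
```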

lapis sequoia Jan 31, 2023, 4:56 PM

#

oh

#

awesome worked thank you

#

different problem the rows get shuffled

#

#

whats going wrong

agile cobalt Jan 31, 2023, 5:00 PM

#

isn't that because of the train/test split?

lapis sequoia Jan 31, 2023, 5:00 PM

#

probably

agile cobalt Jan 31, 2023, 5:00 PM

#

!d sklearn.model_selection.train_test_split

arctic wedgeBOT Jan 31, 2023, 5:00 PM

#

sklearn.model_selection.train_test_split


sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
Split arrays or matrices into random train and test subsets.

Quick utility that wraps input validation, `next(ShuffleSplit().split(X, y))`, and application to input data into a single call for splitting (and optionally subsampling) data into a one-liner.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation).

agile cobalt Jan 31, 2023, 5:00 PM

#

shuffle=True
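For date-ordered data where the row order matters, a sketch of the same call with shuffling turned off (the arrays are made up; whether an unshuffled split is appropriate depends on the problem):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up ordered data: 10 rows, one feature.
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# shuffle=False keeps the original order: first 80% train, last 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print(y_train, y_test)
```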

thin palm Jan 31, 2023, 5:00 PM

#

how do I read this as a dataframe for my pandas?

lapis sequoia Jan 31, 2023, 5:01 PM

#

agile cobalt > shuffle=True

ahh ty again

agile cobalt Jan 31, 2023, 5:01 PM

#

I recommend taking a better look at the documentation of the functions and modules you're using 😅

lapis sequoia Jan 31, 2023, 5:02 PM

#

didnt cross my mind that it could be found in docs

#

or are you telling me i should read docs for every single method use it or not

hard birch Jan 31, 2023, 5:06 PM

#

Guys, quick question. I'm practicing Codility's sample coding tests for a job application. I keep running into time problems when they ask me array questions, e.g. return the minimal elements of an array, find which array element is unique, etc. Of course these are really easy questions if you do not care about code efficiency, but my question is how I can make my code run faster in these cases. For example, if I have to return the max element of an array I would just use max(), i.e. my code is literally just 1 line. But even then they say it's not efficient enough

#

but surely python must have some fairly efficient list indexing algorithms built into it, no?

#

how would it even be possible to write a faster code outside of producing a research paper
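Without seeing the actual task this is a guess, but a single `max()` call is already one linear pass, so timeouts on tasks like these more often come from a scan nested inside a loop. A sketch for the "find the unique element" case, assuming every other value appears an even number of times (the function names are hypothetical):

```python
from functools import reduce
from operator import xor


def find_unpaired_slow(values):
    """Quadratic: list.count rescans the whole list for every element."""
    for v in values:
        if values.count(v) == 1:
            return v
    return None


def find_unpaired_fast(values):
    """Linear: XOR cancels values that occur an even number of times."""
    return reduce(xor, values, 0)


sample = [9, 3, 9, 3, 9, 7, 9]
print(find_unpaired_slow(sample), find_unpaired_fast(sample))
```

Both give the same answer on small inputs; the difference only shows up in the performance tests, where the quadratic version does roughly n times more work.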

agile cobalt Jan 31, 2023, 5:16 PM

#

lapis sequoia or are you telling me i should read docs for every single method use it or not

all methods you plan to use (perhaps not the entire page, but at least the parameters it accepts and the function's docstring) + some more generic resources
both pandas and sklearn have some pages that are more like guides than actual documentation

nocturne eagle Jan 31, 2023, 5:58 PM

#

lapis sequoia or are you telling me i should read docs for every single method use it or not

how would you know if you want to use a method (or not) until you know what it does? and how would you know what it can do (and not do) until you read the docs?

#

honestly, even for very large libraries, reading the tutorials and docs takes a day or two, max

eternal hull Jan 31, 2023, 6:04 PM

#

Does anyone use pyspark

serene scaffold Jan 31, 2023, 6:10 PM

#

eternal hull Does anyone use pyspark

someone probably does. don't ask to ask 😛

eternal hull Jan 31, 2023, 6:11 PM

#

I am getting this error the input column index should have at least two distinct values

#

I am using vectorindexer,stringindexer for onehotencoding

torpid pecan Jan 31, 2023, 6:22 PM

#

hi! someone here knows about a data science/data analytics case using heuristics algorithms?

serene scaffold Jan 31, 2023, 6:25 PM

#

torpid pecan hi! someone here knows about a data science/data analytics case using heuristics...

don't ask to ask. ask a complete question that someone who knows the answer could start answering.

torpid pecan Jan 31, 2023, 6:28 PM

#

thats the point. I just wanted a link, .pdf or something to start studying. If anyone (or everyone) has a case to share with me, I'll be grateful 🙂

hot hearth Jan 31, 2023, 6:30 PM

#

so i'm trying to do something slightly strange with matplotlib

#

this is a bit of an xy problem, but i really wanna solve x to improve my understanding

dawn urchin Jan 31, 2023, 6:31 PM

#

send you problem

#

the code

hot hearth Jan 31, 2023, 6:31 PM

#

i want to set the exact size for a figure, and not have it autoadjust

#

when i draw a pie chart, no matter how i configure everything, i can't seem to get it to cut off

#

here's a contrived example. the blue line is where i want the left edge of the image to be (yes, really)

#

but it's automatically growing to avoid cutting anything off. no matter what, i can't seem to disable that behavior

vast lintel Jan 31, 2023, 6:48 PM

#

Is it possible to use BERT's small model for organising and expanding some text input?

dawn urchin Jan 31, 2023, 6:51 PM

#

hot hearth here's a contrived example. the blue line is where i _want_ the left edge of the...

https://www.freecodecamp.org/news/matplotlib-figure-size-change-plot-size-in-python/#:~:text=The figsize() attribute can,the height of a plot.

freeCodeCamp.org

Matplotlib Figure Size – How to Change Plot Size in Python with plt...

When creating plots using Matplotlib, you get a default figure size of 6.4 for the width and 4.8 for the height (in inches). In this article, you'll learn how to change the plot size using the following: * The figsize() attribute. * The set_figwidth() method.

mint palm Jan 31, 2023, 7:44 PM

#

i might have asked before, but still confused
Actually i couldnt find some good source to "ACTUALLY" understand "HOW" zero shot learning works
I have read medium, and saw many video but i dont understand how while prediction, it outputs label

woven coral Jan 31, 2023, 7:51 PM

#

https://www.kaggle.com/code/sadikaljarif/cloth-image-classification-using-neural-network

Cloth Image Classification Using Neural Network

Explore and run machine learning code with Kaggle Notebooks | Using data from [Private Datasource]

potent sleet Jan 31, 2023, 7:56 PM

#

Would it be Ok if I ask for feedback about the work I have published on my youtube channel here? I have composed all music in this channel using my own AI model and would appreciate constructive feedback

hot hearth Jan 31, 2023, 7:59 PM

#

dawn urchin https://www.freecodecamp.org/news/matplotlib-figure-size-change-plot-size-in-pyt...

unfortunately that doesn't help with my problem. no matter the figsize, overflows cause the frame to expand to fit them
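One thing that might be worth checking (a guess, since the environment isn't shown): the growing usually comes from a "tight" bounding box being applied when the image is written or displayed, not from `figsize` itself. In Jupyter the inline backend applies `bbox_inches='tight'` by default, as far as I know. A sketch with the non-interactive Agg backend, where the axes deliberately start left of the figure edge so part of the pie is cut off; the sizes and axes position are made up:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen for this sketch
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 3), dpi=100)

# Axes rectangle in figure fractions: starts 25% to the left of the figure.
ax = fig.add_axes([-0.25, 0.0, 1.0, 1.0])
ax.pie([3, 2, 1])

# bbox_inches=None keeps the exact figure size instead of growing to fit.
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches=None)
buf.seek(0)
image = plt.imread(buf, format="png")
print(image.shape)
plt.close(fig)
```

In a notebook, the equivalent setting should be `%config InlineBackend.print_figure_kwargs = {'bbox_inches': None}`.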

sick fern Jan 31, 2023, 8:06 PM

#

Hello everyone. This is my code to recognize digits from 0-9.

from tensorflow.python.ops.logging_ops import image_summary
#DataSet
mnist = tf.keras.datasets.mnist
(x_train, y_train) , (x_test, y_test) = mnist.load_data()

#Normalization
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)


#Model
nn_model = tf.keras.models.Sequential()
nn_model.add(tf.keras.layers.Flatten(input_shape = (28,28)))
nn_model.add(tf.keras.layers.Dense(128, activation="relu"))
nn_model.add(tf.keras.layers.Dense(128, activation="relu"))
nn_model.add(tf.keras.layers.Dense(10, activation="softmax"))

nn_model.compile(optimizer="adam" , loss="sparse_categorical_crossentropy" , metrics=['accuracy'])
nn_model.fit(x_train, y_train, epochs=1)
nn_model.save('recognition.model')

image = cv2.imread('/1.png')[:,:,0]
image = np.invert(np.array(image))
image = image.reshape(1,28,28)
prediction = nn_model.predict(image)
print(np.argmax(prediction))

#

However, everytime I try to print the prediction, it doesn't tell the right digit even though it has an accuracy of 98%. I also checked the distortion of the image and it looks fine. Is something wrong with the code?
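One possible cause (a guess, since the image itself isn't shown): the training images go through `tf.keras.utils.normalize`, but the image loaded with `cv2.imread` is passed to `predict` as raw 0 to 255 values, so the model sees inputs on a very different scale. The sketch below uses a NumPy stand-in, `l2_normalize`, a hypothetical helper that mirrors what `normalize` does by default (divide by the L2 norm along the given axis); the random image is a placeholder for `1.png`.

```python
import numpy as np


def l2_normalize(x, axis):
    """Divide by the L2 norm along `axis`, leaving all-zero slices as zeros."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0
    return x / norms


# Placeholder for the grayscale image read from disk (28x28, values 0-255).
raw = np.random.default_rng(0).integers(0, 256, size=(28, 28)).astype(np.uint8)

inverted = np.invert(raw)                               # same inversion as the posted code
batch = inverted.reshape(1, 28, 28).astype(np.float32)  # add the batch dimension
prepared = l2_normalize(batch, axis=1)                  # same axis as used for x_train
print(prepared.shape, float(prepared.max()))
```

In the real script the same effect should come from calling `tf.keras.utils.normalize(image, axis=1)` on the reshaped image before `predict`. It may also be worth confirming that the inversion is needed at all: MNIST digits are light strokes on a dark background.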

lapis sequoia Jan 31, 2023, 8:08 PM

#

agile cobalt all methods you plan to use (perhaps not the entire page, but at least the param...

ngl i tried reading docs like pytorch's one and felt like they expect us to be on advanced levels already, its not beginner friendly

agile cobalt Jan 31, 2023, 8:09 PM

#

lapis sequoia ngl i tried reading docs like pytorch's one and felt like they expect us to be o...

if you means in terms of ML theory, pretty much yes - though usually there are some guides that explain it in simpler terms

lapis sequoia Jan 31, 2023, 8:11 PM

#

yep and when you read the docs you feel like where am i even going after reading all these

mint palm Jan 31, 2023, 8:12 PM

#

I am getting very confused with zero-shot learning. Can anyone please comment on following, if its true/false/correction needed:

"attention is all you need" was actually zero-shot learning
for zero-shot learning, input might be novel, but it is represented in common learned embedding space, and hence after representation model get idea about what it actually is.
I dont see how zero-shot learning is possible without having "embeddings" and "embedding space"

grand belfry Jan 31, 2023, 8:47 PM

#

Is there any way to feed ai music theory and ask it to compose a song?

serene scaffold Jan 31, 2023, 9:01 PM

#

grand belfry Is there any way to feed ai music theory and ask it to compose a song?

by "feed music theory", do you mean using sheet music-like information as training data?

#

or do you mean hard-coding rules about music theory (no parallel fifths, go from the dominant to the tonic, etc.) into it, and then having it generate new music that follows those rules?

ocean swallow Jan 31, 2023, 9:08 PM

#

How do you draw this in R? Circle's radius is a continous data and center is just a coordinate.

#

yeah sry about sending to python channel but was kind of urgent

prime hearth Jan 31, 2023, 10:09 PM

#

hello, i tried SVM and random forest and got an accuracy of 0.76 for validation.
However, i tried to improve my model so i did some feature engineer and took meaningful column names and got an accuracy of 0.73 instead.
My question is, is it okay to choose a model with lower accuracy in this case since the number of features is 700 for the 0.76 accuracy while 300 features around for 0.73
i was afraid if the 0.76 validation is because my model is overfitting

lapis sequoia Jan 31, 2023, 10:21 PM

#

I only add one element but the shape end up like this, why

mild dirge Jan 31, 2023, 10:22 PM

#

Check the docs
https://numpy.org/doc/stable/reference/generated/numpy.append.html

#

So in your case you want to reshape the second array such that it is (1, 30, 1) (Might not be necessary)

#

And then append along axis=0
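A sketch of both behaviours with made-up shapes (the stack of 5 existing entries is an assumption; only the `(1, 30, 1)` shape comes from the suggestion above). Without `axis`, `np.append` flattens both arrays first, which is the usual reason the shape comes out unexpected. With `axis=0`, the new entry needs the same number of dimensions as the stack, so the reshape is required in this setup.

```python
import numpy as np

stack = np.zeros((5, 30, 1))  # 5 existing entries, each of shape (30, 1)
new = np.ones((30, 1))        # the one entry to add

# No axis: both inputs are flattened, 5*30 + 30 = 180 values.
flattened = np.append(stack, new)

# With axis=0: reshape the new entry to (1, 30, 1) and stack it on top.
appended = np.append(stack, new.reshape(1, 30, 1), axis=0)
print(flattened.shape, appended.shape)
```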

lapis sequoia Jan 31, 2023, 10:23 PM

#

mild dirge And then append along axis=0

amazing, thank you

#

i would have no clue even if i read the docs

prime hearth Jan 31, 2023, 11:47 PM

#

hello, i would like to ask how many features is okay for trainning a model in NLP?

#

im unsure if i have too little features or too many

#

im doing a classification problem

hasty mountain Jan 31, 2023, 11:53 PM

#

prime hearth hello, i would like to ask how many features is okay for trainning a model in NL...

Try to make sure your number of features makes sense.
If anything, you could try using PCA to get an idea, since dimensionality reduction tries to decompose the data into main components that make sense mathematically.

prime hearth Jan 31, 2023, 11:56 PM

#

@hasty mountain i know the idea behind PCA but im still new to it. Also , what im unsure of is how would I find out what those new features are with PCA?

#

Thanks for answering - my model predicts yes or no based on the text from user reviews on apps they use and their rating for the app

#

so my model has around right now 300 features, i think it makes sense since it is using the most popular words to predict something

#

but if i add more features where the new features are new popular words i noticed an increase in accuracy

#

but again im unsure if that is okay- im really new to NLP and this is like my first big project so just trying to make sure im doing things right

hasty mountain Feb 1, 2023, 12:03 AM

#

prime hearth @hasty mountain i know the idea behind PCA but im still new to it. Also , ...

I don't remember how PCA works exactly, but it kinda creates pseudo-labels. It's more useful for determining how many groups you can distinguish in the data; it tries to find patterns in data.

But again, I'm a bit rusty on those classic algorithms since I dived into neural networks.
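
On the question of what the new PCA features are: each principal component is a weighted combination of the original columns, and the weights can be read back from the fitted object. A small sketch with scikit-learn, assuming a made-up dense matrix of four word columns (the words and values are illustrative, not from the actual dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical TF-IDF-like matrix: rows are reviews, columns are words.
feature_names = ["love", "great", "hate", "worst"]
X = np.array([
    [0.9, 0.7, 0.0, 0.0],
    [0.8, 0.9, 0.1, 0.0],
    [0.7, 0.6, 0.0, 0.1],
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.0, 0.0, 0.8, 0.7],
])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # shape: (n_reviews, 2)

# Each row of components_ holds one weight per original column.
for i, weights in enumerate(pca.components_):
    ranked = sorted(zip(feature_names, weights),
                    key=lambda pair: abs(pair[1]), reverse=True)
    print(f"component {i}:", [(name, round(float(w), 2)) for name, w in ranked])

# Share of the total variance captured by each component.
print("explained variance ratio:", pca.explained_variance_ratio_)
```

PCA is unsupervised, so the components describe variance in the word columns and are not tied to the yes/no label. For a large sparse TF-IDF matrix, `TruncatedSVD` from scikit-learn is the usual substitute because it accepts sparse input.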

hasty mountain Feb 1, 2023, 12:03 AM

#

prime hearth so my model has around right now 300 features, i think it makes sense since it i...

Are those features the common words for determining the binary classification?

#

Perhaps you could use Grid Search with different numbers of features and check how your model performs.

prime hearth Feb 1, 2023, 12:05 AM

#

the features i chose are like the most impactful with polarity (sentiment value).

mild dirge Feb 1, 2023, 12:05 AM

#

Popular words are also not always useful words

#

and f.e.

prime hearth Feb 1, 2023, 12:06 AM

#

yeah when i mean popular i mean like "worst" "hate" "love"

#

since those impact whether the user likes the app or not

hasty mountain Feb 1, 2023, 12:06 AM

#

prime hearth yeah when i mean popular i mean like "worst" "hate" "love"

Yeah, I was thinking about that

#

Sentiment "remark words"

#

There might be a specific term for that, but I don't know

prime hearth Feb 1, 2023, 12:07 AM

#

yeah i used spacy library to get adjectives from my features - im also doing feature selection right now but i guess my original question is how many features is okay for my SVM model

#

thank you i can try that grid search- it just it takes a long time to run and my computer slows down if i do it for all features and models

hasty mountain Feb 1, 2023, 12:08 AM

#

I think there's better options...I just don't remember.

#

There's Grid Search, Randomized Search...

prime hearth Feb 1, 2023, 12:08 AM

#

yeah i used randomize search too

hasty mountain Feb 1, 2023, 12:09 AM

#

Then you might already have an idea on how many features you should use.

prime hearth Feb 1, 2023, 12:09 AM

#

i guess im not sure if 700 features is okay- my model gives like 0.75 accuracy after performing metric testing with sklearn

#

so how would i do randomize search with each feature? You mean like forward feature selection process?

#

what you mentioned earlier

hasty mountain Feb 1, 2023, 12:10 AM

#

I guess you can try making randomized search with 700 features, then try with 600, 500...

#

Or even Grid Search with 700, 600, 500...and then you can filter the best number.
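
A sketch of what that search could look like in scikit-learn, assuming the features come from a `TfidfVectorizer` whose `max_features` caps the vocabulary at the most frequent terms. The toy reviews, the labels, and the candidate sizes (5, 10, 20 instead of 500, 600, 700) are made up so the example runs quickly:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration only: 1 = positive review, 0 = negative.
texts = [
    "love this app", "great app love it", "best app ever",
    "really love the design", "great and useful", "works great love it",
    "worst app ever", "hate this app", "terrible and slow",
    "hate the ads", "worst update terrible", "slow and buggy hate it",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Vectorizer and classifier live in one pipeline, so the vocabulary is
# rebuilt inside every cross-validation fold (no leakage from held-out data).
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC(random_state=0)),
])

# Candidate vocabulary sizes; for the real data this could be [300, 500, 700].
param_grid = {"tfidf__max_features": [5, 10, 20]}

grid = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
grid.fit(texts, labels)

print("best number of features:", grid.best_params_["tfidf__max_features"])
print("mean cross-validated accuracy:", grid.best_score_)
```

Because every candidate is scored on held-out folds, a larger feature count only wins if it actually generalizes, which speaks to the overfitting worry raised earlier. `RandomizedSearchCV` takes the same pipeline if the grid gets too slow.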

prime hearth Feb 1, 2023, 12:11 AM

#

oh ok thanks that makes sense. I guess i'll try that, then for whatever gives the best accuracy or metrics i will try forward feature selection to choose the optimal features

hasty mountain Feb 1, 2023, 12:12 AM

#

prime hearth oh ok thanks that make sense. I guess il try that then whatever gives the best a...

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection

scikit-learn

API Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidel...

#

I guess some of those cross-validation methods might help

#

K-Fold...

prime hearth Feb 1, 2023, 12:13 AM

#

thank you

tardy summit Feb 1, 2023, 12:42 AM

#

Pandas. I have a tiny data set read from a csv. The first step I need to do is map the contents of one column to two columns, and the value of the second column will depend of the predecessor.

df = {
    "Accounts": [
        "Assets",
        "Current Assets",
        "- Checking",
        "Total Current Assets",
        "Total Assets",
    ],
    "12/31/2016": ["NaN", "NaN", 1000.00, 1000.00, 1000.00],
}

Desired

new_df = {
    "Accounts": ["Assets", "Current Assets", "Checking"],
    "Parent": ["root", "Assets", "Assets:Current Assets"],
    "12/31/2016": ["NaN", "NaN", 1000.00],
}

I'm not clear on the best way to approach this. Seems one option would be to build a new column first by iterating over the "Accounts" column, and then filtering out the lines that start with "Total".

I think that would work, but is there a better way?
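
A minimal sketch of the iterate-then-filter idea, under assumptions taken from the sample only: a plain name opens a group, a name starting with "- " is a leaf account, and a "Total ..." row closes the most recently opened group. Real exports may nest differently.

```python
import pandas as pd

df = pd.DataFrame({
    "Accounts": [
        "Assets",
        "Current Assets",
        "- Checking",
        "Total Current Assets",
        "Total Assets",
    ],
    "12/31/2016": [None, None, 1000.00, 1000.00, 1000.00],
})

open_groups = []   # stack of group names that are currently "open"
rows = []

for name, value in zip(df["Accounts"], df["12/31/2016"]):
    if name.startswith("Total "):
        # Assumed: a "Total X" row closes the innermost open group.
        open_groups.pop()
        continue
    parent = ":".join(open_groups) if open_groups else "root"
    if name.startswith("- "):
        rows.append((name[2:], parent, value))   # leaf account
    else:
        rows.append((name, parent, value))       # group header
        open_groups.append(name)

new_df = pd.DataFrame(rows, columns=["Accounts", "Parent", "12/31/2016"])
print(new_df)
```

A plain loop is reasonable here because each row's parent depends on the rows before it, which does not vectorize well, and the data set is tiny.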

pulsar ether Feb 1, 2023, 12:50 AM

#

I finally got it installed on WSL2 / Ubuntu from scratch. Haha what a PITA

lapis sequoia Feb 1, 2023, 6:06 AM

#

is Seq2Seq model the same as iterating a sequential model several times

mint palm Feb 1, 2023, 2:00 PM

#

I dont see how zero-shot learning is possible without having "embeddings" and "embedding space"
Can we do zero shot learning when labels are like orange, banana, papaya, etc. (i mean they aren't descriptive but specific)

grand belfry Feb 1, 2023, 5:47 PM

#

is there any snapshot of a musical composition ai?

mint palm Feb 1, 2023, 5:50 PM

#

It doesn't make sense to fine-tune a model (pre-trained on a large dataset) that was trained for zero-shot learning, unless we are making changes in the architecture. Right?

young terrace Feb 1, 2023, 6:10 PM

#

what's everyone's opinion on Data camp?

lapis sequoia Feb 1, 2023, 6:39 PM

#

im interested in ai what programming language should i start , ik the basics of python but i dont know why im so bad.

sick fern Feb 1, 2023, 6:40 PM

#

Hey guys, I really need help with a convolutional neural network I'm trying to work with. This is the code:

#MNIST
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()


train_images = train_images.reshape((60000,28,28))
train_images = train_images.astype('float32')/255

test_images = test_images.reshape((10000,28,28))
test_images = test_images.astype('float32')/255

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32,(3,3), activation='relu', input_shape = (28,28,1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64,activation = 'relu'))
model.add(layers.Dense(10, activation= 'softmax'))

model.compile(optimizer = "Adam" , metrics = ["accuracy"] , loss = "categorical_crossentropy")
model.fit(train_images, train_labels, epochs=1)

Prediction = model.predict(test_images[0])
print(Prediction, test_labels[0])

#

This is the error code:

ValueError: in user code:

    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 228, in assert_input_compatibility
        raise ValueError(f'Input {input_index} of layer "{layer_name}" '

    ValueError: Exception encountered when calling layer "sequential" (type Sequential).
    
    Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=2. Full shape received: (None, 28)
    
    Call arguments received by layer "sequential" (type Sequential):
      • inputs=tf.Tensor(shape=(None, 28), dtype=float32)
      • training=False
      • mask=None

#

If anyone can help me, that'd be great. Also I'd like to talk to experienced ML learners to better understand this field, so if you're fine with me asking questions to u in DMS react to this message.

#

Thanks a lot!

lapis sequoia Feb 1, 2023, 6:44 PM

#

im interested in ai what programming language should i start , ik the basics of python but i dont know why im so bad.

lapis sequoia Feb 1, 2023, 6:44 PM

#

sick fern If anyone can help me, that'd be great. Also I'd like to talk to experienced ML ...

maybe you can help me

sick fern Feb 1, 2023, 6:44 PM

#

lapis sequoia im interested in ai what programming language should i start , ik the basics of ...

I've learnt it through Python

lapis sequoia Feb 1, 2023, 6:45 PM

#

sick fern I've learnt it through Python

is python a good choice? will it be good for a career? some person is saying that it's a kid language

#

ik the basics

sick fern Feb 1, 2023, 6:45 PM

#

Whoever said that is definitely wrong

lapis sequoia Feb 1, 2023, 6:45 PM

#

mybe u can help me

sick fern Feb 1, 2023, 6:45 PM

#

ML Engineers prefer Python because it's easy and accessible

lapis sequoia Feb 1, 2023, 6:46 PM

#

im 14 years old

sick fern Feb 1, 2023, 6:46 PM

#

lapis sequoia mybe u can help me

I'd say learn Python and then learn packages like Numpy, Pandas, and Matplotlib

sick fern Feb 1, 2023, 6:46 PM

#

sick fern I'd say learn Python and then learn packages like Numpy, Pandas, and Matplotlib

And then get into ML by doing a course on it.

#

I think it's also important to learn math skills like calculus, and linear algebra if you want to master neural networks

sick fern Feb 1, 2023, 6:47 PM

#

lapis sequoia im 14 years old

I'm 15

lapis sequoia Feb 1, 2023, 6:48 PM

#

is it really programming when u using libraries why cant u do it without them??

sick fern Feb 1, 2023, 6:48 PM

#

I used to think that too

lapis sequoia Feb 1, 2023, 6:48 PM

#

reply

sick fern Feb 1, 2023, 6:49 PM

#

But the truth is, libraries just make it easier. For example, making a neural network without a library is next to impossible.

sick fern Feb 1, 2023, 6:50 PM

#

sick fern But the truth is, libraries just make it easier. For example, making a neura...

Not impossible, I've seen people making them with raw mathematics, but it's a waste of time. It's useful when you're trying to learn but faang companies like Google don't want you to create neural networks from scratch. They want you to complete a task as efficiently as possible, so you need libraries such as Tensorflow, Numpy, Sklearn and others

lapis sequoia Feb 1, 2023, 6:50 PM

#

got it

#

ik the basics but im like saying corrupted

#

should i relearn

sick fern Feb 1, 2023, 6:51 PM

#

sick fern Hey guys, I really need help with a convolutional neural network I'm trying to w...

^ Anyways, if someone can help me with this problem, that'd be really helpful. Thanks again

sick fern Feb 1, 2023, 6:52 PM

#

lapis sequoia ik the basics but im like saying corrupted

If you don't understand it too well, and think you still need some relearning, do this crash course: https://www.youtube.com/watch?v=_uQrJ0TkZlc

YouTube

Programming with Mosh

Python Tutorial - Python Full Course for Beginners

Python tutorial - Python full course for beginners - Go from Zero to Hero with Python (includes machine learning & web development projects).

🔥 Want to master Python? Get my Python mastery course: http://bit.ly/35BLHHP
👍 Subscribe for more Python tutorials like this: https://goo.gl/6PYaGF

👉 Watch the new edition: https://youtu.be/kqtD5dpn9C8

...


#

After that, do these: https://www.youtube.com/watch?v=9JUAPgtkKpI ; https://www.youtube.com/watch?v=vmEHCJofslg

YouTube

Patrick Loeber

NumPy Crash Course - Complete Tutorial

Get my Free NumPy Handbook:
https://www.python-engineer.com/numpybook

Learn NumPy in this complete 60 minutes Crash Course! I show you all the essential functions of NumPy, and some tricks and useful methods. NumPy is the core library for scientific computing in Python. It is essential for any data science or machine learning algorithms.


YouTube

Keith Galli

Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel fi...

Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith

Data & code used in this Tutorial: https://github.com/KeithGalli/pandas
Python Pandas Documentation: http://pandas.pydata.org/pandas-docs/stable/

Let me know if you have any questions!

In this video we walk through many of the ...


#

And then, I'd suggest you do an ML course to get the hang of tensorflow. I did a course called Machine Learning for Everybody and it was a 3 hour course which helped me a lot

pseudo aspen Feb 1, 2023, 6:53 PM

#

how can you find the index of a specific tensor in a list of tensors?

lapis sequoia Feb 1, 2023, 6:55 PM

#

difference between ml and data science

agile cobalt Feb 1, 2023, 7:01 PM

#

lapis sequoia difference between ml and data science

machine learning is about creating, evaluating, deploying and maintaining models, ranging from some traditional methods like decision trees to neural networks

data science covers almost everything that regards working with data - collection, transformation, analysis, storage (to a lesser degree), presentation etc

young terrace Feb 1, 2023, 7:19 PM

#

young terrace what's everyone's opinion on Data camp?

vicksyPeek

#

is it worth it? it appears to be on a small sale and i was considering it

sick fern Feb 1, 2023, 7:33 PM

#

Any people experienced in ML who I can DM for help?

serene scaffold Feb 1, 2023, 7:45 PM

#

sick fern Any people experienced in ML who I can DM for help?

that's not really how we do things here. you should always just put your question in the chat. ML is a large area, and no one is a universal expert.

snow thicket Feb 1, 2023, 7:57 PM

#

Hi everyone,

I'm trying to use MONAI TorchVisionFCModel to include inception_v3.
I'm getting error
Calculated padded input size per channel: (1 x 1). Kernel size: (5 x 5). Kernel size can't be greater than actual input size
looks like i'm trying to run input size (1x1) with larger kernel size (5x5). I'm not sure what I should add to make changes to my input size.

sick fern Feb 1, 2023, 8:45 PM

#

Hey guys, I have this code:

#MNIST
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()


#Normalization
train_images = train_images.reshape((60000,28,28,1))
train_images = train_images.astype('float32')/255

test_images = test_images.reshape((10000,28,28,1))
test_images = test_images.astype('float32')/255

#Labels --> Numerical Values
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

#Model
NN_Model = models.Sequential()
NN_Model.add(layers.Flatten(input_shape = (28,28,1)))
NN_Model.add(layers.Dense(128, activation='relu'))
NN_Model.add(layers.Dense(128, activation='relu'))
NN_Model.add(layers.Dense(10, activation='softmax'))

#Compilation, Fitting
NN_Model.compile(optimizer = "Adam" , metrics = ["accuracy"] , loss = "categorical_crossentropy")
NN_Model.fit(train_images, train_labels, epochs=1)

#Prediction, Display
Prediction = NN_Model.predict(test_images[0])
print(np.argmax(Prediction), test_labels[0])

#

but it's giving this error:


    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 248, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer "sequential_23" (type Sequential).
    
    Input 0 of layer "dense_57" is incompatible with the layer: expected axis -1 of input shape to have value 784, but received input with shape (None, 28)
    
    Call arguments received by layer "sequential_23" (type Sequential):
      • inputs=tf.Tensor(shape=(None, 28, 1), dtype=float32)
      • training=False
      • mask=None

#

Can someone please help me? I don't know why it's not working.

weary rock Feb 1, 2023, 8:54 PM

#

what i see from the error is output from flatten layer is not compatible with the input to dense layer

#

check that out

serene scaffold Feb 1, 2023, 9:03 PM

#

@sick fern someone answered you ^

sick fern Feb 1, 2023, 9:05 PM

#

weary rock what i see from the error is output from flatten layer is not compatible with th...

Yeah, I just figured out the problem. I didn't keep the batch size in my input so I got an error. Thanks a lot
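
The missing batch dimension can be illustrated with NumPy alone. In this sketch `test_images` is a zero-filled stand-in for the MNIST test array (an assumption, so the example runs without Keras):

```python
import numpy as np

# Stand-in for the reshaped MNIST test set: (samples, height, width, channels).
test_images = np.zeros((10, 28, 28, 1), dtype="float32")

single = test_images[0]                         # indexing drops the batch axis
batch_slice = test_images[0:1]                  # slicing keeps it
batch_expand = np.expand_dims(single, axis=0)   # or add it back explicitly

print(single.shape)        # (28, 28, 1)
print(batch_slice.shape)   # (1, 28, 28, 1)
print(batch_expand.shape)  # (1, 28, 28, 1)

# With the model from the message above, the call would then be:
# Prediction = NN_Model.predict(test_images[0:1])
```

Keras treats the first axis of the input as the batch, so a lone (28, 28, 1) image is read as 28 samples of shape (28, 1). That matches the `(None, 28, 1)` shape reported in the traceback.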

weary rock Feb 1, 2023, 9:07 PM

#

sick fern Yeah, I just figured out the problem. I didn't keep the batch size in my input s...

Np

weary vigil Feb 1, 2023, 10:21 PM

#

Could someone with a good understanding of Pandas please help me:

#

https://discord.com/channels/267624335836053506/1070466173738696774

steady basalt Feb 1, 2023, 11:20 PM

#

is it worth learning how to build generative models or language models due to their increasing popularity? every interview I had thesedays asked me about chatgpt for some dumb reason

woven coral Feb 1, 2023, 11:23 PM

#

https://www.kaggle.com/code/sadikaljarif/create-a-convolutional-neural-network-cifar10

Create a Convolutional Neural Network_(cifar10)

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

fiery dust Feb 1, 2023, 11:41 PM

#

I dont want to spam the channel with a huge message, so if someone's willing to read it, plz ping me and I'll post the message its a somewhat big question since I give context and some more stuff 🙂

tranquil sand Feb 1, 2023, 11:53 PM

#

guys, i'm reading "deep learning with python" by chollet, what should i do after i finish it?

slender sand Feb 2, 2023, 12:00 AM

#

Can anyone recommend a good Udemy (or other) course for someone who wants to learn how to perform various business data analysis tasks (such as forecasting) and already has an intermediate understanding of Python and SQL? I only have time to take probably 10-15 hours of classes before Monday and I don't want to pick a lemon, been burned on bad classes before

serene scaffold Feb 2, 2023, 12:21 AM

#

tranquil sand guys, i'm reading "deep learning with python" by chollet, what should i do afte...

what is your goal?

tranquil sand Feb 2, 2023, 12:31 AM

#

serene scaffold what is your goal?

become good at NLP

#

and possibly work at deepmind or openAI

fiery dust Feb 2, 2023, 12:54 AM

#

Guys I think I'm investigating a lot on where to study from, what's the best course and bla bla bla. I want to start somewhere. I thought FreeCodeCamp PyTorch course would be ok. But I dont know yet, where should I learn from? Also it seems like PyTorch can't do everything, we also need SciKit, SciPy and more things to work with different AI/ML models.

#

I just want to start I'm just in a loop of investigating where to learn from but I never start

#

btw I like learning with videos and, for the moment im just interested in a predicting values. for example ive a list or values that when plugged in a function it returns a percentage, so I test different lists and then based on the percentage value, and giving to the model the list with the values tested, which are possible better lists ( better = higher percentage)

serene scaffold Feb 2, 2023, 12:55 AM

#

tranquil sand become good at NLP

where are you now in your education?

tranquil sand Feb 2, 2023, 12:56 AM

#

serene scaffold where are you now in your education?

high school

#

i'm gonna start to see calculus this year

serene scaffold Feb 2, 2023, 12:57 AM

#

tranquil sand i'm gonna start to see calculus this year

that's good. be sure to learn as much from that course as you can.
keep in mind that if you want to work for a preeminent AI company, you will need a masters or PhD. Is that something you're willing to consider?

tranquil sand Feb 2, 2023, 12:58 AM

#

yep

serene scaffold Feb 2, 2023, 1:00 AM

#

tranquil sand become good at NLP

Speech and text are sequences of symbols, so if you've learned the basics of neural networks (like feed forward ones), you should start looking into ones that involve sequences. Like LSTMs.

tranquil sand Feb 2, 2023, 1:01 AM

#

serene scaffold Speech and text are sequences of symbols, so if you've learned the basics of neu...

the book explains almost everything, including transformers

#

but it just makes an overview

serene scaffold Feb 2, 2023, 1:02 AM

#

tranquil sand the book explains almost everything, including transformers

you probably won't understand most of the material after reading the book once. especially if you haven't started calculus.

#

which is fine

tranquil sand Feb 2, 2023, 1:02 AM

#

serene scaffold you probably won't understand most of the material after reading the book once. ...

the book doesn't use mathematical notation and explains the key mathematical concepts

#

https://www.amazon.com/-/es/Francois-Chollet/dp/1617296864/ref=d_pd_vtp_sccl_2_2/141-7916779-7454657?pd_rd_w=BJqOo&content-id=amzn1.sym.8e065679-52e9-4d16-ae63-fa3d08b93cef&pf_rd_p=8e065679-52e9-4d16-ae63-fa3d08b93cef&pf_rd_r=BNA35KD98JHWWNGWMYNR&pd_rd_wg=eKmxc&pd_rd_r=eae08e33-8ec6-4690-92cc-fd3dc07f2b10&pd_rd_i=1617296864&psc=1

Deep Learning with Python, Second Edition

#

it's from the creator of keras

serene scaffold Feb 2, 2023, 1:03 AM

#

well, my roadmap for you is to implement something that involves a feed-forward neural network (even if it isn't NLP related), and then do something that involves an LSTM.

hasty mountain Feb 2, 2023, 1:08 AM

#

serene scaffold well, my roadmap for you is to implement something that involves a feed-forward ...

Tell them to also dive deep into Transformer

#

py_guido hyperlemon

supple terrace Feb 2, 2023, 1:52 AM

#

import requests
import json

# Replace with your own OpenAI API credentials
openai_api_key = "your_openai_api_key"
openai_model = "your_openai_model"

# Replace with your own Zendesk API credentials
zendesk_subdomain = "your_zendesk_subdomain"
zendesk_username = "your_zendesk_email"
zendesk_token = "your_zendesk_token"

# Function to generate response from OpenAI API
def generate_response(text):
    headers = {
        "Authorization": f"Bearer {openai_api_key}",
        "Content-Type": "application/json",
    }
    data = {
        "model": openai_model,
        "prompt": text,
    }
    response = requests.post("https://api.openai.com/v1/engines/davinci/completions", headers=headers, data=json.dumps(data))
    response_json = response.json()
    return response_json["choices"][0]["text"]

# Function to retrieve previous tickets from Zendesk API
def get_previous_tickets():
    headers = {
        "Content-Type": "application/json",
    }
    response = requests.get(f"https://{zendesk_subdomain}.zendesk.com/api/v2/tickets.json", auth=(zendesk_username, zendesk_token), headers=headers)
    response_json = response.json()
    return response_json["tickets"]

# Main code
previous_tickets = get_previous_tickets()
for ticket in previous_tickets:
    text = ticket["description"]
    response = generate_response(text)
    print(f"Generated response for ticket {ticket['id']}: {response}")

#

does this look right? im trying to use openai to read past tickets in zendesk to answer new tickets

fiery dust Feb 2, 2023, 2:24 AM

#

fiery dust Guys I think I'm investigating a lot on where to study from, what's the best cou...

someone? 🤔

weary vigil Feb 2, 2023, 3:16 AM

#

Someone who knows pandas well - please, please help: https://discord.com/channels/267624335836053506/1070542237227823174

strong sedge Feb 2, 2023, 4:53 AM

#

I have some data where regular forecasting algos are not applicable (data is not at all periodic, it's all over the place). Is there a way to do forecasting with a collaborative recommendation system?
Making a recommendation system was part of the assignment.
I am just looking for articles that goes over this process, I wasn't able to find something like this.
(Not looking for code, just a high level explanation will suffice)

hoary wigeon Feb 2, 2023, 5:21 AM

#

Hi, this is the input_dataset of the time_series problem that I'm working on..

#

If anyone can help me in building a time series model using the above data.. that would be great.

#

I want to automate the value selection of p,d,q

strong sedge Feb 2, 2023, 5:30 AM

#

hoary wigeon If anyone can help me in building a time series model using the above data.. tha...

From my knowledge there is no automating this

#

Take a look at this tho
https://analyticsindiamag.com/quick-way-to-find-p-d-and-q-values-for-arima/

Analytics India Magazine

Yugesh Verma

Quick way to find p, d and q values for ARIMA

finding the values of p, d, and q parameters is one of the major tasks to perform while modelling time series with ARIMA models.

mint palm Feb 2, 2023, 6:47 AM

#

can you do zero shot learning when labels are like orange, banana, papaya, etc. (i mean they arent descriptive but specific)

considering i am doing zero shot text2video retrieval and training included examples of
running,
jumping,
catching,
kicking

(one word only)

now on testing if i insert two new class:
skipping
swimming

how will my model even with learnt embedding know the difference in "skipping" and "swimming"?

In descriptive text i do understand model might learn from auxiliary words too which are kind of 'describing' the activity and might help understand, decipher when new text is given because it might have descriptive auxiliary word too.

hoary wigeon Feb 2, 2023, 7:47 AM

#

strong sedge From my knowledge there is no automating this

Hey there

#

#

How can I reduce the gap between actual and forecast?

fiery adder Feb 2, 2023, 8:40 AM

#

https://www.kaggle.com/competitions/ml-olympiad-multilingual-spell-correction/

ML Olympiad - Multilingual Spell Correction

Reconstruct noisy sentences in European languages: English, French, German, Bulgarian and Turkish

celest vine Feb 2, 2023, 8:53 AM

#

I don't understand which columns need scaling and which needs one hot encoding

#

I am aware numerical columns need scaling and categorical need one hot encoding

#

But a column with 0 and 1 (boolean values) considered numerical or categorical?

mild dirge Feb 2, 2023, 9:40 AM

#

If it is either 0 or 1, then it's probably categorical

#

Either belongs to class A, or class B

#

In that case you also don't need to 1-hot encode it

#

@celest vine

celest vine Feb 2, 2023, 10:06 AM

#

mild dirge In that case you also don't need to 1-hot encode it

Okay, got it

#

Also, how big should be the difference between two numerical columns to perform scaling?
Suppose column A has a highest numerical value of 5 and column B has a highest numerical value of 50. Do these require scaling? @mild dirge

mild dirge Feb 2, 2023, 10:08 AM

#

always

#

Just always normalize the data

#

make sure the values are between 0 and 1. Or if the data is approx. normal, standardize it
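
The two options mentioned above can be sketched with NumPy. The data is hypothetical and follows the question's example: column A tops out at 5, column B at 50.

```python
import numpy as np

# Hypothetical data: one row per sample, columns A and B.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 35.0],
    [5.0, 50.0],
])

# Min-max normalization: each column is rescaled to the range [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

# Standardization: each column gets zero mean and unit standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_std)
```

In practice the min/max or mean/std should be computed on the training split only and then reused for validation and test data. scikit-learn's `MinMaxScaler` and `StandardScaler` wrap the same arithmetic with a fit/transform interface.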

warm pike Feb 2, 2023, 10:14 AM

#

Hello, I am using Alteryx to load data in from a private website but when beginning the API calls using download tool, it says no data received.

displays:
HTTP/1.1 400 BAD REQUEST
Date: Thu, 02 Feb 2023 10:06:26 GMT
Content-Type: text/plain
Content-Length: 0
Connection: keep-alive

celest vine Feb 2, 2023, 10:22 AM

#

mild dirge Just always normalize the data

Okay, so normalizing the data is different from scaling it?

#

Also, should I always normalize all numerical columns?

mild dirge Feb 2, 2023, 10:22 AM

#

normalizing is scaling all values to be between 0 and 1

#

And you generally want to do that for all columns

#

Such that the model will not be biased towards columns with higher values f.e.

celest vine Feb 2, 2023, 10:24 AM

#

Got it. Thanks

hot slate Feb 2, 2023, 10:50 AM

#

Hi guys, I have a question about memory optimization in parallel training.

I have a dataset D, and I want to train a model M that has several parameters that I want to modify. When I run the training in a parallel manner with multiprocessing (using multiprocessing.pool), my machine often runs out of memory. Because the dataset is not large enough to overwhelm my machine, I believe that each process makes a copy of the dataset in its own memory zone. So the dataset is loaded multiple times into memory, overwhelming my machine.

My current naive solution is that I create a class to store the dataset as an internal property self.dataset. My training procedure reads the dataset from self.dataset instead of from a function parameter. However, it doesn't seem to work.

Is there any way that I can load the dataset onto the memory once only, and have all the training procedures in all processes read from that?

young granite Feb 2, 2023, 11:10 AM

#

Hey guys is there a way to adjust height and width of the legend box in plotly?
If i use yanchor="bottom" and xanchor="center" the legend box (in horizontal mode) is way too high.
Thanks in advance!

mild dirge Feb 2, 2023, 11:30 AM

#

hot slate Hi guys, I have a question about memory optimization in parallel training. I ha...

So if you use pool, I assume you use pool.map? In that case a copy of each chunk of the dataset is made and sent to the child process iirc

#

That means if you have enough memory to load your dataset twice, there is no issue

hot slate Feb 2, 2023, 11:31 AM

#

Yes, I'm using pool.map and there are roughly 30 child processes, which will load my dataset 30 times. That's why I'm looking for a solution for this issue

mild dirge Feb 2, 2023, 11:32 AM

#

hot slate Yes, I'm using `pool.map` and there are roughly 30 child processes, which will l...

Do you get the same error with around 4* child processes f.e.?

#

I don't think it makes 30 copies of the entire dataset

#

As a side note, how do you parallelize the model training by splitting up the data? Training the model is normally sequential. How do you combine the resulting partially trained models?

hot slate Feb 2, 2023, 11:37 AM

#

As I initialize the Pool object without any argument, I believe the number of worker processes will be equal to the number of CPUs. So when I run the code, I can see that all of my 32 CPUs are utilized. However, when the memory runs out, some processes are killed.

hot slate Feb 2, 2023, 11:38 AM

#

mild dirge As a side note, how do you parallelize the model training by splitting up the da...

No no, actually I train different models (using the same algorithm but with different parameters) on the same dataset

mild dirge Feb 2, 2023, 11:39 AM

#

Big models?

hollow citrus Feb 2, 2023, 11:40 AM

#

If I train my model on multiple datasets without making changes or recompiling the nn does the nn retain information or does it get replaced?

mild dirge Feb 2, 2023, 11:40 AM

#

hollow citrus If I train my model on multiple datasets without making changes or recompiling t...

If you use .fit() on the same model, it will retain previous weights

vestal lagoon Feb 2, 2023, 11:40 AM

#

best library to learn machine learning anyone help

hollow citrus Feb 2, 2023, 11:41 AM

#

mild dirge If you use `.fit()` on the same model, it will retain previous weights

Thanks just wanted to make sure!

hollow citrus Feb 2, 2023, 11:41 AM

#

vestal lagoon best library to learn machine learning anyone help

do you mean best library to learn FOR ml or to learn ml?

vestal lagoon Feb 2, 2023, 11:41 AM

#

hollow citrus do you mean best library to learn FOR ml or to learn ml?

yeah in python i am confused so

hollow citrus Feb 2, 2023, 11:42 AM

#

I would say initially pandas, numpy, matplotlib, then later on scikit-learn, or xgboost, etc.

mild dirge Feb 2, 2023, 11:42 AM

#

vestal lagoon best library to learn machine learning anyone help

You don't really learn ml from a library. You implement it with a library, for learning and understanding ml you want to get some courses or read a book on the topic.

hollow citrus Feb 2, 2023, 11:43 AM

#

mild dirge You don't really learn ml from a library. You implement it with a library, for l...

I think he means FOR ml. Atleast I assume thats the case.

vestal lagoon Feb 2, 2023, 11:43 AM

#

hollow citrus I think he means FOR ml. Atleast I assume thats the case.

yeah i am thinking of using pandas

#

thanks gotta watch some documentation and tutorials then

hollow citrus Feb 2, 2023, 11:44 AM

#

no problem!

hot slate Feb 2, 2023, 11:57 AM

#

mild dirge Big models?

Not really, I'm testing a modified decision tree

mild dirge Feb 2, 2023, 12:00 PM

#

Could you try simply replacing map with imap ?

#

This returns an iterator instead of a copy of the chunk from what it seems

#

I've also read a post on SO, and some guy explains that pool.map uses copy-on-write, so unless you modify the data itself, it shouldn't make a copy for every child process.

mild dirge Feb 2, 2023, 12:04 PM

#

mild dirge Could you try simply replacing map with `imap` ?

And maybe you could achieve the same with manually providing just an iterator for the dataframe for each process

supple terrace Feb 2, 2023, 12:51 PM

#

import requests
import json

# Replace with your own OpenAI API credentials

openai_api_key = "your_openai_api_key"
openai_model = "your_openai_model"

# Replace with your own Zendesk API credentials

zendesk_subdomain = "your_zendesk_subdomain"
zendesk_username = "your_zendesk_email"
zendesk_token = "your_zendesk_token"

# Function to generate response from OpenAI API

def generate_response(text):
    headers = {
        "Authorization": f"Bearer {openai_api_key}",
        "Content-Type": "application/json",
    }
    data = {
        "model": openai_model,
        "prompt": text,
    }
    response = requests.post("https://api.openai.com/v1/engines/davinci/completions", headers=headers, data=json.dumps(data))
    response_json = response.json()
    return response_json["choices"][0]["text"]

# Function to retrieve previous tickets from Zendesk API

def get_previous_tickets():
    headers = {
        "Content-Type": "application/json",
    }
    response = requests.get(f"https://{zendesk_subdomain}.zendesk.com/api/v2/tickets.json", auth=(zendesk_username, zendesk_token), headers=headers)
    response_json = response.json()
    return response_json["tickets"]

# Main code

previous_tickets = get_previous_tickets()
for ticket in previous_tickets:
    text = ticket["description"]
    response = generate_response(text)
    print(f"Generated response for ticket {ticket['id']}: {response}")

does this look right? im trying to use openai to read past tickets in zendesk to answer new tickets

fallow frost Feb 2, 2023, 12:52 PM

#

pls someone show him how to format the code in Discord

mild dirge Feb 2, 2023, 12:55 PM

#

!code

arctic wedgeBOT Feb 2, 2023, 12:55 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

hollow citrus Feb 2, 2023, 1:05 PM

#

Is it possible to get my code reviewed here? I have a bit of confusion in NLP with GloVe.

supple terrace Feb 2, 2023, 1:07 PM

#

!code

hollow citrus Feb 2, 2023, 1:07 PM

#

can I just share repo link?

#

https://github.com/HeavenEvolved/MachineLearning/blob/main/TensorFlow/NaturalLanguageProcessing/SentimentAnalysis_AmazonReviews/sentiment_analysis_GloVe.ipynb

#

it keeps doing random guesses at the sentiments

#

so 50% train acc and ~60% train loss

hot slate Feb 2, 2023, 2:01 PM

#

mild dirge Could you try simply replacing map with `imap` ?

I'll take a look at that, thanks for your help

neon obsidian Feb 2, 2023, 2:07 PM

#

Hello, I am trying to convert a txt file into a csv file, using pandas, but having some trouble

#

Does anyone know much about writing csv files? I have more detail in #1070706066523951156 , or you could ask me here ... thanks in advance 🌞

tacit galleon Feb 2, 2023, 2:35 PM

#

Hi everyone.
I'm working on a university project for image recognition. So the idea is if a take a photo of a building I can locate the building, so I want to train a network to recognize the buildings from my university and I was testing mobileNet (https://keras.io/api/applications/mobilenet/) but I think I'm not doing it right. The learning curves are weird and I don't know what can I do to improve my model, so I'm open to suggestions if anyone can help me

#

This are the curves from the training process

hardy kernel Feb 2, 2023, 3:22 PM

#

Sorry for interrupting the ongoing conversation but what's the most popular/intuitive NLP library? Pypi.org doesn't let me sort based on popularity

hard rover Feb 2, 2023, 3:37 PM

#

df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=True)
is this right? it's not doing what I want to do

#

so , I have one column called STRIKE , and I want rows which have values between 17000-19000 but in return it gives me a row which has STRIKE = 20300

#

some stupid logical error IG

fiery dust Feb 2, 2023, 3:55 PM

#

does someone find this video good? https://www.youtube.com/watch?v=pqNCD_5r0IU, the guy just codes and literally narrates what he is coding, he never explains the functionality of each function or even why he is doing each thing. I'm finding it really hard to learn scikit like this. What can you guys recommend me? Thanks

hollow citrus Feb 2, 2023, 4:14 PM

#

hard rover df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=Tru...

I think you missed a ] after the .index?

upbeat dagger Feb 2, 2023, 4:48 PM

#

Anyone know of a good tool to use to for speech recognition? I.E. A model that can tell that it's a specific person's voice?

celest vine Feb 2, 2023, 5:20 PM

#

hard rover df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=Tru...

Working on nifty options data?

hard rover Feb 2, 2023, 5:41 PM

#

hollow citrus I think you missed a ] after the .index?

Then it should give me an error

hard rover Feb 2, 2023, 5:41 PM

#

celest vine Working on nifty options data?

Yes

hollow citrus Feb 2, 2023, 5:42 PM

#

hard rover Then it should give me an error

No, you have an ending ] but in the wrong position, so it's giving a result, but probably not what you expect.

hard rover Feb 2, 2023, 5:45 PM

#

hollow citrus No, you have an ending ] but in the wrong position so its giving a result, but p...

i closed every open brackets ...

coral trout Feb 2, 2023, 5:47 PM

#

what do i type to get between the numbers
like
if the temperature is between 20-30 i want it to say "its sunny"
between 10-20 "its a nice day"
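
A minimal sketch of a range check like this, using Python's chained comparisons. The function name `describe_weather` and the fallback message are hypothetical, and since 20 appears in both ranges in the question, the sketch assumes it counts as sunny.

```python
def describe_weather(temperature):
    """Return a message for the temperature range the value falls in."""
    if 20 <= temperature <= 30:
        return "its sunny"
    if 10 <= temperature < 20:
        return "its a nice day"
    # Assumption: anything outside 10-30 gets a generic message
    return "no message for this temperature"


print(describe_weather(25))
print(describe_weather(15))
```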

hard rover Feb 2, 2023, 5:50 PM

#

hollow citrus No, you have an ending ] but in the wrong position so its giving a result, but p...

see

#

there is some logical error and I can't figure it out, maybe im using drop wrong, I can't tell

charred light Feb 2, 2023, 6:12 PM

#

hard rover see

You have an extra bracket after index.

hard rover Feb 2, 2023, 6:17 PM

#

charred light You have an extra bracket after index.

i know, @hollow citrus wanted me to use extra I told him, no actually aren't supposed to.

hard rover Feb 2, 2023, 6:24 PM

#

hard rover df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=Tru...

this is my main problem

charred light Feb 2, 2023, 7:08 PM

#

hard rover this is my main problem

Did you make sure 'STRIKE' field is an int first?

hard rover Feb 2, 2023, 7:09 PM

#

charred light Did you make sure 'STRIKE' field is an int first?

damn

#

i will try again

charred light Feb 2, 2023, 7:11 PM

#

hard rover this is my main problem

Also this would work too. In this case, I'm assuming between includes the edges. You can adjust if you don't want the edges.
df = df[(df['STRIKE'] >= 17000) & (df['STRIKE'] <= 19000)]

#

Although above statement would require you to rerun your original code where you read in your dataset.
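
A small sketch of why the original `drop` removed nothing, on made-up STRIKE values (the data here is hypothetical). A value cannot be both below 17000 and above 19000, so the `&` mask selects no rows. Selecting rows outside the range needs `|`, and keeping rows inside it can be written with `Series.between`, which includes both edges by default. The `pd.to_numeric` call covers the case where STRIKE was read in as text.

```python
import pandas as pd

# Hypothetical data; STRIKE starts out as strings, as scraped values often do
df = pd.DataFrame({"STRIKE": ["16900", "17000", "18500", "19000", "20300"]})
df["STRIKE"] = pd.to_numeric(df["STRIKE"])

# "< 17000 AND > 19000" can never be true, so this selects no rows
nothing = df[(df["STRIKE"] < 17000) & (df["STRIKE"] > 19000)]

# Rows outside the range need OR
outside = df[(df["STRIKE"] < 17000) | (df["STRIKE"] > 19000)]

# Rows inside the range, edges included
inside = df[df["STRIKE"].between(17000, 19000)]

print(len(nothing))
print(outside["STRIKE"].tolist())
print(inside["STRIKE"].tolist())
```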

hard rover Feb 2, 2023, 7:19 PM

#

charred light Also this would work too. In this case, I'm assuming between includes the edges....

thanks man

proud solstice Feb 2, 2023, 7:23 PM

#

So I just learned python like two weeks and I'm trying to evoke Artificial intelligence as far as I know
Basically I'm trying to make a code that understands a topic or a scripted text, like I'm making a bot understand what's being said to it

For that.....
I made a list of adjectives as they usually add nothing to the conversation's theme in English and then I made the code remove whatever adjective comes on its way, good

Then I'll probably make another list with keywords which usually indicate a certain topic like "the" "is" "on"......... etc.

And I'll make a greater list with all the weird words that indicate a certain topic to determine what's the text input is about, how'd you rate this stupid idea as a star?

proud solstice Feb 2, 2023, 7:24 PM

#

proud solstice So I just learned python like two weeks and I'm trying to evoke Artificial intel...

This was my first written code, I'm still learning so yes I probably easier ways to do this

#

Why I imported random? I really don't remember lol

umbral charm Feb 2, 2023, 8:27 PM

#

In the library matplotlib, is there a way to make the y axis at 0 instead of at the start, just so it looks nice when my graph is symettry across x = 0

charred light Feb 2, 2023, 8:28 PM

#

umbral charm In the library matplotlib, is there a way to make the y axis at 0 instead of at ...

Yes. https://stackoverflow.com/questions/22642511/change-y-range-to-start-from-0-with-matplotlib

#

The plt.ylim

umbral charm Feb 2, 2023, 8:30 PM

#

charred light Yes. <https://stackoverflow.com/questions/22642511/change-y-range-to-start-from-...

Was looking for somethig like this https://gyazo.com/a47aeedc2725d993fa5bba8576cd38d5

#

Like i moved the spine from the right to the center

#

but how would i get ti to display exactly at 0

charred light Feb 2, 2023, 8:31 PM

#

wtf

#

!paste

arctic wedgeBOT Feb 2, 2023, 8:32 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

charred light Feb 2, 2023, 8:32 PM

#

umbral charm Was looking for somethig like this https://gyazo.com/a47aeedc2725d993fa5bba8576c...

^ Paste the relevant code above for the plot.

umbral charm Feb 2, 2023, 8:32 PM

#

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()

#

swear that use to colour code it

#

!paste fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()

arctic wedgeBOT Feb 2, 2023, 8:33 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

charred light Feb 2, 2023, 8:33 PM

#

That's fine. You need to add {py} after the ```

umbral charm Feb 2, 2023, 8:33 PM

#

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()

#

💀

#

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.spines['left'].set_position('center')
ax.spines['top'].set_color(None)
ax.spines['right'].set_color(None)
plt.plot(Distnace-11.5, Tesla)
plt.show()

charred light Feb 2, 2023, 8:34 PM

#

import numpy as np

#

oh, you need the dot

umbral charm Feb 2, 2023, 8:34 PM

#

Oh do u want the full code?

charred light Feb 2, 2023, 8:34 PM

#

No, that's fine

umbral charm Feb 2, 2023, 8:34 PM

#

Ive just sent the plot the other is just functions imports and unpacking

umbral charm Feb 2, 2023, 8:36 PM

#

charred light No, that's fine

LMAO ive did it

charred light Feb 2, 2023, 8:36 PM

#

It shows up fine to me.

umbral charm Feb 2, 2023, 8:37 PM

#

ax.spines['left'].set_position('zero') instead of 'center'

umbral charm Feb 2, 2023, 8:37 PM

#

charred light It shows up fine to me.

The axis is a bit to the right on the pic

charred light Feb 2, 2023, 8:39 PM

#

umbral charm The axis is a bit to the right on the pic

Yea, it's once you add data. Since center is auto adjusted.

umbral charm Feb 2, 2023, 8:39 PM

#

charred light Yea, it's once you add data. Since center is auto adjusted.

mhm yea

#

i just put 'zero' in it but i didnt even know i could
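
A short sketch of the difference, on made-up data that is not centred on zero (the arrays below are hypothetical stand-ins for the real measurements). `'center'` puts the spine in the middle of the axes box, wherever the data lies, while `'zero'` puts it at data coordinate 0. The sketch hides the unused spines with `set_visible(False)`, which is a swapped-in alternative to the `set_color(None)` calls in the pasted code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: x runs from -5 to 20, so 0 is not in the middle of the plot
x = np.linspace(-5, 20, 200)
y = np.exp(-x**2 / 10)

fig, ax = plt.subplots()
ax.spines["left"].set_position("zero")   # y axis drawn at x = 0 in data coordinates
ax.spines["top"].set_visible(False)      # hide the unused spines
ax.spines["right"].set_visible(False)
ax.plot(x, y)
```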

charred light Feb 2, 2023, 8:39 PM

#

dead

umbral charm Feb 2, 2023, 8:40 PM

#

i cant find the docs on that set_positon

charred light Feb 2, 2023, 8:40 PM

#

#ReadTheDocs

umbral charm Feb 2, 2023, 8:40 PM

#

i did

#

it said nothing useful

#

https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.set_position.html

charred light Feb 2, 2023, 8:40 PM

#

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

#

https://matplotlib.org/stable/api/spines_api.html

umbral charm Feb 2, 2023, 8:41 PM

#

na

#

wtf

#

THank you tho :O

charred light Feb 2, 2023, 8:42 PM

#

umbral charm wtf

EYES

charred light Feb 2, 2023, 8:42 PM

#

umbral charm THank you tho :O

Np, just make sure you're using most current docs.

plush jungle Feb 2, 2023, 9:40 PM

#

I'm training a q learning reinforcement agent and I'm trying to start really simple, so the game is just a stationary agent (blue) trying to aim a laser at a stationary target (red), by rotating clockwise or counterclockwise

#

#

it gets a reward whenever the laser is on the target and gets a slight negative reward whenever it's not on the target

#

the state space is {0,360}, with each state being a scalar representing a different angle the laser could be at

#

in theory, the q learning algorithm should generate a q value for each angle, learning to highly rate the couple of angles that are on target and rate all the other ones with a low q value

#

the problem is that all the q values are changing together for some reason

#

#

these are visualizations of each angle's q value as predicted by the neural net

#

red means high and blue means low

#

purple means close to the center of the distribution

#

what it should look like is the angles that are pointed at the target should get more and more red and everything else should get more and more blue

hard rover Feb 2, 2023, 10:19 PM

#

charred light Feb 2, 2023, 10:58 PM

#

hard rover

If you're scraping, you should check what was scraped. Maybe that's part of the screenshot, but what's after the 0? Ref: https://stackoverflow.com/questions/16573332/jsondecodeerror-expecting-value-line-1-column-1-char-0

river sapphire Feb 2, 2023, 10:59 PM

#

plush jungle it gets a reward whenever the laser is on the target and gets a slight negative ...

don't know much python but I have done DQN before and I would say adding reward shaping will definitely help because it provides better feedback for the model

plush jungle Feb 2, 2023, 11:00 PM

#

by reward shaping do you mean making the reward continuous? so it gets higher the closer to the target the laser is?

river sapphire Feb 2, 2023, 11:00 PM

#

yes I mean making some sort of function

hard rover Feb 2, 2023, 11:00 PM

#

charred light If you're scraping, you should check what was scraped. Maybe that's part of the s...

https://discord.com/channels/267624335836053506/1070832232350097498

river sapphire Feb 2, 2023, 11:00 PM

#

doesn't have to necessarily be just continuous

#

could be a combination of continuous and discrete rewards

#

in this case what I would do is give it a reward based on some sort of formula if it chose an action that points it closer to the target and a punishment if it points away

#

and it gets more reward if it chooses an action that points it towards the target and it's already close to pointing towards the target

#

basically creates a gradient
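
One possible reading of that idea as code, as a sketch only. The function names, the 2-degree hit tolerance and the reward magnitudes are all assumptions chosen for illustration, not values from the linked repository. Moving toward the target earns a positive reward that grows as the laser gets closer, and moving away earns a small penalty.

```python
def angular_distance(a, b):
    """Smallest angle in degrees between two headings, in the range 0..180."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def shaped_reward(prev_angle, new_angle, target_angle, hit_tolerance=2.0):
    """Hypothetical shaped reward for the rotate-to-aim task."""
    prev_d = angular_distance(prev_angle, target_angle)
    new_d = angular_distance(new_angle, target_angle)
    if new_d <= hit_tolerance:
        return 1.0                        # on target: full discrete reward
    closeness = 1.0 - new_d / 180.0       # 1 near the target, 0 when opposite
    if new_d < prev_d:
        return 0.1 + 0.4 * closeness      # moved toward: more when already close
    if new_d > prev_d:
        return -0.1                       # moved away: small punishment
    return 0.0


print(shaped_reward(10, 9, 0), shaped_reward(100, 99, 0), shaped_reward(9, 10, 0))
```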

plush jungle Feb 2, 2023, 11:04 PM

#

river sapphire basically creates a gradient

but that still doesn't explain how I'm getting the results I'm getting right now. As a test, I changed the reward so that angles above 180 are 1 and angles below 180 are -1

#

and all of the q values still change together as one

river sapphire Feb 2, 2023, 11:04 PM

#

wdym by that?

#

you're using a neural network right

#

each time the neural network updates, the q-values for all states should update as well

plush jungle Feb 2, 2023, 11:05 PM

#

yes, and I've tried a bunch of different layer/neuron combinations

river sapphire Feb 2, 2023, 11:05 PM

#

unless it updates some weight in the last layer or something

#

no wait even so it should backpropagate and update a large majority of the weights
just realized I remembered how backpropagation works wrong lol it should theoretically update all the weights unless you have an issue like dying relu

#

is your issue that it's predicting the same q-value for all states?

plush jungle Feb 2, 2023, 11:06 PM

#

not the same no

river sapphire Feb 2, 2023, 11:06 PM

#

what's the issue then

plush jungle Feb 2, 2023, 11:07 PM

#

but I'm pretty sure the distribution is the same

#

I was trying to normalize the values between 0 and 1 so I could visualize the q values with color

#

and I was doing it like this

```py
def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
```

#

and all the values were the same after normalization
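
A sketch of the same min-max scaling with a guard for a flat input, assuming the q values arrive as a NumPy-compatible array. The name `normalize_data` and the choice of 0.5 for the flat case are hypothetical. Min-max scaling removes any shared offset and scale, so if every q value drifts up or down together, the normalized picture stays the same from step to step. If all values are identical, the unguarded formula divides by zero.

```python
import numpy as np


def normalize_data(data):
    """Scale values to the 0..1 range; a flat input maps to 0.5 everywhere."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    span = hi - lo
    if span == 0:
        # Assumption: treat a flat input as "middle of the colour scale"
        return np.full_like(data, 0.5)
    return (data - lo) / span


q_early = np.array([2.0, 4.0, 6.0])
q_later = q_early + 10.0  # every q value shifted by the same amount
print(normalize_data(q_early), normalize_data(q_later))
```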

river sapphire Feb 2, 2023, 11:09 PM

#

all the q-values were the same?

plush jungle Feb 2, 2023, 11:09 PM

#

yes

#

either 0 or 1

river sapphire Feb 2, 2023, 11:09 PM

#

do you have a reward graph or a loss graph

plush jungle Feb 2, 2023, 11:10 PM

#

no, but I have the numbers, I just haven't graphed them

river sapphire Feb 2, 2023, 11:10 PM

#

the graph helps a lot imo

#

can you send the code idk if I can help because this seems like a bug in the implementation but I just wanna see

plush jungle Feb 2, 2023, 11:14 PM

#

river sapphire can you send the code idk if I can help because this seems like a bug in the imp...

https://github.com/alexjbusch/top_down_game_reinforcement_learning

#

the main loop is the train() function in train.py

hard rover Feb 2, 2023, 11:15 PM

#

charred light If you're scraping, you should check what was scraped. Maybe that's part of the s...

can u help me now?

plush jungle Feb 2, 2023, 11:15 PM

#

the reward stuff is at the bottom of top_down.py

river sapphire Feb 2, 2023, 11:17 PM

#

shoot really gotta learn python lol

#

understood about 10% of it

charred light Feb 2, 2023, 11:19 PM

#

hard rover can u help me now?

Looks like you solved it?

hard rover Feb 2, 2023, 11:21 PM

#

charred light Looks like you solved it?

no, i found something weird

#

i had one, df.drop and if i remove it, i still get the same error

charred light Feb 2, 2023, 11:24 PM

#

hard rover i had one, df.drop and if i remove it, i still get the same error

Probably dropped bad rows as a side effect.

river sapphire Feb 2, 2023, 11:24 PM

#

plush jungle the reward stuff is at the bottom of top_down.py

yeah idk I don't understand python that much unfortunately
you are using a target network and the correct update equation right?
reward + gamma * max_a' Q(s', a'; theta-)
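
A sketch of how that target is typically computed for a batch, assuming the target network's predictions for the next states are already available as an array with one column per action. `td_targets` and its argument names are hypothetical and are not taken from the linked repository. The target uses the maximum next-state q value itself, and terminal transitions get only the reward.

```python
import numpy as np


def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Q-learning targets: r + gamma * max over a' of Q_target(s', a')."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)  # shape (batch, n_actions)
    dones = np.asarray(dones, dtype=bool)
    best_next = next_q_values.max(axis=1)   # value of the best next action
    return rewards + gamma * best_next * (~dones)


targets = td_targets(
    rewards=[1.0, -0.1],
    next_q_values=[[0.5, 2.0], [1.0, 0.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)
```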

hard rover Feb 2, 2023, 11:25 PM

#

charred light Probably dropped bad rows as a side effect.

drop is doing nothing

#

wanna see the code?

charred light Feb 2, 2023, 11:25 PM

#

!paste

arctic wedgeBOT Feb 2, 2023, 11:25 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hard rover Feb 2, 2023, 11:26 PM

#

https://paste.pythondiscord.com/makavaqame

charred light Feb 2, 2023, 11:27 PM

#

hard rover https://paste.pythondiscord.com/makavaqame

So where is your error actually coming from?

hard rover Feb 2, 2023, 11:28 PM

#

charred light So where is your error actually coming from?

drop is not dropping the columns i asked it to drop

#

if u open the sheet you can still see all those columns in it

charred light Feb 2, 2023, 11:29 PM

#

You mean the url?

hard rover Feb 2, 2023, 11:30 PM

#

charred light You mean the url?

I asked it to drop columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK'] but df still contains these columns

charred light Feb 2, 2023, 11:32 PM

#

hard rover I asked it to drop columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_A...

Because you need to store the dropped version somewhere.
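
A small sketch of that behaviour on a made-up frame (the column values are hypothetical). `DataFrame.drop` returns a new frame and leaves the original alone unless `inplace=True` is passed, so the result has to be assigned to something. `columns=[...]` and `[...], axis=1` give the same result.

```python
import pandas as pd

df = pd.DataFrame({"STRIKE": [17000, 18000], "c_BID": [1.0, 2.0], "p_ASK": [3.0, 4.0]})

# The result is thrown away here, so df still has every column
df.drop(columns=["c_BID", "p_ASK"])
unchanged_columns = list(df.columns)

# Keeping the returned frame is what actually removes the columns
df1 = df.drop(columns=["c_BID", "p_ASK"])
same_with_axis = df.drop(["c_BID", "p_ASK"], axis=1)

print(unchanged_columns)
print(list(df1.columns))
```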

hard rover Feb 2, 2023, 11:32 PM

#

charred light Because you need to store the dropped version somewhere.

i did in df1

#

i mean, your code doesn't have it, i just did it

charred light Feb 2, 2023, 11:33 PM

#

Paste a version w/ your changes then.

hard rover Feb 2, 2023, 11:33 PM

#

if u r comfortable hop on VC , I'll explain
totally understandable if you don't, I still appreciate it.

#

https://paste.pythondiscord.com/avezediqat

charred light Feb 2, 2023, 11:45 PM

#

hard rover https://paste.pythondiscord.com/avezediqat

I don't see any errors coming from the code itself.

Try this and let me know what prints out:

df = pd.DataFrame(l_OC)
df.columns = OC_col
print(len(df))
print(df.columns)
print(len(df.columns))
#df1=df.drop(columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK'])
df1=df.drop(['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK'], axis = 1)
print(df1.columns)
print(len(df1.columns))
marks_data = pd.DataFrame(df1)
print(len(df1))
file_name = 'OI.xlsx'
marks_data.to_excel(file_name)
print("done")

hard rover Feb 2, 2023, 11:48 PM

#

charred light I don't see any errors coming from the code itself. Try this and let me know w...

nice

#

where can I learn this? so that I don't have ask stupid questions?

#

i mean what should I search? i tried searching web scaping using python

charred light Feb 2, 2023, 11:50 PM

#

Technically, df1=df.drop(columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BID','p_ASK']) this should work, although it's not preferred method.

hard rover Feb 2, 2023, 11:50 PM

#

or just learn pandas completely , cuz I need graphical representation as well , will have to compare daily end of the day sheets

hard rover Feb 2, 2023, 11:50 PM

#

charred light Technically, `df1=df.drop(columns=['c_BID','c_ASK','c_ASK_QTY','p_BID_QTY','p_BI...

it didn't tho

charred light Feb 2, 2023, 11:52 PM

#

Yea, I'm not sure why exactly it didn't.

hard rover Feb 2, 2023, 11:53 PM

#

thanks tho, i really appreciate it, if only I could be useful to you as well

charred light Feb 2, 2023, 11:53 PM

#

hard rover i mean what should I search? i tried searching web scaping using python

In terms of debugging, web scraping would be the start of your code. (Calling Requests). Here, the problem was: "Dropping columns doesn't work in pandas".
So you could start by throwing that into google.

#

Then generally, I just click through all the stack overflow links, and skim to see if the original problem is similar to the problem at hand. If so, take a quick look at the solution to see if it'll work or it's over complex/too niche.

hard rover Feb 2, 2023, 11:55 PM

#

hard rover thanks tho, i really appreciate it, if only I could be useful to you as well

i can explain to you, what i was doing in the code, you know coding maybe you can automate it and make some bucks if you aren't already doing these stuff

hard rover Feb 2, 2023, 11:55 PM

#

charred light Then generally, I just click through all the stack overflow links, and skim to s...

ok sir, noted

charred light Feb 2, 2023, 11:56 PM

#

hard rover i can explain to you, what i was doing in the code, you know coding maybe you ca...

I'm ok, just glad to help out.

hard rover Feb 2, 2023, 11:59 PM

#

charred light I'm ok, just glad to help out.

now this should also work?
df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=True)

#

it doesn't gives the same error

charred light Feb 3, 2023, 12:01 AM

#

hard rover it doesn't gives the same error

What's the full error?

hard rover Feb 3, 2023, 12:01 AM

#

!paste

arctic wedgeBOT Feb 3, 2023, 12:01 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hard rover Feb 3, 2023, 12:01 AM

#

https://paste.pythondiscord.com/zovitosizo

charred light Feb 3, 2023, 12:02 AM

#

No, the error not the code.

hard rover Feb 3, 2023, 12:02 AM

#

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10504\1824382283.py in <module>
5 headers={'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36'}
6 response = requests.get(url, headers=headers, timeout=10)
----> 7 json_object = response.json()
8 expiry_date = '09-Feb-2023'
9 def set_decimal(x):

~\anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
973 # Catch JSON-related errors and raise as requests.JSONDecodeError
974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
976
977 @tribal fog

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

charred light Feb 3, 2023, 12:03 AM

#

hard rover it doesn't gives the same error

Ok, actually. This error & df.drop(df1[(df1['STRIKE'] <17000) & (df1['STRIKE'] > 19000)].index, inplace=True) are two separate things.

#

The JSON decodeError is basically saying you have something invalid from your scraping.

hard rover Feb 3, 2023, 12:03 AM

#

but it was scraping just fine right now

charred light Feb 3, 2023, 12:04 AM

#

hard rover but it was scraping just fine right now

Well, if you scrape too often, the website can auto-ban you.
But you should be checking the status code.

hard rover Feb 3, 2023, 12:05 AM

#

how can I know if they banned me?

#

i can access the site in my browser

charred light Feb 3, 2023, 12:05 AM

#

hard rover how can I know if they banned me?

By checking the status code. print(response.status_code) and seeing what the response is

#

More detailed list. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

hard rover Feb 3, 2023, 12:06 AM

#

401

#

unauthorized

tribal fog Feb 3, 2023, 12:07 AM

#

hard rover JSONDecodeError: Expecting value: line 1 column 1 (char 0) During handling of t...

Dont ping me cheif

hard rover Feb 3, 2023, 12:07 AM

#

tribal fog Dont ping me cheif

lol, what are the odds xD

#

sorry

tribal fog Feb 3, 2023, 12:07 AM

#

hard rover sorry

Its okay cheif i forgive you

charred light Feb 3, 2023, 12:07 AM

#

tribal fog Dont ping me cheif

Lmao, didn't even notice. Collateral damage.

#

That's actually pretty funny

hard rover Feb 3, 2023, 12:08 AM

#

yea xD

#

it says 401 for u as well

tribal fog Feb 3, 2023, 12:08 AM

#

charred light That's actually pretty funny

Thanks cheif

charred light Feb 3, 2023, 12:08 AM

#

But yea, 401 means you didn't scrape successfully. And your response is empty.

So your next line of code json_object = response.json() is complaining that your input is essentially empty.
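
A sketch of checking the status before decoding, so a failed request reports the HTTP status instead of a `JSONDecodeError`. `parse_json_response` is a hypothetical helper, and `FakeResponse` is a stand-in that mimics the three attributes used here so the example runs without a network call.

```python
def parse_json_response(response):
    """Decode JSON only when the request succeeded; otherwise say why not."""
    if response.status_code != 200:
        raise RuntimeError(
            f"HTTP {response.status_code}; start of body: {response.text[:100]!r}"
        )
    return response.json()


class FakeResponse:
    """Hypothetical stand-in for a requests response, for offline testing."""

    def __init__(self, status_code, text, payload=None):
        self.status_code = status_code
        self.text = text
        self._payload = payload

    def json(self):
        return self._payload


print(parse_json_response(FakeResponse(200, '{"ok": true}', {"ok": True})))
```

With the real library, the object returned by `requests.get(url, headers=headers, timeout=10)` can be passed in directly.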

hard rover Feb 3, 2023, 12:09 AM

#

omg, so tiring I just fixed it... it was working fine
I add one drop command and shit goes south

#

i hope i don't get 403

#

i removed that drop, now everything is working

#

is my jupyter working fine???

charred light Feb 3, 2023, 12:12 AM

#

hard rover i removed that drop, now everything is working

What? The request error shouldn't have anything to do with the drop.

hard rover Feb 3, 2023, 12:13 AM

#

charred light What? The request error shouldn't have anything to do with the drop.

apparently it does

#

see now it works fine i even get 200

charred light Feb 3, 2023, 12:13 AM

#

hard rover see now it works fine i even get 200

Paste a copy of the code with the drop statement commented out so I can see where it's located.

hard rover Feb 3, 2023, 12:14 AM

#

can you teach me, i'll pay you

hard rover Feb 3, 2023, 12:14 AM

#

charred light Paste a copy of the code with the drop statement commented out so I can see wher...

ok

#

https://paste.pythondiscord.com/rixemigeve

charred light Feb 3, 2023, 12:17 AM

#

Oh, if you're going to set the results to df2, then you can't use inplace.
df2=df.drop(df1[ (df1['STRIKE'] <17000) | (df1['STRIKE'] > 19000)].index)

#

It's either df_new = df.drop(something)
or df.drop(something, inplace=True)
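
A sketch of the two forms on made-up STRIKE values (the data is hypothetical). With `inplace=True`, `drop` modifies the frame and returns `None`, so assigning its result to a variable leaves that variable empty.

```python
import pandas as pd

df = pd.DataFrame({"STRIKE": [16900, 18000, 20300]})
outside = df[(df["STRIKE"] < 17000) | (df["STRIKE"] > 19000)].index

# Form 1: keep the returned frame, df itself is untouched
df_new = df.drop(outside)

# Form 2: modify df directly; the return value is None
result = df.drop(outside, inplace=True)

print(df_new["STRIKE"].tolist(), result, df["STRIKE"].tolist())
```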

hard rover Feb 3, 2023, 12:19 AM

#

why is it giving an error even in comment?

charred light Feb 3, 2023, 12:19 AM

#

hard rover why is it giving an error even in comment?

Because the errors were never tied. It was probably just a coincidence it worked every time you commented out.

hard rover Feb 3, 2023, 12:20 AM

#

remove that line it will work fine

#

error

#

no error

charred light Feb 3, 2023, 12:21 AM

#

Also, there's not really a need for .drop in the first place.
df2 = df1[(df1['STRIKE'] >= 17000) & (df1['STRIKE'] <= 19000)].reset_index(drop=True)

charred light Feb 3, 2023, 12:22 AM

#

hard rover no error

Run this enough times and you'll get an error

hard rover Feb 3, 2023, 12:22 AM

#

oof

#

in stead of wasting time like this, I might as well learn this stuff

#

I am not understanding anything

charred light Feb 3, 2023, 12:23 AM

#

hard rover I am not understanding anything

You have to understand what the code is actually doing.

hard rover Feb 3, 2023, 12:24 AM

#

charred light You have to understand what the code is actually doing.

exactly

#

im trying to change someone else's coding with zero knowledge of scraping & pandas

charred light Feb 3, 2023, 12:24 AM

#

Yea, I can tell lol

hard rover Feb 3, 2023, 12:25 AM

#

xD

#

I really appreciate your patience with me, if I were u I wouldn't be able to tolerate this
you are such a nice person

charred light Feb 3, 2023, 12:34 AM

#

hard rover im trying to change someone else's coding with zero knowledge of scraping & pand...

I added some very high level comments and broke down based on what you'll need to know to understand it. https://paste.pythondiscord.com/bedehuqido
The learning part is up to you.

#

But just as a warning, most websites will be able to tell that you're scraping and can ban you on the spot. Or, if you access pages too fast, they will ban you too. (Humans can't read 100 web pages in under 5 seconds, but a program can)

hard rover Feb 3, 2023, 12:37 AM

#

will definitely learn it all, meanwhile can you do me a favor??
i only want ;
'c_OI', 'c_CHNG_IN_OI', 'c_VOLUME', 'c_IV', 'c_LTP', 'c_CHNG', 'STRIKE', 'p_LTP', 'p_IV', 'p_VOLUME', 'p_CHNG_IN_OI', 'p_OI'
can you do it?

hard rover Feb 3, 2023, 12:37 AM

#

charred light But just as a warning, most websites will be able to tell that your scraping and...

i will only scrape it once in a day

charred light Feb 3, 2023, 12:38 AM

#

hard rover i will only scrape it once in a day

why don't you just download the json file and run it locally?

hard rover Feb 3, 2023, 12:39 AM

#

charred light why don't you just download the json file and run it locally?

how?

charred light Feb 3, 2023, 12:39 AM

#

There's a save option

hard rover Feb 3, 2023, 12:39 AM

#

oh

#

from the API link? im using the code

charred light Feb 3, 2023, 12:40 AM

#

Yea, it bypasses the 401 errors from the requests.

hard rover Feb 3, 2023, 12:40 AM

#

charred light Yea, it bypasses the 401 errors from the requests.

ok, will learn about it

hard rover Feb 3, 2023, 12:40 AM

#

charred light I added some very high level comments and broke down based on what you'll need t...

this gave an error
same 401

charred light Feb 3, 2023, 12:41 AM

#

Yea, I only added comments

hard rover Feb 3, 2023, 12:43 AM

#

oh I see ,thanks
will get back to you after learning pandas & lil bit of scraping or maybe getting data from local json.

charred light Feb 3, 2023, 12:43 AM

#

If you download + change the path:
https://paste.pythondiscord.com/ubisuruhuw
You won't run into the 401 issues.

#

But, if you rerun the same code again without restarting your notebook, you will run into the JSON issue you had earlier.

#

This only occurs if I rerun the cell without restarting my notebook.

hard rover Feb 3, 2023, 12:44 AM

#

too much to learn, kinda overwhelming

#

thanks for everything , it's already 6:30 AM here, lost sleep due to it
gotta go to work soon
bye

tropic matrix Feb 3, 2023, 1:50 AM

#

I'm working on different CNN model architectures, and I would like a way to visualize them to see the differences in a presentation. I like the design style with the following images (resnet50 model) found on https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d

#

is there a place that lets me create such images, or do I have to handmake them in like photoshop

lapis sequoia Feb 3, 2023, 4:14 AM

#

hello, can someone help me with a problem that i have using the scikit-learn, numpy and pandas libraries?

drowsy timber Feb 3, 2023, 7:01 AM

#

do you guys have a rule of thumb when deciding what dimensionality reduction technique to go for?

I have this dataset that's just a list of subjects in a school. Rows are students and columns are subjects, and each cell is 1 or 0 depending on whether the student is taking that course in the current semester.

celest vine Feb 3, 2023, 7:24 AM

#

hard rover Yes

From where did you get the data?

#

LabelEncoder Or one hot encoding for categorical features?
Which should I do?

sturdy breach Feb 3, 2023, 8:14 AM

#

i have a q

#

how do i print specific val from pandas?

#

i tried it but it just prints .... at the end

celest vine Feb 3, 2023, 8:20 AM

#

sturdy breach how do i print specific val from pandas?

What do you mean by specific value?

sturdy breach Feb 3, 2023, 8:21 AM

#

i selected value from df using [][] and it just prints something and in the middle of it it just stops and prints ...

#

df[df["Počet tlačítek"]>0].nlargest(1,"Počet tlačítek")["url"]

#

".to_string()" does not help

fringe bay Feb 3, 2023, 9:12 AM

#

hey guys
I have a list of gps coordinates. They are plotted in this image.
The coordinates are from the route of my place to a shop in my town.
Sometimes there are lots of points in the same relative small location, sometimes points are missing for a longer distance.
I'm looking for a way to draw spline or something similar from start point to endpoint.
Ultimately I would need to be able to split the spline into X parts, and get those coordinates.
I was looking at scipy with splrep, but I was getting an error "Error on input data", apparently the coordinates have to be sorted and unique. Not sure if that would work in my case though.
What do you guys suggest ?

wooden sail Feb 3, 2023, 9:18 AM

#

fringe bay hey guys I have a list of gps coordinates. They are plotted in this image. The c...

so, the issue with splines is exactly as you said. you specify the endpoints that each spline exactly passes through, so the points have to be sorted and unique. these are easy problems to solve tbh, but there are alternatives. one of them is to use some form of fitting function instead. however, these do not pass through the data points in general, only "close" to them. they also don't join nicely to each other, which is the main feature of splines: you specify continuity up to some degree of derivative

#

so my answer would be "pick your poison", because there is no nice way of doing this with all the properties you might want 😛

fringe bay Feb 3, 2023, 9:19 AM

#

wooden sail so my answer would be "pick your poison", because there is no nice way of doing ...

appreciate your answer @wooden sail
which one would you go with ?

wooden sail Feb 3, 2023, 9:20 AM

#

i would decide based on how important accuracy and smoothness are to you

#

if you need the curve to actually pass through the points, it'll have to be splines and you will need to preprocess the data

fringe bay Feb 3, 2023, 9:22 AM

#

I need them to be kind of accurate
I like the idea of splines, but my block is here: I have the coordinates which might be forward, backward, forward, backward, and since I need to sort them and make them unique, I don't want to mess it in those situations

#

but I assume that's what you mean by preprocessing it

wooden sail Feb 3, 2023, 9:22 AM

#

yep

#

you can sort by time stamp if you have that

#

and discard points with the same coords if they have consecutive time stamps

fringe bay Feb 3, 2023, 9:24 AM

#

if I have for eg consecutive longitude of 45.4 let's say, I have 3 consecutive ones, I keep only one

#

I would need to remove the latitude from the other list as well, from the same positions of those 2 removed from longitude

#

I'm just thinking out loud, lol, please correct me if I'm wrong

wooden sail Feb 3, 2023, 9:24 AM

#

that sounds right

#

but only if the latitudes are also repeated

fringe bay Feb 3, 2023, 9:26 AM

#

wooden sail but only if the latitudes are also repeated

if they are not repeated, and I don't remove them, that would make the lists length different, wouldn't that be an issue ?

wooden sail Feb 3, 2023, 9:26 AM

#

fringe bay if they are not repeated, and I don't remove them, that would make the lists len...

a (long, lat) pair defines a single point

#

you either remove both or neither

#

but a point is only equal to another if both lat and long are the same

fringe bay Feb 3, 2023, 9:27 AM

#

I agree

wooden sail Feb 3, 2023, 9:27 AM

#

ah, i'm dumb i see what you mean. you wanna avoid a multivalued point

#

yes, you're right

fringe bay Feb 3, 2023, 9:28 AM

#

no worries, this is the part that was unclear to me too

wooden sail Feb 3, 2023, 9:28 AM

#

so yeah, if an "x coordinate" is repeated in consecutive points, you have to pick one. keeping the first is the easiest
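
A rough sketch of that preprocessing, assuming each sample has a timestamp, a longitude and a latitude (the arrays below are invented). It sorts by time, then builds two masks: one that drops a point only when both coordinates repeat, and a stricter one that keeps the first of any run of repeated "x coordinates":

```python
import numpy as np

# Hypothetical samples: timestamp, longitude, latitude.
t = np.array([3, 1, 2, 4, 5])
lon = np.array([45.4, 45.1, 45.4, 45.4, 45.9])
lat = np.array([10.2, 10.0, 10.2, 10.3, 10.5])

# Sort all three arrays together by timestamp.
order = np.argsort(t)
t, lon, lat = t[order], lon[order], lat[order]

# Drop a point only if BOTH lon and lat equal the previous point.
keep_points = np.ones(len(t), dtype=bool)
keep_points[1:] = (lon[1:] != lon[:-1]) | (lat[1:] != lat[:-1])

# Stricter: drop a point whenever lon alone repeats (keeps the first of a run).
keep_x = np.ones(len(t), dtype=bool)
keep_x[1:] = lon[1:] != lon[:-1]

print(t[keep_points], lon[keep_points], lat[keep_points])
print(t[keep_x], lon[keep_x], lat[keep_x])
```

The same mask is applied to every array, so the lists always stay the same length.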

#

tensorflow and other tools like it (e.g. pytorch) are necessary in machine learning for a handful of reasons:

they construct a computation graph
that graph allows the computation to be optimized, making it faster and use less memory
it also allows automatic differentiation, so it can do most of the math for you
they use extremely efficient math libraries
you can use accelerating hardware without changing much of your code at all
they make parallelization over devices very easy

you can do most of this stuff by hand too, but the libs make it automatic. it's not realistic to do these things yourself for large scale models, but it's fine for small ones

fringe bay Feb 3, 2023, 9:31 AM

#

wooden sail so yeah, if an "x coordinate" is repeated in consecutive points, you have to pic...

I'll give it a shot

#

let's say this might work, this gets me to the 2nd question
once I have the spline, how can I split it into let's say 30 equal points / distances and get the points ?

wooden sail Feb 3, 2023, 9:34 AM

#

what the splines give you is the coefficients of polynomials

#

once you have something of the form f(x) = a_nx^n + ... + a_0 x^0, you can evaluate this at whatever value of x you want

fringe bay Feb 3, 2023, 9:37 AM

#

that makes sense
appreciate it @wooden sail !

steady basalt Feb 3, 2023, 9:56 AM

#

i just saw some guy build a website in 15 minutes using chat gpt.....

#

tf does this mean for a lot of coding jobs

#

barrier to entry falls, salary plummets

scarlet zealot Feb 3, 2023, 11:14 AM

#

Hello. What statistical tests can we use to find the relationship of one stock to another? For example, if I have a Stock A (say Apple) and a Stock B (say Microsoft), I’m looking to test if the movement of Apple has any impact on the movement of Microsoft.

#

I was just thinking a Pearson correlation between the change in daily / weekly or monthly prices between the 2 stocks.

eternal hull Feb 3, 2023, 11:44 AM

#

Hi. How do I learn to do eda

eternal hull Feb 3, 2023, 12:25 PM

#

I have to build a retention model can anyone help me

#

I don't know how to begin

celest vine Feb 3, 2023, 1:26 PM

#

Today I developed my first Machine learning model and got an accuracy of 90%. Am I a data scientist now?

wooden sail Feb 3, 2023, 1:27 PM

#

yes, you will receive your degree by mail in the next two weeks.

hoary wigeon Feb 3, 2023, 1:42 PM

#

Hi guys, I need help in time series..

In my problem there is a pattern which repeats every quarter. There's a peak on the first day of the quarter, and the values decrease as we move closer to the quarter end.

But every year the height of the peak keeps decreasing in the actual data, and I'm not able to capture that in my model. Can anyone tell me what could be wrong in the modeling?

celest vine Feb 3, 2023, 1:47 PM

#

wooden sail yes, you will receive your degree by mail in the next two weeks.

😂 It was fun though. Still long way to go and so much to learn

versed gulch Feb 3, 2023, 1:50 PM

#

does anyone know how to extract a smaller random 3D array from a larger 3D array (in an image sense)
i.e. extracting a (242, 256, 256) array from a (242, 512, 512) image?

wooden sail Feb 3, 2023, 2:03 PM

#

you can generate index arrays and use them to index the large image. the catch is that they need the right shape!

#

i.e., bigarray[:, v[:, np.newaxis], u[np.newaxis, :]]

#

generate v and u however you like, e.g. by doing a random permutation of arange(512) followed by [0:252]
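
A small sketch of the index-array approach on a stand-in array (4 x 8 x 8 instead of the real image, and picking 4 rows and 4 columns; all sizes here are assumptions for illustration). Note that the selected rows and columns are scattered, not one contiguous block:

```python
import numpy as np

rng = np.random.default_rng(0)
big = np.arange(4 * 8 * 8).reshape(4, 8, 8)  # small stand-in for the big image

# Random row and column indices without repeats.
v = rng.permutation(8)[:4]  # rows
u = rng.permutation(8)[:4]  # columns

# v becomes shape (4, 1) and u shape (1, 4); they broadcast to a (4, 4) grid,
# so sub[k, i, j] == big[k, v[i], u[j]] for every slice k.
sub = big[:, v[:, np.newaxis], u[np.newaxis, :]]
print(sub.shape)
```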

versed gulch Feb 3, 2023, 2:08 PM

#

im not sure what the last two indexes mean

wooden sail Feb 3, 2023, 2:08 PM

#

hmm?

versed gulch Feb 3, 2023, 2:09 PM

#

with v and u

wooden sail Feb 3, 2023, 2:09 PM

#

they are arrays of indices, where i used np.newaxis to add extra dimensions

#

you can use reshape if you prefer

versed gulch Feb 3, 2023, 2:09 PM

#

wooden sail they are arrays of indices, where i used np.newaxis to add extra dimensions

but im only taking a sub-block out of a larger block

wooden sail Feb 3, 2023, 2:10 PM

#

that's a special case of the solution i gave you

#

u and v are just np.arange(start, stop)[...] in that case

#

you originally said a random 3d array, which is why i wrote the example that way

versed gulch Feb 3, 2023, 2:12 PM

#

wooden sail you originally said a random 3d array, which is why i wrote the example that way

ok, would this also be a solution: pick 2 integers randomly between 0 and 512 such that their difference is 256 and use these as the indices? or is this basically what you said?

wooden sail Feb 3, 2023, 2:18 PM

#

versed gulch ok would this also be a solution pick 2 integers randomly between 0 and 512 such...

since you want them to be contiguous, all you need to pick is the starting index and then add an offset to pick the end of the slice

#

in your case, it only makes sense to pick random values from 0 to 252, and the slice goes up to that value plus 252

versed gulch Feb 3, 2023, 2:19 PM

#

wooden sail in your case, it only makes sense to pick random values from 0 to 252, and the s...

why 252?

wooden sail Feb 3, 2023, 2:19 PM

#

because you said you want the images to have size 252

versed gulch Feb 3, 2023, 2:20 PM

#

242

#

but okay may be mistake on both sides

wooden sail Feb 3, 2023, 2:20 PM

#

ah, 242 then lol, i misread

versed gulch Feb 3, 2023, 2:20 PM

#

242 is the number of slices and 256 are the height and width

wooden sail Feb 3, 2023, 2:21 PM

#

ah, then i meant 256, not 242

versed gulch Feb 3, 2023, 2:21 PM

#

so i want the number of slices fixed but changing the height and width accordingly to the sub block extracted

wooden sail Feb 3, 2023, 2:21 PM

#

the 242 doesn't matter here since you're selecting from all 242 slices, that'll just be a :

#

hmm i'm not sure i get your definition of subblock, that's not a well-defined term

queen cradle Feb 3, 2023, 2:23 PM

#

fringe bay hey guys I have a list of gps coordinates. They are plotted in this image. The c...

You need to distinguish between "interpolation splines" and "smoothing splines". The former are easier to implement and are widely available (e.g., in SciPy). Interpolation splines are guaranteed to pass through your data points. Smoothing splines are not guaranteed to pass through your data points; their use is to smooth out noisy data. Generally, in applications where there's noise involved, smoothing splines will give better results than interpolation splines. However, if the amount of noise is pretty low, then they're not necessary. They might be useful for you if there is too much noise in your samples and interpolation splines give you bad results.

Also, to avoid issues with duplication, I suggest making your data points 3-dimensional: (Lat, long, time). Make your splines with all three; if you only care about (lat, long), then just drop time after making the splines.
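
One possible way to try the 3-dimensional idea, sketched with SciPy's splprep and splev. The track below is invented, and mixing time with degrees affects how the curve is parameterized, so treat this as a starting point rather than a recipe. With s=0 the spline interpolates; a positive s would give a smoothing spline instead:

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical track: the same (lon, lat) appears at two consecutive times,
# and the route doubles back, so lon alone is neither unique nor sorted.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
lon = np.array([0.0, 1.0, 1.0, 2.0, 1.5, 1.0])
lat = np.array([0.0, 0.5, 0.5, 1.0, 2.0, 3.0])

# Fit one parametric curve through the (lon, lat, time) points.
tck, u = splprep([lon, lat, t], s=0)

# Evaluate along the curve, then simply ignore the time component.
u_new = np.linspace(0.0, 1.0, 50)
lon_new, lat_new, _ = splev(u_new, tck)

# Sanity check: evaluating at the original parameters returns the data.
lon_fit, lat_fit, t_fit = splev(u, tck)
print(np.allclose(lon_fit, lon), np.allclose(lat_fit, lat))
```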

wooden sail Feb 3, 2023, 2:23 PM

#

do you want a block of size 242 x 256 x 256, or a block whose shape depends on the index

versed gulch Feb 3, 2023, 2:23 PM

#

wooden sail the 242 doesn't matter here since you're selecting from all 242 slices, that'll ...

so a smaller 3D array from a larger 3d array

wooden sail Feb 3, 2023, 2:23 PM

#

yes, but what size

#

earlier you gave a fixed size, but now you said the height and width change

versed gulch Feb 3, 2023, 2:24 PM

#

wooden sail do you want a block of size 242 x 256 x 256, or a block whose shape depends on t...

the 242 x 256 x 256 (within an array of 242 x 512 x 512)

wooden sail Feb 3, 2023, 2:24 PM

#

ok, so the size is fixed

versed gulch Feb 3, 2023, 2:24 PM

#

wooden sail earlier you gave a fixed size, but now you said the height and width change

in case I want to change h, w later

wooden sail Feb 3, 2023, 2:24 PM

#

all right

#

so you can draw the indices as random integers between 0 and 512 - 256

versed gulch Feb 3, 2023, 2:25 PM

#

wooden sail ok, so the size is fixed

size of the larger array is fixed

wooden sail Feb 3, 2023, 2:25 PM

#

fixed h and w for all slices

#

we can just formulate it in terms of h and w

versed gulch Feb 3, 2023, 2:27 PM

#

wooden sail so you can draw the indices as random integers between 0 and 512 - 256

yes but wouldn't this only search for the top half of the image as we'll only be getting indices between 0 - 256?

wooden sail Feb 3, 2023, 2:27 PM

#

as starting points, yes

#

but you need to define a slice

#

if you pick a value larger than 252 as the starting slice, you'll get index out of bounds

queen cradle Feb 3, 2023, 2:28 PM

#

scarlet zealot Hello. What statistical tests can we use to find the relationship of one stock t...

This is a question of determining the covariance of the two stock prices. It's been extensively studied in mathematical finance. You can find plenty of information if you Google "stock covariance". If you want more sophisticated techniques, there is a series of really good papers by Ledoit and Wolf on shrinkage estimation of covariance matrices and their applications to finance. (Ledoit's website will come up if you Google him, and all the papers are there.)
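
As a starting point, here is a sketch of the sample covariance and Pearson correlation of daily returns with NumPy. The prices are made up, not real market data, and a correlation only describes co-movement; it does not show that one stock drives the other:

```python
import numpy as np

# Made-up closing prices for two stocks over six days.
price_a = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])
price_b = np.array([200.0, 203.0, 202.0, 207.0, 212.0, 210.0])

# Simple daily returns: (p_t - p_{t-1}) / p_{t-1}.
ret_a = np.diff(price_a) / price_a[:-1]
ret_b = np.diff(price_b) / price_b[:-1]

cov = np.cov(ret_a, ret_b)              # 2x2 sample covariance matrix
corr = np.corrcoef(ret_a, ret_b)[0, 1]  # Pearson correlation of the returns
print(cov)
print(corr)
```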

wooden sail Feb 3, 2023, 2:28 PM

#

picking 252 gives you the lower right quadrant of the image

versed gulch Feb 3, 2023, 2:29 PM

#

wooden sail if you pick a value larger than 252 as the starting slice, you'll get index out ...

true

versed gulch Feb 3, 2023, 2:29 PM

#

wooden sail picking 252 gives you the lower right quadrant of the image

is it not upper left *256?

wooden sail Feb 3, 2023, 2:29 PM

#

256*, i keep mixing it up

#

lemme write it more generally

versed gulch Feb 3, 2023, 2:31 PM

#

unless we also do random values of 1 and -1 for the lower half

wooden sail Feb 3, 2023, 2:32 PM

#

!e

import numpy as np

x = np.arange(75).reshape(3,5,5)
print(x)

h = 2 #desired height
w = 3 #desired width

row = np.random.randint(low=0, high=x.shape[1]-h+1) #pick a random row
col = np.random.randint(low=0, high=x.shape[2]-w+1) #pick a random col

y = x[:, row:row+h, col:col+w]
print("")
print(y)

arctic wedgeBOT Feb 3, 2023, 2:32 PM

#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[[ 0  1  2  3  4]
002 |   [ 5  6  7  8  9]
003 |   [10 11 12 13 14]
004 |   [15 16 17 18 19]
005 |   [20 21 22 23 24]]
006 | 
007 |  [[25 26 27 28 29]
008 |   [30 31 32 33 34]
009 |   [35 36 37 38 39]
010 |   [40 41 42 43 44]
011 |   [45 46 47 48 49]]
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/anatokohiv.txt?noredirect

wooden sail Feb 3, 2023, 2:33 PM

#

this does what you want in general for any h and w. just replace x with whatever 3D data you have

#

careful with the indices, i may or may not be off by 1. i think this is right though, running it several times hasn't given any errors and does seem to go all the way to the end of the array

queen cradle Feb 3, 2023, 2:39 PM

#

hoary wigeon Hi guys, I need help in time series.. In my problem there is a trend which repe...

Here's an idea; I don't know if it will work. There's a pretty obvious downward trend among the peaks; but the troughs are all about the same height of zero. So maybe a good model would have some kind of steadily decreasing multiplicative factor. And maybe that suggests that you might have some luck fitting the logarithm of the data. (or probably log(1 + y) if y is your data). Like I said, I don't know if it will work, but it might be worth a try.
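
To illustrate why the logarithm might help, here is a sketch on synthetic peaks that shrink by a constant factor each quarter (an assumption made for the example, not a property of the real data). After log(1 + y), the multiplicative decay looks close to a straight line that a trend term can pick up:

```python
import numpy as np

# Synthetic peak heights: each quarter is 80% of the previous one.
quarters = np.arange(8)
peaks = 1000.0 * 0.8 ** quarters

# log(1 + y) is safe even where the series touches zero.
log_peaks = np.log1p(peaks)

# A straight-line fit in log space captures the decay as a negative slope.
slope, intercept = np.polyfit(quarters, log_peaks, 1)

# Predictions made in log space are mapped back with expm1.
recovered = np.expm1(log_peaks)
print(slope)
```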

versed gulch Feb 3, 2023, 2:43 PM

#

wooden sail this does what you want in general for any h and w. just replace x with whatever...

thanks ill test it out

wooden sail Feb 3, 2023, 2:48 PM

#

i think i've seen you ask indexing questions before, but nevertheless, this is a good place to check it out https://numpy.org/doc/stable/user/basics.indexing.html

#

https://numpy.org/doc/stable/user/basics.broadcasting.html

versed gulch Feb 3, 2023, 2:49 PM

#

wooden sail i think i've seen you ask indexing questions before, but nevertheless, this is a...

Yh I have but on some different problems, I get the whole problem in general, just have some issues gluing the ideas together

versed gulch Feb 3, 2023, 2:53 PM

#

wooden sail careful with the indices, i may or may not be off by 1. i think this is right th...

I think +1 would give index 256 which may throw an error

wooden sail Feb 3, 2023, 2:58 PM

#

let's see. your array has size 512, and we want a window of size 256. 512 - 256 is 256, and the +1 leaves us at 257. then randint generates a number between 0 and 257 exclusive, so in the interval [0,256]. in the worst case scenario, we draw 256. then the slice is 256:256+256 = 256:512. but the upper limit of the slice is also exclusive, so that gives the indices from 256 to 511

#

i think it should be ok, or?
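
A quick way to check that reasoning is to try every possible start value. This sketch uses 2 slices instead of 242 to keep it light. One thing worth noting: as far as I know, a NumPy slice that runs past the end does not raise, it silently comes back shorter, so an off-by-one here would show up as a wrong shape rather than an error:

```python
import numpy as np

size, window = 512, 256
x = np.zeros((2, size, size))  # stand-in for the (242, 512, 512) image

# randint's upper bound is exclusive, so starts run from 0 to 256 inclusive.
high = size - window + 1
shapes = {x[:, s:s + window, s:s + window].shape for s in range(0, high)}
print(shapes)

# One step too far: the slice is clipped to 255 rows instead of raising.
too_far = x[:, high:high + window, :]
print(too_far.shape)
```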

lapis sequoia Feb 3, 2023, 4:15 PM

#

does anyone know how to perform a fuzzymatch with a 90% threshold using pandas?

prime hearth Feb 3, 2023, 4:44 PM

#

Hello, I have a dataset with 700 features. I did PCA and got around 428 features, but i dont think I can interpret 428 features

#

whats a good amount of components to have with PCA? Like should i have 2 up to 10 so it's easily interpretable?

#

actually now that i think about it- my features are uncorrelated- im doing NLP where most of the features are individual words

wooden sail Feb 3, 2023, 4:59 PM

#

what do you mean by "interpret"? if your data is inherently high dimensional, there's not much you can do about it

prime hearth Feb 3, 2023, 5:05 PM

#

@wooden sail oh okay, and yeah i mean like to know what i should label those new components or features made by the PCA

#

but i just realized my data has no correlation between variables

#

so i dont think PCA will help much

vast lintel Feb 3, 2023, 5:23 PM

#

If I wanted to use a neural network to take relatively simple user input (say, describes the results of some analysis, perhaps it describes the parameters of a simulation and how some variables relate to one another) and segments the user input into categories on which it expands on for you (with relatively meaning word filler), would I use something like a transformer to do that?

I have been looking at something like BERT but I have no experience with this type of thing.

fringe bay Feb 3, 2023, 5:43 PM

#

@wooden sail I managed to get the spline working with your help and some from stackoverflow
Now that I have the spline, discarding the initial coordinates which generated the line, it would look like the 2nd pic
based on the code snippet how would you get the 9 points from the spline with same distance between them (same distance in terms of distance as it would be visible with the eye on the plot) pic3 [I tried lol] ?

wooden sail Feb 3, 2023, 5:47 PM

#

there should be a way to get the polynomial coefficients from the spline function you used

fringe bay Feb 3, 2023, 5:48 PM

#

that would have something to do with tckp, u or xnew,ynew,znew ?

wooden sail Feb 3, 2023, 5:49 PM

#

ah, on the line you do xnew, etc

#

change the linspace

#

set it to 9 steps and you're done

#

you have 400 atm

#

also plot with "*" instead of "r" or "bo"

fringe bay Feb 3, 2023, 5:52 PM

#

made changes to linspace from 400 to 9 and changed plot

#

this looks a bit strange though, the initial one looks good
the thing is, now that I have the exact spline, I just need to grab the points, don't need to change the spline anymore

wooden sail Feb 3, 2023, 5:53 PM

#

use * instead of r on the second plotting line

#

you can keep bo for the original points

fringe bay Feb 3, 2023, 5:56 PM

#

there are a couple new ones, but not really filling the gaps
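
Equal steps in the spline parameter are not equal steps in distance along the curve, which may be why the gaps don't fill evenly. A possible workaround, sketched with NumPy only: keep the dense spline output, measure the cumulative path length, and interpolate at equally spaced lengths. The helper name resample_equal_distance and the quarter-circle test curve are made up for this example; xnew and ynew stand in for the dense points from the 400-step evaluation:

```python
import numpy as np

def resample_equal_distance(x, y, n_points):
    """Return n_points along the polyline (x, y), equally spaced by path length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    seg = np.hypot(np.diff(x), np.diff(y))           # length of each small segment
    dist = np.concatenate(([0.0], np.cumsum(seg)))   # distance travelled at each point
    targets = np.linspace(0.0, dist[-1], n_points)   # equally spaced distances
    return np.interp(targets, dist, x), np.interp(targets, dist, y)

# Stand-in for the dense spline output: a quarter circle sampled unevenly.
theta = (np.linspace(0.0, 1.0, 400) ** 2) * np.pi / 2
xnew, ynew = np.cos(theta), np.sin(theta)

px, py = resample_equal_distance(xnew, ynew, 9)
gaps = np.hypot(np.diff(px), np.diff(py))
print(gaps)
```

This measures distance in plot units, which matches "as visible with the eye on the plot"; real-world distances on GPS coordinates would need a projection first.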

wooden sail Feb 3, 2023, 5:57 PM

#

you're getting there, but sadly i have to get going. i'm sure someone else will pick it up where we left off

fringe bay Feb 3, 2023, 5:57 PM

#

wooden sail you're getting there, but sadly i have to get going. i'm sure someone else will ...

no worries, and appreciate it once more

lapis sequoia Feb 3, 2023, 6:20 PM

#

does anyone know how to perform a fuzzymatch with a 90% threshold using pandas?

prime hearth Feb 3, 2023, 8:50 PM

#

hello, i would like to please ask- should it be a red flag if my training accuracy for KNN is 75% and validation is around the same?

I do expect some drop in accuracy, like for SVM i got 0.9 for training but 0.73 for testing.

charred light Feb 3, 2023, 9:01 PM

#

prime hearth hello, i would like to please ask- should it be a red flag if my trainning accur...

Not really. Best way would be to see how well it does on the testing set.

prime hearth Feb 3, 2023, 9:01 PM

#

i did train_test_split and thats what i got, is this what you mean? sorry

#

also i did cross validation and thats what i got too

charred light Feb 3, 2023, 9:04 PM

#

prime hearth i did train_test_split and thats what i got, is this what you mean? sorry

Often "test set" and "validation set" get muddled together. In this case, the test set is data that the model has never seen. So after you split, if you never used the test data set during your modeling, then yes.
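
A bare-bones sketch of the three-way split with NumPy, assuming 100 samples and a 60/20/20 split (both numbers are arbitrary). The validation part is used while tuning; the test part is touched once, at the very end:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 100                       # hypothetical dataset size
indices = rng.permutation(n_samples)  # shuffle row positions once

test_idx = indices[:20]    # held out, never used during modeling
val_idx = indices[20:40]   # used to compare models and tune settings
train_idx = indices[40:]   # used to fit the model

print(len(train_idx), len(val_idx), len(test_idx))
```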

prime hearth Feb 3, 2023, 9:05 PM

#

oh okay thanks. Im still a beginner with NLP, and i noticed my model performs really well when i have 1200 features (where each feature is an individual word)

#

is this okay? Someone told me as long as the features make sense and im not overfitting

charred light Feb 3, 2023, 9:05 PM

#

Here's a better explanation. https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set

The reason I bring up the test set, is because the validation accuracy only gets you so far. The real test (pun not intended) is when you use the model for what it was originally designed for.

charred light Feb 3, 2023, 9:06 PM

#

prime hearth is this okay? Someone told me as long as the features make sense and im not over...

I haven't really worked on NLP models in depth so I can't give you an accurate answer on that.

prime hearth Feb 3, 2023, 9:08 PM

#

oh okay thanks. Im aware of more features mean less accuracy, and i tried PCA but PCA doesnt improve the accuracy of my model. The only thing i plan now is to do feature selection and remove features that are not meaningful and see the accuracy again

#

I dont have any more data, i would have to webscrape - is that considered a validation set, or should i just use real world inputs?

charred light Feb 3, 2023, 9:08 PM

#

prime hearth oh okay thanks. Im aware of more features mean less accuracy, and i tried PCA bu...

Yea, that falls into bias-variance trade-off.

charred light Feb 3, 2023, 9:09 PM

#

prime hearth I dont have any more data, i would have to webscrap - is this consider validatio...

If your original data set came from scraping, then additional scrapes would be considered real world data.

prime hearth Feb 3, 2023, 9:09 PM

#

yeah my data is like mostly from an API and webscraped data

#

okay thanks.

tropic niche Feb 3, 2023, 10:09 PM

#

I'm trying to parse through data collected from an OCR using pytesseract. The data is pretty simple and comes out quite clean. It is tabular data, and I need an easy and reliable way of classifying the data back into rows and columns. So far I have been taking the x_pos and y_pos for each word, but there is variability in that so I find myself trying to manage that with if statements and what not. I was thinking, is it feasible to use nearest-neighbor to classify the data into rows and then columns?

lapis sequoia Feb 3, 2023, 10:14 PM

#

Has anyone incorporated AI with hacking?

sick fern Feb 3, 2023, 10:51 PM

#

Hey guys

#

I want to create a program that converts python code to c++ and whatever