#data-science-and-ml

wooden sail
#

use this as a poor guess of the covariance matrix, and include that in the cost function

tidal bough
#

hmm, I'm not sure how to do it. Covariance matrix of what, for one? The deviations of individual samples?

#

ah, if I understand correctly, you're basically saying the variance depends on p here - higher close to the center, smaller at the edges. So, diagonal covariance matrix but not an identity one.

wooden sail
#

yep, that's exactly what i mean

#

the least squares function you proposed is optimal when the deviation from the mean follows an IID normal distribution

tidal bough
#

my naive idea would be to do something like (((pred - Y)**2)/pred).mean() to attempt to fix that - to make samples far from the center have more weight. This essentially assumes that the variance is proportional to the value here
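
A minimal sketch of the weighted loss described here, assuming NumPy arrays `pred` and `Y` with strictly positive predictions (the sample values are made up). Dividing each squared residual by its assumed variance is weighted least squares with a diagonal covariance matrix; passing all ones recovers the plain IID least-squares loss:

```python
import numpy as np

def weighted_mse(pred, Y, var):
    """Mean of squared residuals, each divided by its assumed variance."""
    pred, Y, var = (np.asarray(a, dtype=float) for a in (pred, Y, var))
    return np.mean((pred - Y) ** 2 / var)

# Made-up values, only for illustration.
pred = np.array([0.5, 2.0, 8.0])
Y = np.array([0.4, 2.5, 7.0])

plain = weighted_mse(pred, Y, np.ones_like(pred))  # identity covariance
scaled = weighted_mse(pred, Y, pred)               # variance assumed proportional to pred
```

With the variance taken as `pred`, points with small predicted values count for more, which is the reweighting the message is after.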

wooden sail
#

that works too, sorta like the noise-based rescaling in wiener filters

#

this looks like a fun sunday afternoon task 😛

tidal bough
#

really shitty way I've been thinking of: sample some random values according to this probability distribution. Calculate the mean and std of these samples. Use them. 🥴

wooden sail
#

that can be done... but it's kinda like solving another copy of this exact same problem at every single point on the graph

#

nothing wrong with that ofc, it's a montecarlo approach
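
A rough sketch of that Monte Carlo idea, assuming the distribution at one point is available as a discrete list of values and probabilities (both hypothetical here, since the actual data is not shown):

```python
import random
import statistics

def monte_carlo_moments(values, probs, n_samples=10_000, seed=0):
    """Estimate mean and standard deviation by sampling from a discrete distribution."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=probs, k=n_samples)
    return statistics.fmean(draws), statistics.pstdev(draws)

# Hypothetical distribution, only for illustration.
values = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
mean, std = monte_carlo_moments(values, probs)
```

The estimated `std` could then serve as the per-point scale in the cost function, at the price of one sampling run per point on the graph.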

tidal bough
#

ah right, of course, I didn't think of it that way (that it's just the monte-carlo solution to this).

wooden sail
#

i kinda wanna try it out now 😛 care to share the data? x and y axes

mild dirge
#

Im planning on training a multi-label classifier for some project. The goal is that I have a classifier that can tell if one or multiple specific letters are in an image. Could I still use only training images with a single letter in the image?

#

I also have images of bigrams and trigrams, but the problem would be that all images would have very different shapes. Would a solution to this be padding uni-grams and bi-grams with white space on the left and right before classification?
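
A minimal sketch of the padding idea, assuming grayscale images stored as 2-D NumPy arrays that already share the same height, with 255 as white (the sizes below are made up):

```python
import numpy as np

def pad_to_width(img, target_width, fill=255):
    """Center a (height, width) image on a white canvas of the target width."""
    height, width = img.shape
    if width > target_width:
        raise ValueError("image is wider than the target width")
    left = (target_width - width) // 2
    right = target_width - width - left
    return np.pad(img, ((0, 0), (left, right)), mode="constant", constant_values=fill)

# Made-up shapes: a narrow unigram and a wider trigram.
unigram = np.zeros((32, 20), dtype=np.uint8)
trigram = np.zeros((32, 60), dtype=np.uint8)
batch = np.stack([pad_to_width(unigram, 60), pad_to_width(trigram, 60)])
```

After padding, all images can be stacked into one batch of identical shape, here `(2, 32, 60)`.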

wooden forge
wooden sail
#

yeah, you indeed. idk if i'll have a chance now, but i can play around with it at some point soonish i think. maybe tomorrow

#

or i can assign it to my students and see what they come up with haha

wooden forge
#

would you rather have the numpy file directly or the github repo with all the functions?

#

it's basically for my internship at uni

wooden sail
#

just the numpy file, some npy file with the vectors inside

arctic wedgeBOT
#

Hey @wooden forge!

It looks like you tried to attach file type(s) that we do not allow (.npy). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

wooden forge
#

:|

#

you know what

#

Imma put them in my repo

#

and share you the link

#

Also Edd, you can use the different scripts to generate Gaussian distributions

#

you can also play with the initial State, this one is a simple Dirac spike, but you can put whatever you want

#

for the spin case, Psi is a ket with a Dirac distribution and a 0 array, here again you can put whatever you want

#

I'm kinda busy rn so I can't generate some array for you

#

you can also play with the beta parameters, that just allows you to compute an average value

wooden sail
#

if you could generate it and upload it later on, that'd be best. i'm familiar, but not fluent in this notation 😛

wooden forge
#

just remind me later in the day

mild dirge
#

What is wrong with my pd dataframe here?

import pandas as pd

file_path = 'data/ngrams_frequencies_withNames.xlsx'
df = pd.read_excel(file_path)
print(df.head(), '\n')

print(df['Frequencies'].head())
#

Do the hebrew characters somehow flip the stuff? haha

wooden forge
#

when the helper needs help you know it's a hard question

#

maybe it's the utf thing ?

#

you have to specify that there are hebrew characters somehow ?

mild dirge
#

Yeah I'm just dropping them since the information is also supplied by the names column, but definitely weird

serene scaffold
#

@mild dirge did you try making a copy and filling the names column with Latin letters? Or some other LtR characters?

mild dirge
#

No, but after removing them it displays it as normal

serene scaffold
#

Welp

mild dirge
#

👍🏽

serene scaffold
#

I blame Zig and Scaleios.

rose agate
mild dirge
#

Well they are hebrew characters, and it's a school assignment to make the ocr and segmentation etc. ourselves

rose agate
#

tesseract says it supports hebrew but if you have to do it yourself that makes sense

mild dirge
#

It's also really old text, and I'm making a multi-label classifier for better image segmentation, so I need more than just the most probable letter

manic zenith
#

Hi, can someone help me interpret the results of my VECM estimation? I have a hard time understanding the coefficients that I get!

lone yacht
#

Hello again everyone. I have been training a CNN in PyTorch to classify images from the CIFAR10 dataset, with the stipulation that it only be three hidden layers. With help from @mild dirge I changed the architecture to be 2 convolutional layers and a fully connected linear layer, leading to much stabler results.

#

While tuning hyperparameters, I noticed that there continued to be a gap between the training and validation data; the former achieved accuracies of 70-80% while the latter languished around 45-50% (picture 1 below). After implementing weight decay of 0.01 in the optimiser, I significantly closed the gap, but with the result that the training accuracy itself now achieves only around 45% (picture 2 below). Does anyone have any suggestions to what I could tinker with to improve the accuracies while keeping the training-validation gap small?

The hyperparameters are:

  • batch size of 8
  • optimiser using SGD with a l.r. of 0.001, momentum of 0.9
  • cross entropy loss function
#

If you need any more information, please let me know :)

steady basalt
#

Incase anyone’s wondering, After Steve Is a good read so far (1/7th through)

wooden forge
# lone yacht

for a moment I thought you were doing what I am doing and I got so confused

#

and then I read the legend lol

mild dirge
# lone yacht

It's probably important to note that your goal shouldn't be to get a small gap between the train/valid accuracy, the second graph isn't better than the first imo

#

The first just shows over-fitting, and the second shows that the model is not even able to correctly predict the patterns it is trained on, thus maybe even underfitting

#

What size images do you have?

#

@lone yacht

lone yacht
#

Hi @mild dirge , the images are 32x32 pixels with 3 channels for RGB

mild dirge
#

A problem could be that the receptive field of the final convolutional layer is too small to learn more general patterns

#

Currently it is 5x5 for each feature

#

You could try experimenting with different kernel sizes, and maybe dilation to increase this receptive field, that might help depending on your data
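
A small sketch of how the receptive field grows with kernel size and dilation. It assumes two stacked 3x3 convolutions with stride 1 and no pooling, which reproduces the 5x5 mentioned above; the real architecture is not shown in the chat, so the layer lists are illustrative only:

```python
def receptive_field(layers):
    """Receptive field of stacked layers given as (kernel_size, stride, dilation)."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump  # each layer widens the field
        jump *= stride                        # strides spread later layers further apart
    return rf

two_3x3 = receptive_field([(3, 1, 1), (3, 1, 1)])  # 5
dilated = receptive_field([(3, 1, 1), (3, 1, 2)])  # 7, second layer with dilation 2
two_5x5 = receptive_field([(5, 1, 1), (5, 1, 1)])  # 9
```

On 32x32 inputs, each of these still covers only part of the image, which is why larger kernels, dilation, or pooling between the convolutions are worth trying.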

lone yacht
scenic tulip
#

Hello all, I'm working on a NN that can predict the next dataset from a trend. As of now I can gather the differences from one array to another, the problem is what would my inputs be as they would be different each time and how could I use that to help determine outputs?

#

This is what I have so far and it works, but I'm uncertain how to store the "trend" data.

arctic wedgeBOT
#

Hey @scenic tulip!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

scenic tulip
#

So I have lists in a file each containing 20 elements, which are integers. I need to be able to find the difference of the first 2, take that set of values and apply it against the next list....so on and so forth until there are no more lists to process
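
One possible reading of that description, sketched with short made-up lists instead of 20-element ones: compute the element-wise difference between each list and the next, and keep every difference list as the stored "trend":

```python
def consecutive_differences(draws):
    """Element-wise differences between each list and the one that follows it."""
    return [
        [cur - prev for prev, cur in zip(previous, current)]
        for previous, current in zip(draws, draws[1:])
    ]

# Made-up data, three elements per list for brevity.
draws = [[1, 5, 9], [2, 7, 9], [4, 7, 12]]
diffs = consecutive_differences(draws)  # [[1, 2, 0], [2, 0, 3]]
```

The rows of `diffs` could then be the model inputs, with the following row as the target, though whether that carries any signal depends entirely on the data.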

fervent vale
#

Hello guys, I have some data with a ranking of X names each name assigned with a score. The ranking is performed according to the score going from the highest to the lowest score. I want to perform some clustering to see if there are some clusters in the dataset between the names. Is k-means approach the best way to do this ?

steady basalt
#

Oh a neural network

#

I don’t think a neural network would be required for that

scenic tulip
#

linear

steady basalt
#

What’s the NN for

scenic tulip
#

to predict the trend based on all previous results

steady basalt
#

How big is this dataset?

#

Is each feature a cumulative sum of the previous ?

#

Or it’s just independently made from two of its own columns

scenic tulip
#

ok so im trying to predict Keno, so i have all the previous keno drawings

steady basalt
#

What is keno

scenic tulip
#

ive done simple maths on the entire dataset

#

Keno is a lottery game

steady basalt
#

If ur trying to predict the lottery 😂

scenic tulip
#

but it doesn't get picked by floating balls that get shat out tubes, it uses PRNG

steady basalt
#

Ohh

scenic tulip
#

I tried to find the seed that matched the first lottery drawing

#

after 4.29 billion combos still nothing

steady basalt
#

Sounds hopeless

#

How many features do u have

scenic tulip
#

so i don't think i can find the mersenne value of an array lol, or else id find them all

steady basalt
#

One for each week results?

scenic tulip
#

no each day

steady basalt
#

By mersenne you’re referencing the twister?

scenic tulip
#

yes, because PRNG is actually not random. so if i could find the MT values of the drawings, perhaps i could find a correlation between the PRNG idk lol

steady basalt
#

Wait what is a mersenne number again

scenic tulip
#

ive looked into bit shifting, which in my case id have to reverse the mersenne twister. that's fine with single integers

#

the mersenne number is what is used to generate the random number

steady basalt
#

Interesting topic though

scenic tulip
#

so if i seed the PRNG with the value of 1 and run it, it will produce a result, if i close and restart it produces the same result

#

if you run that same value of 1 in a loop of x, you get different results because they bit shift the seed value to produce a new result
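
A short sketch of that behaviour using Python's built-in `random` module, which is a Mersenne Twister. Strictly speaking, the generator advances its internal state on every call rather than re-shifting the seed, but the observable effect is the one described. The helper name is made up:

```python
import random

def first_draw(seed, k=20):
    """First Keno-style draw (k numbers out of 1..80) from a freshly seeded generator."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, 81), k))

# Restarting with the same seed reproduces the same draw.
same_after_restart = first_draw(1) == first_draw(1)

# Drawing repeatedly from one generator gives different results each time.
rng = random.Random(1)
first = rng.sample(range(1, 81), 20)
second = rng.sample(range(1, 81), 20)
```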

steady basalt
#

Ahhh mersenne is 2^n - 1

scenic tulip
#

that being said, i do think its quite impossible so im trying to go back through and find the trend of all the data

steady basalt
#

Wouldn’t you need like

#

Infinite drawings

scenic tulip
#

problem is im not sure if it's accumulating the data right

#

well kinda LOL

steady basalt
#

There’s prob only been a few thousand winning combos so far

scenic tulip
#

but there are a finite amount of drawings

steady basalt
#

I do not think that’s enough

scenic tulip
#

hmmm

steady basalt
#

Even if u had 100,000 drawings

#

Just a load of pseudo random draws

#

What’s the plan?

scenic tulip
#

ill show you what i tried before

#

import numpy as np

ra = []
temp = 0.0
inf = float('inf')
counter = 0
actualcount = 0
dummyarr = np.array([ 1, 2, 18, 19, 22, 24, 25, 35, 41, 44, 45, 50, 52, 54, 55, 59, 67, 68, 70, 74])

# keep trying seeds until the sorted draw matches the dummy array
while not np.array_equiv(dummyarr, ra):
    actualcount += 1
    # seed and sample with the same generator:
    # random.sample() would ignore np.random.seed()
    np.random.seed(counter)
    ra = np.sort(np.random.choice(range(1, 81), size=20, replace=False))
    print(ra)
    if actualcount <= inf:
        counter += 1
        temp = inf
print(ra)
print("Count")
print(counter)
print("ActualCount")
print(actualcount)
print("Infinity = {}".format(temp))

#

so in that i had a dummy array that i tried to find the seed value for. that failed after i ran out of range on the seed, which is 32 bit, so about 4.29 billion values. i think theres like a trillion combos for keno, not sure

#

you like the actualcount <= inf ? 😂

steady basalt
#

are you trying to roll until you hit the dummy?

#

How long did that run for?

#

It’s not a trillion it’s 20^20 combos right?

#

Not possible unless you have nasa supercomputer from 100 years in the future

#

Even if u did manage to land on their combo that doesn’t tell u about any seed value and it doesn’t tell u if they use a constant value

#

Waste of time

scenic tulip
#

well there are 80 numbers

#

20 get picked each draw
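
For scale, a quick sketch of the count, assuming a draw is 20 distinct numbers out of 80 and order does not matter. That makes it a binomial coefficient, which is neither a trillion nor 20^20:

```python
import math

n_draws = math.comb(80, 20)  # number of distinct 20-number draws, roughly 3.5e18
seed_space = 2 ** 32         # about 4.29 billion 32-bit seeds

draws_per_seed = n_draws // seed_space  # distinct draws per available seed
```

Under that assumption, there are hundreds of millions of possible draws for every 32-bit seed, so most draws cannot be produced as the first output of any such seed.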

#

yeah i was using different seed values to try to hit the dummy array, but yeah finding all those would be impossible

#

you get to pick up to 10 numbers. I just really want to see if they truly randomize the game with PRNG or if someone is behind the scenes inputting the numbers

#

i ran it for like 2 days and it hit 4.29 billion combinations before it ran out of memory for the counter

scenic tulip
#

I could figure out how they shifted the bits to arrive at that conclusion, basically deciphering the mersenne and reverse mersenne twistering all future drawings into the correct seed

fading geyser
#

hey guys... who all here know how to use pandas

#

i needed some help

serene scaffold
fading geyser
#

i'm trying to read a csv file into jupyter notebooks but i am getting an error

serene scaffold
fading geyser
#
table = pd.read_csv("wineReviews.csv")

le Code*

arctic wedgeBOT
#

Hey @fading geyser!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

serene scaffold
#

read the message

fading geyser
#

it's too lon

#

long

serene scaffold
#

yes, so read the message and it tells you what to do.

arctic wedgeBOT
#

Hey @fading geyser!

It looks like you tried to attach file type(s) that we do not allow (.docx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold
#

you have to use the paste bin

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

timid narwhal
#

does anyone know a good way to open .fst files inside of python to be used with pandas?

fading geyser
#

try this

fading geyser
serene scaffold
# fading geyser yeah just did it

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 10: invalid start byte -- so the problem is that the file encoding for the csv file is not utf-8

fading geyser
#

and how to change it

serene scaffold
#

try pd.read_csv("wineReviews.csv", encoding='ascii')

fading geyser
#

ohk

#

File "<ipython-input-9-b6ef65161787>", line 1
table = pd.read_csv("wineReviews.csv" encoding='ascii')
^
SyntaxError: invalid syntax

serene scaffold
#

add the comma

fading geyser
#

this error is showing

serene scaffold
#

I have to leave soon, btw. did you add the comma and re-run?

fading geyser
fading geyser
fading geyser
serene scaffold
#

try pd.read_csv("wineReviews.csv", encoding_errors='ignore')
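
Byte 0x93 is not valid ASCII either, so `encoding='ascii'` cannot decode it. In Windows-1252 it is a left curly quote, which makes `cp1252` a plausible guess for a CSV exported on Windows. A minimal sketch that tries a few candidate encodings in order; the helper name and the tiny test file are made up, and this is a guess at the cause rather than a diagnosis of the actual file:

```python
import os
import tempfile

def guess_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file, else None."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None

# Made-up file containing the Windows-1252 curly quotes 0x93 and 0x94.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"name,note\nA,\x93ok\x94\n")
detected = guess_encoding(tmp.name)
os.remove(tmp.name)
```

The result can be passed on as `pd.read_csv(path, encoding=detected)`. `latin-1` is listed last because it accepts every byte and therefore never fails, even when it is the wrong choice.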

fading geyser
#

another error

serene scaffold
#

something about the way it's encoded is unexpected.

fading geyser
#

this is the file

serene scaffold
#

why does it look like that?

fading geyser
#

no idea

serene scaffold
#

how did you get it?

fading geyser
#

an open dataset

serene scaffold
#

it must be corrupted in some way. because csv files are supposed to be human readable

#

like, it will literally just be comma-separated values

fading geyser
#

ok

#

i'll see it tmrw now ig.... its 12 am here

serene scaffold
#

night night

fading geyser
#

night

steady basalt
#

I’ll take 1%

fervent vale
#

Hello, I have some data with a ranking of X names each name assigned with a score. The ranking is performed according to the score going from the highest to the lowest score. I want to perform some clustering to see if there are some clusters in the dataset between the names. Is k-means approach the best way to do this ?

serene scaffold
fervent vale
#

Or should I lower the number of variables ?

thin palm
#

if we have a correlation that shows -.7 between two features, my intuition would be to drop it. What's everyone else's thoughts on this?

mild dirge
#

correlation of -0.7, what does that mean you think?

#

correlation of 0 means no correlation, 1 means a lot, -1 means a lot (but in negative direction)

#

so when one is higher, the other is lower and vice versa

#

Doesn't sound like useless information

#

and even then, no correlation between a feature and the desired output does not mean the feature cannot be useful in combination with other features
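
Both points can be sketched with NumPy on synthetic data (all values below are made up). The first pair has a correlation near -0.7, so one feature still says a lot about the other. The second shows a feature with exactly zero correlation to the target that still determines the target together with a second feature:

```python
import numpy as np

rng = np.random.default_rng(0)

# A pair of features built to have a correlation of about -0.7.
x = rng.normal(size=1000)
y = -0.7 * x + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=1000)
r_negative = np.corrcoef(x, y)[0, 1]

# Zero correlation with the target, yet useful in combination.
x1 = np.array([-1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, 1.0, -1.0, 1.0])
target = x1 * x2
r_single = np.corrcoef(x1, target)[0, 1]  # x1 alone looks useless
fully_determined = bool(np.all(target == x1 * x2))
```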

serene scaffold
thin palm
fervent vale
#

Does that make sense to you

#

Ok dropping rankings w/ an absolute correlation of 0.7 makes sense

#

And above*

mild dirge
#

yeah, so if you want to use those grades to cluster the people with different sets of skills that would work

fervent vale
#

OK

mild dirge
#

It wouldn't really give a label for each cluster though

#

it would just show a cluster for people with certain grades for certain subjects, but it won't say something like "scientific people"

#

But maybe you find a cluster of people who are really good at science subjects, and that they are commonly bad at languages f.e.

fervent vale
#

Thats perfect, I can interpret each category by myself, is kmeans the proper approach to do this ?

mild dirge
#

Maybe check this out

#

there is more than kmeans, some work better for specific settings

#

The interpretation of the clusters would mainly be the mean of the cluster (so the average grades for the courses), the spread and the amount of points in a cluster

fervent vale
#

Perfect

#

Thanks

#

Do you have some idea about which approach would be the most adapted to the problem at first sight

#

From your experience

#

?

mild dirge
#

absolutely not, I don't have your data 😛

#

Might be good to try visualize your data in some way, and get to understand it such that you can make an informed decision yourself

#

the link I just gave also shows what usecases each method has, try to see what your situation is with respect to those usecases

fervent vale
#

Very good advice

#

I will do that

thin palm
#

Does anybody know how to interpret this? It's telling me yearsExpereince, milesFromMetropolis, and degree have the least benefits? no way

stoic viper
#

Hey, i have a project for university. Its the prediction of energy usage of electric busses.
I decided to use XGBoost and in the end I get an r2_score of 0.991. I'm really surprised because that felt really easy, and now I'm worried something isn't right.

#

I mean i put such low effort into it.

#

Is going for a lower r2score better sometimes?

mild dirge
thin palm
thin palm
mild dirge
#

Must be something that went wrong right?

#

Nah doubt it

#

how is it scored anyways?

thin palm
stoic viper
#

thats surprisingly good for that values lol

mild dirge
#

can you try scoring='accuracy'

#

?

#

or does it take really long to run?

thin palm
mild dirge
#

That way we at least know what the score means

#

not sure what the default is

thin palm
mild dirge
#

apparently "If None, the estimator’s default scorer is used"

#

didn't know the model would have an associated score

thin palm
#

got an error?
ValueError: Classification metrics can't handle a mix of multiclass and continuous targets

stoic viper
#

you're trying to use a classification metric on a regression target

#

if i'm not mistaken

thin palm
#

ahh we can't use accuracy for regression metrics

stoic viper
#

yeah

mild dirge
#

But isn't the problem a regression problem?

stoic viper
#

mean squared error

#

pls

mild dirge
#

yeah but how did you get this accuracy?

thin palm
mild dirge
#

alright, try ‘r2’

stoic viper
#

or neg mean squared
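
To make clear what those regression scores mean, here is a plain-Python sketch of R² and mean squared error on made-up salary values (in thousands). In scikit-learn the matching scoring strings are `"r2"` and `"neg_mean_squared_error"`, where the MSE is negated so that higher is always better:

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared distance between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R^2: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up salaries in thousands, only for illustration.
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 210, 240]

error = mse(y_true, y_pred)  # 100.0
score = r2(y_true, y_pred)   # 0.968
```

An R² of 1 is a perfect fit and 0 is no better than always predicting the mean, so a value around 0.74 means roughly three quarters of the variance is explained.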

thin palm
#

okay running now

mild dirge
#

and what kinda values is your y data?

#

like on average

thin palm
mild dirge
#

alright, so nothing over a few million then I assume

thin palm
#

mostly int64 of numbers ranging from 50k to 300k

stoic viper
#

thats a hell lot of money

thin palm
#

surgeons lmao

mild dirge
#

rubles*

thin palm
#

haha USD

stoic viper
#

then those numbers are in cents?

thin palm
#

no it's like this for example 120, 200, 45, etc

mild dirge
#

what is the NONE feature btw?

thin palm
mild dirge
#

ah right

#

so it's a 1 or 0?

#

and what is the model?

thin palm
#

so we want to predict salaries, degree obtained is with ordinal encoding- 0(HS) 1(bachelor's) etc

#

LinearRegression

#

score came back: 0.7434167004759114

thin palm
mild dirge
#

yeah thought if it was a nn, it might just only focus on the job, as that already says quite a lot about the salary

#

How did you normalize the years experience?

#

or did you not?

thin palm
#

I normalized my numerical features, which were years experience and miles From Metropolis

#

Also I had a column of companyId which was 63 unique values that I OHE.. thought about dropping this as we have a industry feature that may also tell us about the companies and thus reduce our features by 63 columns

mild dirge
#

Well I mean, that is what we are trying to find out with this permutation right

#

Maybe there is one or two companies that pay their employees like 10x more than others, and it would still be good to keep those in

thin palm
fading gate
#

anyone here use hyperopt or some kind of DEOptim library? I'm not sure at least with hyperopt how to batch runs together, i.e, have it return a set of 50 or so samples that I can run at a time then return all 50 results before it iterates the next batch of parameters

tired fog
#

anyone know how complicated it is to convert a large matlab script to python?

#

not counting specific, just general similarity between syntaxes

mild dirge
#

I mean it heavily depends on what kind of code you want to convert

#

sometimes there are high level functions that are similar in both languages

#

If you want to plot something using matplotlib f.e. then obviously things are going to be quite similar etc.

#

I think i'm pretty comfortable with python, but having a hard time getting used to matlab, so to me the languages feel quite different

tired fog
#

gotcha

#

im much more familiar with computational computing with matlab

#

but i have an equation in the project im working on that wont plot in matlab, so i had to plot it in wolfram mathematica and generate a matrix of the values to import into matlab for a specific example

#

so im just trying to find a better solution, as i need my project to be able to run self sufficiently

lapis sequoia
#

do u know any trained cnn for salient object detection?

jade fjord
#

I'm kinda new with AI, but I made this program that used a dataset of types of brain tumors and used tensorflow to make a neural network which evaluates the accuracy it can distinguish the types of brain tumors, I was wondering how this could actually be used? Like what's the logic of using your own data entry of a made up brain tumor data and inputting it in this newly built AI?

serene scaffold
#

Also I'm confused. Did you create a model that does this, or are you wanting to make one?

jade fjord
#

I made one, but its with a csv file with data not actual images

#

such as width, height, depth, color, of certain examinations

#

and i ran it and it gave me a overall accuracy of around 97%

jade fjord
#

i can show the program its pretty small

#

`import pandas as pd
import tensorflow as tf

dataset = pd.read_csv('cancer.csv')

x = dataset.drop(columns=["diagnosis(1=m, 0=b)"])
y = dataset["diagnosis(1=m, 0=b)"]

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(256, input_shape=x_train.shape[1:], activation='sigmoid'))
model.add(tf.keras.layers.Dense(256, activation='sigmoid'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=1000)

model.evaluate(x_test, y_test)`

tacit basin
jade fjord
#

Like put it into practice

#

like if i ever wanted to make a UI and have some enter their own data

#

whats the next step after training a model and putting it into practice

tacit basin
#

You would need an input data in the same form as training data, without target of course. Then you can make prediction

jade fjord
#

so another csv file with inputted data

#

is there a way it can show me the value 0 or 1 for each entry that it predicts

tacit basin
jade fjord
#

well the tumor is either malignant or benign so it's set m as 1 and b as 0

#

so i meant show either as a 0 or 1

#

i can show one sec

tacit basin
#

Yeah, if that's the target then you will get such output on predict too.
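
One caveat, sketched below: with a sigmoid output layer like the one in the script above, Keras' `model.predict` returns a probability per row rather than a hard 0 or 1, so a threshold is still needed. The `probs` array stands in for the result of `model.predict(new_x)`; its values and the 0.5 cut-off are assumptions for illustration:

```python
import numpy as np

# Stand-in for model.predict(new_x): one sigmoid probability per row.
probs = np.array([[0.03], [0.92], [0.51]])

threshold = 0.5  # assumed cut-off; may need tuning for an imbalanced dataset
labels = (probs > threshold).astype(int).ravel()
```

Here `labels` has one 0 or 1 per input row, in the same order as the rows of the new CSV file.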

#

Just note that if your dataset is not balanced, then your 97% accuracy score could mean nothing if there are a lot more 1s than zeros

#

Say you have 97% of ones in your set. Then if you always predict 1 then you always have 97% accuracy. But you can always predict 1 without ML ;)
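
That baseline is easy to check before trusting an accuracy number. A minimal sketch on made-up labels:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Made-up labels: 97 ones and 3 zeros.
labels = [1] * 97 + [0] * 3
baseline = majority_baseline_accuracy(labels)  # 0.97
```

A model is only doing something useful if its test accuracy is clearly above this baseline.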

jade fjord
#

oh ya lol theres a lot more 0's below idk why theres a bunch of 1's in the beginning

untold smelt
#

could anyone take a look at my A* algorithm and tell me what im doing wrong

lapis sequoia
#

help pls

dapper axle
#

This may seem like a dumb question, But if im trying to get into data science and machine learning; Where is a good place to start or learn?

vernal solstice
#

hello

#

is this overfitting or underfitting?

#

im kinda confused

thin palm
#

I have 1 million lines of data... any advice on how to optimize this for training and fitting a model? My random forest regression takes soooooooooo long

#

still need to test other models as well

tacit basin
thin palm
#

Take a small sample instead of running experiments, feature engineering, and training baseline models on all the data. Typically, 10–20% is enough. Here is how it is done in pandas:

sample_df.shape
(191583, 120)
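
The sampling line itself did not survive in the message above. A minimal sketch of what such a step could look like with `DataFrame.sample`; the frame, its column names, and the 10% fraction are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the full dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "salary": rng.integers(50, 300, size=1000),
})

# Draw a reproducible 10% random sample of the rows.
sample_df = df.sample(frac=0.1, random_state=0)
```

Comparing `sample_df.describe()` with `df.describe()` is a quick way to check that the sample still looks like the full data.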
tacit basin
#

Also valid option

thin palm
tacit basin
#

Just need to be sure the sample of the data is a valid representation of the full dataset or, even better, production data

tacit basin
#

Do eda on the data and compare it to sample

thin palm
tacit basin
#

How do you train? What hparams?

thin palm
#

For time sake I may not do that, because it's for job interview assignment I'm sure it's a cool technique I'll show them. They gave me 1 million lines of data so that could've been a tester

tacit basin
#

1 million rows should not be that painful

thin palm
#

But why is it??

tacit basin
#

Unless you have millions of features?

thin palm
#

I'm testing 1 million lines of data and 27 features

#

I assumed that's what's taking it so long?

tacit basin
#

Depends on the machine as well

thin palm
#

@tacit basin Does this graph make sense to you for important features for determining your salary??

#

2021 Mac 16inch... Fairly good computer

#

really would've thought miles From Metropolis would've been least concern lol

tacit basin
#

Don't know not a salary expert. But would love to learn outcome of your study to optimize my salary :)

thin palm
hollow flare
#

I want roadmap for data analyst

errant fern
#

hey how can i apply kfold with MLPRegressor ?

#

i have 2 outputs and 7 input dimensions

#

`from pandas import read_csv
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, cross_val_score, validation_curve,cross_val_predict
import numpy as np
import matplotlib.pyplot as plt

inputs = df_xg_nu[feat]
targets = df_xg_nu[['home_team_goal','away_team_goal']]

x_train, x_test, y_train, y_test = train_test_split(
inputs, targets, test_size=0.25, random_state=2)

rate1 = 0.005
rate2 = 0.1

mlpr = MLPRegressor(hidden_layer_sizes=(12,10), max_iter=700, learning_rate_init=rate1)

scores = cross_val_predict(mlpr, inputs, targets, cv=5)
print(scores)`

#

I'm trying to fit elo scores against soccer match scores .

#

feat=['elo_offensive_1', 'elo_defensive_1', 'elo_home_offensive_1', 'elo_home_defensive_1', 'elo_offensive_2', 'elo_defensive_2', 'elo_away_offensive_2', 'elo_away_defensive_2','homey']
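
A minimal sketch of k-fold cross-validation with `MLPRegressor` and two targets. Note that `cross_val_predict` returns out-of-fold predictions rather than scores; `cross_val_score` with an explicit `KFold` gives one score per fold. The data below is synthetic, standing in for the Elo features and goal columns, and the scaling step is an addition that is not in the original script:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 7 input features, 2 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = X @ rng.normal(size=(7, 2)) + 0.1 * rng.normal(size=(200, 2))

model = make_pipeline(
    StandardScaler(),  # assumption: scaling inputs usually helps MLP training
    MLPRegressor(hidden_layer_sizes=(12, 10), max_iter=700,
                 learning_rate_init=0.005, random_state=0),
)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
rmse_per_fold = np.sqrt(-scores)  # scores are negated MSE, averaged over both targets
```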

carmine silo
#

hi, i am currently researching and making a script to autoplay my game, now i want to add a command line at line 23 so that it can recognize that the match has been matched earlier than expected. initial opinion and continue to execute the next commands in the event, how to do it?

steady basalt
steady basalt
compact rose
#

Hello guys, i'm currently working on Pyspark. I have a question about how I can do something. I want to eliminate rows based on column values. I have two features (Home,HomeVariant) and i want to drop if both are positive in the same row. How do i do this?

#

I wanted to use something like this : " If ViewHome = 1 & ViewHomeVariant = 1 | then drop

rose agate
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lean cave until <t:1653305221:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

dusty valve
#

how do i load data from a BatchDataSet tf?

fleet plover
arctic wedgeBOT
#

gdas.py line 396

self.nodes[n-1].connections[ni].forward(x, types=types)  # Ltrain(w±, alpha)
verbal hedge
#

Hello everyone

#

Please I need a little assistance in credit risk modeling in python. Link to some learning resources will do.

serene scaffold
compact rose
#

Does anyone know why it appear as null?

jade chasm
#

Does anyone know how to solve the cuda discrepancy between these two envs? I need cuda working on my yolo env as well for training some models

blissful locust
#

What all math concepts do I need to learn (like matrix) to learn ML using Scikit learn?

vital ruin
#

Hello! I am wanting to use the numba njit decorator in one of my classes but only if the import is successful. I was able to handle the import with:
with contextlib.suppress(Exception): from numba import jit
But I can't figure out how to only use the decorator if jit was imported. The only way I can get it to work is pull it out of the class and use a try and except block on a normal function and call it from the class but thats not very clean and i end up writing the function twice

serene scaffold
#

if you have jit = lambda func: func then using @jit as a decorator will have no effect

vital ruin
#

Ya I was able to handle the import just fine but if I add
@jit def func(self): ...
it will obviously error if I run it outside of my venv without numba installed

serene scaffold
#

simply suppressing the potential import error eliminates your only opportunity to know that the import failed

vital ruin
#

Ya I see what you are saying that I was just hoping for a way to do something like
try: @jit except: pass def func.....

#

inside the class

serene scaffold
vital ruin
#

Yep which sucks. I will probably just abstract two versions or use an interface or factory instead. Thanks for your help!

serene scaffold
#

you don't want to use my solution?

#

here's another option

HAS_NUMBA = True
try:
    from numba import jit
except ImportError:
    HAS_NUMBA = False

...

def func(a, b):
    ...

if HAS_NUMBA:
    func = jit(func)
vital ruin
#

Oh.... I understand what you mean now. I haven't used jit before now so I didn't look into if you could use it as not a decorator. My bad

#

Their overview just shows it being used as a decorator

serene scaffold
vital ruin
#

Got ya thank you!

serene scaffold
#

when you use a decorator, the function you're decorating gets passed to the decorator, which is also a function. and then whatever the decorator returns gets re-bound to the original name of the function

#

it's not even required that the decorator return a different function. or that it return a function

#

it can return whatever it wants
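
A minimal sketch of that rebinding, using made-up decorators (`shout` and `forty_two` are hypothetical names for illustration, not part of any library). It shows that `@decorator` is the same as calling the decorator by hand, and that the name ends up bound to whatever the decorator returns:

```python
def shout(func):
    """Decorator that returns a new function wrapping `func`."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


def forty_two(func):
    """Decorator that ignores `func` and returns a plain int instead."""
    return 42


@shout
def greet(name):
    return f"hello {name}"


def greet_manual(name):
    return f"hello {name}"


# Same effect as the @shout line above, written out by hand.
greet_manual = shout(greet_manual)


@forty_two
def answer():
    return "never called"


print(greet("ada"))         # HELLO ADA
print(greet_manual("ada"))  # HELLO ADA
print(answer)               # 42, because the name now refers to an int
```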

vital ruin
#

I'm currently passing kwargs to the decorator (nopython=True). I assume I can just add those in the decorator call?

#

Is there a standard for the kwarg for the function I am passing it?

serene scaffold
#
def func(a, b):
    ...

func = decorator(x, y, z)(func)

meaning that this is the semantics.
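
Putting the two ideas together, here is one possible sketch of an optional numba import whose fallback still accepts keyword arguments. The fallback `jit` and the `Accumulator` class are assumptions made up for this example; only `from numba import jit` and `nopython=True` come from the discussion. A static method is used because numba's nopython mode generally cannot compile a method that receives an arbitrary `self`.

```python
try:
    from numba import jit  # the real decorator, when numba is installed
except ImportError:
    def jit(*args, **kwargs):
        """No-op stand-in that supports both @jit and @jit(nopython=True)."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used bare: @jit
        return lambda func: func    # used with arguments: @jit(nopython=True)


class Accumulator:
    @staticmethod
    @jit(nopython=True)
    def total(n):
        # Plain loop: compiled by numba if available, ordinary Python otherwise.
        s = 0
        for i in range(n):
            s += i
        return s


print(Accumulator.total(10))  # 45
```

The function body is written only once, and the class does not need to know whether numba was importable.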

vital ruin
#

ok cool! Thanks!

serene scaffold
#

welcome to #decorators-and-ai

thin palm
#

any ideas on dropping the highly negative correlations? I'm afraid of dropping NONE because it's used up as the top correlation

steady basalt
#

I feel like feature selection is such a forced process in many projects

#

Which model will you use

#

Or will you choose best performing

thin palm
thin palm
steady basalt
#

Why don’t u run them all and choose best

#

It takes like 10 lines of code

#

Use a box plot

thin palm
thin palm
steady basalt
#

U can do it in one cell

thin palm
steady basalt
#

Run all three default models and output a plot of each average score over 10 folds

#

Ez

#

But u shud seriously consider adding other models to this plot

thin palm
steady basalt
#

Do 5+ I’d say

#

I think xgb has a regressor

thin palm
steady basalt
#

Output a box plot of each models cross fold scores

#

Cross validation

thin palm
steady basalt
#

Then consider taking the highest accuracy model

#

Unless you will prioritise AUC and ignore accuracy

thin palm
#

so I'm gonna take the RMSE of each model and compare them
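
A rough sketch of that comparison, assuming scikit-learn is available. The data is synthetic (`make_regression` stands in for the salary dataset, which is not shown here), and the three models are just examples:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real features/target (an assumption for illustration).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=30, random_state=0),
}

cv_rmse = {}
for name, model in models.items():
    # scikit-learn scorers are "higher is better", so RMSE comes back negated.
    scores = -cross_val_score(
        model, X, y, cv=10, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = scores
    print(f"{name:>14}: mean RMSE {scores.mean():.2f} (std {scores.std():.2f})")

best = min(cv_rmse, key=lambda name: cv_rmse[name].mean())
print("lowest mean RMSE:", best)
```

The per-fold arrays in `cv_rmse` are what a box plot would be drawn from, one box per model.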

steady basalt
#

Oh of course it isn’t classification

#

My bad

#

Ur predicting salary

thin palm
#

no worries

#

yes

steady basalt
#

Fair

thin palm
#

but they do ask for another metric so i'm gonna see what other good approaches are

#

maybe accuracy

steady basalt
#

That’s not rly gona work?

thin palm
#

why's that

steady basalt
#

You’re not going to be using true positive style metrics

#

Error is perfect

thin palm
#

oh true true

#

so what other measure?

steady basalt
#

do MSR RMSE uhh

#

MSE*

#

There’s also another error

#

MAR

#

Mae

thin palm
#

oh true true

steady basalt
#

I think rmse is popular
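
For reference, the three regression errors mentioned above can be computed by hand. This is a small sketch with made-up numbers, not the actual salary predictions:

```python
import numpy as np

y_true = np.array([50.0, 60.0, 80.0, 100.0])   # made-up salaries (in thousands)
y_pred = np.array([55.0, 58.0, 70.0, 110.0])   # made-up predictions

errors = y_pred - y_true            # [5, -2, -10, 10]
mae = np.mean(np.abs(errors))       # mean absolute error
mse = np.mean(errors ** 2)          # mean squared error
rmse = np.sqrt(mse)                 # root mean squared error, same units as y

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
```

RMSE and MAE are in the same units as the target, which makes them easier to read than MSE; RMSE penalizes large misses more heavily than MAE does.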

thin palm
#

How do I solve this question?
9. a) Please estimate the RMSE that your model will achieve on the test dataset.

mild dirge
#

without using the test dataset?

#

otherwise it wouldn't be an estimate right?

thin palm
#

I mean the question is a bit confusing to me haha

steady basalt
#

Yeah just estimate rmse post tuning?

#

It will spit out a result

#

But it’s gonna overfit

#

Bit odd

mild dirge
#

You'd probably need to use some validation dataset if you want to estimate your model's performance on the test data

#

Otherwise you can't really estimate what the performance would be on the test data, as you only have the training data available

thin palm
mild dirge
#

right but that would just straight up give you the rmse, not some estimation of it

thin palm
#

I thought the way to get an estimate was to split into X_train, y_train, X_test, y_test and then use X_test and y_test to get the estimate

mild dirge
#

Maybe they just want to have the rmse of the test data

thin palm
#

a) Please estimate the RMSE that your model will achieve on the test dataset.
b) How did you create this estimate?

mild dirge
#

you can just give the model the test input, compare the output with the desired output, which gives the actual rmse

#

but they want an estimate

thin palm
#

interesting

mild dirge
#

so a naive solution would be to give the rmse on your training data as an estimate of the rmse on your test data

thin palm
#

right

mild dirge
#

but this would be bad, as it will perform better on the data it is trained on than new data

mild dirge
#

so you want some new data, that it is not trained on

thin palm
#

hmm

mild dirge
#

i.e. validation data

thin palm
#

so then take the rmse of the test data?

mild dirge
#

no

#

that's not an estimate

#

that is the rmse on the test data

thin palm
mild dirge
#

you want to give an as close as possible estimate of your model's performance on the test data, without using the test data

#

Like I said, some data that the model is not trained on, as this would give a biased performance result

#

so Validation data

thin palm
mild dirge
#

if you haven't left out any validation data, you'd have to retrain your model on less data, and test it out on this validation data that you left out

#

or cross validation could work too

thin palm
#

perfect, well I made a .7 train and .3 test

#

so technically I did this

mild dirge
#

But then it would also be bad if you already used this data for tuning your parameters

thin palm
#

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate, train_test_split

# instantiate model
rf = RandomForestRegressor()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# examine scores
cv_scores = cross_validate(rf, X_train, y_train, cv=5)

#
rf.fit(X_train, y_train)
rf.score(X_test, y_test)
y_pred = rf.predict(X_test)


import numpy as np
from sklearn import metrics
# print results of MAE
print(metrics.mean_absolute_error(y_test, y_pred))
# print results of MSE
print(metrics.mean_squared_error(y_test, y_pred))
# print results of RMSE
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
mild dirge
#

So don't use the test data

#

Read my messages again if you don't understand it atm

thin palm
#

But now I'm just confused because how the heck do you find out the RMSE without using test data?????

#

and if I use my actual test.csv file that's not estimating

mild dirge
#

you don't find the rmse, that is not what they want

mild dirge
thin palm
#

okay right

mild dirge
#

do you have test data beyond this scope?

thin palm
#

to be used on my model

mild dirge
#

okay, and then you have some other file, that you split into train and test?

thin palm
mild dirge
#

So basically have 2 test sets, one is the actual test set, and one that you created

thin palm
#

yes

mild dirge
#

Okay, so one would be called the validation set normally

#

you have training data/validation data (both from 1 file), and then test data

#

From this you use only the training data for tuning the model, and training it

#

this can be done with k-fold cross validation

#

Then on the validation data you get an estimate of your model's rmse on new data

#

This is then also an estimate of your model's rmse on the test data, as this validation set is data that is also never shown to your model, yet you don't use the actual test data for this estimation
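
The workflow described above could look roughly like this. It is a sketch under assumptions: synthetic data stands in for the labelled training file, the real test.csv is never touched, and the random forest settings are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the labelled training file (an assumption).
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=1)

# Hold out a validation set that is never used for fitting or tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1
)

rf = RandomForestRegressor(n_estimators=30, random_state=1)

# Option 1: k-fold cross-validation on the training portion only.
cv_rmse = -cross_val_score(
    rf, X_train, y_train, cv=5, scoring="neg_root_mean_squared_error"
)

# Option 2: fit on the training portion, score once on the held-out validation set.
rf.fit(X_train, y_train)
val_rmse = np.sqrt(mean_squared_error(y_val, rf.predict(X_val)))

# For contrast: training RMSE is optimistic because the model has seen this data.
train_rmse = np.sqrt(mean_squared_error(y_train, rf.predict(X_train)))

print(f"CV RMSE (train only): {cv_rmse.mean():.2f} +/- {cv_rmse.std():.2f}")
print(f"validation RMSE:      {val_rmse:.2f}")
print(f"training RMSE:        {train_rmse:.2f}")
```

Either the cross-validated mean or the validation RMSE can be reported as the estimate for the unseen test file, as long as the validation rows were not also used to tune hyperparameters.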

thin palm
#

ok ok got it man, thank you very much!

steady basalt
#

Take 20%

mild dirge
steady basalt
mild dirge
#

and then you ofc wouldn't use the test set from the "training data" for training

mild dirge
#

but an estimate implies that they don't want you to use the test set

steady basalt
#

I think it does

#

In the end it’s always an estimate

mild dirge
#

then it would be the actual rmse

#

not if you specify that it is on the test data

#

it would be an estimate of the rmse on new data

steady basalt
#

Why would you use validation set

mild dirge
#

but it would be the rmse on the test data

steady basalt
#

I cross validate on training set

mild dirge
#

Because you want an unbiased estimate, so you can't use the data you use for tuning params

thin palm
#

I thought after cross validation this gives you an idea of what your model score is. Once I do that I FIT my model and then score it on x_test and y_test

#

thus giving me my RMSE Scores

steady basalt
#

U don’t use training anyway so what’s the difference

#

Oh

#

I’m not sure how much it impacts the final result

mild dirge
#

Here's my professional diagram

#

Maybe this clarifies a bit of the confusion

stark breach
#

Hey need help on a project

#

Is anyone ready to help .?

#

It’s based on linear regression

mild dirge
#

Maybe if you ask your question

stark breach
#

Yeah so I built this model on a car resale dataset and I just want to decrease the BIC

#

Also my model has like 30 features and if I reduce them my efficiency falls

#

Should I share the file ?

steady basalt
mild dirge
#

yeah, but 2 test sets

steady basalt
#

I Didn’t bother before

mild dirge
#

well you'd need an extra one if you'd want an estimate of the rmse on the test.csv data, without using test.csv

#

but normally you wouldn't

#

But maybe they just mean the actual rmse of the model on the test.csv data, it would be kinda unclear

robust jungle
#

this line is giving me an error:

outputs=(layers.Dense(output_nodes, activation=tf.nn.softmax))

code block:

def create_model(input_shape, output_nodes):
    model = keras.Model(
        (layers.Conv2D(filters=64, kernel_size=3, input_shape=input_shape[1:], activation='relu')),
        (layers.MaxPooling2D(2, 2)),
        (layers.Conv2D(filters=64, kernel_size=3, input_shape=input_shape[1:], activation='relu')),
        (layers.MaxPooling2D(2, 2)),
        (layers.Conv2D(filters=64, kernel_size=3, input_shape=input_shape[1:], activation='relu')),
        (layers.MaxPooling2D(2, 2)),
        (layers.Flatten()),
        (layers.Dense(64)),
        outputs=(layers.Dense(output_nodes, activation=tf.nn.softmax)),
        inputs=(tf.keras.Input(shape=(input_shape))),
    )
    return(model)

error:

TypeError: __init__() got multiple values for argument 'outputs'

first tensorflow project not directly using a tutorial, so apologies if it's a bit scuffed

mild dirge
#

the problem is in how you created the class instance

#

you supplied too many arguments for a single parameter

robust jungle
#

because tf.keras.Input(shape=(input_shape)) is in parentheses

#

so if it was to return multiple values

#

wouldn't they still be put into a single tuple?

mild dirge
#

(input_shape) is not a tuple btw

robust jungle
#

ah, what is it

mild dirge
#

it evaluates to input_shape

#

(input_shape,) is a tuple
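
A quick way to check this in an interpreter. The shape `(28, 28, 1)` is just an example value:

```python
input_shape = (28, 28, 1)  # example value, already a tuple

same = (input_shape)       # parentheses only group; this is still input_shape
wrapped = (input_shape,)   # the trailing comma builds a 1-element tuple

print(same is input_shape)       # True
print(len(wrapped))              # 1
print(type((5)), type((5,)))     # <class 'int'> <class 'tuple'>
```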

robust jungle
#

input_shape is a variable I have defined

#

it's a tuple

mild dirge
#

why are you doing shape=(input_shape) then?

#

why the brackets

robust jungle
#

no idea, probably did it because it looks better imo

#

deleted it

#

(the parentheses I mean)

#

but yeah specifically looking at this line:

outputs=(layers.Dense(output_nodes, activation=tf.nn.softmax))
#

print returns a single object as expected

mild dirge
#

this error normally comes up when you supply the argument as a positional argument, and then as a keyword argument

#

!e

def my_func(key, *args):
  print("blabla")

my_func('red', key='blue')
arctic wedgeBOT
#

@mild dirge :x: Your eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 4, in <module>
003 | TypeError: my_func() got multiple values for argument 'key'
mild dirge
#

like this

robust jungle
#

alright, thanks

mild dirge
#

So the problem is likely not that line, but one of the others

#

I'm more familiar with pytorch, so don't know what arguments it takes :/

robust jungle
#

it's fine, you seem to have found the issue

#

I appreciate it

tardy pelican
#

Hey, got this error, my drivers are updated, version is compatible

#

Used command from pytorch website to install pytorch

woven coral
#

whats wrong???

thin palm
#

Does this make sense for feature importances in a linear model for coef??

steady basalt
#

No

#

Have you tried RFE

thin palm
#

so this graph for feature importance is wrong correct?

thin palm
#

@steady basalt how about now

mild dirge
#

seem like very normal scores again 👀

thin palm
mild dirge
#

features that change your score by 100,000,000,000 seems pretty bad

thin palm
#

it's just in my dang linear model

mild dirge
#

There might be some other underlying issue

#

just seems weird

thin palm
#

but what could the underlying issue be?

#

Maybe this is an issue of over fitting?

mild dirge
#

No I doubt it, those values don't really make sense. Why would permuting any of your features result in scores that are 10^11 lower than what they were?

thin palm
#

I really don’t know because I’m going back through my process and everything seems to make sense

#

@mild dirge I’ve never seen such high values before in coefficient so I’m like what the heck

mild dirge
#

There seem to be some more normally valued features

#

maybe try to find the difference between these and the 10^11 score affecting features

thin palm
#

hmmm, just for time sake I may not include this aspect and only show feature importance of my random forest which makes more sense lol

mild dirge
#

What are you doing all this for btw?

#

some job application right?

scenic tulip
#

Hey guys I had a quick question. Working on gathering data to put into my neural network. I have files, containing lists of 20 elements. I want to keep track of the "trend" in the data, so if i had 2 arrays 1.) [ 1, 2 ] 2.) [ 2, 3] the trend would be [ +1, +1 ]. For some reason I can't seem to store the result to add to the next 2 arrays differences. Here's my code.

#

I guess I'm asking. How do you store the difference of the current 2 arrays im using, then apply it to the next 2 arrays difference?

mild dirge
#

Not sure what you mean exactly, but for time dependent data some initial approach would be rolling regression

#

@scenic tulip

scenic tulip
#

I'm not even at the point of applying it to the NN yet

thin palm
scenic tulip
#

If you see my code, you'll see what im trying to do.

mild dirge
#

So you have multiple arrays like:

[1, 2, 3]
[4, 6, 2]
[1, 9, 2]

and then you want difference arrays:

[3, 4, -1]
[-3, 3, 0]

?

mild dirge
scenic tulip
#

Yeah, I've managed to get the difference of the arrays, but how do I hold that value and then apply it to the next two arrays that get differenced?

mild dirge
#

"apply it to the next two arrays"

#

you have an array with differences, and two arrays with values, what do you mean apply

scenic tulip
#

[1, 2, 3]
[4, 6, 2]
[1, 9, 2]

#

ok 123 is first array

#

difference would be like you stated 3, 4, -1

#

between array 1 and array 2

#

now

#

I have that difference, how could i keep that data going by grabbing 4,6,2 and 1, 9 , 2s difference and adding it to the first difference

mild dirge
#

So let me first show you how to make the differences a lot easier

#

and then i'll show you how to easily do that second part

wooden sail
#

you can rewrite it as a multiplication by a single row vector from the left

mild dirge
#

!e

import numpy as np
values = np.array([
  [1, 2, 3],
  [4, 6, 2],
  [1, 9, 2]
])

differences = np.array([arr_b-arr_a for arr_a, arr_b in zip(values, values[1:])])
print(differences)
sum_of_differences = np.sum(differences, axis=0)
print(sum_of_differences)
#

oops 1 sec

wooden sail
#

should be the same as [-1,0,1] multiplied from the left

arctic wedgeBOT
#

@mild dirge :white_check_mark: Your eval job has completed with return code 0.

001 | [[ 3  4 -1]
002 |  [-3  3  0]]
003 | [ 0  7 -1]
mild dirge
#

like this?

#

so 0 7 -1 would be the sum of the differences*
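
An alternative sketch of the same computation using `np.diff`, plus `np.cumsum` to keep the accumulated trend after each new array. Since the differences telescope, the final sum is just the last row minus the first:

```python
import numpy as np

values = np.array([
    [1, 2, 3],
    [4, 6, 2],
    [1, 9, 2],
])

differences = np.diff(values, axis=0)            # row i+1 minus row i
running_trend = np.cumsum(differences, axis=0)   # trend accumulated after each step
total = values[-1] - values[0]                   # telescoping: same as summing all diffs

print(differences)     # [[ 3  4 -1], [-3  3  0]]
print(running_trend)   # [[ 3  4 -1], [ 0  7 -1]]
print(total)           # [ 0  7 -1]
```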

wooden sail
#
In [1]: import numpy as np
   ...: values = np.array([
   ...:   [1, 2, 3],
   ...:   [4, 6, 2],
   ...:   [1, 9, 2]
   ...: ])

In [2]: diffs = np.array([[-1,1,0],[0,-1,1]])

In [3]: diffs.dot(values)
Out[3]: 
array([[ 3,  4, -1],
       [-3,  3,  0]])

In [4]: adder = np.array([1,1])

In [5]: adder.dot(diffs.dot(values))
Out[5]: array([ 0,  7, -1])

In [7]: adder_diff = adder.dot(diffs)

In [8]: adder_diff
Out[8]: array([-1,  0,  1])

In [9]: adder_diff.dot(values)
Out[9]: array([ 0,  7, -1])
#

as corroboration. all you need to do is multiply by the vector [-1,0,1] from the left to get the sum of the differences. if you want to apply this to N matrices of values at the same time, concatenate them along the columns axis. the result will be a vector of size 3*N holding the summed differences

scenic tulip
#

hey im catching up on what you guys said, my neighbor needed help. sorry one sec

#

oh, wow that is cool never thought to use dot product. I was so caught up in handling the specific arrays rather than the entire dataset at one time.

#

so since im dealing with lists i should convert them to np arrays?

mild dirge
#

yeah both our methods use numpy arrays

scenic tulip
#

oh i guess i have to to use dot produ

#

ct

mild dirge
#

edd uses some mathematical tricks, I mostly use python tricks

scenic tulip
#

yeah im still new ish to python so, id rather learn those hehe

mild dirge
#

you can pick whichever one you find more intuitive

scenic tulip
#

thank you all much. I'll try some things and get back to ya with (hopefully) good results

wooden sail
#
In [10]: values2 = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [11]: vals_concat = np.concatenate((values,values2), axis=1)

In [12]: vals_concat
Out[12]: 
array([[1, 2, 3, 1, 2, 3],
       [4, 6, 2, 4, 5, 6],
       [1, 9, 2, 7, 8, 9]])

In [13]: adder_diff.dot(vals_concat)
Out[13]: array([ 0,  7, -1,  6,  6,  6])

to finish the example

#

and indeed, however you like. numpy is fast though, and it should scale really well with this type of operation

scenic tulip
#

to be honest the scope of my project is to take in Keno Drawings and identify a trend in the drawings since the beginning. then use that data in my NN

steady basalt
wooden sail
#

the trick is only that the operation you wanted can be written as a linear combination, and matrix multiplication is associative. so the whole thing can be done in a single product
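
The property that lets the two small operators be merged first is associativity, i.e. regrouping the product without reordering it. A short check, reusing the arrays from the session above (`A` and `B` are extra made-up matrices showing that the order of factors still matters):

```python
import numpy as np

values = np.array([[1, 2, 3], [4, 6, 2], [1, 9, 2]])
diffs = np.array([[-1, 1, 0], [0, -1, 1]])   # consecutive-row differences
adder = np.array([1, 1])                     # sums the difference rows

step_by_step = adder @ (diffs @ values)   # difference first, then sum
collapsed = (adder @ diffs) @ values      # merge the small operators first

print(step_by_step, collapsed)            # both are [ 0  7 -1]

# Regrouping is allowed, swapping the factors is not:
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(np.array_equal(A @ B, B @ A))       # False
```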

scenic tulip
#

to predict future trends

mild dirge
#

If you want to predict future trends, you don't need to calculate all the differences

#

if you train a rolling regression model it will already make some prediction on future values if you feed it new data

scenic tulip
#

im going back from the very beginning of when the drawings started

steady basalt
#

Could a neural network predict the universe; can I predict the euro millions

scenic tulip
#

i mean, the same array of 20 PRNG elements should never come up again in our lifetimes so. That's one point, but I've found the mean, median, mode, variance and std deviation of all drawings that exist

#

i think if i could see how the drawings fluctuate and correlate that with all the other simple maths i could have something

steady basalt
#

I think you’d need to get an opinion from the Japanese dude who invented the twister you need some serious math expertise

scenic tulip
#

LOL yeah

steady basalt
#

To rule out if this is even feasible

#

I don’t think it is

#

How do you know that they use a random seed

scenic tulip
#

nothing is impossible

steady basalt
#

The same one each time

scenic tulip
#

oh ive moved beyond that lol

steady basalt
#

What other approach can you do now

#

There’s nothing

#

It’s akin to predicting the future to the highest accuracy

scenic tulip
#

one thing i have noticed from data that ive gathered is most drawings actually average out to between 35 and 46

#

so i started trying to generate random arrays of 20 elements that fit that average range
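
For context, a back-of-the-envelope sketch of what a fair draw of 20 numbers out of 1 to 80 looks like. It assumes uniform sampling without replacement and uses Python's own generator, not the lottery's. Under that assumption the draw mean is centred on 40.5 with a standard deviation of 4.5, so most averages landing between 35 and 46 is what uniform randomness alone would produce:

```python
import math
import random

N, n = 80, 20   # pool size and numbers drawn, per the rules described in the chat

# Number of distinct 20-number drawings: "80 choose 20".
combos = math.comb(N, n)

# Mean of one draw (sampling without replacement): expectation and spread.
pop_mean = (N + 1) / 2
pop_var = (N ** 2 - 1) / 12
std_of_mean = math.sqrt(pop_var / n * (N - n) / (N - 1))  # finite-population correction

# Monte Carlo check with a seeded generator for repeatability.
rng = random.Random(0)
means = [sum(rng.sample(range(1, N + 1), n)) / n for _ in range(20000)]
inside = sum(35 <= m <= 46 for m in means) / len(means)

print(f"distinct drawings: {combos:.3e}")
print(f"expected draw mean: {pop_mean}, std of the mean: {std_of_mean}")
print(f"share of simulated draws with mean in [35, 46]: {inside:.2f}")
```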

steady basalt
#

Are you a mathematician

#

Ur gona need a team

#

Of top tier researchers

scenic tulip
#

im gonna need divine intervention

steady basalt
#

And millions of dollars

scenic tulip
#

yeah

#

well

steady basalt
#

Even still it isn’t possible

scenic tulip
#

no not millions, just will power

#

well, my NN should be able to do that work of mathematicians if i can find the right data and plug it in

steady basalt
#

No you’d need the worlds best statisticians and mathematicians

#

To know what to do

#

What are you trying to do it?

scenic tulip
steady basalt
#

Did you make it

scenic tulip
#

I did

#

its been months in the making

#

if you run get keno numbers prepare to wait, there's like 1.6 million drawings now of 20 numbers

steady basalt
#

I don’t really understand how one calculation can spit out the future values

scenic tulip
#

its a combination of multiple calculations, gathering all of them, and figuring out how to plug them into the NN

steady basalt
#

Explain it

scenic tulip
#

explain what?

#

how the process im using works?

steady basalt
#

What are the calculations

scenic tulip
#

Mean, median, mode, variance, and std deviation

steady basalt
#

How can they predict a future array

scenic tulip
#

for 1, no drawing will ever appear twice

steady basalt
#

Really?

scenic tulip
#

yep

steady basalt
#

Rules?

scenic tulip
#

well, not for quite a long time

#

80 numbers

#

20 get selected at random

#

you get to pick up to 10 numbers

#

numbers being 1 - 80

steady basalt
#

I want to know how summary statistics of draws can predict future draws

#

Because when it happens next time it’s random

scenic tulip
#

correct, but if you had a '1' show up 3 times in a row, i would bet it wont next time?

steady basalt
#

What’s the prize for the winning draw

scenic tulip
#

matching 10 numbers is 110k

steady basalt
#

Is this a popular lottery

scenic tulip
#

it is

steady basalt
#

I think if you managed to cheat every time you’d be given a Nobel prize

#

No joke

iron basalt
steady basalt
#

I don’t think it’s mathematically possible

scenic tulip
#

im actually not trying to cheat it. i want to see if they control the random factor or if it's actually random

steady basalt
#

If it’s using mersenne twister it’s randomised well enough

#

You just can’t predict that for 20 number array matching

scenic tulip
#

because with 80 numbers to randomly select from, if one of them got selected 3 times in a row, it isn't probable that it will be selected again

#

mersenne twisters are pseudo random, not random

steady basalt
#

😅

#

I’m gona need some equations

scenic tulip
#

look at my repo

#

it's got all the simple maths in it

steady basalt
#

And yet each flip is independent!

scenic tulip
#

depends on how many flips you are allowed

#

if you are allowed 4 flips?

iron basalt
#

If I flip a fair coin 3 times and get 3 heads, it does not mean that next must be a tails, nor a heads again. They are independent events.
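
This is easy to check empirically. A small simulation sketch with a simulated fair coin and a fixed seed for repeatability, which looks only at the flips that come right after three heads in a row:

```python
import random

rng = random.Random(42)
flips = [rng.random() < 0.5 for _ in range(200_000)]   # True = heads

# Collect the flip that follows every run of three consecutive heads.
after_three_heads = [
    flips[i]
    for i in range(3, len(flips))
    if flips[i - 3] and flips[i - 2] and flips[i - 1]
]

p_heads_next = sum(after_three_heads) / len(after_three_heads)
print(f"{len(after_three_heads)} runs of three heads found")
print(f"share of heads on the next flip: {p_heads_next:.3f}")
```

The share stays close to 0.5: a streak is rare before it starts, but once it has happened it tells you nothing about the next independent flip.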

steady basalt
#

Yes that’s true

scenic tulip
#

correct, but the odds are

steady basalt
#

But it’s annoying how chain flips are less likely

#

At the same time

scenic tulip
#

that it will be the other, the universe has to balance itself out at some point

iron basalt
#

It's a fallacy for a reason.

#

Often described as "the slot machine owes me".

steady basalt
#

You can try to probabilistically predict this lottery but I think it is an inhuman task unless you had entire research teams

scenic tulip
#

Keno is the only lottery in Ohio that is PRNG

iron basalt
#

PRNG huh.

scenic tulip
#

the rest get selected with the floating balls

iron basalt
#

Oh, well the floating balls are not PRNG.

steady basalt
#

Can anyone explain how odds change with prng

scenic tulip
#

yes it is PRNG, that's why initially i wanted to find the mersenne value of one of the drawings and compare it to the next

steady basalt
#

Compared to balls

wooden sail
#

if the repetition cycle of the prng is long, you'd need the statistics of hundreds of thousands of games in the past to find a pattern

scenic tulip
#

yes i know edd, i ran my code on a dummy array till i hit my 4.29 billion bit limit lol

wooden sail
#

a prng can still appear uniformly distributed over moderately long sample sizes

scenic tulip
#

2 ^^ 32

#

yeah, so if they used seed value 1, then i could find it if i iterated over it till i found it.
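
As a toy illustration of that idea only: the sketch below brute-forces the seed of Python's own `random` module over a tiny range. Everything about it is an assumption (the generator, the way a drawing is derived from the seed, and the seed range), so it says nothing about how a real lottery terminal works.

```python
import random


def draw(seed):
    """One toy 'drawing': 20 distinct numbers from 1..80 from a seeded generator."""
    return sorted(random.Random(seed).sample(range(1, 81), 20))


hidden_seed = 4242                # pretend this is unknown to the observer
observed = draw(hidden_seed)      # the only thing the observer gets to see

# Brute force: replay every candidate seed in a small, known range.
recovered = next(s for s in range(10_000) if draw(s) == observed)

print("recovered seed:", recovered)
print("next drawings match:", draw(recovered) == draw(hidden_seed))
```

The search only works because the generator, the seeding scheme and the seed range are all known here. If the seed comes from a non-deterministic source or changes between drawings, there is nothing to iterate over.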

steady basalt
#

How?

#

Wait a second

#

Don’t you pay per entry

iron basalt
#

You know which PRNG method they are using?

wooden sail
#

what if they hit you with a 1-time pad and they change the seed each time

steady basalt
#

Per iteration

scenic tulip
#

I do knot

#

not

delicate apex
steady basalt
#

You’d need to pay 4 billion times?

scenic tulip
#

no LOL

iron basalt
#

Do you know if they change the seed every once in a while?

scenic tulip
#

I do not, i haven't found it yet

wooden sail
scenic tulip
#

yeah i tried to do bit shifting with the mersenne....kinda like a reverse mersenne twister

iron basalt
#

If they just random the seed, then there is no way. The seeds are often done with not PRNG.

scenic tulip
#

problem is you can only do that with single int values

iron basalt
#

But if they kept using the same one for a long time, they might be dumb. Lotteries in the past have done dumber things.

scenic tulip
#

i speculated that they used system time as a seed too

steady basalt
#

@scenic tulip considered paying stats and math PhDs to help?

scenic tulip
#

lol no i want to do this

delicate apex
# steady basalt Can you explain

i'm not familiar with the cracking process, but mersenne twister is a very common random number generator that's not cryptographically secure, so its output can be predicted once enough of it has been observed. this doesn't matter in quite a few fields, however, so it's still used just about everywhere.

scenic tulip
#

i sat at the bar and watched the drawings. when the current drawing ends a 2 minute timer starts. I ran my code that used system time as a seed and generated PRNG arrays up until 1 minute. I compared my results with the following drawing to see if a certain time within that range was being used. Some arrays matched a lot, others not so much

#

it was difficult to tell if that was actually getting me anywhere other than more tipsy 😂

#

and i wanna be clear about this. lets say i found that mersenne value that could repeatedly win that would be "breaking" the game and is illegal. Im just trying to validate they are actually random and not done by someone after everyone has put their drawings in ....not trying to break laws lol

steady basalt
scenic tulip
#

that is the theory super

#

sorry probably a dumb idea but thank you all for helping anyways

steady basalt
#

can anyone help me with STATA

#

if theres any pro users

rugged falcon
#

is anyone here who has ever done something with CUDA (or more precisely cupy)?

serene scaffold
rugged falcon
#

ok

thin palm
#

How do I answer this question?
Please estimate the RMSE that your model will achieve on the test dataset.

rugged falcon
#

well: i was thinking that numpy/numba doing simple arithmetic operations on very big images is very slow

#

so when i have 20 cores

#

i can just use 5800 cores of my GPU

serene scaffold
#

yes, AI that involves images is often faster on a GPU

rugged falcon
#

so i looked into using CUDA and found cupy

serene scaffold
#

what GPU do you have

rugged falcon
#

i redid the exact same example this guy did

#

3070

#

and where he got a x10 increase in speed

#

i got 0.5x

#

so cupy is slower

serene scaffold
#

can you make a reproducible example, with every variable defined?

rugged falcon
#

u mean code example?

serene scaffold
#

yes. I happen to also have a 3070, so I can run it

rugged falcon
#

ok one sec

serene scaffold
#

but it has to be exact code that I can run without any changes

rugged falcon
#

well

#

i assume you have cupy?

serene scaffold
#

yes, anyone can download cupy at any time

rugged falcon
#

ok one sec

serene scaffold
#

remember to use markdown

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

rugged falcon
#

actually its not exactly the same as on website because he use 1000x1000x1000 float64 and this does not fit in our GPU

#

so we compare 950^3

#
import numpy as np
import cupy as cp
import time

shaper = 950
arrShape = (shaper,shaper,shaper)

#NUMPY
time0 = time.perf_counter()
arr = np.ones((arrShape))
time1 = time.perf_counter()
print(time1-time0)

#CUPY
time0 = time.perf_counter()
arr = cp.ones((arrShape))
time1 = time.perf_counter()
print(time1-time0)
serene scaffold
#

still working on it

rugged falcon
#

hm what u mean?

serene scaffold
#

the installation for cupy isn't as straightforward as intended

rugged falcon
#

take ur time all good. im just happy u help me xd

rugged falcon
#

x)

serene scaffold
harsh nexus
#

Anyone experienced with LDA? Got some issues with the topics (numbering?) and the words are not corresponding with the correct topic number. Using Gensim

serene scaffold
#

I have to reboot my computer, apparently

#

woo I can import cupy now

harsh nexus
#

Sorry for interrupting :))

late peak
#

Hey I've got a Pandas question in #help-lemon if anybody's got some bandwidth

serene scaffold
#

@rugged falcon behold the results.

In [9]: import cupy

In [10]: %timeit cupy.random.random((500, 500, 500))
The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached.
18.8 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: import numpy as np

In [12]: %timeit np.random.random((500, 500, 500))
601 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
rugged falcon
#

so how is that possible?

serene scaffold
rugged falcon
#

no

#

that is what i would expect from a task that cuda is designed for

serene scaffold
#

so what is your question

rugged falcon
#

how come mines not working

serene scaffold
#

idk

rugged falcon
#

we have same Hardware

#

prolly same vram not that it matters with 500**3

#

how can i recreate what u did with %timeit

#

this should be equivalent to what you did @serene scaffold or am i missing something?

import time

import cupy
timecp0 = time.perf_counter()
cupy.random.random((500, 500, 500))
timecp1 = time.perf_counter()

import numpy as np
timenp0 = time.perf_counter()
np.random.random((500, 500, 500))
timenp1 = time.perf_counter()

print(timecp1-timecp0)
print(timenp1-timenp0)

0.186s for cupy
0.46s for numpy

so our numpy is similar but your cupy is 10x faster. what am i missing?

serene scaffold
#
python -m pip install IPython
python -m IPython
#

you can run that if you want to use IPython in a shell.

rugged falcon
#

but this has nothing to do with IPython having no GIL right?

serene scaffold
#

IPython is just a console. the %timeit command will run a statement a bunch of times and report the average

#

which is more reliable than what you're doing.
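
Outside IPython, the standard library's `timeit` module can do roughly the same thing. A small sketch, where `work` is a made-up stand-in for the statement being measured:

```python
import statistics
import timeit


def work():
    # Stand-in workload; replace with the call you want to measure.
    return sum(i * i for i in range(10_000))


loops = 100
# 7 runs of 100 calls each, similar in spirit to "%timeit: 7 runs, N loops each".
totals = timeit.repeat(work, number=loops, repeat=7)
per_call = [t / loops for t in totals]

mean_us = statistics.mean(per_call) * 1e6
std_us = statistics.stdev(per_call) * 1e6
print(f"{mean_us:.1f} µs ± {std_us:.1f} µs per loop (7 runs, {loops} loops each)")
```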

rugged falcon
#

im so confused

#

!e
print(18.8*1e-6,"s")

arctic wedgeBOT
#

@rugged falcon :white_check_mark: Your eval job has completed with return code 0.

1.88e-05 s
rugged falcon
#

ok i implemented "run a statement a bunch of times"

#

i end up with 400µs, so yours is still 20x faster

rugged falcon
#

*C#

serene scaffold
#

oh

#

IPython is not ironpython, whatever that is

#

IPython is the whole name of it.

rugged falcon
#

oh

#

@serene scaffold

In [5]: %timeit cupy.random.random((500, 500, 500))
The slowest run took 8.97 times longer than the fastest. This could mean that an intermediate result is being cached.
14.3 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit np.random.random((500, 500, 500))
489 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
rugged falcon
#

so ur results are reproducible

#

why would i ever not work in this IPython thing if its faster than without?

rugged falcon
serene scaffold
#

it's just telling you how fast it is.

rugged falcon
#

so if ipython is not making anything faster

#

then why time.perf_counter() has

#

how you say it

serene scaffold
#

(mean ± std. dev. of 7 runs, 1 loop each)
it did it seven times and reported the average

rugged falcon
#

"right to exist"?

serene scaffold
rugged falcon
#
for i in range(0,10):
    
    timecp0 = time.perf_counter()
    cp.random.random((500, 500, 500))
    timecp1 = time.perf_counter()


    
    timenp0 = time.perf_counter()
    np.random.random((500, 500, 500))
    timenp1 = time.perf_counter()

    print(timecp1-timecp0)
    print(timenp1-timenp0)

@serene scaffold wouldnt u agree it technically should do exactly the same (except summing/averaging)

serene scaffold
rugged falcon
#

from the doc it states:
4.665306263360271e-07 2,143,482
for resolution / tickrate

rugged falcon
serene scaffold
rugged falcon
#

ok i will google this behavior

#

one more question though, if you don't mind

#

can you think of a reason why the first execution is always so slow?

serene scaffold
#

it might be that cupy does additional startup when its used for the first time, rather than eagerly when it's first imported.

rugged falcon
#

hmm okay so some magic going on like always

#

thanks for comparison/answer/help!

#

!close

#

dxd

serene scaffold
#

I'm doing another test.

#

this seems to support my theory

In [1]: import cupy

In [2]: _ = cupy.random.random((100, 100, 100))

In [3]: %timeit cupy.random.random((100, 100, 100))
57.9 µs ± 706 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
rugged falcon
#

i dont really believe it tbh

#

cupy advertises itself with several x up to 800x speed increases

#

and you get 16000x

serene scaffold
#

you have to look at what operations they did to measure that speedup

#

simply allocating an array is just one thing
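
One thing that may also inflate GPU numbers like these (an assumption about this measurement, not something established in the chat): CuPy queues GPU work asynchronously, so a timer can stop before the kernel has actually finished. The sketch below compares timings with and without an explicit synchronize. The helper `best_time` is hypothetical, and the CuPy part is skipped when no usable CUDA device is found.

```python
import time

import numpy as np

try:
    import cupy as cp  # optional: only used when a CUDA device is usable
    if cp.cuda.runtime.getDeviceCount() < 1:
        cp = None
except Exception:
    cp = None


def best_time(fn, repeats=5, sync=None):
    """Return the fastest wall-clock time of `fn` over `repeats` runs."""
    fn()  # warm-up run, excluded from the measurement
    if sync is not None:
        sync()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - start)
    return min(times)


shape = (100, 100, 100)
cpu = best_time(lambda: np.random.random(shape))
print(f"numpy: {cpu * 1e6:.1f} µs")

if cp is not None:
    sync = cp.cuda.Stream.null.synchronize
    gpu_async = best_time(lambda: cp.random.random(shape))
    gpu_synced = best_time(lambda: cp.random.random(shape), sync=sync)
    print(f"cupy, no sync:   {gpu_async * 1e6:.1f} µs")
    print(f"cupy, with sync: {gpu_synced * 1e6:.1f} µs")
```

If the synced number is much larger than the unsynced one, the unsynced figure mostly measures how long it takes to launch the work, not to complete it.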

rugged falcon
#

yes but look at what u did

#

random.random

serene scaffold
#

so?

rugged falcon
#

this implies all array entries being initialized as random value

serene scaffold
#

it doesn't imply anything. it just means that each element is filled as a random float between 0 and 1.

rugged falcon
#

yes

serene scaffold
#

that's not implied; it's the spec for that function

rugged falcon
#

yes yes sry english xd

#

i can assume that ones() is even easier because it skips RNG part?

#

or faster

serene scaffold
#

I guess

rugged falcon
#

because when i follow what u did (with not only doing it once, but doing it a few times with %timeit)

#

i get not 10x increase

#

but way better increase

serene scaffold
#

Idk if TDS has quality control

late peak
stone pollen
#

for a beginner starting out in python (kinda know the basics) and wanting to do AI,ML,DS in the future. what would you guys suggest as good beginner courses for that

barren wedge
#

How to make PyTorch faster?

harsh nexus
thin palm
#

If anyone has the time to glance over my deliverable for this job interview I am completing and give some feedback that would be great! I'm only 8 slides in but take a peek 🙂
https://docs.google.com/presentation/d/1ibyiIDu-b3k3y_yI4UwVdqK7I-4F2Kh5AFBUD74N5p4/edit#slide=id.g12e390617fa_0_811

paper trellis
#

whats a good method to automatically identify and remove columns with no change (from a csv file recording data for a long period of time that may have channels not hooked up to anything, but was never turned off)?

wooden sail
#

you could do a finite difference approximation of the derivative/gradient of the quantity you are measuring. if it is close enough to zero, you omit it

#

you'd probably want one that is accurate to 2nd order
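
A minimal sketch of that idea, assuming the CSV has already been loaded into a pandas DataFrame. The function name `drop_flat_columns`, the tolerance, and the sample channels are all made up for illustration. `np.gradient` uses second-order central differences at interior points.

```python
import numpy as np
import pandas as pd


def drop_flat_columns(df, tol=1e-9):
    """Drop numeric columns whose finite-difference gradient never exceeds `tol`."""
    numeric = df.select_dtypes(include="number")
    # np.gradient uses second-order central differences at interior points
    grad = np.gradient(numeric.to_numpy(dtype=float), axis=0)
    flat = np.abs(grad).max(axis=0) <= tol
    return df.drop(columns=numeric.columns[flat])


log = pd.DataFrame({
    "t": [0.0, 1.0, 2.0, 3.0],
    "ch1": [0.10, 0.40, 0.35, 0.90],
    "ch2": [5.0, 5.0, 5.0, 5.0],  # dead channel, never changes
})
cleaned = drop_flat_columns(log)
print(list(cleaned.columns))
```

The tolerance matters for noisy sensors: a disconnected channel may still jitter slightly, so `tol` would need to sit above the noise floor. For an exact "never changes" check, comparing `df.nunique()` against 1 is a simpler alternative.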

barren wedge
desert bear
#

hi, is there someone who knows what you can use to predict something in tensorflow lite? in tensorflow it is just predict, but in tflite you don't have that.

upper spindle
#

anyone here specialises in econometrics, specifically microeconometrics

dusty valve
#

getting this error while fitting a tf model ```
ValueError: Exception encountered when calling layer "lstm" (type LSTM).

slice index 0 of dimension 0 out of bounds. for '{{node strided_slice_1}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](transpose, strided_slice_1/stack, strided_slice_1/stack_1, strided_slice_1/stack_2)' with input shapes: [0,?,50], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.

Call arguments received by layer "lstm" (type LSTM):
  • inputs=tf.Tensor(shape=(None, 0, 50), dtype=float32)
  • mask=None
  • training=True
  • initial_state=None```
#
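```python
# imports assumed here; the original message did not show them
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# define model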
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
model.fit(X, y, batch_size=128, epochs=100)```
shell panther
#

@ machine learning community
what do you prefer? reading research papers or watching YT Vids on those papers?

wooden sail
#

they can't achieve the same thing unless the youtube video is a recording of a talk or lecture that covers the paper in depth. you can watch videos while you have your coffee in the morning, but once you find an interesting idea, you need either a lecture on it or to read the full paper yourself if you wanna understand all of it

mild dirge
#

For pytorch how does one encode a target output for multi-label classification?

#

using multi-hot encoded rn like [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
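
For reference, a multi-hot float target like that is the usual input for `torch.nn.BCEWithLogitsLoss`, with one independent sigmoid per label. The sketch below uses made-up logits and a 0.5 threshold as assumptions; it does not address the specific error mentioned next, which is not shown in the log.

```python
import torch
import torch.nn as nn

# one sample, seven labels; logits are raw model outputs (values made up)
logits = torch.tensor([[2.0, -1.0, 0.5, -3.0, -2.0, -0.5, 1.5]])
target = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]])  # multi-hot, float

loss = nn.BCEWithLogitsLoss()(logits, target)

# turn logits into per-label 0/1 predictions before computing metrics
pred = (torch.sigmoid(logits) > 0.5).float()
accuracy = (pred == target).float().mean()
print(loss.item(), accuracy.item())
```

Many metric functions expect hard 0/1 predictions of the same shape and dtype as the target, so thresholding first avoids a common mismatch.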

#

But getting this when trying to calculate some performance measure using the prediction and the label

manic elk
#

Hello, I believe this fits in data science. Could anyone help me with manipulating csv files? I'm trying to convert a file by deleting a column, copying 65,000 rows of data, and adding a header on top.

All of this should be done in a new file so to preserve the original file
It's trying to take a file that looks like the first image and convert it to a file that looks like the 2nd image. The header info is derived from the original file + a second file for Date, Time and VUnit.

I'm struggling trying to code this algorithm so I would appreciate any assistance 🙂 thank you so much!
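
A rough pandas sketch of those steps (drop a column, keep the first N rows, write a header block on top, save to a new file). Everything specific here is hypothetical: the column names, the header lines, and the tiny in-memory "files" stand in for the real ones, which are only visible in the missing images.

```python
import io

import pandas as pd

# stand-in for the original CSV file (hypothetical columns)
raw = io.StringIO("time,ch1,ch2\n0,1.0,9\n1,1.5,9\n2,2.0,9\n")
df = pd.read_csv(raw)

# drop the unwanted column and keep the first rows (head(65_000) in the real case)
trimmed = df.drop(columns=["ch2"]).head(2)

# hypothetical header lines; in practice these would be built from both source files
header_lines = ["Date: 2022-01-01", "VUnit: mV"]

out = io.StringIO()  # stand-in for open("converted.csv", "w", newline="")
out.write("\n".join(header_lines) + "\n")
trimmed.to_csv(out, index=False)
result = out.getvalue()
print(result)
```

Because the output goes to a separate file object, the original file is never modified.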

manic elk
#

@serene scaffold I have very little experience with pandas unfortunately, could you drop some function suggestions or dataframe manipulation techniques to achieve this?

loud cove
#

What's happening here? Why does the type change not work?

#

the dtypes returns float but it isn't as seen with info()

serene scaffold
loud cove
#

I thought it would, guess I'll have to reassign?

serene scaffold
serene scaffold
#

and drop

serene scaffold
# loud cove

you didn't chain the additional method calls onto read_csv, so nothing interesting happened.

loud cove
#

I don't understand, shouldn't df.astype({'Quantity': 'float64'}).dtypes just change it?

serene scaffold
#
df = pd.read_csv(...).astype({'Quantity': 'float64'})
#

this is what I mean by chaining.

loud cove
#

yes i understand

serene scaffold
#

you didn't do that.

loud cove
#

but how is this any different from doing it in another line with df?

serene scaffold
#

because if you don't display it or save it to a variable, it just creates a new object and immediately throws it away.
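
A small illustration of that point, using a made-up one-column frame: `astype` returns a new DataFrame and leaves the original untouched unless the result is assigned.

```python
import pandas as pd

df = pd.DataFrame({"Quantity": ["1", "2", "3"]})

df.astype({"Quantity": "float64"})       # result is discarded, df is unchanged
before = str(df["Quantity"].dtype)

df = df.astype({"Quantity": "float64"})  # reassign to keep the converted frame
after = str(df["Quantity"].dtype)

print(before, after)
```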

loud cove
#

nah

#

the issue was the last argument as you said

#

the .dtypes just returns that when I don't care about it

serene scaffold
#

I don't think I can help with this. sorry.

loud cove
#

you already did, removing it fixed

#

dtypes seem to return the series,

manic elk
manic elk
#

@serene scaffold thank you so much

loud cove
#

I'm trying to group by columns (a list of columns other than Quantity) and get the sum of the quantity, but it doesn't seem to be working
df2 = df.groupby(columns).agg(Quantity = ('Quantity', 'sum'))
my other idea is just dropping duplicates then getting sum by order ID and merging.

desert oar
desert oar
# loud cove I'm trying to group by `columns` (a list of columns other than `Quantity`) and g...

!e ```python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Type': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Quantity': [3, 2, 4, 9, 4, 7],
})

columns = ['Category', 'Type']

result1 = data.groupby(columns).agg({'Quantity': 'sum'})

result2 = data.groupby(columns)[['Quantity']].sum()

result3 = data.groupby(columns)['Quantity'].sum()

print(result1)
print()
print(result2)
print()
print(result3)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |                Quantity
002 | Category Type          
003 | A        X            7
004 |          Y            2
005 | B        X            4
006 |          Y           16
007 | 
008 |                Quantity
009 | Category Type          
010 | A        X            7
011 |          Y            2
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ixepomulew.txt?noredirect

loud cove
#

this is what I ended up doing.

desert oar
#

something appears to be wrong with your data then

#

if you provide a sample of your actual data (use our paste site) then i can investigate further
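
One data-related cause worth ruling out (a guess, not a diagnosis of this particular file): by default `groupby` drops every row whose grouping key contains a missing value, so orders with an empty address or phone field would silently vanish from the sums. The sketch uses made-up orders; `dropna=False` requires pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    "Order ID": ["1", "1", "2", "2"],
    "Shipping State": ["NY", "NY", np.nan, np.nan],  # order 2 has no state
    "Quantity": [1.0, 2.0, 3.0, 4.0],
})
keys = ["Order ID", "Shipping State"]

default = orders.groupby(keys)["Quantity"].sum()            # order 2 is dropped
kept = orders.groupby(keys, dropna=False)["Quantity"].sum()  # order 2 is kept

print(default)
print(kept)
```

Grouping only by the column that identifies an order, then joining the other attributes back, sidesteps the problem entirely.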

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

loud cove
#

yeah that's why I posted, all columns are strings other than quantity.
the columns list have column names other than quantity.

desert oar
#

right, but without seeing the data you're forcing people to guess

loud cove
loud cove
# desert oar right, but without seeing the data you're forcing people to guess

only code i ran

columns = ["Sale Code",
"Order ID",
"Store Name",
"Player First Name",
"Player Last Name",
"Shipping First Name",
"Shipping Last Name",
"Shipping Address",
"Shipping City",
"Shipping State",
"Shipping Zip",
"Billing Phone",
"Billing Email"]
df = pd.read_csv("order_report.csv", usecols=(columns + ["Quantity"]), dtype=str).astype({'Quantity': 'float64'})
df.info()
desert oar
#

thanks for sharing the file

#

as a side note, you probably do not want to include sale code in the grouping columns... it looks like a unique row id

loud cove
#

it is here, but not always.

desert oar
#

what do you mean by that?

loud cove
#

in the current file, it is the same, but in the future there might be different ones.

desert oar
#

ok

#

and why are you casting quantity to float?

loud cove
#

ultimately, the reason im passing those columns is just because i just want the total for each order, the rest of the attributes will always be the same.

desert oar
#

oh, i see

loud cove
#

yea

desert oar
#

i will show you a tidier way to do that

loud cove
#

thanks, im also curious why that isn't working since i kept googling for quite a bit and just went with my idea of merging.

desert oar
#

i don't actually understand what this merge is supposed to be doing

#

why are you still using the tuple? i explained why that shouldn't work