wooden sail May 22, 2022, 10:37 AM

#

use this as a poor guess of the covariance matrix, and include that in the cost function

tidal bough May 22, 2022, 10:38 AM

#

hmm, I'm not sure how to do it. Covariance matrix of what, for one? The deviations of individual samples?

#

ah, if I understand correctly, you're basically saying the variance depends on p here - higher close to the center, smaller at the edges. So, diagonal covariance matrix but not an identity one.

wooden sail May 22, 2022, 10:39 AM

#

yep, that's exactly what i mean

#

the least squares function you proposed is optimal when the deviation from the mean follows an IID normal distribution

tidal bough May 22, 2022, 10:40 AM

#

my naive idea would be to do something like (((pred - Y)**2)/pred).mean() to attempt to fix that - to make samples far from the center have more weight. This essentially assumes that the variance is proportional to the value here
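A minimal sketch of that weighting, assuming `pred` and `Y` are equal-length sequences of positive values and that the variance of each sample is proportional to its predicted value. The function names, the `eps` guard and the sample numbers are made up for illustration.

```python
def weighted_mse(pred, y, eps=1e-12):
    """Mean of squared residuals, each divided by the predicted value.

    Assumes the noise variance is proportional to the prediction,
    so each squared residual is scaled by 1/pred before averaging.
    eps guards against division by zero.
    """
    if len(pred) != len(y):
        raise ValueError("pred and y must have the same length")
    terms = [(p - t) ** 2 / max(p, eps) for p, t in zip(pred, y)]
    return sum(terms) / len(terms)


def plain_mse(pred, y):
    """Ordinary least-squares cost: assumes equal variance everywhere."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(pred)


# Hypothetical numbers: one sample near the peak, one in the tail.
pred = [100.0, 1.0]
y = [110.0, 2.0]
print(plain_mse(pred, y))     # dominated by the large-valued sample
print(weighted_mse(pred, y))  # both samples contribute equally here
```

With the plain cost the tail sample barely matters. After dividing by `pred`, both residuals carry the same weight, which is the effect described in the message.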

wooden sail May 22, 2022, 10:41 AM

#

that works too, sorta like the noise-based rescaling in wiener filters

#

this looks like a fun sunday afternoon task 😛

tidal bough May 22, 2022, 10:44 AM

#

really shitty way I've been thinking of: sample some random values according to this probability distribution. Calculate the mean and std of these samples. Use them. 🥴

wooden sail May 22, 2022, 10:45 AM

#

that can be done... but it's kinda like solving another copy of this exact same problem at every single point on the graph

#

nothing wrong with that ofc, it's a montecarlo approach

tidal bough May 22, 2022, 10:46 AM

#

ah right, of course, I didn't think of it that way (that it's just the monte-carlo solution to this).

wooden sail May 22, 2022, 11:03 AM

#

i kinda wanna try it out now 😛 care to share the data? x and y axes

mild dirge May 22, 2022, 11:22 AM

#

Im planning on training a multi-label classifier for some project. The goal is that I have a classifier that can tell if one or multiple specific letters are in an image. Could I still use only training images with a single letter in the image?

#

I also have images of bigrams and trigrams, but the problem would be that all images would have very different shapes. Would a solution to this be padding uni-grams and bi-grams with white space on the left and right before classification?
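A rough sketch of the padding idea, assuming grayscale images stored as lists of rows where 255 is white. The helper name `pad_to_width` and the tiny example image are hypothetical.

```python
def pad_to_width(image, target_width, fill=255):
    """Center a grayscale image (list of rows) in a wider canvas.

    fill=255 is white for 8-bit grayscale; that value is an assumption.
    """
    width = len(image[0])
    if width > target_width:
        raise ValueError("image is wider than the target width")
    extra = target_width - width
    left = extra // 2
    right = extra - left
    return [[fill] * left + list(row) + [fill] * right for row in image]


# Hypothetical 2x3 single-letter crop padded to the width of a trigram crop.
letter = [[0, 0, 0],
          [0, 255, 0]]
padded = pad_to_width(letter, target_width=8)
print(len(padded), len(padded[0]))
```

Padding keeps the letter's aspect ratio intact, unlike resizing every crop to one shape, which would stretch single letters.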

wooden forge May 22, 2022, 11:31 AM

#

wooden sail i kinda wanna try it out now 😛 care to share the data? x and y axes

me ? sorry I just checked

wooden sail May 22, 2022, 11:33 AM

#

yeah, you indeed. idk if i'll have a chance now, but i can play around with it at some point soonish i think. maybe tomorrow

#

or i can assign it to my students and see what they come up with haha

wooden forge May 22, 2022, 11:34 AM

#

would you rather have the numpy file directly or the github repo with all the functions?

#

it's basically for my internship at uni

wooden forge May 22, 2022, 11:34 AM

#

wooden sail or i can assign it to my students and see what they come up with haha

hooo

wooden sail May 22, 2022, 11:35 AM

#

just the numpy file, some npy file with the vectors inside

arctic wedgeBOT May 22, 2022, 11:36 AM

#

Hey @wooden forge!

It looks like you tried to attach file type(s) that we do not allow (.npy). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

wooden forge May 22, 2022, 11:36 AM

#

:|

#

you know what

#

Imma put them in my repo

#

and share you the link

#

https://github.com/ChrisZeThird/Kicked-Rotor Quantum>Graphs>Spin @wooden sail

GitHub - ChrisZeThird/Kicked-Rotor: Observation of generalized Anderson quantum phase transitions using a Kicked Rotor.

#

Also Edd, you can use the different scripts to generate Gaussian distributions

#

Basically the Quantum Case (quantumKickedRotor.py)

#

you can also play with the initial State, this one is a simple Dirac spike, but you can put whatever you want

#

for the spin case, Psi is a ket with a Dirac distribution and a 0 array, here again you can put whatever you want

#

I'm kinda busy rn so I can't generate some array for you

#

you can also play with the beta parameters, that just allows you to compute an average value

wooden sail May 22, 2022, 11:44 AM

#

if you could generate it and upload it later on, that'd be best. i'm familiar, but not fluent in this notation 😛

wooden forge May 22, 2022, 11:44 AM

#

wooden sail if you could generate it and upload it later on, that'd be best. i'm familiar, b...

ho sure no problem

#

just remind me later in the day

mild dirge May 22, 2022, 11:46 AM

#

What is wrong with my pd dataframe here?

import pandas as pd

file_path = 'data/ngrams_frequencies_withNames.xlsx'
df = pd.read_excel(file_path)
print(df.head(), '\n')

print(df['Frequencies'].head())

#

#

Do the hebrew characters somehow flip the stuff? haha

wooden forge May 22, 2022, 11:49 AM

#

when the helper needs help you know it's a hard question

#

maybe it's the utf thing ?

#

you have to specify that there are hebrew characters somehow ?

mild dirge May 22, 2022, 11:58 AM

#

Yeah I'm just dropping them since the information is also supplied by the names column, but definitely weird

serene scaffold May 22, 2022, 12:01 PM

#

@mild dirge did you try making a copy and filling the names column with Latin letters? Or some other LtR characters?

mild dirge May 22, 2022, 12:01 PM

#

No, but after removing them it displays it as normal

serene scaffold May 22, 2022, 12:01 PM

#

Welp

mild dirge May 22, 2022, 12:01 PM

#

👍🏽

serene scaffold May 22, 2022, 12:02 PM

#

I blame Zig and Scaleios.

rose agate May 22, 2022, 12:06 PM

#

mild dirge Im planning on training a multi-*label* classifier for some project. The goal is...

why do you need to train anything? can you use an existing trained Optical Character Recognition?

mild dirge May 22, 2022, 12:06 PM

#

Well they are hebrew characters, and it's a school assignment to make the ocr and segmentation etc. ourselves

rose agate May 22, 2022, 12:07 PM

#

tesseract says it supports hebrew but if you have to do it yourself that makes sense

mild dirge May 22, 2022, 12:18 PM

#

It's also really old text, and I'm making a multi-label classifier for better image segmentation, so I need more than just the most probable letter

manic zenith May 22, 2022, 1:21 PM

#

Hi, can someone help me interpret the results of my VECM estimation? I have a hard time understanding the coefficients that I get!

lone yacht May 22, 2022, 1:24 PM

#

Hello again everyone. I have been training a CNN in PyTorch to classify images from the CIFAR10 dataset, with the stipulation that it only be three hidden layers. With help from @mild dirge I changed the architecture to be 2 convolutional layers and a fully connected linear layer, leading to much stabler results.

#

While tuning hyperparameters, I noticed that there continued to be a gap between the training and validation data; the former achieved accuracies of 70-80% while the latter languished around 45-50% (picture 1 below). After implementing weight decay of 0.01 in the optimiser, I significantly closed the gap, but with the result that the training accuracy itself now achieves only around 45% (picture 2 below). Does anyone have any suggestions to what I could tinker with to improve the accuracies while keeping the training-validation gap small?

The hyperparameters are:

batch size of 8
optimiser using SGD with a l.r. of 0.001, momentum of 0.9
cross entropy loss function

#

#

acc_conv2_size3_stride2_batchsize8_weightdecay0.01.png

#

If you need any more information, please let me know :)

steady basalt May 22, 2022, 1:32 PM

#

In case anyone’s wondering, After Steve is a good read so far (1/7th through)

wooden forge May 22, 2022, 2:47 PM

#

lone yacht

for a moment I thought you were doing what I am doing and I got so confused

#

and then I read the legend lol

mild dirge May 22, 2022, 2:51 PM

#

lone yacht

It's probably important to note that your goal shouldn't be to get a small gap between the train/valid accuracy, the second graph isn't better than the first imo

#

The first just shows over-fitting, and the second shows that the model is not even able to correctly predict the patterns it is trained on, thus maybe even underfitting

#

What size images do you have?

#

@lone yacht

lone yacht May 22, 2022, 2:53 PM

#

Hi @mild dirge , the images are 32x32 pixels with 3 channels for RGB

mild dirge May 22, 2022, 2:54 PM

#

A problem could be that the receptive field of the final convolutional layer is too small to learn more general patterns

#

Currently it is 5x5 for each feature

#

#

https://theaisummer.com/receptive-field/

AI Summer

Understanding the receptive field of deep convolutional networks | ...

An intuitive guide on why it is important to inspect the receptive field, as well as how the receptive field affect the design choices of deep convolutional networks.

#

You could try experimenting with different kernel sizes, and maybe dilation to increase this receptive field, that might help depending on your data
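A small sketch of how the receptive field grows with kernel size, stride and dilation. The layer settings below are assumptions: two 3x3 convolutions with stride 1 reproduce the 5x5 figure quoted above, but the actual architecture isn't shown in the chat.

```python
def receptive_field(layers):
    """Receptive field (one side, in input pixels) of stacked conv layers.

    layers: list of (kernel_size, stride, dilation) tuples, input to output.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent features
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1, 1), (3, 1, 1)]))  # two 3x3 convs, stride 1
print(receptive_field([(3, 1, 1), (3, 1, 2)]))  # second conv with dilation 2
print(receptive_field([(5, 1, 1), (5, 1, 1)]))  # two 5x5 convs
```

On 32x32 inputs, a feature that sees only a 5x5 patch covers a small fraction of the image, so larger kernels, dilation or striding are the obvious knobs to try.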

lone yacht May 22, 2022, 2:57 PM

#

mild dirge You could try experiment with different kernel sizes, and maybe dilation to incr...

Thanks Camel, I'll have a look at that 👍

scenic tulip May 22, 2022, 3:48 PM

#

Hello all, I'm working on a NN that can predict the next dataset from a trend. As of now I can gather the differences from one array to another, the problem is what would my inputs be as they would be different each time and how could I use that to help determine outputs?

#

This is what I have so far and it works, but I'm uncertain how to store the "trend" data.

arctic wedgeBOT May 22, 2022, 3:48 PM

#

Hey @scenic tulip!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

#

Hey @scenic tulip!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

scenic tulip May 22, 2022, 3:51 PM

#

https://paste.pythondiscord.com/rozikafixe

#

So I have lists in a file each containing 20 elements, which are integers. I need to be able to find the difference of the first 2, take that set of values and apply it against the next list....so on and so forth until there are no more lists to process

fervent vale May 22, 2022, 4:06 PM

#

Hello guys, I have some data with a ranking of X names each name assigned with a score. The ranking is performed according to the score going from the highest to the lowest score. I want to perform some clustering to see if there are some clusters in the dataset between the names. Is k-means approach the best way to do this ?

steady basalt May 22, 2022, 4:21 PM

#

scenic tulip So I have lists in a file each containing 20 elements, which are integers. I ne...

What model are you using, linear or logistic regression?

#

Oh a neural network

#

I don’t think a neural network would be required for that

scenic tulip May 22, 2022, 4:28 PM

#

linear

steady basalt May 22, 2022, 4:28 PM

#

What’s the NN for

scenic tulip May 22, 2022, 4:28 PM

#

to predict the trend based on all previous results

steady basalt May 22, 2022, 4:29 PM

#

How big is this dataset?

#

Is each feature a cumulative sum of the previous ?

#

Or it’s just independently made from two of its own columns

scenic tulip May 22, 2022, 4:29 PM

#

ok so im trying to predict Keno, so i have all the previous keno drawings

steady basalt May 22, 2022, 4:29 PM

#

What is keno

scenic tulip May 22, 2022, 4:29 PM

#

ive done simple maths on the entire dataset

#

Keno is a lottery game

steady basalt May 22, 2022, 4:30 PM

#

If ur trying to predict the lottery 😂

scenic tulip May 22, 2022, 4:30 PM

#

but it doesn't get picked by floating balls that get shat out tubes, it uses PRNG

steady basalt May 22, 2022, 4:30 PM

#

Ohh

scenic tulip May 22, 2022, 4:30 PM

#

I tried to find the seed that matched the first lottery drawing

#

after 4.29 billion combos still nothing

steady basalt May 22, 2022, 4:30 PM

#

Sounds hopeless

#

How many features do u have

scenic tulip May 22, 2022, 4:31 PM

#

so i don't think i can find the mersenne value of an array lol, or else id find them all

steady basalt May 22, 2022, 4:31 PM

#

One for each week results?

scenic tulip May 22, 2022, 4:31 PM

#

no each day

#

https://paste.pythondiscord.com/rozikafixe

steady basalt May 22, 2022, 4:31 PM

#

By mersenne you’re referencing the twister?

scenic tulip May 22, 2022, 4:32 PM

#

yes, because PRNG is actually not random. so if i could find the MT values of the drawings, perhaps i could find a correlation between the PRNG idk lol

steady basalt May 22, 2022, 4:33 PM

#

Wait what is a mersenne number again

scenic tulip May 22, 2022, 4:33 PM

#

ive looked into bit shifting, which in my case id have to reverse the mersenne twister. that's fine with single integers

#

the mersenne number is what is used to generate the random number

steady basalt May 22, 2022, 4:33 PM

#

scenic tulip yes, because PRNG is actually not random. so if i could find the MT values of th...

Sounds completely impossible dude

#

Interesting topic though

scenic tulip May 22, 2022, 4:33 PM

#

so if i seed the PRNG with the value of 1 and run it, it will produce a result, if i close and restart it produces the same result

#

if you run that same value of 1 in a loop of x, you get different results because they bit shift the seed value to produce a new result

steady basalt May 22, 2022, 4:34 PM

#

Ahhh mersenne is 2^n-1

scenic tulip May 22, 2022, 4:34 PM

#

that being said, i do think its quite impossible so im trying to go back through and find the trend of all the data

steady basalt May 22, 2022, 4:35 PM

#

Wouldn’t you need like

#

Infinite drawings

scenic tulip May 22, 2022, 4:35 PM

#

problem is im not sure if it's accumulating the data right

#

well kinda LOL

steady basalt May 22, 2022, 4:35 PM

#

There’s prob only been a few thousand winning combos so far

scenic tulip May 22, 2022, 4:35 PM

#

but there are a finite amount of drawings

steady basalt May 22, 2022, 4:35 PM

#

I do not think that’s enough

scenic tulip May 22, 2022, 4:36 PM

#

hmmm

steady basalt May 22, 2022, 4:36 PM

#

Even if u had 100,000 drawings

#

Just a load of pseudo random draws

#

What’s the plan?

scenic tulip May 22, 2022, 4:36 PM

#

ill show you what i tried before

#

from random import randrange, sample, shuffle
import numpy as np
import time as t

ra = []
fa = []
temp = 0.0
inf = float('inf')
counter = 0
actualcount = 0
dummyarr = np.array([ 1, 2, 18, 19, 22, 24, 25, 35, 41, 44, 45, 50, 52, 54, 55, 59, 67, 68, 70, 74])

while np.array_equiv(dummyarr, ra) != True:
    actualcount += 1
    np.random.seed(counter)
    #ra = np.random.choice(range(1, 80), size=20, replace=False)
    ra = np.array(sample(range(1, 81), k=20))
    fa = np.sort(ra)
    ra = fa
    print(ra)
    if actualcount <= inf:
        counter += 1
        temp = inf
print(ra)
print("Count")
print(counter)
print("ActualCount")
print(actualcount)
print("Infinity = {}".format(temp))

#

so in that i had a dummy array that i tried to find the seed value for. that failed after i ran out of range on 64 bit which is 4.29 billion combos. i think theres like a trillion combos for keno, not sure
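One thing worth noting about the loop above: it calls `np.random.seed`, but `random.sample` draws from Python's own `random` module, so the seed never affects the draw. NumPy's legacy `np.random.seed` also only accepts integers from 0 to 2**32 - 1, which is where the 4.29 billion ceiling comes from. Below is a minimal sketch that seeds the generator that actually does the drawing. It assumes, purely for illustration, that the target draw came from Python's Mersenne Twister with a small integer seed and the same sampling call. Nothing in the chat confirms the lottery works that way.

```python
import random


def draw(seed, pool=80, picks=20):
    """One sorted keno-style draw from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, seeded explicitly
    return sorted(rng.sample(range(1, pool + 1), picks))


def find_seed(target, max_seed):
    """Return the first seed in [0, max_seed) reproducing `target`, else None."""
    target = sorted(target)
    for seed in range(max_seed):
        if draw(seed) == target:
            return seed
    return None


# Self-check on a draw whose seed is known (1234 is an arbitrary choice).
known = draw(1234)
print(find_seed(known, max_seed=5000))
```

The self-check only shows that the search works when the generator, seeding scheme and sampling method all match. A real draw could differ in any of those, in which case no seed in the range will ever match.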

#

you like the actualcount <= inf ? 😂

steady basalt May 22, 2022, 4:42 PM

#

are you trying to roll until you hit the dummy?

#

How long did that run for?

#

It’s not a trillion it’s 20^20 combos right?

#

Not possible unless you have nasa supercomputer from 100 years in the future

#

Even if u did manage to land on their combo that doesn’t tell u about any seed value and it doesn’t tell u if they use a constant value

#

Waste of time

scenic tulip May 22, 2022, 4:55 PM

#

well there are 80 numbers

#

20 get picked each draw
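For scale, the number of distinct draws (20 numbers out of 80, order ignored) can be computed directly. It comes to roughly 3.5 × 10^18, far more than either the "trillion" guess or the 2**32 seeds that were tried. This is a quick sketch using only the standard library.

```python
import math

# Number of distinct 20-number draws from 80 numbers (order ignored).
draws = math.comb(80, 20)
print(f"{draws:.3e}")

# Size of a 32-bit seed space, for comparison.
seeds_32bit = 2 ** 32
print(seeds_32bit)
print(draws // seeds_32bit)  # possible draws per available 32-bit seed
```

So even an exhaustive sweep of every 32-bit seed can reach only a tiny fraction of the possible draws.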

#

yeah i was using different seed values to try to hit the dummy array, but yeah finding all those would be impossible

#

you get to pick up to 10 numbers. I just really want to see if they truly randomize the game with PRNG or if someone is behind the scenes inputting the numbers

#

i ran it for like 2 days and it hit 4.29 billion combinations before it ran out of memory for the counter

scenic tulip May 22, 2022, 5:00 PM

#

steady basalt Even if u did manage to land on their combo that doesn’t tell u about any seed v...

It really should because it would give me the seed value for that set of drawings. if i can find the seed values in a consecutive set of drawings I could tell if it's actually random. really i would need to sample everything. there are actually alot of ways to go about it but it's just time

#

I could figure out how they shifted the bits to arrive at that conclusion, basically deciphering the mersenne and reverse mersenne twistering all future drawings into the correct seed

fading geyser May 22, 2022, 5:18 PM

#

hey guys... who all here know how to use pandas

#

i needed some help

serene scaffold May 22, 2022, 5:21 PM

#

fading geyser hey guys... who all here know how to use pandas

well, just ask your pandas question. try dragging the CSV file into this chat and explain what you are trying to do.

fading geyser May 22, 2022, 5:22 PM

#

i'm trying to read a csv file into jupyter notebooks but i am getting an error

serene scaffold May 22, 2022, 5:22 PM

#

fading geyser i'm trying to read a csv file into jupyter notebooks but i am getting an error

copy/paste the text of the error

fading geyser May 22, 2022, 5:23 PM

#

table = pd.read_csv("wineReviews.csv")

le Code*

arctic wedgeBOT May 22, 2022, 5:23 PM

#

Hey @fading geyser!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

serene scaffold May 22, 2022, 5:23 PM

#

read the message

fading geyser May 22, 2022, 5:24 PM

#

it's too lon

#

long

serene scaffold May 22, 2022, 5:24 PM

#

yes, so read the message and it tells you what to do.

arctic wedgeBOT May 22, 2022, 5:26 PM

#

Hey @fading geyser!

It looks like you tried to attach file type(s) that we do not allow (.docx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold May 22, 2022, 5:26 PM

#

you have to use the paste bin

#

!paste

arctic wedgeBOT May 22, 2022, 5:26 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

timid narwhal May 22, 2022, 5:27 PM

#

does anyone know a good way to open .fst files inside of python to be used with pandas?

fading geyser May 22, 2022, 5:27 PM

#

https://paste.pythondiscord.com/ajedaxotac

#

try this

fading geyser May 22, 2022, 5:27 PM

#

timid narwhal does anyone know a good way to open .fst files inside of python to be used with ...

sry bro no idea

fading geyser May 22, 2022, 5:27 PM

#

serene scaffold you have to use the paste bin

yeah just did it

serene scaffold May 22, 2022, 5:27 PM

#

fading geyser yeah just did it

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 10: invalid start byte -- so the problem is that the file encoding for the csv file is not utf-8
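Some background on that error: byte 0x93 is above 0x7F, so it isn't valid ASCII either. In Windows-1252 (`cp1252`) it is the left curly double quote, which often ends up in files exported from Office tools. The sketch below uses made-up file contents to show the failure and one possible fix. Whether the real file is actually cp1252 is a guess.

```python
import csv
import io

# Hypothetical file contents: a phrase wrapped in Windows "smart quotes".
raw = b"name,notes\nMerlot,\x93bold\x94 finish\n"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

# cp1252 maps 0x93 and 0x94 to curly double quotes.
text = raw.decode("cp1252")
rows = list(csv.reader(io.StringIO(text)))
print(rows[1])
```

The pandas counterpart would be passing `encoding="cp1252"` to `pd.read_csv`. If the result still looks like gibberish, the file may not be a plain-text CSV at all.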

fading geyser May 22, 2022, 5:28 PM

#

and how to change it

serene scaffold May 22, 2022, 5:28 PM

#

try pd.read_csv("wineReviews.csv", encoding='ascii')

fading geyser May 22, 2022, 5:28 PM

#

ohk

#

File "<ipython-input-9-b6ef65161787>", line 1
table = pd.read_csv("wineReviews.csv" encoding='ascii')
^
SyntaxError: invalid syntax

serene scaffold May 22, 2022, 5:29 PM

#

add the comma

fading geyser May 22, 2022, 5:29 PM

#

this error is showing

serene scaffold May 22, 2022, 5:31 PM

#

I have to leave soon, btw. did you add the comma and re-run?

fading geyser May 22, 2022, 5:31 PM

#

https://paste.pythondiscord.com/wucomidabo

fading geyser May 22, 2022, 5:31 PM

#

serene scaffold I have to leave soon, btw. did you add the comma and re-run?

yeah it didn't work

fading geyser May 22, 2022, 5:32 PM

#

fading geyser https://paste.pythondiscord.com/wucomidabo

this is the error

serene scaffold May 22, 2022, 5:33 PM

#

try pd.read_csv("wineReviews.csv", encoding_errors='ignore')

fading geyser May 22, 2022, 5:33 PM

#

https://paste.pythondiscord.com/ucunapoxiy

#

another error

serene scaffold May 22, 2022, 5:34 PM

#

fading geyser https://paste.pythondiscord.com/ucunapoxiy

I won't be able to diagnose the problem without seeing the file itself, unfortunately.

#

something about the way it's encoded is unexpected.

fading geyser May 22, 2022, 5:34 PM

#

📎 wineReviews.csv

#

this is the file

serene scaffold May 22, 2022, 5:34 PM

#

why does it look like that?

fading geyser May 22, 2022, 5:34 PM

#

no idea

serene scaffold May 22, 2022, 5:35 PM

#

how did you get it?

fading geyser May 22, 2022, 5:35 PM

#

from kaggle.com

#

an open dataset

serene scaffold May 22, 2022, 5:35 PM

#

it must be corrupted in some way. because csv files are supposed to be human readable

#

like, it will literally just be comma-separated values

fading geyser May 22, 2022, 5:35 PM

#

ok

#

i'll see it tmrw now ig.... its 12 am here

serene scaffold May 22, 2022, 5:36 PM

#

night night

fading geyser May 22, 2022, 5:36 PM

#

night

steady basalt May 22, 2022, 5:44 PM

#

scenic tulip I could figure out how they shifted the bits to arrive at that conclusion, basic...

If u crack it share the winnings with me

#

I’ll take 1%

fervent vale May 22, 2022, 6:00 PM

#

Hello, I have some data with a ranking of X names each name assigned with a score. The ranking is performed according to the score going from the highest to the lowest score. I want to perform some clustering to see if there are some clusters in the dataset between the names. Is k-means approach the best way to do this ?

serene scaffold May 22, 2022, 6:41 PM

#

fervent vale Hello, I have some data with a ranking of X names each name assigned with a scor...

clustering involves points in space. what are the coordinates going to be?

fervent vale May 22, 2022, 6:47 PM

#

serene scaffold clustering involves points in space. what are the coordinates going to be?

Ok so basically I have 7 sub-rankings and 1 ranking which aggregate the 7 sub rankings. Can these 7 sub-rankings be my coordinates ?

#

Or should I lower the number of variables ?

thin palm May 22, 2022, 7:36 PM

#

if we have a correlation that shows -.7 between two features, my intuition would be to drop it. What's everyone else's thoughts on this?

mild dirge May 22, 2022, 7:37 PM

#

correlation of -0.7, what does that mean you think?

#

correlation of 0 means no correlation, 1 means a lot, -1 means a lot (but in negative direction)

#

so when one is higher, the other is lower and vice versa

#

Doesn't sound like useless information

#

and even then, no correlation between a feature and the desired output does not mean the feature cannot be useful in combination with other features
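A toy illustration of that last point, on made-up data: the target is the product of two ±1 features, so each feature alone has zero correlation with it, yet together they determine it exactly. The `pearson` helper is written out by hand to keep the example self-contained.

```python
import math
from itertools import product


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: the target is the product of two +/-1 features.
rows = list(product([-1, 1], repeat=2))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a * b for a, b in rows]

print(pearson(x1, y))  # each feature alone is uncorrelated with y
print(pearson(x2, y))
print(pearson([a * b for a, b in rows], y))  # combined, they determine y
```

A linear model would not find this interaction on its own, but a tree-based model or an explicit interaction term could, which is why a near-zero correlation alone is a weak reason to drop a feature.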

serene scaffold May 22, 2022, 7:47 PM

#

fervent vale Ok so basically I have 7 sub-rankings and 1 ranking which aggregate the 7 sub ra...

I guess. what would it mean if two instances were close together according to that coordinate system?

thin palm May 22, 2022, 7:47 PM

#

mild dirge and even then, no correlation between a feature and the desired output does not ...

okay gotcha, makes sense. Good looks on this

fervent vale May 22, 2022, 7:59 PM

#

serene scaffold I guess. what would it mean if two instances were close together according to th...

It would mean that 2 names have a similar profile in their ranking profile. Just imagine X names with classes rankings in Maths, Physics, English, German etc. People from a « scientific » cluster would have typically higher scores in Physics & Maths etc

#

Does that make sense to you

#

Ok dropping rankings w/ an absolute correlation of 0.7 makes sense

#

And above*

mild dirge May 22, 2022, 8:01 PM

#

yeah, so if you want to use those grades to cluster the people with different sets of skills that would work

fervent vale May 22, 2022, 8:01 PM

#

OK

mild dirge May 22, 2022, 8:01 PM

#

It wouldn't really give a label for each cluster though

#

it would just show a cluster for people with certain grades for certain subjects, but it won't say something like "scientific people"

#

But maybe you find a cluster of people who are really good at science subjects, and that they are commonly bad at languages f.e.

fervent vale May 22, 2022, 8:02 PM

#

Thats perfect, I can interpret each category by myself, is kmeans the proper approach to do this ?

mild dirge May 22, 2022, 8:03 PM

#

https://scikit-learn.org/stable/modules/clustering.html

scikit-learn

2.3. Clustering

Clustering of unlabeled data can be performed with the module sklearn.cluster. Each clustering algorithm comes in two variants: a class, that implements the fit method to learn the clusters on trai...

#

Maybe check this out

#

there is more than kmeans, some work better for specific settings

#

The interpretation of the clusters would mainly be the mean of the cluster (so the average grades for the courses), the spread and the amount of points in a cluster
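To make that interpretation concrete, here is a bare-bones k-means sketch in plain Python on invented grades. The data, the column meaning and the deterministic farthest-first initialisation are all assumptions for illustration. In practice `sklearn.cluster.KMeans` from the page linked above would be the usual choice.

```python
import math


def farthest_first(points, k):
    """Deterministic init: start at points[0], then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep as is
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical (maths, physics, english, german) grades for six people.
scores = [
    (9, 8, 3, 2), (8, 9, 2, 3), (9, 9, 3, 3),
    (2, 3, 9, 8), (3, 2, 8, 9), (3, 3, 9, 9),
]
centroids, labels = kmeans(scores, k=2)
for c, centre in enumerate(centroids):
    print(c, labels.count(c), [round(v, 2) for v in centre])
```

Each printed line gives the cluster index, its size and its mean grades per subject, which is the information the message suggests reading off to interpret a cluster.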

fervent vale May 22, 2022, 8:05 PM

#

Perfect

#

Thanks

#

Do you have some idea about which approach would be the most adapted to the problem at first sight

#

From your experience

#

?

mild dirge May 22, 2022, 8:06 PM

#

absolutely not, I don't have your data 😛

#

Might be good to try visualize your data in some way, and get to understand it such that you can make an informed decision yourself

#

the link I just gave also shows what usecases each method has, try to see what your situation is with respect to those usecases

fervent vale May 22, 2022, 8:08 PM

#

Very good advice

#

I will do that

thin palm May 22, 2022, 8:49 PM

#

Does anybody know how to interpret this? It's telling me yearsExpereince, milesFromMetropolis, and degree have the least benefits? no way

Screen_Shot_2022-05-22_at_2.48.55_PM.png

stoic viper May 22, 2022, 8:53 PM

#

Hey, i have a project for university. Its the prediction of energy usage of electric busses.
I decided to use XGBoost and after all i get an r2_score of 0.991. Im really surprised cause that felt really easy and now im worried something isn't right.

#

I mean i put such low effort into it.

#

Is going for a lower r2score better sometimes?

mild dirge May 22, 2022, 8:55 PM

#

thin palm Does anybody know how to interpret this? It's telling me yearsExpereince, milesF...

Why do those values go so high? wut

thin palm May 22, 2022, 8:55 PM

#

mild dirge Why do those values go so high? wut

that's what I'm sayingggg

thin palm May 22, 2022, 8:55 PM

#

mild dirge Why do those values go so high? wut

maybe because I have 92 features

mild dirge May 22, 2022, 8:55 PM

#

Must be something that went wrong right?

#

Nah doubt it

#

how is it scored anyways?

thin palm May 22, 2022, 8:55 PM

#

mild dirge Must be something that went wrong right?

well I'm not sure... my model scored 74%

stoic viper May 22, 2022, 8:56 PM

#

thats surprisingly good for that values lol

mild dirge May 22, 2022, 8:56 PM

#

can you try scoring='accuracy'

#

?

#

or does it take really long to run?

thin palm May 22, 2022, 8:56 PM

#

mild dirge or does it take really long to run?

def takes some time to run, let me try it real quick

mild dirge May 22, 2022, 8:57 PM

#

That way we at least know what the score means

#

not sure what the default is

thin palm May 22, 2022, 8:57 PM

#

mild dirge That way we at least know what the score means

cv_score['test_score'].mean()

mild dirge May 22, 2022, 8:57 PM

#

apparently "If None, the estimator’s default scorer is used"

#

didn't know the model would have an associated score

thin palm May 22, 2022, 8:57 PM

#

got an error?
ValueError: Classification metrics can't handle a mix of multiclass and continuous targets

stoic viper May 22, 2022, 8:58 PM

#

you try to put regression on classification

#

if im not mistaken

thin palm May 22, 2022, 8:58 PM

#

ahh we can't use accuracy for regression metrics

stoic viper May 22, 2022, 8:58 PM

#

yeah

mild dirge May 22, 2022, 8:58 PM

#

thin palm well I'm not sure... my model scored 74%

Your model scored an accuracy you say?

#

But isn't the problem a regression problem?

stoic viper May 22, 2022, 8:59 PM

#

mean squared error

#

pls

mild dirge May 22, 2022, 8:59 PM

#

yeah but how did you get this accuracy?

thin palm May 22, 2022, 8:59 PM

#

mild dirge But isn't the problem a regression problem?

Correct a regression problem, I guess technically I put a cross validate score which found the 'test_score'.mean()

mild dirge May 22, 2022, 9:00 PM

#

alright, try ‘r2’

stoic viper May 22, 2022, 9:00 PM

#

or neg mean squared
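For context on the metrics being suggested: accuracy counts exact label matches, so it has no meaning for a continuous target like salary. scikit-learn regressors use R² as their default `score`, and `'r2'` and `'neg_mean_squared_error'` are the matching `scoring` strings. Both metrics are written out by hand below on made-up numbers to show what they measure.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared residual; lower is better, in squared target units."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical salaries (in thousands) and model predictions.
y_true = [50, 100, 150, 200]
y_pred = [60, 90, 160, 190]
print(mean_squared_error(y_true, y_pred))
print(r2_score(y_true, y_pred))
```

An R² of 1 means perfect predictions and 0 means no better than always predicting the mean, so a score around 0.74 reads as a share of variance explained, not as a percentage of correct answers.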

thin palm May 22, 2022, 9:00 PM

#

okay running now

mild dirge May 22, 2022, 9:00 PM

#

and what kinda values is your y data?

#

like on average

thin palm May 22, 2022, 9:01 PM

#

mild dirge and what kinda values is your y data?

it's a salary column

mild dirge May 22, 2022, 9:01 PM

#

alright, so nothing over a few million then I assume

thin palm May 22, 2022, 9:01 PM

#

mostly int64 of numbers ranging from 50k to 300k

stoic viper May 22, 2022, 9:01 PM

#

thats a hell lot of money

thin palm May 22, 2022, 9:01 PM

#

surgeons lmao

mild dirge May 22, 2022, 9:01 PM

#

rubles*

thin palm May 22, 2022, 9:01 PM

#

haha USD

stoic viper May 22, 2022, 9:01 PM

#

then those numbers are in cents?

thin palm May 22, 2022, 9:02 PM

#

no it's like this for example 120, 200, 45, etc

mild dirge May 22, 2022, 9:02 PM

#

what is the NONE feature btw?

thin palm May 22, 2022, 9:02 PM

#

mild dirge what is the NONE feature btw?

NONE meaning no degree obtained

mild dirge May 22, 2022, 9:02 PM

#

ah right

#

so it's a 1 or 0?

#

and what is the model?

thin palm May 22, 2022, 9:03 PM

#

so we want to predict salaries, degree obtained is with ordinal encoding- 0(HS) 1(bachelor's) etc

#

LinearRegression

#

score came back: 0.7434167004759114

thin palm May 22, 2022, 9:04 PM

#

mild dirge and what is the model?

^ first model is super basic, no features being dropped just wanna test it out

mild dirge May 22, 2022, 9:05 PM

#

yeah thought if it was a nn, it might just only focus on the job, as that already says quite a lot about the salary

#

How did you normalize the years experience?

#

or did you not?

thin palm May 22, 2022, 9:05 PM

#

mild dirge yeah thought if it was a nn, it might just only focus on the job, as that alread...

Normalized them with MinMax Scaler

#

I normalized my numerical features, which were years experience and miles From Metropolis

#

Also I had a column of companyId which was 63 unique values that I OHE.. thought about dropping this as we have a industry feature that may also tell us about the companies and thus reduce our features by 63 columns

mild dirge May 22, 2022, 9:08 PM

#

Well I mean, that is what we are trying to find out with this permutation right

#

Maybe there is one or two companies that pay their employees like 10x more than others, and it would still be good to keep those in

thin palm May 22, 2022, 9:09 PM

#

mild dirge Well I mean, that is what we are trying to find out with this permutation right

Basically yes, my correlation matrix showed some cool things and the table made sense, but then I ran into this

fading gate May 22, 2022, 9:13 PM

#

anyone here use hyperopt or some kind of DEOptim library? I'm not sure at least with hyperopt how to batch runs together, i.e, have it return a set of 50 or so samples that I can run at a time then return all 50 results before it iterates the next batch of parameters

tired fog May 22, 2022, 9:21 PM

#

anyone know how complicated it is to convert a large matlab script to python?

#

not counting specific, just general similarity between syntaxes

mild dirge May 22, 2022, 9:23 PM

#

I mean it heavily depends on what kind of code you want to convert

#

sometimes there are high level functions that are similar in both languages

#

If you want to plot something using matplotlib f.e. then obviously things are going to be quite similar etc.

#

I think i'm pretty comfortable with python, but having a hard time getting used to matlab, so to me the languages feel quite different

tired fog May 22, 2022, 9:30 PM

#

gotcha

#

I'm much more familiar with computational work in MATLAB

#

but i have an equation in the project im working on that wont plot in matlab, so i had to plot it in wolfram mathematica and generate a matrix of the values to import into matlab for a specific example

#

so im just trying to find a better solution, as i need my project to be able to run self sufficiently

lapis sequoia May 23, 2022, 1:00 AM

#

do u know any pretrained CNN for salient object detection?

jade fjord May 23, 2022, 1:54 AM

#

I'm kinda new with AI, but I made this program that takes a dataset of brain tumor types and uses tensorflow to build a neural network, then evaluates how accurately it can distinguish the tumor types. I was wondering how this could actually be used? Like, what's the logic for entering your own made-up brain tumor data and feeding it into this newly built AI?

serene scaffold May 23, 2022, 2:29 AM

#

jade fjord I'm kinda new with AI, but I made this program that used a dataset of types of b...

I don't really know how brain tumors work, but there's such a thing as assistive AI, where models can be used to assist (but not replace) professionals in certain areas. Models that assist doctors in making diagnoses or designing treatment plans exist, but they have to be held to very high standards

#

Also I'm confused. Did you create a model that does this, or are you wanting to make one?

jade fjord May 23, 2022, 3:14 AM

#

I made one, but its with a csv file with data not actual images

#

such as width, height, depth, color, of certain examinations

#

and i ran it and it gave me a overall accuracy of around 97%

jade fjord May 23, 2022, 3:17 AM

#

serene scaffold Also I'm confused. Did you create a model that does this, or are you wanting to ...

so i just was wondering how you could actually use it, like could i use my own piece of data

#

i can show the program its pretty small

#

```python
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('cancer.csv')

# Features are every column except the label (1 = malignant, 0 = benign)
x = dataset.drop(columns=["diagnosis(1=m, 0=b)"])
y = dataset["diagnosis(1=m, 0=b)"]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Two hidden layers, one sigmoid output unit for the binary label
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(256, input_shape=x_train.shape[1:], activation='sigmoid'))
model.add(tf.keras.layers.Dense(256, activation='sigmoid'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=1000)

# Loss and accuracy on the held-out 20%
model.evaluate(x_test, y_test)
```

tacit basin May 23, 2022, 3:20 AM

#

jade fjord `import pandas as pd import tensorflow as tf dataset = pd.read_csv('cancer.csv'...

So you trained a model. Nice. What you want to do with it you said?

jade fjord May 23, 2022, 3:20 AM

#

Like put it into practice

#

like if i ever wanted to make a UI and have some enter their own data

#

whats the next step after training a model and putting it into practice

tacit basin May 23, 2022, 3:22 AM

#

You would need input data in the same form as the training data, without the target of course. Then you can make predictions

jade fjord May 23, 2022, 3:23 AM

#

so another csv file with inputted data

#

is there a way it can show me the value 0 or 1 for each entry that it predicts

tacit basin May 23, 2022, 3:25 AM

#

model.predict() https://www.tensorflow.org/guide/keras/train_and_evaluate

tacit basin May 23, 2022, 3:28 AM

#

jade fjord is there a way it can show me the value 0 or 1 for each entry that it predicts

Is your training target 0s and 1s now?

jade fjord May 23, 2022, 3:29 AM

#

well the tumor is either malignant or benign so it's set m as 1 and b as 0

#

so i meant show either as a 0 or 1

#

i can show one sec

#

tacit basin May 23, 2022, 3:32 AM

#

Yeah, if that's the target then predict will give you a value between 0 and 1 for each row (the sigmoid output), which you can round to 0 or 1.
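
#

A small sketch of turning sigmoid outputs into 0/1 labels. The `probs` array is a hypothetical stand-in for what `model.predict(new_rows)` would return for a one-unit sigmoid output layer, and the 0.5 cutoff is an assumption you can move:

```python
import numpy as np

# Hypothetical stand-in for model.predict(new_rows): one sigmoid output per row
probs = np.array([[0.03], [0.91], [0.48], [0.77]])

# Threshold at 0.5 to get hard 0/1 labels (the threshold is a choice, not a given)
labels = (probs > 0.5).astype(int).ravel()
print(labels)
```

The new rows need the same columns, in the same order, as the training features.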

#

Just note that if your dataset is not balanced, then your 97% accuracy score could mean nothing if there are a lot more 1s than 0s

#

Say 97% of your set is 1s. Then if you always predict 1 you will always have 97% accuracy. But you can always predict 1 without ML ;)
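
#

A quick way to check this is to compute the accuracy of always guessing the most common class and compare your model against that. A minimal sketch with a made-up label list (97 ones, 3 zeros), not the actual cancer.csv labels:

```python
from collections import Counter

# Hypothetical label column: 97 ones and 3 zeros
y = [1] * 97 + [0] * 3

counts = Counter(y)
majority_label, majority_count = counts.most_common(1)[0]

# Accuracy of a "model" that always predicts the most common class
baseline_accuracy = majority_count / len(y)
print(majority_label, baseline_accuracy)
```

If the trained model is not clearly above this baseline, the accuracy number is not telling you much.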

jade fjord May 23, 2022, 3:41 AM

#

oh ya lol theres a lot more 0's below idk why theres a bunch of 1's in the beginning

untold smelt May 23, 2022, 4:31 AM

#

could anyone take a look at my A* algorithm and tell me what im doing wrong

lapis sequoia May 23, 2022, 4:34 AM

#

help pls

dapper axle May 23, 2022, 5:15 AM

#

This may seem like a dumb question, But if im trying to get into data science and machine learning; Where is a good place to start or learn?

vernal solstice May 23, 2022, 5:23 AM

#

hello

#

#

is this overfitting or underfitting?

#

im kinda confused

tacit basin May 23, 2022, 5:31 AM

#

dapper axle This may seem like a dumb question, But if im trying to get into data science an...

https://allendowney.github.io/ElementsOfDataScience/README.html

thin palm May 23, 2022, 5:38 AM

#

I have 1 million lines of data... any advice on how to optimize this for training and fitting a model? My random forest regression takes soooooooooo long

#

still need to test other models as well

tacit basin May 23, 2022, 5:58 AM

#

thin palm I have 1 million lines of data... any advice on how to optimize this for trainin...

You could use a GPU for training, that's one option

thin palm May 23, 2022, 5:59 AM

#

tacit basin You could use GPU for training one option

Can I just reduce the sample to 20% of the data? Variance will be the same

#

Take a small sample instead of running experiments, feature engineering, and training baseline models on all the data. Typically, 10–20% is enough. Here is how it is done in pandas:

```
sample_df.shape
(191583, 120)
```
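
#

The sampling line itself is not shown in the quote above. One common way to do it in pandas is `DataFrame.sample`; a sketch on a made-up frame (column names and sizes are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the full dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "yearsExperience": rng.integers(0, 25, size=1000),
    "salary": rng.normal(115, 35, size=1000),
})

# Draw a 20% random sample without replacement; random_state makes it repeatable
sample_df = df.sample(frac=0.2, random_state=42)
print(sample_df.shape)
```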

tacit basin May 23, 2022, 5:59 AM

#

Also valid option

thin palm May 23, 2022, 5:59 AM

#

tacit basin Also valid option

Yeah will do this to build my model and I'll be happy 🙂

tacit basin May 23, 2022, 6:00 AM

#

Just need to be sure the sample is a valid representation of the full dataset or, even better, of production data

thin palm May 23, 2022, 6:01 AM

#

tacit basin Just need to be sure the sample of the data is valid representation of the full ...

How do I do that?

tacit basin May 23, 2022, 6:01 AM

#

Do EDA on the full data and compare it to the sample
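
#

One cheap version of that comparison is to put summary statistics for the full data and the sample side by side. A sketch on synthetic data; the frame and the idea of looking at relative differences in the means are assumptions, not a fixed recipe:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "yearsExperience": rng.integers(0, 25, size=5000),
    "salary": rng.normal(115, 35, size=5000),
})
sample_df = df.sample(frac=0.2, random_state=1)

# Side-by-side summary statistics for the full data and the sample
comparison = pd.concat(
    {"full": df.describe(), "sample": sample_df.describe()}, axis=1
)
print(comparison.loc[["mean", "std", "min", "max"]])

# Relative difference in the column means; small values suggest the sample is representative
rel_diff = ((sample_df.mean() - df.mean()) / df.mean()).abs()
print(rel_diff)
```

For categorical columns, comparing `value_counts(normalize=True)` between the two frames does the same job.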

thin palm May 23, 2022, 6:01 AM

#

tacit basin Do eda on the data and compare it to sample

ahh I see what you mean

tacit basin May 23, 2022, 6:01 AM

#

How do you train? What hparams?

thin palm May 23, 2022, 6:02 AM

#

For time's sake I may not do that, because it's for a job interview assignment. I'm sure it's a cool technique I'll show them. They gave me 1 million lines of data, so that could've been a test

tacit basin May 23, 2022, 6:02 AM

#

1 million rows should not be that painful

thin palm May 23, 2022, 6:02 AM

#

But why is it??

tacit basin May 23, 2022, 6:02 AM

#

Unless you have millions of features?

thin palm May 23, 2022, 6:02 AM

#

I'm testing 1 million lines of data and 27 features

#

I assumed that's what's taking it so long?

tacit basin May 23, 2022, 6:03 AM

#

Depends on the machine as well

thin palm May 23, 2022, 6:03 AM

#

@tacit basin Does this graph make sense to you for important features for determining your salary??

Screen_Shot_2022-05-23_at_12.03.04_AM.png

#

2021 Mac 16inch... Fairly good computer

#

really would've thought miles From Metropolis would've been least concern lol

tacit basin May 23, 2022, 6:05 AM

#

Don't know not a salary expert. But would love to learn outcome of your study to optimize my salary :)

thin palm May 23, 2022, 6:05 AM

#

tacit basin Don't know not a salary expert. But would love to learn outcome of your study to...

I just hope my machine learning model is good :(( I'd like this job haha

hollow flare May 23, 2022, 6:33 AM

#

I want roadmap for data analyst

dapper axle May 23, 2022, 6:45 AM

#

tacit basin https://allendowney.github.io/ElementsOfDataScience/README.html

thank you!

errant fern May 23, 2022, 7:41 AM

#

hey how can i apply kfold with MLPRegressor ?

#

i have 2 outputs and 7 input dimensions

#

```python
from pandas import read_csv
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, cross_val_score, validation_curve, cross_val_predict
import numpy as np
import matplotlib.pyplot as plt

inputs = df_xg_nu[feat]
targets = df_xg_nu[['home_team_goal', 'away_team_goal']]

x_train, x_test, y_train, y_test = train_test_split(
    inputs, targets, test_size=0.25, random_state=2)

rate1 = 0.005
rate2 = 0.1

mlpr = MLPRegressor(hidden_layer_sizes=(12, 10), max_iter=700, learning_rate_init=rate1)

# cross_val_predict returns out-of-fold predictions, not scores
predictions = cross_val_predict(mlpr, inputs, targets, cv=5)
print(predictions)
```

#

I'm trying to fit elo scores against soccer match scores .

#

feat=['elo_offensive_1', 'elo_defensive_1', 'elo_home_offensive_1', 'elo_home_defensive_1', 'elo_offensive_2', 'elo_defensive_2', 'elo_away_offensive_2', 'elo_away_defensive_2','homey']
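
#

For the k-fold part, one option is to pass an explicit `KFold` object as `cv` and let `cross_val_score` return one score per fold. A sketch on synthetic data with 7 inputs and 2 targets; the data, the `random_state` values, and the choice of MSE as the score are assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in: 7 input features, 2 targets (e.g. home and away goals)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])
Y = Y + rng.normal(scale=0.1, size=(200, 2))

mlpr = MLPRegressor(hidden_layer_sizes=(12, 10), max_iter=700,
                    learning_rate_init=0.005, random_state=0)

# Explicit KFold object so shuffling and the number of folds are under our control
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold; sklearn negates MSE so that higher is always better
neg_mse = cross_val_score(mlpr, X, Y, cv=kfold, scoring="neg_mean_squared_error")
print("MSE per fold:", -neg_mse)
print("mean MSE:", -neg_mse.mean())
```

With two targets the MSE is averaged over both outputs. `cross_val_predict` accepts the same `cv=kfold` argument if you want the out-of-fold predictions instead of scores.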

carmine silo May 23, 2022, 8:23 AM

#

Hi, I'm currently writing a script to autoplay my game. I want to add a command at line 23 so that it can recognize when a match has been found earlier than initially expected and then continue executing the next commands in the event. How do I do that?

steady basalt May 23, 2022, 9:55 AM

#

thin palm really would've thought miles From Metropolis would've been least concern lol

Makes sense, don't people in cities get paid more?

steady basalt May 23, 2022, 9:58 AM

#

thin palm @tacit basin Does this graph make sense to you for important features f...

Interesting how exec roles are valued low

compact rose May 23, 2022, 10:42 AM

#

Hello guys, I'm currently working with PySpark and have a question about how to do something. I want to eliminate rows based on column values. I have two features (Home, HomeVariant) and I want to drop a row if both are positive in it. How do I do this?

#

I wanted to use something like this: if ViewHome = 1 & ViewHomeVariant = 1, then drop

rose agate May 23, 2022, 11:11 AM

#

compact rose I wanted to use something like this : " If ViewHome = 1 & ViewHomeVariant = 1 ...

in pandas you could try df.drop(df.loc[(df['ViewHome'] > 0) & (df['ViewHomeVariant'] > 0)].index) (a PySpark DataFrame has no .loc or .index, so there you'd filter instead)
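
#

The same idea written as a boolean mask, which is a bit easier to read than `drop` plus `.index`. The runnable part is pandas on a made-up frame; the PySpark line in the trailing comment is an untested sketch in which `sdf` is a hypothetical Spark DataFrame and `F` is `pyspark.sql.functions`:

```python
import pandas as pd

df = pd.DataFrame({
    "ViewHome":        [1, 0, 1, 0],
    "ViewHomeVariant": [1, 1, 0, 0],
})

# Rows to drop: both flags positive in the same row
both_positive = (df["ViewHome"] > 0) & (df["ViewHomeVariant"] > 0)

# Keep everything else
kept = df[~both_positive]
print(kept)

# Assumed PySpark equivalent (not run here), with F = pyspark.sql.functions:
#   kept = sdf.filter(~((F.col("ViewHome") > 0) & (F.col("ViewHomeVariant") > 0)))
```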

arctic wedgeBOT May 23, 2022, 11:17 AM

#

:incoming_envelope: :ok_hand: applied mute to @lean cave until <t:1653305221:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

dusty valve May 23, 2022, 11:39 AM

#

how do i load data from a BatchDataSet tf?

fleet plover May 23, 2022, 11:40 AM

#

May I know why https://gist.github.com/buttercutter/b6f526c56e20f029d68e6f9041c3f5c0#file-gdas-py-L396 gives runtime error on inplace operation ?

arctic wedgeBOT May 23, 2022, 11:40 AM

#

gdas.py line 396

```
self.nodes[n-1].connections[ni].forward(x, types=types)  # Ltrain(w±, alpha)
```

verbal hedge May 23, 2022, 1:54 PM

#

Hello everyone

#

Please I need a little assistance in credit risk modeling in python. Link to some learning resources will do.

serene scaffold May 23, 2022, 2:16 PM

#

verbal hedge Please I need a little assistance in credit risk modeling in python. Link to som...

you're unlikely to get help if you ask to ask. try asking your actual question.

compact rose May 23, 2022, 2:39 PM

#

Does anyone know why it appear as null?

jade chasm May 23, 2022, 2:59 PM

#

Does anyone know how to solve the cuda discrepancy between these two envs? I need cuda working on my yolo env as well for training some models

#

I tried uninstalling torch and reinstalling with https://pytorch.org/get-started/locally/

blissful locust May 23, 2022, 3:16 PM

#

What all math concepts do I need to learn (like matrix) to learn ML using Scikit learn?

vital ruin May 23, 2022, 3:18 PM

#

Hello! I want to use the numba njit decorator in one of my classes, but only if the import is successful. I was able to handle the import with:
with contextlib.suppress(Exception): from numba import jit
But I can't figure out how to only use the decorator if jit was imported. The only way I can get it to work is to pull it out of the class, use a try/except block on a normal function, and call it from the class, but that's not very clean and I end up writing the function twice

serene scaffold May 23, 2022, 3:28 PM

#

vital ruin Hello! I am wanting to use the numba njit decorator in one of my classes but onl...

try:
    from numba import jit
except ImportError:
    jit = lambda func: func

I guess

#

if you have jit = lambda func: func then using @jit as a decorator will have no effect

vital ruin May 23, 2022, 3:31 PM

#

Ya I was able to handle the import just fine but if I add
@jit def func(self): ...
it will obviously error if I run it outside of my venv without numba installed

serene scaffold May 23, 2022, 3:31 PM

#

vital ruin Ya I was able to handle the import just fine but if I add `@jit def func(self):...

do you understand what my proposed code does?

#

simply suppressing the potential import error eliminates your only opportunity to know that the import failed

vital ruin May 23, 2022, 3:34 PM

#

Ya I see what you are saying that I was just hoping for a way to do something like
try: @jit except: pass def func.....

#

inside the class

serene scaffold May 23, 2022, 3:35 PM

#

vital ruin Ya I see what you are saying that I was just hoping for a way to do something li...

that isn't syntactically valid, unfortunately.

vital ruin May 23, 2022, 3:36 PM

#

Yep which sucks. I will probably just abstract two versions or use an interface or factory instead. Thanks for your help!

serene scaffold May 23, 2022, 3:36 PM

#

you don't want to use my solution?

#

here's another option

HAS_NUMBA = True
try:
    from numba import jit
except ImportError:
    HAS_NUMBA = False

...

def func(a, b):
    ...

if HAS_NUMBA:
    func = jit(func)

vital ruin May 23, 2022, 3:39 PM

#

Oh.... I understand what you mean now. I haven't used jit before now so I didn't look into if you could use it as not a decorator. My bad

#

Their overview just shows it being used as a decorator

serene scaffold May 23, 2022, 3:39 PM

#

vital ruin Oh.... I understand what you mean now. I haven't used jit before now so I didn't...

in Python, a decorator is always a function. so this is just leveraging how Python itself works, rather than something specific to numba.

vital ruin May 23, 2022, 3:40 PM

#

Got ya thank you!

serene scaffold May 23, 2022, 3:40 PM

#

vital ruin Got ya thank you!

@decorate
def func(a, b):
    ...

# is the same as...

def func(a, b):
    ...
func = decorate(func)

#

when you use a decorator, the function you're decorating gets passed to the decorator, which is also a function. and then whatever the decorator returns gets re-bound to the original name of the function

#

it's not even required that the decorator return a different function. or that it return a function

#

it can return whatever it wants
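
#

A tiny demonstration of both points. The function names here are made up; nothing is specific to numba:

```python
def shout(func):
    # Receives the function object; whatever we return is bound to the name "greet"
    def wrapper(name):
        return func(name).upper()
    return wrapper

def replace_with_number(func):
    # Nothing forces a decorator to return a function
    return 42

@shout
def greet(name):
    return f"hello {name}"

@replace_with_number
def answer():
    return "never called"

print(greet("numba"))  # HELLO NUMBA
print(answer)          # 42
```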

vital ruin May 23, 2022, 3:42 PM

#

If I am passing kwargs in the decorator (currently nopython=True), I assume I can just add those in the decorate func call?

#

Is there a standard for the kwarg for the function I am passing it?

serene scaffold May 23, 2022, 3:44 PM

#

vital ruin If I am passing kwargs in the decorator currently `nopython=True` I assume I can...

no, if your decorator looks like this

@decorator(x, y, z)
def func(a, b):
    ...

then decorator is a function that returns a function, and that returned function is the decorator.

#

def func(a, b):
    ...

func = decorator(x, y, z)(func)

meaning that this is the semantics.
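
#

Putting the two ideas together, the no-op fallback can be written so that it works both as a bare `@jit` and as `@jit(nopython=True)`. This is a sketch of one way to do it, not something from numba's docs; the fallback `jit` below is hand-written and only stands in when numba is missing:

```python
try:
    from numba import jit
except ImportError:
    def jit(*args, **kwargs):
        """No-op fallback that accepts both @jit and @jit(nopython=True)."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used as bare @jit
        def decorator(func):
            return func             # used as @jit(...)
        return decorator

@jit(nopython=True)
def add(a, b):
    return a + b

@jit
def double(x):
    return 2 * x

print(add(2, 3), double(4))
```

Either way the rest of the code never has to check whether numba is installed; the functions just run uncompiled when it is not.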

vital ruin May 23, 2022, 3:45 PM

#

ok cool! Thanks!

serene scaffold May 23, 2022, 3:45 PM

#

welcome to #decorators-and-ai

thin palm May 23, 2022, 3:45 PM

#

steady basalt Interesting how exec roles are valued low

so by this it means that we dont have many values of type C execs, but we can see the importance of that column of "JobType"

#

any ideas on dropping the highly negative correlations? I'm afraid of dropping NONE because it's used up as the top correlation

Screen_Shot_2022-05-23_at_9.47.31_AM.png

steady basalt May 23, 2022, 3:49 PM

#

thin palm any ideas on dropping the highly negative correlations? I'm afraid of dropping N...

U don’t have to drop anything btw

#

I feel like feature selection is such a forced process in many projects

#

Which model will you use

#

Or will you choose best performing

thin palm May 23, 2022, 3:53 PM

#

steady basalt I feel like feature selection is such a forced process in many projects

Man was seriously thinking this...

thin palm May 23, 2022, 3:53 PM

#

steady basalt U don’t have to drop anything btw

I'm running three Models one for LinearRegression, Random Forest, and KNN Regressor

steady basalt May 23, 2022, 3:54 PM

#

Why don’t u run them all and choose best

#

It takes like 10 lines of code

#

Use a box plot

thin palm May 23, 2022, 3:54 PM

#

steady basalt Why don’t u run them all and choose best

Yup exactly what I'm doing haha

thin palm May 23, 2022, 3:54 PM

#

steady basalt Use a box plot

box plot for what?

steady basalt May 23, 2022, 3:54 PM

#

U can do it in one cell

thin palm May 23, 2022, 3:54 PM

#

steady basalt Use a box plot

i'm using a CV_SCORE to see where they range at

steady basalt May 23, 2022, 3:55 PM

#

Run all three default models and output a plot of each average score over 10 folds

#

Ez

#

But u shud seriously consider adding other models to this plot

thin palm May 23, 2022, 3:56 PM

#

steady basalt But u shud seriously consider adding other models to this plot

how many models should I run? I was only gonna do 3 and compare the RMSE scores?

steady basalt May 23, 2022, 3:56 PM

#

Do 5+ I’d say

#

I think XGBoost has a regressor too

thin palm May 23, 2022, 3:56 PM

#

steady basalt Do 5+ I’d say

perfect

steady basalt May 23, 2022, 3:56 PM

#

Output a box plot of each models cross fold scores

#

Cross validation
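
#

Roughly what that could look like: run each model through 10-fold cross-validation, collect the per-fold RMSE, and compare. The data is synthetic and the three models use default-ish settings as assumptions; the plotting lines are left as comments so the snippet runs without matplotlib:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the salary data
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Per-fold RMSE for each model (sklearn reports it negated)
rmse = {
    name: -cross_val_score(model, X, y, cv=10,
                           scoring="neg_root_mean_squared_error")
    for name, model in models.items()
}

for name, scores in rmse.items():
    print(f"{name}: mean RMSE {scores.mean():.3f} (+/- {scores.std():.3f})")

# To draw the box plot (assumes matplotlib is installed):
#   plt.boxplot(list(rmse.values()))
#   plt.xticks(range(1, len(rmse) + 1), list(rmse.keys()))
```

The box plot shows the spread across folds as well as the average, which helps when two models have similar means.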

thin palm May 23, 2022, 3:57 PM

#

steady basalt Output a box plot of each models cross fold scores

sweet man, sounds good! Appreciate the help

steady basalt May 23, 2022, 3:58 PM

#

Then consider taking the highest accuracy model

#

Unless you will prioritise AUC and ignore accuracy

thin palm May 23, 2022, 3:59 PM

#

steady basalt Then consider taking the highest accuracy model

the job interview wants me to work with RMSE

#

so I'm gonna take the RMSE of each model and compare them

steady basalt May 23, 2022, 3:59 PM

#

Oh of course it isn’t classification

#

My bad

#

Ur predicting salary

thin palm May 23, 2022, 3:59 PM

#

no worries

#

yes

steady basalt May 23, 2022, 3:59 PM

#

Fair

thin palm May 23, 2022, 3:59 PM

#

but they do ask for another metric so i'm gonna see what other good approaches are

#

maybe accuracy

steady basalt May 23, 2022, 3:59 PM

#

That’s not rly gona work?

thin palm May 23, 2022, 4:00 PM

#

why's that

steady basalt May 23, 2022, 4:00 PM

#

You’re not going to be using true positive style metrics

#

Error is perfect

thin palm May 23, 2022, 4:00 PM

#

oh true true

#

so what other measure?

steady basalt May 23, 2022, 4:00 PM

#

do MSR RMSE uhh

#

MSE*

#

There’s also another error

#

MAR

#

Mae
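
#

For reference, the three error metrics written out in numpy on a made-up example (three true values and three predictions, chosen only to keep the arithmetic easy to follow):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, same units as the target

print(mae, mse, rmse)
```

RMSE punishes large misses more than MAE does, which is one reason to report both.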

thin palm May 23, 2022, 4:01 PM

#

oh true true

steady basalt May 23, 2022, 4:01 PM

#

I think rmse is popular

thin palm May 23, 2022, 4:03 PM

#

How do I solve this question?
9. a) Please estimate the RMSE that your model will achieve on the test dataset.

mild dirge May 23, 2022, 4:06 PM

#

without using the test dataset?

#

otherwise it wouldn't be an estimate right?

thin palm May 23, 2022, 4:07 PM

#

I mean the question is a bit confusing to me haha

steady basalt May 23, 2022, 4:09 PM

#

Yeah just estimate rmse post tuning?

#

It will spit out a result

#

But it’s gonna overfit

#

Bit odd

mild dirge May 23, 2022, 4:11 PM

#

You'd probably need to use some validation dataset if you want to estimate your model's performance on the test data

#

Otherwise you can't really estimate what the performance would be on the test data, as you only have the training data available

thin palm May 23, 2022, 4:13 PM

#

mild dirge Otherwise you can't really estimate what the performance would be on the test da...

So I actually do have the X_test file

mild dirge May 23, 2022, 4:14 PM

#

right but that would just straight up give you the rmse, not some estimation of it

thin palm May 23, 2022, 4:14 PM

#

I thought in order to get an estimate was with the X_train,y_train, x_test, y_test and we would use the x_test and y_test to get an estimate

mild dirge May 23, 2022, 4:14 PM

#

Maybe they just want to have the rmse of the test data

thin palm May 23, 2022, 4:14 PM

#

mild dirge Maybe they just want to have the rmse of the test data

hmm how the heck do I get them that?

#

a) Please estimate the RMSE that your model will achieve on the test dataset.
b) How did you create this estimate?

mild dirge May 23, 2022, 4:15 PM

#

you can just give the model the test input, compare the output with the desired output, which gives the actual rmse

#

but they want an estimate

thin palm May 23, 2022, 4:15 PM

#

interesting

mild dirge May 23, 2022, 4:15 PM

#

so a naive solution would be to give the rmse on your training data as an estimate of the rmse on your test data

thin palm May 23, 2022, 4:16 PM

#

right

mild dirge May 23, 2022, 4:16 PM

#

but this would be bad, as it will perform better on the data it is trained on than new data

thin palm May 23, 2022, 4:16 PM

#

mild dirge but this would be bad, as it will perform better on the data it is trained on th...

absolutely

mild dirge May 23, 2022, 4:16 PM

#

so you want some new data, that it is not trained on

thin palm May 23, 2022, 4:16 PM

#

hmm

mild dirge May 23, 2022, 4:16 PM

#

i.e. validation data

thin palm May 23, 2022, 4:16 PM

#

so then take the rmse of the test data?

mild dirge May 23, 2022, 4:16 PM

#

no

#

that's not an estimate

#

that is the rmse on the test data

thin palm May 23, 2022, 4:17 PM

#

mild dirge that *is* the rmse on the test data

how would you approach this?

mild dirge May 23, 2022, 4:18 PM

#

you want to give an as close as possible estimate of your model's performance on the test data, without using the test data

#

Like I said, some data that the model is not trained on, as this would give a biased performance result

#

so Validation data

thin palm May 23, 2022, 4:18 PM

#

mild dirge so Validation data

okay so using cross validation

mild dirge May 23, 2022, 4:19 PM

#

if you haven't left out any validation data, you'd have to retrain your model on less data, and test it out on this validation data that you left out

#

or cross validation could work too

thin palm May 23, 2022, 4:19 PM

#

perfect, well I made a .7 train and .3 test

#

so technically I did this

mild dirge May 23, 2022, 4:19 PM

#

But then it would also be bad if you already used this data for tuning your parameters

thin palm May 23, 2022, 4:20 PM

#

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate, train_test_split

# instantiate model
rf = RandomForestRegressor()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# examine scores
cv_scores = cross_validate(rf, X_train, y_train, cv=5)

#

```python
import numpy as np
from sklearn import metrics

rf.fit(X_train, y_train)
rf.score(X_test, y_test)
y_pred = rf.predict(X_test)

# print results of MAE
print(metrics.mean_absolute_error(y_test, y_pred))
# print results of MSE
print(metrics.mean_squared_error(y_test, y_pred))
# print results of RMSE
print(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
```

mild dirge May 23, 2022, 4:21 PM

#

So don't use the test data

#

Read my messages again if you don't understand it atm

thin palm May 23, 2022, 4:21 PM

#

But now I'm just confused because how the heck do you find out the RMSE without using test data?????

#

and if I use my actual test.csv file that's not estimating

mild dirge May 23, 2022, 4:22 PM

#

you don't find the rmse, that is not what they want

mild dirge May 23, 2022, 4:22 PM

#

thin palm from sklearn.ensemble import RandomForestRegressor # instantiate model rf = Ran...

Here you split your data into training and testing

thin palm May 23, 2022, 4:22 PM

#

okay right

mild dirge May 23, 2022, 4:22 PM

#

do you have test data beyond this scope?

thin palm May 23, 2022, 4:23 PM

#

mild dirge do you have test data beyond this scope?

I have another file called test.csv that I have not used yet

#

to be used on my model

mild dirge May 23, 2022, 4:23 PM

#

okay, and then you have some other file, that you split into train and test?

thin palm May 23, 2022, 4:23 PM

#

mild dirge okay, and then you have some other file, that you split into train and test?

No I will not use that file unless it's for testing on my model

mild dirge May 23, 2022, 4:23 PM

#

So you basically have 2 test sets, one is the actual test set, and one that you created

thin palm May 23, 2022, 4:23 PM

#

yes

mild dirge May 23, 2022, 4:23 PM

#

Okay, so one would be called the validation set normally

#

you have training data/validation data (both from 1 file), and then test data

#

From this you use only the training data for tuning the model, and training it

#

this can be done with k-fold cross validation

#

Then on the validation data you get an estimate of your model's rmse on new data

#

This is then also an estimate of your model's rmse on the test data, as this validation set is data that is also never shown to your model, yet you don't use the actual test data for this estimation
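
#

The steps above can be sketched like this. The data is synthetic and stands in for the labelled file (not test.csv); the 70/30 split, 5 folds, and forest size are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the labelled file (the one that is NOT test.csv)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=400)

# Hold out a validation set that plays no part in training or tuning
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1)

rf = RandomForestRegressor(n_estimators=50, random_state=0)

# Tuning / model selection happens with k-fold CV on the training part only
cv_rmse = -cross_val_score(rf, X_train, y_train, cv=5,
                           scoring="neg_root_mean_squared_error")

# Fit on the training part, then score once on the untouched validation set
rf.fit(X_train, y_train)
val_rmse = np.sqrt(mean_squared_error(y_val, rf.predict(X_val)))

print("CV RMSE (train only):", cv_rmse.mean())
print("Estimated RMSE on test.csv:", val_rmse)
```

The last number is the answer to "estimate the RMSE on the test dataset", and the split plus the single scoring pass is the answer to "how did you create this estimate".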

thin palm May 23, 2022, 4:27 PM

#

ok ok got it man, thank you very much!

steady basalt May 23, 2022, 4:27 PM

#

thin palm I thought in order to get an estimate was with the X_train,y_train, x_test, y_te...

Ur splitting ur training data into test set ? U can just make test y

#

Take 20%

mild dirge May 23, 2022, 4:28 PM

#

thin palm ok ok got it man, thank you very much!

You could also just see it as having 2 test sets, one you got from your "train data", and the actual test set

steady basalt May 23, 2022, 4:28 PM

#

mild dirge This is then also an estimate of your model's rmse on the test data, as this val...

Why? Why not just use train and test

mild dirge May 23, 2022, 4:28 PM

#

and then you ofc wouldn't use the test set from the "training data" for training

mild dirge May 23, 2022, 4:28 PM

#

steady basalt Why? Why not just use train and test

Because they want an estimate of the rmse on the test set they give

#

but an estimate implies that they don't want you to use the test set

steady basalt May 23, 2022, 4:29 PM

#

I think it does

#

In the end it’s always an estimate

mild dirge May 23, 2022, 4:29 PM

#

then it would be the actual rmse

#

not if you specify that it is on the test data

#

it would be an estimate of the rmse on new data

steady basalt May 23, 2022, 4:29 PM

#

Why would you use validation set

mild dirge May 23, 2022, 4:29 PM

#

but it would be the rmse on the test data

steady basalt May 23, 2022, 4:30 PM

#

I cross validate on training set

mild dirge May 23, 2022, 4:30 PM

#

Because you want an unbiased estimate, so you can't use the data you use for tuning params

thin palm May 23, 2022, 4:30 PM

#

I thought after cross validation this gives you an idea of what your model score is. Once I do that I FIT my model and then score it on x_test and y_test

#

thus giving me my RMSE Scores

steady basalt May 23, 2022, 4:30 PM

#

U don’t use training anyway so what’s the difference

#

Oh

#

I’m not sure how much it impacts the final result

mild dirge May 23, 2022, 4:35 PM

#

Here's my professional diagram

#

Maybe this clarifies a bit of the confusion

stark breach May 23, 2022, 4:37 PM

#

Hey need help on a project

#

Is anyone ready to help .?

#

It’s based on linear regression

mild dirge May 23, 2022, 4:37 PM

#

Maybe if you ask your question

stark breach May 23, 2022, 4:38 PM

#

Yeah so I built this model on a car resale dataset and I just want to decrease the bic

#

Also my model has like 30 features, and if I reduce them my efficiency falls

#

Should I share the file ?

steady basalt May 23, 2022, 4:44 PM

#

mild dirge Maybe this clarifies a bit of the confusion

That’s what I said, train and test no validation set, those are made temporarily using a sklearn function

mild dirge May 23, 2022, 4:44 PM

#

yeah, but 2 test sets

steady basalt May 23, 2022, 4:45 PM

#

I Didn’t bother before

mild dirge May 23, 2022, 4:46 PM

#

well you'd need an extra one if you'd want an estimate of the rmse on the test.csv data, without using test.csv

#

but normally you wouldn't

#

But maybe they just mean the actual rmse of the model on the test.csv data, it would be kinda unclear

robust jungle May 23, 2022, 5:12 PM

#

this line is giving me an error:

outputs=(layers.Dense(output_nodes, activation=tf.nn.softmax))

code block:

def create_model(input_shape, output_nodes):
    model = keras.Model(
        (layers.Conv2D(filters=64, kernel_size=3, input_shape=input_shape[1:], activation='relu')),
        (layers.MaxPooling2D(2, 2)),
        (layers.Conv2D(filters=64, kernel_size=3, input_shape=input_shape[1:], activation='relu')),
        (layers.MaxPooling2D(2, 2)),
        (layers.Conv2D(filters=64, kernel_size=3, input_shape=input_shape[1:], activation='relu')),
        (layers.MaxPooling2D(2, 2)),
        (layers.Flatten()),
        (layers.Dense(64)),
        outputs=(layers.Dense(output_nodes, activation=tf.nn.softmax)),
        inputs=(tf.keras.Input(shape=(input_shape))),
    )
    return(model)

error:

TypeError: __init__() got multiple values for argument 'outputs'

first tensorflow project not directly using a tutorial, so apologies if it's a bit scuffed

mild dirge May 23, 2022, 5:15 PM

#

the problem is in how you created the class instance

#

you supplied too many arguments for a single parameter

robust jungle May 23, 2022, 5:16 PM

#

mild dirge you supplied too many arguments for a single parameter

I get that from the error, but im more confused on how that might be happening

#

because tf.keras.Input(shape=(input_shape)) is in parentheses

#

so if it was to return multiple values

#

wouldn't they still be put into a single tuple?

mild dirge May 23, 2022, 5:17 PM

#

(input_shape) is not a tuple btw

robust jungle May 23, 2022, 5:18 PM

#

ah, what is it

mild dirge May 23, 2022, 5:18 PM

#

it evaluates to input_shape

#

(input_shape,) is a tuple
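
#

Quick check of the difference in plain Python. The `input_shape` value is made up:

```python
input_shape = (28, 28, 1)  # hypothetical image shape

# Parentheses alone only group an expression; the object is unchanged
assert (input_shape) is input_shape

# A trailing comma is what makes a tuple
wrapped = (input_shape,)
print(type(wrapped), len(wrapped))  # <class 'tuple'> 1

# With a non-tuple the difference is easier to see
print(type((5)), type((5,)))  # <class 'int'> <class 'tuple'>
```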

robust jungle May 23, 2022, 5:18 PM

#

input_shape is a variable I have defined

#

it's a tuple

mild dirge May 23, 2022, 5:18 PM

#

why are you doing shape=(input_shape) then?

#

why the brackets

robust jungle May 23, 2022, 5:19 PM

#

no idea, probably did it because it looks better imo

#

deleted it

#

(the parentheses I mean)

#

but yeah specifically looking at this line:

outputs=(layers.Dense(output_nodes, activation=tf.nn.softmax))

#

print returns a single object as expected

mild dirge May 23, 2022, 5:23 PM

#

this error normally comes up when you supply the argument as a positional argument, and then as a keyword argument

#

!e

def my_func(key, *args):
  print("blabla")

my_func('red', key='blue')

arctic wedgeBOT May 23, 2022, 5:24 PM

#

@mild dirge :x: Your eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 4, in <module>
003 | TypeError: my_func() got multiple values for argument 'key'

mild dirge May 23, 2022, 5:24 PM

#

like this

robust jungle May 23, 2022, 5:24 PM

#

alright, thanks

mild dirge May 23, 2022, 5:24 PM

#

So the problem is likely not that line, but one of the others

#

I'm more familiar with pytorch, so don't know what arguments it takes :/
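
The collision described above can be reproduced without TensorFlow. This is a minimal sketch, assuming a hypothetical stand-in class `FakeModel` whose constructor takes `inputs` and `outputs` as its first two parameters (which is what the error message suggests the real constructor does here). It is not the real `keras.Model`, and the string arguments are placeholders.

```python
class FakeModel:
    """Hypothetical stand-in for illustration only; not the real keras.Model."""

    def __init__(self, inputs, outputs, name=None):
        self.inputs = inputs
        self.outputs = outputs
        self.name = name


def build_wrong():
    # The two positional arguments already fill `inputs` and `outputs`,
    # so the keyword `outputs=` supplies the same parameter a second time.
    return FakeModel("conv_layer", "pool_layer", outputs="dense_layer")


try:
    build_wrong()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Supplying each parameter exactly once, by keyword, avoids the clash.
model = FakeModel(inputs="input_tensor", outputs="output_tensor")
```

In the Keras functional API the layers are normally chained by calling each one on the previous tensor, and only the first input tensor and the final output tensor are handed to `keras.Model(inputs=..., outputs=...)`, so passing the layers themselves as positional arguments is what triggers the error in the posted code.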

robust jungle May 23, 2022, 5:25 PM

#

it's fine, you seem to have found the issue

#

I appreciate it

tardy pelican May 23, 2022, 5:34 PM

#

Hey, got this error, my drivers are updated, version is compatible

#

Used command from pytorch website to install pytorch

#

woven coral May 23, 2022, 6:29 PM

#

#

whats wrong???

thin palm May 23, 2022, 7:30 PM

#

Does this make sense for feature importances in a linear model for coef??

Screen_Shot_2022-05-23_at_1.30.21_PM.png

steady basalt May 23, 2022, 7:31 PM

#

No

#

Have you tried RFE

thin palm May 23, 2022, 7:32 PM

#

steady basalt Have you tried RFE

what's RFE?

#

so this graph for feature importance is wrong correct?

thin palm May 23, 2022, 7:35 PM

#

steady basalt Have you tried RFE

I believe this is only with random forest though, right? I need to test this for LinearRegression

#

@steady basalt how about now

Screen_Shot_2022-05-23_at_1.38.54_PM.png

mild dirge May 23, 2022, 7:39 PM

#

seem like very normal scores again 👀

thin palm May 23, 2022, 7:40 PM

#

mild dirge seem like very normal scores again 👀

is that bad or good 😦

mild dirge May 23, 2022, 7:40 PM

#

features that change your score by 100,000,000,000 seems pretty bad

thin palm May 23, 2022, 7:41 PM

#

it's just in my dang linear model

mild dirge May 23, 2022, 7:41 PM

#

There might be some other underlying issue

#

just seems weird

thin palm May 23, 2022, 7:42 PM

#

but what could the underlying issue be?

#

Maybe this is an issue of over fitting?

mild dirge May 23, 2022, 7:48 PM

#

No I doubt it, those values don't really make sense, why would permuting any of your values result in values that are 10^11 lower than what they were

thin palm May 23, 2022, 7:50 PM

#

I really don’t know because I’m going back through my process and everything seems to make sense

#

@mild dirge I’ve never seen such high values before in coefficient so I’m like what the heck

mild dirge May 23, 2022, 7:52 PM

#

There seem to be some more normally valued features

#

maybe try to find the difference between these and the 10^11 score affecting features

thin palm May 23, 2022, 7:54 PM

#

hmmm, just for time sake I may not include this aspect and only show feature importance of my random forest which makes more sense lol

mild dirge May 23, 2022, 7:55 PM

#

What are you doing all this for btw?

#

some job application right?

scenic tulip May 23, 2022, 7:56 PM

#

Hey guys I had a quick question. Working on gathering data to put into my neural network. I have files, containing lists of 20 elements. I want to keep track of the "trend" in the data, so if i had 2 arrays 1.) [ 1, 2 ] 2.) [ 2, 3] the trend would be [ +1, +1 ]. For some reason I can't seem to store the result to add to the next 2 arrays differences. Here's my code.

#

https://paste.pythondiscord.com/rozikafixe

#

I guess I'm asking. How do you store the difference of the current 2 arrays im using, then apply it to the next 2 arrays difference?

mild dirge May 23, 2022, 7:58 PM

#

Not sure what you mean exactly, but for time dependent data some initial approach would be rolling regression

#

@scenic tulip

scenic tulip May 23, 2022, 7:58 PM

#

I'm not even at the point of applying it to the NN yet

thin palm May 23, 2022, 7:58 PM

#

mild dirge some job application right?

Yes

scenic tulip May 23, 2022, 7:59 PM

#

If you see my code, you'll see what im trying to do.

mild dirge May 23, 2022, 8:00 PM

#

So you have multiple arrays like:

[1, 2, 3]
[4, 6, 2]
[1, 9, 2]

and then you want difference arrays:

[3, 4, -1]
[-3, 3, 0]

?

mild dirge May 23, 2022, 8:00 PM

#

thin palm Yes

What kinda job?

scenic tulip May 23, 2022, 8:01 PM

#

Yeah, i've accomplished getting the difference of the arrays, but how do i hold that value and then apply it to the next two arrays that get differenced

mild dirge May 23, 2022, 8:01 PM

#

"apply it to the next two arrays"

#

you have an array with differences, and two arrays with values, what do you mean apply

scenic tulip May 23, 2022, 8:01 PM

#

[1, 2, 3]
[4, 6, 2]
[1, 9, 2]

#

ok 123 is first array

#

difference would be like you stated 3, 4, -1

#

between array 1 and array 2

#

now

#

I have that difference, how could i keep that data going by grabbing 4,6,2 and 1, 9 , 2s difference and adding it to the first difference

mild dirge May 23, 2022, 8:03 PM

#

So let me first show you how to make the differences a lot easier

#

and then i'll show you how to easily do that second part

wooden sail May 23, 2022, 8:04 PM

#

you can rewrite it as a multiplication by a single row vector from the left

mild dirge May 23, 2022, 8:05 PM

#

!e

import numpy as np
values = np.array([
  [1, 2, 3],
  [4, 6, 2],
  [1, 9, 2]
])

differences = np.array([arr_b-arr_a for arr_a, arr_b in zip(values, values[1:])])
print(differences)
sum_of_differences = np.sum(differences, axis=0)
print(sum_of_differences)

#

oops 1 sec

wooden sail May 23, 2022, 8:05 PM

#

should be the same as [-1,0,1] multiplied from the left

arctic wedgeBOT May 23, 2022, 8:05 PM

#

@mild dirge :white_check_mark: Your eval job has completed with return code 0.

001 | [[ 3  4 -1]
002 |  [-3  3  0]]
003 | [ 0  7 -1]

mild dirge May 23, 2022, 8:05 PM

#

like this?

#

so 0 7 -1 would be the sum of the differences*
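
For reference, NumPy also has this built in. The sketch below reuses the same example array; `np.diff` gives the row-to-row differences and `np.cumsum` keeps a running total, which may be what "hold that value and apply it to the next difference" is asking for. The variable names are only illustrative.

```python
import numpy as np

values = np.array([
    [1, 2, 3],
    [4, 6, 2],
    [1, 9, 2],
])

# Difference between each row and the row before it.
differences = np.diff(values, axis=0)

# Running total of the differences: row k is the trend accumulated up to step k.
running_trend = np.cumsum(differences, axis=0)

# The total of all differences telescopes to (last row - first row).
sum_of_differences = differences.sum(axis=0)
last_minus_first = values[-1] - values[0]

print(differences)
print(running_trend)
print(sum_of_differences, last_minus_first)
```

Because the sum telescopes, the overall trend only depends on the first and last arrays, so the intermediate differences matter only if the running total at each step is needed.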

wooden sail May 23, 2022, 8:08 PM

#

In [1]: import numpy as np
   ...: values = np.array([
   ...:   [1, 2, 3],
   ...:   [4, 6, 2],
   ...:   [1, 9, 2]
   ...: ])

In [2]: diffs = np.array([[-1,1,0],[0,-1,1]])

In [3]: diffs.dot(values)
Out[3]: 
array([[ 3,  4, -1],
       [-3,  3,  0]])

In [4]: adder = np.array([1,1])

In [5]: adder.dot(diffs.dot(values))
Out[5]: array([ 0,  7, -1])

In [7]: adder_diff = adder.dot(diffs)

In [8]: adder_diff
Out[8]: array([-1,  0,  1])

In [9]: adder_diff.dot(values)
Out[9]: array([ 0,  7, -1])

#

as corroboration. all you need to do is multiply by the vector [-1,0,1] from the left to get the sum of the differences. if you want to apply this to N matrices of values at the same time, concatenate them along the columns axis. the result will be a vector of size 3*N of summed differences

scenic tulip May 23, 2022, 8:22 PM

#

hey im catching up on what you guys said, my neighbor needed help. sorry one sec

#

oh, wow that is cool never thought to use dot product. I was so caught up in handling the specific arrays rather than the entire dataset at one time.

#

so since im dealing with lists i should convert them to np arrays?

mild dirge May 23, 2022, 8:26 PM

#

yeah both our methods use numpy arrays

scenic tulip May 23, 2022, 8:27 PM

#

oh i guess i have to to use dot produ

#

ct

mild dirge May 23, 2022, 8:27 PM

#

edd uses some mathematical tricks, I mostly use python tricks

scenic tulip May 23, 2022, 8:27 PM

#

yeah im still new ish to python so, id rather learn those hehe

mild dirge May 23, 2022, 8:27 PM

#

you can pick whichever one you find more intuitive

scenic tulip May 23, 2022, 8:27 PM

#

thank you all much. I'll try some things and get back to ya with (hopefully) good results

wooden sail May 23, 2022, 8:27 PM

#

In [10]: values2 = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [11]: vals_concat = np.concatenate((values,values2), axis=1)

In [12]: vals_concat
Out[12]: 
array([[1, 2, 3, 1, 2, 3],
       [4, 6, 2, 4, 5, 6],
       [1, 9, 2, 7, 8, 9]])

In [13]: adder_diff.dot(vals_concat)
Out[13]: array([ 0,  7, -1,  6,  6,  6])

to finish the example

#

and indeed, however you like. numpy is fast though, and it should scale really well with this type of operation

scenic tulip May 23, 2022, 8:28 PM

#

to be honest the scope of my project is to take in Keno Drawings and identify a trend in the drawings since the beginning. then use that data in my NN

steady basalt May 23, 2022, 8:28 PM

#

thin palm what's RFE?

Recursive feature elimination

wooden sail May 23, 2022, 8:28 PM

#

the trick is only that the operation you wanted can be written as a linear combination, and matrix multiplication is associative. so the whole thing can be done in a single product
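
The single-product trick relies on matrix multiplication being associative, so the grouping of the products can be changed freely. A small sketch for any number of rows, assuming a hypothetical helper `difference_matrix` and random stand-in data in place of real drawings:

```python
import numpy as np


def difference_matrix(n_rows):
    """Return the (n_rows - 1) x n_rows matrix whose row i computes row[i+1] - row[i]."""
    return np.eye(n_rows, k=1)[:-1] - np.eye(n_rows)[:-1]


rng = np.random.default_rng(0)
values = rng.integers(1, 81, size=(5, 20))  # 5 made-up rows of 20 numbers

D = difference_matrix(5)
adder = np.ones(4)

# Grouping does not matter: (adder @ D) @ values == adder @ (D @ values).
combined = adder @ D
left = combined @ values
right = adder @ (D @ values)

print(combined)
print(np.allclose(left, right))
```

Here `combined` comes out as a vector with -1 in the first slot, +1 in the last and zeros in between, which is the general form of the [-1,0,1] vector used above.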

scenic tulip May 23, 2022, 8:28 PM

#

to predict future trends

mild dirge May 23, 2022, 8:29 PM

#

If you want to predict future trends, you don't need to calculate all the differences

#

if you train a rolling regression model it will already make some prediction on future values if you feed it new data

scenic tulip May 23, 2022, 8:30 PM

#

im going back from the very beginning of when the drawings started

steady basalt May 23, 2022, 8:31 PM

#

Could a neural network predict the universe; can I predict the euro millions

scenic tulip May 23, 2022, 8:31 PM

#

i mean, the same array of 20 PRNG elements should never come up again in our life times so. That's one point, but I've found the mean, median, mode, variance and std deviation of all drawings that exist

#

i think if i could see how the drawings fluctuate and correlate that with all the other simple maths i could have something

steady basalt May 23, 2022, 8:32 PM

#

I think you’d need to get an opinion from the Japanese dude who invented the twister you need some serious math expertise

scenic tulip May 23, 2022, 8:32 PM

#

LOL yeah

steady basalt May 23, 2022, 8:32 PM

#

To rule out if this is even feasible

#

I don’t think it is

#

How do you know that they use a random seed

scenic tulip May 23, 2022, 8:32 PM

#

nothing is impossible

steady basalt May 23, 2022, 8:32 PM

#

The same one each time

scenic tulip May 23, 2022, 8:33 PM

#

oh ive moved beyond that lol

steady basalt May 23, 2022, 8:33 PM

#

What other approach can you do now

#

There’s nothing

#

It’s akin to predicting the future to the highest accuracy

scenic tulip May 23, 2022, 8:33 PM

#

one thing i have noticed from data that ive gathered is most drawings actually average out to between 35 and 46

#

so i started trying to generate random arrays of 20 elements that fit that average range
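
As a sanity check on that observation, a quick simulation shows what purely random drawings would look like. This is a sketch under the assumption that a drawing is 20 distinct numbers picked uniformly from 1 to 80; the sample size and seed are arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Average of each simulated drawing of 20 distinct numbers from 1..80.
means = [
    statistics.fmean(rng.sample(range(1, 81), 20))
    for _ in range(20_000)
]

overall_mean = statistics.fmean(means)
spread = statistics.pstdev(means)
share_in_range = sum(35 <= m <= 46 for m in means) / len(means)

print(f"mean of drawing averages: {overall_mean:.2f}")
print(f"std of drawing averages:  {spread:.2f}")
print(f"share with average in [35, 46]: {share_in_range:.1%}")
```

If most simulated drawings also land in the 35 to 46 band, then that band is what uniform randomness produces on its own and does not by itself point to a pattern.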

steady basalt May 23, 2022, 8:33 PM

#

Are you a mathematician

#

Ur gona need a team

#

Of top tier researchers

scenic tulip May 23, 2022, 8:34 PM

#

im gonna need divine intervention

steady basalt May 23, 2022, 8:34 PM

#

And millions of dollars

scenic tulip May 23, 2022, 8:34 PM

#

yeah

#

well

steady basalt May 23, 2022, 8:34 PM

#

Even still it isn’t possible

scenic tulip May 23, 2022, 8:34 PM

#

no not millions, just will power

#

well, my NN should be able to do that work of mathematicians if i can find the right data and plug it in

steady basalt May 23, 2022, 8:36 PM

#

No you’d need the worlds best statisticians and mathematicians

#

To know what to do

#

What are you trying to do it?

scenic tulip May 23, 2022, 8:36 PM

#

https://github.com/StoreDrone1/Keno_Guesser.git

GitHub

GitHub - StoreDrone1/Keno_Guesser: Use a neural network to guess th...

Use a neural network to guess the next numbers. Contribute to StoreDrone1/Keno_Guesser development by creating an account on GitHub.

steady basalt May 23, 2022, 8:36 PM

#

Did you make it

scenic tulip May 23, 2022, 8:37 PM

#

I did

#

its been months in the making

#

if you run get keno numbers prepare to wait, there's like 1.6 million drawings now of 20 numbers

steady basalt May 23, 2022, 8:38 PM

#

I don’t really understand how one calculation can spit out the future values

scenic tulip May 23, 2022, 8:38 PM

#

its a combination of multiple calculations, gathering all of them, and figuring out how to plug them into the NN

steady basalt May 23, 2022, 8:38 PM

#

Explain it

scenic tulip May 23, 2022, 8:38 PM

#

explain what?

#

how the process im using works?

steady basalt May 23, 2022, 8:38 PM

#

What are the calculations

scenic tulip May 23, 2022, 8:39 PM

#

Mean, median, mode, variance, and std deviation

steady basalt May 23, 2022, 8:39 PM

#

How can they predict a future array

scenic tulip May 23, 2022, 8:39 PM

#

for 1, no drawing will ever appear twice

steady basalt May 23, 2022, 8:39 PM

#

Really?

scenic tulip May 23, 2022, 8:39 PM

#

yep

steady basalt May 23, 2022, 8:39 PM

#

Rules?

scenic tulip May 23, 2022, 8:39 PM

#

well, not for quite a long time

#

80 numbers

#

20 get selected at random

#

you get to pick up to 10 numbers

#

numbers being 1 - 80
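
With those rules the "no drawing will appear twice for a long time" claim can be put into numbers. A rough sketch, assuming every set of 20 out of 80 is equally likely; the 1.6 million figure is the drawing count mentioned earlier, and the repeat estimate uses the usual birthday-problem approximation.

```python
import math

# Number of distinct drawings: choose 20 numbers out of 80.
total_drawings = math.comb(80, 20)

# Chance that all 10 picked numbers are among the 20 drawn.
p_match_all_ten = math.comb(20, 10) / math.comb(80, 10)
odds_against = 1 / p_match_all_ten

# Birthday-style approximation for at least one repeated drawing
# among n past drawings: roughly n * (n - 1) / (2 * total).
n_past = 1_600_000
p_any_repeat = n_past * (n_past - 1) / (2 * total_drawings)

print(f"possible drawings: {total_drawings:.3e}")
print(f"match 10 of 10: about 1 in {odds_against:,.0f}")
print(f"chance of any repeat so far: about {p_any_repeat:.1e}")
```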

steady basalt May 23, 2022, 8:40 PM

#

I want to know how summary statistics of draws can predict future draws

#

Because when it happens next time it’s random

scenic tulip May 23, 2022, 8:41 PM

#

correct, but if you had a '1' show up 3 times in a row, i would bet it wont next time?

steady basalt May 23, 2022, 8:41 PM

#

What’s the prize for the winning draw

scenic tulip May 23, 2022, 8:41 PM

#

matching 10 numbers is 110k

steady basalt May 23, 2022, 8:41 PM

#

Is this a popular lottery

scenic tulip May 23, 2022, 8:41 PM

#

it is

steady basalt May 23, 2022, 8:41 PM

#

I think if you managed to cheat every time you’d be given a Nobel prize

#

No joke

iron basalt May 23, 2022, 8:41 PM

#

scenic tulip correct, but if you had a '1' show up 3 times in a row, i would bet it wont next...

Why would you bet it won't next time? Does the probability decrease each time? If so, how?

steady basalt May 23, 2022, 8:41 PM

#

I don’t think it’s mathematically possible

scenic tulip May 23, 2022, 8:42 PM

#

im actually not trying to cheat it. i want to see if they control the random factor or if it's actually random

steady basalt May 23, 2022, 8:42 PM

#

If it’s using mersenne twister it’s randomised well enough

#

You just can’t predict that for 20 number array matching

scenic tulip May 23, 2022, 8:42 PM

#

because with there being 80 numbers to randomly select from, if 1 of them got selected 3 times in a row, it isn't probable that it will be selected again

#

mersenne twisters are pseudo random, not random

steady basalt May 23, 2022, 8:42 PM

#

😅

#

I’m gona need some equations

iron basalt May 23, 2022, 8:43 PM

#

https://en.wikipedia.org/wiki/Gambler's_fallacy

scenic tulip May 23, 2022, 8:43 PM

#

look at my repo

#

it's got all the simple maths in it

steady basalt May 23, 2022, 8:43 PM

#

iron basalt https://en.wikipedia.org/wiki/Gambler%27s_fallacy

Funnily this is a concept I’ve always struggled with. In school we’re taught about how two flips landing heads is 0.25

#

And yet each flip is independent!

scenic tulip May 23, 2022, 8:44 PM

#

depends on how many flips you are allowed

#

if you are allowed 4 flips?

iron basalt May 23, 2022, 8:44 PM

#

If I flip a fair coin 3 times and get 3 heads, it does not mean that next must be a tails, nor a heads again. They are independent events.
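
A short simulation makes the two statements fit together: two heads in a row is a 0.25 event when judged before any flip, but once three heads have already happened the next flip is still 50/50. This is a sketch with a simulated fair coin; the seed and sample size are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [rng.random() < 0.5 for _ in range(400_000)]  # True means heads

# Probability that a fresh pair of flips is heads-heads (judged in advance).
pairs = list(zip(flips[0::2], flips[1::2]))
p_two_heads = sum(a and b for a, b in pairs) / len(pairs)

# Probability of heads on the flip that follows three heads in a row.
after_streak = [
    flips[i]
    for i in range(3, len(flips))
    if flips[i - 3] and flips[i - 2] and flips[i - 1]
]
p_heads_after_streak = sum(after_streak) / len(after_streak)

print(f"P(two heads in a row)     ~ {p_two_heads:.3f}")
print(f"P(heads | 3 heads before) ~ {p_heads_after_streak:.3f}")
```

The streak is rare as a whole, but the rarity is already "paid for" by the flips that happened; it does not change the odds of the next one.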

steady basalt May 23, 2022, 8:44 PM

#

Yes that’s true

scenic tulip May 23, 2022, 8:44 PM

#

correct, but the odds are

steady basalt May 23, 2022, 8:44 PM

#

But it’s annoying how chain flips are less likely

#

At the same time

scenic tulip May 23, 2022, 8:44 PM

#

that it will be the other, the universe has to balance itself out at some point

iron basalt May 23, 2022, 8:44 PM

#

scenic tulip that it will be the other, the universe has to balance itself out at some point

This is literally gambler's fallacy.

#

It's a fallacy for a reason.

#

Often described as "the slot machine owes me".

steady basalt May 23, 2022, 8:45 PM

#

You can try to probabilistically predict this lottery but I think it is an inhuman task unless you had entire research teams

scenic tulip May 23, 2022, 8:45 PM

#

Keno is the only lottery in Ohio that is PRNG

iron basalt May 23, 2022, 8:45 PM

#

PRNG huh.

scenic tulip May 23, 2022, 8:45 PM

#

the rest get selected with the floating balls

iron basalt May 23, 2022, 8:45 PM

#

Oh, well the floating balls are not PRNG.

steady basalt May 23, 2022, 8:46 PM

#

Can anyone explain how odds change with prng

scenic tulip May 23, 2022, 8:46 PM

#

yes it is PRNG, that's why initially i wanted to find the mersenne value of one of the drawings and compare it to the next

steady basalt May 23, 2022, 8:46 PM

#

Compared to balls

wooden sail May 23, 2022, 8:46 PM

#

if the repetition cycle of the prng is long, you'd need the statistics of hundreds of thousands of games in the past to find a pattern

scenic tulip May 23, 2022, 8:46 PM

#

yes i know edd, i ran my code on a dummy array till i hit my 4.29 billion bit limit lol

wooden sail May 23, 2022, 8:46 PM

#

a prng can still appear uniformly distributed over moderately long sample sizes

scenic tulip May 23, 2022, 8:46 PM

#

2 ^^ 32

#

yeah, so if they used seed value 1, then i could find it if i iterated over it till i found it.

steady basalt May 23, 2022, 8:47 PM

#

How?

#

Wait a second

#

Don’t you pay per entry

iron basalt May 23, 2022, 8:47 PM

#

You know which PRNG method they are using?

wooden sail May 23, 2022, 8:47 PM

#

what if they hit you with a 1-time pad and they change the seed each time

steady basalt May 23, 2022, 8:47 PM

#

Per iteration

scenic tulip May 23, 2022, 8:47 PM

#

I do knot

#

not

delicate apex May 23, 2022, 8:47 PM

#

wooden sail if the repetition cycle of the prng is long, you'd need the statistics of hundre...

depends on the alg. mersenne twister can be cracked with less than 1000, and in some cases, less than 100

steady basalt May 23, 2022, 8:47 PM

#

You’d need to pay 4 billion times?

scenic tulip May 23, 2022, 8:47 PM

#

no LOL

iron basalt May 23, 2022, 8:47 PM

#

Do you know if they change the seed every once in a while?

steady basalt May 23, 2022, 8:47 PM

#

delicate apex depends on the alg. mersenne twister can be cracked with less than 1000, and in ...

Can you explain

scenic tulip May 23, 2022, 8:48 PM

#

I do not, i haven't found it yet

wooden sail May 23, 2022, 8:48 PM

#

delicate apex depends on the alg. mersenne twister can be cracked with less than 1000, and in ...

aight, i'll read up on it. i'm not familiar with the method, i was just offering a word of caution

scenic tulip May 23, 2022, 8:48 PM

#

yeah i tried to do bit shifting with the mersenne....kinda like a reverse mersenne twister

iron basalt May 23, 2022, 8:48 PM

#

If they just randomize the seed, then there is no way. The seeds are often generated with something that is not a PRNG.

scenic tulip May 23, 2022, 8:48 PM

#

problem is you can only do that with single int values

iron basalt May 23, 2022, 8:49 PM

#

But if they kept using the same one for a long time, they might be dumb. Lotteries in the past have done dumber things.

scenic tulip May 23, 2022, 8:49 PM

#

i speculated that they used system time as a seed too

steady basalt May 23, 2022, 8:49 PM

#

@scenic tulip considered paying stats and math PhDs to help?

scenic tulip May 23, 2022, 8:49 PM

#

lol no i want to do this

delicate apex May 23, 2022, 8:49 PM

#

steady basalt Can you explain

i'm not familiar with the cracking process, but mersenne twister is a very common random number generator that's not very good at producing highly random numbers. this doesn't matter in quite a few fields, however, so it's still used just about everywhere.
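
For context on the "cracked with less than 1000" remark, here is a sketch of the well-known state-recovery idea for MT19937, demonstrated on Python's own `random` module. It assumes the attacker sees 624 consecutive raw 32-bit outputs, which a lottery drawing would not expose directly, so this says nothing about whether Keno itself is predictable. The names `untemper`, `victim` and `clone` are made up for the example.

```python
import random


def undo_right_shift_xor(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x & 0xFFFFFFFF


def undo_left_shift_xor_mask(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y):
    """Reverse the MT19937 output tempering, last step first."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor_mask(y, 15, 0xEFC60000)
    y = undo_left_shift_xor_mask(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y


# The generator we pretend not to know the seed of.
victim = random.Random(12345)
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the 624-word internal state from the observed outputs.
recovered_state = tuple(untemper(word) for word in observed)

# Index 624 tells the clone to regenerate its state on the next call.
clone = random.Random()
clone.setstate((3, recovered_state + (624,), None))

predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)
```

This works because the tempering step is invertible and the generator's next state depends only on the previous 624 words. It is also why Mersenne Twister is considered unsuitable wherever outputs must stay unpredictable.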

scenic tulip May 23, 2022, 8:50 PM

#

i sat at the bar and watched the drawings. when the current drawing ends a 2 minute timer starts. I ran my code that used system time as a seed and generated PRNG arrays up until 1 minute. I compared my results with the following drawing to see if a certain time within that range was being used. Some arrays matched a lot, others not so much

#

it was difficult to tell if that was actually getting me anywhere other than more tipsy 😂

#

and i wanna be clear about this. lets say i found that mersenne value that could repeatedly win that would be "breaking" the game and is illegal. Im just trying to validate they are actually random and not done by someone after everyone has put their drawings in ....not trying to break laws lol

steady basalt May 23, 2022, 8:53 PM

#

delicate apex i'm not familiar with the cracking process, but mersenne twister is a very commo...

So with one hundred generations you can confidently predict next array?

scenic tulip May 23, 2022, 8:53 PM

#

that is the theory super

#

sorry probably a dumb idea but thank you all for helping anyways

steady basalt May 23, 2022, 9:48 PM

#

can anyone help me with STATA

#

if theres any pro users

rugged falcon May 23, 2022, 11:15 PM

#

is anyone here who has ever done something with CUDA (or more precisely cupy)?

serene scaffold May 23, 2022, 11:32 PM

#

rugged falcon is anyone here who has ever done something with CUDA (or more precisely cupy)?

you have to ask your actual question, not if anyone has used a certain library.

rugged falcon May 23, 2022, 11:32 PM

#

ok

thin palm May 23, 2022, 11:33 PM

#

How do I answer this question?
Please estimate the RMSE that your model will achieve on the test dataset.

rugged falcon May 23, 2022, 11:33 PM

#

well: i was thinking about numpy/numba on very big images doing simple arithmetic operations is very slow

#

so when i have 20 cores

#

i can just use 5800 cores of my GPU

serene scaffold May 23, 2022, 11:34 PM

#

yes, AI that involves images is often faster on a GPU

rugged falcon May 23, 2022, 11:35 PM

#

so i looked into using CUDA and found cupy

#

i found this article:
https://towardsdatascience.com/heres-how-to-use-cupy-to-make-numpy-700x-faster-4b920dda1f56

Medium

Here’s how to use CuPy to make Numpy 700X faster

It’s time for some GPU power!

serene scaffold May 23, 2022, 11:35 PM

#

what GPU do you have

rugged falcon May 23, 2022, 11:35 PM

#

i redid the exact same example this guy did

#

3070

#

and where he got a x10 increase in speed

#

i got 0.5x

#

so cupy is slower

serene scaffold May 23, 2022, 11:37 PM

#

can you make a reproducible example, with every variable defined?

rugged falcon May 23, 2022, 11:37 PM

#

u mean code example?

serene scaffold May 23, 2022, 11:37 PM

#

yes. I happen to also have a 3070, so I can run it

rugged falcon May 23, 2022, 11:37 PM

#

ok one sec

serene scaffold May 23, 2022, 11:37 PM

#

but it has to be exact code that I can run without any changes

rugged falcon May 23, 2022, 11:38 PM

#

well

#

i assume you have cupy?

serene scaffold May 23, 2022, 11:38 PM

#

yes, anyone can download cupy at any time

rugged falcon May 23, 2022, 11:38 PM

#

ok one sec

serene scaffold May 23, 2022, 11:38 PM

#

remember to use markdown

#

!code

arctic wedgeBOT May 23, 2022, 11:38 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

rugged falcon May 23, 2022, 11:39 PM

#

actually its not exactly the same as on website because he uses 1000x1000x1000 float64 and this does not fit in our GPU

#

so we compare 950^3

#

import numpy as np
import cupy as cp
import time

shaper = 950
arrShape = (shaper,shaper,shaper)

#NUMPY
time0 = time.perf_counter()
arr = np.ones((arrShape))
time1 = time.perf_counter()
print(time1-time0)

#CUPY
time0 = time.perf_counter()
arr = cp.ones((arrShape))
time1 = time.perf_counter()
print(time1-time0)

serene scaffold May 23, 2022, 11:48 PM

#

still working on it

rugged falcon May 23, 2022, 11:49 PM

#

hm what u mean?

serene scaffold May 23, 2022, 11:49 PM

#

the installation for cupy isn't as straightforward as intended

rugged falcon May 23, 2022, 11:49 PM

#

take ur time all good. im just happy u help me xd

rugged falcon May 23, 2022, 11:50 PM

#

serene scaffold the installation for cupy isn't as straightforward as intended

my first try got disrupted by windows11-upgrade assassination attempt too

#

x)

serene scaffold May 23, 2022, 11:53 PM

#

rugged falcon my first try got disrupted by windows11-upgrade assassination attempt too

I'm reinstalling the cuda drivers 😛

harsh nexus May 23, 2022, 11:57 PM

#

Anyone experienced with LDA? Got some issues with the topics (numbering?) and the words are not corresponding with the correct topic number. Using Gensim

serene scaffold May 24, 2022, 12:01 AM

#

I have to reboot my computer, apparently

#

woo I can import cupy now

harsh nexus May 24, 2022, 12:04 AM

#

Sorry for interrupting :))

late peak May 24, 2022, 12:06 AM

#

Hey I've got a Pandas question in #help-lemon if anybody's got some bandwidth

serene scaffold May 24, 2022, 12:06 AM

#

@rugged falcon behold the results.

In [9]: import cupy

In [10]: %timeit cupy.random.random((500, 500, 500))
The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached.
18.8 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: import numpy as np

In [12]: %timeit np.random.random((500, 500, 500))
601 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

rugged falcon May 24, 2022, 12:08 AM

#

so how is that possible?

serene scaffold May 24, 2022, 12:09 AM

#

rugged falcon so how is that possible?

how is it possible that cupy is faster?

rugged falcon May 24, 2022, 12:09 AM

#

no

#

that is what i would expect from a task that cuda is designed for

serene scaffold May 24, 2022, 12:10 AM

#

so what is your question

rugged falcon May 24, 2022, 12:10 AM

#

how come mines not working

serene scaffold May 24, 2022, 12:10 AM

#

idk

rugged falcon May 24, 2022, 12:10 AM

#

we have same Hardware

#

prolly same vram not that it matters with 500**3

#

how can i recreate what u did with %timeit

#

this should be equivalent to what you did @serene scaffold or am i missing something?

import time

import cupy
timecp0 = time.perf_counter()
cupy.random.random((500, 500, 500))
timecp1 = time.perf_counter()

import numpy as np
timenp0 = time.perf_counter()
np.random.random((500, 500, 500))
timenp1 = time.perf_counter()

print(timecp1-timecp0)
print(timenp1-timenp0)

0.186s for cupy
0.46s for numpy

so our numpy is similar but your cupy is 10x faster. what am i missing?

serene scaffold May 24, 2022, 12:27 AM

#

rugged falcon this should be equivalent to what you did @serene scaffold or am i missing ...

I used IPython, which can run it a bunch of times to get a better read

#

python -m pip install IPython
python -m IPython

#

you can run that if you want to use IPython in a shell.

rugged falcon May 24, 2022, 12:28 AM

#

but this has nothing to do with IPython having no GIL right?

serene scaffold May 24, 2022, 12:28 AM

#

IPython is just a console. the %timeit command will run a statement a bunch of times and report the average

#

which is more reliable than what you're doing.
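
Outside IPython, the standard-library `timeit` module does the same kind of repeated measurement. A minimal sketch using a small NumPy array as the workload; the array size and the repeat counts are arbitrary choices, and `make_array` is just an illustrative name.

```python
import timeit

import numpy as np


def make_array():
    # The statement being measured.
    return np.random.random((100, 100, 100))


loops_per_run = 10

# 7 runs of 10 calls each, similar in spirit to IPython's %timeit output.
run_totals = timeit.repeat(make_array, repeat=7, number=loops_per_run)
per_call = [total / loops_per_run for total in run_totals]

best = min(per_call)
mean = sum(per_call) / len(per_call)

print(f"best: {best * 1e3:.3f} ms per call")
print(f"mean: {mean * 1e3:.3f} ms per call")
```

A single `time.perf_counter()` pair measures one call including any one-off setup cost, which is why repeated runs give a steadier number.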

rugged falcon May 24, 2022, 12:30 AM

#

im so confused

#

!e
print(18.8*1e-6,"s")

arctic wedgeBOT May 24, 2022, 12:31 AM

#

@rugged falcon :white_check_mark: Your eval job has completed with return code 0.

1.88e-05 s

rugged falcon May 24, 2022, 12:33 AM

#

ok i implemented "run a statement a bunch of times"

#

i end up with 400µs, so yours is still 20x faster

rugged falcon May 24, 2022, 12:34 AM

#

serene scaffold IPython is just a console. the `%timeit` command will run a statement a bunch of...

i thought ironpython is python in C

#

*C#

serene scaffold May 24, 2022, 12:42 AM

#

rugged falcon i thought ironpython is python in C

this statement seems to be completely unrelated to anything we've discussed.

#

oh

#

IPython is not ironpython, whatever that is

#

IPython is the whole name of it.

rugged falcon May 24, 2022, 12:43 AM

#

oh

#

@serene scaffold

In [5]: %timeit cupy.random.random((500, 500, 500))
The slowest run took 8.97 times longer than the fastest. This could mean that an intermediate result is being cached.
14.3 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit np.random.random((500, 500, 500))
489 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

serene scaffold May 24, 2022, 12:46 AM

#

rugged falcon @serene scaffold ```py In [5]: %timeit cupy.random.random((500, 500, 500))...

so cupy is faster

#

woo

rugged falcon May 24, 2022, 12:46 AM

#

so ur results are reproducible

#

why would i ever not work in this IPython thing if its faster than without?

rugged falcon May 24, 2022, 12:47 AM

#

serene scaffold so cupy is faster

yes its so much faster (as i have hoped / otherwise i download so much for nothing xd)

serene scaffold May 24, 2022, 12:47 AM

#

rugged falcon why would i ever not work in this IPython thing if its faster than without?

IPython isn't making it faster

#

it's just telling you how fast it is.

rugged falcon May 24, 2022, 12:47 AM

#

so if ipython is not making anything faster

#

then why time.perf_counter() has

#

how you say it

serene scaffold May 24, 2022, 12:48 AM

#

(mean ± std. dev. of 7 runs, 1 loop each)
it did it seven times and reported the average

rugged falcon May 24, 2022, 12:48 AM

#

"right to exist"?

serene scaffold May 24, 2022, 12:48 AM

#

rugged falcon "right to exist"?

I haven't used it, so idk

rugged falcon May 24, 2022, 12:50 AM

#

for i in range(0,10):
    
    timecp0 = time.perf_counter()
    cp.random.random((500, 500, 500))
    timecp1 = time.perf_counter()


    
    timenp0 = time.perf_counter()
    np.random.random((500, 500, 500))
    timenp1 = time.perf_counter()

    print(timecp1-timecp0)
    print(timenp1-timenp0)

@serene scaffold wouldnt u agree it technically should do exactly the same (except summing/averaging)

serene scaffold May 24, 2022, 12:50 AM

#

rugged falcon ```py for i in range(0,10): timecp0 = time.perf_counter() cp.random...

the fact that it doesn't average it is a big deal

rugged falcon May 24, 2022, 12:50 AM

#

from the doc it states:
4.665306263360271e-07 2,143,482
for resolution / tickrate

rugged falcon May 24, 2022, 12:50 AM

#

serene scaffold the fact that it doesn't average it is a big deal

but the minimum is still manifold higher than IPython %timeit

serene scaffold May 24, 2022, 12:51 AM

#

rugged falcon but the minimum is still manifold higher than IPython %timeit

I don't really have anything to offer about time.perf_counter

rugged falcon May 24, 2022, 12:52 AM

#

ok i will google this behavior

#

one more question tho if you dont mind

#

can you think of a reason why the first execution is always so slow?

serene scaffold May 24, 2022, 12:53 AM

#

it might be that cupy does additional startup when it's used for the first time, rather than eagerly when it's first imported.
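
Two things are worth controlling for when timing GPU code by hand: the one-off startup cost on the first call, and the fact that CuPy calls generally return to Python before the GPU has finished the work. The sketch below, with a hypothetical helper `best_time`, does a warm-up call and synchronizes before stopping the clock. It falls back to NumPy only when CuPy or a CUDA device is not available, so treat the GPU branch as untested on any particular machine.

```python
import time

import numpy as np

try:
    import cupy as cp

    # Raises, or reports zero, when no usable CUDA device is present.
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    HAVE_GPU = False


def best_time(make, sync=None, repeats=5):
    """Return the fastest of `repeats` timed calls, after one untimed warm-up."""
    make()  # warm-up: absorbs first-use initialization
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        make()
        if sync is not None:
            sync()  # wait for queued GPU work to finish before reading the clock
        timings.append(time.perf_counter() - start)
    return min(timings)


shape = (100, 100, 100)

cpu_seconds = best_time(lambda: np.random.random(shape))
print(f"numpy: {cpu_seconds * 1e3:.3f} ms")

if HAVE_GPU:
    gpu_seconds = best_time(
        lambda: cp.random.random(shape),
        sync=cp.cuda.Stream.null.synchronize,
    )
    print(f"cupy:  {gpu_seconds * 1e3:.3f} ms")
```

Without the synchronize step a timer can end up measuring little more than the kernel launch, which would explain microsecond readings for very large arrays. Recent CuPy releases also ship `cupyx.profiler.benchmark`, which handles the synchronization for you.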

rugged falcon May 24, 2022, 12:54 AM

#

hmm okay so some magic going on like always

#

thanks for comparison/answer/help!

#

!close

#

dxd

serene scaffold May 24, 2022, 12:55 AM

#

I'm doing another test.

#

this seems to support my theory

In [1]: import cupy

In [2]: _ = cupy.random.random((100, 100, 100))

In [3]: %timeit cupy.random.random((100, 100, 100))
57.9 µs ± 706 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

rugged falcon May 24, 2022, 1:00 AM

#

i dont really believe it tbh

#

cupy advertises itself with several x up to 800x speed increases

#

and you get 16000x

serene scaffold May 24, 2022, 1:01 AM

#

you have to look at what operations they did to measure that speedup

#

simply allocating an array is just one thing

rugged falcon May 24, 2022, 1:02 AM

#

yes but look at what u did

#

random.random

serene scaffold May 24, 2022, 1:02 AM

#

so?

rugged falcon May 24, 2022, 1:03 AM

#

this implies all array entries being initialized as random value

serene scaffold May 24, 2022, 1:03 AM

#

it doesn't imply anything. it just means that each element is filled as a random float between 0 and 1.

rugged falcon May 24, 2022, 1:03 AM

#

yes

serene scaffold May 24, 2022, 1:03 AM

#

that's not implied; it's the spec for that function

rugged falcon May 24, 2022, 1:04 AM

#

yes yes sry english xd

#

i can assume that ones() is even easier because it skips RNG part?

#

or faster

serene scaffold May 24, 2022, 1:05 AM

#

I guess

rugged falcon May 24, 2022, 1:06 AM

#

so
https://towardsdatascience.com/heres-how-to-use-cupy-to-make-numpy-700x-faster-4b920dda1f56
is it possible that this author did something majorly wrong?

#

because when i follow what u did (not doing it only once, but a few times with %timeit)

#

i get not 10x increase

#

but way better increase

serene scaffold May 24, 2022, 1:12 AM

#

Idk if TDS has quality control

late peak May 24, 2022, 1:38 AM

#

Hey I've got a statistical question in #help-candy

stone pollen May 24, 2022, 2:19 AM

#

for a beginner starting out in python (kinda know the basics) and wanting to do AI,ML,DS in the future. what would you guys suggest as good beginner courses for that

barren wedge May 24, 2022, 2:59 AM

#

How to make PyTorch faster?

harsh nexus May 24, 2022, 4:09 AM

#

barren wedge How to make PyTorch faster?

https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html

harsh nexus May 24, 2022, 4:11 AM

#

stone pollen for a beginner starting out in python (kinda know the basics) and wanting to do ...

Sentdex has some fun stuff, there are some great youtube videos on these topics and I feel like the entry level is really low for such a sophisticated topic, you don’t necessarily need to know all the math behind algorithms / …

tacit basin May 24, 2022, 4:17 AM

#

stone pollen for a beginner starting out in python (kinda know the basics) and wanting to do ...

https://allendowney.github.io/ElementsOfDataScience/README.html

thin palm May 24, 2022, 6:52 AM

#

If anyone has the time to glance over my deliverable for this job interview I am completing and give some feedback that would be great! I'm only 8 slides in but take a peek 🙂
https://docs.google.com/presentation/d/1ibyiIDu-b3k3y_yI4UwVdqK7I-4F2Kh5AFBUD74N5p4/edit#slide=id.g12e390617fa_0_811

paper trellis May 24, 2022, 6:53 AM

#

whats a good method to automatically identify and remove columns with no change (from a csv file recording data for a long period of time that may have channels not hooked up to anything, but was never turned off)?

wooden sail May 24, 2022, 6:58 AM

#

you could do a finite difference approximation of the derivative/gradient of the quantity you are measuring. if it is close enough to zero, you omit it

#

you'd probably want one that is accurate to 2nd order
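
A rough sketch of that idea with pandas, under a few assumptions: the helper name `drop_flat_columns`, the tolerance `tol`, and the toy column names are invented here, and the sketch uses a plain first difference between consecutive rows rather than a second-order stencil. A column is treated as flat when its largest absolute step is at most `tol`.

```python
import pandas as pd


def drop_flat_columns(df, tol=1e-9):
    """Return a copy of df without numeric columns whose values never change.

    A column counts as flat when the largest absolute difference between
    consecutive rows is at most `tol` (hypothetical threshold).
    """
    numeric = df.select_dtypes(include="number")
    max_step = numeric.diff().abs().max()   # largest step per column, NaN skipped
    flat = max_step[max_step <= tol].index  # columns that never move
    return df.drop(columns=flat)


data = pd.DataFrame({
    "time": [0.0, 1.0, 2.0, 3.0],
    "ch1": [0.5, 0.7, 0.6, 0.9],  # live channel
    "ch2": [1.2, 1.2, 1.2, 1.2],  # disconnected channel, constant reading
})
print(drop_flat_columns(data).columns.tolist())
# ['time', 'ch1']
```

For channels that float with a little noise instead of sitting at one exact value, `tol` would need to be raised to something above the noise level.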

barren wedge May 24, 2022, 7:03 AM

#

harsh nexus https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html

I tried torch.jit
but it got error

desert bear May 24, 2022, 8:01 AM

#

hi is there someone who knows what you can use to predict something in tensorflow lite? in tensorflow it is just predict but in tflite you dont have that.

upper spindle May 24, 2022, 8:54 AM

#

anyone here specialises in econometrics, specifically microeconometrics

dusty valve May 24, 2022, 9:23 AM

#

getting this error while fitting a tf model ```
ValueError: Exception encountered when calling layer "lstm" (type LSTM).

slice index 0 of dimension 0 out of bounds. for '{{node strided_slice_1}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](transpose, strided_slice_1/stack, strided_slice_1/stack_1, strided_slice_1/stack_2)' with input shapes: [0,?,50], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.

Call arguments received by layer "lstm" (type LSTM):
  • inputs=tf.Tensor(shape=(None, 0, 50), dtype=float32)
  • mask=None
  • training=True
  • initial_state=None```

#

```py
# define model
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=seq_length))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
model.fit(X, y, batch_size=128, epochs=100)```

shell panther May 24, 2022, 10:05 AM

#

@ machine learning community
what do you prefer? reading research papers or watching YT Vids on those papers?

wooden sail May 24, 2022, 11:00 AM

#

they can't achieve the same thing unless the youtube video is a recording of a talk or lecture that covers the paper in depth. you can watch videos while you have your coffee in the morning, but once you find an interesting idea, you need either a lecture on it or to read the full paper yourself if you wanna understand all of it

mild dirge May 24, 2022, 11:12 AM

#

For pytorch how does one encode a target output for multi-label classification?

#

using multi-hot encoded rn like [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]

#

But getting this when trying to calculate some performance measure using the prediction and the label
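
For reference, a small sketch of the multi-hot encoding described above, in plain Python. The helper names `multi_hot` and `decode` and the 0.5 threshold are illustrative assumptions. A float vector of this shape is what `torch.nn.BCEWithLogitsLoss` takes as its target (same shape as the logits). The error mentioned in the last message is not shown in the log, so this sketch does not try to address it.

```python
def multi_hot(label_indices, num_classes):
    """Encode a collection of class indices as a multi-hot list of floats."""
    target = [0.0] * num_classes
    for idx in label_indices:
        if not 0 <= idx < num_classes:
            raise ValueError(f"label {idx} outside 0..{num_classes - 1}")
        target[idx] = 1.0
    return target


def decode(scores, threshold=0.5):
    """Turn per-class scores (e.g. sigmoid outputs) back into class indices."""
    return [i for i, score in enumerate(scores) if score >= threshold]


print(multi_hot([0, 2, 6], num_classes=7))
# [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

Many metric functions expect hard 0/1 predictions rather than raw scores, so thresholding the model output as in `decode` is often needed before comparing against the label.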

manic elk May 24, 2022, 12:51 PM

#

Hello, I believe this fits in data science. Could anyone help me with manipulating csv files? I'm trying to convert a file by deleting a column, copying 65,000 rows of data, and adding a header on top.

All of this should be done in a new file so to preserve the original file
It's trying to take a file that looks like the first image and convert it to a file that looks like the 2nd image. The header info is derived from the original file + a second file for Date, Time and VUnit.

I'm struggling trying to code this algorithm so I would appreciate any assistance 🙂 thank you so much!

serene scaffold May 24, 2022, 12:53 PM

#

manic elk Hello, I believe this fits in data science. Could anyone help me with manipulati...

use pandas

manic elk May 24, 2022, 12:54 PM

#

@serene scaffold I have very little experience with pandas unfortunately, could you drop some function suggestions or dataframe manipulation techniques to achieve this?

loud cove May 24, 2022, 1:08 PM

#

What's happening here? why does the type change not work?

#

the dtypes returns float but it isn't as seen with info()

serene scaffold May 24, 2022, 1:10 PM

#

loud cove What's happening here? why does the type change not work?

did you expect df.astype to modify it in-place?

loud cove May 24, 2022, 1:11 PM

#

I thought it would, guess I'll have to reassign?

serene scaffold May 24, 2022, 1:11 PM

#

loud cove I thought it would, guess I'll have to reassign?

you can keep chaining methods onto pd.read_csv

serene scaffold May 24, 2022, 1:12 PM

#

manic elk @serene scaffold I have very little experience with pandas unfortunately, c...

you would just need to use read_csv and to_csv

#

and drop
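
A hedged sketch of how those three calls could fit together. The column names, the header values, and the in-memory buffers are placeholders, since the real files are only visible as images in the log. The idea is to read the source, drop the unwanted column, keep the first 65,000 rows, then write the header lines followed by the data to a separate destination so the original stays untouched.

```python
import io

import pandas as pd

# Stand-in for the original file; the column names are made up.
source = io.StringIO("index,time_s,volts\n0,0.0,1.1\n1,0.1,1.3\n2,0.2,1.2\n")

df = pd.read_csv(source)
df = df.drop(columns=["index"])  # delete the unwanted column
df = df.head(65_000)             # keep at most the first 65,000 rows

# Hypothetical header block; real values would come from the source files.
header_lines = ["Date,2022-05-24", "Time,12:51:00", "VUnit,V"]

out = io.StringIO()  # swap for open("converted.csv", "w", newline="") on disk
out.write("\n".join(header_lines) + "\n")
df.to_csv(out, index=False)      # data goes below the header block
print(out.getvalue())
```

Writing the header by hand and then passing the same open handle to `to_csv` keeps everything in one pass over the new file.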

loud cove May 24, 2022, 1:12 PM

#

serene scaffold you can keep chaining methods onto `pd.read_csv`

serene scaffold May 24, 2022, 1:13 PM

#

loud cove

you didn't chain the additional method calls onto read_csv, so nothing interesting happened.

loud cove May 24, 2022, 1:14 PM

#

I don't understand, shouldn't df.astype({'Quantity': 'float64'}).dtypes just change it?

serene scaffold May 24, 2022, 1:14 PM

#

df = pd.read_csv(...).astype({'Quantity': 'float64'})

#

this is what I mean by chaining.

loud cove May 24, 2022, 1:14 PM

#

yes i understand

serene scaffold May 24, 2022, 1:14 PM

#

you didn't do that.

loud cove May 24, 2022, 1:14 PM

#

but how is this any different from doing it in another line with df?

serene scaffold May 24, 2022, 1:15 PM

#

because if you don't display it or save it to a variable, it just creates a new object and immediately throws it away.
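
A tiny illustration of that point, using a made-up one-column frame: `astype` returns a converted copy and leaves the original alone, so the result has to be assigned (or chained onto `read_csv`) to have any lasting effect.

```python
import pandas as pd

df = pd.DataFrame({"Quantity": [1, 2, 3]})
before = df["Quantity"].dtype

df.astype({"Quantity": "float64"})  # new frame is created, then discarded
still_same = df["Quantity"].dtype == before
print(still_same)  # True: df itself was not modified

df = df.astype({"Quantity": "float64"})  # keep the converted copy
print(df["Quantity"].dtype)  # float64
```

In a notebook the first call still displays its result when it is the last expression in a cell, which can make it look as if the conversion had stuck.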

loud cove May 24, 2022, 1:16 PM

#

nah

#

the issue was the last argument as you said

#

the .dtypes just returns that when I don't care about it

serene scaffold May 24, 2022, 1:16 PM

#

I don't think I can help with this. sorry.

loud cove May 24, 2022, 1:16 PM

#

you already did, removing it fixed it

#

dtypes seem to return the series,

manic elk May 24, 2022, 1:26 PM

#

serene scaffold you would just need to use `read_csv` and `to_csv`

How do I copy the contents of a csv file to the csv I want to write to?

loud cove May 24, 2022, 1:28 PM

#

manic elk How do I copy the contents of a csv file to the csv I want to write to?

you can either overwrite or append https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

manic elk May 24, 2022, 1:31 PM

#

loud cove you can either overwrite or append https://pandas.pydata.org/docs/reference/api/...

thank you!

#

@serene scaffold thank you so much

loud cove May 24, 2022, 1:32 PM

#

I'm trying to group by columns (a list of columns other than Quantity) and get the sum of the quantity, but it doesn't seem to be working
df2 = df.groupby(columns).agg(Quantity = ('Quantity', 'sum'))
my other idea is just dropping duplicates then getting sum by order ID and merging.

desert oar May 24, 2022, 1:54 PM

#

loud cove I'm trying to group by `columns` (a list of columns other than `Quantity`) and g...

that's not correct agg syntax. the docs describe the valid inputs. it's unclear how exactly pandas is interpreting this tuple ('Quantity', 'sum'), but it's clearly not doing what you think it's doing

#

i recommend always checking the docs for usage help when in doubt https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html

desert oar May 24, 2022, 1:57 PM

#

loud cove I'm trying to group by `columns` (a list of columns other than `Quantity`) and g...

!e ```python
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Type': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Quantity': [3, 2, 4, 9, 4, 7],
})

columns = ['Category', 'Type']

result1 = data.groupby(columns).agg({'Quantity': 'sum'})

result2 = data.groupby(columns)[['Quantity']].sum()

result3 = data.groupby(columns)['Quantity'].sum()

print(result1)
print()
print(result2)
print()
print(result3)
```

arctic wedgeBOT May 24, 2022, 1:57 PM

#

@desert oar ✅ Your eval job has completed with return code 0.

001 |                Quantity
002 | Category Type          
003 | A        X            7
004 |          Y            2
005 | B        X            4
006 |          Y           16
007 | 
008 |                Quantity
009 | Category Type          
010 | A        X            7
011 |          Y            2
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ixepomulew.txt?noredirect
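
For comparison, a sketch on the same toy data of the keyword form used in the original question. pandas documents this as named aggregation (available since 0.25): each keyword is the output column name and its value is a `(column, aggfunc)` tuple. On this data it gives the same sums as the dict form, which suggests the tuple itself may not be the cause of the problem with the real file.

```python
import pandas as pd

data = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "A", "B"],
    "Type": ["X", "Y", "X", "Y", "X", "Y"],
    "Quantity": [3, 2, 4, 9, 4, 7],
})
columns = ["Category", "Type"]

# Dict form: maps an existing column to an aggregation function.
dict_form = data.groupby(columns).agg({"Quantity": "sum"})

# Named aggregation: output_name=(input_column, aggfunc).
named_form = data.groupby(columns).agg(Quantity=("Quantity", "sum"))

print(named_form)
```

One thing that might be worth checking on real data, though nothing in the thread confirms it: `groupby` drops rows whose grouping key is NaN by default, and passing `dropna=False` (pandas 1.1 or later) keeps them.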

loud cove May 24, 2022, 2:01 PM

#

desert oar !e ```python import pandas as pd data = pd.DataFrame({ 'Category': ['A', 'A...

I did check docs, and tried your thing.

#

this is what I ended up doing.

desert oar May 24, 2022, 2:01 PM

#

something appears to be wrong with your data then

#

if you provide a sample of your actual data (use our paste site) then i can investigate further

#

!paste

arctic wedgeBOT May 24, 2022, 2:02 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

loud cove May 24, 2022, 2:02 PM

#

yeah that's why I posted, all columns are strings other than quantity.
the columns list has column names other than quantity.

desert oar May 24, 2022, 2:02 PM

#

right, but without seeing the data you're forcing people to guess

loud cove May 24, 2022, 2:06 PM

#

📎 order_report.csv

loud cove May 24, 2022, 2:07 PM

#

desert oar right, but without seeing the data you're forcing people to guess

only code i ran

columns = ["Sale Code",
"Order ID",
"Store Name",
"Player First Name",
"Player Last Name",
"Shipping First Name",
"Shipping Last Name",
"Shipping Address",
"Shipping City",
"Shipping State",
"Shipping Zip",
"Billing Phone",
"Billing Email"]
df = pd.read_csv("order_report.csv", usecols=(columns + ["Quantity"]), dtype=str).astype({'Quantity': 'float64'})
df.info()

desert oar May 24, 2022, 2:08 PM

#

thanks for sharing the file

#

as a side note, you probably do not want to include sale code in the grouping columns... it looks like a unique row id

loud cove May 24, 2022, 2:08 PM

#

it is here, but not always.

desert oar May 24, 2022, 2:09 PM

#

what do you mean by that?

loud cove May 24, 2022, 2:09 PM

#

in the current file, it is the same, but in the future there might be different ones.

desert oar May 24, 2022, 2:10 PM

#

ok

#

and why are you casting quantity to float?

loud cove May 24, 2022, 2:10 PM

#

ultimately, the reason im passing those columns is just because i just want the total for each order, the rest of the attributes will always be the same.

desert oar May 24, 2022, 2:10 PM

#

oh, i see

loud cove May 24, 2022, 2:10 PM

#

yea

desert oar May 24, 2022, 2:10 PM

#

i will show you a tidier way to do that

loud cove May 24, 2022, 2:11 PM

#

thanks, im also curious why that isn't working since i kept googling for quite a bit and just went with my idea of merging.

desert oar May 24, 2022, 2:11 PM

#

i don't actually understand what this merge is supposed to be doing

#

why are you still using the tuple? i explained why that shouldn't work

#data-science-and-ml

instantiate model

examine scores