#data-science-and-ml


sand reef
#

what he means is that he is making a 2D matrix with 100 columns

#

the np.random thing

#

well, @prisma verge i'll try to see what i can do

prisma verge
#

thank you very much for the help! 💜

lapis sequoia
#

he uses the term dimension for everything so that kinda confused me

sand reef
#

yeah, i too try not to get confused on that

prisma verge
#

arrays are confusing in themselves

sand reef
#

what he meant is that if you were to plot it on a graph, the graph would be 100 dimensions

#

meaning 100 features

lapis sequoia
#

oh

#

so input_dim takes the number of columns in an array?

prisma verge
#

number of dimensions = number of features?

sand reef
#

not really

#

input_dim means what is the shape of array that you are passing into the function

prisma verge
#

huh, then it's worth it to always type in input_dim = array.shape

sand reef
#

and in ML generally we represent the stuff to learn in a 2D matrix

lapis sequoia
#

yeah so we can just do input_dim = array.shape ?

sand reef
#

exactly, but instead you say, input_dim = array.shape[1:]

#

because the first dimension is generally the number of examples

#

for a dataset

prisma verge
#

ah

#

now that makes perfect sense

lapis sequoia
#

so if the shape is (10, 5)
input_dim = 5
?

prisma verge
#

(not for my example, but for ml)

sand reef
#

yes, if you are passing it one by one

#

if not, then you can pass the whole thing in

lapis sequoia
#

then what if the shape is (10, 5, 5)?

sand reef
#

means 10 examples, each with 25 features (5x5)

#

or 25 pixels if those are 5x5 images

lapis sequoia
#

so input_dim = 25?

prisma verge
#

that looks simple now for shapes since i was always confused by stuff like "oh god what shape should i pass into there and how many neurons should there be"

sand reef
#

no, the input dim is (5,5)

prisma verge
#

huh
so, input_dim always = array.shape[1:]

#

even if it's d100

#

as it seems at least

sand reef
#

yep

#

since we dont mention the batch size
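
To make the rule above concrete, here is a minimal NumPy sketch. The array sizes are the ones from the conversation, and `per_example_shape` is a hypothetical helper, not a Keras function. It drops the first axis, which by convention counts the examples, and also reports how many scalar features one example has once flattened.

```python
import numpy as np

def per_example_shape(array):
    """Shape of a single example: everything after the leading (batch) axis."""
    return array.shape[1:]

tabular = np.zeros((10, 5))      # 10 examples, 5 features each
images = np.zeros((10, 5, 5))    # 10 examples, each a 5x5 grid

print(per_example_shape(tabular))  # (5,)
print(per_example_shape(images))   # (5, 5)

# Number of scalar features per example once a 5x5 example is flattened
n_features = int(np.prod(per_example_shape(images)))
print(n_features)  # 25
```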

#

@lapis sequoia read this

#

so @prisma verge what are we trying to predict?

lapis sequoia
#

okay thanks

sand reef
#

because I dont see any labels

#

unless we have to generate the labels ourselves

prisma verge
#

we gotta predict which values are gonna be the next

#

like,
49, 3, 1, 2

sand reef
#

so, from the csv i read and the pic that you sent... this seems like a regression problem

prisma verge
#

49 = round
3, 1, 2 = team results

sand reef
#

and definitely not classification

prisma verge
#
Regression is the problem of predicting a continuous quantity output for an example.
#

seems like it

#

it's like predicting results for tv show actually, based on previous results

sand reef
#

yeah, since we are not classifying anything

prisma verge
#

yes, exactly

#

it's just that i've only built a few image classification networks and fine-tuned one for text generation

#

so i don't know how to handle this situation

#

and it'd be probably quite useful experience for my knowledge of ml

sand reef
#

yeah, if my net works T^T

lapis sequoia
#

one more question xD

sand reef
#

sure

lapis sequoia
#
```python
from keras import Sequential
from keras.layers import Dense
import numpy as np

# For a single-input model with 2 classes (binary classification):

model = Sequential()
# NOTE: input_dim is given a tuple here; this is the line the question is about
model.add(Dense(32, activation='relu', input_dim=(200, 100)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# 1000 random examples of shape (200, 100), one binary label each
data = np.random.random((1000, 200, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
```
just wanted to tweak around with input dim to help myself understand
i get this error:-
#

but isnt input_dim = data.shape[1:]??

sand reef
#

well, in this case its a dense layer right?

#

dense layers, well, need a flattened input

lapis sequoia
#

so dense layers dont support 3d?

sand reef
#

nope, you will have to flatten it, what you can do is add a flatten layer before the dense layer
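
What flattening does to the array from the snippet above can be sketched with plain NumPy. This only illustrates the reshaping and does not call Keras; it uses the same per-example shape as in the chat but fewer examples to keep it light. As far as I know, a Keras `Dense` layer can technically accept higher-rank input and acts on the last axis, but its output would then no longer line up with one label per example, which is why flattening is the practical fix here.

```python
import numpy as np

data = np.random.random((4, 200, 100))  # 4 examples of shape (200, 100)

# Keep the example axis, merge everything else into one feature axis
flat = data.reshape(data.shape[0], -1)

print(flat.shape)  # (4, 20000)
```

Each example becomes a vector of 200 * 100 = 20000 values, which is the size the first dense layer then sees.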

prisma verge
#

the problem with predicting the data imo is that the dataset isn't big enough

#

just 47 rounds, which is really not a lot for a network

lapis sequoia
#

like this?
Flatten(input_shape=(200, 100))
?

sand reef
#

remove the 1000

lapis sequoia
#

its input_shape u sure i gotta remove 1000?

sand reef
#

yus

#

the way you are asking is like its a life and death situation

#

i am scared now xD

lapis sequoia
#

lmao xD

#

so the first element in the tuple is always the number of examples so i gotta remove it?

sand reef
#

@prisma verge i too am confused......we dont have a value for 'y' instead we are just predicting x

#

yus

lapis sequoia
#

that thing above can also be:-
Flatten(input_dim=(200, 100))
?

sand reef
#

oh okay, we can use the number of days and use multiple linear regression

#

@lapis sequoia yus

#

no wait, there might be a small issue

lapis sequoia
#

now i wonder whats the difference between input_dim and input_shape xD

sand reef
#

be careful, there is apparently a subtle difference between them
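
The difference, as far as Keras 2.x documents it, is that `input_dim` is a single integer for flat inputs, while `input_shape` is a tuple describing one example without the batch axis; `input_dim=n` is treated as shorthand for `input_shape=(n,)`. The helper below is a hypothetical, Keras-free sketch of that convention, not part of any library.

```python
def as_input_shape(input_dim=None, input_shape=None):
    """Normalise the two ways of describing one example into a shape tuple."""
    if input_shape is not None:
        return tuple(input_shape)
    if isinstance(input_dim, int):
        return (input_dim,)          # an integer means a flat vector of that length
    raise TypeError("input_dim must be an int; use input_shape for tuples")

print(as_input_shape(input_dim=100))            # (100,)
print(as_input_shape(input_shape=(200, 100)))   # (200, 100)
```

So `Flatten(input_shape=(200, 100))` describes one 200x100 example, whereas a tuple passed as `input_dim` is probably what the earlier snippet tripped over.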

prisma verge
#

also, i've asked too much probably already, but could you comment the code for better understanding?
i'm aiming for understanding more than result :p

sand reef
#

sure, let me see what i can come up with

prisma verge
#

also, really thank you for all the help with this fuss

lapis sequoia
#

cool i didnt get an error thanks man

sand reef
#

np

#

so, @prisma verge we will have to use linear regression where X will be the day number and Y will be one of the features

prisma verge
#

huh, why one of the features though if there's three besides the days?

sand reef
#

exactly, we will predict each feature but by using the number of days

#

one by one

prisma verge
#

ah

sand reef
#

gimme a little bit of time, i'll tag you if i manage to do it

prisma verge
#

that'd be really cool!

#

also, if this prediction model won't lie, i'll give you something from the wins, haha

sand reef
#

xD

#

well, i got 32% accuracy xD

#

*34%

#

so, i got the accuracy up to 43.75

#

got it up to 50 xD

prisma verge
#

how many epochs?

sand reef
#

welp, here is the thing @prisma verge

#

it turns out that there is something called Multivariate Multiple Regression

#

that is used to capture the relation between the multiple outputs

#

we might need that

prisma verge
#

but there's no such thing in keras
and there's very little info about it except from universities and other very scientific resources :p

sand reef
#

and I dont think keras or sklearn has that

prisma verge
#

aw man

#

that's sad

sand reef
#

yeah, its a weird one, i'll have to see how that works

#

and btw I ran with 500 epochs

prisma verge
#

i guess, that'd be interesting thing for you but too advanced for beginner like me :p

sand reef
#

thing is

prisma verge
#

especially when i'm not that good at math

sand reef
#

what we need is something to capture the relation between the outputs

prisma verge
#

is multivariate regression the only way to do that?

sand reef
#

because separately, we are getting the outputs for each of them

#

but they are not like 1, 2 and 3

#

multivariate multiple regression

#

multivariate means with multiple inputs

prisma verge
#

multiple means multiple outputs?

#

huh

sand reef
#

but multivariate multiple means the outputs are also considered
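
For reference, in the usual statistics terminology "multiple" refers to several predictors and "multivariate" to several response variables. A plain least-squares fit with several outputs at once is available in common tools: NumPy's `lstsq` accepts a 2-D target, and scikit-learn's `LinearRegression` does too. The sketch below uses made-up numbers, not the heist CSV, and assumes a single predictor (the round number) with three outputs. Note that ordinary least squares fits each output column independently, so on its own it does not model the relation between the outputs.

```python
import numpy as np

rounds = np.arange(1, 9, dtype=float)          # hypothetical round numbers 1..8
# Hypothetical targets: three output columns that depend linearly on the round
Y = np.column_stack([2.0 * rounds + 1.0,
                     -1.0 * rounds + 5.0,
                     0.5 * rounds])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(rounds), rounds])

# One call fits all three outputs; coef has shape (2, 3)
coef, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

next_round = np.array([1.0, 9.0])              # intercept term, round 9
print(next_round @ coef)                       # predicted three outputs for round 9
```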

prisma verge
#

... i'm not really sure how to handle that then

#

since it'd be no use for me since it's just too hard for my basic understanding, and it'd be probably a lot of sucking power from you

sand reef
#

i am thinking of doing something really weird tho

prisma verge
#

i guess, 30 rounds heists-streak is my dream and will stay that :p

sand reef
#

like, train a model on the number of days

#

and then train that model on other inputs

#

its like linking multiple regressions together

prisma verge
#

hmmm
hey, that may actually work

sand reef
#

lets try that

prisma verge
#

like, instead of making whole sandwich, you make sandwich part by part

#

but it's still sandwich :p

sand reef
#

yup

#

fk, i ended up with a 1000 epochs lol, gotta tune it down

prisma verge
#

woah, that's a lot

sand reef
#

welp that didnt work

#

i used the previous model to predict the new model's training values and used the old labels

#

well, the accuracy fell

prisma verge
#

... whoops

sand reef
#

yeah, i think i see the issue

prisma verge
#

seems like Life (TM) logic doesn't work in ML world :p

sand reef
#

its that, we dont have enough features

#

we only have one feature to predict stuff off

#

so, its not going to capture the output

prisma verge
#

... so, the stuff with multiple regression is needed

sand reef
#

yeah

prisma verge
#

ah man, why everything nice in this world is so hard :p

sand reef
#

or i might be running too many layers

#

apparently if your network becomes too deep, the accuracy starts falling

prisma verge
#

how many layers do you have actually?

#

3 hidden should be enough for every feature, no?

sand reef
#

2 hidden in first, 1 in second

prisma verge
#

huh

#

should be enough, no?

#

weird then

sand reef
#

yep, i'm trying to change the activation functions now

#

accuracy went up to 43.75

prisma verge
#

but it went to 50 in the past, so it's actually lower than some previous variant

#

hm

sand reef
#

yeah, but for some reason the validation accuracy crashes when training hits 50

#

i think that overfits it in the middle

prisma verge
#

dropout layer needed, i guess?

#

i mean, dropout usually helps with overfits

sand reef
#

well, there is barely anything to dropout

#

that works when your training accuracy hits like 90 and all but validation is bad

#

here our training itself is bad

prisma verge
#

hm

sand reef
#
```python
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import to_categorical

df = pd.read_csv('/content/gdrive/My Drive/m.csv')
dataset = df.values
X = dataset[1:, 0]    # first column: round number (the only input feature)
Y = dataset[1:, 1:]   # remaining columns: the three team results
#Y = dataset[1:, :]
'''min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
print(X_scale)'''
# 70% train, 15% validation, 15% test
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

# First model: round number -> three team results
model1 = Sequential([Dense(1, activation='relu', input_shape=(1,)),
                    Dense(32, activation='relu'),
                    Dense(32, activation='relu'),
                    Dense(3, activation='tanh')])
model1.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
hist = model1.fit(X_train, Y_train,
                 batch_size=1,
                 epochs=100,
                 validation_data=(X_val, Y_val))
# Second model: trained on the first model's predictions, with the original labels
predictions = model1.predict(X)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(predictions, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)
model = Sequential([Dense(3, activation='relu', input_shape=(3,)),
                    Dense(32, activation='relu'),
                    Dense(3)])
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
hist = model.fit(X_train, Y_train,
                 batch_size=1,
                 epochs=100,
                 validation_data=(X_val, Y_val))
# Chain both models to predict round 50
print(model.predict(model1.predict([50])))
```
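
One thing worth noting about the `metrics=['accuracy']` lines: accuracy is a classification metric, so with a mean-squared-error loss on continuous targets it is hard to interpret. A regression metric such as mean absolute error (in Keras, `metrics=['mae']`), or a rank-agreement check like the hypothetical one below, may say more. The numbers are invented for illustration.

```python
import numpy as np

# Made-up team ranks (targets) and made-up continuous model outputs
y_true = np.array([[3, 1, 2], [1, 3, 2], [2, 1, 3]], dtype=float)
y_pred = np.array([[2.6, 1.2, 2.1], [1.4, 2.2, 2.5], [2.1, 0.9, 2.8]])

# Mean absolute error: average distance between prediction and target
mae = np.mean(np.abs(y_true - y_pred))

# Fraction of rows where the predicted largest team matches the true one
top_match = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(round(mae, 3), top_match)
```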
prisma verge
#

so, it can't achieve more than 50% without math, stats, and tensorflow?

sand reef
#

not that I know of

prisma verge
#

aw

sand reef
#

well, I mean, thats the limit of my knowledge right there

#

I don't know what could increase it.

prisma verge
#

not like my knowledge is much better :p

sand reef
#

Because as far as I have learnt

#

the only way to increase training accuracy is

prisma verge
#

so, the least is the best team in eyes of nn?

#

:p

sand reef
#

more features, more examples, or sometimes better model

#

yes

#

if you want, you can do one thing

prisma verge
#

let's see how this heist goes and if predictions are bad or good

sand reef
#

use this function

#

in the end after modeling

prisma verge
sand reef
#

yes

#

every time the initial values are initialized randomly

prisma verge
#

we need to summon ML specialists!

sand reef
#

so, the results will be different
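
The run-to-run differences come from random weight initialisation (and from the random train/test split, which `train_test_split` can pin down with its `random_state` argument). Fixing a seed makes a run repeatable. The sketch below shows the idea with NumPy only; the layer size is arbitrary, and seeding Keras itself needs its own seed call, which is not shown here.

```python
import numpy as np

def init_weights(seed, shape=(1, 32)):
    """Draw a small random weight matrix from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.1, size=shape)

a = init_weights(seed=0)
b = init_weights(seed=0)
c = init_weights(seed=1)

print(np.array_equal(a, b))  # True: same seed, same starting weights
print(np.array_equal(a, c))  # False: different seed, different start
```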

#

i'll give you a function, to take the values and predict them right

#

gimme a sec, i need to check something lol

lapis sequoia
#

what does an activation do?

sand reef
#

values after being multiplied by the matrix are passed through a function

#

that function is activation and it introduces non linearity
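
A small NumPy sketch of that description, with arbitrary made-up weights: each layer multiplies by a matrix and then applies the activation. Without the activation, two layers collapse into a single matrix multiplication, which is why the non-linearity matters.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become 0, positive values pass through."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))      # 4 examples, 3 features
W1 = rng.normal(size=(3, 5))     # first layer weights
W2 = rng.normal(size=(5, 2))     # second layer weights

linear_only = (x @ W1) @ W2      # no activation between the layers
collapsed = x @ (W1 @ W2)        # one equivalent matrix
with_relu = relu(x @ W1) @ W2    # activation in between

print(np.allclose(linear_only, collapsed))  # True
print(np.allclose(with_relu, collapsed))
```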

prisma verge
#

ml shows best results until epoch 500 hits

#

then loss chanegs veeeery slowly

#

changes*

lapis sequoia
#

should i choose activations based on the situation?

sand reef
#

sometimes they do matter, sometimes they dont

prisma verge
#

so, what's that function I wonder?

lapis sequoia
#

so i can give relu for now?

sand reef
#

here

#
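```python
# requires `import numpy as np`
def correct(list_values):
  # list_values: a (1, 3) prediction array, one value per team
  temp = np.argsort(list_values)  # indices sorted by ascending predicted value
  names = ['Micheal', 'Franklin', 'Trevor']
  return [names[x] for x in temp[0]]
```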
#

pass in the prediction we made
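
To see what the function returns, here is a self-contained sketch with the same logic under a hypothetical name, `rank_teams`, fed with a made-up prediction. `np.argsort` sorts in ascending order, so the first name in the result is the team with the smallest predicted value and the last name is the one with the largest.

```python
import numpy as np

NAMES = ['Micheal', 'Franklin', 'Trevor']   # column order as in the chat snippet

def rank_teams(prediction):
    """Return team names ordered from smallest to largest predicted value."""
    order = np.argsort(prediction)[0]       # prediction has shape (1, 3)
    return [NAMES[i] for i in order]

pred = np.array([[2.4, 2.3, 1.9]])          # made-up model output
print(rank_teams(pred))                     # ['Trevor', 'Franklin', 'Micheal']
print(rank_teams(pred)[::-1])               # same ranking, largest first
```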

#

and yes @lapis sequoia you may do so

prisma verge
#

instead of model.predict[50]?

lapis sequoia
#

Thanks bro i luv you [no homo]

prisma verge
#

ah

#

with model.predict[50

#

gotcha

sand reef
#

yeah, the weird print() at the end right? pass its parameter into this function

#

and np!

#

works?

prisma verge
#

it goes through epochs

sand reef
#

well, thats the model lol

#

well, i dont know why i am feeling happy even tho it didnt exactly work out

prisma verge
#

because it was interesting task?

#

:p

sand reef
#

i guess?

#

so, the function worked?

prisma verge
#

it still goes through epochs

sand reef
#

where did you get this csv from tho?

prisma verge
#

i made it in google tables by copying it from the site

sand reef
#

oh? what site was it?

prisma verge
sand reef
#

lol reddit only

prisma verge
#

oh, i didn't import numpy

#

imported, trying again now

sand reef
#

lol

prisma verge
#

but yeah, this guy does reddit giveaways which have a 2/3 chance to win

sand reef
#

i see

prisma verge
#

you gotta have a 9 streak to get a 5 euro steam gift card

#

and chance of it working is 2.3%
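
As a quick check of that figure, assuming every round is an independent 2/3 chance of surviving (an assumption; the real odds depend on how the other players pick), the chance of a 9-round streak works out to roughly 2.6%, in the same ballpark as the number quoted.

```python
from fractions import Fraction

p_round = Fraction(2, 3)        # assumed chance of avoiding the biggest team
streak = 9

p_streak = p_round ** streak    # independent rounds multiply
print(p_streak, float(p_streak))  # 512/19683, about 0.026
```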

#

so, since i'm interested in ml, i thought "hey that'd be nice to predict with machine capabilities!"

#

but then i got confused with what should i do

sand reef
#

wat?

prisma verge
#

that's what it outputs

sand reef
#

show me the full thing, the one with the 4 frames and upper AttributeError

prisma verge
#

OHH

#

NEVERMIND

#

I CAPTURED JUST ONE PREDICT BUT NOT TWO

sand reef
#

lol

prisma verge
#

accuracy now doesn't bounce

sand reef
#

so, what does this guy do?

prisma verge
#

it shows stuff in order "biggest chance > middle chance > lowest chance"?

sand reef
#

yeah thats what the function does

prisma verge
#

this guy does a giveaway series where you get more points the longer your streak is

#

streak = heists without getting onto biggest team

sand reef
#

so this time, it was to make a predictive model?

prisma verge
#

for points you can "buy" steam games and gift cards

#

yeah

sand reef
#

well, you can possible get as far with this model as far you can without a model lol

prisma verge
#

this ai is always bouncing between whether it's worth choosing trevor or michael

sand reef
#

*possibly

prisma verge
#

but always hates on franklin

#

:p

sand reef
#

maybe Franklin doesnt get in the biggest team a lot

prisma verge
#

nah, the least one with biggest teams is Michael

#

then goes trevor, then franklin

sand reef
#

xD, so, then i guess we are better off without a model xD

#

just make a random number generator

prisma verge
#

nah, it's your task to go on the team with least numbers or at least middle team

#

biggest team = arrest

#

i just thought that with all the prediction stuff it's actually possible to make some predictions that have more than a 20% chance of being true

sand reef
#

nah boi

#

for that just use maths

#

you need a lot of data and features to do predictive modeling

prisma verge
#

what maths should i do

#

🤔

sand reef
#

probability

#

welp, i m off now!

#

ciao~

prisma verge
#

good luck, and that was fun :p

spark nimbus
#

@lapis sequoia moving here; Do you know of an efficient way to handle EQ?

lapis sequoia
#

@spark nimbus what do you exactly mean? Like frequency bands?

spark nimbus
#

yup

#

since this is the most taxing part of my current implementation

#
```python
    def bands_to_eq_size(self, frame: numpy.ndarray) -> numpy.ndarray:
        # scale each frequency bin by its EQ gain (modifies frame in place)
        frame *= self.eq
        return frame / 1000

    def transform(self, frame: numpy.ndarray) -> numpy.ndarray:
        # FFT -> apply EQ gains -> inverse FFT
        fftified = self.fft(frame.copy())
        eq_applied = self.bands_to_eq_size(fftified)
        return self.ifft(eq_applied)

    def process(self, audio: AudioSequence) -> AudioSequence:
        left, right = audio / 2

        new_left = []
        new_right = []

        # transform every frame of each channel, one frame at a time
        for old, new in zip([left, right], [new_left, new_right]):
            for frame in old:
                new.append(frame.apply(self.transform, seq=True))

        return (
            left.new(
                numpy.concatenate(
                    [f.audio
                     for f in new_left])) *
            right.new(
                numpy.concatenate(
                    [f.audio
                     for f in new_right])))
```
here's my current implementation
lapis sequoia
#

Well I will work on that this evening. But I cant promise anything. Asking for an efficient way is quite relative.

#

Hmmm srry i dont have a pc right now. I am at work hahah

spark nimbus
#

right now it's done frame-by-frame
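
One common way to cut the per-frame overhead is to stack the frames into a 2-D array and run the FFT, the gain multiplication and the inverse FFT over all frames in one call. This is only a sketch under assumptions: `apply_eq`, the frame length and the gain curve are hypothetical, real-valued mono frames of equal length are assumed, and the custom `AudioSequence` class from the snippet is not used.

```python
import numpy as np

def apply_eq(frames, gains):
    """Apply per-frequency-bin gains to every frame at once.

    frames: (n_frames, frame_len) real samples
    gains:  (frame_len // 2 + 1,) multiplier per rfft bin
    """
    spectrum = np.fft.rfft(frames, axis=1)       # FFT of all frames in one call
    spectrum *= gains                            # broadcast gains over the frames
    return np.fft.irfft(spectrum, n=frames.shape[1], axis=1)

frame_len = 1024
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, frame_len))         # 8 made-up frames

flat = apply_eq(frames, np.ones(frame_len // 2 + 1))    # unity gain: unchanged
muted = apply_eq(frames, np.zeros(frame_len // 2 + 1))  # zero gain: silence
```

Frame-wise FFT filtering without overlapping windows can produce audible artifacts at frame boundaries; overlap-add is the usual remedy, and it is not shown here.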

lapis sequoia
#

Hmmm well

#

My way is to divide the frequency spectrum for human hearing into bands

#

So say 0 Hz to 22 kHz; the band edges can be static or follow a logarithmic function

spark nimbus
#

wdym by that?

lapis sequoia
#

Then on each band you can do fft

spark nimbus
#

hmm

lapis sequoia
#

Well the thing is

spark nimbus
#

sadly due to lack of knowledge in this area I have no clue what all that means

#

and there's no easy way to get into audio it seems

lapis sequoia
#

Welll uhm

#

Yeah my knowledge is also very limited. It costs me a lot of time. But you first have to know what you want to achieve

#

Is it plotting? What do you need exactly? Because it can be peaks? Of a specific band plotted? Or you can have averages of bands

#

You really must tell exactly what you want because that will

spark nimbus
#

I'm manipulating live audio

#

basically a DSP

lapis sequoia
#

Yeah.....

#

Manipulating?

spark nimbus
#

basically applying FX

lapis sequoia
#

Ahaaaa

#

And you are creating a new audio feed? Or file?

spark nimbus
#

passing straight to pyaudio

lapis sequoia
#

So you have a wave file then you want to apply fx and send that to pyaudio?

#

Fx is quite complicated bro. You really need to understand basic fft and complex numbers to make an fx on it. To be honest i dont have a clue on how to do that. I know i can get a plot for freq peaks and i know how to calculate averages on freq bands but applying fx is quite complicated

prisma verge
#

@sand reef hey! what do you think, what will happen if you pass less data to the nn?

#

like, only last ten entries?

#

will the prediction become closer to "The Truth" (TM) because last entries are easier to judge from?

#

accuracy is 60% on latest entries btw

prisma verge
sand reef
#

Yup

#

And no, fewer entries means the output will be even more random

modest scarab
#

hi there!

#

Hows everyone doing today?

#

I am part of the chamber of commerce committee for my local Indian community in my city

#

i am thinking about creating a census from the indian community so i can understand the demographic and the analysis of the demographic

#

so my committee can understand how to better serve the indian community

#

it would be easy to send a newsletter to everyone with a google form and to work with other indian leaders in my community to make a concerted effort to gather information about what part of india they are from, their type of profession, their families and households, their age group, and the area they live in. of course, this is not about asking for private information but more for general information

#

do you have any suggestions or ideas of how I could improve this project idea?

sand reef
#

How do you want to improve it? Like what are your limitations? And what kind of improvement are you looking for?

prisma verge
#

it actually guessed the stuff right!

#

once

#

now i've rerun it to see if it's gonna guess it again

#

because i've added dropouts and a third network to it

#

@sand reef imagine all of money we'll win if my way works!

sand reef
#

Lol

prisma verge
#

"your car doesn't work? just add another engine to it" (C)

sand reef
#

You added a 3rd network like how I added the second one?

prisma verge
#

yeah

sand reef
#

🤣

#

Well, that's going to be a big network now

#

And congratulations!

prisma verge
#

GOOGLE, LOOK, WE CAN PREDICT THE FUTURE NOW

#

FUND OUR STARTUP PLEASE

sand reef
#

Well, how I wish they would.

prisma verge
#

though this predict may have been an accident

sand reef
#

Probably

prisma verge
#

network is still learning to see if it wasn't

sand reef
#

Ppl would say calculate a p value for it

prisma verge
#

do i look like a mathematician?

sand reef
#

But we have only 1 feature, wtf we gonna calculate?

prisma verge
#

we'd have more if those keras guys implemented multiple regression thing

#

and we could win thousand of 5 euro gift cards...

sand reef
#

We are making features out of one feature

#

Well. Imma go read the pdf I was reading.

#

You also read this.

prisma verge
#

MAN

#

IT PREDICTED IT RIGHT

#

though it may have been reverse order because i don't remember what order it was

sand reef
prisma verge
#

BUT STILL

#

IT MAKES SENSE

sand reef
#

Read that and become a mathematician

prisma verge
#

IT MADE "michael - trevor - franklin"
AND MICHAEL WAS BIGGEST TEAM, TREVOR WAS MIDDLE, FRANKLIN WAS SMALLEST

#

DOES IT ACTUALLY WORK???

sand reef
#

Yep. That was reverse

prisma verge
#

i've reran it again

sand reef
#

So. Nope. I think that was luck.

prisma verge
#

i mean third time cannot be luck

sand reef
#

In reverse?

prisma verge
#

by in reverse i mean it could have outputted franklin trevor michael but i don't remember since my memory is a goldfish's memory

#

i'm rerunning it

sand reef
#

Well. If you do get 1000s worth of gift cards, send a couple thousand here

prisma verge
#

and if it outputs good result third time

sand reef
#

😂

prisma verge
#

i'll donate euro 2000 to you, get euro 2000 for me, and donate euro 1000 to somebody

#

:p

sand reef
#

Sure!

prisma verge
#

that's if our magic ai actually works

sand reef
#

Well. I gtg now. Time to sleep. It's kinda getting late now.

#

Yeah. I actually hope it works.

prisma verge
#

well, 10 pm here, not that much

#

good night!

sand reef
#

11.22 PM here

#

Good knight!

prisma verge
#

... it made it again

#

michael trevor franklin

#

predicting it again on new dataset

#

let's see if it'll tell the truth

#

... okay now it always outputs michael trevor franklin

#

even with new data

#

weird

#

it now outputs same results over and over

#

2.283496 2.372942 2.0555158
it says now that trevor will be least, franklin biggest, michael middle

#

xen0remind
(putting this to search for it later when heists end)

jagged stump
#

I will share something about data science, ML and DL. maybe you guys know it already, but I guess this cheat sheet will be helpful for you

sand reef
#

Thanks a lot @jagged stump !

small ore
#

450+ msgs since yesterday. This channel is suddenly rocking

silent swan
#

the summer of "how do I get started on ml/dl?" 😄

lean ledge
#

"I'm really interested in AI and ML" - literally every first and second year computer science student ever

noble ledge
#

🚶

sand reef
#

But 3rd year?

#

And what do I learn to get into Computer Vision? I am seeing a lot of stuff and feeling overwhelmed and don't know where to begin from. I know how to use Regular conv NNs.

lean ledge
#

I thought I already told you what is expected out of a job

#

@sand reef

#

The list is completely different if you want to go for traditional computer vision over deep vision

jagged stump
#

If you need any other documents, I have so many

#

I can share guys

sand reef
#

Holy...

#

Well. I guess I'll do something about it then

#

So, basically, ML and DL isn't as hyped up as it appears to be? @lean ledge

earnest prawn
#

it is as hyped up as it appears to be in the public

#

just not the industry or at least just in the minority of industry

jagged stump
#

I have a question. What is the best way to do real-time brand detection? Any ideas?

#

CNN etc., which one would you advise?

onyx granite
#

a large majority of major models used by large institutions are still more "traditional" machine learning, partly because they work very well still and partly because DL/ML is harder to "explain" to regulators than a regression type model

#

you would have to get into more details which would be cumbersome for explaining to less tech-savvy people

sand reef
#

I have a question from ISLR.

#

This is from linear regression

#

These are the parameters we have to minimize, and we calculate them using this formula (above)

#

This is given formula for calculating the variance of the means. (above)

#

How are we able to derive those? (above)

#

What I know:

#
```
2. Standard error is measured for the "deviation" between the "means" taken from random samples  and the true mean from a given dataset.
```
#

So, what am I missing here? I can't seem to derive this weird formula. Nor am I able to logically deduce this either.

#

ISLR First Printing (Page 66)

prisma verge
#

@sand reef well it kinda predicted it but trevor was middle not michael

#

franklin was biggest still though

sand reef
#

Nice!

silent swan
#

@sand reef I'm not sure which part you're having issues with

lapis sequoia
#

I need some help

#

there's three classes in my dataframe, how do I limit them to 100K rows each
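
One possible way, assuming the class is stored in a column (called `label` here, a made-up name), is `groupby(...).head(n)`, which keeps at most the first `n` rows of each group. The tiny frame below uses a cap of 2 instead of 100K so the result is easy to check.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["a", "a", "a", "b", "b", "c"],   # hypothetical class column
    "value": [1, 2, 3, 4, 5, 6],
})

cap = 2  # would be 100_000 for the real data
limited = df.groupby("label").head(cap)        # at most `cap` rows per class

print(limited["label"].value_counts().to_dict())
```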

sand reef
#

@silent swan how the standard error formulas are being derived for the square of the parameters.

silent swan
#

ah, I would say for those you'll actually need to get into real statistics

#

like a standard regression textbook or regression chapter in a statistics textbook will derive them
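
The images from the question are not preserved here. Assuming they show the least-squares estimates and the standard-error formulas from that page of ISLR, a sketch of the standard derivation goes as follows, under the usual assumptions that the errors are uncorrelated with common variance $\sigma^2$ and the $x_i$ are treated as fixed.

```latex
\begin{align*}
\hat\beta_1 &= \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2}
             = \sum_i w_i\, y_i,
  \qquad w_i = \frac{x_i-\bar x}{\sum_j (x_j-\bar x)^2}
  \quad\text{(because } \textstyle\sum_i (x_i-\bar x)\,\bar y = 0\text{)} \\
\operatorname{Var}(\hat\beta_1) &= \sum_i w_i^2\,\sigma^2
             = \frac{\sigma^2}{\sum_i (x_i-\bar x)^2}
             = \mathrm{SE}(\hat\beta_1)^2 \\
\hat\beta_0 &= \bar y - \hat\beta_1\,\bar x,
  \qquad \operatorname{Cov}(\bar y, \hat\beta_1) = \frac{\sigma^2}{n}\sum_i w_i = 0 \\
\operatorname{Var}(\hat\beta_0) &= \operatorname{Var}(\bar y) + \bar x^2\,\operatorname{Var}(\hat\beta_1)
             = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i-\bar x)^2}\right]
             = \mathrm{SE}(\hat\beta_0)^2
\end{align*}
```

The key step is writing each estimate as a weighted sum of the $y_i$, so its variance is the sum of squared weights times $\sigma^2$; the $\sigma^2/n$ formula for the variance of a sample mean is the special case with all weights equal to $1/n$.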

lost sinew
#
import pandas as pd

markets = ['ETH', 'BTC', 'ADA', 'ETC', 'EOS', 'NEO', 'BCHABC']
pair = 'USDT'
dfs = []
start_time = '2018-01-01'
end_time = '2019-06-15'

for market in markets:
    df = pd.read_csv(market + pair + '.csv', parse_dates=True, index_col=0)

    df = df.drop(columns=['Open', 'High', 'Low', 'Volume'])
    df.rename(columns={'Close': market + pair + ' Close'}, inplace=True)
    df = df.loc[start_time: end_time]
    dfs.append(df)
#

there is like 7 columns.. each with the closing price of each coin.. is there a way i could add all of the prices and average it and make it a separate column called 'average'

sand reef
#

there is

#

if i am not wrong, there should either be a mean function in pandas itself, if not, i think numpy will work as well

#

cuz, again, if i m not wrong, pandas was built on numpy
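
A sketch of how that could look, assuming the per-coin frames from the loop above are joined side by side on their date index. The prices and the reduced column list below are made up for illustration; `DataFrame.mean(axis=1)` averages across the columns of each row.

```python
import pandas as pd

dates = pd.date_range("2018-01-01", periods=3, freq="D")
# Stand-ins for the frames collected in `dfs` (made-up closing prices)
dfs = [
    pd.DataFrame({"ETHUSDT Close": [700.0, 720.0, 710.0]}, index=dates),
    pd.DataFrame({"BTCUSDT Close": [13000.0, 13500.0, 14000.0]}, index=dates),
]

combined = pd.concat(dfs, axis=1)                 # one column per market
combined["average"] = combined.mean(axis=1)       # row-wise mean of the closes

print(combined)
```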

wide gyro
#

so i have a function like this

#
```python
import time
def epochUpd(row):
    # replace the epoch seconds in 'updated' with a readable date string
    row.at['updated'] = time.strftime("%a, %b %d %Y", time.localtime(row.at['updated']))
```
#

and i have another function called clean which does something like this

#
def clean(df):
    for index,row in df.iterrows():
        epochUpd(row)
    df.to_csv('updated.csv',index=False)
#

Basically, I am trying to clean a data file, but I want to change every 'updated' column to show the value I created from the epochUpd(row)

#

Firstly, I don't think the iterrows() is the right way to go because I feel like it'll take a while and there could be a much easier way

#

Also, I'm not sure if the formatting is even near correct

#

Could I change the epoch function to

#
def epochUpd(df)
    df.updated = time.strftime("%a, %b %d %Y", time.localtime(df.updated))
#

and then keep the for loop how it is in the clean function

#

nevermind, getting invalid syntax on that last function

#

but there's gotta be a way
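
One vectorised alternative to the row loop, assuming the `updated` column holds Unix timestamps in seconds: convert the whole column with `pd.to_datetime(..., unit='s')` and format it with `.dt.strftime`. A few caveats: this yields UTC rather than the machine's local time that `time.localtime` would use, and the sample values are invented. Rows yielded by `iterrows()` may be copies, so assigning to them inside the loop is not guaranteed to change the original frame. The missing colon after `def epochUpd(df)` would also explain the invalid-syntax error.

```python
import pandas as pd

df = pd.DataFrame({"updated": [0, 86400, 1560556800]})   # made-up epoch seconds

# Convert the whole column at once, then format as e.g. "Thu, Jan 01 1970"
df["updated"] = pd.to_datetime(df["updated"], unit="s").dt.strftime("%a, %b %d %Y")

print(df["updated"].tolist())
```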

lapis sequoia
#

may i know whats wrong with this code

```python
import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from keras import Sequential
from keras.layers import Dense, Flatten

# x: flattened 8x8 digit images, y: integer labels 0-9 (one per sample)
x, y = load_digits().data, load_digits().target
x_train, x_test, y_train, y_test = train_test_split(x, y, shuffle=False)
model = Sequential()
model.add(Dense(64, activation=tf.keras.activations.relu))
model.add(Dense(64, activation=tf.keras.activations.relu))
model.add(Dense(10, activation="softmax"))  # one output per digit class
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=3, batch_size=10)
print(model.predict(x_test[0]))
```
#
  File "/Users/Kushi/PycharmProjects/test/misc/ML_test.py", line 19, in <module>
    model.fit(x_train, y_train, epochs=3, batch_size=10)
  File "/Users/Kushi/PycharmProjects/test/venv/lib/python3.7/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/Users/Kushi/PycharmProjects/test/venv/lib/python3.7/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/Users/Kushi/PycharmProjects/test/venv/lib/python3.7/site-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected dense_3 to have shape (10,) but got array with shape (1,)
#

but in the load_digits dataset there are 10 class names so shouldn't dense_3 have ten units so that it gives me the probability of each one?

#

please ping me when helping

prisma verge
#

@sand reef
['Trevor', 'Franklin', 'Micheal']
[[2.3735387 2.3416874 1.861764 ]]
let's see if it works :p

#

if it guesses the biggest team a third time

#

that'd be cool

brazen wing
#

@lapis sequoia you didn't give it that though, you gave it a single int
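
In other words, `categorical_crossentropy` expects one-hot targets of shape `(n, 10)`, while `load_digits().target` holds plain integers. Either the labels are one-hot encoded (Keras offers `to_categorical` for that) or the loss is switched to `sparse_categorical_crossentropy`, which accepts integer labels. The NumPy sketch below only illustrates what one-hot encoding does; the label values are made up.

```python
import numpy as np

y = np.array([0, 3, 9])          # made-up integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[y]

print(one_hot.shape)             # (3, 10)
print(one_hot[1])                # 1.0 at index 3, zeros elsewhere
```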

prisma verge
#

HUH

#

IT DID GUESS THAT MICHAEL WILL BE BIGGEST

#

still weird with middle and smallest team

#

it mixes them up so trevor was middle

#

but yeah it predicts biggest team so that's good already

#

['Franklin', 'Micheal', 'Trevor']
[[2.1652157 1.9663016 2.2141366]]

#

let's see if next game will be right too

#

if yeah...

#

then that's a gem

daring spindle
#

Hey guys what is the best place to learn ML and Neural Networks?

olive willow
#

dude, first make a mind map of what you need to start learning it, because you need, for example, programming fundamentals, linear algebra, stats and later also calc

#

to properly understand it

prisma verge
#

cough no you don't cough

olive willow
#

assuming you know more complex algebra and functions and how to use them

#

?

#

to really understand it yes you do

prisma verge
#

for building networks just for fun you can use keras

#

also there's the amazing Zero to Deep Learning course which covers all the stuff you need including the math explanations

olive willow
#

yes but building and understanding are two different things

prisma verge
daring spindle
#

Math was my best finals subject I understand it ez

prisma verge
#

but can you do linear algebra?

olive willow
#

he's 13

prisma verge
#

why do you think so tho

olive willow
#

btw so you haven't had that in school

#

I know him

prisma verge
#

... oh

daring spindle
#

Erm I am 14 now sur

olive willow
#

happy birthday then!

daring spindle
#

The real man heroh

#

almost got my beard ready

prisma verge
#

really though check out the ZTDL course

#

it covers all subjects you'll need

olive willow
#

I'm using datacamp

prisma verge
#

with source codes, jupyter notebooks, and stuff

olive willow
#

but I'm going into DS

#

and robotics

prisma verge
#

i'm planning to go web infosec so
i'm just doing nns for fun

olive willow
#

btw khan academy is good for math

prisma verge
#

like text generation, prediction, stuff

olive willow
#

oohh sure

prisma verge
#

and it doesn't require much math, just basic array understanding

olive willow
#

thats ez

prisma verge
#

and python and numpy syntax knowledge

olive willow
#

I actually need to know a lot to do good DS

#

SQL, numpy, pandas, ML, math etc.

#

python

#

a lot of visualizing tools

prisma verge
#

yes but that's if you go for DS as a job

#

i'm building projects just to laugh them off with friends :p

olive willow
#

oohh hahahahahaha

#

I want to have an own robotics company which is using AI to make them human like

prisma verge
#

i've made a thing that'd make poem-like neuromancer quotes

olive willow
#

oooohhh that cool I guess

prisma verge
#

i also want to learn markov chains but my brain refuses to understand their realizations in python

olive willow
#

whats that?

earnest prawn
#

a stochastic model

#

for state transitions

olive willow
#

nix out of nowhere

prisma verge
#

oh no by mentioning scientific words you summon nix

olive willow
#

sure

earnest prawn
#

i have been watching for around 10 minutes actually

olive willow
#

hahahhaahha

prisma verge
#

may we get pinned picture here with

#

BIG NIX IS WATCHING YOU

olive willow
#

yh

#

why do you even need cameras or monitoring bots if you have NIX

earnest prawn
#

because I dont convert to jpegs

prisma verge
#

i kinda get markov chains basics like
there's a bunch of words and every word has a chance to follow the other word

earnest prawn
#

no

prisma verge
#

... but

earnest prawn
#

that is not the deeper idea of markov chains

prisma verge
#

where can i get a simple explanation of MC

earnest prawn
#

the deeper idea of markov chains is that you have a set of states and then you have a transition probability from every state to every state

#

that is everything there is

#

the words are just a form of states

prisma verge
#

oh
so MC are just counting probabilities between transitions which can be any object like

#

weather, number

#

kinda like that?

earnest prawn
#

counting probabilities?

sand reef
#

Say. Has anyone heard of reservoir computing? What is it about?

prisma verge
#

compute probabilities
how do i say it

#

i can't say it with right words

#

that's not what i mean

earnest prawn
#

the markov chain itself does not compute probabilities

#

it expects you to provide probabilities and states

prisma verge
#

yeah i get that

earnest prawn
#

all the libraries you see for markov chains just do this for you

prisma verge
#

oh

#

there's libs for markov chains?

earnest prawn
#

dozens

prisma verge
#

may i just get fancy text generator

#

because i love random stuff

#

and markov chains examples seem to be like human words but still with a bit of computer craziness

earnest prawn
#

i mean you can easily implement stationary markov chains yourself

#

it gets a bit ugly with mth order markov chains because you have to look m states back for your new decision

uneven wren
#

is there a way to access printed information?

earnest prawn
#

you read from stdout?

#

also how does this fit here?

uneven wren
#

hmmm i thought it would be topical to data-science

prisma verge
#

but i don't understand markov chains and i just want to get more random stuff that look humanese but weird

uneven wren
#

sorry

earnest prawn
#

dont worry

#

so for markov chains, suppose we have three states s1 s2 s3

the transition probabilities could for example be
0.2 for s1 -> s2
0.8 for s1 -> s3
0.9 for s3 -> s2
0.1 for s3 -> s1
1.0 for s2 -> s1

all omitted ones are 0 (the probabilities leaving each state have to sum to 1)

so if you start out in s1 you sample from its probabilities and most likely go to s3
s3 most likely transitions to s2
and s2 always transitions to s1 (and now we're back where we started, looping forever)
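
A minimal sketch of the three-state idea above, using only the standard library. The state names follow the message; the transition numbers are illustrative assumptions, chosen so that the probabilities leaving each state sum to 1, and the helper names `next_state` and `walk` are made up for this example.

```python
import random

# Transition table: state -> {next_state: probability}.
# The numbers are assumptions for illustration; each row sums to 1.
TRANSITIONS = {
    "s1": {"s2": 0.2, "s3": 0.8},
    "s2": {"s1": 1.0},
    "s3": {"s1": 0.1, "s2": 0.9},
}


def next_state(state, rng):
    """Sample the next state from the current state's outgoing probabilities."""
    choices = TRANSITIONS[state]
    return rng.choices(list(choices), weights=list(choices.values()), k=1)[0]


def walk(start, steps, seed=None):
    """Return the list of visited states, starting at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


path = walk("s1", steps=10, seed=0)
print(" -> ".join(path))
```

The chain only ever looks at the current state, which is why a plain dictionary lookup is enough. An m-th order chain would key the table on the last m states instead.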

olive willow
#

so nix as you're here, can you tell me what this is Parametric representations of lines in linear algebra because the guy in khan academy only confuses me more

desert oar
#

can you give context for that

#

what is a parametric representation of lines

prisma verge
#

huh

olive willow
#

idk that's why I ask the all mighty nix who knows mostly everything

prisma verge
#

that now seems easy

olive willow
earnest prawn
#

yes it is extremly easy @prisma verge

prisma verge
#

and yeah, nix seem to know everything

olive willow
#

pretty much

prisma verge
#

praise nix

desert oar
#

that's the title of the video, i think you should watch it to find out what it is

olive willow
#

I've watched it more than once

#

and it doesn't make sense to me

prisma verge
#

may we get :praiseNix: emote

olive willow
#

yh

desert oar
#

@olive willow is there a specific part of it? or the whole thing is confusing

earnest prawn
#

also Id argue that a parametric representation of a line is (from a geometrical point of view)
a vector lets call it a
another vector lets call it b
and a variable scalar l

and then you could represent a line which goes from a in direction b like

a + l * b

but that is my school geometry thinking I dont really know what it is, it just sounds similar

and no I do not know everything in fact id consider myself horribly bad at almost everything related to data science
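
A small sketch of the `a + l * b` form described above, assuming NumPy. The function names `point_on_line` and `line_through` are hypothetical; the point of the example is that the same two lines of code work in 2, 4, or 10 dimensions.

```python
import numpy as np


def point_on_line(a, b, l):
    """Return the point a + l * b for support vector a and direction b."""
    return np.asarray(a, dtype=float) + l * np.asarray(b, dtype=float)


def line_through(p, q):
    """Return (support, direction) for the line through points p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return p, q - p


# Two points in 4-dimensional space; nothing below depends on the dimension.
p = [1, 2, 0, 4]
q = [3, 2, 1, 0]
a, b = line_through(p, q)

print(point_on_line(a, b, 0.0))  # l = 0 gives p
print(point_on_line(a, b, 1.0))  # l = 1 gives q
print(point_on_line(a, b, 0.5))  # halfway between p and q
```

Letting `l` run over all real numbers traces out the whole line; restricting it to the range 0 to 1 gives the segment between the two points.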

prisma verge
#

also, nix
from what i've heard, to get probability for each word you gotta divide the amount of words on amount of times this word appeared in the text

#

right?

desert oar
#

its saying that a vector is a "parametric" representation of a line

#

"parametric" in this case means that it uses a few specific numbers, i.e. parameters, to represent a large collection of objects, in this case a line

earnest prawn
#

but a vector represents an infinite amount of lines

desert oar
#

or a collection of lines

#

yeah

#

thats the point

olive willow
#

ooohohhhh

#

so what vectors you can make with these parameters

prisma verge
#

i don't want to hear anything about infinitiness so im getting outta here

olive willow
#

it would be infinite mostly

prisma verge
#

infinitiness confuse me

#

it's always confusing

#

like, that endless hotel paradox

desert oar
#

this isn't that @prisma verge

earnest prawn
#

i mean a vector is just a direction, it doesnt define from where it starts just where it points

#

so you can choose every point in space

prisma verge
#

VECTOR IS JUST X AND Y CHANGE MY MIND

olive willow
#

yh but if we would start at the origin

prisma verge
#

derez x and derez y

olive willow
#

to make things easy

earnest prawn
#

then its not a vector

#

then its what i described above with a = (0|0)

desert oar
#

@olive willow @prisma verge the punch line is at around 10:00 in that video -- you can use y = mx + b to describe a line in R2, but a vector can be used to represent an arbitrary "line" in an arbitrarily-dimensioned space

#

hence a vector is a parametric representation of a collection of lines

olive willow
#

an arbitrary "line" in an arbitrarily-dimensioned space ?

desert oar
#

yeah, what's a line in 10 dimensional space?

prisma verge
#

may we just simplify that
there's a lot of dots and you can line to any of dot with x and y coordinates

#

daz vector now

desert oar
#

the equation is really complicated

#

cause you dont just have x and y

olive willow
#

xyz

#

3dims

earnest prawn
#

a line in 10 dimensional space is a plane in 9 dimensional space

desert oar
#

can have more than that

earnest prawn
#

😜

olive willow
#

nix stop!

desert oar
#

can have 3, 10, or even infinity

olive willow
#

yh

#

1, 2, 3 ....... n dims

desert oar
#

so instead of having to write special equations for each case

#

you use vectors

#

(and eventually matrices)

olive willow
#

yh I know that

#

and tensors

prisma verge
#

also i really respect nix and srl for knowing math because i like people that know math

olive willow
#

in numpy

prisma verge
#

but myself i hate math

#

but knowing it is cool i guess

olive willow
#

A vector is a thing that has a direction and magnitude

desert oar
#

thats one way to think of it, yes

olive willow
#

and you can place it anywhere you want if it has the same direction and magnitude

desert oar
#

@prisma verge can you elaborate on your question about word probabilities

olive willow
#

but to simplify things I would see a vector as a linear combination of 2 basis vectors

desert oar
#

@olive willow he gives a good example at around 12 mins in the video

#

how do you characterize the line that goes through 2 specific points

olive willow
#

gonna watch

desert oar
#

in a general dimensional space

#

using 2d for demo purposes

prisma verge
#

uhhh, so, we were talking about markov chains and when i was reading about it i've read that probability of each word being added in the text = amount of words in this text divided by times this word appeared in the text

desert oar
#

flip that around

#

and... sorta

prisma verge
#

oh

desert oar
#

depends. that's one specific model

#

its not always true
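
To make the "flip that around" point concrete, here is a hedged sketch with the standard library only. It assumes the simplest possible models: a word's probability is its count divided by the total number of words, and a Markov-style transition probability is how often one word follows another. The function names and the sample sentence are invented for the example.

```python
from collections import Counter, defaultdict


def unigram_probabilities(words):
    """P(word) = times the word appeared / total number of words."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def bigram_transitions(words):
    """P(next | current) = times `next` followed `current` / times `current` was followed by anything."""
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return {
        current: {nxt: n / sum(counter.values()) for nxt, n in counter.items()}
        for current, counter in followers.items()
    }


words = "the cat sat on the mat the cat ran".split()
unigrams = unigram_probabilities(words)
transitions = bigram_transitions(words)

print(unigrams["the"])     # 3 of the 9 words are "the"
print(transitions["the"])  # "cat" follows "the" twice, "mat" once
```

The second table is what a text generator would sample from: the words are the states and the follower frequencies are the transition probabilities.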

prisma verge
#

i just want to have nice text generator, really
i love random stuff

earnest prawn
#

then get yourself markov chain lib x and youre done

prisma verge
#

i mean, it's not like i'm going in data science to know the theory

earnest prawn
#

yeah well

#

then youre not really going to get far

prisma verge
#

and, i guess simplified version of theory is good enough that was provided by you nix

#

haha

earnest prawn
#

thats not really a simplified version

#

its literally all there is behind it

prisma verge
#

... oh

earnest prawn
#

i mean

#

you can express this a lot more formal

#

with random variables and whatever

#

but it boils down to this idea

prisma verge
#

i hate formal things to be honest since i'm stupid and it's easier when you can compare stuff with real life instead of learning hundreds of new terms

desert oar
#

idk how you ever expect to learn anything then

#

real-life examples are valuable and important

#

but if you want to get past the basic level, at some point you have to sit down and try to understand what's going on

#

this goes for many things, not just data science

prisma verge
#

i have no idea?

#

i mean, uh, i kinda get python syntax when i'm bad at explaining this at formal side

#

and i even get what i do in code (on high level, not low level tho)

#

i guess i can say "it just works"

#

... no, not about code, about how i'm learning stuff

earnest prawn
#

Not understanding code will restrain you enormously in the long term

prisma verge
#

but i'm saying that i do get it

#

i'm just bad at getting things when they're explained with a lot of programming and math only terms

#

and that's weird

lapis sequoia
#

pretty new to data science, but currently trying to analyse some sensor output. i am making pandas plots and checking if the sensor behaves "nice" according to a temperature cycle over time. are there some tools or methods in pandas which might help me discover sensor "noise" or "shifts" in data? like for example gradually changing offset over time?
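
One possible starting point for the question above, sketched with pandas rolling windows on synthetic data. Everything here is an assumption for illustration: the fake signal, the 200-sample cycle length, and the window sizes would all need to be replaced with values that match the real sensor.

```python
import numpy as np
import pandas as pd

# Synthetic sensor: a repeating temperature cycle, a slow offset drift, and noise.
rng = np.random.default_rng(0)
t = pd.date_range("2019-01-01", periods=2000, freq="min")
cycle = 5 * np.sin(np.linspace(0, 20 * np.pi, len(t)))  # 10 cycles of 200 samples
drift = np.linspace(0, 2, len(t))                        # offset grows by 2 overall
noise = rng.normal(0, 0.3, len(t))
signal = pd.Series(20 + cycle + drift + noise, index=t)

# A rolling mean over one full cycle averages the cycle away and leaves the offset.
window = 200
baseline = signal.rolling(window, center=True).mean()

# Subtracting a short rolling median leaves mostly noise; its rolling std is a noise estimate.
residual = signal - signal.rolling(15, center=True).median()
noise_level = residual.rolling(window, center=True).std()

valid = baseline.dropna()
print("offset change over the record:", valid.iloc[-1] - valid.iloc[0])
print("typical noise level:", noise_level.mean())
```

Plotting `baseline` and `noise_level` next to the raw signal makes a gradual shift or a noisy stretch much easier to spot than in the raw plot.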

median siren
#

Hi all, i'm not sure if this is the correct place to ask, but is there anyone who's versed with python Learning To Rank packages? Especially Pairwise ranking algorithms? I'm struggling quite a bit with implementing it in python.

earnest prawn
#

Ranking packages?

median siren
lapis sequoia
#

@median siren you have to be more specific

#

what are you trying to rank

#

@lapis sequoia auto regressive methods

median siren
#

@lapis sequoia Apologies, i'll try to be more specific.

For example: consider the following table:

| Name       | Rank | Points | Embedding |
|------------|------|--------|-----------|
| Man. City  | 1    | 98     | [...]     |
| Liverpool  | 2    | 97     | [...]     |
| Tot. Hots. | 3    | 93     | [...]     |

Here, we try to predict the rank of a team, according to their embedding. So there is 1 query, namely the results of this season, and you want to predict the rank of a team based on embeddings. (Note: the embeddings can also be replaced with other features)

lapis sequoia
#

what does the embedding represent..

#

what is the query

median siren
#

Well, to be frank I'm quite doubtful about that. Considering the goal is to "predict a rank", I assume the query is the rank, and you want to get the most accurate name for that query.

So in this case, if the query would be "1" You'd want Man. City to return. This considering the opposite doesn't hold true, I assume?. For example: When you query "Man. City." the result can't be "1", no?

#

what does the embedding represent..

It's a graph embedding, so the embedding represent the name in a graph.

lapis sequoia
#

I'm not able to understand the case here.. it's very cryptic..

#

you want to predict the rank of football clubs.. but what are the features exactly?

#

it's very rare for features other than text to be represented as embeddings..

lean ledge
#

That's not true. A lot of things use embeddings. It's less common to call them embeddings outside NLP but the idea of taking a representation, finding patterns in it and embedding it into a lower dimensional latent space that's more fundamental is a very common thread throughout ML

lapis sequoia
#

can't refer to all vector representations as embeddings..because they are specific task-restricted representations..

lapis sequoia
#

@lapis sequoia thanks, Ill look into that

desert oar
#

@lapis sequoia tbh python has bad tools for time series analysis. R has much better stuff for decomposition, trends, etc

#

@lapis sequoia isn't that pretty much the definition of an embedding? an invertible mapping into a lower-dimensional space?

#

theres nothing specific to text in that

lapis sequoia
#

I don't know.. it feels very odd to me to use that term for anything other than text o.o

lean ledge
#

What makes text embeddings different from non text ones?

lapis sequoia
#

lets see.. in text embeddings there's cases like context free and context aware..

#

in other vector representations.. the features are represented in a certain space and can be reduced to another, just like embeddings that represent text..

#

features for a vector representation are often chosen for predictability.. taking into account things like avoiding multicollinearity.. this is the opposite of context aware representations in text, which are representations that take into account their surrounding text..

#

the idea of embeddings is that the representation can be used for multiple related nlu/nlg tasks or in a related domain..

#

apparently they also use a term called Vector Embeddings to denote outputs from DL layers..

formal badger
#

How does everyone have their python environments setup?

#

Good idea to just install anaconda?

lapis sequoia
#

pika pi?

#

I used to use anaconda for university.. then was doing everything on the cloud.. so.. make a script for launching a quick instance with everything I need

#

pikachu

elder vessel
#

I am using miniconda with conda-forge as source. Then I conda install or pip install what is needed and set this env for the project in PyCharm (or whatever IDE) settings. If I use a lot of packages, I am uploading a venv.yml export along with the git repo.

polar acorn
#

I just use anaconda works fine for me. I use conda environments as well.

lean ledge
#

@lapis sequoia Vector embeddings are just embeddings. They are represented as a vector, same as text embeddings. Also not just used for deep learning. Deep learning is just very often used for things that lower dimensionality. Autoencoders' entire point is to create embeddings and classification tasks naturally create embeddings. Also a lot of embeddings use some sort of contextual information. Context around in text is just another part of what makes the information's position. In images, it would be the larger context around some area. It's not that different, really.

lapis sequoia
#

@lapis sequoia doing a lag plot of a large dataset (some 600000ish rows) certainly took its time.. maybe I am doing this wrong. Even tried to slice out every other row to "half" the data but that still took ages.

median siren
#

@lapis sequoia, exactly as @lean ledge says. In my specific case I use a graph that I embed. A graph embedding can be seen as similar to a text embedding. The context of a word is referred to as the probability of two words occurring next to each other, or near each other, in a sentence. In a graph, the context of a node is referred to as the probability of a node being a neighbour of another node.

Then, after that, I want to predict the ranking of those nodes based on their graph features, which in my case is represented in embeddings.

silent swan
#

I'm still not sure I'm comfortable with people now using "embedding" to refer to any kind of hidden state

lean ledge
#

What do you mean by hidden state?

#

Is a vector in a latent space considered hidden?

silent swan
#

I'm referring to just any intermediate state in a neural network model

#

people have started referring to those as embeddings too

void anvil
#

@desert oar that's entirely false lol

desert oar
#

@void anvil ?

void anvil
#

There are just as many, if not more, tools for TS in python and they're a lot faster

desert oar
#

Do tell

#

Seriously it would be great to not have to call out to rpy2 for stuff like time series decomposition and arima

#

Statsmodels doesnt count

void anvil
lean ledge
#

@silent swan What's wrong with use of the term embedding for that? It's generally only the last layer or two that are called embeddings, not any random hidden layer. It creates feature vectors that are separable according to a quality that we care about (which we used to classify) so it's honestly very much an embedding.

#

It's not purposefully made to be any kind of encoder but it certainly works that way given how neural networks seem to work experimentally (I'm not sure of any theoretical papers that exist on how semantically focused information comes out in later layers when training for a classification task etc. Would be nice to read if anyone's got one)

lapis sequoia
#

I'm looking for some help with numpy's bincount function

#

and trying to use bincount to make an azimuthal average but to ignore radii where data is completely missing but still have the missing value show up at that radius in the average
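
A hedged sketch of one way to do this with `np.bincount`, assuming missing pixels are stored as NaN and radii are binned to whole pixels. The function name `azimuthal_average` and the tiny test image are invented for the example.

```python
import numpy as np


def azimuthal_average(image, center=None):
    """Mean of `image` in integer-radius rings; rings with no valid data stay NaN."""
    ny, nx = image.shape
    if center is None:
        center = ((ny - 1) / 2.0, (nx - 1) / 2.0)
    y, x = np.indices(image.shape)
    r = np.hypot(y - center[0], x - center[1]).astype(int)  # ring index per pixel

    valid = np.isfinite(image)  # ignore NaN pixels when summing and counting
    nbins = r.max() + 1
    sums = np.bincount(r[valid], weights=image[valid], minlength=nbins)
    counts = np.bincount(r[valid], minlength=nbins)

    profile = np.full(nbins, np.nan)  # start with "missing" everywhere
    filled = counts > 0
    profile[filled] = sums[filled] / counts[filled]
    return profile


# 5x5 image of ones where the whole ring at radius 1 is missing.
image = np.ones((5, 5))
yy, xx = np.indices(image.shape)
image[np.hypot(yy - 2, xx - 2).astype(int) == 1] = np.nan

profile = azimuthal_average(image)
print(profile)
```

Passing `minlength` keeps the output the same length even when the outermost rings have no valid pixels, so the missing radius still has a slot in the result.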

silent swan
#

@granite bobcatotboi one pattern that I've seen for example, is where transfer learning is performed and the output of the model being reused/transferred is referred to as an embedding

#

depending on what you refer to as classification (sounds like you're more referring to the cases of few-class classification, otherwise even generation tasks are classification over the vocabulary), I can probably pull up some NLP papers for that

#

I would contrast "embedding" with "representation", which I think is more appropriate for this usage. On one hand, representation is broad enough to basically say nothing other than "contains information", but on the other hand that's probably more appropriate, because in these cases we're just taking some intermediate transformation of the input. In contrast, I think an embedding is a more specific term that say something more about the space/transformation being applied

lapis sequoia
#

@brazen wing sry for ping but if u mind can u help me fix it?

#

tomorrow is fine too

brazen wing
#

sure

#

you want it to give you an array of size 10

#

with a 1 for whatever the right category is. but your training data provides the y value as an int from 0 to 9

#

you just need to convert that from a 1d array to a 2d array with 1's in the corresponding int index

#

this is what i did

#

@lapis sequoia
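
Whatever was attached to that message isn't preserved in the log, so this is only a sketch of the conversion described above, not necessarily what was posted. It assumes NumPy and integer labels from 0 to 9; the helper name `one_hot` is made up.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn a 1-D array of integer labels into a 2-D array with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


y = np.array([3, 0, 9])              # targets as they come out of load_digits
y_one_hot = one_hot(y, num_classes=10)

print(y_one_hot.shape)               # (3, 10)
print(y_one_hot[0])                  # all zeros except index 3
```

Keras ships `keras.utils.to_categorical` for the same job. Another option is to keep the integer targets and compile the model with the `sparse_categorical_crossentropy` loss, which expects labels in that form.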

bitter pewter
#

'sup guys!

I'm developing a code to get into a javascript environment, then I want to scrape the data from the website using BeautifulSoup. The point is that I realized that there isn't any table in the environment, so I was wondering about how can I scrape the data from the website.

Any tips?

MWE:

```python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re
import pandas as pd
from tabulate import tabulate
import os

url = "https://scon.stj.jus.br/SCON/legaplic/toc.jsp?materia=%27Lei+8.429%2F1992+%28Lei+DE+IMPROBIDADE+ADMINISTRATIVA%29%27.mat.&b=TEMA&p=true&t=&l=1&i=18&ordem=MAT,@NUM"

driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

# Selenium 3 API; Selenium 4 replaced it with driver.find_element(By.XPATH, ...)
python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div/div/div[3]/div[2]/div/div/div/div[16]/a')
python_button.click()

driver.switch_to.window(driver.window_handles[-1])

python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div[1]/div/div[3]/div[2]/div/div/div/div[3]/div[2]/span[2]/a')
python_button.click()

driver.switch_to.window(driver.window_handles[-1])

pagina_de_resultados = BeautifulSoup(driver.page_source, 'lxml')

# The results page has no <table> elements, so this list comes back empty
table = pagina_de_resultados.find_all('table')

# This is the call that raises "ValueError: No tables found"
df = pd.read_html(str(table), header=0)

datalist = [df[0]]

driver.quit()

result = pd.concat(datalist, ignore_index=True)

json_records = result.to_json(orient='records')

print(tabulate(result, headers=["Employee Name", "Job Title", "Overtime Pay", "Total Gross Pay"], tablefmt='psql'))

# os.path.join avoids "\f" in the path being read as a form-feed escape
with open(os.path.join(os.getcwd(), "fhsu_payroll_data.json"), "w") as f:
    f.write(json_records)
```
#

Actually getting the following Traceback:

```
Traceback (most recent call last):
  File "C:/Users/Pedro/PycharmProjects/data_scraping/data_scraping.py", line 29, in <module>
    df = pd.read_html(str(table), header=0)
  File "C:\Users\Pedro\PycharmProjects\data_scraping\venv\lib\site-packages\pandas\io\html.py", line 1094, in read_html
    displayed_only=displayed_only)
  File "C:\Users\Pedro\PycharmProjects\data_scraping\venv\lib\site-packages\pandas\io\html.py", line 916, in _parse
    raise_with_traceback(retained)
  File "C:\Users\Pedro\PycharmProjects\data_scraping\venv\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback
    raise exc.with_traceback(traceback)
ValueError: No tables found```
quaint ruin
#

Hi, I wish to calculate the weighted average of n data points, while lowering 2 specific data points to an arbitrary value of 1 and adjusting the other data points so that the ratio of the weights is kept as well as the weighted average. How can I achieve this?
For example I have the following data points and weights:

value1 - 1.186, weight1 - 100
value2 - 1.294, weight2 - 50
value3 - 2.157, weight3 - 200
value4 - 3.235, weight4 - 150

average - 2.2

I wish to set value1 and value2 to 1 and adjust value3 and value4 to satisfy my condition, that is, to keep the weights ratio and the weighted average.
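
The question leaves open how the remaining values should move, so the sketch below makes one explicit assumption: the weights stay as they are, the pinned values are set to 1, and every other value is multiplied by the same factor `k`, which keeps their ratio to each other and restores the original weighted average. The function names are hypothetical.

```python
values = [1.186, 1.294, 2.157, 3.235]
weights = [100, 50, 200, 150]


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def pin_and_rescale(values, weights, pinned, pinned_value=1.0):
    """Set values[i] = pinned_value for i in `pinned`, scale the rest by a common factor."""
    target_sum = sum(v * w for v, w in zip(values, weights))
    pinned_sum = sum(pinned_value * weights[i] for i in pinned)
    free_sum = sum(v * w for i, (v, w) in enumerate(zip(values, weights)) if i not in pinned)
    k = (target_sum - pinned_sum) / free_sum  # common scale factor for the free values
    return [pinned_value if i in pinned else v * k for i, v in enumerate(values)]


new_values = pin_and_rescale(values, weights, pinned={0, 1})

print(weighted_average(values, weights))      # about 2.2
print(new_values)
print(weighted_average(new_values, weights))  # same average as before
```

With these numbers the weighted sum is 1099.95, the two pinned points contribute 150, and the free points originally contribute 916.65, so `k` is 949.95 / 916.65, roughly 1.036.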

desert oar
#

@void anvil nice, thanks for that. doesn't cover everything i'd want but it's a start

#

still would be looking for seasonal decomposition

void anvil
#

That’s literally all in statsmodels

#

And a multitude of other packages

wary fox
#

Just throwing this out there because I'm interested. Does anyone know how common it is for people to pair computer science degrees and biology degrees in college, like in a double major? I have to pick my college major soon and I love both but am not sure if it's a thing that is possible.

lapis sequoia
#

@wary fox it is.. only if you're into bio statistics.. plan to go work for pharma or a clinical research organization..

#

or maybe consulting.. where you're advising those sorts of companies

wary fox
#

@lapis sequoia interesting. So its not common for field biologist to have computer science degrees?

lapis sequoia
#

what's a field biologist.. :v

wary fox
lapis sequoia
#

I'm really not sure of what other prospects biologists have.. aside from ones related to data science..

wary fox
#

Oh ok, thanks for your input though! Its much appreciated.

brazen wing
#

@bitter pewter dont use a form then? you can just select the text you want using an id or name

bitter pewter
#

Hey @brazen wing can you give me some light? Maybe a documentation link or any kind of guide... I’m kinda lost LOL newbie at Python

brazen wing
#

sure I'll try. but what data are you actually trying to get

bitter pewter
#

Have you tried the code? It redirects you to a public database of a tribunal court. That database has some judgment info. I want to get that info and copy it to a table. I’ll try to show you! Give me a sec.

#

As you can see, I want to get the info in the “Processo”, “Relator(a)” etc. fields

brazen wing
#

ill just get up the webpage

bitter pewter
#

Sure! I’ll get you the url

#

You should click on the § 6º text

#

And then click on “41 documento(s) encontrado(s)”

brazen wing
#

ah ok i see

#

well first of all i don't think you are selecting this

#

when you click in selenium

bitter pewter
#

Oh, it’s going right to where I’ve pointed

#

It works well till I try to get the data

brazen wing
#

yeah its a bunch of divs

bitter pewter
#

Yeah LOL

brazen wing
#

well the class of both the entries

#

is docTexto

#

im just getting it up in python

bitter pewter
#

Sure. I saw that we have docTexto as the class and a bunch of divs around those fields. I’m not sure there is a pattern in those divs' names.

olive pine
#

Anyone know any ML groups?

lean ledge
#

There's an AI server, a data science server, /r/learnmachinelearning server

#

Channels on every major programming discord

#

I never liked any of them much though so your mileage may vary

#

Smart ML people seem to not use discord too much

brazen wing
#

@bitter pewter ok its the first two elements of class docTexto

#

so you can use driver.find_elements_by_class_name("docTexto")

#

and that will give you a list

#

of the elements

olive pine
#

@lean ledge I am a beginner.

bitter pewter
#

I’ll use it with append?

brazen wing
#

Well you will need to get the text from them.

#

so it will be like

#

textList = driver.find_elements_by_class_name("docTexto")

lean ledge
#

Other beginners are not who you want to talk to as a beginner

brazen wing
#

and then textList[0].text will give you a string of Processo field

olive pine
#

Yes, I am looking for a group that knows what they are doing.

bitter pewter
#

I see! 0-X will be the position of the field I want

brazen wing
#

yeah thats right

bitter pewter
#

Nice! How do I get those texts and make a table from them?

#

I want to get it as a spreadsheet

#

table = textList ?

brazen wing
#

well you will probably need to construct it with pandas

bitter pewter
#

I see!

#

Np, I will study about pandas later

#

As you can see, I’ve kinda copied the code I’m using

#

I’m trying to understand it bit by bit

brazen wing
#

yes it did seem that way haha. an ambitious start

#

pandas is great though, so learning it will be useful

bitter pewter
#

I’m trying to use it as my undergraduate thesis in Law

#

🤣

#

I’ll make an analysis of a Court’s decisions group

#

If it’s ok, I’ve added you in my friends list. I won’t bother you 🤣

brazen wing
#

yeah no problem. you can probably just message me in here anyway

#

but yeah if you just do a simple tutorial on pandas, it should be obvious how to make a table of strings then

bitter pewter
#

Sure!

#

Thanks for your help, Bozo

#

You amazing!

brazen wing
#

no prob dude

proper sierra
earnest prawn
#

@proper sierra looks a little inspired by the SPIEGEL mining talk at 34C3 right?

proper sierra
lapis sequoia
#

hey this is pretty cool

strong flare
#

may i know the first pic is showing or keep loading ?

worldly ruin
#

just keeps loading

strong flare
brazen wing
#

@strong flare you're using pandas right?

strong flare
#

yes

#

@brazen wing with jupyter notebook

#

do you need the dataset in excel ?

brazen wing
#

that just means you will have it in a csv

#

or you should have it in a csv

#

you can just import from the csv with pandas, and then you will need to create a new dataframe with the headings you want

strong flare
brazen wing
#

right. do you know how to construct a dataframe?

strong flare
#

i m learning right now but i dont know why show out this error
ValueError: x and y must have same first dimension, but have shapes (3174,) and (1509, 4)

#

can you help me in code xd

brazen wing
#

well I would do a tutorial on pandas first to understand the basics

#

it will make it easier to understand

strong flare
#

ok you can use my dataset to do the tutorial

brazen wing
#

i mean look one up in google

strong flare
#

also can xd

#

i tried to search alot but not working

lapis sequoia
#

thanks bozo

desert oar
#

man

#

both pandas and numpy docs

#

they really lack a "basic concepts" section

#

it all jumps right into technical stuff

#

the world needs a good coherent intro to this stuff

lyric canopy
desert oar
#

thats better than i remember

#

huh you know thats pretty good

#

it covers broadcasting rules

lyric canopy
desert oar
#

10 minutes to pandas was the one i was thinking of

#

i remember when i started w/ pandas coming from R, i read 10 minutes to pandas and had absolutely no idea what was going on

#

selection by label? what?

#

the numpy one is serviceable because it starts by describing the data model

#

pandas expects you just figure out that theres this Index data structure, and that a Dataframe is logically a collection of Series, et al

lyric canopy
#

I think the main problem in the pandas one is that they omit half of the example

#

Show the df with labels and then the result of a selection

desert oar
#

ugh yes. they reuse the same example from the beginning and expect you to keep track

lyric canopy
#

That way, you can see what's happening and learn the terminology

#

Yeah, it's one of the basic writing principles we were taught in our scientific writing course: Don't rely on your readers remembering things read on page 1 when they're reading page 7

desert oar
#

good principle

wide gyro
#

Hey guys, I want to make my clean function more accessible and wanted to know if drop.na(subset='COLUMNS') could be set to where it reads every column no matter what file you put through it

#

So if I have a csv of 15 columns, it'll sort through them, but then sort through another csv with 5

desert oar
#

@wide gyro just leave off subset=

#

if you mean dropna()

wide gyro
#

so just dropna and that's it

desert oar
#

yes

wide gyro
#

well with ()

desert oar
#

but in general you can get all the columns from a dataframe with df.columns
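
A quick illustration of the answer above, assuming pandas and NumPy. The `clean` function and the two sample frames are invented; the point is that `dropna()` without `subset=` checks whatever columns the frame happens to have.

```python
import numpy as np
import pandas as pd


def clean(df):
    """Drop every row that has a missing value in any column."""
    return df.dropna()


# Two frames with different column counts go through the same function.
wide = pd.DataFrame({"a": [1, np.nan, 3], "b": [4, 5, np.nan], "c": [7, 8, 9]})
narrow = pd.DataFrame({"x": [1.0, np.nan]})

print(clean(wide))
print(clean(narrow))

# Spelling the subset out gives the same result as leaving it off.
print(wide.dropna(subset=list(wide.columns)).equals(clean(wide)))
```

`subset=` only becomes useful when some columns are allowed to contain missing values and others are not.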

bitter pewter
#

Here's a good guide on Pandas dataframes

#

Ignore the "from Silicon Valley" crap marketing lol

bitter pewter
#

Hey, guys! I'm trying to make the code to read 5 different pages content and gather it as a table in Pandas. How do I make the selenium to change the pages and read the content?

#

MWE:

#
```python
from selenium import webdriver
from bs4 import BeautifulSoup
from openpyxl import Workbook
import numpy as np
import pandas as pd

url = "https://scon.stj.jus.br/SCON/legaplic/toc.jsp?materia=%27Lei+8.429%2F1992+%28Lei+DE+IMPROBIDADE+ADMINISTRATIVA%29%27.mat.&b=TEMA&p=true&t=&l=1&i=18&ordem=MAT,@NUM"

driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div/div/div[3]/div[2]/div/div/div/div[16]/a') # Selects the legal provision (art. 17, § 6)
python_button.click()

driver.switch_to.window(driver.window_handles[-1])

python_button = driver.find_element_by_xpath('/html/body/div[2]/div[6]/div[1]/div/div[3]/div[2]/div/div/div/div[3]/div[2]/span[2]/a') # Selects the search results
python_button.click()

driver.switch_to.window(driver.window_handles[-1])

textList = driver.find_elements_by_class_name("docTexto") # Pulls the data from the fields of the results list

resultados = BeautifulSoup(driver.page_source, 'lxml')

parse = resultados.find('div', {'id':'listadocumentos'})
paragrafoBRS = parse.find_all('div',{'class':'paragrafoBRS'})

header = []
content = []
for each in paragrafoBRS:
    header.append(each.find('h4', {'class':'docTitulo'}).text.strip())
    content.append(each.find(['div','pre'], {'class':'docTexto'}).text.strip())

# Build the one-row table once, after every field has been collected
df = pd.DataFrame([content], columns = header)

df.to_excel('dados.xlsx') # Exports the data to an .xlsx file

driver.quit()
```
#

It already successfully exports the data to a .xlsx spreadsheet

wide gyro
#

I'm trying to output a coordinates map but my lat and lon are switched and I think that's the reason for making my map behind it super small

#

as in my lat is on x axis and lon is y axis

#

I am setting geometry = [Point(xy) for xy in zip(df["lng"],df["lat"])]

#

Which I thought would do the trick, but the latitude is still x axis

#

Nevermind

#

I think I got it

desert oar
#

it depends on how you defined your Point class

lapis sequoia
#

Also check out namedtuples
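
A small sketch of the namedtuple idea, assuming a hand-rolled `Point` rather than a library class. The coordinate values are made up; the point is only that naming the fields makes it explicit that x is longitude and y is latitude.

```python
from collections import namedtuple

# Hypothetical Point type: x holds longitude, y holds latitude.
Point = namedtuple("Point", ["x", "y"])

lng = [-46.63, -43.17]
lat = [-23.55, -22.91]

# zip(lng, lat) yields (longitude, latitude) pairs, which match (x, y).
# A namedtuple takes separate arguments, so each pair is unpacked with *.
geometry = [Point(*xy) for xy in zip(lng, lat)]

print(geometry[0].x, geometry[0].y)
```

With named fields, a swapped axis shows up as soon as a point is printed, instead of only on the finished map.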

strong flare
#
df = pd.read_excel(r'data\mcd vs kfc eg.xlsx')
KFC = df['Restaurants'] == 'KFC'
KFC_Amount = df['Amount'] > 0
KFC_Data = df[KFC & KFC_Amount]

MCD = df['Restaurants'] == 'MCD'
MCD_Amount = df['Amount'] > 0
MCD_Data = df[MCD & MCD_Amount]

KFC_Data = KFC_Data.groupby(['Date']).sum()
MCD_Data = MCD_Data.groupby(['Date']).sum()

Hi guys, may I know how I can join these two results (KFC_Data, MCD_Data) in one table with specific column names?
#

Do I have to convert these two results into a new dataframe and join them together?

onyx granite
#

you need a key to join on

#

or use the default index (if the keys actually match)

strong flare
#

@onyx granite can it be the date? It's a column in the dataset

onyx granite
#

there should be a way to specify an index to be a datetime

#

check the docs and you should be able to join based on that
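
One way this could look, as a sketch rather than the only option: after `groupby('Date')` both results are already indexed by date, so they can be lined up on that index. The sample rows and the output column names `KFC_Sales` and `MCD_Sales` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data using the column names from the snippet above.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-03-01", "2019-04-01", "2019-04-01"]),
    "Restaurants": ["KFC", "MCD", "KFC", "MCD"],
    "Amount": [100, 80, 120, 90],
})

positive = df["Amount"] > 0
kfc = df[(df["Restaurants"] == "KFC") & positive].groupby("Date")["Amount"].sum()
mcd = df[(df["Restaurants"] == "MCD") & positive].groupby("Date")["Amount"].sum()

# Both Series share the Date index, so concat aligns them row by row.
# The dict keys become the column names of the combined table.
combined = pd.concat({"KFC_Sales": kfc, "MCD_Sales": mcd}, axis=1)
print(combined)
```

A date that appears for only one restaurant ends up as NaN in the other column, since `concat` keeps the union of the two indexes by default.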

lapis sequoia
#

hi

#

i'm looking for a good plot

#

I have some groups names in a column and their correlation score in a column

#

what sort of plot would look good

strong flare
#
import pandas as pd 
import matplotlib.pyplot as plt

df = pd.read_excel(r'Data\mcd vs kfc eg11.xlsx')

KFC = df['Restaurants'] == 'KFC'
KFC_Amount = df['Amount'] > 0
# rename() returns a new DataFrame, which avoids modifying a slice of df in place
KFC_Data = df[KFC & KFC_Amount].rename(columns={
    'Amount': 'KFC_Sales',
    'Count': 'KFC_Count',
})
# groupby('Date') already makes Date the index of the result
KFC = KFC_Data.groupby('Date').sum()

MCD = df['Restaurants'] == 'MCD'
MCD_Amount = df['Amount'] > 0
MCD_Data = df[MCD & MCD_Amount].rename(columns={
    'Amount': 'MCD_Sales',
    'Count': 'MCD_Count',
})
MCD = MCD_Data.groupby('Date').sum()

Combined = pd.concat([KFC, MCD], axis=1)

# Plot with differently-colored markers.
plt.plot(Combined.index, Combined.KFC_Sales, 'b-', label='KFC_Sales')
plt.plot(Combined.index, Combined.MCD_Sales, 'g-', label='MCD_Sales')
plt.plot(Combined.index, Combined.KFC_Count, 'r-', label='KFC_COUNT')
plt.plot(Combined.index, Combined.MCD_Count, 'y-', label='MCD_COUNT')

# Create legend.
plt.legend(loc='lower right')
plt.xlabel('Date')
plt.ylabel('KFC VS MCD')
plt.show()

Finally success yoj

#

but it still shows an error

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
To register the converters:
>>> from pandas.plotting import register_matplotlib_converters
>>> register_matplotlib_converters()
warnings.warn(msg, FutureWarning)

lapis sequoia
#

that's just a warning
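
The output is harmless here, but the `SettingWithCopyWarning` part can usually be avoided. A sketch of one way, assuming the warning comes from `rename(..., inplace=True)` on a filtered frame; the sample rows are made up. The second half of the pasted output is a separate `FutureWarning` that already says how to silence it.

```python
import pandas as pd

# Hypothetical rows using the column names from the snippet above.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-03-01", "2019-04-01", "2019-03-01"]),
    "Restaurants": ["KFC", "KFC", "MCD"],
    "Amount": [100, 120, 80],
    "Count": [10, 12, 8],
})

mask = (df["Restaurants"] == "KFC") & (df["Amount"] > 0)

# rename() without inplace=True returns a new DataFrame, so nothing is
# written into a slice of df and pandas has nothing to warn about.
kfc_data = df[mask].rename(columns={"Amount": "KFC_Sales", "Count": "KFC_Count"})

kfc = kfc_data.groupby("Date")[["KFC_Sales", "KFC_Count"]].sum()
print(kfc)
```

Taking an explicit copy with `df[mask].copy()` before modifying it in place is another common way to get the same effect.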

strong flare
#

@lapis sequoia thanks i will just ignore

vestal lagoon
#

Does anyone have any good suggestions for economic forecasting libraries on python

#

I'm using ARIMA for time series but I feel it's lacking complexity for economics and I don't know how to fix that lol

strong flare
#

Hi all pros, may I know how I can select by month on a date column?
For example, I need to select March from the dates listed below. How can I get it? The dtype is ('<M8[ns]')

2019-03-01
2019-04-01    
2019-05-01
polar acorn
#

@vestal lagoon You can check out facebook's prophet library. It's a bit more advanced but still quite intuitive. Time series forecasting is hard though and simple models often perform as well as more complex models so be aware that there might not be a magic bullet here.

#

@strong flare You can find the month of a datetime variable date as follows: date.month. If you have your dates as a column in a pandas data frame called df, you can get the March subset of the dataframe like this: df.loc[df['Dates'].dt.month == 3]
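
A short sketch of the month filter, assuming the column is called `Dates` as in the message above and already has a datetime dtype. For a whole column the month is reached through the `.dt` accessor, while a single timestamp has `.month` directly. The `Amount` column is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Dates": pd.to_datetime(["2019-03-01", "2019-04-01", "2019-05-01"]),
    "Amount": [10, 20, 30],
})

# .dt exposes datetime attributes for every row of the column at once.
march = df.loc[df["Dates"].dt.month == 3]
print(march)

# A single timestamp carries the attribute itself.
first = df["Dates"].iloc[0]
print(first.month)
```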

strong flare
#

@polar acorn ok got it thank you

spark nimbus
#

So I found this post with the exact same question I have, and the answer is very clear to me. However, I'm having a lot of issues implementing this, can anyone give me some directions on what I should and shouldn't do?
https://dsp.stackexchange.com/questions/24017/which-filter-for-an-audio-equalizer?rq=1

#

also @summer plover could the admins discuss the possibility of opening up a channel specifically for audio/video/other signal processing?

summer plover
#

that is something we can talk about. could you make the same request in #community-meta as well?

spark nimbus
#

sure

summer plover
#

โค

lean ledge
#

@spark nimbus hmu with your signal processing questions

#

putting them mechatronic engineering skills to work

#

they have specific terms btw, not frequency domain vs time domain

spark nimbus
#

okay so tl;dr I want to add an EQ for users to configure simply with sliders (similar to how V4A works if you've ever used it)

lean ledge
#

IIR and FIR

#

IIR will probably be better

#

You can treat them as multiple bandpass butterworth filters for which you change the gain

spark nimbus
#

I what

#

note that I have practically zero knowledge of such terms

#

I know IIR and FIR and thats it

lean ledge
#

A band-pass filter, also bandpass filter or BPF, is a device that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range.

The Butterworth filter is a type of signal processing filter designed to have a frequency response as flat as possible in the passband. It is also referred to as a maximally flat magnitude filter. It was first described in 1930 by the British engineer and physicist Stephen Bu...

spark nimbus
lean ledge
#

Just make arbitrary cutoffs (take the entire 20-20khz range and split it up (possibly in log scale?)) and make a bandpass filter for each range
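
The split into ranges could be sketched like this, assuming log-spaced edges and ten bands. The `band_edges` helper and its defaults are hypothetical.

```python
import numpy as np


def band_edges(n_bands, f_low=20.0, f_high=20000.0):
    """Return n_bands + 1 logarithmically spaced edge frequencies in Hz."""
    return np.geomspace(f_low, f_high, n_bands + 1)


edges = band_edges(10)

# Consecutive pairs of edges give the (low, high) cutoffs of each band.
bands = list(zip(edges[:-1], edges[1:]))
for low, high in bands:
    print(f"{low:8.1f} Hz to {high:8.1f} Hz")
```

Log spacing gives every band the same width in octaves, which is closer to how pitch is heard than equal widths in Hz.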

spark nimbus
#

I couldn't find any code that does this at all sadly

#

any recommendation on how to do that?

#

also, how would I make a bandpass filter?

#

and do I pass it through all of them sequentially or in parallel and then sum them?

lean ledge
#

Bandpass filter is essentially a low pass and a high pass together

#

And parallel and then sum
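
A rough sketch of "parallel and then sum" with SciPy, taking one slider value per band as a plain list of floats in dB. The `equalize` function, the band count, the filter order and the sample rate are all assumptions, and this is not how pyEQ does it. Summing Butterworth band-passes does not rebuild the input exactly when every gain is 0 dB, so treat it as a starting point.

```python
import numpy as np
from scipy import signal


def equalize(samples, gains_db, fs=44100, f_low=20.0, f_high=20000.0, order=2):
    """Apply one gain per log-spaced band and sum the band outputs."""
    samples = np.asarray(samples, dtype=float)
    edges = np.geomspace(f_low, f_high, len(gains_db) + 1)
    out = np.zeros_like(samples)
    for low, high, gain_db in zip(edges[:-1], edges[1:], gains_db):
        # Butterworth band-pass for this band, as second-order sections.
        sos = signal.butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        band = signal.sosfilt(sos, samples)
        # Convert the slider value from dB to a linear factor, then accumulate.
        out += 10 ** (gain_db / 20) * band
    return out


fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz test tone

# With five bands, 1 kHz falls in the third one (about 317 Hz to 1262 Hz).
kept = equalize(tone, [-60, -60, 0, -60, -60], fs=fs)
muted = equalize(tone, [-60, -60, -60, -60, -60], fs=fs)
print(np.sqrt(np.mean(kept ** 2)), np.sqrt(np.mean(muted ** 2)))
```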

spark nimbus
#

what's a low pass/high pass filter?

lean ledge
#

parametric equalizer in python

#

should be able to pick up a few things

paper niche
#

Anyone familiar with pyspark? i have a spark dataframe containing a column of arrays (A) and another column of structs (B). The array elements themselves are structs of the same structure as the dataframe struct column (B), what's the correct way to append B to arrays in A, row-wise?

spark nimbus
#

I checked that one but it didn't seem like it worked the same nor was the code very readably structured

#

and it also doesnt specify the width

lean ledge
#

looking at the code, there's 5 filters and each is initialised with its parameters already made

spark nimbus
#

I also tried checking the pulseeffects source but it's also pretty difficult to understand their implementation

lean ledge
#

https://github.com/twoz/pyEQ/blob/master/filters.py
defines all filter types, in particular as elliptic filters for the band-pass and high pass filters and butterworth filter for low pass. it uses scipy as a backend so it shouldnt be hard to figure out what each parameter is for with a little bit of documentation reading

spark nimbus
#

so I wanted the bandpass filter right

lean ledge
#

multiple band pass filters and then a low and high pass

#

for either end

spark nimbus
#

is that the HPButter one?

lean ledge
#

HPButter -> butterworth high pass

spark nimbus
#

uhh

lean ledge
spark nimbus
#

and now in a way I understand? ^^'

lean ledge
#

This gives basic background

spark nimbus
#

I've been through this one before

#

doesn't mean I understood half of it >~>

lean ledge
#

high pass blocks low frequencies (high ones pass), low pass blocks high frequencies (low ones pass), bandpass blocks both low and high except in a band that passes
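
The three behaviours can be checked numerically. This sketch designs one Butterworth filter of each kind with SciPy and prints its gain at a low, a middle and a high frequency. The cutoffs, the order and the sample rate are arbitrary assumptions, and `gain_at` is a hypothetical helper.

```python
import numpy as np
from scipy import signal

fs = 44100  # assumed sample rate in Hz

lowpass = signal.butter(4, 500, btype="lowpass", fs=fs, output="sos")
highpass = signal.butter(4, 2000, btype="highpass", fs=fs, output="sos")
bandpass = signal.butter(4, [500, 2000], btype="bandpass", fs=fs, output="sos")


def gain_at(sos, freq_hz):
    """Magnitude of the filter's frequency response at one frequency."""
    _, h = signal.sosfreqz(sos, worN=np.array([freq_hz]), fs=fs)
    return abs(h[0])


# A gain near 1 means the frequency passes, a gain near 0 means it is blocked.
for name, sos in [("lowpass", lowpass), ("highpass", highpass), ("bandpass", bandpass)]:
    gains = [round(float(gain_at(sos, f)), 3) for f in (100, 1000, 8000)]
    print(name, gains)
```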

#

butterworth, chebyshev and elliptic are types of filters

spark nimbus
#

so to make a bandpass I just add up a low pass and high pass?

#

wont that make a giant spike tho

lean ledge
#

giant spike?

spark nimbus
lean ledge
#

\omega is angular frequency, |G| is the magnitude of gain

spark nimbus
#

this is adding up both

lean ledge
#

Where'd you get those graphs from?

spark nimbus
#

it's an approximation I made in desmos using just log_e(x) and e^x

#

both converging to the same Y

lean ledge
#
  1. it's not what filter responses look like generally
  2. you dont add them up in the + sense of the word, you compose them. one after another. that's a convolution in the time domain, multiplication in s domain
spark nimbus
#

a what

lean ledge
#
  3. you have control over the cutoff frequencies. you can shift them left and right
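
Composing rather than adding can be sketched with SciPy's second-order sections: stacking a high-pass and a low-pass runs them one after another, and the combined response is the product of the two. The cutoffs, the order and the sample rate are assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal

fs = 44100  # assumed sample rate in Hz
highpass = signal.butter(2, 500, btype="highpass", fs=fs, output="sos")
lowpass = signal.butter(2, 2000, btype="lowpass", fs=fs, output="sos")

# Stacking second-order sections applies them in series (a cascade).
cascade = np.vstack([highpass, lowpass])

freqs = np.array([50.0, 1000.0, 10000.0])
_, h_hp = signal.sosfreqz(highpass, worN=freqs, fs=fs)
_, h_lp = signal.sosfreqz(lowpass, worN=freqs, fs=fs)
_, h_bp = signal.sosfreqz(cascade, worN=freqs, fs=fs)

# The cascade's response is the product of the two responses, not their sum,
# so only the band between the two cutoffs keeps a gain near 1.
print(np.abs(h_bp))
print(np.abs(h_hp * h_lp))
```

Moving either cutoff shifts the corresponding edge of the band left or right.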
spark nimbus
#

whats a time domain and s domain and convolution

lean ledge
#

f(t) is your signal
f(\omega) is your signal in frequency domain (after fourier transform)
f(s) is your signal in complex frequency domain (after laplace transform)
convolution is a mathematical operation involving shifting, multiplying and integrating two functions. according to the convolution theorem, a convolution in one domain (time or frequency) is a multiplication in the other domain
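
For discrete signals the same relationship can be checked with NumPy alone. In this sketch `x` stands in for a signal and `h` for a filter's impulse response; both are random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)  # placeholder signal
h = rng.standard_normal(16)  # placeholder impulse response

# Time domain: direct convolution.
direct = np.convolve(x, h)

# Frequency domain: multiply the transforms, then transform back.
# Zero-padding to len(x) + len(h) - 1 makes the FFT's circular
# convolution match the linear one.
n = len(x) + len(h) - 1
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

print(np.allclose(direct, via_fft))
```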

#

Wish I could dumb it all down but it's taken me 2 semesters of classes in my engineering degree to get the hang of it all, it's not an easily approachable subject for a person without an electrical engineering background and maths is the field's bread and butter

spark nimbus
#

It's mostly just terminology I need to get used to

#

If I see code I practically understand complex concepts immediately

lean ledge
#

Code doesnt help you learn a semester's worth of signal theory that underpins that code

#

My suggestion would be to just reuse someone else's code

spark nimbus
#

But there's not even any code compatible with my project :(

lean ledge
#

What's wrong with the pyEQ one?

#

it just uses scipy/numpy

spark nimbus
#

The design is really hard to work with given my current framework

#

Just figuring out how to make it work without GUI is difficult enough already

#

Then I need to make it work from just a list of floats

silk forge
#

i need some help with this ^

#

i cant understand

desert oar
#

Which part don't you understand