#data-science-and-ml

olive willow
#

then go to 4:21

stoic beacon
#

Yep I'm there

olive willow
#

you see where I and J hat moved

#

right?

stoic beacon
#

Yeah

#

Oh I see

olive willow
#

so vector V = -1I + 2J

stoic beacon
#

It's -1 * [1, -2]

olive willow
#

yes

stoic beacon
#

Alrighty I got it

olive willow
#

and then 2 *

#

that is what you call a linear combination

#

here's the formula

stoic beacon
#

Got it. That's what I thought. Just need to watch the next video now

olive willow
#

aV + bA

#

a and b are scalars

#

V and A are vectors to be more precise I and J hat for example
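
A minimal sketch of a linear combination in NumPy, assuming the standard basis vectors I hat = [1, 0] and J hat = [0, 1]. The names `i_hat`, `j_hat`, and `linear_combination` are only illustrative and do not come from the video:

```python
import numpy as np

# Standard basis vectors (an assumption; after a transformation these would
# be replaced by wherever I hat and J hat land).
i_hat = np.array([1, 0])
j_hat = np.array([0, 1])


def linear_combination(a, v, b, w):
    """Return a*v + b*w for scalars a, b and vectors v, w."""
    return a * v + b * w


# v = -1*I + 2*J
v = linear_combination(-1, i_hat, 2, j_hat)
print(v)  # [-1  2]
```

Swapping in different vectors for `i_hat` and `j_hat` gives the transformed version of the same vector, since the scalars stay the same.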

#

you know what the coordinates are of I and J hat

#

nwm

#

do you know how to solve a linear equation ?

stoic beacon
#

Not sure

olive willow
#

ok so this is the equation:

#
2x - y = 0
-x + 2y = 3
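
One way to check a system like this is to write it as a matrix equation and let NumPy solve it. This is only a sketch; the names `A`, `b`, and `solution` are chosen here for illustration:

```python
import numpy as np

# Coefficient matrix and right-hand side for
#    2x -  y = 0
#   -x + 2y = 3
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.array([0.0, 3.0])

# Solve A @ [x, y] = b
solution = np.linalg.solve(A, b)
x, y = solution
print(x, y)

# Plugging the solution back in should reproduce the right-hand side.
print(np.allclose(A @ solution, b))
```

Doing the elimination on paper gives x = 1 and y = 2, which is what the solver should return up to floating-point rounding.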
stoic beacon
#

Oh yeah duh

olive willow
stoic beacon
#

I'll look at some other courses

#

Thanks

olive willow
#

here you see how to use matrices and how to solve equations up to R^n

#

yh sure but in my opinion this is the best one. I'm 14 and the guys can explain it so that even I know exactly what's up

sand reef
#

What's up?

stoic beacon
#

But it's MIT. I'm just concerned it'll be over my jead

#

Head

olive willow
#

nope dude at least not the first course

#

I understood it fully

sand reef
#

Anybody can learn through MIT

#

Even the advanced courses are made easier.

olive willow
#

yup

stoic beacon
#

Is this course on edx or something?

sand reef
#

Come to India. Unless you're already in India, then you'll see what's bad explanation. They don't even want to teach and they are assigned as teachers.

olive willow
#

hahahha lol

#

yt dude

sand reef
#

Okay then. Habe phun.

stoic beacon
#

It's not on edx or Coursera?

olive willow
#

idk but I found it on YT

stoic beacon
#

That works

#

Wish I had your initiative at your age lol

olive willow
#

hhahaha

#

like people at my age in my school are just partying and stuff, me be like: I just like math and programming

desert oar
#

i can't recommend strang enough

#

it's an excellent course for getting comfortable with matrices of real numbers

olive willow
#

I'm using numpy to get familiar with matrices and also solving linear equations

desert oar
#

its good to do it on paper too

olive willow
#

yh

#

I'm doing it on papaer

#

paper

#

not numpy but linear combies and equations

#

and also on my whiteboard

desert oar
#

thats good

olive willow
#

yup , after linear algebra, comes calc I think right?

#

for ML

stoic beacon
#

@desert oar strange?

#

Strange*

#

Fucking autocorrect

#

You know what I mean

olive willow
#

can someone give me an example of a 1d,2d,3d array?

misty sonnet
#

@olive willow 1d is just a single "dimension" of a python list for example

#

So []

olive willow
#

yh

misty sonnet
#

2 dimensions means a list inside a list

olive willow
#

then 2d is a list in a list

#

yh

misty sonnet
#

So [[], []]

olive willow
#

can there be one list

#

[[]]

#

or npt

#

not

misty sonnet
#

Emm... I think so.

#

But that's pretty useless

#

And you already know what 3D is now.

olive willow
#

but how would you represent a matrix?

#

like this?

#
[[4 6]
 [0 9]]
#

and so this is a 2d array of numbers otherwise known as a matrix right?

misty sonnet
#

Well, you should likely use numpy for this

olive willow
#

I did

#

😃

misty sonnet
#

Ah ok great

olive willow
#

but I get this

misty sonnet
#

Yes. a matrix is a 2d array

olive willow
#
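import numpy as np
import matplotlib.pyplot as plt

# 2x2 matrix stored as a 2D NumPy array
Array = np.array([[4, 6], [0, 9]])
# np.matrix still works, but NumPy's docs recommend plain arrays instead
Array2 = np.matrix(Array)
print(np.ndim(Array))  # number of dimensions
print(Array2)

# Uses row 0 as x-values and row 1 as y-values: points (4, 0) and (6, 9)
plt.plot(Array[0], Array[1])
plt.xlim(0, 10)
plt.ylim(0, 10)
plt.show()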
misty sonnet
#

But you need to make sure it's laid out correctly

olive willow
#
2
[[4 6]
 [0 9]]
misty sonnet
#

Yes. that's right

#

You have 2 prints there

#

What's wrong with it?

olive willow
#

this is a 3d tensor

#

right?

misty sonnet
#

No.

#

that's a 3 by 2 matrix

olive willow
#

oohhh

#

what's a tensor then?

misty sonnet
#

3D

#

So another layer of lists

olive willow
#

so that is still 2d

misty sonnet
#

Yep

#

There's only 2 layers of lists

olive willow
#

[ [ [4, 6], [3, 8] ], [ [2, 6], [1, 6] ] ]

#

this is 3 d

#

a 4 by 2 tensor right?

misty sonnet
#

It's a...

#

Em...

#

Idek

#

It won't be x by y

#

It'll be x by y by z

olive willow
#

so 3d

misty sonnet
#

I think that's 2 by 2 by 2?

#

Yes.
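
A quick way to settle questions like this is to ask NumPy for `ndim` and `shape` directly. The sketch below reuses the numbers from the messages above; the variable names are made up for illustration:

```python
import numpy as np

vector = np.array([4, 6])                      # 1D: one level of brackets
one_row = np.array([[4, 6]])                   # 2D with a single inner list
matrix = np.array([[4, 6], [0, 9]])            # 2D: a list of lists
tensor = np.array([[[4, 6], [3, 8]],
                   [[2, 6], [1, 6]]])          # 3D: lists inside lists inside a list

for name, arr in [("vector", vector), ("one_row", one_row),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)
```

For the nested list above, `shape` comes out as `(2, 2, 2)`: two blocks, each with two rows of two numbers. The number of entries in `shape` is the number of dimensions.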

olive willow
#

nwm, I will learn how to write vectors and matrixes. this sht is hard

#

in programming. IRL it isn't that hard to do but in programming it's impossible

misty sonnet
#

Haha

olive willow
#

I'm 14 so I think I've some time hahahahah

misty sonnet
#

Matrices are hard. Nevermind tensors

olive willow
#

yup

misty sonnet
#

So pick your battles

#

Definitely dude. Keep going tho;

olive willow
#

then I choose vectors

#

hahah

misty sonnet
#

It's impressive, keep up the enthusiasm

olive willow
#

yh I know hardcore road to ML

#

sure dude thanks!

#

and let's not forget about indexing a matrix, not even talking about a tensor

stoic beacon
#

So what in machine learning is represented by a 2d matrix?

#

Cuz I'm studying linear algebra and seeing this in Numpy I'm trying to apply it here

earnest prawn
#

well every piece of data that requires more than one dimension like for example images have to be stored as 2d (or if you have rgb ones even 3d) data structures

#

and of course every transformation you apply to that piece of data also has to operate on those matrices
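
As a rough illustration of that point, here is how images look as NumPy arrays. The sizes are hypothetical (28x28 grayscale, 32x32 RGB) and the arrays are blank placeholders rather than real images:

```python
import numpy as np

# Hypothetical sizes: a 28x28 grayscale image and a 32x32 RGB image.
gray = np.zeros((28, 28), dtype=np.uint8)       # height x width
rgb = np.zeros((32, 32, 3), dtype=np.uint8)     # height x width x channels

# A transformation is just arithmetic on the array, e.g. inverting brightness.
inverted = 255 - gray

# Many simple models expect one flat feature vector per image.
flat = gray.reshape(-1)
print(gray.shape, rgb.shape, flat.shape)  # (28, 28) (32, 32, 3) (784,)
```

A whole dataset of grayscale images then becomes a 3D array (image index, height, width), and a flattened dataset becomes a 2D matrix with one row per image.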

#

@stoic beacon

stoic beacon
#

Thanks man

#

I can't decide if I want to use MLB stats to predict game outcomes or try the MNIST database. Both would be a NN

#

But if I go about looking for the MLB dataset, I wouldn't know what to look for or how to format the data. I know Kaggle exists but once I find a dataset there I get confused on what would be used for inputs or how to even decide what gets used for inputs

median siren
#

Hi all, I've posted a similar question in the r/LearnMachineLearning discord, but hopefully someone here can help me out, too.

I'm trying to follow the following tutorial:

http://linanqiu.github.io/2015/10/07/word2vec-sentiment/

However, when I try to replace the vectors in numpy.zeros with my own embeddings, I get the following error:

ValueError: setting an array element with a sequence.

Does anyone have any experience with this and / or how to solve this?

reef bone
#

The error is quite explicit, it tells you that you are trying to set an element of a numpy array to a sequence, rather than a single value (scalar)

#

It's hard to help more without seeing the code

lean ledge
#

@stoic beacon MIT people learn the same stuff people from other universities do. They're not gods

#

I use online courses for MIT or similar for the majority of the stuff

#

The resources are just better

reef bone
#

(I would assume that the error comes from you trying to set an element to a list or another array that holds the embeddings for a certain word)
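
Without the original code this is only a guess, but the assumption above can be reproduced in a few lines. `embedding_dim`, `word_vec`, `flat`, and `table` are made-up names for the sketch:

```python
import numpy as np

embedding_dim = 4                     # hypothetical embedding size
word_vec = np.ones(embedding_dim)     # stand-in for one word's embedding

# Reproduces the error: each slot of a 1D float array holds one scalar,
# so assigning a whole vector to a single slot fails.
flat = np.zeros(3)
try:
    flat[0] = word_vec
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One way around it: give the array a second axis so each row holds a vector.
table = np.zeros((3, embedding_dim))
table[0] = word_vec
print(table.shape)  # (3, 4)
```

If the real code looks like this, checking that the second dimension of the zeros array matches the embedding length is the first thing to try.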

stoic beacon
#

Fair enough man. I always just assume MIT, Harvard, and the like are all harder since yaknow, they're for like...smart people and whatnot

#

Usually

#

Unless you have money but we won't go there

lean ledge
#

They might go slightly more in depth but that depth is good and still accessible. Where they really shine is specialised highest level (fourth year and grad) courses that both exist and are well done

#

Eg. MIT's underactuated robotics or Stanford's CNNs for deep CV courses are not something that are easy to find elsewhere

frigid jacinth
#

hello.... so a weird idea just popped up in my mind and I don't know which AI/machine learning libraries I need if I want to achieve the following scenario...

Scenario 1 :
(The AI knows nothing)
(The User's facebook account is public and has set the birthday)
AI:hi
User:I am John
AI:Hello, John
User:My facebook account is XXX
(the AI will now know the user's facebook)
AI: OK
User:How old am I?
(Then the AI will go to his facebook and search for the data)
AI: (give the answer)

Scenario 2 :
(The AI knows user is john now)
AI:Hello, John
User:What is the result of the barcelona vs liverpool on 8/5?
(The AI will now go search in google)
AI: Liverpool won and it is 4:0 (something like this)

Sorry...I know this is kind of confusing...and thank you for trying to help me out...

desert oar
#

i think thats basically what siri does

frigid jacinth
#

Oh that's right..never thought of this before lol

#

thank you

stoic beacon
#

So Keras comes with the MNIST dataset but if it didn't how would you load that?

#

Since it's images

#

Also, side note: is TensorFlow hard to learn?

stoic beacon
#

Also also, since a vector typically represents magnitude and direction how do vectors relate to machine learning?

#

I assume they're not talking about the same thing

sand reef
#

In ML they are basically column only matrices. Vectors are just one column of values in ML.

#

Why are they called vectors? Because when you represent a vector in n dimensions, you can write it as:
ai + bj + ck +...

#

So, you can instead of writing i, j, k,...
Write as a column matrix

#

With [[a] [b] [c]...]
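
The two notations can be compared side by side in NumPy. This is a small sketch with arbitrary example values for a, b, and c:

```python
import numpy as np

a, b, c = 2.0, -1.0, 3.0   # arbitrary example components

# Basis-vector form: a*i + b*j + c*k
i, j, k = np.eye(3)        # rows of the identity matrix act as i, j, k
as_sum = a * i + b * j + c * k

# Column-matrix form: [[a], [b], [c]]
column = np.array([[a], [b], [c]])

print(as_sum)
print(column.shape)  # (3, 1)
```

Both hold the same three numbers; the column form just makes the vector a 3-by-1 matrix so it can take part in matrix multiplication.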

#

If Keras didn't come with the dataset loaded, you would have to download the dataset manually. Check sentdex. He has a video on how to load a dataset. It's the second video of his new TensorFlow, Keras tutorial.
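
For the manual route, a sketch of reading the image files by hand, assuming they are in the IDX format used by the original MNIST distribution (a 16-byte big-endian header followed by raw pixel bytes). `parse_idx_images` is a made-up helper, and the data below is synthetic so nothing needs to be downloaded to try it:

```python
import struct

import numpy as np


def parse_idx_images(raw: bytes) -> np.ndarray:
    """Parse an IDX image file (the MNIST image format) from raw bytes."""
    # Header: four big-endian 32-bit ints (magic, count, rows, cols).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, 3 dimensions
        raise ValueError(f"unexpected magic number {magic}")
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(count, rows, cols)


# Tiny synthetic file: two 2x3 "images".
header = struct.pack(">IIII", 2051, 2, 2, 3)
body = bytes(range(12))
images = parse_idx_images(header + body)
print(images.shape)  # (2, 2, 3)
```

With a real download the bytes would come from the decompressed file instead of `header + body`; the distributed files are gzip-compressed, so they need to be unpacked first.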

#

TensorFlow is lower level than Keras, but it's not that hard.

#

@stoic beacon

stoic beacon
#

Thanks for the responses bud

#

I'll give learning TF a shot

void anvil
#

Can anyone explain why autoencoders are so popular compared to all the other models?

#

From what I've seen, it's not all that great in practice

sand reef
#

Well, it captures representations pretty well.

#

Reduces the noise to a minimum, if not even removing it.

#

And is able to output as close as to the original input.

lean ledge
#

@void anvil Depends on what they're for. For vision-y tasks, there's a lot more detail involved in making good images so autoencoders on their own don't work well but they're a pretty simple and cool way of reducing the dimensionality of your data to a much more dense representation

lapis sequoia
#

autoencoders are now used extensively in NLP tasks too.. look up BERT..

#

for capturing context aware representations of words.. ergo word to sentence embeddings

lapis sequoia
#

also.. dont mind me making nlp sound sexy.. it's not.. it's mostly mind numbing work and lot of Lisp :v.. I should've stuck to image processing..

lean ledge
#

Image processing is the fun stuff

#

Signals turn me on

lapis sequoia
#

Im trying to make sense of some numbers

#

consider I computed correlation of x vs a set (t, u , v, y, z )

lapis sequoia
#

what I stated above was cosine similarity..but apparently it's the same as pearson correlation coefficient

#

for centered vectors..
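
That equivalence is easy to check numerically. A small sketch with random data (the seed and the relationship between `x` and `y` are arbitrary choices):

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Center each vector by subtracting its mean, then take the cosine.
cos_centered = cosine_similarity(x - x.mean(), y - y.mean())

# Pearson correlation coefficient from NumPy for comparison.
pearson = float(np.corrcoef(x, y)[0, 1])

print(cos_centered, pearson)
```

The two numbers should agree to floating-point precision. Without the centering step they generally differ, because cosine similarity is sensitive to the mean of each vector.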

silent swan
#

naw NLP is great

karmic geyser
#

anyone have experience with lowpass/highpass/bandpass filters on digital audio samples?

lean ledge
#

@karmic geyser me

#

why

#

Spent this entire semester + possibly the next one if I take the DSP elective

#

You've been typing for a while 👀 Scared of how long the question might be

karmic geyser
#

I'm using sounddevice in Python to get audio input and then output it with low latency. I want to turn a stereo audio input into 5 or 6 channels, which I will then output to a subwoofer, midrange speakers, and finally tweeters. Pretty much, I'm trying to do a 3-way crossover in software. For example, say I have an audio stream at a 44.1 kHz sampling rate and 1024 samples in an array: would I need to add some kind of delay of like 30 samples or so if I wanted to reduce the volume of everything below 3500 Hz by like 6 dB an octave? Also, what books or online resources would you recommend for basic stuff like a Butterworth filter?

#

@lean ledge

lean ledge
karmic geyser
#

The delay would be so when I go from 1 chunk of samples to the next chunk the filter would still be smooth.

#

30 samples was arbitrary but I don't know how many samples a normal filter would use.

#

maybe not so much a delay but memory of the last 30 samples or 30 processed samples given to the filter for each channel.

#

Thanks for the lecture notes, they are quite good. I don't have much experience with filters or reading and understanding university level math. I tend to understand math better if it's written in a programming language.

lean ledge
#

When you're done with signal processing basics there's https://ocw.mit.edu/resources/res-6-008-digital-signal-processing-spring-2011/ for focus on digital signals and then https://www.coursera.org/learn/audio-signal-processing for focusing on audio signals

#

I'm not sure I can help you with audio processing chunks because while I've done a bunch of signal processing, it has been in context of signal theory rather than the details of specific implementations but I believe you're looking for techniques involving Hann and Hamming windowing functions and then merging on top

karmic geyser
#

wouldn't that stuff be more for showing a spectrogram?

#

the windowing functions?

lean ledge
#

I'll just say that if you use a technique such as IIR filtering (like Butterworth filters) rather than something FIR based, you might be able to get decent filtering without worrying about dealing with merging the output of separate buffer frames

#

Uh what do you mean?

karmic geyser
#

I feel like you would pass your samples through a windowing function and it would let you know how much activity is going on at a certain frequency bandwidth. and you would just repeat that say 1024 times with different bandwidths and use the output to make a spectrogram.

lean ledge
#

What exactly are you trying to do?

#

Create a spectrogram or use output to drive audio?

karmic geyser
#

lower the volume of frequencies below 3500hz on a continuous stream of digital audio samples.

#

with a curve so the lower the frequency the lower its volume is.

lean ledge
#

@karmic geyser To skip the theory for you, construct a butterworth filter with the parameters you want (it's simple with scipy), then take advantage of the fact that butterworth is IIR and use the zf value that the lfilter function returns after you use a filter and pass that in into the next filter operation when you run the next batch
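
A minimal sketch of that idea with SciPy, using the numbers from the question (44.1 kHz, 1024-sample chunks, 3500 Hz corner). The first-order high-pass is an assumption picked to match the roughly 6 dB per octave slope that was asked for, and `filter_stream` is a made-up helper name:

```python
import numpy as np
from scipy import signal

fs = 44100          # sampling rate in Hz (from the question)
cutoff = 3500       # corner frequency in Hz (from the question)
chunk_size = 1024   # samples per block (from the question)

# A first-order filter rolls off at about 6 dB per octave below the cutoff.
b, a = signal.butter(1, cutoff, btype="highpass", fs=fs)


def filter_stream(chunks, b, a):
    """Filter consecutive chunks, carrying the filter state between them."""
    state = np.zeros(max(len(a), len(b)) - 1)   # start from rest
    out = []
    for chunk in chunks:
        # lfilter returns the filtered chunk and the final state (zf),
        # which becomes the initial state (zi) for the next chunk.
        filtered, state = signal.lfilter(b, a, chunk, zi=state)
        out.append(filtered)
    return np.concatenate(out)


# Test signal: a 200 Hz tone (should be attenuated) plus an 8 kHz tone.
t = np.arange(4 * chunk_size) / fs
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 8000 * t)

chunked = filter_stream(x.reshape(-1, chunk_size), b, a)
one_shot = signal.lfilter(b, a, x)
print(np.allclose(chunked, one_shot))  # True
```

Because the state is carried over, filtering chunk by chunk matches filtering the whole signal at once, so no extra delay or overlap of samples is needed at the chunk boundaries. In a real stream there would be one `state` per channel.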

karmic geyser
#

I tried that but it didn't seem to be working. It was like it was just lowering the volume of the entire frequency spectrum.

lean ledge
#

Or you can use a pre-built system like GNU Radio with Python to set up the streaming architecture for yourself

#

@karmic geyser Every filter will reduce the volume to some extent

#

it shouldnt be by a lot

#

should be disproportionate

#

very disproportionate

#

you can always reamplify by multiplying by a constant as long as the wrong frequencies are filtered out

#

if they're not filtered out, there's probably something wrong with your parameters

karmic geyser
#

let me just quickly upload the code I have. It was meant to be a 6th-order Butterworth but it was making the signal inaudible. If I multiplied the signal by like 2048 I could hear it again; it mostly seemed to be the same frequencies but with some distortion from compacting and expanding the samples.

lean ledge
#

How did you get the parameters?

#

v good for filter design, I love it

#

Anyways, i'm dead tired from studying for my signals course, this isnt helping much :p I'm gonna go take a rest

#

Good luck!

#

hopefully the resources I linked can help a bit

karmic geyser
#

Oh okay. this is my code.

#

line 10-16 is the filter values 18-22 applies the filter, 53-63 is where I actually pass the data

foggy bridge
#

Hello everyone

#

i have a question regarding pandas

#

whats the best source to learn?

lyric canopy
#

There's also a "10 minutes to pandas" tutorial in the official pandas documentation

foggy bridge
#

thank you @lyric canopy

stoic beacon
#

When using Colab, where do you save CSV files to be read in

olive willow
#

yo guys!

stoic beacon
#

Morning

olive willow
#

howdy?

stoic beacon
#

Eh no

#

Don't say that

olive willow
#

hahahah hwry?

stoic beacon
#

Cuz you're not a cowboy lol

olive willow
#

yh I know 😦

#

hahaha

stoic beacon
#

So stick to calculus

#

Damn whippersnappers

olive willow
#

hahaha

lapis sequoia
#

🀠

#

@stoic beacon depending on how big it is, you can save it locally or on cloud storage..

stoic beacon
#

Awesome thanks

olive willow
#

guys on what do you need calc in data science? just curious

desert oar
#

optimization

#

understanding and computing gradients is really important

#

understanding at least how and why convex optimization works is important

#

also finite series come up a lot, that's usually covered in calc courses even if it's not strictly calculus
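
To make the gradient point concrete, here is a small sketch of gradient descent on a least-squares problem, compared with the closed-form answer. The data is made up, and the learning rate and step count are arbitrary choices that happen to work for it:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)   # made-up data

X = np.column_stack([x, np.ones_like(x)])   # columns: feature, intercept
w = np.zeros(2)                             # [slope, intercept]
learning_rate = 0.1

for _ in range(2000):
    residual = X @ w - y
    gradient = 2 * X.T @ residual / len(y)   # gradient of mean squared error
    w -= learning_rate * gradient            # step downhill

# Closed-form least-squares solution for comparison.
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w, w_exact)
```

The loss here is convex, so stepping against the gradient ends up at the same slope and intercept as the direct solve. The calculus is what tells you which direction "downhill" is.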

craggy geyser
#

quick question: for pandas, I have a dataframe where I have a timestamp column, an ID column, and another column category. In some cases, three rows can have the same ID and timestamp, but three different categories. Is there an easy way to drop all rows where this happens except one of them?

#

follow-up: This is not important, it doesn't really matter which row I keep since multiples is a good sign, but I have a 4th column snr which I could use to select which one to keep, i.e. keep the row with the highest snr value

polar acorn
#

@craggy geyser
Take a look at the following code (copy pasted from stackoverflow). It creates a dummy df and drops all rows where the values in the A and C column are not unique. It keeps the first non unique row.

import pandas as pd
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})
df = df.drop_duplicates(subset=['A', 'C'], keep='first')
#

If you want to keep the row with the highest snr value and you don't mind changing the order of your df you can sort on snr to begin with before dropping.
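
A sketch of that sort-then-drop approach. The column names (`timestamp`, `id`, `category`, `snr`) are assumptions based on the question, and the data is made up:

```python
import pandas as pd

# Made-up data; the column names are assumptions based on the question.
df = pd.DataFrame({
    "timestamp": ["t1", "t1", "t1", "t2"],
    "id":        [1, 1, 1, 2],
    "category":  ["a", "b", "c", "a"],
    "snr":       [0.2, 0.9, 0.5, 0.7],
})

# Sort so the highest snr comes first, then keep the first row of each
# (id, timestamp) pair.
best = (
    df.sort_values("snr", ascending=False)
      .drop_duplicates(subset=["id", "timestamp"], keep="first")
      .sort_index()            # restore the original row order
)
print(best)
```

The final `sort_index()` is optional; it only puts the surviving rows back in their original order.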

craggy geyser
#

ah, I see, by feeding it the columns, It will drop duplicates where the pair of those two columns are the same. That makes sense, and when you write here now I think I actually have done this in the past, should have remembered

#

and yes, that makes sense with snr of course

#

thanks!

#

👍🏼

polar acorn
#

np 😃

olive willow
#

so yh I'm learning data science and am thinking about buying a course, do they really cover at least most of the stuff you really need to know?
because I'm thinking of buying the datacamp subscription
is it any good or are there better courses

desert oar
#

why not do a free one?

#

that's machine learning focused, but machine learning is a fine place to start nowadays for more general data science

silent swan
#

isn't fast.ai significantly deep-learning focused?

#

(I think it's good, but not sure if it's the best recommendation for general data science.)

desert oar
#

yeah it is

#

hes also 14 😛

#

if you already know how to code, i don't see a problem with starting with deep learning

silent swan
#

oh, yeah then disregard data science, acquire AGI skillz

desert oar
#

especially if you're doing it as a hobby

#

if you don't fall into the "arrogant AI guy" trap then you should be fine transitioning into general data science

silent swan
#

deep learning would've been such a blast if I could've started earlier

desert oar
#

you can learn probability and stats later once you know the math and coding

silent swan
#

instead I was learning javascript before javascript became good

desert oar
#

heh

lean ledge
#

Deep learning should be learnt after ML

#

For one, deep learning is mostly useless and bad

lapis sequoia
#

What’s deep learning for

desert oar
#

Facebook begs to differ @lean ledge

lean ledge
#

For another, it's easier to learn how to treat it like any other model when you know how other models work

silent swan
#

I don't think it's mostly useless and bad if you use it in places where it's clearly good. But don't use it to predict stock prices

desert oar
#

facebook, google, openai, et al

lean ledge
#

How so?

#

Research ≠ practice

#

I am very very aware of ML research, I assure you

#

Deep learning excels in a few tasks but in practice as a data scientist, you almost never use deep learning

desert oar
#

of course

silent swan
#

hence why disregard data science, acquire AGI skillz :p

desert oar
#

i literally never use it

#

does that make it useless and bad?

lean ledge
#

Deep learning is the way to do CV and NLP but apart from that there's few uses for it

desert oar
#

which are huge problem domains right now

silent swan
#

actually though, as a 14 year old, deep learning will be much more fun/better as a hobby than learning how to do pivot tables

desert oar
#

at least 50% of the data science jobs i see are either CV or NLP or audio related

silent swan
#

if this were a first college course I'd say yea go learn some statistics first

lean ledge
#

👀👀👀 we must be seeing very different jobs

desert oar
#

unstructured data is the big data of 2019

#

its a fad in some regards

#

but in others its a genuine big step forward

silent swan
#

but if he's going to have fun with CycleGANs and make cool pixelated pokemon recolors I say go do that

desert oar
#

^^

#

also theres no point being hyperbolic and inflammatory

lean ledge
#

CV and NLP are a minority of data science jobs and they require a large large amount of specialisation for the average job

#

You basically have to spend a year learning just deep vision after having already studied other DL and ML stuff in order to catch up on SOTA

desert oar
#

thats fair

reef bone
#

Where are you looking that half the data science jobs you see are CV or NLP or audio related?

desert oar
#

my recommendation was targeted at a bright kid who's already good at programming and math, and wants a place to get started

#

@reef bone maybe in the wrong places

lean ledge
#

Yeah I rarely ever see a CV or NLP job lol

reef bone
#

I'm genuinely wondering because I rarely see those at all

desert oar
#

anyway i wouldnt have made that recommendation to anyone else

lean ledge
#

The few CV jobs I see are specialised robotics related jobs

silent swan
#

it's like telling a kid "Don't learn javascript, start with learning big O and data structures"

desert oar
#

or "dont learn C++ learn python instead"

#

it was just a recommendation

reef bone
#

DL is an approach to ML, I don't think you can really learn DL without ML

desert oar
#

and frankly im only resisting this at all because your tone was confrontational

#

unnecessarily so imo

silent swan
#

tl;dr use CycleGANs to make new pokemon sprites, but also read Murphy

desert oar
#

^

lean ledge
#

I definitely think classic ML should be learnt before DL. It makes people too comfortable trying to use DL because that's what they're used to. It builds weak foundations in ML to start at DL.

desert oar
#

i agree, for anyone over 14

silent swan
#

I 100% agree for people getting serious in the topic

#

I disagree for a hobbyist wanting to pick up something new and cool

desert oar
#

its like saying to learn what a hash table is before using turtle graphics or pyqt5

lean ledge
#

I s2g GAN SOTA changes faster than the hottest JS frameworks

desert oar
#

you literally dont need to know

#

who ever said SOTA

silent swan
#

lol feel like we're talking past each other at this point

desert oar
#

you still learn about probabilities, objective functions, et al

#

doing mnist

lean ledge
#

(I was not referring to learning SOTA, just joking about the GAN hype)

desert oar
#

anyway if you have a recommendation for a free data science course that isnt fast.ai, i'm sure the person who asked the question originally would appreciate the recommendation

#

and i would too, so i can recommend to others

silent swan
#

oddly enough, Jeremy Howard probably would have been great for a datascience course

#

afaik that's his background

polar acorn
#

Honestly though @lean ledge the all caps nick is making you look more upset about this than you probably are 🤔

lean ledge
#

There are many other than fast.ai. Andrew Ng's course, Columbia's ML course (my preference), Google and Microsoft have their own free ones, etc.

#

Probably tbh

silent swan
#

when people talk about Andrew Ng's course, are they still talking about the coursera one? or deeplearning.ai

#

honestly even pre-DL boom, I never liked his coursera course

lean ledge
#

I generally assume coursera unless proven otherwise

#

I didn't like it either

#

It's too shallow

#

And too much "don't worry if you don't understand" going on

silent swan
#

also octave lol

lean ledge
#

That too

silent swan
#

my main gripe with fast.ai is that that group is incredibly self-promotional

#

the content is solid though (if a little bit loosey-goosey)

desert oar
#

i didnt like the coursera course either

#

not only "dont worry about it"

#

but also octave 🤢

silent swan
#

3/3 surveyed people hate the coursera course lol

desert oar
#

and teaching linear regression with gradient descent was always weird to me too

silent swan
#

yes!

lean ledge
#

n o r m a l E q U a t I O n

polar acorn
#

I liked it but then again I already had a background in statistics and maths and learned to program in Matlab so maybe it was intended for me.

olive willow
#

I'm back

#

wooww

silent swan
#

we've concluded that you need to read SICP but also He et al. 2015

desert oar
#

i didnt know columbia had a free ML course

lean ledge
#

It was too boring for me. Not Mathy enough. Columbia's course was much nicer and when I looked back on it, I realised the topics were so practical. Stuff I use or see being used all the time in real world DS

desert oar
#

i wonder what a more general purpose "data science" course would look like

#

vs a "ML" course which is what i normally see

silent swan
#

data science is whatever you want it to be

desert oar
#

well sure. i assume it'd spend more time talking about probability and stats, as well as data visualization

silent swan
#

domain-specific business logic? sure!

#

hardcore data-engineering principles? why not!

desert oar
#

heh

#

i'd assume that an "intro to data science" course is like 1/2 ML and 1/2 stats

silent swan
#

ELBo for building VAEs? throw it in!

desert oar
#

maybe start with stats and do ML at the end once they're a little more comfortable with the math and coding

lean ledge
desert oar
#

which syllabus is this, columbia?

lean ledge
#

Yah

desert oar
#

that's pretty comprehensive

#

and fast moving

#

what are the prerequisites?

#

it's free online? that's pretty sweet

silent swan
#

that's more ML than DS though?

lean ledge
#

DS is basically ML though? + Domain knowledge and blah blah hype words

desert oar
#

sorta, ML+Stats

#
  • you actually have to talk to the business people
#

at least that's my expectation when i see "data science"

silent swan
#

I do actually think some level of data engineering should be in data science

desert oar
#

like what

#

basic principles of indexing a database or something like that?

silent swan
#

databases, mapreduce

desert oar
#

does anyone actually write low level map reduce stuff though nowadays

#

people just use spark

#

unless you truly have enormous data

silent swan
#

would still be good to know the underlying principles though

desert oar
#

oh, the concept? yeah definitely

lean ledge
#

But yeah it's a good course. Builds good fundamentals in the first 6 weeks and then goes over good foundations for related stuff like drawing out true latent factors through matrix factorisation and PCA, Markovian models, continuous state space extension to those, etc

desert oar
#

i think that more falls under general programming skills than data engineering though

#

yeah, ill go look over the material at some point and start directing people there who ask

#

thanks

silent swan
#

arguably even (very practical) things like database integrity when you have parallel requests
don't have to know how DBs actually handle them, but you need to know that it's an issue that people have to think about

#

I would argue that data science should cover how people have to handle data

#

of course, there're different perspectives

#

like my bayesian stat friends who treat deep learning as just "function approximators"

#

which isn't wrong

desert oar
#

im one of those people i think 😛

#

i also think that data engineering generally can be learned on the job

#

and i think a good org will deliberately get you to push your limits in that regard

olive willow
#

btw guys do you need like linear algebra and calc to start learning ML and understanding it ?

desert oar
#

you can start, but you won't get that far

#

you might also build up some bad habits and mistaken ideas without knowing how it works

#

for understanding it, they are necessary

olive willow
#

so even before using linear regression, I should understand the formula

#

sht

#

that will take hella a lot of time

#

I'm on linear transformation rn

#

and calc == nothing yet

#

so it will take like 2 years right? around that

#

to understand a little bit more than the basics of ML

polar acorn
#

My 2 cents, at 14 I would play around with what I found enjoyable. If you put up a rigorous schedule for learning DL and/or ML and all related fields, you might find you are sick of it after two weeks and then do something else. Playing around and learning stuff in a suboptimal way is always better than giving up on learning it the right way. Playing around with a conv net for CV, even if you don't understand everything that's going on, is much better learning than reading the intro chapter of some advanced calc book and then putting the rest of it away forever. If you enjoy it you'll find yourself learning the whole field soon enough. So first of all know yourself and then figure out what you want to learn.

olive willow
#

I enjoy it, that's why I'm so enthusiastic about it and want to learn it.

#

I'm repeating it every day because, it might sound weird to some people but I love math and things that make sense like chopping a circle down and then putting it into a graph and getting the cm^2 that way for example

#

instead of using the normal pi r^2

#

sry dude bye have to go to sleep now, will read it tomorrow !

stoic beacon
#

All of this has hurt my head

#

And just the fact that it's so vast and there's so much to know I think I'm just done lol

#

It's too much math that's way over my head and spending 2-3+ years to learn something to just the point of basic understanding is just not my idea of a hobby lol

desert oar
#

@olive willow no for basic 1 variable linear regression you can get by with basic calculus if you want to really understand

#

you can skip that honestly

#

this is why people usually go to school for years when doing this stuff..

stoic beacon
#

@desert oar what about me bruh

#

I need wisdoms too

#

Where can I best understand the high level principles behind ML and the algorithms involved? Not trying to become a math wizard or scholar or anything

desert oar
#

what do you already know, and what are you trying to do with what you learn?

#

Just get a better understanding so you can follow the news and not be completely lost?

#

Or fit basic models?

stoic beacon
#

Fit basic models sir

#

Maybe do some fun things with some Kaggle data or work related data

desert oar
#

Ok

#

Im really not the best one to ask, i dont know too many resources

#

kaggle has their own tutorial content but its very limited

stoic beacon
#

Yeah I saw :(

#

Oh well

#

I'll just keep gaining bits of info here and there as I watch things

sand reef
#

Say. Is reservoir computing still a big thing? Or it never was?

karmic geyser
#

I have some python code and I need to do a lot of maths on a lot of data inside a function in almost real time. How would I normally go about making it faster? I have already rewritten most of it with performance in mind, as well as running multiple copies of the function on different signals in separate threads/processes. This is what line profiler says about the function. Ignore the thing about bandpass as it's not actually a bandpass yet.

#

It's currently using about 3.3 ghz of total cpu usage and I need to try get it to under 1ghz

#

would you use something like cython? I haven't used it before.

sand reef
#

Well. About cpython.

#

Apparently PyPy, not PyPy3, is super fast. Faster than cpython.

#

@karmic geyser

zenith nova
#

Cython was probably the thing that was meant

karmic geyser
#

Hey sorry I didn't have it open.

#

I could maybe use pypy instead of cpython. But I don't mean using a different interpreter I mean having a single function written in C or C++ code and called from python.

sand reef
#

Yeah. Cython is 3-4x faster than pypy

#

So, I think, that might fix your issue.

karmic geyser
#

Yeah I'm running into performance issues on my desktop pc and eventually want to run it on a 1ghz single core arm processor. most of the code is fast enough it's just a few filters that I will be applying to a big array of data that might need to be done in c++. How hard is it to use cython for a single function?

zenith nova
#

It's designed for that, so hopefully not very hard

karmic geyser
#

Okay I will give it a shot.

sand reef
#

Like a tutorial?

karmic geyser
#

Thanks I will have a look at that.

polar acorn
#

@karmic geyser If you have time I've also heard nice things about https://github.com/pybind/pybind11 for when you need just one function running in c++. I've never tried it out myself though.

karmic geyser
#

alright I will read about that too. I think I used ctypes for something in the past because I wanted to access some windows dll functions and python didn't have an interface.

desert oar
#

Cython's main difficulty is sparse and IMO somewhat incoherent documentation

#

If you know C it's probably easier to learn

#

You're just trying to wrap a C++ function?

karmic geyser
#

I have an algorithm in python but it's quite slow so I wanted to make just that algorithm function run faster.

#

I have about 23ms to run the algorithm on 1024 values. I think it would be a lot faster in c or c++.

desert oar
#

If you share the code i can probably help

#

Chances are there is something you can do to improve performance without using cython

#

But yes, when I rewrite something in cython i usually get about a 50% performance improvement without doing much of anything other than copy and paste

olive willow
#

Yo dude

desert oar
#

How long does it take currently?

olive willow
#

You can use libs to try make it faster ? Like numpy array instead of list

karmic geyser
#

I already made some changes. Instead of using a circular buffer for some values I only store the last values of the chunk for the next one. I'm also running the algorithm on 2 different cores but that won't help when I move it to the embedded device. at the moment it uses up about 70% of a 4.5ghz cpu core. and I need to run it on a 1ghz arm cpu.

desert oar
#

Well unless you post the code nobody can help

#

Are you operating on images? Text? Etc.

#

What do you mean by a buffer?

#

You're trying to iterate over something in chunks?

karmic geyser
#

Audio data. Maybe I meant ring array. Let me post the code + the profiler

desert oar
#

Thanks, it will just be much easier to assess the situation that way

karmic geyser
desert oar
#

OK, I think I have some performance improvements we can make once I get to a computer

karmic geyser
#

The algorithm at the moment pretty much just acts kinda like a low pass filter. I will need a few different algorithms but I'm still learning how to implement them. Python seems to be kind of slow and I'm not really sure how to optimise python code. With other python stuff I have done it hasn't been a problem, but I think that was because the libraries I used had the heavy stuff implemented in c or c++. I don't mind writing the actual algorithms in c++ or c, I'm just not sure what is the best way to do that and then call it from python.

#

I think the algorithm I wrote might just be a moving average

desert oar
#

Yeah. Also looping is slow because of lots of memory allocation and other overhead

karmic geyser
#

yeah, I figured if I wrote it in c++ I could avoid most of that.

desert oar
#

If youre using numpy for looping you might as well use a list

#

Ill take a look but yes. This might be a good candidate for cython

karmic geyser
#

Some stuff I might need to do is multiply every value in the array by a constant which I think numpy helps with. The audio library I use for outputting and recording the samples gives a numpy array as well.

desert oar
#

Thats a 1 liner in numpy and extremely efficient

#
import numpy as np

x = np.array([1, 2, 3])
print(x)

y = x * 10
print(y)
karmic geyser
#

'''
def stereotomono(left, right, gain):
    left = left * gain
    right = right * gain
    mono = left + right

    return mono

'''

#

Woops, but I was doing that to multiply an array to lower or increase the volume and then I was summing the 2 arrays.

desert oar
#

Yep that should work

#

In your algorithm you loop over bpos twice for every pass over samples

#

Oh nvm

#

Hah yeah this is a moving average isnt it

karmic geyser
#

pretty much I'm adding a few of the previous samples to the start of the algorithm.

#

Yeah I think so.

#

on the left is the effect it had on some music I was playing. the right is with the filter turned off.

desert oar
#

The comments on the numpy answer are enlightening as well

karmic geyser
#

Pretty much I want to get an audio signal and turn it into 3 audio signals. 1 that is lowpass below 150 hz. 1 that is bandpass of 150-3500hz. and 1 that is 3500hz highpass.

#

I tried using scipy butterworth filter but the tutorials were not really clear and it didn't seem to work correctly. it was just making everything quiet.

desert oar
#

can you share your scipy code?

#

the example here looks straightforward enough to me

#
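from scipy.signal import butter, sosfilt

fs = 44100  # sample rate in Hz (assumed); needed so the cutoffs can be given in Hz
freq_lo = 150
freq_hi = 3500
filter_order = 6
# output='sos' returns second-order sections, which is what sosfilt expects
sos1 = butter(filter_order, freq_lo, 'lowpass', fs=fs, output='sos')
sos2 = butter(filter_order, (freq_lo, freq_hi), 'bandpass', fs=fs, output='sos')
sos3 = butter(filter_order, freq_hi, 'highpass', fs=fs, output='sos')

def filter3(y):
    return sosfilt(sos1, y), sosfilt(sos2, y), sosfilt(sos3, y)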
#

then you'll have to play around with the cutoffs and order in order to get the response to look how you want

#
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, freqs
from collections.abc import Iterable

def plot_butter(N, Wn, btype):
    # analog=True, so Wn is an angular frequency in rad/s
    b, a = butter(N, Wn, btype, analog=True)
    w, h = freqs(b, a)
    plt.semilogx(w, 20 * np.log10(np.abs(h)))

    plt.title('Butterworth filter frequency response')
    plt.xlabel('Frequency (rad / sec)')
    plt.ylabel('Amplitude (dB)')

    plt.margins(0, 0.1)
    plt.grid(which='both', axis='both')

    # mark the cutoff(s) with vertical lines
    if isinstance(Wn, Iterable):
        for cutoff in Wn:
            plt.axvline(cutoff)
    else:
        plt.axvline(Wn)

plot_butter(6, 150, 'low')
plt.show()
karmic geyser
#
    #I think this generates the values that the filter will use.
    def butter_bandpass(lowcut, highcut, fs, order=5):
        nyq = 0.5 * fs
        low = lowcut / nyq
        high = highcut / nyq
        sos = butter(order, [low, high], analog=False, btype='band', output='sos')
        return sos
    
    #I think this applies the filter to the data.
    def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
        sos = butter_bandpass(lowcut, highcut, fs, order=order)
        y = sosfilt(sos, data)
        return y
desert oar
#

what's the nyquist frequency? i don't know much of anything about signal processing

karmic geyser
#

fs was 44100, lowcut was 150 and highcut was 3500

desert oar
#

fs = frequency sampling rate?

karmic geyser
#

nyquist frequency is half of the sample rate. It's pretty much the highest frequency sine wave that a given sample rate can represent.

desert oar
#

so that code looks right

#

do you have a sample signal we can test on?

karmic geyser
#

I had the order at 6 but it was making the entire song quiet. if I turned the order lower it just seemed to be making it slightly less soft. if I multiplied it by like 16384 it ended up sounding close to the original but with some distortion from compressing and then expanding the samples I think.

desert oar
#

so your data is sampled how many times / second?

#

just so i understand what's going on here

#

44100?

karmic geyser
#

44100 times

desert oar
#

ok, so a spike at 0 and 44099 would be a 1hz signal

#

right?

karmic geyser
#

I pretty much have a microphone/ line input that I am playing music from my phone through. I then grab 1024 samples on my computer. apply some processing then output the 1024 samples to my computers headphones. the latency is somewhere under 0.1 seconds. all this is happening in real time and constantly streaming from an input to an output.

#

I think so.

desert oar
#

ok

karmic geyser
#

a 1 hz signal would pretty much be a sine wave that repeats every 1 second.

desert oar
#

well anyway does your hand-written code work?

#

its just slow?

#

cause i feel like we can definitely get the butterworth filter working, but yes you can probably write your code in cython for a significant speedup

#

can you give an example signal i can test with

karmic geyser
#

I can open a wave file, apply the processing then output a wave file with some python libraries I think.

desert oar
#

ok, thats not really what im asking

#

but if you find a signal processing library that's probably the best option

karmic geyser
#

I can't really give the same samples as I'm streaming it from an input. I can program some stuff to read/write the samples though.

desert oar
#

any kind of test data should work

#

how are you testing your code in the first place?

#

anyway, in your code there are a few things that can be optimized

#

the variable n is completely unnecessary, it's always just equal to bpos_max by the time you use it

#

im still not totally sure what x is but if it's a sliding window, then you can obtain that much more efficiently

#

also i += 1 should be ever so slightly more efficient than i = i + 1 which might matter on an embedded device

karmic geyser
#

yeah I was doing some stuff like adding 2 values then dividing it by n then resetting n back to zero. x is the summed value of the previous "bpos_max" samples

#

Yeah I wanted to do i++; but python doesn't have that haha

#

N is redundant with how it is atm.

desert oar
#

what happens if i is 3 and bpos_max is 16?

#

you just sum the 0-3rd values?

karmic geyser
#

pretty much I'm adding the previous chunks 16 samples to the start of the algorithm I'm summing samples 0-15 + the current sample then dividing it by the total samples which is 16. I'm then moving an offset by 1 and repeating it again up to the total number of samples in the chunk. I'm then setting the buffer to the last 16 samples of that chunk

desert oar
#

so what happens when you're at the 3rd sample in the signal

#

you don't have 16 previous samples

karmic geyser
#

the summed + divided value is then stored as a sample in an array to be returned

#

the first time I run it the 16 samples are all equal to 0.0

desert oar
#

got it

#

use the moving average code i sent

#

in the stackoverflow examples

#

as long as you aren't extremely memory constrained it's the most efficient option

#

it pre-computes the entire sequence of cumulative sums

#

then subtracts off whatever is outside the window
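
The cumulative-sum idea described here can be sketched roughly as below. This is not the linked Stack Overflow code, just a minimal version of the same trick; `moving_average` and the sample values are made up for illustration.

```python
import numpy as np

def moving_average(samples, window):
    """Mean of each run of `window` consecutive samples.

    The output is shorter than the input by window - 1.
    """
    samples = np.asarray(samples, dtype=float)
    # csum[k] is the sum of the first k samples, with csum[0] == 0.
    csum = np.concatenate(([0.0], np.cumsum(samples)))
    # The sum over samples[i:i+window] is csum[i+window] - csum[i].
    return (csum[window:] - csum[:-window]) / window

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

One caveat: on very long signals the running sum can pick up floating point error, so for a chunked stream it is probably safer to restart the sum on every chunk.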

karmic geyser
#

yeah, I'm only cpu constrained. I got 512 megabytes of memory on the device that will run the code.

#

I think it's only like 35 kilobytes of samples.

desert oar
#

alright let me think about this

karmic geyser
#

It's ~1.5 megabytes a second of samples but I'm doing it at roughly 43 chunks a second so I don't need much memory. It's mainly a lot of operations.

desert oar
#

yes you can rewrite this in cython and should see significant improvements

karmic geyser
#

Ideally I would use someones library, put the filter values in then they would probably do it in c++.

#

Like what numpy and scipy probably does right?

desert oar
#

yeah. or fortran 😉

karmic geyser
#

Haha why not verilog 😛

desert oar
#

hmm im confused as to how this buffer is working

stoic beacon
#

Can I jump in to ask a dumb and unrelated question?

desert oar
#

sure

karmic geyser
#

Always

desert oar
#

@karmic geyser it looks like you never update the buffer contents until the end of the loop

karmic geyser
#

Yeah I don't need to update it until the end, it was a small optimisation I thought of haha.

        i = 0
        x = 0
        n = 0
        for cur_sample in in_data:
            for sample in buffer:
                x += sample
                n += 1
                
            buffer.replaceOldest(cur_sample)
            new_signal[i] = (x + cur_sample) / n
            
            i += 1
            x = 0
            n = 0
            
        return new_signal
desert oar
#

then how is the buffer being populated

karmic geyser
#

that was the old code before I did some optimising and maybe changed it.

desert oar
#

arent you just pulling the first 16 values all the time?

stoic beacon
#

Been watching some TensorFlow videos and I honestly have no idea what's happening. I've watched enough high level videos and read enough articles that I generally understand how a neural net works but TensorFlow code is confusing me to shit. That being said, would Keras be sufficient for any stupid project I want to do? I don't need to do anything super scholarly or hardcore or like...cutting edge. Would the simplicity and ease of understanding of Keras be better?

desert oar
#

probably sufficient

#

but what part of tensorflow is confusing?

#

like... what's the most sophisticated code that you can understand?

#

also are you sure you understand how a NN works? i dont mean to be confrontational, sometimes we think we know more than we do

stoic beacon
#

Just the general workflow in terms of creating the actual net. What are Placeholders, what are Variables, what's a Graph, how do you create the layers, etc

desert oar
#

do you know what backpropagation is,what gradients are, etc.?

stoic beacon
#

I understand that there are input neurons which hold your data and the connections hold the weights and the hidden layers perform some activation function on your data

desert oar
#

ok... you'll need a more technical understanding than that in order to understand tensorflow

#

which i highly recommend developing. but for now keras is probably more friendly for your use case

stoic beacon
#

Then some calculus is used to find the minimum point, a la gradient descent

desert oar
#

tensorflow is really a "differentiable tensor computation graph engine"

#

for which NNs happens to be the most immediate use

stoic beacon
#

I gatcha

#

I must've watch that 3b1b video on NNs three or four times and yeah I clearly already forgot the big parts

#

Watched

desert oar
#

yeah if you don't feel comfortable with the equations, you will struggle to make TF work for you

stoic beacon
#

I was able to give a better detail a few days ago

desert oar
#

layers are kind of an abstraction

stoic beacon
#

So TensorFlow really kinda forces you to understand the math?

#

Not to sound like an ignorant fool or someone who doesn't want to learn it all, I just simply don't have the time

#

So something that abstracts away some of that math is probably best

karmic geyser
#

The buffer keeps its state between function calls. Think of it as me remembering the last "n" samples. If I have all the samples in memory and I know what position I am up to, then I don't need to write anything to the buffer until I am done with the current chunk. Then I just save the last few values of the chunk that the next function call will need to smoothly apply the algorithm. I could replace the oldest value in the buffer with the newest sample and shift the index by 1 every time, but it's just not needed and slower than the way I switched to, even though it was easier to read.

#

What do you want to do with machine learning?

#

@stoic beacon

desert oar
#

yeah it does @stoic beacon

stoic beacon
#

@desert oar I had a feeling haha. The series I was watching had him creating out the actual z = xw + b and I'm like...wut
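
For reference, the z = xw + b step from that series can be written out in plain NumPy. This is only a sketch of what one layer computes; the ReLU activation, the `dense_layer` name and all the numbers are assumptions for illustration, not what the series used.

```python
import numpy as np

def dense_layer(x, w, b):
    """One fully connected layer: linear step z = xw + b, then a ReLU activation."""
    z = x @ w + b            # shape (batch, n_outputs)
    return np.maximum(z, 0)  # ReLU: negative values become 0

# Made-up numbers: 1 example with 2 inputs feeding 2 neurons.
x = np.array([[1.0, 2.0]])
w = np.array([[1.0, -1.0],
              [0.5,  0.0]])
b = np.array([0.0, 0.5])
activations = dense_layer(x, w, b)
```

Here z works out to [2, -0.5], so after the ReLU the layer outputs [2, 0].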

#

@karmic geyser oh just stupid work things. I try to self improve every so often and I pick a topic I'm interested in and try to learn some of it

#

Without going too deep into any one thing. Jack of all trades, master of none kind of thing but I'm okay with that

desert oar
#

@karmic geyser im confused as to what your code is doing then

stoic beacon
#

I enjoy Python so I picked ML to practice some Python while learning something that interests me

desert oar
#

@karmic geyser how many elements is filterdata operating on at once?

#

if you can explain your algorithm in words it might help

karmic geyser
#

You can probably find tutorials for tensorflow with stuff like "tensor flow character recognition" in google. I think there was a white to black 32x32 pixel thing that you trained to recognise letters.

stoic beacon
#

And following a tutorial is great but it wouldn't be self improvement if I just blindly follow a tutorial and can't understand it

desert oar
#

precisely

stoic beacon
#

Even if I just loosely understand what Keras is doing I'd be happy haha

#

Even if I have to use the words "magic" and "awesome maths stuff"

#

And I don't usually put "awesome" and "math" in the same sentence

karmic geyser
#

pretty much make a 32x32 pixel image with a character in it. apply some kind of distortions/blurs. have like 10 different ones for each character. you use that as your training data. it then sets weights of "neurons" so that it gets as close as possible to 100% correct guesses of your training data. pretty much the "neurons" will find patterns in the data based on the intensity and position of values and how they compare to ones adjacent to them, it will then "guess" at what the character should be.

#

you could do stuff like trying to centre the character in the image before your program guesses it as that could improve the accuracy.

#

a neural network pretty much compares the inputs to each other to come up with an output. you tune the neural network's parameters/shape/size/weights/how inputs are linked etc. so that it gives you the output that you want most of the time.

#

That's supervised learning.

#

@desert oar filterdata operates on an input of 1024 floats in an array and returns 1024 floats. It has an internal memory that stores the last 16 elements of the previous array it was passed and uses those to continue from where it last got up to. The output samples are 17 samples added together then divided by 17; it then offsets everything by 1 to generate the next sample. The first sample uses all 16 values from the buffer + 1 from the input array. The second sample uses the newest 15 samples from the buffer + 2 from the input array, and so on until it's using 0 samples from the buffer and 17 samples from the input array. From that point onward it just offsets 1 by 1 along the input array, calculating a sample from 17 samples until it gets to the end of the array. At that point it saves the last 16 values of the array to the buffer to use next time.
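
A vectorised sketch of that description, in case it helps to compare against. The class name `ChunkedMovingAverage` is made up, and it uses `np.convolve` instead of the hand-written loops, so treat it as one possible equivalent and not the actual filterdata code.

```python
import numpy as np

class ChunkedMovingAverage:
    """Moving average over the current sample plus the `history` samples before it."""

    def __init__(self, history=16):
        self.history = history
        # The first call behaves as if the signal was preceded by silence.
        self.tail = np.zeros(history)

    def process(self, chunk):
        chunk = np.asarray(chunk, dtype=float)
        # Prepend the samples remembered from the previous chunk.
        padded = np.concatenate((self.tail, chunk))
        kernel = np.full(self.history + 1, 1.0 / (self.history + 1))
        # 'valid' keeps only positions where the full window fits,
        # which yields exactly len(chunk) outputs.
        out = np.convolve(padded, kernel, mode="valid")
        # Remember the last `history` samples for the next call.
        self.tail = padded[-self.history:]
        return out
```

Feeding a signal through in 1024-sample chunks should give the same output as feeding it through in one go, which is an easy way to check the carry-over logic.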

desert oar
#

yeah but you're always just using the final 16 elements from the previous array

#

there's no window sliding over the current array

#

also why are you even chunking it up like that

#

if your whole array can fit in memory then just do it all in one pass

karmic geyser
#

    while i < (samples):
        while ii < bpos_max:
            if (ii + i) >= bpos_max:
                x = x + in_data[i+ii-bpos_max]
            elif (ii + i) < bpos_max:
                x = x + buffer.getValue(i+ii)
#

that starts off with the 16 values from buffer then 15 then 14. slowly adding values from the regular array.

desert oar
#

but where are you adding values

#

ohhhh

#

oh i see

#

ok

#

err why are you using this circular thing at all then

#

vs just storing the last 16 values in a regular old array?

karmic geyser
#

the if and elif are swapped around cause I figured that 98% of the time the first if statement is true.

desert oar
#

also you can just do else instead of that elif

karmic geyser
#

yeah the else also works.

desert oar
#

so why not just store the last 16 elements of the previous array?

#

why this ring buffer business?

karmic geyser
#

Originally every sample was going into the ring buffer haha. Then I realised I didn't need to do that. I will replace it with a list

desert oar
#

also why do this in chunks of 1024

#

instead of just.. an array

#

oh cause youre reading them 1024 at a time?

karmic geyser
#

yes. It's going to be a digital crossover for speakers

#

1024 samples is enough that I should be able to do most algorithms and the delay isn't too much. Also less overhead than if I were to do it in chunks of 64 etc.

#

slow computers can't handle low chunk size as well. it adds stuttering to the playback

#

I'm going to have a small linux device that receives an audio signal via spdif or 3.5mm stereo jack, then outputs it to 3 different 3.5mm jacks with different processing based on what kind of speaker it is. Only send bass to the subwoofer, high frequency to tweeters etc. If you send too high an amplitude signal at a low frequency to a tweeter it will break it, and if you send too much high frequency stuff to a subwoofer the bass will not be as clear.
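
Roughly what that three-way split could look like with scipy, processing one chunk at a time. This is a sketch under assumptions: mono float input, a filter order of 4 picked arbitrarily, and a made-up `Crossover` class. The part that matters for streaming is passing `zi` to `sosfilt` so the filter state carries over between chunks.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 44100        # sample rate from the discussion above
LOW_HZ = 150      # subwoofer / mid split
HIGH_HZ = 3500    # mid / tweeter split
ORDER = 4         # assumed filter order; tune by ear or by plotting

class Crossover:
    """Splits a mono stream into low, mid and high bands, chunk by chunk."""

    def __init__(self):
        self.filters = [
            butter(ORDER, LOW_HZ, btype="lowpass", fs=FS, output="sos"),
            butter(ORDER, [LOW_HZ, HIGH_HZ], btype="bandpass", fs=FS, output="sos"),
            butter(ORDER, HIGH_HZ, btype="highpass", fs=FS, output="sos"),
        ]
        # One zero-initialised state array per filter, shape (n_sections, 2).
        self.states = [np.zeros((sos.shape[0], 2)) for sos in self.filters]

    def process(self, chunk):
        bands = []
        for i, sos in enumerate(self.filters):
            # Passing zi makes sosfilt return the final state as well,
            # so the next chunk continues where this one stopped.
            out, self.states[i] = sosfilt(sos, chunk, zi=self.states[i])
            bands.append(out)
        return bands  # [low, mid, high]
```

Without the carried state every chunk would start the filter from scratch, which tends to show up as clicks at chunk boundaries. Stereo input would need a differently shaped state array, which this sketch does not handle.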

desert oar
#

if you send me some sample data i can test my implementation

#

sample inputs and outputs

karmic geyser
#

Pretty much my $350 speakers amplifier/subwoofer broke and it was proprietary. They don't make them anymore. I managed to find someone who was selling the exact same speakers cause they had the same problem and I got them for $30.

#

Yeah I'm writing a python script to read a wave file and pass it in chunks

olive willow
#

guys so tensorflow is for ML and keras DL

karmic geyser
#

deep learning is like an onion

#

it has lots of layers

olive willow
#

yh

#

btw what do you need for DL

#

?

#

ML, math, programming, understanding of data. and what more?

karmic geyser
#

Understanding of the industry/field that you are trying to use deep learning with.

olive willow
#

I'm not in an industry yet but I want to go e-commerce or social media

karmic geyser
#

social media it would probably help to know a bit about fake accounts and how to identify them to clean up your test data. e-commerce maybe it would help to know what data to collect on customers, if in doubt all of it that is legal haha

#

I'm not in the fields I was just trying to think of examples of things that would be industry specific knowledge that is relevant to the problems.

olive willow
#

so do you use ML or DL for targeted ads?

karmic geyser
#

I would say deep learning is still machine learning. It's just a category of machine learning that is more complicated.

olive willow
#

I never have been in an industry

#

ooohhh sure

karmic geyser
#

Generally with machine learning you want it to be as simple as possible to get the result you want/are looking for.

olive willow
#

yh not to over do it especially with data science

#

do you do predictive analytics with ML or just an algorithm

karmic geyser
#

more complicated makes it harder to train, harder to understand, harder to predict etc.

olive willow
#

yh

karmic geyser
#

predictive analytics is something you do.

olive willow
#

ooh sure

karmic geyser
#

machine learning is one tool you can use to get there.

olive willow
#

yh that's what I meant

#

can you give me an example of uses of ML in data science like for example the analyzing part

karmic geyser
#

an example of predictive analytics might be this

#

what do you mean by the analyzing part?

olive willow
#

so if you already have pre processed the data and you start analyzing it

karmic geyser
#

Most machine learning data is pre processed before you give it to the machine learning code.

olive willow
#

yh but what can you do with ML in data science? like group data that is coming every second for example a red flag of a bank transfer

desert oar
#

yes

#

ML has subsumed a lot of what would have traditionally been called statistics

olive willow
#

oooh sure

desert oar
#

basically "automated predictive modeling" = "machine learning", "one-off modeling or causal analysis" = "statistics" 🤷

#

the distinction is more one of application nowadays

#

the methods are different though

#

eg you wouldnt typically use a random forest to infer the distribution of cancer cell sizes

olive willow
#

what's modeling I've seen it sooo many times but don't really know the definition

#

yh I know that

desert oar
#

nor would you typically use a hierarchical bayesian model to run online fraud detection

#

unless you could optimize it for such a purpose

#

actually thats kinda not true, you totally could

#

but you know what i mean

#

modeling is... fitting a model

#

implied, with the purpose of capturing some truth about the world

#

rather than "just" making predictions

olive willow
#

but you would use a linear regression model for example with sensor data to see if it didn't overheat for example and give out an inaccurate measurement

desert oar
#

sure

#

linear regression kind of sits at the intersection between "machine learning" and "statistics"

karmic geyser
#

Okay say you have some basic maths algorithms and you are a bank and you use it to work out whether you should or shouldn't give someone a loan. you can then keep data on all the loans you have and use machine learning to try find patterns in people who defaulted on those loans. you also could give loans to people that don't quite pass the basic maths algorithm to get more data and then from that use machine learning based on whether they defaulted on the loan or not.

olive willow
#

yh

desert oar
#

what rayzar said, but note that said "machine learning" algorithm could well be based on a statistical model

olive willow
#

but what's a model??

desert oar
#

a representation of the world

#

in mathematical terms

karmic geyser
#

but why male models?

olive willow
#

oohh ok, kinda understand it more now

#

thanks guys1

#

!

desert cradle
#

@karmic geyser Are you serious? I just told you like, a second ago.

#

😛

karmic geyser
#

lmfao

#

I have been saying that for a while and you are the first person to get it hahaha

desert cradle
#

fun fact, that bit was ad-libbed because Stiller forgot his next line

karmic geyser
#

You can't script that kinda stuff haha

#

stupid question here, if I have a numpy array and I want to fill it from the start with data until another array is empty how would I do it?

#

array from shape (211,2) into shape (1024,2)

stoic beacon
#

Stochastic gradient descent is a cost function right?

karmic geyser
#

"Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties" I think so.

#

I think it might be a part of minimizing your cost function?

#

I'm not too sure sorry.

desert oar
#

SGD is a convex optimization algorithm

#

just like newton's method, conjugate gradient descent, L-BFGS, etc
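
To make the split between "cost function" and "optimizer" concrete, here is a small sketch: `cost` is the thing being minimised, `sgd` is the procedure that minimises it. The data, learning rate, batch size and function names are all made up for illustration.

```python
import numpy as np

def cost(w, b, x, y):
    """Mean squared error: this is the cost function being minimised."""
    return np.mean((w * x + b - y) ** 2)

def sgd(x, y, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Stochastic gradient descent: the procedure that lowers the cost."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            err = w * x[idx] + b - y[idx]
            # Gradient of the batch's mean squared error w.r.t. w and b.
            w -= lr * 2 * np.mean(err * x[idx])
            b -= lr * 2 * np.mean(err)
    return w, b

x = np.linspace(-1, 1, 32)
y = 3 * x + 1          # made-up noiseless data
w, b = sgd(x, y)
```

The "stochastic" part is that each step only looks at a small random batch instead of the whole dataset.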

karmic geyser
#

@desert oar You know how I can put a small numpy array into a bigger array. pretty much its shape is (1024,2) for the big one and (<1024,2) for the second one.

#

I want to put it in right at the start.

#

Okay I got it sweet.

stoic beacon
#

Thanks guys

#

Sorry for the dumb questions

desert oar
#

@karmic geyser what order? you want to insert them rowwise?

karmic geyser
#

outdata[:view.shape[0]][:] = view

#

That seemed to work.

desert oar
#

try this instead

outdata[:view.shape[0], :] = view
karmic geyser
#

oh woops thanks I made a typo and didn't notice

desert oar
karmic geyser
#

I have almost done the wave file thing for you.

desert oar
#

its fine, i wrote it untested

#

i needed to learn how to use arrays in cython properly anyway

#

let me know if it works for you

#

or if it breaks 😉

karmic geyser
#

haha, i will give it a try.

desert oar
#

you should be able to use it with from speaker_filter import filter_signal

karmic geyser
#

@desert oar I sent you the files. It's 2:30 am so I will get some sleep then look into using the filter you did in cython

prime elm
#
# Contouring and Plant Detection

cv2.imwrite('saved_mask.jpg', mask)
reuploadedImage = 'saved_mask.jpg'
preBlobD = cv2.imread(reuploadedImage, 1)
# cv2.imshow("Mask", preBlobD)

font = cv2.FONT_HERSHEY_COMPLEX

start = 1
for i in range(48):

    lower_value = np.array([0, 0, 0])
    upper_value = np.array([180, 255, 255])


    blobDetection = cv2.inRange(preBlobD, lower_value, upper_value)

    # Contours detection
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    for cnt in contours:
        partLabel = "{}".format(start)

        area = cv2.contourArea(cnt)
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        x = approx.ravel()[0]
        y = approx.ravel()[1]

        if area > 80:
            cv2.drawContours(preBlobD, [approx], 0, (0, 0, 0), 1)

            if start >= 49:
                break
                #Try to find the Center Here

            if len(approx) > 2:
                cv2.putText(preBlobD, partLabel, (x, y), font, 1, (255, 255, 255))

                start = start + 1

cv2.imshow("Mask", preBlobD)
#

ops

#

So i have this code that i wrote

#

and right now it functions to create contours of the masks (of another image)

#

ideally i want to get a list of all the x plots and y plots of each contour vertex

#

i need this because i want to take the averages of it so that I can find the center point of each plant (the subject of the masks)

#

does anyone know how and if thats possible

olive willow
#

sorry bro i can't help you, ask maybe nix

prime elm
#

@earnest prawn

earnest prawn
#

why am i always listed as a reference for this i have no idea about data science

prime elm
#

actually my mentor for my internship just called and i have to go to a meeting 😞

#

cya ill be back with the same q :3

olive willow
#

@earnest prawn because you're a smart boy

earnest prawn
#

i am smart to the extent that i can use google (at least for this topic)

olive willow
#

then that makes two of us!

#

hahahah

prime elm
#

Does anyone have any suggestions

#

@olive willow @earnest prawn sry bout ping i try not to

olive willow
#

IDK i'm 14 dude still learning even the math hahahhaha

prime elm
#

ooo lolol

#

youngin

#

good for you

#

😄

olive willow
#

hhahahahahha

prime elm
#

ill be back i need to 3d print something. id appreciate if someone could help me with this. its a gate to moving on in my code

#

❀

olive willow
#

sure

earnest prawn
#

@prime elm dont get me wrong, you can ping me as much as you like as long as its not spam, however regarding this topic I'm very unlikely to be of much use to you

prime elm
#

@earnest prawn I get that. I was just told to. If you know someone who could help with this I'd appreciate it. I know how to find a centroid using pixel averages, but it doesn't solve what I'm aiming to do next, so I need help using the vertices of the polygon to find the center point

#

So I found the data is stored in the variable like this

#

 [[892 563]]

....

 [[896 577]]

 [[897 576]]]
#

with the left column denoting x coordinate

#

and right denoting y

#

how would i extract them and put them into two separate lists?
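One way to answer the question above, as a minimal sketch: OpenCV contour arrays such as `approx` normally have shape `(N, 1, 2)`, so slicing along the first axis gives the two coordinate lists directly. The small array below is a hand-typed stand-in built from values pasted in the chat, not real output of `cv2.approxPolyDP`.

```python
import numpy as np

# Stand-in for the array returned by cv2.approxPolyDP: shape (N, 1, 2).
approx = np.array([[[897, 568]], [[892, 563]], [[896, 577]], [[897, 576]]])

# Axis 0 selects the vertex, axis 1 always has length 1, axis 2 holds (x, y).
xs = approx[:, 0, 0].tolist()  # every x coordinate
ys = approx[:, 0, 1].tolist()  # every y coordinate

print(xs)  # [897, 892, 896, 897]
print(ys)  # [568, 563, 577, 576]
```

Slicing avoids an explicit for loop, though a loop that appends `point[0][0]` and `point[0][1]` would produce the same lists.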

prisma verge
#

so
well
i'll just say that keras is a very amazing framework

#

it's like python but for deep learning
it makes things very simple but yet flexible

#

i was able to create my own network thanks to keras even when being dumb and not knowing math at all

#

so yeah

#

that's an amazing lib

#

though i find data preprocessing quite hard

#

anyone got good libs to simplify that except opencv for images?

prime elm
#

@prisma verge is that data type i posted above an array?

#

is it a 2 column array or one column

prisma verge
#

have no idea, sorry

olive willow
#

@prime elm explain

#

it's a 3d array to be exact

#

because it has a list inside a list inside a list

#

so 3 lists

#

3d

#

you need one list less

#

@prime elm can you post the entire code or at least how you imported the data

prime elm
#

@olive willow sure

#
```python
# # # Start Up
#_______________________________________________________________________________________________________________________

### Library
import matplotlib.pyplot as plt
import numpy as np
import operator
import cv2

# Select Image
imageName = '04.24-13.26.jpg'
img = cv2.imread(imageName, 1)

# # # 1st Section of Code - Image Processing
#_______________________________________________________________________________________________________________________

# Grid Overlay
cv2.rectangle(img, (130, 25), (925, 340), (255, 255, 255), 1)

#Do the Processing
    # Color Filtering
        # HSV - Hue, Sat, Value
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lower_green = np.array([13, 0, 0])
upper_green = np.array([90, 225, 254])

mask = cv2.inRange(hsv, lower_green, upper_green)
res = cv2.bitwise_and(img, img, mask= mask)

    # Morphology
        #laplacian
laplacian = cv2.Laplacian(mask, cv2.CV_64F)

# Show the Image
# cv2.imshow('laplacian', laplacian)
# cv2.imshow('mask', mask)
# cv2.imshow('res', res)
cv2.imshow('Array of Plant Pots', img)
#

# # # 2nd Section of Code - Contouring and Blob Detection
#_______________________________________________________________________________________________________________________

# Contouring and Plant Detection

cv2.imwrite('saved_mask.jpg', mask)
reuploadedImage = 'saved_mask.jpg'
preBlobD = cv2.imread(reuploadedImage, 1)
# cv2.imshow("Mask", preBlobD)

font = cv2.FONT_HERSHEY_COMPLEX

start = 1
for i in range(48):

    lower_value = np.array([0, 0, 0])
    upper_value = np.array([180, 255, 255])


    blobDetection = cv2.inRange(preBlobD, lower_value, upper_value)

    # Contours detection
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    for cnt in contours:
        partLabel = "{}".format(start)

        area = cv2.contourArea(cnt)
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        x = approx.ravel()[0]
        y = approx.ravel()[1]

        if area > 80:
            cv2.drawContours(preBlobD, [approx], 0, (0, 0, 0), 1)

            if start >= 49:
                break
                #Try to find the Center Here

            if len(approx) > 2:
                cv2.putText(preBlobD, partLabel, (x, y), font, 1, (255, 255, 255))

                start = start + 1

cv2.imshow("Mask", preBlobD)```
#

# # # 3rd Section of Code - Pixel Count and Surface Area of Each Plant
#_______________________________________________________________________________________________________________________

# Pixel Counter 0.2
# Recursion for Reading Masks
iteration = 0
for i in range(8):
    # Moving Frame Over Horizontally
    small_x = 150 + (92*i) # x_value refers to moving the frame left to right as columns
    big_x = 250 + (92*i)

    for j in range(6):
        # Moving Frame Over Vertically
        small_y = 35 + (101*j) # upper y_value has smaller numerical value
        big_y = 151 + (101*j)# lower y_value has bigger numerical value

        # Attempting to Output Grid as Multiple Windows
        fileString = 'saved_mask{}.jpg'.format(iteration)
        cv2.imwrite(fileString, mask[small_y:big_y, small_x:big_x])
        gridOutput = cv2.imread(fileString, 1)
        gridPart = 'Grid Part #{}'.format(iteration+1)
        cv2.imshow(gridPart, gridOutput)

        iteration = iteration + 1

        x = []
        y = []
        px = 0
        gridImg = gridOutput.astype('float')
        gridImg = gridImg[:,:,0] # convert to 2D array
        row, col = gridImg.shape
        # r and c are used here so the outer loop variables i and j are not overwritten
        for r in range(row):
            for c in range(col):
                if gridImg[r, c] == 255:
                    x.append(c) # get x indices
                    y.append(r) # get y indices
                    px = px + 1

        # Measurements
        # px : mm^2 :: 1 : (110/93)^2
        # ** is exponentiation in Python; ^ is bitwise XOR
        print("\nGrid Part #%i\n----------------\nX values of identified pixels: %s\nY values of identified pixels: %s\n\nNumber of Pixel(s): %i\nSurface Area (1 px : ((110/93)^2) mm^2): %.1f" % (iteration, str(x), str(y), px, px * (110/93) ** 2))

# # Close and Exit
# cv2.waitKey(0)
# cv2.destroyAllWindows()
olive willow
#

what's the problem

prime elm
#
```python
### FIN
```
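The nested pixel-counting loops in section 3 can probably be replaced by a couple of NumPy calls. This is only a sketch: `grid` is a made-up 4x5 crop standing in for `mask[small_y:big_y, small_x:big_x]`, and the scale factor is the `(110/93)^2` figure from the comment in the pasted code.

```python
import numpy as np

# Hypothetical crop: a 2x2 block of white (255) pixels on a black background.
grid = np.zeros((4, 5), dtype=np.uint8)
grid[1:3, 2:4] = 255

# np.nonzero returns (row indices, column indices), i.e. (y values, x values).
ys, xs = np.nonzero(grid == 255)

px = xs.size                      # number of white pixels
mm2_per_px = (110 / 93) ** 2      # ** is exponentiation; ^ would be XOR
area_mm2 = px * mm2_per_px

# Pixel-average centroid as (x, y).
center = (float(xs.mean()), float(ys.mean()))

print(px, round(area_mm2, 2), center)
```

Working on the in-memory mask also sidesteps the JPEG round trip, which can change pixel values slightly so that they no longer compare equal to exactly 255.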
#

So

#

basically

#

I need the x and y coordinates of each contour

olive willow
#

whats a contour

prime elm
#

the contours are built in the for loop

#

a contour is the blob-isolating part of OpenCV

#

and it basically wraps around shapes

olive willow
#

sure so when a color changes you want the x and y of that

prime elm
#

not color

#

in order to draw a shape

olive willow
#

I know

prime elm
#

u need to draw lines to each vertex. and i want those vertices

#

bc using that i can find the center of the shape
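For finding the center from the vertices, two options are worth comparing. The plain average of the vertices is simple but gets pulled toward wherever vertices cluster. The area-weighted polygon centroid (shoelace formula) does not have that bias. The function names and the test square below are made up for illustration. OpenCV's `cv2.moments(cnt)` gives the same area centroid as `m10/m00` and `m01/m00`.

```python
import numpy as np

def vertex_mean(contour):
    """Average of the vertices; accepts OpenCV-style (N, 1, 2) arrays."""
    pts = np.asarray(contour, dtype=float).reshape(-1, 2)
    return pts.mean(axis=0)

def polygon_centroid(contour):
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    pts = np.asarray(contour, dtype=float).reshape(-1, 2)
    x, y = pts[:, 0], pts[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    cross = x * y_next - x_next * y
    area = cross.sum() / 2.0
    if area == 0:
        # Degenerate polygon (a point or a line): fall back to the vertex mean.
        return pts.mean(axis=0)
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return np.array([cx, cy])

# A 4x4 square with one extra vertex on the bottom edge.
square = np.array([[[0, 0]], [[2, 0]], [[4, 0]], [[4, 4]], [[0, 4]]])
print(vertex_mean(square))       # x = 2.0, y = 1.6 (dragged toward the extra vertex)
print(polygon_centroid(square))  # x = 2.0, y = 2.0 (true center of the square)
```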

olive willow
#

ooohh then I can't help you, I'm too noob for that

prime elm
#

lol thanks for tryin

olive willow
#

sure np sorry maybe nix

prime elm
#

oof. idk if any of the admins know, I hope i figure out this problem

olive willow
#

yh

prime elm
#

the data above is in an array and im not good with thinking like a programmer fully yet. so idk how to extract only the x values from that array

#

and only the y values

olive willow
#

with that I can help

#

I'm 14 so idk how to think either

#

so

#

so

#

so

prime elm
#

oo

olive willow
#

so you save the coordinates as an 2d numpy array?

prime elm
#

errr let me paste

#

nope thats for pixel counting

#

all sections of code work

#

that section is inaccurate

#

bc what it does is

olive willow
#

only not the dimension in the array?

#

because the code you've sent before was 3d

#

you are somewhere adding another layer I think

prime elm
#

the 3d array i think u said is the one im trying to extract info from

#

im not adding in the code

#

um look for the variable called "approx"

olive willow
#

yh

#

I know

prime elm
#

its in section 2

#

yuh

#

i pasted that variable to see what the data looks like

olive willow
#

approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)

prime elm
#

yuh

#

so ik the data is there

#

i just need to extract the x and y

olive willow
#

can you print it

prime elm
#

ye

#

[[[897 568]]

[[892 563]]

[[891 563]]

[[890 562]]

[[878 562]]

...

[[895 579]]

[[896 578]]

[[896 577]]

[[897 576]]]

olive willow
#

np.array(approx) maybe

#

but why is it an 3d array?

prime elm
#

so make a variable

```python
coord_array = np.array(approx)
```
#

i dont know

olive willow
#

yes

prime elm
#

its part of the library

olive willow
#

and then print it out

prime elm
#

aight

olive willow
#

lets see

prime elm
#

still the same

#

[[[897 568]]

[[892 563]]

[[891 563]]

[[890 562]]

#

...

#

etc

#

haha 😅

olive willow
#

do print(coord_array[0])

prime elm
#

kk

#

[[897 568]]

olive willow
#

and change it to, [0, 0]

prime elm
#

how would i do that

#

im bad at arrays

olive willow
#

just coord_array[0, 0]

prime elm
#

ooo thats what u mean

#

haha

olive willow
#

hhaha

prime elm
#

[897 568]

#

got rid of a []

olive willow
#

now [0, 0, 0]

prime elm
#

u think it a 2d array thats just in a 3d?

olive willow
#

cuz it's 3d

#

so add another layer

prime elm
#

897

olive willow
#

peel it like an onion boy

#

yeeessss

prime elm
#

so its just a 2d array as a 3d

olive willow
#

no look

prime elm
#

and all i got to do is make a for loop to pile it into a list

olive willow
#

there are 3 dimensions in your array

#

[[[ ]]]

prime elm
#

haha true. but only 2 sets of data

olive willow
#

those are 1d

prime elm
#

so one dimension is empty

#

oh

olive willow
#

no

#

[ 3d [ 2d [ 1d ] 2d ] 3d ]

#

your info is in the 1d

#

it's a vector

prime elm
#

a . a

#

oops didnt mean to do that

olive willow
#

with an x and y cord

#

hahah

prime elm
#

......
c . c
b . b
a . a

#

so its two pieces of info going back ^

#

like that, visually

olive willow
#

kinda

prime elm
#

but basically it is missing one "visual" dimension is what im saying

olive willow
#

kinda

prime elm
#

like i can hold one of the dimensions constant

olive willow
#

yes

prime elm
#

and use a for loop

olive willow
#

because nothings there

prime elm
#

yuh

#

and use a for loop to pull out the data

olive willow
#

yh
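The onion-peeling above can be written out as a short sketch. The middle axis is not empty, it just has length 1, which is why holding it at index 0 loses nothing. The three points are copied from the printed output earlier in the chat. `squeeze(axis=1)` is one way to drop that axis without a for loop.

```python
import numpy as np

coord_array = np.array([[[897, 568]], [[892, 563]], [[891, 563]]])

print(coord_array.shape)     # (3, 1, 2): 3 points, 1 wrapper, 2 coordinates
print(coord_array[0])        # [[897 568]]
print(coord_array[0, 0])     # [897 568]
print(coord_array[0, 0, 0])  # 897

# Drop the length-1 middle axis to get a plain (N, 2) table of points.
points = coord_array.squeeze(axis=1)
print(points.shape)          # (3, 2)
```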

prime elm
#

how does the len() function work on arrays?

olive willow
#

idk try it

prime elm
#

rip says 85

#

time to count

#

170 words

#

/2

#

85 lines

#

so it counts the groups, not the number of elements

olive willow
#

yh

#

but you need to do [0, 0]

#

remember

prime elm
#

hmmm

#

it says 2 lol

#

i dont think im getting what u mean 😄

olive willow
#

yh 2 numbers

prime elm
#

i get that

#

but like i guess i still dont know the angle to attack it

olive willow
#

[ 464 346 ]

#

those are 2 items

#

0 and 1

#

it's going by index

#

do str()

prime elm
#

hm?

olive willow
#

if you want the count

prime elm
#

len(str(coord_array))

olive willow
#

yh

prime elm
#

and []?

#

any?

olive willow
#

is it printing it out

prime elm
#

it says 9
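The three numbers seen above (85, 2 and 9) all make sense once it is clear what each call measures. As a sketch with a zero-filled stand-in of the same shape as the real array: `len()` only counts along the first axis, `.shape` gives every axis, `.size` gives the total element count, and `len(str(...))` counts characters in the printed text, so it is not a useful element count.

```python
import numpy as np

# Stand-in with the same shape as the contour array discussed above.
coord_array = np.zeros((85, 1, 2), dtype=np.int32)
coord_array[0, 0] = (897, 568)

print(len(coord_array))             # 85  -> length of the first axis only
print(len(coord_array[0, 0]))       # 2   -> the x and y of one point
print(coord_array.shape)            # (85, 1, 2)
print(coord_array.size)             # 170 -> total number of values
print(len(str(coord_array[0, 0])))  # 9   -> characters in the text "[897 568]"
```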

olive willow
#

idk dude sorry have to go

prime elm
#

npnp

olive willow
#

bye ask nix

prime elm
#

thanks for ur help

rancid gust
#

Hey, is there an easy way to remove the matplotlib axes while keeping the grid?

olive willow
#

I'm not sure but there has to be

sand reef
#

@rancid gust

olive willow
#

oohh yh that makes sense

rancid gust
#

@sand reef thanks!

sand reef
#

Np!
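The answer that was linked did not survive in this log, so the following is only one possible approach, not necessarily the one that was shared. The idea is to hide the spines and the tick marks and labels while leaving the ticks themselves in place, because grid lines are drawn at tick positions. Calling `ax.axis("off")` would not work here, since it hides the grid as well.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.grid(True)

# Hide the frame lines around the plot.
for spine in ax.spines.values():
    spine.set_visible(False)

# Zero-length tick marks and no tick labels; the tick locations (and grid) stay.
ax.tick_params(axis="both", length=0, labelbottom=False, labelleft=False)
```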

drowsy fulcrum
#

im trying to curve fit every point, anyone know how to do it?
essentially, i just want connect all the points up, but make the lines smoother
i could do this easily by hand, so it cant be very complicated
if it helps, the points will always increase in size

earnest prawn
#

you just put all the points in an x/y array and plt.plot(x,y) them

#

if you want a function which fits all the points I am sure you can find one with a lot of work

drowsy fulcrum
#

this is what i need

earnest prawn
#

this doesnt look like the graph fits the points

#

it looks like the points fit the graph

#

especially with the last one

unkempt helm
#

It's just polynomial interpolation

drowsy fulcrum
#

seems to be exactly what i need ty
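A minimal sketch of polynomial interpolation with NumPy, using made-up sample points: a polynomial of degree `n - 1` passes exactly through `n` points with distinct x values, and evaluating it on a dense grid gives the smooth curve.

```python
import numpy as np

# Hypothetical data points (increasing, as described above).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])

# Degree n-1 polynomial through all n points.
coeffs = np.polyfit(x, y, deg=len(x) - 1)

# Evaluate on a dense grid to get a smooth line for plotting.
x_smooth = np.linspace(x.min(), x.max(), 200)
y_smooth = np.polyval(coeffs, x_smooth)
```

With many points a single high-degree polynomial tends to oscillate between them. For data that only ever increases, a monotone spline such as `scipy.interpolate.PchipInterpolator` may be a safer choice, though that would add a SciPy dependency.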

knotty nexus
#

any tips for reading multiple .xlsx files in a folder with column headers that are slightly off?

#

Assuming that I'm not allowed to modify the source files,
I tried something like df = pd.read_excel(source location + filename,
usecols = lambda x: x.lower() in usecols_lower_list) but it doesn't seem to work

#

then concat it together

sand reef
#

so, are you limited to using pandas only? or are you willing to try other libraries?

#

because, you could try this, I hope it helps

#

@knotty nexus

knotty nexus
#

thanks @sand reef , I just skimmed the chapter, it doesn't really seem to contain specific information on handling files with different column headers, but it did give me the idea to just skip the header rows when reading in the files, then create header names afterwards. I think in my case the columns in the different files remain in the same order, so this would also work
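An alternative to skipping the header rows, sketched with in-memory frames standing in for what `pd.read_excel` would return: normalize the column names after reading, then concatenate. The column names and the `WANTED` list are invented for the example.

```python
import pandas as pd

WANTED = ["id", "name", "amount"]  # hypothetical canonical column names

def normalize_columns(df):
    """Lower-case and strip header text, then keep only the wanted columns."""
    out = df.copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    return out[WANTED]

# Stand-ins for the frames read from each .xlsx file, with slightly-off headers.
frames = [
    pd.DataFrame({"ID": [1], "Name ": ["a"], "Amount": [10.0]}),
    pd.DataFrame({"id": [2], "NAME": ["b"], " amount": [20.0]}),
]

combined = pd.concat([normalize_columns(f) for f in frames], ignore_index=True)
print(combined)
```

If the columns really are always in the same order, the approach described above also works: something like `pd.read_excel(path, header=None, skiprows=1, names=WANTED)` replaces the headers outright, assuming there is exactly one header row in each file.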

sand reef
#

np!

spice cargo
#

I wonder if anyone could suggest me something to start in reinforcement learning. I have covered mostly its mathematical part and yeah some basics like GYM etc to implement Q learning.
I am looking for something solid to start.
beginner here

sand reef
#

Well, I too have to begin reinforcement learning myself, I could myself use some pointers where to begin its mathematics from, got any leads for me? All I have done is the regular machine learning and deep learning courses from coursera.

spice cargo
#

This could be the best tutorial to start

#

It basically covers all the prerequisites required for RL
like policy gradients and actor-critic, all the basic information you need to start

#

Apart from that, as mentioned above, you can implement basic Q learning (you'll see it in the tutorial) using GYM, a toolkit for developing and comparing reinforcement learning algorithms.

#

I need someone to guide me from here...lol

sand reef
#

Tysm!
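For anyone following along, here is a rough sketch of tabular Q-learning. To keep it dependency-free it uses a tiny hand-written corridor environment instead of a Gym one, so `step`, `train` and all the hyperparameter values are assumptions made for illustration.

```python
import random

N_STATES = 5      # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = train()
# Greedy policy for the non-terminal states (1 means "step right").
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(policy)
```

The same update rule carries over to a Gym environment; only the `step` function and the state and action encoding would change.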

west sky
#

Does anybody know how to apply PCA to data with a large amount of features (170000+)? I am currently using sklearn but my computer crashes when I try to obtain a cumulative explained variance curve to determine an optimal number of components.
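One thing that may help, assuming there are far fewer samples than features: the explained variance curve only needs the singular values of the centered data, and a thin SVD never builds the features x features covariance matrix. The sketch below uses random data with invented sizes and plain NumPy rather than sklearn.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2000))  # hypothetical: 50 samples, 2000 features

# Center each feature, as PCA does.
Xc = X - X.mean(axis=0)

# Singular values only; at most min(n_samples, n_features) of them exist.
singular_values = np.linalg.svd(Xc, full_matrices=False, compute_uv=False)

# Variance explained by each component, then the cumulative curve.
explained = singular_values ** 2
cumulative = np.cumsum(explained) / explained.sum()

# Smallest number of components reaching 95% of the variance.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_components, cumulative[n_components - 1])
```

If the sample count is also large, sklearn's `IncrementalPCA` or `PCA(svd_solver="randomized")` with a capped `n_components` are the usual things to try, although the curve then only covers the components that were actually computed.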

lean ledge
#

No need to use a shallower CV based course to learn RL

#

Try Sutton's book, Berkeley's course and Spinning up RL

spice cargo
#

What exactly does Spinning up RL mean?

lean ledge
median siren
#

Hi all, is there anyone who has experience with XGBoost? I'm trying to train a model which has X-value as a feature, but for pretty much every method the error goes:

 self._features_count = X.shape[1]

IndexError: tuple index out of range

Now, this makes sense since it's a single feature. So when I try to reshape my X_train value (e.g. using .reshape(1,-1)), the following error occurs:

ValueError: setting an array element with a sequence.

Does anyone know what's wrong?
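A guess at what is going on, shown with plain NumPy and made-up numbers. A single feature needs shape `(n_samples, 1)`, which is `reshape(-1, 1)`. `reshape(1, -1)` instead produces one sample with many features. The second error usually means the array has object dtype with sequences inside it (for example a pandas column that holds lists), which cannot be converted to a float matrix. Whether that is the case here cannot be told from the message alone.

```python
import numpy as np

x = np.array([0.5, 1.2, 3.3, 4.1])  # one feature, four samples

as_column = x.reshape(-1, 1)  # shape (4, 1): four samples, one feature
as_row = x.reshape(1, -1)     # shape (1, 4): one sample, four features

# An object array holding sequences of different lengths.
ragged = np.array([[1.0, 2.0], [3.0]], dtype=object)

conversion_error = None
try:
    ragged.astype(float)
except ValueError as err:
    # NumPy refuses to build a numeric array from ragged sequences.
    conversion_error = str(err)

print(as_column.shape, as_row.shape, conversion_error)
```

Checking `X_train.dtype` and `type(X_train[0])` before reshaping should show whether the elements are plain numbers or sequences.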