#data-science-and-ml


visual sleet
#

The language you speak

#

Where are you from

cursive drift
#

uh, my native language, and russian

#

are u understanding for my bad english?

visual sleet
#

So do you speak Russian

cursive drift
#

yeah

visual sleet
#

Mm

cursive drift
#

what

visual sleet
#

Alright

#

Tell you what

#

Find something you like and pursue it

#

Learn it thoroughly, inside and out

cursive drift
#

i like code anything

#

but im interesting in AI

visual sleet
#

I’m not an entrepreneur or a freelancer nor have I made any money from anything online whether that be coding or something else so good luck 🫡

cursive drift
#

but i think i cant learn it for 9 month

slim bone
#

Hey fellas
I'm about to enter my 2nd year of CS - I have a couple of months before starting the year though. I'm contemplating what to do with this time and my top* choice at the moment is to start learning Pytorch (Specifically, due to academic courses I'll probably be taking later)
Throughout my first year I learned Discrete math, Calc1/2, and Linear1/2 (Unfortunately no probability/statistics yet)
I'm wondering whether or not this knowledge could be useful while learning, or will I need to wait a little while longer before I can utilize what I've learned in math?

Ty in advance to anyone who replies

civic elm
#

I realized that also expensive, with large datasets how much could be the cost when using aws or Google cloud services?

civic elm
slim bone
civic elm
#

I would prefer masters in data science because it's more practical but I don't know really

slim bone
#

I'm not sure a degree in Data Science is more practical than Deep Learning if I want to specialize in Deep Learning, you could be right though. As mentioned earlier I'm only starting my journey

I'd really like to emphasize however, that I'm asking about whether or not having my background in math could come useful in any significant way while learning Pytorch?* And not about my future academic ventures :)

civic elm
#

Yes it's useful because when you want to train models

slim bone
#

Could you elaborate how perhaps?

civic elm
#

you would then be able to understand papers math equation on the model you are using

slim bone
#

Oh. Will I necessarily be taught these though, through the common learning sources available?

#

I went through a few tutorials before and other than a very primitive explanation about Linear Transformations I didn't really notice anything "math-y"

#

Obviously the whole field is fundamentally very reliant on math. Perhaps I should rephrase:
How can I utilize my math background (Calc1/2, Linear1/2, Discrete) to learn Pytorch? (Assuming I can utilize it at all)

I think that's a relatively concise question

boreal gale
#

it really depends on what do you want out of pytorch..? like what are you hoping to do with it.
"learn pytorch" is maybe a little bit vague, do you have a concrete goal in mind?

civic elm
#

You mean In concrete examples? Like you would instantly know what a linear regression would do to your dataset

boreal gale
#

my message was a question for @slim bone in case you misunderstood.

slim bone
#

Machine Learning in general sounds really cool, too

#

I’d imagine there’s subsets for those disciplines as well, I’m sorry if my answers aren’t useful

civic elm
#

What I mean by practicality is that I see more job openings for a master's in data science than for a master's in DL

slim bone
#

Ah, but if I just wanted a job I’d just stick to Fullstack lol

boreal gale
#

right, to learn how to use pytorch requires some linear algebra.
to understand how models work properly you need a metric ton of linear algebra and calc

slim bone
slim bone
civic elm
#

I mean you can practice your education in the real world

slim bone
#

Ah perhaps I was unclear - I’m not looking to just utilize my knowledge. I’m hoping to utilize it about something I’m passionate about

boreal gale
slim bone
# boreal gale i assume you are based in the US? i don't know what is calc1/2 and linear1/2, an...

Ah, no I'm not based in the US at all
Calculus 1 and Calculus 2 are basically the courses that teach you about Limits, Derivatives, Integrals, Taylor series(es?), multivariable calculus, function series, etc'..
Linear Algebra 1/2 mostly teach you about the fundamentals of Linear Algebra, transforming a matrix into a diagonal one, checking if that's possible, Tensors, Bilinear forms, and the superset for diagonal matrices whose name I can't remember (The one that's built out of eigenvalues, with 1's across the secondary diagonal if that makes sense)

#

It's obviously a little hard to compress a years-worth of knowledge into a concise paragraph but I hope I managed to get the message across

civic elm
slim bone
#

Oh and I don't know if this small detail is relevant but I think I technically learned Real Analysis and not Calculus

boreal gale
slim bone
#

So, more proof-based I suppose.

slim bone
short path
#

Guys, I'm trying to install pandoc to turn rmarkdown into pdf but the .msi file isn't running in my windows

#

do you know how could I install it?

slim bone
#

Kind of dip my toes in the water, if that makes sense

short path
#

And Jupyter says I need to install it

slim bone
#

Because as mentioned earlier - the tutorials I've found don't really dive into the math-side of things

boreal gale
short path
#

@slim bone you're Brazilian, right?

slim bone
#

Also err, what's NN?

slim bone
short path
#

Oh, my bad

#

I thought you were from my country

slim bone
#

All good haha

civic elm
slim bone
slim bone
boreal gale
tall tulip
#

My dataset contains 21k values approx, the dataset values are recorded after every 5-min, but there are 764 values which are not in 5-min interval, So, I try to resample the non 5-min dates to 5-min interval using resample . I have tried the following code:

```py
df_raw['Time'] = pd.to_datetime(df_raw['Time'])
df_raw.drop_duplicates(subset='Time', inplace=True)
df_raw.set_index('Time', inplace=True)
df_freq = df_raw.resample('10T').ffill()
# df_freq = df_raw.resample('5min').interpolate(method='polynomial', order=1)
# df_asfreq = df.asfreq('5T')
# df_resampled = df_raw.resample('5T', on='Time').asfreq()
# df_freq = df_raw.resample('5min').sum()
```
The issues are that:
It makes the dataset span 12 months, whereas my dataset only contains data from Feb to April
It reduces my dataset from 21k values to 9k values
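
One possible cause of both symptoms, sketched under assumptions: if the `Time` strings are day-first (for example `27/04/2023 20:55`), parsing them without `dayfirst=True` can swap day and month for ambiguous dates and stretch the index across the whole year, and a `'10T'` bin is 10 minutes wide, so it cannot produce a 5-minute series. The sample values and the `Temp` column below are hypothetical, not the real dataset.

```python
import pandas as pd

# Hypothetical sample in day-first format, with one duplicate and one large gap.
raw = pd.DataFrame({
    "Time": [
        "27/04/2023 20:50",
        "27/04/2023 20:55",
        "27/04/2023 20:55",
        "27/04/2023 22:35",
        "27/04/2023 22:40",
    ],
    "Temp": [20.0, 20.5, 20.5, 22.0, 22.1],
})

# dayfirst=True keeps a string like 01/02/2023 as 1 February, not 2 January.
raw["Time"] = pd.to_datetime(raw["Time"], dayfirst=True)

# Drop repeated timestamps, then index by time in sorted order.
df = raw.drop_duplicates(subset="Time").set_index("Time").sort_index()

# Put every reading on a 5-minute grid and forward-fill the gaps.
df_5min = df.resample("5min").ffill()
print(df_5min.head())
```

Forward-filling repeats the last seen reading across a gap; `interpolate()` is an alternative if a smooth fill is preferred.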
boreal gale
slim bone
#

Great. Better get to it then - Too much future planning can be harmful at times I suppose
Thanks a lot you two! @boreal gale@civic elm

tall tulip
#

@boreal gale Okay let me make a sample data for you

short path
#

Jupyter isn't getting the characters in a table right. Is there a way to allow it to show the right characters?

#

It should be like that:

#

In RStudio ^^

boreal gale
short path
#

I just loaded the table from a book I'm using to study

#

So just these two lines ^^

#

the problem is that it should show "médio", but the "é" comes out garbled

#

Since in RStudio it works fine, the problem must be with the encoding in jupyter
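
A symptom like this usually appears when UTF-8 bytes are decoded with a single-byte encoding such as Latin-1. A minimal Python sketch of that effect follows; it is only an illustration, since the code in this thread is R and the file's real encoding is an assumption here.

```python
text = "médio"

# Encode as UTF-8, then (wrongly) decode the same bytes as Latin-1.
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# Reversing the wrong step recovers the original string.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

In R, `read.table` takes a `fileEncoding` argument, so stating the file's encoding explicitly there may be worth a try.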

#

@boreal gale I guess you won't be able to open it there because you would need an R kernel

#

but do you have an idea on how to change the jupyter encoding?

#

to allow more characters

tall tulip
#

@boreal gale can I upload sample dataset here?

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

boreal gale
#

post a dump (e.g. dataframe.to_dict()) there if possible

boreal gale
tall tulip
#

I just make a csv file with 270 values, it's not that big

boreal gale
short path
#

Let me try to do the table manually then

#

to see if that works

boreal gale
# boreal gale seems fine to me 🤷 that's the default

just in case

```py
import io

import pandas as pd

# Parse a small in-memory table containing accented characters
pd.read_csv(io.StringIO("""
l'accent aigu (acute accent) – é
l'accent grave (grave accent) – à, è, ù
la cédille (cedilla) – ç
l'accent circonflexe (circumflex) – â, ê, î, ô, û
l'accent tréma (trema) – ë, ï, ü
"""), sep='–')
```
short path
#

@boreal gale it shows this

#

but now I don't know if the problem is really with the character or with the way I used the function

boreal gale
#

ah! sorry i was confused, you are using R throughout and not python/pandas.

short path
#

yeah

#

but I'm trying to adapt it

#

the code

boreal gale
#

it might be an issue with whatever dataframe library you are using, i will have to try it myself in a bit

short path
#

Ok. thank you

#

that's the data:

#

N;estado_civil;grau_instrucao;n_filhos;salario;idade_anos;idade_meses;reg_procedencia
1;solteiro;ensino fundamental;;4,00;26;3;interior
2;casado;ensino fundamental;1;4,56;32;10;capital
3;casado;ensino fundamental;2;5,25;36;5;capital
4;solteiro;ensino médio;;5,73;20;10;outra
5;solteiro;ensino fundamental;;6,26;40;7;outra
6;casado;ensino fundamental;0;6,66;28;0;interior
7;solteiro;ensino fundamental;;6,86;41;0;interior
8;solteiro;ensino fundamental;;7,39;43;4;capital
9;casado;ensino médio;1;7,59;34;10;capital
10;solteiro;ensino médio;;7,44;23;6;outra
11;casado;ensino médio;2;8,12;33;6;interior
12;solteiro;ensino fundamental;;8,46;27;11;capital
13;solteiro;ensino médio;;8,74;37;5;outra
14;casado;ensino fundamental;3;8,95;44;2;outra
15;casado;ensino médio;0;9,13;30;5;interior
16;solteiro;ensino médio;;9,35;38;8;outra
17;casado;ensino médio;1;9,77;31;7;capital
18;casado;ensino fundamental;2;9,80;39;7;outra
19;solteiro;superior;;10,53;25;8;interior
20;solteiro;ensino médio;;10,76;37;4;interior
21;casado;ensino médio;1;11,06;30;9;outra
22;solteiro;ensino médio;;11,59;34;2;capital
23;solteiro;ensino fundamental;;12,00;41;0;outra
24;casado;superior;0;12,79;26;1;outra
25;casado;ensino médio;2;13,23;32;5;interior
26;casado;ensino médio;2;13,60;35;0;outra
27;solteiro;ensino fundamental;;13,85;46;7;outra
28;casado;ensino médio;0;14,69;29;8;interior
29;casado;ensino médio;5;14,71;40;6;interior
30;casado;ensino médio;2;15,99;35;10;capital
31;solteiro;superior;;16,22;31;5;outra
32;casado;ensino médio;1;16,61;36;4;interior
33;casado;superior;3;17,26;43;7;capital
34;solteiro;superior;;18,75;33;7;capital
35;casado;ensino médio;2;19,40;48;11;capital
36;casado;superior;3;23,30;42;2;interior

#

and the codeline:

#

tab2_1<-read.table("tabela2_1.csv", dec=",", sep=";",h=T)

boreal gale
#

perfect

#

it probably is due to the R dataframe library, doing it in pandas seems fine to me

short path
#

because it works fine in RStudio

boreal gale
short path
#

with the data you send me ^^

boreal gale
#
wot <- ' â, ê, î, ô, û'
wot

how about this

short path
#

let me see

tall tulip
boreal gale
short path
#

oooh

#

Now I'm puzzled

boreal gale
#

okay perfect

tall tulip
#

@boreal gale #data-science-and-ml message here is the link to my question.
And
21055    0 days 00:10:00
21056    0 days 00:00:00
21063    0 days 01:40:00
21109    0 days 00:10:00
21115    0 days 00:10:00
These are the time steps which are not 5-min

boreal gale
short path
#

ô

#

it worked

boreal gale
tall tulip
#

Sure

short path
#

@boreal gale do you know if there's a way to do something like that?

#

for cases in which I'm not doing the "read.table" to create the dataset

boreal gale
#

did i get your requirement right?

tall tulip
#

If you check the index 21055 and 21056 There time are duplicated I want to remove duplicates, and if you check the index 21062 and 21063 the time difference is above 1 hour I want to make all the times are 5-min interval

#

@boreal gale see the difference

boreal gale
# tall tulip @boreal gale see the difference

i get "If you check the index 21055 and 21056 There time are duplicated I want to remove duplicates"
and i don't get "and if you check the index 21062 and 21063 the time difference is above 1 hour I want to make all the times are 5-min interval"
it's not clear enough what you want yet.
do you want to "insert" more entries in-between, at 5 minute interval, using the previous seen temperature readings?

boreal gale
short path
boreal gale
tall tulip
#

"do you want to "insert" more entries in-between, at 5 minute interval, using the previous seen temperature readings?"
I just want to make my complete dataset to 5-min interval, there are 764 interval which are not 5-min.

boreal gale
short path
# boreal gale yep

Why did you stop using R so much? Is Python that much more effective? I'm curious because I'm at the beginning of my major in Statistics and I wanted to learn Python, but all my professors use R

#

Don't you want to use R for the data visualization at least?

short path
tall tulip
#

Okay let me give you an example: if you see the time column at index 21062 and 21063, at index 21062 the the time is 27/04/2023 20:55 but when you see the time at index 21063 it jumps two hour 27/04/2023 22:35. It needs to be 27/04/2023 21:00 not 27/04/2023 22:35.

#

@boreal gale

boreal gale
# short path Why did you stop using R so much? Is Python that much more effective? I'm curiou...

"Why did you stop using R so much?"
because i stopped being a statistician.

"I'm curious because I'm at the beginning of my major in Statistics and I wanted to learn Python, but all my professors use R"
the statistics support in R is much better (not sure if that's still the case today, python has come a long way, especially in time series modelling, which was the one thing that was mega awesome in R back in the day). i would stick to R if you are more productive in it. but for job prospects.. learning python is probably just an eventuality, might as well get started now 😛

"Don't you want to use R for the data visualization at least?"
not really, matplotlib, seaborn, bokeh/plotly are plenty for my needs.

boreal gale
tall tulip
#

I want this output:

```
27/04/2023 21:00
27/04/2023 21:05
27/04/2023 21:10
27/04/2023 21:15
```
short path
boreal gale
boreal gale
boreal gale
short path
#

Do you come across data science books written in R? @boreal gale

boreal gale
#

not really! (i don't read much 😦 )

short path
#

Do you prefer to learn in a top-down approach?

#

by getting projects first and then learning what you have to know to complete it

#

practicing a lot

boreal gale
#

probably yes. i am the kind of person who sometimes disregard docs and actually look at source code of libraries i am using..

boreal gale
short path
#

these are my two fears for not learning R

boreal gale
short path
#

Do you use kaggle?

#

I want to get to be able to do some projects there

#

and be kinda competitive

#

and Python seems to be way more effective than R for that

boreal gale
#

ah it's important to note i am no longer a data scientist 😂
(but to answer your question, i tried, but i had better things to do, so actually kinda no.)

short path
#

What do you work with now?

boreal gale
#

this is going offtopic 😛 catch me in one of the off topic channel 😉

short path
#

Ok. I'm curious just because there's been some talk about the field of data science risking to be way smaller

#

because of the new tools

#

and a possible market saturation

tall tulip
short path
cosmic dew
#

Hi guys, I started to study Python, is there any website that you recommend me to practice?

tidal bough
#

codewars, I guess

oblique quarry
#

Guys I posted a question in the python help channel would be much appreciated if someone would take the time to take a look at it

misty flint
#

huggingface has a great API

#

for ML

#

saves a lot of time for stuff

late ruin
#

Hi, I hope someone could help me out. I have data in a file named [MESTP].JZF. I've searched around and found nothing about this sort of file extension, but there is data in there. Would love some help: how could I read that kind of file using pandas/pickle to tabulate it in Jupyter?

left tartan
worn plank
#

im tryna help my gf with her hw and she's tasked with finding the center and radius of a circle, and im following this video but it gets to one point that confuses both of us. why would (x^2+4x+4)+(y^2+10y+25) simplify to (x+2)^2+(y+5)^2? where does the 4x and 10y go??

small wedge
#

Same with the y's

teal mesa
#

The polynomial x^2+4x+4 has a double root at x = -2, which means you can write x^2+4x+4=(x+2)^2
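
The 4x and 10y terms do not disappear; they are the cross terms that come out when each square is expanded with (a+b)^2 = a^2 + 2ab + b^2:

```latex
\begin{aligned}
(x+2)^2 &= x^2 + 2\cdot 2\,x + 2^2 = x^2 + 4x + 4,\\
(y+5)^2 &= y^2 + 2\cdot 5\,y + 5^2 = y^2 + 10y + 25.
\end{aligned}
```

Comparing with the standard form (x-h)^2 + (y-k)^2 = r^2 gives the center (-2, -5). The radius is the square root of the right-hand side, which is not quoted in the question.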

cosmic dew
quiet seal
#

you know what would be nifty

#

I've summarized categories of stuff with Pandas and drawn spider charts with plotly before

#

but it never occurred to me to have each category be a slice of a pie chart, and build out those slices from layers

#

so if you're doing a CMM with ratings from non-existent to 1-5, you have 5 colors (or desaturate as you go out from greatest to least maturity, and color each grouping of capabilities) and then just grayed out at the outside

#

but I don't think that kind of charting exists in python yet?

#

it's one hell of a lot of information in one place. 3 dimensional: group (pie slice), categories within the group (bands), portion of the group in each category (thickness of the bands)

tidal bough
#

In your actual example, do you really have as many different functions as you have elements?

#

Because if so, not sure anything can be done about it.

#

Well, these functions are all just python code, so one way or another they all need to be executed on the corresponding elements and that'll take the bulk of the time. Not sure if anything can be done.

ripe forge
#

Vectorization assumes the same operation on multiple data points. If all you are doing are running completely independent functions on independent values, then you can't really do much this way. Options would be either just running it all in parallel (and it would depend on actual task whether you get speedups or not) or rewrite your functions somehow first if that's an option. Make them all the same or make them more efficient.
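
A small NumPy sketch of that distinction. The arrays and the choice of `np.sqrt`/`np.log` are hypothetical stand-ins, and the grouping trick at the end only helps under the assumption that many elements share a few distinct functions.

```python
import numpy as np

values = np.arange(1.0, 6.0)  # [1., 2., 3., 4., 5.]

# Same operation on every element: one vectorized call, no Python loop.
all_sqrt = np.sqrt(values)

# A different function per element: each call runs in Python, one by one.
funcs = [np.sqrt, np.log, np.sqrt, np.log, np.sqrt]
looped = np.array([fn(v) for fn, v in zip(funcs, values)])

# If only a few distinct functions exist, group the elements by function
# and vectorize within each group.
func_ids = np.array([0, 1, 0, 1, 0])  # 0 -> sqrt, 1 -> log
grouped = np.empty_like(values)
for fid, fn in enumerate([np.sqrt, np.log]):
    mask = func_ids == fid
    grouped[mask] = fn(values[mask])
```

With as many distinct functions as elements, the grouped version degenerates back to the loop, which matches the point made above.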

obsidian otter
#

yoo

#

i want to build an ai, if i manage to do everything where do i train that ting ?

rapid temple
#

what is the default model used for OpenAI and ChatOpenAI classes when no model is specified?

#

is it text-davinci-003 and gpt-3.5-turbo by default?

young granite
tepid tartan
#

It is better to become data analytics and then slowly transition to data science?

quiet seal
serene scaffold
lapis sequoia
#

What's the best way to get started with AI's?

#

Any recommended tutorials?

#

Recommend programs or websites

#

I want to make a discord bot that talks in chat like me

tepid tartan
serene scaffold
serene scaffold
#

unless you're okay with the responses not sounding natural.

lapis sequoia
#

Hmm, what do you recommend?

serene scaffold
#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold
#

I would follow along with a book or course, and then try developing basic models, even if you don't have a use case in mind for them.

lapis sequoia
#

I want to do something AI related..

rugged rapids
rugged rapids
#

have fun

serene scaffold
left tartan
# lapis sequoia Any recommended tutorials?

This one has been highly recommended as an intro in Python: https://cs50.harvard.edu/ai/2020/

left tartan
serene scaffold
tepid tartan
flint grail
#

@tepid tartan

#

i'm currently 13 and i've been doing Python for a while

#

is this a good age to start or do most start younger

#

i've done lua when i was 9

#

when did you start doing computer science

mild dirge
#

I never know if these comments are trolls or..

flint grail
#

why would this be a troll

tepid tartan
flint grail
#

please shut up.

lapis sequoia
flint grail
#

such as, work with cyber security

left tartan
flint grail
left tartan
flint grail
#

dont they have

#

exams on it?

left tartan
#

Only if you want to do them

flint grail
#

what?

#

so they can pass a class without taking an exam?

tepid tartan
#

If you enjoy it, you can pursue it

flint grail
#

to achieve what I stated previously

left tartan
#

I suggest just visiting the page, they list several ways to take it and has a lot of information.

flint grail
tepid tartan
#

Python? Or…

flint grail
#

currently im doing python

#

i got to go finish this course

#

so i'll see you later guys

cursive drift
#

hey, where i can learn finetuning?

oblique quarry
#

comes with practice tbh like you get at some point a feeling when looking at the data to know how aggressive you can go with the learning rate decay and whatnot

cursive drift
oblique quarry
#

I only read about residual learning (Microsoft's ResNet) as part of my cv project. If you wanna know more about this kinda stuff you should ask more credible members of the data-science channel such as @past meteor. But I can link you some resources I used https://arxiv.org/abs/1512.03385

lyric olive
#

I have started working as healthcare AI ML engineer, any good resources for AI in Pathology & Radiology like Monai.io

barren fable
#

I have a question in machine learning. A lot of people have told me that when you split your data in your code, it's better to split it into training and testing data because validation is not that important. Is that true?

steady spindle
#

Hello, I started a new project "self assisting A.I.", need a guidance so that I can complete this project, So if any of you want to join. 😇

hasty mountain
#

Guys, I'm trying to use Genetic Algorithm optimization together with Stochastic Gradient Descent to optimize my VAE, which has already reached its plateau. I'm only a bit confused about whether I should use Genetic Algorithms to find a model that will provide a lower loss for each batch (Stochastic approach?) or one that will provide a lower loss for an entire epoch (Global approach, I guess?)

I know that the stochastic approach has some good properties for gradient descent, but would that also be valid for genetic algorithms? So far, I've tested the stochastic approach and it seems that it may cause the epoch loss to decrease on some epochs and increase on others...

oblique quarry
mild dirge
lapis sequoia
oblique quarry
#

is there somebody with more experience who can review my code
```py
import numpy as np
import scipy.signal


class Convolution:
    def __init__(self, inputSize, kernelSize):
        # inputSize is (batch, height, width); one kernel per batch entry, as in the original
        self.weight = np.random.randn(inputSize[0], kernelSize, kernelSize) / kernelSize**2
        # the output width must come from inputSize[2], not inputSize[0]
        self.outputShape = (inputSize[0], inputSize[1] - kernelSize + 1, inputSize[2] - kernelSize + 1)
        self.bias = np.random.randn(*self.outputShape)
        self.kernelSize = kernelSize

    def image(self, images):
        for batch in range(len(images)):
            yield images[batch], batch

    def forward(self, images):  # performing cross-correlation
        self.input = images
        # work on a copy so repeated calls do not accumulate into self.bias
        output = self.bias.copy()
        for image, b in self.image(images):
            for y in range(self.outputShape[1]):
                for x in range(self.outputShape[2]):
                    output[b, y, x] += np.sum(image[y:y+self.kernelSize, x:x+self.kernelSize] * self.weight[b])
        return output

    def backward(self, gradient):
        self.dbias = gradient
        self.dweight = np.zeros_like(self.weight).astype(np.float64)
        dInput = np.zeros_like(self.input).astype(np.float64)
        _, h, w = gradient.shape
        for grad, batch in self.image(gradient):
            for y in range(h):
                for x in range(w):
                    self.dweight[batch] += self.input[batch, y:y+self.kernelSize, x:x+self.kernelSize] * grad[y, x]
            # input gradient: full convolution of the output gradient with the unflipped kernel
            dInput[batch] = scipy.signal.convolve2d(grad, self.weight[batch], mode="full")
        return dInput


conv = Convolution((2, 4, 4), 2)
bilder = np.random.randn(2, 4, 4)
out = conv.forward(bilder)
dInput = conv.backward(out)
```

#

(I'll vectorize the code but before i do that i want to know if everything checks out)

spark inlet
#
Traceback (most recent call last):
  File "main.py", line 1, in <module>
    import cv2
  File "/home/runner/some-school-project/venv/lib/python3.10/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/runner/some-school-project/venv/lib/python3.10/site-packages/cv2/__init__.py", line 111, in bootstrap
    load_first_config(['config.py'], True)
  File "/home/runner/some-school-project/venv/lib/python3.10/site-packages/cv2/__init__.py", line 109, in load_first_config
    raise ImportError('OpenCV loader: missing configuration file: {}. Check OpenCV installation.'.format(fnames))
ImportError: OpenCV loader: missing configuration file: ['config.py']. Check OpenCV installation.
#

code:

```py
import cv2

img = cv2.imread("img.jpg")
cv2.imshow("output image", img)

cv2.waitKey(0)  # waitKey with a capital K; cv2.waitkey does not exist
```
#

im new to python am i doing something wrong?

pseudo spire
#

@spark inlet what are you trying to do?

spark inlet
soft dock
indigo wing
#

People, I need immediate help. I am working on an OMDENA AI project on the baseline regression model. I am a 3rd year B.Tech CSE DS student. I have only worked with the datasets given in our classes, with questions on them. I have no idea what is happening here. I can't understand how I can contribute to the project. If I can't make some contribution then I will be kicked out. I want to contribute but it feels so chaotic, though not beyond my understanding. I don't understand what I can contribute to the project. Please advise.

#

Please help. Me noob. The scale of participation of this project and their achievements, knowledge is far greater than mine

young granite
#

speak with ur project partners and find a way maybe u just dont get one point and in an discussion u get back on track @indigo wing

hoary sphinx
#

Is anyone here doing job as data scientist?

cunning falcon
#

I am not sure if this is the place to ask this question. I have been reading about statistical learning with Python.

https://hastie.su.domains/ISLP/ISLP_website.pdf

It seems that a row in a matrix is called a “feature” vector. A column in the matrix is a vector. Is there a special vector name for a column, like there is for a row?

pseudo spire
humble portal
#

I'm getting this issue on the server I use. It started happening a few days ago and I can still use CUDA, but it is both slow and outputs a warning to console every time I spawn a new process.

The warning being Can't initialize NVML. If I try nvcc --version I get Failed to initialize NVML: Driver/library version mismatch

The server is running Ubuntu and I do not have super user access. The server administrator is refusing to do anything about it until it becomes a major issue rather than just an annoyance, and I am unable to use conda rather than pip due to the standards of the paper I am aiming on submitting to in the end.

young granite
misty flint
#

other than that if you need MLE resources, i highly recommend madewithml.com

#

if you are into traditional books, i highly recommend "Machine Learning Engineering in Action by Ben Wilson" (more than worth every penny i spent)

vernal acorn
#

Hey guys! So I want to get into the python GUI space with a project that is essentially an annotation checker for a TTS dataset. The idea being to

A) Load long audio and csv file(s) indicating the text and the timestamp points where a NN thinks it was said, and
B) Allow the user to scrub through the audio, and modify/confirm the timestamp points.

My problem is finding a decent audio playback system, since seemingly no GUI libraries support comprehensive (not to mention not 2000s-esque) UI elements for audio processing. Are there any libraries or examples out there that can handle something like this in python?

#

Essentially, I want to create a dumbed-down version of what can be seen here with Prodigy:

slim bone
#

Hey fellas, I’m trying to get into ML but I feel like I’m drowning in an endless sea of documentation and terminologies while not really going anywhere.

I have some academic background and some of the math nailed down already (mainly calculus and linear algebra). Can someone please recommend me a book that teaches some PyTorch? Many thanks in advance

mild dirge
#

Might be a new version out

jovial elm
#

anyone had trouble with enabling GPU for tensorflow? i've followed a step by step tutorial and they have my exact graphics card (1050 Ti) but 0 luck getting it to work

serene scaffold
jovial elm
serene scaffold
# jovial elm

remember to always show text as text, not a screenshot.

how do you know that using tensorflow with the gpu isn't working? I'm not saying it isn't, but like what code are you running, and what does it do that is different from what you expect?

jovial elm
#

Not sure why it says 12.2 in the top right, i installed 11.8 after noticing tensorflow specifically asks for 11.8 instead of 12.2
I even uninstalled every 12.2 version so that's weird.

and I run this code to determine if it has successfully detected the GPU

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

and so far it outputs [] only, which is an empty array meaning it hasn't found any GPU.

#

i also installed anaconda, and used the conda environment to handle the installation of tensorflow and all that for me, tested it with that environment and it was the exact same result

#

previously i used my default python installation, modules installed with pip

serene scaffold
#

I'm not seeing any solutions that don't involve anaconda, but I don't use anaconda, so I'll have to get off here.

jovial elm
#

I'll just try restarting my PC for now and see if that'll do anything.

#

spoiler: it did not

#

And it still says 12.2 when running nvidia-smi so i think i'll make this my focus to try and fix because i have no idea what else lol

misty flint
vernal acorn
#

I do wonder if something like plotly and dash with react wrappers would do the trick, but it might already complicate the issue

misty flint
#

that might not work

#

if you need to wrap react components, i recommend this one https://reflex.dev/

#

reminiscent of streamlit

vernal acorn
#

Ill try that out

#

Thanks!

misty flint
#

np. gl

brittle storm
#

Any one know how to use PyBluez?

barren fable
#

ML (KNN) - Finding The Best n_neighbor

```py
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# xTrain, yTrain, xValidation, yValidation, xTest, yTest come from an earlier split (not shown)

# Grid search for the best n_neighbors using 5-fold cross-validation on the training set
# (the separate validation split is not used at this step)
param_grid = {'n_neighbors': range(1, 11)}
knn_model = KNeighborsClassifier()
grid_search = GridSearchCV(knn_model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(xTrain, yTrain)

# Get the value of n_neighbors with the best mean cross-validation accuracy
best_n_neighbors = grid_search.best_params_['n_neighbors']

print("Best n_neighbors:", best_n_neighbors)

# Train the model with the best hyperparameter on the combined training and validation data
final_knn_model = KNeighborsClassifier(n_neighbors=best_n_neighbors)
final_knn_model.fit(np.vstack((xTrain, xValidation)), np.concatenate((yTrain, yValidation)))

# Test the final model on the test data
yPrediction = final_knn_model.predict(xTest)

# Calculate accuracy and display it as a percentage
accuracy = np.mean(yPrediction == yTest)
print("Accuracy: {:.2%}".format(accuracy))

# You can also use classification_report to get detailed metrics
print(classification_report(yTest, yPrediction))
```

I took this code from chatgpt to find the best n_neighbor, but the problem is that I tested it on many different codes, and every time it gave me the result that the best n_neighbor was 1, and its accuracy was 81%. When I tested n_neighbor manually, it gave me these results.

(n_neighbor: accuracy)
1: 82%
2: 80%
3: 82%
4: 81%
5: 83%
6: 81%
7: 83%
8: 82%
9: 83%
10: 82%

5, 7 and 9 are better so why the code didn't print anyone of them?
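
A likely explanation, offered as a sketch and not a diagnosis: `GridSearchCV` picks `n_neighbors` by mean cross-validated accuracy on the training data, while a manual loop usually scores each value on the test set, so the two rankings need not agree, and one-point differences like 82% vs 83% are often within noise. The synthetic dataset below is an assumption standing in for the real one.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the dataset from the question).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

ks = list(range(1, 11))
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": ks}, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# Mean cross-validated accuracy per k: best_params_ is chosen from these numbers.
cv_scores = {
    params["n_neighbors"]: score
    for params, score in zip(grid.cv_results_["params"], grid.cv_results_["mean_test_score"])
}

# Held-out test accuracy per k: what a manual loop typically reports.
test_scores = {
    k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    for k in ks
}

for k in ks:
    print(f"k={k:2d}  cv={cv_scores[k]:.3f}  test={test_scores[k]:.3f}")
print("picked by grid search:", grid.best_params_["n_neighbors"])
```

Printing both columns side by side for the real data would show whether k=1 really wins under cross-validation or only ties with other values.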

crimson quiver
#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

wind cosmos
#

Suggest a good comprehensive data science, AI ML course available online for paid/free ( yt, udemy, Coursera anything works)
Preferably cheaper

lapis sequoia
wind cosmos
#

Data science/ Data analytics
I'll be pursuing a applied statistics and data science/analytics degree from next year and I want to learn all that ahead of time and prepare my projects pre hand to stay ahead of my college

humble shore
#

yay

#

there is an ai and ml field

#

: )

#

any one uses keras, tf or sklearn?

lapis sequoia
#

can someone explain to me the SPPF layer in YOLO?

rugged mist
#

what's the best way to do torch.tensor([model(torch.tensor([t])) for t in T])
(T is a 1d tensor)

tidal bough
#

torch.tensor([t]) is weird; you can probably do, like,

torch.tensor([model(t) for t in T[:,None]]) 

to make all the t have a shape of (1,) already.

rugged mist
#

any way to avoid the list comp?

jovial elm
# jovial elm spoiler: it did not

update: nvidia-smi showing 12.2 was normal, so that wasn't the problem.
went ahead and installed conda & tensorflow in WSL ubuntu environment and bravo hurray it works .. !

>>> e = tf.config.list_physical_devices('GPU')[0]
>>> e
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
tidal bough
tidal bough
#

besides that, though, not much advice except "vectorize your model so you don't have to do this".

spark inlet
#

im using replit online ide

pseudo spire
#

use local then

spark inlet
rugged mist
# tidal bough besides that, though, not much advice except "vectorize your model so you don't ...

okay im probably very deep into an x-y problem so just ignore that previous question

im trying to achieve something similar to this video about solving an ODE with an nn

the nn's shape is like 1->32->1 and its being used to approximate an R->R function such that it obeys NN'(t) = f(NN(t), t)
to do this it makes a loss function L(NN) = sum(NN'(t) - f(NN(t), t) for t in T)

as far as i understand, normally the flow is like: for each (t, xtrue) pair, loss is L(model(t), xtrue)
but here its not like the loss takes in a single input-output pair and returns the loss for that sample, instead the loss takes in all input-output pairs and returns the loss for the whole set

tidal bough
#

sum(NN'(t) - f(NN(t), t) for t in T)
I suspect you want to have, like, a square inside the sum here, otherwise it's very easy to get 0 or negative loss without being anywhere close to optimality.
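Written out, the squared version of that loss might look like this (a sketch; $\mathrm{NN}_\theta$ stands for the network with weights $\theta$, notation that is not in the original message):

```latex
L(\theta) = \sum_{t \in T} \left( \mathrm{NN}_\theta'(t) - f\big(\mathrm{NN}_\theta(t),\, t\big) \right)^2
```

Each term is non-negative, so the loss can only reach 0 when the residual vanishes at every sampled t.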

rugged mist
rugged mist
civic elm
#

Hi, this is from the book Hands-On Machine Learning... I am confused why would models be biased and how?

jovial elm
#

(On windows this time, instead of WSL)

#

I ran

conda install python=3.8
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install "tensorflow<2.11"

and that worked

misty flint
scarlet iron
#

hey, i'm starting out with AI and data science and want to build a project. fine tuning LLMs has really interested me, although i'm not sure if that's something one can do as a beginner or how tough or complex it is

agile jackal
#

gpt4all what training mod is better, I want unhinged results that might be accurate
I have gpt4all and mpt

inland nebula
#

Hey guys, I've been learning Python as of late and just got an idea for a motion/limb tracking project; what Python libraries and related technologies should I use to make something like that? I know OpenCV may be one option, but I wanna know all my options before I start working on it

#

Also, I've got another idea regarding GPT where I feed the chatbot some data on a specific topic which can then be used for knowledge purposes (e.g. lawbotpro.com), but I don't really know where and how to start working on it. YT tutorials only cover surface-level knowledge. What should I do regarding this, and where should I start?

oblique quarry
#

this vid is prob going to help you

inland nebula
#

Thanks a lot, gonna look into this

cedar turret
#

I have a general data analysis question, but I'm not sure where the best place to ask it is.

#

I got a dataset that contains null values in some of the rows. Under what circumstances is it ok to keep those entries in my dataset? Should it be best practices to drop all those rows, or should I look at it as kind of a case by case basis?

serene scaffold
cedar turret
#

It contains ride information for a bike-share company. So it contains columns such as membership type, ride start time, end time, start station name, end station name and latitude and longitude for both the start and end of the ride.

#

The null values are isolated to location information, mostly the station name.

#

The goal is to identify trends in how different membership types use the service.

#

So when I'm looking at how long or what time of day or month different member types ride, I don't really need the location information. So when I do that kind of analysis, am I fine to keep those rows if they have null location information? Or am I best dropping it all together?

serene scaffold
#

sounds like you don't need that information for most of the analysis you might do.

#

but you can probably learn things about the unnamed locations given the lat/long and some external resource.

cedar turret
#

Yeah that was my intention. I want to take the lat and long and use geopandas and a shapefile to basically find out where and which neighborhoods rides most often take place.

serene scaffold
#

sounds like you know what to do.

cedar turret
#

And obviously if I have a null lat and long value, well I can't use that for that analysis but when I'm looking to determine how many riders use the service in January, a lack of lat and long shouldn't mean that I need to throw out that entry right?
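In code, that per-analysis filtering might look like the sketch below. The records and field names are hypothetical, not the actual dataset columns:

```python
# Hypothetical ride records; the field names are made up for illustration.
rides = [
    {"member_type": "member", "month": 1, "start_lat": 41.88, "start_lng": -87.63},
    {"member_type": "casual", "month": 1, "start_lat": None, "start_lng": None},
    {"member_type": "member", "month": 2, "start_lat": 41.90, "start_lng": -87.62},
]

# Time-based question: every ride counts, with or without location data.
january_rides = [r for r in rides if r["month"] == 1]

# Location-based question: drop only the rows missing the fields this step needs.
located_rides = [
    r for r in rides
    if r["start_lat"] is not None and r["start_lng"] is not None
]

print(len(january_rides), len(located_rides))
```

With pandas the same idea would be a `dropna(subset=[...])` applied only in the location analysis, rather than once on the whole dataset.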

serene scaffold
#

exactly

cedar turret
#

Ok great. I just needed someone else to bounce this idea off of. Make sure I wasn't completely crazy.

serene scaffold
#

you probably are, just in different ways.

hasty mountain
#

Guys, is there an optimized way of taking the mean and standard deviation of a dataset without having to load it all into the memory RAM?

I have a dataset of 60,000 100x100 images, and I want to resize them into 64x64 and take the mean and standard deviation for a VAE, but I'm afraid this may cause my RAM to become a george foreman grill.

I was thinking about loading the first 5,000 images, taking their mean and standard deviation. Then, load the second 5,000 images, mean and standard deviation and so on. In the end, I would sum all those means and standard deviations and take the average.

However, I did some quick calculations with presumed numbers (like 3, 7, 5...) and discovered that this approach won't get me exactly the correct mean it would provide if I took the mean and standard deviation of all 60,000 images at once. Is there a correct approach that will also save me memory and computation time?

agile cobalt
#

it should get you a mean close enough for all practical purposes?

#
hasty mountain
#

Thanks! I'll take a look!

#

Hm... The first post seems closer to what I want, the mean and standard deviation of the whole dataset considering each pixel value. The second seems to be more focused on mean and standard deviation of the channels. The first post seems to, in a nutshell, try the same approach as I said. Taking the mean of a number of samples (batches), and, in the end, taking the mean of those mean samples.

That approach doesn't really provide the exact numbers for the complete mean, though:

Total mean: (1 + 6 + 7 + 3 + 4)/5 = 21/5 = 4.2

Partial Mean: (1 + 6 + 7)/3 + (3 + 4)/2 = 14/3 + 7/2 = (28/6 + 21/6) = 49/6 ----> Total mean would be = (49/6)/2 = 49/12 = 4.083

I suppose this difference could be discarded, then? Unless I did something wrong in my calculations...
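The gap seems to come from averaging the batch means without weighting them by batch size. A minimal sketch with the same five numbers (the variable names are only for illustration):

```python
# Batches of unequal size, using the numbers from the message above.
batches = [[1, 6, 7], [3, 4]]

total_count = sum(len(b) for b in batches)

# Weight each batch mean by the number of samples it covers.
weighted_mean = sum(len(b) * (sum(b) / len(b)) for b in batches) / total_count

# Unweighted average of the batch means, which is what produced 4.083.
naive_mean = sum(sum(b) / len(b) for b in batches) / len(batches)

print(weighted_mean, naive_mean)
```

With equal-size batches the two agree for the mean, but averaging per-batch standard deviations is still not exact.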

past meteor
#

Meaning, you can express them as a running total of something. For the mean it's obvious how to do this, for the stdev a bit less so but you're one quick google search away 🙂 => Welford's algorithm
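A minimal sketch of Welford's algorithm in plain Python, assuming the values arrive one at a time (the function name `welford` is just for illustration):

```python
import math

def welford(values):
    """Return (mean, population std) in a single pass over the values."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("need at least one value")
    return mean, math.sqrt(m2 / count)

mean, std = welford(iter([1, 6, 7, 3, 4]))
print(mean, std)
```

Only three numbers are kept in memory, so the input can be a generator that yields pixel values from images loaded one at a time.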

fallen dagger
#

Has anyone here coded a deep learning/ML library from scratch? If so can I see it I'm trying to write my own and want to see other people's approaches to it

agile cobalt
# fallen dagger Has anyone here coded a deep learning/ML library from scratch? If so can I see i...

take a look at https://course.fast.ai - particularly the second part
that is definitely not something you should write your own of though, or at least, limit your code to high-level stuff while building on top of something like PyTorch (which is what they do there, though it does explain most operations and implement them in python before switching over to pytorch's versions)

barren fable
#

Is there anyone who knows about scatter plot interpretation and linear regression, dropping some columns that are not important, etc...?

fallen dagger
agile cobalt
#

just keep in mind that the performance of a ML library written in pure python would be many hundreds of thousands of times worse than something like PyTorch

#

even if it were written in C you would still likely be looking at hundreds of times worse performance, between complicated optimizations and GPU support
...mainly the latter

fallen dagger
#

that's fine it's a learning experience foremost

#

maybe I'll try to optimize it as well and replicate it in C++ later

tidal bough
# hasty mountain Guys, is there an optimized way of taking the mean and standard deviation of a d...

Mean can be computed in an online way, only considering an element at a time:

def mean_online(it):
    it = iter(it)  # accept any iterable, not only an iterator
    cur_mean = next(it)
    cnt = 1
    for el in it:
        # currently we have sum(lst[:cnt])/cnt, and we want sum(lst[:cnt+1])/(cnt+1)
        # so we want cur_mean = (cur_mean*cnt + el)/(cnt+1)
        # which can be rearranged a bit to get:
        mul = 1/(cnt+1)
        cur_mean = cur_mean*(1-mul) + el*mul
        cnt += 1
    return cur_mean

And for std... compute the mean square, too, then subtract the squared mean, then take the square root.

#

Now, if you want to also do it quickly... probably the best idea would be to rewrite mean_online a bit so that it works on blocks of K elements, instead of 1 element at a time.

tidal bough
# tidal bough Now, if you want to also do it quickly... probably the best idea would be to rew...

yup, this seems to work for me:

from typing import Iterator

import numpy as np

def mean_std_blocks(it: Iterator[np.ndarray], ddof: int = 0) -> tuple[float, float]:
    cur_mean = 0.0
    cur_meansq = 0.0
    cnt = 0
    for block in it:
        # count every element, so a block can be 1-D or a stack of images
        k = block.size
        cur_mean = cnt / (cnt + k) * cur_mean + 1 / (cnt + k) * block.sum()
        cur_meansq = cnt / (cnt + k) * cur_meansq + 1 / (cnt + k) * (block**2).sum()
        cnt += k
    # std = sqrt(E[x^2] - E[x]^2)
    std = np.sqrt(cur_meansq - cur_mean**2)
    if ddof != 0:
        std *= np.sqrt(cnt / (cnt - ddof))
    return cur_mean, std
#

So you just need to load your dataset in blocks small enough to comfortably fit into memory, and feed the iterator of blocks through a function like that.

hasty mountain
#

Well...I just discovered that if I resize my 100x100 images to 64x64, the mean and standard deviation won't change that much (~0.02 more or less)

#

But then... could that also be valid if I resize my 100x100 images to 200x200?

civic elm
tidal bough
#

(another way of thinking about this is that duplicating points (like going from arr to np.concatenate([arr, arr])) changes neither mean nor std - resizing isn't quite the same, since it does interpolation, but it shouldn't be far from it. Though I'm not sure how to state it formally.)
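A quick check of the duplication claim with the standard library, using made-up numbers. It holds for the population std (ddof = 0); the sample std shifts a little:

```python
import statistics

arr = [1, 6, 7, 3, 4]
doubled = arr + arr  # every point appears twice

mean_gap = abs(statistics.fmean(arr) - statistics.fmean(doubled))
pstd_gap = abs(statistics.pstdev(arr) - statistics.pstdev(doubled))

# The sample std (ddof = 1) does move, because n - 1 is not doubled along with n.
sample_gap = abs(statistics.stdev(arr) - statistics.stdev(doubled))

print(mean_gap, pstd_gap, sample_gap)
```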

hasty mountain
tidal bough
#

Yeah, that's the "duplicating points doesn't change moments" thing, but in general resizing involves linear or nonlinear interpolation over some grid.

hasty mountain
#

Yeah, I suppose it may be possible that some modes of resizing might mess up with the statistics. But usually I'm just going for the classic mode (which is nearest neighbors, I think?)

#

Well... in that case... Maybe I could resize my 100x100 images into... 4x4 and take their mean and std?

lapis sequoia
#

what do you do at work when you are waiting for some model training?

tidal bough
hasty mountain
civic elm
#

they use numpy though, but that's opensource and written in c/c++ I think?

coral field
#

does anyone know any model/ website i can use to find the best font style for an image?

spare briar
spare briar
#

loss = mean squared error reconstruction term + KL divergence

#

the mean squared error term is derived from the pixel-wise gaussian likelihood

#

have you read this? https://arxiv.org/abs/1312.6114

hasty mountain
spare briar
#

No this isn't what the VAE does

hasty mountain
spare briar
#

it generates latent gaussians then maps them with a neural network to denormalized pixels

hasty mountain
spare briar
#

The pixels are modeled as Gaussians centered around the true pixel value (which is like adding gaussian noise)

#

The latent gaussians are the KL loss. The pixel-wise reconstruction is the MSE loss

#

assume that each pixel has a true value, but there is some noise from sampling

#

we model the noise as a gaussian

hasty mountain
spare briar
#

so when we evaluate a particular pixel we do e^{-(x - true value)^2/(2\sigma^2)}

hasty mountain
#

I've never had a VAE working on RGB images when using MSE Loss. Only on grayscale images.

spare briar
#

if you take the log that gives you the reconstruction loss term which is a gaussian log likelihood over pixels
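Spelled out for one pixel, assuming a fixed noise variance $\sigma^2$ (a sketch of the standard Gaussian log-density, with $\hat{x}$ standing for the decoder output):

```latex
\log \mathcal{N}\left(x \mid \hat{x}, \sigma^2\right)
  = -\frac{(x - \hat{x})^2}{2\sigma^2} - \frac{1}{2}\log\left(2\pi\sigma^2\right)
```

With $\sigma$ held constant the second term does not depend on the network, so maximizing this over all pixels amounts to minimizing the squared error scaled by $1/(2\sigma^2)$.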

#

Something is wrong with your implementation

spare briar
#
hasty mountain
#

The gaussian log likelihood is indeed over the output values...but not over the pixels, but instead it considers the values as parameters of the distribution

spare briar
#

ok this is just a very shitty old fashioned vae implementation

hasty mountain
#

But I do find it strange that most papers still consider MSE. MSE never worked for me

spare briar
#

look at the paper i linked just above

hasty mountain
#

But ok, I'll take a look

#

I plan on making a paper on VAEs, so it'll be useful

spare briar
#
#

its code by a meta engineer to teach VAEs

#

not to implement a state of the art high quality image generation vae

tidal bough
#

It's a code by a Meta Engineer, though
ah yes, it is well known that Meta engineers only write good, quality code

spare briar
#

I highly recommend that you read the VAE paper (autoencoding variational bayes) closely

#

It seems like you are misunderstanding how the VAE works/how the loss is derived

#

Then when you go to implement it follow the NVAE or VQVAE papers for all of the modern tricks to make the images look nice

hasty mountain
#

Ok then. But then... should the dataset be normalized in a specific way? Or should the Decoder have some specific activation function?

#

I've been using a dataset scaled to be within range [-1, 1], and my VAE only worked with sigmoid activation at the decoder + GLLLoss

spare briar
#

If the input images are scaled, the generated images will be scaled

#

Again you are working based on that very barebones VAE implementation, which is similar to the original paper and doesn't include anything learned since

lapis sequoia
#

Hey there! I am currently pursuing a degree in bioinformatics..I was looking for someone in the same field or in the data science domain in order to participate with..to solve an extremely challenging and interesting problem statement available on Kaggle..would love if anyone would want to collaborate..looking forward to the same!!! Do DM if u r an enthusiast too!

hasty mountain
spare briar
#

This question is missing the forest for the trees

hasty mountain
#

Or maybe remove the final activation function altogether?

hasty mountain
#

Hm... I didn't quite understand why use MSE instead of GLLLoss at all... But I do like the possibility of using Feedforward Layers for the VAE in an effective way.
I also didn't get the "codebook" thing. Would it be like...a second Encoder? Or simply...a "book" of optimizable parameters?

verbal oyster
hasty mountain
#

I'll take a look at the vanilla VQ-VAE paper...

spare briar
humble canyon
verbal oyster
#

Yo can I be part of the group

hasty mountain
#

I may need to review Lilian Weng's blog, too

spare briar
#

yeah her content is good too

#

(I mean the chapters on latent variable inference)

keen kettle
#

Hi guys ... all my ML friends I need a bit of advice from you all.

I am working with my professor to conduct research in the fraud detection domain. I've recently made a hybrid ensemble model (using ensemble learn with LR, DT, SGD and NB) and written a paper on the same. It is in review by my professor , and I hope to send it out to journals shortly.

The next aim for our research is to develop a novel machine learning model for credit card fraud detection, any insights on how I should go about implementing the model?

keen kettle
lapis sequoia
#

is this a good channel for data visualization questions

#

data vis in this context would be computer graphics based, not like say R based

boreal gale
lapis sequoia
#

don't have any yet but when I do I will

lapis sequoia
hasty mountain
civic elm
#

keras is awesome just saying

#

pytorch vs keras?

#

what does one offer that is exclusive from the other?

hasty mountain
somber hamlet
#

Hello, does someone know a matplotlib colormap that render well both on white and black background?

sleek harbor
somber hamlet
#

Another open question I have is which color to choose for the axis labels

sleek harbor
somber hamlet
sleek harbor
young granite
sleek harbor
# somber hamlet I see, I only took `cmap` in it's most basic form, a map of colors. Can you just...

If u want less vibrant, calm-ish colors, try these.. but I don't think that's what you want..

import matplotlib.pyplot as plt

def dark_style():
    from cycler import cycler
    plt.style.use(["dark_background", "bmh"])
    plt.rcParams["axes.facecolor"] = "#23272e"
    plt.rcParams["figure.facecolor"] = "#23272e"
    plt.rcParams["axes.prop_cycle"] = cycler(
        "color",
        [
            "#1c90d4",
            "#ad0026",
            "#530fff",
            "#429900",
            "#d55e00",
            "#ff47ac",
            "#42baff",
            "#009e73",
            "#fff133",
            "#0072b2",
        ],
    )

dark_style()
young granite
# somber hamlet I see, I only took `cmap` in it's most basic form, a map of colors. Can you just...

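for custom color_ramp:

import cmasher as cmr  # assumed source of take_cmap_colors
import matplotlib.pyplot as plt
import numpy as np
from colour import Color  # assumed: the `colour` package
from matplotlib.colors import LinearSegmentedColormap

rgb_list = [
    # (r, g, b) tuples with values 0-255 go here
]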

def rgb_to_hex(r, g, b):
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

hex_list = []
for i in rgb_list:
    hex_list.append(rgb_to_hex(i[0], i[1], i[2]))

def make_Ramp(ramp_colors): 
    color_ramp = LinearSegmentedColormap.from_list( 'my_list', [ Color( c1 ).rgb for c1 in ramp_colors ] )
    plt.figure( figsize = (15,3))
    plt.imshow( [list(np.arange(0, len( ramp_colors ) , 0.1)) ] , interpolation='nearest', origin='lower', cmap= color_ramp )
    plt.xticks([])
    plt.yticks([])
    return color_ramp

custom_ramp = make_Ramp(hex_list)

rgb_tuple = cmr.take_cmap_colors(custom_ramp, "color_ticks", return_fmt='hex')
somber hamlet
#

thanks for the provided cmap, will try it. FTR here is the native colormap, I'll compare to it

#

It's quite good actually! Probably good enough, thanks!

#

Will try to fiddle a bit around

young granite
somber hamlet
#

Ah, I was searching with the wrong term, fig.set_edgecolor is supposed to work

#

no that's not it, it's the edges of the objects drawn

somber hamlet
somber hamlet
#

is it called the frameon?

boreal gale
#

try ax.spines['left'].set_color('red')

somber hamlet
#

yeay, success. ax.spines[:].set_color(mpl.colors.to_rgba("#FF6600")), thanks! research hell

sleek harbor
odd meteor
#

To whom it may interest.

keen kettle
#

In the meantime, if you've any other suggestions I'd highly appreciate it 😄

coral field
#

how can i incorporate huggingface models into tensorflow for transfer learning? I have google's ViT from huggingface imported, and i want to add another dense layer to identify classes

tulip barn
#

Dunno if this is the right place for it but I found this super cool upcoming AI class

humble shore
arctic wedgeBOT
#

6. Do not post unapproved advertising.

noble plover
#

Hello, can someone help me with a csv file? I am trying to read the file using pandas.

csv_file = 'recensioni.csv'
df = pd.read_csv(csv_file)

and I get this error pandas.errors.ParserError: Error tokenizing data. C error: Expected 12 fields in line 3, saw 13.
My third line of the .csv file is this one:

"Consiglio vivamente!", "È fantastico, lo adoro!",4,,Alessia,,8481001046362,,,,,
What's causing the error is the comma between the quotes. It makes the parser think the row has one more column: the comma is treated as a delimiter when it is not one. I found on the internet that you just have to put the comma inside quotes (just like this -> "This is not, delimited") but it doesn't seem to work. Does anyone have any idea?
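One possible cause, assuming the space after the first comma really is in the file: the second field then starts with a space instead of a quote, so the quoting is not recognized. A sketch with the standard `csv` module and the line quoted above:

```python
import csv
import io

# The third line from the message above, including the space after the first comma.
line = '"Consiglio vivamente!", "È fantastico, lo adoro!",4,,Alessia,,8481001046362,,,,,\n'

# Default parsing: the second field starts with a space, so its quote is not
# treated as an opening quote and the comma inside it splits the field.
default_row = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True ignores whitespace right after a delimiter, so the
# quote opens the field as intended.
fixed_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(len(default_row), len(fixed_row))
```

`pd.read_csv` accepts the same `skipinitialspace=True` keyword, so it may be worth trying there before editing the file.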

serene scaffold
noble plover
#

something changed

#

I get the same error but a new information has been added to the error:
Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

serene scaffold
#

everything changed when the fire nation attacked.

#

oh, great.

left tartan
#

that ignores any "extra" trailing commas

#

because: it reads the header first, then only looks for those columns.

noble plover
left tartan
#

You don't need to use StringIO, I just did that so I could provide a standalone example

civic elm
#

I finally got my cat/not a cat binary classifier woot!

serene scaffold
left tartan
#

You'd just do ```py

header = pd.read_csv(filename, nrows=1).columns.tolist()
pd.read_csv(filename, usecols=header)```

noble plover
civic elm
#

5 weeks!

noble plover
#

okay now it's not giving errors

#

but it's splitting where it finds the comma inside the quotes

#

the value inside rating should have been concatenated in the body. And the value inside review_date should've been inside rating

#

It parsed the quotes instead of the comma

tepid tartan
#

The best way to start data science is focus on Linear Math first right?

left tartan
# noble plover okay I'll try

Try to modify my example to reproduce your problem. That's the best way to get help: a reproducible example that someone can work from.

noble plover
#

I have two fields in many rows that contain a comma inside quotes

left tartan
noble plover
#

I'll try, thank you

tired elk
#

hey everyone, I'm trying to implement this linear regression model for stock prices, but instead of a line I want to plot a more complex curve that is shaped like this data. Anyone have any suggestions? Also please let me know if you think that's a bad idea, because I'm a complete beginner. Thanks

desert oar
mild dirge
#

Like not only predicting the price change, but also the std of the price change

civic elm
#

Draw a line using 2 vectors

#

My statistics book would tell me. You have not really explored the data set

desert oar
# mild dirge Maybe draw a line with a confidence interval or something?

the chart to me is basically showing 0 linear relationship, so the best fit line is horizontal. it's also showing a very particular pattern of heteroskedasticity: conditional variance is strongly and monotonically related to volatility... which it had damn well better be because that is usually how volatility is defined.

#

oh that's volume not volatility

#

lol well it's still flat

floral tangle
#

Anyone know why the first debug print shows my joined string correctly, but the ValueError ends up tearing the string apart?

My call:

pos_tags = ['PRON', 'VERB', 'PUNCT']
is_valid = check_sentence(' '.join(pos_tags), grammar)

def check_sentence(sentence, grammar):
    print(f"try sentence: '{sentence}'")
    parser = nltk.ChartParser(grammar)
    try:
        for tree in parser.parse(sentence):
            print("Matching syntax tree:", tree)
            return True
        print("No match found for the defined grammar.")
        return None
    except ValueError as e:
        print("Parsing error:", e)
        return False

Output:
try sentence: 'PRON VERB PUNCT'
Parsing error: Grammar does not cover some of the input words: "'P', 'R', 'O', 'N', ' ', 'V', 'E', 'R', 'B', ' ', 'P', 'U', 'N', 'C', 'T'".
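The print shows the whole string because nothing has split it yet. The characters in the error suggest the parser walks over its input token by token, and iterating over a string yields single characters. A small sketch of the difference in plain Python (passing `sentence.split()`, or `pos_tags` itself, to `parser.parse` is the likely fix):

```python
sentence = " ".join(["PRON", "VERB", "PUNCT"])

# Iterating over a string yields single characters, which is what the
# error message lists.
as_characters = list(sentence)

# Splitting on whitespace restores the three tags as separate tokens.
as_tokens = sentence.split()

print(as_characters[:4], as_tokens)
```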

young granite
bronze flint
#

My prof gave me this project to work on, but i assume it was posted quite a bit ago on his website
The project itself contains 10gb of checkpoints and annotations and data which in my opinion is a lot because it was specifically formatted for Mask RCNN or Faster RCNN

This was released when Yolo V4 wasnt even out so i assume Faster RCNN is old now that YOLOV8 exists and it's format

I assume Faster RCNN is still a thing thats used or should i try and find other project that i could use YOLO on

#

I guess for the sake of practise i could do Faster RCNN

vestal widget
#

Can someone explain for me the difference between dataset and language model?

#

I read some articles about both of them but I'm still not really clear about it

trail rune
# vestal widget Can someone explain for me the difference between dataset and language model?

Dataset is basically raw data (can be text, video etc) that is used to train machine learning models or any model at all. While language models are machine learning/ statistical models that are trained on text data and are able to generate texts based on the dataset they're trained on.
In the context of large language models, the dataset are text gathered from various sources (books, web pages etc) and the language model is trained on these texts. It's able to find relationships between the words in the text and can learn to generate more texts based on that dataset its been trained on.

vestal widget
zealous badger
#

hey um are there any datasets which dont have models trained on them with more than 90% accuracy?

#

it's to do with an assignment, we have to find these "challenging" datasets and try to improve on these scores. i just can't seem to find any. i assume all tabular datasets have models that can have scores >90%

mild dirge
#

Well if they have a low accuracy and they are popular, it will be hard to improve on them by yourself

#

But there are pretty old datasets like ImageNet that have barely 91% accuracy even after existing for so long

pine escarp
#

Can you guys recommend me beginner machine learning projects?

zealous badger
#

not feasible for an individual

desert oar
desert oar
buoyant mural
#

def scrapingMobilePhones():
    url="https://www.flipkart.com/search?q=mobiles under 50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
    r=requests.get(url)
    soup=BeautifulSoup(r.text,"html.parser")
    while True:
        np=soup.find("a",class_="1LKTO3").get("href")
        cnp="https://www.flipkart.com"+np
        return cnp
        #url=cnp
        #r=requests.get(url)
        #soup=BeautifulSoup(r.text,"html.parser")

print(scrapingMobilePhones())

#

help me with it

#

import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrapingMobilePhones():
    url="https://www.flipkart.com/search?q=mobiles under 50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
    r=requests.get(url)
    soup=BeautifulSoup(r.text,"html.parser")
    while True:
        np=soup.find("a",class_="1LKTO3").get("href")
        cnp="https://www.flipkart.com"+np
        return cnp
        #url=cnp
        #r=requests.get(url)
        #soup=BeautifulSoup(r.text,"html.parser")

print(scrapingMobilePhones()) # why am I not able to get the href link from the class 1LKTO3

#

I am trying web scraping and I am complete beginner in it

#

fix the problem

#

I fixed it no worries

cinder schooner
#

Hello, can anyone review this part of my resume ? I'm feeling like it may be too long or have too much details but i don't know how to rewrite it. I know maybe it should be in career-discussion but it need AI expertise to answer so i thought i'd write here

cinder schooner
#

i know but what should i remove

past meteor
#

Most of it

unborn adder
#

1st point, 3rd point, 5th point

past meteor
#

Most companies have a prescreen by HR, for them that's just unreadable

cinder schooner
unborn adder
#

remove those

#

keep just 2nd and 4th

#

those others aren't saying much

cinder schooner
#

how about this version

#

Development of a Real-time ultrasound image analysis system for an Nvidia Jetson Nano edge device.

Refactored the codebase, profiled and analyzed the model's architecture, hardware optimization and deployment to diagnose latency and overheating issues.
Developed a lighter, extensible deep multi-task model that competes with the baseline segmentation model while decreasing inference time and halving loading time.
past meteor
#

Why add NVIDIA Jetson nano

#

Less is more, take it out. It detracts from the rest

unborn adder
#

I think you are trying to impress them by fancy words..that won't happen haha, they know what they are looking for, list them 3,4 bullet points you think you can speak upon and that's it...they are not going to be "impressed" by your CV, they actually don't care..trust me

cinder schooner
#

i'm really trying to tell what i worked on. So let me explain and you can tell me how to say this.
First I refactored the code base. Then i analyzed the model and profiled it with cuda tools looking for why is it taking too much inference time. I found that since its a segmentation model and then theres calculation of the bounding boxes from the segmentation masks theres a big bottleneck on the bounding box calculation part. I then changed that with another multi task model that does classification and bounding box regression. Since its a direct bounding box regression theres no need to the post processing thus no bottlenecks. and since its only one model we didnt have to load 2 models on the edge device.
And since its a multitask model its extensible so we're adding other auxilary tasks like calculating surface and angles etc

past meteor
#

I'm on mobile otherwise I'd have written my take on it.

To me it's a bit like academic writing, no superfluous language. Why? They're reading tons of CVs and they won't read it in detail.

Secondly, it's about being able to impress HR, technical people and business people with one document. Too much of what you have doesn't speak to HR or business. Even if it's a DS they might not be in your niche

cinder schooner
#

that's why i asked, i'm really convinced it's long and complex, but I also really want someone who works on computer vision and reads it after HR to know what I did. I really worked hard on this, and since i'm a recent graduate i don't have much experience on the computer vision side (most of it is pure software engineering), so i'm focusing on this

past meteor
#

I don't think detailed explanations of what are ever relevant

unborn adder
#

.
that's it..nothing else...

#

I haven't failed any interview or job proposal I ever did, I aced everything so far..plus..pls don't do the cold email approach, that's the worst, hit them up on LinkedIn or something...or if you have to do the email approach, mention their company and don't sound like you cold emailed 10 other companies too..this should work

cinder schooner
past meteor
#

Development of system for real time fetal ultrasound image analysis:

  • Development of a neural network based system to classify and locate fetuses.

  • Improved time-to-prediction and performance of the previous model.

  • Deployed the model on specialised hardware in the field.

cinder schooner
#

and now I really know onnx and played a lot with it

past meteor
#

Notice how I use time to prediction instead of inference time

cinder schooner
#

Thank you really much guys

past meteor
#

That term doesn't even exist but everyone can understand what it means

unborn adder
cinder schooner
#

sorry if you're a woman, i always say guys

past meteor
#

HR and business don't care about onnx

unborn adder
#

that's right

past meteor
#

That's 2/3 of your audience

unborn adder
#

I hope you did some research on their company, the technologies they are working on, and who they are working with...because if you mention anything unrelated to that...they are not interested

past meteor
#

If I'm hiring I'm more interested in seeing if you can solve the problem. Why? Maybe inference on edge isn't even important, maybe you can just call the model from some API

#

ONNX is a very specific niche I think

unborn adder
#

unless you KNOW for sure they are working with ONNX, don't mention it

past meteor
#

(We do inference on edge but we just use a container and NVIDIA machines, they run it in exactly the same way our desktops run the models so no ONNX or tflite)

cinder schooner
#

yeah i didn't mention onnx but thought about that hardware optimization part

unborn adder
#

maybe they are working on something else and hate that...for example, maybe someone is using PyTorch and hates TensorFlow, and if you mention that you are good with TensorFlow and played a lot with it, firstly they won't care, and secondly they might reject you just for that...it's complicated

cinder schooner
#

that's what's important for edge ai

#

thank you very much, i understand

unborn adder
#

any time...good luck with that!

somber hamlet
unborn adder
#

any good ML papers to recommend? I don't know who to trust xd

past meteor
#

I mean, what type of paper?

unborn adder
#

I don't know if this is a good answer but I'm interested in computer vision

#

I have never read them..any of them..but I want to

#

is there that type of paper? on computer vision? or anything related

past meteor
#

I'd just read dive into deep learning

#

It lists seminal papers in computer vision so you can just read those then if you want more details

unborn adder
#

alright, any specific papers on that you would recommend?

past meteor
#

I'd just go with a book because papers expect you to have certain prerequisite knowledge and CV is mature enough to have books that take you from A to Z

unborn adder
cinder schooner
sleek harbor
# somber hamlet Hey Mayushii, I've compiled my results here: https://github.com/kraktus/cosmopol...

👍 I like mine best :3

P.s. they aren't exactly picked by hand tho. I ripped the colors from the bmh theme, and configured the color intensity (saturation) so that on my specific background (the one I used up there, which is, btw, also ripped from the background color of the Dark One Pro Darker vs code theme, which is what I use), it looks best (to my eyes) when there are overlapping semi-transparent elements (such as histograms). So I get graphs that seem as if they have no background at all (cus the background of the graphs is the same as my editor theme), but others will just get a pleasant dark grey background with calm-ish colors that mix well on the background, as well as when elements are transparent. It was never meant to work well on a light background tho, never even considered that

cinder schooner
# unborn adder alright, any specific papers on that you would recommend?

what I do is read up on something in particular and then build on that. For example, i was working on object detection, so I found that there are single-shot detectors that predict directly and two-stage detectors. I started reading the papers that shaped both, so I read the papers about the versions of YOLO and what each one introduced. Then I read the papers about R-CNN and its versions so I understood more about each type and the differences. Then I started reading about the different possible loss functions and the successive versions of the IoU loss. Then I read about tuning these models and choosing the hyperparameters, and then I built something to try.

somber hamlet
vestal widget
#

I'm using nano-gpt, is it possible to create a language model for a chatbot from my own dataset?

serene scaffold
median fulcrum
#

Hi guys, I don't see much discussion about the jupyter notebook 7 migration, so I thought it would be cool to talk about it here

#

the strange thing is that there aren't a lot of issues in github repos saying that extensions are not working properly

tepid tartan
void veldt
#

is this where I would ask questions regarding data fitting and scipy minimize?

void veldt
#

so I posted my question on SO since easier to format code there, but in short just trying to confirm my code is setup properly

#

had a quick question regarding differences between LSMR and LMFIT using my setup, I appear to be getting different solutions but don't quite understand why: https://stackoverflow.com/questions/76798827/lmfit-vs-lsmr-am-i-getting-different-fits-due-to-machine-precision

#

Based on my understanding, my data is just trash, and due to the differences between how the solvers work (Nelder-Mead versus Levenberg-Marquardt), I arrive at different solutions due to the minima being within machine precision

#

like with good quality data, with the setup I have, I should arrive to the same solution. But since my data is trash, that is why I am observing the divergence

tepid tartan
lapis sequoia
#

hello, I wanted to know whether universal sentence encoder uses the gpu or not?

#

hey guys please help me

grim hearth
lapis sequoia
#

how do I fix it

#

it's not my code I am using : tortoise-tts-fast

void veldt
tidal bough
cosmic lynx
#

a few questions:

  1. how far of a jump is a digit reading AI to something that could play nim
  2. to make a game AI, would I need to learn another language aside from Python?
mild dirge
cosmic lynx
#

in that case, what would be a better next step?

woeful fiber
#

Has anyone seen a nice example repo for yolov7 object tracking in mp4 files

#

I have a little project idea for object tracking in video, and since it's kinda the thing deep learning is recognized for, I figured there would be a lot of material for it

civic elm
#

Anyone here working in a large company? what is the stack like? aws? azure?

#

What about the data extraction?

cosmic lynx
civic elm
soft dock
# civic elm I just need info to build my cv

It still heavily depends on the industry you find yourself in and the company you'll be working for. It also depends on the subdiscipline of data science you'd like to go into. In my opinion, simply working on projects demonstrates a lot of your skills. Scrape ugly data from a website you're interested in, scrub it until it can fit into a model, and present your data analysis with visuals. It's even better if you can make a sort of dashboard app with it, and even better if you use a bit of DevOps to optimize how the app is shared/deployed. The specific packages and software don't matter as much as the results, because if you can do it once with x software you can be trained by your company to do it again with y software.

upper flame
#

hey

#

does anyone understand a lil bit in finance

serene scaffold
void veldt
frank helm
#

Need a bit help with course selection.

I completed self studying calculus 1 and 2. My plan is to do calc 3 w/ probability, and linear algebra w/ statistics after that.

However, MITOCW's probabilistic systems and applied probability has been rather difficult for me. One of my good friends recommended Georgia Tech's probability course.

So my question is are the following two courses enough to get me started with Data Science?

  1. Georgia Tech's Probability https://www2.isye.gatech.edu/~sman/courses/6739/
    also available on edx: https://www.edx.org/professional-certificate/gtx-probability-random-variables
  2. Statistics for applications: https://ocw.mit.edu/courses/18-650-statistics-for-applications-fall-2016/video_galleries/lecture-videos/

OR https://ocw.mit.edu/courses/6-041sc-probabilistic-systems-analysis-and-applied-probability-fall-2013/pages/resource-index/ is a must? I am currently taking this and I am not a huge fan of the psets. They're way too hard, and so are the recitations, and since there aren't any easy problems the learning curve is way too steep. I am also in a time crunch so it's hard for me to go out of my way to research new things.

Georgia Tech's probability course also has stats in it. But it's preferable that I take statistics for applications as well, right?

vestal widget
#

I read some articles online that said stuff like GPT-3 and GPT-4 are language models. So i wanna ask, is the language model the code itself or is it some kind of data that helps the code determine the output?

serene scaffold
bronze flint
#

Hello,
I am converting COCO format to YOLO format and i am normalizing BBOX data to be between 0 and 1
When i did 1 epoch to just test if labels were correct, YOLO kept drawing rectangles off the actual coordinates

When i tested if i normalized coordinates correctly it worked locally

import cv2

img = cv2.imread('vid_000031_frame0000043.jpg')

# Normalize the box (x, y, w, h) by image width (shape[1]) and height (shape[0])
x = round(190.00196078431372/img.shape[1],6)
y = round(132.00196078431372/img.shape[0],6)
w = round(116.99607843137255/img.shape[1],6)
h = round(20.996078431372553/img.shape[0],6)

print(x,y,w,h)

# Scale the normalized values back to pixel coordinates
x_pixel = int(x*img.shape[1])
y_pixel = int(y*img.shape[0])
w_pixel = int(w*img.shape[1])
h_pixel = int(h*img.shape[0])

print(x_pixel,y_pixel,w_pixel,h_pixel)
print(img.shape)
# Draw the box, treating (x, y) as its top-left corner
cv2.rectangle(img, (x_pixel,y_pixel), (x_pixel+w_pixel,y_pixel+h_pixel),(255,0,0),4)

It drew it well
I normalized and got it back to original state and everything worked well

I am unsure if YOLO is doing something badly or if i normalized data badly for the YOLO format
All i did is scale it based on image height and width as u can see here

x = round(190.00196078431372/img.shape[1],6)
y = round(132.00196078431372/img.shape[0],6)
w = round(116.99607843137255/img.shape[1],6)
h = round(20.996078431372553/img.shape[0],6)
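
One thing worth checking, as a hedged guess at the cause: COCO boxes are stored as [x_min, y_min, width, height] (top-left corner), while YOLO label files expect the box center, normalized by image size. Scaling the top-left corner alone round-trips fine locally but would shift every box when YOLO reads it as a center. A minimal sketch of the conversion, where `coco_to_yolo` is a hypothetical helper name and the numbers are made up:

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels into
    YOLO (x_center, y_center, width, height), each normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_w   # shift from the corner to the center
    y_center = (y_min + h / 2) / img_h
    return (
        round(x_center, 6),
        round(y_center, 6),
        round(w / img_w, 6),
        round(h / img_h, 6),
    )


# Made-up example: a 200x100 box with top-left at (100, 50) in a 400x200 image
print(coco_to_yolo([100, 50, 200, 100], img_w=400, img_h=200))
```

To draw a YOLO box back with cv2.rectangle, the top-left corner would be (x_center - w/2, y_center - h/2) after scaling back to pixels.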
lapis sequoia
#

@sonic vapor offering money for services. Against rules.

north rain
#

@zealous ermine You'd be better off looking for paid work on a freelance site like fiverr or upwork, we don't allow the solicitation of paid work here

#

!rule 9

arctic wedgeBOT
#

9. Do not offer or ask for paid work of any kind.

slim bone
#

Hey fellas, need a quick check on my understanding - Is "Label" the attribute which classifies what kind of data we fed the machine?
So, when we run some data through a neural network, we want it to, ideally, output the label - and when the network does(?) back propagation it calculates the cost relative to the label (which essentially tells it the optimal outcome)?

I apologize if the explanation is unclear, I can rephrase if need be

serene scaffold
#

and when the network does(?) back propagation it calculates the cost relative to the label (which essentially tells it the optimal outcome)?
your understanding is missing some steps. you can't calculate "the cost relative to a label". a label is a symbol, not a number that you can do math/calculations with.

#

@slim bone let me know when you're here and we can go into more detail.

slim bone
#

I’m internalizing, I’ll ping you in a moment

serene scaffold
#

okie

slim bone
#

I appreciate the detailed explanation

#

and then your training and test instances are labeled as "cat" or as "dog".
This is indeed what I thought what labels were initially, but I'm following this link at the moment:
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
And the label (variable: labels) appears to be an array. What does this array represent exactly?

a label is a symbol
I'm assuming the first question would answer this

@serene scaffold Forgot to mention, I'm free to chat now

serene scaffold
# slim bone > and then your training and test instances are labeled as "cat" or as "dog". Th...

you're right that labels (the variable) are an array/tensor. when the labels are discrete, non-numeric symbols, a popular way to represent them as arrays is one-hot encoding. So you might decide that [1, 0] is the array for "dog", and [0, 1] is the array for "cat".

And if you have a sequence of images that are [cat, cat, dog], the representation would be

[[0, 1],
[0, 1],
[1, 0]]

does that make sense so far, @slim bone?

#

if you had a third class, like "turtle", then those could be [0, 0, 1], and you'd have to add an extra zero to "dog" and "cat" (because now there's three classes, not two)
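
A minimal NumPy sketch of the one-hot encoding described above. The class order (dog first, cat second) follows the example; `one_hot_encode` is a hypothetical helper name, not something from a library:

```python
import numpy as np


def one_hot_encode(labels, classes):
    """Map each label to a row with a 1 at its class index and 0 elsewhere."""
    indices = np.array([classes.index(name) for name in labels])
    # Row i of the identity matrix is exactly the one-hot vector for class i
    return np.eye(len(classes), dtype=int)[indices]


labels = ["cat", "cat", "dog"]

two_class = one_hot_encode(labels, classes=["dog", "cat"])
print(two_class)

# Adding a third class widens every row by one column
three_class = one_hot_encode(labels, classes=["dog", "cat", "turtle"])
print(three_class)
```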

slim bone
#

Also, on that note - can a desired outcome be [0.5,0.5,0] for example? (Probably not in this model, but perhaps a different one?)

serene scaffold
serene scaffold
slim bone
slim bone
serene scaffold
# slim bone Ah, maybe I don't entirely understand what a dataset is Why would you want a seq...

in this case, a dataset is all the images you have to train and test your model.

Why would you want a sequence of images? To quicken the computation of er... the correction of the weights and biases?

this is getting into the batch size. which is a hyperparameter. the batch size is the number of training instances, and the model's current outputs on those instances, that the model considers at a time when calculating the direction of the gradient.

slim bone
serene scaffold
slim bone
slim bone
serene scaffold
slim bone
serene scaffold
#

in this case, the y data is the label for the image, and the X data is the image itself

slim bone
serene scaffold
slim bone
#

If not, you don't have to explain. I'll just read on it later - I don't want to waste your time

#

Alright

serene scaffold
#

You are not wasting my time.

#

Using it? yes. but not wasting it.

slim bone
#

Apologies, I meant - I don't want to waste your time if I can probably read on this myself later
If you're still happy to spare an explanation I'll gladly take it obviously

#

Currently, I'm just trying to understand this little labels array

serene scaffold
#

the page you're looking at doesn't show a whole training procedure.

slim bone
#

Right

serene scaffold
#

but if you had an image classifier for cats and dogs, a batch of three images would look like this for the y data

[[0, 1],
[0, 1],
[1, 0]]

slim bone
#

Right, so far so good

serene scaffold
#

and a batch of three 64-by-64 pixel colored images would be an array or tensor of shape (3, 3, 64, 64)

#

(the first three is three images, and the second three is red-green-blue)
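
As a small sketch of those shapes, assuming the channels-first layout (batch, channels, height, width) that PyTorch uses, with zero-filled placeholder pixels rather than real images:

```python
import numpy as np

batch_size, channels, height, width = 3, 3, 64, 64

# X: a batch of three RGB images (placeholder zeros instead of real pixels)
X = np.zeros((batch_size, channels, height, width), dtype=np.float32)

# y: one one-hot label per image, in the same order as the images in X
y = np.array([[0, 1],
              [0, 1],
              [1, 0]])

print(X.shape)     # whole batch
print(X[0].shape)  # a single image: channels x height x width
print(y.shape)     # one row of labels per image
```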

slim bone
#

Makes perfect sense

#

In the example though, data is only a single image
So wouldn't you only need a single label?

Wait actually, that is a single label, isn't it? It's just that there's "a thousand y-data"?

#

Or at least,* a thousand possible outcomes?

serene scaffold
#

it might be that this is a binary classifier. a binary classifier for pictures of cats would give you the probability that the picture is of a cat

slim bone
serene scaffold
slim bone
#

Would you reckon I should even get hung up on this piece of code? If I think I understand your explanation for labels at least?

#

I'm just trying to dip my toes and familiarize myself with the terminology, so I can eventually read the Pytorch documentation without drowning

#

I'm about to enter my second year as a CS student and figured this would be a nice summer project

#

Considering I have some of the mathematical background, although probably very little of it.

serene scaffold
slim bone
serene scaffold
slim bone
#

Should I just look up some papers? Or are you referring to something a little more basic

slim bone
left tartan
slim bone
#

Almost everything made sense (give or take, a couple of knowledge gaps in back propagation)

#

I do agree, the videos are excellent.

left tartan
#

yah, like most things, it'll gain meaning on rewatch

slim bone
#

Ah so far I've rewatched at least one video every day haha

#

Glad to see those are so heavily recommended though

left tartan
#

the videos often have something for every skill level... look at the 4th (backpropagation), I get why it'd be a lot

slim bone
left tartan
#

Yah, but I think you can understand it without really intuiting why derivatives are involved

left tartan
#

To find the optimal nn

#

Obviously it'd work (find the right answer)... it would just be computationally infeasible.

slim bone
#

I figured I'd watch it later once I understand the terminology a little better
I just finished my calculus courses so I don't think understanding what's being said is beyond my reach (Then again, I have no idea)

slim bone
left tartan
#

So, video 4 gets into the "really tough problem" part: that it's really "hard" to find the optimal parameters.

#

And that's where all the clever algorithms and math come into play. but if you pretended for a moment that you could just exhaustively search every single possible set of parameters, and ignored this problem, everything is actually fairly simple.

slim bone
#

That's curious, and sounds rather crucial

left tartan
#

I think if you "get" hill climbing, gradient descent is much easier to intuit.

slim bone
#

Oh I understand gradient descent just fine I think

#

I didn't cover a topic called "hill climbing" in my courses - the idea of a multivariable function being a hill was mentioned in calculus 2 though

#

I was indeed taught the idea of "gradient" through the "imagine you're traversing the function on foot" picture, which is what I think you're getting at

left tartan
#

so, imagine combining that idea with derivatives (where you can look at a multidimensional "slope" to decide which way to go)

#

Anyway, this is basically the idea of the 4th video
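
A tiny sketch of that idea on a toy two-variable "bowl" rather than a neural network. The function, starting point, and learning rate are all made up for illustration:

```python
import numpy as np


def f(p):
    """Toy bowl-shaped function with its lowest point at (1, -2)."""
    x, y = p
    return (x - 1) ** 2 + (y + 2) ** 2


def grad_f(p):
    """Partial derivatives of f: the multidimensional 'slope' at p."""
    x, y = p
    return np.array([2 * (x - 1), 2 * (y + 2)])


p = np.array([5.0, 5.0])   # arbitrary starting point
lr = 0.1                   # step size (learning rate)

for _ in range(200):
    # Step against the gradient, i.e. in the steepest downhill direction
    p = p - lr * grad_f(p)

print(p, f(p))
```

Hill climbing would instead try nearby points and keep the better one; the gradient just tells you which nearby direction is best without having to try them.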

slim bone
#

Ah, it sounds simple when you put it like that

left tartan
#

Yah, the math does get hairy though.

slim bone
#

As one would imagine, I'll keep it in mind though. Thank you

civic elm
#

Question, are there any Data Science online certifications that are valuable to employers? for example, Google ML certification, is this something they like?

fresh harbor
#

what is the equivalent of onnxruntime.InferenceSession.get_inputs in OpenCV's cv2.dnn.Net?

sturdy canyon
# civic elm Question, are there any Data Science online certifications that are valuable to ...

Unfortunately I think the answer is it depends on the recruiter. Based on talking with my friend who has done some hiring in this space, he doesn't care so much about what you've learned/what certs you have. He's more interested in the stuff you've worked on (personal projects or otherwise) in a real world space, and that you can logically reason your way through a problem. He's one person though, so the next recruiter may require you to have a dual degree in quantum computing and elementary education to be considered

#

I think you should ask yourself if you think you'd find value in them, or if you'd prefer to learn on your own. I got a data science cert back in the day for R, which was very helpful to solidify and learn how to apply the stats foundation I got in school (and it was faster than figuring it out myself). However, once I got into Python and ML, I felt comfortable enough to just learn it on my own/ask questions of my coworkers.

timid kiln
#

Where do y'all think the best place to ask questions about plotly would be?

past meteor
#

It's actually good practice. Make some fake data, take a loss function (MSE) and write your own linear regression with gradient descent

#

I did this some years ago, it's good to for example start with regular GD, then make it SGD, then add regularization, then make a 2nd order method, ...

daring sphinx
#

guys I'm kind of clueless on what to do next so I'm asking it here. I want to make some money next month through freelancing. $50 is enough. Right now, I know tensorflow, pytorch, natural language processing with spacy and transformers and traditional machine learning algorithms with sklearn. Most I've done in terms of deployment is deploying a simple gradio app of image classification in hugging face.
Right now what do I need to learn to start making some money through freelancing? I'm a fast learner.

slim bone
#

But perhaps that is the way to go - think of a cool project, and just figure out how to do it by any means necessary

serene scaffold
slim bone
#

Complete top-down approach type of thing, food for thought I suppose

past meteor
#

Do it with numpy

slim bone
past meteor
#

gradient descent with numpy is <10 lines of very easy code

serene scaffold
past meteor
#

Take a piece of paper and write out the partial derivatives of MSE and start there. It's easier than you think

past meteor
#

Afterwards, look at how y^{hat} is found in math terms (hint, it's just a dot product)
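
For reference, a sketch of what that pen-and-paper step gives, assuming a linear model with weights w_j, a bias b, N datapoints and P variables (the notation here is an assumption, not from the messages above):

```latex
\hat{y}_i = \sum_{j=1}^{P} w_j x_{ij} + b
\qquad
\mathrm{MSE}(w, b) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2

\frac{\partial\, \mathrm{MSE}}{\partial w_j} = \frac{2}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right) x_{ij}
\qquad
\frac{\partial\, \mathrm{MSE}}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)
```

Both derivatives follow from the chain rule: differentiate the square, then multiply by the derivative of the prediction with respect to w_j (which is x_ij) or b (which is 1).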

slim bone
past meteor
#

And then you're half way there

serene scaffold
past meteor
slim bone
#

Worth trying then, got a fun project idea?

past meteor
#

I gave a seminar on this a while ago to people that didn't have anything more than HS math

slim bone
#

What.

#

I don't think I could've even fathomed the concept of a "gradient" in highschool lol

daring sphinx
#

thanks for the motivation.

past meteor
#

But you did learn what a partial derivative is in high school right?

serene scaffold
slim bone
past meteor
#

I think I pitched it as learning is 1) trying something 2) making mistakes 3) getting feedback 4) improving 5) going back to step 1

daring sphinx
#

sure bro

past meteor
#

That's the core loop of gradient descent, idt I used the word "gradient" there unless in very vague terms

slim bone
#

Ah

serene scaffold
slim bone
#

Yeah I do agree that the idea is rather elegant

past meteor
#

I think you're scaring yourself. It's not that hard. If you know a regular derivative and the chain rule you can do partial derivatives and then you can understand what a gradient is

slim bone
#

Oh, I know what a gradient is. It's just weird to me that you relied on Highschool math

past meteor
#

Partial derivatives are high school math or at least they were to me

slim bone
#

Weird? Impressive? Not sure what's the right word here

daring sphinx
slim bone
serene scaffold
daring sphinx
serene scaffold
#

@past meteor my high school math ended at trig. only the most advanced students would learn limits and derivative calculus

past meteor
#

Yeah we learnt what vectors and matrices are and how to multiply them

serene scaffold
lapis sequoia
daring sphinx
past meteor
#

and with that you have enough for SGD

slim bone
daring sphinx
#

I'm through the AWS course 50% already.

past meteor
#

Maybe I am being overly optimistic here, I'm just going by what we covered in HS 🤣

slim bone
#

But you've raised a good point, I probably should learn Numpy and maybe Python in general, I haven't touched it in two years.

lapis sequoia
past meteor
#

Vectors were covered in 3rd secondary which is the last year of middle school afaik

slim bone
#

Wow.

#

Care to tell us where you're from?

daring sphinx
past meteor
slim bone
#

Well I know where I want to raise my kids

daring sphinx
slim bone
#

(Joking, of course)

#

But no, that's seriously impressive

past meteor
#

Well, if you're not great with these topics I think you should interweave doing ML projects and coding up the algos from scratch

#

The former is the more relevant skill for jobs but the latter is a good test to see if you know what you're doing imo

lapis sequoia
past meteor
#

Some people can read a proof and grok it but personally implementing it helps me to be sure that I know what I'm doing

daring sphinx
past meteor
#

Naïve implementations of most algorithms are quite simple (the ones that are used in practice tend to have some nice tricks for numerical stability etc).

daring sphinx
#

I think I'm gonna have to learn deployment, REST APIs and making a UI to use the model.

lapis sequoia
#

If you can create machine learning models based on real world problems I don't see why you can't make money.

slim bone
#

@past meteor
I'm trying to give it some thought, breaking down what each step consists of
I've come up with:

  1. Obtaining, parsing and feeding the algorithm data (Just, File I/O?)
  2. Breaking down the data, and implementing a forward propagation algorithm
  3. Calculating the cost of the function and propagating it backwards - correcting the weights involved with Gradient Descent

Does this sound good? Because, I do think I know how to implement each step individually
(I know this sounds rather trivial, just asking if I'm in the right direction)

#

Oh and, I'll probably do a handwriting recognition software for the sake of being able to use 3blue1brown's videos.
Maybe something a little less basic if I manage this one

#

Oh and if I managed to get your attention - Can(should?) I do this in Jupyter Notebooks / Google Colab for the sake of portability?

past meteor
#

I'd start with linear regression

slim bone
#

For example, I can have a hundred, 16x16 images - but I'll need 257 weights no?

past meteor
#

I changed N to P for clarity, P is the number of variables and N the number of datapoints

slim bone
#

Is that a mistake?

past meteor
#

No, you have 1 weight (or coefficient) for each variable

#

And a bias (intercept) term, so P+1

slim bone
#

Ah. I think I misread - when you say:

Generate a dataset (P variables) that's easy to work with
You mean, a dataset with N images (In my project), where each image gives up P variables (In our case, pixels)?

#

Is that right?

past meteor
#

Yes

#

The issue with not just generating easy to work with data is that you can't sanity check as easily

#

If I were you I'd generate a NxP matrix at random and have a function you generate at random as well that defines the output

#

Why? You're pretty sure in this case that your loss is 0 if you did it correctly

slim bone
#

Also uhm, wouldnt I have to calculate a ton of partial derivatives (256) with my implementation?

past meteor
#

What I'm trying to say is that you could use fake data instead of the handwritten digits

#

But that's just my personal way of working

slim bone
#

What does fake data mean in this context?

past meteor
#

np.random

slim bone
#

Yeah but I need to feed the network concrete, labeled inputs no?

#

I feel like I'm missing what you're trying to say

past meteor
#

I just make a random matrix and then I make a vector that maps input to output

#

So if I have a random matrix that is N x P, I make a vector of size P (also random, but this time maybe random integers) and I do labels = np.dot(random_data, random_true_weights) + 42

#

At the end of your gradient descent the weights (coefficients) you learnt should be really close to that random_true_weights vector you made in the beginning
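
A rough sketch of that exercise in NumPy, under a few assumptions that are not from the messages above: the sizes N and P, the learning rate, the iteration count, and the seed are arbitrary, and the names `random_data` and `random_true_weights` just mirror the snippet above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 3                                   # datapoints, variables

# Fake data: random inputs and a random "true" linear function
random_data = rng.normal(size=(N, P))
random_true_weights = rng.integers(-5, 6, size=P).astype(float)
true_bias = 42.0
labels = np.dot(random_data, random_true_weights) + true_bias

# Start from zero and learn P weights plus 1 bias with gradient descent
weights = np.zeros(P)
bias = 0.0
lr = 0.1

for _ in range(2000):
    error = np.dot(random_data, weights) + bias - labels      # y_hat - y
    weights -= lr * (2 / N) * np.dot(random_data.T, error)    # dMSE/dw
    bias -= lr * (2 / N) * error.sum()                         # dMSE/db

mse = np.mean((np.dot(random_data, weights) + bias - labels) ** 2)
print("learned:", weights, bias)
print("true:   ", random_true_weights, true_bias)
print("mse:", mse)
```

Because the labels contain no noise, the loss should end up at essentially 0 and the learned weights should match the true ones, which is the sanity check you don't get with real data.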

#

But maybe I'm just really confusing you so feel free to ignore this bit haha

slim bone
#

Perhaps a tiny bit, I think I might need to fill in some knowledge gaps I have (I just realized for example, that I have no idea how to adjust the biases.)

past meteor
#

TL;DR is that when I'm doing "fundamental" exercises (ones close to theory) I like making my own toy datasets because it helps in understanding what's going on

slim bone
#

Ah I think I'd just rather implement something I'd be proud of

#

E.g., when I learned HTML5 I made an entire website about Corgis. Core memory for me 🙂

#

Granted, it took two days and was very easy, but there's something motivating about making something that's truly your own

past meteor
#

I forget people aren't like me and that's also totally okay haha

#

Tbh, you might as well pick up any guide from Tensorflow / Pytorch's website because the first ones build simple neural networks for the handwritten dataset

slim bone
#

The Pytorch one specifically

#

But I thought you recommended me to use Numpy before relying on libraries first? ^^;

past meteor
#

It depends how deep you wanna know this stuff, if you want to make something that makes you proud and then move on to other projects that might not even be related to ML/AI I'd just use Keras

#

If you want to stick with ML/AI for the long haul then coding a few of the basics yourself in Numpy (starting from linear regression) and then moving to Jax, in between doing actual projects (could even be the titanic dataset), makes the most sense

#

That's just my 2 cents, I'm actually curious to know what the rest thinks.

slim bone
#

Hmm, I can give the whole picture about myself (I'll make it as brief as I can)

  1. I'm currently approaching my 2nd year of bachelors in CS (out of 3.5, probably)
  2. I've started pursuing this degree in order to obtain a masters (and maybe even a PhD) in ML
  3. I probably have about 2 months to burn right now (summer vacation) and I figured to myself that I might want to start learning about ML, now that I have some of the mathematical background nailed down.
  4. I've been trying to get into ML in the last few days, kind of drowned in tutorials, asked around for advice, got some advice, tried said advice, still have no idea how to start.
    *. I've been reading Pytorch documentation for a while, trying to implement something, but to no avail - I just don't understand the code written there for the life of me.

Thing is, I've watched the first 3blue1brown videos and I do feel like I understand the fundamentals of how the process works, at least from the math-ier side of things. So your idea sounded intriguing

tl;dr: Starting 2nd year of CS. Definitely in it for the long haul. Can't implement anything despite genuine efforts. Narrowing down the problem to "just do math" actually sounds cool

#

@past meteor Obligatory tag*

twilit tundra
#

100% agree on Keras for accessibility. In my experience, coding the basic components from scratch at least once is useful and gratifying but not sure there is a need to continue using them on actual projects instead of the available frameworks

slim bone
#

Also, the reason I'm using Pytorch is because it seems universities tend to favour it over TF for some reason.

#

At least where I live

#

I've heard Keras is easier, don’t know if that’s true*

twilit tundra
#

Yes basically: I like to think of neural networks as lego bricks. Once I know how they work and their purpose, I just want to be able to reuse them on each project without having to reimplement them. Having experience coding them is mostly useful when you want to introduce new custom components.

In terms of framework, Keras and Pytorch are basically equivalent until you go into specific models. Keras runs on top of tensorflow and makes it more accessible.

past meteor
#

It's not as scary as you think

#

I can implement it for you but then you wouldn't learn anything compared to struggling with it for max 2 days

twilit tundra
#

Keras and Pytorch work similarly. The most difficult part about learning either is the data science/ML design part. You can easily switch between the two

slim bone
slim bone
slim bone
#

As in, would I have a reason to switch in the near future if my university uses PyTorch?

twilit tundra
#

When I took the coursera ML course a few years ago, they had the implementation of a neural network from scratch on MATLAB as a workshop

slim bone
#

That sounds cool!!

twilit tundra
#

They are equally accessible I'd say

#

If you know you're going to use pytorch, then use pytorch

slim bone
#

Right

twilit tundra
#

I can't really point to any guide on pytorch but I'm sure there are a few beginner-friendly ones

slim bone
#

So, I should probably do the NumPy thing, see if I manage with that, and then come back to you folks if I have some questions regarding how to proceed?

twilit tundra
#

Sounds good

slim bone
#

I am rather skeptical this project would help me read the documentation better but it sounds like a nice thing to do regardless

#

And honestly at this point I just want to make something

slim bone
long canopy
#

I have a program that records all my window usages (ActivityWatch, if you know of it) and generates Events objects which contain the window name and amount of time spent on this window; a new event object is generated each time I switch windows.

I have 2 categories: Working and Not Working. I'd like to construct an AI that automatically assigns, or suggests, the proper classification for an Event object into one of these categories. For some Event objects, like those recording time playing a game, the categorization is obvious. But for Events related to internet browsing, the difference between Not Working and Working is not always clear cut and obvious, especially when, e.g., I'm on discord, facebook, reddit, or Google, which may be for either work or nonwork activities, or when I'm navigating websites I've never encountered before.

Does anyone have suggestions on a path that could get me started on eventually being able to program something like this?

twilit tundra
#

A common use case that is quite similar is logs anomaly detection: detecting behaviors in log entries that are different from usual. Not an expert on the subject but it could be a good place to start

desert oar
#

+1 on looking at anomaly detection literature for logs, maybe specifically "intrusion detection" in cybersecurity

#

i might consider starting by trying to classify work vs non-work in a fixed time window, before trying to determine when to start/stop an activity and generate a corresponding event

#

that might be a good way to get a feel for the data and the feature you want to develop, in a less-complicated modeling scenario

#

that said, you will also want to build up some structured thinking about what an "event" really is. it might be something like "i have reached a minimum confidence threshold that i switched from Non-Work to Work X minutes ago"

#

at which point some of the statistical and probabilistic reasoning might look similar to "changepoint detection" in time series analysis

#

you could also do something like chunk up the time into 1-minute sliding windows and look at the prevailing activity in each window
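A rough sketch of the windowing idea, for anyone following along. The event tuples below are made up for illustration and are not ActivityWatch's real event format, and the windows are fixed 1-minute buckets rather than truly sliding ones, to keep it short:

```python
from collections import defaultdict

# Hypothetical events: (start_second, duration_seconds, window_title).
events = [
    (0, 40, "VS Code"),
    (40, 50, "Reddit"),
    (90, 20, "VS Code"),
]

WINDOW = 60  # bucket length in seconds


def prevailing_activity(events, window=WINDOW):
    """Return {bucket_index: title with the most seconds in that bucket}."""
    totals = defaultdict(lambda: defaultdict(float))
    for start, duration, title in events:
        end = start + duration
        t = start
        # Split an event that crosses a bucket boundary into per-bucket chunks.
        while t < end:
            bucket = int(t // window)
            chunk_end = min(end, (bucket + 1) * window)
            totals[bucket][title] += chunk_end - t
            t = chunk_end
    return {
        bucket: max(per_title, key=per_title.get)
        for bucket, per_title in sorted(totals.items())
    }


result = prevailing_activity(events)
print(result)
```

Each bucket then becomes one row you can label as Working or Not Working, which is a simpler problem than deciding where an activity starts and stops.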

#

lots of options here, but also hopefully you can see how there are a lot of components to a project like this. it's a great idea, but by the time you're done with it, you might have put enough work into it to have a product you can sell vs. just a hobby project to learn machine learning

#

i can imagine someone making a side business out of selling an "AI time tracker" app like this, if one doesn't already exist

long canopy
#

already found a couple of books about anomaly detection with python, so I'll begin with that

#

will look into changepoint detection and time series analysis too

umbral charm
#

Hey, I'm using pandas (I'm new to this package) and I'm trying to create a column called 'MAT'. However, I want this column to only have a value where MA1 == MA2 (both of these are columns in the same DataFrame); if they're not equal, it should just give NaN for that index.
How would I implement this? I've tried ChatGPT and Bing AI but they're no good

sleek harbor
#

When u code, does ur code look "good"? Cus.. everything I do is such a chaotic mess, I literally can't ask for help with my code when smth doesn't work, cus it's such a mess.. and I usually just clean things up at the end, when everything works..

twilit tundra
#

Something like
df["MAT"] = df["MA1"]
df.loc[df["MA1"] != df["MA2"], "MAT"] = np.nan

umbral charm
#

ok

twilit tundra
#

First line to initialize the value, and then you filter on the indices that have MA1!=MA2
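A minimal runnable sketch of those two steps. The sample numbers are invented; only the column names MA1, MA2 and MAT come from the question:

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "MA1": [1.0, 2.5, 3.0, 4.2],
    "MA2": [1.0, 2.0, 3.0, 4.0],
})

# Step 1: initialize MAT as a copy of MA1.
df["MAT"] = df["MA1"]

# Step 2: overwrite MAT with NaN on the rows where MA1 and MA2 differ.
df.loc[df["MA1"] != df["MA2"], "MAT"] = np.nan

# One-step alternative: keep MA1 where the condition holds, NaN elsewhere.
alt = df["MA1"].where(df["MA1"] == df["MA2"])

print(df)
```

Both versions compare MA1 and MA2 row by row, so row 0 is only ever compared with row 0, and so on.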

umbral charm
#

It's slightly more complicated than that though: my numbers go to around 5 dp, and they need to stay like that. However, if the numbers are equal to each other to 2 dp, that is allowed to go into MAT

twilit tundra
umbral charm
#

So like i want to compare the numbers to 2 dp, but i wanna display them to 5 dp

#

hopefully it's just a simple .round()

twilit tundra
#

Yeah, you can just replace the columns with their rounded versions

#

In the pandas boolean

sleek harbor
twilit tundra
#

It's a huge mess when I use a notebook and I know I'm not going to show it to anyone else

past meteor
#

I obsess about mine looking good but it's not great either because it's at the expense of doing less stuff

twilit tundra
#

The easiest way to make your code easier to read is to use markdown cells to partition your code and add descriptions

#

And clean up cells that you defined just to check one variable once you're done with them

sleek harbor
twilit tundra
#

Or merge them together

past meteor
#

With plotly yes but I haven't used dash specifically

sleek harbor
past meteor
#

Notebooks are good but dangerous imo because, in my experience, they lead to lots of globals and code that is hard to understand / debug / change

sleek harbor
umbral charm
umbral charm
#

Surely if I put it within the .loc it will try to locate a rounded number, which therefore does not exist

twilit tundra
#

df.loc[df["MA1"].round() != df["MA2"].round(), "MAT"] = np.nan

#

something like that, you're just transforming the 2 Series you're comparing

sleek harbor
# past meteor Do you use version control?

Yes, but.. probably not often enough. I commit when I finish like, a "chapter" of what I'm doing, like a big step. Should do it more often, but.. ¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯

umbral charm
#

would that not try to locate the rounded number, and not the initial number

twilit tundra
#

The first term is a boolean

#

It returns a series that is True on the indices you're filtering it on

#

df["MA1"].round() != df["MA2"].round() evaluates to something like pd.Series([False, False, False, True]), for instance

#

So when you use loc, it will select the last row
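Here is a small sketch of that, assuming you want to compare at 2 dp while keeping the stored values at 5 dp. The data is made up, and in this sample it is the second row that ends up selected:

```python
import numpy as np
import pandas as pd

# Made-up values with 5 decimal places.
df = pd.DataFrame({
    "MA1": [10.12345, 20.55555, 30.00001, 40.12999],
    "MA2": [10.12399, 20.55111, 30.00002, 40.13001],
})

# Boolean Series: True where the two columns differ once rounded to 2 dp.
# Rounding here creates temporary Series; df itself is not modified.
mask = df["MA1"].round(2) != df["MA2"].round(2)
print(mask.tolist())

df["MAT"] = df["MA1"]
# .loc uses the mask only to pick rows; the kept values are the unrounded ones.
df.loc[mask, "MAT"] = np.nan
print(df)
```

So `.loc` never searches for the rounded numbers. It only receives a True/False flag per row and assigns NaN where the flag is True.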

past meteor
#

Each time you reach one of those checkpoints, commit

umbral charm
#

OHH

#

I SEEE

#

Pandas is so different

umbral charm
#

fuck it ill just add 2 more columns with the rounded values

twilit tundra
#

Did you put an argument in round()?

umbral charm
#

2 for 2 dp

#

It's not finding all the values. It finds like 5 values which match, but when I just iterate through it I find about 13 that match

odd meteor
twilit tundra
#

Maybe your argument is not correct for the computation you're looking for (putting 1 instead maybe?)

umbral charm
#
TSLA['Boo'] = TSLA['MA20']
TSLA.loc[TSLA['MA20'].round() != TSLA['MA80'].round(), 'Boo'] = np.nan
print(TSLA['Boo'])
for i in TSLA['MA20']:
    for j in TSLA['MA80']:
        if round(i, 2) == round(j, 2):
            print(i, j)
twilit tundra
#

You're comparing every value in col1 with every value in col2 here

#

If you were trying to keep MA1 whenever there is at least one MA2 value, in any row, that approximates it, that's a different formula
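To make the difference concrete, here is a sketch of both comparisons on made-up data. Using `isin` for the "any row" version is just one possible way to write it, not necessarily the formula meant above:

```python
import pandas as pd

# Made-up data: values match at 2 dp, but never on the same row.
df = pd.DataFrame({
    "MA20": [1.001, 2.004, 3.500],
    "MA80": [2.003, 1.002, 9.000],
})

a = df["MA20"].round(2)
b = df["MA80"].round(2)

# Row-wise: index 0 against index 0, index 1 against index 1, ...
same_row = a == b

# Any row: does the rounded MA20 value appear anywhere in rounded MA80?
# This is what the nested for loops are effectively checking.
any_row = a.isin(b)

print(same_row.tolist())
print(any_row.tolist())
```

The nested loops count a match whenever any pair of rows agrees, which is why they can report more matches than the row-wise mask.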

umbral charm
#

I see

#

it's because even at different indexes the values of MA1 and MA2 can be the same, so the loop compares them across 2 different indexes

#

so your original code only compares MA1 and MA2 from the same index row, correct?

twilit tundra
#

Yes

umbral charm
#

but when I did my iteration it could've been comparing MA1 from index 5 and MA2 from index 12

twilit tundra
#

Yes

#

Which one is the one you're trying to do

umbral charm
#

Compare in the same index