#data-science-and-ml | Python | Page 102

final kiln Feb 14, 2024, 2:04 PM

#

I don't think they'll be able to meet my salary expectations anyway

lofty thorn Feb 14, 2024, 2:04 PM

#

let's go

final kiln Feb 14, 2024, 2:04 PM

#

I don't trust this equity split stuff

lofty thorn Feb 14, 2024, 2:06 PM

#

is it an HR round or the first step to get the job at that company?

final kiln Feb 14, 2024, 2:06 PM

#

no this is the third round already

#

there's a fourth and a fifth ._.

lofty thorn Feb 14, 2024, 2:07 PM

#

oh

final kiln Feb 14, 2024, 2:07 PM

#

gotta take it in stride

lofty thorn Feb 14, 2024, 2:08 PM

#

final kiln no this is the third round already

what they asked you ?

final kiln Feb 14, 2024, 2:08 PM

#

I think there's a sixth

#

no it's 5, and the fifth I have to travel to their office, which I don't mind since it's in a cool place

final kiln Feb 14, 2024, 2:09 PM

#

lofty thorn what they asked you ?

first was leetcodes, second was hr

lofty thorn Feb 14, 2024, 2:09 PM

#

you seem very unsure

final kiln Feb 14, 2024, 2:10 PM

#

third is solving a physics problem, fourth is more leetcode and fifth is traveling to their place

#

i speak Spanish but im not gonna answer

lofty thorn Feb 14, 2024, 2:24 PM

#

oops

#

why is 'i = 1' written?

tidal bough Feb 14, 2024, 2:29 PM

#

That's just a sum as i goes from 1 to n (inclusive).

lofty thorn Feb 14, 2024, 2:29 PM

#

oh ok

#

i thought the first item needed to be one

wooden sail Feb 14, 2024, 2:30 PM

#

the first value i takes is 1

#

math notation is something one reads and interprets, just like any other language

#

you would read this as "the sum of elements x_i, where i goes from 1 to n"

#

so, x_1 + x_2 + ... + x_n
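In code, the same reading looks like this. The values in `x` are made up, and the only twist is that Python lists start at index 0, so the math's x_i is `x[i - 1]`.

```python
# Hypothetical values for x_1 .. x_5
x = [3, 1, 4, 1, 5]
n = len(x)

# "sum of x_i for i from 1 to n": i takes the values 1, 2, ..., n (inclusive)
total = sum(x[i - 1] for i in range(1, n + 1))

print(total)  # 14, the same as sum(x)
```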

vapid minnow Feb 14, 2024, 3:27 PM

#

Hey, i've made a 3D plot with matplotlib and i would like to know if there is some way to enable antialiasing for it?

#

I've tried passing antialiased=True as a parameter for plot but it doesn't seem to make any difference

#

Here's the code of the graph:

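import matplotlib.pyplot as plt
import numpy as np

# GPAPH_RESOLUTION, GRAPH_ANGLE and DIR are constants defined elsewhere in the original script
def plot_graph3D(func, zlim, name):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    # Sample the function on a regular grid
    x = np.linspace(0.0, 2.0, GPAPH_RESOLUTION)
    y = np.linspace(0.0, 1.0, GPAPH_RESOLUTION)

    X, Y = np.meshgrid(x, y)
    Z = func(X, Y)

    # Camera: 15 degrees elevation, GRAPH_ANGLE degrees azimuth
    ax.view_init(15, GRAPH_ANGLE)
    ax.plot_surface(X, Y, Z, cmap='viridis', antialiased=True)

    ax.set_zlim(0.0, zlim)

    ax.set_xlabel('Pixel Luminance')
    ax.set_ylabel('Average Luminance')
    ax.set_zlabel('Output Luminance')

    # Only write a PDF when a name is given
    if name is not None:
        plt.savefig(DIR + 'graph3D-' + name + '.pdf', bbox_inches='tight')

    plt.show()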

vapid minnow Feb 14, 2024, 3:30 PM

#

vapid minnow Hey, i've made a 3D plot with matplotlib and i would like to know if there is so...

Also could someone help me understand why the image is cropped? Even when i save it, the label of the Z axis is cropped. I thought bbox_inches='tight' would fix that but it doesn't
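A sketch of two things that may help, under assumptions rather than as a confirmed fix. First, `plot_surface` downsamples the grid to at most 50 rows and 50 columns by default (`rcount`/`ccount`), so a high `GPAPH_RESOLUTION` does not necessarily reach the renderer; raising those limits is the main smoothness knob, and `linewidth=0` is a common companion setting that drops facet edge lines. Second, for the cropped z label, `set_zlabel` accepts `labelpad` and `savefig` accepts `pad_inches` to leave extra room around the tight bounding box. The helper name `plot_surface_smooth`, its `resolution` parameter, and the padding values are hypothetical; ranges and labels are copied from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_surface_smooth(func, zlim, resolution=100, out_path=None):
    """Hypothetical variant of plot_graph3D with denser surface sampling and label padding."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")

    x = np.linspace(0.0, 2.0, resolution)
    y = np.linspace(0.0, 1.0, resolution)
    X, Y = np.meshgrid(x, y)
    Z = func(X, Y)

    # rcount/ccount default to 50; match them to the grid so it is not downsampled
    ax.plot_surface(X, Y, Z, cmap="viridis", rcount=resolution, ccount=resolution,
                    linewidth=0, antialiased=True)
    ax.set_zlim(0.0, zlim)

    ax.set_xlabel("Pixel Luminance")
    ax.set_ylabel("Average Luminance")
    ax.set_zlabel("Output Luminance", labelpad=10)  # push the z label away from the ticks

    if out_path is not None:
        # pad_inches adds a margin around the tight bounding box
        fig.savefig(out_path, bbox_inches="tight", pad_inches=0.3)
    return fig, ax
```

Since the figure is saved as a PDF (vector output), how smooth it looks may also depend on the PDF viewer's own rendering.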

lusty lotus Feb 14, 2024, 3:39 PM

#

hello! i am a 16-year-old living in the uk. i want to get some research work experience before i enter university. i am highly interested in reinforcement learning. i have successfully replicated a fairly well-known research paper and am currently refining the code (i have a functional MVP and am now writing code to hopefully speed up the learning). however, i consider my ml background to be "shaky", as i still lack understanding of some of the fundamental math behind ml. i currently use pytorch, which abstracts some of the math away. would that be ok for volunteer work experience, i.e. can i just work on top of a library?

i would like to get some work experience with tech companies (i can volunteer). does anyone know any good companies/programmes that i can join to get some real-world experience? i know there are some programmes for us-based high schoolers, and internships are usually for undergraduates. can someone please recommend some international/uk-based programmes/opportunities for me? tysm :D

left tartan Feb 14, 2024, 3:51 PM

#

lusty lotus hello! i am a 16 year old living in the uk. i want to have some work experience ...

You might also want to ask in #career-advice

jagged latch Feb 14, 2024, 4:06 PM

#

I need a bit of help with something related to Plotly Dash. I'm making an app and running into an issue where not all the data is displayed when a new dataframe (df_2) is generated from the date entered in the text box. Here's the background. When I first run the program, I use Tkinter to enter an initial date for when the app first goes live, and the data looks good. The problem is when I enter the exact same date in the text box and click the button: that should trigger the callback, which calls the same functions to generate a new dataframe, but now a lot of the data is missing. Theoretically nothing should change, because it's the exact same query being run. Most of the data in df is the same, but at certain points where rows should be saved into df_2, they just aren't, even though they were saved before the app went live with the initial date. The code continues as if it never saw the row in question, even though it did. What could cause this when running all those functions from the callback?

#

Here's the syntax I'm using to add rows to df_2 if it helps where A, B, C, D are variables defined in the function:

df_2.loc[len(df_2.index)] = [A, B, C, D]

#

Often that line of code appears to be ignored when the functions using it are called from the callback.

#

It does not get ignored when I first run the program and enter the date. This problem I noticed only happens when the date is entered in the text box of the Dash App by the user.

#

In other words, it happens when everything executes inside the callback; outside of it, I get the expected results.
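Without the real code this is guesswork, but two common causes fit the symptom. First, `df_2.loc[len(df_2.index)] = [...]` only appends when the index is a clean 0..n-1 range; if df_2 has been filtered, had rows dropped, or is a frame reused across calls, `len(index)` can equal an existing label and the assignment silently overwrites that row instead of adding one. Second, Dash's documentation warns that callbacks should not modify variables outside their own scope, because state shared across callback calls behaves differently than in a one-off script run. The sketch below uses toy data and hypothetical helper names (`append_row_pitfall`, `build_df_2`) to reproduce the overwrite and show the safer pattern of collecting rows in a list and building a fresh frame on every call.

```python
import pandas as pd


def append_row_pitfall():
    """Reproduce the silent overwrite with the .loc[len(index)] idiom (toy data)."""
    df = pd.DataFrame({"A": [10, 20, 30]})
    df = df.drop(index=0)           # index is now [1, 2], so len(df.index) == 2
    df.loc[len(df.index)] = [99]    # label 2 already exists -> overwrites it, no new row
    return df


def build_df_2(records):
    """Hypothetical safer pattern: gather rows in a list, build one fresh DataFrame per call.

    Call this inside the callback and return/use its result, rather than growing
    a global df_2 that persists between callback invocations.
    """
    rows = [{"A": a, "B": b, "C": c, "D": d} for a, b, c, d in records]
    return pd.DataFrame(rows, columns=["A", "B", "C", "D"])


print(append_row_pitfall())  # still 2 rows: A = 20, 99
print(build_df_2([(1, 2, 3, 4), (5, 6, 7, 8)]))
```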

carmine pecan Feb 14, 2024, 4:34 PM

#

pithink

#

Hi, I need help writing an email to a website's owners. I want to scrape their data and use it to train my AI for a research paper (or whatever the paper you submit to conferences is called).

What should the email contain? They already told me I can do it via a call but I want to have an actual email.

brave arch Feb 14, 2024, 4:49 PM

#

Hello, I need help with Python. I have some code, but I'm unable to use Beautiful Soup to scrape column-level data from a website
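A minimal sketch of pulling one column out of an HTML table with Beautiful Soup, assuming the data sits in a static `<table>` whose header row uses `<th>` cells. The HTML snippet and the helper name `extract_column` are hypothetical; pages that build their tables with JavaScript will not expose the data this way.

```python
from bs4 import BeautifulSoup

# Hypothetical page snippet; in practice this could come from requests.get(url).text
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
"""


def extract_column(html, column_name):
    """Return the text of every cell under the header `column_name` in the first <table>."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> found in the HTML")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    idx = headers.index(column_name)           # raises ValueError if the header is missing
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) > idx:                   # skips the header row, which has no <td>
            values.append(cells[idx].get_text(strip=True))
    return values


print(extract_column(SAMPLE_HTML, "Price"))  # ['1.20', '0.95']
```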

river cape Feb 14, 2024, 4:53 PM

#

Hey, in SVR, we call the points that are outside the epsilon-insensitive tube support vectors, right?
And for training the model, we use the .fit() method, right? Should it always have a 2D array as its input?

river cape Feb 14, 2024, 5:46 PM

#

brave arch Hello, I need help with python. I have a code but i am unable to use beautiful s...

Yes

past meteor Feb 14, 2024, 5:55 PM

#

river cape Hey in SVR , we call the points which are outside the epsilon insensitive tube ...

1. Yes, all points on or outside the epsilon tube are support vectors. In classification, all misclassified points are also support vectors.
2. I think most models do some dot product internally, so they can't fit when you give them a 1D feature array of shape (n,); that's why they require you to reshape it to (n, 1).
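Both answers can be checked in a few lines. This is a minimal sketch assuming scikit-learn's `SVR` and made-up toy data: it shows that a 1D feature array is rejected (scikit-learn expects `X` shaped (n_samples, n_features)), then reads the fitted support vector indices from `support_` and confirms the remaining points sit inside the epsilon tube, up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVR

# Made-up 1D regression data, for illustration only
x = np.linspace(0.0, 5.0, 40)           # shape (40,)
y = 2.0 * x + np.sin(3.0 * x)           # the target can stay 1D

# A 1D feature array is rejected: scikit-learn wants (n_samples, n_features)
try:
    SVR().fit(x, y)
    one_d_rejected = False
except ValueError:
    one_d_rejected = True

X = x.reshape(-1, 1)                    # shape (40, 1): 40 samples, 1 feature
epsilon = 0.5
svr = SVR(kernel="linear", C=10.0, epsilon=epsilon).fit(X, y)

residuals = np.abs(y - svr.predict(X))
is_sv = np.zeros(len(x), dtype=bool)
is_sv[svr.support_] = True              # support_ holds the support vector indices

# Non-support vectors have zero dual coefficients and lie inside the tube
print(one_d_rejected, is_sv.sum(), "support vectors of", len(x))
print("largest residual among non-support vectors:", residuals[~is_sv].max())
```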

river cape Feb 14, 2024, 5:56 PM

#

past meteor 1. Yes all points outside of the epsilon tubes are support vectors. In classific...

Noice, btw does reshape convert a 1D array to a 2D array?

past meteor Feb 14, 2024, 5:56 PM

#

river cape Noice btw does reshape convert a 1D array to 2D arraY?

Try it out in the console 🙂

river cape Feb 14, 2024, 5:57 PM

#

past meteor Try it out in the console 🙂

I did

past meteor Feb 14, 2024, 5:57 PM

#

And then call .shape on it

river cape Feb 14, 2024, 5:57 PM

#

It gives it in the form of a 2D array

past meteor Feb 14, 2024, 5:57 PM

#

try a bunch of things, try multiplying two vectors of shape nx1 in numpy as well

#

You gotta run all the code to get the intuitions

river cape Feb 14, 2024, 5:59 PM

#

past meteor try a bunch of things, try muiltiplying two vectors of shape nx1 in numpy as wel...

if you see this

#

I want to make that 1D array vertical

past meteor Feb 14, 2024, 5:59 PM

#

You don't need to write print in notebooks btw

river cape Feb 14, 2024, 5:59 PM

#

Is there any way?

past meteor Feb 14, 2024, 5:59 PM

#

also try calling .shape here on it

#

(Look up what .reshape(-1) does as well)

wooden sail Feb 14, 2024, 6:01 PM

#

i would note that numpy ndarrays of dimension 1 do not behave like proper vectors

#

you can multiply them from the left or right with no issues
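A quick NumPy check of that point, with toy values: a 1D array has no row/column orientation, so `@` accepts it on either side, while an explicit (n, 1) column only multiplies where the shapes line up.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])          # shape (3,): no row/column orientation
M = np.arange(9.0).reshape(3, 3)

# A 1D array works on either side of @ without transposing
left = v @ M                           # acts like a row vector -> shape (3,)
right = M @ v                          # acts like a column vector -> shape (3,)
inner = v @ v                          # inner product -> scalar 14.0

# With an explicit 2D column, orientation matters
col = v.reshape(-1, 1)                 # shape (3, 1)
outer = col @ col.T                    # (3, 1) @ (1, 3) -> (3, 3)
elementwise = col * col                # same shapes -> elementwise, (3, 1)

try:
    col @ col                          # (3, 1) @ (3, 1): inner dimensions 1 and 3 don't match
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```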

past meteor Feb 14, 2024, 6:01 PM

#

That's true

#

Even so, that's something you need to run to find out

#

Compared to someone telling you

wooden sail Feb 14, 2024, 6:02 PM

#

hmmm

river cape Feb 14, 2024, 6:02 PM

#

past meteor (Look up what `.reshape(-1)` does as well)

Isn't this the same?

river cape Feb 14, 2024, 6:03 PM

#

past meteor also try calling .shape here on it

'tuple' object is not callable. It's giving me this error

past meteor Feb 14, 2024, 6:03 PM

#

Ah, it's (-1, 1)

#

You should be able to read .shape on the array you get; it's an attribute, not a method, so no parentheses
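The reshape points from this exchange in one place, using a toy array: `.shape` is a tuple attribute (calling it reproduces the error above), `-1` tells NumPy to infer that dimension, `reshape(-1, 1)` gives a column, and `reshape(-1)` flattens back to 1D.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a.shape)                  # (4,)  .shape is an attribute: no parentheses

col = a.reshape(-1, 1)          # -1 = "infer this length": shape (4, 1), a column
row = a.reshape(1, -1)          # shape (1, 4), a single row
flat = col.reshape(-1)          # back to 1D: shape (4,)
print(col.shape, row.shape, flat.shape)   # (4, 1) (1, 4) (4,)

try:
    a.shape()                   # calling the tuple is the mistake from above
except TypeError as err:
    print(err)                  # 'tuple' object is not callable
```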

final kiln Feb 14, 2024, 6:13 PM

#

# Calculation
number_of_neurons = 86_000_000_000  # 86 billion neurons
average_synapses_per_neuron = 5_000  # average synapses
parameters_per_synapse = 40  # 10 parameters for each of 4 factors (neurotransmitter types, receptor types, synaptic strength, other factors (modulatory receptors, ion channels, postsynaptic properties, etc.))

total_parameters = number_of_neurons * average_synapses_per_neuron * parameters_per_synapse
total_parameters

#

a fermi estimate for the number of parameters in the human brain

#

17200000000000000

#

17.2e15 if I counted the zeros right
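Rather than counting zeros by hand, Python's format specifiers can print the same product in scientific notation or with thousands separators. The inputs are the ones from the estimate above.

```python
number_of_neurons = 86_000_000_000        # 86 billion neurons
average_synapses_per_neuron = 5_000
parameters_per_synapse = 40

total_parameters = number_of_neurons * average_synapses_per_neuron * parameters_per_synapse

print(f"{total_parameters:.3e}")          # 1.720e+16, i.e. 17.2e15
print(f"{total_parameters:,}")            # 17,200,000,000,000,000
```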

final kiln Feb 14, 2024, 6:42 PM

#

But ngl, I might've over-engineered my pipeline

#

Now that I'm able to fit the entire dataset in RAM

#

The code will probably be useful eventually, so I'm not too bummed out

#

Maybe now it fits, but nothing is stopping me from just getting more data

jagged latch Feb 14, 2024, 6:55 PM

#

jagged latch I need a bit of help on something related to Plotly Dash. I am making an app, bu...

Does anyone have any idea on what's causing this bug?

final kiln Feb 14, 2024, 6:59 PM

#

jagged latch Does anyone have any idea on what's causing this bug?

You might have more luck if you show some code

jagged latch Feb 14, 2024, 7:00 PM

#

final kiln You might have more luck if you show some code

I would if this wasn't for work though.

final kiln Feb 14, 2024, 7:01 PM

#

jagged latch I would if this wasn't for work though.

I mean it's hard to debug code through a natural language description of it

desert oar Feb 14, 2024, 7:01 PM

#

jagged latch I would if this wasn't for work though.

setting up reproducible examples is an important skill. if you can't isolate the problem enough to explain it to someone else, you might not understand the problem that well yourself. often the process of figuring out how to reproduce a problem also elucidates the root cause, and from there a solution.

#

that's been my experience at least, and i know i'm not alone in that.

final kiln Feb 14, 2024, 7:03 PM

#

That moment when I realize most of my job is debugging stuff

desert oar Feb 14, 2024, 7:05 PM

#

truly. 70% debugging, 20% data cleaning, 5% talking to people, 5% actual data analysis and machine learning

final kiln Feb 14, 2024, 7:08 PM

#

so good to hear that, was starting to think I was doing something wrong ahah

#

There's just, like, infinite configuration. In the GitHub Actions stuff alone there were like 10 unexpected things that prevented the runs from reaching the 24h run time mark

#

Not even in my code, just libs not working as expected, or settings that I didn't know about, like a 6h timeout. That one was unfortunate because I could've seen it coming if I'd read the docs

desert oar Feb 14, 2024, 7:16 PM

#

final kiln so good to hear that, was starting to think I was doing something wrong ahah

it's the current plague of doing data science in industry. data scientists are under-supported by engineers and devops people. so you have this perverse trend of trying to hire data scientists with unicorn-level credentials to do 3 different jobs at once, instead of hiring 2 extra people to collaborate with the data scientist and get a lot more value out of the whole team. save 2/3 on payroll but only get 1/4 productivity, it's a bad deal in the end for everybody (including and especially the data scientist who doesn't get to actually do their own job and their resume / skills atrophy over time).

versed pilot Feb 14, 2024, 7:18 PM

#

I'm a data analyst and it's not much better

#

There are some data engineers in the company, but they have their own work to focus on, so mostly I have to do my own full-stack, end-to-end tasks, from system administration to ETL scripts, to cloud platform work, to SQL, Python and Tableau

desert oar Feb 14, 2024, 7:20 PM

#

that's particularly bad

#

80-90th percentile bad

past meteor Feb 14, 2024, 7:21 PM

#

desert oar it's the current plague of doing data science in industry. data scientists are u...

this was a topic of discussion a day or two ago

#

I think companies doing this aren't necessarily wrong

#

DS folk I meet just think they can get away with knowing 1 thing

versed pilot Feb 14, 2024, 7:22 PM

#

On the other hand companies are trying to get away with having small teams and fewer people.

past meteor Feb 14, 2024, 7:22 PM

#

And that's really a thing for the vast majority of jobs, especially if you're not a specialist in a large company

versed pilot Feb 14, 2024, 7:22 PM

#

you get to a point when that is not very productive

past meteor Feb 14, 2024, 7:22 PM

#

Yeah, if they're not pulling a lot of revenue what else are you supposed to do?

desert oar Feb 14, 2024, 7:22 PM

#

don't hire people that you can't make use of

versed pilot Feb 14, 2024, 7:22 PM

#

I'm a jack of all trades and it's very hard to become a master of any

past meteor Feb 14, 2024, 7:22 PM

#

Unless your argument is: don't get a data scientist until you're at a big scale

desert oar Feb 14, 2024, 7:23 PM

#

not at big scale, but don't get a data scientist if you don't have at least 1 engineer that can help support getting their stuff into prod

past meteor Feb 14, 2024, 7:23 PM

#

Or they hire someone that can do a bit of both? 🤔

desert oar Feb 14, 2024, 7:23 PM

#

it never works out that way

past meteor Feb 14, 2024, 7:23 PM

#

I think it's a myth that having more than one responsibility means you're doing twice the work

desert oar Feb 14, 2024, 7:23 PM

#

i can do both. i've done both, professionally. unequivocally it's worse when you expect someone to do both.

past meteor Feb 14, 2024, 7:23 PM

#

you're doing half of each

desert oar Feb 14, 2024, 7:23 PM

#

that's mythical man-month thinking

past meteor Feb 14, 2024, 7:24 PM

#

I do both and I don't do twice the job tbh

desert oar Feb 14, 2024, 7:24 PM

#

right

#

case in point, no?

#

it's not about doing twice the job. it's about doing less than half of each job

#

if i split my time 50/50 i end up doing 30% of each job

#

that leaves 40% of a full job's worth of work not done, or backlogged

#

yet i'm still spending the same 100% of hours

past meteor Feb 14, 2024, 7:25 PM

#

It's all data, I never understood the distinction

versed pilot Feb 14, 2024, 7:25 PM

#

there's the time spent context switching, and missed synergies

desert oar Feb 14, 2024, 7:25 PM

#

then maybe you're not doing what i'm doing

past meteor Feb 14, 2024, 7:25 PM

#

I don't see the context switching, data is data 🤷

desert oar Feb 14, 2024, 7:25 PM

#

also just the fact that it's ridiculous to expect a data scientist to also be a software engineer

versed pilot Feb 14, 2024, 7:25 PM

#

If I work on notebooks I have a variety of projects I can work on

#

If I switch from notebooks to sql to tableau to unix sysadmin

#

that's context switching 😉

desert oar Feb 14, 2024, 7:26 PM

#

past meteor I don't see the context switching, data is data 🤷

i'm not talking about "data" though

#

is writing an HTTP API and setting up a CI/CD pipeline "data" work in any non-trivial sense?

#

we literally pay kids $100k out of college to do that and only that, full time

#

here i am doing that and trying to also do data science and keeping an eye on the ETL pipelines

past meteor Feb 14, 2024, 7:27 PM

#

My interest, is in making things that work. If that requires an HTTP API, CI/CD, an ETL, ... so be it personally

#

What I see of a lot of DS is no interest in making things that work

desert oar Feb 14, 2024, 7:27 PM

#

same. that's why i do those things and know how to do them. i still think it's stupid to expect to hire someone who can do that

#

you and i are unicorns

past meteor Feb 14, 2024, 7:28 PM

#

The interest is in doing stuff in notebooks

desert oar Feb 14, 2024, 7:28 PM

#

it's very very well known across basically all industries that 1 person doing 3 jobs in 1/3 time ratios is less effective than 3 people specializing

past meteor Feb 14, 2024, 7:28 PM

#

ML model in a notebook, plot in a notebook 👎

#

I don't want to make any excuses for that

desert oar Feb 14, 2024, 7:28 PM

#

i use notebooks 🤷‍♂️ not sure what that has to do with it

#

i don't expect them to run in production

past meteor Feb 14, 2024, 7:28 PM

#

I also use notebooks, that's not what I mean

desert oar Feb 14, 2024, 7:29 PM

#

but even if i did, i don't see why it matters. tests are tests, pipelines are pipelines, etc.

past meteor Feb 14, 2024, 7:29 PM

#

I mean, no interest in going to prod

desert oar Feb 14, 2024, 7:29 PM

#

is it no interest in going to prod? or is it lack of interest in doing what should be someone else's job? another specialist's job, so you can focus on your own specialty?

past meteor Feb 14, 2024, 7:29 PM

#

Because some of my colleagues believe their responsibilities start at getting a clean dataset and end at producing a PoC

versed pilot Feb 14, 2024, 7:29 PM

#

desert oar it's very very well known across basically all industries that 1 person doing 3 ...

Didn't Adam Smith get his face on an English banknote for describing the division of labour during the industrial revolution? 😉

past meteor Feb 14, 2024, 7:30 PM

#

When your responsibility should be: getting something that works. If there's no one to bail you out, then you gotta do it yourself imho

desert oar Feb 14, 2024, 7:30 PM

#

past meteor Because some of my colleagues believe their responsibilities start at getting a ...

i mean, sure? but if you extend this line of reasoning, you should also criticize software devs for not also being devops and DBA

past meteor Feb 14, 2024, 7:30 PM

#

sure I do

desert oar Feb 14, 2024, 7:30 PM

#

yes, at smaller scales, it pays off to be a generalist and to hire generalists

past meteor Feb 14, 2024, 7:30 PM

#

Depending on the scale of their company

#

You can 100 % blame them

desert oar Feb 14, 2024, 7:30 PM

#

have you ever worked with a good DBA?

past meteor Feb 14, 2024, 7:30 PM

#

What I'm saying is DS do this at any scale

#

And that's unreasonable

desert oar Feb 14, 2024, 7:31 PM

#

sure, maybe that's entitled on their part. but at the same time, it's ridiculous to expect this level of multi-specialization as table stakes for all DS

versed pilot Feb 14, 2024, 7:31 PM

#

desert oar have you ever worked with a good DBA?

Nothing like a good DBA.

river cape Feb 14, 2024, 7:31 PM

#

sc2.inverse_transform(regressor.predict(sc1.transform([[6.5]])).reshape(-1,1))
the above statement is used to predict the result of an svr regression algo

surreal sedge Feb 14, 2024, 7:32 PM

#

hu

river cape Feb 14, 2024, 7:32 PM

#

Is it necessary to use a reshape()
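On the `reshape()` question: with the usual scikit-learn setup it is needed, because `regressor.predict` returns a 1D array while the scalers' `inverse_transform` expects 2D input of shape (n_samples, n_features). A minimal sketch with made-up data (the toy `X`/`y` values are assumptions; `sc1`, `sc2` and `regressor` mirror the names in the message above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up data: position level -> salary, for illustration only
X = np.arange(1, 11, dtype=float).reshape(-1, 1)   # 2D feature matrix, shape (10, 1)
y = (X.ravel() ** 2) * 1000.0                      # 1D target, shape (10,)

sc1 = StandardScaler()                             # scales the feature
sc2 = StandardScaler()                             # scales the target
X_scaled = sc1.fit_transform(X)
# Scalers need 2D input; ravel() hands SVR the 1D target it expects
y_scaled = sc2.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf").fit(X_scaled, y_scaled)

pred_scaled = regressor.predict(sc1.transform([[6.5]]))   # 1D result, shape (1,)
pred = sc2.inverse_transform(pred_scaled.reshape(-1, 1))  # back to 2D (1, 1), original units
print(pred)
```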

past meteor Feb 14, 2024, 7:32 PM

#

I think there's just like an overspecialisation of DS folk

#

I don't think any CS niche overspecialises this much

desert oar Feb 14, 2024, 7:33 PM

#

DS isn't CS
i totally disagree, i think there's an unrealistic expectation of DS people also being software engineers and unless we pay them 2x what software engineers make, it's just employers trying to be cheap

past meteor Feb 14, 2024, 7:33 PM

#

What

#

But you're not doing twice the job 😭

desert oar Feb 14, 2024, 7:33 PM

#

i spent 6 years in school studying math, statistics, and machine learning. you now want me to also become a professional software engineer?

final kiln Feb 14, 2024, 7:33 PM

#

I've been mostly working at startups, and I do really like doing a bit of everything but, I've decided that I'm not a substitute for a team, there's a point at which it's just not fair. I personally don't see the fun in just doing one thing, but I also see the line in the sand as an extremely important thing for my own well being

desert oar Feb 14, 2024, 7:33 PM

#

that's two careers and two specializations. i expect to be paid double accordingly.

past meteor Feb 14, 2024, 7:34 PM

#

The expectation is just that people can deliver results

#

Every role has this problem, it's not unique to DS

desert oar Feb 14, 2024, 7:34 PM

#

but that's a very startup-centric small-scale mindset

past meteor Feb 14, 2024, 7:35 PM

#

Not necessarily imho

desert oar Feb 14, 2024, 7:35 PM

#

there is absolutely a niche for people who can "deliver results"

past meteor Feb 14, 2024, 7:35 PM

#

It's definitely compounded by the fact that the slice of the process DS people do is super narrow

#

Narrower than other roles

#

Like pure pure DS roles. You need a large supporting cast for that

#

I'm just pleading for knowing more than 1 thing is all, just knowing DE is already a step up

left tartan Feb 14, 2024, 7:37 PM

#

Ivory tower DSers

past meteor Feb 14, 2024, 7:37 PM

#

At my job I grew into a lot of tasks because all the rest just says no

left tartan Feb 14, 2024, 7:38 PM

#

past meteor At my job I grew into a lot of tasks because *all* the rest just says no

Ever consider going into consulting?

past meteor Feb 14, 2024, 7:38 PM

#

That was the first contract I signed but I got cold feet and tore it up

#

I might in the future

wooden sail Feb 14, 2024, 7:41 PM

#

i'm kinda in salt rock's camp here

#

on a time constraint, time spent learning math is time spent not learning software eng

#

i would say that's a job for 2 people at least

#

the code and software optimizations you learn on one side are completely unrelated to the ones on the other

past meteor Feb 14, 2024, 7:42 PM

#

The truth is that for most roles there's diminishing returns on that math vs. software

wooden sail Feb 14, 2024, 7:43 PM

#

in general, a lot of "DS" positions really just need software eng

past meteor Feb 14, 2024, 7:43 PM

#

If you go that deep, then you should really only aim for the ones where the diminishing returns aren't doing you in

wooden sail Feb 14, 2024, 7:43 PM

#

people don't even know what DS and ML are in the first place

versed pilot Feb 14, 2024, 7:43 PM

#

But a lot of the data work requires a different kind of software engineering. Optimal SQL or pandas is very different to the skills you learn with C/C++/Java type software engineering

final kiln Feb 14, 2024, 7:43 PM

#

But there's a reason why we teamup right, a team goes farther and for a team to work you gotta have lanes

wooden sail Feb 14, 2024, 7:44 PM

#

final kiln But there's a reason why we teamup right, a team goes farther and for a team to ...

this is what SHOULD happen, but salt rock lamp was complaining about the opposite being the case when you look at job openings

past meteor Feb 14, 2024, 7:44 PM

#

But it's definitely a two way street that's what I meant in the discussion tbh

final kiln Feb 14, 2024, 7:44 PM

#

Yeah I'm aware of what happens, especially in the smaller companies

past meteor Feb 14, 2024, 7:44 PM

#

Because from observation DS are unique in the fact that they say "not my job" and don't grow towards the mismatch in the hire

versed pilot Feb 14, 2024, 7:45 PM

#

Actually a lot of DS are moving into Data Engineering

#

getting AWS/GCP certifications, learning dbt

past meteor Feb 14, 2024, 7:45 PM

#

Unless the point all of you are trying to make is that companies should hire fewer data scientists

final kiln Feb 14, 2024, 7:45 PM

#

I don't know, I had to learn to say no. Not because I'm not willing to take additional tasks but because I'll very quickly become overworked

wooden sail Feb 14, 2024, 7:46 PM

#

past meteor Unless the point all of you are trying to make is that companies should hire les...

almost kinda, yeah

#

a lot of them don't really need it imo

versed pilot Feb 14, 2024, 7:46 PM

#

well, they should think whether they need a data engineer first

#

before going for the data scientist

final kiln Feb 14, 2024, 7:46 PM

#

I'm very productive in general and that creates this illusion even to myself that I can just keep on doing more stuff, but it's not the case

wooden sail Feb 14, 2024, 7:46 PM

#

just basic stats would take them a long way, which doesn't require heavy ds

past meteor Feb 14, 2024, 7:46 PM

#

To come full circle

#

What I'm trying to say is, move some of those math / stats hours to software

#

Or work at google / do a PhD / ...

versed pilot Feb 14, 2024, 7:47 PM

#

wooden sail just basic stats would take them a long way, which doesn't require heavy ds

that's what I've been trying to focus on as a data analyst, moving into hypothesis testing, linear regression, etc.

#

but business people don't always get statistics

#

don't like uncertainty

past meteor Feb 14, 2024, 7:48 PM

#

We actually had a breath of fresh air 2 hires ago

wooden sail Feb 14, 2024, 7:48 PM

#

past meteor What I'm trying to say is, move some of those math / stats hours to software

this sounds about right

past meteor Feb 14, 2024, 7:49 PM

#

The person that we hired wasn't married to ML (and was previously a software engineer)

#

They ended up building an awesome ML product, one of the best we have on offer

#

Because they're willing to do what it takes

wooden sail Feb 14, 2024, 7:49 PM

#

that kinda piggybacks on what i said though

#

that software eng is truly what is usually needed

past meteor Feb 14, 2024, 7:49 PM

#

I think we're in agreement

wooden sail Feb 14, 2024, 7:50 PM

#

you could probably even do with a single ds person who doesn't even code, but regularly participates in the meetings where stuff is arranged with the others

past meteor Feb 14, 2024, 7:50 PM

#

I just don't agree with the "I can do two things so I need to be paid 2x" argument

wooden sail Feb 14, 2024, 7:50 PM

#

yeah i guess that's unrealistic expectations, but from both sides

past meteor Feb 14, 2024, 7:51 PM

#

Software isn't a monolith, software engineers themselves need to do 2+ things all the time (frontend, backend, devops, data, ...) and none of them makes this argument tbh

wooden sail Feb 14, 2024, 7:51 PM

#

the employer not knowing what to ask for, and DS people being reluctant

past meteor Feb 14, 2024, 7:51 PM

#

wooden sail the employer not knowing what to ask for, and DS people being reluctant

I was like this before as well and what I did was ask a million questions in interviews

#

There's still things I more or less "refuse" to do because I don't enjoy them and I'm not good at them either, I'm just transparent about it

#

If anyone decides not to hire me on the basis of that, we both win

wooden sail Feb 14, 2024, 7:52 PM

#

lol

final kiln Feb 14, 2024, 7:53 PM

#

past meteor Software isn't a monolith, software engineers themselves need to do 2+ things al...

But wouldn't it be unfair to hire someone as a data scientist and then have them do 90% frontend

#

Where's the line that separates the roles?

past meteor Feb 14, 2024, 7:53 PM

#

final kiln But wouldn't it be unfair to hire someone as a data scientist and then have them...

That's true but that's a super L for both sides

#

I can imagine the DS will be terrible at frontend

#

This doesn't happen tho

#

My next project is on HCI / explainable AI. The very first thing I'll do is make a frontend we'll use for the experiments.

#

My focus is on making cool stuff and if there's no one else to do it, then I step up. Obviously it'll take me longer than a specialist, but at the end we do have something tangible which is what matters

wooden sail Feb 14, 2024, 7:56 PM

#

zestar, maker of cool stuff

past meteor Feb 14, 2024, 7:56 PM

#

Yeah, maybe I should put that on my linkedin

#

And take away data scientist or whatever I have, I have been thinking of it 😛

final kiln Feb 14, 2024, 7:57 PM

#

I guess the fear is to be stuck doing things that don't further what the person feels should be their career, and this is a pretty strong thing because a lot of people derive purpose from their work

left tartan Feb 14, 2024, 7:58 PM

#

final kiln I guess the fear is to be stuck doing things that don't further what the person ...

I think there’s a bit of hubris involved in presupposing a career path.

wooden sail Feb 14, 2024, 7:58 PM

#

i guess DS people hit this wall often because it's a buzzword that was turned into a career path at unis for whatever reason

left tartan Feb 14, 2024, 7:58 PM

#

Lots of people think ML -should- be their career path, and it (imo) won’t be for most of them.

#

I think reality is: careers are shaped primarily by opportunity, some luck, and a bit of preparation

past meteor Feb 14, 2024, 8:01 PM

#

My organized thoughts will be written down about this. I have many sketches (actual drawings/figures) of what I think the problem is

#

Will take me a couple of months to write it out, but then I'll let all of you know

wooden sail Feb 14, 2024, 8:02 PM

#

zestar approaching us in 3 months with a large mirror
"look"

left tartan Feb 14, 2024, 8:03 PM

#

I’m wondering which other hype cycles have been like this.

wooden sail Feb 14, 2024, 8:04 PM

#

bitconnect

past meteor Feb 14, 2024, 8:04 PM

#

The gist of what it's going to be: if you look at the N% most valuable work in an org, the org likely needs to be very large to sustain someone with a very lopsided skillset (I'll use radar charts for these).

left tartan Feb 14, 2024, 8:04 PM

#

The dot com boom was just general SWEing, but I guess it mainstreamed web dev

versed pilot Feb 14, 2024, 8:05 PM

#

yep, web dev hype in the late 90s

left tartan Feb 14, 2024, 8:05 PM

#

past meteor The gist of what it's going to be if you look at the N % most valuable work in a...

Maybe chip design is comparable to what you just said.

versed pilot Feb 14, 2024, 8:05 PM

#

like Verilog, layout etc.?

past meteor Feb 14, 2024, 8:05 PM

#

in what sense?

left tartan Feb 14, 2024, 8:05 PM

#

past meteor The gist of what it's going to be if you look at the N % most valuable work in a...

There’s an interesting stat around the number of chip designers / phds to produce successively modern chips: it’s becoming ever more expensive

#

One sec, there's a talk…

wooden sail Feb 14, 2024, 8:06 PM

#

idk if that's the best comparison though

versed pilot Feb 14, 2024, 8:06 PM

#

Those chip companies just focus on chips though, they'll have a hardware team, a layout team, an embedded software team etc.

wooden sail Feb 14, 2024, 8:06 PM

#

cuz there's also the current struggle that too few people study electronics compared to what the market would like (in the chip design end)

past meteor Feb 14, 2024, 8:06 PM

#

past meteor The gist of what it's going to be if you look at the N % most valuable work in a...

You need fewer data scientists, or a more balanced radar chart, so that they can go out and do more valuable work when a pure DS project isn't in the top N%

wooden sail Feb 14, 2024, 8:06 PM

#

and the overall trend of fewer people studying STEM

left tartan Feb 14, 2024, 8:07 PM

#

https://m.youtube.com/watch?v=olXire09ZnE&list=PL8uoeex94UhFcwvAfWHybD7SfNgIUBRo-&index=46&pp=iAQB

YouTube

EuroPython Conference

The Future of Microprocessors — Sophie Wilson

[EuroPython 2023 — Forum Hall on 2023-07-20]

https://ep2023.europython.eu/session/the-future-of-microprocessors

The Future of Microprocessors - a talk about the history of microprocessors, how we got here and what might happen next. There will be two laws, one equation, some graphs and a particle beam weapon out of Star Trek.

wooden sail Feb 14, 2024, 8:08 PM

#

what's the TL;DW

past meteor Feb 14, 2024, 8:08 PM

#

left tartan https://m.youtube.com/watch?v=olXire09ZnE&list=PL8uoeex94UhFcwvAfWHybD7SfNgIUBRo...

I'll have a look!

versed pilot Feb 14, 2024, 8:08 PM

#

ok she's a bit of a legend

left tartan Feb 14, 2024, 8:08 PM

#

versed pilot ok she's a bit of a legend

Yah, and this was a great talk, very accessible

versed pilot Feb 14, 2024, 8:09 PM

#

but I think in her line of work it's as I mentioned above: lots of Verilog people, lots of embedded software people (including assembly)

#

and some people who are more into layout etc.

left tartan Feb 14, 2024, 8:09 PM

#

wooden sail what's the TL;DW

I dunno how relevant it is, but the idea had to do with the declining ROI of chip research

#

Similarly, I think there’s some limit to the returns in DS in a single organization. Maybe a bit of a stretch.

versed pilot Feb 14, 2024, 8:10 PM

#

It depends, you can do lots of R&D to develop the next processor or ASIC

#

but it's much harder to push CMOS technology further

#

or give up on CMOS and come up with a replacement

#

not sure how this compares to data science

wooden sail Feb 14, 2024, 8:11 PM

#

past a certain point what you need is a team of physicists to research new fancy stuff and separately, engineers to try implementing it

versed pilot Feb 14, 2024, 8:11 PM

#

CMOS is hitting limits in terms of electrons tunnelling through thin layers of insulator

#

quantum mechanics and all that
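Here is a rough worked estimate of why thinner insulators leak. For a rectangular barrier of height V - E above the electron energy and thickness d, the textbook approximation is T ≈ exp(-2κd), with κ = sqrt(2m(V - E))/ħ. The 3 eV barrier height and the free-electron mass below are illustrative assumptions. Real gate-oxide leakage models use effective masses and more detailed physics.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M_E = 9.1093837015e-31   # free-electron mass, kg (assumption: no effective mass)
EV = 1.602176634e-19     # joules per electronvolt


def tunnelling_probability(thickness_nm, barrier_ev=3.0, mass=M_E):
    """Crude estimate T ~ exp(-2 * kappa * d) for a rectangular barrier.

    barrier_ev is the barrier height above the electron energy (V - E);
    3 eV is an illustrative assumption, not a measured value.
    """
    kappa = math.sqrt(2 * mass * barrier_ev * EV) / HBAR   # decay constant, 1/m
    return math.exp(-2 * kappa * thickness_nm * 1e-9)


for d in (3.0, 2.0, 1.0):
    print(f"{d:.1f} nm insulator -> T ~ {tunnelling_probability(d):.1e}")
```

Under these assumptions, each nanometre removed raises T by more than seven orders of magnitude. That exponential sensitivity is why thinning the insulator eventually stops working.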

#

so you need a paradigm shift

#

and that was parallelisation, multicore, GPUs etc.

past meteor Feb 14, 2024, 8:12 PM

#

Anyhow, I apologize for the controversial opinions! 😄

#

Esp to Salt Rock if he's still reading

#

It's a difficult topic

versed pilot Feb 14, 2024, 8:13 PM

#

data science had the opposite with GPUs, suddenly a world of possibility opened

final kiln Feb 14, 2024, 8:13 PM

#

left tartan I think there’s a bit of hubris involved in presupposing a career path.

I'm not sure I agree with this. People get hired for certain roles which presuppose a given set of tasks. Someone hired as a data scientist refusing to do backend or anything else not in the job description doesn't strike me as something out of the ordinary.

past meteor Feb 14, 2024, 8:14 PM

#

Just when I thought I was out they pull me back in :/

#

We agree that people exaggerated with stuff like microservices because Google did it yeah? "Google does it, they're big so if we do it, we'll be big!"

final kiln Feb 14, 2024, 8:15 PM

#

Just to clarify tho, I don't think it's healthy to do just one thing, but there's no arrogance in pursuing self-determination in a career

jagged latch Feb 14, 2024, 8:15 PM

#

final kiln That moment when I realize most of my job is debugging stuff

It looks like I found the root of my problem: it appears to happen as a result of defining another pyodbc connection inside a function when I had already defined one earlier in the program.
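The actual code isn't shown, so this is only a general pattern for that kind of bug: open one connection and pass it into the functions that need it, instead of having each function open its own. The sketch uses the standard-library `sqlite3` as a stand-in for pyodbc, since both follow the DB-API 2.0 connect/cursor/execute shape. The table and function names are hypothetical.

```python
import sqlite3


def count_rows(conn):
    """Use the connection the caller already opened instead of creating a new one."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM items")   # hypothetical table
    return cur.fetchone()[0]


# sqlite3 is a stand-in; with pyodbc this would be pyodbc.connect(<your connection string>)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

n_rows = count_rows(conn)   # the function reuses the single shared connection
print(n_rows)
conn.close()
```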

past meteor Feb 14, 2024, 8:15 PM

#

Who says it's not the same for data science 😭

#

At least in the way where it's frequently touted

versed pilot Feb 14, 2024, 8:17 PM

#

people invented Hadoop to mimic Google's file system, Bigtable etc.

#

and that was the data science fad of the 2010s

#

ok, one of them

brave arch Feb 14, 2024, 8:17 PM

#

hello, I need help with Python

final kiln Feb 14, 2024, 8:17 PM

#

final kiln Just to clarify tho, I don't think it's healthy to do just one thing, but there'...

I do think it's wrong to push someone out of their role. If the person wasn't hired to do X, there should be a discussion before assigning those tasks.

brave arch Feb 14, 2024, 8:17 PM

#

I'm scraping a website and I'm unable to get the subcolumn data

past meteor Feb 14, 2024, 8:18 PM

#

brave arch for scraping a website, I am unable to get the subcolum data

ask away

brave arch Feb 14, 2024, 8:18 PM

#


```python
import requests
from bs4 import BeautifulSoup
import csv

# Define the URL
url = "https://izw1.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table using its attributes (modify as needed)
    table = soup.find('table', {'border': '1', 'width': '1500', 'bgcolor': '#ECFFFF'})

    # Open a CSV file in write mode
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')

        # Process all rows
        for row in table.find_all('tr'):
            # Extract data from each row
            row_data = [column.text.strip() for column in row.find_all(['td', 'th'])]

            # Write the row data to the CSV file
            writer.writerow(row_data)

    print("Data has been successfully written to output.csv")

else:
    print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
```

final kiln Feb 14, 2024, 8:18 PM

#

Here, use this ```

brave arch Feb 14, 2024, 8:18 PM

#

I do not get the subcolumn data

final kiln Feb 14, 2024, 8:18 PM

#

: P

past meteor Feb 14, 2024, 8:19 PM

#

brave arch ``` import requests from bs4 import BeautifulSoup import csv # Define the URL ...

Can you reformat this and use ``` as pedantic_propagation says

#

I also don't think this question is best suited for this room

#

Could you make a help thread?

brave arch Feb 14, 2024, 8:20 PM

#

past meteor Could you make a help thread?

done

brave arch Feb 14, 2024, 8:21 PM

#

brave arch done

please help me with code to get the subcolumn data in the appropriate format

iron basalt Feb 14, 2024, 8:21 PM

#

left tartan I think reality is: careers are shaped primarily by opportunity, some luck, and ...

I highly recommend being a generalist (if you have strong programming and math skills, you can tackle most things). Especially now that software is massively downsizing, they can't keep hyper-specialized people anymore.

#

(See who remains after the layoffs, it's not the specialists...)

#

Software is returning to where it was before, lots of generalists with many hats.

final kiln Feb 14, 2024, 8:23 PM

#

brave arch ``` import requests from bs4 import BeautifulSoup import csv # Define the URL ...

So what's going wrong?

I highly recommend you check the stuff you're receiving, like, just cuz you got a 200, and your headers say it should be html, it don't mean nothing, sometimes people do wtv >.>
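Here is a minimal sketch of "check what you actually received" before parsing. It checks the status code, the `Content-Type` header, and whether the body contains any `<table>` at all. It also matters for the code above, because `soup.find` returns `None` when no table matches those attributes, and `table.find_all` would then fail with an `AttributeError`. The function name is hypothetical. It takes plain values so that it runs without network access; with requests you would pass `response.status_code, response.headers, response.text`.

```python
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Count <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def check_response(status_code, headers, body):
    """Return reasons the response may not be the HTML table we expect (empty list = looks OK)."""
    problems = []
    if status_code != 200:
        problems.append(f"status {status_code}")
    content_type = headers.get("Content-Type", "")
    if "html" not in content_type.lower():
        problems.append(f"unexpected Content-Type {content_type!r}")
    counter = _TableCounter()
    counter.feed(body)
    if counter.tables == 0:
        problems.append("no <table> element in the body")
    return problems


print(check_response(200, {"Content-Type": "text/html"}, "<p>Access denied</p>"))
```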

brave arch Feb 14, 2024, 8:24 PM

#

final kiln So what's going wrong ? I highly recommend you check the stuff you're receivin...

I mean the webpage has data in subcolumns, and in the CSV output that data ends up under different column headers, which messes up the data

final kiln Feb 14, 2024, 8:25 PM

#


```python
import requests
from bs4 import BeautifulSoup
import csv

# Define the URL
url = "https://izw1.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table using its attributes (modify as needed)
    table = soup.find('table', {'border': '1', 'width': '1500', 'bgcolor': '#ECFFFF'})

    # Open a CSV file in write mode
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')

        # Process all rows
        for row in table.find_all('tr'):
            # Extract data from each row
            row_data = [column.text.strip() for column in row.find_all(['td', 'th'])]

            # Write the row data to the CSV file
            writer.writerow(row_data)

    print("Data has been successfully written to output.csv")

else:
    print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
```

#

It's the same code, I need the syntax highlight

final kiln Feb 14, 2024, 8:26 PM

#

brave arch I mean the webpage has data in sub column and the csv output in different column...

But I still didn't understand

#

Like maybe show the page and the CSV

left tartan Feb 14, 2024, 8:28 PM

#

brave arch I mean the webpage has data in sub column and the csv output in different column...

Could you open a help thread? This channel can get noisy, and this sounds like it’s going to take more than a few back-and-forths. #❓｜how-to-get-help

left tartan Feb 14, 2024, 8:28 PM

#

final kiln Like maybe show the page and the CSV

It’s an html table, they’re trying to write to csv.

#

But unclear on whether the subtotals are in the same html table (I didn’t look at the raw html)

final kiln Feb 14, 2024, 8:28 PM

#

Yes I understood, I'm not understanding what went wrong

brave arch Feb 14, 2024, 8:29 PM

#

final kiln But I still didn't understand

final kiln Feb 14, 2024, 8:29 PM

#

Seeing the CSV will make it clear for me what is being described

left tartan Feb 14, 2024, 8:30 PM

#

I think the basic problem is that this isn’t a simple HTML table: colspan headers, multiple splits, etc.

brave arch Feb 14, 2024, 8:30 PM

#

so I think there should be a way to distinguish the subcolumns, and the column names must be processed to create as many names as there are columns/subcolumns

left tartan Feb 14, 2024, 8:30 PM

#

Yah, it’s the cold pans

#

Colspan

brave arch Feb 14, 2024, 8:31 PM

#

how do I fix this in my code so that I get the appropriate data?

final kiln Feb 14, 2024, 8:31 PM

#

But it looks fine

brave arch Feb 14, 2024, 8:31 PM

#

no, the "ICME Plasma/Field Start, End Y/M/D (UT)" column can be split into "ICME Plasma/Field Start Y/M/D (UT)" and "ICME Plasma/Field End Y/M/D (UT)"

#

currently the data goes under a different column header, which is Comp. Start, End (Hrs wrt. Plasma/

final kiln Feb 14, 2024, 8:32 PM

#

I see three columns

brave arch Feb 14, 2024, 8:33 PM

#

yes but the header is not mapped correctly and I need to fix my code

final kiln Feb 14, 2024, 8:33 PM

#

Four rows before the row with the text

brave arch Feb 14, 2024, 8:33 PM

#

missing the header

#

check the below row and it misses the sub column

#

so data is messed up

final kiln Feb 14, 2024, 8:34 PM

#

Below row it's still three columns

#

And choosing cells at random, they seem to match

#

Maybe I'm not seeing something, but the only thing missing is the first row, which is the header

left tartan Feb 14, 2024, 8:38 PM

#

#

The Comp Start values are actually the second column of the ICME Plasma/Field header

#

#

Because of the col span

#

```html
<thead>
<tr align="center"><td><b>Disturbance Y/M/D (UT) <A HREF="#(a)">(a)</a></b>  </td><td colspan="2">
<b>ICME Plasma/Field Start, End Y/M/D (UT) <A HREF="#(b)">(b)</a> </b> </td><td colspan="2">

<b>Comp. Start, End (Hrs wrt. Plasma/ Field) <A HREF="#(c)">(c)</a></b> </td><td colspan="2">
```

final kiln Feb 14, 2024, 8:39 PM

#

Oooh I see it

left tartan Feb 14, 2024, 8:40 PM

#

Just a terrible table design.

final kiln Feb 14, 2024, 8:40 PM

#

But can CSV represent this table?

left tartan Feb 14, 2024, 8:40 PM

#

Yah, the col span needs to be split into two headers

#

Meaning, the colspan=2 headers need to be unpacked into two headers
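Here is a minimal sketch of unpacking colspan headers: each header cell that spans k columns becomes k names. It appends `_1` … `_k` to each name so that the CSV headers stay unique. The suffix scheme is an assumption; names like "Start" and "End" would be more readable if you map them by hand. With BeautifulSoup, each `(text, colspan)` pair could come from `cell.get_text(strip=True)` and `int(cell.get('colspan', 1))`. The header texts below are shortened from the snippet above.

```python
def expand_header(cells):
    """Turn (text, colspan) header cells into one name per physical column."""
    names = []
    for text, colspan in cells:
        if colspan == 1:
            names.append(text)
        else:
            # One name per spanned column, suffixed so CSV headers stay unique
            names.extend(f"{text}_{i}" for i in range(1, colspan + 1))
    return names


header = [
    ("Disturbance Y/M/D (UT)", 1),
    ("ICME Plasma/Field Start, End Y/M/D (UT)", 2),
    ("Comp. Start, End (Hrs wrt. Plasma/Field)", 2),
]
print(expand_header(header))
```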

final kiln Feb 14, 2024, 8:41 PM

#

Wait, is the / meant to match the two columns

#

Like "name of first column/ name of second"

#

No it's the start, end

#

It's a date range or something

left tartan Feb 14, 2024, 8:43 PM

#

Yah, start/end I think

#

The funny thing is you can just open the HTML in Excel

final kiln Feb 14, 2024, 8:45 PM

#

I think even copying and pasting would work

left tartan Feb 14, 2024, 8:45 PM

#

Yah

brave arch Feb 14, 2024, 8:47 PM

#

any suggestion for code?

left tartan Feb 14, 2024, 8:47 PM

#

brave arch any suggestion for code ?

You have to rewrite to handle the colspans in the headers.

brave arch Feb 14, 2024, 8:48 PM

#

any suggestion?

#

what exactly?

left tartan Feb 14, 2024, 8:48 PM

#

You'll have to modify this step: `row_data = [column.text.strip() for column in row.find_all(['td', 'th'])]`

brave arch Feb 14, 2024, 8:49 PM

#

to?

left tartan Feb 14, 2024, 8:50 PM

#

You'll have to figure that out, I'm just telling you the problem and where to start.

brave arch Feb 14, 2024, 8:51 PM

#

Okay

#

```python
import requests
from bs4 import BeautifulSoup
import csv

# Define the URL
url = "https://izw1.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table using its attributes (modify as needed)
    table = soup.find('table', {'border': '1', 'width': '1500', 'bgcolor': '#ECFFFF'})

    # Open a CSV file in write mode
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')

        # Process all rows
        for idx, row in enumerate(table.find_all('tr')):
            # Extract data from each row
            if idx == 0:
                # Handle headers with colspans for both main columns and subcolumns
                header_row = []
                for cell in row.find_all(['td', 'th']):
                    colspan = int(cell.get('colspan', 1))
                    header_text = cell.text.strip()
                    if colspan > 1:
                        # If colspan is greater than 1, add the text multiple times
                        header_row.extend([header_text] * colspan)
                    else:
                        # Otherwise, just add the text once
                        header_row.append(header_text)
                writer.writerow(header_row)
            else:
                # Extract data from each cell in the row
                row_data = [column.text.strip() for column in row.find_all(['td', 'th'])]
                writer.writerow(row_data)

    print("Data has been successfully written to output.csv")

else:
    print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
```

#

This works for the first 4 rows but not for all of them. Any suggestions?

left tartan Feb 14, 2024, 9:03 PM

#

This is all GPT code, right?

brave arch Feb 14, 2024, 9:03 PM

#

not all .

#

some to get the suggestion

left tartan Feb 14, 2024, 9:04 PM

#

So, the first step is to figure out what is not doing what you want it to do.

#

You ask: "Any suggestion ?". What do you need help with in this version?

#

Or, said differently, you say it works for 1st 4 rows. What happens on 5th and 6th?

brave arch Feb 14, 2024, 9:06 PM

#

now with the updated code the first four rows work, but when it reaches the 5th row, which is again a column header, the colspan isn't handled and it messes up the data
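The code above only expands colspans when `idx == 0`. A header row repeated later in the table, or a data cell with a colspan, is therefore written one entry per `<td>` and shifts everything after it. Below is a sketch of the alternative: expand the colspan on every row, then drop rows identical to the first header row. It uses the standard-library `html.parser` instead of BeautifulSoup so that it runs standalone. The sample HTML and its placeholder values are made up, and it assumes cells are properly closed with `</td>`. Repeating a spanning data cell's text in each column is one choice; writing empty strings for the extra columns is another.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect table rows as lists of cell texts, repeating each cell `colspan` times."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the <tr> being parsed
        self._cell = None   # text fragments of the <td>/<th> being parsed
        self._span = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._span = int(dict(attrs).get("colspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())   # collapse whitespace
            self._row.extend([text] * self._span)          # one entry per physical column
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


SAMPLE = """
<table>
  <tr><td><b>Disturbance</b></td><td colspan="2"><b>ICME Start, End</b></td></tr>
  <tr><td>D1</td><td>S1</td><td>E1</td></tr>
  <tr><td><b>Disturbance</b></td><td colspan="2"><b>ICME Start, End</b></td></tr>
  <tr><td>D2</td><td colspan="2">n/a</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)
header, *body = parser.rows
data = [row for row in body if row != header]   # skip header rows repeated mid-table

buf = io.StringIO()   # stand-in for open('output.csv', 'w', newline='')
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(data)
print(buf.getvalue())
```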

left tartan Feb 14, 2024, 9:06 PM

#

What about 6th row?

brave arch Feb 14, 2024, 9:06 PM

#

not working, same problem as earlier

left tartan Feb 14, 2024, 9:06 PM

#

What's wrong with 6th row?

brave arch Feb 14, 2024, 9:09 PM

#

Let me fix it

left tartan Feb 14, 2024, 9:10 PM

#

The trick to debugging these problems is to add a few print statements, so you can see what's happening.
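Here is a sketch of the kind of print statement that makes this visible. For each row, compare the number of cells with the number of columns the cells actually span. Rows where the two differ are exactly the ones a one-entry-per-cell list comprehension writes too short. The rows are given as hypothetical `(text, colspan)` pairs; with BeautifulSoup they could be built from `(cell.get_text(strip=True), int(cell.get('colspan', 1)))`.

```python
def report_row_widths(rows):
    """Print cells-per-row vs. colspan-adjusted width; return indices where they differ."""
    suspicious = []
    for idx, row in enumerate(rows):
        n_cells = len(row)
        width = sum(span for _, span in row)
        marker = "  <-- has colspan" if n_cells != width else ""
        print(f"row {idx}: {n_cells} cells, spans {width} columns{marker}")
        if n_cells != width:
            suspicious.append(idx)
    return suspicious


# Made-up rows shaped like the problem: the header repeats at index 2
rows = [
    [("Disturbance", 1), ("ICME Start, End", 2)],
    [("D1", 1), ("S1", 1), ("E1", 1)],
    [("Disturbance", 1), ("ICME Start, End", 2)],
    [("D2", 1), ("n/a", 2)],
]
flagged = report_row_widths(rows)
```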

#

Do you know what a colspan is?

short path Feb 14, 2024, 9:38 PM

#

I need help with some basic machine learning. I am trying to solve the Titanic prediction problem from Kaggle, but after imputation my training data somehow gets more rows, and then it doesn't match y_train

#

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# train_data: the Kaggle Titanic training DataFrame, loaded earlier (not shown)
X = train_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']]
y = train_data['Survived']

X_train, X_val, y_train, y_val = train_test_split(X, y)

# Encoding

oh_enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

oh_X_train = pd.DataFrame(oh_enc.fit_transform(X_train[['Sex']]))
oh_X_val = pd.DataFrame(oh_enc.transform(X_val[['Sex']]))

# pd.concat(axis=1) lines rows up by index label, not by position
X_train_encoded = pd.concat([X_train.drop('Sex', axis=1), oh_X_train], axis=1)
X_val_encoded = pd.concat([X_val.drop('Sex', axis=1), oh_X_val], axis=1)

X_train_encoded.columns = X_train_encoded.columns.astype(str)
X_val_encoded.columns = X_val_encoded.columns.astype(str)

# Imputation

imputer = SimpleImputer()

imputed_train_data = pd.DataFrame(imputer.fit_transform(X_train_encoded))
imputed_test_data = pd.DataFrame(imputer.transform(X_val_encoded))

imputed_train_data.index = X_train_encoded.index
imputed_test_data.index = X_val_encoded.index

imputed_train_data.columns = X_train_encoded.columns
imputed_test_data.columns = X_val_encoded.columns
```

#

I put a `X_train_encoded.describe()` after the encoding and it says the dataframe has 668 rows at that point, which is what it should have

#

But when I do this after the imputation, for some reason it shows the df with a varying number of rows, around 830, and this number changes a little every time I restart the kernel. At the end of the program, when trying to fit a model, I get this error: `ValueError: Found input variables with inconsistent numbers of samples: [838, 668]`

#

Do you have any idea about what it could be?

#

Is it wrong to do the imputation after encoding?
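One explanation fits every symptom described above, though nobody confirms it in the thread. `train_test_split` shuffles the rows but keeps their original index labels, while `pd.DataFrame(oh_enc.fit_transform(...))` starts a fresh 0..n-1 index. `pd.concat(..., axis=1)` aligns rows by index label, so the result covers the union of both indexes and fills the gaps with NaN. `describe()` reports the non-null `count`, so it still shows 668 right after encoding. Once `SimpleImputer` fills the NaNs, every row counts, and the total depends on the random split.

The sketch below reproduces this with a made-up 891-row frame and uses `df.sample` as a stand-in for `train_test_split`. It shows the likely fix: pass `index=X_train.index` (and `index=X_val.index` for the validation frame) when wrapping the encoder output.

```python
import pandas as pd

# Made-up stand-in for the Titanic training frame (891 rows, default 0..890 index)
df = pd.DataFrame({
    "Pclass": [1, 2, 3] * 297,
    "Sex": ["male", "female", "female"] * 297,
})

# Like train_test_split: a shuffled 75% sample that KEEPS the original index labels
X_train = df.sample(frac=0.75, random_state=0)

# Encoder output wrapped in a new DataFrame gets a fresh 0..n-1 index
encoded = pd.DataFrame(pd.get_dummies(X_train["Sex"]).to_numpy())

# Buggy: concat(axis=1) aligns on index labels -> union of both indexes, NaN in the gaps
bad = pd.concat([X_train.drop(columns="Sex"), encoded], axis=1)

# Fix: give the encoded frame the same index as X_train before concatenating
encoded_fixed = pd.DataFrame(encoded.to_numpy(), index=X_train.index)
good = pd.concat([X_train.drop(columns="Sex"), encoded_fixed], axis=1)

print(len(X_train), len(bad), len(good))
print(bad["Pclass"].count())   # the non-null count that describe() reports
```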

serene scaffold Feb 14, 2024, 10:00 PM

#

short path Is it wrong to do the imputation after encoding?

This is the kind of question that beginners think to ask. And the answer is that pros don't think about situations like this

short path Feb 14, 2024, 10:04 PM

#

serene scaffold This is the kind of question that bwginners think to ask. And the answer is that...

That's interesting

#

But then I can't think of an actual problem with the imputation I did

#

even if I take out the parts where I set the index and the columns, it keeps adding rows

serene scaffold Feb 14, 2024, 10:06 PM

#

Encoding is about making sure that the same information is represented the same way, and that information is represented in a way that is intelligible by the model

#

Both of those concerns are equal

#

Imputation is about filling in missing information with whatever would be least interesting to the model
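As a small illustration of that point, mean imputation replaces missing values in place and cannot change the number of rows. Mean is `SimpleImputer`'s default strategy, and plain pandas `fillna` is used here as an equivalent. So if the row count changes, it has to happen at an earlier step. The values below are made up.

```python
import pandas as pd

# Made-up frame with one missing Age
X = pd.DataFrame({"Age": [22.0, None, 38.0], "Fare": [7.25, 8.05, 71.28]})

# Mean imputation: fill each column's NaNs with that column's mean
imputed = X.fillna(X.mean())

print(X.shape, imputed.shape)
print(imputed["Age"].tolist())
```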

short path Feb 14, 2024, 10:08 PM

#

serene scaffold Imputation is about filling in missing information with whatever would be least ...

So it shouldn't add rows at all

#

that's very odd

serene scaffold Feb 14, 2024, 10:08 PM

#

No...

short path Feb 14, 2024, 10:08 PM

#

Could you try to run this program if I send you the dataset?

serene scaffold Feb 14, 2024, 10:08 PM

#

No.

#

I'm actually on vacation

#

I even promised my mom that I wouldn't answer questions on discord during it.

#

I don't live with my mom bte

short path Feb 14, 2024, 10:08 PM

#

lol

serene scaffold Feb 14, 2024, 10:08 PM

#

Btw

#

I just answer questions on discord when I'm at her house so I don't have to talk to her

#

So she assumes I do it all the time.

short path Feb 14, 2024, 10:09 PM

#

I get it

serene scaffold Feb 14, 2024, 10:09 PM

#

Yeah

short path Feb 14, 2024, 10:09 PM

#

but then I don't know what to do

serene scaffold Feb 14, 2024, 10:10 PM

#

Me either

#

Just don't give up

#

You can do it

#

Eventually

short path Feb 14, 2024, 10:10 PM

#

I could try to do it using get_dummies

#

probably it wouldn't give an error

#

but I want to see what the problem really is

serene scaffold Feb 14, 2024, 10:10 PM

#

That's basically like one hot encoding, I think

short path Feb 14, 2024, 10:10 PM

#

Yeah

short path Feb 14, 2024, 10:11 PM

#

short path but I want to see what the problem really is

But this error made me curious

#

It's very odd and I don't want to have it happening again

serene scaffold Feb 14, 2024, 10:11 PM

#

short path probably it wouldn't give an error

"getting an error" is just one of two overarching ways that your program can do something other than what you intended for it to do.

#

It can also do something other than what you intended, without raising an error

#

In which case you're fucked. Unless you know what you're doing.

short path Feb 14, 2024, 10:12 PM

#

Yeah

short path Feb 14, 2024, 10:16 PM

#

short path ```py X = train_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']] y = tra...

@left tartan do you have any idea what this could be?

#

serene scaffold Feb 14, 2024, 10:19 PM

#

short path @left tartan do you have any idea what could this be?

Please don't ping people who haven't engaged with your specific question to ask them for help. No one is on call to provide help.

short path Feb 14, 2024, 10:19 PM

#

Oh OK

#

my bad

crimson summit Feb 15, 2024, 2:50 AM

#

I just want to verify my intuition of why activation functions are necessary. For this example, let's consider a network that classifies the digits 0-9. A network WITHOUT an activation function will be able to do well on digits that are similar in size to the digits in the training set, but it will struggle if digits appear darker or lighter, because it is linear and cannot take both size and lightness/darkness into account. A neural network completing the same task but WITH activation functions will be able to take into account both orientation and lightness/darkness, because the weights will learn all possible relationships between the pixel values in the data set, and the sigmoid (or other activation function) will then take digits that are slightly lighter or darker and transform/smooth them so that the network can output the same probability as if they had the normal darkness/lightness. Does this intuition sound about right, or is it incorrect in some way?

burnt coral Feb 15, 2024, 4:03 AM

#

sorry, didn't mean to send the previous message. i'm using pytorch, and i'm not really sure on the format of the data on which my pretrained model was... trained. there's a lot of stuff in the code about making vocabs that are, to my knowledge, not actually torch vocab objects (it seems all custom?). i'm having trouble parsing it but it doesn't seem to be changing anything outside of something about ID correspondence and making a .pkl file. i want to fine tune it for a binary classification task. should i be worried about this data formatting? as far as i know the data is similar enough, at least in its raw form. would it be okay for me to just put it into a dataset and start training?

#

the creators of the pretrained model talk about implementing classification as a downstream task, if that changes the process at all

mild dirge Feb 15, 2024, 8:01 AM

#

crimson summit I just want to verify the my intuition of why activation functions are necessary...

Activation functions in general are necessary because otherwise you can only model linear functions. Any number of weight matrices A1, A2, A3, ... applied in sequence as the input passes through every layer, ...A3⋅(A2⋅(A1⋅X)), can also be modeled using a single weight matrix A⋅X, where A = ...A3⋅A2⋅A1 @crimson summit

#

Because in the end each output is just a linear combination of the inputs

#

In the output layer an activation can be necessary because, e.g., you want the outputs to sum up to 1, since they need to form a probability distribution. That is why you would use softmax, for example.
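Here is a minimal softmax for that output-layer case. It exponentiates the raw scores and normalizes them, so the results are positive and sum to 1. Subtracting the maximum first is a standard trick against overflow and does not change the result. The logits are made up.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that are positive and sum to 1."""
    m = max(logits)                         # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs], sum(probs))
```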

eternal pawn Feb 15, 2024, 9:25 AM

#

Hey guys, need some help regarding missing data. I know it's all project-specific, but what would you do here? I am still new to this so I'm still learning. I'm using for name it would be easier to just make them all missing. But for the rest? I'm trying to do credit score classification from Kaggle, https://www.kaggle.com/datasets/parisrohan/credit-score-classification

gritty vessel Feb 15, 2024, 9:46 AM

#

I am trying to predict the next two days' values

#

can anyone review my code please

final kiln Feb 15, 2024, 10:08 AM

#

I'm updating my resume, and this is the stuff I've been doing, and it reminded me of yesterday's conversation

looking at this, it feels like coding the model and working directly with the data is a small part of what needs to be done to get these things going. Either that or I'm doing something wrong, but I don't really see how I'm gonna do this without all the infra, and when the infra is done all I'm doing is sort of waiting around (I use that time to do other stuff, like updating my resume, etc.)

#

like, once the infra is done, I trigger some runs and wait for it to do its thing

#

then I take the results, review what I did right or wrong, go back to the data and repeat the process

#

once that is done I'll be doing deployments

#

what I'm saying is that rn to me it feels like the MLOps stuff is 90% of the work that needs to happen to train a model

wooden sail Feb 15, 2024, 10:18 AM

#

that sounds about right for practical jobs and is in line with what we were discussing yesterday with zestar and the others

#

we use ml extensively where i work too, but we do none of this other than things like in your first bullet point :p

final kiln Feb 15, 2024, 10:19 AM

#

wooden sail we use ml extensively where i work too, but we do none of this other than things...

how do you do it then tho

#

like do you just get a super expensive instance and train stuff there

wooden sail Feb 15, 2024, 10:20 AM

#

we do math on paper, run our bad code on the university compute cluster (e.g. LSF or Slurm), and publish the results in papers

final kiln Feb 15, 2024, 10:21 AM

#

wooden sail we do math on paper, run our bad code on the university compute cluster (e.g. ls...

ok yeah if you have access to a cluster I can see that

#

is it like a super computer

wooden sail Feb 15, 2024, 10:22 AM

#

yeah

final kiln Feb 15, 2024, 10:22 AM

#

I think we have a couple in my country, there was a cluster in my faculty but it was small stuff

wooden sail Feb 15, 2024, 10:23 AM

#

this one is not huge either, but the nodes add up to a couple dozen A100s or so

#

and the nature of the work is more about reformulating problems and showing why some approaches should be better. actually running code is more to show evidence

#

or at least that's how i see it :p maybe my boss hates me haha

final kiln Feb 15, 2024, 10:27 AM

#

wooden sail this one is not huge either, but the nodes add up to a couple dozen A100s or so

not enough to train LLaMA but still quite a lot

#

like, if you read their paper they were very concerned with performance and doing all these crazy optimizations

#

which is my next thing to do

#

I'm essentially emulating their challenge but on a smaller scale

#

my constraint is a 16 GB GPU in the cloud, which is not a lot for training even small transformers
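Here is a back-of-the-envelope sketch of why 16 GB is tight. It assumes plain fp32 training with Adam: 4 bytes of weights, 4 of gradients and 8 for Adam's two moment estimates, about 16 bytes per parameter. Activations, the CUDA context and allocator overhead come on top of that, so these figures are lower bounds. The model sizes are illustrative round numbers, not specific models.

```python
def training_memory_gb(n_params, bytes_per_param=16):
    """Rough lower bound (GB = 1e9 bytes) for weights + gradients + Adam state.

    16 bytes/param assumes fp32 weights (4) + fp32 grads (4) + two fp32 Adam moments (8).
    Activations and framework overhead are NOT included.
    """
    return n_params * bytes_per_param / 1e9


for name, n in [("125M", 125e6), ("350M", 350e6), ("1.3B", 1.3e9)]:
    print(f"{name} params: ~{training_memory_gb(n):.1f} GB before activations")
```

Under this rule of thumb, a 16 GB card is already full at around one billion parameters, before a single activation is stored.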

wooden sail Feb 15, 2024, 10:30 AM

#

yeah that's rough

final kiln Feb 15, 2024, 10:30 AM

#

I'm writing it to be scalable; eventually, if I decide to, I can just add more GPUs at will

#

I think it's a good play to start in an artificially constrained setup because that encourages me not to be wasteful of the available resources

wooden sail Feb 15, 2024, 10:33 AM

#

yeah. we do a fair amount of that too, since faster and less mem is always a selling point

muted iron Feb 15, 2024, 10:58 AM

#

ASDSC

gritty vessel Feb 15, 2024, 12:01 PM

#

hey I trained multiple regression models on an Ethereum historical dataset and all of them are giving good results

#

is it time to get rich?

left tartan Feb 15, 2024, 12:06 PM

#

gritty vessel hey I trained multiple regression models on ethereum historicaldataset and all o...

Now think about risk management

river cape Feb 15, 2024, 12:11 PM

#

What's the importance of the p-value in multiple linear regression?

mild dirge Feb 15, 2024, 12:54 PM

#

gritty vessel hey I trained multiple regression models on ethereum historicaldataset and all o...

Test it on future data to make sure it will work well in the future 👌🏽
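One way to act on "test it on future data" for price series is walk-forward evaluation. Each split trains only on points up to a cutoff, tests on the next `horizon` points (2 here, matching a two-day forecast), and then moves the cutoff forward. Nothing is shuffled, so the test data always comes after the training data. This is a minimal sketch with hypothetical parameter names; scikit-learn's `TimeSeriesSplit` covers a similar idea.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) where every test index comes after every train index."""
    cutoff = initial_train
    while cutoff + horizon <= n_samples:
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))
        cutoff += horizon   # roll the window forward; never shuffle time series


splits = list(walk_forward_splits(n_samples=10, initial_train=6, horizon=2))
for train, test in splits:
    print(f"train on 0..{train[-1]}, test on {test}")
```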

desert oar Feb 15, 2024, 1:10 PM

#

river cape Whats the importance of p-value? in multiple linear regression

importance in what context?

agile owl Feb 15, 2024, 1:23 PM

#

what's the least performance-impacting way to do bounds checking on the size of a CUDA array so that I can implement some kind of mitigation on my program's concurrent memory usage

final kiln Feb 15, 2024, 1:31 PM

#

agile owl what's the least performance-impacting way to do bounds checking on the size of ...

What do you mean by bound checking the size ?

#

Like prevent out of bounds access of a memory address ?

agile owl Feb 15, 2024, 1:33 PM

#

prevent it from trying to allocate too much CUDA ram using something like thread blocking or smth

#

my model fits in memory but I'm using async on parts of my program

#

so it can be loaded twice I think

#

in which case being in memory twice doesn't fit

#

so I need to wait for the first model to be deallocated

#

in principle though let's say you have many GPUs I only want it to block if the next action would overallocate to CUDA

final kiln Feb 15, 2024, 1:38 PM

#

If you know how much memory each model occupies on the GPU (accounting for the stuff that happens when calculations are being made), you can use redis to store how many models have been loaded to GPU

#

Redis can act as a lock for multi processing and multi threading stuff since it's single threaded and does one thing at a time
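Redis, as suggested, is the cross-process option. Within a single asyncio process, the same "only proceed if it fits" idea can be sketched with an `asyncio.Condition`. Each task reserves an estimated number of bytes before loading a model and waits while the reservation would exceed the budget. The `MemoryBudget` class is hypothetical, and the byte estimates are yours to supply, including working memory during execution. In PyTorch, `torch.cuda.mem_get_info()` reports free and total device memory and could help size the budget. For several GPUs you could keep one budget per device.

```python
import asyncio
from contextlib import asynccontextmanager


class MemoryBudget:
    """Let coroutines reserve part of a fixed memory budget, waiting while it is exhausted."""

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.used = 0
        self._cond = asyncio.Condition()   # create inside the running event loop

    @asynccontextmanager
    async def reserve(self, nbytes):
        if nbytes > self.total:
            raise ValueError("request exceeds the whole budget")
        async with self._cond:
            # Block until this reservation fits alongside the current ones
            await self._cond.wait_for(lambda: self.used + nbytes <= self.total)
            self.used += nbytes
        try:
            yield
        finally:
            async with self._cond:
                self.used -= nbytes
                self._cond.notify_all()   # wake waiters so they re-check the budget


async def load_and_run(budget, nbytes, observed):
    async with budget.reserve(nbytes):
        observed.append(budget.used)
        await asyncio.sleep(0.01)   # stand-in for loading a model and running it


async def main():
    budget = MemoryBudget(total_bytes=10)   # hypothetical units
    observed = []
    await asyncio.gather(*(load_and_run(budget, 6, observed) for _ in range(3)))
    return observed


observed = asyncio.run(main())
print(max(observed))
```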

agile owl Feb 15, 2024, 1:39 PM

#

good idea thx

#

speaking of which, I wish there were something like Spark for the GPU

#

that's where I really need the memory mgmt

final kiln Feb 15, 2024, 1:41 PM

#

I think PyTorch has built-in functionality for handling multiple GPUs

#

Something something nn.DataParallel, idk

agile owl Feb 15, 2024, 1:41 PM

#

i don't mean just that part, I mean the memory management part

#

like it's not smart enough to say "no more right now"

#

the way Spark is

final kiln Feb 15, 2024, 1:41 PM

#

Yeah that would be useful for sure

#

But many times it's hard to predict because it's not just the memory of the model, it's the allocations that happen in between when the graph is being executed

#

Got an interview in 10min or so

#

Just wanna get it over with, these things give me anxiety ._.

jagged latch Feb 15, 2024, 1:52 PM

#

final kiln Got an interview in 10min or so

Nice. What job is it for?

final kiln Feb 15, 2024, 1:52 PM

#

jagged latch Nice. What job is it for?

ML Engineer

#

It's gonna be a physics problem or something of the sort

jagged latch Feb 15, 2024, 1:52 PM

#

final kiln It's gonna be a physics problem or something of the sort

GTrain LTrain

final kiln Feb 15, 2024, 1:52 PM

#

jagged latch GTrain LTrain

Ty

#

Ngl, failing this interview would be a bit of a hit on my pride 🥲

#

But I'm rusty on physics, haven't touched it in 2 years

jagged latch Feb 15, 2024, 1:53 PM

#

final kiln But I'm rusty on physics have t touched it in 2 years

How important is Physics in ML?

final kiln Feb 15, 2024, 1:54 PM

#

jagged latch How important is Physics in ML?

In this case it's important because it relates to the company's core product

jagged latch Feb 15, 2024, 1:54 PM

#

final kiln In this case it's important because it relates to the companies core product

Which is?

final kiln Feb 15, 2024, 1:54 PM

#

jagged latch Which is?

Using simulation data to train models that make the simulation unnecessary

#

Physics simulations tend to be costly, and some can't even be put on a GPU without a ton of simplifying assumptions
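Here is a toy sketch of the shape of that idea, usually called a surrogate model. Run the costly simulator offline on a grid of inputs, then answer new queries from a cheap model fitted to those runs. The "simulator" is a deliberately simple time-stepped, drag-free projectile, and the cheap model is piecewise-linear interpolation rather than a trained ML model. Both are assumptions for brevity, not what any particular company does.

```python
import bisect
import math


def simulate_range(angle_deg, speed=20.0, g=9.81, dt=1e-4):
    """Toy 'expensive' simulator: time-step a drag-free projectile until it lands."""
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    while True:
        vy -= g * dt   # semi-implicit Euler: update velocity, then position
        x += vx * dt
        y += vy * dt
        if y < 0.0:
            return x


# "Training data": one costly simulator run per grid point, done once up front
ANGLES = list(range(5, 90, 5))
RANGES = [simulate_range(a) for a in ANGLES]


def surrogate_range(angle_deg):
    """Cheap stand-in: piecewise-linear interpolation between precomputed runs."""
    i = bisect.bisect_left(ANGLES, angle_deg)
    i = min(max(i, 1), len(ANGLES) - 1)   # clamp; outside the grid this extrapolates
    a0, a1 = ANGLES[i - 1], ANGLES[i]
    r0, r1 = RANGES[i - 1], RANGES[i]
    return r0 + (r1 - r0) * (angle_deg - a0) / (a1 - a0)


print(surrogate_range(42), simulate_range(42))
```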

#

My MSc thesis was physics simulation, but of a different kind to the ones they seem to be doing, which relates more to structural engineering

#

My stuff was simulation of elementary particles like photons and electrons

#

The first time they started doing them was actually at Los Alamos during the Manhattan Project

#

They had to simulate neutrons and such

#

Aight gotta go

jagged latch Feb 15, 2024, 1:58 PM

#

GTrain LTrain

jagged latch Feb 15, 2024, 2:16 PM

#

I have a question. Does anyone experienced with Plotly Dash know how to update 'active_cell' with the new dataframe, so that the cell contents in the data_table are read from the new data rather than the old data?

lofty thorn Feb 15, 2024, 2:18 PM

#

trimmed mean: first sort the numbers, then remove the 1st and last numbers, then find the mean of what's left... did i understand it right?

jagged latch Feb 15, 2024, 2:23 PM

#

jagged latch I have a question. Does anyone experienced in Plotly Dash know how to update 'ac...

To give a better idea of the problem, imagine you have a table with 1 column and the rows have the values A, B, C, D. You then press a button that performs an operation that changes the values from A, B, C, D to 1, 2, 3, 4. Now you go to click the cell with the value 2, but the contents have it read as B, even though you can clearly see the 2 on that table. What would be the best way to solve this?

wooden sail Feb 15, 2024, 2:28 PM

#

lofty thorn trimmed mean-- first sort the numbers and then removing 1st and last number and ...

not necessarily only the first and last, but yes

trim saddle Feb 15, 2024, 2:28 PM

#

jagged latch To give a better idea of the problem, imagine you have a table with 1 column an...

You have to change the underlying data of the table. So some kind of callback that updates the underlying data when the cells change
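A minimal sketch of the reading side, assuming the table is not sorted, filtered or paginated (otherwise `active_cell["row"]` may not index the full `data` list). The idea: the callback that reads the clicked cell takes the table's current `data` as a `State`, so it sees whatever the button callback last wrote to the table's `data` prop instead of a dataframe captured earlier. The helper `read_active_cell` and the component ids in the comment are hypothetical; the dict shape mirrors the `row`/`column_id` keys Dash's DataTable reports in `active_cell`.

```python
def read_active_cell(active_cell, data):
    """Return the value under the active cell, looked up in the table's *current* rows.

    active_cell: dict like {"row": 1, "column": 0, "column_id": "value"},
                 or None when no cell is selected.
    data: the table's current `data` prop, a list of row dicts.
    """
    if not active_cell:
        return None
    return data[active_cell["row"]][active_cell["column_id"]]


# Hypothetical wiring inside a Dash app (ids "table" and "cell-output" are made up):
# @app.callback(Output("cell-output", "children"),
#               Input("table", "active_cell"),
#               State("table", "data"))
# def show_cell(active_cell, data):
#     return read_active_cell(active_cell, data)

old_rows = [{"value": "A"}, {"value": "B"}, {"value": "C"}, {"value": "D"}]
new_rows = [{"value": 1}, {"value": 2}, {"value": 3}, {"value": 4}]
cell = {"row": 1, "column": 0, "column_id": "value"}
print(read_active_cell(cell, old_rows), read_active_cell(cell, new_rows))
```

Because the lookup goes through the `data` passed in, clicking the second row after the update returns 2 rather than B.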

lofty thorn Feb 15, 2024, 2:32 PM

#

wooden sail not necessarily only the first and last, but yes

ok..also i didn't get what is meant by 'p' in the formula?

#

do i take the largest no. or smallest

#

in place of 'p'

wooden sail Feb 15, 2024, 2:36 PM

#

lofty thorn ok..also i didn't get what is meant by 'p' in the formula?

what i said is exactly what the p means

#

.latex sigma notation means the following:
\[
\sum_{i=n}^{N} x_i = x_n + x_{n+1} + x_{n+2} + \cdots + x_{N-1} + x_{N}
\]

wooden sail Feb 15, 2024, 2:38 PM

#

so if you add p to n, and subtract p from N, it means "ignore the p smallest and p largest entries, then take the mean"

#

p can be 1, but it can also be any other value
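In formula form, with the N values sorted as x_(1) ≤ x_(2) ≤ ... ≤ x_(N), the trimmed mean is (1/(N − 2p)) · Σ_{i=1+p}^{N−p} x_(i). A small sketch with a hypothetical helper name; Python slicing is 0-indexed, so `ordered[p:N - p]` keeps x_(1+p) through x_(N−p):

```python
def trimmed_mean(values, p):
    """Mean after discarding the p smallest and the p largest values."""
    n = len(values)
    if p < 0 or 2 * p >= n:
        raise ValueError("need 0 <= p and 2p < number of values")
    ordered = sorted(values)
    kept = ordered[p:n - p]  # drop p entries from each end
    return sum(kept) / len(kept)


data = [4, 1, 100, 3, 2]
print(trimmed_mean(data, 0))  # plain mean, pulled up by the outlier 100
print(trimmed_mean(data, 1))  # mean of [2, 3, 4]
```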

jagged latch Feb 15, 2024, 2:40 PM

#

trim saddle You have to change the underlying data of the table,. So some kind of call back,...

And that would be done by having it modify the layout?

lofty thorn Feb 15, 2024, 2:41 PM

#

wooden sail so if you add p to n, and subtract p from N, it means "ignore the p smallest and...

ohk

#

got it

final kiln Feb 15, 2024, 3:00 PM

#

Eh, I do believe it went almost to perfection

#

I wasn't able to answer a question about the boundary conditions of the Navier-Stokes equations on an airplane wing, but he said it was cool. I aced all the first questions

#

And there was a lot of time left at the end

#

We just kinda talked about the role

#

Tho I'm a bit embarrassed I didn't know that one

#

I also got a question about genetic algorithms wrong. I think I explained everything correctly, but I said something wrong when relating them to reinforcement learning

dusty forge Feb 15, 2024, 3:08 PM

#

Anyone here running a study group? Looking for one where people actually want and enjoy learning and working together on (mini) projects. Preferably based in the European timezone.

crimson summit Feb 15, 2024, 3:43 PM

#

mild dirge Activation functions in general are necessary because otherwise you can only mod...

I understand that if you don't use one the whole thing becomes linear, but if you do use an activation function, is the visual of the flow of numbers I described correct?

jagged latch Feb 15, 2024, 4:41 PM

#

trim saddle You have to change the underlying data of the table,. So some kind of call back,...

THX! I went with an interesting approach that saves a CSV somewhere and the program will read that newly created CSV when I select the cells.

jagged latch Feb 15, 2024, 4:43 PM

#

final kiln Eh, I do believe it went almost to perfection

Nice. Imagine if Sheldon was your interviewer though.

final kiln Feb 15, 2024, 4:44 PM

#

jagged latch Nice. Imagine if Sheldon was your interviewer though.

I really dislike that show >.<

odd meteor Feb 15, 2024, 4:44 PM

#

dusty forge Anyone here running a study group? Looking for one where people actually want an...

Idk about mini-projects, but if you're interested in NLP, join the Cohere Discord server. They have several niche-specific study groups in NLP.

MLCollective also has something like that, not just in NLP domain alone.

Just Google them, the link to their Discord community can always be found on their website.

desert oar Feb 15, 2024, 4:45 PM

#

odd meteor Idk about mini-projects but if you're interested in NLP, join Cohere discord ser...

I was just about to say, I really like the idea of a casual small study group with non-work colleagues. Thanks for these

jagged latch Feb 15, 2024, 4:45 PM

#

final kiln I really dislike that show >.<

Do you have a specific reason?

final kiln Feb 15, 2024, 4:48 PM

#

jagged latch Do you have a specific reason?

It's just a bunch of stereotypes. High levels of Intelligence also frequently come with high levels of empathy and emotional understanding. Sheldon is a myth as far as I know, and you can see that if you read up on real world geniuses

#

Physics and science students in general are also kinda just normal college students, they don't look or sound like the people on that show

desert oar Feb 15, 2024, 4:57 PM

#

it's a noxious combination of bad stereotypes, pop-culture pandering, and generally not being funny or interesting

proud river Feb 15, 2024, 5:08 PM

#

ai is cool

gritty vessel Feb 15, 2024, 5:36 PM

#

Guys anyone in research ?

#

I wrote my first paper today
So just looking for a review before submitting it

#

Did I write something wrong? 😶‍🌫️

crimson summit Feb 15, 2024, 5:40 PM

#

@wooden sail In a fully connected neural network that detects the digits 0-9, say two 7s are fed into the network, both exactly the same size and position, except one is slightly darker and the other slightly lighter. At neuron 1 (for example), the sigmoid will essentially make their activation values the same, something like 0.993 and 0.992, which lets both 7s be treated the same throughout the rest of the network and be classified correctly. Does this intuition sound right?

wooden sail Feb 15, 2024, 5:42 PM

#

crimson summit @wooden sail In a fully connected neural network that detects numbers ...

hmm there's several issues with that reasoning

#

the most important being that you wouldn't really know what value the network will output nor why

#

and there's no reason why the output values would be close to each other

#

0.50001 for one of them and 0.999999 for the other is still a correct classification

crimson summit Feb 15, 2024, 5:46 PM

#

wooden sail hmm there's several issues with that reasoning

at the end of the network there would be probabilities for each number, so if one is 0.50001 in the first layer it could cause the network to output a higher probability for another number. Whereas if the network has already learned the relationship between the pixel values for a 7 in that position, the sigmoid will essentially just transform the lighter 7 to take the same path as the normal 7 throughout the network

wooden sail Feb 15, 2024, 5:47 PM

#

idk what you mean by "path through the network", any input will have exactly the same operations done on it

#

classifiers do often have an output corresponding to a probability being assigned to each class. all you need is for one class to have a probability higher than the others for that to count as the class the network predicts

#

it doesn't matter if it's the largest by 0.9 or by 1e-15

#

also what you understand by "similarity" is not at all what the network will learn to treat as "similarity". what networks do usually has no real world interpretation that makes intuitive sense to people

#

it's just not the case that you can interpret what a neural network is doing in general. you'd be better served thinking of it as a function that maps an input to a categorical distribution, without wondering about the "how" for now

crimson summit Feb 15, 2024, 5:52 PM

#

wooden sail idk what you mean by "path through the network", any input will have exactly the...

pre-activation value for the lighter 7 = 4, pre-activation value for the darker 7 = 5. Without the sigmoid, the "lighter seven" would come up with a much different probability at the end of the network compared to the darker 7. But with the sigmoid, if the pixel values are only slightly different, it will make them 0.991 and 0.992, so when all the operations get done throughout the rest of the network they come out with the same probability

wooden sail Feb 15, 2024, 5:53 PM

#

crimson summit pre activation value for lighter 7 =4 pre activation value for darker 7 =5. W...

nope

#

there's no reason why the probability of the two 7s will be anywhere near each other

#

in fact the scenario you're describing is often enough to make a network guess the number incorrectly

crimson summit Feb 15, 2024, 5:54 PM

#

wooden sail there's no reason why the probability of the two 7s will be anywhere near each o...

if they're the same shape and size and the only difference is the shade, this doesn't happen?

wooden sail Feb 15, 2024, 5:54 PM

#

it can very well happen

crimson summit Feb 15, 2024, 5:56 PM

#

so then isn't that what the sigmoid does in this case?

wooden sail Feb 15, 2024, 5:56 PM

#

nope

#

all the sigmoid does is enforce that the outputs are between 0 and 1, and the softmax (the multidimensional form of the sigmoid) makes it so that the outputs are between 0 and 1 and add up to 1
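A quick check of those claims in plain Python (both functions written out by hand here, not taken from a library): the sigmoid squashes any real number into (0, 1), softmax turns a vector of scores into positive values that sum to 1, and a two-class softmax over [z, 0] gives back sigmoid(z) for the first class, which is the sense in which softmax generalizes the sigmoid. It also shows what pre-activations of 4 and 5 actually map to.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtracting the max avoids overflow and doesn't change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(4), sigmoid(5))  # ~0.982, ~0.993
probs = softmax([2.0, 1.0, -3.0])
print(probs, sum(probs))       # each entry in (0, 1), total ~1.0
print(softmax([3.0, 0.0])[0], sigmoid(3.0))  # same value
```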

crimson summit Feb 15, 2024, 6:07 PM

#

wooden sail all the sigmoid does is enforce that the outputs are between 0 and 1, and the so...

okay, let me try and rethink my logic to fit this, and I'll make another scenario, hopefully a better one

hoary halo Feb 15, 2024, 6:30 PM

#

Can anyone help me with a problem im having with chromadb in python? -
im using unstructured to chunk and embed files to my local instance of chromadb. i then query the chromadb and send the k chunks to an LLM for natural language processing, and get a result. This flow works well, but im at the point where i need to store metadata and filter by the metadata when querying.

I am inserting vendor invoices in pdf format into chroma, and then i need to query them later. This is obviously difficult with multiple invoices as chroma does not 'know' which one or ones i want to query. Therefore, i want to extract some data from the invoice into metadata (invoice #, payee, vendor name) so when i query i can use this to filter results. (example: give me the total of all invoices from VENDOR to PAYEE, or what is the total of invoice INVOICE_NUMBER)

does anyone here have any experience with this? am i barking up the wrong tree and there is an easier way to do this? at a certain point the docs for both chroma and unstructured kind of just drop off and stop being useful

#

i know its a long shot 😆
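One possible approach, sketched under assumptions: pull the fields out of each invoice's extracted text yourself before inserting, and attach the same metadata to every chunk from that invoice. The label formats in `PATTERNS` (`Invoice #`, `Vendor:`, `Bill To:`) are hypothetical and would need adapting to real invoices. Chroma's `collection.add` accepts a `metadatas` list alongside the documents, and `collection.query` accepts a `where` filter on those fields, as in the commented lines at the end (`chunks` and `chunk_ids` there are placeholders).

```python
import re

# Hypothetical label formats; real invoices vary, so these patterns need adjusting.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:#|No\.?|Number)\s*[:\-]?\s*([A-Za-z0-9\-]+)", re.I),
    "vendor_name": re.compile(r"Vendor\s*:\s*(.+)", re.I),
    "payee": re.compile(r"Bill\s*To\s*:\s*(.+)", re.I),
}


def extract_invoice_metadata(text):
    """Return a flat dict of the fields found; missing fields are simply left out."""
    meta = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            meta[key] = match.group(1).strip()
    return meta


sample = """ACME Supplies
Invoice #: INV-1042
Vendor: ACME Supplies Ltd
Bill To: Example Corp
Total: 1,250.00"""

meta = extract_invoice_metadata(sample)
print(meta)

# collection.add(documents=chunks, metadatas=[meta] * len(chunks), ids=chunk_ids)
# collection.query(query_texts=["total amount due"], n_results=5,
#                  where={"vendor_name": "ACME Supplies Ltd"})
```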

crimson summit Feb 15, 2024, 6:33 PM

#

So take a fully connected network trained to detect the digits 0-9 (digits are black and the image background is white). It is trained on digits of slightly different shapes, sizes and brightness. Through backprop the function (aka the neural net) learns to generalize across all digits 0-9. During testing we have two 7s of the same size and shape, but one 7 has a slightly different brightness that was not seen in training. When both of these 7s are input into the network, they get classified correctly. The reason they were both classified correctly is not just the sigmoid but a combination of the weights and the sigmoid: through backprop the weights learned relationships between the pixel values of all different types of 7s, so the one whose brightness appeared in training will obviously do well, and for the one whose brightness did not show up in training, the sigmoid helps this unseen brightness be treated the same as one seen in training by mapping positive values to nearly the same output, e.g. 4 and 5 get mapped to 0.991 and 0.993. So in conclusion, it's a combination of the sigmoid and all the learned weights from training on many different examples that allows this generalization to occur. @wooden sail does this seem like a better train of thought?

wooden sail Feb 15, 2024, 6:34 PM

#

crimson summit So a fully connected network that is trained to detect numbers (numbers are blac...

you're still getting the sigmoid part wrong

#

there's no reason two different instances of the same class would be classified with the same probability, neither through the affine transformations nor through the sigmoid

crimson summit Feb 15, 2024, 6:36 PM

#

wooden sail there's no reason two different instances of the same class would be classified ...

the probability won't be the same, but it'll be more similar than if the sigmoid wasn't there, right?

wooden sail Feb 15, 2024, 6:36 PM

#

no

#

if the sigmoid is not there you won't even have probabilities in the first place

crimson summit Feb 15, 2024, 6:38 PM

#

ohh right cause it's a linear transformation

#

without the sigmoid

wooden sail Feb 15, 2024, 6:39 PM

#

affine
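The distinction matters: two layers with no activation in between collapse into a single affine map, W₂(W₁x + b₁) + b₂ = (W₂W₁)x + (W₂b₁ + b₂). It's affine rather than linear because of the bias term. A small NumPy check with random weights (the shapes are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2: 4 -> 2


def two_layers_no_activation(x):
    return W2 @ (W1 @ x + b1) + b2


# The equivalent single affine layer
W, b = W2 @ W1, W2 @ b1 + b2


def one_affine_layer(x):
    return W @ x + b


x = rng.normal(size=3)
print(np.allclose(two_layers_no_activation(x), one_affine_layer(x)))  # True
```

Putting a nonlinearity such as the sigmoid between the layers is what stops this collapse.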

formal sky Feb 15, 2024, 6:40 PM

#

watching a tutorial and i don't think it was explained, why is X always capital?

#

Or better, is it always capitalized?

wooden sail Feb 15, 2024, 6:41 PM

#

need some more context, but capital bold letters usually represent matrices or tensors, while capital letters without boldface denote random variables

#

you'd have to show an example though, because notation only makes sense in context and varies by book/course/video. symbols in math don't have fixed meanings

crimson summit Feb 15, 2024, 6:42 PM

#

wooden sail affine

so without the sigmoid it's just a linear transformation, but with the sigmoid it helps give a similar probability to 7s of different brightness. But I can't compare and say the sigmoid gives a better probability than without it, because without the sigmoid there's no probability at all, just a linear transformation

verbal bay Feb 15, 2024, 6:42 PM

#

wooden sail need some more context, but capital bold letters usually represent matrices or t...

Hi! Are you able to help me with python?

formal sky Feb 15, 2024, 6:42 PM

#

wooden sail need some more context, but capital bold letters usually represent matrices or t...

Sorry forgot about context

wooden sail Feb 15, 2024, 6:42 PM

#

crimson summit so w out sigmoid its just linear transformation but with sigmoid it helps give s...

no, there's no reason why the probabilities would be similar

wooden sail Feb 15, 2024, 6:43 PM

#

formal sky Sorry forgot about context

here it just means a transformation was applied to x and it's no longer the original

crimson summit Feb 15, 2024, 6:44 PM

#

wooden sail no, there's no reason why the probabilities would be similar

but if a pre-activation value is 4, and the pre-activation value for the same digit in a darker shade is 5, then after the sigmoid the values will be similar, so when further operations are done their results will be kind of similar as well

formal sky Feb 15, 2024, 6:44 PM

#

Alright ty

crimson summit Feb 15, 2024, 6:44 PM

#

sorry if I'm being redundant, just trying to get this through my head

wooden sail Feb 15, 2024, 6:45 PM

#

crimson summit but if a pre activation value is 4 and a pre activation value of the same number...

yeah but why would they be 4 and 5 to begin with

#

what you hope is that, in most cases, you predict the class correctly. all that needs to happen for that is that the correct category gets the largest probability. nothing is said about the value of that probability

#

an output of [1,0,0] is just as valid as [0.35, 0.33, 0.32]

#

and you can't even guarantee that it'll always work. you'll have many cases where you get the wrong output too
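In code, the prediction is just the index of the largest probability, so both outputs from the message above give the same class. The helper name is illustrative:

```python
def predicted_class(probs):
    # index of the largest probability; ties resolve to the first maximum
    return max(range(len(probs)), key=lambda i: probs[i])


confident = [1.0, 0.0, 0.0]
hesitant = [0.35, 0.33, 0.32]
print(predicted_class(confident), predicted_class(hesitant))  # 0 0
```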

verbal bay Feb 15, 2024, 6:47 PM

#

wooden sail here it just means a transformation was applied to x and it's no longer the orig...

Can I get help please?

wooden sail Feb 15, 2024, 6:47 PM

#

not from me atm, sorry

verbal bay Feb 15, 2024, 6:48 PM

#

:/

crimson summit Feb 15, 2024, 6:49 PM

#

wooden sail yeah but why would they be 4 and 5 to begin with

let's say the darker 7's pixel values = 0.7, 0.3, 0.4, 0.5 and the lighter 7's pixel values = 0.6, 0.3, 0.2, 0.4. When these values are multiplied by the same configuration of weights that lead to neuron 1 in layer one and then passed through the sigmoid, they will have similar values

wooden sail Feb 15, 2024, 6:50 PM

#

not necessarily, and especially not if you have several layers

#

what you consider distance or similarity is not the same as what the network considers distance or similarity
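To make "not necessarily" concrete, here are the two pixel vectors from the message above pushed through one sigmoid neuron with two sets of weights. Both weight settings are made up for illustration, not learned from anything. With small uniform weights the outputs stay close; with larger weights and a bias that puts the neuron's threshold between the two inputs, the same small pixel difference flips the activation from near 1 to near 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    # one fully connected neuron: weighted sum plus bias, then sigmoid
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


darker = [0.7, 0.3, 0.4, 0.5]
lighter = [0.6, 0.3, 0.2, 0.4]

# Hypothetical weights A: small and uniform
w_a, b_a = [1.0, 1.0, 1.0, 1.0], 0.0
# Hypothetical weights B: larger, with a bias between the two weighted sums
w_b, b_b = [20.0, 0.0, 20.0, 20.0], -28.0

print(neuron(darker, w_a, b_a), neuron(lighter, w_a, b_a))  # ~0.870 vs ~0.818
print(neuron(darker, w_b, b_b), neuron(lighter, w_b, b_b))  # ~0.982 vs ~0.018
```

Stack several such layers and small input differences can grow or shrink in ways that don't match our intuitive notion of "similar".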

formal sky Feb 15, 2024, 6:53 PM

#

I am getting this error, the person who is doing the tutorial fixed it by running:

train, valid, test = np.split(df.sample(frac=1), [int(0.6*len(df)), int(0.8*len(df))])

but i cannot fix it by running that, what am i doing wrong? 🙂

#

oh wait, think i found the issue

#

yep found it, inside the scale_dataset i was passing the wrong args
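The tutorial's one-liner shuffles all rows and cuts them at 60% and 80% of the length, giving a 60/20/20 train/validation/test split. A sketch of the same split using `iloc` slicing on a small made-up DataFrame; this avoids calling `np.split` directly on a DataFrame, which recent pandas versions may warn about.

```python
import numpy as np
import pandas as pd

# Made-up data, just to show the split sizes
df = pd.DataFrame({"feature": np.arange(10), "label": np.arange(10) % 2})

shuffled = df.sample(frac=1, random_state=0)  # shuffle all rows (reproducibly)
n = len(shuffled)
cut1, cut2 = int(0.6 * n), int(0.8 * n)       # same cut points as the tutorial

train = shuffled.iloc[:cut1]
valid = shuffled.iloc[cut1:cut2]
test = shuffled.iloc[cut2:]
print(len(train), len(valid), len(test))  # 6 2 2
```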

crimson summit Feb 15, 2024, 7:12 PM

#

wooden sail what you consider distance or similarity is not the same as what the network con...

ohhh so the network could treat a 0.01 difference between activation values in neuron 1 as a massive distance. So as the sigmoid transforms both pre-activation values, if one is 0.01 smaller than the other it can result in a massive difference in estimated probability, but with the patterns learned in the weights it will still have the highest probability for its class and get classified correctly

#

so a similar activation value doesn't mean a similar probability, but a probability that will still be the biggest relative to all the other classes, so the classification can be correct

#

@wooden sail

wooden sail Feb 15, 2024, 7:15 PM

#

yeah

crimson summit Feb 15, 2024, 7:16 PM

#

wooden sail yeah

o shitttttttt thanks so much man ! Thanks for working with noobs like me !

final kiln Feb 15, 2024, 7:30 PM

#

Anyone know how to try Gemini 1.0 if you're in Europe ? I have the subscription thing but the model says it's not capable of generating images, which I thought was one of its capabilities

#

It's also clearly gpt 3.5 level

#

The Android app isn't available either

agile cobalt Feb 15, 2024, 7:35 PM

#

sometimes models might just hallucinate that they are or are not able to do something, and asking again in a new chat (and phrasing it differently, e.g. be more explicit like include "generate an image" in the start of the phrase) could work correctly, but there is also a huge chance it is just not available in Europe at all

final kiln Feb 15, 2024, 7:43 PM

#

See what I mean by gpt3.5 level

#

Ah no wait, I got misled by the "you're right" part

#

The text does kinda make sense still

final kiln Feb 15, 2024, 7:46 PM

#

agile cobalt sometimes models might just hallucinate that they are or are not able to do some...

I mean it's totally fine if it's not available, but it seems like they are saying it is when in fact it's not

agile cobalt Feb 15, 2024, 7:48 PM

#

one way or the other, that is closer to tech support than data science; may as well move to offtopic

final kiln Feb 15, 2024, 7:54 PM

#

Uhm, not sure if it's off topic, since I'm trying to figure out how Google is doing their AI model rollout

past meteor Feb 15, 2024, 7:55 PM

#

I gave one of our interns the job to play around with these for a couple of months

#

So we know the capabilities of stuff like goose.ai, quantized models etc.

final kiln Feb 15, 2024, 8:03 PM

#

OpenAI has also released a new model

#

https://openai.com/sora

Sora: Creating video from text

#

Looks insanely good

#

The dogs in the snow are my favorite

frigid terrace Feb 15, 2024, 8:12 PM

#

final kiln

It's like Gemini is insufficient in coding

final kiln Feb 15, 2024, 8:13 PM

#

frigid terrace It's like Gemini is insufficient in coding

Yeah I don't know what's up with it

trim saddle Feb 15, 2024, 8:16 PM

#

jagged latch THX! I went with an interesting approach that saves a CSV somewhere and the prog...

You can also use dcc.Store components

final kiln Feb 15, 2024, 8:19 PM

#

I really wanna work at open ai they're doing so much cool stuff 😭

#

https://cdn.openai.com/sora/videos/italian-pup.mp4

#

It's like those impossible drawings, the whole doesn't make sense but the details do

versed pilot Feb 15, 2024, 8:37 PM

#

final kiln See what I mean by gpt3.5 level

is gpt 3.5 the one that comes for free with Bing, which Microsoft is embedding into every one of their products?

odd meteor Feb 15, 2024, 8:37 PM

#

frigid terrace It's like Gemini is insufficient in coding

Just saw that they launched Gemini 1.5 today.

https://x.com/sundarpichai/status/1758145921131630989?s=46&t=sRKd79BJEKLsp89AJToMJw

I feel they probably added mixture of experts (MoE) and called it 1.5

Sundar Pichai (@sundarpichai) on X

In December, we launched Gemini 1.0 Pro. Today, we're introducing Gemini 1.5 Pro! 🚀

This next-gen model uses a Mixture-of-Experts (MoE) approach for more efficient training & higher-quality responses. Gemini 1.5 Pro, our mid-sized model, will soon come standard with a…

versed pilot Feb 15, 2024, 8:40 PM

#

a year ago they were saying this, not sure where they are now Next-generation OpenAI model. We’re excited to announce the new Bing is running on a new, next-generation OpenAI large language model that is more powerful than ChatGPT and customized specifically for search. It takes key learnings and advancements from ChatGPT and GPT-3.5 – and it is even faster, more accurate and more capable. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/

The Official Microsoft Blog

Reinventing search with a new AI-powered Microsoft Bing and Edge, y...

To empower people to unlock the joy of discovery, feel the wonder of creation and better harness the world’s knowledge, today we’re improving how the world benefits from the web by reinventing the tools billions of people use every day, the search engine and the browser. Today, we’re launching an all new, AI-powered Bing search...

final kiln Feb 15, 2024, 8:41 PM

#

versed pilot is gpt 3.5 the one that comes for free with Bing and Microsoft is embedding into...

As far as I know bing is using gpt 4

versed pilot Feb 15, 2024, 8:42 PM

#

ok makes sense, the blog entry I linked to is old

#

I had some fun with bing gpt recently. It refused to write a poem about an old cunning fox

#

because apparently there are Iranian poems that describe England as an old cunning fox

#

I had to ask in a roundabout way to convince it I am not a revolutionary guard 🙂

final kiln Feb 15, 2024, 8:44 PM

#

I just asked it, it did it

#

In the shadowed woods, under moon's soft gaze,
Lived an old fox, traversing life's complex maze.
With fur as red as the dying day's sun,
He moved in silence, his cunning second to none.

Through the thicket, under canopy's embrace,
He danced with shadows, a silent, fleeting grace.
Eyes gleaming bright in the dark of the night,
He was a specter, a ghost, just out of sight.

(...)

#

I truncated it cuz it goes for a long time

versed pilot Feb 15, 2024, 8:48 PM

#

The thing with GPT is that it doesn't do the same thing in a repeatable way. It second guessed me in an entirely wrong way here

final kiln Feb 15, 2024, 8:49 PM

#

versed pilot The thing with GPT is that it doesn't do the same thing in a repeatable way. It ...

Uhm bing might have a different system prompt, or be a different variation of gpt 4

versed pilot Feb 15, 2024, 8:50 PM

#

it's always an older dodgier version of gpt 🙂

odd meteor Feb 15, 2024, 9:11 PM

#

AI Twitter is buzzing today. What a terrific Thursday.

Sora
Gemini 1.5
V-JEPA

All in a day. And the day's not even over yet

hollow escarp Feb 15, 2024, 9:16 PM

#

Does anybody know a good way to perfectly time captions to ElevenLabs-generated voice using Python?

hollow escarp Feb 15, 2024, 10:19 PM

#

If yes please ping
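One generic approach, assuming you can get word-level start and end times for the generated audio (for example from a forced-alignment tool or a TTS API that returns timing data; the `(text, start, end)` tuple format below is a hypothetical input, not any particular API's output): group the words into short chunks and write them as SRT cues, which most players and video editors can load.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=4):
    """words: list of (text, start_sec, end_sec) tuples, in spoken order."""
    blocks = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]  # cue spans first word start to last word end
        text = " ".join(w[0] for w in chunk)
        blocks.append(f"{len(blocks) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


words = [("This", 0.0, 0.3), ("is", 0.35, 0.5), ("a", 0.55, 0.6),
         ("test", 0.65, 1.1), ("caption", 1.2, 1.9)]
print(words_to_srt(words))
```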

formal sky Feb 15, 2024, 10:58 PM

#

Does anyone know this tutorial? https://www.youtube.com/watch?v=i_LwzRVP7bg
Would it be good for learning some basics?

YouTube

freeCodeCamp.org

Machine Learning for Everybody – Full Course

Learn Machine Learning in a way that is accessible to absolute beginners. You will learn the basics of Machine Learning and how to use TensorFlow to implement many different concepts.

✏️ Kylie Ying developed this course. Check out her channel: https://www.youtube.com/c/YCubed

⭐️ Code and Resources ⭐️
🔗 Supervised learning (classification/MAGIC...

tidal scroll Feb 16, 2024, 4:56 AM

#

hi guys, I just want to ask a question. I read an interesting paper about transformer models and found out that the transformer has its own inverted version. Does anyone understand it? I need help understanding how it works

orchid forge Feb 16, 2024, 5:20 AM

#

I need help making a data analysis project

dusty forge Feb 16, 2024, 10:29 AM

#

formal sky Anyone knows this tutorial? https://www.youtube.com/watch?v=i_LwzRVP7bg And woul...

She talks a bit fast and the topic is physics, delta and gamma rays in energy, very interesting but half of the time no clue what the values need to be other than the same as hers 😄

rose crest Feb 16, 2024, 11:25 AM

#

hi guyz i am working on a college project, can anyone suggest / give me a learning link to train custom object classification using Tensorflow?

formal sky Feb 16, 2024, 11:40 AM

#

dusty forge She talks a bit fast and the topic is physics, delta and gamma rays in energy, v...

It's the dataset in question, but thanks for the insight. Do you happen to know another good tutorial with an intro to ML?

odd meteor Feb 16, 2024, 11:53 AM

#

formal sky It's the dataset in question, but thanks for the insight. By chance you know oth...

https://Kaggle.com/learn

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

odd meteor Feb 16, 2024, 11:54 AM

#

rose crest hi guyz i am working on a college project, can anyone suggest / give me a learni...

Check the TensorFlow website; it has a tutorials section. I think you'll find it there.

formal sky Feb 16, 2024, 11:55 AM

#

odd meteor https://Kaggle.com/learn

I was going for that after some intro video, i understand better when watching someone explain, i suppose it's related to being voiced idk

#

But i can try to give it a look again and see if i can understand text based

odd meteor Feb 16, 2024, 12:00 PM

#

formal sky I was going for that after some intro video, i understand better when watching s...

https://youtube.com/playlist?list=PL8P_Z6C4GcuVQZCYf_ZnMoIWLLKGx9Mi2&si=dQXDNFfk41Zvva6A

You can definitely find more videos that'll speak to you on a personal level on YouTube if this one isn't doing the job well 😀

YouTube

Tabular Data

This content is based on Machine Learning University (MLU) Accelerated Tabular Data class. Slides, notebooks and datasets are available on GitHub: https://gi...

rose crest Feb 16, 2024, 12:08 PM

#

odd meteor Check TensorFlow website there's a tutorial section therein. I think you'll find...

okay thanks

spark nimbus Feb 16, 2024, 12:36 PM

#

Using pyspark pandas, is there a way to do operations on dates lazily? (in my case, adding pd.offsets.MonthEnd)

dry geyser Feb 16, 2024, 1:00 PM

#

anyone with elastic py experience, and elastic in general, what is the best way to handle null fields gracefully?

#

pipeline? mappings/schema?

#

also, how can I make polars convert a column into a list of its own former values?

formal sky Feb 16, 2024, 1:21 PM

#

odd meteor https://youtube.com/playlist?list=PL8P_Z6C4GcuVQZCYf_ZnMoIWLLKGx9Mi2&si=dQXDNFfk...

Will give it a look, thanks 😉

dusty forge Feb 16, 2024, 1:59 PM

#

formal sky It's the dataset in question, but thanks for the insight. By chance you know oth...

I've decided on two main things: the ML course on Coursera by Andrew Ng (the specialization is paid, but the individual courses can be done for free), which is theory-heavy with math etc. To balance out the theory and frankly not get bored, I also chose the hands-on book from O'Reilly, in which you work on projects in every chapter. Anything I need to read up on in terms of math and stats, I'll use books for. Those single tutorials on YouTube are nice, but after doing several of them I came to the conclusion that I learned what to type, but not why and how it actually works, which feels like a risk of learning bad habits from the start.

thorn flame Feb 16, 2024, 3:23 PM

#

formal sky It's the dataset in question, but thanks for the insight. By chance you know oth...

Heard of AWS deepracer student league?

#

I'm also currently transitioning to the ML field and it's already blowing my mind lol

#

you get a chance to earn an Udacity nanodegree :)

formal sky Feb 16, 2024, 3:40 PM

#

thorn flame Heard of [AWS deepracer student league](https://docs.aws.amazon.com/deepracer/la...

Currently i am not a student, apparently can't join if i am not one :\

thorn flame Feb 16, 2024, 3:42 PM

#

Ah, okays

final kiln Feb 16, 2024, 4:33 PM

#

Why can't the GPU do unsigned ints

#

Why do I need to use int64 for indexing

#

I thought I had all this resolved

#

D :

lyric forge Feb 16, 2024, 4:58 PM

#

Can someone guide me with this exercise?

desert oar Feb 16, 2024, 5:36 PM

#

formal sky I was going for that after some intro video, i understand better when watching s...

there's a known effect where students claim to learn the best from passive lectures, but actually retain the info the least. i personally really enjoy lectures too, but it requires the learner to actively participate by pausing to take notes, following up with practice problems, etc.

#

i'm not familiar with this particular YT channel and it might be really good. but i also strongly encourage spending some time with hands-on practice projects and ideally also practice problems from a good textbook

desert oar Feb 16, 2024, 5:38 PM

#

lyric forge Can someone guide me with this exercise?

i assume you're not supposed to use a dedicated csv reader, right?

#

without thinking too hard about code, how would you do it? if you had to just explain it in words.

formal sky Feb 16, 2024, 6:05 PM

#

desert oar there's a known effect where students claim to learn the best from passive lectu...

Uh, didn't know that. Do you recommend any books for beginners? Or is Kaggle enough?

wooden sail Feb 16, 2024, 6:15 PM

#

i'll share this here, it might be interesting to some of you. when doing constrained optimization, one interior point method is to make the problem unconstrained by adding "barrier functions", among which log barriers are common. they explode to infinity when you come close to a specific value, so they're a good way of enforcing an inequality. going past that point in a single step, however, gets you either a complex number or a nan, depending on how the log is implemented. in the case of pytorch you get a nan, but the gradient seems to be hardcoded as 1/x even for values of x outside the domain of log(x) over the reals https://github.com/pytorch/pytorch/issues/76516
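A minimal sketch of a log barrier on a toy 1-D problem (the problem is an assumption for illustration, not taken from the linked issue): minimize (x + 1)² subject to x > 0 by minimizing (x + 1)² − μ·log(x) instead. The backtracking loop shrinks each step until the next iterate stays strictly positive and decreases the objective, so log is never evaluated outside its real domain and the gradient term −μ/x is never used there. For this problem the barrier minimizer has the closed form x*(μ) = (−1 + √(1 + 2μ))/2, which approaches the constrained optimum x = 0 as μ shrinks.

```python
import math


def objective(x, mu):
    # (x + 1)^2 with the constraint x > 0 replaced by a log barrier
    return (x + 1.0) ** 2 - mu * math.log(x)


def gradient(x, mu):
    return 2.0 * (x + 1.0) - mu / x


def barrier_descent(x0, mu, lr=0.1, steps=200):
    x = x0
    for _ in range(steps):
        g = gradient(x, mu)
        step = lr
        # Halve the step until the candidate is strictly feasible (x > 0)
        # and gives sufficient decrease (Armijo condition).
        while True:
            candidate = x - step * g
            if candidate > 0.0 and objective(candidate, mu) <= objective(x, mu) - 0.5 * step * g * g:
                break
            step *= 0.5
        x = candidate
    return x


for mu in (1.0, 0.1, 0.01):
    exact = (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0
    print(mu, barrier_descent(1.0, mu), exact)
```

Shrinking μ (and warm-starting from the previous solution) is the usual way to follow this path toward the constrained solution.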

final kiln Feb 16, 2024, 7:09 PM

#

#

#

I might've been using models that are too large

#

but I have yet to see what's gonna happen on that final slope

#

doesn't matter where it is now, if it converges to a flatter slope, blue will catch up

ornate ledge Feb 16, 2024, 8:22 PM

#

Hey guys, newbie here. Last year I started to get involved in data analysis using Python, something pretty basic like using pandas on a df in a notebook. Instead of getting deeper into data analysis, I got more interested in Python itself. To be honest, I was even embarrassed by how I was "writing" code before, not even using functions or error handling. Anyway, I read a couple of books, Python Crash Course and John Zelle's Python book, which was a bit more formal. Now my code is a bit more modular and follows PEP 8. The last project I made was scraping about 20k data entries from a website and also creating a recommendation system for anime based on similarity in a database.

Now I was diving into a book more related to web scraping and I got frustrated because it started to use classes and objects. Any recommendations for dealing with these more "advanced" concepts? Probably pretty basic, but I'm 28 and migrating from a health career.

serene scaffold Feb 16, 2024, 8:30 PM

#

ornate ledge Hey guys newbie here, last year I started to get involved in data analysis using...

if you aren't embarrassed by your old code, then you aren't growing as a programmer.

#

anyway, the data science side of Python applies OOP differently than "normal" python. we don't often create our own classes. and when we do, it's really just extending the interface of a library like pytorch.
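A minimal sketch of "extending the interface of a library". scikit-learn is used here instead of PyTorch only to keep the example small. The hypothetical `QuantileClipper` transformer subclasses `BaseEstimator` and `TransformerMixin`, which gives it `get_params` and `fit_transform` without extra code and lets it sit inside a `Pipeline` next to built-in steps.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class QuantileClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each column to quantiles learned in fit()."""

    def __init__(self, lower=0.01, upper=0.99):
        # scikit-learn convention: only store constructor arguments here.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned state gets a trailing underscore, as in built-in estimators.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.low_, self.high_)


if __name__ == "__main__":
    X = np.array([[0.0], [1.0], [2.0], [100.0]])
    pipe = make_pipeline(QuantileClipper(lower=0.0, upper=0.5), StandardScaler())
    print(pipe.fit_transform(X))
```

The class only supplies `fit` and `transform`; everything else (parameter handling, `fit_transform`, pipeline compatibility) comes from the library's base classes, which is the pattern the message describes.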

trail summit Feb 16, 2024, 8:31 PM

#

hi guys

#

is there anyone who would be kind enough to help me a little

serene scaffold Feb 16, 2024, 8:32 PM

#

ornate ledge Hey guys newbie here, last year I started to get involved in data analysis using...

you might watch this tutorial series https://www.youtube.com/watch?v=ZDa-Z5JzLYM

trail summit Feb 16, 2024, 8:32 PM

#

im new to all this

#

thanks in advance

#

please ping me if you can ;-;

serene scaffold Feb 16, 2024, 8:32 PM

#

trail summit is there anyone who would be kind enough to help me a little

always ask your actual question right out of the gate. don't wait for a commitment or spread your explanation out over multiple messages.

trail summit Feb 16, 2024, 8:32 PM

#

serene scaffold always ask your actual question right out of the gate. don't wait for a commitme...

oh

#

thx :)

#

how do i put this

serene scaffold Feb 16, 2024, 8:33 PM

#

trail summit how do i put this

post the code as text (no screenshots) and explain how it's different from what you want.

trail summit Feb 16, 2024, 8:33 PM

#

im using a python backend for my react native project and basically it uses the phone's acceler-

serene scaffold Feb 16, 2024, 8:33 PM

#

if there's an error message, post the whole error message as text.

trail summit Feb 16, 2024, 8:33 PM

#

serene scaffold post the code as text (no screenshots) and explain how it's different from what ...

oh alr

serene scaffold Feb 16, 2024, 8:33 PM

#

trail summit im using a python backend for my react native project and basically it uses the ...

react? you might be looking for #web-development

trail summit Feb 16, 2024, 8:33 PM

#

serene scaffold react? you might be looking for #web-development

no lol

#

its more python than react :D

serene scaffold Feb 16, 2024, 8:34 PM

#

okay. well give as much information about your data science question as is needed in one message. don't spread it out over a bunch of messages.

trail summit Feb 16, 2024, 8:34 PM

#

tysm brb

left tartan Feb 16, 2024, 8:35 PM

#

serene scaffold if you aren't embarrassed by your old code, then you aren't growing as a program...

I’m even embarrassed by my new code!

serene scaffold Feb 16, 2024, 8:35 PM

#

left tartan I’m even embarrassed by my new code!

I'm never embarrassed because I have no shame.

trail summit Feb 16, 2024, 8:36 PM

#

App.js:

```js
import React, { useEffect, useState, useRef } from 'react';
import { StyleSheet, Text, View } from 'react-native';
import { Accelerometer } from 'expo-sensors';
import axios from 'axios';

export default function App() {
  const [data, setData] = useState({});
  const subscription = useRef(null);

  const _subscribe = () => {
    Accelerometer.setUpdateInterval(1000);
    subscription.current = Accelerometer.addListener(accelerometerData => {
      setData(accelerometerData);

      // Log the accelerometer data
      console.log("Accelerometer data: ", accelerometerData);

      // Send accelerometer data to the server for prediction
      sendDataToServer(accelerometerData);
    });
  };

  const _unsubscribe = () => {
    subscription.current && subscription.current.remove();
    subscription.current = null;
  };

  const sendDataToServer = (data) => {
    console.log("Sending data to server: ", data);

    axios.post('http://10.0.0.21:5000/predict', data)
      .then(response => {
        console.log("Response received from server: ", response.data);
        const predictedMovement = response.data.prediction;

        // Log the predicted action based on your model's predictions
        logPredictedAction(predictedMovement);

        // TODO: You can add logic to perform actions based on the predicted movement here
        // For example, send an emergency notification or update the UI
      })
      .catch((error) => {
        console.error('Error:', error);
        console.log('Error details:', error.response);
      });
  };
```

ornate ledge Feb 16, 2024, 8:36 PM

#

serene scaffold you might watch this tutorial series https://www.google.com/url?sa=t&rct=j&q=&es...

Thanks I'm going to watch it

serene scaffold Feb 16, 2024, 8:36 PM

#

!code

arctic wedgeBOT Feb 16, 2024, 8:36 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

trail summit Feb 16, 2024, 8:36 PM

#

arctic wedge

?

serene scaffold Feb 16, 2024, 8:37 PM

#

three backticks, not one
but it still isn't clear how this is a python, data science question

trail summit Feb 16, 2024, 8:37 PM

#

ic

#

nonono this is frontend backend still coming

#

one sec

serene scaffold Feb 16, 2024, 8:37 PM

#

okay, well like I've said a few times, you need to ask your whole question in one message.

trail summit Feb 16, 2024, 8:38 PM

#

character limit

#

im so sorry

#

continuation of above code:


```js
  const logPredictedAction = (predictedMovement) => {
    // Customize this logic based on your model's predictions
    console.log("Predicted action: ", predictedMovement);
  };

  useEffect(() => {
    _subscribe();
    return () => _unsubscribe();
  }, []);

  let { x, y, z } = data;
  return (
    <View style={styles.container}>
      <Text>Accelerometer:</Text>
      <Text>x: {round(x)} y: {round(y)} z: {round(z)}</Text>
    </View>
  );
}

function round(n) {
  if (!n) {
    return 0;
  }
  return Math.floor(n * 100) / 100;
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: 'center',
    paddingHorizontal: 10,
  },
});
```

#

now python:

serene scaffold Feb 16, 2024, 8:38 PM

#

at least get to the python/data science part. so far, it isn't clear why anyone should want to read this code.

trail summit Feb 16, 2024, 8:39 PM

#

server.py:

```py
from flask import Flask, request, jsonify
import joblib
import numpy as np
import pandas as pd
from scipy.signal import butter, lfilter

app = Flask(__name__)

# Load the trained model
model = joblib.load('model.pkl')

# Add your preprocessing and feature extraction functions here
def butter_lowpass(cutoff, fs, order=5):
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    return b, a

def butter_lowpass_filter(data, cutoff, fs, order=5):
    b, a = butter_lowpass(cutoff, fs, order=order)
    y = lfilter(b, a, data)
    return y

def window_data(data, window_size):
    windows = []
    for i in range(0, len(data) - window_size + 1, window_size // 2):
        windows.append(data[i:i+window_size])
    return windows
```

#


```py
def extract_features(windows):
    features = []
    for window in windows:
        feature = [np.mean(window), np.std(window)]  # Replace with your actual feature extraction process
        features.append(feature)
    return features

@app.route('/predict', methods=['POST'])
def predict():
    # Get the accelerometer data from the request
    data = request.get_json()

    # Apply the low-pass filter to the data
    filtered_data = butter_lowpass_filter(data, cutoff=0.3, fs=50, order=6)

    # Divide the data into windows
    windows = window_data(filtered_data, window_size=128)

    # Extract features from each window
    features = extract_features(windows)

    # Use the model to make a prediction for each window
    predictions = [model.predict([feature]) for feature in features]

    # Return the most common prediction
    prediction = max(set(predictions), key=predictions.count)

    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)
```
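A quick check on the server side, assuming App.js keeps posting one `{x, y, z}` reading per request: `butter_lowpass_filter` and `window_data` in server.py both expect a sequence of readings, and `window_data` only yields windows once it has at least `window_size` (128) of them, so a single reading produces no windows at all. The model from model.py was also trained on the 561 precomputed UCI HAR features per row, not the two values `extract_features` returns. The sketch below reproduces that windowing and buffers readings until a full window exists; `ReadingBuffer` is a hypothetical helper, not part of Flask or the original code.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


def window_data(data: Sequence, window_size: int) -> List[Sequence]:
    """Same windowing as in server.py: fixed-size windows with 50% overlap."""
    windows = []
    for i in range(0, len(data) - window_size + 1, window_size // 2):
        windows.append(data[i:i + window_size])
    return windows


class ReadingBuffer:
    """Hypothetical helper: keep the most recent accelerometer readings."""

    def __init__(self, window_size: int = 128):
        self.window_size = window_size
        # Oldest readings fall off automatically once the buffer is full.
        self._readings: Deque[List[float]] = deque(maxlen=window_size)

    def add(self, reading: dict) -> Optional[List[List[float]]]:
        """Store one {'x', 'y', 'z'} reading; return a full window once enough are buffered."""
        self._readings.append([reading["x"], reading["y"], reading["z"]])
        if len(self._readings) < self.window_size:
            return None  # not enough readings yet to fill one window
        return list(self._readings)


if __name__ == "__main__":
    one_reading = [{"x": 0.12, "y": -0.07, "z": -0.99}]
    print(window_data(one_reading, window_size=128))  # [] : no windows from a single reading
```

Also note that App.js calls `Accelerometer.setUpdateInterval(1000)`, i.e. about one reading per second, while the dataset was recorded at 50 Hz, so 128 buffered readings would span roughly two minutes instead of 2.56 seconds.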

#

model.py:


```py
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib  # Import joblib for model saving

# Load the training data
X_train = pd.read_csv('UCI HUMAN MOVEMENT DATASET/UCI HAR Dataset/UCI HAR Dataset/train/X_train.txt', delim_whitespace=True, header=None)
y_train = pd.read_csv('UCI HUMAN MOVEMENT DATASET/UCI HAR Dataset/UCI HAR Dataset/train/y_train.txt', delim_whitespace=True, header=None)

# Create a new random forest classifier
rf = RandomForestClassifier()

# Train the model on the training data
rf.fit(X_train, y_train.values.ravel())

# Save the trained model to a file
joblib.dump(rf, 'model.pkl')  # Add this line to save the model

# Load the test data
X_test = pd.read_csv('UCI HUMAN MOVEMENT DATASET/UCI HAR Dataset/UCI HAR Dataset/test/X_test.txt', delim_whitespace=True, header=None)
y_test = pd.read_csv('UCI HUMAN MOVEMENT DATASET/UCI HAR Dataset/UCI HAR Dataset/test/y_test.txt', delim_whitespace=True, header=None)

# Make predictions on the test data
y_pred = rf.predict(X_test)

# Print a classification report
print(classification_report(y_test, y_pred))
```

serene scaffold Feb 16, 2024, 8:40 PM

#

remember to put a `py` after the three backticks.

trail summit Feb 16, 2024, 8:40 PM

#

serene scaffold remember to put a `py` after the three backticks.

oh

#

oh i forgot mbmb

#

uh so what im trying to do is use a dataset called:
"Human Activity Recognition Using Smartphones" from the UCI machine learning repository

#

and essentially use the phone's accelerometer and collect the values and use those to print in console the predicted movement/action

#

and here was the description of the dataset:

#

**The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data.

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.**
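The preprocessing described above can be sketched with NumPy and SciPy: separate the slowly varying gravity component with a 0.3 Hz Butterworth low-pass filter (body acceleration is the remainder), then split the 50 Hz signal into 128-sample windows with 50% overlap, i.e. a step of 64. The filter order (3) and the use of zero-phase `filtfilt` are assumptions, since the description only gives the cutoff, and this does not reproduce the dataset's full feature vector.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50.0           # sampling rate in Hz (from the dataset description)
WINDOW = 128        # 2.56 s at 50 Hz
STEP = WINDOW // 2  # 50% overlap


def split_gravity(acc: np.ndarray, cutoff: float = 0.3, order: int = 3):
    """Split total acceleration of shape (n_samples, 3) into gravity and body parts."""
    # Passing fs lets the cutoff be given directly in Hz.
    b, a = butter(order, cutoff, btype="low", fs=FS)
    gravity = filtfilt(b, a, acc, axis=0)  # zero-phase low-pass along time, per axis
    body = acc - gravity
    return gravity, body


def sliding_windows(signal: np.ndarray, window: int = WINDOW, step: int = STEP) -> np.ndarray:
    """Return an array of shape (n_windows, window, n_channels)."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])


if __name__ == "__main__":
    t = np.arange(512) / FS
    # Synthetic signal: constant gravity on z plus a 2 Hz "body" oscillation on x.
    acc = np.column_stack(
        [0.5 * np.sin(2 * np.pi * 2.0 * t), np.zeros_like(t), np.full_like(t, -1.0)]
    )
    gravity, body = split_gravity(acc)
    print(sliding_windows(body).shape)
```

With 512 samples the window starts are 0, 64, ..., 384, so seven overlapping windows come out; `sliding_windows` raises if the signal is shorter than one window.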

#

thank you once again 🥹

serene scaffold Feb 16, 2024, 8:46 PM

#

trail summit **The experiments have been carried out with a group of 30 volunteers within an ...

what exactly are you asking for help with?

trail summit Feb 16, 2024, 8:46 PM

#

serene scaffold what exactly are you asking for help with?

well when i run my code instead of getting the desired stuff i get:

#

```
LOG Error details: undefined
LOG Accelerometer data: {"x": 0.1241912841796875, "y": -0.073974609375, "z": -0.999786376953125}
LOG Sending data to server: {"x": 0.1241912841796875, "y": -0.073974609375, "z": -0.999786376953125}
```

#

only the values outputted to console

serene scaffold Feb 16, 2024, 8:50 PM

#

@trail summit so the model is supposed to tell you when the user transitions between activity classes, right?

trail summit Feb 16, 2024, 8:50 PM

#

yes pretty much

#

:)

serene scaffold Feb 16, 2024, 8:50 PM

#

yes pretty much, or yes?

trail summit Feb 16, 2024, 8:51 PM

#

yes

#

sorry im an idiot

serene scaffold Feb 16, 2024, 8:51 PM

#

no you're not

#

but also, where did you get the idea to use a decision tree to do this?

trail summit Feb 16, 2024, 8:51 PM

#

youtube and friends ;-;

serene scaffold Feb 16, 2024, 8:52 PM

#

can you show what the first few lines of X_train.txt and y_train.txt look like?

trail summit Feb 16, 2024, 8:52 PM

#

yes

#

X_train.txt

#

#

in notepad :/

serene scaffold Feb 16, 2024, 8:53 PM

#

No screenshots.

trail summit Feb 16, 2024, 8:53 PM

#

oh

#

oops

#

brb sry

#

how do i format this?

#

2.8858451e-001 -2.0294171e-002 -1.3290514e-001 -9.9527860e-001 -9.8311061e-001 -9.1352645e-001 -9.9511208e-001 -9.8318457e-001 -9.2352702e-001 -9.3472378e-001 -5.6737807e-001 -7.4441253e-001  8.5294738e-001  6.8584458e-001  8.1426278e-001 -9.6552279e-001 -9.9994465e-001 -9.9986303e-001 -9.9461218e-001 -9.9423081e-001 -9.8761392e-001 -9.4321999e-001 -4.0774707e-001 -6.7933751e-001 -6.0212187e-001  9.2929351e-001 -8.5301114e-001  3.5990976e-001 -5.8526382e-002  2.5689154e-001 -2.2484763e-001  2.6410572e-001 -9.5245630e-002  2.7885143e-001 -4.6508457e-001  4.9193596e-001 -1.9088356e-001  3.7631389e-001  4.3512919e-001  6.6079033e-001  9.6339614e-001 -1.4083968e-001  1.1537494e-001 -9.8524969e-001 -9.8170843e-001 -8.7762497e-001 -9.8500137e-001 -9.8441622e-001 -8.9467735e-001  8.9205451e-001 -1.6126549e-001  1.2465977e-001  9.7743631e-001 -1.2321341e-001  5.6482734e-002 -3.7542596e-001  8.9946864e-001 -9.7090521e-001 -9.7551037e-001 -9.8432539e-001 -9.8884915e-001 -9.1774264e-001 -1.0000000e+000 -1.0000000e+000  1.1380614e-001 -5.9042500e-001  5.9114630e-001 -5.9177346e-001  5.9246928e-001 -7.4544878e-001  7.2086167e-001 -7.1237239e-001  7.1130003e-001 -9.9511159e-001  9.9567491e-001 -9.9566759e-001  9.9165268e-001  5.7022164e-001  4.3902735e-001  9.8691312e-001  7.7996345e-002  5.0008031e-003 -6.7830808e-002 -9.9351906e-001 -9.8835999e-001 -9.9357497e-001 -9.9448763e-001 -9.8620664e-001 -9.9281835e-001 -9.8518010e-001 -9.9199423e-001 -9.9311887e-001  9.8983471e-001  9.9195686e-001  9.9051920e-001 -9.9352201e-001 -9.9993487e-001 -9.9982045e-001 -9.9987846e-001 -9.9436404e-001 -9.8602487e-001 -9.8923361e-001 -8.1994925e-001 -7.9304645e-001 -8.8885295e-001  1.0000000e+000 -2.2074703e-001  6.3683075e-001  3.8764356e-001  2.4140146e-001 -5.2252848e-002

serene scaffold Feb 16, 2024, 8:55 PM

#

```
line 1
line 2
line 3
```

trail summit Feb 16, 2024, 8:55 PM

#

oh

#

k

serene scaffold Feb 16, 2024, 8:55 PM

#

also is this one line?

trail summit Feb 16, 2024, 8:55 PM

#

no

serene scaffold Feb 16, 2024, 8:55 PM

#

I need to know what the structure of the data is

trail summit Feb 16, 2024, 8:56 PM

#

trail summit

it was like this

serene scaffold Feb 16, 2024, 8:56 PM

#

what are the rows and columns. like what do they represent

trail summit Feb 16, 2024, 8:56 PM

#

ah

#

Each row represents a single observation or sample. In the context of this dataset, a sample is a 2.56-second window of time where multiple measurements were taken from the smartphone’s accelerometer and gyroscope.

Each column represents a different feature that has been calculated from the raw accelerometer and gyroscope data. These features are various statistical measures (like mean, standard deviation, etc.) and frequency domain variables that were calculated for each window of data.

is what the University of Genova (the dataset makers) said

serene scaffold Feb 16, 2024, 8:58 PM

#

okay. but you don't know what measure each column is?

#

also what about the y data? what does that look like?

trail summit Feb 16, 2024, 8:59 PM

#

serene scaffold also what about the y data? what does that look like?

one huge column

#

like this

#

trail summit Feb 16, 2024, 9:00 PM

#

serene scaffold okay. but you don't know what measure each column is?

oh oh i remembered in the folder i downloaded there was something called features_info.txt

#

and a bunch of other stuff

serene scaffold Feb 16, 2024, 9:01 PM

#

and each number represents a state?
so does the model only need to identify which line is which state, in isolation? or does it need to be able to tell you when someone is switching between states?

trail summit Feb 16, 2024, 9:02 PM

#

"The model is trained to predict the activity (or state) based on the features of each observation. In its basic form, the model treats each observation in isolation and doesn’t consider the sequence of activities. So, it doesn’t inherently know when someone is switching between states."

#

basically

serene scaffold Feb 16, 2024, 9:03 PM

#

Okay, so it *doesn't* identify state transitions.
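Since each window is classified in isolation, one simple way to get "transition" events (an assumption, not something proposed in the thread) is to post-process the sequence of per-window predictions: smooth it with a small majority vote, then report the points where the label changes. The label mapping below follows UCI HAR's activity_labels.txt; `smooth` and `transitions` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Label ids as listed in UCI HAR's activity_labels.txt.
ACTIVITY_LABELS: Dict[int, str] = {
    1: "WALKING",
    2: "WALKING_UPSTAIRS",
    3: "WALKING_DOWNSTAIRS",
    4: "SITTING",
    5: "STANDING",
    6: "LAYING",
}


def smooth(predictions: Sequence[int], k: int = 3) -> List[int]:
    """Replace each prediction by the most common label in a centred window of size k."""
    half = k // 2
    out = []
    for i in range(len(predictions)):
        window = predictions[max(0, i - half): i + half + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out


def transitions(predictions: Sequence[int]) -> List[Tuple[int, str, str]]:
    """Return (index, from_label, to_label) wherever consecutive labels differ."""
    events = []
    for i in range(1, len(predictions)):
        if predictions[i] != predictions[i - 1]:
            events.append((i, ACTIVITY_LABELS[predictions[i - 1]], ACTIVITY_LABELS[predictions[i]]))
    return events


if __name__ == "__main__":
    raw = [4, 4, 5, 4, 4, 1, 1, 1]  # one noisy STANDING window inside a SITTING stretch
    print(transitions(raw))
    print(transitions(smooth(raw)))
```

The classifier itself still knows nothing about sequences; the smoothing step just suppresses one-off flips before the changes are reported.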

serene scaffold Feb 16, 2024, 9:04 PM

#

serene scaffold @trail summit so the model is supposed to tell you when the user transit...

which is what I was asking about here, just so you know

trail summit Feb 16, 2024, 9:04 PM

#

serene scaffold Okay, so it *doesn't* identify state transitions.

sorry

trail summit Feb 16, 2024, 9:04 PM

#

serene scaffold which is what I was asking about here, just so you know

i realized ;-;

serene scaffold Feb 16, 2024, 9:04 PM

#

what happens when you run `model.py` by itself, without the react part?

trail summit Feb 16, 2024, 9:04 PM

#

serene scaffold what happens when you run `model.py` by itself, without the react part?

it created model.pkl

serene scaffold Feb 16, 2024, 9:05 PM

#

can you show the printed output of `print(classification_report(y_test, y_pred))` as text?

trail summit Feb 16, 2024, 9:05 PM

#

k brb

#

um

#

@serene scaffold should i format it?

#

```
              precision    recall  f1-score   support

           1       0.90      0.97      0.93       496
           2       0.89      0.91      0.90       471
           3       0.96      0.85      0.90       420
           4       0.91      0.89      0.90       491
           5       0.90      0.92      0.91       532
           6       1.00      1.00      1.00       537

    accuracy                           0.93      2947
   macro avg       0.93      0.92      0.92      2947
weighted avg       0.93      0.93      0.93      2947
```

#

k its like this

serene scaffold Feb 16, 2024, 9:08 PM

#

I have good news and bad news for you

trail summit Feb 16, 2024, 9:08 PM

#

o no

serene scaffold Feb 16, 2024, 9:08 PM

#

the good news is that this is great model performance

#

the bad news is that the python code works correctly, which means that the problem is only with the javascript code, so you'll have to ask somewhere else.

trail summit Feb 16, 2024, 9:09 PM

#

noooooooooooooooo

#

i never liked js ;-;

#

but

#

thank you so so much!

#

I really appreciate it

serene scaffold Feb 16, 2024, 9:09 PM

#

are you a member of the js server?

trail summit Feb 16, 2024, 9:10 PM

#

no

serene scaffold Feb 16, 2024, 9:10 PM

#

lms if I can find the invite

trail summit Feb 16, 2024, 9:10 PM

#

thx :D

serene scaffold Feb 16, 2024, 9:10 PM

#

https://discord.gg/dAF4F28

trail summit Feb 16, 2024, 9:10 PM

#

yay tysmm

serene scaffold Feb 16, 2024, 9:10 PM

#

@trail summit just remember what I said about how to ask questions effectively. it will increase your chances of getting help quickly in the future.

trail summit Feb 16, 2024, 9:11 PM

#

serene scaffold @trail summit just remember what I said about how to ask questions effec...

I will :D

thank you again!

serene scaffold Feb 16, 2024, 9:11 PM

#

serene scaffold but also, where did you get the idea to use a decision tree to do this?

also, decision trees can't identify state transitions, so that's why I was confused

#

I thought maybe someone was bullshitting you

trail summit Feb 16, 2024, 9:11 PM

#

serene scaffold also, decision trees can't identify state transitions, so that's why I was confu...

Sorry about that lol

serene scaffold Feb 16, 2024, 9:11 PM

#

it's okay

trail summit Feb 16, 2024, 9:12 PM

#

serene scaffold I thought maybe someone was bullshitting you

XD

#

I wouldn't even know at this stage

serene scaffold Feb 16, 2024, 9:13 PM

#

trail summit I wouldn't even know at this stage

as you continue learning, keep this question in mind: "what are state transitions, and why can't decision trees detect them?"
that's actually two questions
eventually you'll figure it out.

agile owl Feb 16, 2024, 9:14 PM

#

```
Exception has occurred: AttributeError
'ArrowExtensionArray' object has no attribute 'to_pydatetime'
```

does someone have a fix for this

#

I just updated my OS and now my code doesn't run anymore

#

It happens when writing to the DB with pandas

serene scaffold Feb 16, 2024, 9:14 PM

#

did you accidentally switch python environments?

agile owl Feb 16, 2024, 9:14 PM

#

I had to switch python environments

#

something changed where my old venv didn't work after the upgrade

#

so I had to make a new one

serene scaffold Feb 16, 2024, 9:15 PM

#

okay, so you might not have the same pyarrow version that you had before

agile owl Feb 16, 2024, 9:15 PM

#

upgrading was probably a bad idea but I wanted to try to upgrade my GPU driver

serene scaffold Feb 16, 2024, 9:15 PM

#

do `python -m pip freeze` in both and compare
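The comparison can be scripted. A minimal sketch, assuming each environment's output was saved with `python -m pip freeze > old.txt` and `python -m pip freeze > new.txt` (hypothetical file names): parse the `name==version` lines and list packages whose versions differ or that exist in only one environment. Entries in other formats, such as `pkg @ file://...` direct references, are simply skipped, and names are only lower-cased, not fully normalized.

```python
import sys
from typing import Dict, Iterable, List, Optional, Tuple


def parse_freeze(lines: Iterable[str]) -> Dict[str, str]:
    """Map lower-cased package names to versions from `pip freeze` output."""
    packages = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments and non-pinned entries (e.g. "pkg @ url")
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_freeze(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (package, old_version, new_version) for every package that differs."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes.append((name, old.get(name), new.get(name)))
    return changes


if __name__ == "__main__":
    # Usage: python diff_freeze.py old.txt new.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
            for name, before, after in diff_freeze(parse_freeze(f_old), parse_freeze(f_new)):
                print(f"{name}: {before} -> {after}")
```

A `None` on either side means the package is missing from that environment.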

agile owl Feb 16, 2024, 9:25 PM

#

I'm in (venv) (base) (venv) hell

#

but I think they were on the same version (15)

#

pandas version was off by a minor version

#

still doesn't work, so something is strange

final kiln Feb 16, 2024, 9:29 PM

#

Dev containers ftw

agile owl Feb 16, 2024, 9:32 PM

#

ok I just installed the old env packages lets see if it works now

#

nope

#

lol

agile owl Feb 16, 2024, 9:47 PM

#

```py
            # ensure conversion to pandas uses the pyarrow extension array option
            # so that we can make use of the sql/db export *without* copying data
            res: int | None = self.to_pandas(
                use_pyarrow_extension_array=True,
            ).to_sql(
                name=unpacked_table_name,
                schema=db_schema,
                con=engine_sa,
                if_exists=if_table_exists,
                index=False,
            )
            return -1 if res is None else res
        else:
            msg = f"engine {engine!r} is not supported"
            raise ValueError(msg)
```

#

do you understand what this comment means

#

it's in the polars source code

#

I just set it to false screw it

#

works now

#

i don't care if the write is expensive

#

I just need the read to be fast

final kiln Feb 16, 2024, 10:20 PM

#

I don't know anything about pyarrow

left tartan Feb 16, 2024, 11:41 PM

#

agile owl do you understand what this comment means

What about it?

agile owl Feb 16, 2024, 11:48 PM

#

why would it copy data otherwise

left tartan Feb 17, 2024, 1:09 AM

#

agile owl why would it copy data otherwise

That’s basically a key point of pyarrow: pyarrow tables can be referenced by multiple engines without copying any data… they can even be passed between runtimes.

#

But pandas also supports numpy data types, which are completely different from pyarrow data types and are the default
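The copy-versus-no-copy distinction can be illustrated with NumPy alone. This is an analogy, not pyarrow code: a view reuses the existing buffer, while a dtype conversion has to allocate a new one. Converting between Arrow-backed and NumPy-backed pandas columns can similarly force a copy, depending on the column type, which is what the `use_pyarrow_extension_array=True` comment in the polars source is trying to avoid.

```python
import numpy as np

values = np.arange(1_000_000, dtype=np.int64)

view = values[::2]                                # slicing creates a view: no data is copied
converted = values.astype(np.int32)               # changing dtype has to allocate a new buffer
same_dtype = values.astype(np.int64, copy=False)  # nothing to convert, so the input is returned

print(np.shares_memory(values, view))        # True
print(np.shares_memory(values, converted))   # False
print(np.shares_memory(values, same_dtype))  # True
```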

trail summit Feb 17, 2024, 1:13 AM

#

Hi @serene scaffold

#

(sorry for the ping)

serene scaffold Feb 17, 2024, 1:13 AM

#

whAT

trail summit Feb 17, 2024, 1:13 AM

#

so I came to the conclusion

#

that I can't do react native ;-;

#

it's better to stick with python for both frontend and backend

#

uh if you remember my goal from earlier today

#

can you please give me advice on how to go about it?

serene scaffold Feb 17, 2024, 1:15 AM

#

Don't ask if someone will do something based on information that you haven't yet provided. Give all the information, and invite anyone to help.

trail summit Feb 17, 2024, 1:15 AM

#

?

serene scaffold Feb 17, 2024, 1:15 AM

#

!

trail summit Feb 17, 2024, 1:15 AM

#

?!

#

lol

serene scaffold Feb 17, 2024, 1:16 AM

#

‽

trail summit Feb 17, 2024, 1:16 AM

#

serene scaffold ‽

:o

#

copy paste or is there a way to do that

#

lol

serene scaffold Feb 17, 2024, 1:16 AM

#

https://en.wikipedia.org/wiki/Interrobang

trail summit Feb 17, 2024, 1:16 AM

#

OO

#

wow

#

u can combine them.

#

crazy 😭

#

how doooo I dooooooo thissssss

#

its like im drowning in confusion xd

serene scaffold Feb 17, 2024, 1:18 AM

#

trail summit how doooo I dooooooo thissssss

put yourself in the shoes of the person who's helping you. what do they need to know to start helping you?

trail summit Feb 17, 2024, 1:19 AM

#

serene scaffold put yourself in the shoes of the person who's helping you. what do they need to ...

*everything*

serene scaffold Feb 17, 2024, 1:20 AM

#

trail summit *everything*

let me restate: what would that person need to know about what you're doing to start helping you?

#

because I don't have access to your computer and idk what you've been doing since the last time we talked.

trail summit Feb 17, 2024, 1:20 AM

#

serene scaffold because I don't have access to your computer and idk what you've been doing sinc...

crying

serene scaffold Feb 17, 2024, 1:20 AM

#

I feel that

trail summit Feb 17, 2024, 1:20 AM

#

and compromising

#

mostly

trail summit Feb 17, 2024, 1:20 AM

#

serene scaffold I feel that

;-;-;-;

#

well, in the end all I want to do is take a dataset like the MotionSense dataset, train a model with it, and somehow implement it so that the device running the app (a phone) uses its accelerometer, gyroscope, etc. along with that model, and it displays the action happening in the console

#

like

#

idk:
WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING

#

stel?

#

k im not going to ping

#

bro when u see this can you please ping me

#

thx

serene scaffold Feb 17, 2024, 1:27 AM

#

I'm busy irl. I'll get to this if I can. but someone else might be able to help as well.

trail summit Feb 17, 2024, 1:28 AM

#

serene scaffold I'm busy irl. I'll get to this if I can. but someone else might be able to help ...

np it can wait

#

thanks

#

gtg anyways cya :)

serene scaffold Feb 17, 2024, 3:15 AM

#

trail summit well, in the end all I want to do is take a dataset like MotionSense dataset and...

you need to know what all the sensory inputs are that your model is using. and then figure out how to request those readings on Android and iOS.

#

but that's beyond the scope of this channel.

errant pivot Feb 17, 2024, 7:55 AM

#

good morning, for my project on large language models, do you know any good demos?

serene scaffold Feb 17, 2024, 10:25 AM

#

errant pivot good morning , for my project on Large language model , i need to know any good ...

what are you trying to do

dry geyser Feb 17, 2024, 10:44 AM

#

is there anything "ready made" that allows efficient mapping of a set of column values into key-values? I would like to benchmark doing that outside of polars. right now I built an expr chain class that allows me to "cheat" around the problem by coercing sets of columns into structs, and then selecting these alone and converting the result to a dict

serene scaffold Feb 17, 2024, 10:50 AM

#

dry geyser is there anything "ready made" that allows efficient mapping of a set of column ...

```py
# pandas DataFrame
df[['a', 'b']].set_index('a').squeeze().to_dict()
```
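One caveat with the one-liner above: `squeeze()` turns a single-row frame into a scalar, so `.to_dict()` fails on a one-row DataFrame. A sketch that avoids that edge case, keeping `'a'` and `'b'` as placeholder column names as in the snippet:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z"], "b": [1, 2, 3]})

# Select column 'b' explicitly instead of squeezing, so one-row frames still give a dict.
mapping = df.set_index("a")["b"].to_dict()

# Plain-Python equivalent; zip also works over polars Series columns.
mapping_zip = dict(zip(df["a"], df["b"]))

print(mapping)
print(mapping_zip)
```

If `'a'` contains duplicates, both versions keep the last value seen for each key.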

dry geyser Feb 17, 2024, 10:52 AM

#

lemme check

#

lemme rephrase the question

#

imagine we already have a dict of column names -> column values, I have also written my own expr builder so I can (if i want to load the polars parsing heavily...) coerce the desired columns into a struct for a new column. suppose I want FOOBAR to be a composite of X,Y,Z columns. right now i do this via the expr chain i build. how can I "displace" that specific step into something done after iter_rows?

#

can you provide me a sample input for the df?

#

```py
from ast import Dict
import polars as pl
from pprint import pprint as pp
from typing import Dict

def expr_concat_columns_unique(new_column_name, columns):
    return pl.concat_list(
        [
            pl.when(
                pl.col(column).is_not_null()
            ).then(
                pl.col(column)
            ).otherwise(
                pl.lit(None)
            ).alias(column)

            for column in columns
        ]
        ).list.drop_nulls().list.unique().alias(new_column_name)

def expr_structured_column_with_mappings(name: str, mapping: Dict) -> pl.Expr:
    return pl.concat_list(
        pl.struct(
        {
        new_key: pl.col(original_column) for new_key, original_column in mapping.items()
        }
    ).struct.rename_fields(
        list(mapping.values())

    )).list.drop_nulls().list.unique().alias(name)

struct_mapping = {
    "foobar": {
        'foo' : "from_foo",
        'bar' : "from_bar"
    },
}

df = pl.DataFrame({
    'foo': ['xyz', 'blah' ],
    'bar': ['zyx', 'bleh' ],
})


exprs = []
exprs += expr_structured_column_with_mappings("foobars", struct_mapping['foobar'])

new_df = df.with_columns(exprs)

print(new_df)
```

#

this is how im doing it now

#

but it puts a significant strain on polars' engine, which handles it fine, but it does have quite some ram backpressure

agile owl Feb 17, 2024, 10:57 AM

#

"ram backpressure?"

#

sounds dangerous, be careful

dry geyser Feb 17, 2024, 10:59 AM

#

@agile owl dude it's too early to start trolling

#

relax

#

;P

#

ram backpressure = the scan_csv op no longer seems larger-than-ram friendly

#

so in other words, consumption shoots up. at least in virt addr space if you are pedantic.

#

didnt measure actual effective occupied memory...

#

(good luck using something like valgrind while processing a 10mil row csv file)

agile owl Feb 17, 2024, 11:04 AM

#

I wanted to try to help you, but I don't understand what your problem is, sorry. what do you mean by "done after iter_rows"? btw, polars is meant to be used with the Lazy API most of the time, that's where it gets its optimization benefits from, but it looks like you're just using a normal eager dataframe

dry geyser Feb 17, 2024, 11:05 AM

#

so, I'm trying to validate and coerce as much data as possible into the actual ingestion schema for Elastic

#

I made an expr "compiler" that takes my configuration (tl;dr "create these key-value mappings from the CSV rows, apply some transforms, inject some static values") and I apply it to the LazyFrame returned from scan_csv

#

taking a CSV row with a set of columns, I need a final dict/document for Elastic like { 'somekey': { fields: ...mapped values from CSV row/dict }, 'anotherkey': ... }

#

as an experiment I built all that using exprs, but polars is then forced to allocate new data for every row in the dataset

final kiln Feb 17, 2024, 12:39 PM

#

Overfitting is nonexistent

#

I've noticed that a single-head, single-layer transformer works better

#

Mainly due to performance

#

I might have to look into my self attention implementation
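For checking a self-attention implementation, a plain NumPy reference of single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, can be handy to compare outputs against. The causal mask is optional and an assumption here (a next-token text model uses it; an encoder for classification would not), and the weight matrices are random placeholders:

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v, causal=True):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended values (seq_len, d_head) and the weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Block attention to future positions (strict upper triangle)
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```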

#

And use a non-learnable positional encoding to reduce the number of gradients per mini-batch step
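One common non-learnable option is the sinusoidal encoding from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A dependency-free sketch, assuming the encoding is added to the token embeddings as in that paper:

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed (non-learnable) sinusoidal encodings: one row of length d_model per position."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:  # guard for odd d_model
                pe[pos][i + 1] = math.cos(angle)
    return pe


pe = sinusoidal_positional_encoding(seq_len=3, d_model=4)
```

Because the table has no parameters, it adds nothing to the optimizer state; it can be computed once and reused for every batch.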

#

The size of the embeddings seems to matter quite a lot, more than the number of heads or number of blocks

#

This is at small scale with little compute; I'm sure the story is different if you can do gradient accumulation across several GPUs

#

with 80 GB of memory each
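A small NumPy check of why gradient accumulation works: for a mean-reduced loss and equal-sized micro-batches, averaging the micro-batch gradients gives the same gradient as one big batch, so the optimizer step is identical. Linear regression with MSE is used here only because its gradient is easy to write by hand; the data is random toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))  # toy features
y = rng.normal(size=32)       # toy targets
w = np.zeros(3)               # current weights


def mse_grad(Xb, yb, w):
    """Gradient of mean((Xb @ w - yb) ** 2) with respect to w."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)


# One big batch
full_grad = mse_grad(X, y, w)

# The same 32 samples as 4 equal micro-batches of 8, averaged
n_micro = 4
accumulated = np.zeros_like(w)
for Xb, yb in zip(np.array_split(X, n_micro), np.array_split(y, n_micro)):
    accumulated += mse_grad(Xb, yb, w) / n_micro

lr = 0.1
w_next = w - lr * accumulated  # a single optimizer step after all micro-batches
print(np.allclose(accumulated, full_grad))  # True
```

With unequal micro-batch sizes, weight each gradient by its share of the samples instead of dividing by the number of micro-batches.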

#

Changing regions is time-consuming, and the current one only provides this 16 GB machine

#

I can do federated training to get to a 32 GB GPU, but at that scale I might as well just tank the extra iteration on my training loop instead of having to collect results over a network

dry geyser Feb 17, 2024, 3:10 PM

#

does anyone have recommendations for handling parquet IO (writing) from multiple Process(es)?

#

I'm using polars already, so perhaps I can just output to parquet as is

versed pilot Feb 17, 2024, 3:20 PM

#

there is a pandas .to_parquet() method; is there a polars equivalent? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html

agile owl Feb 17, 2024, 4:12 PM

#

is there a handy replacement for ffill() in polars?

past meteor Feb 17, 2024, 4:52 PM

#

agile owl is there a handy replacement for ffill() in polars

forward_fill

agile owl Feb 17, 2024, 4:53 PM

#

"DataFrame" has no attribute "forward_fill"

#

hmm

#

    test_df = (
        pl.read_csv("data/master.csv", try_parse_dates=True)
        .with_columns(pl.col("date").cast(pl.Date).alias("date"))
        .drop("")
        .sort("date", "ticker")   
        .forward_fill()    #   <---- typechecker/intellisense doesn't pick up method
    )

past meteor Feb 17, 2024, 4:54 PM

#

it's an expression

#

Well, you do it on an expression

#

So you forward fill a column, like date for instance

agile owl Feb 17, 2024, 4:55 PM

#

so I can't just do it across all columns?

past meteor Feb 17, 2024, 4:55 PM

#

I don't know it by heart, but I think selectors are expressions, so you could try cs.all().forward_fill()

agile owl Feb 17, 2024, 4:55 PM

#

gotcha

past meteor Feb 17, 2024, 4:56 PM

#

Or maybe they return expressions

#

lmk if it worked

agile owl Feb 17, 2024, 5:00 PM

#

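    import polars as pl
    import polars.selectors as cs

    test_df = (
        pl.read_csv("data/master.csv", try_parse_dates=True)
        .with_columns(pl.col("date").cast(pl.Date).alias("date"))
        .drop("")  # drop the unnamed (likely index) column from the CSV
        .sort("date", "ticker")
    )
    # selectors behave like expressions, so this forward-fills every column
    test_df = test_df.select(cs.all().forward_fill())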

so like this?

#

seems to be working

#

┌─────────────────────┬────────┬────────┬─────────┬───┬─────────────────┬─────────────────┬──────────┬───────────┐
│ date                ┆ ticker ┆ open   ┆ high    ┆ … ┆ IRLTLT01JPM156N ┆ IRLTLT01GBM156N ┆ WTISPLC  ┆ DEXCAUS   │
│ ---                 ┆ ---    ┆ ---    ┆ ---     ┆   ┆ ---             ┆ ---             ┆ ---      ┆ ---       │
│ datetime[μs]        ┆ str    ┆ f64    ┆ f64     ┆   ┆ f64             ┆ f64             ┆ f64      ┆ f64       │
╞═════════════════════╪════════╪════════╪═════════╪═══╪═════════════════╪═════════════════╪══════════╪═══════════╡
│ 2021-03-01 00:00:00 ┆ ABBV   ┆ 108.53 ┆ 109.21  ┆ … ┆ 1.684268        ┆ 1.029709        ┆ 1.226966 ┆ -1.790026 │
│ 2021-03-01 00:00:00 ┆ ACB    ┆ 10.84  ┆ 11.41   ┆ … ┆ 1.684268        ┆ 1.029709        ┆ 1.226966 ┆ -1.790026 │
│ 2021-03-01 00:00:00 ┆ ALKS   ┆ 19.2   ┆ 19.605  ┆ … ┆ 1.684268        ┆ 1.029709        ┆ 1.226966 ┆ -1.790026 │
│ 2021-03-01 00:00:00 ┆ AMGN   ┆ 225.88 ┆ 227.929 ┆ … ┆ 1.684268        ┆ 1.029709        ┆ 1.226966 ┆ -1.790026 │
│ 2021-03-01 00:00:00 ┆ AMPH   ┆ 17.79  ┆ 18.03   ┆ … ┆ 1.684268        ┆ 1.029709        ┆ 1.226966 ┆ -1.790026 │
│ …                   ┆ …      ┆ …      ┆ …       ┆ … ┆ …               ┆ …               ┆ …        ┆ …         │
└─────────────────────┴────────┴────────┴─────────┴───┴─────────────────┴─────────────────┴──────────┴───────────┘
shape: (10_320, 165)

Here's something super annoying: it uses pandas.to_sql, and the conversion to pandas turns my date column into a microsecond datetime, so I can't get the date type I want without manually editing the db column

#

I might as well just write the insert myself I guess

agile cobalt Feb 17, 2024, 5:10 PM

#

agile owl ```py ┌─────────────────────┬────────┬────────┬─────────┬───┬─────────────────┬─...

Have you tried using the ADBC engine instead of SQLAlchemy?

agile owl Feb 17, 2024, 5:10 PM

#

no

#

what's the library to use it?

agile cobalt Feb 17, 2024, 5:11 PM

#

see https://docs.pola.rs/user-guide/io/database/#adbc

dry geyser Feb 17, 2024, 5:45 PM

#

@versed pilot there seems to be a parquet sink

pulsar elk Feb 17, 2024, 5:48 PM

#

Hello, can someone tell me which course I should pursue to get into AI/ML or any data job, in India or abroad?

#

because I think getting into the industry in this field is tough

dusty forge Feb 17, 2024, 5:52 PM

#

I started with Andrew Ng's course and even after only the first regression videos, I feel smarter already 🤣

pulsar elk Feb 17, 2024, 5:53 PM

#

dusty forge I started with Andrew Ng's course and even after only the first regression video...

Well, I know it will take learning and some project experience, but can you help me with job assistance, like what should I approach?

#

any suggestions please?

versed pilot Feb 17, 2024, 6:22 PM

#

dry geyser @versed pilot there seems to be a parquet sink

ok, I haven't used polars; I understand that it is different from pandas in both syntax and underlying technology

dry geyser Feb 17, 2024, 6:30 PM

#

yes

#

I'll let you know how that works out

#

now trying to solve an issue

#

I can't seem to filter null columns

trail summit Feb 17, 2024, 6:53 PM

#

serene scaffold but that's beyond the scope of this channel.

ic

#

welp thx

vocal cove Feb 17, 2024, 7:06 PM

#

Greetings guys
Hope all are well. Is anyone here familiar with the ITensor library in Julia?
Kindly let me know.

crisp raptor Feb 17, 2024, 7:11 PM

#

vocal cove Greetings guys Hope all are well. Anyone here who is familiar with iTensor libra...

No

final kiln Feb 17, 2024, 7:12 PM

#

#

I need help

#

The left side is the actual train loss

crisp raptor Feb 17, 2024, 7:12 PM

#

final kiln

Looks to me like you got it...

final kiln Feb 17, 2024, 7:13 PM

#

The right side is the smoothed version
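Smoothing like this is usually an exponential moving average, the kind of smoothing slider that dashboards such as TensorBoard expose. A plain-Python sketch, where weight (assumed 0.9 by default here) controls how much history each point keeps:

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average: each point is weight * previous + (1 - weight) * current."""
    if not values:
        return []
    smoothed = []
    last = values[0]  # start from the first value so the curve isn't pulled toward 0
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed
```

A higher weight gives a calmer curve but lags behind real changes, so a noisy small-batch run and a smooth large-batch run are easier to compare at the same smoothing setting.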

crisp raptor Feb 17, 2024, 7:13 PM

#

final kiln Right side is the smoothened thing

Oh

final kiln Feb 17, 2024, 7:13 PM

#

crisp raptor Looks to me like you got it...

I know I'm narrowing down

#

So like, if I let these go for about 3 days they'll converge, almost certainly without overfitting; I've been keeping an eye on eval loss, eval acc, eval F1, etc.

#

Those graphs are different runs; the leftmost graphs use larger batches

#

They run for an epoch each

#

So, smaller batches converge faster but their loss graph is extremely chaotic

#

I don't care about their loss graph if at the end I still get a model I can put into production

#

My question then is: should I care about the chaotic graph?

#

The loss graph itself looks fine as far as I can tell; that's what the loss graph of a transformer training on text looks like

sand grove Feb 17, 2024, 10:01 PM

#

do you need to understand data analytics to do data science?

#

who's a data scientist that can guide me?

shy grove Feb 17, 2024, 10:06 PM

#

I need some help with a classification algorithm for sentiment of sentences. If anyone is able to help, please check my post in #1035199133436354600 for more information about it

rose plume Feb 18, 2024, 12:11 AM

#

Good day everyone

I have a problem with learning. I spend so much time on the learning process, but I don't get results, and in some cases I even get panic attacks.

Another problem is that I can't build a routine for myself. So it occurred to me to ask you: how do you study?

What time do you start and what time do you finish? Can you please give me some advice on how to learn, or how to manage my time?

shut girder Feb 18, 2024, 12:20 AM

#

rose plume Good day everyone I have a problem in term of learning. Actually I spend so muc...

I'm not very experienced in this field of study; however, when it comes to learning, I try to break down what I need to learn. By breaking something down, each part becomes more feasible to learn.

For example: I am currently working on my ability to explore data, not only by trying to grasp the essence of EDA but also by learning the techniques used within it. Data cleaning can be used in conjunction with the EDA process. And because duplicates must be cleaned depending on the context, I must first learn how to identify them. So I ask myself, "what graphs can I use to identify these duplicates?" I then research that, and once I find a graph that seems efficient to me, I learn how to actually create it using a tool. I personally use Python for this, so I would then read the matplotlib documentation for that graph.

In terms of motivation, I set aside time each day: about 1-2 hours of studying and practice before going to bed. I get this motivation by reading articles on data science, watching YouTube videos on data science, or even talking about data science. These methods really get me motivated to learn.

I also recommend using a resource that suits you, as long as it is trusted. This Discord server is also great for help and resources

left tartan Feb 18, 2024, 12:23 AM

#

shut girder I'm not very experienced in this field of study, however when it comes to learni...

My favorite EDA resource is: https://www.itl.nist.gov/div898/handbook/

#

It’s long but very thorough and no single part too difficult

shut girder Feb 18, 2024, 12:24 AM

#

Thanks, I will take a look at it

rose plume Feb 18, 2024, 12:33 AM

#

shut girder I'm not very experienced in this field of study, however when it comes to learni...

Thank you.
What are your study hours?
I mean, when do you start and finish?

shut girder Feb 18, 2024, 12:37 AM

#

rose plume Thank you. What are your study hour? I mean when do you start and finish?

I usually begin around 8:00 pm and go until around 10:00 pm. I usually don't go to sleep that late, though. You should choose a time that is comfortable for you and fits into your schedule

left tartan Feb 18, 2024, 12:48 AM

#

rose plume Good day everyone I have a problem in term of learning. Actually I spend so muc...

I think ‘time’ is a depressing measure for studying. Instead, consider making a list of topics and making a point to cross off one topic a day. Then you can feel measurably accomplished

tacit basin Feb 18, 2024, 2:58 AM

#

rose plume Good day everyone I have a problem in term of learning. Actually I spend so muc...

Switching social media off may help? Big distraction for me.

magic dune Feb 18, 2024, 2:58 AM

#

!toopic

#

!topic

#

.topic

strange elbowBOT Feb 18, 2024, 2:58 AM

#

**No topics found for this channel.**

Suggest more topics here!

errant pivot Feb 18, 2024, 6:33 AM

#

serene scaffold what are you trying to do

I need to show a demo; text-generation-based examples should be fine

slim turret Feb 18, 2024, 6:55 AM

#

hello, does anybody know any good dataset libraries?

gritty vessel Feb 18, 2024, 7:03 AM

#

hey everyone, any idea where I am going wrong? I am getting very high accuracy here

#

which should not be the case here

#

I split the data 80/20, and I believe I am not leaking data to the model beforehand either

#

it's time series forecasting, if anyone wants to know
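Two quick checks, sketched in plain Python under the assumption that the data is a single ordered series: split chronologically instead of shuffling (a random 80/20 split lets the model train on points from after the test period), and compare the model against a naive persistence forecast, since on many series "tomorrow equals today" already scores very well. scikit-learn's TimeSeriesSplit does the chronological split for cross-validation:

```python
def chronological_split(series, train_frac=0.8):
    """Split an ordered series without shuffling: the past trains, the future tests."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def persistence_mae(series):
    """Mean absolute error of the naive forecast y[t] = y[t-1]; a useful model should beat it."""
    errors = [abs(series[t] - series[t - 1]) for t in range(1, len(series))]
    return sum(errors) / len(errors)


train, test = chronological_split(list(range(10)))
print(train, test)
```

If the target is a class such as up/down, the analogous baseline is predicting the previous label. Scalers and rolling features should also be fit on the training part only.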