#data-science-and-ml

viscid bridge
#

still not clear can i dm ?

wooden sail
#

i'd rather not dm. you can replace your for loop involving W_val_hs with flat_gt == 255. this returns a vector of booleans of the same size as flat_gt, and it has True wherever flat_gt[i] == 255. the other entries are False

viscid bridge
#

but then how do i append only those selected values to a list? considering this example.

wooden sail
#

what i'm saying is you don't need to do that. appending to arrays is a bad idea and is slow

#

this array of booleans can be used as indices already

viscid bridge
#

give some time to digest this fact

wooden sail
#

this is what i mean

In [10]: x = np.array([1,2,3,4,5,6,7,8,9])

In [11]: indices = (x > 4)

In [12]: indices
Out[12]: array([False, False, False, False,  True,  True,  True,  True,  True])

In [13]: x[indices]
Out[13]: array([5, 6, 7, 8, 9])

In [14]: x[x>4]
Out[14]: array([5, 6, 7, 8, 9])

there are very few scenarios in which this doesn't work
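
A minimal sketch of how the mask can replace the append loop. `flat_gt` is the name from the discussion; `flat_pred` and all sample values are made up for illustration, since the original loop is not shown.

```py
import numpy as np

# Hypothetical stand-ins for the flattened arrays in the question.
flat_gt = np.array([0, 255, 255, 0, 255])
flat_pred = np.array([0.1, 0.9, 0.8, 0.2, 0.4])

# Loop version: append one matching value at a time.
selected_loop = []
for i in range(len(flat_gt)):
    if flat_gt[i] == 255:
        selected_loop.append(flat_pred[i])

# Mask version: one boolean array selects the same entries at once.
mask = flat_gt == 255
selected_mask = flat_pred[mask]
```

Both versions pick the same values; the mask version does the comparison and the selection inside NumPy instead of in a Python loop.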

viscid bridge
#

so 10 to 13 is one approach and 14 by itself is a single-line solution, right?

#

10 to 13 is basically the explanation of how it works?

wooden sail
#

exactly

viscid bridge
#

ok. Got it now .

#

Thank you for your time, though.

wooden sail
#

all good

nimble glacier
#

Im a react native Developer. I learned it through tutorial.
I want to switch to data science and ML im in college 3rd year .
What should i do and learn. I hv no experience . I live in india

prime hearth
#

@nimble glacier it helps to have a plan and a pathway. you can watch krish naik's youtube channel; the video is called datascience roadmap, and there you will learn what you need to learn

#

Can even google datascience roadmap 2022

#

Krish naik i recommend his channel since he is self taught

#

And got into ML from software dev

#

He has lots of playlist tutorials for ML and datascience

nimble glacier
#

I did. I know python basics. I want to know some yt channels or courses if u can recommend any.

spring mortar
#

Question regarding AI and quality metrics. As I understand, there is no guarantee that if I have trained for n epochs, the result will be better in epoch n+1, right? n+1 being slightly worse might be an option while n+2 can be better again when it comes to e.g. IoU or F-Score? Maybe @serene scaffold has an idea?
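
Validation metrics such as IoU or F-score can indeed go down and back up between epochs, so one common habit is to remember the best epoch rather than trust the last one. A small sketch of that bookkeeping, with made-up scores:

```py
# Hypothetical per-epoch validation IoU values; note the dip at epoch 3.
val_iou = [0.61, 0.66, 0.64, 0.69, 0.68]

best_epoch, best_score = None, float("-inf")
for epoch, score in enumerate(val_iou, start=1):
    # Keep the best score seen so far instead of assuming the last epoch wins.
    if score > best_score:
        best_epoch, best_score = epoch, score
```

In a real training loop this is the point where a checkpoint of the model would be saved.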

prime hearth
#

Thats good to know python basic, you want to learn pandas and numpy

#

Thats the next step

#

Because you will need to learn how to open datasets and manipulate them and arrays

#

There are lots of tutorials on this so just pick any , i recommend ones that are more recent like 2022 tutorials

nimble glacier
#

Thank u thats really helpful

prime hearth
#

For pandas you want to learn like .iloc ,.loc , how to add data, remove, apply , edit data and merge datasets etc

nimble glacier
#

That should i keep in mind while learning

prime hearth
#

Yeah

#

There is a good tutorial i think it is called pandas tutorial by keith galli

#

And numpy tutorial by freeCodeCamp on youtube

#

That video is also done by Keith Galli

serene scaffold
limber token
#

Why tf is Pandas so slow? I just converted a program that was iterating through dfs that took >90min to transforming the dfs into dicts and now it takes ~5min

serene scaffold
#

if you don't write idiomatic pandas code, it's not going to be fast.

limber token
serene scaffold
#

pandas does iterative procedures in C code (which means that part of the code can't be expressed as a Python loop), and that is what makes it fast.

#

also, if you were appending to a dataframe, that runs in quadratic time.
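
A sketch of the usual way around the quadratic cost: gather rows in a plain list and build the DataFrame once. The column names and values here are invented for illustration.

```py
import pandas as pd

# Collect plain dicts in a list; list.append is cheap.
rows = []
for i in range(5):
    rows.append({"sku": f"SKU{i}", "value": i * 2})

# Build the DataFrame once at the end instead of growing it row by row,
# which would copy the existing rows on every step.
df = pd.DataFrame(rows)
```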

limber token
#

I wasn't, was running this: (skus_B and columns_B are both lists)

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

limber token
#

No worries, my bad:

for curr_sku in tqdm(skus_B):
    for column_name in (columns_B):
        valueA = fileA[fileA['sku'] == curr_sku][column_name].values[0]
        valueB = fileB[fileB['sku'] == curr_sku][column_name].values[0]
serene scaffold
limber token
#

tqdm is just a lib for checking progress of loops, it prints stuff like this:

#

How do you suggest using groupby?

serene scaffold
#
fileA.groupby('sku').head(1)

that will give you one row of fileA for each unique value in the sku column. and then you don't have to loop over columns_B, because you have every column in that new dataframe.
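
A rough sketch of that idea on toy data. `fileA`, `fileB`, `columns_B` and the `sku` column come from the messages above; the values, the other column names, and the final comparison are assumptions, since the thread does not show what the loop did with `valueA` and `valueB`.

```py
import pandas as pd

# Hypothetical miniature versions of the two files.
fileA = pd.DataFrame({"sku": ["a", "a", "b"], "price": [1.0, 9.0, 2.0], "qty": [5, 7, 6]})
fileB = pd.DataFrame({"sku": ["a", "b"], "price": [1.0, 3.0], "qty": [5, 6]})
columns_B = ["price", "qty"]

# First row per sku, indexed by sku, mirrors the `.values[0]` lookup in the loop.
firstA = fileA.groupby("sku").head(1).set_index("sku")[columns_B]
firstB = fileB.groupby("sku").head(1).set_index("sku")[columns_B]

# One vectorized comparison covers every sku and every column.
differs = firstA.loc[firstB.index] != firstB
```

Here `differs` is a boolean DataFrame with one row per sku and one column per compared field, so no Python-level loop over skus or columns is needed.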

ocean swallow
#

why using only mask tokens to generate texts breaks BERT

limber token
#

This is what the files look like:

serene scaffold
#

because seeing if something is in a set is O(1). so the whole process will just be O(n).
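
For illustration, a tiny sketch of the set-membership point with invented SKU lists (the actual files are not shown in the thread):

```py
# Hypothetical SKU lists standing in for the two files.
skus_A = ["a1", "b2", "c3", "d4"]
skus_B = ["b2", "d4", "e5"]

# Building the set is O(n); each `in` check afterwards is O(1) on average.
seen_in_A = set(skus_A)
missing_from_A = [sku for sku in skus_B if sku not in seen_in_A]
```

Checking against the list `skus_A` directly would scan it for every lookup, which is what makes the naive version quadratic.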

#

you can do step 2 with isna

#

provided that you set the SKU as the index for both

limber token
limber token
serene scaffold
#

even though the instructions say "compare each column", you can pretty much forget that it says that. because if you do operations between two DFs, it will already align on column names

#

@limber token "na" (or NaN) is a missing value. and you want to see which values are NaN in file B, but not in file A

#

so it's just boolean logic.

limber token
serene scaffold
limber token
#

| is or, & is and

#

~ I don't know

serene scaffold
#

right. and ~ is not

#

which also means that ~ is a unary operator, whereas the other two are binary.

#

so, if you do df.isna(), you get a dataframe where every element is a bool. and you get True if it was NaN in df, else False

#

so can you think of what pandas expression would give you "the values are NaN in fileB, but not in fileA"?

limber token
#

fileB.isna() & ~fileA.isna()?

serene scaffold
#

looks right to me
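
A small worked example of that expression, assuming both frames use the SKU as index as suggested earlier. The data is invented.

```py
import numpy as np
import pandas as pd

# Hypothetical data; the SKU is the index of both frames so rows align.
fileA = pd.DataFrame({"sku": ["a", "b"], "price": [1.0, 2.0], "qty": [5.0, np.nan]}).set_index("sku")
fileB = pd.DataFrame({"sku": ["a", "b"], "price": [np.nan, 2.0], "qty": [5.0, np.nan]}).set_index("sku")

# True only where fileB is missing a value that fileA has.
newly_missing = fileB.isna() & ~fileA.isna()
```

Only the `price` of sku `a` comes out True: `qty` of sku `b` is NaN in both files, so `~fileA.isna()` rules it out.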

#

try that 😄

limber token
#

Thank you so much 🙂

rapid cedar
#

why is there little to no tutorial on machine learning?

unique flame
#

but there is

serene scaffold
candid garnet
#

Hi!
I'm working with a h5 file output from a ferromagnetic resonance experiment.

I have three arrays:
current - shape = (39,)
frequency - shape = (5001,)
amplitude - shape = (39, 5001)

I am trying to visualise them in a surface plot (current x, frequency y and amplitude z) and having a bit of a nightmare.

I made a meshgrid (x,y) with the current and frequency arrays and then just tried to surface plot with (x,y,z) but I'm obviously doing something fundamentally wrong.

serene scaffold
#

(the code should be a markdown block, but the plot can be a screenshot)

wooden sail
#

the most likely issue is that the cartesian product was done in the opposite order as the coordinates of the amplitude array, but yeah, we'd need to see how you did this

candid garnet
#
```py
ax = plt.axes(projection='3d')
ax.plot_surface(current_mesh, frequency, amplitudes)
```
#

ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5001, 40) and arg 2 with shape (39, 5001).

#

oh wait current mesh has a shape of (5001,40) for some reason

wooden sail
#

can you print out the shapes of current, frequency, amplitude, current_mesh, and frequency after the meshgrid?

candid garnet
#
```
frequencies shape is (5001,)
amplitudes shape is (81, 5001)
current_mesh shape is (16203240, 81)
frequency_mesh shape is (16203240, 81)
```
#

sorry my supervisor gave me a new dataset to work with in between these messages

#

same structure though

wooden sail
#

the weird thing is that the number you have there equals 40 * 81 * 5001

#

are you by any chance using a jupyter notebook?

candid garnet
#

indeed

wooden sail
#

i bet you forgot to restart the runtime. jupyter is super bad for doing this type of work

#

idk why people swear by it. nondeterministic code results? no thanks

#

the issue is you stored the result of the frequency cartesian product in the same variable name. when you ran that cell again, you stacked cartesian products

candid garnet
#
```
frequencies shape is (5001,)
amplitudes shape is (81, 5001)
current_mesh shape is (5001, 81)
frequency_mesh shape is (5001, 81)
```
wooden sail
#

you need to rerun the whole thing. then do yourself a favor and don't use jupyter in the future 😛

serene scaffold
#

why did you do angrysad

wooden sail
#

because many people disagree. random blog posts without quality checks have made jupyter king

candid garnet
#
```py
ax.plot_surface(current_mesh, frequency_mesh, amplitudes)
```

```ValueError: shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (5001, 81) and arg 2 with shape (81, 5001).```
#

do i need to transpose one
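
A sketch of where the mismatch comes from, using the shapes reported above with placeholder values. By default `np.meshgrid` uses `indexing="xy"`, which puts the second input along the first axis; `indexing="ij"` gives grids in the same orientation as `amplitudes`.

```py
import numpy as np

# Shapes taken from the messages above; the values are placeholders.
current = np.linspace(0.0, 1.0, 81)
frequency = np.linspace(1.0, 2.0, 5001)
amplitudes = np.zeros((81, 5001))

# Default indexing="xy" yields (len(frequency), len(current)) grids.
xy_mesh, _ = np.meshgrid(current, frequency)

# indexing="ij" yields (len(current), len(frequency)), matching amplitudes.
current_mesh, frequency_mesh = np.meshgrid(current, frequency, indexing="ij")
```

Keeping the default grids and passing `amplitudes.T` to `plot_surface` should work as well; either way all three arguments end up with the same shape.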

mint palm
#

where can i learn about using gpu in university labs remotely

serene scaffold
#

CUDA is the API that pytorch et al use to interact with the GPU

#

did you already ssh into the machine that has a GPU?

mint palm
#

i am using tensorflow with CUDA. after installation nothing really happened, except the prompts for "gpu...., cuda not .... " were gone

candid garnet
#

nice

mint palm
#

can i learn that with tensorflow too?

hollow sentinel
wooden sail
#

jupyter is ok if you want to make short presentations. it exports to PDF and other formats very easily, so you can make slidesets from it effortlessly. but please don't do serious dev there, debugging is hard enough without setting traps everywhere for yourself

serene scaffold
hollow sentinel
#

what if i like the extra challenge

#

jk it's not worth it

serene scaffold
hollow sentinel
#

i don't even think i got the job tho... i wouldn't be able to do the job even if i did

autumn glade
#

Hello, I want to be able to look at an image of a plate full of food with different food items and tell whether a set of predefined food items exist in it or not (It should be independent of the position of the individual food items) . Is there a model that i can use abstractly for achieving this? (please ping when replying)

steady basalt
#

but yeah, its a cv task

autumn glade
#

its possible though, right?

steady basalt
#

if u have a shit load of training images

autumn glade
#

with a reasonable amount of accuracy

#

hmm

steady basalt
#

some nerds here are gonna say 'well u can make synthetic data' but the truth is ur gonna need to mask food items by hand and label them

#

i recommend against it

#

unless of course u have a method to label otherwise

autumn glade
#

how do i label

#

i do have some images

steady basalt
#

u will probably need tens of thousands

#

this isnt rly a 1 man project

#

this is the kind of thing that google will put out in a year's time

#

hey look at this plate, this is what is on the plate: carrots, beans and potato

#

something that good sounds like a research team's project

#

im sure u can find food datasets out there but idk on a plate

wooden sail
autumn glade
wooden sail
#

idk how widely available their models are though, idk how exactly they do it

autumn glade
#

hmm

autumn glade
steady basalt
autumn glade
#

but its already trained

wooden sail
#

there's also onmyplate

autumn glade
#

i need to train it with images I send

wooden sail
#

indeed, idk if their models are even available somewhere

#

if their model is available, you could consider transfer learning

autumn glade
#

ow

wooden sail
#

remove a few layers of theirs, keep their parameters fixed, and add a few of your own trainable layers

autumn glade
#

unfortunately idk how any of this works lol

#

although i can learn, would be easier if i can use something abstract

mint palm
#

while using an autoencoder for denoising/sparse representation, as both sides contain the original image, how does the model learn to generate the object of interest rather than learning a sparse representation that reproduces the noise?

#

is it because most of the time, info relating to the object of interest >>>> info relating to noise,
and while getting rid of some data both are equally penalized but what remains is the object of interest??

rapid cedar
opal sluice
#

Hi! I'm a beginner in ML, and I have a question. If the resources allow it, would it be better to learn how, and setup a dedicated GPU to train models or use paid online GPU servers?

mild dirge
#

Well if you are a beginner in ML you are probably still learning about the basics

#

And I doubt you would need to run a 100 million parameter model on your own gpu anytime soon

lapis sequoia
#

anyone know how to make a restaurant menu into a json using image to text recognition would i have to train a ai? any ideas

opal sluice
lapis sequoia
#

anyone good at ai i got some questions

serene scaffold
rich olive
#

Guys im tryna sort a dataframe by a function of one of its columns but each value has to check each other value in the column and the dataset is far too large for that. Specifically, each value must check how many times it occurs in the column. How would you go about this?

serene scaffold
#

hmm, that actually won't work

rich olive
#

holy, key seems useful

serene scaffold
#

yeah, but what I did was wrong. the key has to return a Series of the same length as the one you passed to it

rich olive
#

ah

serene scaffold
#

and the value counts of a series will not be that.
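
One way to get a same-length result, sketched on invented data, is to map each value to its count. `Answer` is the column name that shows up later in this thread; everything else is hypothetical.

```py
import pandas as pd

# Hypothetical data; `Answer` is the column name used later in this thread.
data = pd.DataFrame({"Answer": ["x", "y", "x", "z", "x", "y"]})

# map() looks each value up in the counts, so the result has one entry per row.
counts = data["Answer"].map(data["Answer"].value_counts())

# The same lookup works as a sort key because it preserves the length.
by_frequency = data.sort_values(
    "Answer",
    key=lambda s: s.map(s.value_counts()),
    ascending=False,
)
```

`value_counts()` makes a single pass over the column, so no value has to be compared against every other value.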

#

I'm still thinking.

rich olive
#

thats okay Ive been thinking for a day and a half. take your time

rich olive
#

kk

#

thx

serene scaffold
#

yw

#

@rich olive did you figure it out

rich olive
#

yeah Im actually not worried about sorting the original df so using value_counts on the series was good enough

#

but now I think ill try and make that work just cause

#

doesnt seem to work as soon as i try to reindex it

serene scaffold
rich olive
#
def common_answers(data):
    return data.Answer.value_counts().reindex(data)
#

oh i have to probably sort after i reindex it

serene scaffold
#

also, I would do data['Answer'] instead of data.Answer

rich olive
#

it goes through my dataframe and reports the frequency of answers in descending order

#

why no dot notation

serene scaffold
# rich olive why no dot notation

dot notation doesn't work for column names that aren't valid as attribute names. and a lot of people find that use of dot notation to be jarring and ugly.
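
Two cases where attribute access falls over, sketched with made-up column names:

```py
import pandas as pd

# Hypothetical column names chosen to show where attribute access breaks down.
df = pd.DataFrame({"count": [1, 2], "unit price": [3.0, 4.0]})

by_bracket = df["count"]    # the column, as a Series
by_dot = df.count           # the DataFrame.count method, not the column
spaced = df["unit price"]   # `df.unit price` is not valid Python syntax
```

Bracket access behaves the same for every column name, which is one reason to prefer it.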

#

(some people wish they'd just remove it from pandas)

rich olive
#

what does the style guide say lol

serene scaffold
#

well, the python style guide would say to never name an attribute with a leading capital letter.

#

and it doesn't address ways that one could overload __getitem__

rich olive
#

thats a backwards application of the rule but i respect it

serene scaffold
#

anyway

#

you don't need to sort the dataframe for what you said

rich olive
#

yea if I figure this out cool if not whatev

serene scaffold
#

you still need to call sort_values at some point.

#

hmm, looks like my second approach doesn't work either

#

😦

rich olive
#

its not reindexing properly idk whats happening to the values count

serene scaffold
rich olive
#

I do not understand how it is reindexing

serene scaffold
#

even though geeks4geeks is a very terrible website

rich olive
#

i expect to see the value count as the index for the dataframe when i reindex it but it is not

serene scaffold
rich olive
#

thats what the output looks like on the paste you sent

serene scaffold
rich olive
#

what is rohan im talking about this

serene scaffold
rich olive
#

fair kk gonna learn this one then

serene scaffold
#

I don't usually tell people to forget what I say. most of what I say is great.

solar yew
#

What results should I hope to achieve with NLP classification? I’ve been playing with a lot of hyperparam tuning and still only hovering around 80% accuracy. I’m not sure I could achieve higher

#

Problem is real vs fake Amazon reviews

#

I’ve seen/talked to others using hotel reviews which can hit accuracy in the 0.9s so I’m not sure

nova nest
#

does anyone know how to encode the tags column like this?

civic thunder
#

Hello guys, I've written and tested a collection of REST API-s using the Flask framework in Python, for interacting with the Elastic cluster without breaking any security protocol or exposing any credentials/access tokens/confidential data. I'm open to collaborating on this project to add more functionalities or structure it in a better way in order to scale it to a more significant project.

So, please give it a look, star it, fork it and if you're interested in collaborating and then include it in your resume as an open source project, feel free to contribute by raising a PR.

GitHub Link of the project: https://github.com/atanughosh01/elasticsearch-api

gloomy anvil
#

Is there a better way to group these algos? How could I call the first group?

#

I dont want to split the first group into linear/Bayesian/non-parametric or so

#

Is there a better way to structure these Algos?

ripe forge
#

things like KNN really dont belong in the same bucket as logistic regression, so the only thing that makes sense is "Other models" and those models coming at the end

gloomy anvil
#

Ill probably use your approach and put the "Other Models" in the end

#

another question though: Sklearn categorizes LogReg as Linear, which makes sense in a binary context. Sklearn also categorizes Bayes as a separate category. But isn't it also kind of linear in binary contexts?

ripe forge
#

naive bayes doesn't involve trying to fit a line, right, it's purely based on probabilities of events.

#

linear models, as i understand them, essentially involve some kind of regression and line fitting, where the line itself is linear.

#

logreg is a linear model because it also involves fitting a line, and then transforming that essentially using sigmoid

wooden sail
#

linear doesn't just refer to "lines" in the traditional sense, but rather to linear transformations. this includes matrix-vector products more generally

#

you could pair this with bayesian estimation if you wanted to, as well

#

you could put all of those under regression. you formulate a statistical model that can be parametric or not, bayesian or not, and linear or nonlinear. you can then choose how to find the parameters.

#

you can fit DL under different parts of these categories depending on if you use them as solution approaches or as part of the model. decision trees would be part of how you formulate the model

#

that'd be my take on it from the estimation theory perspective, at any rate. it's just very common that nowadays when people say algorithm, they mean "description of the model"

stable palm
gloomy anvil
# wooden sail you could put all of those under regression. you formulate a statistical model t...

Hey Edd, thanks for your insight! I am sorry, but I do not have that deep mathematical/theoretical background and english is not my first language. I thought, SVM and K-Nearest Neighbour would be non-parametric models, as the model structure is not defined prior to training. And using Bernoulli Naive Bayes would be the only bayesian model that I'd have here, isn't it? Please excuse my ignorance, but how would you group these models then ? Which models would be parametric/non-parametric in your eyes?

wooden sail
#

for svm, i'd say it depends largely on your interpretation. SVM works by assuming there exists a hyperplane that separates the points in space. being on either side of the hyperplane defines what category the points are in. the task is to find the parameters describing the hyperplane. i'd be willing to call that parametric

#

knn looks nonparametric to me

#

as you have written them, bernoulli naive bayes seems to be the only bayesian one, yeah. you could mix and match that with some of the deep learning approaches depending on how you use them. also with logistic regression and svm

gloomy anvil
#

okay, but the hyperplane basically is the space between, that I try to measure - with KNN I basically do the same, but measure the distance between the data points, but not via a hyperplane ?

gloomy anvil
wooden sail
#

they can be bayesian depending on the function they minimize

#

the cost function with which you learn the parameters determines this

mint palm
#

is the pipeline of semantic segmentation/superpixel method, then masking, then reconstruction, then calculating reconstruction error fast enough to use for real-time anomaly detection??

steady basalt
#

oh, u want seperate headings

#

thats gona get u the same results just harder really

nova matrix
#

Guys is it necessary to read research papers or books for ML or is that only reserved for a more research tailored approach

mild dirge
#

Sometimes the best explanation about an algorithm is straight from the source @nova matrix

#

If you can find a better explanation, then do so, but reading some research papers can be useful

nova matrix
#

I am currently on my path tbh. I started Data Science and ML like 6 months ago. I now know how to use the python libraries and use some algos like SVM, RF, Logistic, Linear, KNN, lightgbm, xgboost
Idk what to do next tho
What should the next approach be
Should I broaden my understanding of these algorithms through books first or do projects on kaggle
Also when should I make the jump to Deep Learning

sleek tapir
#

what does this mean

#

#NUM!

serene scaffold
sleek tapir
#

pandas

#

in pandas dataframe

serene scaffold
sleek tapir
serene scaffold
#

looks like some excel fuckery

#

can you do print(df['ammonia'].dtype)?

sleek tapir
#

object

serene scaffold
wooden sail
#

looks like the cells in excel were formatted incorrectly
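
A sketch of one way to deal with such cells, assuming the column arrived as strings with Excel error markers mixed in. `ammonia` is the column name from the thread; the values are invented.

```py
import pandas as pd

# Hypothetical column; `ammonia` is the name used in the thread.
df = pd.DataFrame({"ammonia": ["0.5", "#NUM!", "1.25"]})

# Excel error strings keep the column non-numeric; coerce them to NaN to get floats.
df["ammonia"] = pd.to_numeric(df["ammonia"], errors="coerce")
cleaned = df.dropna(subset=["ammonia"])
```

Whether to drop those rows or keep the NaN and impute later depends on the dataset.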

sleek tapir
#

yea it looks like excel is fked

#

tat fuckery of a dataset

#

ill delete the rows

#

thanks anyways

#

r u from usa

serene scaffold
#

yes. why?

sleek tapir
#

are there better courses in us

serene scaffold
#

better than which?

sleek tapir
#

like how is ds in usa

#

is it just products like in au

#

like ML and e.t.c.

serene scaffold
# sleek tapir like how is ds in usa

it's going to depend on the university, but it seems like a lot of places just have some data science electives for computer science majors. hopefully we'll start seeing actual DS/AI degree programs in the next several years.

sleek tapir
#

stanford ml seems so hard core

#

my uni doesnt have anything similar

#

my uni just do heres some code

#

and yea end of course

wooden sail
#

it really depends on where you look. you'd have to look at engineering or maths programs if you wanna go very in depth

sleek tapir
#

but stanford is maybe a exception

#

my uni cs is in eng

#

but ive heard a lot of unis put it under math

wooden sail
#

maybe they're in the "cs = software eng" school of thought, which is not necessarily what you're after

sleek tapir
#

yep

#

how bout in usa?

#

is it in eng or math faculty

#

are most cs faculties cs = seng thought

wooden sail
#

that varies by uni, so you'll find both in the states. you could find it under math or still in eng in other au universities

wooden sail
sleek tapir
#

my uni defs has this feeling

#

lol

#

ill show u the handbook

#

click core courses

#

it looks all seng

#

software engineering fundamentals oop comp sci projects

serene scaffold
#

at my uni (I say "uni" even though I'm american), computer science had been part of the math department, but it was moved to the engineering school some number of years before I enrolled. but the curriculum was more theoretical than software eng.

sleek tapir
#

which uni

#

i view cs as part of maths

#

i have both degrees

serene scaffold
sleek tapir
#

u look handsome!

serene scaffold
sleek tapir
#

u started college at the same age as me

serene scaffold
#

I don't think I have my age anywhere? tangerine_think

sleek tapir
#

the ppl challenged me saying tell me if u can build a website with math

wooden sail
#

hmm i see DSA and discrete maths are the only relevant core courses. what is "higher mathematics" lol

sleek tapir
#

it just feels facepalm

sleek tapir
#

calc 1 and calc2

wooden sail
#

yes but what kind xD math is very broad a term

sleek tapir
#

plus lin alg up to vector spaces/linear/transformation/eigenvalues/a bit of stats

wooden sail
sleek tapir
#

i do a math degree

#

yea we dont learn ml in depth

serene scaffold
wooden sail
#

yo what

sleek tapir
#

discrete math is part of cs

#

tbh

#

actually no

#

cs is part of discrete math

serene scaffold
#

"something that you use in CS" doesn't mean that it's part of CS

wooden sail
#

oof 😛

sleek tapir
#

tats wat most ppl dont understand

serene scaffold
sleek tapir
#

is viriginia in usa capital city

#

its a blue state

wooden sail
#

not controversial, but i'd say you'd make a good lawyer

sleek tapir
#

i want to go usa one day

#

ive only been china and australia

#

also japan too

serene scaffold
sleek tapir
#

tats it

#

they like joe biden

serene scaffold
sleek tapir
#

lol biden is so shit

#

he doesnt know anything

serene scaffold
#

we've strayed very far from data science and ai.

sleek tapir
#

anyways yea my uni doesnt do much ds and ai

#

wat u guys think of andrew ng course

wooden sail
#

from god

serene scaffold
sleek tapir
#

im redoing his new one

serene scaffold
wooden sail
#

i'd agree they're good courses. and i'm just memeing 😛

sleek tapir
#

lol in australia < 50% are christian

wooden sail
#

other suggestions, if you're interested, are gilbert strang's linalg and boyd's convex optimization (these are books)

sleek tapir
#

i dont need alg and convex optimisation

wooden sail
#

i would argue if you studied those more, you would pick up ML more easily

sleek tapir
#

i do study those

#

cause i do a math deg

#

i do a double

wooden sail
#

ah sweet

#

but then you should be set

sleek tapir
#

yep

lapis sequoia
#

Hey,

I transformed an excel file (xlsx) into a csv file.
Then when i run pandas.read_csv('csv_file'), i get this error thrown: pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 76, saw 4

Any idea of whats going on? And how to solve it?
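
The cause is not visible from the message, but one common reason for this error is a few title lines above the real header, so the parser settles on one column and then meets a wider row. A sketch of that scenario with an invented file:

```py
import io
import pandas as pd

# Hypothetical export: two title lines, then a four-column table.
raw = "Quarterly report\nGenerated by Excel\nsku,price,qty,region\na,1.0,5,EU\nb,2.0,6,US\n"

try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError:
    # The first line has one field, so four fields later on is an error.
    failed = True

# Skipping the preamble lets the real header row define the columns.
df = pd.read_csv(io.StringIO(raw), skiprows=2)
```

If the file turns out to use a different delimiter instead, passing `sep=` to `read_csv` is the thing to try, and `pd.read_excel` can read the original xlsx without the conversion step.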

lapis sequoia
#

I'm trying to follow along with the fastai tutorial, I've copied the code they posted on their Kaggle Jupyter notebooks and copy-pasted the relevant parts of it to my IDE, but I'm getting this error: https://pastebin.com/AQYuAbtz

Any ideas how to fix this?

My code: https://pastebin.com/Jqbiicd9

Fastai code I copied: https://www.kaggle.com/code/jhoward/is-it-a-bird-creating-a-model-from-your-own-data

lapis sequoia
#

^ solved

tacit basin
lapis sequoia
solemn raptor
#

Hey guys, a question

I have a pd.DataFrame column with datetime index and I need to get intervals between column value crossing 100 from below until it crosses 0 from above - is this possible at all?

#

Here's a sample:

2022-07-01 00:00:02.804000000    1.665
2022-07-01 00:00:02.808999936    2.570
2022-07-01 00:00:02.816999936    3.635
2022-07-01 00:00:02.820999936    3.615
2022-07-01 00:00:02.824999936    4.280
2022-07-01 00:00:02.831000064    4.275
2022-07-01 00:00:02.840000000    4.595
2022-07-01 00:00:02.846000128    2.700
2022-07-01 00:00:02.852999936    3.605
2022-07-01 00:00:02.860000000    5.200

and I need to extract parts where the column value crosses 10 from below (i. e where previous value < 10 and current >= 10) and until it crosses 0 from above (previous > 0, current <= 0)

If the value crosses 10 from below again after the first time, I need to ignore it

wooden sail
#

how big is the dataframe? after thinking about it for a while, i don't think there's any clever way to avoid looping through that column
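
A sketch of what such a loop could look like, written against plain `(label, value)` pairs so it does not depend on pandas. The function name, the thresholds as parameters, and the sample data are all assumptions; the rules follow the description above (open on an upward crossing of 10, close on a downward crossing of 0, ignore further upward crossings while open).

```py
def crossing_intervals(samples, upper=10.0, lower=0.0):
    """Return (start, end) label pairs for each upper-to-lower excursion."""
    intervals = []
    start = None
    for (_, prev), (label, cur) in zip(samples, samples[1:]):
        if start is None:
            # Open an interval on an upward crossing of `upper`.
            if prev < upper <= cur:
                start = label
        elif prev > lower >= cur:
            # Close it on a downward crossing of `lower`.
            intervals.append((start, label))
            start = None
    return intervals


# Hypothetical samples as (label, value); integer labels stand in for timestamps.
samples = [(0, 4.0), (1, 12.0), (2, 8.0), (3, 11.0), (4, -1.0), (5, 3.0), (6, 15.0), (7, 0.0)]
intervals = crossing_intervals(samples)
```

For a Series with a datetime index, `list(series.items())` produces pairs in the same shape, so the function could be called as `crossing_intervals(list(series.items()))`.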

mint palm
#

https://www.youtube.com/watch?v=VC-H2z0Um6o
in this video, did i understand correctly if i say it's basically normal masking-based anomaly detection, but the mask is dynamic and its dimensions are hyperparameters?
what else is novel about the author's model??

We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. The proposed self-supervised block is generic and can easily be incorporated into various state-of-the-art anomaly detection methods.

The open-access paper can be found at:
https://arxiv.org/pdf/2111.09099.pdf

serene scaffold
#

what's the structure of the json, and what are the patterns

#

this is too much to look at, and it's not structured in any way that makes it easy to read.

#

Please remove the parts that can be ignored, so that your question is focused only on what is relevant. We don't know which parts can be ignored unless you tell us.

#

some non-arbitrary use of new lines.

#

I can format it myself if you do that much, I guess.

#

I already had a plan for how I would format it. I just need the irrelevant parts stripped out.

#

well, it's your job to communicate your question effectively. you have to give some example of what the data is that is usable by a volunteer answerer.

bold timber
#

what is difference between loss and validation loss?

quaint leaf
bold timber
ebon hazel
#

Just genuinely, how hard is it to make AI? I see Code Bullet do it and it seems possible to even create the shittiest of AI

#

I get that math is a huge portion and it's important to know the math semi decently and Python but just even a shitty AI that learns how to scale boxes the fastest

wooden sail
#

it depends on how much you want to understand what you're doing. high level APIs can help you make super neat stuff without ever understanding what you're doing in depth. this will also limit which problems you can solve and how, but if that doesn't matter to you, the answer is "very easy"

ebon hazel
wooden sail
#

you need at least undergrad level math for that

#

if you wanna derive any sort of guarantee or description of the performance, then you need to have a pretty decent level of math

ebon hazel
#

Ok that doesn't seem so hard

#

I will be learning undergrad math

#

Uhhhh, I looked through the pins, is there anything else you suggest to help me get started?

wooden sail
#

learn einstein summation notation and get comfortable with isomorphisms of vectorspaces as soon as possible

#

so start linalg asap

ebon hazel
#

Nice, words that I need to google

#

Oh it sounds scarier than it is

#

Ok yeah, I got this. Thank you for the help

wooden sail
#

yeah it just sounds fancy, the main idea is stuff like, if you have linear transformations acting on matrices, you could instead vectorize the matrices and make a new, equivalent transformation for that

#

and similarly for n-dimensional arrays
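
A small numerical sketch of that idea, assuming the linear map is X -> A X B (chosen here only as an example). The matrices are random; the identity used is vec(A X B) = (B^T ⊗ A) vec(X) with column-stacking vec.

```py
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(4, 4))
X = rng.normal(size=(3, 4))

# The linear map X -> A X B, written in Einstein summation notation.
Y = np.einsum("ij,jk,kl->il", A, X, B)

# The same map acting on the column-stacked vector vec(X).
M = np.kron(B.T, A)
y_vec = M @ X.flatten(order="F")
```

`M` is an ordinary 12 x 12 matrix, so anything known about matrix-vector products applies to the original map on 3 x 4 matrices.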

ebon hazel
#

Yeah, doesn't seem the worst™

eager scarab
#

Hi! does anyone have source codes for python chatbot?

steady basalt
#

But soon all coding will be done by ai so anyone can do it soon

ebon hazel
steady basalt
#

It is because I do it

ebon hazel
#

No math involved at all

#

And you simply just make the AI

steady basalt
#

No none at all depending on your target

ebon hazel
#

Because you know how to code well?

steady basalt
#

Not even well but yeah

ebon hazel
#

Please continue explaining

steady basalt
#

I’m almost done w my masters and the math I’m learning is purely supplementary

#

None of it was required

ebon hazel
#

Supplementary meaning?

steady basalt
#

For fun

#

For self improvement

ebon hazel
#

Ah

#

Masters in what

steady basalt
#

Data science

ebon hazel
#

Nice

#

So, ok

#

Break me down in the process of “code well, make AI”

steady basalt
#

Spent a lot of time feature engineering from data base

#

Too long actually, become god with pandas and python

#

Import ai library

#

Get predictions

#

Wow look at that successful model

ebon hazel
#

Step 1) Import AI library
Step 2) Predictions

steady basalt
#

Literally yes

ebon hazel
#

Amazing. How would that be implemented into, lets say, an AI that tries to walk

steady basalt
#

Sure, this is an actual what do u call it

#

Duno the word for it but it’s like a independent entity

#

Reinforcement learning plus computer vision shud do it

ebon hazel
#

And do those also require math

#

Or are those also “code good”

steady basalt
#

In the first place yes absolutely, now not so much as people have already done it

ebon hazel
#

Ah so it’s just

#

Step 1) Import library

#

Step 2) Import library 2

steady basalt
#

It is a fact that yes you need to know math to know what’s going on in the background

ebon hazel
#

Step 3) Have you tried importing libraries?

steady basalt
#

But no u don’t if all u want is to make a suboptimal model

ebon hazel
#

But if you don’t care about the background

#

Ah ok, makes sense

#

I would like to be able to create it? If that makes sense

steady basalt
#

People who make the best shit usually use maths

#

Yeah u would

#

Easily

ebon hazel
#

Like not just import a library, add 2 lines and be like

#

“Check out my graph”

steady basalt
#

All this will be automated by ai writing code for u soon anyway

ebon hazel
#

Why so

steady basalt
#

Because ai can already write code

#

And imho these jobs will all be replaced in like 10 years or less

#

Except for the hardcore research which yes requires a lot of maths

#

So, management asap for me

ebon hazel
#

Ah sweet

#

My hobby will be replaced by AI

#

Sick nasty. I’m a go cry

steady basalt
#

Lucky ur not doing this for career

#

Oh wait, literally every career will be replaced by ai

ebon hazel
#

My career will be research/engineering

#

Maybe I have job security

steady basalt
#

And you can do that without maths how?

ebon hazel
#

Never said I do it without maths

steady basalt
#

not just talking about normal maths either that shit requires REALLY advanced maths

ebon hazel
#

Yeah

#

I know

steady basalt
#

Good luck then

ebon hazel
#

It’ll be in biochemistry

#

So I know it’ll suck ass

steady basalt
#

Oh thought u meant ai research

ebon hazel
#

No sorry

#

That’s just a hobby

steady basalt
#

It is fun

ebon hazel
#

Yeah

#

Seems like it

steady basalt
#

do u know python ?

#

or u doing this with R

ebon hazel
#

I am learning Python

#

But I have experience in web and C++ (didn’t like web development so I switched to Python a few days ago)

steady basalt
#

I tried django lately it was hard

ebon hazel
#

What’s Django

steady basalt
#

its python web dev

#

full stack

simple frigate
#

Hey is there a tool which can write stuff using AI on the basis of existing text?
Like add onto it.

runic lantern
#

Hi everyone, I am trying to implement a simple 3 layer neural network on the MNIST handwritten digits dataset using numpy only. I am facing some errors when running the gradient descent. can someone please help me out.

#

this is the jupyter notebook i am working on

runic lantern
gleaming osprey
#

nope

#

also, why use numpy only?

runic lantern
#

i was following andrew ng's course and i didn't really grasp what was going on in the backpropagation portion

gleaming osprey
#

oh

runic lantern
#

so i thought doing this would help me understand

rapid cedar
#

fellow discord, where do you start, when learning machine learning

gleaming osprey
#

all I know is that for each weight you add a partial derivative of the loss
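
One detail on the wording: in plain gradient descent the partial derivative, scaled by a learning rate, is subtracted from each weight rather than added. A minimal NumPy-only sketch, assuming a toy linear model with mean squared error (the data, `lr` and the helper names `loss` and `grad` are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))      # 8 toy examples, 3 inputs each
y = rng.standard_normal(8)
w = np.zeros(3)

def loss(w):
    # mean squared error of the linear model X @ w
    return np.mean((X @ w - y) ** 2)

def grad(w):
    # partial derivative of the loss with respect to every weight
    return 2.0 * X.T @ (X @ w - y) / len(y)

lr = 0.1
before = loss(w)
for _ in range(100):
    w = w - lr * grad(w)             # step against the gradient
after = loss(w)
print(before, after)
```

Backpropagation in a multi-layer network is the same update; the chain rule is only the bookkeeping that produces `grad` for each layer's weights.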

serene scaffold
runic lantern
#

although I am still not able to fully understand :' ))

rapid cedar
#

ignored*

runic lantern
rapid cedar
runic lantern
#

there are various courses there

#

you can apply for financial aid

rapid cedar
#

for free?

#

bcs im broke as hell

gleaming osprey
#

then learn tensorflow/keras

#

and solve mnist

rapid cedar
#

i know how it works

gleaming osprey
#

oh

rapid cedar
#

like algorithm and stuff

gleaming osprey
#

do u know tensorflow

runic lantern
rapid cedar
#

but how do you actually write the code

gleaming osprey
#

tensorflow

#

ok imma give u a tutorial

rapid cedar
serene scaffold
#

well, if someone is trying to start with ML, I wouldn't even touch neural networks for a few months.

gleaming osprey
gleaming osprey
#

my teacher started with regression

runic lantern
#

they get really complex really fast

gleaming osprey
#

my model's validation keeps getting stuck at 55%

#

how can I fix this?

#

my goal is >= 80%

serene scaffold
# runic lantern they get really complex really fast

also, think about what non-ML people think ML is. they're usually very wrong about what ML is. and it would be difficult to learn neural networks, while also reframing your whole understanding of what a neural network does.

runic lantern
gleaming osprey
#

l2 regularization

gleaming osprey
#

ok

runic lantern
gleaming osprey
#

its a conv net

runic lantern
#

ohh i dont have experience with either keras or a conv net, my bad

gleaming osprey
serene scaffold
gleaming osprey
wooden sail
gleaming osprey
#

how to properly use dropout and regularization

wooden sail
#

how large is your data set and how large is the network? might be you need some augmentation, too

gleaming osprey
#

and this is my network: ```py
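import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D

tf.config.run_functions_eagerly(True)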

model = Sequential()

model.add(Conv2D(8, 2, activation='relu', input_shape=(48, 48, 1)))
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))

model.add(Conv2D(16, 2, activation='relu'))
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))

model.add(Conv2D(32, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(MaxPooling2D(2))

model.add(Conv2D(64, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(Conv2D(128, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.003)))

model.add(Flatten())

model.add(Dense(512, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.4))
model.add(Dense(128, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0001)))
model.add(Dropout(0.3))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(7, activation='softmax'))

model.summary()```

#

the exact size of the dataset is 28,709

#

help?

wooden sail
#

if you think about the size of your image and the number of convolutions and max pools you're taking, it makes more sense to use fewer convolutional layers with larger kernels
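
To see why, it helps to trace the spatial size through the posted network. A small sketch in plain Python, assuming unpadded ("valid") convolutions and pooling with stride equal to the pool size, which are the Keras defaults; `conv_out` and `pool_out` are hypothetical helper names:

```python
def conv_out(size, kernel, stride=1):
    # output width of an unpadded ("valid") convolution
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    # output width of max pooling with stride equal to the pool size
    return size // pool

# Conv/pool order of the posted model (Dropout does not change shapes).
layers = ["conv", "pool", "conv", "pool", "conv", "pool", "conv", "conv"]

size = 48
trace = [size]
for layer in layers:
    size = conv_out(size, 2) if layer == "conv" else pool_out(size, 2)
    trace.append(size)

print(trace)              # [48, 47, 23, 22, 11, 10, 5, 4, 3]
print(size * size * 128)  # 1152 values reach Flatten
```

By the last convolution the feature map is only 3 x 3, and each 2 x 2 kernel sees very little context, which is the motivation for fewer layers with larger kernels.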

red timber
#

Hello everyone! I have a newbie question! ☺️☺️☺️ I’ve begun to preprocess Reddit comments for a beginner personal project of sentiment analysis.
My question is
does everyone use TextHero for all preprocessing needs?
Or do you guys use individual libraries for individual preprocessing tasks?
Thanks in advance 🙏🏾😊

serene scaffold
steady basalt
#

Never heard of text hero either

red timber
# serene scaffold I do nlp professionally. and I haven't heard of text hero. what preprocessing ar...

Hi! Oh wow! Thank you for taking the time to respond! 🙏🏾🙏🏾. Currently I have
-extracted the comments from one post on Reddit
-I have them in a list of strings

  • I think I can convert the list of strings into one string to best get at the text
  • then! I am one by one , trying to remove emojis, numbers, url tags, stemming, tokenize, etc…. (I think that is the next step?) I was using NLTK so far….
    -after doing this I hope to have words with which to begin some beginner analysis (summarizing, wordblob, sentiment analysis)…

Am I on the right track as far as my process? I heard that TextHero is something to use that is a “one stop shop” compared to SPACY or NLTK. Is that correct? Or do you have a favorite?

Thanks so much for any advice 🙏🏾
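
The cleaning steps listed above can be sketched with nothing but the standard library `re` module. This is only one possible approach, assuming English text; the sample comments, the patterns and the helper name `clean` are made up for illustration:

```python
import re

comments = [
    "Check this out https://example.com/post 😂😂",
    "I paid 20 dollars for THIS?!",
]

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")   # drops digits, punctuation, emojis

def clean(text):
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove URLs first
    text = NON_LETTER_RE.sub(" ", text)   # then everything that is not a-z
    return text.split()                   # crude whitespace tokenizer

tokens = [clean(c) for c in comments]
print(tokens)
```

Keeping one token list per comment, instead of joining everything into one string, leaves the option of scoring sentiment per comment later. Note that this pattern also strips apostrophes, so contractions get split.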

serene scaffold
red timber
# serene scaffold I usually use spacy, or implement it myself with regex. but if this text hero th...

Very fascinating! I did start learning about “regex” a bit yesterday. And I will continue today with that research. I’m so excited! Thank you for the advice! 🙏🏾😊🙏🏾😊
But since you work professionally and haven’t heard of it….
TextHero probably isn’t being used professionally. So I don’t want to get too into it if it’s a waste of time….. I am really trying to tailor my learning to applicable job skills so that I can get employed as soon as possible
-thank you again 🙏🏾

serene scaffold
red timber
serene scaffold
#

if you're doing projects and learning the terminology, you're probably spending your time well.

red timber
serene scaffold
red timber
#

No I come from the service industry. Warehouse jobs and bartending… so I know I have to work a million times harder than those coming from graduate degrees and technical backgrounds…. But I’m READY for this challenge!! I believe in myself. It’s so encouraging to hear you work professionally and are self taught as well!!! Wow I can’t wait for that to be my story 😊😊😊😊

serene scaffold
#

and I had been working for starbucks for a few years before that. so unless some life circumstances would make it supremely impractical for you to get a degree, I would encourage you to reconsider it.

lucid current
#

any quants in here?

unique flame
#

Surely there are some practical uses of AI in warehouses and bartending no?

lucid current
lucid current
unique flame
#

I can think of an image classifier to detect drunk people

lucid current
red timber
lucid current
#

as drunkenness scores higher , beverage alcohol content lessens

lucid current
red timber
# lucid current lmk plz

May I please ask why you call yourself “quants”?
Is quantitative analysis your favorite or something? I’m Just a newbie over here wondering…..

lucid current
#

i dont call myself a quant

#

i want to become a quant tho, thats why i ask

#

quant is just short for quantitative analyst bc nobody wants to say that every time, and quant sounds less nerdy

red timber
red timber
steady basalt
#

No one says quant over here

lucid current
steady basalt
steady basalt
unique flame
#

I actually had to look it up too, seems a finance thing

steady basalt
#

Quant always crops up as some sort of super advanced high paying job in finance

#

Not in data science

lucid current
#

its heavily data science related imo

#

often interchangeable

steady basalt
#

Sounds really like they’re making trading algorithms

lucid current
#

quantitative is a word that means data that refers to quantities, therefore it can be counted, numerically expressed/scaled, etc

steady basalt
#

Yeah but that’s really a broad term

lucid current
#

not rly

steady basalt
#

So quant is someone who does quant? Awesome. Or you could just say you’re a data scientist specialising in finance

#

Anyway, I think this term’s American so it makes sense why no one says it here a lot

lucid current
steady basalt
#

So data analyst on steroids

lucid current
#

data analyst on super steroids yes

unique flame
#

"The data analyzer" seems more fun tho

steady basalt
#

So you want to be a data analyst who’s basically just more skilled than normal ones

#

That is quant ?

#

I guess data analyst has been really dragged through the mud whereby you’d imagine people making bad charts and stuff

#

Makes sense you’d want to distinguish from this

lucid current
#

there is a reason why most seasoned quants make 200k-2m+/yr with insane bonuses

steady basalt
#

Google says quants develop mathematical models

#

This is certainly not like data analyst

lucid current
#

imagine if a data scientist

#

and a data analyst had a child

steady basalt
#

Seems like data scientist plus financial mathematician

lucid current
lucid current
#

u get the idea

steady basalt
#

Basically in simpler terms mathematician at the end of the day

lucid current
lucid current
#

anyway

steady basalt
#

Compared to DS at least

lucid current
steady basalt
#

This is a data science chat

lucid current
#

its closely related

wooden sail
#

did you have any questions?

lucid current
#

about quantitative analysis? yes

steady basalt
#

If u ask it maybe someone can answer u

#

But no that I know of there are no quants here

#

Mostly data scientists

lucid current
#

just curious where to start from a relatively sr/adept python dev

im already quite familiar with data analysis+/science already, ofc databases, some hft

#

i did not go to college so my math expertise is pretty average

steady basalt
#

And you want to be a quant ?

#

Didn’t you just state quants are like math gods

#

Yeah that makes sense

wooden sail
#

a little math is good for everyone, so i'd start there. you can also look at domain-specific techniques and models to get a feel for what kind of maths will avail you

lucid current
#

and no i didnt, i just said statistics and calculus

#

but yeah ofc those can be pretty intense areas of math.

steady basalt
#

I mean we can point to data science stuff but for financial quants I don’t think we have that knowledge

lucid current
#

well all good thanks anyway boys

steady basalt
#

Statistics and calculus isn’t that heavy for most data scientists

lucid current
#

glad to hear

steady basalt
#

I mean, in terms of data science I’d say start with python

lucid current
#

ive been procrastinating on learning calc

steady basalt
#

I’ve never met a senior python dev who’s 18

lucid current
steady basalt
#

Then you have the talent

#

Just learn the data science frameworks

wooden sail
#

you're definitely gonna want some calc. in my very limited knowledge, finance relies heavily on differential equations, so you'll wanna reach that eventually

#

and stats/probability goes without saying

steady basalt
#

Try PyTorch or tensorflow

#

Tho I’m not sure you need it for quant

#

Sounds to me like numpy will be most important to you for maths in code

lucid current
#

oh ofc those as well
i never got into building models from complete scratch tho so i cant say im a true ml engineer

lucid current
steady basalt
#

I’d be willing to bet there are python libraries for financial analysis

lucid current
steady basalt
#

And other math stuff

lucid current
#

god bros feels like im learning from scratch all over again

steady basalt
#

To be fair, ur only 18 man

lucid current
#

7 more years until neuroplasticity declines
we speedrunning

steady basalt
#

I mean, most people here are a lot older than you ur starting very early

lucid current
#

yeah im glad im starting now. i just got lucky

#

that i had a friend around me who pushed me into it. otherwise it wouldnt have happened

lucid current
steady basalt
#

Do what u enjoy tho, if it doesn’t turn out to be quant don’t chase it you may find more fun stuff along the way

#

I use STATA almost exclusively for stats

#

Aim for the sky but I wud say don’t expect an 800k salary

lucid current
steady basalt
#

It’s statistical software

#

Very popular in biostats

lucid current
lucid current
steady basalt
#

Yes stuff like trials and studies

#

For example, people dying

lucid current
#

interesting

steady basalt
#

I recently did such report

lucid current
#

do you have a med background?

steady basalt
#

Yes, I did a report on survival analysis

#

So thousands of people over a study period, with certain factors which influence outcome

delicate lintel
steady basalt
#

Wdym?

#

Qualification?

#

Outcome?

lucid current
#

name every single hyperparameter

steady basalt
#

It doesn’t use hyperparameters because it’s not machine learning

lucid current
#

joking

steady basalt
#

It’s stats.. no tuning

delicate lintel
#

you said you did survival analysis which i'm guessing means analyzing what survives based on some parameters?

steady basalt
#

Regression etc

steady basalt
delicate lintel
#

like weight, height etc.

steady basalt
#

Oh yeah

#

Ahem: sex smoking status bmi age education drugs alzheimers were all confounders

#

This is not what the models “based on” they’re adjusted for

#

The models purely bmi and death as well as bmi change

#

For example, that’s one model

lucid current
steady basalt
#

No my bachelor was biomedical science my master is data science

lucid current
#

no no i know i mean that was the joke

delicate lintel
#

it was a joke

lucid current
#

like ur a chem major name every single molecule

steady basalt
#

I’m also on the spectrum

lucid current
delicate lintel
steady basalt
#

Using cox proportional hazards

delicate lintel
steady basalt
#

Being “overweight” actually had a protective effect

#

Compared to underweight

#

Yes as well as a few other things

#

I have this sort of experience as far as stats goes, mostly inferential

#

Not much in terms of theory that you may expect

delicate lintel
#

wait i'm curious, the different columns of the data like height, weight etc. do you not call that parameters in english?

steady basalt
#

No

delicate lintel
#

that's what i meant but how do you call them?

steady basalt
#

In ML they call that feature, in stats it’s variable

delicate lintel
#

oh ok

steady basalt
#

Anyway, they weren’t columns here in that sense

#

It’s not predictive it’s causal inference

#

Except for MICE which you can say is predictive i guess?

#

I saw you typing edd

iron basalt
# delicate lintel that's what i meant but how do you call them?

Dependent and Independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or demand that they depend, by some law or rule (e.g., by a mathematical function), on the values of other vari...

#

This might help.

#

Most generic is just "input" and "output".

delicate lintel
#

you have the data you use to predict and the data you predict

steady basalt
#

I told you I was analysing death rate not predicting

#

This is stats not ML

iron basalt
delicate lintel
#

yeah in your case i wasn't referring to that i was talking about ML

steady basalt
#

Are you Israeli

delicate lintel
#

and for other stuff

delicate lintel
runic lantern
#

hi so i finally got my neural network to work, but for some reason it is not converging

steady basalt
#

Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes, before some event occurs, to one or more covariates that may be associated with that quantity of time. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the ...

#

It isn’t the same as ml

runic lantern
steady basalt
iron basalt
#

"Parameter" is technically valid here (it's also super generic like "input" and "output"), but every field uses the word differently.

delicate lintel
delicate lintel
steady basalt
steady basalt
#

U can’t really say that logistic or linear regression is machine learning by default

#

In my opinion

delicate lintel
#

no they're both concepts taken from statistics used for ML but it's the same concept

steady basalt
#

I’m sure a lot of people do but for me it’s just a tool

#

With uses

#

Maybe it becomes Ml when you want to predict death for data where death is not available sure

#

That’s why I said for MICE it can be considered such

delicate lintel
#

what would it be useful for other than that?

#

i mean you could use it to see how weight affects survival rate but you don't really need to fit a model for the data to do that right?

#

you can just look at the data and see it

steady basalt
#

Causal inference

#

lol u can’t just look at the data and see it

steady basalt
#

And yes it gives you “probabilities” but you’re still not doing ML

delicate lintel
#

ok i understand

steady basalt
#

An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently (due to symmetry), the ratio of the odds of B in the presence of A and the odds of B in the absence of A...
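
As a small worked example of that definition, here is the odds ratio for a 2 x 2 table in plain Python. The counts are made up for illustration and do not come from any study mentioned here:

```python
# Hypothetical 2x2 table (made-up counts):
#                 event   no event
# exposed           30        70
# not exposed       10        90
a, b = 30, 70
c, d = 10, 90

odds_exposed = a / b
odds_unexposed = c / d
odds_ratio = odds_exposed / odds_unexposed   # same as (a * d) / (b * c)

print(round(odds_ratio, 3))
```

In logistic regression, exponentiating a fitted coefficient gives the odds ratio for a one-unit increase in that variable, which is why the two show up together so often.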

#

Look at that

#

This is like, the fundamentals of logistic

#

Used aloooooot in medical research

#

Like the example I used above, different BMI categories

#

But you may have noticed I used cox, that’s because it’s time dependent

timid kiln
#

Beginner here, a little frustrated with understanding pandas and the format/type of the values. Specifically with dates. I'm pulling data out of excel and out of a sql database for processing. It seems that pandas/python typically thinks my dates are strings. For example, when I run this code:

df_projects = df_projects[pd.to_datetime(df_projects['stf_date'], errors='coerce').notnull()]
print(df_projects['stf_date'].dtypes)

It tells me that the data in column stf_date is an object (which I assume means pandas/python thinks it's a string).

Can y'all help me understand why I'm getting that result, immediately after conversion?

#

The format of the date in excel, which is the source of the dataframe, is %Y-%m-%d.

serene scaffold
#

!docs pandas.to_datetime

arctic wedgeBOT
#

```py
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
```
Convert argument to datetime.

This function converts a scalar, array-like, [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") or [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame")/dict-like to a pandas datetime object.
timid kiln
#

2023-01-31 00:00:00 is the result after conversion.

serene scaffold
#

is that what you want?

timid kiln
#

I would like it to stay as %Y-%m-%d, as it was in excel. It's a project, so we're just working with a resolution of days, not minutes, so I don't need minutes.

steady basalt
#

That is an object

timid kiln
#

ikr?

#

But an object can be a string, no?

steady basalt
#

Not really it’s a date time object in memory

serene scaffold
steady basalt
#

Your computer stored it as a object

#

Because it’s datetime not str

timid kiln
steady basalt
#

It’s meant to do that

#

Use str(object) maybe ?

serene scaffold
#

so, just forget whether or not it "stays as %Y-%m-%d", because that won't matter for your calculations. what matters is that the moment in question is being represented unambiguously.

steady basalt
#

Or some function that will change it

timid kiln
#
print(df_projects['stf_date'].head())
df_projects = df_projects[pd.to_datetime(df_projects['stf_date'], errors='coerce').notnull()]
print(df_projects['stf_date'].dtypes)
print(df_projects['stf_date'].head())

gives me:

0               Complete
1               Complete
2    2023-01-31 00:00:00
3    2023-01-31 00:00:00
4    2023-01-31 00:00:00
Name: stf_date, dtype: object
object
2    2023-01-31 00:00:00
3    2023-01-31 00:00:00
4    2023-01-31 00:00:00
5    2023-01-31 00:00:00
6    2023-01-31 00:00:00
Name: stf_date, dtype: object
steady basalt
#

What’s the problem with it

serene scaffold
#

also, why do you have Complete in there?

timid kiln
steady basalt
#

Oooof

serene scaffold
#

it might be that the timestamps are actually datetimes, but that your column is heterogenous (has more than one type). your columns need to be homogenous (every value is the same type)

steady basalt
#

Mixing data types

timid kiln
#

So the purpose of that line of code was to try to convert everything to a datetime, and the strings get converted to NaN

serene scaffold
timid kiln
#

And then removed.

steady basalt
#

You probably don’t want to mix dates and complete in one column

serene scaffold
#

You probably don’t want to mix dates and complete in one column

steady basalt
#

Xd

timid kiln
serene scaffold
#

you can have is_complete as a separate column of bools.
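
A minimal pandas sketch of that split, assuming a toy frame with made-up values (the real table has more columns, and its dates may already be datetime objects rather than strings):

```python
import pandas as pd

# Toy stand-in for the projects table; values are made up.
df = pd.DataFrame(
    {"stf_date": ["Complete", "Complete", "2023-01-31", "2023-02-28"]}
)

# Bool flag column first, then a pure datetime column.
# Rows without a real date become NaT instead of a placeholder string.
df["is_complete"] = df["stf_date"].eq("Complete")
df["stf_date"] = pd.to_datetime(
    df["stf_date"], format="%Y-%m-%d", errors="coerce"
)

print(df.dtypes)
```

With this layout there is no need for a bogus date on completed projects: the missing value carries that meaning and the column stays homogeneous.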

steady basalt
#

Good now ur problems fixed

serene scaffold
timid kiln
#

OK, so let me take a step back here and talk about the table that's feeding this thing as I could use some advice on how better to build it.

#

In fact, lemme go stare at it a bit and see if I can reconfigure the table that's feeding the dataframe. It could definitely be improved, especially considering the issue I'm having at the moment. 🙂

#

Thanks for your help so far folks. Much appreciated.

steady basalt
#

What is the issue?

#

You fixed the issue by removing complete

timid kiln
# steady basalt You fixed the issue by removing complete

Well, yes, but I mean, I shouldn't have the word 'Complete' in there. it's not misleading when you're looking at Excel, but perhaps there's a better way so that when I feed this into pandas it makes more sense to pandas.

However, once all the rows with complete are gone, why does pandas still think it's an object?

#

And can I convert it to datetime?
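
The posted line uses the result of `pd.to_datetime` only as a row filter and never stores the converted values, so the column keeps its original contents and dtype. A small sketch of the difference, assuming toy string data (the names `filtered` and `fixed` are made up):

```python
import pandas as pd

df = pd.DataFrame({"stf_date": ["Complete", "2023-01-31", "2023-02-28"]})

converted = pd.to_datetime(df["stf_date"], format="%Y-%m-%d", errors="coerce")

# Filtering only: rows are dropped, but the column still holds the originals.
filtered = df[converted.notnull()]
print(filtered["stf_date"].dtype)

# Filter and also store the converted values.
fixed = df[converted.notnull()].copy()
fixed["stf_date"] = converted[converted.notnull()]
print(fixed["stf_date"].dtype)

# Display-only formatting, if the day-resolution look is wanted.
print(fixed["stf_date"].dt.strftime("%Y-%m-%d").tolist())
```

The `00:00:00` part is only how a midnight timestamp is printed; formatting with `strftime` at display time keeps the stored values usable for date arithmetic.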

steady basalt
#

Date time is a object

timid kiln
#

O.o

serene scaffold
timid kiln
#

Thus, my status as beginner.

steady basalt
#

It’s automatically changed it?

steady basalt
#

Anyway using date times in my opinion sucks in pandas I use float from time delta

serene scaffold
steady basalt
#

So you’re saying pandas can’t do the math without converting it to datetime ?

serene scaffold
steady basalt
#

Ok from excel dates maybe import them first as a python list?

timid kiln
#

My code does need to see it as a date. I guess it would be nice if I didn't have to convert it to a datetime when I use it from the dataframe.

serene scaffold
#

the datetime in the python stdlib, however, is an object. because everything that's natively in python is an object.

steady basalt
#

@timid kiln what happens if you take two values and subtract them?

#

In ur python

#

Just get two random values from ur column

#

You can use iloc

#

If it gives you a timedelta then that’s a w

timid kiln
#

Basically the table coming in is a list of projects and the stf_date is the date when the project should kick off. If the project was completed, that value is Complete even though I have another column called Status that has Complete in it. it's just that, as I pulled this data together, I didn't have dates for projects previous to 2021, and I do need to have the data in the table. So I'm thinking partly to put in a fake date so that I can have only dates in that column. If the project is complete, it doesn't really matter what's in the stf_date column. I just don't like it having a bogus date in there. I'm a bit OCD I guess.

steady basalt
#

What are u gona use pandas for

timid kiln
#

pandas is helping me build another table based on the information in the projects table. A status table, if you will, for forecasting the future.

#

Based on that stf_date and the start date of the project, certain values will be calculated and placed in a column that pertains to each month in the future.

steady basalt
#

What values

timid kiln
#

For now, how can I get pandas to recognize it pulled a date in from excel?

steady basalt
#

Have u tried putting the date from excel into a python list first

timid kiln
#

I'll answer all the questions about what I'm doing after I figure this one thing out. Dates in python have been kicking my butt for the past week, ever since I found out about datetime.datetime and datetime.date and timestamp and how they all hate each other apparently. 😄

steady basalt
#

Put it in native python first and see

#

When you make a column out of it might convert

timid kiln
steady basalt
#

A list can hold objects strings integers whatever

#

Object ception

timid kiln
#

Yes, that's right. I never thought about testing it like that tho.

steady basalt
#

A list just lets you access it in memory daily

timid kiln
#

So how does pandas recognize something is a date in a column? When I pulled in data from a sql server, it had timestamp in the dataframe surrounding those values. So it seems that there needs to be some text of some kind in the dataframe for pandas to go "oh, this is a datetime.date or something"

steady basalt
#

Why don’t you test what steraclus said

#

Make a new column which is defined as date column minus a value

#

Where that value is another date maybe

#

And see if it works

#

It shud give you another date object that’s a time delta in days
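
A quick version of that test, assuming the column has already been converted with `pd.to_datetime` (the dates and the reference date are made up):

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2023-01-31", "2023-03-02"]))

# Two single values: the difference is a pandas Timedelta.
delta = dates.iloc[1] - dates.iloc[0]
print(type(delta).__name__, delta.days)

# Whole column against a reference date, as plain day counts.
reference = pd.Timestamp("2023-01-01")
days_since = (dates - reference).dt.days
print(days_since.tolist())
```

If the subtraction raises a `TypeError` instead, the values are still strings and need converting first.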

timid kiln
#

OK, I'll give that a try. Thank you for the suggestions!

steady basalt
#

If that works there’s no issues for having that object type

#

If it doesn’t work you’ll have to convert it

#

Because it’s looking at native items inside the series though I’d expect it to work

#

I’m about to have to work with date time dataframes myself

#

Need to define if someone died within a year of an event

boreal loom
#

I have a dataframe, that contains columns with some features, in one column I have seen that when a feature exists , then in another column I have empty values, is there a way to express that programmatically or check if it is the case for other features as well ?

steady basalt
#

You’d have to draw a diagram or something but anything’s possible man

#

Ok I get it, yes that’s easily possible

boreal loom
#

Example : column_name = "is_brown", with values [1,0,0,0,0,1] then I have another name "is_black" with values [0,1,1,1,1,0] , here the case is obvious but I used correlation and cramers and got 0.48, while it should be something better
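
For the two columns in this example the Pearson correlation is exactly -1 rather than +1, because one column is the complement of the other. A small NumPy sketch, which also checks the rule directly instead of going through a correlation score (the variable names are made up):

```python
import numpy as np

is_brown = np.array([1, 0, 0, 0, 0, 1])
is_black = np.array([0, 1, 1, 1, 1, 0])

# Pearson correlation of two perfectly complementary indicators.
r = np.corrcoef(is_brown, is_black)[0, 1]
print(r)

# Direct check of the rule "the two columns are never both 1".
never_both = not np.any((is_brown == 1) & (is_black == 1))
# Stronger check: exactly one of the two is set in every row.
exactly_one = np.all(is_brown + is_black == 1)
print(never_both, exactly_one)
```

With several mutually exclusive colour columns, each pair is still negatively correlated but no longer at -1, so a middling score does not rule out the pattern. The boolean check states the rule exactly and can be looped over every pair of columns.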

steady basalt
#

It should be 1.0 u mean?

#

Correlation isn’t gona work with categories like that by default

boreal loom
#

In that case it should be 1, but in my case it is more complicated, instead of is_black, I have various colours like red,green etc etc, so by taking the is_brown I should produce something better

steady basalt
#

It would treat it as a number but I’d have thought it wud be higher than 0.48?

boreal loom
#

Yeah Id assume so as well

#

Hmmm, can I use correlation only when the value exists?

steady basalt
#

It didn’t error when tried missing value?

boreal loom
#

So for example instead of saying correlate the whole column, check for correlation only when value is 1?

#

I covered it with 0

steady basalt
#

Oh that’s right I think it also skips missing

#

If u covered it with zero it’s going to break ur pattern try not doing that

#

It might skip empty rows anyway

boreal loom
#

Yeah makes sense

#

I think it skips, cause the results were the same

steady basalt
#

U may end up with 0.50

#

Ok I think I know why

#

Basically your correlation method is treating them as scalars

#

And sometimes you have 0>1 and other times 1>0 ?

#

So it should land on 0.5 ?

boreal loom
#

yeah

#

I think thats why this happens

steady basalt
#

So if u skip the empty it shud be 0.5 not 0.48

#

Exactly 0.5

#

Unless you have more 1s than 0s in the first column by a little

gleaming osprey
#

do you add batch normalization after every layer?

balmy oak
#

Just doing basics. Anyone see why I am getting the exact same values from LinearRegression and Ridge?

#

have been playing around with alpha just to get a noticeable difference, but they're always same

wooden sail
#

that'll depend on the singular values of the gramian of the model matrix

#

you can think of alpha as a factor being added to the singular values of the matrix M^T M, where M is the model matrix. in your case that you have a linear model, that'd be an N x 2 matrix M, where N is the number of examples you have

#

you'd need to compute the singular values of this M^T M and see how large they are. alpha needs to be comparatively large to have any effect, as alpha -> 0 means the regularization disappears

balmy oak
#

Perfect, that makes sense. Thanks!

wooden sail
#

in your case, the matrix would be given by M = [x 1], where x is the vector of x values, and 1 is a vector of 1s of the same size as x. then you can compute M.T.dot(M) and ask numpy to compute the SVD for you

#

take a look at the singular values and see how large they are

balmy oak
#

Yeah, look large

wooden sail
#

hoo boy, that'd be why, then

balmy oak
#

Okay, really large

#

lol, e10

wooden sail
#

try something in the scale of e3 or e4, so that you modify at least the lower singular value. if you go up to e10, then you'll definitely see differences

#

also notice that these vandermonde matrices often have this nasty condition number (the ratio of the largest to smallest singular values)

#

tbh idk if scikit learn internally works with the vanilla least squares expression or if it uses 1/N in front (same minimizer, smaller minimum). at any rate, using alpha in that range e4 to e10 will certainly net you some effect 😛
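
#

a rough numpy-only sketch of the check described above, on made-up data (the x range, noise level, and alpha values are assumptions, not the numbers from this thread). it builds M = [x 1], prints the singular values of M^T M, and compares the closed-form ridge solution against plain least squares for a few alphas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: large x values make the entries of M^T M huge.
x = np.linspace(0.0, 1000.0, 200)
y = 3.0 * x + 50.0 + rng.normal(scale=20.0, size=x.size)

# Model matrix M = [x 1] for a straight-line fit.
M = np.column_stack([x, np.ones_like(x)])
gram = M.T @ M

# Singular values of the Gramian, largest first.
singular_values = np.linalg.svd(gram, compute_uv=False)
print("singular values of M^T M:", singular_values)
print("condition number:", singular_values[0] / singular_values[-1])


def ridge_coefficients(alpha):
    # Closed form (M^T M + alpha I)^{-1} M^T y. This penalizes the intercept
    # too, which scikit-learn's Ridge does not do by default.
    return np.linalg.solve(gram + alpha * np.eye(2), M.T @ y)


ols = ridge_coefficients(0.0)
for alpha in (1.0, 1e4, 1e8):
    shift = np.abs(ridge_coefficients(alpha) - ols).max()
    print(f"alpha={alpha:g}: largest coefficient change {shift:.3g}")
```

alpha only starts to matter once it is comparable to one of the printed singular values, which is why small alphas look identical to LinearRegression.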

balmy oak
#

Yeah, e8 did a little, but got something more noticeable at e10

wooden sail
#

that data doesn't look so linear lol

balmy oak
#

Probably shouldn't be using linear stuff anyway, but

#

Lol, just practicing

#

Next up, log

wooden sail
#

😌

balmy oak
#

Thanks for the help @wooden sail

wooden sail
#

all good

mint palm
#

good place to learn implementation of knowledge distillation??

#

"for dummies"

ebon hazel
#

I am not an art person at all so creation of websites was harder than usual and I just prefer arbitrary numbers

steady basalt
#

typing html makes me sick

ebon hazel
#

It's honestly not bad for me

#

Like it's fine. It's just, not something I ended up enjoying

#

HTML and CSS were easy to type and code up but then when I came to actually making it look nice? That was what sucked ass

#

Oh and Atom is a fucking legend for the Beautify. What an amazing way to keep my html from looking like shit

steady basalt
#

My wrist physically hurts with all the <>

#

/.

brave sand
#

has anyone used RLLIB with pettingzoo?

rough mountain
#

I can't figure out why my GAN always outputs noise

grand canyon
#

hey everyone

#

i really need some help with pytorch

#

im confused about

#

the following error:

#

All my input files (in the dset folder) are 50 x 50, and my neural network has an input of 2500, so im confused as to what's going wrong

worldly dawn
#

Hi and welcome!
It's not a channel for memes and shitposting though

wooden sail
#

that's your problem, then

grand canyon
#

alright so

#

i would have to convert the images

#

to greyscale?

wooden sail
#

you can convert to greyscale or apply a transformation that affects all 3 color channels

#

a simple way of doing the latter is to fully flatten the input to size 7500

grand canyon
#

just change the input size to 7500?

wooden sail
#

make into a "vector"

#

at the moment it's a matrix

grand canyon
#

ok

#

what documentation

#

can i refer to in order to convert

#

the matrix

#

into a vector in pytorch

wooden sail
#

i wouldn't know where to look, i've never used pytorch 🙂 go to pytorch's website and look there
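
#

the reshaping idea itself, sketched with numpy so it runs without pytorch (the batch size of 4 and the channels-last layout are assumptions). on a pytorch tensor, `torch.flatten(x, start_dim=1)` or `x.reshape(x.shape[0], -1)` does the same job.

```python
import numpy as np

# Hypothetical batch of 4 RGB images, 50 x 50 pixels each.
batch = np.zeros((4, 50, 50, 3), dtype=np.float32)

# Keep the batch axis and collapse the rest: 50 * 50 * 3 = 7500 features.
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (4, 7500)

# Alternative: average the color channels to get greyscale, then flatten.
grey = batch.mean(axis=-1).reshape(batch.shape[0], -1)
print(grey.shape)  # (4, 2500)
```

with the flattened version the first layer needs 7500 inputs; with the greyscale version it can stay at 2500.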

grand canyon
#

alright

jaunty jackal
#

Can I get some quick help please with DateTimeIndex and pandas?

I can say the set is consistent, say I have some dataframe df I can slice like:
df1 = df["2020-01-01":"2020-10-01"]
and get fine results
But if I try to jump over years, say df4["2020-05-01":"2021-01-01"] it returns a failure:

```
<ipython-input-58-cf7eb8035389> in <module>
----> 1 df4["2020-11-01":"2021-01-01"]

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3005             # either we have a slice or we have a string that can be converted
   3006             #  to a slice for partial-string date indexing
-> 3007             return self._slice(indexer, axis=0)
   3008 
   3009         # Do we have a (boolean) DataFrame?

~\anaconda3\lib\site-packages\pandas\core\generic.py in _slice(self, slobj, axis)
   3807         Slicing with this method is *always* positional.
   3808         """
-> 3809         assert isinstance(slobj, slice), type(slobj)
   3810         axis = self._get_block_manager_axis(axis)
   3811         result = self._constructor(self._mgr.get_slice(slobj, axis=axis))

AssertionError: <class 'numpy.ndarray'>```

df["2021-01-01":"2021-12-01"] slices fine, and the whole frame df has no issues. I get the same error when I slice over 2021/2022 also (e.g. df["2021-11-01":"2022-01-01"])

Not sure what's happening here. I am taking advantage of the DateTimeIndex feature in pandas when it comes to using strings to filter/select/slice so maybe I could go barebones and just reference everything without using strings, but I'm curious as to how I've gotten here.

For what it's worth, the data is consistent across the year jump (that is, a lineplot vs time for the set displays what it's supposed to display)
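
#

One thing worth checking, as an assumption rather than a diagnosis: string slicing on a `DatetimeIndex` is most reliable when the index is sorted, and rows stored out of order are a common reason a slice across some boundary misbehaves. A sketch with made-up monthly data (the `value` column and the way the rows are shuffled are hypothetical):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", "2022-03-01", freq="MS")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Simulate rows stored out of order (e.g. data appended year by year).
unsorted = pd.concat([df.iloc[12:], df.iloc[:12]])
print(unsorted.index.is_monotonic_increasing)  # False

# Sort first, then slice by label with .loc.
ordered = unsorted.sort_index()
window = ordered.loc["2020-11-01":"2021-01-01"]
print(window)
```

If `is_monotonic_increasing` is already True on the real frame, the cause is something else and this does not apply.
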
lapis sequoia
#

Bro help

#

I used wav2lip on conda at windows 11

#

And it took 1 hour for 10 secs video

#

How to make it faster

#

Any idea?

#

I have RX560

#

Laptop

candid garnet
#

Hi!
I am working with results from an FMR machine.

I am trying to remove background noise. I have a reading/contour plot for background noise. I have another for background noise along with the magnetic sample. I'm trying to isolate just the signal from the magnetic sample.
On X axis I have current, on Y axis I have Frequency. The contour is the amplitude (which is the signal i'm trying to separate)

#

background only (horizontal lines) and sample+background (image with curve included)

#

when i simply did amplitude2-amplitude1, i get the following:

#

horizontal lines still present

#

so it's basically background noise removal from an image/signal, where i have the background already recorded

slate smelt
#

can someone help me im a newbie in ML and i have an important project to finish this week?

wooden sail
#

.latex \argmin_s $\Vert I_signal - s I_background \Vert_F^2$

strange elbowBOT
wooden sail
#

sigh

#

.latex $\argmin_s \Vert I_signal - s I_background \Vert_F^2$

strange elbowBOT
wooden sail
#

min w.r.t. s || image_with_signal - s*image_with_only_background ||_F^2
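
#

a sketch of that fit on synthetic data (the stripe pattern, blob, and true scale of 1.3 are made up). setting the derivative with respect to s to zero gives the closed form s = <background, image_with_signal> / <background, background>, which is what the helper below computes; the function name is hypothetical.

```python
import numpy as np


def fit_background_scale(image_with_signal, background):
    """Least-squares s minimizing ||image_with_signal - s * background||_F^2."""
    return np.vdot(background, image_with_signal) / np.vdot(background, background)


# Hypothetical data: horizontal stripes as background, a small blob as the sample.
rng = np.random.default_rng(0)
rows, cols = 64, 80
background = np.tile(rng.uniform(1.0, 2.0, size=(rows, 1)), (1, cols))
sample = np.zeros((rows, cols))
sample[30:34, 20:60] = 0.5
measured = 1.3 * background + sample

s = fit_background_scale(measured, background)
cleaned = measured - s * background
print("estimated scale:", s)
```

the estimate comes out slightly above the true scale because the sample itself overlaps the background; fitting s only on a region known to contain no sample would avoid that.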

nova matrix
#

is there any way to prevent label encoder from encoding NaN values

serene scaffold
#

You need to figure out why it's doing it. And then either fix your data so that that doesn't happen, or decide what you want to happen if a NaN does get encoded
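
#

If the decision is "keep NaN as NaN", one way to get there is sketched below. Note that it swaps in `pandas.factorize` instead of scikit-learn's `LabelEncoder`, and the column values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing entry.
col = pd.Series(["red", np.nan, "blue", "red"])

# factorize marks missing values with the sentinel -1 instead of a class id.
codes, classes = pd.factorize(col, sort=True)

# Turn the sentinel back into NaN so missing stays missing.
encoded = pd.Series(codes, index=col.index).where(col.notna())
print(list(classes))     # ['blue', 'red']
print(encoded.tolist())  # [1.0, nan, 0.0, 1.0]
```

The encoded column becomes float because NaN has no integer representation; whether that is acceptable depends on what consumes it next.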

limber token
#

If I have two dfs with the same exact columns but different data, what is the best way to join them?

#

I found .concat(), .merge() and .join() but not sure which is the best for this

wooden sail
#

concat and append seem to do what you want

limber token
#

Worth noting: I noticed I was somehow losing some rows with .concat()

mild dirge
#

A very raw way to do it would be maybe taking the mode (or median) color of each row in the image, and subtracting that from the image

#

Since the background seems to just be horizontal lines of mostly the same color
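
#

A quick sketch of the row-median idea on a synthetic amplitude map (stripe values and the feature are made up; the helper name is hypothetical). It only works while the real signal covers less than half of each row, so the median still reflects the background.

```python
import numpy as np


def subtract_row_median(image):
    """Subtract each row's median so constant horizontal stripes cancel."""
    return image - np.median(image, axis=1, keepdims=True)


# Hypothetical amplitude map: horizontal stripes plus a narrow feature.
rng = np.random.default_rng(1)
stripes = np.tile(rng.normal(size=(40, 1)), (1, 100))
feature = np.zeros((40, 100))
feature[15:20, 45:55] = 3.0

cleaned = subtract_row_median(stripes + feature)
print(np.abs(cleaned - feature).max())
```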

untold bloom
#

it won't remove rows, so you need to prove that :p

#

merge & join perform more complicated concatenations; append is deprecated and will be gone
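
#

a small check along those lines, with made-up frames: `pd.concat` stacks rows, and the row count can be verified directly. one guess at why rows might seem lost is duplicate index labels, which `ignore_index=True` avoids.

```python
import pandas as pd

# Hypothetical frames with identical columns and the default 0..n-1 index.
df_a = pd.DataFrame({"id": [1, 2, 3], "score": [0.1, 0.2, 0.3]})
df_b = pd.DataFrame({"id": [4, 5], "score": [0.4, 0.5]})

# Stack the rows; ignore_index=True renumbers them so labels stay unique.
combined = pd.concat([df_a, df_b], ignore_index=True)
print(len(combined) == len(df_a) + len(df_b))  # True

# Without ignore_index the labels 0 and 1 appear twice, so label-based
# lookups or later index-aligned operations can look like rows went missing.
stacked = pd.concat([df_a, df_b])
print(stacked.index.has_duplicates)  # True
```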

hidden rapids
#

Hey guys, new to image recog and cnns, have a problem.
when i use model.summary (model is a Sequential object from keras), the output shape is being shown as "multiple" instead of the usual tuples, please help

#

my code

serene scaffold
# hidden rapids my code

please always give text as actual text. Code, error messages, etc. Only GUI stuff should be given as screenshots.

mild dirge
#

Don't know what kind of datatype a multiple is

#

If it is a datatype at all 😛

hidden rapids
#

```
model.add(keras.layers.Conv2D(64, 3, activation='relu', padding='same'))
model.add(keras.layers.MaxPooling2D(2))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.BatchNormalization())

model.add(keras.layers.Conv2D(128, 3, activation='relu', padding='same'))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.BatchNormalization())

model.add(keras.layers.Flatten())
model.add(keras.layers.Dropout(0.2))

model.add(keras.layers.Dense(32, activation='relu'))
model.add(keras.layers.Dropout(0.3))
model.add(keras.layers.BatchNormalization())

model.add(keras.layers.Dense(class_num, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy', 'val_accuracy'])
model.build((32, 32, 3))
model.summary()
```