#data-science-and-ml | Python | Page 64

wooden sail May 24, 2023, 10:23 PM

#

so what you see there as the cost of np where is the difference between np where and str contains

#

np where was almost instant, str contains was the bottleneck

#

do you mean to use regex here? if not, try passing regex=False to str contains

#

otherwise the comparison isn't very good

tidal bough May 24, 2023, 10:25 PM

#

by using some dtype stuff and then numba, I managed to make it 2 and 8 times slower respectively 🥴

errant lake May 24, 2023, 10:27 PM

#

wooden sail do you mean to use regex here? if not, try passing regex=False to str contains

Not yet, but I should also test regex maybe.
np.where() with in is now way ahead

wooden sail May 24, 2023, 10:27 PM

#

can you show?

#

but anyway yeah, str contains is a super general function with regex, nan replacement, etc by default, so the comparison will be bad unless you turn all of that off

#

you're seeing the numpy and pandas tax: a fixed overhead to prepare for large data with weird properties

errant lake May 24, 2023, 10:29 PM

#

#

^ that's np.where() with in , str.contains() doesn't have regex=False yet

#

clear winner lol

#

str.contains() takes the same time with regex=False

tidal bough May 24, 2023, 10:30 PM

#

str.contains does have regex=False

wooden sail May 24, 2023, 10:31 PM

#

can you show the code?

tidal bough May 24, 2023, 10:31 PM

#

I don't believe this np.where result is right.

errant lake May 24, 2023, 10:31 PM

#

Ah, let me inspect the df. I used:

np.where('char' in df['column'], 'thing', 'some')

tidal bough May 24, 2023, 10:32 PM

#

'char' in df['column'] is just a single bool, not one per row: `in` on a Series checks the index labels, not the values

wooden sail May 24, 2023, 10:32 PM

#

yeah i was afraid you'd do something like that

#

i can't think of a sensible way of using np where for this case

#

short of ==, you kinda have to use str contains or one of the other methods you listed to generate the bools
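
A minimal sketch of the point above, assuming a toy DataFrame (the values are made up; the `'thing'`/`'some'` labels mirror the snippet under discussion and nothing here is the original benchmark). `in` applied to a Series returns one bool, so `np.where` needs a per-row mask, for example from `str.contains(..., regex=False)` or a list comprehension:

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only.
df = pd.DataFrame({"column": ["a char here", "nothing", "char", "other"]})

# `in` on a Series checks the index labels (0..3 here), not the values,
# and yields a single Python bool rather than one bool per row.
scalar_check = "char" in df["column"]

# Per-row masks: plain substring match without regex, and a list comprehension.
mask_contains = df["column"].str.contains("char", regex=False).to_numpy()
mask_listcomp = np.array(["char" in s for s in df["column"]])

# np.where now picks a label per row.
labels = np.where(mask_contains, "thing", "some")
print(scalar_check)
print(labels)
```

Both masks agree row by row; which one is faster depends on the data and the pandas version, so it is worth timing them on the real column.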

errant lake May 24, 2023, 10:33 PM

#

Ah yep, thanks for that 🤦‍♂️

wooden sail May 24, 2023, 10:37 PM

#

if we do this, for example

```py
    # np.where
    start_time = time.time()
    np.where(df['column'] == 'char', 'thing', 'some')
    results.loc['np.where'][f'Test {i+1}'] = time.time() - start_time
```
we get this plot

#

which would kinda be the ideal case

errant lake May 24, 2023, 10:40 PM

#

Nice thanks. I'll try to refine that to better match some use-cases

wooden sail May 24, 2023, 10:41 PM

#

this also makes sense

```py
    # np.where
    start_time = time.time()
    np.where(df['column'].apply(lambda x: 'char' in x), 'thing', 'some')
    results.loc['np.where'][f'Test {i+1}'] = time.time() - start_time
```
and yields the plot:

tidal bough May 24, 2023, 10:43 PM

#

I think I might have finally got an overengineered solution that's a bit faster

#

%%cython
import pandas as pd, numpy as np
cdef int cont(str a, str b):
    return a in b
def str_contains_cython(arr: np.ndarray, sub: str):
    n = len(arr)
    result = np.zeros(n,dtype=np.bool_)
    for i in range(n):
        result[i] = cont(sub, arr[i])
    return result

#

200ms on my set, ~10% faster than the listcomp

merry wadi May 24, 2023, 10:43 PM

#

Is anyone familiar with building input output hidden Markovs?

tidal bough May 24, 2023, 10:46 PM

#

a-ha, I got <100ms by cythoning it entirely:

cpdef void str_contains_cython(np.ndarray arr, str sub, np.ndarray res):
    n = len(arr)
    for i in range(n):
        res[i] = cont(sub, arr[i])

%timeit str_contains_cython(df["column"].values,"char", np.zeros(len(df["column"]),dtype=bool)) # 82ms

wooden sail May 24, 2023, 10:46 PM

#

ah yes, python't

tidal bough May 24, 2023, 10:47 PM

#

my least favorite part of cython is the C type syntax

#

why, oh god why

#

last time I tried using cython I realised it would be faster and less painful to rewrite it all in rust

wooden sail May 24, 2023, 10:48 PM

#

that's probably not true either

tidal bough May 24, 2023, 10:48 PM

#

~~that's what I ended up doing though~~ :p

wooden sail May 24, 2023, 10:48 PM

#

~~sunken cost fallacy~~

errant lake May 24, 2023, 10:52 PM

#

I'm using a regular jupyter notebook, is there anything I need to install to run your function definitions in cython? I'm not familiar with it sorry

#

nvm ill read the docs x)

tidal bough May 24, 2023, 10:53 PM

#

You'd need to install cython, then do %load_ext cython, and then a cell with %%cython should compile the contents as cython.

wooden sail May 24, 2023, 10:54 PM

#

then you'll have to rewrite the python as cython 😛

errant lake May 24, 2023, 10:58 PM

#

ah yeah... not convenient

tidal bough May 24, 2023, 10:58 PM

#

wooden sail then you'll have to rewrite the python as cython 😛

not sure what you mean, that function is meant to be self-contained

#

like, cpdef means it's callable from python.

wooden sail May 24, 2023, 10:59 PM

#

yeah but does that look like python? that was my point

tidal bough May 24, 2023, 10:59 PM

#

ah, that's true. it's only useful in situations where you are okay with writing a weird function to speed up a process 2x

#

occasionally it's a lot more useful when you're doing something really weird and the weird function speeds it up dozens of times (e.g. calculating some complicated cumulative function over the df, so you need to iterate)

errant lake May 24, 2023, 11:11 PM

#

Thanks, I'll try to use that. I've seen worse boilerplate for the time gain.

brave sand May 24, 2023, 11:38 PM

#

has anyone used a jetson nano for object detection?

somber pollen May 25, 2023, 12:15 AM

#

wooden sail you're seeing the numpy and pandas tax: a fixed overhead to prepare for large da...

I'm stealing this phrase hahaha

#

also if you don't want to do Cython but are dealing with relatively common types, you can use Numba

#

YMMV tho

hasty mountain May 25, 2023, 1:21 AM

#

How common is it to try to implement an algorithm from a paper and it doesn't work?
I feel like every image generation algorithm I try to implement solely based on the original paper ends up failing...then I have to make adaptations until it works grumpchib

#

Except for DCGAN. DCGAN is cool joe_salute

#

But those papers that use MSE Loss for Variational AutoEncoders...I really don't get their trick. My VAEs only work with Gaussian Log Likelihood

serene scaffold May 25, 2023, 1:50 AM

#

hasty mountain How common it is to try to implement an algorithm from a paper and it doesn't wo...

If you don't have access to the same data that they used, you can't say for sure that you've replicated their experiment and demonstrated that it isn't reproducible.

#

But it's probably a more common problem than we'd like to think

#

I've only implemented the algorithm from a paper once, and it performed well on my data, which was different.

hasty mountain May 25, 2023, 1:53 AM

#

Aw, that's sad...
I wouldn't want to reproduce the algorithm using the exact same data, though(most VAEs, GANs and Diffusion Models papers use CelebA dataset, which I find meh).

#

But I suppose that changing the data from CelebA to a CIFAR100 might be catastrophic...which is also sad, since a good model should be able to overcome such things...I guess... pithink

iron basalt May 25, 2023, 2:06 AM

#

hasty mountain How common it is to try to implement an algorithm from a paper and it doesn't wo...

Common. 😦

#

(And huge waste of my time)

hasty mountain May 25, 2023, 2:08 AM

#

Yeah, I've wasted quite some time in GANs because of that...
Strange... I thought math was supposed to be exact sciences yert

iron basalt May 25, 2023, 2:16 AM

#

hasty mountain Yeah, I've wasted quite some time in GANs because of that... Strange... I though...

Math is not science, nor is science engineering.

serene scaffold May 25, 2023, 2:17 AM

#

and python is not perl.

hasty mountain May 25, 2023, 2:21 AM

#

grumpchib

iron basalt May 25, 2023, 2:22 AM

#

Math - You know what you are doing because you made up the rules.
Engineering - You know what you are doing and are trying to figure out the best way to do it.
Science - You don't know what is going on and are trying to figure it out.

#

ML tends to bounce around all three.

hasty mountain May 25, 2023, 2:23 AM

#

Now it makes sense

serene scaffold May 25, 2023, 2:23 AM

#

Which one is "You don't know what is going on, but you're happy to be here"?

iron basalt May 25, 2023, 2:23 AM

#

serene scaffold Which one is "You don't know what is going on, but you're happy to be here"?

Art?

agile cobalt May 25, 2023, 2:23 AM

#

iron basalt ``` Math - You know what you are doing because you made up the rules. Engineerin...

Engineering - You think you know what you are doing

serene scaffold May 25, 2023, 2:24 AM

#

iron basalt Art?

I thought some of the most profound and impactful art is made by the depressed and mentally ill

hasty mountain May 25, 2023, 2:24 AM

#

serene scaffold I thought some of the most profound and impactful art is made by the depressed a...

Like my GANs

iron basalt May 25, 2023, 2:25 AM

#

serene scaffold I thought some of the most profound and impactful art is made by the depressed a...

Art - You don't know what is going on, and are **trying** to be happy to be here.

serene scaffold May 25, 2023, 2:25 AM

#

!otn a squiggle the philosopher

arctic wedgeBOT May 25, 2023, 2:25 AM

#

:ok_hand: Added squiggle-the-philosopher to the names list.

potent sky May 25, 2023, 2:52 AM

#

hasty mountain How common it is to try to implement an algorithm from a paper and it doesn't wo...

Just based on the equations? Kinda common for generative algos ig, if you don't use their data then often it doesn't work out. Dcgan, cycle-gan, ragan, wgan have worked and a few others. I'm currently working on implementing LDMs, let's see how that turns out 🥲

potent sky May 25, 2023, 2:58 AM

#

serene scaffold I thought some of the most profound and impactful art is made by the depressed a...

And they say engineers aren't artists and software isn't our canvas...smh

sterile heath May 25, 2023, 5:04 AM

#

potent sky And they say engineers aren't artists and software isn't our canvas...smh

Software not canvas? Oh, my. Oh my oh my.

past meteor May 25, 2023, 5:25 AM

#

hasty mountain How common it is to try to implement an algorithm from a paper and it doesn't wo...

I'm always skeptical unless said method has a lot of adoption. There's more incentive to publish than there is to publish stuff that actually does really well / generalises broadly to many problems

#

If for my research I'd implement everything that looks remotely good I'd be busy forever. Half of the implementations are in Matlab 💀💀💀, way more than half doesn't share code/data and all of them seem to crush the existing benchmarks

tulip wyvern May 25, 2023, 5:32 AM

#

https://colab.research.google.com/drive/1Z8hYgbYTKRqqI3Y2V0BznfyrEh_k2MFB#scrollTo=JbqdrXQMGJtY


#

Can somebody explain why my test accuracy is so bad (8.7%) and my loss isn't decreasing at all

#

I get that I should be using a CNN (idk how to implement CNNs though) but this test accuracy is just crazily bad

agile cobalt May 25, 2023, 5:39 AM

#

tulip wyvern https://colab.research.google.com/drive/1Z8hYgbYTKRqqI3Y2V0BznfyrEh_k2MFB#scroll...

that link is not public

tulip wyvern May 25, 2023, 5:39 AM

#

one sec let me make it public

#

okay its public now my bad

agile cobalt May 25, 2023, 5:42 AM

#

0.5 in everything?

tulip wyvern May 25, 2023, 5:42 AM

#

yea

#

was that a bad play

agile cobalt May 25, 2023, 5:43 AM

#

0.5 dropout as well?

#

also not 100% sure if it is fine or not to reuse the same layer, it would probably give a warning if it was a problem though

tulip wyvern May 25, 2023, 5:43 AM

#

yea i was just doing the medium

#

i dont think the dropout helps me anyways because im far from overfitting 💀

agile cobalt May 25, 2023, 5:44 AM

#

you are dropping way too much

tulip wyvern May 25, 2023, 5:44 AM

#

okay i think ill lower it to 0.3 then

agile cobalt May 25, 2023, 5:47 AM

#

set it to like 0.1

#

and do look into Conv2d layers

tulip wyvern May 25, 2023, 5:47 AM

#

yea im doing some research on them

agile cobalt May 25, 2023, 5:48 AM

#

if you feel like it is overfitting, you can increase it back later

tulip wyvern May 25, 2023, 5:48 AM

#

do you think my low test accuracy is just because im using an rnn (i think thats what its called??)

agile cobalt May 25, 2023, 5:48 AM

#

not sure if it has a name besides just being a neural network, but a CNN would likely work much better

#

maybe also try using Adam instead of just SGD

tulip wyvern May 25, 2023, 5:49 AM

#

i heard about adam before

#

all i know is that it's the combination of momentum and rmsprop

#

but ill read more into it on the math and its function
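
A rough sketch of what "momentum plus RMSprop" means in the Adam update, written in plain Python on a scalar parameter. The toy objective `f(x) = x**2`, the starting point, and the step count are assumptions for illustration; the hyperparameter defaults follow the commonly quoted Adam values except for the larger learning rate:

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Run plain Adam on a scalar parameter and return the final value."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # momentum part: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # RMSprop part: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = x**2 with gradient 2x, chosen only for illustration.
x_final = adam_minimize(lambda x: 2 * x, x0=5.0)
print(x_final)
```

In PyTorch the same idea is available as `torch.optim.Adam`, so swapping it in for SGD is usually a one-line change.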

#

thank you so much !! 🙏 🙏

past meteor May 25, 2023, 6:26 AM

#

Dropout, batchnorm, ... are things you should add after you've got a network that works but is overfitting

quaint loom May 25, 2023, 8:04 AM

#

Does anyone have a quick way to turn this model into text/LaTeX form? Or something similar?

worldly dawn May 25, 2023, 8:05 AM

#

quaint loom Anyone who have a quick way to make this model into text/latex form? Or somethin...

you could try to type it in graphviz/plantuml/mermaid?

quaint loom May 25, 2023, 8:13 AM

#

worldly dawn you could try to type it in graphviz/plantuml/mermaid?

Any webpage that does this thing?

past meteor May 25, 2023, 8:13 AM

#

Depends on how beautiful it needs to be, draw.io could work

#

It's web based but you can also install it

quaint loom May 25, 2023, 8:13 AM

#

It doesnt have to be beautiful, As long as GPT understand it 😛

past meteor May 25, 2023, 8:15 AM

#

Yeah, draw.io (https://app.diagrams.net/) works for that

quaint loom May 25, 2023, 8:17 AM

#

Ehm. It looks like this page can only draw the layout of the model, not produce it in text form

past meteor May 25, 2023, 8:22 AM

#

Oooh, you want to give this diagram as text. I don't think GPT will understand that

quaint loom May 25, 2023, 8:23 AM

#

I remember I got a suggestion here once, but I don't remember the name of it. Some kind of LaTeX for a two-dimensional matrix

bold timber May 25, 2023, 8:39 AM

#

can anyone explain to me what the difference is between sequence_length and vocab_size in this code?

quaint loom May 25, 2023, 9:18 AM

#

bold timber anyone can explain to me what is difference between ``sequence_length`` and ``vo...

sequence_length relates to the length or size of the input sequence, while vocab_size relates to the number of unique tokens in the vocabulary. These parameters serve different purposes in the code and are used to configure and shape the embedding layer accordingly.

#

In your code, sequence_length and vocab_size are two parameters used in the Embeddings class

bold timber May 25, 2023, 9:25 AM

#

quaint loom sequence_length relates to the length or size of the input sequence, while vocab...

I try to run that code as:


```py
test_input=tf.constant([[0, 1, 1, 5, 6]])
emb=Embeddings(5, 5, 256)
emb_out=emb(test_input)
print(emb_out)
print(emb_out.shape)
```

why did I still get ``shape`` as (1, 5, 256) instead of (5, 5, 256)?

quaint loom May 25, 2023, 10:00 AM

#

bold timber I try to run that code as: ```test_input=tf.constant([[0, 1, 1, 5, 6]]) emb=Emb...

I recommend reviewing the implementation of the Embeddings class and checking if there are any batch-related operations or configurations that might be causing the discrepancy in the output shape
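
A hedged sketch of where the leading 1 comes from. The `Embeddings` class is not shown in the log, so this uses a plain NumPy lookup table as a stand-in (`table`, `vocab_size` and `embed_dim` are assumed names and sizes). An embedding lookup returns one vector per input token, so the output shape is the input shape plus the embedding dimension. `[[0, 1, 1, 5, 6]]` has shape `(1, 5)`, one sequence of five tokens, and constructor arguments do not change the batch size:

```python
import numpy as np

vocab_size, embed_dim = 7, 256   # assumed sizes; the table must cover the largest id (6)
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embed_dim))   # one row per token id

test_input = np.array([[0, 1, 1, 5, 6]])   # shape (1, 5): batch of 1, sequence of 5
emb_out = table[test_input]                # integer indexing acts as the embedding lookup

print(test_input.shape)   # (1, 5)
print(emb_out.shape)      # (1, 5, 256)

# A batch of five sequences gives a leading 5 instead.
batch = np.repeat(test_input, 5, axis=0)   # shape (5, 5)
batch_out = table[batch]
print(batch_out.shape)                     # (5, 5, 256)
```

If the second argument of `Embeddings(5, 5, 256)` is the vocabulary size, the ids 5 and 6 in the test input would fall outside it, which is why the sketch assumes a larger table.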

#

Sorry for late reply @bold timber

past meteor May 25, 2023, 10:32 AM

#

I have an array `alphas = alpha ** np.arange(1, X.shape[1]+1)` that I multiply with X as such: `y_pred = np.dot(alphas, X.T)`. Now my question is, how would you guys optimize alpha wrt. the MSE? This is a non-convex problem. Right now I used simulated annealing because that's what I'm most conversant with from school. Can I just throw BFGS on such a problem instead? It's been a while but afaik 2nd order methods will just go to a saddle point.

tidal bough May 25, 2023, 11:38 AM

#

So it's a 1d optimization task, you are only tweaking α? Maybe differential evolution would work well, too.

wooden sail May 25, 2023, 11:40 AM

#

past meteor I have an array `alphas = alpha ** np.arange(1, X.shape[1]+1)]` that I multiply ...

what shape is X?

past meteor May 25, 2023, 11:41 AM

#

tidal bough So it's a 1d optimization task, you are only tweaking α? Maybe differential evol...

Exactly

past meteor May 25, 2023, 11:43 AM

#

wooden sail what shape is X?

#num_pts x #num_variables.

#

I hadn't heard of differential evolution yet! I'll try it on my data

wooden sail May 25, 2023, 11:43 AM

#

ok, was just making sure it was a matrix and not something else

tidal bough May 25, 2023, 11:43 AM

#

~~actually, wait... isn't this exactly quadratic by α?~~ ah, nevermind, **

wooden sail May 25, 2023, 11:44 AM

#

i would point out that no method has guarantees of finding the global optimum btw, so a heuristic like sim annealing also has no guarantee of finding it

past meteor May 25, 2023, 11:45 AM

#

I'm OK with not finding the global optimum. I just defaulted to sim annealing because I was unsure how my go-to (BFGS) would behave here

wooden sail May 25, 2023, 11:53 AM

#

is there any constraint on alpha?

#

probably alpha > 0, or?

past meteor May 25, 2023, 11:54 AM

#

0 < alpha < 1

wooden sail May 25, 2023, 11:54 AM

#

ok

past meteor May 25, 2023, 11:55 AM

#

I'm reading about differential evolution and at a glance it looks like a special case of genetic algorithms

#

Both being population-based metaheuristics that use crossover, mutation and selection. Am I missing anything @tidal bough ?

#

Like, you can express genetic algos in terms of real-valued vectors and pick a specific mutation operator and then it's differential evolution?

tidal bough May 25, 2023, 11:58 AM

#

Yup, it's a fancy genetic algorithm plus local searches for refining the results. I've had good results with it in high-dimensional optimization, but plausibly it's less interesting in 1d.

past meteor May 25, 2023, 12:01 PM

#

Genetic algorithms were some of my favourite coursework 🙂 Good to know scipy has a very easy-to-use implementation
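
A small sketch of the 1-D problem discussed above, on synthetic data (the positive matrix `X`, the true value 0.6 and the noise-free targets are all assumptions, not the poster's data). It compares SciPy's bounded scalar minimiser with `differential_evolution` on the interval `0 < alpha < 1`:

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize_scalar

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 6))   # synthetic positive lags (assumed data)
powers = np.arange(1, X.shape[1] + 1)
true_alpha = 0.6
y = X @ (true_alpha ** powers)             # noise-free targets for the sketch

def mse(alpha):
    """Mean squared error of the exponentially weighted prediction."""
    alphas = alpha ** powers
    y_pred = np.dot(alphas, X.T)
    return float(np.mean((y_pred - y) ** 2))

# Bounded 1-D search on the interval (0, 1).
local = minimize_scalar(mse, bounds=(0.0, 1.0), method="bounded")

# Population-based global search; the objective receives a length-1 array.
de = differential_evolution(lambda a: mse(a[0]), bounds=[(0.0, 1.0)], seed=0)

print(local.x, de.x[0])
```

On this toy data both recover the same alpha. On real data the two can disagree if the loss has several local minima, in which case the global search is the safer default and the bounded search the cheaper one.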

night kernel May 25, 2023, 12:07 PM

#

anyone know of a good text summarizer model on github?

#

kind of an ed-techy tool that can summarize large texts
not cgpt
in the interest of saving money not using their api

wooden sail May 25, 2023, 12:12 PM

#

past meteor Genetic algorithms were some of my favourite coursework 🙂 Good to know scipy ha...

do your matrix X and data y have any special properties?

#

the problem is not convex in general, but one could plot the second derivative vs alpha and see if anything special happens

past meteor May 25, 2023, 12:19 PM

#

It's just an exponentially weighted moving average so X are lags

wooden sail May 25, 2023, 12:23 PM

#

so all the entries of the matrix are positive?

past meteor May 25, 2023, 12:24 PM

#

Yes

wooden sail May 25, 2023, 12:24 PM

#

if the matrix and y are fixed, there's a good chance the problem is convex after all

past meteor May 25, 2023, 12:24 PM

#

But they might not be in the future - I might swap the task from predicting y_t to predicting delta_y_t

wooden sail May 25, 2023, 12:25 PM

#

aha

brave sand May 25, 2023, 2:44 PM

#

once I label my dataset, are there any good tutorials for object detection?

#

i want to draw a bounding box

past meteor May 25, 2023, 2:46 PM

#

Running YOLO through open-cv is pretty plug and play: https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html

brave sand May 25, 2023, 3:09 PM

#

past meteor Running YOLO through open-cv is pretty plug and play: https://opencv-tutorial.re...

it's gotta be a custom object detection model, any yt vids for that?

past meteor May 25, 2023, 3:21 PM

#

You'll have to look for that yourself 🙂

#

you can consult the docs of YOLO, they have a tutorial there on how to train their model

patent ocean May 25, 2023, 3:49 PM

#

Guys, I got a trained model from github but I wanna run it using my test dataset.

#

it is an image steganography project

#

but the code just keeps giving me results from the training data

#

If there are any kind souls present who would be willing to spend some time to read this project, I would deeply appreciate your effort. It is getting kind of desperate. I need help. please.

frail flicker May 25, 2023, 4:18 PM

#

hey everyone, im trying to remove the outliers from my data using pandas
Im using the following code rn:

```py
lower_limit = df.column.mean() - 3* df.column.std()
upper_limit = df.column.mean() + 3* df.column.std()
df = df[df[column] > lower_limit]
df = df[df[column] < upper_limit]
```
The error being raised is that the database doesn't have a column object

#

How can I resolve this? Im kinda a beginner with pandas and this is the easiest method to remove outliers I found

spare briar May 25, 2023, 4:40 PM

#

try df[column].mean()
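
To spell out the suggestion: `df.column` looks for a column literally named `column`, while `df[column]` uses the name stored in the variable. A minimal sketch of the 3-sigma filter with bracket access throughout (the helper name `drop_outliers` and the toy data are assumptions for illustration):

```python
import pandas as pd

def drop_outliers(df, column, n_std=3.0):
    """Keep rows whose `column` value lies strictly within n_std standard deviations of the mean."""
    col = df[column]                 # bracket access works when the name is held in a variable
    mean, std = col.mean(), col.std()
    lower, upper = mean - n_std * std, mean + n_std * std
    return df[(col > lower) & (col < upper)]

# Toy data (assumed): 0..99 plus one extreme value.
df = pd.DataFrame({"value": list(range(100)) + [10_000]})
cleaned = drop_outliers(df, "value")
print(len(df), len(cleaned))
```

Computing the limits once, before filtering, also avoids the second filter using statistics from an already trimmed frame.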

plain jungle May 25, 2023, 5:17 PM

#

past meteor I'm reading about differential evolution and at a glance it looks like a special...

GAs can vary. Some are used in Neural Nets; however, some can be as simple as an array depending if the use case is static or not.

serene scaffold May 25, 2023, 5:22 PM

#

plain jungle GAs can vary. Some are used in Neural Nets; however, some can be as simple as an...

This server really isn't the place for you to promote your youtube channel.

plain jungle May 25, 2023, 5:25 PM

#

Sorry about that, will remove the post

past meteor May 25, 2023, 5:40 PM

#

plain jungle GAs can vary. Some are used in Neural Nets; however, some can be as simple as an...

Huh I don't see the connection between GANs and the conversation we were having before (differential evolution vs genetic algorithms)

potent sky May 25, 2023, 6:02 PM

#

past meteor Huh I don't see the connection between GANs and the conversation we were having ...

GANs?

#

I think JTexpo meant to make a general comment on GAs, in that they can be used to train neural networks

versed gulch May 25, 2023, 6:32 PM

#

what would be the best way to measure a plant's biomass with ai, accurately measuring a plant from the soil up in a single pot? guesstimating the root biomass is also important, but if i could get an accurate reading of the upper plant i could possibly guesstimate the roots. the goal is to measure how much the plant is growing and has grown, and to estimate how much water it has taken up and possibly transpired. if the system could be embedded into a cheap microcontroller like an esp32 or arduino, it could help with a watering system i'm working on

agile cobalt May 25, 2023, 6:35 PM

#

what?...

versed gulch May 25, 2023, 6:35 PM

#

agile cobalt what?...

ai biomass estimation. i was thinking white dots around plant like in green screen movie animation

agile cobalt May 25, 2023, 6:36 PM

#

sounds a bit like a regression problem, but you would need a lot of data to train a model for that

#

after the edit: do you understand how models actually work?

versed gulch May 25, 2023, 6:37 PM

#

agile cobalt after the edit: do you understand how models actually work?

a little bit but no real world hands on

brave sand May 25, 2023, 6:38 PM

#

hey guys

agile cobalt May 25, 2023, 6:38 PM

#

you must have well defined inputs and outputs
the model itself is used to approximate a function, but you must model your problem in a way such that you can "teach" it how to get better at approximating that function

versed gulch May 25, 2023, 6:39 PM

#

i've been recommended open cv and plant cv by gpt

brave sand May 25, 2023, 6:39 PM

#

does this matter?

#

can I use the mobile net v2 320x320 for this?

agile cobalt May 25, 2023, 6:39 PM

#

forget about using GPT if you do not understand the subject of the discussion well enough to verify whether or not the output makes sense, seriously

brave sand May 25, 2023, 6:40 PM

#

brave sand can I use the mobile net v2 320x320 for this?

does anyone know?

versed gulch May 25, 2023, 6:42 PM

#

agile cobalt forget about using GPT if you do not understand the subject of the discussion we...

ok, i was just asking for guidance on where to start, and maybe there are similar projects i could learn from. i see it as a possibly highly valuable tool for automating watering for plants and robots, but i've only found bits of info here and there regarding the subject, nothing as precise as i would like. i would definitely need someone with experience, and that's another story

agile cobalt May 25, 2023, 6:43 PM

#

it looks like there is some research on it (e.g. https://www.frontiersin.org/articles/10.3389/fpls.2022.758818/full), and that plant cv is an actual thing, but I had never heard about that area before

versed gulch May 25, 2023, 6:48 PM

#

agile cobalt it looks like there is some research on it (e.g. https://www.frontiersin.org/art...

wow such a well documented article thank you. the arduino system i built for water plants is basically the blueprint that would work great with a accurate biomass estimate software. would just need to patch them together somehow. i can show u my code if interested

agile cobalt May 25, 2023, 6:49 PM

#

the biggest issue you would have to tackle first is finding a dataset containing the biomass of the plant, with the features you want to use to estimate the biomass

versed gulch May 25, 2023, 6:50 PM

#

agile cobalt the biggest issue you would have to tackle first is finding a dataset containing...

mainly for cannabis plant to start, mainly for vegging. all edible plants one day too

agile cobalt May 25, 2023, 6:52 PM

#

I don't know much about plants, but I highly doubt that data about one species would work for other species unless they are extremely similar/close

versed gulch May 25, 2023, 6:55 PM

#

agile cobalt I don't know much about plants, but I highly doubt that data about one species w...

to get it to work with and high edible plants would be an accomplishment

#

or any valued plant

#

i water plants based on weight so the ai would be a compliment to my setup

#

watering can be a challenge for a lot of people, under watering, over watering, etc. if the ai can do it all then it can help humans with labor costs

potent sky May 25, 2023, 7:22 PM

#

yeah interesting problem to solve but you need to define it as an ML problem first
along the lines of what etrotta said, identify your inputs, why those inputs will give you valuable modelling information, identify your outputs, study whether such a system is even possible, identify what kind of statistical modelling task it fits into (regression maybe), etc.
Good luck!

analog schooner May 25, 2023, 7:32 PM

#

anyone knows a reliable Midjourney API provider?

tulip wyvern May 25, 2023, 7:33 PM

#

https://colab.research.google.com/drive/1Z8hYgbYTKRqqI3Y2V0BznfyrEh_k2MFB#scrollTo=Peze4xiZGJtW


#

Why is my test accuracy so bad and not increasing at all? (5.6%) for my multiclass task?

#

😦

agile cobalt May 25, 2023, 7:45 PM

#

analog schooner anyone knows a reliable Midjourney API provider?

There are a lot of Stable Diffusion API providers and OpenAI offers DALL-E 2 via API, but Midjourney does not offer an official API and you should not trust anything that claims to offer an API for it - it would be either a scam using a different model or violate ToS

agile cobalt May 25, 2023, 7:51 PM

#

tulip wyvern Why is my test accuracy so bad and not increasing at all? (5.6%) for my multicla...

I think that your backpropagation step is in the wrong order?
also the normalisation with everything in 0.5 might (still) be doing you a disservice

tulip wyvern May 25, 2023, 7:52 PM

#

yeah i think i'll change that then

#

what do you mean the backpropagation step is in the wrong order

agile cobalt May 25, 2023, 7:52 PM

#

the order of these lines

#

compared to https://pytorch.org/tutorials/beginner/introyt/trainingyt.html in which they zero grad before all others

#

I would also recommend taking a look at the images yourself after your code resizes them to make sure that they remain identifiable at all

versed gulch May 25, 2023, 7:54 PM

#

potent sky yeah interesting problem to solve but you need to define it as an ML problem fir...

thank you

tulip wyvern May 25, 2023, 7:54 PM

#

okay let me try putting the zero_grad first and then check the images after resize

agile cobalt May 25, 2023, 7:54 PM

#

(like, if the original images are wallpaper sized, you might want to keep them a bit larger or crop the faces first)

versed gulch May 25, 2023, 7:55 PM

#

@agile cobalt thanks for that awesome site and your help

potent sky May 25, 2023, 7:56 PM

#

agile cobalt the order of these lines

loss.backward() is what accumulates the gradients that zero_grad() clears, so both are equivalent no?

potent sky May 25, 2023, 7:57 PM

#

versed gulch thank you

np!

agile cobalt May 25, 2023, 7:57 PM

#

potent sky loss.backward() is what accumulates the gradients that zero_grad() clears, so bo...

I have no idea tbh

potent sky May 25, 2023, 7:57 PM

#

ah okay

agile cobalt May 25, 2023, 7:57 PM

#

the model is also fairly large so you might need to let it train for a bit longer than 2 epochs, especially with the learning rate you set

potent sky May 25, 2023, 7:58 PM

#

Both would be equivalent
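
A small check of that claim, assuming a tiny linear model and random data (none of this is the notebook's code). It trains twice with identical seeds, once calling `zero_grad()` before the forward pass and once right before `backward()`, and compares the resulting weights:

```python
import torch

def train(zero_first, steps=5):
    """Run a few SGD steps on a tiny linear model and return the final weights."""
    torch.manual_seed(0)                          # same init and data in both runs
    model = torch.nn.Linear(3, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 3), torch.randn(8, 1)
    for _ in range(steps):
        if zero_first:
            opt.zero_grad()                       # clear before the forward pass
        loss = torch.nn.functional.mse_loss(model(x), y)
        if not zero_first:
            opt.zero_grad()                       # clear right before backward
        loss.backward()                           # accumulates into .grad
        opt.step()                                # consumes .grad
    return torch.cat([p.detach().flatten() for p in model.parameters()])

same = torch.allclose(train(True), train(False))
print(same)
```

What matters is that the gradients are cleared somewhere between one `step()` and the next `backward()`. Calling `zero_grad()` between `backward()` and `step()` would discard the gradients before they are used.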

tulip wyvern May 25, 2023, 7:58 PM

#

i trained for 5 epochs and the accuracy didn't change at all

agile cobalt May 25, 2023, 8:04 PM

#

looking at the dataset... a lot of these images are not like each other at all
I would recommend testing your model on a more conventional dataset first

#

including inverted images in the dataset is also weird af imo

#

specially considering that it looks like they were not even consistent on it? some of the characters have inverted images while others don't

tulip wyvern May 25, 2023, 8:06 PM

#

yeah you're right, i'll use a different dataset, this one is way too hard

#

i'll find one with more consistent images

agile cobalt May 25, 2023, 8:06 PM

#

it might work well enough if you fine-tune an existing model, but I don't think that training one from scratch on that data set is a good idea

tulip wyvern May 25, 2023, 8:07 PM

#

yea i probably took too hard of a dataset for a starting learning project 😭

#

thank you very much!! 🙏

potent sky May 25, 2023, 8:09 PM

#

tulip wyvern i trained for 5 epochs and the accuracy didn't change at all

A few things. Check if 64x64 image size contains enough information for you to do the classification you're trying to do, look at the dataset for this
Though these probably aren't significantly responsible for keeping it at 5%:
Try a lower kernel size. Your image is 64x64, a kernel size of 5 is probably too much
Try more filters, instead of 3->6 try 16,32,64. More filters give the model the ability to capture/model more features
Report train accuracy as well and compare train and test accuracy to get a sense of bias/variance

#

Check the data distribution and balance across classes
Lots of things are possible, you just keep eliminating what isn't the problem to keep making your method better and eventually find out a solution

#

Albumentations could be useful too once your model is learning better

#

So much successive pooling with such a small image-size to start with is rapidly reducing the size of your feature map, and combined with you having few filters could mean a lot of relevant information is available for very few parameters to learn
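
To make the shrinking feature map concrete, here is a back-of-the-envelope sketch using the standard output-size formula for convolution and pooling windows. The notebook's actual architecture is not in the log, so the three conv-plus-pool blocks without padding are an assumption:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_sizes(size, kernel, n_blocks):
    """Sizes after each [conv(kernel) -> 2x2 max-pool] block, no padding."""
    sizes = [size]
    for _ in range(n_blocks):
        size = conv_out(size, kernel)          # convolution, stride 1
        size = conv_out(size, 2, stride=2)     # 2x2 pooling, stride 2
        sizes.append(size)
    return sizes

sizes_k5 = trace_sizes(64, kernel=5, n_blocks=3)
sizes_k3 = trace_sizes(64, kernel=3, n_blocks=3)
print(sizes_k5)   # [64, 30, 13, 4]
print(sizes_k3)   # [64, 31, 14, 6]
```

Under these assumptions a 64x64 input is down to 4x4 after three blocks with a kernel of 5, and 6x6 with a kernel of 3.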

frozen girder May 25, 2023, 8:17 PM

#

Hi! I am trying to use TargetEncoder from sklearn so as to apply mean encoding to some of my features. But i cant find any info about it, especially examples. Any one knows how to use it?

analog schooner May 25, 2023, 8:26 PM

#

agile cobalt There are a lot of Stable Diffusion API providers and OpenAI offers Dalle-2 via ...

well, they really have a reason for delaying the release of their own API, so people like me that rely on their bot are pretty much stuck if they want to create bigger projects. I, for example, need it for a video generation tool, but without an API it's useless. others like DALL-E 2 don't work as well. I've found mjapi.io, thenextleg.io and others, might just try them, but I was curious if there's anyone that had already tried them and has anything to share. ofc, this is just a proof of concept, and I'm well aware that it's safer to use a burner account

agile cobalt May 25, 2023, 8:27 PM

#

analog schooner well, they really have a reason for delaying the release of their own API, so pe...

I'll just recommend using Stable Diffusion instead and request for you to avoid discussing that sort of alternative in this server 🤷

#

if something requires using proxies, burner accounts or such, we will not assist it

brave sand May 25, 2023, 10:32 PM

#

once I label my images, what do I do now?

#

i'm trying to make an object detector

timid grove May 25, 2023, 11:53 PM

#

this is my device map for "HuggingFaceH4/starchat-alpha" model , can anyone tell what values should i pass inside no_split_module_classes to make the same layers in one device.

tidal fog May 26, 2023, 12:57 AM

#

I'm hoping I could get some help on a problem that I have been having. I have been trying to implement the paper HiPPO https://arxiv.org/abs/2008.07669 and accomplished this here: https://github.com/Dana-Farber-AIOS/HiPPO-Jax. However, since then I have been trying to re-implement it with different design choices. The big difference is that HiPPO is basically a layer that behaves recurrently. It's common practice, at least with RNNs, that you implement the RNN Cell and then call the RNN Cell within a skeleton RNN that accepts arbitrary RNN Cells, i.e. LSTM, GRU, etc. I wanted to do the same with HiPPO, where the two different cells are HiPPOLSICell and HiPPOLTICell. I have found that when I try to initialize my parameters such that the flax module knows the learnable weights, the way I am doing it doesn't work and fails when I try to set the weight matrices as self.param. I will provide the code for all of this in a moment.

hasty mountain May 26, 2023, 1:17 AM

#

My experience with PPO(not HiPPO) says that initializing the network weights the way they suggest(which is, if I remember correctly, through an invertible matrix) makes the model rubbish

and that's a standard for most algorithms I've tried so far

#

At least my model tends to produce the same outputs and get stuck on local optima more easily with such initialization

tidal fog May 26, 2023, 1:19 AM

#

oh ok I cant upload files

#

damn

digital quartz May 26, 2023, 1:44 AM

#

hi everyone, i have a problem: i have to generate a dataset from documents to fine-tune gpt, but i have no idea how to do it 🥲

serene scaffold May 26, 2023, 1:52 AM

#

digital quartz hi everyone, im have a problem, i have to generate a dataset for fine tune gpt f...

Who told you to do this, and why do they want you to do it?

digital quartz May 26, 2023, 1:57 AM

#

my instructor, and he wants to fine-tune chat gpt for question answering about the document

hasty mountain May 26, 2023, 2:01 AM

#

I suppose it's to decrease the chance of misinformation

serene scaffold May 26, 2023, 2:09 AM

#

digital quartz my instructor and he want to fine tune chat gpt to question answering about the ...

you can't fine-tune ChatGPT, because the actual model is not available. you can only interact with it over their website.

You can download GPT-2 (in a certain sense, ChatGPT is GPT-3.5), but I think that's as close as you can get. And GPT-2 isn't conversational the way ChatGPT is.

If your instructor doesn't know this, but they gave you this assignment anyway, you should see if you can get a refund for the course.

#

if I'm reading this other article right, it looks like you can fine-tune GPT-3 via OpenAI's API, but that the actual model is never turned over to you.

agile cobalt May 26, 2023, 2:14 AM

#

if they did not specify it has to be exactly chatgpt, there are some open source models you can use - overall they are significantly worse than openai's gpt, but should work well enough with fine tuning

serene scaffold May 26, 2023, 2:15 AM

#

agile cobalt if they did not specify it has to be exactly chatgpt, there are some open source...

GPT-2 is on huggingface

agile cobalt May 26, 2023, 2:15 AM

#

I'm thinking more like the ones released earlier this year than however long ago gpt 2 was
edit: 2019

serene scaffold May 26, 2023, 2:16 AM

#

everyone and their grandma wants their own LLM

tulip wyvern May 26, 2023, 4:13 AM

#

https://colab.research.google.com/drive/1Z8hYgbYTKRqqI3Y2V0BznfyrEh_k2MFB#scrollTo=rvYXXqgpGJtX

#

How come my model test accuracy is just the same accuracy as pure guessing and is not improving at all? (10% with 10 classes)

#

I've used three different datasets and my test accuracy has been the same as guessing each time

#

😦

agile cobalt May 26, 2023, 4:18 AM

#

you are still using that?

transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

tulip wyvern May 26, 2023, 4:18 AM

#

yea i havent changed it yet

#

im just trying to get my model to learn 😭

bold timber May 26, 2023, 4:20 AM

#

I have a question about this code:

def compute_mask(inputs, mask=None):
    mask = tf.math.not_equal(inputs, 0)
    mask1 = mask[:, :, tf.newaxis] # column vector
    mask2 = mask[:, tf.newaxis, :] # row vector
    attention_mask = mask1 & mask2
    return attention_mask

That code is the code for masking in Transformers. But I'm confused about why do we need to set mask= None ?

agile cobalt May 26, 2023, 4:22 AM

#

bold timber I have a question about this code: def compute_mask(inputs, mask=None):...

you shouldn't have to, specially if you always overwrite it later

bold timber May 26, 2023, 4:22 AM

#

agile cobalt you shouldn't have to, specially if you always overwrite it later

I don't understand what you mean. Can you elaborate, please?

arctic wedgeBOT May 26, 2023, 4:24 AM

#

:incoming_envelope: :ok_hand: applied timeout to @tulip wyvern until <t:1685075660:f> (10 minutes) (reason: newlines spam - sent 157 newlines).

The <@&831776746206265384> have been alerted for review.

merry oak May 26, 2023, 4:25 AM

#

!unmute 494283724373098529

arctic wedgeBOT May 26, 2023, 4:25 AM

#

:incoming_envelope: :ok_hand: pardoned infraction timeout for @tulip wyvern.

agile cobalt May 26, 2023, 4:25 AM

#

bold timber I don't understand what you mean. Can you elaborate, please?

if you were doing something like
```py
if mask is None:
    mask = tf.math.not_equal(inputs, 0)
```
then the mask=None default would make sense

merry oak May 26, 2023, 4:25 AM

#

use the pastebin

tulip wyvern May 26, 2023, 4:25 AM

#

yeah sorry 😭

tulip wyvern May 26, 2023, 4:29 AM

#

tulip wyvern https://colab.research.google.com/drive/1Z8hYgbYTKRqqI3Y2V0BznfyrEh_k2MFB#scroll...

i printed out the num correct guesses and the corresponding y label and found that my model is always guessing 9. Anybody have any idea why?

agile cobalt May 26, 2023, 4:29 AM

#

try shuffling your input data

tulip wyvern May 26, 2023, 4:29 AM

#

for the train or test, or both?

agile cobalt May 26, 2023, 4:29 AM

#

both

tulip wyvern May 26, 2023, 4:30 AM

#

okay i will try that 🙏

agile cobalt May 26, 2023, 4:31 AM

#

just remember to keep the images and their corresponding labels together

#

either bundle them in a 2-D array and only shuffle along the first axis, or just generate a random order to grab indexes from
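The "random order of indexes" option can be sketched roughly like this, assuming the images and labels are plain Python lists of equal length. The helper name `shuffle_together` and the toy data are made up for illustration.

```py
import random


def shuffle_together(images, labels, seed=None):
    """Shuffle two equal-length lists using the same random order."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]


# Toy data: each "image" is named after its label so the pairing is easy to check
images = [f"img_{label}_{n}" for label in range(3) for n in range(4)]
labels = [label for label in range(3) for n in range(4)]

shuffled_images, shuffled_labels = shuffle_together(images, labels, seed=0)
```

If you are using a PyTorch `DataLoader`, passing `shuffle=True` for the training loader should achieve the same thing without touching the dataset lists.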

tulip wyvern May 26, 2023, 4:32 AM

#

i have the label and the image as attributes in my dataset class

#

as lists

#

and for getitem i return the object at the selected index for the image and label list

#

okay i shuffled my training and test set and i got 37% test accuracy

Do you think this is just because I have a bad model architecture or is it a product of another dumb mistake?

agile cobalt May 26, 2023, 4:36 AM

#

are you still using 0.5 for all in the normalisation?

tulip wyvern May 26, 2023, 4:36 AM

#

yep

agile cobalt May 26, 2023, 4:37 AM

#

agile cobalt try shuffling your input data

but either way that was definitely a big whoopsie
your model was pretty much only seeing one type of 'target' in each batch

tulip wyvern May 26, 2023, 4:37 AM

#

yeah I can't believe i missed that 😭

#

okay it looks like my model can finally learn

#

ill just work on finding better normalization parameters and architecture

#

thank you very much i really appreciate it

agile cobalt May 26, 2023, 4:40 AM

#

taking a quick look at https://paperswithcode.com/sota/image-classification-on-cifar-10 it looks like at least 80% accuracy should be reasonable to achieve

Papers with Code - CIFAR-10 Benchmark (Image Classification)

The current state-of-the-art on CIFAR-10 is VIT-L/16 (Background, Spinal FC). See a full comparison of 228 papers with code.

#

any more than that might require some more delicate tweaking, but even 95% should still be reachable if you put time into it

tulip wyvern May 26, 2023, 4:41 AM

#

yeah im definitely gonna put in some time to play around with it

mild lichen May 26, 2023, 4:41 AM

#

Guys, who can help me with Python simple thing, im new in it.

tulip wyvern May 26, 2023, 6:07 AM

#

https://colab.research.google.com/drive/1Z8hYgbYTKRqqI3Y2V0BznfyrEh_k2MFB#scrollTo=rvYXXqgpGJtX

#

Can somebody explain why the RNN performs so much better than the CNN (which is not improving at all)

#

I don't know why I keep getting this issue 😭 😭

agile cobalt May 26, 2023, 6:15 AM

#

recall = True Positive / Total Positive

you have 516 false negatives and 79 true positives, so you correctly identified 79 / (516 + 79) of the examples labeled 1 = 0.13277310924369748

#

your classes are extremely unbalanced, so it predicts false negatives more often than false positives

#

*I might have mixed up the names; do double check it
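For double checking, a small sketch of the recall arithmetic, assuming the 79 are true positives and the 516 are false negatives (positives the model missed). That reading of the confusion matrix is an assumption, since the names may be mixed up above.

```py
def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    total_positive = true_positives + false_negatives
    if total_positive == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return true_positives / total_positive


value = recall(true_positives=79, false_negatives=516)
print(round(value, 4))
```

As far as I know, scikit-learn's `recall_score(y_true, y_pred)` computes the same quantity directly from the label arrays.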

#

I assume so

#

(so that it supports cases in which you have three or more categories instead of just binary classification)

agile cobalt May 26, 2023, 6:28 AM

#

tulip wyvern I don't know why I keep getting this issue 😭 😭

tbh I would recommend trying to use a slightly more high level library like fastai

I would guess underfitting, since your CNN network has wayyy more params than the other one, but I feel like even then it should improve at least a little more than it is right now
also you left the other one with 10 outputs?

tulip wyvern May 26, 2023, 6:28 AM

#

Fastai is higher level?

#

I was considering learning it but I didn't know people actually use it

tulip wyvern May 26, 2023, 6:29 AM

#

agile cobalt tbh I would recommend trying to use a slightly more high level library like fast...

Wait if my cnn network has more params how come that causes the cnn network to underfit?

agile cobalt May 26, 2023, 6:30 AM

#

not enough data to adjust all the params

#

but never mind, that does not make much sense

#

I'm kinda sleepy

tulip wyvern May 26, 2023, 6:32 AM

#

i think i will try to grind and figure this out

#

even though it is extremely painful

#

hopefully im learning something 💀

#

thank you for all the help youve given me 🙏

agile cobalt May 26, 2023, 6:33 AM

#

maybe play a bit with the learning rate (both increasing and decreasing)

#

the loss going up in the last epoch for the 'rnn' is also a bit suspicious

tulip wyvern May 26, 2023, 6:38 AM

#

im going to do everything imaginable

fickle crescent May 26, 2023, 8:10 AM

#

hi guys, i recently got interested in data science..... can you share some really good resources to study....also how much time would it take before i could make a decent project and land an internship?

wraith walrus May 26, 2023, 8:34 AM

#

error: mp_drawing = mp.solutions.drawing_utils
AttributeError: module 'mediapipe' has no attribute 'solutions'

lapis sequoia May 26, 2023, 9:52 AM

#

Hello, so i have a question about ONNX models, im new to this and i'm not finding enough documentation on the internet.
So i have two models i exported as onnx and they work fine. One is called splitpost and the other nosplitpost. They take as input a mask of shape [1,88,44,60]
i wanted to create an onnx model where i have an if else statement and a split boolean value.
If split is true, we use the first model on the image else we use the second one.
This is my code: https://codeshare.io/Lw4en6
Obviously this isn't working but i don't know why. i'm getting this error on the check model:
onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: axes for operator ReduceSum
==> Context: Bad node spec for node. Name: OpType: If
Can anyone please help me solve this?

dawn hazel May 26, 2023, 10:33 AM

#

.

errant bison May 26, 2023, 11:44 AM

#

can anyone tell which libraries can i use for automatic license plate recognition

potent sky May 26, 2023, 12:18 PM

#

digital quartz my instructor and he want to fine tune chat gpt to question answering about the ...

You can prolly use ChatGPT retrieval plugin and connect it to a vector DB of your knowledge base
But that's reliant on getting access to chatgpt plugins

potent sky May 26, 2023, 12:25 PM

#

wraith walrus error: mp_drawing = mp.solutions.drawing_utils AttributeError: module 'mediapipe...

Check your filenames. They shouldn't be the same as the module name (mediapipe)

#

Basically, if you're using a file named mediapipe.py, change the name
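A quick way to check for this kind of shadowing is to ask Python where it would load a module from. This is a minimal sketch using the standard library. `module_origin` is a made-up helper name, and `json` is only a stand-in so the snippet runs without mediapipe installed.

```py
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Swap "json" for "mediapipe" in your own environment
origin = module_origin("json")
print(origin)
```

If the printed path for `mediapipe` points into your project folder instead of `site-packages`, a local file or folder is shadowing the real package.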

potent sky May 26, 2023, 12:29 PM

#

errant bison can anyone tell which libraries can i use for automatic license plate recognitio...

One approach would be to use yolo for detection and then extract that and use an ocr model to get the license plate number
Many other approaches are possible depending on the kind of data, compute, and what exactly you want to do

potent sky May 26, 2023, 12:31 PM

#

potent sky You can prolly use ChatGPT retrieval plugin and connect it to a vector DB of you...

Also, that won't be fine-tuning. You can't fine-tune chatgpt because the model isn't publicly available.
But for the use-case your prof seems to be going for, retrieval plugin with vector-db is how a lot of teams do it

pulsar needle May 26, 2023, 12:58 PM

#

Hello, I am trying to make a CNN to predict whether an image of an eye is open or closed, and it seemed to train very well over 50 epochs:
Epoch 50/50
9/9 [==============================] - 7s 816ms/step - loss: 0.0044 - accuracy: 1.0000 - val_loss: 0.0119 - val_accuracy: 1.0000

When I predict images using the model, it always looks something like this:
array([[0.00796265, 0.9920373 ]], dtype=float32)
where the second class is much larger than the first.

Also, I tried predicting Images that are in the training set and it still outputted the same, incorrect prediction.

I'm not sure how I could fix this. If anyone could help that would be great.

cold osprey May 26, 2023, 12:59 PM

#

u sure something isnt swapped around?

lapis sequoia May 26, 2023, 1:10 PM

#

pulsar needle Hello, I am trying to make a CNN to predict whether an image of an eye is open o...

why do you have two output values for the prediction if its a binary classification?

potent sky May 26, 2023, 1:21 PM

#

pulsar needle Hello, I am trying to make a CNN to predict whether an image of an eye is open o...

Check the data distribution across your classes to ensure it's not heavily imbalanced

#

For example if you had 990 images - class A, 10 images - class B.
In this case even if your model learns to just predict A every time, it gets 99% accuracy, even tho it's not learning anything valuable.
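A rough sketch of that baseline: the accuracy you get by always predicting the most common class. The function name is made up for illustration, and the counts are the 990/10 split from the example.

```py
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most frequent class."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


labels = ["A"] * 990 + ["B"] * 10
baseline = majority_baseline_accuracy(labels)
print(baseline)
```

Comparing your model's accuracy against this number is a cheap sanity check: if the two are about equal, the model may just be predicting one class.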

plain jungle May 26, 2023, 1:34 PM

#

pulsar needle Hello, I am trying to make a CNN to predict whether an image of an eye is open o...

What is the sample size you are using of both open and closed eyes?

pulsar needle May 26, 2023, 1:37 PM

#

plain jungle What is the sample size you are using of both open and closed eyes?

I have 12 5-second videos of my friends where they move their heads around, 6 with their eyes open the whole time, and 6 with their eyes closed. I used every frame from the videos to train my model, so theres an equal amount of data for each class.

plain jungle May 26, 2023, 1:40 PM

#

Just to double check, are you making sure to shuffle your data and not train eye open first then eye closed?

cold osprey May 26, 2023, 1:53 PM

#

lapis sequoia why do you have two output values for the prediction if its a binary classificat...

both ways is fine

lapis sequoia May 26, 2023, 1:53 PM

#

cold osprey both ways is fine

i really thought you were answering my earlier question 😢 . Thanks for the clarification anyway

pulsar needle May 26, 2023, 3:07 PM

#

plain jungle Just to double check, are you making sure to shuffle your data and not train eye...

I believe so. I have this code: X,Y = shuffle(img_data_list,target_column_hotcoded,random_state=2)

plain jungle May 26, 2023, 3:18 PM

#

pulsar needle I believe so. I have this code: X,Y = shuffle(img_data_list,target_column_hotcod...

I’m not familiar with that shuffle. I usually see people do it with a zip of their input list and their output list and then unzipping the object

#

https://www.geeksforgeeks.org/python-shuffle-two-lists-with-same-order/amp/

#

Something like this ^
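The zip approach looks roughly like this. A minimal sketch with made-up toy data, using a seeded `random.Random` so the result is repeatable.

```py
import random

inputs = ["open_0", "open_1", "closed_0", "closed_1"]
targets = [1, 1, 0, 0]

# Pair each input with its target, shuffle the pairs, then split them apart again
pairs = list(zip(inputs, targets))
random.Random(2).shuffle(pairs)
shuffled_inputs, shuffled_targets = (list(column) for column in zip(*pairs))

print(shuffled_inputs, shuffled_targets)
```

If the `shuffle` in the earlier snippet is `sklearn.utils.shuffle`, it should already keep both arrays aligned, so the zip version is mostly useful as a cross-check.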

keen gust May 26, 2023, 5:11 PM

#

anyone here familiar w/ pygsheets or google api auth? I had a project I was working on a while ago, I was using a service account to connect and could access my spreadsheet fine. I went to run it today but it just keeps giving me a timeout error. Nothing has changed in the code since or on my console account

spiral inlet May 26, 2023, 7:06 PM

#

Anyone here competing in Kaggle?

potent sky May 26, 2023, 7:26 PM

#

keen gust anyone here familiar w/ pygsheets or google api auth? I had a project I was work...

I think sometimes the IAM permissions have an expiry after which you have to reinstate them. Maybe that, but not very plausible

potent sky May 26, 2023, 7:28 PM

#

spiral inlet Anyone here competing in Kaggle?

Yeah?

spiral inlet May 26, 2023, 7:28 PM

#

potent sky Yeah?

What kind of competitions are you in, @potent sky ?

potent sky May 26, 2023, 7:29 PM

#

The last one was on RL

spiral inlet May 26, 2023, 7:29 PM

#

I'm competing in the "playgrounds", to learn more

potent sky May 26, 2023, 7:30 PM

#

oh great! That should be useful
Past competitions are also useful to learn, you can also observe the discussions by the top submissions and even their code if they've made it public

#

Good luck!

spiral inlet May 26, 2023, 7:31 PM

#

I'm trying these competitions because there are a lot of people in there!

potent sky May 26, 2023, 7:32 PM

#

Oh yeah community is very important too, lots to learn from our peers

spiral inlet May 26, 2023, 7:34 PM

#

But I feel 'alone' in the competitions. I think that talking with others about the competitions can accelerate the learning

#

For that reason, I decided to be more active here

keen gust May 26, 2023, 7:49 PM

#

potent sky I think sometimes the IAM permissions have an expiry after which you have to rei...

stargazer always here responding! I did go back into the console and adjusted scopes, made a new service account, everything I could think of but couldn't get it to work unfortunately

keen gust May 26, 2023, 8:37 PM

#

potent sky I think sometimes the IAM permissions have an expiry after which you have to rei...

don't know if this helps narrow down how to troubleshoot but running the code in my editor (PyCharm) works fine - it will connect and print the data from the sheet - the issue is when it's trying to display this data in streamlit. Is this a network related issue? It did work previously w/ streamlit when I was originally working on it a few months back.

boreal gale May 26, 2023, 8:43 PM

#

do you know how to create a column for start of week's date for all the value in the Date column?
that's your first order of business.
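In case it helps, a minimal sketch of the start-of-week calculation with plain `datetime`, assuming weeks start on Monday. The sample dates are made up.

```py
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())


dates = [date(2023, 5, 22), date(2023, 5, 26), date(2023, 5, 28), date(2023, 5, 29)]
starts = [week_start(d) for d in dates]
print(starts)
```

On a pandas datetime column the same idea should work as `df["Date"] - pd.to_timedelta(df["Date"].dt.weekday, unit="D")`, where the column name `Date` is taken from the message above.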

obtuse locust May 27, 2023, 1:02 AM

#

In Pandas, is it possible to take a list of the data from rows like [A, B, C]

and have it output like

0 | Header A | Header B | Header C|
1 |    A1    |    B1    |    C1
1 |    A2    |    B2    |    C2
1 |    A3    |    B3    |    C3

obtuse locust May 27, 2023, 1:44 AM

#

Got it! Thanks 🙂

serene scaffold May 27, 2023, 1:47 AM

#

obtuse locust Got it! Thanks 🙂

What was your solution?

limber grove May 27, 2023, 2:47 AM

#

Hello! I´ve been learning python basics for almost 2 months and I want to improve my skills related to python for data analytics, does anyone know about any course that could be helpful?

coral field May 27, 2023, 2:58 AM

#

In pytorch, what is the difference between nn.Conv2d() and nn.MaxPool2d()? Both take in the "kernel_size" and "stride" arguments, so wont they perform the same function of decreasing the analysis area of the data?

agile cobalt May 27, 2023, 3:00 AM

#

where did you even read about them?

coral field May 27, 2023, 3:00 AM

#

Doing the "learnpytorch.io" tutorial on CNN's

agile cobalt May 27, 2023, 3:00 AM

#

their inputs and outputs are similar, but the operation they perform on the data is completely different

coral field May 27, 2023, 3:01 AM

#

could you explain?

agile cobalt May 27, 2023, 3:02 AM

#

MaxPool, as the name suggests, grabs the maximum value present in the kernel region, and ignores the rest
a Convolution has weights for each of the kernel positions, and will take all values into consideration ~~at least until an activation layer like relu negates some of them~~

potent sky May 27, 2023, 3:05 AM

#

limber grove Hello! I´ve been learning python basics for almost 2 months and I want to impro...

There's this course by freecodecamp that's..well, free
https://www.freecodecamp.org/learn/data-analysis-with-python/

This is very code-orientated
There's also a few courses by mit ocw if you want to get into the math behind it (recommended)
https://ocw.mit.edu/courses/15-075j-statistical-thinking-and-data-analysis-fall-2011/

Datacamp is considered a very good resource so here's a datacamp course:
https://www.datacamp.com/tracks/data-analyst-with-python
But datacamp is paid. I think you get a free 3 month trial so use it wisely

There are some other great courses as well on udemy, Coursera and ofc YouTube

Good luck!

coral field May 27, 2023, 3:06 AM

#

agile cobalt MaxPool, as the name suggests, grabs the maximum value present in the kernel reg...

so a maxpool is more specific than cnn's

potent sky May 27, 2023, 3:06 AM

#

They perform different operations

serene scaffold May 27, 2023, 3:07 AM

#

maxpool and convolution both involve operations on a sliding region of the array, yes? is there a term that encompasses both?

potent sky May 27, 2023, 3:08 AM

#

maxpool as the name suggests, activates the maximum from the kernel region
A convolution operation activates each element, with some weight, from the kernel region and adds them up

agile cobalt May 27, 2023, 3:09 AM

#

coral field Doing the "learnpytorch.io" tutorial on CNN's

I'm taking a look at their website... I seriously hope that the website is meant to be used as complement to videos, not just on its own?

limber grove May 27, 2023, 3:10 AM

#

potent sky There's this course by freecodecamp that's..well, free https://www.freecodecamp....

Thank you so much! I´m going to save these links and try them later!

agile cobalt May 27, 2023, 3:11 AM

#

but specifically for CNNs, do take a look at the https://poloclub.github.io/cnn-explainer/ page that they link in their website if you haven't yet
it also explains convolutions and pooling layers

CNN Explainer

An interactive visualization system designed to help non-experts learn about Convolutional Neural Networks (CNNs).

potent sky May 27, 2023, 3:11 AM

#

keen gust don't know if this helps narrow down how to troubleshoot but running the code in...

Interesting, I'm not so sure. Streamlit's limited customisability annoyed me so I've not used it for ages. I generally just use flask or smtg else
Grpc support for streamlit was also messed up iirc

potent sky May 27, 2023, 3:11 AM

#

limber grove Thank you so much! I´m going to save this links and try them later!

no problem, and good luck!

potent sky May 27, 2023, 3:14 AM

#

serene scaffold maxpool and convolution both involve operations on a sliding region of the array...

Ig maxpool can also be seen as a sort of convolution operation no? The max element is weighted to one and the others to zero

primal jay May 27, 2023, 3:33 AM

#

Hello, am trying to install mediapipe, does anyone know why is this error?

kind loom May 27, 2023, 3:35 AM

#

obtuse locust In Pandas, is it possible to take a list of the data from rows like [A, B, C] a...

I can think of a lot of formatting
Formatting the keys and values to form a dataframe from a dictionary

agile cobalt May 27, 2023, 3:42 AM

#

primal jay Hello, am trying to install mediapipe, does anyone know why is this error?

Broadly speaking, there are no versions that would work in your environment. Check the library's supported OS and Python versions

primal jay May 27, 2023, 3:45 AM

#

Aight, i was using conda, maybe is that

tulip wyvern May 27, 2023, 4:21 AM

#

https://colab.research.google.com/drive/1zRPhI7Z52IgDCa0WJQrGcf-9czbQg1Sh?usp=sharing

#

Anybody have any ideas on how I can reduce overfitting (Train: 100%, test: ~65%)

#

On a multiclass image task

#

I've done:
Normalization
Batchnorm
Random horizontal flip (0.5 chance)
Random grayscale (0.2 chance)
Dropout regularization (0.4 chance on each layer)

#

What other techniques are there?

hidden sigil May 27, 2023, 4:28 AM

#

tulip wyvern Anybody have any ideas on how I can reduce overfitting (Train: 100%, test: ~65%)

undersampling

tulip wyvern May 27, 2023, 4:28 AM

#

not enough train data?

hidden sigil May 27, 2023, 4:29 AM

#

tulip wyvern not enough train data?

how much training data do you have

tulip wyvern May 27, 2023, 4:29 AM

#

840 * 3 (for each class)

agile cobalt May 27, 2023, 4:29 AM

#

did you figure out why was it stuck on 33% the other day?

tulip wyvern May 27, 2023, 4:29 AM

#

agile cobalt did you figure out why was it stuck on 33% the other day?

yeah my model was just really bad

agile cobalt May 27, 2023, 4:30 AM

#

wow

#

wasn't it just like learning rate too high?

tulip wyvern May 27, 2023, 4:30 AM

#

oh yeah that too

#

i lowered it and it improved a lot

agile cobalt May 27, 2023, 4:30 AM

#

that explains it

#

it was probably overshooting in the gradient

tulip wyvern May 27, 2023, 4:31 AM

#

yeah

agile cobalt May 27, 2023, 4:35 AM

#

0.4 dropout sounds a bit high to me, not sure what is the standard though, and you are applying it pretty late in the process?
using Norm layers like BatchNorm comes with a few things you have to pay attention to, did you look into that?

hidden sigil May 27, 2023, 4:35 AM

#

idk how you set up your code but ill explain the basics of undersampling if you want to try it. basically you make the number of samples for each target class approximately the same, and you do that by randomly removing rows from whichever class has more. so lets say you have 1000 rows of data where the target is class a and 200 rows for class b: you randomly remove 800 of the class a rows so that you end up with 200 rows for each target. if you implement that properly you can probably reduce the number of convolutions, normalizations and dropouts you do significantly
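A rough sketch of random undersampling as described, in plain Python. The function name and toy data are made up, and this only helps when the classes are actually imbalanced.

```py
import random
from collections import defaultdict


def undersample(rows, labels, seed=None):
    """Randomly drop rows so every class ends up with as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, smallest):
            balanced.append((row, label))

    rng.shuffle(balanced)  # avoid leaving the classes grouped together
    new_rows = [row for row, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_rows, new_labels


# Toy data: 1000 rows of class "a" (ids 0-999) and 200 rows of class "b" (ids 1000-1199)
rows = list(range(1200))
labels = ["a"] * 1000 + ["b"] * 200
new_rows, new_labels = undersample(rows, labels, seed=0)
print(len(new_rows), new_labels.count("a"), new_labels.count("b"))
```

The imbalanced-learn package has a `RandomUnderSampler` that does this for arrays, if you would rather not write it yourself.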

hidden sigil May 27, 2023, 4:35 AM

#

agile cobalt 0.4 dropout sounds a bit high to me, not sure what is the standard though, and y...

0.4-0.6 is normal

agile cobalt May 27, 2023, 4:36 AM

#

oh wow

tulip wyvern May 27, 2023, 4:37 AM

#

agile cobalt 0.4 dropout sounds a bit high to me, not sure what is the standard though, and y...

The default in pytorch is 0.5 so I just used 0.4
What do you mean I'm applying it late in the process? I thought you were supposed to apply it right after each linear layer
I haven't looked into the things that batchnorm brings along with it, but upon a quick google search it says that too small of a batch size could negatively affect it, and that batchnorm can also affect the learning rate?

tulip wyvern May 27, 2023, 4:38 AM

#

hidden sigil idk how you set up your code but ill explain the basics of undersampling if you ...

The dataset I used from kaggle is already balanced

agile cobalt May 27, 2023, 4:39 AM

#

never mind about dropout, from looking it up it sounds like that is about right (batch norm between conv layers and dropout between linears)

hidden sigil May 27, 2023, 4:40 AM

#

tulip wyvern The dataset I used from kaggle is already balanced

you should try looking at a classification_report then on your test and the predictions to see how your model is working

tulip wyvern May 27, 2023, 4:41 AM

#

I've never heard of classification_report, I will look into it

hidden sigil May 27, 2023, 4:41 AM

#

    
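here is a simple example:
```py
import xgboost as xgb
from sklearn.metrics import classification_report

# X_train, y_train, X_test and y_test are assumed to exist already
xgboost_classifier = xgb.XGBClassifier()
xgboost_classifier.fit(X_train, y_train)
y_pred = xgboost_classifier.predict(X_test)
report = classification_report(y_test, y_pred, output_dict=True)  # same metrics as a dict
print(classification_report(y_test, y_pred))  # per-class precision, recall and F1 as text
```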

tulip wyvern May 27, 2023, 4:41 AM

#

ohh its something scikitlearn already has

#

okay let me try that out rq then

median leaf May 27, 2023, 6:51 AM

#

guys i keep trying to do knn classifier

#

but i keep getting an error: ValueError: Unknown label type: 'continuous'

#

anyone know the problem?

kind loom May 27, 2023, 7:08 AM

#

median leaf anyone know the problem?

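A KNN classifier is for classification tasks where the target variable is discrete. That error usually means your target is being treated as continuous (for example float values), so your dataset might need a regression algorithm instead. Try linear regression, SVR, or KNeighborsRegressor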

kind loom May 27, 2023, 7:09 AM

#

median leaf anyone know the problem?

Your target variable should be divided into classes

median leaf May 27, 2023, 7:37 AM

#

It is and everything seems discrete so im quite confused

sleek harbor May 27, 2023, 7:40 AM

#

could anyone throw me a link to a good guide on feature engineering? Feels like it's a big deal yet I see little coverage for it in guides and tutorials.. I get that domain knowledge is often key here, but what are some ways you could get an idea of what to do without any domain knowledge? The kaggle tutorial on PCA shows that it can be used for feature extraction, but it's not very well explained. Anyone know where I could read about PCA, not about how it's calculated (there's plenty info on that), but on how to interpret and use the results? And/or other popular feature engineering techniques?

past meteor May 27, 2023, 8:15 AM

#

sleek harbor could anyone throw me a link to a good guide on feature engineering? Feels like ...

What helped for me was just doing Kaggle competitions. Do a notebook, submit and then look at other people's solutions.

#

There's also well-documented things you can try for canonical problems like creating lags in time series. Specifically for linear models: interactions terms, B-splines, ...
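As a small illustration of the lag idea, here is a pure-Python sketch that turns a series into rows with lagged copies of itself. The helper name and the numbers are made up.

```py
def add_lags(series, lags):
    """Build one row per time step with the current value and its lagged values."""
    rows = []
    for t, value in enumerate(series):
        row = {"y": value}
        for lag in lags:
            # None marks steps where the series has no value that far back
            row[f"lag_{lag}"] = series[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


series = [10, 12, 15, 11, 14]
features = add_lags(series, lags=[1, 2])
for row in features:
    print(row)
```

In pandas the same thing is usually done with `df["y"].shift(1)`, `df["y"].shift(2)` and so on, then dropping the rows that have missing lags.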

sleek harbor May 27, 2023, 8:24 AM

#

past meteor What helped for me was just doing Kaggle competitions. Do a notebook, submit and...

do you think putting a kaggle notebook into ur portfolio is a good or bad idea? I see many mixed opinions on the matter

past meteor May 27, 2023, 8:26 AM

#

sleek harbor do you think putting a kaggle notebook into ur portfolio is a good or bad idea? ...

Is this information from Reddit? I feel like they say a lot of stuff there that is demonstrably fake or at least coming from people that are in a different country or even continent so it might not apply to you 🙂

sleek harbor May 27, 2023, 8:28 AM

#

past meteor Is this information from Reddit? I feel like they say a lot of stuff there that ...

oh I've seen it everywhere, on youtube, reddit, random chat groups. Some say recruiters love kaggle, some say they've seen all kaggle notebooks and that they're sick of it and putting them on your resume is a bad idea.. I'm thinking of doing the House price prediction one and putting that on my resume, but..

past meteor May 27, 2023, 8:30 AM

#

sleek harbor oh I've seen it everywhere, on youtube, reddit, random chat groups. Some say rec...

I'm European and the voices on those platforms are all dominated by the US so I take them with a massive grain of salt

#

I think someone having code I can glance over is great. It's nice if they've actively worked on things, so I'd see experience on Kaggle as a plus. I also value experience in internships and student jobs, but that's not mutually exclusive with side projects like Kaggle

#

For me the big thing is that none of these are mutually exclusive and most discussions treat them as if they were lol. You can get a degree, do Kaggle/sideprojects and get internships etc. Personally that's what I did

sleek harbor May 27, 2023, 8:45 AM

#

past meteor For me the big thing is that none of these are mutually exclusive and most discu...

but when ur time is limited and u r in a big hurry to get a job quick - u gotta choose 😅
I don't have a degree (not CS at least, I have a bachelors in Economics), and an unpaid internship is.. not really something I'd want to spend time on (if its a paid internship then sign me up lol). I have a few ideas for sideprojects, but one is massive and I'm unsure if I'm qualified to even approach it, and I haven't even checked if I can get the appropriate data for the second one I had in mind.. 💀 If everything works out as planned, I'll put one kaggle and 2 side projects on my resume, and hope for the best. I just hope my antisocial introvertness doesn't kick in during interviews.. 🗿 First step is to get the interview tho.. Storytelling is a big deal in DS, it seems.. and my storytelling skills are.. lacking

past meteor May 27, 2023, 8:47 AM

#

sleek harbor but when ur time is limited and u r in a big hurry to get a job quick - u gotta ...

You could also just get an analytics type job and do that for a year or two. A lot of the skills are transferable.

#

For what it's worth, my bachelors was from the economics faculty as well, but not pure econ

quick bay May 27, 2023, 9:10 AM

#

Hello, does anyone know how to do classification on a univariate, multiclass, imbalanced time series dataset? I would like to know whether what I'm doing is right or not.

potent sky May 27, 2023, 10:06 AM

#

sleek harbor oh I've seen it everywhere, on youtube, reddit, random chat groups. Some say rec...

I think Kaggle NBs are useful in the sense that they can demonstrate your ability to work with code like a lot of the industry does, much like GitHub but toned down
However I think there could be a case for recruiters being tired of seeing the same things again and again. Like house price prediction is an overdone project, everyone does it, it's available easily, you could've just copy pasted it onto your resume.
Don't do something like that.
Do something new. Use Data Science to actually solve an interesting problem. It might be useful, or it might be silly, doesn't matter; you can do analysis of Pokemon for all they care
As long as it's something new and interesting

#

I have heard this from recruiters at some top companies. The same overdone kaggle projects don't help you.
Demonstrate that you can identify a problem and then use Data Science to bring it to a solution

burnt island May 27, 2023, 12:00 PM

#

Well said

#

Whenever you get to see this, do know that I am a data scientist willing to participate in machine learning / DS projects. My main goal is to build my experience in solving real world problems using machine learning algorithms.

Whether it is a virtual internship, unpaid role, paid role, hackathon I am willing to contribute as much as my current resources would allow me to

tepid parcel May 27, 2023, 1:58 PM

#

Good morning!

I plan to be a machine learning engineer, but I don't know whether I need to learn machine learning from scratch, without software development tools, or whether it's better to use libraries and frameworks in the process. What should I analyze and consider when making this choice?

If I learn from scratch, what should I learn: only the logic, or more structured and complex things like algorithms and similar?

grand minnow May 27, 2023, 2:25 PM

#

tepid parcel Good morning! I plan to be a machine learning engineer, but I don't know whether I ...

Here's a roadmap that you can use as a reference: https://whimsical.com/machine-learning-roadmap-2020-CA7f3ykvXpnJ9Az32vYXva

Whimsical

Machine Learning Roadmap 2020

2020 machine learning roadmap built in Whimsical. View for detailed mind map on: machine learning resources and the machine learning process for projects.

spice whale May 27, 2023, 2:55 PM

#

Hey @grand minnow

grand minnow May 27, 2023, 2:55 PM

#

Heyyyy

narrow crane May 27, 2023, 3:28 PM

#

https://www.youtube.com/watch?v=ICzi3GZT-Qc
https://www.youtube.com/watch?v=FYjrjRnO9Y8

YouTube

the data janitor

Bad Advice is Killing Your Career

You've been scammed and don't even know it.

YouTube

the data janitor

The Path is Clearly Defined

Just because you don't know it doesn't mean it doesn't exist.

#

What do you guys think of this channel, and what guidance could you give someone who's trying to make it in machine learning engineering or data science?

serene scaffold May 27, 2023, 4:21 PM

#

@narrow crane I watched the first one. I think he's mostly right, but might be over-stating a few things for dramatic effect. like with portfolio projects--they aren't useless necessarily, but no number of portfolio projects can make up for a lack of "real" experience.

he's also right that "learning Python or R" isn't a strategy for getting an AI role. No one gets hired into AI because they "know python". They get hired because of demonstrated knowledge and application of AI.

narrow crane May 27, 2023, 4:22 PM

#

serene scaffold @narrow crane I watched the first one. I think he's mostly right, but mi...

His videos somewhat startled me because I realized my understanding and path are a lot more shaky than I imagined.

#

My plan was to just go through kaggle courses, then try to do the machine learning bootcamp, and hopefully do some projects while learning NLP and SQL.

serene scaffold May 27, 2023, 4:24 PM

#

narrow crane My plan was to just go through kaggle courses, then try to do the machine learni...

you will not get a job in AI doing that.

#

unless you have industry experience in another STEM field.

past meteor May 27, 2023, 4:27 PM

#

narrow crane https://www.youtube.com/watch?v=ICzi3GZT-Qc https://www.youtube.com/watch?v=FYjr...

I watched both. What he's saying, especially in the second video doesn't apply to say Western Europe

serene scaffold May 27, 2023, 4:28 PM

#

I'm watching the second video now. He seems weirdly preoccupied with SQL.

past meteor May 27, 2023, 4:28 PM

#

Because for old school data professionals data = tabular data

serene scaffold May 27, 2023, 4:29 PM

#

I agree that SQL is very important. it's just weird that he keeps saying "3 years of SQL"

past meteor May 27, 2023, 4:29 PM

#

If you start your career making Power BI dashboards good luck pivoting to anything ML heavy in the future I think...

serene scaffold May 27, 2023, 4:29 PM

#

he just said "excel is not a big boy data tool"--is this the guy who keeps posting that starbucks barista pic on quora??

#

done with the second video. does he ever talk about getting a masters in CS? because that's probably the most straightforward way to get a job in ML.

past meteor May 27, 2023, 4:31 PM

#

Getting a bit off-topic on my part but I'ma be honest and say that what matters the most is just a solid university degree.

serene scaffold May 27, 2023, 4:32 PM

#

I think he's mostly right when he says "there are no entry level jobs in ML" in the sense that the only ML jobs that will take you based only on your degree, require that you did something tangible with ML during that degree

serene scaffold May 27, 2023, 4:33 PM

#

past meteor Getting a bit off-topic on my part but I'ma be honest and say that what matters ...

this is on-topic.

narrow crane May 27, 2023, 4:33 PM

#

serene scaffold you will not get a job in AI doing that.

Well according to his videos I have to start with an entry level role preferably data analyst. So what do you think I should be doing to get there and then eventually to a machine learning engineer role?

#

Should I consider changing my study plans?

past meteor May 27, 2023, 4:33 PM

#

Are you based in the US?

narrow crane May 27, 2023, 4:34 PM

#

Yeah I am.

serene scaffold May 27, 2023, 4:34 PM

#

narrow crane Well according to his videos I have to start with an entry level role preferably...

I think you should plan to get a masters in CS.

#

(I'm assuming you're a young person who doesn't have professional experience. tell me if I'm wrong.)

narrow crane May 27, 2023, 4:35 PM

#

You’re not wrong. But my current degree is in cyber security.

#

Which is..different yeah. I was trying to learn two skill sets in one.

serene scaffold May 27, 2023, 4:36 PM

#

one that you're currently pursuing, or that you've finished?

narrow crane May 27, 2023, 4:36 PM

#

serene scaffold one that you're currently pursuing, or that you've finished?

Currently pursuing. I’m in my freshman year right now. But I was just very interested in data science.

past meteor May 27, 2023, 4:36 PM

#

Then I don't know how your job market looks like. In our case, you're not getting hired to do ML with a bachelors. Amongst masters level candidates there's tons of degrees (CS, bio-informatics, quant business (my first one), statistics, ...) that all can/want to work in ML/AI so you need to be able to "compete"

#

If you're interested early you have a ton of time to do relevant internships, projects, ... I think

serene scaffold May 27, 2023, 4:37 PM

#

past meteor Then I don't know how your job market looks like. In our case, you're not gettin...

I got hired into ML with only a bachelors, but I had to cultivate a very niche skillset that only made me viable for a handful of roles. and there was also some luck involved.

narrow crane May 27, 2023, 4:38 PM

#

Yeah I think I may have made a few mistakes with my goals.

queen cradle May 27, 2023, 4:38 PM

#

narrow crane Currently pursuing. I’m in my freshman year right now. But I was just very inter...

If you're a freshman, then you have lots of time. You should pursue a STEM degree of some sort (math, CS, physics, chemistry, engineering, econometrics, etc.), and you should plan to get a master's degree.

serene scaffold May 27, 2023, 4:38 PM

#

narrow crane Currently pursuing. I’m in my freshman year right now. But I was just very inter...

"data science" is really an umbrella term, rather than a specific thing. but if you can take ML courses during your current program, or get into an ML-focused masters after you finish this program, someone who knows about ML and cybersec would be very valuable.

past meteor May 27, 2023, 4:39 PM

#

narrow crane Yeah I think I may have made a few mistakes with my goals.

Take it from someone that studied in the economics / business faculty. You have time.

#

I did a lot of internships in data, learnt Python, ... while I was in my 2nd year I think

queen cradle May 27, 2023, 4:39 PM

#

Really I would say that the important thing is to do well in your courses. Any student with a 3.9 GPA is more appealing than any student with a 2.5 GPA.

narrow crane May 27, 2023, 4:40 PM

#

queen cradle Really I would say that the important thing is to do well in your courses. Any s...

Got it.

#

Yeah I was doing courses and studying data science on my own whilst being in class for cyber security.

past meteor May 27, 2023, 4:40 PM

#

If you can pick good electives you should

#

Idk how math / statistics heavy your program is but for us those 2 were the cornerstone of the degree which made grokking "data science" concepts easier

serene scaffold May 27, 2023, 4:42 PM

#

I have to go, so Kyle and zestar can fight over who gets to be in charge.

narrow crane May 27, 2023, 4:46 PM

#

So is it realistic for me to continue as I am and be able to land a good career or do I need to change a few things?

#

What courses, practices, or anything do I need to learn practical data science skills?

narrow crane May 27, 2023, 4:47 PM

#

serene scaffold "data science" is really an umbrella term, rather than a specific thing. but if ...

Btw should I do courses for the sake of certification or actually learning the skills?

narrow crane May 27, 2023, 4:53 PM

#

serene scaffold done with the second video. does he ever talk about getting a masters in CS? bec...

Well I don’t think he’s ever discouraged anyone from getting a degree but he treats the idea very harshly. He stated that degrees won’t teach you the real world necessary skills.

past meteor May 27, 2023, 4:55 PM

#

narrow crane What courses, practices, or anything do I need to learn practical data science s...

Considering you're a freshman just work on getting the skills you need on the side and do internships, summer jobs, ... in data

sleek harbor May 27, 2023, 4:55 PM

#

the way I see it - you don't get a degree to learn anything. U just get it so you would attract recruiters, so you could get hired. Once you are hired, then you actually start to learn (obv u should learn on ur own before that, else you won't get hired even with a degree, cus u won't have any knowledge, but what I mean is that a degree won't give you much.. didn't give me much at least)

past meteor May 27, 2023, 4:56 PM

#

Everything is learnable without a degree sure but if your uni / profs are good it helps. Massively.

narrow crane May 27, 2023, 4:57 PM

#

past meteor Considering you're a freshman just work on getting the skills you need on the si...

I appreciate the advice. But still I’m curious as to whether or not my current study plan of using Kaggle resources, the Google Colab boot camp, and then trying to do my own projects is sufficient for me to get those skills on the side. Will definitely start looking into internships. Also how do you feel about coding boot camps?

#

Like maybe I could do a boot camp for a summer to get the data skills

#

There’s a lot of programs for data science and cyber security

past meteor May 27, 2023, 4:57 PM

#

narrow crane Like maybe I could do a boot camp for a summer to get the data skills

Job experience >>>> bootcamp

narrow crane May 27, 2023, 4:58 PM

#

past meteor Job experience >>>> bootcamp

Alr, will the job experience teach me? Better than the boot camp at least?

#

Or is it just better for people to hire me with

past meteor May 27, 2023, 4:59 PM

#

Jobs taught me a lot that I either didn't learn in school or didn't pay enough attention to

#

Reading and doing something are different

queen cradle May 27, 2023, 4:59 PM

#

narrow crane I appreciate the advice. But still I’m curious as to whether or not my current stu...

It depends on what you want to do. If you want a thorough understanding of ML theory, then you will not get it this way. That requires a master's degree in statistics (at a minimum). A boot camp will teach you how to use existing ML packages in elementary ways.

past meteor May 27, 2023, 4:59 PM

#

They're complementary

narrow crane May 27, 2023, 5:00 PM

#

queen cradle It depends on what you want to do. If you want a thorough understanding of ML th...

I see

queen cradle May 27, 2023, 5:00 PM

#

Boot camps are fine if you want to just use ML techniques as a black box.

sleek harbor May 27, 2023, 5:00 PM

#

past meteor Everything is learnable without a degree sure but if your uni / profs are good i...

I wish I could say mine were good.. they weren't 💀 I want to get a masters at MIT, but.. if I ever get the chance, I'll probably be 30 by then, so waaay in the future some time.. :/

past meteor May 27, 2023, 5:00 PM

#

I took DB courses in school and also spent the summer doing data engineering a few years back

#

The DE jobs taught me so much about working in data environments, what the pitfalls were, and just practical SQL and more

narrow crane May 27, 2023, 5:01 PM

#

past meteor They're complementary

Okay so it would be preferable to do both but if I can only do one, pick job experience and internships?

queen cradle May 27, 2023, 5:01 PM

#

Let me put it this way. Given your situation, planning on a boot camp seems unwise. You would probably be better served by taking relevant courses. After all, you're already paying to go to school; you may as well learn some things while you're there!

past meteor May 27, 2023, 5:01 PM

#

The uni perspective was way broader and explained a lot of concepts that people take for granted in industry

narrow crane May 27, 2023, 5:01 PM

#

queen cradle Let me put it this way. Given your situation, planning on a boot camp seems unwi...

True that’s a good point.

queen cradle May 27, 2023, 5:02 PM

#

What you don't usually get in university courses are practical skills, like how to use SQL or various ML packages.

narrow crane May 27, 2023, 5:02 PM

#

I’ll have to consider a lot after this conversation

queen cradle May 27, 2023, 5:02 PM

#

You can get those in a boot camp, but you get them faster and better with a job.

past meteor May 27, 2023, 5:02 PM

#

I think bootcamps are a no-go tbh

#

They cover stuff you could just do by yourself

queen cradle May 27, 2023, 5:04 PM

#

I think they're reasonable in some situations. They give you something specific you can put on a resume. Like, maybe you start to do data stuff at some job, but it's not an official part of your job duties, and you decide you want to do it full-time. A boot camp might smooth that transition for some people.

#

That said, I think they are often oversold.

#

I think they have some value, but not as much as the boot camps' marketing would have you believe.

narrow crane May 27, 2023, 5:05 PM

#

Don’t mean to interrupt but I have one final question. Are the Kaggle and Colab courses good enough, or are they irrelevant or bad or whatever?

#

I really do appreciate you guys being patient enough to talk to me about these things btw

queen cradle May 27, 2023, 5:06 PM

#

I don't know; I've never looked at them myself.

narrow crane May 27, 2023, 5:06 PM

#

narrow crane Don’t mean to interrupt but I have one final question. Are the Kaggle and Colab...

But yeah currently I was on the Kaggle course for machine learning introduction, which I did right after I finished the Kaggle course for pandas

narrow crane May 27, 2023, 5:06 PM

#

queen cradle I don't know; I've never looked at them myself.

That’s fine

past meteor May 27, 2023, 5:06 PM

#

narrow crane Don’t mean to interrupt but I have one final question. Are the Kaggle and Colab...

I did Kaggle a bit because my degree was very theoretical

narrow crane May 27, 2023, 5:07 PM

#

past meteor I did Kaggle a bit because my degree was very theoretical

Do you think it was actually helpful? Is it sufficient for a person like me who’s not currently going for a degree in computer science or data science?

#

I think right now I may have to focus on just being able to use these things like how Kyle said

past meteor May 27, 2023, 5:07 PM

#

Yes, there's a series of competitions called "tabular playground", make your own notebook. Submit it and then look at how other people solved theirs

narrow crane May 27, 2023, 5:08 PM

#

Part of me is scared but at the same time the capability of learning and growing over time is interesting.

#

Imagine I’m 30 or something and just a cyber wizard.

queen cradle May 27, 2023, 5:11 PM

#

When you're 30, you'll probably not feel like a wizard no matter what you do. Either you won't be that good; or you really will be that good but you'll know how much you don't know!

past meteor May 27, 2023, 5:11 PM

#

sleek harbor I wish I could say mine were good.. they weren't 💀 I want to get a masters at M...

RIP. Higher ed in NA is so expensive

#

If you like an adventure you could just do it in Europe for a fraction of the price

sleek harbor May 27, 2023, 5:16 PM

#

past meteor If you like an adventure you could just do it in Europe for a fraction of the pr...

I kinda want the prestige that comes with the letters M.I.T :3 Also they're supposed to have high quality material and innovative stuff.. I'm interested in timeseries and stock analysis as well as robotics, and that's the place to go for that.. Will probably settle for a cheap online masters eventually tho

somber panther May 27, 2023, 5:17 PM

#

determinant came up in my study, ad - bc is kind of abstract for me. is this something i should spend much time studying if i want to market myself in DS fields?
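
For anyone finding `ad - bc` abstract, here is a small sketch assuming NumPy, with made-up numbers. It only shows that the 2x2 formula matches `np.linalg.det` and gives the usual geometric reading (area scaling); it says nothing about how much weight the topic carries in hiring.

```python
import numpy as np

# Arbitrary example values for the matrix [[a, b], [c, d]].
a, b, c, d = 2.0, 1.0, 0.0, 3.0
matrix = np.array([[a, b], [c, d]])

# For a 2x2 matrix the determinant is ad - bc.
by_formula = a * d - b * c
by_numpy = np.linalg.det(matrix)
print(by_formula, by_numpy)

# Geometric reading: the unit square is mapped to a parallelogram whose
# area is |det|. A determinant of 0 means the matrix squashes the plane
# onto a line, so it cannot be inverted.
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
singular_det = np.linalg.det(singular)
print(singular_det)
```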

narrow crane May 27, 2023, 5:19 PM

#

past meteor Yes, there's a series of competitions called "tabular playground", make your own...

thank you for the resource. i have no idea why this message didn't load on my phone

potent sky May 27, 2023, 5:38 PM

#

past meteor Then I don't know how your job market looks like. In our case, you're not gettin...

Its true that an overwhelming majority of jobs in ML Engineering/Research require you to have atleast a bachelor's.
However, it's definitely possible to get one with just a bachelor's, just very difficult. You have to be either very highly skilled in a specific area, or have solid, well-rounded, demonstrable expertise

potent sky May 27, 2023, 5:39 PM

#

narrow crane Yeah I think I may have made a few mistakes with my goals.

Don't you generally have the option to pick your major till end of sophomore year in US universities?

past meteor May 27, 2023, 5:41 PM

#

potent sky Its true that an overwhelming majority of jobs in ML Engineering/Research requir...

I mean, yeah there's many different paths to whatever role you want 🙂 I'm just talking about the path of least resistance?

serene scaffold May 27, 2023, 5:42 PM

#

narrow crane Well I don’t think he’s ever discouraged anyone from getting a degree but he tre...

A lot of what you learn during a degree isn't directly applicable, but what matters is that it's what employers want to see in someone who doesn't have industry experience.

#

If you have enough industry experience, that can eventually eliminate the need for a degree. But then, that experience is hard to get if you didn't have the degree.

narrow crane May 27, 2023, 5:43 PM

#

potent sky Don't you generally have the option to pick your major till end of sophomore yea...

No you pick your major first before you start studying.

past meteor May 27, 2023, 5:43 PM

#

Tbh here you either do bachelors + masters at a research univ or a bachelors in applied science. The latter has 0 math in their applied comp sci programs.

#

Correlation != causation but employers do think so, so if you don't have a masters you're looked at unfavourably (at entry level). This doesn't apply to the US so I'd say my most important piece of advice is actually not to give too much weight to stranger's advice on the internet (not me, Reddit, Youtube or otherwise) and to understand your local job market tbh

potent sky May 27, 2023, 5:46 PM

#

past meteor I mean, yeah there's many different paths to whatever role you want 🙂 I'm just ...

Yep yep I agree with your point, was just supplementing it

potent sky May 27, 2023, 5:47 PM

#

narrow crane No you pick your major first before you start studying.

Atleast you can change it till end of sophomore year? I'd advise you to maybe check this again with your uni administration

potent sky May 27, 2023, 5:48 PM

#

potent sky Its true that an overwhelming majority of jobs in ML Engineering/Research requir...

**atleast have a master's

night prawn May 27, 2023, 5:49 PM

#

I followed this tutorial https://ubuntu.com/tutorials/enabling-gpu-acceleration-on-ubuntu-on-wsl2-with-the-nvidia-cuda-platform#4-compile-a-sample-application to install WSL, but how do I use the GPU with VSC?

Ubuntu

Enabling GPU acceleration on Ubuntu on WSL2 with the NVIDIA CUDA Pl...

Ubuntu is an open source software operating system that runs from the desktop, to the cloud, to all your internet connected things.

serene scaffold May 27, 2023, 5:52 PM

#

night prawn I have followed this tuto https://ubuntu.com/tutorials/enabling-gpu-acceleration...

You don't use the GPU with VSC. VSC is just the editor.

#

Are you trying to use pytorch, or what?

night prawn May 27, 2023, 5:53 PM

#

tensorflow

serene scaffold May 27, 2023, 5:53 PM

#

So you need to look into how to install tensorflow with CUDA. That you're using VSC isn't relevant.

night prawn May 27, 2023, 5:56 PM

#

So I must install an anaconda framework ?

serene scaffold May 27, 2023, 5:57 PM

#

night prawn So I must install an anaconda framework ?

You don't need to install anaconda, no. You can if you want to. But anaconda is mostly for academic types who don't consider themselves programmers.

#

Also there is no "anaconda framework". It's just anaconda.

potent sky May 27, 2023, 5:58 PM

#

WSL2 runs a real Linux kernel in a lightweight VM on top of Hyper-V (a type-1 hypervisor). Just install tensorflow with CUDA and cuDNN for Linux in the subsystem and it should ideally work

#

I have never tried it myself tho

past meteor May 27, 2023, 6:00 PM

#

Yes, TF and Pytorch support WSL2 better than actual Windows 🤣

serene scaffold May 27, 2023, 6:01 PM

#

Is that surprising?

night prawn May 27, 2023, 6:01 PM

#

I have installed cuda like in the tutorial but i don't know how to install cudnn, and i tried to install tensorflow with pip but it doesn't work

serene scaffold May 27, 2023, 6:01 PM

#

night prawn I have installed cuda like in the tuto but i don't know how install cudnn and i ...

Be sure to never say that something "doesn't work". Show what you did, and the result.

potent sky May 27, 2023, 6:02 PM

#

How so? Or dym only for cuda?

potent sky May 27, 2023, 6:02 PM

#

past meteor Yes, TF and Pytorch support WSL2 better than actual Windows 🤣

night prawn May 27, 2023, 6:03 PM

#

i did this

past meteor May 27, 2023, 6:04 PM

#

potent sky How so? Or dym only for cuda?

"GPU support on native-Windows is only available for 2.10 or earlier versions, starting in TF 2.11, CUDA build is not supported for Windows. For using TensorFlow GPU on Windows, you will need to build/install TensorFlow in WSL2 or use tensorflow-cpu with TensorFlow-DirectML-Plugin"

potent sky May 27, 2023, 6:11 PM

#

past meteor "GPU support on native-Windows is only available for 2.10 or earlier versions, s...

Yeahh I read about this.
Thankfully, non cuda is well supported
And there's always pytorch xd

past meteor May 27, 2023, 6:12 PM

#

potent sky Yeahh I read about this. Thankfully, non cuda is well supported And there's alwa...

How do you run TF without CUDA? ROCm? CPU?

#

Afaik torch.compile doesn't work on Windows either

potent sky May 27, 2023, 6:12 PM

#

I meant cpu yeah

potent sky May 27, 2023, 6:13 PM

#

past meteor Afaik `torch.compile` doesn't work on Windows either

Hmm I haven't experimented much with 2.0 yet

past meteor May 27, 2023, 6:13 PM

#

But for the rest, you're good to go

potent sky May 27, 2023, 6:14 PM

#

night prawn i did this

Can you describe what you've done and what error you're facing exactly.
Unfortunately the only thing I can make out here is a pip install keras that returns requirement already satisfied

past meteor May 27, 2023, 6:14 PM

#

Getting anything to run on WSL2 is easier anyway I think

potent sky May 27, 2023, 6:17 PM

#

past meteor Getting anything to run on WSL2 is easier anyway I think

Probably, I'm not very familiar with the windows scene rn esp for CUDA

night prawn May 27, 2023, 6:17 PM

#

potent sky Can you describe what you've done and what error you're facing exactly. Unfortun...

i have these problems but keras and tensorflow are already installed

potent sky May 27, 2023, 6:17 PM

#

Also why not just dual-boot but let's not get off topic

past meteor May 27, 2023, 6:18 PM

#

night prawn i have these problems but keras and tensorflow are already installed

reload your window (CTRL + SHIFT + P and type reload)

potent sky May 27, 2023, 6:19 PM

#

night prawn i have these problems but keras and tensorflow are already installed

What happens if you run the code? As long as it runs it's only an autocomplete issue
And yes, tf and keras have been having autocomplete issues since 2.9.0 iirc; a fix that seems to work was only released recently, after 2.11.0 ig

night prawn May 27, 2023, 6:24 PM

#

past meteor reload your window (CTRL + SHIFT + P and type reload)

thank you, this problem seems resolved, but when i run this code it doesn't find the gpu

potent sky May 27, 2023, 6:32 PM

#

night prawn thank you, this problem seems resolved, but when i run this code it doesn't find ...

It says there're some missing libraries. Also gives a link to follow steps to ensure you have everything setup. Try that
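
A quick check along these lines can separate "TensorFlow is not installed in this interpreter" from "TensorFlow is installed but cannot see the GPU". This is only a sketch: the helper name `visible_gpus` is made up for illustration, and it assumes you run it with the same Python interpreter that VS Code has selected inside WSL2.

```python
import importlib.util


def visible_gpus():
    """Return the GPU names TensorFlow can see, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # An empty list means TensorFlow imported fine but found no usable GPU,
    # for example because the CUDA/cuDNN libraries could not be loaded.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this interpreter")
elif not gpus:
    print("TensorFlow is installed but sees no GPU; check the CUDA/cuDNN setup")
else:
    print("GPUs visible to TensorFlow:", gpus)
```

If the list is empty, the warnings TensorFlow prints during import usually name the libraries it failed to load, which is the part worth pasting into the chat.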

night prawn May 27, 2023, 8:50 PM

#

potent sky It says there're some missing libraries. Also gives a link to follow steps to en...

a link ?

timid grove May 27, 2023, 9:57 PM

#

hey folks,
anyone please help me out with this issue 🙏

I am making a data science text-to-code generating bot by fine-tuning this model, https://huggingface.co/HuggingFaceH4/starchat-alpha, on my own dataset of about 3000 text / clean-code conversations.
I have loaded the model checkpoint shards successfully on my Colab with the help of Hugging Face Accelerate and BitsAndBytesConfig nested quantization for memory efficiency.
This is how my data science text to code data is structured.
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 2384
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 666
    })
})

I am facing an error while training this model with this code:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
)
trainer.train()
and getting this error: NotImplementedError: Cannot copy out of meta tensor; no data!

Please someone help me out in resolving this error, it will be a great help to me.
this is the full error message

HuggingFaceH4/starchat-alpha · Hugging Face

hoary jay May 27, 2023, 10:20 PM

#

hey anyone here experience with bert word tokenizer? basically it generates word vectors for a word in a sentence so if bert doesnt have an embedding for a word suppose overweight, then it sub divides it as "over" and "##weight" and then produces word embeddings for these two sub words instead, now would it be okay to define the embedding of the complete word overweight as the average of these two?

agile cobalt May 27, 2023, 10:24 PM

#

hoary jay hey anyone here experience with bert word tokenizer? basically it generates word...

no, not at all?
it isn't an average by any means. It is divided into two tokens, but each of these tokens has its own individual meaning

serene scaffold May 27, 2023, 10:33 PM

#

hoary jay hey anyone here experience with bert word tokenizer? basically it generates word...

Those are called subtokens, or word pieces. They aren't there because BERT doesn't have an embedding for the whole word--the splitting is intentional.

#

"over" and "weight" both have discrete meanings that contribute to the overall meaning of "overweight".
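
To make the splitting concrete, here is a minimal sketch of the greedy longest-match-first procedure that WordPiece tokenizers use on a single word. The function name `wordpiece` and the tiny vocabulary are made up for this example; the real `bert-base-uncased` vocabulary has roughly 30k entries and is learned from data.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is treated as unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for illustration only.
toy_vocab = {"over", "weight", "##weight", "##s", "cat"}

print(wordpiece("overweight", toy_vocab))  # ['over', '##weight']
print(wordpiece("cats", toy_vocab))        # ['cat', '##s']
print(wordpiece("dog", toy_vocab))         # ['[UNK]']
```

Note that `weight` and `##weight` are separate vocabulary entries: the continuation form gets its own id, and therefore its own embedding.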

timid grove May 27, 2023, 10:49 PM

#

timid grove hey folks, anyone please help me out with this issue 🙏 I am making a data sci...

Anyone please tell the reason of this error, it would be very helpful 🙂

median leaf May 28, 2023, 3:48 AM

#

#

what does this mean

#

i keep trying to do knn but i get nothing

potent sky May 28, 2023, 4:13 AM

#

timid grove hey folks, anyone please help me out with this issue 🙏 I am making a data sci...

For large models hf accelerate loads your model and weights in shards on each device.
For this it first instantiates a meta tensor, which contains only shape and dtype information and doesn't consume any memory.
Then it moves different layers of this model to different devices
And then loads the corresponding weights from the state dict in each part of the model

Your error seems like you're trying to move the meta tensor to some device directly without any data, which can't be done since there's no data to move

I think there's an empty_like() method you can try using to do that and later load the model weights in

Or, more simply. You can just load the corresponding shard of weights into the meta tensor and then move it to the device of your choice

#

See if that helps. Meta tensors are still a relatively new feature last I checked and support is still maturing
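A small sketch of the meta-tensor behaviour described above, using plain PyTorch rather than accelerate. It assumes a recent PyTorch 2.x; the layer sizes and the hand-built `state` dict are placeholders standing in for a real checkpoint shard, and `to_empty` is likely the method the message above is thinking of.

```python
import torch

# Build a layer on the "meta" device: shapes and dtypes only, no storage.
layer = torch.nn.Linear(4, 2, device="meta")
print(layer.weight.is_meta)

# Moving it directly fails, because there is no data to copy.
try:
    layer.to("cpu")
    moved_directly = True
except (NotImplementedError, RuntimeError):
    moved_directly = False

# Allocate real (uninitialised) storage on the target device first...
layer = layer.to_empty(device="cpu")

# ...then load the actual weights into it (placeholder values here).
state = {"weight": torch.ones(2, 4), "bias": torch.zeros(2)}
layer.load_state_dict(state)

out = layer(torch.ones(1, 4))
print(moved_directly, out)
```

The order matters: materialise storage with `to_empty`, then fill it from the state dict. Calling `.to(device)` on the meta module skips the first step, which is what the error complains about.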

magic dune May 28, 2023, 4:33 AM

#

anyone have any good decision Tree tutorials?

agile cobalt May 28, 2023, 5:10 AM

#

the sklearn documentation should explain it fairly well
they also have a free course/mooc with publicly available materials you can take a look at

potent sky May 28, 2023, 5:31 AM

#

night prawn a link ?

It's there in the error message

magic dune May 28, 2023, 6:14 AM

#

agile cobalt the sklearn documentation should explain it fairly well they also have a free co...

Awesome

hoary jay May 28, 2023, 7:31 AM

#

serene scaffold Those are called subtokens, or word pieces. They aren't because BERT doesn't hav...

i have a lot of these words in my data I was wondering what could i do represent them as a single word vector? Is this possible to do without training your own vocabulary or model ? I need to do this with a pre trained model only like with Gensim's word2vec or bert or any other...

hoary jay May 28, 2023, 7:34 AM

#

agile cobalt no, not at all? it isn't an average by any means. It is divided into two tokens,...

so i have a vector for over and for weight but is there no way to estimate the vector for overweight? I'm working on a data from reddit and it has a lot of slangs like this

agile cobalt May 28, 2023, 7:35 AM

#

iirc the way bert encodes it is not even a vector at all, it is just one number per word segment

#

why do you want to "represent them as a single word vector"?

#

and how would you even define what is and isn't one word?

hoary jay May 28, 2023, 7:51 AM

#

agile cobalt iirc the way bert encodes it is not even a vector at all, it is just one number ...

no? i believe bert produces a 768 dimensional word vector for every word if you give it a sentence, based on the contextual meaning of the sentence

hoary jay May 28, 2023, 7:55 AM

#

agile cobalt and how would you even define what is and isn't _one_ word?

i meant in cases where it subtokenizes one word into many is bad for me like consider a statement

"All men are overweight these days" then if i pass it to Bert then it will generate 768 dimensional vectors for every word from "all" to "days".... except for overweight, in this case I'll get two word vectors one for "over" and one for "##weight" and that's the issue i need to resolve

#

see by finding the word vector of "overweight" i can account for biases in my data, whether overweight is more associated with words like He, Him or male or She, Her or female
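One common heuristic for this is to average the vectors of the pieces that belong to the same word. It is an approximation people use in practice, not something BERT itself defines, so treat the sketch below as one option under that assumption. `pool_subword_vectors` is a made-up helper, and the 2-dimensional vectors are placeholders for the 768-dimensional ones BERT would return.

```python
def pool_subword_vectors(tokens, vectors):
    """Merge '##' continuation pieces into whole words by averaging their vectors."""
    words, groups = [], []
    for token, vec in zip(tokens, vectors):
        if token.startswith("##") and groups:
            words[-1] += token[2:]   # glue the piece back onto the word
            groups[-1].append(vec)   # and collect its vector with the others
        else:
            words.append(token)
            groups.append([vec])
    pooled = []
    for group in groups:
        dim = len(group[0])
        pooled.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return words, pooled


# Placeholder tokens and vectors, standing in for real BERT output.
tokens = ["men", "are", "over", "##weight"]
vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 4.0], [4.0, 0.0]]

words, pooled = pool_subword_vectors(tokens, vectors)
print(words)
print(pooled)
```

Other pooling choices, such as taking only the first piece's vector, fit the same structure. Which one works best for a bias analysis is an empirical question.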

agile cobalt May 28, 2023, 8:00 AM

#

what I was thinking of are the tokens like on https://platform.openai.com/tokenizer, does it later use these to make the vector you are talking about?

#

(I know that Bert and GPT are not the exact same thing, but as far as I know, the way they tokenize is more or less the same)

past meteor May 28, 2023, 8:13 AM

#

Computerphile has a few great videos on byte-pair encoding and it's worth looking at if you want to understand tokenization

glacial kiln May 28, 2023, 11:20 AM

#

is this the right place to talk about scipy?

grand minnow May 28, 2023, 11:20 AM

#

glacial kiln is this the right place to talk about scipy?

yes

glacial kiln May 28, 2023, 11:21 AM

#

can I do integration in scipy?

grand minnow May 28, 2023, 11:21 AM

#

glacial kiln can I do integration in scipy?

Don't know. Can you?

glacial kiln May 28, 2023, 11:21 AM

#

i have to see

#

one thing can I read about how to use scipy from the help section of python IDE?

grand minnow May 28, 2023, 11:23 AM

#

You could, or you can go to its official docs

#

both work

#

Also depends on your IDE

#

whatever that is

glacial kiln May 28, 2023, 11:24 AM

#

grand minnow whatever that is

python's own IDLE

grand minnow May 28, 2023, 11:25 AM

#

And I assume help section of it is just help(scipy)?
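For the integration question: SciPy does provide numerical integration in its `scipy.integrate` module. A minimal sketch, assuming SciPy is installed; the integrand and limits are just an example with a known answer.

```python
import math
from scipy import integrate

# Definite integral of sin(x) from 0 to pi; the exact answer is 2.
value, abs_error = integrate.quad(math.sin, 0.0, math.pi)
print(value, abs_error)

# In IDLE, the built-in docs for a single function are usually more useful
# than help(scipy) on the whole package:
# help(integrate.quad)
```

`quad` returns the estimate together with an estimate of the absolute error, which is why the result is unpacked into two names.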

tepid parcel May 28, 2023, 11:28 AM

#

grand minnow Here's a roadmap that you can use as a reference: https://whimsical.com/machine-...

Bro, to be more specific, I want to dive into neuroscience and social psychology studies using AI (like Elon Musk's neural chip projects, for example). Can you or anyone here give me some orientation and more specific advice for it?

glacial kiln May 28, 2023, 11:29 AM

#

grand minnow And I assume help section of it is just `help(scipy)`?

i have to check

past meteor May 28, 2023, 11:31 AM

#

tepid parcel Bro, being more specific, I want to dive into neuro science and social psycholog...

University 🙂

tepid parcel May 28, 2023, 11:31 AM

#

past meteor University 🙂

What course or courses to do?

past meteor May 28, 2023, 11:32 AM

#

The masters program I did (AI) had multiple courses in computational neuroscience. I didn't do them but I assume that's the way to go.

tepid parcel May 28, 2023, 11:34 AM

#

past meteor The masters program I did (AI) had multiple courses in computational neuroscienc...

Actually I am already at a university, but I study data science, so I need to know the areas and subjects that cover this path to guide me through my career. Otherwise I am wasting my time studying useless or less important things, you know?

glacial kiln May 28, 2023, 11:41 AM

#

grand minnow And I assume help section of it is just `help(scipy)`?

not helping

dense crane May 28, 2023, 11:45 AM

#

can someone confirm if i want to use azure for training the models i have to create the virtual machine there?

past meteor May 28, 2023, 11:46 AM

#

dense crane can someone confirm if i want to use azure for training the models i have to cre...

No, you can also use "serverless" compute through azure machine learning or databricks. You don't need to create a VM

dense crane May 28, 2023, 11:51 AM

#

past meteor No, you can also use "serverless" compute through azure machine learning or data...

ok thx, finally someone answers! so i will figure that out a bit later and if i later would have a question can i ask you again?

grand minnow May 28, 2023, 12:21 PM

#

glacial kiln not helping

what kind of help are you looking for? Can't you refer to the official docs instead?

pseudo spire May 28, 2023, 1:12 PM

#

tepid parcel Actually I already am in a University, but I do Data science, so I need to know ...

Just in case, a thing you must know: there is nothing intelligent in AI. It's pure math: for prediction and classification. There are two types of tasks AI can solve:

classification
math regression (prediction)
All amazing AI things and wonders of past years are built because those tasks were applied to particular domain areas.

tepid parcel May 28, 2023, 1:21 PM

#

pseudo spire Just in case, a thing you must know: there is nothing intelligent in AI. It's pu...

What about complex systems and applying data science to them? Nowadays data science only uses material up to Calculus 2, and only sporadically, but there are still Calculus 3 and 4 available for further research in the area.

pseudo spire May 28, 2023, 1:25 PM

#

are those university subjects or what?

#

actually gradient descent is Calculus 3, I believe
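To make the link to multivariable calculus concrete, here is a minimal gradient descent sketch on a two-variable function. The function, starting point, learning rate and step count are all arbitrary choices for illustration.

```python
def grad(x, y):
    # Partial derivatives of f(x, y) = (x - 3)**2 + (y + 1)**2
    return 2 * (x - 3), 2 * (y + 1)


x, y = 0.0, 0.0   # arbitrary starting point
lr = 0.1          # learning rate (step size)

for _ in range(200):
    gx, gy = grad(x, y)
    # Step against the gradient, one partial derivative per coordinate.
    x, y = x - lr * gx, y - lr * gy

print(x, y)
```

The minimum of this function is at (3, -1). The update needs one partial derivative per parameter, which is the part usually taught in a multivariable calculus course.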

scarlet ingot May 28, 2023, 1:29 PM

#

how can i do some basic machine learning to make a snake ai

#

like where would i start

pseudo spire May 28, 2023, 1:29 PM

#

so I wouldn't state so categorically that it is only Calculus 2

#

Anyways this is just subject names. When Newton invented Calculus, he didn't have 1 2 3 and 4

tepid parcel May 28, 2023, 1:31 PM

#

pseudo spire are those university subjects or what?

Of course bro, They are essential for everything, complex systems studies not only on neuroscience, psychology or social areas, but anything that has a system representation, like biology, telecommunication, including ANN itself (artificial neural networks). This area will discover and solve AI extreme low explainabilities for example and beyond things

scarlet ingot May 28, 2023, 1:31 PM

#

i think its to wean you onto the concepts of it

#

im gonna go ask chatgpt my question

tepid parcel May 28, 2023, 1:34 PM

#

pseudo spire actually gradient descent is Calculus 3, I believe

Yes, but Why Chat GPT is the best for what it does, and don't bypass Calculus 2 I think?
I think That they even won't willing to use gradient descent

pseudo spire May 28, 2023, 1:35 PM

#

ChatGPT is also prediction. Prediction of which of the many answers is the most appropriate.

tepid parcel May 28, 2023, 1:36 PM

#

It's not only matter of difficulty, having more deep calculus is good and is the trend for the development, but I talk about complexity too, Complex systems

#

It's slightly different

pseudo spire May 28, 2023, 1:37 PM

#

The more subjects you know well the better. You know, they are not isolated from each other. Knowing what is positive feedback and negative feedback might help you in your AI work.

tepid parcel May 28, 2023, 1:37 PM

#

How does Calculus 1, or simple things in general, connect with the more complex ones?

tepid parcel May 28, 2023, 1:39 PM

#

pseudo spire The more subjects you know well the better. You know, they are not isolated from...

Thanks, I know that no one here is a specialist on my area of interest, but I expected for someone that already is in the career and knows things related

pseudo spire May 28, 2023, 1:40 PM

#

There is no Math 1 2 3 4, so there is no Calculus 1 2 3 4. This is just conventional dividing into smaller parts. In math the more complex things are always built upon more simple things. As in many other subjects.

#

So the AI of Elon Musk's neurointerface is also either prediction or classification. E.g. classification of what particular signal could mean.

tepid parcel May 28, 2023, 1:44 PM

#

pseudo spire The more subjects you know well the better. You know, they are not isolated from...

And you're right, complex systems is so new and "experimental" let's saying, that it lacks so much of information, so that I discovered it accidentally. Only the best universities like Stanford, MIT and related has the most of the research results and projects at the moment.

tepid parcel May 28, 2023, 1:47 PM

#

pseudo spire There is no Math 1 2 3 4, so there is no Calculus 1 2 3 4. This is just conventi...

Yes I treated it like that just as title of understanding, representation and simplification. Cause are real examples that I know

night prawn May 28, 2023, 1:49 PM

#

potent sky It's there in the error message

do you have a simple tutorial that works for using tensorflow gpu?

serene scaffold May 28, 2023, 1:59 PM

#

hoary jay i have a lot of these words in my data I was wondering what could i do represent...

you might see if you can turn off wordpiece tokenization in the tokenizer. but BERT isn't for getting vectors of each token individually.

hoary jay May 28, 2023, 2:03 PM

#

serene scaffold you might see if you can turn off wordpiece tokenization in the tokenizer. but B...

dont think it is possible, i dont see any argument for this, and even if i turned it off Bert would then probably give bad results or something. I think it's better to work with the complete sentence embeddings rather than just the word embeddings

serene scaffold May 28, 2023, 2:04 PM

#

hoary jay dont think it is possible i dont see any argument for this and even if i turned ...

I think it's better to work with the complete sentence embeddings rather than just the word embeddings
that's what I'm saying

grand minnow May 28, 2023, 2:04 PM

#

night prawn have you a tuto who is simple and works for using tensorflow gpu ?

Try the official doc. It has everything including using GPU

hoary jay May 28, 2023, 2:05 PM

#

serene scaffold > I think it's better to work with the complete sentence embeddings rather than ...

yep now im thinking to form the problem around this instead ty for the help

tepid parcel May 28, 2023, 2:06 PM

#

pseudo spire So the AI of Elon Musk's neurointerface is also either prediction or classificat...

Maybe I wasn't so clear too and misunderstanding, Complex systems are no related to hardware and software although it has applications based on that, an example of Complex system is climate, it doesn't nothing to do with hardware, software or IT, but as the mainly purpose of IT itself, I can use it as a technology or tool for understand climate systems (And I believe that it's the next researches studies steps after discovering human brain functions, to go understand it systems).

I said it so meaning not as a merely of software or hardware only, I talk about more fundamental things, next to the core, more structural and "symbolic", how Elon's Neural chips works as a system (a complex system type) on the sense of interactions and communication, like how a neural network works. Unfortunately the actual neuroscience focus and fame isn't on what I said and should be, it's on health solutions only (That's isn't a bad thing but maybe exclude other important scientific areas, like sociology that's so but so underrated and poorly explored till nowadays I think)

pseudo spire May 28, 2023, 2:12 PM

#

Your initial question was about AI, right?

tepid parcel May 28, 2023, 2:13 PM

#

pseudo spire Your initial question was about AI, right?

Yup

pseudo spire May 28, 2023, 2:15 PM

#

So you've got answers about AI, don't you

tepid parcel May 28, 2023, 2:17 PM

#

pseudo spire So you've got answers about AI, don't you

No, I don't know what I have to study. I only got fairly superficial advice to study computational neuroscience at a university

potent sky May 28, 2023, 2:17 PM

#

night prawn have you a tuto who is simple and works for using tensorflow gpu ?

https://www.tensorflow.org/install/pip#windows-wsl2_1 maybe this'll help
read carefully, it has links to guides on setting up nvidia for gpu on wsl2, etc.

TensorFlow

Install TensorFlow with pip

#

specific to wsl2

plain drift May 28, 2023, 2:18 PM

#

have been wondering how valuable it might be to move my python install to wsl

potent sky May 28, 2023, 2:19 PM

#

tepid parcel No, I don't know What I have to study, only a more superficial advice thought fo...

you'd get a list of courses available right? It's difficult to suggest anything without knowing what is actually available to you
other than that, generic suggestions you know already, neuroscience, computational neuroscience etc.

tepid parcel May 28, 2023, 2:19 PM

#

tepid parcel No, I don't know What I have to study, only a more superficial advice thought fo...

I exactly mean on how use data science and AI for it...

potent sky May 28, 2023, 2:19 PM

#

plain drift have been wondering how valuable it might be to move my python install to wsl

maybe ot but why not just dual boot

plain drift May 28, 2023, 2:20 PM

#

would throw out a lot of OS familiarity

potent sky May 28, 2023, 2:20 PM

#

plain drift have been wondering how valuable it might be to move my python install to wsl

on windows you might have problems with CUDA, esp with tensorflow

potent sky May 28, 2023, 2:21 PM

#

plain drift would throw out a lot of OS familiarity

fair

wooden sail May 28, 2023, 2:24 PM

#

tf gpu and jax are only available on windows through wsl, and also docker requires it

#

it's worthwhile to look into

#

short of using a proper vm, it's the easiest way of getting started with linux too

potent sky May 28, 2023, 2:25 PM

#

jax as well? damn I've really been blissfully unaware using linux

tepid parcel May 28, 2023, 2:25 PM

#

potent sky you'd get a list of courses available right? It's difficult to suggest anything ...

Thanks, I know, but the problem is that, as I said, it is a new area. It's more in an "experimental phase", like data science in the past, so there is no course for it. So I decided to ask informed people here..

tepid parcel May 28, 2023, 2:26 PM

#

tepid parcel Thanks, I know, but the problem is that as I said is a new area, It's more on "e...

Look for it, just 4 pages about it! hahaha

#

I can find courses, but again, so superficial I think

#

Not a Degree for it

potent sky May 28, 2023, 2:27 PM

#

wait so are you looking to choose what courses to take at your uni or
what courses to take online or what to refer to?

tepid parcel May 28, 2023, 2:28 PM

#

potent sky wait so are you looking to choose what courses to take at your uni or what cours...

None, I want to be a professional on this area whatever I have to do, I only said that it isn't so structured to be on a university yet

cold osprey May 28, 2023, 2:29 PM

#

confusing

tepid parcel May 28, 2023, 2:29 PM

#

cold osprey :confusing:

hahahahaha

tepid parcel May 28, 2023, 2:31 PM

#

potent sky wait so are you looking to choose what courses to take at your uni or what cours...

It's like more a additional course, there isn't anything directed and complete for, It's more like "coaching" let's say

pseudo spire May 28, 2023, 2:33 PM

#

tepid parcel No, I don't know What I have to study, only a more superficial advice thought fo...

You have to study:

Math statistics (probability and statistics).
Linear algebra (matrixes).
Programming in Python.
How to create neural network via pure math.
Other types of machine learning which also exist (not only neural networks).
Common Python libraries in data science: for data manipulation, visualization and ML.
Jupyter / Anaconda

past meteor May 28, 2023, 2:35 PM

#

How to create a neural network via pure math :pithink:

pseudo spire May 28, 2023, 2:35 PM

#

tepid parcel None, I want to be a professional on this area whatever I have to do, I only sai...

One does not simply become professional. You need to spend 10'000 hours doing it.

past meteor May 28, 2023, 2:35 PM

#

Super super questionable advice you're getting here but OK

wooden sail May 28, 2023, 2:41 PM

#

after exactly 10k hours, your skin immediately turns gold and music chimes from within you, heralding the unlocking of an achievement

lapis sequoia May 28, 2023, 2:45 PM

#

wooden sail after exactly 10k hours, your skin immediately turns gold and music chimes from ...

if god == ("true"):
  print("Fear me mortal")
else:
  print("nah I'm capping")

tepid parcel May 28, 2023, 2:45 PM

#

pseudo spire You have to study: - Math statistics (probability and statistics). - Linear al...

I know that it isn't your fault or responsibility to instruct me on this case, on how to discover the covered subjects for this scientific area, but actually I don't have any methods for it...

How would you guys study for a professional career that doesn't have any course yet, that is in an "experimental phase" and isn't well structured educationally, but is full of concepts (they emerged in the 1950s, alongside the concept of AI)? Do you know whether it's viable to study and work in it? It's a little-explored scientific area with no learning materials, but you know some core areas are involved, even without knowing exactly which. Can you teach yourself?

tepid parcel May 28, 2023, 2:46 PM

#

pseudo spire One does not simply become professional. You need to spend 10'000 hours doing it...

Yes

tepid parcel May 28, 2023, 2:47 PM

#

pseudo spire You have to study: - Math statistics (probability and statistics). - Linear al...

I already know these things

wooden sail May 28, 2023, 2:47 PM

#

tepid parcel I know that it isn't your fault or responsibility to instruct me on this case, o...

i missed the beginning of the conversation, are we talking about AI/ML or something else?

tepid parcel May 28, 2023, 2:47 PM

#

wooden sail i missed he beginning of the conversation, are we talking about AI/ML or somethi...

Yep, How to apply AI/ML for Complex systems science

cold osprey May 28, 2023, 2:48 PM

#

do a phd

#

get into research

wooden sail May 28, 2023, 2:48 PM

#

what's complex systems science? AI/ML degrees are widely available at all education levels

hoary jay May 28, 2023, 2:48 PM

#

hey guys had a question, so if a word reappears again and again in a sentence then how do i modify its word embedding? like say it appears in two completely different sentences but the word has pretty much the same meaning (I'm only talking about pronouns in my case) so for a big data set how can i calculate the embeddings of these words in Bert

tepid parcel May 28, 2023, 2:48 PM

#

cold osprey do a phd

Yes, not kidding, I really am Willing for it

tepid parcel May 28, 2023, 2:54 PM

#

wooden sail what's complex systems science? AI/ML degrees are widely available at all educat...

To sum up all our talk and simplify on a lay way, complex systems science, is the study of systems, climate and even the entire universe is one.
We can use Data Science or AI, to understand these systems, which can solve the extreme low explicability of ANNs and everything on IT and on the world furthermore...

wooden sail May 28, 2023, 2:55 PM

#

sounds very wishy-washy

#

but in the direction you're pointing lies physics-informed AI, so maybe read up on papers related to the topic

tepid parcel May 28, 2023, 2:56 PM

#

wooden sail sounds very wishy-washy

What point you didn't get?

pseudo spire May 28, 2023, 3:00 PM

#

tepid parcel I know that it isn't your fault or responsibility to instruct me on this case, o...

No course? You do your own research then. Or explore other people's research (you don't have to have a PhD for that, but you would want to know how to learn without complete trainings or step-by-step tutorials).

potent sky May 28, 2023, 3:00 PM

#

pseudo spire You have to study: - Math statistics (probability and statistics). - Linear al...

information theory

#

there's intersection but I think it deserves to be mentioned separately

pseudo spire May 28, 2023, 3:01 PM

#

"Complex systems". Is it something from the 80s? You said it's something new. I don't think it's new. Maybe it develops more rapidly last years (as almost any subject), but it is definitely something from 1980s. Or even 70s.

queen cradle May 28, 2023, 3:02 PM

#

I have encountered people who bill themselves as experts on "complex systems" and I have not been impressed.

#

My advice is that the fundamentals are the same regardless of the subject area: Math, statistics, computer science, physics, and so on are useful everywhere.

#

If you understand those well, then you can apply them to whatever problem interests you.

tepid parcel May 28, 2023, 3:02 PM

#

wooden sail but in the direction you're pointing lies physics-informed AI, so maybe read up ...

So so, This area is so general, but majority is based on physics approaches. I am willing to go to social science systems and they are the less physics related actually

pseudo spire May 28, 2023, 3:03 PM

#

Like we have complex systems even in generic electronics (not microelectronics).

#

Social science systems... What the heck is this?

tepid parcel May 28, 2023, 3:03 PM

#

pseudo spire "Complex systems". Is it something from the 80s? You said it's something new. I ...

No, it's from 50's, but as study is new

pseudo spire May 28, 2023, 3:04 PM

#

You are speaking some bird language I don't understand.

queen cradle May 28, 2023, 3:05 PM

#

I don't think he's a very fluent English speaker. But I also believe that's not the biggest problem here.

tepid parcel May 28, 2023, 3:06 PM

#

queen cradle I don't think he's a very fluent English speaker. But I also believe that's not ...

Yep hahahaha

cold osprey May 28, 2023, 3:07 PM

#

queen cradle I don't think he's a very fluent English speaker. But I also believe that's not ...

same

pseudo spire May 28, 2023, 3:09 PM

#

I'm not a native speaker either. And subject names could differ from country to country. Anyway I have no clue what Social Science Systems is.

past meteor May 28, 2023, 3:09 PM

#

Sad that this isn't maintained anymore: https://github.com/dswah/pyGAM

#

If only I had 5x more time than I do

pseudo spire May 28, 2023, 3:11 PM

#

You have a pen (AI). You have an apple (domain area). Boom! PPAP
Although this journey is not always that linear.

#

And that's why science exist, and scientific researches exist.

#

You could investigate history of text-to-speech technologies to see what was the evolution in this particular domain area

tepid parcel May 28, 2023, 3:32 PM

#

It's just systems present on around the world with an social science approach: friendship, family are ones

#

Remember your School biology class about biological organization levels, from atom to biosphere, between them we have systems

hasty mountain May 28, 2023, 3:37 PM

#

Guys, is there a method to calculate how complex an operation is and how to use a method which may provide similar performance?

I'm currently making some experiments on GANs and I've noticed that, from previous experiments I've made, the best one (contrary to what is recommended) was the one where the Generator used no Transposed Convolutions for upsampling, but simply applied bi-linear upsampling operations followed by a residual block with 3 convolution layers + a single convolution layer to get an output with 3 channels.
I've done this to generate, in a single generator, images 4x4, 8x8, 16x16 and 32x32. However, for the 32x32 images, I've also applied, after the convolution, another residual block with the idea that the model could, then, "apply small enhancements" to the image.

The results: 4x4, 8x8 and 16x16 images are fine, but 32x32 have a tendency of showing mode collapse or even collapsing the discriminator (there's one discriminator for each dimension, so the collapse of the 32x32 discriminator doesn't affect the others). I was thinking that, of course, since 32x32x3 images are more complex than 16x16x3, this could be expected. But now I'm thinking: is this instability really caused just by the complexity of data, or is it because I added too many layers? Or maybe too few for such complex data? Is it possible to estimate whether I added too many parameters or too few for a given data sample?

#

(PS: Transposed Convolutions in Pytorch are simply the gradient of a normal Convolution with respect to the convolution input. So I suppose Transposed Convolutions and Convolutions in Pytorch should be the same operations, I guess the only difference is that TransConv might add way more padding)

#

Wait...then I suppose that...if I apply a Bilinear Upsampling...the model will have a better idea on which transformations it should apply to the data than in the case of a Transposed Convolution, since this one is equivalent to padding the input...so the model will just receive 0s and have no idea on what it should do...relying solely on the bias... :pithink:
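The "equivalent to padding the input" intuition can be checked in one dimension: a strided transposed convolution gives the same result as inserting zeros between the input samples and then running an ordinary full convolution. The sketch below is pure Python with made-up helper names and toy numbers. It illustrates only that equivalence and says nothing about which upsampling choice is more stable for a GAN.

```python
def conv_transpose_1d(x, kernel, stride):
    """Transposed convolution: each input sample scatters a scaled kernel copy."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


def zero_insert_then_conv(x, kernel, stride):
    """Insert (stride - 1) zeros between samples, then do a full convolution."""
    z = [0.0] * ((len(x) - 1) * stride + 1)
    z[::stride] = x                      # real samples, zeros in between
    pad = len(kernel) - 1
    padded = [0.0] * pad + z + [0.0] * pad
    flipped = kernel[::-1]               # convolution = correlation with flipped kernel
    return [
        sum(padded[m + j] * flipped[j] for j in range(len(kernel)))
        for m in range(len(z) + pad)
    ]


x = [1.0, 2.0, 3.0]
kernel = [1.0, 0.5]

a = conv_transpose_1d(x, kernel, stride=2)
b = zero_insert_then_conv(x, kernel, stride=2)
print(a)
print(b)
```

In the zero-insertion view, every other value the kernel sees is an inserted zero, whereas bilinear upsampling would place interpolated values there before the following convolution.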

tepid parcel May 28, 2023, 4:19 PM

#

pseudo spire Like we have complex systems even in generic electronics (not microelectronics).

Yes, that's true, we have complex systems on electronics too...

https://en.wikipedia.org/wiki/Complex_system

Complex system

A complex system is a system composed of many components which may interact with each other. Examples of complex systems are Earth's global climate, organisms, the human brain, infrastructure such as power grid, transportation or communication systems, complex software and electronic systems, social and economic organizations (like cities), an e...

tepid parcel May 28, 2023, 4:19 PM

#

tepid parcel Yes, that's true, we have complex systems on electronics too... https://en.wiki...

Please, consider reading this

tepid parcel May 28, 2023, 4:21 PM

#

tepid parcel Yes, that's true, we have complex systems on electronics too... https://en.wiki...

@pseudo spire , @past meteor , @queen cradle , @wooden sail and @cold osprey .
If any of you help me on this, I'll be thankful

cold osprey May 28, 2023, 4:23 PM

#

tbh i dont even follow the initial question

pseudo spire May 28, 2023, 4:23 PM

#

pseudo spire You have to study: - Math statistics (probability and statistics). - Linear al...

This. I think I can't help more.

serene scaffold May 28, 2023, 4:26 PM

#

@tepid parcel please do not ping people to ask for help. If they're online and have time to answer questions, they'll look at this channel. Otherwise, no one is on-call to answer questions.

tepid parcel May 28, 2023, 4:27 PM

#

serene scaffold @tepid parcel please do not ping people to ask for help. If they're onli...

ok.

tepid parcel May 28, 2023, 4:28 PM

#

pseudo spire This. I think I can't help more.

Oh, ok, thanks anyway

tepid parcel May 28, 2023, 4:31 PM

#

pseudo spire No course? You do your own research then. Or explore other researches (you don't...

I found this useful

tepid parcel May 28, 2023, 4:38 PM

#

wooden sail but in the direction you're pointing lies physics-informed AI, so maybe read up ...

Nobody here has any experience with AI for the social area studies and so?
I'd like to know about the experience too, know more about it before get into...

raw compass May 28, 2023, 4:38 PM

#

tepid parcel Nobody here has any experience with AI for the social area studies and so? I'd l...

this is something really wide why dont you search for it?

tepid parcel May 28, 2023, 4:42 PM

#

raw compass this is something really wide why dont you search for it?

It's not that simple, I talked about even more specific things, created all this talk and nobody could understand and help.
What I can do is try to generalize a little bit more to find something out.

raw compass May 28, 2023, 4:42 PM

#

tepid parcel It's not that simple, I talked about even more specific things, created all thi...

I dont understand you

tepid parcel May 28, 2023, 4:43 PM

#

raw compass I dont understand you

Did you follow the conversation bro?

raw compass May 28, 2023, 4:43 PM

#

tepid parcel Did you follow the conversation bro?

still dont get you

tepid parcel May 28, 2023, 4:44 PM

#

raw compass still dont get you

What is the problem? Is my English that weird?

raw compass May 28, 2023, 4:44 PM

#

tepid parcel What is the problem? My English is so weird at that point?

think so

#

I mean it looks like that you already got an answer.

errant bison May 28, 2023, 5:10 PM

#

potent sky One approach would be to use yolo for detection and then extract that and use an...

first i am trying to do with stationary. Then with moving cars, like extracting images from video and then training ml model.
Can u plss tell other approaches?

potent sky May 28, 2023, 5:30 PM

#

errant bison first i am trying to do with stationary. Then with moving cars, like extracting ...

For videos you could add a detection module with model chaining so that one model will detect whether there is a vehicle in the frame and only then the rest of the models will run to extract the car and license num. This will save a lot of processing.
You could also add an object tracking module to ensure each vehicle is processed only once
For static images YOLO + OCR is quite a solid and standard approach
An E2E model would not work as well for this even if you could somehow get a suitable dataset.
Same for a ViLT model fine-tuned for VQA, it wouldn't work as well imo
You can use some traditional image processing techniques to preprocess the images so that the license plate numbers become more prominent and then run OCR on that
You could also try segmentation but that'd be overkill since detection is the right task here
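The chained video pipeline described above can be sketched as plain control flow. All three model callables here are hypothetical stand-ins (a cheap vehicle gate, a plate detector that also returns a track id, and an OCR step). In a real system they would wrap something like a YOLO detector, an object tracker and an OCR engine.

```python
def process_video(frames, has_vehicle, detect_plates, read_plate):
    """Chain cheap-to-expensive stages and read each tracked vehicle once.

    Hypothetical callables:
      has_vehicle(frame)   -> bool
      detect_plates(frame) -> list of (track_id, plate_crop)
      read_plate(crop)     -> str
    """
    seen = {}
    for frame in frames:
        if not has_vehicle(frame):       # cheap gate: skip empty frames
            continue
        for track_id, crop in detect_plates(frame):
            if track_id in seen:         # tracker says this car was already read
                continue
            seen[track_id] = read_plate(crop)  # expensive OCR, once per vehicle
    return seen


# Stub "models" operating on fake frames, only to exercise the control flow.
frames = [
    {"vehicles": []},
    {"vehicles": [(1, "ABC123")]},
    {"vehicles": [(1, "ABC123"), (2, "XYZ789")]},
]
ocr_calls = []


def fake_ocr(crop):
    ocr_calls.append(crop)
    return crop


plates = process_video(
    frames,
    has_vehicle=lambda f: bool(f["vehicles"]),
    detect_plates=lambda f: f["vehicles"],
    read_plate=fake_ocr,
)
print(plates, len(ocr_calls))
```

The saving comes from the two `continue` branches: empty frames never reach the detector, and a vehicle that is already tracked never reaches OCR again.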

errant bison May 28, 2023, 5:32 PM

#

potent sky For videos you could add a detection module with model chaining so that one mode...

thanks for the explanation, can u pls provide any documentation

potent sky May 28, 2023, 5:39 PM

#

One simpler way could be to detect contours and extract the probable license plate regions from there, tho I don't think this would perform very well

#

Since it has to be in cars and cars will probably be upright you could also design specific kernels or structuring elements to detect these license plates

potent sky May 28, 2023, 5:43 PM

#

errant bison thanks for the explanation, can u pls provide any documentation

I mean, yolo, tensorflow, pytorch docs are available online but here you go:
https://docs.ultralytics.com/
If you're asking for a tutorial then maybe this'll help
https://medium.com/saarthi-ai/how-to-build-your-own-ocr-a5bb91b622ba

Home

Explore Ultralytics YOLOv8, a cutting-edge real-time object detection and image segmentation model for various applications and hardware platforms.

Medium

How to build your own OCR?

In this article, you will learn how to make your own custom OCR program with the help of deep learning, to read text from an image. I will…

potent sky May 28, 2023, 5:44 PM

#

potent sky One simpler way could be to detect contours and extract the probable license pla...

You could try RPNs, greedy selection algos etc. too but Yolo is better in both speed and performance

errant bison May 28, 2023, 5:48 PM

#

YOLO means opencv?

#

or am i mixing things

plain drift May 28, 2023, 6:09 PM

#

i explored pre-trained OCR tools for the first time in a while recently. before that, i'd been so disappointed with tesseract that i figured paying for google's vision module was my only hope for any ocr-reliant project. but this time i found paddleocr pretty impressive -- though even then I had to preprocess the input images a little to get useful outputs

#

there's a HF space that lets you try it out somewhere. will look to find it unless you're looking to train your own tool?

hasty mountain May 28, 2023, 6:11 PM

#

errant bison or am i mixing things

YOLO is a model for computer vision. OpenCV is a library/module/package/idk to work with computer vision in Python (and I guess in C too)

pseudo spire May 28, 2023, 6:12 PM

#

tepid parcel What is the problem? My English is so weird at that point?

The problem is... Judging from your questions, it is very likely that you are trying to find a ready-to-use recipe in a particular domain area (or areas). This judgment can be wrong, but as you are unable to come up with more practice-oriented questions, it very much seems so.

Also, it might be that in that particular domain area (or areas) a ready-to-use recipe either doesn't exist, or it's not made public. So... yeah.

Also, if there was a publicly available ready-to-use recipe, then you (as an AI professional, let's imagine that) would not be needed at all. Any person who knows how to operate a computer to some extent would be able to follow a ready-to-use recipe and achieve the needed result.

So... step-by-step guides are not an option when you want to create something new / breakthrough. However, step-by-step guides can be used at some stages of learning... It's important, though, that the learning path includes something else besides following step-by-step guides, otherwise the learning would not be complete.

pseudo spire May 28, 2023, 6:16 PM

#

tepid parcel What is the problem? My English is so weird at that point?

So I previously mentioned that AI is best at solving the following practical tasks:

Classification
Math regression

Actually I was wrong (I am learning as well), and there are more tasks/computational methods which can be well executed with AI:

Classification
Math regression
Clustering
Dimensionality reduction
seq2seq

So as an AI professional, you start to think about whether you can apply any of the mentioned methods to a particular domain area, and if yes, how exactly. E.g., the "clustering" task/method is very relevant to social connections, I believe.

tepid parcel May 28, 2023, 6:22 PM

#

raw compass I mean it looks like you already got an answer.

Yes, AI projects with ML are experimental, testing and failing, and all else. Maybe I was misunderstood, I know that nobody know exactly what of Data science and AI subjects cover Complex System science, It's because I just didn't dive into this area before to know something about so I want to know it better before start to studying it, I thought that would hear something like "Dude, if you want to be not only a professional but a real researcher, study x, w, y, z subjects" maybe "on x subject you can start with this or that, on w together with y you will learn something".
To complement my doubt, I need know about the career too, the experiences, opportunities, specializations, future prospection and etc.

pseudo spire May 28, 2023, 6:26 PM

#

One must not only "know", but also be able to apply this knowledge. This is usually achieved via extensive practice in a particular domain area and gaining practical experience.

raw compass May 28, 2023, 6:28 PM

#

tepid parcel Yes, AI projects with ML are experimental, testing and failing, and all else. Ma...

I strongly recommend you stop this right here. It is just weird.

tepid parcel May 28, 2023, 6:29 PM

#

raw compass I strongly recommend you stop this right here. It is just weird.

Why broo?

pseudo spire May 28, 2023, 6:30 PM

#

This is what we do in universities, isn't it? It's not only reading about x, w, y, z, but also some practical work. Or more precisely scientific work (especially for PhD candidates). There is a distinct difference between practical work and scientific work. And this difference is novelty, pushing forward the borders of science.

raw compass May 28, 2023, 6:30 PM

#

tepid parcel Why broo?

you are getting answers arent you?

pseudo spire May 28, 2023, 6:31 PM

#

tepid parcel Why broo?

Because you have to do some practice bro

potent sky May 28, 2023, 6:31 PM

#

plain drift i explored pre-trained OCR tools for the first time in a while recently. before ...

Interesting
Tesseract didn't work that well for me either but then I had a bit of a niche use case
I ended up training a model myself

potent sky May 28, 2023, 6:32 PM

#

hasty mountain YOLO is a model for computer vision. OpenCV is a library/module/package/idk to w...

Yep this
And C++ actually

#

Yolo is more like a set of techniques by now rather than one single model, but fair enough
We're at YOLOv8 now

plain drift May 28, 2023, 6:34 PM

#

my domain was reading timestamps from screenshots of the game league of legends. surprisingly, even though these timestamps were way more well-structured than the canonical challenge for an OCR system (e.g. handwriting), this was too hard for tesseract and the failure rate was at least 10% (didn't actually measure)

potent sky May 28, 2023, 6:34 PM

#

plain drift my domain was reading timestamps from screenshots of the game league of legends....

Surprising, tesseract is pretty popular

plain drift May 28, 2023, 6:34 PM

#

yeah i can't help but think i was doing something wrong. but either way, paddleocr proved easier to get working

potent sky May 28, 2023, 6:34 PM

#

Did you do preprocessing on the images? Game screenshots can be kind of flashy

plain drift May 28, 2023, 6:36 PM

#

yeah! i think i did some transformations to increase contrast and whatnot. what ended up being important for getting paddleocr to work well was to basically make sure the parts of the image i snipped from these screenshots to read were superimposed on larger images -- e.g. a 150x150 image of black pixels. My guess is that if the ratio of "text" to background is unusual wrt its training set, the model can be thrown off.
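
A minimal sketch of the padding trick described above, assuming the snippet is a NumPy image array. `pad_to_canvas` is a hypothetical helper name, centering the crop is an assumption (the message only says "superimposed"), and the 150x150 size is the one mentioned in the message.

```python
import numpy as np


def pad_to_canvas(crop: np.ndarray, size: int = 150) -> np.ndarray:
    """Paste a small crop onto the centre of a black size x size canvas."""
    h, w = crop.shape[:2]
    if h > size or w > size:
        raise ValueError("crop is larger than the canvas")
    # Keep any channel dimension and the dtype of the crop.
    canvas = np.zeros((size, size) + crop.shape[2:], dtype=crop.dtype)
    top = (size - h) // 2
    left = (size - w) // 2
    canvas[top:top + h, left:left + w] = crop
    return canvas


# Stand-in for a tight 20x60 RGB snip of a timestamp (all white here).
crop = np.full((20, 60, 3), 255, dtype=np.uint8)
padded = pad_to_canvas(crop)
print(padded.shape)
```

The padded array would then be passed to the OCR tool in place of the tight crop.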

tepid parcel May 28, 2023, 6:36 PM

#

raw compass you are getting answers arent you?

Not really, how with this advice I will get an PhD on Complex System?
This are advices aren't directed enough, I think, I know that this server don't have any label of areas to apply AI, but I am asking for. Great part form what was said here, just being honest, clear and direct, I already know. But I am happy to know that I am one the right path!

pseudo spire May 28, 2023, 6:37 PM

#

You will not get PhD with an advice. Full stop.

tepid parcel May 28, 2023, 6:37 PM

#

tepid parcel Not really, how with this advice I will get an PhD on Complex System? This are a...

Not really PhD, I exaggerated, but apply AI knowledges

potent sky May 28, 2023, 6:37 PM

#

plain drift yeah! i think i did some transformations to increase contrast and whatnot. what ...

Hmm yes possible. I observed this when building a handwriting recognition model

raw compass May 28, 2023, 6:38 PM

#

tepid parcel Not really, how with this advice I will get an PhD on Complex System? This are a...

okay, im living in an english speaking country, but I swear to god I dont understand you.

plain drift May 28, 2023, 6:38 PM

#

it's really interesting how janky these AI models are. Felt like negotiating with an alien

pseudo spire May 28, 2023, 6:38 PM

#

tepid parcel Not really PhD, I exaggerated, but apply AI knowledges

What academic year are you in at the university?

potent sky May 28, 2023, 6:39 PM

#

tepid parcel Not really PhD, I exaggerated, but apply AI knowledges

You mentioned I think social sciences and complex systems. These are pretty broad fields and like you said most of the advice we can give in this setting are things you already know: explore relevant courses, understand the ML math deeply, consult books on the topic etc.
What other advice are you looking for? /gen

#

I hope this doesn't come off as harsh, I'm genuinely trying to understand what other advice you're expecting

#

Maybe I missed something in between, quite possible

tepid parcel May 28, 2023, 6:42 PM

#

pseudo spire What academic year are you in at the university?

Just a student.

pseudo spire May 28, 2023, 6:43 PM

#

Which year student?

tepid parcel May 28, 2023, 6:43 PM

#

pseudo spire Which year student?

Second year

tepid parcel May 28, 2023, 6:44 PM

#

potent sky You mentioned I think social sciences and complex systems. These are pretty broa...

Yes, all of you are right

tepid parcel May 28, 2023, 6:46 PM

#

potent sky You mentioned I think social sciences and complex systems. These are pretty broa...

I meant that: if there's anyone that study that or something near to that who can give me theses advices, I though that it'd simpler too, just asking here I'd find something more consistent

potent sky May 28, 2023, 6:48 PM

#

Fair enough. I don't think I specialise in those fields so there's not much more I can say.
Except that combining ai-ml with any other field will demand a deep understanding, so it is really recommended that you dive deep into the math behind it and how things work under the hood (esp for complex systems ig)
GL!

tepid parcel May 28, 2023, 6:57 PM

#

potent sky Fair enough. I don't think I specialise in those fields so there's not much more...

Ok, I don't know how can I find people in the area, perhaps I will struggle a lot in this case.

potent sky May 28, 2023, 7:01 PM

#

Search up courses on this online by other universities, see the professors who're taking the class, maybe contact them and ask for advice? Or the students enrolled in those courses?

#

Or people in the industry working in related positions, can get that through a LinkedIn search

brave zenith May 28, 2023, 9:13 PM

#

Is there any way to use opencv to render footage live

#

from an api or a server hosted

#

using google colab

hasty mountain May 28, 2023, 11:26 PM

#

Ugh... GANs are already annoying enough, and then when I search for a way to make a Variational AutoEncoder that doesn't generate blurry images...the papers around this tell me to use a VAE with a GAN configuration

#

I guess I'll just turn to huggingface and use a pretrained VAE, then...

iron basalt May 28, 2023, 11:32 PM

#

hasty mountain Ugh... GANs are already annoying enough, and then when I search for a way to mak...

Hierarchical VAEs with extra constraints work pretty well (e.g. diffusion).

hasty mountain May 28, 2023, 11:33 PM

#

iron basalt Hierarchical VAEs with extra constraints work pretty well (e.g. diffusion).

Fun fact: I'm trying to do research on GANs using exactly the same idea as Stable Diffusion. But for that, I need a pretrained VAE.

#

But I'm having some trouble in getting a trained VAE

#

And I'm having a hard time trying to find the architecture of the VAE they used for Stable Diffusion.

hasty mountain May 28, 2023, 11:40 PM

#

iron basalt Hierarchical VAEs with extra constraints work pretty well (e.g. diffusion).

Oh... Hierarchical... I've seen some comparisons about "eye recovery" with Stable Diffusion and the VAE...
I suppose that's the trick to avoid blurriness, then...reducing the complexity of the data.

#

I also see folks commenting about MSELoss... Really, I don't know how they manage to make a VAE work with colored images using an MSE loss. At least my VAEs only work with colored images when I use Gaussian log likelihood

#

Interesting... I knew this for GANs, the Progressive Grow, but I didn't expect that this would be necessary for VAEs, too.

#

Thanks, Squiggle! I'll run some tests.

#

But first, just a single VAE in 8x8 or 16x16 images...just to make sure

hasty mountain May 29, 2023, 12:24 AM

#

Hm... The target for the encoding loss isn't just a normal distribution for each encoder? It could be a distribution over more complex features?

Now this got even more interesting...I'm already thinking about information entropy. Too bad it tends to make things too computationally expensive

potent sky May 29, 2023, 12:51 AM

#

brave zenith using google colab

Yes iirc, but they change this often so better consult a tutorial. But there used to be ways

hasty mountain May 29, 2023, 1:50 AM

#

hasty mountain But first, just a single VAE in 8x8 or 16x16 images...just to make sure

It seems that with 16x16 images the VAE (non-hierarchical) generates quite the same quality of output.
But at least it's easier to notice that...the secret ingredient is probably many, many, many, many iterations and running time.

#

Even when the loss appears to have stabilized, after 3 or 4 epochs it decreases some 3 or 4 points, then stabilizes again, and so on...

#

Too bad I still don't know enough about how to implement Genetic Algorithms...it may be interesting to use stochastic gradient descent for a VAE and, when the model begins to take too much time to get better, maybe perform some evolutionary optimization in parallel...

mint palm May 29, 2023, 6:28 AM

#

does gpu parallelism in pytorch just mean:
send model to each gpu
divide input into n parts

nothing else?

modest mulch May 29, 2023, 6:57 AM

#

mint palm does gpu parallelism in pytorch just mean: send model to each gpu divide input...

Pretty much yea, and the gradients get merged in some master process and get broadcasted to the gpus. There are some nitty gritty details about how this exactly works, but generally the idea is as you have stated
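
To make the "split the batch, merge the gradients" idea concrete, here is a small simulation in plain NumPy (no GPUs and no PyTorch involved, which is a deliberate simplification). The linear model, the `grad` helper and the worker count are illustrative assumptions. In PyTorch itself the merging is handled by wrappers such as `nn.DataParallel` or `DistributedDataParallel`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # one batch of 8 samples
y = rng.normal(size=8)
w = np.zeros(3)               # the "model": a single weight vector


def grad(w, X, y):
    # Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w.
    return X.T @ (X @ w - y) / len(y)


n_workers = 4
# 1) every worker gets its own copy of the model,
# 2) the batch is divided into equally sized shards,
# 3) each worker computes a gradient on its shard only.
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
local_grads = [grad(w.copy(), X_shard, y_shard) for X_shard, y_shard in shards]

# 4) the local gradients are averaged and every copy applies the same update.
averaged = np.mean(local_grads, axis=0)
full_batch = grad(w, X, y)
print(np.allclose(averaged, full_batch))
```

With equally sized shards, the averaged gradient matches the gradient of the whole batch, which is why all model copies stay in sync after each step.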

mint palm May 29, 2023, 7:12 AM

#

modest mulch Pretty much yea, and the gradients get merged in some master process and get bro...

okay, do you recommend any source i can go through? i am kind of unable to find something simple.

dusk tide May 29, 2023, 7:16 AM

#

I am working on movies dataset on kaggle link here https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset and I am practicing data cleaning. There is a column named production_countries which lists all the countries where each movie was produced. There are a lot of NaN values in this column, so I decided to replace them with the most frequently occurring country codes (like US, FR, GB). I cannot do this with fillna() because it wants a scalar or dictionary value, but the values I have are lists. So I found another way, using random.choices():
prod_countries_nans = movies4['production_countries'].isna()
prod_countries_nans_length = sum(prod_countries_nans)
replacement = random.choices([['US','DE'],['GB','FR']], weights=[.5 , .5], k=prod_countries_nans_length)
movies4.loc[prod_countries_nans,'production_countries'] = replacement
This distributes the countries roughly evenly and stores them in replacement. But I am receiving the error "Must have equal len keys and value when setting with an ndarray". prod_countries_nans_length and the length of replacement are the same, 6206, and the same approach worked fine for genre and production companies. Can anyone tell what's wrong?

The Movies Dataset

Metadata on over 45,000 movies. 26 million ratings from over 270,000 users.

errant lake May 29, 2023, 7:30 AM

#

dusk tide I am working on movies dataset on kaggle link here https://www.kaggle.com/datase...

You can probably do a one liner replacing all your NaNs with a random choice

mellow quarry May 29, 2023, 7:31 AM

#

dusk tide I am working on movies dataset on kaggle link here https://www.kaggle.com/datase...

can you put the code in a normal codeblock with the python tag on it?

cold osprey May 29, 2023, 7:32 AM

#

!code

arctic wedgeBOT May 29, 2023, 7:32 AM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

errant lake May 29, 2023, 7:33 AM

#

dusk tide I am working on movies dataset on kaggle link here https://www.kaggle.com/datase...

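choices = [...]
# skip list cells first: pd.isna() on a list returns an element-wise array, which can't be used in an `if`
df['column'] = [random.choice(choices) if not isinstance(x, list) and pd.isna(x) else x for x in df['column']]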

#

This will replace all NaNs in your column with a random choice from list choices
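
A fuller, runnable sketch of the same idea, assuming each cell holds either a list of country codes or NaN. The tiny `movies` frame is made up for illustration; the column name and the two candidate lists come from the question. The list check comes before `pd.isna` because `pd.isna` applied to a list returns an element-wise array rather than a single bool. My guess (not verified against the real dataset) is that the original `.loc` assignment failed because pandas treated the list of equal-length lists as a 2-D array.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # only to make the illustration repeatable

# Made-up miniature of the column: lists where known, NaN where missing.
movies = pd.DataFrame(
    {"production_countries": [["US"], np.nan, ["GB", "FR"], np.nan]}
)
choices = [["US", "DE"], ["GB", "FR"]]

filled = [
    random.choice(choices) if not isinstance(x, list) and pd.isna(x) else x
    for x in movies["production_countries"]
]
# Wrapping in a Series keeps each list as a single cell value.
movies["production_countries"] = pd.Series(filled, index=movies.index)
print(movies)
```

Existing lists are left untouched; only the missing cells receive one of the candidate lists.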

raw compass May 29, 2023, 9:22 AM

#

I was thinking about something interesting after seeing AUTO-GPT. How is it possible to use an AI agent that can control your entire operating system, open application files, and perform various tasks?

oblique quarry May 29, 2023, 9:42 AM

#

Hello, could I ask for help? It's about a question concerning an MLP. I have just started my journey into the wide land of ml, so I'm fairly new which is why I'd like to ask some more advanced programmers if they could help me spot the error in my code

plain drift May 29, 2023, 9:42 AM

#

raw compass I was thinking about something interesting after seeing AUTO-GPT. How is it poss...

This is maybe not exactly how it's being implemented. But when you have an AI agent like ChatGPT write a script for you, you are already having it specify a sequence of system operations that achieve a desired result. So from there, the only additional step you haven't automated is the execution of that script on your system. To use an AI agent that can control your entire operating system, you just automate that step too.

raw compass May 29, 2023, 9:50 AM

#

plain drift This is maybe not exactly how it's being implemented. But when you have an AI ag...

Thank you for your answer, but with what technology? Can it be a simple CLI app that has permission to execute various commands on the user's computer?

past meteor May 29, 2023, 9:51 AM

#

oblique quarry Hello, could I ask for help? It's about a question concerning a MLP. I have jus...

show your code and we'll do our best 🙂

plain drift May 29, 2023, 9:53 AM

#

Sure. So one scripting language for controlling operating systems is bash. ChatGPT can write bash scripts to execute system operations. You can make an app that the user grants system access, prompts ChatGPT to create scripts that suit a specified need (e.g., through an API call), and then executes that script.

raw compass May 29, 2023, 9:53 AM

#

plain drift Sure. So one scripting language for controlling operating systems is bash. ChatG...

I assume if I want to use APIs I should go with python and some low level languages.

#

so like extend python with c.

plain drift May 29, 2023, 9:54 AM

#

I guess it depends on your project-specific needs. But you can mostly stick to Python for this. Python will talk to ChatGPT through an API, and then initiate system commands based on ChatGPT's responses. Python's os library supports issuing system commands.

#

chatgpt is an idiot though. could do crazy irreversible things. careful!
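
Given that warning, here is a cautious sketch of the "execute what the model suggests" step. It uses the standard-library `subprocess` module rather than `os.system`; the allowlist, the `run_suggested` name and the hard-coded `suggestion` string (standing in for a model reply) are all assumptions for illustration, and no API call is made. It also assumes a POSIX-like system where `echo` is available as a program.

```python
import shlex
import subprocess

ALLOWED_PROGRAMS = {"echo", "ls", "pwd"}  # hypothetical allowlist


def run_suggested(command: str) -> str:
    """Run a model-suggested command only if its program is allowlisted."""
    args = shlex.split(command)
    if not args or args[0] not in ALLOWED_PROGRAMS:
        raise PermissionError(f"refusing to run: {command!r}")
    # No shell=True: arguments are passed as a list, so shell
    # metacharacters in the suggestion are not interpreted.
    done = subprocess.run(
        args, capture_output=True, text=True, check=True, timeout=10
    )
    return done.stdout


suggestion = "echo hello"  # stand-in for text returned by the model
output = run_suggested(suggestion)
print(output)
```

Anything outside the allowlist raises `PermissionError` before a process is started, which keeps the irreversible commands out of reach by default.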

oblique quarry May 29, 2023, 9:57 AM

#

https://paste.pythondiscord.com/oqapuhoxir

oblique quarry May 29, 2023, 9:59 AM

#

oblique quarry https://paste.pythondiscord.com/oqapuhoxir

The idea is to write a CNN but I seem to struggle with just implementing a MLP

past meteor May 29, 2023, 10:00 AM

#

oblique quarry The idea is to write a CNN but I seem to struggle with just implementing a MLP

Your forward method (line 5) seems to be wrong already

#

I also don't think writing a MLP from scratch and manually computing gradients is a good exercise.

#

If you're going to implement neural nets from scratch you should implement automatic differentiation (aka autograd) https://en.wikipedia.org/wiki/Automatic_differentiation by yourself and then use that to make your neural network library

Automatic differentiation

In mathematics and computer algebra, automatic differentiation (auto-differentiation, autodiff, or AD), also called algorithmic differentiation, computational differentiation, is a set of techniques to evaluate the partial derivative of a function specified by a computer program.
Automatic differentiation exploits the fact that every computer pr...

raw compass May 29, 2023, 10:03 AM

#

plain drift I guess it depends on your project-specific needs. But you can mostly stick to P...

I mean it should be able to communicate with and handle every application. Since Linux is open source, and so is all the software on it, maybe a model could be trained on that software so that it knows exactly how to deal with things.

oblique quarry May 29, 2023, 10:03 AM

#

past meteor Your forward method (line 5) seems to be wrong already

Yeah i accidentally typed self.weight, self.bias instead of self.weight, self.input

past meteor May 29, 2023, 10:03 AM

#

There's some C++ pseudo-code in the wiki page. There are also minimal autograd implementations, e.g. from Andrej Karpathy on GitHub (https://github.com/karpathy/micrograd)
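
For a feel of what "implement autograd yourself" involves, here is a very small reverse-mode sketch for scalars. It is written in the spirit of such minimal implementations but is not copied from any of them; the `Value` class and its attribute names are this sketch's own, and it supports only `+` and `*`.

```python
class Value:
    """A scalar that remembers how it was computed, for reverse-mode autodiff."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the output back.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b      # a one-input "neuron" without activation
loss = y * y       # squared output as a toy loss
loss.backward()
print(loss.data, w.grad, x.grad, b.grad)
```

Once something like this works for scalars, layers of an MLP are just many of these operations composed, and no gradient has to be derived by hand.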

oblique quarry May 29, 2023, 10:08 AM

#

Appreciate it, will read through it

past meteor May 29, 2023, 10:09 AM

#

oblique quarry Appreciate it, will read through it

But yeah, if you want to use an MLP on an image you'd have to flatten the 100x100 matrix into a 10,000-element vector and feed that to your MLP

#

Just as you wrote
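
As a quick illustration of the flattening step and a single dense layer's forward pass (weights times input, plus bias), assuming NumPy. The hidden size of 16 and the zero-initialised weights are arbitrary placeholders, not taken from the pasted code.

```python
import numpy as np

# Stand-in for a 100x100 grayscale image.
image = np.arange(100 * 100, dtype=np.float32).reshape(100, 100)

x = image.reshape(-1)            # flatten row by row: shape (10000,)

hidden = 16                      # hypothetical layer width
weight = np.zeros((hidden, x.size), dtype=np.float32)
bias = np.zeros(hidden, dtype=np.float32)

h = weight @ x + bias            # forward pass of one dense layer
print(x.shape, h.shape)
```

Row-major flattening means pixel `(1, 0)` of the image ends up at index 100 of the vector.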

night prawn May 29, 2023, 10:41 AM

#

potent sky https://www.tensorflow.org/install/pip#windows-wsl2_1 maybe this'll help read ca...

I've done everything it says but it doesn't work

placid cedar May 29, 2023, 10:53 AM

#

hi

#

currently doing my project, and these are the project suggested steps

#

#

shouldn't missing value imputation be before train and test split?

#

and the transformation of numerical variables using proper transformation methods be after the train test split?

#

would appreciate anyone's help here 🙂

past meteor May 29, 2023, 10:56 AM

#

placid cedar

Is this some kind of homework?

placid cedar May 29, 2023, 10:57 AM

#

its an assignment apparently

#

the lecturers suggested these steps

#

and i felt that the arrangement was kinda flawd

#

flawed

#

unless im wrong

past meteor May 29, 2023, 10:58 AM

#

What's flawed in your opinion?

placid cedar May 29, 2023, 10:58 AM

#

shouldnt missing values imputation be before the train and test split?

past meteor May 29, 2023, 10:58 AM

#

No

#

It's flawed because the first thing you should do is the test train split

placid cedar May 29, 2023, 10:59 AM

#

o?

past meteor May 29, 2023, 11:00 AM

#

You want to estimate the performance of your method on truly unseen data. If you explore it with .describe() and remove outliers from it, it is not truly unseen

potent sky May 29, 2023, 11:01 AM

#

past meteor It's flawed because the first thing you should do is the test train split

Hmm why? Data cleaning before split is perfectly reasonable
Moreso because you might have to drop records in data cleaning and that might skew the train test split, not maintaining the ratio selected

past meteor May 29, 2023, 11:01 AM

#

You're just introducing bias into your results

potent sky May 29, 2023, 11:02 AM

#

past meteor You want to estimate the performance of your method on truly unseen data. If you...

That would be, but this is simply handling missing data, and it is being performed anyway
The question is whether before or after split

past meteor May 29, 2023, 11:02 AM

#

Measure the performance of your methods like you would if you deploy them into production

placid cedar May 29, 2023, 11:03 AM

#

it also stated that the transformation methods are done before the train test split

past meteor May 29, 2023, 11:03 AM

#

That's bogus

placid cedar May 29, 2023, 11:03 AM

#

shld it be after split?

potent sky May 29, 2023, 11:03 AM

#

Hmm that should be done after

potent sky May 29, 2023, 11:03 AM

#

past meteor Measure the performance of your methods like you would if you deploy them into p...

^

past meteor May 29, 2023, 11:03 AM

#

Ideally everything is done after. In reality this is not the case but you should postpone as much as possible unless you can't

placid cedar May 29, 2023, 11:04 AM

#

mmm alright

past meteor May 29, 2023, 11:04 AM

#

Imagine your split makes it such that all the outliers are in your test set? If you split after removing outliers you've manipulated your test set

#

Which means your test set is not truly unseen. You saw it and you touched it.

wooden sail May 29, 2023, 11:05 AM

#

such a good example, oof

placid cedar May 29, 2023, 11:05 AM

#

makes sense

#

then what about the missing value imputation?

#

what happens if i perform it before the train test split

past meteor May 29, 2023, 11:06 AM

#

The same

potent sky May 29, 2023, 11:06 AM

#

Imputation will also have similar problems. The values filled in will be representative of the whole dataset
So the imputed values in your train set will have been influenced by the test set, rendering it not unseen

#

Only dropping records or features is fine before split I think

past meteor May 29, 2023, 11:06 AM

#

You're most likely doing some mean value imputation, right? You're using information from unseen data to determine the mean and using that to impute the values

past meteor May 29, 2023, 11:07 AM

#

potent sky Only dropping records or features is fine before split I think

It's a tough judgement call but I think yes you can drop records before the split. But ideally you define the rules for which you will drop records before looking (extensively) at the data

placid cedar May 29, 2023, 11:08 AM

#

so it means i should do the transformation methods such as logarithm transformation before the splitting?

#

so sorry for pestering too much yeah, im pretty new to this mb

past meteor May 29, 2023, 11:08 AM

#

placid cedar so it means i should do the transformation methods such as logarithm transformat...

no, you split first and then do all the rest

#

No no, this is a good discussion tbh because this is something some of my colleagues struggle with 💀

placid cedar May 29, 2023, 11:09 AM

#

oh damn

#

this really isn't easy at the start ngl

past meteor May 29, 2023, 11:10 AM

#

Another thing to notice is that the transformations your lecturer wants you to do are linear regression specific

potent sky May 29, 2023, 11:10 AM

#

past meteor It's a tough judgement call but I think yes you can drop records before the spli...

Yes exactly
However I think there's 2 different concepts at play here.
Exploring the data and being selective about what rows to drop will deteriorate the quality of the data.
But, it will not introduce bias
By my understanding

past meteor May 29, 2023, 11:10 AM

#

You can either 1) Eyeball the data and determine you need a log / box-cox / ... transform beforehand OR 2) make your linear regression model and study the residuals. If the residuals have any structure then you apply a transform afterwards (easier)

potent sky May 29, 2023, 11:11 AM

#

placid cedar so it means i should do the transformation methods such as logarithm transformat...

Try to do everything after splitting as a rule of thumb
As you get more experienced you might try dropping rows and columns before splitting (but in a rule based manner as zestar75 mentioned) so as to maintain stable split proportions

past meteor May 29, 2023, 11:11 AM

#

Removing outliers is something you should do immediately though yeah

#

Because linear regression is very sensitive to outliers. I'll look for an image, sec

placid cedar May 29, 2023, 11:12 AM

#

so after i performed my train test split, lets say i want to use a mean value imputation. since im going to split it 70% and 30%, so the missing values in the 70% are going to be replaced by the mean of the values of that 70%?

potent sky May 29, 2023, 11:12 AM

#

Yes

#

Which is good

placid cedar May 29, 2023, 11:12 AM

#

ohhh ok i see

#

how would i explain why its good?

potent sky May 29, 2023, 11:13 AM

#

If it was based on all the 100% then some information is "leaking" from the test set to the train set
Which means the test set isn't exactly "unseen" anymore.
Obviously we want the test set to be unseen
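
A small sketch of leakage-free mean imputation with a 70/30 split, assuming NumPy and a single made-up numeric feature. The key point is that the fill value is computed from the training part only and then reused for the test part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up feature with 15 missing values out of 100.
x = rng.normal(loc=10.0, size=100)
x[rng.choice(100, size=15, replace=False)] = np.nan

# Split FIRST (70% train, 30% test) ...
idx = rng.permutation(100)
train_idx, test_idx = idx[:70], idx[70:]
x_train, x_test = x[train_idx].copy(), x[test_idx].copy()

# ... then learn the fill value from the training part only ...
train_mean = np.nanmean(x_train)

# ... and apply that same value to both parts.
x_train[np.isnan(x_train)] = train_mean
x_test[np.isnan(x_test)] = train_mean
print(train_mean)
```

The test set never contributes to `train_mean`, so it remains unseen in the sense discussed above.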

past meteor May 29, 2023, 11:14 AM

#

potent sky Yes exactly However I think there's 2 different concepts at play here. Exploring...

To answer this, I think it all goes back to measuring the performance on truly unseen data. You're free to drop rows based on "basic" rules for cases you're sure your model will not see in reality. For example if Y is missing you'd drop it, that's fine.

past meteor May 29, 2023, 11:15 AM

#

placid cedar how would i explain why its good?

Because information is not leaking from your test set to your training set (google leakage)

potent sky May 29, 2023, 11:15 AM

#

past meteor To answer this, I think it all goes back to measuring the performance on truly u...

Mhm I might drop rows that it may see in reality. That will be bad data processing. I'll be a bad data scientist. The data will be lower quality and the model will be worse.
However, I believe that it won't be introducing bias

wooden sail May 29, 2023, 11:15 AM

#

if you wanted to do a fair imputation, you'd do it based only on the 70%, which would introduce some amount of bias unless you have a fair amount of knowledge on the statistics of the problem you're dealing with

past meteor May 29, 2023, 11:16 AM

#

potent sky Mhm I might drop rows that it may see in reality. That will be bad data processi...

It won't lower the quality of your model, it'll just reduce the fidelity of the estimate of the performance of your model

potent sky May 29, 2023, 11:16 AM

#

This is a good clear example of leakage btw. Ngl a lot of times it's very difficult to spot leakage in complex data pipelines

past meteor May 29, 2023, 11:17 AM

#

Like, the whole point of this is having a high fidelity estimate of how your model will perform in the wild

potent sky May 29, 2023, 11:17 AM

#

I was referring to performance in the wild. That it will probably have lower performance since I've removed rows that I know would occur in reality

past meteor May 29, 2023, 11:18 AM

#

past meteor Like, the whole point of this is having a high fidelity estimate of how your mod...

Still not 100 % correct imo, really focus on this part

potent sky May 29, 2023, 11:18 AM

#

My point is only that it's different to bias

placid cedar May 29, 2023, 11:18 AM

#

thanks so much for ur support and answers anyway guys, hope to see you all next time when i have more queries. really nice learning more from y'all!

past meteor May 29, 2023, 11:18 AM

#

placid cedar thanks so much for ur support and answers anyway guys, hope to see you all next ...

Wait before you go

placid cedar May 29, 2023, 11:18 AM

#

yesh

potent sky May 29, 2023, 11:19 AM

#

placid cedar thanks so much for ur support and answers anyway guys, hope to see you all next ...

No problem! We sure get to learn too. Atb!

past meteor May 29, 2023, 11:19 AM

#

Do the box-cox or log transforms after step 8

#

So load in the data, split immediately, explore it, remove outliers, impute missing values, one hot encoding, make features and then make a model

placid cedar May 29, 2023, 11:20 AM

#

yep gotcha! joe_salute

past meteor May 29, 2023, 11:20 AM

#

Step 9 is investigate the residuals (the error you're making) with respect to all of your variables. Afterwards if you see "structure" in these residuals then you go for box-cox, log transform or binning
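
A rough sketch of what "structure in the residuals" can look like, assuming a straight-line model fitted to a hypothetical curved relationship (`y = x**2`). The data and the binning into low / middle / high `x` are illustrative choices, not part of the original advice.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical data: the true relationship is curved, the model is a line.
xs = [float(i) for i in range(1, 21)]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Average residual in the low, middle and high range of x.
low = statistics.fmean(residuals[:5])
middle = statistics.fmean(residuals[5:15])
high = statistics.fmean(residuals[15:])
print(low, middle, high)
```

If the model were adequate, the three averages would all hover around zero. Here the line under-predicts at both ends and over-predicts in the middle, which is the kind of pattern that would motivate trying a transform.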

placid cedar May 29, 2023, 11:21 AM

#

yes sirrrr

past meteor May 29, 2023, 11:23 AM

#

potent sky My point is only that it's different to bias

So bias in statistics is just that the estimate you have is different to that of the actual quantity. So even if your sample is infinite you do not converge to y_true. The quantity you're trying to estimate in model evaluation is "how well is my model performing in the wild". If you remove a certain % of the data it may see in the wild you will converge to an estimate that is "biased", it will not accurately reflect the performance it should have had
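
A small simulation of this point, under made-up assumptions: 30% of the rows seen in the wild are "hard" (absolute error 5.0), the rest are "easy" (absolute error 1.0), and the evaluation set has had the hard rows dropped. All numbers and names are hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical per-row absolute errors of an already trained model.
EASY_ERROR, HARD_ERROR, HARD_SHARE = 1.0, 5.0, 0.3

def sample_errors(n):
    """Draw n rows as they would occur in the wild and return the model's errors."""
    return [HARD_ERROR if rng.random() < HARD_SHARE else EASY_ERROR
            for _ in range(n)]

errors = sample_errors(100_000)

# Estimate on data that looks like the wild: close to 0.7 * 1 + 0.3 * 5 = 2.2.
mae_all = sum(errors) / len(errors)

# Estimate after dropping the hard rows: stays at 1.0 however large the sample is.
kept = [e for e in errors if e != HARD_ERROR]
mae_filtered = sum(kept) / len(kept)

print(mae_all, mae_filtered)
```

More data does not help the filtered estimate. It converges, just to the wrong number, which is what "biased" means here.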

#

I think you're mixing it up a bit with "if I drop valid rows my model will get less data and be worse"

potent sky May 29, 2023, 11:24 AM

#

potent sky Mhm I might drop rows that it may see in reality. That will be bad data processi...

Eep mb. I used the full sentence here. "Introducing bias". By which I mean biasing the training set towards the test set, and the model as a result

past meteor May 29, 2023, 11:24 AM

#

I think in the limit the problem I described is worse. Why? Companies like Zillow lost billions of dollars because they deployed bad ML models. This can likely be ascribed to not having high fidelity estimates of how well the model is performing.

#

I don't think I can explain it any differently or better than I have now 😅 . It's a bit unintuitive and poorly taught in school so it might take a while for this to make sense idk

potent sky May 29, 2023, 11:27 AM

#

past meteor I think in the limit the problem I described is worse. Why? Companies like Zillo...

I agree. I was talking about something else though, perhaps I didn't convey it very well

#

Ensuring fidelity in model estimates has multiple aspects. One is to have test data representative of real data. Another is to ensure the model is not biased towards the test data.
Different things

past meteor May 29, 2023, 11:28 AM

#

What does "biased towards the test data" mean?

potent sky May 29, 2023, 11:28 AM

#

I was referring to the second aspect and how dropping records or features will not lead to this.
It might lead to the first one

potent sky May 29, 2023, 11:29 AM

#

past meteor What does "biased towards the test data" mean?

The test data is not representative of entirely unseen data

past meteor May 29, 2023, 11:30 AM

#

Dropping records will not lead to the latter but imo that's a moot point

potent sky May 29, 2023, 11:30 AM

#

So learning on the training data biases the model towards the training data distribution. This is why we need test data to get a high-fidelity estimate.
Hence we must ensure the training data (and the model by extension) isn't biased towards the test data

past meteor May 29, 2023, 11:31 AM

#

Scenario #2 is a lesser of both evils yes

#

It'll lead to your estimates not necessarily being higher than they're supposed to be (which happens if you leak from test -> train) but they'll still be off

#

Unless maybe you dropped the rows at random

potent sky May 29, 2023, 11:33 AM

#

past meteor It's a tough judgement call but I think yes you can drop records before the spli...

Mhm I think I interpreted this as stating that selectively dropping records before split will introduce the second kind of bias

past meteor May 29, 2023, 11:34 AM

#

Okay so a concrete example from my work 😄

#

We have multiple time series. For now our independent variables (X) are lags. We use the last n lags to predict n+1. If there is an interruption in our time series longer than for example 45 minutes between any of the lags we can drop that observation

#

This is a rule we made before looking at the data. The 45 min etc. is arbitrary, it could've been 2 hours or something else

#

In the real world if we deploy our system I don't think we'd make any prediction in these cases, we don't have enough information. In that sense dropping them does not lead to any bias
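
The lag rule above could be sketched roughly like this. The function name and the toy series are invented, and treating the gap between the last lag and the target as part of the check is an assumption; the 45-minute default just mirrors the example threshold.

```python
from datetime import datetime, timedelta

def make_lag_samples(timestamps, values, n_lags, max_gap=timedelta(minutes=45)):
    """Build (lags, target) pairs, skipping windows that contain a gap > max_gap."""
    samples = []
    for end in range(n_lags, len(values)):
        start = end - n_lags
        # Gaps between consecutive points in the window, including the step
        # from the last lag to the target (an assumption of this sketch).
        gaps = (timestamps[i + 1] - timestamps[i] for i in range(start, end))
        if any(gap > max_gap for gap in gaps):
            continue  # interrupted series: no sample, and no prediction in deployment
        samples.append((values[start:end], values[end]))
    return samples

# Hypothetical series with a 100-minute interruption between the 3rd and 4th reading.
t0 = datetime(2023, 5, 29, 8, 0)
minutes = [0, 10, 20, 120, 130, 140]
timestamps = [t0 + timedelta(minutes=m) for m in minutes]
values = [1, 2, 3, 4, 5, 6]

samples = make_lag_samples(timestamps, values, n_lags=2)
print(samples)
```

Because the rule is fixed before looking at the data and would also be applied at prediction time, the same function can be used in both places.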

#

I'm a bit "annoying" in this respect, before we start modelling I always want to have a meeting where everyone sits together (tech/non-tech) where we discuss how we evaluate and how we'll drop rows, ...

potent sky May 29, 2023, 11:40 AM

#

Precisely. I agree.
My point was that if you had set it to 10 secs, it would introduce the first kind of bias, in that you would have to make those preds on deployment but you're dropping those records from your data
However it wouldn't introduce the second kind of bias, i.e. biasing the model towards the test data
We would still have low fidelity estimates tho. This becomes the problem of our test data distribution not being representative of the deployed data distribution

past meteor May 29, 2023, 11:41 AM

#

potent sky Precisely. I agree. My point was that if you had set it to 10 secs, it would int...

Yup, makes sense. It's a rough subject

#

Maybe I exaggerate as well, idk

#

I'm jealous that other "engineering" disciplines like mechanical engineering etc. take the evaluation of their products so much more seriously.

potent sky May 29, 2023, 11:42 AM

#

Ahaha it's difficult to convey the entire sense over text, that too without math

potent sky May 29, 2023, 11:43 AM

#

past meteor I'm jealous that other "engineering" disciplines like mechanical engineering etc...

In the ML Engg/ DS industry, some more rigorousness before jumping into building a deep neural network would help xd

past meteor May 29, 2023, 11:43 AM

#

It's normal there to have a meeting on product / material evaluation before you even start building the pieces. You don't want to test the brakes of a car in different situations than it will be used in on the road etc. In data science it's less common to be very very rigorous about this

potent sky May 29, 2023, 11:43 AM

#

Exactly

#

Some more preparation than "throw a black-box at the problem and pray it works" lol

#

Anyw, fantastic discussion! gtg now tho

median leaf May 29, 2023, 12:22 PM

#

anyone understand how to interpret this

narrow crane May 29, 2023, 1:03 PM

#

is there a discord server related to data science?

pseudo spire May 29, 2023, 1:34 PM

#

median leaf anyone understand how to interpret this

Here I googled "confusion matrix" for you:
https://www.v7labs.com/blog/confusion-matrix-guide

placid cedar May 29, 2023, 2:30 PM

#

anyone online now?

#

quick question, are you able to check for r square and mse if there are columns that contain categorical data in your dataset

past meteor May 29, 2023, 2:37 PM

#

Huh

past meteor May 29, 2023, 2:37 PM

#

placid cedar quick question, are you able to check for r square and mse if there are columns ...

You compute the rmse between your predictions and the target, not on your variable itself

placid cedar May 29, 2023, 2:38 PM

#

oh, because the lecturer asked us to like, for instance, the transformation methods

#

he asked us to like for each numerical variable

#

try out the different methods

#

and see which one gives the best score or smth

#

but i still have some categorical data, so shld i js encode it first, and then do some transformation, and then use the r square and mse test?

past meteor May 29, 2023, 2:42 PM

#

You try out the different methods, make predictions and compare the rmse they were making on the test set

#

So method 1 -> predictions_1 -> rmse_1 vs method 2 -> predictions_2 -> rmse_2
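
A sketch of that comparison, assuming two candidate transforms of one numeric feature (raw vs. log) and a simple straight-line model. The data, the alternating train/test split and the helper names are all made up for illustration.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data where the target depends on log(x).
xs = [float(i) for i in range(1, 41)]
ys = [3.0 * math.log(x) + 1.0 for x in xs]
train_x, test_x = xs[::2], xs[1::2]
train_y, test_y = ys[::2], ys[1::2]

# method -> predictions -> rmse, always scored against the test targets.
methods = {"raw": lambda x: x, "log": math.log}
scores = {}
for name, transform in methods.items():
    slope, intercept = fit_line([transform(x) for x in train_x], train_y)
    predictions = [slope * transform(x) + intercept for x in test_x]
    scores[name] = rmse(test_y, predictions)

print(scores)
```

The transform is applied to the feature, but the score is always computed between predictions and the target, so categorical columns only need to be encoded well enough for the model to use them.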

placid cedar May 29, 2023, 2:42 PM

#

aite

half bramble May 29, 2023, 2:50 PM

#

Hello fellas

#

I’m new in Data science

#

What ur suggestions for better understanding..?🙏🏽

past meteor May 29, 2023, 2:53 PM

#

Very hard question to answer because it depends on what you already know

placid cedar May 29, 2023, 2:57 PM

#

what numerical imputation should i use for this case?

#

i feel that mean wld be good here as the data is quite evenly distributed without any form of extreme values

#

or maybe median

half bramble May 29, 2023, 3:02 PM

#

past meteor Very hard question to answer because it depends on what you already know

I know, but I literally just understand basics python and now want to learn Data science

past meteor May 29, 2023, 3:07 PM

#

half bramble I know, but I literally just understand basics python and now want to learn Data...

How is your math / statistics?

#

If it's rusty I'd start there tbh

#

And then pick up a canonical data science book and proceed from there

half bramble May 29, 2023, 3:22 PM

#

past meteor How is your math / statistics?

Not good🥲

half bramble May 29, 2023, 3:22 PM

#

past meteor If it's rusty I'd start there tbh

What u mean bout rusty..?

past meteor May 29, 2023, 3:22 PM

#

Being not good

half bramble May 29, 2023, 3:23 PM

#

past meteor Being not good

That’s me😂

past meteor May 29, 2023, 3:24 PM

#

half bramble That’s me😂

https://mml-book.github.io/book/mml-book.pdf this book covers most of the math that is frequently used

half bramble May 29, 2023, 3:25 PM

#

Thanks❤️

pseudo spire May 29, 2023, 3:56 PM

#

math statistics is a vital skill, so if you are in university pay attention to it

sleek harbor May 29, 2023, 3:59 PM

#

is target encoding often used in the field, and is it something one should necessarily know? Every way I look at it looks like the whole idea is a target leakage hazard.. I don't like it...

half bramble May 29, 2023, 4:10 PM

#

sleek harbor is target encoding often used in the field, and is it something one should neces...

As they said it depends.. and for me i will study it for machine learning

half bramble May 29, 2023, 4:10 PM

#

half bramble As they said it depends.. and for me i will study it for machine learning

Is it right to start understanding Data science for machine learning..?

past meteor May 29, 2023, 4:25 PM

#

sleek harbor is target encoding often used in the field, and is it something one should neces...

I think it can work for very very high dimensional input

#

Like if a few of your columns have very high cardinality if you'd one hot them

agile cobalt May 29, 2023, 4:31 PM

#

sleek harbor is target encoding often used in the field, and is it something one should neces...

I mean, if your target feature are classes like ["dog", "cat"], then even just converting the output from your neural network from 1 or 0 or otherwise index positions to the names when using a normal classification model is already a form of encoding, isn't it?

past meteor May 29, 2023, 4:34 PM

#

agile cobalt I mean, if your target feature are classes like `["dog", "cat"]`, then even just...

Target encoding is something else

#

It's encoding a level of your feature (a categorical variable) by, say, the mean of the target when that categorical variable appears

#

E.g., https://maxhalford.github.io/blog/target-encoding/

agile cobalt May 29, 2023, 4:35 PM

#

oh

#

~~...isn't that still just (the) one-hot encoding (case listed there)?~~

past meteor May 29, 2023, 4:36 PM

#

No

agile cobalt May 29, 2023, 4:38 PM

#

oh wait I see

past meteor May 29, 2023, 4:39 PM

#

[{"colA": "foo", "target": 1}, {"colA": "bar", "target": 2}, {"colA": "foo", "target": 7}] colA gets replaced with 4 in case it's foo (the mean of 1 and 7) and 2 in case it's bar
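
The same toy example as a minimal sketch in plain Python. The variable names are illustrative, and in practice the per-level means would be computed on the training split only, to limit the target leakage concern raised earlier.

```python
from collections import defaultdict
from statistics import fmean

rows = [
    {"colA": "foo", "target": 1},
    {"colA": "bar", "target": 2},
    {"colA": "foo", "target": 7},
]

# Collect the targets observed for each level of the categorical column.
targets_by_level = defaultdict(list)
for row in rows:
    targets_by_level[row["colA"]].append(row["target"])

# Mean target per level: this mapping is the "encoding".
encoding = {level: fmean(targets) for level, targets in targets_by_level.items()}

# Replace each categorical value with its mean target.
encoded_colA = [encoding[row["colA"]] for row in rows]
print(encoding, encoded_colA)
```

One column goes in and one numeric column comes out, regardless of how many distinct levels `colA` has, which is why it is attractive for high-cardinality features.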

#

Bit handwavy but one hot "retains" information better but it sucks, so much, for high cardinality stuff sometimes

merry ore May 29, 2023, 6:06 PM

#

hello folks! hope you are having a great day! We are working on writing an ML pipeline for a bunch of models, in Rust and Python. So, we have been looking at various inference engines. So far we have tried ONNX Runtime and tflite. While working with ONNX we noticed that it also has a bunch of ExecutionProviders. It has support for CoreML on Apple devices. We also noticed another package, coremltools, by Apple. Both can give us inference results. Can anybody give me some differences between the two? Do they call the same backing code or are there any significant differences?