#data-science-and-ml | Python | Page 6

tidal bough Aug 5, 2022, 6:11 PM

#

ooh, maybe...

#

oh my fucking god

#

it was right all along, since I was using apply

#

but it didn't look right

#

because I didn't know that pandas has special goddamn support for lists

#

and if you have a column of lists, it will visually "unwrap" these lists, each element into a row

#

This is one row 😩

untold bloom Aug 5, 2022, 6:15 PM

#

no, there's no such automatic thing for lists

#

and no, that's not one row...

#

you need explicit .explode to roll lists to rows
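Here is a minimal sketch of what `.explode` does, assuming a reasonably recent pandas. The frame is made up to mirror the list column from the discussion.

```python
import pandas as pd

# Two rows, each holding a whole list in a single cell.
df = pd.DataFrame({"name": ["A", "B"], "lst": [["1", "2", "3"], ["1", "2"]]})
print(len(df))  # 2: lists are not unrolled automatically

# explode() creates one row per list element and repeats the other columns.
exploded = df.explode("lst")
print(len(exploded))            # 5
print(exploded.index.tolist())  # [0, 0, 0, 1, 1]: original index labels repeat
```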

tidal bough Aug 5, 2022, 6:16 PM

#

huh, you're right

#

indeed it's of len 2

untold bloom Aug 5, 2022, 6:16 PM

#

that screenshot implies a MultiIndex frame

tidal bough Aug 5, 2022, 6:16 PM

#

so I guess the issue then is that apply explodes a column of mine without asking

#

(or however it's called, here)

untold bloom Aug 5, 2022, 6:17 PM

#

well, without an MRE, i don't know what to write

#

because not sure what function you're applying to what kind of dataframe :|

tidal bough Aug 5, 2022, 6:17 PM

#

fair enough

#

lemme see if I can hack something together

tidal bough Aug 5, 2022, 6:20 PM

#

untold bloom well, without an MRE, i don't know what to write

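import pandas as pd

test_df = pd.DataFrame.from_dict(dict(name=["A", "A", "A", "B", "B"], thing=["1", "2", "3", "1", "2"]))


def test_f(group):
    lst = [row.thing for row in group.itertuples()]
    return pd.DataFrame.from_dict(dict(
        # inside groupby.apply, group.name is the group key ("A" or "B");
        # group.iloc[0].name would be the first row's index label instead
        name=group.name,
        lst=lst
    ))


test_df.groupby("name").apply(test_f)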

#

here's a simplified example. Each group is collapsed into a 1-row dataframe with a list-type column.

#

The result becomes multiindex:

MultiIndex([('A', 0),
            ('A', 1),
            ('A', 2),
            ('B', 0),
            ('B', 1)],
           names=['name', None])

#

oh hey, I got it

#

the way to do it is pretty counterintuitive to me, though:

    return pd.Series(dict(
        name=group.name,  # the group key inside groupby.apply
        lst=lst
    ))

#

if apply returns a Series, it's not unwrapped into a multiindex. If it returns a dataframe, it is.

#

I wonder if there's even a mention of that in the docs.
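The following check may help explain why the two versions behave differently, assuming the same pandas constructors used above. When you build a frame from a dict that mixes a scalar with a list, pandas broadcasts the scalar. So each group in the MRE actually returns one row per list element, not a 1-row frame. A Series built from the same dict keeps the whole list in one cell.

```python
import pandas as pd

lst = ["1", "2", "3"]

# Scalar + list in a dict -> one row per list element, scalar broadcast down.
as_frame = pd.DataFrame.from_dict(dict(name="A", lst=lst))

# The same dict as a Series -> two entries, the list stays in a single cell.
as_series = pd.Series(dict(name="A", lst=lst))

print(as_frame.shape)    # (3, 2)
print(as_series["lst"])  # ['1', '2', '3']
```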

solid quail Aug 5, 2022, 6:31 PM

#

Hey everyone,

untold bloom Aug 5, 2022, 6:34 PM

#

tidal bough if `apply` returns a Series, it's not unwrapped into a multiindex. If it returns...

the distinction is not Series- vs DataFrame-returning function per se; it's about whether what you return respects the original index of the group being applied to

#

pandas will try to be helpful by putting another level of index consisting of the grouper keys so you can identify which group led to which new indexing scheme.

#

if you want, you can disable this via passing group_keys=False to .groupby(...).

#

in your case, the 2 groups had the indices [0, 1, 2] and [3, 4]; but what you returned from the function per group did not fully preserve the corresponding indexes, e.g., you returned [0, 1, 2] for "A" (fine) but [0, 1] for "B" (not so fine), hence the multiindex appearing.

tidal bough Aug 5, 2022, 6:39 PM

#

Hmm, interesting

solid quail Aug 5, 2022, 6:41 PM

#

Hey everyone, I have encountered a unique issue and I could really use some suggestions on how to resolve it, as I have been stuck for a few days. I am currently trying to iterate through a df and create a new column that contains the difference between two columns. The issue is that I need the difference between two separate columns on different rows (so the difference between column A on row 1 and column B on row 2, stored in column C on row 1). Does anyone have any experience doing this?

untold bloom Aug 5, 2022, 6:42 PM

#

hi, from that explanation, it seems like df["new"] = df["A"] - df["B"].shift(-1) should work?

#

.shift(-1) will "pull up" column B by one row; then, when it is subtracted from df["A"], it is as if you're subtracting row k + 1 of B from row k of A
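Here is a minimal sketch of that one-liner. The columns A and B hold made-up values.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]})

# shift(-1) moves B up one row, so row k pairs A[k] with B[k + 1];
# the last row has no following B value and ends up NaN.
df["C"] = df["A"] - df["B"].shift(-1)
print(df)
```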

solid quail Aug 5, 2022, 6:43 PM

#

Thank you so much! I will give it a try

desert oar Aug 5, 2022, 6:56 PM

#

you can also use .shift() or .shift(1) to shift forward instead

#

!e ```python
import pandas as pd

data = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [33, 22, 11],
}, index=list('abc'))

print(data['x'])
print(data['x'].shift(-1))
print(data['x'].shift())  # default value is 1
```

arctic wedgeBOT Aug 5, 2022, 6:57 PM

#

@desert oar :white_check_mark: Your 3.10 eval job has completed with return code 0.

001 | a    1
002 | b    2
003 | c    3
004 | Name: x, dtype: int64
005 | a    2.0
006 | b    3.0
007 | c    NaN
008 | Name: x, dtype: float64
009 | a    NaN
010 | b    1.0
011 | c    2.0
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ecegevatub.txt?noredirect

desert oar Aug 5, 2022, 6:57 PM

#

note that shift shifts the data while keeping the index in place. this is perfect for doing computations exactly like what @solid quail described

tropic matrix Aug 5, 2022, 8:08 PM

#

If you have a regression problem, where the output can range from 1000 to 10 billion, is it a good idea to scale the output down? (atm i'm using the sklearn minmaxscaler)

modest onyx Aug 5, 2022, 8:12 PM

#

And then you just did one hot?

#

That's so weird cuz I did just that and my model is just getting stuck at a local minimum where it outputs the most frequent character in the text

#

In my case it's either e or spaces

#

And when I researched on Google, it looks like people use more fancy encoding/decoding methods to avoid this issue

dusty valve Aug 5, 2022, 8:22 PM

#

modest onyx In my case it's either e or spaces

E truly is the most magnificent

#

even our computers agree

lapis sequoia Aug 5, 2022, 8:28 PM

#

import numpy as np
from nnfs.datasets import spiral_data

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.rand(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

X, y = spiral_data(samples=100, classes=3)
dense1 = Layer_Dense(2, 3)
dense1.forward(X)
print(dense1.output.shape)

Any idea why dense1's output has the shape (300, 3)? Shouldn't it be 200, since I'm inputting 100 different x and y values for each dot?

#

Where does the 300 come from?

rich trail Aug 5, 2022, 8:30 PM

#

any recommendations on courses with projects, or a course that goes over the practical parts to get me set up creating projects?

#

im currently waiting on financial aid for the 2nd Andrew Ng course in the ML Specialization, but since it takes 2 weeks im looking for something while i wait

desert oar Aug 5, 2022, 8:32 PM

#

rich trail any recommendations on courses with projects or a course that goes over practica...

i recommend "just doing projects", e.g. kaggle. don't worry too much about high score, focus on figuring out workflows and tools that you like using and feel comfortable with.

rich trail Aug 5, 2022, 8:34 PM

#

and just google and watch videos when i need?

desert oar Aug 5, 2022, 8:34 PM

#

no, avoid videos and google and medium.com

rich trail Aug 5, 2022, 8:34 PM

#

so books? i got access to O'Reilly Learning for free rn

desert oar Aug 5, 2022, 8:34 PM

#

practice reading actual software docs, and get a textbook if you don't already have one

rich trail Aug 5, 2022, 8:34 PM

#

they got all their books and some videos associated with the books

#

u recommend skipping that too?

desert oar Aug 5, 2022, 8:34 PM

#

no, those are good books. the videos are probably fine

#

OpenIntro has a statistics textbook too

rich trail Aug 5, 2022, 8:35 PM

#

ye im doing the math as im preparing for a master's in CS with a focus on ML

wooden sail Aug 5, 2022, 8:35 PM

#

lapis sequoia class Layer_Dense: def __init__(self, n_inputs, n_neurons): se...

i don't think this is doing what you think it's doing. you're saying the weights are of size 2 x 3 and the biases of size 3. what does your data look like? you're dotting X with the weights, so that seems to imply that X is of size 300 x 2

desert oar Aug 5, 2022, 8:35 PM

#

there is just so much "blogspam" out there on beginner-level ML that it's very difficult to avoid as a newbie

rich trail Aug 5, 2022, 8:35 PM

#

just wanna get into the practical parts beforehand

desert oar Aug 5, 2022, 8:35 PM

#

yeah, then just spend your time messing with data

#

if you are going to read blog junk, at least make sure it's from towardsdatascience.com, their blog junk is still somewhat junky but at least the minimum quality is somewhat high

#

rando youtube videos are unlikely to be useful

#

fast.ai is also a free online course and collection of resources

lapis sequoia Aug 5, 2022, 8:36 PM

#

wooden sail i don't think this is doing what you think it's doing. you're saying the weights...

The biases are 0 because of the np.zeros function

modest onyx Aug 5, 2022, 8:36 PM

#

lapis sequoia class Layer_Dense: def __init__(self, n_inputs, n_neurons): se...

I think randn is better for initialization no?

wooden sail Aug 5, 2022, 8:36 PM

#

lapis sequoia The biases are 0 because of the np.zeros function

that doesn't matter. what's the shape of X?

rich trail Aug 5, 2022, 8:36 PM

#

desert oar fast.ai is also a free online course and collection of resources

i'll check it out and try some of the o'reilly books

lapis sequoia Aug 5, 2022, 8:37 PM

#

wooden sail that doesn't matter. what's the shape of X?

it's 2 dimensional, 1 x value and 1 y value

rich trail Aug 5, 2022, 8:37 PM

#

aside from learning ml, any recommendations on learning how to actually work with data @desert oar

#

just learn by practise?

desert oar Aug 5, 2022, 8:37 PM

#

rich trail asides from learning ml, any recommendations on learning how to actually work wi...

learn statistics, read about data visualization, learn about databases, learn about missing data imputation (subset of statistics), practice

wooden sail Aug 5, 2022, 8:38 PM

#

lapis sequoia it's 2 dimensional, 1 x value and 1 y value

can you just print the shape? and what do you expect spiral_data to return with the parameters you gave it?

#

because matrix products between matrices of sizes m x n and n x k are of size m x k, so naturally your X is of size 300 x 2
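Here is a quick NumPy shape check of that rule, using the same layer sizes. The inputs are placeholder zeros, not spiral data.

```python
import numpy as np

n_samples, n_features, n_neurons = 300, 2, 3

X = np.zeros((n_samples, n_features))                     # (m, n)
weights = 0.01 * np.random.rand(n_features, n_neurons)    # (n, k)
biases = np.zeros((1, n_neurons))                         # (1, k), broadcast over rows

# (m, n) @ (n, k) -> (m, k); adding the (1, k) bias keeps that shape.
output = np.dot(X, weights) + biases
print(output.shape)  # (300, 3)
```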

lapis sequoia Aug 5, 2022, 8:38 PM

#

wooden sail can you just print the shape? and what do you expect spiral_data to return with ...

X.shape returns

(300, 2)

wooden sail Aug 5, 2022, 8:39 PM

#

indeed. so the question is what did you expect the shape to be given the parameters you passed to spiral data

lapis sequoia Aug 5, 2022, 8:39 PM

#

Ah... I thought it would be 100 since I specified samples=100

#

maybe I misunderstood what samples=100 means

wooden sail Aug 5, 2022, 8:39 PM

#

what does spiral data return?

lapis sequoia Aug 5, 2022, 8:40 PM

#

wooden sail what does spiral data return?

what do you mean?

wooden sail Aug 5, 2022, 8:40 PM

#

... what it does return lol

#

what is the data supposed to look like?

#

you specified 3 classes, too

#

how are those supposed to be returned

lapis sequoia Aug 5, 2022, 8:41 PM

#

I just told you the shape of the X values; the y values (labels) are either 0, 1, or 2.

#

an example of the X value would be:

[-8.67685974e-01 -3.85566771e-01]
 [-9.11101043e-01  3.01196396e-01]

steady basalt Aug 5, 2022, 8:41 PM

#

opinions on balancing the test set in binary classification of medical record data?

rich trail Aug 5, 2022, 8:41 PM

#

desert oar learn statistics, read about data visualization, learn about databases, learn ab...

appreciate it!

wooden sail Aug 5, 2022, 8:41 PM

#

yes dude, that's not what i'm asking. you were expecting it to return something of shape 100 x 2. why?

#

what do the classes mean?

#

how are the classes being concatenated?

lapis sequoia Aug 5, 2022, 8:42 PM

#

what are you even talking about

#

you've lost me bro

wooden sail Aug 5, 2022, 8:42 PM

#

i'm asking you what your code means

#

X, y = spiral_data(samples=100, classes=3)

#

you wrote this, yes? what does this mean? what do the classes do there?

lapis sequoia Aug 5, 2022, 8:43 PM

#

I was expecting spiral_data to return shape (100, 2) because I thought samples=100 would mean 100 different instances of X

wooden sail Aug 5, 2022, 8:43 PM

#

what does spiral_data return? did you write this function? is it from another lib?

lapis sequoia Aug 5, 2022, 8:43 PM

#

wooden sail what does spiral_data return? did you write this function? is it from another li...

ah

#

it's from a library called nnfs; they created spiral_data, where X holds (x, y) values, which are points, and y holds classes (0, 1, or 2), which show which spiral the dot (X) belongs to

#

And I thought there would be 100 rows because I specified spiral_data(samples=100) and not 300 rows

#

that's what samples=100 should mean, right?

wooden sail Aug 5, 2022, 8:47 PM

#

i'm looking for the docs for this func but can't find anything useful

tropic matrix Aug 5, 2022, 8:48 PM

#

tropic matrix If you have a regression problem, where the output can range from 1000 to 10 bil...

on top of that, does having the values range from 0-1 affect how mse is calculated? (the square of a decimal between 0 and 1 will be less than the value itself)

modest onyx Aug 5, 2022, 8:48 PM

#

one thing you can do is just tell us the shape and type of the thing it returns

#

using print

#

print(X.shape, y.shape)
print(type(X), type(y))

lapis sequoia Aug 5, 2022, 8:50 PM

#

X, y = spiral_data(samples=100, classes=3)
print(X.shape, y.shape)

returns

(300, 2) (300,)

lapis sequoia Aug 5, 2022, 8:50 PM

#

modest onyx print(X.shape, y.shape) print(type(X), type(y))

they're both numpy.ndarray

modest onyx Aug 5, 2022, 8:50 PM

#

oh so the input shape is 300?

lapis sequoia Aug 5, 2022, 8:51 PM

#

Yeah

modest onyx Aug 5, 2022, 8:51 PM

#

wait this doesn't make sense

lapis sequoia Aug 5, 2022, 8:51 PM

#

how so?

modest onyx Aug 5, 2022, 8:51 PM

#

if you have 100 samples and the input shape is a vector of size 300

#

then I would expect the shape to be (100, 300)

wooden sail Aug 5, 2022, 8:51 PM

#

the only question is why is it 300 x 2 instead of 100 x 2

modest onyx Aug 5, 2022, 8:52 PM

#

oh

lapis sequoia Aug 5, 2022, 8:52 PM

#

wooden sail the only question is why is it 300 x 2 instead of 100 x 2

that's what I'm saying

#

let me show a different output with different sample and class values

wooden sail Aug 5, 2022, 8:52 PM

#

i know, i'm just getting abdulhaleem up to speed. yeah, change classes to 1

#

maybe it generates n samples per class

#

i can't find the docs anywhere

lapis sequoia Aug 5, 2022, 8:52 PM

#

X, y = spiral_data(samples=10, classes=2)
print(X.shape, y.shape)

returns

(20, 2) (20,)

#

and
X, y = spiral_data(samples=100, classes=1)
print(X.shape, y.shape)

prints

(100, 2) (100,)

wooden sail Aug 5, 2022, 8:53 PM

#

ok, so it generates n samples per class indeed

modest onyx Aug 5, 2022, 8:53 PM

#

actually yeah that makes sense

#

I'm guessing that the spiral_data returns x,y coordinates representing a spiral

#

hence why it's (100, 2)

modest onyx Aug 5, 2022, 8:54 PM

#

lapis sequoia X, y = spiral_data(samples=10, classes=2) print(X.shape, y.shape) retu...

actually nvm then what does classes mean

lapis sequoia Aug 5, 2022, 8:54 PM

#

modest onyx I'm guessing that the spiral_data returns x,y coordinates representing a spiral

right, but when I do samples=100, classes=3 why does it return (300, 2)

wooden sail Aug 5, 2022, 8:54 PM

#

100 samples for each of the classes

#

just like in the examples you did rn
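The nnfs source isn't quoted in this thread, so what follows is a hypothetical re-implementation, modeled on the common CS231n-style spiral generator and named `spiral_data_sketch` here. It behaves the way the experiments above suggest: it generates `samples` points per class and stacks the classes one after another, so the row count is samples × classes.

```python
import numpy as np


def spiral_data_sketch(samples, classes, seed=0):
    """Hypothetical spiral generator: `samples` points *per class*."""
    rng = np.random.default_rng(seed)
    X = np.zeros((samples * classes, 2))
    y = np.zeros(samples * classes, dtype="uint8")
    for class_number in range(classes):
        rows = slice(samples * class_number, samples * (class_number + 1))
        r = np.linspace(0.0, 1.0, samples)  # radius grows along the arm
        t = np.linspace(class_number * 4, (class_number + 1) * 4, samples)
        t = t + rng.normal(scale=0.2, size=samples)  # angular noise
        X[rows] = np.c_[r * np.sin(t * 2.5), r * np.cos(t * 2.5)]
        y[rows] = class_number
    return X, y


X, y = spiral_data_sketch(samples=100, classes=3)
print(X.shape, y.shape)  # (300, 2) (300,)
print(np.bincount(y))    # [100 100 100]
```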

modest onyx Aug 5, 2022, 8:55 PM

#

oh interesting

#

well then I'd expect the matrix multiply to return shape n_samples * num_classes

steady basalt Aug 5, 2022, 8:55 PM

#

opinions on balancing the test set?

lapis sequoia Aug 5, 2022, 8:56 PM

#

wooden sail 100 samples for each of the classes

so it's returning a new x, y coordinate for each class per sample...

wooden sail Aug 5, 2022, 8:56 PM

#

yes

lapis sequoia Aug 5, 2022, 8:56 PM

#

what I was expecting it to return was 100 samples, with 33% of them in class 1, 33% in class 2, and 34% in class 3

#

idk why I was expecting it to act like that but yeah it doesn't act like that

wooden sail Aug 5, 2022, 8:57 PM

#

i would've expected that too

modest onyx Aug 5, 2022, 8:57 PM

#

yeah that's pretty confusing

lapis sequoia Aug 5, 2022, 8:57 PM

#

yeah, thanks for your help Edd and Abdulhaleem

modest onyx Aug 5, 2022, 8:57 PM

#

bad design choice if you ask me

steady basalt Aug 5, 2022, 9:34 PM

#

anyone know why a neural net would do this if the training data was oversampled to balance?

#

random forest manages just fine to reach 0.6 recall and 0.1 precision for class 1

cold smelt Aug 5, 2022, 10:14 PM

#

I'd like to use NLP to generate answers in a simple oracle bot, but I haven't dealt with ML much yet. What are my options?

modest onyx Aug 5, 2022, 10:38 PM

#

I haven't done any NLP but I've heard of GPT-2, which could be a good option

modest onyx Aug 5, 2022, 10:46 PM

#

dusty valve **E** truly is the most magnificent

actually although using categorical sampling at inference time significantly improved my results, it turns out my biggest oopsie was thinking that torch.nn.functional.cross_entropy accepts a probability distribution as input when it actually accepts logits

#

now my model actually works
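For reference, `torch.nn.functional.cross_entropy` combines log-softmax and negative log-likelihood, so it expects raw logits. The NumPy sketch below re-creates that math; it is not PyTorch's code. It shows what goes wrong when probabilities are passed instead: softmax gets applied twice, the class scores are squashed together, and even a confident, correct prediction keeps a high loss.

```python
import numpy as np


def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy_from_logits(logits, targets):
    # Mean negative log-probability of each row's target class.
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(targets)), targets].mean()


logits = np.array([[4.0, 0.0, -2.0]])  # confident in class 0
targets = np.array([0])
probs = np.exp(log_softmax(logits))    # what a softmax output layer would emit

correct = cross_entropy_from_logits(logits, targets)
# Passing probabilities by mistake: softmax is effectively applied a second time.
wrong = cross_entropy_from_logits(probs, targets)
print(correct, wrong)  # the "wrong" loss is far larger for the same prediction
```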

lapis sequoia Aug 6, 2022, 2:30 AM

#

Does anyone have some tips, or know a good tutorial, for someone who wants to create a classification model on images on my own PC? A lot of tutorials just use the built-in datasets.

modest onyx Aug 6, 2022, 2:32 AM

#

so you want to do classification on your own dataset?

#

that isn't that different from using built in datasets

lapis sequoia Aug 6, 2022, 2:33 AM

#

modest onyx so you want to do classification on your own dataset?

yeah

lapis sequoia Aug 6, 2022, 2:34 AM

#

modest onyx that isn't that different from using built in datasets

I don't know how to get my code to read the images off my PC or how to label them for my model

modest onyx Aug 6, 2022, 2:34 AM

#

In pytorch for example, the built-in datasets are built using the DataLoader and Dataset classes. So all you have to do is get your images and wrap them in those classes

lapis sequoia Aug 6, 2022, 2:34 AM

#

Ok, I'm gonna try that

modest onyx Aug 6, 2022, 2:34 AM

#

oh

#

wait so you're still not able to turn your images into tensors?

lapis sequoia Aug 6, 2022, 2:35 AM

#

No, I'm not

modest onyx Aug 6, 2022, 2:35 AM

#

well there's a long way to do it and an easy way to do it

#

but if you're already working within a framework then just using DataLoader and Dataset can make your life pretty easy

#

they can do all that under the hood

lapis sequoia Aug 6, 2022, 2:36 AM

#

Do Dataloaders handle turning images into tensors?

modest onyx Aug 6, 2022, 2:38 AM

#

yeah I think so

#

but you might need to load the images as PIL

lapis sequoia Aug 6, 2022, 2:40 AM

#

Ok, trying that now

modest onyx Aug 6, 2022, 2:42 AM

#

Yeah I think once you are able to turn your images into PIL, then the rest should be easy

lapis sequoia Aug 6, 2022, 2:42 AM

#

Ok, I converted an image to a tensor

modest onyx Aug 6, 2022, 2:42 AM

#

but I'm not sure if that's the most efficient way

lapis sequoia Aug 6, 2022, 2:42 AM

#

I don't know about efficiency either. I just loaded the image as a PIL image and used a ToTensor() transform and it seems to work

modest onyx Aug 6, 2022, 2:42 AM

#

probably shouldn't matter since you only need to load the dataset once and you're done

lapis sequoia Aug 6, 2022, 2:42 AM

#

it returns 3 RGB values like expected

lapis sequoia Aug 6, 2022, 2:43 AM

#

modest onyx probably shouldn't matter since you only need to load the dataset once and you'r...

it's still better to have good habits, but yeah doesn't matter to me now.

modest onyx Aug 6, 2022, 2:43 AM

#

lapis sequoia I don't know about efficiency either. I just loaded the image as a PIL image and...

no I'm thinking about if you have say 100,000 images for example then there might be a more efficient way to do all this at the same time

#

rather than going one by one

#

but you probably don't need to worry about that

lapis sequoia Aug 6, 2022, 2:43 AM

#

Yeah

modest onyx Aug 6, 2022, 2:44 AM

#

are you using dataloader and datasets?

lapis sequoia Aug 6, 2022, 2:44 AM

#

not yet

#

I'm working with 1 image right now

#

Do you know how to convert a tensor to a numpy array?

modest onyx Aug 6, 2022, 2:45 AM

#

wait so you're not using either pytorch or tensorflow?

lapis sequoia Aug 6, 2022, 2:45 AM

#

I'm using pytorch

#

my tensor is currently in the form of a ToTensor object

modest onyx Aug 6, 2022, 2:46 AM

#

if you're only dealing with one image then you don't need a dataloader

#

but why would you want to turn it into a numpy array if you're using pytorch?

#

you want to show it using matplotlib?

#

in that case it's just .numpy()

lapis sequoia Aug 6, 2022, 2:47 AM

#

well the ToTensor object isn't the same thing as a Tensor object right?

#

Trying to figure out how to convert it to a tensor object or a numpy array

#

unless they're the same thing

#

Ok, not the same because when I run "my_tensor.shape" I get

AttributeError: 'ToTensor' object has no attribute 'shape'

modest onyx Aug 6, 2022, 2:49 AM

#

ToTensor is a class, not a function

#

calling ToTensor() returns a transform object that you then call on images (PIL images or numpy arrays) to convert them into tensors

#

if you want to use a function right away, there's a pil_to_tensor function

#

but ToTensor is good if you want to compose it with other transformations

#

torchvision.transforms.functional.pil_to_tensor might be what you're looking for and you can import it

#

also I checked one of my recent projects and it looks pretty simple to load an entire folder of images as a dataloader

lapis sequoia Aug 6, 2022, 2:52 AM

#

I used "PILToTensor" and it worked as well

lapis sequoia Aug 6, 2022, 2:53 AM

#

modest onyx also I checked one of my recent projects and it looks pretty simple to load an e...

Ok, I'll try it and see how it goes

#

wait... I could've just kept using ToTensor and set a variable to the function and it would've worked

modest onyx Aug 6, 2022, 2:53 AM

#

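from torch.utils.data import DataLoader
from torchvision import datasets, transforms

image_size = 64  # hypothetical value; not given in the original snippet

transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    # other transformations
])

# ImageFolder treats each subfolder of root as one class;
# data/fiftyk has a single subfolder in it where all the images are
train_dataset = datasets.ImageFolder(root="data/fiftyk", transform=transform)
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)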

modest onyx Aug 6, 2022, 2:54 AM

#

lapis sequoia wait... I could've just kept using ToTensor and set a variable to the function a...

yeah ik but I'd say it's cleaner to use pil_to_tensor

#

maybe that's just me though

#

“Shame I want to get us coming from his s1elling table!” said Stan. “Harry yelled after 
chance of glittening of Otter, while Kuttic tells me.” 

They all thowed his broom below at the bit past twelve 
Prifet consulets, had said, “I always middle shop I told your common room, the 
statue and next way. “So you’ve eaten to do, Harry ... it’s dead ideas had had 
exams to frightens, and she — what was a rust.

#

Trained for 2 epochs and already this good 😵‍💫

modest onyx Aug 6, 2022, 3:05 AM

#

lapis sequoia wait... I could've just kept using ToTensor and set a variable to the function a...

Also look at the source code for PILToTensor

#

it literally returns F.pil_to_tensor

Screen_Shot_2022-08-05_at_8.05.19_PM.png

tropic matrix Aug 6, 2022, 4:48 AM

#

If you have a regression problem, where the output can range from 1000 to 10 billion, is it a good idea to scale the output down? (atm i'm using the sklearn minmaxscaler, which scales it to be between 0 and 1)

on top of that, does having the values range from 0-1 affect how mse is calculated? (the square of a decimal between 0 and 1 will be less than the value itself)

wooden sail Aug 6, 2022, 4:51 AM

#

yes, it's a good idea because the gradients will depend on the size of the values the function takes. rescaling prevents exploding gradients

#

the MSE will behave as usual

#

what does matter about the MSE, independent of what you mentioned, is that small error values are effectively "ignored". this happens regardless of the error dynamic range, and is one of the reasons regularization is helpful

#

the TL;DR is "yes it's a good idea" and "no, MSE still works the same"
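Here is a small numeric check of that TL;DR, with made-up targets. Min-max scaling divides every error by the same constant (max - min), so the MSE just shrinks by that constant squared. The helper names are hypothetical. sklearn's MinMaxScaler with its default feature_range=(0, 1) applies the same (v - min) / (max - min) formula.

```python
import numpy as np

# Hypothetical targets and predictions spanning ~1e3 to 1e10.
y_true = np.array([1_000.0, 5e8, 1e10])
y_pred = np.array([2_000.0, 4e8, 9e9])

# In practice, min and max come from the training targets only.
y_min, y_max = y_true.min(), y_true.max()
scale = y_max - y_min


def to_unit(v):
    return (v - y_min) / scale


def from_unit(v):
    # Inverse transform, to report predictions back in the original units.
    return v * scale + y_min


mse_raw = np.mean((y_true - y_pred) ** 2)
mse_scaled = np.mean((to_unit(y_true) - to_unit(y_pred)) ** 2)

# Same errors divided by a constant: smaller numbers, same ranking between models.
print(np.isclose(mse_scaled, mse_raw / scale**2))       # True
print(np.allclose(from_unit(to_unit(y_pred)), y_pred))  # True
```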

modest onyx Aug 6, 2022, 5:08 AM

#

as long as there's no problem of outliers then minmax feature scaling is good

ebon hazel Aug 6, 2022, 7:03 AM

#

https://www.udemy.com/course/tensorflow-developer-certificate-machine-learning-zero-to-mastery/
https://www.udemy.com/course/artificial-intelligence-az/
https://www.udemy.com/course/data-science-machine-learningtheoryprojectsa-z-90-hours/

Are any of these Udemy courses good for data science? Or is there an even better course or resource I don't know of?

worthy phoenix Aug 6, 2022, 8:21 AM

#

im getting a memory error even tho htop says i have almost 3gb of memory available, idk why

wooden sail Aug 6, 2022, 8:23 AM

#

what are you trying to do?

primal shuttle Aug 6, 2022, 9:23 AM

#

@worthy phoenix you can double check that with psutil

#

import psutil
psutil.virtual_memory()

worthy phoenix Aug 6, 2022, 9:24 AM

#

got the reason for the error

steady basalt Aug 6, 2022, 9:24 AM

#

thesis results complete, not great but at least we can say 'u DEF dont have cancer'

#

what would you do

worthy phoenix Aug 6, 2022, 9:24 AM

#

its reading the whole model into memory and the model is about 10gb

steady basalt Aug 6, 2022, 9:25 AM

#

is it worth balancing the test set to get a better understanding of the model

primal shuttle Aug 6, 2022, 9:25 AM

#

If your set is overall unbalanced, then no

#

Preprocess, split with stratification, balance the training data, train the models, and then test on the imbalanced data with appropriate metrics
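Here is a NumPy-only sketch of that recipe, under stated assumptions. The labels are synthetic, plain random oversampling stands in for SMOTE, and the helper functions are hypothetical rather than from any library. Only the training indices get balanced, while the test set keeps its natural imbalance. The example also shows why accuracy alone is misleading on such a test set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical imbalanced labels (25:1, similar in spirit to 100k vs 4k).
y = np.array([0] * 1000 + [1] * 40)


def stratified_split(y, test_frac, rng):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


def random_oversample(idx, y, rng):
    """Duplicate minority-class indices (with replacement) up to the majority count."""
    classes, counts = np.unique(y[idx], return_counts=True)
    parts = []
    for cls in classes:
        cls_idx = idx[y[idx] == cls]
        extra = rng.choice(cls_idx, counts.max() - len(cls_idx), replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    return np.concatenate(parts)


def precision_recall(y_true, y_pred, positive=1):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train_idx, test_idx = stratified_split(y, test_frac=0.25, rng=rng)
train_bal = random_oversample(train_idx, y, rng)  # only the training part is balanced
print(np.bincount(y[train_bal]))  # [750 750]
print(np.bincount(y[test_idx]))   # 250 vs 10: original imbalance kept

# A useless "always predict 0" model still scores ~96% accuracy here,
# which is why minority-class precision/recall are the metrics to watch.
baseline = np.zeros(len(test_idx), dtype=int)
accuracy = np.mean(baseline == y[test_idx])
precision, recall = precision_recall(y[test_idx], baseline)
print(round(accuracy, 3), precision, recall)
```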

steady basalt Aug 6, 2022, 9:27 AM

#

i trained this model on balanced data

#

but its hard to get a clear understanding of the model when 98% of the test data is a single class

#

also i used smote on the train set

primal shuttle Aug 6, 2022, 9:27 AM

#

If your set is balanced to begin with, you don't want to overengineer

steady basalt Aug 6, 2022, 9:28 AM

#

it was extremely unbalanced

#

like 100k vs 4k samples

primal shuttle Aug 6, 2022, 9:28 AM

#

Oh ok

primal shuttle Aug 6, 2022, 9:28 AM

#

primal shuttle Preprocess, split with stratification, balance the training data, train the mode...

Then this applies

steady basalt Aug 6, 2022, 9:28 AM

#

so u can say im training on lets say 4000 of each

#

then testing on 2000 and 500

#

BUT, if i tested on 500 and 500 i might better see how it determines class

#

from those results, would you say it's over-predicting class 0 because the available data doesn't allow for many class 1 predictions?

primal shuttle Aug 6, 2022, 9:29 AM

#

For the train/test splitting you can additionally use K-fold cross-validation, and then apply the balancing only to the training remainder of each fold
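Here is a sketch of stratified K-fold index generation. The helper is hypothetical and the labels are synthetic; in practice sklearn's `StratifiedKFold` does this job. Any balancing would go inside the loop and touch only each fold's training indices.

```python
import numpy as np


def stratified_kfold_indices(y, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold keeps the class proportions of y."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        # Deal this class's samples round-robin into the k folds.
        fold_of[idx] = np.arange(len(idx)) % k
    for fold in range(k):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)


y = np.array([0] * 1000 + [1] * 40)
for train_idx, val_idx in stratified_kfold_indices(y, k=5):
    # Oversample/SMOTE y[train_idx] here; leave y[val_idx] untouched.
    print(np.bincount(y[train_idx]), np.bincount(y[val_idx]))
```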

steady basalt Aug 6, 2022, 9:30 AM

#

wdym

primal shuttle Aug 6, 2022, 9:30 AM

#

check this out

Analytics Vidhya (guest blog): "Class Imbalance | Handling Imbalanced Data Using Python", an article listing ways of dealing with imbalanced classes in machine learning using Python.

steady basalt Aug 6, 2022, 9:30 AM

#

i did oversample

#

i used smote

#

maybe i shud try non-informed over sampling

#

because the data is v noisy?

#

AND its quite high dimensional, i one hot encoded a couple features

lapis sequoia Aug 6, 2022, 9:59 AM

#

Hi there, any suggestions on data science interview practice sites? I've used Hackerrank, Leetcode, and AceAI so far

steady basalt Aug 6, 2022, 10:25 AM

#

my experience so far with coding data science interviews is so negative

#

idk how u can give someone a bunch of tables in an env with code they've never seen before and expect them to return a table exactly how you like it in 10 mins, when it takes ages to understand what's even going on

#

especially when HackerRank's error output is bugged and invisible

gloomy anvil Aug 6, 2022, 11:00 AM

#

Hello, does anyone here have experience using statsmodels? I used it to make ACF plots like this:

steady basalt Aug 6, 2022, 11:00 AM

#

its good but i prefer stata

gloomy anvil Aug 6, 2022, 11:00 AM

#

gloomy anvil Aug 6, 2022, 11:01 AM

#

steady basalt its good but i prefer stata

ok, my question is though: I have 10 different TimeSeries that I want to plot in one ACF plot

steady basalt Aug 6, 2022, 11:01 AM

#

matplotlib lets u do that yes

#

im sure statsmodels isnt the graph itself, pyplot is

#

just add a bunch of plots and then draw it theyll all go in the space

#

on that axis

gloomy anvil Aug 6, 2022, 11:01 AM

#

how would I do this? Should I first sum the 10 Timeseries up to one timeseries and then run acfplot?

steady basalt Aug 6, 2022, 11:02 AM

#

well are you allowed to do that?

#

10 time series can't be represented by 1 time series

#

ud need to plot them as separate lines surely

gloomy anvil Aug 6, 2022, 11:03 AM

#

I have 10 different time series that are kind of correlated, but I want to see if there is generally some autocorrelation on a meta level.

steady basalt Aug 6, 2022, 11:04 AM

#

cant u do them alongside each other then as separate values

#

wouldnt that add to the correlation

gloomy anvil Aug 6, 2022, 11:04 AM

#

or is there a way to get the autocorrelations per lag per time series from the plot? And calculate a mean autocorrelation per lag?

steady basalt Aug 6, 2022, 11:04 AM

#

like multiple variables

gloomy anvil Aug 6, 2022, 11:06 AM

#

Well I already have created an acf plot per dataset and per predictor (all in all 80 plots), so on a detailed level I can already assess the autocorrelations. I just want to have an aggregated view of autocorrelations
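One way to get that aggregated view is to compute the sample autocorrelation per lag for each series, then average across series. The NumPy sketch below assumes synthetic AR(1) series and uses a hypothetical `acf` helper. `statsmodels.tsa.stattools.acf` returns the same kind of per-lag array for a single series if you'd rather not hand-roll it.

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (hypothetical helper)."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(xc * xc)
    return np.array([np.sum(xc[: len(xc) - k] * xc[k:]) / denom for k in range(nlags + 1)])


rng = np.random.default_rng(0)
# Hypothetical data: 10 AR(1) series of length 200 with coefficient 0.7.
n_series, length, phi = 10, 200, 0.7
noise = rng.normal(size=(n_series, length))
series = np.zeros((n_series, length))
for t in range(1, length):
    series[:, t] = phi * series[:, t - 1] + noise[:, t]

per_series = np.array([acf(s, nlags=10) for s in series])  # shape (10, 11)
mean_acf = per_series.mean(axis=0)  # average autocorrelation at each lag
print(np.round(mean_acf, 2))
```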

unique flame Aug 6, 2022, 1:05 PM

#

Been training this YOLO algorithm for 6.5 hours now, through Google Colab. I watched 2 movies and died a few times playing PS4 games, all while keeping the session active...

storm sigil Aug 6, 2022, 1:50 PM

#

y_1 = lambda x: x**2
plt.scatter(list(range(1000)),[y_1(a) for a in range(1000)], s=20, edgecolor='none', cmap=plt.cm.Blues)

The cmap doesn't work here. Why is that?

#

Screen_Shot_2022-08-06_at_7.35.24_PM.png

storm sigil Aug 6, 2022, 2:27 PM

#

got it

meager crater Aug 6, 2022, 2:27 PM

#

Hey is the structure of make_column_transformer right?

# First would need to deal with Binary Labeling

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

bin_cols = ["gender", "ever_married", "Residence_type"]
ohe_cols = ["work_type", "smoking_status"]

ct = make_column_transformer(
    (LabelBinarizer(), bin_cols),
    (OneHotEncoder(), ohe_cols),
    remainder="passthrough"
)

ct.fit(df)

Error:
TypeError: LabelBinarizer.fit_transform() takes 2 positional arguments but 3 were given
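LabelBinarizer is meant for target labels: its fit_transform takes only y. ColumnTransformer, however, calls fit_transform(X, y) on each transformer, which matches the "takes 2 positional arguments but 3 were given" error. Below is a sketch of one common alternative, assuming scikit-learn 0.23 or newer: encode the two-valued columns with OneHotEncoder(drop="if_binary"). The rows are invented to match the column names.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows using the column names from the question.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "ever_married": ["Yes", "No", "Yes"],
    "Residence_type": ["Urban", "Rural", "Urban"],
    "work_type": ["Private", "Self-employed", "Govt_job"],
    "smoking_status": ["never smoked", "smokes", "formerly smoked"],
    "age": [67.0, 61.0, 80.0],
})

bin_cols = ["gender", "ever_married", "Residence_type"]
ohe_cols = ["work_type", "smoking_status"]

ct = make_column_transformer(
    # drop="if_binary" keeps a single 0/1 column for each two-valued feature
    (OneHotEncoder(drop="if_binary"), bin_cols),
    (OneHotEncoder(handle_unknown="ignore"), ohe_cols),
    remainder="passthrough",  # "age" is passed through unchanged
)
out = ct.fit_transform(df)
print(out.shape)  # 3 binary + 3 + 3 one-hot + 1 passthrough = 10 columns
```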

wooden sail Aug 6, 2022, 2:55 PM

#

storm sigil

cmap is for 2D and 3D images. for 1d plots, you can directly specify the color of each curve with a letter or by making your own colors
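For plt.scatter specifically, cmap is used, but only when c is given numeric values to map to colors. The original call passes no c, so every point gets the default single color. The minimal sketch below assumes a non-interactive backend.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt

xs = list(range(1000))
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots()
# c supplies the numbers that cmap turns into colors (here: the y values).
points = ax.scatter(xs, ys, c=ys, cmap=plt.cm.Blues, s=20, edgecolor="none")
fig.colorbar(points, ax=ax)
```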

real oyster Aug 6, 2022, 3:02 PM

#

from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Conv2D(64, kernel_size=4, activation="relu", input_shape=(256, 256, 3)))
model.add(MaxPooling2D(4, 4))

model.add(Conv2D(32, kernel_size=3, activation="relu", padding="same"))
model.add(MaxPooling2D(3, 3))

model.add(Flatten())
model.add(Dense(32))
model.add(Dense(5, activation="softmax"))

#

I trained this model and got these results

Screen_Shot_2022-08-06_at_11.02.46_AM.png

#

Screen_Shot_2022-08-06_at_11.03.00_AM.png

barren snow Aug 6, 2022, 3:03 PM

#

Hi! I want to calculate the envelope of a sound signal (for each music note), and I found a tool called https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.hilbert.html. However, I have two questions about the example in the reference.

The first one is what are they doing in

signal = chirp(t, 20.0, t[-1], 100.0)
signal *= (1.0 + 0.5 * np.sin(2.0*np.pi*3.0*t))

I'm not sure what the numbers 20 and 100 in the first line mean.

The second question: I want to calculate the envelope of each sound event (starting at its onset), not of the whole sound. I already have the onsets and offsets. How do I calculate it?

Thanks!

#

real oyster Aug 6, 2022, 3:03 PM

#

real oyster

I was wondering if this is overfitting or otherwise non-ideal, and how you would change the model to increase accuracy

wooden sail Aug 6, 2022, 3:04 PM

#

barren snow Hi! I want to calculate the array of sound envelope of a signal(in each music no...

what do you mean per note?

wooden sail Aug 6, 2022, 3:06 PM

#

barren snow Hi! I want to calculate the array of sound envelope of a signal(in each music no...

also chirps require you to specify the starting and ending frequency, since the frequency changes over time. that's what the 20 and 100 are: start at 20 hz and end at 100 hz
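To make both answers concrete, here is a sketch that rebuilds the SciPy docs example (assuming the docs' `fs = 400` Hz and 1 s duration) and computes the Hilbert envelope of a single onset-to-offset segment only. The helper `segment_envelope` and the 0.2 s / 0.6 s onset/offset are hypothetical values for illustration:

```python
import numpy as np
from scipy.signal import chirp, hilbert

fs = 400.0                      # sample rate in Hz, as in the SciPy example
duration = 1.0
t = np.arange(int(fs * duration)) / fs

# chirp(t, f0, t1, f1): frequency sweeps from f0 = 20 Hz at t = 0
# to f1 = 100 Hz at t = t1 (here, the last sample time).
x = chirp(t, 20.0, t[-1], 100.0)
# Amplitude-modulate with a 3 Hz sinusoid so there is an envelope to recover.
x *= 1.0 + 0.5 * np.sin(2.0 * np.pi * 3.0 * t)

def segment_envelope(x, fs, onset_s, offset_s):
    """Hilbert envelope of x restricted to [onset_s, offset_s) seconds."""
    start = int(round(onset_s * fs))
    stop = int(round(offset_s * fs))
    return np.abs(hilbert(x[start:stop]))

# Hypothetical onset/offset pair for one note.
env = segment_envelope(x, fs, 0.2, 0.6)
```

With a list of (onset, offset) pairs you would call `segment_envelope` once per pair. Expect some distortion near the segment edges, since the Hilbert transform is computed on the cut segment alone.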

barren snow Aug 6, 2022, 3:06 PM

#

wooden sail what do you mean per note?

I mean each onset, i.e. each sound event (that might be more clear)

wooden sail Aug 6, 2022, 3:06 PM

#

i don't know what you mean by onset and offset here, you'd have to clarify a bit more

barren snow Aug 6, 2022, 3:07 PM

#

wooden sail also chirps require you to specify the starting and ending frequency, since the ...

Is it the maximum and minimum, or just the starting point and ending point?

wooden sail Aug 6, 2022, 3:07 PM

#

those two things are the same here

#

well. i lie. if you look at the spectrum, depending on the extent of the signal w.r.t. the total time duration, you'll get higher frequency harmonics.

#

let's go with starting and ending value of the modulation frequency

barren snow Aug 6, 2022, 3:08 PM

#

wooden sail i don't know what you mean by onset and offset here, you'd have to clarify a bit...

Sure, Onset refers to the beginning of a musical note and offset means the ending of a musical note

wooden sail Aug 6, 2022, 3:08 PM

#

and by note you mean, pitch?

barren snow Aug 6, 2022, 3:09 PM

#

What is b note?

wooden sail Aug 6, 2022, 3:09 PM

#

a typo 😛

storm sigil Aug 6, 2022, 3:10 PM

#

wooden sail cmap is for 2D and 3D images. for 1d plots, you can directly specify the color o...

yeah

barren snow Aug 6, 2022, 3:10 PM

#

wooden sail and by note you mean, pitch?

yes! You could put it that way

wooden sail Aug 6, 2022, 3:11 PM

#

this is kind of a difficult problem, depending on how complicated you want to make it

barren snow Aug 6, 2022, 3:12 PM

#

wooden sail well. i lie. if you look at the spectrum, depending on the extent of the signal ...

Really? I thought it was easy!

wooden sail Aug 6, 2022, 3:12 PM

#

the easiest answer is "you don't need a hilbert transform, but rather a fourier one", but that's probably not what you're looking for

barren snow Aug 6, 2022, 3:12 PM

#

hahaa

wooden sail Aug 6, 2022, 3:12 PM

#

there's ongoing research in this stuff

barren snow Aug 6, 2022, 3:12 PM

#

or other methods, maybe?

wooden sail Aug 6, 2022, 3:12 PM

#

do you know which notes you are looking for?

barren snow Aug 6, 2022, 3:12 PM

#

each note in this music piece

barren snow Aug 6, 2022, 3:13 PM

#

wooden sail there's ongoing research in this stuff

Librosa has one, I can show you.

#

But the same problem is that I don't know how to calculate it per sound event

wooden sail Aug 6, 2022, 3:14 PM

#

i'm reading a paper right now and they suggest using a filterbank, which is a generalization of the fourier approach i mentioned just now

barren snow Aug 6, 2022, 3:14 PM

#

Oh, cool. Let me check for a sec

wooden sail Aug 6, 2022, 3:15 PM

#

not only that, they seem to use a time-windowed approach, so it's really more like a short time fourier transform

#

i think that's what you're looking for

#

that'll give you a time-varying magnitude (well, really amplitude and phase, but the phase presumably won't matter much unless you want a super sophisticated method) for each frequency bin

#

that's how you make these "spectrograms"

#

each horizontal line is the time-varying magnitude of a bin or "note"
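A minimal STFT sketch of that idea using `scipy.signal.stft`, assuming a hypothetical 50 Hz test tone sampled at 400 Hz. Each row of `magnitude` is the time-varying magnitude of one frequency bin:

```python
import numpy as np
from scipy.signal import stft

fs = 400.0
t = np.arange(int(fs)) / fs
x = np.sin(2 * np.pi * 50.0 * t)  # hypothetical test tone at 50 Hz

# nperseg=64 gives 33 frequency bins spaced fs / 64 = 6.25 Hz apart
f, frames, Z = stft(x, fs=fs, nperseg=64)
magnitude = np.abs(Z)  # shape: (number of bins, number of time frames)

# magnitude[k] is the time-varying magnitude of the bin centred at f[k]
strongest_bin = f[np.argmax(magnitude.mean(axis=1))]
```

The window length trades frequency resolution (longer windows) against time resolution (shorter windows), which matters when notes are short.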

barren snow Aug 6, 2022, 3:17 PM

#

Great! Could you send me the link to the paper?

barren snow Aug 6, 2022, 3:17 PM

#

wooden sail each horizontal line is the time-varying magnitude of a bin or "note"

Yes, I think so too

wooden sail Aug 6, 2022, 3:18 PM

#

i just glossed over this because it mainly discusses the detection that goes AFTER. but just seeing that plot you can immediately tell it's something akin to an STFT, but presumably with filters that have a better band rejection http://dafx.de/paper-archive/2002/DAFX02_Duxbury_Sandler_Davis_note_onset_detection.pdf

#

if you're doing this for fun, i'd recommend starting with an STFT. if you're doing research, then it's time to read about polyphase filters

barren snow Aug 6, 2022, 3:19 PM

#

Also, Librosa has one, like this. Do you think it's the suitable way to calculate?

#

barren snow Aug 6, 2022, 3:19 PM

#

wooden sail if you're doing this for fun, i'd recommend to start with an STFT. if you're doi...

Yeah, I am doing the research, probably need to read it

#

😭

wooden sail Aug 6, 2022, 3:20 PM

#

i'm checking the docs and it makes something called an "onset strength envelope", which is going to be some variation of what i mentioned right now

#

idk how it finds the peaks, but it's probably something like a thresholded smoothed derivative

barren snow Aug 6, 2022, 3:21 PM

#

Finding the peaks is not a big problem for me, because there is a function for that called get_peak 🙂

wooden sail Aug 6, 2022, 3:21 PM

#

that's ok, but those have very low resolution

#

what kind of research are we talking 😛 super resolution parameter estimation?

barren snow Aug 6, 2022, 3:21 PM

#

wooden sail i'm checking the docs and it makes something called an "onset strength envelope"...

Yeah! onset strength envelope is what I am looking for

wooden sail Aug 6, 2022, 3:23 PM

#

i'm gonna check the peak_pick code really quick, let's see

barren snow Aug 6, 2022, 3:24 PM

#

wooden sail what kind of research are we talking 😛 super resolution parameter estimation?

Do you mean to find the peak?

wooden sail Aug 6, 2022, 3:24 PM

#

ah it's even simpler

#

it's just a simple heuristic: pick the max value in an interval if it's above a threshold
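A plain NumPy sketch of the heuristic described above: keep index `i` if it is the maximum of a window around it and above a threshold. This is not librosa's actual `peak_pick` (which has more parameters); `pick_peaks`, `half_window` and the toy envelope are hypothetical:

```python
import numpy as np

def pick_peaks(envelope, half_window, threshold):
    """Indices i where envelope[i] is the max of
    envelope[i - half_window : i + half_window + 1] and >= threshold."""
    envelope = np.asarray(envelope, dtype=float)
    peaks = []
    for i, value in enumerate(envelope):
        lo = max(0, i - half_window)
        hi = min(len(envelope), i + half_window + 1)
        if value >= threshold and value == envelope[lo:hi].max():
            peaks.append(i)
    return np.array(peaks, dtype=int)

# Toy onset-strength envelope with two clear bumps
env = [0.0, 0.2, 1.0, 0.3, 0.1, 0.05, 0.8, 0.2, 0.0]
peaks = pick_peaks(env, half_window=2, threshold=0.5)
```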

barren snow Aug 6, 2022, 3:25 PM

#

Cool

#

Is it this one?

wooden sail Aug 6, 2022, 3:25 PM

#

that will 'work' but you won't get state of the art results. depending on what your research is in, this is not good enough

barren snow Aug 6, 2022, 3:25 PM

#

scipy.signal.find_peaks(x, height=None, threshold=None, distance=None, prominence=None, width=None, wlen=None, rel_height=0.5, plateau_size=None)
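For comparison, the same toy envelope run through `scipy.signal.find_peaks`, where `height` plays the role of the threshold and `distance` the minimum spacing between peaks in samples (the toy data and parameter values are illustrative):

```python
from scipy.signal import find_peaks

env = [0.0, 0.2, 1.0, 0.3, 0.1, 0.05, 0.8, 0.2, 0.0]

# height: minimum peak value; distance: minimum samples between peaks
peaks, props = find_peaks(env, height=0.5, distance=3)
heights = props["peak_heights"]
```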

wooden sail Aug 6, 2022, 3:25 PM

#

they actually coded their own in librosa

#

lemme read how scipy does it

#

eh pretty similar

barren snow Aug 6, 2022, 3:26 PM

#

wooden sail that will 'work' but you won't get state of the art results. depending on what y...

okay

#

What do you "feel" about this? About calculate onset strength envelope.

#

wooden sail Aug 6, 2022, 3:27 PM

#

in fairness, this type of peak finder is indeed a maximum likelihood estimator of peak locations, but only if the underlying parametric model is "easily resolved"

barren snow Aug 6, 2022, 3:29 PM

#

Yeah, you're right...

wooden sail Aug 6, 2022, 3:32 PM

#

i'm not sure i fully understand what their onset strength computation is doing, i don't have enough time to go through all the details right now. they reference this mel spectrogram though, so this is a good place to start
[#] Böck, Sebastian, and Gerhard Widmer.
"Maximum filter vibrato suppression for onset detection."
16th International Conference on Digital Audio Effects,
Maynooth, Ireland. 2013.

#

at any rate, it's a filterbank, they're applying some bandpass filter to the signal to split it into bands over different time windows

#

this is a good place to start if your research is in onset detection itself. if not, and you just need this as an intermediate result, i'd say this is probably good enough. if this IS exactly what you're researching... the next question is whether you want to develop a new, better method or just make a survey of what is out there
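librosa describes its onset strength as a spectral flux envelope, computed on a (log) mel spectrogram with optional max filtering. As a simplified sketch, here is plain spectral flux on a linear STFT instead: sum the positive magnitude increases across bins from one frame to the next. The synthetic note starting at 0.5 s and the helper name `spectral_flux` are assumptions for illustration:

```python
import numpy as np
from scipy.signal import stft

def spectral_flux(x, fs, nperseg=64):
    """Sum over bins of positive frame-to-frame magnitude increases."""
    f, times, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)
    increase = np.maximum(np.diff(mag, axis=1), 0.0)  # ignore decreases
    return times[1:], increase.sum(axis=0)

fs = 400.0
t = np.arange(int(fs)) / fs
x = np.zeros_like(t)
x[200:] = np.sin(2 * np.pi * 50.0 * t[200:])  # hypothetical note starting at 0.5 s

frame_times, flux = spectral_flux(x, fs)
onset_time = frame_times[np.argmax(flux)]  # lands within about one hop of 0.5 s
```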

barren snow Aug 6, 2022, 3:35 PM

#

Sure! Thanks for the information and suggestions! I appreciate it!

lapis sequoia Aug 6, 2022, 5:05 PM

#

Does anyone know how I would go about making my own text-to-image generator app?

serene scaffold Aug 6, 2022, 5:48 PM

#

lapis sequoia Does anyone know how I would proceed making my own image generator from words ap...

You can use an existing one, but training your own "from scratch" would require you to learn very advanced techniques that you wouldn't be able to understand until you've been studying AI for a while.

#

Not to say that you'll never be able to do it. But it would be a supremely disappointing first project.

steady basalt Aug 6, 2022, 6:42 PM

#

lapis sequoia Does anyone know how I would proceed making my own image generator from words ap...

a LOT of hard coding lol. i recommend you use someone else's and just ask permission, or if it's free to use, put it in ur app

languid stratus Aug 6, 2022, 7:15 PM

#

Anyone know if it's possible to pass a spaCy object through an sklearn pipeline?

#

(I need the info in the object at different stages)

lapis sequoia Aug 6, 2022, 8:37 PM

#

Anyone know any open source ai image generator that generates purely based on word descriptions?

ebon hazel Aug 6, 2022, 8:59 PM

#

lapis sequoia Anyone know any open source ai image generator that generates purely based on wo...

DALL-E Mini and DALL-E

#

I think Dream too

steady basalt Aug 6, 2022, 9:42 PM

#

lapis sequoia Anyone know any open source ai image generator that generates purely based on wo...

disco diffusion

tropic matrix Aug 6, 2022, 10:31 PM

#

I'm solving a regression problem, but when i calculate the MSE for my loss it becomes "inf". I believe this is because my data ranges from 1,000 to 2,147,000,000, but what should i do to solve it?

#

(using keras)

#

i have rmse as a metric, and that displays a valid number, but when the loss is initially high it is impossible for it to be displayed as mse

#

so is it a viable solution to set objectives (like for early stopping, reducing lr, and hyper parameter tuning) to be the validation rmse?

wooden sail Aug 6, 2022, 10:45 PM

#

a quick fix is to change the dynamic range of the data

tropic matrix Aug 6, 2022, 10:52 PM

#

wooden sail a quick fix is to change the dynamic range of the data

and how would i go about doing that?
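Two common ways to shrink the dynamic range of the target, as a sketch on made-up values spanning the 1,000 to 2,147,000,000 range mentioned above. Either way, you train on the transformed target and invert the model's predictions afterwards; for standardization, compute the mean and std on the training split only:

```python
import numpy as np

# Hypothetical targets spanning the range from the question
y = np.array([1_000, 50_000, 3_000_000, 2_147_000_000], dtype=np.float64)

# Option 1: log transform; compresses roughly 7 to 21.5
y_log = np.log1p(y)
y_back = np.expm1(y_log)          # invert predictions made in log space

# Option 2: standardize (zero mean, unit variance) with training-set stats
mean, std = y.mean(), y.std()
y_std = (y - mean) / std
y_back2 = y_std * std + mean      # invert predictions made in standardized space
```

With targets spread over six orders of magnitude, the log transform is often the better fit, because standardizing leaves the small values squashed near the same number.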

rapid cedar Aug 7, 2022, 2:05 AM

#

whats the best modules to learn b4 starting doing ML

desert oar Aug 7, 2022, 2:24 AM

#

rapid cedar whats the best module**s** to learn b4 starting doing ML

matplotlib, numpy, pandas. but you will almost certainly learn how to use these in the process of learning machine learning, i wouldn't worry too much about them. focus on being a good general-purpose programmer and being comfortable with python.

rapid cedar Aug 7, 2022, 3:18 AM

#

ok

modest onyx Aug 7, 2022, 4:09 AM

#

rapid cedar whats the best module**s** to learn b4 starting doing ML

it's the other way around I think

#

you should learn a bit of ML then start learning these modules

#

or at the very least be doing them in parallel

merry wadi Aug 7, 2022, 4:37 AM

#

Are there any ML models that have sequential/ordered splits?

I’d like the model to take into account specific columns first then others

crisp axle Aug 7, 2022, 5:07 AM

#

tropic matrix I'm solving a regression problem, but when i calculate the MSE for my loss it be...

you could try changing the units of your target variable

modest onyx Aug 7, 2022, 5:22 AM

#

tropic matrix I'm solving a regression problem, but when i calculate the MSE for my loss it be...

you can try normalizing the target data

tropic matrix Aug 7, 2022, 5:23 AM

#

modest onyx you can try normalizing the target data

i tried doing that and came up with a lot of problems, such as the model not predicting anything below 107319 (very weird number)

modest onyx Aug 7, 2022, 5:24 AM

#

that doesn't make sense

#

I suspect your implementation could have a bug

#

could you show me how you normalized your targets?

elfin whale Aug 7, 2022, 8:06 AM

#

which is the best course of tableau for a beginner

#

any one?

quaint leaf Aug 7, 2022, 8:41 AM

#

elfin whale which is the best course of tableau for a beginner

define best ? Easiest, cheapest ?

old grove Aug 7, 2022, 8:46 AM

#

Hello 👋, can anyone tell me what exactly generalization error is, and how it differs from train or test error?

elfin whale Aug 7, 2022, 9:00 AM

#

quaint leaf define best ? Easiest, cheapest ?

i need best one

elfin whale Aug 7, 2022, 9:02 AM

#

elfin whale i need best one

@quaint leaf ?

quaint leaf Aug 7, 2022, 9:03 AM

#

elfin whale @quaint leaf ?

I don't know what best means for you, so I cannot answer that without more info

elfin whale Aug 7, 2022, 9:04 AM

#

quaint leaf I don't know what best means for you, so I cannot answer that without more info

I mean best courses which are available

#

tell me the easiest one

quaint leaf Aug 7, 2022, 9:09 AM

#

elfin whale I mean best courses which are available

then definitely this one:
https://www.tableau.com/learn/classroom
https://www.tableau.com/tableau-training-pass
It's official from Tableau itself, and when you don't understand you can ask the teachers so that makes it easier
Normally I never recommend paid stuff actually but since you asked for the best 🙂

gray pasture Aug 7, 2022, 9:11 AM

#

Hi,
I need help with this question about clustering and finding the optimal number of clusters. I would appreciate any help.

https://datascience.stackexchange.com/questions/113303/finding-suitable-measure-for-optimal-number-of-clusters-for-the-specified-cluste

elfin whale Aug 7, 2022, 9:31 AM

#

quaint leaf then definitely this one: https://www.tableau.com/learn/classroom https://www.ta...

Can u give me the best in paid one

quaint leaf Aug 7, 2022, 9:36 AM

#

elfin whale Can u give me the best in paid one

yes that's a paid one

elfin whale Aug 7, 2022, 9:44 AM

#

quaint leaf yes that's a paid one

ok

elfin whale Aug 7, 2022, 9:45 AM

#

quaint leaf yes that's a paid one

cheapest?

quaint leaf Aug 7, 2022, 9:48 AM

#

elfin whale cheapest?

this one is pretty complete (and free)
https://www.youtube.com/watch?v=aHaOIvR00So

thick marlin Aug 7, 2022, 10:08 AM

#

Hello, I'm trying to remove the sky from this tree picture. I have used k-means to cluster the colors, and this is the output. Now I need to remove the sky. What would be the best way to go about it?

#

#

original picture

glacial wadi Aug 7, 2022, 11:42 AM

#

How can I predict 2020 values with machine learning

steady basalt Aug 7, 2022, 12:26 PM

#

Looks like regression at first glance

serene scaffold Aug 7, 2022, 12:52 PM

#

do you know what an activation function is?

#

each image goes through the whole network. what changes with each image is the output, not whether or not it goes all the way through.

#

and keep in mind that we're talking about mathematical functions, which (unlike Python functions) always have an output.

#

anyway, activation functions are non-linear functions

#

here's a great comment from Emyrs about what activation functions are for

wooden sail Aug 7, 2022, 1:07 PM

#

as for the 128, this determines how many weights and biases there are. how to pick this "well" is difficult to answer and it is often the case one has to try a couple different configurations to see which one works best

#

in estimation theory this is called the "model order" and one tries to strike a balance between having too few and too many parameters. too many means you have quite a bit of "descriptive power", but it is both difficult to tune the parameters correctly and it is easy to overfit. if you have too few parameters, you'll simply lose predictive power

#

a more in depth discussion requires quite some information theory and statistics

#

the number stands for the size of the output of that layer. that means you have an input of 28*28 and an output of 128. one dense layer is an affine transformation with a weight matrix of size (layer input x layer output) and a bias vector of size (layer output), so you're telling the layer to grab the input of size 28*28, multiply it by a weight matrix W of size 28*28 x 128, and add a bias vector b of size 128. then, the relu activation function is applied
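The same computation written out in NumPy, as a sketch with random placeholder weights (a trained layer would have learned values). It shows the shapes involved, the ReLU non-linearity, and where the layer's parameter count comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 28 * 28, 128
W = rng.normal(scale=0.01, size=(n_in, n_out))  # weight matrix, 784 x 128 (placeholder values)
b = np.zeros(n_out)                              # bias vector, length 128

def relu(z):
    return np.maximum(z, 0.0)

def dense_relu(x, W, b):
    """Affine map x @ W + b followed by ReLU; x has shape (batch, n_in)."""
    return relu(x @ W + b)

images = rng.random((5, 28, 28))        # hypothetical batch of 5 images
x = images.reshape(len(images), -1)     # flatten each image to length 784
h = dense_relu(x, W, b)                 # shape (5, 128)

n_params = W.size + b.size              # 784 * 128 + 128
```

`n_params` comes out to 100,480, which is the count Keras reports for a `Dense(128)` layer on a flattened 28x28 input.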

#

google happened to quickly grace me with this illustration 😛 this is exactly the same as the network you have

meager crater Aug 7, 2022, 1:44 PM

#

Hey I had a quick question about coefficient and p-value.
Based on the results I have noticed Radio is the correct answer for the first question and billboard is the answer for second; however, I am struggling to grasp the reasons for it. Could someone help out?

wooden sail Aug 7, 2022, 2:00 PM

#

that depends on which definition of p-value you are using in your course

meager crater Aug 7, 2022, 2:02 PM

#

no prior information was given or constructed

#

this question is completely unrelated to the previous ones

wooden sail Aug 7, 2022, 2:03 PM

#

i doubt that

#

at any rate, the standard usage of p-values is the probability of observing the measurement data under a base model, and the null hypothesis is that that base model explains the data. the smaller the value p, the more unlikely it is to observe the data under the base model, meaning that the parameters you derived are more significant and a better explanation of the data

#

that'd make p values of 0 have the interpretation of "the base model cannot explain the data at all, these new parameters should be accepted". then TV would be the most effective channel, as it has the strongest positive correlation and highest significance

#

my recommendation is to review the content, it looks like you skipped something either during the course, or one of its prerequisites. be it the interpretation of p-values, or the definition they decided to use here

meager crater Aug 7, 2022, 2:08 PM

#

See, that's the thing I'm confused about: the p-value was so small in both cases that it kind of cancels out, since both fall under the < 0.05 rejection rule. What confuses me is that Radio was more effective than TV in the answer

wooden sail Aug 7, 2022, 2:09 PM

#

so the question is, what definition of p-value did they give during the course

meager crater Aug 7, 2022, 2:09 PM

#

None, it is practice tests I'm doing with no prior information given

wooden sail Aug 7, 2022, 2:10 PM

#

well, then we can't answer the question, can we 😛 you don't have enough info

#

no way to know if the quiz is wrong, or if they were working with a different def

meager crater Aug 7, 2022, 2:12 PM

#

I put in TV at first and was given feedback that Radio was the right answer

steady basalt Aug 7, 2022, 2:16 PM

#

Why tf is my network crashing after 3 epochs

potent flame Aug 7, 2022, 2:18 PM

#

Wdum crashing

worthy phoenix Aug 7, 2022, 2:20 PM

#

hi there

#

i wanna deploy a ml inference program in rpi

#

but the thing is it seems to take too much time to give a single output

#

as we all know ml requires a lot of computing power

#

so should i buy a jetson instead? or what updates can i make to the rpi to get it working

lapis sequoia Aug 7, 2022, 2:32 PM

#

I wanted to become a data scientist. However I wanted to know what modules and things would I need to master in Python to be a competent one.

serene scaffold Aug 7, 2022, 2:34 PM

#

lapis sequoia I wanted to become a data scientist. However I wanted to know what modules and ...

I have a message in the pins about the major libraries, but your understanding of data science theory would matter more than your programming ability.

lapis sequoia Aug 7, 2022, 2:34 PM

#

Hi everyone. I have a corpus of text with a continuous dependent variable, and I would like to create a model that predicts this variable based on text. Which types of ML/DL models could be used for such a task?

lapis sequoia Aug 7, 2022, 2:35 PM

#

serene scaffold I have a message in the pins about the major libraries, but your understanding o...

so then any books?

serene scaffold Aug 7, 2022, 2:36 PM

#

lapis sequoia so then any books?

I would start with the second edition of "data science from scratch"

#

!resources data science

arctic wedgeBOT Aug 7, 2022, 2:36 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

serene scaffold Aug 7, 2022, 2:36 PM

#

^ you can go to that page and filter for books

lapis sequoia Aug 7, 2022, 2:37 PM

#

Ok

meager crater Aug 7, 2022, 2:39 PM

#

Hey, another question, this one is easier 😄 I've tried to understand Bayes' theorem but I'm struggling. The answer appears to be 16.7%, but I don't know how it was calculated, and https://www.youtube.com/watch?v=HZGCoVF3YvM didn't help, sadly.

YouTube

3Blue1Brown

Bayes theorem, the geometry of changing beliefs

wooden sail Aug 7, 2022, 2:46 PM

#

you gave some info but didn't show the question

meager crater Aug 7, 2022, 2:47 PM

#

sorry, added

wooden sail Aug 7, 2022, 2:47 PM

#

so bayes' theorem says that

#

.latex $P(A \vert B) = \frac{P(B \vert A)P(A)}{P(B)}$

strange elbowBOT Aug 7, 2022, 2:48 PM

#

$latex.png$

meager crater Aug 7, 2022, 2:48 PM

#

yup where P(A) is the probability of something happening?

wooden sail Aug 7, 2022, 2:49 PM

#

here, they are asking you for P(A|B), where A is the event that the person has cancer and B is the event that your model gave the output 1

meager crater Aug 7, 2022, 2:50 PM

#

okay so in this case the probability of the person having cancer is 0.01? since the total population is 0.01?

wooden sail Aug 7, 2022, 2:54 PM

#

i don't get what you mean by "the total population is 0.01", but yes, the probability of any person having cancer is 0.01

#

let me walk you through it because this has several steps

#

let's say P(A) is the probability a person has cancer. this is 0.01

#

next, P(B) is the probability your model outputs a 1. we will deal with this later

#

then, P(B|A) is the probability your model outputs 1 when a person has cancer. we are told the model is 99% correct when a person has cancer, so if a person is known to have cancer, the output is 1 99% of the time. P(B|A) = 0.99

#

we want P(A|B): if the output of the model is 1, how likely is it that they have cancer? we can compute that with bayes' theorem, but we need that missing value P(B)

meager crater Aug 7, 2022, 3:00 PM

#

wooden sail let's say P(A) is the probability a person has cancer. this is 0.01

due to the 1/100 having it

meager crater Aug 7, 2022, 3:02 PM

#

wooden sail we want P(A|B): if the output of the model 1, how likely is it they have cancer?...

so the top side of the equation will be 0.99 * 0.01

wooden sail Aug 7, 2022, 3:02 PM

#

correct

#

now, P(B) is the gotcha. your model can output 1 in two ways. it can be a true positive, or a false positive. a true positive happens when a person has cancer AND you detect it correctly. this is the probability of A and B happening at the same time. the probability of this happening is P(output 1 | person has cancer) * P(person has cancer).

#

we need something else here

meager crater Aug 7, 2022, 3:03 PM

#

the probability of false positive?

wooden sail Aug 7, 2022, 3:03 PM

#

.latex $P(A \cap B) = P(A \vert B) P(B)$

strange elbowBOT Aug 7, 2022, 3:03 PM

#

$latex.png$

meager crater Aug 7, 2022, 3:03 PM

#

so that will be 0.99 * 0.01?

wooden sail Aug 7, 2022, 3:03 PM

#

this is the probability of two dependent events happening together

#

right. so P(model gives 1 | person has cancer) = 0.99, as we were told. that means the probability of getting a TRUE positive is 0.99 * 0.01

#

but for the false positive, we need P(model gives 1 AND person does NOT have cancer)

#

this would be P(model gives 1 | person does not have cancer) * P(person does not have cancer)

#

we know the model is correct 95% of the time when a person does not have cancer

meager crater Aug 7, 2022, 3:05 PM

#

okay so that would be .99 -> prob of getting 1

wooden sail Aug 7, 2022, 3:05 PM

#

meager crater okay so that would be .99 -> prob of getting 1

no

#

since it's correct 95% of the time, that means it's wrong 5% of the time

#

so the missing term for a false positive is 0.05*0.99

meager crater Aug 7, 2022, 3:06 PM

#

yup that makes sense

wooden sail Aug 7, 2022, 3:06 PM

#

so we have

meager crater Aug 7, 2022, 3:07 PM

#

0.99 * 0.01 / (0.99 * 0.01 + 0.05 * 0.99)

wooden sail Aug 7, 2022, 3:07 PM

#

.latex $P(cancer \vert test = 1) = \frac{0.99 \cdot 0.01}{0.99 \cdot 0.01 + 0.05 \cdot 0.99} = 16.6 \cdots$

strange elbowBOT Aug 7, 2022, 3:07 PM

#

$latex.png$

wooden sail Aug 7, 2022, 3:07 PM

#

i missed the % mark
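The same calculation as a small function, with the numbers from the question: prior P(cancer) = 0.01, P(output 1 | cancer) = 0.99, and P(output 0 | no cancer) = 0.95. The names `sensitivity` and `specificity` are the usual terms for those last two, not wording from the quiz:

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(cancer | test = 1) via Bayes' theorem."""
    true_pos = sensitivity * prior               # P(test=1 | cancer) * P(cancer)
    false_pos = (1 - specificity) * (1 - prior)  # P(test=1 | no cancer) * P(no cancer)
    return true_pos / (true_pos + false_pos)     # denominator is P(test=1)

p = posterior_positive(prior=0.01, sensitivity=0.99, specificity=0.95)
print(f"{p:.1%}")  # 16.7%
```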

meager crater Aug 7, 2022, 3:07 PM

#

ahhh okay so that will be division of the population

#

then we just extract the probability of the model being incorrect in the false positive

#

let me try to flip the question and see what comes out

wooden sail Aug 7, 2022, 3:10 PM

#

idk what you mean by division of the population

meager crater Aug 7, 2022, 3:10 PM

#

so: you run the model and it predicts 0. What is the probability that this person does not have cancer?

#

that would be:

0.95 * 0.99 / (0.95 * 0.99 + 0.01 * 0.01)

wooden sail Aug 7, 2022, 3:13 PM

#

looks aight
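The flipped question as code, using the same assumed rates as before. The denominator is P(test = 0), which is made up of true negatives and false negatives:

```python
def posterior_negative(prior, sensitivity, specificity):
    """P(no cancer | test = 0) via Bayes' theorem."""
    true_neg = specificity * (1 - prior)    # P(test=0 | no cancer) * P(no cancer)
    false_neg = (1 - sensitivity) * prior   # P(test=0 | cancer) * P(cancer)
    return true_neg / (true_neg + false_neg)

p_neg = posterior_negative(prior=0.01, sensitivity=0.99, specificity=0.95)
# 0.95 * 0.99 / (0.95 * 0.99 + 0.01 * 0.01), i.e. about 0.9999
```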

meager crater Aug 7, 2022, 3:13 PM

#

okay great that makes a lot of sense, thanks Edd!

odd meteor Aug 7, 2022, 4:06 PM

#

Edd how do you invoke Sir Lancelot to display latex? 😊

mild dirge Aug 7, 2022, 4:09 PM

#

you just type .latex <latex stuff> I think @odd meteor

#

.latex $\sqrt{5}$

strange elbowBOT Aug 7, 2022, 4:09 PM

#

$latex.png$

odd meteor Aug 7, 2022, 4:13 PM

#

mild dirge you just type `.latex <latex stuff>` I think @odd meteor

Thanks, let me try it out 😊

#

.latex VIF_{Weight} = \frac{1}{1-R_{Weight}^{2}}

strange elbowBOT Aug 7, 2022, 4:15 PM

#

Failed to render input.

wooden sail Aug 7, 2022, 4:18 PM

#

you need .latex at the beginning of the message and $ to start and end an in-line equation environment

#

.latex so presumable this is in-line $x = 3$ but this other one makes a separate environment
\begin{align}
\phi(x) = \int f(\tau - x) d\tau
\end{align}
but i'm not sure if it'll work

strange elbowBOT Aug 7, 2022, 4:19 PM

#

$latex.png$

wooden sail Aug 7, 2022, 4:19 PM

#

oh sweet, all good

#

there's a typo but you get the idea

#

@odd meteor

odd meteor Aug 7, 2022, 4:29 PM

#

wooden sail @odd meteor

Thanks. I'm still yet to get used to Latex.

strange elbowBOT Aug 7, 2022, 4:30 PM

#

Failed to render input.

odd meteor Aug 7, 2022, 4:31 PM

#

odd meteor .latex VIF_{Weight} = \frac{1}{1-R_{Weight}^{2}}

Any idea why this isn't rendering? Where's the typo coming from

wooden sail Aug 7, 2022, 4:44 PM

#

you didn't put the $$ is my guess

#

.latex let's see

strange elbowBOT Aug 7, 2022, 4:44 PM

#

$latex.png$

wooden sail Aug 7, 2022, 4:44 PM

#

huh

#

.latex $VIF_{Weight} = \frac{1}{1-R_{Weight}^{2}}$

strange elbowBOT Aug 7, 2022, 4:45 PM

#

$latex.png$

wooden sail Aug 7, 2022, 4:46 PM

#

yeah latex doesn't like unexplained back slashes. \frac is not expected outside of math mode, so you need either $$ or another math environment

serene scaffold Aug 7, 2022, 4:47 PM

#

wooden sail yeah latex doesn't like unexplained back slashes. \frac is not expected outside...

They're all unexplained if you don't know latex

odd meteor Aug 7, 2022, 4:48 PM

#

wooden sail yeah latex doesn't like unexplained back slashes. \frac is not expected outside...

Ohh I get it now... Let me try it again

#

.latex $VIF = \frac{1}{1 - R^{2}} = \frac{1}{Tolerance}$

strange elbowBOT Aug 7, 2022, 4:48 PM

#

$latex.png$
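For reference, the formula above can be computed directly. For feature j, regress it on all the other features (with an intercept), take that fit's R², and return 1 / (1 − R²). A NumPy sketch on made-up data where `x2` is nearly a copy of `x1`:

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)                  # VIF = 1 / tolerance
    return out

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent
v = vif(np.column_stack([x1, x2, x3]))      # large, large, close to 1
```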

steady basalt Aug 7, 2022, 4:49 PM

#

It’s easier to just use Microsoft word equation writer

serene scaffold Aug 7, 2022, 4:51 PM

#

Just learn latex.

#

I can't stand wysiwyg

wooden sail Aug 7, 2022, 4:59 PM

#

word also accepts latex for typesetting equations

serene scaffold Aug 7, 2022, 5:27 PM

#

wooden sail word also accepts latex for typesetting equations

I like that, in the sense that word is becoming less like word.

odd meteor Aug 7, 2022, 5:33 PM

#

steady basalt It’s easier to just use Microsoft word equation writer

Yeah, it's easier to use MS Word to write math equations, but writing them in JupyterLab or Jupyter Notebook requires LaTeX; except, of course, there's now a new trick to import equations written in MS Word into Jupyter.

wooden sail Aug 7, 2022, 5:34 PM

#

this is the worst conversation ever, jupyter and word brought together

serene scaffold Aug 7, 2022, 5:40 PM

#

wooden sail this is the worst conversation ever, jupyter and word brought together

No conversation involving Emyrs can be the worst one. But those are two of my least favorite things, yes.

odd meteor Aug 7, 2022, 5:56 PM

#

wooden sail this is the worst conversation ever, jupyter and word brought together

😁 I think it's much easier for people who don't know Latex. Of course, to people with good knowledge of Latex, it's drudgery moving back and forth from Word to Jupyter.

wooden sail Aug 7, 2022, 5:56 PM

#

i can't deny that, it took me months to warm up to latex

misty flint Aug 7, 2022, 6:09 PM

#

notion allows for inline latex

#

and it changed my note-taking habits

#

DoggoKek

mild dirge Aug 7, 2022, 6:19 PM

#

We have to use it to write reports at uni, but it's pretty nice for scientific reporting with a lot of formulas

#

Still get annoyed by latex placing figures 2 pages further than you want though :/

steady basalt Aug 7, 2022, 6:21 PM

#

Yeah. Not a fan of equations in jupyter, I keep them to reports written in word

#

I don’t think many people read jupyter for anything method related right? That stuffs all reported on

grizzled verge Aug 7, 2022, 6:25 PM

#

Hey guys, for Gensim's most_similar function, how do I get a list of just the most similar words without the float values next to them?

#

I was looking through documentations trying to do it and it wasn’t working

#

#

Bottom image is my code

#

Sorry if this is a noob question bye

#

Btw *
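Gensim's `KeyedVectors.most_similar` returns a list of `(word, similarity)` tuples, so you can unpack them and keep only the first element. The sketch below uses a hard-coded list with made-up scores in place of a real model. With a trained model it would be something like `[w for w, _ in model.wv.most_similar("king", topn=10)]`, where `model` and `"king"` are placeholders:

```python
# Same shape as most_similar's return value: (word, cosine similarity) pairs.
# The words and scores here are made up for illustration.
similar = [("queen", 0.71), ("prince", 0.65), ("monarch", 0.62)]

words = [word for word, _score in similar]
```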

serene scaffold Aug 7, 2022, 7:50 PM

#

grizzled verge Bottom image is my code

!code

arctic wedgeBOT Aug 7, 2022, 7:50 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

edgy whale Aug 7, 2022, 8:01 PM

#

hey, for some reason tensorflow is not properly installed and I tried uninstalling and reinstalling but I get this when I try to verify the install (in the cmd)

2022-08-07 21:34:41.553613: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

lapis sequoia Aug 7, 2022, 8:17 PM

#

What are some classification algorithms which work with categorical features?

#

Hmm

#

Like how

#

Wouldn't that be incorrect

#

Or, as long as you encode them properly, can categorical features be used as numeric?

#

Last time someone taught me one-hot encoding.

#

Does that now allow me to use any algorithm, even if it's only supposed to work for numerical features?

#

Like knn

#

@lapis sequoia, @serene scaffold

odd meteor Aug 7, 2022, 8:36 PM

#

lapis sequoia What are some classification algorithms which work with categorical features.

If I get your question correctly, you mean the ML algorithms that can pretty much handle categorical data explicitly without having to preprocess or encode them to a numeric feature....

LightGBM and CatBoost do that with ease! But because I kinda love CatBoost more than LightGBM, I'll focus on 🙀-Boost

You only need to identify which feature is a categorical feature in your dataset.

You could do something like this with CatBoost


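import numpy as np
from catboost import CatBoostClassifier

categorical_features_indices = np.where(X.dtypes != float)[0]  # np.float was removed in NumPy 1.24
model = CatBoostClassifier()
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_val, y_val), plot=True)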

steady basalt Aug 7, 2022, 8:40 PM

#

lapis sequoia What are some classification algorithms which work with categorical features.

almost all of them

#

if not all

#

use dummy features and yes you can with 0s and 1s
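To illustrate the dummy-feature route: after one-hot encoding, every column is numeric, so a distance-based method like kNN can run on it. A minimal sketch with invented data and a hand-written 1-nearest-neighbour (in practice you would more likely reach for scikit-learn's `OneHotEncoder` and `KNeighborsClassifier`):

```python
import numpy as np
import pandas as pd

# Toy data (invented): one categorical feature, one numeric feature, and a label.
train = pd.DataFrame({
    "color": ["red", "red", "blue", "green", "blue"],
    "size": [1.0, 1.2, 3.0, 2.9, 3.1],
    "label": ["a", "a", "b", "b", "b"],
})

# One-hot encode the categorical column into 0/1 indicator ("dummy") columns.
X_train = pd.get_dummies(train[["color", "size"]], columns=["color"], dtype=float)
y_train = train["label"].to_numpy()


def predict_1nn(row):
    """Label of the nearest training row by Euclidean distance (1-nearest-neighbour)."""
    # Encode the query the same way, then align it to the training columns (missing -> 0).
    q = pd.get_dummies(pd.DataFrame([row]), columns=["color"], dtype=float)
    q = q.reindex(columns=X_train.columns, fill_value=0.0)
    dists = np.linalg.norm(X_train.to_numpy() - q.to_numpy(), axis=1)
    return y_train[np.argmin(dists)]


print(predict_1nn({"color": "red", "size": 1.1}))  # a
```

One caveat for kNN specifically: numeric columns should be on a scale comparable to the 0/1 indicator columns, or they will dominate the distance.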

odd meteor Aug 7, 2022, 8:42 PM

#

lapis sequoia What are some classification algorithms which work with categorical features.

However, all ML classification algorithms can work with categorical features. Most of them require you to explicitly preprocess your categorical features into numeric features, while a few (like CatBoost and LightGBM) can handle them even when you don't.

quick eagle Aug 7, 2022, 8:46 PM

#

Quick question...
df['timestamp'].diff() returns a series of timestamps(?); I have a data stream and trying to identify different 'epoch' (gaps in data). Most .diff() values are 1 sec; but every so often I get a gap of minutes/hours/days. How can I find the index of these 'jumps'?

untold bloom Aug 7, 2022, 8:52 PM

#

is_large_gap    = df.timestamp.diff().abs().gt(pd.Timedelta("1 sec"))
inds_large_gaps = is_large_gap[is_large_gap].index

This checks whether the absolute value of the difference is greater than 1 second, which gives a True/False Series. Indexing that Series with itself lets only the True's through, and .index then gives the indexes of the (end points of the) jumps.

#

.diff on a datetime type Series gives a Series of type timedelta.
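A self-contained check of the snippet above on an invented 1 Hz stream with two gaps. The first `diff()` value is `NaT`, and comparisons against `NaT` come out `False`, so the first row is never flagged:

```python
import pandas as pd

# Invented 1 Hz stream with two gaps: one of 5 minutes, one of 2 hours.
ts = pd.to_datetime([
    "2022-08-07 10:00:00", "2022-08-07 10:00:01", "2022-08-07 10:00:02",
    "2022-08-07 10:05:02", "2022-08-07 10:05:03",
    "2022-08-07 12:05:03", "2022-08-07 12:05:04",
])
df = pd.DataFrame({"timestamp": ts})

deltas = df.timestamp.diff()  # a timedelta64 Series; the first element is NaT
is_large_gap = deltas.abs().gt(pd.Timedelta("1 sec"))  # NaT compares as False
inds_large_gaps = is_large_gap[is_large_gap].index

print(list(inds_large_gaps))  # [3, 5]
```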

quick eagle Aug 7, 2022, 8:59 PM

#

side note: is df['timestamp'] same as calling df.timestamp? how do you distinguish between methods and columns?

serene beacon Aug 7, 2022, 8:59 PM

#

using pandas to get the duration between two dates gives me a weird output, what can I do to get the hours that passed between those dates?
import pandas as pd

df = pd.read_csv('./data/essential_info_dashborad.csv')

df['Last session begin'] = pd.to_datetime(df['Last session begin'], errors='coerce')
df['Last session end'] = pd.to_datetime(df['Last session end'], errors='coerce')

df['test'] = df['Last session begin'] - df['Last session end']

df.to_csv('./data/testingdate.csv', index=False)

quick eagle Aug 7, 2022, 9:05 PM

#

df['diff'] = (df['end']-df['start']).dt.total_seconds()

#

df['diff'] = (df['end']-df['start']).dt.total_seconds()/3600
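Putting that together with the question's columns, and subtracting in the right order (end minus begin). The data is invented; `errors="coerce"` turns unparseable entries into `NaT`, which then propagates to `NaN` hours:

```python
import pandas as pd

# Invented sessions; the column names follow the question above.
df = pd.DataFrame({
    "Last session begin": ["2022-08-07 09:00", "2022-08-07 22:30", "not a date"],
    "Last session end": ["2022-08-07 17:30", "2022-08-08 01:00", "2022-08-07 10:00"],
})
for col in ["Last session begin", "Last session end"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")  # unparseable -> NaT

# end - begin gives a positive Timedelta; total_seconds() / 3600 converts it to hours.
df["hours"] = (df["Last session end"] - df["Last session begin"]).dt.total_seconds() / 3600
print(df["hours"].tolist())  # [8.5, 2.5, nan]
```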

quick eagle Aug 7, 2022, 9:10 PM

#

untold bloom ```py is_large_gap = df.timestamp.diff().abs().gt(pd.Timedelta("1 sec")) inds...

This works great!
I'm trying to add a column with a letter to designate each group/epoch. Any suggestions on how to add an 'epoch' column where the first set is 'A', everything after the first large gap index is 'B', and so on?

serene beacon Aug 7, 2022, 9:10 PM

#

quick eagle df['diff'] = (df['end']-df['start']).dt.total_seconds()

Omg I was doing it backwards 🥲 thanks for the enlightenment!!

lapis sequoia Aug 7, 2022, 10:15 PM

#

does anyone know how to retrieve and add the rest of a dataframe (df1) based on two columns' worth of data in another dataframe (df2), where those columns have the same names in both dfs? I tried merging (inner and left), but since some of the values in the dfs are duplicated, it messes the whole thing up. I'm coming from Excel, so I'm fundamentally trying to do a VLOOKUP based on two conditions - thanks!
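One way to approximate a two-condition VLOOKUP in pandas (a sketch, not the only approach): merge on both key columns, after dropping duplicate keys from the lookup table so each row of `df1` matches at most once. The frames and column names here (`region`, `product`, `price`) are invented:

```python
import pandas as pd

# Invented example: df1 holds the rows to enrich, df2 is the lookup table.
df1 = pd.DataFrame({"region": ["N", "N", "S"], "product": ["x", "y", "x"], "qty": [5, 3, 7]})
df2 = pd.DataFrame({
    "region": ["N", "N", "N", "S"],
    "product": ["x", "x", "y", "x"],  # ("N", "x") appears twice: a duplicate key
    "price": [10.0, 10.0, 4.0, 12.0],
})

# Like VLOOKUP, keep only the first match per (region, product) key...
lookup = df2.drop_duplicates(subset=["region", "product"], keep="first")

# ...then left-join on BOTH columns. validate raises a MergeError if the right side
# still has duplicate keys, instead of silently multiplying rows.
out = df1.merge(lookup, on=["region", "product"], how="left", validate="many_to_one")
print(out)
```

`keep="first"` mimics VLOOKUP returning the first match; if the duplicated keys carry different values, you have to decide which row should win before merging.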

steady basalt Aug 7, 2022, 10:18 PM

#

https://www.youtube.com/watch?v=Lu56xVlZ40M

YouTube

Two Minute Papers

OpenAI Plays Hide and Seek…and Breaks The Game! 🤖

📝 The paper "Emergent Tool Use from Multi-Agent Interaction" is available here:
https://openai.com/blog/emergent-tool-use/

worthy phoenix Aug 7, 2022, 10:48 PM

#

is there any opensource implementation of gpt3?

serene scaffold Aug 7, 2022, 10:51 PM

#

worthy phoenix is there any opensource implementation of gpt3?

Even if there were, it would be incredibly expensive to train.

worthy phoenix Aug 7, 2022, 10:52 PM

#

serene scaffold Even if there were, it would be incredibly expensive to train.

with pretrained models

#

xDDD, forgot to add that

serene scaffold Aug 7, 2022, 10:52 PM

#

worthy phoenix with pretrained models

There's only one instance of gpt 3, and it's behind a paywall.

worthy phoenix Aug 7, 2022, 10:53 PM

#

sadge

worthy phoenix Aug 7, 2022, 10:53 PM

#

serene scaffold There's only one instance of gpt 3, and it's behind a paywall.

any other generative transformers that are on par with gpt3 and open source?

serene scaffold Aug 7, 2022, 10:54 PM

#

worthy phoenix any other generative transformers which is at par with gpt3 and opensource?

There's nothing on par with gpt 3, but there is gpt 2.

worthy phoenix Aug 7, 2022, 10:54 PM

#

serene scaffold There's nothing on par with gpt 3, but there is gpt 2.

aight, link to the repo pls

#

couldn't find the gpt2 one either

serene scaffold Aug 7, 2022, 10:55 PM

#

worthy phoenix aight, link to the repo pls

I'd have to look for it. I'm sure you can.

worthy phoenix Aug 7, 2022, 10:56 PM

#

aight trying again, btw if u get free time to look for it , pls drop it to me will ya?

serene scaffold Aug 7, 2022, 10:56 PM

#

Did you try "gpt2 GitHub"

worthy phoenix Aug 7, 2022, 10:56 PM

#

serene scaffold Did you try "gpt2 GitHub"

yep but the implementations are kinda not looking ok xD

#

oh ok

worthy phoenix Aug 7, 2022, 10:57 PM

#

serene scaffold Did you try "gpt2 GitHub"

this one yielded the official openai repo

#

nice

#

thanks

tidal bough Aug 7, 2022, 11:12 PM

#

Is there a way to make pandas's fancy integrated dataframe display break very long cells into lines?
E.g. consider

df = pd.DataFrame.from_dict(dict(a=[["hello"*30]]))

If you do pd.options.display.max_colwidth = 150, this row will be shown as a giant line. But can you make it shown fully, but broken into more than one line?

#

visual aid

#

I want this cell shown in full, but broken into two lines

lapis sequoia Aug 8, 2022, 12:43 AM

#

I'm using Pytorch to try to train a model and I'm getting this error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x25568 and 400x120)

any tips on how to fix?

#

my code:

transform = transforms.Compose([transforms.Resize((150, 200)), transforms.ToTensor()])
-----------------
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_dler, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

#

I can share more code if needed

serene scaffold Aug 8, 2022, 12:47 AM

#

lapis sequoia I'm using Pytorch to try to train a model and I'm getting this error: ```py Run...

Do you understand what the error message is telling you?

lapis sequoia Aug 8, 2022, 12:47 AM

#

I think that my images are sized incorrectly

serene scaffold Aug 8, 2022, 12:48 AM

#

lapis sequoia I think that my images are sized incorrectly

but just taking the error message by itself, do you understand the problem?

lapis sequoia Aug 8, 2022, 12:49 AM

#

I understand that it can't multiply the two matrices(my image's pixels and the weights I presume)

#

but I might be wrong

serene scaffold Aug 8, 2022, 12:49 AM

#

not even as it relates to your code. does "mat1 and mat2 shapes cannot be multiplied (10x25568 and 400x120)" mean anything to you?

serene scaffold Aug 8, 2022, 12:49 AM

#

lapis sequoia I understand that it can't multiply the two matrices(my image's pixels and the w...

well, it quite literally says that there are two matrices that can't be multiplied. do you understand why?

lapis sequoia Aug 8, 2022, 12:50 AM

#

because the row in the first matrix needs to be the same size as the column of the 2nd matrix

#

and it isn't so

serene scaffold Aug 8, 2022, 12:50 AM

#

lapis sequoia because the row in the first matrix needs to be the same size as the column of t...

if we're talking about matrix multiplication, yes. I don't actually know if we're talking about matrix multiplication or element-wise multiplication

#

so the question is, how are mat1 and mat2 created from your code?

#

we would need to see the whole traceback (ie, the whole error message) to begin to guess.

lapis sequoia Aug 8, 2022, 12:51 AM

#

sure

#

https://pastebin.com/Fi7LKkzv

serene scaffold Aug 8, 2022, 12:52 AM

#

what is net?

lapis sequoia Aug 8, 2022, 12:52 AM

#

so Net.forward() is:

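import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # expects 400 features: 16 channels of 5x5, i.e. a 32x32 input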
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

#

net is an instance of this ^

serene scaffold Aug 8, 2022, 12:52 AM

#

great. the error message shows you that the problem starts here: ---> 19 x = F.relu(self.fc1(x))

#

and then return F.linear(input, self.weight, self.bias)

lapis sequoia Aug 8, 2022, 12:53 AM

#

yes

serene scaffold Aug 8, 2022, 12:53 AM

#

so figure out what x is when you get the error

lapis sequoia Aug 8, 2022, 12:55 AM

#

I think x is the matrix of my images,

#

I don'

#

t understand what the second matrix (400x120) is

#

and I'm not certain if the first matrix is even the image

proper ingot Aug 8, 2022, 1:41 AM

#

#

How do I include the head of the dataframe (first column) in the csv

#

The code for that would be under

#

#CENTRAL DATAFRAME

lapis sequoia Aug 8, 2022, 2:13 AM

#

lapis sequoia I'm using Pytorch to try to train a model and I'm getting this error: ```py Run...

Anyone here has ideas on how to fix this?

modest onyx Aug 8, 2022, 2:51 AM

#

the error is not even in the code you gave us

#

it's probably inside net

#

oh you also gave that nvm

modest onyx Aug 8, 2022, 2:54 AM

#

lapis sequoia t understand what the second matrix (400x120) is

that's the shape of the weight matrix of the first fully connected layer (transposed: nn.Linear(16 * 5 * 5, 120) stores a 120x400 weight and multiplies the input by its transpose)

#

in other words, it's expecting inputs that are 400-dimensional vectors

#

convolutional layers don't care about the spatial size of the image, but fully connected layers do

#

so you have to make sure that your input image is the right size such that the dimensions end up matching (you can do that by cropping/resizing the image)
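The numbers in the error can be reproduced with the usual output-size formula, `(size + 2*padding - kernel) // stride + 1`, applied to each conv and pool layer of the `Net` above (assuming dilation 1 and the default `ceil_mode=False`):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a Conv2d/MaxPool2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(h, w):
    """Features reaching fc1 in the Net above: conv(5) -> pool(2) -> conv(5) -> pool(2), 16 channels."""
    for _ in range(2):
        h, w = conv_out(h, 5), conv_out(w, 5)        # Conv2d(kernel_size=5), no padding
        h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)  # MaxPool2d(2, 2)
    return 16 * h * w


print(flat_features(32, 32))    # 400   -> matches nn.Linear(16 * 5 * 5, 120)
print(flat_features(150, 200))  # 25568 -> the 25568 in the error message
```

So either resize the images to 32x32, or keep 150x200 and change the first linear layer to `nn.Linear(16 * 34 * 47, 120)`. The leading 10 in `10x25568` is the batch dimension, which `torch.flatten(x, 1)` leaves alone.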

rapid cedar Aug 8, 2022, 3:40 AM

#

how do i make an ai that predicts typos
like if i type "helo"
list = ["hello", "goodbye"] it will predict that "helo" is the closest to "hello"

soft lotus Aug 8, 2022, 4:36 AM

#

You can make a tree (a trie) of the letters of the words in your list, then walk it using the letters of the typo until the typo has a letter with no path in your tree, and suggest the word under that node

#

@rapid cedar
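A minimal sketch of the trie idea above, using nested dicts (the `"$"` end-of-word marker and the function names are illustrative choices, not from any library):

```python
def build_trie(words):
    """Nested-dict trie; the key "$" marks the end of a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word
    return root


def first_word_under(node):
    """Depth-first search for any complete word at or below this node."""
    if "$" in node:
        return node["$"]
    for child in node.values():  # "$" is absent here, so every value is a sub-dict
        found = first_word_under(child)
        if found is not None:
            return found
    return None


def suggest(trie, typo):
    """Follow the typo's letters until one has no path, then suggest a word below that point."""
    node = trie
    for ch in typo:
        if ch not in node:
            break
        node = node[ch]
    return first_word_under(node)


trie = build_trie(["hello", "goodbye"])
print(suggest(trie, "helo"))  # hello
```

This only works well when the typo comes after the first few letters; a typo in the very first letter sends the search down an arbitrary branch. Python's standard library also has `difflib.get_close_matches`, which ranks candidates by a similarity ratio instead of by prefix.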

brisk apex Aug 8, 2022, 6:22 AM

#

hey guys, I hope this is the right place to ask. I've been having a hard time finding a real-life example of using a global temp view in Spark. Has anyone used this feature, and if so, could you share your experience, e.g. why use it over a temp view?

I'm well aware of the definitions: a global temp view is shared across all Spark sessions (of the same application), whereas a temp view expires when the session that created it ends. I just can't imagine a real-life case for using a global temp view in the first place. What would require you to create a separate session and use a global temp view instead of just using the session you already have open with an existing temp view? Thanks in advance

untold bloom Aug 8, 2022, 6:27 AM

#

quick eagle side note: is df['timestamp'] same as calling df.timestamp? how do you distingui...

hi again, sorry i went AFK shortly after

#

yes df["col_name"] and df.col_name are equivalent, except in 2 cases:
- if "col_name" isn't a valid Python identifier, df.col_name will fail
  - e.g., spaces in it: "current value", starting with a number: "20th quantile", containing an apostrophe: "today's"
- if it clashes with a method/attribute of a Series/DataFrame, as you said
  - e.g., df.sum will go to the method even if you have a column named "sum"

so, df["col name"] always works, df.col_name sometimes works. But the latter is easier to type, so if it is fine to do, i prefer it due to laziness :| It also makes code more readable IMHO when chaining things like we did above (.diff().abs()...)
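A quick demonstration of both cases, with invented column names:

```python
import pandas as pd

df = pd.DataFrame({"sum": [1, 2], "current value": [3, 4], "price": [5, 6]})

# Bracket access always reaches the column.
print(df["sum"].tolist())            # [1, 2]
print(df["current value"].tolist())  # [3, 4]

# Attribute access works only for valid identifiers that don't clash with DataFrame attributes.
print(df.price.tolist())             # [5, 6]
print(callable(df.sum))              # True: df.sum is the DataFrame.sum method, not the column
# `df.current value` would be a SyntaxError, so that column is reachable only via brackets.
```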

untold bloom Aug 8, 2022, 6:32 AM

#

quick eagle This works great! I'm trying to add a column with a letter to designate each gro...

IIUC, you can map the cumulative sum of is_large_gap... why? here's a demonstration:

#

sample data

In [20]: df
Out[20]:
        date  sales
0 2021-12-29    300
1 2021-12-27    100
2 2021-12-30    100
3 2021-12-31    300
4 2021-12-28    200
5 2022-01-03    500
6 2022-01-02      0
7 2022-01-01    200

#

above code applied to get is_large_gap, except the threshold being 1 day here

In [21]: is_large_gap = df.date.diff().abs().gt(pd.Timedelta("1 day"))

In [22]: is_large_gap
Out[22]:
0    False
1     True
2     True
3    False
4     True
5     True
6    False
7    False
Name: date, dtype: bool

#

so what I understood is, you want to start with some letter, say "A", and keep it going as long as "no large gap"; once it hits a large gap, change it to "B", i.e., the successive letter. And do this until the end.

rapid cedar Aug 8, 2022, 6:37 AM

#

soft lotus <@820199993452265472>

like for i in len(typo):
typo[:1] match for i in range len(possible_typo)
match with possible typo[:1]?

untold bloom Aug 8, 2022, 6:37 AM

#

to that end, we can use .cumsum()... This takes the cumulative sum. Let's see what it would do to a Boolean series: if it sees a True, the accumulated sum so far is increased by 1 (True is 1 in a numeric context). So this is good: when it hits a large gap point, it will change its value. If, on the other hand, it sees a False, the accumulated sum so far won't change (because False is 0 in a numeric context). This too is good: when it hits a not-large gap, it won't change its value.

#

well this gives this:

In [23]: is_large_gap.cumsum()
Out[23]:
0    0
1    1
2    2
3    2
4    3
5    4
6    4
7    4
Name: date, dtype: int32

#

see how it stays the same when it hits False's?

#

now all we need is a mapper: 0 -> A, 1 -> B, ...

#

assuming it won't exceed 25, we can use the ASCII alphabet :p

#

so here we go map:

In [24]: import string

In [25]: mapper = dict(enumerate(string.ascii_uppercase))

In [26]: mapper
Out[26]:
{0: 'A',
 1: 'B',
 2: 'C',
 3: 'D',
 4: 'E',
 ...
 23: 'X',
 24: 'Y',
 25: 'Z'}

In [27]: is_large_gap.cumsum().map(mapper)
Out[27]:
0    A
1    B
2    C
3    C
4    D
5    E
6    E
7    E
Name: date, dtype: object

#

note that you'll get NaNs instead of letters in the output if the value you're mapping is not in the mapper

#

e.g., if there's 47 to map, since it's not in mapper, it will put NaN in the result.

#

range(is_large_gap.sum() + 1) will give you the range of numbers to cover in your mapping('s keys), since the cumulative sum starts at 0 and ends at is_large_gap.sum().
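The whole pipeline end to end on the sample data from above, as a self-contained script:

```python
import string

import pandas as pd

# The sample frame from above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-29", "2021-12-27", "2021-12-30", "2021-12-31",
                            "2021-12-28", "2022-01-03", "2022-01-02", "2022-01-01"]),
    "sales": [300, 100, 100, 300, 200, 500, 0, 200],
})

is_large_gap = df.date.diff().abs().gt(pd.Timedelta("1 day"))
group_id = is_large_gap.cumsum()  # 0, 1, 2, ... increments at every large gap

# Letters A..Z cover group ids 0..25; any larger id would map to NaN.
mapper = dict(enumerate(string.ascii_uppercase))
df["epoch"] = group_id.map(mapper)
print(df["epoch"].tolist())  # ['A', 'B', 'C', 'C', 'D', 'E', 'E', 'E']
```

If there could be more than 26 epochs, the integer `group_id` itself works as a label, or you can build longer labels yourself.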

untold cradle Aug 8, 2022, 7:50 AM

#

Hi guys, im not sure where should i ask this,
but im looking for a project idea (something like AlgoTrading).
i feel like im intermediate - advanced in python

Any idea, or help would be appreciated!

strong sedge Aug 8, 2022, 8:39 AM

#

I am taking machine learning classes (high level, only application), and the teacher said that rather than learning how the algorithms work (gradient descent and OLS), I should just learn to use them.
is this correct? should I attempt to understand/learn it myself?

wooden sail Aug 8, 2022, 8:40 AM

#

you should at least have some notion of when and how it makes sense to use it

#

OLS is not always the best estimator, and gradient descent doesn't always make sense to use. even when it DOES make sense to use grad des, it only converges under special conditions

#

and WHAT it converges to is another matter
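A small illustration of the convergence point, using synthetic data: gradient descent on the least-squares loss reaches the closed-form OLS solution only when the step size is below `2 / L`, where `L` is the largest eigenvalue of the loss's Hessian `(2/n) X^T X`. Past that bound it diverges:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + one feature
y = X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=50)


def gd_ols(X, y, lr, steps=500):
    """Plain gradient descent on the mean squared error (1/n) * ||Xw - y||^2."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w


w_exact = np.linalg.lstsq(X, y, rcond=None)[0]  # closed-form OLS solution
# Largest eigenvalue of the Hessian (2/n) X^T X; GD converges only if lr < 2 / L.
L = np.linalg.eigvalsh((2.0 / len(y)) * X.T @ X).max()

print(np.allclose(gd_ols(X, y, lr=0.5 / L), w_exact))  # True: step small enough
print(np.abs(gd_ols(X, y, lr=2.5 / L)).max() > 1e6)    # True: step too large, diverges
```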

strong sedge Aug 8, 2022, 8:59 AM

#

wooden sail you should at least have some notion of when and how it makes sense to use it

okok, makes sense

serene steeple Aug 8, 2022, 10:13 AM

#

what is the google service that provides computing power for machine learning called

steady basalt Aug 8, 2022, 10:17 AM

#

proper ingot How do I include the head of the dataframe (first column) in the csv

anything in a csv is included anyways
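If "the first column" means the DataFrame's index (an assumption, since the screenshot isn't preserved), `to_csv` writes it by default and only drops it with `index=False`; `reset_index()` turns it into an ordinary column. A minimal sketch with an invented frame:

```python
import pandas as pd

# Invented frame whose "first column" (the row labels) is the index, not a regular column.
df = pd.DataFrame({"score": [10, 20]}, index=pd.Index(["alice", "bob"], name="player"))

with_index = df.to_csv()                          # index=True is the default: "player" is written
without_index = df.to_csv(index=False)            # drops the row labels entirely
as_column = df.reset_index().to_csv(index=False)  # index turned into a normal column

print(with_index.splitlines()[0])     # player,score
print(without_index.splitlines()[0])  # score
print(as_column.splitlines()[0])      # player,score
```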

#

anyone know how to deal with such noisy data that my minority class precision is almost 0?

wooden sail Aug 8, 2022, 11:31 AM

#

serene steeple what is the google service that provides computing power for machine learning ca...

google colab?

steady basalt Aug 8, 2022, 12:05 PM

#

serene steeple what is the google service that provides computing power for machine learning ca...

Google cloud

young ridge Aug 8, 2022, 12:46 PM

#

#

hi guys im a new data science student and i was wondering what do i call those values that are above 600

#

do i call them anomalies or outliers?

wooden sail Aug 8, 2022, 12:48 PM

#

anomaly usually refers to a behavior that is caused by something external and is not part of the underlying distribution. extreme values are part of the original distribution, just very unlikely

young ridge Aug 8, 2022, 12:48 PM

#

Ohhhhhh

wooden sail Aug 8, 2022, 12:48 PM

#

to be able to tell which of the two it is, some kind of anomaly detection is needed

#

in some cases the difference doesn't matter and they're treated as synonyms, so more context is needed 😛

young ridge Aug 8, 2022, 12:51 PM

#

is it logical to remove any values above 400 or 500?

serene scaffold Aug 8, 2022, 12:51 PM

#

wooden sail anomaly usually refers to a behavior that is caused by something external and is...

one time, a non-technical person asked me to do anomaly detection. and they weren't very clear on what an anomaly was in the context of their data. so I asked a senior coworker, and he said

"anomaly detection. they don't know what an anomaly is--wouldn't know if it bit them. but it's a buzzword. everyone and their grandma is doing it--that's probably who told them about it."

"sure, but what would an anomaly be in this context?"

"you know, you're asking the right questions."

wooden sail Aug 8, 2022, 12:52 PM

#

sounds about right

young ridge Aug 8, 2022, 12:52 PM

#

if i were to do a .describe() on my departure delay in minutes column, the 3rd quartile (75%) only has a value of 12 minutes

#

#

does this count as anomaly detection?

young ridge Aug 8, 2022, 12:53 PM

#

serene scaffold one time, a non-technical person asked me to do anomaly detection. and they were...

you have a point

tropic matrix Aug 8, 2022, 12:53 PM

#

I'm solving a regression problem, but when i calculate the MSE for my loss it becomes "inf". I believe this is because my output data ranges from 1 to 2.147 billion, but what should i do to solve it?
(using keras)
i have rmse as a metric, and that displays a valid number, but when the loss is initially high it's impossible for it to be displayed as mse,
so is it a viable solution to set objectives (like for early stopping, reducing lr, and hyper parameter tuning) to be the validation rmse?

wooden sail Aug 8, 2022, 12:53 PM

#

as a buzzword, sure. but since you're likely not trying to find how many distributions are needed to correctly describe your data without overfitting (a model order estimation problem), it sounds like you're just looking to remove outliers. you can make a histogram and see what probability distribution fits it best, then remove extreme values based on that
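As a concrete cutoff rule, here is Tukey's 1.5×IQR fences, a common rule of thumb swapped in for illustration (it is not the distribution-fitting approach described above). The delay values are invented:

```python
import pandas as pd


def tukey_fences(s, k=1.5):
    """Return (low, high) = (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is the usual rule of thumb."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Invented departure delays in minutes (negative = left early), with a long right tail.
delays = pd.Series([-5, -2, 0, 1, 3, 4, 6, 8, 10, 12, 15, 45, 600, 900])
low, high = tukey_fences(delays)
outliers = delays[(delays < low) | (delays > high)]
print(outliers.tolist())  # [45, 600, 900]
```

On a strongly right-skewed column like delays, this flags a lot of the tail, so treat the fences as a starting point rather than a verdict.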

young ridge Aug 8, 2022, 12:55 PM

#

wooden sail as a buzzword, sure. but since you're likely not trying to find how many distrib...

im currently doing the data cleaning process of my whole project and later on im going to do linear and logistic regression

#

so im kind of stuck on whether i should remove them or not

wooden sail Aug 8, 2022, 12:56 PM

#

probably so if the entries skew the distribution

young ridge Aug 8, 2022, 12:56 PM

#

ohhhhh

wooden sail Aug 8, 2022, 12:56 PM

#

for example OLS regression is only optimal (in the maximum-likelihood sense) if the errors truly are normally distributed

#

so check the histogram

young ridge Aug 8, 2022, 12:57 PM

#

wooden sail so check the histogram

alright ill try

#

thank you for the help by the way

wooden sail Aug 8, 2022, 12:58 PM

#

if you're willing to use maximum likelihood based on the sample covariance instead of assuming normality, you'll get something often called "Mahalanobis distance" instead of the ordinary least squares

#

would be nice to compare if you have some time to spend
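For reference, the Mahalanobis distance of a point `x` from a distribution with mean `mu` and covariance `S` is `sqrt((x - mu)^T S^{-1} (x - mu))`; with an identity covariance it reduces to the Euclidean distance. A minimal NumPy sketch on synthetic data:

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), via a linear solve instead of an explicit inverse."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


rng = np.random.default_rng(1)
data = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=1000)
mu, S = data.mean(axis=0), np.cov(data, rowvar=False)  # sample mean and sample covariance

# The same point can be "far" in Euclidean terms but close relative to the data's spread.
print(mahalanobis([2.0, 0.0], mu, S), np.linalg.norm(np.array([2.0, 0.0]) - mu))
```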

young ridge Aug 8, 2022, 12:58 PM

#

ah ill try

#

i just learnt OLS regression so ill have to give that a try first

tropic matrix Aug 8, 2022, 1:02 PM

#

tropic matrix I'm solving a regression problem, but when i calculate the MSE for my loss it be...

@wooden sail are you able to possibly assist me?

young ridge Aug 8, 2022, 1:03 PM

#

wooden sail probably so if the entries skew the distribution

what happens if my histogram is skewed?

wooden sail Aug 8, 2022, 1:06 PM

#

tropic matrix <@467435887236612106> are you able to possibly assist me?

i'm pretty sure i suggested you rescale/normalize your data to start with

wooden sail Aug 8, 2022, 1:07 PM

#

young ridge what happens if my histogram is skewed?

try with 100 bins, this plot is deceptive

tropic matrix Aug 8, 2022, 1:08 PM

#

wooden sail i'm pretty sure i suggested you rescale/normalize your data to start with

I tried doing that beforehand using sklearn's MinMaxScaler, but i ended up having this weird issue where it wouldn't predict any value lower than 103,000. Is there another encoding method i should use to scale the y data?

#

i already normalize my x data btw

wooden sail Aug 8, 2022, 1:09 PM

#

the y as well. it also sounds like you have some overfitting

young ridge Aug 8, 2022, 1:09 PM

#

tropic matrix Aug 8, 2022, 1:10 PM

#

wooden sail the y as well. it also sounds like you have some overfitting

what encoding method should i use for scaling the y data?
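One option for the target, not proposed in the thread and offered here as an assumption: log-transform `y` before standardizing it. With targets spanning 1 to about 2.1 billion, min-max scaling leaves the small targets packed into a sliver near 0, whereas `log1p` compresses the range to roughly 0.7 to 21.5. Note that MSE on the log scale penalizes relative rather than absolute errors. A sketch with invented targets (the Keras model itself is left out):

```python
import numpy as np

# Invented targets spanning 1 .. ~2.1 billion, like the regression target described above.
y = np.array([1.0, 250.0, 9_800.0, 1.2e6, 3.5e8, 2.147e9])

# Compress the range first: log1p maps it to roughly [0.69, 21.5].
y_log = np.log1p(y)
# Then standardize; in practice, fit mu/sigma on the training split only.
mu, sigma = y_log.mean(), y_log.std()
y_scaled = (y_log - mu) / sigma


def inverse(pred_scaled):
    """Map predictions made on the scaled-log scale back to the original units."""
    return np.expm1(pred_scaled * sigma + mu)


print(np.allclose(inverse(y_scaled), y))  # True
```

You would train on `y_scaled` and pass the model's predictions through `inverse` before reporting errors in the original units.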

young ridge Aug 8, 2022, 1:10 PM

#

after changing the bins there wasn't much difference and it remained skewed

wooden sail Aug 8, 2022, 1:14 PM

#

young ridge

how many data points do you have?

young ridge Aug 8, 2022, 1:14 PM

#

101695 rows and 24 columns

wooden sail Aug 8, 2022, 1:15 PM

#

try 500 bins for one last look at the data

young ridge Aug 8, 2022, 1:17 PM

#

i think 100 is the max it can go 😆

#

if not the bins would be very very thin

#

lemme try removing some outliers

#

wooden sail Aug 8, 2022, 1:20 PM

#

well but that's already ok

#

now the question is, which distribution does this look like

young ridge Aug 8, 2022, 1:21 PM

#

usually for OLS regression it would be logical for the column to be normally distributed right?

#

technically this is right skewed

wooden sail Aug 8, 2022, 1:21 PM

#

that's what you normally assume, but that evidently isn't the case

young ridge Aug 8, 2022, 1:21 PM

#

Ohhhhhhhh

wooden sail Aug 8, 2022, 1:21 PM

#

so your options are: DON'T use OLS and compute the sample covariance

#

or manipulate the data and use OLS

#

that was the point of this exercise

#

but it may be the case that the data never looks normally distributed no matter where you draw the cutoff point

#

Poisson, gamma, and chi-squared distributions all kinda look like what you have. and indeed, these have limiting cases in which they approach the Gaussian distribution

young ridge Aug 8, 2022, 1:24 PM

#

alright thank you

#

i know what to do from here

#

thanks

wooden sail Aug 8, 2022, 1:24 PM

#

you can try and make a box and whiskers plot and see where the mean and median are, and use this info to make a cutoff decision

#

aight, best of luck

frail dune Aug 8, 2022, 1:52 PM

#

Hey guys, not sure if this is the right place to ask, but could anybody here help me generalize this Matrix in a loop? I'm kinda stuck

#

if for example here K1 = {0, 2}, K2 = {0}, K3 = {3}, K4 = {1, 3}

#

not sure how to build the Matrix so the number of rows equals the total number of elements across all the K[i]

wooden sail Aug 8, 2022, 1:57 PM

#

as i understand it, the first column contains the elements of the K_i. the second column is just the value of i?

frail dune Aug 8, 2022, 2:01 PM

#

the first column(first row) is the Value of the first Element in K1, the first column(row 2) is the second Value of K1

#

the second column stands for the index after K

#

and the third column holds the constraints (a(x) if it's K1, b(x) if it's K2, M(x) if it's K3 and Q(x) if it's K4)

wooden sail Aug 8, 2022, 2:07 PM

#

this is my artistic interpretation. you can add the bit about a,b,M and Q yourself, i think

#

In [1]: import numpy as np

In [2]: K = [[0,2], [0], [3], [1,3]]

In [3]: N = 0

In [4]: for k in K:
   ...:     N += len(k)
   ...: 

In [5]: B = np.zeros((N,3))

In [6]: row = 0

In [7]: for i,k in enumerate(K):
   ...:     B[row:row+len(k),0] = k
   ...:     B[row:row+len(k),1] = i
   ...:     row += len(k)
   ...: 

In [8]: B
Out[8]: 
array([[0., 0., 0.],
       [2., 0., 0.],
       [0., 1., 0.],
       [3., 2., 0.],
       [1., 3., 0.],
       [3., 3., 0.]])

#

ah, it was supposed to be i+1
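The same loop with that fix applied (`enumerate(K, start=1)`), wrapped in a function so you can pass in different `K` values, as suggested further down. The third column is still left at zero as a placeholder for the a/b/M/Q constraints:

```python
import numpy as np


def build_matrix(K):
    """Stack each K_i's elements in column 0 and its 1-based index i in column 1.

    Column 2 is left as zeros, a placeholder for the constraint (a, b, M, Q) values.
    """
    n_rows = sum(len(k) for k in K)
    B = np.zeros((n_rows, 3))
    row = 0
    for i, k in enumerate(K, start=1):  # start=1 applies the "i+1" correction
        B[row:row + len(k), 0] = k
        B[row:row + len(k), 1] = i
        row += len(k)
    return B


print(build_matrix([[0, 2], [0], [3], [1, 3]]))
```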

frail dune Aug 8, 2022, 2:08 PM

#

ty very much this line 7 was killing me

frail dune Aug 8, 2022, 2:55 PM

#

wooden sail ```py In [1]: import numpy as np In [2]: K = [[0,2], [0], [3], [1,3]] In [3]: ...

is there a possibility to change a number out of K to for example -> n ?

#

if its K = [[0,2], [0], [n], [1,3]]

wooden sail Aug 8, 2022, 2:56 PM

#

if n is already defined, yes

#

do you need the matrix to be variable?

#

your options are to make this into a function and pass in specific values, or use sympy/symengine

frail dune Aug 8, 2022, 2:57 PM

#

okay i guess i'll just leave that then 😄 not sure if I even need that was just a thought

#

thank you

wooden sail Aug 8, 2022, 2:58 PM

#

the code i wrote there works regardless of what is inside K, as long as it is defined and K is a list of lists of floats

serene scaffold Aug 8, 2022, 4:13 PM

#

there are libraries that do this. have you googled "python speech recognition library"?

#

since Sanskrit is a liturgical language, if you had to create your own speech recognition model, how much training data would be available?

#

you would need lots of examples of Sanskrit speech audio and transcripts. and you would probably need to align those in the time dimension.

steady basalt Aug 8, 2022, 4:25 PM

#

Tfw the data won’t allow for a working model for my thesis, fuck

#

What a disaster

abstract sentinel Aug 8, 2022, 4:56 PM

#

Hey, I might be in slightly the wrong section, but I think some of you guys could give me ideas.
I currently work in a high school as a physics teacher, and I'm considering replacing slides with colab notebooks. One of the main reasons is the use of interactive demos with widgets. I learned how to make interactive graphs, which is great, but I was wondering if somebody could suggest any other cool demos/libraries. I would kill for a library where I could create simple interactive animations, as I'm already using manimCE to create short animations.
Any directions/suggestions are welcome!

desert bear Aug 8, 2022, 5:14 PM

#

is there already a name for an ML program that has a model with lots of sentences and entities, and classifies an input sentence against the data in the model and returns the entity?

limber token Aug 8, 2022, 5:23 PM

#

Hey guys, how can I transpose one column to another df while matching another column? Visual example:

#

Without having to iterate through the df using df.at, that is

serene scaffold Aug 8, 2022, 5:38 PM

#

limber token Hey guys, how can I transpose one column to another df while matching another co...

it looks like ATTR1 has different kinds of data in df1 and df2, so you need to change the name of one of them. but you should just be able to merge the two dataframes, and then sort the result on the SKU column.

if what you want to do is more intricate than that, your example does not illustrate it.

limber token Aug 8, 2022, 5:39 PM

#

serene scaffold it looks like ATTR1 has different kinds of data in df1 and df2, so you need to c...

They're the same kind of data, just different values, my question is, if I use merge, if df1 has the SKUs in this order: 1, 2, 3, 4, 5... and df2 has the SKUs in this order: 5, 2, 3, 4, 1, will it merge them by matching the SKUs or will it just do it in their original orders?

#

Sorry if I wasn't clear

serene scaffold Aug 8, 2022, 5:41 PM

#

limber token They're the same kind of data, just different values, my question is, if I use m...

when you do the merge method, you say that the SKU column is the one that you want to use to do the linking.

#

!docs pandas.DataFrame.merge

arctic wedgeBOT Aug 8, 2022, 5:41 PM

#

pandas.DataFrame.merge


DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Merge DataFrame or named Series objects with a database-style join.

A named Series object is treated as a DataFrame with a single named column.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Warning

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.

serene scaffold Aug 8, 2022, 5:41 PM

#

it might be that you need to merge on both SKU and ATTR1.

by the way, what pandas calls a "merge" is a join in SQL terminology.
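To answer the ordering question concretely: `merge` pairs rows by equal key values, not by position. A sketch with invented frames whose SKUs are listed in different orders (`ATTR2` is a hypothetical column name chosen to avoid the `_x`/`_y` suffixes that a shared non-key column like `ATTR1` would get):

```python
import pandas as pd

# Invented frames: the same SKUs, listed in different orders.
df1 = pd.DataFrame({"SKU": [1, 2, 3, 4, 5], "ATTR1": ["a", "b", "c", "d", "e"]})
df2 = pd.DataFrame({"SKU": [5, 2, 3, 4, 1], "ATTR2": ["v5", "v2", "v3", "v4", "v1"]})

# Rows are paired by equal SKU values; how="left" keeps df1's row order.
merged = df1.merge(df2, on="SKU", how="left")
print(merged)
```

If you need a specific order afterwards, `merged.sort_values("SKU")` makes it explicit.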

iron basalt Aug 8, 2022, 5:43 PM

#

abstract sentinel Hey, I might be in a little wrong section, but I think some of you guys could gi...

Desmos is really nice: https://www.desmos.com/calculator/fxhu08lai8

abstract sentinel Aug 8, 2022, 6:02 PM

#

iron basalt Desmos is really nice: https://www.desmos.com/calculator/fxhu08lai8

I'm looking for something to use inside Colab (so anything that works in a Jupyter notebook)

limber token Aug 8, 2022, 6:04 PM

#

abstract sentinel Hey, I might be in a little wrong section, but I think some of you guys could gi...

Wdym by interactive animations? You mean drag a point in a graph function?

wooden sail Aug 8, 2022, 6:08 PM

#

matplotlib sliders can give you some level of interaction with your plots https://matplotlib.org/stable/gallery/widgets/slider_demo.html rather basic though
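
A minimal sketch of the slider approach, using a hypothetical projectile example (launch speed on a slider, fixed 45° angle, no air drag; all numbers are assumptions). In a notebook the slider only responds with an interactive backend (for example ipympl's %matplotlib widget, which in Colab may need extra setup); with a static backend you just see the initial plot.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

G = 9.81                 # gravitational acceleration, m/s^2
ANGLE = np.deg2rad(45)   # assumed fixed launch angle

def trajectory(v0, n=100):
    """Return x, y of a projectile launched at speed v0 (no air drag)."""
    t_flight = 2 * v0 * np.sin(ANGLE) / G
    t = np.linspace(0, t_flight, n)
    x = v0 * np.cos(ANGLE) * t
    y = v0 * np.sin(ANGLE) * t - 0.5 * G * t**2
    return x, y

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)   # leave room under the plot for the slider
(line,) = ax.plot(*trajectory(10.0))
ax.set_xlim(0, 100)
ax.set_ylim(0, 25)
ax.set_xlabel("distance (m)")
ax.set_ylabel("height (m)")

slider_ax = fig.add_axes([0.2, 0.1, 0.6, 0.03])
v0_slider = Slider(slider_ax, "v0 (m/s)", 1.0, 30.0, valinit=10.0)

def update(v0):
    # Recompute the curve for the new launch speed and redraw
    line.set_data(*trajectory(v0))
    fig.canvas.draw_idle()

v0_slider.on_changed(update)
plt.show()
```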

lapis sequoia Aug 8, 2022, 6:25 PM

#

Is it fine to use clustering algorithms even if you have labels?

#

But instead to use the labels for validation

abstract sentinel Aug 8, 2022, 6:33 PM

#

limber token Wdym by interactive animations? You mean drag a point in a graph function?

To be honest, anything. I did some graphs + sliders stuff, for example how the trajectory of a ball depends on its initial velocity.

misty flint Aug 8, 2022, 6:41 PM

#

manim

wooden forge Aug 8, 2022, 6:56 PM

#

Hi there, I have a small issue with importing a data set. The CSV file contains: Team 1; Team 2; Score Team 1; Score Team 2, so str; str; int; int. But I can't manage to import it into numpy.
I used the command np.genfromtxt(filepathEast, skip_header=1, usecols=(0, 1)) but I get an error message for each line: Line #2 (got 1 columns instead of 2). So yeah, struggling a bit; I haven't touched Python for a month and a half by now

lapis sequoia Aug 8, 2022, 7:46 PM

#

Is this a really bad model?

lapis sequoia Aug 8, 2022, 7:47 PM

#

wooden forge Hi there, I have a small issue with importing a data set, the CSV file contains ...

perhaps if you import it through pandas first and then turn score team 1 and 2 into numeric values? could be a temporary solution but im not that versed
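
A sketch of both routes on a small hypothetical sample in the described format. It assumes the real file is semicolon-separated: np.genfromtxt splits on whitespace by default, which would explain the "got 1 columns" errors, so the delimiter has to be given explicitly.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sample in the described format
csv_text = """Team 1;Team 2;Score Team 1;Score Team 2
Boston;Miami;102;99
Denver;Utah;88;95
"""

# pandas: tell read_csv that the separator is a semicolon; scores become ints
df = pd.read_csv(io.StringIO(csv_text), sep=";")
print(df.dtypes)

# numpy: pass delimiter=";"; dtype=None infers a type per column
# (str for team names, int for scores) and returns a structured array
arr = np.genfromtxt(io.StringIO(csv_text), delimiter=";", skip_header=1,
                    dtype=None, encoding="utf-8", autostrip=True)
print(arr["f2"])  # the "Score Team 1" column (fields are named f0..f3)
```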

#

does anyone know how to recursively merge/concat a temp df into a master_df. I have a temp df that is created for every id, and it calculates the difference between row values in a column. Im trying to bring that column to the master_df so I can track the differences in the master_df. <- for python
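
A sketch of two common ways to get per-id differences into the master frame, on hypothetical column names (id, value). The first avoids temp frames entirely with groupby().diff(); the second keeps the per-id loop but collects the pieces and concatenates once instead of merging on every iteration.

```python
import numpy as np
import pandas as pd

# Hypothetical master frame: several rows per id
master_df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "value": [10, 13, 19, 5, 4],
})

# Option 1: difference between consecutive rows within each id, in one line
master_df["value_diff"] = master_df.groupby("id")["value"].diff()

# Option 2: if a per-id temp frame is unavoidable, collect them and concat once
temps = []
for _, group in master_df.groupby("id"):
    temp = group[["value"]].diff().rename(columns={"value": "value_diff_loop"})
    temps.append(temp)
master_df = master_df.join(pd.concat(temps))  # aligns on the original row index

print(master_df)
```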

desert oar Aug 8, 2022, 8:07 PM

#

lapis sequoia Is this a really bad model?

yes, it's significantly worse than always guessing "no churn", which should give you 86.2% accuracy since "no churn" is 86.2% of the data. see https://towardsdatascience.com/calculating-a-baseline-accuracy-for-a-classification-model-a4b342ceb88f and https://machinelearningmastery.com/dont-use-random-guessing-as-your-baseline-classifier/
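
A minimal sketch of computing that majority-class baseline with scikit-learn's DummyClassifier, on hypothetical labels that reproduce the 86.2% / 13.8% split:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 0 = no churn (86.2%), 1 = churn (13.8%)
y = np.array([0] * 862 + [1] * 138)
X = np.zeros((len(y), 1))  # features are ignored by this dummy model

# Always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
acc = accuracy_score(y, baseline.predict(X))
print(acc)
```

Any real model should beat this number before its accuracy means much; precision and recall on the churn class are more informative here.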

lapis sequoia Aug 8, 2022, 8:11 PM

#

desert oar yes, it's significantly worse than always guessing "no churn", which should give...

What can I do then

#

To make it better

#

Because my train data also has tons of negative labels

#

No churn labels I mean

thick marlin Aug 8, 2022, 8:22 PM

#

Can someone help explain the tensorboard training logs generated for GAN network

#

Are these good or bad? How do I compare ?

#

mellow vapor Aug 8, 2022, 8:30 PM

#

does treating a column dtype as categorical instead of object dtype make any difference?
apart from the space optimization does it have any other impact?

wooden forge Aug 8, 2022, 8:45 PM

#

Would any of you have good resources about score prediction with neural networks? I can't find anything satisfying online

warped osprey Aug 8, 2022, 8:57 PM

#

I have a simple pandas question... I want to fill in the NaN values for the code column by referring to the dict below... how can I do that?

Screen_Shot_2022-08-08_at_1.55.17_PM.jpg
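
The screenshot isn't available, so this is a sketch under assumptions: a hypothetical key column "state" whose values are the keys of the dict, and a "code" column with some NaNs. Series.map looks up a code for every row, and fillna uses it only where "code" is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame and mapping
df = pd.DataFrame({"state": ["CA", "NY", "TX", "NY"],
                   "code": [6, np.nan, np.nan, 36]})
state_to_code = {"CA": 6, "NY": 36, "TX": 48}

# Map every row's key to a code, then fill only the missing entries
df["code"] = df["code"].fillna(df["state"].map(state_to_code))
print(df)
```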

steady basalt Aug 8, 2022, 8:58 PM

#

lapis sequoia Is this a really bad model?

Just show the precision recall

steady basalt Aug 8, 2022, 8:59 PM

#

lapis sequoia Because my train data also has tons of negative labels

Use a class weight or resample

#

I opt to use class weight atm

#

Easier one liner code

#

Does the same shit but you don’t lose info or gain noise
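
A minimal sketch of the class-weight route in scikit-learn on hypothetical imbalanced data. class_weight="balanced" weights each class by n_samples / (n_classes * class_count), so errors on the rare class cost more in the loss:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
# Hypothetical data: 862 "no churn" (0) and 138 "churn" (1) rows
y = np.array([0] * 862 + [1] * 138)
X = rng.normal(size=(len(y), 3)) + y[:, None]  # positives shifted a little

# The weights "balanced" would use: n_samples / (n_classes * count_of_class)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(weights)

# The one-liner: each sample's loss is scaled by its class weight during fitting
clf = LogisticRegression(class_weight="balanced").fit(X, y)
```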

#

I just got my thesis results and the model has 0.06 precision in predicting the positive class. Do u think there’s any redeeming thing I can write so to not get bad grade? It’s the datas fault

mint palm Aug 8, 2022, 9:03 PM

#

is it possible to overfit to data while using naive bayes?

wooden sail Aug 8, 2022, 9:10 PM

#

mint palm is it possible to overfit to data whiile using naive bayes?

yes

#

for completeness: naive bayes is a probabilistic model. you pair this with a "rule" to make an estimator: a statistic with which you evaluate whether a model is good. maximum posterior likelihood is common here (also called maximum a posteriori)

#

you then bundle this up with 2 other things: an actual model, which is your network, and an optimizer, which optimizes the model parameters based on the estimator

#

the model, optimizer, and/or available data might not be good enough to correctly implement the estimator

mint palm Aug 8, 2022, 9:16 PM

#

wooden sail yes

but what are the hyper parameters?

#

aren't sigma and mean all fixed? also in naive bayes we consider the variables to be independent and disjoint.

wooden sail Aug 8, 2022, 9:18 PM

#

so you're specifically using a gaussian model? also no, the means and variances of the gaussian model are computed per class

#

and yes, you consider each mean and variance per vector in a class to be disjoint and dependent only on the class

#

so each class has its own mean vector and covariance matrix (a diagonal one, since the naive assumption treats the features as independent given the class)

mint palm Aug 8, 2022, 9:20 PM

#

wooden sail so you're specifically using a gaussian model? also no, the means and variances ...

if i don't do gaussian, won't there be even fewer params?
yes we compute them separately but they are fixed for each vector, right?

#

i dont get, what are we optimising

wooden sail Aug 8, 2022, 9:20 PM

#

they are fixed, but they are computed from the data. and i guess a gaussian is about as easy as it gets

mint palm Aug 8, 2022, 9:20 PM

#

can we further nudge sigma and mean?

wooden sail Aug 8, 2022, 9:21 PM

#

wdym nudge

mint palm Aug 8, 2022, 9:21 PM

#

optimize

wooden sail Aug 8, 2022, 9:21 PM

#

they're already computed from the data

#

you solve an optimization problem to find them

#

and once those are learned, what the estimator does is optimize for the class of each sample you feed it by maximizing the posterior likelihood

#

so it's optimizing 2 things

#

the parameters of the gaussian distribution of each class first, as parameters of the model during training

#

then those params are used to infer the classes of examples you feed it when using the trained model

#

if you're using neural networks for the inference, then you can consider learning the parameters of the gaussian distributions a "pre-learning", and then you feed labeled examples so that the network trains the weights of the inference network that predicts the class

#

i formulate it this way because all you said was naive bayes with gaussian model. that's barely enough to describe the estimator, not how you're implementing it

mint palm Aug 8, 2022, 9:33 PM

#

i sort of get it, but it's difficult to understand how the optimisation will take place.

wooden sail Aug 8, 2022, 9:33 PM

#

optimization of what?

mint palm Aug 8, 2022, 9:34 PM

#

wooden sail if you're using neural networks for the inference, then you can consider learnin...

the thing during fine-tuning as per this

#

i will look at code, it will clear things

wooden sail Aug 8, 2022, 9:35 PM

#

i would think it's usually backwards, but hopefully that helps you

trail adder Aug 8, 2022, 11:29 PM

#

Hi!! Do any of you know about data scrubbing?

merry wadi Aug 9, 2022, 12:30 AM

#

Is there anything wrong with using a model to make predictions then incorporating the predictions into another model?

magic dune Aug 9, 2022, 1:05 AM

#

why do people use -1 and 1 for perceptron?

weary crown Aug 9, 2022, 2:31 AM

#

magic dune why do people use -1 and 1 for perceptron?

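1 is the max value the sign activation can output (and -1 the min), so ±1 labels match the perceptron's outputs directly

magic dune Aug 9, 2022, 2:32 AM

#

weary crown 1 is the max value the sign activation can output (and -1 the min), so ±1 labels match the...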

thx
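
A short sketch of why ±1 labels are convenient for the classic perceptron (toy data is hypothetical): a sample is misclassified exactly when y·(w·x) ≤ 0, and a single update rule w ← w + y·x handles both classes.

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Classic perceptron; labels y must be in {-1, +1}."""
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # With ±1 labels, "wrong side of the boundary" is yi * (w @ xi) <= 0
            if yi * (w @ xi) <= 0:
                w += yi * xi  # one rule pushes w toward positives, away from negatives
    return w

def predict(w, X):
    X = np.hstack([X, np.ones((len(X), 1))])
    return np.where(X @ w >= 0, 1, -1)  # sign activation: outputs -1 or 1

X = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print(predict(w, X))
```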

barren snow Aug 9, 2022, 3:13 AM

#

Hey guys, I couldn't understand the meaning and the calculation in this section. Could sb explain to me, I appreciate it!!!

#

iron basalt Aug 9, 2022, 3:16 AM

#

barren snow Hey guys, I couldn't understand the meaning and the calculation in this section....

Adding up the bins, and then averaging those sums across the N frames.

#

.latex $\frac{1}{N}\sum_{i=1}^{N}...$ is often an averaging.
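
A numeric sketch of "add up the bins, then average over the N frames". The exact summand in the section isn't shown, so this assumes a plain sum of bin values per frame, with hypothetical numbers:

```python
import numpy as np

# Hypothetical data: N = 4 frames, each with K = 3 bins (e.g. frequency-bin magnitudes)
frames = np.array([
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [1.0, 0.0, 3.0],
])
N = frames.shape[0]

per_frame_sums = frames.sum(axis=1)   # add up the bins within each frame
average = per_frame_sums.sum() / N    # (1/N) * sum over the N frames
print(per_frame_sums, average)
```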

barren snow Aug 9, 2022, 3:35 AM

#

Thanks! So what's the meaning of bins here?

wooden sail Aug 9, 2022, 4:34 AM

#

bins usually refers to frequency bins, but it's hard to say without further context

celest patrol Aug 9, 2022, 5:16 AM

#

Why are pd.join/merge/concat/combine so cursed

barren snow Aug 9, 2022, 5:22 AM

#

wooden sail bins usually refers to frequency bins, but it's hard to say without further cont...

Got it!

wooden sail Aug 9, 2022, 5:25 AM

#

in general bins refer to reference values for the discretization of a domain, so do make sure you read the explanations given before that equation to ascertain whether it truly referred to the spectrum
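
Two quick illustrations of bins as a discretization of a domain, with hypothetical numbers: the centre frequencies of the FFT bins of a short frame, and the edges of a histogram over [0, 1].

```python
import numpy as np

# Frequency bins: an 8-sample frame at an assumed 8 kHz sample rate
fs = 8000
n = 8
bin_centres = np.fft.rfftfreq(n, d=1 / fs)  # 0, 1000, ..., 4000 Hz
print(bin_centres)

# Histogram bins: split [0, 1] into 4 equal intervals and count values per interval
values = np.array([0.1, 0.4, 0.45, 0.8, 0.95])
counts, edges = np.histogram(values, bins=4, range=(0.0, 1.0))
print(counts, edges)
```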

lapis sequoia Aug 9, 2022, 6:27 AM

#

steady basalt I opt to use class weight atm

Will that change the way confusion matrix looks?

#

And can you tell me how to use class weights

lapis sequoia Aug 9, 2022, 7:16 AM

#

Name: City, Length: 200000, dtype: category
Categories (7489, object): ['Abbeville', 'Abbotsford', 'Abbottstown', 'Aberdeen', ..., 'Zumbro Falls', 'Zumbrota', 'Zuni', 'Zwingle']

#

7489 sounds like a lot of Categories to use for a categorical variable, but given the length of 200k it doesn't seem large enough, so it seems fine?

bold timber Aug 9, 2022, 7:53 AM

#

Hi, I have a question: what's the meaning of "lambda: input_fn(train, train_y, training=True)"?

Why is there no variable after lambda, like in "lambda x: x**2"? Why is it only "lambda: "?

#

"input_fn" is input function

untold bloom Aug 9, 2022, 7:59 AM

#

bold timber Hi, I have a question: What the meaning of "lambda: input_fn(train, train_y, tra...

hi, it means the function does not take any arguments; equivalent to this in functionality:

def f():
    return input_fn(train, train_y, training=True)

except yours has no name f

wooden sail Aug 9, 2022, 7:59 AM

#

if train and train_y are defined and visible in the same scope as the lambda function, then calling it will call input_fn on those values (training=True is just a keyword argument with a fixed value)
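
A small sketch of both points, using a hypothetical stand-in for input_fn (the real TensorFlow one isn't shown). The zero-argument lambda behaves like the named def, and train / train_y are looked up when the lambda is called, not when it is defined:

```python
# Hypothetical stand-in for the TensorFlow input function from the question
def input_fn(features, labels, training=False):
    return features, labels, training

train = [1.0, 2.0, 3.0]
train_y = [0, 1, 0]

make_input = lambda: input_fn(train, train_y, training=True)  # takes no arguments

def make_input_named():
    return input_fn(train, train_y, training=True)

print(make_input() == make_input_named())  # True: same behaviour

# Names inside the lambda are resolved at call time
train = [4.0, 5.0, 6.0]
print(make_input()[0])  # [4.0, 5.0, 6.0]
```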

unique flame Aug 9, 2022, 7:59 AM

#

So I'm trying to print a pandas table by saving it to a CSV file and opening it in Excel. At first everything seemed fine, but now the decimal separator of the values has changed: a value of 50.650 cm is now just shown as 50650. Does anyone know a way to fix this? I already tried Options > Advanced.
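
A sketch assuming the cause is a locale mismatch: in many European locales Excel treats "." as a thousands separator and expects ";" between fields, which would turn 50.650 into 50650. pandas' to_csv can write the file in that convention instead (the column name is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"length_cm": [50.650, 12.5]})  # hypothetical column

# Write ";" between fields and "," as the decimal mark
csv_text = df.to_csv(sep=";", decimal=",", index=False)
print(csv_text)
```

Passing a file path as the first argument writes the same content to disk. Writing with df.to_excel (which needs an engine such as openpyxl installed) sidesteps the separator question entirely.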

bold timber Aug 9, 2022, 8:01 AM

#

ok thank you so much! @untold bloom @wooden sail

bold timber Aug 9, 2022, 8:12 AM

#

untold bloom hi, it means the function does not take any arguments; equivalent to this in fun...

whether the form of input should be a function when we want to train the model in tensorflow?

untold bloom Aug 9, 2022, 8:16 AM

#

bold timber whether the form of input should be a function when we want to training the mode...

sorry i didn't understand, can you rephrase

bold timber Aug 9, 2022, 8:21 AM

#

Sorry, my bad. I misunderstood @untold bloom

bold timber Aug 9, 2022, 8:32 AM

#

untold bloom hi, it means the function does not take any arguments; equivalent to this in fun...

Sorry, I have a question again: why is the result different?

#

this is my function of 'input_fn'

untold bloom Aug 9, 2022, 8:34 AM

#

in the first one you're calling the function with parens ()

#

in the second one you're not calling it; only referring to the function itself

#

if you do (lambda: input_fn(train_y, training=True))(), they will do the same thing

#

f is a function; f() is calling it

#

lambda: ... is a function; (lambda: ...)() is calling it
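
A tiny sketch of the difference between passing a function and passing its result (input_fn here is a hypothetical stand-in that returns a string). APIs that ask for an input function, such as an input_fn= argument, want the callable itself:

```python
def input_fn():
    return "dataset"  # hypothetical stand-in for the real input function

f = input_fn
print(callable(f), f())           # f is a function; f() is its result

g = lambda: input_fn()
print(callable(g), callable(g())) # g is callable; g() is already the result

print((lambda: input_fn())())     # define and call immediately
```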

steady basalt Aug 9, 2022, 8:36 AM

#

lapis sequoia Will that change the way confusion matrix looks?

Uhh… yes ?

#

Have you thought about how models make predictions and what this means for confusion matrix

bold timber Aug 9, 2022, 8:38 AM

#

untold bloom `lambda: ...` is a function; `(lambda: ...)()` is calling it

Thank you so much for the explanation. You made me understand now!

mellow vapor Aug 9, 2022, 9:19 AM

#

how to check for cross validation using a model like isolation forest?

young ridge Aug 9, 2022, 9:33 AM

#

#

Hi, I recently tried creating a linear regression model using statsmodels.api and tried creating a scatter plot using matplotlib. After using the train_test_split function I ran into this error

#

#

is there any way i can make both x and y train the same size?

vast goblet Aug 9, 2022, 9:36 AM

#

Hello there, I have a transaction dataset and my goal is to find rules. So I decided to use the FP-growth algorithm, which is an association rule algorithm, but my minimum support is around 0.01%, which is very low for 55k transactions.

What can I do to fix this?

bold timber Aug 9, 2022, 10:40 AM

#

What's the meaning of ** in .format(**eval_result)?

#

Why does it calculate a power of eval_result?

steady basalt Aug 9, 2022, 10:47 AM

#

if df_concat_noid['42006-0.0'][i] != np.datetime64('1970-01-01T00:00:00.000000000'):

#

anyone know why i cannot do this

steady basalt Aug 9, 2022, 12:06 PM

#

hey guys... does anyone know whether it is fair practice to drop NA from ONLY the test set? While I can conditionally impute in the training data, the massive data imbalance means that imputing at all in the test set will misguide models into predicting the majority class

#

so, train test split, impute training data, and dropping any rows with NA from testing data to only test on complete samples

serene scaffold Aug 9, 2022, 12:10 PM

#

celest patrol Why are pd.join/merge/concat/combine so cursed

They're not. That was easy

serene scaffold Aug 9, 2022, 12:12 PM

#

steady basalt ```if df_concat_noid['42006-0.0'][i] != np.datetime64('1970-01-01T00:00:00.00000...

How do you know that you can't

untold bloom Aug 9, 2022, 12:46 PM

#

young ridge Hi I recently tried creating a linear regression model using statsmodels.api and...

hi, as it stands it is as if you're trying a 5D scatter plot: 4 columns of X + the y values. Since matplotlib (or any other tool, i guess) is incapable of that, it flattens the input arrays to attempt a 2D plot, but then the sizes do not match. What you can do instead is, for example, plt.scatter(x_val[:, 0], y_val); it selects the 0th column first and plots its scatter against y_val.
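
A sketch extending that idea to one panel per feature, assuming x_val is a NumPy array with 4 feature columns (for a DataFrame, use x_val.iloc[:, j]). The data below is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x_val = rng.normal(size=(20, 4))  # hypothetical: 20 samples, 4 features
y_val = x_val @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=20)

# plt.scatter(x_val, y_val) fails: x flattens to 80 values, y has only 20.
# Plot y against one feature column at a time instead.
fig, axes = plt.subplots(1, x_val.shape[1], figsize=(12, 3), sharey=True)
for j, ax in enumerate(axes):
    ax.scatter(x_val[:, j], y_val, s=10)
    ax.set_xlabel(f"feature {j}")
axes[0].set_ylabel("y")
fig.tight_layout()
plt.show()
```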

untold bloom Aug 9, 2022, 12:50 PM

#

bold timber What the meaning of * * in .format(**eval_result)?

one use of ** is indeed exponentiation; but that's when it's an infix operator (i.e., takes 2 operands like 5 ** 7); here it's a prefix operator (i.e., unary operator; cares only about what comes after it) and it does something else. (another example is -: when used like 12 - 9, it subtracts; when used as -7, though, it negates.)

What unary ** does is "unpack" its mapping operand. Presumably eval_result is a dictionary. Then that dictionary's key-value pairs are passed as keyword arguments to the .format method. Say it is the dict {"accuracy": 0.77}. Then with .format(**eval_result), it is as if we wrote .format(accuracy=0.77).
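
A minimal sketch of the unpacking with a hypothetical eval_result dict; str.format simply ignores keyword arguments the template doesn't use:

```python
eval_result = {"accuracy": 0.77, "loss": 0.51}  # hypothetical evaluation output

msg = "Test set accuracy: {accuracy:0.3f}".format(**eval_result)
same = "Test set accuracy: {accuracy:0.3f}".format(accuracy=0.77, loss=0.51)
print(msg)          # Test set accuracy: 0.770
print(msg == same)  # True

print(5 ** 2)       # 25: binary ** is exponentiation
```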

slender wren Aug 9, 2022, 12:52 PM

#

Hi, I am not sure which channel I should send this message to, but I wanted to know if there are any good websites for challenges in Python. By challenges, I don't mean like a competition, but something which will help practise basic concepts like loops, lists, functions, etc., using libraries like Pandas, Matplotlib, Numpy etc. If you know of any such website, please let me know.

timid hollow Aug 9, 2022, 12:57 PM

#

slender wren Hi, I am not sure which channel I should send this message to, but I wanted to k...

There are things like 100-numpy-exercises, 100-pandas-exercises, etc. on GitHub. Is that what you’re looking for? If you don’t get basic Python concepts like loops and built-in data structures, I recommend mastering those first.

lapis sequoia Aug 9, 2022, 1:21 PM

#

steady basalt Have you thought about how models make predictions and what this means for confu...

Yes. But I don't know much about how class weights work

bold timber Aug 9, 2022, 1:30 PM

#

untold bloom one use of `**` is indeed exponentation; but that's when it's an infix operator...

Thank you so much for the explanation!

untold bloom Aug 9, 2022, 1:31 PM

#

you're welcome!

delicate tendon Aug 9, 2022, 2:03 PM

#

Hi there I had a Q on imbalanced data

#

I am predicting if an animal gets rehomed from a shelter. In general my data is only slightly imbalanced (60:40 split), however by animal type it is extremely imbalanced (like 5% of birds get adopted, 90% of dogs get adopted). Is this an issue? What are ways I should solve it?

#

#

These are the average proportions of animal types that get rehomed

lapis sequoia Aug 9, 2022, 3:20 PM

#

Guys, if Boolean variables can be handled by algorithms that only work with numerical distances, then what's the point of calculating a Jaccard similarity for Boolean features?

#

In my last assignment I dropped the Boolean features before feeding them to my knn classifier and converted them into a feature named jaccard.

#

But now people say that all of the Algorithms can handle categorical features with one hot encoding. And a Boolean column is already sort of one hot encoded.

steady basalt Aug 9, 2022, 3:24 PM

#

What do u mean by that

lapis sequoia Aug 9, 2022, 3:27 PM

#

Like

#

High BP:
1
0
0
0
1
1

#

Can I put it in directly to knn?

#

I last time transformed 5 of such Boolean features to a jaccard feature.

#

Which basically calculated the percentage of 1's for each row

#

So if it's 5 features.
1,1,0,0,1
Jaccard was 60% for that row.

steady basalt Aug 9, 2022, 3:32 PM

#

What is a jaccard “feature”

#

That’s a binary feature ?

#

Jaccard index measured between values
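
A sketch of the distinction, on hypothetical rows: the Jaccard index compares two rows (shared ones divided by ones in either row), while the fraction of ones is a per-row statistic. scikit-learn's KNeighborsClassifier can use the Jaccard distance directly on Boolean features via metric="jaccard":

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

a = np.array([1, 1, 0, 0, 1], dtype=bool)
b = np.array([1, 0, 0, 1, 1], dtype=bool)

fraction_of_ones = a.mean()              # 0.6: describes one row only
jaccard = (a & b).sum() / (a | b).sum()  # 2 / 4 = 0.5: similarity of two rows

# KNN with Jaccard distance (1 - Jaccard index) on hypothetical Boolean data
X_train = np.array([[1, 1, 0, 0, 1],
                    [0, 0, 1, 1, 0],
                    [1, 1, 1, 0, 1],
                    [0, 1, 1, 1, 0]], dtype=bool)
y_train = np.array([1, 0, 1, 0])
knn = KNeighborsClassifier(n_neighbors=1, metric="jaccard", algorithm="brute")
knn.fit(X_train, y_train)
print(knn.predict(np.array([[1, 1, 0, 0, 0]], dtype=bool)))
```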

lapis sequoia Aug 9, 2022, 4:09 PM

#

steady basalt What is a jaccard “feature”

Told you

lapis sequoia Aug 9, 2022, 4:09 PM

#

lapis sequoia So if it's 5 features. 1,1,0,0,1 Jaccard was 60% for that row.

.

#

5 features having values 1,1,0,0,1 gets transformed into 60% jaccard index.

#

As a feature.

lapis sequoia Aug 9, 2022, 4:26 PM

#

delicate tendon I am predicting if an animal gets rehomed from a shelter. In general my data is ...

Well, if you think about it the animal type is a feature that is related to the probability of it being adopted or not.

#

5% of birds are adopted and 90% the dogs are, so the animal being a bird or a dog is something that should be taken into consideration while fitting the model.

#

So no, you should not balance for each animal type.

delicate tendon Aug 9, 2022, 4:28 PM

#

Fair point, ty

haughty pewter Aug 9, 2022, 4:47 PM

#

while doing clustering, if a scatterplot ends up looking like this, is it safe to say that both columns have no correlation and thus should be disregarded?

serene scaffold Aug 9, 2022, 4:49 PM

#

haughty pewter while doing clustering, if a scatterplot ends up looking like this, it's safe sa...

looks like you need to downsample. it might be that there's actually a curve where the plot is "thicker", but that it's impossible to see because there's so many points.
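
A sketch of two ways to deal with overplotting, on hypothetical data with a weak curve hidden in 200k noisy points: plot a random subsample, or keep every point but make each one nearly transparent so dense regions show up darker.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical dense data with a weak quadratic trend
x = rng.uniform(-3, 3, size=200_000)
y = 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

idx = rng.choice(x.size, size=5_000, replace=False)  # random downsample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(x[idx], y[idx], s=2)
ax1.set_title("5k random points")
ax2.scatter(x, y, s=1, alpha=0.02)
ax2.set_title("all points, low alpha")
plt.show()
```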

steady basalt Aug 9, 2022, 4:58 PM

#

haughty pewter while doing clustering, if a scatterplot ends up looking like this, it's safe sa...

Lmao looks like that African flag

steady basalt Aug 9, 2022, 4:58 PM

#

lapis sequoia 5 features having values 1,1,0,0,1 gets transformed into 60% jaccard index.

I don’t rly understand that

lapis sequoia Aug 9, 2022, 5:17 PM

#

steady basalt I don’t rly understand that

Hmm

#

Leave it. I just have to write a report. I will write silly stuff in it

#

https://c.tenor.com/CrcbFRrvFCMAAAAM/monkey-calculate.gif

#

Is this channel pretty active, or are there better discords for AI questions?

rough mountain Aug 9, 2022, 5:54 PM

#

Any good datasets of free to use images? They don't have to be labeled or anything, just images.

serene scaffold Aug 9, 2022, 5:58 PM

#

lapis sequoia Is this channel pretty active, or are there better discords for AI questions?

It's one of the most active channels on this server.

serene scaffold Aug 9, 2022, 5:58 PM

#

rough mountain Any good datasets of free to use images? They don't have to be labeled or anythi...

Just arbitrary images? Literally any images will do?

lapis sequoia Aug 9, 2022, 6:00 PM

#

serene scaffold It's one of the most active channels on this server.

Okay cool. I’m trying to do an informal survey of 1: what the future of AI is (like particularly promising sub fields) and 2: what sub fields need more hardware acceleration. If you have answers to those two questions I’d be really appreciative

rough mountain Aug 9, 2022, 6:01 PM

#

serene scaffold Just arbitrary images? Literally any images will do?

Preferably real world photos. What they are of matters less. Also preferably the same size.

serene scaffold Aug 9, 2022, 6:01 PM

#

lapis sequoia Okay cool. I’m trying to do an informal survey of 1: what the future of AI is (l...

1) I don't know
2) natural language processing and image/video processing

rough mountain Aug 9, 2022, 6:01 PM

#

So not like ct scan data or the large scale fish dataset

lapis sequoia Aug 9, 2022, 6:02 PM

#

rough mountain Preferably real world photos. What they are of matters less. Also preferably the...

Hrm I found 50k art scans but that’s probably not what you want

lapis sequoia Aug 9, 2022, 6:02 PM

#

serene scaffold 1) I don't know 2) natural language processing and image/video processing

Why those two? Also third question, how do you test a novel type of NN?

rough mountain Aug 9, 2022, 6:03 PM

#

lapis sequoia Hrm I found 50k art scans but that’s probably not what you want

Actually that could work really well

#

not what I originally had in mind, but yep that's perfect.

serene scaffold Aug 9, 2022, 6:04 PM

#

lapis sequoia Why those two? Also third question, how do you test a novel type of NN?

Those two domains use deep neural networks and large training sets, and that's where one uses GPU computation.

building it with pytorch and seeing if it performs well on a given dataset

lapis sequoia Aug 9, 2022, 6:05 PM

#

rough mountain not what I originally had in mind, but yep that's perfect.

Hrm so I’ve got a bigger one of plants and an equal sized one of celebrities

#

Idk you can pick out of this https://imerit.net/blog/22-free-image-datasets-for-computer-vision-all-pbm/

lapis sequoia Aug 9, 2022, 6:06 PM

#

serene scaffold Those two domains use deep neural networks and large training sets, and that's w...

Gotcha, I kinda wanted to make a NN only using multiplication and division. Idk if that’s been tried before

trim zephyr Aug 9, 2022, 6:13 PM

#

Hi anyone familiar with Tweepy

wooden sail Aug 9, 2022, 6:14 PM

#

lapis sequoia Gotcha, I kinda wanted to make a NN only using multiplication and division. Idk ...

that's not gonna be a neural network though, it's equivalent to doing linear (or affine) transformations and can therefore be condensed into a single matrix-vector product. you need nonlinear operations to reap the benefits of the universal approximation theorems
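
A quick numeric check of the collapse argument with hypothetical random weights: two stacked linear layers without an activation between them equal a single matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # "layer 1": 3 inputs -> 4 units, no activation
W2 = rng.normal(size=(2, 4))  # "layer 2": 4 units -> 2 outputs, no activation
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x     # one 2x3 matrix does the same job
print(np.allclose(two_layers, one_layer))  # True
```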

lapis sequoia Aug 9, 2022, 6:14 PM

#

wooden sail that's not gonna be a neural network though, it's equivalent to doing linear (or...

Well you can make nonlinear activation functions from that iirc, no?

wooden sail Aug 9, 2022, 6:14 PM

#

from mult and division? no

lapis sequoia Aug 9, 2022, 6:15 PM

#

Okay so I’m lost, like what does the sigmoid function do besides multiplication and division?

wooden sail Aug 9, 2022, 6:15 PM

#

exponentiation

lapis sequoia Aug 9, 2022, 6:16 PM

#

Which is a special case of multiplication.

wooden sail Aug 9, 2022, 6:16 PM

#

not at all, not for non integer arguments

#

if you're only working with integers, i'd be willing to let that slide, but then you can't use division

#

exponentiation is a nonlinear transformation

lapis sequoia Aug 9, 2022, 6:18 PM

#

Okay. So would a NN with exponentiation, multiplication, and division be sufficient?

wooden sail Aug 9, 2022, 6:19 PM

#

yes

#

though sigmoids are known not to yield the best results in general

lapis sequoia Aug 9, 2022, 6:19 PM

#

Do you know of any approach that does that, IE avoiding addition and subtraction?

wooden sail Aug 9, 2022, 6:20 PM

#

nope

lapis sequoia Aug 9, 2022, 6:20 PM

#

Would be fun to try then

wooden sail Aug 9, 2022, 6:21 PM

#

there's a reason no one does it, but try it out by all means

#

multiplication, division, and exponentiation can only do so much

lapis sequoia Aug 9, 2022, 6:21 PM

#

Like what’s it missing?

wooden sail Aug 9, 2022, 6:21 PM

#

in particular, you can't do any translations/shift up, down, left, or right

#

which means no matter how hard you try, you cannot change activation thresholds with them
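
A small sketch of that point using a sigmoid: scaling the input by a weight w never moves the 0.5 crossing away from x = 0, while adding a bias b moves it to x = -b / w.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Multiplication only: the threshold stays pinned at x = 0 for every weight
for w in (0.5, 1.0, 4.0):
    print(w, sigmoid(w * 0.0))  # 0.5 each time

# With an added bias, the threshold moves to x = -b / w = 2
w, b = 1.0, -2.0
print(sigmoid(w * 2.0 + b))     # 0.5
```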

lapis sequoia Aug 9, 2022, 6:22 PM

#

Hrm true.

wooden sail Aug 9, 2022, 6:23 PM

#

anyway, you need addition in backpropagation and gradient descent, but idk how strict you were being with the "no addition/subtraction"

#

you could do gradient-free techniques like simulated annealing, but again, depends how strict you are in allowing addition in the cost function itself

lapis sequoia Aug 9, 2022, 6:24 PM

#

Well I’m kinda intrigued now.

#

What if you were completely strict, no addition/subtraction whatsoever

wooden sail Aug 9, 2022, 6:25 PM

#

go find out and let me know 😛 but that limits which functions you can work with quite a bit

#

for one you are immediately kinda limited to work with min and max of functions that are bounded by below or above

#

since you can't even subtract the target value

#

can't take norms of anything other than scalars either

#

i would call it like "diet optimization", more like what you do the first time you learn about opt with local minima and maxima looking at the first and second derivative tests, or using probabilistic approaches

#

though now that I think about it, you can be a pain in the ass and work out loopholes like logs and exponents of products and divisions to achieve the same as addition and subtraction. up to you if you consider that fair or not

lapis sequoia Aug 9, 2022, 6:30 PM

#

wooden sail though now that you think about it, you can be a pain in the ass and work out lo...

How would that work?

wooden sail Aug 9, 2022, 6:30 PM

#

stuff like log(ab) = log(a) + log(b)
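
A minimal sketch of that loophole, read the other way round: addition and subtraction built only from exp, multiplication/division and log (it overflows for large inputs, so it's a curiosity rather than something practical).

```python
import math

def add_via_products(a, b):
    """a + b using only exp, multiplication and log: log(e^a * e^b) = a + b."""
    return math.log(math.exp(a) * math.exp(b))

def sub_via_division(a, b):
    """a - b using only exp, division and log: log(e^a / e^b) = a - b."""
    return math.log(math.exp(a) / math.exp(b))

print(add_via_products(2.0, 3.0))  # approximately 5.0
print(sub_via_division(7.0, 2.0))  # approximately 5.0
```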

lapis sequoia Aug 9, 2022, 6:32 PM

#

Hrm that’s probably fair

wooden sail Aug 9, 2022, 6:33 PM

#

in that case you're not all that constrained regarding cost funcs. working with independent random variables already lets you create log-likelihood expressions and maximum likelihood estimators only out of products of probability density functions, and their log expressions are equivalent to common cost funcs like least squares
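
For the Gaussian case, that correspondence can be written out, assuming independent samples $y_i$ with Gaussian noise of variance $\sigma^2$ around a model $f(x_i)$. The negative log of the product of densities is least squares plus a constant:

```latex
-\log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{\bigl(y_i - f(x_i)\bigr)^2}{2\sigma^2}\right)
= \frac{1}{2\sigma^2}\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
  + \frac{n}{2}\log\left(2\pi\sigma^2\right)
```

With $\sigma$ fixed, maximizing the likelihood over $f$ is therefore the same as minimizing the sum of squared errors.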

lapis sequoia Aug 9, 2022, 6:35 PM

#

Do you know how to add something that affects the positive part of a function only (without being piecewise)?

lapis sequoia Aug 9, 2022, 6:36 PM

#

lapis sequoia Okay cool. I’m trying to do an informal survey of 1: what the future of AI is (l...

Oh and Edd if you could answer these two questions too please

wooden sail Aug 9, 2022, 6:38 PM

#

1.) i like explainable/hybrid AI where networks are made out of classical models, but the hyperparams are optimally learned in a data driven fashion. 2.) any that need it:P i work with multidimensional data, and it's certainly needed there. that's stuff like anything with sensor arrays, composition of different sensors, hyperspectral imaging, spatial audio, etc

wooden sail Aug 9, 2022, 6:39 PM

#

lapis sequoia Do you know how to add something that affects the positive part of a function on...

i can't think of anything off the top of my head, but i'm pretty sure you could fit a pretty good polynomial to a relu and add two of them with opposite signs, and compose another function with the positive one. maybe a hyperboloid would do well
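
One possible reading of the "hyperboloid" idea, as a sketch: the hyperbola (x + sqrt(x² + ε)) / 2 is a smooth, non-piecewise ReLU-like curve that tends to 0 for negative x and to x for positive x, so adding a multiple of it to a function changes mainly its positive side. ε is an assumed sharpness parameter.

```python
import numpy as np

def smooth_relu(x, eps=1e-3):
    """(x + sqrt(x^2 + eps)) / 2: ~0 for x << 0, ~x for x >> 0, no piecewise definition.
    Smaller eps gives a sharper corner at 0."""
    return 0.5 * (x + np.sqrt(x * x + eps))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(smooth_relu(x))
print(np.maximum(x, 0))  # exact (piecewise) ReLU, for comparison
```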

lapis sequoia Aug 9, 2022, 6:40 PM

#

wooden sail 1.) i like explainable/hybrid AI where networks are made out of classical models...

With multidimensional data what sort of processing are you doing?

wooden sail Aug 9, 2022, 6:41 PM

#

most commonly multi-channel ultrasound stuff, but some colleagues work with stuff like MIMO radar and satellites

#

depending how the ultrasound data is collected, you have data with axes like tx angle, rx angle, tx element, rx element, time/freq

#

i try to image stuff with it. tomographic inversion

lapis sequoia Aug 9, 2022, 6:48 PM

#

Oh cool.

steady basalt Aug 9, 2022, 6:52 PM

#

The wait is over. Finally found physicals and the calculus grind begins

#

All I need now is a trig book first

wooden sail Aug 9, 2022, 6:55 PM

#

heh gil's book has an illustration of the "fundamental theorem of linear algebra" on the cover

steady basalt Aug 9, 2022, 6:55 PM

#

I spotted a really good one called foundations of mathematics which holds your hand through the very basics; I may borrow it and recap, especially for stuff like sine and angles

#

may help to go back over that before trying this, because I bet after a few dozen pages they'll ask you to use it, tho I'm not so sure the methodology cares whether it's a sine function or not

#

oh, and logarithms

robust jungle Aug 9, 2022, 7:21 PM

#

wooden sail i try to image stuff with it. tomographic inversion

https://tenor.com/view/jfk-clone-high-i-like-your-funny-words-magic-man-jack-black-gif-18659433

dusty valve Aug 9, 2022, 8:06 PM

#

what does WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 71742 vs previous value: 71742. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize. mean?

mint palm Aug 9, 2022, 8:39 PM

#

wouldnt it be much better to just initially feed generator with real images but the ones which we are not feeding in discriminator?
Instead of random input??

#

probably not cuz when deploying, we would need similar images again to get novel output images.
We should input something that is actually input-able when deployed.

spare briar Aug 9, 2022, 9:25 PM

#

dusty valve what does ``` WARNING:tensorflow:It seems that global step (tf.train.get_global_...

A step is one forward pass + gradient update

tf maintains a global step value which can be used by each component of your model (for example you might change your learning rate based on global step)

tf is warning you that your global step is not being incremented and suggesting that you have your optimizer update it automatically when it updates gradients

mint palm Aug 9, 2022, 9:54 PM

#

Also our generator generates noisy labels while our discriminator, being relatively robust to noise, cleans these labels which is not the case in TS framework.
I am new to generative models, but how is what the bold part says possible?

brisk apex Aug 9, 2022, 10:25 PM

#

I have a question regarding the concept of ELT: after transformation, after you create dataframes and tables after optimization, what happens to those dataframes/tables? Do they go back to the data warehouse for data analysts to work on? If not, do data analysts need to run the optimization every time they want to work on optimized dataframes/tables?

dusty valve Aug 9, 2022, 11:05 PM

#

spare briar A step is one forward pass + gradient update tf maintains a global step value w...

How would I do that

misty flint Aug 9, 2022, 11:23 PM

#

brisk apex I have question regarding concept of ELT: after transformation, after you create...

"it depends"

#

different architectures for different use cases

#

different resources as well

lapis sequoia Aug 9, 2022, 11:25 PM

#

How would the steepness of an activation function impact its utility?
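One way to make "steepness" concrete, as an illustration only: scale the input of a logistic sigmoid by a factor `k` (a hypothetical parameter, not a library option) and compare gradients. The slope at 0 is exactly `k/4`, so a steeper activation passes larger gradients near zero, but it also saturates sooner, so far enough from zero its gradient becomes much smaller (compare k=5 with k=1 at x=2), which is the vanishing-gradient problem.

```python
import math


def sigmoid(x: float, k: float = 1.0) -> float:
    """Logistic sigmoid with a steepness factor k (k=1 is the standard sigmoid)."""
    return 1.0 / (1.0 + math.exp(-k * x))


def sigmoid_grad(x: float, k: float = 1.0) -> float:
    """Derivative of sigmoid(k*x) with respect to x: k * s * (1 - s)."""
    s = sigmoid(x, k)
    return k * s * (1.0 - s)


# The gradient at x=0 is always k/4: steeper means stronger gradients near 0,
# but the curve flattens out sooner, so the gradient far from 0 collapses.
for k in (0.5, 1.0, 5.0):
    print(f"k={k}: grad at 0 = {sigmoid_grad(0.0, k):.4f}, "
          f"grad at 2 = {sigmoid_grad(2.0, k):.6f}")
```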

misty flint Aug 9, 2022, 11:26 PM

#

The other question is: are you the software/data engineer serving the data analysts/scientists? Are you on the cloud? If so, that allows a more flexible architecture.

misty flint Aug 9, 2022, 11:27 PM

#

brisk apex I have question regarding concept of ELT: after transformation, after you create...

Another thing to consider is what kind of data warehouse you are working with: does it allow "easy reads" without interfering with other processes? How expensive are the queries to run, etc.?
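A minimal sketch of one common pattern, assuming the transformed result is written back into the warehouse as a table. Python's built-in sqlite3 stands in for the warehouse, and the table names (`raw_orders`, `customer_totals`) and rows are made up. Raw data is loaded first, the transformation runs as SQL inside the database, and analysts query the stored result instead of rerunning the transformation.

```python
import sqlite3

# sqlite3 stands in for a data warehouse; table/column names and data are hypothetical.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw data lands in the warehouse as-is.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "ana", 10.0), (2, "ben", 25.5), (3, "ana", 4.5), (4, "ben", None)],
)

# Transform: runs inside the warehouse, and the result is materialised as a
# new table, so it can be queried later without re-running the transformation.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(amount) AS total_amount, COUNT(*) AS n_orders
    FROM raw_orders
    WHERE amount IS NOT NULL
    GROUP BY customer
    """
)

# An analyst's later query reads the stored result directly.
rows = conn.execute(
    "SELECT customer, total_amount, n_orders FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)  # [('ana', 14.5, 2), ('ben', 25.5, 1)]
```

Using `CREATE VIEW` instead of `CREATE TABLE ... AS` would store only the query, so it would be re-run on every read; which fits better depends on how fresh the data must be and how expensive the queries are.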

lapis sequoia Aug 10, 2022, 12:21 AM

#

What is Data Science and AI even for?

#

Isn't it just analysing data?

hazy saddle Aug 10, 2022, 12:24 AM

#

Hello, I'm using pandas to filter some data with the following code:
first_week_data = data_relevant.loc[(data_relevant["FechaEncuesta"] >= first_day_week1) &
                                    (data_relevant["FechaEncuesta"] <= last_day_week1)]

The problem is that I'm only getting data for the first and last days, not the days in between. Any idea where the problem is?
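A small reproduction on made-up data (only the column name `FechaEncuesta` comes from the question; the dates, bounds and `Cant Kg` values are invented). With a `datetime64[ns]` column and `Timestamp` bounds, the combined mask keeps every row inside the range, not only the endpoints. If the real data behaves differently, the column dtype and the types of `first_day_week1`/`last_day_week1` are the first things worth checking.

```python
import pandas as pd

# Synthetic stand-in for data_relevant; values are made up.
data_relevant = pd.DataFrame({
    "FechaEncuesta": pd.to_datetime([
        "2022-08-01", "2022-08-02", "2022-08-03", "2022-08-05",
        "2022-08-07", "2022-08-08",
    ]),
    "Cant Kg": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

first_day_week1 = pd.Timestamp("2022-08-01")
last_day_week1 = pd.Timestamp("2022-08-07")

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than >= and <=.
mask = (data_relevant["FechaEncuesta"] >= first_day_week1) & (
    data_relevant["FechaEncuesta"] <= last_day_week1
)
first_week_data = data_relevant.loc[mask]
print(first_week_data["FechaEncuesta"].dt.day.tolist())  # [1, 2, 3, 5, 7]
```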

serene scaffold Aug 10, 2022, 12:25 AM

#

Please show the output of print(data_relevant.dtypes) as text.

#

@hazy saddle ^

serene scaffold Aug 10, 2022, 12:26 AM

#

lapis sequoia What is Data Science and AI even for?

data science is about analyzing data using programming. AI is about writing programs that make decisions, and stuff.

hazy saddle Aug 10, 2022, 12:27 AM

#

serene scaffold data science is about analyzing data using programming. AI is about writing prog...

Fuente                   object
FechaEncuesta    datetime64[ns]
Grupo                    object
Ali                      object
Cant Kg                 float64
dtype: object

lapis sequoia Aug 10, 2022, 12:27 AM

#

serene scaffold data science is about analyzing data using programming. AI is about writing prog...

Oh, like predicting and stuff

#

Got it

serene scaffold Aug 10, 2022, 12:27 AM

#

!docs pandas.Series.between

arctic wedge BOT Aug 10, 2022, 12:27 AM

#

pandas.Series.between

Series.between(left, right, inclusive='both')
Return boolean Series equivalent to left <= series <= right.

This function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. NA values are treated as False.
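A short sketch of `Series.between` on datetimes (the series and bounds are made up). It relies on the default `inclusive`, which covers both ends as the signature above shows. A bare date bound means midnight, so rows later on the last day fall outside the range unless the timestamps are first normalized to dates.

```python
import pandas as pd

# Made-up timestamps, one of them late on the last day.
s = pd.Series(pd.to_datetime([
    "2022-08-01 00:00", "2022-08-03 12:00", "2022-08-07 00:00", "2022-08-07 18:30",
]))

start = pd.Timestamp("2022-08-01")
end = pd.Timestamp("2022-08-07")

# Same as (s >= start) & (s <= end): inclusive on both ends by default.
in_range = s.between(start, end)
print(in_range.tolist())  # [True, True, True, False]

# Dropping the time of day first keeps every row dated on the last day.
in_range_by_day = s.dt.normalize().between(start, end)
print(in_range_by_day.tolist())  # [True, True, True, True]
```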