#data-science-and-ml


velvet thorn
#

but generally, that is true

lapis sequoia
#

So if I set my unit=32

#

I'm guessing we have 32 different combinations of bs and cs

#

Right?

velvet thorn
#

subject to the above caveat

#

but

#

"combination" is not really

#

an appropriate word

#

or rather, it's ambiguous

#

but, yes, in essence you have 32 (w, b) tuple-equivalents

#

which are independent
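A minimal numpy sketch of what "32 independent (w, b) pairs" means for a fully connected layer: each unit owns one column of the weight matrix and one entry of the bias vector. The input size of 10 and the batch of 4 samples are arbitrary assumptions, not from the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, units = 10, 32   # 10 input features is an arbitrary assumption

# One column of W and one entry of b per unit: 32 independent (w, b) pairs
W = rng.normal(size=(n_inputs, units))
b = np.zeros(units)

x = rng.normal(size=(4, n_inputs))   # a batch of 4 samples
out = x @ W + b                      # shape (4, 32): one output per unit

n_params = W.size + b.size           # 10 * 32 weights + 32 biases
print(out.shape, n_params)
```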

lapis sequoia
#

So what decides which 32 values of b,c you get?

velvet thorn
#

this

#

subject to initial conditions

lapis sequoia
#

what's backpropagation of error

velvet thorn
#

hm

#

how did you learn about deep learning?

lapis sequoia
#

kaggle

velvet thorn
#

I would suggest

#

you pick up a book, or a video course, or something like that

#

it's important to have a theoretical foundation

lapis sequoia
#

I already am

#

It's just a hole in my knowledge I need fixing

velvet thorn
#

backpropagation is one of the most basic aspects of neural networks

#

basically

lapis sequoia
#

Oh, the loss function?

velvet thorn
#

it's the application of the chain rule, given the application of a loss function to a neural network's prediction vs ground truth, to successively update the weights (including bias) of preceding layers

#

the layer closest to the end is updated first

#

and then the weight updates are propagated backwards throughout the network
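A minimal numpy sketch of that description, assuming a tiny made-up network (sigmoid hidden layer, linear output, mean squared error). Gradients are computed starting at the loss and moving backwards with the chain rule; in the usual implementation, all weights are then updated together in one step once every gradient is known. A finite-difference check confirms one gradient entry.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                     # 8 samples, 3 features (made-up data)
y = rng.normal(size=(8, 1))

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer (sigmoid)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer (linear)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_fn(W1, b1, W2, b2):
    """Mean squared error of the network's prediction vs ground truth."""
    h = sigmoid(X @ W1 + b1)
    return np.mean((h @ W2 + b2 - y) ** 2)


# Forward pass, keeping the intermediate values the backward pass needs
h = sigmoid(X @ W1 + b1)
pred = h @ W2 + b2
loss = np.mean((pred - y) ** 2)

# Backward pass: start at the loss and apply the chain rule layer by layer
d_pred = 2 * (pred - y) / len(X)   # dL/dpred
dW2 = h.T @ d_pred                 # output layer's gradients are computed first
db2 = d_pred.sum(axis=0)
d_h = d_pred @ W2.T                # propagate the error back to the hidden layer
d_z1 = d_h * h * (1 - h)           # sigmoid'(z) = s(z) * (1 - s(z))
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# Check one entry against a finite-difference estimate
eps = 1e-6
W1_shifted = W1.copy()
W1_shifted[0, 0] += eps
numeric = (loss_fn(W1_shifted, b1, W2, b2) - loss) / eps
print(dW1[0, 0], numeric)

# Only now, with every gradient known, update all layers in one step
lr = 0.01
W1, b1 = W1 - lr * dW1, b1 - lr * db1
W2, b2 = W2 - lr * dW2, b2 - lr * db2
print(loss, loss_fn(W1, b1, W2, b2))
```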

lapis sequoia
#

So the number of layers is the number of times the gradient is calculated and applied?

velvet thorn
#

hm

#

you can think of it that way

#

but that is not always true

#

because you work at a higher level of abstraction than that

#

sometimes layers may incorporate multiple such mathematical operations

#

each of which requires one backpropagation step

#

consider for example

#

RNNs

lapis sequoia
#

Yeah I'm gunna go look up backpropagation a bit

#

I think I'm there with the individual components, I just dont have much intuition as to how it all ties together

#

Thanks 🙂

velvet thorn
#

yw 👋

#

if you ever need it, .reset_index()

turbid drift
#

Can someone suggest me a good roadmap for deep learning? Thanks!

quasi sparrow
#

Deep learning with Python is a good start

#

That’s the name of the book

stuck socket
#

sup

#

watcha doin

near nymph
#

Heyo, does anyone here know how to download a .json file from a html link and convert it into dataframe or csv format?

dense relic
#

use requests library?
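A sketch of the requests route, assuming the link returns a JSON array of records. `pd.json_normalize` flattens nested objects into columns, and `to_csv` produces the CSV version. The download line is left as a comment (`url` is a placeholder), and an inline JSON string stands in for the response.

```python
import json

import pandas as pd


def json_to_dataframe(payload):
    # json_normalize turns a list of (possibly nested) records into columns;
    # nested keys become dotted names such as "meta.score"
    return pd.json_normalize(payload)


# With the requests library the payload would come from something like
#   payload = requests.get(url, timeout=30).json()   # `url` = your link
# An inline JSON string stands in for that download here.
raw = '[{"id": 1, "name": "a", "meta": {"score": 0.5}}, {"id": 2, "name": "b", "meta": {"score": 0.9}}]'
df = json_to_dataframe(json.loads(raw))
csv_text = df.to_csv(index=False)   # pass a filename instead to write a .csv file
print(df)
```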

mint palm
#

i know the derivation of the above two equations

#

but not able to derive for third one (marked with 2 arrows)

#

it is for one hidden layer NN

#

we are using sigmoid for first layer and tanh for output layer

#

this is cost function j

mint palm
#

does someone know how to derive it

warm wharf
#

hi im having trouble converting this architecture into code

#
```
(2) The convolutional layer is followed by a max pooling layer. The pooling is 2x2 with stride 2.
(3) After max pooling, the layer is connected to the next convolutional layer, with 64 output feature maps. The convolution kernels are of 5x5 in size. Use stride 1 for convolution. The activation is ReLU.
(4) The second convolutional layer is followed by a max pooling layer. The pooling is 2x2 with stride 2.
(5) After max pooling, the layer is connected to another convolutional layer, with 128 output feature maps. The convolution kernels are of 5x5 in size. Use stride 1 for convolution. The activation is ReLU.
(6) After convolutional layer, there is fully connected layer with 3072 nodes and ReLU activation function.
(7) The fully connected layer is followed by another fully connected layer with 2048 nodes and ReLU activation function, then connected to the last fully connected layer with 10 output nodes (corresponding to the 10 classes). Use the SoftMax activation for the last layer.
```
#

so far i have:

#
```python
    keras.layers.Conv2D(64, (5, 5), 1, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
```
#

i don't know how to make a fully connected layer

#

or know if my input_shape arguments are correct

#

let me know 😎

#

@ me when you respond tysm

cobalt creek
#

@warm wharf just add dense layers

#

before that flatten the result

warm wharf
#
```python
    keras.layers.Conv2D(64, (5, 5), 1, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(3072, activation='relu'),
    keras.layers.Dense(2048, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
```
#

something like this?

cobalt creek
#

why is this 2444 when i have 78200 images on dataset

arctic crown
#

please help

#

but it doesn't go more than 0.8
i have been training a hotword 5000 times

cobalt creek
#

try changing hyperparameters

grave frost
mint palm
#

maybe use bigger network

warm wharf
mint palm
#

is it coursera

warm wharf
#

uni class

grave frost
#

@lapis sequoia You can search it up on StackOverflow; it's a hardware reason in the GPU: batches in powers of 2 can be computed efficiently by (4 CUDA cores in parallel?). In the end, it boils down to the GPU architecture and what Nvidia has adopted

warm wharf
#

```python
model = keras.Sequential([
    keras.layers.Conv2D(64, (5, 5), 1, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2), 2),
    keras.layers.Conv2D(64, (5, 5), 2, padding='same', activation='relu',
                        input_shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(3072, activation='relu'),
    keras.layers.Dense(2048, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

lr_schedule = keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-4 * 10**(epoch / 10))
optimizer = keras.optimizers.SGD(
    learning_rate=0.01, momentum=0.0, nesterov=False, name="SGD"
)

model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
#

i ended up setting up my model like this im not exactly sure if its correct but i really hope so cause training 20 epochs is taking forever even on colab

grave frost
warm wharf
#

im not exactly sure if i set it up correctly and wanted to make sure before spending time training the model

#

specifically the input shape param

#

i did it mostly looking at a kaggle notebook and kinda guessing
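One way to sanity-check the architecture without training is to trace the feature-map size by hand. The pure-Python sketch below assumes 32×32 RGB inputs (the SVHN cropped-digit format), Keras `padding='same'` for the convolutions, and the default `padding='valid'` for `MaxPooling2D`. The first layer's filter count is taken from the posted code, because item (1) of the spec isn't shown. It compares the spec as written (stride 1 everywhere, 64 then 128 feature maps) with the posted code (strides 1, 2, 2 and 64 maps in every conv layer). On `input_shape`: only the first layer needs it; on the later layers it is redundant.

```python
import math


def conv_same(size, stride):
    # Keras padding='same': output size = ceil(input / stride)
    return math.ceil(size / stride)


def pool_valid(size, pool=2, stride=2):
    # MaxPooling2D default padding='valid': floor((input - pool) / stride) + 1
    return (size - pool) // stride + 1


def flatten_size(strides, filters, size=32):
    """Trace the spatial size through conv -> pool -> conv -> pool -> conv."""
    size = conv_same(size, strides[0])
    size = pool_valid(size)
    size = conv_same(size, strides[1])
    size = pool_valid(size)
    size = conv_same(size, strides[2])
    return size * size * filters[-1], size


# Spec: stride 1 in every conv, 64 then 128 feature maps
spec = flatten_size(strides=(1, 1, 1), filters=(64, 64, 128))
# Posted code: strides 1, 2, 2 and 64 feature maps throughout
posted = flatten_size(strides=(1, 2, 2), filters=(64, 64, 64))
print(spec, posted)   # (values fed to Flatten, final spatial size)
```

With the spec's strides, `Flatten` sees an 8×8×128 volume; with the posted strides of 2, the maps shrink to 2×2×64 before reaching the dense layers.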

grave frost
#

uni eh? what's the end aim? any baselines?

warm wharf
#

its using svhn dataset the google street view house numbers

#

but its more a learning activity or something

#

first intro to NN

grave frost
warm wharf
#

requirement is to train the model and plot loss functions

grave frost
#

first intro should be Dense architectures only

#

do you know how conv layers work?

warm wharf
#

yeah they had a few modules on DL

#

this is the application project

grave frost
#

so is your knowledge in CNN's fully fleshed out?

warm wharf
#

not particularly, but that may be my fault i am a little behind

grave frost
#

yea, I suggest you take things slow and learn the basics first

warm wharf
#

i have a little experience with them cause i took the andrew ng DL coursera course a few years back but it has been a while

mint palm
#

andrew ng is the goat

grave frost
#

better learn DL from the ground up

#

Andrew NG's course is shit - it's just spoon feeding you code

warm wharf
#

yeah i got that vibe when i took it

grave frost
#

though I didn't complete it, so I might be biased

#

but learning NN's from the ground up is much better

mint palm
# mint palm

see this i aint feeding from spoon.......(he said you may not wonder how to derive cuz its complex......but i did wonder)

#

i think its good if you see deeper with the course side by side

warm wharf
#

i don't mean to flood the channel but is this normal? ik its early in the training of the model but accuracy hasn't changed in 4 epochs but the loss is going down

grave frost
grave frost
warm wharf
#

o not good

grave frost
#

run it over a few more epochs to see whether val accuracy increases

mint palm
#

wait can we tell overfitting just by this?

warm wharf
#

sorry for the stupid question but hows it even possible for the loss to decrease and the accuracy to remain the same? isn't the loss function measuring accuracy in a way by calculating error?

mint palm
#

yeah

#

you're right
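On why loss can fall while accuracy stays flat: cross-entropy scores the probability given to the true class, while accuracy only checks which class has the highest probability. A small numpy illustration with made-up predictions for three samples: between "epoch A" and "epoch B" the model puts more probability on each true class, so the loss drops, but the arg-max of every row is unchanged, so accuracy does not move.

```python
import numpy as np


def cross_entropy(probs, labels):
    # mean negative log-probability assigned to the true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))


def accuracy(probs, labels):
    # only the highest-probability class matters here
    return np.mean(probs.argmax(axis=1) == labels)


labels = np.array([0, 1, 2])

# "Epoch A": samples 1 and 2 right, sample 3 wrong, all with low confidence
epoch_a = np.array([[0.40, 0.35, 0.25],
                    [0.30, 0.40, 0.30],
                    [0.40, 0.30, 0.30]])
# "Epoch B": same predicted classes, more probability on each true class
epoch_b = np.array([[0.70, 0.20, 0.10],
                    [0.15, 0.70, 0.15],
                    [0.40, 0.25, 0.35]])

for name, probs in [("A", epoch_a), ("B", epoch_b)]:
    print(name, round(cross_entropy(probs, labels), 3), accuracy(probs, labels))
```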

cobalt creek
#

can someone help me🙄

arctic crown
#

@cobalt creek can i dm you?

cobalt creek
#

ye maybe

cobalt creek
arctic crown
cobalt creek
#

What model are u using

arctic crown
#

ls hotword

cobalt creek
#

I haven't used it, but roughly, hyperparameters are the settings you choose (learning rate, batch size, number of layers, etc.) that affect accuracy.

arctic crown
#

hmm

#

can i send you the code?

#

@cobalt creek

cobalt creek
#

can someone help me please

cobalt creek
#

why is this 2000 when the training partition has 65000... i'm confused, pls help

cobalt creek
#

is there some default value of batch size, i just set it to 1, i have 65000 on the counter

#

exactly what i was expecting
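The number in the Keras progress bar counts batches (steps) per epoch, not samples. When `model.fit` gets arrays and no `batch_size`, the batch size defaults to 32, which matches the numbers above: ceil(78200 / 32) = 2444, and ceil(65000 / 32) = 2032 (most likely the "2000"). A quick check:

```python
import math


def steps_per_epoch(n_samples, batch_size=32):
    # Keras' progress bar counts batches (steps) per epoch, not samples
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(78200))                 # 2444
print(steps_per_epoch(65000))                 # 2032
print(steps_per_epoch(65000, batch_size=1))   # 65000
```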

short heart
#

Ive got RL algorithm to choose from buying,selling or holding things. How do I prevent it from choosing actions like buying when it has no money, or selling when it doesnt have anything? Cause it can choose these things for a lot of iterations and gets 0 reward, which breaks everything I assume
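One common approach is action masking: compute which actions are legal in the current state and never let the agent pick the others. A minimal numpy sketch for a greedy (Q-value) agent; the action indices, state variables, and the "can afford one unit" rule are hypothetical stand-ins for whatever the environment actually tracks.

```python
import numpy as np

BUY, SELL, HOLD = 0, 1, 2   # hypothetical action indices


def valid_action_mask(cash, price, holdings):
    """Boolean mask of the actions that are legal in this state."""
    mask = np.ones(3, dtype=bool)
    mask[BUY] = cash >= price    # can't buy without enough money
    mask[SELL] = holdings > 0    # can't sell with nothing to sell
    return mask


def masked_greedy(q_values, mask):
    # illegal actions get -inf, so argmax can never choose them
    return int(np.argmax(np.where(mask, q_values, -np.inf)))


q = np.array([2.0, 1.5, 0.1])   # BUY looks best, but there is no cash
print(masked_greedy(q, valid_action_mask(cash=0.0, price=10.0, holdings=3)))
```

The same mask can be applied to policy logits (set illegal ones to -inf before the softmax) for policy-gradient agents.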

sinful briar
#

@drifting void this one

lapis sequoia
#

If i have an image, and its mask, what operation do i need to apply the mask but leave the background white?

drifting void
#

Sorry, my lab got destroyed and I couldn't get the sample data...
So my case is the following. I am generating a lot of data in a form I choose, last version is something like that:
[
'0001c06e32a85a5d92c9cb784ff6a492df1d0055',
'00088f45a8bc798ceb2b5a37505f787fad19d9af',
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
[350, 351, 352, 353, 354, 355, 356, 357, 99],
-9.5,
1.0
]

Since I have many of these, I do that in parallel and append to a file. I chose msgpack but now the file is so large that I cannot read it back...

#

I use Dask for other use case and it worked well with reading many parquet files. So maybe I should write in several parquet files instead of msgpack single file.
My question is how do you usually write big files and what do you use for reading and searching later on

arctic crown
#

can someone please help me

#

i am making a personal assistant and i want to add a hotword in it

#

please help

wicked meadow
#

Hey this is a fairly basic question I think. I was told to post in here.

I currently have pandas 0.20.3 installed. I want 0.24 or a newer version. I tried updating pandas, but pip apparently sees 0.20.3 as the newest version.

Currently working on corporate servers so I can't download anything. Anybody know how to get the new version of pandas in my situation?

lapis sequoia
#

If i have an image, and its mask, what operation do i need to apply the mask but leave the background white?
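A minimal numpy sketch, assuming an 8-bit RGB image of shape (H, W, 3) and a mask of shape (H, W) where nonzero marks the pixels to keep: `np.where` takes the image pixel where the mask is set and 255 (white) everywhere else.

```python
import numpy as np


def apply_mask_white_bg(image, mask):
    """Keep pixels where `mask` is nonzero and paint everything else white.

    image: (H, W, 3) uint8 array; mask: (H, W) array, nonzero = keep.
    """
    keep = mask.astype(bool)[..., np.newaxis]   # (H, W, 1) broadcasts over RGB
    return np.where(keep, image, 255).astype(image.dtype)


# Tiny demo: a 2x2 grey image where only the top-left pixel is kept
image = np.full((2, 2, 3), 10, dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]])
result = apply_mask_white_bg(image, mask)
print(result[..., 0])
```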

tidal bough
wicked meadow
tidal bough
#

That's really strange, hmm

wicked meadow
#

Probably just how it's all set up here

tidal bough
#

Try updating pip, perhaps. python -m pip install --upgrade pip

#

I've had weird behaviour from old pips

wicked meadow
#

Hmm that's giving me an error in the prompt. Says unable to get local issuer certificate

tidal bough
#

try also pip install --upgrade pip, I guess, but that ends up badly for me sometimes

wicked meadow
#

It gives me that same error

tidal bough
#

hmm, weird

#

might want to open a help channel

#

something is wrong with your pip, possibly

wicked meadow
#

Fyi

#

Okay I may have to do that

#

Thanks so far!

rough otter
#

in a regression model, would you keep variables that have low correlation with the target variable?

lapis sequoia
#

imagine my classifier classifies melons and water melons

#

how can i make it infer a melon colored with red as a melon and not a water melon?

tidal bough
#

wait, what

#

watermelons are the ones with red insides, not melons

lapis sequoia
#

thats what i mean

#

if i paint a melon with red, like, manually on photoshop

#

the cnn will think it is water melon, but it is actually a melon

#

or orange - lemon

tidal bough
#

Well, include in the training set such trick examples.

lapis sequoia
#

like my question is, how can i make it not rely that much on colors but on shape

supple turtle
lapis sequoia
#

would it be good to randomly paint some images in the training set???

tidal bough
#

Possibly, yeah

lapis sequoia
#

like, idk, then maybe when he sees an orange, maybe it will think it is a lemon that has been painted

#

:/

#

how will it not mess up with real and fake?

tidal bough
#

Include enough examples and eventually it will learn.

You could also just grayscale the image and so abandon matching on color entirely, but that might reduce accuracy on normal examples.
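Both ideas can be sketched in numpy, assuming a batch of RGB images with shape (N, H, W, 3). `to_grayscale` uses the ITU-R BT.601 luminance weights. `random_channel_shuffle` is one crude way to "randomly paint" training images (a swapped-in technique, not one proposed above): it recolours each image by permuting its channels, so colour alone becomes a less reliable cue.

```python
import numpy as np


def to_grayscale(images):
    """(N, H, W, 3) RGB -> (N, H, W, 1) luminance (ITU-R BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return (images @ weights)[..., np.newaxis]


def random_channel_shuffle(images, rng):
    """Recolour each image by randomly permuting its RGB channels."""
    out = np.empty_like(images)
    for i, img in enumerate(images):
        out[i] = img[..., rng.permutation(3)]
    return out


rng = np.random.default_rng(0)
batch = rng.random((4, 32, 32, 3))            # stand-in for a batch of images
gray = to_grayscale(batch)                    # shape (4, 32, 32, 1)
augmented = random_channel_shuffle(batch, rng)
print(gray.shape, augmented.shape)
```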

lapis sequoia
#

i just made it right now

#

if u hadnt the original photos, it will be hard even for u to see what is a lemon and what is an orange

tidal bough
#

I'd be fooled by that too, yeah

lapis sequoia
#

so there is no actual way?

lapis sequoia
broken warren
#

Hey, I'm trying to build an AI that predicts a 6th number for a given 4-number series. What is the best neural net I could use for that? (I heard that an RNN, or specifically an LSTM, is good for the task)

lapis sequoia
#

hello, I need help with an AI in OpenCV; does anyone know about this topic?

exotic maple
dapper halo
#

Could anyone direct me to information on feeding a bayesian network distributions as inputs? All I see are on using bayesian networks to produce a distribution as an output

bronze skiff
#

to be fair, by "producing a distribution" we usually mean "parameters of a predefined family of distributions"

#

so you can also say that your inputs follow a parameterized family and feed those in

distant trout
#

Hi guys, how can I get the 8 peak values in every chart? I have these values saved in a txt file and a numpy dataFrame

lapis sequoia
#

How many times do neurons get backpropagated in a neural network?

fleet vault
#

im not sure if a beautifulsoup question belongs here, but #help-carrot

rare shell
#

Hey guys! I had a quick Numpy question. If I have an array such as

[
[0], [1], 
[1, 0], 
[1, 1],
[1, 0, 0], 
[1, 0, 1], 
[1, 1, 0], 
[1, 1, 1]
]

And I wanted to populate the blank spaces in the matrix with any number, let's say 8, to get an 8 by 3 matrix such as

[
[0, 8, 8],  
[1, 8, 8], 
[1, 0, 8], 
[1, 1, 8], 
[1, 0, 0], 
[1, 0, 1], 
[1, 1, 0], 
[1, 1, 1]]

How would I do something like this?

serene scaffold
rare shell
#

It's not a proper matrix in the first place; I have to specify dtype=object

velvet thorn
velvet thorn
#

anyway, this is probably what I would do

rare shell
#

Well, it's actually a step for solving a problem in a question on my CS assignment, so yeah

serene scaffold
#

I would find another way to approach the problem so that you end up with nans instead of an improper matrix.

velvet thorn
#

!e

import numpy as np

data = [
[0], [1], 
[1, 0], 
[1, 1],
[1, 0, 0], 
[1, 0, 1], 
[1, 1, 0], 
[1, 1, 1]
]

max_length = len(max(data, key=len))
repeated_element = 8

a = np.array([row + [repeated_element] * (max_length - len(row)) for row in data])
print(a)
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | [[0 8 8]
002 |  [1 8 8]
003 |  [1 0 8]
004 |  [1 1 8]
005 |  [1 0 0]
006 |  [1 0 1]
007 |  [1 1 0]
008 |  [1 1 1]]
rare shell
#

wow

velvet thorn
lapis sequoia
#

How many times do they get recalculated

rare shell
velvet thorn
velvet thorn
#

...but if I understand the thrust of your question correctly

#

that would depend on the architecture of the network.

lapis sequoia
#

is it defined in the model?

#

like in keras

velvet thorn
distant trout
rare shell
#

@velvet thorn In your code how would I replace the number 8 with something different?

#

Nm i got it

distant trout
#

How can I check when a series of booleans like "True, True, True, True, False, False" changes from True to False?
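A pandas sketch, assuming the values live in a boolean `Series` with a default integer index: compare each value with the one before it (`shift(1)`); a True-to-False change is any position where the previous value is True and the current one is False.

```python
import pandas as pd

s = pd.Series([True, True, True, True, False, False, True, False])

prev = s.shift(1, fill_value=False)   # value at the previous position
true_to_false = prev & ~s             # was True, is now False
changes = true_to_false[true_to_false].index.tolist()
print(changes)
```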

bright aurora
#

Guys can you please help me with this question

#

I'm struggling to implement a sine wave predictor using LSTM in pytorch. If someone can help me understand why it's not working

stuck socket
#

wtf

#

how r ya guys

#

@bright aurora u there?

#

woao

#

highjshgsjdf

#

where did u learn pytorch=?

bright aurora
whole mica
#

What’s up guys?

prime vortex
#

anyone free to look at my beginner data analysis code? I really appreciate the help

lapis sequoia
#

Would someone mind having a look at a short notebook where I'm toying with some data exploration/visualization? I wound up coming to almost the opposite conclusion I expected when I started and I'm wondering if I inverted something somewhere or made some really stupid mistake in sorting and filtering my data?

balmy junco
#

How does one typically use non-image data with image data when training models using pytorch? I have used pytorch for image classification, but never for image classification with non-image features. Any thoughts?

crisp wing
#

Sorry, asked this in help section, but since I'm in deep dung, and the question probably was a bit too specific, I'd ask it here if ok:

I did SVD on some precentered data,

```python
# done in python
from numpy import sqrt            # assumed imports; the post doesn't show them
from numpy.linalg import svd

# T: number of time samples, kind of our "variable" with this type of data
# X: data put inside a np.ndarray
# X.shape = (T=109, N_Lat*N_Lon=alot)
# X has mean ~ 0
# full_matrices=False avoids building a huge N x N matrix; numpy returns V transposed
U, s, V = svd(X, full_matrices=False)

# mean ~ 0, std ~ var ~ 1
# but min ~ -2, max ~2.5
# retain three components
standardised_PCs = sqrt(T) * U[:, 0:3]

# Since standardised, I'd assume this would result in the correlation matrix, but...
standardised_PCs.T @ standardised_PCs
# array([[ 1.09000000e+02,  1.45674989e-14, -8.57975238e-15],
#        [ 1.45674989e-14,  1.09000000e+02,  2.23983947e-14],
#        [-8.57975238e-15,  2.23983947e-14,  1.09000000e+02]])
```

The diagonals are equal to T rather than 1.
I feel like I misunderstand the approach or result somehow. Everywhere I look I feel they say you'd get a correlation matrix using standardised PCs

My reference for this approach is (eq. 16)
http://www.ehu.eus/eolo/pyclimate/downloads/matrix.pdf
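A numpy sketch on random stand-in data (T = 109 as in the post; N = 50 is arbitrary). Because the columns of `U` are orthonormal, (√T·U)ᵀ(√T·U) = T·I, which is exactly the diagonal of 109 seen above. "Standardised" here means variance 1 with the variance computed as a sum divided by T, so the correlation matrix is that product divided by T.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 109, 50                     # T matches the post; N is arbitrary here
X = rng.normal(size=(T, N))
X -= X.mean(axis=0)                # centre each column, as in the post

U, s, Vh = np.linalg.svd(X, full_matrices=False)
pcs = np.sqrt(T) * U[:, :3]        # three standardised PCs

gram = pcs.T @ pcs                 # = T * I, since U has orthonormal columns
corr = gram / T                    # divide by T to get the correlation matrix

print(np.round(np.diag(gram), 6))
print(np.allclose(corr, np.eye(3)))
print(np.allclose(np.corrcoef(pcs, rowvar=False), np.eye(3)))
```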

ripe forge
# lapis sequoia is it defined in the model?

Indirectly. How often a backpropagation is fired depends on your batch size, and your learning algorithm. So, you can pretend it happens once for every batch but there's exceptions too. Now that means number of epochs also affects it. And then finally you throw a gpu into the mix and it all goes to shit
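To make "once for every batch" concrete, here is a minimal numpy sketch of plain mini-batch gradient descent on made-up linear-regression data. Each batch gets one backward pass (here the closed-form gradient) and one weight update, so the total number of updates is epochs × ceil(N / batch_size). Other setups, such as gradient accumulation, change this count.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # made-up data
true_w = np.arange(1.0, 6.0)
y = X @ true_w

w = np.zeros(5)
batch_size, epochs, lr = 32, 3, 0.05
updates = 0

for epoch in range(epochs):
    order = rng.permutation(len(X))      # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w - y[idx]
        grad = 2 * X[idx].T @ err / len(idx)   # MSE gradient for this batch
        w -= lr * grad                         # one backward pass -> one update
        updates += 1

print(updates, epochs * math.ceil(len(X) / batch_size))
```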

lapis sequoia
lavish tundra
#

i'm having a problem with chinese and korean words using seaborn+matplotlib, someone know how i can fix that?

crisp wing
# lavish tundra i'm having a problem with chinese and korean words using seaborn+matplotlib, som...
lavish tundra
#

ty

arctic wedgeBOT
#
Command Help

!eval [code]
Can also use: e

*Run Python code and get the results.

This command supports multiple lines of code, including code wrapped inside a formatted code
block. Code can be re-evaluated by editing the original message within 10 seconds and
clicking the reaction that subsequently appears.

We've done our best to make this sandboxed, but do let us know if you manage to find an
issue with it!*

cobalt creek
#

what do u guys prefer to save some ML model? i read about h5, pickle, YAML, json...which one should i prefer

tidal bough
#

probably h5 or pickle, would prefer h5

#

storing giant arrays of numerical data in YAML or JSON is a crime against efficiency

#

like, how'd you encode them, as base64?

terse hull
#

is R better than python in datascience

#

?

exotic maple
#

Python is more widespread and has a growing community, so I'd go with Python

#

R is stale

terse hull
#

oh

#

does R perform better than python though

#

like in terms of computing speed

#

i assume it would

#

considering python needs so many dependencies

ripe forge
#

Bad assumption, you're assuming number of dependencies decides programming speed.

safe tapir
#

Is there a go-to lib for a/b testing?

terse hull
crisp wing
#

I imagine most performance-driven stuff in python as well as r is basically a wrapper around lower level language functionality.

vague vector
#

Hey guys, I need to make a Visualisation project in Tableau
I chose the London Underground, Bus and Overground usage data compared to daily covid cases. The data looks like this:

#

I need ideas for the visualisation
the tricky part is that I dont have dates, rather time periods. Date from to Date to. How can we handle it while visualizing?

crisp wing
vague vector
#

Any Data Engineering and Visualisation expert here?

ripe forge
iron basalt
#

It's a very fun read. I really like Jeff's story, need more people like him.

stuck socket
#

guys, how can i add time series into my environment?

#

???

iron basalt
# terse hull i mean wouldnt numpy be slower compared to what it would be if it was ddirectly...

Both R and Python end up calling some C code (for datascience stuff), that C code probably involves calling BLAS/CBLAS (e.g. calling numpy) which will result in mostly identical speed. The overhead of Python and R for calling a C function may be different, but it's irrelevant to any data science task. For example, if python took say 0.2 milliseconds to call a C function which got the mean of 20000 data points and R did the same but with 0.1 milliseconds overhead, it would not matter since something like 99.9...% of the time is spent in the C function (actually computing the mean). So it's a micro-optimization at best. If you are worried about speed and want to get serious about it, consider learning C to make fast things. As it will probably result in you learning about the relative speeds of things in modern computing and the C community is more focused on such things while Python/R is focused on using the things made by those people to be productive without too much work (Systems programmers make the fast systems which Python and R programmers use for their specific use cases).

#

(People that know both Python and C are the engines of the Python community that let everyone be very productive with Python (and there are a lot of them -> python is very big / used everywhere))

grave frost
iron basalt
grave frost
#

I wouldn't believe it lol. it's such a fundamental thing when working with heavy computation

iron basalt
#

Yeah, they put out a bunch of theory stuff (typically some crazy equations and such), but don't know whether such a thing could be computed easily. One cannot ignore the physical reality of implementing an idea.

lapis sequoia
late shell
#

Can someone please help me with sklearn.preprocessing.OneHotEncoder. I can't figure out how to use its categories parameter.

exotic maple
#

if your column had 3 options A, B, C, those are going to be the categories the encoder is going to fit

#

I think you can also pass a list of your own categories, if you already have them or if you want to exclude unknowns
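A small sketch of the `categories` parameter (assuming a recent scikit-learn). It takes one list per input column and fixes which output columns exist, even if some category never appears in the fit data. What happens to values outside the list is controlled by `handle_unknown`: with the default `'error'`, an unknown value raises; with `'ignore'`, it is encoded as all zeros.

```python
from sklearn.preprocessing import OneHotEncoder

# categories takes one list per input column
enc = OneHotEncoder(categories=[["A", "B", "C"]], handle_unknown="ignore")
enc.fit([["A"], ["B"]])        # "C" still gets a column even though unseen

# default output is a sparse matrix; .toarray() makes it dense for printing
encoded = enc.transform([["C"], ["B"], ["D"]]).toarray()
print(encoded)
# row 1: "C" -> [0, 0, 1]; row 2: "B" -> [0, 1, 0];
# row 3: unknown "D" -> all zeros because of handle_unknown="ignore"
```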

lapis sequoia
#

Hey! I just followed the Tech With Tim tutorial on creating an AI that plays Flappy Bird using the NEAT algorithm

#

everything works as intended and I now want to check if I understood correctly by coding a snake game

#

But I'm wondering about something: one thing the snake game has that Flappy Bird doesn't is collectibles

#

Basically If I have X snakes playing around at any given time but only one apple for them to eat, this will cause issues. My question is : should I give each snake its own Apple that other snakes can't eat ? In this case, should all the apples be at the same position (apple #1 will be at 54;60 for every snake then apple #2 at 100;100, etc) or will a random position for each snake work just fine ?

#

Thanks for your answers lol I'm only starting out with ML

grave frost
lavish tundra
#

Can someone who understands Asian (CJK) fonts give me a hand?
i'm trying to set the Noto Sans CJK font family using seaborn

sns.set_style({'font.family':'NotoSansCJK-Medium.ttc'})

i tried this too

sns.set_style({'font.family':'Noto Sans CJK'})

idk if the problem is with the font or with the code

dapper halo
#

Following along one of keras tutorials with my own data....really just trying to use datasets instead of a dataframe....but I keep getting the error that the model expects 3 inputs, but only receives 1 input tensor when it tries to fit the model to the dataset. Stackoverflow solution was to ensure that the second part of the tuple for the dataset needs to be the targets, which I have done....so not really sure what to do to resolve this. Anyone know how to resolve this?

```python
metal = 'N_SiII'
dataset = tf.data.Dataset.from_tensor_slices((Dataframe[['N_H', 'Redshift', metal]].values,
                                              Dataframe[['Metallicity', 'Density']].values))

def get_train_and_test_splits(dataset, train_size, batch_size=1):
    train_dataset = (dataset.take(train_size).shuffle(buffer_size=train_size).batch(batch_size))
    test_dataset = dataset.skip(train_size).batch(batch_size, drop_remainder=True)
    return train_dataset, test_dataset

def run_experiment(model, loss, train_dataset, test_dataset):
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss=loss,
        metrics=[keras.metrics.RootMeanSquaredError()],
    )

    model.fit(train_dataset, epochs=num_epochs, validation_data=test_dataset)

run_experiment(baseline_model, mse_loss, train_dataset, test_dataset)
```

terse hull
somber prism
#

anyone have a tip on what to learn/do after finishing 'ml for stanford' ?

inland isle
#

what is data warehousing?

#

how to do it using python ?

lapis sequoia
mint palm
#

what are we actually doing in this L2 regularisation

#

i dont get the sigma part

#

are we squaring the numbers in weight parameter W and adding them?

grave frost
#

oof, I just read about a "ML scientist" (not a Data Scientist) who doesn't know any aspect of DL or anything in NLP, CNN's etc. And is wondering why he got fired from his company

safe tapir
velvet thorn
#

you mean lambda?

#

or omega?

mint palm
#

the lambda/2m term

velvet thorn
#

anyway yes just take the sum of the squares of the weights

mint palm
#

ok but what it does??

velvet thorn
mint palm
#

so the purpose is to decrease the overfitting right?

mint palm
#

overfitting i saw it

velvet thorn
#

that is what it is commonly used for, yes

mint palm
#

if we are adding this term, how is it penalising?

velvet thorn
#

higher = worse

mint palm
#

oh, so we are increasing the loss function so that dW and db increase?

velvet thorn
mint palm
#

i wanna know why are we adding it to loss function
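The form under discussion is J = (1/m) Σᵢ L(ŷ⁽ⁱ⁾, y⁽ⁱ⁾) + (λ/2m) Σₗ ‖W⁽ˡ⁾‖², where ‖W‖² is the sum of the squared entries of a weight matrix (biases are usually left out). Adding the term makes large weights cost more. Its gradient adds (λ/m)·W to each dW, so every gradient step also shrinks the weights by a factor (1 − αλ/m), which is why L2 regularisation is also called "weight decay". A numpy sketch with made-up weights, λ, m, and learning rate:

```python
import numpy as np


def l2_penalty(weights, lam, m):
    """(lambda / 2m) * sum of squared entries of every weight matrix (biases excluded)."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)


def l2_grad(W, lam, m):
    """Derivative of the penalty with respect to one weight matrix W."""
    return lam / m * W


rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))   # made-up weights
lam, m, lr = 0.7, 100, 0.5                                   # made-up lambda, m, learning rate

data_cost = 0.30                                  # pretend cross-entropy part of J
cost = data_cost + l2_penalty([W1, W2], lam, m)   # the regularised cost is higher

# Even if the data gradient were zero, the penalty's gradient shrinks the weights
dW1_data = np.zeros_like(W1)
W1_new = W1 - lr * (dW1_data + l2_grad(W1, lam, m))
print(np.allclose(W1_new, (1 - lr * lam / m) * W1))   # True: "weight decay"
```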

somber prism
#

anyone know how to make it less ugly

#

how can i make them not overlap on each value

mint palm
#

use alpha

muted oyster
#

xticks yticks

mint palm
#

@somber prism

muted oyster
#

anyone knows how do we create a function like grid search

tidal bough
#

Well, you can use numpy's linspace and meshgrid (or a similar method) to generate the sets of parameters, then you evaluate the function on all of them and pick the best results.

muted oyster
#

i need to pass strings, trying with some for loops first may be function is not necessary
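The linspace/meshgrid approach suits numeric ranges. Since strings are needed here, a sketch using `itertools.product` (swapped in for meshgrid) enumerates every combination of arbitrary values. `toy_score` is a hypothetical stand-in for training and scoring a model with the given settings.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every combination in `grid` (dict of name -> list of values) and
    return the best-scoring parameter dict and its score (higher = better)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical scoring function; string values work like any other
def toy_score(kernel, C):
    bonus = {"linear": 0.0, "rbf": 1.0}[kernel]
    return bonus - abs(C - 1.0)


best, score = grid_search(toy_score, {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]})
print(best, score)   # {'kernel': 'rbf', 'C': 1.0} 1.0
```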

somber prism
mint palm
#

@somber prism nvm dont use alpha.......its for discriminating how dense the overlapping plot is...........instead you can use plt.setp

#

u can rotate it through an angle and it would be much clear

#

something like this :

#

the usage is something like this:
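A minimal sketch of the `plt.setp` approach with made-up bar-chart data: rotate the x tick labels by 45° and right-align them so long labels don't overlap. `ax.tick_params(axis="x", labelrotation=45)` is a shorter alternative.

```python
import matplotlib

matplotlib.use("Agg")   # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

names = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]   # made-up labels
values = [3, 7, 2, 5, 4, 6]

fig, ax = plt.subplots()
ax.bar(names, values)
fig.canvas.draw()   # make sure the tick labels exist

labels = ax.get_xticklabels()
plt.setp(labels, rotation=45, ha="right", rotation_mode="anchor")
fig.tight_layout()  # leave room for the slanted labels
```

Call `fig.savefig(...)` or `plt.show()` afterwards as usual.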

somber prism
grave wasp
#

I have a program with face recognition by adding my custom database with images. Reading the video and doing face recognition. Libs that i am using are cv2, face_recognition. My question is can i use sklearn for classification report?

dapper halo
#

Doing a bayesian regression. I fit the model with separated dataframes x_train, y_train of shapes (samples,3).

Trying to look at output distributions instead of the deterministic values from model.predict(). So I feed the model x_test. Get error that x_test has no rank.
Tried converting the dataframe to a series oriented dictionary and spits out the error "expected one input tensor and got 3.

Any suggestions on what I need to convert my testing dataframe to, to view the output distributions?

frozen marten
#

within unet there is something called the backbone_name parameter, which takes resnet, vggnet... I'm unable to understand the fundamental difference between unet and (vggnet, resnet). Aren't the latter also models like unet?
base_model = Unet(backbone_name='resnet34', encoder_weights='imagenet')

#

anyone online??

#

help me out with this guys....

#

😩 😩

lapis sequoia
#

hey

#

if I want to learn AI

#

but I don't exactly have the mathematical background to understand it all

#

but I still want to understand the math instead of just using pre-built pipelines and treating it as black box

#

where should I look

#

youtube

#

😄

#

BRUH oK thank you 😁

#

but seriously are there books or courses that could do the trick

#

u can look for something like maths under a neural network or something

#

and then building my own nn from scratch

lapis sequoia
#

h u h

#

alright that sounds like a good idea

lapis sequoia
#

oK then anyways thanks again :)

iron basalt
#

ISBN-10: 0387310738

#

You should know linear algebra and multivariate calculus for that book. There are tons of books for both of those things. For linear algebra try Linear Algebra Done Right. For multivariate calculus, idk, do whichever.

#

There is also links for the math in the pins.

lapis sequoia
#

Hey! I'm currently trying to apply the NEAT algorithm to a snake game I coded in Python. For now I already have the "base": I have a snake object with its own food, so I can spawn as many snakes as I want at once, and each snake will only be able to eat its own food. Now I'm wondering which inputs I should give each snake for the algorithm to work.
The obvious ones are easy: position of the food, position of the head, current direction, and current length of the snake.
But for it to be efficient, the snake should know the position of each of its body parts so it can avoid them properly.
The problem is that the number of body parts can change, and from what I know, the number of inputs should be fixed. How should I proceed?

#

I've seen this on the web, but here the snake still doesn't have information about the position of its body

#

My goal is to make it learn to avoid self enclosing

merry frost
#

any pandas experts in here willing to educate me?

exotic maple
tough surge
#

Hey guys a question regarding anaconda..

#

I have initially installed anaconda on different drive and now I have reinstalled windows and deleted the .anaconda2 and .anaconda3 hidden folders inside AppData.

#

The problem is that now i dont know how to make it work with pycharm

#

Maybe i should create all the envs, one by one using the .yml files. But I cant find them inside the env's folder

serene scaffold
merry frost
velvet thorn
#

as text

#

also if you have a question, just ask it.

#

no need for a preface

velvet thorn
merry frost
velvet thorn
#

and an expected result

#

otherwise it's hard to help you

merry frost
#

I can show you a screenshot of a pivot table in excel if that is helpful, i figured you wanted data you could manipulate

exotic maple
#

for example

#

df.groupby(COLUMN).agg(FUNCS)  # FUNCS = the function(s) to aggregate with, e.g. 'sum' or ['sum', 'mean']

velvet thorn
exotic maple
#

pivot table pretty much does the same, but I find groupby cleaner to read

velvet thorn
#

maybe you can give me an example

#

of what you want to do

#

and what the shape of your data is like

exotic maple
#

SUMIF is an Excel function that does a SUM only over the cells where the IF condition is true

velvet thorn
#

uh

#

okay

#

so you mean like this?

merry frost
#

you guys type faster than i can think lol

velvet thorn
#

!e

import pandas as pd

s = pd.Series([2, 5, 4, 3, 8])  # data
evens = s[s % 2 == 0]  # evens
print(evens.sum())
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

14
velvet thorn
#

i.e. 2 + 4 + 8, since all of those are even

#

like that?

exotic maple
#

excel has it in a single function

velvet thorn
exotic maple
#

so he probably needs to make a custom function

velvet thorn
#

but you can apply filters

#

which is what I did

#

s[s % 2 == 0] this is basically "get me the subset of the data where the remainder when divided by 2 is 0"

#

and you can apply any condition in the same way

#

even something much more complex
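Extending that filter idea from a Series to a DataFrame: Excel's SUMIF(criteria_range, criteria, sum_range) becomes "filter rows on one column, sum another", and SUMIFS is the same with several conditions combined with `&`. The column names below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "rep": ["a", "b", "a", "a"],
    "revenue_type": ["x", "x", "y", "x"],
    "revenue": [10, 20, 5, 7],
})

# SUMIF: sum `revenue` over the rows where rep == "a"
sumif = df.loc[df["rep"] == "a", "revenue"].sum()

# SUMIFS: several conditions, each in parentheses, combined with &
sumifs = df.loc[(df["rep"] == "a") & (df["revenue_type"] == "x"), "revenue"].sum()

print(sumif, sumifs)  # 22 17
```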

merry frost
exotic maple
#

pandas > excel though

#

especially for much larger files

velvet thorn
exotic maple
#

@merry frost the best way to get help is to include a subset or visual sample of your data and what result you expect to see

merry frost
#

I have sales data with 13 different revenue types and 700+ reps I need to use historical data to create a sales goal

exotic maple
#

how is the data structured? the revenue types are columns and the reps rows?

#

so its a 700x13 table?

merry frost
exotic maple
#

it's not a stupid question, and yes, but it's not preferred.

#

I don't remember how people post their dfs here though lol

merry frost
#

each row is an event they went to, with the date of the event, the revenue type, the rep name, and the number of new members.

exotic maple
#

so for example if you want the total revenue per rep (regardless of date)

#

you can do

#

data.groupby("rep").agg(sum)

#

that will give you the rep, and the sum of each revenue (assuming they are columns)

velvet thorn
#

it's harder to read

#

so not everyone will

velvet thorn
merry frost
#

how would i than take that and get an average of each monthly total for each rep in each revenue type

velvet thorn
#

also I believe .agg(sum) would be slower than .agg('sum')

#

that's my guess though

exotic maple
exotic maple
velvet thorn
#

which sums as Python objects

#

-> slow

#

whereas .agg('sum') would use C summation

#

it might be specialcased

#

no idea

velvet thorn
#

is it a column?

merry frost
#

yes

exotic maple
velvet thorn
#

or is it a value in a column for each row

velvet thorn
#

which is

#

take all the rows

#

group them by representative

#

then take the mean of all remaining columns

merry frost
#

it took me this long but i cleaned the data, i hope this pastes ok

velvet thorn
#

!e

import pandas as pd

df = pd.DataFrame([['a', 5, 8], ['b', 3, 6], ['a', 2, 7], ['b', 1, 7], ['a', 4, 3], ['c', 2, 6]], columns=['rep', 'type_a', 'type_b'])

print(df.groupby('rep').mean())
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

       type_a  type_b
rep
a    3.666667     6.0
b    2.000000     6.5
c    2.000000     6.0
velvet thorn
#

!e

import pandas as pd

df = pd.DataFrame([['a', 5, 8], ['b', 3, 6], ['a', 2, 7], ['b', 1, 7], ['a', 4, 3], ['c', 2, 6]], columns=['rep', 'type_a', 'type_b'])
print(df)
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

  rep  type_a  type_b
0   a       5       8
1   b       3       6
2   a       2       7
3   b       1       7
4   a       4       3
5   c       2       6
velvet thorn
#

^ the original
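For merry frost's follow-up (the average of each monthly total per rep and revenue type), one way is two steps: sum to monthly totals, then average those totals. This sketch assumes one row per event with columns named `date`, `rep`, `revenue_type` and `revenue`; those names and values are guesses, so substitute the real ones.

```python
import pandas as pd

events = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-03", "2020-01-10", "2020-02-15"]),
    "rep": ["a", "a", "a", "b", "b"],
    "revenue_type": ["x", "x", "x", "x", "x"],
    "revenue": [100, 50, 90, 40, 60],
})

# Step 1: total per rep, revenue type and calendar month
events["month"] = events["date"].dt.to_period("M")
monthly = events.groupby(["rep", "revenue_type", "month"])["revenue"].sum()

# Step 2: average those monthly totals per rep and revenue type
avg_monthly = monthly.groupby(level=["rep", "revenue_type"]).mean()

# Optional: one column per revenue type, like an Excel pivot table
table = avg_monthly.unstack("revenue_type")
print(table)
```

Months in which a rep had no events simply don't appear, so they are not averaged in as zeros; whether that is the right goal baseline is a business decision.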

velvet thorn
merry frost
#

no

velvet thorn
#

then?

merry frost
#

I think it's a hash from the Elasticsearch instance the data is exported from

velvet thorn
#

huh

#

so what's the type

solid blaze
#

Anyone have any idea of how to solve this? The wording is confusing to say the least. It doesn't help that I'm self taught.

merry frost
# velvet thorn so what's the type

Sorry, I don't think I understood your question. Type is not the first column; the first column is the hash I spoke of ('id'). If you are asking which column the type is in, that would be the 5th column.

velvet thorn
#

in your screenshot

merry frost
#

Correct i cut 45 columns of useless information

lapis sequoia
#

Do I need a GPU for image recognition model training?

velvet thorn
velvet thorn
velvet thorn
merry frost
wind bobcat
#

I am sorry if this channel is inappropriate for this question :C

May I ask for recommendations for Python packages that help extract or convert music into some sort of data?

slim ivy
#

how can i remove rows in pandas by the name of the column

hollow grove
#

i needed some help doing something specific with tensorflow

#

I'm really not sure how this all works since I didn't write the code, but this is the code, and what I want to do is serve it with a Flask API: I want it to take image data as input and return the output. How should I approach it? Should I build a model file and then somehow run it? If I simply imported it into the Flask main.py, it would do a lot of computation on each request, so I'm not sure how to do it.

#

I've never worked with TF before

#

I mean yeah, it's from Google Colab, but I need to know what changes I need to make

ripe forge
#

I'd say step 1, get familiar with the code. You should be able to tell yourself what each line is doing before proceeding.

#

No point trying to build on top of something you don't understand, especially when the code is right there

gentle birch
#

um i have a question
being RLY new to trying to learn tensorflow, are there any good resources to use to actually learn the code and how it works?

#

i understand the basics of how neural nets work, but other than premade tensorflow code i can't find any good resources for learning tensorflow

iron basalt
gentle birch
#

im trying to learn how to use tensorflow to write neural nets
so i guess the second one u asked

iron basalt
gentle birch
#

im trying to learn how to actually use tensorflow to implement convolutional nets and GANs and such, but there's nothing i can find that explains what the code actually does, like what attributes do what etc

iron basalt
#

Take a look at that simple feed forward neural network example.

#

That code just uses basic concepts from TF, not an entire prebuilt network.

#

Prebuilt networks are really just a bunch of those basic units combined and made into a class.

#

Fundamentally, TF and Pytorch are just fancy automatic differentiation tools that make running stuff on the GPU (and CPU with threading and vector operations) easier (for the most part).

#

(I actually have my own which is very much like pytorch and it was not hard to make, the real gains from using pytorch or TF is that lots of other people have already made a bunch of models for you)
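To make the "automatic differentiation" point concrete, here is a toy scalar version of reverse-mode autodiff, the same chain-rule bookkeeping as backpropagation. It is a sketch for intuition, not how TF or PyTorch are actually implemented: each `Value` remembers its parents and the local derivative with respect to each, and `backward()` walks the graph from the output back, accumulating gradients.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order nodes so each one's gradient is complete before it is
        # pushed to its parents (the chain rule, applied in reverse).
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


x, y = Value(3.0), Value(4.0)
f = x * y + x        # f = xy + x
f.backward()
print(f.data, x.grad, y.grad)  # 15.0 5.0 3.0  (df/dx = y + 1, df/dy = x)
```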

gentle birch
#

damn, that would've taken a lot of learning and math to do though?

iron basalt
#

Nothing that anybody who's really into ML wouldn't already know.

minor charm
#

I second using pre-built stuff like TF. Very easy to set up

gentle birch
#

ok i understand what u mean by TF code using basic blocks and adding them together to do bigger tasks,
what i dont get is just what each attribute and function actually does, and the tensorflow documentation isn't very good at explaining it at a level a beginner would understand

iron basalt
#

Give me a concrete example of what is causing you trouble.

gentle birch
#

for example, the Conv2DTranspose function, what does it actually do to make an image??
that makes no sense to me, cause as far as i know a transpose of a matrix is just rotating the matrix, why that help?

#

it's used in a bunch of GAN code i saw to essentially morph the input data into an image
but i cant find a good explanation as to what its actually doing to the data

iron basalt
#

That has an animation

gentle birch
#

ah thank you
this helps this specific issue
guess i should search more on stack overflow when i get these questions

iron basalt
#

That's more of a general deep learning question than a TF question.
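A single-channel NumPy sketch of what a transposed convolution computes, to go with the animation. The "transpose" refers to the matrix that represents an ordinary convolution, not to transposing the image: each input value stamps a scaled copy of the kernel onto a larger output, and with stride 2 the height and width roughly double, which is why GAN generators use it to grow a small tensor into an image. This is an illustrative loop version with 'valid' padding and no channels or bias, not TF's implementation of Conv2DTranspose; the ordinary `conv2d` is included only to check the transpose relationship.

```python
import numpy as np


def conv2d(x, kernel, stride=1):
    """Ordinary 'valid' convolution (cross-correlation, as DL libraries compute it)."""
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * kernel).sum()
    return out


def conv2d_transpose(x, kernel, stride=1):
    """Scatter-add one scaled copy of `kernel` per input value ('valid' padding)."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # each input value "paints" a kernel-sized patch of the output
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out


small = np.array([[1.0, 2.0], [3.0, 4.0]])
print(conv2d_transpose(small, np.ones((2, 2)), stride=2))  # 2x2 -> 4x4, each value becomes a 2x2 block
```

The transpose relationship is that, for matching shapes, `(conv2d(X, K, s) * Y).sum()` equals `(X * conv2d_transpose(Y, K, s)).sum()`.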

#

Note that a lot of things in DL and ML have terrible names. Like "convolution" does not make sense to begin with (but after many misuses it just became accepted that it means a specific thing in the context of ML/image processing).

#

And even worse, very popular papers will use different definitions for the same word (even in the same context).

#

So it's important to kind of be on the same wavelength in terms of jargon to be able to quickly understand what is going on. This can only be done by having followed a bunch of projects and read a bunch of ideas. It's kind of like playing baseball and not knowing all the baseball terminology that was made up just for baseball. https://en.wikipedia.org/wiki/Glossary_of_baseball_terms


#

It's annoying to have to learn it all, but not really any way around it.

primal tulip
arctic wedgeBOT
#

Hey @rose thicket!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

rose thicket
#

duh

#

see this error

digital aurora
#

Any data scientist available??

#

Need some help

#

What did you mean?😅

mystic harbor
#

Links are allowed, just that particular one isn't

#

but yeah,

mystic harbor
digital aurora
#

I just needed some guidance.

#

I am a beginner in this field

#

I'm done with python and numpy

#

But have no clue, what to do next

frozen marten
#

this question can be answered even by learners... not necessarily data scientists

tough surge
#

Another question regarding Anaconda. I have exported multiple envs as .yml files and imported those files using Anaconda Navigator. But in every environment the packages are not installed. Should I do a conda install "something"?

serene scaffold
tough surge
tough surge
#

I'll try to export my env as a requirements.txt

mental nova
lapis sequoia
#

I need software that can recognize money. I'd like to work with OpenCV, or what do you recommend?

drifting void
#

Hi, how do you check for values such as those in Dask dataframe:
df[(df['val0']==val0) & (df['val1']==val1)].compute()
the above is super slow so perhaps there's another way?

serene scaffold
#

Is the data larger than your RAM, or are you doing operations in parallel?

drifting void
#

Yes it is a lot of data in parquet files

#

I want to add more data in case it doesn’t exist

#

I guess I shouldn’t be doing that with dask but rather use a different data structure to do the check

serene scaffold
drifting void
#

I was considering that too. The data will grow to (if I calculated correctly) 20-30 million entries

serene scaffold
#

FYI, I probably won't know if you've responded unless you ping/reply to me

drifting void
serene scaffold
drifting void
#

Yes, true. I thought Dask would help here.

#

I should probably start using a database. It may be useful
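If the data does move into a database, the "add it only if it doesn't already exist" check can be pushed into the database itself with a uniqueness constraint. A minimal sketch with Python's built-in sqlite3; the table and column names (`records`, `val0`, `val1`, `payload`) are hypothetical, and a real setup would choose its own schema and engine. `INSERT OR IGNORE` is SQLite syntax; PostgreSQL, for example, spells it `ON CONFLICT DO NOTHING`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
conn.execute(
    """CREATE TABLE records (
           val0 TEXT NOT NULL,
           val1 TEXT NOT NULL,
           payload REAL,
           UNIQUE (val0, val1)  -- backed by an index, so duplicate checks are fast
       )"""
)


def add_if_missing(conn, rows):
    """Insert (val0, val1, payload) rows, silently skipping keys already present."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO records (val0, val1, payload) VALUES (?, ?, ?)", rows
        )


add_if_missing(conn, [("a", "x", 1.0), ("b", "y", 2.0)])
add_if_missing(conn, [("a", "x", 99.0), ("c", "z", 3.0)])  # ("a", "x") already exists, so it is skipped
print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 3
```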

serene scaffold
#

On the flip side, I had somehow never heard of Dask, so thanks for bringing that to my awareness!

digital aurora
#

Guys, what all do i need to study under Stats for DS, anybody?

serene scaffold
short heart
#

How do I decrease discount factor in reinforcement learning

tough surge
weak remnant
#

guys can anyone assist me on how to train models efficiently if i have a low grade GPU and buying a new one is not an option

#

also, I've tried Google Colab and am looking for other suggestions

mint palm
#

do we compute cost after setting output values that are greater than 0.5 to 1 and others to 0, or before that?

#

in NN

livid jetty
#

What data do I need to build a machine learning model that can predict future coronavirus case counts?

serene scaffold
livid jetty
mint palm
serene scaffold
grave frost
iron basalt
grave frost
cyan ridge
#

What would you suggest for someone pursuing a data science career? I'm a second-year student and I'm so confused about whether I want to be a software engineer or a data scientist.

drifting void
dusk hornet
#

Can someone help me with python code for AI face recognizer

lavish tundra
#

do we have a Chinese/Korean/Japanese data scientist here? . _. I'm having a problem using an Asian font for a data visualisation ; -;

#

=/ i'm hard stuck on this problem using seaborn

iron basalt
lavish tundra
iron basalt
exotic maple
#

nah tbh that looks pretty impressive, especially because it seems they did it entirely with Chinese tech. No TensorFlow / PyTorch, no Nvidia, etc.

lavish tundra
weak remnant
iron basalt
#

you don't have the font

lavish tundra
#

i thought i had it

lavish tundra
deft basin
#

woah

iron basalt
lavish tundra
#

OMG I CANT BELIEVE

#

i tried to use a different font and it works

lavish tundra
#

an angel in my life

lavish tundra
#

idk why but some fonts work for Chinese characters and don't work for Korean ones

torpid ember
#
```
def cannex_format_over1y(url,product):
    curr_dt = datetime.datetime.today().strftime('%Y-%m-%d')
    # curr_dt = (datetime.datetime.today() - BDay(2)).strftime('%Y-%m-%d')
    curr_dt_str = datetime.datetime.today().strftime("%Y%m%d")
    df_html = pd.read_html(url,header=1)
    header = df_html[0].iloc[0]
    cols = ['Financial Institution'] # only forward fill on Financial Institution column
    df = (df_html[0].iloc[1:])
    df[cols].fillna(method='ffill')
    df.columns = header
    df.insert(0, 'Date', curr_dt)
    # df.to_csv(csv_path)
    df.rename(columns={df.columns[5]: "1Y",
                       df.columns[6]: "2Y",
                       df.columns[7]: "3Y",
                       df.columns[8]: "4Y",
                       df.columns[9]: "5Y",
                       df.columns[10]: "6Y"},inplace = True)
    df.insert(loc=2,column='product',value=product)
    return df

cannex_format_over1y(gic_nonreg_1to6y_url,'Non-registered GIC').replace('-','')
```
#

hey guys i have a really simple code that im getting a warning on. Im wondering if you guys can help me figure out what needs to change to avoid the warning

#

this is the warning:

```
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(
```
#

I'm struggling to understand what the issue is, but I assume it has to do with the `df[cols].fillna(method='ffill')` portion of the code
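A sketch of the same function with the two likely issues addressed, split so the table formatting can be tried without a network call (`format_cannex_table` is a hypothetical helper name). Hedged: the warning most likely comes from `df` being a slice of `df_html[0]` (`iloc[1:]`) that is then modified in place by `insert` and `rename(..., inplace=True)`; taking an explicit `.copy()` avoids that. Separately, `fillna(method='ffill')` returns a new frame, so in the original the forward fill was discarded; assigning it back (and using `.ffill()`, since the `method=` argument is deprecated in recent pandas) makes it take effect.

```python
import datetime

import pandas as pd


def format_cannex_table(raw, product):
    """Clean one table returned by pd.read_html (hypothetical helper split out of the original)."""
    curr_dt = datetime.datetime.today().strftime('%Y-%m-%d')
    header = raw.iloc[0]
    cols = ['Financial Institution']  # only forward fill on Financial Institution column
    # .copy() makes df its own frame instead of a slice of `raw`,
    # so the in-place edits below no longer trigger the warning
    df = raw.iloc[1:].copy()
    # ffill() returns a new object; assign it back or the fill is lost
    df[cols] = df[cols].ffill()
    df.columns = header
    df.insert(0, 'Date', curr_dt)
    df = df.rename(columns={df.columns[5]: "1Y",
                            df.columns[6]: "2Y",
                            df.columns[7]: "3Y",
                            df.columns[8]: "4Y",
                            df.columns[9]: "5Y",
                            df.columns[10]: "6Y"})
    df.insert(loc=2, column='product', value=product)
    return df


def cannex_format_over1y(url, product):
    df_html = pd.read_html(url, header=1)
    return format_cannex_table(df_html[0], product)
```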
iron basalt
#

A font only has the glyphs that the typographer created for the font. Making a font with glyphs for multiple languages is a ton of work.

#

Few fonts are free and even fewer are free and good.
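Following on from the glyph point: seaborn hands the font name through to matplotlib, so the name in `sns.set(font=...)` has to match a family matplotlib has actually registered, and spelling variants won't be found. A sketch under assumptions: matplotlib is installed, and the CJK-capable families in `CANDIDATES` are only examples that may or may not exist on a given machine. A newly installed font may not show up until matplotlib's font cache is cleared.

```python
def available_font_names():
    """Font family names matplotlib (and therefore seaborn) can find on this machine."""
    from matplotlib import font_manager  # imported here so pick_font works without matplotlib

    return sorted({entry.name for entry in font_manager.fontManager.ttflist})


def pick_font(preferred, available):
    """Return the first preferred family that is actually installed, else None."""
    installed = set(available)
    for name in preferred:
        if name in installed:
            return name
    return None


# Hypothetical candidates; which of these exist (if any) depends on the system.
CANDIDATES = ["Noto Sans CJK KR", "Noto Sans CJK JP", "Malgun Gothic", "AppleGothic", "Arial Unicode MS"]

# Usage (assumes seaborn is installed):
# import seaborn as sns
# font = pick_font(CANDIDATES, available_font_names())
# if font:
#     sns.set(font=font)
```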

grave frost
#

you have full control over your GPU in the cloud. There is nothing you can do physically that you can't do using a terminal 🤷

grave frost
exotic maple
#

If I were in the CCP I'd say "The U.S. is unreliable as a partner for this, let's throw bullshit cash at it and develop our own" and boom, goodbye to the U.S.'s Tensorflow / PyTorch AI monopoly

lavish tundra
exotic maple
#

that's just my take though

grave frost
#

I trust no Chinese AI dev, especially when you see what they use AI for in China. It's literally like 1984, and all research done helps them 😞

lavish tundra
#

i tried

sns.set(font="Arial Unicode MS")
sns.set(font="ArialUnicodeMS")
sns.set(font="arial-unicode-ms")

but none of those work

exotic maple
exotic maple
#

its pretty much a credit score lol

grave frost
#

The documentaries I have seen are pretty demonstrative of those tech used

grave frost
#

one doc actually interviewed the Chinese guy making it, and he was answering those questions very carefully

#

he expressly stated that those technologies will "benefit" Chinese citizens

exotic maple
#

Idk, I still find it exaggerated. Think of it like this, from another perspective:

JP Morgan and all other U.S banks have credit scores. Literally your whole life, including employment is based on this credit score, and this is not called "dystopian". Either all these types of scores are dystopian (they are) or none are, but cherry picking because "X GOOD" "Y BAD" its annoying.

In fact, US credit score sounds worse than a social score lol.

crisp wing
#

Can you lose variance even though you do a full reconstruction with SVD? Like X = s @ V.T
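A full SVD reconstruction X = U Σ Vᵀ reproduces X up to floating-point error, so no variance is lost; variance is only discarded when singular values are dropped, and then the squared reconstruction error equals the sum of the dropped s². Note that `s @ V.T` on its own, without U, is not a reconstruction of X, so if the code really computes that, it may explain the discrepancy. A NumPy check on random centred data:

```python
import numpy as np


def svd_reconstruct(X, k=None):
    """Rebuild X from its SVD, keeping only the top-k singular values if k is given."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if k is not None:
        U, s, Vt = U[:, :k], s[:k], Vt[:k]
    return (U * s) @ Vt  # same as U @ np.diag(s) @ Vt


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)  # centre columns so squared norms are proportional to variance

s = np.linalg.svd(X, compute_uv=False)
full_error = np.sum((X - svd_reconstruct(X)) ** 2)     # ~0: nothing is lost
k2_error = np.sum((X - svd_reconstruct(X, k=2)) ** 2)  # equals the sum of the dropped s**2
print(full_error, k2_error, np.sum(s[2:] ** 2))
```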

grave frost
#

it's not only credit scores or anything, it's a lot of tracking tech too, which is perfectly plausible to build with the proper investment

#

I mean, just look at what the NSA did in earlier times. No one could have believed that such resources would be poured into tracking ordinary people.

exotic maple
#

you dont need government intervention when people give it away on their own

grave frost
#

ngl, people do give up their privacy. But you would be wrong to think there were no protests or any opposition

exotic maple
#

if there was no change, the protest was irrelevant.

#

I would know that, living in a 3rd-world semi-dictatorship

grave frost
#

anyways, I for one support US's mass surveillance

exotic maple
#

I dont. Neither chinese nor US. screw both

grave frost
#

china's is really bad - but then you never know when USA might be too

grave frost
exotic maple
#

I know a lot of chinese. they dont care about privacy, HR or whatever, as long as they're safe and prosper

grave frost
#

if giving up a part of your daily privacy can prevent some mass shootings (maybe with your family involved) would you pay the price?

exotic maple
#

I feel a lot of americans think the same

exotic maple
#

in fact i do

#

but i dont call that dystopian

grave frost
#

no, but what China does is definitely wrong, and their research funds all go into that "dystopian" research

exotic maple
#

we are a bit off topic here i think, not in the domain of the channel

#

if you'd like we can continue discussing political perspectives via DM and not spam the channel

grave frost
#

Did you read about that research where a Chinese uni made a model to classify criminals based on their faces? With a lot of SOTA work, they got 85% accuracy in predicting criminals from their faces alone

#

may not be gov sponsored (didn't check), but still

exotic maple
grave frost
#

true. who would have even thought of making a model to do that unless specifically directed??

#

in any case, Chinese life is just... depressing, to say the least

exotic maple
sharp nimbus
#

!ot

arctic wedgeBOT
grave frost
#
MIT Technology Review

Soon after the invention of photography, a few criminologists began to notice patterns in mugshots they took of criminals. Offenders, they said, had particular facial features that allowed them to be identified as law breakers. One of the most influential voices in this debate was Cesare Lombroso, an Italian criminologist, who believed that crim...

exotic maple
#

@grave frost we are off topic; if you want, we can continue via DM. Let's stop spamming here

grave frost
exotic maple
molten hamlet
#

How do you even detokenize this dataset? 😐

#

i want words! i mean, I solved my problem, but can't check it

#

got it

#

from some issues xD

hexed heath
#

Hi, I am using implicit package (https://github.com/benfred/implicit) to create a recommender system. I am using the implicit least square algorithm.
I was able to make predictions for already existing users, or to find similar items, no prob. But I don't get how can I get predictions for a new user which was not in input data? the idea is that I have a set of items (each one existing in input data), and I want recommendations based on this set. I could get recommendations for each items and sum them up, but it doesn't feel right. This seems like a common usage, so I think I am missing something ^^'. Any ideas? Thanks 🙂
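What's usually done for an unseen user in implicit ALS is a "fold-in": keep the trained item factors fixed and solve the same least-squares step ALS uses for users (the user update from Hu, Koren and Volinsky's implicit-feedback ALS formulation) for just the new user's set of items. Recent versions of implicit expose something along these lines through a `recalculate_user` option on `recommend`, but check the docs for the installed version. Below is a NumPy sketch of the math only; `item_factors` would be the model's trained item factor matrix, and `alpha` and `reg` are placeholder values that should match whatever the model was trained with.

```python
import numpy as np


def fold_in_user(item_factors, liked_items, alpha=40.0, reg=0.01):
    """Solve for a new user's factor vector with the item factors held fixed.

    Implicit-ALS user update: x_u = (Y^T C_u Y + reg*I)^-1 Y^T C_u p_u, with
    preference p_ui = 1 for liked items (else 0) and confidence c_ui = 1 + alpha*p_ui.
    """
    Y = np.asarray(item_factors, dtype=float)
    n_items, k = Y.shape
    p = np.zeros(n_items)
    p[liked_items] = 1.0
    c = 1.0 + alpha * p
    A = Y.T @ (c[:, None] * Y) + reg * np.eye(k)
    b = Y.T @ (c * p)
    return np.linalg.solve(A, b)


def recommend_new_user(item_factors, liked_items, n=5, **kwargs):
    """Score every item for the folded-in user; return the top-n indices not already liked."""
    Y = np.asarray(item_factors, dtype=float)
    scores = Y @ fold_in_user(Y, liked_items, **kwargs)
    scores[liked_items] = -np.inf  # don't recommend what the user already has
    return np.argsort(-scores)[:n]
```

Compared with summing per-item recommendations, this scores items against a single user vector fitted to the whole set at once, the same way existing users are scored.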


lapis sequoia
#

is anyone available to help me?

scenic elbow
#

@lapis sequoia Possibly, what is it that you're trying to do?

minor charm
#

anyone familiar with TensorFlow? Having some issues getting logs to write for a custom TensorBoard

carmine iron
#

why is this returning 11

```
coins = 8
max = 0
while max < coins:
    # print(max)
    for i in costs:
        max +=i
max
```
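The `while` condition is only checked after the inner `for` loop has added every cost, so the total can overshoot `coins` before the loop gets a chance to stop. `costs` isn't shown in the question, so the list below is hypothetical (chosen so the original loop also ends at 11). The sketch also renames `max`, which shadows the built-in `max()`.

```python
def original_loop(coins, costs):
    total = 0  # renamed from `max`, which shadows the built-in max()
    while total < coins:
        for cost in costs:  # the while condition isn't re-checked until this whole loop ends
            total += cost
    return total


def stop_before_exceeding(coins, costs):
    total = 0
    for cost in costs:
        if total + cost > coins:  # check the limit before every single addition
            break
        total += cost
    return total


costs = [1, 3, 2, 4, 1]  # hypothetical: the real `costs` isn't in the question
print(original_loop(8, costs))          # 11
print(stop_before_exceeding(8, costs))  # 6
```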
royal crypt
iron basalt
whole mica
#

anyone here use machine learning for finance?

tall loom
lapis sequoia
#

Each case is a word. A series is formed by taking the height profile of the word

#

WordSynonyms remapped FiftyWords to 25 classes

#

But the data is the same (and I think flipped)

tall loom
#

@lapis sequoia What do the classes represent, and what is the height profile of a word?

lapis sequoia
#

hi, i'm new here

#

anyone want to help me with some code suggestions?

#

I'm working on a CNN project, and I have to prepare a dataset for my boss, who gave me two .HID files listing specific image filenames from a big dataset of images. I've converted every line of the .HID file into an element of a list, and I have a dictionary with all the image filenames. But to check whether the names in the .HID match the names in the dataset, I have to append the ".jpg" string to every element of the list, because the list elements are image filenames without the extension. Is my reasoning right? Can someone help me do this? The problem is that you can't concatenate list elements with a string...

#

```
import cv2
from PIL import Image
import os

path_file = 'E:\Work\AU(13)-SottoCampioniA e B\SetA.HID'
path_image = 'E:\Work\AU13_face'
work_dir_tr = 'E:\Work\x'

image_file_names = [i for i in os.listdir(path_image)]
#images = [i for i in os.listdir(path_image) if i[-3:]=='JPG' or 'jpg']

file1 = open(path_file, 'r')

list_of_lists = []

for line in file1:
    #print(line_list)
    stripped_line = line.strip()
    line_list = stripped_line.split()
    #print(line_list)
    list_of_lists.append(line_list)
list_of_lists = [line_list + ".jpg" for line_list in list_of_lists]

file1.close()

print(list_of_lists)

#============================================================================
#############################################################################
#============================================================================

#result = any(elem in line for elem in list_of_lists)
os.listdir(path_image)

wIDTH = 100
hEIGHT = 100

for i,image in enumerate(image_file_names):
    #print(image)
    if any(elem in list_of_lists for elem in image_file_names):
        print(i,'matched')
        # im = Image.open(image)
        # im = im.convert('L')
        # I = Image.open(path_image+"/"+ image)
        # I = I.resize((wIDTH,hEIGHT), Image.BICUBIC)
        # I.save(work_dir_tr+'/'+ image)
```
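The reasoning is right, but the ".jpg" has to be added to each string, not to the list. A sketch of the matching step, assuming each line of the .HID file holds one filename without extension (the format isn't shown, so that's an assumption) and that the real files may end in .jpg or .JPG. A set makes each membership check fast. Windows paths also need raw strings (`r'E:\Work\x'`), because `'\x'` is otherwise read as an escape sequence and is a syntax error.

```python
import os


def read_hid_names(path_file):
    """Return one name per non-empty line (assumes the .HID file lists bare filenames)."""
    with open(path_file) as f:
        return [line.strip() for line in f if line.strip()]


def match_images(hid_names, image_file_names):
    """Return the dataset files whose name, with a .jpg extension, appears in the .HID list."""
    # Add ".jpg" to each string in the list, not to the list itself
    wanted = {name + ".jpg" for name in hid_names}
    matched = []
    for image in image_file_names:
        stem, ext = os.path.splitext(image)
        if ext.lower() == ".jpg" and stem + ".jpg" in wanted:  # accepts .jpg and .JPG
            matched.append(image)
    return matched


# Usage (raw strings keep Windows backslashes from being read as escapes):
# path_image = r'E:\Work\AU13_face'
# matched = match_images(read_hid_names(r'E:\Work\AU(13)-SottoCampioniA e B\SetA.HID'),
#                        os.listdir(path_image))
```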

kindred radish
#

As it's not necessarily directly related to AI

lapis sequoia
#

yes, but nobody is answering me

kindred radish
#

wait :)

#

It took me two weeks but I've only just noticed that the reason my model isn't training isn't necessarily my data. It's my code for my model:

#

Orange is sklearn.linear_model.LinearRegression(), blue is my own OLS algorithm

#

kill me ;-;

#

idk how i didn't notice what a shit job it was doing
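For anyone in the same spot, a compact reference OLS to compare a hand-written version against. `np.linalg.lstsq` solves the least-squares problem directly, which is numerically safer than inverting XᵀX by hand. This is a sketch of plain least squares with an intercept (the model `LinearRegression` fits by default), not kindred radish's code.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares intercept and coefficients for y ~ b0 + X @ b."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # leading column of 1s for the intercept
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta[0], beta[1:]


b0, b = ols_fit([[0.0], [1.0], [2.0], [3.0]], [2.0, 5.0, 8.0, 11.0])  # y = 2 + 3x exactly
print(b0, b)  # approximately 2.0 and [3.0]
```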

olive shore
#

has anyone used hugging face before or is good with AI

#

I am trying to create a personal assistant app that would answer questions based on information I trained it on. I want to, let's say, upload a book or a dataset of lots of research papers, and then when I ask a question it would give the answer

#

is that possible without having context or something

#

just train it with some data?

#

is this what i am looking for?

#

or is this something else

arctic wedgeBOT
lapis sequoia
#

```
import cv2
import numpy as np

# dibujar ("draw"): outline every large blob in `mask` and mark its centroid on `frame`
def dibujar(mask,color):
    _,contornos,_ = cv2.findContours(mask, cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
    print ("I'm done messing around")
    for c in contornos:
        area = cv2.contourArea(c)
        if area > 3000:
            M = cv2.moments(c)
            if (M["m00"]==0): M["m00"]=1
            x = int(M["m10"]/M["m00"])
            y = int(M['m01']/M['m00'])
            nuevoContorno = cv2.convexHull(c)
            cv2.circle(frame,(x,y),7,(0,255,0),-1)
            cv2.putText(frame,'{},{}'.format(x,y),(x+10,y), font, 0.75,(0,255,0),1,cv2.LINE_AA)
            cv2.drawContours(frame, [nuevoContorno], 0, color, 3)

cap = cv2.VideoCapture(0)

# HSV ranges: blue (azul), yellow (amarillo) and red; red wraps around hue 0, so it needs two ranges
azulBajo = np.array([100,100,20],np.uint8)
azulAlto = np.array([125,255,255],np.uint8)

amarilloBajo = np.array([15,100,20],np.uint8)
amarilloAlto = np.array([45,255,255],np.uint8)

redBajo1 = np.array([0,100,20],np.uint8)
redAlto1 = np.array([5,255,255],np.uint8)

redBajo2 = np.array([175,100,20],np.uint8)
redAlto2 = np.array([179,255,255],np.uint8)

font = cv2.FONT_HERSHEY_SIMPLEX
while True:

    ret,frame = cap.read()

    if ret == True:
        frameHSV = cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
        maskAzul = cv2.inRange(frameHSV,azulBajo,azulAlto)
        maskAmarillo = cv2.inRange(frameHSV,amarilloBajo,amarilloAlto)
        maskRed1 = cv2.inRange(frameHSV,redBajo1,redAlto1)
        maskRed2 = cv2.inRange(frameHSV,redBajo2,redAlto2)
        maskRed = cv2.add(maskRed1,maskRed2)
        dibujar(maskAzul,(255,0,0))
        dibujar(maskAmarillo,(0,255,255))
        dibujar(maskRed,(0,0,255))
        cv2.imshow('frame',frame)
        if cv2.waitKey(1) & 0xFF == ord('s'):  # press 's' to quit
            break
cap.release()
cv2.destroyAllWindows()
```
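One likely culprit, assuming OpenCV 4.x is installed: `cv2.findContours` returns three values (image, contours, hierarchy) in OpenCV 3.x but only two (contours, hierarchy) in 2.4 and 4.x, so the three-value unpacking in `dibujar` fails with a "not enough values to unpack" error. Taking the second-to-last element works on every version. The helper name `contours_from` is hypothetical; the cv2 call is left as a comment so the snippet runs without OpenCV.

```python
def contours_from(find_contours_result):
    """Pick the contour list out of cv2.findContours' return value on any OpenCV version.

    OpenCV 3.x returns (image, contours, hierarchy); 2.4 and 4.x return
    (contours, hierarchy). The contours are second-to-last either way.
    """
    return find_contours_result[-2]


# Inside dibujar:
# contornos = contours_from(cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE))
```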

#

Someone help me i dont know what is wrong

#

Hello, Do you know some good sources for finding a plan for how to become a data scientist? I mean a real plan, not how to become a senior data scientist. I found some articles but there is too much to learn, you need a lifetime to learn this stuff. I started with linear algebra and also learned the basics for ANN, but there is much more. I need a good plan because I want to find a job in a year maybe.

late shell
#

Hello, I'm a noobie to ML and was learning about Decision Tree Regression and testing out something on my own. The Decision Tree algorithm for regression works in a way that, for each node of the tree, it iterates through all the values of all the features, trying to find the split that decreases the SSR the most. At each iteration the algo considers only 2 points at a time, takes their average, makes the split at that average, and then makes predictions using that split and calculates the SSR. Then it selects the split which decreases the SSR the most. I was wondering: does the number of observations considered at the time of a split (i.e. 2 right now) affect the model in any way? I believe it's a trade-off between speed (the time taken by the model to train) and the accuracy of the model. So I wrote a notebook to test whether this trade-off is significant enough to be considered. Can someone please go through this notebook and let me know if I'm just wasting my time doing silly & useless things or whether I should continue this exploration? It'll be really valuable to me if someone gives feedback on this. Thanks
Here is the notebook : https://github.com/Noobie20/ML/blob/master/Regression/Decision Tree Regression/n_obs_split.ipynb
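A sketch of the usual exhaustive split search for one feature in a CART-style regression tree (roughly what scikit-learn's DecisionTreeRegressor does, though its implementation differs): sort by the feature, take the midpoint between each pair of neighbouring distinct values as a candidate threshold, and score every candidate by the SSR of all points on each side. The two neighbouring points only define where the threshold sits; the SSR itself uses every observation in the node.

```python
import numpy as np


def best_split(x, y):
    """Return (threshold, ssr) of the SSR-minimising split on a single feature."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical values
        threshold = (x_sorted[i - 1] + x_sorted[i]) / 2  # midpoint of two neighbours
        left, right = y_sorted[:i], y_sorted[i:]         # but ALL points are scored
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best[1]:
            best = (threshold, ssr)
    return best


print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))  # threshold 6.5, SSR 0.0
```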


untold verge
#

is there a way to data mine facebook?

tame lichen
#

hi, so if I'm predicting sales for a company, what's the best type of model to use for something like that?

mint palm
#

used above, what does int64 do?

#

does it limit the numbers so that they are 64 bits and speed up the model?

serene scaffold
mint palm
#

a2 is just activated vector after applying activation function

#

for layer2

serene scaffold
mint palm
#

oh yess so that instead of true false we get numbers

#

right?

serene scaffold
#

sounds right to me

mint palm
#

I did see something like that in the lecture

serene scaffold
#

it looks like that line is just a fancy way of setting certain values in dA2 to 0

#

I think dA2[A2 <= 0] = 0 would have the same effect.
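On the int64 question: casting the True/False mask to int64 only turns it into 1/0 so it can be multiplied with the gradient; it doesn't limit digits or speed anything up. A small check with made-up numbers that the multiply form and the boolean-mask assignment give the same result (the `.copy()` keeps the assignment version from overwriting the original gradient):

```python
import numpy as np

A2 = np.array([[0.0, 1.5], [2.0, 0.0]])    # made-up activations after ReLU
dA2 = np.array([[0.3, -0.7], [0.4, 0.9]])  # made-up upstream gradient

# Version 1: cast the boolean mask to int64 (True -> 1, False -> 0) and multiply
dZ2_a = np.multiply(dA2, (A2 > 0).astype(np.int64))

# Version 2: copy, then zero the entries where A2 <= 0
dZ2_b = dA2.copy()
dZ2_b[A2 <= 0] = 0

print(np.array_equal(dZ2_a, dZ2_b))  # True
```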

mint palm
#

he did it in a less fancy way in forward propagation

#

why would he make it more fancy here lol

#

😆

#

he did this earlier

#

in forward prop of dropout

serene scaffold
#

Can you see how that could be simplified?

mint palm
#

no cuz i dont get what int64 is doing there

serene scaffold
#

there is no int64 in that one

mint palm
#

it's setting values less than the probability to 0 and leaving the others unchanged

serene scaffold
#

so do you see how you could simplify it, knowing that dA2[A2 <= 0] = 0 is an alternative to the other one?

mint palm
#

its sort of same i guess i understand

lilac needle
mint palm
#

is saw some usage but does fit into bool concept

#

its not in bool

lilac needle
#

That’s why when you sum a series of true and false, you’ll get the total count of True values
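A quick illustration of that point: in NumPy and pandas, booleans behave like 1 and 0 in arithmetic.

```python
import numpy as np
import pandas as pd

flags = pd.Series([True, False, True, True])
print(flags.sum())                  # 3: True counts as 1, False as 0
print(flags.mean())                 # 0.75: the fraction of True values
print(np.array([True, False]) * 5)  # [5 0]
```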

lilac needle
haughty pagoda
#

guys

#

can anyone help me with opencv?

#

i wanna detect angular velocity

#

of a rotating object

#

Abyone?

#

*anyone

grave frost
#

damn I hate pandas

#

I want to drop the second row, but it messes up the index

#
0               column_2
2                     ....
3                     ....

1 is missing from the index due to the deletion causing keyerror. anyone know what this problem is called and Its solution?
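The gap is expected: pandas index values are labels, not positions, so dropping label 1 leaves 0, 2, 3 and any label-based lookup of 1 fails. Two common fixes, on a made-up frame with the same shape as the example:

```python
import pandas as pd

df = pd.DataFrame({"column_2": ["a", "b", "c", "d"]})
df = df.drop(index=1)  # index is now 0, 2, 3, so df.loc[1] raises KeyError

# Option 1: renumber the index 0..n-1 (drop=True discards the old labels)
renumbered = df.reset_index(drop=True)
print(renumbered.loc[1, "column_2"])  # c

# Option 2: keep the labels but select by position instead of by label
print(df.iloc[1]["column_2"])  # c
```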

primal tulip
grave frost
primal tulip
nova widget
#

@grave frost just do "for row in dataset:"

#

or "for row in range(len(dataset))"

grave frost
#

no more pandas 🥳

mint palm
tepid rapids
#

hey im working on a KNN algorithm that tries to predict whether a youtube video will trend based on the title. Does anyone know where i can get a dataset that includes trending and non trending? i can only find trending so far...

velvet linden
#

so if i have a program
that checks a csv file, and it is like if this input is found in the a column then go the value next to it in the b column, and check if the next input the user types matches that.
but i dont know how to do that
any help?

real basalt
exotic maple
maiden sigil
#

how to filter dates in multiple columns?

copper willow
#

Hi guys, I hope this is the right channel to ask for opinions about this: I want to create a whatsapp bot using Python that report to the users the status of delivery of their product. Any ideas of how can I do that?

flint mason
#
df['Percent'] = clean_values(df['changesPercentage'])

Note: clean_values removes brackets and symbols from the number and converts the string to float. Is there something wrong with the syntax?
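Syntax-wise that line is fine; whether it works depends on `clean_values`. If it was written for a single string, here it receives the whole column at once, so it either needs to be applied per value (`df['changesPercentage'].apply(clean_values)`) or rewritten with vectorised string methods. A sketch of a hypothetical vectorised rewrite, assuming values look like "(+1.23%)" (the real format isn't shown):

```python
import pandas as pd


def clean_values(series):
    """Strip brackets, % and + signs, then convert to float (assumed input format "(+1.23%)")."""
    return series.str.replace(r"[()%+]", "", regex=True).astype(float)


df = pd.DataFrame({"changesPercentage": ["(+1.5%)", "(-0.25%)", "(3%)"]})
df["Percent"] = clean_values(df["changesPercentage"])
print(df["Percent"].tolist())  # [1.5, -0.25, 3.0]
```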

raven knoll
#

I am working on an unsupervised text sentiment project, but this is the first time I am doing this. I got some feedback the last time I posted here, but I still have some questions.

Currently I have a dataset to train the model, but I don't know how to build the model.

  1. I have preprocessed the data (stemmed, lemmatized, removed stopwords).

  2. I have used word2vec.

  3. I used KMeans to create 2 clusters (but the clusters are not good, and I don't know what to do about it).

  4. Now I don't know what to do next.
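One common step between word2vec and KMeans (not the only option) is turning each document into a single vector by averaging the vectors of its words, then clustering those document vectors and reading a sample from each cluster to see whether it lines up with sentiment. A minimal sketch; the 2-d "embeddings" below are made up, and in practice they would come from the trained word2vec model.

```python
import numpy as np

def document_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens that have an embedding; zeros if none do."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# Toy 2-d "embeddings"; in practice these come from the trained word2vec model
word_vectors = {
    "good": np.array([1.0, 0.0]),
    "great": np.array([0.9, 0.1]),
    "bad": np.array([-1.0, 0.0]),
}
docs = [["good", "great"], ["bad", "unknownword"], ["nothing", "here"]]
X = np.vstack([document_vector(d, word_vectors, dim=2) for d in docs])
# X has one row per document and can be passed to a clustering model such as KMeans
```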

lapis sequoia
#

you cannot believe how many hours it took me to realize that fit_transform actually fits and then transforms the DataFrame 🤣

#

I was applying that nonstop on test dataset
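The pattern that avoids this: call `fit_transform` only on the training set, then plain `transform` on the test set so it reuses the statistics learned from training. A minimal sketch with scikit-learn's `StandardScaler` and made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from the training set, then scales it
X_test_scaled = scaler.transform(X_test)        # reuses the training mean/std; no refitting
```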

primal tulip
lapis sequoia
#

yeah, i was literally hugging the documentation at night and praying to it in hopes of finding an answer

primal tulip
# lapis sequoia yeah, i was literally hugging the documentation at night and praying to it in ho...

I've spent like 5 days fighting a read-streaming-data program I kinda need for work to do some aggregations on huge datasets.
I read a bit about Haskell's lazy approach (https://wiki.haskell.org/Lazy_evaluation), and when combined with pandas it can deal with the data in chunks decently. Even so, it was going really slowly, so something was amiss. I have a padding function for UnicodeEncodeErrors that just prints a '?' for each invalid char it finds, but the issue was that I passed it the whole chunk of the DataFrame and it cast that to str, instead of just the invalid value. Since almost every chunk had one weird char, I was casting everything to a string, printing each char in that chunk one by one, and then casting it back to a pd.DataFrame. Reading 1 million records and 30ish rows took 52 minutes lol. I haven't fixed it yet, but hopefully it'll run in 20 seconds (ish) if everything goes to plan.
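A minimal sketch of replacing only the offending characters, value by value, instead of casting the whole chunk to one string. It assumes the goal is to turn any character that can't be encoded as ASCII into '?'; the target encoding is an assumption, so swap in whichever one the output actually uses.

```python
import pandas as pd

chunk = pd.DataFrame({"name": ["plain", "café", "naïve"], "value": [1, 2, 3]})

# Replace only the characters that can't be encoded (here: non-ASCII) with '?',
# per value, without casting the whole chunk to one big string
chunk["name"] = chunk["name"].str.encode("ascii", errors="replace").str.decode("ascii")
```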

untold cove
untold cove
#

I want 2 bars with px.bar.

#

@primal tulip it doesn't provide a bar example, nor one with plotly.express, unfortunately

lapis sequoia
fast saffron
#

I need help fitting multiple columns in a linear

#

LinearRegression

#

Like comparing X to different Ys (different columns)
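scikit-learn's `LinearRegression` accepts a 2-d target, so one `fit` call can regress X against several Y columns at once (one line per column). A minimal sketch with made-up data where the answers are known exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)  # one feature, shape (n_samples, 1)
Y = np.column_stack([2 * x.ravel() + 1, -x.ravel() + 4])  # two target columns

model = LinearRegression().fit(x, Y)  # fits one line per column of Y
slopes = model.coef_.ravel()          # coef_ has shape (n_targets, n_features)
intercepts = model.intercept_         # one intercept per target
```

Looping over the columns and fitting a separate model for each gives the same coefficients; the single fit is just shorter.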

primal tulip
mint palm
#

How is this momentum equation derived? I need the reasoning for why the equations look like this.

#

?

noble sand
#

I'm trying to plot information/facts about companies from a dictionary onto a timeline like this; how would I do that? At the moment I've got one item stored in the dictionary, and it gets plotted on the graph but doesn't get labelled with its name, and I'd also like to annotate other info...

winged yew
#

anyone have ML projects >>>

#

???

knotty kayak
#

does anyone know why using multiprocessing while using matplotlib opens up multiple plots

kindred radish
#

Quick question about plotting data

#

Say i've got this data:

#

You could easily just draw a straight line through it and say they're linearly correlated

#

but is this gap in the middle problematic?

#

If i plotted a straight line through both regions individually it would say that they're not linearly correlated
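A small worked example of the situation described above: two clusters where x and y are unrelated inside each cluster, yet the pooled correlation is close to 1 because it comes entirely from the gap between the clusters. The numbers are made up for illustration.

```python
import numpy as np

# Two small clusters: inside each one, x and y are unrelated
x_a, y_a = np.array([0, 1, 0, 1]), np.array([0, 0, 1, 1])
x_b, y_b = np.array([10, 11, 10, 11]), np.array([10, 10, 11, 11])

r_a = np.corrcoef(x_a, y_a)[0, 1]  # correlation inside cluster A
r_b = np.corrcoef(x_b, y_b)[0, 1]  # correlation inside cluster B
r_all = np.corrcoef(np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]))[0, 1]
```

Here the within-cluster correlations are 0 while the pooled one is 200/202, about 0.99.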

sweet cobalt
#

If so thats probably why

knotty kayak
#

nope

#

it can be any function, even one with just a print, and it'll do the same thing

sweet cobalt
#

So nothing to do with matplotlib is in the function t

#

Just the fact that you imported matplotlib causes the multiple plotting

misty flint
#

2 data clusters seem pretty significant depending on your problem

#

like if you were in retail and it represented 2 different demographics

#

otherwise, you could probs use a linear model...just probs not the strongest is all

dapper halo
#

My colab continuously crashes when I simply take the difference between predicted values and true values.

I worked around that by putting it into a loop that computes the difference element by element (wouldn't this take more RAM?).

Now it gets stuck when trying to plot a histogram of those differences. Any tips for reducing the load? I honestly don't get why it has a problem with plotting when it trains just fine.
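One common cause of exactly this symptom (not confirmed to be the cause here): predictions of shape `(n, 1)` minus true values of shape `(n,)` broadcast to an `(n, n)` matrix. With n = 100,000 that is 10 billion float64 values, about 80 GB. A minimal sketch with a tiny array:

```python
import numpy as np

y_true = np.arange(5, dtype=float)      # shape (5,)
y_pred = y_true.reshape(-1, 1) + 0.5    # shape (5, 1), as many models' predict() return

bad = y_pred - y_true                   # broadcasts to (5, 5): n*n values
good = y_pred.ravel() - y_true          # flatten first: shape (5,)
```

Checking `.shape` on both arrays before subtracting is a cheap way to catch this.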

late shell
#

hello, can someone help me understand the non-decreasing property of R^2 for regression models? I can't understand why R^2 can never decrease when new predictors are added. I found this explanation on StackExchange. At the end of the answer, the author says "Or if the extra estimated coefficient (βp+1) takes a nonzero value, the SSE will reduce." Why would the SSE necessarily decrease? Isn't it possible that the new combination of coefficients (β's) makes even worse predictions? What if the model, after a new predictor is added, makes worse predictions than the model without that predictor? Because of the worse predictions, the SSE would increase, and as a result R^2 would decrease. Where am I wrong?

molten hamlet
#

@desert oar , just saying I solved it if you were curious hah

desert oar
#

this looks like a much better use of pandas functionality

#

glad you figured it out

fierce grove
#

@late shell The basic idea is that the total sum of squares (SST) equals the sum of squares of the individual factors + the sum of squares of their interactions (if any) + the sum of squares of error (SSE):
SST = (SS1 + SS2 + ... + SSn) + (SS12 + SS13 + ...) + SSE.
So when you add a new predictor, say n+1, it contributes SS(n+1) and its interactions with the others (if any), and since SST stays constant, SSE has to decrease (or stay the same).
Thus in the R^2 formula, SSE either decreases or stays constant, so R^2 either increases or stays constant.
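The missing step in the "what if it predicts worse?" worry: least squares picks the coefficients that minimize SSE. The larger model can always set the new coefficient to 0 and reproduce the smaller model's fit exactly, so its minimum SSE can't be larger; a worse fit simply would not be the least-squares solution. SST doesn't depend on the predictors, so R^2 can't drop. A minimal numerical check with a made-up, pure-noise extra predictor:

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    A = np.column_stack([np.ones(len(y)), X])     # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes the sum of squared errors
    sse = np.sum((y - A @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
y = 3 * x1 + rng.normal(size=50)
noise = rng.normal(size=50)                       # a predictor unrelated to y

r2_small = r_squared(x1.reshape(-1, 1), y)
r2_big = r_squared(np.column_stack([x1, noise]), y)  # adding a useless predictor
```

This is in-sample R^2 only; on held-out data an extra predictor can make things worse, which is why adjusted R^2 or validation scores are used for model selection.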

#

Hope it helped you.

oak jungle
#

Hey, I was wondering if this channel included neural networks and machine learning, or if this is just for standard a.i.

tidal bough
oak jungle
#

Ok, thanks, forgot to check that

#

For some reason

wicked mantle
#

can one CNN model be used for all types of images? 🤔 (for recognition)
For example, I have a model that is good on a dogs, cats and ducks dataset. Can I then just switch the dataset to other images, without changing the fundamental CNN model?

ripe forge
#

Sure, with no guarantees whether it will perform just as well or not.

thick kestrel
#

Hello people... first time posting here... I am working on a model that predicts whether a person was arrested based on some variable information...
My target variable has multi-class data and I chose to convert the classification to numerical values prior to fitting the data to the model.

new_target_values = {'Arrest':0,
'Field Contact':1,
'Citation / Infraction':2,
'Offense Report':3,
'Referred for Prosecution':4}

I got ValueError: ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Should I just do a binary classification and have arrest be 1 while the rest are 0s or should i try fitting in a multiclass model
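That ValueError comes from the metric rather than the model: `f1_score` (and `precision_score`/`recall_score`) default to `average='binary'`, which only works for two classes. A minimal sketch of both routes, using made-up labels in the 0-4 encoding above: pick a multiclass `average`, or collapse to Arrest vs. everything else.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 3, 4, 0]
y_pred = [0, 1, 2, 3, 4, 0]

# Multiclass: choose how the per-class scores are combined
macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
weighted = f1_score(y_true, y_pred, average="weighted")  # mean weighted by class support

# Binary alternative: 1 for 'Arrest' (class 0 in the mapping above), 0 for everything else
y_true_bin = [1 if c == 0 else 0 for c in y_true]
y_pred_bin = [1 if c == 0 else 0 for c in y_pred]
binary = f1_score(y_true_bin, y_pred_bin)  # default average='binary', positive label 1
```

Which route fits depends on the question: if only "arrested or not" matters, the binary target is simpler; if the outcome categories matter, keep the multiclass labels.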

tame lichen
#

question what applications is a random forest model good for?

serene scaffold
tame lichen
#

kinda general I know

velvet thorn
lavish sleet
#

Does anyone have a code snippet for multi word keyword analysis
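A minimal multi-word keyword sketch in plain Python: count word n-grams (bigrams for n=2) across texts. It does no stopword removal or stemming; scikit-learn's `CountVectorizer(ngram_range=(2, 2))` is an alternative for larger corpora. The sample texts are made up.

```python
from collections import Counter

def ngram_counts(texts, n=2):
    """Count word n-grams (e.g. bigrams for n=2) across a list of texts."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        # zip n staggered copies of the word list to form consecutive n-word phrases
        grams = zip(*(words[i:] for i in range(n)))
        counts.update(" ".join(g) for g in grams)
    return counts

texts = ["machine learning is fun", "I like machine learning"]
top = ngram_counts(texts, n=2).most_common(1)
```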

slate hollow
#
```
Epoch 1/100
1250/1250 [==============================] - 1s 400us/step - loss: 128.7992
Epoch 2/100
1250/1250 [==============================] - 0s 397us/step - loss: 1.5939
Epoch 3/100
1250/1250 [==============================] - 1s 406us/step - loss: 1.4500
Epoch 4/100
1250/1250 [==============================] - 0s 385us/step - loss: 1.3226
Epoch 5/100
1250/1250 [==============================] - 0s 398us/step - loss: 1.1951
```
so uh my `X_train` size is 40k, so why is it only 1250?
#

(ping 2 reply thx)

slate hollow
#

wait uh

#

here

#
```python
import numpy as np
import tensorflow as tf

keras = tf.keras


def func(inp: np.ndarray) -> np.ndarray:
    # Target: five linear combinations of the two inputs
    return np.array([inp[0] * 2, inp[1] + inp[0] * 3, inp[0] * 1 + inp[1] * 10, inp[1], inp[0] + inp[1]])


# Every (x, y) pair on a 200 x 200 grid: 40,000 training samples
training = []
for x in range(200):
    for y in range(200):
        training.append([x, y])
X_train = np.array(training)
y_train = np.array([func(x) for x in X_train])

model = keras.models.Sequential(layers=[
    keras.layers.Dense(5, input_shape=(2,)),
    # keras.layers.Dense(5)
])

# Multiply the learning rate by exp(-0.1) each epoch for the first 20 epochs, then hold it
lr_decay = keras.callbacks.LearningRateScheduler(lambda e, lr: lr * np.exp(-0.1) if e < 20 else lr)
# `learning_rate` replaces the deprecated `lr` argument
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-3), loss=keras.losses.MeanAbsoluteError())
# `callbacks` expects a list of callbacks
model.fit(X_train, y_train, epochs=100, callbacks=[lr_decay])
```
#

it's really rudimentary, but i'm just learning

velvet thorn
#

40000 / 32 = 1250
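The arithmetic behind that answer: when `fit()` is called without `batch_size`, Keras uses 32, and the progress bar counts batches (steps), not samples. If the sample count isn't a multiple of the batch size, the last batch is just smaller, hence the ceiling.

```python
import math

n_samples = 200 * 200  # X_train built from the 200 x 200 grid above
batch_size = 32        # Keras's default when fit() is called without batch_size
steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch may be smaller
```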

slate hollow
#

oh

#

so that's like how many batches?

velvet thorn
#

ye

#

I don't remember it being like that

slate hollow
#

also another question, the loss is the sum of the losses

velvet thorn
#

but it's been a while

slate hollow
#

over all

#

over all training instances right

velvet thorn
#

uh

#

the loss is defined

#

as a function over arrays

#

like f(actual, predicted)

slate hollow
#

yeah

#

but there's so many instances

#

is it averaged or summed

velvet thorn
#

that's my point

slate hollow
#

wait what

velvet thorn
#

not a function applied over individual values in actual and predicted

slate hollow
#

i'm confused

velvet thorn
#

you have an array of actual values, and an array of predicted values

#

and the loss function

#

does whatever it wants

#

so

#

in the case of mean absolute error

slate hollow
#

it depends on the loss function

velvet thorn
#

it's the sum of the absolute differences

#

(I would presume)

slate hollow
#

yeah ok then

#

thx!

velvet thorn
#

yw 👋
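For the record, mean absolute error is the *mean* of the absolute differences, not the sum. As far as I know, Keras by default also averages the per-sample losses over each batch, and the loss shown in the progress bar is a running average over the batches seen so far in that epoch. The made-up numbers below show the difference between averaging and summing:

```python
import numpy as np

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.5, 2.0], [2.0, 6.0]])

abs_err = np.abs(y_true - y_pred)   # [[0.5, 0.0], [1.0, 2.0]]
per_sample = abs_err.mean(axis=-1)  # MAE for each sample: [0.25, 1.5]
batch_loss = per_sample.mean()      # averaged over the batch: 0.875
summed = abs_err.sum()              # the sum would be 3.5
```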

slate hollow
# velvet thorn yw 👋

hey, uh, i have another question
when tweaking the parameters, why is it `lr * gradient`? Wouldn't just a general direction (positive, negative, or 0) be enough?

shy tundra
#

Hey guys, so i want to build a 3D model of a place for a project and i want to run an AI simulation through it based on customer shopping patterns. What is a good program for me to use which supports AI in the 3D model

exotic maple
# slate hollow hey, uh, i have another question when tweaking the parameters, why is it `lr * g...

the learning rate is an adjustment to fine-tune how much your parameters change based on the observed gradient.

if you simply set it to 1, you have no flexibility, and your model might never reach (or take forever to reach) the global minimum of the cost function. Too high an LR can make you "fly over" the minimum, and too small a one may take too long and use too many computational resources to converge to a solution
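On the "why not just the sign?" question: the gradient's magnitude acts as a built-in step-size control, since steps shrink as the slope flattens near a minimum. A direction-only step keeps the same size and ends up bouncing around the minimum. A minimal sketch on f(x) = x², with a made-up learning rate and starting point; sign-only methods do exist (Rprop is one), but they need their own rules for adapting the step size.

```python
import numpy as np

def grad(x):
    return 2 * x  # derivative of f(x) = x**2, minimum at x = 0

lr = 0.3
x_grad, x_sign = 1.0, 1.0
for _ in range(20):
    x_grad -= lr * grad(x_grad)           # step shrinks as the slope flattens near the minimum
    x_sign -= lr * np.sign(grad(x_sign))  # fixed-size step: direction only
```

After 20 steps `x_grad` is about 1e-8, while `x_sign` is still oscillating between roughly 0.1 and -0.2.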

slate hollow
exotic maple
slate hollow
#

i mean like take this hypothetical cost function: `/ ---/` <-- we're here. i mean it's steep, but that doesn't mean we should

#

oh ok

#

so it's just generally agreed upon, and it's worked for most models?

exotic maple
#

this is what LR does

#

Learning rate just determines the "step size" how large is your jump

#

and this is a good visualization of what happens with different learning rates

#

if you look at the right image, it "jumps" over the minimum because the step size (LR) is too large

slate hollow
#

hey- for the sklearn housing data set, ik the target variable is the mean house price, but what's the unit for that?

#

$100k or something?

delicate lodge
#

Hi ,
I am developing a recommendation system
I have a question...
suppose we have a product list: how can we do synthetic grouping of that list?
for example
we have
milk , 1L
milk, 500ml
milk,2L
I want my system to treat them as the same.
any ideas?
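A minimal rule-based sketch for the milk example: strip a trailing quantity with a regex, normalize case and whitespace, and group by what's left. The list of units (`ml`, `l`, `g`, `kg`) is an assumption; messier catalogs usually need fuzzy string matching or text embeddings on top of this.

```python
import re
from collections import defaultdict

# Matches a trailing quantity such as "1L", "500ml", "2 kg" (the unit list is an assumption)
QUANTITY = re.compile(r"[,\s]*\d+(\.\d+)?\s*(ml|l|g|kg)\s*$", re.IGNORECASE)

def base_name(product):
    """Strip a trailing quantity and normalize case/whitespace."""
    return QUANTITY.sub("", product).strip().lower()

products = ["milk , 1L", "milk, 500ml", "milk,2L", "bread, 500g"]
groups = defaultdict(list)
for p in products:
    groups[base_name(p)].append(p)
```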

ripe forge
#

Here's the real kicker. My direction information felt incomplete when we even knew our destination. Now imagine the same scenario where we don't even know where the destination is.. Oh and we teleport randomly to different roads and keep asking the same question.....

#

Best part is, that assumes that we would even know we're there when we arrive. Which we don't. Sounds like fun

hardy jetty
#

What is the difference between np.min(array) and array.min()? I timed it on a numpy.ndarray, and array.min() is a bit faster. Would've thought it would be the same speed.
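Both give the same result. My understanding is that `np.min` does some extra Python-level work (argument handling and NumPy's function-dispatch check) before reaching the same reduction that the `ndarray.min` method calls, so on small arrays that fixed overhead can show up in timings. A minimal sketch for comparing them; the timings depend on the machine, so none are claimed here.

```python
import timeit
import numpy as np

a = np.random.default_rng(0).random(1000)

same = np.min(a) == a.min()  # identical result either way

# Per-call overhead is easiest to see on small arrays and many repetitions
t_func = timeit.timeit(lambda: np.min(a), number=1000)
t_method = timeit.timeit(lambda: a.min(), number=1000)
```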

lapis sequoia
#

Hey everyone, I have a question about the credit risk notebook from pysurvival

#

the goal in this notebook is to predict the speed of a repayment loan

#

but at the end

#

we finish by plotting this graph that I don't understand

#

I'm not sure what the y-axis is, and I don't understand how the high-risk line can repay the loan faster than the low- and medium-risk lines

#

shouldn't it be the opposite?

#

Also, I'm not sure I understand what "T=6.0" means (the actual time)

#

I looked at the code but it didn't help me that much, can you help me please?

lapis sequoia
#

Hello. Anybody with data science experience? I want Simpsons transcripts for a machine learning task. I want them all in .txt files for all the episodes named ep1.txt, ep2.txt, ep3.txt, ep4.txt ... and so on. I found a script dataset of Simpsons here: https://www.kaggle.com/prashant111/the-simpsons-dataset?select=simpsons_script_lines.csv
but it is one CSV file that is not split. How do I get the data from the Kaggle link into my format?
Can anybody give me a script to get the data into the format I want? Or is the data available in my desired format anywhere? I'd appreciate any sort of help!

serene scaffold
#

@lapis sequoia do you know pandas?

uncut barn
#

What is the difference between the correlation coefficient and the p-value in relation to how good the regression model is?

tame sleet
#

I need some help with numpy's random.randn
what does it print, and why?
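`np.random.randn` draws samples from the standard normal distribution (mean 0, standard deviation 1); its arguments are the dimensions of the output, and with no arguments it returns a single float. The values change on every run unless the generator is seeded. A minimal sketch:

```python
import numpy as np

np.random.seed(42)            # only so the numbers are reproducible
one = np.random.randn()       # a single float drawn from N(0, 1)
grid = np.random.randn(2, 3)  # a 2x3 array of independent N(0, 1) samples

# Newer code usually uses a Generator instead of the legacy global functions
rng = np.random.default_rng(42)
same_kind = rng.standard_normal((2, 3))
```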

serene scaffold
serene scaffold
#

whereas the correlation coefficient is a measure of how strongly related two variables are.

uncut barn
#

@serene scaffold ah ok thanks

#

has anyone read the storks delivers babies paper?

lapis sequoia
serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
#

but I'm new to dealing with csv files

tame sleet
lapis sequoia
#

@serene scaffold are you there?

serene scaffold
#

!docs pandas.DataFrame.groupby

arctic wedgeBOT
#

```
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
```
Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
lapis sequoia
#

how do I use pd.dataframe.groupby to achieve what I want?

serene scaffold
serene scaffold
lapis sequoia
serene scaffold
#

look at the data--if you want to handle each episode separately, what column gives you that information?

serene scaffold
#

are you sure?

lapis sequoia
#

episode id

serene scaffold
#

right

lapis sequoia
#

🙂

serene scaffold
#

strictly speaking it is episode_id. the underscore is necessary

serene scaffold
#

so you need to select rows by each episode id and write out each slice.
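A minimal sketch of selecting rows per episode and writing each slice out, using `groupby` on `episode_id`. The text column name `spoken_words` is an assumption about the Kaggle CSV (check the actual header), and it assumes the rows are already in script order. A tiny made-up frame stands in for the real file.

```python
import os
import tempfile
import pandas as pd

def write_episodes(df, out_dir, text_col="spoken_words"):
    """Write one text file per episode_id, lines in their original order."""
    for episode_id, lines in df.groupby("episode_id"):
        text = "\n".join(lines[text_col].dropna().astype(str))
        path = os.path.join(out_dir, f"ep{episode_id}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)

# Tiny stand-in for simpsons_script_lines.csv
df = pd.DataFrame({
    "episode_id": [1, 1, 2],
    "spoken_words": ["Hi.", "D'oh!", "Ay caramba!"],
})
out_dir = tempfile.mkdtemp()
write_episodes(df, out_dir)
```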

lapis sequoia
#

let me try to write the code
can I ask you if I have any problems while writing the code

serene scaffold
lapis sequoia
#

hello
my computer is very weak
it crashed because the csv file was too large
and repl.it can't load the csv file

serene scaffold
#

@lapis sequoia you can just read in a certain number of rows at a time, but that means you'll need to be appending to the outputted files
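A minimal sketch of that chunked approach with `pd.read_csv(..., chunksize=...)`: each chunk is grouped by episode and appended to that episode's file, since one episode can be split across two chunks. The `spoken_words` column name is again an assumption, and because the files are opened in append mode, delete old output before re-running. A small in-memory CSV stands in for the real one.

```python
import io
import os
import tempfile
import pandas as pd

def write_episodes_chunked(csv_source, out_dir, chunksize=10_000, text_col="spoken_words"):
    """Stream the CSV in chunks and append each episode's lines to its own file."""
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        for episode_id, lines in chunk.groupby("episode_id"):
            path = os.path.join(out_dir, f"ep{episode_id}.txt")
            # append mode, because one episode can be split across two chunks
            with open(path, "a", encoding="utf-8") as f:
                for line in lines[text_col].dropna().astype(str):
                    f.write(line + "\n")

csv_text = "episode_id,spoken_words\n1,Hi.\n1,Bye.\n2,Ay caramba!\n"
out_dir = tempfile.mkdtemp()
write_episodes_chunked(io.StringIO(csv_text), out_dir, chunksize=1)
```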

tawdry hamlet
#

Yo, is it alright if I ask for some advice on how to do some down-and-dirty outlier detection in a t-SNE plot? I am currently evaluating a weird machine learning method I jury-rigged together and am trying to generate some evidence that what the system is flagging as abnormal is actually abnormal

lapis sequoia
#

@serene scaffold Are you there?

#

I wrote my code

#

but was waiting for your homework to finish

lapis sequoia
#

but I need your help

#

a lil

#

are the episode ids random unique ids? or are they the number of the episode?

sick wedge
#

Sorry to interrupt you Pinkie, hoping someone else can chime in. I'm trying to catch up on this course, but I'm really stuck on the basics. At the moment I'm on this exercise:

Exercise 6:
Please import from seaborn the famous Anscombe’s quartet. Then plot them with
matplot. And calculate their means, variances correlations and linear fitting
coefficients. For linear regression, you can use the sklearn lib. Can you have a more
concise way to plot the data?

And I'm given the code

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model

anscombe = sns.load_dataset("anscombe")
print(anscombe)

# create subsets and subplots of the anscombe data
dataset_1 = anscombe[anscombe['dataset'] == 'I']
dataset_2= anscombe[anscombe['dataset'] == 'II']
dataset_3 = anscombe[anscombe['dataset'] == 'III']
dataset_4 = anscombe[anscombe['dataset'] == 'IV']
fig = plt.figure()

axes1 = fig.add_subplot(2, 2, 1)
axes2 = fig.add_subplot(2, 2, 2)
axes3 = fig.add_subplot(2, 2, 3)
axes4 = fig.add_subplot(2, 2, 4)

axes1.plot(dataset_1['x'], dataset_1['y'], 'o')
axes2.plot(dataset_2['x'], dataset_2['y'], 'o')
axes3.plot(dataset_3['x'], dataset_3['y'], 'o')
axes4.plot(dataset_4['x'], dataset_4['y'], 'o')

#linear regression model
regr = linear_model.LinearRegression()
regr.fit(dataset_1['x'].values.reshape(-1,1), dataset_1['y'].values.reshape(-1,1))
axes1.plot(dataset_1['x'].values.reshape(-1,1), regr.predict(dataset_1['x'].values.reshape(-1,1)), 'r')
plt.show()
```

I barely have a clue how this code even works. I understand it is plotting graphs, at least, and I know Anscombe's quartet will have the same means, variances, medians, etc... but can anyone guide me through calculating those values? I'd appreciate any help
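Instead of four hand-made subsets, `groupby("dataset")` handles all four at once. A minimal sketch computing mean, variance, correlation and a least-squares line per dataset; it uses `np.polyfit` for the line (sklearn's `LinearRegression` gives the same slope and intercept), and a small made-up frame with the same columns as `sns.load_dataset("anscombe")` stands in for the real data so the example runs offline.

```python
import numpy as np
import pandas as pd

def summarize(anscombe):
    """Per-dataset mean, variance, correlation and least-squares line."""
    rows = {}
    for name, g in anscombe.groupby("dataset"):
        slope, intercept = np.polyfit(g["x"], g["y"], deg=1)  # degree-1 fit: y = slope*x + intercept
        rows[name] = {
            "mean_x": g["x"].mean(), "mean_y": g["y"].mean(),
            "var_x": g["x"].var(), "var_y": g["y"].var(),  # pandas .var() uses ddof=1
            "corr": g["x"].corr(g["y"]),
            "slope": slope, "intercept": intercept,
        }
    return pd.DataFrame(rows).T

# Stand-in data with the same columns as sns.load_dataset("anscombe")
demo = pd.DataFrame({
    "dataset": ["A"] * 4 + ["B"] * 4,
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [3, 5, 7, 9, 4, 3, 2, 1],
})
stats = summarize(demo)
```

With the real data, `summarize(anscombe)` should show the quartet's well-known nearly identical means, variances, correlations and fitted lines. For the "more concise way to plot" part, seaborn can draw all four panels with regression lines in one call, e.g. `sns.lmplot(data=anscombe, x="x", y="y", col="dataset", col_wrap=2)`.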

#

I didn't receive much support from my lecturer since face-to-face teaching is not allowed :\

tame lichen
#

like, where would I find the code for an algorithm like this?

uncut barn
#

what are the possible relationships between correlation and causation?

lapis sequoia
#

@serene scaffold Are you online

serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
serene scaffold
golden pawn
#

[PANDAS] Hello everyone. I have a big problem: I have no idea how to write code to print this:
The most popular girl's name and boy's name in every year (two records per year)
How do I do that? That's the Excel sheet I have read in. Liczba means amount, Plec means sex, Imie means name and Rok means year. And that's the code I was trying to do something with:
```py
print(f"{df1.loc[(df1.groupby('Rok')) & (df1.Plec == 'M')]['Liczba'].idxmax()}")
```
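A minimal sketch of the grouped-maximum pattern: group by both year and sex, take `idxmax` of the count within each group, then use those row labels with `.loc`. The names and counts below are made up, and `'K'`/`'M'` as the values of `Plec` is an assumption about the sheet.

```python
import pandas as pd

# Made-up data with the same columns as the sheet (Rok, Plec, Imie, Liczba)
df1 = pd.DataFrame({
    "Rok": [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "Plec": ["K", "K", "M", "M", "K", "K", "M", "M"],
    "Imie": ["Anna", "Ewa", "Jan", "Piotr", "Anna", "Ewa", "Jan", "Piotr"],
    "Liczba": [100, 90, 120, 110, 95, 99, 130, 125],
})

# Row label of the largest count within each (year, sex) group
idx = df1.groupby(["Rok", "Plec"])["Liczba"].idxmax()
top_names = df1.loc[idx, ["Rok", "Plec", "Imie", "Liczba"]]  # two rows per year
```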

lapis sequoia
# serene scaffold right, so once you have all those CSVs, what do you want to do with them?

I don't want many CSVs, I want many txts. I need them for a machine learning project, specifically few-shot learning with EleutherAI's GPT-Neo.
I finished the code for generating all my txts. Can you please verify my code, and correct and explain all the errors to me? Also tell me how I can improve my code, and why it isn't working if it isn't. Also let me know whether it works as expected.

#

@serene scaffold Are you there?

serene scaffold
#

Just all the dialogue in a given episode as one continuous stream of text?

lapis sequoia
serene scaffold
#

That's easy to do if you can fit the whole csv in your ram

lapis sequoia
#

did you check out my code

lapis sequoia
lapis sequoia
serene scaffold
#

One moment

lapis sequoia
#

np.full((25, 25), "white", dtype="object")
raises
ValueError: Object arrays cannot be loaded when allow_pickle=False

lapis sequoia
#

then it will work

#

where?

#

i'm not seeing that as an argument in np.full

lapis sequoia
lapis sequoia
lapis sequoia
#

try the argument in np.full

#

or remove pickle data from your file

#

@lapis sequoia wait I looked it up

#

there is no allow_pickle argument in numpy.full()
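`np.full` with `dtype=object` works on its own; `allow_pickle` is a parameter of `np.load` (and `np.save`), and that exact message is what `np.load` raises for object arrays. So the error most likely comes from loading a saved object array, not from `np.full`. A minimal sketch using a temporary file:

```python
import os
import tempfile
import numpy as np

grid = np.full((25, 25), "white", dtype=object)  # np.full itself handles object dtype fine

path = os.path.join(tempfile.mkdtemp(), "grid.npy")
np.save(path, grid)  # object arrays are saved via pickle

error_message = ""
try:
    np.load(path)  # allow_pickle defaults to False -> ValueError for object arrays
except ValueError as exc:
    error_message = str(exc)

loaded = np.load(path, allow_pickle=True)  # only do this for files you trust
```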

serene scaffold
#

@lapis sequoia your code can at least be greatly simplified

lapis sequoia
#

try opening an issue on GitHub

lapis sequoia
serene scaffold
#
```python
import pandas as pd

df = pd.read_csv('simpsons_script_lines.csv')
episode_ids = df['episode_id'].unique()

for id_ in episode_ids:
    ...
```
#

see if you can go from there

serene scaffold
#

it's a method but yes

lapis sequoia
#

I thought I had to make it a list and then make it a set to make it unique

serene scaffold
#

that's the verbose way to do it

#

it's easier to just let pandas do it for you 😁
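For comparison, a minimal sketch of both routes on a made-up Series: the list-then-set version works but gives no guaranteed order, while `Series.unique()` returns a NumPy array in order of first appearance.

```python
import pandas as pd

episode_id = pd.Series([3, 1, 3, 2, 1])

via_set = set(list(episode_id))   # the verbose route: no guaranteed order
via_unique = episode_id.unique()  # NumPy array, in order of first appearance
```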

lapis sequoia
#

I understand it's not simple, but does my code at least run properly? Can you check the txt files that it generates and whether they make any sense?