#data-science-and-ml | Python | Page 63

plain jungle May 17, 2023, 6:09 PM

#

If you want, this vid goes into how DNNs work from scratch. TensorFlow, Keras, and PyTorch are all nice, but I find that they add too much abstraction for someone new to NNs: the models may work, but less experienced users don’t know why

Hope it helps!

YouTube

JTexpo

The Math AI : Building a Neural Network for Algebraic Problem Solvi...

Automate algebraic problem solving with this comprehensive tutorial, where you'll learn how to implement a neural network to effortlessly tackle math questions. In this in-depth guide, we'll build a neural network from scratch using the powerful NumPy library, enabling you to create a dynamic model in Python.

Take your skills to the next level...


#

There’s a cool concept in discrete math where you can find the formula for a polynomial function, given N data points (N equal to the greatest exponent + 1) to fit on. Let me see if I can find a good link talking about it

#

Here it is https://discrete.openmathbooks.org/dmoi2/sec_polyfit.html

Hope it helps!
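A minimal sketch of the idea, assuming exact interpolation through N = degree + 1 points with Lagrange's formula (the linked page may use a different method). The sample points and the helper name `lagrange_eval` are made up for illustration.

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate the unique polynomial of degree < len(points) through `points` at x."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                # Basis factor that is 1 at xi and 0 at xj
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total

# Hypothetical data sampled from f(x) = 2x^2 - 3x + 1 (degree 2, so N = 3 points).
points = [(0, 1), (1, 0), (2, 3)]
prediction = lagrange_eval(points, 5)  # f(5) = 50 - 15 + 1 = 36
```

With fewer than degree + 1 points the polynomial is not uniquely determined, which is why N matters here.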

#

I don’t have too much experience on the topic, but iirc deep fakes make people pronounce how the letters sound, such as A = “ah”, B = “buh”, etc. I would guess that to detect words you’d do something similar: get samples of just the letters on their own, then in a word, and try to tokenize the word with the sounds. However, please don’t take this advice as gospel; I am unfamiliar with anything other than Amazon Transcribe when it comes to this topic

umbral charm May 17, 2023, 6:34 PM

#

plain jungle I don’t have too much experience on the topic, but iirc deep fakes make people p...

Haha mine isnt for words, its for a 'splosh' sound

#

actually it would be to detect any sound at all

#

I just need it to hear anything

#

then do something

plain jungle May 17, 2023, 6:42 PM

#

If you just want to detect general sound. A threshold of amplitude could work.

librosa may be a good start if you don’t have any direction just yet
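A rough sketch of the amplitude-threshold idea, assuming the audio is already a list of float samples in [-1, 1] (roughly what you have after loading a file with librosa). The frame size, the threshold, and the function names are arbitrary choices for illustration, not tuned values.

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_sound(samples, frame_size=4, threshold=0.1):
    """Return the start index of every frame whose RMS exceeds `threshold`."""
    hits = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        if rms(samples[start:start + frame_size]) > threshold:
            hits.append(start)
    return hits

# Hypothetical signal: near-silence, a short burst, then near-silence again.
signal = [0.0, 0.01, -0.01, 0.0,
          0.5, -0.6, 0.55, -0.4,
          0.0, 0.0, 0.01, 0.0]
loud_frames = detect_sound(signal)
```

Real audio would use much larger frames (hundreds or thousands of samples), and the threshold would depend on the background noise level.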

umbral charm May 17, 2023, 6:44 PM

#

plain jungle If you just want to detect general sound. A threshold of amplitude could work. ...

I was thinking librosa

#

But the problem is it detects the audio from my microphone (audio input)

#

i want to do it from a video playing from my computer

#

so from my sound output basically

plain jungle May 17, 2023, 6:49 PM

#

Oh hrmm… yeah that’s a little bit too far past my knowledge and I’d hate to give ya bad info

night kernel May 17, 2023, 7:04 PM

#

hey guys, i just recently got the source code for an NLP model that will help me with an iOS app that im building. it's a question generator. i know this is a broad, naive question but how can i go about implementing this in my iOS project?

#

this might be the wrong thread for this question but the python code is data science/ai

past meteor May 17, 2023, 7:57 PM

#

How do you guys deal with this?
The data we have is cleaned etc. and now in a DB, we just have one big table for this.

We're multiple people working on the same data and potentially making features, all in different programming languages.

How would you go about making new features and making them accessible for everyone? I'm not a fan of running ALTER TABLE every time we make a new feature because the upstream code that fills the DB is "unaware" of this. The way I'm dealing with it for now is essentially making smaller tables (that contain the features) that I join with the big one whenever I read from the DB...

What I'd dislike even more is just version controlling Parquet files / CSVs because that's the beginning of the end of serious projects imho.

#

In general the solution is to make the features at train/inference time instead of storing them in a DB but that's not so obvious for a lot of things we're dealing with.

somber pollen May 17, 2023, 8:31 PM

#

past meteor How do you guys deal with this? The data we have is cleaned etc. and now in a D...

this is the like one case where NoSQL sounds like it would be a good fit

#

https://pypi.org/project/pymongoarrow/ have all the queries output a csv using something like this

#

but you could just do a projection, and then each "version" would just be assigned a projection which would show or hide certain fields

past meteor May 17, 2023, 8:32 PM

#

We also have a mongo and minio (approx. S3 bucket but on-prem)

somber pollen May 17, 2023, 8:33 PM

#

Well if you ever are adding new features a lot then Mongo would be pretty decent

past meteor May 17, 2023, 8:33 PM

#

But I don't think NoSQL solves this, can you elaborate?

somber pollen May 17, 2023, 8:34 PM

#

As I understand it you basically want one large database, where each row of the database is some set of features. You can use a projection to provide a specialized view of this database depending on which features you want to use, and then it will be the same database, but missing the features you exclude (like new features that old models can't use)

#

But this would allow you to have one large database, incrementally add features as you go, and then you can just export to CSV whenever you need to actually train something with it

#

let's say you have features a, b, c = 0, 0, 0, then if you want all the rows but with just a, b then you could do projection={"a": 1, "b": 1}

#

and if you want to add a new feature you can just append it to each document in the db

#

but that old projection will never change which features it has

#

then managing the different projects is just a case of managing their projections
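To make the "one projection per project" idea concrete, here is a plain-Python sketch that mimics an inclusion-style projection on dictionaries. The project names and the helper `apply_projection` are hypothetical; it only illustrates the bookkeeping and does not talk to a database.

```python
def apply_projection(document, projection):
    """Keep only the fields marked 1 in `projection` (inclusion-style projection)."""
    return {field: document[field]
            for field, keep in projection.items()
            if keep and field in document}

# Hypothetical projects, each pinned to the feature set it was built with.
PROJECTIONS = {
    "model_v1": {"a": 1, "b": 1},
    "model_v2": {"a": 1, "b": 1, "c": 1},
}

# Documents after a new feature "c" was appended to each one.
documents = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 3},
]

# The old project still sees exactly the fields it always had.
v1_rows = [apply_projection(doc, PROJECTIONS["model_v1"]) for doc in documents]
```

With PyMongo the same dictionary would be passed as the projection argument, roughly `collection.find({}, PROJECTIONS["model_v1"])`. MongoDB also returns `_id` unless the projection sets `"_id": 0`.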

past meteor May 17, 2023, 8:38 PM

#

I see - this is quite interesting and quite close to what I want indeed! Adding fields to Mongo is also way less of a hassle compared to SQL.

somber pollen May 17, 2023, 8:39 PM

#

Yeah Mongo is kinda designed for web applications where you similarly have to constantly update your schema

#

(boss wants you to add a new button toggle, etc)

#

I am biased tho bc I worked there haha

past meteor May 17, 2023, 8:39 PM

#

I'd have to check with the rest of my team if they're willing to move what we have back to Mongo (this is our upstream DB, together with minio) but you've got me sold

somber pollen May 17, 2023, 8:40 PM

#

past meteor I'd have to check with the rest of my team if they're willing to move what we ha...

yeah definitely think about it especially depending on exactly what the types of the features are

past meteor May 17, 2023, 8:40 PM

#

In principle the barrier is low because we already used it, but we have SQL-lovers so it might be a tough sell

somber pollen May 17, 2023, 8:40 PM

#

that's fair--you can also use the PyMongoArrow package to kinda type-check your db

#

Ooh wait one other thing you could try is I think a bunch of SQL databases now support a JSON field

#

you could just store all your features in one column

#

or a few of them, but regardless, less churn on the db

past meteor May 17, 2023, 9:25 PM

#

Yeah, our most downstream DB is postgres so I could have a bunch of the features in a JSON column

#

Your suggestions make a lot of sense, I don't know why I didn't think about it 😄
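A small sketch of the "all features in one JSON column" idea. It uses the standard-library `sqlite3` module with the JSON stored as text, purely so the example is self-contained; in Postgres the column would be a `json` or `jsonb` type instead. Table and column names are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, features TEXT)")

# New features go into the JSON payload, so the table schema never changes.
rows = [
    (1, json.dumps({"a": 0.5, "b": 1.0})),
    (2, json.dumps({"a": 0.7, "b": 2.0, "c": 3.0})),  # "c" was added later
]
conn.executemany("INSERT INTO samples (id, features) VALUES (?, ?)", rows)

def load_features(conn, wanted):
    """Read every row and keep only the requested feature names (None if missing)."""
    result = {}
    for row_id, payload in conn.execute("SELECT id, features FROM samples ORDER BY id"):
        features = json.loads(payload)
        result[row_id] = {name: features.get(name) for name in wanted}
    return result

loaded = load_features(conn, ["a", "c"])
```

Older rows simply lack the newer keys, so readers have to decide how to treat missing features (here they come back as `None`).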

night kernel May 18, 2023, 1:42 AM

#

hey guys. i have a new question that i believe to be more relevant than my last. as i mentioned i have a question generator that takes texts and can create quizzes for me.

i want to be able to transcribe youtube videos and then take that text and put it automatically in the question generation ai

#

so im basically wondering if anyone knows about video transcription ai, how hard it is to make, etc

earnest widget May 18, 2023, 6:15 AM

#

Is there any specific model format which is preferred on handheld devices which is memory efficient? .pth, tflite or .onnx?

night prawn May 18, 2023, 9:59 AM

#

do you have a solution?

proven kelp May 18, 2023, 10:58 AM

#

Hello!
I was doing some research on denoising hyperspectral images using RNNs.
I saw a GitHub repo that's doing the same thing, but I'm not that experienced with MATLAB.
I've downloaded all the prerequisite libraries and set up the coding environment. I just need a head start in understanding how the code is implemented and sequenced.
Could anyone please help me?
I'll DM you the repo link!

proven kelp May 18, 2023, 11:58 AM

#

Here is the repo link:
https://github.com/Vandermode/QRNN3D

GitHub

GitHub - Vandermode/QRNN3D: 3D Quasi-Recurrent Neural Network for H...

3D Quasi-Recurrent Neural Network for Hyperspectral Image Denoising (TNNLS 2020)

night prawn May 18, 2023, 12:52 PM

#

how to use the GPU with TensorFlow on WSL?

earnest widget May 18, 2023, 1:55 PM

#

Isn't this strange that the model gets 100% training accuracy in the first epoch?

#

With a pretrained model.

tired cargo May 18, 2023, 2:35 PM

#

Can anyone tell me what algorithm would be best to train an RNN algorithmic trading system ?

pastel tiger May 18, 2023, 3:08 PM

#

earnest widget With a pretrained model.

Lmao

mint palm May 18, 2023, 4:09 PM

#

to convert a video embedding (batch, frames, embed_size), aka a "sequential embedding", into a single embedding (batch, embed_size), aka a "non-sequential embedding",
is this correct?

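import torch
from torch import nn

# Config is the poster's own settings object; it only needs an embed_dim attribute here.
class TransformerEncoderWithCLS(nn.Module):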
    def __init__(self, config: Config):
        super().__init__()
        self.embed_dim = config.embed_dim
        self.num_layers = 2
        self.num_heads = 16
        self.feedforward_dim = 2048
        self.cls_token = nn.Parameter(torch.randn(1, 1, self.embed_dim))
        self.transformer_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(self.embed_dim, self.num_heads, self.feedforward_dim, dropout=0.3), self.num_layers)

    def forward(self, video_embed_sequential):
        # input batch_size x frames x embed_size
        
        # Add the cls token to the input sequence
        # batch_size x frames+1 x embed_size
        CLS_video_sequential_embed = torch.cat((self.cls_token.repeat(video_embed_sequential.shape[0], 1, 1), video_embed_sequential), dim=1)
        
        # frames+1 x batch_size x embed_size
        CLS_video_sequential_embed = CLS_video_sequential_embed.permute(1, 0, 2)
        
        # Pass the input through the transformer encoder
        # frames+1 x batch_size x embed_size
        output = self.transformer_encoder(CLS_video_sequential_embed)
        
        # Extract the embedding corresponding to the cls token
        # batch_size x embed_size
        non_seq_video_embed = output[0, :, :]
        
        return non_seq_video_embed

hardy depot May 18, 2023, 4:39 PM

#

guys im tryina use dlib for image classification and i need help, like i made the code to get facial landmarks, but i need to incorporate it into a model, first the image should pass through this dlib face landmark program and then use that image which has the landmarks on it to be passed into the model

#

if anyone's kind enough, could u maybe dm me? i'll share the programs i have

long zephyr May 18, 2023, 5:09 PM

#

Hello guys, I might need some help 😦

I have this model which takes 60 samples and makes 3 predictions per batch as follows:

X_train = []
y_train = []
for i in range(60, training_set.shape[0] - PREDICT_PERIOD):
    X_train.append(scaled_training_set[i-60:i, 0])
    y_train.append(scaled_training_set[i:i+PREDICT_PERIOD, 0])
X_train = np.array(X_train)
y_train = np.array(y_train)

print(X_train.shape) # (1047, 60)
print(y_train.shape) # (1047, 3)

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1)) # (1047, 60, 1)

(Here I have my training set construction, where PREDICT_PERIOD is 3.)

Then i have my regressor defined like

regressor = Sequential()
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))
regressor.add(Dense(units=1))

regressor.compile(optimizer='adam', loss='mean_squared_error')
regressor.fit(X_train, y_train, epochs=100, batch_size=32)

But when I make a prediction:

print(X_test.shape) # (1, 60, 1)
y_test = regressor.predict(X_test)
print(y_test.shape) # (1, 1)
y_test = scaler.inverse_transform(y_test)
print(y_test.shape) # (1, 1)

And I am expecting my shape to be (1, 3) instead

thorn swift May 18, 2023, 6:35 PM

#

long zephyr Hello guys, I might need some help 😦 I have this model which takes 60 samples ...

Change regressor.add(Dense(units=1)) to regressor.add(Dense(units=3)), so the output layer has one unit per predicted step.
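To see why the last layer needs three units, here is a plain-Python version of the same windowing on a made-up series. `WINDOW` and the stand-in data are assumptions; only the 60-in / 3-out shape comes from the question.

```python
WINDOW = 60          # input samples per example, as in the question
PREDICT_PERIOD = 3   # values predicted per example, as in the question

# Hypothetical stand-in for the scaled training series.
series = [i / 100 for i in range(200)]

X, y = [], []
for i in range(WINDOW, len(series) - PREDICT_PERIOD):
    X.append(series[i - WINDOW:i])          # the 60 values before step i
    y.append(series[i:i + PREDICT_PERIOD])  # the 3 values from step i onward

# Each target row holds PREDICT_PERIOD numbers, so the final Dense layer
# needs units=PREDICT_PERIOD for predictions to have shape (batch, 3).
n_examples, x_width, y_width = len(X), len(X[0]), len(y[0])
```

The output width of the network is set by the last layer, not by the shape of `y_train`, so the two have to be made to agree by hand.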

mighty orchid May 18, 2023, 6:43 PM

#

is there a way to parallelize a UDF in polars? i have about 2.2 million strings that i want to run through an NLTK lemmatizer

dreamy phoenix May 18, 2023, 7:07 PM

#

https://gyazo.com/c9e648f3acc269664a45aaedca5368bb
Is this normal?

Gyazo

hardy depot May 18, 2023, 8:31 PM

#

hardy depot guys im tryina use dlib for image classification and i need help, like i made th...

hey?

subtle knot May 18, 2023, 9:09 PM

#

If I have the writing score and parents education for a number of students in a dataset, then is doing this correct? How does it decide which value of the writing score to take for each value of the parent education?

rn_image_picker_lib_temp_f88a9177-f1da-473b-9695-29ae8b975ccc.jpg

agile cobalt May 18, 2023, 9:15 PM

#

subtle knot If I have the writing score and parents education for a number of students in a ...

seems about right

> how does it decide what value to take
always check the documentation when you have questions about how a library works. For example, in that function's documentation https://seaborn.pydata.org/generated/seaborn.barplot.html you can see that it lets you specify parameters like
estimator='mean', errorbar=('ci', 95)

#

it is also usually worth checking the tutorials / user guides / quickstart / the like if the library provides one, for example https://seaborn.pydata.org/tutorial.html
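In other words, with `estimator='mean'` each bar is the mean of all the values in that category. A plain-Python sketch of that aggregation, with made-up rows and category names:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (parent_education, writing_score).
rows = [
    ("high school", 60), ("high school", 70),
    ("bachelor's", 75), ("bachelor's", 85), ("bachelor's", 80),
]

# Group the scores by category.
scores_by_level = defaultdict(list)
for level, score in rows:
    scores_by_level[level].append(score)

# One bar per category, with height equal to the mean of that category's scores.
bar_heights = {level: mean(scores) for level, scores in scores_by_level.items()}
```

This only reproduces the bar heights; the error bars (the `errorbar` parameter) are a separate calculation.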

subtle knot May 18, 2023, 9:20 PM

#

agile cobalt seems about right > how does it decide what value to take always check the docum...

Oh thanks I understand it now 👍

severe dune May 18, 2023, 10:30 PM

#

Anybody know a good website for Datasets?

lapis sequoia May 18, 2023, 10:39 PM

#

hey, im trying to make a OCR solver so i asked chatgpt about how to make one he told me some things (i dont remember the names of what i need to do), but the first thing he told me was to make a script that recolors the OCR image so the background is white and the text is shown in black. that's what i did and it's working, but the problem is i have some straight lines in the image that i can't remove

#

so please someone help me with this

delicate apex May 18, 2023, 10:43 PM

#

that looks more like a captcha solver, which is against most place's ToS

lapis sequoia May 18, 2023, 10:47 PM

#

delicate apex that looks more like a captcha solver, which is against most place's ToS

yeah thats what im trying to do but its not something popular i just want to upgrade my skills by doing this

#

i tried a lot of ways, nothing works

#

can u help me with that ?

serene scaffold May 18, 2023, 10:48 PM

#

lapis sequoia yeah thats what im trying to do but its not something popular i just want to upg...

Thank you for stating your intentions explicitly. This is something that is clearly against the rules you agreed to when you joined this server.

lapis sequoia May 18, 2023, 10:49 PM

#

serene scaffold Thank you for stating your intentions explicitly. This is something that is clea...

i just want to upgrade my skills by doing this

serene scaffold May 18, 2023, 10:49 PM

#

There are other ways to upgrade your skills.

lapis sequoia May 18, 2023, 10:49 PM

#

serene scaffold There are other ways to upgrade your skills.

🙂

#

but this is not something popular

serene scaffold May 18, 2023, 10:49 PM

#

If someone doesn't want bots interacting with their website, that's their prerogative

lapis sequoia May 18, 2023, 10:50 PM

#

my friend has an api that generate this

serene scaffold May 18, 2023, 10:50 PM

#

If you keep asking about this, you'll be kicked out.

severe dune May 18, 2023, 10:50 PM

#

Giving AI the ability to pass Human captchas

#

is big yikes

lapis sequoia May 18, 2023, 10:50 PM

#

severe dune is big yikes

🙂

serene scaffold May 18, 2023, 10:51 PM

#

OCR is a fine thing to learn about. But not in the context of bypassing captchas.

lapis sequoia May 18, 2023, 10:52 PM

#

serene scaffold OCR is a fine thing to learn about. But not in the context of bypassing captchas...

oh no

#

im not trying to make a bypasser 😂

#

im just trying to detect this image only

serene scaffold May 18, 2023, 10:52 PM

#

You already said that you are.

lapis sequoia May 18, 2023, 10:52 PM

#

serene scaffold You already said that you are.

i mean by that image

#

not all the images

#

this is the first image before i did what i said

#

lapis sequoia May 18, 2023, 10:53 PM

#

lapis sequoia hey, im trying to make a OCR solver so i asked chatgpt about how to make one he ...

now its like this 🙂

#

and i want to remove those straight lines

#

and detect the numbers

#

i dont know how to do all of this btw i just wanna begin learning AI

#

@serene scaffold can u help me please 🙂

serene scaffold May 18, 2023, 10:56 PM

#

I'm busy rn. At a graduation actually. That's why I'm on my phone

lapis sequoia May 18, 2023, 10:56 PM

#

serene scaffold I'm busy rn. At a graduation actually. That's why I'm on my phone

okey np goodluck 🙂

#

@delicate apex can u help me with that i saw u reacting

delicate apex May 18, 2023, 10:59 PM

#

i'm not helping with a captcha solver, and i have no experience with OCR

hasty mountain May 18, 2023, 11:18 PM

#

To make an OCR algorithm, you could try something around object detection. Or image segmentation.

#

I don't know how to make one either, I gave up trying before. But I'd try something around that. Detect objects that are characters and then try to make the model understand which characters they are

#

Applying thresholding to remark those objects may help greatly

steep echo May 19, 2023, 12:49 AM

#

Hey guys, I'm still having trouble with this thread if anyone is available to help. I think I've made progress but I'm still not sure whats going on

violet monolith May 19, 2023, 1:44 AM

#

@hasty mountain hello

hasty mountain May 19, 2023, 1:45 AM

#

joe_salute

violet monolith May 19, 2023, 1:45 AM

#

I want to discuss your project

#

I am a professional ocr developer

hasty mountain May 19, 2023, 1:46 AM

#

Nah. I didn't make it at all. It was just a model to try to extract scores from a game

#

For Reinforcement Learning

#

But then I just figured out that it was better to just make a model to directly receive as input a screenshot with the said score and output a reward

violet monolith May 19, 2023, 1:46 AM

#

yes. you can upload the game screen as example

#

yes. please show the screenshot as input

hasty mountain May 19, 2023, 1:47 AM

#

Why?

violet monolith May 19, 2023, 1:47 AM

#

I can extract the score

dusty bay May 19, 2023, 2:21 AM

#

anyone knows how to recognize header position and extract header info. on pandas?

dusk aurora May 19, 2023, 2:59 AM

#

dusty bay anyone knows how to recognize header position and extract header info. on pandas...

Are you referring to the column names?

dusty bay May 19, 2023, 3:01 AM

#

dusk aurora Are you referring to the column names?

No, column names can change

lapis sequoia May 19, 2023, 6:28 AM

#

I want to practise my data analysis skills and become better in pandas. Does anyone know a project I could do or where to find small projects that I can do? I would love to do something with physics or companies.

odd meteor May 19, 2023, 6:32 AM

#

lapis sequoia I want to practise my data analysis skills and become better in pandas. Does any...

You can check https://Kaggle.com/learn you'll find resources to practice Pandas therein

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

lapis sequoia May 19, 2023, 6:35 AM

#

odd meteor You can check https://Kaggle.com/learn you'll find resources to practice Pandas...

Thanks! However, I already exhausted Kaggles learning courses. Do you mean there are other sources I could use on Kaggle that allow me to do some research about it?

odd meteor May 19, 2023, 6:40 AM

#

dusty bay anyone knows how to recognize header position and extract header info. on pandas...

The header is usually the first row of the file, which pandas reads in as the column names. There are several ways to extract the header / columns.

https://stackoverflow.com/questions/19482970/get-a-list-from-pandas-dataframe-column-headers

Stack Overflow

Get a list from Pandas DataFrame column headers

I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called.
For example...

past meteor May 19, 2023, 6:42 AM

#

lapis sequoia Thanks! However, I already exhausted Kaggles learning courses. Do you mean there...

Doing kaggle competitions is a fast way to upskill on the practical side because you can make a notebook first (without looking at what others did) and then look at their solutions to see how they solved it conceptually

#

Do note that a lot of code on Kaggle isn't great imo

dusty bay May 19, 2023, 6:44 AM

#

What if one CSV file has many headers, for example like this:

name;age
robert;19
steven;25
.
.
name;country
andi;indonesia
albert;singapore
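One possible way to handle a file like this is to split it into separate tables wherever a header row shows up. The sketch below assumes every header row starts with the field `name`, which is only a guess from the example; the function name `split_tables` is made up.

```python
import csv
import io

raw = """name;age
robert;19
steven;25
name;country
andi;indonesia
albert;singapore
"""

def split_tables(text, header_marker="name"):
    """Split one CSV text into tables, starting a new table at every header row."""
    tables = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        if row[0] == header_marker:
            # A header row opens a new table
            tables.append({"header": row, "rows": []})
        elif tables:
            # A data row belongs to the most recent header
            tables[-1]["rows"].append(dict(zip(tables[-1]["header"], row)))
    return tables

tables = split_tables(raw)
```

Each table's `rows` list can then be passed to `pd.DataFrame(...)` to get one dataframe per header block.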

odd meteor May 19, 2023, 6:45 AM

#

lapis sequoia Thanks! However, I already exhausted Kaggles learning courses. Do you mean there...

Yes.

1) If you've exhausted the practice exercise on Kaggle on Pandas, you might wanna try picking a dataset, study the columns in the dataset, then write down 10 different things you'd want to explore on the said data with Pandas.
2) You could go to the notebooks section and check other people's code to see how they used pandas to manipulate their dataset. (You'll learn a lot more from this.)
3) Alternatively, you can use YouTube or ask ChatGPT to generate Pandas questions for you to practice.

lapis sequoia May 19, 2023, 6:49 AM

#

past meteor Doing kaggle competitions is a fast way to upskill on the practical side because...

I am happy with what Kaggle did and I agree with you that doing Kaggle competitions helps build skills. However, I want something more. 🙂

lapis sequoia May 19, 2023, 6:50 AM

#

odd meteor Yes. 1) If you've exhausted the practice exercise on Kaggle on Pandas, you mig...

Thank you for these tips! I will do that

past meteor May 19, 2023, 6:50 AM

#

lapis sequoia I am happy with what Kaggle did and I agree with you that doing Kaggle competiti...

What I did years ago was export all of my own social media data (Fb etc.), wrangle it and visualise it

lapis sequoia May 19, 2023, 7:07 AM

#

past meteor What I did years ago was export all of my own social media data (Fb etc.), wrang...

That sounds interesting! What kind of interesting things were you trying to discover about your own behaviour on social media?

past meteor May 19, 2023, 7:10 AM

#

I was interested in general usage patterns: what days was I sending a lot of messages, what hours, how did the evolution look through time, in what languages (very basic NLP), to what gender (a very basic ML model that classified someone's gender based on their name), ...

#

This was my first Python project / the first project where I did basic ML outside of the classroom afaik. Good stuff.

spark nimbus May 19, 2023, 8:00 AM

#

what's the best way to optimize large dataframe merges with pandas? I have to merge a table of only a few hundred thousand rows (monthly records) onto a table with daily records since ~1980, and timing the merge it takes ~1 hour to do. It doesn't seem bottlenecked by memory usage, so it's really just the slow performance. Is there any way to make this faster in any way?

past meteor May 19, 2023, 8:18 AM

#

spark nimbus what's the best way to optimize large dataframe merges with pandas? I have to me...

Run it in polars instead of pandas.

spark nimbus May 19, 2023, 8:19 AM

#

past meteor Run it in polars instead of pandas.

unfortunately our codebase is too big to migrate to another framework (~300k LOC), but if you have other suggestions for optimizing merges they're more than welcome :)

past meteor May 19, 2023, 8:21 AM

#

spark nimbus unfortunately our codebase is too big to migrate to another framework (~300k LOC...

If you're running pandas 2.0 you can use Arrow-backed dtypes, so you can keep all of your code in pandas and just do a few of the performance-critical steps in Polars; no need to refactor all 300K LOC

#

Otherwise, merging by index is faster than by columns but I'm sure you know that already

spark nimbus May 19, 2023, 8:25 AM

#

unfortunately their indices don't line up nicely, as the record dates are part of the columns we merge on :(
I could filter it by comparing to its MonthEnd, but that's an incredibly expensive operation.
As for pandas version, we're currently on 1.5.3
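One way to get an index-based merge even when the dates don't line up is to derive a shared month key on both tables first. This is only a sketch under assumptions: the column names (`id`, `date`, `month_end`, `rate`) and the helper `month_key` are invented, and whether it is actually faster on the real data would need to be timed.

```python
import pandas as pd

# Hypothetical daily and monthly tables.
daily = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "date": pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01", "2023-01-15"]),
    "value": [10.0, 11.0, 12.0, 20.0],
})
monthly = pd.DataFrame({
    "id": [1, 1, 2],
    "month_end": pd.to_datetime(["2023-01-31", "2023-02-28", "2023-01-31"]),
    "rate": [0.1, 0.2, 0.3],
})

def month_key(dates):
    """Integer key such as 202301; plain integer arithmetic, no MonthEnd offsets."""
    return dates.dt.year * 100 + dates.dt.month

daily["month_key"] = month_key(daily["date"])
monthly["month_key"] = month_key(monthly["month_end"])

# Index the smaller monthly table once, then join the daily rows onto that index.
monthly_indexed = monthly.set_index(["id", "month_key"])[["rate"]]
merged = daily.join(monthly_indexed, on=["id", "month_key"], how="left")
```

The key is computed with vectorized integer arithmetic, so it avoids applying date offsets row by row.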

past meteor May 19, 2023, 8:28 AM

#

spark nimbus unfortunately their indices don't line up nicely, as the record dates are part o...

DuckDB could work for you as well if you want better performance and are comfortable writing SQL on top of your DF. https://duckdb.org/2021/05/14/sql-on-pandas.html

DuckDB

Efficient SQL on Pandas with DuckDB

TLDR: DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames. Recently, an article was published advocating for using SQL for Data Analysis. Here at team DuckDB, we are huge fans of SQL. It is a versatile and flexible language that allows the user to efficiently perform a w...

#

From personal experience scaling pandas can be a bit of a pain and so are modin, dask etc. These are 2 great ways to easily get way way better performance

spark nimbus May 19, 2023, 8:33 AM

#

I see. We're currently trying to move away from SQL and SAS and the like, but I'll see if I can get my superiors to look into these in case the performance becomes necessary :)

gloomy saddle May 19, 2023, 8:33 AM

#

past meteor From personal experience scaling pandas can be a bit of a pain and so are modin,...

able to give some insight on this? about to scale up from working with millions of rows to near a billion, and needing to think about this kind of stuff

past meteor May 19, 2023, 8:35 AM

#

spark nimbus I see. We're currently trying to move away from SQL and SAS and the like, but I'...

I can understand moving away from SAS but I think gracefully combining SQL and Python is a great way to go (that's what I do)

past meteor May 19, 2023, 8:40 AM

#

gloomy saddle able to give some insight on this? about to scale up from working with millions ...

For me the main draw of Polars, Spark, DuckDB, SQL, ... is that you write a sequence of operations that are then shoved through a query rewriter and optimizer

#

That is not the case with Pandas, if you're running 5 operations they'll be done sequentially even though there may be a more optimal way of arranging them and exploiting parallelism

gloomy saddle May 19, 2023, 8:42 AM

#

I've already been looking at sqlite as most of its going to be pretty flat and simple queries. Instead of having a dataframe per group, I'm trying to have all the groups in 1 place

spark nimbus May 19, 2023, 8:42 AM

#

I see. Does DuckDB support SQL's `CREATE FUNCTION` as well? Our code is very complex, to the point where most things can't be written as proper SQL queries while remaining readable

past meteor May 19, 2023, 8:43 AM

#

gloomy saddle I've already been looking at sqlite as most of its going to be pretty flat and s...

DuckDB is more or less sqlite (an embedded, in-process DB) but it's made for this kind of thing (analytical queries)

past meteor May 19, 2023, 8:44 AM

#

spark nimbus I see. Does DuckDB support SQL's `CREATE FUNCTION` as well? Our code is very com...

There's macros but yeah, I agree with that being a drawback of pure SQL.

#

I think you could keep writing everything in Pandas and then you just swap out the places that are bottlenecks with either polars or DuckDB tbh.

#

Example: I inherited a codebase that heavily used SQLAlchemy (ORM) + Pandas. The code was very clean but just didn't scale. They'd use loops that emit tons of queries at the ORM level and then turn a list of dicts (the query results) into a pandas dataframe. I can't say exactly what I do but imagine it's predictive maintenance on washing machines. We installed a bunch of washing machines over time in a bunch of homes, to the point that the total memory of the list[dict] we had (before it goes out of scope in a function) plus the creation of a pd.DataFrame made us go OOM. Even before we went OOM this pipeline was taking 1-2 hours. Yes, I could have made it faster by loading more and not doing 1-by-1 queries in the ORM but then I'd have been OOM even faster 💀.

I removed all the SQLAlchemy + Pandas and added proper indexes on the DB. I also do a bit more filtering before I load the data into Python. Now with Polars this pipeline is taking ~15 seconds.

#

Doing everything in SQL might've scaled even better but, just like yours, the code is imo too complex to do in SQL and it'd be spaghetti. We need to resample a lot of the sensors we're measuring etc etc. If not for that I'd just have done it with dbt (https://www.getdbt.com/product/what-is-dbt/) + DuckDB (https://duckdb.org/why_duckdb) and dagster/airflow (https://dagster.io/platform)

round parrot May 19, 2023, 11:26 AM

#

Hi guys! I have created a data viz chart before i train my machine learning model. For the categorical data i have used mapping to change the values into integers, is this a good way to visualize and understand my data or is there something better i could do?

wooden sail May 19, 2023, 11:32 AM

#

round parrot Hi guys! I have created a data viz chart before i train my machine learning mode...

this looks very nice, first of all. very clean. i think the mapping to integers is fine, but here you could do a couple of extra things. you can change the x ticks so that there are only integers (right now you have stuff like married status 0.25 which doesn't make much sense) and you can also remove the smooth curve you put on top. i think the bar chart is enough for categorical data. that's just my take though

tall tulip May 19, 2023, 11:37 AM

#

Hello everyone, I'm working on a dataset that contains 3 months of time series data at 5-minute intervals, with daily and 12-day seasonality. I want to train a model that predicts the next week. I've tried different ARIMA and SARIMAX models and am now using an LSTM, but I don't know how to make my model better. Any help would be appreciated.

somber pollen May 19, 2023, 11:52 AM

#

wooden sail this looks very nice, first of all. very clean. i think the mapping to integers ...

or maybe only keep the smooth curve for the ones that have a lot of bins? idk it's somewhat useful there

wooden sail May 19, 2023, 11:53 AM

#

somber pollen or maybe only keep the smooth curve for the ones that have a lot of bins? idk it...

maybe that works after removing the extra ticks, sure

somber pollen May 19, 2023, 11:53 AM

#

wooden sail maybe that works after removing the extra ticks, sure

oh yeah my comment was an addendum

prime prawn May 19, 2023, 11:56 AM

#

hello! somebody here? i need a little bit help with something

#

guys, i have a small problem here and i can't find the answer... i have a df with 'Order Date' and i want a new column with the months, so i'm trying to use `df["Month"] = df["Order Date"].dt.month`.. but it gives me float type and i can't convert them into int

#

i already cleaned the data of any NaN values, and when i try to convert to integer, it throws this error: IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
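
One likely cause, assuming the 'Order Date' column still holds missing or unparseable dates after parsing: `.dt.month` yields NaN for those rows, which forces a float dtype and makes a plain `astype(int)` fail. A minimal sketch with made-up data, showing two possible ways out (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical data: the last entry cannot be parsed and becomes NaT.
df = pd.DataFrame(
    {"Order Date": ["2019-04-19 08:46", "2019-12-30 00:01", "not a date"]}
)
df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")

# Option 1: keep every row and use pandas' nullable integer dtype.
df["Month"] = df["Order Date"].dt.month.astype("Int64")

# Option 2: drop rows without a valid date, then a plain int cast works.
clean = df.dropna(subset=["Order Date"]).copy()
clean["Month"] = clean["Order Date"].dt.month.astype(int)

print(df["Month"].tolist())
print(clean["Month"].tolist())
```

Checking `df["Order Date"].isna().sum()` first shows whether missing dates are really the culprit.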

sullen kernel May 19, 2023, 12:33 PM

#

I'm doing a machine learning project for school,
i found the "best k", i think it's called that, and i think I need to run the test again, but with the best k this time. how do i do that?

round parrot May 19, 2023, 12:43 PM

#

wooden sail this looks very nice, first of all. very clean. i think the mapping to integers ...

Omg thanks so much for the input, im new at this so im not sure what's best, Ill do that as i think its a good idea Thanks a lot!

sharp vale May 19, 2023, 12:59 PM

#

ahh I see probably better to make a thread for my question cause there's probably many ways to do it, 1sec

odd meteor May 19, 2023, 4:11 PM

#

sullen kernel I'm doing a machine learning project from school, i found the "best k" i think i...

It's probably a clustering task where you need to use KMeans to figure out the right number of clusters (K) for your dataset.

lapis sequoia May 19, 2023, 6:53 PM

#

nvm

mild dirge May 19, 2023, 6:53 PM

#

Only one training/test cycle?

lapis sequoia May 19, 2023, 6:53 PM

#

It was CV

#

5 fold

past meteor May 19, 2023, 6:54 PM

#

Small tip: when hyper parameter tuning random forest you could just tune the cost complexity hyperparameter

lapis sequoia May 19, 2023, 6:54 PM

#

Oh shit

#

It's 0 by default

#

Lol

past meteor May 19, 2023, 6:54 PM

#

I'd tune only that one and call it day

lapis sequoia May 19, 2023, 6:54 PM

#

I submitted the last assignment with 0 fold

#

haha

#

i mean 1

lapis sequoia May 19, 2023, 6:54 PM

#

past meteor Small tip: when hyper parameter tuning random forest you could just tune the cos...

Oh what's that

past meteor May 19, 2023, 6:55 PM

#

Too much effort to tune all the other ones tbh. The ccp_alpha parameter has the biggest impact.

lapis sequoia May 19, 2023, 6:55 PM

#

where is it even

past meteor May 19, 2023, 6:55 PM

#

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html CTRL+F "ccp_alpha" and also read this page https://scikit-learn.org/stable/modules/tree.html#minimal-cost-complexity-pruning

lapis sequoia May 19, 2023, 6:55 PM

#

I have never seen it

#

oo

#

SO i like have gigantic loop

#

Just doing ccp_alpha

#

nice

#

I think the default param in GridSearchCV used to be 5 fold, because that's what I saw in the video I learnt it from

#

Then I never checked my own function

#

Rip

past meteor May 19, 2023, 6:58 PM

#

Yeah, just give ccp_alpha as the only parameter of grid search tbh

lapis sequoia May 19, 2023, 6:59 PM

#

nice bro

#

we should be besties

#

Zestar and kolv

#

How sweet

#

It did that again though. Decreased the accuracy

past meteor May 19, 2023, 7:02 PM

#

It could be that your default is not inside the parameter grid
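
A minimal sketch of a grid search over `ccp_alpha` alone, assuming scikit-learn and synthetic stand-in data (the alpha values are arbitrary picks, not recommendations). The grid includes 0.0, the default that means no pruning, so the search can always fall back to the untuned behaviour; a grid without it can only return a pruned model, which is one way the tuned score could end up lower:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Include 0.0 (the default) so "no pruning" is one of the candidates.
param_grid = {"ccp_alpha": [0.0, 0.001, 0.005, 0.01, 0.05]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,  # set explicitly rather than relying on the default
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Passing `cv` explicitly also avoids any doubt about how many folds were actually used.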

lapis sequoia May 19, 2023, 7:03 PM

#

what does that mean

#

btw guys, I had to do some feature selection as a task in my assignment. I selected some features. But the first 2 algos I will be using will be random forests and a neural net, both of which I think can select their own features? So can I just use the feature selection in logistic regression?

#

Or using the feature selection will affect the other 2 algos as well

timid grove May 19, 2023, 7:50 PM

#

Hey folks,
I want to make a conversational chatbot i am facing an issue , please help !

I used 'alpaca-gpt4-data' (https://huggingface.co/datasets/c-s-ale/alpaca-gpt4-data/viewer/c-s-ale--alpaca-gpt4-data/train?p=0) used a "bert-base-cased" tokenizer which formatted my data as
[CLS] ques [SEP] answer [SEP]
eg: [CLS] Give three tips for staying healthy. [SEP] 1. Eat a balanced and nutritious diet : Make sure your meals are inclusive..... so on [SEP]
used causal language modeling
trained it for around 800 epochs and got this result : step:750 train loss: 2.058868646621704, val_loss: 1.9444265365600586
AND FINALLY GOT THIS :
[CLS] write a code to print numbers from 1 to 10 [SEP] One of a strong surface owe attention : 159 : Having back within the continue to 6. * [ Hanna 15ing and ¥ our flexibility : In summary, cater seconds may didner the late can be consistent where the intently. This is equalability phenomenapar to need to focus it glucose of baseball sales All in ascent less. 3. 4. https : Regular his jungle Change found was 1960 to prevent staticprint : Here and marked horseivo stealing anywhere

🙂

I want to make this model and write a conference paper on it but my model is always predicting shit.
earlier i made an encoder-decoder arch and got the same results.
Maybe such models require training for 1000s of epochs, idk where i am going wrong now.

I want to make a bot trained on a ques and answer dataset that does not need any context part as input. please tell me what to try now, i do not want to just fine-tune those pre-trained models.

c-s-ale/alpaca-gpt4-data · Datasets at Hugging Face

odd meteor May 19, 2023, 7:51 PM

#

lapis sequoia btw guys, I had to do some feature selection as a task in my assignment. I selec...

What did you use for your feature selection?

RFE, Variance Threshold, or Correlation Coefficient or others?

If you're using RFE you can simply instantiate the 2 algorithms you want to use, and then use it to get the most significant features for each of the 2 models respectively.

So when training the models you'll only use the most significant features for each model as suggested by RFE.

timid grove May 19, 2023, 7:53 PM

#

timid grove Hey folks, I want to make a conversational chatbot i am facing an issue , pleas...

I would really appreciate if someone having experience in this domain likes to give some inputs

lapis sequoia May 19, 2023, 8:05 PM

#

odd meteor What did you use for your feature selection? RFE, Variance Threshold, or Correl...

I used an R cran algorithm called spfsr. It used a wrapper classifier as well. But I just passed an untuned decision tree in that.

agile cobalt May 19, 2023, 8:07 PM

#

timid grove Hey folks, I want to make a conversational chatbot i am facing an issue , pleas...

large language models like GPT require an absurd amount of training data and epochs
if I understood it correctly, you are trying to train one from scratch? If so, I'd recommend giving up on that - that dataset is meant to be used to fine tune existing models, not train one from scratch

#

you can even see that the dataset has "fine-tune" and "instruct-tune" tags

timid grove May 19, 2023, 8:11 PM

#

causal language modeling is used to just generate text, andrej karpathy used this model to generate Shakespeare-like text. i used the same model arch and put tags between ques and answer so that it can learn the pattern and generate text, which is basically the answers.

timid grove May 19, 2023, 8:14 PM

#

agile cobalt large language models like GPT require an absurd amount of training data and epo...

i do not want to make it that robust, i just want to make it work and predict text that is somehow related to the question

agile cobalt May 19, 2023, 8:15 PM

#

timid grove causal language modeling is used to just generate text, andrej karpathy used thi...

link?

timid grove May 19, 2023, 8:16 PM

#

causal language modeling https://huggingface.co/docs/transformers/tasks/language_modeling
andrej video https://youtu.be/kCc8FmEb1nY

Causal language modeling

YouTube

Andrej Karpathy

Let's build GPT: from scratch, in code, spelled out.

We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!) . I recommend people watch the earlier makemore videos to get comfortable...

▶ Play video

agile cobalt May 19, 2023, 8:19 PM

#

you think that the model trained on Shakespeare would be able to work on anything even slightly different from what its training data is like?

#

never mind, I can see that this is not going to go anywhere - sorry for wasting our time

#

thanks for the links though

timid grove May 19, 2023, 8:21 PM

#

i used https://huggingface.co/datasets/c-s-ale/alpaca-gpt4-data/viewer/c-s-ale--alpaca-gpt4-data/train?p=0 this data, which has about 52000 ques and answers in it

c-s-ale/alpaca-gpt4-data · Datasets at Hugging Face

hasty mountain May 19, 2023, 8:22 PM

#

timid grove I would really appreciate if someone having experience in this domain likes to g...

How is the evaluation/inference mode done? Is it just like the paper says?
You pass a token to begin prediction as a target sentence and after each generation you append it to the target sentence?

timid grove May 19, 2023, 8:23 PM

#

hasty mountain How is the evaluation/inference mode done? Is it just like the paper says? You p...

yes

hasty mountain May 19, 2023, 8:23 PM

#

timid grove yes

Then that's the problem

#

Seriously... if only there were a single tutorial that talked about the bias caused by teacher forcing... sigh

#

The thing is...the model is trained receiving the target sentences (teacher forcing). This causes it to get really confused in inference mode, when it is not receiving the correct answer at all.

#

As a result, the model is trained to make proper outputs when it knows the answers.

timid grove May 19, 2023, 8:26 PM

#

its just a text generating model, in which the labels are the next word and the architecture is decoder-only, so i guess teacher forcing is not required here,

#

its not encoder-decoder based arch, its only decoder based arch.

hasty mountain May 19, 2023, 8:27 PM

#

The decoder requires the target sentences for self-attention.

timid grove May 19, 2023, 8:27 PM

#

yes

hasty mountain May 19, 2023, 8:27 PM

#

The decoder applies attention on target sentences, then on the "encoder output"(in this case, the input text)

#

So there's teacher forcing

hasty mountain May 19, 2023, 8:28 PM

#

hasty mountain As a result, the model is trained to make proper outputs when it knows the answe...

The solutions:

Scheduled sampling: During training, perform one iteration using the proper target sentences (teacher forcing). At the next iteration, use the text generated in the previous iteration as the target sentence for the decoder. For the loss, apply a weight that begins at zero for the second iteration and increases over time.

Reinforcement learning (hello ChatGPT)

timid grove May 19, 2023, 8:32 PM

#

what i have done is, i made a DECODER-ONLY type arch, which has masked attention in it. then i formed a dataset that contains ques and ans with special tokens separating them. i tokenized the full text and passed it to the model. for the model, if the input is n words, then the target is the (n+1)th word. the decoder has masked attention, which is applied on the input sentence to just predict the next word,
so its like a text generating model, i am not dealing with ques and answers separately.

#

if you have any content on teacher forcing and the encoder-decoder arch i would be very thankful if you could share it

hasty mountain May 19, 2023, 8:34 PM

#

I'm checking the architecture of the decoder in GPT

#

If the decoder is the same as in the Transformer, then it probably does have teacher forcing

#

Ok, indeed, it doesn't seem to use teacher forcing at all.
It seems that this function is done through the masks, indeed.

So what etrotta said is probably correct. You may need way more data for this.

hasty mountain May 19, 2023, 8:42 PM

#

timid grove what i have done is, i made a DECODER-ONLY type arch, which has masked attention...

OpenAI even trained GPT-2 in an unsupervised manner specifically so it could use extremely large datasets (more than hundreds of millions of words)

timid grove May 19, 2023, 8:51 PM

#

so can you people please suggest any pre-trained model on hugging face that is used for question answering and which do not need context as input , cause these masked language models need ques , context and answer as inputs but i only want to give ques as input and answer as output.

timid grove May 19, 2023, 8:52 PM

#

hasty mountain OpenAI even created GPT-2 in unsupervised manner specially to use extremely larg...

i got it.
Thanks for your time @hasty mountain @agile cobalt

hasty mountain May 19, 2023, 8:53 PM

#

timid grove so can you people please suggest any pre-trained model on hugging face that is u...

You could try using the vanilla Transformer, where the question could be the input text and the answer the target text, but then you'll need to apply scheduled sampling

#

Hugging Face probably has it with pretrained weights

timid grove May 19, 2023, 8:55 PM

#

are you talking about this arch ?

hasty mountain May 19, 2023, 8:55 PM

#

timid grove are you talking about this arch ?

Yes

#

I thought GPT used target sentences and teacher forcing because I thought it simply removed the encoder. But then I saw in the paper that they made adaptations...

timid grove May 19, 2023, 8:57 PM

#

Thank You !

dawn fable May 19, 2023, 9:02 PM

#

A simple question: After I am done learning the standard part of Python, where do I go next?

timid grove May 19, 2023, 9:06 PM

#

If you want in-depth knowledge of ML models then go with Andrew Ng's ML course; if you are interested in the coding part then start with some simple ML projects using sklearn.

odd meteor May 19, 2023, 9:58 PM

#

dawn fable A simple question: After I am done learning the standard part of Python, where d...

1. https://Kaggle.com/learn
2. Augment with Andrew Ng's Machine Learning course on Coursera (note: this does not work for everyone. It didn't work for me when I started. So if you find yourself struggling to finish the course or dozing off while watching the videos, don't hesitate to drop it and try other resources)
3. If #2 did not quite work for you and you're interested in making a financial commitment, then try Udemy, DataQuest, DataCamp etc
4. Once you've completed #3 and are comfortable building projects, try moving to Deep Learning.
I'd recommend the https://fast.ai course

You can also check the pinned post on this channel for more resources to further your learning.

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these no-cost courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

fast.ai - fast.ai—Making neural nets uncool again

#

!resources

arctic wedgeBOT May 19, 2023, 10:07 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

night kernel May 19, 2023, 10:20 PM

#

do you think i can use OpenAI's Whisper to transcribe yt videos by fetching their addresses rather than downloading the video first

serene scaffold May 19, 2023, 10:33 PM

#

night kernel do you think i can use Open ai's Whisper ai to transcribe yt videos by fetching ...

that's probably against the youtube TOS regardless.

#

but Whisper can only transcribe audio that is made available to it, which means that the audio needs to be on the computer where Whisper is running, one way or another.

night kernel May 19, 2023, 10:43 PM

#

serene scaffold that's probably against the youtube TOS regardless.

got it. let me explain my vision further as there may be a way around the TOS.

i was hoping to create a code/ai that would fetch new videos addresses as they are posted to a channel. this would be an api for an iOS app. the text would then be generated to create questions as an edtech feature

#

i have the code for the question generation and for Whisper - my goal would come to fruition if i could combine these two on top of the fetching feature

#

this is still impossible you think?

serene scaffold May 19, 2023, 10:49 PM

#

night kernel this is still impossible you think?

you first need to establish if that's something that YouTube permits under their terms of service. and it probably isn't.

night kernel May 19, 2023, 10:57 PM

#

serene scaffold you first need to establish if that's something that YouTube permits under their...

ok. i just checked the website's ToS and asked chat gpt but couldnt find a specific answer. ill prod further by reaching out to youtube support

out of curiosity why are you inclined to believe they dont permit this - i take your word for it

serene scaffold May 19, 2023, 10:58 PM

#

night kernel ok. i just checked the website's ToS and asked chat gpt but couldnt find a speci...

ChatGPT is not omniscient. Don't ask it if something is against a website's TOS. Look for the TOS of that website.

night kernel May 19, 2023, 10:58 PM

#

serene scaffold ChatGPT is not omniscient. Don't ask it if something is against a website's TOS....

i just said i checked the websites ToS

serene scaffold May 19, 2023, 10:58 PM

#

So don't even bother asking ChatGPT

#

The following restrictions apply to your use of the Service. You are not allowed to ... access the Service using any automated means (such as robots, botnets or scrapers) except (a) in the case of public search engines, in accordance with YouTube’s robots.txt file; or (b) with YouTube’s prior written permission;

#

this would preclude what you are wanting to do.

night kernel May 19, 2023, 11:03 PM

#

serene scaffold this would preclude what you are wanting to do.

got it. what if it was from a different website, not yt, and the website's terms allowed it

serene scaffold May 19, 2023, 11:07 PM

#

night kernel got it. what if it was from a different website, not yt, and the website's term'...

you ultimately need to download the audio that you want to transcribe to the computer where Whisper is running. Otherwise, Whisper can't get to it.

night kernel May 19, 2023, 11:11 PM

#

serene scaffold you ultimately need to download the audio that you want to transcribe to the com...

got it. there are websites that can transcribe just as accurately so i wouldnt need to use whisper at that point. i wonder if a system exists for what i seek

serene scaffold May 19, 2023, 11:11 PM

#

night kernel got it. there are websites that can transcribe just as accurately so i wouldnt n...

I don't know

hybrid mango May 20, 2023, 1:41 AM

#

Hello!
What textbook and/or tutorials would you recommend to a beginner but somewhat familiar with programming for learning how to do text analysis with python? I am a graduate student and want to do text analysis on free response questions I had my students answer as feedback for a class activity. The purpose of the text analysis is to find common themes, what they liked and possible improvements

worldly dawn May 20, 2023, 2:18 AM

#

Hi!
We don't do ads

serene scaffold May 20, 2023, 3:06 AM

#

hybrid mango Hello! What textbook and/or tutorials would you recommend to a beginner but some...

You probably already know this, but for a corpus of that size, you won't learn more from trying to do some sophisticated NLP thing than you would by just reading all the responses

#

that aside, you might want to look into topic modeling.

hybrid mango May 20, 2023, 3:41 AM

#

Thanks for your response! Yeah, the total text to analyze is around 100 pages, so yes, not large at all. But I would like to at least visualize (quantitatively) a bit what the most common complaints or likes about the exercise were, to make a case (or not) for the type of exercise we are doing in class. I'll look into topic modeling, thanks!

serene scaffold May 20, 2023, 3:44 AM

#

hybrid mango Thanks for your response! Yeah, the total text to analyze is around 100 pages, s...

you could also use a sentiment analysis tool to sort stuff into positive and negative statements, and see which words are common in the positive comments that aren't common in the negative ones, and vice versa.

#

that might help you establish what the most common complaints or likes are.
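
The word-comparison idea can be sketched with the standard library alone. This assumes the responses have already been split into positive and negative groups by whatever sentiment tool is used; the sample comments and the tiny stopword list are made up for illustration:

```python
from collections import Counter
import re

# Hypothetical responses, already labelled by a sentiment tool of your choice.
positive = ["The group work was fun and engaging", "Fun activity, clear instructions"]
negative = ["The instructions were confusing", "Too little time, confusing setup"]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "was", "were", "too", "a"}

def word_counts(comments):
    """Count lowercase words across comments, ignoring the stopwords."""
    words = re.findall(r"[a-z']+", " ".join(comments).lower())
    return Counter(w for w in words if w not in STOPWORDS)

pos, neg = word_counts(positive), word_counts(negative)

# Counter subtraction keeps only words that are more frequent on the left side.
mostly_positive = (pos - neg).most_common(3)
mostly_negative = (neg - pos).most_common(3)

print(mostly_positive)
print(mostly_negative)
```

Words that appear equally often in both groups (here "instructions") drop out, which leaves the terms that distinguish likes from complaints.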

hybrid mango May 20, 2023, 3:56 AM

#

Thank you! I appreciate it, will look into sentiment analysis too

short heart May 20, 2023, 8:32 AM

#

I have a question regarding F1 score. So I have an imbalanced dataset and I checked F1 of my predictions on full dataset - it was 0.8, then I checked F1 score for both classes separately and it was 0.5 and 0.5 (even though accuracy was 0.98 on both), why is it like that and is it normal?

agile cobalt May 20, 2023, 8:43 AM

#

short heart I have a question regarding F1 score. So I have an imbalanced dataset and I chec...

what were the precision and the recall of the model in each of the cases? (full, only A, only B)

#

it sounds weird but not sure if impossible
~~even if possible, probably extremely unlikely though~~

vestal spruce May 20, 2023, 9:20 AM

#

Wait does train_test_split automatically stratify the dataset? I haven't done anything Machine Learning related, I just read the doc and still confused, so could anyone confirm my assumption?

cold osprey May 20, 2023, 9:34 AM

#

vestal spruce Wait does train_test_split automatically stratify the dataset? I haven't done an...

no, theres a stratify parameter

vestal spruce May 20, 2023, 9:34 AM

#

cold osprey no, theres a stratify parameter

aight so I have to specify it right? got it. thanks Shimmer ^^

past meteor May 20, 2023, 10:03 AM

#

vestal spruce Wait does train_test_split automatically stratify the dataset? I haven't done an...

It tries to make sure your test and train set have approx. the same distribution in the case of classification

vestal spruce May 20, 2023, 10:03 AM

#

past meteor It tries to make sure your test and train set have approx. the same distribution...

I still remember that part, I was just unsure whether the train_test_split does it automatically

#

But thanks for the reminder anw

past meteor May 20, 2023, 10:04 AM

#

It does

cold osprey May 20, 2023, 10:39 AM

#

It does??

kind moth May 20, 2023, 11:00 AM

#

Does anyone know how to use a GPU for TensorFlow? I'm using TensorFlow version 2.10, which is compatible with GPUs, and I've got CUDA and cuDNN installed, but it gives me a memory error, something like "Trying to allocate 13.7 GB" and then it fails. Anyone got any ideas?

#

I've got a GTX 1660 Ti which has 6 GB

tidal bough May 20, 2023, 11:02 AM

#

Well, you can't make a 13GB-sized array on a GPU with only 6GB VRAM.

#

presumably you need to use smaller batches or reduce the size of your model or both

kind moth May 20, 2023, 11:02 AM

#

So is there a way to limit it?

kind moth May 20, 2023, 11:02 AM

#

tidal bough presumably you need to use smaller batches or reduce the size of your model or b...

Oh I am using a Batch Size of 1

#

Because ChatGPT told me to reduce the batch size as a possible fix

#

I did it that low and still the same error

past meteor May 20, 2023, 11:10 AM

#

cold osprey It does??

It doesn't, you were right! I misread it, good to know
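
A minimal sketch of the difference, assuming scikit-learn and a made-up 90/10 imbalanced label list. Stratification only happens when `stratify` is passed explicitly:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

# Without stratify, the class ratio in the test set is left to chance.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=0)

# With stratify=y, both splits keep the 90/10 ratio of the full dataset.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(Counter(y_test_plain), Counter(y_test_strat))
```

With `stratify=y` the 20 test samples always contain 18 of class 0 and 2 of class 1; without it the minority count varies with the random seed.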

lapis sequoia May 20, 2023, 1:05 PM

#

Hi. This project of mine is driving me bonkers - it's an async tool that gets product data from a number of json on various different sites. At first I was only using Pandas and had everything in memory, I've now moved to writing processed results to the DB for each domain. It still eats more and more memory with every additional site that I add to it, and I don't understand why. I've been using Memray which narrowed down the largest memory usage to the following function (specifically the 'for product in objects' loop). I moved to using ijson to parse the resulting json file instead of json, although I'm not sure there's any benefit here as streaming isn't possible since I'm not loading the file locally. Any ideas? https://pastecord.com/ecamavuvag.py

royal void May 20, 2023, 2:11 PM

#

Hello, is there someone able to tell me if my neural network is good and what should I change to make it good and look good ? I am just trying to make it learn mnist numbers

serene scaffold May 20, 2023, 3:18 PM

#

royal void Hello, is there someone able to tell me if my neural network is good and what sh...

do you know what precision and recall are, and if so, what scores did you get for the 10 digits?

royal void May 20, 2023, 3:32 PM

#

serene scaffold do you know what precision and recall are, and if so, what scores did you get fo...

I have a precision of 93% but I have no idea what recall is

#

I do not use any library, just numpy

queen cradle May 20, 2023, 3:41 PM

#

lapis sequoia Hi. This project of mine is driving me bonkers - it's an async tool that gets pr...

It looks to me like the problem is that all_products stores all the products. There must be more of those than will fit in your available memory. I think that instead of returning all_products, you should insert directly into your database (which will need to be on disk). Something like:

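import asyncio
import itertools

import httpx
import ijson

# utils, config, get_url, DBType and insert_products_into_db are not defined
# here; they stand for your project's own code.
async def get_products_from_domain(db: DBType, domain: str) -> None:
    ssl_context = utils.custom_ssl_context()

    async with httpx.AsyncClient(
        http2=True, verify=ssl_context, timeout=config.TIMEOUT
    ) as client:
        for x in itertools.count(0):
            url = f'https://{domain}/file.json?page={x}'
            try:
                result = await get_url(client, url)
            except Exception as exc:
                print(f"[!] {domain}: Error {exc}. Failed getting file{x}.json")
                config.ERRORDF.loc[len(config.ERRORDF)] = [url, exc]
                continue

            # Only one page of products is held in memory at a time.
            objects = (*ijson.items(result.text, 'products.item'),)
            if not objects:
                break

            # Write the page to the database right away instead of collecting it.
            insert_products_into_db(db, objects)
            await asyncio.sleep(2)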

royal void May 20, 2023, 3:53 PM

#

serene scaffold do you know what precision and recall are, and if so, what scores did you get fo...

https://paste.pythondiscord.com/iguvinazoj
Here is what I did, I find it weird to have 93% with an alpha (learning rate) of 3, if someone could have a look it would help me a lot !

lapis sequoia May 20, 2023, 3:58 PM

#

queen cradle It looks to me like the problem is that `all_products` stores all the products. ...

Thanks Kyle. The max size of the combined JSON files from one domain is about 15mb (~1000 products). If I'm processing data from 5 domains at a time that should be 75mb+overhead of lists etc.. My program starts off taking up about 110-120mb and fairly quickly reaches 500mb of memory. Do you think with the size of files I'm working with that is possible? Shouldn't I be seeing variable memory usage (depending on the amount of products per domain), rather than one steady increase? Also running memprofiler doesn't show me that the memory is being released after processing is done - but maybe I'm not reading it right

torn mulch May 20, 2023, 4:07 PM

#

import numpy as np
def gaus(x,y,epsilon=0.01,t=100):
    x = np.array(x)
    y = np.array(y)
    diags = np.diag(x)
    print(diags)
    np.fill_diagonal(x,0)
    print(diags)
    sum = np.sum(np.abs(x),axis = 1)
    if not np.all(np.abs(diags)>sum):
        print('Not Diagonally Dominant')
        return False
    x = -x
    matold = np.zeros(x[0].shape)
    for i in range(t):
        matnew=np.array(matold)
        for j,row in enumerate(x):
            matnew[j]=(y[j]+np.dot(row,matnew))/diags[j]
        print('iter - ',i+1,matnew)
        d=np.sqrt(np.dot(matnew-matold,matnew-matold))
        if d < epsilon:
            return True
        matold=matnew
    print("Not convergent")
    return False
Xs = [
  [
    [3, -2, -2],
    [6, 5, 4],
    [-4, 7, 2]
  ],
  [
    [15, 2, 3, 5],
    [1, 3, -1, 0],
    [1, 1, 6, 3],
    [2, 1, 4, 9]
  ],
  [
    [8, 2, -5],
    [3, 5, -1],
    [3, 2, 6]
  ],
  [
    [-8, 3, -2],
    [-2, 4, 1],
    [-2, 5, 11]
  ],
  [
    [12, -8, -2],
    [-3, 5, 1],
    [-3, 4, 8]
  ],
  [
    [-9, 3, 3, -3],
    [-4, 12, 4, -4],
    [5, 5, 15, 5],
    [-6, 6, -6, 18]
  ],
  [
    [-6, 9, -2, 1],
    [-9, 5, -1, 4],
    [-3, 4, -7, 2],
    [-4, 3, 7, 3],
  ],
]

Ys=[
  [5, 4, 7],
  [3, 7, 3, 4],
  [8, 4, 1],
  [3, 3, 5],
  [9, 5, 4],
  [8, 4, 7, 4],
  [9, 6, 4, 1],
]


for i,x in enumerate (Xs):
    print(f"A: {x} Y: {Ys[i]}")
    if gaus(x,Ys[i]):
        print('Convergent')
    print("\n")

#

can anyone tell me why the diags value in the second print is different from the first one?

queen cradle May 20, 2023, 4:09 PM

#

lapis sequoia Thanks Kyle. The max size of the combined JSON files from one domain is about 15...

You'll only see variable memory usage if your code no longer has any references to the products. If you store the list of products anywhere—for example, as long as you keep a reference to all_products—then Python can't garbage collect the products.

Here's an experiment to try: In your original code, replace return all_products with return {'products':[]}. That is, do exactly the same processing as now, but you return an empty product list instead of the actual list of products. You should see greatly decreased memory usage; that indicates that the problem is not that this function uses too much memory, but that elsewhere in your program, you keep around references to the products even after you're done with them.

lapis sequoia May 20, 2023, 4:10 PM

#

queen cradle You'll only see variable memory usage if your code no longer has any references ...

Ok thanks - I'll give that a shot and report back. Here's the report from memprofiler, some very large negative numbers there but you can see that just this run increased memory usage by more than 85mb. https://pastecord.com/qagibirule.yaml

queen cradle May 20, 2023, 4:14 PM

#

lapis sequoia Ok thanks - I'll give that a shot and report back. Here's the report from mempro...

If you want a simple time optimization, you could process result.text only once. Just do something like:

starting_length = len(all_products['products'])
all_products['products'].extend(objects)
if len(all_products['products']) == starting_length:
    break

torn mulch May 20, 2023, 4:15 PM

#

@queen cradle can you help me ?

queen cradle May 20, 2023, 4:19 PM

#

torn mulch @queen cradle can you help me ?

I'm not going to try to figure out your code with nothing to go on. But if you explain what it's supposed to be doing, someone might be willing to help.

torn mulch May 20, 2023, 4:22 PM

#

oh ok, the problem is that when i do np.fill_diagonal(x,0) the diags value gets changed

#

i have already stored the diagonal values in diags, but after i do fill_diagonal(x,0) the values in diags are changed

queen cradle May 20, 2023, 4:23 PM

#

fill_diagonal is supposed to change the diagonal. See https://numpy.org/doc/stable/reference/generated/numpy.fill_diagonal.html.

torn mulch May 20, 2023, 4:24 PM

#

queen cradle `fill_diagonal` is supposed to change the diagonal. See <https://numpy.org/doc/s...

yes i know that, but i stored the diagonal values first, before filling the diagonal of x with 0

queen cradle May 20, 2023, 4:25 PM

#

No, that code doesn't store a copy. It stores a reference to the original diagonal. When you change the original, the reference still points to the changed value.

#

Try diags = np.diag(x).copy().

wooden sail May 20, 2023, 4:25 PM

#

whenever possible, numpy operations return views to avoid making copies in memory. it's always a good idea to sanity check these things because it's not always easy to tell when numpy creates views or copies

#

!e

import numpy as np
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
y = np.diag(x)
z = np.diag(x).copy()
np.fill_diagonal(x,0)
print(y)
print(z)

arctic wedgeBOT May 20, 2023, 4:26 PM

#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [0 0 0]
002 | [1 5 9]
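
A minimal sketch of the sanity check mentioned above, assuming a NumPy version (1.9 or later) in which `np.diag` on a 2-D array returns a read-only view. `np.shares_memory` reports whether two arrays overlap in memory; the variable names are only illustrative:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
view = np.diag(x)          # view onto x's diagonal (no data copied)
copy = np.diag(x).copy()   # independent snapshot of the diagonal

# Expected to be True for the view and False for the copy.
view_shares = np.shares_memory(x, view)
copy_shares = np.shares_memory(x, copy)
print(view_shares, copy_shares)

np.fill_diagonal(x, 0)
print(view.tolist())  # follows the change made to x
print(copy.tolist())  # keeps the values from before the change
```

When in doubt, calling `.copy()` is the safe choice; it costs one extra allocation.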

torn mulch May 20, 2023, 4:28 PM

#

queen cradle No, that code doesn't store a copy. It stores a reference to the original diagon...

oh ok,ty bro

torn mulch May 20, 2023, 4:29 PM

#

wooden sail whenever possible, numpy operations return views to avoid making copies in memor...

ok bro ty

lapis sequoia May 20, 2023, 4:33 PM

#

queen cradle You'll only see variable memory usage if your code no longer has any references ...

I tried this but I'm still seeing the same memory increase. This is the output of memprofiler now - https://pastecord.com/ubefuhemen.yaml - what do you think is going on? Should I be seeing Python release memory on the return call, or does that generally happen at a later point and in a snapshot like this I'm not going to be able to see if that memory actually gets released or not?

queen cradle May 20, 2023, 4:38 PM

#

lapis sequoia I tried this but I'm still seeing the same memory increase. This is the output o...

Memory is released when objects are garbage collected. In CPython, if an object's reference count drops to zero, then it's collected immediately. In this case, all_products will be collected after the end of the function, but as the output you're looking at doesn't have a line for "after the function," that won't be visible. You could insert del all_products before the return statement and then it should be visible. Or you could use a different memory profiler or a different mode of the one you're using.
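
A small sketch of that behaviour, assuming CPython's reference counting (other interpreters may collect later). `Product` and `build_products` are hypothetical stand-ins, not the code under discussion; `weakref` lets us watch objects without keeping them alive:

```python
import weakref

class Product:
    """Hypothetical stand-in for one scraped product."""

def build_products(n):
    all_products = [Product() for _ in range(n)]
    # Weak references do not count towards the reference count.
    watchers = [weakref.ref(p) for p in all_products]
    return watchers  # all_products goes out of scope here

watchers = build_products(3)
# In CPython the list and its items are freed as soon as the function
# returns, so every weak reference should already be dead.
still_alive = [w() is not None for w in watchers]
print(still_alive)
```

If something else still held a reference to the list (a global, a cache, a closure), the objects would stay alive, which is the kind of leak being hunted here.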

#

Remember, though, that my hypothesis was that the problem was not in this function; it was elsewhere, because the list of products was being kept around forever somehow. get_products_brand should have similar overall memory usage to the original get_products because it does the same processing. The real change should be that your script does not run out of memory later.

lapis sequoia May 20, 2023, 4:50 PM

#

queen cradle Remember, though, that my hypothesis was that the problem was not in this functi...

del all_products does release some memory but not everything that I would expect, although from what I gather using the memory profiler like this isn't the most precise way of seeing what's going on. I'm running the script now to see if it ends up running out of memory again - memory use is still increasing for now with every domain processed

sinful lark May 20, 2023, 5:27 PM

#

hi! so i have time-series data that looks like this, there's a logarithmic trend here, but when i exponentiate it the values become too big for modeling, what should i do?

lapis sequoia May 20, 2023, 5:28 PM

#

queen cradle Remember, though, that my hypothesis was that the problem was not in this functi...

Ended up running out of memory again - which probably means although memray detected that this function is using a lot of memory, it may not be the source of my memory leak. What do you think? If I need to be looking elsewhere, do you have any tips on how I could go about finding the leak?

queen cradle May 20, 2023, 5:42 PM

#

lapis sequoia Ended up running out of memory again - which probably means although memray dete...

I agree: If you're still running out of memory, then this function is not the problem. You'll have to look for other functions that are allocating a lot of memory. I don't have any particular advice for this except looking at a lot of memray output.
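
One way to narrow the search, sketched with the standard library's `tracemalloc` rather than memray (a swapped-in tool, not what was used above). `leaked` and `process_domain` are hypothetical names standing in for the real script; the idea is to compare two snapshots and see which lines hold more memory afterwards:

```python
import tracemalloc

leaked = []  # hypothetical module-level list that is never cleared

def process_domain(i):
    # Stand-in for real work: each call keeps ~100 kB alive by accident.
    leaked.append(bytearray(100_000))

tracemalloc.start()
before = tracemalloc.take_snapshot()
for i in range(20):
    process_domain(i)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Lines sorted by how much more memory they hold now than before.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
growth = top[0].size_diff
```

Taking a snapshot every few domains and comparing against the first one should make a steadily growing line stand out.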

lapis sequoia May 20, 2023, 5:53 PM

#

queen cradle I agree: If you're still running out of memory, then this function is not the pr...

Ok, thanks a lot for your help. I've moved closer to finding my solution and I have a more elegant function here now thanks to you 🙂

grand warren May 20, 2023, 6:02 PM

#

Hi, I have been taking the ZTM machine learning course on Udemy and I have almost completed it, so I am familiar with the basic concepts of machine learning. I want to start taking Andrew Ng's deep learning specialization course. Is it a good idea? Is there anything I should know before I start?

grand warren May 20, 2023, 7:27 PM

#

thanks

keen axle May 20, 2023, 7:57 PM

#

What is a good way for a beginner like myself to learn how to create visual simulations of physical systems like pendulums, springs etc.

#

Is this the place to ask?

#

I want to learn how to solve real world physics ODEs and translate them into meaningful visualizations

#

But i dont really know where to start

#

Some assistance would be appreciated or some pointers and tips maybe

arctic wedgeBOT May 20, 2023, 9:00 PM

#

:incoming_envelope: :ok_hand: applied timeout to @lapis sequoia until <t:1684617029:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

hasty mountain May 20, 2023, 9:59 PM

#

keen axle I want to learn how to solve real world physics ODEs and translate them into meaning...

I think the folks from #game-development or #software-architecture might know more about simulations.

wet nacelle May 20, 2023, 10:29 PM

#

Question about this video at 8:10. He is creating training data for spacy NER. He includes a sentence string, and then the start index, end index, and type of the entity. My question: when training spacy, does anyone know if we need to provide full sentences with entities interspersed within them, or can we simply provide the entity? In the latter case, you would simply provide a string that contains only the entity, and the start integer would always be 0 and end always -1.

https://www.youtube.com/watch?v=YBRF7tq1V-Q&list=PL2VXyKi-KpYs1bSnT8bfMFyGS-wMcjesM&index=5

YouTube

Python Tutorials for Digital Humanities

How to Use spaCy to Create an NER training set (Named Entity Recogn...

In this video, we continue exploring named entity recognition (NER) for the digital humanities in Python, specifically via the spaCy library. In this video, we use the EntityRuler model that we created in the last video to generate automatically a strong training set. We will use this NER training set in the next video to train a custom domain-s...

▶ Play video
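
For reference, a sketch of the annotation layout being asked about, assuming the `(text, {"entities": [(start, end, label)]})` convention with character offsets and an exclusive end. `make_example` and the sample sentence are made up for illustration. Under that convention an entity-only string would be annotated as `(0, len(text))` rather than `(0, -1)`; whether entity-only examples train a good model is a separate question this does not answer:

```python
def make_example(text, entity, label):
    """Build one annotated example; offsets are character positions."""
    start = text.find(entity)
    if start == -1:
        raise ValueError("entity not found in text")
    end = start + len(entity)  # end offset is exclusive
    return (text, {"entities": [(start, end, label)]})

# Entity inside a full sentence.
in_sentence = make_example(
    "The notes were written by Ada Lovelace.", "Ada Lovelace", "PERSON"
)
# Entity on its own: start is 0 and end is the string length.
entity_only = make_example("Ada Lovelace", "Ada Lovelace", "PERSON")

print(in_sentence)
print(entity_only)
```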

coral field May 20, 2023, 11:46 PM

#

I'm a beginner at ml/ neural networks, and I was wondering when I should call the "nn.ReLU()". my possibilities are after the first hidden layer, after the second hidden layer, after the third hidden layer, or after all of them

#

i am leaning towards after the first hidden layer, but why?

serene scaffold May 20, 2023, 11:52 PM

#

coral field I'm a beginner at ml/ neural networks, and I was wondering when I should call th...

!code

arctic wedgeBOT May 20, 2023, 11:52 PM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold May 20, 2023, 11:52 PM

#

Please do not ask people to read screenshots of text

coral field May 20, 2023, 11:52 PM

#

alr. my bad

serene scaffold May 20, 2023, 11:52 PM

#

Do you understand what ReLU is, as compared to linear?

coral field May 20, 2023, 11:53 PM

#

not really?

serene scaffold May 20, 2023, 11:53 PM

#

Do you know what an activation function is?

coral field May 20, 2023, 11:53 PM

#

yes

serene scaffold May 20, 2023, 11:53 PM

#

What is nn.Linear?

coral field May 20, 2023, 11:54 PM

#

im pretty sure it creates a layer with input & output nodes (?), but that's all i learned about it.

serene scaffold May 20, 2023, 11:54 PM

#

What about nn.ReLU?

coral field May 20, 2023, 11:55 PM

#

i believe it transforms the data using the ReLU activation function so the model can better understand it (?)

serene scaffold May 20, 2023, 11:56 PM

#

It's an activation function, yes. When do you need activation functions?

coral field May 20, 2023, 11:56 PM

#

i'm not completely sure. is it after the data has been passed in?

serene scaffold May 20, 2023, 11:57 PM

#

You need one between every linear layer

#

Having a bunch of linear transformations is the same as having one linear transformation

#

So you need a nonlinear function between each one
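
A tiny worked example of that point in plain Python, with biases left out for brevity. The matrices are arbitrary illustrative values: two stacked linear maps give the same result as their product, while putting a ReLU in between gives something a single linear map cannot reproduce in general:

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    return [max(0.0, a) for a in v]

W1 = [[1.0, -2.0], [-1.0, 1.0]]  # first "layer"
W2 = [[1.0, 1.0]]                # second "layer"
x = [1.0, 2.0]

stacked = matvec(W2, matvec(W1, x))          # layer after layer
collapsed = matvec(matmul(W2, W1), x)        # one combined layer
with_relu = matvec(W2, relu(matvec(W1, x)))  # nonlinearity in between

print(stacked, collapsed, with_relu)
```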

coral field May 20, 2023, 11:58 PM

#

why would it not be possible to only put 1 after the first layer? wouldn't the raw data already be transformed?

serene scaffold May 20, 2023, 11:59 PM

#

The point isn't to transform the raw data

#

It's to detect nonlinear relationships between features

coral field May 21, 2023, 12:00 AM

#

is there an article or page in which i can read more on this? since i can't seem to find too many good ones through a google search

serene scaffold May 21, 2023, 12:01 AM

#

By the time you're putting data into the neural network, all the data cleaning has been done. The data needs to already be suitable for the network

serene scaffold May 21, 2023, 12:01 AM

#

coral field is there an article or page in which i can read more on this? since i can't seem...

You might watch 3Blue1Brown's series about neural networks

coral field May 21, 2023, 12:01 AM

#

serene scaffold You might watch 3Blue1Brown's series about neural networks

👍 thank you so much!

serene scaffold May 21, 2023, 12:01 AM

#

And in particular, it sounds like you have not yet wrapped your head around activation functions

#

Which is fine

coral field May 21, 2023, 12:02 AM

#

yea, i've only known the very basics, not too much in depth. ill definitely check it out

restive path May 21, 2023, 2:59 AM

#

guys, for data scientists, is it very important to have knowledge of aws?

serene scaffold May 21, 2023, 3:04 AM

#

restive path guys, for data scientists, is it very important to have knowledge of aws?

Every company has their own definition of "data scientist", so it really depends. But generally speaking, no.

restive path May 21, 2023, 3:05 AM

#

serene scaffold Every company has their own definition of "data scientist", so it really depends...

In your work experience, how much have you used it?

serene scaffold May 21, 2023, 3:06 AM

#

restive path In your work experience, how much have you used it?

I often use VMs that are hosted on AWS, but I'm not in charge of setting any of that up.

restive path May 21, 2023, 3:07 AM

#

serene scaffold I often use VMs that are hosted on AWS, but I'm not in charge of setting any of ...

interesting

serene scaffold May 21, 2023, 3:09 AM

#

If you're looking to up your ops skills, it's probably more important that you can do whatever environment management you might need to do with Linux

violet gull May 21, 2023, 5:22 AM

#

how do i calculate gradients in my neural network using reinforcement learning?

#

i have a tiny neural network where one number goes in and it goes to 3 nodes and each node represents an action my frog can take

#

also im trying to design a biology simulation where frogs use AI to find food

#

but i still do not see the reason for AI to be applicable

#

if its just to find the food simple trig works and no need for millions of iterations of AI

#

where does AI do something logic statements cant

#

even in examples of training an AI to play pong, a simple 3 line code that moves the paddle up if the ball is above it and down if the ball is below it beats any AI

agile cobalt May 21, 2023, 5:57 AM

#

violet gull even in examples of training an AI to play pong, a simple 3 line code that moves...

many somewhat simple things can be solved by hand way more efficiently than by using a machine learning model, and many moderately complex things can be solved using simpler models like decision trees instead of neural networks

that said, there are definitely some cases in which it is not possible to do by hand, and that a simple model cannot work well

violet gull May 21, 2023, 5:57 AM

#

but for my frog example

#

isnt a simple logic statement better?

agile cobalt May 21, 2023, 5:58 AM

#

the details you gave are not enough to tell

violet gull May 21, 2023, 5:58 AM

#

frog is supposed to go find food

#

once a food is in its vision it is given the location of the food

#

it can either do 1 trig equation to calculate the angle of rotation it needs to go to or it can do 1 million iterations of trial and fail

agile cobalt May 21, 2023, 5:59 AM

#

sounds simple enough that you can hardcode an efficient policy

violet gull May 21, 2023, 5:59 AM

#

so when would i need AI

agile cobalt May 21, 2023, 6:00 AM

#

for things too complex to define a policy by hand?

violet gull May 21, 2023, 6:00 AM

#

like what

agile cobalt May 21, 2023, 6:00 AM

#

look up deepmind's projects if you haven't yet

violet gull May 21, 2023, 6:00 AM

#

i mean for my frog

agile cobalt May 21, 2023, 6:00 AM

#

you wouldn't?

#

you can try using reinforcement learning but if your entire input is just "distance to food if known", I don't think that there is anything it can do with that

violet gull May 21, 2023, 6:01 AM

#

why

agile cobalt May 21, 2023, 6:02 AM

#

99% of the inputs are gonna be just "unknown" and in the rest, the best action should be just "move forwards"

violet gull May 21, 2023, 6:03 AM

#

right

agile cobalt May 21, 2023, 6:03 AM

#

if you include X position, Y position, current vision angle and perhaps some information pertaining to the path it has travelled so far, then it can try and formulate a strategy to travel around the space more efficiently, but with just that one input, there's not much to be done

violet gull May 21, 2023, 6:04 AM

#

ok so the more situations and complexity i add the better the AI will be

agile cobalt May 21, 2023, 6:04 AM

#

not necessarily / not always

#

but you gotta at least give it enough information that it has something to work with

violet gull May 21, 2023, 6:05 AM

#

how is it not enough info

#

if it has the x y coords of foods cant it path to the food

agile cobalt May 21, 2023, 6:05 AM

#

which information are you feeding into the model right now?

violet gull May 21, 2023, 6:06 AM

#

location of food

#

in the sight range

agile cobalt May 21, 2023, 6:06 AM

#

violet gull i have a tiny neural network where one number goes in and it goes to 3 nodes and...

huh?

agile cobalt May 21, 2023, 6:06 AM

#

violet gull in the sight range

so yeah, while food is in sight it can work, but do you want it to perform efficiently while it is not looking at food, or just stay still / move randomly?

violet gull May 21, 2023, 6:07 AM

#

for now it will move randomly when food not in sight

#

or like you said, just move forward

#

im talking specifically once it sees food

#

can it use reinforcement to turn to go to the food

agile cobalt May 21, 2023, 6:08 AM

#

once it sees food, just hardcoding the behaviour should give you an equal or better result compared to reinforcement learning
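
A sketch of what such a hardcoded policy could look like, assuming 2-D positions and a heading measured in radians counter-clockwise from the x-axis. `turn_toward` is a hypothetical helper, not code from the simulation being discussed:

```python
import math

def turn_toward(frog_xy, heading, food_xy):
    """Signed rotation (radians) that points the frog at the food."""
    dx = food_xy[0] - frog_xy[0]
    dy = food_xy[1] - frog_xy[1]
    target = math.atan2(dy, dx)
    # Wrap into [-pi, pi) so the frog takes the shorter turn.
    return (target - heading + math.pi) % (2 * math.pi) - math.pi

# Frog at the origin facing +x, food straight "up": a quarter turn left.
rotation = turn_toward((0.0, 0.0), 0.0, (0.0, 1.0))
print(rotation)
```

This is the single trig call mentioned earlier: no training loop, and it is exact.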

violet gull May 21, 2023, 6:08 AM

#

ok

#

so what if i add a bird that eats frogs

#

it has to path around the bird and get to the food

agile cobalt May 21, 2023, 6:09 AM

#

then you would be modifying the inputs anyway

violet gull May 21, 2023, 6:10 AM

#

agile cobalt then you would be modifying the inputs anyway

the input would be modified to see the predator too

violet gull May 21, 2023, 6:12 AM

#

violet gull how do i calculate gradients in my neural network using reinforcement learning?

ok back to the original question

agile cobalt May 21, 2023, 6:15 AM

#

as far as I know the most common method is Deep Q-learning (DQN) but I cannot say that I am particularly sure?

#

https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

violet gull May 21, 2023, 6:22 AM

#

that math looks gross

#

cant i just raise the value slightly of the parameters that produced a good output or lower the parameters of the bad outputs

somber pollen May 21, 2023, 6:37 AM

#

violet gull cant i just raise the value slightly of the parameters that produced a good output or...

that's what machine learning does, just in an automated way
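
For the curious, "raise what worked, lower what didn't" has a standard form called the policy gradient (REINFORCE). Below is a minimal plain-Python sketch for a network like the one described, with one input number and three action nodes. It is not the Deep Q-learning method from the linked tutorial, and the reward rule (action 2 is the "good" one) is a made-up assumption for the demo:

```python
import math
import random

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def action_probs(weights, biases, x):
    return softmax([w * x + b for w, b in zip(weights, biases)])

def reinforce_step(weights, biases, x, action, reward, lr=0.1):
    probs = action_probs(weights, biases, x)
    for k in range(len(weights)):
        # d log pi(action) / d logit_k = 1[k == action] - probs[k]
        grad_logit = (1.0 if k == action else 0.0) - probs[k]
        weights[k] += lr * reward * grad_logit * x
        biases[k] += lr * reward * grad_logit

random.seed(0)
weights, biases = [0.0] * 3, [0.0] * 3
x = 1.0  # the single input number
for _ in range(500):
    probs = action_probs(weights, biases, x)
    action = random.choices(range(3), weights=probs)[0]
    reward = 1.0 if action == 2 else 0.0  # hypothetical: action 2 is "good"
    reinforce_step(weights, biases, x, action, reward)

final_probs = action_probs(weights, biases, x)
print(final_probs)
```

The gradient of the weights is just the reward times `(one_hot(action) - probs)` times the input, so no autodiff library is needed for a network this small.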

violet gull May 21, 2023, 6:37 AM

#

but the math looks gross

#

nothing is defined, they use random symbols, its not meant to be replicated

somber pollen May 21, 2023, 6:38 AM

#

in that article they don't actually really do any math

#

if you want, I can explain the symbols that they use, or link you to a resource, but I promise they're not random haha

#

all this stuff can be implemented as like pretty basic python code

#

https://github.com/karpathy/micrograd

#

it's phrased in a math-oriented way in that article, but they're just a different way of referring to common programming concepts

violet gull May 21, 2023, 6:40 AM

#

ok

somber pollen May 21, 2023, 6:41 AM

#

you also don't need to understand most of that math to implement a network that works pretty well

violet gull May 21, 2023, 6:41 AM

#

lemme translate what i have into pseudocode real quick

somber pollen May 21, 2023, 6:41 AM

#

that's more so if you want to make a neural network library yourself

#

calculating gradients is mostly done by auto-differentiation I believe

#

which records the operations your code runs into a graph, and applies the chain rule backwards through it

#

but you want to be using something like PyTorch, because they'll handle all that for you

violet gull May 21, 2023, 6:43 AM

#

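import numpy as np

def decide(coord_to_food, weights, biases):
    # one linear layer: a score for each possible action
    output = coord_to_food @ weights + biases
    decision = int(np.argmax(output))  # pick the highest-scoring action
    if decision == 0:
        return "move left"
    if decision == 1:
        return "move right"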

#

is that generally the right idea?

somber pollen May 21, 2023, 6:44 AM

#

yeah that's the right idea, but not the right approach if you want to make a model that's actually useful

violet gull May 21, 2023, 6:44 AM

#

why

somber pollen May 21, 2023, 6:44 AM

#

there are a lot of different ways you can implement that decide function

#

you did one of the potential ways it could, but there's many

#

and so it's better if you use a library because then they already have done the math

#

and you just have to select which type of function you wanna use

#

but they're all already tested, and defined

violet gull May 21, 2023, 6:45 AM

#

i usually dont like using libraries

#

ive done deep learning by hand

somber pollen May 21, 2023, 6:45 AM

#

by hand?

violet gull May 21, 2023, 6:45 AM

#

like the gradients and stuff

#

no imports

#

my last project was image classification AI

somber pollen May 21, 2023, 6:46 AM

#

oh well if the goal is to implement it by hand

#

then just follow the general outline of the micrograd library

#

he also has one for transformers

#

idk if he has one for reinforcement learning

#

but also reinforcement learning is a lot harder than image classification

violet gull May 21, 2023, 6:47 AM

#

great

somber pollen May 21, 2023, 6:47 AM

#

by a lot harder I mean it's difficult to do even if you use an existing library

#

it requires a lot more computational resources

violet gull May 21, 2023, 6:47 AM

#

somber pollen yeah that's the right idea, but not the right approach if you want to make a mod...

so what was wrong with my idea

somber pollen May 21, 2023, 6:48 AM

#

violet gull so what was wrong with my idea

I didn't realize when I wrote that that you were doing this for the purpose of learning

#

Not for the purpose of making something that would give you good results

violet gull May 21, 2023, 6:48 AM

#

i want my frog to act like a frog

somber pollen May 21, 2023, 6:48 AM

#

Good luck!

violet gull May 21, 2023, 6:49 AM

#

somber pollen Good luck!

so there isnt just a basic equation for weight gradients...

somber pollen May 21, 2023, 6:49 AM

#

violet gull so there isnt just a basic equation for weight gradients...

I mean there is for differentiable functions but most of the time you can't easily express an entire network as a differentiable function

violet gull May 21, 2023, 6:50 AM

#

somber pollen I mean there is for differentiable functions but most of the time you can't easi...

i could with image classification

royal void May 21, 2023, 6:50 AM

#

Hi, do you have an explanation of why my cost function (MSE) looks like that in my neural network? I'm training it to learn gravity: I give it curves and it has to give me the initial speed and the initial angle. It works quite well, but I have no idea why the cost function looks like that, with huge points

Capture_decran_2023-05-21_a_08.48.26.png

somber pollen May 21, 2023, 6:50 AM

#

violet gull i could with image classification

gradients in that sense are completely different

#

image classification gradients are used to downsample an image, and characterize parts of an image

violet gull May 21, 2023, 6:50 AM

#

somber pollen gradients in that sense are completely different

no by gradients it refers to the matrix applied to the weights that changes the weights

#

the change of the weights after loss

somber pollen May 21, 2023, 6:51 AM

#

uhh

#

gradients aren't applied to the weights immediately, they're used to determine how the backpropagation works

violet gull May 21, 2023, 6:51 AM

#

yes they are applied to the weights

somber pollen May 21, 2023, 6:52 AM

#

I said immediately

#

I know they're applied

royal void May 21, 2023, 6:52 AM

#

violet gull no by gradients it refers to the matrix applied to the weights that changes the ...

you have to go through some maths but there is a simple way to calculate the gradient, go watch the 4 videos of 3b1b on neural network and you'll be able to make your own if you have a little bit of maths background

violet gull May 21, 2023, 6:53 AM

#

i already have made neural networks....

#

i know how they work

royal void May 21, 2023, 6:53 AM

#

so you know how to calculate the gradient?

violet gull May 21, 2023, 6:53 AM

#

i do in regular deep learning

somber pollen May 21, 2023, 6:53 AM

#

I'm confused why you're even asking questions if you already know the answer

violet gull May 21, 2023, 6:53 AM

#

not reinforcement

#

i cant find any documentation for the actual math

somber pollen May 21, 2023, 6:54 AM

#

violet gull i cant find any documentation for the actual math

someone sent you a link for the actual math but you said it was gross haha

violet gull May 21, 2023, 6:54 AM

#

cause it wasnt math

violet gull May 21, 2023, 6:54 AM

#

somber pollen in that article they don't actually really do any math

^

somber pollen May 21, 2023, 6:54 AM

#

I was saying that to try to convince you to try to read it

#

Because you seemed like you don't like hard math

#

I then explained how they're just using math symbols to express programming stuff

#

but also a textbook would be a much better source for something like that because most people here are making networks to use them for a problem, not just making them to learn how they work

#

library genesis is a great source, I can recommend some textbooks if you want to see the like real math haha

violet gull May 21, 2023, 6:56 AM

#

screw it, rust has auto diff

somber pollen May 21, 2023, 6:56 AM

#

i'm so confused honestly

violet gull May 21, 2023, 6:56 AM

#

same

somber pollen May 21, 2023, 6:56 AM

#

do u want to learn the math or not

#

or do you want to learn how to program better

violet gull May 21, 2023, 6:57 AM

#

i want my frog to have a brain

#

idc how it happens

somber pollen May 21, 2023, 6:57 AM

#

ok so you want this to work well

#

and you don't care how it happens

#

then don't write the whole damn thing urself

#

and use something written by people who have spent their lives doing this like PyTorch

violet gull May 21, 2023, 6:57 AM

#

whenever i try to import anything the documentation is bad or nonexistent and nothing works so i resort to doing it myself

somber pollen May 21, 2023, 6:58 AM

#

I understand that impulse, but it gets easier to deal with bad documentation

#

And also for most of these libraries you can always contribute

#

And make the docs better!

violet gull May 21, 2023, 6:58 AM

#

i rage quit pytorch after i had to dig for 9 hours into a 4chan post to find the fact that categorical cross entropy already applies softmax because it was listed in the documentation zero times
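
For context, PyTorch's `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally. A plain-Python sketch (no PyTorch; the helper names are made up) of why adding your own softmax first changes the loss:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target):
    # The softmax happens inside the loss, as in a logits-based loss.
    return -math.log(softmax(logits)[target])

logits = [2.0, 1.0, 0.1]
correct = cross_entropy_from_logits(logits, 0)
# Mistake: softmax applied by hand, then again inside the loss.
double = cross_entropy_from_logits(softmax(logits), 0)
print(correct, double)
```

The doubled softmax squashes the scores toward each other, so the loss stays high even when the model is confident and training stalls.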

somber pollen May 21, 2023, 6:58 AM

#

I mean it does say categorical

#

I think you might be running into issues because you're rushing too much

#

I have also run into tons of issues with Tensorflow and PyTorch and DeepChem and whatever

violet gull May 21, 2023, 6:59 AM

#

i really do appreciate your patience btw

somber pollen May 21, 2023, 6:59 AM

#

No problem haha

#

I understand your frustration fr

#

Docs can be really garbage, but implementing stuff yourself is almost always harder

#

and I say that as someone who implements stuff myself for fun

violet gull May 21, 2023, 7:00 AM

#

i did it once

#

took 7 months

somber pollen May 21, 2023, 7:00 AM

#

yeah exactly, I'm sure you learned a ton

#

but you could have done that in like an afternoon w tensorflow

#

it's worth it doing it once

violet gull May 21, 2023, 7:00 AM

#

i have a fully functional deep learning API in java that is optimized and hyperthreaded that is so spaghetti i am the only one on the planet who can read it

somber pollen May 21, 2023, 7:01 AM

#

but like don't subject urself to that constantly haha

somber pollen May 21, 2023, 7:01 AM

#

violet gull i have a fully functional deep learning API in java that is optimized and hypert...

did you just say deep learning api in java?

violet gull May 21, 2023, 7:01 AM

#

yuh

somber pollen May 21, 2023, 7:01 AM

#

hahahahah jesus christ man I'm sorry that sounds like a nightmare to debug

violet gull May 21, 2023, 7:01 AM

#

i was then told, hey why dont you use libraries for your linear algebra operations because they are way faster and built at the hardware level

#

i spent the next week refactoring all my code to implement their custom vector datatype

#

bench marked it: 600x slower

#

i hate using code i didnt write

somber pollen May 21, 2023, 7:03 AM

#

well there are like 15 thousand libraries for matrix ops

#

i'm sure some are garbage, but that doesn't mean they all are haha

#

also with most performance things there's always a deep tradeoff that you can accidentally run afoul of

violet gull May 21, 2023, 7:04 AM

#

ill try some rust ML libraries but i have little hope

somber pollen May 21, 2023, 7:04 AM

#

using libraries optimized for large matrix ops for small matrix ops is worse than no library, etc

#

well also the reason the docs might be so garbage

#

is because you're using rust and java

#

one of which is super new, and therefore has garbage docs (imo)

violet gull May 21, 2023, 7:04 AM

#

rust and java have almost always had better documentation than python

somber pollen May 21, 2023, 7:04 AM

#

and one is super old so not really made for deep learning

iron basalt May 21, 2023, 7:04 AM

#

For RL it's best to start with the basic concepts (math) from the RL book: Reinforcement Learning An Introduction by Sutton and Barto.

somber pollen May 21, 2023, 7:04 AM

#

violet gull rust and java have almost always had better documentation than python

I haven't coded a ton in rust but I highly disagree for java haha

iron basalt May 21, 2023, 7:05 AM

#

Then for implementing it with deep learning there are several tutorials and you should be able to understand the mathematics.

violet gull May 21, 2023, 7:05 AM

#

somber pollen I haven't coded a ton in rust but I highly disagree for java haha

you're right, i shouldnt say good documentation

#

just an enormous amount of data on stack overflow

somber pollen May 21, 2023, 7:05 AM

#

yeah no that's very true

#

but for data science, python absolutely dominates in terms of documentation and stack overflow qs

#

maybe R and Julia come close but like definitely more than java

violet gull May 21, 2023, 7:06 AM

#

im still salty about the categorical crossentropy incident

violet gull May 21, 2023, 7:06 AM

#

iron basalt For RL it's best to start with the basic concepts (math) from the RL book: Reinf...

ill look into that book

#

thank you

somber pollen May 21, 2023, 7:06 AM

#

violet gull im still salty about the categorical crossentropy incident

that's fair, it's also difficult with ML stuff because this stuff is relatively new

#

so there's a lot of churn in apis, and algorithms, and stuff like that

#

that causes docs to get out of date, and miss stuff

#

like with tensorflow they switched everything with tf2.0

violet gull May 21, 2023, 7:08 AM

#

i just want something like

```py
model = model(layers=2, type=nn)
model.train(data)
model.test(testData)
```

#

or for reinforcement

```py
model.give_result(score)
```

somber pollen May 21, 2023, 7:15 AM

#

while you're up there in the clouds can you ask the python developers for a shorter way to join a list of strings

violet gull May 21, 2023, 7:16 AM

#

isnt that what zip is for

#

idk i dont write python

somber pollen May 21, 2023, 7:17 AM

#

no I just personally feel like there's gotta be better syntax than ''.join(list)

#

to turn ["a", "b"] into "ab"

violet gull May 21, 2023, 7:17 AM

#

in rust or java it uses iterators or data streams and its beautiful

somber pollen May 21, 2023, 7:17 AM

#

yeah python has iterators hahaha

#

the join function works on iterators
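
A quick illustration of `str.join`: the string it is called on is the separator, and the argument can be any iterable of strings, including a generator:

```python
parts = ["a", "b"]

no_sep = "".join(parts)   # empty separator: items are glued together
spaced = " ".join(parts)  # a space between items
# Any iterable of strings works, e.g. a generator expression.
from_gen = "-".join(str(n) for n in range(3))

print(no_sep, spaced, from_gen)
```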

patent ocean May 21, 2023, 7:48 AM

#

Is there anyone here who can help me with this github project. I have been looking for help for the past few days but to no avail. The project runs fine but I just want it to run using the test dataset instead of the training dataset. I am actually a beginner and for now, all I want is for the code to run with the two test images of my choosing. The project is "Image steganography using Convolutional Neural Networking".
github project link: https://github.com/vivekmehendiratta/ImageSteganography
the python file I'm lookin at is "hiding single image(fixed).ipynb"
Any help would be greatly appreciated. It is getting a bit desperate.

GitHub

GitHub - vivekmehendiratta/ImageSteganography: Hiding images behind...

Hiding images behind an image using CNNs. Contribute to vivekmehendiratta/ImageSteganography development by creating an account on GitHub.

kind moth May 21, 2023, 8:50 AM

#

How can I split memory when loading large amounts on a not so powerful graphics card? I'm using TensorFlow 2.10

#

My current graphics card has 6 GB VRAM but my script gives an error saying it can't allocate 13.7 GB on it, so is there a way to fix this?

past meteor May 21, 2023, 11:33 AM

#

Maybe your model and/or batch size are too large

kind moth May 21, 2023, 12:20 PM

#

I set the batch size to 1 and I'm using an autoencoder

hardy depot May 21, 2023, 12:27 PM

#

guys does anybody know how to write images into a directory in google colab with labels

past meteor May 21, 2023, 12:39 PM

#

kind moth I set the batch size to 1 and I'm using an autoencoder

Considering your batch size is 1 your model is too large. How many params do you have?

past meteor May 21, 2023, 12:39 PM

#

hardy depot guys does anybody know how to write images into a directory in google colab with...

You can mount your (google) drive and then write to it from colab

kind moth May 21, 2023, 12:42 PM

#

past meteor Considering your batch size is 1 your model is too large. How many params do you...

Lemme Check

#

I have 741,970 parameters

past meteor May 21, 2023, 1:15 PM

#

Then I don't know why TF is trying to allocate 13.7 GB, maybe the rest has some ideas
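
A back-of-the-envelope check of why the parameter count cannot explain that number, assuming float32 weights (4 bytes each) and, as a rough assumption, about four copies during training (weights, gradients, and two optimizer buffers as with Adam):

```python
params = 741_970
bytes_per_param = 4  # float32 assumed

weights_mb = params * bytes_per_param / 1e6
# Rough assumption: weights + gradients + two optimizer moment buffers.
training_mb = weights_mb * 4

print(f"weights: {weights_mb:.2f} MB, with training state: {training_mb:.2f} MB")
```

That is on the order of megabytes, so the 13.7 GB request presumably comes from something else, such as a very large input or activation tensor, or the whole dataset being turned into one tensor. The line where the error is raised should say which.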

kind moth May 21, 2023, 1:16 PM

#

Yeah

#

I reduced the parameters to 187,936 and yet still the same issue but this time 13.47 GB

#

I'm so confused

#

😕

hardy depot May 21, 2023, 1:32 PM

#

past meteor You can mount your (google) drive and then write to it from colab

do you know how to write tho
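
A minimal sketch of one common layout, one sub-folder per label, using only the standard library. `save_labeled` is a hypothetical helper and the bytes are a placeholder; in Colab the root could be a folder under `/content/drive/MyDrive` once `drive.mount('/content/drive')` from `google.colab` has been run:

```python
import tempfile
from pathlib import Path

def save_labeled(root, label, filename, image_bytes):
    """Write image_bytes to root/label/filename, creating folders as needed."""
    folder = Path(root) / label
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_bytes(image_bytes)
    return path

# Demo in a temporary directory with placeholder bytes; real code would
# pass encoded PNG/JPEG data instead.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_labeled(tmp, "cat", "img_000.png", b"not a real image")
    exists = saved.exists()
    relative = saved.relative_to(tmp).as_posix()

print(exists, relative)
```

Many dataset loaders can read this folder-per-label layout directly, which is why it is a convenient default.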

mild dirge May 21, 2023, 1:34 PM

#

Find out at what line you get that error

serene scaffold May 21, 2023, 2:01 PM

#

@violet gull as you know, this server is about Python. Going forward, any ML questions that you ask in this channel need to be about something you are doing in Python. If that isn't the case, you need to look to another server.

#

Please let me know if you have any questions about that.

rustic trout May 21, 2023, 2:02 PM

#

Hey there! Would it be correct to transform coordinates to deal with skewness?

chrome kernel May 21, 2023, 2:06 PM

#

I'm 41% done through my llm model training. If my accuracy is 71% and loss is .94, should I be worried?

serene scaffold May 21, 2023, 2:08 PM

#

chrome kernel I'm 41% done through my llm model training. If my accuracy is 71% and loss is .9...

LLMs take an exceptionally long time to train for them to be good. you're not fine-tuning an existing one?

chrome kernel May 21, 2023, 2:10 PM

#

I'm fine tuning distilgpt2

#

I think I messed up tho

#

serene scaffold May 21, 2023, 2:11 PM

#

you have to say that--saying that you're "training an LLM" means something different.

chrome kernel May 21, 2023, 2:11 PM

#

I'm training on genius lyrics

chrome kernel May 21, 2023, 2:11 PM

#

serene scaffold you have to say that--saying that you're "training an LLM" means something diffe...

Yeah I figured. Wasn't sure which term to use

#

Kinda new to AI

#

So I'm not sure if I'm even fine tuning right

#

Trying to build a lyric suggestion AI

serene scaffold May 21, 2023, 2:12 PM

#

did you set a pad token in the model config?

chrome kernel May 21, 2023, 2:12 PM

#

no

#

#

pretty bad

#

Not sure if I'm doing anything right tbh

#

shit

#

A single lyric can have text as long as this

#

So essentially, the AI is training on jargon

serene scaffold May 21, 2023, 2:16 PM

#

chrome kernel So essentially, the AI is training on jargon

jargon?

chrome kernel May 21, 2023, 2:17 PM

#

I'm trying to get it to deliver good rhymes or at least, rhyming phrases based on the input

#

Not sure if I even did it right though. I know for a fact that simply training on lyrics won't do that

serene scaffold May 21, 2023, 2:17 PM

#

"jargon" means "the technical vocabulary for some occupation or specialty"

#

do you have the training code somewhere that I can access, like colab?

chrome kernel May 21, 2023, 2:18 PM

#

I initially used colab, but it was too slow

#

I can send you a paste

serene scaffold May 21, 2023, 2:18 PM

#

did you remember to use the GPU?

chrome kernel May 21, 2023, 2:18 PM

#

I thought of it, but I was like, I'm too late

#

Unless of course the checkpoints will let me switch over to gpu?

serene scaffold May 21, 2023, 2:18 PM

#

if you're not using a gpu, you might as well stop everything and start over on a gpu

chrome kernel May 21, 2023, 2:19 PM

#

damn

#

welp

#

I'm going to sleep then work anyways so why not

#

That's strange

#

I swear this thing has an Nvidia gpu tho

serene scaffold May 21, 2023, 2:22 PM

#

is this on your computer, or colab?

chrome kernel May 21, 2023, 2:22 PM

#

my computer

serene scaffold May 21, 2023, 2:22 PM

#

exit the python repl and try nvidia-smi

chrome kernel May 21, 2023, 2:23 PM

#

serene scaffold May 21, 2023, 2:23 PM

#

okay, so your torch installation doesn't have cuda

chrome kernel May 21, 2023, 2:23 PM

#

fuck

serene scaffold May 21, 2023, 2:23 PM

#

you just have to install it with cuda

chrome kernel May 21, 2023, 2:23 PM

#

So I have to say bye bye to this huh

#

Ctrl+C?

serene scaffold May 21, 2023, 2:23 PM

#

you could just leave it running, I guess

#

up to you

chrome kernel May 21, 2023, 2:24 PM

#

Well, if running with a gpu is faster, I'll take that

serene scaffold May 21, 2023, 2:24 PM

#

please say your operating system and python version

chrome kernel May 21, 2023, 2:24 PM

#

Windows 11

#

python 3.11.2

#

Do I need to uninstall pytorch, then reinstall with cuda?

serene scaffold May 21, 2023, 2:25 PM

#

one moment

queen cradle May 21, 2023, 2:26 PM

#

rustic trout Hey there! Would it be correct to transform coordinates to deal with skewness?

Maybe; maybe not. It depends on your ultimate goals. What are you trying to do? Can you show us your data?

serene scaffold May 21, 2023, 2:27 PM

#

@chrome kernel try pip install https://download.pytorch.org/whl/cu117/torch-1.13.0%2Bcu117-cp311-cp311-linux_x86_64.whl

#

oh that's linux

#

pip install https://download.pytorch.org/whl/cu117/torch-2.0.0%2Bcu117-cp311-cp311-win_amd64.whl

past meteor May 21, 2023, 2:37 PM

#

Does Windows support all torch2.0's features? Last time I tried it it didn't (for example compile doesn't work). Might want to run it with WSL 🙂

chrome kernel May 21, 2023, 2:38 PM

#

serene scaffold <@483460784668803082> try `pip install https://download.pytorch.org/whl/cu117/to...

no need to uninstall the previous one?

#

nvm

#

Forgot that python does that for you

#

yeah running on cpu was a mistake lol
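A quick way to confirm that the reinstalled PyTorch actually sees the GPU, sketched under the assumption that nothing else is wrong with the driver. `torch.version.cuda` is `None` on a CPU-only build, which matches the situation diagnosed above. The tiny model and batch are placeholders.

```python
import torch

# None here means the installed wheel was built without CUDA support.
print("torch", torch.__version__, "built with CUDA:", torch.version.cuda)

# Fall back to the CPU so the same script runs everywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("using device:", device)

# Both the model and every batch have to live on the same device.
model = torch.nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4, device=device)
out = model(batch)
```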

simple tapir May 21, 2023, 3:46 PM

#

import torch
from torch import nn
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
from matplotlib import pyplot as plt
import numpy as np

x, y = make_blobs(100,n_features=2,cluster_std=0.3, random_state=42)

x, y = torch.from_numpy(x).type(torch.float32), torch.from_numpy(y).type(torch.float32)

x_train, x_test, y_train, y_test = train_test_split(x,y,random_state=42)


class Model(nn.Module):
    def __init__(self, input_layer, hidden_unit, output_layer, linearity: bool) -> None:
        super().__init__()  # initialise nn.Module before registering submodules
        if linearity:
            # Purely linear stack: no activations between the layers
            self.layer = nn.Sequential(
                nn.Linear(input_layer, hidden_unit),
                nn.Linear(hidden_unit, hidden_unit),
                nn.Linear(hidden_unit, output_layer),
            )
        else:
            # Non-linear stack: ReLU between the linear layers
            self.layer = nn.Sequential(
                nn.Linear(input_layer, hidden_unit),
                nn.ReLU(),
                nn.Linear(hidden_unit, hidden_unit),
                nn.ReLU(),
                nn.Linear(hidden_unit, output_layer),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x)
    
model = Model(input_layer=2, hidden_unit=8, output_layer=1, linearity=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()

def simpleTrain(model, x_train, y_train, epochs, optimizer, loss_function):
    loss = 0
    for epoch in range(epochs):
        model.train()
        y_preds = model(x_train).squeeze()
        loss = loss_function(y_preds, y_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

#

How can I make this train function general? As written, it won't work once I switch to datasets. But if I pass a DataLoader as a parameter instead, how can I make it so this project still works too?
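One possible answer, as a sketch rather than the only way: let the function accept either a `DataLoader` or a plain `(x, y)` tuple, and wrap the tuple in a single-batch loader so both cases share one loop. The name `train` and the toy data are assumptions for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, data, epochs, optimizer, loss_function):
    """Train on a DataLoader, or on an (x, y) tuple treated as one full batch."""
    if not isinstance(data, DataLoader):
        x, y = data
        data = DataLoader(TensorDataset(x, y), batch_size=len(x))
    mean_loss = float("nan")
    for _ in range(epochs):
        model.train()
        total, seen = 0.0, 0
        for xb, yb in data:
            preds = model(xb).squeeze(-1)  # (N, 1) -> (N,), safe for batch size 1
            loss = loss_function(preds, yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)
            seen += len(xb)
        mean_loss = total / seen  # average loss over the last epoch
    return mean_loss


# Toy binary problem, made up for the demo.
torch.manual_seed(0)
x = torch.randn(64, 2)
y = (x[:, 0] > 0).float()
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.BCEWithLogitsLoss()

loss_tensors = train(model, (x, y), 5, optimizer, criterion)
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)
loss_loader = train(model, loader, 5, optimizer, criterion)
```

Using `squeeze(-1)` instead of a bare `squeeze()` keeps the prediction one-dimensional even when a batch holds a single sample.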

ocean prism May 21, 2023, 6:10 PM

#

o

chrome kernel May 21, 2023, 6:16 PM

#

yeah Just woke up

#

I'm done. cancelling this shit

coral field May 21, 2023, 6:33 PM

#

How would I know when I should call an activation function?

mild dirge May 21, 2023, 6:36 PM

#

You use them to make operations non-linear. Thus you want to use them when you have multiple linear layers (dense layers). Because you can get the same result with 1 linear layer as any number of subsequent linear layers, we use non-linear activation between all linear layers.

#

The same goes for convolutional layers.

#

And often the same activation is used for all the hidden layers, except for the final layer, which has an activation function based on what the task of your model is.

#

Sigmoid/softmax for classification f.e. or no/linear activation for regression

#

@coral field

coral field May 21, 2023, 6:38 PM

#

so it would typically happen after every layer in the model?

mild dirge May 21, 2023, 6:38 PM

#

Yes, because if you have 2 linear layers without activation, you might as well just use 1 linear layer
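The claim that stacked linear layers collapse into one can be checked numerically. This sketch uses NumPy with small hand-picked weights (all values are made up): two linear layers equal a single layer with weight `W1 @ W2` and bias `b1 @ W2 + b2`, while a ReLU in between breaks that equivalence.

```python
import numpy as np

x = np.array([[1.0, -2.0], [0.5, 3.0]])   # two input samples
W1 = np.array([[1.0, -1.0], [2.0, 0.5]])  # first layer weights
b1 = np.array([0.5, -1.0])
W2 = np.array([[1.0], [-2.0]])            # second layer weights
b2 = np.array([0.25])

# Two linear layers applied one after the other.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as one linear layer.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

# With a ReLU between the layers, no single linear layer reproduces this.
with_relu = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

print(two_layers.ravel(), one_layer.ravel(), with_relu.ravel())
# two_layers and one_layer are both [3.75, 7.25]; with_relu is [0.25, 7.25]
```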

rustic trout May 21, 2023, 7:28 PM

#

queen cradle Maybe; maybe not. It depends on your ultimate goals. What are you trying to do? ...

I'm trying to model the house price.

rustic trout May 21, 2023, 7:31 PM

#

queen cradle Maybe; maybe not. It depends on your ultimate goals. What are you trying to do? ...

This is my data:

ocean swallow May 21, 2023, 8:47 PM

#

no offense but this was approved for advertising?

#

How do you use LLMs for law applications effectively? I want to build a PoC that answers users' law-related queries. It should refer to or quote from laws when answering, whenever possible. One other thing: a lot of the laws refer to each other, so the agent should recursively find related laws. How should I approach this?

queen cradle May 21, 2023, 9:01 PM

#

rustic trout I'm trying to model the house price.

Modeling house prices is difficult because so many factors affect what a house is worth. I do not think you will find a simple distribution that models house prices.

chrome kernel May 21, 2023, 11:52 PM

#

Holy shit gpu training is fast

#

MINUTES compared to 3 hours

mild dirge May 22, 2023, 12:23 AM

#

Well yeah, gpus have thousands of cores, your cpu prob only 10-20 ish 😛 @chrome kernel

chrome kernel May 22, 2023, 2:28 AM

#

lol

somber pollen May 22, 2023, 4:13 AM

#

the GTX 1080 Ti has 1024 separate execution units in each warp or whatever they call them

#

and I'm pretty sure that number has also been increasing exponentially since that card was made, which was like

#

2014? I don't even know

worldly dawn May 22, 2023, 4:18 AM

#

somber pollen 2014? I don't even know

you can run commands like clinfo to get that data

somber pollen May 22, 2023, 4:18 AM

#

worldly dawn you can run commands like `clinfo` to get that data

I don't have a card made since 2016 hahaha

#

I have a m1 13" from 2020 and a desktop w a ryzen 5900x and 2070super

#

I feel like they have also changed the number of each kind of parallel core though

#

Oh no it just turns out my GPU is significantly worse than a 1080 Ti

#

It's interesting comparing the FP64 speed of old GPUs vs new

chrome kernel May 22, 2023, 4:22 AM

#

wow bruh. I've been using a test size of 200 this whole time, but that value is supposed to be a double from 0 to 1 in the train_test_split function for transformers

#

No wonder I've been running out of memory

somber pollen May 22, 2023, 4:22 AM

#

hahahaha

#

yeah that would make sense

#

I definitely would not have caught that either tho

chrome kernel May 22, 2023, 4:22 AM

#

It was a percentage of the test data this whole time lmao

somber pollen May 22, 2023, 4:22 AM

#

I wish Python had like constrained range types or something

#

But this is also a criticism often levied at Python for data science

#

But I feel like a slightly stronger type system might help with stuff like that

chrome kernel May 22, 2023, 4:30 AM

#

I swear I'm going to break my monitor

#

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.84 GiB (GPU 0; 12.00 GiB total capacity; 5.26 GiB already allocated; 718.06 MiB free; 9.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

somber pollen May 22, 2023, 5:00 AM

#

classic steps to address this are to reduce batch size, reduce epoch steps, reduce epoch number
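Of those steps, batch size is the one that lowers peak memory; the number of epochs does not change how much is allocated per step. If a smaller batch hurts training, gradient accumulation is a common workaround: run several small micro-batches and step the optimizer once. The sketch below uses a toy model and random data (all made up) and checks that the accumulated gradient matches the full-batch one.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

micro_batch = 4   # what actually has to fit in memory at once
accum_steps = 8   # 8 micro-batches of 4 == one effective batch of 32

optimizer.zero_grad()
for step in range(accum_steps):
    rows = slice(step * micro_batch, (step + 1) * micro_batch)
    # Divide so the summed gradients equal the mean over the full batch.
    loss = loss_fn(model(x[rows]), y[rows]) / accum_steps
    loss.backward()  # gradients add up in .grad across micro-batches
accum_grad = model.weight.grad.clone()

# Reference: the gradient from one pass over all 32 samples.
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

optimizer.step()  # one update for the whole effective batch
```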

modest juniper May 22, 2023, 7:47 AM

#

You guys probably heard this a lot already but is there any difference between PyTorch, TensorFlow and MxNet that would compel me to choose one over other?

simple tapir May 22, 2023, 9:51 AM

#

modest juniper You guys probably heard this a lot already but is there any difference between *...

https://reason.town/mxnet-vs-tensorflow-vs-pytorch/

#

You can find more information by googling your question

modest juniper May 22, 2023, 9:57 AM

#

The reason I came here is because that and similar links are blocked.

cold osprey May 22, 2023, 10:32 AM

#

blocked?

ocean swallow May 22, 2023, 11:54 AM

#

modest juniper You guys probably heard this a lot already but is there any difference between *...

PyTorch

#

just trust me. tensorflow was corrupted by multiple API implementations, and to this day it is very possible to find 3 completely different pieces of code that do the same thing

#

v1 v2 Keras API

#

and then what autograd?

modest juniper May 22, 2023, 11:56 AM

#

lol

#

their naming sense would give microsoft a run for its money

modest juniper May 22, 2023, 11:57 AM

#

cold osprey blocked?

I live in 🇮🇷

mild dirge May 22, 2023, 12:04 PM

#

I definitely prefer pytorch as well

#

But keras is nice for making a model very quickly

cold osprey May 22, 2023, 12:09 PM

#

+1 pytorch

hardy depot May 22, 2023, 12:16 PM

#

anybody wanna join me on a dlib face emotion recognition project?

#

I'm about halfway done too, so

zenith gull May 22, 2023, 2:35 PM

#

hey guys i'm a lil new to this but I was wondering if I could get some help with fine tuning, If anyone can help that would be amazing

serene scaffold May 22, 2023, 2:41 PM

#

zenith gull hey guys i'm a lil new to this but I was wondering if I could get some help with...

Be sure to always ask a complete question that someone can start answering.

hasty mountain May 22, 2023, 3:04 PM

#

worldly dawn you can run commands like `clinfo` to get that data

That's a bit wonderful
nvidia-smi --id=0

Mon May 22 12:02:53 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 497.09       Driver Version: 497.09       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     2W /  N/A |    534MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

#

The --h command also shows quite a few things. Too bad it isn't clear for lay people

elfin robin May 22, 2023, 3:43 PM

#

Help please

fickle shale May 22, 2023, 3:45 PM

#

elfin robin Help please

https://www.geeksforgeeks.org/python-opencv-capture-video-from-camera/

#

i found this bro!

#

or try to upgrade cv2

serene scaffold May 22, 2023, 3:53 PM

#

elfin robin Help please

Do you understand what the error message is telling you?

elfin robin May 22, 2023, 4:04 PM

#

serene scaffold Do you understand what the error message is telling you?

yeah, an argument is needed in order to access the webcam, but it's throwing an error

elfin robin May 22, 2023, 4:06 PM

#

fickle shale or try to upgrade or cv2

did but not working

maiden coral May 22, 2023, 5:00 PM

#

https://medium.com/p/9d74c4ab1995

Medium

LangChain Tutorial: A Step-by-Step LangChain Python Crash Course

Langchain is a framework that allows you to create an application powered by a language model, in this LangChain Tutorial Crash you will…

limber token May 22, 2023, 5:15 PM

#

I'm trying to plot stacked bars using seaborn, but when they're stacked, they "overlap". What I mean by this is that instead of the next bar starting when the previous ends, they all start at y=0 and the shortest bars are closer to the screen in the z axis. How can I fix this? Here is the code I'm using:

grouped_df = (
    df.groupby(['user:Name', 'user:Project'])
    .sum(numeric_only=True)
    .reset_index()[['user:Name', 'user:Project', 'BlendedCost']]
    .sort_values(by='BlendedCost', ascending=False)
)

sns.barplot(
    x='user:Name',
    y='BlendedCost',
    hue='user:Project',
    data=grouped_df,
    estimator=sum,
    saturation=1,
    dodge=False,
    order=grouped_df['user:Name'].unique(),
)
plt.xticks(rotation=45, ha='right')
plt.ylabel('BlendedCost (BRL)')
plt.xlabel('Project')
plt.title(f'Cost by project and product name\n{MONTH}')
plt.grid(axis='y')

for i, v in enumerate(
    grouped_df.drop_duplicates('user:Name').reset_index()['BlendedCost']
):
    plt.text(
        i,
        v,
        f'Total: R$ {str(round(grouped_df.groupby("user:Name").sum(numeric_only=True).reset_index().iloc[i, 1], 2))}',
        horizontalalignment='center',
        verticalalignment='bottom',
    )

plt.axhline(y=cost_project[:, 1].mean(), color='k', linestyle='-', linewidth=1, alpha=0.5)
plt.text(
    x=len(cost_project[:, 1]) - 2,
    y=cost_project[:, 1].mean(),
    s=f'Mean: R$ {str(round(cost_project[:, 1].mean(), 2))}',
    horizontalalignment='center',
    verticalalignment='bottom',
)

plt.savefig('cost_by_project_and_product_name.png', bbox_inches='tight')
plt.show()

#

Basically I want to get rid of the "Total" annotation since the information itself would be contained in the y axis

earnest moat May 22, 2023, 5:50 PM

#

Quick Question here (since I kept getting closed in the help channel 😅) :

So I’m currently trying to convert this sample pdf to text in excel using TABULA. I want to see if I can extract the first part starting from REVENUE to TOTAL revenue. (made up numbers for fake company)

I started by trying to import the pdf and pull text out but didn’t have much luck? I thought at first maybe the pdf was unstructured (but it looks like the text is in a table kinda format??) any tips?

what i have so far...

import tabula
df = tabula.read_pdf("Invoice.pdf", pages='all')[0]

# convert PDF into CSV
tabula.convert_into("Invoice.pdf", "Invoice_text.csv", output_format="csv", pages='all')
print(df)

earnest moat May 22, 2023, 5:57 PM

#

limber token I'm trying to plot stacked bars using seaborn, but when they're stacked, they "o...

I could be wrong but have u tried

grouped_df .plot(kind='bar', stacked=True)

mild dirge May 22, 2023, 6:00 PM

#

What does it print? @earnest moat

earnest moat May 22, 2023, 6:01 PM

#

mild dirge What does it print? <@623166473149480981>

Right now it prints:
[ ]

mild dirge May 22, 2023, 6:01 PM

#

Can you send me the pdf in dms?

faint mist May 22, 2023, 6:38 PM

#

for experts in timeseries forecasting, I have an LSTM model that takes the past 21 values and predict the next value. I set the supervised dataset in a sliding window fashion of step 1. Example x1 = [t1, t2, .., t21] and y1=[t22], then x2 = [t2, t3, .., t22] and y2=[t23]. Now, I want to use ARIMA to compare the performance. Not sure how to train such a model. Given that I have a 80:20 split. I know I can use ARIMA to fit on the 80 split but how to test and compare on the 20 split ?

#

If I have 4000 timesteps, I would allocate 3200 steps for training and 800 for testing

lapis sequoia May 22, 2023, 7:07 PM

#

past meteor I was interested in general usage patterns, what days was I sending a lot of mes...

Nice! 🙂

#

I think it is amazing that you motivated yourself into doing that

regal vault May 22, 2023, 8:26 PM

#

i have some raytracer code i want to run on the gpu

#

can anyone help

#

i looked into numba

#

but my use of different classes is making it throw errors

violet gull May 22, 2023, 8:50 PM

#

what kind of optimizer does Deep Q Learning use?

#

SGD?

past meteor May 22, 2023, 9:10 PM

#

faint mist for experts in timeseries forecasting, I have an LSTM model that takes the past ...

Check out Nixtla's statsforecast package

#

Imo they have the most comprehensive set of time series models in python that aren't just horrible wrappers of pdarima and statsmodels (both are slow)

past meteor May 22, 2023, 9:13 PM

#

faint mist for experts in timeseries forecasting, I have an LSTM model that takes the past ...

A specific answer to your question, you can "unroll" your model and predict the last 20 % all at once in an autoregressive fashion or you can use y_true after it's available. They support both but the latter is a bit hidden in the docs

wintry crag May 22, 2023, 10:26 PM

#

anybody have knowledge of 1/n modelling?

limber token May 22, 2023, 10:41 PM

#

earnest moat I could be wrong but have u tried grouped_df .plot(kind='bar', stacked=True)

Yup, the output is them stacked on the "z" axis and not on the y axis
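A likely reason is that both calls receive long-format data. Seaborn's `barplot` with `dodge=False` draws every hue bar from y=0, so the bars overlap instead of stacking. One workaround, sketched here with made-up numbers that reuse the column names from the question, is to pivot to one column per project and let pandas stack them.

```python
import pandas as pd

# Made-up costs in the same long format as grouped_df.
df = pd.DataFrame({
    "user:Name": ["ana", "ana", "bob", "bob"],
    "user:Project": ["p1", "p2", "p1", "p2"],
    "BlendedCost": [10.0, 5.0, 3.0, 7.0],
})

# One row per name, one column per project.
wide = df.pivot_table(
    index="user:Name",
    columns="user:Project",
    values="BlendedCost",
    aggfunc="sum",
    fill_value=0,
)

# Order the bars by total cost, largest first.
wide = wide.loc[wide.sum(axis=1).sort_values(ascending=False).index]

# Each project's segment now starts where the previous one ends.
ax = wide.plot(kind="bar", stacked=True)
ax.set_ylabel("BlendedCost (BRL)")
# Call plt.show() or ax.figure.savefig(...) to display or save the figure.
```

With the bars stacked, the total per name can be read off the y axis, so the "Total" annotation is no longer needed.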

candid tiger May 22, 2023, 11:00 PM

#

I have an interview tomorrow for a research assistant position working on natural language processing with additional desire for knowledge on using multi-modal data, rumour verification and explainability of NLP models. Can anyone think of anything essential I should probably be prepared to discuss technically?

night kernel May 22, 2023, 11:08 PM

#

noob/novice here

is it possible to recreate some of the features of chat gpt easily? for instance - i want to generate questions from amounts of pasted text (create multiple choice and true or false). so- only 2 small features

since cgpt can be costly, could i make my own api with my own code for the two small features? would it cost the same amount?

serene scaffold May 22, 2023, 11:37 PM

#

night kernel noob/novice here is it possible to recreate some of the features of chat gpt ea...

looks like this does that: https://github.com/asahi417/lm-question-generation

GitHub

GitHub - asahi417/lm-question-generation: Multilingual/multidomain ...

Multilingual/multidomain question generation datasets, models, and python library for question generation. - GitHub - asahi417/lm-question-generation: Multilingual/multidomain question generation ...

old hornet May 23, 2023, 2:19 AM

#

anyone familiar with detectron2? Pls check out my question in the python-help section.

https://discord.com/channels/267624335836053506/1110390271948107866

dusty bay May 23, 2023, 3:44 AM

#

How do I read line by line and test each line in a csv file whether it starts with a number or a letter in pandas?

serene scaffold May 23, 2023, 4:05 AM

#

dusty bay How do I read line by line and test each line in a csv file whether it starts wi...

if you're doing it with pandas, you don't do it line by line. you do it all at once.

dusty bay May 23, 2023, 4:07 AM

#

serene scaffold if you're doing it with pandas, you don't do it line by line. you do it all at o...

How? please explain to me

serene scaffold May 23, 2023, 4:07 AM

#

dusty bay How? please explain to me

did you already open the csv as a dataframe?

dusty bay May 23, 2023, 4:07 AM

#

yes i did

serene scaffold May 23, 2023, 4:08 AM

#

dusty bay yes i did

please show the first few lines of the CSV in the chat.

#

(no screenshots)

dusty bay May 23, 2023, 4:08 AM

#

"RMS Level",
Ch1,
X,Y
Hz,dBSPL
18.75,-1.1629900192681
18.8593510765948,-0.695322155711976
18.9693398949471,-0.227654292155845
19.0799701744034,0.240013571400276

tulip wyvern May 23, 2023, 4:08 AM

#

I'm trying to do a binary classification task with images but my loss is just -0.0 and I get the RuntimeError "Expected floating point type for target with class probabilities, got Long" after 1500 batches. Here's the link to my colab: https://colab.research.google.com/drive/1IHt6P9M57Sh3xaZp5OhaeXpLaAdJJNwk#scrollTo=fHVmZ0o3EPTs&uniqifier=1

Google Colaboratory

#

I'm so confused

dusty bay May 23, 2023, 4:09 AM

#

dusty bay ``` "RMS Level", Ch1, X,Y Hz,dBSPL 18.75,-1.1629900192681 18.8593510765948,-0.69...

@serene scaffold this

serene scaffold May 23, 2023, 4:10 AM

#

dusty bay <@253696366952316929> this

and you need to split it into dataframes where the rows are text, vs numbers?

tulip wyvern May 23, 2023, 4:11 AM

#

Stel do you think you could help me

serene scaffold May 23, 2023, 4:11 AM

#

tulip wyvern Stel do you think you could help me

I'm going to go to sleep after I finish helping phd-fauzan; sorry

tulip wyvern May 23, 2023, 4:11 AM

#

oh okay no worries

dusty bay May 23, 2023, 4:12 AM

#

serene scaffold and you need to split it into dataframes where the rows are text, vs numbers?

I want to separate the header (text) from the data (numbers)

serene scaffold May 23, 2023, 4:13 AM

#

Suppose you have this

                  0                   1
0         RMS Level                 NaN
1               Ch1                 NaN
2                 X                   Y
3                Hz               dBSPL
4             18.75    -1.1629900192681
5  18.8593510765948  -0.695322155711976
6  18.9693398949471  -0.227654292155845
7  19.0799701744034   0.240013571400276

you can do this

In [7]: df.iloc[:, 0].str[0].str.isalpha()
Out[7]:
0     True
1     True
2     True
3     True
4    False
5    False
6    False
7    False

And that tells you which rows start with a letter.

dusty bay May 23, 2023, 4:14 AM

#

serene scaffold Suppose you have this ```py 0 1 0 RM...

And how do you separate ?

serene scaffold May 23, 2023, 4:14 AM

#

dusty bay And how do you separate ?

In [8]: starts_with_letter = df.iloc[:, 0].str[0].str.isalpha()

In [9]: df[starts_with_letter]
Out[9]:
           0      1
0  RMS Level    NaN
1        Ch1    NaN
2          X      Y
3         Hz  dBSPL

In [10]: df[~starts_with_letter]
Out[10]:
                  0                   1
4             18.75    -1.1629900192681
5  18.8593510765948  -0.695322155711976
6  18.9693398949471  -0.227654292155845
7  19.0799701744034   0.240013571400276

dusty bay May 23, 2023, 4:16 AM

#

serene scaffold ```py In [8]: starts_with_letter = df.iloc[:, 0].str[0].str.isalpha() In [9]: d...

And if a csv file has many headers, does that code also apply?

serene scaffold May 23, 2023, 4:16 AM

#

dusty bay And if a csv file has many headers, does that code also apply?

not sure what you mean by "many headers"

dusty bay May 23, 2023, 4:17 AM

#

wait

#

"Level and Distortion",,,,,,,,,,,,,,,
"Ch1 (F)",,"Ch1 (H2)",,"Ch1 (H3)",,"Ch1 (Total)",,"Ch2 (F)",,"Ch2 (H2)",,"Ch2 (H3)",,"Ch2 (Total)",
X,Y,X,Y,X,Y,X,Y,X,Y,X,Y,X,Y,X,Y
Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms,Hz,Vrms
20,0.00772013164376534,20,5.60982648239952E-05,20,0.000389709733151927,20,0.011492581958802,20,0.00699792689186063,20,0.000151471712877565,20,0.000389940899485093,20,0.010080448380793
"THD Ratio",,,,,,,,,,,,,,,
Ch1,,Ch2,,,,,,,,,,,,,
X,Y,X,Y,,,,,,,,,,,,
Hz,%,Hz,%,,,,,,,,,,,,
20,83.009797319554,20,82.1460991930652,,,,,,,,,,,,
21.1179638886716,85.3656629417084,21.1179638886716,82.0338466400102,,,,,,,,,,,,
22.2984199401618,90.6674826441566,22.2984199401618,85.7190774666039,,,,,,,,,,,,

If I have a csv file like this to separate text and number

dusty bay May 23, 2023, 4:19 AM

#

serene scaffold ```py In [8]: starts_with_letter = df.iloc[:, 0].str[0].str.isalpha() In [9]: d...

does this code work?

serene scaffold May 23, 2023, 4:19 AM

#

dusty bay does this code work?

try it and see.

#

tempus est dormienda.

dusty bay May 23, 2023, 4:20 AM

#

serene scaffold tempus est dormienda.

Tibi gratias ago pro subsidio tuo

serene scaffold May 23, 2023, 4:21 AM

#

Did you use Google translate

tulip wyvern May 23, 2023, 4:22 AM

#

I'm trying to do a binary classification task with images but my loss is just -0.0 and I get the RuntimeError "Expected floating point type for target with class probabilities, got Long" after 1500 batches. Here's the link to my colab: https://colab.research.google.com/drive/1IHt6P9M57Sh3xaZp5OhaeXpLaAdJJNwk#scrollTo=fHVmZ0o3EPTs&uniqifier=1

Is anybody available to help?

Google Colaboratory

dusty bay May 23, 2023, 4:27 AM

#

serene scaffold Did you use Google translate

that right hahah

magic dune May 23, 2023, 4:31 AM

#

hi

tulip wyvern May 23, 2023, 4:32 AM

#

@magic dune

magic dune May 23, 2023, 4:33 AM

#

yes

tulip wyvern May 23, 2023, 4:33 AM

#

BY PERCHANCE COULD YO HELP

magic dune May 23, 2023, 4:33 AM

#

maybe

#

what u need?

#

@tulip wyvern

tulip wyvern May 23, 2023, 4:34 AM

#

My loss keeps coming up as -0.0 because my pred is a super small value (~10^-10)

#

https://colab.research.google.com/drive/1IHt6P9M57Sh3xaZp5OhaeXpLaAdJJNwk#scrollTo=fHVmZ0o3EPTs&uniqifier=1

Google Colaboratory

#

and then it crashes at batch 1500

#

with "RuntimeError: Expected floating point type for target with class probabilities, got Long"

magic dune May 23, 2023, 4:35 AM

#

so you are using a simple nn for binary

#

@tulip wyvern

tulip wyvern May 23, 2023, 4:35 AM

#

yes

#

@magic dune are u still here

magic dune May 23, 2023, 4:51 AM

#

tulip wyvern <@555944200047296513> are u still here

ya

#

looking at it should work

#

@tulip wyvern https://stackoverflow.com/questions/63383347/runtimeerror-expected-object-of-scalar-type-long-but-got-scalar-type-float-for

Stack Overflow

RuntimeError: Expected object of scalar type Long but got scalar ty...

I'm running into an issue while calculating the loss for my Neural Net. I'm not sure why the program expects a long object because all my Tensors are in float form. I looked at threads with similar

tulip wyvern May 23, 2023, 4:53 AM

#

I have the opposite error of that stack overflow

magic dune May 23, 2023, 4:53 AM

#

tulip wyvern I have the opposite error of that stack overflow

what are you setting dtype to?

tulip wyvern May 23, 2023, 4:54 AM

#

float32

magic dune May 23, 2023, 4:54 AM

#

try long

#

?

tulip wyvern May 23, 2023, 4:55 AM

#

#

Then I get this when I try to pass it into the loss function
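Without seeing the notebook this is only a guess, but both symptoms fit `nn.CrossEntropyLoss` being used with a single output unit. When the target has the same shape as the logits, PyTorch treats it as class probabilities: float targets give a loss of exactly zero (softmax over one class is always 1), and long targets raise the error quoted above. The sketch reproduces both, then shows `nn.BCEWithLogitsLoss` as the usual choice for one logit per sample. Shapes and values are made up.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 1)            # one output unit per sample
labels = torch.tensor([0, 1, 1, 0])   # integer class labels

ce = nn.CrossEntropyLoss()

# Float target with the same shape as the logits: read as class
# probabilities over a single class, so the loss carries no signal.
degenerate = ce(logits, labels.float().unsqueeze(1))

# Long target with the same shape: rejected with a dtype error.
message = None
try:
    ce(logits, labels.unsqueeze(1))
except RuntimeError as err:
    message = str(err)

# One logit per sample pairs with BCEWithLogitsLoss and float targets.
bce = nn.BCEWithLogitsLoss()
loss = bce(logits.squeeze(1), labels.float())

print(degenerate.item(), message, loss.item())
```

The alternative is to keep `CrossEntropyLoss`, give the model two output units, and pass long targets of shape `(N,)`.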

lapis sequoia May 23, 2023, 4:56 AM

#

pydis

dusty bay May 23, 2023, 5:02 AM

#

dusty bay ``` "Level and Distortion",,,,,,,,,,,,,,, "Ch1 (F)",,"Ch1 (H2)",,"Ch1 (H3)",,"Ch...

@serene scaffold If using the code you sent it doesn't work. Please help me after you wake up
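For a file with several header/data sections, one approach is to read everything as strings, mark rows whose first cell parses as a number, and group consecutive rows of the same kind into blocks. This is a sketch under the assumption that data rows always begin with a number; the sample is a shortened version of the file shown above, with fewer columns and rounded values.

```python
import io

import pandas as pd

# Shortened sample with the same layout as the file in the chat.
raw = '''"Level and Distortion",,,
"Ch1 (F)",,"Ch1 (H2)",
X,Y,X,Y
Hz,Vrms,Hz,Vrms
20,0.0077,20,0.000056
"THD Ratio",,,
Ch1,,Ch2,
X,Y,X,Y
Hz,%,Hz,%
20,83.0,20,82.1
21.1,85.4,21.1,82.0
'''

# header=None keeps the first row as data; dtype=str avoids mixed types.
df = pd.read_csv(io.StringIO(raw), header=None, dtype=str)

# True where the first cell is numeric, False for header text.
is_data = pd.to_numeric(df[0], errors="coerce").notna()

# A new block starts whenever the row kind changes.
block_id = (is_data != is_data.shift()).cumsum()

sections = []
for _, part in df.groupby(block_id):
    kind = "data" if is_data[part.index[0]] else "header"
    sections.append((kind, part))

for kind, part in sections:
    print(kind, len(part), "rows")
```

Each data block can then be converted with `part.dropna(axis=1, how="all").astype(float)` and paired with the header block just before it.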

fiery grotto May 23, 2023, 6:41 AM

#

I'm not sure if this is the right channel to request help but I had an old nvidia jetson nano and I wanted to try running some pytorch code on it. I don't have much experience with linux which made installing python and pytorch on it a pain. So far I have installed python 3.8 which the jetpack version im running isn't meant for but I installed it anyway and it seemed to work fine, also i needed that version or higher for the application im making.

I have downloaded and installed pytorch manually from the binaries available on https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048 and it seemed to go fine apart from a few bumps that I was able to solve. I then tried downloading and installing torchvision, which didn't go as smoothly. I downloaded it using

git clone --branch v0.12.0 https://github.com/pytorch/vision torchvision

and installed it using

cd torchvision
python3.8 setup.py install --user

and it's the correct version of torchvision for the version of torch I'm using.
I ran into this error while installing:
`OSError: libmpi_cxx.so.40: cannot open shared object file: No such file or directory`
I found on Google that I had to run the command `sudo apt-get install libopenmpi3`, but apt said it was unable to locate the package libopenmpi3.

Can someone help me out with this? If this is the wrong channel, please direct me to the correct one.

NVIDIA Developer Forums

PyTorch for Jetson

Below are pre-built PyTorch pip wheel installers for Jetson Nano, TX1/TX2, Xavier, and Orin with JetPack 4.2 and newer. Download one of the PyTorch binaries from below for your version of JetPack, and see the installation instructions to run on your Jetson. These pip wheels are built for ARM aarch64 architecture, so run these commands on your...

past meteor May 23, 2023, 6:47 AM

#

fiery grotto I'm not sure if this is the right channel to request help but I had an old nvidi...

My recommendation is to really really read the manual well. I didn't do the installation part but I remember one of my colleagues managed to flash the Nvidia machine we had. I think we ended up running containers on them.

fiery grotto May 23, 2023, 6:53 AM

#

past meteor My recommendation is to really really read the manual well. I didn't do the inst...

I did flash my jetson multiple times and even tried other versions of jetpack and it didn’t help. I read through multiple articles about running pytorch on the jetson nano and I didn’t seem to be missing anything. One reason I feel it isn’t working is because of the jetpack version. The fact that the latest version for the jetson nano wasn’t made for python3.8 could be a reason. I don’t think it will work but I will try to use l4t pytorch instead

potent pollen May 23, 2023, 7:47 AM

#

Hello! I want to create an AI that plays Tetris using machine learning. I've tried to build its neural network, but after 1000 generations of 12 agents (I know that's really low, but I don't have a choice since I'm making this AI on a calculator) they don't play any better. I've concluded that the problem is in the neural network itself. On the input layer I was giving the agent the display of the Tetris piece in binary (an array of dimensions 4x4: a 1 where there is a block and a 0 where there isn't). The other input I was giving it was the distance from the bottom of the piece to the ground. I thought the AI could understand what the display means, but it seems I was wrong, since all of them keep failing. The other idea I had was to give it, instead of the distance to the ground, the whole Tetris grid, an array of dimensions 10x20, but that is really a lot and I would like to keep it as small as possible. Does anyone have an idea of the inputs I could give to the neural network, please?

faint mist May 23, 2023, 8:52 AM

#

when I run this code:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(
    models=[AutoARIMA],
    freq='B',
    n_jobs=-1
)

sf.fit(train)

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 sf.fit(train)

File ~\miniconda3\envs\thesis\lib\site-packages\statsforecast\core.py:579, in _StatsForecast.fit(self, df, sort_df)
    577 self._prepare_fit(df, sort_df)
    578 if self.n_jobs == 1:
--> 579     self.fitted_ = self.ga.fit(models=self.models)
    580 else:
    581     self.fitted_ = self._fit_parallel()

File ~\miniconda3\envs\thesis\lib\site-packages\statsforecast\core.py:72, in GroupedArray.fit(self, models)
     70     X = grp[:, 1:] if (grp.ndim == 2 and grp.shape[1] > 1) else None
     71     for i_model, model in enumerate(models):
---> 72         new_model = model.new()
     73         fm[i, i_model] = new_model.fit(y=y, X=X)
     74 return fm

TypeError: _TS.new() missing 1 required positional argument: 'self'

#

not sure what I am doing wrong here?

boreal gale May 23, 2023, 8:56 AM

#

example from https://github.com/Nixtla/statsforecast
shows:

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(
    models = [AutoARIMA(season_length = 12)],
    freq = 'M'
)

sf.fit(df)
sf.predict(h=12, level=[95])

note the (...) after AutoARIMA, you are meant to pass instances of AutoARIMA not the class of AutoARIMA into models (it doesn't have to be AutoARIMA - it could be any other time series model)
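The `TypeError` in the traceback is what Python raises whenever an instance method is called on the class itself. A minimal sketch, using a made-up `ToyModel` class and `fit_all` helper (both hypothetical, not part of statsforecast), reproduces the same message:

```python
class ToyModel:
    """Hypothetical stand-in for a forecasting model class."""

    def __init__(self, season_length=1):
        self.season_length = season_length

    def new(self):
        # Instance method: it needs `self`, so it must be called on an instance.
        return ToyModel(self.season_length)


def fit_all(models):
    # Same shape as the loop in the traceback: call .new() on each entry.
    return [model.new() for model in models]


fitted = fit_all([ToyModel(season_length=12)])  # instances: works

try:
    fit_all([ToyModel])  # the class itself: .new() receives no `self`
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Passing `ToyModel` instead of `ToyModel(...)` fails only once `.new()` is called, which is why the error shows up inside `fit` and not when the list is built.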

faint mist May 23, 2023, 8:58 AM

#

Ahhh, stupid of me.

#

Thank you for noticing that, feels so stupid of me lol.

boreal gale May 23, 2023, 8:59 AM

#

nah, this sort of thing happens to everyone 😉

#

the library looks interesting, so i learnt something as well heh

faint mist May 23, 2023, 9:00 AM

#

Suggested by @past meteor

faint mist May 23, 2023, 9:32 AM

#

I am still reading through the source code and docs.

past meteor May 23, 2023, 9:33 AM

#

boreal gale the library looks interesting, so i learnt something as well heh

Yup! I got tired of packages just wrapping either statsmodels or pmdarima which are horribly slow, at least this one implements everything in pure Python. Means I don't need to go back and forth between Python/R

faint mist May 23, 2023, 9:33 AM

#

Since I am only interested in the AutoARIMA class.

#

I see that I can instantiate AutoARIMA() directly and fit my timeseries

#

but I don't see how can I view the best ARIMA order

boreal gale May 23, 2023, 9:35 AM

#

past meteor Yup! I got tired of packages just wrapping either statsmodels or pmdarima which a...

that's fairly surprising, actually pure python?
am wondering how they get that sort of performance in terms of training speed

past meteor May 23, 2023, 9:35 AM

#

boreal gale that's fairly surprising, actually pure python? am wondering how they get that s...

Numba

boreal gale May 23, 2023, 9:35 AM

#

oh right ha

faint mist May 23, 2023, 9:35 AM

#

past meteor A specific answer to your question, you can "unroll" your model and predict the ...

I am trying to implement the second option here.

past meteor May 23, 2023, 9:36 AM

#

You can use the forward method

faint mist May 23, 2023, 9:36 AM

#

https://github.com/Nixtla/statsforecast/blob/main/statsforecast/models.py#L515

#

Lol, that's what I was trying to ask

#

if I can use the forward method for my case

past meteor May 23, 2023, 9:38 AM

#

It's actually SARIMAX so I'd switch off whatever terms you don't want

#

For example, a dataset I was working with didn't have any seasonality or trend so no point in going beyond an "autoARMA" for me

faint mist May 23, 2023, 9:39 AM

#

I see.

#

in my case I am looking at a dataset of gold prices

#

I did the timeseries decomposition and I can see the seasonal component

past meteor May 23, 2023, 9:41 AM

#

I wouldn't know a priori if it has seasonality and trend, you can plot the data / do a unit root test if you want

faint mist May 23, 2023, 9:41 AM

#

but not sure how frequent it is.

past meteor May 23, 2023, 9:41 AM

#

But this is @boreal gale 's territory moreso than mine 😅

faint mist May 23, 2023, 9:42 AM

#

I will look into unit root test

#

Thank you both!

#

Ah, by unit root test you mean ADF

past meteor May 23, 2023, 9:44 AM

#

There is also ljung-box etc.

#

I'd closely read exactly what they do because they trip me up as well. Hypothesis testing isn't my forte aside from when I was actively taking statistics/econometrics classes

faint mist May 23, 2023, 9:46 AM

#

I hate statistics and here I am

faint mist May 23, 2023, 10:02 AM

#

when running this code:

from statsforecast.models import AutoARIMA
arima = AutoARIMA(
    max_p=21,
    max_d=2,
    max_q=21,
    seasonal=False,
    trace=True,
    stepwise=False,
    parallel=False,
)

arima_fitted = arima.fit(train['y'].to_numpy())

#

I get this error:

File ~\miniconda3\envs\thesis\lib\site-packages\statsforecast\arima.py:1337, in search_arima(x, d, D, max_p, max_q, max_P, max_Q, max_order, stationary, ic, trace, approximation, xreg, offset, allow_drift, allow_mean, parallel, num_cores, period, **kwargs)
   1335 else:
   1336     raise NotImplementedError("parallel=True")
-> 1337 return best_fit

UnboundLocalError: local variable 'best_fit' referenced before assignment

#

def search_arima(
    x,
    d=0,
    D=0,
    max_p=5,
    max_q=5,
    max_P=2,
    max_Q=2,
    max_order=5,
    stationary=False,
    ic="aic",
    trace=False,
    approximation=False,
    xreg=None,
    offset=None,
    allow_drift=True,
    allow_mean=True,
    parallel=False,
    num_cores=2,
    period=1,
    **kwargs
):
    m = period
    allow_drift = allow_drift and (d + D) == 1
    allow_mean = allow_mean and (d + D) == 0
    # max_K = allow_drift or allow_mean

    if not parallel:
        best_ic = np.inf
        for i in range(max_p):
            for j in range(max_q):
                for I in range(max_P):
                    for J in range(max_Q):
                        if i + j + I + J > max_order:
                            continue
                        fit = myarima(
                            x,
                            order=(i, d, j),
                            seasonal={"order": (I, D, J), "period": m},
                        )
                        if fit["ic"] < best_ic:
                            best_ic = fit["ic"]
                            best_fit = fit
    else:
        raise NotImplementedError("parallel=True")
    return best_fit

#

am I missing something here ? from the error it seems it returns best_fit before doing the assignment ?

#

It seems the error is related to stepwise=False

#

quick question on how to use the forward() method

merry ridge May 23, 2023, 10:44 AM

#

I use a fair bit of statistics but I haven't used Python in a long time. At a glance you are never satisfying that final if statement that assigns best_fit.

faint mist May 23, 2023, 10:45 AM

#

merry ridge I use a fair bit of statistics but I haven't used Python in a long time. At a gl...

I believe that too, I am actually using a library and I think it is a bug when setting stepwise=False
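One possible explanation, assuming that `seasonal=False` leads the library to call `search_arima` with `max_P=0` and `max_Q=0` (an assumption, not checked against the statsforecast source): `range(0)` is empty, so the innermost loop body never runs and `best_fit` is never bound. The toy function below is not the library code; `toy_search` and its placeholder score are made up to reproduce the pattern.

```python
import math


def toy_search(max_p, max_q, max_P, max_Q):
    """Made-up grid search with the same loop shape as the snippet above."""
    best_ic = math.inf
    for i in range(max_p):
        for j in range(max_q):
            for I in range(max_P):
                for J in range(max_Q):
                    ic = i + j + I + J  # placeholder score, not a real AIC
                    if ic < best_ic:
                        best_ic = ic
                        best_fit = (i, j, I, J)
    return best_fit  # unbound if the innermost body never ran


seasonal_result = toy_search(2, 2, 2, 2)

try:
    toy_search(21, 21, 0, 0)  # empty seasonal ranges
except UnboundLocalError as exc:
    failure = exc
    print(type(exc).__name__)
```

Initialising `best_fit = None` before the loops and raising a clear error when it is still `None` would make this failure mode easier to read. Note also that `range(max_p)` stops at `max_p - 1`.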

faint mist May 23, 2023, 10:46 AM

#

faint mist quick question on how to use the ```forward()``` method

# create empty array for predictions
predictions = []
# loop through test set
for i in range(len(test), len(df)):
    # predict next value
    prediction = arima.forward(df['y'].iloc[:i].to_numpy(), h=1)
    # append to predictions list
    predictions.append(prediction)

#

does this make sense ?
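A rough sketch of the rolling one-step-ahead loop, assuming the test set is the last `n_test` rows of the series. `naive_forward` is a made-up stand-in for the model's `forward` call (it just repeats the last observed value), so only the indexing is illustrated here. The loop starts at `len(y) - n_test`; starting at `len(test)` only lines up when the train and test sets happen to be the same length.

```python
import numpy as np


def naive_forward(history, h=1):
    """Hypothetical stand-in for a model's forward(): repeat the last value."""
    return np.repeat(history[-1], h)


y = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
n_test = 2
split = len(y) - n_test  # index of the first test observation

predictions = []
for i in range(split, len(y)):
    # Use everything observed before step i, then predict step i.
    step_forecast = naive_forward(y[:i], h=1)
    predictions.append(float(step_forecast[0]))

print(predictions)  # one prediction per test observation
```

Appending the scalar forecast keeps `predictions` a flat list that can be compared directly with the test values.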

merry ridge May 23, 2023, 10:46 AM

#

What does ic mean in this context

faint mist May 23, 2023, 10:47 AM

#

I think information criterion.

#

Default is AIC

merry ridge May 23, 2023, 10:48 AM

#

Oh okay. I would at least print each IC score it produces on each iteration and see what it is producing first. I assume it's producing none or something.

#

The other possibility without reading any documentation is that you define a variable ic to be the string "aic" but later write fit["ic"]. That doesn't look like you want to do that.

#

But I've written maybe 100 lines of code in the last 6 months so I can't recall that much syntax off the top of my head.

orchid cargo May 23, 2023, 12:57 PM

#

Hey, for making some AI in Python, what package do I have to install?

wild sluice May 23, 2023, 1:04 PM

#

So I have a question. I'm new to machine learning, but in every tutorial I watch all they do is import a library after teaching a model and it just does all the work for them. Is this what machine learning looks like for the average person, or is it much more complex?

mild dirge May 23, 2023, 1:37 PM

#

Well, there are multiple levels of complexity for libraries. Some allow you to get really low level and define things very specifically, whereas others have preset models and premade layers.

#

There are a lot of repeatable concepts in machine learning, so making it from scratch for every project would not be needed.

#

But understanding how the model works is a whole other concept, you can make a very complex model with incomprehensible concepts with a few lines in Python while not understanding any of it, and sometimes still be successful.

#

But it is mostly when it doesn't work from the get go that you need to understand how it works. @wild sluice

wild sluice May 23, 2023, 1:39 PM

#

thnx

wild sluice May 23, 2023, 1:45 PM

#

mild dirge But it is mostly when it doesn't work from the get go that you need to understan...

but doesn't that mean machine learning is easy to pick up on?

mild dirge May 23, 2023, 1:45 PM

#

Practically maybe, but theoretically I wouldn't say so no.

past meteor May 23, 2023, 1:45 PM

#

Just like programming, it's very easy to start but there's a million rabbit holes

mild dirge May 23, 2023, 1:45 PM

#

But again, if you don't understand it, and it doesn't work from the start, you will have a hard time fixing it.

#

Because you don't understand why it does not work

#

And especially the fine tuning of a model, and those last few % of accuracy are hard to get

past meteor May 23, 2023, 1:49 PM

#

wild sluice but doesn't that mean machine learning is easy to pick up on?

Imo it's worth it to just start/keep going. Over the years (and books) more and more stuff will make sense

wild sluice May 23, 2023, 1:51 PM

#

So if I'm learning machine learning do I also need skills like web app dev, cloud computing or networking?

serene scaffold May 23, 2023, 1:51 PM

#

wild sluice So if I'm learning machine learning do I also need skills like web app dev, clou...

not really

wild sluice May 23, 2023, 1:51 PM

#

huh

past meteor May 23, 2023, 1:52 PM

#

You can get by without those but knowing more than 1 thing is generally a good idea in my humble opinion (this applies for every subdomain)

serene scaffold May 23, 2023, 2:23 PM

#

past meteor You can get by without those but knowing more than 1 thing is generally a good i...

if that's your humble opinion, what is your arrogant opinion?

past meteor May 23, 2023, 2:26 PM

#

serene scaffold if that's your humble opinion, what is your arrogant opinion?

On this topic, none! 😄 But I guess I have a bunch of hot takes on other ones though

serene scaffold May 23, 2023, 2:27 PM

#

past meteor On this topic, none! 😄 But I guess I have a bunch of hot takes on other ones th...

share the hot takes
I wanna feel this channel burn.

past meteor May 23, 2023, 2:32 PM

#

The one I get the most flak for saying is that Pandas is unintuitive as hell

faint mist May 23, 2023, 2:54 PM

#

I have compared three deep learning models for timeseries forecasting using one step ahead, and multi-output forecast

#

Compared to ARIMA, ARIMA did outperform all of them

#

The deep learning models are LSTM, MLP and LSTM-MLP

#

LSTM being the worst

past meteor May 23, 2023, 2:55 PM

#

Arima ❤️

faint mist May 23, 2023, 2:55 PM

#

This is confusing to me

#

because I've read a lot of literature saying that deep learning outperforms ARIMA

#

not sure if I am doing something wrong

past meteor May 23, 2023, 2:56 PM

#

No, there's a bunch of papers that back up that specifically for forecasting traditional methods can outperform neural ones (see: https://www.sciencedirect.com/science/article/pii/S0169207019301128 )

The M4 Competition: 100,000 time series and 61 forecasting methods

The M4 Competition follows on from the three previous M competitions, the purpose of which was to learn from empirical evidence both how to improve th…

#

It really depends on a bunch of factors tbh, ARIMA has its own drawbacks

faint mist May 23, 2023, 3:02 PM

#

Interesting!

#

I will have a look at it

lapis sequoia May 23, 2023, 5:49 PM

#

guys

#

If I need to compare two models performance using paired t-tests, can I use validation accuracy for that? Since test accuracy is only one

#

but using validation to compare is bad

#

so idk

valid heath May 23, 2023, 6:07 PM

#

numpy question:
how do i convert an array from this [[ 2. 4.], [ 4. 8.], [ 6. 12.]] to
[[[2, 2, 2], [4, 4, 4]], [[4, 4, 4], [8, 8, 8]], [[6, 6, 6], [12, 12, 12]]]

agile cobalt May 23, 2023, 6:11 PM

#

reshape then repeat I think?

valid heath May 23, 2023, 6:11 PM

#

yes

agile cobalt May 23, 2023, 6:12 PM

#

probably something like np.repeat(arr.reshape(...), ...) experiment a bit

valid heath May 23, 2023, 6:12 PM

#

okk, i'll try, thanks

mild dirge May 23, 2023, 7:27 PM

#

!e

import numpy as np

arr = np.array([[ 2.,  4.], [ 4.,  8.], [ 6., 12.]])
new_arr = np.repeat(arr.reshape(-1, 1), 3, axis=1)
print(new_arr)

arctic wedgeBOT May 23, 2023, 7:27 PM

#

@mild dirge :white_check_mark: Your 3.11 eval job has completed with return code 0.

[[ 2.  2.  2.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 8.  8.  8.]
 [ 6.  6.  6.]
 [12. 12. 12.]]

mild dirge May 23, 2023, 7:27 PM

#

Hmm, oh actually not the right shape

#

!e

import numpy as np

arr = np.array([[ 2.,  4.], [ 4.,  8.], [ 6., 12.]])
new_arr = np.repeat(np.expand_dims(arr, axis=-1), 3, axis=-1)
print(new_arr)

arctic wedgeBOT May 23, 2023, 7:29 PM

#

@mild dirge :white_check_mark: Your 3.11 eval job has completed with return code 0.

[[[ 2.  2.  2.]
  [ 4.  4.  4.]]

 [[ 4.  4.  4.]
  [ 8.  8.  8.]]

 [[ 6.  6.  6.]
  [12. 12. 12.]]]

mild dirge May 23, 2023, 7:30 PM

#

@valid heath That's it right?

#

If you haven't figured it out yourself by now 😛
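For reference, the two attempts differ only in where the new axis goes. A small sketch comparing the resulting shapes, plus a broadcasting variant (`np.broadcast_to` returns a read-only view, hence the `.copy()`):

```python
import numpy as np

arr = np.array([[2.0, 4.0], [4.0, 8.0], [6.0, 12.0]])

# First attempt: flattening to a column loses the (3, 2) layout.
flat_repeat = np.repeat(arr.reshape(-1, 1), 3, axis=1)

# Second attempt: add a trailing axis, then repeat along it.
stacked = np.repeat(np.expand_dims(arr, axis=-1), 3, axis=-1)

# Same result via broadcasting.
broadcast = np.broadcast_to(arr[:, :, None], (3, 2, 3)).copy()

print(flat_repeat.shape, stacked.shape, broadcast.shape)
```

Printing `.shape` after each step is usually the quickest way to see whether a reshape did what was intended.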

weary vigil May 23, 2023, 11:44 PM

#

Hi, is there currently a tool that allows you to chat with a website?

raven summit May 24, 2023, 12:06 AM

#

Everything You Need to Know About Microsoft Fabric https://medium.com/@pratapakshay096/microsoft-fabric-everything-you-need-to-know-522667ee493d

Medium

Microsoft Fabric: Everything you need to know

If you are interested in data and analytics, you have definitely heard of Microsoft Fabric, a new end-to-end data and analytics platform…

serene scaffold May 24, 2023, 1:27 AM

#

weary vigil Hi, is there currently a tool that allows you to **chat with a website**?

what does it mean to chat with a website?

serene scaffold May 24, 2023, 1:28 AM

#

raven summit Everything You Need to Know About Microsoft Fabric https://medium.com/@pratapaks...

Please don't "drop and run" links here. if you want to have a discussion about it, start by saying why you think it's interesting.

dusty bay May 24, 2023, 2:19 AM

#

i have separated header with data in csv file next i have to do multilevel using pandas. Does anyone know the steps?

#

this is the data and headers after I separated them

#

serene scaffold May 24, 2023, 2:30 AM

#

@dusty bay please only ask your question in one place, to prevent duplicated effort. you can cross post a help thread in a topical channel by linking to it.

#

#1110755910466404462 message is the thread.

regal vault May 24, 2023, 3:38 AM

#

I'm trying to run code on the GPU using numba but am running into problems due to using classes (which I made, and which consist of pure Python code or code from other classes I've made that are also pure Python)

#

idk how to get past it

serene scaffold May 24, 2023, 3:39 AM

#

regal vault im trying to run code on the gpu useing numba but am running into problems due t...

please show code and error

regal vault May 24, 2023, 3:39 AM

#

from point import Point
from ray import Ray
from color import Color
from numba import jit
class RenderEngine:
    @jit(nopython =True)
    def render(self, scene):
        width = scene.width
        height = scene.hieght
        aspect_ratio = float(width) / height
        x0 = -1.0
        x1 = +1.0
        xstep = (x1 - x0) / (width - 1)
        y0 = -1.0 / aspect_ratio
        y1 = +1.0 / aspect_ratio
        ystep = (y1 - y0) / (height - 1)
        
        camera = scene.camera
        pixels = Image(width, height)
        
        for j in range(height):
            y = y0 + j * ystep
            for i in range(width):
                x = x0 + i *xstep
                ray = Ray(camera, Point(x,y,0) - camera)
                pixels.set_pixel(i, j, self.ray_trace(ray, scene))
        return pixels
    def ray_trace(self, ray, scene):
        color = Color(0,0,0)
        dist_hit , obj_hit = self.find_nearest(ray, scene)
        if obj_hit is None:
            return color
        hit_pos = ray.origin + ray.direction * dist_hit
        color += self.color_at(obj_hit, hit_pos, scene)
        return color
    
    def find_nearest(self, ray, scene):
        dist_min = None
        obj_hit = None
        for obj in scene.objects:
            dist = obj.intersects(ray)
            if dist is not None and (obj_hit is None or dist < dist_min):
                dist_min = dist
                obj_hit = obj
        return (dist_min, obj_hit)
    def color_at(self, obj_hit, hit_pos, scene):
        return obj_hit.material
        
       

#

this is the engine class

#

here is the main class

#

from image import Image
from color import Color
from point import Point
from Sphere import Sphere
from vector import Vector
from scene import Scene
from engine import RenderEngine


def main():
    WIDTH = 1920
    HEIGHT = 1080
    camera = Vector(0, 0, -2)
    objects = [Sphere(Point(0,0,0), 0.5, Color.from_hex("#FF0000"))]
    scene = Scene(camera, objects, WIDTH,HEIGHT)    
    engine = RenderEngine()
    
    image = engine.render(scene)
    with open ('result.ppm', 'w') as img_file:
        image.write_ppm(img_file)
    


if __name__ == '__main__':
    main()

#

here is the error

#

  File "c:\Users\siddu\Documents\raytracer\main.py", line 25, in <module>
    main()
  File "c:\Users\siddu\Documents\raytracer\main.py", line 18, in main
    image = engine.render(scene)
            ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\siddu\AppData\Local\Programs\Python\Python311\Lib\site-packages\numba\core\dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "C:\Users\siddu\AppData\Local\Programs\Python\Python311\Lib\site-packages\numba\core\dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'Image': Cannot determine Numba type of <class 'type'>

File "engine.py", line 20:
    def render(self, scene):
        <source elided>
        camera = scene.camera
        pixels = Image(width, height)
        ^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'engine.RenderEngine'>
- argument 1: Cannot determine Numba type of <class 'scene.Scene'>

#

@serene scaffold

#

I can send the project folder over a drive link

#

or send the files over discord

serene scaffold May 24, 2023, 3:53 AM

#

regal vault I can send the project folder over a drive link

I don't know how to help with this one. I was just encouraging you to unpack your question, to increase your chances of getting an answer.
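The error says that nopython mode cannot assign a Numba type to ordinary Python classes such as `RenderEngine`, `Scene` or `Image`. One common way around that is to move the hot loop into a plain function that only touches numbers and NumPy arrays. The sketch below is an assumption-heavy rewrite, not the original project: `render_sphere` and its parameters are made up, it handles a single sphere, and it compiles for the CPU (running on the GPU would need a different approach such as `numba.cuda`, which is not shown).

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:  # keep the sketch runnable without Numba installed
    def njit(func):
        return func


@njit
def render_sphere(width, height, camera, center, radius, color):
    """Trace one sphere into a (height, width, 3) float array."""
    pixels = np.zeros((height, width, 3))
    aspect_ratio = width / height
    x0 = -1.0
    y0 = -1.0 / aspect_ratio
    xstep = 2.0 / (width - 1)
    ystep = (2.0 / aspect_ratio) / (height - 1)
    # Vector from the sphere centre to the camera (the ray origin).
    ox = camera[0] - center[0]
    oy = camera[1] - center[1]
    oz = camera[2] - center[2]
    c = ox * ox + oy * oy + oz * oz - radius * radius
    for j in range(height):
        y = y0 + j * ystep
        for i in range(width):
            x = x0 + i * xstep
            # Normalised direction from the camera through pixel (x, y, 0).
            dx = x - camera[0]
            dy = y - camera[1]
            dz = -camera[2]
            norm = math.sqrt(dx * dx + dy * dy + dz * dz)
            dx /= norm
            dy /= norm
            dz /= norm
            # Ray-sphere intersection via the quadratic discriminant.
            b = 2.0 * (dx * ox + dy * oy + dz * oz)
            disc = b * b - 4.0 * c
            if disc >= 0.0 and (-b - math.sqrt(disc)) / 2.0 > 0.0:
                pixels[j, i, 0] = color[0]
                pixels[j, i, 1] = color[1]
                pixels[j, i, 2] = color[2]
    return pixels


image = render_sphere(
    64, 36,
    np.array([0.0, 0.0, -2.0]),  # camera position
    np.array([0.0, 0.0, 0.0]),   # sphere centre
    0.5,                         # sphere radius
    np.array([1.0, 0.0, 0.0]),   # red
)
print(image.shape)
```

The classes can stay for scene setup and file output; only the per-pixel loop needs to be expressed in types Numba understands.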

regal vault May 24, 2023, 3:53 AM

#

ah

dull flare May 24, 2023, 11:10 AM

#

hlo, how do i check my gpu with tensorflow

#

when I ran tf.config.list_physical_devices('GPU') it showed nothing except these brackets: []

#

i started learning tensorflow just yesterday

cold osprey May 24, 2023, 11:15 AM

#

Windows?

dull flare May 24, 2023, 11:15 AM

#

yes 11

#

rtx 3050Ti

cold osprey May 24, 2023, 11:16 AM

#

U need WSL

#

Tensorflow doesn't run on gpu natively on windows anymore

#

There's always pytorch

dull flare May 24, 2023, 11:17 AM

#

yea pytorch recognises it

#

but tf doesnt
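A quick check that can be run in both the native Windows and the WSL environment. An empty list from `tf.config.list_physical_devices('GPU')` means TensorFlow found no usable GPU. TensorFlow 2.10 was the last release with native Windows GPU support; later versions need WSL2 for that. The helper name `describe_gpu_support` is made up for this sketch, and it returns early if TensorFlow is not installed.

```python
import importlib.util
import platform


def describe_gpu_support():
    """Return a small report on TensorFlow's view of the GPU (sketch)."""
    report = {"os": platform.system(), "tensorflow": None, "gpus": []}
    if importlib.util.find_spec("tensorflow") is None:
        return report  # TensorFlow is not installed in this environment
    import tensorflow as tf

    report["tensorflow"] = tf.__version__
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


report = describe_gpu_support()
print(report)
```

Comparing the printed version and GPU list between the two environments shows whether the problem is the install or the platform.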

#

wheat snow May 24, 2023, 11:55 AM

#

i want to change a column in my df to be replaced by the backup because i made some errors:

df['ID']= df_backup['ID']

would that be correct?

#

ID old, e.g.:            Backup
93562                    ID:93562
01292                    ID:01292
32570                      ...  
32048                      ...
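The assignment itself is valid pandas. One thing worth knowing is that it aligns on index labels, not on row position, so it behaves as expected only if `df` and `df_backup` still share the same index. A small sketch with made-up data (the values reuse the examples above; the dropped row is an assumption for illustration):

```python
import pandas as pd

df_backup = pd.DataFrame({"ID": ["ID:93562", "ID:01292", "ID:32570"]})

# Suppose row 1 was dropped from df at some point, so its index is [0, 2].
df = pd.DataFrame({"ID": ["93562", "32570"]}, index=[0, 2])

# Values are matched by index label, so row 2 gets the backup's row 2.
df["ID"] = df_backup["ID"]
restored = df["ID"].tolist()

# A label that does not exist in the backup becomes NaN.
other = pd.DataFrame({"ID": ["93562", "00000"]}, index=[0, 5])
other["ID"] = df_backup["ID"]
missing_label = pd.isna(other.loc[5, "ID"])

print(restored, missing_label)
```

If the index was reset or re-sorted after the backup was taken, the rows may no longer line up, so it is worth comparing `df.index` with `df_backup.index` first.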

potent garnet May 24, 2023, 12:55 PM

#

Hi everyone, I have a question. I'm working on a dataset where some columns have 50K or 60K missing values, and the dataset has 99K rows. How can I handle so many missing values? I would be very happy if someone could help

cold osprey May 24, 2023, 12:56 PM

#

Depends on the data but drop the column I guess
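Whether dropping is right depends on the data, as noted above. A small sketch of a usual first step: measure the missing fraction per column, then drop columns above a cutoff. The 50% threshold and the column names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [np.nan, np.nan, np.nan, 52000],
    "city": ["Oslo", "Lima", None, "Pune"],
})

missing_fraction = df.isna().mean()  # share of missing values per column
threshold = 0.5                      # arbitrary cutoff for this sketch
keep = missing_fraction[missing_fraction <= threshold].index
reduced = df[keep]

print(missing_fraction.round(2).to_dict())
print(list(reduced.columns))
```

Columns that stay can then be imputed or left as-is, depending on what the model tolerates.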

steel dagger May 24, 2023, 1:55 PM

#

How can I fine tune word embeddings with gensim? can anyone help?

keen gust May 24, 2023, 2:19 PM

#

currently have a function that I've tweaked several times to either return a dataframe, variable, or even made some errors so that it should not work. This function is called in another file that displays a df within a streamlit page but when calling the function on this page, the output never changes. Whatever first loaded on that page is what persists no matter what changes are made to the underlying code. Is this just a streamlit bug? The function itself is just cleaning up a df based on the api key passed.

potent sky May 24, 2023, 3:26 PM

#

wdym by "changes are made to the underlying code"?
While you have a streamlit app running, you make some changes to the code and you want those to reflect in the app?

keen gust May 24, 2023, 3:32 PM

#

potent sky wdym by "changes are made to the underlying code"? While you have a streamlit ap...

yes, while I have the page open (or close it and reopen), if I edit the function that is in one file and then import & call it in another file/streamlit page, it never returns anything different than what originally loaded the first time. I could break the code and it will still function, it's as if those changes never reach the function call.

#

at this point I'm just coding within the individual pages which is fine since it isn't time consuming, mostly copy & paste, but I'd like to understand why the above didn't work as I thought it would.

potent sky May 24, 2023, 3:47 PM

#

I suspect the issue you're facing is not a characteristic of streamlit but python itself. When importing a module, it is loaded into memory for that "runtime". If the import statement is encountered again, it doesn't make much sense to load the same code in again (with associated overhead). So python loads imports only once.
If you want to reload a module explicitly, maybe this will help:
https://docs.python.org/3/library/importlib.html#importlib.reload

Python documentation

importlib — The implementation of import

Source code: Lib/importlib/__init__.py Introduction: The purpose of the importlib package is three-fold. One is to provide the implementation of the import statement (and thus, by extension, the __i...
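The caching behaviour described above can be seen in isolation with a throwaway module. This sketch writes a temporary file named `cleaning_demo.py` (a made-up name), imports it, edits it on disk, and shows that only `importlib.reload` picks up the change. It says nothing about how Streamlit itself handles reruns.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale .pyc files in this tiny demo

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "cleaning_demo.py"
    module_path.write_text("def clean():\n    return 'first version'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import cleaning_demo
    before = cleaning_demo.clean()

    # Edit the file on disk while the interpreter keeps running.
    module_path.write_text("def clean():\n    return 'second version, edited'\n")

    import cleaning_demo  # no effect: the module is already in sys.modules
    cached = cleaning_demo.clean()

    importlib.reload(cleaning_demo)  # re-executes the module source
    after = cleaning_demo.clean()

    sys.path.remove(tmp)

print(before, "|", cached, "|", after)
```

Names imported with `from module import func` keep pointing at the old function after a reload, so calling through the module object (`module.func()`) is the safer pattern when reloading.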

#

!d importlib.reload

arctic wedgeBOT May 24, 2023, 3:47 PM

#

importlib.reload


importlib.reload(module)
Reload a previously imported *module*. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (which can be different if re-importing causes a different object to be placed in [`sys.modules`](https://docs.python.org/3/library/sys.html#sys.modules "sys.modules")).

When [`reload()`](https://docs.python.org/3/library/importlib.html#importlib.reload "importlib.reload") is executed:

potent sky May 24, 2023, 3:49 PM

#

keen gust at this point I'm just coding within the individual pages which is fine since it...

^^

keen gust May 24, 2023, 4:04 PM

#

potent sky ^^

thank you very much, makes sense. Just not something I was aware of!

burnt pond May 24, 2023, 4:09 PM

#

i want courses or tutorials on ml and ai for python

#

Also, what is this TensorFlow Playground, how do I use it, and what is this neural network concept?

potent sky May 24, 2023, 4:43 PM

#

keen gust thank you very much, makes sense. Just not something I was aware of!

No problem!

potent sky May 24, 2023, 4:44 PM

#

burnt pond also what is this tensorflow playground how to use it what is this neural networ...

https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

Maybe this will help

YouTube

Neural networks

Learn the basics of neural networks and backpropagation, one of the most important algorithms for the modern world.

#

If you like 3b1b (who doesn't xd)

#

As a starting point

#

Tensorflow playground is an interactive graphical tool to visualize, tinker with, and understand the behavior of basic neural nets

potent sky May 24, 2023, 4:48 PM

#

burnt pond i want courses or tutorials on ml and ai for python

Depends on what kind of math/code balance you want and how deep you want to go.
But Andrew Ng's courses should be a good starting point

#

Oof it's been a while since I've been active here ^-^

plain jungle May 24, 2023, 6:57 PM

#

burnt pond i want courses or tutorials on ml and ai for python

https://youtu.be/x2YmEX1XzGI hopefully this helps getting your feet wet

YouTube

JTexpo

Math AI : Python Neural Network ( AI ) from Scratch Step-by-Step. O...

Automate algebraic problem solving with this comprehensive tutorial, where you'll learn how to implement a neural network to effortlessly tackle math questions. In this in-depth guide, we'll build a neural network from scratch using the powerful NumPy library, enabling you to create a dynamic model in Python.

Take your skills to the next level...

▶ Play video

#

I do have other videos on the channel and plan to release more soon

somber panther May 24, 2023, 8:23 PM

#

having a bit of trouble understanding axis in numpy, what does axis 0 represent in an n-dim array?

mild dirge May 24, 2023, 8:33 PM

#

!e

import numpy as np

# Make a 2d array
arr = np.arange(1, 17).reshape(4, 4)
print(arr, '\n')

# Take sum of axis 0
print('axis 0:\n', arr.sum(axis=0))

# Take sum of axis 1
print('axis 1:\n', arr.sum(axis=1))

arctic wedgeBOT May 24, 2023, 8:33 PM

#

@mild dirge :white_check_mark: Your 3.11 eval job has completed with return code 0.

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]] 

axis 0:
 [28 32 36 40]
axis 1:
 [10 26 42 58]

mild dirge May 24, 2023, 8:33 PM

#

@somber panther

#

axis 0 is the outermost dimension, so going from top to bottom

#

axis 1 is the inner dimension, going from left to right

somber panther May 24, 2023, 8:43 PM

#

mild dirge @somber panther

ty, this is similar to code i've looked at, if this were one layer in a 3d array would the outputs remain the same?

mild dirge May 24, 2023, 8:44 PM

#

If it was a 3d array, and you sum over a single axis, the result would be 2d

somber panther May 24, 2023, 8:44 PM

#

hmm, nvm the other layers would be added

mild dirge May 24, 2023, 8:44 PM

#

It reduces the specified dimension*

somber panther May 24, 2023, 8:44 PM

#

i see

mild dirge May 24, 2023, 8:45 PM

#

You can also sum over multiple axes though

#

You can think of a 3d numpy array as a cube of values. Each value in the 3d array has an x,y,z position

#

And summing over 1 axis, compresses that dimension, flattening it in that dimension

#

summing over 2 axes compresses both dimensions, in whatever order

somber panther May 24, 2023, 8:47 PM

#

sum axis 1 would return a 2d array representing the sums of each row, with each "layer" on a new row, correct?

#

used on a 3d array

mild dirge May 24, 2023, 8:48 PM

#

If you have a 3d array, and sum over axis 1, and you consider the axis 0 is x, axis 1 is y, axis 2 is z, so each value has (x, y, z) position. Then summing over axis 1 (y) means that you sum all values with the same x and z position.
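The (x, y, z) description above can be checked on a small array. A minimal sketch with an arbitrary 2x3x4 example:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # axes: (x=2, y=3, z=4)

summed = cube.sum(axis=1)              # collapse y, shape becomes (2, 4)

# Each entry is the sum over y for a fixed (x, z) position.
manual = cube[0, 0, 0] + cube[0, 1, 0] + cube[0, 2, 0]

both = cube.sum(axis=(0, 1))           # collapse x and y, shape becomes (4,)

print(summed.shape, summed[0, 0], manual, both.shape)
```

The axis passed to `sum` is the one that disappears from the shape; the remaining axes keep their order.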

somber panther May 24, 2023, 8:48 PM

#

kk, ty

#

hard for me to visualize a data cube, tend to think about it like a xlsx

mild dirge May 24, 2023, 8:50 PM

#

Can think of it as multiple images stacked together

#

One image is a rectangular grid, and we stack multiple on top

#

Thus making a 3d "cube" of data.

errant lake May 24, 2023, 10:08 PM

#

Hi, I'm fiddling with pandas and I get unexpected results. How to explain this?
Script is comparing the execution time of apply, np.where, list comprehension, str.contains on a column of string (which is one of my frequent use-cases)

import pandas as pd
import numpy as np
import time
from tqdm import tqdm
import matplotlib.pyplot as plt

# Create a dataframe with random strings
np.random.seed(0)
df = pd.DataFrame({'column': np.random.choice(['char', 'other', 'random', 'words', 'for', 'test'], 10_000_000)})

methods = ['apply with lambda', 'np.where', 'list comprehension', 'str.contains']
results = pd.DataFrame(index=methods, columns=[f'Test {i+1}' for i in range(5)])

for i in tqdm(range(5), desc='Running tests'):
    col = f'Test {i+1}'

    # apply with lambda
    start_time = time.time()
    df['column'].apply(lambda x: 'thing' if 'char' in x else 'some')
    # .loc[row, col] writes to results directly (no chained indexing)
    results.loc['apply with lambda', col] = time.time() - start_time

    # np.where
    start_time = time.time()
    np.where(df['column'].str.contains('char'), 'thing', 'some')
    results.loc['np.where', col] = time.time() - start_time

    # list comprehension
    start_time = time.time()
    ['thing' if 'char' in x else 'some' for x in df['column']]
    results.loc['list comprehension', col] = time.time() - start_time

    # str.contains
    start_time = time.time()
    df['column'].str.contains('char')
    results.loc['str.contains', col] = time.time() - start_time

# Plot results
results.mean(axis=1).plot(kind='bar', ylabel='Execution Time (seconds)')
plt.show()

wooden sail May 24, 2023, 10:12 PM

#

np where and str contains performing the same makes sense, since you use str contains inside the np where

#

i'm not sure how apply handles stuff internally

errant lake May 24, 2023, 10:16 PM

#

Hmm any other way to test np.where() while not using str.contains?

wooden sail May 24, 2023, 10:18 PM

#

try with in

#

idk, this is a weird example because ideally you wouldn't do this this way either, you can just use == instead

errant lake May 24, 2023, 10:21 PM

#

True, I'm often replacing part of a string with another, and I tried to generalize this

wooden sail May 24, 2023, 10:22 PM

#

keep in mind what np where does is take an array of bools and use that to conditionally return the other 2 params
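To make that concrete, a small sketch that separates the two steps: building the boolean mask, and letting `np.where` choose between the two values. The sample strings are arbitrary.

```python
import numpy as np
import pandas as pd

column = pd.Series(["char", "other", "random", "char"])

# Step 1: build the boolean mask (here with ==, an exact match).
mask = (column == "char").to_numpy()

# Step 2: np.where only picks between the two values using the mask.
labels = np.where(mask, "thing", "some")

# Substring matching changes only step 1, not the np.where call.
substring_mask = np.array(["char" in value for value in column])

print(labels.tolist())
```

When timing `np.where`, most of the cost comes from whichever method builds the mask, so the mask step is the part worth benchmarking separately.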