normal acorn Sep 5, 2023, 10:01 PM

#

?

wooden sail Sep 5, 2023, 10:01 PM

#

for lipschitz continuous functions, this gives you an upper bound on the change of magnitude of the gradient for any delta v

normal acorn Sep 5, 2023, 10:02 PM

#

What is the lipschitz?

wooden sail Sep 5, 2023, 10:06 PM

#

ok, since that doesn't ring a bell, try following the suggestion in the problem. you're told to minimize grad C times delta v (i.e. make it as negative as possible) with the constraint that delta v has magnitude epsilon

#

you should be able to set this up in lagrange form

normal acorn Sep 5, 2023, 10:07 PM

#

I think I need a deeper understanding of algebra to understand this.

#

Thank you for your help, I will analyze it one more time before deepening my understanding of the relevant algebra.

wooden sail Sep 5, 2023, 10:11 PM

#

in fairness i'm giving you calculus suggestions instead of algebra ones. i'm guessing C here is scalar valued. set up the cauchy schwarz inequality for grad C times delta V, and try to show any other choice of delta v yields a larger value
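For reference, the Cauchy-Schwarz argument the hint points at can be written out like this, assuming $\|\Delta v\| = \epsilon$ and $\nabla C \neq 0$. The first line is a lower bound that holds for every step of length $\epsilon$; the second shows the choice $\Delta v = -\eta \nabla C$ reaches it exactly.

```latex
\begin{aligned}
\nabla C \cdot \Delta v &\ge -\|\nabla C\|\,\|\Delta v\| = -\epsilon\,\|\nabla C\|
  && \text{(Cauchy-Schwarz with } \|\Delta v\| = \epsilon) \\
\Delta v = -\eta\,\nabla C,\;\; \eta = \frac{\epsilon}{\|\nabla C\|}
  \;\Rightarrow\; \nabla C \cdot \Delta v &= -\eta\,\|\nabla C\|^{2} = -\epsilon\,\|\nabla C\|
\end{aligned}
```

Since the bound is attained, no other $\Delta v$ of length $\epsilon$ can give a smaller value, and equality in Cauchy-Schwarz only happens when $\Delta v$ is parallel (here antiparallel) to $\nabla C$.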

normal acorn Sep 5, 2023, 10:12 PM

#

What do other choices entail?

#

−η was a variable chosen without explained reasoning, so would that same logic apply for picking any "choice"?

#

Could any choice introduce a variable?

wooden sail Sep 5, 2023, 10:14 PM

#

my hint to you is: delta v is parallel to grad C if you use the expression you were given. what does cauchy schwarz tell us?
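A quick numerical sanity check of the same claim, as a sketch with a made-up gradient: the step Δv = -η∇C with η = ε/‖∇C‖ gives ∇C·Δv = -ε‖∇C‖, and thousands of random directions rescaled to the same length ε never go lower. The gradient values and ε are arbitrary assumptions.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def gradient_step(grad, eps):
    """Delta v = -eta * grad C with eta = eps / ||grad C||, so ||Delta v|| = eps."""
    eta = eps / norm(grad)
    return [-eta * g for g in grad]

def random_step(dim, eps, rng):
    """A random direction, rescaled to length eps."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = norm(v)
    return [eps * x / n for x in v]

grad_C = [3.0, -4.0, 1.0]   # hypothetical gradient at the current point
eps = 0.1
best = gradient_step(grad_C, eps)
rng = random.Random(0)
others = [random_step(len(grad_C), eps, rng) for _ in range(10_000)]

print(dot(grad_C, best), -eps * norm(grad_C))   # both about -0.5099
print(min(dot(grad_C, s) for s in others))      # never below the value above
```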

normal acorn Sep 5, 2023, 10:15 PM

#

Okay. I will try again, thank you

past sky Sep 5, 2023, 10:22 PM

#

Hello, someone please help. My professor wants us to create an AI model (anything) that can solve questions on the screen. Here is what he said: "A test on Google Forms or something like that will be posted; the goal is for the script to read the screen, move the mouse and answer survey-type questions"

#

Is there any way I could do this?

#

Any form of help would be greatly appreciated

quasi torrent Sep 5, 2023, 10:39 PM

#

Does anyone have experience with CNN in tensorflow?

serene scaffold Sep 5, 2023, 10:46 PM

#

quasi torrent Does anyone have experience with CNN in tensorflow?

always ask your actual question right away, rather than hinting at the topic and waiting for a commitment

quasi torrent Sep 5, 2023, 10:59 PM

#

so I am trying to create a CNN model that takes in a face image of a person and predicts the age of that person. But the model is overfitting. Here is my model structure:

```python
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

input_shape = (128, 128, 3)  # assumption: set this to the size of your face images

inputs = Input(shape=input_shape)

# convolutional layers
conv_1 = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
maxp_1 = MaxPooling2D(pool_size=(2, 2))(conv_1)
conv_2 = Conv2D(64, kernel_size=(3, 3), activation='relu')(maxp_1)
maxp_2 = MaxPooling2D(pool_size=(2, 2))(conv_2)
conv_3 = Conv2D(32, kernel_size=(3, 3), activation='relu')(maxp_2)
maxp_3 = MaxPooling2D(pool_size=(2, 2))(conv_3)

flatten = Flatten()(maxp_3)

# fully connected layers
dense = Dense(128, activation='relu', kernel_regularizer=l2(0.01))(flatten)
# Feed the dropout output (not `dense`) into the output layer;
# otherwise the Dropout layer is built but never used.
dropout = Dropout(0.4)(dense)

output = Dense(1, activation='linear', name='age_out')(dropout)

model = Model(inputs=[inputs], outputs=[output])
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mean_squared_error'])
```

I am pretty new to CNNs, so any help on how to keep the model from overfitting would be appreciated. I am training it on 20,105 images and here is the accuracy and error plot.

#

hasty mountain Sep 6, 2023, 1:36 AM

#

Hey guys, I'm reviewing the calculations around Variational AutoEncoders and I was wondering... What is the difference between Mean Absolute Error and Kullback-Leibler Divergence?

I know that I can't use MSE in the Encoding Loss because of optimization problems (the MSE would imply a term that I can't know for sure, while KLD allows that term to be eliminated). But how about MAE? It would be basically the absolute value of a subtraction between two distributions.

#

(I think Edd explained this to me before, but I don't remember...)

#

Oh, never mind. I've remembered that the difference between distributions isn't just about a subtraction between numbers, but also between the areas underneath them, since it's a subtraction within an integral... something that doesn't happen in MAE.

hasty mountain Sep 6, 2023, 3:06 AM

#

Now I'm a bit confused over which KLD version I should use... I know that KLD has a formula for discrete variables, and another for continuous variables.

Usually, I see people using the discrete version for VAEs to generate images, but...since I'm dealing with images, pixels...shouldn't I use the continuous version?
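One point that may help here: in the usual Gaussian VAE, the KL term compares two distributions over the latent variable z, namely q(z|x) = N(μ, diag(σ²)) and the prior p(z) = N(0, I). Those are continuous, so the continuous definition applies regardless of whether the inputs are pixels; the pixel type matters for the reconstruction term instead. This KL has a well-known closed form, summed over latent dimensions. The sketch below compares that closed form with a direct numerical integration of the continuous definition in 1-D; μ and σ are arbitrary example values.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def kl_numeric_1d(mu, sigma, lo=-20.0, hi=20.0, n=200_000):
    """Midpoint-rule approximation of the continuous definition, integral of q log(q/p), in 1-D."""
    h = (hi - lo) / n
    log_norm = math.log(math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        log_q = -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma) - log_norm
        log_p = -0.5 * z * z - log_norm
        total += math.exp(log_q) * (log_q - log_p) * h
    return total

mu, sigma = 0.5, 0.8   # arbitrary example values
closed = kl_to_standard_normal([mu], [math.log(sigma ** 2)])
numeric = kl_numeric_1d(mu, sigma)
print(round(closed, 6), round(numeric, 6))  # both about 0.168
```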

barren jungle Sep 6, 2023, 7:49 AM

#

i want to make a program which detects elephants in an img / vid / webcam feed by using one of the best models on an Arduino. i don't have a GPU and want it lightweight, how do i make it?

torn ore Sep 6, 2023, 9:14 AM

#

I've been looking at game theories and algorithms and I'm considering making a tic tac toe bot with multiple difficulties. I found some minimax code for one that runs in command prompt, but it's synchronous. Does anyone know of one written async?

#

If not then I'm gonna dive into it myself lol
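Since minimax is CPU-bound, an "async" version usually means running the ordinary synchronous search somewhere that doesn't block the event loop, rather than rewriting the search itself. A sketch under those assumptions: boards are 9-character strings (hypothetical representation), scores are from X's point of view, and `asyncio.to_thread` needs Python 3.9+.

```python
import asyncio
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (score, move): +1 X wins, -1 O wins, 0 draw, with perfect play."""
    w = winner(board)
    if w == "X":
        return 1, None
    if w == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None
    results = []
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        score, _ = minimax(child, "O" if player == "X" else "X")
        results.append((score, m))
    # X maximizes the score, O minimizes it
    return max(results) if player == "X" else min(results)

async def best_move_async(board, player):
    # Run the blocking search in a worker thread so the event loop stays responsive
    _, move = await asyncio.to_thread(minimax, board, player)
    return move

print(asyncio.run(best_move_async("XX OO    ", "X")))
```

For difficulty levels, one simple option is to play a random legal move with some probability instead of the minimax move.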

steady basalt Sep 6, 2023, 9:18 AM

#

Guys, my company is attempting to experiment with using models like llama to predict big datasets in place of statistical modelling, what do you think about the feasibility of such?

#

I'm trying to say it's a bad idea on anything larger than a thousand rows...

torn ore Sep 6, 2023, 9:20 AM

#

Pretty sure a larger dataset would yield more accurate results, no?

steady basalt Sep 6, 2023, 10:20 AM

#

you'd rather use gpt to analyse a large table? for predictive modelling?

#

how is that as reliable as normal means

serene scaffold Sep 6, 2023, 11:07 AM

#

steady basalt Guys, my company is attempting to experiment using models like llama to predict ...

What do you mean by "predict big datasets"?

steady basalt Sep 6, 2023, 12:53 PM

#

serene scaffold What do you mean by "predict big datasets"?

Imagine you have a table of data that you'd normally use a random forest on

#

to predict y variable

#

they're feeding that data as string input to a llm

#

to see how it performs

boreal gale Sep 6, 2023, 12:59 PM

#

that honestly sounds like a recipe for disaster.

you can explain what a random forest does and how/why it works
likewise, you can explain how/why a neural network specifically designed for your task works

same can't really be said for an LLM that's trained for other purposes - sure it could spit out things that resemble a reasonable output, but you can't reliably foresee when or understand why it breaks down (i.e. underperforms your baseline model)

that's my two cents anyway.. i am very reluctant to fully embrace LLMs, maybe i should get with the times...

steady basalt Sep 6, 2023, 1:03 PM

#

agree

solid cloud Sep 6, 2023, 1:03 PM

#

boreal gale that honestly sounds like a recipe for disaster. you can explain what a random ...

Agreed. You won't be able to explain/interpret the predictions properly

wide cosmos Sep 6, 2023, 1:06 PM

#

Hey, I'm kind of new to open source contributions and GitHub, but I had been working on one project for some time and recently finished uploading it to GitHub. Feel free to have a look and let me know if I should do something differently; if you have other ideas to suggest, that'd be appreciated too. Thanks. https://github.com/Nik-code/nlp-chatbot

cold osprey Sep 6, 2023, 1:36 PM

#

solid cloud Agreed. You wont be able to explain/interpret the predictions properly

The ultimate black box

primal night Sep 6, 2023, 2:39 PM

#

Hi everyone,
I could really use some assistance with our project. I'm looking to use a drone for ground crack detection, but I'm not convinced that the camera on the Tello EDU drone is up to the task.

Could you kindly share any recommendations for a programmable drone with superior camera quality?

I greatly appreciate your help in advance! 🙏

boreal gale Sep 6, 2023, 2:40 PM

#

cold osprey The ultimate black box

would be interesting to see how it behaves when one applies Shapley values to make it explainable
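Shapley values only need the ability to call the model, so in principle they apply to any black box, including an LLM used as a predictor; the catch is cost. A sketch of the exact computation by enumerating coalitions, assuming absent features are replaced by a baseline value (one common convention; SHAP libraries offer others). The two models are hypothetical toys.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for the prediction model(x).

    Features outside a coalition are set to `baseline` values. Enumerates all
    2^n coalitions, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):                       # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical models: an additive one and one with an interaction term
linear = lambda z: 2 * z[0] + 3 * z[1] - z[2]
product = lambda z: z[0] * z[1]
print(shapley_values(linear, [1, 1, 1], [0, 0, 0]))   # about [2, 3, -1]
print(shapley_values(product, [1, 1], [0, 0]))        # [0.5, 0.5]
```

With an LLM behind `model`, every coalition is another (slow, possibly non-deterministic) call, which is why sampling-based approximations are normally used instead.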

hasty mountain Sep 6, 2023, 2:47 PM

#

boreal gale that honestly sounds like a recipe for disaster. you can explain what a random ...

I just discovered that GANs aren't black boxes...they're black holes

solid cloud Sep 6, 2023, 2:57 PM

#

boreal gale would be *interesting* to see how it behaves when one apply shapley values to ma...

Well if we use an LLM for predictions, we can use it to generate SHAP values too 😂

hasty mountain Sep 6, 2023, 2:59 PM

#

Also guys, a small confusion around information theory in VAEs:
The Encoder, by compressing the data, tries to extract the most relevant features in the data, right? So, the smaller the latent space, the more selective the Encoder must be: fewer features are extracted, and the more specific the latent space gets for each input?

While the Decoder, by decompressing the data, must create new information based on the most relevant features extracted by the Encoder? All this while trying to recompose the original input data?

So, the Encoding process extracts most relevant features and loses information. The Decoding process tries to recompose (create) information based on such relevant features?

solid cloud Sep 6, 2023, 3:01 PM

#

hasty mountain Also guys, a small confusion around information theory in VAEs: The Encoder, by ...

Pretty much. This is generally what an autoencoder attempts to do - reconstruct the input through a bottleneck. A variational AE differs by generating a latent distribution which you sample from and pass to your decoder

hasty mountain Sep 6, 2023, 3:02 PM

#

solid cloud Pretty much. This is generally what an autoencoder attempts to do - reconstruct ...

Hm... I see... I was reading a paper here and I'm thinking that, since the VAE generates a latent distribution (not just a latent variable, like AE), then it's more insensitive to the dimensions of the Encoder output.

#

I'm even testing this right now in Colabs... I was using a VAE with Encoder generating mean and standard deviations with 128 dimensions (Batch, 128), and now I'm testing with 16 dimensions.

#

I think I'm beginning to understand now... The Encoder output in a VAE is not exactly the features, but rather a distribution, the latent space, and the size of such output is not the number of features, but simply the number of dimensions of this latent space, this distribution. Since this distribution ranges from -inf to inf, then all input features could be fit into this latent space even if it has just one single dimension.

At least I think this is a reasonable explanation

hasty mountain Sep 6, 2023, 3:33 PM

#

Curious...then I could pass 64x64 RGB images to my Encoder and simply make it provide outputs with just 1 single dimension...

Optimeezachon!

worldly gust Sep 6, 2023, 5:58 PM

#

hi

#

i need help

#

I am building an artificial intelligence model to detect age and gender
I tried to make my own dataset
So, I collected photos and saved their required specifications in a file
When I was labeling the photos, I noticed that the labels were not compatible with the photos

#



```py
import glob
import os

import numpy as np
import pandas as pd
import tensorflow as tf

data = pd.read_csv("data.csv")
# Encode labels as integers (assign back instead of replace(..., inplace=True) on a column)
data['Gender'] = data['Gender'].replace(['male', 'female'], [0, 1])
data['Age'] = data['Age'].replace(['15_22', '23_30'], [0, 1]).astype(int)
print(data.head(13))

# Define image processing functions
def load_image(image_path):
    """Read a JPEG, convert to float32 in [0, 1] and resize to 150x150."""
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, (150, 150))
    return img

# Prepare image paths and labels
PATH = "./picture"
# Note: sorting file names does not guarantee the same order as the rows of data.csv
image_paths = sorted(glob.glob(os.path.join(PATH, "*.jpeg")))
images = tf.stack([load_image(image_path) for image_path in image_paths])
labels = data[['Age', 'Gender']].to_numpy()
print(image_paths)
```

#

this is my code...

#

help me
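A likely cause (an assumption, since the CSV layout isn't shown): `sorted(glob(...))` orders the images by file name, while `labels` follows the row order of data.csv, and nothing ties the two together. One way to make the pairing explicit is to look each label up by file name. A stdlib-only sketch, assuming data.csv has a column (hypothetically called `filename`) holding each image's file name:

```python
import csv
import io
import os

AGE_CODES = {'15_22': 0, '23_30': 1}
GENDER_CODES = {'male': 0, 'female': 1}

def labels_by_filename(csv_file, filename_column='filename'):
    """Map each image file name to its (age_code, gender_code) pair.

    `filename_column` is a hypothetical name; use whichever column in data.csv
    identifies the image a row belongs to.
    """
    reader = csv.DictReader(csv_file)
    return {
        row[filename_column]: (AGE_CODES[row['Age']], GENDER_CODES[row['Gender']])
        for row in reader
    }

def aligned_labels(image_paths, label_map):
    """Return labels in the same order as image_paths; fail loudly if a row is missing."""
    missing = [p for p in image_paths if os.path.basename(p) not in label_map]
    if missing:
        raise KeyError(f"no label row for: {missing}")
    return [label_map[os.path.basename(p)] for p in image_paths]

# Made-up example where the CSV rows are NOT in file-name order
sample_csv = io.StringIO("filename,Age,Gender\nb.jpeg,23_30,female\na.jpeg,15_22,male\n")
label_map = labels_by_filename(sample_csv)
paths = sorted(["./picture/b.jpeg", "./picture/a.jpeg"])
print(aligned_labels(paths, label_map))  # [(0, 0), (1, 1)]
```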

solid cloud Sep 6, 2023, 9:35 PM

#

hasty mountain I think I'm beginning to understand now... The Encoder output in a VAE is not ex...

Yep. The output of the encoder are parameters for a multivariate gaussian distribution which you then sample from
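The sampling step is usually done with the reparameterization trick: the encoder outputs μ and (commonly) log σ² for a diagonal Gaussian, and z = μ + σ·ε with ε ~ N(0, 1), so z stays a differentiable function of μ and σ. A stdlib sketch, assuming the log-variance parameterization; the encoder outputs are made-up numbers.

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    """Draw z ~ N(mu, diag(exp(log_var))) via z = mu + sigma * eps, eps ~ N(0, 1).

    The randomness lives only in eps, so z is a deterministic function of
    mu and log_var, which is what lets gradients flow back to the encoder.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

rng = random.Random(0)
mu, log_var = [1.0, -2.0], [0.0, math.log(0.25)]   # hypothetical encoder outputs
samples = [sample_latent(mu, log_var, rng) for _ in range(20_000)]
means = [sum(s[d] for s in samples) / len(samples) for d in range(2)]
print(means)  # close to [1.0, -2.0]
```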

solid cloud Sep 6, 2023, 9:37 PM

#

worldly gust I am building an artificial intelligence model to detect age and gender I tried ...

What do you mean that the labels are not compatible with the photos?

quasi torrent Sep 6, 2023, 11:46 PM

#

worldly gust I am building an artificial intelligence model to detect age and gender I tried ...

I am curious are you building a CNN? Because I am doing a similar problem where the model figures out the age given a face image

desert oar Sep 7, 2023, 12:46 AM

#

steady basalt they're feeding that data as string input to a llm

i can't imagine that would work well. among many other problems, that data is going to look completely different from what any LLM was trained on and you couldn't expect any kind of reasonable results from that process. you could ask an LLM to build a model fitting pipeline for you though

desert oar Sep 7, 2023, 12:47 AM

#

worldly gust ```py import pandas as pd import numpy as np import os import tensorflow as tf...

!code i recommend using code block formatting for sharing code in the future

arctic wedgeBOT Sep 7, 2023, 12:47 AM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

spring scarab Sep 7, 2023, 12:47 AM

#

I recently came across XPhoneBERT and I am trying to train a model to see if two sentences sound similar, using the transformers library from Hugging Face: https://github.com/VinAIResearch/XPhoneBERT

I want to create a model using LSTM binary classification.

inputs: sentence_1 sentence_2

output: whether they sound similar or not.

for sentence_1 and sentence_2 I want to pre-pad them.

I have a list of sentences in sentence_1

I tried doing this using:

```python
tokenizer(sentence_1, return_tensors='pt', padding=True, max_length=100)
```
When I do this, it looks like it always puts the padding at the end. How do I pre-pad these values?

Another question I have: once I get all the input_ids and attention_mask values, do I need to run them through the model, and how do I do that? If someone could give me a code example of how to do that, it would be really helpful.

Thanks in advance
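With Hugging Face tokenizers the padding side is a tokenizer setting rather than a call argument: setting `tokenizer.padding_side = "left"` (or passing `padding_side="left"` to `AutoTokenizer.from_pretrained`) before tokenizing should give pre-padding. The stdlib sketch below only illustrates what left padding produces for `input_ids` and `attention_mask`; `pad_id=1` is a hypothetical value, and with a real tokenizer you would use `tokenizer.pad_token_id`.

```python
def pre_pad(batch_ids, pad_id, max_length=None):
    """Left-pad ("pre-pad") lists of token ids and build matching attention masks.

    Sequences longer than max_length are cut from the right, keeping the first tokens.
    """
    target = max_length if max_length is not None else max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        ids = list(ids)[:target]
        n_pad = target - len(ids)
        input_ids.append([pad_id] * n_pad + ids)            # padding goes first
        attention_mask.append([0] * n_pad + [1] * len(ids))  # 0 marks padding
    return input_ids, attention_mask

ids, mask = pre_pad([[5, 6, 7], [8]], pad_id=1)
print(ids)   # [[5, 6, 7], [1, 1, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

For the second question, a common pattern with an `AutoModel` is to call `model(**encoded)` inside `torch.no_grad()` and use `outputs.last_hidden_state` as features for the LSTM classifier; check the XPhoneBERT README for the input preprocessing it expects.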


reef drum Sep 7, 2023, 3:30 AM

#

Anyone have any good tips for CS majors starting their first data science courses? Any tips in particular on getting familiar with libs like pandas, numpy, matplotlib?

serene scaffold Sep 7, 2023, 3:55 AM

#

reef drum Anyone have any good tips on cs majors starting their first data science courses...

Whatever you're trying to do with numpy or pandas, resist the temptation to use a loop or .apply as much as you possibly can, and look for a solution involving neither in the docs.
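To make the "avoid loops / .apply" advice concrete, a small NumPy comparison, assuming NumPy is installed (the data is made up). The same idea carries over to pandas: column arithmetic and boolean indexing run inside the library, whereas `.apply` calls a Python function once per row.

```python
import numpy as np

def zscore_loop(values):
    """Loop version: one Python-level operation per element."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def zscore_vectorized(values):
    """Vectorized version: whole-array operations executed inside NumPy."""
    arr = np.asarray(values, dtype=float)
    return (arr - arr.mean()) / arr.std()   # np.std defaults to the population std (ddof=0)

data = [3.0, 7.0, 1.0, 9.0, 5.0]
print(np.allclose(zscore_loop(data), zscore_vectorized(data)))  # True

# Boolean masks replace "loop + if" filtering
arr = np.array(data)
print(arr[arr > 4])  # [7. 9. 5.]
```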

There's no helping you with matplotlib because it sucks. I still don't understand it.

ionic badge Sep 7, 2023, 3:57 AM

#

reef drum Anyone have any good tips on cs majors starting their first data science courses...

there is a tutorial on pandas for grandma on youtube. that's pretty neat. My suggestion would be to get started on a simple data analysis project: do summarization, some cleaning, look for outliers, do filtering, grouping, aggregation. Cover the basics first, then jump into larger projects

reef drum Sep 7, 2023, 9:08 AM

#

serene scaffold Whatever you're trying to do with numpy or pandas, resist the temptation to use ...

Will keep this in mind thank you!

reef drum Sep 7, 2023, 9:09 AM

#

ionic badge there is tutorial on pandas for grandma on youtube. thats pretty neat. My sugges...

Thanks!

solid cloud Sep 7, 2023, 10:08 AM

#

reef drum Anyone have any good tips on cs majors starting their first data science courses...

I agree w the other suggestions. Generally practice makes perfect so try small projects that use those libraries. I also suggest familiarising yourself w statistics

reef drum Sep 7, 2023, 10:08 AM

#

solid cloud I agree w the other suggestions. Generally practice makes perfect so try small p...

Yeah, I took statistics last year. I got a good grade, but the professor sucked so the concepts didn't stick too well.

slender kestrel Sep 7, 2023, 11:47 AM

#

quasi torrent so I am trying to create a CNN model that takes in a face image of a person and ...

to avoid overfitting, use 2 methods: 1) early stopping, which basically stops the training as soon as the validation error stops decreasing, and 2) learning rate decay. they will help a lot in preventing your model from overfitting

slender kestrel Sep 7, 2023, 11:48 AM

#

hasty mountain I just discovered that GANs aren't black boxes...they're black holes <:py_guido:...

lol

ionic badge Sep 7, 2023, 12:02 PM

#

are there any resources for absolute beginners in large language models, something that i can start with? Thank you!

#

books, pdfs, youtube series. anything will do. there is so much content out there, it's hard to choose which one to go with

serene scaffold Sep 7, 2023, 12:05 PM

#

ionic badge is there any resources for absolute beginners in large language model, something...

an absolute beginner trying to do what with LLMs?

#

pretty much any project that would involve an LLM in a non-trivial way would be exceptionally challenging for a beginner, btw.

ionic badge Sep 7, 2023, 12:11 PM

#

serene scaffold pretty much any project that would involve an LLM in a non-trivial way would be ...

well, we all have to start somewhere, right? 🙂 I'm looking for some easy-to-follow tutorials like Corey Schafer's on Python fundamentals. pdfs and books will also work.

serene scaffold Sep 7, 2023, 12:13 PM

#

ionic badge well all have to start somewhere right? 🙂 I'm looling for some easy to follow ...

I still need to know what you're trying to do with LLMs to make any suggestions.
but one wants to start with something that is achievable with a medium amount of effort, or you'll give up before achieving anything.

ionic badge Sep 7, 2023, 12:15 PM

#

serene scaffold I still need to know what you're trying to do with LLMs to make any suggestions....

Thanks. I'm not trying to do anything at this point, just learning the basics. The if-else and for-loops of LLMs. The basic building blocks. I have the math background (algebra, calc). I need to start with what comes next.

serene scaffold Sep 7, 2023, 12:17 PM

#

ionic badge Thanks. im not trying to do anything at this point - just learning the basics. T...

There are no if-else and for-loops of LLMs.

#

I would try to first build an understanding of what LLMs are.

ionic badge Sep 7, 2023, 12:21 PM

#

serene scaffold There are no if-else and for-loops of LLMs.

I'm sorry if I'm not describing it the right way. Neural networks can be considered the basics of LLMs. I'm looking for something that starts from the basics and goes all the way up. You don't learn about classes or try to understand what classes are when you start learning Python.

serene scaffold Sep 7, 2023, 12:22 PM

#

ionic badge I'm sorry if Im not describing it the right way. Neural networks can be consider...

if you come up with an example of what you would want to do with LLMs once you understand how they work, I will look for and give you a resource for that.

#

that's all I can offer for the moment.

ionic badge Sep 7, 2023, 12:23 PM

#

serene scaffold if you come up with an example of what you would want to do with LLMs once you u...

got it. thank you for your time. Appreciate it

serene scaffold Sep 7, 2023, 12:24 PM

#

you might also read about the differences between GPT and BERT @ionic badge

night kelp Sep 7, 2023, 1:23 PM

#

Just wanted to understand industry standards: what's the benchmark for getting 100K records from a table and holding them in memory? I have to apply a couple of functions on top of this tabular data. Right now it takes 15 seconds for me to get the data from Snowflake and fetch it as a pandas DataFrame.
Is it normal in industry to do this kind of thing?
Anyone who has done this before, can you elaborate on your tech stack?
I want to do this in a request-response cycle. I'll send very minimal data to the frontend after computation (sub-second performance).

somber panther Sep 7, 2023, 2:02 PM

#

I'm looking at some data from RAND "smoothed percent of population with high school or equivalent degree" reporting figures in the .2-.3 range for all states... Am I misunderstanding this, is it not reporting the supposed population with HS diploma level education?

#

https://www.rand.org/pubs/tools/TLA243-3.html

Inpatient Hospitalizations for Firearm Injury: Estimating State-Lev...

The lack of reliable, state-level data on firearm injuries is a challenge for gun policy researchers. As part of the Gun Policy in America initiative, RAND researchers developed a publicly available longitudinal database of state-level estimates of inpatient hospitalizations that occur as a result of firearm injury.

steady basalt Sep 7, 2023, 2:08 PM

#

Funnily no one has to understand how LLMs work anymore, every company is throwing resources to applying the black box at all possible use cases

steady basalt Sep 7, 2023, 2:09 PM

#

ionic badge is there any resources for absolute beginners in large language model, something...

Insanely easy, pip install langchain or openai and read their docs, apis are king rn

lapis sequoia Sep 7, 2023, 2:29 PM

#

this channel is not really online

#

@noble quail u hv a lot to type i can see

#

u done

#

https://tenor.com/view/leonardo-dicaprio-clapping-clap-applause-amazing-gif-16384995


noble quail Sep 7, 2023, 2:39 PM

#

I'm processing some frames of a video with OpenCV and saving them to a file with `pickle`.
I'd like to check if the file sizes I'm getting as a result are reasonable.

I originally have a 640 x 360 video. I sample about 25,000 frames from this video, resize them to 25% of the original size, normalise by dividing by 255, then `pickle.dump` them into a file which is about 3.25GB. This seems a bit big to me, so it'd be really helpful if someone could check whether this is reasonable or not.

Thanks!

quasi torrent Sep 7, 2023, 2:51 PM

#

slender kestrel to avoid over fitting use 2 methods 1 early stopping it is basically stopping th...

is it possible to use both?
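Yes, the two combine naturally: in Keras you can pass both `tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=..., patience=..., min_lr=...)` and `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True)` in the same `callbacks=[...]` list of `model.fit`, usually with a shorter patience for the learning-rate cut. The stdlib sketch below replays a made-up validation-loss curve to show how the two rules interact; the exact epoch counting in Keras may differ slightly.

```python
def replay_training(val_losses, lr=1e-3, patience=3, lr_patience=2, factor=0.5, min_lr=1e-6):
    """Replay per-epoch validation losses, applying LR decay on plateau and early stopping.

    Returns (best_epoch, stop_epoch, lr_per_epoch). Both rules watch the same
    validation loss; the LR rule uses a shorter patience so it fires first.
    """
    best_loss, best_epoch = float("inf"), -1
    wait = 0        # epochs without improvement, for early stopping
    lr_wait = 0     # epochs without improvement since the last LR cut
    lr_per_epoch = []
    for epoch, loss in enumerate(val_losses):
        lr_per_epoch.append(lr)            # LR used while training this epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            wait = lr_wait = 0
            continue
        wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:         # plateau: decay the learning rate
            lr = max(lr * factor, min_lr)
            lr_wait = 0
        if wait >= patience:               # no improvement for too long: stop
            return best_epoch, epoch, lr_per_epoch
    return best_epoch, len(val_losses) - 1, lr_per_epoch

# Hypothetical validation-loss curve that starts to overfit after epoch 2
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.74, 0.9, 0.95]
print(replay_training(losses))
```

Here training stops at epoch 5 after one learning-rate cut, and `restore_best_weights=True` would roll the model back to the epoch-2 weights.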

shrewd shale Sep 7, 2023, 3:24 PM

#

noble quail I'm processing some frames of a video with OpenCV and saving to a file with `pic...

what's the source video format? how many channels this video has (rgb / gray)? what do you pickle to disk? numpy array or python lists? it's probably some compressed format but you are dumping frames as raw rgb

mild dirge Sep 7, 2023, 3:25 PM

#

noble quail I'm processing some frames of a video with OpenCV and saving to a file with `pic...

Iirc videos are normally compressed using techniques that make heavy use of how most videos are organized

#

Meaning consecutive frames are often very similar to each other, with only minor differences

#

This allows for better compression than just regular compression on general data

#

And also depends on if the compression is lossless or not (when you load it back, is it the exact same?)

noble quail Sep 7, 2023, 3:33 PM

#

shrewd shale what's the source video format? how many channels this video has (rgb / gray)? w...

what's the source video format?
It's mp4, codec H.264

how many channels this video has (rgb / gray)
3 (I think; I wasn't sure what this means, so I referred to this Stack Exchange answer)

numpy array or python lists?
each frame is a numpy array but I pickle a normal python list of these arrays

Stack Overflow: how to get the number of channels from an image, in OpenCV 2?

shrewd shale Sep 7, 2023, 3:34 PM

#

!e ```python
print(((640 * 0.25) * (360 * 0.25) * 25000 * 3) / (1024 ** 3))
```

arctic wedgeBOT Sep 7, 2023, 3:34 PM

#

@shrewd shale :white_check_mark: Your 3.11 eval job has completed with return code 0.

1.0058283805847168

shrewd shale Sep 7, 2023, 3:35 PM

#

I'm kinda rusty on these calculations nowadays, but the rest is probably pickle overhead (numpy/list object overhead)
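The 1.0058 GiB figure assumes 1 byte per value, which holds for uint8 frames as OpenCV reads them. With NumPy, dividing a uint8 array by 255 produces float64 unless you cast (e.g. to float32), so each value then takes 8 bytes. A sketch of the raw size for each element type, assuming 160 x 90 x 3 frames and 25,000 of them as in the question:

```python
# Raw (uncompressed) size of the sampled frames for different element types.
WIDTH, HEIGHT = 640 // 4, 360 // 4   # 25% of 640 x 360 -> 160 x 90
CHANNELS = 3                          # colour frames
FRAMES = 25_000                       # "about 25,000" from the question

BYTES_PER_ELEMENT = {"uint8": 1, "float32": 4, "float64": 8}

def raw_size_gib(dtype):
    n_bytes = WIDTH * HEIGHT * CHANNELS * FRAMES * BYTES_PER_ELEMENT[dtype]
    return n_bytes / 1024 ** 3

for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype:>8}: {raw_size_gib(dtype):.2f} GiB")
```

Comparing these numbers with the actual file size shows which dtype ended up pickled; keeping frames as uint8 on disk and normalizing after loading gives the smallest uncompressed footprint.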

mild dirge Sep 7, 2023, 3:35 PM

#

Can't you construct it into an mp4 again?

#

Probably libraries for that

#

moviepy or something

shrewd shale Sep 7, 2023, 3:36 PM

#

did you try `np.save`?

noble quail Sep 7, 2023, 3:36 PM

#

mild dirge This allows for better compression than just regular compression on general data

Ah yeah this makes sense as to why there's a massive inflation - never realised how much of a difference compression made!

mild dirge Sep 7, 2023, 3:37 PM

#

It's actually pretty interesting, it also explains some of the weird artifacts you sometimes get when watching a movie

#

Where the image seems to move but the actual image is wrong until the next key frame

noble quail Sep 7, 2023, 3:38 PM

#

shrewd shale did you tried `np.save` ?

No I wasn't aware of this - will give it a go!

noble quail Sep 7, 2023, 3:38 PM

#

mild dirge Can't you construct it into an mp4 again?

This is also a smart idea! Thanks!

shrewd shale Sep 7, 2023, 3:39 PM

#

If rebuilding the video is not an option, maybe this helps (https://stackoverflow.com/a/41425878/2886047)


ionic badge Sep 7, 2023, 4:24 PM

#

I am trying to run predict_qa_alpaca.py on a finetuned llama model and im getting module 'utils' has no attribute 'jload' - any idea what might be causing this?

ionic badge Sep 7, 2023, 4:40 PM

#

figured it out - there was a conflict between utils.py from torch and utils.py from the alpaca library

somber panther Sep 7, 2023, 5:17 PM

#

anyone with a statista account be willing to pull some data for me? want to compare education to gun hospitalizations for a portfolio project

twin forge Sep 7, 2023, 7:21 PM

#

Has someone worked with FinRL before?

https://github.com/AI4Finance-Foundation/FinRL


#

I want to find the average holding period of the portfolio and its average monthly number of trades.

#

Also, I want to know if it is possible to add a rebalance_portfolio argument

#

so that my portfolio is annually rebalanced based on financial indicators of each company

halcyon hedge Sep 7, 2023, 7:35 PM

#

Is it possible to reduce the size of individual markers in plotly.express.scatter_geo()?

#

I am working on a project in which I need to plot all the terrorist attacks that have happened since 1970 on a map; with the default marker size used by scatter_geo it looks very messy

weak mortar Sep 7, 2023, 8:16 PM

#

halcyon hedge Is it possible to reduce the size of individual marker in plotly.express.scatter...

#

https://plotly.com/python-api-reference/generated/plotly.graph_objects.scattergeo.html?highlight=scattergeo#module-plotly.graph_objects.scattergeo

#

marker=dict(size=10)

quasi sparrow Sep 7, 2023, 9:33 PM

#

Hi,
There is this book I’ve been reading that says deploying machine learning models with FastAPI is a bad idea and recommends using TensorFlow Serving instead.
In my ML pipeline, I have saved my trained model along with the data preprocessing steps, to ensure the preprocessing at inference is the same as during training. But to collect my training data, I have a FastAPI app collecting data, which is what the author says we shouldn’t do.

My question: Is there a way to use TensorFlow Serving to also collect data? That way I wouldn’t need a FastAPI app collecting data. Or is TensorFlow Serving only used for inference?
Thank you all!

past meteor Sep 7, 2023, 10:24 PM

#

quasi sparrow Hi, There is this book I’ve been reading that says deploying machine learning m...

I never fully understood the connection between FastAPI and ML. Fundamentally, making a prediction is CPU-bound, and FastAPI is an async framework, so `await model.predict()` does not make a lot of sense.

#

TF serving to me does more of the "boilerplate" stuff you'd have to roll yourself like loading the model from disc at every request (which technically is async).

quasi sparrow Sep 7, 2023, 10:27 PM

#

Does that mean the FastAPI will “lock” the memory every time the model makes a prediction?

past meteor Sep 7, 2023, 10:29 PM

#

quasi sparrow Does that mean the FastAPI will “lock” the memory every time the model makes a ...

That's not what I meant. FastAPI (and ASGI in general) thrives when you have non-blocking operations. Basically you make a call to load your model from disk and instead of waiting for it to be loaded you're doing other stuff. I guess the vast amount of tutorials make it an OK choice though.
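To illustrate the point: calling a blocking, CPU-bound `predict` directly inside an `async def` handler stalls the event loop for every other request. A common workaround is to hand it to a worker pool with `loop.run_in_executor` (FastAPI itself runs plain `def` endpoints in a threadpool). A stdlib-only sketch; `predict` is a hypothetical stand-in for `model.predict`, and `time.sleep` releases the GIL, so pure-Python CPU work would need a `ProcessPoolExecutor` to actually run in parallel.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def predict(features):
    """Hypothetical stand-in for a blocking model.predict call."""
    time.sleep(0.05)  # simulate work that would otherwise block the event loop
    return sum(features)

async def handle_request(features, executor):
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the loop can keep serving other requests meanwhile
    return await loop.run_in_executor(executor, predict, features)

async def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        requests = [[1, 2], [3, 4], [5, 6], [7, 8]]
        return await asyncio.gather(*(handle_request(r, executor) for r in requests))

results = asyncio.run(main())
print(results)  # [3, 7, 11, 15]
```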

past meteor Sep 7, 2023, 10:34 PM

#

quasi sparrow Hi, There is this book I’ve been reading that says deploying machine learning m...

Imo if you're unsure I think you should look at "integrated" tools like MLflow simply because they're opinionated and make a lot of "decisions" for you.

quasi sparrow Sep 7, 2023, 10:35 PM

#

past meteor That's not what I meant. FastAPI (and ASGI in general) thrives when you have non...

I see, that makes sense!

past meteor Sep 7, 2023, 10:36 PM

#

Using an API to collect data... it depends on your use case

#

The easiest thing if you're not working in realtime is to store your data somewhere and process in batch.

quasi sparrow Sep 7, 2023, 10:38 PM

#

The application I'm trying to build works in real-time. It's supposed to be a device on the edge that predicts the output of a chemical process

timid kestrel Sep 7, 2023, 10:39 PM

#

hey hey y'all, anyone got any good data sources to practice machine learning models with Python? I've tried searching Kaggle but I don't think I have a well-trained eye for selecting good datasets. I'm tryna practice my XGBoost and random forest parameter tuning and optimization skills. I'm also pretty new to Python coding. I was told to browse through the pins but I can't find anything too specific

past meteor Sep 7, 2023, 10:39 PM

#

If the device on edge is making predictions, do you need an endpoint?

quasi sparrow Sep 7, 2023, 10:40 PM

#

I think I could use TensorFlow Serving to collect data from the logs, right? Deploy the model with poor inference and train the model as it gathers data

quasi sparrow Sep 7, 2023, 10:41 PM

#

past meteor If the device on edge is making predictions, do you need an endpoint?

Because I get the data from the SCADA system that gathers the signals from the industrial controllers

#

The SCADA system supports HTTP

green gyro Sep 7, 2023, 10:43 PM

#

Is there a free AI art generator API?

past meteor Sep 7, 2023, 10:44 PM

#

quasi sparrow Because I get the data fromthe SCADA system that gathers the signals from the in...

At what frequency is your data coming in/out?

quasi sparrow Sep 7, 2023, 10:44 PM

#

Phew, fast. Half of a second.

past meteor Sep 7, 2023, 10:45 PM

#

quasi sparrow Phew, fast. Half of a second.

So you're sending data to the server every 0.5s?

quasi sparrow Sep 7, 2023, 10:46 PM

#

I'm gathering data with the FastAPI every half of a second

#

I haven't made any predictions yet :/

#

But yes, the SCADA system sends data to my FastAPI app every half second

past meteor Sep 7, 2023, 10:46 PM

#

What happens if you lose internet for 5 seconds?

past meteor Sep 7, 2023, 10:49 PM

#

quasi sparrow I'm gathering data with the FastAPI every half of a second

IMO you should look into WebSockets, since they're bidirectional and you only open one connection. FastAPI supports them (assuming your SCADA system does too): https://fastapi.tiangolo.com/advanced/websockets/

quasi sparrow Sep 7, 2023, 10:51 PM

#

Interesting, yes! I think the SCADA system supports WebSockets. Thank you, I'll look into it

#

I’ll try to see if I can deploy it as an online machine learning system since the sensors tend to deteriorate over time

past meteor Sep 7, 2023, 10:53 PM

#

Last piece of advice about moving to WebSockets: if it ain't broke, don't fix it 🤣. You could keep what you have right now and treat it as a lesson learned for the next project.

past meteor Sep 7, 2023, 10:54 PM

#

quasi sparrow I’ll try to see if I can deploy it as an online machine learning system since th...

Specifically about online ML: I did my thesis on that 🫡. Again, you don't need to retrain continuously.

#

A very sane thing to do is to somehow collect the true label after your model has made its predictions, store it somewhere and monitor the performance drift of your model

#

That way you can monitor the performance of "candidate" models running alongside it and deploy a new one when you choose, instead of continuously running a gradient-descent-style predict -> observe y_true -> update loop, because that might make your models worse.
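A minimal sketch of the "store the true label, monitor performance, swap when a candidate wins" idea, in plain Python. The class `RollingMAE`, the function `should_swap`, the choice of MAE as the metric and the 5% `margin` are illustrative assumptions, not the API of any particular tool:

```python
from collections import deque


class RollingMAE:
    """Mean absolute error over the most recent `window` labelled predictions."""

    def __init__(self, window):
        self.errors = deque(maxlen=window)  # old errors fall off automatically

    def update(self, y_pred, y_true):
        # Call this once the true label for an earlier prediction becomes available
        self.errors.append(abs(y_pred - y_true))

    def value(self):
        return sum(self.errors) / len(self.errors) if self.errors else float("inf")


def should_swap(prod, candidate, min_samples, margin=0.05):
    """True if the candidate beats production by a relative `margin` on recent data."""
    # Don't decide anything until both models have enough labelled outcomes
    if len(prod.errors) < min_samples or len(candidate.errors) < min_samples:
        return False
    return candidate.value() < (1 - margin) * prod.value()
```

Both models are scored on the same recent labelled outcomes, so a candidate only replaces production after it has proven itself on the data the sensors are producing now.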

quasi sparrow Sep 7, 2023, 10:59 PM

#

Oh, so constantly training the model actually deteriorates performance and the model should only be retrained when there is data drift or poor predictions?

past meteor Sep 7, 2023, 10:59 PM

#

No, I meant that continuously retraining might make it better, it might make it worse.

quasi sparrow Sep 7, 2023, 11:00 PM

#

I thought about training the model on a timed schedule.

past meteor Sep 7, 2023, 11:00 PM

#

If you're not forced to continuously retrain for some reason (e.g., you're on an embedded system that never persists the data, just has a limited buffer) then I don't see why you should retrain like that

quasi sparrow Sep 7, 2023, 11:01 PM

#

When retraining an online learner, do you use a whole fresh dataset to train the model or do you include some of the old dataset into the new dataset?

past meteor Sep 7, 2023, 11:01 PM

#

That's another thing, it's situation dependent

#

Sometimes updating the weights online is good, sometimes retraining from scratch with a window that contains old and new data is good (how big should the window be...?) etc
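To make the two options concrete, here is a hedged sketch for a simple linear model: `sgd_update` nudges the weights after every sample (online updating), while `refit_on_window` retrains from scratch on the last `window` samples. The helper names, the synthetic drifting data, the learning rate and the window size are all assumptions chosen for illustration:

```python
import numpy as np


def sgd_update(w, x, y, lr=0.01):
    """Online update: one gradient step on the squared error of a single sample."""
    err = x @ w - y
    return w - lr * err * x


def refit_on_window(X, y, window):
    """Retrain from scratch on the `window` most recent samples (least squares)."""
    w, *_ = np.linalg.lstsq(X[-window:], y[-window:], rcond=None)
    return w


# Synthetic stream whose true weights change halfway through (a "drift")
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))
w_old, w_new = np.array([1.0, 2.0]), np.array([3.0, -1.0])
y = np.where(np.arange(2000) < 1000, X @ w_old, X @ w_new)
y = y + 0.01 * rng.standard_normal(2000)

w_online = np.zeros(2)
for x_t, y_t in zip(X, y):
    w_online = sgd_update(w_online, x_t, y_t)

w_window = refit_on_window(X, y, window=500)  # only post-drift samples
```

A window that still contains pre-drift samples (e.g. `refit_on_window(X, y, 2000)` here) lands somewhere between the old and new weights, which is the "how big should the window be" trade-off in miniature.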

#

You can try all of these out, put them in a dashboard and select what works.

quasi sparrow Sep 7, 2023, 11:03 PM

#

past meteor Sometimes updating the weights online is good, sometimes retraining from scratch...

Do you know a good resource to learn more about this topic? I remember reading someone talk about it in this channel but I’m not familiar with the concept.

abstract wasp Sep 7, 2023, 11:05 PM

#

Anyone know about a program that can automatically create bounding boxes on images and label them? (YOLO format)

abstract wasp Sep 7, 2023, 11:06 PM

#

quasi sparrow Do you know a good resource to learn more about this topic? I remember reading s...

That's just backprop, no? Taking the derivatives of the functions and doing the forward pass again.

past meteor Sep 7, 2023, 11:09 PM

#

quasi sparrow Do you know a good resource to learn more about this topic? I remember reading s...

https://www.seldon.io/machine-learning-concept-drift

https://www.seldon.io/what-is-drift

This company has some great talks, papers and packages.

https://www.databricks.com/blog/2019/09/18/productionizing-machine-learning-from-deployment-to-drift-detection.html

Aside from that I'd say scouring the internet / google scholar is a good idea.

Getting good with something like MLflow + Dagster might be good:

https://docs.dagster.io/guides/dagster/managing-ml

https://mlflow.org/docs/latest/tracking.html#performance-tracking-with-metrics

Dagster schedules and trains a bunch of models at regular intervals on new data, and you persist their results in MLflow. Your sensor has likely drifted if the performance of your current prod model is dropping specifically relative to the ones you're training continuously. That's when you swap them out.

quasi sparrow Sep 7, 2023, 11:13 PM

#

Awesome, thank you for all the information, it’s been insightful!

quasi sparrow Sep 7, 2023, 11:14 PM

#

past meteor Sometimes updating the weights online is good, sometimes retraining from scratch...

Is updating the weights online similar to fine tuning the models with recent data?

past meteor Sep 7, 2023, 11:14 PM

#

quasi sparrow Is updating the weights online similar to fine tuning the models with recent dat...

Yeah

#

Oh yeah, you should also just monitor the summary statistics of your input variables over time. That in and of itself tells you if your sensor is drifting 🙂
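One simple way to do that monitoring, sketched under assumptions: compare the mean of each input in a recent window with a reference window from a period you trust, measured in standard errors. The function name, window sizes and the threshold of 5 are illustrative choices. The standard-error scaling assumes roughly independent readings, which consecutive sensor samples often aren't, so treat the threshold as something to tune (and consider tracking spread as well as the mean):

```python
import numpy as np


def mean_shift_zscores(reference, recent):
    """Per feature, how many standard errors the recent mean sits from the reference mean."""
    ref_mean = reference.mean(axis=0)
    ref_std = reference.std(axis=0, ddof=1)
    standard_error = ref_std / np.sqrt(len(recent))
    return (recent.mean(axis=0) - ref_mean) / standard_error


rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(1000, 3))  # e.g. readings from a healthy period
recent = rng.normal(0.0, 1.0, size=(200, 3))
recent[:, 1] += 1.0                                # pretend sensor 1 has drifted

z = mean_shift_zscores(reference, recent)
drifting = np.flatnonzero(np.abs(z) > 5)  # threshold is an arbitrary choice
```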

ionic badge Sep 8, 2023, 2:19 AM

#

is there a discord server just for large language models? thank you

#

*exclusively for

serene scaffold Sep 8, 2023, 2:24 AM

#

ionic badge is there a discord server just for large language models? thank you

it wouldn't really make sense to have a discord server just for that.
did you decide what you want to use large language models to do?

ionic badge Sep 8, 2023, 2:27 AM

#

serene scaffold it wouldn't really make sense to have a discord server just for that. did you de...

it definitely would: new models come out every other week, and there are so many angles to cover. I'd be surprised if no Discord server exists just for LLMs. Right now I'm trying a bit of everything, BERT-type models, fine-tuning Llama 2. I'm not looking into anything in particular, just browsing around to see what each model can do

serene scaffold Sep 8, 2023, 2:33 AM

#

ionic badge it definitely would - you have new models coming out every other week. And there...

out of curiosity, what was the impetus for your interest in LLMs?

ionic badge Sep 8, 2023, 2:36 AM

#

just curiosity. I have the resources to try out all of them, as long as they're around 100B (by "try out" I mean fine-tune)

ionic badge Sep 8, 2023, 2:36 AM

#

serene scaffold out of curiosity, what was the impetus for your interest in LLMs?

.

serene scaffold Sep 8, 2023, 2:36 AM

#

100B? bytes?

ionic badge Sep 8, 2023, 2:36 AM

#

billion

serene scaffold Sep 8, 2023, 2:36 AM

#

I see

shut girder Sep 8, 2023, 4:30 AM

#

Hello, I am trying to become a data analyst. How much of the NumPy library do I need to learn if I want to start working on basic data analytics projects?

desert oar Sep 8, 2023, 5:01 AM

#

shut girder Hello, I am trying to become a data analyst. How much of the NumPy library do I ...

not much in my opinion. you're better off focusing on pandas, which has a broadly similar interface and uses numpy internally in many places, but is more useful for general-purpose data analysis

#

numpy is very useful. but if you're new, focus on pandas first.

fallow frost Sep 8, 2023, 11:09 AM

#

Does Pytorch work well with PyPy or any other CPython alternative?

fallow frost Sep 8, 2023, 11:10 AM

#

desert oar not much in my opinion. you're better off focusing on pandas, which has a broadl...

Agreed. As a data engineer who often does data analysis, I use pandas 99% of the time; I rarely need to use NumPy directly

past meteor Sep 8, 2023, 11:33 AM

#

shut girder Hello, I am trying to become a data analyst. How much of the NumPy library do I ...

What I can say about Numpy is that the Pandas documentation often assumes you know Numpy

#

But both libraries aren't something you "learn" but rather something you do imo.

left tartan Sep 8, 2023, 11:37 AM

#

This kind of question is asked a lot, and I never understood it. I don’t ‘learn a library’ like reading a book: I learn the parts I care about, and perhaps when I use it a lot, I’ll sit down to understand more about how it works. By this kind of question, I mean the: ‘how much of XYZ should I learn’

past meteor Sep 8, 2023, 11:37 AM

#

I'd at the very least read:
https://numpy.org/doc/stable/user/absolute_beginners.html before https://pandas.pydata.org/docs/user_guide/10min.html

NumPy guides, for instance, go into detail about what broadcasting is, while the pandas material just name-drops it. That might be confusing for some readers, especially since broadcasting is a foundational aspect of working with dataframes.
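A small NumPy illustration of broadcasting (the array values are made up). Two arrays of different shapes can be combined when, comparing their shapes from the right, each pair of dimensions is either equal or one of them is 1. pandas' `df - df.mean()` relies on the same idea, with alignment on column labels:

```python
import numpy as np

X = np.array([[1.0, 10.0, 100.0],
              [3.0, 30.0, 300.0]])  # shape (2, 3)

# (2, 3) - (3,): the column means are stretched across every row
col_centered = X - X.mean(axis=0)
# -> [[-1, -10, -100], [1, 10, 100]]

# (2, 3) - (2, 1): keepdims=True keeps a column vector, stretched across every column
row_centered = X - X.mean(axis=1, keepdims=True)
# -> [[-36, -27, 63], [-108, -81, 189]]

# (2, 3) - (2,): trailing dimensions 3 and 2 don't match, so NumPy raises an error
try:
    X - X.mean(axis=1)
except ValueError as exc:
    print("broadcast error:", exc)
```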

past meteor Sep 8, 2023, 11:38 AM

#

left tartan This kind of question is asked a lot, and I never understood it. I don’t ‘learn ...

People differ! 🙂 At the very least I always read the quick start, user guide and one, or more, of the tutorials in the docs. Reading the reference OTOH makes no sense.

left tartan Sep 8, 2023, 11:39 AM

#

past meteor People differ! 🙂 At the very least I always read the quick start, user guide an...

Yah, fair. I skim the quick start, write some code, and then go back for more as needed. I just mean the ‘how much?’ as a sort of percentage question. Maybe I just translate ‘how much’ into ‘do I really need to learn all of it?’

past meteor Sep 8, 2023, 11:40 AM

#

I see, that's valid. Especially for libraries with colossal APIs like Numpy/Pandas. Nobody knows all of them. Just reading the docs to know how to write it somewhat idiomatically + what features exist on a high level is OK.

blissful crystal Sep 8, 2023, 12:56 PM

#

Hey there
I have a few issues

my model works fine on Google Colab (the basics: object detection using CIFAR-10),
but once I bring the model into my Flask app (by downloading it),
it doesn't work properly

wooden sail Sep 8, 2023, 2:21 PM

#

fallow frost Does Pytorch work well with PyPy or any other CPython alternative?

is there any special reason to avoid cpython here? your pytorch code's heavy routines won't run in python anyway

fallow frost Sep 8, 2023, 2:23 PM

#

wooden sail is there any special reason to avoid cpython here? your pytorch code's heavy rou...

I don't want to optimize PyTorch per se, but an API that uses it along with other ML/NLP libraries written in pure Python

wooden sail Sep 8, 2023, 2:23 PM

#

oof

#

are you sure about that? if performance is a problem, i would say you should avoid the pure python libraries

fallow frost Sep 8, 2023, 2:25 PM

#

wooden sail are you sure about that? if performance is a problem, i would say you should avo...

I agree, I always look for libraries implemented with Cython/C extensions for this kind of stuff, but a colleague wrote the API and I'm wondering if there's a drop-in replacement for CPython that would speed it up significantly

#

my major concern is whether PyPy is fully supported by PyTorch or whether there will be small bugs

wooden sail Sep 8, 2023, 2:26 PM

#

i don't think it's supported at all

fallow frost Sep 8, 2023, 2:26 PM

#

so do you think the code will crash when I import it through PyPy? or more like subtle errors?

wooden sail Sep 8, 2023, 2:26 PM

#

it won't work at all

#

i don't think pytorch can even be installed for it

#

you can read through here and take a look https://github.com/pytorch/pytorch/issues/17835 but really there is no way you will get any form of good ML/NLP performance with something written in pure python other than for small toy scenarios

fallow frost Sep 8, 2023, 2:29 PM

#

oh damnn

fallow frost Sep 8, 2023, 2:30 PM

#

wooden sail you can read through here and take a look https://github.com/pytorch/pytorch/iss...

I suppose I can try adding type hints and compile it with Cython

wooden sail Sep 8, 2023, 2:30 PM

#

you can give that a shot. idk if you find that easier than using a proper machine learning module

#

at that point you may consider just rewriting it in C(++)

fallow frost Sep 8, 2023, 2:31 PM

#

yeah not really my choice

fallow frost Sep 8, 2023, 2:31 PM

#

wooden sail at that point you may consider just rewriting it in C(++)

you can't compare the two: one takes literally an hour, and the other...

wooden sail Sep 8, 2023, 2:32 PM

#

well, give it a shot with cython and see if you get the performance you want. but really you should think of any ML stuff implemented in pure python as nothing more than a proof of concept that later needs to be rewritten properly. whether you prefer cython or something else, that's up to you

#

rewriting in pytorch is probably a good idea if you want to be able to use GPUs, for example

#

otherwise the parallelization is on you

fallow frost Sep 8, 2023, 2:34 PM

#

ight, I'll try out some of this stuff

#

thanks anyways

hasty mountain Sep 8, 2023, 3:29 PM

#

Hey guys, I was thinking here... Considering that CIFAR-10 uses 32x32 RGB images, each image has 32x32x3 = 3,072 values (1,024 pixels with 3 color channels each). Does it make sense to build a neural network that receives one of those images and tries to extract...let's say... 12,000 values from it?
I suppose that, despite every image in the dataset having the same number of pixels, each image has a different number of relevant features...but the number of features can hardly be equal to or higher than its number of input values, right?

#

I know that the number of parameters in a neural network is some kind of "trial and error" game, but I'm trying to have some idea of the range of the possibilities I can try.

mild dirge Sep 8, 2023, 3:38 PM

#

hasty mountain Hey guys, I was thinking here... Considering that CIFAR10 uses 32x32 RGB images,...

What kind of layer would it be?

hasty mountain Sep 8, 2023, 3:38 PM

#

mild dirge What kind of layer would it be?

Linear

agile cobalt Sep 8, 2023, 3:38 PM

#

"extracting values" from the input is the foundation of how deep learning works, but usually you'll want to reduce the number of values rather than increase it, both for efficiency and so that, after some layers, you can reduce it to the answer you want the model to give you

how to extract things efficiently is one of the core questions, and the answer to that are all the different layers/architectures like convolution layers or transformers (attention)

mild dirge Sep 8, 2023, 3:38 PM

#

A convolutional layer that outputs a/some feature map(s) totaling 12,000 values
Or a fully connected layer

hasty mountain Sep 8, 2023, 3:38 PM

#

(Or Fully Connected)

mild dirge Sep 8, 2023, 3:38 PM

#

That would mean 3,072 * 12,000 weights

#

or about 36.9 million weight values for a single layer

#

Disregarding the fact that you should probably not use fully connected on images
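The arithmetic behind those numbers, as a small sketch (the helper names are made up; bias terms are included, which is why the total is slightly above 3,072 x 12,000). For comparison, a 3x3 convolution with 12 output channels and padding 1 also produces roughly 12,000 values (12 x 32 x 32 = 12,288), yet needs only a few hundred parameters because its weights are shared across positions:

```python
def linear_params(in_features, out_features, bias=True):
    # A fully connected layer has one weight per (input, output) pair, plus one bias per output
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    # Conv weights don't depend on image size: one k x k kernel per (in, out) channel pair
    return out_channels * in_channels * kernel_size**2 + (out_channels if bias else 0)


flat_cifar = 32 * 32 * 3  # 3,072 input values: 1,024 pixels x 3 colour channels
print(linear_params(flat_cifar, 12_000))    # 36876000
print(conv2d_params(3, 12, kernel_size=3))  # 336
```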

hasty mountain Sep 8, 2023, 3:39 PM

#

mild dirge Disregarding the fact that you should probably not use fully connected on images

Maybe. But I'm trying to test a VAE using FCC layers to see how it goes.

mild dirge Sep 8, 2023, 3:40 PM

#

Pretty sure you don't need fully connected layers for an auto-encoder

hasty mountain Sep 8, 2023, 3:40 PM

#

At least, it seems that the beta-VAE relies on FCC layers and it goes fine...

hasty mountain Sep 8, 2023, 4:01 PM

#

hasty mountain At least, it seems that the beta-VAE relies on FCC layers and it goes fine...

Oops, VQ-VAE*

#

Oh, ok. The VQ-VAE indeed uses FCC layers for the encoder and the decoder. But the Decoder output is also passed to a PixelCNN to generate the image pithink

desert oar Sep 8, 2023, 4:18 PM

#

fallow frost Does Pytorch work well with PyPy or any other CPython alternative?

I wouldn't expect it to

iron basalt Sep 8, 2023, 4:57 PM

#

fallow frost I suppose I can try adding type hints and compile it with Cython

You can try Taichi.

#

https://docs.taichi-lang.org/docs/accelerate_pytorch

iron basalt Sep 8, 2023, 5:01 PM

#

fallow frost I agree, I always look for libraries that are implemtned with Cython/C-extension...

Not significantly, but you can try Nuitka.

azure compass Sep 8, 2023, 6:17 PM

#

Does anyone have a good, easy-to-approach tutorial for PyTorch? I'm trying to feed a tensor of extracted MFCC features as input and get 4 numbers out, but I don't understand exactly how to structure my data to do this

abstract wasp Sep 8, 2023, 8:44 PM

#

I want to predict where an image was taken (an estimate of its coordinates), and this is what I was thinking: use geohashing, collect images of each location, and treat each geohash as an output class, basically turning this into a classification problem. What do you guys think? Is it a good way to go about this, or do you have any suggestions?
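If you go the geohash route, the class label for each training image could simply be its geohash truncated to a fixed length: shorter prefixes mean larger cells and fewer classes. Below is a from-scratch sketch of the standard geohash encoding (alternating longitude and latitude bisection bits, 5 bits per base-32 character). `geohash_encode` is a hypothetical helper; in practice you might use an existing library instead:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits, even = [], 0, 0, True
    while len(chars) < precision:
        if even:  # even bits halve the longitude interval
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:  # odd bits halve the latitude interval
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)
```

For example, `geohash_encode(57.64911, 10.40744, 11)` gives `"u4pruydqqvj"`, and its 5-character prefix `"u4pru"` would be the corresponding coarser class.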

lapis sequoia Sep 9, 2023, 3:55 AM

#

anyone looking to start a project with me? looking for good, salaried developers who are pretty experienced in AI/DS/ML

verbal venture Sep 9, 2023, 8:04 AM

#

I want to make a web application that uses different LLMs from different organizations. I'm worried that earlier inputs won't be used in the attention mechanism since they're different LLMs. Is it possible to retain the attention history across each LLM? In other words, can one LLM use the previous attention from another?

rugged mist Sep 9, 2023, 10:44 AM

#

i have a list of 2-tuples
each tuple contains one number that's a good approximation of an unknown, and another that's far away
how do i collect the good number from every tuple so i can take their mean?

humble wren Sep 9, 2023, 10:50 AM

#

I'm doing face recognition using cv2, dlib and face_recognition; is this performance normal??
The different videos show face recognition (not detection) performed every 100 frames and on every frame

wooden sail Sep 9, 2023, 12:22 PM

#

rugged mist i have a list of 2-tuples each tuple contains one number thats a good approx of ...

do you know which item of each tuple is the good one? btw this sounds like something that'd benefit from numpy instead of lists and tuples

rugged mist Sep 9, 2023, 1:02 PM

#

wooden sail do you know which item of each tuple is the good one? btw this sounds like somet...

i don't, that's the hard part

wooden sail Sep 9, 2023, 1:03 PM

#

do you know anything about the statistics of the problem? are all of the tuples different realizations of the same random process?

rugged mist Sep 9, 2023, 1:07 PM

#

i have an array of sensors in a known arrangement, and im supposed to estimate the angle from which a signal arrives from the relative phase shift between pairs

for a given pair of sensors, for the same shift, theres two possible directions the wave may have come from

wooden sail Sep 9, 2023, 1:07 PM

#

ah a DOA problem

rugged mist Sep 9, 2023, 1:07 PM

#

yea

wooden sail Sep 9, 2023, 1:07 PM

#

but this is super different. you have a parametric model

#

the front-back problem is most easily solved by restricting your geometry

#

if you have only e.g. a uniform linear array and waves could really arrive from both directions, i'm not sure there's a good way to do this

rugged mist Sep 9, 2023, 1:09 PM

#

it's not a ULA, it's on a circle

wooden sail Sep 9, 2023, 1:09 PM

#

then this shouldn't be much of a problem i think

#

you can't use plane wave models in this case anyway

rugged mist Sep 9, 2023, 1:10 PM

#

i've been told to lol

wooden sail Sep 9, 2023, 1:10 PM

#

hmmm i mean, i guess you could, if the array is super far away from the sources

#

a little weird

#

but yeah since the sensors are now not in a line, you have extra geometric info

#

i would think a standard migration/correlation/synthetic aperture focusing would already give you a nice spatial correlation map

rugged mist Sep 9, 2023, 1:11 PM

#

i spent a... non negligible amount of time trying to get the angles right but i get muddled up with the signs and 180 +-s

#

(trying to do it on paper i mean)

wooden sail Sep 9, 2023, 1:12 PM

#

i don't think this is one you can do on paper

#

or should, at least

#

none of the nice subspace methods work with a circular array

#

i'm 99% sure you'd have to use a more sophisticated estimator or do some matrix products you do not want to do by hand

rugged mist Sep 9, 2023, 1:13 PM

#

no im not doing the computation by hand i meant

#

just converting the angle i get from a pair into an angle wrt center of circle is really janky

wooden sail Sep 9, 2023, 1:14 PM

#

a pair of what?

rugged mist Sep 9, 2023, 1:14 PM

#

of sensors

wooden sail Sep 9, 2023, 1:14 PM

#

huh

#

is that what you were told to do?

rugged mist Sep 9, 2023, 1:14 PM

#

i think so

wooden sail Sep 9, 2023, 1:15 PM

#

i don't think that's the best way of doing it, but ok

#

what i imagine is that, for each pair of sensors, you get an okish estimate of the angle, and another estimate that is reflected wrt the line passing through the two sensors, yeah?

#

so maybe some form of clustering on the points would work
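A minimal sketch of that clustering idea, assuming each tuple holds two candidate angles in radians: try every candidate as a cluster centre, keep from each tuple the member closest to it, and pick the centre with the smallest spread. `pick_consistent` and `circ_dist` are hypothetical names; distances are circular, so 0.1 rad and 6.2 rad count as close:

```python
import numpy as np


def circ_dist(a, b):
    """Smallest absolute angle between a and b, in [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (np.asarray(a) - b))))


def pick_consistent(pairs):
    """From each (angle, angle) tuple keep the member that best agrees with the others."""
    pairs = np.asarray(pairs, dtype=float)  # shape (n_pairs, 2), radians
    rows = np.arange(len(pairs))
    best_spread, best_picks = np.inf, None
    for ref in pairs.ravel():  # try every candidate as the cluster centre
        picks = pairs[rows, np.argmin(circ_dist(pairs, ref), axis=1)]
        spread = np.sum(circ_dist(picks, ref) ** 2)
        if spread < best_spread:
            best_spread, best_picks = spread, picks
    # Circular mean, so angles on both sides of 0 / 2*pi average correctly
    mean = np.angle(np.mean(np.exp(1j * best_picks))) % (2 * np.pi)
    return best_picks, mean
```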

slender kestrel Sep 9, 2023, 1:18 PM

#

@past meteor hello, sorry about the ping. You suggested a time series book to me, Forecasting: Principles and Practice on OTexts, and it was a really nice one; I completed it. Do you have any other suggestions for machine learning books in a similar format, e.g. with video explanations and an online PDF?

#

also edd if you have any ideas please suggest

rugged mist Sep 9, 2023, 1:20 PM

#

wooden sail what i imagine is that, for each pair of sensors, you get an okish estimate of t...

yes

wooden sail Sep 9, 2023, 1:21 PM

#

rugged mist yes

you can try averaging and then removing the outliers or something like that. try plotting a scatterplot

#

but also, i'm crying in maximum likelihood estimation 😭

rugged mist Sep 9, 2023, 1:22 PM

#

how's that work

#

Don't tell me

#

i was thinking of something like this:
start with a uniform pdf over 0 to 2pi
iterate over pairs
for each pair, make some bell-curve-like function that has two peaks at the two angles i get from this pair
update my pdf with this using some Bayes' rule type thing
take the argmax of the pdf at the end
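The plan above is essentially a grid-based Bayesian update, and it can be sketched in a few lines of NumPy. Assumptions: angles are in radians, each pair's "two-peak bell curve" is modelled as the sum of two von Mises bumps with a made-up concentration `kappa`, and the posterior lives on a fixed grid over [0, 2pi):

```python
import numpy as np


def fuse_pair_estimates(pairs, kappa=50.0, n_grid=3600):
    """Grid-based Bayesian fusion of ambiguous angle pairs (radians).

    Each pair contributes a likelihood with two von Mises "bell curves",
    one at each candidate angle; the posterior is the normalised product.
    """
    grid = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    log_post = np.zeros(n_grid)  # uniform prior over [0, 2*pi)
    for a, b in pairs:
        # exp(kappa * (cos(theta - mu) - 1)) peaks at 1 when theta == mu
        lik = (np.exp(kappa * (np.cos(grid - a) - 1.0))
               + np.exp(kappa * (np.cos(grid - b) - 1.0)))
        log_post += np.log(lik)  # Bayes' rule: multiply likelihoods, i.e. add logs
    post = np.exp(log_post - log_post.max())  # subtract the max to avoid underflow
    post /= post.sum()
    return grid[np.argmax(post)], grid, post
```

Because the wrong candidates from different pairs land in different places, only the true direction is reinforced by every pair, so the argmax ends up there. `kappa` controls how much angular error each pair is allowed.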

#

i rly need to read some literature on this

slender kestrel Sep 9, 2023, 1:26 PM

#

rugged mist i was thinking of sthn like this: start with a uniform pdf over 0 to 2pi iterate...

you can use bayesian optimization to find a distribution for your data points

#

i can be totally wrong here tho

wooden sail Sep 9, 2023, 1:26 PM

#

what one would do is take all of the data from the sensors together, make a parametric model that depends on the AOA, and then solve an optimization problem where one maximizes the log likelihood (assuming your parametric model describes the mean of a random distribution, probably Gaussian if you don't know anything else)

#

so you'd end up with a nonlinear least squares problem. you could take the estimate you get from your current approach, or from some other method, take it as an initial guess, and then use some (quasi) newton method or gradient method to find the angle
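A hedged sketch of that recipe for a circular array, under assumptions that are not from the thread: a single far-field plane-wave source, phase at sensor m modelled as k·r·cos(theta − alpha_m), unit-magnitude complex measurements `z`, and an unknown common phase offset that is eliminated analytically. With additive complex Gaussian noise, minimising this sum of squares is the maximum-likelihood estimate. A coarse grid search supplies the initial guess and a few Newton steps (with finite-difference derivatives) refine it:

```python
import numpy as np


def steering(theta, sensor_angles, k_r):
    """Model phasors a_m(theta) = exp(j * k*r*cos(theta - alpha_m)) for a circular array."""
    return np.exp(1j * k_r * np.cos(theta - sensor_angles))


def cost(theta, z, sensor_angles, k_r):
    """Least-squares misfit, minimised analytically over an unknown common phase phi:
    min_phi sum_m |z_m - exp(j*phi) * a_m|^2 = 2M - 2*|sum_m conj(z_m) * a_m|
    (valid for unit-magnitude z_m and a_m)."""
    return 2 * len(z) - 2 * np.abs(np.vdot(z, steering(theta, sensor_angles, k_r)))


def estimate_doa(z, sensor_angles, k_r, n_grid=720, n_newton=10, h=1e-3):
    # 1) Coarse grid search over [0, 2*pi) supplies the initial guess.
    grid = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    theta = grid[np.argmin([cost(t, z, sensor_angles, k_r) for t in grid])]
    # 2) Newton refinement using finite-difference first and second derivatives.
    for _ in range(n_newton):
        c_minus, c0, c_plus = (cost(theta + d, z, sensor_angles, k_r) for d in (-h, 0.0, h))
        grad = (c_plus - c_minus) / (2 * h)
        hess = (c_plus - 2 * c0 + c_minus) / h**2
        if hess <= 0:
            break  # not locally convex: keep the current estimate
        theta -= grad / hess
    return theta % (2 * np.pi)


# Synthetic check: 8 sensors evenly spaced on a circle of radius lambda/2 (so k*r = pi)
sensor_angles = 2 * np.pi * np.arange(8) / 8
k_r = np.pi
rng = np.random.default_rng(0)
phase_noise = 0.05 * rng.standard_normal(8)
z = steering(1.0, sensor_angles, k_r) * np.exp(1j * (0.7 + phase_noise))  # 0.7 = unknown offset
theta_hat = estimate_doa(z, sensor_angles, k_r)
```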

rugged mist Sep 9, 2023, 1:27 PM

#

o

#

kinda makes sense

past meteor Sep 9, 2023, 1:33 PM

#

slender kestrel <@260493929047130113> hello sorry about the ping you suggested me a book for ti...

For time series specifically or ML at large?

slender kestrel Sep 9, 2023, 1:33 PM

#

past meteor For time series specifically or ML at large?

ml at large

past meteor Sep 9, 2023, 1:34 PM

#

My favourite text remains An Introduction to Statistical Learning. Afterwards you can read Dive into Deep Learning.

slender kestrel Sep 9, 2023, 1:34 PM

#

past meteor My favourite text remains introduction to statistical learning. Afterwards you c...

i guess reading OTexts is enough to at least get me a machine learning internship, i suppose

slender kestrel Sep 9, 2023, 1:35 PM

#

past meteor My favourite text remains introduction to statistical learning. Afterwards you c...

does the book An Introduction to Statistical Learning also have video explanations, like the OTexts one?

slender kestrel Sep 9, 2023, 1:35 PM

#

slender kestrel i guess reading otexts is enough to atleast get me a machine learning internsh...

plus knowing about ml and dl algos

wooden sail Sep 9, 2023, 1:36 PM

#

that depends on where you apply

#

for ML many people like seeing a degree

past meteor Sep 9, 2023, 1:36 PM

#

slender kestrel does the book an introduction statistical learning also have video format explan...

No. I have a strong bias for reading. Videos are pretty useless

slender kestrel Sep 9, 2023, 1:37 PM

#

past meteor No. I have a strong bias for reading. Videos are pretty useless

well i do read too, i just find the videos convenient, that's it

past meteor Sep 9, 2023, 1:37 PM

#

I only use videos if I want to zone out and get a little bit of information for free

slender kestrel Sep 9, 2023, 1:37 PM

#

wooden sail for ML many people like seeing a degree

ok ok

past meteor Sep 9, 2023, 1:38 PM

#

They trick you into thinking you're learning, reading is tiresome because you actually are

slender kestrel Sep 9, 2023, 1:38 PM

#

past meteor I only use videos if I want to zone out and get a little bit of information for ...

yeaa i basically watch those videos only when i'm low on concentration, so videos help a lot

#

https://youtu.be/LvySJGj-88U?si=ZVdWZTkTwBOFI497
here i found a playlist of the book

past meteor Sep 9, 2023, 1:41 PM

#

Please don't watch the videos. If you're going to watch videos at the very least watch it from another book

#

Because what'll happen is you might watch a chapter instead of reading it and that tricks you into thinking you've covered the material

#

It's not you personally, it's a general observation that applies to everyone

slender kestrel Sep 9, 2023, 1:44 PM

#

past meteor It's not you personally, it's a general observation that applies to everyone

ok so imma avoid watching videos if you suggest it

past meteor Sep 9, 2023, 1:44 PM

#

slender kestrel ok so imma avoid watching videos if you suggest it

If you want to watch videos on the side you can watch Statistical Rethinking 🙂

#

Read 1 watch 1 is a good compromise

slender kestrel Sep 9, 2023, 1:45 PM

#

past meteor If you want to watch videos on the side you can watch statistical rethinking 🙂

alrighty thnx a lot mate !

serene scaffold Sep 9, 2023, 2:18 PM

#

@past meteor please check your DMs

torn ore Sep 9, 2023, 2:39 PM

#

Hey so I found this program on the internet for a minimax in a tictactoe game. When it's untouched it works perfectly, but I want to add a difficulty setting and I'm tinkering with how to do that. I decided to add an additional check for if depth == 8 to just pick a random available cell, so if the player goes first, the program's first move will be a random pick instead of the center spot every time. However, this results in some weird behavior: after I pick cell 1, it picks a random cell, I pick cell 7, and the bot always picks cell 3 instead of blocking my play (cell 4). That move is handled by the minimax portion, so I don't understand why it won't block my play. Is there a flaw in this algorithm causing that? Is there another way I could alter difficulty?

https://paste.pythondiscord.com/WH5A

mild dirge Sep 9, 2023, 2:49 PM

#

Maybe because if the opponent plays perfectly it can't win anymore anyways?

#

This situation right? (circle=player, cross=ai) @torn ore

torn ore Sep 9, 2023, 2:53 PM

#

mild dirge This situation right? (circle=player, cross=ai) <@416738006511124480>

Yea that's pretty much what's happening. I play top circle, opponent plays randomly, as expected, then I play bottom circle, and it will not block with the next move

mild dirge Sep 9, 2023, 2:53 PM

#

No point

#

It has already lost if you play perfectly

#

#

You get this then, and player chooses middle

#

Then you have top right or bottom right is win

#

So it can't block both

torn ore Sep 9, 2023, 2:54 PM

#

Hmmm

mild dirge Sep 9, 2023, 2:55 PM

#

I'm not sure if minimax would choose the cell that prolongs the game for the longest

torn ore Sep 9, 2023, 2:55 PM

#

So you think it's playing through all the scenarios and the one with the smallest max loss is to just let it happen

#

That's... interesting Sussott I'm not familiar with these algorithms; this is the first one I've looked at

#

So that could be the case

mild dirge Sep 9, 2023, 2:56 PM

#

Tbh it's been a while since I've implemented minimax, so it's a bit foggy

#

But from what I remember it simulates an optimal game where both players pick "optimally"

#

In which case the ai would have lost in that situation, so each choice would give the same reward I suppose

torn ore Sep 9, 2023, 2:58 PM

#

Yea it just threw me off because any other time I've played it, given the opportunity to block me or win with its move it'd block me most of the time - which kinda makes sense now that you bring that up, any other time its trying to tie but in this case it knows its a lost cause

#

So I guess I need to find a different way to adjust difficulty

#

🤯 me rn

#

Might have to just have it check for [x, ~, x] and pick the empty spot for the harder difficulty instead of actually predicting

mild dirge Sep 9, 2023, 3:16 PM

#

Yeah you can hardcode stuff for a lower difficulty ig

#

Maybe there's a more elegant way to incorporate the length of the game into the minimax algorithm
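One way to fold game length into minimax, sketched below: score a win as `10 - depth` and a loss as `depth - 10`, so the AI prefers quick wins and, in a position it cannot save, still picks the move that loses latest (which usually means blocking). The `blunder_prob` parameter is the "occasionally play a random move" idea for easier difficulties. This is a standalone illustration with its own board representation (a list of 9 cells), not a patch to the pasted code:

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, to_move, ai, depth):
    """Score from the AI's point of view, adjusted by how many moves it takes."""
    w = winner(board)
    if w == ai:
        return 10 - depth  # faster wins score higher
    if w is not None:
        return depth - 10  # slower losses score higher, so the AI still blocks
    empty = [i for i, cell in enumerate(board) if cell == " "]
    if not empty:
        return 0  # draw
    nxt = "O" if to_move == "X" else "X"
    scores = []
    for i in empty:
        board[i] = to_move
        scores.append(minimax(board, nxt, ai, depth + 1))
        board[i] = " "  # undo the move
    return max(scores) if to_move == ai else min(scores)


def choose_move(board, ai, blunder_prob=0.0, rng=random):
    """Best move for `ai`, except with probability `blunder_prob` play a random one."""
    empty = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < blunder_prob:
        return rng.choice(empty)
    human = "O" if ai == "X" else "X"
    best_score, best_move = None, None
    for i in empty:
        board[i] = ai
        score = minimax(board, human, ai, depth=1)
        board[i] = " "
        if best_score is None or score > best_score:
            best_score, best_move = score, i
    return best_move
```

With O on cells 1 and 7 (indices 0 and 6) and X on cell 3 (index 2), `choose_move` returns index 3, i.e. it blocks cell 4.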

torn ore Sep 9, 2023, 3:34 PM

#

I'll probably hardcode certain moves in, seems easier. Or maybe move the random one from depth 8 to depth 6 to give the player one opportunity to win on the hard difficulty

#

I did have it setup to only look 4 moves ahead, but then I ran into an issue where the depth was never 0

past meteor Sep 9, 2023, 3:47 PM

#

torn ore I did have it setup to only look 4 moves ahead, but then I ran into an issue whe...

How are you encoding the state in your problem?

#

This is only tangentially related, but I'd encourage you to look at post state representations, as they're popular for this kind of thing

torn ore Sep 9, 2023, 3:48 PM

#

The state is just a list of coordinates

#

The problem I was having where the depth was never 0 when looking 4 moves ahead was because I was making the depth a static 4 before giving it to the minimax so my checks for a draw game were never true

past meteor Sep 9, 2023, 3:49 PM

#

I got it wrong, after state is the term I was looking for.

twin forge Sep 9, 2023, 3:50 PM

#

twin forge I want to find the average holding period of the portfolio and it's average mont...

I've managed to get here:

I consider the df_action of a given model. df_action has values like 0 (no action was taken), +1 to +100 (the purchase of one or more shares, the limit being 100 shares) and -1 to -100 (the sale of one or more shares)
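Assuming each entry of `df_action` is the signed number of shares traded at that step (positive buys, negative sells), one common definition of average holding period matches each sold share to the earliest still-open purchase (FIFO) and averages the time between them. A minimal sketch; `average_holding_period` is a hypothetical helper, the period is measured in rows/steps, and sells with no open position are simply ignored:

```python
from collections import deque


def average_holding_period(actions):
    """Average number of steps a share is held, matching sells to buys FIFO.

    `actions[t]` is the signed number of shares traded at step t
    (positive = buy, negative = sell, 0 = no action).
    """
    lots = deque()  # open lots as [buy_step, shares_remaining]
    total_steps, total_shares = 0, 0
    for t, qty in enumerate(actions):
        if qty > 0:
            lots.append([t, qty])
        elif qty < 0:
            to_sell = -qty
            while to_sell > 0 and lots:  # unmatched sells (no open lots) are ignored
                buy_t, shares = lots[0]
                matched = min(shares, to_sell)
                total_steps += matched * (t - buy_t)
                total_shares += matched
                to_sell -= matched
                if matched == shares:
                    lots.popleft()
                else:
                    lots[0][1] -= matched
    return total_steps / total_shares if total_shares else float("nan")
```

If the rows are evenly spaced in time, multiplying the result by the time per row converts it to days; a single column's values can be passed in as a list (e.g. via `.tolist()`).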

past meteor Sep 9, 2023, 3:50 PM

#

They're mostly used in reinf learning but they're applicable to minimax too

torn ore Sep 9, 2023, 3:51 PM

#

past meteor They're mostly used in reinf learning but they're applicable to minimax too

I don't think there's an after state in this. I put the code in this link if you wanna check it out. I'll look into what an after state is when I get done mowing the lawn lol

#

#data-science-and-ml message

past meteor Sep 9, 2023, 3:52 PM

#

I'd have to think about your actual problem at hand, I'll have a look.

torn ore Sep 9, 2023, 3:53 PM

#

Like I said before, I will probably end up hard coding the first couple moves for the 2 easier difficulties

#

Seems like the simplest way to achieve the expected results

#

Or I can make a random chance for the bot to play a random valid move

past meteor Sep 9, 2023, 4:02 PM

#

I read up. Yeah, if you play perfectly and go first, I guess there's nothing it can do. You could add logic to prolong the game, I guess

torn ore Sep 9, 2023, 4:06 PM

#

I think I'm just going to do something like `x = randint(0, 9)` and, `if depth == x`, play a random valid move, so there's a small chance the bot misplays once in a while
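The "small chance to misplay" idea can be pulled out into a tiny helper. A minimal sketch, assuming the empty cells are a list of `(row, col)` tuples and `best_move` comes from the existing minimax; `pick_move` and `misplay_chance` are hypothetical names:

```python
import random


def pick_move(empty_cells, best_move, misplay_chance, rng=random):
    """Return a random empty cell with probability `misplay_chance`,
    otherwise the move chosen by the search (for example, minimax)."""
    if empty_cells and rng.random() < misplay_chance:
        return rng.choice(empty_cells)
    return best_move


# Usage: a 10% chance per turn to misplay, the same odds as randint(0, 9) == depth
rng = random.Random(42)
cells = [(0, 1), (1, 1), (2, 2)]
move = pick_move(cells, best_move=(1, 1), misplay_chance=0.1, rng=rng)
print(move)
```

Passing the chance as a number makes each difficulty a single parameter instead of a separate branch.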

slender kestrel Sep 9, 2023, 5:54 PM

#

past meteor I Read up. Yeah if you play perfectly and go first I guess there's nothing it ca...

congrats on the role man

weak mortar Sep 9, 2023, 5:58 PM

#

hi, i've been playing around with generating future data (fantasy data) from past data, using Brownian motion. it's just a hobby of mine, but i'm curious to hear if it's something you use professionally in this field, and also whether there are other similar methods to use instead?

```python
import numpy as np

# assumes initial_price, n_simulations, n_days, mu, sigma and dt are defined earlier (not shown)
price_simulations = []
for _ in range(n_simulations):
    price_simulation = [initial_price]
    for _ in range(n_days * 24):  # one step per hourly candle
        # normal return with drift mu*dt and volatility sigma*sqrt(dt)
        candle_return = np.random.normal(mu * dt, sigma * np.sqrt(dt))
        price_t = price_simulation[-1] * (1 + candle_return)
        price_simulation.append(price_t)
    price_simulations.append(price_simulation)
```

#

as i don't truly know what i'm doing, i posted the loop i'm using in case i made any mistakes

slow vigil Sep 9, 2023, 6:13 PM

#

first thing I can tell you is that a double for-loop is going to be inefficient

#

It's going to be O(n^2)

#

I would look into using some sort of data analysis framework like numpy to see if you can vectorize the operations somehow
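Vectorizing the nested loop with NumPy could look like the sketch below. The total work is still proportional to n_simulations × n_steps, but the inner loops run in NumPy's compiled code instead of Python. The parameter values in the usage line (5% drift, 20% volatility, hourly `dt`) are made-up assumptions:

```python
import numpy as np


def simulate_paths(initial_price, mu, sigma, dt, n_steps, n_simulations, seed=None):
    """Vectorized version of the nested loop: one (n_simulations, n_steps)
    array of normal returns, compounded along the time axis."""
    rng = np.random.default_rng(seed)
    # Same distribution as np.random.normal(mu * dt, sigma * np.sqrt(dt)) per candle
    returns = rng.normal(mu * dt, sigma * np.sqrt(dt), size=(n_simulations, n_steps))
    growth = np.cumprod(1 + returns, axis=1)  # axis=1 is time, one row per path
    paths = initial_price * growth
    # Prepend the starting price so each row matches price_simulation in the loop
    start = np.full((n_simulations, 1), initial_price, dtype=float)
    return np.hstack([start, paths]), returns


paths, returns = simulate_paths(100.0, mu=0.05, sigma=0.2, dt=1 / (365 * 24),
                                n_steps=30 * 24, n_simulations=1_000, seed=0)
print(paths.shape)  # (1000, 721)
```

Each row is one simulated path, so `paths[:, -1]` holds the final price of every simulation.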

#

Which brings me to the reason I'm here...

#

Anyone know the best way to make a dataframe out of JSON data when I only need select fields from the data?
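One common way to do this with pandas is `pd.json_normalize`, which flattens nested objects into dotted column names, followed by selecting the columns you need. A minimal sketch; the record layout and field names below are made up for illustration:

```python
import pandas as pd

# Hypothetical JSON records; only "id", "user.name" and "stats.score" are wanted
records = [
    {"id": 1, "user": {"name": "ana", "email": "ana@example.com"}, "stats": {"score": 9, "rank": 2}},
    {"id": 2, "user": {"name": "ben", "email": "ben@example.com"}, "stats": {"score": 7, "rank": 5}},
]

wanted = ["id", "user.name", "stats.score"]
# json_normalize flattens nested dicts into dotted column names
df = pd.json_normalize(records)[wanted]
print(df)

# For large inputs, picking the fields before building the frame avoids flattening everything
rows = [{"id": r["id"], "name": r["user"]["name"], "score": r["stats"]["score"]} for r in records]
df_small = pd.DataFrame(rows)
```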

weak mortar Sep 9, 2023, 6:15 PM

#

okay thanks for letting me know! i plan to run it with multiprocessing to make use of more than 1 core

slow vigil Sep 9, 2023, 6:16 PM

#

I think that should be your 2nd option

#

1st option should be to make the code more efficient

#

then use multiprocessing if you need to

weak mortar Sep 9, 2023, 6:16 PM

#

yes, i think both options aren't mutually exclusive

#

it's just a lot faster to use 24 cores than 1

slow vigil Sep 9, 2023, 6:17 PM

#

Sometimes it isn't

#

Sometimes the overhead of starting all those processes eats up any speed and/or efficiency gains you get from multiprocessing

#

If your code is efficient enough you won't need it

weak mortar Sep 9, 2023, 6:18 PM

#

when calculating on large datasets with pandas, i observe a rather linear relation between completion time and amount of cores though

slow vigil Sep 9, 2023, 6:19 PM

#

Sure, if you have a lot of data. That doesn't change the fact that you can get a bigger efficiency gain by not having needlessly expensive operations in your code

weak mortar Sep 9, 2023, 6:21 PM

#

i firmly believe nobody would argue against the idea that more efficient code leads to a faster completion time 💚

slow vigil Sep 9, 2023, 6:23 PM

#

Why are you using brownian motion?

weak mortar Sep 9, 2023, 6:24 PM

#

it's the only method i know of, i'm quite green in this field

flint oxide Sep 9, 2023, 6:28 PM

#

please tell me if this doesn't belong here. I posted in the help channel twice but got no replies, so I'm trying a different approach

How do I move the vertical y-axis label (the DataFrame column name) so that it doesn't overlap with the pie chart's wedge label on the left side?

```python
import matplotlib.pyplot as plt

# student_list is a pandas DataFrame; 'Ngành đào tạo' means "field of study"
student_list['Ngành đào tạo'].value_counts().plot.pie(autopct='%1.1f%%')
plt.legend(bbox_to_anchor=(2, 1), loc='upper left')
plt.show()
```

slow vigil Sep 9, 2023, 6:29 PM

#

Is that matplotlib or what

flint oxide Sep 9, 2023, 6:29 PM

#

yes

slow vigil Sep 9, 2023, 6:30 PM

#

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.tight_layout.html

#

Did you try this?

flint oxide Sep 9, 2023, 6:32 PM

#

No, I haven't. I'll give it a shot

slow vigil Sep 9, 2023, 6:33 PM

#

Is that one plot?

#

it is. Ok I think tight_layout is only for subplots

#

https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylabel.html

flint oxide Sep 9, 2023, 6:34 PM

#

slow vigil Is that one plot?

yes it is

slow vigil Sep 9, 2023, 6:34 PM

#

Try the second one

flint oxide Sep 9, 2023, 6:36 PM

#

Awesome! I can interact with the y-axis title now.

#

Thank you very much @slow vigil

slow vigil Sep 9, 2023, 6:36 PM

#

np

flint oxide Sep 9, 2023, 6:40 PM

#

Strange follow up question but what did you look up to find these links?

slow vigil Sep 9, 2023, 8:33 PM

#

flint oxide Strange follow up question but what did you look up to find these links?

When I looked at the first link I noticed that there were 'pad' properties and so that tipped me off to search for 'matplotlib axis label padding'
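The `labelpad` argument of `Axes.set_ylabel` is what moves the y-axis title away from the plot. A minimal sketch, using a made-up Series as a stand-in for `student_list['Ngành đào tạo']` and an arbitrary padding of 40 points:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for student_list['Ngành đào tạo'] (made-up majors)
majors = pd.Series(["CS", "Math", "CS", "Physics", "CS", "Math"], name="Ngành đào tạo")

ax = majors.value_counts().plot.pie(autopct="%1.1f%%")
# Push the y-axis title further left, away from the wedge labels
ax.set_ylabel(majors.name, labelpad=40)
```

Passing an empty string to `set_ylabel` hides the title entirely, if it isn't needed.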

lapis wigeon Sep 9, 2023, 9:41 PM

#

@past meteor hey if you don't mind could you give me a roadmap on learning ML with python

past meteor Sep 9, 2023, 9:42 PM

#

lapis wigeon @past meteor hey if you don't mind could you give me a roadmap on learn...

How good is your Python right now?

lapis wigeon Sep 9, 2023, 9:42 PM

#

I'd say I'm very comfortable with the basics and intermediate stuff

#

I'm somewhat more experienced in C# but have been learning Python lately

#

not very high level though

past meteor Sep 9, 2023, 9:43 PM

#

And what's your end goal ML/DL wise. Are you in for the long haul or do you want to explore what exists in the space? (No wrong answers here: 🙂 )

lapis wigeon Sep 9, 2023, 9:44 PM

#

well im still exploring but if I like it, for the long haul ig

#

I was just learning something about neural networks a few weeks ago and got interested in ML

#

but honestly, exploring yea

past meteor Sep 9, 2023, 9:45 PM

#

So I'd say neural nets are a very specific type of ML model and you kind of need to have a baseline level of knowledge of the rest to know when they make sense

lapis wigeon Sep 9, 2023, 9:45 PM

#

yep

#

how good would I have to be at python to start digging into ML related stuff?

past meteor Sep 9, 2023, 9:47 PM

#

If your Python is rusty you have 2 problems: figuring out what ML is and figuring out how to do it in Python. Leads to a cognitive overload

#

If you know more C# I'd suggest you start by "translating" a C# project into Python to be honest.

lapis wigeon Sep 9, 2023, 9:48 PM

#

im not a huge fan of C#, i prefer to dev in python

#

id rather probably practice and get better at python

past meteor Sep 9, 2023, 9:49 PM

#

That's no problem. Have you completed a project in C#? Just redo it in Python

lapis wigeon Sep 9, 2023, 9:49 PM

#

past meteor That's no problem. Have you completed a project in C#? Just redo it in Python

yea

lapis wigeon Sep 9, 2023, 9:49 PM

#

past meteor If you know more C# I'd suggest you start by "translating" a C# project into Py...

oh alr

past meteor Sep 9, 2023, 9:50 PM

#

Simply because if you get more of the language down you won't be learning 2 things at once when you head into ML

lapis wigeon Sep 9, 2023, 9:50 PM

#

hmmm

#

what source would you recommend to learn ML from tho, the book you recommended?

past meteor Sep 9, 2023, 9:52 PM

#

My trinity of resources are:

All 3 are 100% free and have PDFs on their sites

#

Only the last one is strictly about deep learning / neural networks but I actually believe book 3 depends on knowing book 2 and 2 on 1

#

You're free to start from book 3 for "exploration" purposes but if you're in it for the long haul I'd circle back and read them in that order 🙂

lapis wigeon Sep 9, 2023, 9:54 PM

#

past meteor You're free to start from book 3 for "exploration" purposes but if you're in it ...

alright, thanks for the books. for now i'll improve my knowledge of python, take an introductory course on kaggle as someone recommended, and then check out the 3rd book. I'm not very serious about it, just exploring, since I'm only a high school grad

#

thanks for the help tho 😄

past meteor Sep 9, 2023, 9:54 PM

#

Oooh that's important context I didn't get! 🙂 Kaggle is fine as well indeed. Good luck

lapis wigeon Sep 9, 2023, 9:55 PM

#

I mean i'm still very new to it so would prefer to look at basic courses before diving into books

#

thanks for the help man

left tartan Sep 9, 2023, 9:56 PM

#

I mentioned it before, but just reiterating: I’m still a fan of CS50's Introduction to Artificial Intelligence with Python for a structured, Python-oriented survey

#

It’s a good flyover of topics with hands on practical examples. Still need the deeper stuff, but it’s satisfying to write code that does stuff

lapis wigeon Sep 9, 2023, 9:57 PM

#

yea i'll look at that

#

the cs50 courses are pretty good, I've tried cs50t and cs50x as well

#

thanks mate 😄

past meteor Sep 9, 2023, 10:08 PM

#

lapis wigeon I mean i'm still very new to it so would prefer to look at basic courses before ...

Kaggle is really good

#

If you're willing to pay €10-15 then 100 days of Python is decent as well

lapis wigeon Sep 9, 2023, 10:54 PM

#

does that include ML?

past meteor Sep 9, 2023, 10:55 PM

#

Some of the days are about ML but a whole range of topics are covered including web, game dev, databases, ...

torn ore Sep 10, 2023, 12:46 AM

#

past meteor I Read up. Yeah if you play perfectly and go first I guess there's nothing it ca...

You interested in what I ended up doing for this?

past meteor Sep 10, 2023, 12:46 AM

#

torn ore You interested in what I ended up doing for this?

Definitely!

torn ore Sep 10, 2023, 12:48 AM

#

Crap, I thought I sent the code over to my phone, but I guess I didn't. One sec, I can explain lol

#

5 difficulties:
Too easy
Easy
Med
Hard
Too hard

#

If it's too easy, x,y is a random valid move

#

Set to too hard, if depth == 9 choose a corner, else use the minimax

#

At the beginning of the ai turn I generated a randint(1,9)

#

If depth == 9 for easy and medium, pick a random move, for hard, pick a corner

#

Easy: if depth >= randint then pick a randomized move (more likely to happen early game)

Medium: if depth <= randint pick a random move(more likely to happen late game)

Hard: if depth == randint pick a random move

For each of those, if not then use minimax

#

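```python
from random import choice, randint

# The function header wasn't posted: ai_turn(board, difficulty) is a hypothetical
# stand-in. empty_cells, minimax, hardness and COMP are defined elsewhere in the bot.
async def ai_turn(board, difficulty):
    ec = empty_cells(board)
    depth = len(ec)          # number of empty cells = moves left
    odds = randint(1, 9)     # compared with depth to decide on a random move
    if depth == 0:
        return
    elif depth == 1:
        rn = 0
    else:
        rn = randint(0, depth - 1)  # index of a random empty cell
    moves = {  # not used below
        1: [0, 0], 2: [0, 1], 3: [0, 2],
        4: [1, 0], 5: [1, 1], 6: [1, 2],
        7: [2, 0], 8: [2, 1], 9: [2, 2],}
    if difficulty == hardness.Too_Easy:
        # always a random valid move
        x, y = ec[rn][0], ec[rn][1]
    elif difficulty == hardness.Easy:
        if depth == 9:
            x = choice([0, 1, 2])
            y = choice([0, 1, 2])
        else:
            # random move more likely early in the game (large depth)
            if depth >= odds:
                x, y = ec[rn][0], ec[rn][1]
            else:
                move = await minimax(board, depth, COMP)
                x, y = move[0], move[1]
    elif difficulty == hardness.Medium:
        if depth == 9:
            x = choice([0, 1, 2])
            y = choice([0, 1, 2])
        else:
            # random move more likely late in the game (small depth)
            if depth <= odds:
                x, y = ec[rn][0], ec[rn][1]
            else:
                move = await minimax(board, depth, COMP)
                x, y = move[0], move[1]
    elif difficulty == hardness.Hard:
        if depth == 9:
            # open in a corner
            x = choice([0, 2])
            y = choice([0, 2])
        else:
            # random move only when depth happens to equal odds
            if depth == odds:
                x, y = ec[rn][0], ec[rn][1]
            else:
                move = await minimax(board, depth, COMP)
                x, y = move[0], move[1]
    else:
        # Too hard: corner opening, then always minimax
        if depth == 9:
            x = choice([0, 2])
            y = choice([0, 2])
        else:
            move = await minimax(board, depth, COMP)
            x, y = move[0], move[1]
    board[x][y] = COMP
```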
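Because `odds = randint(1, 9)` is uniform over 1..9, the chance of a random move at each depth can be worked out exactly. A small sketch that enumerates the nine possible values of `odds` for each difficulty rule (`random_move_chance` is a hypothetical helper, not part of the bot):

```python
from fractions import Fraction


def random_move_chance(depth, rule):
    """Exact chance that the bot plays a random move at a given depth,
    with odds = randint(1, 9) drawn uniformly from 1..9."""
    hits = 0
    for odds in range(1, 10):
        if rule == "easy" and depth >= odds:
            hits += 1
        elif rule == "medium" and depth <= odds:
            hits += 1
        elif rule == "hard" and depth == odds:
            hits += 1
    return Fraction(hits, 9)


for depth in range(8, 0, -1):  # depth 9 (empty board) is handled separately
    print(depth, [str(random_move_chance(depth, r)) for r in ("easy", "medium", "hard")])
```

Read this way, Easy misplays with chance depth/9 (8/9 on its first move after an opening), Medium with (10 - depth)/9, and Hard with a flat 1/9 per turn.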

desert oar Sep 10, 2023, 1:54 AM

#

left tartan I mentioned, but just reiterating: I’m still a fan of cs50 for ai for a structur...

have you looked through the fastai course? it seemed pretty good when i skimmed over it last year

left tartan Sep 10, 2023, 1:54 AM

#

desert oar have you looked through the fastai course? it seemed pretty good when i skimmed ...

Have not, will check it out

desert oar Sep 10, 2023, 1:56 AM

#

caveat: i am not actually very experienced with "ai" things

#

i am a regression fitter at heart

serene scaffold Sep 10, 2023, 2:14 AM

#

I, too, am a regressive.

fossil cliff Sep 10, 2023, 3:16 AM

#

past meteor My trinity of resources are: 1. https://mml-book.github.io/book/mml-book.pdf 2....

is it for beginners? i have just finished high school; would it be helpful for me, since i want to pursue a career in data science?

#

i want to gather content so i can start learning

abstract wasp Sep 10, 2023, 5:00 AM

#

Anyone know how to restart the kernel in Kaggle? I updated tensorflow but when I double check the version, I still have the old one.
Also, I am trying to run this code in Jupyter notebook locally but every time I run the notebook I get something like “your kernel has died, it will restart automatically”. How can I fix this?

cold osprey Sep 10, 2023, 6:21 AM

#

hi, i have multiple large csv files of about 5 GB in size. I tried loading them with pandas but ran into memory error. Not sure what other tools i could use to load the data?

#

polars seems to crash the kernel every time i try to read those large files

wooden sail Sep 10, 2023, 6:27 AM

#

in polars you can use a lazyframe. dask has a similar behavior with dask.dataframe. otherwise you have to split the data into smaller chunks

#

if you keep trying to load all the data into memory, no language or module will help 😛 you'd have to go out and buy more ram
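The "split the data into smaller chunks" option can be done with plain pandas: `read_csv` with `chunksize` yields one DataFrame at a time, and `usecols` skips unneeded columns. A minimal sketch that aggregates column sums chunk by chunk; the helper name and the tiny demo file are hypothetical:

```python
import os
import tempfile

import pandas as pd


def column_sums_in_chunks(path, columns, chunksize=1_000_000):
    """Stream a large CSV in pieces, keeping only `columns`, and combine
    per-chunk results so the whole file never sits in memory at once."""
    total = None
    for chunk in pd.read_csv(path, usecols=columns, chunksize=chunksize):
        part = chunk.sum(numeric_only=True)
        total = part if total is None else total.add(part, fill_value=0)
    return total


# Usage on a tiny stand-in file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.csv")
    pd.DataFrame({"a": range(10), "b": range(10, 20), "c": list("abcdefghij")}).to_csv(path, index=False)
    sums = column_sums_in_chunks(path, ["a", "b"], chunksize=3)
print(sums)
```

This only works for computations that can be combined across chunks (sums, counts, min/max); anything needing all rows at once is where lazy or out-of-core tools help.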

cold osprey Sep 10, 2023, 6:31 AM

#

lemme read up on lazy frame

#

the problem is i cant even get one of those 5GB files to load

wooden sail Sep 10, 2023, 6:37 AM

#

yeah, that's what lazyframe is for

#

to not load it to memory but read from disk

cold osprey Sep 10, 2023, 6:38 AM

#

ah right

#

scan_csv

abstract wasp Sep 10, 2023, 7:04 AM

#

Why? 😭 someone help T-T

Screenshot_2023-09-10_at_12.03.05_AM.png

vestal spruce Sep 10, 2023, 7:10 AM

#

abstract wasp Why? 😭 someone help T-T

You need to give us more context of your problem, perhaps share a snippet of the code in question on #1035199133436354600 ?

#

But judging from the error message, I think the code uses `models` (the Keras module from the tensorflow package) without that name being defined. Check whether names like `models` and `layers` are actually imported; if not, import them from `tensorflow.keras`, or use the `Sequential` class you already imported.

flint oxide Sep 10, 2023, 8:09 AM

#

slow vigil When I looked at the first link I noticed that there were 'pad' properties and s...

ahh make sense. Thanks

zealous badger Sep 10, 2023, 9:25 AM

#

Hi , so I'm trying to automate preprocessing but I'm kind of stuck with outlier treatment

#

Afaik, to remove outliers we compute the z-score, fix a threshold, and remove the rows whose absolute z-score is above the threshold.

#

But I'm getting a weird error for this dataset where there are no data points left when the threshold is 2.

zealous badger Sep 10, 2023, 9:43 AM

#

nvm, i think it might have been due to not accounting for columns with object dtypes
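A z-score filter that only looks at numeric columns could look like this. Besides object columns, two other things can empty the result: a constant column has a standard deviation of 0, so its z-scores are NaN and `NaN <= threshold` is False for every row; and requiring every column to pass a low threshold like 2 drops more rows as the number of columns grows. A minimal sketch with made-up data; `drop_zscore_outliers` is a hypothetical helper:

```python
import pandas as pd


def drop_zscore_outliers(df, threshold=3.0):
    """Drop rows where any numeric column has |z| > threshold.
    Non-numeric columns are ignored; constant columns are skipped
    because their z-scores would all be NaN."""
    numeric = df.select_dtypes(include="number")
    numeric = numeric.loc[:, numeric.std(ddof=0) > 0]  # skip zero-variance columns
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    keep = (z.abs() <= threshold).all(axis=1)
    return df[keep]


df = pd.DataFrame({
    "height": [170, 172, 168, 171, 169, 250],
    "const": [1, 1, 1, 1, 1, 1],          # zero variance
    "city": ["a", "b", "a", "c", "b", "a"],  # object dtype
})
clean = drop_zscore_outliers(df, threshold=2.0)
print(clean)
```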

past meteor Sep 10, 2023, 10:25 AM

#

fossil cliff is it for beginners i have just completed my highschool, would it be helpful for...

I'd say you should try them and see if they're too difficult or not. If they are you can ping me or ask someone else and I can see what else can work 🙂

past meteor Sep 10, 2023, 10:26 AM

#

wooden sail yeah, that's what lazyframe is for

Btw I also use Lazyframes when the data is already in memory 🙂

#

I treat it like a query optimiser in SQL: several operations are collected and "compiled" into a more optimal set of instructions, while the non-lazy API does them step by step. My entire data pipeline is reading data from a db, calling .lazy(), chaining all the steps, and then calling .collect() at the end 🤣

wooden sail Sep 10, 2023, 10:30 AM

#

yeah that also makes sense

#

each operation to the db has some overhead, so it can make sense to bundle up a few

past meteor Sep 10, 2023, 10:31 AM

#

It's also because if you do, say, 5 things sequentially and the last one is a filter, then depending on what the previous 4 were, the query optimizer might apply the filter first, which makes the preceding 4 steps faster

left tartan Sep 10, 2023, 1:21 PM

#

abstract wasp Anyone know how to restart the kernel in Kaggle? I updated tensorflow but when I...

I’ve never used Kaggle notebooks, but post the code for second issue and we can look. Disregard, I didn’t scroll :/

desert oar Sep 10, 2023, 1:41 PM

#

zealous badger Afaik to remove outliers , we find the zscore and fix a threshold, and remove th...

note that this is one recommended technique among many, it's not the only way

zealous badger Sep 10, 2023, 1:44 PM

#

desert oar note that this is one recommended technique among many, it's not the only way

yeah there's IQR too, afaik
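The IQR rule (Tukey's fences) keeps values within `[Q1 - k*IQR, Q3 + k*IQR]`, with `k = 1.5` as the usual default. A minimal sketch applied to every numeric column; the helper name and data are hypothetical:

```python
import pandas as pd


def drop_iqr_outliers(df, k=1.5):
    """Tukey's fences: keep rows whose values lie within
    [Q1 - k*IQR, Q3 + k*IQR] for every numeric column."""
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    inside = (numeric >= q1 - k * iqr) & (numeric <= q3 + k * iqr)
    return df[inside.all(axis=1)]


df = pd.DataFrame({"x": [10, 12, 11, 13, 12, 95], "label": list("abcdef")})
clean = drop_iqr_outliers(df)
print(clean)
```

Quartiles are less affected by the extreme value itself than the mean and standard deviation are, which is one reason the IQR rule is often preferred on small or skewed data.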

past meteor Sep 10, 2023, 2:26 PM

#

zealous badger yeah there's IQR too , afaik

Both sadly only remove univariate outliers

#

Depending on your use case you might have multivariate outliers, personally I don't really go that far 🤣

zealous badger Sep 10, 2023, 2:27 PM

#

pithink

past meteor Sep 10, 2023, 2:41 PM

#

zealous badger pithink

So imagine you have a variable age that ranges from 0 to 120 and a variable income that ranges from 0 to 1M. Being 12 years old isn't an outlier, and neither is a 50k income, but together they are.
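One standard multivariate alternative (not mentioned in the chat, so treat it as a swapped-in example) is the Mahalanobis distance, which measures how far a row is from the mean while accounting for how the columns co-vary. A sketch on made-up data where income grows with age, plus one 12-year-old earning 50k:

```python
import numpy as np


def mahalanobis_distances(X):
    """Distance of each row of X from the column means, scaled by the inverse
    covariance, so correlated columns are judged jointly."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


rng = np.random.default_rng(0)
age = rng.uniform(20, 65, size=500)
income = 1_000 * age + rng.normal(0, 5_000, size=500)  # made-up: income grows with age
X = np.column_stack([age, income])
X = np.vstack([X, [12, 50_000]])  # 12-year-old earning 50k

d = mahalanobis_distances(X)
print(d.argmax() == len(X) - 1)
```

Neither value is extreme on its own axis, but the combination is far from the age/income relationship, so that row gets the largest distance. The mean and covariance are themselves sensitive to outliers, so robust estimates are often used in practice.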

zealous badger Sep 10, 2023, 2:49 PM

#

past meteor So imagine you have a variable age that is from 0 to 120 and a variable income t...

ah

#

so we use clustering, of some sort pithink

cold osprey Sep 10, 2023, 3:08 PM

#

past meteor So imagine you have a variable age that is from 0 to 120 and a variable income t...

ure just jealous a 12 y.o. has a higher income than u did at 12

past meteor Sep 10, 2023, 3:08 PM

#

cold osprey ure just jealous a 12 y.o. has higher income that u did at 12

Nope, I remember Lil Tay and she's definitely unhappy

cold osprey Sep 10, 2023, 3:08 PM

#

oh shit

#

not heard that name in a while

past meteor Sep 10, 2023, 3:08 PM

#

zealous badger so we use clustering, of some sort pithink

There are many methods, and I think clustering is one of them, yeah

weak mortar Sep 10, 2023, 4:32 PM

#

interestingly high number of people who come in, state a question, someone takes the time to help them and explain the problem, but they never reply with a thanks or any hint that they received the instructions 🤷

desert oar Sep 10, 2023, 4:52 PM

#

weak mortar interestingly high amount of people who come in, state a question, someone take ...

typical behavior online

weak mortar Sep 10, 2023, 5:17 PM

#

i've been playing a bit with the function numpy.random.normal(loc=mu, scale=sigma) to create artificial data with similar properties to the input data (Brownian motion). are there other methods for generating the next value in a Brownian-motion-style series, instead of a Gaussian distribution? should i plot the deviations (mu) of the input data and see whether they are evenly distributed across the bell curve?
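One alternative to drawing from a fitted normal distribution is to resample the historical returns themselves (a plain bootstrap, a technique not named in the chat). It keeps the fat tails and skew of the real return distribution, but because the draws are independent it still ignores volatility clustering. A minimal sketch with made-up returns; `bootstrap_paths` is a hypothetical helper:

```python
import numpy as np


def bootstrap_paths(historical_returns, initial_price, n_steps, n_simulations, seed=None):
    """Build price paths by resampling past returns (with replacement)
    instead of drawing them from a fitted normal distribution."""
    rng = np.random.default_rng(seed)
    r = np.asarray(historical_returns, dtype=float)
    r = r[~np.isnan(r)]  # e.g. the NaN that pct_change() leaves in the first row
    draws = rng.choice(r, size=(n_simulations, n_steps), replace=True)
    return initial_price * np.cumprod(1 + draws, axis=1)


hist = np.array([np.nan, 0.01, -0.02, 0.005, 0.03, -0.01])  # made-up daily returns
paths = bootstrap_paths(hist, 100.0, n_steps=250, n_simulations=1_000, seed=1)
print(paths.shape)  # (1000, 250)
```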

raw compass Sep 10, 2023, 8:24 PM

#

I was thinking about how it would be possible to train a model, and prompt it like this "open the file manager and create a new folder". I mean the mouse interaction is not that hard, but I suppose this model would need to have a huge knowledge about the operating system, and the state, so it can see the files, and stuff.

Is there any paper which describes something similar?

viscid ether Sep 10, 2023, 8:32 PM

#

Has anyone done chatbot projects?

left tartan Sep 10, 2023, 9:37 PM

#

weak mortar i've been playing a bit with the function numpy.random.normal(loc=mu, scale=sigm...

Is this in a finance context?

weak mortar Sep 10, 2023, 10:21 PM

#

Yes, the plan was to generate a few fantasy datasets when I'm backtesting to assess robustness, but I "got lost in the dataframes" and started exploring it a bit in depth

left tartan Sep 10, 2023, 10:22 PM

#

Oh, as you can imagine; there’s tons of prior work here… such as https://en.m.wikipedia.org/wiki/Geometric_Brownian_motion

Geometric Brownian motion

A geometric Brownian motion (GBM) (also known as exponential Brownian motion) is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion (also called a Wiener process) with drift. It is an important example of stochastic processes satisfying a stochastic differential equation (SDE); ...

#

There’s even sample Python code in that

weak mortar Sep 10, 2023, 10:23 PM

#

I just used the built-in OHLC data generator from my framework, but noticed that the price would often go below 0, so I had to do it myself

#

Ok thx i have a look

#

Yes brownian motion is what im doing

#

I see some variations of it are elaborated in the link 👍 a bit heavy stuff with all the math 😮‍💨
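Geometric Brownian motion can be sampled exactly on a time grid by working in log space: each step multiplies the price by `exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z)`, which is always positive, so simulated prices can't go below 0. A minimal sketch; the drift, volatility and horizon are made-up assumptions:

```python
import numpy as np


def gbm_paths(s0, mu, sigma, dt, n_steps, n_simulations, seed=None):
    """Geometric Brownian motion sampled exactly on a grid:
    S_{t+dt} = S_t * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_simulations, n_steps))
    log_steps = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_steps, axis=1))


# 5 years of daily steps, 7% drift and 20% volatility per year (assumed values)
paths = gbm_paths(s0=100.0, mu=0.07, sigma=0.2, dt=1 / 252, n_steps=252 * 5, n_simulations=2_000, seed=0)
print(paths.min() > 0)  # True
```

The `- sigma**2 / 2` term is the drift correction that makes the expected price grow at rate `mu`.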

left tartan Sep 10, 2023, 10:28 PM

#

weak mortar I see some variations of it are elaborated in the link 👍 a bit heavy stuff with...

Yah, it’s a pretty well-studied space due to Black-Scholes

weak mortar Sep 10, 2023, 11:56 PM

#

i am looking at histograms on the distribution of the calculated motions vs the real pct differences ... something is definitely off.

#

when i print out the numbers they look similar, except the simulated mean is about 1000 times bigger than the real one

```python
print((returns[0].mean())*10000)
print((data['Close'].pct_change().mean())*10000)
```

```
0.39038393...
0.00039782....
```

shut girder Sep 11, 2023, 12:04 AM

#

Hello, I'm a beginner in data analytics. Does anyone know where I can find projects? I want to gain experience and get more familiar with the libraries commonly used in data analytics.

serene scaffold Sep 11, 2023, 12:05 AM

#

shut girder Hello, as a beginner to data analytics. Does anyone know where I can find projec...

have you done the Kaggle tutorial for Pandas?

shut girder Sep 11, 2023, 12:05 AM

#

serene scaffold have you done the Kaggle tutorial for Pandas?

Not yet

left tartan Sep 11, 2023, 12:06 AM

#

weak mortar i am looking at histograms on the distribution of the calculated motions vs the ...

Happy to look, maybe share more code? Not sure what you’re comparing.

#

But if bottom is supposed to be market actual, I think you have an error.

#

Or maybe not, I dunno: day to day movement is going to be very small percentages overall.

serene scaffold Sep 11, 2023, 12:08 AM

#

shut girder Not yet

one would use pandas a lot to slice and dice tabular data that fits in memory, so I recommend getting comfortable with it. ||(inb4 someone says you should just learn polars.)||

left tartan Sep 11, 2023, 12:09 AM

#

serene scaffold one would use pandas a lot to slice and dice tabular data that fits in memory, s...

||but what about duckdb||

serene scaffold Sep 11, 2023, 12:11 AM

#

left tartan ||but what about duckdb||

is that a meme?

left tartan Sep 11, 2023, 12:11 AM

#

Oh, I’m making fun of myself

marble spindle Sep 11, 2023, 1:27 AM

#

Hello, I'm wondering if anyone could help me with a tiny problem. I need to get some stats (mean, quartiles, var, min, max, std) from a dataframe. I can get most of them with df.describe(), except for the var, which I have to generate in a separate dataframe. I've tried concatenating them so it's all nice in a single df, but the var df is vertical and the describe df is horizontal, so I can't join them up properly :<

#

Sorry if this isn't the place to ask btw lol
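`df.var()` returns a Series indexed by column name, while `describe()` puts the column names across the top, so the variance can be added as one more row that aligns on those column names. A minimal sketch with made-up data, plus an equivalent `concat` version that transposes the variance into a one-row frame first:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 60.0]})

stats = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
stats.loc["var"] = df.var(numeric_only=True)  # the Series aligns on column names
print(stats)

# Same result with concat: turn the variance Series into a one-row frame first
stats2 = pd.concat([df.describe(), df.var(numeric_only=True).to_frame("var").T])
```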

weak mortar Sep 11, 2023, 1:37 AM

#

alright i'll construct a minimal example for you to look at billybobby; it would be good to get confirmation of whether i am calculating it properly. maybe it does just look like that, as you indicated

weak mortar Sep 11, 2023, 2:47 AM

#

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import yfinance as yf

ticker_symbol = "^IXIC"  # Nasdaq Composite (^GSPC is the S&P 500; the Nasdaq-100 "US100" is ^NDX)
start_date = "2003-01-01"
end_date = "2023-01-01"
data = yf.download(ticker_symbol, start=start_date, end=end_date)
# yfinance already returns a DatetimeIndex, so no to_datetime(unit='ms') conversion is needed
print(data)

# .squeeze(): newer yfinance versions may return a one-column DataFrame here
close = data['Close'].squeeze()
daily_returns = close.pct_change().dropna()  # the first pct_change value is NaN

n_simulations = 10
n_rows = len(data)
mu = daily_returns.mean()
sigma = daily_returns.std()
initial_price = close.iloc[-1]

# One row per simulation, one column per day
returns = np.random.normal(mu, sigma, size=(n_simulations, n_rows))
# Compound along the time axis (axis=1); axis=0 would compound across simulations
cumulative_returns = np.cumprod(1 + returns, axis=1)
price_simulations = initial_price * cumulative_returns
simulationsDF = pd.DataFrame(price_simulations)
simresults = simulationsDF.iloc[:, -1]  # final simulated price of each path
print(simulationsDF)
print(simresults)

# Shared bin edges so both histograms use identical bar widths
bins = np.linspace(-0.05, 0.05, 101)
fig, axes = plt.subplots(2, 1, figsize=(10, 10), sharex=True)
axes[0].hist(returns[0], bins=bins, edgecolor='black')
axes[0].set_title('Brownian motion simulation daily pct change')
axes[1].hist(daily_returns, bins=bins, edgecolor='black')
axes[1].set_title("Nasdaq Composite (^IXIC) daily pct change")
axes[1].set_xlim(-0.05, 0.05)  # sharex applies this to both plots
plt.show()
```

i changed it to take data from Yahoo Finance so you can run it; i was using a local csv file before. the effect is less pronounced, either because it's a different asset or because there are a lot fewer candles

#

i tried to make the histogram bars equal in size by setting bins and limiting the x scale of both plots, but without success

#

i also had a third plot of the simulations (the simresults variable), but at some point it broke and i couldn't get it to work again

left tartan Sep 11, 2023, 3:41 AM

#

weak mortar also had a third plot of the simulations(the var simresults) but at a point it b...

I’ll take a look tomorrow, mind firing me a dm reminder?

abstract wasp Sep 11, 2023, 4:17 AM

#

vestal spruce You need to give us more context of your problem, perhaps share a snippet of the...

Hi, yeah, this is my code .-.

vestal spruce Sep 11, 2023, 4:20 AM

#

abstract wasp Hi, yeah, this is my code .-.

you can wrap code in triple backticks followed by the programming language name to make a snippet like this:

```python
if __name__ == "__main__":
    print("Hello World!")
```

vestal spruce Sep 11, 2023, 4:21 AM

#

abstract wasp Hi, yeah, this is my code .-.

Or again, as I've suggested before, perhaps use the #1035199133436354600 forum instead.

abstract wasp Sep 11, 2023, 4:23 AM

#

vestal spruce Or again, as I've suggested before, perhaps use the <#1035199133436354600> forum...

I posted it there

vestal spruce Sep 11, 2023, 4:23 AM

#

abstract wasp I posted it there

Ah yes, I just searched it and found your post, it's already closed apparently.

abstract wasp Sep 11, 2023, 4:25 AM

#

```python
import os

import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from tensorflow.keras import layers, models

train_images = '/kaggle/input/cnn-test/geohashing_images/train'
classes = os.listdir(train_images)
print(classes)

# Loads images (resized to 256x256 by default) with integer labels from the sub-folder names
data = tf.keras.utils.image_dataset_from_directory(train_images, batch_size=5)

data_iterator = data.as_numpy_iterator()
batch = next(data_iterator)

# Preview the first 4 images of the batch with their labels
fig, ax = plt.subplots(ncols=4, figsize=(20, 20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img.astype(int))
    ax[idx].title.set_text(batch[1][idx])

# Scale pixel values from [0, 255] to [0, 1]
data = data.map(lambda x, y: (x / 255, y))

scaled_iterator = data.as_numpy_iterator()
batch = next(scaled_iterator)

fig, ax = plt.subplots(ncols=4, figsize=(20, 20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img)
    ax[idx].title.set_text(batch[1][idx])

# Split the batches roughly 80/10/10 into train/validation/test
train_size = int(len(data) * .8)
val_size = int(len(data) * .1) + 1
test_size = int(len(data) * .1)

train = data.take(train_size)
val = data.skip(train_size).take(val_size)
test = data.skip(train_size + val_size).take(test_size)

# `models` and `layers` come from the tensorflow.keras import at the top
model = models.Sequential([
    layers.Conv2D(16, (3, 3), 1, activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D(),

    layers.Conv2D(32, (3, 3), 1, activation='relu'),
    layers.MaxPooling2D(),

    layers.Conv2D(16, (3, 3), 1, activation='relu'),
    layers.MaxPooling2D(),

    layers.Flatten(),

    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # one sigmoid unit suits a two-class setup
])
```

vestal spruce Sep 11, 2023, 4:26 AM

#

abstract wasp `import numpy as np from matplotlib import pyplot as plt import os import tenso...

What are you using to build your project? code editor (Visual Studio Code) or IDE (PyCharm)?

abstract wasp Sep 11, 2023, 4:27 AM

#

vestal spruce What are you using to build your project? code editor (Visual Studio Code) or ID...

Kaggle. I tried Jupyter Notebook, but there was some sort of issue with the kernel, which kept dying.

vestal spruce Sep 11, 2023, 4:34 AM

#

abstract wasp Kaggle, I tired jupyter notebook but there was some sort of issue with the kerne...

Ok I see the problem, you might want to check on your post, I've sent you an answer there. Test it out and see if it work now.

cursive venture Sep 11, 2023, 8:44 AM

#

this is a really cool field ngl

delicate gyro Sep 11, 2023, 9:33 AM

#

someone pls help, my model is overfitting lol (asked on python-help)

lapis sequoia Sep 11, 2023, 10:57 AM

#

can anyone explain to me exactly how the YOLOv8 detect head works? everyone just says it's a decoupled head and it's anchor-free, but i don't understand how exactly we make the predictions from the 3 feature map sizes we get out of the neck.
here (https://github.com/akashAD98/yolov8_in_depth) it says the size is 4 * reg_max, but what is reg_max?
i read this https://github.com/ultralytics/ultralytics/issues/2951 but couldn't really understand the idea of different anchor boxes and scales, etc., and why we have this shape [batch_size, num_anchors * (5 + num_classes), height, width]
I would be really grateful if someone could explain it to me
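As far as I understand Ultralytics' YOLOv8 `Detect` head, each grid cell (an "anchor point") on the three output feature maps predicts `4 * reg_max` box values (with `reg_max` defaulting to 16) plus one score per class, and there is no objectness term. The `[batch_size, num_anchors * (5 + num_classes), height, width]` shape is the older anchor-based, YOLOv5-style layout. Each of the four box sides (left, top, right, bottom distance from the cell center) is predicted as a distribution over `reg_max` bins, and decoding takes its expected value (the Distribution Focal Loss / GFL idea). The NumPy sketch below illustrates that decoding step for one cell; it is not Ultralytics' code, and the names and shapes are illustrative assumptions:

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def decode_cell(box_logits, cell_xy, stride, reg_max=16):
    """Decode one anchor point's 4*reg_max box logits into an xyxy box in pixels.
    Each side (left, top, right, bottom) is a distribution over bins 0..reg_max-1;
    its expected value is the predicted distance in units of the stride."""
    dist = softmax(box_logits.reshape(4, reg_max), axis=1)  # (4, reg_max)
    ltrb = dist @ np.arange(reg_max, dtype=float)  # expected bin per side
    cx, cy = cell_xy[0] + 0.5, cell_xy[1] + 0.5  # cell center in grid units
    x1, y1 = cx - ltrb[0], cy - ltrb[1]
    x2, y2 = cx + ltrb[2], cy + ltrb[3]
    return np.array([x1, y1, x2, y2]) * stride


# Anchor points for a 640x640 input at strides 8, 16 and 32
print(sum((640 // s) ** 2 for s in (8, 16, 32)))  # 8400
```

So the three feature maps contribute 80×80, 40×40 and 20×20 cells, and each cell predicts one box directly from its own center instead of refining predefined anchor boxes.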


weak mortar Sep 11, 2023, 12:18 PM

#

left tartan I’ll take a look tomorrow, mind firing me a dm reminder?

Thanks. Whenever you have the time. I read a bit on Wikipedia and came to understand that Brownian motion simulations don't account for changes in volatility over time, but that you can make a function of it, called a probability density function, that accounts for it. It's getting a bit above my head though

left tartan Sep 11, 2023, 12:22 PM

#

weak mortar Thanks. Whenever you have the time. I read a bit on wiki and come to understand ...

A pdf is just (eli5) a function that creates the charts you showed above (ok technically it’s a mass function since it’s discretized)

#

Yes, that’s the problem with simulating the market: volatility changes, so when regimes change, predictions go askew

#

(That’s “a” problem, not the.. plenty of other problems, like black swans and market manipulation)
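The link between the histograms and a PDF can be made concrete with NumPy: `np.histogram(..., density=True)` rescales the bars so their total area is 1, which turns the histogram into an estimate of the probability density. A sketch comparing it with the exact normal PDF on made-up, normally distributed daily returns:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.01
samples = rng.normal(0.0, sigma, size=200_000)  # made-up daily returns

# density=True rescales bar heights so the bars' total area is 1
heights, edges = np.histogram(samples, bins=50, range=(-0.05, 0.05), density=True)
centers = (edges[:-1] + edges[1:]) / 2
pdf = np.exp(-centers**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

print(np.sum(heights * np.diff(edges)))  # total area: 1, up to floating-point rounding
print(np.max(np.abs(heights - pdf)))  # small compared with the peak density of about 40
```

With real market returns the same comparison usually shows more mass in the tails than the normal curve predicts.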

weak mortar Sep 11, 2023, 12:29 PM

#

I see how this could be accounted for, to some extent, though. Like if the return is multiplied by some low-frequency oscillating curve with an average of 0 so it doesn't skew the mu. But i wouldn't be able to make that
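One reading of this idea, sketched under assumptions: rather than multiplying the returns themselves by a zero-mean curve (which would also flip their sign), scale the volatility of the random shock by a positive factor that oscillates around 1, so `mu` is untouched. The `period` and `amplitude` parameters are hypothetical, and this is a toy, not a fitted volatility model (GARCH-type models are the usual next step for that):

```python
import numpy as np


def oscillating_vol_returns(mu, sigma, n_steps, n_simulations, period=250, amplitude=0.5, seed=None):
    """Normal returns whose volatility follows a slow sine wave:
    sigma_t = sigma * (1 + amplitude * sin(2*pi*t/period)).
    Keep amplitude < 1 so sigma_t stays positive; the drift mu is unchanged."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    sigma_t = sigma * (1 + amplitude * np.sin(2 * np.pi * t / period))
    shocks = rng.standard_normal((n_simulations, n_steps))
    return mu + sigma_t * shocks  # sigma_t broadcasts across simulations (rows)


r = oscillating_vol_returns(mu=0.0004, sigma=0.01, n_steps=500, n_simulations=2_000, seed=0)
wild, calm = r[:, 50:100].std(), r[:, 150:200].std()  # near the peak vs near the trough
print(wild > calm)
```

The returns can then be compounded into prices exactly as in the simulation code above.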

#

About to back out of the rabbit hole for now, I guess, to get things done instead 😄

left tartan Sep 11, 2023, 12:30 PM

#

🙂 this is a deep hole

weak mortar Sep 11, 2023, 12:31 PM

#

Priorities are important. I wish I had the ability to make it, because it would work great. But I'd better focus on having a minimum viable product before going full nerd on all the small details.

left tartan Sep 11, 2023, 12:33 PM

#

What's your near term / mvp goal?

weak mortar Sep 11, 2023, 1:17 PM

#

The more important task atm would be getting multiprocessing to work with sk-opt (scikit-optimize), then some work on making the strategy classes work. Basically optimizing and automating the whole process, from data input to final test results, with robustness assessments in an HTML report with plots.

shy grotto Sep 11, 2023, 2:32 PM

#

OK guys, I really want to become a data analyst. How can I learn Python consistently and understand it quickly?

lapis sequoia Sep 11, 2023, 2:45 PM

#

shy grotto ok guys I really want to become a data analyst but to learn python with consiste...

Quickly and consistently don't go together, in my opinion. If you don't have a basic understanding of other programming languages, I would recommend starting with Automate the Boring Stuff with Python. It's a little book made for people who don't code and want to automate their daily life, but in my opinion it's a good book to start with and to become interested in Python.

umbral charm Sep 11, 2023, 3:28 PM

#

Guys, I just realised my lecturer was one of the developers of SciPy

#

That's good for me

serene scaffold Sep 11, 2023, 3:31 PM

#

umbral charm Guys i just realised my lecturer was one of the developers of sci py

hopefully that means they won't teach you javapython or looping over numpy/pandas objects.
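For anyone unfamiliar with the term, "javapython" here presumably means writing Python as if it were Java, e.g. explicit index loops over NumPy arrays. A minimal contrast; the function names are just for illustration:

```python
import numpy as np


def scale_loop(arr, factor):
    """'javapython' style: an explicit Python-level loop, one element at a time."""
    out = np.empty_like(arr)
    for i in range(len(arr)):
        out[i] = arr[i] * factor
    return out


def scale_vectorized(arr, factor):
    """Idiomatic NumPy: one array expression; the loop runs in compiled code."""
    return arr * factor


values = np.linspace(0.0, 1.0, 5)
print(scale_vectorized(values, 2.0))
```

Both give the same result, but on large arrays the vectorized version is typically much faster. The same idea applies to pandas, where a column expression like `df["a"] * 2` is preferable to iterating with `iterrows()`.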

silent elk Sep 11, 2023, 3:32 PM

#

As a senior full-stack AI dev, I have rich experience with embedding and fine-tuning models.
https://theimpactpositivecompany.com
https://chatbot.impactbuilder.app
https://insuranceai.app/
Here are my previous projects.
In the first project, I implemented GPT responses and learning with a Pinecone vector database using OpenAI Embeddings with LangChain, and built search engines on top of the embeddings.
In the second and third projects, I implemented PDF uploading, read the data, and stored it in a vector store using Pinecone + LangChain.
Users can then chat with GPT based on the provided PDF data.

umbral charm Sep 11, 2023, 3:33 PM

#

serene scaffold hopefully that means they won't teach you javapython or looping over numpy/panda...

Yeah, but I took her for granted in first year

#

Hopefully she comes back in 2nd year for me

desert oar Sep 11, 2023, 3:41 PM

#

weak mortar i am looking at histograms on the distribution of the calculated motions vs the ...

try using a logarithmic y scale for the 2nd chart

#

https://stats.stackexchange.com/q/378047/36229
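Some background for the linked question, as a small sketch. Under the usual lognormal (geometric Brownian motion) assumption, the log returns r_t = ln(P_t / P_{t-1}) are the normally distributed quantity. Simple returns R_t = (P_t - P_{t-1}) / P_{t-1} = e^{r_t} - 1 are bounded below by -1 and right-skewed, which is one reason their histogram tails are easier to inspect on a log y scale. The prices below are made up.

```python
import math

prices = [100.0, 110.0, 99.0, 104.0]  # made-up example prices

simple_returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Each simple return equals exp(log return) - 1
for R, r in zip(simple_returns, log_returns):
    assert math.isclose(R, math.exp(r) - 1)

# Log returns add up over time; simple returns do not
print(sum(log_returns), math.log(prices[-1] / prices[0]))
```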

tacit basin Sep 11, 2023, 3:49 PM

#

Do you have any recommendations for Copilot-like tools that run locally on CPU and use local models, for example Code Llama?

desert oar Sep 11, 2023, 3:49 PM

#

tacit basin Do you have any copilot like tools recommendation that would run locally on CPU ...

i'd like to know this as well

#

(or something self-hostable with modest GPU requirements)

tacit basin Sep 11, 2023, 3:50 PM

#

For chat, LM Studio is nice.

agile cobalt Sep 11, 2023, 3:50 PM

#

I'm pretty sure there's no way in hell that anything running on a CPU would get passing-grade performance in both speed and quality

tacit basin Sep 11, 2023, 3:50 PM

#

I wonder if there is something like that which would integrate with, say, VS Code

#

Quantized 7B/13B models are fast on an M2 with 32 GB; I could run 34B Code Llama quantized to 4 bits. Not bad, I would say
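Rough arithmetic for why a 4-bit 34B model fits in 32 GB while a 16-bit one would not. This counts the weights only. It ignores the context (KV cache) and runtime overhead, and practical 4-bit formats store per-block scale factors, so they use somewhat more than 4 bits per weight. `weight_memory_gb` is just an illustrative helper.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for n_params in (7e9, 13e9, 34e9):
    print(
        f"{n_params / 1e9:.0f}B params: ~{weight_memory_gb(n_params, 4):.1f} GB at 4 bits, "
        f"~{weight_memory_gb(n_params, 16):.1f} GB at 16 bits"
    )
```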

#

Now if I could get some VS Code integration like Copilot with these models, that would be great

agile cobalt Sep 11, 2023, 3:52 PM

#

there seem to be quite a few options

#

literally just threw "llama" into the marketplace search, can't really vouch for any of them though

tacit basin Sep 11, 2023, 3:52 PM

#

Yeah, I did a search too. FauxPilot seems to work on GPU only

#

TurboPilot is early stage but should run on CPU

#

Just wondered if there's an obvious choice, like Copilot but local lol

agile cobalt Sep 11, 2023, 3:55 PM

#

not really; maybe try Continue or Wingman?

tacit basin Sep 11, 2023, 3:55 PM

#

Yeah seems will need to install some 'random' tools and try them

agile cobalt Sep 11, 2023, 3:56 PM

#

still, even Copilot's performance (as far as quality goes) is questionable at times, and those will probably be a few tiers worse running local LLMs

tacit basin Sep 11, 2023, 3:56 PM

#

Probably yes

desert oar Sep 11, 2023, 3:57 PM

#

agile cobalt still, even Copilot's performance (as far as quality goes) is questionable at ti...

this is partly why i haven't jumped on the LLM hype train yet, every time i try to use it, i feel like i end up spending more energy fixing the output than i would have spent by just writing it

tacit basin Sep 11, 2023, 3:57 PM

#

Copilot does quite a lot in terms of getting context from VS Code. So just a code model in LM Studio for chat is not enough.

#

Fine-tuned WizardCoder 34B beats GPT-4 on benchmarks, or comes close.

#

In practice not sure

#

There's a reason GPT4 is a beast of a model

#

I think

past meteor Sep 11, 2023, 4:15 PM

#

desert oar this is partly why i haven't jumped on the LLM hype train yet, every time i try ...

There's definitely ways to make LLMs work. The NLP folk at work pay for a gpt4 sub on the condition that we use it in different ways and see what works and doesn't work.

#

I don't use Copilot but GPT4 does have a positive impact on my productivity. I would not recommend using the free tier under any circumstance.

oak lichen Sep 11, 2023, 4:31 PM

#

Heyy guys,
I'm working on a deep learning project about hydroponically grown plants.
If anyone has any prior experience,
pls DM

serene scaffold Sep 11, 2023, 4:36 PM

#

oak lichen Heyy Guys Im working on a hydroponic plant based Deep Learning Project , if any...

people generally don't want to have to send DMs to figure out what the question is going to be--just ask your whole question in this chat.

oak lichen Sep 11, 2023, 4:38 PM

#

I would be working on applying DL to evaluate the concentrations of nutrients in hydroponically grown plants.

I need help figuring out exactly how to go about this project,
and should I use sensor data or image data for it?

If someone has similar prior experience,
pls help me

oak lichen Sep 11, 2023, 4:38 PM

#

serene scaffold people generally don't want to have to send DMs to figure out what the question ...

followed your suggestion
thanks

serene scaffold Sep 11, 2023, 4:39 PM

#

oak lichen I would be working on Application of DL to evaluate the concentrations of nutrie...

sounds like you don't have a specific question yet. try using this channel when you have a specific question.

oak lichen Sep 11, 2023, 4:41 PM

#

serene scaffold sounds like you don't have a specific question yet. try using this channel when ...

edited it
thanks

thick chasm Sep 11, 2023, 4:58 PM

#

Can you please create a topical chat for deep learning?

vocal spoke Sep 11, 2023, 5:25 PM

#

Can someone please help me with YOLO training? I can't get the model to train on a custom dataset. Please help

serene scaffold Sep 11, 2023, 7:43 PM

#

thick chasm Can you please create a topical chat for deep learning

you can already ask about deep learning in this channel

thick chasm Sep 12, 2023, 4:23 AM

#

Can anyone please provide some good resources for gaining a solid understanding of transformers?

abstract wasp Sep 12, 2023, 4:31 AM

#

I am using Spyder. I'm reading my files (I only have 3) but when I do so, I get an extra one called .DS_Store. Why is this here and how can I remove it?

royal crest Sep 12, 2023, 4:33 AM

#

https://en.wikipedia.org/wiki/.DS_Store

They are created by macOS, just ignore them

#

In terms of "hiding", you can use startswith(".") to ignore all hidden files and directories, or do e.g. [name for name in os.listdir(DIR) if name != ".DS_Store"] to specifically exclude .DS_Store
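The `startswith(".")` option as a small sketch; `filter_hidden` and `visible_files` are illustrative names. Note that this also skips other dotfiles (e.g. `.gitignore`), which may or may not be what you want; the `!= ".DS_Store"` variant excludes only that one file.

```python
import os


def filter_hidden(names):
    """Drop hidden entries such as .DS_Store (anything whose name starts with '.')."""
    return sorted(name for name in names if not name.startswith("."))


def visible_files(directory):
    """List the non-hidden entries of `directory`."""
    return filter_hidden(os.listdir(directory))


print(filter_hidden(["b.csv", ".DS_Store", "a.csv"]))
```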

#

@abstract wasp

abstract wasp Sep 12, 2023, 4:37 AM

#

royal crest <@804221232210771978>

Ok, thank you!!

solemn void Sep 12, 2023, 6:24 AM

#

dunno where else to put this, but I'm looking for tips on reducing the number of iterations to speed up the run time of this script:

#

import graphviz
import pylightxl
import re
FeatIndex = {}
GlobalGraph = graphviz.Digraph('PhiloDilemma', format='png', filename='unix.gv', node_attr={'color': 'lightblue2', 'style': 'filled', 'fixedsize': 'false'}, engine='fdp')

#Adds a node to the DAG
def AddFeat(name, parents, descr, prereqs):
    newFeat = {}
    newFeat['name'] = name
    newFeat['parents'] = parents
    newFeat['description'] = descr
    newFeat['prerequisites'] = prereqs
    newFeat['children'] = []
    FeatIndex[name] = newFeat

def parseString(string=""):
    string = string.strip()
    # Drop the first [...] group, if any
    if string.count('[') > 0:
        p = string.find('[')
        p2 = string.find(']')
        str1 = string[0:p]
        str2 = string[p2+1:len(string)]
        string = str1+str2
    while string.endswith(' ') or string.endswith(','):
        string = string[0:len(string)-1]
    # Keep only letters and whitespace
    string = re.sub(r'[^a-zA-Z\s]', '', string)
    return string

def find(lst, val):
    # Index of val in lst, or -1 if it is absent
    try:
        return lst.index(val)
    except ValueError:
        return -1

#

# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    print('PyCharm')
    #print(parseString('prereq=[test]test '))
    database = pylightxl.readxl(fn='C:\\Users\\mthom\\Downloads\\feat database.xlsx')
    sheet = database.ws(ws='The Sheet')
    featList = sheet.col(col=1)
    #FeatList is roughly 3,800 entries long
    for i in range(0, len(featList)):
        featRaw = sheet.row(row=i+1)
        prereqs = featRaw[12].split(',')
        name = parseString(featRaw[0])
        parents = []
        for _pre in prereqs:
            pre = parseString(_pre)
            # Bottleneck: this re-parses all ~3,800 names for every prerequisite,
            # so the loop as a whole makes roughly O(n^2) parseString calls
            for candidate in featList:
                candidate = parseString(candidate)
                if candidate.lower() == pre.lower():
                    parents.append(candidate)
        AddFeat(name, parents, "", [])
    for i in FeatIndex:
        feat = FeatIndex[i]
        parents = feat.get('parents')
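        # Work on a copy: with `newParents = parents` both names point to the same
        # list, and removing items while iterating over `parents` skips entries
        newParents = list(parents)

        if len(newParents) > 0:
            for j in parents:
                _feat = FeatIndex[j]
                _par = _feat.get('parents')
                # Drop any parent that is already a parent of another parent
                for p in _par:
                    if p in newParents:
                        newParents.remove(p)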

        for k in newParents:
            GlobalGraph.edge(k, i)
    GlobalGraph.view()
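A sketch of the main speed-up, assuming the same sheet layout (name in the first column, comma-separated prerequisites at index 12). The innermost loop over `featList` re-runs `parseString` on every name for every prerequisite, which is roughly 3,800 × 3,800 × (prerequisites per feat) regex calls. Parsing each name once into a dictionary keyed by its lowercase form turns each lookup into O(1). `parse_name`, `build_parents` and `direct_parents` are hypothetical helpers written to mirror `parseString` and the parent-pruning loop, and the feat names in `rows` are made up. Unlike the original, a name that matches several rows is added only once, and empty prerequisites are skipped.

```python
import re


def parse_name(raw):
    """Mirror of parseString: drop the first [...] group, trailing spaces/commas and non-letters."""
    s = raw.strip()
    if "[" in s:
        start, end = s.find("["), s.find("]")
        s = s[:start] + s[end + 1:]
    s = s.rstrip(" ,")
    return re.sub(r"[^a-zA-Z\s]", "", s)


def build_parents(rows):
    """rows: (raw_name, raw_prereqs) pairs. Returns {name: [parent names]}."""
    # Parse every name ONCE and index it by its lowercase form
    lookup = {}
    for raw_name, _ in rows:
        name = parse_name(raw_name)
        lookup[name.lower()] = name

    parents = {}
    for raw_name, raw_prereqs in rows:
        found = []
        for pre in raw_prereqs.split(","):
            key = parse_name(pre).lower()
            if key and key in lookup:  # O(1) dict lookup instead of scanning every name
                found.append(lookup[key])
        parents[parse_name(raw_name)] = found
    return parents


def direct_parents(parents):
    """Drop a parent when it is already a parent of another listed parent (one level deep)."""
    result = {}
    for name, ps in parents.items():
        grandparents = set()
        for p in ps:
            grandparents.update(parents.get(p, []))
        result[name] = [p for p in ps if p not in grandparents]
    return result


rows = [
    ("Power Attack", ""),
    ("Cleave", "Power Attack"),
    ("Great Cleave", "Cleave, Power Attack"),
]
print(direct_parents(build_parents(rows)))
```

In the script, `rows` could be built from the existing calls, e.g. `rows = [(r[0], r[12]) for r in (sheet.row(row=i + 1) for i in range(len(featList)))]`, and the graph edges then added from the dictionary returned by `direct_parents(build_parents(rows))`.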

potent sky Sep 12, 2023, 6:39 AM

#

thick chasm anyone please provide some good resources to have a great knowledge about transf...

I would say start with the original paper ("Attention Is All You Need"), then the 'The Illustrated Transformer' article, and by then you'll naturally find other resources that work best for you

gloomy anvil Sep 12, 2023, 8:03 AM

#

I have a problem where I need some guidance on how I can use ML to find certain points in a time series. I have about 1 million separate time series that all look roughly like the attached picture. Of course every time series is a little different, but generally they look like it. For every time series I also have a timestamp that marks a certain event that happens while the signal converges back to 0. That event is the label that I want to find, through machine learning, for unseen time series as well. Most examples I found about finding certain points in time series were about anomaly detection, but I am not looking for anomalies or spikes. I just want to train a network to find a certain point in the time series based on the shape of the whole series. How would you approach this? Which kinds of models or methods would you use? Or maybe you even have a link to the solution of a similar problem? My first simple approaches (DNNs with one hidden layer) all failed and just returned some arithmetic-average point in the time series, so any suggestions are more than welcome.

lapis sequoia Sep 12, 2023, 8:31 AM

#

silent elk As I am a senior full-stack AI dev I have rich experience with embedding and fin...

weird flex but okay

silent elk Sep 12, 2023, 8:32 AM

#

@lapis sequoia , In short, I 'm a full stack web developer / AI engineer

#

Yeah

#

if you want, I can help you

tacit basin Sep 12, 2023, 9:38 AM

#

desert oar i'd like to know this as well

Tried this workflow and it 'works'. Not talking about quality yet, it works but need to test it more:

LM Studio -> WizardCoder 13B Python -> local inference server -> start server
VS Code -> install the Continue extension -> set up the local server in the config file -> profit 🙂
https://marketplace.visualstudio.com/items?itemName=Continue.continue

https://continue.dev/docs/customization#local-models-with-openai-compatible-server

from continuedev.src.continuedev.libs.llm.openai import OpenAI

config = ContinueConfig(
    ...
    models=Models(
        default=OpenAI(
            api_key="EMPTY",
            model="Wizard Coder 13b python",
            api_base="http://localhost:1234/v1", # change to your server
        )
    )
)

past meteor Sep 12, 2023, 11:15 AM

#

gloomy anvil i have a problem where I need some guidance in what way I can utilize ML to find...

So in summary:

* You have 1M series
* Some time series are labelled with a point of interest
* You want to use ML to find the point of interest on unlabeled series

Is this correct?

#

Assuming the problem is as you've described, you must specify it a bit more clearly. You've mentioned:

I just want to train a network to find a certain point in the timeseries depending on the way the whole timeseries is shaped.

If that's truly what you meant, it is P(x_t = point_of_interest | x_1, x_2, ..., x_n). That's uncommon for time series: you're conditioning on the whole thing.

Is your problem not P(x_t = point_of_interest | x_1, x_2, ..., x_{t-1})?

quaint loom Sep 12, 2023, 2:54 PM

#

what is wrong? I cannot solve it 😡

https://paste.pythondiscord.com/OZJETWVAW2EXPIXCPYZSVCZYGI

serene scaffold Sep 12, 2023, 2:56 PM

#

quaint loom what is wrong? I cannot solve it 😡 https://paste.pythondiscord.com/OZJETWVAW2...

intervals = [("11:10:00", "11:19:59")]
you should pretty much never be using strings to represent time-related stuff.

#

because you end up with TypeError: '>=' not supported between instances of 'str' and 'datetime.time'

#

looks like you parse them later, I guess

quaint loom Sep 12, 2023, 2:59 PM

#

serene scaffold `intervals = [("11:10:00", "11:19:59")]` you should pretty much never be using s...

Do you have any suggestion how I should do it instead?

serene scaffold Sep 12, 2023, 2:59 PM

#

quaint loom Do you have any suggestion how I should do it instead?

oh, here's your mistake

data['TIME'] = pd.to_datetime(data['TIME'], errors='coerce').dt.time.astype(str)

#

you convert it back to a str at the very end.

#

I assume you did that to work around some other error

quaint loom Sep 12, 2023, 3:01 PM

#

serene scaffold I assume you did that to work around some other error

I'm in a Python rabbit hole. I am confused XD

quaint loom Sep 12, 2023, 3:04 PM

#

serene scaffold oh, here's your mistake ```py data['TIME'] = pd.to_datetime(data['TIME'], errors...

What I'm trying to do is make the script look only for the given time interval in the column called "TIME".

serene scaffold Sep 12, 2023, 3:07 PM

#

quaint loom Python in a rabit hole. I am confused XD

I'm busy now, but you probably need to look at how your TIME column is formatted and parse it in such a way that you won't get so many errors.

#

and no matter what, don't convert it to str
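A sketch of the "parse once, never convert back to str" approach, assuming `TIME` holds `HH:MM:SS` strings. The real format may differ, which is exactly what checking the column's formatting is about. The sample data and the interval are made up to mirror the snippet quoted above. Times of day are compared as `Timedelta`s since midnight, so no string comparison is involved.

```python
import pandas as pd

# Hypothetical sample; the real data comes from the user's file
data = pd.DataFrame({"TIME": ["11:05:30", "11:12:00", "11:19:59", "11:25:00", "not a time"]})

# Parse once and keep the result as datetimes (no .astype(str) afterwards).
# format= assumes HH:MM:SS strings; anything unparseable becomes NaT.
data["TIME"] = pd.to_datetime(data["TIME"], format="%H:%M:%S", errors="coerce")

# Time of day as a Timedelta since midnight, so comparisons are numeric
time_of_day = data["TIME"] - data["TIME"].dt.normalize()

# Intervals as real durations instead of strings (between() is inclusive on both ends)
intervals = [(pd.Timedelta(hours=11, minutes=10), pd.Timedelta(hours=11, minutes=19, seconds=59))]

mask = pd.Series(False, index=data.index)
for start, end in intervals:
    mask |= time_of_day.between(start, end)

print(data[mask])
```

Rows that failed to parse (NaT) simply never match, instead of raising the `'>=' not supported` error.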

quaint loom Sep 12, 2023, 3:08 PM

#

serene scaffold I'm busy now, but you probably need to look at how your TIME column is formatted...

Please check pm and thank you.

serene scaffold Sep 12, 2023, 3:08 PM

#

quaint loom Please check pm and thank you.

I won't have time, sorry.

quaint loom Sep 12, 2023, 3:09 PM

#

serene scaffold I won't have time, sorry.

Not now but when you have! I would highly appreciate it.
Have a good one

serene scaffold Sep 12, 2023, 3:09 PM

#

quaint loom Not now but when you have! I would highly appreciate it. Have a good one

I won't have time this week

#

my schedule is fucked.

lapis sequoia Sep 12, 2023, 3:25 PM

#

quaint loom Please check pm and thank you.

write here so we can help you

quaint loom Sep 12, 2023, 3:26 PM

#

lapis sequoia write here so we can help you

Thank you. I just left the computer so I will return tomorrow. Last time I was going to ask a question I was faced with stupidity.

gloomy anvil Sep 12, 2023, 3:40 PM

#

past meteor So in summary: * You have 1M series * Some time series are labelled with a poin...

Thanks for your reply, and sorry that I didn't see it until now. And yes, the problem description is a little uncommon; that's why I am asking here. We have a lot of independent time series, each labeled with a point of interest.

Current simple solution:
Currently we take a time series and subtract 0.5 from every observation. Then we simply use the timestamp of the last moment the shifted time series crosses below the x axis as our point of interest. That works well enough in most cases and gets close enough to the actual point of interest. We believe, though, that there might be a better solution that looks at each time series more individually and finds the point of interest more precisely.
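A sketch of the baseline described above, assuming each series is already normalized and aligned, with one sample per array element. It subtracts 0.5 and returns the last place where the shifted series goes from >= 0 to < 0. `last_crossing_below` is a hypothetical helper. The returned index is the first sample below the line, so multiply it by the sampling period to get a time.

```python
import numpy as np


def last_crossing_below(signal, offset=0.5):
    """Index of the first sample after the LAST downward crossing of `offset`, or None."""
    shifted = np.asarray(signal, dtype=float) - offset
    above = shifted >= 0
    # i is a crossing if sample i is at/above the line and sample i + 1 is below it
    crossings = np.flatnonzero(above[:-1] & ~above[1:])
    if crossings.size == 0:
        return None
    return int(crossings[-1] + 1)


series = [0.0, 0.2, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(last_crossing_below(series))
```

A learned model can then be judged against this baseline on the same labeled data, using the same error measure.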

Our goal:
Some type of ML solution that has looked at all of our collected time series and knows each point of interest. When shown a new, unseen time series, it should be able to identify a point of interest close to the truth, based on its experience with all the seen and labeled data. What kind of algorithm or ML model would you suggest? I am even struggling to identify a model that suits this problem description, since most time series models like LSTMs try to predict the next time step. That is not what we want at all: we simply want the model to look at a time series and identify a certain point of interest.

past meteor Sep 12, 2023, 3:42 PM

#

gloomy anvil thanks for your reply and sorry that I didnt see it until now. And yes, the prob...

So you're definitely in this case: P(x_t = point_of_interest | x_1 , x_2, ... x_n), you're conditioning over the entire series?

gloomy anvil Sep 12, 2023, 3:43 PM

#

yes, the entire timeseries is history and we need to find a point in this timeseries ex post. no prediction of further timesteps or such.

past meteor Sep 12, 2023, 3:44 PM

#

Okay, perfect. Then I have 2 suggestions but they each have the same caveat

gloomy anvil Sep 12, 2023, 3:44 PM

#

i am all ears 🙂

past meteor Sep 12, 2023, 3:48 PM

#

1) Bidirectional RNNs/LSTMs (or whatever) do essentially this: they condition on the entire series and make a prediction. They're (or were?) commonly used in machine translation, and they could be a good fit because they match your problem statement.
2) Use an LSTM as you normally would to generate a latent variable, basically the "latent" Z = F(x_1, x_2, ..., x_n). Then use an MLP or whatever on top of that to predict whether a point is a point of interest, so it's G(x_t, Z). Training this one will be more annoying; I'd train it in 2 phases and only attempt it if the first approach fails and you're desperate...

#

The caveat imo is that if you use a regular loss like BCE you don't really account for the fact that if your model says the point of interest is at t=49 but in reality it was at t=50 that's a lot better than the model saying it occurs at t=1000

gloomy anvil Sep 12, 2023, 3:50 PM

#

I don't really understand yet. I have used LSTMs with Keras in the past, and I always used them to continue a time series. How would I need to implement such an LSTM to classify?

#

also I would need to translate the timeseries into a tensor with a lookback window, right? what would be my input and what would be the output?

#

do you maybe have a code example that i could look at?

past meteor Sep 12, 2023, 3:54 PM

#

gloomy anvil i dont really understand yet. I have used LSTMs with Keras in the past and i al...

This is done a lot in NLP, part-of-speech tagging (POS) is an example of this I think. The more general term is sequence prediction.

gloomy anvil Sep 12, 2023, 3:56 PM

#

okay

#

i will read into this

gloomy anvil Sep 12, 2023, 3:57 PM

#

past meteor 1) Bidirectional RNNs/LSTMs or whatever do essentially this, they condition on t...

https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/ this is what you are referring to, right?

#

thanks for your help and input! i will read into this topic

past meteor Sep 12, 2023, 3:57 PM

#

gloomy anvil https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classific...

Sequence classification usually means they get 1 label for the entire time series. You want 1 label per time step.

past meteor Sep 12, 2023, 4:01 PM

#

gloomy anvil okay

Maybe the models from machine translation (MT) would work for you? You can reformulate your problem as a seq-to-seq one: you have an input sequence (the time series) and an output sequence (zeros, with a 1 at the point of interest).

In any case, I encourage you to look up bidirectional RNNs and look at a bit of MT. Most sequence work in neural nets is done in NLP. That's not my field whatsoever, but I'm working on time series, so it's good to look at their stuff for inspiration.

Also, the sliding window only matters if you don't want to condition on the entire series.

You'll need to tinker with your loss. I'd start out with BCE and maybe add MSE to keep the model from predicting points of interest that are very far off, like a weighted average of the two.
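A NumPy sketch of that weighted combination for a single series, purely illustrative. `logits` stands for hypothetical per-timestep model outputs, the BCE term uses a one-hot target, and the position term compares the softmax-weighted expected index with the true index, normalized by the series length. With `alpha=0` (pure BCE), a guess one step off and a guess 950 steps off cost the same, which is the caveat mentioned earlier. The position term is one way to break that tie.

```python
import numpy as np


def combined_loss(logits, true_idx, alpha=0.5):
    """(1 - alpha) * BCE(one-hot target) + alpha * normalized squared position error."""
    logits = np.asarray(logits, dtype=float)
    n = logits.size
    target = np.zeros(n)
    target[true_idx] = 1.0

    # Per-timestep binary cross-entropy on sigmoid probabilities
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    bce = -np.mean(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))

    # Expected position under a softmax over the time steps
    w = np.exp(logits - logits.max())
    w /= w.sum()
    expected_idx = float(np.sum(w * np.arange(n)))
    position_err = ((expected_idx - true_idx) / n) ** 2

    return (1.0 - alpha) * bce + alpha * position_err


n, true_idx = 1024, 50
near = np.full(n, -5.0)
near[49] = 5.0   # peak one step early
far = np.full(n, -5.0)
far[1000] = 5.0  # peak far away
print(combined_loss(near, true_idx), combined_loss(far, true_idx))
```

In a real model the same formula would be written with the framework's tensor ops so it can be differentiated; `alpha` is a tuning knob.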

slender charm Sep 12, 2023, 4:07 PM

#

I'm working on the MNIST dataset. Initially I used fetch_openml() to get it, but it takes a long time since it redownloads every time I run the code. I manually downloaded MNIST in .arff format. Is there a way to import it manually in the same format fetch_openml() would return, as a Bunch object?

Or maybe just a way to stop fetch_openml() from downloading it every time I run?
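A hedged sketch of the "cache it yourself" route. `fetch_openml` already takes `data_home` and `cache=True` (the default) and keeps its download under `~/scikit_learn_data`, so repeated slowness may come from parsing rather than downloading. Saving the arrays once with NumPy avoids both. `load_or_fetch` and the cache file name are hypothetical, and this returns plain arrays, not a `Bunch`.

```python
import os

import numpy as np


def load_or_fetch(cache_path, fetch):
    """Return (X, y) from cache_path if it exists, otherwise call fetch() and save the result.

    cache_path should end in ".npz" (np.savez_compressed appends it otherwise).
    """
    if os.path.exists(cache_path):
        with np.load(cache_path, allow_pickle=False) as data:
            return data["X"], data["y"]
    X, y = fetch()
    X = np.asarray(X)
    y = np.asarray(y)
    if y.dtype == object:  # e.g. string labels: store them without pickling
        y = y.astype(str)
    np.savez_compressed(cache_path, X=X, y=y)
    return X, y


# Usage (requires scikit-learn):
# from sklearn.datasets import fetch_openml
# X, y = load_or_fetch(
#     "mnist_784.npz",
#     lambda: fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False),
# )
```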

weak mortar Sep 12, 2023, 5:37 PM

#

quaint loom what is wrong? I cannot solve it 😡 https://paste.pythondiscord.com/OZJETWVAW2...

It's maybe a long shot, but have you tried .set_index('TIME')?

#

With or without inplace=True. For me it's a bit of trial and error with those time commands when I have some new data

serene scaffold Sep 12, 2023, 6:36 PM

#

weak mortar Its maybe a long shot but have you tried .set_index('TIME') ?

why would that solve it?

quasi sparrow Sep 12, 2023, 7:21 PM

#

quaint loom Please check pm and thank you.

You can ask Chad, GeePT.
It does ok in finding bugs in code, but I wouldn’t use it to code the solution for you.

quasi sparrow Sep 12, 2023, 7:32 PM

#

gloomy anvil i have a problem where I need some guidance in what way I can utilize ML to find...

I don’t think you need ML for this. I would say this is more of a digital signal processing problem.
Do you only want to find that specific point relative to the time series trend?

serene scaffold Sep 12, 2023, 7:33 PM

#

quasi sparrow You can ask Chad, GeePT. It does ok in finding bugs in code, but I wouldn’t us...

Everyone knows about ChatGPT. Suggesting that someone use it isn't that helpful.

gloomy anvil Sep 12, 2023, 7:48 PM

#

quasi sparrow I don’t think you think ML for this. I would say this is more of a digital signa...

basically yes

gloomy anvil Sep 12, 2023, 7:51 PM

#

quasi sparrow I don’t think you think ML for this. I would say this is more of a digital signa...

I mean, I described our current solution in the attached picture, which is basically what you proposed. This works "relatively" well, since in every time series the point of interest lies somewhere around where the curve hits the x axis. But we believe there might be a more precise way that takes the individuality of each curve into consideration.

#

if you have looked at about 100 of these timeseries you would be able to quite confidently point out the actual point of interest

left tartan Sep 12, 2023, 7:52 PM

#

gloomy anvil i mean, i described our current solution in the picture attached which is basica...

Just phrasing it differently: so given a set of time series, each with a single event at a different X, you want a model that predicts X from other / test time series?

gloomy anvil Sep 12, 2023, 7:55 PM

#

left tartan Just phrasing it differently: so given a set of time series, each with a single ...

basically yes. i have a large dataset of independent timeseries labeled with an exact point of interest

gloomy anvil Sep 12, 2023, 7:55 PM

#

gloomy anvil thanks for your reply and sorry that I didnt see it until now. And yes, the prob...

here is a better description linked

gloomy anvil Sep 12, 2023, 7:56 PM

#

gloomy anvil i have a problem where I need some guidance in what way I can utilize ML to find...

and here

left tartan Sep 12, 2023, 7:56 PM

#

I'm just thinking about how to formulate the problem... like, one is to predict how long it takes to get to X. But, the supposition here is that X is a function of the shape of the graph?

gloomy anvil Sep 12, 2023, 7:58 PM

#

that is what i am struggling with as well. most time series models just want to predict the next step. but here you have a sequence and a human can quite quickly learn where the point of interest is in a timeseries when looking at some samples. so i feel like this gut feeling can be modeled with an ml model

left tartan Sep 12, 2023, 7:59 PM

#

X could hypothetically be merely a function of the integral (total area to date?) too, right?

#

Yah, I get that this is a ml question, I’m just doing the usual trying to understand the edges of the problem

gloomy anvil Sep 12, 2023, 8:00 PM

#

The integral is the entire area under the curve, right? Sorry, English is not my first language. So that wouldn't solve it.

#

Our closest and easiest solution that works well enough is to simply subtract 0.5 and check where the curve hits the x axis. That generally works well enough for now, but there is still a lot of room for improvement.

#

another user proposed bidirectional LSTMs. so that is what i am looking at right now. but if you have other suggestions i am very willing to read into it

#

https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/

left tartan Sep 12, 2023, 8:05 PM

#

it's a tough one, it sounds like something I'd need to play with to understand the relationships at play

#

Like, what's the underlying mechanism at play?

gloomy anvil Sep 12, 2023, 8:06 PM

#

well actually we are looking at a detector signal and the point of interest is a certain particle passing through an accelerator at that time

#

We have the detector signal on the one hand, and on the other hand we can say with a high enough degree of certainty that the particle passed through at that point in time. The time series is actually sampled every 2 nanoseconds (i.e. at 500 MHz).

quasi sparrow Sep 12, 2023, 8:09 PM

#

You could take the derivative of the signal and find the points of inflection in that signal.
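A sketch of that suggestion with NumPy: applying `np.gradient` twice gives a numerical second derivative, and its sign changes mark inflection points. On noisy detector data the second derivative amplifies noise, so some smoothing beforehand would likely be needed. `inflection_points` is a hypothetical helper, and the `tanh` curve is just a clean test signal with a single inflection at its centre.

```python
import numpy as np


def inflection_points(signal, dt=1.0):
    """Indices where the numerical second derivative changes sign."""
    y = np.asarray(signal, dtype=float)
    d2 = np.gradient(np.gradient(y, dt), dt)
    sign = np.sign(d2)
    # A sign change between i and i + 1 is reported at index i + 1
    return np.flatnonzero(sign[:-1] * sign[1:] < 0) + 1


x = np.linspace(-3, 3, 60)
print(inflection_points(np.tanh(x)))
```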

gloomy anvil Sep 12, 2023, 8:10 PM

#

quasi sparrow You could take the derivative of the signal and find the points of inflection in...

tried that already. but the -0.5 solution works better in practice

#

i mean in the picture i kind of depicted an idealized curve. in practice there are disturbances and interferences in the signals

quasi sparrow Sep 12, 2023, 8:14 PM

#

Look into machine learning/deep learning for radio frequency. Maybe you can find some ideas there

gloomy anvil Sep 12, 2023, 8:14 PM

#

but as a human you can tell quite confidently where the point is if you looked at enough timeseries and know what your are looking for

#

Our next approach is to use bidirectional LSTMs, and maybe to try some noise reduction on the time series and such.

quasi sparrow Sep 12, 2023, 8:15 PM

#

I still don’t think it is a ML/DL problem, and more of a DSP problem, but I could be wrong. I don’t have much experience in ML or DL 😰

left tartan Sep 12, 2023, 8:16 PM

#

gloomy anvil but as a human you can tell quite confidently where the point is if you looked a...

What's the frequency distribution / pdf look like?

gloomy anvil Sep 12, 2023, 8:17 PM

#

do you mean how much they vary? we normalized all timeseries and lined the initial spike up if thats what you mean

left tartan Sep 12, 2023, 8:17 PM

#

quasi sparrow I still don’t think it is a ML/DL problem, and more of a DSP problem, but I coul...

yah, I always try to understand classically before throwing any ml/dl/ai techniques, usually because I need to reframe the question.

left tartan Sep 12, 2023, 8:17 PM

#

gloomy anvil do you mean how much they vary? we normalized all timeseries and lined the initi...

Like, you mentioned the 0.5 point... like, how accurate is that? How does it deviate from that?

gloomy anvil Sep 12, 2023, 8:19 PM

#

Ah okay, the deviation from it is about +/- 15-20%. Sounds like a lot, but it's actually already good enough. But yeah, I believe we can make it even better, since you can point to the point of interest by gut feeling

left tartan Sep 12, 2023, 8:29 PM

#

gloomy anvil ah okay, deviation from it is like +/- 15-20%. sounds like much but is actually ...

Hmm, let me think about this a bit, it's an interesting question.

#

I'm still inclined to try to reformulate this somehow. Like, is it distance from a significant peak (as your graphs suggest)?

gloomy anvil Sep 12, 2023, 8:55 PM

#

left tartan I'm still inclined to try to reformulate this somehow. Like, is it distance from...

We tried already formulating it as relative distance to the initial peak and such and relative to the whole timeseries but so far our simple -0.5 approach worked the best and ML is some kind of Hail Mary to make it more precise

#

I was also thinking about self-organizing maps (there is a really nice package called SuSi for Python) and SVMs. Just throwing it out there in case it inspires you 😄

quasi sparrow Sep 12, 2023, 9:14 PM

#

gloomy anvil We tried already formulating it as relative distance to the initial peak and suc...

Hmm what about reconstructing the signal’s function from samples?

#

I did this for homework in grad school but I couldn’t find the code that I used :/
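One classical way to reconstruct a signal from uniform samples is Whittaker-Shannon (sinc) interpolation. This is a hedged sketch, not the homework code mentioned above: it assumes uniform sampling at rate `fs` and a signal band-limited below `fs / 2`, and `sinc_interpolate` is a hypothetical name. With a finite window the reconstruction is exact at the sample instants and approximate in between, with the largest errors near the window edges.

```python
import numpy as np


def sinc_interpolate(samples, fs, t):
    """Whittaker-Shannon reconstruction: x(t) = sum_n x[n] * sinc(fs * t - n).

    samples: values x[n] taken at times n / fs
    fs:      sampling rate
    t:       times (same units as 1 / fs) at which to evaluate the signal
    """
    samples = np.asarray(samples, dtype=float)
    n = np.arange(len(samples))
    t = np.asarray(t, dtype=float)
    # np.sinc is the normalized sinc, sin(pi * u) / (pi * u), which is 1 at u = 0
    # and 0 at every other integer, so the samples themselves are reproduced.
    kernel = np.sinc(fs * t[:, None] - n[None, :])
    return kernel @ samples
```

Evaluating on a fine time grid (for example `np.linspace(0, (len(x) - 1) / fs, 10 * len(x))`) gives a smooth curve whose peaks can fall between samples, which is the property that could help locate a point of interest more precisely than the raw sampling grid.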

red hound Sep 12, 2023, 9:21 PM

#

Hi guys, I've run into a mystery regarding pandas (at least it's a mystery to me). I have a function like the one below, which takes a df as input, filters it (assigning the filtered result back to the original df variable), and then makes some changes to slices of the dataframe before returning the resulting df.

def do_something(df, timestamp):
  df = df[df["column_a"] > 10]
  df["column_b"] = pd.DataFrame(timestamp)
  
  return df

What confuses me here is that I get a SettingWithCopyWarning on the second line of my function, caused by the first line. I know in general why this warning comes up and what it means. But to my knowledge, the first line should simply manipulate the original df and not create any temporary views/subsets. Can someone explain to me why this is happening here?

left tartan Sep 12, 2023, 9:33 PM

#

The first line returns a view/subset of the original dataframe

left tartan Sep 12, 2023, 9:34 PM

#

red hound Hi guys, I encountered a mysterium regarding pandas (at least it's a mysterium f...

So, the warning is saying: "Hey, be careful, you're modifying a view/subset of the original dataframe... not the original dataframe". It's really easy to get mixed up when you operate on views.

#

"the first line should simply manipulate the original df and not create any temporary views/subsets.": That's not at all what's happening

#

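The right side, df[df["column_a"] > 10], returns a subset of the original DF (with a boolean mask this is technically a copy, but pandas remembers it was taken from df and can't rule out that you meant to modify df). Then you used df = <the new subset>. So now df points to that subset, not the original df. This is all probably bad practice... it'd be better to do: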

#

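import datetime

import pandas as pd

df = pd.DataFrame({"column_a": [1, 2, 3, 4, 5, 6, 7]})

def do_something(df, value):
    # Select rows and target column in a single .loc call on the original frame,
    # so no intermediate subset is created (and no SettingWithCopyWarning).
    df.loc[df["column_a"] < 3, "column_b"] = value

do_something(df, datetime.datetime.now())
print(df)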
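Strictly speaking, `df[mask]` with a boolean mask gives back a new frame rather than a view, but pandas records that it was taken from `df`, so a later column assignment on it triggers SettingWithCopyWarning because pandas can't tell whether you expected `df` to change too. A minimal sketch of the two fixes discussed in this thread, assuming `value` is a scalar (the question's `pd.DataFrame(timestamp)` wrapper is left out); both function names are hypothetical. Note they do different things: the first returns only the filtered rows, the second keeps every row and modifies `df` itself.

```python
import pandas as pd


def filtered_with_copy(df: pd.DataFrame, value) -> pd.DataFrame:
    # df[mask] builds a new frame, but pandas remembers it came from `df`;
    # .copy() makes the result explicitly independent, so assigning a new
    # column to it does not raise SettingWithCopyWarning.
    out = df[df["column_a"] > 10].copy()
    out["column_b"] = value
    return out


def assign_in_place(df: pd.DataFrame, value) -> pd.DataFrame:
    # Row selection and column assignment in one .loc call on the original
    # frame: no intermediate subset; non-matching rows get NaN in column_b.
    df.loc[df["column_a"] > 10, "column_b"] = value
    return df
```

pandas' Copy-on-Write mode (opt-in in pandas 2.x) changes this behaviour: `df[mask]` then always acts as an independent copy, and this particular warning goes away.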

red hound Sep 12, 2023, 9:52 PM

#

But when I explicitly overwrite the reference to the original df, there is not much room for issues, or am I wrong? What I honestly dislike about your solution is that when I have lots of assignments in my function, I will have to write a lot of boilerplate code. And when I explicitly copy in the first line, I will have higher memory consumption. That does not make me happy, really. But thanks for your explanation

left tartan Sep 12, 2023, 9:53 PM

#

red hound But when I explicitly override the reference on the original df, there is not mu...

Why do you think you'll have higher memory consumption?

#

My solution was really just one line: df.loc[df["column_a"] < 3, "column_b"] = value

#

Whereas your version was two:
```
df = df[df["column_a"] > 10]
df["column_b"] = pd.DataFrame(timestamp)
```

red hound Sep 12, 2023, 9:57 PM

#

Sorry for the confusion, what I meant by "explicitly copy in the first line" is this:

def do_something(df, timestamp):
  df = df[df["column_a"] > 5].copy()
  df["column_b"] = pd.DataFrame(timestamp)
  return df

which should also be a viable solution.

What I meant by boilerplate is:

def do_something(df):
  df.loc[df["column_a"] < 3, "column_b"] = value_a
  df.loc[df["column_a"] < 3, "column_c"] = value_b
  df.loc[df["column_a"] < 3, "column_d"] = value_c
  df.loc[df["column_a"] < 3, "column_e"] = value_d
  df.loc[df["column_a"] < 3, "column_f"] = value_e
  return df

#

If I could filter the dataframe once before applying all the assignments, I would not have to rewrite the filter over and over again

left tartan Sep 12, 2023, 9:58 PM

#

red hound When I would be able to filter the dataframe once, before applying all the assig...

You could just do: df.loc[df["column_a"] < 3, ["column_b", "column_c"]] = ('a', 'b')

red hound Sep 12, 2023, 10:00 PM

#

Well 😄

#

That would be a bit hard to read for assignments like this

df["column_a"] = value.fillna(2).astype("int32").replace(0, 1)

This kind of scenario is pretty unsatisfying, as I encounter it quite often

left tartan Sep 12, 2023, 10:05 PM

#

I'm just commenting on the problem as stated: you want to update column(s) based on a condition with one or multiple constants. I believe what I proposed is the most efficient (barring a numpy solution) and cleanest way to do it. Making a copy seems unnecessary for what you're describing (although a copy isn't as expensive as it sounds)

#

You could make it more readable with something like:
```py
condition = df["column_a"] < 3
columns = ['column_a', ...]
values = (val1, val2, ....)
df.loc[condition, columns] = values
```
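The same idea can be wrapped in a small helper so the filter is computed once and each long expression (like the method chain above) stays on its own line. This is a sketch; `assign_where` and its arguments are made up for illustration, not a pandas API.

```python
import pandas as pd


def assign_where(df: pd.DataFrame, mask: pd.Series, updates: dict) -> pd.DataFrame:
    """Hypothetical helper: apply several column assignments to the rows
    selected by `mask`, each written through .loc on the original frame."""
    for column, value in updates.items():
        # `value` may be a scalar or a Series; a Series is aligned on df's index,
        # so only the masked rows receive values and new columns get NaN elsewhere.
        df.loc[mask, column] = value
    return df
```

Usage would look like `condition = df["column_a"] < 3` followed by `assign_where(df, condition, {"column_b": "a", "column_c": some_series})`, which keeps one filter and one `.loc` write per column without an intermediate subset.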

red hound Sep 12, 2023, 10:09 PM

#

I will experiment a bit and see, what it will look like. Thanks for your help!

magic dune Sep 13, 2023, 4:28 AM

#

can someone help me with neural networks

#

💀

shut girder Sep 13, 2023, 4:47 AM

#

Hello, does anyone know why NumPy is used in Data Analytics?

quaint loom Sep 13, 2023, 4:53 AM

#

How can I make my threshold value adapt so that it adjusts smoothly with gradual changes but responds quickly and noticeably when there's a sudden change? The threshold should also have a negative side: if the value drops significantly below a given slope, that should also be considered an error. https://paste.pythondiscord.com/OZJETWVAW2EXPIXCPYZSVCZYGI

echo lance Sep 13, 2023, 6:37 AM

#

I wrote code that generates satellite position data for 2 years at 1-minute intervals. It will take 186 hrs to run. Where can I run this code for free? (edit: 860 hrs)

cold osprey Sep 13, 2023, 6:41 AM

#

Huh 186 hrs

echo lance Sep 13, 2023, 6:44 AM

#

Here it is

cold osprey Sep 13, 2023, 6:47 AM

#

What does fill_loc do?

echo lance Sep 13, 2023, 6:48 AM

#

Gets the time from df_loc, gets the orbital data from df_sat, calculates the satellite's coordinates, and saves them against the time in df_loc. The time range is 1 Jan 2021 to today, with a gap of 30 seconds.

cold osprey Sep 13, 2023, 6:49 AM

#

So like a join of sorts?

echo lance Sep 13, 2023, 6:49 AM

#

Yeah.. but the main location calculation is taking time

cold osprey Sep 13, 2023, 6:49 AM

#

Hard to help without details

echo lance Sep 13, 2023, 6:50 AM

#

Hmm.. is there any server that will give me this much CPU time for free, where I can run this code and collect the CSV after 10 days?

echo lance Sep 13, 2023, 7:24 AM

#

cold osprey Hard to help without details


import pandas as pd
from tqdm import tqdm

# get_live_data(line1, line2, t) is the poster's own propagation function (not shown).

def fill_loc(df_tle: pd.DataFrame, df_loc: pd.DataFrame) -> pd.DataFrame:
    cols = ['mean_motion', 'eccentricity', 'inclination',
            'ra_of_asc_node', 'arg_of_pericenter', 'mean_anomaly', 'rev_at_epoch',
            'bstar', 'mean_motion_dot', 'mean_motion_ddot', 'semimajor_axis',
            'period', 'apoapsis', 'periapsis']
    pos_lst = ['lat', 'lon', 'h', 'vx', 'vy', 'vz']
    tle_idx = df_tle.index
    j = 0
    for i in tqdm(df_loc.index):
        t1 = df_loc.loc[i, 'date']
        # Advance to the most recent TLE whose epoch is before t1; the bounds
        # check avoids an IndexError once the last TLE has been reached.
        while j + 1 < len(tle_idx) and df_tle.loc[tle_idx[j + 1], 'epoch'] < t1:
            j += 1
        df_loc.loc[i, cols] = df_tle.loc[tle_idx[j], cols]
        loc = get_live_data(*df_tle.loc[tle_idx[j], ['tle_line1', 'tle_line2']], t1)
        df_loc.loc[i, pos_lst] = loc.values
    return df_loc

Any possible optimization???
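A likely large cost in the loop above is the per-row `.loc` lookups and writes (two label-based writes per timestamp, for millions of timestamps), on top of the propagation itself. A hedged sketch of one restructuring: `pd.merge_asof` attaches the latest TLE to every timestamp in a single merge, and the positions are collected in a plain list and turned into columns once. `fill_loc_asof`, `propagate` and `POS_COLS` are hypothetical names; `propagate` stands in for the poster's `get_live_data` and is assumed to return six numbers. Differences from the original: timestamps earlier than the first TLE epoch get NaN instead of reusing the first TLE, the result is sorted by `date` with a fresh index, and `df_loc` is assumed not to already contain the TLE columns.

```python
import numpy as np
import pandas as pd

POS_COLS = ["lat", "lon", "h", "vx", "vy", "vz"]


def fill_loc_asof(df_tle, df_loc, propagate, cols):
    """Attach the latest TLE to each timestamp, then propagate once per row.

    propagate(line1, line2, t) is assumed to return (lat, lon, h, vx, vy, vz).
    """
    # For each 'date', merge_asof picks the last TLE row with epoch <= date.
    # Both inputs must be sorted on their key columns.
    merged = pd.merge_asof(
        df_loc.sort_values("date"),
        df_tle.sort_values("epoch")[["epoch", "tle_line1", "tle_line2", *cols]],
        left_on="date",
        right_on="epoch",
        direction="backward",
    )
    # Collect results in a list instead of writing cell by cell with .loc.
    rows = [
        np.asarray(propagate(l1, l2, t), dtype=float).ravel()
        for l1, l2, t in zip(merged["tle_line1"], merged["tle_line2"], merged["date"])
    ]
    positions = pd.DataFrame(np.vstack(rows), columns=POS_COLS, index=merged.index)
    return merged.join(positions)
```

If `get_live_data` wraps an SGP4 propagator, propagating whole arrays of times per TLE, rather than one call per timestamp, would probably save even more; that depends on which library it uses.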

pulsar stone Sep 13, 2023, 9:00 AM

#

Hi, I am trying to load CSV and PDF files into a vector database, using glob to search for the files and LangChain document loaders to load them in. Using this code:

```py
for file in Path(cfg.DATA_PATH).rglob('*.csv'):
    filecsvint += 1
    print(f'[{datetime.now().strftime("%H:%M:%S")}] Loading {file} into vector database | Document Nr. {filecsvint}')
    try:
        print(f'[{datetime.now().strftime("%H:%M:%S")}] Loaded {file} successfully into vector database | Document Nr. {filecsvint}')
        documents.append(CSVLoader(file))
        filecsvintsucc += 1
    except Exception as e:
        filecsvintfail += 1
        print(f'[{datetime.now().strftime("%H:%M:%S")}] Could not load {file} into vector database | Document Nr. {filecsvint}')
        print(e)

fileint = 0
fileintsucc = 0
fileintfail = 0
for file in Path(cfg.DATA_PATH).rglob('*.pdf'):
    # Convert to string representation
    fileint += 1
    print(f'[{datetime.now().strftime("%H:%M:%S")}] Loading {file} into vector database | Document Nr. {fileint}')
    try:
        print(f'[{datetime.now().strftime("%H:%M:%S")}] Loaded {file} successfully into vector database | Document Nr. {fileint}')
        documents.append(PyPDFLoader(file))
        fileintsucc += 1
    except Exception as e:
        fileintfail += 1
        print(f'[{datetime.now().strftime("%H:%M:%S")}] Could not load {file} into vector database | Document Nr. {fileint}')
        print(e)
```

Gives me
```bash
argument of type "PosixPath" is not iterable
```
I tried this too:
```py
for file in Path(str(cfg.DATA_PATH)).rglob('*.csv'):
    # Rest of the code...

for file in Path(str(cfg.DATA_PATH)).rglob('*.pdf'):
    # Rest of the code...
```
It gave me the same error.
Can anyone help me?

#

hope that belongs in here

young granite Sep 13, 2023, 9:26 AM

#

pulsar stone Hi, i am trying to load CSV and PDF files into a vector database using glob to s...

normally this would rather go in the regular #1035199133436354600 channel,
but can u give the full traceback?

pulsar stone Sep 13, 2023, 9:26 AM

#

young granite normally rather in the normal <#1035199133436354600> but can u give the full tr...

As in the full Console output?

young granite Sep 13, 2023, 9:26 AM

#

the full error yes

pulsar stone Sep 13, 2023, 9:27 AM

#

sure, one sec

pulsar stone Sep 13, 2023, 9:30 AM

#

young granite the full error yes

https://paste.pythondiscord.com/GEZQ

#

Needed to cut it a bit cuz it's hundreds of files with the same error

#

It says "Loaded" only cuz I was too stupid to put the log after it actually tries to load lol

young granite Sep 13, 2023, 9:32 AM

#

pulsar stone Hi, i am trying to load CSV and PDF files into a vector database using glob to s...

this isn't ur complete code?

AttributeError: 'CSVLoader' object has no attribute 'page_content'

the function is not defined and it seems to be a problem with that function

pulsar stone Sep 13, 2023, 9:34 AM

#

young granite this isnt ur complete code? AttributeError: 'CSVLoader' object has no attribute...

No, it's not the full code, and yea, that errors too. The one nagging me rn is argument of type "PosixPath" is not iterable

pulsar stone Sep 13, 2023, 9:41 AM

#

young granite this isnt ur complete code? AttributeError: 'CSVLoader' object has no attribute...

https://paste.pythondiscord.com/YA6A this is the full

young granite Sep 13, 2023, 9:49 AM

#

pulsar stone Not its not the full code, and yea that errors too. The one nagging me rn is the...

that's a problem with the Path lib

#

can u print out the content of cfg for me

pulsar stone Sep 13, 2023, 9:51 AM

#

Sure

pulsar stone Sep 13, 2023, 9:52 AM

#

young granite can u print out the content of cfg for me

{'RETURN_SOURCE_DOCUMENTS': True, 'VECTOR_COUNT': 2, 'CHUNK_SIZE': 500, 'CHUNK_OVERLAP': 50, 'DATA_PATH': PosixPath('data'), 'DB_FAISS_PATH': 'vectorstore/db_faiss', 'MODEL_TYPE': 'llama', 'MODEL_BIN_PATH': 'model/llama-2-7b-chat.ggmlv3.q8_0.bin', 'MAX_NEW_TOKENS': 256, 'TEMPERATURE': 0.01}

#

Ooooh

young granite Sep 13, 2023, 9:53 AM

#

😄

pulsar stone Sep 13, 2023, 9:53 AM

#

no

#

Huh

#

tay@dedi:~/AiAssistant/API$ python3 db_build.py
[11:53:48] Building vector database from data/
Traceback (most recent call last):
  File "/home/tay/AiAssistant/API/db_build.py", line 85, in <module>
    run_db_build()
  File "/home/tay/AiAssistant/API/db_build.py", line 33, in run_db_build
    for file in cfg.DATA_PATH.rglob('*.csv'):
AttributeError: 'str' object has no attribute 'rglob'

young granite Sep 13, 2023, 9:55 AM

#

u should start to read the Tracebacks 😛

pulsar stone Sep 13, 2023, 9:58 AM

#

Yea, I am not that experienced with Python at all

#

And now we are back to the PosixPath error, cuz I got told to make cfg.DATA_PATH a Path object. Something is clearly wrong here

#

gonna try ur approach but idk how to fix the glob issue

#

nope nothing works

#

Okay i somehow got it

#

Now i have the CSVLoader issue

quaint loom Sep 13, 2023, 11:51 AM

#

quaint loom How can I change my threshold value according to value that adjusts smoothly wit...

Anyone know?

pulsar stone Sep 13, 2023, 12:01 PM

#

pulsar stone Now i have the CSVLoader issue

Solved in Post #1151466638567288852
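The actual fix lives in the linked post, so this is a hedged sketch of the usual cause of both errors in this thread. The `argument of type "PosixPath" is not iterable` message is consistent with a loader doing a substring check (such as `"~" in file_path`) on the path it receives, which fails for a `Path` object; and `'CSVLoader' object has no attribute 'page_content'` comes from appending loader objects to `documents` instead of the `Document` objects returned by the loader's `.load()` method. `load_documents` is a hypothetical name, and the loader classes (e.g. LangChain's `CSVLoader`, `PyPDFLoader`) are passed in so the sketch does not depend on one LangChain version's import path.

```python
from pathlib import Path


def load_documents(data_path, loaders):
    """Load every file matching each glob pattern with its loader class.

    `loaders` maps a pattern to a loader class, e.g.
    {"*.csv": CSVLoader, "*.pdf": PyPDFLoader}. Returns (documents, failures).
    """
    documents, failures = [], []
    for pattern, loader_cls in loaders.items():
        for file in sorted(Path(data_path).rglob(pattern)):
            try:
                # Pass a str, not a PosixPath, to the loader.
                docs = loader_cls(str(file)).load()
            except Exception as exc:  # keep going, but remember what failed
                failures.append((file, exc))
                print(f"Could not load {file}: {exc}")
                continue
            # .load() returns a list of documents: extend with those rather
            # than appending the loader object itself. Log only after success.
            documents.extend(docs)
            print(f"Loaded {file} ({len(docs)} documents)")
    return documents, failures
```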

boreal gale Sep 13, 2023, 12:48 PM

#

!rule 6

arctic wedgeBOT Sep 13, 2023, 12:48 PM

#

Rules

6. Do not post unapproved advertising.

boreal gale Sep 13, 2023, 12:48 PM

#

we don't allow unapproved advertising - please remove your post

fresh osprey Sep 13, 2023, 12:53 PM

#

sorry, removing

boreal gale Sep 13, 2023, 12:53 PM

#

thank you 🙏

quaint loom Sep 13, 2023, 2:54 PM

#

quaint loom How can I change my threshold value according to value that adjusts smoothly wit...

Does anyone know if this could be a good solution, or are there any other suggestions?
https://paste.pythondiscord.com/XCAA

desert oar Sep 13, 2023, 3:12 PM

#

quaint loom Is there anyone who know if this could be a good solution or is there any other ...

you didn't really explain what you were trying to do. this looks like anomaly detection?

#

so you are trying to maintain some kind of baseline rate of increase, and it's an anomaly if the rate of increase is too low?

#

it looks like in this current system, the baseline slope can only ever increase, never decrease. is that what you want?

#

oh wait that's wrong, i see if it decreases it will still update the slope, downward

#

i'm not sure about the sensitivity threshold for abrupt changes being the same as the adjustment factor, off the top of my head i don't see any convincing reason why those should be the same number

#

i think the idea is sound though, you might want to look into something like EWMA for a principled approach to adjusting this slope

quaint loom Sep 13, 2023, 3:27 PM

#

desert oar you didn't really explain what you were trying to do. this looks like anomaly de...

I found this anomaly detection code on the internet and actually have no clue about it. What I am trying to make is a detector that finds a smooth or sudden increase, which can be difficult to spot just by eyeballing a slope. The slope will always change somewhat, whether it increases or decreases, but it should never go negative, since we're talking about FCH4 (methane flux) from an aquatic system. At the same time, the slope will always increase, as you can see from the picture I already posted in this chat.

EWMA is new for me, but if you say that could be a solution, I can take some time to look into that.

desert oar Sep 13, 2023, 3:28 PM

#

quaint loom I found this anomaly detection on the internet and has no clue about it actually...

the slope must always increase, or the level must always increase?

quaint loom Sep 13, 2023, 3:28 PM

#

desert oar i'm not sure about the sensitivity threshold for abrupt changes being the same a...

You may be right here! They shouldn't be the same. It's a little tricky for me to catch this myself.

quaint loom Sep 13, 2023, 3:29 PM

#

desert oar the _slope_ must always increase, or the _level_ must always increase?

The slope in this case will and should always be upwards, or at a constant (+-) rate

desert oar Sep 13, 2023, 3:29 PM

#

quaint loom The slope in this case, will and should always be upwards or at a constant (+-) ...

okay, but that means the slope must be positive, not always increasing

quaint loom Sep 13, 2023, 3:29 PM

#

desert oar okay, but that means the slope must be positive, not always increasing

Right

desert oar Sep 13, 2023, 3:30 PM

#

this code looks like it only treats an anomaly as an unexpectedly large decrease in level, not an unexpectedly large increase

#

what it does do is handle unexpectedly large increases differently, using a different adjustment factor

quaint loom Sep 13, 2023, 3:31 PM

#

quaint loom How can I change my threshold value according to value that adjusts smoothly wit...

@desert oar This shows roughly what a small event looks like. The slope can be higher; I can show another picture

desert oar Sep 13, 2023, 3:32 PM

#

quaint loom <@389497659087650836> This is somehow an explanation on small event. The slope ...

the picture makes sense now that i've seen the code and you've explained more what you're trying to do

quaint loom Sep 13, 2023, 3:33 PM

#

desert oar the picture makes sense now that i've seen the code and you've explained more wh...

desert oar Sep 13, 2023, 3:33 PM

#

ah wait i was doubly wrong, they're doing the rapid adjustment for both positive and negative changes

#

so yeah this code seems like it does what you want, but i strongly suggest spending the time to understand it. you will also need to tune these adjustment parameters

#

and remind me again: what do you consider an "anomaly"?

quaint loom Sep 13, 2023, 3:35 PM

#

desert oar and remind me again: what do you consider an "anomaly"?

A slope with approximately R^2 = 0.85 or higher is considered as nomaly

quaint loom Sep 13, 2023, 3:35 PM

#

desert oar so yeah this code seems like it does what you want, but i strongly suggest spend...

Thank you for that.

desert oar Sep 13, 2023, 3:36 PM

#

quaint loom A slope with approxemently R^2 = 0.85 or higher is considered as nomaly

R^2?

quaint loom Sep 13, 2023, 3:36 PM

#

And yes, I should. I just haven't been able to run the full code yet, as I'm still struggling with the first part :P

quaint loom Sep 13, 2023, 3:36 PM

#

desert oar R^2?

The coefficient of determination

desert oar Sep 13, 2023, 3:37 PM

#

quaint loom The coefficient of determination

with respect to what? that's not a slope

#

R^2 tells you how linear something is, it doesn't tell you how steep the line is

quaint loom Sep 13, 2023, 3:41 PM

#

Ehm. The slope is chosen according to a calculation for the FCH4. If the R^2 is not high enough (i.e. 0.85 or above), the time interval over which the measurement is taken has to be made smaller. For example, if the measurement ran from 11:14:59 to 11:24:59 and the slope is 0.57, the time interval would be adjusted to get a better R^2.

desert oar Sep 13, 2023, 3:44 PM

#

quaint loom Ehm. The slope is chosen according the a calculation for the FCH4. If the R^2 is...

i don't know what you mean by this

#

it sounds like you're talking about shrinking your measurement time intervals until the linear approximation reaches a certain minimum acceptable level of correctness

#

the slope and the R^2 are not the same thing. do not get them confused

#

R^2 tells you how well the data matches any straight line. slope tells you how steep that line is.

#

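you can have an R^2 of 1 with a slope close to 0 (for a perfectly flat line R^2 is actually undefined, since there is no variance left to explain)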
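A small numpy sketch that reports the two numbers separately: a nearly flat but perfectly straight series gets a slope of about 0.001 and an R^2 of about 1. The `shrink_until_linear` helper is a guess at the procedure described earlier in the thread (shrinking the measurement interval until R^2 >= 0.85); trimming from the end of the window and the `min_points` floor are assumptions, and both function names are hypothetical.

```python
import numpy as np


def linear_fit(t, y):
    """Ordinary least-squares line; returns (slope, r_squared)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    residuals = y - (slope * t + intercept)
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    if ss_tot == 0.0:
        # Perfectly flat data: no variance to explain, R^2 is undefined.
        return slope, float("nan")
    return slope, 1.0 - ss_res / ss_tot


def shrink_until_linear(t, y, r2_min=0.85, min_points=5):
    """Drop points from the end of the window until the fit reaches r2_min.

    Returns (n_points, slope, r2), or None if no window of at least
    min_points qualifies.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    for n in range(len(t), min_points - 1, -1):
        slope, r2 = linear_fit(t[:n], y[:n])
        if r2 >= r2_min:
            return n, slope, r2
    return None
```

The slope is what you'd report as the flux; the R^2 only says whether a straight line was a reasonable description of that window.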

quaint loom Sep 13, 2023, 3:46 PM

#

quaint loom

According to this figure, the diffusive flux of methane is C-D (the example R^2). But if you want to catch an ebullition event, which can come with a smaller or bigger change (D-F), the change in CH4 concentration due to ebullition is obtained by subtracting the concentration at point E from the concentration at point F during the observation period.

quaint loom Sep 13, 2023, 3:47 PM

#

desert oar R^2 tells you how well the data matches _any_ straight line. slope tells you how...

Yeah yeah, I'm familiar with this. Maybe my way of talking is making it more confusing.

desert oar Sep 13, 2023, 3:47 PM

#

quaint loom Yeye, I am familiar with this. Maybe my way of talking is making it a little mor...

it sounds like you are confused, or deliberately using the wrong terminology

#

i tried to clarify above

quaint loom Sep 13, 2023, 3:49 PM

#

Just so we're on the same page: do you understand what I am trying to solve? 🙂

desert oar Sep 13, 2023, 3:49 PM

#

quaint loom Just so we`re on the same base. Do you understand what I am trying to solve? 🙂

i thought i did, but after this talk of R^2 i no longer understand

#

i thought we were talking about slopes

quaint loom Sep 13, 2023, 3:52 PM

#

quaint loom

C-D can have an R^2 of about 0.85. But the slope itself may, after some time, show a smooth or sudden increase, as you can see here. And yes, my terminology is not the sharpest.

quaint loom Sep 13, 2023, 4:02 PM

#

desert oar i thought i did, but after this talk of R^2 i no longer understand

Maybe ignore the R^2 for now haha.

past meteor Sep 13, 2023, 5:11 PM

#

quaint loom Maybe ignore the R^2 for now haha.

So you want to find anomalies in time series?

#

https://arxiv.org/abs/0710.3742

arXiv.org

Bayesian Online Changepoint Detection

Changepoints are abrupt variations in the generative parameters of a data
sequence. Online detection of changepoints is useful in modelling and
prediction of time series in application areas such as finance, biometrics, and
robotics. While frequentist methods have yielded online filtering and
prediction techniques, most Bayesian papers have focu...

#

Stuff like the Chow test tells you whether there's a change of slope at a break point you pick in advance (it doesn't find where the break is for you)
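A minimal numpy sketch of the Chow test for a straight-line model: fit one line to all the data and separate lines before and after a candidate break index, then compare the residual sums of squares with an F statistic. `chow_f` is a hypothetical name; the break index has to be supplied, and the statistic is compared against an F(k, n - 2k) distribution (for example `scipy.stats.f.sf(F, k, n - 2 * k)` gives the p-value).

```python
import numpy as np


def _rss(x, y):
    # Residual sum of squares of the least-squares line y = a + b * x.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return float(residuals @ residuals)


def chow_f(x, y, split):
    """Chow F statistic for a break at index `split`, chosen in advance.

    k = 2 parameters (intercept, slope) per segment.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k, n = 2, len(x)
    rss_pooled = _rss(x, y)
    rss_split = _rss(x[:split], y[:split]) + _rss(x[split:], y[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
```

Scanning `split` over many positions and taking the largest F is possible, but then the plain F distribution no longer gives valid p-values; that is where dedicated changepoint methods come in.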

#

Imo this is a well-researched area; if I were you I'd look at existing methods

desert oar Sep 13, 2023, 5:16 PM

#

past meteor So you want to find anomalies in time series?

it seems more like they're interested in individual big deviations from the slope

#

honestly their current technique seems fine

#

it's just EWMA on the trend slope, with the extra caveat that large changes have a greater adjustment factor

#

but large changes also get flagged as outliers, which i think is reasonable

#

so you get a big adjustment, and an alert for it

past meteor Sep 13, 2023, 5:17 PM

#

The problem with EWMA is that it has too many hyperparameters

desert oar Sep 13, 2023, 5:17 PM

#

it has 1 hyperparameter 🤔

past meteor Sep 13, 2023, 5:17 PM

#

My personal opinion

desert oar Sep 13, 2023, 5:18 PM

#

in this case they have 3: the small-change factor, the big-change factor, and the big-change threshold

past meteor Sep 13, 2023, 5:18 PM

#

Depends on how you formulate it, but you have your alpha and also the size of the change

desert oar Sep 13, 2023, 5:18 PM

#

traditional EWMA just has a constant change factor

past meteor Sep 13, 2023, 5:18 PM

#

Setting both requires hindsight

desert oar Sep 13, 2023, 5:18 PM

#

yeah but that's about as simple as it gets

#

or simulation and testing

#

if you can simulate relatively realistic data scenarios, then you can basically run preference elicitation experiments on yourself and tune the hyperparameters by hand, using your opinion as a cost function

past meteor Sep 13, 2023, 5:19 PM

#

SPC can work here as well but it suffers from the same problem

desert oar Sep 13, 2023, 5:20 PM

#

what, setting a threshold on standard deviations from the trend?

past meteor Sep 13, 2023, 5:20 PM

#

Yes

desert oar Sep 13, 2023, 5:20 PM

#

that's basically the same as here, just using EWMA to retroactively estimate the trend

past meteor Sep 13, 2023, 5:20 PM

#

I'm just "paranoid" as I was working with thousands of distinct time series in the past on a similar problem and I didn't have the luxury to go into detail on individual ones

desert oar Sep 13, 2023, 5:21 PM

#

i agree that tuning is required but so does any changepoint detection algorithm

#

yeah, any time you need to automate this across 1000s of time series things get a lot more complicated

#

in that case you need an automated tuning procedure and associated cost function, which might be highly task-specific and can be difficult to determine

past meteor Sep 13, 2023, 5:21 PM

#

In the end, after chasing the god particle of ML/stats for too long, I just did SPC

desert oar Sep 13, 2023, 5:21 PM

#

but how did you estimate trend? that's the whole point here

#

otherwise it literally is just setting a deviation threshold, like in traditional SPC (as far as i understand it)

#

imo the EWMA is bordering on the simplest possible trend estimation, other than a flat moving average

past meteor Sep 13, 2023, 5:22 PM

#

My series were more or less trend stationary, or at least they should be

desert oar Sep 13, 2023, 5:23 PM

#

in this case they're asking about something that they expect to steadily increase over time, so they need to estimate trend

past meteor Sep 13, 2023, 5:23 PM

#

Or differencing

desert oar Sep 13, 2023, 5:24 PM

#

yeah but they're expecting the trend itself to change over time

#

at least that's how i understood their intent

#

although the 2nd chart they showed looked more like shifts in intercept with a constant trend

#

so i think also there's some confusion on their end that we can't know the truth about

past meteor Sep 13, 2023, 5:24 PM

#

I mean, it depends on the magnitude of the change. From their image differencing + SPC would have worked.
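A sketch of "differencing + SPC" under the assumption of a roughly constant slope plus noise: first differences remove the linear trend, a level shift then shows up as one large difference, and a Shewhart-style rule flags differences more than `k` standard deviations from the mean of an in-control baseline window. `diff_spc_alarms`, `baseline_len` and `k` are hypothetical names and values. A change in slope would instead appear as a sustained offset in the differences, which this one-point rule catches less reliably.

```python
import numpy as np


def diff_spc_alarms(y, baseline_len=20, k=3.0):
    """Return indices i where the step y[i] -> y[i + 1] is out of control.

    The mean and standard deviation of the differences are estimated on the
    first `baseline_len` differences, assumed to be free of events.
    """
    d = np.diff(np.asarray(y, dtype=float))
    ref = d[:baseline_len]
    mu, sigma = ref.mean(), ref.std(ddof=1)
    z = (d - mu) / sigma
    return np.flatnonzero(np.abs(z) > k)
```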

desert oar Sep 13, 2023, 5:24 PM

#

yeah agreed

#

that's a good point. if you expect the trend to have a constant slope and just want to detect large level shifts, then yes i agree

#

but if you want to actually model changing trend over time then i think you'd need to actually estimate the trend

past meteor Sep 13, 2023, 5:26 PM

#

But so would EWMA 🤷 it depends on your use case. In general I think checking out a paper like BOCD is still good because you might find out that your method was naive in some ways.

desert oar Sep 13, 2023, 5:26 PM

#

right, i think we agree on these points

#

as for simplicity, EWMA is just slope_curr = slope_prev + adj_factor * diff_curr (with diff_curr = new_slope - slope_prev), which is pretty simple imo. their variation is to set adj_factor = adj_factor_large if abs(diff_curr) > diff_large_threshold else adj_factor_small, which kind of makes sense if you want a faster adjustment response on larger inputs
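A minimal sketch of the rule described here, applied to a sequence of per-interval slopes: the baseline moves toward each new value by a small factor, or by a larger one when the deviation exceeds a threshold, and those large deviations are also flagged. This is not the code from the pasted link; `adaptive_ewma` and its defaults are hypothetical and would need tuning as discussed. With `alpha_small == alpha_large` it reduces to plain EWMA.

```python
def adaptive_ewma(values, alpha_small=0.05, alpha_large=0.5, jump_threshold=1.0):
    """EWMA baseline with a larger adjustment factor for large deviations.

    Returns (baselines, flags); flags[i] is True when values[i] deviated from
    the previous baseline by more than jump_threshold.
    """
    values = list(values)
    baseline = values[0]
    baselines, flags = [baseline], [False]
    for x in values[1:]:
        diff = x - baseline
        large = abs(diff) > jump_threshold
        alpha = alpha_large if large else alpha_small
        baseline = baseline + alpha * diff  # standard EWMA update
        baselines.append(baseline)
        flags.append(large)
    return baselines, flags
```

A sudden jump gets flagged once and pulls the baseline halfway there (with these defaults); after that, smaller deviations are absorbed gradually without further alarms.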

past meteor Sep 13, 2023, 5:26 PM

#

BOCD expects data with gaussian noise so it would not have worked here anyway. You'd need to detrend.

desert oar Sep 13, 2023, 5:26 PM

#

that is, they're doing EWMA on the slope, not on the level

#

maybe that's problematic for some reason because the slope is a rate, but as long as they're using fixed-time intervals it's equivalent

#

if they're using varying time intervals (which it actually does look like they're considering?) then you have issues with units and you might need a harmonic mean instead

past meteor Sep 13, 2023, 5:27 PM

#

I think if we both had the dataset we'd 100 % agree, we're just making different assumptions 🤣

desert oar Sep 13, 2023, 5:27 PM

#

lol yes

#

not just the data, also a clear understanding of the task!

past meteor Sep 13, 2023, 5:31 PM

#

My use case was essentially a forecasting case where we wanted to know if the model is deteriorating (thousands upon thousands of SKUs). My intuition was simply that under normal circumstances the error ~ Gaussian. Spikes in error may occur so we want to isolate cases where the error gets "bad enough" over a period that is "long enough"

#

Obviously the problem is influenced mostly by how you define bad and long enough

desert oar Sep 13, 2023, 5:34 PM

#

past meteor My use case was essentially a forecasting case where we wanted to know if the mo...

that makes sense

#

the hacky approach would be looking at a moving average of error variance
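A rough sketch of that approach, assuming forecast errors that are roughly Gaussian while the model is healthy: compare a rolling variance of the errors with the variance over an initial baseline period, and only raise an alarm once it has stayed above `factor` times the baseline for `min_run` consecutive points ("bad enough" for "long enough"). `variance_alarm` and all thresholds are placeholders.

```python
import pandas as pd


def variance_alarm(errors, window=30, factor=2.0, min_run=10, baseline_len=100):
    """Boolean Series: True where the rolling error variance has exceeded
    factor * baseline variance for at least min_run consecutive points."""
    e = pd.Series(errors, dtype=float)
    baseline_var = e.iloc[:baseline_len].var()
    rolling_var = e.rolling(window).var()
    above = rolling_var > factor * baseline_var  # NaN (first window-1 points) -> False
    # Each False starts a new group; the cumulative count of True values within
    # a group is the length of the current run of consecutive exceedances.
    run_id = (~above).cumsum()
    run_len = above.astype(int).groupby(run_id).cumsum()
    return run_len >= min_run
```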

robust plover Sep 13, 2023, 5:38 PM

#

Hi, I need help with an assignment about ARIMA time series. Can I share the doc?

serene scaffold Sep 13, 2023, 5:40 PM

#

robust plover Hi, I need help with an assignment about ARIMA time series. Can I share the doc?

it's not likely that someone will want to walk you through the whole assignment. copy/paste the part you currently want help with (as text), and show what you've tried so far.

#

(do not post screenshots of text--these are difficult to read and refer to)

robust plover Sep 13, 2023, 5:41 PM

#

Besides importing libraries I haven't done much. ARIMA is a new concept to me; I did some online tutorials but need some human advice as well, if possible

#

ARIMA Modeling:
Implement an ARIMA modeling process using the statsmodels library or a similar library.
Decide on the order (p, d, q) for the ARIMA model based on the characteristics of the simulated data.
Fit the ARIMA model to the simulated data.

Forecasting:
Use the trained ARIMA model to make future forecasts for a specified number of time steps.
Generate forecasts for a period beyond the existing simulated data.

past meteor Sep 13, 2023, 5:42 PM

#

robust plover ARIMA Modeling: Implement an ARIMA modeling process using the statsmodels librar...

Do you already know what the p, d and q mean?

robust plover Sep 13, 2023, 5:42 PM

#

serene scaffold it's not likely that someone will want to walk you through the whole assignment....

@serene scaffold

robust plover Sep 13, 2023, 5:42 PM

#

past meteor Do you already know what the p, d and q mean?

yes

past meteor Sep 13, 2023, 5:43 PM

#

robust plover yes

Do you know how to determine them? (I don't want to "solve" your assignment for you because this way you'll learn more tbh)

robust plover Sep 13, 2023, 5:44 PM

#

past meteor Do you know how to determine them? (I don't want to "solve" your assignment for ...

Not really. First time working with ARIMA and I need some explaining a bit before I can do something. I don't really need someone solving it but rather explaining.

past meteor Sep 13, 2023, 5:45 PM

#

robust plover Not really. First time working with ARIMA and I need some explaining a bit befor...

So p is the order of the AR part of ARMA and q is the order of the moving-average part. The I stands for "integrated" and corresponds to d (the degree of differencing)

desert oar Sep 13, 2023, 5:45 PM

#

robust plover Not really. First time working with ARIMA and I need some explaining a bit befor...

this is an assignment, so presumably you were taught something in your class. what did they teach you? is there something in the material that you're confused about?

past meteor Sep 13, 2023, 5:46 PM

#

You can determine the AR by looking at the PACF and the MA by looking at the ACF. I encourage you to really read what they are doing and not just read the plots.
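To see what the ACF actually measures, a numpy-only sketch: simulate an AR(1) process and compute its sample autocorrelation. For AR(1) the theoretical ACF is phi**k, a slow geometric decay, while the PACF cuts off after lag 1; reading those plots that way would suggest p = 1, q = 0. `sample_acf` and `simulate_ar1` are hypothetical helpers; statsmodels provides `acf` and `pacf` in `statsmodels.tsa.stattools` and `plot_acf` / `plot_pacf` in `statsmodels.graphics.tsaplots` for real use.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation r_k = sum (x_t - m)(x_{t+k} - m) / sum (x_t - m)^2."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = d @ d
    return np.array([1.0] + [(d[:-k] @ d[k:]) / denom for k in range(1, max_lag + 1)])


def simulate_ar1(phi, n, seed=0):
    """x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x
```

With `x = simulate_ar1(0.7, 5000)`, `sample_acf(x, 3)` should come out close to `[1, 0.7, 0.49, 0.34]`.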

robust plover Sep 13, 2023, 5:46 PM

#

desert oar this is an assignment, so presumably you were taught _something_ in your class. ...

I'm a self taught programmer. Took a 100 day online bootcamp. I did work with pandas but not on this scale.

past meteor Sep 13, 2023, 5:47 PM

#

Afterwards I'd use auto.arima (for instance https://github.com/Nixtla/statsforecast) to find p, d and q automatically and compare it to what you found from ACF, PACF and unit root tests

robust plover Sep 13, 2023, 5:49 PM

#

Can I use pycharm to code it or is it only strictly jupyterlab?

past meteor Sep 13, 2023, 5:50 PM

#

But really, before you do this you need a good grasp on what all the letters in the acronym (A R I M A) mean separately and then together (AR, I, MA). That'll make everything a whole lot easier.

robust plover Sep 13, 2023, 5:57 PM

#

@past meteor Thanks for the explanation. Hope it gets me somewhere.

desert oar Sep 13, 2023, 5:57 PM

#

robust plover I'm a self taught programmer. Took a 100 day online bootcamp. I did work with pa...

Yeah but you mentioned an assignment

robust plover Sep 13, 2023, 5:58 PM

#

@desert oar I'm learning new stuff focused on data science so I can land a job and showcase my projects.

#

I don't have connections that can teach me except on Discord.

past meteor Sep 13, 2023, 6:00 PM

#

Books are your friend. Unless you're on a tight schedule and need something done by yesterday you should be following a book imho.

left tartan Sep 13, 2023, 6:12 PM

#

robust plover <@389497659087650836> I'm learning new stuff focused on data science so I can la...

A good place to start on this topic is https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm (6.4.4.5 covers ARIMA). What I like about this text is that it's broken into small chunks, with a lot of examples. The entire book is massive, but it's all great stuff.

echo lance Sep 14, 2023, 2:03 AM

#

left tartan A good place to start on this topic is <https://www.itl.nist.gov/div898/handbook...

I once started the book Elements of Statistical Learning, but it was too hard for me. I was only able to complete 3 chapters and gave up. Is there any other good book? Can I start this book from the beginning?

left tartan Sep 14, 2023, 2:04 AM

#

I'll let someone else recommend that, a lot of folks closer to college than I. I just know where I go to for applied stuff.

#

That book I cited is a good "read a chapter a week" kind of book. Great stuff, but it's long and more for applied / EDA stuff.

echo lance Sep 14, 2023, 2:07 AM

#

Ok... I am looking for a book with some good math about ML that makes my basics strong enough to land me a good job.

left tartan Sep 14, 2023, 2:24 AM

#

echo lance Ok.... i am looking for a book that has soem good math about ml ..makes my basic...

zestar75 recently suggested these: #data-science-and-ml message

heavy dagger Sep 14, 2023, 2:57 AM

#

I'm really lost in my computer vision class. I don't get the math, or even the concepts, behind stuff like 2D convolution, Gaussian kernels, filtering, edge detection, zero crossings, the Laplacian filter, the 1st derivative filter, etc.

#

Can anyone recommend resources that are very easy to understand?

iron basalt Sep 14, 2023, 3:21 AM

#

heavy dagger Can anyone recommend resources that are very easy to understand?

If you like videos: https://www.youtube.com/watch?v=KuXjwB4LzSA

YouTube

3Blue1Brown

But what is a convolution?

Discrete convolutions, from probability to image processing and FFTs.
Video on the continuous case: https://youtu.be/IaSGqQa5O-M

#

The last part is a neat application, but maybe too complicated. Just understanding the convolution in 3 different contexts (probability (counting), image processing, polynomial multiplication) will probably help.
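Two of those contexts fit in a few lines of NumPy: `np.convolve` on coefficient arrays multiplies polynomials, and on probability vectors it gives the distribution of a sum. The dice example is an illustration added here, not from the video.

```python
import numpy as np

# Coefficients in increasing powers of x: (1 + 2x) and (1 + 3x)
p = np.array([1, 2])
q = np.array([1, 3])
# Convolving coefficient arrays multiplies the polynomials
print(np.convolve(p, q))  # [1 5 6] -> 1 + 5x + 6x^2

# Probability (counting) view: distribution of the sum of two fair dice
die = np.full(6, 1 / 6)
two_dice = np.convolve(die, die)  # index 0 corresponds to a sum of 2
print(two_dice.argmax() + 2)      # most likely sum: 7
```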

magic dune Sep 14, 2023, 4:56 AM

#

I need help fixing my neural network code. Would anyone be willing to help?

halcyon hedge Sep 14, 2023, 7:32 AM

#

    # Calculate the upper and lower limits
    df_temp = df

    Q1 = df['Deaths'].quantile(0.25)
    Q3 = df['Deaths'].quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1*IQR
    upper = Q3 + 1*IQR

    # Create arrays of Boolean values indicating the outlier rows
    upper_array = np.where(df['Deaths']>=upper)[0]
    lower_array = np.where(df['Deaths']<=lower)[0]

    # Removing the outliers
    df_temp.drop(index=upper_array, inplace=True, axis=0)
    df_temp.drop(index=lower_array, inplace=True, axis=0)

#

Is there anything wrong in the code? I am getting a "not found in axis" error.
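A likely cause, depending on what df's index looks like: `np.where(...)[0]` returns *positions*, while `DataFrame.drop(index=...)` expects *index labels*. If the index isn't a plain 0..n-1 range (e.g. after earlier filtering), those positions don't exist as labels. Also, `df_temp = df` doesn't copy, so the in-place drops modify `df` too, and re-running the cell tries to drop rows that are already gone. A boolean mask avoids both problems. A minimal sketch (the function name is hypothetical; `k=1.0` mirrors the `1*IQR` above, while 1.5 is the usual Tukey choice):

```python
import pandas as pd

def remove_iqr_outliers(df, col, k=1.0):
    """Return a copy of df keeping only rows with Q1 - k*IQR < df[col] < Q3 + k*IQR."""
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Boolean mask aligned on the index, so there is no position/label mix-up
    keep = (df[col] > lower) & (df[col] < upper)
    return df[keep].copy()  # explicit copy: the original df is left untouched

# Non-default index: positional np.where results would not match these labels
df = pd.DataFrame({"Deaths": [1, 2, 3, 4, 100]}, index=[10, 11, 12, 13, 14])
clean = remove_iqr_outliers(df, "Deaths")
print(clean)
```

With this toy frame, the original approach would try to drop label 4, which doesn't exist, hence the "not found in axis" error.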

mystic ruin Sep 14, 2023, 8:10 AM

#

Are AIML and Python good for AI?

past meteor Sep 14, 2023, 8:15 AM

#

heavy dagger Can anyone recommend resources that are very easy to understand?

IMO, watch principles of computer vision on YouTube, but you specifically need to watch it in the order they suggest on the channel.

Aside from that, you can look at the canonical computer vision books. What I did specifically was read first, watch afterwards and then implement. Rolling your own gaussian kernel etc can help.
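Along the lines of "rolling your own", here is a minimal NumPy sketch, with explicit loops for readability rather than speed, of a normalized Gaussian kernel, a "valid" 2D convolution, and a discrete Laplacian applied to a synthetic vertical edge. The Laplacian response flips sign from +1 to -1 across the edge: that sign change is the zero crossing used for edge detection. Function names and the test image are made up for illustration.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel of shape (size, size); size should be odd."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)   # separable: 2D Gaussian = outer product of 1D Gaussians
    return k / k.sum()   # weights sum to 1, so overall brightness is preserved

def convolve2d(image, kernel):
    """'Valid' 2D convolution (no padding), written with plain loops."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Discrete Laplacian (second-derivative) kernel
laplacian = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

# Synthetic image: dark left half, bright right half (a vertical edge)
img = np.zeros((7, 8))
img[:, 4:] = 1.0

smoothed = convolve2d(img, gaussian_kernel(3, 1.0))  # blurred edge
edges = convolve2d(img, laplacian)                   # +1 then -1 at the edge
print(edges)
```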

potent sky Sep 14, 2023, 8:51 AM

#

past meteor My trinity of resources are: 1. https://mml-book.github.io/book/mml-book.pdf 2....

+1 these are great!
I would also add https://www.deeplearningbook.org

radiant cipher Sep 14, 2023, 10:59 AM

#

Is anyone aware of a way in NumPy to collect "tree"-scattered data? I have a shape like parent_index: uint4, lookup_index: uint2, and given an index I want to get an array of all the lookup indexes until parent_index == 0.

tidal bough Sep 14, 2023, 11:20 AM

#

radiant cipher anyone aware if numpy has a way to collect "tree" scattered data, i have shape l...

Sounds like an iterative task, so probably not easily done with numpy but maybe can be sped up with numba.

#

Though I'm confused about how your array works. It sounds like you have a structured array with a dtype consisting of two ints - but it seems to me that'd be very annoying to lookup nodes in, requiring a linear search each lookup.

radiant cipher Sep 14, 2023, 11:42 AM

#

Basically, the array stores a flattened tree where the index is the parent and the lookup index is the part stored somewhere else.
I want to materialize the path of a given index if/when necessary (which is rare).

boreal gale Sep 14, 2023, 11:49 AM

#

is the array basically a graph adjacency matrix of said tree?

radiant cipher Sep 14, 2023, 11:59 AM

#

boreal gale is the array basically a graph adjacency matrix of said tree?

It's not an adjacency matrix (the tree is basically a very sparse graph); it's literally just an array that serializes the tree by having each row store the index of its parent in addition to the data elements.
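For a single node, the iterative walk described above might look like this minimal sketch. The toy arrays, dtypes, and the convention that the root 0 is its own parent are assumptions; with a structured array you would read `arr['parent_index']` and `arr['lookup_index']` instead. This plain Python loop is the part that Cython, Numba, or Taichi would speed up.

```python
import numpy as np

# Hypothetical flattened tree: row i stores its parent's row index and a lookup
# index into some external table. Row 0 is the root and is its own parent.
parent = np.array([0, 0, 0, 1, 1, 3, 2], dtype=np.uint32)
lookup = np.array([0, 10, 20, 30, 40, 50, 60], dtype=np.uint16)

def path_lookups(i, parent, lookup):
    """Collect lookup indexes from node i up to (but excluding) the root, root-first."""
    out = []
    i = int(i)
    while i != 0:
        out.append(int(lookup[i]))
        i = int(parent[i])  # step one level up
    return np.array(out[::-1], dtype=lookup.dtype)

print(path_lookups(5, parent, lookup))  # node 5 -> 3 -> 1 -> root
```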

serene scaffold Sep 14, 2023, 1:18 PM

#

@radiant cipher ConfusedReptile is right that you probably won't find an idiomatic numpy solution, because numpy isn't designed for "stateful iteration" (which is a term I just made up)

#

have you thought about using networkx for this?

left tartan Sep 14, 2023, 1:41 PM

#

radiant cipher its not a agency matrix, (as the tree is basically a very sparse graph) its lite...

I have a lot of data like this. When possible, I store (or compute) the full path (i.e., 1.4.1.2) for later reference, rather than just the index of the parent.

pine wolf Sep 14, 2023, 3:23 PM

#

igraph and graph-tool have better interoperability with numpy

somber hamlet Sep 14, 2023, 5:30 PM

#

Hey, does anyone know the name of a histogram where the x-axis shows percentiles (or )? It's easy to find how to normalize the frequencies in plt, but I haven't found how to normalize the x-axis.
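One possible reading of the question, offered as an assumption: put each value's percentile rank on the x-axis. Plotting sorted values against their percentile rank is the empirical CDF (seaborn has `ecdfplot` for this). A minimal NumPy sketch of the conversion (the helper name is made up):

```python
import numpy as np

def percentile_rank(x):
    """Percent of observations <= each value (empirical CDF times 100)."""
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    # side="right" counts ties as "<=", matching the ECDF definition
    return 100.0 * np.searchsorted(sorted_x, x, side="right") / len(x)

values = np.array([10, 40, 20, 30])
print(percentile_rank(values))  # [ 25. 100.  50.  75.]
```

For a histogram whose bins each hold the same share of the data, `np.percentile(x, np.arange(0, 101, 10))` gives decile bin edges you could pass as `bins=`.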

civic spruce Sep 14, 2023, 6:58 PM

#

Hi

#

I need a data science engineer. It will be paid well.

serene scaffold Sep 14, 2023, 7:47 PM

#

civic spruce I need Data sciend Engineer. IT will be paid well.

this server isn't for hiring

radiant cipher Sep 14, 2023, 8:03 PM

#

serene scaffold <@233155068155658240> ConfusedReptile is right that you probably won't find an i...

I guess I'll just make a Cython function that operates on a view of the column with the indexes, and then use the resulting array as an index.

#

Hmm, if I use 0 as the parent of 0, I can actually just create a matrix of materialized paths by appending the values of the view of the parent indexes and creating the next line.
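That idea can be sketched in NumPy roughly as follows, assuming the root 0 is its own parent and there are no other cycles (otherwise the loop never ends). Each new row is `parent[previous_row]`, so the number of iterations equals the tree depth and each one is vectorized over all requested nodes. Restricting it to the nodes you actually need keeps the depth × n matrix small. Names and the toy array are hypothetical.

```python
import numpy as np

# Hypothetical toy tree; the root 0 is its own parent
parent = np.array([0, 0, 0, 1, 1, 3, 2], dtype=np.int64)

def ancestor_matrix(parent, nodes):
    """Row k holds the k-th ancestor of each requested node (0 once the root is reached)."""
    rows = [np.asarray(nodes, dtype=np.int64)]
    # Keep following parent pointers until every requested node has hit the root
    while rows[-1].any():
        rows.append(parent[rows[-1]])
    return np.stack(rows)

A = ancestor_matrix(parent, [5, 6])
print(A)  # column j is the path of node j, read top to bottom
```

Indexing the lookup column with the result (e.g. `lookup[A]`) would then give the lookup indexes along each path, with padding rows mapping to the root's entry.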

cedar totem Sep 14, 2023, 8:43 PM

#

Hey, if I wanted to get into AI development, where should I start as a novice?

#

Really I want to train something to handle some basic equations and hopefully scale it up

#

I only have experience with making algorithms, so I have no idea how to jump to AI from there

viscid ether Sep 14, 2023, 8:45 PM

#

Anyone interested in collaborating with me on my IslamAI project? I'm currently still collecting/cleaning authentic data and creating blueprints for API endpoints, but I would definitely love some help with ML.

https://github.com/yousefabuz17/IslamAI

iron basalt Sep 14, 2023, 8:47 PM

#

radiant cipher anyone aware if numpy has a way to collect "tree" scattered data, i have shape l...

What kind of tree is it? Does it have a fixed number of child nodes (e.g. a binary tree in the case of N=2)? If not, do you know what the maximum number of children per node is?

radiant cipher Sep 14, 2023, 8:48 PM

#

iron basalt What kind of tree is it? Does it have a fixed number of child nodes (e.g. a bina...

The tree starts as a list of path objects of different depths, so filesystem limits apply.

#

(There are hundreds of millions of them, most don't need to be materialized, and having just an index into a tree saves so much memory.)

small wedge Sep 14, 2023, 8:50 PM

#

cedar totem I only have experience with making algorithms, so I have no idea how to jump to ...

what kind of AI are you interested in making? it's a very broad term so just trying to narrow down what you're interested in

cedar totem Sep 14, 2023, 8:51 PM

#

Hmm. Something that can understand and perform mathematical operations when given an equation and asked to solve it, similar to Wolfram Alpha.

iron basalt Sep 14, 2023, 8:52 PM

#

radiant cipher the tree starts as list of path objects of different depth - so filesystem limit...

Yea, then it probably requires Cython. Although it seems like you only need one loop? So you can probably Cython inline that.

granite nebula Sep 14, 2023, 8:52 PM

#

In stable_baselines3, I'm using MaskablePPO (from sb3-contrib). How do I set an action mask for a MultiDiscrete action space? This is what I'm trying right now, but it doesn't work. My MultiDiscrete action space has a shape of 6 and 4, so I'm returning a tuple of 2 ndarrays with shapes 6 and 4, respectively.

    def action_masks(self):
        column_mask = np.zeros(
            (width,),
            np.int8,
        )
        for i in range(width):
            column_mask[i] = int(
                self.check_valid(self.last_in_column(i), i, initial=True)
            )

        color_mask = np.zeros((4,), np.int8)
        for i in range(1, 4):
            color_mask[i - 1] = int(self.tiles_left[i] != 0)
        return column_mask, color_mask
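If I remember the sb3-contrib docs correctly (worth double-checking there), `action_masks()` for a MultiDiscrete space should return one flat boolean array: the per-dimension masks concatenated, so length 6 + 4 = 10 here, instead of a tuple. A minimal sketch with a hypothetical helper; inside `action_masks` the last line would become `return flat_multidiscrete_mask(column_mask, color_mask)`. Separately, note that `for i in range(1, 4)` only fills `color_mask[0..2]`, so `color_mask[3]` always stays 0; that may or may not be intended.

```python
import numpy as np

def flat_multidiscrete_mask(*dim_masks):
    """Concatenate per-dimension masks into one flat boolean vector
    (length = sum of the MultiDiscrete nvec)."""
    return np.concatenate([np.asarray(m, dtype=bool) for m in dim_masks])

# Example values standing in for the env's computed masks
column_mask = np.array([1, 1, 0, 1, 1, 1], dtype=np.int8)  # 6 columns
color_mask = np.array([1, 0, 1, 1], dtype=np.int8)         # 4 colors
mask = flat_multidiscrete_mask(column_mask, color_mask)
print(mask.shape)  # (10,)
```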

cedar totem Sep 14, 2023, 8:52 PM

#

If possible I would in future want to develop it to be able to perform calculus questions, such as differential equations, and give working, for me to use as an educational tool

iron basalt Sep 14, 2023, 8:53 PM

#

Or you can use Taichi, which is my preference these days for loops I need to make fast quickly directly in Python.

#

(Numba is way more restricted, buggy, etc)

small wedge Sep 14, 2023, 8:53 PM

#

cedar totem If possible I would in future want to develop it to be able to perform calculus ...

Hm, we don't really use AI to solve math problems; that just introduces a risk of incorrect answers, and things like language models that accept text as input are not very good at complex math.

cedar totem Sep 14, 2023, 8:54 PM

#

Hmm. It does make sense that something algorithmic would be better

#

One function I would want to implement is being able to give it an input equation and an output equation and determine how you can manipulate the input equation to the output equation, if that makes sense

small wedge Sep 14, 2023, 8:55 PM

#

ah yes, that would be machine learning then

#

that entire field is focused on function optimization

cedar totem Sep 14, 2023, 8:56 PM

#

Its easy enough for x + 5 = 8, but when you start dealing with calculus, stuff starts getting wild lol

umbral charm Sep 14, 2023, 8:56 PM

#

Currently using Pandas, getting this error

#

result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for -: 'str' and 'str'

small wedge Sep 14, 2023, 8:56 PM

#

cedar totem Its easy enough for x + 5 = 8, but when you start dealing with calculus, stuff s...

https://developers.google.com/machine-learning/crash-course/ google's crash course

https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
3b1b's playlist covering neural networks. It gets progressively more mathy as the videos continue, but the early parts are very simple, intuitive explanations. I would recommend watching the first two regardless of whether you want to dive into the math or not.

Andrew Ng's courses are very highly acclaimed. Here is a link to what I believe is a completely free one: https://see.stanford.edu/Course/CS229, and he has many others on Coursera.

iron basalt Sep 14, 2023, 8:57 PM

#

cedar totem One function I would want to implement is being able to give it an input equatio...

For that kind of stuff I recommend looking into how Wolfram Language works (it's actually a LISP-like).

cedar totem Sep 14, 2023, 8:57 PM

#

Wonderful! Ty for the help!

iron basalt Sep 14, 2023, 8:57 PM

#

It's basically just doing a bunch of operations on a tree.

#

Like factor, etc.

#

Then there are generic algorithms for doing various things like solving equations.

cedar totem Sep 14, 2023, 8:58 PM

#

Ah ye that makes sense

iron basalt Sep 14, 2023, 8:58 PM

#

It basically tries a bunch of things with heuristics.

#

So like a human would.

cedar totem Sep 14, 2023, 8:58 PM

#

Ah, just faster

#

Makes sense

iron basalt Sep 14, 2023, 8:59 PM

#

Pattern matching, then converting to the "normal form" or whatever you want to call it.

#

Then when it's in the nice form, do the algorithm for that form.
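As a toy illustration of "operations on a tree" (not how Wolfram Language or any real CAS is implemented), here is a pure-Python sketch that solves an equation where x appears exactly once by repeatedly undoing the outermost operation on the x side, recording each manipulation as working. The tuple representation and all names are made up; for real use, sympy already does this properly.

```python
from fractions import Fraction

# Hypothetical toy representation: a tree is (op, left, right); leaves are numbers or "x".
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
INVERSE = {"+": "-", "-": "+", "*": "/", "/": "*"}
DESCRIBE = {
    "+": "subtract {} from both sides",
    "-": "add {} to both sides",
    "*": "divide both sides by {}",
    "/": "multiply both sides by {}",
}

def contains_x(e):
    """True if the variable x occurs anywhere in the tree e."""
    if isinstance(e, tuple):
        return contains_x(e[1]) or contains_x(e[2])
    return e == "x"

def evaluate(e):
    """Evaluate an x-free tree exactly using rational arithmetic."""
    if isinstance(e, tuple):
        op, a, b = e
        return OPS[op](evaluate(a), evaluate(b))
    return Fraction(e)

def solve(lhs, rhs):
    """Solve lhs = rhs for x (x must occur exactly once, inside lhs)."""
    steps = []
    while lhs != "x":
        op, a, b = lhs
        if contains_x(a):
            # (a op b) = rhs  ->  a = rhs inv(op) b
            steps.append(DESCRIBE[op].format(b))
            lhs, rhs = a, (INVERSE[op], rhs, b)
        elif op in ("+", "*"):
            # commutative: (a op b) = rhs  ->  b = rhs inv(op) a
            steps.append(DESCRIBE[op].format(a))
            lhs, rhs = b, (INVERSE[op], rhs, a)
        else:
            # a - b = rhs  ->  b = a - rhs;   a / b = rhs  ->  b = a / rhs
            steps.append(f"rewrite {a} {op} (...) = r as (...) = {a} {op} r")
            lhs, rhs = b, (op, a, rhs)
    return evaluate(rhs), steps

print(solve(("+", "x", 5), 8))                           # x + 5 = 8
print(solve(("/", ("*", 3, ("-", "x", 2)), 4), 6))       # 3(x - 2)/4 = 6
```

Calculus needs many more rules (and simplification back to a "normal form"), which is where the effort described above comes in.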

#

This requires a lot of work since you need to implement pretty much every math algorithm under the sun, e.g. complicated, exact, fast root finding and such.

#

Each one their own project.

#

Wolfram Alpha is also their own natural language processing tool, which converts natural language into this LISP-like Wolfram Language.

cedar totem Sep 14, 2023, 9:02 PM

#

Sounds like a larger can of worms than I anticipated

iron basalt Sep 14, 2023, 9:03 PM

#

You just need to add stuff to it over a long time before it can tackle pretty much any problem like now: https://www.youtube.com/watch?v=MeuCAT5HDh0

YouTube

SciMedTV

Introducing Mathematica, Stephen Wolfram

In this 1989 video presentation, Mathematica (TM) creator Stephen Wolfram demonstrates his award winning mathematics software. Wolfram demonstrates numerical calculations, algebraic calculation and graphical renderings. He concludes with a discussion of programming.

#

Wolfram spent decades working on it non-stop.

umbral charm Sep 14, 2023, 9:03 PM

#

Is it possible for numbers in a CSV to be perceived as strings by Python? I think it's happening to me and idk how to fix it.

iron basalt Sep 14, 2023, 9:03 PM

#

However, you can make your life a lot easier by using sympy, to do most of the work for you.

#

If you just want to solve some simple equations, do a bit of calculus, it's not too bad, I made one such CAS for fun once in C...

#

Pretty fun project, recommend.

#

Also printing math to the terminal in ascii form like in that video was fun.

cedar totem Sep 14, 2023, 9:08 PM

#

Certainly a very clever thing haha

#

Well, ty for the knowledge

left tartan Sep 14, 2023, 9:48 PM

#

umbral charm Is it possible for numbers in a CSV to be percived as a string by python, coz i ...

Yes.

umbral charm Sep 14, 2023, 9:51 PM

#

left tartan Yes.

Yeah, no, I fixed it. For some reason my CSV had commas as thousands separators

#

to make it easier to read

#

Idk how that happened to my CSV

left tartan Sep 14, 2023, 9:52 PM

#

Oh, I’ve had to fight those types of issues… or a single letter stuck in a column.

umbral charm Sep 14, 2023, 9:53 PM

#

Yeah, that's so irritating. I'll be scrolling through hundreds of rows to find it.

#

I mean, luckily we can just format the Excel column to get rid of the commas, because I would have no idea how to do that in Python.
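For the record, pandas can handle this without Excel. A minimal sketch with made-up CSV data: `read_csv` takes a `thousands` argument, and a column that was already read as strings can be cleaned with `str.replace` plus `pd.to_numeric`. Either way the column becomes numeric, so subtraction no longer fails with the `'str' and 'str'` error seen earlier.

```python
import io
import pandas as pd

# Made-up sample: quoted numbers with comma thousands separators
csv_text = 'Country,Deaths\nA,"1,234"\nB,"56,789"\nC,12\n'

# Option 1: tell the parser which character is the thousands separator
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df["Deaths"].tolist())  # [1234, 56789, 12]

# Option 2: clean a column that was already read as strings
raw = pd.read_csv(io.StringIO(csv_text))
raw["Deaths"] = pd.to_numeric(raw["Deaths"].str.replace(",", "", regex=False))
```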

frozen girder Sep 14, 2023, 10:22 PM

#

Hi! If I have a df in pandas with the columns 'App' and 'Review', how can I see which rows have the app's name in the review (without using apply)?

umbral charm Sep 14, 2023, 10:57 PM

#

frozen girder Hi! If i have a df in pandas with the columns 'App' and 'Review'. How can i see ...

Try .str.contains, but it may be bad for specific names or if the app is referred to differently.

frozen girder Sep 14, 2023, 11:00 PM

#

umbral charm try .contains, but it maybe bad for specific names or if the app is reffered to ...

I've already tried with this:
filtered_df = np.where(df_reviews['Translated_Review'].str.contains(df_reviews['App']))

#

But it gives: TypeError: unhashable type: 'Series'

umbral charm Sep 14, 2023, 11:41 PM

#

frozen girder I've already tried with this: filtered_df = np.where(df_reviews['Translated_Revi...

have u tried .iterrows

frozen girder Sep 14, 2023, 11:46 PM

#

umbral charm have u tried .itterows

I will look at that, thanks!

left tartan Sep 14, 2023, 11:53 PM

#

frozen girder I will look at that, thanks!

In general: don’t use iterrows. That’s an anti-pattern with Pandas.

frozen girder Sep 14, 2023, 11:57 PM

#

left tartan In general: don’t use iterrows. That’s an anti-pattern with Pandas.

Then how can i do it?

left tartan Sep 14, 2023, 11:58 PM

#

You could either create a regex if the input is relatively simple, since contains takes a regex argument, or you could combine multiple conditions, one for each item in the list.

#

Hmm, I’m actually not quite sure what exactly you’re trying to filter on anyway: you want to know if a given row's review contains its app value?

frozen girder Sep 15, 2023, 12:01 AM

#

left tartan Hmm, I’m actually not quite sure what exactly you’re trying to filter on anyway:...

I want to know if the name of the app is contained in the review of the same row. I know I can do this with apply, but I want to know if there is another way to do it.

left tartan Sep 15, 2023, 12:01 AM

#

df[df['col1'].str.contains(df['col2'], na=False)] ? I’m not at my desktop right now, otherwise I’d test first.
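As far as I know, `Series.str.contains` expects a single pattern (a string or compiled regex), not a Series, which would explain the `TypeError: unhashable type: 'Series'` above. One apply-free option for a row-wise check is a list comprehension over `zip`, which is usually faster than `apply` anyway. Sketch below, assuming a plain case-insensitive substring match is what's wanted; the sample rows and helper name are made up.

```python
import pandas as pd

df_reviews = pd.DataFrame({
    "App": ["Calculator", "Maps", "Notes"],
    "Translated_Review": ["Best calculator ever", "Crashes a lot", None],
})

def app_in_review(df, app_col="App", review_col="Translated_Review"):
    """Row-wise check: does each review mention its own app name (case-insensitive)?"""
    return pd.Series(
        [
            # Missing reviews (None/NaN) are treated as "no mention"
            isinstance(review, str) and app.lower() in review.lower()
            for app, review in zip(df[app_col], df[review_col])
        ],
        index=df.index,
    )

mask = app_in_review(df_reviews)
filtered_df = df_reviews[mask]
print(filtered_df)
```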

serene scaffold Sep 15, 2023, 12:15 AM

#

left tartan df[df['col1'].str.contains(df['col2'], na=False)] ? I’m not at my desktop right ...

ssh into prod and test it there

#

||I'll show myself out||

lapis sequoia Sep 15, 2023, 12:41 AM

#

Are you guys good at both pandas and SQL? I’m only good at SQL; wondering how common having both is.

left tartan Sep 15, 2023, 12:51 AM

#

lapis sequoia Are you guys good at both pandas and sql? I’m only good at sql wondering how com...

I’m very good at SQL, I’m good at Pandas (I’d rather be in SQL)

lapis sequoia Sep 15, 2023, 1:32 AM

#

I’ll have to practice then. Cool, wish there was a pandas LeetCode.

left tartan Sep 15, 2023, 1:42 AM

#

lapis sequoia I’ll have to practice then cool wish there was pandas leetcode

Lol, that’s a great idea

#

There’s a great sql ref… https://selectstarsql.com/

umbral charm Sep 15, 2023, 11:31 AM

#

left tartan I’m very good at SQL, I’m good at Pandas (I’d rather be in SQL)

Is your job like a data analyst or something?

distant mantle Sep 15, 2023, 11:35 AM

#

lapis sequoia I’ll have to practice then cool wish there was pandas leetcode

hi

left tartan Sep 15, 2023, 11:50 AM

#

umbral charm Is your job like a Data analyst or sometin

Broadly speaking: I build things for data analysts/scientists.

umbral charm Sep 15, 2023, 11:51 AM

#

left tartan Broadly speaking: I build things for data analysts/scientists.

by things do you mean software?

left tartan Sep 15, 2023, 11:52 AM

#

umbral charm by things do you mean software?

Yes

umbral charm Sep 15, 2023, 12:01 PM

#

left tartan Yes

Damn

#

anything i would know?

left tartan Sep 15, 2023, 12:03 PM

#

Wdym?

umbral charm Sep 15, 2023, 12:03 PM

#

is the software you make not public?

left tartan Sep 15, 2023, 12:05 PM

#

Correct

potent sky Sep 15, 2023, 1:51 PM

#

lapis sequoia I’ll have to practice then cool wish there was pandas leetcode

maybe something like this?
https://platform.stratascratch.com/coding?code_type=2

lapis sequoia Sep 15, 2023, 1:52 PM

#

potent sky maybe something like this? https://platform.stratascratch.com/coding?code_type=2

oh this looks really good

#

it even has postgres

umbral charm Sep 15, 2023, 2:09 PM

#

left tartan Correct

Ohhh, so do you work for a company, or do you do contracting?

serene scaffold Sep 15, 2023, 2:42 PM

#

Is there a name for the second formula here? The first one is Markov's inequality, and the second is similar to (but not the same as) Chebyshev's inequality.

#

(disclosure, this is homework, so do not give me exact solutions)
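For reference only (the second formula itself isn't shown here, so this is not a hint toward it), the standard statements of the two inequalities named above are:

```latex
% Markov's inequality: for a non-negative random variable X and any a > 0
P(X \ge a) \le \frac{\mathbb{E}[X]}{a}

% Chebyshev's inequality: for X with mean \mu, finite variance \sigma^2, and any k > 0
P\left(|X - \mu| \ge k\sigma\right) \le \frac{1}{k^2}
```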

lapis sequoia Sep 15, 2023, 3:07 PM

#

Anyone here good with PyTorch and CNN classifiers?