#data-science-and-ml

1 messages · Page 139 of 1

spare forum
#

Don't ask to ask

tranquil ledge
#

i did say what i want to do

void crescent
#

hey guys I know that transfer learning is the optimal strategy and all, but for learning purposes, what layers should come after an LSTM cell in my RNN?

#

problem: classify symptoms into diseases

past meteor
#

Have you tried template matching with rotations?

void crescent
#
pneumonia_test = "I have been experiencing a persistent cough with phlegm, fever, and shortness of breath for the past week."


model.predict(pneumonia_test)
ValueError: Unrecognized data type: x=I have been experiencing a persistent cough with phlegm, fever, and shortness of breath for the past week. (of type <class 'str'>)
#

im confused because the train_sentences are strings as well

#

so why is my model unable to predict on a normal string?

void crescent
fallow tree
#

Hello hope u good !! how can i manage the CTGAN limitations with TimeSeries ?

dusk condor
tranquil ledge
dusk condor
#

i have not used Tableau, but this can be used to publish to web

tranquil ledge
lapis sequoia
#

maybe we can help anyways, if you describe the problem with some detail @tranquil ledge

warm river
#

hey i am new. after learning python, what will i do?? and i also want to contribute in a.i but i don't know the structure

left tartan
#

Start with learning Python and doing small projects. Don't worry about AI until you can do small projects on your own. Start in #python-discussion

dusty forge
#

A while ago someone in this channel was extensively using Prefect. Are they still around? 😄

dusty forge
#

I've been breaking my head and balls over Prefect for the last couple of days 😆 ... I really want to like it, but when even copilot is unable to help me, Imma start sweating lolol

#

Something simple such as connecting Prefect Cloud to an external PSQL database, and then having it do a simple SELECT statement to see if it can actually see it ... no clue how to do it

#

I can use a masterclass 😆

scenic parcel
#

You end up going with lightning or ignite?

toxic mortar
#

why this happens

fresh thorn
#

Hey guys, I'm doing a research project utilizing Python and a certain library and I'm seeking help since I'm getting some errors. Is there any1 here who can help since I'm getting errors in the main method such as numpy and I'm lost on how to fix it. Also pls DM

dusty forge
#

yeah that's exactly what I'm trying to do, but with a PostgreSQL db instead of duck, the whole credentials blocks is so confusing and the doc on it is minimal

fallow coyote
#

How do you all utilise databases for ML/AI? (e.g. would you use a Python script to put data from a CSV into an SQL database and use that database, or reformat an SQL database so it's optimised for ML?)

serene scaffold
fallow coyote
iron basalt
# fallow coyote In general. Just want to have an idea of how databases are utilised in this fiel...

The primary purpose of a database is to store and retrieve data quickly and safely (don't lose the data, don't corrupt it, distributed, etc). They also provide stuff like SQL to allow you to specify exactly what you want from them. Then there are more specialized databases, or general, but used in different cases depending on performance requirements and such. So really databases are used in any field that has a bunch of data, which is pretty much everything these days as everyone has started to collect and store a bunch of data on everything.

#

ML requiring a bunch of data to train means that naturally a database would be involved somewhere.

#

Modern AI being heavily based on ML means a database is also probably involved.

#

"Data" comes in different forms, something like an image is very different from tabular data, and so it often has different storage and retrieval. Then there is how that data is organized internally by the database, and what kind of organization you want depends on what kind of queries you will be making frequently, so you can optimize for that specific problem.
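
A minimal sketch of the CSV-to-SQL workflow asked about above, using only the standard library (`csv`, `sqlite3`). The table name `samples`, the column names, and the inline CSV text are made up for illustration; a real project would read an actual file and most likely use a server database.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would come from open("data.csv").
CSV_TEXT = """sepal_length,sepal_width,label
5.1,3.5,0
4.9,3.0,0
6.3,3.3,1
5.8,2.7,1
"""


def load_csv_into_sqlite(conn, csv_file):
    """Create a table and bulk-insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "sepal_length REAL, sepal_width REAL, label INTEGER)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (float(r["sepal_length"]), float(r["sepal_width"]), int(r["label"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def fetch_training_data(conn, min_length):
    """Let SQL do the filtering, then split rows into features and labels."""
    cursor = conn.execute(
        "SELECT sepal_length, sepal_width, label FROM samples "
        "WHERE sepal_length >= ? ORDER BY sepal_length",
        (min_length,),
    )
    features, labels = [], []
    for length, width, label in cursor:
        features.append([length, width])
        labels.append(label)
    return features, labels


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
inserted = load_csv_into_sqlite(conn, io.StringIO(CSV_TEXT))
X, y = fetch_training_data(conn, min_length=5.0)
```

The point of the split is that the database handles storage and selection (the `WHERE` clause), while the ML code only sees the feature and label lists it asked for.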

south wraith
#

Hi folks, having a few little issues with my neural network. Wondering if I need someone to look at my code and see what is going on? I have LSTM and GNN set up in there. Currently it isn't generating any output from the brain.
The brain I feel is slow and not optimised very well.

Wondering if anyone could have a look and give me some advice because I found that it also at times uses a lot more RAM than it should with basic words. Sometimes only saying "hello" will make it try to allocate 13 GB over the available amount.

Also, wondering if it would be more beneficial to use a database and talk to that for the knowledge and all? Does anyone have experience with doing this for this application?

serene scaffold
south wraith
# serene scaffold Hello, when you ask for help, give all the information someone would need to sta...

Well, okay, fine, here is the code. https://www.blackbox.ai/editor?id=Z_yFmEUZcB7ZjLbYORzCt
Easier to do it there I think.


spiral barn
#

can someone help me out with figuring out why my CNN is performing so poorly at image classification? here is some info: I am trying to predict the brand of a pair of jeans based on the brand patch on the back of the jeans. to start I have 100 images for each class, so 200 images total, and I have done image augmentation to get a total of 2000 images, but when I train a CNN I am getting an accuracy around 50% on the test set.

spiral palm
#

do you think we can create a large language model only trained on analytically solving differential equations

wooden sail
drifting swift
#

what is pipeline in machine learning?
If i say it is the lifecycle of an end-to-end ML project, would you consider this a correct answer?

wooden sail
past meteor
spare forum
#

Pipeline could also refer to the combination of transformation of data +model in itself (they are called transformers but not the transformer in LLM)
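
A toy sketch of that second meaning, in plain Python. The class names (`MeanCenterer`, `SignClassifier`, `ToyPipeline`) are invented for illustration and are not a real library API; scikit-learn's `Pipeline` is the actual tool that does this chaining.

```python
class MeanCenterer:
    """Toy transformer: learns the mean in fit, subtracts it in transform."""

    def fit(self, xs):
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean_ for x in xs]


class SignClassifier:
    """Toy model: predicts 1 for positive inputs, 0 otherwise."""

    def fit(self, xs, ys):
        return self  # nothing to learn in this toy model

    def predict(self, xs):
        return [1 if x > 0 else 0 for x in xs]


class ToyPipeline:
    """Chains transformers and a final model behind one fit/predict interface."""

    def __init__(self, transformers, model):
        self.transformers = transformers
        self.model = model

    def fit(self, xs, ys):
        for t in self.transformers:
            xs = t.fit(xs).transform(xs)
        self.model.fit(xs, ys)
        return self

    def predict(self, xs):
        for t in self.transformers:
            xs = t.transform(xs)  # reuse what was learned during fit
        return self.model.predict(xs)


pipe = ToyPipeline([MeanCenterer()], SignClassifier())
pipe.fit([1.0, 2.0, 3.0, 6.0], [0, 0, 0, 1])
predictions = pipe.predict([2.0, 5.0])
```

Bundling the steps this way means new data always goes through the same transformations, with the statistics learned on the training data, before reaching the model.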

peak thorn
#

can anyone please explain to me what weights and bias are? i m currently learning Gradient Descent and i m unable to understand these two terms, it's confusing
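
One way to see the two terms, sketched for the simplest case of a single-input linear model: the weight `w` scales the input and the bias `b` shifts the output, and gradient descent nudges both to reduce the mean squared error. The data, learning rate, and step count below are arbitrary choices for illustration.

```python
def predict(x, w, b):
    """A linear model: the weight scales the input, the bias shifts the output."""
    return w * x + b


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit w and b by repeatedly stepping against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(x, w, b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
```

Here the fitted `w` approaches 2 (the slope) and `b` approaches 1 (the value of y when x is 0). In a neural network every neuron has weights and a bias playing these same roles.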

lusty lotus
#

hey everyone, im building a TD bootstrapped Q-learning (tabular) agent in cartpole-v1.

I would like to have some feedback. more specifically, i think the code for the update step + saving + lookup is wrong for table.py. i would appreciate it if anyone can have a look at my code. here's the discounted returns gamma = 0.99 for each trajectory

i have tried to check my formula against the q-value td update formula but it seems to be right. i suspect it is some careless mistake/not updating something properly

here is my repo: https://github.com/andreaslam/RLExperiments
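
For comparison, a minimal sketch of a tabular Q-learning update with lookup. It is not taken from the linked repo; the learning rate, the tuple-based state encoding, and the function names are assumptions. Two things worth checking against it: states used as table keys need to be hashable (for example tuples of discretized bins), and the target should not bootstrap from a terminal state.

```python
from collections import defaultdict

GAMMA = 0.99   # discount factor, as in the message above
ALPHA = 0.1    # learning rate (assumed value)
N_ACTIONS = 2  # CartPole-v1 has two actions: push left, push right


def make_table():
    """Q-table mapping a discretized state to one value per action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def td_update(q, state, action, reward, next_state, terminated):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    # No bootstrapping from a terminal state: its future value is zero.
    bootstrap = 0.0 if terminated else max(q[next_state])
    target = reward + GAMMA * bootstrap
    td_error = target - q[state][action]
    q[state][action] += ALPHA * td_error
    return td_error


q = make_table()
s0, s1 = (0, 0, 1, 0), (0, 1, 1, 0)  # hypothetical discretized observations
q[s1][1] = 2.0
err = td_update(q, s0, 0, 1.0, s1, terminated=False)
terminal_err = td_update(q, s0, 1, 1.0, s1, terminated=True)
```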

lapis sequoia
#

Hii i am new, is this the place to learn ai/ml with python

lapis sequoia
#

Nice

#

I will be asking some doubts as i get them

#

Is there anyone pursuing ai/ml?

fringe oxide
#

hey all. I'm trying to build an image upscaler. are there any pretrained models that are simple to set up? i tried to set up real-esrgan but i'm having trouble getting it going. i'm new to this so, a bit of sample code along with the suggested model would also definitely help.

agile cobalt
#

there are some domain-specific ones, not sure about generic ones though

fringe oxide
#

oh.. i'll give it a try. thanks

umbral blaze
umbral blaze
fringe oxide
# umbral blaze I have to ask you - how do you even start using the models on websites like Hugg...

this stuff?

import torch
from diffusers import StableDiffusionPipeline

training_model = "hakurei/waifu-diffusion"

# Load the pre-trained stable diffusion model fine-tuned for anime
pipe = StableDiffusionPipeline.from_pretrained(
    training_model,
    # revision="fp16", // might fail on gtx cards, confirm.
    torch_dtype=torch.float16,
    safety_checker = None,
    requires_safety_checker = False
)
pipe = pipe.to("cuda")

# Generate an anime picture
prompt = "a catgirl chasing a ball"
height, width = 512, 512  # assumed values; not defined in the original snippet
images = pipe(prompt, height=height, width=width, guidance_scale=7.5).images
umbral blaze
#

Or did you code all that?

fringe oxide
#

i didn't make the model. it's a pretrained model

fringe oxide
#

i was just writing code to use the model

umbral blaze
#

So all you did was change the prompt?

fringe oxide
#

yes. what i'm currently working on is to find an upscaler so i can bring images to a consistent size to train a model.

umbral blaze
fringe oxide
#

just saying, i'm not the best person to ask. i started ml day before yesterday. lol

#

i looked around for code. put it together, asked cgpt to help a few times.

umbral blaze
umbral blaze
fringe oxide
umbral blaze
agile cobalt
agile cobalt
serene grail
umbral blaze
fringe oxide
agile cobalt
fringe oxide
#

in my case, i write code in .py files, but use jupyter notebooks as the consumer/tester.

lapis sequoia
umbral blaze
agile cobalt
#

you may need to get familiar with pytorch to train your own models, but if you're just consuming them to perform high level operations, the libraries provided by hugging face work wonders

fringe oxide
#

yeah right now i'm trying to set up an upscaler pipeline to feed the model i'm trying to train.

agile cobalt
fringe oxide
#

yeah there are too many anime ones. i am working on general purpose images though, currently.

umbral blaze
#

By the way, @agile cobalt , would you say the models on Hugging Face, if fine tuned, could be important for the community? Like maybe models that haven't been made before that can detect some type of disease or something? Or is the community too big to achieve something like that?

fringe oxide
#

it's all about how you train your model.

agile cobalt
lapis sequoia
#

whats the best hyperparameter optimization tool

stark lark
#

Cross validation brainmon

spare forum
#

Rarely gridsearch with really small dataset

umbral blaze
#

@agile cobalt I found a model which detects planes in images, how would I use it? I have no idea what to do next.

agile cobalt
#

look up a YouTube video on setting up YOLO

spare forum
#

But you should use an optimisation algorithm together with cross validation. Hyperopt works well, Optuna exists too

spare forum
#

Answering previous message.

spare forum
uneven jewel
#

hey

lapis sequoia
#

and it worked but it didnt have bayesian optimization but i was wrong when i said that because it has integrations with a whole bunch of libraries that have it

#

like botorch

uneven jewel
#

someone help me with AI&DS

#

I'm beginner

#

someone tell me what should I learn

#

all Ik is some basics about python and some concepts of ML algorithms

#

😭

lapis sequoia
#

try doing a kaggle competition

uneven jewel
lapis sequoia
uneven jewel
cosmic lynx
#

so assuming I wanted to get into how the back end of the main ai tools like Keras and Tensorflow work, do I have to learn C?

lapis sequoia
#

which theme is this

#

its from vscode release notes

agile cobalt
cinder elk
#

settling at this coz today's the submission

lapis sequoia
lapis sequoia
# cosmic lynx so assuming I wanted to get into how the back end of the main ai tools like Kera...

keras itself it's all python, tensorflow i think it's c++, see this post https://stackoverflow.com/questions/35677724/tensorflow-why-was-python-the-chosen-language

cosmic lynx
#

isn't C++ still considered low level?

lapis sequoia
#

you may need to learn cuda if you want it difficult

#

it's not high level, i think bc of memory management, but idk much. PS: according to wikipedia it's high level.

cosmic lynx
#

okay, sounds like I need to understand C++ to get into the backend side of AI dev
thanks

small wedge
#

According to Wikipedia high level is anything above ASM but that's not how the term is used in the majority of CS discourse shrug colloquially it's more of a gradient than a binary designation.

#

C++ is low level from the perspective of all languages from ASM to python and beyond, so most people call it a low level lang

half vale
#

Anyone here setup a Mario gym env, RL model??

mystic arrow
#

Hey i wanna get into Machinelearning / Ai . but i have just finished some beginner courses of python what should i do next?

mystic arrow
#

make my own applications

umbral blaze
#

Object detection, etc.

mystic arrow
#

havent really decided

umbral blaze
mystic arrow
#

just doing the math for now but havent taken any course of machine learning

lapis sequoia
#

there is ml5.js i think as well, im always thinking of trying

#

may be useful to start, but it's js

#
mystic arrow
#

i maybe want to create smth by my own

spare forum
#

Also you can use a genetic algorithm but I don't know if it's that good for classic ML hyperparameter selection, it sure has this but idk if it's worth it, but good to mention it tho @lapis sequoia

versed pilot
lapis sequoia
#

agreeing with u ig, makes sense.

lapis sequoia
#

The NLP newsletter covers the latest trending natural language processing (NLP) and machine learning (ML) news, projects, resources, and research papers. Click to read NLP Newsletter, by elvis, a Substack publication with tens of thousands of subscribers.

#

in case someone is interested on reading their selection of papers

#

i've just found it

#

what is the global loss function?

#

why global ?

#

sure

lapis sequoia
lapis sequoia
#

you are right, i didn't have time to check, thanks

#

i 'm looking for some aggregator / newsletter apart from scholar..

lapis sequoia
lapis sequoia
dry field
unkempt apex
lapis sequoia
#

:-) :-)

thorn flame
#

Folks! How do I find datasets for my recommendation system? :)

#

Is synthetic dataset also valid?

#

It's for an ecommerce store

serene grail
#

Kaggle has some datasets, not sure if it has exactly what you need though

thorn flame
#

I got something.

#

Just not exactly what I want. I realise I need a custom dataset

#

Also, how much dataset is enough here? (e.g. how many rows of csv on average?)

jaunty helm
thorn flame
#

I need to recommend products to users based on browser activity, prefs and purchase history

#

Where would I get all that data to scrape for instance?

thorn flame
#

Mind you, this is just a personal project.

#

And I'm an ML noob.

jaunty helm
thorn flame
#

so that's why I asked

jaunty helm
thorn flame
#

And what about synthetic generation?

jaunty helm
#

e.g. where do you shop online? there's probably other products there

jaunty helm
# thorn flame And I'm an ML noob.

this project doesn't really sound beginner friendly
if you're just getting started, joining one of the kaggle competitions is probably easier to start

jaunty helm
thorn flame
#

Just wondering if that would be enough data.

#

Since I can only scrape my data

thorn flame
#

AI-first

#

I'm already very close to the last stage

#

So I thought I could try to build one

#

My biggest problem currently is getting the data

#

I understand I could use sentence-transformers and KNN model to get recommendations

thorn flame
lapis sequoia
#

nice

stark lark
#

I guess that's a great lesson: start with understanding the problem and the data at hand, then build the solution. Never the other way around 🙂

lapis sequoia
#

i made a grid search and its better than optuna because you dont have to worry about step sizes and doing appropriate number of iterations because it does this 1 0.5 1.5 0.25 0.75 1.25 1.75 0.125 0.375 ...
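
A guess at how that ordering can be generated, as a sketch rather than the author's actual code: visit the midpoint of the interval, then the quarter points, then the eighths, and so on. The interval (0, 2) is inferred from the listed values, and the name `bisection_grid` is made up.

```python
from itertools import islice


def bisection_grid(lo, hi):
    """Yield points of (lo, hi) coarse-to-fine: the midpoint, then quarters, then eighths, ..."""
    level = 1
    while True:
        pieces = 2 ** level
        # Odd multiples only: even ones were already produced by a coarser level.
        for j in range(1, pieces, 2):
            yield lo + (hi - lo) * j / pieces
        level += 1


first_nine = list(islice(bisection_grid(0.0, 2.0), 9))
```

Because every prefix of the sequence covers the interval roughly evenly, the search can be stopped after any number of evaluations without committing to a step size up front.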

#

and the reason is because I was working with SPSA and its insane how parameter sensitive that is

#

this is how SPSA LR to magnitude ratio affects loss for a coregistration task (where I am trying to find affine matrix values that make one image rotate to match another one on 1000 iterations)

#

you have to be so precise about magnitude

#

this was with random search because I hadn't made the grid search by that time

hollow night
#

Hi everyone!

I am taking the Machine Learning Specialization course by Stanford (instructor: Andrew Ng) from Coursera for about two weeks. The first two weeks (maybe the whole course) are completely theoretical. There are some optional labs, but I cannot take them since I did not purchase the whole course; I audit it. I am finding the course very difficult. The concepts like linear regression, supervised learning, and unsupervised learning are really complex. I couldn't understand much.

Can you please suggest to me a more practical Python machine learning course (free) that I can continue with this course?
I am a beginner in Python, 18 y/o and completed high school this year.
Thanks!

fallow tree
#

Hello

#

i wana ask a question

#

well i have a well organized machine learning model code

#

and i wanna execute it but obviously i cant locally

#

so i need to execute it on cloud , where do u suggest ? because im trying to execute it in googlecolab but its kinda bad

serene grail
#

What do you mean by "it's kinda bad"? Can you be more specific?

fallow tree
#

the requirements don't match with what googlecolab can offer ! i get a lot of errors in my code due to dependencies or versions

river cape
#

Hi guys now if we are creating a virtual env for a system , it would be conda create -n env python=3.8
This would be available in the whole system right?
Now what if I want a virtual env in a directory, how do I do that?
I have used the --prefix method , but each time I want to activate it wants me to put the path
Is there any way to replace that path with the env name?

unkempt apex
novel mango
#

see this inconsistency in the values at the color bar, or it going so long for no reason, I want it to stay THE SAME regardless of what data is in the picture

tried this

```
vmin = 0
vmax = 100
my_cmap = mcolors.ListedColormap(['red', 'yellow', 'green', 'orange'])
bounds = [0, 25, 50, 75, 100]
norm = mcolors.BoundaryNorm(bounds, my_cmap.N)
```
and other similar but none will work
magic dune
lapis sequoia
mild salmon
#

Hello, does anyone have tips regarding how to determine the optimum features for k-means clustering to build regression model?

proper crag
#

is it good practice to use OOP in ML code?

spare forum
spare forum
mild dirge
wooden sail
#

you can also do it without OOP using a different framework like jax

mild salmon
spare forum
#

Then it's not "for regression models" BTW elbow method

proper crag
#

lol

#

oh yh, i was asking this bcuz i kinda stalked you ...and you said you made some OOP
lol

cinder elk
#

anyone who has used cvat before??

I don't know why some of the annotations from the previous images are still showing even after I change the frame?

#

would it mean that all my images would have this again or something?

#

I'm not sure really

serene scaffold
mild salmon
# spare forum Then it's not "for regression models" BTW elbow method

yeah sorry, english is not my first language, i should've phrased it better. the elbow method is to determine the optimum n clusters right ? what i want to know is how to find the optimum features that i use to make those clusters. Is there any technical method that i can use, or is the only way to go back to the fundamental principles related to my prediction target ?

ocean venture
#

Hello

#

Anyone there ?

serene grail
ocean venture
#

Ah ok so I'm trying to make a dataset to train with yolov8n but I'm having issues with annotating. firstly it's taking too long and I'm wondering if there is like an auto annotation method ?

lapis sequoia
#

You may find one annotated dataset in Roboflow.

#

There is annotation software as well. Another option could be finding a NN that annotates for you at least to a first approximation.

lapis sequoia
#

There are paid annotators as well, but I never used it, I don't think it's ethical, since most are very underpaid (but you could pay well.)

ocean venture
lapis sequoia
#

yeah it's not for everyone

hollow night
serene scaffold
abstract pond
#

where can i find info about tesseract training except github?

magic dune
#

@hollow night

#

but ya understanding theory is a big part of it imo.

serene scaffold
#

I wasn't especially into the theory at first, but then I realized it was necessary if I was serious about this, so I decided to get theory pilled.

magic dune
wooden sail
# hollow night Hi everyone! I am taking the Machine Learning Specialization course by Stanford...

you can try and check out this book, which zestar always recommends https://www.statlearning.com/. however, especially the concepts you mentioned are ones that you'll need to understand well at a theoretical level if you want to work long-term in ML. the math can get arbitrarily hard, which you shouldn't take as a deterrent because you can get very far with only an intuitive understanding. but the concepts you listed are the foundation of everything that comes next, so those are ones that you really should understand very well, regardless of whether you want to pursue ML only as a practitioner or at a more involved level.

ideally, if you manage to code examples of the math you're learning without being guided through it, that would mean you've got it super well

fallow coyote
serene scaffold
fallow coyote
# serene scaffold What education are you getting formally?

That is kinda complicated 😂. I'll try to be brief. I've done three years (foundation and 2 1st years because I transferred to another uni) at uni (biomedical science) but dropped out. At high school (or in the UK we call sixth form if you're doing A levels) I did biology, chemistry and maths with grades DDB respectively. Don't let my grades fool you; I'm far more intelligent than what they show but i just fell under the pressure of wanting to get the highest grades possible. My mathematical ability has always been my strongest skill and I am absolutely capable of learning complicated concepts

serene scaffold
ocean venture
#

just today

#

like idk if im configuring it right

fallow coyote
ocean venture
#

the thing im testing right now since i got my coral tpu im trying object detection in real time on games i picked csgo for an example

#

keeps detecting random stuff

#

as of my gun and random objects

fallow coyote
serene scaffold
ocean venture
#

i mean the only thing right now im trying to do is annotate but it will take too long so im trying the auto feature

#

can u come dms i want to send it there pls

serene scaffold
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

ocean venture
#

like what do you want me to send

#

as i said i just got my coral tpu and im interested in ai object detection especially in games at real time so i just picked a random game and right now when im trying to annotate it im having a hard time

#

yes i tried but most of them run on yolov5

#

and i need it on yolov8

#

oh shit

#

how you find that much on roboflow

#

i didnt find shit

#

ah

#

i typed it in a weird way idk why

#

lol

fallow coyote
ocean venture
#

btw bro can i use these pretrained models to auto annotate a dataset ?

#

cuz i want a big dataset to get a good model

#

WHAAAT

#

how my brain just stopped working

#

so your telling me its better to manually annotate ?

#

buut ?

#

sorry i dont get what you exactly mean

#

?

ocean venture
#

@lapis sequoia you still there ?

serene scaffold
#

@surreal frost you can't seek employment relationships on this server; your message has been removed.

hollow night
spare forum
#

Using js

simple tapir
proper crag
#

@final kiln what do you think? i think imma just learn to leverage coding skill just by straight up making an ML model

ocean pawn
#

I've implemented regression with pure numpy (jax), how much harder would implementing a (simple) neural network with numpy be? Is it worth the attempt? Thanks

#

Or should I fallback to Keras

serene scaffold
spare forum
north drift
#

How important is the math behind DS / ML when it comes to a job?

#

I'm just curious, currently taking a course on the pre-requisite math behind it.

#

This question is pretty heavily asked on reddit and they all seem to agree upon having a deep mathematical understanding

#

I just don't see how that's relevant when performing on the job?

#

Do I realistically need to use topology or linalg that much?

#

(just examples)

#

I don't plan on skipping it, of course. But, is it really necessary that I have to fully understand everything?

serene scaffold
indigo wing
# north drift This question is pretty heavily asked on reddit and they all seem to agree upon ...

They all probably are in the market or learning. The market is very saturated right now. I recommend just learning the tips and tricks. Obviously the basics are most important here, to understand what the code you write does on a mathematical level, since that will determine the correctness of the code. The most relevant thing is how to do your task. If someone wants an edge detection tool given some parameters you have never worked with, you should be ready to implement that in a language other than Python as well. The priority order is basic understanding > implementation > deep understanding; deep understanding will just take too much time and effort and is not worth it for most tasks.

serene scaffold
#

I've never been asked to implement any ML thing in not-Python.

left tartan
left tartan
north drift
#

Alright, correct me if I'm wrong but,

An understanding of the underlying math (not highly extensive, I suppose?) of all the blackboxed ML / DS algorithms allows me to make informed (by informed, I assume you mean - understand the data and its correlations which would later serve as information to choose the appropriate algorithm or even predict its behaviour based on given input) decisions. So the math is not used for any major process, but rather it streamlines the decision making involved for selecting algorithms / models i.e. speeding up development process?

#

As in - data driven decision making? Use / Learn the underlying math, not for theory but for a robust / highly accurate / faster decision making?

#

Sorry about the long text, I'm just incredibly confounded.

#

I am under the assumption that the math serves as just the theoretical foundation and nothing else. Correct me here also please!

#

Would love to hear your angles on the above

buoyant vine
#

I guess depends on what you are trying to do, if this is for something like work where you're not developing research related models or other things. I would argue the actual math side is not as useful as just knowing roughly what the impact of one algorithm or system has on the system.

Not sure if I put that right, but i.e. Being able to say "Yeah this linear layer is to try and help with X behaviour" or going "Yeah this transformer is the right structure for doing this NLP text processing task" etc...

north drift
# spare forum Depends where

Suppose in Business, Analytics.

Then in R&D for new products. Not fundamental research, but that found in R&D and incubators.

#

I'd like for you to answer from both perspectives please!

spare forum
#

Business will be the lightest; for R&D you need optimization, stats, etc.

spare forum
#

DS is large, you can potentially do 0 DL, you have tabular data, time series; I think the most common factor would be stats and algebra

#

For R&D in time series you basically do applied maths

uneven jewel
#

hey guys

inland mulch
#

oh so this is the aids channel (unfunny)

uneven jewel
#

guys is Nvidia ai course worth it?

#

they teaching about generative AI

#

does generative AI have good scope in jobs ?

#

helwp 😭 💔

serene scaffold
# uneven jewel does generative AI has good scopes in jobs ?

I am a computational linguist, and most of my work has been about generative AI ever since ChatGPT was released. But jobs like this require a lot of education, and it's hard to say what the landscape will look like by the time you can get the requisite degree(s).

If you don't already have a relevant degree, the nvidia course won't help you.

#

nvidia is actually doing some workshops at my office next week.

uneven jewel
serene scaffold
uneven jewel
serene scaffold
uneven jewel
#

I just entered my 3rd year

serene scaffold
#

and what is the degree? computer science?

uneven jewel
#

They call it btech in here

serene scaffold
#

what kind of AI are you interested in?

uneven jewel
#

And I don't have any idea about what I should study

#

That's why I came here to ask for help

#

💔

#

@serene scaffold you there buddy?

serene scaffold
#

sorry, I'm in a meeting

uneven jewel
#

I'll wait

left tartan
left tartan
#

But as a beginner, you should build a broad foundation and learn a little about a lot, not a lot about a little.

uneven jewel
#

You got any ideas which skill is in demand now and will be in future

left tartan
uneven jewel
left tartan
#

Then do a range of ML projects, don't worry about your 'job': get familiar with the terminology and concepts

uneven jewel
#

And I know some basics about ML algorithms and Ik how they work

left tartan
#

I'm more a data engineer, so my bias is towards being a competent software engineer.

uneven jewel
left tartan
#

Don't know that one, but yah, Kaggle projects are great

uneven jewel
left tartan
#

Also, check the pins. Zestars book list is great

uneven jewel
left tartan
#

There's several. Stats and ML

uneven jewel
iron basalt
# uneven jewel You got any ideas which skill is in demand now and will be in future

This is not financial advice. While the "AI" bubble may pop by the time you graduate, there will likely be some new trend and it will probably involve software somehow, software (and programming) is not going anywhere. Only go into software and especially AI if you are really into it, or you will get burnt out. AI is also not going anywhere, but it will probably not just have money blindly thrown at it every time it's mentioned as right now. In general, it's not a great idea to chase the current trend, since it will take you time to be ready, instead just getting good at the fundamentals is probably a better idea (keep expanding this set of things you know, for example, don't know just Python, or just LLMs, etc). Once you see the new trend, you can take advantage of it (or choose to not get involved in the trends at all and go with something more stable).

meager ridge
iron basalt
uneven jewel
#

So first I'll improve my ML and python skills

iron basalt
uneven jewel
#

Honestly, I just love working with software and computers

iron basalt
uneven jewel
#

But idk shit about physics,chem and bio

iron basalt
# uneven jewel But idk shit about physics,chem and bio

IMO (this is not financial advice!), the most well positioned are those that have a combination of math, one of the others just listed, and on top of that can program because everything involves computers now. However, in terms of pure wealth generation, these are undervalued given the skill requirements and you can make more in places like finance with half the skill set (or less) (but you will probably be miserable from it if you want to actually make things, and not just generate wealth).

uneven jewel
left tartan
uneven jewel
#

Anyway how Old are y'all?

iron basalt
left tartan
#

I've worked in several different fields in my career, seemingly unrelated but there's always some leverage of my past.

#

** All SWE related

left tartan
north drift
#

this is an answer I got from my friend as well, any knowledge you learn will definitely be useful, especially mathematics.
It's pretty hard to learn the math to be completely honest.
I am not enjoying learning Group Theory, Topology, Analytical Geometry.
While I did cover these mathematics in my engineering courses, I wouldn't necessarily describe them as something I saw useful. I struggle a lot, even today to solve problems specific to these parts of mathematics alone.
for example: prove a group is abelian or, we define a congruence class like so... etc.

I just couldn't bring myself to find this math relevant.

#

I guess what I am trying to say is, from what I have gathered so far, I will approach the Math Breadth-First, rather than Depth-First.
Learn that which is required until you encounter that which needs to be learned, in which case, learn that also.

#

bro is convincing me to learn the hard stuff

#

😭

#

In my experience, Group Theory has just been full of theorems, axioms and proofs

#

and, I just can't handle too much theory.

#

You're one of the few people who enjoy the intricacies of mathematics, we need more people like you

#

I, as an engineer, just focus on information that gets me results, it's kinda simplistic and pragmatic, but oh welp 🤷‍♂️

raw pasture
#

When you come across something like this, especially in a test, what do you think? Any suggestions?

# Testing which columns are returned
assert list(q7_result.columns) == ['customerName', 'customerNumber', 'productName', 'productCode', 'total_ordered']

# Testing how many rows are returned
assert len(q7_result) == 2531

# Testing the values in the first result
assert list(q7_result.iloc[0]) == ['Petit Auto', 314, '1913 Ford Model T Speedster', 'S18_2949', 10]

north drift
#

If I didn't end up as an engineer, I would have chosen physics. It's the only science whose ideas I could comprehend.

raw pasture
#

any suggestions? I came across this in my Codility test

north drift
#

Seriously, mathematicians, esp. those who are domain specific in Linear Algebra are just beasts. I don't know how they think

#

there's nothing physical about what they are doing, it's so abstract (which is the point, I get it) and hellish

#

took me an entire semester of Math to understand what a linear transformation is (thanks to 3Blue1Brown, I survived my finals)

#

Until the undergrad level, speaking from personal experience, physics as a subject is something that I have always operated with intuition. It just clicks. But I get what you're saying. We're at the point in science where there is a stark lack / misalignment of modern physics ideas and human intuition.
If I had to guess, 100 years back, Quantum Physics was looked at the same way back then just as it was emerging, as we are looking at our new age physics now.

#

I am an engineer so take my words with a grain of salt please 🙏

#

Likewise. Software has essentially wiped out any interest I had in Physics from earlier 🤷‍♂️

#

It was good speaking with you

spare forum
#

Literally the classic path into DS, at least in my country, is to have an applied math Master's degree, or to go to an engineering school with strong stats and a DS track

shut shoal
#

I'm a little confused with the BERT model. In the BERT model, does pre-training and fine-tuning happen at a specific point in an encoder?

serene scaffold
topaz cobalt
#

So, forgive me if this is too oblique or vague, but I've been studying Python a bit here, and I'm extremely interested in applying that knowledge to using PyTorch and building custom LLM chatbots. Unfortunately, there is so much information readily available that I'm not entirely sure where the best place to start is. I know "the beginning" is usually a great go-to, but what would the "beginning" look like?

topaz cobalt
#

What's NLP?

serene grail
dry field
#

so the way to learn chatbots from the ground up is to start with NLP

#

or else you can just learn how to gather data and finetune an LLM

topaz cobalt
#

Eventually, building my own model is something I wouldn't mind attempting, but I feel like there's a very steep learning curve in front of me to do that, and then I would still have to learn how to integrate it into an app

paper apex
#

would you want it to train on data you already have then or do you just want to make a generic chat bot people can communicate with?

topaz cobalt
#

Honestly, for my first project I wanted to use a transformer to build an NPC generator for TTRPGs, so, I imagine just interacting with a transformer, but indirectly

#

Oh, Hi nallo

paper apex
#

with something like this you wouldn't even need pytorch, you could use an LLM and give it an existing prompt with context i.e. 'you're an angry dwarf blacksmith' and it will do the rest. you could create context by letting the LLM summarise previous chat responses and concatenating the current request to it

topaz cobalt
#

I mean, that would work I'm quite sure, but I think I'd need PyTorch to interact with the LLM at all anyway?? Even if I don't, part of the reason I wanna do it this way is to learn. Building the app I'm thinking of without PyTorch isn't going to help me learn how to use it, does that make sense?

#

What's that?

#

LangChain is a python framework?

#

I don't know what AWS is either, I'm very new, I just have a habit of jumping in the deep end.

#

Oh, ok

#

And LangChain is the library for building interactions with LLMs, such as the HFTs?

crisp hearth
#

Hi

lapis sequoia
#

Do you guys agree?

topaz cobalt
#

Yeah, I'm probably gonna use HFT, I'm broke and can't afford AWS

#

Hugging Face Transformers

crisp hearth
#

@final kiln hi

topaz cobalt
#

They're open source models

#

the amazon ones are?

#

Fair enough, I have another discord server I'm on that's dedicated to this type of thing, so I'll ask around there too, thanks for the help!

fiery stump
#

currently making an AI that uses a bunch of random numbers as input

#

it's supposed to try and mimic human randomness

topaz cobalt
#

Isn't randomness being proven to be only theoretical? Most of the things I've read indicate that true randomness just doesn't scientifically exist.

fiery stump
#

not for mimicking randomness per se, but trying to get close

#

like how 7 is the most common number you will get if you ask a human for a random number from 1 to 10

#

i need some data in the form of random numbers, but i cant computer-generate them (that would defeat the whole point) nor roll a dice

dry raft
#

hey guys how much linear algebra is needed for being so good at ml that you can fine-tune your own ml models

left tartan
#

Perhaps a useful exercise is to look at undergrad and graduate programs, and the math courses involved.

lapis sequoia
regal bronze
#

Heya guys a newbie here! Why am I getting a weird result?

My main.py

import numpy as np
import warnings
import os
import pandas
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
warnings.filterwarnings("ignore")
from keras import layers # noqa
import keras # noqa


def load_dataset(path: str) -> tuple[np.ndarray, np.ndarray]:
    filedata = pandas.read_csv(path)
    hours_worked_X = (filedata['hours_studied']
                        .values
                        .reshape((-1, 1))
                        )

    test_score_y = filedata['test_score'].values
    
    return (hours_worked_X,test_score_y)



def main(data: tuple[np.ndarray, np.ndarray]):
    model = keras.Sequential([
        layers.Dense(1, input_dim=1),
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(data[0], data[1], epochs=1000)


    score_to_check = 1
    score = model.predict(np.array([[score_to_check]]))[0][0]
    print(f"hours: {score_to_check} score: {score}")
    
            

if __name__ == "__main__":
    data = load_dataset("./dataset.csv")
    
    main(data)

my dataset is just

hours_studied,test_score
1,50
2,55
3,65
4,70
5,75
#

As result this is what I get (when input is 1)

Epoch 1000/1000
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - loss: 4222.2148
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step
hours: 1 score: 0.272627055644989
lapis sequoia
#

try dividing the test_score series by its maximum value @regal bronze

#

then train, then predict, then multiply the result by the maximum value obtained before.

regal bronze
#

Oh, so I train each row separately?

#

I think I misunderstood, otherwise it'll take a long time to do the training process 😆

lapis sequoia
#

i'll just sketch the idea:

max_score = test_series.max()
normed_test_series = test_series.divide(max_score)
#

but you may read what normalisation is, since it'll be useful. basically, it's bringing your data to a more suitable range
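
A rough sketch of the normalise, train, predict, un-normalise loop described above. To keep it runnable without Keras, a single weight and bias trained by plain gradient descent stands in for the `Dense(1)` layer; the learning rate and step count are arbitrary assumptions, not values from the original script. The likely reason the original run stalls is that the optimizer takes small steps while the raw targets (50 to 75) need a large bias; scaling the targets to at most 1 puts the solution within easy reach.

```python
# Toy data from the question above.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [50.0, 55.0, 65.0, 70.0, 75.0]

# Normalise the target by its maximum so every value lies in (0, 1].
max_score = max(scores)
normed_scores = [s / max_score for s in scores]

# One weight and one bias, trained with gradient descent on mean squared error.
w, b = 0.0, 0.0
learning_rate = 0.05  # assumption: chosen by hand for this toy data
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in zip(hours, normed_scores):
        error = (w * x + b) - y
        grad_w += 2 * error * x / len(hours)
        grad_b += 2 * error / len(hours)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


def predict_score(hours_studied):
    # Multiply by max_score to undo the normalisation.
    return (w * hours_studied + b) * max_score


print(f"hours: 1 score: {predict_score(1.0):.2f}")
```

The same idea carries over to the Keras version: divide `test_score_y` by its maximum before `model.fit`, then multiply the output of `model.predict` by that maximum.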

shut shoal
#

I get it now. I just had a misconception earlier.

regal bronze
#

Oh, and another question: are there any good tutorials to start off with AI development?

#

I've tried a lot, but they all use difficult terms that I've never heard of (as a 14 year old I've never heard of terms such as slope)

lapis sequoia
#

it's difficult to calculate, i'd guess some other teenager here could actually have better advice

regal bronze
#

At least thanks for trying to answer

lapis sequoia
#

but i think it may be quite hard

#

another option is ml5.js, but that's javascript not python (i.e a different programming language)

#

last option i can think of, is asking chatgpt/claude; they are actually quite reasonable at that.

lapis sequoia
lapis sequoia
#

yes, for some things sure, and also to suggest you where to start

regal bronze
#

Sounds like great advice that I could follow

#

And the last question for today is what maths is actually required in AI development?

shut shoal
serene scaffold
iron basalt
#

Some are lucky and have that context taught to them (or they find it themselves with a lot of effort).

#

Physics has a similar problem, although because it's physical you can at least use intuition of the real world for a while (until you get to stuff humans don't normally interact with (well they do, but can't notice it) or can't because it's too big, too small, etc).

iron basalt
gritty vessel
#

Hey

#

I created a project that generates music and video

#

Now I want to deploy it so that everyone can use it how can I do it?

serene scaffold
proper crag
#

so, I decided to make an ML project with a business-oriented dataset, because it's much easier to visualize the goal.
my plan:
use pandas to read the dataset and matplotlib to plot it for some EDA before I get into feature engineering, then use LogisticRegression from sklearn, because the goal is to classify which factor could be the main influence on store sales

#

here is my progress

import pandas as pd
import matplotlib.pyplot as plt
#import numpy as np
#from sklearn.model_selection import train_test_split
#from sklearn.linear_model import LogisticRegression
#from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

data = pd.read_csv('/Users/hatsunahana/Documents/New Folder With Items/programming folder/StoresPrep.csv')
data_quantile_1 = data['Revenue'].quantile(0.75) #31050500.0
data_quantile2 = data['Revenue'].quantile(0.25)   #9021375
data_min = data['Revenue'].min()   #2336000
data_max = data['Revenue'].max() #100083000
proper crag
#

here is my code to plot

```sorted_data = data.sort_values(by=['Revenue'], ascending=False)
plt.figure(figsize=(12, 8))
plt.bar(sorted_data['Store_Number'], sorted_data['Revenue'], color='skyblue')  
plt.xticks(rotation=90)  
plt.title('Stores Sorted by Revenue (Highest to Lowest)')
plt.xlabel('Store_Number')
plt.ylabel('Revenue')
plt.show()```
#

why does it plot this way even with sorted_data = data.sort_values(by=['Revenue'], ascending=False) ?

#

why does it plot in a seemingly random order regardless of sorted_data = data.sort_values(by=['Revenue'], ascending=False) ?

small wedge
#

you're plotting along 2 axes, so it stands to reason sorting the data would do nothing to your graph

#

the x axis is determined by the store number not the order of the revenue objects

#

if you wanna see a sorted graph, then plot along the single revenue axis and make the store number a label, or drop store number altogether in your plot

proper crag
#

how can I do it?

#

although ignore the graph's title

proper crag
# small wedge what are you expecting the graph to look like?

```plt.figure(figsize=(12, 8))
plt.bar(sorted_data['Type'], sorted_data['Revenue'], color='skyblue')
plt.xticks(rotation=90)
plt.title('Stores Sorted by Revenue (Lowest to Highest)')
plt.xlabel('Store_Type')
plt.ylabel('Revenue')
plt.show() ```
this code gives me 3 big bars: super, hype and extra
but when I try
```sorted_data = data.sort_values(by=['Revenue'], ascending=True)
plt.figure(figsize=(12, 8))
plt.bar(sorted_data['Type'], sorted_data['Revenue'], color='skyblue')
plt.xticks(rotation=90)
plt.title('Stores Sorted by Revenue (Highest to Lowest)')
plt.xlabel('Store_Type')
plt.ylabel('Revenue')
#plt.show()
#print(data)

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
print(sorted_data)```
the max is extra, but there are a few hype stores sandwiched in between, which means
```sorted_data = data.sort_values(by=['Revenue'], ascending=True)
plt.figure(figsize=(12, 8))
plt.bar(sorted_data['Type'], sorted_data['Revenue'], color='skyblue')
plt.xticks(rotation=90)
plt.title('Stores Sorted by Revenue (Lowest to Highest)')
plt.xlabel('Store_Type')
plt.ylabel('Revenue')
plt.show()```
isn't the right approach
#

a few Extra and Hype stores are sandwiched between each other
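
A possible explanation for the three big bars: `plt.bar` draws one bar per row, and rows that share the same `Type` land on the same x position, so they are painted on top of each other no matter how the rows are sorted. Aggregating per type first and sorting the totals avoids that. The sketch below uses made-up rows (not the real StoresPrep.csv) and plain Python, so the numbers are purely illustrative.

```python
from collections import defaultdict

# Hypothetical rows standing in for the real dataset: (Type, Revenue).
stores = [
    ("Extra", 100), ("Hype", 40), ("Super", 10),
    ("Hype", 90), ("Extra", 50), ("Super", 20),
]

# Sum the revenue of all stores that share a type.
revenue_by_type = defaultdict(int)
for store_type, revenue in stores:
    revenue_by_type[store_type] += revenue

# Sort the aggregated totals, lowest to highest.
ranked = sorted(revenue_by_type.items(), key=lambda item: item[1])
types = [store_type for store_type, _ in ranked]
totals = [total for _, total in ranked]
print(ranked)
```

With the real DataFrame, `data.groupby('Type')['Revenue'].sum().sort_values()` should give the same kind of result, and `plt.bar(types, totals)` would then draw one bar per type in sorted order.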

small wedge
#

should be something like

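import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'revenue': [100, 32, 432, 3, 2, 5, 55], 'id': [1, 2, 3, 4, 5, 6, 7]}).sort_values(by='revenue')

_, ax = plt.subplots()
positions = list(range(len(df)))
ax.bar(positions, df['revenue'])  # bar position follows the sorted row order
ax.set_xticks(positions)  # one tick per bar
ax.set_xticklabels(df['id'])  # label each bar with its id
plt.show()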
#

I'm not great with matplotlib so some of that might be unnecessary shrug

proper crag
regal bronze
craggy agate
#

**I was trying to fine-tune Gemma-2B-IT and ran into this error, I am using LoRA, I have access to a T4 GPU on google colab for training, it's got 15GB of RAM. My dataset isn't very large, it's just a bunch of text messages. How do I fix this problem? I could quantize the LLM to 8bit or 4. **

`OutOfMemoryError Traceback (most recent call last)
<ipython-input-2-c3fada2fe639> in <cell line: 74>()
72
73 # Train the model
---> 74 trainer.train()
75
76 # Test the model

29 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
114
115 def forward(self, input: Tensor) -> Tensor:
--> 116 return F.linear(input, self.weight, self.bias)
117
118 def extra_repr(self) -> str:

OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU
`

#

Code:
```
import json

import torch
import transformers
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

# MODEL_ID, HUGGINGFACE_TOKEN and OUTPUT_DIR are assumed to be defined earlier in the notebook.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_auth_token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    use_auth_token=HUGGINGFACE_TOKEN
)

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM"
)

with open('/content/drive/MyDrive/messages-cleaned.json', 'r') as file:
    messages = json.load(file)

dataset = Dataset.from_dict({"text": messages})

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize_function, batched=True)

data = dataset.train_test_split(test_size=0.1)
train_dataset = data["train"]
val_dataset = data["test"]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=30,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir=OUTPUT_DIR,
        optim="adamw_hf"
    ),
    peft_config=lora_config
)

torch.cuda.empty_cache()

trainer.train()

text = "Wassup?"
prompt = text + "\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=100, eos_token_id=tokenizer.eos_token_id)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)

model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
```
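
A back-of-the-envelope sketch of why quantizing (as suggested in the question) might help. It only estimates the memory taken by the model weights at different precisions; the parameter count is a rough assumption for a "2B" model, and activations, LoRA gradients and optimizer state come on top of these numbers. Since the code above passes no dtype to `from_pretrained`, the weights are probably loaded in float32, which would already use most of a 15 GB card.

```python
PARAM_COUNT = 2.5e9  # assumption: rough parameter count for a "2B" model

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1, "4-bit": 0.5}


def weight_memory_gib(param_count, dtype):
    # Memory for the weights alone, in GiB.
    return param_count * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(PARAM_COUNT, dtype):5.1f} GiB")
```

If that estimate is in the right ballpark, loading the base model in half precision or in 8-bit/4-bit, lowering `max_length`, or reducing `per_device_train_batch_size` are the usual knobs to try first.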

thorn flame
#

@buoyant vine So my understanding is that the NNDescent algo from PyNNDescent isn't a machine learning algorithm?

#

And unlike sklearn's nearest neighbor models, there's no actual training going on

#

For instance, I don't get to call any .fit() function to train the model with datasets

buoyant vine
#

none of them are actually 'learning'

#

sklearn just gets you to call fit to build the index

#

but it isn't some ML type thing

#

KNN/ANN (Approximate nearest neighbors) are not something you need AI models in order to do. they are a spatial processing type of problem

#

NNDescent is just an optimized way of building a graph

#

that you can then traverse to find the closest points
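
To make the "fit just builds the index" point concrete, here is a toy exact nearest-neighbour index in plain Python. `BruteForceIndex` is a made-up class for illustration, not sklearn's or PyNNDescent's implementation: `fit` only stores the points, and `query` compares against every stored point.

```python
import math


class BruteForceIndex:
    """Toy exact nearest-neighbour index; nothing is learned in fit()."""

    def fit(self, points):
        # "Training" is only storing the data for later lookups.
        self.points = [tuple(p) for p in points]
        return self

    def query(self, point, k=1):
        # Rank every stored point by Euclidean distance to the query
        # and return the row numbers of the k closest ones.
        ranked = sorted(
            range(len(self.points)),
            key=lambda i: math.dist(point, self.points[i]),
        )
        return ranked[:k]


index = BruteForceIndex().fit([(0, 0), (1, 1), (5, 5), (6, 5)])
print(index.query((0.9, 1.2), k=2))
```

Approximate methods such as NNDescent replace the full scan in `query` with a graph traversal, trading a little accuracy for speed.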

wooden sail
#

it really depends where you draw the line

#

from that POV, linear regression also isn't AI/ML, though you'll see it included in many books on the topic

thorn flame
#

Yes, cos I'd expect KNN models to be used for classification and regression tasks

#

Making them distinct from ANN (which is more like a specialised algorithm)

wooden sail
#

yeah

#

so the point there is that you're trying to solve some sort of optimization problem, and neural networks are one approach for doing so. this is what people often, but not always, mean by machine learning

#

the AI umbrella is typically bigger and includes optimization both with and without neural networks

#

a lot of this stuff is really just buzzwords and they're used differently by different people

thorn flame
#

I think I understand

#

When @buoyant vine talks about "learning", he's referring to neural networks

#

KNN's are neural networks I guess

wooden sail
#

it also depends on what you mean by "neural network"

#

KNN does "learn" some parameters, and you can treat it as a function learning centroids of data that is assumed to be uniformly distributed, and using those to approximate the original input

thorn flame
#

When I mean "learn", I think of adjusting to new test datasets

wooden sail
#

knn needs to relearn its parameters when you change the data

#

and most ML algs don't retrain themselves on the fly when you apply them to new data

#

but nothing stops you from doing that with KNN either

thorn flame
buoyant vine
#

ah right,

thorn flame
#

I mean "not", sorry

wooden sail
#

see, so linear regression is something i definitely would not call learning, since it's a classical method

thorn flame
#

So supervised learning isn't learning?

wooden sail
#

the point is that terms like AI, ML, "learning", are all buzzwords

buoyant vine
#

ANN and KNN effectively have defined behaviours, there is nothing to learn regarding the data, it is just a cut and dry "build a graph" "build a tree" or "compare each value one by one"

#

Now each ANN/KNN algorithm can have different levels of accuracy

wooden sail
#

what do you mean by "defined behaviors"? they need to tune all of their parameters based on the data

buoyant vine
#

but that is down to the algorithm, and doesn't truly get changed by the input data

wooden sail
#

all ML methods that do not adaptively change their architecture (which is the vast majority, outside of niche stuff done in papers) work exactly as you describe rn

thorn flame
wooden sail
buoyant vine
buoyant vine
#

no?

wooden sail
#

it depends both on the variance of the data, the hyper parameter k, and whether or not the data truly follows a uniform distribution

#

all of those affect the output error/performance

#

if the vectors lie on a more sophisticated manifold than just subsets of C^N, it won't work

#

if the training data does not follow a similar distribution to the one you'll use the knn to classify or regress, it won't work

#

assuming it works in the first place for the training data

thorn flame
#

I understand with KNN, you can set things like K, or weights that affect the overall perf

wooden sail
thorn flame
#

Like K-Means clustering??

buoyant vine
thorn flame
#

I'm an ML noob btw

buoyant vine
#

It is like having your search index be mapped by postcode, and then the query being mapped by intergalactic coordinates

#

maybe a bad analogy

thorn flame
#

Storing in form of numerical embeddings I think

#

KNN is just more explicit /exact, (re: K)

#

There's no explicit search for centroids

wooden sail
#

for most practical cases there is, since comparing against the full dataset is too expensive

#

but even before that, the properties of the data determine the achievable accuracy

buoyant vine
#

When I talk about accuracy with KNN

#

I am referring to the accuracy of the algorithm compared to the results you would get if you did a brute force

wooden sail
#

i'm referring to classification accuracy of test data

buoyant vine
#

since in KNN brute force is effectively the accuracy=1 baseline
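
A small sketch of that notion of accuracy (often called recall): take the exact neighbours from a brute-force scan as the baseline and measure how many of them an approximate index also returned. The `approx` list is a hypothetical result, not the output of a real ANN library.

```python
import math


def exact_neighbors(points, query, k):
    # Brute force: rank every point by distance to the query.
    ranked = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true neighbours that the approximate search found.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


points = [(0, 0), (1, 0), (0, 2), (3, 3), (4, 4)]
exact = exact_neighbors(points, (0, 0), k=3)
approx = [0, 1, 3]  # hypothetical output of an approximate index
print(recall_at_k(approx, exact))
```

A recall of 1.0 means the approximate search matched brute force exactly; this is separate from the classification accuracy on test data mentioned above.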

#

right, that I agree with

#

Maybe we can drag it down into probably the biggest distinction that I can think of, which is as anomaly says, typically a KNN index keeps all the original 'training' data

#

in my mind an AI/ML model that 'learns' condenses that training data into a fixed/constant size rather than creating effectively a lookup table

wooden sail
#

all right, for the basic implementation of KNN that's true

buoyant vine
#

we can technically be nit picky and say something about like quantization and all that for KNN stuff. But the majority (all?) of modern approaches still typically scale in size with the training data given to them

thorn flame
#

So which y'all suggest finally?

#

KNN (sklearn.neighbors.NearestNeighbors) from sklearn or NNDescent from PyNNDescent?? :)

#

sklearn.NN is "unsupervised model" from the docs

#

And from my task requirement, I need to recommend products to users based on purchase history and browser activity

#

That seems like unsupervised learning

#

As opposed to say providing labels like user product prefs?

#

@buoyant vine @wooden sail

#

Also, maybe a supervised learning approach isn't bad?

buoyant vine
#

PyNNDescent imo

wooden sail
#

NND should scale better. recommenders are also one of the applications of knn and related (graph based) approaches

#

as for parametric, another form of the recommender problem is via projection onto a low dimensional vector space

buoyant vine
#

I would generally argue that PyNNDescent is one of the best ANN libraries around, not just from a Python POV but in general

#

if you want to learn about the algorithm and implementation, the author did a really good talk about it

thorn flame
#

Ah, awesome

wooden sail
thorn flame
#

My only issue with ANN seem to be that it doesn't give exact data points?

buoyant vine
#

yes the paper is solid, PyNNDescent does a bit extra on top though, which is the cause of the bulk of its search speed

thorn flame
#

For an ecommerce store, I'd expect users to see say 100 products

#

Regardless of approximation.

#

I mean based on approx

buoyant vine
#

as you tend towards 100% accuracy, your speed will fall off (exponentially usually)

#

but 99% accuracy or even 95% is normally good enough for most cases

thorn flame
#

Awesome.

#

I sincerely appreciate all the help.

lapis sequoia
thorn flame
#

@buoyant vine I think currently the algo looks good and uses user-based collaborative filtering. But I'm getting an error when I run the script particularly when querying with PyNNDescent:

def make_recommendations(
    user_id: int, user_index: NNDescent, purchase_history_df: pd.DataFrame
):
    top_n = 10
    purchase_pivot = purchase_history_df.pivot_table(
        index="user_id", columns="product_id", values="quantity", fill_value=0
    )
    user_item_matrix = purchase_pivot.values

    # Find similar users
    similar_users = user_index.query(user_item_matrix[user_id], k=top_n + 1)[0]

    # Get product IDs purchased by similar users
    similar_user_products = purchase_history_df[
        purchase_history_df["user_id"].isin(similar_users)
    ]["product_id"].value_counts()

    # Recommend top N products
    recommendations = similar_user_products.index[:top_n]
    return recommendations
thorn flame
# thorn flame @buoyant vine I think currently the algo looks good and uses user-based ...

Error along these lines:

 File "/home/manasseh/crossover/app.py", line 53, in <module>
    recommendations = make_recommendations(1, user_index, purchase_history_df)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/manasseh/crossover/app.py", line 36, in make_recommendations
    similar_users = user_index.query(
                    ^^^^^^^^^^^^^^^^^
  File "/home/manasseh/crossover/.venv/lib/python3.11/site-packages/pynndescent/pynndescent_.py", line 1748, in query
    indices, dists, _ = self._search_function(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/manasseh/crossover/.venv/lib/python3.11/site-packages/numba/core/dispatcher.py", line 423, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/home/manasseh/crossover/.venv/lib/python3.11/site-packages/numba/core/dispatcher.py", line 364, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'sum' of type float32
#

I've also read I don't need to add numba decorators (e.g. njit)

tidal bough
thorn flame
#

np array?

tidal bough
#

The error is "Unknown attribute 'sum' of type float32", and individual floats indeed don't have a .sum for obvious reasons. So it sounds like you passed a scalar somewhere where an array is expected.

buoyant vine
#

ngl

#

use polars, it'll make your life so much nicer

tidal bough
#

this looks like it's using pynndescent, so may not be an option

buoyant vine
#

I mean for the pandas related stuff

thorn flame
#

But the 1 is passed to the k parameter

buoyant vine
#

the input of pynndescent normally expects a 2D numpy array

thorn flame
#

And its expected type is integer

tidal bough
buoyant vine
#

The error it is failing on is probably your query data

#

not your K

thorn flame
#

It's an integer actually

#

The user_id i.e

#

If you look at the purchase_pivot, the user_id is used as index

#

So I think that's how I'm able to access with user_id in the query

#
  • user_item_matrix is an ndarray
#

Hence compatible with pynndescent
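
One caveat worth sketching here: `.values` returns a plain array without the `user_id` labels, so `user_item_matrix[user_id]` picks a row by position, which only matches the user when ids happen to equal row numbers. The pure-Python stand-in below (made-up purchases, no pandas) shows the id-to-row mapping and the one-row 2D shape a query would need.

```python
# Hypothetical purchase rows: (user_id, product_id, quantity).
purchases = [(10, "A", 1), (42, "B", 2), (10, "B", 3), (7, "A", 1)]

# Row and column labels, sorted the way a pivot table would order them.
user_ids = sorted({user for user, _, _ in purchases})
product_ids = sorted({product for _, product, _ in purchases})
row_of_user = {user: row for row, user in enumerate(user_ids)}

# Build the user x product matrix as a list of rows.
matrix = [[0] * len(product_ids) for _ in user_ids]
for user, product, quantity in purchases:
    matrix[row_of_user[user]][product_ids.index(product)] = quantity

# Look the user up by id, then wrap the row so the query is 2D (1 x n_products).
query_vector = [matrix[row_of_user[42]]]
print(query_vector)
```

With pandas, `purchase_pivot.index.get_loc(user_id)` gives the row position, and the neighbour indices returned by the query are likewise row positions that need mapping back through `purchase_pivot.index` before being compared with `user_id` values.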

tidal bough
#

hmm, that's a 1d array... does query support that? it doesn't seem to have docs on this method, but the examples are all for multidimensional data

thorn flame
#

I think you're right

#

I need to convert to 2d array

thorn flame
#

So it seems to work @tidal bough @buoyant vine

#

Thanks folks! :) I'll just go ahead to test

faint quail
#

Im working on a neural network library from scratch with numpy and scipy, what do you guys think

ik the conv striding is prolly wrong same with the batch norm but im too dumb/lazy to fix it lol
https://paste.pythondiscord.com/GAUQ

empty mason
#

does anyone know why this wouldnt be pulling the table i need from html? (im new to data scraping sorry if this is a goofy mistake)

#

(table im trying to pull)

#

stock table is returning as a type none

serene scaffold
empty mason
#

ok i didnt know where this issue lies in terms of where to put it, thanks

twilit flower
#

I download a repo then copy it into my codes repo

#

Then after copy 🤔🤔🤔 i fucked up

left tartan
twilit flower
#

And thats the error I m getting

left tartan
indigo wing
#

Hey, really stupid question here. I have the target variable in the train set, but it's not in the test set. Should I drop it, or should I add another column to the test set that is the average of that target variable column? I think I need to refresh my basics. I kinda forgot such a simple thing. @left tartan can the mod gods help me, pretty please

unkempt apex
#

If you are doing SL then you need in both!

#

otherwise drop..

indigo wing
#

sl as in boss?

mild dirge
#

If you want to know how well your model predicts the target variable, you will need the actual target variable.

unkempt apex
#

we have unique people in this server, ConfusedCamel, ConfusedReptile

indigo wing
#

I am doing basic regression, I remember my uni teach always made us drop tho. maybe I am remembering wrong from him

indigo wing
unkempt apex
#

basic regression on what?

indigo wing
unkempt apex
#

then you need that!

indigo wing
#

then why did my univ teacher always make us drop it

unkempt apex
#

uni?

#

How stupid I am !

#

Don't add the target variable to the test set!

#

so think about this!
you are doing simple regression on just price prediction, so in the training phase you need that variable to tell the model "this is what we want", but in the test phase, where the model is already trained, you don't need it!
you just run the model on the unseen rows in the test set
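
One way to reconcile the answers above, sketched with made-up numbers: leave the unlabelled test set as it is (nothing to drop, nothing to fill in with an average), and hold back part of the labelled training data to measure how well the model predicts. The least-squares line here is just a stand-in for whatever regression model is actually used.

```python
# Hypothetical labelled training rows: (feature, target).
train_rows = [(1, 50), (2, 55), (3, 65), (4, 70), (5, 75), (6, 80)]
# Unlabelled test rows: features only, no target column at all.
test_features = [7, 8]

# Hold back the last rows of the labelled data for validation.
fit_rows, validation_rows = train_rows[:4], train_rows[4:]

# Fit a least-squares line on the rows used for training.
n = len(fit_rows)
mean_x = sum(x for x, _ in fit_rows) / n
mean_y = sum(y for _, y in fit_rows) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in fit_rows) / sum(
    (x - mean_x) ** 2 for x, _ in fit_rows
)
intercept = mean_y - slope * mean_x


def predict(x):
    return slope * x + intercept


# The target is needed here, to score predictions on held-back rows.
validation_mae = sum(abs(predict(x) - y) for x, y in validation_rows) / len(validation_rows)
# On the real test set there is no target, so the model only predicts.
test_predictions = [predict(x) for x in test_features]
print(validation_mae, test_predictions)
```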

#

my bad sorry!

indigo wing
#

I see then when do we need to add the target var in test? in regression

unkempt apex
#

In general there is no need to add in test set..

indigo wing
#

okay boss! Thanks a lot

indigo wing
unkempt apex
#

check dm first

#

important for you

indigo wing
#

thanks

thorn flame
#

Basically I'm evaluating precision@k

buoyant vine
#

you are probably calculating the metrics wrong

thorn flame
#

Something like this:

def precision_at_k(actual, predicted, k):
    actual_set = set(actual)
    predicted_set = set(predicted[:k])
    intersection = len(actual_set.intersection(predicted_set))
    return intersection / k
#

predicted is the recommended products

#

Actual is products user have purchased in the past

#

K is 50 in this case
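
A sketch of one reason this metric can come out as 0.0, under the assumption that the recommender returns items the user has not bought yet: scoring against the same history that produced the recommendations then finds no overlap. Holding out part of the history and scoring against that is a common alternative. All item names and the `recommended` list are made up.

```python
def precision_at_k(actual, predicted, k):
    # Share of the top-k recommendations that appear in `actual`.
    return len(set(actual) & set(predicted[:k])) / k


past_purchases = ["p1", "p2", "p3"]
visible_history = past_purchases[:2]  # what the recommender gets to see
held_out = past_purchases[2:]         # hidden, used only for scoring

# Hypothetical recommender output that skips already-seen items.
recommended = ["p9", "p3", "p7", "p4", "p8"]

print(precision_at_k(visible_history, recommended, k=5))  # overlap with seen items
print(precision_at_k(held_out, recommended, k=5))         # overlap with hidden items
```

Note also that with k=50 and only a handful of purchases per user, precision@k is capped at (number of purchases) / 50, so small values are expected even for a good recommender.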

#

Previously, I was using the precision_score function from sklearn

buoyant vine
#

🤔 that doesnt make sense

thorn flame
#

But was getting 0.0 still

thorn flame
#

I dunno

buoyant vine
#

but two things

#
  1. why are you trying to get it to match the previous history? you want it to get similar items based on the history not the history itself
thorn flame
#

And based on similar user's history

#

It's user-based, yeah?

#

So it's also possible it just recommends products other users have engaged with?

buoyant vine
#

yes it just changes how you calculate the query vector

#

but how are you calculating the query vector

thorn flame
#

So can I just submit the solution as is? Without bothering about evaluation?

buoyant vine
#

no but we're missing info here

#

why is it supposed to return the history? that doesn't make sense as a target criteria

#

how are you calculating the query vector?

#

what are the similarity scores and what distance measure are you using

thorn flame
#

It's returning products from purchase history dataset

#

I'm not sure I used similarity scores and distance tbf

#

But the idea is that it searches for similar users and selects from products they've interacted with.

buoyant vine
#

no dont go for similar users

#

go for similar items

#

your index should contain a set of embeddings for items

thorn flame
#

I see, not users

buoyant vine
#

where their embeddings are probably generated by keywords or what not

#

then you take a user's purchase history

#

take all the embeddings for each item and average them

#

to get a generalised 'view'

#

that is the basic start of a recommending system based on knn
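
The steps above can be sketched as follows. The embeddings are tiny made-up vectors (real ones would come from keywords, descriptions or a model), and a brute-force cosine similarity scan stands in for the ANN index.

```python
import math

# Hypothetical 3-dimensional item embeddings.
item_embeddings = {
    "running shoes": (0.9, 0.1, 0.0),
    "trail shoes": (0.8, 0.2, 0.0),
    "sports socks": (0.7, 0.3, 0.1),
    "blender": (0.0, 0.1, 0.9),
    "toaster": (0.1, 0.0, 0.8),
}


def average_embedding(items):
    # Element-wise mean of the embeddings of the purchased items.
    vectors = [item_embeddings[item] for item in items]
    return tuple(sum(values) / len(vectors) for values in zip(*vectors))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def recommend(history, top_n):
    # Query with the averaged "view" of the user, skipping items already bought.
    query = average_embedding(history)
    candidates = [item for item in item_embeddings if item not in history]
    candidates.sort(key=lambda item: cosine_similarity(query, item_embeddings[item]), reverse=True)
    return candidates[:top_n]


print(recommend(["running shoes", "trail shoes"], top_n=2))
```

In a real system the scan inside `recommend` would be replaced by a query against the item index, with the averaged vector reshaped to a single 2D row.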

serene grail
#

It's cool to learn about the inner workings of the recommendation systems that are so common in our lives nowadays

thorn flame
# buoyant vine take all the embeddings for each item and average them

Not sure I got this part but I'm doing something like this now. Does it make sense?

    purchase_pivot = purchase_history_df.pivot_table(
        index="user_id", columns="product_id", values="quantity", fill_value=0
    )
    user_product_matrix = purchase_pivot.values

    # Find similar products
    similar_products = product_index.query(
        np.reshape(user_product_matrix[user_id], (1, -1)), k=top_n + 1
    )[0]
buoyant vine
#

how are you getting the data in that DF tho

thorn flame
#

It's synthetic

#

I used Python's faker library

#

And some public dataset I found on Kaggle

#

The products dataset is from Kaggle

#

But user and purchase history are both synthetic

supple dagger
#

@thorn flame hisexy trans🥰🥰

thorn flame
#

Huh??

supple dagger
#

Cing con🎎🥷

thorn flame
#

So with this, @buoyant vine I shouldn't worry about evaluation?

buoyant vine
#

🤔 but like... what is user_product_matrix

#

what is it containing

thorn flame
#

It's some kind of aggregate dataset

buoyant vine
#

you can't evaluate anything if you don't know what it is representing

#

or how it was generated

runic sphinx
#

Hi, I'm a new user here, I want to learn about data science, and I was told that Python is easy for beginners, but is there any advice so I can learn it well? how did you learn it, I wonder if you would be willing to answer

Thank you🙏😊

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

thorn flame
#

The purchase_pivot dataframe is like an aggregate of the quantity of a particular product purchased by the user

runic sphinx
thorn flame
#

So it's structured in such a way that users are rows and product ids are columns

#

And the value of the intersection is the quantity of that product purchased by the user

buoyant vine
#

but that doesn't form any sort of useful vector

#

well... technically it can. but for this case we're going to say it doesn't

#

bear in mind, what the system is doing to calculate the distance between two vectors is:

def distance(a: list[float], b: list[float]) -> float:
    result = 0.0
    # separate loop names so the arguments a and b are not shadowed
    for x, y in zip(a, b):
        result += x * y
    return result
#

this is one of the most basic forms of vector distance aka dot product

#

it assumes your vectors are pre-normalized, i.e. scaled to unit length
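
A small sketch of why normalization matters for a dot product, using made-up purchase-count vectors. The `dot` and `unit` helpers and the three vectors are illustrative assumptions, not part of the conversation's code.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def unit(v: list[float]) -> list[float]:
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# b points the same way as a (same products, larger quantities);
# c points elsewhere but has large counts.
a = [1.0, 2.0, 0.0]
b = [10.0, 20.0, 0.0]
c = [0.0, 30.0, 30.0]

raw_ab, raw_ac = dot(a, b), dot(a, c)                            # magnitude dominates
cos_ab, cos_ac = dot(unit(a), unit(b)), dot(unit(a), unit(c))    # direction only
```

On the raw counts `c` scores higher than `b`, while after normalization `b` is the closer match. That is the sense in which an un-normalized user-by-product count matrix is a poor input for this kind of search.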

thorn flame
#

Also the user_product_matrix

tidal bough
#

ah yes, math.sumprod 😛

buoyant vine
#

because what you described above suggests it isn't

thorn flame
buoyant vine
#

or at least the vector themselves are not suitable for the KNN search in their existing state

thorn flame
#

For the user_product_matrix, here:

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
#

That's the result

buoyant vine
#

🤨 okay, let's get you a better dataset

#

no point trying to learn this when your data is going to spit out nonsense

thorn flame
#

What's wrong with the dataset?

buoyant vine
#

print the distances returned by nndescent

#

when you go query it

#

I suspect they are either insanely high, or basically near zero

thorn flame
#

Funny, the query method doesn't return a tuple as I'd expect

#

Something like indices, distances

buoyant vine
#

it does

thorn flame
#

In my case, it's not

buoyant vine
#

it returns tuple[np.array[:], np.array[:]]

thorn flame
#

But I have just similar_products

buoyant vine
thorn flame
#

Ahh, gotcha

buoyant vine
#
 similar_products, distances = product_index.query(np.reshape(user_product_matrix[user_id], (1, -1)), k=top_n + 1)
thorn flame
#

It's like this:

[[0.8691931  0.8691931  0.8995514  0.90055974 0.90055974 0.90941435
  0.90941435 0.9097644  0.9097644  0.91088685 0.91289283 0.91620582
  0.91620582 0.91620582 0.92280783 0.92303062 0.92303062 0.92343841
  0.92522663 0.92566015 0.92591977 0.92591977 0.92615172 0.92702812
  0.92768535 0.92781716 0.92855477 0.92859515 0.92886892 0.92911416
  0.92911416 0.92916405 0.92974479 0.93113408 0.93113408 0.93164332
  0.93180938 0.93324679 0.93324679 0.93324679 0.93514436 0.93545601
  0.93705795 0.94188143 0.94228064 0.9433372  0.94547957 0.94574923
  0.94574923 0.94576492 0.94582209]]
buoyant vine
#

what metric is pynndescent using? default?

thorn flame
#

cosine

#

Are the values too high or too low?

#

I'm guessing it's between 0 and 1?

buoyant vine
#

it isn't so much that they're high or low, they are just very close to one another

#

what is this dataset?

thorn flame
#

Hmm

thorn flame
buoyant vine
#

link the kaggle dataset

thorn flame
#

But users and purchase history datasets are custom

#

for users, I have autoincremented user_id

#

And purchases based on product_id from the phones dataset

thorn flame
#

@buoyant vine cosine distance dramatically reduced:

[[0.19798268 0.20261373 0.20261373 0.20261373 0.20435535 0.20850801
  0.21258985 0.213093   0.21336271 0.22002845 0.22045725 0.22085548
  0.22254818 0.22277667 0.22382116 0.2247374  0.22515145 0.22718953
  0.22726336 0.22739979 0.23045504 0.23045504 0.23107832 0.23168693
  0.23256001 0.23435276 0.23439567 0.23464295 0.23464295 0.23519942
  0.23519942 0.23530574 0.23545824 0.23545824 0.23755211 0.23782388
  0.23848263 0.23864098 0.23873667 0.23879228 0.23935779 0.23935779
  0.23994766 0.23994766 0.23998856 0.24073968 0.24097407 0.24273052
  0.24273052 0.24273052 0.24350467]]
#

I think the reason they're mostly the same is maybe lack of noise in the dataset?

#

Anyways. I don't think so

#

This measures something else.

shut shoal
#

I'm learning the GPT model by myself and I want to check if this is the correct process. Could someone check my process?

  1. Tokenization of data
  2. Embedding (Token and Positional)
  3. Dropout
  4. Transformer decoder done n amount of times (This is where pre-training happens (language modeling) and fine tuning (what are the fine tuning methods?))
  5. Give the GPT some input and it'll be tokenized and embedded and then the output
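
A rough NumPy sketch of the forward pass those steps describe, with random (untrained) weights. The sizes, the character-level tokenizer, the single attention head and the ReLU feed-forward layer are simplifying assumptions. Pre-training and fine-tuning are not steps inside the decoder: they are training procedures that adjust these same weights, and they are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
text = "hello world"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

# 1. Tokenization (character level here; real GPTs use subword tokenizers such as BPE).
tokens = np.array([stoi[ch] for ch in text])
T, V, D, N_LAYERS = len(tokens), len(vocab), 16, 2

# 2. Token + positional embeddings (random, i.e. untrained).
tok_emb = rng.normal(0, 0.02, (V, D))
pos_emb = rng.normal(0, 0.02, (T, D))
x = tok_emb[tokens] + pos_emb
# 3. Dropout would be applied here during training only; it is skipped at inference.

def layer_norm(h):
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + 1e-5)

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def decoder_block(h, p):
    # Single-head causal self-attention: position t may only look at positions <= t.
    n = layer_norm(h)
    q, k, v = n @ p["wq"], n @ p["wk"], n @ p["wv"]
    scores = q @ k.T / np.sqrt(D)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    h = h + softmax(scores) @ v @ p["wo"]
    # Position-wise feed-forward network with a residual connection.
    n = layer_norm(h)
    return h + np.maximum(0, n @ p["w1"]) @ p["w2"]

# 4. Stack of N decoder blocks.
for _ in range(N_LAYERS):
    params = {name: rng.normal(0, 0.02, shape) for name, shape in [
        ("wq", (D, D)), ("wk", (D, D)), ("wv", (D, D)), ("wo", (D, D)),
        ("w1", (D, 4 * D)), ("w2", (4 * D, D)),
    ]}
    x = decoder_block(x, params)

# 5. Project back to the vocabulary; each row is a next-token distribution.
logits = layer_norm(x) @ tok_emb.T   # output layer tied to the token embedding
probs = softmax(logits)
next_token = vocab[int(probs[-1].argmax())]
```

With untrained weights the predicted character is meaningless; training on next-token prediction is what makes `probs` useful.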
frosty fulcrum
#

Does anyone know how to optimize a model using OpenVinoSharp Library?

serene scaffold
frosty fulcrum
#

nvm, the problem has been solved

rare ferry
#

Suppose I have a score which is calculated differently for different teams, and I want to create one overall score. There are ways to do this like average, weighted average, mean etc. Is there a specific term for this process? I wanna know more about this.
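
This is usually discussed under names like "composite score" or "composite index", with the rescaling step called normalization. A minimal sketch, assuming each team's score has a known range; the team names, ranges and weights are invented for illustration.

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Rescale a raw score to the 0..1 range given its scale's bounds."""
    return (value - lo) / (hi - lo)

# Hypothetical teams: (raw score, scale minimum, scale maximum, weight)
teams = {
    "support": (4.2, 1.0, 5.0, 0.5),       # 1-5 satisfaction rating
    "platform": (99.0, 90.0, 100.0, 0.3),  # uptime percentage
    "sales": (60.0, 0.0, 120.0, 0.2),      # deals closed
}

normalized = {name: min_max(raw, lo, hi) for name, (raw, lo, hi, _) in teams.items()}
total_weight = sum(w for *_, w in teams.values())
composite = sum(normalized[name] * w for name, (*_, w) in teams.items()) / total_weight
```

Putting every score on a common scale first is what makes the weighted average meaningful; averaging the raw numbers would let the team with the largest scale dominate.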

indigo wing
#
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.20, shuffle=True)
print(np.shape(X_train))
print(np.shape(X_test))
print(np.shape(y_train))
print(np.shape(y_test))
print(train_df1.dtypes)

(1168, 80)
(292, 80)
(1168,)
(292,)
Id                 int64
MSSubClass         int64
MSZoning          object
LotFrontage      float64
LotArea            int64
                  ...   
MoSold             int64
YrSold             int64
SaleType          object
SaleCondition     object
SalePrice          int64
Length: 81, dtype: object

How do I one-hot encode this to float? I am doing linear regression

remote stream
#

splits the rows into different columns and gives them a binary value

indigo wing
#

yes sir

#

I need to one hot encode this but first I need to fill the missing NaN values and convert object to num

#

I think I also need to convert int to float

#

since its asking for float when I fit(X,y)

#

can u help me plz
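
One possible way to do those three steps with pandas, sketched on a tiny stand-in frame. The column names are taken from the dtypes listing above, the values are made up, and median or "Missing" imputation is just one reasonable default.

```python
import pandas as pd

# Tiny stand-in for the housing data (values are invented).
df = pd.DataFrame({
    "LotFrontage": [65.0, None, 80.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
    "MSZoning": ["RL", "RM", None, "RL"],
    "SalePrice": [208500, 181500, 223500, 140000],
})
X, y = df.drop(columns="SalePrice"), df["SalePrice"]

numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.columns.difference(numeric_cols)

# Fill gaps: median for numbers, an explicit "Missing" label for categories.
X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].median())
X[categorical_cols] = X[categorical_cols].fillna("Missing")

# One-hot encode the object columns and cast everything to float.
X_encoded = pd.get_dummies(X, columns=list(categorical_cols)).astype(float)
```

On a real train/test split, the medians and the set of dummy columns should come from the training rows only and then be reused for the test rows, otherwise the two frames can end up with different columns.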

short terrace
#

is it difficult to understand how multi dimensional data can be represented in 2d with pca?

lapis sequoia
#

im not sure whether this is what you meant, but that formula can be derived somewhat easily by using the NN diagram, each input actually gets multiplied by all the weights, and all the weights, and so on as you go down the layers.
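
A quick numerical check of that reading, assuming the formula in question is the usual dense-layer expression `W @ x + b`. The layer sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # 3 inputs
W = rng.normal(size=(4, 3))   # 4 neurons, each with one weight per input
b = rng.normal(size=4)

# Diagram view: every input feeds every neuron, and each neuron sums its weighted inputs.
loop_out = np.zeros(4)
for j in range(4):            # neuron j
    for i in range(3):        # input i
        loop_out[j] += W[j, i] * x[i]
    loop_out[j] += b[j]

# Equation view: the same computation written as one matrix-vector product.
matrix_out = W @ x + b
```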

wooden sail
lapis sequoia
#

i guess, but to me it's easier thinking visually

#

something like this

wooden sail
#

yes

lapis sequoia
#

that shows the addition, and also the fact that it multiplies all weights

wooden sail
#

i do suggest you brush up your math a little bit, because the equation also tells you exactly that

#

but the diagram achieves the same effect

#

you can rewrite the first equation using diagrams like the one you shared now, then translate that into the second equation

lapis sequoia
#

yes, i am aware, but i didn't know exactly how to derive it from standard multiplications

#

it's just A*B*C... but that's to me not so easy to see it ends up in that formula

#

or it was

#

im currently finishing the paper recommended in the forum here (https://alphaxiv.org/pdf/1802.01528) it's a bit oversimplified i think, but not bad

alphaXiv

This paper is an attempt to explain all the matrix calculus you need in order
to understand the training of deep neural networks. We assume no math knowledge
beyond what you learned in calculus 1, and provide links to help you refresh
the necessary math where needed. Note that you do not need to understand this
material before you start learning...

lapis sequoia
left tartan
wooden sail
#

i always recommend the matrix cookbook, but it doesn't offer proof for everything

toxic valve
#

Hey guys, what do you guys think of the Joy of Programming game? Is this good for a beginner at python?

unique spoke
#

Any idea why im getting this error:

    AttributeError Traceback (most recent call last)
    Cell In[21], line 2
    1 #defining model
    ----> 2 net = cv2.dnn_DetectionModel(weightsPath,configPath)
    3 net.setInputSize(320,320)
    4 net.setInputScale(1.0/ 127.5)

    AttributeError: module 'cv2' has no attribute 'dnn_DetectionModel'

It was working before, the only thing I changed was adding

    from gaze_tracking import GazeTracking

    gaze = GazeTracking()

gleaming gyro
#

is there a better bessel function than the scipy one? i tried using the one from mpmath and they look identical for all my purposes

left tartan
lapis sequoia
#

why are gradient boosted trees called gradient boosted trees

#

why gradient

#

doesn't it just fit a residual to a previous decision tree prediction

#

k

#

k

#

lets say decision tree predicts 1

#

and the real class is 0

#

so

#

then would gradient boosting say that the next tree needs to predict -1 so that it predicts 0 instead of 1

runic parcel
#

Is there any good articles/blogs or docs for learning about AGI, Ai and stuff?

lapis sequoia
#

no they havent made any good ones yet

agile cobalt
lapis sequoia
#

agi is real they just havent made it and havent made any good articles because of how stupid they are

agile cobalt
runic parcel
runic parcel
#

So is there for that

#

Fundamentals like ann, cnn, and classification and stuff right?

agile cobalt
agile cobalt
runic parcel
#

So both are diff things

#

Don't compare gpt to agi

agile cobalt
agile cobalt
lapis sequoia
#

no

#

so objective is loss(preds, targets)

#

i get it

#

why do we change mse

#

what is the problem with (preds - targets) ^ 2

#

it says Taylor expansion of the loss function up to the second order whats the point

wooden sail
#

the MSE of a nonlinear function is in general nonconvex

#

repeatedly approximating with 2nd order taylor expansion is the same as using the newton method

lapis sequoia
#

why do we need a taylor expansion why not just take mse

wooden sail
#

because the newton method can reach local minima faster than just gradient descent
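
A toy comparison on a one-dimensional convex function, as an illustration of that claim rather than a general proof. The function, learning rate and tolerance are arbitrary choices.

```python
import math

def grad(x: float) -> float:
    """First derivative of f(x) = exp(x) - 2x, whose minimum is at x = ln 2."""
    return math.exp(x) - 2

def hess(x: float) -> float:
    """Second derivative of f."""
    return math.exp(x)

def steps_to_converge(update, x0=0.0, tol=1e-8, max_steps=10_000):
    """Apply `update` until successive iterates differ by less than `tol`."""
    x = x0
    for step in range(1, max_steps + 1):
        x_new = update(x)
        if abs(x_new - x) < tol:
            return step, x_new
        x = x_new
    return max_steps, x

# Gradient descent uses only the slope, with a fixed step size.
gd_steps, gd_x = steps_to_converge(lambda x: x - 0.1 * grad(x))
# Newton's method also uses the curvature to choose the step size.
newton_steps, newton_x = steps_to_converge(lambda x: x - grad(x) / hess(x))
```

Both runs end near `math.log(2)`, but the Newton update needs far fewer iterations because it rescales each step by the local curvature.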

lapis sequoia
#

so those are newton trees

wooden sail
#

idk what you mean by trees, i have no context other than your last 6 messages

lapis sequoia
#

okay

#

I am trying to understand why gradient boosted trees have gradient in their names

#

it says

#

"Now that we have a way to measure how good a tree is, ideally we would enumerate all possible trees and pick the best one"

#

so ideally you dont want to use gradient descent

#

why are they called gradient trees then

#

and then it says "In practice this is intractable, so we will try to optimize one level of the tree at a time" which doesnt sound like gradient descent

wooden sail
#

the rest of the doc goes on to say that the updates you take are functions of the gradient and hessian

#

https://en.wikipedia.org/wiki/Gradient_boosting this explains it more clearly imo

Gradient boosting is a machine learning technique based on boosting in a functional space, where the target is pseudo-residuals rather than the typical residuals used in traditional boosting. It gives a prediction model in the form of an ensemble of weak prediction models, i.e., models that make very few assumptions about the data, which are typ...

lapis sequoia
#

it only said "Learning tree structure is much harder than traditional optimization problem where you can simply take the gradient" about gradient and cant find hessian

wooden sail
#

it's not doing gradient descent

#

it's using the gradient g and hessian h to formulate a different problem, and then solves that

#

(the part about not doing gradient descent depends on how you want to think about it tbh: as an optimizer, or as a function applied to an input)
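
For reference, a sketch of the "different problem" being solved, assuming the doc being quoted is the XGBoost boosted-trees tutorial and using its notation ($f_t$ is the new tree, $I_j$ the set of samples in leaf $j$, $w_j$ that leaf's output, $\lambda$ the L2 penalty on leaf outputs):

```latex
\begin{aligned}
\text{obj}^{(t)} &\approx \sum_{i=1}^{n}\Big[g_i\,f_t(x_i) + \tfrac{1}{2}\,h_i\,f_t(x_i)^2\Big] + \Omega(f_t),\\
g_i &= \frac{\partial\, l\big(y_i,\hat y_i^{(t-1)}\big)}{\partial\, \hat y_i^{(t-1)}},\qquad
h_i = \frac{\partial^2 l\big(y_i,\hat y_i^{(t-1)}\big)}{\partial\, \big(\hat y_i^{(t-1)}\big)^2},\\
w_j^{*} &= -\frac{G_j}{H_j+\lambda},\qquad G_j=\sum_{i\in I_j} g_i,\quad H_j=\sum_{i\in I_j} h_i .
\end{aligned}
```

The derivatives are taken with respect to the previous prediction $\hat y_i^{(t-1)}$, not with respect to model parameters. For squared error $l=\tfrac12(y-\hat y)^2$ this gives $g_i=\hat y_i-y_i$ and $h_i=1$, so with $\lambda=0$ the best leaf value is simply the mean residual in that leaf.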

lapis sequoia
#

okay

#

so you find predictions that minimize the loss function and then fit the decision tree to those predictions

#

nie

#

its also stupid though isnt it

#

because what minimizes the loss is difference between preds and targets

#

thats when loss is the lowest

#

why not just fit to preds - targets

serene grail
#

isn't that what a difference is?

lapis sequoia
#

yeah why use newton method to find the best residual predictions when they are just the difference

#

or i am stupid

#

it says d loss(y, yhat). whats the d with respect to? it doesnt have model parameters in it so its either y or yhat

#

okay im reading wikipedia: residuals for a given model are proportional to the negative gradients of the mean squared error (MSE) loss function. But you already know the residuals, whats the point of calculating them from gradients
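
A small numerical sketch of why the gradient framing is still useful: for squared error the negative gradient with respect to the prediction is exactly the residual, but for other losses (log loss here) it is not, and boosting fits the next tree to that negative gradient. The labels and scores below are made up.

```python
import numpy as np

y = np.array([0.0, 1.0, 1.0, 0.0])      # targets
raw = np.array([0.3, 0.2, 1.5, -0.4])   # current model output F(x)

def mse(F):
    return 0.5 * (y - F) ** 2

def logloss(F):
    # Log loss written in terms of the raw score F, with p = sigmoid(F).
    return np.log1p(np.exp(F)) - y * F

def numeric_neg_grad(loss, F, eps=1e-6):
    """Central finite difference of the loss with respect to each prediction."""
    return -(loss(F + eps) - loss(F - eps)) / (2 * eps)

residual = y - raw
# Squared error: dL/dF = F - y, so the negative gradient is the residual itself.
neg_grad_mse = -(raw - y)
# Log loss: dL/dF = sigmoid(F) - y, so the target for the next tree is y - p.
p = 1.0 / (1.0 + np.exp(-raw))
neg_grad_logloss = y - p
```

The derivative is taken with respect to the prediction `F`, one value per sample, which is why no model parameters appear in it.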

#

im Ultra Dumb

abstract wasp
#

Hi rn my friends and I want to make an app and implement a llm into it and potentially deploy it to app stores. We were thinking of using react native but I'm not sure if that's the right framework to use w ml models. Can you guys give me some suggestions if this is the right path ty

misty depot
#

Is this the correct place to talk about llms and model pruning?

agile cobalt
# abstract wasp Hi rn my friends and I want to make an app and implement a llm into it and poten...

unless you want to run inference on device, which framework you use for the app itself really doesn't matter at all - and if you do want to run on device, then it'll only work for extremely few high end devices & have a rather bad quality

generally you would host your own API for the app to talk to, which either runs the model or calls a hosting provider like OpenAI, Google or Anthropic to run the model for you

misty depot
#

Anyone knows how to prune an llm model using Wanda method??

#

I am really confused

#

Also anyone knows how to get mistral 7b model in google colab without quantizing it

agile cobalt
agile cobalt
indigo wing
#

Hey if I want to get going with spark, should I deploy all the spark, kafka, zookeeper etc in docker images? Someone once told me to always do kafka in docker never local, is it the same for others as well?
even the database?

#

I just want to know, I am not familiar very much with docker and I am a windows user lemon_angrysad so my docker is already 3X worse than your unix one. How can I make an img?

glass badger
#

Is it ok to use pd.get_dummies for more than 2 unique values? Or should I just use onehotencoder?

abstract wasp
serene scaffold
#

Outside of that, it doesn't matter how you make the user interface. You can have a micro service in python that connects the model to the user interface.

misty depot
#

Also i am using google colab pro will the mistral model not work in that too?

#

I can change my research to a smaller model anytime tho

quaint citrus
#

I am creating a image caption generator web app, but have issues with importing torch. I was using poetry as my venv.
I might be answering my own qn but i just want to confirm if installing torch using the command "poetry add torch torchvision torchaudio" results in poetry possibly adding a torch version requiring cuda? cos when i specified to install cpu only torch, i can import torch without errors on poetry

agile cobalt
#

!rule 6 9

arctic wedgeBOT
#

6. Do not post unapproved advertising.

9. Do not offer or ask for paid work of any kind.

agile cobalt
#

This is not a job/recruitment board nor is there a place for it in this server.

atomic tide
#

@loud parcel I've deleted your message as we are not a job board. Please don't post advertisements here again.

serene scaffold
lapis sequoia
coral field
#

What are the best python packages to perform multi-label imbalanced classification oversampling? So far, I want to use MDO from "multi-imbalance", except I cannot "pip install" the package due to missing metadata. I also couldn't find any additional packages that address this facet. Does anyone know of any implementations?

craggy agate
#

hey guys, I'm fine-tuning DistilBERT on a dataset, I have a macbook which I want to use. While Pytorch supports MPS, it is only using one of my 10 GPU cores, is there a way I can use all 10 or at least 7-8?

#

Cause currently this would take 20 hours.

#

I can use the NPU but I want to keep it as my backup option as the TOPS is only 17. I believe all of my GPU cores together should deliver better performance but if that is not an option I would have to train this using Xcode which I am not too familiar with in order to use the Neural Engine.

buoyant vine
#

😅 you've hit the trifecta

#

First, ditch the regular flavour of Kafka and zookeeper

#

Use Redpanda instead, it'll still give you a Kafka interface but is 100x easier to set up locally

#

Then for spark... Honestly if you're able to I'd just use something like AWS emr to spin up a small instance for a few hours to learn rather than locally

toxic mortar
regal bronze
#

Hey guys back again! So to my understanding is this correct?

I use linear regression if I want to predict values that only increase or decrease

I use a classifier such as DNNclassifier and Linear classifier if i want to predict the probability of a label??

Please correct me if I'm wrong

regal bronze
#

hmm, I followed like a tutorial via freecodecamp.org from tech with tim (great guy) and he didn't explain what the difference is between DNNClassifier and LinearClassifier, what is the difference between those 2 models?

#

Tensorflow

#

it's also weird that he gave an example of a classifier in a LinearRegression chapter

jaunty helm
regal bronze
#

that sounds difficult (I still don't know how to determine how many neurons I have to use for a layer)

#

And how many layers are even necessary

lapis sequoia
jaunty helm
#

like, if you need to use neural nets, use tf or torch, otherwise sklearn
tabular data maybe xgboost / any other gradient boosting tree library

rotund parrot
#

Not sure if this goes here but I need quick answer if anyone's able to help out.

Suppose I have 24 columns which are number of hours in a day (col1, col2, col3…etc). Each column has a calculated average percentage of utilization per hour.

I want to make new column that calculates daily utilization. I'm currently adding all the values in each column and dividing by 24 (average of averages?)

I keep confusing myself every time I think about it and not really sure what's the best way to get the average DAILY utilization from those HOURLY numbers.

I tried to make my question as clear as possible. Hopefully this question makes sense lol
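
A sketch of when the "average of averages" is safe, using random stand-in numbers. The `capacity` arrays (available minutes per hour) are an assumption introduced here to illustrate weighting; they are not in the original question.

```python
import numpy as np

hours = 24
rng = np.random.default_rng(0)
hourly_util = rng.uniform(0, 100, size=hours)   # average utilization % per hour (made up)
capacity = np.full(hours, 60.0)                 # available minutes in each hour

# Sum of the 24 columns divided by 24.
simple_daily = hourly_util.mean()
# With equal capacity per hour, the weighted mean is the same number.
weighted_daily = np.average(hourly_util, weights=capacity)

# If some hours carry less weight (say only 30 minutes were available),
# the weighted mean is the one that matches total used / total available.
uneven_capacity = capacity.copy()
uneven_capacity[:8] = 30.0
uneven_daily = np.average(hourly_util, weights=uneven_capacity)
used = (hourly_util / 100 * uneven_capacity).sum()
```

So dividing the sum by 24 gives the true daily utilization as long as every hour represents the same amount of available capacity; otherwise weight each hour by its capacity.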

regal bronze
#

And I have to determine then how many neurons and layers i need based off the loss and the accuracy?

jaunty helm
jaunty helm
regal bronze
regal bronze
#

Oh and you just use different layers you think that will fit

jaunty helm
jaunty helm
#

(and it's very overwhelming for someone starting out)

regal bronze
#

this sounds overwhelming already

jaunty helm
regal bronze
jaunty helm
regal bronze
#

I think I asked already enough questions thanks yall!

#

Oh wait one more: are all models actually made with those layers and stuff?

regal bronze
jaunty helm
regal bronze
#

What am I stressing

jaunty helm
# regal bronze Oh well

it's not that difficult to just have something working once you get past the initial step
e.g. to "build a RandomForest model for classification" you just do

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # load the labelled training data
    df = pd.read_csv('data.csv')
    target = 'Survived'
    X, y = df.drop(target, axis=1), df[target]

    model = RandomForestClassifier()
    model.fit(X, y)

    # you can now use `model` to predict stuff
    unknown_data = pd.read_csv('unknown.csv')  # same columns as `data.csv`, but there's no 'Survived' column
    model.predict(unknown_data)

it's skipping over some parts but yknow
heavy crow
#

I have two non periodic signals over a large time frame and am hoping to show there is a correlation with a certain lag between the two signals. I've managed to create this lag plot, how might i interpret these results?

heavy crow
#

Why am i seeing two "symmetrical" positive peaks as well? They are only half as strong.. can i ignore them?

#

also, is normalizing the values before calculating the cross correlation allowed? does this mess with the results?
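
A sketch of a normalized cross-correlation on synthetic data, to show how a lag is read off and what standardizing does. The signals, the 25-sample delay and the noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_lag = 2000, 25
x = rng.normal(size=n)
# y is a negated, delayed copy of x plus noise, so the extremum is a dip (negative correlation).
y = -np.roll(x, true_lag) + 0.5 * rng.normal(size=n)

def zscore(s: np.ndarray) -> np.ndarray:
    return (s - s.mean()) / s.std()

# Standardizing rescales the curve so its values read like correlation coefficients;
# it does not move the location of the extremum.
xn, yn = zscore(x), zscore(y)
xcorr = np.correlate(yn, xn, mode="full") / n
lags = np.arange(-(n - 1), n)                 # lag (in samples) for each entry of xcorr
best_lag = lags[np.argmax(np.abs(xcorr))]     # positive: y trails x by this many samples
```

Dividing the lag by the sampling rate converts it to seconds, for example 5000 samples at 50 Hz is 100 seconds.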

raw tree
#

is it worth it to try training longer ?

#

cause F1 does not budge

#

the dataset is heavily biased in case you are wondering

fickle pine
#

I was wondering, I have a prediction bot for stock graphs and it uses xgboost instead of just linear regression and it won't seem to scan for anything under 6 months of data points on a graph to make a prediction (the data points are each day) and so I added hourly, then 30 minute intervals and so on to add more data points for 1 day, 5day, and 1 month graph data but it still isn't enough even though the 6month has 128 data points and I made the 1 day, 5day, and 1 month graph data

#

Anyone able to help?

raw tree
#

not sure what you are talking about exactly, but stock market graphs are just random walks at short timespans

#

so it shouldnt add much predictive power anyways

heavy crow
#

For biased data sets, try weighing the minority
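
One common way to weigh the minority is the "balanced" heuristic, sketched here in plain Python on a made-up 90/10 label split. The `balanced_class_weights` helper is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """weight_c = n_samples / (n_classes * count_c), so rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
```

These weights are then used to scale each sample's contribution to the loss. Many libraries accept them directly, for instance through a `class_weight` argument in scikit-learn classifiers.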

raw tree
raw tree
raw tree
heavy crow
#

The dip "at zero" is actually around 100seconds away, exactly what I would hope for. It's just my time series is over like 9h at 50hz

#

Pearson coeff and the other coeff I forget the name of are both around -0.7

heavy crow
raw tree
torn talon
#

i have a possible very dumb question. i have a csv file that just contains x,y values for the function y = x^3, from x=1 to x=12

#

im trying to make sure i understand how to use linearregression to model polynomial functions

#

im generating the polynomial regression like this:

#
    heatCapacityData = np.genfromtxt(heatCapacityFile, skip_header=1, delimiter=",")
    
    if not self.__isHeatCapacityDataValid(heatCapacityData):
      return None  # a bare `exit` is a no-op expression and would not stop here
    
    temps = heatCapacityData[:,0].astype(int)
    heatCapacity = heatCapacityData[:,1].astype(float)
    # TODO: consider short search through degrees to find more accurate
    # polynomial degrees. fear the overfit.
    poly = PolynomialFeatures(degree=3, include_bias=False)
    polyTemps = poly.fit_transform(temps.reshape(-1, 1))
    return LinearRegression().fit(polyTemps, heatCapacity)
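
As a sanity check for the `y = x^3` case, here is a NumPy-only sketch of the same idea: build the columns `x, x^2, x^3` (what `PolynomialFeatures(degree=3, include_bias=False)` produces for one input feature) and solve ordinary least squares with an intercept. The `predict` helper is hypothetical.

```python
import numpy as np

x = np.arange(1, 13, dtype=float)   # x = 1 .. 12, as in the CSV described above
y = x ** 3

# Polynomial feature columns: x, x^2, x^3.
features = np.column_stack([x, x ** 2, x ** 3])
# LinearRegression fits an intercept by default; a column of ones plays that role here.
design = np.column_stack([np.ones_like(x), features])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, weights = coef[0], coef[1:]

def predict(x_new):
    """Expand new inputs into the same polynomial columns before applying the weights."""
    x_new = np.asarray(x_new, dtype=float)
    return intercept + np.column_stack([x_new, x_new ** 2, x_new ** 3]) @ weights
```

If the fit is working, the intercept and the weights on `x` and `x^2` come out near zero and the weight on `x^3` near one. With scikit-learn the same caveat applies at prediction time: new inputs have to go through `poly.transform` before `model.predict`.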