#data-science-and-ml


elder hemlock
#

Ah! Right.

#

I tend to call that "static analysis".

#

In that the AI doesn't really run away from you.

#

Which is a property I would attribute to "deductive" thought patterns.

#

You could compare this solution to something like Dijkstra's shortest path in that way.

#

I'm a fan of static analysis, and I believe it is underrated as a solution, and offers different properties that we can utilize.

#

I also believe a human thinks in a mix of "learning ai" and "static analysis".

#

See, my project to create this game is a very good example of where there are shortcomings to learning AI.

#

Since the AI cannot afford to have a delay on its rulings, and it shouldn't be biased.

#

I was in this chat yesterday, and I brought up what I was mainly using this for,

#

Did you see those messages

#

I'll link them

#

Yeah, I was pretty brief.

#

So, I think it turns out that Bayesian inference is very compatible with "static analysis", so I have done experiments on paper relating to this hypothetical game that lets the player create new things.

#

In that, I use Bayesian probability maths to establish a number quantifier for the "risk" that a design imposes on things around it.

#

Which is my basis for a measurement of fairness.

#

So for example, if there was a strong player, and a weak player, this algorithm in theory, can identify that this is not fair. In a consistent way.

#

And their fairness and riskiness can be expressed as a decimal number.

#

Well, if I can turn fairness into a number, that means I can create automated balancing for a game.
I can give players many freedoms and I can keep their abuse in check.

#

Uh, no.

#

This system design also paints an ideal test for character designs to pass.

#

This also connects with character writing, and the idea that ideal characters have flaws AND abilities.

#

Do you think it would list a discovery like that somewhere?

#

I would continue to work on this project, but I'm not sure people would understand it, even when finished.

#

I guess I mean people who aren't in the field.

#

The limit of explanation is the familiarity with the audience. And from alien minds come alien ideas.

#

I argue their ideas were better understood in retrospect.

#

And maybe mine might, given that I'm actually doing something new, which is unlikely.

#

Yes.

#

Exactly.

#

In communication, it seems biased to put the burden on only the one explaining, and not some intersection of me trying to explain to you, and you trying to understand me.

#

But I'm willing to try to explain in different ways, like the one you described.

cinder jay
#

hi guys

#

has anyone ever used nnU-Net?

#

i have some warnings that are a bit annoying

orchid forge
#

guys, i want to improve my data analysis skills. can anyone recommend some statistics, probability, etc. resources specifically for data analysis... some kind of free course or website?

gleaming osprey
#

Ig

orchid forge
gleaming osprey
orchid forge
#

also is it true that there is statistics and probability for specifically data analysis too?

#

oh so can you tell me some good websites to learn stats and prob for data analysis ?

#

i just sometimes dont understand that everything i do is not enough

#

and umm i end up with something more to learn

#

becuz i didnt know it in the first place

#

i wish there was a detailed roadmap for data analysis

toxic mortar
#

I want to evaluate unsupervised learning for natural language processing (topic modelling). Currently I am performing hyperparam tuning with GridSearch and RandomSearch. I set up a pipeline so the output is a 2D HTML plotly graph and a list of labeled cluster groups with their counts (including outliers). After many iterations I would like to perform evaluation. How would you approach this? So far the goal is to minimize the number of outliers, but also not to have big but dense clusters. Something in between. What evaluation metrics should I use? Something like std from numpy??

#

I mean realistically the outliers may be just noise cuz specific dataset is trash

buoyant vine
#

honey, 3blue1brown just uploaded a new video https://www.youtube.com/watch?v=eMlx5fFNoYc

Demystifying attention, the key mechanism inside transformers and LLMs.
Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support
An equally valuable form of support is to simply share the videos.

Demystifying self-attention, multiple heads, and cross-attention.
Instead of sponsored ad reads, these les...

serene scaffold
#

Hello, remember what I told you about screenshots of text. Are you still having an issue?

#

What fire? Are we doing arson again? lemon_hyperpleased

odd meteor
terse frigate
#

@serene scaffold hey bro can you provide me with the link to register for free credits?

terse frigate
#

i signed up on aws educate

#

but i am not sure if i am on the right one as you asked me if i was still a student and this did not require my student email or verification

#

hence i thought id just ask you

serene scaffold
#

I don't know. I haven't looked at it in at least five years.

terse frigate
#

hahah okay okay

#

thank you though thank you 🙏

sage sparrow
#

Hi, my data specifically isn't showing in a bar chart with Plotly. I tried visualizing a mock pd.DataFrame and that worked. Any idea why and what to do to fix it?
Link to the other place I asked: #1226543867864682646 message

#

All I get is an empty graph

serene scaffold
sage sparrow
#

Oh got it, sorry

#

Should I delete this one then?

serene scaffold
sage sparrow
#

Okay, how do I do that?

serene scaffold
sage sparrow
#

Like that?...

serene scaffold
sage sparrow
serene scaffold
#

nothing. it's fine.

sage sparrow
#

I'm causing way too much trouble lol

serene scaffold
#

just remember for next time.

sage sparrow
#

Got it, thanks

tranquil mist
#

Do you guys know of any insightful papers on upsampling lowpassed signals into broadband ones?

serene scaffold
#

I'm not home right now, but I might be able to help when I get home. In the meantime, you can share the code and error message as text.

languid glade
#

im tryna code a script in python that would lock onto the color green, but inside the game it doesnt move the guy to force him to look at the green, though it works in general

orchid forge
#

Be safe while reaching home

arctic geyser
serene scaffold
#

But in general, I always live on the edge

orchid forge
#

Oh

half remnant
#

i am trying to get the data out from a .mat file which has acceleration data from the matlab mobile app but it does not seem to work. i can get it out in matlab super easy but when i try to do it in python it wont work. the data that is in the file is the time, x, y, z accelerations; here is the code and the result for it:

from scipy.io import loadmat

# Load the .mat file
acceleration_data = loadmat("drop.mat")

print(acceleration_data)

result:

{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sat Apr  6 11:28:20 2024', '__version__': '1.0', '__globals__': [], 'None': MatlabOpaque([(b'Acceleration', b'MCOS', b'timetable', array([[3707764736],
                     [         2],
                     [         1],
                     [         1],
                     [         1],
                     [         2]], dtype=uint32))                         ],
             dtype=[('s0', 'O'), ('s1', 'O'), ('s2', 'O'), ('arr', 'O')]), '__function_workspace__': array([[ 0,  1, 73, ...,  0,  0,  0]], dtype=uint8)}
gritty vessel
#

Hello Everyone

#

I have a doubt: is a CNN's architecture also responsible for the output shape?

#

I am feeding an array of shape (1,1536,1392,6) as input and in output it's an array of (1,1536,1392,2)

#

But whenever I fit it, it always gives a shape mismatch error

#

And also I can't find any tutorial where they use arrays instead of images in cnn

#

I would really appreciate if you guys can suggest me some resources or projects which uses Multidim arrays in cnn

serene scaffold
serene scaffold
gritty vessel
serene scaffold
gritty vessel
#

You are right

#

For now can I explain the error ? I couldn't get the error from my internship lab they don't allow internet or electronic items in there 😅

serene scaffold
gritty vessel
#

Alright just a min I will show a similar error

#

ValueError: Input 0 of layer sequential is incompatible with the layer: expected shape=(None, 1392, 6), but got shape=(68232320, 64)

#

I will try to note down the error today from my lab computer and will show the proper one this evening

serene scaffold
#

when it says that the expected shape is (None, 1392, 6), the None means "however many instances you have" or "batch size". so every instance, if viewed stand-alone, would be of shape (1392, 6)

#

and if you had three of those together, then the shape would be (3, 1392, 6). and if you instead had (4176, 6), which is 1392 * 3, you'd know you messed that step up somehow.
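
A minimal sketch of that difference with NumPy, assuming each instance has shape (1392, 6). The data is random and only there to show the shapes; it is not the original poster's data.

```python
import numpy as np

# Three made-up instances, each of shape (1392, 6).
rng = np.random.default_rng(0)
instances = [rng.normal(size=(1392, 6)) for _ in range(3)]

# Stacking creates a new leftmost (batch) axis.
batch = np.stack(instances, axis=0)
print(batch.shape)  # (3, 1392, 6)

# Concatenating along the existing first axis glues the instances
# end to end, so the batch dimension is lost.
glued = np.concatenate(instances, axis=0)
print(glued.shape)  # (4176, 6)

# A single instance still needs a batch axis of size 1.
single = instances[0][np.newaxis, ...]
print(single.shape)  # (1, 1392, 6)
```

If the shape you end up with looks like the `glued` case, the joining step is the likely culprit rather than the layers.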

gritty vessel
#

I tried with one array also

serene scaffold
#

with one array?

gritty vessel
#

I mean stacked array

#

Of shape (1,1536,1392,6) as X

serene scaffold
#

you should be stacking instances along a new, leftmost dimension

gritty vessel
#

And for y array of shape (1,1536,1392,2)

serene scaffold
#

so each instance, viewed on its own, is an array of shape (1536,1392,6)?

gritty vessel
#

Yup

serene scaffold
#

what is the model designed to do?

gritty vessel
#

To predict where lightning is going to happen so I had pin point coordinates of lightning

#

What I did was create an image: where there is no lightning it has zero, and places where lightning is happening are represented by 1

gritty vessel
#

And another one is count wherever the lightning is 1 what's the count of it

serene scaffold
#

can you show the whole error message, starting from traceback?

gritty vessel
#

At that place so we can measure the Intensity

gritty vessel
serene scaffold
gritty vessel
#

I tried it on colab and my system but both crashed

serene scaffold
gritty vessel
#

Not enough resources error on my system

#

And Colab crashed and restarted

serene scaffold
gritty vessel
#

Single

serene scaffold
#

what kind of architecture is this?

gritty vessel
#

It was basic I just wanted to check whether it runs or not

#

But still it crashed

serene scaffold
#

can you copy and paste the whole thing into the paste bin?

#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

gritty vessel
#

I can't

serene scaffold
#

why not

#

because if you can't show me the code you were planning to work on in the evening, I won't be able to help with that

gritty vessel
#

My workplace does not allow electronic items so I didn't take my laptop

gritty vessel
serene scaffold
gritty vessel
#

It's 7am here what about at yours?

serene scaffold
#

I'm east coast us

#

9:34 pm

#

anyway, problems like this are either because instances are being joined incorrectly, or the layers of the network are set up incorrectly (the output of one layer can't be fed to the next)

gritty vessel
#

Yeah that can be the problem

#

Alright I will ping you when I come back home is that ok?

serene scaffold
#

For this specific question, yes. But not in general.

gritty vessel
#

Yeah for this problem only not for anything general

slender kestrel
#

Hey, so i am registering for a 36-hour live hackathon and we are free to decide our own problem statement. the theme is build with ai to solve real world problems, so what are some real-world problems which i can solve using ai in 36 hours? the problem statement should be novel since it is for a hackathon

tired lodge
arctic wedgeBOT
#

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

foggy heath
#

How can I implement AI to my discord bot? I want it to behave just like other AI models but I want to also give it a whole lot of information to answer certain questions and talk in certain ways. Is there like some website where I can create/edit a model then use APIs to make my bot send the model's messages.

serene scaffold
foggy heath
#

Language generation, I have no intention for it to generate any other media (right now.)

serene scaffold
foggy heath
#

I'm currently testing out hugging face's transformers and a pretrained model of gpt2 but things aren't going as planned

serene scaffold
foggy heath
foggy heath
#

right now I can't get it to even generate simple english

serene scaffold
foggy heath
foggy heath
foggy heath
#

uhh

#

gtx 1650 i believe laptop gpu

foggy heath
#

whaaaaaaaaaaaat

serene scaffold
#

you'll never be able to fine-tune ("edit") an LLM with that little compute power.

serene scaffold
# foggy heath whaaaaaaaaaaaat

the state-of-the-art in AI pretty much always depends on the best hardware that currently exists anywhere. so if you want to do something in a home lab, you need to be content with staying a few generations behind, or paying for compute resources.

#

with a GTX 1650, you're probably still behind the "fine tuning language models" stage. and definitely behind for fine-tuning any 7 billion+ parameter models.

muted hollow
#

guys, which ide should i use. Im currently using google colab but keep exceeding usage limit

serene scaffold
misty mist
#

If you are using colab to train NN model, i think colab provides tesla t4 gpu. I recommend you to use gpu of kaggle, it will provide you p100

#

but i think there is 12 hour limit, and 30 hours limit per week

toxic mortar
#

with i9 13thgen

midnight harbor
#

Guys, can anyone please advise me on the best (in your experience) language detection (classification) library / free API
for python?
tag me when anyone replies

frozen tundra
#

hi, i made my own neural network, and it works great using one input and one output only, but when i tried teaching it using the mnist dataset (more than one input and output) it doesnt work anymore. there is a mismatch of matrices and i dont understand how it works for one input and output but not for more https://paste.pythondiscord.com/JE2Q

#

the error is:
Traceback (most recent call last):
  File "C:\Users\iddob\PycharmProjects\Neural2\main.py", line 24, in <module>
    nn.back_prop(x_train[i], y)
  File "C:\Users\iddob\PycharmProjects\Neural2\NeuralNetwork.py", line 112, in back_prop
    d_predicted = (np.dot(self.weights[i].T, d_predicted) *
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (10,20) and (10,) not aligned: 20 (dim 1) != 10 (dim 0)

serene scaffold
#

great job getting it to work on individual instances, though! the hard part is done.

frozen tundra
#

Thanks, I will try to implement it. Do you have a website or an article you can direct me to?

serene scaffold
frozen tundra
#

Thanks I will look into it

serene scaffold
frozen tundra
serene scaffold
frozen tundra
#

Thank you

robust stratus
#

I am using data grip and I can't see the page where it shows my query results

#

Idk why

orchid sky
#

@serene scaffold I was able to figure out the issue as it was indexing

serene scaffold
#

YAY

orchid sky
#

I did review python once more

#

Thank you again for the help

mild grotto
#

My brain is having trouble with my project

warm flame
#

hi, everybody. does anyone know if there is a way to extract tweets without the api of x

fallow leaf
#

Hello guys, what is the best way to learn machine learning with python? I tried learning about linear regression but I feel overwhelmed by all the math and visualization and coding that are behind every concept.
have you tried self studying ML? if you have what is your suggestion

subtle remnant
jaunty helm
fallow leaf
#

i know about python and some basic stuff about needed libraries

jaunty helm
#

ideally you can read the code and know at least like 80% of what's going on in the code when learning a new ML technique
what you don't want is to often have to look up what the functions / syntaxes do in a specific step

fallow leaf
#

I like reading about stuff, any book suggestions?

jaunty helm
fallow leaf
#

thanks

toxic mortar
#

in the DL would you normalize every input ( non-categorical ) data?

stiff urchin
#

Before learning concepts like linear algebra, probability statistics, calculus for machine learning, what topics do i need to have a solid grasp to understand these concepts clearly? could anyone help with this?

spark nimbus
#

basic algebra would be a good start, after that shift to the topics you mentioned (I'd recommend linalg -> calculus -> probstat)

past meteor
#

That's how it was done in my bachelors. lin alg -> (multivariate) calculus -> probability -> statistics (1 course) -> econometrics -> ML

plucky sedge
#

someone please help with my question, I've been waiting over 1 hour for someone to reply (I had to repost) #1227266024231932015 message
πŸ™πŸ™

slender kestrel
tired lodge
rancid sorrel
#

anyone know how to deal with pyspark and "dirty" text, when it comes to nlp imports

rancid sorrel
#

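got it to work with

```py
# reconstructed opening line: assumes an existing SparkSession named `spark` and a CSV source
df = (spark.read.format("csv")
      .option("header", "true")
      .option("quote", "\"")
      .option("escape", "\"")
      .option("multiline", "true")
      .load(datafilenew))
```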
flint aurora
full furnace
#

Data science job still in demand? What's the essential stuff I need to learn to land a entry level job

serene scaffold
hollow sentinel
#

hi everyone, i'm trying to visualize this data

#

so i want a line graph in seaborn

#

where each ID is its specific line

#

i want the IQ on the y axis and the POS on the x axis

#

and i want to use the month and year column on the x axis

#
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_excel("/Users/rahuldas/Desktop/BAN 112 Final Project/Analysis_G1_Value.xlsx")
print(df.columns)
df['Date'] = pd.to_datetime(df['Year'].astype(str) + df['Month'], format='%Y%B').dt.strftime('%Y-%m')
print(df.columns)
#

this is what i have so far

#

am i making sense?

#

i'm just so stuck

hollow sentinel
#

i'm confused too

#

is what i'm suggesting not possible?

serene scaffold
#

IQ on the y-axis, POS on the x-axis, and I want to use the month and year column on the x-axis
if you have a date and two features, then the date needs to be the x axis. You'll probably need to do it as two lines, one for IQ and one for POS.
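
A rough sketch of the two-line version, assuming columns named Year, Month, IQ and POS as in the snippet above. The rows are made up, and summing duplicate dates is only one possible choice of aggregation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; the column names are assumed from the messages above.
df = pd.DataFrame({
    "Year": [2023, 2023, 2023, 2023],
    "Month": ["March", "January", "February", "January"],
    "IQ": [120, 100, 90, 40],
    "POS": [110, 80, 95, 30],
})

# Keep Date as a real datetime so it sorts chronologically.
df["Date"] = pd.to_datetime(df["Year"].astype(str) + df["Month"], format="%Y%B")

# One row per date, in date order (sum is an assumption, mean would also work).
monthly = df.groupby("Date")[["IQ", "POS"]].sum().sort_index()

fig, ax = plt.subplots()
ax.plot(monthly.index, monthly["IQ"], label="IQ")
ax.plot(monthly.index, monthly["POS"], label="POS")
ax.set_xlabel("Date")
ax.set_ylabel("Quantity")
ax.legend()
fig.autofmt_xdate()
```

Sorting and aggregating before plotting matters here: unsorted or repeated dates make a line plot zigzag back and forth.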

hollow sentinel
#

it's gonna be chaos with that many lines

serene scaffold
#

It would just be two.

hollow sentinel
#

huh

#

i think my brain is dead

serene scaffold
hollow sentinel
serene scaffold
#

the dates are out of order. that's a problem.

hollow sentinel
#

oh shit, i didn't realize that

serene scaffold
#

here we go

#

looks like dates aren't unique

hollow sentinel
#

that's a wacky looking graph

serene scaffold
#

it's a plot. not a graph. if you're going to be a data scientist, you can't use those words interchangeably.

hollow sentinel
#

...plot

#

i've been stuck carrying this group project on my shoulders for the past 3 weeks

#

it's an eight person group project

#

eight people stel

#

the whole goal is to figure out the relationship between invoiced quantity (IQ) and sales (POS)

#

ugh whatever i guess i'll go to office hours again

hollow sentinel
serene scaffold
hollow sentinel
#

oh oh oh, ok

serene scaffold
hollow sentinel
hollow sentinel
desert oar
#

it infuriates me to no end that Axes.scatter doesn't accept markersize=, only the magic s=

#

i like the magic s=, but sometimes i don't want it! argh

#

@hollow sentinel example of what i was talking about above:

import matplotlib.pyplot as plt
import numpy as np

seed = 2610494104
T = 1000
rng = np.random.default_rng(seed)
t = np.arange(T)
x = t / 1000 + rng.normal(scale=0.25, size=T)
y = x * 2.5 - 0.5 + rng.normal(scale=0.5, size=T)

plt.scatter(x, y, c=t, s=5)
plt.colorbar(label="Time")
plt.xlabel("X")
plt.ylabel("Y")
plt.title(f"Simulated data\n{seed=}")
plt.show()

(edited to use a better example)

shut girder
#

Hello everyone, I am currently studying Linear Algebra for data science, what are the key concepts that I should get a good understanding of? I don't know if studying concepts arbitrarily would be a good use of time

wooden sail
river mural
#

is there a numpy/scipy alternative that supports very high precision like float128, float256, float512 or even exact math (i do not care much about performance)

jaunty helm
long canopy
#

new mixtral + griffin HYPE

desert oar
desert oar
#

wait, numpy has float128 @river mural
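
A small sketch of what is available without extra packages. Note that `np.float128` only exists on some platforms, and on x86 it is usually 80-bit extended precision padded to 128 bits, so `np.longdouble` is the portable name. The `decimal` and `fractions` parts are a standard-library suggestion added here, not something mentioned above.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

import numpy as np

# Precision of the widest float NumPy offers on this machine (platform dependent).
info = np.finfo(np.longdouble)
print(info.bits, info.eps)

# Arbitrary decimal precision, slow but simple.
getcontext().prec = 50            # 50 significant digits
print(Decimal(1) / Decimal(7))

# Exact rational arithmetic.
third = Fraction(1, 3)
print(third + Fraction(1, 6))     # 1/2
```

Neither `Decimal` nor `Fraction` vectorizes like a NumPy dtype, which is fine when performance is not a concern.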

agile cobalt
agile cobalt
#

weirdly enough there is a is_decimal256 function but I do not see documentation about the decimal256 type itself

leaden narwhal
#

anyone can link me to a webpage talking about lstms and rnn's for grids, number of people and datetime

#

aka i want to predict future number of people in a certain grid in future times

serene scaffold
#

if you ask your question in more than one place, please link to the help thread, so that there's no duplication of effort

iron ruin
#

ill delete the message since i posted in help desk

#

dammet why scipy gotta go remove those funcs all of a sudden yert

iron ruin
serene scaffold
iron ruin
#

ohh alright thanks

serene scaffold
#

yw

iron ruin
#

now to wait while I try to somehow solve this dayum triu issue

iron ruin
# serene scaffold yw

if I want to ask for help regarding import issues instead , which channel should I ask ?

serene scaffold
hollow escarp
#

Hi, i've been told that i should ask here about opencv. Im currently starting work on plate recognition software which will run on a Raspberry Pi with an additional camera. Im wondering what methods would give me the highest precision in an outside environment. Also do you know any good cameras which are cheap and would work well even at night and in bad weather conditions? Also im wondering whether python would perform much worse than c++ ( i dont need super efficient software working in less than 2 seconds ), i just need to keep it as effective as possible

hollow escarp
# hollow escarp Hi, i've been told that i should ask here about opencv. Im currently starting wo...

Also i found some datasets with trained images to yolov8 and My question is, how accurate are these sets? https://universe.roboflow.com/roboflow-universe-projects/license-plate-recognition-rxg4e/dataset/4 - 22174 images

odd meteor
jade bloom
#

Hello, I have a question. I've trained an RL model to pathfind to a target. All this is a square (the robot) and a point (the target). Every frame the model uses the distance from the robot to the target, the target's coordinate point, and the robot's coordinate point to estimate the optimal angle to drive at to reach the target. After the angle is found, in the same frame the robot moves at that angle.
However, I am running into an issue where the model returns angles, but they are noisy. The drive angles vary quite a bit, +- 30 degrees. The robot is still able to drive to the point, it just jitters a lot. Is there any way to smoothen out the robot's path and/or filter the noise?

sweet zealot
#

It's my first day learning AI(RL) I'm watching a video about reinforcement learning in python. I'm not sure I understand these spaces correctly.

Let's say you want to create a bot for the video game flappy bird. I thought with space they meant the environment of the game (so the game window in X,Y coordinates). Since the bird can move from the bottom border to the top border. So the space of the environment would be:
box(bottom_border, top_border, 2,) ( 2 because flappy bird is a 2D game )?
Is my thinking correct here?

Also, for some reason I saw people put infinite numbers for lowest and highest value box(bottom_border, top_border, 2,) and they explained it would be fine. Which makes me realise I do not understand wtf a space actually is used for

mild grotto
#

I recommend thinking about it like this:
The INPUT should describe your current state when you pause the game

#

For flappy birds, my thinking is that it would be:
[ your Y coordinate, your Y velocity]
Because your bird is affected by gravity. You have no X coordinate, you can't move forward or backward, or anything like that.

#

Also as part of the input, you would need the locations of all obstacles (how far away/how tall they are)

#

Then it's simple: If you think this fully describes all the information about the current board state, then the output is just "Do I jump, or not jump?"

#

So the output is simply a true/false value (or 0,1)

#

And you can see even in the example under Dict he uses Height: ... Speed:Box(0,100, shape(1,)) . which is exactly what I'm describing for your bird
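
A rough NumPy sketch of that idea, just to make the input/output split concrete. The function name and the `(distance, gap_top, gap_bottom)` layout for each pipe are made up for illustration; in a Gym-style library the observation would typically be declared as a `Box` and the action as `Discrete(2)`.

```python
import numpy as np

def make_observation(bird_y, bird_vy, pipes):
    """Flatten the paused game state into one vector.

    pipes is a list of (distance, gap_top, gap_bottom) tuples; these
    field names are hypothetical, chosen only for this sketch.
    """
    values = [bird_y, bird_vy]
    for distance, gap_top, gap_bottom in pipes:
        values.extend([distance, gap_top, gap_bottom])
    return np.array(values, dtype=np.float32)

# The output is a single yes/no decision.
ACTIONS = (0, 1)  # 0 = do nothing, 1 = jump

obs = make_observation(bird_y=120.0, bird_vy=-3.5, pipes=[(80.0, 60.0, 160.0)])
print(obs.shape)  # (5,)
```

The space definition then only describes what values such a vector is allowed to contain, which is why very wide or infinite bounds can still work.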

hallow sphinx
desert oar
thin palm
#

Hey, I am looking for a way to label data via web services is there an existing data base that would allow me to deploy a web based labeling tool?

#

using flask

urban cipher
#

so i pickled my keras model and tried loading it into my flask app for use, but an error occurred:

TypeError: unpack_keras_model() missing 1 required positional argument: 'optimizer_weights'

any idea to fix this?

trim saddle
urban cipher
#

my Machine Learning instructor even demonstrated how to load a pickled model from his example, and have tried pickling before and loaded it normally (on google colab), but it seems it won't on my machine. could it be an incorrect version of keras and tensorflow installed?

leaden narwhal
finite fjord
#

!python /content/Licence-Plate-Detection-using-YOLO-V8/ultralytics/yolo/v8/detect/train.py model=yolov8n.pt data=/content/Licence-Plate-Detection-using-YOLO-V8/helmet-detection-1/data.yaml epochs=100

#

Hello everyone, I'm working on helmet detection using yolo v8 and I'm facing this error. I have loaded the dataset from roboflow

lofty thorn
#

does anyone know from where i can find assignments on excel

gritty vessel
#

@serene scaffold hey I asked a que here regarding shape error in cnn .I just wanted to update I solved it on my own

#

The problem was only in my architecture: my output image is the same shape as the input, so I had to upscale it back to the original size after it gets downsampled

leaden narwhal
#

will this still work?

orchid forge
#

i need help

sweet zealot
# mild grotto you can't move forward or backward

What exactly are these spaces used for? Are these 'space values' the values the model uses to check differences between actions? And does it make choices based on these values? That would mean the space values you define (which you define as 'input') are like the foundation for your model, if I'm understanding this correctly.

So what if, instead of the Y coordinate and Y velocity space like you mentioned, I used more values, would it make the model better?

For example:

  • The absolute difference between the bird and obstacle.
  • X,Y Coordinates of all obstacles on screen, so not just the upcoming obstacle. Allowing the bird to anticipate the next obstacles after the upcoming one?
  • Y coordinates of the bird.
  • Etc...

I'm still a bit confused about the actual spaces the guy in the video defined. but I think I do know what spaces are used for now

sweet zealot
# mild grotto And you can see even in the example under `Dict` he uses `Height: ... Speed:Box(...

You mentioned Dict({"height": Discrete(2), "Speed": Box(0, 100, shape=(1,))}) would be a good fit for a flappy bird model.

But if I break it down I don't understand it. Does height refer to the Y axis? That should be the absolute difference between the bottom border and the top border. But then why does it say Discrete(2)? Does this mean there are 2 values?

I'm assuming speed box just means speed value from 1-100 so that's clear.

Also don't understand what shape means.

craggy bough
#

i am trying to build a sequence to sequence model with 5 features but i need help, as i don't know how to set the outputs for each and the shape. i am using lstm

#

is there anyone who wanna get on voice chat and i can stream the code and you can see and help me

serene scaffold
mild grotto
#

They come from both the top and bottom. So you can't just use one "height" value. One value does not tell you where the pipe is. You need two

#

If a pipe is 50 pixels from the top of the screen, does that mean you can go under it or over it? Clearly you need both the start and end of the height

#

So height needs 2 values per pipe

mild grotto
elfin robin
#

is subset selection and feature selection same ?

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @vivid magnet until <t:1712860170:f> (10 minutes) (reason: newlines spam - sent 12 consecutive newlines).

The <@&831776746206265384> have been alerted for review.

vivid magnet
#

Hello guys.
I'm trying to install the delta-spark package to work with deltalake in python, but when i follow the tutorial steps, it raises a 'PySparkRuntimeError : [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.'
I already installed Java from the oracle site (version 17), and the environment variables include JAVA_HOME (C:\Program Files\Common Files\Oracle\Java\javapath\java.exe). But the problem persists. I'm on Windows and idk how to set PYSPARK_SUBMIT_ARGS

odd meteor
# leaden narwhal also my grids are already created they are squared

Yeah, but the question is how well? If you want better model performance, go for hexagonal grid instead of square grid (H3 creates hexagonal grids)

There are some good reasons to use hexagon instead of square grids. Some of them are

  1. You can project them really well on any round surface (for example, picture fitting a square grid vs a hexagonal grid on something round like a globe, yeah, that round spinnable globe high school geography teachers always seem to have 😀)

  2. The distance from the centre of one hexagon to the centre of a neighbouring hexagon is the same, but such is not the case if you're using a square or triangle grid.

  3. You'll also get better projections with Geohashes.
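
The equal-distance point can be checked with a few lines of plain Python. This is a flat-plane idealisation with made-up helper names; real H3 cells sit on a sphere, so their spacing is only approximately uniform.

```python
import math

def square_neighbor_distances(cell=1.0):
    """Centre-to-centre distances to the 8 neighbours of a square cell."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return sorted(round(math.hypot(dx * cell, dy * cell), 6) for dx, dy in offsets)

def hex_center(q, r, size=1.0):
    """Centre of a pointy-top hexagon given axial coordinates (q, r)."""
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)

def hex_neighbor_distances(size=1.0):
    """Centre-to-centre distances to the 6 neighbours of a hexagonal cell."""
    axial_offsets = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
    x0, y0 = hex_center(0, 0, size)
    distances = []
    for q, r in axial_offsets:
        x, y = hex_center(q, r, size)
        distances.append(round(math.hypot(x - x0, y - y0), 6))
    return sorted(distances)

print(square_neighbor_distances())  # two distinct values: 1.0 and about 1.414214
print(hex_neighbor_distances())     # a single value, about 1.732051, six times
```

With squares, diagonal neighbours are farther away than edge neighbours, so "one cell over" does not mean one fixed distance.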

If you wanna have more fun with your project, you can compare & contrast model performance on spatial features with square grid vs hexagonal grid.

Long video, but you'll understand better what I'm tryna say when you watch this https://youtu.be/TqRGLtbAHHw?si=Kw4abjH9mKgTaYZ_

desert oar
#

among many other practical reasons, there's already a fast implementation in C++ and a solid vectorized implementation in the h3ron library. just import and go.

viral zinc
#

Hi, I am new to data science and was hoping if someone can share a roadmap with me to learn data science with python and R. Currently I am comfortable writing code in python, but don't know the data science packages (numpy, pandas).

#

also, should I only be focusing on data science with python right now and then later learn it with R? or learn data science both with R and Python at the same time?

serene scaffold
#

They're not complementary tools. Anything you do with R, you can do with python.

dense oar
#

Are data science courses necessary components of an AI degree at a university?

fallen steppe
#

How do i run the process using the gpu. This is Kaggle

#

It's enabled but it still runs on cpu and ram

#

Do i have to make changes in the code to use the gpu or something?

wooden sail
#

yep

#

code is written a little differently for gpu

#

in pytorch you have to manually move stuff to the gpu
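
A minimal sketch of what "moving stuff" looks like, assuming PyTorch is the framework in use. The tiny linear model and random batch are placeholders, not anyone's actual code.

```python
import torch
from torch import nn

# Use the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)       # move the model's parameters
inputs = torch.randn(8, 4).to(device)    # move each batch of data too

outputs = model(inputs)
print(outputs.device)
```

Both the model and every batch have to live on the same device, otherwise PyTorch raises a device mismatch error.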

fallen steppe
#

Ohh alright. Could you just send me any such code for reference

fallen steppe
#

Got it 👍

#

Now the gpu is running

leaden narwhal
#

Thing is he is a bit proud cause he is an assistant teacher

#

But nonetheless regarding my Machine learning process this whole ordeal has been a nightmare

#

Every time I think I'm going forward

#

I take 20 steps back

#

I feel like dying cause this learning curve is so big

odd meteor
spring field
#

Mmm, so, I have finally gone out and made my own ML model for a thing just for sort of practicing... anyway, it's very good at overfitting the data. sobbing
Mostly I just wanna hear your opinions on what part of the process has likely caused this issue.

I'll start with what is my end goal, there's this site https://http.cat/ that provides an API for some images (see first attachment for an example), I want to train a model that can "predict" (?) the coordinates of the top left and bottom right corners of a rectangle that would cover the number. So basically object localization is what it seems to be called (object detection but for a single object?). The images are of different resolutions and I want to devise a somewhat general approach for this and I kinda don't want to involve these images in the training because there are only a few of them, so I want to train the model on a larger dataset and then pretty much just put it to test on the actual images.

Alright, so, the dataset, I picked a rather naive approach for this and basically my idea was/is to generate a bunch of images with a resolution of [500, 800] by [500, 800] (those are ranges, size is picked at random in steps of 1 (so, any integer in the range, this is true for other random ranges mentioned here as well)) (the size is similar to what the end goal image sizes are). The base image/background is black. Then in those images I put a [300, 500] by [300, 500] area of random RGB noise (to sort of simulate the cat picture) (using np.random.randint for each channel) at a random location within the base image. Then I put a randomly generated string from string.ascii_letters of length [10, 30] (randomly chosen length) with a font size of 30 pt in a random place on the base image. Lastly I put a number in range [100, 999] (not randomly generated, these were just made in a sequence, one after another) in a random place on the base image (font size 50 pt). After crafting this random image, I downscaled it to a fixed size of 200 by 200, calculated where the top left and bottom right points should be for the rect and saved that all in a dictionary for use in the model. See the second image for an idea of how the final image looks like. I mean, it's really sort of very random, so I'm guessing that might be throwing the model off as well. And the dataset contains 4 such random images for each number in range [100, 999].

The model is also quite random I think, it's 4 layers of 2d convolution, batch norm, and relu, then it goes through a linear layer and finally a sigmoid. Using SGD as the optimizer and L2 normalization as the loss. See image 3 for the model implementation in pytorch. All that stuff is sort of kind of chosen randomly. Learning rate is 0.001. Batch size is 16. Dataset should be randomly shuffled (using the same seed value).

Now onto how it trains... well, it's overfitting the training data really badly, see image 4.

So, thoughts on what could be the major culprit or maybe everything is horrible. 😅
I suspect the dataset could be more structured to avoid all that stuff overlapping with each other, I can imagine the model itself can be improved as well. Ideally I'd ofc want the model to recognize the numbers and understand that that's where it needs to focus on basically... (way easier said than done, lol)

#

oh also, the testing data is like 20% of the dataset (that is, I take an 80:20 split from the dataset for training and testing respectively), it's not the actual images from that website, it's testing against similar images from the same (randomly generated) dataset

spring field
#

where did you go? sobbing
so, sth about making the input less random, like applying the noise to the whole image instead of adding those random areas, and some other things
what about GAN though? that's sth about generative AI, which is certainly sth I'll probably check out some time, so thanks for that bit, but I guess it doesn't fit this case?

#

alright, what if I increase the dataset like 25 times? generating a 100 random images for each number? (sounds like it would take forever to compute... that's not good either)
the other idea that just came to mind regarding object localization is creating a varied dataset where some images don't even have a number and have it detect its presence alongside its location

#

how do people even research this stuff? cuz sure I can look up existing solutions to this probably, but I kinda wanted to tackle it myself a bit (though clearly I'm here asking for advice, lol... but anyway, probably better for my learning still and whatnot)
do researchers just put random stuff together and see how well it works out?

wooden sail
#

you run into a "practical" problem. then you read about that problem and see if there are good solutions. if the solutions aren't satisfactory or you can identify a way of making them better, you try out your ideas. if it works, great! if not (which is like 99% of the time), you dust yourself off and decide whether to try a different approach or move to a different problem

#

doing stuff randomly and/or blindly is a great way to increase those 99% odds to like 99.9999% that you'll fail

lofty thorn
#

what is the reward of the agent in reinforcement learning??

#

or penalty

serene scaffold
#

You have to set up the training procedure in such a way that the agent's actions can cause the reward to go up, if it does the right thing

#

And the agent needs to "know" that that happened
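
A toy sketch of that idea, with a made-up 1-D corridor as the environment: the reward is just a number the environment hands back after every action, and the agent's total is what training tries to push up. The corridor, the reward values and the two policies are all assumptions for illustration.

```python
import random

GOAL = 5  # hypothetical 1-D corridor: start at 0, goal at position 5

def step(position, action):
    """Apply action (-1 or +1) and return (new_position, reward, done)."""
    position = max(0, position + action)
    if position == GOAL:
        return position, 1.0, True      # reward for doing the right thing
    return position, -0.01, False       # small penalty for every extra step

def run_episode(policy, max_steps=100):
    position, total = 0, 0.0
    for _ in range(max_steps):
        position, reward, done = step(position, policy(position))
        total += reward                 # this is the signal the agent learns from
        if done:
            break
    return total

always_right = run_episode(lambda pos: +1)
rng = random.Random(0)
random_walk = run_episode(lambda pos: rng.choice((-1, 1)))
print(always_right, random_walk)
```

A learning algorithm would sit on top of this loop and adjust the policy so that the returned total goes up over episodes.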

signal holly
#

guys how do I learn ai/ data science
I've been holding off on this for months now because idk how to exactly go about this

#

I searched up guides but there are so many and I'm still confused on what to do even after I try applying them

#

it's a mess

serene scaffold
# signal holly guys how do I learn ai/ data science I've been holding off on this for months n...

Here are some suggestions/things to keep in mind:

  • You will not get a job in DS/AI without a degree.
  • Guides on websites like Medium and Towards Data Science aren't actually intended to be helpful. They're just portfolio fodder for the authors.
  • DS/AI is applied math, so expect to learn lots of math as a separate thing from learning programming.
  • Don't try to learn DS/AI in terms of Python libraries. Python libraries like scikit-learn and pytorch are very helpful for creating models, but they aren't designed so that you learn more about DS/AI as you use them.
  • Pick a textbook or video course and stick with it, and make sure that you're actively engaging with it in some way. if it has practice problems or "homework assignments", do them.
iron basalt
#

Wide to break out of local minimum. So let yourself go wide every once in a while, especially if it's easy / cheap / quick to do so.

signal holly
#

what do you think?

serene scaffold
signal holly
serene scaffold
signal holly
#

what would I do after that course tho
would it just be one course

serene scaffold
signal holly
#

hmm when would I do projects tho
also since ai/ data science is a huge field
how would I know which direction to take

signal holly
serene scaffold
#

but also, if you're serious about wanting to learn DS/AI, six hours is not that much. if you were in a degree program related to AI, you'd be doing 40+ hours a week.

spring field
spring field
signal holly
spring field
#

I mean, at this point the idea of making an AI Player for a 3D game feels lightyears away (as, of course, expected)
I'll take that idea of writing articles for a portfolio though, I suppose that would help with understanding the concepts better as well

iron basalt
#

Once you know enough you can pick random points much better, so it can act as something extra or when you want to go more in a specific direction.

#

This also applies to learning libraries by just watching videos versus reading the documentation (if it has any).

signal holly
iron basalt
#

Videos are useful at making you aware of things though. A book requires a lot of time to get through and you may not know yet if you actually care about the topics it covers. A book on linear algebra will usually not make it immediately apparent why you should care about it / if it's applicable to your problems, but a short video can quickly demonstrate its use without many details.

iron basalt
# signal holly wdym about this are you saying documentation is better

Documentation is like a book, if you really want to know the library, then you can read the documentation. Videos and other short forms can only give you small parts that may be enough, but in the case of something more complex like AI it can't cover enough (unless the video starts getting so long as to effectively be an audio book).

iron basalt
#

It's also why it's often the hardest part about game dev. Open ended, high dimensional. Not as comfy as just writing a rendering engine using very well established methods (or other more solved parts).

shut girder
#

Hello everyone, I am a beginner in machine learning. As I gradually build up my understanding of the mathematics used in machine learning, what would be some good ways to apply what I learn?

grim fog
cursive folio
#

Where can I get info on how to make Survival Predictability (months) on oncology (analysis), I have been researching online a lot but haven't found any info. Any help appreciated!

abstract rune
abstract rune
lapis sequoia
#

Hey everyone! I'm offering $100 to anyone who can help me install two local instances of voice cloning software like TortoiseTTS, X-TTS, etc. I'll pay $50 after each successful install, which means it should consistently clone the voice of a chosen person.

I've run into some snags trying to do it myself, so be prepared for a few challenges along the way.

Specs wise, I'm working with an NVIDIA GeForce RTX 4070 Ti, 13th Gen Intel(R) Core(TM) i5-13400F, and 32 GB of RAM, so hardware shouldn't be an issue.

I'll be checking my DMs tomorrow at 18:00 BST and will give everyone a fair shot, first come, first served. Looking forward to your messages!

peak ridge
#

hey chat

orchid forge
#

hello i want to make a cluster using a data

#

i really need help here

#

im not getting what data i should use to make a clustering map

#

anyone here?

#

this my dataset

#

how to solve this "Identify any patterns or clusters of restaurants in specific areas"

#

@serene scaffold

#

guys please

orchid forge
#

fuck it....i did it myself

#

hahahahaha

#

shove it

#

Rap helps to solve fucked up shit

agile owl
#

my spark job failed in the middle due to some random IO Error and java tracebacks suck

#

any suggestions on how to debug this

sweet zealot
# mild grotto Keep in mind, I haven't watched that video and I don't know what that guy is doi...

You're right about the pipes. I forgot to mention that I made the flappy bird game myself. Instead of pipes I used obstacles like small square hitboxes the bird should avoid. There could be 2/3/4/5 obstacles on the same X-axis on top of each other, but never on the same Y axis. I did this so I can easily add a higher difficulty to the game and see how far I can push the eventual RL model. So it looks a bit like this

Currently I just take a screenshot of the game and return

  • every obstacle with X,Y coordinates in a list.
  • the bird X,Y Value.
#

I asked chatGPT and he gave me something like this

```python
self.action_space = spaces.Discrete(2)  # Example: Jump or Don't Jump
self.observation_space = spaces.Box(low=0, high=800, shape=(4,), dtype=np.float32)  # Example: Flappy Y, Obstacle1 Y, Obstacle1 X, Obstacle2 Y, Obstacle2 X
```
#

Action space seems logical,

low=0, high=800 is logical as well.
But then again, shape 4? What if there are more obstacles?

I think shape 4 should be a value that takes my list with every obstacle. I'm pretty sure the model needs to anticipate upcoming blocks as well, so flappy does need to be aware of all obstacles on screen
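
One common way around the "what if there are more obstacles" problem is to keep the observation a fixed length and pad the unused slots. The sketch below is an assumption about how that could look, not something from the game's code: `MAX_OBSTACLES`, `build_observation` and the sample coordinates are hypothetical.

```python
import numpy as np

MAX_OBSTACLES = 5   # assumed cap on obstacles tracked at once
SCREEN = 800.0      # matches the high=800 bound in the snippet above

def build_observation(bird_xy, obstacles):
    """Return a fixed-length vector: bird (x, y), then (x, y) per obstacle slot."""
    obs = np.zeros(2 + 2 * MAX_OBSTACLES, dtype=np.float32)
    obs[:2] = bird_xy
    # Keep the nearest obstacles ahead of the bird; unused slots stay at 0.
    ahead = sorted((o for o in obstacles if o[0] >= bird_xy[0]), key=lambda o: o[0])
    for i, (x, y) in enumerate(ahead[:MAX_OBSTACLES]):
        obs[2 + 2 * i: 4 + 2 * i] = (x, y)
    return obs / SCREEN  # scale everything to [0, 1]

obs = build_observation((100, 300), [(400, 120), (250, 500), (50, 200)])
print(obs.shape)
```

With this layout the observation space would be a `spaces.Box` with `shape=(12,)` and bounds 0 to 1, and the shape no longer depends on how many obstacles happen to be on screen.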

orchid forge
hollow otter
#

can someone explain if there's a way to integrate apps in python

#

like two apps merged by a link or something

serene scaffold
serene scaffold
orchid forge
orchid forge
#

😊

amber badger
#

Hello! Right now I am working on making a KNN model for a discrete outcome (readmission for healthcare patients, either 0 or 1), but since there's such few records with readmitted as 1 compared to the amount with 0, the model is having a tough time accurately predicting readmission, instead only predicting everything as non-readmitted. How can I take a sample from my dataframe with more of an emphasis on readmitted patients so they aren't so underrepresented when being put into the model?

#

(Doing this with Pandas DataFrames and SKlearn kNN)

dense atlas
#

Hey, I wanna learn about data science
Not dive into the algorithm field
But more of making sense of data sets and getting conclusions out of them.
Background: coming from an economics and stats background, I would like to enter a data analyst kind of field. I'm still extremely vague, I know, but that is what I wanna learn more about

grim fog
#

but if it is a high percentage of rows u have to remove to make the dataset balanced u might have to look into upsampling
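
A minimal sketch of upsampling with pandas, assuming a toy frame with a `readmitted` column (the data and column names here are made up). Minority rows are drawn with replacement until both classes have the same count.

```python
import pandas as pd

# Toy training split: 'readmitted' is the rare positive class.
train = pd.DataFrame({
    "age":        [71, 64, 58, 80, 45, 67, 52, 77, 69, 60],
    "readmitted": [0,  0,  0,  1,  0,  0,  0,  0,  1,  0],
})

majority = train[train["readmitted"] == 0]
minority = train[train["readmitted"] == 1]

# Upsample: draw minority rows with replacement until the classes match.
minority_up = minority.sample(n=len(majority), replace=True, random_state=0)
balanced = (pd.concat([majority, minority_up])
              .sample(frac=1, random_state=0)   # shuffle the rows
              .reset_index(drop=True))

counts = balanced["readmitted"].value_counts().to_dict()
print(counts)
```

It is usually safer to do this on the training split only, after the train/test split, so that copies of the same patient do not end up on both sides.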

agile owl
#

this is the rate of mortgage origination in florida

#

Here's the average credit scores for new mortgages corresponding to the previous time series

amber badger
#

I'm using ChatGPT to learn other methods of going about making the model better but I'm slowly but surely hitting a wall with it

neon lintel
#

Has anyone tried to do the KiTS (kidney and kidney tumor segmentation) challenge for learning purposes? If yes, could you please point me to a guide on how to approach the problem? I know there's a lot of proposed solutions as well as general guides to specialized CNNs, and gpt4 has been helpful, but I figured I should ask here as well.

amber badger
#

It's the model we decided to go with, no true reason why outside of experience with it in an undergrad course

gritty vessel
#

Is there any way we can use class_weights for 3+ dimensional data?

#

Whenever I feed my data of shape (28,1536,1392,6) and try to use class weights it throws 3+ dimensions data not supported by class_weights error

hollow otter
royal crest
buoyant shoal
#

hi, dumb question, but does anyone see any glaring mistakes on my "curve fit"?

#
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt


df = pd.read_csv("directory")

def func(x, a1,b1,a2,b2, a3, b3, a4, b4):
    return a1*np.exp(-1*b1*x**2) + a2*np.exp(-1*b2*x**2) + a3*np.exp(-1*b3*x**2) + a4*np.exp(-1*b4*x**2)


x_values = df['x']  # Array of x-values
y_values = df['y']  # Array of y-values
y_error = df['y_err']

popt, pcov = opt.curve_fit(func, x_values, y_values, p0 = [1,1]*4)

a1, b1, a2, b2, a3, b3, a4, b4 = popt

fit_y = func(x_values, a1, b1, a2, b2, a3, b3, a4, b4)

plt.plot(x_values, y_values, 'o', label='data')
plt.plot(x_values, fit_y, '-', label='fit')
plt.show()
```
#

I get this lmao

buoyant shoal
#

I reworked it by first finding a good mathematical function that would fit this "sloping gaussian"

#
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt


df = pd.read_csv("dest")

def gaus(x,a,x0,sigma, m, c):
    return a*np.exp(-(x-x0)**2/(2*sigma**2)) + (m*x + c)

x_values = df['x']  # Array of x-values
y_values = df['y']  # Array of y-values
y_error = df['y_err']

popt, pcov = opt.curve_fit(gaus, x_values, y_values, absolute_sigma=True)

a,x0,sigma,m, c = popt

perr = np.sqrt(np.diag(pcov))
print(type(perr))

fit_y = gaus(x_values, a,x0, sigma, m, c)

plt.errorbar(x_values, y_values, yerr=y_error)
plt.errorbar(x_values, fit_y)
plt.show()
```

#

what does this mean?

fervent seal
#
```python
imu.dropna(inplace=True, subset = ["LONGITUDE", "SBG_ECAN_MSG_EKF_POS(LATITUDE"])
imu = imu.resample(rule="0.05s", on="Timestamp").mean()
#imu.dropna(inplace=True, subset = ["LONGITUDE", "SBG_ECAN_MSG_EKF_POS(LATITUDE"])
#imu.drop_duplicates(inplace=True, subset = ["LONGITUDE", "SBG_ECAN_MSG_EKF_POS(LATITUDE"])
print(imu["LONGITUDE"].head(10))
```
Why do I need to call the second `dropna` after downsampling using resample? It ends up filling some of the rows with NaNs
#
$ python pathplot.py
Timestamp
19665 days 23:55:38.342858076   -81.569231
19665 days 23:55:38.392858076   -81.569230
19665 days 23:55:38.442858076   -81.569231
19665 days 23:55:38.492858076          NaN
19665 days 23:55:39.892858076   -81.569231
19665 days 23:55:39.942858076   -81.569231
19665 days 23:55:39.992858076   -81.569231
19665 days 23:55:40.042858076   -81.569231
19665 days 23:55:40.092858076   -81.569231
19665 days 23:55:40.142858076   -81.569231
Name: LONGITUDE, dtype: float64
#

Is this cause it can't find an appropriate value or something to stick in? Are samples for that entire 0.05s period just missing?

#

I can just backfill all the rows so it isn't really a problem but idk why it's doing it
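
One likely explanation, sketched on made-up data: `resample` creates a bin for every interval between the first and last timestamp, and a bin that contains no rows has nothing to average, so its mean comes out as NaN. The frame below is a hypothetical stand-in for the IMU log, with a deliberate gap in the timestamps.

```python
import pandas as pd

# Samples at 0, 50 and 100 ms, then a gap until 250 ms.
ts = pd.to_datetime([0, 50, 100, 250], unit="ms")
imu_demo = pd.DataFrame({"Timestamp": ts, "LONGITUDE": [-81.1, -81.2, -81.3, -81.4]})

binned = imu_demo.resample(rule="50ms", on="Timestamp").mean()
print(binned)

# The 150 ms and 200 ms bins contain no rows, so their mean is NaN.
empty_bins = int(binned["LONGITUDE"].isna().sum())
print(empty_bins)
```

Whether to drop, backfill or interpolate those rows depends on what the gaps mean for the sensor; dropping them after resampling simply restores the gaps that were already in the data.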

abstract rune
#

can we say that
two vectors of R3 are a basis for R2?
I am a bit confused

wooden sail
supple linden
#

Hi, does anyone know why ImageDataGenerator fails to be imported

#

from keras.preprocessing.image import ImageDataGenerator

wooden sail
# buoyant shoal

i also don't have a chance to debug your code rn, but i would note a few things:

  • you probably want to add an extra parameter that offsets your gaussians, e.g. a1*np.exp(-b1 * (x - c1)**2) (edit: i just noticed you did this in the second iteration)
  • the problem is anyway nonlinear and nonconvex, meaning there is no guarantee you'll find a good solution with gradient methods unless you already start close to the solution. you'll have to pick a better initial guess of the parameters
abstract rune
#

numpy is so overwhelming at start
so many functions ughh

odd meteor
runic parcel
#

how to make a ai, like i will provide lots of data and it will learn. so if i ask it anything it will able to answer me. how to make something like this?

stray salmon
#

Hello, I want to write a program which detects German traffic speed signs, could anyone help me with that?

tidal bough
wooden sail
tidal bough
#

last time I tried both i didn't see a huge difference between the different global searches scipy provides

wooden sail
#

aight, the performance probably depends on properties that are anyway too difficult to check on nasty functions

tidal bough
#

i did have a fun time in the past using simulated annealing to find optimal setups for reactors in minecraft :p

left tartan
jade jay
#

If anyone here is into quant trading lmk. I just started learning python and I want to be on track to be ready for the workforce

sick eagle
#

guys i am a beginner and i want to be like you, guys please, how can i start learning AI, what should i start from?

#

sorry because my english is too weak, i still learn english

jaunty helm
little arrow
#

what would be the best model to use when working with fourier analysis

#

specifically detecting common frequencies present within signals

#

would it be a neural network of some form

buoyant shoal
buoyant shoal
#

Is there a possibility that there's a more approachable way/solution?

wooden sail
#

the words i used earlier (nonconvex and nonlinear) are math lingo for "the problem is difficult and there is no general method that will always work"

buoyant shoal
#

here's the actual question

wooden sail
#

sure

#

they probably expect something simple like a gaussian (since they explicitly mention sigma) and a straight line to correct the "baseline"

#

just looking at the plot you can make some rough calculations of what the mean of the gaussian and the slope and offset of the line should be

#

try making up an initial value of sigma and then let scipy do the rest for you

buoyant shoal
#

hunch*

#

now i have to guess the value of the mean and try to "help" scipy?

wooden sail
buoyant shoal
#

honestly i don't understand the last bit of the question lol, what does "width sigma of the peak" mean

wooden sail
#

you know how a gaussian distribution is parameterized by the mean and standard deviation?

buoyant shoal
#

yes

wooden sail
#

any "bell curve" of the form a exp(-b (x - c)^2) has an amplitude a, an offset c, and a "width" b

buoyant shoal
#

the mean looks like it's 10 and std maybe like 2.5, right?

wooden sail
#

in statistics you would instead write -(x - c)^2 / (2 b^2), where this b is the "standard deviation". that's what they're asking you for

#

std dev is commonly denoted with a sigma
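
Written out, the two ways of parameterizing the bell curve are related as below. This assumes the usual normal-density convention with a factor of 2 in the denominator, which is also what the `gaus` function posted earlier uses.

```latex
\[
a\,e^{-b\,(x-c)^2} \;=\; a\,\exp\!\left(-\frac{(x-c)^2}{2\sigma^2}\right)
\quad\Longleftrightarrow\quad
b = \frac{1}{2\sigma^2},
\qquad
\sigma = \frac{1}{\sqrt{2b}} .
\]
```

So a fit that returns the "width" b still needs a square root (and the factor of 2) before it can be reported as sigma.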

wooden sail
#

the line seems to have a slope of -5/20 and offset of 5

#

how would you initially remove the peak?

buoyant shoal
wooden sail
#

at any rate, what you've described are the first one and a half iterations of "expectation maximization", which you could also do ofc

buoyant shoal
#

oh nvm you wrote the functional for a normal distribution

wooden sail
#

if you remove the peak by fitting a gaussian, yes

#

alternating optimization of independent components of an observation, updating the expected value, and then maximizing the likelihood

#

wdym by "just eyeballing" though

#

you'd probably make up a gaussian with arbitrary parameters and subtract it, yeah?

#

just remove it?

#

more than a bit, but there's always heuristics involved in picking a good initial guess

#

that works and it's still the same as 1 iteration of expectation maximization, with just an unconventional initial guess

buoyant shoal
#

p0 = [3, 10, 2.5, -0.3, 6]

#

this looks good?

wooden sail
#

sure

#

depending on whether the sigma is squared in your model, you might be missing a square root afterwards
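
A sketch of the whole fit with that starting point, on synthetic data since the CSV isn't available here. The "true" parameters, noise level and error bars are made up; `p0`, `sigma` and `absolute_sigma` are the real `curve_fit` arguments being illustrated.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaus(x, a, x0, sigma, m, c):
    # Gaussian peak on top of a straight-line baseline, as in the posted model.
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2)) + (m * x + c)

# Synthetic stand-in for the CSV: the "true" parameters are made up.
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 81)
y_err = np.full_like(x, 0.05)
y = gaus(x, 3.0, 10.0, 0.75, -0.25, 5.0) + rng.normal(0, 0.05, x.size)

p0 = [3, 10, 2.5, -0.3, 6]          # eyeballed starting point from the chat
popt, pcov = curve_fit(gaus, x, y, p0=p0, sigma=y_err, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))       # 1-sigma uncertainty of each parameter

sigma_fit = abs(popt[2])            # the model only sees sigma**2, so the sign is arbitrary
print(sigma_fit, perr[2])
```

Because the model divides by `2*sigma**2`, the fitted parameter is already a standard deviation and no extra square root is needed; passing the measured errors as `sigma` is what makes `absolute_sigma=True` meaningful.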

buoyant shoal
#

oh i just eyeballed it lol

#

maybe it is idk

wooden sail
#

you know this, you wrote the code yourself

buoyant shoal
buoyant shoal
wooden sail
#

did you write exp(... / sigma) or exp(... / sigma^2)?

buoyant shoal
#

i just copied the functional equation of a normal

wooden sail
#

ok

wooden sail
buoyant shoal
#

😭 so why did it not work previously? I thought it's a code after all

#

it should work to search everything?

#

Dumb take but yeah that's the gist

wooden sail
#

nope

#

that is not the case

#

as i said earlier, for nonconvex problems, that is simply not true

#

and there is no general way of solving nonconvex minimization problems

buoyant shoal
#

nonconvex means a function that isn't convex throughout the domain?

buoyant shoal
#

but i understand yeah

sick eagle
#

2 months ago programming was so exciting, but now it's boriiiing, i think because i've been learning too much

wooden sail
#

the gist of it is, some problems cannot be solved in closed form and also no algorithm exists that is guaranteed to find a solution

#

this is one of them, even though it looks so simple

buoyant shoal
#

so it's kinda like newton's root finding algorithm? bad first estimate gives u (maybe no convergence)?

wooden sail
#

gradient-based methods like scipy's fit (which is a newton method) will only work in special conditions

wooden sail
#

iirc scipy uses levenberg-marquardt by default (when no bounds are given), a damped gauss-newton method that builds its hessian approximation from the jacobian

buoyant shoal
#

😭 no idea at all

#

i'm just doing lin alg now unfortunately

wooden sail
#

if you have no good idea of a good initial guess and/or your function is nondifferentiable (once or twice, depending on your alg), you'll need to use a method like the one confusedreptile suggested

wooden sail
#

the linear least squares problem is an example of a convex problem (i.e. it's "easy" to solve)

#

and the rank of the system matrix involved in the problem determines whether you have strict convexity or just convexity, which determines the number of solutions
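
A small numpy sketch of that rank point, on made-up data: with a full-rank system matrix the least-squares problem has exactly one minimiser, while a rank-deficient one (here a duplicated column) still has a minimum but infinitely many coefficient vectors reach it.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])           # exactly y = 2x + 1

# Full column rank: strictly convex, one unique minimiser.
A_full = np.column_stack([x, np.ones_like(x)])
coef, _, rank_full, _ = np.linalg.lstsq(A_full, y, rcond=None)

# Duplicated column: rank-deficient, still convex but with infinitely many
# minimisers; lstsq returns the minimum-norm one.
A_def = np.column_stack([x, x, np.ones_like(x)])
coef_def, _, rank_def, _ = np.linalg.lstsq(A_def, y, rcond=None)

print(coef, rank_full)
print(coef_def, rank_def)
```

In both cases the solver finds a global minimum without needing an initial guess, which is the contrast with the Gaussian fit above.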

buoyant shoal
#

wow okay least square problem is like at the end of my linear algebra text

#

I'm still doing subspaces and the like

#

Thanks btw, i think the answer to my question is 0.75

#

if others are curious

buoyant shoal
#

Like why is that method not mainstream then? Why use curve fit at all? Just default to that?

wooden sail
#

because if you have a good initial guess, the gradient-based ones have guarantees

#

you can predict how far you'll be from the true solution after N iterations

#

these other methods have no such guarantees, they're heuristics

#

they often work pretty ok, but you can never give guarantees

#

they probably require more function evaluations

#

assuming you can analytically compute the derivatives, at least

buoyant shoal
#

yeah but i guess the "slow part" isn't the main issue maybe

buoyant shoal
#

Lol crazy

wooden sail
#

part of it is that a lot of the operations are slicewise, so that separate blocks don't interact with each other

#

if you were to matricise it you'd have a bunch of kronecker products

#

you might consider defining the operations for a single slice and then just saying "and this is repeated x times for all ..."

honest grotto
#

Good day everyone

#

Please I need materials to learn Deep Learning

mild grotto
#

Hey I'm having trouble with Matrix math.
I have a matrix A which has shape (N,N) and a matrix B with shape (N,)

I want to scale A by B... so I tried A.dot(B) but this gives me shape (N,) instead of (N,N). What am I doing wrong?

desert oar
#

that said I don't understand how you expect to get any kind of scaling here. are you looking for element-wise multiplication? that would be .multiply/* not .dot/@

mild grotto
#

I want element wise multiplication yeah

#

I want every element in the 0th row of A multiplied by the 0th value in B

desert oar
#

but you will want to understand the broadcasting behavior in either case

mild grotto
#

I was trying * and it wasn't doing what I thought it would do
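
That usually comes down to broadcasting: with shapes (N, N) and (N,), `A * B` lines B up with the columns of A, not the rows. A small sketch with made-up values; reshaping B into a column gives the row-wise scaling described above.

```python
import numpy as np

A = np.arange(9, dtype=float).reshape(3, 3)
B = np.array([1.0, 10.0, 100.0])

by_row = A * B[:, None]   # B as a column: row i of A is multiplied by B[i]
by_col = A * B            # plain broadcasting: column j of A is multiplied by B[j]
dotted = A.dot(B)         # matrix-vector product, shape (3,)

print(by_row)
print(by_col.shape, dotted.shape)
```

`B[:, None]` has shape (N, 1), so it stretches across the columns of each row; `B.reshape(-1, 1)` would do the same thing.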

desert oar
mild grotto
#

I think it's getting closer to what I'm going for, but still some bugs

#

I have a xFlow and yFlow value at each point, i'm trying to update each point according to the flow

#

I assume the issue is because my flow variable isn't normalized or something

#

I'll try again later

winter drift
#

Can anyone explain to me how illusion generative ai works, it's just not clicking for me

agile cobalt
winter drift
agile cobalt
#

the QR Code and alike things?

#

do you understand the overall idea of ControlNet guidance

winter drift
royal crest
#

oddly gross

agile cobalt
#

without any more specific questions, the only thing I can recommend is reading up on ControlNet

serene scaffold
dreamy isle
#

i thought "this almost feels like AI"

#

well because it is, apparently

shut girder
# winter drift

If you look at it from afar, it kind of looks like a man's face

desert oar
#

I assume this is user error, but do I need to do anything other than hvplot.extension('matplotlib') to use the matplotlib backend for hvplot?

I tried doing this in an ipython console:

import hvplot as hv
import hvplot.pandas
hv.extension("matplotlib")

df = ...

df.hvplot.line(x="Date", y=["X", "Y"])

but I only got some output like this:

:NdOverlay   [Variable]
   :Curve   [Date]   (value)

and plt.show() did nothing.

#

the holoviews ecosystem has really set a new low bar in terms of bad documentation. reminds me of matplotlib a decade ago

#

good examples, but very hard to figure out what they're doing or how to generalize them, beyond guessing and checking

#

ah, some progress. the hvplot command returns a holoviews.core.overlay.NdOverlay for which that output is the display() result

#

also I stand corrected: holoviews itself has great docs

#

it's geoviews that's kind of bare, but I guess the idea is that you read the holoviews docs first

#

aha, this might just be a problem with my editor setup. I got this when trying to plt.show() a plain mpl figure:

FigureCanvasAgg is non-interactive, and thus cannot be shown
#

and that happens even after I explicitly run %matplotlib tkagg

desert oar
#

huh, it looks like it's maybe caused by holoviews?? that's so weird

#

hv.extension("matplotlib") causes that FigureCanvasAgg problem to arise even when only using matplotlib

dawn light
#

I'm currently trying to create a neural net from scratch

I was just wondering how the calculation of errors is done when testing against a validation set
For example, if I have a training data consisting of 5000 rows and 1000 rows of validation (and doing SGD with a mini batch size of approx 8-16), do I check the accuracy against the entire validation set (i.e. all of the 1000 rows?, or do I do something similar to SGD where I select a mini batch and only test it against that then calculate the mean?)
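
For what it's worth, a common approach is to score the whole validation set every time and use batches only to keep memory down. The sketch below assumes a classification setup and a hypothetical `predict` function; it counts correct predictions across batches and divides once at the end, so a smaller last batch does not skew the result.

```python
import numpy as np

def validation_accuracy(predict, X_val, y_val, batch_size=16):
    """Accuracy over the whole validation set, computed in chunks."""
    correct = 0
    for start in range(0, len(X_val), batch_size):
        xb = X_val[start:start + batch_size]
        yb = y_val[start:start + batch_size]
        correct += int(np.sum(predict(xb) == yb))   # count, don't average per batch
    return correct / len(X_val)

# Hypothetical stand-in model: predicts class 1 when the first feature is positive.
rng = np.random.default_rng(0)
X_val = rng.normal(size=(1000, 3))
y_val = (X_val[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)

acc = validation_accuracy(predict, X_val, y_val)
print(acc)
```

Unlike the training step, nothing is updated here, so there is no benefit to sampling a random mini batch; a random subset would only add noise to the reported number.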

desert oar
spring field
#

does it make sense to use a noisy background for objects you want to identify for the model to sort of learn to ignore the background in the general case and only look for the thing it's trained for?

spring field
# spring field Mmm, so, I have finally gone out and made my own ML model for a thing just for s...

it's still in the context of this, I changed my approach a couple times, changed networks, realized that I should probably not ask it to also predict what number is displayed, otherwise I think it tries to not only predict the location but it tries to learn the locations as if they were specific to each class (class being a digit in range [100, 999]), so I remove that from the equation and tried with different noise levels for the background and it seemed to actually work really well in testing on similar randomly generated images, but it completely failed when it came to using these completely different cat images with numbers in them. I also switched to a DenseNet, which seemed to help more than using some random convolutional networks
so anyway, I decided to generate a massive dataset with 5k samples for each number in range [100, 999] (4.5 million images in total, compared to the 9 thousand image dataset I was using before), they are basically random noise as the background (random size of [500, 800] by [500, 800]) and then a randomly picked font (approx 40 font families + some of them have variations), font size (anywhere in range [40, 130]), and foreground colour is used to draw the number in a random place on the image, then the whole image is resized to 200 by 200
the idea being that given the randomness in the background it would learn to sort of ignore it and pretty much just learn to recognize a 3 digit number in any image and approximate its bounding box

#

for reference this is how an image sample might look like

#

also, a bit tangential to this topic, but any resources for those generative networks that embed a word in an environment/image?

odd meteor
honest grotto
#

Thank you 🙏🏿

signal whale
#

has anyone here used chartjs for webdev?

leaden narwhal
#

anyone can help me out knowing why 2 days before hand my actuals and train prediction values were closer than they are right now

lofty thorn
#

what does 'building a model' mean?

leaden narwhal
lofty thorn
#

what is this in theta ..i don't get it

tawny sand
long locust
lofty thorn
#

when do i Start learning ML

long locust
#

I'm not sure, I don't know what course you are taking

lofty thorn
#

I am trying to learn for a ' emotion detection ' project

#

I thought starting learning opencv and ML in the beginning

#

but now stuck at the beginning

wooden sail
#

neural networks are built out of generalizations of the simple y = mx + b formula, so it's in your best interest to spend some time there until you grasp it
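
To make that concrete, here is a sketch of how the line formula generalizes: swap the scalar slope for a weight matrix, the intercept for a bias vector, and put a simple nonlinearity between two such layers. All weights below are made-up numbers, not a trained model.

```python
import numpy as np

def line(x, m, b):
    return m * x + b                      # one input, one output

def dense(x, W, b):
    return W @ x + b                      # many inputs, many outputs

def relu(z):
    return np.maximum(z, 0.0)             # the nonlinearity between layers

x = np.array([1.0, 2.0])
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, 1.0])
W2, b2 = np.array([[2.0, 1.0]]), np.array([-1.0])

hidden = relu(dense(x, W1, b1))
y = dense(hidden, W2, b2)
print(line(2.0, 3.0, 1.0), hidden, y)
```

Training a network means adjusting `W` and `b` in every layer, the same way fitting a line means adjusting `m` and `b`.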

lofty thorn
wooden sail
#

linear functions, linear models

agile owl
#

should I apply a smoothing filter to this data before fitting a time series model

tawny sand
lofty thorn
#

i am learning...on the go...i don't have any fear regarding this

tawny sand
lofty thorn
#

i mean i know linear functions.. x = y + 1

#

a little bit

tawny sand
tawny sand
#

It will get more difficult, to the point where even those with formal education have problems grasping certain concepts

lofty thorn
tawny sand
#

Difficult to do in parallel, to say the least

#

I'd finish school first (for algebra and decent calculus and statistics), go a little in depth on multivariable calculus, and then start ML

lofty thorn
#

let's see what happens...i really wanna do this.

boreal gale
#

if you landed an internship, that probably means someone saw the potential in you. keep going! you got this 🙂 (edit: wait, did i just hallucinate the internship bit??)

also don't be intimidated by math notations, it's just a language to convey concepts (sometimes more precise than just words - hence necessary), also it seems this book you are reading is also trying to help you decipher it in case you aren't already familiar with it (see the green arrow i added).

lofty thorn
agile owl
#

good news everybody, my deep VAR model is optimistic about unemployment

lapis sequoia
#

oh

desert oar
#

@final kiln I can't remember, did you look into flash attention at all?

#

nice, just didn't remember if it was on your radar or not

teal abyss
#

can someone help me install code LLaMA on my pc

desert oar
#

that sounds like a great plan

teal abyss
desert oar
#

understandable. gotta space out all the work, follow the top priorities first

potent sky
# teal abyss can you help me?

You'll have more luck describing your problem and someone here can pick it up and try to help you with it if they have the time

#

Dealing with vague questions is a lot of work and people generally (rightly so) don't want to do so much digging just to get to your problem after which they can begin solving it :)

warm trellis
#

Hey, what could be the reason for a really high MAPE on the training set (more than 117%) and only 0.512% on the validation dataset?

lapis inlet
#

Are there any good free platforms for deploying a tensorflow based flask api? I tried pythonanywhere but it seems to have a limit of 512 MB, and the requirements themselves take up more than that :/

agile cobalt
agile cobalt
warm trellis
#

True. I am using darts and torchmetrics for the metrics. But setting aside the fact that the training error is more than 100%, what could be the reason for a really high MAPE on the training set and a really low one on the validation set? @agile cobalt

agile cobalt
#

never mind, seems like it is possible with that metric
(I just assumed it was accuracy/classification without reading it properly, my bad)

warm trellis
agile cobalt
#

Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application,[5] and there are many studies on shortcomings and misleading results from MAPE.[6][7]

  • It cannot be used if there are zero or close-to-zero values (which sometimes happens, for example in demand data) because there would be a division by zero or values of MAPE tending to infinity.[8]

  • For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.

(from wikipedia) but yeah, that is a weird metric...
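To make the quoted drawbacks concrete, here is a small sketch with made-up numbers. It shows the same absolute error giving a tiny MAPE on large actuals and a huge one on near-zero actuals, which is one way scaled data can end up with a MAPE above 100%. The function names are for illustration only and are not the darts or torchmetrics API.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ZeroDivisionError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Every forecast is off by about 0.01 in both cases (made-up values).
large_actuals = mape([100.0, 200.0], [100.01, 200.01])  # far below 1%
tiny_actuals = mape([0.001, 0.002], [0.011, 0.012])     # far above 100%

# Asymmetry from the second bullet, for positive actuals and non-negative forecasts:
under = mape([10.0], [0.0])   # forecasting too low caps out at 100%
over = mape([10.0], [50.0])   # forecasting too high has no upper limit
```

If the training split happens to contain more near-zero targets than the validation split, this effect alone could produce a large gap between the two MAPE values, so it is worth comparing MAE or RMSE on both splits as well.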

warm trellis
#

yes, that's exactly what I have after scaling: values really close to zero, and a lot of zero values as well..

agile cobalt
#

my first guess would really be just: try a different metric

warm trellis
#

Actually, I tried rmse and mae as well..
But they are really small, such as 0.00289, so I'm having some difficulty assessing the model with these metrics.
In general, when would the training error be higher than the validation error?

desert oar
warm trellis
desert oar
#

start by checking for bugs, then data leakage

agile owl
#

does anyone know how to set up a simple baseline model for comparison with GluonTS models

#

I'm trying to match the outputs of statsmodels.tsa.VAR forecasts to the model I'm using in GluonTS's forecasting but it's a huge pain in the neck because I don't think they do it in the same way

craggy agate
#

Help! ANN model loss values in negative and accuracy of 0.000e+00
I am working on this ANN model and with each epoch, my loss value decreases by 300k on average. I tried to reduce the learning rate but it's not helping either, has anyone else faced this issue? Can someone tell me how I fix this

desert oar
#

0 accuracy however is a problem

#

hard to say what the actual problem is though, without knowing more about the data. did you check that X_train and Y_train are constructed correctly?

craggy agate
#

The x_train and y_train seem fine though

desert oar
spring field
# spring field it's still in the context of this, I changed my approach a couple times, changed...

well, I guess my hypothesis was wrong, this is epoch 33, going over a 45k image dataset applying 5 layers of dense blocks and transition layers and it still can't find the box around the number in the cat images... though I gotta say that at least it's drawing the box around the cat, because previous attempts couldn't even get that to happen
it surely has at least learned similar data really well

desert oar
#

for example @craggy agate you should get ~99% accuracy with this:

```python
import numpy as np
import tensorflow as tf

ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=16, input_dim=1, activation="relu"))
ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))
ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

n = 1000
x = np.linspace(0.0, 1.0, n)
y = np.where(x <= 0.5, 0.0, 1.0)

ann.fit(x, y, batch_size=1, epochs=10)
```
#

so start there, and then start adding in pieces from your actual code/data until it falls over

lapis sequoia
#

Does anyone know a good way to make segmentation masks quickly? Something like magic lasso in photoshop but free

lapis sequoia
dawn light
#

for the calculations of precision and recall for multilabel classification, how does one get the values for FN and FP?
it's pretty easy to understand what they mean for binary classification (spam or not spam, for example), but how does one determine what a false positive and a false negative are for multilabel classification?

For example, for digit recognition, isn't it the case that if the NN classifies a digit wrongly, it's just wrong? It's neither FP nor FN?
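One common way to answer this is to count per class, one-vs-rest: a 3 that gets predicted as an 8 is a false negative for class 3 and, at the same time, a false positive for class 8. The sketch below does that counting for the single-label digit case with made-up labels; the helper names are for illustration only.

```python
def per_class_counts(y_true, y_pred, labels):
    """One-vs-rest TP/FP/FN counts for each class."""
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)  # predicted c, was something else
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)  # was c, predicted something else
        counts[c] = {"tp": tp, "fp": fp, "fn": fn}
    return counts

def precision_recall(c):
    """Precision and recall from one class's counts (0.0 when the denominator is empty)."""
    predicted = c["tp"] + c["fp"]
    actual = c["tp"] + c["fn"]
    precision = c["tp"] / predicted if predicted else 0.0
    recall = c["tp"] / actual if actual else 0.0
    return precision, recall

y_true = [3, 3, 8, 8, 8]
y_pred = [3, 8, 8, 8, 3]
counts = per_class_counts(y_true, y_pred, labels=[3, 8])
```

The per-class precision and recall can then be averaged (macro) or the counts pooled before dividing (micro), depending on what you want to report.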

desert oar
craggy agate
crystal geyser
#

Hello guys,
This is a data file containing viral posts. I scraped them using Instagram Viral Content Finder, a crawler I developed completely from scratch. It searches all posts related to a specific niche and finds the ones that are viral on Instagram; after scraping, it stores the data in a data.json file...

craggy agate
earnest anchor
craggy agate
#

I checked

#

Probably in compile or train

earnest anchor
#

My question is

What are you training for??

Like what analysis do you want to perform from this data base??

lapis sequoia
#

Whats the best way to store a very big dataset of torch tensors for the fastest access

#

so far fastest one i found is zarr but still is there anything faster

#

Because I want to save a giant dataset into a zip file, but I don't understand whether zarr will try to load the whole zip into RAM
the dataset is 150 GB of MRI images that I use 2d slices of, and even those are painfully slow to load

left tartan
#

Have you tried parquet? Have you benchmarked the use cases?

lapis sequoia
#

and benchmarked and zarr is the fasters

#

fastest

#

savez compressed takes 228 seconds for all slices of 10 images and zarr zipobject takes 32 seconds which is weird so i think it loads it into ram so it wont work on whole dataset

#

and DirectoryStore takes 82 seconds

#

and also it needs to be compressed like zarr and numpy savez compressed and bloscpack because otherwise it doesnt fit into my hdd

agile cobalt
#

"fastest access" is not quite clear. Access in which way?
position based? based on comparing with a value in a certain column? you could even read that as loading the entire data into memory

lapis sequoia
craggy agate
#

Is that possible?

lapis sequoia
craggy agate
#

What about cloud based

#

AWS hosted

#

You could also use a couple of pi 5s

agile cobalt
#

you can try using parquet, which is a really good format for most tabular data + supports a bunch of other types as long as you use libraries that support them well (e.g. pyarrow or polars, not pandas), but zarr is probably amongst the best

just make sure you are using the smallest data types you can afford to

lapis sequoia
#

i tried h5py and it takes 38 seconds

boreal gale
#

curious if you have looked into https://lancedb.github.io/lance/index.html ?
(we considered it at work, but threw the idea out because it takes too long to fully validate and we already have something that works, i.e. i don't know how good it is. we don't deal with images btw, just tabular data.)

potent garnet
#

Hi everyone, I'm working on a time series project and I have one question: has anyone dealt with irregular time series data before?

boreal gale
potent garnet
desert oar
#

It does look arrow-based so that's interesting, makes it a direct competitor to parquet (and feather), so it's even weirder and more suspicious that they don't mention either

boreal gale
desert oar
#

I'm actually confused, lance looks more like a directory format analogous to iceberg or hive

#

In which case, again, suspicious that they don't mention either

potent garnet
boreal gale
potent garnet
#

here is example train and prediction

sturdy kiln
#

hello so this is the first time im dealing with regression type data, and im getting absurdly high values in my MSE and loss values with the use of KerasRegressor

#
```python
def model_2():
  model = Sequential()
  model.add(Dense(60, input_shape=(len(columns),), kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(200, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(300, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(1, kernel_initializer='normal'))
  model.compile(loss= 'mean_squared_error' , optimizer= 'adam')
  return model
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=model_2, epochs=10, batch_size=10, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, house_x, house_y, cv=kfold, scoring='neg_mean_squared_error')
print("Baseline: %.2f (%.2f) MSE" % (results.mean(), results.std()))
```
#

dataset looks sumn like this

```python
columns = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 
           'waterfront', 'view', 'condition', 'grade', 'sqft_above', 
           'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 
           'long', 'sqft_living15', 'sqft_lot15']
house_x = housingDF[list(columns)].astype('object').values
house_y = housingDF["price"].astype('object').values
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21613 non-null  int64  
 1   date           21613 non-null  object 
 2   price          21613 non-null  float64
 3   bedrooms       21613 non-null  int64  
 4   bathrooms      21613 non-null  float64
 5   sqft_living    21613 non-null  int64  
 6   sqft_lot       21613 non-null  int64  
 7   floors         21613 non-null  float64
 8   waterfront     21613 non-null  int64  
 9   view           21613 non-null  int64  
 10  condition      21613 non-null  int64  
 11  grade          21613 non-null  int64  
 12  sqft_above     21613 non-null  int64  
 13  sqft_basement  21613 non-null  int64  
 14  yr_built       21613 non-null  int64  
 15  yr_renovated   21613 non-null  int64  
 16  zipcode        21613 non-null  int64  
 17  lat            21613 non-null  float64
 18  long           21613 non-null  float64
 19  sqft_living15  21613 non-null  int64  
 20  sqft_lot15     21613 non-null  int64  
dtypes: float64(5), int64(15), object(1)
memory usage: 3.5+ MB
```
#

ive been tweaking and modifying the model with different node amounts and added layers but it seems that my MSE is still very large, the difference between a non standard and standardized model also isnt very much

#

i dont know what im doing wrong or any thing i have to do to make this model perform better, so any help is appreciated

#

i have no baseline at all if this is good or bad because from what im seeing its normal to get thousands in their MSEs but billions?

fair warren
#

Is there any movement in the industry towards using polars (or any other df solution) instead of pandas?

agile cobalt
#

sort of - pandas is still dominant by far, but there are a few places moving to polars for efficiency/performance gains, at least on new projects

pyspark is common in some contexts though (and has been for a while)

#

not sure if I would call it a movement in the industry yet though

lapis sequoia
#

what are the basics of data science pls

#

anyone explain to me in private chat?

sturdy kiln
#
```python
# Normalize our X values
from sklearn import preprocessing as prep
min_max_scaler = prep.MinMaxScaler()
house_x = min_max_scaler.fit_transform(house_x)

def model_3():
  model = Sequential()
  model.add(Dense(60, input_shape=(len(columns),), kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(512, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(1024, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(1024, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(1024, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(512, kernel_initializer='normal', activation= 'relu'))
  model.add(Dense(1, kernel_initializer='normal'))
  model.compile(loss= 'mean_squared_error' , optimizer= 'adam')
  return model
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=model_3, epochs=10, batch_size=10, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, house_x, house_y, cv=kfold, scoring='neg_mean_squared_error')
print("Model 3: %.2f (%.2f) MSE" % (results.mean(), results.std()))
```
Ive normalized the data, added a bunch more layers, and this loss value is still absurdly high. I have no idea if the problem is my evaluation function itself, the data, the model or god itself i am at a loss (heh)
untold bloom
# sturdy kiln i have no baseline at all if this is good or bad because from what im seeing its...

hi,

its normal to get thousands in their MSEs but billions?
the unit of MSE is not the same as the data, right? it's the mean of (true - prediction)^2, so the units are squared, and therefore people sometimes use RMSE for a more interpretable metric. Indeed, if you take the square root of your MSE there, you'd end up around ~thousands. From a bit of a search (because you didn't share :p), I assume your data is "kc_house_data", and it seems the mean of the target is around 500 thousand, so your result (~200 thousand) is not off. Also I found this where they obtained an RMSE of ~100 thousand, so that's something else you can look at to rest assured. Even though you report a validation score and they report a test score, the underlying message is independent of that

sturdy kiln
#

is RMSE just the square root of MSE?

untold bloom
#

yes

#

root mean squared error

sturdy kiln
#

so i could just get the RMSE from the previous eval results by sqrt the mean

untold bloom
#

yes

#

need to negate first though

#

sklearn reports negative MSE

sturdy kiln
#

i am using NMSE

untold bloom
#

yes

sturdy kiln
#

ah so i negate it first

untold bloom
#

yes

#

(it is negative so as to unify its maximizing stuff)

#

stuff being the validation score
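The conversion discussed above can be sketched as follows. The fold scores are made-up stand-ins for what `cross_val_score(..., scoring='neg_mean_squared_error')` returns, and `rmse_from_neg_mse` is a hypothetical helper name.

```python
import math

def rmse_from_neg_mse(scores):
    """Negate the per-fold negative-MSE scores, average them, then take the square root."""
    mse_per_fold = [-s for s in scores]  # scores are <= 0, so MSE is >= 0
    mean_mse = sum(mse_per_fold) / len(mse_per_fold)
    return math.sqrt(mean_mse)

# Made-up fold scores in the "billions" range, for illustration only.
example_scores = [-4.0e10, -3.6e10, -4.4e10]
rmse = rmse_from_neg_mse(example_scores)  # same units as the target (price)
```

Note that this is the square root of the mean MSE across folds, which is what "sqrt the mean" describes; averaging the per-fold RMSE values instead gives a slightly different number.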

sturdy kiln
#

hmm newer model shows 200k RMSE, whats the interpretation from this

#

that the "error" value is on average 200k off from the true values?

untold bloom
#

yes

sturdy kiln
#

hmm honestly it doesnt sound so bad when put it that way

versed pilot
prisma briar
#

Should I take ML or GenAI course before?

desert oar
left tartan
vestal spruce
#

does anyone know which python library/package is used often for audio processing (e.g. splitting audio into segments, changing sample rate, etc.) ?

royal crest
#

librosa, pyaudioanalysis, tensorflow io, ffmpeg

sweet stag
#

guys where do i learn about sci kit learn?

worldly dawn
sick eagle
#

guys, i have a dumb question

#

why are you using Github

#

for what

#

why is it important

wooden sail
#

i'd say there are roughly 3 big reasons

#

the first is separate from github. git on its own is a great tool for version control. it helps you track any changes you (or anyone else) make to the code, revert those changes, etc.

#

next, it's great if you can both back up the code and all of the versioning history somewhere remote, and also be able to access it from anywhere. this is what github does: it's a hub for your git repositories. one of many, might i add. there are alternatives

#

the third reason would be that the particular choice of using github comes with the advantage of them offering nice things like github actions, github pages, etc, while also being free. (in exchange, microsoft uses your repos without permission to train AI)

#

that means you can use git just locally, you can pair it up with github or any other remote hub for git repos, and if you do choose github as your remote hub, it comes with goodies

sick eagle
#

@wooden sail thanks, bro, thanks so much

midnight harbor
#

Guys help me out here

Why can't i find any yolo model that have face and person class in single model

Yolo default by ultralytics has person class but not face, yoloface has only face class but not person

The only choice i've got is to train a new model with these two classes, but if there is a model with both of these classes please tag me here

stone oyster
#

Could someone tell me the steps to uninstall micromamba from Windows? The official docs, Google, ChatGPT3.5 and Gemini are all useless here.

craggy agate
lofty thorn
#

hi

stone oyster
crisp raptor
#

you know you have been learning machine learning correctly when the hardest part is thinking of an input medium

lofty thorn
#

can anyone explain this plot to me

crisp raptor
lofty thorn
#

self studying for the project

crisp raptor
#

yeah what is the project

wooden sail
crisp raptor
#

context

lofty thorn
#

it is an example from 'model based learning'

crisp raptor
#

well what exactly is it supposed to represent? It looks like it's just simple regression for points

tidal bough
lofty thorn
#

is this a linear regression plot..idk i am just guessing rn

crisp raptor
#

yes it is

lofty thorn
#

ok

tidal bough
#

sure, I guess? though it doesn't show the best fit, just three arbitrary ones, to show what the parameters mean.

lofty thorn
#

what is theta representing here

crisp raptor
#

parameters

tidal bough
#

see the equation

crisp raptor
#

look at the equation above

wooden sail
#

theta are just parameters of a linear equation

#

what the text says is basically the whole story

lofty thorn
#

what is meant by parameter

#

models?

wooden sail
#

if you have the equation of a line y = mx + b, if you change m and b, you can generate any line you like at all

crisp raptor
#

an adjustable variable of sorts that changes the outcome

wooden sail
#

and there is a particular choice of m and b that best explains the data, that's the blue line

#

finding that best m and b is called regression

#

"linear regression", at that, because m and b are the parameters of a linear equation

crisp raptor
#

statistics is fun

wooden sail
#

that linear equation is your "model". you assume the data follows a straight line, and find the parameters m and b of the straight line

#

a "model" is just how you decide you want to explain a phenomenon

#

here, we say "the data should be a straight line". that means our model is that the data is of the form y = mx + b

#

y = mx + b is the equation of absolutely any straight line that can ever exist. the parameters are m and b. this is what defines what line we get

#

regression is the process of finding the model parameters based on data

#

the black and red lines are just examples of arbitrary lines that have nothing to do with the data

#

the blue line has a good choice of m and b
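As a small illustration of "finding the best m and b", here is a sketch of ordinary least squares for a single input, written out by hand. The data points are made up, and the two "arbitrary" lines stand in for the black and red ones in the plot.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def mse(xs, ys, m, b):
    """Mean squared error of the line y = m*x + b on the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]  # made-up points scattered around y = 2x + 1

m, b = fit_line(xs, ys)
best_error = mse(xs, ys, m, b)
# Any other choice of parameters, like these two arbitrary lines, fits worse.
arbitrary_errors = [mse(xs, ys, 0.5, 3.0), mse(xs, ys, -1.0, 6.0)]
```

The fitted line plays the role of the blue line: among all straight lines, it is the one with the smallest mean squared error on these points.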

crisp raptor
#

@lofty thorn is this for a class?

lofty thorn
#

ohk.....

tidal bough
wooden sail
lofty thorn
#

its little up

#

right?

#

i mean down

wooden sail
#

probably

crisp raptor
#

and a slightly decreased angle

lofty thorn
#

so every straight line is y = mx + b
m and b decide where the line is going to be

#

?

wooden sail
#

indeed

#

you tell me 😛

potent sky
#

Anyone read through the infini-attention paper yet?

crisp raptor
#

I'm stuck in the 2010's with ML, so no

lofty thorn
#

can anyone tell me what should be basic knowledge needed to start learning ML

wooden sail
#

linear algebra, multivar calculus and statistics are widely regarded as the basics

lofty thorn
#

ok

thin lotus
wooden sail
#

that's certainly the case

#

but even very basic questions like "what layers and activation functions make sense for my problem?" can only be properly answered this way

thin lotus
#

Ofc , if you wanna do deep learning I 100% agree it will be hard, but for more basic "ML" like clustering it becomes a lot simpler

crisp raptor
#

if you are looking for resources, wikipedia is an excellent source for math related things

#

also, does anybody have a good idea on how I could embed FEN notation for a NN chess experiment

wooden sail
#

already the most basic clustering methods require all 3 of linalg, calculus and stats, since you're always computing ratios of expectations of vector-valued functions for those

crisp raptor
#

doesn't mean you have to take classes for it, you could just teach yourself

craggy agate
left tartan
#

Perhaps there's an argument about how much you actually need to -do- vs -understand-.

#

But calculus (at least the undergrad 1-3) isn't a particularly hard subject: the reason it's hard is students have terrible algebra fundamentals.

wooden sail
#

right, for practical purposes you hardly need engineering level knowledge of the 3, which is already very basic

crisp raptor
#

I should point out basic knowledge, like basic DE, PD, and integration and derivatives

left tartan
crisp raptor
#

well, the last 2 years

crisp raptor
wooden sail
#

funny tidbits here include things like MAE not being differentiable at 0 and pytorch/tf/jax making an arbitrary (and different from each other) choice of what to use as a subgradient at that point, or that the log doesn't expect you to evaluate at negative values (which normally returns a complex number), and the derivative assumes you won't either, so it's often just defined as f'(x)/f(x) even for negative values of f(x), which doesn't make sense

#

but you'll never find out if you don't know calc 😛
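The MAE point can be checked numerically without any framework. The sketch below compares the slope of |x| approached from the left and from the right: away from 0 they agree, at 0 they do not, so there is no single derivative there and any value in [-1, 1] is a valid subgradient. Which value a given framework returns is a convention to look up in its documentation; nothing here depends on it.

```python
def one_sided_slopes(f, x, h=1e-6):
    """Finite-difference slopes of f at x, taken from the left and from the right."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right

# Absolute error for a single prediction, as a function of the residual.
abs_error = abs

slopes_at_zero = one_sided_slopes(abs_error, 0.0)  # the two sides disagree
slopes_at_two = one_sided_slopes(abs_error, 2.0)   # the two sides agree (about 1)
```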

wooden sail
#

these will directly affect and possibly ruin what you're doing

wooden sail
#

jax gives errors for some, but not all of these, and makes arbitrary decisions in others

#

you need to know enough math to even realize this happens

#

same with the dimensions of things like CNNs when you apply them to multi-layer inputs

#

tf and pytorch do different things by default (broadcast vs implicitly make extra CNN layers)

left tartan
crisp raptor
#

theres the benefit of building your toolchains from the ground up; you know how to fix your errors

wooden sail
#

i don't wanna sit down and make a general computational graph tool myself

crisp raptor
#

it's not bad knowledge to know if something fails you

wooden sail
#

sadly these fall under the category of being just as difficult, if not more, than the original problem

#

so you immediately have at least 2x the work to do

crisp raptor
#

yay for engineering 👍

wooden sail
#

it does, but not if you have a job to do and get paid for it

#

on your free time it's fine, but under time and money constraints not really

crisp raptor
#

R

#

the language?

#

do you know what R is

wooden sail
#

R is fairly high level, kinda the opposite direction of what we're discussing now

crisp raptor
#

I would say C, but that's just because I'm an old man at heart

#

Fuck the government 🙂

#

That would essentially stab GNU in the back multiple times

#

because they are fucking trying to kill off their brainchild

iron basalt
wooden sail
#

indeed

iron basalt
craggy agate
#

But I would say functions are pretty important compared to limits.

wooden sail
#

i'm often in the position where i do have to implement some stuff that normally one wouldn't. i was discussing this with zestar just the other day. if you can, you'd wanna use built-in stuff like scipy's solver. i just happened to be solving a problem requiring a quasi-newton method, but with a massive, highly structured matrix

craggy agate
#

Calculus will obviously help a lot but it's not imperative

wooden sail
#

once you're at the point of anyway having to replace numpy's matrix multiplication with matrix-free stuff and explicitly compute jacobians' and hessians' actions instead of storing the matrix itself, you're already one step away from writing any gradient-based method from scratch
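For readers unfamiliar with the term, "matrix-free" means computing the action of a matrix on a vector without ever storing the matrix. The toy sketch below is not the problem described above: the tridiagonal matrix, the quadratic objective, and the finite-difference step are all assumptions chosen to keep the example small.

```python
def grad(x):
    """Gradient of f(x) = 0.5 * x^T A x, where A is tridiagonal (2 on the diagonal,
    -1 next to it). Only A's action on x is computed; A is never formed."""
    n = len(x)
    g = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        g.append(2.0 * x[i] - left - right)
    return g

def hessian_vector_product(grad_fn, x, v, eps=1e-6):
    """Approximate H(x) @ v as (grad(x + eps*v) - grad(x)) / eps, without building H."""
    x_step = [xi + eps * vi for xi, vi in zip(x, v)]
    g_step = grad_fn(x_step)
    g_base = grad_fn(x)
    return [(a - b) / eps for a, b in zip(g_step, g_base)]

x0 = [1.0, 2.0, 3.0]
hv = hessian_vector_product(grad, x0, [1.0, 0.0, 0.0])  # approximates the first column of A
```

Storage stays proportional to the vector length, which is the reason to go matrix-free when the matrix is massive but structured.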

#

i guess the point being that having more knowledge lets you solve problems in different ways and can even make problems that would otherwise be practically impossible, possible. those are kinda edge cases though

#

or just help you debug your pipeline better

iron basalt
#

It's about change, and most interesting things involve change.

#

(That's why it's the language of physics)

#

I don't think they can really be compared in terms of importance.

#

It's also kind of like saying 1 is more important than 2. Like what does that mean? They both are used heavily, and come up whenever they do.

#

"Important" is not a universal rank-able thing.

wooden sail
#

inb4 well-ordering of maths

iron basalt
#

It's actually the opposite, you just don't usually see it and can use more well known alternatives that may or may not be as elegant.

#

Oh my bad, I read common, not uncommon.

craggy agate
#

It's about limits as well

#

😭

desert oar
#

yeah you can think about it as being all about relating levels and rates

#

limits are more of a general real analysis thing but they turn out to be foundational for calculus

#

(or "the" calculus as some would have it)

iron basalt
#

Which then became the method of exhaustion for a while. And then more modern forms built on that.

desert oar
#

wild how they used to do math by just writing out words and drawing diagrams

iron basalt
#

Although there may have been earlier understandings, it's not super clear, seems to show up globally at earlier times, history is still ongoing (being discovered).

iron basalt
warm trellis
#

When I was doing an internship at a company, I was given the problem of predicting the future cost of building something, though there were no historical price changes. Each data point was the cost of one build at one time, in a different place and in a completely different setting, so they were not identical at all and had different costs. What does one have to do in that setting to predict the future cost? I couldn't build what they asked me to build.. it's still kind of a mystery to me whether it's even possible in the first place, or whether I was given an impossible task and was naive to accept it. I also blame myself for not being able to do it

desert oar
#

if the prices of inputs don't change historically, then i would estimate the cost of the new thing by looking at the historical costs of the components or processes that are required to build the new thing

outer estuary
#

Hello, I'm luckythespacecat and I am a game developer. I need help getting Character.ai to work with my game; if you can help, DM me. Below is a link to the repository that I was trying to use. If you can get this to work please let me know and DM me. I am trying to set it up for a game I'm making (not in python). If you help I will also credit you properly
https://github.com/kramcat/CharacterAI

left tartan
coral lotus
#

i also added import keras, import keras_rcnn, import numpy

#

im getting an error:
py line 5, in <module>
    training_dictionary, test_dictionary = keras_rcnn.datasets.shape.load_data()
                                           ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'keras_rcnn' has no attribute 'datasets'

#

i think its trying to access the module keras_rcnn instead of going into the folder keras_rcnn

#

how would i fix this?

desert oar
coral lotus
#

in python in vscode

desert oar
coral lotus
#

not sure, i just ran the main.py file

#

the main.py file contains all the code from the readme

desert oar
coral lotus
#

oh hm i see

desert oar
coral lotus
#

how would i fix this then, because i still need to do the imports

#

let me check one sec

coral lotus
#

keras_rcnn\datasets\shape.py - this is the file the code is meant to access

#

but its looking in keras_rcnn library

#

idk how to fix it

desert oar
#

and is there anything in keras_rcnn/__init__.py or is it just a blank file?

coral lotus
desert oar
coral lotus
#

its not imported in that file

coral lotus
desert oar
#

i thought you said it was shape.py i see, i was confused

#

what are the import statements in main.py?

coral lotus
#

no but when i ran main.py, the error is coming from the program not being able to access the shape.py file

desert oar
coral lotus
#

ohh alright one sec

#

then im guessing i would also have to import keras_rcnn.preprocessing

desert oar
#

the program not being able to access the shape.py file
that's not what's happening. python found the keras_rcnn module but you also need to explicitly import the submodule that you're using
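The behaviour described here can be reproduced with a throwaway package. The sketch below builds a tiny package on disk (`demo_pkg`, a made-up name that mirrors the `keras_rcnn/datasets/shape.py` layout) with empty `__init__.py` files, then shows that the `datasets` attribute only appears on the package after the submodule is imported explicitly.

```python
import importlib
import os
import sys
import tempfile

def demo_submodule_import():
    """Return (has attr before, has attr after, load_data() result) for a throwaway package."""
    with tempfile.TemporaryDirectory() as root:
        datasets_dir = os.path.join(root, "demo_pkg", "datasets")
        os.makedirs(datasets_dir)
        # Empty __init__.py files: the package imports nothing on its own.
        open(os.path.join(root, "demo_pkg", "__init__.py"), "w").close()
        open(os.path.join(datasets_dir, "__init__.py"), "w").close()
        with open(os.path.join(datasets_dir, "shape.py"), "w") as f:
            f.write("def load_data():\n    return 'train', 'test'\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")          # like `import keras_rcnn`
            before = hasattr(pkg, "datasets")                  # submodule not loaded yet
            importlib.import_module("demo_pkg.datasets.shape") # like `import keras_rcnn.datasets.shape`
            after = hasattr(pkg, "datasets")
            result = pkg.datasets.shape.load_data()
        finally:
            sys.path.remove(root)
            for name in [m for m in sys.modules if m == "demo_pkg" or m.startswith("demo_pkg.")]:
                del sys.modules[name]
    return before, after, result

before, after, result = demo_submodule_import()
```

In main.py the equivalent statement would be `import keras_rcnn.datasets.shape`, placed before the line that calls `load_data()`.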

desert oar
#

(you probably don't need keras_rcnn.datasets by itself)

coral lotus
#

wouldnt that be included by importing keras_rcnn.datasets

desert oar
coral lotus
#

ohh

#

wait why is this an error

desert oar
#

read it!

#

(please post code as text in the future, not as screenshots)