#data-science-and-ml


cloud sand
#

the parser is giving you error because of the bad args

#

but it's not a code-related problem

young granite
#

so any advice what i can do now?

cloud sand
#

just delete all of your args

young granite
#

i did

cloud sand
#

what's the command?

young granite
cloud sand
#

what

#

why did you do that? the code is ok, if you delete all that part it's not going to work

#

the code is perfect, there is no error inside it

young granite
#

what u mean by delete all of my args then?

cloud sand
#

take the args and remove them

#

(except for the first, obvs)

young granite
#

maybe i misunderstand u but if u tell me to not touch the code, cause its working properly, why should i now go ahead and delete args out of an example where i did not code 1 line myself?

cloud sand
#

wait I suck at communicating so let's start from scratch

young granite
#

thanks for ur patience tho

cloud sand
#

a command is structured like this
python3 <-- [command]
file.py <-- [argument 0]
abc <-- [argument 1]

#

in the command you (or your IDE) have probably passed some argument that the parser doesn't know how to handle

#

so he tells: "wtf dude what the heck is this arg?"

#

and crashes
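
A minimal sketch of that behaviour, assuming the script uses Python's standard `argparse` module (the chat doesn't show the parser itself, and the `--seed` option and `--mystery-flag` argument are made up for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=0)  # hypothetical option

# Arguments as an IDE might pass them, including one the parser was never told about.
argv = ["--seed", "1", "--mystery-flag"]

# Strict parsing: an unknown argument makes argparse print an error and exit.
try:
    parser.parse_args(argv)
    strict_ok = True
except SystemExit:
    strict_ok = False

# Lenient parsing: unknown arguments are returned instead of aborting,
# which is a quick way to see what was actually passed in.
known, unknown = parser.parse_known_args(argv)
print(strict_ok, known.seed, unknown)
```

`parse_known_args` is handy for finding out which argument the parser is choking on; running the script from a plain terminal avoids the IDE adding any.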

cloud sand
young granite
#

and how do i see those args my IDE is sending?

cloud sand
#

vscode has stuff

#

but, are you using macOS, windows or linux?

young granite
#

windows vscode

cloud sand
#

try opening the terminal and running it by command

#

without the IDE

young granite
#

so create a normal .py file with the input and then execute through terminal?

cloud sand
#

yea

#

save a "file.py" on your desktop, put the original unmodified code in it, then navigate there with your terminal and type "python file.py"

young granite
#

but inside my venv where pytorch is installed or in the general terminal?

cloud sand
#

just activate the venv in the terminal

young granite
#

just did

#

it does stuff now

#

lets see what the result is

#

it got problems importing some torch functions

#

Episode 620 Last reward: 200.00 Average reward: 188.34
Episode 630 Last reward: 200.00 Average reward: 187.51
Episode 640 Last reward: 200.00 Average reward: 192.52
Solved! Running reward is now 195.03876913985297 and the last episode runs to 200 time steps!

#

this however was the output

cloud sand
#

so it worked correctly

young granite
#

seems like it

#

sadly i wanted a bit more hands on visualised example to understand step by step

#

🗿

cloud sand
#

you mean you want to see the thing moving?

young granite
#

i mean i would like to do an example with training and then use an example and see predictions

#

to get started with ML in pytorch

cloud sand
#

well just save the last checkpoint

#

and load it in a nb to do stuff

young granite
#

can u tell me how?

cloud sand
#

torch.save and torch.load

young granite
#

can u maybe tell me why i always start in a venv when i open a new terminal?

cloud sand
#

because you didn't deactivate it

young granite
#

i wasnt using that

#

i got a stock venv

#

and a torch venv

#

i started latter and now its back in stock

cloud sand
#

just deactivate everything and go back to the one you want

young granite
#

can i disable to start in a default?

cloud sand
#

if you deactivate it it goes away

young granite
cloud sand
#

deactivate it

young granite
#

i just did

cloud sand
#

awesome

young granite
#

and when i open new terminal its back online

#

🗿

cloud sand
#

what command did you type to deactivate it?

young granite
#

deactivate

#

its persistent

cloud sand
#

deactivate is a script

#

you have to go in the correct directory

#

and call the script

#

wait do you have anaconda installed?

young granite
#

no

#

u mean by:
."name"\Scripts\deactivate

cloud sand
#

I don't know about windows

young granite
#

should be the right one cause its referring to a .bat same as for activate

young granite
# cloud sand I don't know about windows
Microsoft Windows [Version 10.0.22000.856]
(c) Microsoft Corporation. Alle Rechte vorbehalten.

F:\Python\Environments>f:/Python/Environments/aktie/aktie/Scripts/activate.bat
this is always the first line without me typing something inside the terminal so it looks like its set as default somehow
cloud sand
#

wait what are you referring to?

young granite
cloud sand
#

no I mean

#

the string, what is the string are you talking about

#

could you paste only it over here?

young granite
#

if u got some mins we could do this in VC if u willing to?

cloud sand
#

sure, I don't use windows but maybe I can help you a bit

#

hop on in vc 0

#

in the meantime im going to get the permission to talk

#

!voiceverify

unique flame
#

Groupby and Value_counts could help.

twilit current
#

Hi guys. I'd like some general advice on working with "sparse" datasets. I'm used to dealing with neatly organised ML data, but at my job all the datasets are "sparse." Most fields contain null values.

#

I honestly don't know how to deal with it, and I'm looking for some advice 🙂

#

(I ended up moving this question over to #help-apple by the way)

main moon
#

hi

#

i wanted help to get into data science and automation with python
i have completed the basic math and coding stuff related to statistics and have a good grasp of basic python fundamentals
but idk where to go after this

pallid orchid
#

Hello everyone! Can anyone give me links to some cool tutorial projects for absolute beginners in AI, i do know Python pretty well btw

quaint loom
wooden sail
#

looking just at that equation, it's impossible to tell. the answer should be either in the surrounding text or nearby equations in whatever you're reading

cloud sand
#

but (this is not the case) when you see λ in front of an equation, it's most likely lambda calculus

cloud sand
#

this is an example

#

again, not related to the specific equation you sent

quaint loom
cloud sand
#

probably in this case it's a constant

#

but as @wooden sail said it's kinda impossible to give you a definite answer without seeing the full paragraph

#

could you link us the paper/document?

quaint loom
#

I am not sure how to link a PDF file into Discord. Let me see, otherwise I will just send a printscreen

cloud sand
#

send it to me in DM

#

directly the file

quaint loom
cloud sand
#

tysm! 😄

#

for the others, sorry for being so dumb

#

that's just an eigenvalue

wooden sail
#

aha

quaint loom
#

How would you guys write this equation in Python? I am mostly thinking about the division under the biggest division

serene scaffold
quaint loom
serene scaffold
quaint loom
#

Of course by switching out the symbols with numbers. but where I place the division (/) would be wrong

serene scaffold
desert oar
#

you need to use () to group the numerator and denominator

#

as you pointed out, there's no "big fraction" symbol

desert oar
# quaint loom Yes, correct

multiplication is *, addition is +, subtraction is -. use () to group operations together. python implements the usual order-of-operations (sometimes referred to using the english mnemonic "PEMDAS" or "BEDMAS")

quaint loom
#

Do you think this would work? I am not good with math, yet
ΔRnc + (pa * cp * VPD / rac) / Δ + y ((1 + rc/rac))

desert oar
#

it's just like in math, if you wrote this out on paper on one line using /

unique flame
#

or like those single line calculators

desert oar
#

you also have an extra layer of parentheses around 1 + rc/rac

#

yeah exactly. it's the same as something like a graphing calculator

quaint loom
desert oar
quaint loom
desert oar
quaint loom
desert oar
#
(Δ * Rnc + (pa * cp * VPD / rac)) / (Δ + y * (1 + rc / rac))
#

equivalently:

(Δ * Rnc + (pa * cp * (VPD / rac))) / (Δ + y * (1 + (rc / rac)))
#

the () aren't needed around VPD / rac because of how multiplication and division work (dividing by rac is the same as multiplying by 1/rac, and a chain of multiplications gives the same result however you group it)

quaint loom
desert oar
#

and you don't need the () around (rc / rac) because / "binds" more tightly than +

#

when in doubt, add more parens to group things
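
As a rough sketch, the grouped expression can be written as a small Python function. The ASCII names (`delta` for Δ, `y` for the symbol written as y above) are stand-ins chosen here, ΔRnc is read as the product of Δ and Rnc, and the numbers are made up purely to check the arithmetic:

```python
def evaluate_formula(delta, rnc, pa, cp, vpd, rac, y, rc):
    # Numerator and denominator are grouped explicitly, as with the big fraction bar.
    numerator = delta * rnc + (pa * cp * vpd / rac)
    denominator = delta + y * (1 + rc / rac)
    return numerator / denominator

# Made-up inputs: numerator = 2*3 + 1*4*6/2 = 18, denominator = 2 + 0.5*(1 + 2/2) = 3
result = evaluate_formula(delta=2, rnc=3, pa=1, cp=4, vpd=6, rac=2, y=0.5, rc=2)
print(result)

# The extra parentheses around vpd / rac do not change the value.
same_value = (1 * 4 * 6 / 2) == (1 * 4 * (6 / 2))
```

Note that every multiplication needs an explicit `*` in Python; something like `y (1 + rc / rac)` would be read as calling `y` as a function.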

desert oar
quaint loom
#

Hahaha. They just required me to have a bachelor in some relation to climate, GIS and environment

unique flame
#

Multiple linear regression is an AI model right?
I've been reading a Master thesis and he wrote "The comparison of ANN models with Multiple Linear Regression model and Logistic Regression model shows that the AI model produced inferior result than its counterparts in term of model accuracy."
Which I find a weird sentence since they're all AI no?

steady basalt
#

Any linear algebra gods in chat

serene scaffold
desert oar
steady basalt
#

Does pca require svd

wooden sail
plain saffron
#

@desert oar thanks 👍

fossil timber
#

ML newbie here!
I've only read one or two articles/tutorials about ml but haven't actually done anything with it, so I thought I'd try a little project.

I'm not interested in becoming an expert, I'd just like to be pointed to some resources/what kind of model I should use.

The idea:
Generate a list, of random length, of 2-tuples, with each of the values randomly chosen from a fixed set. An image will be generated from that data.

I want to generate "good" images (according to the training data)

I figure the best way is, instead of asking for "good" values, generating random values until I get a score above a certain threshold.

What kind of model should I use?
Also, what's an easy way to get people to rate a bunch of images?

desert oar
# fossil timber ML newbie here! I've only read one or two articles/tutorials about ml but haven'...

in general, we solve problems like this by defining a "goodness" score and then implementing some kind of algorithm that maximizes the "average goodness" over a big collection of samples or test cases.

generating completely random images is probably not an effective algorithm because the probability of randomly generating a good image is very very low on an image of nontrivial size. but you can use it as a baseline for comparing to other algorithms. if your custom algorithm is worse than random generation, then you did something very wrong!

#

how to design the algorithm is most of the content of the field of machine learning

#

what is your training data in this case?

#

and what's the goal of this? there are a few different ways you can approach this, but do note that generating images from input is maybe not a trivial problem (think: dall-e)

cloud sand
#

also because MLPs are not nearly the only thing in AI

mighty lance
#

So, I'm a complete ML beginner I know python quite well and kind of familiar with data science libraries like numpy and pandas what are some good beginner resources to get started with ML?

cloud sand
fossil timber
# desert oar what is your training data in this case?

Ok, mentioning image generation may have given the wrong impression.
The model needs only to evaluate the list of data to generate the image.

To be precise, each element in the list describes a layer of the image, with one value being a mask and the other being a colour. (Yes, this is for Minecraft banners)

The masks and colours are simply chosen from a predefined set.

Basically, the model only needs to score a small list of tuple[int, int]

desert oar
#

in that case, you still want some "goodness" function to score your combinations

#

then you can just do

best_score = 0.0
best_image = None
for mask, color in options:
    image = make_image(mask, color)
    score = calc_score(image)
    if score > best_score:
        best_score = score
        best_image = image
#

(this pattern of looping and keeping track of the "best" thing seen so far is a running maximum, or "argmax", over the list)
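
A runnable sketch of that loop, under loud assumptions: `make_image` and `calc_score` are hypothetical placeholders (the chat never defines them), and the scoring rule is invented just so there is something to maximize.

```python
def make_image(mask, color):
    # Hypothetical stand-in: a real version would render the banner layer.
    return (mask, color)

def calc_score(image):
    # Hypothetical "goodness" function; it simply prefers mask 2 and color 5.
    mask, color = image
    return -abs(mask - 2) - abs(color - 5)

options = [(1, 5), (2, 4), (2, 5), (3, 1)]

# Explicit loop. Starting from -inf means a best image is still found
# when every score is zero or negative.
best_score = float("-inf")
best_image = None
for mask, color in options:
    image = make_image(mask, color)
    score = calc_score(image)
    if score > best_score:
        best_score = score
        best_image = image

# The same search with the built-in max().
best_via_max = max((make_image(m, c) for m, c in options), key=calc_score)
print(best_image, best_score, best_via_max)
```

Starting `best_score` at `float("-inf")` rather than `0.0` matters whenever the scoring function can return non-positive values, as this made-up one does.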

fossil timber
desert oar
fossil timber
#

That's the thing, I want to train the ai to predict what people, on average, would rate the image.

cloud sand
#

you need data I fear

fossil timber
#

Yeah, I was hoping there's some sort of website that lets you upload a bunch of images and let people rate them.

cloud sand
#

you can pay people on Mechanical Turk to do this

unique flame
torpid arrow
#

ayo are ML models that use int8 fixed-width between 0, 127 or -128, 127

#

ah thats why i couldnt find any google results - good to know thanks

#

wait youre completely wrong

#

im talking about normalization, not sure what fixed width data typing means

#

yeah man im not a beginner - just want to know if i should normalize between -127 and 127

#

or 0 and 127

#

Yolov7

#

with tensor rt conversion into int8

serene scaffold
# steady basalt Any linear algebra gods in chat

I was afk when I told you not to ask to ask earlier. But let me emphasize that that was at least the third time I asked you to stop asking to ask. (And asking "does anyone know about x" is a form of asking to ask.) This way of fishing for help wastes everyone's time, including your own. Being able to ask questions here is a privilege--if you keep wasting peoples time, I'm going to have to take it away temporarily.

torpid arrow
#

x′ = (x − min(x)) / (max(x) − min(x)) normalizes my features to [0, 1]
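
The min-max formula above as a minimal plain-Python sketch. `min_max_scale` is a name made up here, and the guard for a constant feature is an added assumption rather than something from the chat:

```python
def min_max_scale(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([2.0, 4.0, 10.0])
print(scaled)
```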

#

to run inference on a model using int8 datatype

#

does the data need to be normalized (scaled) between 0 and 127 (the maximum number in pos using int8)

#

like how you use 64bit models

#

or is the data scaled between -127 to 127

#

yes with fp16/fp32

#

eh

#

no the data i feed into the network

#

sorry im misunderstanding you, im not talking about the structure of the model - the data - yes i run a normalization on the data - im asking do you scale the data between range -127 to 127 or 0 to 127

#

or map the data whatever is clearer

#

page 12 shows values -max and + max on the top row, I assume thats referencing my data's range (0,1)

#

ur a troll then

#

its a very simple question

#

see example on page 12 of the pdf i sent

#

its a basic question about normalization but i guess its beyond you ill ask elsewhere

quaint loom
#

Can I not use the dollar sign in a code?

serene scaffold
quaint loom
serene scaffold
quaint loom
#

Isn't it enough to use sympy?

serene scaffold
quaint loom
#

Well, I do use it but it seems like it wont. Not really important but cool to know

quaint loom
#

Because I use it earlier *

serene scaffold
# quaint loom Because I use it earlier *

in a notebook? like I said, you can tell the notebook "this cell is latex, not python". and dollar signs do mean something in latex, as you know. but not in python.

empty slate
#

Hi, I tried making a custom neural network using only NumPy in python and I believe i have done everything correctly, but somehow the cost doesn't seem to decrease, if i paste the code here can someone possibly help me?

#
import numpy as np

#************ MAIN CODE **************#
class Activations:
    def Relu(self, input):
        self.output = np.maximum(0, input)
        self.deri = (self.output > 0).astype(int)

    def Softmax(self, input):
        input = input - np.max(input, axis = 1 , keepdims = True)
        self.output = np.exp(input)/np.sum(np.exp(input),axis = 1, keepdims = True)
        self.deri = self.output*(1- self.output)

class Layer(Activations):
    def __init__(self, input_size, next_neurons, bias_req = 0):
        self.inputs = None  # filled in by forward(); `input` here was Python's builtin, not data
        self.weights = np.random.randn(input_size, next_neurons)
        self.bias_req = bias_req
        if bias_req == 1:
            self.bais = np.random.randn(1,1)     # The Shape (1,1) maybe?
        else:
            self.bais = [[0]]

    def forward(self, inputs, activation):
        self.inputs = np.array(inputs)
        x = np.dot(self.inputs, self.weights) +self.bais

        self.activation = activation
        if activation == 'Relu':
            self.Relu(x)
        elif activation == "Softmax":
            self.Softmax(x)
        else:
            self.output = x
            self.deri = (self.output > self.output - 1).astype(int)

class Back_Pass:
    def loss(self, expected, predicted):
        self.cost = np.sum(0.5*(predicted - expected)**2, axis =0)/len(predicted)
        self.error = np.sum((predicted - expected), axis = 0)/len(predicted)

    def back(self, this_layer):
        self.error = (this_layer.deri)* self.error
        weights_buffer = this_layer.weights

        if this_layer.bias_req == 1:
            this_layer.bais -= l_rate*np.sum(self.error) 

        if len(self.error) == 1:
            this_layer.weights -= l_rate*np.dot(this_layer.inputs.T, self.error) 
        else:
            for i in range(len(self.error)):
                this_layer.weights -= l_rate*np.dot(this_layer.inputs[i].T, self.error[i]) 

        self.error = np.dot(self.error, weights_buffer.T)
pseudo pasture
#

Guys How Deep learning will help in accuracy of Animals language processing

supple scroll
#

if you're making a dataset for a text to image ai, how do you format images that aren't the same shape as your output?

#

do you just crop it to only be the center of the image and scale it to fit?

wild dome
#

using Pandas here

how to get a DataFrame with a single row being the numbers in orange, with the same columns (d_0.0 - d_-1)?

The orange numbers are the count of the bold values, for each column.

A bold value is the minimum of its row.

tired violet
supple scroll
#

i havent set anything up yet, i just want to know how it works
also what do you mean by "scan"?

tired violet
#

The model will scan ie read the image data in a data set in a certain way it should be one of the ways I mentioned. Also it could take a sample from the raw image depending on what the model is expecting from it's codebook input is how it converts tokenized text to tokenized image data and back to decoded text and or images. Again based on the codebook of the specific model or the way you have set the codebook for your model.

supple scroll
#

when i deal with images i usually just do it from top left to bottom right RGB(A) in order

tired violet
#

Ok what's the expected size of the sample grid.

supple scroll
#

idk, 256x256?

#

i havent set anything up yet, just wanna figure out what the best way to format the images is

serene scaffold
arctic wedgeBOT
#

DataFrame.min(axis=NoDefault.no_default, skipna=True, level=None, numeric_only=None, **kwargs)
Return the minimum of the values over the requested axis.

If you want the *index* of the minimum, use `idxmin`. This is the equivalent of the `numpy.ndarray` method `argmin`.
tired violet
#

256X256 is a fairly good size approximately 65536 inputs for the index ie codebook.

wild dome
#

!docs pandas.io.formats.style.Styler.highlight_min

arctic wedgeBOT
serene scaffold
wild dome
#

but, the bold value is the minimum of its row

#

minimum is per row, then I want count per column

serene scaffold
#

@wild dome try df.eq(df.min(axis=1), axis=0).sum(axis=0)
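
A small sketch of the idea with made-up numbers (the real `d_0.0` ... `d_-1` columns aren't shown, so `a`, `b`, `c` stand in for them). The row minima are a Series indexed by row, so the comparison has to be aligned along the rows with `axis=0`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [2, 4, 0], "c": [3, 6, 1]})

# True where a cell equals the minimum of its own row.
is_row_min = df.eq(df.min(axis=1), axis=0)

# Count the True values per column, then turn the result into a one-row DataFrame.
counts = is_row_min.sum(axis=0)
single_row = counts.to_frame().T
print(single_row)
```

With ties, every cell equal to the row minimum is counted, so a row can contribute to more than one column.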

tired violet
#

Top left to bottom right, OK. If that is the case, then a raw image bigger than 256X256 will (or should) be rejected by your model, or you will end up with incomplete data. The layering needs to be consistent, or the results will be unsatisfactory, as the resultant data will be off by the discrepancy between the samples.

#

You can downsize all images as long as they are square and then super res them to the size you need for the output

wild dome
supple scroll
wild dome
#

but this is the output

wild dome
tired violet
#

Then you need to adjust the grid for irregularities such as rectangle-shaped data (256X128 etc.), but it's easier for the data to be square, as you won't have to formulate the data to fit

supple scroll
#

well yeah, but like i could just change the non square images

#

should i just crop everything outside the middle?

#

should i do something weird like use edge detection and crop around the area with the most edges?

#

there's a perfectly good library of images i can use i just need to figure out how to format the stuff in it

tired violet
#

As long as you're not losing important data, yes, I would crop to the square grid of your choice

#

Don't make work for yourself make it as simple as you can.

#

Just remember the layer stack has to be all the same shape and smaller data costs less time in training.

#

Most models drop resolution to 64X64 then super resolution up to 256X256 then output unless you have a larger capacity you can super res to higher resolution if you want and vqgan to fill any holes in the image

supple scroll
#

what if i tried squishing all the images down to a square, then use another network to try and figure out what size the original image was

#

then use an upscaler to bring it back to that size? nikoThink

tired violet
#

To keep from having distorted results

#

And it involves more processing by your gpu or cpu or both.

#

Increasing cost to you if you're using a network server

#

Mostly at this point it's experiment and see, but starting with a good consistent data set is a must or your problem will multiply over time

wild dome
#

I have this dataframe that goes on

#

now, I want to get a dataframe for each unique p, alpha pair, so I do this

for p in betas["p"].unique():
    for alpha in betas["alpha"].unique():
        fixed = betas[(betas["p"] == p) & (betas["alpha"] == alpha)]
#

is that good or is there a better way? like a Pandas method or something
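
One pandas-native option is `groupby` on both columns, sketched here on a made-up `betas` frame (only the `p` and `alpha` column names come from the message; the `beta` column and all values are invented):

```python
import pandas as pd

betas = pd.DataFrame({
    "p": [1, 1, 2],
    "alpha": [0.1, 0.2, 0.1],
    "beta": [10.0, 20.0, 30.0],
})

# groupby yields one sub-DataFrame per (p, alpha) pair that actually occurs.
sizes = {}
for (p, alpha), fixed in betas.groupby(["p", "alpha"]):
    sizes[(p, alpha)] = len(fixed)
print(sizes)
```

Unlike the nested `unique()` loops, this skips combinations that never appear together (here p = 2 with alpha = 0.2), so no empty frames are produced.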

serene scaffold
lapis sequoia
#

is tensorflow image based only?

lapis sequoia
#

Does anyone know how I can remove my package from conda-forge?

serene scaffold
#

Just use pip

lapis sequoia
#

The package is currently published on pip (PyPI) and conda-forge. I can't figure out how to remove the published package from conda-forge.

delicate apex
lapis sequoia
#

if I want to train a model for sentimental analysis with a dataset, what libraries/methods would you guys recommend using

serene scaffold
lapis sequoia
#

i'm doing it for learning and I wanna work on something which involves sentimental analysis

serene scaffold
lapis sequoia
#

what's special about spaCy?

serene scaffold
#

It's a library for general NLP stuff

#

Everyone uses it

#

My coworkers are mostly boomers who don't know python. But they still try to use spaCy

#

(read everything I just said in a Donald Trump voice)

lapis sequoia
#

i tried using the donald trump voice and it made the messages worse

lone halo
#

hedge fund, private equity, investment bank

gloomy anvil
#

Hello, maybe a little bit off topic, but is someone of you familiar with Tableau? I need to hide the marked bars:

#

If I rightclick > hide, the bars for the sarimax models disappear as well. Is there some easy trick to simply hide bars selectively?

tacit horizon
#

I have an imbalanced dataset, above 97%:3%. After oversampling I get a pretty good accuracy result, however my false negatives are very high. Which direction can I go?

#

i also tried under sampling, it will make my dataset become too small

unique flame
#

Is a small dataset bad?

tacit horizon
#

97% is 0 3% is 1

lapis sequoia
#

#bot-commands

odd slate
#

For column header of a dataframe, should I use "Code status" or "Code.status" or "Code_status"?

steady basalt
#

The test set is imbalanced making things worse, basically there’s not enough information in ur data to let the model learn

#

If you’ve already gone thru all typical processes to max out performance

#

At least try under sampling or combined sampling

#

Make sure you also do feature selection properly

#

And try a couple basic models

#

Ur issue is false positives btw not false negatives

jagged forum
#

Hi everyone! I hope this is the right channel to ask

#

Does anyone know how to open .json file of size 50GB?

#

I've tried trying things suggested by google, but none of it worked

steady basalt
#

Python

#

Or maybe some sort of big data database software?

cloud sand
hasty grail
#

I assume you do not have a machine with 50 GB memory so you definitely need to process the file in smaller chunks

steady basalt
#

how tf did i open a 18gb file on 16gb ram

hasty grail
#

You can open a file without reading all of its contents into memory

steady basalt
#

isnt there software to handle such data

#

made by apache or oracle or amazon

hasty grail
#

Yeah you can use Apache Spark

#

But that takes some effort to set up

#

If you are only doing simple operations you can consider writing your own code to read and process the file in chunks

steady basalt
#

ive never used spark

#

hard to learn?

#

i feel like pyspark is something they just dont teach in uni

hasty grail
#

The framework does most of the heavy lifting so its API is similar to using pandas dataframes

#

However everything has to be done within the context of a pyspark session

#

I think it's quite easy to pick up if you are already familiar with pandas

#

But if you are looking to leverage distributed processing (which is kinda what Spark is meant for) then you need to set up a cluster and configure your application to use it

brave sand
#

can someone explain generative adversarial inverse reinforcement learning?

brave sand
#

thanks!

#

usually are multi headed models used for localization?

spare briar
#

there are other formats that have fast read/write without loading into ram like h5 or lmdb

steady basalt
#

guys... think we need to turn the server logo black...

spare briar
#

or if you only need a subset of data at a time write a generator
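
A minimal generator sketch, assuming the data is stored as JSON Lines (one JSON object per line). That is an assumption; the 50GB file in question may well be a single huge JSON document, which would need a streaming parser instead. `iter_records` is a name made up here, and the in-memory sample stands in for a real file:

```python
import io
import json

def iter_records(lines):
    # Yield one parsed record at a time, so only a single line is held in memory.
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for a file opened with open(path); any iterable of lines works.
sample = io.StringIO('{"id": 1, "keep": true}\n{"id": 2, "keep": false}\n\n{"id": 3, "keep": true}\n')
kept_ids = [record["id"] for record in iter_records(sample) if record["keep"]]
print(kept_ids)
```

With a real file this would be used as `with open(path) as f: for record in iter_records(f): ...`.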

worthy hollow
#

hey guys
been a long time, took a break

worthy hollow
#

HERE THE PROBLEM WITH SMALLER DATAFRAMES, i got 2 df as input and one output in which I make use of both input dfs

#

INPUT

#

!e ```py
import pandas as pd

input_nat = pd.DataFrame({ "Planets": ["Earth", "Mer", "Ven"],
"Degrees": ["38", "156", "310"],
"Start_Date": ["25948", "107774", "42177"],
"Now": ["30935", "128441", "50284"],
"Cycles": ["130", "133", "26"],
"0": ["30935", "128441", "50284"],
"1": ["30974", "128597", "50595"],
"2": ["31012", "128754", "50905"]
})

input_cum = pd.DataFrame({"Date": ["21/05/2022","08/09/2022","10/10/2022", "18/10/2022","19/11/2022", "24/11/2022", "21/03/2023", "01/10/2023", "07/12/2023"],
"Earth": ["30830","30935", "30966", "30974", "31006", "31012", "31130", "31317", "31384"],
"Mer": ["128017","128441", "128597", "128643", "128754", "128768", "129234", "130062", "130299"],
"Ven": ["50108","50284", "50336", "50349", "50400", "50408", "50595", "50905", "51013"]
})

print(input_nat)
print(input_cum)

arctic wedgeBOT
#

@worthy hollow :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |   Planets Degrees Start_Date     Now Cycles       0       1       2
002 | 0   Earth      38      25948   30935    130   30935   30974   31012
003 | 1     Mer     156     107774  128441    133  128441  128597  128754
004 | 2     Ven     310      42177   50284     26   50284   50595   50905
005 |          Date  Earth     Mer    Ven
006 | 0  21/05/2022  30830  128017  50108
007 | 1  08/09/2022  30935  128441  50284
008 | 2  10/10/2022  30966  128597  50336
009 | 3  18/10/2022  30974  128643  50349
010 | 4  19/11/2022  31006  128754  50400
011 | 5  24/11/2022  31012  128768  50408
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ajexavamoc.txt?noredirect

worthy hollow
#

OUTPUT

#

!e ```py
import pandas as pd

output = pd.DataFrame({ "Planets": ["Earth", "Mer", "Ven"],
"Degrees": ["38", "156", "310"],
"Start_Date": ["25948", "107774", "42177"],
"Now": ["30935", "128441", "50284"],
"Cycles": ["130", "133", "26"],
"0": ["08/09/2022", "08/09/2022", "08/09/2022"],
"1": ["18/10/2022", "10/10/2022", "21/03/2023"],
"2": ["24/11/2022", "19/11/2022", "01/10/2023"]
})

print(output)

arctic wedgeBOT
#

@worthy hollow :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |   Planets Degrees Start_Date     Now Cycles           0           1           2
002 | 0   Earth      38      25948   30935    130  08/09/2022  18/10/2022  24/11/2022
003 | 1     Mer     156     107774  128441    133  08/09/2022  10/10/2022  19/11/2022
004 | 2     Ven     310      42177   50284     26  08/09/2022  21/03/2023  01/10/2023
worthy hollow
#

as you can see, output takes the initial form of the dataframe input_nat and replace the values of columns "0, 1, 2" by their respective index date from input_cum
i want to make a for loop that can be able to reproduce this output
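
A possible for-loop sketch on a trimmed copy of the two inputs above. It assumes every value in columns "0", "1", "2" appears exactly in the matching planet column of `input_cum`; a value with no exact match would come back as missing.

```python
import pandas as pd

input_nat = pd.DataFrame({
    "Planets": ["Earth", "Mer", "Ven"],
    "0": ["30935", "128441", "50284"],
    "1": ["30974", "128597", "50595"],
    "2": ["31012", "128754", "50905"],
})

input_cum = pd.DataFrame({
    "Date": ["08/09/2022", "10/10/2022", "18/10/2022", "19/11/2022",
             "24/11/2022", "21/03/2023", "01/10/2023"],
    "Earth": ["30935", "30966", "30974", "31006", "31012", "31130", "31317"],
    "Mer": ["128441", "128597", "128643", "128754", "128768", "129234", "130062"],
    "Ven": ["50284", "50336", "50349", "50400", "50408", "50595", "50905"],
})

output = input_nat.copy()
for row_label, planet in input_nat["Planets"].items():
    # Lookup table for this planet: cumulative value -> date.
    value_to_date = input_cum.set_index(planet)["Date"]
    for col in ["0", "1", "2"]:
        output.loc[row_label, col] = value_to_date.get(input_nat.loc[row_label, col])

print(output)
```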

proven bough
#

To make cat-dog recognition data for a nn i would need to make the images grayscale to get the brightness from the pixels as one input and stretch the image to a set amount of pixels so that theyre all say 200x200, correct?

serene scaffold
#

hmm, this might be more complicated than I thought. What do 0, 1, and 2 mean in the columns?

worthy hollow
#

actually

#

it's only column "0, 1, 2"

#

that we want to use for the matter

serene scaffold
#

sure, but what does 0, 1, 2 mean?

worthy hollow
# serene scaffold sure, but what does 0, 1, 2 mean?
# NATAL CHART 03/01/2009

nat = natal.copy() 
s_d = "03/01/2009"

nat_h1 = nat.copy()

nat_h1.Degrees = nat_h1.Planets.map(helio.set_index("Date").loc[s_d])
nat_h1.Start_Date = nat_h1.Planets.map(helio_cum.set_index("Date").loc[s_d])
nat_h1.Now = nat_h1.Planets.map(helio_cum.set_index("Date").loc[today])
nat_h1.Cycles = (nat_h1.Now - nat_h1.Start_Date) / nat_h1.Degrees

nat_h1['0'] = ((nat_h1.Cycles + 0) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['1'] = ((nat_h1.Cycles + 1) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['2'] = ((nat_h1.Cycles + 2) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['3'] = ((nat_h1.Cycles + 3) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['4'] = ((nat_h1.Cycles + 4) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['5'] = ((nat_h1.Cycles + 5) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['6'] = ((nat_h1.Cycles + 6) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['7'] = ((nat_h1.Cycles + 7) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['8'] = ((nat_h1.Cycles + 8) * (nat_h1.Degrees)) + nat_h1.Start_Date
nat_h1['9'] = ((nat_h1.Cycles + 9) * (nat_h1.Degrees)) + nat_h1.Start_Date

nat_h1 = nat_h1.round()
nat_h1
#
nat_h1['0'] = ((nat_h1.Cycles + 0)
#

here's what it is used for

serene scaffold
#

sorry, but that doesn't help. I'll keep thinking when I can.

worthy hollow
#

the 0 - 1 - 2 - etc - 9 columns represent

#

the different degrees that we need to match to their date from the input_cum dataframe

worthy hollow
#

let me formulate well what I want to do maybe this will help - it might not be clear so far

#

OUR GOAL is to convert degrees data from INPUT_NAT columns: ["1", "2", "3", "4", "5", "6", "7", "8", "9"] to their CORRESPONDING INDEX DATE which are located in INPUT_CUM

#

idk if this helps

worthy hollow
worthy hollow
#

it's just what we use for the operation, increment at every columns

#

what we want to do is to convert those degrees to their matching dates which are showing in the future

worthy hollow
serene scaffold
balmy beacon
#

👍

worthy hollow
timid kiln
#

I hope this question doesn't come out too nebulous...

I have a table of data in Excel. I am using python to run a query off of a database. The results of the query go into a dataframe. What I need to do is combine some of the data from the table in Excel into the query dataframe. I'm not sure even where to start? If someone could let me know what terms to search for/look up, or perhaps even an example of using one dataframe to update another dataframe, that would be super helpful.

Thanks folks!

brave sand
#

what is a ml algorithm suited for localization, that performs better than a generative adversarial inverse rl algorithm?

timid kiln
#

OK, yeah, I'm already familiar with getting data out of excel. The tl;dr of my question is:

If someone could let me know what terms to search for/look up, or perhaps even an example of using one dataframe to update another dataframe, that would be super helpful.
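
The usual search terms are "merge" and "join". A minimal sketch, assuming both tables share a key column; the `tag`, `value` and `unit` columns and all values are invented for illustration:

```python
import pandas as pd

# Stand-in for the dataframe returned by the database query.
query_df = pd.DataFrame({"tag": ["A", "B", "C"], "value": [1.0, 2.0, 3.0]})

# Stand-in for the table read from Excel.
excel_df = pd.DataFrame({"tag": ["A", "C"], "unit": ["bar", "degC"]})

# A left merge keeps every query row and adds the Excel columns where the key matches.
combined = query_df.merge(excel_df, on="tag", how="left")
print(combined)
```

Rows with no match in the Excel table (tag "B" here) get a missing value in the added column.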

brave sand
timid kiln
#

Thanks!

errant lake
timid kiln
cloud sand
wooden forge
#

Hey there, good evening. I'm trying to implement a slider in a 3D plot with matplotlib. But unlike 2d plot, I can't unpack the axis to then update its value. I have two spheres and would like to make one rotate around the other. But it is not working very well, would anyone know how to deal with this?

loud apex
#

Hello
is there an extension in vscode for autocompletion for DS and ML libraries?

timid kiln
#

Regarding data cleaning, I'm working with a historian (plant data). Sometimes the instrument simply goes offline. Based on experience, I know it's safe to enter missing values with the average of the last, say, 10 values. How would I go about doing that in a pandas dataframe?

To be honest, not even sure where to start. Thinking about it logically, I guess I'd have to go row by row through the dataframe and when I encounter an empty value, go get the last 10 values, average them, and put the result in that value. Seems... overly complicated to me? But I am a beginner.

Thanks in advance for your help.

desert oar
steady basalt
#

did u guys have to work as analysts for a couple years before getting a DS role?

desert oar
steady basalt
#

well 'analyst' is going to go to shit real fast from what ive heard so im trying to get past that asap

steady basalt
#

beyond imaginable saturation

#

random people who never touched a computer in their lives taking an udemy sql course for quick n easy money

timid kiln
steady basalt
#

hurting supply and demand

desert oar
#

people have been doing that for years. those people start at the absolute rock bottom of the hierarchy and end up in undesirable jobs.

#

if anything, there is less demand for such people than there used to be, as businesses become wiser about hiring data people

steady basalt
#

i dont want to go thru 2+ years of DA now that ive finished my masters, id really want to use the skills ive learnt

desert oar
#

you will. use the career resources at your school and apply to a lot of jobs

steady basalt
#

ive applied to countless ds roles in the past 2 weeks and 99% of them say i dont have enough experience in DS to become a DS lol?

#

they tend to like my cv

desert oar
#

you might want to focus especially on "series B" startups with established small data teams that have the capacity to absorb and mentor a junior DS, but don't want to teach python and stats from scratch

steady basalt
#

yeah there really arent junior DS roles here in london

desert oar
#

don't bother with companies that don't have an established DS team, those companies need seniors and only seniors. if such a company hires a junior, it means they don't know what they're doing and you might have a bad experience struggling along without support (this is what happened to me and it sucked).

steady basalt
#

I just applied to one company and they asked what salary im looking for, I gave a fair number and they emailed me back saying they cannot offer me more than a rly bad wage and that it was more of an analyst level job even tho the role was advertised as DS

#

i asked what is the wage? and they said they cant disclose

desert oar
#

keep looking. i've worked in several companies that have hired people right out of masters with 0-2 years industry experience

steady basalt
#

basically i had to get my guess right in the first place

#

how rude is that

desert oar
#

i can't speak for the london job market however

desert oar
steady basalt
#

they cudda lowballed me and id have prob taken it

#

just to get my foot in the door

desert oar
#

that's literally their strategy

steady basalt
#

and get the xp

#

they didnt low ball me though

#

they didnt give me a number

#

just said i want too much

#

are u in europe?

desert oar
#

no, north america

steady basalt
#

oh loll

#

u guys have disgusting salaries

#

same cost of living as london and triple income

#

wish i was there

#

average sql monkey wage here is 45k dollars

desert oar
#

but also disgusting cost of living, and taxes that aren't much lower than in europe unless you own a home / have kids and can take big income deductions

steady basalt
#

and ive heard its 100k +

#

in usa

desert oar
#

okay that's really low. we have analyst-level employees in eastern europe making 45k

steady basalt
#

from the roles ive been applying to

#

most have been 30-45k

desert oar
#

junior analysts in the usa won't be making 100k, more like 50-80 depending on industry and location

steady basalt
#

data analyst/data scientist but im a graduate

#

its so fkin hard to land a first role

desert oar
#

fresh out of masters you should be close to 6 figures for sure. the problem with data science right now is that teams are generally small and can't absorb a junior that needs mentoring, or think they can't

steady basalt
#

and ive added like 6 months of heavily embellished experience as an 'intern' while i studied, im still being told i dont have enough experience to do these low-end data science roles, ive applied to 200+ jobs

#

including data analyst

desert oar
#

talk to your university career center. the market might be particularly crazy where you are

wooden sail
steady basalt
#

my uni doesnt rly have a useful career center most likely

#

my uni has shit admin

desert oar
steady basalt
#

theres a cv workshop and advice session thats it

#

they dont actually get u jobs

#

in this field at least

#

i think its hot af for consultants, economics, stuff like that though

wooden sail
#

the operation is very similar to a convolution

desert oar
#

oh yeah, you can construct the indices, but you still need to loop over the "windows" that you construct

timid kiln
wooden sail
#

i'm just saying you can replace one python loop for many C loops. depending on how many there are, it can still be faster

#

using series indexing and built ins for the means

desert oar
#

!e ```python
import numpy as np
import pandas as pd

y = pd.Series([
    1.5, 2.3, 2.1, 3.4, None, 2.2, 1.6, None, None, 2.9, 2.7
])
print(y)

# Positions of the missing values
null_pos = np.flatnonzero(y.isnull())
null_pos = null_pos[null_pos != 0]  # Can't do anything if the first element is null

window_size = 3
# (start, stop) bounds of the window preceding each missing value
windows = [(max(i - window_size, 0), i) for i in null_pos]

# Fill in order, so earlier fills feed into later windows
for i0, i1 in windows:
    y.iat[i1] = y.iloc[i0:i1].dropna().mean()
print(y)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | 0     1.5
002 | 1     2.3
003 | 2     2.1
004 | 3     3.4
005 | 4     NaN
006 | 5     2.2
007 | 6     1.6
008 | 7     NaN
009 | 8     NaN
010 | 9     2.9
011 | 10    2.7
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ovorinozeb.txt?noredirect

desert oar
#

something like that, right?

#

@steady basalt hopefully you can get in touch w/ someone that has some more local expertise. in the usa there are a ton of good analyst jobs at startups as well as in fields like insurance. there is also the actuarial certification route, although i've been told that it's a huge amount of work and isn't necessarily worth the effort beyond the first set of exams

#

i can't speak for how many of them are interested in a junior, but you should at least be able to get an interview and from there you can make a lot of progress with good interviewing skills

#

200 is a lot of jobs... are you consistently struggling during certain kinds of interviews? do you need to work on your presentation and/or interviewing skills? do you need to work on your "problem solving" skills for technical interview sessions? are you able to talk coherently about your thesis and internship work?

#

re: windows, it'd be pretty useful if pandas had the option to apply window functions sequentially
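
a rough sketch of what "sequentially" changes, assuming the same toy series as above. the built-in rolling mean only ever sees the original values, so a filled gap never feeds into the next window, while the loop reuses its own fills. `fill_rolling` and `fill_sequential` are names made up for this example, not pandas functions:

```python
import numpy as np
import pandas as pd

def fill_rolling(y: pd.Series, window: int) -> pd.Series:
    """Vectorized fill: each gap gets the mean of the previous `window`
    ORIGINAL values (NaNs skipped); earlier fills are not reused."""
    prev_mean = y.rolling(window, min_periods=1).mean().shift()
    return y.fillna(prev_mean)

def fill_sequential(y: pd.Series, window: int) -> pd.Series:
    """Loop fill: gaps are filled in order, so earlier fills feed later windows."""
    out = y.copy()
    for i in np.flatnonzero(out.isnull()):
        if i == 0:
            continue  # nothing before the first element to average
        out.iat[i] = out.iloc[max(i - window, 0):i].dropna().mean()
    return out

y = pd.Series([1.5, 2.3, 2.1, 3.4, None, 2.2, 1.6, None, None, 2.9, 2.7])
rolled = fill_rolling(y, 3)
looped = fill_sequential(y, 3)
print(pd.DataFrame({"rolling": rolled, "sequential": looped}))
```

the two agree on the first gap and drift apart after it, because only the loop averages over values it filled itself. which one is "right" depends on the data, so treat this as an illustration rather than a recommendation.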

steady basalt
#

I tried applying to walmart and they asked which state i live in in the form

desert oar
#

also recent versions of julia have loop fusion built into the jit compiler which is super awesome for stuff like this

steady basalt
#

but thats only at a 10% callback rate

#

99% of the time, they are looking for someone with more experience

#

London is quite literally either Senior roles or child prodigies only

#

Granted, I turned down 2 data analyst roles, but they were offering less than my current shit job

#

so far ive only seen 2 companies advertising 'junior ds'

desert oar
#

sometimes the analyst is necessary in order to make dashboards and run random reports so that the data scientists have free time to actually do data science, and the analyst doesn't get to do much data science of their own. but a team that cares about investing in its contributors will recognize that their "analyst" wants to grow into a proper ds role, and will plan accordingly.

steady basalt
#

Doesn’t really matter if it’s actually analyst, I need it for the cv to become an actual one

desert oar
#

(unfortunately such teams are not that common, but in my recent job search seem to be a lot more common than they used to be, particularly at techy startups)

steady basalt
#

And also it eases the burden on me having to spend my evenings practising sql and dashboarding

#

During work hours, so can focus on ML and learning new things entirely

#

That’s the issue trying to break in, you only have evenings and weekends to train up and get good enough

#

That includes all sorts of stuff employers commonly look for, as well as competency to pass technicals

#

That’s essentially almost two full time jobs, one to pay the bills after uni and one is self inflicted and unpaid 😅

desert oar
#

i don't know if that option is available to you

steady basalt
#

Not worth the risk of leaving my current job that pays decently

shrewd grove
#

i had a gap in my cv during covid.

#

in the UK no one cares. Ive heard in the US people care more, hence "forgetting" the gap on the cv and mentioning it later during interviews.

steady basalt
#

I’ve been asked by a recruiter about any gaps

#

If I had any

shrewd grove
#

I think it is important to be on the other end too

#

so if You have an opportunity to recruit people - do it!

desert oar
#

i don't think the gap is a big deal if you've been using it productively

#

especially 3 months, that's completely normal after you graduate

#

people will start to notice after a year

steady basalt
brave sand
desert oar
#

being stressed for money is worse than being busy imo

iron basalt
#

(Or both / everything? Or not objects / landmarks based?)

#

What are your inputs?

#

What is the more general problem?

brave sand
#

it's personal-ish

iron basalt
#

No, sorry.

brave sand
#

oh ok

#

other agents

#

it's landmark based

#

why is a generative adversarial inverse rl used?

iron basalt
brave sand
iron basalt
brave sand
#

what other prediction models are common for this?

iron basalt
#

Multi-agent does not tell much alone.

#

Is vision even involved?

brave sand
#

vision is involved

#

probability of detection is determined by distance between agents, speed of agent, terrain. terrain is a multiplier between [0, 1]

#

there is a fugitive that is trying to reach hideout. "police" are trying to detect and track fugitive with fixed cameras, ground based search party, and "helicopter"

iron basalt
#

Are you trying to localize other agents from the POV of each agent?

brave sand
#

in a way?

#

I guess you could say that

iron basalt
#

What is the input to each agent since it's 2D? What is "vision" in this world?

#

(Raycasts?)

brave sand
#

no it isn't raycasts

#

forest density is a factor which is a np array

#

it would be easier if I sent the powerpoint tbh

rustic widget
#

Can anyone teach me how to do 3D matrix and 3D cameras

brave sand
#

with the current info I've given you what would be the most suitable model? @iron basalt

rustic widget
iron basalt
#

But now i'm not sure what you are doing in your problem.

brave sand
#

it's fine then, thanks for your time

#

if you want, I could just dm the slideshow but discuss here

iron basalt
#

If you are running a simulation of a bunch of agents and want to predict where they will be. Then that might be where the RL comes in.

brave sand
#

yeah i figure that too

#

just not super familiar with marl stuff

iron basalt
#

That's localization in a different sense. More of a prediction thing. You know in the simulation where all the agents are (you have the exact positions in memory / simulation), but you want the agents to mimic some real behavior (inverse RL).

brave sand
#

could time series here work?

#

time series of state/observation pairs

iron basalt
#

It's more running a multi-agent RL simulation and designing them to mimic some real behavior. Then running that forward to predict.

brave sand
#

how would I predict though?

iron basalt
#

The prediction is running the simulation.

brave sand
#

how is the "police" in the simulation able to predict the location of the fugitive?

iron basalt
#

Like if I simulate a ball falling due to gravity. I can predict where the ball will be by running it.
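
A toy sketch of that idea, assuming a ball dropped from rest with no air resistance (the function name, step size and numbers are made up for illustration): the "prediction" is just the state you get after stepping the simulation forward.

```python
def simulate_fall(height_m: float, duration_s: float,
                  dt: float = 0.001, g: float = 9.81) -> float:
    """Step a ball dropped from rest forward in time and return its height.
    Ground contact is not modelled."""
    y, v = height_m, 0.0
    steps = int(round(duration_s / dt))
    for _ in range(steps):
        v -= g * dt   # update velocity from gravity
        y += v * dt   # then update position (semi-implicit Euler)
    return y

predicted = simulate_fall(100.0, 2.0)
analytic = 100.0 - 0.5 * 9.81 * 2.0 ** 2
print(predicted, analytic)
```

For a ball the simulated answer can be checked against the closed-form one. For agents there is no closed form, so the quality of the prediction rests on how well the simulated behavior matches the real one.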

brave sand
#

Oh isn't that every simulation though?

#

What makes this different? Just a more complex simulation and less predictable?

iron basalt
#

The police would have the last known locations and then enter those into simulation, then hope that those simulated agents mimic the real behavior enough to give a good prediction.

#

It would require a map of the terrain as well.

brave sand
#

Ok, this is more clear now

iron basalt
#

Yes, a ball is simple, a whole person not so much.

brave sand
#

Gotcha, so the GAN IRL is used here for that

iron basalt
#

So my guess is they are suggesting using inverse RL to mimic real fleeing behavior.

brave sand
#

any models that are similar but perform better?

iron basalt
#

In inverse RL?

brave sand
#

yeah

iron basalt
#

GAN IRL will probably do just fine. There is always better, but that is getting into the bottomless pit rabbit hole that is RL in general (and inverse).

brave sand
#

where could I look or read into inverse rl?

#

any recommended papers?

iron basalt
brave sand
#

thanks!

iron basalt
# brave sand thanks!

It also falls more broadly under apprenticeship learning (AL), yet another thing that could be added to the list of unsupervised, supervised, reinforcement, semi supervised, etc.

#

(IRL is not the only approach)

brave sand
#

what other approach would you suggest?

#

could be broad but fits this scenario?

iron basalt
#

The other approach typically used is to learn world dynamics (a world model) and use that to simulate the expert.

#

RL does this implicitly (depending on type).

#

(And IRL lets you handle the fact that you don't really know what the reward function should be)

brave sand
#

Ok, I’ll look into world models

iron basalt
#

IRL is a pretty good fit for the problem.

#

If I understood it correctly.

brave sand
#

well are there any other IRL models besides GANS?

iron basalt
#

Generative models can be used to learn world models. So hopefully it makes sense how this all fits together in GAN IRL.

#

GAN is the choice made there but it could be something else.

brave sand
#

Cool, thanks for the conversation

iron basalt
iron basalt
#

The GAN IRL paper even has "...IRLGAN is GAIL without known actions...".

#

The "GAN" part comes via analogy (yes, very confusing (keyword search optimization?)).

#

So the connection to world models is not it in this case (separate approach to apprenticeship learning).

iron basalt
#

*The trick that IRLGAN is doing is summarized pretty well in figure 1 of the OptionGAN paper (you can also see the GAN-ness of it there).

#

*IRLGAN still seems like a fine choice for your problem (again, if I understood it correctly).

cloud sand
#

it sounds like a normal SLAM problem

wooden forge
#

Hey there, currently experimenting with interactive 3d Surface plot in matplotlib, I've discovered the ipywidgets that allows me to make a 3d plot interactive. What I want to do is being able to rotate a sphere around the blue one, and I am using the interact function for it. But I'd like to know how to make it less squeezed whenever it's turning around

#

Another example of what I mean by squeezed

#
```py
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider

# Rotation, light_source, earth and atmosphere are defined elsewhere (not shown)
def plot(a):
    fig = plt.figure()
    ax  = plt.axes(projection='3d')
    res = Rotation(a,light_source) # Controls the rotation with the angle a around the center
    #ax.plot_surface(atmosphere[0,:], atmosphere[1,:], atmosphere[2,:], alpha=0.3)
    ax.plot_surface(earth[0,:], earth[1,:], earth[2,:], alpha=1) # the blue sphere
    ax.plot_surface(res[0,:], res[1,:], res[2,:]) # the orange sphere
    #fig.tight_layout()

interactive_plot = interact(plot, a=FloatSlider(min=0, max=2*np.pi, step=0.2, value=0.0))```
**And here is the code I've used**
dawn dune
#
torch.Size([2688])```

Is there a way I can reduce the top tensor to just [2688] to match the bottom?
hasty grail
lapis sequoia
#

Hi all, when I do a principal component analysis for asset pricing factors, the returns are the rows by date and the columns are the portfolios, right? So my first column is then the return of the total market? I know not really related to python just that I will do the analysis in python but maybe someone can still help me out haha

gaunt tusk
#

!cban 1013683173219106816 Joined just to spam a sus looking executable file link

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied ban to @summer pond permanently.

old grove
#

any tableau experts here ?

serene scaffold
chrome lake
#

I am performing multiple linear regression with sklearn in python
I am at the feature selection/feature engineering section
And I am trying to nail down some columns to remove if need removing
Right now I have found the variances of the different feature columns
and I am trying to decide a cut off point
How would one find a safe cutoff point?

wooden forge
#

Hey there, currently using Ipywidgets to animate a 3d surfaceplot, the thing is, whenever I change the step of the Play widget, the animation doesn't work anymore, and it seems to only accept integer values as step value and that's really bothering, anyone would know how to solve this?

#

And I can't find anything online it's really bothering

#

I'd truly appreciate anyone's help, I'm stuck on this for too long and I can't find any solution

lapis sequoia
#

is there a function which outputs the pearson and spearman correlation in one table?

brave sand
inland notch
#

so there is this project on the internet which involves mapping areas which havent been previously mapped before from satellite images , to do this you just outline the structure and put a tag which is the structure type, I was wondering if someone could tell me if sometype of AI could map the buildings using maybe image recognition? I dont know much about AI so could someone tell me if this would be possible?

cloud sand
#

it's almost guaranteed to work because google has been doing this exact thing for ages

worthy hollow
#

HERE THE PROBLEM WITH SMALLER DATAFRAMES, i got 2 df as input and one output in which I make use of both input dfs
INPUT

#

!e ```py
import pandas as pd

input_nat = pd.DataFrame({ "Planets": ["Earth", "Mer", "Ven"],
"Degrees": ["38", "156", "310"],
"Start_Date": ["25948", "107774", "42177"],
"Now": ["30935", "128441", "50284"],
"Cycles": ["130", "133", "26"],
"0": ["30935", "128441", "50284"],
"1": ["30974", "128597", "50595"],
"2": ["31012", "128754", "50905"]
})

input_cum = pd.DataFrame({"Date": ["21/05/2022","08/09/2022","10/10/2022", "18/10/2022","19/11/2022", "24/11/2022", "21/03/2023", "01/10/2023", "07/12/2023"],
"Earth": ["30830","30935", "30966", "30974", "31006", "31012", "31130", "31317", "31384"],
"Mer": ["128017","128441", "128597", "128643", "128754", "128768", "129234", "130062", "130299"],
"Ven": ["50108","50284", "50336", "50349", "50400", "50408", "50595", "50905", "51013"]
})

print(input_nat)
print(input_cum)
```

arctic wedgeBOT
#

@worthy hollow :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |   Planets Degrees Start_Date     Now Cycles       0       1       2
002 | 0   Earth      38      25948   30935    130   30935   30974   31012
003 | 1     Mer     156     107774  128441    133  128441  128597  128754
004 | 2     Ven     310      42177   50284     26   50284   50595   50905
005 |          Date  Earth     Mer    Ven
006 | 0  21/05/2022  30830  128017  50108
007 | 1  08/09/2022  30935  128441  50284
008 | 2  10/10/2022  30966  128597  50336
009 | 3  18/10/2022  30974  128643  50349
010 | 4  19/11/2022  31006  128754  50400
011 | 5  24/11/2022  31012  128768  50408
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/metevimazi.txt?noredirect

worthy hollow
#

OUTPUT

#

!e ```py
import pandas as pd

output = pd.DataFrame({ "Planets": ["Earth", "Mer", "Ven"],
"Degrees": ["38", "156", "310"],
"Start_Date": ["25948", "107774", "42177"],
"Now": ["30935", "128441", "50284"],
"Cycles": ["130", "133", "26"],
"0": ["08/09/2022", "08/09/2022", "08/09/2022"],
"1": ["18/10/2022", "10/10/2022", "21/03/2023"],
"2": ["24/11/2022", "19/11/2022", "01/10/2023"]
})

print(output)
```

arctic wedgeBOT
#

@worthy hollow :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |   Planets Degrees Start_Date     Now Cycles           0           1           2
002 | 0   Earth      38      25948   30935    130  08/09/2022  18/10/2022  24/11/2022
003 | 1     Mer     156     107774  128441    133  08/09/2022  10/10/2022  19/11/2022
004 | 2     Ven     310      42177   50284     26  08/09/2022  21/03/2023  01/10/2023
worthy hollow
#

as you can see, output takes the initial form of the dataframe input_nat and replace the values of columns "0, 1, 2" by their respective index date from input_cum
i want to make a for loop or anything else that can be able to reproduce this output

#

if anyone could help thatd be lovely

steady basalt
silent mesa
#

what are the important type of models i should know?
like ik keras sequential....any resource to speedrun this info?

desert oar
desert oar
# steady basalt I have a place to live without the need for money. I’m now finished w thesis tho

then it might not be a bad idea. if your current job is not too busy or stressful then you can probably do the work that you need to do in nights and weekends. i would offer to help by looking at your cv etc. but i am hesitant to give advice since you live in an area that i am not familiar with when it comes to the job market. but you could at least bring it up with some people that you know and trust IRL and see what they think about it

untold bloom
#
```py
mapper             = df_2.melt(id_vars="Date", var_name="Planets").set_index(["Planets", "value"])
numeric_cols       = df_1.columns[df_1.columns.str.fullmatch(r"\d+")]
pairs              = pd.MultiIndex.from_frame(df_1.set_index("Planets").filter(numeric_cols).stack().droplevel(-1).reset_index(name="value"))
df_1[numeric_cols] = mapper.loc[pairs].to_numpy().reshape(-1, len(numeric_cols))
```
  • form a mapper of (planet, value) -> date
  • get the names of the numeric columns, i.e., those that match ^\d+$
  • get the (planet, value) pairs out of the numeric columns as a MultiIndex; direct set_index won't work (won't repeat, e.g., Earth, for each value) so we stack
  • map the pairs; now we have the dates and so far no operation implicitly sorted things for us or did something to disturb order; therefore we can safely go to NumPy domain and reshape there the flat result; then we assign at the end
silent mesa
worthy hollow
desert oar
#

check the pinned messages. the best consolidated resources currently are textbooks and online courses. if you were interested specifically in getting a high-level overview of the various kinds of models that are used in machine learning, you can look at https://scikit-learn.org/stable/user_guide.html and https://hastie.su.domains/ElemStatLearn/ , but not all of the models listed there are in common use nowadays, some of them having been supplanted by deep neural networks in many problem domains

worthy hollow
# untold bloom ```py mapper = df_2.melt(id_vars="Date", var_name="Planets").set_ind...
helcunn = helio_cum.copy() # DF WHERE I GET THE DATES (df_2)
nat_test = nat_h.copy() # DF WHERE I WANT TO REPLACE DEGREES VALUES BY THEIR CORRESPONDING DATES (df_1)

mapper             = helcunn.melt(id_vars="Date", var_name="Planets").set_index(["Planets", "value"]) 
numeric_cols       = nat_test.columns[nat_test.columns.str.fullmatch("\d+")]
pairs              = pd.MultiIndex.from_frame(nat_test.set_index("Planets").filter(numeric_cols).stack().droplevel(-1).reset_index(name="value"))
nat_test[numeric_cols] = mapper.loc[pairs].to_numpy().reshape(-1, len(numeric_cols))

nat_test
#

ok so this code

#

brings me this error:

#
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11096/2558107041.py in <module>
      5 numeric_cols       = nat_test.columns[nat_test.columns.str.fullmatch("\d+")]
      6 pairs              = pd.MultiIndex.from_frame(nat_test.set_index("Planets").filter(numeric_cols).stack().droplevel(-1).reset_index(name="value"))
----> 7 nat_test[numeric_cols] = mapper.loc[pairs].to_numpy().reshape(-1, len(numeric_cols))
      8 
      9 nat_test

KeyError: "[('Mer', 128600.0), ('Mer', 128756.0), ('Mer', 128912.0), ('Mer', 129847.0), ('Ven', 51217.0), ('Ven', 51527.0), ('Ven', 52148.0), ('Ven', 52458.0), ('Ven', 52768.0), ('Ven', 53079.0), ('Mar', 18100.0), ('Mar', 18335.0), ('Mar', 18571.0), ('Jup', 2910.0), ('Jup', 3207.0), ('Jup', 3505.0), ('Jup', 3802.0), ('Jup', 4099.0), ('Jup', 4396.0), ('Jup', 4693.0), ('Jup', 4990.0), ('Jup', 5287.0), ('Sat', 1217.0), ('Sat', 1381.0), ('Sat', 1545.0), ('Sat', 1709.0), ('Sat', 1872.0), ('Sat', 2036.0), ('Sat', 2200.0), ('Sat', 2364.0), ('Sat', 2528.0), ('Ura', 720.0), ('Ura', 1071.0), ('Ura', 1423.0), ('Ura', 1774.0), ('Ura', 2125.0), ('Ura', 2476.0), ('Ura', 2827.0), ('Ura', 3178.0), ('Ura', 3530.0)] not in index"```
untold bloom
#

this means there are some (planet, value) pairs in your frame that don't have a corresponding date in the other frame

worthy hollow
#

ah i see yeah that does make sense

#

more accurately

worthy hollow
#

how could i write something in your code so that, **IF it doesnt find the exact number, it scans and takes the most similar number to it** and shows its date instead

worthy hollow
untold bloom
#

are those planetary values unique? those 30935, 128754 etc.

untold bloom
#

okay...

#

we can use pd.merge_asof; it will handle the merging without exact matches but rather with "nearest" match

#

code changes slightly...

#

no longer need Serieses as the mapper and pairs, but rather dataframes for merge_asof

#

so

#
```py
dates              = df_2.melt(id_vars="Date", var_name="Planets")
numeric_cols       = df_1.columns[df_1.columns.str.fullmatch(r"\d+")]
pairs              = df_1.set_index("Planets").filter(numeric_cols).stack().droplevel(-1).reset_index(name="value")
mapper             = pd.merge_asof(pairs.astype({"value": float}).sort_values("value"),
                                   dates.astype({"value": float}).sort_values("value"),
                                   on="value",
                                   direction="nearest").astype({"value": int}).astype({"value": str}).set_index("value")["Date"]
df_1[numeric_cols] = df_1[numeric_cols].replace(mapper)
```
#

the .astype({"value": float}).sort_values("value") parts in pd.merge_asof are inherent requirements of the function: need numeric column to merge on, and it needs to be sorted.

#

since some planetary values might be large (idk), i used float there instead of int

#

in the second to last part, .astype({"value": int}).astype({"value": str}) converts those floats back to string

#

with this code, the mapper is value -> Date

#

since you said planet is not important in uniqueness but values are enough, we reduced to value -> date mapping

#

lastly, .replace(mapper) will replace the values in those numeric columns with this mapper to fill in with dates
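
a self-contained variant of the same idea, as a sketch only: it assumes the values are already numeric, matches within each planet via `by="Planets"`, and exposes `direction` so "nearest" and "forward" can be compared. the tiny frames and the `lookup_dates` helper are made up for illustration, they are not the real data:

```python
import pandas as pd

def lookup_dates(nat: pd.DataFrame, cum: pd.DataFrame,
                 direction: str = "nearest") -> pd.DataFrame:
    # Long format: one row per (planet, value) with its date.
    dates = cum.melt(id_vars="Date", var_name="Planets", value_name="value")
    # Long format: one row per (planet, column name, value) to look up.
    pairs = nat.melt(id_vars="Planets", var_name="col", value_name="value")
    matched = pd.merge_asof(
        pairs.sort_values("value"),   # merge_asof needs both sides sorted on the key
        dates.sort_values("value"),
        on="value",
        by="Planets",                 # only match rows of the same planet
        direction=direction,          # "nearest", "forward" (>=) or "backward" (<=)
    )
    # Back to wide format: one row per planet, one column per original column.
    return matched.pivot(index="Planets", columns="col", values="Date")

# Made-up sample: 30972 and 128600 have no exact match in input_cum.
input_nat = pd.DataFrame({
    "Planets": ["Earth", "Mer"],
    "0": [30935, 128441],
    "1": [30972, 128600],
})
input_cum = pd.DataFrame({
    "Date": ["08/09/2022", "10/10/2022", "18/10/2022"],
    "Earth": [30935, 30966, 30974],
    "Mer": [128441, 128597, 128643],
})

nearest = lookup_dates(input_nat, input_cum)
forward = lookup_dates(input_nat, input_cum, direction="forward")
print(nearest)
print(forward)
```

with `direction="forward"` a value is matched to the first table entry that is greater than or equal to it, so 128600 lands on 128643 rather than on the closer 128597. whether that matches the dates expected in the spreadsheet depends on the real data.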

worthy hollow
#

ok so

worthy hollow
#

i've used your code but as you can see, it doesnt return

#

date values but actually the degrees

#
helcunn = helio_cum.copy() # DF WHERE I GET THE DATES (df_2)
helcunn = helcunn.round()

nat_test = nat_h.copy() # DF WHERE I WANT TO REPLACE DEGREES VALUES BY THEIR CORRESPONDING DATES (df_1)

dates              = helcunn.melt(id_vars="Date", var_name="Planets")
numeric_cols       = nat_test.columns[nat_test.columns.str.fullmatch("\d+")]
pairs              = nat_test.set_index("Planets").filter(numeric_cols).stack().droplevel(-1).reset_index(name="value")
mapper             = pd.merge_asof(pairs.astype({"value": float}).sort_values("value"),
                                   dates.astype({"value": float}).sort_values("value"),
                                   on="value",
                                   direction="nearest").astype({"value": int}).astype({"value": str}).set_index("value")["Date"]

nat_test[numeric_cols] = nat_test[numeric_cols].replace(mapper)

nat_test
untold bloom
#

in the samples you sent values were of type string; is it the case with the data you're showing in that image?

worthy hollow
#

those are float, is it the problem? Shall I convert them to STR?

untold bloom
#

then perhaps try without .astype({"value": int}).astype({"value": str}) in the mapper's definition

#

because that's converting floats to decimalless strings

worthy hollow
#

still bringing the same

untold bloom
#

this won't work either

worthy hollow
untold bloom
#

what .astype({"value": str}) does is convert "value" column's values to strings

worthy hollow
#

@untold bloom you are a genius mind thats crazy thanks A LOT

#

but i have just another question

#

wait lemme open my excel so u can see

#

here are the correct dates values

#

here's the one it generate

worthy hollow
#

and not the nearest value in the future <---- its what we actually want it to take

#

idk if it's understandable, i'm not expressing myself well

untold bloom
#

then perhaps try direction="forward"

worthy hollow
#

giving old

#

highlighted in red

untold bloom
#

i don't know, sorry

worthy hollow
#

no worry

#

thanks a lot for your time and dedication, much appreciated, you helped me advance loads, i'll try to figure out a solution, have a great day

molten plinth
#

hey everyone, is this a good place to ask pandas questions?

untold bloom
#

you too

lapis sequoia
#

Can anyone tell me why we need to scale the data?

hardy kernel
#

is there a fast way to do this transformation on a numpy array of around 100k to 200k entries?

threshold = 5
replacement = -100
[1,2,3,4,5,6,7,8,9,10] transforms into [-100, -100, -100, -100, 5,6,7,8,9,10]

basically checks if an element is below a threshold, if it is, replaces it with a value.

wanted something like np.maximum()
wooden sail
hardy kernel
#

does it generate a new array or do the transformation in-place?

wooden sail
#

!e

import numpy as np
thresh = 5
rep = -100
x = np.arange(1,11)
print(x)
x[x < thresh] = rep
print(x)
arctic wedgeBOT
#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [ 1  2  3  4  5  6  7  8  9 10]
002 | [-100 -100 -100 -100    5    6    7    8    9   10]
hardy kernel
#

you are amazing

#

also wow that is a sick feature to have on a bot

wooden sail
#

doing x[indices] = values replaces the values of x in place at the specified indices

hardy kernel
#

awesome

wooden sail
wooden sail
# hardy kernel awesome

ah also if you wanted it in more of a function form, you can do the same with np.where, which behaves very much like an elementwise ternary expression

hardy kernel
#

thank you , good to know !

#

I read that np.maximum was the fastest that's why I was thinking if there was a np related function or not. I guess where is the closest?

wooden sail
#

np.where(x < thresh, rep, x) would be the syntax

hardy kernel
#

tyvm

wooden sail
#

any method that directly exploits numpy's broadcasting and indexing should be fine. i.e. as long as you are not writing a for loop for this, you're good
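
a small side-by-side of the two approaches, using the same toy array as above: `np.where` returns a new array and leaves the input alone, while boolean-mask assignment changes the array in place (done on a copy here so both results can be compared).

```python
import numpy as np

thresh = 5
rep = -100
x = np.arange(1, 11)

# np.where builds a new array; x itself is untouched.
replaced = np.where(x < thresh, rep, x)

# Boolean-mask assignment modifies the array it is applied to.
in_place = x.copy()
in_place[in_place < thresh] = rep

print(x)
print(replaced)
print(in_place)
```

both are vectorized, so for a few hundred thousand entries either should be fine. the in-place form just avoids allocating a second array.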

timid kiln
#

I've got this dictionary, when I convert it to a dataframe the keys turn into the index. I want the keys to turn into a named column and have a numerical index instead. How do I make this happen?

ripe forge
#

Just reset index after making a dataframe
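
A small sketch of what that could look like. The dictionary contents and the column name `item` are made up for illustration, since the original dictionary was not posted.

```python
import pandas as pd

# Hypothetical data standing in for the dictionary in the question.
data = {
    "apple": {"price": 1.2, "qty": 3},
    "banana": {"price": 0.5, "qty": 12},
    "cherry": {"price": 3.0, "qty": 7},
}

# orient="index" makes the dictionary keys the row index.
df = pd.DataFrame.from_dict(data, orient="index")

# rename_axis names the index, reset_index turns it into a regular
# column and replaces it with a 0..n-1 numerical index.
df = df.rename_axis("item").reset_index()
print(df)
```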

lapis sequoia
#

hello, i have a bunch of conversational data and i want to turn it into a chatbot, how should i go about doing this?

#

like A LOT of conversational data

#

On Tech With Tim's YouTube channel he creates a classifier that classifies whether a review is positive or negative. I don't quite understand how the model knows to produce its output based on that feature, can anyone enlighten me?
https://www.youtube.com/watch?v=k-_pWoy2fb4&list=PLzMcBGfZo4-lak7tiFDec5_ZMItiIIfmj&index=5&ab_channel=TechWithTim

This python neural network tutorial introduces the idea of text classification using a neural network and tensorflow 2.0. We will create a fairly simple model that is capable of classifying movie reviews as either positive or negative!

Text-Based Tutorial: https://techwithtim.net/tutorials/python-neural-networks/text-classification-p1/

Playlis...

half horizon
#

I have a dataset of coffee prices
it has dates, closing rate, opening rate, high, low

I want to create a polynomial regression model to predict coffee prices in the future

#

but I only have one feature which is the Dates

#

can someone guide me here ?

desert oar
cinder schooner
#

Hello everyone, I want to start learning tensorflow and then pass the certification. Is there any official book or course I can follow? Or else, would you recommend some resources to learn from?

serene scaffold
#

also, tensorflow is declining in popularity relative to pytorch

cinder schooner
#

there's the tensorflow developer certification

#

I searched and there's no pytorch official certification or anything so I thought it would be better to learn tensorflow

serene scaffold
#

I would verify if any employers even care about that certification. my guess is that they won't.

#

@desert oar do you know?

desert oar
#

no idea. in general i'm told that certifications aren't worth a whole lot, because being certified with tensorflow doesn't mean that you actually are a good data scientist

cinder schooner
#

For some context, I started a year ago specializing in AI

desert oar
#

things like this always depends on what kinds of jobs you are trying to get, your background and experience level, and what region you are in

cinder schooner
#

I want to become an ML engineer and I'm coming from a software development background, so all of my previous work experience is in web dev or software. So I don't have much to show to land an ML opportunity. I'm currently doing a master's degree that I finish in March and I need to land an internship for April, so I thought that specializing in either Tensorflow or Pytorch would be good, and I found that the certification might help me. Any thoughts on this?

serene scaffold
#

oh, you don't have an internship lined up. hmm. well have you applied?

cinder schooner
#

I will be working on a project with that library but to prepare for interviews, I know the theory around ML and DL but I can't answer technical questions on DL frameworks so I thought this would help

serene scaffold
#

well, it wouldn't be a waste of time. does your university give you access to O'Reilly online?

cinder schooner
#

No it doesn't

chrome lake
#

Hello guys

tacit basin
#

Hi 🙂

chrome lake
#

If I have a list of categorical columns that I can then dummy-encode, how would I create a correlation matrix between my categorical feature columns and the target column?

#

Essentially

#

How do i select what columns to use for linear regression if my columns are categorical/dummy data

tacit basin
#

What's target column?

chrome lake
#

These are some of my categorical ones:

#

Here is my target:

#

I'm uploading it 1 sec

#

'SalePrice' is my target.

#

The first image shows some of my categoricals that are due to be dummied

tacit basin
#

Use all features

chrome lake
#

Any particular reason y?

tacit basin
#

Why not.

#

Did it give you poor performance?

#

Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued data, such as using the Pearson’s correlation coefficient, but can be challenging when working with categorical data. The two most commonl...

chrome lake
#

No I didn't check the performance actually

#

I should do that lol

tacit basin
#

Yeah do it first to get a baseline

chrome lake
#

Alright, thank you so much :)

rich olive
#

Guys I have a dataframe where I create a new column based on generating keywords from another existing column. I want to one hot encode my new column. Problem is each value has multiple keywords. Right now they're stored in a list which is unhashable. How would you go about ohe something like this?

tacit basin
rich olive
#

Thanks that's exactly it

#

I actually already did it manually lol

#
job_title_filtered = ds_salaries
for keyword_lst in job_title_keywords:
    job_title_filtered[keyword_lst[0]] = job_title_filtered.job_title.apply(
        lambda x: [1 if any(keyword in x for keyword in keyword_lst) else 0]
)
#

oops idk why i put lambda output in a list
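
A hedged sketch of the same loop with the stray list removed, so each new column holds plain 0/1 integers. The contents of `ds_salaries` and `job_title_keywords` below are invented stand-ins, since the real ones were not posted; `.copy()` is added so the original frame is not modified through the alias.

```python
import pandas as pd

# Hypothetical stand-ins for the real frame and keyword groups.
ds_salaries = pd.DataFrame(
    {"job_title": ["Data Scientist", "ML Engineer", "Data Engineer"]}
)
job_title_keywords = [["Data"], ["Engineer"], ["ML", "Machine Learning"]]

# Copy so the new columns do not leak into ds_salaries.
job_title_filtered = ds_salaries.copy()

for keyword_lst in job_title_keywords:
    # Column is named after the first keyword of the group; a title gets 1
    # if it contains ANY keyword of the group, so rows can match several groups.
    job_title_filtered[keyword_lst[0]] = job_title_filtered["job_title"].apply(
        lambda title, kws=keyword_lst: int(any(kw in title for kw in kws))
    )

print(job_title_filtered)
```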

velvet plover
#

hello guys

#

does anyone have experience with tensorflow

unborn adder
#

I just started python recently, I have done some coding in C++ for a while now... there is a robotics competition at my school in March, my friend and I decided to join and we want to build an AI robot car for performing some tasks (following a line, sorting boxes by color, climbing). I want to ask if someone can recommend any good resource for learning ML or AI for a beginner (I have no prior experience with it)

steady basalt
#

I’m leaning towards your judgement and logic. It’s just extremely tiring and hard. Especially when one has a gf.

#

Imagine it doesn’t go down too well both on me mentally or her emotionally if I’m coming home at 5:30 and doing sql, python, ML and maths until 10

#

And all on weekends too

rich olive
#

I'm trying to sns.boxplot() my OHE series so that any row with an entry of 1 in those series gets included in that x-value. Any ideas?

#

I don't know how else to boxplot with overlap for items that fit into several x categories

lapis sequoia
#

hello, i have a bunch of conversational data and i want to turn it into a chatbot, how should i go about doing this?

#

plej help

pure plover
#

I'm trying to concatenate data in pandas using pd.concat - based on the documentation, it looks like I'm doing it correctly but the output indicates otherwise.

#

The output is just one of the eight datasets I'm attempting to merge

#

df_ET=pd.concat([df_ET1, df_ET2,df_ET3...])

serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

pure plover
#

okay

#

{'Time [s]': {0: 0.0, 1: 60.013, 2: 120.014, 3: 180.015, 4: 240.025}, '0uM': {0: 224, 1: 287, 2: 354, 3: 420, 4: 485}, '0.61uM': {0: 218, 1: 275, 2: 340, 3: 402, 4: 459}, '2.44uM': {0: 192, 1: 212, 2: 253, 3: 298, 4: 332}, '9.76uM': {0: 123, 1: 133, 2: 136, 3: 139, 4: 143}, '39.1uM': {0: 109, 1: 111, 2: 112, 3: 112, 4: 113}, '156uM': {0: 103, 1: 102, 2: 106, 3: 103, 4: 105}, '625uM': {0: 92, 1: 94, 2: 94, 3: 95, 4: 95}, '2500uM': {0: 96, 1: 103, 2: 103, 3: 99, 4: 100}, '10.2uM IAEDANS': {0: 26028, 1: 25725, 2: 25171, 3: 24840, 4: 24392}, 'Protein': {0: 'WT', 1: 'WT', 2: 'WT', 3: 'WT', 4: 'WT'}}

serene scaffold
#

Are they all like this?

pure plover
#

yes

serene scaffold
#

Okay. So when you used concat, what happened? What did you want to happen?

#

If you got an error, and you say "I got an error", I don't know what error you got.

pure plover
#

The resulting dataframe was only the last of the dataframes I was attempting to merge (#8)

serene scaffold
#

Are you sure that's not just how it was displayed?

serene scaffold
pure plover
#

question, does the index also repeat? When I check tail, the index only goes to 59 out of 480

#

but it indicates 480 rows

#

I do see dataset #1 in the df.head() so it does look like its there
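
A short sketch of why the index can stop at 59 while the frame still has 480 rows: by default `pd.concat` keeps each input's own index, so the labels repeat. The two tiny frames below are invented for illustration (the `MUT` label is hypothetical); `ignore_index=True` renumbers the rows instead.

```python
import pandas as pd

df_a = pd.DataFrame({"Time [s]": [0.0, 60.0], "0uM": [224, 287], "Protein": ["WT", "WT"]})
df_b = pd.DataFrame({"Time [s]": [0.0, 60.0], "0uM": [210, 260], "Protein": ["MUT", "MUT"]})

# Default behaviour: each frame keeps its original 0..n-1 labels.
stacked = pd.concat([df_a, df_b])
print(stacked.index.tolist())

# ignore_index=True discards the old labels and builds a fresh 0..N-1 index.
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(renumbered.index.tolist())
```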

desert oar
half horizon
#

anyone free, need some help in data parsing and cleaning for a regression model

#

I have a date like 03/01/2000

should i separate it into 3 features ?
Like:
Day = 3
Month = 1
Year = 2000

and use these 3 features Instead of one

#

would it like help my model ?

desert oar
half horizon
#

english please...

half horizon
#

the values are gonna be cyclical but not the model ....
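
One possible way to do the split, sketched under the assumption that the dates are formatted day/month/year (as the Day = 3, Month = 1 example suggests). The sine/cosine columns are a common trick, not something confirmed in this conversation, for letting a model see that December and January are neighbours.

```python
import numpy as np
import pandas as pd

# Example dates in DD/MM/YYYY form; only the first one comes from the chat.
raw = pd.Series(["03/01/2000", "15/07/2000", "28/12/2001"])
dates = pd.to_datetime(raw, format="%d/%m/%Y")

features = pd.DataFrame(
    {"year": dates.dt.year, "month": dates.dt.month, "day": dates.dt.day}
)

# Map month 1..12 onto a circle so month 12 ends up next to month 1.
angle = 2 * np.pi * (features["month"] - 1) / 12
features["month_sin"] = np.sin(angle)
features["month_cos"] = np.cos(angle)

print(features)
```

Whether these extra columns help depends on the model and the data, so it is worth comparing against a baseline that uses only a single numeric time index.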

quaint loom
#

Aloha

desert oar
steady basalt
#

@desert oar i can't DM you, you're not my friend

#

you disabled friend requests

desert oar
#

sent

quaint loom
#

Is there anyone who know how to do this arithmetic in python?

quaint loom
wooden sail
#

pretty much exactly as you have it there

desert oar
#

otherwise use the information i told you before

quaint loom
desert oar
#

* for multiplication, and use () for grouping

wooden sail
#
4096*(0.6108*exp(...))/(...)**2
quaint loom
desert oar
#

** for powers

wooden sail
#

though i guess one of the parentheses really isn't needed

quaint loom
tidal bough
quaint loom
wooden sail
half horizon
quaint loom
#

What often confuses me is a division stacked on top of another division and how to write it, which is why I felt like asking. @wooden sail @desert oar

quaint loom
quaint loom
wooden sail
#

.latex note that
\[
\frac{ \left( \frac{a}{b} \right) }{ \left( \frac{c}{d} \right) } = \frac{(a/b)}{(c/d)} = (a/b)/(c/d)
\]
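
The same identity as a quick Python check, with arbitrary example values for `a`, `b`, `c`, `d`. It also shows why the parentheses matter: without them, `/` is applied left to right.

```python
from math import isclose

a, b, c, d = 3.0, 4.0, 5.0, 6.0  # arbitrary example values

# A fraction divided by a fraction: group numerator and denominator.
nested = (a / b) / (c / d)

# Equivalent form: multiply by the reciprocal of the lower fraction.
flat = (a * d) / (b * c)

# No parentheses: evaluated as ((a / b) / c) / d, which is a different number.
unparenthesised = a / b / c / d

print(nested, flat, unparenthesised)
```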

strange elbowBOT
quaint loom
#

for me

quaint loom
#

For some reason, it wont be solved.

tidal bough
#

order of operations is wrong here

#

and as for the syntax error, you probably have a missing )

wooden sail
#

you missed a closing ) somewhere

quaint loom
lapis sequoia
#

Does anyone understand this error message?

serene scaffold
# lapis sequoia

It means that fitmodel is not the type that you expected it to be.

In the future, please do not ask people to read screenshots of text. Please copy and paste the actual text into the chat.

tidal bough
#

We should make the bot delete screenshots and replace them with text OCRed from them. It is of course a horrible idea and will create far more problems than it will solve. Looking forward to it 🥴

lapis sequoia
wooden sail
#

because a lot of the time we don't know the answers either, we'll just test stuff out until we find a solution

#

if we can copy the code, it's easier to help (or motivates one to try to help in the first place)

#

also stuff is usually left out otherwise. e.g. what's fitmodel?

serene scaffold
#

whatever a "History" is bing_shrug

lapis sequoia
#

Im trying to do this

tidal bough
#

I don't see fitmodel.predict anywhere here, which is what your code fails at.

lapis sequoia
#

This TechwithTim video is 3y old. TF has changed?

tidal bough
lapis sequoia
#

I dont really understand the difference between "Fitmodel" and "model". Isnt it the same? I also tried using both and doesnt work

tidal bough
#

You're trying to do fitmodel.predict. But fitmodel isn't the model, it's an output of it, so it understandably doesn't have predict.

lapis sequoia
tidal bough
#

whoops, sorry, not a prediction

lapis sequoia
#

I thought it saved the model with a new name "fitmodel"

tidal bough
#

it's a History instance, so it's basically data on the training process

serene scaffold
#

though this is the first I've heard about History objects.

lapis sequoia
serene scaffold
#

there are new versions pretty often, but I'm not sure when the last major release was.

tidal bough
#

so fitmodel is a horrible variable name; it should be something like history, as in the example there

lapis sequoia
wooden sail
#

maybe he goes on to plot the loss later on?

lapis sequoia
#

I think I came to some conclusions now, thanks a lot for the help!

#

I really appreciate CS, I have just started getting into deeplearning

half horizon
#

anyone wanna help me with polynomial regression in #help-kiwi

serene scaffold
#

Please don't ask to ask. I realize you're trying to be polite, but it's actually obnoxious.

chrome lake
#

is this output value a bit too low for an RMSE?

#

Or is it okay and just means i did well

#

Is there such a thing as too low of an RMSE where its sketch

steady basalt
#

nvm u dont

quaint loom
steady basalt
#

its fine

#

if it's math related to data science

serene scaffold
steady basalt
#

even if it's irrelevant?

#

i can come here with my calc homework?

#

i have a question

serene scaffold
#

I'll decide that case by case.

#

But usually no. There is a math discord

quaint loom
steady basalt
#

f(x) = 3x + 7x^2 - 12x^4, as x approaches infinity

steady basalt
#

whats the behaviour?

#

how do u find any asymptote from that?

#

I thought u have to factorise and make a fraction but its not working

quaint loom
somber panther
#

anyone recommend some good lectures for beginning data analysis?

#

want to go over the fundamentals, im pretty good with excel

#

and obv learning python...

shell crest
#

read a book

serene scaffold
#

.wa limit of 3x + 7x^2 - 12x^4 as x approaches infinity

strange elbowBOT
serene scaffold
#

Victory for Stelercus!

lapis sequoia
#

Hi. Has anyone ever used models on Hugging Face (and autotrain) for instance segmentation? Looking for some pipeline pointers.

desert oar
shell crest
#

I know this is a bad question but does anyone know how does fastKDE work? I want to customise the number of levels, but I'm not seeing how inside the documentation. I have not really read into how the FFT goes into the KDE calculation, but I'm looking for something more efficient than scipy's KDE

mystic tinsel
#

Hello! Does anybody here have experience with nanodet? If so could you please provide some insight on the 32 raw bb coordinates returned by the model?

wooden sail
steady basalt
#

Or for factorised

wooden sail
#

hmm?

#

you can factor it yourself, and that poly is also a fraction, just a trivial one

#

denominator = 1

steady basalt
#

therefore there's no horizontal asymptote?
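
A short worked version of the limit asked about above. Factoring out the highest power shows that the leading term decides the behaviour, so the function decreases without bound and has no horizontal asymptote as x grows:

```latex
\[
f(x) = 3x + 7x^2 - 12x^4
     = x^4\left(\frac{3}{x^3} + \frac{7}{x^2} - 12\right)
\]
\[
\lim_{x \to \infty} \left(\frac{3}{x^3} + \frac{7}{x^2} - 12\right) = -12
\quad\Longrightarrow\quad
\lim_{x \to \infty} f(x) = -\infty
\]
```

Because the leading power is even, the same reasoning gives a limit of negative infinity as x goes to negative infinity as well.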

lapis sequoia
#

hello, i have a bunch of conversational data and i want to turn it into a chatbot, how should i go about doing this?

cloud sand
lapis sequoia
cloud sand
#

wdym?

lapis sequoia
#

idek what dat means

steady basalt
#

how tf this make sense WTF

cloud sand
#

you basically benefit from the greater sample efficiency that comes with the scaling laws of LLMs, so you can get it to learn without a corpus of terabytes of data

cloud sand
wooden sail
#

how do you add 1/3 + 1/4?

#

you rewrite it as 4/12 + 3/12 = 7/12. that's exactly what is happening there, but you're trying to do it in the opposite direction

cloud sand
#

hint: the product of the two denominators is a remarkable product, so you can simply abbreviate it

steady basalt
#

1/7

#

a/(x-2) + b/(x+4)

#

a/1 + b/7

#

somehow equals 1/7

wooden sail
#

i have no idea what you're trying to do there

#

ah you made a substitution into it

#

well, the solution is trivial isn't it? at a glance a=0 and b = 1

#

this only makes sense if the denominator can be nontrivially factored, but you chose a prime number for your example

steady basalt
#

7a+b = 1 is how far i got

#

a/1 + b/7

#

does that HAVE to mean a is zero and b is 1?

#

omg

#

well what if b is a value that means that both are non-zero

wooden sail
#

it's one solution, but you're doing so many things wrong at the same time it's impressive

steady basalt
#

first time 😄

#

uve been tracking my progress

wooden sail
#

at the point where you rewrite it as 7a + b = 1, you should immediately think back to your linear algebra and recognize this as a hyperplane in 2D, i.e. a line

#

meaning there are infinitely many solutions along a line

steady basalt
#

that didn't occur to me

wooden sail
#

because this decomp makes no sense when applied to scalars

steady basalt
#

did i just get unlucky with a awkward example

#

typical

wooden sail
#

you made the mistake of working with a constant fraction, which i did only to motivate where this comes from

#

but in the backward direction it makes no sense

steady basalt
#

so theres inf solutions for a and b

wooden sail
#

work with an actual x-dependent function

steady basalt
#

well, i was watching an integration video where this first came up and the guy did 1/.... = a/... + b/....

#

and a and b were 0.5

wooden sail
#

that has nothing to do with anything i just told you

#

in your position, trying to interpret stuff in your own words is detrimental because you don't have the basis to do this correctly and sensibly

steady basalt
#

is this not constant?

wooden sail
#

no dude wtf?

#

do you see the x's everywhere?

steady basalt
#

I had x's in both my factors too

wooden sail
#

no you didn't. the first thing you did was say "let x = 3"

#

from that point on, everything was constant, which is why you got infinitely many solutions

steady basalt
#

that makes sense

#

so x must be unknown to find solutions for a and b

#

?

wooden sail
#

that's actually not at the crux either

#

the key observation is that you have 2 unknowns, a and b

#

that means you need at least 2 equations to be able to find them uniquely

steady basalt
#

and in my case i had 1 = ....

#

or

wooden sail
#

in this case, you obtain the equations by evaluating x at different values

steady basalt
#

1/7 =

wooden sail
#

you evaluated x at 1 value. that's not enough

steady basalt
#

i had no other equation than itself ?

wooden sail
#

you did before you took x = 3

#

the evaluation is to be done later

#

you have to evaluate x somewhere else as well
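
For reference, a worked version of the decomposition under discussion, assuming the function is 1/((x-2)(x+4)) as written later in the conversation. Clearing the denominators gives an identity in x, and evaluating it at two different values of x yields the two equations needed for a and b:

```latex
\[
\frac{1}{(x-2)(x+4)} = \frac{a}{x-2} + \frac{b}{x+4}
\quad\Longrightarrow\quad
1 = a(x+4) + b(x-2)
\]
\[
x = 2:\; 1 = 6a \;\Rightarrow\; a = \tfrac{1}{6},
\qquad
x = -4:\; 1 = -6b \;\Rightarrow\; b = -\tfrac{1}{6}
\]
\[
\frac{1}{(x-2)(x+4)} = \frac{1}{6}\left(\frac{1}{x-2} - \frac{1}{x+4}\right)
\]
```

As a check, these values also satisfy the single equation 7a + b = 1 obtained from x = 3, which on its own only pins a and b down to a line.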

steady basalt
#

let me try that again without setting x i just did that thinking i could check with it

#

its 10am realise u cant do that

#

so the solutions are just asymptotes at 2 and -4?

#

(vertical)

wooden sail
#

what?

#

what asymptotes? weren't we doing partial fraction decomp?

steady basalt
#

we were, i just wanted to check before moving forward

#

also, im trying to prove now that it equals 1/(x-2) + 1/(x+4)

wooden sail
#

your loose usage of "it" "that" and the like makes it impossible for me to understand you

steady basalt
#

1/((x-2)(x+4))

#

so its as simple as saying 1/1 + 1/7 = 1/7?

wooden sail
#

still not making sense

half horizon
#

if someone can help me convert a linear regression model to polynomial, please check #help-peanut

plain saffron
#

suppose I have a dataset with let's say int arrays as values and floats ranging from 0 to 10 as labels. and say the amount of data per value range makes a normal curve (not sure if that's the term but whatever I think it's understandable). so there's very few data with labels in the 0s and 9s but lots of it in the 5s. but the data on a larger sample is way more even than in this dataset. will there be a bias or will it be accurate anyway? should I try to even it out before training or is it not worth? or is there something else I need to do in order for it to work properly?

wooden sail
#

if you know for certain that that is the shape, what you'll have wouldn't be bias, it'd be variance. if you then do a single fitting on the data, then yeah, you'll get the correct parameters with probability 0, which you could in some sense think of as a "bias" (but it's rather that estimators yield random variables)

plain saffron
#

if you know for certain that that is the shape
well I know for certain that it isn't

wooden sail
#

what i mean is, you know that it would be a normal dist if there were no noise/if you had infinitely many samples

plain saffron
#

I don't understand what you mean. maybe I shouldn't have used the term normal curve because it's not accurate it was really just an estimation. what I was trying to say is that the shape of my dataset doesn't correlate at all to the shape of the entire dataset

wooden sail
#

i don't think what you said right now is what you mean either though

#

if that's the case, you just have a bad sample and you shouldn't use it at all

#

if it has some error but on average the correct shape (i.e. properly sampled but small population), then you can still produce estimates

plain saffron
#

maybe my definition of "shape" isn't right then...

wooden sail
#

your usage of "doesn't correlate", too

plain saffron
#

the values are all correct and are the same as with the full dataset, the thing that's different is that I have just more values with 5s than I have values with 9s, which isn't the same as with the full one... Is that what I said? because that's what I meant to say

#

with the full one there are approximately as much 5s as there are 9s

#

same goes for all the other values

wooden sail
#

and you want to find out the distribution of the whole dataset from this sample?

plain saffron
#

no, I just want to train a model with only that sample

#

and I was wondering if it would yield similar results to a model trained with a larger sample

wooden sail
#

that'll depend on how large the difference is in the relative frequencies between your sample and the larger sample. if they're close (or if you compensate the difference), then yes

plain saffron
#

so my question is, it would be less accurate because of the size difference, sure, but would it also be less accurate because of the uneven distribution of data in my dataset? from what you've just said I'm assuming yes it would? and by "compensate the difference" do you mean even out my sample or something else?

#

sorry if my questions sound redundant I'm not too familiar with all things ai yet

wooden sail
#

by compensate the difference i mean indeed even out the sample, or alternatively use a custom cost function that accounts for this

#

as a degenerate example, imagine you train your model using a sample that only has 9s. you can imagine it will perform poorly if you then use this model on the full dataset that looks completely different

#

any time you use data-driven methods, the quality of the data directly impacts how good the result is
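
A rough sketch of the "custom cost function" route mentioned above: weight each sample by the inverse frequency of its label range, so rare ranges (the 0s and 9s) count as much in total as the crowded middle. The synthetic labels, the ten equal-width bins, and the variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labels in [0, 10], concentrated around 5.
labels = np.clip(rng.normal(5.0, 1.5, size=1000), 0.0, 10.0)

# Count how many labels fall in each of 10 equal-width bins.
bins = np.linspace(0.0, 10.0, 11)
counts, _ = np.histogram(labels, bins=bins)

# Bin index of every label (the right edge 10.0 goes into the last bin,
# matching np.histogram's convention).
bin_idx = np.clip(np.digitize(labels, bins) - 1, 0, len(counts) - 1)

# Inverse-frequency weights, rescaled so the average weight is 1.
weights = 1.0 / counts[bin_idx]
weights *= len(labels) / weights.sum()

# Total weight per bin: equal for every non-empty bin.
per_bin = np.bincount(bin_idx, weights=weights, minlength=len(counts))
print(counts)
print(per_bin.round(2))
```

Many training APIs accept per-sample weights of this kind; the alternative is to resample the data so each range is equally represented. Very rare bins get very large weights, which can make training noisy, so capping the weights may be worth considering.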

plain saffron
#

yeah it does make more sense when you look at it this way

#

well thank you I guess I got all the answers I was looking for

steady basalt
#

@wooden sail now that the exam's over, i'll show you this: integrate between a and infinity, e^bx dx

#

does e have special properties for this?

wooden sail
#

the usual properties for integration and differentiation

steady basalt
#

this is far away from what i'm currently studying, as you probably know

#

how would you go about solving it?
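
A sketch of how this integral works out, assuming b is a nonzero real constant. The antiderivative follows from the usual rule for exponentials, and the improper integral is taken as a limit of the upper bound:

```latex
\[
\int_a^{\infty} e^{bx}\,dx
  = \lim_{t \to \infty} \left[ \frac{e^{bx}}{b} \right]_a^{t}
  = \lim_{t \to \infty} \frac{e^{bt} - e^{ab}}{b}
\]
\[
b < 0:\quad e^{bt} \to 0
\quad\Longrightarrow\quad
\int_a^{\infty} e^{bx}\,dx = -\frac{e^{ab}}{b}
\]
```

For b >= 0 the integrand does not decay, so the integral diverges; the finite value above only holds for negative b.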