hasty mountain Mar 10, 2023, 1:05 AM

#

Guys, in the field of drug discovery...
I'm a bit used to dealing with images, so I'd very much like to use chemical formulas in SMILES labels, convert them to vectors and organize these vectors into numpy arrays to somehow simulate a chemical molecular formula of a compound.
However, I've also come to know that there's the option of using molecular graphs (which seems a bit crazy to me).

Can someone tell me which approach tends to be more promising?

#

Though this molecular graph seems to be somewhat like a multi-dimensional vectorization pithink

lapis sequoia Mar 10, 2023, 1:12 AM

#

HELL ooo
I think it is funny btw

I'm interested in making a bot that suggests stuff to me based on statistics.

I guess I have to learn about machine learning.
That might be something else.

Where should I start?
Or what else do I need to do?

I know I have Google, but I can't interact with it and can't get what I want.

So... yeah that is all

patent lynx Mar 10, 2023, 1:18 AM

#

Hello, I'm looking for suggestions on how to fine-tune hyperparameters to predict the NBA point spread.

#

```
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=3, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.03550000000000002,
             max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=20, max_leaves=3,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=600, n_jobs=-1, num_parallel_tree=None,
             predictor=None, random_state=None, ...)
```

#

It fails to capture the fatter tails of my true Y

hasty mountain Mar 10, 2023, 1:44 AM

#

patent lynx It fails to capture the fatter tails of my true Y

The idea is to predict the points in each game or tournament?
Maybe you could make a model to try to predict a median or a standard deviation, and then use that as a feature for another model to predict the actual points... pithink

#

Or maybe the max expected points for a game/tournament...

patent lynx Mar 10, 2023, 1:45 AM

#

in each game, my features are already in a rolling mean/median so that I can predict the next game's stats...

#

this is done so i can predict the point spread within a sports betting website

hasty mountain Mar 10, 2023, 1:46 AM

#

Uh...then maybe try some ensemble learning in another way?

hasty mountain Mar 10, 2023, 2:24 AM

#

Exactly using GNNs, but with conv operations

#

Dealing with 3-dimensional molecular formulas in general, the isomers...

#

Using GANs and VAEs to generate new molecules. All those seem easier to me when dealing with n-dimensional arrays rather than simple vectors.

#

I'm trying to review/enhance TrimNet

#

A quick search (quick one, I still didn't read anything) shows me that molecular graphs are usually represented as structures that resemble molecular formulas. If I can use molecular graphs in n-dimensional arrays, then goodbye SMILES brainmon

limber kiln Mar 10, 2023, 2:51 AM

#

Why is my torch loss going to NAN -

#

import numpy as np
import torch
from torch import nn
lr = 0.001
epochs = 100
def generate_random():
    # https://stackoverflow.com/questions/35730534/numpy-generate-data-from-linear-function
    x = np.arange(100)
    delta = np.random.uniform(-10, 10, size = (100, ))
    y = .4 * x + 5 + delta
    return x, y

class linear_regression(nn.Module):
    def __init__(self):
        super(linear_regression, self).__init__()
        self.layer = nn.Sequential(nn.Linear(1,1))
    def forward(self, x):
        return self.layer(x)

linear_model = linear_regression()

loss = nn.MSELoss()

opt = torch.optim.SGD(linear_model.parameters(), lr = lr)


x, y = generate_random()
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
print("x = ", x.shape, " y = ", y.shape)
for i in range(epochs):
    x = torch.tensor(x).to(torch.float32)
    y = torch.tensor(y).to(torch.float32)

    pred = linear_model(x)
    model_loss = loss(y, pred)
    with torch.no_grad():
        print("model_loss = ", model_loss)
    opt.zero_grad()
    model_loss.backward()
    opt.step()

#

Can someone please help?
Sorry, I am sure I am doing something really silly

#

Never mind. My learning_rate was high.
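
For reference, a rough sketch of why `lr = 0.001` blows up on this data, assuming plain full-batch gradient descent on the MSE (which is what SGD without momentum amounts to when it is fed the whole dataset each step). With x running from 0 to 99, the mean of x² is about 3283, so the curvature of the loss is roughly 2 × 3283 ≈ 6567 and the step size has to stay below about 2 / 6567 ≈ 3e-4. The helper `fit_line` is made up for illustration and redoes the update by hand in NumPy float32:

```python
import numpy as np

def fit_line(x, y, lr, epochs=100):
    """Full-batch gradient descent on MSE for y ~ w*x + b, kept in float32."""
    x = x.astype(np.float32)
    y = y.astype(np.float32)
    w, b = np.float32(0.0), np.float32(0.0)
    step = np.float32(2.0 * lr)  # the factor 2 comes from differentiating the squared error
    # Overflow is expected for the large learning rate, so silence the warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(epochs):
            err = w * x + b - y
            w = w - step * np.mean(err * x)
            b = b - step * np.mean(err)
        return float(np.mean((w * x + b - y) ** 2))

rng = np.random.default_rng(0)
x = np.arange(100)
y = 0.4 * x + 5 + rng.uniform(-10, 10, size=100)

loss_high = fit_line(x, y, lr=1e-3)  # above the ~3e-4 stability limit
loss_low = fit_line(x, y, lr=1e-4)   # below it
print(loss_high, loss_low)
```

Scaling x to roughly unit range before training is another common way out, since it shrinks the curvature and lets a much larger learning rate stay stable.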

serene scaffold Mar 10, 2023, 3:28 AM

#

limber kiln Never mind. My learning_rate was high.

tfw the gradient explodes 💥

long aspen Mar 10, 2023, 5:01 AM

#

new to numpy, is there any function that can reshape an item to the same value but in the size of a different dimension?

i seem to not get my toes on...

[1, 2, 3] -> [ [1, 1, 1], [2, 2, 2], [3, 3, 3] ]
np.arange(10).???(???)

slate scroll Mar 10, 2023, 5:02 AM

#

long aspen new to numpy, is there any function that can reshape an item to the same value b...

Looks like you want to repeat along another dimension? Like so, https://numpy.org/doc/stable/reference/generated/numpy.repeat.html

#

np.reshape is to change the shape of your current data but it doesn't change the total number of elements
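
A small sketch of the `[1, 2, 3] -> [[1, 1, 1], [2, 2, 2], [3, 3, 3]]` case from the question, using `np.repeat` on an array with an extra axis. The `np.broadcast_to` line is an optional alternative that returns a read-only view instead of a copy:

```python
import numpy as np

a = np.array([1, 2, 3])

# Add a trailing axis, then repeat each value 3 times along it.
repeated = np.repeat(a[:, None], 3, axis=1)

# Broadcasting gives the same values as a read-only view, without copying.
view = np.broadcast_to(a[:, None], (3, 3))

print(repeated)
# [[1 1 1]
#  [2 2 2]
#  [3 3 3]]
```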

long aspen Mar 10, 2023, 5:05 AM

#

slate scroll Looks like you want to repeat along another dimension? Like so, https://numpy.or...

got it 👍

.repeat(3).reshape((480, 360, 3))
doing this for pyav (which should've been in #media-processing )

wary breach Mar 10, 2023, 5:52 AM

#

Anyone have experience with multi-layer ensembles?

dawn light Mar 10, 2023, 7:15 AM

#

i came across this video recently: https://youtu.be/_9LX9HSQkWo where they turned videos of themselves into animation by using stable diffusion (specifically, training SD to a specific art style, and training SD to recognize their faces)
is there any guide out there on how i can do something similar (just img2img tho, not video to video), i.e. i train SD on a set of images which i can then use as a style when entering prompts

i was also wondering if there's a guide out there that gives a high-level overview of SD.
Every time i hear Lora, controlnet, dreambooth, etc. i get confused with what exactly their relation to SD is (not to mention the plethora of github repos of SD and huggingface models).

agile cobalt Mar 10, 2023, 8:35 AM

#

you might want to try asking in the Stable Diffusion discord server

junior schooner Mar 10, 2023, 8:35 AM

#

I'm writing a python program that uses sqlite3 to allow users to create, update and view databases.
Thus far users can add data manually or from the web.

I want to add a module for data visualisation (maybe using plotly or pandas) but am unsure how or what i can implement without knowing what the data is.

For example, if the data is categorical I could use a bar chart or heat map, if it's numerical I could use a line chart or scatter plot. I also wouldn't know what headers go on what axis. Can anyone give me some suggestions of what I could implement without this information?

agile cobalt Mar 10, 2023, 8:37 AM

#

junior schooner I'm writing a python program that uses sqlite3 to allow users to create, update ...

for starters, you might want to look up the metadata tables sqlite offers if you do not know about them

#

most tools would leave "which header goes on what axis" up to the user to decide

#

some examples out of the top of my head slightly similar to what you are trying to do would be google sheets, excel and mode.com
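
One possible starting point for the metadata idea, as a minimal sketch: `PRAGMA table_info` lists each column with its declared type, which can be used to offer sensible chart types while still leaving the axis choice to the user. The helper name `column_kinds`, the `sales` table and the numeric/categorical split are assumptions made for illustration:

```python
import sqlite3

def column_kinds(conn, table):
    """Classify each column of `table` as 'numeric' or 'categorical' from its declared type."""
    # PRAGMA arguments cannot be bound as parameters, so `table` must be a trusted name.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    kinds = {}
    for _cid, name, decl_type, *_rest in rows:
        decl = (decl_type or "").upper()
        is_numeric = any(tag in decl for tag in ("INT", "REAL", "FLOA", "DOUB", "NUM", "DEC"))
        kinds[name] = "numeric" if is_numeric else "categorical"
    return kinds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, units INTEGER, price REAL)")
print(column_kinds(conn, "sales"))
# {'region': 'categorical', 'units': 'numeric', 'price': 'numeric'}
```

SQLite does not enforce declared types strictly, so a real tool would probably also want to inspect a sample of the values before settling on a chart.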

hasty mountain Mar 10, 2023, 9:04 AM

#

Hey guys, what's the difference between "vanilla neural networks"(MLP, convolutional...) and Graph Neural Networks?

I mean, when implementing neural networks from scratch, like from pure numpy, I can understand that there'll be no graphs involved. However, the popular deep learning frameworks(Pytorch, tensorflow/keras) use graphs by default, right? I guess that even allows for proper forward and backward pass with custom operations. So is there any difference between a GNN or a MLP or CNN when working with those frameworks?

hasty mountain Mar 10, 2023, 9:41 AM

#

Hm... So "vanilla NNs" usually require padding to make the data regular, while GNNs don't?

#

Like in NLP models. Since the phrases have different lengths, padding has to be applied so that all sentences have the same length and the model can receive them as input

#

Is that it?

hasty mountain Mar 10, 2023, 9:58 AM

#

I see... So, trying to work with arrays here would be assigning a specific structure to my network, which might be inappropriate...
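
To make the "no padding" point concrete, here is a minimal sketch of one message-passing step written in plain NumPy. It is a generic mean-aggregation layer picked for illustration (not TrimNet's layer, and not any particular library's API): every node averages its own features with its neighbours', then a single shared weight matrix is applied. The same `weight` works on a 3-node graph and a 5-node graph without padding either one:

```python
import numpy as np

def message_passing_step(adj, feats, weight):
    """One mean-aggregation step followed by a shared linear map and ReLU."""
    adj_self = adj + np.eye(adj.shape[0])        # add self-loops
    deg = adj_self.sum(axis=1, keepdims=True)    # neighbours per node, itself included
    return np.maximum((adj_self / deg) @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 2))  # shared across graphs: 4 input features -> 2 outputs

# A 3-node path graph and a 5-node ring, each node carrying 4 features.
path3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
ring5 = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)

out3 = message_passing_step(path3, rng.normal(size=(3, 4)), weight)
out5 = message_passing_step(ring5, rng.normal(size=(5, 4)), weight)
print(out3.shape, out5.shape)  # (3, 2) (5, 2)
```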

#

pithink

#

I hope it isn't that difficult to work with graphs in VAEs and GANs...

young granite Mar 10, 2023, 11:16 AM

#

hey folks;

i stumbled across:
https://shap.readthedocs.io/en/latest/

and wanted to use it to display influence of features but not for pictures.
Does anyone here have experience with shap?

spark nimbus Mar 10, 2023, 11:32 AM

#

given a numpy array (or pandas dataframe) of datetime64[D], is it possible to change the day on all elements?
My end goal is to get an array or dataframe containing a datetime of the last day of the month, and I already have a numpy array of the number of days in each month.
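
One way this could be done in NumPy without the days-per-month array, as a sketch: truncate each date to month precision, step one month forward, convert back to days (which lands on the 1st of the next month), then subtract one day. The sample dates are made up:

```python
import numpy as np

dates = np.array(["2023-02-10", "2024-02-01", "2023-12-31"], dtype="datetime64[D]")

# Month precision -> next month -> back to day precision (the 1st) -> minus one day.
next_month = dates.astype("datetime64[M]") + np.timedelta64(1, "M")
month_end = next_month.astype("datetime64[D]") - np.timedelta64(1, "D")
print(month_end)
# ['2023-02-28' '2024-02-29' '2023-12-31']
```

For a pandas datetime column, adding `pd.offsets.MonthEnd(0)` rolls each date forward to its month end in a similar way.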

boreal gale Mar 10, 2023, 11:34 AM

#

young granite hey folks; i stumbled across: https://shap.readthedocs.io/en/latest/ and wante...

but not for pictures.
what does this mean?

long charm Mar 10, 2023, 1:27 PM

#

In Q learning, can the game state change?

#

I’m trying to have the snake from the snake game learn to play efficiently but the state of the grid is constantly changing

velvet bronze Mar 10, 2023, 1:32 PM

#

Hello Guys I want to get into Machine Learning, I just started Numpy, I need a roadmap🥹

steep cypress Mar 10, 2023, 1:45 PM

#

velvet bronze Hello Guys I want to get into Machine Learning, I just started Numpy, I need a r...

try this: https://whimsical.com/machine-learning-roadmap-2020-CA7f3ykvXpnJ9Az32vYXva

mild dirge Mar 10, 2023, 1:46 PM

#

steep cypress try this: https://whimsical.com/machine-learning-roadmap-2020-CA7f3ykvXpnJ9Az32v...

Jeez haha

steep cypress Mar 10, 2023, 1:48 PM

#

mild dirge Jeez haha

haha its a lot, there's other ones like: https://e2eml.school/blog.html

velvet bronze Mar 10, 2023, 1:52 PM

#

mild dirge Jeez haha

Jeeezz hahaa. That's still useful

velvet bronze Mar 10, 2023, 1:53 PM

#

steep cypress haha its a lot, there's other ones like: https://e2eml.school/blog.html

Woowww I love yours

steep cypress Mar 10, 2023, 1:55 PM

#

velvet bronze Woowww I love yours

glad I could help

simple tapir Mar 10, 2023, 3:40 PM

#

Can i flatten a tensor more than once?

#

I tried flattening one twice but it still has the same shape

serene scaffold Mar 10, 2023, 3:43 PM

#

simple tapir Can i flatten a tensor more than once?

once a tensor is flat, the shape is (n,), where n is the number of elements. you can't get more flat than that.
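
A quick check of this, shown with a NumPy array as a stand-in (a torch tensor behaves the same way on this point): the second flatten has nothing left to collapse.

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)
once = t.flatten()
twice = once.flatten()
print(t.shape, once.shape, twice.shape)  # (2, 3, 4) (24,) (24,)
```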

simple tapir Mar 10, 2023, 3:45 PM

#

Alright, thanks

wooden sail Mar 10, 2023, 3:53 PM

#

what were you expecting to happen when calling flatten more than once?

hardy bramble Mar 10, 2023, 5:41 PM

#

Hii, does anyone know how to generate a graph from a map like this, using only the orange border? I've tried with opencv and networkx but it's not working

median escarp Mar 10, 2023, 7:42 PM

#

Does space science fall under this channel?

serene scaffold Mar 10, 2023, 7:44 PM

#

median escarp Does space science fall under this channel?

this channel is for any scientific computing that's done in Python, but if your question requires domain-specific knowledge about space science to solve the Python part, then it's unlikely that anyone will know the answer.

median escarp Mar 10, 2023, 7:55 PM

#

Ic.. actually Im working with ISS based calculations. And other things. Eg-Tracking satellites

serene scaffold Mar 10, 2023, 7:58 PM

#

median escarp Ic.. actually Im working with ISS based calculations. And other things. Eg-Track...

if you can ask the question in such a way that people only need to know numpy and the formula you're trying to use, people might be able to help.

#

if people have to know (for example) astropy, it's less likely that you'll get help

merry fern Mar 10, 2023, 8:02 PM

#

how do you grab the column name based on iloc instead of printing the value?
example:

```
for i in [level 2 multiindex list]:
  df[(level1, i)].iloc[1:3] <---- i want these column names
```

serene scaffold Mar 10, 2023, 8:04 PM

#

merry fern how do you grab the column name based on iloc instead of printing the value? exa...

you probably don't even want to have that for loop. but you can have df[(level1, i)].iloc[1:3].columns

merry fern Mar 10, 2023, 8:04 PM

#

serene scaffold you probably don't even want to have that for loop. but you can have `df[(level1...

AttributeError: 'Series' object has no attribute 'columns'

#

that's what i ran into...

#

and .name returns the index

serene scaffold Mar 10, 2023, 8:05 PM

#

merry fern ```AttributeError: 'Series' object has no attribute 'columns'```

please do print(df.columns) and show the text

merry fern Mar 10, 2023, 8:06 PM

#

so what i'm trying to do is create a dataframe based on conditions here...

serene scaffold Mar 10, 2023, 8:06 PM

#

this looks like it only has one level of indexing; do the rows have more than one level?

merry fern Mar 10, 2023, 8:06 PM

#

the multiindex is: Scenario, Account

so if I pass a list as the Account, i want to look at those 2 columns (index# 1 and 2 or [1:3]) and do something

serene scaffold Mar 10, 2023, 8:07 PM

#

merry fern the multiindex is: Scenario, Account so if I pass a list as the Account, i want...

I still need to know if it's the rows or the columns with two levels of indexing.

merry fern Mar 10, 2023, 8:07 PM

#

rows

#

this is how far i got:

serene scaffold Mar 10, 2023, 8:07 PM

#

please do print(df.head().to_dict()) and put the text in the paste bin

#

!paste

arctic wedgeBOT Mar 10, 2023, 8:07 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

merry fern Mar 10, 2023, 8:08 PM

#

i want it to return ['MC','FW','MC'] (because I need to throw an IF in there that says if the column's value is 0 to not include it)

#

hold on let me drop all that other sht its irrelevant

serene scaffold Mar 10, 2023, 8:10 PM

#

@merry fern adding a ping to a message after you send it has no effect, just so you know.
please do print(df.sample(10).to_dict()) instead

merry fern Mar 10, 2023, 8:11 PM

#

serene scaffold @merry fern adding a ping to a message after you send it has no effect...

thats the entire df lol. for now its just 1 "scenario' -- Default

serene scaffold Mar 10, 2023, 8:11 PM

#

merry fern thats the entire df lol. for now its just 1 "scenario' -- Default

please do the print statement with the original dataframe, before you try to solve the problem.

merry fern Mar 10, 2023, 8:12 PM

#

this is the entire dataframe

#

well, without the multiindices

#

actually no they're in there ha

#

im pretty new to multiindicies

serene scaffold Mar 10, 2023, 8:13 PM

#

okay, well, I'm not following. if you can do print(df.sample(10).to_dict()) with the original dataframe within the next five minutes, we can continue.

#

so you deleted the rest of it?

merry fern Mar 10, 2023, 8:14 PM

#

its irrelevant, not used

serene scaffold Mar 10, 2023, 8:14 PM

#

also, please do everything as text

merry fern Mar 10, 2023, 8:14 PM

#

k

serene scaffold Mar 10, 2023, 8:15 PM

#

the thing is that the solution shouldn't involve any list comprehensions, so we should rewind to before you used them

#

what happened to model_para?

#

can you show that instead?

merry fern Mar 10, 2023, 8:15 PM

#

df = model_para

#

im using a list comprehension because eventually im inserting the list comp into a dataframe creation

serene scaffold Mar 10, 2023, 8:16 PM

#

it's very unlikely that the idiomatic solution would involve a list comprehension

merry fern Mar 10, 2023, 8:17 PM

#

so here's the issue, i need to create a dataframe based on the list that is passed. if the list is 1, then it looks at the df columns 'MC' and 'FW' to see if the value is non-zero. if it's non-zero, then it includes that in a column to be created for the dataframe

#

if the list is 2 then it needs to iterate over the list and make multiple accounts

#

right now it only works with 1 which requires no logic to look

serene scaffold Mar 10, 2023, 8:18 PM

#

sorry, but I don't think I can help with this.

merry fern Mar 10, 2023, 8:18 PM

#

thanks anyway! im so close...

#

I'm pretty close, I just need to 1) isolate the column name, and 2) produce a list of individuals (4 in this case), rather than a list of 2

young granite Mar 10, 2023, 8:35 PM

#

boreal gale > but not for pictures. what does this mean?

im trying to use it with my multioutput pipeline but until now did not manage to get it right

sharp herald Mar 10, 2023, 9:40 PM

#

How to crop a QR code from a larger photo and decode it with pyzbar? I tried using cv2.QRCodeDetector() from python-opencv but it fails to recognize too.

#

the qrcode is large, version 18

limber kiln Mar 10, 2023, 11:48 PM

#

Why does make_dot not work here -

# %matplotlib inline

import torch
from torchviz import make_dot

import torchvision
from torchview import draw_graph
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

b = torch.sin(a)

c = 2 * b
print(c)

d = c + 1
print(d)

out = d.sum()
print(out)

make_dot(d , params=dict(a.named_parameters())).render("a_torchviz", format="png")

#

Never mind. Got it working -

# %matplotlib inline
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz/bin/'
import torch
from torchviz import make_dot

import torchvision
from torchview import draw_graph
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

b = torch.sin(a)

c = 2 * b
print(c)

d = c + 1
print(d)

out = d.sum()
print(out)
make_dot(out.mean()).render("a_torchviz", format="png")

late scarab Mar 11, 2023, 1:00 AM

#

Greetings! Not sure this is the right channel, but maybe somebody can help me.

I'm pretty new to visualizing big point clouds and ran into a pretty strange problem last week. I'm using PyCharm Pro 22.1 on my MacBook Pro (M1 Pro, 16 GB). When I render the cloud using open3d, it renders in an external window. As I understand it, this window belongs to the Python process itself, and the problem is that it always crashes, so I have to restart my kernel again and again. Before this I used plotly, which renders directly in the notebook (but plotly can't handle 1_000_000+ points). Please advise: how can I fix this?

P.S. I tried to use this http://www.open3d.org/docs/latest/tutorial/Basic/jupyter.html, but no luck; my work has been stuck on this problem for 2 days.
P.P.S. I tried open3d on a Windows PC with 128 GB RAM in my lab and had no problems, but I can't use the server all the time.

digital rover Mar 11, 2023, 6:08 AM

#

Not sure if I should ask about Pandas here.

Anyone here tried the 2.0 yet? Is the compatibility seamless with the numpy backend version?

patent lynx Mar 11, 2023, 6:09 AM

#

@hasty mountain I explored and researched a bit. Do you know a regression model that allows you to set weights on the target variable? For example, "values 2 std away are more important than the mean", so that my model is more robust to outliers. So far my best score is an r2 of around 0.48 using xgboost with pseudo-Huber loss
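
One option that fits this description is per-sample weighting: scikit-learn-style `fit` methods, including `XGBRegressor.fit`, accept a `sample_weight` argument. A minimal sketch of building such weights from the target's z-score follows; the helper `tail_weights`, the `1 + |z|` rule and the sample values are arbitrary choices for illustration, not something from the thread:

```python
import numpy as np

def tail_weights(y, strength=1.0):
    """Weight each sample by 1 + strength * |z-score| of its target value."""
    z = (y - y.mean()) / y.std()
    return 1.0 + strength * np.abs(z)

y = np.array([-12.0, -3.0, -1.0, 2.0, 4.0, 10.0])  # made-up point spreads
w = tail_weights(y)
print(w.round(2))
# The weights could then be passed as model.fit(X, y, sample_weight=w).
```

Whether this helps is an empirical question: upweighting the tails pushes the model to fit extreme games harder, which can also make it more sensitive to noise in them.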

hasty mountain Mar 11, 2023, 6:12 AM

#

patent lynx @hasty mountain as i explored and researched a bit. Do you know a regressi...

Well...you could try some specific activation functions...or thresholding... pithink

#

I don't know if I get it. You want to assign a higher weight to values that are further from the mean, with higher standard deviation?

patent lynx Mar 11, 2023, 6:13 AM

#

Let's stick with "higher values further from the mean" for now; maybe my terminology isn't the best...

hasty mountain Mar 11, 2023, 6:14 AM

#

Well, you could try something like: https://pytorch.org/docs/stable/generated/torch.nn.Tanhshrink.html#torch.nn.Tanhshrink

#

If you manage to normalize your values such that your mean gets around 0, this might be useful pithink

patent lynx Mar 11, 2023, 6:16 AM

#

Thanks I'll try it

hasty mountain Mar 11, 2023, 6:16 AM

#

https://pytorch.org/docs/stable/generated/torch.nn.Hardtanh.html#torch.nn.Hardtanh

hasty mountain Mar 11, 2023, 6:16 AM

#

hasty mountain https://pytorch.org/docs/stable/generated/torch.nn.Hardtanh.html#torch.nn.Hardta...

Maybe this might allow for discarding the normalization.
Or this one:
https://pytorch.org/docs/stable/generated/torch.nn.Hardshrink.html#torch.nn.Hardshrink

patent lynx Mar 11, 2023, 6:23 AM

#

hasty mountain If you manage to normalize your values such as your mean gets around 0, this mig...

Not sure if I should normalise it, cause the target variable is centered around zero in an odd way.

#

It is bimodally distributed, because an NBA game can't end in a draw.

hasty mountain Mar 11, 2023, 6:33 AM

#

pithink

#

That's surely a dataset that I don't understand, so...double check if my suggestions make sense

glossy moth Mar 11, 2023, 6:57 AM

#

Hey! Is there an optimal ratio of positive to negative data when training a model, and does it depend on model type?

lapis sequoia Mar 11, 2023, 7:18 AM

#

hardy bramble Hii, anyone knows how to generate a graph from a map like this, only the orange ...

A graph how? Do you want to find a connection with the place and some other variable? Do you want to find a connection between distance and some other variable?

iron basalt Mar 11, 2023, 7:18 AM

#

late scarab Greetings! Not sure, the topic is right, but mb sb can help me. I'm pretty new...

Does it work with a regular Python file, not Jupyter?

iron basalt Mar 11, 2023, 7:18 AM

#

glossy moth Hey! Is there an optimal ratio of positive to negative data when training a mode...

Depends on the problem and model type.

glossy moth Mar 11, 2023, 7:27 AM

#

iron basalt Depends on the problem and model type.

How do I evaluate this? Are there some common rules and/or a paper you could point me toward that could help me with ratio determinations?

latent spire Mar 11, 2023, 7:28 AM

#

where would i go to rent a VPS for AI / machine learning work

iron basalt Mar 11, 2023, 7:33 AM

#

glossy moth How do I evaluate this? Are there some common rules and/or a paper you could poi...

You probably want it balanced. Don't use just accuracy to measure your model.

hardy bramble Mar 11, 2023, 7:39 AM

#

lapis sequoia A graph how? Do you want to find a connection with the place and some other vari...

I want to extract the dotted shape, measure the area of the shape and divide it into two equal parts, but I can't extract the shape because opencv recognizes the other icons as shapes too. Or at least I don't know how, I need to do it through images. I have also tried to create a graph with networkx.

This is my code

from datetime import datetime  # needed for datetime.now() below

import cv2
import numpy as np

# Random name
name = datetime.now()
name = "resources/result/" + str(name.timestamp()) + ".jpg"

# Load image
img = cv2.imread('resources/mapa_colonia_2.png')

# Convert BGR to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Define the range
lower_red = np.array([0, 50, 50])
upper_red = np.array([10, 255, 255])
lower_red2 = np.array([170, 50, 50])
upper_red2 = np.array([180, 255, 255])

# Create a mask
mask_red = cv2.inRange(hsv, lower_red, upper_red)
mask_red2 = cv2.inRange(hsv, lower_red2, upper_red2)
mask = cv2.bitwise_or(mask_red, mask_red2)

# Mask to original img
res = cv2.bitwise_and(img, img, mask=mask)

# Convert to grayscale
gray = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)

# Canny Filter
edges = cv2.Canny(gray, 100, 200)

# Find Contours
contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Draw on original image
cv2.drawContours(img, contours, -1, (184, 7, 166), 2)

# Save the image
cv2.imwrite(name, img)

iron basalt Mar 11, 2023, 7:42 AM

#

hardy bramble I want to extract the dotted shape, measure the area of the shape and divide it ...

Is the shape you want always the largest?

hardy bramble Mar 11, 2023, 7:43 AM

#

iron basalt Is the shape you want always the largest?

Yes

iron basalt Mar 11, 2023, 7:43 AM

#

hardy bramble Yes

And it's detecting it?

hardy bramble Mar 11, 2023, 7:46 AM

#

iron basalt And it's detecting it?

No, if I search by the largest shape I get the pointer on the map.

iron basalt Mar 11, 2023, 7:46 AM

#

hardy bramble No, if I search by the largest shape I get the pointer on the map.

How are you doing the search?

hardy bramble Mar 11, 2023, 7:52 AM

#

iron basalt How are you doing the search?

filtering with something like this

max_contour = max(contours, key=cv2.contourArea)

iron basalt Mar 11, 2023, 7:53 AM

#

hardy bramble filtering with something like this ```python max_contour = max(contours, key=cv...

Ok, so there are two issues, one is that the rectangle's lines are dotted, the other is that it has partial occlusion.

#

Try HoughLinesP and see if it gives you the four lines: https://docs.opencv.org/3.4/d9/db0/tutorial_hough_lines.html

#

You can also try applying a blur (to connect the dots).

hardy bramble Mar 11, 2023, 7:57 AM

#

iron basalt Try HoughLinesP and see if it gives you the four lines: https://docs.opencv.org/...

Ohh, ok ill try

glossy moth Mar 11, 2023, 7:57 AM

#

iron basalt You probably want it balanced. Don't use just accuracy to measure your model.

What’s the downside of using accuracy?

hardy bramble Mar 11, 2023, 7:57 AM

#

iron basalt You can also try applying a blur (to connect the dots).

Can this work when it is not a regular shape?

iron basalt Mar 11, 2023, 7:59 AM

#

glossy moth What’s the downside of using accuracy?

It does not give the full picture. Getting good accuracy on the dataset does not always mean it will get good results in use.
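
A tiny illustration of that point, with made-up labels: a "model" that always predicts the majority class scores 99% accuracy on a 99:1 dataset, while the per-class average of recall (balanced accuracy) shows it is no better than guessing.

```python
import numpy as np

y_true = np.array([0] * 99 + [1])   # 99 negatives, 1 positive
y_pred = np.zeros_like(y_true)      # always predict the majority class

accuracy = np.mean(y_pred == y_true)
# Recall per class, then averaged: this is what "balanced accuracy" means.
recalls = [np.mean(y_pred[y_true == c] == c) for c in (0, 1)]
balanced_accuracy = np.mean(recalls)
print(accuracy, balanced_accuracy)  # 0.99 0.5
```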

iron basalt Mar 11, 2023, 7:59 AM

#

hardy bramble Can this work when it is not a regular shape?

It will detect multiple lines and you will probably have to stitch / extend them together.

hardy bramble Mar 11, 2023, 8:02 AM

#

iron basalt It will detect multiple lines and you will probably have to stitch / extend them...

Thanks, ill try it

glossy moth Mar 11, 2023, 8:04 AM

#

iron basalt It does not give the full picture. Getting good accuracy on the dataset does not...

Sure, but how do you balance high accuracy + very large available set to train on vs balanced pos:neg with a much smaller set

hard birch Mar 11, 2023, 12:34 PM

#

I have wine data and I'm trying to use regression to predict quality

#

the quality data seems to be multimodal and I want to use a regression method that is better suited to such data

#

I'm really new to this so anyone have advice

#

I'm using sklearn but if tensorflow is better equipped for this please do tell

severe trellis Mar 11, 2023, 12:37 PM

#

I'm looking for a graphing lib that can create modern/elegant looking graphs. Is matplotlib a good choice for this? Here's an example of what I'd consider modern/elegant

wooden sail Mar 11, 2023, 12:40 PM

#

matplotlib can make all of these, but they won't look as pretty by default

#

maybe check out seaborn (which wraps matplotlib) or plotly if you want something that looks pretty out of the box

severe trellis Mar 11, 2023, 12:42 PM

#

Ah, I don't mind learning matplotlib, so I'll give the configuration a shot

#

Seems like a really useful skill

wooden sail Mar 11, 2023, 12:42 PM

#

here's an example, then, of how to tweak the colors yourself https://matplotlib.org/stable/gallery/color/color_demo.html#sphx-glr-gallery-color-color-demo-py

hard birch Mar 11, 2023, 12:53 PM

#

https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

#

This is the data I have, winequality-red.csv to be exact

#

#

This is the seaborn plot of the red date

#

data

ripe sapphire Mar 11, 2023, 1:20 PM

#

severe trellis I'm looking for a graphing lib that can create modern/elegant looking graphs. Is...

I wonder how can you make the 4th one

severe trellis Mar 11, 2023, 3:59 PM

#

How would I go about creating a trendline for a graph where the x axis is just a time (in this case a dict, the key is a datetime.datetime) and the y-axis is the actual numerical value.
All the existing trendlines I've seen seem to depend on a numerical x-axis.

#

perhaps a trendline that looks like this https://cdn.overseer.tech/file/the-void/screenshots/firefox_8qOz0vfxok.png

mild dirge Mar 11, 2023, 4:04 PM

#

As long as the times are uniformly spaced (equal time between values) then you can just calculate a moving average that calculates the average of the last x values.

#

@severe trellis
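
A minimal sketch of that moving average for a dict keyed by `datetime.datetime`, assuming evenly spaced samples. The data and the window of 3 are made up; each average is stamped with the time of its last sample, matching "the average of the last x values":

```python
from datetime import datetime, timedelta
import numpy as np

# Hypothetical data shaped like the dict in the question: datetime key -> value.
start = datetime(2023, 3, 1)
data = {start + timedelta(days=i): float(v) for i, v in enumerate([1, 2, 3, 4, 5, 6])}

window = 3
times = sorted(data)                              # dict keys in time order
values = np.array([data[t] for t in times])
trend = np.convolve(values, np.ones(window) / window, mode="valid")
trend_times = times[window - 1:]                  # one timestamp per averaged value
print(trend)  # [2. 3. 4. 5.]
```

matplotlib accepts datetime objects on the x-axis directly, so `trend_times` and `trend` can be plotted over the raw series without converting the times to numbers.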

hasty mountain Mar 11, 2023, 4:29 PM

#

Hey guys, in CrossEntropyLoss function as defined in Pytorch docs:
It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

What is the idea of this weight argument? The more of a specific class I have in my unbalanced dataset, the lower should be the weight I assign to it?
What improvement does this provide?

#

Oh, wait... I just remembered that I could also make this weight a learnable parameter. brainmon

mild dirge Mar 11, 2023, 4:35 PM

#

Yeah, so if your optimization is based solely on accuracy, and 99% of your data is of class "apple" for example, then your model will perform very well by just changing the weights such that it will always give apple, even if the image is an orange, because it would still get a 99% accuracy.

#

Changing the weight of each class means that it will make the apple class less important to optimize, such that the model must also optimize getting "orange" right.
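
For a rough idea of what such weights could look like, here is a sketch using the common inverse-frequency ("balanced") heuristic on the 99:1 apple/orange example. The heuristic is one choice among several, not something the PyTorch docs prescribe:

```python
import numpy as np

labels = np.array([0] * 99 + [1])   # 99 "apple", 1 "orange"
counts = np.bincount(labels)
# n_samples / (n_classes * count): rare classes get large weights.
class_weights = len(labels) / (len(counts) * counts)
print(class_weights)
```

The resulting array could be converted to a float tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that one misclassified orange costs about as much as 99 misclassified apples.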

hasty mountain Mar 11, 2023, 4:37 PM

#

Oh... then I guess making this a learnable parameter might allow my model to cheat pithink

mild dirge Mar 11, 2023, 4:37 PM

#

I'm not sure how that works, changing the parameter that determines how the score is calculated seems weird

hasty mountain Mar 11, 2023, 4:37 PM

#

I mean, it could simply assign a very high weight to the class it's predicting the most

mild dirge Mar 11, 2023, 4:37 PM

#

right

#

Another solution is just balancing the data

#

By undersampling/oversampling and augmentation etc.

#

But that could give mediocre results as well

hasty mountain Mar 11, 2023, 4:39 PM

#

hasty mountain I mean, it could simply assign a very high weight to the class it's predicting t...

Unless I make some kind of adversarial setup...where one model's metric improves when the other's metric decreases... pithink

#

And one model makes the classification, another makes the optimization of those weights in crossentropy

mint palm Mar 11, 2023, 4:40 PM

#

dusty valve Mar 11, 2023, 5:19 PM

#

i made a cnn that takes an input shape of (128, 128, 3), and i wanted to test it on an image of myself. i took an image, reshaped it to (128, 128, 3) and displayed it with plt. the printed shape of the array is also (128, 128, 3), but the error says it's (32, 128, 3)

#

```
shape is (128, 128, 3)
Traceback (most recent call last):
  File "C:\Users\owner\OneDrive\Desktop\python\r-u-a-10\test.py", line 13, in <module>   
    print(np.argmax(model.predict([data])))
  File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler    
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\owner\AppData\Local\Temp\__autograph_generated_file5cgx781m.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2137, in predict_function  *
        return step_function(self, iterator)
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2123, in step_function  **  
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2111, in run_step  **       
        outputs = model.predict_step(data)
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2079, in predict_step
        return self(x, training=False)
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\input_spec.py", line 295, in assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer "r-u-a-10" is incompatible with the layer: expected shape=(None, 128, 128, 3), found shape=(32, 128, 3)```

mild dirge Mar 11, 2023, 5:31 PM

#

You need to put that image into a list

dusty valve Mar 11, 2023, 5:31 PM

#

i did

#

print(np.argmax(model.predict([data])))

mild dirge Mar 11, 2023, 5:32 PM

#

Is that the first layer that gives the error?

dusty valve Mar 11, 2023, 5:32 PM

#

yes

mild dirge Mar 11, 2023, 5:32 PM

#

So the shape of what you give it is (1, 128, 128, 3) ?

dusty valve Mar 11, 2023, 5:32 PM

#

no

mild dirge Mar 11, 2023, 5:32 PM

#

It should be

dusty valve Mar 11, 2023, 5:32 PM

#

well yes

mild dirge Mar 11, 2023, 5:33 PM

#

Is it (1, 128, 128, 3) or (128, 128, 3) ?

woven berry Mar 11, 2023, 5:33 PM

#

idk if this is the right channel but for matplotlib how do i make it so that the arrow is visible over the axh and axv line?

dusty valve Mar 11, 2023, 5:33 PM

#

mild dirge Is it `(1, 128, 128, 3)` or `(128, 128, 3)` ?

first

mild dirge Mar 11, 2023, 5:34 PM

#

Hmm, well I doubt the model is lying, did you print the shape of the input before the line that predicts it?

dusty valve Mar 11, 2023, 5:34 PM

#

i dunno where it says 32

dusty valve Mar 11, 2023, 5:34 PM

#

mild dirge Hmm, well I doubt the model is lying, did you print the shape of the input befor...

yes

mild dirge Mar 11, 2023, 5:34 PM

#

!paste

arctic wedgeBOT Mar 11, 2023, 5:34 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

mild dirge Mar 11, 2023, 5:34 PM

#

Show the code

dusty valve Mar 11, 2023, 5:34 PM

#

okay

#

```py
from PIL import Image
import numpy as np
from keras.models import load_model
from keras import Sequential
from matplotlib import pyplot as plt
model: Sequential = load_model('./r-u-a-10')  # type: ignore
image = Image.open(r'C:\Users\owner\Pictures\Camera Roll\TEST.jpg')
data = np.array(image.resize((128, 128)).convert('RGB').getdata(), np.uint8).reshape((128, 128, 3))
image.close()
plt.imshow(data)
plt.show()
print(data.shape)
print(np.argmax(model.predict([data])))```

mild dirge Mar 11, 2023, 5:35 PM

#

.reshape((128, 128, 3))

mild dirge Mar 11, 2023, 5:36 PM

#

dusty valve first

And you get (1, 128, 128, 3) ?

mystic aspen Mar 11, 2023, 5:36 PM

#

hardy bramble Hii, anyone knows how to generate a graph from a map like this, only the orange ...

Use mapbox or matplotlib to draw the boundaries, and they provide the area accordingly

mild dirge Mar 11, 2023, 5:36 PM

#

That doesn't make sense

dusty valve Mar 11, 2023, 5:37 PM

#

mild dirge And you get `(1, 128, 128, 3)` ?

because i put it in a list

mild dirge Mar 11, 2023, 5:37 PM

#

Oh hmm right

dusty valve Mar 11, 2023, 5:37 PM

#

otherwise it is 128, 128, 3

mild dirge Mar 11, 2023, 5:38 PM

#

Can you just do .reshape((1, 128, 128, 3)) and remove that [] part?

dusty valve Mar 11, 2023, 5:38 PM

#

okay

mild dirge Mar 11, 2023, 5:38 PM

#

I doubt it makes a difference, but maybe it really expects a np array

dusty valve Mar 11, 2023, 5:39 PM

#

thanks, it worked

mild dirge Mar 11, 2023, 5:39 PM

#

Oh, guess it just doesn't accept a list

#

Weird about the 32 though
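
A likely explanation for the 32, stated as an assumption since nobody confirmed it in the chat: Keras reads a Python list passed to `predict` as a list of separate model inputs, not as a batch. The single `(128, 128, 3)` array is then itself cut into batches along its first axis using the default `batch_size=32`, which would produce `(32, 128, 3)`. The sketch below uses only NumPy and a zero-filled stand-in image to show the batch axis being added and what the accidental slicing looks like.

```python
import numpy as np

# Stand-in for the decoded image; the real pixel values do not matter here.
data = np.zeros((128, 128, 3), dtype=np.uint8)

# Add a leading batch axis so the array reads as "one sample of 128x128x3".
# Equivalent to data[np.newaxis, ...] or data.reshape((1, 128, 128, 3)).
batch = np.expand_dims(data, axis=0)
print(batch.shape)

# What slicing the unbatched array in chunks of 32 along axis 0 would produce.
print(data[:32].shape)
```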

dusty valve Mar 11, 2023, 5:40 PM

#

yes

royal hound Mar 11, 2023, 6:43 PM

#

so imbalanced dataset

#

what do again i forgot

severe trellis Mar 11, 2023, 7:05 PM

#

mild dirge As long as the times are uniformly spaced (equal time between values) then you c...

Ahh, gotcha. Though, say the step is 600 seconds over a week, would a rolling period of a day be reasonable? This would just be calculating the average for day 1, plotting it, doing the same for day 2, etc.?

mild dirge Mar 11, 2023, 7:06 PM

#

Different values for the rolling average give different results

#

Value of 1 gives original graph

#

value of 10 gives a bit more smoothing

#

value of 100 gives a lot more smoothing

#

It's a choice you have to make to get the results you want

#

And if the period of the rolling average is a day, that does not mean you calculate it only once per day

#

But at each timepoint you get the data from then until 24 hours before, and take the average

severe trellis Mar 11, 2023, 7:09 PM

#

I'm confused, I've got a kv mapping of the datetime to how many players were online at that time. I would just split this into 7 days (since this is data for a week), and plot each point on the graph as the trendline so you'd be able to see the day-to-day difference. For example:
Plot 1: Average number of players on day 1
Plot 2: Av. on day 2
etc.

Is this what a rolling average is?

mild dirge Mar 11, 2023, 7:10 PM

#

No

#

Do you know numpy?

severe trellis Mar 11, 2023, 7:10 PM

#

On a fundamental level, yes

mild dirge Mar 11, 2023, 7:10 PM

#

Let me make an example

#

#

import numpy as np
import matplotlib.pyplot as plt


def moving_average(ys, window):
    # The moving average has the same length as the original dataset
    result = np.zeros(ys.shape[0])

    # For each point, average it with the points before it (window points in total,
    # so a window of 1 returns the original data)
    for i in range(ys.shape[0]):
        result[i] = np.mean(ys[max(0, i - window + 1):i + 1])

    return result


# Defining the y values
ys = np.zeros(1000)
for i in range(1, ys.shape[0]):
    ys[i] = np.random.normal(ys[i-1], 1)

# Plotting for different window sizes
fig, ax = plt.subplots(1)
ax.plot(ys, label='original')
ax.plot(moving_average(ys, 10), label='Window size of 10')
ax.plot(moving_average(ys, 50), label='Window size of 50')
ax.plot(moving_average(ys, 200), label='Window size of 200')
plt.legend()
plt.show()

#

@severe trellis

#

This is not the most efficient way of doing it, but just to illustrate how the moving average is calculated

#

So even though the window size is set to 200 for the last call, it still calculates it for all 1000 points

#

Not just every 200th point

tidal bough Mar 11, 2023, 7:20 PM

#

mild dirge This is not the most efficient way of doing it, but just to illustrate how the m...

~~ cumsum ;)~~

severe trellis Mar 11, 2023, 7:29 PM

#

Tysm for that, it really makes sense. Quite happy with my first graph, what do you think?

queen cradle Mar 11, 2023, 7:29 PM

#

@severe trellis In the statistics literature, you have a time series, and what you're looking for is called a "smoothing".

mild dirge Mar 11, 2023, 7:30 PM

#

Yeah that looks good

#

If you want to do this for more data, definitely use cumsum to get the total of points between two time points
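
The cumsum idea can be sketched as follows. This is one possible vectorised version, not the code anyone in the chat ran; the function name and the choice that a window covers the current point plus the `window - 1` points before it are assumptions.

```python
import numpy as np

def moving_average_cumsum(ys, window):
    """Trailing moving average; early points average whatever data is available."""
    ys = np.asarray(ys, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(ys)))
    idx = np.arange(1, len(ys) + 1)      # one past the end of each window
    start = np.maximum(idx - window, 0)  # start of each window, clipped at 0
    # Sum over each window is a difference of two cumulative sums.
    return (csum[idx] - csum[start]) / (idx - start)

ys = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(moving_average_cumsum(ys, 2))
```

Each window sum costs two lookups instead of a fresh `np.mean`, so the work no longer grows with the window size.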

queen cradle Mar 11, 2023, 7:30 PM

#

There are a lot of ways of smoothing. Sliding windows are good. Exponentially weighted moving averages are good. A lot of filters from signal processing (scipy.signal.windows) are good for this purpose.

mild dirge Mar 11, 2023, 7:31 PM

#

Pandas also has a moving average method iirc
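
For reference, a short sketch of the pandas route using `Series.rolling` and `Series.ewm`. The data is made up (a simple ramp sampled every 10 minutes for a week, matching the 600-second step mentioned earlier), and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one sample every 10 minutes for a week (1008 points).
index = pd.date_range("2023-03-01", periods=7 * 24 * 6, freq="10min")
players = pd.Series(np.arange(len(index), dtype=float), index=index)

# Window given as a row count: 144 samples of 10 minutes = 1 day.
by_count = players.rolling(window=144, min_periods=1).mean()

# Window given as a time offset (needs a DatetimeIndex).
by_time = players.rolling("1D").mean()

# Exponentially weighted moving average; alpha is the smoothing factor.
ewma = players.ewm(alpha=0.1, adjust=False).mean()

print(len(by_count), len(by_time), len(ewma))
```

Every result has one value per original timestamp, which is the point made above: a one-day window is evaluated at every sample, not once per day.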

queen cradle Mar 11, 2023, 7:32 PM

#

For a first try, I would usually recommend an exponentially weighted moving average.

severe trellis Mar 11, 2023, 7:33 PM

#

Ah, understood. Thank you guys so much!

tidal bough Mar 11, 2023, 7:37 PM

#

this trendline looks weird to me; the peaks on it aren't where the peaks on either of the graphs are

mild dirge Mar 11, 2023, 7:40 PM

#

tidal bough this trendline looks weird to me; the peaks on it aren't where the peaks on eith...

Well it averages everything out, so yeah it would not get to those peaks

#

That is also partially why ema would be better

wooden sail Mar 11, 2023, 7:40 PM

#

you can correct that by using a zero phase filter

tidal bough Mar 11, 2023, 7:41 PM

#

mild dirge Well it averages everything out, so yeah it would not get to those peaks

i mean the horizontal location. EMA should be better at that, yeah

wooden sail Mar 11, 2023, 7:41 PM

#

check out https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html

#

applying a filter in both directions doubles its order and corrects the phase shift

mild dirge Mar 11, 2023, 7:43 PM

#

So it averages the moving average in both directions?

wooden sail Mar 11, 2023, 7:43 PM

#

yep

mild dirge Mar 11, 2023, 7:43 PM

#

Ah that's a pretty creative trick

wooden sail Mar 11, 2023, 7:44 PM

#

you can derive it in closed form too, but the impulse response is not causal. some people don't like this 😛 it entails padding the end and beginning with zeros and then truncating

#

alternatively, filter in one direction, reverse the result, filter again, and reverse one last time

#

same result, different "interpretation"
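
The "filter, reverse, filter, reverse" recipe can be sketched in plain NumPy. This is a hand-rolled illustration with an exponential moving average as the filter, not what `scipy.signal.filtfilt` does internally (it also handles edge padding); the function names and the test bump are assumptions.

```python
import numpy as np

def ema(x, alpha):
    """Causal exponential moving average: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y = np.empty(len(x), dtype=float)
    acc = x[0]                      # start at the first sample to avoid a start-up transient
    for n, value in enumerate(x):
        acc = alpha * value + (1.0 - alpha) * acc
        y[n] = acc
    return y

def ema_zero_phase(x, alpha):
    """Filter forward, reverse, filter again, reverse back."""
    forward = ema(x, alpha)
    return ema(forward[::-1], alpha)[::-1]

# A single bump centred on index 50.
x = np.exp(-0.5 * ((np.arange(101) - 50) / 5.0) ** 2)

one_way = ema(x, 0.2)
two_way = ema_zero_phase(x, 0.2)
print(np.argmax(x), np.argmax(one_way), np.argmax(two_way))
```

The one-way output peaks later than the input (the lag discussed above), while the two-pass output peaks at the same index as the input.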

queen cradle Mar 11, 2023, 7:45 PM

#

severe trellis Ah, understood. Thank you guys so much!

Here, try this:

import numpy as np
import scipy.signal

alpha = 0.5

data = np.zeros(20)
data[0] = 1.0

a = np.array((1.0, -(1.0 - alpha)))
b = np.array((1.0,))

scipy.signal.lfilter(b, a, data)

Replace data with your actual data. Try different values of alpha until you find one you like.

wooden sail Mar 11, 2023, 7:45 PM

#

you like having that capacitor-feel in your filters, huh

tidal bough Mar 11, 2023, 7:46 PM

#

but the impulse response is not causal. some people don't like this
yeah, it's slightly concerning. but I guess it's probably not possible to get an accurate phase without breaking causality

wooden sail Mar 11, 2023, 7:46 PM

#

the future is now, old man. digital signals can be shifted arbitrarily

tidal bough Mar 11, 2023, 7:47 PM

#

wooden sail the future is now, old man. digital signals can be shifted arbitrarily

investors hate this one trading trick

pd.load_dataset("bitcoin_prices").shift("5d").tail(100)

wooden sail Mar 11, 2023, 7:48 PM

#

i actually laughed out loud

severe trellis Mar 11, 2023, 7:52 PM

#

queen cradle Here, try this: ```py import numpy as np import scipy.signal alpha = 0.5 data ...

This just copies the original data and offsets every value by the alpha?

#

```py
alpha = 1.4
data = list(steam_data.values())
a = np.array((1.0, -(1.0 - alpha)))
b = np.array((1.0,))
data = scipy.signal.lfilter(b, a, data)```

queen cradle Mar 11, 2023, 8:51 PM

#

severe trellis This just copies the original data and offsets every value by the alpha?

No, it's an exponentially weighted moving average with parameter alpha.

#

Those are low-pass filters.
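
To see why the value of alpha matters, it helps to write out what `lfilter` with `b = (1.0,)` and `a = (1.0, -(1.0 - alpha))` computes: `y[k] = x[k] + (1 - alpha) * y[k-1]`. The sketch below evaluates that recurrence by hand for a unit impulse; the function name is made up, and the comparison with `alpha = 1.4` refers to the value tried in the snippet above.

```python
import numpy as np

def impulse_response(alpha, n=6):
    """First n outputs of y[k] = x[k] + (1 - alpha) * y[k-1] for a unit impulse."""
    y = np.zeros(n)
    prev = 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0
        prev = x + (1.0 - alpha) * prev
        y[k] = prev
    return y

print(impulse_response(0.5))   # weights decay geometrically and stay positive
print(impulse_response(1.4))   # weights alternate in sign, so this is not an average
```

For the filter to behave as a weighted average of past values, alpha should lie in (0, 1]; smaller values spread the weight over more history.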

rich spindle Mar 11, 2023, 9:21 PM

#

how would i start if i wanted to just make a simple AI ? i haven't ever tried one before

#

like say i just wanna generate text or something

mild dirge Mar 11, 2023, 9:26 PM

#

That's not simple AI

#

The start would be something like linear regression or a perceptron, then making a multi-layer perceptron etc.

#

Generating text is already quite hard

#

But it really depends on what you want to do. Like making it from scratch, or just use a toolbox that has premade models

rich spindle Mar 11, 2023, 9:28 PM

#

hm

iron basalt Mar 11, 2023, 9:58 PM

#

rich spindle like say i just wanna generate text or something

For what purpose are you generating text? Not all text generation is the same problem.

rich spindle Mar 11, 2023, 9:58 PM

#

just random stuff for fun

iron basalt Mar 11, 2023, 10:04 PM

#

rich spindle just random stuff for fun

If a chat bot is what you want. I would start with Markov chains.
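
As a rough idea of what a Markov chain text generator involves, here is a minimal word-level sketch using only the standard library. The tiny corpus, the function names, and the first-order (one previous word) design are assumptions chosen for brevity.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=None):
    """Walk the chain from start, picking each next word at random."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:          # dead end: the word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", 8, seed=0))
```

Feeding it a larger corpus (chat logs, a book) gives more varied output; keying the chain on pairs of words instead of single words usually makes the text more coherent.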

rich spindle Mar 11, 2023, 10:05 PM

#

kk

dusk knot Mar 11, 2023, 10:20 PM

#

technical question regarding numpy and type hinting

looking at https://numpy.org/doc/stable/reference/arrays.scalars.html and https://numpy.org/devdocs/reference/typing.html

if I add a type hint : ArrayLike to a function parameter, will ArrayLike "properly represent" both numpy ndarray as well as a numpy scalar?

Expressed more shortly, the question is: Is a numpy scalar also a numpy.typing.ArrayLike?

#

I would like my function to be usually called with ndarrays as values for the parameters, but I was wondering if I can also allow for scalar values to be passed to the function where possible/applicable. I already tried it out with some different calls to the same function, it works. However, that said, I now arrived at my above question about the type hinting.
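
As far as the NumPy typing documentation describes it, `ArrayLike` covers anything that can be converted to an array, which includes scalars, nested sequences, and objects that implement the array protocol, so a NumPy scalar should be accepted by a type checker. A small sketch of a function written that way follows; the function name and its doubling behaviour are placeholders.

```python
import numpy as np
from numpy.typing import ArrayLike

def double(values: ArrayLike) -> np.ndarray:
    """Accept anything np.asarray can convert and return an ndarray."""
    arr = np.asarray(values, dtype=float)
    # Arithmetic on a 0-d array returns a NumPy scalar, so wrap the result again.
    return np.asarray(arr * 2.0)

print(double(np.array([1.0, 2.0])).shape)   # ndarray input
print(double(np.float64(3.0)).shape)        # NumPy scalar input gives a 0-d result
print(double(1.5).shape)                    # plain Python float works too
```

Converting with `np.asarray` at the top of the function keeps the body free of special cases for scalars versus arrays.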

glossy moth Mar 11, 2023, 10:24 PM

#

glossy moth Sure, but how do you balance high accuracy + very large available set to train o...

Does anyone know the answer to this?

#

I am still confused about weighing the pros/cons of a huge unbalanced set and judging with accuracy or using a subset that is much smaller but more balanced

mild dirge Mar 11, 2023, 10:32 PM

#

Accuracy is not a good measure if you want your model to perform on all classes

#

There's multiple ways to deal with unbalanced data, you should look up image augmentation, undersampling, oversampling etc.
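
A small sketch of why plain accuracy misleads on an unbalanced set, compared with balanced accuracy (the mean of per-class recall). The labels are invented, and the helper functions are hand-written for illustration, not taken from a library.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Hypothetical labels: 99 samples of class 0 and 1 sample of class 1.
y_true = np.array([0] * 99 + [1])
y_pred = np.zeros(100, dtype=int)        # a model that always predicts class 0

print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

A metric like this lets you keep the large unbalanced set for training while still judging the model on how it does for every class.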

glossy moth Mar 11, 2023, 10:48 PM

#

mild dirge There's multiple ways to deal with unbalanced data, you should look up image aug...

Thank you. Appreciate the direction

dusty valve Mar 11, 2023, 11:25 PM

#

!d pandas.Series.shift

prime hearth Mar 12, 2023, 2:52 AM

#

https://github.com/Simplyalex99/OpenReview hello would appreciate feedback on the backend folder for code , thanks!

glossy urchin Mar 12, 2023, 4:41 AM

#

hi

#

is there any way to check if a value exists in a column and then return the other values of that row?

agile cobalt Mar 12, 2023, 4:42 AM

#

can you show an example of what you mean?

glossy urchin Mar 12, 2023, 4:42 AM

#

yes

#

so lets say i get an area code from the user

#

i check to see if the area code exists

#

then get access to item code and stuff

agile cobalt Mar 12, 2023, 4:43 AM

#

uh, showing the describe() result without showing the actual data is broadly speaking not very useful

glossy urchin Mar 12, 2023, 4:43 AM

#

oh its not my code

#

its an image off of google

agile cobalt Mar 12, 2023, 4:44 AM

#

if possible, it would be useful to have a minimum example of what the input looks like and what do you want for the output to look like

#

kinda like how you would format a unit test

glossy urchin Mar 12, 2023, 4:44 AM

#

so like lets say i input 125.449411

agile cobalt Mar 12, 2023, 4:44 AM

#

doesn't have to be the real data, just formatted like it

glossy urchin Mar 12, 2023, 4:45 AM

#

oh this was a bad example

#

but pretend the leftmost column doesnt exist

#

and i input that

#

i check in area code column if that value exists

#

if it does i get access to that row

#

like item code , element code , etc.

agile cobalt Mar 12, 2023, 4:47 AM

#

again ~~(more straightforward this time...),~~ give a full example of what the dataframe would look like in a way that I could load it with pandas for testing

#

to be more specific, something like https://stackoverflow.com/help/minimal-reproducible-example

glossy urchin Mar 12, 2023, 4:48 AM

#

yes

#

check to see if value is in column p

#

then i access the values in the row

#

if it is there

#

like if value is name

#

then i want the q and v for it

#

to be able to access it

#

or do i have to make a dictionary with p and the index

#

or how do i iterate through a column

agile cobalt Mar 12, 2023, 4:58 AM

#

at that point you might as well load it into a dictionary instead of a dataframe

#

but assuming that there are no duplicated values, one option would be just using set_index

#

!e ```py
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': ['A', 'B', 'C'],
'C': [True, False, True],
})
dict_like = df.set_index('B')
print(dict_like.loc['B'])
```

arctic wedgeBOT Mar 12, 2023, 5:00 AM

#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | A        2
002 | C    False
003 | Name: B, dtype: object

agile cobalt Mar 12, 2023, 5:00 AM

#

(the returned object is a Series btw - the index being the previous dataframe's columns, the values being the corresponding values, and the name being the key)
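
If the values in the lookup column might repeat or might be missing, a boolean mask is another option. The sketch below assumes a made-up table with the column names `p`, `q` and `v` mentioned in the question; the `lookup` helper is hypothetical.

```python
import pandas as pd

# Hypothetical table shaped like the one described: look up by p, read q and v.
df = pd.DataFrame({
    "p": ["name", "other", "third"],
    "q": [10, 20, 30],
    "v": [1.5, 2.5, 3.5],
})

def lookup(frame, key):
    """Return the q and v of the first row whose p equals key, or None if absent."""
    matches = frame.loc[frame["p"] == key, ["q", "v"]]
    if matches.empty:
        return None
    return matches.iloc[0].to_dict()

print(lookup(df, "name"))
print(lookup(df, "missing"))
```

The mask scans the whole column on every call, so for many repeated lookups the `set_index` approach above or a plain dictionary is likely the better fit.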

glossy urchin Mar 12, 2023, 5:01 AM

#

can i dm you and show exactly what im trying to do

agile cobalt Mar 12, 2023, 5:01 AM

#

nah, I'll leave it at that and let you figure out how to integrate as well as recommend for you to look further into the documentation to understand it better if you are not very used to how data flows in pandas

glossy urchin Mar 12, 2023, 5:01 AM

#

ok

agile cobalt Mar 12, 2023, 5:01 AM

#

do check out the pandas User Guides if you haven't yet

glossy urchin Mar 12, 2023, 5:04 AM

#

got it

#

would mapping the names to an index be a valid solution

#

i could do it in o(1)

severe trellis Mar 12, 2023, 6:16 AM

#

wooden sail check out https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filt...

Yeah I'm pretty confused with this, do I pass in the raw data, or the rolling average? ```py
alpha = 0.8
data = rolling_average(np.array(list(steam_data.values())), 50)
a = np.array((1.0, -(1.0 - alpha)))
b = np.array((1.0,))
data = scipy.signal.filtfilt(b, a, data)
ax.plot(steam_data.keys(), data, color='#666666', linestyle='dashed', label='Trendline', linewidth=2)
```

wooden sail Mar 12, 2023, 6:34 AM

#

severe trellis Yeah I'm pretty confused with this, do I pass in the raw data, or the rolling av...

the raw data

severe trellis Mar 12, 2023, 6:38 AM

#

With an alpha of 1 it does this

#

Editing the alpha just moves it up or down, e.g. with a value of 0.8:

wooden sail Mar 12, 2023, 6:41 AM

#

looks about right

#

the more you decrease alpha, the closer the result will be to the average of all the points

#

and you don't get a shift now

severe trellis Mar 12, 2023, 6:49 AM

#

I'm looking for a trendline that looks more like this https://cdn.overseer.tech/file/the-void/screenshots/firefox_xuPUxAD5q9.png

wooden sail Mar 12, 2023, 6:53 AM

#

like the dotted line?

grizzled barn Mar 12, 2023, 7:08 AM

#

I’m interested in writing a computer vision Poker program that looks at all the cards on a table (dealers, yours, and other players’) and then makes a decision based on whether you should fold, check, or up your bet. I feel like this would be a really easy program to write, but could anyone with experience in computer vision give their thoughts?

tawdry ruin Mar 12, 2023, 8:08 AM

#

I am curious to know how power BI works, it looks like an integration of features of SQL, matplotlib. Any information on this?

severe trellis Mar 12, 2023, 8:44 AM

#

wooden sail like the dotted line?

ya

wooden sail Mar 12, 2023, 8:48 AM

#

try smaller values of alpha to increase the amount of smoothing

severe trellis Mar 12, 2023, 8:52 AM

#

I don't know how to explain this in more detail, the alpha just offsets the trendline when I adjust it. It does nothing in regards to smoothing

#

It's quite literally a clone of the original data, and being offset by the alpha

#

Whether or not it's meant to do this, idk, but the screenshots I sent above demonstrate what happens when the alpha is adjusted

wooden sail Mar 12, 2023, 8:58 AM

#

any chance you can share the original data?

#

if not, i'll try to make a synthetic example

arctic wedgeBOT Mar 12, 2023, 9:01 AM

#

Hey @severe trellis!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

severe trellis Mar 12, 2023, 9:01 AM

#

Sure, and here's my code to save you some time https://paste.pythondiscord.com/rebinifuxe

📎 stats_week.json

wooden sail Mar 12, 2023, 9:16 AM

#

severe trellis Sure, and here's my code to save you some time https://paste.pythondiscord.com/r...

oops i was already making synthetic data. let's take a look

#

!e

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

t = np.linspace(0, 7, 300)
Nt = len(t)
s = np.zeros(Nt)

for n in range(1,4):
    s += np.sin(2*np.pi*n*t + np.pi*3/2)/n
trend = 3*np.cos(2*np.pi*t/9)
s += trend
s += np.random.normal(loc=0.0, scale=0.2, size=Nt)

plt.plot(t, s)
plt.plot(t, trend)
legends=["observed", "trend"]

alphas = [0.5, 0.2, 0.08]
for alpha in alphas:
    a = np.array((1.0, -(1.0 - alpha)))
    a /= sum(a)
    b = np.array((1.0,))
    filtered = sp.signal.filtfilt(b, a, s)
    plt.plot(t, filtered)
    legends.append(f"{alpha=}")
    

plt.legend(legends)
plt.savefig("example.png")

#

ah man

arctic wedgeBOT Mar 12, 2023, 9:17 AM

#

@wooden sail :warning: Your 3.11 eval job timed out or ran out of memory.

[No output]

wooden sail Mar 12, 2023, 9:17 AM

#

smh i'll just paste the image

#

tidal bough Mar 12, 2023, 9:18 AM

#

oh, I forgot the bot has scipy
that allows for a lot of shenanigans

wooden sail Mar 12, 2023, 9:18 AM

#

it's a little messy, but hopefully you get the idea

#

smaller values of alpha decrease the cutoff frequency of the low-pass exponential filter

severe trellis Mar 12, 2023, 9:18 AM

#

do you just happen to know all this off the top of your head?

wooden sail Mar 12, 2023, 9:18 AM

#

if alpha is too small you get only a straight line, but you should be able to tune it to taste

wooden sail Mar 12, 2023, 9:19 AM

#

severe trellis do you just happen to know all this off the top of your head?

i've spent a fair amount of time doing signal processing

tidal bough Mar 12, 2023, 9:19 AM

#

(and also there was a discussion of phase correction for moving averages yesterday 😉 )

wooden sail Mar 12, 2023, 9:19 AM

#

the data is made up, but hopefully you find it more or less convincing. i tried to make it look similar to yours

#

if electrical circuits mean anything to you, an exponentially weighted moving average filter is the discrete form of what a resistor-capacitor circuit does

tidal bough Mar 12, 2023, 9:21 AM

#

I wonder if a better way to fit this curve would be fourier fitting, or however it's called. It looks like it's basically a big sinusoid plus a small sinusoid plus noise

wooden sail Mar 12, 2023, 9:21 AM

#

that's basically what i assumed would work, but i made up the weights lol

tidal bough Mar 12, 2023, 9:21 AM

#

oh wait, lmao, that's how you're generating it, no wonder it'd work

wooden sail Mar 12, 2023, 9:21 AM

#

yeah lol

#

you can never get it to match super cleanly this way btw, i should warn you

#

you'd need a parametric estimator or a fancy nonparametric one. but low/bandpass filtering can give you a rough idea

#

at any rate, the TL;DR is, try making alpha smaller. and don't forget to divide the a vector by the sum of its entries. otherwise the result gets scaled weirdly
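
The scaling issue can be checked by hand. With `a = (1.0, -(1.0 - alpha))`, `sum(a)` equals `alpha`, and since SciPy's `lfilter` normalises both coefficient vectors by `a[0]`, dividing `a` by its sum works out the same as using `b = (alpha,)`. The NumPy sketch below runs the recurrence directly on a constant signal; the function name and the 1000-player figure are made up.

```python
import numpy as np

def recursive_filter(x, alpha, b0):
    """y[n] = b0 * x[n] + (1 - alpha) * y[n-1], starting from y[-1] = 0."""
    y = np.empty(len(x))
    prev = 0.0
    for n, value in enumerate(x):
        prev = b0 * value + (1.0 - alpha) * prev
        y[n] = prev
    return y

alpha = 0.2
constant = np.full(200, 1000.0)   # e.g. a steady 1000 players online

unscaled = recursive_filter(constant, alpha, b0=1.0)     # b = (1,), a left as is
scaled = recursive_filter(constant, alpha, b0=alpha)     # same as dividing a by sum(a)

print(unscaled[-1], scaled[-1])
```

Without the scaling, each pass multiplies the level of the signal by `1 / alpha`, which is why changing alpha looked like it only moved the trendline up or down.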

trail yacht Mar 12, 2023, 9:25 AM

#

Make me master data science and python!

severe trellis Mar 12, 2023, 9:29 AM

#

How does the trend = work here, I assume this bases it solely off of the t you pass into the formula for it, rather than the data you gave it. Though, doing this doesn't quite work for me trend = 3*np.cos(2*np.pi*np.array(list(steam_data.values()))/9)

wooden sail Mar 12, 2023, 9:30 AM

#

well, i looked at your data, and just "by eye" and also from the image with the dotted line that you shared, it just "looked" like the overall trend was a slow oscillation

#

so i made up a random low frequency sinusoid and added it to the data

#

this is not what your data does exactly though, so just ignore that part

severe trellis Mar 12, 2023, 9:31 AM

#

Ohh, I see. This data can vary, since it's just the number of players/twitch viewers for a game on a given week. It would vary a bit

wooden sail Mar 12, 2023, 9:31 AM

#

that's just how i generated data for myself to test the filters

#

you can ignore everything before the line where i declare the alphas

#

since you get your data directly from somewhere else

severe trellis Mar 12, 2023, 9:33 AM

#

ah oops, I understand. That trendline was just eyeballed, the alphas are the actual generated ones

#

Woah this is perfect, thank you so much. 100% need to start learning these data science tools 🤩

wooden sail Mar 12, 2023, 9:36 AM

#

did it work better with any of those alphas?

#

this is a good place to get acquainted with the basics btw http://www.dspguide.com/pdfbook.htm

simple tapir Mar 12, 2023, 9:45 AM

#

So, this is a visualisation of a CNN architecture and I wonder why we use 4 layers here? I mean, the first 2 aren't enough for the machine to learn?

wooden sail Mar 12, 2023, 9:47 AM

#

my best answer is "test it and see", because deep neural networks are in general not explainable 😛

#

try removing the layers and see how it performs

simple tapir Mar 12, 2023, 9:50 AM

#

Alright, lemme try 😄

#

Thanks

severe trellis Mar 12, 2023, 10:25 AM

#

wooden sail did it work better with any of those alphas?

yup, no doubt

#

mild dirge Mar 12, 2023, 10:48 AM

#

simple tapir So, this is a visualisation of a CNN architecture and I wonder why we use 4 laye...

There's definitely quite a few more layers than 4 here, they are just grouped into 4 "blocks" of layers. And if you think there are too many, you can probably make the kernel for maxpooling bigger and remove some layers. If you just remove some convolutional+maxpool block, then you will get too big of a feature map at the end when it enters the dense layer.
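
To put rough numbers on the feature-map point, here is a small sketch. The architecture in the image is not reproduced in this log, so the 128x128 input, the 2x2 pooling, the 'same' padding and the 64 channels are all assumed values.

```python
def feature_map_side(input_side, n_blocks, pool=2):
    """Side length after n_blocks of 'same'-padded convolution + max pooling."""
    side = input_side
    for _ in range(n_blocks):
        side //= pool          # 'same' convolutions keep the size; pooling shrinks it
    return side

def flattened_size(input_side, n_blocks, channels, pool=2):
    """Number of values handed to the first dense layer after flattening."""
    side = feature_map_side(input_side, n_blocks, pool)
    return side * side * channels

# Hypothetical numbers: 128x128 input, 64 channels in the last kept block.
for blocks in (4, 3, 2):
    print(blocks, flattened_size(128, blocks, channels=64))
```

Under these assumptions, each removed block multiplies the flattened input of the dense layer by four, and the dense layer's weight count grows with it.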

uneven mist Mar 12, 2023, 11:06 AM

#

Hello all! I'm starting with NNs and wanted to begin with MNIST handwritten digit recognition from scratch. Could anyone recommend good sources where the programming steps are explained well?

simple tapir Mar 12, 2023, 11:10 AM

#

mild dirge There's definitely quite a few more layers than 4 here, they are just grouped in...

Hmm, gotcha thank you

mild dirge Mar 12, 2023, 11:11 AM

#

If you really want to learn how to do it from scratch, I would mostly look into the theory. Understand that each layer can be represented as a weight matrix. Understand how a forward pass is done by a dot product of the input/feature map with the weight matrix. How you can add a bias by concatenating a 1 to the input vector and adding an extra column to the weight matrix. Understand backpropagation, and how the chain rule works. @uneven mist
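
The bias trick described above can be checked in a few lines of NumPy. This is only the forward pass of a single layer with random stand-in values; the layer sizes and variable names are assumptions, and the weight matrix is laid out as (outputs, inputs).

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 784, 16             # 28x28 MNIST pixels; hidden size is arbitrary
x = rng.random(n_inputs)                 # one flattened image (random stand-in)

W = rng.normal(size=(n_hidden, n_inputs))
b = rng.normal(size=n_hidden)

# Explicit bias: z = W x + b
z_explicit = W @ x + b

# Bias folded in: append 1 to the input and b as an extra column of W.
x_aug = np.concatenate([x, [1.0]])
W_aug = np.hstack([W, b[:, None]])
z_folded = W_aug @ x_aug

print(np.allclose(z_explicit, z_folded))
```

Folding the bias into the weight matrix means backpropagation only has to handle one parameter matrix per layer.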

#

And if you understand that, you only really need to look up some numpy functions to get it into code. I would personally not look up a tutorial for writing a NN from scratch, as they will just show you the entire completed code, and you won't learn as much. And often those tutorials contain many mistakes from my experience.

#

This video might also be good, it goes over most of the maths and stuff, but it's coded in a language other than Python, so you can't just straight up copy it
https://www.youtube.com/watch?v=hfMk-kjRv4c&t=1s

YouTube

Sebastian Lague

How to Create a Neural Network (and Train it to Identify Doodles)

Exploring how neural networks learn by programming one from scratch in C#, and then attempting to teach it to recognize various doodles and images.

Source code: https://github.com/SebLague/Neural-Network-Experiments
Demo: https://sebastian.itch.io/neural-network-experiment

uneven mist Mar 12, 2023, 11:15 AM

#

mild dirge If you really want to learn how to do it from scratch, I would mostly look into ...

Tysm 🙂

mild dirge Mar 12, 2023, 11:16 AM

#

And if you ever get stuck, you could just always ask here

severe trellis Mar 12, 2023, 11:33 AM

#

Sebastian Lague makes some great videos

severe trellis Mar 12, 2023, 2:44 PM

#

Is matplotlib meant to be used in run_in_executor()? I save the figure as a BytesIO buffer, and send it. But each time I use the command, a chunk of memory gets used, and never gets released. Calling this multiple times can bring it down from 600-200, and it may jump up to 500 later, etc.

#

https://cdn.overseer.tech/file/the-void/screenshots/DiscordCanary_Ya0hl46o8E.png

wooden sail Mar 12, 2023, 3:17 PM

#

that could be a good way of managing it. pyplot will at least keep the latest axis in memory

#

you could be careful in deleting axes and closing figures as you go along
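
One way to avoid pyplot's global figure bookkeeping altogether is to build the figure from `matplotlib.figure.Figure` directly, which is the pattern the Matplotlib documentation suggests for server-style use. Whether it cures the memory growth described here is not established in this chat; the function name and sample data below are made up.

```python
from io import BytesIO

from matplotlib.figure import Figure

def plot_to_buffer(xs, ys):
    """Render a line plot to an in-memory PNG without touching pyplot."""
    fig = Figure(figsize=(6, 3))          # not registered in pyplot's global figure list
    ax = fig.subplots()
    ax.plot(xs, ys)
    buffer = BytesIO()
    fig.savefig(buffer, format="png")
    buffer.seek(0)
    return buffer

png = plot_to_buffer([0, 1, 2, 3], [10, 12, 9, 14])
print(png.getbuffer().nbytes > 0)
```

Because nothing holds a reference to `fig` after the function returns, it can be garbage-collected like any other object, with no `plt.close()` call needed.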

digital fog Mar 12, 2023, 3:20 PM

#

Does anyone have any experience of measuring the greeks using binomial tree option pricing theory? Got a project and could do with a code review to double check for code for delta and gamma.

severe trellis Mar 12, 2023, 3:29 PM

#

wooden sail you could be careful in deleting axes and closing figures as you go along

Ah, I thought it'd do that automagically. I've tried using plt.close(), plt.close(fig) and plt.close('all'), none of them making a difference

tidal bough Mar 12, 2023, 3:34 PM

#

i think it does clean them up if you're using a non-interactive backend

severe trellis Mar 12, 2023, 3:38 PM

#

Yup, I just make the graph and go on with the rest. This is just run in the executor https://paste.pythondiscord.com/afufikipam

#

Perhaps the buffer isn't getting cleaned up?```py
graph: BytesIO = await self.bot.loop.run_in_executor(None, plot_stat_graph, steam_data, twitch_data, formatter)
f = discord.File(graph, filename='graph.png')
e = discord.Embed()
e.set_image(url=f'attachment://{f.filename}')

    await ctx.send(embed=e, file=f)```

strange igloo Mar 12, 2023, 3:41 PM

#

Hi Everyone, Happy Sunday

What are some popular charting libraries that are easy to use and well established? I'm trying to stay away from ones that are trends and then die out

tidal bough Mar 12, 2023, 3:43 PM

#

matplotlib, for sure

wooden sail Mar 12, 2023, 3:45 PM

#

severe trellis Perhaps the buffer isn't getting cleaned up?```py graph: BytesIO = await...

i'll be honest with you: i have no idea what any of this does, so i can't help lol

strange igloo Mar 12, 2023, 3:46 PM

#

I was looking for something less complex than matplotlib

wooden sail Mar 12, 2023, 3:47 PM

#

but it looks like you're passing the graphs to the bot? could the memory usage be the bot holding on to all the images?

wooden sail Mar 12, 2023, 3:47 PM

#

strange igloo I was looking for something less complex than matplotlib

you might find seaborn (a wrapper of matplotlib) easier to use. a lot of people like plotly too. tbh i don't think they're very different from mpl, and the basic functionality of mpl is easy to use

#

it's the customization that gets tricky

strange igloo Mar 12, 2023, 3:48 PM

#

You're right. I've used matplotlib a bit. It's just so hard to get it to do custom visuals.

severe trellis Mar 12, 2023, 4:29 PM

#

wooden sail i'll be honest with you: i have no idea what any of this does, so i can't help l...

Looked into it, seems like the issue is matplotlib, creating multiple figures can cause this. I've tried every possible solution I've seen to no avail
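One thing that may be worth trying in a long-running bot: pyplot keeps a reference to every figure it creates until that figure is closed, while a `Figure` built directly is not registered with pyplot at all and can be garbage-collected normally. A minimal sketch, assuming the plot can be drawn through the object-oriented API; `figure_to_png_bytes` is a hypothetical stand-in for the real `plot_stat_graph`.

```python
from io import BytesIO

from matplotlib.figure import Figure


def figure_to_png_bytes(xs, ys):
    """Render a line plot to an in-memory PNG without touching pyplot."""
    fig = Figure(figsize=(6, 4))  # never registered in pyplot's global figure list
    ax = fig.subplots()
    ax.plot(xs, ys)
    buffer = BytesIO()
    fig.savefig(buffer, format="png")
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


graph = figure_to_png_bytes([0, 1, 2], [0, 1, 4])
```

The returned buffer can be handed to `discord.File` the same way as before. Whether this removes the growth seen here is untested; it only rules out pyplot's own bookkeeping as the cause.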

hasty mountain Mar 12, 2023, 4:53 PM

#

Hey guys, I want to compute a metric that might give me a better idea of how my neural network gradients are behaving (without having to plot their values).
Is it a good idea to compute the mean of those gradients after each iteration (each batch) and, after an epoch is concluded, sum all those means? The idea is that the closer this result is to 0, the less optimization is being performed (result = 0 would be an optimal or vanishing-gradients case).
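One caveat with averaging raw gradients: positive and negative components cancel, so a mean near 0 does not by itself indicate vanishing gradients. A norm over all gradient values is a common alternative. A minimal sketch, assuming the gradients of each layer are available as flat lists of floats; both helper names are made up for illustration.

```python
import math


def mean_gradient(layers):
    """Plain average of every gradient component (signs can cancel out)."""
    flat = [g for layer in layers for g in layer]
    return sum(flat) / len(flat)


def global_l2_norm(layers):
    """Square root of the sum of squared gradient components across all layers."""
    return math.sqrt(sum(g * g for layer in layers for g in layer))


# Two layers with clearly non-zero gradients whose components cancel exactly.
grads = [[0.5, -0.5], [2.0, -2.0]]
print(mean_gradient(grads))   # the mean is zero even though training is still moving
print(global_l2_norm(grads))  # the norm stays well above zero
```

Logging the norm per batch (or per layer) and averaging it over the epoch keeps the "closer to 0 means less is happening" reading without the cancellation problem.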

tidal bough Mar 12, 2023, 5:25 PM

#

severe trellis Looked into it, seems like the issue is matplotlib, creating multiple figures ca...

what comes to mind:

update matplotlib
see if you can periodically importlib.reload it, maybe? only if that actually deallocates the memory though

misty lava Mar 12, 2023, 5:32 PM

#

Anyone familiar with Tweepy and Twitter API?

serene scaffold Mar 12, 2023, 5:35 PM

#

misty lava Anyone familiar with Tweepy and Twitter API?

always ask your actual question. don't ask if anyone knows about the question you haven't asked.

misty lava Mar 12, 2023, 5:43 PM

#

----> 1 MyStreamListener=MyStreamListener()
      2 MyStream=tweepy.Stream(
      3     bearer_token=credentials.BEARER_TOKEN,
      4     auth=api.auth,
      5     listener=MyStreamListener,
      6 )
      7 MyStream.filter(languages=["en"],track=settings.TRACK_WORDS)

TypeError: StreamingClient.__init__() missing 1 required positional argument: 'bearer_token'

getting this error, would appreciate any help smile

serene scaffold Mar 12, 2023, 5:44 PM

#

misty lava ``` ----> 1 MyStreamListener=MyStreamListener() 2 MyStream=tweepy.Stream(...

in a new cell, run MyStreamListener?, and look at what it says

#

though it looks like StreamListener has been removed from the newest version of tweepy

misty lava Mar 12, 2023, 5:48 PM

#

running

MyStreamListener(bearer_token=credentials.BEARER_TOKEN)

<__main__.MyStreamListener at 0x2640094b130>

#

class MyStreamListener(tweepy.StreamingClient):
    def on_connect(self):                                       # DISPLAYS "CONNECTED" ONCE CONNECTED
        print("Connected")

serene scaffold Mar 12, 2023, 5:49 PM

#

misty lava running ``` MyStreamListener(bearer_token=credentials.BEARER_TOKEN) ``` <__main...

I said to run MyStreamListener?, but nevermind about that instruction.

Anyway, MyStreamListener is the name of the class. did you assign MyStreamListener(bearer_token=credentials.BEARER_TOKEN) to a variable?

boreal gale Mar 12, 2023, 5:52 PM

#

severe trellis Looked into it, seems like the issue is matplotlib, creating multiple figures ca...

what matplotlib backend are you using? have you checked out this github issue https://github.com/matplotlib/matplotlib/issues/20300 ?

fleet river Mar 12, 2023, 6:29 PM

#

Hi, I just wanted to ask if spacy is a good option to start with Machine learning?

serene scaffold Mar 12, 2023, 6:41 PM

#

fleet river Hi, I just wanted to ask if spacy is a good option to start with Machine learnin...

spacy has a lot of tools for nlp, but there aren't any end-to-end machine learning libraries.

fleet river Mar 12, 2023, 6:51 PM

#

So spacy is good then?

devout oak Mar 12, 2023, 7:34 PM

#

Hey, i need to build a NN from scratch for an assignment of mine. can you guys point me to some good resources to help me do the assignment?

devout oak Mar 12, 2023, 7:35 PM

#

fleet river Hi, I just wanted to ask if spacy is a good option to start with Machine learnin...

depends on what you want to do. spacy is for NLP-based tasks as far as i remember; i would also suggest you look at nltk as well

fleet river Mar 12, 2023, 7:42 PM

#

For Neural Thinking... I had BrainJS

#

have*

fleet river Mar 12, 2023, 7:42 PM

#

devout oak depends on what you want to do and spacy is for NLP based tasks as far as i reme...

It's also easy

devout oak Mar 12, 2023, 7:43 PM

#

fleet river It's also easy

yes it's good, but as complexity increases it's not so handy i guess. but sure, give it a shot : )

fleet river Mar 12, 2023, 7:43 PM

#

devout oak yes its good but as complexity increases its not so handy i guess but sure give ...

What do you suggest?

devout oak Mar 12, 2023, 7:44 PM

#

when i started out i did use spacy and then moved to nltk

#

so go ahead

fleet river Mar 12, 2023, 7:56 PM

#

devout oak when i started out i did use spacy and then moved to nltk

Thankyou

spare pollen Mar 12, 2023, 8:51 PM

#

hey, for a school project i had to make a dots and boxes game with reinforcement learning (Q learning), what my teacher told us to do is basically make a table of all the boards and values for each board, and to train the board so it would play well, but mine doesn't really play well, so i was hoping for some ideas or general help
the basic idea is that we play a game, and we do an average of the cost of the board in the table, and the cost of the board from the game which we gain by going back from the outcome with multiplication of 0.9
so that if the outcome is 1 (good) 2 boards down is 0.81 (1*0.9*0.9) and we average that 0.81 with the table value
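The update described above can be sketched as follows. This is only an illustration of the stated rule (walk back from the outcome, multiply by 0.9 per step, average with the table value); the board keys are placeholder strings, and writing the target directly for boards not yet in the table is an assumption.

```python
def backup_game(table, boards, outcome, discount=0.9):
    """Propagate a game outcome backwards through the boards that were visited.

    table   : dict mapping a hashable board to its current value
    boards  : boards in the order they occurred, last one being the final board
    outcome : result of the game from the learner's point of view (e.g. 1 = win)
    """
    target = float(outcome)
    for board in reversed(boards):
        if board in table:
            # Average the stored value with the discounted outcome.
            table[board] = 0.5 * (table[board] + target)
        else:
            # Assumption: an unseen board simply takes the discounted outcome.
            table[board] = target
        target *= discount
    return table


table = {}
backup_game(table, ["b0", "b1", "b2"], outcome=1)  # b2 -> 1, b1 -> 0.9, b0 -> 0.81
backup_game(table, ["b0", "b1", "b3"], outcome=0)  # b1 and b0 are pulled halfway to 0
```

A 50/50 average lets the latest game count as much as everything seen before it. The more usual form is `value += alpha * (target - value)` with a smaller `alpha`, which may be worth trying if the table values jump around a lot.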

#

so ive tried running first a game of 2 random opponents to create a table,
then run a game of a table opponent against a random one so that the table one will get smarter
but it didnt, it mostly played as if it expected a random opponent
so i tried making the opponent smarter by telling it to capture squares if it can but now its dumber
my theory is that because the game is won by the first player under optimal play from both sides, it's hard to train the opponent to play well if it loses most of the time

#

so im out of ideas

queen cradle Mar 12, 2023, 9:02 PM

#

spare pollen so im out of ideas

I'd try having it play itself.

spare pollen Mar 12, 2023, 9:04 PM

#

queen cradle I'd try having it play itself.

do you mean train a Q table for the first player as well as the second and have them play each other?

#

ill try, but i dont think it'll do much. due to the fact the game is won by the first player if both play optimally, i would think the first player will soon get a boost in costs and the opposite for the second as it will start to lose

queen cradle Mar 12, 2023, 9:06 PM

#

I think I'd use the same Q table for the two players. But with the box labels swapped (i.e., only train player 0, but when you need to move as player 1, relabel every 0 box as a 1 and every 1 box as a 0).
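The relabelling idea can be sketched like this. The board representation (a tuple of drawn edges plus a tuple of box owners, with `None` for unclaimed boxes) is an assumption made for the example, not the layout used in the actual project.

```python
def canonical_owners(box_owners, player):
    """Relabel box owners so that the player about to move is always player 0."""
    if player == 0:
        return tuple(box_owners)
    swap = {0: 1, 1: 0, None: None}
    return tuple(swap[owner] for owner in box_owners)


def table_key(edges, box_owners, player):
    """Key for the shared Q table, always seen from the mover's perspective."""
    return (tuple(edges), canonical_owners(box_owners, player))


edges = (1, 0, 1, 1)
# Player 0 owning the first box looks the same to player 0 ...
key_for_player_0 = table_key(edges, (0, None), player=0)
# ... as player 1 owning the first box looks to player 1.
key_for_player_1 = table_key(edges, (1, None), player=1)
```

With this key both players read and write the same entries, so every game produces training data for a single table.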

spare pollen Mar 12, 2023, 9:07 PM

#

there will be inconsistencies i think

#

first one that comes to my mind is that because player 2 never starts, the starting positions will never occur in the table

queen cradle Mar 12, 2023, 9:19 PM

#

The Q table is going to need more entries, I agree. But if you imagine a human player, then really they're going to use the same principles to evaluate positions regardless of which player they are. So it seems like a sensible approach to me.

#

I believe this idea is used for training top-ranked chess and Go programs, but I don't know too much about those.

spare pollen Mar 12, 2023, 9:22 PM

#

queen cradle The Q table is going to need more entries, I agree. But if you imagine a human p...

i see what you say, ill try it, thanks!

#

hopefully it works

plush jungle Mar 12, 2023, 9:59 PM

#

is there a better way of finding the right hyperparameters than just incrementally changing one hyperparameter at a time and seeing what happens?

boreal gale Mar 12, 2023, 10:07 PM

#

plush jungle is there a better way of finding out the right hyperparameters than just increme...

the problem you have right now is called "hyperparameter optimisation"

the classic solution to this is either grid search or random search:

grid search meaning you define a "grid" of hyperparameter e.g. 3 choices of param1 and 2 choices of param2, together they create a "grid" of size 6, try all these 6 configurations out and see what configuration is good.
random search meaning you literally try a random configuration and see what is good.

there has been plenty of research done in exploring what other ways are there to do this
the one i usually reach for is bayesian optimisation.

but as usual, the more models you try on your data (each hyperparameter configuration can be considered another model), the more likely you are to overfit in the long run (it's worth noting that re-splitting your dataset into a different train/test set doesn't make it any less overfitted)
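Grid search and random search as described above can be sketched in a few lines. The search space, the parameter names and `toy_score` are invented for the example; in practice the score would come from training the model and evaluating it on a validation set.

```python
import itertools
import random


def grid_configs(space):
    """Every combination of the listed choices (the full grid)."""
    names = list(space)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(space[name] for name in names))
    ]


def random_configs(space, n_trials, seed=0):
    """n_trials configurations, each drawn independently at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(choices) for name, choices in space.items()}
        for _ in range(n_trials)
    ]


def best_config(configs, score_fn):
    """Configuration with the highest score."""
    return max(configs, key=score_fn)


# 3 choices for one parameter and 2 for the other give a grid of size 6.
space = {"learning_rate": [0.001, 0.01, 0.1], "max_depth": [3, 6]}


def toy_score(config):
    # Hypothetical stand-in for a validation score.
    return -abs(config["learning_rate"] - 0.01) - abs(config["max_depth"] - 6)


grid = grid_configs(space)
best = best_config(grid, toy_score)
```

Random search uses the same `best_config` call on `random_configs(space, n_trials)`; it scales better when there are many hyperparameters and only a few of them matter.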

plush jungle Mar 12, 2023, 10:52 PM

#

boreal gale the problem you have right now is called "hyperparameter optimisation" the clas...

what does it mean when your loss is decreasing but your accuracy isn't increasing?

#

like this?

```
Epoch : 100, Train loss: 10.321332544088364 , Train Acc: 0.48500001430511475, Val loss: 18.077982330322264, Val acc: 0.5035000443458557
Epoch : 110, Train loss: 9.53213369846344 , Train Acc: 0.48750001192092896, Val loss: 17.678728103637695, Val acc: 0.5050000548362732
Epoch : 120, Train loss: 8.479382407665252 , Train Acc: 0.5350000262260437, Val loss: 17.400557136535646, Val acc: 0.5050000548362732
Epoch : 130, Train loss: 12.972423934936524 , Train Acc: 0.512499988079071, Val loss: 17.024103546142577, Val acc: 0.5050000548362732
Epoch : 140, Train loss: 11.378312253952027 , Train Acc: 0.48750001192092896, Val loss: 16.716188049316408, Val acc: 0.5050000548362732
Epoch : 150, Train loss: 8.84527666568756 , Train Acc: 0.5149999856948853, Val loss: 16.279064178466797, Val acc: 0.5065000057220459
Epoch : 160, Train loss: 6.3715451717376705 , Train Acc: 0.48500001430511475, Val loss: 15.720507717132568, Val acc: 0.5102499723434448
Epoch : 170, Train loss: 7.068263298273086 , Train Acc: 0.4950000047683716, Val loss: 15.122348213195801, Val acc: 0.5182499885559082
Epoch : 180, Train loss: 8.795898056030273 , Train Acc: 0.512499988079071, Val loss: 14.284347248077392, Val acc: 0.5189999341964722
Epoch : 190, Train loss: 6.765073442459107 , Train Acc: 0.5249999761581421, Val loss: 13.267550659179687, Val acc: 0.5205000042915344
Epoch : 200, Train loss: 9.654535031318664 , Train Acc: 0.4650000035762787, Val loss: 11.865579986572266, Val acc: 0.5224999785423279
Epoch : 210, Train loss: 5.291061848402023 , Train Acc: 0.5199999809265137, Val loss: 11.279545116424561, Val acc: 0.5224999785423279
Epoch : 220, Train loss: 7.116158974170685 , Train Acc: 0.44999998807907104, Val loss: 10.99517889022827, Val acc: 0.5235000252723694```
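One possible reading of a log like this: the loss measures how confident the predictions are, while accuracy only changes when a prediction crosses the decision threshold. A small sketch with made-up probabilities, assuming a binary problem with a cross-entropy loss (the log does not say which loss was used):

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy of predicted probabilities."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(labels, probs)
    ]
    return sum(losses) / len(losses)


def accuracy(labels, probs, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    hits = [int(p >= threshold) == y for y, p in zip(labels, probs)]
    return sum(hits) / len(hits)


labels = [1, 1, 0, 0]
early = [0.10, 0.60, 0.90, 0.40]  # two samples badly wrong and very confident
later = [0.40, 0.70, 0.60, 0.30]  # same two still wrong, but less confidently

print(binary_cross_entropy(labels, early), accuracy(labels, early))
print(binary_cross_entropy(labels, later), accuracy(labels, later))
```

The loss drops between the two rows while accuracy stays at 0.5. If the task here is binary, accuracy hovering around 0.5 also means the model is still at chance level, so the falling loss has not yet turned into better decisions.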

velvet bronze Mar 12, 2023, 11:37 PM

#

Guys I need to write a program that uses computer vision to detect workers who do not wear safety clothing on site
What are the libraries i'll have to master and what are some suggestions on how i'll go about it?

serene scaffold Mar 12, 2023, 11:38 PM

#

velvet bronze Guys I need to write a program that uses computer vision to detect workers who d...

Forget about "mastering" things. But opencv is a library for computer vision.

velvet bronze Mar 12, 2023, 11:39 PM

#

serene scaffold Forget about "mastering" things. But opencv is a library for computer vision.

Okayy I see, good to know I don't need to master

velvet bronze Mar 12, 2023, 11:40 PM

#

serene scaffold Forget about "mastering" things. But opencv is a library for computer vision.

Will i need tensorflow to train for object detection?

serene scaffold Mar 12, 2023, 11:40 PM

#

velvet bronze Will i need tensorflow to train for object detection?

You might need tensorflow or pytorch

velvet bronze Mar 12, 2023, 11:41 PM

#

serene scaffold You might need tensorflow or pytorch

Oh okay.....Thanks a lot

misty lava Mar 13, 2023, 12:22 AM

#

Using Tweepy for Twitter API

stream = MyStream(bearer_token=credentials.BEARER_TOKEN)

# CLEARS RULESET BEFORE STREAMING DATA
for rule in stream.get_rules().data:
        stream.delete_rules(rule.id)
# ADDING RULES TO RULESET TO STREAM SPECIFIC DATA
stream.add_rules(tweepy.StreamRule("#ETH"))
stream.add_rules(tweepy.StreamRule("$ETH"))
stream.add_rules(tweepy.StreamRule("ETH"))
stream.add_rules(tweepy.StreamRule("Ethereum"))
stream.add_rules(tweepy.StreamRule('-is:retweet'))
stream.add_rules(tweepy.StreamRule('-"Giveaway" -"Participants" -"Winner" -"friends" -"notifications on" -"RT" -"help pay your bills" -"Whale" -"#WLgiveaway" -"#nftgiveaway" -"current Ethereum gas prices"'))
stream.add_rules(tweepy.StreamRule('-"price update" -"join me" -"learn to trade" -"DM" -"tag" -"Send DM" -"price updates" -"chance to win" -"trade and watch" -"follow us" -"opensea" -"swap Alert" -"bought for" -"Item listing" -"#whitelist"'))
stream.add_rules(tweepy.StreamRule('-"Want to win" -"community of real traders" -"discord community" -"staking service" -"#whale" '))

#START STREAM
stream.filter(expansions=["author_id",],tweet_fields=["created_at","referenced_tweets","lang","attachments"])

class MyStream(tweepy.StreamingClient):
    #TWEETS = "STATUS UPDATES".
    def on_connect(self):        # DISPLAYS "CONNECTED" ONCE CONNECTED
        print("Connected") 
    # SKIP RETWEETED TWEETS, NON-ENGLISH TWEETS AND TWEETS WITH ATTACHMENTS; ONLY ORIGINAL ENGLISH TWEETS WITH NO ATTACHMENTS ARE SHOWN
    def on_tweet(self, tweet):
        if tweet.referenced_tweets is not None or tweet.lang != "en" or tweet.attachments is not None:
            return  # filtered out, so nothing is printed
        print(tweet.data)  # print only after the checks above have passed

Trying to have my stream not show Retweets and not show the phrases/words that have the - " ", Currently the stream shows RT's and tweets containing those words/phrases.

Would appreciate any advice/guidance
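A likely cause, as far as the filtered-stream rules go: each rule is matched on its own and a tweet is delivered if it matches any one of them, so a separate `-is:retweet` rule does not restrict the keyword rules (and a rule made only of negations is normally rejected). The exclusions have to live in the same rule as the keywords. A minimal sketch that builds such a rule string; `build_stream_rule` is a hypothetical helper, and the 512-character limit is an assumption that depends on the API access level.

```python
def build_stream_rule(keywords, excluded_phrases, max_length=512):
    """Combine keywords and exclusions into a single filtered-stream rule."""
    include = "(" + " OR ".join(keywords) + ")"
    exclude = " ".join(f'-"{phrase}"' for phrase in excluded_phrases)
    rule = f"{include} -is:retweet lang:en {exclude}".strip()
    if len(rule) > max_length:
        # Assumed limit; split the exclusions over several keyword rules if hit.
        raise ValueError(f"rule is {len(rule)} characters, limit is {max_length}")
    return rule


rule = build_stream_rule(["#ETH", "Ethereum"], ["Giveaway", "price update"])
print(rule)
# The rule would then be registered once, e.g.:
# stream.add_rules(tweepy.StreamRule(rule))
```

If the full exclusion list does not fit in one rule, each additional rule needs to repeat the keyword group and `-is:retweet`, since rules never combine with each other.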

novel python Mar 13, 2023, 12:22 AM

#

I have two different dataframes, I want to cross values on them to populate the left one with the names of the company, which is only available on the right one, and both have the company ids. I tried merge here but I'm not sure if that's the correct solution, can't seem to figure it out.
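For what it's worth, a left merge on the shared id column is the usual way to do this kind of lookup. A minimal sketch with invented data; the column names `company_id` and `company_name` are assumptions, since the real frames were not posted.

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes.
left = pd.DataFrame({"company_id": [1, 2, 3], "revenue": [10, 20, 30]})
right = pd.DataFrame({"company_id": [1, 2], "company_name": ["Acme", "Globex"]})

# how="left" keeps every row of `left`; ids missing from `right` get NaN names.
merged = left.merge(
    right[["company_id", "company_name"]],
    on="company_id",
    how="left",
)
print(merged)
```

If `right` contains the same id more than once, the merge duplicates rows in the result; passing `validate="many_to_one"` makes pandas raise instead of silently doing that.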

serene scaffold Mar 13, 2023, 12:41 AM

#

novel python I have two different dataframes, I want to cross values on them to populate the ...

please do print(df.head().to_dict('list')) for both dataframes and paste the results as text (no screenshots) in the same message. please ping me in that one message.

merry wadi Mar 13, 2023, 1:20 AM

#

Hello everyone! Working on a Node Level GNN binary classification problem that has very low positive classes (8%). I attempted to modify my training with BCELoss to account for the class weights like this.


```py
def train(loader):
  loss_lst = []
  model.train() 
  
  for i, data in enumerate(loader):
    optimizer.zero_grad()
    out = model(data)
    out = out.reshape((data.x.shape[0]))
    loss = criterion(out, data.y.float())
    weight_ = (weight[data.y.data.view(-1).long()].view_as(data.y)  )
    
    new_loss = torch.mean(weight_ * loss)
    
#     loss_class_weighted = weighted_binary_cross_entropy(out, data.y.float(), weights=[0.92, 0.08])
#     loss_class_weighted.backward()
    
    new_loss.backward()
    loss_lst.append(new_loss.detach().numpy())

    optimizer.step()
  return np.average(loss_lst)```
But my results look a little strange over 1000 epochs. Did I do something wrong in the code or is there a better way of handling imbalanced classes?
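Two things may be worth checking in the snippet above. If `criterion` was created with the default reduction, `loss` is already a single averaged number, so multiplying it by `weight_` only rescales the whole batch instead of weighting each node. And if `weight` is ordered `[0.92, 0.08]`, indexing it by label gives 0.92 to class 0, which is the majority class. The difference is illustrated below in plain Python with made-up numbers; in PyTorch the per-sample version corresponds to `BCELoss(reduction='none')`, or `BCEWithLogitsLoss(pos_weight=...)` on raw logits.

```python
import math


def per_sample_bce(labels, probs):
    """Unreduced binary cross-entropy, one value per sample."""
    return [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(labels, probs)
    ]


def class_weights(labels, weight_pos, weight_neg):
    return [weight_pos if y == 1 else weight_neg for y in labels]


def weighted_bce(labels, probs, weight_pos, weight_neg):
    """Each sample's loss is scaled by its class weight before averaging."""
    losses = per_sample_bce(labels, probs)
    weights = class_weights(labels, weight_pos, weight_neg)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


def scalar_scaled_bce(labels, probs, weight_pos, weight_neg):
    """What results when the loss was averaged before the weights are applied."""
    losses = per_sample_bce(labels, probs)
    mean_loss = sum(losses) / len(losses)
    weights = class_weights(labels, weight_pos, weight_neg)
    return (sum(weights) / len(weights)) * mean_loss


labels = [1, 0, 0, 0]
probs = [0.2, 0.1, 0.1, 0.1]  # the single positive is predicted poorly
proper = weighted_bce(labels, probs, weight_pos=0.92, weight_neg=0.08)
scaled = scalar_scaled_bce(labels, probs, weight_pos=0.92, weight_neg=0.08)
```

In the first version the mistake on the rare positive dominates the loss; in the second it is diluted by the easy negatives, which is the effect class weighting was meant to prevent.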

verbal venture Mar 13, 2023, 3:30 AM

#

does anyone know what type of algo I should use for this? "Use a suitable ML algo, for when every feature column contains data of type integer ranging from 0 to 255 and the target column contains categorical data represented by six integers 1, 2, 3, 4, 5 and 6"

tulip wyvern Mar 13, 2023, 3:37 AM

#

How come my cost isn’t decreasing? It starts at 0.69314 and stays at that throughout all the iterations (decreases by like 0.000001 each 100 iterations). My layers are 12288, 20, 7, 5, 1 and I've tried learning rates of 0.1, 0.002, and 0.00001. It’s a monkey vs gorilla binary classification model that uses 2000 images to train that are each sized 64 x 64.
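A hint about that number: 0.69314 is ln 2, which is exactly the binary cross-entropy obtained when every prediction is 0.5, whatever the labels are. The check below shows this with made-up labels.

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy of predicted probabilities."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(labels, probs)
    ]
    return sum(losses) / len(losses)


labels = [1, 0, 1, 1, 0]
cost = binary_cross_entropy(labels, [0.5] * len(labels))
print(cost, math.log(2))  # both are about 0.693147
```

So the network is probably outputting 0.5 for every image and barely moving. Things that commonly cause this in a from-scratch implementation, and are worth checking here, are weights initialised to zero or to very small values, pixel inputs left in the 0 to 255 range instead of being scaled, and a sign or shape error in the backward pass.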

compact ivy Mar 13, 2023, 3:54 AM

#

hi, what ml model should i use to get the PD of a dataset from a bank

#

?

bright stone Mar 13, 2023, 3:54 AM

#

i am trying to finetune gptj for Q&A but have trouble figuring out how to config the tokenizer and data collator. i've tried googling and have read the hugging face documentation several times without result. i am sure this may be a simple answer to find (or that's what i thought before finally asking here)

serene scaffold Mar 13, 2023, 4:42 AM

#

tulip wyvern How come my cost isn’t decreasing? It starts at 0.69314 and stays at that throug...

0.1 to 0.002 is a pretty big jump. try 0.01, or something

serene scaffold Mar 13, 2023, 4:43 AM

#

bright stone i am trying to finetune gptj for Q&A but have trouble figuring out how to config...

I haven't heard of gptj. try showing some code so that potential answerers for your question have an idea of what to look for.

tulip wyvern Mar 13, 2023, 4:43 AM

#

serene scaffold `0.1` to `0.002` is a pretty big jump. try `0.01`, or something

okay I will try that

#

@serene scaffold also do you think it will help if i size my images as 256 * 256 or will that make it worse?

serene scaffold Mar 13, 2023, 4:44 AM

#

tulip wyvern okay I will try that

so the images were actually larger, but you downscaled them to 64 by 64?

tulip wyvern Mar 13, 2023, 4:44 AM

#

Yeah

#

Because I had cost issues with 256 * 256 as well

serene scaffold Mar 13, 2023, 4:44 AM

#

do you have convolutional layers?

tulip wyvern Mar 13, 2023, 4:45 AM

#

No

#

I'm not using any Python libraries because I wanted to make it from scratch first so I can learn the basics

serene scaffold Mar 13, 2023, 4:45 AM

#

I see

#

so you're not even using numpy? let alone torch/tensorflow?

tulip wyvern Mar 13, 2023, 4:46 AM

#

Im using numpy

serene scaffold Mar 13, 2023, 4:47 AM

#

that's an external library. but maybe you already know that

tulip wyvern Mar 13, 2023, 4:47 AM

#

Yeah

serene scaffold Mar 13, 2023, 4:48 AM

#

anyway, I don't do image AI, but I wouldn't expect to get good results without convolutions anyway

tulip wyvern Mar 13, 2023, 4:48 AM

#

Yeah I got 40% training accuracy and 85% test accuracy

#

That isn't overfitting right because that'd be the other way around

bright stone Mar 13, 2023, 4:57 AM

#

i am trying to finetune gptj for Q&A but have trouble figuring out how to config the tokenizer and data collator. i've tried googling and have read the hugging face documentation several times without result. i am sure this may be a simple answer to find (or that's what i thought before finally asking here)

def tokenize_function(examples):
    current_tokenizer_result = tokenizer(examples["text"], padding=True, truncation=True)
    return current_tokenizer_result


print("Splitting and tokenizing dataset")
tokenized_datasets = current_dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets["train"].select(range(100))
small_eval_dataset = tokenized_datasets["train"].select(range(100))


training_args = TrainingArguments(output_dir=GPTJ_FINE_TUNED_FILE,
                                  report_to='all',
                                  logging_dir='./logs',
                                  per_device_train_batch_size=1,
                                  #label_names=['input_ids', 'attention_mask'],  # 'logits', 'past_key_values'
                                  num_train_epochs=1,
                                  no_cuda=True
                                  )

metric = evaluate.load("accuracy")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)


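# DefaultDataCollator takes return_tensors, not a tokenizer; for causal LM fine-tuning use
# DataCollatorForLanguageModeling (from transformers) with mlm=False so labels are built from input_ids.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)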

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)

i am using these parameter
but keep having problems on object type

stone glacier Mar 13, 2023, 8:49 AM

#

Hello, I graduate in like 3 months. Can anyone tell me what projects would be good to have when applying for jobs?

#

As in what specialisation of DSAI?

#

I got like 5 or 7 ready just not sure if I can fit it all in the word docx

odd meteor Mar 13, 2023, 8:54 AM

#

stone glacier As in what specialisation of DSAI?

The best project is usually the one that gives you wings like Red Bull when you're asked to explain why you did the project. Simply, build project that interests you!

odd meteor Mar 13, 2023, 9:01 AM

#

tulip wyvern Yeah I got 40% training accuracy and 85% test accuracy

From 40% to 85% is a huge upgrade. Interesting! And there's no class imbalance in your dataset? Do you have a validation set? What was the accuracy on that?

young granite Mar 13, 2023, 10:06 AM

#

im struggling to create a dropdown for my plotly plots.
i created a legendgroup of IDs and wanted to access the group via dropdown and display the IDs of each group as normal legend.
Therefore i tried to use fig.data but i dont get it

muted crypt Mar 13, 2023, 12:13 PM

#

Hello! I'm working on my thesis and I need to find the most optimal x-shift value that makes the red line (or curves) have the smallest error when compared to the blue one (in the x axis for instance). In my mind I feel like a good approach is working with something similar to the correlation between signals or something similar to align the data. However this is just longitude, latitude data and I can't find much information about a way to solve this online as these are just random paths with no maximum or references to align. Any ideas on how to approach this?

#

(Left image, what I have. Right image is what I'd like to get)

young granite Mar 13, 2023, 12:14 PM

#

muted crypt Hello! I'm working on my thesis and I need to find the most optimal x-shift valu...

u got 2 vectors and want to compare them

muted crypt Mar 13, 2023, 12:15 PM

#

young granite u got 2 vectors and want to compare them

Well this is just the line made by merging a bunch of points from a test that I did

#

The real data that I have to use are more complex than that

young granite Mar 13, 2023, 12:16 PM

#

so they are random data points and not vectors?

#

vectors would be easy, but with data points u need to calc. the approx. area sum between them and find when the area is smallest over the iterations

boreal gale Mar 13, 2023, 12:17 PM

#

muted crypt Hello! I'm working on my thesis and I need to find the most optimal x-shift valu...

I need to find the most optimal x-shift value that makes the red line (or curves) have the smallest error when compared to the blue one (in the x axis for instance)
this smells like an optimisation problem, though your definition of "error" will need to be more fleshed out
have you had a look at scipy.optimize?

muted crypt Mar 13, 2023, 12:17 PM

#

young granite so they are random data points and not vectors?

Thousands of points

#

Here they are just turned into a line so it's easier to look at.
The data that I have to deal with looks more like that

muted crypt Mar 13, 2023, 12:21 PM

#

boreal gale > I need to find the most optimal x-shift value that makes the red line (or curv...

I need to see how much the real trajectory differs from the intended one. The thing is that the real one is usually shifted in time and need to find the most suitable shift to compare it fairly

wooden sail Mar 13, 2023, 12:21 PM

#

the cross-correlation function is a good place to start
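A minimal sketch of the cross-correlation idea, assuming both signals are one coordinate of the trajectory sampled at the same uniform rate; `estimate_lag` and the synthetic bump signals are made up for the example.

```python
import numpy as np


def estimate_lag(reference, observed):
    """Number of samples by which `observed` lags behind `reference`."""
    ref = np.asarray(reference, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Remove the means so a constant offset (e.g. raw lat/lon values)
    # does not dominate the correlation.
    ref = ref - ref.mean()
    obs = obs - obs.mean()
    corr = np.correlate(obs, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(obs))
    return int(lags[np.argmax(corr)])


t = np.arange(100)
intended = np.exp(-0.5 * ((t - 30) / 5.0) ** 2)  # bump centred at sample 30
real = np.exp(-0.5 * ((t - 42) / 5.0) ** 2)      # same bump, 12 samples later
lag = estimate_lag(intended, real)
```

A positive result means the real signal is delayed relative to the intended one. This only recovers a pure shift; it does not handle the real trajectory being stretched or compressed in time.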

muted crypt Mar 13, 2023, 12:21 PM

#

boreal gale > I need to find the most optimal x-shift value that makes the red line (or curv...

I haven't heard of scipy.optimize but I'm going to take a look at it
(by the way nice to see you again here!)

tidal bough Mar 13, 2023, 12:22 PM

#

The really naive way would be to find a shift that minimizes mean squared error vertically, which probably even has a simple analytical solution

muted crypt Mar 13, 2023, 12:22 PM

#

wooden sail the cross-correlation function is a good place to start

yes I started with something like that but I got stuck

muted crypt Mar 13, 2023, 12:23 PM

#

tidal bough The really naive way would be to find a shift that minimizes mean squared error ...

True too! I want to do that when comparing both of them. The thing is that there are thousands of points for hundreds of trajectories and I felt like looping would take a ton of time but it has to be the easiest way

boreal gale Mar 13, 2023, 12:25 PM

#

muted crypt I need to see how much the real trajectory differs from the intended one. The th...

okay, then this problem is significantly more complicated than just a simple optimisation of MSE, since you need to expand/contract your observations for a fair comparison, i recall reading some paper on this, let me try to dig them out

muted crypt Mar 13, 2023, 12:26 PM

#

boreal gale okay, then this problem is significantly more complicated than just a simple opt...

somehow I always get into the most complicated stuff that seems easy at first!

wooden sail Mar 13, 2023, 12:26 PM

#

it's always 2 line segments?

muted crypt Mar 13, 2023, 12:27 PM

#

wooden sail it's always 2 line segments?

I wish! the intended trajectory is defined by just a few points and can be divided in lines but the real data are just a ton of points trying to follow the intended path

#

The upper line is the real data while the dotted line the intended path. There's a shift in time (To) as well as space error

wooden sail Mar 13, 2023, 12:29 PM

#

if it's only a slice as shown here, it's not SUPER complicated

boreal gale Mar 13, 2023, 12:29 PM

#

muted crypt somehow I always get into the most complicated stuff that seems easy at first!

i remember the name now! the keyword is "dynamic time warping"
i think this at least gives you a hint how to expand/contract your observations, with this fix maybe this becomes a simple optimisation of MSE?
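For readers new to the term, the textbook dynamic-programming form of dynamic time warping looks roughly like this. It is a sketch for 1-D sequences with absolute difference as the local cost; the function name is made up, and dedicated libraries exist that are much faster.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Each element of one sequence may be matched to one or more consecutive
    elements of the other, which absorbs local stretching in time.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best total cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


intended = [0, 1, 2, 3, 2, 1, 0]
delayed = [0, 0, 1, 2, 3, 2, 1, 0]  # same shape, starts one sample late
print(dtw_distance(intended, delayed))
```

The table has `len(a) * len(b)` cells, so with thousands of points per trajectory and hundreds of trajectories the plain version gets slow; restricting the warping window is the usual workaround.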

wooden sail Mar 13, 2023, 12:30 PM

#

if you need to contract the observations, it immediately becomes challenging to find the global optimum. if you only want a shift, that may still be possible

muted crypt Mar 13, 2023, 12:30 PM

#

wooden sail if it's only a slice as shown here, it's not SUPER complicated

I have to do it for a whole trajectory like this so I suppose maybe we would have to split it in segments?

wooden sail Mar 13, 2023, 12:30 PM

#

oh boy, it gets better

#

the usual definition of distance is useless here

muted crypt Mar 13, 2023, 12:31 PM

#

of course it has to be a damn drone which adds the vertical component

wooden sail Mar 13, 2023, 12:31 PM

#

your metric should be something like a wasserstein distance or something of the sort

#

if you ONLY want shifts, it's still not super difficult

muted crypt Mar 13, 2023, 12:32 PM

#

boreal gale i remember the name now! the keyword is "dynamic time warping" i think this at l...

sounds like solid thing! I'll have to investigate about that because it's the first time I hear about this. I thought about splitting the dimensions in order to make it a 2D MSE

wooden sail Mar 13, 2023, 12:32 PM

#

2d mse alone won't work

muted crypt Mar 13, 2023, 12:32 PM

#

wooden sail if you ONLY want shifts, it's still not super difficult

That's what I want to begin with

wooden sail Mar 13, 2023, 12:33 PM

#

you need to assign each point on one curve to one point on the other

#

otherwise the distance is ill-defined

muted crypt Mar 13, 2023, 12:33 PM

#

wooden sail you need to assign each point on one curve to one point on the other

oh yes, I see what you mean

wooden sail Mar 13, 2023, 12:33 PM

#

measuring distance is an optimization problem all of its own in this scenario, and then minimizing that distance is very expensive

muted crypt Mar 13, 2023, 12:33 PM

#

i was wondering about calculating the area between the 2 curves but that seems complex

wooden sail Mar 13, 2023, 12:34 PM

#

that's a suitable approach too, but you still need to pair the points to be able to do that

#

or maybe not, actually

muted crypt Mar 13, 2023, 12:34 PM

#

you're right

wooden sail Mar 13, 2023, 12:34 PM

#

hmm it's pretty challenging

muted crypt Mar 13, 2023, 12:35 PM

#

in 2d it would be an integral but I'd need to find a function which adapts to the trajectory with regression I guess

wooden sail Mar 13, 2023, 12:35 PM

#

the 2d figure is curved though

tidal bough Mar 13, 2023, 12:35 PM

#

Looking at the actual graphs you have makes me gravitate towards hacky solutions like

iterating over the actual trajectory, find the closest point on the intended trajectory to each real one (this can be inaccurately done in O(N) if the trajectories are close enough together that you can assume the closest point shifts smoothly along the intended trajectory)
calculate the mean of squares of these distances, and take that as your loss.
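The two steps above can be sketched as follows. Instead of densifying the intended path, this version projects each real point onto every segment, which gives the exact closest point; it is 2-D only and checks every segment for every point, so it is the simple quadratic version rather than the O(N) shortcut. Function names are made up.

```python
def point_segment_sq_dist(p, a, b):
    """Squared distance from point p to the line segment from a to b (2-D)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Position of the projection along the segment, clamped to its ends.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2


def mean_sq_dist_to_path(points, waypoints):
    """Mean squared distance from each real point to the intended polyline."""
    segments = list(zip(waypoints, waypoints[1:]))
    total = 0.0
    for p in points:
        total += min(point_segment_sq_dist(p, a, b) for a, b in segments)
    return total / len(points)


intended = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
real = [(5.0, 1.0), (11.0, 5.0), (-2.0, 0.0)]
loss = mean_sq_dist_to_path(real, intended)
```

Note that this loss ignores time entirely: it measures how far the real track is from the intended line, not whether the drone was at the right place at the right moment.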

#

it'd super not work if the trajectories were significantly different, but yours look close enough

wooden sail Mar 13, 2023, 12:36 PM

#

it would still work if the trajectories were different if you find the closest point correctly

#

the problem is precisely that though. even measuring the distance is an optimization problem

tidal bough Mar 13, 2023, 12:37 PM

#

that's true, probably a spatial partitioning kind of task

wooden sail Mar 13, 2023, 12:37 PM

#

i'd look at it as the optimal linear assignment task

#

time for some Jonker-Volgenant

muted crypt Mar 13, 2023, 12:37 PM

#

But I would need to make points in between right? for instance the intended trajectory are like 15 points in space while the real one is made out of thousands

#

it's like finding the closest distance from a point to a line

wooden sail Mar 13, 2023, 12:38 PM

#

you may need to interpolate or zero pad

tidal bough Mar 13, 2023, 12:38 PM

#

for my hacky idea it doesn't really change things since you can just turn each line segment into a hundred points or whatever

wooden sail Mar 13, 2023, 12:38 PM

#

you have to do something so that the curves are the same length and sampled at the same intervals, ideally

#

e.g. interpolation and zero padding
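The interpolation part can be sketched with `numpy.interp`: resample a polyline so that its points are equally spaced along its length. The function name is made up, and the sketch assumes consecutive points are distinct (repeated points would give zero-length segments, which `np.interp` does not handle well).

```python
import numpy as np


def resample_polyline(points, n_samples):
    """Return n_samples points equally spaced along the polyline's arc length."""
    pts = np.asarray(points, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    # Cumulative distance travelled at each original point.
    arc = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    targets = np.linspace(0.0, arc[-1], n_samples)
    # Interpolate every coordinate (x, y, and z if present) separately.
    return np.column_stack(
        [np.interp(targets, arc, pts[:, k]) for k in range(pts.shape[1])]
    )


waypoints = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
dense = resample_polyline(waypoints, 5)
print(dense)
```

Resampling both the intended and the real trajectory to the same number of samples gives the one-to-one pairing needed for a plain point-wise error, and it works unchanged for 3-D points.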

muted crypt Mar 13, 2023, 12:39 PM

#

#

adding points in between can be done


muted crypt Mar 13, 2023, 12:40 PM

#

wooden sail you have to do something so that the curves are the same length and sampled at t...

same length in the number of points you mean?

wooden sail Mar 13, 2023, 12:40 PM

#

no, length in space

#

part of the problem requires deciding what to do about points that cannot be paired. one solution is to artificially extend the curves

#

actually, yes, since i also suggested a regular sampling interval

#

the two things are the same

#

if not though, the issue is that you'd have to make a decision about what to do with leftover points that cannot be paired

#

cardinality mismatch is an issue with wasserstein metrics

muted crypt Mar 13, 2023, 12:43 PM

#

right, my first idea was to count the number of points of the real trajectory, then add points in between along the lines of the intended trajectory and compute the distance, since each point would then have a pair, but i'll look at what you mention

#

but extending the curves requires me to find the distance and seems like a hard thing to do with so many points and noise

wooden sail Mar 13, 2023, 12:45 PM

#

yeah that's maybe not the best approach. i think adding points in between should do

muted crypt Mar 13, 2023, 12:48 PM

#

wooden sail yeah that's maybe not the best approach. i think adding points in between should...

I'm going to try this first, but before that I'll need a way to fix the time error too I suppose :(

#

I guess the shifting is the hard part

#

otherwise the time shift will be taken into account in the error

#

or maybe not

wooden sail Mar 13, 2023, 12:49 PM

#

the time shift will be in the error, sure

boreal gale Mar 13, 2023, 12:49 PM

#

would you mind dumping an example of real trajectory and the intended trajectory somewhere? i want to try some stuff 🙂

muted crypt Mar 13, 2023, 12:50 PM

#

wooden sail the time shift will be in the error, sure

true true

muted crypt Mar 13, 2023, 12:50 PM

#

boreal gale would you mind dumping an example of real trajectory and the intended trajectory...

how can I send this? I have it in dataframes

#

a csv I suppose that works

boreal gale Mar 13, 2023, 12:51 PM

#

yeah CSV on https://paste.pythondiscord.com/ should be fine!

cinder schooner Mar 13, 2023, 12:52 PM

#

Hello, so i have an image classification project with bird images and 30 classes. I tried using different architectures but they are all overfitting somehow. I tried adding some dropout, lowering the batch size, changing the learning rate and I tried adding some horizontal and vertical flip transformations. But it isn't any better. I have 1500 train images, roughly 51 images for each class, and 270 validation images. What can i / should i try?

muted crypt Mar 13, 2023, 12:53 PM

#

boreal gale yeah CSV on https://paste.pythondiscord.com/ should be fine!

okay let me see if I can do this, wait a sec

muted crypt Mar 13, 2023, 12:59 PM

#

boreal gale yeah CSV on https://paste.pythondiscord.com/ should be fine!

Here you go,
Intended: https://paste.pythondiscord.com/budagaqufi
Real: https://paste.pythondiscord.com/esumidefug
(secs is the seconds column)

boreal gale Mar 13, 2023, 1:04 PM

#

sweet! thanks

tidal bough Mar 13, 2023, 1:09 PM

#

👀

#

here's what I got with the closest-point-but-naively approach on a shitty generated trajectory

#

import numpy as np
from scipy.spatial.distance import cdist


def find_closest(expected: np.ndarray, real: np.ndarray) -> np.ndarray:
    d = expected.shape[1]
    M = expected.shape[0]
    N = real.shape[0]
    assert expected.shape == (M, d), expected.shape
    assert real.shape == (N, d), real.shape

    closest_inds = np.zeros((N,), dtype=int)
    closest_inds[0] = np.argmin(cdist(real[:1, :], expected).reshape(-1))
    for i in range(1, N):
        cur_pnt = real[i, :]
        previous_closest = closest_inds[i - 1]
        # search back and forth from this index only to the local minimum
        closest_dist = None
        closest_ind = None
        for j in range(previous_closest - 1, -1, -1):
            dst = np.linalg.norm(cur_pnt - expected[j, :])
            if closest_dist is None or dst <= closest_dist:
                closest_dist = dst
                closest_ind = j
            else:
                break
        for j in range(previous_closest, M):
            dst = np.linalg.norm(cur_pnt - expected[j, :])
            if closest_dist is None or dst <= closest_dist:
                closest_dist = dst
                closest_ind = j
            else:
                break
        # and take that as the new closest
        closest_inds[i] = closest_ind
    return closest_inds

#

here's the approach I used. This is probably pretty slow because it's looping in Python, but that can be fixed by making it numba or cython. And the important part is that the number of distances that it has to examine for each point of the real trajectory is probably pretty low - probably constant in most cases, even.

boreal gale Mar 13, 2023, 1:14 PM

#

what's the code you used to plot that?

wooden sail Mar 13, 2023, 1:14 PM

#

tidal bough ```py from scipy.spatial.distance import cdist def find_closest(expected: np.n...

you can save yourself all the square roots

tidal bough Mar 13, 2023, 1:14 PM

#

boreal gale what's the code you used to plot that?

import matplotlib.pyplot as plt

closest_inds = find_closest(expected=sampled_pts, real=real_trajectory)
plt.figure("closest distances", clear=True)
plt.plot(sampled_pts[:, 0], sampled_pts[:, 1], "o-", ms=3)
plt.plot(real_trajectory[:, 0], real_trajectory[:, 1], "o-", ms=3)
for a, ind in zip(real_trajectory, closest_inds):
    b = sampled_pts[ind]
    plt.plot([a[0], b[0]], [a[1], b[1]], "r-")
plt.show()

sampled_pts is just 100 points obtained by linear interpolation of the intended trajectory

wooden sail Mar 13, 2023, 1:14 PM

#

squaring preserves ordering of positive numbers, so the dot products suffice
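
A small check of that claim, with random stand-in data (the arrays here are made up, not the drone data): the index of the smallest squared distance is the index of the smallest distance, so the square root is only needed for the one value that gets reported.

```python
import numpy as np

rng = np.random.default_rng(0)
expected = rng.normal(size=(50, 3))   # stand-in for the sampled intended points
p = rng.normal(size=3)                # stand-in for one real point

diff = expected - p
sq_dists = np.einsum("ij,ij->i", diff, diff)   # row-wise dot products, no sqrt
dists = np.sqrt(sq_dists)                      # only computed here for comparison

j = np.argmin(sq_dists)
assert j == np.argmin(dists)                   # same winner either way
closest_distance = np.sqrt(sq_dists[j])        # a single sqrt for the reported value
print(j, closest_distance)
```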

tidal bough Mar 13, 2023, 1:14 PM

#

wooden sail you can save yourself all the square roots

yeah, I can

wooden sail Mar 13, 2023, 1:16 PM

#

very nice demo. this already probably suffices for a nice heuristic

tidal bough Mar 13, 2023, 1:17 PM

#

oh, and the trajectory I used is this:

import numpy as np
import scipy.interpolate

intended_trajectory = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]) # just a square
T_fin = len(intended_trajectory)
intended_interp = scipy.interpolate.interp1d(np.arange(T_fin), intended_trajectory, axis=0, bounds_error=False, fill_value=[0,0])

N = 100
time = np.linspace(0, T_fin, N, endpoint=False, dtype=np.float64)
sampled_pts = intended_interp(time)
deviation = np.zeros_like(sampled_pts)
deviation[:, 0] = (0.1 * np.sin(time * 2.7 + 0.2) + 0.1 * np.sin(time * 6.6 + 0.97)) * np.linspace(0, 1, N)
deviation[:, 1] = (0.12 * np.sin(time * 3.2 + 0.5) + 0.1 * np.sin(time * 7.4 + 0.12)) * np.linspace(0, 1, N)
real_trajectory = sampled_pts + deviation

muted crypt Mar 13, 2023, 1:18 PM

#

tidal bough 👀

what a beast of a reptile 🦎

boreal gale Mar 13, 2023, 1:27 PM

#

fwiw, same thing but ~~still~~ with dynamic time warping (code: https://paste.pythondiscord.com/qokahovera)

#

the paired points are quite different in some places as expected

#

with dynamic time warping, you can only match the current or a future point in the corresponding time series, which might be a nice property for you

#

the arrow indicates where this property is not observed with the closest point approach
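
For readers who have not seen dynamic time warping before, here is a textbook-style sketch of it. It is not the code behind the paste link above; `dtw_path` and the toy arrays are assumptions made for illustration. The returned pairing never goes backwards in either series, which is the property mentioned above.

```python
import numpy as np


def dtw_path(a: np.ndarray, b: np.ndarray):
    """Classic O(n*m) dynamic time warping between two point sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # (n, m)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # walk back from the end to recover which points were paired
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return acc[n, m], path


intended = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
real = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [2.0, 0.0]])
total, path = dtw_path(intended, real)
print(total, path)
```

The double loop is slow in pure Python for thousands of points, so a real run would probably want a compiled implementation or a band constraint.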

young granite Mar 13, 2023, 1:31 PM

#

does one know plotly a fair bit?
i create traces in a for loop and later want to access em via buttons,
but i dont know how to access them in a smart way.

for group in dict.keys():
  fig.add_trace()
  for ID in group:
    fig.add_trace()

i normally assign group as legendgroup but i wanted to improve and use buttons for groups so that in the legend all names are plotted instead of just the group_name

wooden sail Mar 13, 2023, 1:43 PM

#

i would use this one https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html, but the setup is very expensive as you need all pairwise distances. in exchange, it assigns only one point on one curve to one point in the other, and it's the "best one"

#

you can append points at infinity to the shorter list, and decide later what to do about them
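
A small sketch of the assignment approach with made-up points. As far as I know, `scipy.optimize.linear_sum_assignment` accepts a rectangular cost matrix directly, in which case the leftover points of the longer list simply come back unassigned; that is used here as an alternative to explicitly padding with points at infinity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

short = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])               # e.g. 3 intended points
long = np.array([[2.1, 0.1], [0.1, -0.1], [5.0, 5.0], [0.9, 0.2]])   # e.g. 4 real points

# all pairwise squared distances, shape (3, 4); this is the expensive setup step
diff = short[:, None, :] - long[None, :, :]
cost = np.einsum("ijk,ijk->ij", diff, diff)

rows, cols = linear_sum_assignment(cost)        # one-to-one pairing with minimal total cost
unmatched = np.setdiff1d(np.arange(len(long)), cols)

for r, c in zip(rows, cols):
    print(f"short[{r}] <-> long[{c}]  squared distance {cost[r, c]:.2f}")
print("left over in the longer list:", unmatched)
```

What to do with the unmatched points is still a modelling decision, as noted above.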

muted crypt Mar 13, 2023, 1:56 PM

#

boreal gale with dynamic time warping, you can only match current or future point in the cor...

Do you see a lot of difference when using dynamic time warping?

tidal bough Mar 13, 2023, 1:57 PM

#

currently trying to apply the same code to the trajectory Pau342 posted

#

it looks like it fails but it's hard to tell because the 3d plot lags as hell and crashed my jupyter once already 🥴

#

having trouble rotating the plot to see, but it seems at some point it jumps and starts being very wrong

muted crypt Mar 13, 2023, 1:59 PM

#

tidal bough it looks like it fails but it's hard to tell because the 3d plot lags as hell an...

are you trying to plot the trajectory and the closest distances in 3d like you did with the example from before?

wooden sail Mar 13, 2023, 2:00 PM

#

~~yours is only index based right?~~ tbh i didn't read your code carefully so i'm not sure how you're doing the assignment 😛

muted crypt Mar 13, 2023, 2:00 PM

#

tidal bough having trouble rotating the plot to see, but it seems at some point it jumps and...

oh yes! it seems to work great in some areas and then somehow focuses in the wrong part

#

maybe there are more points in the segment from the back and when it increases the index it goes to that area

boreal gale Mar 13, 2023, 2:02 PM

#

is it a case where EPSG 4326 (i.e. lat lng) just isn't that comparable when you have altitude?

maybe you need to convert to EPSG 27700 (or basically make lat lng into meters just like altitude) first?

muted crypt Mar 13, 2023, 2:02 PM

#

tidal bough having trouble rotating the plot to see, but it seems at some point it jumps and...

you can use that to rotate the plot!
ax.view_init(elev=90, azim=0)

tidal bough Mar 13, 2023, 2:03 PM

#

btw @wooden sail, here's a math question: suppose you have two continuous trajectories a(t), b(t) (continuous functions mapping from [0,1] to a 3d space). You then construct a function d(t) = b(t') where t' from [0,1] is such that b(t') is the closest point to a(t) over all possible times. Can anything be said about the conditions for d(t) being continuous? There are obviously cases in which it is (trivial one: a=b=d), and cases in which it isn't (imagine a horseshoe-shaped b and a line-shaped a which connects the ends of b - in the middle, d(t) is going to jump from the left to the right end of the horseshoe).

#

this seems like the kind of thing there's been papers on, tbh
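
The horseshoe case can be checked numerically. The sketch below uses made-up shapes: it walks along a straight line across the opening of a U-shaped polyline and records the index of the closest sampled point on the U. The index stays at one end of the U and then jumps straight to the other end.

```python
import numpy as np


def polyline(vertices, per_segment=200):
    """Sample a polyline densely (helper made up for this demo)."""
    vertices = np.asarray(vertices, dtype=float)
    t = np.linspace(0.0, 1.0, per_segment, endpoint=False)[None, :, None]
    pts = vertices[:-1, None, :] + t * (vertices[1:, None, :] - vertices[:-1, None, :])
    return np.vstack([pts.reshape(-1, vertices.shape[1]), vertices[-1:]])


horseshoe = polyline([[-1, 1], [-1, 0], [1, 0], [1, 1]])            # U shape, open at the top
line = np.column_stack([np.linspace(-0.9, 0.9, 50), np.ones(50)])   # path across the opening

diff = line[:, None, :] - horseshoe[None, :, :]
closest = np.argmin(np.einsum("ijk,ijk->ij", diff, diff), axis=1)
jumps = np.abs(np.diff(closest))
print(closest[:3], closest[-3:], jumps.max())
```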

wooden sail Mar 13, 2023, 2:06 PM

#

tidal bough btw @wooden sail, here's a math question: suppose you have two continuo...

oof

muted crypt Mar 13, 2023, 2:06 PM

#

boreal gale is it a case where EPSG 4326 (i.e. lat lng) just isn't that comparable when you...

what I did is to convert lat, lon, alt to x, y, z with pyproj in order to have better numbers to work with, maybe some functions don't take all the decimals or something

sleek harbor Mar 13, 2023, 2:06 PM

#

Is this any good, or is there something better (for Data Science)? Any good statistics courses on Coursera?

https://www.khanacademy.org/math/statistics-probability/

Khan Academy

Statistics and Probability | Khan Academy

Learn statistics and probability for free—everything you'd want to know about descriptive and inferential statistics. Full curriculum of exercises and videos.

wooden sail Mar 13, 2023, 2:07 PM

#

tidal bough btw @wooden sail, here's a math question: suppose you have two continuo...

continuous in what sense? the easiest that comes to mind is constraining the first n derivatives

#

so a C^n continuous curve

#

the part about b(t') isn't actually that important

tidal bough Mar 13, 2023, 2:09 PM

#

I was thinking just C^0-continuous, but sure, it doesn't really matter

boreal gale Mar 13, 2023, 2:09 PM

#

muted crypt what I did is to convert lat, lon, alt to x, y, z with pyproj in order to have b...

ah my point is more the points jumping all over the place might be due to difference in scale of lat/lng and altitude.
i think cdist only makes sense when lat, lng and alt are all on the same scale.
i.e. there is nothing wrong with having lat,lng,alt, but to use cdist for the closest point approach, maybe you need to convert to a consistent scale (e.g. in meters)?

wooden sail Mar 13, 2023, 2:10 PM

#

tidal bough I was thinking just C^0-continuous, but sure, it doesn't really matter

i would consider it to be the same continuity conditions needed for a curve passing through m points, ignoring the b(t') condition

tidal bough Mar 13, 2023, 2:11 PM

#

boreal gale ah my point is more the points jumping all over the place might be due to differ...

ah, yikes, that's probably correct

#

that may well be the reason

muted crypt Mar 13, 2023, 2:13 PM

#

boreal gale ah my point is more the points jumping all over the place might be due to differ...

makes sense!

tidal bough Mar 13, 2023, 2:13 PM

#

lemme alter the distance function...

boreal gale Mar 13, 2023, 2:25 PM

#

an alternative quick hack would be lat_but_in_meters = lat / 0.0000089987192
but the proper solution is probably to actually use haversine.. though i haven't dealt with 3d version of it and i am not sure if it even exists..
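
The quick hack amounts to "one degree is roughly 111 km". A slightly fuller version of the same idea is a local flat-earth (equirectangular) approximation, which also shrinks the longitude by the cosine of the latitude. This is a sketch under the assumption of a spherical Earth with the mean radius below, and it is only reasonable over short distances; the function name is made up.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, an assumption for this sketch


def latlon_to_local_meters(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Approximate east/north offsets in metres relative to a reference point."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    ref_lat = np.radians(ref_lat_deg)
    ref_lon = np.radians(ref_lon_deg)
    north = EARTH_RADIUS_M * (lat - ref_lat)
    # a degree of longitude gets shorter away from the equator
    east = EARTH_RADIUS_M * (lon - ref_lon) * np.cos(ref_lat)
    return east, north


east, north = latlon_to_local_meters([60.001], [10.001], 60.0, 10.0)
print(east, north)
```

With `east`, `north` and altitude all in metres, a plain Euclidean distance such as `cdist` becomes meaningful again.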

muted crypt Mar 13, 2023, 2:28 PM

#

boreal gale an alternative quick hack would be `lat_but_in_meters = lat / 0.0000089987192` ...

i think that the simplest way is to express lat and lon in meters with respect to the earth center and then subtract the values from the first point to have the distances in meters with respect to the first point (0, 0, 0) meters
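
A sketch of that idea using the standard geodetic-to-ECEF formulas on the WGS84 ellipsoid. The function name and the sample coordinates are made up. Note that the resulting axes are Earth-centred, not east/north/up, but Euclidean distances between points are still in metres.

```python
import numpy as np

WGS84_A = 6378137.0                  # semi-major axis in metres
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude/longitude (degrees) and altitude (metres) to ECEF x, y, z."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    alt = np.asarray(alt_m, dtype=float)
    n = WGS84_A / np.sqrt(1 - WGS84_E2 * np.sin(lat) ** 2)   # prime vertical radius
    x = (n + alt) * np.cos(lat) * np.cos(lon)
    y = (n + alt) * np.cos(lat) * np.sin(lon)
    z = (n * (1 - WGS84_E2) + alt) * np.sin(lat)
    return np.stack([x, y, z], axis=-1)


# made-up coordinates, only for illustration
lat = np.array([41.0, 41.001])
lon = np.array([2.0, 2.0])
alt = np.array([100.0, 100.0])
xyz = geodetic_to_ecef(lat, lon, alt)
relative = xyz - xyz[0]              # metres with respect to the first point
print(np.linalg.norm(relative[1]))
```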

tidal bough Mar 13, 2023, 2:29 PM

#

wow, I should not be having this many problems with implementing a distance function but I am

#

ok there, I did it.

#

i managed to mix up the latitude and longitude, forgot to convert to radians, and forgot about the radius of the earth in the process

muted crypt Mar 13, 2023, 2:31 PM

#

tidal bough i managed to mix up the latitude and longtitude, forgot to convert to radians, a...

from pyproj import Transformer
wgs_utm31 = Transformer.from_crs("EPSG:4326", "EPSG:32631", always_xy=True)
def xy(lon, lat, ref_lon, ref_lat):
    x_ref, y_ref = wgs_utm31.transform(ref_lon, ref_lat)
    x_p, y_p = wgs_utm31.transform(lon, lat)  # x_point
    x = x_ref - x_p  # x -> equator
    y = y_ref - y_p  # y -> meridian
    return x, y

versed gulch Mar 13, 2023, 2:31 PM

#

for training an AI model does it make sense to reduce the learning rate during training based on the training loss or the validation loss?

muted crypt Mar 13, 2023, 2:31 PM

#

muted crypt from pyproj import Transformer wgs_utm31=Transformer.from_crs("EPSG:4326",'EPSG:...

maybe that helps

tidal bough Mar 13, 2023, 2:31 PM

#

def earth_distance(a, b):
    ϕ1, θ1, r1 = a
    ϕ2, θ2, r2 = b
    θ1, θ2 = (np.radians(θ) + np.pi / 2 for θ in (θ1, θ2))
    ϕ1, ϕ2 = (np.radians(ϕ) for ϕ in (ϕ1, ϕ2))
    r1, r2 = (r + 6400e3 for r in (r1, r2))
    return r1**2 + r2**2 - 2 * r1 * r2 * (np.sin(θ1) * np.sin(θ2) * np.cos(ϕ1 - ϕ2) + np.cos(θ1) * np.cos(θ2))

but I think this is now right

#

sadly it doesn't help

boreal gale Mar 13, 2023, 2:34 PM

#

how did you do the interpolation between the waypoints btw? just wondering am i doing something dumb and slow..

tidal bough Mar 13, 2023, 2:34 PM

#

huh. Are the points in the intended trajectory supposed to be dozens of kilometers apart?

>>> [earth_distance(a,b) for a,b in zip(intended_trajectory, intended_trajectory[1:])]
[17420.6875,
 4938.859375,
 44526.0,
 4938.859375,
 1661.421875,
 0.0,
 100.0,
 2861.4375,
 907.15625,
 38189.703125,
 907.15625,
 19351.90625]

muted crypt Mar 13, 2023, 2:34 PM

#

tidal bough huh. Are the points in the intended trajectory supposed to be dozens of kilomete...

they are meters apart, even less for the real trajectory

wooden sail Mar 13, 2023, 2:36 PM

#

tidal bough huh. Are the points in the intended trajectory supposed to be dozens of kilomete...

that's one capable drone

muted crypt Mar 13, 2023, 2:37 PM

#

tidal bough Mar 13, 2023, 2:37 PM

#

oh wait, I forgot these are squared dists

#

that puts the first and second points 131m apart, which seems to match yours, yay
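
A sanity check of why those numbers are squared distances. The sketch below assumes the points are (longitude in degrees, latitude in degrees, altitude in metres) triples and uses the same rough Earth radius; the two test points are made up. The law-of-cosines expression should agree with the squared Euclidean distance after converting both points to Cartesian coordinates.

```python
import numpy as np

R = 6400e3   # same rough Earth radius as in the function above


def squared_chord(a, b):
    """Law of cosines on (lon_deg, lat_deg, alt_m) triples; returns a SQUARED distance."""
    lon1, lat1, r1 = np.radians(a[0]), np.radians(a[1]), a[2] + R
    lon2, lat2, r2 = np.radians(b[0]), np.radians(b[1]), b[2] + R
    cos_angle = (
        np.cos(lat1) * np.cos(lat2) * np.cos(lon1 - lon2) + np.sin(lat1) * np.sin(lat2)
    )
    return r1**2 + r2**2 - 2 * r1 * r2 * cos_angle


def to_cartesian(p):
    lon, lat, r = np.radians(p[0]), np.radians(p[1]), p[2] + R
    return r * np.array([np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])


a = (2.0, 41.0, 100.0)       # made-up points
b = (2.001, 41.001, 120.0)
d2_formula = squared_chord(a, b)
d2_cartesian = np.sum((to_cartesian(a) - to_cartesian(b)) ** 2)
print(np.sqrt(d2_formula), np.sqrt(d2_cartesian))

first_gap = np.sqrt(17420.6875)   # first value in the list above, just under 132 m
print(first_gap)
```

The formula subtracts numbers of order 1e13 to get a result of order 1e4, so a little precision is lost to cancellation; converting to Cartesian coordinates first avoids that.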

muted crypt Mar 13, 2023, 2:38 PM

#

nice!

tidal bough Mar 13, 2023, 2:39 PM

#

velocity over time, slightly EWM-smoothed
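
How that plot was produced is not shown here, so the following is only a guess at the general recipe: finite-difference speed from positions in metres and the `secs` column, followed by simple exponential smoothing (the recursion that, as far as I know, pandas uses for `ewm(alpha=..., adjust=False).mean()`). The function name and `alpha` value are assumptions.

```python
import numpy as np


def smoothed_speed(xyz, secs, alpha=0.1):
    """Speed between consecutive samples plus an exponentially weighted moving average."""
    xyz = np.asarray(xyz, dtype=float)
    secs = np.asarray(secs, dtype=float)
    step = np.linalg.norm(np.diff(xyz, axis=0), axis=1)   # metres moved between samples
    dt = np.diff(secs)                                    # assumes strictly increasing timestamps
    speed = step / dt
    smooth = np.empty_like(speed)
    smooth[0] = speed[0]
    for k in range(1, len(speed)):
        smooth[k] = alpha * speed[k] + (1 - alpha) * smooth[k - 1]
    return speed, smooth


# toy data: 2 m/s along x, sampled every 0.1 s
secs = np.arange(0.0, 5.0, 0.1)
xyz = np.column_stack([2.0 * secs, np.zeros_like(secs), np.zeros_like(secs)])
speed, smooth = smoothed_speed(xyz, secs)
print(speed[:3], smooth[-1])
```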

muted crypt Mar 13, 2023, 2:40 PM

#

tidal bough velocity over time, slightly EWM-smoothed

wo!

tidal bough Mar 13, 2023, 2:40 PM

#

sadly that still doesn't help with whatever's happening with the plot

#

here's closest_inds. There's indeed a few discontinuities here

#

and I don't get why...

muted crypt Mar 13, 2023, 2:45 PM

#

hmmm at some point of the trajectory, there's like 10 seconds where the drone stops and hovers in the air, which is shown there, but maybe that messes up which points the following points are paired with

tidal bough Mar 13, 2023, 2:47 PM

#

here's closest distance over point index

#

the two small peaks are turns, and are correct

#

whatever the hell happens after ~800 probably isn't

muted crypt Mar 13, 2023, 2:49 PM

#

does the function imply that a point in the future can't be closer to a point of the trajectory from the past? I don't know if that makes sense but I feel like at some point it skips to a further point and then all the next points have no choice but to calculate the distance to a point that's shifted

#

because it works pretty smoothly for half of the trajectory

tidal bough Mar 13, 2023, 2:50 PM

#

muted crypt does the function imply that a point in the future can't be closer to a point of...

> does the function imply that a point in the future can't be closer to a point of the trajectory from the past
no, it does search back in time too (but only to a local minimum, yadda yadda)
> but I feel like at some point it skips to a further point and then all the next points have no choice but to calculate the distance to a point that's shifted
that might be the case, yeah

#

actually lemme plot something interesting

wooden sail Mar 13, 2023, 2:51 PM

#

embrace brute force

muted crypt Mar 13, 2023, 2:51 PM

#

like Ry shared it seems that at the end there is an area where all points merge together, maybe that's the closest dist but maybe it has run out of indices

wooden sail Mar 13, 2023, 2:52 PM

#

how many points are there in each curve

muted crypt Mar 13, 2023, 2:53 PM

#

it's not specified, the real trajectory is made out of the data recorded by a drone every 0.1 seconds

#

so if it takes 10 seconds for a turn would be 100 points

tidal bough Mar 13, 2023, 2:54 PM

#

boreal gale how did you do the interpolation between the waypoints btw? just wondering am i ...

if this was to me, I did

T_fin = len(intended_trajectory)
intended_interp = scipy.interpolate.interp1d(np.arange(T_fin), intended_trajectory, axis=0, bounds_error=False, fill_value=intended_trajectory[-1])
M = 1000
time = np.linspace(0, T_fin, M, endpoint=False, dtype=np.float64)
sampled_pts = intended_interp(time)

#

currently calculating the million-element distance matrix between the real and sampled points to compare the actual distances against my naive ones

#

it's not going fast, mostly because it uses a custom distance function. and I can't even rewrite it in numba because I'm using py3.11 😔

wooden sail Mar 13, 2023, 2:56 PM

#

how are you computing it

#

this is the sort of stuff you'd einsum

tidal bough Mar 13, 2023, 2:58 PM

#

scipy.spatial.distance.cdist with metric=earth_distance

#

can probably be done much better in several cdists over the individual coords

#

okay, I moved to python3.10

#

matrix calculation time went from "at least 2 minutes" to 5.6s

#

god I love numba

wooden sail Mar 13, 2023, 3:01 PM

#

i can't find anywhere what earth_distance does

#

if you've converted to meters and have arrays, say, of size n x 3 and m x 3

tidal bough Mar 13, 2023, 3:02 PM

#

tidal bough ```py def earth_distance(a, b): ϕ1, θ1, r1 = a ϕ2, θ2, r2 = b θ1, θ2...

^

tidal bough Mar 13, 2023, 3:03 PM

#

tidal bough matrix calculation time went from "at least 2 minutes" to 5.6s

oh no, something went wrong

#

okay fixed

boreal gale Mar 13, 2023, 3:06 PM

#

corresponding plots from me, distance is presumably meters (code: https://paste.pythondiscord.com/ovicodofof)
i did piecewise interpolation between each pair of consecutive waypoints instead of estimating all the waypoints with one function
also only did a / 0.0000089987192 hack to convert to "meters"

time for me to get back to real work 😩 , good luck!

tidal bough Mar 13, 2023, 3:07 PM

#

#

so I guess my algorithm is just bad and needs at least some global search part

wooden sail Mar 13, 2023, 3:08 PM

#

i think it'd be a fair bit faster if you first convert to meters and then do a broadcasted difference
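
A sketch of what the broadcasted difference could look like once both trajectories are in metres. The arrays are random stand-ins with the sizes mentioned above (about a thousand points each), so the full distance matrix has a million entries.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 3))      # stand-in for N real points, in metres
sampled = rng.normal(size=(1000, 3))   # stand-in for M sampled intended points

# (N, 1, 3) - (1, M, 3) broadcasts to (N, M, 3)
diff = real[:, None, :] - sampled[None, :, :]
sq_dists = np.einsum("ijk,ijk->ij", diff, diff)   # (N, M) squared distances

closest_inds = sq_dists.argmin(axis=1)
# take the square root only for the N winning entries
closest_dists = np.sqrt(sq_dists[np.arange(len(real)), closest_inds])
print(sq_dists.shape, closest_dists[:3])
```

The intermediate `(N, M, 3)` array takes a few tens of megabytes at this size, which is fine here but would need chunking for much longer trajectories.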

muted crypt Mar 13, 2023, 3:08 PM

#

how did you go from the local search to the actual closest

tidal bough Mar 13, 2023, 3:09 PM

#

tidal bough

a-ha, but look at the first one! it looks like the problems began when there was an actual, genuine discontinuity in the closest indices!

#

so that's why my local search failed

wooden sail Mar 13, 2023, 3:09 PM

#

muted crypt how did you go from the local search to the actual closest

through the magic of brute force

tidal bough Mar 13, 2023, 3:09 PM

#

muted crypt how did you go from the local search to the actual closest

actual_closest is obtained by calculating all the distances between real_trajectory points and sampled_pts

#

and it only takes like 7s, god I love numba x2

muted crypt Mar 13, 2023, 3:10 PM

#

why haven't I heard of numba before

tidal bough Mar 13, 2023, 3:10 PM

#

tidal bough and it only takes like 7s, god I love numba x2

from numba import njit
@njit
def earth_distance(a, b):
    ϕ1, θ1, r1 = a
    ϕ2, θ2, r2 = b
    θ1 = np.radians(θ1) + np.pi / 2
    θ2 = np.radians(θ2) + np.pi / 2
    ϕ1 = np.radians(ϕ1)
    ϕ2 = np.radians(ϕ2)
    r1 = r1 + 6400e3
    r2 = r2 + 6400e3
    return r1**2 + r2**2 - 2 * r1 * r2 * (np.sin(θ1) * np.sin(θ2) * np.cos(ϕ1 - ϕ2) + np.cos(θ1) * np.cos(θ2))

true_distance_matrix = np.sqrt(cdist(real_trajectory, sampled_pts, metric=earth_distance))

wooden sail Mar 13, 2023, 3:10 PM

#

now remove the sqrt to speed up the code more :p

tidal bough Mar 13, 2023, 3:10 PM

#

i need the sqrt for the plot later

#

but I guess I can take it after the argmin

wooden sail Mar 13, 2023, 3:11 PM

#

you can sqrt the true vals

tidal bough Mar 13, 2023, 3:11 PM

#

meh, who cares

wooden sail Mar 13, 2023, 3:11 PM

#

right, after you have the support

#

that's probably the slowest operation in what you have left

#

i'd almost bet you can shave the time in half

tidal bough Mar 13, 2023, 3:11 PM

#

alright, alright, fixed

wooden sail Mar 13, 2023, 3:12 PM

#

now it takes 20s

tidal bough Mar 13, 2023, 3:12 PM

#

5.3s :p

#

but neither of the measurements are good, so

muted crypt Mar 13, 2023, 3:16 PM

#

tidal bough

could you share the code that you used to obtain this?

#

somehow the one that ry shared doesn't work for me

tidal bough Mar 13, 2023, 3:17 PM

#

muted crypt could you share the code that you used to obtain this?

https://paste.pythondiscord.com/fewujodinu
here's my whole notebook, these plots are the closest_inds_3d and closest dists ones

boreal gale Mar 13, 2023, 3:18 PM

#

muted crypt somehow the one that ry shared doesn't work for me

doesn't work in what way 🙀

#

eh i really ought to use jupytext :\

tidal bough Mar 13, 2023, 3:20 PM

#

i'm actually using vscode's jupyter support

muted crypt Mar 13, 2023, 3:20 PM

#

boreal gale doesn't work in what way 🙀

I'm trying to understand this

tidal bough Mar 13, 2023, 3:23 PM

#

incidentally, if your real trajectories have about this number of points, you could maybe just use the bruteforce solution 🥴

#

well, if you need to perform such a search once. then ~5s is okay

#

if you're going to e.g. use it as a loss function while optimizing the trajectory, then you probably want it to be much faster

muted crypt Mar 13, 2023, 3:24 PM

#

i have like 80 different trajectories 🫠

#

but yes, it's just a result to do once to show the results

tidal bough Mar 13, 2023, 3:25 PM

#

actually... this can be made much faster by just removing interpolation. For each point in the real trajectory and for each line segment in the intended trajectory, find the closest point on the segment using the exact algorithm. written in numba or cython it'll be very fast

wooden sail Mar 13, 2023, 3:25 PM

#

we're calling it brute force, but constructing the distance matrix is also necessary to solve the problem optimally in many metrics

muted crypt Mar 13, 2023, 3:25 PM

#

i'm not sure if I understand yet the difference between brute force and the code that you did

wooden sail Mar 13, 2023, 3:26 PM

#

reptile's alg tries to avoid computing the full matrix

muted crypt Mar 13, 2023, 3:27 PM

#

like calculating the distance for just a few points against calculating the distance to each point?

tidal bough Mar 13, 2023, 3:28 PM

#

the idea of my local search algorithm is that if the ith point of the real trajectory was closest to the jth point of the intended one, then the i+1th point will be closest to one of the points near the jth one. So it searches backward (points j, j-1, j-2...) only until the distance stops dropping, and forward with the same stopping condition. So basically, it finds the local minimum of distance starting the search at point j.

#

the hope is that in many cases, this local minimum is also the global one. turns out, it's not always the case.
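
A toy illustration of how the local minimum can differ from the global one. `local_descent` is a simplified stand-in for the search described above, not the exact function posted earlier, and the distance values are made up.

```python
import numpy as np


def local_descent(dists, start):
    """Walk downhill from `start` to the nearest local minimum of a 1-D array."""
    j = start
    while True:
        if j > 0 and dists[j - 1] < dists[j]:
            j -= 1
        elif j < len(dists) - 1 and dists[j + 1] < dists[j]:
            j += 1
        else:
            return j


# made-up distances from one real point to each sampled intended point:
# a shallow dip at index 2 and a deeper one at index 7
dists = np.array([5.0, 3.0, 2.0, 3.0, 4.0, 3.5, 1.0, 0.5, 1.5, 4.0])

print(local_descent(dists, start=1))   # gets stuck in the shallow dip
print(int(np.argmin(dists)))           # the brute-force answer
```

Once the previous closest index sits in the wrong dip, every following point starts its search from there, which matches the "jumps and starts being very wrong" behaviour seen on the real data.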

wooden sail Mar 13, 2023, 3:29 PM

#

it's not the case if you have loops

muted crypt Mar 13, 2023, 3:29 PM

#

tidal bough the idea of my local search algorithm is that if `i`th of the real trajectory wa...

okay that makes a lot of sense now! you seem to know what you're doing!

#

I'm installing the visual studio you mentioned because it didn't work for me in jupyter :(

wooden sail Mar 13, 2023, 3:30 PM

#

tidal bough the idea of my local search algorithm is that if `i`th of the real trajectory wa...

btw this makes a critical assumption that will almost always fail

#

that there is no error in the locations

#

but gps is very noisy

boreal gale Mar 13, 2023, 3:30 PM

#

muted crypt I'm trying to understand this

i don't understand why you are getting that tbh, but https://paste.pythondiscord.com/iwoqopajaj is an alternative version which does the same piecewise interpolation

muted crypt Mar 13, 2023, 3:31 PM

#

wooden sail but gps is very noisy

yes!

#

I suppose that the error that was appearing before was due to a corrupt point that was causing trouble

wooden sail Mar 13, 2023, 3:32 PM

#

that could be it, or following a path involving loops that moves away and then returns to the curve several times

muted crypt Mar 13, 2023, 3:32 PM

#

boreal gale i don't understand why you are getting that tbh, but https://paste.pythondiscord...

in the same spot

#

Why does code never work the first time for me!

boreal gale Mar 13, 2023, 3:32 PM

#

ah that means your csv is not the same as mine

muted crypt Mar 13, 2023, 3:33 PM

#

boreal gale ah that means your csv is not the same as mine

how did you get the FPLlat names?

#

did I send it like this?

tidal bough Mar 13, 2023, 3:33 PM

#

you did

boreal gale Mar 13, 2023, 3:33 PM

#

yep

muted crypt Mar 13, 2023, 3:33 PM

#

what

boreal gale Mar 13, 2023, 3:34 PM

#

https://paste.pythondiscord.com/budagaqufi.py this little bugger here

muted crypt Mar 13, 2023, 3:36 PM

#

boreal gale https://paste.pythondiscord.com/budagaqufi.py this little bugger here

okay facts

#

I thought I sent other files, they are the same but with different columns

#

it works now

tidal bough Mar 13, 2023, 3:38 PM

#

tidal bough actually... this can be made much faster by just removing interpolation. For eac...

trying to figure out this solution, now

muted crypt Mar 13, 2023, 3:38 PM

#

boreal gale yep

ry once again never ceases to amaze me

#

confusedReptile

#

You are a beast too, what a legend 🏅

tidal bough Mar 13, 2023, 3:39 PM

#

oh no, realised there's a bit of a problem - I'm definitely not calculating shortest-point-on-segment in spherical coords

muted crypt Mar 13, 2023, 3:39 PM

#

and Edd, another king here
The three kings of the server, I owe you one!

tidal bough Mar 13, 2023, 3:40 PM

#

so I guess I'll have to turn them all into euclidean coords after all

muted crypt Mar 13, 2023, 3:40 PM

#

tidal bough so I guess I'll have to turn them all into euclidean coords after all

seems always to be the most comfortable thing

wooden sail Mar 13, 2023, 3:40 PM

#

i'm glad reptile and ry are going the extra mile and demoing all of this

#

this problem really isn't easy, so i sat it out lol

#

i just critique from the back row and yell at you for holding the flashlight wrong

muted crypt Mar 13, 2023, 3:42 PM

#

ry knows I always come with interesting problems to solve. despite having no idea how to code, I always get into messes like this that I would never ever solve on my own

wooden sail Mar 13, 2023, 3:44 PM

#

there are papers upon papers on problems like this one published every year. there are several approaches to measuring the distance, and then several more to minimizing it

boreal gale Mar 13, 2023, 3:44 PM

#

well the problem is too interesting to not try it out myself 🙈 and i happen to know a technique that seems relevant.. (but when you have a hammer everything looks like a nail lol)
kudos to reptile coming up with an algo!

muted crypt Mar 13, 2023, 3:45 PM

#

i propose ry and reptile to team up, you two can solve everything!

wooden sail Mar 13, 2023, 3:45 PM

#

boreal gale well the problem is too interesting to not try it out myself 🙈 and i happen to ...

this is pretty good tbh. if you're very familiar with solving a specific kind of problem, it's always worth a shot converting other problems into the one you already know

muted crypt Mar 13, 2023, 3:47 PM

#

yeah, that seems like a very specific way of solving this problem. Every line of code contains something new for me so I'll slowly try to understand everything, years of experience here!

muted crypt Mar 13, 2023, 4:03 PM

#

by the way, I might just use part of this code in my bachelor's thesis, I was wondering if you're okay with that? might as well put you in the credits hah @boreal gale @tidal bough

wooden sail Mar 13, 2023, 4:06 PM

#

"huh. Are the points in the intended trajectory supposed to be dozens of kilometers apart?" [10] Reptile, Confused. In data-science-and-ai, PYTHON DISCORD (2023).

tidal bough Mar 13, 2023, 4:07 PM

#

i did it

#

#

this is an exact solution

muted crypt Mar 13, 2023, 4:08 PM

#

wooden sail "huh. Are the points in the intended trajectory supposed to be dozens of kilomet...

credits to Edd, the creator of "brute force"

tidal bough Mar 13, 2023, 4:08 PM

#

and it runs in like 30ms

muted crypt Mar 13, 2023, 4:08 PM

#

tidal bough

you legend

wooden sail Mar 13, 2023, 4:08 PM

#

oh very nice

#

what'd you change this time

tidal bough Mar 13, 2023, 4:08 PM

#

the key is to abuse the fact the intended trajectory is just line segments

@njit
def closest_point_on_segment(v: np.ndarray, w: np.ndarray, p: np.ndarray):
    "closest point on segment vw to point p"
    # https://stackoverflow.com/a/1501725
    vw = w - v
    length_squared = vw.T @ vw
    if length_squared <= 1e-8:
        return v
    t = max(0, min(1, (p - v).T @ vw / length_squared))
    res = v + t * vw
    return res


@njit
def find_closest_lines(expected: np.ndarray, real: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    All inputs should be in euclidean coordinates. All input arrays should be float, or numba won't like it
    expected: (M, d), the waypoints of the intended trajectory, defining M-1 line segments of the trajectory.
    real: (N,d), real trajectory.
    returns:
        closest_pts, (N,d) array - for each point in the real trajectory, closest point along the intended trajectory
        closest_dists, (N,) array - squared distance to that point
        closest_segment_ind, (N,) array of ints - index of the segment that point belongs to (from 0 to M-2 inclusive)
    """
    d = expected.shape[1]
    M = expected.shape[0]
    N = real.shape[0]
    # numba doesn't support asserts, smh
    # assert expected.shape == (M, d), expected.shape
    # assert real.shape == (N, d), real.shape
    closest_pts = np.zeros_like(real)
    closest_dists = np.full((N,), np.inf)
    closest_segment_ind = np.zeros((N,), dtype=np.int32)
    for i in range(N):
        p = real[i]
        for j in range(0, M - 1):
            v = expected[j]
            w = expected[j + 1]
            r = closest_point_on_segment(v, w, p)
            cur_dist = (r - p).T @ (r - p)
            if cur_dist < closest_dists[i]:
                closest_dists[i] = cur_dist
                closest_pts[i] = r
                closest_segment_ind[i] = j
    return closest_pts, closest_dists, closest_segment_ind

wooden sail Mar 13, 2023, 4:09 PM

#

ah man i had that in mind as well

#

that's pretty clean

muted crypt Mar 13, 2023, 4:09 PM

#

looks extremely clean

#

clean^2

wooden sail Mar 13, 2023, 4:09 PM

#

so you're simply doing orthogonal projections

#

super nice

#

i like it. it won't work if rotations are involved, but otherwise this is fantastic

muted crypt Mar 13, 2023, 4:11 PM

#

therefore, with this method, increasing the number of points to infinite would compute the area between the two paths right?

wooden sail Mar 13, 2023, 4:11 PM

#

btw what shape are the vectors? numpy ignores the T if the arrays are 1 dim

wooden sail Mar 13, 2023, 4:11 PM

#

muted crypt therefore, with this method, increasing the number of points to infinite would ...

no, it still iterates over the segments

tidal bough Mar 13, 2023, 4:12 PM

#

wooden sail btw what shape are the vectors? numpy ignores the T if the arrays are 1 dim

they are 1d, this is just dot product

wooden sail Mar 13, 2023, 4:12 PM

#

aight. i appreciate the clarity it gives tho

tidal bough Mar 13, 2023, 4:12 PM

#

muted crypt therefore, with this method, increasing the number of points to infinite would ...

there's no parameters to tweak in this solution, except I guess the number of points in the real trajectory

muted crypt Mar 13, 2023, 4:12 PM

#

so how would you get a fairly accurate measurement of the deviation?

tidal bough Mar 13, 2023, 4:12 PM

#

so you could take as the loss function, say, the mean squared distance to closest point

#

which would be, like, just closest_dists.mean(), since closest_dists already holds squared distances.
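
For reference, a minimal NumPy-only sketch of that loss, assuming the same inputs as `find_closest_lines` above (`expected` of shape (M, d), `real` of shape (N, d)). The name `squared_dist_to_polyline` is made up for illustration, and this version trades the numba loops for an (N, M-1, d) temporary array, so it is not the code from the notebook:

```python
import numpy as np


def squared_dist_to_polyline(expected: np.ndarray, real: np.ndarray) -> np.ndarray:
    """Squared distance from each point in `real` to the nearest point on the polyline `expected`."""
    v = expected[:-1]                              # (M-1, d) segment start points
    vw = expected[1:] - v                          # (M-1, d) segment direction vectors
    len_sq = np.einsum("md,md->m", vw, vw)         # (M-1,) squared segment lengths
    degenerate = len_sq <= 1e-8                    # zero-length segments collapse to v
    safe_len_sq = np.where(degenerate, 1.0, len_sq)

    pv = real[:, None, :] - v[None, :, :]          # (N, M-1, d) vectors from v to each point
    t = np.einsum("nmd,md->nm", pv, vw) / safe_len_sq
    t = np.where(degenerate, 0.0, np.clip(t, 0.0, 1.0))  # clamp projection onto the segment

    closest = v[None, :, :] + t[..., None] * vw[None, :, :]
    diff = closest - real[:, None, :]
    d2 = np.einsum("nmd,nmd->nm", diff, diff)      # squared distance to every segment
    return d2.min(axis=1)                          # keep the nearest segment per point


expected = np.array([[0.0, 0.0], [10.0, 0.0]])
real = np.array([[5.0, 1.0], [12.0, 0.0]])
d2 = squared_dist_to_polyline(expected, real)
loss = d2.mean()   # mean squared distance; no extra squaring needed
```
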

wooden sail Mar 13, 2023, 4:13 PM

#

that's the one. what reptile did is turn the problem into one of ray-tracing. the number of points on the true trajectory is essentially the number of rays, so the points cannot be increased to infinity

tidal bough Mar 13, 2023, 4:13 PM

#

huh, raytracing?

wooden sail Mar 13, 2023, 4:13 PM

#

you did it without noticing 😛 the problem is equivalent

muted crypt Mar 13, 2023, 4:14 PM

#

oh I thought it was done otherwise as there aren't many red dots in the graph

tidal bough Mar 13, 2023, 4:14 PM

#

if I drew 2000 lines on a 3d plot, matplotlib would definitely have crashed my jupyter again 😛

#

they're drawn for every 20th point of the real trajectory

muted crypt Mar 13, 2023, 4:15 PM

#

okay! that got me confused but makes total sense!

#

so I see that doing it this way the problem about the time shift is avoided

#

yet I guess it would be nice to find a way to stretch/scale the trajectory to see how it improves

tidal bough Mar 13, 2023, 4:18 PM

#

https://paste.pythondiscord.com/oyapexafiy.py
Full notebook. the function takes* 30ms on the real 3d trajectory and 600 microseconds on my initial made-up 2d one

after compiling, which takes ~5s each time the function is called with a previously-unseen combination of input types

#

# I'll just trust ry that this is a valid approximation, lol:
real_df[["lat", "lon"]] = real_df[["lat", "lon"]] / 0.0000089987192
intended_df[["FPLlat", "FPLlon"]] = intended_df[["FPLlat", "FPLlon"]] / 0.0000089987192

also the way it transforms into euclidean space is ^
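
Dividing by 0.0000089987192 works out to roughly 111 km per degree, which matches a degree of latitude. A degree of longitude is shorter by a factor of cos(latitude), so one possible refinement (a sketch only, not what the notebook does) is a local equirectangular projection. `latlon_to_local_xy` and `ref_lat_deg` are hypothetical names:

```python
import numpy as np

# Same constant as in the snippet above, expressed as metres per degree.
METERS_PER_DEG = 1 / 0.0000089987192


def latlon_to_local_xy(lat_deg, lon_deg, ref_lat_deg):
    """Approximate local x/y in metres; reasonable only over a small area around ref_lat_deg."""
    lat = np.asarray(lat_deg, dtype=np.float64)
    lon = np.asarray(lon_deg, dtype=np.float64)
    y = lat * METERS_PER_DEG
    # Longitude degrees shrink with latitude, so scale by cos(reference latitude).
    x = lon * METERS_PER_DEG * np.cos(np.radians(ref_lat_deg))
    return x, y


x, y = latlon_to_local_xy([60.0, 60.001], [10.0, 10.001], ref_lat_deg=60.0)
```

Near the equator the two versions barely differ; at higher latitudes the unscaled one stretches east-west distances.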

muted crypt Mar 13, 2023, 4:24 PM

#

Crazy fast! Almost feel unbelievable!

tidal bough Mar 13, 2023, 4:28 PM

#

i'm not sure how to take time into account in any way, here

#

maybe just add time as the fourth coordinate, with some coefficient depending on how important it is that the bot passes the points at the right times
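
A rough sketch of that idea, assuming both trajectories carry a timestamp per point. `add_time_axis` and `time_weight` are hypothetical names; `time_weight` is the coefficient, e.g. how many metres of position error one second of lateness should count for:

```python
import numpy as np


def add_time_axis(points, t, time_weight):
    """Append time, scaled by time_weight, as one extra coordinate column."""
    points = np.asarray(points, dtype=np.float64)   # (N, d) positions
    t = np.asarray(t, dtype=np.float64)             # (N,) timestamps
    return np.column_stack([points, time_weight * t])


real_xyz = np.array([[0.0, 0.0, 10.0], [5.0, 0.0, 10.0]])
real_t = np.array([0.0, 2.0])
real_xyzt = add_time_axis(real_xyz, real_t, time_weight=3.0)   # shape (2, 4)
```

Both the real and the intended trajectory would need the same treatment before being passed to `find_closest_lines`, since it only requires that both arrays share the same `d`.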

#

waiting for the day numba starts supporting 3.11. or codon starts supporting windows, I guess. or maybe I should see if cython can be used as easily as numba, I guess.

#

https://github.com/numba/numba/issues/8304#issuecomment-1456646575

the packages for 3.11 should arrive early next week as @cbouss will be able to acknowledge [...] chances are good to get a release candidate next week or the week after
oh damn, nice

copper zodiac Mar 13, 2023, 4:43 PM

#

Chatterbot is being not very poggers

wooden sail Mar 13, 2023, 4:44 PM

#

tidal bough <https://github.com/numba/numba/issues/8304#issuecomment-1456646575> > the packa...

oh sweet

copper zodiac Mar 13, 2023, 4:59 PM

#

Can someone explain why ChatterBot dependency install craps itself?

tidal bough Mar 13, 2023, 4:59 PM

#

in what way?

copper zodiac Mar 13, 2023, 5:06 PM

#

Hold on lemme pip install it

#

It takes a while

#

It gets stuck on building Spacy dependencies and then spits out a bunch of errors

merry fern Mar 13, 2023, 5:10 PM

#

serene scaffold sorry, but I don't think I can help with this.

hey, i got a solution if you were curious. here's how it looks:

```py
{
'Cedant' : [[cedant for acct in premium_para.columns[1:3] \
  if premium_para.loc[(scenario, cedant),acct] != 0] for cedant in cedant], \
'Account' : [[(cedant + ': ' + acct) for acct in premium_para.columns[1:3] \
  if premium_para.loc[(scenario, cedant),acct] != 0] for cedant in cedant]
}
```

copper zodiac Mar 13, 2023, 5:14 PM

#

Aha here we go

#

Do you want me to send it as a text file so I don't bloat the chat?

arctic wedgeBOT Mar 13, 2023, 5:16 PM

#

Hey @copper zodiac!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

copper zodiac Mar 13, 2023, 5:19 PM

#

nevermind it's way too long

deft harbor Mar 13, 2023, 5:56 PM

#

Can anyone think of some code that would generate synthetic data showing multiple chains in MCMC converging? I would like to create a plot like the one below, without having to try a bunch of different PyMC models. It is being used for a written example of convergence, but I'm struggling with trying to create 5-10 (x_i, y_i) that come together as x gets larger.
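
One way to fake this without fitting any model (a sketch only; every name and default below is made up): start each chain at a different value, let the offset decay exponentially toward a shared target, and add autocorrelated AR(1) noise so the traces look like sampler output.

```python
import numpy as np


def fake_converging_chains(n_chains=6, n_steps=500, target=0.0, spread=5.0,
                           decay=0.02, noise=0.3, rho=0.8, seed=0):
    """Return (steps, chains); chains has shape (n_chains, n_steps)."""
    rng = np.random.default_rng(seed)
    steps = np.arange(n_steps)
    starts = np.linspace(-spread, spread, n_chains)        # dispersed starting points
    # Each chain's offset from the target shrinks exponentially with the step index.
    drift = target + starts[:, None] * np.exp(-decay * steps)[None, :]
    # AR(1) noise: each draw is correlated with the previous one.
    eps = rng.normal(0.0, noise, size=(n_chains, n_steps))
    ar = np.zeros_like(eps)
    ar[:, 0] = eps[:, 0]
    for k in range(1, n_steps):
        ar[:, k] = rho * ar[:, k - 1] + eps[:, k]
    return steps, drift + ar


steps, chains = fake_converging_chains()
```

Plotting `chains.T` against `steps` with matplotlib gives one line per chain; `decay` controls how quickly they merge and `rho` how sticky each trace looks.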

misty lava Mar 13, 2023, 6:05 PM

#

Hello,

Currently using Tweepy 4.13.
my output of stream.filter() is attached, as you can see referenced_tweets type: replied_to and quoted are showing, I only want to see original tweets.

stream = MyStream(bearer_token=credentials.BEARER_TOKEN)

# CLEARS RULESET BEFORE STREAMING DATA
# for rule in stream.get_rules().data:
#         stream.delete_rules(rule.id)
# ADDING RULES TO RULESET TO STREAM SPECIFIC DATA
stream.add_rules(tweepy.StreamRule('"$ETH" -is:retweet'))

#START STREAM
stream.filter(expansions=["author_id",],tweet_fields=["created_at","referenced_tweets","lang","attachments"])

Here is the MyStream

class MyStream(tweepy.StreamingClient):

    # DISPLAYS "CONNECTED" ONCE STREAM IS CONNECTED
    def on_connect(self):        
        print("Connected") 

    # AVOID RETWEETED TWEETS, NON-ENGLISH TWEETS AND TWEETS WITH ATTACHMENTS, ONLY ORIGINAL ENGLISH TWEETS WITH NO ATACHMENTS ARE STORED 

    def on_tweet(self,tweet):
        # if tweet.referenced_tweets != None or tweet.lang != "en" or tweet.attachments != None:
        #     return True
        if tweet.referenced_tweets is None:
            return True
        if tweet.lang !="en":
            return True
        if tweet.attachments is None:
            return True
        print(tweet.data)
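
Looking at `on_tweet` as posted, the checks seem inverted: it returns early when `referenced_tweets is None`, which is exactly the original-tweet case, and likewise when there are no attachments. A minimal sketch of the filter the comment describes, assuming those attributes are `None` when absent; `is_wanted` is a hypothetical helper, and the `SimpleNamespace` objects below are stand-ins for real tweets:

```python
from types import SimpleNamespace


def is_wanted(tweet) -> bool:
    """True only for original English tweets without attachments."""
    if tweet.referenced_tweets:   # replies, quotes and retweets reference another tweet
        return False
    if tweet.lang != "en":
        return False
    if tweet.attachments:         # media, polls, etc.
        return False
    return True


# Stand-in objects for illustration only.
original = SimpleNamespace(referenced_tweets=None, lang="en", attachments=None)
reply = SimpleNamespace(referenced_tweets=[{"type": "replied_to"}], lang="en", attachments=None)
results = [is_wanted(original), is_wanted(reply)]
```

Inside `on_tweet` this would become `if is_wanted(tweet): print(tweet.data)`. It may also be worth filtering at the rule level with `-is:reply -is:quote` next to `-is:retweet`, so those tweets never reach the stream.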

hasty mountain Mar 13, 2023, 6:27 PM

#

Guys, can someone help me with unsupervised learning in Neural Networks?
I know that the idea is to make the neural network to work like a tSNE or a PCA, reducing the information entropy before passing such information to the classifier layers. I've also seen that the ideal method for working with unlabeled data is to pretrain a neural network in unsupervised learning mode, apply supervised fine-tuning with labels available and only then apply self-learning to generate pseudolabels that can be incorporated to the dataset.

Problem is...I have the impression that the supervised fine-tuning is actually sabotaging my model's performance somehow. The losses don't decrease that much, and the consistency loss (MSE between 2 different outputs generated from the same input) appears to be increasing.

Is this normal or I'm doing something wrong?

#

PS: I'm using an information bottleneck in my feature extractor's last layer (from 18,432 features, the net has to extract 128), and dropout of 20%, which doesn't seem like enough to hurt the model, but idk.

bold timber Mar 13, 2023, 7:12 PM

#

Hello guys, can anyone enlighten me as to why I get a warning like this?

In this case, I want to build a model for classifying disaster tweets. In this case, I build a hybrid embedding model in which I use a universal-sentence-embedding pretrained model for token-level embedding and LSTM for character-level-embedding

For efficiency, I leveraged a number of methods from the tf.data API: I combine characters and tokens into a dataset and turn it into a PrefetchDataset of batches.

The complete warning is like this:
"WARNING:absl:Found untraced functions such as lstm_cell_1_layer_call_fn, lstm_cell_1_layer_call_and_return_conditional_losses, lstm_cell_2_layer_call_fn, lstm_cell_2_layer_call_and_return_conditional_losses while saving (showing 4 of 4). These functions will not be directly callable after loading."

I also used ModelCheckpoint callbacks, but after training the model, I can't load the best model performance. Why did it happen? can you guys enlighten me?

indigo cove Mar 13, 2023, 8:18 PM

#

Anybody knows what to do with this error?

#

This happens when running pycharm

charred light Mar 13, 2023, 8:19 PM

#

indigo cove This happens when running pycharm

You are passing None into one of your variables.

indigo cove Mar 13, 2023, 8:21 PM

#

Thank you!

mint palm Mar 13, 2023, 9:03 PM

#

#

deft harbor Mar 14, 2023, 1:13 AM

#

hasty mountain Guys, can someone help me with unsupervised learning in Neural Networks? I know ...

Try an autoencoder and then use the latent variables?

hasty mountain Mar 14, 2023, 1:14 AM

#

deft harbor Try an autoencoder and then use the latent variables?

Yes, that would be the default idea, but I'm testing the idea from a paper that actually used a ResNet-18

#

I hope the problem is simply adjusting how many epochs the model runs before going to fine-tuning and to self-learning...

#

I've run another test. The first fine-tuning seems to be ok, as well as the pseudolabels generation.
The problem is...the pseudolabels generation is going fine, but it seems it may not be fine enough, which might be causing trouble...

#

At least, this is my guess pithink

#

Oh...nevermind... I just saw that, instead of sorting my losses from lowest to highest, I was sorting them from highest to lowest...and then incorporating the worst pseudolabels into the dataset... py_guido

Remember kids: sleep for at least 8 hours each night, otherwise you might get dumb
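
For anyone hitting the same thing, a tiny sketch of the selection step with the sort direction made explicit. `select_confident` is a hypothetical helper and the per-sample losses are made up:

```python
import numpy as np


def select_confident(losses, k):
    """Indices of the k samples with the lowest loss."""
    losses = np.asarray(losses)
    order = np.argsort(losses)   # ascending: lowest loss first
    return order[:k]


losses = [0.9, 0.1, 0.5, 0.05]
keep = select_confident(losses, k=2)
```

`np.argsort` always sorts ascending, so reversing it with `[::-1]` (or negating the losses) is what would pick the worst pseudolabels instead.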

deft harbor Mar 14, 2023, 1:18 AM

#

That isn't something I would have guessed. Glad you resolved it though.

trail cloud Mar 14, 2023, 4:44 AM

#

hello
Does anybody know how to prevent Jupyter Kernel from wrapping text/plain output?
From get_iopub_msg, kernel returns content whose text/plain is wrapped, which is unwanted

rugged vale Mar 14, 2023, 7:13 AM

#

for pandas is there a naming convention for masks

young granite Mar 14, 2023, 10:32 AM

#

rugged vale for pandas is there a naming convention for masks

fully up to u

#

I stumbled across SHAP Multi Output Regression Model (https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/model_agnostic/Multioutput Regression SHAP.html?highlight=multi output) and want to implement it.
In the end i would like to achieve a plot like for classes (https://shap.readthedocs.io/en/latest/example_notebooks/image_examples/image_captioning/Image Captioning using Azure Cognitive Services.html) so to say a heatmap for each feature influence. Is there a direct solution to that problem or do i need to use the shap.data and code it myself? 😄

regal zephyr Mar 14, 2023, 10:47 AM

#

Does anyone know where I can find an open source dataset for DNA STR loci and Bio Markers for predicting medical info, like the probability of having a specific disease etc...

young granite Mar 14, 2023, 10:56 AM

#

regal zephyr Does anyone know where I can find an open source dataset for DNA STR loci and Bi...

kaggle.com

regal zephyr Mar 14, 2023, 10:57 AM

#

young granite kaggle.com

I need something more professional and used by researchers

young granite Mar 14, 2023, 10:58 AM

#

regal zephyr I need something more professional and used by researchers

prob. got to buy the set then

meager venture Mar 14, 2023, 3:45 PM

#

We were searching for something like FastAPI for the Kafka-based service we were developing, but couldn’t find anything similar. So we shamelessly made one by reusing beloved paradigms from FastAPI and we shamelessly named it FastKafka. The point was to set the expectations right - you get pretty much what you would expect: function decorators for consumers and producers with type hints specifying Pydantic classes for JSON encoding/decoding, automatic message routing to Kafka brokers and documentation generation.

Please take a look and tell us how to make it better. Our goal is to make using it as easy as possible for someone with experience with FastAPI.

https://github.com/airtai/fastkafka

GitHub

GitHub - airtai/fastkafka: FastKafka is a powerful and easy-to-use ...

FastKafka is a powerful and easy-to-use Python library for building asynchronous web services that interact with Kafka topics. Built on top of Pydantic, AIOKafka and AsyncAPI, FastKafka simplifies ...

versed gulch Mar 14, 2023, 6:20 PM

#

Does anyone know how to append an array (2D) to a 3D array i.e. i have a 3D array of 98x100x100 and want to append 2 2D zero arrays of size 100x100 at the beginning and the end of my 3D array giving a final shape of 100x100x100?

wooden sail Mar 14, 2023, 6:22 PM

#

appending is not really a thing for numpy arrays, but you can make a new one

#

something like

```py
new_array = np.zeros((100,100,100), dtype=your_dtype)
new_array[1:-1,:,:] = old_array
new_array[0,:,:] = some_2D_array
new_array[-1,:,:] = some_other_2D_array
```
where you can automate the 100s by using the shape of the old_array

late herald Mar 14, 2023, 6:27 PM

#

is anyone familiar with huggingface? i need some help with training custom data with hugging face model

heavy crow Mar 14, 2023, 7:07 PM

#

any of you guys gotten a chance to play around with the llama models? i've been experimenting with the quantized 4bit models and it seems promising!

strange igloo Mar 14, 2023, 7:10 PM

#

Hi Everyone - what is a good blog for data analytics that isn't Medium? Perhaps something long running and established with good credibility?

low mason Mar 14, 2023, 7:17 PM

#

Is there a decorator to vectorize simple classes, or classes composed of base types + vectorizable classes? For example, I have code like this:

class WorkerState:
    def __init__(self):
        self.has_speed = False
        self.has_wings = False
        self.has_food = False
        self.is_bot = False

class TeamState:
    def __init__(self):
        self.eggs = 2
        self.berries_deposited = [False for _ in range(12)]
        self.workers = [WorkerState() for _ in range(4)]

I've manually written code like this:

def vectorize_worker(worker: WorkerState) -> np.ndarray:
    return np.array([worker.is_bot, worker.has_food, worker.has_speed, worker.has_wings], float)

def vectorize_team(team_state: TeamState) -> np.ndarray:
    parts = [[float(team_state.eggs)], np.array(team_state.berries_deposited, float)]
    for worker in team_state.workers:
        parts.append(vectorize_worker(worker))
    return np.concatenate(parts)

But it's pretty rote and something that could be handled automatically by a not all that smart library. Does such a library exist?
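
Not a library, but if the classes can be turned into dataclasses, a short recursive helper gets most of the way there. This is a sketch under that assumption: `vectorize` is a made-up name, and the field order in each dataclass decides the layout of the output vector.

```python
from dataclasses import dataclass, field, fields, is_dataclass

import numpy as np


def vectorize(obj) -> np.ndarray:
    """Flatten a dataclass instance, list/tuple, or scalar into a 1-D float array."""
    if is_dataclass(obj) and not isinstance(obj, type):
        parts = [vectorize(getattr(obj, f.name)) for f in fields(obj)]
        return np.concatenate(parts) if parts else np.empty(0)
    if isinstance(obj, (list, tuple)):
        parts = [vectorize(item) for item in obj]
        return np.concatenate(parts) if parts else np.empty(0)
    return np.array([obj], dtype=float)   # bool, int or float leaf


@dataclass
class WorkerState:
    is_bot: bool = False
    has_food: bool = False
    has_speed: bool = False
    has_wings: bool = False


@dataclass
class TeamState:
    eggs: int = 2
    berries_deposited: list = field(default_factory=lambda: [False] * 12)
    workers: list = field(default_factory=lambda: [WorkerState() for _ in range(4)])


vec = vectorize(TeamState())   # 1 + 12 + 4 * 4 = 29 values
```
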

misty flint Mar 14, 2023, 7:57 PM

#

FYI, i just got an email about this:

#

heres the link if anyone is interested. its basically happening now https://www.youtube.com/live/outcGtbnMuQ?feature=share

YouTube

OpenAI

GPT-4 Developer Livestream

Join Greg Brockman, President and Co-Founder of OpenAI, at 1 pm PT for a developer demo showcasing GPT-4 and some of its capabilities/limitations.Join the co...


#

ZoomEyes

mild dirge Mar 14, 2023, 8:30 PM

#

misty flint FYI, i just got an email about this:

We're all going to die

#

It reads images now

coral cradle Mar 14, 2023, 8:33 PM

#

does anyone know any api that can give me the real exchange rate between 2 countries. I don't want to use the nominal exchange rate

misty flint Mar 14, 2023, 8:35 PM

#

mild dirge It reads images now

bro it read his sloppy napkin pseudocode and created frontend code to display in the browser

#

im dead

#

💀

mild dirge Mar 14, 2023, 8:35 PM

#

Pretty awesome yeah. The examples they showed were 100% cherry picked, but it's still super impressive.

tacit basin Mar 14, 2023, 8:36 PM

#

Demo was impressive. But it was demo so ... 🙂

#

Apparently bing chat runs on this. Not image part that is

misty flint Mar 14, 2023, 8:37 PM

#

i want to try that napkin trick though

tacit basin Mar 14, 2023, 8:37 PM

#

https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI’s-GPT-4

misty flint Mar 14, 2023, 8:37 PM

#

can i write pseudocode too while drinking my morning coffee

#

coffeevee

#

the image part isnt released yet though

#

which is a bummer CL5_FeelsBongoMan

#

oh well

#

signing up for the waitlist anyway

#

📝

tidal bough Mar 14, 2023, 8:56 PM

#

versed gulch Does anyone know how to append an array (2D) to a 3D array i.e. i have a 3D arra...

np.concatenate should be able to do it - it's what you use for joining a sequence of arrays along an existing axis
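
A quick sketch of both options for the 98x100x100 case (the array contents are placeholders); `np.pad` happens to fit here because the added slices are all zeros:

```python
import numpy as np

old_array = np.ones((98, 100, 100))

# One zero slice with the same trailing shape and dtype as the data.
zero_slice = np.zeros((1,) + old_array.shape[1:], dtype=old_array.dtype)

# Join along the existing first axis: 1 + 98 + 1 = 100.
new_array = np.concatenate([zero_slice, old_array, zero_slice], axis=0)

# Same result: pad one slice before and after axis 0 (default fill value is 0).
padded = np.pad(old_array, ((1, 1), (0, 0), (0, 0)))
```

Both calls return a new array and leave `old_array` untouched.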

wooden sail Mar 14, 2023, 8:57 PM

#

i somehow always forget that exists

#

but it's still important to note that that does not append either. just for completeness

tidal bough Mar 14, 2023, 9:03 PM

#

you mean, unlike np.append? 😉

#

i know what you mean though, it's true that none of them grow the array inplace.

#

the docs really want to make sure the reader understands this

hasty mountain Mar 14, 2023, 9:15 PM

#

mild dirge It reads images now

Papers

#

I want papers

#

brainmon

#

I don't care about OpenAI's code. They're a mess. I want the concepts brainmon

misty flint Mar 14, 2023, 9:23 PM

#

hasty mountain I want **papers**

https://cdn.openai.com/papers/gpt-4.pdf

hasty mountain Mar 14, 2023, 9:23 PM

#

misty flint https://cdn.openai.com/papers/gpt-4.pdf

Thanks! brainmon

misty flint Mar 14, 2023, 9:24 PM

#

~100 pages. have fun with that

hasty mountain Mar 14, 2023, 9:25 PM

#

Aw...no funny title? What happened to Radford? grumpchib

charred light Mar 14, 2023, 9:40 PM

#

HAHA

#

Also, the real notes are in the appendix.

hasty mountain Mar 14, 2023, 9:47 PM

#

Nice

#

py_guido 🔥

#

Too bad I still couldn't manage to make my own language model. I'd really like to better check all those problems people complain about those models

charred light Mar 14, 2023, 9:53 PM

#

The main paper is only like 14 pages. The rest are part of the appendix, covering various prompts.

#

I just realized they have like 300+ people working on this. Jesus

cyan basalt Mar 14, 2023, 9:55 PM

#

anyone here recommends any courses to get into ai?

charred light Mar 14, 2023, 9:55 PM

#

Makes much more sense now why it's so fleshed out lol

cyan basalt Mar 14, 2023, 9:57 PM

#

cyan basalt anyone here recommends any courses to get into ai?

could be videos on yt or anything like that, i already know a bit of python, and would like to dig a bit deeper into it

tidal bough Mar 14, 2023, 9:57 PM

#

oh interesting, gpt-4 comes already RLHFed

charred light Mar 14, 2023, 9:58 PM

#

Yea, they also have a whole section on prompts that are allowed, not allowed. "Injection style"(For lack of a better term) attacks to bypass prompts

hasty mountain Mar 14, 2023, 9:59 PM

#

Prompt injection?
Did they try the Do Anything Now protocol? hyperlemon

#

It seems the folks from Reddit are handling chatGPT quite effectively with that

charred light Mar 14, 2023, 10:02 PM

#

The images are the most interesting to me. I'll look up how they actually do this later. But to be able to draw meaning from the image is pretty cool.

hasty mountain Mar 14, 2023, 10:04 PM

#

Ugh...it's quite cool, indeed. I just get a bit sad when I think about how much computational power that may require...

tidal bough Mar 14, 2023, 10:04 PM

#

i love how they mention in their paper the assorted reasons to expect an AI to murder us all and are like "so anyway, we decided to scale it some more to see what happens"

charred light Mar 14, 2023, 10:05 PM

#

More like people using AI to murder people. They have a section on politics (e.g. creating propaganda @ specific age group).

hasty mountain Mar 14, 2023, 10:05 PM

#

tidal bough i love how they mention in their paper the assorted reasons to expect an AI to m...

lol

tidal bough Mar 14, 2023, 10:06 PM

#

charred light More like people using AI to murder people. They have a section on politics (e.g...

Yeah, sadly they only kinda sorta pretend to care about the killing-everyone thing (see also their recent press release, "planning for AGI and beyond").

charred light Mar 14, 2023, 10:10 PM

#

There is no way they care about the humanities side of this model. I JUST realized how many people worked on this project (p15-17). At low balling ~100k salary each person, that's a lot of money sunk into this project. Monetization and reaping that money back is going to come first.

hasty mountain Mar 14, 2023, 10:11 PM

#

charred light There is no way they care about the humanities side of this model. I JUST realiz...

Hello Microsoft!

charred light Mar 14, 2023, 10:12 PM

#

My main concern is when the barrier to entry in accessing AI like this becomes zero. Going to make the internet a lot muddier.

hasty mountain Mar 14, 2023, 10:13 PM

#

It'll be a quite interesting game of cat and mouse...probably there'll be also models to detect when text was AI-generated

charred light Mar 14, 2023, 10:13 PM

#

There's one prompt on how to build a bomb. (Although I wouldn't trust chatgpt not to have pulled that from some joke website that causes harm to the person attempting to build it.)

hasty mountain Mar 14, 2023, 10:13 PM

#

Speaking of which...I still have to test a Text GAN...

tidal bough Mar 14, 2023, 10:16 PM

#

hasty mountain It'll be a quite interesting game of cat and mouse...probably there'll be also m...

so far the existing ones are pretty bad (they start having a lot of false positives once the text you're scanning is unusual, e.g. fiction). Maybe if they finetune the publicly-accessible models to intentionally write in a style that's recognizable to the recognizer model (which does not necessarily mean looking strange to a human).

hasty mountain Mar 14, 2023, 10:17 PM

#

tidal bough so far the existing ones are pretty bad (they start having a lot of false positi...

Perhaps one could make a Text GAN...and use the discriminator to be that model

#

pithink

#

Poor discriminators...always so neglected...maybe it's their time to shine

tidal bough Mar 14, 2023, 10:19 PM

#

wow, the section comparing early and release gpt-4 is pure gold

charred light Mar 14, 2023, 10:20 PM

#

Yep, 100% worth reading through it

#

Or skimming it*

charred light Mar 14, 2023, 10:21 PM

#

misty flint ~100 pages. have fun with that

Skim through it EYES

bold timber Mar 14, 2023, 10:38 PM

#

bold timber Hello guys, anyone enlightens me why I get a warning like this? In this case, I...

can anyone enlighten me about this?

cyan basalt Mar 14, 2023, 10:52 PM

#

cyan basalt could be videos on yt or anything like that, i already know a bit of python, and...

...

hasty mountain Mar 14, 2023, 10:54 PM

#

cyan basalt ...

https://www.youtube.com/@NeuralNine

YouTube

NeuralNine

NeuralNine is an educational brand focusing on programming, machine learning and computer science in general! Let's develop brains!


misty flint Mar 14, 2023, 10:54 PM

#

charred light Skim through it EYES

ugh i dont wanna Running

hasty mountain Mar 14, 2023, 10:54 PM

#

There's also a guy more focused on maths that folks here tend to recommend...I don't remember what channel...

charred light Mar 14, 2023, 10:56 PM

#

3blue1brown: https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw

frozen bloom Mar 14, 2023, 11:28 PM

#

hey am trying to predict Mortalité hosp

import pandas as pd
import numpy as np
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import classification_report

# Mount Google Drive

drive.mount('/content/drive')

# Load Excel file into dataframe

df = pd.read_excel('/content/drive/MyDrive/Classeur2_enfants.xlsx')
df.fillna(value=0, inplace=True)

# Define the independent variables that you want to use to predict mortality

X = df[['num ', 'age ', 'sexe ', 'ATCDS ', 'AAR', 'RAA', 'Dyspnée', 'ICD', 'ACFA', 'IM isolée ', 'stade ', 'MM à IM prédom', 'stade .1', 'SOR ', 'grade ', 'FE %', 'FE ', 'PAPS ( mmhg)', 'grade (paps)', 'IT ', 'I,Ao', 'autres anomalies ', 'CAV complet', 'CAV partielle', 'CIA os ', 'CIV ', 'annuloplastie Mitrale', 'Plastie de KAY', 'Commissurotomie', 'Elargissement du feuillet post de la valve', 'fermeture du cleft', 'DEVEGA', 'autre PT ', 'RVAo', 'PVAo', 'fermeture de CAV complet ', 'fermeture de CAV partiel ', 'fermeture de CIA os', 'fermeture de CIV', 'CEC (min)', 'Clampage (min)', 'Mortalité hosp', 'décès précoce ']]

# Define the target variable that you want to predict

y = df['Mortalité hosp']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr_cv = LogisticRegressionCV(cv=5)
lr_cv.fit(X_train, y_train)

y_pred = lr_cv.predict(X_test)
print(classification_report(y_test, y_pred))

arctic wedgeBOT Mar 14, 2023, 11:29 PM

#

Hey @frozen bloom!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

frozen bloom Mar 14, 2023, 11:29 PM

#

some one can help me .

serene scaffold Mar 14, 2023, 11:59 PM

#

misty flint ~100 pages. have fun with that

inb4 ask chatgpt to explain

hasty mountain Mar 15, 2023, 12:00 AM

#

lol. True

#

Then it proceeds to explain a random paper from 2015

serene scaffold Mar 15, 2023, 12:04 AM

#

frozen bloom some one can help me .

thanks for showing the code. but you have to say what the problem is

mild dirge Mar 15, 2023, 12:05 AM

#

charred light The images are the most interesting to me. I'll look up how they actually do thi...

hahaha

#

Yeah it is really cool that they can extract information from images now as well with gpt

serene scaffold Mar 15, 2023, 12:08 AM

#

I don't really get how the same model could use both

#

guess I'll have to read the paper

misty flint Mar 15, 2023, 12:09 AM

#

serene scaffold inb4 ask chatgpt to explain

bro you shouldve seen the demo. kinda wild lol

limpid saddle Mar 15, 2023, 12:59 AM

#

Hello!
Can someone give me an idea on how to deal with images from hugging face?

To give a better idea, I want to access this dataset: https://huggingface.co/spaces/competitions/aiornot

I first loaded the dataset by doing

ds = load_dataset('competitions/aiornot')```

and `ds` would print out:

```py
DatasetDict({
    test: Dataset({
        features: ['id', 'image', 'label'],
        num_rows: 43442
    })
    train: Dataset({
        features: ['id', 'image', 'label'],
        num_rows: 18618
    })
})```

#

I am not sure what to do with this. I know I can access the nth image by doing

ds['train'][0]["image"]```

But what should the next step be? I was looking to convert the images to an np array to be able to deal with them and feed them to the CNN model, but I am not sure if that's the right thing to do. Is that even necessary?

#

Also, I'll be using tf

tacit basin Mar 15, 2023, 4:13 AM

#

cyan basalt anyone here recommends any courses to get into ai?

My default recommendation is practical deep learning by fastai https://course.fast.ai/ you will find prerequisites there

Practical Deep Learning for Coders - Practical Deep Learning

A free course designed for people with some coding experience, who want to learn how to apply deep learning and machine learning to practical problems.

thorn bobcat Mar 15, 2023, 10:15 AM

#

anyone here used tesseract-ocr before?

mild dirge Mar 15, 2023, 12:10 PM

#

thorn bobcat anyone here used tesseract-ocr before?

question?

mild dirge Mar 15, 2023, 12:11 PM

#

thorn bobcat anyone here used tesseract-ocr before?

If you just asked the question, you probably would have already received an answer by now 😛

mild dirge Mar 15, 2023, 2:32 PM

#

Bit off-topic, but south-park has a new episode on chatgpt, it's pretty funny

drifting kelp Mar 15, 2023, 4:12 PM

#

How can I use pytables to write big matrices (60000 X 60000) and make operations with it?

novel python Mar 15, 2023, 4:28 PM

#

what's the easiest way to drop all rows in a dataframe where a column contains values that also appear in a column of another dataframe, when the lengths don't match? I tried using a for loop:

for i in range(0, len(df)):
    print(i)
    if (df['Created By: Full Name'].iloc[i] in inactive_people['Full Name'].to_list()):
        df = df.drop(i, axis=0)

but for some reason it gives me IndexError: single positional indexer is out-of-bounds at some point, which I don't understand since I reset the df index before running this.

boreal gale Mar 15, 2023, 4:39 PM

#

novel python what's the easiest way to drop all rows in a dataframe that a column in that dat...

oops, i made a typo. here is my message again.
once you dropped even one row, then length of the dataframe (aka the number of rows) is no longer the same, the length is less than before your drop operation, hence df['Created By: Full Name'].iloc[i] is guaranteed to blow up since i could be up to the original length - 1 because for i in range(0, len(df)):

i think you just want df[~df['Created By: Full Name'].isin(inactive_people['Full Name'])]?
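
A small self-contained sketch of that one-liner on made-up data (only the column names come from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "Created By: Full Name": ["Ann", "Bob", "Cid", "Bob"],
    "ticket": [1, 2, 3, 4],
})
inactive_people = pd.DataFrame({"Full Name": ["Bob"]})

# True for every row whose creator appears in the inactive list.
mask = df["Created By: Full Name"].isin(inactive_people["Full Name"])

# Keep the other rows; reset_index is optional and just renumbers them.
active = df[~mask].reset_index(drop=True)
```

The two frames don't need the same length, since `isin` only checks membership.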

novel python Mar 15, 2023, 4:40 PM

#

boreal gale oops, i made a typo. here is my message again. once you dropped even one row, th...

ok I feel stupid. That worked perfectly, lmao. Thanks a lot!

#

had no idea pandas had such thing, that's what I get for missing the basics

sleek shuttle Mar 15, 2023, 4:50 PM

#

Hi guys, I have a question about how to tokenize a text. is it better to use nltk or spacy?
Thanks in advance

mossy lance Mar 15, 2023, 4:53 PM

#

novel python had no idea pandas had such thing, that's what I get for missing the basics

another thing to keep in mind, if you are writing or trying to iterate over a dataframe then theres usually a better vectorised method of doing it which will be faster

olive stone Mar 15, 2023, 6:41 PM

#

Hey
I am trying to train a model on Google Colab, the training goes through 4,000 images. But when training, Colab crashes because of running out of RAM.
I tried to use batches, but it didn't work.
Any idea?

wooden sail Mar 15, 2023, 6:46 PM

#

if smaller batches don't help, you can try reducing the number of layers

bright pasture Mar 15, 2023, 6:48 PM

#

Someone told me to remove a DDP line because the code I have assumes that it's training on multiple GPUs.

#

What do I do?

mild dirge Mar 15, 2023, 6:54 PM

#

bright pasture What do I do?

First find out what is causing the issue. Check how much memory the model takes up. Then check how much memory a batch takes up. Also find how much memory is available
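
A rough way to put numbers on the batch part, as a sketch only: the image size, dtype and batch size below are assumptions, not something taken from your notebook.

```python
def array_megabytes(count, height, width, channels, bytes_per_value=4):
    """Approximate size in MB of `count` images stored as one dense array."""
    return count * height * width * channels * bytes_per_value / 1024**2

# Assumed: 4,000 RGB images at 512x512, stored as float32 (4 bytes per value).
whole_dataset_mb = array_megabytes(4000, 512, 512, 3)
one_batch_mb = array_megabytes(32, 512, 512, 3)

print(f"all images at once: {whole_dataset_mb:.0f} MB")
print(f"one batch of 32:    {one_batch_mb:.0f} MB")
```

If the whole dataset is loaded into a single array before training starts, a smaller batch size does not shrink that array. In that case the images would have to be read from disk batch by batch instead.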

#

The first thought that goes through my mind is that you might have big images, and maybe only 2-3 convolutional/pooling layers which makes for a very large weight matrix for the first dense layer
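
To show why that matters, here is a small sketch of the arithmetic. The 512x512 input, the 64 channels in the last conv layer and the 512 dense units are all assumed values, and it assumes the convolutions keep the spatial size while each pooling layer halves it.

```python
def first_dense_weights(image_side, pool_layers, last_conv_channels, dense_units):
    """Weight count of the first dense layer after `pool_layers` 2x2 poolings."""
    side = image_side // (2 ** pool_layers)        # each 2x2 pooling halves the side
    flattened = side * side * last_conv_channels   # length of the flattened vector
    return flattened * dense_units

# Assumed: 512x512 input, 64 channels in the last conv layer, Dense(512), float32.
for pool_layers in (2, 3, 5):
    weights = first_dense_weights(512, pool_layers, 64, 512)
    print(f"{pool_layers} pooling layers: {weights:,} weights, "
          f"about {weights * 4 / 1024**2:.0f} MB")
```

With these made-up numbers, going from 2 to 5 pooling layers shrinks the first dense layer by a factor of 64, and that is before counting gradients and optimizer state.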

bright pasture Mar 15, 2023, 6:59 PM

#

mild dirge First find out what is causing the issue. Check how much memory the model takes ...

I... did not understand a word you said, I'm sorry. Basically, I'm trying to run this. https://github.com/justinjohn0306/so-vits-svc-4.0-v2

I believe the train.py thing assumes that I'm doing multi gpu training, but I'm not. I only have one GPU.

GitHub

GitHub - justinjohn0306/so-vits-svc-4.0-v2: SoftVC VITS Singing Voi...

SoftVC VITS Singing Voice Conversion. Contribute to justinjohn0306/so-vits-svc-4.0-v2 development by creating an account on GitHub.

mild dirge Mar 15, 2023, 7:01 PM

#

bright pasture I... did not understand a word you said, I'm sorry. Basically, I'm trying to run...

Oh whoops sorry, pinged the wrong person, was meant for @olive stone

bright pasture Mar 15, 2023, 7:02 PM

#

All good. Would you be able to help me too?

mild dirge Mar 15, 2023, 7:03 PM

#

I have never used the model that you linked. If following the instructions gave an error, and the code is thousands of lines long, I'm not sure how to fix it either :/

wild rivet Mar 15, 2023, 7:54 PM

#

Any Risk Analysts here?

grand warren Mar 15, 2023, 8:18 PM

#

hi i am trying to make an ocr. and my plan to do it is by first thresholding the image and then separating the letters using cv2 and then predicting the separated letters, but the letter separation part is not working very well. what can i do to separate the letters?

hasty mountain Mar 15, 2023, 8:21 PM

#

grand warren hi i am trying to make an ocr. and my plan to do it is by first thresholding the...

You could try a model for object recognition.

grand warren Mar 15, 2023, 8:22 PM

#

i wanna make my own model tho

hasty mountain Mar 15, 2023, 8:24 PM

#

Make your own object recognition model

#

joe_salute

grand warren Mar 15, 2023, 8:24 PM

#

is it really that hard T^T

mild dirge Mar 15, 2023, 8:24 PM

#

You could try a clustering algorithm

grand warren Mar 15, 2023, 8:24 PM

#

whats that?

hasty mountain Mar 15, 2023, 8:24 PM

#

There might be something here that may help you:
https://learnopencv.com/?s=OCR

Optical Character Recognition using PaddleOCR

Optical Character Recognition is the process of recognizing text from an image by understanding and analyzing its underlying patterns. We will implement and compare various OCR algorithms provided by PaddleOCR

mild dirge Mar 15, 2023, 8:25 PM

#

Well after thresholding, you have hopefully a bunch of black letters on a white background, for example, and then you just want to find clusters of black pixels

#

And those clusters are then cropped out into separate images

hasty mountain Mar 15, 2023, 8:25 PM

#

I just discovered this LearnOpenCV and...well, they got the only diffusion model tutorial that really helped me, so it might be worth taking a look

grand warren Mar 15, 2023, 8:26 PM

#

mild dirge Well after thresholding, you have hopefully a bunch of black letters on a white ...

hmm

mild dirge Mar 15, 2023, 8:26 PM

#

But if you want actual good results, you want to use a premade model

grand warren Mar 15, 2023, 8:26 PM

#

but what will i end up learning if i do that?

mild dirge Mar 15, 2023, 8:27 PM

#

Well I don't know why you are making it. If you make it to learn, then yeah obviously maybe try making your own. If you are making it because you need such a model, then it's better to use a premade model.

grand warren Mar 15, 2023, 8:27 PM

#

no, it's not that i need it

#

clustering kind of sounds interesting

#

is it an ai model too?

#

which one would suit my job?

mild dirge Mar 15, 2023, 8:35 PM

#

Well I didn't mean clustering in its conventional meaning. More like using a flood fill for every black pixel you find.

#

And after finding all the connected black pixels, you crop it
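
A minimal sketch of that idea in plain Python, assuming the thresholded image is a list of rows where 1 is a black (ink) pixel and 0 is background. The function names and the tiny test image are made up for illustration.

```python
from collections import deque

def find_letter_boxes(image):
    """Return (top, left, bottom, right) boxes of connected black regions."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill outward from this unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (diagonals count as connected).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort by left edge so the letters come out in reading order.
    return sorted(boxes, key=lambda box: box[1])

def crop(image, box):
    """Cut the bounding box out of the image as its own small image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Tiny made-up "page" containing an I and a C.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = find_letter_boxes(page)
letters = [crop(page, box) for box in boxes]
print(boxes)
```

This breaks down when letters touch each other (they merge into one box) or when a letter has separate parts like the dot on an i (it splits into two boxes). OpenCV also has built-in versions of this, such as cv2.connectedComponentsWithStats and cv2.findContours, if you would rather not write the fill yourself.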

grand warren Mar 15, 2023, 8:38 PM

#

hmm

steady basalt Mar 15, 2023, 9:06 PM

#

misty flint bro it read his sloppy napkin pseudocode and created frontend code to display in...

We redundant rip humans

stuck ore Mar 16, 2023, 2:14 AM

#

hey hey hey does anybody know how i can use numpy.poly1d to output an expression

#

@ me if you do ! thank you !

violet gull Mar 16, 2023, 2:44 AM

#

this is AlexNet

#

what does the 3rd dimension represent

#

i thought it was the number of feature maps but it cant be because how can 384 turn into 256

hasty mountain Mar 16, 2023, 3:24 AM

#

violet gull i thought it was the number of feature maps but it cant be because how can 384 t...

self.conv = torch.nn.Conv2d(384, 256, 3, 1, 1, bias=False)

I don't remember the details on how the conv layer does it, but it kinda filters the input feature maps
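
Roughly, as far as I can reconstruct it: each output feature map comes from its own filter, and that filter spans all 384 input maps at once and sums over them. So the output count is just the number of filters, whatever the input count was. The sketch below only does the shape arithmetic for the layer above, assuming the default groups=1 (and bias=False, so there are no bias terms to count).

```python
in_channels, out_channels, kernel = 384, 256, 3

# One filter covers every input channel: its shape is (384, 3, 3).
weights_per_filter = in_channels * kernel * kernel

# There are 256 such filters, one per output feature map.
weight_shape = (out_channels, in_channels, kernel, kernel)
total_weights = out_channels * weights_per_filter

print(weight_shape)        # shape of the layer's weight tensor
print(weights_per_filter)  # weights feeding a single output map
print(total_weights)       # weights in the whole layer
```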

violet gull Mar 16, 2023, 3:25 AM

#

what

lapis sequoia Mar 16, 2023, 3:32 AM

#

Does someone know why I am getting std as nan

stuck ore Mar 16, 2023, 3:41 AM

#

I know this is probably a pretty simple issue but I'm a beginner and would love some help. How can I output an equation with numpy.poly1d?
It is working, but in a way that is not useful. It outputs an equation with the exponents on the line above the equation, so it looks like superscript instead of using a caret or **, and it leaves out the multiplication operator. I'm assuming it was intended to be printed and not used later in the script as an actual equation, but I need to use it as an actual equation.
This is what it's giving me:
          2
-0.01252 x + 1.026 x - 16.14
but this is what I need:
-0.01252 * x**2 + 1.026 * x - 16.14

My code:

import numpy as np

xymodels = []
time = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300]
temp = [17, 41, 66, 300]
for t in time:
    browningRate = [(-0.01217126 * t) + 0.519399, (-0.001115784 * t) - 1.21772,
                    (-0.00361034 * t) + 0.333761, (1134 * t) - 834.9]
    model = np.poly1d(np.polyfit(temp, browningRate, 2))
    print('\n'+ str(t) + ':')
    print(model)
    xymodels.append(model)
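
One possible way to get that form, as a sketch rather than a built-in numpy feature: read the coefficients off the model and build the string by hand. The poly_to_expr helper and its digits parameter are hypothetical names, and the coefficients are typed in from the printed output above so the block runs on its own.

```python
def poly_to_expr(coeffs, var="x", digits=4):
    """Build a Python-style expression from highest-degree-first coefficients."""
    degree = len(coeffs) - 1
    pieces = []
    for i, c in enumerate(coeffs):
        power = degree - i
        magnitude = f"{abs(c):.{digits}g}"   # rounded to `digits` significant figures
        if power == 0:
            body = magnitude
        elif power == 1:
            body = f"{magnitude} * {var}"
        else:
            body = f"{magnitude} * {var}**{power}"
        if not pieces:
            pieces.append(("-" if c < 0 else "") + body)   # leading term
        else:
            pieces.append(("- " if c < 0 else "+ ") + body)
    return " ".join(pieces)

# Coefficients copied from the printed model above, highest power first.
expr = poly_to_expr([-0.01252, 1.026, -16.14])
print(expr)
```

With a fitted model this would be called as poly_to_expr(model.coefficients). The string is rounded to 4 significant figures by default, so pass a larger digits value to keep more precision. If the goal is only to evaluate the polynomial later in the script, the poly1d object is already callable, for example model(50).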

flat sable Mar 16, 2023, 4:44 AM

#

hello, that's my solo python project https://github.com/veldanava/AiWaifu

GitHub

GitHub - veldanava/AiWaifu: an smart ai waifu

an smart ai waifu. Contribute to veldanava/AiWaifu development by creating an account on GitHub.

#data-science-and-ml

Mount Google Drive

Load Excel file into dataframe

Define the independent variables that you want to use to predict mortality

Define the target variable that you want to predict