#data-science-and-ml

hasty mountain
#

Guys, in the field of drug discovery...
I'm a bit used to dealing with images, so I'd very much like to use chemical formulas in SMILES labels, convert them to vectors and organize these vectors into numpy arrays to somehow simulate a chemical molecular formula of a compound.
However, I've also come to know that there's the option of using molecular graphs (which seems a bit crazy to me).

Can someone tell me which approach tends to be more promising?

#

Though this molecular graph seems to be something like a multi-dimensional vectorization

lapis sequoia
#

HELL ooo
I think it is funny btw

I am interested in making a bot that suggests stuff to me based on statistics

I guess i think i have TO learn about MACHINE LEARNING.
That might be something else.

where should i start?
Or what else i need to do?

I know I have Google, but I can't interact with it and couldn't get what I wanted.

So... yeah that is all

patent lynx
#

Hello I want to find suggestions on how to fine tune hyperparameters to predict NBA point spread?

#
```
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=3, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.03550000000000002,
             max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=20, max_leaves=3,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=600, n_jobs=-1, num_parallel_tree=None,
             predictor=None, random_state=None, ...)```
#

It fails to capture the fatter tails of my true Y

hasty mountain
#

Or maybe the max expected points for a game/tournament...

patent lynx
#

in each game, my features are already in a rolling mean/median so that I can predict the next game's stats...

#

this is done so i can predict the point spread within a sports betting website

hasty mountain
#

Uh...then maybe try some ensemble learning in another way?

hasty mountain
#

Exactly using GNNs, but with conv operations

#

Dealing with 3-dimensional molecular formulas in general, the isomers...

#

Using GANs and VAEs to generate new molecules. All those seem easier to me when dealing with n-dimensional arrays rather than simple vectors.

#

I'm trying to review/enhance TrimNet

#

A quick search (I haven't read anything in depth yet) shows me that molecular graphs are usually represented as structures that resemble molecular formulas. If I can use molecular graphs as n-dimensional arrays, then goodbye SMILES
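
A molecular graph can itself be stored as plain arrays: an adjacency matrix for the bonds plus a feature matrix for the atoms. A minimal sketch, assuming the heavy atoms of ethanol (C, C, O) as a toy molecule, a one-hot atom encoding, and a single normalized neighbor-averaging step of the kind GNN layers build on. The names `adjacency`, `features`, and `propagate` are made up for illustration and are not taken from TrimNet or any library:

```python
import numpy as np

# Toy molecule (assumption): heavy atoms of ethanol, C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# Adjacency matrix: 1.0 where two atoms share a bond
adjacency = np.zeros((len(atoms), len(atoms)))
for i, j in bonds:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Node features: one-hot over the element types [C, O]
elements = ["C", "O"]
features = np.array([[float(a == e) for e in elements] for a in atoms])


def propagate(adjacency, features):
    """One neighbor-averaging step: D^-1/2 (A + I) D^-1/2 X."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return norm @ features


hidden = propagate(adjacency, features)
print(hidden.shape)  # (3, 2): one row per atom, one column per feature
```

Unlike an image, the adjacency matrix changes size with every molecule, which is one reason dedicated graph libraries exist instead of fixed-shape array pipelines.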

limber kiln
#

Why is my torch loss going to NAN -

#
import numpy as np
import torch
from torch import nn
lr = 0.001
epochs = 100
def generate_random():
    # https://stackoverflow.com/questions/35730534/numpy-generate-data-from-linear-function
    x = np.arange(100)
    delta = np.random.uniform(-10, 10, size = (100, ))
    y = .4 * x + 5 + delta
    return x, y

class linear_regression(nn.Module):
    def __init__(self):
        super(linear_regression, self).__init__()
        self.layer = nn.Sequential(nn.Linear(1,1))
    def forward(self, x):
        return self.layer(x)

linear_model = linear_regression()

loss = nn.MSELoss()

opt = torch.optim.SGD(linear_model.parameters(), lr = lr)


x, y = generate_random()
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
print("x = ", x.shape, " y = ", y.shape)
for i in range(epochs):
    x = torch.tensor(x).to(torch.float32)
    y = torch.tensor(y).to(torch.float32)

    pred = linear_model(x)
    model_loss = loss(y, pred)
    with torch.no_grad():
        print("model_loss = ", model_loss)
    opt.zero_grad()
    model_loss.backward()
    opt.step()


#

Can someone please help?
Sorry, I am sure I am doing something really silly

#

Never mind. My learning_rate was high.
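
For anyone hitting the same NaN: with `x` running from 0 to 99 unscaled, the MSE gradient with respect to the weight is large, so plain SGD at `lr = 0.001` overshoots further on every step until the loss overflows. A minimal sketch of the comparison, assuming the same synthetic data as above; `lr = 1e-4` is just one value small enough for this particular data, and standardizing `x` first would be another option:

```python
import math

import numpy as np
import torch
from torch import nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Same kind of data as in the question: y = 0.4 * x + 5 + uniform noise
x = np.arange(100, dtype=np.float32).reshape(-1, 1)
y = (0.4 * x + 5 + rng.uniform(-10, 10, size=x.shape)).astype(np.float32)
x_t, y_t = torch.from_numpy(x), torch.from_numpy(y)


def train(lr, epochs=100):
    """Fit a 1-D linear model with SGD and return the last computed loss."""
    model = nn.Linear(1, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        loss = loss_fn(model(x_t), y_t)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


diverged = train(lr=1e-3)   # step too large for unscaled x: loss blows up
converged = train(lr=1e-4)  # smaller step: loss stays finite
print(math.isfinite(diverged), math.isfinite(converged))
```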

serene scaffold
long aspen
#

new to numpy, is there any function that can reshape an item to the same value but in the size of a different dimension?

i seem to not get my toes on...

[1, 2, 3] -> [ [1, 1, 1], [2, 2, 2], [3, 3, 3] ]
np.arange(10).???(???)

slate scroll
#

np.reshape is to change the shape of your current data but it doesn't change the total number of elements
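
For the `[1, 2, 3] -> [[1, 1, 1], [2, 2, 2], [3, 3, 3]]` case asked about above, repeating along a new axis is probably what is wanted rather than a reshape. A small sketch (the variable names are only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])

# Turn the values into a column, then repeat each one 3 times along axis 1
repeated = np.repeat(a[:, None], 3, axis=1)

# Same values as a read-only view, without copying the data
view = np.broadcast_to(a[:, None], (3, 3))

print(repeated.tolist())  # [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
```

`np.broadcast_to` is the cheaper choice when the result is only read; `np.repeat` gives an independent, writable array.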

long aspen
wary breach
#

Anyone have experience with multi-layer ensembles?

dawn light
#

i came across this video recently: https://youtu.be/_9LX9HSQkWo where they turned videos of themselves into animation by using stable diffusion (specifically, training SD to a specific art style, and training SD to recognize their faces)
is there any guide out there on how i can do something similar (just img2img tho, not video to video), i.e. i train SD on a set of images which i can then use as a style when entering prompts


i was also wondering if there's a guide out there that gives a high-level overview of SD.
Every time i hear Lora, controlnet, dreambooth, etc. i get confused with what exactly their relation to SD is (not to mention the plethora of github repos of SD and huggingface models).

agile cobalt
#

you might want to try asking in the Stable Diffusion discord server

junior schooner
#

I'm writing a python program that uses sqlite3 to allow users to create, update and view databases.
Thus far users can add data manually or from the web.

I want to add a module for data visualisation (maybe using plotly or pandas) but am unsure how or what i can implement without knowing what the data is.

For example, if the data is categorical I could use a bar chart or heat map, if it's numerical I could use a line chart or scatter plot. I also wouldn't know what headers go on what axis. Can anyone give me some suggestions of what I could implement without this information?

agile cobalt
#

most tools would leave "which header goes on what axis" up to the user to decide

#

some examples out of the top of my head slightly similar to what you are trying to do would be google sheets, excel and mode.com

hasty mountain
#

Hey guys, what's the difference between "vanilla neural networks"(MLP, convolutional...) and Graph Neural Networks?

I mean, when implementing neural networks from scratch, like from pure numpy, I can understand that there'll be no graphs involved. However, the popular deep learning frameworks(Pytorch, tensorflow/keras) use graphs by default, right? I guess that even allows for proper forward and backward pass with custom operations. So is there any difference between a GNN or a MLP or CNN when working with those frameworks?

hasty mountain
#

Hm... So "vanilla NNs" usually require padding to make the data regular, while GNNs don't?

#

Like in NLP models. Since the phrases have different lengths, padding has to be applied so that all sentences have the same length and the model can receive them as input

#

Is that it?

hasty mountain
#

I see... So, trying to work with arrays here would be assigning a specific structure to my network, which might be inappropriate...

#

I hope it isn't that difficult to work with graphs in VAEs and GANs...

young granite
spark nimbus
#

given a numpy array (or pandas dataframe) of datetime64[D], is it possible to change the day on all elements?
My end goal is to get an array or dataframe containing a datetime of the last day of the month, and I already have a numpy array of the number of days in each month.
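
One way to get month ends without needing the days-per-month array: truncate to month precision, step one month forward, and step one day back. A sketch with made-up dates, assuming the input really is `datetime64[D]`:

```python
import numpy as np

dates = np.array(["2023-01-15", "2023-02-03", "2024-02-20"], dtype="datetime64[D]")

# Truncate to the month, move to the next month, go back one day
months = dates.astype("datetime64[M]")
next_month_start = (months + np.timedelta64(1, "M")).astype("datetime64[D]")
month_end = next_month_start - np.timedelta64(1, "D")

print(month_end)
```

On the pandas side, adding `pd.offsets.MonthEnd(0)` to a datetime column is a commonly used alternative.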

boreal gale
long charm
#

In Q-learning, can the game state change?

#

I’m trying to have the snake from the snake game learn to play efficiently but the state of the grid is constantly changing

velvet bronze
#

Hello Guys I want to get into Machine Learning, I just started Numpy, I need a roadmap🥹

steep cypress
steep cypress
velvet bronze
steep cypress
simple tapir
#

Can i flatten a tensor more than once?

#

I tried flattening one twice but it still has the same shape

serene scaffold
simple tapir
#

Alright, thanks

wooden sail
#

what were you expecting to happen when calling flatten more than once?

hardy bramble
#

Hi, does anyone know how to generate a graph from a map like this, using only the orange border? I've tried with opencv and networkx but it's not working

median escarp
#

Does space science fall under this channel?

serene scaffold
median escarp
#

Ic.. actually Im working with ISS based calculations. And other things. Eg-Tracking satellites

serene scaffold
#

if people have to know (for example) astropy, it's less likely that you'll get help

merry fern
#

how do you grab the column name based on iloc instead of printing the value?
example:

```
for i in [level 2 multiindex list]:
  df[(level1, i)].iloc[1:3] <---- i want these column names```
serene scaffold
merry fern
#

that's what i ran into...

#

and .name returns the index

serene scaffold
merry fern
#

so what i'm trying to do is create a dataframe based on conditions here...

serene scaffold
#

this looks like it only has one level of indexing; do the rows have more than one level?

merry fern
#

the multiindex is: Scenario, Account

so if I pass a list as the Account, i want to look at those 2 columns (index# 1 and 2 or [1:3]) and do something

serene scaffold
merry fern
#

rows

#

this is how far i got:

serene scaffold
#

please do print(df.head().to_dict()) and put the text in the paste bin

#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

merry fern
#

i want it to return ['MC','FW','MC'] (because I need to throw an IF in there that says if the column's value is 0 to not include it)

#

hold on let me drop all that other sht its irrelevant

serene scaffold
#

@merry fern adding a ping to a message after you send it has no effect, just so you know.
please do print(df.sample(10).to_dict()) instead

merry fern
serene scaffold
merry fern
#

this is the entire dataframe

#

well, without the multiindices

#

actually no they're in there ha

#

im pretty new to multiindicies

serene scaffold
#

okay, well, I'm not following. if you can do print(df.sample(10).to_dict()) with the original dataframe within the next five minutes, we can continue.

#

so you deleted the rest of it?

merry fern
#

its irrelevant, not used

serene scaffold
#

also, please do everything as text

merry fern
#

k

serene scaffold
#

the thing is that the solution shouldn't involve any list comprehensions, so we should rewind to before you used them

#

what happened to model_para?

#

can you show that instead?

merry fern
#

df = model_para

#

im using a list comprehension because eventually im inserting the list comp into a dataframe creation

serene scaffold
#

it's very unlikely that the idiomatic solution would involve a list comprehension

merry fern
#

so here's the issue, i need to create a dataframe based on the list that is passed. if the list is 1, then it looks at the df columns 'MC' and 'FW' to see if the value is non-zero. if it's non-zero, then it includes that in a column to be created for the dataframe

#

if the list is 2 then it needs to iterate over the list and make multiple accounts

#

right now it only works with 1 which requires no logic to look

serene scaffold
#

sorry, but I don't think I can help with this.

merry fern
#

thanks anyway! im so close...

#

I'm pretty close, I just need to 1) isolate the column name, and 2) produce a list of individuals (4 in this case), rather than a list of 2

young granite
sharp herald
#

How to crop a QR code from a larger photo and decode it with pyzbar? I tried using cv2.QRCodeDetector() from python-opencv but it fails to recognize it too.

#

the qrcode is large, version 18

limber kiln
#

Why does make_dot not work here -

# %matplotlib inline

import torch
from torchviz import make_dot

import torchvision
from torchview import draw_graph
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

b = torch.sin(a)

c = 2 * b
print(c)

d = c + 1
print(d)

out = d.sum()
print(out)

make_dot(d , params=dict(a.named_parameters())).render("a_torchviz", format="png")
#

Never mind. Got it working -

# %matplotlib inline
import os
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz/bin/'
import torch
from torchviz import make_dot

import torchvision
from torchview import draw_graph
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

b = torch.sin(a)

c = 2 * b
print(c)

d = c + 1
print(d)

out = d.sum()
print(out)
make_dot(out.mean()).render("a_torchviz", format="png")

late scarab
#

Greetings! Not sure if this is the right channel, but maybe somebody can help me.

I'm pretty new to visualizing big point clouds and ran into a strange problem last week. I'm using PyCharm Pro 22.1 on my MacBook Pro (M1 Pro, 16 GB). When I render the cloud using open3d, it renders in an external window. As I understand it, this window belongs to the Python process, and the problem is that it always crashes, so I have to restart my kernel again and again. Before this I used plotly, which renders directly in the notebook (but plotly cannot work with 1_000_000+ points). Please advise: how can I fix this?

P.S. I tried to use this http://www.open3d.org/docs/latest/tutorial/Basic/jupyter.html, but no luck; my work has been stuck on this problem for 2 days.
P.P.S. I tried open3d on a Windows PC with 128 GB RAM in my lab and had no problems, but I can't use that server all the time.

digital rover
#

Not sure if I should ask about Pandas here.

Anyone here tried the 2.0 yet? Is the compatibility seamless with the numpy backend version?

patent lynx
#

@hasty mountain I explored and researched a bit. Do you know of a regression model that lets you set weights on the target variable? For example, "values 2 std away are more important than the mean". This would allow my model to be more robust to outliers. So far my best score is an R² of around 0.48, using xgboost with pseudo-Huber loss

hasty mountain
#

I don't know if I get it. You want to assign a higher weight to values that are further from the mean, with higher standard deviation?

patent lynx
#

Let's stick with "higher values further from the mean" for now; maybe my terminology isn't the best...

hasty mountain
#

If you manage to normalize your values such that your mean gets around 0, this might be useful
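
One possible reading of the idea discussed here, sketched with made-up numbers: standardize the target so its mean is about 0, then turn the distance from the mean into per-sample weights. The `1 + |z|` scheme is an assumption for illustration, not something proposed in this thread. Many regressors, including scikit-learn estimators and `XGBRegressor`, accept such weights through the `sample_weight` argument of `fit`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real target (assumption): 1000 simulated point spreads
y = rng.normal(loc=0.0, scale=12.0, size=1000)

# Standardize so the mean is ~0, then weight by distance from the mean
z = (y - y.mean()) / y.std()
sample_weight = 1.0 + np.abs(z)  # assumed scheme: farther from the mean = heavier

# The weights would then be passed at fit time, e.g.
#   model.fit(X, y, sample_weight=sample_weight)
print(sample_weight.min(), sample_weight.max())
```

Note that up-weighting the tails makes the fit more sensitive to extreme games, which is the opposite of what a robust loss like pseudo-Huber does, so the two pull in different directions.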

patent lynx
#

Thanks I'll try it

patent lynx
#

It is bimodally distributed, because an NBA game can't end in a draw.

hasty mountain
#

That's surely a dataset that I don't understand, so...double check if my suggestions make sense

glossy moth
#

Hey! Is there an optimal ratio of positive to negative data when training a model, and does it depend on model type?

lapis sequoia
iron basalt
iron basalt
glossy moth
latent spire
#

where would i go to rent a ai learning based vps

iron basalt
hardy bramble
# lapis sequoia A graph how? Do you want to find a connection with the place and some other vari...

I want to extract the dotted shape, measure the area of the shape and divide it into two equal parts, but I can't extract the shape because opencv recognizes the other icons as shapes too. Or at least I don't know how, I need to do it through images. I have also tried to create a graph with networkx.

This is my code

from datetime import datetime

import cv2
import numpy as np

# Random name
name = datetime.now()
name = "resources/result/" + str(name.timestamp()) + ".jpg"

# Load image
img = cv2.imread('resources/mapa_colonia_2.png')

# Convert BGR to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Define the range
lower_red = np.array([0, 50, 50])
upper_red = np.array([10, 255, 255])
lower_red2 = np.array([170, 50, 50])
upper_red2 = np.array([180, 255, 255])

# Create a mask
mask_red = cv2.inRange(hsv, lower_red, upper_red)
mask_red2 = cv2.inRange(hsv, lower_red2, upper_red2)
mask = cv2.bitwise_or(mask_red, mask_red2)

# Mask to original img
res = cv2.bitwise_and(img, img, mask=mask)

# Convert to grayscale
gray = cv2.cvtColor(res, cv2.COLOR_BGR2GRAY)

# Canny Filter
edges = cv2.Canny(gray, 100, 200)

# Find Contours
contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Draw on original image
cv2.drawContours(img, contours, -1, (184, 7, 166), 2)

# Save the image
cv2.imwrite(name, img)
iron basalt
hardy bramble
iron basalt
hardy bramble
iron basalt
hardy bramble
iron basalt
#

You can also try applying a blur (to connect the dots).

glossy moth
hardy bramble
iron basalt
iron basalt
glossy moth
hard birch
#

I have wine data and I'm trying to use regression to predict quality

#

the quality data seems to be multimodal and I want to use a regression that is better suited to such tasks

#

I'm really new to this so anyone have advice

#

I'm using sklearn but if tensorflow is better equipped for this please do tell

severe trellis
#

I'm looking for a graphing lib that can create modern/elegant looking graphs. Is matplotlib a good choice for this? Here's an example of what I'd consider modern/elegant

wooden sail
#

matplotlib can make all of these, but they won't look as pretty by default

#

maybe check out seaborn (which wraps matplotlib) or plotly if you want something that looks pretty out of the box

severe trellis
#

Ah, I don't mind learning matplotlib, so I'll give the configuration a shot

#

Seems like a really useful skill

wooden sail
hard birch
#

This is the data I have, winequality-red.csv to be exact

#

This is the seaborn plot of the red date

#

data

ripe sapphire
severe trellis
#

How would I go about creating a trendline for a graph where the x axis is just a time (in this case a dict, the key is a datetime.datetime) and the y-axis is the actual numerical value.
All the existing trendlines I've seen seem to depend on a numerical x-axis.

mild dirge
#

As long as the times are uniformly spaced (equal time between values) then you can just calculate a moving average that calculates the average of the last x values.
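
On the point about trendlines needing a numerical x-axis: a common workaround is to convert the datetimes to numbers (for example, hours since the first sample) and fit on those. A minimal sketch with made-up data, assuming a dict keyed by `datetime.datetime` as described above and a straight-line fit via `np.polyfit`:

```python
from datetime import datetime, timedelta

import numpy as np

# Made-up data (assumption): player counts sampled once per hour
start = datetime(2023, 4, 1)
players = {start + timedelta(hours=h): 100 + 2 * h for h in range(24)}

# Convert the datetime keys to hours elapsed since the first sample
times = sorted(players)
x = np.array([(t - times[0]).total_seconds() / 3600 for t in times])
y = np.array([players[t] for t in times], dtype=float)

# Fit a straight line and evaluate it at the same time points
slope, intercept = np.polyfit(x, y, 1)
trend = slope * x + intercept
```

`trend` can then be plotted against the original datetimes (`times`), so the axis still shows dates.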

#

@severe trellis

hasty mountain
#

Hey guys, in CrossEntropyLoss function as defined in Pytorch docs:
It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

What is the idea of this weight argument? The more of a specific class I have in my unbalanced dataset, the lower should be the weight I assign to it?
What improvement does this provide?

#

Oh, wait... I just remembered that I could also make this weight a learnable parameter.

mild dirge
#

Yeah, so if your optimization is based solely on accuracy, and 99% of your data is of class "apple" for example, then your model will perform very well by just changing the weights such that it will always give apple, even if the image is an orange, because it would still get a 99% accuracy.

#

Changing the weight of each class means that it will make the apple class less important to optimize, such that the model must also optimize getting "orange" right.
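
A small sketch of how such weights are often set up in PyTorch, assuming inverse-frequency weighting (one common heuristic, not the only option) and a toy 9:1 label split:

```python
import math

import torch
from torch import nn

# Toy imbalanced labels (assumption): nine "apple" (0), one "orange" (1)
labels = torch.tensor([0] * 9 + [1])

# Inverse-frequency weights: the rarer the class, the larger its weight
counts = torch.bincount(labels, minlength=2).float()
weights = counts.sum() / (len(counts) * counts)

loss_fn = nn.CrossEntropyLoss(weight=weights)

# Uninformative logits: every sample contributes ln(2) before weighting
logits = torch.zeros(len(labels), 2)
loss = loss_fn(logits, labels)
print(weights, loss.item())
```

With these weights, the single orange sample counts for as much in the loss as all nine apples combined, so always predicting "apple" stops being a cheap way to score well.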

hasty mountain
#

Oh... then I guess making this a learnable parameter might allow my model to cheat

mild dirge
#

I'm not sure how that works, changing the parameter that determines how the score is calculated seems weird

hasty mountain
#

I mean, it could simply assign a very high weight to the class it's predicting the most

mild dirge
#

right

#

Another solution is just balancing the data

#

By undersampling/oversampling and augmentation etc.

#

But that could give mediocre results as well

hasty mountain
#

And one model makes the classification, another makes the optimization of those weights in crossentropy

mint palm
dusty valve
#

i made a cnn that takes input shape of (128, 128, 3), and i wanted to test it on an image of myself. i took an image, reshaped it to (128, 128, 3) and displayed it in plt, and the shape of the encoded array also said it was (128, 128, 3), but the error says it's (32, 128, 3)

#
```
shape is (128, 128, 3)
Traceback (most recent call last):
  File "C:\Users\owner\OneDrive\Desktop\python\r-u-a-10\test.py", line 13, in <module>   
    print(np.argmax(model.predict([data])))
  File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler    
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\owner\AppData\Local\Temp\__autograph_generated_file5cgx781m.py", line 15, in tf__predict_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2137, in predict_function  *
        return step_function(self, iterator)
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2123, in step_function  **  
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2111, in run_step  **       
        outputs = model.predict_step(data)
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\training.py", line 2079, in predict_step
        return self(x, training=False)
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "C:\Users\owner\AppData\Roaming\Python\Python310\site-packages\keras\engine\input_spec.py", line 295, in assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer "r-u-a-10" is incompatible with the layer: expected shape=(None, 128, 128, 3), found shape=(32, 128, 3)```
mild dirge
#

You need to put that image into a list

dusty valve
#

i did

#

print(np.argmax(model.predict([data])))

mild dirge
#

Is that the first layer that gives the error?

dusty valve
#

yes

mild dirge
#

So the shape of what you give it is (1, 128, 128, 3) ?

dusty valve
#

no

mild dirge
#

It should be

dusty valve
#

well yes

mild dirge
#

Is it (1, 128, 128, 3) or (128, 128, 3) ?

woven berry
#

idk if this is the right channel, but in matplotlib how do i make the arrow visible over the axhline and axvline lines?

dusty valve
mild dirge
#

Hmm, well I doubt the model is lying, did you print the shape of the input before the line that predicts it?

dusty valve
#

i dunno where it says 32

mild dirge
#

!paste

mild dirge
#

Show the code

dusty valve
#

okay

#
```py
from PIL import Image
import numpy as np
from keras.models import load_model
from keras import Sequential
from matplotlib import pyplot as plt
model: Sequential = load_model('./r-u-a-10')  # type: ignore
image = Image.open(r'C:\Users\owner\Pictures\Camera Roll\TEST.jpg')
data = np.array(image.resize((128, 128)).convert('RGB').getdata(), np.uint8).reshape((128, 128, 3))
image.close()
plt.imshow(data)
plt.show()
print(data.shape)
print(np.argmax(model.predict([data])))```
mild dirge
#

.reshape((128, 128, 3))

mild dirge
mystic aspen
mild dirge
#

That doesn't make sense

dusty valve
mild dirge
#

Oh hmm right

dusty valve
#

otherwise it is 128, 128, 3

mild dirge
#

Can you just do .reshape((1, 128, 128, 3)) and remove that [] part?

dusty valve
#

okay

mild dirge
#

I doubt it makes a difference, but maybe it really expects a np array

dusty valve
#

thanks, it worked

mild dirge
#

Oh, guess it just doesn't accept a list

#

Weird about the 32 though
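
For reference, a sketch of the fix that worked above: add the batch axis to the array itself instead of wrapping it in a list. A possible explanation for the 32 (an assumption, not confirmed in this thread): Keras may treat the list as a list of input arrays, so the 128 rows of the single image become 128 samples, which `predict` then splits into batches of its default `batch_size` of 32, giving (32, 128, 3). The zero array below is a stand-in for the real image:

```python
import numpy as np

# Stand-in for the resized RGB image (assumption): shape (128, 128, 3)
data = np.zeros((128, 128, 3), dtype=np.uint8)

# Add a leading batch axis so the model sees a batch of one image
batch = np.expand_dims(data, axis=0)
print(batch.shape)  # (1, 128, 128, 3)

# The prediction call would then be:
#   model.predict(batch)
```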

dusty valve
#

yes

royal hound
#

so imbalanced dataset

#

what do again i forgot

severe trellis
mild dirge
#

Different values for the rolling average give different results

#

Value of 1 gives original graph

#

value of 10 gives a bit more smoothing

#

value of 100 gives a lot more smoothing

#

It's a choice you have to make to get the results you want

#

And if the period of the rolling average is a day, that doesn't mean you calculate it only once per day

#

But at each timepoint you get the data from then until 24 hours before, and take the average

severe trellis
#

I'm confused. I've got a key-value mapping of datetimes to how many players were online at that time. I would just split this into 7 days (since this is data for a week) and plot each point on the graph as the trendline, so you'd be able to see the day-to-day difference. For example:
Plot 1: Average number of players on day 1
Plot 2: Av. on day 2
etc.

Is this what a rolling average is?

mild dirge
#

No

#

Do you know numpy?

severe trellis
#

On a fundamental level, yes

mild dirge
#

Let me make an example

#
import numpy as np
import matplotlib.pyplot as plt


def moving_average(ys, window):
    # The moving average has the same length as the original dataset
    result = np.zeros(ys.shape[0])

    # For each point, average the last `window` points (including the current one)
    for i in range(ys.shape[0]):
        result[i] = np.mean(ys[max(0, i - window + 1):i + 1])

    return result


# Defining the y values
ys = np.zeros(1000)
for i in range(1, ys.shape[0]):
    ys[i] = np.random.normal(ys[i-1], 1)

# Plotting for different window sizes
fig, ax = plt.subplots(1)
ax.plot(ys, label='original')
ax.plot(moving_average(ys, 10), label='Window size of 10')
ax.plot(moving_average(ys, 50), label='Window size of 50')
ax.plot(moving_average(ys, 200), label='Window size of 200')
plt.legend()
plt.show()
#

@severe trellis

#

This is not the most efficient way of doing it, but just to illustrate how the moving average is calculated

#

So even though the window size is set to 200 for the last call, it still calculates it for all 1000 points

#

Not just every 200th point

severe trellis
#

Tysm for that, it really makes sense. Quite happy with my first graph, what do you think?

queen cradle
#

@severe trellis In the statistics literature, you have a time series, and what you're looking for is called a "smoothing".

mild dirge
#

Yeah that looks good

#

If you want to do this for more data, definitely use cumsum to get the total of points between two time points
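
A sketch of the cumsum idea: the sum over any window is the difference of two cumulative sums, so the loop disappears. This assumes a trailing window that is shorter at the start of the series, and the function name `moving_average_cumsum` is made up here:

```python
import numpy as np


def moving_average_cumsum(ys, window):
    """Trailing moving average over the last `window` points, without a Python loop."""
    ys = np.asarray(ys, dtype=float)
    # csum[k] holds the sum of the first k values, with csum[0] == 0
    csum = np.concatenate(([0.0], np.cumsum(ys)))
    end = np.arange(1, len(ys) + 1)
    start = np.maximum(end - window, 0)
    # Window sum = difference of cumulative sums; divide by the points actually used
    return (csum[end] - csum[start]) / (end - start)


print(moving_average_cumsum([1, 2, 3, 4], 2))  # [1.  1.5 2.5 3.5]
```

For very long series with large values, the cumulative sum can lose some floating-point precision, so a convolution-based or pandas-based rolling mean may be preferable there.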

queen cradle
#

There are a lot of ways of smoothing. Sliding windows are good. Exponentially weighted moving averages are good. A lot of filters from signal processing (scipy.signal.windows) are good for this purpose.

mild dirge
#

Pandas also has a moving average method iirc
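
A sketch of the pandas route, with made-up hourly data standing in for the datetime-keyed dict from earlier: `rolling` accepts a time-based window such as `"1D"` on a datetime index, which matches the "everything from the last 24 hours, at every time point" description, and `ewm` gives the exponentially weighted variant. The `alpha` value is an arbitrary choice:

```python
from datetime import datetime, timedelta

import pandas as pd

# Made-up data (assumption): one sample per hour for three days
start = datetime(2023, 4, 1)
players = {start + timedelta(hours=h): float(h % 24) for h in range(72)}

series = pd.Series(players)
series.index = pd.to_datetime(series.index)
series = series.sort_index()

# Mean over the trailing 24 hours, evaluated at every sample
rolling_1d = series.rolling("1D").mean()

# Exponentially weighted moving average; smaller alpha = smoother
ewma = series.ewm(alpha=0.2).mean()
```

Both results have one value per original sample, so they can be plotted directly on top of the raw series.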

queen cradle
#

For a first try, I would usually recommend an exponentially weighted moving average.

severe trellis
#

Ah, understood. Thank you guys so much!

tidal bough
#

this trendline looks weird to me; the peaks on it aren't where the peaks on either of the graphs are

mild dirge
#

That is also partially why ema would be better

wooden sail
#

you can correct that by using a zero phase filter

tidal bough
wooden sail
#

applying a filter in both directions doubles its order and corrects the phase shift

mild dirge
#

So it averages the moving average in both direction?

wooden sail
#

yep

mild dirge
#

Ah that's a pretty creative trick

wooden sail
#

you can derive it in closed form too, but the impulse response is not causal. some people don't like this 😛 it entails padding the end and beginning with zeros and then truncating

#

alternatively, filter in one direction, reverse the result, filter again, and reverse one last time

#

same result, different "interpretation"
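
The forward, reverse, forward, reverse recipe can be sketched as follows, assuming a simple one-pole smoothing filter and a synthetic bump as the signal (both made up for illustration). `scipy.signal.filtfilt` is the library routine for this kind of forward-backward filtering; it additionally pads the edges, so its output near the ends differs from this manual version:

```python
import numpy as np
from scipy.signal import lfilter

alpha = 0.2  # assumed smoothing factor
b = np.array([alpha])
a = np.array([1.0, -(1.0 - alpha)])

# Synthetic signal (assumption): a symmetric bump centered at index 100
n = np.arange(201)
signal = np.exp(-((n - 100) ** 2) / (2 * 10.0 ** 2))

# One causal pass: smooths, but the peak arrives late
one_pass = lfilter(b, a, signal)

# Second pass over the reversed result, then reverse back: the lag cancels
two_pass = lfilter(b, a, one_pass[::-1])[::-1]

print(np.argmax(signal), np.argmax(one_pass), np.argmax(two_pass))
```

The single pass puts the peak after index 100, while the two-pass result puts it back at 100, which is the phase correction described above.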

queen cradle
# severe trellis Ah, understood. Thank you guys so much!

Here, try this:

import numpy as np
import scipy.signal

alpha = 0.5

data = np.zeros(20)
data[0] = 1.0

a = np.array((1.0, -(1.0 - alpha)))
b = np.array((1.0,))

scipy.signal.lfilter(b, a, data)

Replace data with your actual data. Try different values of alpha until you find one you like.
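
A variation on the snippet above, under the assumption that a standard exponentially weighted moving average is what is wanted: with `b = (1.0,)` the filter's steady-state gain is `1 / alpha`, so changing `alpha` also rescales the output. Setting `b = (alpha,)` keeps the output on the same scale as the input, and `alpha` then only controls the amount of smoothing (between 0 and 1, smaller means smoother):

```python
import numpy as np
import scipy.signal

alpha = 0.2  # assumed value; use 0 < alpha <= 1

# y[n] = alpha * x[n] + (1 - alpha) * y[n-1]
b = np.array((alpha,))
a = np.array((1.0, -(1.0 - alpha)))

# A constant input should settle at the same constant (unit gain at DC)
step = np.ones(200)
smoothed = scipy.signal.lfilter(b, a, step)
print(smoothed[0], smoothed[-1])
```

The output starts at `alpha` and climbs toward 1.0, because the filter begins from a zero state.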

wooden sail
#

you like having that capacitor-feel in your filters, huh

tidal bough
#

but the impulse response is not causal. some people don't like this
yeah, it's slightly concerning. but I guess it's probably not possible to get an accurate phase without breaking causality

wooden sail
#

the future is now, old man. digital signals can be shifted arbitrarily

tidal bough
wooden sail
#

i actually laughed out loud

severe trellis
#
```py
alpha = 1.4
data = list(steam_data.values())
a = np.array((1.0, -(1.0 - alpha)))
b = np.array((1.0,))
data = scipy.signal.lfilter(b, a, data)```
queen cradle
#

Those are low-pass filters.

rich spindle
#

how would i start if i wanted to just make a simple AI ? i haven't ever tried one before

#

like say i just wanna generate text or something

mild dirge
#

That's not simple AI

#

The start would be something like linear regression or a perceptron, then making a multi-layer perceptron etc.

#

Generating text is already quite hard

#

But it really depends on what you want to do. Like making it from scratch, or just use a toolbox that has premade models

rich spindle
#

hm

iron basalt
rich spindle
#

just random stuff for fun

iron basalt
rich spindle
#

kk

dusk knot
#

I would like my function to be usually called with ndarrays as values for the parameters, but I was wondering if I can also allow for scalar values to be passed to the function where possible/applicable. I already tried it out with some different calls to the same function, it works. However, that said, I now arrived at my above question about the type hinting.
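
One way this is sometimes annotated, as a sketch: a union of `float` and `numpy.typing.NDArray` for the parameters, with `np.asarray` inside so both cases take the same code path. The alias `FloatOrArray` and the function `scale` are made-up names, and `numpy.typing.ArrayLike` would be a broader alternative that also admits lists:

```python
from typing import Union

import numpy as np
import numpy.typing as npt

# Hypothetical alias: either a plain float or a float64 array
FloatOrArray = Union[float, npt.NDArray[np.float64]]


def scale(x: FloatOrArray, factor: FloatOrArray = 2.0) -> npt.NDArray[np.float64]:
    """Multiply x by factor; scalars come back as 0-d arrays."""
    return np.asarray(np.multiply(x, factor), dtype=np.float64)


print(scale(3.0))                   # 0-d array holding 6.0
print(scale(np.array([1.0, 2.0])))  # array holding 2.0 and 4.0
```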

glossy moth
#

I am still confused about weighing the pros/cons of a huge unbalanced set and judging with accuracy or using a subset that is much smaller but more balanced

mild dirge
#

Accuracy is not a good measure if you want your model to perform on all classes

#

There's multiple ways to deal with unbalanced data, you should look up image augmentation, undersampling, oversampling etc.

glossy moth
dusty valve
#

!d pandas.Series.shift

prime hearth
glossy urchin
#

hi

#

is there any way to check if a value exists in a column and then return the other values of that row?

agile cobalt
#

can you show an example of what you mean?

glossy urchin
#

yes

#

so lets say i get an area code from the user

#

i check to see if the area code exists

#

then get access to item code and stuff

agile cobalt
#

uh, showing the describe() result without showing the actual data is broadly speaking not very useful

glossy urchin
#

oh its not my code

#

its an image off of google

agile cobalt
#

if possible, it would be useful to have a minimum example of what the input looks like and what do you want for the output to look like

#

kinda like how you would format a unit test

glossy urchin
#

so like lets say i input 125.449411

agile cobalt
#

doesn't have to be the real data, just formatted like it

glossy urchin
#

oh this was a bad example

#

but pretend the leftmost column doesnt exist

#

and i input that

#

i check in area code column if that value exists

#

if it does i get access to that row

#

like item code , element code , etc.

agile cobalt
#

again (more straightforward this time...), give a full example of what the dataframe would look like in a way that I could load it with pandas for testing

glossy urchin
#

yes

#

check to see if value is in column p

#

then i access the values in the row

#

if it is there

#

like if value is name

#

then i want the q and v for it

#

to be able to access it

#

or do i have to make a dictionary with p and the index

#

or how do i iterate through a column

agile cobalt
#

at that point you might as well load it into a dictionary instead of a dataframe

#

but assuming that there are no duplicated values, one option would be just using set_index

#

!e ```py
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': ['A', 'B', 'C'],
'C': [True, False, True],
})
dict_like = df.set_index('B')
print(dict_like.loc['B'])
```

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | A        2
002 | C    False
003 | Name: B, dtype: object
agile cobalt
#

(the returned object is a Series btw - the index being the previous dataframe's columns, the values being the corresponding values, and the name being the key)

glossy urchin
#

can i dm you and show exactly what im trying to do

agile cobalt
#

nah, I'll leave it at that and let you figure out how to integrate as well as recommend for you to look further into the documentation to understand it better if you are not very used to how data flows in pandas

glossy urchin
#

ok

agile cobalt
#

do check out the pandas User Guides if you haven't yet

glossy urchin
#

got it

#

would mapping the names to an index be a valid solution

#

i could do it in o(1)
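A minimal sketch of that name-to-index mapping in plain Python, assuming the values in `p` are unique. The column names `p`, `q`, `v` follow the question; the rows themselves are made up.

```python
# Made-up rows standing in for the dataframe.
rows = [
    {"p": "alice", "q": 10, "v": 1.5},
    {"p": "bob", "q": 20, "v": 2.5},
    {"p": "carol", "q": 30, "v": 3.5},
]

# Build the mapping once: O(n). Each lookup afterwards is O(1) on average.
index_by_p = {row["p"]: i for i, row in enumerate(rows)}


def lookup(name):
    """Return the full row for `name`, or None if it is not in column p."""
    i = index_by_p.get(name)
    return None if i is None else rows[i]


print(lookup("bob"))
print(lookup("zed"))
```

With duplicate values in `p` the dict would silently keep only the last row for each key, so that case needs a list of indices per key instead.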

severe trellis
severe trellis
#

With an alpha of 1 it does this

#

Editing the alpha just moves it up or down, e.g. with a value of 0.8:

wooden sail
#

looks about right

#

the more you increase alpha, the closer the result will be to the average of all the points

#

and you don't get a shift now

severe trellis
wooden sail
#

like the dotted line?

grizzled barn
#

I’m interested in writing a computer vision Poker program that looks at all the cards on a table (dealers, yours, and other players’) and then makes a decision based on whether you should fold, check, or up your bet. I feel like this would be a really easy program to write, but could anyone with experience in computer vision give their thoughts?

tawdry ruin
#

I am curious to know how power BI works, it looks like an integration of features of SQL, matplotlib. Any information on this?

severe trellis
wooden sail
#

try smaller values of alpha to increase the amount of smoothing

severe trellis
#

I don't know how to explain this in more detail, the alpha just offsets the trendline when I adjust it. It does nothing in regards to smoothing

#

It's quite literally a clone of the original data, and being offset by the alpha

#

Whether or not it's meant to do this, idk, but the screenshots I sent above demonstrate what happens when the alpha is adjusted

wooden sail
#

any chance you can share the original data?

#

if not, i'll try to make a synthetic example

arctic wedgeBOT
severe trellis
wooden sail
#

!e

import numpy as np
import scipy as sp
import scipy.signal  # explicitly load the submodule so sp.signal is available on older SciPy
import matplotlib.pyplot as plt

t = np.linspace(0, 7, 300)
Nt = len(t)
s = np.zeros(Nt)

for n in range(1,4):
    s += np.sin(2*np.pi*n*t + np.pi*3/2)/n
trend = 3*np.cos(2*np.pi*t/9)
s += trend
s += np.random.normal(loc=0.0, scale=0.2, size=Nt)

plt.plot(t, s)
plt.plot(t, trend)
legends=["observed", "trend"]

alphas = [0.5, 0.2, 0.08]
for alpha in alphas:
    a = np.array((1.0, -(1.0 - alpha)))
    a /= sum(a)
    b = np.array((1.0,))
    filtered = sp.signal.filtfilt(b, a, s)
    plt.plot(t, filtered)
    legends.append(f"{alpha=}")
    

plt.legend(legends)
plt.savefig("example.png")
#

ah man

arctic wedgeBOT
#

@wooden sail :warning: Your 3.11 eval job timed out or ran out of memory.

[No output]
wooden sail
#

smh i'll just paste the image

tidal bough
#

oh, I forgot the bot has scipy
that allows for a lot of shenanigans

wooden sail
#

it's a little messy, but hopefully you get the idea

#

smaller values of alpha decrease the cutoff frequency of the low-pass exponential filter

severe trellis
#

do you just happen to know all this off the top of your head?

wooden sail
#

if alpha is too small you get only a straight line, but you should be able to tune it to taste

wooden sail
tidal bough
#

(and also there was a discussion of phase correction for moving averages yesterday 😉 )

wooden sail
#

the data is made up, but hopefully you find it more or less convincing. i tried to make it look similar to yours

#

if electrical circuits mean anything to you, an exponentially weighted moving average filter is the discrete form of what a resistor-capacitor circuit does

tidal bough
#

I wonder if a better way to fit this curve would be fourier fitting, or however it's called. It looks like it's basically a big sinusoid plus a small sinusoid plus noise

wooden sail
#

that's basically what i assumed would work, but i made up the weights lol

tidal bough
#

oh wait, lmao, that's how you're generating it, no wonder it'd work

wooden sail
#

yeah lol

#

you can never get it to match super cleanly this way btw, i should warn you

#

you'd need a parametric estimator or a fancy nonparametric one. but low/bandpass filtering can give you a rough idea

#

at any rate, the TL;DR is, try making alpha smaller. and don't forget to divide the a vector by the sum of its entries. otherwise the result gets scaled weirdly
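For reference, here is a dependency-free sketch of the same exponential filter written as a recurrence. Normalizing the `a` vector by its sum is what gives the recurrence unit gain for a constant input. Unlike `filtfilt`, this version runs forward only, so it lags the data; the test signal is made up.

```python
def ema(signal, alpha):
    """y[n] = alpha * x[n] + (1 - alpha) * y[n-1], seeded with the first sample."""
    out = [signal[0]]
    for x in signal[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def spread(values):
    """Peak-to-peak range, used here as a crude measure of leftover wiggle."""
    return max(values) - min(values)


# Made-up signal: a constant level of 5 with a fast +/-1 oscillation on top.
noisy = [5.0 + (1.0 if i % 2 else -1.0) for i in range(200)]

for alpha in (0.5, 0.2, 0.08):
    # Skip the first samples so the start-up transient does not dominate.
    print(alpha, spread(ema(noisy, alpha)[50:]))
```

Smaller alpha leaves less of the fast oscillation in the output, which is the "more smoothing" behaviour described above.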

trail yacht
#

Make me master data science and python!

severe trellis
#

How does the trend = work here, I assume this bases it solely off of the t you pass into the formula for it, rather than the data you gave it. Though, doing this doesn't quite work for me trend = 3*np.cos(2*np.pi*np.array(list(steam_data.values()))/9)

wooden sail
#

well, i looked at your data, and just "by eye" and also from the image with the dotted line that you shared, it just "looked" like the overall trend was a slow oscillation

#

so i made up a random low frequency sinusoid and added it to the data

#

this is not what your data does exactly though, so just ignore that part

severe trellis
#

Ohh, I see. This data can vary, since it's just the number of players/twitch viewers for a game on a given week. It would vary a bit

wooden sail
#

that's just how i generated data for myself to test the filters

#

you can ignore everything before the line where i declare the alphas

#

since you get your data directly from somewhere else

severe trellis
#

ah oops, I understand. That trendline was just eyeballed, the alphas are the actual generated ones

#

Woah this is perfect, thank you so much. 100% need to start learning these data science tools 🤩

wooden sail
#

did it work better with any of those alphas?

simple tapir
#

So, this is a visualisation of a CNN architecture and I wonder why we use 4 layers here? I mean, the first 2 aren't enough for the machine to learn?

wooden sail
#

my best answer is "test it and see", because deep neural networks are in general not explainable 😛

#

try removing the layers and see how it performs

simple tapir
#

Alright, lemme try 😄

#

Thanks

severe trellis
mild dirge
uneven mist
#

Hello all! I'm starting with NN and wanted to start with MNIST Handwritten Digits recognition from scratch. Could anyone recommend good sources where the programming steps are explained well?

mild dirge
#

If you really want to learn how to do it from scratch, I would mostly look into the theory. Understand that each layer can be represented as a weight matrix. Understand how a forward pass is done by a dot product of the input/feature map with the weight matrix. How you can add a bias by concatenating a 1 to the input vector and adding an extra column to the weight matrix. Understand backpropagation, and how the chain rule works. @uneven mist

#

And if you understand that, you only really need to look up some numpy functions to get it into code. I would personally not look up a tutorial for writing a NN from scratch, as they will just show you the entire completed code, and you won't learn as much. And often those tutorials contain many mistakes from my experience.

#

This video might also be good, it goes over most of the maths and stuff, but it's coded in a language other than Python, so you can't just straight up copy it
https://www.youtube.com/watch?v=hfMk-kjRv4c&t=1s

Exploring how neural networks learn by programming one from scratch in C#, and then attempting to teach it to recognize various doodles and images.

Source code: https://github.com/SebLague/Neural-Network-Experiments
Demo: https://sebastian.itch.io/neural-network-experiment

mild dirge
#

And if you ever get stuck, you could just always ask here

severe trellis
#

Sebastian Lague makes some great videos

severe trellis
#

Is matplotlib meant to be used in run_in_executor()? I save the figure as a BytesIO buffer, and send it. But each time I use the command, a chunk of memory gets used, and never gets released. Calling this multiple times can bring it down from 600-200, and it may jump up to 500 later, etc.

wooden sail
#

that could be a good way of managing it. pyplot will at least keep the latest axis in memory

#

you could be careful in deleting axes and closing figures as you go along
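One way to sidestep pyplot's global figure registry entirely is to build the figure with `matplotlib.figure.Figure` directly, so nothing is left registered once the function returns. This is a sketch under that assumption; `plot_to_buffer` and its arguments are hypothetical and do not reproduce the `plot_stat_graph` function from the question.

```python
from io import BytesIO

from matplotlib.figure import Figure


def plot_to_buffer(xs, ys) -> BytesIO:
    """Render a line plot to an in-memory PNG without touching pyplot."""
    # A Figure created this way is not tracked by pyplot, so it can be
    # garbage-collected as soon as the last reference to it goes away.
    fig = Figure(figsize=(6, 3))
    ax = fig.subplots()
    ax.plot(xs, ys)

    buf = BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)  # rewind so the caller can read from the start
    return buf
```

If the code has to stay on `plt.figure()` or `plt.subplots()`, calling `plt.close(fig)` after saving serves the same purpose.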

digital fog
#

Does anyone have any experience of measuring the greeks using binomial tree option pricing theory? Got a project and could do with a code review to double check the code for delta and gamma.

severe trellis
tidal bough
#

i think it does clean them up if you're using a non-interactive backend

severe trellis
#

Perhaps the buffer isn't getting cleaned up?
```py
graph: BytesIO = await self.bot.loop.run_in_executor(None, plot_stat_graph, steam_data, twitch_data, formatter)
f = discord.File(graph, filename='graph.png')
e = discord.Embed()
e.set_image(url=f'attachment://{f.filename}')

await ctx.send(embed=e, file=f)
```
strange igloo
#

Hi Everyone, Happy Sunday

What are some popular charting libraries that are easy to use and well established? I'm trying to stay away from ones that are trends and then die out

tidal bough
#

matplotlib, for sure

wooden sail
strange igloo
#

I was looking for something less complex than matplotlib

wooden sail
#

but it looks like you're passing the graphs to the bot? could the memory usage be the bot holding on to all the images?

wooden sail
#

it's the customization that gets tricky

strange igloo
#

You're right. I've used matplotlib a bit. It's just so hard to get it to do custom visuals.

severe trellis
hasty mountain
#

Hey guys, I want to compute a metric that might allow me to have a better idea on how my neural network gradients are behaving(without having to plot their values).
Is it a good idea to compute the mean of those gradients after each iteration(each batch), and, after an epoch is concluded, sum all those means? The idea is to make it such as the closer this result is to 0, the lower the optimization being performed(result = 0 would be an optimal or vanishing gradients case)
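One thing to watch with a plain mean is that positive and negative gradient entries cancel, so a value near 0 does not necessarily mean the gradients are small. The sketch below contrasts the signed mean with the L2 norm on made-up numbers; it only illustrates that one point and is not a recommendation of a specific monitoring scheme.

```python
import math


def signed_mean(grads):
    """Plain average of the gradient entries (signs can cancel)."""
    return sum(grads) / len(grads)


def l2_norm(grads):
    """Euclidean norm of the gradient entries (insensitive to sign)."""
    return math.sqrt(sum(g * g for g in grads))


# Made-up gradient values for one batch: clearly non-zero, but symmetric.
grads = [0.5, -0.5, 0.25, -0.25]

print(signed_mean(grads))  # 0.0
print(l2_norm(grads))
```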

tidal bough
misty lava
#

Anyone familiar with Tweepy and Twitter API?

serene scaffold
misty lava
#
----> 1 MyStreamListener=MyStreamListener()
      2 MyStream=tweepy.Stream(
      3     bearer_token=credentials.BEARER_TOKEN,
      4     auth=api.auth,
      5     listener=MyStreamListener,
      6 )
      7 MyStream.filter(languages=["en"],track=settings.TRACK_WORDS)

TypeError: StreamingClient.__init__() missing 1 required positional argument: 'bearer_token'

getting this error, would appreciate any help smile

serene scaffold
#

though it looks like StreamListener has been removed from the newest version of tweepy

misty lava
#

running

MyStreamListener(bearer_token=credentials.BEARER_TOKEN)

<main.MyStreamListener at 0x2640094b130>

#
class MyStreamListener(tweepy.StreamingClient):
    def on_connect(self):                                       # DISPLAYS "CONNECTED" ONCE CONNECTED
        print("Connected")
serene scaffold
boreal gale
fleet river
#

Hi, I just wanted to ask if spacy is a good option to start with Machine learning?

serene scaffold
fleet river
#

So spacy is good then?

devout oak
#

Hey, I need to build a NN from scratch for an assignment of mine. Can you guys point me to some good resources to help me do the assignment?

devout oak
fleet river
#

For Neural Thinking... I had BrainJS

#

have*

devout oak
devout oak
#

when i started out i did use spacy and then moved to nltk

#

so go ahead

spare pollen
#

hey, for a school project i had to make a dots and boxes game with reinforcement learning (Q learning), what my teacher told us to do is basically make a table of all the boards and values for each board, and to train the board so it would play well, but mine doesn't really play well, so i was hoping for some ideas or general help
the basic idea is that we play a game, and we do an average of the cost of the board in the table, and the cost of the board from the game which we gain by going back from the outcome with multiplication of 0.9
so that if the outcome is 1 (good) 2 boards down is 0.81 (1*0.9*0.9) and we average that 0.81 with the table value
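The update described above can be sketched roughly as follows. The table is a plain dict keyed by board, unseen boards default to 0.0, and the board keys in the usage line are placeholders; all of these are assumptions for illustration.

```python
def update_table(table, visited_boards, outcome, discount=0.9):
    """Walk back from the final board, discounting the outcome at each step
    and averaging it with the value already stored for that board."""
    value = outcome
    for board in reversed(visited_boards):
        old = table.get(board, 0.0)  # assumption: unseen boards start at 0.0
        table[board] = (old + value) / 2.0
        value *= discount
    return table


# Placeholder board keys; "b3" is the final position of a game that ended with outcome 1.
table = update_table({}, ["b1", "b2", "b3"], outcome=1.0)
print(table)
```

Here the final board is averaged with 1, the one before it with 0.9, and the one two boards down with 0.81, matching the description.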

#

so ive tried running first a game of 2 random opponents to create a table,
then run a game of a table opponent against a random one so that the table one will get smarter
but it didnt, it mostly played as if it expected a random opponent
so i tried making the opponent smarter by telling it to capture squares if it can but now its dumber
my theory is that because the game is won by the first player on optimal play from both sides, it's hard to train the opponent to play well if it loses most of the time

#

so im out of ideas

queen cradle
spare pollen
#

ill try, but i dont think it'll do much: since the game is won by the first player if both play optimally, i would think the first player will soon get a boost in costs and the opposite for the second as it will start to lose

queen cradle
#

I think I'd use the same Q table for the two players. But with the box labels swapped (i.e., only train player 0, but when you need to move as player 1, relabel every 0 box as a 1 and every 1 box as a 0).
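A rough sketch of the relabelling idea. The board encoding is an assumption made up for this example: `boxes` is a tuple of box owners (`None`, `0` or `1`) and `lines` is a tuple of 0/1 flags for drawn lines.

```python
def swap_players(boxes):
    """Relabel box owners: 0 -> 1, 1 -> 0, unclaimed (None) stays None."""
    return tuple(None if b is None else 1 - b for b in boxes)


def table_key(boxes, lines, player):
    """Key into the shared table, always from player 0's point of view."""
    canonical = boxes if player == 0 else swap_players(boxes)
    return (canonical, lines)


# The same position seen by player 1 maps to the mirrored position for player 0.
print(table_key((0, 1, None), (1, 0, 1), player=1))
```

Lines carry no owner, so only the boxes need relabelling.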

spare pollen
#

there will be inconsistencies i think

#

first one that comes to my mind is that because player 2 never starts, the starting positions will never occur in the table

queen cradle
#

The Q table is going to need more entries, I agree. But if you imagine a human player, then really they're going to use the same principles to evaluate positions regardless of which player they are. So it seems like a sensible approach to me.

#

I believe this idea is used for training top-ranked chess and Go programs, but I don't know too much about those.

spare pollen
#

hopefully it works

plush jungle
#

is there a better way of finding out the right hyperparameters than just incrementally changing one hyperparameter at a time and seeing what happens?

boreal gale
# plush jungle is there a better way of finding out the right hyperparameters than just increme...

the problem you have right now is called "hyperparameter optimisation"

the classic solution to this is either grid search or random search:

  • grid search meaning you define a "grid" of hyperparameter e.g. 3 choices of param1 and 2 choices of param2, together they create a "grid" of size 6, try all these 6 configurations out and see what configuration is good.
  • random search meaning you literally try a random configuration and see what is good.

there has been plenty of research done in exploring what other ways are there to do this
the one i usually reach for is bayesian optimisation.

but as usual, the more models you try on your data (each hyperparameter configuration can be considered another model you try on your data), the more likely you are to overfit in the long run (it's worth noting that re-splitting your dataset into a different train/test set doesn't make it overfit any less)
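Grid search and random search can be sketched in a few lines of standard-library Python. The `score` function is a made-up stand-in for "train a model with this configuration and return its validation score", and the parameter names `lr` and `depth` are placeholders.

```python
import itertools
import random


def score(config):
    # Hypothetical stand-in for training + validation; higher is better.
    return -((config["lr"] - 0.01) ** 2) - 0.001 * abs(config["depth"] - 4)


# 3 choices for one parameter and 2 for the other: a grid of 6 configurations.
grid = {"lr": [0.001, 0.01, 0.1], "depth": [2, 4]}


def grid_search(grid, score):
    """Try every combination in the grid; return the best one and the grid size."""
    keys = list(grid)
    configs = [dict(zip(keys, values)) for values in itertools.product(*grid.values())]
    return max(configs, key=score), len(configs)


def random_search(grid, score, n_trials, seed=0):
    """Try n_trials randomly drawn configurations; return the best one seen."""
    rng = random.Random(seed)
    configs = [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n_trials)]
    return max(configs, key=score)


best, n_configs = grid_search(grid, score)
print(best, n_configs)
print(random_search(grid, score, n_trials=4))
```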

plush jungle
#

like this?

```
Epoch : 100, Train loss: 10.321332544088364 , Train Acc: 0.48500001430511475, Val loss: 18.077982330322264, Val acc: 0.5035000443458557
Epoch : 110, Train loss: 9.53213369846344 , Train Acc: 0.48750001192092896, Val loss: 17.678728103637695, Val acc: 0.5050000548362732
Epoch : 120, Train loss: 8.479382407665252 , Train Acc: 0.5350000262260437, Val loss: 17.400557136535646, Val acc: 0.5050000548362732
Epoch : 130, Train loss: 12.972423934936524 , Train Acc: 0.512499988079071, Val loss: 17.024103546142577, Val acc: 0.5050000548362732
Epoch : 140, Train loss: 11.378312253952027 , Train Acc: 0.48750001192092896, Val loss: 16.716188049316408, Val acc: 0.5050000548362732
Epoch : 150, Train loss: 8.84527666568756 , Train Acc: 0.5149999856948853, Val loss: 16.279064178466797, Val acc: 0.5065000057220459
Epoch : 160, Train loss: 6.3715451717376705 , Train Acc: 0.48500001430511475, Val loss: 15.720507717132568, Val acc: 0.5102499723434448
Epoch : 170, Train loss: 7.068263298273086 , Train Acc: 0.4950000047683716, Val loss: 15.122348213195801, Val acc: 0.5182499885559082
Epoch : 180, Train loss: 8.795898056030273 , Train Acc: 0.512499988079071, Val loss: 14.284347248077392, Val acc: 0.5189999341964722
Epoch : 190, Train loss: 6.765073442459107 , Train Acc: 0.5249999761581421, Val loss: 13.267550659179687, Val acc: 0.5205000042915344
Epoch : 200, Train loss: 9.654535031318664 , Train Acc: 0.4650000035762787, Val loss: 11.865579986572266, Val acc: 0.5224999785423279
Epoch : 210, Train loss: 5.291061848402023 , Train Acc: 0.5199999809265137, Val loss: 11.279545116424561, Val acc: 0.5224999785423279
Epoch : 220, Train loss: 7.116158974170685 , Train Acc: 0.44999998807907104, Val loss: 10.99517889022827, Val acc: 0.5235000252723694```
velvet bronze
#

Guys I need to write a program that uses computer vision to detect workers who do not wear safety clothing on site
What are the libraries i'll have to master and what are some suggestions on how i'll go about it?

serene scaffold
velvet bronze
velvet bronze
serene scaffold
velvet bronze
misty lava
#

Using Tweepy for Twitter API

stream = MyStream(bearer_token=credentials.BEARER_TOKEN)

# CLEARS RULESET BEFORE STREAMING DATA
for rule in stream.get_rules().data:
        stream.delete_rules(rule.id)
# ADDING RULES TO RULESET TO STREAM SPECIFIC DATA
stream.add_rules(tweepy.StreamRule("#ETH"))
stream.add_rules(tweepy.StreamRule("$ETH"))
stream.add_rules(tweepy.StreamRule("ETH"))
stream.add_rules(tweepy.StreamRule("Ethereum"))
stream.add_rules(tweepy.StreamRule('-is:retweet'))
stream.add_rules(tweepy.StreamRule('-"Giveaway" -"Participants" -"Winner" -"friends" -"notifications on" -"RT" -"help pay your bills" -"Whale" -"#WLgiveaway" -"#nftgiveaway" -"current Ethereum gas prices"'))
stream.add_rules(tweepy.StreamRule('-"price update" -"join me" -"learn to trade" -"DM" -"tag" -"Send DM" -"price updates" -"chance to win" -"trade and watch" -"follow us" -"opensea" -"swap Alert" -"bought for" -"Item listing" -"#whitelist"'))
stream.add_rules(tweepy.StreamRule('-"Want to win" -"community of real traders" -"discord community" -"staking service" -"#whale" '))

#START STREAM
stream.filter(expansions=["author_id",],tweet_fields=["created_at","referenced_tweets","lang","attachments"]) 
class MyStream(tweepy.StreamingClient):
    #TWEETS = "STATUS UPDATES".
    def on_connect(self):        # DISPLAYS "CONNECTED" ONCE CONNECTED
        print("Connected") 
    # AVOID RETWEETED TWEETS, NON-ENGLISH TWEETS AND TWEETS WITH ATTACHMENTS, ONLY ORIGINAL ENGLISH TWEETS WITH NO ATACHMENTS ARE STORED 
    def on_tweet(self,tweet):
        print(tweet.data)
        if tweet.referenced_tweets != None or tweet.lang != "en" or tweet.attachments != None:
            return True

Trying to have my stream not show Retweets and not show the phrases/words that have the - " ", Currently the stream shows RT's and tweets containing those words/phrases.

Would appreciate any advice/guidance
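As far as I understand the v2 filtered stream, a tweet is delivered when it matches any one rule, so exclusions placed in their own rules cannot filter out what another rule already matched. They would need to sit in the same rule as the keywords. Treat that as an assumption to check against the Twitter API docs, including the per-rule length limit. The helper below is hypothetical and only builds the rule text.

```python
def build_rule(keywords, excluded_phrases, extra_operators=()):
    """Combine keywords, operators and exclusions into one rule string."""
    include = "(" + " OR ".join(keywords) + ")"
    exclude = " ".join(f'-"{phrase}"' for phrase in excluded_phrases)
    return " ".join(part for part in (include, *extra_operators, exclude) if part)


rule = build_rule(
    keywords=["#ETH", "$ETH", "ETH", "Ethereum"],
    excluded_phrases=["Giveaway", "chance to win"],
    extra_operators=["-is:retweet", "lang:en"],
)
print(rule)
```

The resulting string would then be passed to a single `tweepy.StreamRule(...)` instead of one rule per keyword or exclusion.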

novel python
#

I have two different dataframes, I want to cross values on them to populate the left one with the names of the company, which is only available on the right one, and both have the company ids. I tried merge here but I'm not sure if that's the correct solution, can't seem to figure it out.
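A left merge on the shared id column is the usual way to do this. The sketch below uses made-up frames and column names (`company_id`, `company_name`, `revenue`), since the real ones are not shown.

```python
import pandas as pd

# Made-up stand-ins for the two dataframes.
left = pd.DataFrame({"company_id": [1, 2, 3], "revenue": [100, 200, 300]})
right = pd.DataFrame({"company_id": [1, 2], "company_name": ["Acme", "Globex"]})

# how="left" keeps every row of `left`; ids missing from `right` get NaN names.
merged = left.merge(right[["company_id", "company_name"]], on="company_id", how="left")
print(merged)
```

If the id columns are named differently in the two frames, `left_on=` and `right_on=` take the place of `on=`. Duplicate ids in the right frame would multiply rows, so it may be worth dropping duplicates there first.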

serene scaffold
merry wadi
#

Hello everyone! Working on a Node Level GNN binary classification problem that has very low positive classes (8%). I attempted to modify my training with BCELoss to account for the class weights like this.


```py
def train(loader):
  loss_lst = []
  model.train() 
  
  for i, data in enumerate(loader):
    optimizer.zero_grad()
    out = model(data)
    out = out.reshape((data.x.shape[0]))
    loss = criterion(out, data.y.float())
    weight_ = (weight[data.y.data.view(-1).long()].view_as(data.y)  )
    
    new_loss = torch.mean(weight_ * loss)
    
#     loss_class_weighted = weighted_binary_cross_entropy(out, data.y.float(), weights=[0.92, 0.08])
#     loss_class_weighted.backward()
    
    new_loss.backward()
    loss_lst.append(new_loss.detach().numpy())

    optimizer.step()
  return np.average(loss_lst)```
But my results look a little strange over 1000 epochs. Did I do something wrong in the code or is there a better way of handling imbalanced classes?
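One thing worth checking in the snippet above is whether `criterion` returns one loss per node. With PyTorch's default `reduction='mean'` the loss is already a single number, so multiplying by `weight_` only rescales it, whereas `reduction='none'` gives per-element losses that can be weighted by class. How `criterion` was built is not shown, so this is a guess. The plain-Python sketch below shows the intended computation; the weight of 11.5 is just the 92/8 class ratio used as an example.

```python
import math


def weighted_bce(probs, targets, pos_weight, neg_weight=1.0, eps=1e-7):
    """Binary cross-entropy where each sample's loss is scaled by its class weight."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        per_sample = -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        weight = pos_weight if t == 1 else neg_weight
        total += weight * per_sample  # weight applied before averaging
    return total / len(probs)


# One positive and one negative node, both predicted at 0.5.
print(weighted_bce([0.5, 0.5], [1, 0], pos_weight=1.0))
print(weighted_bce([0.5, 0.5], [1, 0], pos_weight=11.5))
```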
verbal venture
#

does anyone know what type of algo I should use for this? "Use a suitable ML algo, for when every feature column contains data of type integer ranging from 0 to 255 and the target column contains categorical data represented by six integers 1, 2, 3, 4, 5 and 6"

tulip wyvern
#

How come my cost isn’t decreasing? It starts at 0.69314 and stays at that throughout all the iterations (decreases by like 0.000001 each 100 iterations). My layers are 12288, 20, 7, 5, 1 and I've tried learning rates of 0.1, 0.002, and 0.00001. It’s a monkey vs gorilla binary classification model that uses 2000 images to train that are each sized 64 x 64.
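The value 0.69314 is ln 2, which is exactly the binary cross-entropy when every prediction is 0.5. So the network appears to be outputting 0.5 for every image. Possible causes worth checking (guesses, not a diagnosis) include weights initialised to zero or to very small values, and unnormalised pixel inputs.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single prediction."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# A prediction of 0.5 costs ln(2) regardless of the true label.
print(bce(0.5, 1), bce(0.5, 0), math.log(2))
```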

compact ivy
#

hi, what ml model should i use to get the PD of a dataset from a bank

#

?

bright stone
#

i am trying to finetune gptj for Q&A but have trouble figuring out how to configure the tokenizer and data collator. i've tried googling and have read the hugging face documentation several times without result. i am sure this may be a simple answer to find (or that's what i thought before finally asking here)

serene scaffold
serene scaffold
tulip wyvern
#

@serene scaffold also do you think it will help if i size my images as 256 * 256 or will that make it worse?

serene scaffold
tulip wyvern
#

Yeah

#

Because I had cost issues with 256 * 256 as well

serene scaffold
#

do you have convolutional layers?

tulip wyvern
#

No

#

I'm not using any Python libraries because I wanted to make it from scratch first so I can learn the basics

serene scaffold
#

I see

#

so you're not even using numpy? let alone torch/tensorflow?

tulip wyvern
#

Im using numpy

serene scaffold
#

that's an external library. but maybe you already know that

tulip wyvern
#

Yeah

serene scaffold
#

anyway, I don't do image AI, but I wouldn't expect to get good results without convolutions anyway

tulip wyvern
#

Yeah I got 40% training accuracy and 85% test accuracy

#

That isn't overfitting right because that'd be the other way around

bright stone
#

i am trying to finetune gptj for Q&A but have trouble figuring out how to configure the tokenizer and data collator. i've tried googling and have read the hugging face documentation several times without result. i am sure this may be a simple answer to find (or that's what i thought before finally asking here)

def tokenize_function(examples):
    current_tokenizer_result = tokenizer(examples["text"], padding=True, truncation=True)
    return current_tokenizer_result


print("Splitting and tokenizing dataset")
tokenized_datasets = current_dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets["train"].select(range(100))
small_eval_dataset = tokenized_datasets["train"].select(range(100))


training_args = TrainingArguments(output_dir=GPTJ_FINE_TUNED_FILE,
                                  report_to='all',
                                  logging_dir='./logs',
                                  per_device_train_batch_size=1,
                                  #label_names=['input_ids', 'attention_mask'],  # 'logits', 'past_key_values'
                                  num_train_epochs=1,
                                  no_cuda=True
                                  )

metric = evaluate.load("accuracy")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)


data_collator = DefaultDataCollator(tokenizer)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)

i am using these parameter
but keep having problems on object type

stone glacier
#

Hello, I graduate in like 3 months. Can anyone tell me what projects would be good to have when applying for jobs?

#

As in what specialisation of DSAI?

#

I got like 5 or 7 ready just not sure if I can fit it all in the word docx

odd meteor
odd meteor
young granite
#

im struggling to create a dropdown for my plotly plots.
i created a legendgroup of IDs and wanted to access the group via dropdown and display the IDs of each group as normal legend.
Therefore i tried to use fig.data but i dont get it

muted crypt
#

Hello! I'm working on my thesis and I need to find the most optimal x-shift value that makes the red line (or curves) have the smallest error when compared to the blue one (in the x axis for instance). In my mind I feel like a good approach is working with something similar to the correlation between signals or something similar to align the data. However this is just longitude, latitude data and I can't find much information about a way to solve this online as these are just random paths with no maximum or references to align. Any ideas on how to approach this?

#

(Left image, what I have. Right image is what I'd like to get)

young granite
muted crypt
#

The real data that I have to use are more complex than that

young granite
#

so they are random data points and not vectors?

#

vectors would be easy, but with data points you need to calculate the approximate area between the curves and find the iteration where that area is smallest

boreal gale
muted crypt
#

Here they are just turned into a line so it's easier to look at.
The data that I have to deal with looks more like that

muted crypt
wooden sail
#

the cross-correlation function is a good place to start
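A brute-force sketch of using cross-correlation to find a shift between two equally sampled 1-D signals. The signals are made up, and real data would usually have its mean subtracted first. It also assumes both signals share the same sampling interval.

```python
def best_lag(reference, observed, max_lag):
    """Return the lag (in samples) that maximises the sum of
    reference[n] * observed[n + lag] over the overlapping samples."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(observed):
                score += r * observed[m]
        if score > best_score:
            best, best_score = lag, score
    return best


# Made-up example: the same bump, delayed by 3 samples in `observed`.
reference = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
observed = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
print(best_lag(reference, observed, max_lag=5))  # 3
```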

muted crypt
tidal bough
#

The really naive way would be to find a shift that minimizes mean squared error vertically, which probably even has a simple analytical solution
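If each real point is already paired with an expected point, the least-squares shift does have a closed form: minimising the sum of ||e_i + s - r_i||^2 over s gives s = mean(r_i - e_i). The sketch below assumes that pairing exists and uses made-up 2-D points.

```python
def best_translation(expected, real):
    """Least-squares shift for paired 2-D points: the mean of (real - expected)."""
    n = len(expected)
    dx = sum(r[0] - e[0] for e, r in zip(expected, real)) / n
    dy = sum(r[1] - e[1] for e, r in zip(expected, real)) / n
    return dx, dy


# Made-up example: the real path is the expected path moved by (2, -1).
expected = [(0, 0), (1, 1), (2, 0)]
real = [(2, -1), (3, 0), (4, -1)]
print(best_translation(expected, real))  # (2.0, -1.0)
```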

muted crypt
muted crypt
boreal gale
muted crypt
wooden sail
#

it's always 2 line segments?

muted crypt
#

The upper line is the real data while the dotted line the intended path. There's a shift in time (To) as well as space error

wooden sail
#

if it's only a slice as shown here, it's not SUPER complicated

boreal gale
wooden sail
#

if you need to contract the observations, it immediately becomes challenging to find the global optimum. if you only want a shift, that may still be possible

muted crypt
wooden sail
#

oh boy, it gets better

#

the usual definition of distance is useless here

muted crypt
#

of course it has to be a damn drone which adds the vertical component

wooden sail
#

your metric should be something like a wasserstein distance or something of the sort

#

if you ONLY want shifts, it's still not super difficult

muted crypt
wooden sail
#

2d mse alone won't work

muted crypt
wooden sail
#

you need to assign each point on one curve to one point on the other

#

otherwise the distance is ill-defined

muted crypt
wooden sail
#

measuring distance is an optimization problem all of its own in this scenario, and then minimizing that distance is very expensive

muted crypt
#

i was wondering about calculating the area between the 2 curves but that seems complex

wooden sail
#

that's a suitable approach too, but you still need to pair the points to be able to do that

#

or maybe not, actually

muted crypt
#

you're right

wooden sail
#

hmm it's pretty challenging

muted crypt
#

in 2d it would be an integral but I'd need to find a function which adapts to the trajectory with regression I guess

wooden sail
#

the 2d figure is curved though

tidal bough
#

Looking at the actual graphs you have makes me gravitate towards hacky solutions like

  1. iterating over the actual trajectory, find the closest point on the intended trajectory to each real one (this can be inaccurately done in O(N) if the trajectories are close enough together that you can assume the closest point shifts smoothly along the intended trajectory)
  2. calculate the mean of squares of these distances, and take that as your loss.
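The two steps above can be sketched with a brute-force closest-point search (O(N·M), not the O(N) shortcut mentioned in step 1) followed by a simple scan over candidate x-shifts. The trajectories and the list of candidate shifts are made up for illustration.

```python
import math


def mean_sq_closest(real, expected):
    """Mean over real points of the squared distance to the nearest expected point."""
    total = 0.0
    for rx, ry in real:
        total += min((rx - ex) ** 2 + (ry - ey) ** 2 for ex, ey in expected)
    return total / len(real)


def best_x_shift(real, expected, candidates):
    """Pick the candidate shift that, subtracted from real x-values, gives the lowest loss."""
    return min(
        candidates,
        key=lambda s: mean_sq_closest([(x - s, y) for x, y in real], expected),
    )


# Made-up data: a sampled sine curve, and a piece of it moved 0.5 to the right.
expected = [(i / 10, math.sin(i / 10)) for i in range(63)]
real = [(x + 0.5, y) for x, y in expected[10:40]]
candidates = [i / 10 for i in range(11)]
print(best_x_shift(real, expected, candidates))
```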
#

it'd super not work if the trajectories were significantly different, but yours look close enough

wooden sail
#

it would still work if the trajectories were different if you find the closest point correctly

#

the problem is precisely that though. even measuring the distance is an optimization problem

tidal bough
#

that's true, probably a spatial partitioning kind of task

wooden sail
#

i'd look at it as the optimal linear assignment task

#

time for some Jonker-Volgenant

muted crypt
#

But I would need to make points in between right? for instance the intended trajectory are like 15 points in space while the real one is made out of thousands

#

it's like finding the closest distance from a point to a line

wooden sail
#

you may need to interpolate or zero pad

tidal bough
#

for my hacky idea it doesn't really change things since you can just turn each line segment into a hundred points or whatever
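Turning each segment of the intended trajectory into many points is a few lines of linear interpolation. This sketch works on 2-D tuples for brevity; a drone path would carry a third coordinate handled the same way.

```python
def densify(waypoints, points_per_segment=100):
    """Replace each straight segment by `points_per_segment` evenly spaced points."""
    dense = []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for k in range(points_per_segment):
            t = k / points_per_segment  # 0 <= t < 1, so segment ends are not duplicated
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    dense.append(waypoints[-1])  # the final waypoint closes the path
    return dense


# Made-up waypoints: an L-shaped path, 4 points per segment.
print(densify([(0, 0), (1, 0), (1, 1)], points_per_segment=4))
```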

wooden sail
#

you have to do something so that the curves are the same length and sampled at the same intervals, ideally

#

e.g. interpolation and zero padding

muted crypt
#

adding points in between can be done

muted crypt
wooden sail
#

no, length in space

#

part of the problem requires deciding what to do about points that cannot be paired. one solution is to artificially extend the curves

#

actually, yes, since i also suggested a regular sampling interval

#

the two things are the same

#

if not though, the issue is that you'd have to make a decision about what to do with leftover points that cannot be paired

#

cardinality mismatch is an issue with wasserstein metrics

muted crypt
#

right, my first idea was to count the number of points of the real trajectory, then add points inbetween the lines of the intended trajectory and compute the distance as each point would have a pair but i'll look at what you mention

#

but extending the curves requires me to find the distance and seems like a hard thing to do with so many points and noise

wooden sail
#

yeah that's maybe not the best approach. i think adding points in between should do

muted crypt
#

I guess the shifting is the hard part

#

otherwise the time shift will be taken into account in the error

#

or maybe not

wooden sail
#

the time shift will be in the error, sure

boreal gale
#

would you mind dumping an example of real trajectory and the intended trajectory somewhere? i want to try some stuff 🙂

muted crypt
muted crypt
#

a csv I suppose that works

boreal gale
cinder schooner
#

Hello, so I have an image classification project with bird images and 30 classes. I tried different architectures but they all overfit. I tried adding dropout, lowering the batch size, changing the learning rate, and adding horizontal and vertical flip transformations, but it isn't any better. I have 1500 training images (roughly 51 per class) and 270 validation images. What can/should I try?

muted crypt
boreal gale
#

sweet! thanks

tidal bough
#

here's what I got with the closest-point-but-naively approach on a shitty generated trajectory

#
import numpy as np
from scipy.spatial.distance import cdist


def find_closest(expected: np.ndarray, real: np.ndarray) -> np.ndarray:
    d = expected.shape[1]
    M = expected.shape[0]
    N = real.shape[0]
    assert expected.shape == (M, d), expected.shape
    assert real.shape == (N, d), real.shape

    closest_inds = np.zeros((N,), dtype=int)
    closest_inds[0] = np.argmin(cdist(real[:1, :], expected).reshape(-1))
    for i in range(1, N):
        cur_pnt = real[i, :]
        previous_closest = closest_inds[i - 1]
        # search back and forth from this index only to the local minimum
        closest_dist = None
        closest_ind = None
        for j in range(previous_closest - 1, -1, -1):
            dst = np.linalg.norm(cur_pnt - expected[j, :])
            if closest_dist is None or dst <= closest_dist:
                closest_dist = dst
                closest_ind = j
            else:
                break
        for j in range(previous_closest, M):
            dst = np.linalg.norm(cur_pnt - expected[j, :])
            if closest_dist is None or dst <= closest_dist:
                closest_dist = dst
                closest_ind = j
            else:
                break
        # and take that as the new closest
        closest_inds[i] = closest_ind
    return closest_inds
#

here's the approach I used. This is probably pretty slow because it's looping in Python, but that can be fixed by making it numba or cython. And the important part is that the number of distances that it has to examine for each point of the real trajectory is probably pretty low - probably constant in most cases, even.

boreal gale
#

what's the code you used to plot that?

wooden sail
tidal bough
# boreal gale what's the code you used to plot that?
closest_inds = find_closest(expected=sampled_pts, real=real_trajectory)
plt.figure("closest distances", clear=True)
plt.plot(sampled_pts[:, 0], sampled_pts[:, 1], "o-", ms=3)
plt.plot(real_trajectory[:, 0], real_trajectory[:, 1], "o-", ms=3)
for a, ind in zip(real_trajectory, closest_inds):
    b = sampled_pts[ind]
    plt.plot([a[0], b[0]], [a[1], b[1]], "r-")
plt.show()

sampled_pts is just 100 points obtained by linear interpolation of the intended trajectory

wooden sail
#

squaring preserves ordering of positive numbers, so the dot products suffice

tidal bough
wooden sail
#

very nice demo. this probably already suffices for a nice heuristic

tidal bough
#

oh, and the trajectory I used is this:

intended_trajectory = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]) # just a square
T_fin = len(intended_trajectory)
intended_interp = scipy.interpolate.interp1d(np.arange(T_fin), intended_trajectory, axis=0, bounds_error=False, fill_value=[0,0])

N = 100
time = np.linspace(0, T_fin, N, endpoint=False, dtype=np.float64)
sampled_pts = intended_interp(time)
deviation = np.zeros_like(sampled_pts)
deviation[:, 0] = (0.1 * np.sin(time * 2.7 + 0.2) + 0.1 * np.sin(time * 6.6 + 0.97)) * np.linspace(0, 1, N)
deviation[:, 1] = (0.12 * np.sin(time * 3.2 + 0.5) + 0.1 * np.sin(time * 7.4 + 0.12)) * np.linspace(0, 1, N)
real_trajectory = sampled_pts + deviation
muted crypt
boreal gale
#

the paired points are quite different in some places as expected

#

with dynamic time warping, you can only match the current or a future point in the corresponding time series, which might be a nice property for you

#

the arrow indicates where this property is not observed with the closest point approach
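For reference, a textbook dynamic time warping sketch in NumPy. This is not the code anyone in the thread ran; the function name `dtw_path` is an assumption, and the O(N*M) double loop is only meant to show the monotonic matching property mentioned above.

```python
import numpy as np


def dtw_path(a: np.ndarray, b: np.ndarray):
    """Classic O(N*M) dynamic time warping between point sequences a (N, d) and b (M, d)."""
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded with a border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # backtrack from the end to recover which points were matched
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda ij: acc[ij])
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]


a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [1.0], [1.0], [2.0]])
total, path = dtw_path(a, b)
```

Every pair in `path` has indices that never decrease, so a later real point can never be matched to an earlier intended point.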

young granite
#

does one know plotly a fair bit?
i create traces in a for loop and later want to access em via buttons,
but i dont know how to access them in a smart way.

for group in dict.keys():
  fig.add_trace()
  for ID in group:
    fig.add_trace()

i normally assign group as legendgroup but i wanted to improve and use buttons for groups so that in the legend all names are plotted instead of just the group_name
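One possible way to wire this up, sketched in plain Python so it runs without plotly installed. It assumes exactly one trace per ID, added in dictionary order; `group_buttons` and the sample `groups` dict are hypothetical names. Each button carries a boolean visibility mask over all traces.

```python
def group_buttons(groups: dict[str, list[str]]) -> list[dict]:
    """Build one button per group; assumes one trace per ID, added in dict order."""
    # which group each trace belongs to, in the order the traces were added
    trace_groups = [g for g, ids in groups.items() for _ in ids]
    buttons = [
        dict(label="all", method="update",
             args=[{"visible": [True] * len(trace_groups)}])
    ]
    for g in groups:
        mask = [tg == g for tg in trace_groups]  # show only this group's traces
        buttons.append(dict(label=g, method="update", args=[{"visible": mask}]))
    return buttons


groups = {"A": ["a1", "a2"], "B": ["b1"]}  # hypothetical data
buttons = group_buttons(groups)
```

The resulting list can then be handed to plotly with something like `fig.update_layout(updatemenus=[dict(type="buttons", buttons=buttons)])`. If you also add a separate trace per group, the masks need one extra entry for each of those.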

wooden sail
#

you can append points at infinity to the shorter list, and decide later what to do about them

muted crypt
tidal bough
#

currently trying to apply the same code to the trajectory Pau342 posted

#

it looks like it fails but it's hard to tell because the 3d plot lags as hell and crashed my jupyter once already 🥴

#

having trouble rotating the plot to see, but it seems at some point it jumps and starts being very wrong

muted crypt
wooden sail
#

yours is only index based right? tbh i didn't read your code carefully so i'm not sure how you're doing the assignment 😛

muted crypt
#

maybe there are more points in the segment from the back and when it increases the index it goes to that area

boreal gale
#

is it a case where EPSG 4326 (i.e. lat lng) just isn't that comparable when you have altitude?

maybe you need to convert to EPSG 27700 (or basically make lat lng into meters just like altitude) first?

muted crypt
tidal bough
#

btw @wooden sail, here's a math question: suppose you have two continuous trajectories a(t), b(t) (continuous functions mapping from [0,1] to a 3d space). You then construct a function d(t) = f(t') where t' from [0,1] is such that b(t') is the closest point to a(t) over all possible times. Can anything be said about the conditions for d(t) being continuous? There are obviously cases in which it is (trivial one: a=b=d), and cases in which it isn't (imagine a horseshoe-shaped a and a line-shaped b which connects the ends of a - in the middle, d(t) is going to jump from the left to the right end of the horseshoe).

#

this seems like the kind of thing there's been papers on, tbh

muted crypt
sleek harbor
wooden sail
#

so a C^n continuous curve

#

the part about b(t') isn't actually that important

tidal bough
#

I was thinking just C^0-continuous, but sure, it doesn't really matter

boreal gale
wooden sail
tidal bough
#

that may well be the reason

tidal bough
#

lemme alter the distance function...

boreal gale
#

an alternative quick hack would be lat_but_in_meters = lat / 0.0000089987192
but the proper solution is probably to actually use haversine.. though i haven't dealt with 3d version of it and i am not sure if it even exists..
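A slightly more careful version of that hack, as a sketch: an equirectangular (local flat-earth) approximation that scales longitude by the cosine of a reference latitude. The function name, the reference-point arguments, and the mean Earth radius constant are all assumptions for illustration; it is only reasonable over short distances.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def latlon_to_local_meters(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Equirectangular approximation: east/north offsets in metres from a reference point."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat0, lon0 = np.radians(lat0_deg), np.radians(lon0_deg)
    # a degree of longitude shrinks with cos(latitude); a degree of latitude does not
    east = EARTH_RADIUS_M * (lon - lon0) * np.cos(lat0)
    north = EARTH_RADIUS_M * (lat - lat0)
    return east, north


east, north = latlon_to_local_meters(61.0, 11.0, 60.0, 10.0)
```

Altitude can then be stacked on as a third column, since all three axes are in metres.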

muted crypt
tidal bough
#

wow, I should not be having this many problems with implementing a distance function but I am

#

ok there, I did it.

#

i managed to mix up the latitude and longitude, forgot to convert to radians, and forgot about the radius of the earth in the process

muted crypt
versed gulch
#

for training an AI model, does it make sense to reduce the learning rate during training based on the training loss or the validation loss?

tidal bough
#
def earth_distance(a, b):
    ϕ1, θ1, r1 = a
    ϕ2, θ2, r2 = b
    θ1, θ2 = (np.radians(θ) + np.pi / 2 for θ in (θ1, θ2))
    ϕ1, ϕ2 = (np.radians(ϕ) for ϕ in (ϕ1, ϕ2))
    r1, r2 = (r + 6400e3 for r in (r1, r2))
    return r1**2 + r2**2 - 2 * r1 * r2 * (np.sin(θ1) * np.sin(θ2) * np.cos(ϕ1 - ϕ2) + np.cos(θ1) * np.cos(θ2))

but I think this is now right

#

sadly it doesn't help

boreal gale
#

how did you do the interpolation between the waypoints btw? just wondering am i doing something dumb and slow..

tidal bough
#

huh. Are the points in the intended trajectory supposed to be dozens of kilometers apart?

>>> [earth_distance(a,b) for a,b in zip(intended_trajectory, intended_trajectory[1:])]
[17420.6875,
 4938.859375,
 44526.0,
 4938.859375,
 1661.421875,
 0.0,
 100.0,
 2861.4375,
 907.15625,
 38189.703125,
 907.15625,
 19351.90625]
muted crypt
muted crypt
tidal bough
#

oh wait, I forgot these are squared dists

#

that puts the first and second points 131m apart, which seems to match yours, yay

muted crypt
#

nice!

tidal bough
#

velocity over time, slightly EWM-smoothed

muted crypt
tidal bough
#

sadly that still doesn't help with whatever's happening with the plot

#

here's closest_inds. There's indeed a few discontinuities here

#

and I don't get why...

muted crypt
#

hmmm at some point in the trajectory, there's like 10 seconds where the drone stops and hovers in the air, which is shown there, but maybe that messes up which points the next points are paired with

tidal bough
#

here's closest distance over point index

#

the two small peaks are turns, and are correct

#

whatever the hell happens after ~800 probably isn't

muted crypt
#

does the function imply that a point in the future can't be closer to a point of the trajectory from the past? I don't know if that makes sense but I feel like at some point it skips to a further point and then all the next points have no choice but to calculate the distance to a point that's shifted

#

because it works pretty smoothly for half of the trajectory

tidal bough
#

actually lemme plot something interesting

wooden sail
#

embrace brute force

muted crypt
#

like Ry shared, it seems that at the end there is an area where all points merge together, maybe that's the closest dist but maybe it has run out of indices

wooden sail
#

how many points are there in each curve

muted crypt
#

it's not specified, the real trajectory is made out of the data recorded by a drone every 0.1 seconds

#

so if it takes 10 seconds for a turn would be 100 points

tidal bough
#

currently calculating the million-element distance matrix between the real and sampled points to compare the actual distances against my naive ones

#

it's not going fast, mostly because it uses a custom distance function. and I can't even rewrite it in numba because I'm using py3.11 😔

wooden sail
#

how are you computing it

#

this is the sort of stuff you'd einsum

tidal bough
#

scipy.spatial.distance.cdist with metric=earth_distance

#

can probably be done much better in several cdists over the individual coords

#

okay, I moved to python3.10

#

matrix calculation time went from "at least 2 minutes" to 5.6s

#

god I love numba

wooden sail
#

i can't find anywhere what earth_distance does

#

if you've converted to meters and have arrays, say, of size n x3 and m x 3

tidal bough
#

okay fixed

boreal gale
#

corresponding plots from me, distance is presumably meters (code: https://paste.pythondiscord.com/ovicodofof)
i did piecewise interpolation between each pair of waypoints instead of fitting one function through all the waypoints
also only did a / 0.0000089987192 hack to convert to "meters"

time for me to get back to real work 😩 , good luck!

tidal bough
#

so I guess my algorithm is just bad and needs at least some global search part

wooden sail
#

i think it'd be a fair bit faster if you first convert to meters and then do a broadcasted difference
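A rough sketch of that suggestion, assuming both arrays have already been converted to metres and have shapes `(N, 3)` and `(M, 3)`. The function name is hypothetical. The square root is only taken for the winning entries, after the argmin.

```python
import numpy as np


def closest_by_brute_force(real: np.ndarray, sampled: np.ndarray):
    """Full (N, M) squared-distance matrix via broadcasting, then a row-wise argmin."""
    diff = real[:, None, :] - sampled[None, :, :]   # (N, M, d) broadcasted difference
    sq = np.einsum("nmd,nmd->nm", diff, diff)        # (N, M) squared distances
    idx = sq.argmin(axis=1)                          # closest sampled point per real point
    dist = np.sqrt(sq[np.arange(len(real)), idx])    # sqrt only for the N winners
    return idx, dist


real = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
sampled = np.array([[0.0, 0.0, 1.0], [9.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
idx, dist = closest_by_brute_force(real, sampled)
```

The intermediate `(N, M, d)` array is held in memory all at once, so for very long trajectories it may need to be processed in chunks.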

muted crypt
#

how did you go from the local search to the actual closest

tidal bough
# tidal bough

a-ha, but look at the first one! it looks like the problems began when there was an actual, genuine discontinuity in the closest indices!

#

so that's why my local search failed

wooden sail
tidal bough
#

and it only takes like 7s, god I love numba x2

muted crypt
#

why haven't I heard of numba before

tidal bough
# tidal bough and it only takes like 7s, god I love numba x2
from numba import njit
@njit
def earth_distance(a, b):
    ϕ1, θ1, r1 = a
    ϕ2, θ2, r2 = b
    θ1 = np.radians(θ1) + np.pi / 2
    θ2 = np.radians(θ2) + np.pi / 2
    ϕ1 = np.radians(ϕ1)
    ϕ2 = np.radians(ϕ2)
    r1 = r1 + 6400e3
    r2 = r2 + 6400e3
    return r1**2 + r2**2 - 2 * r1 * r2 * (np.sin(θ1) * np.sin(θ2) * np.cos(ϕ1 - ϕ2) + np.cos(θ1) * np.cos(θ2))

true_distance_matrix = np.sqrt(cdist(real_trajectory, sampled_pts, metric=earth_distance))
wooden sail
#

now remove the sqrt to speed up the code more :p

tidal bough
#

i need the sqrt for the plot later

#

but I guess I can take it after the argmin

wooden sail
#

you can sqrt the true vals

tidal bough
#

meh, who cares

wooden sail
#

right, after you have the support

#

that's probably the slowest operation in what you have left

#

i'd almost bet you can shave the time in half

tidal bough
#

alright, alright, fixed

wooden sail
#

now it takes 20s

tidal bough
#

5.3s :p

#

but neither of the measurements are good, so

muted crypt
# tidal bough

could you share the code that you used to obtain this?

#

somehow the one that ry shared doesn't work for me

tidal bough
boreal gale
#

eh i really ought to use jupytext :\

tidal bough
#

i'm actually using vscode's jupyter support

muted crypt
tidal bough
#

incidentally, if your real trajectories have about this number of points, you could maybe just use the bruteforce solution 🥴

#

well, if you need to perform such a search once. then ~5s is okay

#

if you're going to e.g. use it as a loss function while optimizing the trajectory, then you probably want it to be much faster

muted crypt
#

i have like 80 different trajectories 🫠

#

but yes, it's just a result to do once to show the results

tidal bough
#

actually... this can be made much faster by just removing interpolation. For each point in the real trajectory and for each line segment in the intended trajectory, find the closest point on the segment using the exact algorithm. written in numba or cython it'll be very fast

wooden sail
#

we're calling it brute force, but constructing the distance matrix is also necessary to solve the problem optimally in many metrics

muted crypt
#

i'm not sure if I understand yet the difference between brute force and the code that you did

wooden sail
#

reptile's alg tries to avoid computing the full matrix

muted crypt
#

like calculating the distance for just a few points versus calculating the distance to each point?

tidal bough
#

the idea of my local search algorithm is that if the ith point of the real trajectory was closest to the jth point of the intended one, then the (i+1)th point will be closest to one of the points near the jth one. So it searches backward (points j, j-1, j-2...) only until the distance stops dropping, and forward with the same stopping condition. So basically, it finds the local minimum of distance starting the search at point j.

#

the hope is that in many cases, this local minimum is also the global one. turns out, it's not always the case.

wooden sail
#

it's not the case if you have loops

muted crypt
#

I'm installing the visual studio you mentioned because it didn't work for me in jupyter :(

wooden sail
#

that there is no error in the locations

#

but gps is very noisy

boreal gale
muted crypt
#

I suppose that the error that was appearing before was due to a corrupt point that was causing trouble

wooden sail
#

that could be it, or following a path involving loops that moves away and then returns to the curve several times

muted crypt
#

Why code never works at the first time for me!

boreal gale
#

ah that means your csv is not the same as mine

muted crypt
#

did I send it like this?

tidal bough
#

you did

boreal gale
#

yep

muted crypt
#

what

boreal gale
muted crypt
#

I thought I sent other files, they are the same but with different columns

#

it works now

tidal bough
muted crypt
#

confusedReptile

#

You are a beast too, what a legend 🏅

tidal bough
#

oh no, realised there's a bit of a problem - I'm definitely not calculating shortest-point-on-segment in spherical coords

muted crypt
#

and Edd, another king here
The three kings of the server, I owe you one!

tidal bough
#

so I guess I'll have to turn them all into euclidean coords after all

muted crypt
wooden sail
#

i'm glad reptile and ry are going the extra mile and demoing all of this

#

this problem really isn't easy, so i sat it out lol

#

i just critique from the back row and yell at you for holding the flashlight wrong

muted crypt
#

ry know I always come with interesting problems to solve despite having no idea on how to code I always get in messes like this that I would never ever solve on my own

wooden sail
#

there are papers upon papers on problems like this one published every year. there are several approaches to measuring the distance, and then several more to minimizing it

boreal gale
#

well the problem is too interesting to not try it out myself 🙈 and i happen to know a technique that seems relevant.. (but when you have a hammer everything looks like a nail lol)
kudos to reptile coming up with an algo!

muted crypt
#

i propose ry and reptile to team up, you two can solve everything!

wooden sail
muted crypt
#

yeah, that seems like a very specific way of solving this problem. Every line of code contains something new for me so I'll slowly try to understand everything, years of experience here!

muted crypt
#

by the way, I might just use part of this code in my bachelor's thesis, I was wondering if you're okay with that? might as well put you in the credits hah @boreal gale @tidal bough

wooden sail
#

"huh. Are the points in the intended trajectory supposed to be dozens of kilometers apart?" [10] Reptile, Confused. In data-science-and-ai, PYTHON DISCORD (2023).

tidal bough
#

i did it

#

this is an exact solution

muted crypt
tidal bough
#

and it runs in like 30ms

muted crypt
wooden sail
#

oh very nice

#

what'd you change this time

tidal bough
#

the key is to abuse the fact the intended trajectory is just line segments

import numpy as np
from numba import njit


@njit
def closest_point_on_segment(v: np.ndarray, w: np.ndarray, p: np.ndarray):
    "closest point on segment vw to point p"
    # https://stackoverflow.com/a/1501725
    vw = w - v
    length_squared = vw.T @ vw
    if length_squared <= 1e-8:
        return v
    t = max(0, min(1, (p - v).T @ vw / length_squared))
    res = v + t * vw
    return res


@njit
def find_closest_lines(expected: np.ndarray, real: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    All inputs should be in euclidean coordinates. All input arrays should be float, or numba won't like it
    expected: (M, d), the waypoints of the intended trajectory, defining M-1 line segments of the trajectory.
    real: (N,d), real trajectory.
    returns:
        closest_pts, (N,d) array - for each point in the real trajectory, closest point along the intended trajectory
        closest_dists, (N,) array - squared distance to that point
        closest_segment_ind, (N,) array of ints - index of the segment that point belongs to (from 0 to M-2 inclusive)
    """
    d = expected.shape[1]
    M = expected.shape[0]
    N = real.shape[0]
    # numba doesn't support asserts, smh
    # assert expected.shape == (M, d), expected.shape
    # assert real.shape == (N, d), real.shape
    closest_pts = np.zeros_like(real)
    closest_dists = np.full((N,), np.inf)
    closest_segment_ind = np.zeros((N,), dtype=np.int32)
    for i in range(N):
        p = real[i]
        for j in range(0, M - 1):
            v = expected[j]
            w = expected[j + 1]
            r = closest_point_on_segment(v, w, p)
            cur_dist = (r - p).T @ (r - p)
            if cur_dist < closest_dists[i]:
                closest_dists[i] = cur_dist
                closest_pts[i] = r
                closest_segment_ind[i] = j
    return closest_pts, closest_dists, closest_segment_ind
wooden sail
#

ah man i had that in mind as well

#

that's pretty clean

muted crypt
#

looks extremely clean

#

clean^2

wooden sail
#

so you're simply doing orthogonal projections

#

super nice

#

i like it. it won't work if rotations are involved, but otherwise this is fantastic

muted crypt
#

therefore, with this method, increasing the number of points to infinite would compute the area between the two paths right?

wooden sail
#

btw what shape are the vectors? numpy ignores the T if the arrays are 1 dim

wooden sail
tidal bough
wooden sail
#

aight. i appreciate the clarity it gives tho

tidal bough
muted crypt
#

so how would you get a fairly accurate measurement of the deviation?

tidal bough
#

so you could take as the loss function, say, the mean squared distance to closest point

#

which would be, like, just closest_dists.mean() (closest_dists already holds the squared distances).
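For anyone without numba, here is a vectorised NumPy sketch of the same point-to-polyline idea, ending in the mean squared distance. It is an illustration under the same assumptions (Euclidean coordinates, float arrays), not the notebook's code, and `squared_dist_to_polyline` is a made-up name.

```python
import numpy as np


def squared_dist_to_polyline(waypoints: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Squared distance from each point in pts (N, d) to the polyline through waypoints (M, d)."""
    v = waypoints[:-1]                       # (S, d) segment starts, S = M - 1
    vw = waypoints[1:] - v                   # (S, d) segment vectors
    len_sq = np.einsum("sd,sd->s", vw, vw)
    # zero-length segments: the numerator below is 0 too, so any nonzero divisor gives t = 0
    len_sq = np.where(len_sq > 1e-12, len_sq, 1.0)
    t = np.einsum("nsd,sd->ns", pts[:, None, :] - v[None, :, :], vw) / len_sq
    t = np.clip(t, 0.0, 1.0)                 # stay on the segment
    closest = v[None, :, :] + t[:, :, None] * vw[None, :, :]    # (N, S, d)
    sq = ((pts[:, None, :] - closest) ** 2).sum(axis=2)         # (N, S)
    return sq.min(axis=1)


square = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]], dtype=float)
pts = np.array([[0.5, 1.2], [2.0, 0.5], [0.5, 0.5]])
sq_dists = squared_dist_to_polyline(square, pts)
loss = sq_dists.mean()  # mean squared distance to the intended path
```

This builds an `(N, S, d)` array, so it trades memory for the absence of Python loops; the numba version avoids that allocation.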

wooden sail
#

that's the one. what reptile did is turn the problem into one of ray-tracing. the number of points on the true trajectory is essentially the number of rays, so the points cannot be increased to infinity

tidal bough
#

huh, raytracing?

wooden sail
#

you did it without noticing 😛 the problem is equivalent

muted crypt
#

oh I thought it was done otherwise as there aren't many red dots in the graph

tidal bough
#

if I drew 2000 lines on a 3d plot, matplotlib would definitely have crashed my jupyter again 😛

#

they're drawn for every 20th point of the real trajectory

muted crypt
#

okay! that got me confused but makes total sense!

#

so I see that doing it this way the problem about the time shift is avoided

#

yet I guess it would be nice to find a way to stretch/scale the trajectory to see how it improves

tidal bough
#

https://paste.pythondiscord.com/oyapexafiy.py
Full notebook. the function takes* 30ms on the real 3d trajectory and 600 microseconds on my initial made-up 2d one

  • after compiling, which takes ~5s each time the function is called with a previously-unseen combination of input types
#
# I'll just trust ry that this is a valid approximation, lol:
real_df[["lat", "lon"]] = real_df[["lat", "lon"]] / 0.0000089987192
intended_df[["FPLlat", "FPLlon"]] = intended_df[["FPLlat", "FPLlon"]] / 0.0000089987192

also the way it transforms into euclidean space is ^

muted crypt
#

Crazy fast! Almost feel unbelievable!

tidal bough
#

i'm not sure how to take time into account in any way, here

#

maybe just add time as the fourth coordinate, with some coefficient depending on how important it is that the bot passes the points at the right times
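A tiny sketch of the time-as-fourth-coordinate idea. `with_time_axis` and `time_weight` are hypothetical names; the weight plays the role of the coefficient mentioned above, roughly "how many metres of error one second of lateness is worth".

```python
import numpy as np


def with_time_axis(xyz: np.ndarray, t: np.ndarray, time_weight: float) -> np.ndarray:
    """Append scaled timestamps as an extra column so distance also penalises timing error."""
    return np.column_stack([xyz, time_weight * np.asarray(t, dtype=float)])


xyz = np.zeros((3, 3))
t = np.array([0.0, 1.0, 2.0])
xyzt = with_time_axis(xyz, t, time_weight=2.0)
```

Both trajectories would need the same treatment, and the intended trajectory would need timestamps at its waypoints for this to make sense.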

#

waiting for the day numba starts supporting 3.11. or codon starts supporting windows, I guess. or maybe I should see if cython can be used as easily as numba, I guess.

copper zodiac
#

Chatterbot is being not very poggers

copper zodiac
#

Can someone explain why ChatterBot dependency install craps itself?

tidal bough
#

in what way?

copper zodiac
#

Hold on lemme pip install it

#

It takes a while

#

It gets stuck on building Spacy dependencies and then spits out a bunch of errors

merry fern
# serene scaffold sorry, but I don't think I can help with this.

hey, i got a solution if you were curious. here's how it looks:

{
'Cedant' : [[cedant for acct in premium_para.columns[1:3] \
  if premium_para.loc[(scenario, cedant),acct] != 0] for cedant in cedant], \
'Account' : [[(cedant + ': ' + acct) for acct in premium_para.columns[1:3] \
  if premium_para.loc[(scenario, cedant),acct] != 0] for cedant in cedant]
}```
copper zodiac
#

Aha here we go

#

Do you want me to send it as a text file so I don't bloat the chat?

arctic wedgeBOT
#

Hey @copper zodiac!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

copper zodiac
#

nevermind it's way too long

deft harbor
#

Can anyone think of some code that would generate synthetic data showing multiple chains in MCMC converging? I would like to create a plot like the one below, without having to try a bunch of different PyMC models. It is being used for a written example of convergence, but I'm struggling with trying to create 5-10 (x_i, y_i) that come together as x gets larger.
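One way to fake such a plot without running any sampler, as a sketch: start each chain at a dispersed value, let it decay exponentially toward a shared target, and add AR(1) noise so the traces look autocorrelated. All names and default parameters here are invented, and the output is purely illustrative, not real MCMC samples.

```python
import numpy as np


def fake_converging_chains(n_chains=6, n_steps=500, target=0.0,
                           decay=0.02, noise=0.3, seed=0):
    """Synthetic traces that start far apart and settle around a common value."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_steps)
    starts = rng.uniform(-10, 10, size=n_chains)
    # deterministic drift from each start toward the target
    drift = target + (starts[:, None] - target) * np.exp(-decay * x[None, :])
    # AR(1) noise so neighbouring samples are correlated, like real chains
    eps = rng.normal(0, noise, size=(n_chains, n_steps))
    ar = np.zeros_like(eps)
    for k in range(1, n_steps):
        ar[:, k] = 0.9 * ar[:, k - 1] + eps[:, k]
    return x, drift + ar


x, chains = fake_converging_chains()
```

Plotting is then just `plt.plot(x, chains.T)` with matplotlib; `decay` controls how long the "burn-in" looks.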

misty lava
#

Hello,

Currently using Tweepy 4.13.
my output of stream.filter() is attached, as you can see referenced_tweets type: replied_to and quoted are showing, I only want to see original tweets.

stream = MyStream(bearer_token=credentials.BEARER_TOKEN)

# CLEARS RULESET BEFORE STREAMING DATA
# for rule in stream.get_rules().data:
#         stream.delete_rules(rule.id)
# ADDING RULES TO RULESET TO STREAM SPECIFIC DATA
stream.add_rules(tweepy.StreamRule('"$ETH" -is:retweet'))

#START STREAM
stream.filter(expansions=["author_id",],tweet_fields=["created_at","referenced_tweets","lang","attachments"]) 

Here is the MyStream

class MyStream(tweepy.StreamingClient):

    # DISPLAYS "CONNECTED" ONCE STREAM IS CONNECTED
    def on_connect(self):        
        print("Connected") 

    # AVOID RETWEETED TWEETS, NON-ENGLISH TWEETS AND TWEETS WITH ATTACHMENTS, ONLY ORIGINAL ENGLISH TWEETS WITH NO ATACHMENTS ARE STORED 

    def on_tweet(self,tweet):
        # if tweet.referenced_tweets != None or tweet.lang != "en" or tweet.attachments != None:
        #     return True
        if tweet.referenced_tweets is None:
            return True
        if tweet.lang !="en":
            return True
        if tweet.attachments is None:
            return True
        print(tweet.data)
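A note on the filter above: as written, the early returns skip a tweet when `referenced_tweets` or `attachments` is None, which is the opposite of what the comment describes. A sketch of the intended predicate follows, tested against stand-in objects rather than real tweepy tweets; `is_original_english_text_tweet` is a made-up name.

```python
from types import SimpleNamespace


def is_original_english_text_tweet(tweet) -> bool:
    """True only for tweets that reference nothing, are in English, and have no attachments."""
    return (
        tweet.referenced_tweets is None   # not a reply, quote, or retweet
        and tweet.lang == "en"
        and tweet.attachments is None     # no media or polls
    )


# stand-ins for tweepy Tweet objects, for illustration only
original = SimpleNamespace(referenced_tweets=None, lang="en", attachments=None)
reply = SimpleNamespace(referenced_tweets=[{"type": "replied_to"}], lang="en", attachments=None)
```

Inside `on_tweet` this would be used as `if is_original_english_text_tweet(tweet): print(tweet.data)`. It may also be worth checking the Twitter API v2 rule operators (for example `-is:reply -is:quote`) so that unwanted tweets are filtered before they reach the client.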
hasty mountain
#

Guys, can someone help me with unsupervised learning in Neural Networks?
I know that the idea is to make the neural network to work like a tSNE or a PCA, reducing the information entropy before passing such information to the classifier layers. I've also seen that the ideal method for working with unlabeled data is to pretrain a neural network in unsupervised learning mode, apply supervised fine-tuning with labels available and only then apply self-learning to generate pseudolabels that can be incorporated to the dataset.

Problem is...I have the impression that the supervised fine-tuning is actually sabotaging my model's performance somehow. The losses don't decrease that much, and the consistency loss (MSE between 2 different outputs generated from the same input) appears to be increasing.

Is this normal or I'm doing something wrong?

#

PS: I'm using some information bottleneck in my feature extractor last layer(from 18,432 features, the net has to extract 128), and dropout of 20%, which doesn't seem to be enough to prejudice the model, but idk.

bold timber
#

Hello guys, can anyone enlighten me as to why I get a warning like this?

In this case, I want to build a model for classifying disaster tweets. In this case, I build a hybrid embedding model in which I use a universal-sentence-embedding pretrained model for token-level embedding and LSTM for character-level-embedding

For efficiency, I used a number of methods from the tf.data API: I combine characters and tokens into a dataset and turn it into a PrefetchDataset of batches.

The complete warning is like this:
"WARNING:absl:Found untraced functions such as lstm_cell_1_layer_call_fn, lstm_cell_1_layer_call_and_return_conditional_losses, lstm_cell_2_layer_call_fn, lstm_cell_2_layer_call_and_return_conditional_losses while saving (showing 4 of 4). These functions will not be directly callable after loading."

I also used ModelCheckpoint callbacks, but after training the model, I can't load the best model performance. Why did it happen? can you guys enlighten me?

indigo cove
#

Anybody knows what to do with this error?

#

This happens when running pycharm

charred light
indigo cove
#

Thank you!

mint palm
deft harbor
hasty mountain
#

I hope the problem is simply a matter of adjusting how many epochs the model runs before going to fine-tuning and to self-learning...

#

I've run another test. The first fine-tuning seems to be ok, as well as the pseudolabels generation.
The problem is...the pseudolabels generation is going fine, but it seems it may not be fine enough, which might be causing trouble...

#

At least, this is my guess pithink

#

Oh...nevermind... I just saw that, instead of sorting my losses from the lowest to highest, I was sorting them from highest to lowest...and then incorporating the worse pseudolabels into the dataset... py_guido

Remember kids: sleep for at least 8 hours each night, otherwise you might get dumb
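For the record, the ordering issue in a nutshell. This is a sketch with invented names (`select_pseudolabels`, `keep`), not the actual training code: `np.argsort` sorts ascending, so the first indices are the lowest-loss, most trustworthy pseudolabels.

```python
import numpy as np


def select_pseudolabels(losses: np.ndarray, keep: int) -> np.ndarray:
    """Indices of the `keep` samples with the lowest loss."""
    order = np.argsort(losses)   # ascending: best (lowest loss) first
    return order[:keep]          # order[::-1][:keep] would pick the worst ones instead


losses = np.array([0.9, 0.1, 0.5, 0.05])
chosen = select_pseudolabels(losses, keep=2)
```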

deft harbor
#

That isn't something I would have guessed. Glad you resolved it though.

trail cloud
#

hello
Does anybody know how to prevent Jupyter Kernel from wrapping text/plain output?
From get_iopub_msg, kernel returns content whose text/plain is wrapped, which is unwanted

rugged vale
#

for pandas is there a naming convention for masks

young granite
#

I stumbled across SHAP Multi Output Regression Model (https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/model_agnostic/Multioutput Regression SHAP.html?highlight=multi output) and want to implement it.
In the end i would like to achive a plot like for classes (https://shap.readthedocs.io/en/latest/example_notebooks/image_examples/image_captioning/Image Captioning using Azure Cognitive Services.html) so to say a heatmap for each feature influence. Is there a direct solution to that problem or do i need to use the shap.data and code it myself? 😄

regal zephyr
#

Does anyone know where I can find an open source dataset for DNA STR loci and Bio Markers for predicting medical info, like the probability of having a specific disease etc...

regal zephyr
young granite
meager venture
#

We were searching for something like FastAPI for the Kafka-based service we were developing, but couldn’t find anything similar. So we shamelessly made one by reusing beloved paradigms from FastAPI and we shamelessly named it FastKafka. The point was to set the expectations right - you get pretty much what you would expect: function decorators for consumers and producers with type hints specifying Pydantic classes for JSON encoding/decoding, automatic message routing to Kafka brokers and documentation generation.

Please take a look and tell us how to make it better. Our goal is to make using it as easy as possible for someone with experience with FastAPI.

https://github.com/airtai/fastkafka

GitHub

FastKafka is a powerful and easy-to-use Python library for building asynchronous web services that interact with Kafka topics. Built on top of Pydantic, AIOKafka and AsyncAPI, FastKafka simplifies ...

versed gulch
#

Does anyone know how to append an array (2D) to a 3D array i.e. i have a 3D array of 98x100x100 and want to append 2 2D zero arrays of size 100x100 at the beginning and the end of my 3D array giving a final shape of 100x100x100?

wooden sail
#

appending is not really a thing for numpy arrays, but you can make a new one

#

something like

```
new_array = np.zeros((100,100,100), dtype=your_dtype)
new_array[1:-1,:,:] = old_array
new_array[0,:,:] = some_2D_array
new_array[-1,:,:] = some_other_2D_array
```
where you can automate the 100s by using the shape of the old_array
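NumPy also has helpers that do this in one call. A short sketch with `np.pad` and `np.concatenate`, using an all-ones stand-in for the 98x100x100 array; both return a new array rather than growing the old one in place.

```python
import numpy as np

old_array = np.ones((98, 100, 100))  # stand-in for the real data

# pad one zero slice before and after along axis 0 only (constant zeros is the default mode)
padded = np.pad(old_array, ((1, 1), (0, 0), (0, 0)))

# same result by stacking explicit zero slices around the original
zeros_slice = np.zeros((1,) + old_array.shape[1:], dtype=old_array.dtype)
stacked = np.concatenate([zeros_slice, old_array, zeros_slice], axis=0)
```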
late herald
#

is anyone familiar with huggingface? i need some help with training custom data with hugging face model

heavy crow
#

any of you guys gotten a chance to play around with the llama models? i've been experimenting with the quantized 4bit models and it seems promising!

strange igloo
#

Hi Everyone - what is a good blog for data analytics that isn't Medium? Perhaps something long running and established with good credibility?

low mason
#

Is there a decorator to vectorize simple classes, or classes composed of base types + vectorizable classes? For example, I have code like this:

class WorkerState:
    def __init__(self):
        self.has_speed = False
        self.has_wings = False
        self.has_food = False
        self.is_bot = False

class TeamState:
    def __init__(self):
        self.eggs = 2
        self.berries_deposited = [False for _ in range(12)]
        self.workers = [WorkerState() for _ in range(4)]
I've manually written code like this:

def vectorize_worker(worker: WorkerState) -> np.ndarray:
    return np.array([worker.is_bot, worker.has_food, worker.has_speed, worker.has_wings], float)

def vectorize_team(team_state: TeamState) -> np.ndarray:
    parts = [[float(team_state.eggs)], np.array(team_state.berries_deposited, float)]
    for worker in team_state.workers:
        parts.append(vectorize_worker(worker))
    return np.concatenate(parts)

But it's pretty rote and something that could be handled automatically by a not all that smart library. Does such a library exist?
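Short of a dedicated library, one option is a small recursive helper built on the standard `dataclasses` module. This sketch assumes the state classes are rewritten as dataclasses; `flatten` and `vectorize` are made-up helper names, and fields are emitted in declaration order, which differs from the hand-written ordering above.

```python
from dataclasses import dataclass, field, fields, is_dataclass

import numpy as np


def flatten(obj) -> list[float]:
    """Recursively flatten dataclasses, lists and tuples of numbers/bools into floats."""
    if is_dataclass(obj):
        out = []
        for f in fields(obj):                 # declaration order
            out.extend(flatten(getattr(obj, f.name)))
        return out
    if isinstance(obj, (list, tuple)):
        out = []
        for item in obj:
            out.extend(flatten(item))
        return out
    return [float(obj)]                       # base case: bool, int, float


def vectorize(obj) -> np.ndarray:
    return np.array(flatten(obj), dtype=float)


@dataclass
class WorkerState:
    has_speed: bool = False
    has_wings: bool = False
    has_food: bool = False
    is_bot: bool = False


@dataclass
class TeamState:
    eggs: int = 2
    berries_deposited: list = field(default_factory=lambda: [False] * 12)
    workers: list = field(default_factory=lambda: [WorkerState() for _ in range(4)])


vec = vectorize(TeamState())
```

The vector length follows from the structure (1 + 12 + 4 * 4 here), so adding a field to either class changes the layout automatically.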

misty flint
#

FYI, i just got an email about this:

mild dirge
#

It reads images now

coral cradle
#

does anyone know any api that can give me the real exchange rate between 2 countries. I don't want to use the nominal exchange rate

misty flint
#

im dead

#

💀

mild dirge
#

Pretty awesome yeah. The examples they showed were 100% cherry picked, but it's still super impressive.

tacit basin
#

Demo was impressive. But it was demo so ... 🙂

#

Apparently bing chat runs on this. Not image part that is

misty flint
#

i want to try that napkin trick though

misty flint
#

can i write pseudocode too while drinking my morning coffee

#

the image part isnt released yet though

#

which is a bummer CL5_FeelsBongoMan

#

oh well

#

signing up for the waitlist anyway

#

📝

tidal bough
wooden sail
#

i somehow always forget that exists

#

but it's still important to note that that does not append either. just for completeness

tidal bough
#

you mean, unlike np.append? 😉

#

i know what you mean though, it's true that none of them grow the array inplace.

#

the docs really want to make sure the reader understands this

hasty mountain
#

I want papers

#

I don't care about OpenAI's code. They're a mess. I want the concepts brainmon

hasty mountain
misty flint
#

~100 pages. have fun with that

hasty mountain
#

Aw...no funny title? What happened to Radford? grumpchib

charred light
#

Also, the real notes are in the appendix.

hasty mountain
#

Nice

#

py_guido 🔥

#

Too bad I still haven't managed to make my own language model. I'd really like to take a closer look at all the problems people complain about with those models

charred light
#

The main paper is only like 14 pages. The rest are part of the appendix, covering various prompts.

#

I just realized they have like 300+ people working on this. Jesus

cyan basalt
#

anyone here recommends any courses to get into ai?

charred light
#

Makes much more sense now why it's so fleshed out lol

cyan basalt
tidal bough
#

oh interesting, gpt-4 comes already RLHFed

charred light
#

Yea, they also have a whole section on prompts that are allowed, not allowed. "Injection style"(For lack of a better term) attacks to bypass prompts

hasty mountain
#

Prompt injection?
Did they try the Do Anything Now protocol? hyperlemon

#

It seems the folks from Reddit are handling chatGPT quite effectively with that

charred light
#

The images are the most interesting to me. I'll look up how they actually do this later. But to be able to draw meaning from the image is pretty cool.

hasty mountain
#

Ugh...it's quite cool, indeed. I just get a bit sad when I think about how much computational power that may require...

tidal bough
#

i love how they mention in their paper the assorted reasons to expect an AI to murder us all and are like "so anyway, we decided to scale it some more to see what happens"

charred light
#

More like people using AI to murder people. They have a section on politics (e.g. creating propaganda @ specific age group).

tidal bough
charred light
#

There is no way they care about the humanities side of this model. I JUST realized how many people worked on this project (p15-17). At low balling ~100k salary each person, that's a lot of money sunk into this project. Monetization and reaping that money back is going to come first.

charred light
#

My main concern is when the barrier to entry in accessing AI like this becomes zero. Going to make the internet a lot muddier.

hasty mountain
#

It'll be a quite interesting game of cat and mouse...probably there'll be also models to detect when text was AI-generated

charred light
#

There's one prompt on how to build a bomb. (Although I wouldn't trust chatgpt not to have pulled that from some joke website that causes harm to the person attempting to build it.)

hasty mountain
#

Speaking of which...I still have to test a Text GAN...

tidal bough
hasty mountain
#

Poor discriminators...always so neglected...maybe it's their time to shine

tidal bough
#

wow, the section comparing early and release gpt-4 is pure gold

charred light
#

Yep, 100% worth reading through it

#

Or skimming it*

charred light
bold timber
hasty mountain
misty flint
hasty mountain
#

There's also a guy more focused on maths that folks here tend to recommend...I don't remember what channel...

charred light
frozen bloom
#

Hey, I'm trying to predict Mortalité hosp (in-hospital mortality)

import pandas as pd
import numpy as np
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import classification_report

# Mount Google Drive

drive.mount('/content/drive')

# Load Excel file into dataframe

df = pd.read_excel('/content/drive/MyDrive/Classeur2_enfants.xlsx')
df.fillna(value=0, inplace=True)

# Define the independent variables that you want to use to predict mortality

X = df[['num ', 'age ', 'sexe ', 'ATCDS ', 'AAR', 'RAA', 'Dyspnée', 'ICD', 'ACFA', 'IM isolée ', 'stade ', 'MM à IM prédom', 'stade .1', 'SOR ', 'grade ', 'FE %', 'FE ', 'PAPS ( mmhg)', 'grade (paps)', 'IT ', 'I,Ao', 'autres anomalies ', 'CAV complet', 'CAV partielle', 'CIA os ', 'CIV ', 'annuloplastie Mitrale', 'Plastie de KAY', 'Commissurotomie', 'Elargissement du feuillet post de la valve', 'fermeture du cleft', 'DEVEGA', 'autre PT ', 'RVAo', 'PVAo', 'fermeture de CAV complet ', 'fermeture de CAV partiel ', 'fermeture de CIA os', 'fermeture de CIV', 'CEC (min)', 'Clampage (min)', 'Mortalité hosp', 'décès précoce ']]

# Define the target variable that you want to predict

y = df['Mortalité hosp']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

lr_cv = LogisticRegressionCV(cv=5)
lr_cv.fit(X_train, y_train)

y_pred = lr_cv.predict(X_test)
print(classification_report(y_test, y_pred))
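
One thing that stands out in the script above: the feature list for X still contains 'Mortalité hosp', which is also the target y, so the model gets to see the answer. A minimal sketch of separating the two follows. The toy dataframe and its values are invented, and treating 'décès précoce ' (early death) as a second outcome column is an assumption about the data, not something stated in the message.

```python
import pandas as pd

# Toy stand-in for the spreadsheet; the values are made up for illustration.
df = pd.DataFrame({
    "age ": [4, 7, 9, 12],
    "CEC (min)": [60, 85, 70, 95],
    "Mortalité hosp": [0, 1, 0, 0],
    "décès précoce ": [0, 1, 0, 0],
})

target = "Mortalité hosp"
# Assumption: 'décès précoce ' is an outcome too, so it is dropped with the target.
outcome_columns = [target, "décès précoce "]

X = df.drop(columns=outcome_columns)  # predictors only
y = df[target]                        # what we want to predict
print(list(X.columns))
```

An identifier column such as 'num ' is probably worth dropping for the same reason, if it is just a row number.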

arctic wedge BOT
#

Hey @frozen bloom!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

frozen bloom
#

Can someone help me?

serene scaffold
hasty mountain
#

lol. True

#

Then it proceeds to explain a random paper from 2015

serene scaffold
mild dirge
#

Yeah it is really cool that they can extract information from images now as well with gpt

serene scaffold
#

I don't really get how the same model could use both

#

guess I'll have to read the paper

misty flint
limpid saddle
#

Hello!
Can someone give me an idea on how to deal with images from hugging face?

To give a better idea, I want to access this dataset: https://huggingface.co/spaces/competitions/aiornot

I first loaded the dataset by doing

```py
ds = load_dataset('competitions/aiornot')
```

and `ds` would print out:

```py
DatasetDict({
    test: Dataset({
        features: ['id', 'image', 'label'],
        num_rows: 43442
    })
    train: Dataset({
        features: ['id', 'image', 'label'],
        num_rows: 18618
    })
})
```
#

I am not sure what to do with this. I know I can access the nth image by doing

```py
ds['train'][0]["image"]
```

But what should the next step be? I was looking to convert the images to an np array to be able to deal with them and feed them to the CNN model, but I am not sure if that's the right thing to do. Is that even necessary?
#

Also, I'll be using tf
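
Converting to NumPy is a reasonable next step, since Keras models accept NumPy arrays. Below is a minimal sketch, assuming `ds['train'][0]["image"]` gives back a PIL image (a blank image stands in for it here). The helper name `image_to_array` and the 128x128 target size are arbitrary choices for illustration.

```python
import numpy as np
from PIL import Image


def image_to_array(img: Image.Image, size=(128, 128)) -> np.ndarray:
    """Convert a PIL image to a float32 array of shape (H, W, 3) scaled to [0, 1]."""
    img = img.convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0


# Stand-in for ds['train'][0]["image"], which is assumed to be a PIL image.
sample = Image.new("RGB", (512, 512), color=(255, 0, 0))
arr = image_to_array(sample)
print(arr.shape, arr.dtype)

# A batch for a CNN is then just a stack of such arrays: (batch, H, W, 3).
batch = np.stack([arr, arr])
print(batch.shape)
```

Resizing before stacking keeps memory manageable; converting every image at full resolution up front may not fit in RAM.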

tacit basin
thorn bobcat
#

anyone here used tesseract-ocr before?

mild dirge
mild dirge
mild dirge
#

Bit off-topic, but South Park has a new episode on ChatGPT, it's pretty funny

drifting kelp
#

How can I use pytables to write big matrices (60000 X 60000) and make operations with it?

novel python
#

what's the easiest way to drop all rows of a dataframe where a column contains values that also appear in a column of another dataframe, when the lengths don't match? I tried using a for loop:

for i in range(0, len(df)):
    print(i)
    if (df['Created By: Full Name'].iloc[i] in inactive_people['Full Name'].to_list()):
        df = df.drop(i, axis=0)

but for some reason it gives me IndexError: single positional indexer is out-of-bounds at some point, and I don't understand why, since I reset df's index before running this.

boreal gale
# novel python what's the easiest way to drop all rows in a dataframe that a column in that dat...

oops, i made a typo. here is my message again.
once you drop even one row, the dataframe has fewer rows than before, so `df['Created By: Full Name'].iloc[i]` is guaranteed to blow up: `i` still runs up to the original length - 1, because `range(0, len(df))` was evaluated once, before the loop started.

i think you just want df[~df['Created By: Full Name'].isin(inactive_people['Full Name'])]?
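
A small self-contained sketch of that `isin` filter, with made-up names and a hypothetical `ticket` column, in case it helps to see it run:

```python
import pandas as pd

df = pd.DataFrame({
    "Created By: Full Name": ["Ann Lee", "Bob Ray", "Cy Dunn", "Bob Ray"],
    "ticket": [1, 2, 3, 4],
})
inactive_people = pd.DataFrame({"Full Name": ["Bob Ray", "Zed Poe"]})

# True for every row whose creator appears in the inactive list.
mask = df["Created By: Full Name"].isin(inactive_people["Full Name"])
# Keep the other rows; the two dataframes do not need the same length.
kept = df[~mask].reset_index(drop=True)
print(kept)
```

The loop version also mixes two kinds of indexing: `.iloc[i]` is positional, while `df.drop(i)` removes the row whose label is `i`.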

novel python
#

had no idea pandas had such thing, that's what I get for missing the basics

sleek shuttle
#

Hi guys, I have a question about how to tokenize a text. is it better to use nltk or spacy?
Thanks in advance

mossy lance
olive stone
#

Hey
I am trying to train a model on Google Colab, the training goes through 4,000 images. But when training, Colab crashes because of running out of RAM.
I tried to use batches, but it didn't work.
Any idea?

wooden sail
#

if smaller batches don't help, you can try reducing the number of layers

bright pasture
#

Someone told me to remove a ddp line because the code I have assumes that it's training on multiple GPUs.

#

What do I do?

mild dirge
# bright pasture What do I do?

First find out what is causing the issue. Check how much memory the model takes up. Then check how much memory a batch takes up. Also find how much memory is available

#

The first thought that goes through my mind is that you might have big images, and maybe only 2-3 convolutional/pooling layers which makes for a very large weight matrix for the first dense layer
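
To put rough numbers on that, here is a back-of-the-envelope sketch. Every size in it is hypothetical (512x512 RGB inputs, two 2x2 poolings, 64 channels, a 256-unit dense layer, float32 values, convolutions that keep the spatial size), so plug in the real ones.

```python
def dense_weight_bytes(height, width, channels, units, bytes_per_value=4):
    """Memory for the weight matrix of a dense layer fed by a flattened feature map."""
    inputs = height * width * channels
    return inputs * units * bytes_per_value


def batch_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Memory for one batch of input images stored as float32."""
    return batch_size * height * width * channels * bytes_per_value


# 512x512 input pooled twice by 2x2 -> 128x128 feature map with 64 channels.
weights = dense_weight_bytes(128, 128, 64, units=256)
batch = batch_bytes(32, 512, 512, 3)
print(f"first dense layer weights: {weights / 1e9:.2f} GB")
print(f"one batch of 32 images:    {batch / 1e6:.1f} MB")
```

With these numbers the dense layer alone is about 1 GB before gradients and optimizer state are counted, while a batch is only about 100 MB, which is why shrinking the batch may not help much.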

bright pasture
# mild dirge First find out what is causing the issue. Check how much memory the model takes ...

I... did not understand a word you said, I'm sorry. Basically, I'm trying to run this. https://github.com/justinjohn0306/so-vits-svc-4.0-v2

I believe the train.py thing assumes that I'm doing multi gpu training, but I'm not. I only have one GPU.

mild dirge
bright pasture
#

All good. Would you be able to help me too?

mild dirge
#

I have never used the model that you linked. If following the instructions gave an error, and the code is thousands of lines of code, I'm not sure how to fix it either :/

wild rivet
#

Any Risk Analysts here?

grand warren
#

hi i am trying to make an ocr. my plan is to first threshold the image, then separate the letters using cv2, and then predict the separated letters, but the letter separation part is not working very well. what can i do to separate the letters?

hasty mountain
grand warren
#

i wanna make my own model tho

hasty mountain
#

Make your own object recognition model

grand warren
#

is it really that hard T^T

mild dirge
#

You could try a clustering algorithm

grand warren
#

whats that?

hasty mountain
mild dirge
#

Well after thresholding, you hopefully have a bunch of black letters on a white background, for example, and then you just want to find clusters of black pixels

#

And those clusters are then cropped out into separate images

hasty mountain
#

I just discovered this LearnOpenCV and...well, they got the only diffusion model tutorial that really helped me, so it might be worth taking a look

mild dirge
#

But if you want actual good results, you want to use a premade model

grand warren
#

but what will i end up learning if i do that?

mild dirge
#

Well I don't know why you are making it. If you make it to learn, then yeah obviously maybe try making your own. If you are making it because you need such a model, then it's better to use a premade model.

grand warren
#

no not that i need it

#

clustering kind of sounds interesting

#

is it an ai model too?

#

which one would suit my job?

mild dirge
#

Well I didn't mean clustering in its conventional meaning. More like using a flood fill for every black pixel you find.

#

And after finding all the connected black pixels, you crop it
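
The flood-fill idea can be sketched in plain NumPy as below. It assumes the thresholded image is a boolean array where True marks ink, and `find_components` is a hypothetical helper written for this example. OpenCV has a built-in for the same job (`cv2.connectedComponentsWithStats`), which is likely faster on real images.

```python
from collections import deque

import numpy as np


def find_components(binary: np.ndarray):
    """Return bounding boxes (top, left, bottom, right) of 4-connected True regions."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not binary[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill starting from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r, c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Tiny test image with two separate blobs standing in for two letters.
img = np.zeros((5, 9), dtype=bool)
img[1:4, 1:3] = True
img[0:5, 5:8] = True

# Sort by the left edge so the crops come out in reading order.
boxes = sorted(find_components(img), key=lambda box: box[1])
crops = [img[t:b + 1, l:r + 1] for t, l, b, r in boxes]
print(boxes)
```

Two known weak spots of this approach: letters with detached parts (the dot of an "i") come out as separate components, and letters that touch get merged into one.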

grand warren
#

hmm

stuck ore
#

hey hey hey does anybody know how i can use numpy.poly1d to output an expression

#

@ me if you do ! thank you !

violet gull
#

this is AlexNet

#

what does the 3rd dimension represent

#

i thought it was the number of feature maps but it cant be because how can 384 turn into 256
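
Assuming the figure is the usual AlexNet diagram, the third dimension is indeed the number of feature maps (channels). It can shrink from 384 to 256 because each layer's output depth equals the number of filters in that layer, regardless of how many channels come in. A naive NumPy sketch with random weights, ignoring the two-GPU split of the original network:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((13, 13, 384))  # H x W x input channels
filters = rng.standard_normal((3, 3, 384, 256))    # kH x kW x in channels x out channels

# Zero-pad so the 3x3 convolution keeps the 13x13 spatial size.
padded = np.pad(feature_maps, ((1, 1), (1, 1), (0, 0)))
out = np.zeros((13, 13, 256))
for dy in range(3):
    for dx in range(3):
        window = padded[dy:dy + 13, dx:dx + 13, :]  # (13, 13, 384)
        # Each output channel is a weighted sum over all 384 input channels.
        out += window @ filters[dy, dx]             # (13, 13, 384) @ (384, 256)

print(out.shape)  # (13, 13, 256)
```

Every one of the 256 filters spans all 384 input channels, so the output has 256 maps no matter how deep the input was.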

hasty mountain
violet gull
#

what

lapis sequoia
#

Does someone know why I am getting std as nan

stuck ore
#

I know this is probably a pretty simple issue but I'm a beginner and would love some help. How can I output an equation with numpy.poly1d?
It is working, but in a way that is not useful. It outputs an equation with the exponents on the line above the equation, so it looks like superscript instead of just using a caret, and it has improper notation for multiplication. I'm assuming it was intended to be printed and not used later in the script as an actual equation, but I need to use it as an actual equation.
This is what it's giving me:
          2
-0.01252 x + 1.026 x - 16.14
but this is what I need:
-0.01252 * x**2 + 1.026 * x - 16.14

My code:

xymodels = []
time = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300]
temp = [17, 41, 66, 300]
for t in time:
    browningRate = [(-0.01217126 * t) + 0.519399, (-0.001115784 * t) - 1.21772,
                    (-0.00361034 * t) + 0.333761, (1134 * t) - 834.9]
    model = np.poly1d(np.polyfit(temp, browningRate, 2))
    print('\n'+ str(t) + ':')
    print(model)
    xymodels.append(model)
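
A poly1d object can already be used as an equation: it is callable, so `model(x)` evaluates it and no string is needed. If a Python-style string is still wanted, it can be built from `model.coeffs`. The `poly_to_expr` helper below is a hypothetical sketch written for this example, not part of NumPy.

```python
import numpy as np


def poly_to_expr(p: np.poly1d, var: str = "x") -> str:
    """Format a poly1d as a Python-style expression such as '2 * x**2 - 3 * x + 1'."""
    terms = []
    degree = p.order
    for i, coef in enumerate(p.coeffs):
        power = degree - i
        if coef == 0:
            continue
        mag = f"{abs(coef):.4g}"  # 4 significant digits, like the printed form
        if power == 0:
            body = mag
        elif power == 1:
            body = f"{mag} * {var}"
        else:
            body = f"{mag} * {var}**{power}"
        terms.append(("-" if coef < 0 else "+", body))
    if not terms:
        return "0"
    first_sign, first_body = terms[0]
    expr = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        expr += f" {sign} {body}"
    return expr


model = np.poly1d([-0.01252, 1.026, -16.14])
print(poly_to_expr(model))  # -0.01252 * x**2 + 1.026 * x - 16.14
print(model(10))            # evaluates the polynomial at x = 10
```

In the loop above, each entry of `xymodels` can be called the same way, for example `xymodels[0](41)`. The string rounds the coefficients, so prefer calling the object when precision matters.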
flat sable