#data-science-and-ml

mild dirge
#

This is the original, not the cleaned one @tidal bough

latent remnant
#

i should run this?

jaunty falcon
#

hi

left tartan
# latent remnant i should run this?

Seems like all you want is: ```py
from io import StringIO
import pandas as pd

s = StringIO("""Team,Player,Tournament,Matches,Batting Innings,Not Out,Runds Scored,Highest Score,Batting Average,Balls Faced,Batting Strike Rate,100,50,0,4s,6s,Bowling Innings,Overs Bowled,Maidens Bowled,Runs Conceded,Wickets Taken,Best Bowling Figures,Bowling Average,Bowling Economy Rate,Bowling Strike Rate,4+ Innings Wickets,5+ Innings Wickets,Catches Taken,Stumpings Made
Delhi Daredevils,CH Morris,IPL 2016,12,7,4,195,82*,65,109,178.89,0,1,1,15,12,12,44,0,308,13,Feb-30,23.69,7,20.3,0,0,8,0""")

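data = pd.read_csv(s)  # swap `s` for the path to your own CSV file
# Keep only the IPL 2019 rows (the single sample row above is IPL 2016, so it is filtered out here)
team_2019 = data[data["Tournament"] == "IPL 2019"]
# Select just the columns of interest
team_2019_newdf = team_2019[["Player", "Team", "Matches", "Batting Average", "Batting Strike Rate", "Bowling Innings", "Bowling Average", "Bowling Economy Rate"]]
team_2019_newdf.to_csv("newfile.csv")
```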

latent remnant
left tartan
#

Change the "s" in the read_csv to your file name

#

The problem with your original code was you had brackets around the series... ```py
d = {
    "Player": player_2019,
    "Team": player_2019,  # presumably this should be the team series rather than the player one
    "Batting Innings": player_batting_innnings,
    "Batting Average": player_batting_avg,
    "Batting Strike Rate": player_strikerate,
    "Bowling Innings": player_bowling_innings,
    "Bowling Average": player_bowling_average,
    "Bowling Economy Rate": player_bowling_eco,
}
```

latent remnant
latent remnant
left tartan
#

also, in jupyter, use "display(df)" instead of "print(df)", it'll look nicer.

latent remnant
#

can i ask what stringIO does?

left tartan
#

it lets me treat a string like a file/io object
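A minimal sketch of that idea, using only the standard library. The CSV text and column names below are made up for illustration; `pd.read_csv` accepts the same kind of object in place of a filename, which is what the earlier snippet relies on.

```python
import csv
from io import StringIO

# Hypothetical CSV text, standing in for the contents of a real file
text = "Player,Matches\nCH Morris,12\nSomeone Else,7\n"

# StringIO wraps the string in a file-like object (read(), readline(), seek(), ...)
buffer = StringIO(text)
header = buffer.readline().strip()  # read the first line, just like with an open file
print(header)

# Rewind and hand the same object to any API that expects an open file
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["Player"], rows[0]["Matches"])
```

This is handy for sharing a reproducible example in chat without attaching a file.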

latent remnant
latent remnant
past meteor
#

9 times out of 10 I learn stuff by just reading their official documentation. If it's sparse, shady or something odds are that I won't touch the package

latent remnant
#

Thank you so much! 😊

compact valley
#

is 8gb macbook pro m1 enough for data science?
I heard that i need more ram for data..?

mild dirge
#

If you want to run large models, and within a certain time limit, you probably want a desktop, or just a simple laptop and use servers to run the models @compact valley

#

I wouldn't recommend a laptop to run any big models, but you can use services like Google Colab to run them on their servers

compact valley
#

I have a desktop with 32gb ram

#

my company would also pay for cloud i guess thats an option here

mild dirge
#

32GB is enough, and most important is gpu for most models

#

And cpu, but that is often not the bottleneck

past meteor
#

My work laptop has 8GB ram

jaunty hinge
#

can i get help for an ARIMA model here

past meteor
#

99 % of my development is through SSH. I cycle to work, I don't want to carry a GPU monstrosity uphill on my bike 🙂

#

Also, my dev VM only had 4GB ram in the beginning until I annoyed IT enough to make it 8 and subsequently 64. I work with several tens of millions of rows of data. It's stupid but being constrained memory wise teaches you how to do things "properly"

#

Which matters a lot if/when you scale to terabyte size datasets

past meteor
jaunty hinge
# past meteor shoot! 🙂

start=len(train)

end=len(train)+len(test)-1

pred=model.predict(start=start,end=end,typ='levels').rename('ARIMA Predictions')

pred.plot(legend=True)

test['AvgTemp'].plot(legend=True)

When I run this code I get this error

TypeError: Model.predict() missing 1 required positional argument: 'params'

#

idk how to fix it i need it for like an essay for my highschool

past meteor
#

Are you using statsmodels?

jaunty hinge
#

yeah

past meteor
#

Have you fit your model already?

jaunty hinge
#

uhh

#

im not sure

past meteor
#

Yeah, you've fit it 🙂 now you can do model_fit.forecast()

#

I assume you want to do out-of-sample forecasts? (note: what people mean by out-of-sample is that you fit your model with weather data until 2023 and then use the model to see what 2024 is like)

jaunty hinge
#

yeah

#

how do i use like the code u mentioned because im soo new to python

finite bluff
#

hello every one

left tartan
#

I'm just adapting an example I had in a notebook already: ```py
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

n_points = 100
x = np.linspace(0, 20 * np.pi, n_points)
noise = np.random.normal(0, 0.5, n_points)
y = 5 * np.sin(x / 2) + noise

p, d, q = 1, 1, 1
model = ARIMA(y, order=(p, d, q))
fit_model = model.fit()

forecast_steps = 20
forecast = fit_model.get_forecast(steps=forecast_steps)
conf_int = forecast.conf_int()

plt.figure(figsize=(12, 6))
plt.plot(y, label="Data")
plt.plot(np.arange(n_points, n_points + forecast_steps), forecast.predicted_mean, color="red", label="Forecast")
plt.fill_between(np.arange(n_points, n_points + forecast_steps), conf_int[:, 0], conf_int[:, 1], color="pink", alpha=0.3)
plt.legend()
plt.show()

print(fit_model.summary())
```

past meteor
#

So you have your screenshot right? Make a new cell in your notebook and then simply write model_fit.forecast()

#

You need to put the amount of steps you want to forecast for in the method. Otherwise it'll default to 1.

past meteor
# jaunty hinge this?

Are you familiar with the "statistics" behind ARIMA by the way, how did you pick your order parameters?

left tartan
past meteor
#

If it's weather data I actually think you need a SARIMA model

jaunty hinge
#

im just trying to get this math essay done for school

#

i hate ib

past meteor
#

Wild they have you doing ARIMA in high school

jaunty hinge
#

nah its a self thing

past meteor
#

That's crazy

jaunty hinge
#

i get to pick my topic

#

and i couldnt find anything for math

#

so i wanted to create a short term weather forecast

#

and it led me here

past meteor
#

What's your "window"? How many days are you considering?

jaunty hinge
#

7 days

past meteor
#

Okay then you don't need SARIMA you still might

#

What's the frequency of your measurements

jaunty hinge
#

wdym

past meteor
#

Do you have a measurement every hour? Every day, every minute?

jaunty hinge
#

oh every day

past meteor
#

Just one per day?

jaunty hinge
#

yeah

#

is that bad

past meteor
#

Okay then you don't need SARIMA at all. That's a relief

#

regular ARMA will be enough

jaunty hinge
#

So i did what u said i got this

#

im not rlly sure i know what this is

past meteor
#

I think you're just plotting your test data as-is now, no?

jaunty hinge
#

i think so

past meteor
#

When is your assignment due?

jaunty hinge
#

is there anyway i can show u my full code

jaunty hinge
past meteor
#

If you have the time I'd review basic Python and also watch some videos on AR, MA and ARIMA

jaunty hinge
#

i mean after i finish the model i need to write like 2500 words abt it and stuff

past meteor
#

Maybe I'm not the best at this but there's too much for me to unpack

jaunty hinge
#

ive been watching this tutorial

#

and i had to watch like so many other tutorials in the middle to get where he was

past meteor
#

Basically they tell you how correlated each measurement at time n is with n-1, n-2, n-3, ...

#

They directly inform what the p and q (so the AR and the MA orders) of ARIMA should be
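Assuming "they" refers to autocorrelation (ACF/PACF) plots, the quantity behind an ACF plot can be sketched with plain NumPy. The helper name `autocorr` and the synthetic AR(1) series are hypothetical, chosen only for illustration.

```python
import numpy as np


def autocorr(series, max_lag):
    """Sample autocorrelation of `series` at lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # Correlate the series with a copy of itself shifted by k steps
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


# Synthetic AR(1) series: each value is 0.7 times the previous one plus noise
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + noise[t]

acf = autocorr(y, max_lag=5)
print(np.round(acf, 2))
```

As a rule of thumb, an autocorrelation that cuts off after lag q hints at an MA(q) term, while a partial autocorrelation that cuts off after lag p hints at an AR(p) term.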

jaunty hinge
#

oh yeah this was all fine

#

my data was stationary

past meteor
#

In your case the I should be 0, you shouldn't detrend

jaunty hinge
#

i just need to like predict and show the next forecast

past meteor
#

Your data being stationary or not is only related to the I parameter of ARIMA

#

If you just want to predict then you should save the result into a new variable so predictions = model_fit.forecast(steps=7)

jaunty hinge
#

the data i already have tho

finite bluff
#

hey guys, could I ask a question about this field?

past meteor
finite bluff
#

Oh sorry for that. I'm just curious about data science. Actually, I'm learning IT at my university and now I'm interested in data science. What things I have to learn first and what websites could help me to cover a whole range of topics in a simple way to kick off this journey :))

odd meteor
# finite bluff Oh sorry for that. I'm just curious about data science. Actually, I'm learning I...

https://Kaggle.com/learn combine that with YouTube + courses from Udemy or DataCamp or DataQuest etc (if you're interested in making a financial commitment), and you'll be well on your way

#

Check our pinned message for additional resources

lapis sequoia
left tartan
#

This channel really needs a sticky (pinned?)

odd meteor
past meteor
#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

past meteor
#

I think there was a data science specific one? idk

misty flint
#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

misty flint
#

oh hey that worked

left tartan
misty flint
#

i remember stel did a similar command

past meteor
#

The data science resources there are pretty meek though

misty flint
#

they are. we were talking about adding also some data eng resources on there

left tartan
#

Yah, I have a bunch I'd like to contribute for the data science stuff

misty flint
#

but i kinda dropped the ball on that

#

oops

#

@serene scaffold billybobby has volunteered. let me compile something for DE/MLE

left tartan
#

Lol

past meteor
#

Excel data engineering pithink (I'm joking)

misty flint
#

oh speaking of which

#

were you able to get it working

left tartan
#

No

misty flint
#

python in excel

left tartan
#

It's in a limited rollout

misty flint
#

rip

left tartan
#

so even though I have beta channel, they haven't pushed it to all beta users

misty flint
#

im seeing it more and more on my newsfeed

#

i think even fireship did a video

left tartan
#

it's interesting... the execution is a sandboxed anaconda install that runs in the Microsoft cloud with about 400 pre-selected packages. Can't run whatever you want, though.

misty flint
#

400 pre-selected packages. i dont remember if thats more or less than what comes with regular anaconda

left tartan
#

but, this is a good thing, when compared to the security nightmare of vba.

misty flint
#

excel vba is a security nightmare def

#

idk how those finance folks arent aware

left tartan
#

necessary evil, perhaps.

misty flint
#

probs

past meteor
#

Compiling a data engineering list is hard(er) because it can mean literally anything

misty flint
#

it does but we will start somewhere

past meteor
#

good ol' SQL fundamentals

left tartan
past meteor
#

I think the issue with DE moreso than with DS is that it's a really tools-driven field

#

You can make an argument that on some level you're learning tools and not concepts

#

Or the old heads that reduce all of the domain to dimensional modelling and dashboards lemon_angrysad

left tartan
#
Medium

Common Backend Services & Opensource Data Stack

misty flint
#

met them in person. real cool folks 👍

#

and ofc the diagram is from the first chapter of FoDE. i really sound like a broken record at this point

left tartan
#

hey, I put fode in my list

misty flint
#

and thats all that matters

left tartan
#

I guess I should throw duckdb in my list then

misty flint
#

my goal has been fulfilled

past meteor
#

I have never heard of that book interesting

#

You should add a Kimball / Inmon book to the list

#

That's how it started for me at least 🤷

misty flint
misty flint
past meteor
left tartan
#

hmm

past meteor
#

At least for my work I prefer it much more than DuckDB or SQL in general. My transformations are awkward to express as SQL queries. It's unnecessary pain.

worn stratus
past meteor
#

I'm still a big believer in the fact that people should YEET Pandas out of their data engineering stack

#

It's inferior in all respects compared to Polars except it integrating better with data viz and ML tools

#

The memory footprint being a lot smaller, better multi threading and the lazy API are just big selling points

worn stratus
past meteor
#

If you're working with time series. Group by dynamic is the absolute GOAT

worn stratus
#

the main thing pandas has going for it is the first-mover advantage: a ridiculous amount of tooling layered on top of it

misty flint
#

if that makes sense

past meteor
#

Sure, it makes sense

misty flint
#

for DS and DAs

#

for ETL and whatnot, def different workflows

past meteor
#

I still use Pandas when I'm getting very close to my "final" application

misty flint
#

yeah just to peek and look

#

i do the same

past meteor
#

E.g., when I'm getting close to sklearn and whatnot I use Pandas

worn stratus
#

I'm completely off of pandas at work at this point, which might screw me when I come to interview

past meteor
#

I think Spark / Databricks are (sadly) data engineering essentials

#

Why read a small CSV in Python when you can wait several seconds for the JVM to start!! 😉

worn stratus
past meteor
#

I'd prefer doing Spark in Scala if not only for the fact that UDFs are a lot less painful and they don't tank your performance as much

left tartan
past meteor
#

DuckDB is just SQL, there's nothing to learn?

worn stratus
worn stratus
left tartan
past meteor
#

I don't fully understand the DuckDB hype

#

I get it for people that don't know Python

#

The people that are going for the whole DuckDB + DBT set-up

left tartan
#

Oh, dbt is heaven... but:

past meteor
#

For the rest DuckDB and Polars are quite similar in terms of capabilities. The question becomes "do you want to write SQL or have a dataframe API"

#

I want a dataframe API any day of the week because you get all of software's best practices for free

worn stratus
#

i get the duckdb hype. SQL is already the lingua franca of the data world, and it lets you avoid pandas which is basically a dsl largely separate from actual python

left tartan
#

the duckdb team is just cranking out features and it's just so much better than what's out there. For example: columns with regular expressions or lambdas or Python UDFs or pivot/unpivot

misty flint
left tartan
#

(I'm not affiliated, but just what they've shipped in the past year is amazing)

past meteor
#

SQL overuse can be terrible as well

left tartan
#

as_of joins, positional joins, etc.

past meteor
#

It's declarative but when you need something that doesn't exist natively you end up tying together all of those declarative things into a procedural monster

misty flint
#

when you run the sql query in order to get more sql queries galactic_brain

past meteor
left tartan
#

Note the inline use of the global dataframe: ```py
import math

import duckdb
import pandas as pd

somedf = pd.DataFrame({"i": list(range(22))})
with duckdb.connect() as con:
    # Register a Python UDF; HUGEINT is used because 21! no longer fits in a 64-bit integer
    con.create_function('factorial_method_native', lambda x: math.factorial(x), [duckdb.typing.HUGEINT], duckdb.typing.HUGEINT, type='native')
    # `somedf` is picked up by name from the surrounding Python scope
    df = con.sql('select factorial_method_native(i) from somedf tbl(i)').df()
    print(df)
```

misty flint
past meteor
#

Writing strings in Python also just sucks

#

it's a hack

left tartan
left tartan
past meteor
#

I've looked at DuckDB extensively tbh. I've even "sold" it to colleagues using worse tools (laughs in R)

left tartan
#

but for me, I can join a polars df, with hive partitioned parquet files, with a csv file, in a single operation.

#

that's just black magic for me.

past meteor
#

Went to talks run by elite data engineering teams on DuckDB etc as well

#

But I just still don't see why I'd squeeze it into my current workflow

#

It looks inferior to my current setup, just my 2 cents

worn stratus
#

the main thing putting me off of duckdb is that it's kinda redundant for analysis once your data is already in some SQL database.

why pull data from snowflake just to change the SQL flavor?

misty flint
#

i think it has a dif use case imo

left tartan
#

I mean, if you're in snowflake, you can stay in snowflake, I prob wouldn't change that.

past meteor
#

There's edge cases where I would use duckDB though, I think its streaming and larger-than-memory support is better than Polars

#

If I were really really memory constrained I'd move to DuckDB

#

But realistically, at that point they should hire someone else. I'm not a data engineer 😢

#

Everywhere I've been everyone just sucked at it so I do it, with pleasure

#

Otherwise the data pipeline would be csv files on teams

worn stratus
left tartan
#

Yah, I'd mix it in, personally... duckdb works over polars /arrow data too, so you can use either or both

misty flint
#

chroma, vector db, is built on top of duckdb so theres that

#

i need that one meme where you lift the mask and its just duckdb underneath

past meteor
#

I don't understand vector db's either

misty flint
#

so you got some embeddings right?

past meteor
#

I get that part

misty flint
#

these are lists of floating point numbers

#

and you need to read/write them

#

super duper fast

past meteor
#

What I don't get is why the tech is so overhyped

misty flint
#

current db systems suck at this

misty flint
past meteor
#

Just serializing it normally and doing a dot product works unless you have a really really large set of embeddings

misty flint
#

the issue comes down to the model

past meteor
#

People have been doing algebraic topic modelling for ages without vector dbs

misty flint
#

some models generate embeddings that are only 300ish dimensions per "row" (looks at SBERT)

#

some are wildly large embeddings

#

with huge dimensions

past meteor
#

Just benchmark matrix vector multiplication with different sizes in numpy and then ask yourself if the hype is warranted. I have and for me it was a "no"
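The brute-force approach described here can be sketched as a single matrix-vector product. The sizes (20,000 embeddings of dimension 384) and the random data are arbitrary assumptions, and the timing will depend on the machine.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
n_vectors, dim = 20_000, 384

# Random unit-length "embeddings" standing in for real model output
embeddings = rng.normal(size=(n_vectors, dim)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Query: a slightly perturbed copy of row 42, so the correct answer is known
query = embeddings[42] + 0.01 * rng.normal(size=dim).astype(np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = embeddings @ query                      # cosine similarity to every row
top_k = np.argpartition(scores, -5)[-5:]         # indices of the 5 best scores, unordered
top_k = top_k[np.argsort(scores[top_k])[::-1]]   # order them best-first
elapsed = time.perf_counter() - start

print(top_k, f"{elapsed * 1000:.1f} ms")
```

Scaling `n_vectors` up is a quick way to see where exact search stops being fast enough for a given use case.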

misty flint
#

i mean my work mentor built his own vector db doing just what you described

#

however

#

just look at how much money pinecone is raking in lol

#

there are some use cases where some stuff just doesnt scale

#

someone in this server was telling about it

#

and how they were waiting for a good vector db to come out

#

thats equivalent to cassandra

past meteor
#

This exists but it's not proportional to the vector db propaganda

misty flint
#

agreed

past meteor
#

I did a course on information retrieval in uni and I guess this is why they reminded us that the triangle inequality is a thing

#

There's also extensive research on approximate nearest neighbour search etc

upper flame
#

@past meteor i sent u a dm

cyan belfry
#

Hi 😊
new to this topic, I have a huge trading data set of about 420k rows each day
what will be the best way to find similar patterns for all the data sets I have ?
And after that of course convert it into a model

cyan belfry
left tartan
#

Maybe narrow down your question then?

cyan belfry
#

im not sure how 😅

#

I thought maybe an AI could go through the data and find something

cyan belfry
ashen ore
#

What are good laptops out there rn for performing deep learning tasks and processing other ml models

left tartan
desert oar
#

save your 10 GB subset locally and go to town with duckdb, polars, pandas, whatever

misty flint
#

@serene scaffold i ended up compiling a basic list aimed towards beginners. ill send you a link to the notion doc if youre still interested in updating pydis resources

past meteor
desert oar
past meteor
#

It really is beautiful technology though. Fully separate compute and storage. Predicate pushdown. S3-like files with a SQL interface that make it act like a RDBMS, ...

#

I'm wary of doing R&D in the cloud though

past meteor
upper flame
#

Hey can someone assist me with a code pls. it's rlly URGENT. For volunteers ping me or dm

slim bone
#

I'm trying to understand the point of max-pooling, at least on an intuitive level.
The book I'm reading is trying to explain this but I can't understand the explanation at all, thus I can't really formulate a concrete question besides "Why do we actually use max-pooling?"
For the sake of clarity: I know what it does, just not why it's used.
Thanks in advance

Edit: Trying to research this a little further - and it seems like there's no definitive answer. Am I digging too much into things? Because indeed, most of my learning experience with a lot of subjects related to ML could sort of be summed up with "X works, because someone thought X could work, so they implemented X into some models and noticed an improvement"
So perhaps there's a meta question here: Should I even bother trying to understand some of these concepts?

I've attached the explanation I'm having trouble with

pallid badge
#

Would this be the correct subchannel to discuss hdf5 files?

serene scaffold
misty flint
twin forge
#
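```py
import yfinance as yf


def OHLCV(list_tickers, start, end):
    """Download daily OHLCV data per ticker, skipping tickers that fail or return nothing."""
    ohlcv = {}

    for t in list_tickers:
        try:
            data = yf.download(t, start=start, end=end, interval="1d", repair=True).dropna()
            if not data.empty:
                ohlcv[t] = data
        except Exception:
            # Skip tickers whose download fails (e.g. delisted symbols)
            pass
    return ohlcv
```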

Hey guys I'm working with a big ticker sample that includes delisted stocks but as you can see there's tons of price data series behaving incorrectly.

How can I identify these bad time series data so that I can remove them from my analysis? 

I'm thinking something along the lines of "based on how the other prices reacted during time interval, remove these tickers from ohlcv"
verbal venture
#

Can someone help me with this code: ```py
import numpy as np


def initialize_model(N, V, random_seed=1):
    '''
    Inputs:
        N: dimension of hidden vector
        V: dimension of vocabulary
        random_seed: random seed for consistent results in the unit tests
    Outputs:
        W1, W2, b1, b2: initialized weights and biases
    '''

    ### START CODE HERE (Replace instances of 'None' with your code) ###
    np.random.seed(random_seed)
    # W1 has shape (N,V)
    W1 = np.random.randn(N, V)

    # W2 has shape (V,N)
    W2 = np.random.randn(V, N)

    # b1 has shape (N,1)
    b1 = np.zeros((N, 1))

    # b2 has shape (V,1)
    b2 = np.zeros((V, 1))

    ### END CODE HERE ###
    return W1, W2, b1, b2
```
left tartan
twin forge
#

could I simply .apply() something?

#

maybe for example: if stock volume is under x don't add it to ohlcv_dict?

slim flicker
#

!verify

#

!voiceverify

#

!voiceverify

verbal venture
#

it's not passing tests

serene scaffold
serene scaffold
verbal venture
#

Wrong initialization for b2 vector. Check the use of the random seed.
Expected: [[0.77951459]
[0.02293309]
[0.57766286]
[0.00164217]
[0.51547261]]
Got: [[0.]
[0.]
[0.]
[0.]
[0.]].
24 Tests passed
12 Tests failed

#

I don't know what I could change about the func. I did torch.randn and regular np.zeros() but got the same test output

serene scaffold
verbal venture
#

ah

left tartan
verbal venture
#

I got the same output

serene scaffold
verbal venture
#

change b1 & b2 to randn

left tartan
verbal venture
#

Wrong initialization for b2 vector. Check the use of the random seed.
Expected: [[0.77951459]
[0.02293309]
[0.57766286]
[0.00164217]
[0.51547261]]
Got: [[-0.10106761]
[-0.05230815]
[ 0.24921766]
[ 0.19766009]
[ 1.33484857]].

serene scaffold
#

@verbal venture the n in randn stands for normal. It does something different than the general purpose random array generator
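To illustrate the difference: `np.random.randn` draws from a standard normal distribution (so values can be negative), while `np.random.rand` draws uniformly from [0, 1). The expected values in the test output all lie in [0, 1), which is consistent with the uniform generator. The sketch below assumes the grader draws `W1`, `W2`, `b1`, `b2` in that order with `np.random.rand`; that order is a guess and is not confirmed anywhere in this thread.

```python
import numpy as np


def initialize_model(N, V, random_seed=1):
    """Initialize weights and biases with uniform random values in [0, 1)."""
    np.random.seed(random_seed)
    W1 = np.random.rand(N, V)  # shape (N, V)
    W2 = np.random.rand(V, N)  # shape (V, N)
    b1 = np.random.rand(N, 1)  # shape (N, 1)
    b2 = np.random.rand(V, 1)  # shape (V, 1)
    return W1, W2, b1, b2


W1, W2, b1, b2 = initialize_model(N=4, V=5)
print(b2.ravel())
```

Because all four arrays come from one seeded stream, changing the order of the calls changes the values each array receives.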

verbal venture
#

ah ok, how long have you been doing AI for

serene scaffold
verbal venture
#

all NLP?

serene scaffold
#

Pretty much

verbal venture
#

is it just transformers nowadays?

#

like in production

serene scaffold
#

Yes

#

Everyone wants large language models that use transformers

verbal venture
#

figured

#

can you help with this as well? ```py
import numpy as np


def back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size):
    '''
    Inputs:
        x: average one hot vector for the context
        yhat: prediction (estimate of y)
        y: target vector
        h: hidden vector (see eq. 1)
        W1, W2, b1, b2: matrices and biases
        batch_size: batch size
    Outputs:
        grad_W1, grad_W2, grad_b1, grad_b2: gradients of matrices and biases
    '''

    # Compute z1 as "W1⋅x + b1"
    z1 = np.dot(W1, x) + b1

    ### START CODE HERE (Replace instances of 'None' with your code) ###

    # Compute l1 as W2^T (Yhat - Y)
    l1 = (yhat - y)

    # if z1 < 0, then l1 = 0
    # otherwise l1 = l1
    # (this is already implemented for you)

    l1[z1 < 0] = 0 # use "l1" to compute gradients below

    # compute the gradient for W1
    grad_W1 = np.dot(l1, x.T) / batch_size

    # Compute gradient of W2
    grad_W2 = np.dot(l1, h.T) / batch_size

    # compute gradient for b1
    grad_b1 = np.sum(l1, axis=1, keepdims=True) / batch_size

    # compute gradient for b2
    grad_b2 = np.sum(yhat - y, axis=1, keepdims=True) / batch_size
    ### END CODE HERE ###

    return grad_W1, grad_W2, grad_b1, grad_b2
```
#

error is: boolean index did not match indexed array along dimension 0; dimension is 5778 but corresponding boolean dimension is 50
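The error message points at a shape mismatch: `z1` has one row per hidden unit (50 here), while `yhat - y` has one row per vocabulary word (5778 here), so the mask `z1 < 0` cannot index it. The exercise's own comment says `l1` should be `W2^T (Yhat - Y)`, which has the hidden dimension. The sketch below applies that, and uses `yhat - y` directly for the `W2` gradient. It runs on small made-up shapes and has not been checked against the course grader.

```python
import numpy as np


def back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size):
    """Gradients for a one-hidden-layer ReLU network (shapes: N hidden, V vocab, m batch)."""
    z1 = np.dot(W1, x) + b1                        # (N, m)
    l1 = np.dot(W2.T, yhat - y)                    # (N, m): error pushed back through W2
    l1[z1 < 0] = 0                                 # ReLU gradient: zero where z1 was negative
    grad_W1 = np.dot(l1, x.T) / batch_size         # (N, V)
    grad_W2 = np.dot(yhat - y, h.T) / batch_size   # (V, N)
    grad_b1 = np.sum(l1, axis=1, keepdims=True) / batch_size        # (N, 1)
    grad_b2 = np.sum(yhat - y, axis=1, keepdims=True) / batch_size  # (V, 1)
    return grad_W1, grad_W2, grad_b1, grad_b2


# Small hypothetical shapes just to check that everything lines up
N, V, m = 4, 10, 3
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(N, V)), rng.normal(size=(V, N))
b1, b2 = rng.normal(size=(N, 1)), rng.normal(size=(V, 1))
x, y, yhat = rng.random((V, m)), rng.random((V, m)), rng.random((V, m))
h = np.maximum(0, np.dot(W1, x) + b1)

grad_W1, grad_W2, grad_b1, grad_b2 = back_prop(x, yhat, y, h, W1, W2, b1, b2, m)
print(grad_W1.shape, grad_W2.shape, grad_b1.shape, grad_b2.shape)
```

A useful sanity check in general is that each gradient has the same shape as the parameter it belongs to.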

orchid sky
#

!resources ai

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

twilit tundra
# slim bone I'm trying to understand the point of max-pooling, at least on an intuitive leve...

Why pooling? Because it's helpful to scale CNNs.

Why maxpooling? It introduces nonlinearity and the intuition is pretty natural: you're letting a group of neurons fire if at least one of them fires.

It also makes the whole computation more robust (for example if you detect an object only on one corner of your pooling box, the signal still carries over when it would have been 'obfuscated' by an avg pooling)
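The corner case mentioned here can be sketched with a tiny NumPy example. The helper `pool2x2` is hypothetical and only handles 2x2 non-overlapping windows on a single 2-D feature map whose sides are even.

```python
import numpy as np


def pool2x2(feature_map, reducer):
    """Apply `reducer` (e.g. np.max or np.mean) over non-overlapping 2x2 windows."""
    h, w = feature_map.shape
    # Split rows and columns into (block index, position inside block)
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return reducer(blocks, axis=(1, 3))


# A feature map where the detector fired in only one corner of one window
feature_map = np.zeros((4, 4))
feature_map[0, 0] = 1.0

max_pooled = pool2x2(feature_map, np.max)
avg_pooled = pool2x2(feature_map, np.mean)
print(max_pooled)  # the activation survives at full strength
print(avg_pooled)  # the activation is diluted to a quarter
```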

twilit tundra
#

There are alternatives (most notably using a convolutional layer with stride 2 https://arxiv.org/pdf/1412.6806.pdf) but they didn't really improve the performance of vision models, so they're not popular outside of academia

slim bone
#

It all feels so... "handwavy" I think is the terminology in English

#

When I thought about it further I managed to conjure up some random explanation where "The most dominant pixels on the feature map are carried over, and thus it's perhaps easier to concentrate on the relevant features of the image" but honestly it seems like any effort at trying to explain this phenomenon on an intuitive level without the appropriate background is futile

wooden sail
# slim bone When I thought about it further I managed to conjure up some random explanation ...

i think this might be a nice read for you https://www.di.ens.fr/willow/pdfs/icml2010b.pdf

at the end of the day, max pooling is a heuristic. it doesn't always work nor make sense. the paper i linked explores the regime in which max pooling is beneficial. keep in mind downsampling is a common operation in signal processing that can be justified in many ways, e.g. the spectrum of the data is relatively low frequency, allowing downsampling without loss of information, or the data admitting a sparse representation in some domain. max pooling is one choice of irregular downsampling that can be beneficial depending on the properties of the data. there are also cases in which you do lose information by downsampling, but the high frequency information benefits from using the low frequency info as a "prior" or "initial guess", where it makes sense to consider both the downsampled data and the full data at the same time (think e.g. u-nets with "skip connections"). but yeah, it doesn't always make sense and there is also no guarantee of optimality for it in general

#

but as you correctly point out, someone tried stuff and this worked in many cases 😛 people looked for explanations and interpretations after

shadow viper
#

hello everyone

#

how can you check for top 5 states with top 10 most demanding services each in the us?

#

how exactly can i get this data?

worn stratus
worn stratus
slim bone
# wooden sail i think this might be a nice read for you https://www.di.ens.fr/willow/pdfs/icml...

Thank you both for the detailed explanation
Admittedly I don't entirely understand the terminology behind "High/low frequency data" (Not sure if signal processing is necessarily even taught in my degree) but I think I got the idea of what you're trying to say, something like "Transformations of the type F^n -> F^m where m < n can still be "rich" with information and often help the computer process it" or something along those lines? ^^;

I really just wanted to hear that there isn't a single concrete explanation for this and that I'm not missing some fundamental theory. So indeed, I think you put my mind at ease

wooden sail
#

if you wanna think about it that way, it'd be that the original signal in F^n is in a vector space of dimension d <= m

#

then T: F^n -> F^m can be invertible from the left if you apply it to an input that is not in the null space of T

#

the case of "signal frequency" is a particular choice of T where we have a unitary matrix U in F^nxn, and we construct T by keeping m columns of U and transposing. the particular U of orthogonal complex exponentials is the "fourier basis" used to look at the frequency domain of signals, but other choices are possible
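A small numerical sketch of this construction, under assumed values: a length-64 signal built from two low frequencies, and 7 kept columns of the unitary DFT matrix. For a complex basis the "transpose" is taken as the conjugate transpose.

```python
import numpy as np

n = 64
t = np.arange(n)
# A real signal containing only frequencies 2 and 3 (in cycles per n samples)
x = np.cos(2 * np.pi * 2 * t / n) + 0.5 * np.sin(2 * np.pi * 3 * t / n)

# Unitary DFT matrix: column k is a complex exponential of frequency k
U = np.exp(2j * np.pi * np.outer(t, t) / n) / np.sqrt(n)

# Keep only the low-frequency columns (frequencies -3..3), so m = 7
keep = [0, 1, 2, 3, n - 3, n - 2, n - 1]
U_m = U[:, keep]       # n x m
T = U_m.conj().T       # m x n: the map from F^n down to F^m

coeffs = T @ x                 # 7 numbers instead of 64
x_rec = (U_m @ coeffs).real    # left inverse of T on the span of the kept columns

print(len(coeffs), np.max(np.abs(x_rec - x)))
```

The reconstruction is exact here only because the signal lies in the span of the kept columns; a signal with higher-frequency content would lose information.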

slim bone
#

My Linear Algebra is a little rusty, maybe that's why I'm not getting it

wooden sail
#

that's not necessary depending on what you do after, but it can be nice to have

slim bone
wooden sail
#

for example with u-nets that i mentioned earlier, you can use a network to denoise an image

#

you can do this by slowly downsampling an image down to very few samples, and then using those few samples to reconstruct the full image without noise

#

similar things are done all the time in image processing more generally under the guise of "auto-encoders"

slim bone
#

Completely out of context but is "Image processing" its own field?

#

Like ML or whatever

wooden sail
#

yesn't 😛

#

in all of signal processing, the general maths are transferrable

#

but in each application within it you can go arbitrarily deep & cursed

#

at some point you start seeing stuff that is only done in image processing and virtually no other signal processing/ML application

slim bone
#

Ah, so it's not that simple

#

Got it haha

slender kestrel
wooden sail
#

hi

slender kestrel
#

hows you ! and hows life been ?

wooden sail
#

been ok, i'm not sure i remember you though

slender kestrel
#

you helped me once on a problem i was stuck on

wooden sail
#

i see, hopefully that worked out well 😛

slender kestrel
proper meteor
#

someone help with the shape of the images pls

wooden sail
past meteor
#

Principal component analysis

wooden sail
#

PCA is one way of doing it, but not the only

past meteor
#

Definitely.

wooden sail
#

technically any time you use a parametric representation with fewer parameters than there are data points, this happens

rugged mist
#

oh interesting i know about pca

wooden sail
#

e.g. fitting the 3 parameters of a sinusoid to 100 time domain samples

past meteor
#

It's more general than PCA, I guess any compression and reconstruction pair can work

wooden sail
#

or linear regression

#

yep

past meteor
#

In uni I always wondered why PCA specifically was always mentioned but in the end, the properties of the method just make it ultra suitable

wooden sail
#

PCA is mentioned because it ties to covariance matrices and the fundamental theorem of linear algebra in one shot

#

super clean whenever your data is contained in a comparatively low dimensional linear subspace of the overall space the data is in
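As a sketch of that "super clean" case, the example below builds data that lies in a 2-dimensional linear subspace of a 10-dimensional space (sizes chosen arbitrarily) and runs PCA via the SVD of the centered data. Two components then reconstruct everything.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 samples that live in a 2-D linear subspace of R^10 (assumed toy data)
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
X = latent @ basis

# PCA via SVD of the centered data: rows of Vt are the principal directions
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 2
Z = (X - mean) @ Vt[:k].T      # compressed representation, shape (200, 2)
X_rec = Z @ Vt[:k] + mean      # reconstruction from only k components

print(np.round(S, 6))          # only the first two singular values are non-negligible
print(np.max(np.abs(X_rec - X)))
```

With noisy real data the trailing singular values are small rather than zero, and choosing `k` becomes a trade-off between compression and reconstruction error.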

past meteor
#

The orthogonality of the eigenvectors is a large selling point

wooden sail
#

i would say that's also not even necessary since you can always gauss jordan after finding a basis

past meteor
#

Would that apply to say the bottleneck of an autoencoder?

wooden sail
#

how do you mean?

past meteor
#

Conceptually they're similar things as your principal components, they "compress" the data

wooden sail
#

ah yeah

past meteor
#

But that property does not hold for autoencoders

wooden sail
#

which property?

past meteor
#

Orthogonality of the neurons in the bottleneck

wooden sail
#

it's kinda moot since it's nonlinear anyway

past meteor
#

Tbh people have been playing with this, see beta-autoencoders

wooden sail
#

beta encoder means something related to ADCs to me, lemme see if i can find what it means in ML

past meteor
#

Or beta VAE's, I'm not at my computer right now so I can't look up the specifics. The idea is just that people want representations that are independent

wooden sail
#

you'd have to define what you mean by independent. if it's the neurons you wanna make indep, you need to define some sort of inner product or distance metric

#

if it's an architecture that ends up generating a matrix of atoms, then the final result is some sort of frame and ideas similar to those of PCA apply

stiff dove
#

Hello everyone can anybody tell me this kind of loss graph indicates that my learning rate is too high or is there another issue?

past meteor
#

I'm a bit rusty in the specifics as you can see, I haven't read the paper in a while

wooden sail
#

i'm also not savvy on VAEs, so i can only discuss this very superficially

#

but a cursory glance at a beta VAE paper makes me think this is indeed the case. some regularized learning of "latent factors", but i can't tell if these are used to decompose the data linearly or nonlinearly

#

in either case, it's more or less the same idea: if you can construct a decent model for your data, you can then focus on learning the parameters of that model instead of learning everything from scratch. those parameters are usually few

past meteor
#

Personally I'm a big sucker for kernel methods (when I can use them) so I'd just use KPCA over these

wooden sail
#

i would also lump those in the same category

past meteor
#

Of course, they're all conceptually the same

weak mortar
#

Hi 🙂 hobbyist playing with some dataframes, backtesting and data visualization here. Looking for suggestions for some cool library or software that i can use to further visualize the data. Heatmaps and matplotlib are fine but my dataframe with results will have 3 or maybe more axes, thinking i could find some more tools to assist me here?

left tartan
frozen vessel
#

Heyy guys

#

I need some help

weak mortar
#

Good, the backtesting.py lib i use plots price graph and other stuff with bokeh, but it is very heavy on the computer to load. Okay i used a bit seaborn for heatmaps will look into its other features and plotly. Would you visualize it in 3d if you had 3 axis?

frozen vessel
#

so I wanted to build a feature that allows the user to say input into their microphone, then this input gets translated to english (if it isn't already in english) and then gets printed out in the form of text. Basically a speech to text model with translation

#

Now, the requirement of this project is that it needs to be done using an API (school mandated it 😭)

left tartan
#

I rarely have liked 3d visualizations, it tends to be noisy and hard to visually interpret.

frozen vessel
#

I tried doing this as my code, it records the input but then for some reason can't transcribe it

#

used whisper api from openai

left tartan
frozen vessel
#

I'd appreciate any help I could get

weak mortar
#

I have 1-minute data but rn im resampling it to 5min to save time, and bokeh plots every single candle so it lags a lot

left tartan
weak mortar
#

The lib should be able to resample before sending data to bokeh but it didnt work for me

left tartan
weak mortar
#

Yeah maybe i will just plot linecharts of equity and close prices with matplotlib instead. Its not that important, more all the metrics of the results i look at

left tartan
#

fwiw, plotly.express is kinda nice:

#
import plotly.express as px
fig = px.line(df, x="date", y="price", color="equity")
fig.show()
weak mortar
#

Okay looks a lot like matplotlib syntax iirc 🙂 ill have a shot at them all today. Like to just play around then eventually keep what i see works well

left tartan
#

yah, it's intentionally matplotlibby

weak mortar
#

I see many use jupyter, but it never really appealed to me

#

I just append all the results into a html page

left tartan
#

I primarily work in notebooks, but that's my business. But, we continually refactor anything complex into separate modules.

#

It's a nice environment for doing this type of work, sort of a all in one (repl, visualization, stateful kernel, etc).

weak mortar
#

I can see how its probably faster and more flexible than when im modifying the html with .replace(), tag by tag 🫣😝

left tartan
weak mortar
#

it looks actually quite nice

#

i think i would be better off using that to make some interactive functionality in the visualized data

#

and make stuff look pretty

polar mason
#

I have a small issue with graphs and I dont exactly know which library would be best.

The data is:
x length = 4000
Y length = 10,000
Z length = 255

Drawing up the actual plot with matplotlib takes so long

#

i was actually thinking of using a third party software just for plotting but I was wondering what would be the best way to show this data

serene scaffold
agile cobalt
#

you could aggregate it before plotting, but a third party software won't be much faster than matplotlib if it were just trying to do the same thing as you're trying to do

(it could do it in a way smarter than what you're trying to do though)

polar mason
#

so I cant get rid of any of the data at the moment, all individual nodes are important

serene scaffold
polar mason
#

ive tried it with a smaller Y length and its perfect, but I want to do the full 10k plots

serene scaffold
#

Spaced out? Does that mean you'd need a giant monitor to view it?

agile cobalt
#

you can probably aggregate it in buckets of 40x100x1 down to a 100 x 100 x 255 grid without any real loss
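
A rough sketch of that kind of bucketing, assuming NumPy and assuming the measurements sit in a 2-D array indexed by x and y (the thread does not say how the data is actually laid out, and the helper name is made up). Non-overlapping tiles are averaged with a reshape, so 40 x 100 tiles would turn a 4000 x 10000 grid into 100 x 100.

```python
import numpy as np


def block_mean(values, block_rows, block_cols):
    """Average non-overlapping block_rows x block_cols tiles of a 2-D array."""
    rows, cols = values.shape
    if rows % block_rows or cols % block_cols:
        raise ValueError("array shape must be divisible by the block shape")
    tiles = values.reshape(rows // block_rows, block_rows,
                           cols // block_cols, block_cols)
    return tiles.mean(axis=(1, 3))


# Small stand-in for the full grid so the example runs instantly.
demo = np.arange(8 * 6, dtype=float).reshape(8, 6)
coarse = block_mean(demo, 4, 3)  # 8 x 6 becomes 2 x 2
```

Swapping `mean` for `max` or `min` keeps extreme values visible, which may matter if individual points are important.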

polar mason
serene scaffold
polar mason
#

Just needs to fit on my screen

serene scaffold
#

Is your screen the size of a television?

polar mason
#

I need a highres image that I can zoom in on

polar mason
serene scaffold
#

What would you do once you zoom in? Because you can make more than one plot

#

I have to go all of the sudden.

polar mason
#

I just need to plot all of it so I can see which parts need to be edited and changed

#

It takes an hour with only 5000 plots and its killing me

polar mason
#

im still trying to find ways to space out the whole scatter and increase the amount i can plot

#

Actually, Im gonna look into VisPy

polar mason
#

ah I ended up answering my own issue, it was an issue of fig size haha

left tartan
# serene scaffold What would you do once you zoom in? Because you can make more than one plot

As of 2011, The Great Picture (111 feet (34 m) wide and 32 feet (9.8 m) high) holds the Guinness World Record for the largest print photograph, and the camera with which it was made holds a record for being the world's largest. The photograph was taken in 2006 as part of the Legacy Project, a photographic compilation and record of the history of...

polar mason
#

like give me a 12k projector

left tartan
#

rent a movie theater

polar mason
#

anyway this actually works atm so im cool with it

polar mason
left tartan
polar mason
left tartan
#

I mean, what library?

polar mason
#

matplotlib

#

which i was surprised at

left tartan
#

Oh, interesting, came out pretty nice... assumed it was something else

polar mason
#

its a 10k image res tho

#

so it takes up like 150 mbs per image

#

but i get good stuff like this

left tartan
#

What kind of sensor data?

polar mason
#

I believe vibration data?

#

yeah its vibrations over an array of 4000 locations

#

this is part of my thesis atm

#

except i cant get enough of making this lil plot spin

#

it spin

cerulean kayak
#

can someone please explain to me what %matplotlib inline does in Jupyter?

polar mason
polar mason
#

more bigger spin

left tartan
#

I think it's enabled by default though, at least in my environment

cerulean kayak
left tartan
#

Yah, doesn't do anything for me. I think it used to create a separate image or something

humble portal
#

I'm trying to use an implementation of FIt-SNE, but it runs out of memory on an A10 at this line:

dY = torch.sum(
            (PQ * num.to(device)).unsqueeze(1).repeat(1, no_dims, 1).transpose(2, 1) * (Y.unsqueeze(1) - Y),
            dim=1
        )

Unfortunately, these variables are poorly named and undocumented. However, the code preceding the line where it runs out of memory is

sum_Y = torch.sum(Y * Y, dim=1)
num = -2. * torch.mm(Y, Y.t())  # (N, N)
num = 1. / (1. + (num.to('cpu') + sum_Y.to('cpu')).t().to('cpu') + sum_Y.to('cpu'))
num.fill_diagonal_(0)
Q = num / torch.sum(num)
Q = torch.max(Q, torch.tensor(1e-12, device=Q.device))
Q = Q.to(device)

# Compute gradient
PQ = P - Q

Q.to('cpu')
P.to('cpu')

Y is initialized to a tensor of zeros equal to the length of the dataset and the code given as well as the line that breaks is within a loop that runs for a given number of times. P is some tensor based on the dataset (I think it's the dataset after undergoing some dimension reduction).

Essentially: How can I change this expression to calculate dY with less memory?
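
One possible way to cut the memory of the `dY` line, sketched in NumPy rather than torch so it runs anywhere (function and variable names are made up for illustration). Reading the shapes, the expression computes `dY[i] = sum_j W[i, j] * (Y[i] - Y[j])` with `W = PQ * num`. That splits into a row sum times `Y` minus a matrix product, which never builds the (N, N, no_dims) intermediates.

```python
import numpy as np


def grad_naive(weights, Y):
    """Direct translation of the original expression; builds an (N, N, d) array."""
    diffs = Y[:, None, :] - Y[None, :, :]  # diffs[i, j] = Y[i] - Y[j]
    return (weights[:, :, None] * diffs).sum(axis=1)


def grad_low_memory(weights, Y):
    """Same result using only an (N, 1) row sum and an (N, N) @ (N, d) product."""
    return weights.sum(axis=1, keepdims=True) * Y - weights @ Y


rng = np.random.default_rng(0)
N, d = 50, 2
weights = rng.normal(size=(N, N))  # stands in for PQ * num
Y = rng.normal(size=(N, d))

same = np.allclose(grad_naive(weights, Y), grad_low_memory(weights, Y))
```

In torch the same identity would read `W.sum(dim=1, keepdim=True) * Y - W @ Y`; treat that as an untested suggestion for this particular implementation. The (N, N) matrices themselves still have to fit in memory, so this only removes the largest temporaries.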

polar mason
#

the actual file is about 150 mbs

simple tapir
#

hey

#
data_id_mean = data.ID.mean()
data.ID.map(lambda p: p - data_id_mean)

What does this actually do?

#

We take the mean of ID column but what's p here? What do we minus from mean of IDs?
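
A tiny illustration with made-up IDs, assuming pandas: `map` calls the lambda once per element, so `p` is a single value from the ID column, and the result is a new Series with the mean subtracted from every value.

```python
import pandas as pd

data = pd.DataFrame({"ID": [10, 20, 30, 40]})

data_id_mean = data.ID.mean()                       # 25.0
centered = data.ID.map(lambda p: p - data_id_mean)  # p is one ID value at a time

# Equivalent vectorised form, usually preferred:
centered_vectorised = data.ID - data_id_mean
```

Neither line changes `data` itself; the centred values have to be assigned back if they should be kept.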

small wedge
simple tapir
#

hmm, I see thanks

weak mortar
#

While i always liked numbers, i think i should acquire some knowledge about statistics. I could ask a dozen questions about variance, standardization and quantiles, but maybe i should read a book about it. Not that i like books alot though. Any suggestions? And also, good morning

#

(Non professional data scientist, algotrader thats asking)

twilit tundra
weak mortar
#

I like the straight forward title! Thx

#

Will look for it

twilit tundra
#

If you prefer videos, statquest has a lot of short and accessible videos

weak mortar
#

Even if i only end up reading half of it like all other books i try and read.. still a half book wiser

weak mortar
#

While it makes sense to just look at the bulk of the distribution (quantiles), the outliers do also have relevance 🤔 i think the big challenge is to narrow down the metrics to the most useful ones to assist an informed decision

unique ether
#

I've narrowed it down to the following fields: Statistics, Linear Algebra, Probability theory and Calculus.

pine wolf
#

though calculus will probably be taught before linear algebra in college

cosmic harbor
#

Hello everyone,
After building Pytorch from source, CUDA is not available. Here is what I did:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git submodule sync
git submodule update --init --recursive
conda install cmake ninja
pip install -r requirements.txt
conda install mkl mkl-include
conda install -c pytorch magma-cuda115
make triton
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py develop

Can anyone help?

unique ether
past meteor
#

so you can do it after lin alg/calc

unique ether
past meteor
#

It will depend on your text book but we had mini proofs in probability theory that required a working knowledge of say integrals

unique ether
#

Basically my situation right now is this: I'm starting a 1 year conversion course soon as some of you already know going from a Bsc in Nat Sci to an Msc in AI and ML.

I know a bit of statistical testing from my Nat Sci course but not too much.

I've been studying python for like 10 hours a day for the last month and now I'm trying to brush up on my maths. What would you all recommend I focus on? I have roughly 20 days until the start of my course.

#

Just assume I'll be studying 10 hours a day up until the course starts

#

Because I will be

pine wolf
#

the calculus you need for ML i don't think is too deep, at least not to start; not that i'm that familiar with ML -- but linalg is everywhere

unique ether
#

maybe a lil bit of calculus 2?

pine wolf
#

yeah, like calc 1 to understand back propagation

unique ether
#

Rodget that

#

Many thanks mate

#

I'm actually watching an udemy course on the fundamentals of math right now. Just to refresh my math knowledge and have a solid foundation going forward.

#

I'm thinking: Math fundamentals (refresher) --> Algebra (refresher) --> Linear Algebra --> Calculus 1

keen solstice
#

hey guys,I had a question.
When I type print( in my jupyter lab, it doesnt complete the other bracket at the end, nor does it automatically add the closing quotation mark if I type the opening one.

Any fix?

unique ether
#

are you typing in a markdown cell?

keen solstice
#

no in a code cell

unique ether
#

I'm not sure then mate sorry. I'm new too.

polar mason
#

me and my friends have been debating.

For just general AI and data science, higher ram speeds or more ram volume?

#

like we were talking about how you could chunk it to make more out of the speeds, but then more volume means you can just load it at once

#

which is better?

untold bloom
potent sky
unique ether
#

At its core, AI and ML is just maths isn't it?

#

I'm starting to realise that

past meteor
#

Like an engineering discipline the real world matters (the problem you're solving) and also your constraints (stuff related to comp sci)

unique ether
#

So its just applying maths to a problem within the constraints of computer science

unique ether
#

Do you lot think that trying to code math problems into python is good practice for a career in ML?

#

math problems and formulas and equations and such

wooden sail
#

that's unavoidably a large part of what you'll do

#

along with the practical implications of handling huge amounts of data and doing math with stuff that doesn't fit in memory

left tartan
# unique ether Rodget that

fwiw, these videos are wonderful intros for linear and calc. Just a high level intro, but I'd suggest watching it before and after you take either course... it's just really well done: https://www.youtube.com/@3blue1brown/courses. And for calculus, this is a gentle intro to Calculus taught by one of the great professors of the subject: https://ocw.mit.edu/courses/res-18-005-highlights-of-calculus-spring-2010/. There are probably other options for linear & calculus deep dives, but his explanations are probably my favorite. I absolutely love his proof of the derivative of e^x.

flint orbit
#

Hi guys, is this the right place to ask questions about Jupyter?

flint orbit
# past meteor yes, ask away

I think I solved it. The problem was with the fact that jupyter closed the cell for editing right at the moment when I typed something and stopped even for a short moment. Presumably it was related to autosave setting of vs code, I'm running notebooks inside it.

twin forge
#
from uniqed.runners.tof_run import detect_outlier

df = detect_outlier(ohlcv_dict["UPRO"]["Adj Close"], cutoff_n=80)


Cell In[4], line 1
    df = detect_outlier(ohlcv_dict["UPRO"]["Adj Close"], cutoff_n=80)

  File ~\AppData\Roaming\Python\Python310\site-packages\uniqed\runners\tof_run.py:34 in detect_outlier
    np_time_series = time_series.values[:, 0]

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

but my test run was unsuccessful, I've tried reset_index() on the "Adj Close" series to create a "Date" column, and .reshape(). Any ideas?
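
Going only by the traceback, the library does `time_series.values[:, 0]`, which needs a 2-D array, while a single column selected with `["Adj Close"]` is a 1-D Series. A minimal sketch of the difference, assuming pandas and made-up prices; whether `detect_outlier` has further expectations about its input is not visible here.

```python
import pandas as pd

prices = pd.Series([10.0, 10.5, 10.2], name="Adj Close")

print(prices.values.ndim)             # 1: indexing with [:, 0] raises IndexError
print(prices.to_frame().values.ndim)  # 2: [:, 0] selects the single column

first_column = prices.to_frame().values[:, 0]
```

So passing a one-column DataFrame, for example `series.to_frame()` or `df[["Adj Close"]]` with double brackets, is a plausible fix.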

left tartan
#

For example, if there's an erroneous 0 after trading hours /etc, perhaps you want to remove them. Or, perhaps you just want to restrict your data to the trading window.

twin forge
left tartan
#

You seem to not want to look at the data? I don't really get it. Pick a spot where a particular equity has a lot of zeros and look at it.

twin forge
left tartan
#

The dataframe, is what I'm talking about.

twin forge
#

sure but saying "it's after hour data" without even looking if yahoo finance registers it

#

kinda ducky_sphere

left tartan
#

I said "perhaps".

#

You won't know until you actually inspect the anomalies.

lavish ember
#

should i install packages like tensorflow globally or in virtual environment? I am beginner.

frosty root
#

hey so im writing my college application and one of the supplementals is why i want to do the honors program, im saying that i want to do it for the mentorship so i can learn a ton ab ai and using it (which is true) but i dont know what there is in the real world with ai that isnt taught in the basic courses i screenshotted below. any thoughts on some things that college doesnt teach you in ai that a mentor from the industry could?

small wedge
frosty root
molten onyx
#

i have a bug in my backpropagation function i have a feeling it's with calculating the derivative. can you tell me if i have an obvious flaw in my code?
the code is in c++ but i think its understandable

    double derivative_outputlayer(double current_weight, double output, double prev_output, double target)
    {
        double f1 = output - target;
        double f2 = output * (1 - output);
        double f3 = prev_output * current_weight;

        double derivative = f1 * f2 * f3;
        
        return derivative;
    }

    double derivative_hiddenlayer(double current_weight, double output, double prev_output, double sum_gradients)
    {
        double f1 = output * (1 - output);
        double f2 = prev_output;
        double f3 = sum_gradients;

        double derivative = f1 * f2 * f3;

        return derivative;
    }
fleet musk
#

Hi. Anaconda keeps telling me to update to latest version.

If i update, will it update all installed packages across my envs? Will my old projects break?

past meteor
# molten onyx i have a bug in my backpropagation function i have a feeling it's with calculati...

Can I comment on the code itself a bit?

  1. I would give more descriptive names to f1, f2 and f3 especially considering f1 in function 1 == f1 in function2. To me this detracts from the readability.

  2. I'd rename the function to make clear that you're dealing with a sigmoid. If I see "derivative_outputlayer" I then check the body of the function to see what loss you're using. You can help the reader out a bit here 🙂

molten onyx
past meteor
#

```cpp
double derivative_outputlayer_sigmoid(double current_weight, double output, double prev_output, double target)
{
    double error = output - target;
    double activation_derivative = output * (1 - output);

    // Gradient of the loss w.r.t. this weight: error * sigmoid' * input to the weight.
    double derivative = error * activation_derivative * prev_output;

    return derivative;
}
```
#

it's also strange that you're doing the derivative w.r.t. a single neuron in the previous layer
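
A quick way to sanity-check a hand-written derivative like this is to compare it with a finite-difference estimate. The sketch below is in Python for brevity and assumes a squared-error loss 0.5 * (output - target)^2 and a single sigmoid neuron with one input and no bias; those are guesses based on the posted code, not something confirmed in the thread.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(weight, prev_output, target):
    """Squared error 0.5 * (output - target)^2 for one sigmoid neuron with one input."""
    output = sigmoid(weight * prev_output)
    return 0.5 * (output - target) ** 2


def analytic_gradient(weight, prev_output, target):
    output = sigmoid(weight * prev_output)
    return (output - target) * output * (1.0 - output) * prev_output


weight, prev_output, target = 0.3, 0.8, 1.0
eps = 1e-6
# Central difference approximation of d(loss)/d(weight).
numeric = (loss(weight + eps, prev_output, target)
           - loss(weight - eps, prev_output, target)) / (2 * eps)
analytic = analytic_gradient(weight, prev_output, target)
```

If the two numbers disagree beyond rounding error, the analytic formula (or the loss it is supposed to match) is wrong.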

molten onyx
#

im kinda confused anyways with the whole backprop thing. i watched a few videos and every time i watched them i had a million questions which were left unanswered

past meteor
#

Okay can I give you a protip then?

#

Start smaller. Just write out gradient descent in any language you want. Then turn it into stochastic gradient descent. Then add regularization.

#

Write your own data generating function. Play around with the amount of regularization, play with the batch size. See what it's doing and try and find out why.

#

Once you've done SGD for a simple linear model implementing backprop will be a bit easier. Also, it's important to know that nowadays people don't do backprop like this. They use automatic differentiation
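
A minimal sketch of that exercise, assuming NumPy; the data generator, hyperparameters and names are illustrative choices, not a reference implementation. It generates synthetic linear data and fits it with mini-batch SGD, optionally with an L2 penalty.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n_samples, true_weights, noise_std=0.1):
    """Generate y = X @ true_weights + Gaussian noise."""
    X = rng.normal(size=(n_samples, len(true_weights)))
    y = X @ true_weights + noise_std * rng.normal(size=n_samples)
    return X, y


def sgd_linear_regression(X, y, lr=0.05, l2=0.0, batch_size=16, epochs=50):
    """Mini-batch SGD on mean squared error plus an L2 penalty l2 * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))  # reshuffle the samples every epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            grad = 2 * X[idx].T @ residual / len(idx) + 2 * l2 * w
            w -= lr * grad
    return w


true_weights = np.array([1.5, -2.0, 0.5])
X, y = make_data(500, true_weights)
w_plain = sgd_linear_regression(X, y)
w_ridge = sgd_linear_regression(X, y, l2=1.0)
```

Comparing `w_plain` and `w_ridge` against `true_weights`, and then varying `batch_size` and `l2`, is the kind of experimenting suggested above.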

molten onyx
#

ok, thanks! i think its the best advice ive ever gotten in my whole journey.

#

ik that backprop isnt as relevant nowadays but i find it quite interesting and im only gonna use it to apply for a apprenticeship

humble portal
past meteor
#

https://arxiv.org/abs/2106.11342 <= this book also uses the method of starting with linear regression and moving to neural nets if you want more "guidance"

molten onyx
#

ill look into it tomorrow, thanks!

humble portal
#

42000 elements

past meteor
#

If I recall correctly, but you'll have to fact check me on this, t-sne does form some kind of pairwise similarity matrix, basically an NxN thing

humble portal
#

Sorry, I forgot to copy the post with the dimensions. PQ and num are (42000 x 42000) while, Y is (42000 x 2)

past meteor
#

42000x42000 x floating point precision

humble portal
#

Yeah pretty much. I'm not well versed in how t-sne works, I'm just trying to visualize datasets so that I can display coreset coverage, and there aren't any good torch implementations of t-sne, so I'm trying to get the only not-completely-garbage one I could find to work

past meteor
#

If it's double precision you're looking at idk 14.1GB ram?

#

How much memory do you have?

#

Does it need to be from Torch?

humble portal
#

That step is trying to create 13.2GiB, so yeah that seems accurate.

It does in fact need to work with torch. Keras will not work for my implementation. I have 2 A10s, although it seems that this can only utilize one at any given time.
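
The arithmetic behind those figures, as a quick sketch (the helper name is made up):

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_element):
    return n_rows * n_cols * bytes_per_element


n = 42_000
for name, width in [("float32", 4), ("float64", 8)]:
    size = dense_matrix_bytes(n, n, width)
    print(f"{name}: {size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
# float32: 7.06 GB = 6.57 GiB
# float64: 14.11 GB = 13.14 GiB
```

A float32 tensor of shape (42000, 42000, 2) is the same size as one float64 (42000, 42000) matrix, so either could be behind an allocation of roughly 13 GiB; which one it is cannot be told from the messages here.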

past meteor
humble portal
#

I don't believe that the sklearn method works with tensors

past meteor
#

nope

humble portal
#

It also seems to be a CPU-compute method, so it won't function well with such a large dataset, let alone what I need to eventually do with IMAGENET

past meteor
#

Plus it'd be a moot point if it goes ahead and forms that pairwise distance matrix anyway.

Are you open to other ways to visualise your dataset?

humble portal
#

Such as?

It seems that t-SNE is the standard now in domains that require a representation of "distance"

past meteor
#

Heads up though, I've had a month long holiday and I'm a bit rusty ML-wise so maybe I'm not seeing something.

#

If you have images you can just ram them through any pretrained model and compute whatever distance you want. Just remove the top of the network, flatten or pool, whichever you want and then compute the distance

#

Also, at that point you have a vector. You can PCA this vector if you want and plot that way. This is what I did in the past sometimes.

humble portal
past meteor
#

Let me read your original post and think for a bit. EDIT: not much to go on for your problem 😛

past meteor
humble portal
#

I can't say much on the topic, but I'm attempting to use coreset selection to guess the bounding manifolds of the individual classes within a space. I want to display the dataset in a way where I can highlight certain points based on said coreset selection.

stable cape
#

Hello! I have a question. I have an algorithm that gathers data on financial performance, keeps it and yields it to an AI. However, im facing problems feeding the ai bc the amount of data is too large. To solve this, I am trying with langchain models, but i dont know which one and how to use it. Does anyone know about it ? If not, do you know any alternatives?

#

langchain agents*

past meteor
#

My profs loved kernel methods so we covered Determinantal point processes which are probably coreset selection now that I know what the term is

#

You'll run into the same issue there cause as you probably know Kernel methods need that NxN matrix as well 😿

humble portal
past meteor
#

Or am I missing something, especially since you mentioned imagenet

humble portal
#

I guess you're correct there. Hmm well since I only need to calculate it once for each dataset, I guess a CPU-bound option works. How fast is the sklearn option? Even if I need to run over the course of a few days, there shouldn't be too much of an issue.

past meteor
#

The docs say it can take hours 🤣

pallid badge
#

Hi, I need to process 200000 images. I want to store them into an hdf5 file, 3d array. First dim is the image number. I track the image number. I create a hdf5 dataset. Can I populate the array only by keeping track of the image number in the parallel processing?

dset[image_number]=converted_image

Then I rely on my image_number to sort out the position in the dataset?
Does h5py put the converted data into the correct location of the dataset via the image_number?

desert oar
desert oar
pallid badge
#

I am trying parallel processing, the images are distributed and come back converted. They have to be placed back into the dataset (3D np.ndarray).

#

I use zmq

stiff wedge
#

I am trying to convert a GZ File to a CSV File however I am running into errors, this is my current code:

import gzip
import csv
import io

with gzip.open(r'C:\code\un-general-debates-blueprint.csv.gz', 'rt', encoding="utf-8") as vFile:
    file_content = vFile.read()
    #file_content = csv.reader(vFile)

nlpFile = open(r"C:\code\un-general-debates-blueprint.csv", "w")

nlpFile.write(file_content)

nlpFile.close()

print(file_content)

Does anyone know how to transform this GZ file into a saved CSV file?
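
The error message is not shown, so this is a guess: on Windows, `open(path, "w")` without an explicit `encoding` uses the system default, which can fail on UTF-8 text. A sketch that sidesteps text decoding entirely by copying the decompressed bytes, using only the standard library (the helper name and the sample content are made up):

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_file(src, dst):
    """Decompress src (.gz) to dst byte-for-byte, streaming in chunks."""
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return Path(dst)


# Round-trip demo on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "sample.csv.gz"
    csv_path = Path(tmp) / "sample.csv"
    original = "country,year,text\nBRA,1970,Olá\n".encode("utf-8")
    with gzip.open(gz_path, "wb") as f:
        f.write(original)
    gunzip_file(gz_path, csv_path)
    round_trip_ok = csv_path.read_bytes() == original
```

For the file above the call would be `gunzip_file(r"C:\code\un-general-debates-blueprint.csv.gz", r"C:\code\un-general-debates-blueprint.csv")`. If the goal is only to load the data, `pandas.read_csv` can also read a `.csv.gz` path directly.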

stiff wedge
sonic meteor
#

Can anyone help me get into neuroevolution??

lapis sequoia
#

How to fix this?

warm shard
#

Any NN journey? im new to ML, started with sound analysis with logistic regression, planning to move on to naive bayes and neural networks or even more of them in time.

#

i dont know much about NNs i want to gain experience with data management first.

desert oar
desert oar
#

the only caveat here is that audio data can be a little complicated, and it tends to be easier to work with "tabular" data (like what you might find in a spreadsheet)

#

but you can work with audio metadata in tabular form, like a database of songs, year released, etc.

#

like you could try to predict genre using artist and title or something

warm shard
desert oar
#

at the same time, plenty of people do machine learning with audio and image data just using a big folder of files & a text file of labels

desert oar
warm shard
warm shard
desert oar
warm shard
#

Okay thanks for all advices.

desert oar
#

i don't think you need to master data management. data management is a means to an end. start simple, and if your needs expand you will learn more things.

warm shard
lapis sequoia
twin forge
#
```py
def log_retornos(n):
    df = pd.DataFrame()
    for t in clean_price.keys():
        df[t] = clean_price[t]["Adj Close"]
    df = df.dropna()

    sec_returns = np.log(1 + df.pct_change()).dropna() * 100
    sec_outliers = pd.DataFrame()
    for t in sec_returns.columns:
        sec_outliers[t] = detect_outlier(sec_returns[[t]], cutoff_n=80)["TOF"]

    for t in clean_price.keys():
        try:
            plt.figure(figsize=(22, 10))
            plt.rcParams.update({"font.size": 19})
            plt.plot(sec_returns[t])

            outlier_positions = sec_outliers.index[sec_outliers[t] == 1]

            if not outlier_positions.empty:
                valid_positions = outlier_positions.intersection(sec_returns.index)
                plt.scatter(valid_positions, sec_returns[t].loc[valid_positions], marker='x', color='red', label='Outlier')

            plt.title(f"Log Retornos em % - {t}")
            plt.legend()
            plt.show()
        except:
            pass

log_retornos(int(input("Defina o números de preços vizinhos: ")))
```

Traceback (most recent call last):

  Cell In[3], line 30
    log_retornos(int(input("Defina o números de preços vizinhos: ")))

  Cell In[3], line 10 in log_retornos
    sec_outliers[t] = detect_outlier(sec_returns[[t]], cutoff_n=80)["TOF"]

  File ~\AppData\Roaming\Python\Python310\site-packages\uniqed\runners\tof_run.py:39 in detect_outlier
    ).fit_transform(np_time_series)

  File C:\ProgramData\anaconda3\lib\site-packages\sklearn\utils\_set_output.py:142 in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)

  File ~\AppData\Roaming\Python\Python310\site-packages\uniqed\transformers\transformers.py:25 in fit_transform
    return self.fit(x).transform(x)

  File C:\ProgramData\anaconda3\lib\site-packages\sklearn\utils\_set_output.py:142 in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)

  File ~\AppData\Roaming\Python\Python310\site-packages\uniqed\transformers\transformers.py:21 in transform
    X = self._embedding(x, self.d, self.tau)

  File ~\AppData\Roaming\Python\Python310\site-packages\uniqed\transformers\transformers.py:37 in _embedding
    X = np.zeros((embedded_length, d))

ValueError: negative dimensions are not allowed
#

While trying to plot log_returns with markers on outlier data, I've been getting the following error

pallid badge
#

Another question in this context is how ZMQ and multiprocessing ensure that the workers return the data back in the correct order
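
In general, results from several workers can come back in any order unless the API promises otherwise (`multiprocessing.Pool.map` returns results in input order, `imap_unordered` does not). A common pattern is to send the index along with each item and use it when storing the result. The sketch below uses threads and a NumPy array as stand-ins for the ZMQ workers and the HDF5 dataset; all names are illustrative.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np


def convert(image_number, image):
    """Stand-in for the real conversion; sleeps so results finish out of order."""
    time.sleep(random.uniform(0.0, 0.01))
    return image_number, image * 2


images = np.arange(20 * 4 * 4, dtype=np.float32).reshape(20, 4, 4)
output = np.empty_like(images)  # an h5py dataset can be indexed the same way

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(convert, i, images[i]) for i in range(len(images))]
    for future in as_completed(futures):
        image_number, converted = future.result()
        output[image_number] = converted  # position comes from the index, not arrival order
```

With h5py, `dset[image_number] = converted_image` writes to that slot in the same way; having a single process own the file and do all the writes avoids concurrent-write problems.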

cyan sierra
#

Hi everyone. In a classification problem, is it correct to one hot encode date like this? 'day_of_week_0', 'day_of_week_1', 'day_of_week_2',
'day_of_week_3', 'day_of_week_4', 'day_of_week_5', 'day_of_week_6',
'month_1', 'month_2', 'month_3', 'month_4', 'month_5', 'month_6',
'month_7', 'month_8', 'month_9', 'month_10', 'month_11', 'month_12'
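
One-hot encoding day of week and month like this is a common and valid choice for classification; whether it helps depends on the model and data. For reference, a sketch of producing exactly those columns, assuming pandas and made-up dates. Declaring the categories up front ensures all 7 + 12 columns exist even if some values never occur.

```python
import pandas as pd

dates = pd.to_datetime(["2023-01-02", "2023-06-17", "2023-12-31"])
df = pd.DataFrame({"date": dates})

# Fixed category lists so unseen days or months still get a column.
df["day_of_week"] = pd.Categorical(df["date"].dt.dayofweek, categories=range(7))
df["month"] = pd.Categorical(df["date"].dt.month, categories=range(1, 13))

encoded = pd.get_dummies(df[["day_of_week", "month"]])
```

Each row ends up with exactly two active columns, one for the weekday and one for the month.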

unique ether
#

Any algorithm engineers lurking about?

past meteor
tall swallow
#

guys ik this is outa context. I need pc experts opinions on this. I when i play a game like gta v i get around 300 fps low that was about 2 years ago. I do the same thing now i cant even reach 150. I have updated Graphics Drvrs and the exact same settings on everything. whats the issue

rustic snow
#

In a convolution neural network does a convolution layer reduce the dimensions of the input picture

small wedge
# rustic snow In a convolution neural network does a convolution layer reduce the dimensions o...

it depends on the exact configuration of the convolutional layer. if it only has one filter and the kernel size or stride is greater than one (and there is no padding), then the output of the convolution will be smaller than the original image. However, in practice almost all convolutional layers have multiple filters, which ends up increasing the number of values output by the layer. Generally we chain convolutional layers with pooling layers to reduce dimensionality and introduce other kinds of spatial invariance.
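
The usual output-size formula for one spatial axis, as a small sketch (no dilation; the function name is made up):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output length of a convolution along one axis (no dilation)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(28, 3))                       # 26
print(conv_output_size(28, 3, padding=1))            # 28
print(conv_output_size(28, 3, stride=2, padding=1))  # 14
```

So a 3x3 kernel shrinks a 28-pixel side to 26 without padding, keeps it at 28 with one pixel of padding, and halves it with stride 2. The number of output channels is a separate choice, equal to the number of filters.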

nocturne hornet
#

Im taking two courses in AI and neural networks at my uni. Is it normal to not get what biases and weights do to my dataset at the beginning? I'm having problems figuring out what it does to my results as im not understanding the lecture jargon.

wooden sail
#

that depends on what were the prerequisites/how early on in your study program you take the course

nocturne hornet
#

well were two weeks in and i kinda feel like were getting thrown in the deep end really quickly, but thats me. Our first task was to make a perceptron classifier, which from what i understand only works if i can draw a straight line separating the classes in the dataset? We then had to adopt an SVM classifier from the lecture notes that would give me a different result if the data is not linearly separable, from what im understanding. And im still blank on what my biases and weights do to these classifiers

nocturne hornet
wooden sail
#

kalman filters is not introductory, oof

#

have you taken any linear algebra?

left tartan
# nocturne hornet well were two weeks in and i kinda feel like were getting thrown in the deep end...

Bias and weights are kinda fundamental. Maybe watch 3b1b, first vid covers it: https://m.youtube.com/watch?v=aircAruvnKk

What are the neurons, why are there layers, and what is the math underlying it?
Help fund future projects: https://www.patreon.com/3blue1brown
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

Additional funding for this project provided by Amplify Partners

Typo correction: At 14 minutes 45 seconds, th...

nocturne hornet
wooden sail
#

all right. the weights form a matrix

#

it gets multiplied to the input data vector. then you add another vector, the "biases", to shift the result

#

this is called an "affine transformation". it transforms one line into another, and then moves it around in space
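
A tiny numeric illustration of that, assuming NumPy and made-up numbers: the weight matrix stretches and shrinks the input, and the bias vector then shifts the result.

```python
import numpy as np

W = np.array([[2.0, 0.0],
              [0.0, 0.5]])   # weights: stretch x by 2, shrink y by half
b = np.array([1.0, -1.0])    # biases: shift the result

x = np.array([3.0, 4.0])
y = W @ x + b                # affine transformation: [7.0, 1.0]
```

A perceptron or linear SVM does the same thing with a single row of weights and one bias, then looks at the sign of the result to pick a class.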

weak mortar
#

Hi. a little issue with a DataTable in dash / plotly. as it was convenient for me to transpose the dataframe used, the DataTable is not displaying the row header

nocturne hornet
wooden sail
#

that, and stretch as well

nocturne hornet
#

so what should i be looking at when it comes to tuning my data, assuming my classifiers are built correctly?

#

All i got now is a plot with some dots either side of the median line

#

That question sounded amateurish, but that is my current level 🙂

wooden sail
#

i'd start by checking that 3b1b video, and then reading in your coursebook about support vector machines

nocturne hornet
#

Implemented an SVM in my code, just ended up with the same accuracy as they are both linear

#

But yes, ill save the video for the morning.

nocturne hornet
wooden sail
glacial rampart
nocturne hornet
wooden sail
#

oof

nocturne hornet
#

I'm a bit frustrated as the lecturers are talking above my class's head. They are really good at this, just not at teaching. Never worked with weights and biases before, even in the introductory course, just sorting algos and implementing a kalman filter in a pygame :/

past meteor
#

You can ask me anything about SVMs @nocturne hornet, do you have any specific questions?

past meteor
abstract wasp
#

Hi, has anyone ever made an AI that geotags photos? If so, which dataset did you use?

left tartan
#

Curious if any ai can beat that guy

desert oar
twin forge
#

this was one of the goals, removing the time interval where stock return == 0%, because it translates to a stock delist

Log Returns in % - BGP

#

ps: plots in portuguese, I know kekw

twin forge
#

problem is, what to do with these illiquid stocks with huge price movements @glacial rampart

left tartan
#

You could also melt (unpivot) the data back to narrow form, which might be easier to work with.
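A minimal sketch of what melting looks like, assuming a wide table with one column per ticker (the tickers `AAA`/`BBB` and the numbers are invented):

```python
import pandas as pd

# Wide form: one row per date, one column per stock.
wide = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-03"],
    "AAA": [0.01, 0.0],
    "BBB": [-0.02, 0.03],
})

# Narrow form: one row per (date, ticker) pair.
narrow = wide.melt(id_vars="date", var_name="ticker", value_name="log_return")
```

In narrow form, filtering or grouping by ticker becomes a plain column operation instead of a loop over columns.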

abstract wasp
left tartan
#

Did you check Kaggle?

abstract wasp
twin forge
#
```py
for t in sec_outliers.columns:
    plt.figure(figsize=(22, 10))
    plt.rcParams.update({"font.size": 19})

    sec_returns_t = sec_returns[t].replace(0, np.nan).dropna()

    if not sec_returns_t.empty:
        plt.plot(sec_returns_t)
```

By replacing the 0s with NaN and dropping them (the variation computed over missing prices comes out as 0%), I'm removing the period where a stock has been delisted
pallid badge
# desert oar The best way to guarantee ordering in any concurrent or parallel processing is t...

You are right. I distribute the images via the following range() over all workers.
Some code snippets

#in worker
for i in range(worker_id, nimages, self._nworkers):
    img = images[i]
    # (processing of img that produces result is omitted here)
    push_sock.send_pyobj((i, result))

def _unordered_recv(self, sock):
        while True:
            img_number, result = sock.recv_pyobj()
            yield (img_number, result)

#in collector later
generator = self._unordered_recv(pull_sock)

The previous programmer wrote an _ordered_recv function

def _ordered_recv(self, sock):
        cache = {}
        next_img_number = 0
        while True:
            img_number, result = sock.recv_pyobj()
            if img_number == next_img_number:
                yield (img_number, result)
                next_img_number += 1
                while next_img_number in cache:
                    yield (img_number, cache.pop(next_img_number))
                    next_img_number += 1
            else:
                cache[img_number] = result    

But strangely, it does not seem to work. When I use ordered_recv I don't get an error message, but some entries in the converted image stack remain empty. That I don't understand. Unordered seems to work.

left tartan
#

Why not just receive them out of order and sort? @pallid badge

#

Memory limits?

twin forge
#

@left tartan

#

remove_outlier() working perfectly

left tartan
#

Nice

#

Which library was that?

twin forge
#

dope project

#

I can also set the length of bad returns allowed to be less or more lenient

#

0.1 meaning I expect at least 90% of the time series to behave normally

left tartan
#

neat, I'll have to try it out someday. I still don't like the application of outlier removal without an understanding of the mechanism / reason for the outliers, for the record.

lime grove
#

How do you handle outlier removal @left tartan ?

pallid badge
# left tartan Why not just receive them out of order and sort? <@815357558386589800>

Hi! Thank you for your answer. I ran the script with both ordered_recv and unordered_recv. Indeed, only in the unordered_recv version does the output file seem to have all images inside. In the ordered version some arrays contain only 0, which should not be the case.
It would be nice to understand why.
I got this snippet of ordered_recv from another person, and I thought he would know what he is doing. For me, it is very important to be in control because I aim to give my code to other people.
Is there any better way to check the input, the processing by ZMQ, and the output, please? Currently this is a black box for me.
The only way I can come up with is as follows: I run it in series without the parallel processing, then I redo it in parallel and compare the two final files. Thank you for the discussion 🙂

weak mortar
#

good morning 🙂 i am converting a dataframe to a dict to use as data in a dash_table.DataTable. Unfortunately dash seems to require that i use the 'records' orient when converting to dict, which drops all the row labels. if i use 'index' the dict contains the row labels, but dash will not generate the table.
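One possible workaround, sketched with pandas only: move the row labels into a regular column before converting, so the 'records' orient keeps them. The frame contents and the column name `name` are placeholders, and how the result gets wired into the DataTable is left out.

```python
import pandas as pd

df = pd.DataFrame(
    {"opti": [1.0, 2.0], "bt1": [3.0, 4.0]},
    index=["profit", "drawdown"],
)
transposed = df.T  # the row labels now live in the index

# reset_index() turns the index into an ordinary column called "index",
# which survives the conversion to a list of row dicts.
records = (
    transposed.reset_index()
    .rename(columns={"index": "name"})
    .to_dict("records")
)
```

Each dict in `records` then carries its row label under the `name` key, so the table only needs a column for it.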

left tartan
left tartan
stable cape
#

hello, does anyone know how to implement a delay for each request langchain makes? I have the $5 starter free plan on OpenAI, and it limits the number of requests per minute, so i get blocked... Here's the code:
data = yaml.load(f, Loader=yaml.FullLoader)
json_spec = JsonSpec(dict_=data, max_value_length=4000)
json_toolkit = JsonToolkit(spec=json_spec)

json_agent_executor = create_json_agent(
    llm=OpenAI(temperature=0, openai_api_key=OPEN_API_KEY),
    toolkit=json_toolkit,
    verbose=True,
)
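A library-agnostic sketch of one way to space calls out: a small throttle that sleeps until a minimum interval has passed since the previous call. `Throttle` and `call_with_throttle` are hypothetical helpers, not part of langchain, and they only space out the calls you make yourself. An agent may issue several LLM requests inside a single run, which this does not control.

```python
import time


class Throttle:
    """Block so that calls are spaced at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_throttle(throttle, func, *args, **kwargs):
    """Wait for the throttle, then call func and return its result."""
    throttle.wait()
    return func(*args, **kwargs)
```

With a limit of, say, 3 requests per minute, `Throttle(20.0)` would be the matching interval, and each top-level call to the agent would go through `call_with_throttle`.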

nocturne hornet
past meteor
nocturne hornet
past meteor
#

Maybe this can help, I wrote this in the past:

#

Your perceptron's decision boundary is pretty bad compared to that of the support vector machine. You see the line is too close to the purple class. There are probably points just beyond that line that are still supposed to be in the purple class, but your perceptron will say they're in the yellow

#

That's a direct consequence of the margin term you add to an SVM, which is related to L2 regularization. If you add that to your perceptron the results will be better. L2 also wants to constrain the magnitude of a vector if you recall 🙂

#

The reason why you have points inside of the dotted lines is the slack variables that SVMs use (eta in my screenshot)

past meteor
#

/info dump over

nocturne hornet
past meteor
#

You can try it with different values that aren't 0. At least for the SVM it should converge.

nocturne hornet
past meteor
#

Did you code the SVM from scratch pithink

nocturne hornet
#

No, we got handed this code with the task of implementing it on our dataset. With the end goal of comparing perceptron vs SVM in both time and accuracy.

#

Tried coding perceptron from scratch, probably why it's bad 😦

past meteor
#

Learning rate in an SVM? Can you show me the implementation

#

Is it stochastic gradient descent?

nocturne hornet
# past meteor Is it stochastic gradient descent?
```py
class SVM:
    def __init__(self, learning_rate=LEARNING_RATE_SVM, lambda_param=LAMBDA_PARAM, n_iters=N_ITERS):
        self.lr = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def _init_weights_bias(self, X):
        n_features = X.shape[1] # The number of features is the number of columns in the dataset
        self.w = np.zeros(n_features) # The weights are the coefficients of the input variables. They are multiplied by the inputs and summed to arrive at an output. They are updated in the learning process.
        self.b = 0 # The bias is a constant value that is added to the weighted sum of inputs to determine the output of a neuron. It is a constant value that is learned during training.

    def _get_cls_map(self, y):
        return np.where(y <= 0, -1, 1)

    def _satisfy_constraint(self, x, idx):
        linear_model = np.dot(x, self.w) + self.b 
        return self.cls_map[idx] * linear_model >= 1
    
    def _get_gradients(self, constrain, x, idx):
        if constrain:
            dw = self.lambda_param * self.w
            db = 0
            return dw, db
        
        dw = self.lambda_param * self.w - np.dot(self.cls_map[idx], x)
        db = - self.cls_map[idx]
        return dw, db
    
    def _update_weights_bias(self, dw, db):
        self.w -= self.lr * dw
        self.b -= self.lr * db
    
    def fit(self, X, y):
        self._init_weights_bias(X)
        self.cls_map = self._get_cls_map(y)

        for _ in range(self.n_iters):
            for idx, x in enumerate(X):
                constrain = self._satisfy_constraint(x, idx)
                dw, db = self._get_gradients(constrain, x, idx)
                self._update_weights_bias(dw, db)
    
    def predict(self, X):
        estimate = np.dot(X, self.w) + self.b
        prediction = np.sign(estimate)
        return np.where(prediction == -1, 0, 1)```
past meteor
#

Your class is very weird

nocturne hornet
#

The class is from the course github page, which we were told to implement, but I don't know enough about this classifier to judge.

#
```py
class Perceptron: # Perceptron classifier class
    def __init__(self, learning_rate=LEARNING_RATE, n_iters=N_ITERS): 
        self.lr = learning_rate 
        self.n_iters = n_iters 
        self.weights = None # The weights are the coefficients of the input variables. They are multiplied by the inputs and summed to arrive at an output. They are updated in the learning process.
        self.bias = None # The bias is a constant value that is added to the weighted sum of inputs to determine the output of a neuron. It is a constant value that is learned during training.

    def fit(self, X, y): # The fit method is used to train the model
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters): 
            for idx, x_i in enumerate(X): #
                linear_output = np.dot(x_i, self.weights) + self.bias
                y_predicted = np.where(linear_output > 0, 1, 0)
                update = self.lr * (y[idx] - y_predicted)
                self.weights += update * x_i
                self.bias += update

    def predict(self, X): # The predict method is used to predict the class of a sample
        linear_output = np.dot(X, self.weights) + self.bias # The linear output is the weighted sum of the inputs
        return np.where(linear_output > 0, 1, 0)``` this is the perceptron class i ended up with
past meteor
#

This implementation is wrong

nocturne hornet
#
```py
class SVM:
    def __init__(self, learning_rate=1e-3, lambda_param=1e-2, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def _init_weights_bias(self, X):
        n_features = X.shape[1]
        self.w = np.zeros(n_features)
        self.b = 0

    def _get_cls_map(self, y):
        return np.where(y <= 0, -1, 1)

    def _satisfy_constraint(self, x, idx):
        linear_model = np.dot(x, self.w) + self.b 
        return self.cls_map[idx] * linear_model >= 1
    
    def _get_gradients(self, constrain, x, idx):
        if constrain:
            dw = self.lambda_param * self.w
            db = 0
            return dw, db
        
        dw = self.lambda_param * self.w - np.dot(self.cls_map[idx], x)
        db = - self.cls_map[idx]
        return dw, db
    
    def _update_weights_bias(self, dw, db):
        self.w -= self.lr * dw
        self.b -= self.lr * db
    
    def fit(self, X, y):
        self._init_weights_bias(X)
        self.cls_map = self._get_cls_map(y)

        for _ in range(self.n_iters):
            for idx, x in enumerate(X):
                constrain = self._satisfy_constraint(x, idx)
                dw, db = self._get_gradients(constrain, x, idx)
                self._update_weights_bias(dw, db)
    
    def predict(self, X):
        estimate = np.dot(X, self.w) + self.b
        prediction = np.sign(estimate)
        return np.where(prediction == -1, 0, 1)``` this is the original one that is in our course repo, which I assume is written by our professor
uneven bronze
#

What is the difference between a language model and an actual ai

sonic valley
#

AI is an umbrella term for automated machinery to complete a task autonomously

past meteor
past meteor
#

That makes the implementation not strictly wrong but just "strange". Idk why your professor would code it up from scratch, why they'd use gradient descent in the primal etc
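For reference, the update rule in the `SVM` class above is consistent with per-sample subgradient descent on an L2-regularised hinge loss. This is a reading of the posted code rather than anything stated in the course material: $\eta$ stands for `lr`, $\lambda$ for `lambda_param`, and $y_i \in \{-1, +1\}$ for `cls_map[idx]`.

```latex
J_i(w, b) = \frac{\lambda}{2}\lVert w \rVert^2 + \max\bigl(0,\; 1 - y_i (w \cdot x_i + b)\bigr)

\nabla_w J_i =
\begin{cases}
\lambda w & \text{if } y_i (w \cdot x_i + b) \ge 1 \\
\lambda w - y_i x_i & \text{otherwise}
\end{cases}
\qquad
\frac{\partial J_i}{\partial b} =
\begin{cases}
0 & \text{if } y_i (w \cdot x_i + b) \ge 1 \\
-y_i & \text{otherwise}
\end{cases}

w \leftarrow w - \eta \, \nabla_w J_i, \qquad b \leftarrow b - \eta \, \frac{\partial J_i}{\partial b}
```

The two cases correspond to the `if constrain:` branch and the fall-through branch of `_get_gradients`.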

sonic valley
#

is it possible to increase the amount of iteration steps with tensorflow when predicting an answer?

past meteor
hot hazel
#

Will AI replace 90% of programmers in 5 years? It's very good, so it will replace us, right? And if not, then WHY

nocturne hornet
hazy verge
#

Can anyone provide the resources to learn how to make chatbot from scratch?

sonic valley
hazy verge
#

Thank you

sonic valley
#

I wouldn't use chatterbot, because i've had trouble getting it to work properly on Python 3.11.4

hazy verge
#

OK 👍

sonic valley
#

Right now, Generative Pre-trained Transformer (🤓) GPT 4 is the most advanced model, and it has hallucination rates of 20%, so roughly one in every 5 messages will contain false information

#

It is incapable of doing arithmetic, and its coding knowledge is completely based on StackOverflow

#

Up until AI can proficiently make new information based on what it knows, we'll still be their overlords

past meteor
nocturne hornet
#

simple difference between the predicted output and the actual target to update the weights.

nocturne hornet
past meteor
past meteor
#

It's very close to SVMs, especially if you added L2. I'd encourage you to just write down the equations

hot hazel
nocturne hornet
past meteor
#

I've not used Perceptron in the past mainly because I can't think of a situation where I would favour it above a generic SGDclassifier/regressor.

hot hazel
#

Please reply

#

I am scared if ai will replace us in 5 years

past meteor
nocturne hornet
hot hazel
past meteor
#

Virtually no engineer cares about discussing this topic

#

It's tiresome and pointless

hot hazel
#

Can you tell me why it won't this is what I need

#

The why

past meteor
#

No, find a philosophy discord server and ask them. Thanks.

past meteor
hot hazel
#

I know it won't replace us but why

past meteor
#

Write out the equations and it'll make more sense 🙂

wooden sail
#

ai has already replaced us. in reality, zestar is a bot we have linked to chatgpt, the true overlord

hot hazel
#

Zestar will you replace us

wooden sail
#

zestar already replaced me by giving better answers in this channel lemon_angrysad

hot hazel
#

Zestar how do you feel as a chat gpt and will you replace us

mild dirge
#

It's definitely better than its 74 predecessors

past meteor
#

Maybe zestar76 won't take so much time to remember why exactly SVMs are a maximum margin classifier.

hot hazel
#

Another ai

#

NOo I don't want it to replace us

pale hemlock
#

IF anyone is truly interested

hot hazel
#

What is this

#

WILL ai replace us

pale hemlock
#

if we let it

#

this is a machine learning tensor model that defines values in a coordinate system, categorized mathematically into dimensions based on the original model

#

I eventually want to create a machine learning model that becomes AI

#

The 3d graphic aspect is not important to the system as a whole, merely an unintended consequence of the model i designed.

#

the fact that it can do that provides deeper insight into my goal

desert oar
#

the enthusiasm is appreciated, but i also don't think this does anything with machine learning

pale hemlock
desert oar
# pale hemlock its defining a tensor off of real world values like circle triangle, data along ...

i see. this is actually a well-established technique, you're basically treating each coordinate of each shape here as a separate feature. there are some advantages (e.g. simplicity of the code) but there are also problems with it. for example you can take a (small) image and flatten the pixels, so that each pixel is a separate input feature. the problem is that you lose the proper sense of the 2d locality of the data, so the model has a lot more work in order to construct a good internal representation of the data, compared to using a CNN.

#

however one thing you can do is use something like an autoencoder to obtain a fixed-size vector embedding for each "object" and feed those into a model. that's basically what all of deep learning for NLP is based on. the nice thing there is that you can tailor the embedding model to accommodate all kinds of weird objects (text, shapes, images, whatever) but the output is very uniform and can be put into a very generic model.

pale hemlock
#

You could also define the picture as a rectangle and store the info about along that particular dimension

desert oar
#

right, that's exactly the technique. i'm saying that simply flattening the coordinates might not be the most effective way to obtain a vector representation of a polygon.

#

although it probably wouldn't be bad either in the case specifically of k-gons, since k is fixed

#

would be interesting to see if a model can distinguish self-intersecting, convex, and regular polygons just by flattening the coordinates

pale hemlock
#

the whole point is that those flattened coordinates are unique to that particular information. the whole purpose of my endeavour is to formalize context.. say you want it to identify itself. it can refer to its current shape due to its parameters that changed over time..

desert oar
#

i'm not sure what you mean by that

pale hemlock
#

right im trying

desert oar
#

yes, flattening the coordinates is an invertible transformation between the original polygon and the flat representation
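A quick numpy sketch of that round trip, using a made-up triangle. The only thing needed to invert the flattening is knowing that each vertex has two coordinates.

```python
import numpy as np

# A polygon as a (k, 2) array of (x, y) vertices, here k = 3.
triangle = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])

# Flatten: each coordinate becomes its own input feature.
features = triangle.reshape(-1)

# Invert: regroup the features into (x, y) pairs.
restored = features.reshape(-1, 2)
```

Nothing is lost in the data itself. What the model loses is the built-in knowledge that features 0 and 1 belong to the same vertex.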

pale hemlock
#

say you run this thing and it learns the system dimensions. it would soon learn that in context it's a rectangle-shaped box (gathered from the internet based on hardware information: dimensions, type, cpu), then contextualize that information as hardware. it could read x and y values and see them as information from this source as an application, perhaps these values are window dimensions. they can be stored on the dimension available, and given the coordinates of this newly created understanding, the model conforms to this information, holding values that can be stored as a variable, i.e. the computer it's on.

#

in other words, with each complete growth factor that conforms to, let's say, a triangle, the original model stores the information as coordinates, and those coordinates are input references to stored dictionaries

#

these references can be combined to show, in theory, a 3d object once it learns enough about "itself"

#

what i find interesting is that you could in theory rotate the tensor object to store information anew.

desert oar
#

i don't follow, what does CPU have to do with this?

pale hemlock
#

nothing, but you could ask the model what type of cpu it has, because when the tensor receives input and creates a new storage dimension, any value learned in this dimension can later be referenced when trying to talk to a machine learning model about itself.

#

it can't tell you what type of CPU you have yet because i haven't programmed that far. i had a hard drive malfunction a week ago and, starting anew, i have to retrace my steps. however, this is a new approach to what i was trying to do

#

because a CPU is generally square, it can store the dimensions, name and stuff on the fly in said dimensions. say it managed to learn a square thing and stored it in the square dimension: it can reference it instantaneously along those parameters, along with anything else that references square or square objects, perhaps what it received through videos, just stored in a space as a reference defined by the original tensor.

#

basically right now, if you run a model against it, it would know it's a square, with a circle, blah blah, but as an object that receives and learns as outputs

tiny nimbus
#

Hi, I am trying the ai cells magics from https://blog.jupyter.org/generative-ai-in-jupyter-3f7174824862 in jupyter lab. They work great, but I would like to streamline my process. Is there a way to avoid having to %load_ext jupyter_ai_magics in each notebook before using these magics? I looked around and advice on this was old/nonworking. Ideally I would like to change a config or add an argument to my jupyter lab startup alias.

Medium

Jupyter AI, a new open source project, brings generative artificial intelligence to notebooks with magic commands and a chat interface.

pale hemlock
#

hmmm i developed another method to do what i did earlier except.. well just look

boreal gale
desert oar
pale hemlock
#

hold on

pale hemlock
tiny nimbus
boreal gale
# tiny nimbus <@231160898872410123> Amazing! That worked. I was previously trying to modify t...

https://ipython.org/ipython-doc/3/config/intro.html

i was gonna go with ipython startup scripts as i knew that's definitely possible to run python code when a kernel starts up,
but i forgot where to put startup scripts, so i googled ipython profile since i knew it's a profile-based thing, and that led me to that page. the top bit immediately showed how to enable cython magic by default, so i adapted that instead.

boreal gale
# tiny nimbus <@231160898872410123> Amazing! That worked. I was previously trying to modify t...

i am actually unsure why your previous approach didn't work.

edit: that seems to be the config for LabApp not ServerApp which is what we needed to change
edit2: yeah no. jupyter-lab --generate-config is for both ServerApp and LabApp actually, and that's probably not to do with the kernel that's being spawned
edit3: i guess the keypoint is just realising jupyter is just using ipykernel - hence you need to alter ipython profile , not jupyter's profile.
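For anyone finding this later, a guess at what the adapted config looks like, based on the cython example in the linked IPython docs. The path assumes the default profile (running `ipython profile create` generates the file if it does not exist), and the exact line that was used is not shown in this chat.

```python
# ~/.ipython/profile_default/ipython_config.py
# get_config() is provided by IPython when it loads this file.
c = get_config()  # noqa: F821

# Extensions listed here are loaded at kernel startup, which should have the
# same effect as running %load_ext jupyter_ai_magics in every notebook.
c.InteractiveShellApp.extensions = ["jupyter_ai_magics"]
```

Since Jupyter kernels for Python are IPython kernels, they pick this profile up without any change to the Jupyter config.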

tiny nimbus
broken flax
#

Hi I'm new here. Just signed up for Nucamp's DevOps course with Python starting in September. In the meantime I'm trying to figure out how to work with lists in real world situations but I got stuck somewhere. I was able to import my list (.csv) into pandas but don't know what to do after that. How do I sort the list or find some item in the list? I learned to do this with just simple small lists but not in real situations. Any suggestions? Please dm me. Thanks

pale hemlock
pale hemlock
#

well i finally got somewhere

pale hemlock
#

this is my final product?

#

anyone interested

winter drift
#

Hey guys, any good known datasets of pictures of trash and debris?

serene scaffold
#

yeah, the selfie folder on my phone

turbid fox
#

im planning on making a simple machine learning / ai algorithm to play chess. are there any python libraries i should investigate, aside from pandas / numpy / tensorflow. looking for a starting point in the project, i guess

serene scaffold
pallid badge
# left tartan I think you have a mistake in the code. You yield img_number with cache.pop(next...

@left tartan Thank you so much. You were absolutely right. There is the mistake, I did not see it.
Corrected version is now:

def _ordered_recv(self, sock):
    cache = {}
    next_img_number = 0
    while True:
        current_img_number, current_result = sock.recv_pyobj()
        if current_img_number == next_img_number:
            yield (current_img_number, current_result)
            next_img_number += 1
            while next_img_number in cache:
                next_result = cache.pop(next_img_number)
                yield (next_img_number, next_result)
                next_img_number += 1
        else:
            cache[current_img_number] = current_result

This one makes sense in case you have a live queue, where images from a detector come in and should be displayed in the correct order. It could also happen that a scan stops and we don't want the final file to have empty places.
If order does not matter, the unordered version is ok, the target file gets filled because (img_number, result) travel together and I can identify the image and the location in the final file with img_number.
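The same reorder-buffer idea can be tried without ZMQ by feeding it a plain list. This is a stripped-down sketch, with `ordered` and the sample data made up for the test, that assumes indices start at 0 and that every index eventually arrives.

```python
def ordered(pairs):
    """Yield (index, value) pairs in index order from an out-of-order iterable."""
    cache = {}
    next_index = 0
    for index, value in pairs:
        cache[index] = value
        # Flush everything that is now contiguous with what was already yielded.
        while next_index in cache:
            yield next_index, cache.pop(next_index)
            next_index += 1


arrivals = [(2, "c"), (0, "a"), (1, "b"), (4, "e"), (3, "d")]
result = list(ordered(arrivals))
```

Checking that the yielded indices are exactly 0, 1, 2, ... is a cheap way to catch the kind of mistake the original version had, where a cached result went out under the wrong image number.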

pallid badge
turbid fox
pallid badge
#

@left tartan I check now ordered and ordered_fix and let you know, but it looked good on the first glance

serene scaffold
turbid fox
serene scaffold
sterile nebula
#

does python have a library that can match similar looking charts?

desert oar
# turbid fox not my first project, im just looking for resources

This isn't really my area of expertise, but you can probably find something that somebody has put together using reinforcement learning if you search around, maybe based on AlphaZero. however you should keep in mind that there is a long history of chess playing algorithms that don't use "machine learning" in the modern sense, which you also might want to look into

#

yeah it seems like AlphaZero has been used for chess since 2017, you should be able to find at least something about using it in python

rocky vortex
#

Minmax (sometimes Minimax, MM or saddle point) is a decision rule used in artificial intelligence, decision theory, game theory, statistics, and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. When dealing with gains, it is referred to as "maximin" – to maximize the minimum gain. Originally formulated for se...

#

this is what stockfish uses if i remember correctly
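A toy sketch of the decision rule on a hand-written game tree, where nested lists are positions and numbers are final scores from the first player's point of view. Real chess engines layer pruning and a position evaluation function on top of this. The tree here is invented.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a game tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: the score of a finished game
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Three possible moves for us, each followed by two possible replies.
tree = [[3, 5], [2, 9], [0, 7]]
best = minimax(tree)
```

The opponent picks the worst reply for us in each branch (3, 2 and 0), so the best we can guarantee is the first move.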

civic elm
#

Hmm a really nice project would be a voice activated chess

#

No board just voice haha

#

Woah that would be cool time to learn RNNs

pale hemlock
#

Could Chatgpt3 interact with this model by its self?
ChatGPT
Yes, GPT-3 can interact with your tensor-based model by itself. GPT-3 is capable of processing and generating text, which means it can send text inputs to your model and receive responses from it. This interaction would involve GPT-3 generating prompts that are formatted in a way that your tensor-based model can understand and interpret.

#

@desert oar i've gotten somewhere, i hope this changes your perception of what i may or may not understand

#

the neural network model

golden haven
#

If anyone is good with python selenium here and you have a moment please check my post on python help page, just posted it there ❤️

civic elm
#

anyone using a mac? my jupyter notebook kernel is using the /usr/bin/python3 where it should be the anaconda python. I don't know how to fix this

north rain
#

@brittle radish please don't post advertisements like that in this server

dry flame
#

need advice on NLP books
so I'm eyeing some books as a handbook to guide me in learning NLP with libraries like spacy, nltk etc (let's assume i know nothing of NLP). right now i have these 3 books listed. but i wonder if there are better books for complete beginners?

opaque idol
#

I just finished learning Python. I want to start learning how to train ai agents to play a game. Can anyone link me somewhere and explain how should I start doing that?

wraith heart
#

📣 I've just published an in-depth article on Stable Diffusion, an AI technology that's transforming the way we deal with noisy data. 🌟

🔍 What's Stable Diffusion?

🧠🔬 The Science Behind It

🎓 Get Hands-On
The tutorial includes a step-by-step guide on setting up Stable Diffusion, so you can get started on your own projects.

Let me know what you think!

https://medium.com/@Naykafication/stable-diffusion-phenomenon-from-core-principles-to-real-world-applications-e5f54c795b15?source=friends_link&sk=1a99411d24a0d86967ed72943959f48f

Medium

Beyond the Hype: Practical Tutorial to Stable Diffusion and Its Impact on Tech

fallow frost
#

does DuckDB have query parameters?

weak mortar
#

Hi guys. I have a few issues with the pandas Styler.background_gradient function. Which topic would you suggest I ask in?

#

or is it better i open a thread in python help?

boreal gale
weak mortar
#

my problem is that i am coloring my table with heatmap, but i only want to apply it to specific rows. you can add subset="column name" to target specific columns, but it will always target columns, not rows, despite setting axis to 0 or 1.
code:
metrics_html = combined_metrics.style.background_gradient(cmap='Greens',axis=0).to_html()

#

as im very familiar with css, i tried overwriting the styles of the cells in the html, but due to the nature of how the colors are applied it is not possible to do

boreal gale
# weak mortar my problem is that i am coloring my table with heatmap, but i only want to apply...

have a read here: https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.background_gradient.html

in particular:

subset:  label, array-like, IndexSlice, optional
A valid 2d input to DataFrame.loc[<subset>], or, in the case of a 1d input or single key, to DataFrame.loc[:, <subset>] where the columns are prioritised, to limit data to before applying the function.

here is an example to get what you wanted:

import pandas as pd

import numpy as np

arr = np.random.randint(1, 100, size=(10, 10))

df = pd.DataFrame(arr)

df.style.background_gradient(cmap='Greens', subset=([3,4], slice(0, None)), axis=0)
weak mortar
#

okay thanks let me try that right away. i tried with the slice thing and apply and applymap last night, but the slice stuff was confusing for me.

boreal gale
#

you don't even need slice actually..

df.style.background_gradient(cmap='Greens', subset=([3,4], ), axis=0) will do just fine

weak mortar
#

thats basically what i was already doing. it still only searches the columns for 3 and 4 here, it cannot search the rows

boreal gale
weak mortar
#

how do you make it work on rows though, it says it looks through columns 🤷

#

your example works. just figuring out how to make it also work on my df

boreal gale
#

hmm? i thought the documentation is quite clear already 🤔

unless you weren't aware there is a difference between [3,4] and ([3,4], )?

weak mortar
#

yes i was totally unaware of that 🙂

#

i see that you thereby are targeting rows , i just didnt go further with that because it always said it was searching cols

boreal gale
#

ah! okay, you can experiment with df.loc[<whatever-you-put-as-subset>] to see what it will target (in the 2D case according to the docs)

#

but if it's 1D then it's df.loc[:, <subset>]
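A small sketch of that experiment on a throwaway 10x10 frame. It uses `pd.IndexSlice` to spell the 2D key out explicitly, which selects the same rows as the `([3, 4], slice(0, None))` form used earlier.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100).reshape(10, 10))

# 2D key: rows 3 and 4, all columns.
rows = df.loc[pd.IndexSlice[[3, 4], :]]

# 1D key: a bare list is looked up along the columns instead.
cols = df.loc[:, [3, 4]]
```

Whatever region `df.loc[...]` returns for a given key is the region the gradient gets applied to when the same key is passed as `subset`.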

#

what a weird API.. 🤷‍♂️

weak mortar
#

okay case solved. thank you very much!
combined_metrics.style.background_gradient(cmap='Greens', subset=(["opti","bt1","bt2","bt3","bt4"], ), axis=0).to_html()
yesterday i was so close, literally just had to put ", " after the ]

latent remnant
#

anyone familiar with Power Bi?

pale hemlock
#

IF anyone is truly interested, patient, willing to listen to a reasonable implementation of working on a novel concept of Machine learning and AI.. please by all means message me

serene scaffold
opaque idol
umbral charm
#
import numpy as np
import math
import pandas as pd
from pandas.plotting import scatter_matrix
import scipy
import matplotlib.pyplot as plt
from scipy.stats import *
pd.set_option('display.max_rows', None, 'display.max_columns', None, 'display.width', None)
housing = pd.read_csv(filepath_or_buffer = 'filepath')
print(housing)

why does this not print all of it, it prints from like 3000 - 13000

#

this is on PyCharm btw, maybe that's the issue

#

and i do have 13000 data points

mild dirge
#

Terminal has a maximum length @umbral charm

umbral charm
#

was thinking this, anyway to solve?

#

or do i just have to use like IDLE

mild dirge
#

I don't think you ever need to print over 10k lines probably.

#

Could print to a text file
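
A rough sketch of the text-file idea, assuming a made-up stand-in for the housing data (the column names and the output file name are illustrative). `to_string()` does not truncate rows by default, so the file holds the whole table even when the console cuts it off.

```py
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the 13,000-row housing data.
housing = pd.DataFrame({"price": range(13000), "rooms": [3] * 13000})

# Write the full table to a file instead of the console.
out_path = os.path.join(tempfile.gettempdir(), "housing_dump.txt")
with open(out_path, "w", encoding="utf-8") as fh:
    fh.write(housing.to_string())

# Count the lines that were written, to confirm nothing was cut off.
with open(out_path, encoding="utf-8") as fh:
    line_count = sum(1 for _ in fh)
print(line_count)
```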

umbral charm
#

That is true, its just useful to see if it works

mild dirge
#

But probably better to only print the useful info
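
For example, a few summary calls usually say more than the full dump. This is only a sketch on a made-up frame (the columns are hypothetical); swap in the real `housing` data.

```py
import pandas as pd

# Hypothetical stand-in for the housing data.
housing = pd.DataFrame({"price": range(13000), "rooms": [3] * 13000})

print(housing.shape)   # (number of rows, number of columns)
print(housing.head())  # first 5 rows
print(housing.tail())  # last 5 rows

# count, mean, std, min, quartiles and max for each numeric column
summary = housing.describe()
print(summary)
```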

umbral charm
#

yea would be but i like to visualise everything just in case

mild dirge
#

Maybe there is a setting to change the maximum line count

#

But you'd need to look around in the settings

umbral charm
#

would that be it?

wooden sail
#

printing out 10k lines of text is one of the worst ways of visualizing anything

#

make a plot of the stuff you care about

mild dirge
#

You're not going to read through 10k lines in less than a few hours 😛

wooden sail
umbral charm
#

Yea its fine i should be good with 10k lines

#

i was just curious, dealing with large datasets is a pain

#

oh

#

it worked

umbral charm
mild dirge
#

It was edds suggestion 👌🏽

unique ether
#

What is the most important mathematical formula in ML?

#

is it the Quadratic formula?

mild dirge
#

Really subjective. There is also the chain rule for backwards propagation
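
To make the chain rule point concrete, here is a toy sketch: one weight, one sigmoid, squared error. The model and the numbers are invented for illustration. The gradient from the chain rule is checked against a finite-difference estimate.

```py
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy model (made up): loss = (sigmoid(w * x) - y) ** 2
w, x, y = 0.5, 2.0, 1.0


def loss(w_):
    return (sigmoid(w_ * x) - y) ** 2


# Forward pass
z = w * x
a = sigmoid(z)

# Backward pass via the chain rule: dL/dw = dL/da * da/dz * dz/dw
dL_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)
dz_dw = x
grad_chain = dL_da * da_dz * dz_dw

# Central finite difference as an independent check
eps = 1e-6
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad_chain, grad_numeric)
```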

verbal swan
mild dirge
#

Lots of activation functions like tanh, ReLU, sigmoid etc.

#

Also pretty big

#

But yeah, not really a single answer 😛
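
For reference, the three activation functions named above are each a one-liner in NumPy. This is a sketch; the input values are arbitrary.

```py
import numpy as np


def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    # Keeps positive values, zeroes out the rest
    return np.maximum(0.0, z)


z = np.array([-2.0, 0.0, 2.0])  # arbitrary sample inputs

sigmoid_out = sigmoid(z)
relu_out = relu(z)
tanh_out = np.tanh(z)  # squashes into (-1, 1)

print(sigmoid_out, relu_out, tanh_out)
```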

verbal swan
#

I don't think there is a single most important formula, multiple statistical formulas are used

unique ether
#

Cheers everyone

umbral charm
#
housing['date'] = pd.to_datetime(housing['date'], format = '%d/%m/%Y')
housing2 = housing[(housing['date'] > '2016-12-01') and (housing['date'] < '2018-01-01')]
print(housing2)
#

why doesnt this work

#

' raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

young granite
umbral charm
young granite
#

only the full error i assume it results from line (2)?

unique ether
#

the traceback mate

umbral charm
#

ok

#

but dont bully the username alright

left tartan
#

(…) and (…)

umbral charm
#
Traceback (most recent call last):
  File "D:\Users\FatBoy\PycharmProjects\Coursera\Gali.py", line 11, in <module>
    housing2 = housing[(housing['date'] > '2016-12-01') and (housing['date'] < '2018-01-01')]
  File "C:\Users\FatBoy\bottle\lib\site-packages\pandas\core\generic.py", line 1466, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
left tartan
#

Oh sorry, you're comparing to a string.

#

You want to compare the series to a date obj

umbral charm
#

When i put the '&' instead of and

#

it works

#

but i dont want to use '&' coz idk what it actual means i only know its a bitwise operator

left tartan
#

Oh and that too:

#

Each side of that AND is a boolean series.

#

You want the bitwise AND of the series… not the boolean AND which makes no sense: what is [10101010] and [00011010]?

#

A boolean AND operation would give you a True or a False, which makes no sense
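
A small sketch of the difference, with two made-up boolean series: `and` asks for one truth value for the whole Series and raises the same ValueError, while `&` works element by element.

```py
import pandas as pd

a = pd.Series([True, False, True, False])
b = pd.Series([True, True, False, False])

# `and` needs ONE True/False for the whole Series, which pandas refuses to guess.
try:
    a and b
    raised = False
except ValueError as exc:
    raised = True
    print("and failed:", exc)

# `&` compares position by position and returns a new boolean Series.
both = a & b
print(both.tolist())  # [True, False, False, False]
```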

#

Does that make any sense?

umbral charm
#

i see

#

so if [10101010] and [00011010]? makes no sense

#

what would [10101010] & [00011010] do

left tartan
#

What do you think it'll do? Do you know anything about bitwise operations or boolean arithmetic?

umbral charm
#

nope nothing about bitwise operator, thats why i didnt want to use it

#

i also know that there is a pipe for OR bitwise

unique ether
#

doesn't & return everything that is in both lists?

left tartan
#

Let me demonstrate... one sec

umbral charm
#

oh

#

& literally goes down to the bits and compares them

#

so if im correct

left tartan
#
import pandas as pd
s1 = pd.Series([0, 1, 1, 0, 1])
s2 = pd.Series([1, 0, 1, 1, 0])
print(s1 & s2)
umbral charm
#

that would print

#

00100

#

yay or nay?

left tartan
#

Yes

#

So, looking at your original code:

#

You have series A which is : (housing['date'] > '2016-12-01') and series B which is: (housing['date'] < '2018-01-01')

unique ether
#

this would print 11111 right?

import pandas as pd
s1 = pd.Series([0, 1, 1, 0, 1])
s2 = pd.Series([1, 0, 1, 1, 0])
print(s1 | s2)
left tartan
#

So, you want all rows where A is True and B is True
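
Put together, a sketch of the working filter. The five sample rows are invented; the `date` column, its format and the two cut-off dates come from the original snippet. The cut-offs are wrapped in `pd.Timestamp` here so the comparison is against a date object.

```py
import pandas as pd

# Hypothetical stand-in for the housing data, with day/month/year date strings.
housing = pd.DataFrame(
    {
        "date": ["15/11/2016", "02/12/2016", "30/06/2017", "01/01/2018", "10/03/2018"],
        "price": [100, 200, 300, 400, 500],
    }
)
housing["date"] = pd.to_datetime(housing["date"], format="%d/%m/%Y")

# Series A and Series B: one True/False per row.
after_start = housing["date"] > pd.Timestamp("2016-12-01")
before_end = housing["date"] < pd.Timestamp("2018-01-01")

# Keep only the rows where both masks are True.
housing2 = housing[after_start & before_end]
print(housing2)
```

If the comparisons are written inline instead of being named first, each one needs its own parentheses, because `&` binds tighter than `>` and `<`.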