#data-science-and-ml


twilit tundra
#

I'd go with the cross-validation selected model, it's the most likely to generalize well

celest vine
#

Guys help.
I am running spark in pycharm. Loading data, doing some transformations. But when I try to write it in the same project folder I am getting windows error 5 Access is denied

agile cobalt
#

just keep in mind that if you tune the hyperparameters too much you might end up 'overfitting' them to your test set

small wedge
#

I will try to break out the math to make it easier to understand.

C = 1/m * Σ(i=1;m) (yᵢ - aᵢ)^2
a = 1/(1+e^-z)
z = wᵀx + b

∂C/∂a = -2/m * Σ(i=1;m) (yᵢ - aᵢ)
∂a/∂z = a * (1-a)
∂z/∂w = xᵀ


to calculate the gradient of the weights ∂C/∂w we just multiply these partials together using the vector chain rule.

∂z/∂w * ∂a/∂z * ∂C/∂a

which is the same as 

-2/m Σ(i=1;m) (yᵢ-aᵢ) * aᵢ * (1-aᵢ) * xᵀᵢ

Sigmoid makes it a bit messy to write out but that would be how you calculate the gradient.  Breaking it up into partials makes it a lot easier to understand IMO.
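
A quick way to convince yourself the chain-rule result is right is to compare it against finite differences. This is only a sketch: the function names (`cost`, `grad_w`, `numeric_grad_w`) and the toy data are made up for illustration, and it uses plain Python lists instead of a matrix library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, xs, ys):
    """C = 1/m * sum((y_i - a_i)^2) with a_i = sigmoid(w . x_i + b)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        total += (y - a) ** 2
    return total / m

def grad_w(w, b, xs, ys):
    """Chain rule: dC/dw = -2/m * sum((y_i - a_i) * a_i * (1 - a_i) * x_i)."""
    m = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        common = -2.0 / m * (y - a) * a * (1.0 - a)
        for j, xj in enumerate(x):
            g[j] += common * xj
    return g

def numeric_grad_w(w, b, xs, ys, eps=1e-6):
    """Central finite differences, used only to sanity-check grad_w."""
    g = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        g.append((cost(w_plus, b, xs, ys) - cost(w_minus, b, xs, ys)) / (2 * eps))
    return g

# Made-up toy data: three samples, two features each.
xs = [[0.5, -1.0], [1.5, 2.0], [-0.3, 0.8]]
ys = [1.0, 0.0, 1.0]
w, b = [0.1, -0.2], 0.05

analytic = grad_w(w, b, xs, ys)
numeric = numeric_grad_w(w, b, xs, ys)
print(analytic)
print(numeric)
```

If the two printed vectors agree to several decimal places, the partials were multiplied together correctly.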
slim bone
small wedge
#

hmm why would you need to take a derivative with respect to a single vᵢ? The sum makes the output scalar anyway

slim bone
#

Because there is no ‘v’ in the equation, or is there?

#

It’s just a sum of the delta between predictions and targets?

small wedge
#

yes, v is just a term used to shorten that delta between predictions and targets when writing out the formula (and also to make it clear where the -1 is coming from)

#

you can remove v and put the (y - y_hat) there instead and the math will work the same

#
C = 1/m * Σ(i=1;m) (yᵢ - aᵢ)^2
∂C/∂a = -2/m * Σ(i=1;m) (yᵢ - aᵢ)
#

am I understanding what you're asking correctly?

tidal bough
small wedge
#

oops, I can change it then. I meant to make it more clear where the -1 was coming from but ig I just made it more confusing.

slim bone
#

Thanks again

tidal bough
#

Here's what I'm getting

small wedge
#

you're right that was a big mistake

slim bone
#

Wow

#

You two are special, I might finally get it
When people just say “use the chain rule” it really doesn’t mean much to a beginner but when you put it like that it’s so much nicer

#

Oh and needless to say, thank you

small wedge
#

That's great ^^, I know the feeling of that epiphany. It's really simple if you abstract it out to partials

#

and my apologies with the incorrect math and making it more confusing lol

slim bone
#

Oh no worries! It wasn’t such a terrible mistake and it probably made the initial reading a little clearer :)

tidal bough
#

I think "use the chain rule" is often confusing because calculus courses don't often have examples where a function has n variables 🙂

#

It'd be like this:

#

(In the derivation above, the function in question is C, which depends on a_1, ..., a_m. Hence the sum in the result.)

desert oar
#

fair, i'm guilty of advising that 😉

slim bone
slim bone
desert oar
#

yeah but i hope i didn't mislead along the way

#

@tidal bough do you actually need the sum property there? i think it "expands" naturally in this case because C is itself a sum over i, so the whole thing expands out into a sum of partial derivatives by linearity

#

i never learned total vs partial derivatives properly in school, it's something i should probably revisit at some point

tidal bough
desert oar
iron basalt
tidal bough
#

ah yes, nabla-transposed 🥴

iron basalt
tidal bough
#

mathematicians try not to abuse notation challenge (impossible)

slim bone
#

ML is probably the first time I’ve seen the two go hand in hand

iron basalt
#

Oh, and this one image:

slim bone
#

feel free to elaborate

slim bone
# iron basalt Oh, and this one image:

I think this is hitting beyond my bracket considering I’ve just learned how to use the chain rule with more than a single nested function

Nevertheless, extremely curious to note that such a connection even exists. Thank you

iron basalt
# slim bone And uh, I’m genuinely not sure what you mean by “guess how other things work” ^^...

With a strong foundation in the fundamentals of calculus (its purpose / what it's doing), you can predict how it will play out in combination with something like vectors / linear algebra. This is my usual approach, I derive my own stuff, then after that read more about the topic. After a certain point I only need a few cues to predict the rest / adjust my stuff to match. This approach lets me know that I actually understood the previous topic leading up to the next one (confirmation of prediction), including its purpose from which much can be predicted since I then can guess what the inventors of the topic were aiming for / would probably go for next (on the same timeline / "wavelength"/ however you want to put it). The other thing I do is follow a historical approach. I try to find how/why they were inventing the math / what the context was at the time (what did people know then / what were the unsolved problems). This kind of prediction task (predict answer, then check) is usually done via the practice problems in books, but I like to take it a step further and predict the next chapter(s) too. I'm not recommending this approach, it's just what I do.

#

Basically, I like to reinvent the wheel.

serene scaffold
#

!otn a squiggle's reinvented wheel

arctic wedgeBOT
#

:ok_hand: Added squiggle’s-reinvented-wheel to the names list.

upper flame
#

Hey guys do you know a lil bit of finance ? Cause i have a trading ai that i try to finish … could someone help me please 🙏. This AI has a very big potential, the people who accept to help can keep the code and run it to generate some wealth… it’s about 95% done

small wedge
upper flame
small wedge
upper flame
#

Here is a snippet; these are the function calls

#
print("test 1")
bot = Bot()
print("test 2")    

# Call Market class

market = Market(symbol='EURUSD', yahoo_ticker='MSFT', currency='EUR', hist_window=365)
market.fx_price()
market.stock_price()
data=market.market_to_dataframe()

# Call the Balance class
ip_address = "127.0.0.1" 
port_id = 7495 
client_id = 1  
current_price=market.fx_price(real_time= True)
price=market.fx_price()


bot.nextorderId = None
bot.run_loop();
print("wa7el Houni");
balance = BalanceApp(ip_address,port_id,client_id)
balance.start()
balance.accountSummary(reqId=123, account="DU11643091", tag="TotalCashValue", value="12345", currency="EUR")
balance.error(reqId=123, errorCode=456, errorString="Some error message")

# Call the RiskManager class

riskmg = RiskManager(balance, stop_loss_pct=0.05)
max_take_profit_pct = riskmg.calculate_max_take_profit_pct()
print("Maximum take profit pct: ", max_take_profit_pct)
order_size=riskmg.calculate_order_size(current_price)
print("Order size:", order_size)
riskmg.calculate_risk(price, stop_loss=7.5)

# Call the NNTS class

nnts = NNTS(lookback=50, units=128, dropout=0.5, epochs=200, batch_size=64)
X, y=nnts._prepare_data(data)
model=nnts._build_model(X)
buy_signals=nnts.generate_signals(data, strategy='buy')
sell_signals=nnts.generate_signals(data, strategy='sell')

# Call the TradingProcess class

tp = TradingProcess(balance, risk_percentage=0.05)
tp.update_equity()
tp.can_open_position(price, stop_loss=0.05)
tp.can_afford_position(price)
tp.open_position(price, stop_loss=0.05)
tp.close_position(price)
tp.update_position(price)
tp.fit(X, y)
tp.predict(X)

# Call the DataProcessor class

datapp = DataProcessor(feature_collumns=["open","high", "low", "close", "volume"])
datapp.preprocess_data(data)


# Call PlaceCancelOrder class

pcorder = PlaceCancelOrder()
pcorder.place_order(buy_signals, sell_signals, symbol='EURUSD', order_type='MKT')
pcorder.cancel_order(order_id=1)

# Call Bot function
bot.execute_trade(buy_signals, sell_signals, price)
#

@small wedge

small wedge
#

which part do you need help with?

upper flame
#

they are almost done but there are some arguments that i couldn't figure out how to pass

tidal scroll
#

Hello, everyone. I would like to ask about a slight problem in an RDF graph, so the elements are too close to each other, and there is no space between them. I have been working on this project for my final school assignment and have searched everywhere on Google, Graphviz documentation, Stack Overflow, and YouTube, but none of the solutions are working. Therefore, I would appreciate some assistance here if you don't mind.

This is the code

`new_rdf_file = '../../output/rdf/dummy_rdf.rdf'

g.parse(new_rdf_file, format='xml')

gv_graph = graphviz.Graph(strict=True, format='svg', engine='neato')

def get_local_name(uri):
    uri_str = str(uri)
    return uri_str.replace(nba_players, '').replace("http://", '').replace("https://", '')

for subject, predicate, obj in g:
    subject_label = get_local_name(subject)
    obj_label = get_local_name(obj)
    predicate_str = str(predicate)

    # Add nodes and edges to the Graphviz graph
    gv_graph.node(subject_label)
    gv_graph.node(obj_label)
    # gv_graph.edge(subject_label, obj_label, label=predicate_str)
    gv_graph.edge(head_name=subject_label, tail_name=obj_label, label=predicate_str)

gv_graph.attr(pad="1.0")

output_file = 'dummy_output.svg'
gv_graph.render(output_file, view=True)`

Thank you in advance

small wedge
upper flame
#

what should i put in quantity

small wedge
#

oh or it's just stocks

#

then I assume it'd be how many shares you want of a stock

upper flame
#

no it's both

#

but i'll start with currency exchange

#

the thing is i don't know how the ai will buy the units that i can afford

misty flint
#

new colab feature?

random raft
#

hello

#

anyone?

sleek harbor
weary sedge
#

Does anyone know a solution to the issue that I have?

"(base)" does not display on the terminal, when I use bash. But when I change my shell to zsh, it displays.

Why is that?

sage obsidian
#

Can anyone recommend a difficult python project?

unique flame
#

Does a text summarisation model from huggingface send your text to huggingface? I downloaded the model to my device and ran it without internet, but was wondering if there are security issues when summarising personal documents.

twilit tundra
unique flame
#

Thanks wanted to be sure!

upper flame
unique current
#

guys got question is there a way to replace \ in text using replace option?

#

i tried

message.replace("\", "")
#

but doesnt work

unique current
#

?

unique current
twilit tundra
#

Use a double backslash, backslash is a special character
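
A minimal sketch of what that looks like. The sample string is made up; the point is that `"\\"` in source code is a single backslash character, so it matches the single backslashes in the text.

```python
# The literal below is written with doubled backslashes in source code,
# but the resulting string contains single backslashes.
message = "C:\\temp\\new folder"
print(message)   # C:\temp\new folder

cleaned = message.replace("\\", "")   # "\\" is ONE backslash character
print(cleaned)   # C:tempnew folder

# r"\" is also a syntax error, because a raw string cannot end in a single
# backslash. chr(92) is another way to spell the same character.
same = message.replace(chr(92), "")
```

So the text itself does not need to change, only the way the backslash is written inside the `replace` call.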

unique current
#

but one is in text

#

i just need to change output of script

#

and output is text and \

sleek harbor
#

Question to those who work. How powerful of a PC/laptop do you need? Do employers provide cloud compute, so that you could work on a weak device, or do they expect you to have a powerful PC and use your own processing power for everything you do?

Would the laptop linked below (gave 2 links in case one doesn't work) be good enough for work? No GPU, and not the best CPU. Only 8Gb RAM.. but what do you think?

https://sl.aliexpress.ru/p?key=ScdFZED
https://aliexpress.ru/item/1005001520846730.html?sku_id=12000027438880217&spm=a2g2w.productlist.search_results.1.1a364aa6fKeQyh

serene scaffold
sleek harbor
twilit tundra
#

Even outside of tech, they provide their own laptop for security reasons

boreal gale
#

this is pretty dang cool! i am just swamped by work atm, i might give this a look in the weekend

sleek harbor
slim bone
#

Kind of curious regarding Pytorch's nn.Linear() function:

test_img = torch.ones(1,4,4, dtype=torch.float)
test_flatten = nn.Flatten()
test_flattened_image = test_flatten(test_img)
test_layer1 = nn.Linear(16, 4)
test_hidden1 = test_layer1(test_flattened_image)
print(test_hidden1)
-------------------
output:
random values in (-1,1)

Anyone got any idea what that's about? are the weights just initialized randomly?
document for reference: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

twilit tundra
#

The weights are initialized randomly
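
To make that concrete, here is a pure-Python stand-in for what the linked docs page describes: weights and biases drawn from U(-sqrt(k), sqrt(k)) with k = 1/in_features. This is a sketch and not PyTorch's actual code; `init_linear` and `linear` are hypothetical helper names.

```python
import math
import random

def init_linear(in_features, out_features, rng):
    """Draw weights and biases from U(-sqrt(k), sqrt(k)), k = 1 / in_features."""
    bound = math.sqrt(1.0 / in_features)
    weight = [[rng.uniform(-bound, bound) for _ in range(in_features)]
              for _ in range(out_features)]
    bias = [rng.uniform(-bound, bound) for _ in range(out_features)]
    return weight, bias

def linear(x, weight, bias):
    """y = W x + b for a single flattened input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

rng = random.Random(0)
weight, bias = init_linear(16, 4, rng)   # same shape as nn.Linear(16, 4)
out = linear([1.0] * 16, weight, bias)   # flattened 4x4 image of ones
print(out)
```

With an all-ones input each output is just the sum of 16 small random weights plus a bias, which is why the printed values look like small random numbers that change on every fresh initialization.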

slim bone
twilit tundra
#

Yes you can overwrite the attributes or reload a pretrained model

slim bone
#

Ah cool, that's kind of what confused me to begin with

slim bone
twilit tundra
#

I'm on iceraven on my phone, with the dark reader extension

slim bone
#

Looks really neat 🙂 Ty for the help

dusty valve
#

Hello humans, fastest way to render a 2d image? Im doing some computing and the output is a 2d array. Ive used mplt. Scipy or Pillow seem good too

molten hamlet
#

depends whats loader is doing ;d

#

Can you stack arrays with different shapes somehow? I want to index different arrays

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5])
arr3 = np.array([6, 7, 8, 9])

stacked = np.stack([arr1, arr2, arr3])

scope = stacked[0,1] # 1
scope = stacked[2,2] # 8

celest vine
#

Hey, any data engineers here?
I have 1 year experience as a data analyst and I am trying to break into data engineering.
I know python, sql, hadoop, spark and azure (adf, databricks) and also the basics of airflow.
Is this enough to land a job?

balmy idol
mild dirge
#

You could perhaps pad the shorter arrays if it really helps with efficiency and the lengths don't differ too much
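
A rough sketch of the padding idea, assuming NaN is an acceptable "no value" marker for your data (that makes the result a float array). The variable names are made up.

```python
import numpy as np

arrays = [np.array([1, 2, 3]), np.array([4, 5]), np.array([6, 7, 8, 9])]

# Allocate a (rows, longest) array filled with NaN, then copy each array in.
width = max(len(a) for a in arrays)
padded = np.full((len(arrays), width), np.nan)
for row, a in enumerate(arrays):
    padded[row, :len(a)] = a

print(padded[0, 1])   # 2.0
print(padded[2, 2])   # 8.0
```

If the lengths differ a lot, a plain list of arrays indexed as `arrays[2][2]` avoids the wasted space.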

molten hamlet
tidal bough
balmy idol
# tidal bough The functions from `scipy.signal` usually have an argument for boundary conditio...

thank you, im trying to accomplish this from R:

> x <- 1:100
> filter(x, rep(1, 3), circular = TRUE)
Time Series:
Start = 1 
End = 100 
Frequency = 1 
  [1] 103   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48  51  54  57  60  63  66  69  72  75  78  81  84  87  90  93  96  99 102
 [35] 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162 165 168 171 174 177 180 183 186 189 192 195 198 201 204
 [69] 207 210 213 216 219 222 225 228 231 234 237 240 243 246 249 252 255 258 261 264 267 270 273 276 279 282 285 288 291 294 297 200

in python the closest i can get is this:

np.convolve(x, [1,1,1], mode='valid')
array([  6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36,  39,  42,
        45,  48,  51,  54,  57,  60,  63,  66,  69,  72,  75,  78,  81,
        84,  87,  90,  93,  96,  99, 102, 105, 108, 111, 114, 117, 120,
       123, 126, 129, 132, 135, 138, 141, 144, 147, 150, 153, 156, 159,
       162, 165, 168, 171, 174, 177, 180, 183, 186, 189, 192, 195, 198,
       201, 204, 207, 210, 213, 216, 219, 222, 225, 228, 231, 234, 237,
       240, 243, 246, 249, 252, 255, 258, 261, 264, 267, 270, 273, 276,
       279, 282, 285, 288, 291, 294, 297])
but the 200 and the 103 drop off
tidal bough
# balmy idol thank you, im trying to accomplish this from R: ``` > x <- 1:100 > filter(x, re...

Huh, it looks like scipy.signal's convolve doesn't have wrapping, which is weird to me. Anyway, you can use ndimage's instead:

>>> scipy.ndimage.convolve(x, [1,1,1], mode='wrap')
array([103,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36,  39,
        42,  45,  48,  51,  54,  57,  60,  63,  66,  69,  72,  75,  78,
        81,  84,  87,  90,  93,  96,  99, 102, 105, 108, 111, 114, 117,
       120, 123, 126, 129, 132, 135, 138, 141, 144, 147, 150, 153, 156,
       159, 162, 165, 168, 171, 174, 177, 180, 183, 186, 189, 192, 195,
       198, 201, 204, 207, 210, 213, 216, 219, 222, 225, 228, 231, 234,
       237, 240, 243, 246, 249, 252, 255, 258, 261, 264, 267, 270, 273,
       276, 279, 282, 285, 288, 291, 294, 297, 200])
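
If you'd rather not pull in SciPy for this, the same wrap-around moving sum can be sketched with modular indexing. `circular_moving_sum` is a made-up helper name, and it assumes an odd window width centered on each element.

```python
def circular_moving_sum(values, width=3):
    """Centered moving sum with wrap-around at both ends (odd widths only)."""
    n = len(values)
    half = width // 2
    return [sum(values[(i + k) % n] for k in range(-half, half + 1))
            for i in range(n)]

x = list(range(1, 101))
result = circular_moving_sum(x)
print(result[:3], result[-3:])   # [103, 6, 9] [294, 297, 200]
```

The first and last entries (103 and 200) are the ones that `mode='valid'` drops.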
molten hamlet
#

__getitem__ is used with []
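
A tiny illustration, with a made-up `Squares` class:

```python
class Squares:
    """Toy container: obj[i] calls obj.__getitem__(i)."""

    def __len__(self):
        return 5

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return index * index

squares = Squares()
print(squares[3])      # 9
print(list(squares))   # [0, 1, 4, 9, 16]
```

Iteration works here too because Python falls back to calling `__getitem__` with 0, 1, 2, ... until it raises `IndexError`.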

lean breach
#

is anybody really strong with using langchain/chroma with the gpt-api. I'm working on a project and have some questions if anybody would hop on a discord call with me.

molten hamlet
#

what do you measure with time? full loop? or just getting items

boreal gale
# sleek harbor Would be grateful, as I really can't seem to figure out why the table doesn't di...

prefixing this answer with disclaimer: i am not a dash expert, so take what i said with a grain of salt.

here is what i understood

  1. upon turning on debug mode, i immediately saw there is an error on start up, this would explain the issue you are seeing (or maybe rather confirming the symptoms you are seeing)
  2. upon opening the callback DAG view with debug mode on, i see your intra_sector_corr populating function takes quite some time to run, whereas your stocks table populating function is immediately triggered, this is basically a race condition due to improper specification of what callback to run first (as to how you can do this, see the link i posted before or my below attempt)
  3. by adding
@app.callback(Output("stocks-dropdown", "value"), Input("stocks-dropdown", "options"))
def pick_first_option_on_change(options):
    return options[0]

i can alter the callback DAG into this, i believe the callbacks run from top to bottom on initialisation, so we have successfully made the race condition go away by forcing stocks table to wait until the first callback for intra_sector_corr population completes

twilit tundra
#

I'm not sure I understand your question. Should the 5k photos be labelled as 0 and the 70-100 photos labelled as 1?

#

@lapis sequoia

#

Since you should have random photos vs a small number of close photos, I'd use an existing image embedder, embed a few of the class 1 photos as reference, embed the rest of the image and class them by cosine similarity with the reference
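
Roughly like this. The embeddings below are made-up 3-dimensional vectors standing in for whatever a real image embedder would return, and the 0.8 threshold is an arbitrary assumption you would tune on your own data.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(embedding, references, threshold=0.8):
    """Label 1 if the embedding is close to any class-1 reference, else 0."""
    best = max(cosine_similarity(embedding, ref) for ref in references)
    return 1 if best >= threshold else 0

# Made-up embeddings of a few class-1 reference photos.
references = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

# Made-up embeddings of photos to classify.
candidates = {
    "close_photo": [0.85, 0.15, 0.05],
    "random_photo": [0.0, 0.2, 0.9],
}

labels = {name: classify(vec, references) for name, vec in candidates.items()}
print(labels)   # {'close_photo': 1, 'random_photo': 0}
```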

#

The technique I mentioned should work, the alternative is probably managing the imbalance by oversampling your class 1 photos

dusty valve
serene scaffold
dusty valve
balmy idol
#

Is there a python analog to this behavior from sequence in R? :

 [1] 1 2 3 4 1 2 3 1 2 1
agile cobalt
#

not built-in, but you can just implement it yourself

#

we have range() for normal [1, 2, 3, 4], but that behaviour you exemplified seems very weird

#

!e ```py
def sequence(n):
    for j in range(n, -1, -1):
        yield from range(j)
print(list(sequence(4)))
```

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

[0, 1, 2, 3, 0, 1, 2, 0, 1, 0]
agile cobalt
#

numpy/scipy might have it builtin somewhere, but if not you'll probably want to create arrays with np.arange then concatenate them with some other numpy function
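
For a 1-based version that matches the R output above, here is a small sketch with `itertools`. It assumes the R call was something like `sequence(4:1)`, which the message doesn't actually show.

```python
from itertools import chain

def sequence(lengths):
    """For each n in lengths, emit 1..n, all concatenated into one list."""
    return list(chain.from_iterable(range(1, n + 1) for n in lengths))

print(sequence(range(4, 0, -1)))   # [1, 2, 3, 4, 1, 2, 3, 1, 2, 1]
```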

wind shale
#

i have a solution but don't know how it works, can anyone help me

timid grove
trim spruce
#

Hello!
I would like to make an application that will make me an electricity forecast for the next period based on trained models. what should I start with? Apart from the correlation coefficient/humidity/temperature/seasonality coefficient, what else can I use? Sorry if I'm posting where I shouldn't, please redirect me!

sleek harbor
# boreal gale prefixing this answer with disclaimer: i am not a dash expert, so take what i sa...

so basically, if I understand correctly, that's simply setting the default value for the second dropdown. And it works!!! Thank you! I still don't really quite understand why it doesn't work without it.. I would have thought that since the table updating callback has the second dropdowns "value" as an input, dash would make the connection that the second dropdown needs to have an "options" parameter to work, which is the output of the first dropdown, and dash would define the order appropriately. Turns out, it seems, it doesn't make the connection, that in order to choose a value, you first need options. Will keep that in mind for the future. I learned something new today! Thx)

vestal spruce
#

From my past experience, as long as your data is consistent and not SVG (which represents images in a different way), everything should be fine. That said, some factors outside of your question could affect your model's performance, such as the general size of the images: high-resolution images tend to have thousands if not millions of pixels, which increases the time it takes for your model to process and train on the data. I hope my answer is to your satisfaction and of use. 🎩 👌

oblique quarry
#

Good afternoon, can somebody please take a look at this?

slim bone
#

Hey folks, I think there's a critical knowledge gap in my understanding of gradient descent:
Let us assume a neural network with a single input layer with 3 neurons , and an output layer with 2 neurons
So we feed the system some data, and it outputs some neuron with the highest value (prediction)
I'll ignore the activation function

To fix the weights take some loss function L:
L = loss(w1a1 + w2a2 + w3a3 + b)
calculate its gradient with respect to the weights, and update the weights - This decreases the loss of the function (Assuming we're not already close to the local minima):
New weights: m1, m2, m3
Now we go to the next batch of data and do the same thing: (b1, b2, b3)

Problem is though - now the function has changed: The input is different, and thus the loss function is different - so the local minima of the loss function has shifted elsewhere.
L = loss(m1b1 + m2b2 + m3b3 + b)

What am I missing here? Thanks in advance

#

(Just to clarify, this is me explaining the mental image in my head - not me trying to prove something of course)

tidal bough
#

But weirdly enough, that ends up working alright to optimize on the whole dataset. In fact, even weirder (to me at least), stochastic gradient descent sometimes works better than optimizing on the whole dataset at once, because this jittering helps the model to not get stuck in shallow local minima, but rather move gradually to the global one - much like how optimization algorithms like simulated annealing occasionally accept changes that increase loss in order to break out of local minima.

slim bone
#

Is this because the loss decreases between batches due to chasing the minimum? So the current minimum we're chasing isn't as important as just decreasing the cost?

#

I hope this makes sense

#

More simply, maybe - we're moreso trying to simply decrease the cost as efficiently as possible ,which happens to be in the direction of some local minima, rather than trying to actually reach a local minima

tidal bough
#

Another intuition pump is that if your learning rate is very small, then it should work no matter how small the batch is - because taking a tiny step down the gradient of the entire dataset is the same as taking a very tiny step down the gradient of each sample. (I think I can mathematically formalize this one if you want)

#

And it turns out that in between it still works - if you have a not-too-big learning rate and use not-too-small samples, the weights on average end up going in the whole dataset's direction.
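
A small sketch of the "on average" part, using a made-up 1-D regression problem (`y` is roughly `3 * x`). Each minibatch gradient is different, but for equal-sized batches that partition the dataset, their average is exactly the full-dataset gradient.

```python
import random

def grad(w, batch):
    """Gradient of the mean squared error of the model y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

rng = random.Random(0)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
data = [(x, 3.0 * x + rng.gauss(0, 0.5)) for x in xs]   # made-up noisy data
w = 0.0

full = grad(w, data)

# Shuffle and split into equal-sized minibatches of 2 samples each.
shuffled = data[:]
rng.shuffle(shuffled)
batches = [shuffled[i:i + 2] for i in range(0, len(shuffled), 2)]
batch_grads = [grad(w, b) for b in batches]
average = sum(batch_grads) / len(batch_grads)

print("full-dataset gradient:", full)
print("minibatch gradients:  ", batch_grads)
print("their average:        ", average)
```

Each individual step is noisy, but the noise averages out in the direction of the full gradient.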

slim bone
#

@tidal bough It’s not so much that batching is possible that confuses me - more so that the local minima shifts between each batch and that the algorithm still works as intended

#

Like, it’s the fact that the function changes at any point of time

#

Although, I think I kind of get it now

tidal bough
#

So just perfectly going along the gradient is actually a bad idea. And it turns out that introducing minibatching, and hence a random aspect to the walk, fixes that.

slim bone
#

So I got the impression that the loss function has a single "form" if you will, and that the local minimas never move

#

So maybe if I declare what I understand in a concise manner, and you could just confirm:

  1. The loss function changes between each and every step
  2. Thus, the local minimas* move between each step
  3. Despite this property, the algorithm still works
  4. Not only that it works, it's occasionally better, and helps us break out of "bad" local minimas (Typically done with batching, which is what the right side of the image is trying to illustrate)

Are all of these correct?

tidal bough
#

Yeah, this looks right to me. Showing a graph like that is mostly a lies-for-simplicity kind of thing - a realistic one would be where there's local minima everywhere, and some are deeper than others, and just going for the valley in which you start will be a bad solution.

slim bone
#

Yeah, this is obviously just a function with two variables

tidal bough
#

It might also be interesting for you to look up some modifications of gradient descent other than SGD, like gradient descent with momentum, but tbh I don't myself know much about how they work (basically, you can make your gradient descent intentionally overshoot the minima it goes for, which again helps with getting into a global optimum instead).
Maybe even more illuminating would be simulated annealing. It's a metaheuristic algorithm for multidimensional optimization (I don't think people use it in NNs, mostly just for normal problems) - you have an iterative optimizer with "temperature", and for zero temperature it's basically gradient descent, whereas for infinite temperature it's just a random walk. You start with a high temperature and gradually lower it to zero over the iterations. As a result, the optimizer ends up first wandering into a relatively large and deep basin, and then finding its local minimum, and that usually produces decent results.
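
A minimal sketch of simulated annealing on a made-up 1-D function with several local minima. The cooling schedule, step size and acceptance rule (the usual exp(-delta / T) criterion) are illustrative choices, not the only way to do it.

```python
import math
import random

def f(x):
    """Toy objective with several local minima (made up for illustration)."""
    return 0.1 * x * x + math.sin(3 * x)

def simulated_annealing(start, steps=5000, t_start=2.0, seed=0):
    rng = random.Random(seed)
    x = best = start
    for step in range(steps):
        temperature = t_start * (1 - step / steps)   # linear cooling towards zero
        candidate = x + rng.gauss(0, 0.5)            # random neighbour of x
        delta = f(candidate) - f(x)
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / T), which shrinks as T is lowered.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
        if f(x) < f(best):
            best = x
    return best

start = 4.0
best = simulated_annealing(start)
print("start:", start, f(start))
print("best: ", best, f(best))
```

At high temperature almost every move is accepted (random walk); near zero temperature only improvements are accepted, so the search settles into whichever basin it is in.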

slim bone
#

That's so interesting - indeed a lot of the things in ML feel so... What's the word, deterministic? As in -
"Why should I use X over Y"
"We just tested a bunch of models and we've reached the conclusion that X is typically better"
Which is a pretty unsatisfying answer, but at the same time kind of what you want to hear as a beginner instead of being overwhelmed with even more theory

Specifically, the second approach you mentioned sounds extremely random and doesn't sound like anything you can formally explain beyond "Yeah it just sounded like something that could work and it did"

tidal bough
#

Sure, the reason it's called that is because it's loosely based on the theory of how metals anneal. Works for nature, apparently works for numerical optimization too 😛

#

I suspect that there are in fact more convergence guarantees for all of this than I'm implying, because I don't often read ML research papers, but not sure it's much more.

tidal bough
#

So minibatching (stochastic gradient descent) is provably the same as ordinary gradient descent for small enough learning rates.

#

(Note how this means that this is a case where lowering the learning rate might hurt your model, because the lower the learning rate, the more SGD acts like ordinary GD, which means going for the closest local minimum rather than jumping around - and for training NNs, that's generally a bad idea.)
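
A numerical sketch of that claim on a made-up 1-D regression problem: one pass of per-sample SGD is compared with a single gradient descent step that uses the sum of all per-sample gradients at the starting point. The gap between the two shrinks roughly with the square of the learning rate.

```python
def sample_grad(w, x, y):
    """Gradient of the single-sample loss (w * x - y)^2 with respect to w."""
    return 2 * (w * x - y) * x

data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 8.0)]   # made-up (x, y) pairs

def sgd_epoch(w, lr):
    """One pass over the data, updating w after every sample."""
    for x, y in data:
        w -= lr * sample_grad(w, x, y)   # gradient evaluated at the updated w
    return w

def gd_step(w, lr):
    """One step using all per-sample gradients evaluated at the same w."""
    return w - lr * sum(sample_grad(w, x, y) for x, y in data)

w0 = 0.0
gaps = {}
for lr in (1e-2, 1e-3, 1e-4):
    gaps[lr] = abs(sgd_epoch(w0, lr) - gd_step(w0, lr))
    print(f"lr={lr:g}  gap={gaps[lr]:.3e}")
```

Making the learning rate 10 times smaller makes the gap about 100 times smaller, so for tiny learning rates the two procedures are practically indistinguishable.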

slim bone
#

Extremely interesting - I’ll read what you’ve formalized in a few minutes (irl shenanigans)

Thank you for your help and the curious insights!

tidal bough
#

Here's the same thing but slightly rewritten (including made slightly more correct by noting the next term of the taylor series, etc) and using linalg notation rather than indices everywhere.

lime karma
#

First time ever hustling with sorta-data-science, and I just challenged myself to build a script to find dominant colors in each frame for a video

#

with AMD Ryzen 5 4600H with Radeon Graphics (12) @ 3.000GHz processing 59450 frames takes
```bash
name id tid ttot scnt
_MainThread 0 139651551671424 73.62470 14472
ManagerThread 1 139651316668096 7.709011 12115
Thread 2 139651325060800 4.361026 8368
```
small wedge
tidal bough
small wedge
small wedge
#

woah

#

damn I was gonna ask for the source lol

tidal bough
#

i mean, I can post the latex 😛

small wedge
#

nah I figured it was from a book or something, disregard me XD

lean breach
boreal gale
# sleek harbor so basically, if I understand correctly, that's simply setting the default value...

that's simply setting the default value for the second dropdown.
kind of, it's setting the default value of the dropdown when the dropdown's list of possible options changes.

I still don't really quite understand why it doesn't work without it..
it's about the ordering of when callback are invoked on initialisation. and as you rightly pointed out, dash does not make that connection between "value"and "options" for you.

fresh harbor
#

Can model conversion to fp16 take a hit on accuracy? Does it have an impact on inference time?

agile cobalt
#

typically you'd expect for it to decrease the accuracy while either keeping the inference time constant or lowering it, but reducing the model size significantly
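
To get a feel for where the accuracy loss comes from, here is a sketch that rounds a few made-up weight values to IEEE 754 half precision with the standard library's `struct` module. It only shows the rounding of stored numbers; how much that matters for a particular model, and whether inference gets faster, depends on the model and the hardware.

```python
import struct

def to_fp16(value):
    """Round a Python float (fp64) to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

for weight in (0.1, 1.0001, 1234.567):
    half = to_fp16(weight)
    rel_err = abs(half - weight) / abs(weight)
    print(f"{weight!r:>10} -> {half!r:<20} relative error {rel_err:.2e}")
```

Half precision keeps about three significant decimal digits, so small differences between weights (like 1.0001 vs 1.0) disappear.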

#

usually you'll have to train a bit after converting to a different precision iirc

fresh harbor
#

Its a completely closed source model

agile cobalt
#

and?

fresh harbor
#

No instructions on how to train it

#

Just an onnx lying around

agile cobalt
#

I'd recommend not trying to convert it yourself then but rather asking whoever gave you the model then

fresh harbor
#

Alright

errant bison
#

hii, i want to make an automatic licence plate detection, how can i do so and what tutorial should i follow?

lapis sequoia
#

If I wanted to make a mlp in pytorch and then move all the weights to my own library for testing, is there any better move than making a mlp nn.Module and then manually parsing its .state_dict()?

twilit tundra
# errant bison hii, i want to make an automatic licence plate detection, how can i do so and wh...

There are 3 steps for this kind of task: detecting the license plate, outlining the characters in the license plate, and read those characters. This is a basic tutorial and then you can improve each step: https://pyimagesearch.com/2020/09/21/opencv-automatic-license-number-plate-recognition-anpr-with-python/

twilit tundra
sleek harbor
#

When working with pandas and you have a categorical column, do you usually convert it from the default object type to categorical (which saves a bit of memory and, I assume, makes some operations faster), or is the overhead of the type casting/conversion (whatever it does under the hood) not worth it? What's the best practice here?

twilit tundra
jaunty lion
#

Hey im trying to create a rnn. I have multiple audio dataframes for each song. Every dataframe corresponds to a chunk of the song. this means that songs with varying lengths have varying amount of dataframes. From my very limited understanding of rnn, its beneficial to train it in batchsizes where the batchsize matches the length of the Dataframes for a single element. My question is, if it is a valid approach to pad the amount of dataframes with dataframes containing only -1, so its consistent.
If something i said makes no sense or is stupid, feel free to point it out.
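
For what it's worth, a common pattern is to pad to the longest song in the batch and keep a mask alongside, so the padded steps can be ignored when computing the loss. The sketch below is plain Python with made-up feature values and a hypothetical `pad_song` helper; whether -1 is a safe padding value depends on your features. PyTorch also ships helpers for this in `torch.nn.utils.rnn` (`pad_sequence`, `pack_padded_sequence`).

```python
PAD_VALUE = -1.0   # the padding value proposed in the message above

def pad_song(chunks, target_len, feature_dim):
    """Pad a list of per-chunk feature vectors to target_len; return (padded, mask)."""
    padding = [[PAD_VALUE] * feature_dim for _ in range(target_len - len(chunks))]
    mask = [1] * len(chunks) + [0] * len(padding)   # 1 = real chunk, 0 = padding
    return chunks + padding, mask

# Made-up songs with 3 chunks and 1 chunk, two features per chunk.
songs = [
    [[0.2, 0.4], [0.1, 0.3], [0.5, 0.9]],
    [[0.7, 0.6]],
]
longest = max(len(song) for song in songs)
batch = [pad_song(song, longest, feature_dim=2) for song in songs]

for padded, mask in batch:
    print(padded, mask)
```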

errant bison
mild dirge
#

Dividing it up into three parts is not a bad idea, so if you only need to replace one part that is more doable

harsh bane
#

Hoi, before i ask, which channel is appropriate for help with stable diffusion dependencies and such? Specifically on amd

serene scaffold
harsh bane
#

Could be an incompatible something that it can't read from because it's too new possibly. Honestly don't know

serene scaffold
harsh bane
#

(134)(deck@arch ComfyUI)$ python main.py --normalvram --disable-cuda-malloc --use-split-cross-attention
Total VRAM 4096 MB, total RAM 11795 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Custom GPU 0405 : native
Using split optimization for cross attention
python: /usr/src/debug/hip-runtime-amd/clr-rocm-5.6.0/hipamd/src/hip_code_object.cpp:754: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion err == hipSuccess' failed.
Aborted (core dumped)

#

As it's a steam deck, i'd toy around with the "--insertcommand" to see what runs the best, but can't get it to even launch :P Gotten automatic to run in the past, but can't seem to get it to work now, so trying comfyui lol

serene scaffold
harsh bane
serene scaffold
#

I'm not sure what to suggest, unfortunately

harsh bane
#

No worries. I'll ask around/wait for someone who could possibly know :P

iron basalt
harsh bane
#

Aye. But got it somewhat working now, with python rocm, now i'm debugging with comfy's creator as there's a conflict when i try to generate. Doesn't get past clip

novel python
#

guys, I don't think this is worth a topic on help because it's not exactly python-related. But since you guys are used to using jupyter notebooks, I'd like to know something: if you use the vscode extension, does it stop the colors out of nowhere sometimes too? It's getting annoying for me over the last day. It keeps "crashing" the colors, autocomplete, etc. The notebook itself still works. I've already looked for conflicting extensions, but nothing I could find that helped.

novel python
granite atlas
#

I had a question about Neural Networks.

Are there any tutorials which teach how to make neural networks from scratch without using any library or frameworks?

#

I wanted to learn the basics in Julialang, so the language won't matter for the most part as long as it's a sane one.

serene scaffold
granite atlas
#

Numpy doesn't exist in other languages

serene scaffold
#

So? Any language can have constructs that don't exist in other languages

#

And you could have a language where something like numpy is part of the language

granite atlas
#

@serene scaffold most of the numpy's functions are covered in base Julia interpreter. so what are the suggested tutorials that you have

#

I would still avoid any frameworks related directly to ai/nneu though

serene scaffold
iron basalt
granite atlas
#

I was doing a dude's nneu from scratch in python but he used his own library in middle so i felt betrayed

granite atlas
#

God bless you mate

ashen seal
#

Looking for some guidance in creating an interactive html report similar looking to the image. I have some csv/excel data and want to create a nice dashboard looking report detailing data migration progress. The objective would be to output a single html encapsulating the data and interactive visualisations. Has anyone done something similar before? Would you be able to point me in the right direction?

worn stratus
quartz wigeon
#

Are there any good resources for learning reinforcement learning hands-on? I've tried a few university courses on youtube but all of them are highly theoretical and don't involve code.

agile cobalt
#

well yeah, machine learning is 95% theory 5% code

quartz wigeon
#

If so, as a complete beginner in ML, where should I start learning reinforcement learning? Is it ok to skip stuff like supervised and unsupervised learning and delve into reinforcement learning directly?

#

In short, what are the prerequisites for reinforcement learning?

sonic meteor
#

Can anyone provide me a roadmap or maybe some resources to get started with neuroevolution (genetic algorithms and NEAT)
I am currently doing the Hugging Face Deep RL course (if that helps)
Please ping me when you reply

sonic meteor
quartz wigeon
#

Thanks for the suggestion though, I'll check it out

sonic meteor
worn stratus
quaint loom
#

I may have asked this question several times, but how would you guys turn this into LaTeX text?

merry ridge
#

I normally would not make something like that in LaTeX and instead make it elsewhere and include it as a figure later. If you really want to make it in LaTeX I would use TikZ, but that's not a very pleasant task

ashen seal
mild dirge
ashen axle
#

Hi all, I'm looking for some advice about tensor libraries. I'm working on a chemometrics project working with spectrum-chromatograms, 2nd order tensors, and am looking for a Python library that will enable me to apply custom preprocessing algorithms to the tensor prior to modelling. In the past I have achieved this by producing pandas dataframes of dataframes, but this is both cumbersome and frankly just feels dirty. I've given a cursory glance to several libraries such as Keras, but they don't seem to fit my needs, at least not superficially. Please help!

serene scaffold
ashen axle
serene scaffold
ashen axle
serene scaffold
#

!docs pandas.Series

arctic wedgeBOT
#

class pandas.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=False)
One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).

Operations between Series (+, -, /, \*, \*\*) align values based on their associated index values– they need not be the same length. The result index will be the sorted union of the two indexes.
serene scaffold
#

this?

ashen axle
serene scaffold
#

you should never have a Series of DataFrames. all the DataFrames that are in it should be concatenated into one (potentially with multiple levels of indexing)

ashen axle
#

Yeah I know its unorthodox, hence my question. I was inspired to use a Series of Dataframes by Jodie Burchell in this podcast https://open.spotify.com/episode/6iN2nYZGBTdUAdpVnWvI5W

serene scaffold
#

and keras isn't an alternative to pandas

#

pandas and polars are for the same thing
pytorch and tensorflow (and therefore keras) are for the same thing

ashen axle
ashen axle
rose dagger
#

What's a good modern reference for 3D image classification? Any SOTA models and standard data processing techniques?

broken gorge
#

Hi,
I just got my master degree in experimental psychology from a really good college.

Right now i'm doing a gap year bc I want to properly learn how to code and ML.
I'm not sure yet if I want to do a Phd mixing experimental psychology and cognitive process modeling or become a data scientist.

I just started CS50P(ython) from Harvard and like it very much.
I plan to do the regular CS50 after and follow with CS50AI.

I'm also considering using Dataquest or Datacamp on the side to reinforce/train.
Are they worth it? Are they equally good?
(I read on reddit that datacamp is too easy and consists of filling in blanks, no actual typing. I just started the free version of the python course and it seems that's not the case in the first course.
On the other hand Dataquest seems more challenging but lacks variety of courses and is much more expensive)

Thank for reading this long message 🙂

PS: my only real coding experience is some C in highschool and R during college for stats, but R is quite different from other languages from my understanding.

void veldt
#

how does one transfer arguments from minimizer_kwargs in basinhopping to a custom function being used as a method?

molten hamlet
#

I don't know how to copy pandas,

.copy method is not working, .copy(deep=True) also does not work.
"Not working" I mean that columns are not copies, so when I modify columns in one, the other dataframe is also modified 😐

#

segment_df.columns.to_numpy() has some trickery inside not doing copy...

tidal bough
arctic wedgeBOT
#

@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |    a  b
002 | 0  a  0
003 | 1  a  1
004 | 2  a  2
005 | 3  b  3
006 | 4  b  4
tidal bough
#

What's the .dtypes of your dataframe? Perhaps it's something really weird that needs deep copying?

molten hamlet
#

I solved, but problem was with this code.

"""list_ofdf is list of dataframe splited into smaller segments to separate scope
"""
    for segi, segment_df in enumerate(list_ofdf):
        # print()
        segment_df = segment_df.copy(deep=True)
        print(f"Segment {segi:>3}: {segment_df.shape}. cols: ")
        # print(segment_df.columns)

        timestamp_ind = np.argwhere(segment_df.columns == "timestamp_ns").ravel()[0]
        segment_df.iloc[:, timestamp_ind] = segment_df.iloc[:, timestamp_ind] / 1e9
        # print(f"timestamp_ind: {timestamp_ind}")

        base_features = segment_df.shape[1]

        segm_columns = segment_df.columns.to_numpy() # THIS DOES NOT WORK
        #segm_columns = np.array(segment_df.columns) # THIS WORKS
        segm_columns[timestamp_ind] = "timestamp_s"
tidal bough
#

to_numpy tries to make a view rather than a copy if possible, pass copy=True if that's undesirable.
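A minimal sketch of that suggestion, assuming a small made-up DataFrame (the `value` column and the numbers are hypothetical). Asking `to_numpy` for an explicit copy keeps edits to the array from leaking back into the column index, and `rename` is usually the less surprising way to relabel a column anyway:

```python
import pandas as pd

df = pd.DataFrame({"timestamp_ns": [1_000_000_000, 2_000_000_000], "value": [0.1, 0.2]})

# copy=True guarantees a fresh array, so writing to it cannot touch df.columns.
cols = df.columns.to_numpy(copy=True)
cols[0] = "timestamp_s"
columns_after_edit = list(df.columns)  # still the original labels

# Relabeling through rename returns a new DataFrame and leaves df alone.
renamed = df.rename(columns={"timestamp_ns": "timestamp_s"})
renamed["timestamp_s"] = renamed["timestamp_s"] / 1e9
```

If the edited label array is needed later, it can be assigned back explicitly with `df.columns = cols`, which makes the change visible at the point where it happens.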

molten hamlet
solar carbon
#

hello guys, does anyone had this problem ?

#

i installed tensorflow, tf-gpu and keras. I want to train a zoo model and i have problems setting up 😦

tidal bough
#

you cut off almost all of the traceback, so hard to tell

solar carbon
#

i have looked on the internet and some say that numpy version is deprecated, installed another version and it still didn't work

tropic prairie
#

hi can someone help me with a project that I have

#

I have to make a NN model for regression, and the dataset just consists of x and y values so really simple.

young granite
#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

tropic prairie
#

how shd I get started?

tropic prairie
#

yep ig

#

I have around 1 month

young granite
#

so from scratch

tropic prairie
#

yep just need to acheive a high accuracy rate

tropic prairie
#

oh no I can use tensorflow and stuff

young granite
#

🗿

tropic prairie
#

sorry I was a bit unclear haha cuz I'm so confused rn

#

I just don't know how to get started

young granite
#

so did u plotted ur data (x,y)?

tropic prairie
#

yep really linear

#

already cleaned

young granite
#

always good to do a exploratory data analysis

#

ok

#

are u a graduate? (which complexity u want)

tropic prairie
#

yes

#

already hv experience but not project based, education system sucks lol

young granite
#

do u need to present and explain why u chosen certain model?

tropic prairie
#

not really, since I only have to use neural networks, so I don't really need to explore forests or kcluster but I do need to present how I built my model and it's kinda important that I get a good accuracy rate (which shd not be too complication since the data is (x,y)

young granite
#

so for simple regression go with a sequential model

tropic prairie
#

yea that's what I thought thanks a lot!

#

do you know any code online that mimics a project aiming for a high accuracy rate

young granite
#

im pretty sure u will find packages which do that for u

tropic prairie
#

oh never looked into it, do you know what I should search up to get started? like any package names you know?

#

is this similar to hypertuning parameters?

young granite
tropic prairie
#

ok tysm!

gaunt elbow
#

I'm working on an accounting dataset. They have a revenue amount that's like 1000 and then some n number of lines that offset that 1000. I can group the accounting data into small chunks of usually less than 100 lines. Problem is that it the offsetting amount can be 1 line or 10 lines and it's mixed into those 100 mostly identical looking lines. (There isn't any other good data to filter down anymore) My natural inclination is to iterate over every combination to see if any number of lines equals 1000. Then I can show them these as proposed matches and they can nail down which ones they want. Does anyone know if a fast-ish algorithm to do this? I'm going to run it on a GL dataset filtered down to the 25 million potential offsetting lines. Thankfully once it's done we just need to do it go-forward which should be easier

stable mist
#

can anyone teach me pytorch with flask

left tartan
left tartan
gaunt elbow
#

@left tartan sure!
Here are some headers and fake data. This is my revenue line:
Account | cost center | location| date | amount
ABC123 | 111 | CA | 6/30/2023 | 1000

Same headers, here is my offsetting cash.
ABC123 | 111 | CA | 6/30/2023 | -250
ABC123 | 111 | CA | 6/30/2023 | -333
ABC123 | 111 | CA | 6/30/2023 | -500
ABC123 | 111 | CA | 6/30/2023 | -250

Out of the above 4 lines, 3 of them will tie to my 1000 revenue. So (for now) I have a recursive function that sums every possible combo together to see if they match the 1000 revenue amount. So for these 4 lines it should create about 4! or 24 loops to try every combo until it finds that 1,3,4 add together to offset the revenue. I have a description column to tell me which line is the revenue line, but the cash offsets are either blank or not helpful in finding the offsetting amount

left tartan
#

Sounds very reminiscent of two sums, but with N.

#

I guess you could employ a recursive approach or DP algorithm (ie: cache the intermediate calculations so you're not re-calculating) in the abstract.
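A rough sketch of the DP idea, under the assumption that amounts are stored as integers (e.g. cents) so sums compare exactly. The function name `find_offsetting_lines` is made up for illustration. Instead of trying all 2^n subsets, it keeps one set of line indices for every sum reached so far:

```python
def find_offsetting_lines(amounts, target):
    """Return one tuple of indices whose amounts sum to target, or None.

    reachable maps each sum seen so far to one combination of indices that
    produces it, so every distinct sum is extended only once.
    """
    reachable = {0: ()}
    for i, amount in enumerate(amounts):
        # Snapshot the items so sums added in this pass are not reused
        # with the same line.
        for total, indices in list(reachable.items()):
            new_total = total + amount
            if new_total not in reachable:
                reachable[new_total] = indices + (i,)
    return reachable.get(target)


offsets = [-250, -333, -500, -250]
match = find_offsetting_lines(offsets, -1000)
```

Here `match` holds zero-based indices, so `(0, 2, 3)` corresponds to lines 1, 3 and 4 of the example. The dictionary grows with the number of distinct sums rather than the number of subsets, which helps when many lines share similar amounts, but it only reports one combination per sum.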

gaunt elbow
#

Okay yeah! Oh nice 👍 thank you! I'll give this a shot. It gives me better words to keep researching at least

merry ridge
ashen seal
charred light
#

How do I hide **specific** cells from rendering when exporting to HTML in Jupyter Notebook (VSCode)? Similar to how you can do it in R studio notebooks. edit: I know about cmd line nbconvert --no-input

ashen axle
#

Question - I have created a duckdb database with a 3d instrument signal table totalling 173 million rows by 9 columns. Trying to load this into memory results in the process being killed. Is a database the best solution for this type of data, or should I be looking at another format?

twilit tundra
ashen axle
charred light
#

I'm not familiar with DuckDB so I won't be able to give much advice other than that.

ashen axle
charred light
#

It's also good to point out that, at a small scale, it doesn't matter too much. Optimization only really matters when you reach like hundreds of millions of rows +

quaint loom
hollow magnet
#

A little off topic, is VSC ok for the use of SQL ? Or it's better to use My SQL or something else ?

raw zenith
#

Guys, just a quick help, lets say i have size of data (1214 rows , 93 columns), if i want to remove rows based on columns ranging from 44:88 for example using pandas. I am having difficulty achieving this cuz all i can do is remove rows based on columns values, i want to remove just based on columns

#

I tried something like this df.drop(df.columns[44:89], axis=0, inplace=True)

#

but does not work as it drop columns but not rows associated with it

twilit tundra
#

df.drop(list(range(44,89)),axis=0) should work
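The question can be read two ways, so here is a sketch of both on a tiny made-up frame (the column names and the slice bounds are placeholders for the real 44:89 range):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20.0).reshape(5, 4), columns=list("abcd"))
df.loc[1, "c"] = np.nan

# Reading 1: drop rows by position (here rows 2 and 3), which is what
# df.drop does with axis=0.
by_position = df.drop(df.index[2:4], axis=0)

# Reading 2: drop rows that have a missing value in a slice of columns
# (here the columns at positions 1 and 2).
by_column_values = df.dropna(subset=list(df.columns[1:3]))
```

Neither call modifies `df` itself; both return a new frame.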

merry ridge
split drift
#

Hey,
I'm looking for data documentation tool / package.
I want it to document the input data and the output data.

Currently each stage in the pipeline load the data from a given path.
Compute some features, and save to another given path.

Thanks

left tartan
# ashen axle Question - I have created a duckdb database with a 3d instrument signal table to...

I’m usually the duckdb guy around here, but I’d suggest going over to the duckdb discord and asking there. And sharing the query. The reason is; there are strategies for dealing with larger than memory datasets and queries, whether by chunking or writing queries that operate in subsets of data. Another strategy is to partition the source data… I use parquet a lot for this, and keep large data external.

quaint loom
young granite
young granite
upper drift
#

Hi! I'm dealing with a slimy landlord unfortunately and may have to go to a tribunal hearing. I want to be well prepared and had the idea that I could scrape their publicly available decision cases, and then train a LLM using that. For scraping I think I could use BeautifulSoup, but does anyone have suggestions for the LLM part?

left tartan
upper drift
#

They have about 12-13 years of stuff, and it looks about 25-40 cases each year

#

If I could learn some neat stuff while getting some benefit from it, I'd be happy

left tartan
upper drift
#

Awesome, thanks! Also, yea, I have some legal aid, this is just supplementary / hobby project 🤓

zealous hollow
#

if anyone has exprience with time series models especially ARIMA, can you kindly help me with my project. ADF and pacf,acf plots are done. just need help with p,d,q values

left tartan
zealous hollow
#

no work it just gave a straight line as output

left tartan
#

That would seem like some sort of data error, i'd guess. Can you share a minimal reproduction?

zealous hollow
#

btw letme show you the acf and pacf plots

#

the number of lags selected is 365
bcz the data is yearly and a value does depend on it's last year value

#

i am using gradient boosting models with input data like this

#

using temp did give it a little boost

#

it's only giving me 0.53 r2score

#

i need it to be atleast 0.75+

#

so trying arima model

#

i am open to other model suggestions as well

left tartan
#

And what do you get from Arima?

#

Here's an example of a simple arima model against a sin+noise signal: ```py
# ARIMA Example w sin + noise (updated with correct m =20)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

n_points = 100
x = np.linspace(0, 20 * np.pi, n_points)
noise = np.random.normal(0, 0.5, n_points)
y = 5 * np.sin(x / 2) + noise

from pmdarima import auto_arima

model = auto_arima(y, seasonal=True, m=20, trace=True, error_action='ignore', suppress_warnings=True)
forecast, conf_int = model.predict(n_periods=20, return_conf_int=True)

plt.figure(figsize=(12, 6))
plt.plot(y, label="Data")
plt.plot(np.arange(n_points, n_points + 20), forecast, color="red", label="Forecast")
plt.fill_between(np.arange(n_points, n_points + 20), conf_int[:, 0], conf_int[:, 1], color="pink", alpha=0.3)
plt.legend()
plt.show()

print(model.summary())
```

zealous hollow
#

what is m?

left tartan
#

the size of the season

#

Like, if you have quarterly data, m=4

tidal bough
#

but... the period of the input data is 20 points, not 25. or to be precise, 19.8, I think.
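For reference, the 19.8 figure can be checked with a few lines of arithmetic. This sketch only assumes the same grid as the example (100 points from 0 to 20π, signal `sin(x / 2)`):

```python
import math

n_points = 100
x_start, x_stop = 0.0, 20 * math.pi

# np.linspace includes both endpoints, so there are n_points - 1 gaps.
step = (x_stop - x_start) / (n_points - 1)

# sin(x / 2) repeats every 4*pi in x.
period_in_x = 2 * math.pi / 0.5

samples_per_period = period_in_x / step
m = round(samples_per_period)
```

`samples_per_period` comes out at 19.8, so 20 is the nearest integer season length.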

zealous hollow
#

and if i have yearly ? 🌝

left tartan
vestal widget
#

I want to finetune a language model with my custom data, does the data have like a define format or does the format depends on the model?

left tartan
# zealous hollow and if i have yearly ? 🌝

The question really is what's your "season". Is there an inherent cycle to the data? If you only have annual points, you may not be looking at an arima model, unless there's some underlying cycle to the data

zealous hollow
left tartan
zealous hollow
#

should i use datetime object as input
or
this type of inputs

left tartan
#

iirc, pmdarima doesn't use an x axis... it expects the data to be sequential (chronological) and the observations to be uniformly spaced

zealous hollow
#

hmm so datetime objects as index and simply 'temp' values to it should work rigt

left tartan
#

it doesn't matter what you do for the datetimes

#

Just make sure you don't have gaps.

zealous hollow
#

nah data is proper and completee

zealous hollow
jolly ginkgo
#

Hello guys, I made a library which will genrate random data matrix, would anyone will try?

jolly ginkgo
#

Here
pip install rand-omata

left tartan
zealous hollow
zealous hollow
#

2h

zealous hollow
#

2.5 h

small wedge
#

lmao

left tartan
#

And, you don’t have to use auto arima, you could try tuning the parameters yourself and measuring aic

zealous hollow
#
import pandas as pd

# %%
df=pd.read_csv('data.csv')

# %%
df=df[['date','Temperature']]

# %%
df.columns=['date','temp']

# %%
df['date']=pd.to_datetime(df['date'],format='%d/%m/%Y')

# %%
df = df.set_index('date')  # set_index returns a new frame, so keep the result

# %%
y_train=df['temp'].iloc[:5843]
y_test=df['temp'].iloc[5843:]

# %%
from pmdarima import auto_arima
model = auto_arima(y_train, seasonal=True, m=365, trace=True, error_action='ignore', suppress_warnings=True)
forecast, conf_int = model.predict(n_periods=len(y_test), return_conf_int=True)


#

i think it's bcz the data has nearly 7300 observations

left tartan
#

Maybe autoarima on a subset to get the parameters, then train?

zealous hollow
#

btw still running 💀

#

only a year?

#

btw training data has values from 2003 - 2018

#

while testing 2018-22

#

and pandemic really affected the values i am working with

#

eto so for research standards
is it plausible to explain the drop in accuracy by treating the pandemic as an anomaly?

zealous hollow
left tartan
#

No idea, seems odd it’s so slow but I don’t work with daily data much and only use arima occasionally

zealous hollow
#

well i amma try running 2 instances
of one year data of 2 diffirent years

#

if the parameters remain same

#

it should be okay to go with

#

right?

left tartan
#

Yah, yah, the arima parameters aren’t extremely complicated

zealous hollow
#

btw are there any serveices like google collab?

#

free

mild dirge
#

free computing power?

#

Why not collab?

zealous hollow
#

ye

#

isnt collab like very slow?

mild dirge
#

Right, because it's free 😛

#

There aren't that many companies happy with throwing money away for the greater good

burnt saffron
#

Hello

mild dirge
#

And it's not that slow I don't think

burnt saffron
#

How are you

dire iron
zealous hollow
#

...

zealous hollow
dire iron
zealous hollow
#

it's not that big just 7304 observations

#

should take 2 years right?

dire iron
#

how are you defining seasonal in your dataset?

zealous hollow
#

?

#

didnt get your question>

#

how my data is seasonal?

#

well it's temperature values

#

💀 first and foremost and it's visible from the plots as well

gaunt elbow
#

@zealous hollow I think parsimony is generally appreciated in ARIMA modeling. Tomorrow's temp is probably somewhat related to last year's temp but more closely related to today's temp. Since there's a moving avg component, using 365 is just going to move your model toward the yearly average, which in summer or winter isn't representative at the extremes. An AR(365) is saying that every single day for the past year impacts the temperature tomorrow. I don't think either of those are convincing (personally IMHO)

zealous hollow
#

yeah 💀

dire iron
#

that may fix your issue too

zealous hollow
#

so 1,2? i do have the acf and pacf plots

#

lag no is 76

#

i have tested arima with
0-6 for each p,d,q but only time it even showed something other than a straight like was with 4,0,3

#

but that quickly went to straight line as well only after a few cycles

gaunt elbow
#

Where are your confidence intervals in those graphs? To me that looks like an AR(2) you would need to run the graphs again after running the model on your residuals to look for remaining significance

#

How far into the future are you forecasting?

zealous hollow
#

my data set has temps from 2003-2022 so i am atleast expecting it to go to 2030

left tartan
zealous hollow
#

ye i amma take billy's side on this one 🌝

left tartan
#

Second, op is running pmd autoarima to find optimal parameters. That’s the slow part for op

gaunt elbow
#

Oh I must have misread, I thought they were using a model like (365,0,365) or something crazy.

left tartan
#

Oh no, autoarima searches through the parameter space. Ghosty: can you paste the autoarima output?

zealous hollow
#

still running 🤡

#

made these adjustments training data to only 2 years

left tartan
#

Oh, you’re not printing the output incrementally?

zealous hollow
#

i am a total noob 🤡
through and thorough

#

so let me quickly look it up and do it

left tartan
#

I dunno, my example above would print the models as they are tested

zealous hollow
#
from pmdarima import auto_arima
model = auto_arima(y_train, seasonal=True, m=365, trace=True, error_action='ignore', suppress_warnings=True)
forecast, conf_int = model.predict(n_periods=len(y_test), return_conf_int=True)
#

it's exactly your code for me it's not printing out anything

dire iron
#

set seasonal to false

zealous hollow
zealous hollow
zealous hollow
dense crane
#

i want change the shape of data frame which is (400 x 1300) reduced to (400 x 1200) with something similar to PCA, but cannot use the PCA since n_components has to be smaller than 400, any ideas?

left tartan
zealous hollow
#

oh well i amma wait till it finishes and see the output

dense crane
zealous hollow
dense crane
zealous hollow
#

??

dense crane
zealous hollow
#
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assuming X is your data and y are the corresponding class labels
lda = LinearDiscriminantAnalysis(n_components=1200)
X_lda = lda.fit_transform(X, y)
dense crane
zealous hollow
#

and you dont want it to stay 400

#

?right

dense crane
#

like i dont have (1300 x 400) but (400 x 1300)

#

yeah i want 1200

#

i want (400 x 1200)

zealous hollow
#

how about rfe?

dense crane
gaunt elbow
#

@zealous hollow what is the problem you're trying to solve exactly? The forecast being flat?

zealous hollow
#

this is my data
i want to predict values of each till 2030(very least)

dense crane
#

thanks for that!

zealous hollow
#

i alreay achieved 93% accuracy with gradient boosting for temp
but rest it's not getting any above 60%

dense crane
#

@zealous hollow but can you describe in a few words more or less what this does (rfe)

fresh harbor
#

why are many pytorch models not exported to onnx?

zealous hollow
fresh harbor
#

shouldnt this help in getting rid of the torch dependency

zealous hollow
#

someone sent a research paper link here just now related to my work and it got deleted by bot

serene scaffold
zealous hollow
#

i don't know myself 💀

left tartan
zealous hollow
#

nah

#

but you can use it

#

||bro please do, it will save me a lot of time||

#

🤣

#

make sure to save your results of testing

left tartan
#

feel free to throw it in a gist or whatever, and dm

#

got it, thx

zealous hollow
#

np

zealous hollow
left tartan
#

wow, that's one heckuva first search

verbal venture
#

The first layer of my fully connected layer is 2x the output of my final model's layer. What am I doing wrong here? ```py
# if accuracy is not higher, and not changing epochs or batch size
# more CNN Layers, more nodes in layers
# 'Conv2d, BatchNorm2d, and ReLU.'

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64))

        self.classifier = nn.Sequential(
            nn.Linear(64 * 128 * 128, out_features=256),
            nn.ReLU(),
            nn.Linear(in_features=256, out_features=10)
        )

    def forward(self, x):
        x = self.model(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)

        return x
```
#

basically my first linear layer needs to be divided by 2, but I shouldn't hard code that. What am I getting wrong about the input parameters?

verbal venture
#

mat1 and mat2 shapes cannot be multiplied (32x65536 and 1048576x256)

maiden wadi
#

okay

#

the shapes are wrong

#
self.classifier = nn.Sequential(
        nn.Linear(65536, out_features=256),
        nn.ReLU(),
        nn.Linear(in_features=256, out_features=10)
        )
#

this should fix

verbal venture
#

right. but I shouldn't hardcode 65536 - I should reuse the height * w of the previous output layer. I'm wondering what the numbers are, as I thought the output of the previous layer was 64, 128, 128 (which is incorrect)

maiden wadi
#

yep is more readable using the formula

#

in this case is 64 * 32 * 32

verbal venture
#

yeah, why is it 32 * 32?

maiden wadi
verbal venture
#

But applied to this: nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
nn.ReLU(),
nn.BatchNorm2d(64). Wouldn't the output be 64*64?

verbal venture
#

yes

maiden wadi
#

okay in the formula

#

W = 32 (Image size)
F = 3 ( Kernel)
P = 1 (Padding)
S = 1 (Strides)

#

so

verbal venture
#

so the width and height of the image does not change throughout the network?

#

just the feature maps at each convolution?

maiden wadi
#

as soon as you have the 64

#

refering to out channels

#

64 * 32 * 32

#

of the previous layer

#

the calc is the same

verbal venture
#

@maiden wadi so are feature maps kind of arbitrarily chosen? I was getting confused because I was using that as W to compute the feature maps at the next stage (which somehow worked)

maiden wadi
#

(32 - 3 + 2 * 1 ) / 1 + 1 = 32

#

but imagine

#

we have kernel size of 4x4

#

(32 - 4 + 2 * 1 ) / 1 + 1 = 31

#

now we decrease the value

#

so each layer will decrease it by 1
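The formula above can be written as a small helper. This is only a sketch: `conv_output_size` is a made-up name, and it assumes square inputs and the same W, F, P, S meaning as in the messages above:

```python
def conv_output_size(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


same_size = conv_output_size(32, 3, p=1, s=1)  # 3x3 kernel, padding 1
shrinks = conv_output_size(32, 4, p=1, s=1)    # 4x4 kernel, padding 1

# Three 3x3, padding-1 convolutions keep a 32x32 image at 32x32,
# so a 64-channel output flattens to 64 * 32 * 32 features.
size = 32
for _ in range(3):
    size = conv_output_size(size, 3, p=1)
flattened = 64 * size * size
```

`flattened` comes out at 65536, which matches the first dimension in the shape error, so the `Linear` input size can be derived from the image size instead of being hard-coded.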

verbal venture
#

how are the increasing feature map values determined?

maiden wadi
#

There is no single correct way of selecting the number of feature maps

mild dirge
#

There is a bit of intuition in it: you expect earlier layers to have smaller features (and a smaller receptive field), thus probably fewer of them, and later layers combine them into more complex features, of which you expect more.

#

But no set rule.

#

This translates into later layers having more channels

zealous hollow
#

i have 32 gbs of ram

#

and i just saw python take 20 gb of ram 🤣

verbal venture
#

Does anyone know why the flattened features are 10368? Shouldn't they be 3200? My input dims are 3, 160, 160: ```py
self.model = nn.Sequential(
    nn.Conv2d(3, 32, 3, 1),
    nn.ReLU(),
    nn.MaxPool2d(),

    # 32, 80, 80
    nn.Conv2d(32, 64, 3, 1),
    nn.ReLU(),
    # 64, 40, 40
    nn.MaxPool2d(),

    # 32, 40, 40
    nn.Conv2d(64, 32, 3, 1),
    nn.ReLU(),

    # 32, 20, 20
    nn.MaxPool2d(),

    # 32, 20, 20
    nn.Conv2d(32, 32, 3, 1),
    nn.ReLU(),
    # 32, 10, 10
    nn.MaxPool2d())

self.classifier = nn.Sequential(
    nn.Linear(10368, 2048),
    nn.Linear(2048, 128),
    nn.Linear(128, n_classes))
```
zealous hollow
verbal venture
#

shouldn't the flattened input from 32 * 8 * 8 = 2048?

dire iron
verbal venture
#

don't you flatten the feature maps * h * w

#

if the final layer after max pool is 32 * 8 * 8

dire iron
#

what features do you care about? if output, then only the final layer, right?

verbal venture
#

Idk I thought that was just the process

#

pass the input through the conv layers then get the flattened output and use as FCL input

dire iron
#

well, if you're classifying something, it just uses 3 linear layers. if you are training it, then it is nn.Conv2d(32, 32, 3, 1),

verbal venture
#

yeah I'm training it

#

just confused how 8192 got reached as the flattened layers

twilit tundra
#

If you don't put strides, the dimensions will not be divided but reduced by kernel size * dilation (-padding if there is)
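To see the effect on the 160x160 input, here is a sketch that traces the spatial size by hand. It assumes every pooling layer is a 2x2 max pool (the snippet does not show a kernel size) and that the convolutions are 3x3 with stride 1 and no padding, as written:

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    # Unpadded 3x3 convolution with stride 1 trims 2 pixels per dimension.
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    # 2x2 pooling with stride 2 halves the size, rounding down.
    return size // kernel


size = 160
sizes_after_pool = []
for _ in range(4):
    size = conv_out(size)
    size = pool_out(size)
    sizes_after_pool.append(size)

final_features = 32 * sizes_after_pool[-1] ** 2
features_after_third_pool = 32 * sizes_after_pool[2] ** 2
```

Under those assumptions the sizes after each pool are 79, 38, 18 and 8, so the full stack flattens to 32 * 8 * 8 = 2048, while 10368 is what 32 * 18 * 18 gives after the third pooling stage. Adding `padding=1` to each convolution would give the 80, 40, 20, 10 sequence the comments in the snippet expect.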

split drift
twilit tundra
clever owl
#

Im doing data transformations on an excel file. Now I wanna test a function that cleans the top off an excel file (the file sorta looks like this)

My company name. Some Additional Info Some Other stuff
Space
Space
Additional Stuff

Column Title 1 Column Title 2
Column Value 1 Column Value 2

Just wondering if for my pytest dataframe fixture, would it be ok to make it read from an excel file instead of making this manually in the dataframe? Or is communicating with an external source strictly bad for testing

brittle lily
#

i have a bunch of rna sequences (and their secondary structures) and their corresponding energy values and im trying to find a way to identify features (patterns in their structures or sequences) common between samples of similar energy values. would be super super grateful if anyone could recommend me algorithms to look into using - im assuming this would be an unsupervised learning project and i only have experience with supervised stuff, but im looking into pca right now and not sure if thatd be useful. should i be looking into something else?

intuitively im imagining it as like a clustering + feature extraction problem where i have a bunch of dots, each representing an rna sample, and then the axis represents energy so the dots could be clustered in energy value similarity and then within each energy value cluster patterns/relationships could be found between the sequences and structures of the samples. but not sure if an ml algo exists to handle this and pca seems like not what im looking for because the axes would be principal components and not energy... pls help if u have any ideas!

#

again would really really appreciate any ideas for how to go about doing this or validation on whether pca is the way to go

quartz wigeon
#

Really appreciate if someone can help me out. I'm a self learner and don't have access to a teacher so discord is my only way to ask questions

sleek harbor
#

Is knowing stuff like Agile (Scrum, Kanban) and/or Jira necessary for data scientists/analysts? What tools do you use?

boreal gale
#

ads are not allowed here unless they have been previously approved. please remove your post if you haven't obtained permission to post this.

boreal gale
past meteor
#

Properly applying the principles of the agile manifesto is of benefit to nearly every job

#

But it's improperly applied more often than it's properly applied, so it's pretty toxic in reality

slim bone
#

Hey folks, for my summer vacation I took up ML as I'd like to work in the field in the future (or something close to it, at least)

Learning about the fundamental theory of neural networks was fascinating but I find the programming a bit... uninspired? I just find myself simply copy-pasting everything which feels pretty lame, regardless of how cool the outcome is.

Basically, I'd like to know if this is a common issue, and how can I make the coding process a little more creative? I feel like I'm lacking vision regarding what's exactly out there.

Hopefully most of what I'm saying makes sense, if not @ me and I'll clarify
Thanks in advance!

small wedge
split drift
#

While using pandas, should I write in a method-chaining style, or is it a recipe for disaster?

heavy crow
#

Try picking some project (mnist classifier) and try creating the solution from scratch. Sure look at how other people have done it, but don't look up a tutorial

#

Take the tensorflow example, and then start by implementing some steps. (2d convolution, forward pass etc)

fierce merlin
#

Yo guys im trying to build a posture detector app, from what ive found out, ill need openCV + either tensorflow OR pytorch, what would you guys recommend? i need some kind of guidance on this

slim bone
small wedge
# slim bone I'm really sorry for the late reply. I only did a couple of vision models, where...

Yeah, with modern high level libs/apis for ML it is pretty much all implemented for you. I'd say there are two branches here to go down. One is to pick a project that doesn't have a fleshed out tutorial. Maybe even something you have to collect your own dataset for, and do some experimentation implementing your knowledge of theory on a sort of novel project. That will lead you to do a lot of experimentation with activation functions, architectures, optimizers, hyperparameters, maybe even push you to researching other types of models such as RNNs.

I also know you've been working on the math, another option is to go the other direction and try to implement all of the low level stuff yourself. For this I would recommend using a reputable dataset like you have been, and a task that requires a small/shallow/simple model like classifying MNIST handwritten digits.

fierce merlin
slim bone
# small wedge Yeah, with modern high level libs/apis for ML it is pretty much all implemented ...

and a task that requires a small/shallow/simple model like classifying MNIST handwritten digits.
I actually did try implementing this but I was a tad bit overwhelmed. I later implemented it with Pytorch with, like, 25 lines of code (Insanity!)

I am a lot more interested in the "under the hood stuff" than the actual implementation of things. So I suppose I'm a little more interested in the math-ier side of things and actually understanding what's going on.
On a slightly different note, I can't help but wonder what people who work in the field actually do? I can't imagine they're just using Pytorch all day long.

slim bone
# twilit tundra Why not?

Apparently it's a highly specialized field that requires master's and PhD degrees in order to be qualified to work in the field
It seems like Pytorch has removed so many layers of abstractions it's almost unreal. I'm probably missing something though

desert oar
#

how you proceed through the very long process of developing all 3 of those things is up to you and depends on your immediate interests

desert oar
# slim bone Apparently it's a highly specialized field that requires master's and PhD degree...

i do data science professionally and i have a fairly "light" masters in quantitative social science. i've had to go back and re-learn several parts of the math that i didn't learn well or didn't learn completely enough the first time around. but even before i did that, i knew enough to fit models with pytorch. i just didn't have a really solid grasp of things enough to develop more advanced customized solutions to problems i had.

if you're doing NN stuff you're probably using pytorch on a regular basis. but plenty of people are very productive in data-scientist-like jobs and generally don't need NNs on a regular basis. it depends a lot on what you specialize in and/or what your particular company/industry needs.

slim bone
slim bone
# desert oar you kind of need all 3: how to work with code, understanding the math (not just ...

Also, regarding this - If I had to put my current ambitions into words as accurately as possible I'd say "Whatever it's like to work as an AI(*) Engineer/Scientist/Whatever in the industry, I'd like to experience that"

Unfortunately I have no knowledge in stats nor probabilities yet, so if that's impossible I'll probably put that ambition on hold. I would like to know if that's actually the case though and if I could do something relevant without any knowledge in stats

(*) I don’t know if this is the generalization I’m looking for

sleek harbor
#

Is knowing ORM necessary, or is SQL enough?

twilit tundra
slim bone
twilit tundra
#

Well, it is but it's not that hard?

slim bone
#

Oh, that would be cool if that’s the case 🙂

twilit tundra
#

I wouldn't recommend it over an engineering position if you're struggling with stats and probabilities, but if it's just that you haven't studied it yet and you have a good sense of stats then there shouldn't be any issue

slim bone
#

I don’t really believe you’re inherently “bad” at something. From my experience there’s a strong correlation between grades and interest

#

That’s besides the point though

#

I’m really just trying to understand how to experience “the real deal” of AI/ML engineering or whatever

past meteor
#

Because each company defines it differently

sturdy canyon
slim bone
#

And some common experience all of them share

sturdy canyon
#

I've taught middle schoolers how to implement image classification models

slim bone
#

I don't understand what this has to do with the question at hand though

twilit tundra
#

It means middle schoolers can work as data scientists

slim bone
#

I... have to call BS on that, I'm sorry

sturdy canyon
#

I believe it's because you haven't posed an answerable question. There is no "standard" foundation necessary, as different companies define the roles and the requirements for those roles differently

#

My point in mentioning teaching middle schoolers classification is if someone needed a very basic classifier to be implemented across a lot of different applications with pretty low accuracy, they would be qualified to do it

slim bone
#

Okay, I suppose there's no "standard" for being a CEO either - but I'm not asking for an absolute standard but rather a generalisation of said standard

That generalisation appears to be higher education. I'm fairly sure there's a common derivative at the end of the tunnel

#

Also, just for the record - @ Tonabrix seemed to have an idea of what I'm asking, and indeed suggested a couple of "paths" or whatever one might call them

sleek harbor
# slim bone Okay, I suppose there's no "standard" for being a CEO either - but I'm not askin...

I'll just note that I don't have a job, so.. don't listen to me. But if you want to work with AI, higher mathematics and statistics definitely will not be extra. I've been regretting that I didn't try to actually remember what I studied at university, and I haven't even gone very far down the rabbit hole of DS/ML/AI

Can you be a DS/MLE without strong fundamentals of math and stats? Yes, there are enough abstractions and good libraries to get by. But you'll eventually have to learn all that stuff, probably :3

slim bone
#

The question has sort of been missed throughout the entire conversation unfortunately, it started out fairly simple and diverged to other topics I'm afraid

twilit tundra
#

You'll need at least linear algebra or stats, both is better

sturdy canyon
#

I also understand what you're asking, but I don't believe you're asking the right question. It's pretty straightforward to figure out what is necessary to become an ML person "in general", and what they do. Just go to indeed, search Machine Learning Engineering and do your best to find common threads. Past that, everyone is just going to give you responses based on their personal experience

slim bone
#

I... thank you but that really wasn't the question - I have a rough idea what mathematical background is required. I'm currently in university lol

sturdy canyon
slim bone
slim bone
twilit tundra
# slim bone I'd like to emphasize that I'm not looking for an objective answer in case that ...

My main tasks as a data scientist have been (on different projects, I'm a consultant) :

  • data cleaning/analysis
  • feature engineering
  • being up-to-date on recent models/advances
  • finding ways to exploit available data
  • designing models
  • fine-tuning models (can be deep learning models or boosting ones)
  • develop an interface for POC
  • deploy models/apps (most of the time in a cloud environment)
  • communication to stakeholders
sturdy canyon
#

I now do data science for a living, and have my own business that provides ML solutions to clients. However my situation will be vastly different from yours because I started as a mechanical engineer that got interested in applying the statistics electives I took in uni after working on measurement systems. I eventually worked my way through jobs focused more and more on data/stats/ML and now I'm here. In my experience, your focus should be on what makes you want to investigate and try things. If you want a list of things you should be able to do, I know a lot of what Rose said as well, but I didn't know how to do a number of them when I got what I'd consider my first true job in this field. Additionally, a lot of what I do in my actual job involves estimating risk (and therefore stats), but if you're working on keypoint tracking to put digital butterflies on people in a museum, you may never touch the kind of risk analysis I do

sturdy canyon
#

I also have two friends that work in AI for the defense sector and high precision optics respectively (I'm in healthcare). They focus on WILDLY different things than me once the "standard" stuff above is done or needs to be customized for a specific purpose

slim bone
#

Thank you both for the detailed description. This does give me some insights regarding where my question lacks. If you don't mind though, I'd like to try and explain my current situation and perhaps just seek "advice", instead of asking something concrete, regarding where I should continue from here

I've picked up ML not long ago, and learning about the fundamental theory of the field absolutely fascinated me. Unfortunately when I got to work with Pytorch I discovered a lot of those processes have been simplified to lines like

outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()

Which on one hand, is incredible, but on the other hand this made coding not fun at all, as everything feels like a black box.
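For anyone who wants to peek inside the box: below is a rough hand-written sketch of what those three lines amount to for a single sigmoid neuron with mean squared error, i.e. the same C, a, z chain rule discussed earlier in the channel. It is plain Python, not how PyTorch actually implements autograd, and the OR data, learning rate, and function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, xs):
    """Roughly `outputs = net(inputs)`: one sigmoid neuron applied to each sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in xs]

def mse(outputs, ys):
    """Roughly `loss = criterion(outputs, labels)`: mean squared error."""
    return sum((y - a) ** 2 for a, y in zip(outputs, ys)) / len(ys)

def backward(outputs, xs, ys):
    """Roughly `loss.backward()`: the chain rule dC/da * da/dz * dz/dw, by hand."""
    m = len(ys)
    grad_w = [0.0] * len(xs[0])
    grad_b = 0.0
    for a, x, y in zip(outputs, xs, ys):
        dz = -2.0 / m * (y - a) * a * (1.0 - a)  # dC/da * da/dz
        grad_b += dz                             # dz/db = 1
        for j, xj in enumerate(x):
            grad_w[j] += dz * xj                 # dz/dw = x
    return grad_w, grad_b

# Toy data (made up): learn logical OR with plain gradient descent.
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0.0, 1.0, 1.0, 1.0]
w, b, lr = [0.0, 0.0], 0.0, 1.0
initial_loss = mse(forward(w, b, xs), ys)
for _ in range(2000):
    outputs = forward(w, b, xs)
    grad_w, grad_b = backward(outputs, xs, ys)
    w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    b -= lr * grad_b
final_loss = mse(forward(w, b, xs), ys)
print(initial_loss, final_loss)
```

Writing a version like this once, then comparing its gradients against what PyTorch reports for the same tiny model, tends to make the library calls feel a lot less opaque.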

Essentially, I like to know what's going on but I'm not sure where to continue from here. "Read a book" seems like the obvious answer here but most of the books in my arsenal are study books which require knowledge I don't yet have.

Essentially, I'm not sure where to go, or what I'm even looking for. I'm hoping this vague description of my experience should suffice for you folks to understand where my interest is and what it is I'm trying to do

#

If I haven't mentioned already, I finished my calculus and linear algebra courses - no knowledge in probability/stats as those come in the following semesters

placid cedar
#

hi guys i need ur help. if i have a line chart with the x axis from 2020 to 2022, and the y axis being sale quantity, and i have different lines representing each store which are the legends, is that bivariate or univariate

tidal bough
# slim bone Thank you both for the detailed description. This does give me some insights reg...

I got a lot of my knowledge of ML from Ng's free introductory course on coursera. That was years ago, back when that course was entirely in Octave, so I don't know how the modern version (which is in Python) compares, but the course had many assignments for implementing ML primitives like backpropagation, gradient descent, support vector machines, etc. Maybe that'll make them feel less like black boxes for you (if the modern version even has these exercises still, of course).

placid cedar
twilit tundra
sturdy canyon
# slim bone Thank you both for the detailed description. This does give me some insights reg...

I suppose the question I have is why do you want to learn more about how these processes exactly work? Are you hoping to go into research and/or work independently to create the next best pytorch or the most accurate/fastest model architecture X? Or is it more related to confidence in what you're developing? Pytorch is open source, so if you don't like not knowing how something works, you should be able to find exactly how these "black boxes" work. At one point I also felt like I needed to understand absolutely every detail about everything in stats worked, but after a while I personally found it to be far more practical and enjoyable when encountering something new to learn just enough to be able to implement it while paying attention to the assumptions/uncertainties with the model. That way I could see how it worked, and then determine how much more I needed to learn in order to achieve what I was trying to do.

slim bone
tidal bough
#

I haven't looked at that course nowadays but back when I did it, it had lectures on a lot of the theory, and the implementation tasks were mostly of little parts.

twilit tundra
slim bone
# sturdy canyon I suppose the question I have is why do you want to learn more about how these p...

"Why?" is a bit of a hard question because I'm at the beginning of my road of course. When I envision myself working in the field I'm thinking about the "Big dreamy models", like chatGPT or an automated car or whatever.

Of course this is all a "postcard description" but I'm not sure how else to put it. The dream project would probably be an attempt at a general purpose AI? Or a model capable of automating basic tasks millions of people have as their job nowadays?

Whatever that entails, ig. Honestly a part of me just tells me to wait out and take the university courses I'll inevitably have to take anyway, but I have some time to burn and it'd be lovely to spend it on something I gravitate towards

slim bone
twilit tundra
#

You can enroll for free on all courses

#

You just don't get the certification

slim bone
#

Wait, really?

twilit tundra
#

There is a small link when you click to enroll on a course

slim bone
#

Must've missed it, I'll check again. Thank you

tidal bough
#

yeah, enrolling for free locks you out of some assignments but it's usually not very different

placid cedar
#

im getting quite confused and struggling to distinguish bivariate and univariate analyses, cld anyone lend me a helping hand? 🥲

#

tried finding information online, couldn't really find any useful sources

sturdy canyon
# slim bone "Why?" is a bit of a hard question because I'm at the beginning of my road of co...

As someone who spent a bunch of time trying to learn fundamentals before jumping into the real world, only to realize once I got there that I actually wanted to do something totally different (mechanical engineering -> data sci + ML), I would HIGHLY suggest you spend some time trying to answer that question. It's also why I keep asking the questions I do and responding in the way I do. I would suggest figuring out what you want to work on before going in deep on the how. Some understanding is necessary to pick a direction, but not much. Whether it be the nitty gritty math, or implementing models for time series analysis, image analysis, NLP, etc. I would pick one (or many) things that interest you and try to figure out how to do it yourself, even if it doesn't work well. If you really like what you picked, and are like me, you'll find motivation in figuring out how to make it better. From there, you have the context and the reason to want to dig into the "black box" and find out what you need to learn in order to do what you want to do.

#

Or, you could go deep into a field and discover that you really love it via learning the technical details first and everything's great! Though, in my case I just hope some day I find a use for the fluid dynamics of viscous plastic extrusion that's still taking up space in my brain

slim bone
# sturdy canyon As someone who spent a bunch of time trying to learn fundamentals before jumping...

Admittedly, I just enjoy learning about the abstractions and the math behind things at the moment :/ So I figured whatever involves "that", is what I'd like to try for now

I completely agree with what you're saying, about trying out everything, but I have a hunch that a lot of "things" are sort of "out of my reach" in terms of knowledge (Please do correct me if I'm wrong on this, but NLP seems crazy complicated for example).

That might be kind of what I'm poking at though - how do I even go about "trying everything"?

#

I'd like to emphasize that this is supposed to be more "fun" than actually practical. Although, if I could spare myself some future headache when I'll actually have to learn about this in university that'd be grand

slim bone
twilit tundra
#

A psych major doing ML would be good for the field

sturdy canyon
#

I've got to go, but feel free to DM me if you still have more questions

rough nova
#

@sturdy canyon
Ok

#

Tire means

slim bone
twilit tundra
#

I meant the ML field

slim bone
#

Yeah I got that haha

raw compass
#

What is the best Linux server for training a model?

#

Like what cloud

twilit tundra
tidal bough
#

I remember seeing some nice site that trained some model (ResNet?) on several cloud providers and provided a table with the costs, but I don't have a link

#

(I think AWS was ahead? not sure the results would even apply a year or two later, though)

raw compass
twilit tundra
#

AWS is probably the most popular one, followed by Azure

raw compass
misty flint
#

??

#

oh you mean linux distro on the cloud?

raw compass
misty flint
#

amazon linux is the default for most

#

they have their own flavor of distro

#

for aws

#

its so their services can be compatible (SageMaker, ECR, Lambda, etc.)

#

dont know about azure or gcp

twilit tundra
#

Out of curiosity, why are you interested in the distro used by cloud services?

sturdy canyon
dusty valve
#

hello #data-science-and-ml, i got a bunch of different datasets, some are weekly data, some daily, some monthly. i need to group them all together, preferably like i round down the weekly rows to the monthly rows, daily rows to monthly etc... so should i just write up a script that finds the month that each week/day occurs in?

#

how would i do that with pandas?
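A minimal sketch of one way to do it, assuming each dataset has a datetime column (called `date` here) and a numeric column to aggregate. The frames, column names, and the choice of `sum` are made up for illustration; `to_monthly` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Made-up frames standing in for the real daily and weekly datasets.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-15", "2023-02-01"]),
    "value": [1.0, 2.0, 3.0],
})
weekly = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-08", "2023-01-29", "2023-02-05"]),
    "value": [10.0, 20.0, 30.0],
})

def to_monthly(df, date_col="date", how="sum"):
    """Round each row down to its month, then aggregate within the month."""
    month = df[date_col].dt.to_period("M").rename("month")
    return df.drop(columns=date_col).groupby(month).agg(how)

# Both results share a monthly PeriodIndex, so they can be joined side by side.
monthly = to_monthly(daily).join(to_monthly(weekly), lsuffix="_daily", rsuffix="_weekly")
print(monthly)
```

One thing to decide up front: a week that straddles two months gets assigned entirely to the month of whatever date labels that row, so it matters whether the weekly data is stamped with the week's start or end.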

raw compass
twilit tundra
final hound
#

hi,i know this is pretty simple but how do i take an average of a dataframe column, I tried np.average but i get errors, ive also tried .mean(). with similar errors
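Without the actual error this is only a guess, but a frequent cause is a column that looks numeric and is actually stored as text. A small sketch under that assumption (the `price` column and its values are made up):

```python
import pandas as pd

# Made-up frame; "price" is stored as text, one common cause of errors here.
df = pd.DataFrame({"price": ["10", "12.5", "n/a", "7.5"]})

# Convert to numbers first; unparseable entries become NaN, which mean() skips.
prices = pd.to_numeric(df["price"], errors="coerce")
average = prices.mean()
print(average)
```

If the column is already numeric, `df["price"].mean()` on its own should be enough, and checking `df.dtypes` shows which case applies.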

serene scaffold
#

(In general, you should always give the error message for the error you need help with. if you just say that you got an error, we have no way of knowing what it is until you tell us.)

slim lance
#

What’s a good format on disk for time series data? (Basically wondering if there is something like parquet but optimized for time series?)

left tartan
shy rock
#

Hi @hot obsidian - How to call(execute) a function with 2 or more dataframe arguments? Sample code as below.. I would like to print the result of this function.

#

import pandas as pd
def department_highest_salary(emp: pd.DataFrame, dept: pd.DataFrame):
    merged_data = pd.merge(emp, dept, left_on='DEPTNO', right_on='DEPTNO')
    grouped = merged_data.groupby('ENAME')['SAL'].max().reset_index()
    result = pd.merge(merged_data, grouped, how='inner', left_on=['DEPTNO', 'SAL'], right_on=['DEPTNO', 'SAL'])
    result = result.rename(columns={'DEPTNO': 'Department', 'ENAME': 'Employee', 'SAL': 'Salary'})
    return result[['Department', 'Employee', 'Salary']]

left tartan
#

Not sure I understand the q. You're asking how to call a function with two arguments? That's just result=department_highest_salary(df1, df2)

shy rock
#

Hi @left tartan - Thanks for response. I did pass the arguments like below. However, i am getting a key error

#

result=department_highest_salary(emp, dept)

left tartan
#

You'd have to share the full error plz.

slim lance
left tartan
#

That's more where hive comes in tho, partitioning across... say... producer

slim lance
#

I see.. So I can’t do this just in the storage layer? (I wanted to just use python with an on disk format.)

left tartan
#

Hive partitioning is just a directory organization of parquet files, so yah, you can do it in the storage layer
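To make the "just a directory organization" point concrete, here is a small sketch of the naming convention: one `key=value` directory per partition column, with ordinary parquet files at the leaves. `hive_partition_path` and the `producer`/`month` keys are hypothetical names for illustration, not part of any library.

```python
from pathlib import PurePosixPath

def hive_partition_path(root, filename, **partitions):
    """Build a Hive-style path: one key=value directory per partition column."""
    path = PurePosixPath(root)
    for key, value in partitions.items():
        path = path / f"{key}={value}"
    return path / filename

# Hypothetical layout: time series split by producer, then by month.
p = hive_partition_path("data/readings", "part-0.parquet",
                        producer="sensor_a", month="2023-08")
print(p)
```

In practice you rarely build these paths by hand; for example, pandas' `to_parquet` takes a `partition_cols` argument that writes this kind of layout, so it stays a pure Python plus on-disk-format setup.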

slim lance
#

Ah ok.. I understand now..

#

🙏

left tartan
shy rock
left tartan
#

You can't upload files

shy rock
#

KeyError Traceback (most recent call last)

D:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
1838 values = self.axes[axis].get_level_values(key)._values
1839 else:
-> 1840 raise KeyError(key)
1841
1842 # Check for duplicates

KeyError: 'DEPTNO'

#

KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18972\3746871449.py in <module>
----> 1 result=department_highest_salary(emp, dept)

~\AppData\Local\Temp\ipykernel_18972\3858326592.py in department_highest_salary(emp, dept)
5 merged_data = pd.merge(emp, dept, left_on='DEPTNO', right_on='DEPTNO') # join two data frames based on
6 grouped = merged_data.groupby('ENAME')['SAL'].max().reset_index()
----> 7 Merge_group = pd.merge(merged_data, grouped, how='inner', left_on=['DEPTNO', 'SAL'], right_on=['DEPTNO', 'SAL'])
8 Merge_group= Merge_group.rename(columns={'DEPTNO': 'Department', 'ENAME': 'Employee', 'SAL': 'Salary'})
9 return Merge_group[['Department', 'Employee', 'Salary']]

D:\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
105 validate: str | None = None,
106 ) -> DataFrame:
--> 107 op = _MergeOperation(
108 left,
109 right,

D:\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
698 self.right_join_keys,
699 self.join_names,
--> 700 ) = self._get_merge_keys()
701
702 # validate the merge keys dtypes. We may need to coerce

D:\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
1095 if not is_rkey(rk):
1096 if rk is not None:
-> 1097 right_keys.append(right._get_label_or_level_values(rk))
1098 else:
1099 # work-around for merge_asof(right_index=True)
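
Reading the traceback: the failure is in the second merge, on the right-hand frame. `grouped` comes from `groupby('ENAME')`, so it only has the columns ENAME and SAL, and asking it for 'DEPTNO' raises the KeyError. A sketch of a likely fix is to group by department instead; the sample frames below are made up, reusing the column names from the question.

```python
import pandas as pd

def department_highest_salary(emp: pd.DataFrame, dept: pd.DataFrame) -> pd.DataFrame:
    merged = pd.merge(emp, dept, on="DEPTNO")
    # Group by department (not employee) so 'DEPTNO' survives into `top`.
    top = merged.groupby("DEPTNO", as_index=False)["SAL"].max()
    # Keep only the rows whose salary equals their department's maximum.
    result = pd.merge(merged, top, on=["DEPTNO", "SAL"], how="inner")
    result = result.rename(columns={"DEPTNO": "Department", "ENAME": "Employee", "SAL": "Salary"})
    return result[["Department", "Employee", "Salary"]]

# Made-up sample data using the column names from the question.
emp = pd.DataFrame({
    "ENAME": ["KING", "CLARK", "SCOTT", "ADAMS"],
    "SAL": [5000, 2450, 3000, 1100],
    "DEPTNO": [10, 10, 20, 20],
})
dept = pd.DataFrame({"DEPTNO": [10, 20], "DNAME": ["ACCOUNTING", "RESEARCH"]})
print(department_highest_salary(emp, dept))
```

If two employees tie for the top salary in a department, both rows are kept, which is usually what a "highest salary per department" report wants.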

timber sinew
shy rock
charred light
#

For Power BI, How do I create a dynamic index based on the current view? Like .reset_index() in python applied every time the table is updated.

coral field
#

Is there any way to get more free compute credits on Google Colab after using all 100 (?) hours?

#

And does the T4 GPU even take up credits?

late shell
#

Hello everyone, I want to use an open source LLM (like Llama 2) for a text generation task. My prompt looks something like this:

Use the given question and context to generate a detailed, 
authentic description about the machine. Make it sound as if you are a great salesman and are pitching this machine 
to a potential buyer. Use good formatting and the description should not be too long (About 200 words only). 
Try to make it as easy to read as possible. Most importantly, you absolutely must include all the information provided in the description 
that you generate. Do not make up new information. It's a pre owned machine, therefore the description should not be like the launch of a new product.

Generate a description of the machine using the information provided under the Context.
 
Context: 

categoryName: Post Press
subcategoryName: Saddle Stitcher
subsubcategoryName: Conveyor belt
manufacturerName: Monotype
Year: 0.0
MachineModelName: Boston Double Head Stitching
Location: Germany
Info: DOUBLE HEAD STITCHING MACHINE BOSTON

2 HEAD FLAT AND SADDLE STITCHING MACHINE
DOUBLE WIRE

SCENARIO: I'm currently running the ggml quantized version of llama-2-7b and llama-2-13b locally (I can't use API based models due to data security concerns by my company). The results that this prompt generates are somewhat satisfactory for a starting point but the problem is that it takes around 4-5 mins to generate the whole response (with 33.6 GB of RAM and an NVIDIA GeForce GTX 1080 Ti). Sometimes it just keeps on running for 15 mins and doesn't generate anything.

QUESTION: I'm wondering if I can either speed up the inference somehow or even considering to downgrade my model since maybe the llama-2-7b/13b model (quantized) could be an overkill for this task. I want to use a model that gives satisfactory results while using the least amount of resources since I need to run this model on the company server. How do I go about narrowing down my model hunt for this task?

void sail
#

If you want to research more on your own this issue is called: inference speed up / tokens per second

#

On an M1 Mac you can reach 20-30 tokens per second without any effort on llama 2 13b 4 bit ggml

late shell
#

Oh okay okay. vLLM looks promising. I'll look into it. There's also exllama, do you think that could be helpful too?

covert hearth
#

Hey All,

I am not sure this is the right chat, but it is about LLMs.
So, I am trying to get https://github.com/mosaicml/llm-foundry running.
Short: I think I am there, the only problem I seem to be facing is that composer seems not to be using my conda env and thus I am not able to run the train example on this page.

Long:
I installed all deps et cetera and I am trying out the quick start example.
When I am at the training part, you have to run:"composer train/train.py \ et cetera"
This returns ModuleNotFoundError: No module named 'llmfoundry'. Which is interesting since when I open python and import this, it works.
When I was debugging I found that composer wants to use another python executable: /sw/arch/RHEL8/EB_production/2022/software/Python/3.10.4-GCCcore-11.3.0/bin/python
Is there a way to force to use the conda env python with all the required packages et cetera?

quartz wigeon
#

Is there a machine learning classifier algorithm that classifies points on a 2d plane using a vertical or horizontal line as a separator?

#

I'm trying to write an ensemble learning algorithm from scratch, and I need a simple classifier like this as my base learner

sleek harbor
#

🤔 how important is it for a data scientist to know a framework, like flask or django? Or is that a mostly useless skill and not necessary at all?

mild dirge
quartz wigeon
twilit tundra
mild dirge
# quartz wigeon can you clarify? I'm quite new to machine learning

So logistic regression predicts a line that separates the points in 2d space (with an orientation and position). If you flatten the points, i.e. take only the x coordinate, or the y coordinate, you can predict a point that separates points on either side of the point in 1d space. This is the same as predicting a horizontal/vertical line in 2d space that separates the points.

quartz wigeon
mild dirge
#

The important part here is that you only use the x-coordinate or y-coordinate, as this forces you to predict a horizontal line or vertical line.
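For a from-scratch ensemble, the usual name for this kind of base learner is a decision stump. Note this is a different technique from the logistic regression described above: it simply tries every axis-aligned threshold and keeps the one with the fewest mistakes. A minimal sketch, with made-up data and hypothetical function names:

```python
def fit_stump(points, labels):
    """Axis-aligned line classifier: pick the feature, threshold, and side with fewest errors.

    points: list of (x, y) tuples; labels: list of 0/1.
    Returns (feature_index, threshold, label_if_above).
    """
    best = None
    for feature in (0, 1):  # 0 -> vertical line (split on x), 1 -> horizontal line (split on y)
        values = sorted({p[feature] for p in points})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for above in (0, 1):
                errors = sum(
                    (above if p[feature] > t else 1 - above) != y
                    for p, y in zip(points, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, t, above)
    _, feature, t, above = best
    return feature, t, above

def predict_stump(stump, point):
    feature, t, above = stump
    return above if point[feature] > t else 1 - above

# Made-up data, separable by the vertical line x = 2.5.
points = [(1, 5), (2, 1), (3, 4), (4, 2)]
labels = [0, 0, 1, 1]
stump = fit_stump(points, labels)
print(stump)
```

For a boosting-style ensemble you would typically replace the plain error count with a weighted one, passing per-sample weights into `fit_stump`; that part is left out here to keep the sketch short.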

past meteor
#

Nowadays I'm more interested in making data / AI products and not just making models so I learnt those by myself. It's definitely not a requirement though

sleek harbor
past meteor
#

I read MDN's documentation first and then learnt (some of) Django and did a project without any JS

#

Django is one of the best documented projects so it's a good place to start

sleek harbor
#

I was thinking to just learn js (svelte or react) and do everything there

past meteor
#

Afterwards I progressively went towards JS, Typescript and so on

sleek harbor
#

look at me, making plans for the distant future when I don't even have a job.. :/

harsh bane
#

Hoi, i'm really close to getting stable diffusion to run on steam deck in ubuntu 22.04, any idea how to fix these last remaining conflicts/issues?

past meteor
#

In businesses notebooks and models (purely exploratory work) don't really mean too much unless you're a bonafide statistician. You need to be able to put it into production / work. Smaller companies don't have the budget to have a data engineer, data analyst, AI engineer, frontend dev, backend dev and a devops.

#

You can be a generalist and spread yourself a bit more thin, but do end-to-end work

#

Or you can be a specialist and pick out for instance NLP, Vision, Time series, ... or a business domain e.g., finance and do that really well.

somber prism
#

Hi anyone here familiar with fiftyone module? I’m getting a ServiceListenTimeout error , which is fiftyone is failing to bind to a port while importing the module

void sail
#

At my company we interviewed 15 for junior and 0 made it through

#

So if you become skilled it should be "easy" to get a job depending on location

void sail
sleek harbor
sleek harbor
#

I wonder if I would pass 🤔

void sail
#

If it helps you

soft badge
#

Guys, I am trying to compare 2 dataframes to verify which rows were changed, created and removed. Can anyone help?

north epoch
#

Can someone help me write data into excel faster? For 20K records, it is taking around 25 seconds with pandas xlsxwriter. Using pyexcelerate it is taking around 12 seconds. But pyexcelerate has limitations on the formatting (Accounting format)

iron folio
#

Hi anyone here who tried ibm watson ai to create a chatbot using python?

soft badge
soft badge
#

but the rows in the dfs are different

#

what doesn't change is the columns

twilit tundra
#

Yes that's what this function is for

#

Gives you the rows that are different

soft badge
#

ValueError: Can only compare identically-labeled (both index and columns) DataFrame objects

#
diff = df1.compare(df2, align_axis = 0)
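
That ValueError is what `compare` raises when the two frames don't share the same index and columns, which is the case once rows have been added or removed. One alternative, sketched under the assumption that every row has a unique key column (called `id` here, with a single made-up `price` column), is an outer merge with `indicator=True`:

```python
import pandas as pd

# Made-up data; assumes each row has a unique key column, here called "id".
old = pd.DataFrame({"id": [1, 2, 3], "price": [10, 20, 30]})
new = pd.DataFrame({"id": [2, 3, 4], "price": [20, 35, 40]})

merged = old.merge(new, on="id", how="outer", suffixes=("_old", "_new"), indicator=True)

# "_merge" records where each key was found: left_only, right_only, or both.
removed = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
created = merged.loc[merged["_merge"] == "right_only", "id"].tolist()
both = merged[merged["_merge"] == "both"]
changed = both.loc[both["price_old"] != both["price_new"], "id"].tolist()

print(removed, created, changed)
```

With more than one value column, the `changed` test would need to compare each `_old`/`_new` pair, and missing values need care because NaN never compares equal to NaN.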
visual tundra
#

Can someone please help me with this

I tried to import 'llama_index' in Jupyter and it shows the following error:

'If you use @root_validator with pre=False (the default) you MUST specify skip_on_failure=True. Note that @root_validator is deprecated and should be replaced with @model_validator.'

Apparently Pydantic V2 has made some changes and it is showing this error .

twilit tundra
neon jay
#

Hey guys, I'm doing a kaggle comp rn and I am using gradient boosted regressor model along with using iterative imputer to fill in missing values. My laptop apparently isnt performing this well and has been running for a few hours I think there's some problem with it. But if I gave someone the dataset and the code could you please run it for me? It would rlly help a lot with my chances of getting higher on the leaderboard

vernal dome
#

Excited to announce the initial release of VectorFlow, written in Python! VectorFlow is an open-source, high volume vector embedding pipeline.

Our pipeline is built to embed large volumes of data quickly and reliably. While embedding a handful of documents for Q&A is straightforward, the real challenge arises when ingesting gigabytes of unstructured data to leverage the full power of LLMs on top of your data.

With just a simple API request, you can effortlessly embed raw data and store the vectors in your vector database, eliminating the need for intricate cloud infrastructure setups.

🔗 Check out our Github repo and give us a star: https://lnkd.in/en6FhfN9

For all the innovators working with vector databases, we're eager to hear your insights, feedback, and ideas for the roadmap.

Demo can be viewed here: https://www.youtube.com/watch?v=aQOlOT14DaA

And our website is here, sign up for a free consultation: https://www.getvectorflow.com/

VectorFlow is an open source, high throughput, fault tolerant vector embedding pipeline. With a simple API request, you can send raw data that will be embedded and stored in any vector database or returned back to you.

granite atlas
#

@iron basalt good evening mate

#

I had a question about the nneu source material you suggested to me prior

#

When we multiply the "slopes" by the error, we are reducing the error of high confidence predictions

#

What does this statement mean

sleek harbor
neon jay
#

What I didn’t even know about that

#

Thanks

sleek harbor
iron basalt
#

The following sentences are what they are getting at.

#

This post is mostly for the code, if you want an understanding of the mathematics there are other places to look.

placid cedar
#

hey guys, i need some help clarifying a concept. is anyone available to help me at the moment? 🙂

#

i have 3 variables, date, store, sales amount. i put x axis as date, y axis as sales amount, and store as legends.

is this univariate, bivariate, multivariate, or neither of them?

void sail
#

Multivariate if you use all three store timeseries for a singular goal

#

E.g. a timeseries with multiple features (2+) for each timestamp

midnight pagoda
#

Hello, does anyone know why tensorflow is stuck on import? I've been waiting and nothing happens on the console (ignore the typo)

#

here's a quick spec of my system:

OS: Linux Mint 20.1 x86_64
Host: 80MH Lenovo ideapad 100-14IBY
Kernel: 5.4.0-58-generic
CPU: Intel Celeron N2840 (2) @ 2.582GHz
GPU: Intel Atom Processor Z36xxx/Z37xxx Series Graphics & Di
Memory: 1227MiB / 1869MiB

I'm sure the processor/RAM is not a problem when just importing the module (right?)

sonic knoll
#

Hello guys

#

Could someone recommend me a book to learn Python through projects?

ashen axle
dawn patrol
#

hello

#

I'm trying to convert a .py file to .exe

#

The .py file runs fine, but when I run the .exe it does not run.

#

help me

#

Thank youuuuu

midnight pagoda
dawn patrol
#

the .py file contains only print('hello')

#

hiccc

ashen axle
ashen axle
dawn patrol
#

I use PyInstaller to convert the .py file to .exe

#

When I run the .exe, it does not run.

#

But when I run the .py file, it displays the word "hello"

#

I also used the "Auto py to exe" tool to convert, but the result is similar

midnight pagoda
# ashen axle is it still hanging? whats the last line of the output?

this

# code object from '/home/alfarizi/Documents/machine_learning_flask/venv/lib/python3.8/site-packages/tensorflow/lite/experimental/microfrontend/ops/__pycache__/gen_audio_microfrontend_op.cpython-38.pyc'
import 'tensorflow.lite.experimental.microfrontend.ops.gen_audio_microfrontend_op' # <_frozen_importlib_external.SourceFileLoader object at 0x7fa0632c0a30>
timber nexus
desert oar
desert oar
desert oar
desert oar
slim bone
slim bone
past meteor
#

It depends on what statistics.

#

Machine learning is technically a subset of statistics that was pioneered by computer scientists

desert oar
# slim bone Don't worry, I couldn't skip them even if I wanted to 🙂 As a matter of fact, I...

good. you basically need all of this:

  • calculus (pre-req for probability mostly), ideally also multivariate
  • linear algebra
  • probability
  • statistics

that's a lot of new material to learn and intuition/understanding to develop. if you think you understand the basics after a couple of lectures, you didn't spend long enough time pondering them. take the advanced classes, but don't lose sight of the fact that it all builds sequentially, and you can't really apply any of the advanced material without really understanding the fundamentals.

tough radish
#

We are not a job recruitment board. Please do not post job ads in the future.

past meteor
#

From that perspective it makes no sense to limit yourself to just "machine learning", techniques from "traditional" statistics are valuable

desert oar
past meteor
#

Take as many of those classes as possible because they'll cover techniques you may (or may not) use in the future

slim bone
past meteor
#

Intro to stats typically teaches probability theory (make no mistake, this is NOT part of statistics, it's a prereq), descriptive stats and inferential stats

#

People hate probability theory and get turned off statistics as a whole

desert oar
#

it's unfortunate because probability theory is more useful than traditional statistics in some fields, e.g. reasoning about rare events and uncertain outcomes even when you don't need to "fit a model"

slim bone
past meteor
#

Descriptive statistics (summarizing data) is very important for machine learning as it relates closely to experimental data analysis

desert oar
slim bone
desert oar
#

heh, if there's ever a class that i don't think has been even remotely useful for me, it's real analysis

past meteor
#

Inferential statistics is at the heart of machine learning. I see too many practitioners (even people at work!) focusing on getting a very low MSE / high accuracy when the point is actually getting unbiased estimates of performance

#

The whole notion of unbiased estimates is very rarely covered in ML classes (we only briefly spoke about it in my entire AI masters) but it's a big part of statistics

desert oar
#

(however if you get into numerical computing then yes real analysis i believe becomes very important)

slim bone
slim bone
desert oar
#

this is also why people tend to need a masters degree to even get into this field. you're usually packing a ton of things into your 4 years at school (as you should!) and you need a year or two to reset and focus a little more heavily on a smaller set of core ideas + spend more dedicated time on a thesis or capstone project

slim bone
desert oar
past meteor
#

Doing Kaggle implicitly helps explain this discrepancy

#

Cause in Kaggle you actually have 2 jobs:

  1. getting a good model
  2. Finding a way to robustly evaluate your models
#

If you don't succeed at both you're bust

slim bone
past meteor
#

School teaches you point 1. Unless you go to production and your model fails you won't learn point 2 either at work

desert oar
#

school ought to teach 2 as well

#

some curricula cover it. traditional stats does to some extent

past meteor
#

They ought to, but they don't cover it well enough

#

Explaining what a ROC curve and cross validation are isn't enough

desert oar
#

CV at least is a valid and useful technique

#

i remember we learned about cv, bootstrap, leave-one-out, AIC, etc

past meteor
#

It is, but if you do like the people at work and you CV endlessly

#

It defeats the purpose of CV

desert oar
#

it wasn't a particularly well-informed introduction and it didn't instill deep understanding, but at least i'd seen it before

slim bone
#

Admittedly I don't look at school as a practical tool for the job market
I don't mind learning "useless things". To me school is just a foundation to obtain the ability to learn whatever's necessary

desert oar
#

btw there's some pushback now against "unbiased" estimation in statistics as well. the machine learning concept of bias-variance has a lot of overlap with the use of priors in bayesian statistics.

past meteor
#

People learn CV as a tool and idt the reasons behind the tool (and how you can still abuse it) are covered adequately

desert oar
desert oar
#

i've been lucky to work with very few knuckleheads and mostly people who are very conscientious about their work

slim bone
past meteor
#

Well, someone at work is working with a medical dataset. Not a lot of subjects. They're making a model. They use their entire dataset as validation

#

Because test train splitting with a small dataset isn't great either

desert oar
#

ah. do they not know about bootstrapping and cv?

past meteor
#

But they've iterated so much on their dataset that they're overfitting implicitly now

#

They're using cross validation

#

Cross validation does not save you here

#

After 1000 rounds of CV you're essentially making new features to raise the validation score

#

Each evaluation on your test / validation set increases the bias on your score
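
A rough simulation of that effect, as a sketch rather than a faithful copy of the situation at work. It uses a single small validation set instead of full CV, pure-noise features and labels (so nothing is learnable), and a made-up "model" that is just a threshold on one feature. Every name and number here is an assumption chosen for illustration. The point is only that keeping whatever raises the validation score, 1000 times over, produces a score the selected rule cannot reproduce on fresh data.

```python
import random


def make_noise(n_rows, n_features, rng):
    """Random features and random labels: nothing here is learnable."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    return X, y


def accuracy(rule, X, y):
    """Accuracy of a one-feature threshold rule given as (feature index, threshold)."""
    j, t = rule
    hits = sum((row[j] > t) == bool(label) for row, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
N_FEATURES = 20
X_val, y_val = make_noise(40, N_FEATURES, rng)       # small validation set
X_new, y_new = make_noise(10_000, N_FEATURES, rng)   # fresh out-of-sample data

# "Iterate" 1000 times: try a new rule, keep it only if the validation score improves.
best_rule, best_val = None, 0.0
for _ in range(1000):
    rule = (rng.randrange(N_FEATURES), rng.random())
    score = accuracy(rule, X_val, y_val)
    if score > best_val:
        best_rule, best_val = rule, score

fresh = accuracy(best_rule, X_new, y_new)
print(f"validation accuracy of the selected rule: {best_val:.2f}")
print(f"accuracy on fresh data:                   {fresh:.2f}")
```

Since the labels are coin flips, any gap between the two printed numbers is selection bias from reusing the same validation data.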

desert oar
#

oh so they'll do CV, make a change, do CV again, etc?

past meteor
#

yup

#

Imo you can do that, but not too much

desert oar
#

yeah that's always a tough one. in theory you're not supposed to do it at all, but how else are you supposed to iterate?

#

i've definitely fudged it with problematic datasets where we did things like CV simultaneously for hyperparameter selection and performance eval 😆 but we 1) knew we were overestimating performance and undersold our results to the business, 2) knew we would be able to get new out-of-sample data soon that we could use to evaluate the model properly, and 3) had good business reasons to believe that our data was "representative enough" (part of it was synthetically constructed anyway)

past meteor
#

You can't iterate without doing it, but doing it too much means you're overfitting so the answer is doing it "a little"

desert oar
#

yep. that seems like something you could maybe study with an information theoretic approach (how much is too much) but i haven't seen any papers on it

past meteor
#

There are but they're tedious haha

desert oar
#

i'd be curious what the literature says on it

past meteor
#

This problem has a name, iirc it's "adaptive overfitting"

past meteor
#

If I can sell it to myself that it's OK it's OK

desert oar
#

hah yep

past meteor
#

The problem is not knowing and having crazy inflated scores as a result

desert oar
#

but that's why we need all this foundational knowledge: can you sell it to yourself in a way that's legit?

#

like you're saying, you have to know what's wrong with doing Bad Thing in order to ever coherently justify doing Bad Thing

#

also i didn't know the term "adaptive overfitting", i've heard about it before in cases like everyone training on the same reference dataset but not with a nice name

past meteor
#

I don't think you even need to justify it? If you know it's bad and you can attach a "performance may be inflated" disclaimer you're fine

#

Why? Let's say you cut some corners and the performance is 3 % higher than the baseline. I'm picking the baseline

#

If it's 30 % and the corners that I cut aren't too severe, sure I'm still picking my approach

desert oar
#

fair

#

ideally you can get a numerical estimate though

#

that's not always easy. simulation studies can be hard to design

past meteor
#

And to do that we'd have to look at our cousins from statistics

desert oar
past meteor
#

This is multiple testing

#

It's exactly the same problem as multiple testing. 100 %.
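
To put rough numbers on the multiple testing analogy, here is a small sketch. It assumes the tests are independent, which repeated evaluation rounds on the same dataset are not, so treat it as intuition only. The function names are made up; the formulas are the standard family-wise error rate and the Bonferroni correction.

```python
def family_wise_error_rate(alpha, n_tests):
    """P(at least one false positive) across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


for m in (1, 20, 100):
    fwer = family_wise_error_rate(0.05, m)
    print(f"{m:>3} tests: FWER = {fwer:.3f}, Bonferroni per-test alpha = {bonferroni_threshold(0.05, m):.5f}")
```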

desert oar
#

fwiw i think multiple comparisons correction is controversial even in stats

#

you need a pretty well-formed decision criterion to do any kind of "testing" properly

past meteor
#

The thing is, at least they know it's a problem

#

If I were serious about tackling it I know stats has been grappling with this for ages and I know what to read

granite atlas
desert oar
sleek harbor
#

yesterday my plot was plotting.. I don't remember touching it.. today it's not plotting anymore.. :?

serene scaffold
lapis sequoia
#

can anyone provide me a good website focusing on ai ml dl data etc. stuff?

twilit tundra
lapis sequoia
twilit tundra
#

Kaggle has a pretty good introduction so definitely yeah

sleek harbor
sleek harbor
twilit tundra
#

I haven't really compared beginners tutorials though, feel free to add your own recommendations

sleek harbor
odd meteor
# lapis sequoia can anyone provide me a good website focusing on ai ml dl data etc. stuff?
AI Planet (formerly DPhi)

Learn Data Science for free through application oriented courses. Utilize our expert-curated resources as per your interest and pace.

sleek harbor
lapis sequoia
#

can anyone help me with installing keras? I have installed python 3.12 and when I install tensorflow, it gives me these errors,

odd meteor
lapis sequoia
#

its alr im using pytorch now

civic elm
#

Is it confirmed that Keras will support pytorch this year?

serene scaffold
lapis sequoia
lapis sequoia
desert oar
civic elm
#

Great because I only need to learn one framework, hopefully

odd meteor
# civic elm Great because I only need to learn one framework, hopefully

Hehehe this was also what I said before; that I'm gonna learn Tensorflow. You certainly need to start with your most favourite framework but it'll be nice to be framework agnostic. Started with Tensorflow, but then I recently moved into ML Research, and here I am still learning PyTorch. It's nice to know at least two in my opinion. Tensorflow / PyTorch / JAX

odd meteor
fresh harbor
#

Is there a ((detailed)) pretrained model for audio samples used in music? AudioSet isn't that detailed and its the highest benchmark currently available for audio classifiers

odd meteor
#

HuggingFace usually has one or two gems for almost everything ML. Maybe try checking there.

narrow flare
#

Can someone tell me if I need Cuda 11 for tensorflow to work with my GPU? I currently have CUDA 12 and tensorflow is not detecting my GPU

lapis sequoia
unique flame
#

You asked for sites that would teach AI

unique flame
narrow flare
#

I wanna stick to tensorflow for now. Seems like this is really annoying for a lot of people lol

#

There's some docker thing that makes it easy apparently so im gonna look into that ig

steady basalt
#

Landed my next Data scientist job! its been such a long journey I feel like sam on mount doom

steady basalt
#

🥳 ty

#

had a really weird interview question though about prior likelihood vs probability, I think they got the wording mixed up

#

gotta say the job markets so bad at the moment, was a real grind

dense crane
#

i applied
```py
model = nn.DataParallel(model)
```

#

i am using kaggle gpu t4 x2

past meteor
#

Also my hot take is that once you know one framework you can be productive by googling "How do I do X in Pytorch / Tensorflow"

unique ether
#

Hello everyone!

#

I see this chat is a lot less populated than Python General which I have been frequenting recently

#

Quantity is no indicator of quality though!

#

Does anyone here have any insight into the best paid positions within the AI and ML field?

#

I've seen two commonly recurring job titles are "AI engineer" and "ML engineer" and the internet seems quite divided on who has the higher salary

serene scaffold
#

I've met "artificial intelligence engineers" who just flat out do not write code.

#

which means they are not "engineers" in the programming sense.

#

so even if it turns out that people who have the title "ML engineer" on average make more than people who have the title "AI engineer", that doesn't really tell us anything.

fallow frost
#

whats the appropriate plot for displaying the min, max, and avg execution time of a function (the X-axis will display the amount of time VS the input size)

#

I was thinking of using three lines with the same color (but different shades) and filling the area between the min & max with a color

#

and I want to display the benchmark for 4 functions, so 12 data points in total
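
A minimal matplotlib sketch of that idea, under a few assumptions: input size goes on the X-axis and time on the Y-axis, only the average is drawn as a line, and the min/max range is a shaded band in the same color. The function names, timings and the `plot_benchmarks` helper are all made up for illustration.

```python
import matplotlib.pyplot as plt


def plot_benchmarks(sizes, results):
    """Plot avg time per function with a shaded min-max band.

    `results` maps a function name to a dict with 'min', 'avg' and 'max' lists,
    each holding one timing per entry in `sizes`.
    """
    fig, ax = plt.subplots()
    for name, stats in results.items():
        (line,) = ax.plot(sizes, stats["avg"], marker="o", label=f"{name} (avg)")
        # Shade the min-max range in the same color as the average line.
        ax.fill_between(sizes, stats["min"], stats["max"], color=line.get_color(), alpha=0.2)
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.set_xlabel("input size")
    ax.set_ylabel("execution time (s)")
    ax.legend()
    return fig, ax


sizes = [10, 100, 1_000, 10_000]
# Made-up timings purely for illustration.
results = {
    f"func_{k}": {
        "min": [0.8 * k * n * 1e-6 for n in sizes],
        "avg": [1.0 * k * n * 1e-6 for n in sizes],
        "max": [1.5 * k * n * 1e-6 for n in sizes],
    }
    for k in range(1, 5)
}

fig, ax = plot_benchmarks(sizes, results)
# Call plt.show() or fig.savefig("benchmark.png") to view the result.
```

With four functions the bands can overlap a lot, so a low `alpha` or one subplot per function may read better.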

fallow frost
#

I was taught that I shouldnt use bar plots for continuous data

iron basalt
desert oar
wooden venture
#

for a chatbot, say just for fun, normal conversation, which type of ML model should I be aiming for?

desert oar
desert oar
iron basalt
left tartan
desert oar
#

or maybe that was pymc3 which was based on theano? idk

fallow frost
iron basalt
#

Even the about in the github page for it says "Theano was a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently."

#

Keyword, "was."

left tartan
fallow frost
left tartan
iron basalt
#

Ah, it was continued as "Aesara," which was forked to "PyTensor."

desert oar
#

@unique ether but more broadly, the best-paid positions are high-value positions in industries and at companies that have the capital to pay a lot of money for that high-value work. that would typically be ML/AI/data engineering supporting advanced research teams and/or critical production systems, or being an advanced researcher yourself. for positions that are realistically obtainable for normal people, "data engineering" and "data science" are still the two primary tracks. pay depends on seniority/expertise, region, and choice of industry