#data-science-and-ml | Python | Page 114

long canopy Mar 30, 2024, 7:51 PM

#

i see. thanks for the input! btw how do you use mlflow and tensorboard separately? afaik they're both for metrics/logging on models right?

past meteor Mar 30, 2024, 7:51 PM

#

I use MLFlow to log summaries of runs and use tensorboard to investigate the details if a run looks interesting or strange

long canopy Mar 30, 2024, 7:51 PM

#

past meteor I use MLFlow to log summaries of runs and use tensorboard to investigate the det...

i see right, that makes sense. ty

final kiln Mar 30, 2024, 7:52 PM

#

And like, if I had to go back, the thing I'd push harder for would be to have less moving parts. So instead of using OpenSearch, MySQL and Redis, I'd do Postgres + Redis, or just Redis

long canopy Mar 30, 2024, 7:53 PM

#

final kiln And like, if I had to go back, the thing I'd push harder for would be to have le...

right ok, thanks this sort of comment is super helpful. will try full redis first and see how it goes

past meteor Mar 30, 2024, 7:53 PM

#

I also considered using dagster but I just built a basic svelte + fastAPI web app (<2 hrs work) to monitor some other things

#

Memory spikes seem to kill my pipelines and connecting to mlflow and/or tensorboard requires a company VPN, so I essentially did a workaround: I made simple API endpoints that my runners send requests to, confirming they're still alive and reporting the system resource state. That way I can monitor runs and intervene where necessary on my phone before going to bed (without needing the VPN) 🤠
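
A minimal sketch of the runner side of such a heartbeat, using only the standard library. The function names, the payload fields, and the URL passed to `send_heartbeat` are assumptions for illustration, not the actual setup described above. The standard library has no portable way to read system memory, so this sketch reports load average and free disk space instead; a library such as `psutil` could supply memory figures.

```python
import json
import os
import shutil
import time
import urllib.request


def build_heartbeat(run_id: str) -> dict:
    """Collect a small liveness + resource snapshot for one runner."""
    disk = shutil.disk_usage(os.getcwd())
    try:
        load_1m = os.getloadavg()[0]  # not available on every platform
    except (AttributeError, OSError):
        load_1m = None
    return {
        "run_id": run_id,
        "timestamp": time.time(),
        "load_1m": load_1m,
        "disk_free_gb": round(disk.free / 1e9, 2),
    }


def send_heartbeat(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the snapshot as JSON to a (hypothetical) endpoint; returns the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A training loop would call `send_heartbeat(url, build_heartbeat(run_id))` every few minutes; a run whose last heartbeat is too old can then be flagged as dead.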

long canopy Mar 30, 2024, 7:56 PM

#

past meteor Memory spikes seem to kill my pipelines and connecting to mlflow and/or tensorbo...

hm what about a prometheus+grafana+loki setup for this usecase?

final kiln Mar 30, 2024, 7:56 PM

#

past meteor Memory spikes seem to kill my pipelines and connecting to mlflow and/or tensorbo...

Wait. But aren't you creating a security hole by circumventing the VPN

past meteor Mar 30, 2024, 7:56 PM

#

long canopy hm what about a prometheus+grafana+loki setup for this usecase?

Well, the key part is (<2 hrs work)

long canopy Mar 30, 2024, 7:56 PM

#

past meteor Well, the key part is (<2 hrs work)

right heheheh

past meteor Mar 30, 2024, 7:57 PM

#

If you trust yourself you can implement small subsets of overengineered stacks faster than you can read their documentation

#

I do this ... a lot

past meteor Mar 30, 2024, 7:58 PM

#

final kiln Wait. But aren't you creating a security hole by circumventing the VPN

Not really. We have a, let's call it, API gateway that isn't behind the VPN for purposes like this

final kiln Mar 30, 2024, 7:59 PM

#

That makes sense

past meteor Mar 30, 2024, 7:59 PM

#

The GPU VM just doesn't have a public IP, which means that this is kind of the only way

#

When I'm on the VPN I run a reverse proxy anyway to have a local.ip/mlflow, local.ip/tensorboard, ...

final kiln Mar 30, 2024, 8:00 PM

#

Ngl this redis SQL thing sounds pretty good, I'm gonna try it first chance I get. One less service in my app? Don't mind if I do

final kiln Mar 30, 2024, 8:01 PM

#

past meteor When I'm on the VPN I run a reverse proxy anyway to have a local.ip/mlflow, loca...

I've been doing all my stuff either over a connection made by GitHub, which I assume is secure or via ssh and ssh tunneling, but the ips are sometimes very very public

#

Like http basic auth kinda public

#

I should probly setup a vpn

past meteor Mar 30, 2024, 8:03 PM

#

On my virtual private server it's the same idea. I use github actions for CI/CD and it's kind of ... yeah

final kiln Mar 30, 2024, 8:04 PM

#

final kiln Like http basic auth kinda public

(it's doing security through obscurity tho, since no one knows the IPs, they're stored in GH repo secrets)

past meteor Mar 30, 2024, 8:04 PM

#

I eventually need to:

Set up a VPN properly
Figure out how to make the GHA runner use it
Put my ssh port behind a firewall

long canopy Mar 30, 2024, 8:05 PM

#

final kiln I should probly setup a vpn

aren't you on AWS? why not pure VPC stuff

final kiln Mar 30, 2024, 8:05 PM

#

long canopy aren't you on AWS? why not pure VPC stuff

Yeah I could place them like, in the same subnet and have them address each other by their local IP, but it's not guaranteed cuz sometimes I'm doing stuff locally

long canopy Mar 30, 2024, 8:06 PM

#

yeah you can just use aws' api for VPC to manage the subnets and such

final kiln Mar 30, 2024, 8:07 PM

#

Right, but sometimes the machine is my laptop

past meteor Mar 30, 2024, 8:07 PM

#

I managed to draft up a plan to solve my GPU issue with IT btw

long canopy Mar 30, 2024, 8:07 PM

#

final kiln Right, but sometimes the machine is my laptop

oh

#

there's probably something to do here

#

like set your machine up within the VPC

#

i don't know the specifics but i need to learn them

#

let me know if you do an implementation

final kiln Mar 30, 2024, 8:08 PM

#

Uhm, never heard of it but I think something like that could be possible

past meteor Mar 30, 2024, 8:08 PM

#

I inventoried all VMs (a lot of work...) we're running on our 3 servers and convinced them to move all of them to node 1 and 2 so we can just have a bare metal Ubuntu install without Proxmox on one of the machines with the Quadro

final kiln Mar 30, 2024, 8:09 PM

#

Altho, it's a lot of work just to get my machine in the network, might as well setup a vpn or open the ports on the cloud to my IP only

long canopy Mar 30, 2024, 8:09 PM

#

past meteor I managed to draft up a plan to solve my GPU issue with IT btw

what was going on?

final kiln Mar 30, 2024, 8:09 PM

#

I've just not done that out of laziness so I don't think I'll go through the trouble of doing the vpc stuff

past meteor Mar 30, 2024, 8:09 PM

#

the tl;dr is that I kept begging for a bigger VM and I just managed to convince them to give me the entire server 👍

#

The entire box, not chopped up into bits using proxmox or whatever

final kiln Mar 30, 2024, 8:10 PM

#

Ah I never had to battle for resources like that

#

Tho I've experienced the lack thereof

long canopy Mar 30, 2024, 8:11 PM

#

past meteor the tl;dr is that I kept begging for a bigger VM and I just managed to convince ...

nice! time to load that 80 GB model into vram 😎

past meteor Mar 30, 2024, 8:11 PM

#

Pretty much

#

CPU and RAM were bottlenecking me

#

Couldn't use the 48GB VRAM card to its capacity

long canopy Mar 30, 2024, 8:11 PM

#

final kiln Ah I never had to battle for resources like that

my current project is to do distributed inference on weak CPU-only nodes

final kiln Mar 30, 2024, 8:12 PM

#

What is a weak node ?

long canopy Mar 30, 2024, 8:12 PM

#

past meteor CPU and RAM were bottlenecking me

the classic non-cuda conundrum

long canopy Mar 30, 2024, 8:13 PM

#

final kiln What is a weak node ?

smallest available VMs in whichever cloud provider you choose

final kiln Mar 30, 2024, 8:13 PM

#

Yeah sounds interesting

#

I've always wondered how gpt4 even does inference, model is so big and there's so many people using it

long canopy Mar 30, 2024, 8:14 PM

#

final kiln I've always wondered how gpt4 even does inference, model is so big and there's s...

tensor parallelism + pipeline parallelism

#

main subject i've been working on lol
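
The two techniques named above can be illustrated on a toy two-layer network. This is only a sketch of the ideas with NumPy arrays standing in for devices; the layer sizes and names are made up, and it says nothing about how any particular production system is deployed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # a batch of 4 activations
w1 = rng.normal(size=(8, 16))     # layer 1 weights
w2 = rng.normal(size=(16, 8))     # layer 2 weights

# Tensor parallelism: split one layer's weight matrix by columns, let each
# "device" compute its slice of the output, then concatenate the slices.
w1_shards = np.split(w1, 2, axis=1)   # two (8, 8) shards
h_parallel = np.concatenate([x @ shard for shard in w1_shards], axis=1)
h_reference = x @ w1


# Pipeline parallelism: place whole layers on different "devices" and pass
# activations from one stage to the next.
def stage_0(batch):
    return np.maximum(batch @ w1, 0.0)   # layer 1 + ReLU on device 0


def stage_1(hidden):
    return hidden @ w2                   # layer 2 on device 1


y_pipeline = stage_1(stage_0(x))
y_reference = np.maximum(x @ w1, 0.0) @ w2
```

In both cases the result matches the single-device computation; what changes is which device holds which weights and what has to be sent between devices.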

final kiln Mar 30, 2024, 8:15 PM

#

Ah I mostly focus on smaller scale, might bite me in the future idk

long canopy Mar 30, 2024, 8:15 PM

#

yeah I want to minimize cost of doing inference with unquantized models

final kiln Mar 30, 2024, 8:18 PM

#

I think this is why the industry seems (at least from what I've observed) to be ahead of academia: the industry is very resource aware and always looking to optimize, while academia is very smart folks working on a subject they like but not necessarily within the same kinds of constraints

#

But idk

long canopy Mar 30, 2024, 8:24 PM

#

final kiln I think this is why the industry seems (at least from what I've observed) to be ...

personally my entire project would be impossible without petals and PiPPy heheh

#

really need those academic types working on those

final kiln Mar 30, 2024, 8:30 PM

#

long canopy personally my entire project would be impossible without `petals` and `PiPPy` he...

the authors in petals are all connected to industry i think

#

wait

#

yeah I think so, except one

#

the first one is actually now working at open ai

long canopy Mar 30, 2024, 8:38 PM

#

huh! industry eh

#

didn't know

#

well, thanks to those lads in any case lol

final kiln Mar 30, 2024, 8:40 PM

#

yeah they're from a company that happens to have a lab in the university, I actually worked at a place like that, it works very much like a company, didn't see much difference except that I had to walk through a campus again

vocal sleet Mar 30, 2024, 8:43 PM

#

What python libraries can I use to make a simple AI chatbot to add to a discord bot I am making? I know the openai library exists but I want a few more recommendations.

final kiln Mar 30, 2024, 8:53 PM

#

vocal sleet What python libraries can I use to make a simple AI chatbot to add to a discord ...

Ollama is pretty good

vocal sleet Mar 30, 2024, 9:21 PM

#

final kiln Ollama is pretty good

What Ollama model is best for what I'm doing?

final kiln Mar 30, 2024, 9:32 PM

#

vocal sleet What Ollama model is best for what I'm doing?

Depends on your specs, the higher the parameter count the smarter it usually is

vocal sleet Mar 30, 2024, 9:37 PM

#

final kiln Depends on your specs, the higher the parameter count the smarter it usually is

do you know any good youtube tutorials to learn ollama?

final kiln Mar 30, 2024, 9:38 PM

#

vocal sleet do you know any good youtube tutorials to learn ollama?

No, the docs served me well, from what I recall it's a very similar API to OpenAI's

winter sluice Mar 30, 2024, 11:46 PM

#

Should I watch a video on memory/garbage collection for this? Typically we say 'no reusing variable names', but for sequential dataset calculations it feels totally wrong to use so much memory.

Lines like these happen all the time in my code:

```python
parsed_dataset = dataset_choice.parse_tfrecord(...
self.dataset = filtered_dataset.shuffle(...
```

#

ignore that it has an error haha

mild grotto Mar 31, 2024, 1:34 AM

#

So, I profiled my app and I see
{built-in method scipy.sparse._sparsetools.coo_matvec} is taking up basically all the processing time.

This is because I have this Gaussian blur filter
self.L1=adjacency.tocoo()
and then blur like this:

  def blur(self,data):
    return self.L1.dot(data.flatten()).reshape(data.shape)

Is there a more performant way to do this?

#

I thought about doing a larger blur filter (5 pixels instead of 3) so that I could do 2 blur operations in a single pass. However, this actually seems to be slower, presumably because it can't take advantage of memory locality.

ashen axle Mar 31, 2024, 4:34 AM

#

I am looking for a LOCAL data pipeline framework that encourages intermediate value inspection, preferably through visualisation, throughput validation, and error handling. What is the contemporary framework/approach?

I am familiar with scikit-learn's pipelines but as far as I am aware none of my requirements are built-in.

I've reached a point where I am writing one from scratch, which tells me I'm doing the wrong thing, so I'm curious what the field is using. Web search turns up the usual Medium articles, blogs and advertisements for distributed systems.

wooden sail Mar 31, 2024, 5:18 AM

#

mild grotto So, I profiled my app and I see `{built-in method scipy.sparse._sparsetools.coo_...

reshaping is slow. if you can think of a clever way to represent the operation, that's probably better

#

e.g. keeping the original shape and doing elementwise multiplication plus addition

mild grotto Mar 31, 2024, 5:20 AM

#

Yeah I mean, I can keep shape the same and use a function to index in

#

would that help?

#

I'll try it

dusk tide Mar 31, 2024, 7:36 AM

#

Hello, has anyone ever done the TensorFlow Professional Developer Certificate exam?

mild grotto Mar 31, 2024, 7:47 AM

#

wooden sail reshaping is slow. if you can think of a clever way to represent the operation, ...

Reshaping to a single row didn't provide a clear speedup... until I also changed my data format from COO to CSR. Then it was about 66% faster

wooden sail Mar 31, 2024, 7:56 AM

#

mild grotto Reshaping to a single row didn't provide a clear speedup... until I also changed...

how about with no reshaping and only * and + ?

mild grotto Mar 31, 2024, 7:58 AM

#

I allocate everything as a long array, and index using

Face*res*res + Y*res + X

#

Is that what you mean?

#

Now there is no reshaping.

wooden sail Mar 31, 2024, 8:00 AM

#

all right, though that kinda looks like a quadratic form now

#

what shape is this face variable?

#

and Y and res, i guess

#

originally and now as vectors

iron basalt Mar 31, 2024, 8:08 AM

#

mild grotto So, I profiled my app and I see `{built-in method scipy.sparse._sparsetools.coo_...

Consider ravel over flatten.
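
A small sketch combining the two suggestions from this thread: convert the operator to CSR once, and use `ravel` instead of `flatten`. The tridiagonal matrix here is a hypothetical stand-in for the real `adjacency` matrix, and `res = 64` is an arbitrary size; only the storage format and the `ravel` call are the point.

```python
import numpy as np
from scipy import sparse

res = 64
n = res * res
# Hypothetical stand-in for the blur operator: each pixel averaged with its
# two neighbours in the flattened array. It also links the last pixel of one
# row to the first pixel of the next, which a real blur kernel would avoid.
blur_coo = sparse.diags([1 / 3, 1 / 3, 1 / 3], [-1, 0, 1], shape=(n, n), format="coo")
blur_csr = blur_coo.tocsr()   # convert once, reuse for every blur call

data = np.random.default_rng(0).random((res, res))


def blur(matrix, image):
    # ravel() returns a view when it can, while flatten() always copies
    return matrix.dot(image.ravel()).reshape(image.shape)


out_coo = blur(blur_coo, data)
out_csr = blur(blur_csr, data)
```

Both formats give the same result, so the switch can be checked for correctness before timing it on the real matrix.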

orchid forge Mar 31, 2024, 9:48 AM

#

guys i need help understanding something

unique ivy Mar 31, 2024, 10:05 AM

#

Pandas

orchid forge Mar 31, 2024, 10:17 AM

#

yup

past meteor Mar 31, 2024, 11:41 AM

#

ashen axle I am looking for a LOCAL data pipeline framework that encourages intermediate va...

Sadly the word "pipeline" means 5 different things in data

#

I think what you want is an orchestration tool. In that case you either want airflow or dagster. Airflow is the option with the most traction but dagster is comparatively simple

#

scikit-learn's pipelines are something totally different, that's just encapsulating an ML model with its preprocessing (which is something you should definitely do)

versed flame Mar 31, 2024, 12:37 PM

#

Hi! I'll post this question here on recommendation:
I recently said something in front of one of my devices, and now I get recommendations for 'trading bots' etc. on YouTube.
While I doubt it's easy to get rich, I've traded with paper accounts before, which was fun, and a bot seems like a fun project.

How 'real' are these, and also what is a good way or direction to start learning when going for this?

I assume I need to use machine learning to some capacity.

shut yoke Mar 31, 2024, 12:40 PM

#

versed flame Hi! Ill post this question here on recommendation: I've recently said something ...

You just trynna get rich easily

versed flame Mar 31, 2024, 12:41 PM

#

Well, "yes" but also no. If it was easy I realize it wouldn't work.

shut yoke Mar 31, 2024, 12:41 PM

#

It's not easy to make the bot yourself

versed flame Mar 31, 2024, 12:41 PM

#

Based on comments on ALL the videos, trading gurus seem rather overrated.

shut yoke Mar 31, 2024, 12:42 PM

#

And it doesn't guarantee you profit because after all it's a bot. Not any better than a human being

versed flame Mar 31, 2024, 12:44 PM

#

Let's rephrase it then: what is a good way to get into machine learning, and what other kind of project could I do? I learn a lot better when doing something rather than following directions (hence why I don't want to watch the YouTube videos and just copy)

abstract rune Mar 31, 2024, 12:46 PM

#

Isn't matrix multiplication also of order n^3 ?

#

how is gradient descent a better choice than the closed-form solution (X^T X)^-1 X^T y ?

faint galleon Mar 31, 2024, 1:04 PM

#

abstract rune Isn't matrix multiplication also of order n^3 ?

In India we have diff method which is lot more easier than this

abstract rune Mar 31, 2024, 1:04 PM

#

faint galleon In India we have diff method which is lot more easier than this

?

faint galleon Mar 31, 2024, 1:05 PM

#

abstract rune ?

3 × 3

#

Is that about element or variable??

final kiln Mar 31, 2024, 1:07 PM

#

abstract rune how does gradient descent makes a better choice than the close form solution of ...

Gradient descent works on arbitrary C1 differentiable functions

abstract rune Mar 31, 2024, 1:08 PM

#

i have no idea what you are talking about @faint galleon

faint galleon Mar 31, 2024, 1:08 PM

#

abstract rune i have no idea what you are talking about @faint galleon

Leave it leave it

abstract rune Mar 31, 2024, 1:09 PM

#

final kiln Gradient descent works on arbitrary C1 differentiable functions

but still we gotta calculate XTX
the prof said that we can use "Stochastic gradient descent"

#

which reduces the size for X, so it makes the computation simpler

final kiln Mar 31, 2024, 1:10 PM

#

abstract rune but still we gotta calculate XTX the proff told that we can use "Stochastic grad...

My point is that in a simple example the analytic solution might be more efficient. But in the real world you'll most often not have one when it comes to deep learning
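
For the linear regression case discussed here, the two approaches can be put side by side. This is a sketch on synthetic data; the sizes, learning rate, and batch size are arbitrary choices for illustration. Forming X^T X costs on the order of n·d² operations and solving the system on the order of d³, while one mini-batch step only touches `batch_size` rows of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 2000, 5
X = rng.normal(size=(n_samples, n_features))
true_w = np.arange(1.0, n_features + 1.0)
y = X @ true_w + 0.01 * rng.normal(size=n_samples)

# Closed form: solve (X^T X) w = X^T y rather than inverting X^T X explicitly.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Mini-batch SGD on the mean squared error: each step uses only a few rows.
w_sgd = np.zeros(n_features)
learning_rate, batch_size = 0.05, 32
for step in range(3000):
    idx = rng.integers(0, n_samples, size=batch_size)
    residual = X[idx] @ w_sgd - y[idx]
    grad = 2.0 * X[idx].T @ residual / batch_size
    w_sgd -= learning_rate * grad
```

On a problem this small the closed form is the obvious choice; the gradient-based loop is the one that still applies when the model is not linear and no closed form exists.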

abstract rune Mar 31, 2024, 1:10 PM

#

final kiln My point is that in a simple example the analytic solution might be more efficie...

hmm ok.

split drift Mar 31, 2024, 1:22 PM

#

Hey,
I've written a long script that processes data.
I think that it would be good to break it into modular parts, to improve maintainability and readability.
Can someone send me a guide, or a repo that can serve as an example of how to do it correctly?

mild grotto Mar 31, 2024, 2:36 PM

#

wooden sail what shape is this face variable?

(6,res,res)

#

it's the surface of a cube

#

so face is just [0,5], y is [0,res] and x is [0,res]
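
The flat indexing scheme described in this thread is the row-major offset into a `(6, res, res)` array, which can be checked against NumPy directly. A small sketch, assuming y and x run from 0 to res - 1 and using an arbitrary `res = 4`:

```python
import numpy as np

res = 4
cube = np.arange(6 * res * res).reshape(6, res, res)  # (face, y, x)
flat = cube.ravel()                                   # a view, no copy here


def flat_index(face, y, x, res):
    # Row-major offset into the flattened (6, res, res) array
    return face * res * res + y * res + x


face, y, x = 5, 2, 3
idx = flat_index(face, y, x, res)
```

`np.ravel_multi_index((face, y, x), (6, res, res))` computes the same offset and also accepts arrays of indices.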

ashen axle Mar 31, 2024, 3:18 PM

#

past meteor Sadly the word "pipeline" means 5 different things in data

Yes, you're spot on there. This is a 1 man locally run scientific project running batch data in MB size, signal processing. I'm simply spending too much time chasing errors caused during development. All I'm looking for is error handling and intermediate step data viz. I feel like airflow or otherwise is overkill? I'm not familiar with it.
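
For a project of that size, the two requirements named here (error handling and intermediate values that can be inspected) fit in a few lines of plain Python. This is a sketch, not a recommendation of a specific framework; `run_pipeline`, `StepError`, and the example steps are hypothetical names.

```python
from typing import Any, Callable


class StepError(RuntimeError):
    """Raised when a step fails; the message names the step and its input."""


def run_pipeline(data: Any, steps: list[tuple[str, Callable[[Any], Any]]]) -> dict[str, Any]:
    """Run steps in order and keep every intermediate value by step name."""
    intermediates = {"input": data}
    current = data
    for name, func in steps:
        try:
            current = func(current)
        except Exception as exc:
            raise StepError(f"step {name!r} failed on input {current!r}") from exc
        intermediates[name] = current
    return intermediates


# Hypothetical signal-processing steps, purely for illustration
steps = [
    ("detrend", lambda xs: [x - sum(xs) / len(xs) for x in xs]),
    ("rectify", lambda xs: [abs(x) for x in xs]),
    ("peak", max),
]
results = run_pipeline([1.0, 2.0, 6.0], steps)
```

`results` holds the output of every step under its name, so any of them can be plotted or asserted on afterwards, and a failure reports which step broke and what it was given.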

rocky ridge Mar 31, 2024, 3:37 PM

#

https://discord.com/channels/267624335836053506/1224019305947988119

#

Please help me data scientists

twin reef Mar 31, 2024, 3:37 PM

#

Hi guys I have made a model for a car that drives on a certain track and the point of the project is to get the best model possible for a track and you race against the car and at the end it shows where you could have performed better, analysing the car's and your movement

#

And since I have only made the basic model and the pygame simulation I am wondering if this is too hard

#

Since I have around 20 days to do it

rocky ridge Mar 31, 2024, 3:43 PM

#

twin reef Hi guys I have made a model for a car that drives on a certain track and the poi...

can you help me?

twin reef Mar 31, 2024, 3:43 PM

#

No

rocky ridge Mar 31, 2024, 3:44 PM

#

https://tenor.com/view/ricky-berwick-toy-gun-shoot-nerf-gun-playing-gif-13811086

feral blade Mar 31, 2024, 5:24 PM

#

hii, im using torchreid library for my custom data... The documentation says it automatically logs the learning curves and i just need to install tensorboard to visualize it... but the visualizations come out to be like this which is very weird imo.... is there anything i could do to maybe extract loss/rank1/map stuff from training myself and plot them, or any way to reconfigure plot?
link to the doc: https://kaiyangzhou.github.io/deep-person-reid/user_guide#visualize-learning-curves-with-tensorboard

graceful ledge Mar 31, 2024, 7:15 PM

#

has anyone ever analyzed their junk mailbox using python?

#

I just nuked 11k unread emails and am interested in sender distributions, etc. Wondering how I can get this from a folder in an email inbox

final kiln Mar 31, 2024, 7:34 PM

#

a while back i wanted a bot scraping my emails and wasn't able to do it for gmail

#

only way was actual web scraping

past meteor Mar 31, 2024, 7:55 PM

#

graceful ledge I just nuked 11k unread emails and am interested into sender distributions, etc....

Due to GDPR being a thing you can easily get an export of all your emails
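
Assuming the export arrives as an mbox file (the format and the file name below are assumptions, since the thread does not say which provider is involved), the sender distribution can be computed with the standard library alone:

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def sender_counts(mbox_path: str) -> Counter:
    """Count messages per sender address in an mbox file."""
    counts = Counter()
    for message in mailbox.mbox(mbox_path):
        # parseaddr splits "Name <addr@host>" into (name, address)
        _, address = parseaddr(str(message.get("From", "")))
        counts[address.lower() or "(unknown)"] += 1
    return counts
```

For example, `sender_counts("junk.mbox").most_common(10)` would list the ten most frequent senders.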

final kiln Mar 31, 2024, 8:06 PM

#

past meteor Due to GDPR being a thing you can easily get an export of all your emails

Do you know any way of doing it in real time?

lapis sequoia Mar 31, 2024, 9:28 PM

#

hi i wanted to ask where should i start to learn python for AI since I'm interested in how machines can learn something (especially how they learn from their mistakes). should i buy specific books, or where should i start?

odd meteor Mar 31, 2024, 10:00 PM

#

split drift Hey, I've written a long script that process data. I think that it would be good...

I'd say, just devise a structure that works best for you. For example, I use the so-called "3-pipeline design" to decompose my ML code into manageable components.

Feature Pipeline: A script that transforms raw data into model features, then pushes them to a feature store so the rest of the system can use them (I use Feast for most projects)
Training Pipeline: A script that ingests features from the feature store, trains the model, and pushes the artifacts to the model registry
Inference Pipeline: A script that fetches the latest batch of features and generates predictions using the model that's already been pushed to the model registry.

You can work out something like this where you decompose your long script into small and manageable bits.
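
The three-part split described above can be sketched with plain functions. The dictionaries below are in-memory stand-ins for a real feature store and model registry, and the column names and the one-variable least-squares model are invented for illustration.

```python
from statistics import mean

feature_store: dict[str, list[dict]] = {}   # stand-in for a real feature store
model_registry: dict[str, dict] = {}        # stand-in for a real model registry


def feature_pipeline(raw_rows: list[dict]) -> None:
    """Turn raw rows into model features and publish them."""
    feature_store["latest"] = [
        {"x": row["size"] / 100.0, "y": row["price"]} for row in raw_rows
    ]


def training_pipeline() -> None:
    """Fit a one-variable least-squares line and register its parameters."""
    rows = feature_store["latest"]
    x_mean = mean(r["x"] for r in rows)
    y_mean = mean(r["y"] for r in rows)
    slope = sum((r["x"] - x_mean) * (r["y"] - y_mean) for r in rows) / sum(
        (r["x"] - x_mean) ** 2 for r in rows
    )
    model_registry["price_model"] = {"slope": slope, "intercept": y_mean - slope * x_mean}


def inference_pipeline() -> list[float]:
    """Score the latest feature batch with the registered model."""
    model = model_registry["price_model"]
    return [model["slope"] * r["x"] + model["intercept"] for r in feature_store["latest"]]


feature_pipeline([
    {"size": 50, "price": 100.0},
    {"size": 100, "price": 200.0},
    {"size": 150, "price": 300.0},
])
training_pipeline()
predictions = inference_pipeline()
```

Each function only talks to the store or the registry, never to another function directly, which is what lets the three parts live in separate modules and run on separate schedules.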

More so, if you fancy Poetry, you can as well use it to keep your work well-structured.

odd meteor Mar 31, 2024, 10:06 PM

#

lapis sequoia hi i wanted to ask where should i start to learn python for AI since I'm interes...

Start from https://kaggle.com/learn
Check the pinned post by Zestar. You'll see some book resources he recommended.

If you're interested in making a financial commitment, you can try Udemy, Coursera or Udacity.

hexed dawn Mar 31, 2024, 10:17 PM

#

hi! i'm getting conflicting info, would you say TF-IDF as a vectorizer is for feature extraction or feature selection?

tranquil mist Mar 31, 2024, 10:56 PM

#

Hey guys, I was wondering if there’s any VERY in depth resources for pandas, preferably with real world (read non ideal) input. I keep hitting a wall where the documentation isn’t very helpful in terms of performance and most YouTube videos / SO questions are very superficial and not geared towards very large datasets.

#

I’d say I’m beginner to intermediate level, meaning I can get anything done with decent but not optimal performance.

long canopy Apr 1, 2024, 1:23 AM

#

are there python alternatives to kafka?

quaint loom Apr 1, 2024, 4:55 AM

#

I've been tackling how to predict Macrophytes biomass using data from different locations and environmental factors. Initially, I tried using 'Wet biomass', 'Wet weight', and 'Dry weight' to guess 'Dry biomass', but that just made my model too clingy (overfit). So, I switched gears and decided to first make predictions on those auxiliary bits - 'Wet biomass', 'Wet weight', and 'Dry weight'. Then, I'd use these predictions as inputs to predict 'Dry biomass' more accurately, hoping this roundabout way would trick the model into not overfitting.

After merging and cleaning up the data, I split it up, made sure there weren't any gaps in my target variables, and trained separate models for each auxiliary target. These models' predictions were then used as extra features to help predict 'Dry biomass' with a RandomForestRegressor.

But here's where it got tricky: I ran into a snag with mismatched sample sizes, flagged by an error pointing out I had [294, 368] samples at different stages. I believe I may be off track, so any input would certainly be valuable.
https://paste.pythondiscord.com/M6EQ

odd meteor Apr 1, 2024, 6:09 AM

#

hexed dawn hi! i'm getting conflicting info, would you say TF-IDF as a vectorizer is for fe...

Feature extraction
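
A tiny hand-rolled example shows why: TF-IDF builds a new numeric column for every vocabulary term, and no column is dropped, even when its weight is zero everywhere. This sketch uses the plain textbook weighting tf × log(N / df) on made-up documents, not the smoothed variant that libraries such as scikit-learn use by default.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran away"]
tokenized = [doc.split() for doc in docs]
vocabulary = sorted({token for tokens in tokenized for token in tokens})

# Document frequency: in how many documents each term appears
doc_freq = Counter(token for tokens in tokenized for token in set(tokens))


def tfidf_vector(tokens):
    # One weight per vocabulary term: term frequency times inverse document frequency
    counts = Counter(tokens)
    return [
        (counts[term] / len(tokens)) * math.log(len(docs) / doc_freq[term])
        for term in vocabulary
    ]


features = [tfidf_vector(tokens) for tokens in tokenized]
```

Here "the" appears in every document and gets weight 0 in all rows, yet its column is still present; removing such columns would be a separate feature selection step.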

odd meteor Apr 1, 2024, 6:35 AM

#

long canopy are there python alternatives to kafka?

Celery, Apache Pulsar, Flink, Faust...

Kafka + Bytewax is the setup here.

odd meteor Apr 1, 2024, 6:46 AM

#

tranquil mist Hey guys, I was wondering if there’s any VERY in depth resources for pandas, pre...

I think nothing gets more in depth than Pandas documentation itself.

When you say the documentation isn't very helpful in terms of performance, can you elaborate?

Perhaps it's the size of your dataset. When it comes to handling very big datasets, most people switch to PySpark, Polars, Dask, cuDF, etc.

karmic void Apr 1, 2024, 7:02 AM

#

Hello guys, I am quite experienced in python and wanna enroll myself in DATA SCIENCE. I don't have much idea of what type of projects are created in this field. I have learnt about numpy, pandas, matplotlib and seaborn. Is there any idea about what should I do more and how?

final kiln Apr 1, 2024, 8:04 AM

#

odd meteor Celery, Apache Pulsar, Flink, Faust... Kafka + Bytewax is the setup here.

Celery can be very buggy at times. Gonna try those other alternatives

raw mortar Apr 1, 2024, 8:13 AM

#

plus1 for dask, it's a distributed task scheduler itself

potent sky Apr 1, 2024, 8:53 AM

#

Flink is great and growing very well. We recently finished adopting a donation of Change Data Capture Connectors from Alibaba

final kiln Apr 1, 2024, 8:56 AM

#

I wonder if prefect can be used for similar purposes

past meteor Apr 1, 2024, 12:55 PM

#

potent sky Flink is great and growing very well. We recently finished adopting a donation o...

Oh do you use Apache flink?

past meteor Apr 1, 2024, 12:56 PM

#

long canopy are there python alternatives to kafka?

The closest comparison to Kafka is RabbitMQ

#

I don't think Kafka is a data processing solution at all. It's a piece of infra for distributed event-driven programming which can be used for data processing but wasn't specifically designed for it.

potent sky Apr 1, 2024, 1:04 PM

#

past meteor Oh do you use Apache flink?

Sometimes
But I like it.
I keep track of the developments, dev list discussions, sometimes vote on releases and FLIPs.
I'd try to contribute more through code but can hardly find the time 😔

past meteor Apr 1, 2024, 1:06 PM

#

potent sky Sometimes But I like it. I keep track of the developments, dev list discussions,...

That's already a lot 😮 What are your use cases for Flink if I may ask?

potent sky Apr 1, 2024, 1:17 PM

#

Real time data analytics platform for example, in combination with Apache ignite 🔥. Flink supports true stream processing natively.
Unfortunately most of my dealings with flink are hobby projects 😅
Flink is pretty good for stateful unbounded streams and event driven requirements. Exactly-once consistency guarantees in many cases.

past meteor Apr 1, 2024, 1:20 PM

#

potent sky Real time data analytics platform for example, in combination with Apache ignite...

Interesting, does that mean you work with Java/Scala?

potent sky Apr 1, 2024, 1:22 PM

#

pyflink

#

Java for where that's insufficient

past meteor Apr 1, 2024, 1:25 PM

#

I'll look into Flink some more 👀

My area is p much dominated by Azure and Databricks so that's what I know. I've looked into Flink a tiny bit but not that intensively.

desert mulch Apr 1, 2024, 2:08 PM

#

we are making game would help us

#

gmail:haseebrob456@gmail.com

long canopy Apr 1, 2024, 2:47 PM

#

@odd meteor @past meteor thanks for the comments!

potent sky Apr 1, 2024, 3:06 PM

#

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

Good piece. Putting it here in case someone hasn't come across it yet.
The discussion on Kafka, flink etc reminded me of it

past meteor Apr 1, 2024, 4:11 PM

#

potent sky https://engineering.linkedin.com/distributed-systems/log-what-every-software-eng...

Yeah, I do know change data capture but that's like the extent of my knowledge here 😄

#

And event sourcing I guess

long canopy Apr 1, 2024, 4:19 PM

#

7B is now 3 GB

#

it's the friggin future lads

long canopy Apr 1, 2024, 4:36 PM

#

the apache big data stack is some serious stuff

#

i can clearly see its use for business analytics

potent sky Apr 1, 2024, 4:40 PM

#

past meteor Yeah, I do know change data capture but that's like the extent of my knowledge h...

I'm also surface level on many of them tbh
But it's a good read

#

My latest interest is Apache Ignite

past meteor Apr 1, 2024, 4:40 PM

#

I honestly never heard of that one

#

I'm reasonably deep into this stuff https://fs2.io/ but my guess is that it doesn't scale as Flink and Spark do because it's really aimed at single node concurrency

#

Totally fine for my use cases tho tbh

potent sky Apr 1, 2024, 4:41 PM

#

It has an interesting architecture and the enterprise use cases mentioned are also intriguing

potent sky Apr 1, 2024, 4:42 PM

#

past meteor I'm reasonably deep into this stuff <https://fs2.io/> but my guess is that it do...

I honestly never heard of that one 😅

#

Haven't really played much with Scala yet

past meteor Apr 1, 2024, 4:43 PM

#

It's my guilty pleasure 🤣

long canopy Apr 1, 2024, 5:18 PM

#

WHAT

#

gpt 3.5 just got released for local

#

dude what the heck is going on this is too much i cannot handle it

final kiln Apr 1, 2024, 5:19 PM

#

long canopy gpt 3.5 just got released for local

Fr ? Cuz that's big

long canopy Apr 1, 2024, 5:19 PM

#

nvm i just got rick astley'd

#

i'm a friggin idiot

#

sorry i forgot the day

final kiln Apr 1, 2024, 5:19 PM

#

Looool

long canopy Apr 1, 2024, 5:20 PM

#

dude it was so believable, the tweet even said it was an old version of gpt 3.5

final kiln Apr 1, 2024, 5:20 PM

#

They're definitely working on 4.5, so releasing 3.5 would mean 4.5 and 5 were coming

long canopy Apr 1, 2024, 5:20 PM

#

keep an eye out for jamba tho

#

some serious stuff going on there

final kiln Apr 1, 2024, 5:22 PM

#

Let's see. Altman also mentioned Q*, but I'd bet he's playing the hype

versed pilot Apr 1, 2024, 6:22 PM

#

tranquil mist Hey guys, I was wondering if there’s any VERY in depth resources for pandas, pre...

I did a couple of courses by Matt Harrison on LinkedIn Learning, and I bought his Pandas 2 book. He has some good ideas on making pandas code more maintainable, and also more performant. But you can only go so far with Pandas, you should consider other solutions if the dataset is really too big. At work we use BigQuery, but if you want to stick to open source dataframe libraries, have a look at what Emyrs suggested.

tranquil mist Apr 1, 2024, 7:33 PM

#

versed pilot I did a couple of courses by Matt Harrison on Linked in Learning, and I bought h...

Yep, saw one of his talks a couple days ago which inspired me to treat pandas with more "respect" in a way. Unfortunately we're not allowed any type of distributed compute at work, all i have to work with is a laptop 😦 such an unserious company

tranquil mist Apr 1, 2024, 7:36 PM

#

odd meteor I think nothing gets more in depth than Pandas documentation itself. When you s...

I'll def look into Polars ! But i feel like i learn more by watching someone attack a problem live and solve it, basically what i'm looking for is some sort of data science livestream ahah

drifting spire Apr 1, 2024, 7:52 PM

#

Hey guys, has anybody here worked with recsys? I'm creating my first one and would love to hear some advice

final kiln Apr 1, 2024, 9:30 PM

#

https://youtu.be/wjZofJX0v4M?si=Llqp3kIlSJKM3V8h

I've only watched a chunk of this, but it's top notch as usual

YouTube

3Blue1Brown

But what is a GPT? Visual intro to Transformers | Deep learning, c...

An introduction to transformers and their prerequisites
Early view of the next chapter for patrons: https://3b1b.co/early-attention
Special thanks to these supporters: https://3b1b.co/lessons/gpt#thanks

Other recommended resources on the topic.

Richard Turner's introduction is one of the best starting places:
https://arxiv.org/pdf/2304.10557.p...

long canopy Apr 1, 2024, 9:40 PM

#

i want one on mamba

fading wigeon Apr 2, 2024, 1:32 AM

#

Anyone have any suggestions for like... courses on Machine learning and/or AI in Python?

#

I'd like to get more experience in both without having to go get a masters/phd or something

#

r/machinelearning seems to recommend this, but that was like 7 years ago: https://www.coursera.org/learn/machine-learning/home/info

desert oar Apr 2, 2024, 2:54 AM

#

fading wigeon Anyone have any suggestions for like... courses on Machine learning and/or AI in...

dive into deep learning, fast ai

fading wigeon Apr 2, 2024, 2:55 AM

#

Okay. Any suggestions on resources? I think I do best with academic type courses

#

like on coursera or something

desert oar Apr 2, 2024, 2:56 AM

#

fading wigeon Okay. Any suggestions on resources? I think I do best with academic type cours...

both are reasonably "academic". dive into deep learning is the basis for some actual CS courses

#

but check the pinned messages + search up in the channel, there will be lots of suggestions

fading wigeon Apr 2, 2024, 2:56 AM

#

OH

#

I thought you meant I should dive into deep learning

#

Like,as a description of what I should do

desert oar Apr 2, 2024, 2:57 AM

#

😆 nope https://d2l.ai/

#

https://www.fast.ai/

fast.ai

fast.ai - fast.ai—Making neural nets uncool again

fading wigeon Apr 2, 2024, 2:57 AM

#

Thank you! I'll look into them 🙂

long canopy Apr 2, 2024, 7:23 AM

#

@fading wigeon am fan of d2l and also https://web.stanford.edu/~jurafsky/slp3/

Speech and Language Processing

potent sky Apr 2, 2024, 8:37 AM

#

final kiln https://youtu.be/wjZofJX0v4M?si=Llqp3kIlSJKM3V8h I've only watched a chunk of t...

oh wow 3b1b getting into gpt
You love to see it

final kiln Apr 2, 2024, 8:38 AM

#

potent sky oh wow 3b1b getting into gpt You love to see it

im legit jealous of people who will learn this stuff for the first time from this video, it's always such a pleasure to see his presentations

potent sky Apr 2, 2024, 8:40 AM

#

Ikr! Felt the same when he posted "But what is a neural network"
Like why wasn't this out when I was getting into this stuff 😭

final kiln Apr 2, 2024, 8:48 AM

#

it's how it should be, we're here to make it easier for those who come after us - still jelly tho hueshda

tranquil juniper Apr 2, 2024, 9:24 AM

#

Question on math behind transformers if thats fine here, saw 3blue1browns last video on it and he describes that only the final tokens hidden state vector is used to generate the next token, why is that? Is it true? Why would you ignore all the valuable info in the other vectors? https://youtu.be/wjZofJX0v4M?si=xjG1aMGzizelL5B9 21:15 in the video for this specific question.

final kiln Apr 2, 2024, 9:30 AM

#

tranquil juniper Question on math behind transformers if thats fine here, saw 3blue1browns last v...

so the transformer is being trained on next token prediction, imagine you have some text:

dataset: "this is some text that is being used to train the transformer on next token prediction"

what you want to do is select a subset of it, for example

sample: "text that is being used to train the transformer"

you now turn this into an input and output:

input: "text that is being used to train the"
output: "that is being used to train the transformer"

note how in the output, the first token was removed and the last token was not present in the input

#

and so the reason why you only take the values from the last token, is because the other tokens are just being transcribed, copied from the input, the only token with new information is the last one

this is called a self-supervised method, in which the labels of the dataset are generated from an unlabelled dataset
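
To make the input/output split above concrete, here is a minimal sketch. It assumes words split on whitespace stand in for tokens (real models use a subword tokenizer), and `make_example` is just a hypothetical helper name for illustration.

```python
def make_example(tokens):
    """Split one sample into (input, target), with the target shifted by one token."""
    return tokens[:-1], tokens[1:]


sample = "text that is being used to train the transformer".split()
inputs, targets = make_example(sample)

# Position i of the target is the token that follows position i of the input.
for i, (seen, nxt) in enumerate(zip(inputs, targets)):
    print(i, seen, "->", nxt)

# Only the last target is missing from the input; the earlier ones already appear in it.
print("new token:", targets[-1])
```

When generating text, the earlier positions predict tokens you already have, which is why only the last position is read out.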

#

for BERT you do like

input: "text that is being <MISSING_WORD_TOKEN> to train the transformer"
output: "text that is being used to train the transformer"

#

the reason for the difference has to do with the internals of the attention mechanism, BERT lets every token influence every other token, while GPT only lets tokens influence tokens that have already occurred in the sentence
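
A rough sketch of that difference, assuming made-up attention scores for 4 tokens. The helper names (`causal_mask`, `bidirectional_mask`, `masked_softmax`) are hypothetical and not taken from any library; real implementations apply the same idea inside each attention head.

```python
import numpy as np


def causal_mask(n):
    """GPT-style: row i may only attend to columns j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))


def bidirectional_mask(n):
    """BERT-style: every position may attend to every position."""
    return np.ones((n, n), dtype=bool)


def masked_softmax(scores, mask):
    """Row-wise softmax where masked-out entries get zero weight."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))  # made-up scores, not from a trained model

gpt_weights = masked_softmax(scores, causal_mask(4))
bert_weights = masked_softmax(scores, bidirectional_mask(4))

print(gpt_weights.round(2))   # upper triangle is zero: no looking ahead
print(bert_weights.round(2))  # every entry is non-zero
```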

tranquil juniper Apr 2, 2024, 9:42 AM

#

final kiln so the transformer is being trained on next token prediction, imagine you have s...

Why select a subset of data? Is it to verify and adjust weights according to performance? Why is the first token missing in output, is this due to context size or whats the reason for that?

tranquil juniper Apr 2, 2024, 9:44 AM

#

final kiln the reason for the difference has to do with the internals of the attention mech...

Am i getting this right, that gpt uses the last vector only as each vector only holds contextual meaning for every previous token? So the 2nd vector only holds contextual information for the previous token, not the rest?

final kiln Apr 2, 2024, 9:44 AM

#

tranquil juniper Why select a subset of data? Is it to verify and adjust weights according to per...

You select a subset because the amount of text will always be a lot larger than the context window.

The first token is missing because the size of the output is equal to the size of the input, so if you want a new token you have to remove something from the sentence

tranquil juniper Apr 2, 2024, 9:45 AM

#

final kiln You select a subset because the amount of text will always be a lot larger than ...

Is that just due to matrix multiplication? U just copy and paste the first token after to clean it up for the user then?

final kiln Apr 2, 2024, 9:47 AM

#

tranquil juniper Am i getting this right, that gpt uses the last vector only as each vector only ...

It's not that they hold information themselves, they are an exact copy of the input. They were there only to aid in the training process, having to copy the tokens and at the same time having it create a new one, forces the model to create meaningful internal representations

final kiln Apr 2, 2024, 9:47 AM

#

tranquil juniper Is that just due to matrix multiplication? U just copy and paste the first token...

I don't understand the question

tranquil juniper Apr 2, 2024, 9:47 AM

#

final kiln and so the reason why you only take the values from the last token, is because t...

Sorry if my intuition sucks, i think i understand the concept of unsupervised learning, but what is being labeled here?

final kiln Apr 2, 2024, 9:48 AM

#

tranquil juniper Sorry if my intuition sucks, i think i understand the concept of unsupervised le...

The text is being labeled

#

Each sentence is labeled by itself, shifted by one token

final kiln Apr 2, 2024, 9:52 AM

#

tranquil juniper Is that just due to matrix multiplication? U just copy and paste the first token...

I mean it's just the setup right, you could technically make it larger or smaller, but it works out well to have it this way.

The extra tokens that are just copies, they participate in the loss function, it's an extra signal when training the model

tranquil juniper Apr 2, 2024, 10:10 AM

#

Do you have any textbook or colab/jupyter notebook you could recommend that helps understand the fine steps of it? A more low level understanding like yours? @final kiln

final kiln Apr 2, 2024, 10:15 AM

#

tranquil juniper Do you have any textbook or colab/jupyter notebook you could recommend that help...

I went through this step by step with a pen and paper in hand: https://bbycroft.net/llm

When I was satisfied with how much I understood, I went from the top, and implemented everything in pytorch.

Don't try to understand everything at once, it can be okay to start building the parts you do understand and then come back.

I first trained it on simple array sorting, then I trained it on next token prediction. At that point, I was using this repo as reference to get some details right:

#

https://github.com/karpathy/nanoGPT

sweet prairie Apr 2, 2024, 10:19 AM

#

👀 rate the model guys

tired lodge Apr 2, 2024, 10:25 AM

#

image permissions?

#

or just in general how to upload files to discord?

#

hmm, strange. the permission should allow all files of any kind to be uploaded

#

that makes sense. you shouldnt be uploading big py files because pastebins exist

#

see https://paste.pythondiscord.com

#

just put all your content in there lol

#

idk i might be waffling, im a bit hungry so i should probably eat

tranquil juniper Apr 2, 2024, 10:34 AM

#

final kiln I went through this step by step with a pen and paper in hand: https://bbycroft....

Thx man! I’ll def do this to really learn and ask here/ping you if thats fine if i hit a roadblock in implementation/theory. Just a undergrad in statistics currently with no formal cs background trying to learn.

frozen tundra Apr 2, 2024, 10:58 AM

#

does someone know if there is a problem with my code or i just didnt use it correctly? i tried to make my own neural network and everything works pretty well except the linear functions in the output layer, they just dont learn, they give the same output for different inputs (i can link the code if someone wants to see it)

frozen tundra Apr 2, 2024, 11:57 AM

#

sorry lol, ill paste it in a sec

#

https://paste.pythondiscord.com/USIQ

#

this is the code, (the elu is not finished yet it dosent matter)

#

i have tried changing parameters like the learning rate and the number of hidden layers and neurons but those didnt change much. i found out its not only with the output layer but in the hidden layers as well

plucky sedge Apr 2, 2024, 12:01 PM

#

I'm using FuncAnimation to animate a graph of a projectile's position in a simple flight (like throwing a ball) and I'm wondering if there's a way to auto-adjust the axes scale because the animation just goes off-screen immediately (I can still just move the graph around but I'd rather have it auto adjust)

Any help would be much appreciated.

frozen tundra Apr 2, 2024, 12:03 PM

#

i tried using just tanh and sigmoid to teach it stuff like cos and sin and it worked pretty good but when i used linear or relu activation functions it outputted the same thing over and over again

#

i apologize if my english is not well, its not my native language

#

thank you so much

#

no problem, the relu function isnt realy a relu, its kind of a relu that makes it so there are no derivatives that are 0 to prevent "killing" neurons

#

yeah

#

with the linear activation function?

#

ok ill try

#

btw i didnt teach it a cosine with the linear function, i tried to teach it a parabolic function

#

how many hidden layers should i use? i think one is good?

#

i tried a lot of different numbers and none of them worked (i dont mean to make you stay here you can go eat lol)

wooden sail Apr 2, 2024, 12:31 PM

#

i wouldn't expect it to work well outside of the training domain regardless of the activation function tbh

final kiln Apr 2, 2024, 12:32 PM

#

The other two worked, so something is correct

(Brb)

final kiln Apr 2, 2024, 12:50 PM

#

        self.weights = np.empty(1 + hidden_num, dtype=object)
        for i in range(len(self.weights)):
            self.weights[i] = np.random.uniform(-0.5, 0.5, (self.sizes[i], self.sizes[i + 1]))
        self.weight_update = self.weights

you're actually not making a copy here, both self.weight_update and self.weights point to the same array
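
A small sketch of the aliasing, assuming made-up layer sizes. One thing to watch with an object array like this one: `.copy()` only copies the outer array, so the inner weight matrices are still shared, while `copy.deepcopy` copies them too.

```python
import copy

import numpy as np

sizes = [2, 3, 1]  # hypothetical layer sizes, just for the demo
weights = np.empty(len(sizes) - 1, dtype=object)
for i in range(len(weights)):
    weights[i] = np.random.uniform(-0.5, 0.5, (sizes[i], sizes[i + 1]))

alias = weights                # same object, nothing is copied
shallow = weights.copy()       # new outer array, inner matrices still shared
deep = copy.deepcopy(weights)  # outer array and every inner matrix copied

alias[0][0, 0] = 99.0          # write through the alias

print(weights[0][0, 0])        # the "original" changed too
print(shallow[0][0, 0])        # so did the shallow copy
print(deep[0][0, 0])           # the deep copy kept its old value
```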

frozen tundra Apr 2, 2024, 12:50 PM

#

it didnt work with the same setup, it outputs very similar outputs. when i tried using only linear activation functions another problem occurred and the outputs were "nan". i have no idea why and i have very little knowledge of what it means. i found a setup that kinda works that has 1 hidden layer and 5 hidden neurons

final kiln Apr 2, 2024, 12:51 PM

#

    def add_changes(self):
        self.weights = self.weight_update
        self.biases = self.bias_update

so this function has no effect

#

and you dont need two separate arrays, you can just update the weights as you go through backprop

frozen tundra Apr 2, 2024, 12:51 PM

#

oh i understand, how do i make a copy?

frozen tundra Apr 2, 2024, 12:52 PM

#

final kiln and you dont need two separate arrays, you can just update the weights as you go...

but isnt it better to use batches?

final kiln Apr 2, 2024, 12:52 PM

#

frozen tundra but isnt it better to use batches?

you can still do batches, well, usually you're calculating this stuff at the same time right

frozen tundra Apr 2, 2024, 12:53 PM

#

final kiln you can still do batches, well, usually you're calculating this stuff at the sam...

i dont realy understand what you mean

final kiln Apr 2, 2024, 12:55 PM

#

so, like, in the forward pass you can already calculate the gradients right

f(x) = x**2

f'(x) = 2*x

even if it's part of a composition of functions, f'(x) can be calculated in isolation as long as you have x

during the backwards pass you apply the chain rule in succession

#

as you do that you apply a change to the weights, don't need to store the change in a separate array since you are only operating with the gradients

#

if you have an entire batch, you do the same thing, the difference is that you have multiple x's

#

but you do one step at a time for the whole batch, instead of the entire backprop for each element of the batch
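
A minimal sketch of that idea with a batch of inputs. `f` is the example from above; `g` is a made-up second function added only so there is a composition to differentiate.

```python
import numpy as np


def f(x):
    return x ** 2


def f_grad(x):
    return 2 * x


def g(u):  # hypothetical second function: g(u) = 3*u + 1
    return 3 * u + 1


def g_grad(u):
    return 3 * np.ones_like(u)


xs = np.array([1.0, 2.0, 3.0])  # a batch of inputs

# Forward pass: compute each step and keep its local derivative.
u = f(xs)
y = g(u)
local_f = f_grad(xs)
local_g = g_grad(u)

# Backward pass: chain rule, d g(f(x)) / dx = g'(f(x)) * f'(x),
# applied one step at a time to the whole batch.
dy_dx = local_g * local_f
print(dy_dx)
```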

#

but in any case, what I'm trying to say is that "self.weight_update = self.weights" does not perform a copy of the array, it just copies a reference to the same array, so updating one is updating the other

frozen tundra Apr 2, 2024, 1:00 PM

#

ok i think i understand (i will try to explain what you said) so what i should do is go through all of the batch at the same time and save the derivatives as i go so i do it as a batch

final kiln Apr 2, 2024, 1:02 PM

#

frozen tundra ok i think i understand (i will try to explain what you said) so what i should d...

yes I believe that's how it's usually implemented in the ml frameworks, where it has to work with all sorts of derivatives, but ig when you're implementing these linear layers with numpy they usually just do it the way you're doing it here

#

tho the

    def add_changes(self):
        self.weights = self.weight_update
        self.biases = self.bias_update

part doesnt have any effect

frozen tundra Apr 2, 2024, 1:05 PM

#

thanks i undestand. do you think it has to do with the problems i am having with the linear functions?

final kiln Apr 2, 2024, 1:08 PM

#

layer_before = np.dot(inputs, self.weights[i]) + self.biases[i]

in here you're hardcoding the layer right, if you were to do a general thing you'd do

layer_output = layer(inputs)
grad_output = grad_layer(inputs)

and inputs could be a batch of inputs, then in back prop you'd go back apply the chain rule and then avg out the gradients and apply them
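
As a sketch of what "average out the gradients and apply them" could look like for a single linear layer in numpy. The data, the squared-error loss and the learning rate are all assumptions for the demo, not taken from the pasted code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # batch of 8 inputs with 3 features (made up)
Y = rng.normal(size=(8, 2))  # made-up targets
W = rng.uniform(-0.5, 0.5, (3, 2))
b = np.zeros(2)


def forward(X, W, b):
    return X @ W + b


def loss(X, Y, W, b):
    """Batch mean of 0.5 * squared error."""
    return 0.5 * np.mean(np.sum((forward(X, W, b) - Y) ** 2, axis=1))


def gradients(X, Y, W, b):
    """Gradients of the loss above, averaged over the batch."""
    err = forward(X, W, b) - Y   # shape (batch, outputs)
    grad_W = X.T @ err / len(X)  # sum over the batch, divided by batch size
    grad_b = err.mean(axis=0)
    return grad_W, grad_b


lr = 0.1
loss_before = loss(X, Y, W, b)
for _ in range(100):
    grad_W, grad_b = gradients(X, Y, W, b)
    W -= lr * grad_W  # updated in place, no separate "update" array
    b -= lr * grad_b
loss_after = loss(X, Y, W, b)

print(loss_before, loss_after)
```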

frozen tundra Apr 2, 2024, 1:10 PM

#

thanks i realy appreciate your help

final kiln Apr 2, 2024, 1:10 PM

#

frozen tundra it didnt work with the same setup, it outputs very similar outputs, when i tried...

so, 1 linear layer and 5 neurons is your baseline of when it's working

frozen tundra Apr 2, 2024, 1:11 PM

#

do you maybe know what the nan is about?

final kiln Apr 2, 2024, 1:11 PM

#

frozen tundra do you maybe know what the nan is about?

potentially an overflow error somewhere

#

that's why normalizing the input tends to be a good idea
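
One plausible way a nan shows up, sketched with deliberately huge made-up values: a float64 overflows to inf, and then an operation like inf - inf gives nan. Whether this is what happens in the pasted network is only a guess.

```python
import numpy as np

big = np.array([1e200, 1e200])  # deliberately huge, unscaled values

with np.errstate(over="ignore", invalid="ignore"):
    squared = big * big       # too large for float64, becomes inf
    diff = squared - squared  # inf - inf is nan

print(squared, diff)

# Scaling the values first keeps the same computation finite.
scaled = big / np.abs(big).max()
print(scaled * scaled)
```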

frozen tundra Apr 2, 2024, 1:11 PM

#

what is normalizing?

final kiln Apr 2, 2024, 1:17 PM

#

frozen tundra what is normalizing?

uhm, it can have multiple meanings

but it usually means getting the values to be within a certain range, like scaling them all in the same way so that each one ends up between 0 and 1

like if you have

[1, 2, 3]

you can see if I divide by 3+2+1 = 6

[1/6, 1/3, 1/2]

these sum up to one

#

there's multiple procedures and they can have many meanings

#

in this case I turned it into a probability distribution

#

the important intuition is that both collections of values have the same information
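
A short sketch of three common meanings of "normalizing", using the same [1, 2, 3] example. Which one fits best depends on the problem, so treat this as an illustration rather than a recommendation.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])

# Divide by the sum: the values now add up to 1.
as_distribution = values / values.sum()

# Min-max scaling: the values now lie in [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: mean 0 and standard deviation 1.
standardized = (values - values.mean()) / values.std()

print(as_distribution)
print(min_max)
print(standardized)
```

In all three cases the ordering of the values is unchanged, only the scale is.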

frozen tundra Apr 2, 2024, 1:18 PM

#

oh i see, but it outputs nan for all inputs when using linear activation function as the first layer

final kiln Apr 2, 2024, 1:19 PM

#

they both convey the same relative scale

#

but the values are nicer in the second case

frozen tundra Apr 2, 2024, 1:19 PM

#

final kiln uhm, it can have multiple meanings but it usually means getting the values to b...

thats what softmax is supposed to do right>

#

?

final kiln Apr 2, 2024, 1:19 PM

#

frozen tundra thats what softmax is supposed to do right>

yes
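
For comparison, a minimal softmax sketch. It also produces values that sum to 1, but it exponentiates before dividing, so the result differs from plainly dividing by the sum.

```python
import numpy as np


def softmax(x):
    shifted = x - np.max(x)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()


values = np.array([1.0, 2.0, 3.0])
probs = softmax(values)
plain = values / values.sum()

print(probs)  # sums to 1, but weighted more heavily towards the largest value
print(plain)  # the divide-by-sum version from above
```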

frozen tundra Apr 2, 2024, 1:21 PM

#

so i should pass the input values through some kind of function before the linear function

#

the point is i want it to learn infinite range function so i can later use it for q learning

#

yeah i just felt like using pytorch and tensorflow will be "cheating" because i wanted to realy understand how everything works but maybe i will use them after i have some more understanding of the topic

#

again i realy appreciate your help! you helped me understand a lot of things and you gave me a different prespective. Thank you so much

potent sky Apr 2, 2024, 2:10 PM

#

The SAM codebase is so nice it honestly makes me happy.
No unnecessary bloat code.
Very clean, logical, "the right amount of" modular, well commented.
Refreshing to see such a clean research implementation

desert oar Apr 2, 2024, 2:10 PM

#

frozen tundra yeah i just felt like using pytorch and tensorflow will be "cheating" because i ...

it depends on what you want to learn.

deriving the gradient manually, to implement a small fully-connected NN in pure numpy, is a great exercise.

implementing autograd yourself is interesting and useful if you are interested in ML engineering or other computational aspects of machine learning. but it's not an important learning exercise for actually doing DS/ML/AI in practice. just use pytorch for that.

desert oar Apr 2, 2024, 2:10 PM

#

potent sky The SAM codebase is so nice it honestly makes me happy. No unnecessary bloat cod...

what's SAM?

potent sky Apr 2, 2024, 2:10 PM

#

Most paper implementations are quite messy

potent sky Apr 2, 2024, 2:11 PM

#

desert oar what's SAM?

Meta's Segment Anything Model

desert oar Apr 2, 2024, 2:11 PM

#

Meta seems to put out high quality OSS ML code

#

Fasttext was very nice quality as well when I looked at it

potent sky Apr 2, 2024, 2:12 PM

#

desert oar Meta seems to put out high quality OSS ML code

True

frozen tundra Apr 2, 2024, 2:12 PM

#

desert oar it depends on what you want to learn. deriving the gradient manually, to implem...

Thanks

potent sky Apr 2, 2024, 2:12 PM

#

desert oar Fasttext was very nice quality as well when I looked at it

oh hmm I haven't checked that out properly I'll have a look again
It honestly makes me so excited lol

desert oar Apr 2, 2024, 2:13 PM

#

Fasttext is actually what I described above: a pure C++ neural network, no autograd stuff

#

(or it was, when I looked at the code in 2018)

potent sky Apr 2, 2024, 2:13 PM

#

I go through research implementations regularly and most of them are so messy it's tiring
It's understandable why they're messy, the researchers' primary function is research, they're not software engineers
But it's tiring nonetheless

potent sky Apr 2, 2024, 2:14 PM

#

desert oar Fasttext is actually what I described above: a pure C++ neural network, no autog...

interesting

#

ATen src is also well structured
Well, as well as can be expected anyway from a codebase of that size and complexity

#

But I couldn't find any good reference doc for aten itself

jaunty helm Apr 2, 2024, 2:38 PM

#

question, how do you choose between pytorch and tensorflow?

cinder jay Apr 2, 2024, 3:03 PM

#

hi, i have two images, one is the original and the second one is the segmented, how can i overlay the segmented above the original???

potent sky Apr 2, 2024, 3:18 PM

#

cinder jay hi, i have two images, one is the original and the second one is the segmented,...

Use alpha/opacity value for the mask
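
A minimal sketch of alpha blending with plain numpy, assuming the original is an (H, W, 3) uint8 image and the segmentation is an (H, W) boolean mask. `overlay_mask` is a hypothetical helper name, and the tiny arrays stand in for real images.

```python
import numpy as np


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into `image` wherever `mask` is True.

    image: (H, W, 3) uint8 array, mask: (H, W) boolean array.
    """
    out = image.astype(np.float32)
    color = np.asarray(color, dtype=np.float32)
    out[mask] = (1.0 - alpha) * out[mask] + alpha * color
    return out.round().astype(np.uint8)


image = np.full((4, 4, 3), 100, dtype=np.uint8)  # stand-in for the original
mask = np.zeros((4, 4), dtype=bool)              # stand-in for the segmentation
mask[1:3, 1:3] = True

blended = overlay_mask(image, mask)
print(blended[:, :, 0])
```

If both inputs are full images of the same size, OpenCV's `cv2.addWeighted` does the same kind of weighted sum over the whole frame.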

serene scaffold Apr 2, 2024, 3:36 PM

#

jaunty helm question, how do you choose between pytorch and tensorflow?

just use pytorch

grand geyser Apr 2, 2024, 4:43 PM

#

jaunty helm question, how do you choose between pytorch and tensorflow?

I would prefer pytorch over TensorFlow because Google is soon going to shift to Jax from TensorFlow
So going with TensorFlow may not be very safe over the next few years, in my opinion

wooden sail Apr 2, 2024, 4:44 PM

#

fair warning that jax does not fill the same niche as tensorflow

grand geyser Apr 2, 2024, 4:44 PM

#

wooden sail fair warning that jax does not fill the same niche as tensorflow

Of course it can't be this soon

wooden sail Apr 2, 2024, 4:45 PM

#

it also never will

#

libraries might be built around it that do, like haiku and flax

#

but jax itself is a different thing

grand geyser Apr 2, 2024, 4:46 PM

#

wooden sail it also never will

I can't predict the future so idk

wooden sail Apr 2, 2024, 4:46 PM

#

it's lower level numpy-like access to the XLA backend that tensorflow also uses

grand geyser Apr 2, 2024, 4:46 PM

#

wooden sail but jax itself is a different thing

Jax needed to be different since tensor flow lost to pytorch

wooden sail Apr 2, 2024, 4:47 PM

#

i'm just saying: jax does not do the same thing tensorflow and pytorch do

#

it's not the same kind of tool

#

you have to do more of the math and design yourself

#

there aren't even any "layers" defined anywhere

#

you have to compose everything by hand

grand geyser Apr 2, 2024, 4:47 PM

#

wooden sail you have to compose everything by hand

For real 💀

wooden sail Apr 2, 2024, 4:48 PM

#

yes

#

that is what it's all about

#

it's literally numpy with autograd and jit
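
A tiny sketch of what that means in practice, assuming a made-up least-squares loss. There is no layer object here: you write the math with `jax.numpy`, `jax.grad` differentiates it, and `jax.jit` compiles the result.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a hand-written linear model."""
    return jnp.mean((x @ w - y) ** 2)


# Differentiate with respect to the first argument (w), then JIT-compile.
grad_loss = jax.jit(jax.grad(loss))

x = jnp.array([[1.0, 2.0], [3.0, 4.0]])  # made-up data
y = jnp.array([1.0, 2.0])
w = jnp.zeros(2)

g = grad_loss(w, x, y)
print(g)
```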

grand geyser Apr 2, 2024, 4:48 PM

#

Thanks for the info
I made up my mind
Never go to Jax 🫡

wooden sail Apr 2, 2024, 4:48 PM

#

nothing else

#

don't get me wrong, it's fantastic for research

#

but not friendly if you don't like/have to do the math yourself

grand geyser Apr 2, 2024, 4:49 PM

#

wooden sail it's literally numpy with autograd and jit

Do I even need Jax if I already know numpy alike libraries?

wooden sail Apr 2, 2024, 4:49 PM

#

yes, because of the jit and autodiff

grand geyser Apr 2, 2024, 4:49 PM

#

wooden sail but not friendly if you don't like/have to do the math yourself

Who likes doing math by themselves.....

wooden sail Apr 2, 2024, 4:49 PM

#

researchers that have to design novel architectures

grand geyser Apr 2, 2024, 4:49 PM

#

....

wooden sail Apr 2, 2024, 4:49 PM

#

sometimes you need to solve problems for which no solution exists yet

#

no out-of-the-box layers or architectures

grand geyser Apr 2, 2024, 4:50 PM

#

Btw do you have any good video for pytorch?

wooden sail Apr 2, 2024, 4:50 PM

#

i don't, i use jax 😛

grand geyser Apr 2, 2024, 4:50 PM

#

wooden sail i don't, i use jax 😛

💀💀💀

#

🫡

#

Now I see how you know so much about Jax 😆😆😆😂

robust stratus Apr 2, 2024, 5:02 PM

#

Is this a practical way of inserting data into an Excel spreadsheet?

import json
from openpyxl import Workbook
from pl_bh.gh_resources import pl_functions as pl

# Load API response data from JSON
projects = pl.get_all_projects()

# Create a new workbook
workbook = Workbook()

# Select the active worksheet
sheet = workbook.active

# Write headers
headers = ['Project Number', 'Project Name', 'Manager Name', 'Type Description']
sheet.append(headers)

# Iterate through each project in the API response and write data to the spreadsheet
for project in projects:
    project_number = project.get('id', '')
    project_name = project.get('name', '')
    manager_name = project.get('manager', {}).get('name', '')
    type_description = project.get('type', {}).get('name', '')

    row_data = [project_number, project_name, manager_name, type_description]
    sheet.append(row_data)

# Save the workbook
workbook.save(filename="projects.xlsx")

raw mortar Apr 2, 2024, 6:00 PM

#

robust stratus This is a practical way of inserting data into an excel spreadsheet? ```py impo...

Does it not work?

robust stratus Apr 2, 2024, 6:18 PM

#

raw mortar Does it not work?

Yes. i am just asking about the structure of my code

#

A lot of get() functions

past meteor Apr 2, 2024, 6:21 PM

#

jaunty helm question, how do you choose between pytorch and tensorflow?

Keras is easier than using Torch. However ... there's many breaking changes in Tensorflow/keras world. As a matter of fact, Keras is now suddenly multi backend again which brought a host of breaking changes 😅 .

Personally, I've moved to Torch myself but there are a couple of things I do miss from Tensorflow.

raw mortar Apr 2, 2024, 6:22 PM

#

robust stratus Yes. i am just asking about the structure of my code

projects is a list of dicts?
You could use pandas' json_normalize and select only the columns which are required and name them appropriately

#

Pandas can also output to excel, underneath it uses openpyxl or other excel extensions

robust stratus Apr 2, 2024, 7:31 PM

#

raw mortar Pandas can also output to excel, underneath it uses openpyxl or other excel exte...

Is pandas better than Workbook? And yes projects is a list of dicts

raw mortar Apr 2, 2024, 7:33 PM

#

robust stratus Is pandas better than Workbook? And yes projects is a list of dicts

It is used to work with tabular-like data. Openpyxl is used to work with Excel. You can combine both

robust stratus Apr 2, 2024, 7:34 PM

#

I need to see an example of how pandas can be incorporated into my current code
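
A rough sketch of how that could look. The sample `projects` list is made up to stand in for `pl.get_all_projects()`, and `projects_to_frame` / `save_projects` are hypothetical helper names. It assumes each project is a dict with optional nested `manager` and `type` dicts, as in the code above.

```python
import pandas as pd

# Made-up sample standing in for pl.get_all_projects()
projects = [
    {"id": 1, "name": "Bridge", "manager": {"name": "Ada"}, "type": {"name": "Civil"}},
    {"id": 2, "name": "Tunnel", "type": {"name": "Civil"}},  # no manager
]

# Flattened key -> spreadsheet header
COLUMNS = {
    "id": "Project Number",
    "name": "Project Name",
    "manager.name": "Manager Name",
    "type.name": "Type Description",
}


def projects_to_frame(projects):
    """Flatten the nested dicts and keep only the wanted columns."""
    flat = pd.json_normalize(projects)          # nested keys become "manager.name" etc.
    flat = flat.reindex(columns=list(COLUMNS))  # fixes the order, adds any missing column
    return flat.rename(columns=COLUMNS).fillna("")


def save_projects(df, path="projects.xlsx"):
    """Write the frame to a new workbook (pandas uses openpyxl for .xlsx)."""
    df.to_excel(path, index=False)


df = projects_to_frame(projects)
print(df)
```

Calling `save_projects(df)` would then replace the loop of `sheet.append(...)` calls.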

desert oar Apr 2, 2024, 9:53 PM

#

robust stratus This is a practical way of inserting data into an excel spreadsheet? ```py impo...

Are you inserting data into an existing sheet, or just writing a data set into a new sheet? If the latter, I prefer pandas .to_excel()

#

!d pandas.DataFrame.to_excel

arctic wedgeBOT Apr 2, 2024, 9:53 PM

#

pandas.DataFrame.to_excel

DataFrame.to_excel(excel_writer, *, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, ...)
Write object to an Excel sheet.

To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.

Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.

desert oar Apr 2, 2024, 9:53 PM

#

https://pandas.pydata.org/docs/user_guide/io.html#excel-files

#

Of course, if you aren't already using Pandas, you might want to hold off. It's a big library with a learning curve if you aren't already familiar with the idea of a "data frame" from other contexts.

dusty valve Apr 3, 2024, 3:52 AM

#

How much faster is jax than numly

#

numpy

serene scaffold Apr 3, 2024, 4:13 AM

#

dusty valve How much faster is jax than numly

you can look up "jax vs numpy benchmarks"

visual bone Apr 3, 2024, 5:26 AM

#

hey guys, is there a good youtube video or book to read to learn pyspark?

raw mortar Apr 3, 2024, 6:03 AM

#

visual bone hey guys, is there a good youtube video or book to read to learn pyspark?

Databricks themselves have pretty good documentation and tutorials on pyspark. Also if you have a subscription with them, their academy/learning portal would be freely available.

visual bone Apr 3, 2024, 6:07 AM

#

raw mortar Databricks themselves have pretty good documentation and tutorials on pyspark. A...

oh okay, did not know about Databricks. Will check it. Thank you man

orchid forge Apr 3, 2024, 6:52 AM

#

guys

#

im trying to understand a code
but couldn't

#

price_range = df['Price range']
total_restaurants = len(price_range)
percentage = (price_range.value_counts() / total_restaurants) * 100
percentage
import matplotlib.pyplot as plt

price_range_counts = {}

for price in price_range:
    if price in price_range_counts:
        price_range_counts[price] += 1
    else:
        price_range_counts[price] = 1

sorted_price_range_counts = dict(sorted(price_range_counts.items()))

total_restaurants = len(price_range)

percentage = {price: (count / total_restaurants) * 100 for price, count in sorted_price_range_counts.items()}

plt.figure(figsize=(8, 6))
bars = plt.bar(percentage.keys(), percentage.values(), color='lightgreen')

plt.xlabel('Price Range')
plt.ylabel('Percentage of Restaurants (%)')
plt.title('Distribution of Price Ranges Among Restaurants')

plt.xticks(list(percentage.keys()))

plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.ylim(0, 100)

for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}%', ha='center', va='bottom')

plt.tight_layout()
plt.show()

#

this code

#

this code

#

anyone here?

orchid forge Apr 3, 2024, 7:15 AM

#

orchid forge

Create a histogram or bar chart to visualize the distribution of price ranges among the restaurants

#

its written at the top

orchid forge Apr 3, 2024, 7:26 AM

#

orchid forge

this

#

for loop
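
A small sketch of what that for loop is doing, using a made-up list in place of `df['Price range']`. It tallies how many times each price value appears, which is the same job `collections.Counter` (or `value_counts()` on the pandas column) does in one step.

```python
from collections import Counter

price_range = [1, 2, 1, 3, 1, 2, 4]  # made-up values standing in for df['Price range']

# The loop: tally how many times each price value appears.
price_range_counts = {}
for price in price_range:
    if price in price_range_counts:
        price_range_counts[price] += 1  # seen before: add one to its tally
    else:
        price_range_counts[price] = 1   # first time seen: start its tally at 1

# Counter builds the same tally in one step.
same = price_range_counts == dict(Counter(price_range))
print(price_range_counts, same)

# Turning the counts into percentages, as the rest of the code does.
total = len(price_range)
percentage = {
    price: count / total * 100
    for price, count in sorted(price_range_counts.items())
}
print(percentage)
```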

lapis sequoia Apr 3, 2024, 7:29 AM

#

i meant this dataset, is um deprecated....

crisp raptor Apr 3, 2024, 11:24 AM

#

Excel truly is the pinnacle of data science.

lofty thorn Apr 3, 2024, 1:12 PM

#

is there a separate library available for matrices?

wooden sail Apr 3, 2024, 1:24 PM

#

numpy can handle all of your matrix arithmetic for you

lofty thorn Apr 3, 2024, 1:29 PM

#

ok

#

one more question

#

how to enable autocompletion in jupyter notebook

#

?

umbral charm Apr 3, 2024, 8:39 PM

#

how does one achieve a diagram like this in matplotlib

#

ive tried doing color = 'None'

#

and edge colour = 'black'

#

but that makes the lines come back down, but i want an out line

#

this is what i have

#

but i dont want them lines coming all the way backdown if you get me

#

i just want the outline

cinder jay Apr 3, 2024, 8:49 PM

#

hey, how can i put some legend in a image using numpy or opencv?
because i segmented an image and i need to specify each segmentation

arctic wedgeBOT Apr 3, 2024, 8:49 PM

#

📨 👌 applied timeout to @pine oasis for 10 minutes (reason: newlines spam - sent 106 newlines).

The <@&831776746206265384> have been alerted for review.

spiral peak Apr 3, 2024, 8:50 PM

#

!unmute 423503073479098368

arctic wedgeBOT Apr 3, 2024, 8:50 PM

#

📨 👌 pardoned infraction timeout for @pine oasis.

spiral peak Apr 3, 2024, 8:50 PM

#

!paste

arctic wedgeBOT Apr 3, 2024, 8:50 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

spiral peak Apr 3, 2024, 8:50 PM

#

pls use our pastebin

pine oasis Apr 3, 2024, 8:50 PM

#

Sorry, whats that?

#

Oh sorry, just seen the embed

#

I wanted some help with pytorch and this snippet
https://paste.pythondiscord.com/WVWQ
to train with this dataset
http://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success

#

I am struggling with understanding different loss functions and how should i shape my output accordingly, the provided snippet works but the results seem really weird

#

If anybody interested ping with reply please

versed pilot Apr 3, 2024, 9:34 PM

#

tranquil mist Yep, saw one of his talks a couple days ago which inspired me to treat pandas wi...

In the worst case, you can go a long way with a laptop and some FOR loops 😉 . But yeah, not sure Spark makes sense on a laptop, you are left with fewer options compared to having access to clusters and/or cloud.

vocal sleet Apr 3, 2024, 9:36 PM

#

What's the best libraries to make a chatbot from pretrained models?

I already tried HuggingFace but got so confused

verbal venture Apr 3, 2024, 10:15 PM

#

Can anyone explain why x is being multiplied to find w during gradient descent in linear regression

tidal bough Apr 3, 2024, 10:24 PM

#

If you write down the formula for the Mean Squared Error loss and calculate the gradient by the weight, that part will be there, because d((w@x-y)^2)/dw = 2 (w@x-y) x

verbal venture Apr 3, 2024, 10:24 PM

#

can you just tell me the reason

mellow vector Apr 3, 2024, 10:25 PM

#

cars.plot(kind="scatter", x="horsepower", y="mpg", figsize=(12, 8), c="cylinders", marker="x", colormap="viridis")

verbal venture Apr 3, 2024, 10:25 PM

#

why is x included in calculating w but not b

mellow vector Apr 3, 2024, 10:25 PM

#

what is c short for?

tidal bough Apr 3, 2024, 10:26 PM

#

I'm not sure how I can tell it more directly. In the gradient by w it appears from the d(w@x + b - y)/dw term. Whereas for the gradient by b, d(w@x + b - y)/db is 1.
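The two derivatives above can be checked numerically. The following is a minimal sketch for a single sample with scalar `w` and `x`, so `w@x` becomes `w * x`. The function names and the sample values are illustrative assumptions, not something from the conversation.

```python
# Minimal sketch: squared error for one sample of the model y_hat = w * x + b.
# All names and the sample values below are illustrative assumptions.

def loss(w, b, x, y):
    return (w * x + b - y) ** 2

def grad_w(w, b, x, y):
    # Chain rule: 2 * residual * d(residual)/dw, and d(w*x + b - y)/dw = x
    return 2 * (w * x + b - y) * x

def grad_b(w, b, x, y):
    # d(w*x + b - y)/db = 1, so the factor x does not appear here
    return 2 * (w * x + b - y)

def numeric_grad(f, value, eps=1e-6):
    # Central finite difference, used only to sanity-check the formulas
    return (f(value + eps) - f(value - eps)) / (2 * eps)

w, b, x, y = 0.5, 0.1, 3.0, 2.0
num_w = numeric_grad(lambda v: loss(v, b, x, y), w)
num_b = numeric_grad(lambda v: loss(w, v, x, y), b)

print("dL/dw analytic vs numeric:", grad_w(w, b, x, y), num_w)
print("dL/db analytic vs numeric:", grad_b(w, b, x, y), num_b)
```

The only difference between the two gradients is the factor `x`: changing `w` by a small amount moves the prediction by `x` times that amount, while changing `b` moves it by exactly that amount.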

mellow vector Apr 3, 2024, 10:26 PM

#

it produces a scatterplot, cylinders is used to determine the point on a spectrum that colors each data point

tidal bough Apr 3, 2024, 10:27 PM

#

mellow vector what is c short for?

color

mellow vector Apr 3, 2024, 10:27 PM

#

hmm i thought so, guess i didn't realize it would accept a series

tidal bough Apr 3, 2024, 10:27 PM

#

that argument of pd.DataFrame.plot corresponds to this argument of plt.scatter:

#

(except that the pandas version accepts a column name, too, in which case that column is used.)

mellow vector Apr 3, 2024, 10:28 PM

#

ty

#

wish this instructor wouldn't use shortcuts

#

dot notation bleh

desert oar Apr 3, 2024, 10:35 PM

#

mellow vector wish this instructor wouldn't use shortcuts

c actually has different behavior from other options

#

it's annoying because that's just the name, it's not an abbreviation for anything

#

there's also s

#

this is one of the most obnoxious areas in matplotlib

white crown Apr 3, 2024, 10:38 PM

#

I have an excel workbook with multiple sheets. I am reading it as a dictionary where the key is the sheet_name and the value is a dataframe. I am using pydantic v2 to validate this. I need to check the mode field. Its value should be one of the values in the Zone column of the sheet named zone_sheet. I am trying something like this and it doesn't seem to work for me. What is the recommended way to dynamically create a list of valid values?

Zone =[]

Class testSchema(BaseModel):
  mode: Annotated[Zone, Field(description = "some test")]
  
  @model_validator(mode='before')
  def populate_zone(data_dict):
    zone_sheet = data_dict.get('Zone')
    if not zone_sheet:
      ZONE = [zone_sheet['zone'].tolist()]

desert oar Apr 3, 2024, 10:41 PM

#

white crown I have an excel sheet with multiple sheets. I am reading it as a dictionary whe...

You can't do this using type hints. You will need to write a custom validator to check that the zones fall into the list of valid zones from the other sheet.

#

you can do it with an "after" validator though

#

"before" validators are too powerful, avoid if possible

#

That said, you might want to use Pandera instead of (or in addition to) Pydantic

white crown Apr 3, 2024, 10:44 PM

#

desert oar You can't do this using type hints. You will need to write a custom validator to...

can you give me an example? Yes I know Pandera does this better, but it is a complicated sheet and it was already started in Pydantic.

desert oar Apr 3, 2024, 10:45 PM

#

white crown can you give me an example? Yes I know Pandera does this better, but it is a co...

Use two attributes: 1) a list of valid zones, and 2) the data you want to work with. Write a validator for the 2nd field to verify that the data values are only the values in the 1st field

#

But it might help if you gave me an example of what you are trying to achieve, without Pydantic

#

That way I could understand better what you want to do

#

This might be better off in a separate help thread. Make a thread following the instructions in #❓｜how-to-get-help and @ me so I see it

#

It seems mostly like a Pydantic question which isn't really the topic of this channel

white crown Apr 3, 2024, 10:47 PM

#

desert oar But it might help if you gave me an example of what you are trying to achieve, w...

I have a dictionary of dataframes where the sheetname is the key of the dictionary. I want to ONLY accept in the mode field values from sheet 'Zone' column zone.

desert oar Apr 3, 2024, 10:49 PM

#

white crown I have a dictionary of dataframes where the sheetname is the key of the dictiona...

I think I understand. How do you want to represent this in Pydantic? A single attribute? Every data frame a separate attribute?

white crown Apr 3, 2024, 10:50 PM

#

desert oar I think I understand. How do you want to represent this in Pydantic? A single at...

YES. I am not sure how to do this and have tried to do this in several ways and failed. I cant seem to write something that works. So far I have worked on getting this done using "before"

cinder jay Apr 3, 2024, 10:56 PM

#

hi, i have a "problem", i have a few classes that i segmented, im drawing them in an image but a few overlap the others
how can i fix that??

fading wigeon Apr 3, 2024, 10:57 PM

#

I need help solving a stupid, stupid argument I'm having at work

#

My argument/stance is that the definition of a peak is a point where at least one of its neighbors is lower and the other is either equal or lower.

The junior coworker says that's something that's entirely made up and actually any point can be a peak if the slope changes

dusty valve Apr 3, 2024, 11:01 PM

#

Hes right

fading wigeon Apr 3, 2024, 11:02 PM

#

How?!?!

#

That's literally every point that isn't the same

#

Like, by that definition, unless the previous point and the following point are both equal to the current point, it's a peak

dusty valve Apr 3, 2024, 11:03 PM

#

I dunno

#

I think he's right though

fading wigeon Apr 3, 2024, 11:04 PM

#

So he's right, but you don't know how he's right, but you think he is?

#

Based off of information that you don't know?

desert oar Apr 3, 2024, 11:09 PM

#

fading wigeon My argument/stance is that the definition of a peak is a point where there's at ...

What is a "neighbor" here?

fading wigeon Apr 3, 2024, 11:10 PM

#

the point directly preceding or following

#

So the neighbors of the peak in the data [1,2,3,2,1] are both 2

desert oar Apr 3, 2024, 11:10 PM

#

In a discrete sequence, I guess that's one way to define a peak? But what about the sequence 1,2,1,2,1,2

#

is every 2 a peak?

fading wigeon Apr 3, 2024, 11:11 PM

#

If we don't take noise tolerance into account, then yes.

#

They are all local maxima
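The neighbor-based definition from this exchange can be written down directly. This is a minimal sketch, not a production peak finder. The function name and the handling of end points (a missing neighbor is treated as "not higher") are assumptions, since the chat does not say what happens at the edges.

```python
def find_peaks_simple(values):
    """Return indices where no neighbor is higher and at least one is lower.

    Assumption: at the two ends only the single existing neighbor is checked.
    """
    peaks = []
    last = len(values) - 1
    for i, v in enumerate(values):
        neighbors = []
        if i > 0:
            neighbors.append(values[i - 1])
        if i < last:
            neighbors.append(values[i + 1])
        if not neighbors:
            continue  # a single point has nothing to compare against
        if all(n <= v for n in neighbors) and any(n < v for n in neighbors):
            peaks.append(i)
    return peaks

print(find_peaks_simple([1, 2, 3, 2, 1]))     # [2]
print(find_peaks_simple([1, 2, 1, 2, 1, 2]))  # [1, 3, 5]
print(find_peaks_simple([1, 1, 1]))           # []
```

With no noise tolerance, every 2 in `1,2,1,2,1,2` is reported, which matches the answer given in the chat.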

desert oar Apr 3, 2024, 11:11 PM

#

what about 1.0001, 1.0003, 1.0002, 1.045, 1.592, 1.432

#

is 1.0003 a peak?

fading wigeon Apr 3, 2024, 11:12 PM

#

From a strictly mathematical standpoint, yes. For any modern peakfinding algorithm, no, that would fall below noise tolerance thresholds and be deemed as spurious.

#

In the above example, he'd be arguing that 1.045 and 1.432 are peaks

#

because the slope is different from 1.592 to 1.045 to 1.0002

#

But literally any points besides [1,1,1] will have slope changes

#

and thus the definition becomes meaningless

#

Depending on the type of signal data you're processing (or if you're using the matlab/scipy peakfinder), you can set different thresholds to combat noise. Generally speaking, the y-axis distance between a local maximum and a local minimum has to be at least 1/4th the range of the data for the maximum to be classified as a peak.

#

Some people like 1/5th
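A rough sketch of that threshold rule, applied to the six-value example from earlier in the conversation. The function name, the `fraction` parameter, and the choice to compare against the smaller of the two drops to the adjacent local minima are assumptions made for illustration. Library peak finders (for example `scipy.signal.find_peaks` with its `prominence` argument) define their thresholds more carefully.

```python
def find_peaks_with_threshold(values, fraction=0.25):
    """Keep interior local maxima whose drop to the nearest local minimum
    on each side is at least `fraction` of the data range.

    Assumptions: end points are never peaks, and the smaller of the
    left/right drops is the one compared against the threshold.
    """
    if len(values) < 3:
        return []
    min_drop = fraction * (max(values) - min(values))
    last = len(values) - 1
    peaks = []
    for i in range(1, last):
        v = values[i]
        if not (values[i - 1] < v > values[i + 1]):
            continue  # not a strict local maximum
        # Walk downhill on each side until the values start rising again.
        left = i
        while left > 0 and values[left - 1] <= values[left]:
            left -= 1
        right = i
        while right < last and values[right + 1] <= values[right]:
            right += 1
        drop = min(v - values[left], v - values[right])
        if drop >= min_drop:
            peaks.append(i)
    return peaks

signal = [1.0001, 1.0003, 1.0002, 1.045, 1.592, 1.432]
print(find_peaks_with_threshold(signal, fraction=0.0))   # [1, 4]
print(find_peaks_with_threshold(signal, fraction=0.25))  # [4]
```

With the threshold off, 1.0003 counts as a peak. With the 1/4 rule it is discarded as noise and only 1.592 remains.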

desert oar Apr 3, 2024, 11:17 PM

#

It sounded like you were describing an algorithm though

#

Not a modern peak-finding algorithm, something very simple, which might in fact work in a lot of cases

#

I'm not sure what the heck they mean by "any point can be a peak if the slope changes" though

fading wigeon Apr 3, 2024, 11:18 PM

#

Well, the junior is arguing that all modern peak finding algorithms are failures, as well as mine, and that only he's good enough to make one to REALLY catch all the peaks, which includes a bunch of points that look like knees or shoulders

desert oar Apr 3, 2024, 11:18 PM

#

wait, do you want to include the shoulders or no?

#

i'd say anyone who takes an approach like that is probably an arrogant asshole, amplified if a junior is saying it

#

"everyone else but me is wrong" is the creed of a crank

fading wigeon Apr 3, 2024, 11:20 PM

#

Yeah, he's incredibly arrogant. He tried to set himself up as the director of software engineering at one point

desert oar Apr 3, 2024, 11:21 PM

#

but whether or not he's right in the context of your particular business case depends a lot on subjectively what do you consider a peak

#

unrelated to the math, people like that are a net drag on productivity and team morale, and are best let go of promptly

#

the longer their tenure without being chastised for their attitude, the more they find validation for their arrogance, and the more arrogant they get, and the more disruptive/counterproductive they get

#

if you give somebody like that too much power, they can sink the entire organization. and even if they don't have power, they can scare away enough talent that you will have issues retaining good people who don't need to put up with it

#

!e import pydantic

arctic wedgeBOT Apr 3, 2024, 11:26 PM

#

@desert oar :x: Your 3.12 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 1, in <module>
003 |     import pydantic
004 | ModuleNotFoundError: No module named 'pydantic'

desert oar Apr 3, 2024, 11:27 PM

#

aw

fading wigeon Apr 3, 2024, 11:31 PM

#

I think the core issue is that he's very charismatic and is trying to climb the corporate ladder

#

So he's brown nosing the right people and he can speak confidently about things

#

These things happen to be total bullshit

#

But I'm an academic/engineer, I am always precise and leave room for myself to be wrong due to incomplete information

#

I know office politics are unavoidable, but...

#

I'm just hoping it's not like this at every company

#

Since I'm looking for the door

#

Maybe I just need to practice being a lying brownnoser, idk.

desert oar Apr 3, 2024, 11:33 PM

#

white crown YES. I am not sure how to do this and have tried to do this in several ways and...

import pandas as pd
import pydantic

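class DataWrapper(pydantic.BaseModel):
    # DataFrames are not a native Pydantic type, so allow arbitrary types.
    model_config = pydantic.ConfigDict(arbitrary_types_allowed=True)

    data_frames: dict[str, pd.DataFrame]

    @pydantic.model_validator(mode="after")
    def check_data_modes(self) -> "DataWrapper":
        # Work on a shallow copy so popping "Zone" leaves self.data_frames intact.
        data_frames = self.data_frames.copy()
        try:
            zone_df = data_frames.pop("Zone")
        except KeyError:
            raise ValueError("Zone sheet is missing from input.") from None

        valid_zones = zone_df["Zone"].unique()

        # Every remaining sheet may only use zones listed in the Zone sheet.
        for sheetname, df in data_frames.items():
            if not df["mode"].isin(valid_zones).all():
                raise ValueError(f"Sheet {sheetname!r} has invalid 'mode' values!")

        # An "after" model validator has to return the validated instance.
        return self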

like that?

fading wigeon Apr 3, 2024, 11:33 PM

#

Can I just say I love that there's a module called "pydantic"

desert oar Apr 3, 2024, 11:34 PM

#

fading wigeon Can I just say I love that there's a module called "pydantic"

it's a great name

fading wigeon Apr 3, 2024, 11:35 PM

#

I don't even know or care what it does, lol

desert oar Apr 3, 2024, 11:35 PM

#

fading wigeon But I'm an academic/engineer, I am always precise and leave room for myself to b...

at some point you need to voice your concerns w/ management

#

there is also a bit of an art to being precise and scientific without being seen as incoherent or inconclusive

fading wigeon Apr 3, 2024, 11:38 PM

#

Yeah. I have. Office politics are complicated right now. My original boss quit. Temporarily one of the execs was leading the software team but he didn't have any software knowledge. But he is/was really susceptible to yes men and the junior really confidently made the case that he should be promoted to director. I told him that I'm really trying to be a team player, but that I don't think the junior had any idea what he was doing. I got ignored until everything crashed and burned and then I got listened to.

But unfortunately the junior is telling everyone I'm just out to get him and he's well liked, people are buying it. Well, anyone who doesn't know anything about software is buying it.

#

But in the past I've gotten blamed for his mistakes so I'm trying to make any future incidents be crystal clear that I am in strong opposition because that logic was used to justify denying me a raise

desert oar Apr 3, 2024, 11:39 PM

#

yikes

#

sounds well past the point of "work on your resume and start applying" imo

fading wigeon Apr 3, 2024, 11:39 PM

#

Yeah

desert oar Apr 3, 2024, 11:39 PM

#

it sounds like you're doing the right things, but it also sounds like management is toxic and borderline hostile

fading wigeon Apr 3, 2024, 11:40 PM

#

It's hard though, everyone in my field wants to talk about AI and large language models and I don't have a ton of experience there, so I'm trying to study up as fast as I can

desert oar Apr 3, 2024, 11:40 PM

#

i'm in DS with relatively minimal AI knowledge as well, i am right there with you

#

the jobs exist, but are rarer than they should be

fading wigeon Apr 3, 2024, 11:40 PM

#

Yeah

#

I did find an ML course. I know that doesn't cover LLMs or everything about AI, but it's somewhere to start

desert oar Apr 3, 2024, 11:41 PM

#

most small/mid-size orgs could benefit from an intermediate data scientist and a data analyst, giving the former freedom to do R&D and the latter stays busy with dashboards etc

fading wigeon Apr 3, 2024, 11:41 PM

#

That's a very good point

desert oar Apr 3, 2024, 11:41 PM

#

the problem is data quality -- usually it's horrible

fading wigeon Apr 3, 2024, 11:41 PM

#

Yeah

desert oar Apr 3, 2024, 11:41 PM

#

so you have like 1 year until the DS becomes productive

fading wigeon Apr 3, 2024, 11:41 PM

#

Yup

desert oar Apr 3, 2024, 11:41 PM

#

what is your background if not ML? engineering?

#

the way you're talking about "signals" makes me think of EE

fading wigeon Apr 3, 2024, 11:41 PM

#

Digital Signal Processing for Physiological signals, degree in biomedical engineering

#

There's a lot of EE background knowledge involved

#

It's rough. This job used to be so much fun and I'm in medtech so I got to see people whom my technology directly helped

#

But I need to accept that ship has sailed

desert oar Apr 3, 2024, 11:42 PM

#

BME degree in DSP + a few YoE you should be a pretty compelling candidate as long as you interview well and write decent code

fading wigeon Apr 3, 2024, 11:43 PM

#

Yeah, I just need to be able to speak to AI/LLM/machine learning better

#

I just admit my knowledge on the topics is minimal and the interviews end

#

Well, not knowledge, but my experience is anyway

desert oar Apr 3, 2024, 11:43 PM

#

there are ways to spin that

fading wigeon Apr 3, 2024, 11:44 PM

#

You think so?

desert oar Apr 3, 2024, 11:44 PM

#

plus it's not that hard to dick around with some prompts + run nanogpt locally

fading wigeon Apr 3, 2024, 11:44 PM

#

Hmm

desert oar Apr 3, 2024, 11:44 PM

#

it depends on who's interviewing you and what they're looking for

fading wigeon Apr 3, 2024, 11:44 PM

#

I've definitely like... grilled chat gpt to try to figure out how well it "thinks" and it doesn't.

desert oar Apr 3, 2024, 11:44 PM

#

unfortunately the fad cycle is at peak hype right now so everyone thinks they NEED it

fading wigeon Apr 3, 2024, 11:44 PM

#

Yeah

#

If I tell even the screener I think it's a bit overblown that's the end of the conversation, lol

#

I've never heard of nanogpt

desert oar Apr 3, 2024, 11:45 PM

#

what kinds of jobs are you applying for where the interviews are so AI focused?

fading wigeon Apr 3, 2024, 11:46 PM

#

What's ironic is.... all of them.

#

Well

desert oar Apr 3, 2024, 11:46 PM

#

DM me an example?

fading wigeon Apr 3, 2024, 11:46 PM

#

No, yeah, like all of them. Specifically I focus on more R&D involved positions

#

Sure

desert oar Apr 3, 2024, 11:49 PM

#

that might be part of the problem. what R&D does everyone want to do now? AI

#

you might need to spend some time with self-study

#

i've been recommending https://www.fast.ai/ and https://d2l.ai/


#

i never took the full courses but i have gone through enough of the material to feel comfortable recommending it

fading wigeon Apr 3, 2024, 11:51 PM

#

Can't DM you, you have them turned off 😛 But yeah, I think you hit the nail on the head anyway.

desert oar Apr 3, 2024, 11:51 PM

#

but a big part of interviewing is emphasizing what you are good at. you have a really strong math & engineering & programming background? then you will get ramped up quickly on the AI material and can be very effective at prototyping

fading wigeon Apr 3, 2024, 11:51 PM

#

I'm excited to go through the material. I just wish I wasn't suffering at my job while doing it.

#

Hmm

#

I think that's part of the problem. It's hard to sell myself on something unless I'm really confident in my knowledge about it

desert oar Apr 3, 2024, 11:59 PM

#

That's probably a good thing

waxen delta Apr 4, 2024, 1:30 AM

#

Does anyone know how to fix this error with Labelme? labelme
2024-04-03 18:30:20,756 [INFO ] init:get_config:67- Loading config file from: /home/student/.labelmerc
QObject::moveToThread: Current thread (0x2ef3a970) is not the object's thread (0x30045540).
Cannot move to target thread (0x2ef3a970)

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/student/.local/lib/python3.9/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb, eglfs, linuxfb, minimal, minimalegl, offscreen, vnc.

Aborted

final swift Apr 4, 2024, 4:26 AM

#

My knowledge of python right now is somewhat limited, I have spent a good chunk of time working with lists, arrays and the other more basic things, I have not taken an opportunity to look at hashmaps, sql, or really much with databases or neural networks. I have looked at big O notation, and a little at classes. with all that in mind, are there any projects y'all would recommend to help build the necessary skills, or better cement foundations, for data work? I would like something a little more difficult than what I have been working on, which is Flask projects.

#

the other question I guess is, and what skills is it that I'm looking for?

jaunty helm Apr 4, 2024, 4:41 AM

#

final swift My knowledge of python right now is somewhat limited, I have spent a good chunk ...

if by data work you mean data science, then I don't think sql / flask will help that much?
kaggle is a good starting point I feel


final swift Apr 4, 2024, 4:42 AM

#

jaunty helm if by data work you mean data science, then I don't think sql / flask will help ...

my time with Flask was just to get an understanding of OOP to an extent, sql is for databasing is it not?

jaunty helm Apr 4, 2024, 4:43 AM

#

final swift my time with Flask was just to get an understanding of OOP to an extent, sql is ...

yeah, and databases & data science are a bit different

orchid forge Apr 4, 2024, 4:43 AM

#

i still didnt get it that why are we even doing it, i understood the code but what is the reason behind that fact that we are using a loop

#

im so dumb

#

why do we need to iterate through it

final swift Apr 4, 2024, 4:43 AM

#

jaunty helm yeah, and databases & data science are a bit different

Ah okay I will look more into that. You can probably tell just how little experience I actually have. Thank you very much for your help.

orchid forge Apr 4, 2024, 4:43 AM

#

like why?

#

i wanna cry

#

im so fucking dumb

#

i should give up....its too hard for me

final swift Apr 4, 2024, 4:46 AM

#

Nah man don’t give up. You got this. Just keep pushing through. Too hard just means for now.

pulsar wolf Apr 4, 2024, 6:17 AM

#

yall know any cool open source data science projects?

abstract rune Apr 4, 2024, 6:30 AM

#

Is this right guys ??

package matrix

// The constants of linear equations do not determine whether the matrix A
// singular or not!
func Singularity(A [][]float64) bool {
    return NumberOfRows(A) == Rank(A)
}

func Rank(A [][]float64) int {
    var rank int
    matrix, _ := REF(A)
    for i := 0; i < NumberOfRows(A); i++ {
        isNonZeroPresent := false
        for j := 0; j < NumberOfCols(A); j++ {
            if matrix[i][j] != 0 {
                isNonZeroPresent = true
            }
        }
        if isNonZeroPresent {
            rank++
        }
    }
    return rank
}

#

if the matrix (square) is not singular, then it is guaranteed that the determinant is 0, right?

versed flame Apr 4, 2024, 9:39 AM

#

If I get a 'good' result when doing machine learning, what are ways to figure out if that result really is correct or not?
Im basically an idiot, and im probably training my model towards the wrong thing or whatever, or testing it incorrectly.

#

Ill try to figure out how to do that

#

Im doing something probably every moron is doing and trying to predict stock. I've been getting < 0.4-0.5 accuracy constantly and now suddenly i get 0.75

#

I assume im doing something wrong.

grand geyser Apr 4, 2024, 10:34 AM

#

abstract rune Is this right guys ?? ```go package matrix // The constants of linear equation...

Isn't that JavaScript?

#

nvm it is not javascript
it is golang code

#

Guys is this video good enough to learn pytorch?

https://youtu.be/V_xro1bcAuA?feature=shared

YouTube

freeCodeCamp.org

PyTorch for Deep Learning & Machine Learning – Full Course

Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.

✏️ Daniel Bourke developed this course. Check out his channel: https://www.youtube.com/channel/UCr8O8l5cCX85Oem1d18EezQ

🔗 Code: https://github.com/mrdbourke/pytorch-deep-learning
🔗 Ask a question: https://githu...


wooden sail Apr 4, 2024, 10:53 AM

#

abstract rune if the matrix (square) is not singular, then it is guaranteed that the determinant...

the determinant is 0 if the matrix IS singular
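For a square n-by-n matrix, "singular", "determinant equal to 0" and "rank less than n" all describe the same situation. The sketch below illustrates that with exact fractions. The function name `rank_and_det` and the two sample matrices are made up for this example. Note that in the Go snippet above, `NumberOfRows(A) == Rank(A)` is true for a full-rank (non-singular) matrix, so if `Singularity` is meant to report singular matrices the comparison looks inverted.

```python
from fractions import Fraction

def rank_and_det(matrix):
    """Gaussian elimination on a square matrix using exact fractions.

    Returns (rank, determinant). Illustrative sketch, not optimized.
    """
    n = len(matrix)
    rows = [[Fraction(v) for v in row] for row in matrix]
    det = Fraction(1)
    rank = 0
    for col in range(n):
        # Look for a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, n) if rows[r][col] != 0), None)
        if pivot is None:
            det = Fraction(0)  # no pivot in this column: the matrix is singular
            continue
        if pivot != rank:
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            det = -det  # swapping two rows flips the sign of the determinant
        det *= rows[rank][col]
        for r in range(rank + 1, n):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank, det

regular = [[1, 2], [3, 4]]    # rows are independent
singular = [[1, 2], [2, 4]]   # second row is twice the first

print(rank_and_det(regular))   # rank 2, determinant -2: not singular
print(rank_and_det(singular))  # rank 1, determinant 0: singular
```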

abstract rune Apr 4, 2024, 10:54 AM

#

grand geyser Isn't that JavaScript?

Don't disrespect Go 😭

grand geyser Apr 4, 2024, 10:54 AM

#

abstract rune Don't disrespect Go 😭

I didn't disrespect that 💀

#

I thought it was JavaScript till I saw "func"

#

I don't think you will find many people that know golang in the python community..... 💀

grand geyser Apr 4, 2024, 10:55 AM

#

abstract rune Don't disrespect Go 😭

Golang is great for its own use cases

abstract rune Apr 4, 2024, 10:57 AM

#

grand geyser Golang is great for its own use cases

My favourite language😌

grand geyser Apr 4, 2024, 10:58 AM

#

People can have different favourite languages
Someone's favourite language can be java or JavaScript or c or c++ or python etc

jaunty helm Apr 4, 2024, 12:13 PM

#

playing around with polars, anyway to select the columns where the value isn't 0? (without doing it manually ofc)

jaunty helm Apr 4, 2024, 12:32 PM

#

shape isn't preserved but works fine for my case

graceful carbon Apr 4, 2024, 3:44 PM

#

how long does it take to learn to make AI assistant in python

#

??? hello

#

bro is anyone there

serene scaffold Apr 4, 2024, 4:11 PM

#

graceful carbon how long does it take to learn to make AI assistant in python

depends on what you want it to do

serene scaffold Apr 4, 2024, 5:08 PM

#

pydantic questions go in #type-hinting

charred light Apr 4, 2024, 5:43 PM

#

What would be a better way to visualize this data?

agile cobalt Apr 4, 2024, 6:15 PM

#

maybe a pie, sunburst or icicle chart

charred light Apr 4, 2024, 6:32 PM

#

There's 4 dimensions, 3 categorical 1 continuous.

molten forge Apr 4, 2024, 6:40 PM

#

Does anyone have experience using shifted Legendre-Fourier moments for image analysis?

buoyant vine Apr 4, 2024, 10:02 PM

#

Tbh, as much as that would be cool, I suspect what it is actually going to lead to is, some creative new scams (probably) and a lot of spam of automated systems producing low quality content on all the major platforms, i.e. YT, TK, etc...

#

not that it isn't already at that stage (spam wise)

lapis sequoia Apr 5, 2024, 1:38 AM

#

merger datasets. Like mergers and acquisitions datasets

mild grotto Apr 5, 2024, 4:46 AM

#

Ok, I'm having trouble wrapping my head around this problem:
I have a numpy matrix M, and an adjacency matrix A.
I also have a heightmap H. All these are the same dimensions.

I want to calculate a gradient map G using H and A. to figure out the gradient in each adjacent direction.
Then I want to "move" each value in H in the direction of the largest gradient...

quaint loom Apr 5, 2024, 9:41 AM

#

Is there anyone who has used the Mantel test and knows this issue:
ecopy Mantel test "ValueError: Matrix d1 must be a square, symmetric distance matrix"

If so, how did you handle it?

boreal gale Apr 5, 2024, 9:44 AM

#

quaint loom Is there anyone who have used Mantel test and know this issue: ecopy Mantel test...

haven't heard of or used the mantel test before.

have you checked that your matrix is square and symmetric as the error indicated?

quaint loom Apr 5, 2024, 9:49 AM

#

boreal gale haven't heard nor used mantel test before. have you checked that your matrix is...

Yes, the output is just telling me that Matrix d1 must be a square, symmetric distance matrix.

boreal gale Apr 5, 2024, 9:50 AM

#

quaint loom Yes, the output is just telling me that Matrix d1 must be a square, symmetric di...

how have you checked the matrix is square and symmetric?

quaint loom Apr 5, 2024, 9:54 AM

#

boreal gale how have you checked the matrix is square and symmetric?

"if env_distances.shape[0] != env_distances.shape[1] or target_distances.shape[0] != target_distances.shape[1]:
    raise ValueError("Distance matrices are not square.")" for the square check and

"def is_symmetric(matrix):
    return np.allclose(matrix, matrix.T)"
for the symmetry check

boreal gale Apr 5, 2024, 9:59 AM

#

quaint loom "if env_distances.shape[0] != env_distances.shape[1] or target_distances.shape[0...

okay, are you using a jupyter notebook?

#

which mantel test implementation/library are you using?

quaint loom Apr 5, 2024, 10:02 AM

#

boreal gale okay, are you using a jupyter notebook?

I am using VS Code.
ecopy and scipy

boreal gale Apr 5, 2024, 10:02 AM

#

can you turn on some debugger and see where it fails and step in

quaint loom Apr 5, 2024, 10:03 AM

#

boreal gale can you turn on some debugger and see where it fails and step in

I have tried.

#

I will share it once I have finished my dinner. Okay?

boreal gale Apr 5, 2024, 10:03 AM

#

sure

proper timber Apr 5, 2024, 10:06 AM

#

hello everyone

#

i'm a python developer

unkempt yoke Apr 5, 2024, 10:19 AM

#

Hi python developer

quaint loom Apr 5, 2024, 10:45 AM

#

boreal gale okay, are you using a jupyter notebook?

So I think the problem stems from the improper construction of the distance matrices, resulting in matrices that are neither square nor symmetric. The debugging seems to indicate that the matrices are not square and symmetric, which leads to errors during further operations such as the Mantel test.

env_distances = np.zeros((len(area_data), len(area_data)))
target_distances = np.zeros((len(area_data), len(area_data)))

print("Constructing distance matrices...")
for i in range(len(area_data)):
    for j in range(len(area_data)):
        # Compute distances for env_distances
        env_distances[i, j] = np.linalg.norm(area_data[env_columns].iloc[i] - area_data[env_columns].iloc[j])

        target_distances[i, j] = np.linalg.norm(area_data[target_columns].iloc[i] - area_data[target_columns].iloc[j])

print("Distance matrix (env_distances):")
print(env_distances)
print("Distance matrix (target_distances):")
print(target_distances)
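A matrix built by filling every `[i, j]` pair with a Euclidean distance is square by construction, so a failed check usually points at the values rather than the shape. One possible cause, which is an assumption and not confirmed in the chat, is missing values: a NaN in the input propagates into the distances, and a comparison such as `np.allclose(matrix, matrix.T)` treats NaN as unequal to NaN by default. The sketch below uses only the standard library. The helper names and the sample rows are hypothetical.

```python
import math

def distance_matrix(rows):
    """Pairwise Euclidean distances between rows of equal length."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def check_distance_matrix(d, tol=1e-9):
    """Return a list of problems; an empty list means the matrix looks valid."""
    n = len(d)
    if any(len(row) != n for row in d):
        return ["not square"]
    problems = []
    if any(math.isnan(v) for row in d for v in row):
        problems.append("contains NaN")
    if any(abs(d[i][i]) > tol for i in range(n)):
        problems.append("non-zero diagonal")
    if any(abs(d[i][j] - d[j][i]) > tol for i in range(n) for j in range(i)):
        problems.append("not symmetric")
    return problems

clean_rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
rows_with_gap = [(0.0, 0.0), (float("nan"), 1.0)]  # one missing measurement

print(check_distance_matrix(distance_matrix(clean_rows)))     # []
print(check_distance_matrix(distance_matrix(rows_with_gap)))  # ['contains NaN']
```

If the check reports NaN, dropping or imputing the incomplete rows before building the matrices would be the first thing to try.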

hybrid spruce Apr 5, 2024, 10:58 AM

#

I have a large dataset shared with me on Dropbox, but it’s too large to download directly, and I can’t copy it to my own Dropbox either to perform CLI operations. Any ideas on how to access it?

quaint loom Apr 5, 2024, 11:00 AM

#

hybrid spruce I have a large dataset shared with me on Dropbox, but it’s too large to download...

Get yourself a hard disk, change the download location to your hard disk and voilà : )

hybrid spruce Apr 5, 2024, 11:02 AM

#

quaint loom Get yourself a hard-disk, change the download location to your hard-disk and wha...

Haven’t tried that

uncut beacon Apr 5, 2024, 12:32 PM

#

Hello, I have a relatively large dataset that I'm working with for DL using Tensorflow.

Is there a recommended way to select features in my dataset; for example, getting the correlation of each feature if it can help me answer a business question, or how is it correlated to a specific feature that I want to use. The model is expected to have 95% accuracy, so I'm worried about my feature selection.

Any tips how to approach this? Thanks!

toxic mortar Apr 5, 2024, 12:33 PM

#

How is this all one cluster?

#

I use HDBSCAN(min_cluster_size=5, min_samples=2, cluster_selection_epsilon=0.35)

hushed matrix Apr 5, 2024, 1:43 PM

#

Hi, I've been using Python for a few years now, and recently I started studying artificial intelligence (to be more specific, computer vision). Do you usually use Jupyter Notebook or an IDE on your computer?

#

Yeah I'm currently using google colab for small experiments, but for performance do u notice some difference?

toxic mortar Apr 5, 2024, 2:26 PM

#

yes. my bad

#

I thought it was clustering algo

#

⚠️ RANT INCOMING ⚠️

So,

#

The way I interpret human logic is similar to what I've heard in classical philosophy, "induction" and "deduction".

#

I see "induction" as finding the conceptual connections based on the observation of an outcome.

Whereas "deduction" is finding the outcome based on the observation of conceptual connections

Current AI seems very inductive, which might explain some of the issues we've seen:

Some examples might include:

- The algorithm being impressionable, just letting you assert anything to be true.
- Algorithms appearing to struggle with thought experiments with no real world equivalent.
- Representation biases.
- Conceptual contradictions.

#

Do you think the possibility of deductive logic algorithms has been explored?

elder hemlock Apr 6, 2024, 12:08 PM

#

To illustrate, I'll make up two conversations to demonstrate how I think each would work:

#

***Induction:***
```
(Here, the algorithm has no choice but to search for a real world example, which would either be from fictional material, or in this case, other people's answers.)

A : Based on our media representation and reaction to vampires, this must mean we would be scared, and morbidly curious.
```
***Deduction:***
```P : What would humans do if vampires were real?

(At this point, the algorithm would either ask the prompter for clarification, or use induction to establish an understanding.)

(The AI will then assume that all paradoxical or contradictory outcomes are impossible, and remove them from the set.)

A : Assuming that vampires must kill humans to survive, this would limit the set of outcomes to:

1. Vampires depending on humans.
2. Vampires and humans being co-dependent.
3. Humans killing vampires.
4. Vampires killing all humans, and dying.
5. Vampires not killing humans, and dying.```

#

My speculation is that inductive algorithms would use knowledge graphs that store event data and connect the events (like existing AI)
- More memory intensive, less process intensive
- More perceptive of the real world
- Struggles with thought experiments and abstract logic
- Predisposed to popular culture and convention
And that deductive algorithms would use a Markov chain to map the abstract concepts.
- Less memory intensive, more process intensive
- Limited to theoretical thought
- Capable of speculating on scenarios it has not encountered yet
- Detached from perceiving the material world
- Predisposed to building internally defined principles.

#

If this is true, I argue that they'd both serve as the duality of a human, and the key to its simulation.

rapid hill Apr 6, 2024, 1:58 PM

#

hello, does anyone know about fbprophet module

jaunty helm Apr 6, 2024, 2:00 PM

#

rapid hill hello, does anyone know about fbprophet module

ask your problem directly, that way someone that knows (I am not one of them) can answer without having to ask you back

rapid hill Apr 6, 2024, 2:06 PM

#

jaunty helm ask your problem directly, that way someone that knows (I am not one of them) ca...

I'm trying to run a stock predictor, but I couldn't install this module in Jupyter. I don't know what the problem is. Has this module been removed, or does it no longer work?

#

I'm asking in general: does anyone have information about this module?

#

I'm also looking to hire a Python developer for finance. If anyone is interested, don't hesitate to message me privately.

#

Excuse my English, I'm not good at it.

elder hemlock Apr 6, 2024, 2:24 PM

#

Sorry for that rambling!
I get what you mean, and I agree an artificial thinking algorithm would compare fully to us.

#

These ideas keep coming back frequently, and I keep seeing it around me.

#

I'm tempted to believe that the machine can unlock a form of thinking that can be purely enclosed in a think tank. Undisturbed by physical mysteries.

#

And I sometimes dip into this when thinking of ways to design AI for games.

#

I came up with a design concept I'm still trying to implement. Where instead of the AI using detection to perceive the world, they can look at the game's programming, and make decisions based on object oriented relationships.

#

Whereas those machines learn from the outcome of a mutating technique, the theory behind what I'm suggesting,

#

Is that the robot has a preconception of how object types interact, and then it can decide based on that.

#

So for example, if a game npc sees in the code that it's possible for the player to kill them, then it might take measures to avoid or confront the player, depending on which one is more likely to succeed based on how the code looks.

#

It's a little like "static analysis", and I think it's a way to create an adaptive AI with good hindsight.

serene scaffold Apr 6, 2024, 2:57 PM

#

pearl ocean My main question I was originally intending to ask is if anyone has a good idea ...

just use pytorch

bleak gate Apr 6, 2024, 3:06 PM

#

elder hemlock So for example, if a game npc sees in the code that it's possible for the player...

that sounds beautiful and terrifying at the same time

elder hemlock Apr 6, 2024, 3:09 PM

#

bleak gate that sounds beautiful and terrifying at the same time

I discovered this theory when brainstorming how to make a game where players change the rules and create new concepts within it.

#

Like, imagine that you could create your own sword with code.

#

This algorithm might lend itself to measuring and finding efficient courses of action, which can be applied to debugging, difficulty scaling, and generated player advice.

#

In my game project, I intend to use this to give me a metric of a fair design, so I can keep all player creations balanced.

bleak gate Apr 6, 2024, 3:13 PM

#

dAamnnnnnnnnnnnnnnnnnnnnnnnnn

#

that sounds phenomenal

#

what language are you gonna use to code it

elder hemlock Apr 6, 2024, 3:14 PM

#

Uh, well I'm using python to make a proof of concept, and if I ever finish it, I might show this to people who are interested, or just take it into a different language.

bleak gate Apr 6, 2024, 3:15 PM

#

sounds fair

#

so it wont have levels to it

#

right ?

elder hemlock Apr 6, 2024, 3:18 PM

#

For the prototype, my goals are to make:

- A modest scripting system, where you can write an object behavior
- An algorithm to apply this analysis technique
- A procedure to reject or accept a player design (optionally, a procedure that could edit a design to suit the fairness requirements)
- A sandbox enclosure where objects will come into existence

#

(This won't have graphics, I'll use text)

#

If it works, then it might transfer into real games.

bleak gate Apr 6, 2024, 3:20 PM

#

i see

#

super interesting

elder hemlock Apr 6, 2024, 3:21 PM

#

I occasionally rant about this idea, because it does seem crazy.

bleak gate Apr 6, 2024, 3:21 PM

#

hell naw

elder hemlock Apr 6, 2024, 3:21 PM

#

I get the feeling I'm confusing people though

bleak gate Apr 6, 2024, 3:21 PM

#

chatgpt wouldve been a good laugh in 2025

#

2015*

#

look at it now

#

apple vision

#

all of it

elder hemlock Apr 6, 2024, 3:23 PM

#

Since the idea is difficult to explain, I've been working in a vacuum, and I wonder if the idea is actually new or not.

toxic mortar Apr 6, 2024, 3:59 PM

#

Why do I get this error:```
Batches: 100%|##############################################################################################################################################################################################################################################################9| 13609/13612 [08:24<00:00, 27.17it/s]
Batches: 100%|###############################################################################################################################################################################################################################################################| 13612/13612 [08:24<00:00, 26.98it/s]
2024-04-06 16:54:38,324 - BERTopic - Dimensionality - Completed ✓
2024-04-06 16:54:38,330 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-04-06 16:55:02,045 - BERTopic - Cluster - Completed ✓
[2024-04-06 16:55:02,534: ERROR/MainProcess] Task econ.api.queries.features.process_new_files_topic[fb957415-f2a8-4f4d-a4b3-71c995615de8] raised unexpected: ValueError('empty vocabulary; perhaps the documents only contain stop words')
app\venv\Lib\site-packages\sklearn\feature_extraction\text.py", line 1295, in _count_vocab
raise ValueError(
ValueError: empty vocabulary; perhaps the documents only contain stop words
```


I am using BERTopic and I preprocess docs with nltk.tokenize.sent_tokenize. What can this be? 

Is my preprocessing the problem, or clustering algo?
This is my topic model

serene scaffold Apr 6, 2024, 4:04 PM

#

please don't post screenshots of text

#

!code

arctic wedgeBOT Apr 6, 2024, 4:04 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

toxic mortar Apr 6, 2024, 4:08 PM

#

import openai
import tiktoken
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance, OpenAI
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer

# OPENAI_API_KEY and embedding_model are defined elsewhere in the module.
def create_model():
    # Clustering step
    hdbscan_model = HDBSCAN(min_cluster_size=20, metric='euclidean', cluster_selection_method='eom', prediction_data=True)
    # Topic representations: a main one plus two extra aspects
    main_representation = KeyBERTInspired()
    client = openai.OpenAI(api_key=OPENAI_API_KEY)
    aspect_model1 = [KeyBERTInspired(top_n_words=45), MaximalMarginalRelevance(diversity=0.7)]
    tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo")
    prompt = """
             You are a helpful, respectful and honest assistant for labeling topics.
             I have a topic that contains the following documents:
             [DOCUMENTS]

             Based on the information above, extract a short but highly descriptive topic label of at most 3 or 4 words. Be precise. Make sure it is in the following format:
             topic: <topic label>
             """
    aspect_model2 = OpenAI(client, model="gpt-3.5-turbo", exponential_backoff=True, chat=True, prompt=prompt,
                           tokenizer=tokenizer, diversity=0.75)
    representation_model = {"Main": main_representation, "Aspect1": aspect_model1, "Aspect2": aspect_model2}
    # Bag-of-words step: this is where sklearn raises "empty vocabulary"
    vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(1, 2))
    topic_model = BERTopic(hdbscan_model=hdbscan_model, embedding_model=embedding_model,
                           representation_model=representation_model,
                           vectorizer_model=vectorizer_model,
                           language='english', calculate_probabilities=True, verbose=True)
    return topic_model
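
A small sketch that reproduces the same scikit-learn error in isolation and shows one way to check which documents survive the vectorizer settings used above. The sample sentences are made up, and the log does not confirm that this is what happens inside the pipeline, so treat it as a diagnostic idea rather than a diagnosis.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Same vectorizer settings as in create_model().
vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
analyzer = vectorizer.build_analyzer()  # lowercases, tokenizes, drops stop words, builds n-grams


def has_vocabulary(doc: str) -> bool:
    """True if at least one token is left after stop-word removal."""
    return len(analyzer(doc)) > 0


# Hypothetical sentences, e.g. the kind of short fragments sent_tokenize can produce.
docs = ["It is.", "And so on.", "Inflation rose sharply in March."]
kept = [d for d in docs if has_vocabulary(d)]
print(kept)

# Fitting on text that contains only stop words raises the error from the traceback.
try:
    CountVectorizer(stop_words="english").fit(["It is.", "And so on."])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Running `has_vocabulary` over the sentence-level documents before fitting would show how many of them are empty once stop words are removed.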

lapis sequoia Apr 6, 2024, 4:13 PM

#

@serene scaffold what should i start learning first for Data Analysis?

serene scaffold Apr 6, 2024, 4:13 PM

#

lapis sequoia <@253696366952316929> what should i start learning first for Data Analysis?

!resources data science

arctic wedgeBOT Apr 6, 2024, 4:13 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

hasty furnace Apr 6, 2024, 5:56 PM

#

serene scaffold !resources data science

For data analysis, you can start with SQL.
Afterwards, if you want to move to data science and ML, you'll need to start with Python and stats.

white crown Apr 6, 2024, 7:33 PM

#

I have this function. data_frame is a dictionary of dataframes which I am reading in from an Excel file with multiple sheets. I am trying to add a few columns ONLY to the dataframe in a sheet called "Signaling_Port", leaving the rest of the dictionary of dataframes alone. When I attempt to print the entire df I get "ValueError: If using all scalar values, you must pass an index". I have been at this for a while now. I have attempted to reset the index, set the index to 0, and even put the return in a list, "pd.DataFrame([data_dict])", and can't seem to figure out the correct syntax. Please help.

def add_data_Signaling_Port(data_frame):
    updated_data_dict = {"zone_new": ["data1", "data2"], 
                         "Global1_new": ["new_data1", "new_data2"], 
                         "test_new": ["test_data1"]}
    
    for key, values in updated_data_dict.items():
        data_frame["Signaling_Port"].loc[0, key] = values[0] if values else None
    
    return pd.DataFrame(data_frame)
   
data_frame = add_data_Signaling_Port(data_frame)
print(data_frame)
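
One possible fix, sketched under assumptions: the likely culprit is the `pd.DataFrame(data_frame)` call, which tries to build a single DataFrame out of the whole dictionary of sheets. Returning the dictionary itself avoids that. The sheet contents below are hypothetical stand-ins for what `pd.read_excel(path, sheet_name=None)` would return.

```python
import pandas as pd


def add_data_signaling_port(sheets):
    """Add new columns to the "Signaling_Port" sheet only and return the dict of sheets."""
    updated_data = {
        "zone_new": ["data1", "data2"],
        "Global1_new": ["new_data1", "new_data2"],
        "test_new": ["test_data1"],
    }
    port = sheets["Signaling_Port"].copy()
    for column, values in updated_data.items():
        # Write only the first value into row label 0, as the original loop does.
        port.loc[0, column] = values[0] if values else None
    sheets["Signaling_Port"] = port
    return sheets  # still a dict of DataFrames, not wrapped in pd.DataFrame(...)


# Hypothetical stand-in for the dict read from the Excel file.
sheets = {
    "Signaling_Port": pd.DataFrame({"port": [5060, 5061]}),
    "Other": pd.DataFrame({"x": [1]}),
}
sheets = add_data_signaling_port(sheets)

# Print each sheet separately instead of printing one combined frame.
for name, frame in sheets.items():
    print(name)
    print(frame)
```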

elder hemlock Apr 6, 2024, 8:56 PM

#

I feel like any analogy I could give might complicate things.

#

I'll think.

#

I don't know any real examples of this, unfortunately.

#

:_

desert oar Apr 6, 2024, 10:39 PM

#

elder hemlock I came up with a design concept I'm still trying to implement. Where instead of ...

This is definitely an accepted technique

#

You mean something like, giving the AI access to the actual game state and mechanics? Definitely not a new idea

#

It's not like the OpenAI Dota bot is using computer vision to parse the screen

#

It's a question of what you actually want to achieve

iron basalt Apr 6, 2024, 10:56 PM

#

desert oar You mean something like, giving the AI access to the actual game state and mecha...

Giving the AI direct access to the game state is a much easier problem, especially if it's the whole game state (no hidden state). It's a starting point if you want to start with something a lot easier.

#

But note that if you are comparing it to human performance, it's apples and oranges.

terse frigate Apr 7, 2024, 2:07 AM

#

HI guys... i am trying to learn MLOps.. but unfortunately i study on a macbook.. i would like to know if there are any cheap clouds available out there which i can experiment and learn hands-on ... open to any suggestions. Cheers! 😄

serene scaffold Apr 7, 2024, 2:27 AM

#

terse frigate ### HI guys... i am trying to learn MLOps.. but unfortunately i study on a macbo...

are you a university student?
in either case, docker is a pretty useful skill for MLops, and you can experiment with that on your macbook.

terse frigate Apr 7, 2024, 2:31 AM

#

serene scaffold are you a university student? in either case, docker is a pretty useful skill fo...

I am a university student yes

#

I am learning docker

#

I am very lost tho

serene scaffold Apr 7, 2024, 2:31 AM

#

terse frigate I am a university student yes

you might see if your university has a cloud where they could give you some space. otherwise, I think AWS will give students some credits to experiment.

terse frigate Apr 7, 2024, 2:32 AM

#

My uni has a cluster yes

#

But I graduated so I don’t think I can use that anymore

serene scaffold Apr 7, 2024, 2:32 AM

#

so you're not a university student

terse frigate Apr 7, 2024, 2:32 AM

#

Well

serene scaffold Apr 7, 2024, 2:33 AM

#

being a university graduate is not the same as being a university student.

terse frigate Apr 7, 2024, 2:33 AM

#

I finished my course last month haha i haven’t received the degree yet

#

I still have access to uni email and all that but not sure for how long

serene scaffold Apr 7, 2024, 2:33 AM

#

I see

#

well, I would try enrolling in AWS's student thing ASAP, then

terse frigate Apr 7, 2024, 2:34 AM

#

Oh

#

Ok ok I’ll do that

#

Also can you suggest some learning materials for docker for MLops on Mac?

serene scaffold Apr 7, 2024, 2:34 AM

#

That aside, I think you're probably smart to be focusing on MLops.

terse frigate Apr 7, 2024, 2:35 AM

#

serene scaffold That aside, I think you're probably smart to be focusing on MLops.

You’re very very correct

serene scaffold Apr 7, 2024, 2:35 AM

#

terse frigate Also can you suggest some learning materials for docker for MLops on Mac?

not really. it should be basically the same as doing it on windows or linux.

terse frigate Apr 7, 2024, 2:35 AM

#

It’s a niche

#

Isn’t it?

serene scaffold Apr 7, 2024, 2:36 AM

#

terse frigate It’s a niche

Just that I think a lot of students/pre-career people have their heart set on ML, but they don't realize that it's all theoretical math that they won't like.

terse frigate Apr 7, 2024, 2:36 AM

#

Well, I kinda got interested because I found myself very lost and overwhelmed by MLops at my internship

serene scaffold Apr 7, 2024, 2:37 AM

#

what were you trying to do during your internship that made you feel lost

terse frigate Apr 7, 2024, 2:37 AM

#

I was working for a startup that needed me to deploy my code on bare metal

serene scaffold Apr 7, 2024, 2:38 AM

#

so you had to start with a machine that didn't have an OS installed, or what?

terse frigate Apr 7, 2024, 2:38 AM

#

Yep

#

I could not do it

#

I had absolutely no clue or direction

serene scaffold Apr 7, 2024, 2:38 AM

#

weird that they asked you to do that, and not the person who bought the hardware

terse frigate Apr 7, 2024, 2:39 AM

#

Also they told me I’d be working under someone

#

But it was just me doing everything

serene scaffold Apr 7, 2024, 2:39 AM

#

terse frigate Also they told me I’d be working under someone

but they killed that person?

terse frigate Apr 7, 2024, 2:39 AM

#

From fetching data to training to deploying

terse frigate Apr 7, 2024, 2:40 AM

#

serene scaffold but they killed that person?

They just kept saying they’re busy in another project

iron basalt Apr 7, 2024, 2:40 AM

#

I would not expect a beginner to be able to do anything in that situation. Sounds like you were in a strange situation.

terse frigate Apr 7, 2024, 2:40 AM

#

iron basalt I would not expect a beginner to be able to do anything in that situation. Sound...

For a very very long time I got overwhelmed and kept blaming myself and was too scared to ask for help

iron basalt Apr 7, 2024, 2:41 AM

#

terse frigate For a very very long time I got overwhelmed and kept blaming myself and was too ...

This is why NASA works the way it does, everyone (and I mean everyone) is assigned a mentor that they can always refer to. It's part of their setup that you are expected to have questions and not know things.

terse frigate Apr 7, 2024, 2:41 AM

#

Yeah and didn’t pay me for the last 2 months I worked

#

Because I “didn’t deliver”

serene scaffold Apr 7, 2024, 2:42 AM

#

iron basalt This is why NASA works the way it does, everyone (and I mean everyone) is assign...

who's the mentor for the person at the very top? themselves, or the president?

terse frigate Apr 7, 2024, 2:42 AM

#

They even said they gonna hire another senior to help me with everything but that never happened lol

iron basalt Apr 7, 2024, 2:43 AM

#

serene scaffold who's the mentor for the person at the very top? themselves, or the president?

IIRC top gets assigned "mentors" still, they don't need to be above, just anyone at all that you are supposed to go to.

#

So you never feel completely stuck.

serene scaffold Apr 7, 2024, 2:44 AM

#

terse frigate They even said they gonna hire another senior to help me with everything but tha...

what else did they ask you to do, out of curiosity?

terse frigate Apr 7, 2024, 2:45 AM

#

serene scaffold what else did they ask you to do, out of curiosity?

So I had to code a script to fetch data from some API

iron basalt Apr 7, 2024, 2:45 AM

#

iron basalt IIRC top gets assigned "mentors" still, they don't need to be above, just anyone...

They really just want you to communicate your problems.

terse frigate Apr 7, 2024, 2:45 AM

#

Parse it from JSON to tabular

serene scaffold Apr 7, 2024, 2:45 AM

#

iron basalt They really just want you to communicate your problems.

do you also get assigned a therapist?

terse frigate Apr 7, 2024, 2:46 AM

#

Then train that table on a QA model

iron basalt Apr 7, 2024, 2:46 AM

#

serene scaffold do you also get assigned a therapist?

I don't think so, that would be next level for a company with a giant budget.

terse frigate Apr 7, 2024, 2:46 AM

#

Design pipelines for CICD

#

deploy all that on metal

iron basalt Apr 7, 2024, 2:46 AM

#

But I suppose it kind of is, so you don't sit there frustrated.

serene scaffold Apr 7, 2024, 2:47 AM

#

terse frigate Design pipelines for CICD

that's a lot to put on an intern with no support. sounds like they're very dysfunctional.

terse frigate Apr 7, 2024, 2:48 AM

#

serene scaffold that's a lot to put on an intern with no support. sounds like they're very dysfu...

I just don’t want to blame it on anyone and use it as a learning opportunity; hence, trying to learn MLOps 😅

#

Following Andrew Ng on coursera

serene scaffold Apr 7, 2024, 2:49 AM

#

terse frigate I just don’t want to blame it on anyone and use it as a learning opportunity; he...

You can always blame other people. And frankly, you always should. Taking personal accountability is for the weak.

terse frigate Apr 7, 2024, 2:50 AM

#

serene scaffold You can always blame other people. And frankly, you always should. Taking person...

Hahahaha

raw mortar Apr 7, 2024, 4:28 AM

#

terse frigate Well, I kinda got interested because I found myself very lost and overwhelmed by...

Are you supposed to implement a well defined strategy or define it yourself?

#

An intern defining an mlops strategy doesn't sound right to me

#

This is pretty good. It's a big ad for AWS products, but each one can be replaced with product xyz.
https://youtu.be/UnAN35gu3Rw

YouTube

AWS Events

AWS Summit ANZ 2022 - End-to-end MLOps for architects (ARCH3)

Learn how to design an end-to-end machine learning architecture, one step at a time, graduating from a simple model deployment to a complex multi-model strategy. This session aims to help architects working with data scientists and machine learning engineers to implement machine learning use cases. Prior knowledge of and experience with core AWS...

▶ Play video

orchid forge Apr 7, 2024, 5:44 AM

#

@serene scaffold hey

orchid forge Apr 7, 2024, 6:13 AM

#

what is wrong with my code here?

#

can someone please help me

arctic wedgeBOT Apr 7, 2024, 7:16 AM

#

:incoming_envelope: :ok_hand: applied timeout to @scenic pier until <t:1712474799:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

latent parcel Apr 7, 2024, 8:16 AM

#

keep feeding the fire stelercus, keep feeding it

orchid forge Apr 7, 2024, 8:46 AM

#

latent parcel keep feeding the fire stelercus, keep feeding it

Wym
Am i doing anything wrong?

orchid forge Apr 7, 2024, 8:47 AM

#

latent parcel keep feeding the fire stelercus, keep feeding it

Why are you being so mean?

terse frigate Apr 7, 2024, 9:26 AM

#

orchid forge what is wrong with my code here?

Can you show the error

elder hemlock Apr 7, 2024, 9:55 AM

#

desert oar You mean something like, giving the AI access to the actual game state and mecha...

I see, are you able to give any examples?

#

I'm imagining an algorithm that can read game code, and do a static analysis on it to generate some behaviors for the npc.

#

Since game code is a static concept relationship map.

#

The benefit of this also being that if this game code evolves as the consequence of players creating new features in a sandbox style environment, the NPC should be able to understand this better?

#

Gradient descent?

#

For reference, this is supposed to be my proposal for a zero-learning model. It, uh, just "knows"?

#

At least that's how I think it works.

#

Yeah, that's closer to what I'm thinking.
Imagine you take a game script, turn it into a knowledge graph, and use that as the basis for an NPC's behaviors?

#

This knowledge graph may also include properties relating to a "Markov chain" I think?

#

So, where exactly does this hypothetical algorithm have to do any learning?

#

#data-science-and-ml message

thorny lodge Apr 7, 2024, 10:08 AM

#

Guys, I'm building an app that estimates the effort, time, and cost of software development projects, and I'm trying to leverage ML for it. I'm looking for a pre-trained model that I can use for this task. Where would be a good place to find one? I tried the Hugging Face website and TensorFlow Hub, but I haven't used them before, so I'm not sure where to look or what exactly to search for. Kinda confused, I'd appreciate a little guidance ❤️

elder hemlock Apr 7, 2024, 10:13 AM

#

Maybe I'll put this example into steps:

1. There is a hypothetical sandbox-style game where players may insert new custom programs to create new entities.
2. There is a generic enemy, whose goal is to stop the player.
3. The player designs a new object that has mechanics relevant to the enemy's efficiency.
4. My proposal for a solution to this is to take the player's custom mechanic and add it into a knowledge graph / Markov chain.
5. The enemy can then do an analysis on this data structure to ascertain the best course of action, either generally or in a specific scenario.
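
A toy sketch of the steps above, under heavy assumptions: the "knowledge graph" is a plain directed graph of "X can apply an action to Y" rules, the analysis is a breadth-first search, and no probabilities are modeled, so this is not a Markov chain. All entity names, actions, and the decision rule are hypothetical.

```python
from collections import deque

# Hypothetical rule graph: each edge means "source can apply action to target".
rules = {
    "enemy": [("pick_up", "club")],
    "club": [("damage", "player")],
    "player": [],
}


def add_rule(graph, source, action, target):
    """Register a (possibly player-made) mechanic in the rule graph."""
    graph.setdefault(source, []).append((action, target))


def shortest_chain(graph, start, goal):
    """Breadth-first search; returns the shortest list of (source, action, target) edges, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for action, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, action, target)]))
    return None


def decide(graph):
    """Enemy policy: confront unless the player can reach the enemy in fewer steps."""
    threat = shortest_chain(graph, "player", "enemy")
    attack = shortest_chain(graph, "enemy", "player")
    if threat is None:
        return "confront"
    if attack is None or len(threat) < len(attack):
        return "avoid"
    return "confront"


before = decide(rules)  # the player has no way to affect the enemy yet

# The player scripts a new object: a spell that affects the enemy directly.
add_rule(rules, "player", "cast_spell", "enemy")
after = decide(rules)

print(before, after)
```

Nothing here is learned from play: the decision changes only because the rule graph changed, which is the "deductive" behavior the steps describe.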

#

This solution could be described as "artificial intelligence", but at the same time, does not require a learning process, other than step 4.

#

And is my example for a program that utilizes the "deductive" thinking method, instead of an "inductive" one.

#

I guess, it has rules?