#data-science-and-ml | Python | Page 130

past meteor Jun 26, 2024, 6:44 AM

#

80 % of what I did in the course "fundamentals of AI" was pathfinding and search

spring field Jun 26, 2024, 6:44 AM

#

cuz like the machine was actually involved in figuring out those conditionals

past meteor Jun 26, 2024, 6:44 AM

#

https://onderwijsaanbod.kuleuven.be/2023/syllabi/e/H02A0AE.htm#activetab=doelstellingen_idp3137744

#

^ the content may interest you

#

have deep knowledge and insight into fundamental techniques from Artificial Intelligence, including: basic search methods, heuristic search methods, optimal path search methods, game tree search techniques, constraint solving techniques, planning techniques and markov decision processes

spring field Jun 26, 2024, 6:46 AM

#

I am actually in the process™️ of writing a pathfinding library in C (for Python)

past meteor Jun 26, 2024, 6:46 AM

#

Nice, that could be considered an AI algorithm in the right context 😄

#

https://onderwijsaanbod.kuleuven.be/syllabi/e/H02C1AE.htm#activetab=doelstellingen_idp2287408 Machine Learning and Inductive Inference

[...] the domain of machine learning, which concerns techniques to build software that can learn how to perform a certain task (or improve its performance on it) by studying examples of how it has been accomplished previously, and in a broader sense the discovery of knowledge from observations (inductive inference).

past meteor Jun 26, 2024, 6:48 AM

#

past meteor <https://onderwijsaanbod.kuleuven.be/syllabi/e/H02C1AE.htm#activetab=doelstellin...

The course was overly academic and had longer more precise definitions of ML in the syllabus but you can see how conditionals don't pass the litmus test of "machine learning" but decision trees do
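To make that distinction concrete, here is a minimal sketch. The toy data and the names `hand_written_rule` and `fit_stump` are illustrative assumptions, not something from the course. A hard-coded conditional has its threshold chosen by the programmer, while a one-level decision tree (a "stump") picks its threshold from labelled examples.

```python
# Illustrative only: toy data and function names are assumptions.

def hand_written_rule(x):
    # The programmer chose the threshold 5; nothing is learned from data.
    return x > 5


def fit_stump(xs, labels):
    """Learn a threshold t so that `x > t` misclassifies as few examples as possible."""
    candidates = sorted(set(xs))
    best_t, best_errors = None, len(xs) + 1
    # Try a threshold halfway between each pair of neighbouring values.
    for low, high in zip(candidates, candidates[1:]):
        t = (low + high) / 2
        errors = sum((x > t) != y for x, y in zip(xs, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


xs = [1, 2, 3, 7, 8, 9]
labels = [False, False, False, True, True, True]
threshold = fit_stump(xs, labels)
print(threshold)  # 5.0 for this toy data
```

Both end up as `x > something`, but only the second derives the rule from examples, which is the part the syllabus definition cares about.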

spring field Jun 26, 2024, 6:50 AM

#

I think I see, yes

iron basalt Jun 26, 2024, 6:52 AM

#

I recommend AAA instead of AI when you want to describe something that most people would probably imagine to be "AI." Autonomous Adaptive Agents. Autonomous: no human intervention, it can operate/survive on its own in either the real world or a virtual world (e.g. a game). Adaptive: it learns / adjusts to achieve what it needs to, constantly trying to improve. Agent: agent as in game theory agent, an "entity that always aims to perform optimal actions based on given premises and information." Note that it must take actions with consequences, it can't just classify stuff or something like that (without making use of that classification for an action). Most things being advertised as "AI" do not fall under this definition, they don't have the required design goals. They are just tools. The end goal of what is being made is pretty important.

past meteor Jun 26, 2024, 6:53 AM

#

Then you immediately get into the strong/weak AI debate

#

this works for strong AI but not for weak/narrow

iron basalt Jun 26, 2024, 6:54 AM

#

You can go off of feeling with this, when people imagine "AI" they think of something that feels like an animal, not a tool. It operates on its own, achieving its goals.

#

(Often a human specifically because we lack creativity it seems (in writing and such))

#

The distinction is in what you are trying to make, not what we currently have. OpenAI is not trying to make AI, it wants to make a tool that can replace certain jobs (in theory, it won't, it's a scam).

past meteor Jun 26, 2024, 6:57 AM

#

That's a different discussion altogether

#

Personally I don't really care about making my own definitions of AI

#

I'm just applying the already existing conventions from important literature out there

iron basalt Jun 26, 2024, 6:58 AM

#

I just want people to not use AI when they are not really making AI (it's not their goal), it just obfuscates what they are making. It's like saying "i'm selling you a thing." "Have you not heard, everybody wants thing these days!"

past meteor Jun 26, 2024, 6:59 AM

#

they are not making AI based on your specific definition

iron basalt Jun 26, 2024, 6:59 AM

#

But then what are they making?

#

Their definition is that it's what they are making, which is also now in conflict with other companies saying "it's what we are making." Despite being very different things.

past meteor Jun 26, 2024, 7:00 AM

#

Norvig's book on AI is as good as it gets as a reference right? Especially since it predates all of the hype https://en.wikipedia.org/wiki/Artificial_Intelligence:_A_Modern_Approach

#

All I'm saying is, go in there. Find the definition they use of AI, apply it to whatever OpenAI is making and then the logical conclusion is "they're making AI"

#

If you were Peter Norvig and you wrote this book in 95 and wrote your definition of AI in this book I'd agree with you. That's how much I care about this discussion (close to zero). It's just about applying convention for me. Without convention, if everyone has their own definitions, it is impossible to have a discussion

iron basalt Jun 26, 2024, 7:11 AM

#

past meteor If you were Peter Norvig and you wrote this book in 95 and wrote your definition...

Yeah, my definition it is then. "In this book, we adopt the view that intelligence is concerned mainly with rational action. Ideally, an intelligent agent takes the best possible action in a situation. We study the problem of building agents that are intelligent in this sense." - Norvig, page 30.

#

He brings up the other definitions and lands on this one.

#

Throughout history this has always been the goal, agents, this current use of "AI" is recent, and meaningless.

#

The current common "AI" may be part of actual AI, but it's not on its own, because that is not the end goal.

#

Probably the largest distinction being whether it's autonomous and an agent.

past meteor Jun 26, 2024, 7:17 AM

#

iron basalt Yeah, my definition it is then. ```In this book, we adopt the view that intellig...

This is vague enough that an "agent" and a "situation" could be simply deciding if someone gets a loan or not

#

I appreciate the effort for going into the book

iron basalt Jun 26, 2024, 7:17 AM

#

past meteor This is vague enough that an "agent" and a "situation" could be simply deciding ...

That counts. If it takes actions autonomously.

#

It does not need to be very good at it either.

past meteor Jun 26, 2024, 7:18 AM

#

So if you make an endpoint with your model on it and it's part of the loan application process it counts?

iron basalt Jun 26, 2024, 7:19 AM

#

past meteor So if you make an endpoint with your model on it and it's part of the loan appli...

Yup. If it's an agent that tries to make the best decision. This is a game it exists in (a game as in game theory).

#

And my additional "autonomous" means that it does not require a human to approve the actions.

#

I also have adaptive in mine too, so it needs to keep learning.

#

But without those two added things, still much closer to what I consider to be an ok definition of "AI."

#

Norvig's is solid.

#

It's the same basically, game theory, agents.

#

Which is why the definition makes more sense in a sense, since in biology we are always taking actions and such as an agent.

#

There is no classifier there that just does that.

#

I would not consider it AI, but that does not make it useless or whatever, very much the opposite, it makes sense to make tools, not a whole rational agent.

#

Yeah, then there are safety issues.

#

Although the tools being made are destructive for other reasons.

past meteor Jun 26, 2024, 7:28 AM

#

ML is a safety hazard as is unless we massively constrain it for medium risk tasks

iron basalt Jun 26, 2024, 7:28 AM

#

But not like, "wow I have this tank that goes around on its own and tries to deal as much damage as possible and even refuels itself, etc" levels of potential destruction.

#

Although who knows, maybe unraveling society via spam and such actually is worse...

past meteor Jun 26, 2024, 7:30 AM

#

We have our world which we condense into an optimization problem that the algorithm needs to minimize. Big alignment problem. Also worse considering it tries to learn "the easy way out" (overfitting)

#

Classic example is if you naively train an ML model to reduce the amount of people with disease X to 0, you as the implementer think you're prodding it to find a cure but the algorithm probably will arrive at "eliminate all those people"

iron basalt Jun 26, 2024, 7:33 AM

#

So in conclusion, I don't like the current use of "AI" and I don't think it's something they even want to make. And selling it as such on everything does not even make sense, because it's selling "thing" instead of "useful tool." (just be forward about what it's trying to be, it's fine, I prefer you did not try to make AI, but useful tools)

iron basalt Jun 26, 2024, 7:34 AM

#

iron basalt So in conclusion, I don't like the current use of "AI" and I don't think it's so...

(not even in the loan case for example, because I don't like that either (it must be human approved))

#

In some ways this can be even worse, as in cases similar to the loan example you brought up, because it being worse might make it more destructive. The key is whether or not it can take actions on its own. Not SOTA can still be AI.

#

If we regulate AI based on it being SOTA as the definition, it does not fix the actual problem.

#

(Which is why the compute limit stuff coming up now is nonsense (just monopoly stuff))

#

(What we also really want to target is all this spam that is still allowed, things that can take actions that can ruin people's lives (autonomously), e.g. the youtube algorithm just auto demonetizing everything you have and/or banning you (there are worse agents being used already, but this is a less gruesome example so I used this one))

past meteor Jun 26, 2024, 8:18 AM

#

iron basalt In some ways this can be even worse, as in cases similar to the loan example you...

Yeah this makes a lot of sense. The AI act will solve this all /s

unkempt apex Jun 26, 2024, 11:00 AM

#

hey @rich moth
need help ,.
just review my RL code

#

and tell do I need some improvements in it or not!

lapis sequoia Jun 26, 2024, 12:12 PM

#

Any good technical podcasts on machine learning and neural networks

#

Like discussing new optimizers or architectures etc

#

Adan came out in 2022 has anyone made anything better

#

I don't even know where to look for it

agile cobalt Jun 26, 2024, 12:24 PM

#

lapis sequoia Adan came out in 2022 has anyone made anything better

if you mean the Adam optimizer, it is much older than 2022? https://arxiv.org/pdf/1412.6980

vernal valve Jun 26, 2024, 12:30 PM

#

Anyone know how to run ollama downloaded models on vllm?

#

Or does vllm only accept huggingface models

lapis sequoia Jun 26, 2024, 12:31 PM

#

agile cobalt if you mean the Adam optimizer, it is **much** older than 2022? https://arxiv.or...

i mean adan which is adaptive nesterov momentum

#

https://arxiv.org/abs/2208.06677

#

its the last i can find that is sufficiently different from adam and works good

shadow veldt Jun 26, 2024, 1:07 PM

#

So I have a Keras image classification model, and i was wondering if instead of training it overall for new classes. I can perhaps fit new classes using transfer learning? If so, can someone refer me to some docs of some kind. Mucho Gracias.

hexed crest Jun 26, 2024, 1:24 PM

#

My excel dataframe is giving me a headache since it is changing my input so the format is not the same in all cells for time

#

my format is supposed to be hour : minute : seconds

#

but some of the cells remove the seconds automatically

#

as you can see the input is up in the formula field but it does not match what is displayed in the cell?

agile cobalt Jun 26, 2024, 1:27 PM

#

step 1: Do not use Excel

#

that should be configurable under Home -> Number though, just change the format

hexed crest Jun 26, 2024, 1:35 PM

#

agile cobalt step 1: Do not use Excel

why not? how would I finish my project otherwise? if the people that will use it downloads the data as either csv or xlsx?

hexed crest Jun 26, 2024, 1:38 PM

#

agile cobalt that should be configurable under `Home -> Number` though, just change the forma...

this is just a dummy dataframe

spring field Jun 26, 2024, 1:50 PM

#

hexed crest why not? how would I finish my project otherwise? if the people that will use it...

what is your project?

hexed crest Jun 26, 2024, 1:58 PM

#

spring field what is your project?

It is a project to help some colleagues do calculations for heat treatment of steel

worldly wagon Jun 26, 2024, 2:03 PM

#

not a big or important question but just came to mind if anyone knows (i'm conducting my own research on the side currently)

are there faster ways to read a csv than pandas built in read_csv method?

agile cobalt Jun 26, 2024, 2:07 PM

#

worldly wagon not a big or important question but just came to mind if anyone knows (i'm condu...

you can try other libraries like polars

narrow tiger Jun 26, 2024, 2:09 PM

#

worldly wagon not a big or important question but just came to mind if anyone knows (i'm condu...

yes u convert it to a better format

worldly wagon Jun 26, 2024, 2:09 PM

#

agile cobalt you can try other libraries like `polars`

currently trying that actually however i'm not sure how compatible it is with dataframes

worldly wagon Jun 26, 2024, 2:10 PM

#

narrow tiger yes u convert it to a better format

any examples?

agile cobalt Jun 26, 2024, 2:10 PM

#

worldly wagon any examples?

parquet

serene scaffold Jun 26, 2024, 2:10 PM

#

worldly wagon not a big or important question but just came to mind if anyone knows (i'm condu...

why is pandas read_csv not fast enough?
polars is just another library for dataframes.

carmine badge Jun 26, 2024, 2:10 PM

#

Hi, can somebody help me? Why pytesseract returns nothing?

```python
import cv2
import pyautogui
import pytesseract
from PIL import Image


def get_table():
    pytesseract.pytesseract.tesseract_cmd = r'E:\Program Files\Tesseract\tesseract.exe'

    # Capture a fixed screen region and save it to disk.
    image = pyautogui.screenshot(region=(1515, 190, 810, 810))
    image.save('screenshot.png')

    path = 'screenshot.png'

    image = cv2.imread(path)

    cv2.imwrite('original_screenshot.png', image)

    # Grayscale, adaptive threshold (inverted), then invert back.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY_INV, 1603, -80)
    image = cv2.bitwise_not(image)

    cv2.imwrite('screenshot.png', image)

    return pytesseract.image_to_string(Image.open('screenshot.png'))


print(get_table())
```

narrow tiger Jun 26, 2024, 2:10 PM

#

worldly wagon any examples?

yeah parquet as etrotta said

#

i remember it reduced loading time exponentially for me

worldly wagon Jun 26, 2024, 2:12 PM

#

serene scaffold why is pandas read_csv not fast enough? polars is just another library for dataf...

Its about a few hundred thousand rows per day multiplied by the amount of days in a quarter pandas is doing okay but i've been optimizing the processing time

hexed crest Jun 26, 2024, 2:13 PM

#

worldly wagon Its about a few hundred thousand rows per day multiplied by the amount of days i...

You should be fine until we talk above a million rows I believe

narrow tiger Jun 26, 2024, 2:13 PM

#

hexed crest You should be fine until we talk above a million rows I believe

100k * 100 is 10mil

hexed crest Jun 26, 2024, 2:14 PM

#

narrow tiger 100k * 100 is 10mil

math✅

serene scaffold Jun 26, 2024, 2:14 PM

#

worldly wagon Its about a few hundred thousand rows per day multiplied by the amount of days i...

and you're storing all of this in a CSV?
you could be storing it in SQL and using an SQL query to retrieve just the rows that you need into a dataframe.
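A minimal sketch of that idea, using the standard-library `sqlite3` module as a stand-in for whatever database is really in use. The `trades` table and its columns are hypothetical. The point is only that the `WHERE` clause lets the database hand back the rows you need instead of the whole dataset.

```python
import sqlite3

# Hypothetical schema: a `trades` table with a date, symbol, and quantity.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (trade_date TEXT, symbol TEXT, qty INTEGER)")
con.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("2024-06-24", "AAA", 100),
        ("2024-06-25", "AAA", 250),
        ("2024-06-25", "BBB", 75),
        ("2024-06-26", "BBB", 300),
    ],
)

# Fetch only one day's rows instead of loading the whole table.
rows = con.execute(
    "SELECT symbol, qty FROM trades WHERE trade_date = ? ORDER BY symbol",
    ("2024-06-25",),
).fetchall()
print(rows)  # [('AAA', 250), ('BBB', 75)]
```

With pandas installed, `pandas.read_sql_query` can run the same query and return the result directly as a dataframe.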

worldly wagon Jun 26, 2024, 2:14 PM

#

sorry if i didnt explain that well

worldly wagon Jun 26, 2024, 2:15 PM

#

serene scaffold and you're storing all of this in a CSV? you could be storing it in SQL and usin...

basically an RPC is made from the database that gives us xlsx files which are converted to csv(s). i'm new-ish to the project so i'm kinda discerning everything before i push decisions for a refactor

narrow tiger Jun 26, 2024, 2:16 PM

#

worldly wagon basically a RPC is made from the database that gives us xlsx which are converted...

what are you using rpc for?

#

rpc returns xlsx?

worldly wagon Jun 26, 2024, 2:17 PM

#

narrow tiger what are you using rpc for?

yea something like that, the project is detached from the database if that makes sense, principle of least privilege

#

I'm explaining certain things poorly i feel like 🤔 but i basically process stock transactions and do visualizations on it

narrow tiger Jun 26, 2024, 2:20 PM

#

so rpc returns some db query results in xlsx? and then u need to visualize them

#

this is weird design unless you don't own the db and are using 3rd party services

worldly wagon Jun 26, 2024, 2:22 PM

#

narrow tiger so rpc returns some db query results in xlsx? and then u need to visualize them

yea but i have all that handled lol it would be nice to get processing speed up a bit tho, got it down from abt 3mins to 3-5 seconds however most of the currently processing time lands within the read csv

worldly wagon Jun 26, 2024, 2:22 PM

#

narrow tiger this is weird design unless you don't own the db and using 3rd party servies

i work at my country's Stock Exchange, we don't own the database, that's a horrible decision for this kind of work, well not horrible just not safe?

narrow tiger Jun 26, 2024, 2:22 PM

#

u can use parquet as soon as the csv file is generated. try to clean it as much as possible.
you can also try yielding the rows if you are doing backtesting or something similar
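A minimal sketch of the "yield the rows" suggestion, using only the standard library. The column names `symbol` and `qty` and the helper `iter_rows` are assumptions for illustration. A generator hands back one parsed row at a time, so the whole file never has to sit in memory at once.

```python
import csv
import io


def iter_rows(file_obj):
    """Yield one parsed row at a time instead of loading the whole file."""
    reader = csv.DictReader(file_obj)
    for row in reader:
        # Convert types as rows stream past; column names are assumptions.
        yield {"symbol": row["symbol"], "qty": int(row["qty"])}


# Stand-in for a real file opened with open(path, newline="").
sample = io.StringIO("symbol,qty\nAAA,100\nBBB,75\nAAA,250\n")

total_by_symbol = {}
for row in iter_rows(sample):
    total_by_symbol[row["symbol"]] = total_by_symbol.get(row["symbol"], 0) + row["qty"]

print(total_by_symbol)  # {'AAA': 350, 'BBB': 75}
```

Streaming mostly helps memory use rather than raw parse speed, so it fits row-by-row work such as backtesting better than whole-table analytics.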

left tartan Jun 26, 2024, 2:23 PM

#

worldly wagon Its about a few hundred thousand rows per day multiplied by the amount of days i...

Pandas is generally the worst answer for these problems: it's good for prototyping, but doesn't scale well. The first thing to consider is a partitioning strategy: parquet is a good choice for daily or weekly partitions, along with roll-ups so you don't need to query the underlying records continuously

narrow tiger Jun 26, 2024, 2:23 PM

#

worldly wagon yea but i have all that handled lol it would be nice to get processing speed up ...

3-5 sec for whole quarter's data is fine i think u are using python after all, maybe some other language can lower it

left tartan Jun 26, 2024, 2:24 PM

#

There are a number of time series db's that are optimized for tick level analysis, but that's a different scale of the problem

left tartan Jun 26, 2024, 2:24 PM

#

worldly wagon i work at my countries Stock Exchange, we don't own the database thats a horribl...

I don't 'own' the db for my data, but I do work with it using various engines (lately? Mostly DuckDB)

worldly wagon Jun 26, 2024, 2:25 PM

#

left tartan Pandas is generally worst answer for these problems: it's good for prototyping, ...

yea i'm new to this project most of my knowledge is formal education based in stats/CS, i didn't make the decision to use pandas, my predecessor did

left tartan Jun 26, 2024, 2:25 PM

#

KDB is the big boy database in this space

left tartan Jun 26, 2024, 2:25 PM

#

worldly wagon yea i'm new to this project most of my knowledge is formal education based in st...

Half my life is replacing pandas code. Its good for rapid dev, but hits a wall quickly (both performance and complexity, imo)

narrow tiger Jun 26, 2024, 2:26 PM

#

worldly wagon yea i'm new to this project most of my knowledge is formal education based in st...

well anyways Mr.carrot if you come across some cool trading strats do ping me. i love working on those, or some real insider quantitative edge

worldly wagon Jun 26, 2024, 2:26 PM

#

narrow tiger 3-5 sec for whole quarter's data is fine i think u are using python after all, m...

yea i think where i'm at isnt bad but if i can make small optimizations with notable impacts thats always nice, like i was able to move some loops into numpy/pandas built ins

worldly wagon Jun 26, 2024, 2:27 PM

#

left tartan Half my life is replacing pandas code. Its good for rapid dev, but hits a wall q...

😭 this is a little funny but yea i could see that

left tartan Jun 26, 2024, 2:27 PM

#

worldly wagon yea i think where i'm at isnt bad but if i can make small optimizations with not...

Experiment with DuckDB, if you know Sql. It'll let you write sql directly against in memory dataframes. No config needed

worldly wagon Jun 26, 2024, 2:28 PM

#

left tartan Experiment with DuckDB, if you know Sql. It'll let you write sql directly agains...

yea i do know sql i have a formal(university/internships) background just not the deep developer knowledge that alot of people may have I'll check it out

left tartan Jun 26, 2024, 2:29 PM

#

Polars is the usual answer tho, for growing out of Pandas, but requires a bit of a commitment otherwise you'll end up with a confusing code base

worldly wagon Jun 26, 2024, 2:30 PM

#

narrow tiger well anyways Mr.carrot if you come across some cool trading strats do ping me. i...

lol if i do, i personally follow basic index funds strategies for personal life

worldly wagon Jun 26, 2024, 2:31 PM

#

left tartan Polars is the usual answer tho, for growing out of Pandas, but requires a bit of...

makes sense and i appreciate the advice i got today

narrow tiger Jun 26, 2024, 2:36 PM

#

worldly wagon lol if i do, i personally follow basic index funds strategies for personal life

try to see what those Market Makers are doin ducky_devil

#

you have great job btw GL

worldly wagon Jun 26, 2024, 2:38 PM

#

narrow tiger you have great job btw GL

yea i agree and appreciate it 🙏 , i'm from the caribbean tho so it isnt as notable as other stock exchanges such as NY or swiss
but a good way to enter the industry after uni

brave sand Jun 26, 2024, 2:43 PM

#

how do I remove outliers in data?

unkempt apex Jun 26, 2024, 2:47 PM

#

brave sand how do I remove outliers in data?

https://www.analyticsvidhya.com/blog/2021/05/feature-engineering-how-to-detect-and-remove-outliers-with-python-code/
scroll down a bit, and you will find the code!!

Analytics Vidhya

CHIRAG GOYAL

Outlier Detection & Removal | How to Detect & Remove Outliers (Upda...

Learn about outliers, their types, outlier detection methods, and treatment techniques like trimming, capping, and discretization. Read Now!
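One common approach of the kind such articles cover is the interquartile-range (IQR) rule. A minimal sketch, assuming NumPy is installed and using made-up values. The 1.5 multiplier is a convention, not a law, and whether a point should be dropped at all depends on the data.

```python
import numpy as np

# Made-up sample; 95.0 is the planted outlier.
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the points inside the [lower, upper] fence.
mask = (values >= lower) & (values <= upper)
cleaned = values[mask]
print(cleaned)
```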

runic parcel Jun 26, 2024, 2:51 PM

#

i want to make my own llm in which the model will have all the data of eg: products. And the user will give a prompt like "i want to have a pink shoes with laces", so from all the products the model will show the appropriate one as per the user's prompt. how can i make something like this?

unkempt apex Jun 26, 2024, 2:52 PM

#

runic parcel i want to make my own llm in which the model will have all the data of eg: produ...

first of all what you know about llm?

#

don't reply with full form!!

runic parcel Jun 26, 2024, 2:54 PM

#

unkempt apex first of all what you know about llm?

nth much, but like a model which will learn from the knowledge base data and give answers by understanding from it. (like there are 2: fine tuned and knowledge base)

unkempt apex Jun 26, 2024, 2:55 PM

#

runic parcel nth much, but like a model with will learn form the knowledge base data and give...

what about ML?

runic parcel Jun 26, 2024, 2:56 PM

#

unkempt apex what about ML?

reading the data from the datasets and making predictions as per it, like supervised, unsupervised or refo

unkempt apex Jun 26, 2024, 2:57 PM

#

no, no , have u practiced ML?

runic parcel Jun 26, 2024, 2:57 PM

#

predicting patterns and graphs

#

stuff

unkempt apex Jun 26, 2024, 2:57 PM

#

don't give def. sry about that!

runic parcel Jun 26, 2024, 2:57 PM

#

unkempt apex no, no , have u practiced ML?

yes i have

#

i did it

#

making graphs, predicting, clustering and stuff

unkempt apex Jun 26, 2024, 2:58 PM

#

wait , experts will give you some suggestion!

#

I hope so!

sage sparrow Jun 26, 2024, 3:32 PM

#

Quick question; What level of correlation would be considered extreme/too high? To avoid multicollinearity
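There is no universal cutoff. Commonly quoted rules of thumb are an absolute pairwise correlation above roughly 0.8 to 0.9, or a variance inflation factor (VIF) above roughly 5 to 10, and both are conventions rather than hard limits. A minimal sketch of checking both, assuming NumPy is installed and using synthetic features:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)  # nearly a copy of `a`
c = rng.normal(size=200)            # unrelated feature
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)  # pairwise Pearson correlations
# The diagonal of the inverse correlation matrix gives each feature's VIF.
vif = np.diag(np.linalg.inv(corr))

print(np.round(corr, 2))
print(np.round(vif, 1))
```

VIF is often the more useful check because it also catches a feature that is well predicted by a combination of others, even when no single pairwise correlation looks extreme.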

past meteor Jun 26, 2024, 3:48 PM

#

left tartan Polars is the usual answer tho, for growing out of Pandas, but requires a bit of...

How does your duckdb workflow look like? Do you use DBT?

#

How do you connect to external sources? Do you use something like meltano or just regular python

lapis sequoia Jun 26, 2024, 4:05 PM

#

ok, when building a nn of some sort, if some parameter is optimized, do you never change it no matter what is added to the model?

brave sand Jun 26, 2024, 4:22 PM

#

if I have multiple linear lines that represent the weight vs price of different how can I combine them into a multivariate function that gives supply and demand?

past meteor Jun 26, 2024, 4:29 PM

#

brave sand if I have a multiple linear lines that represent the weight vs price of differen...

There's something that is literally called "multiple linear regression". It's just https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html which I sent yesterday 🙂

brave sand Jun 26, 2024, 4:29 PM

#

past meteor There's something that is literally called "multiple linear regression". It's ju...

welp, i guess that is karma then

#

i don't quite get this:

```python
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
```

past meteor Jun 26, 2024, 4:30 PM

#

check out the link please

#

and look at the examples

#

The answers are there, right in front of you

brave sand Jun 26, 2024, 4:30 PM

#

why does that represent that linear equ?
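The comment in that docs example describes how `y` is generated, not something derived from `X` alone: each row of `X` is one point `(x_0, x_1)`, and `y` is then computed from those points with the chosen rule. A minimal sketch, assuming NumPy is installed and using `np.linalg.lstsq` in place of scikit-learn's `LinearRegression`, shows that fitting recovers the same coefficients and intercept:

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# Build y from the known rule y = 1 * x_0 + 2 * x_1 + 3.
y = X @ np.array([1, 2]) + 3

# Append a column of ones so the intercept is fitted too.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 6))  # approximately [1. 2. 3.]
```

The first two fitted values are the weights on `x_0` and `x_1`, and the last is the intercept, which matches the `1`, `2`, and `3` in the comment.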

left tartan Jun 26, 2024, 4:42 PM

#

past meteor How does your duckdb workflow look like? Do you use DBT?

Depends on project, but increasingly dbt (but some custom). Most data sources end up being something custom: Python lambdas to transcode data to parquet, and some api hooks. One thing we've done is build a sql overlay to use vendor APIs and various inference/etc directly from sql. Allows us to keep all our data transformations external from the code.

past meteor Jun 26, 2024, 4:44 PM

#

left tartan Depends on project, but increasingly dbt (but some custom). Most data sources en...

This is very interesting, is this because SQL is the common denominator? Many analysts that don't know Python or is it truly a workflow your organisation believes is better?

#

I don't mean this in a snarky way, I'm really very curious

left tartan Jun 26, 2024, 4:51 PM

#

past meteor I don't mean this in a snarky way, I'm really very curious

It's somewhat my personal philosophy, a bit of ease of training (I only need to educate analysts in one workflow), a bit of separation of concerns (code is infrastructure, sql is data transformations), but my favorite rationale is locality of behavior: it keeps business logic near each other "The behaviour of a unit of code should be as obvious as possible by looking only at that unit of code."

#

Doesn't mean we don't bookend with Python, but we can get pretty far in sql alone

past meteor Jun 26, 2024, 5:00 PM

#

left tartan It's somewhat my personal philosophy, a bit of ease of training (I only need to ...

I like this. I don't think I'd organize myself like this if I were leading a team but I'd never protest if I'd be in a team that does this

left tartan Jun 26, 2024, 5:03 PM

#

past meteor I like this. I don't think I'd organize myself like this if I were leading a tea...

Well, that's the difference too: we're a service provider/integrator: we're enabling our customer's analysts, rather than being direct consumers.

past meteor Jun 26, 2024, 5:03 PM

#

I see, that's a big big distinction

#

As you know I'm a big polars fan. My project is basically done and if I could go back and change something I'd have used DuckDB (+ DBT). That's part of the reason for my curiosity

left tartan Jun 26, 2024, 5:06 PM

#

I would probably make bigger use of polars if I were the consumer

past meteor Jun 26, 2024, 5:06 PM

#

My main collaborator doesn't know Python. Keeping it in Python as opposed to SQL or R, both of which he knows, was a deliberate strategy initially. His code isn't the cleanest and I wanted to insulate myself from it as much as possible.

In hindsight, I think having it in SQL (and telling him not to touch it) would've been better because at least he could've reviewed the code

#

imo the beauty of SQL is mostly that everyone knows it yeah

past meteor Jun 26, 2024, 5:08 PM

#

past meteor My main collaborator doesn't know Python. Keeping it in Python as opposed to SQL...

I think this scenario is actually quite common in teams where you have overconfident data analysts / scientists with poor engineering standards / code hygiene

#

Hindsight is 20/20 but I should've dealt with it better

left tartan Jun 26, 2024, 5:09 PM

#

The sql of today, especially in the OLAP world, is soooo nice. I spent many years without cte's, windows, etc. beyond that, the pace of innovation right now is awesome: integration with parquet, Python udf's, delta lakes, etc. I -love- the dynamic column stuff: https://duckdb.org/2024/03/01/sql-gymnastics.html

#

(Other OLAP platforms are doing similar things... it's a new era on the data side)

past meteor Jun 26, 2024, 5:10 PM

#

dynamic columns are new for me. Is it having a json(b) column?

#

I'm young enough to never have been in a situation without cte's, window etc.

left tartan Jun 26, 2024, 5:11 PM

#

past meteor dynamic columns are new for me. Is it having a json(b) column?

Has those too, but you can reference columns dynamically. Like: max(columns('.*score')) to create an aggregate of all columns that end in score

#

(Simple example)

past meteor Jun 26, 2024, 5:12 PM

#

oh that's cool

left tartan Jun 26, 2024, 5:12 PM

#

Plus pivot/unpivot

past meteor Jun 26, 2024, 5:12 PM

#

Those I know

#

The last time I did SQL heavy work was 2021 iirc

#

So I'm behind, but not too much

left tartan Jun 26, 2024, 5:12 PM

#

Yah, the problem is just how fast they're introducing features. This stuff is DuckDb specific, so I try to use it sparingly

#

I try to make sure I'm doing things that are clickhouse or snowflake compatible, generally speaking

past meteor Jun 26, 2024, 5:13 PM

#

ANSI wise or just feature set wise?

#

As in, it's not ANSI but all OLAPs support it so it's fair game?

left tartan Jun 26, 2024, 5:36 PM

#

past meteor As in, it's not ANSI but all OLAPs support it so it's fair game?

Some is, some isn't

#

I try to stick to: all olaps support it (or similar)

deep sleet Jun 26, 2024, 6:03 PM

#

as the number of estimators in a random forest increases , it reduces overfitting right?

#

but at the same time you want the number to be as low as possible to increase efficiency

unkempt apex Jun 26, 2024, 6:48 PM

#

unkempt apex wait , experts will give you some suggestion!

hey lisan help this guy first! about LLM

#

yeah come on !!, just help him!

#

look at my skyscrapers named as losses

#

wait lemme scroll!

unkempt apex Jun 26, 2024, 6:51 PM

#

runic parcel i want to make my own llm in which the model will have all the data of eg: produ...

yeah this!!

#

reading research paper!

#

yeah okay!

#

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.loglog.html
???

#

send the docs!

#

😂

#

and I never read that !!

#

https://www.geeksforgeeks.org/matplotlib-pyplot-loglog-function-in-python/

what are this inputs?

GeeksforGeeks

Matplotlib.pyplot.loglog() function in Python - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

#

what is ax thing?

tidal bough Jun 26, 2024, 7:02 PM

#

can just do plt.yscale("log")

unkempt apex Jun 26, 2024, 7:03 PM

#

tidal bough can just do `plt.yscale("log")`

just this? woah!

#

the plt.xticks, uses continuous values like
[1, 2, 3,4 ]
but I have discrete value like 100
so need to convert it into 1 to 100

I used range(1, 100) but doesn't work

#

like this

plt.xticks(range(1, TOTAL_NUM_EPISODES), labels=None)

#

but I want my x -axis as number of episodes!

unkempt apex Jun 26, 2024, 7:06 PM

#

unkempt apex look at my skyscrapers named as losses

this was the output of "let it decide"

#

#

now what's this?

#

angstrorm?

#

yeah searched that!

#

noise? in RL?

#

wdym?
it's just losses!! after training

#

that Pong game!

#

just implemented replay_buffers

#

hey I have a simple python logic question!

#

for i in batch:
        states, actions, rewards, next_states, done = zip(*i)

#

so in this batch we have 32 samples from whole buffer ( experience of model)

#

and now we have to add all the 32 samples from batch to s, a, r, ns, d

#

ignore why I did for loop on batch, it just because I am messing around appending wrongly!

#

what is untyped?

#

I am converting those later into tensors

#

just samples from whole buffer

#

yeah okay!

#

wait I have done big mistake!!

unkempt wigeon Jun 26, 2024, 7:16 PM

#

Is there a specific amount of time that needs to take to train a neural network would it take a month if it was on something very specific or do I have this all not under what's the truth my apologies

unkempt apex Jun 26, 2024, 7:16 PM

#

month??
do you have GPU?

unkempt wigeon Jun 26, 2024, 7:18 PM

#

I think so maybe I don't really know the maintenance of computers I'm trying to code to learn them better but that might take a couple of years to fully understand something basic

unkempt apex Jun 26, 2024, 7:20 PM

#

you sounds like demotivated now!

#

do you have a GPU?

unkempt wigeon Jun 26, 2024, 7:21 PM

#

No I know it's going to take a couple of years to fully understand a few bits of the subject as python code can become very complex and I'm writing everything and putting it into a notebook like a little cheat sheet and I can easily remember if I'm having a problem I try to listen to python but sometimes I get into an infinite complaint loop where it complains that I didn't do something right and I listen to it I think so I'll have to check with my computer in a few minutes

past meteor Jun 26, 2024, 7:22 PM

#

deep sleet as the number of estimators in a random forest increases , it reduces overfittin...

Kind of true, it lowers the risk of overfitting

unkempt wigeon Jun 26, 2024, 7:23 PM

#

How long does it usually take for a neural network to learn I know it depends on multitude of factors but if I gave it something simple like trying to learn color how long would that take trying to weigh it where it it fully understands and comprehends each color with heavy weights on all Network pieces

unkempt apex Jun 26, 2024, 7:25 PM

#

unkempt wigeon How long does it usually take for a neural network to learn I know it depends on...

oh mann,. what you are doing currently?

unkempt wigeon Jun 26, 2024, 7:25 PM

#

unkempt apex you sounds like demotivated now!

I'm not demotivated, it's just I'm curious how long it would take to teach a network, and if it takes a month I don't mind cuz I want it to be strong: teach it a color to the point where if you gave it a different color it would understand it's not that color

#

And would teaching it colors be complex? Just so I understand this a little better

unkempt apex Jun 26, 2024, 7:26 PM

#

yeah!

unkempt wigeon Jun 26, 2024, 7:28 PM

#

Like if I gave it access to a camera that was plugged into the computer, and it were pointed at a specific color like red, it would print out red or speak it out using a text to speech option

deep sleet Jun 26, 2024, 7:28 PM

#

past meteor Kind of true, it lowers the risk of overfitting

Ty

unkempt wigeon Jun 26, 2024, 7:28 PM

#

Like if I gave it the phone color it would print out what the image is or what color is

#

green

#

kNN

unkempt apex Jun 26, 2024, 7:30 PM

#

k nearest neighbour?

tidal bough Jun 26, 2024, 7:31 PM

#

k-nearest-neighbours is kind of overkill for matching to one of, what, at most a few thousand colors? :p
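
A minimal sketch of the simpler route hinted at here: with a small fixed palette, a direct nearest-match lookup on RGB distance may be enough, with nothing to train. The palette entries and the `nearest_color` helper are hypothetical names made up for illustration, and plain Euclidean distance in RGB is an assumption (it is only a rough proxy for how people perceive color).

```python
import math

# Hypothetical palette: names and RGB values chosen only for illustration.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}


def nearest_color(rgb, palette=PALETTE):
    """Return the palette name whose RGB value is closest to `rgb`."""
    return min(palette, key=lambda name: math.dist(rgb, palette[name]))


print(nearest_color((200, 30, 40)))  # a reddish pixel
print(nearest_color((10, 240, 20)))  # a greenish pixel
```

For a camera feed, the input would probably be an average over a patch of pixels rather than a single pixel, but that part is outside this sketch.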

brave sand Jun 26, 2024, 7:31 PM

#

hi reptile

unkempt apex Jun 26, 2024, 7:32 PM

#

so there is a logical error

train.py -> https://www.pythonmorsels.com/p/26whg/ at line 21 and 119
buffer.py -> https://www.pythonmorsels.com/p/35wjn/

as I have told the structure of "batch"

26whg - Python Pastebin - Python Morsels

A free Python-oriented pastebin service for sharing Python code snippets with anyone

35wjn - Python Pastebin - Python Morsels

A free Python-oriented pastebin service for sharing Python code snippets with anyone

left tartan Jun 26, 2024, 7:33 PM

#

Yah, that's always the basic problem/opportunity with data: partitioning. If you can organize/partition the data in a manner that aligns with the access pattern, things are good. But if not, you end up with random access which is performance hell (and usually ends in a full scan)

brave sand Jun 26, 2024, 7:34 PM

#

left tartan Yah, that's always the basic problem/opportunity with data: partitioning. If you...

can you help me understand regression?

unkempt apex Jun 26, 2024, 7:34 PM

#

brave sand can you help me understand regression?

wait there is a best blog on that!

#

https://www.datacamp.com/tutorial/essentials-linear-regression-python

Essentials of Linear Regression in Python

Learn what formulates a regression problem and how a linear regression algorithm works in Python.

#

https://youtu.be/7ArmBVF2dCs?feature=shared

YouTube

StatQuest with Josh Starmer

Linear Regression, Clearly Explained!!!

The concepts behind linear regression, fitting a line to data with least squares and R-squared, are pretty darn simple, so let's get down to it! NOTE: This StatQuest comes with a companion video for how to do linear regression in R: https://youtu.be/u1cc1r_Y7M0
You can also find example code at the StatQuest github: https://github.com/StatQuest/...

unkempt apex Jun 26, 2024, 7:36 PM

#

unkempt apex so there is a logical error train.py -> https://www.pythonmorsels.com/p/26whg/...

so I guess I am messing up appending tuples, don't know, but in the batch variable another list is being added, which I didn't want

brave sand Jun 26, 2024, 7:37 PM

#

i do not get how to implement it though for my scenario

unkempt apex Jun 26, 2024, 7:38 PM

#

brave sand i do not get how to implement it though for my scenario

it depends on your data then, have you plotted that?

brave sand Jun 26, 2024, 7:38 PM

#

i have

unkempt apex Jun 26, 2024, 7:38 PM

#

show !

brave sand Jun 26, 2024, 7:38 PM

#

i have multiple datasets and i got the line of best fit, i want to combine all of those lines into a single function/line though

unkempt apex Jun 26, 2024, 7:40 PM

#

what if you add all those data into one?

magic dune Jun 26, 2024, 7:40 PM

#

hi

unkempt apex Jun 26, 2024, 7:40 PM

#

hey @final kiln
are you reading that code ? of batch

left tartan Jun 26, 2024, 7:40 PM

#

brave sand i have multiple datasets and i got the line of best fit, i want to combine all o...

That statquest video is pretty good, it's kinda hard to go through this topic without just going through the underlying math: the math isn't hard, it's just tedious (which is why we use nice libraries like sklearn)

brave sand Jun 26, 2024, 7:41 PM

#

left tartan That statquest video is pretty good, it's kinda hard to go through this topic wi...

can you help me implement sklearn for my case then 😅

#

more focused on the application

unkempt apex Jun 26, 2024, 7:41 PM

#

brave sand can you help me implement sklearn for my case then 😅

that blog already implemented that

brave sand Jun 26, 2024, 7:42 PM

#

i have the coefficients:

    p = np.polyfit(arrival_kg, min_rs_per_kg, 1)
    print("parameters (slope, intercept):", p)

#

can I shove this into a sklearn function and get a result?

magic dune Jun 26, 2024, 7:42 PM

#

left tartan That statquest video is pretty good, it's kinda hard to go through this topic wi...

this

linear regression is one of the underlying algos in neural networks so it is good to understand well

brave sand Jun 26, 2024, 7:42 PM

#

does the weighted average not work here?
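
A rough sketch of the two options floating around in this thread: averaging the per-dataset coefficients (weighted by dataset size) versus pooling all points and fitting once. The data is made up, and `fit_line` is a hypothetical helper that computes the same kind of least-squares line as a degree-1 `np.polyfit`. The two approaches generally give different lines, so which one is "right" depends on what the combined line is supposed to mean.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical datasets as (x values, y values); not the poster's real data.
datasets = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),                # lies on y = 2x
    ([10.0, 11.0, 12.0, 13.0], [5.0, 6.0, 7.0, 8.0]),  # lies on y = x - 5
]

# Option 1: fit each dataset, then average coefficients weighted by size.
total = sum(len(xs) for xs, _ in datasets)
avg_slope = avg_intercept = 0.0
for xs, ys in datasets:
    slope, intercept = fit_line(xs, ys)
    avg_slope += slope * len(xs) / total
    avg_intercept += intercept * len(xs) / total

# Option 2: pool every point and fit a single line.
all_x = [x for xs, _ in datasets for x in xs]
all_y = [y for _, ys in datasets for y in ys]
pooled_slope, pooled_intercept = fit_line(all_x, all_y)

print("weighted average:", avg_slope, avg_intercept)
print("pooled fit:      ", pooled_slope, pooled_intercept)
```

On this toy data the averaged slope sits between the two individual slopes, while the pooled slope is much flatter because the two clouds of points are far apart along x.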

past meteor Jun 26, 2024, 7:42 PM

#

brave sand can I shove this into a sklearn function and get a result?

I already sent you the sklearn link twice. I and others will be less inclined to help if we don't see effort on your side (e.g., reading the docs/links that are sent to you)

left tartan Jun 26, 2024, 7:42 PM

#

brave sand can I shove this into a sklearn function and get a result?

That's very confusing language. Maybe try a basic linear regression example, with, say Iris dataset.

magic dune Jun 26, 2024, 7:43 PM

#

left tartan That's very confusing language. Maybe try a basic linear regression example, wit...

this

unkempt apex Jun 26, 2024, 7:43 PM

#

yeah , please take a look at that, I am just confused about appending!

left tartan Jun 26, 2024, 7:43 PM

#

Iris is a good data set because everyone knows it. And if you can do a linear regression on Iris, you can do it on anything

brave sand Jun 26, 2024, 7:43 PM

#

past meteor I already sent you the sklearn link twice. I and others will be less inclined to...

oh i forgot to mention, i did use this:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

and got a result by combining all the datasets together which isnt correct

scikit-learn

LinearRegression

Gallery examples: Principal Component Regression vs Partial Least Squares Regression Plot individual and voting regression predictions Comparing Linear Bayesian Regressors Linear Regression Example...

unkempt apex Jun 26, 2024, 7:43 PM

#

I have given the code, do I need to explain now in short?

past meteor Jun 26, 2024, 7:44 PM

#

Doing linear regression with sklearn is literally just calling 2 functions, .fit() and .predict()

#

and there's tons of examples in the documentation

unkempt apex Jun 26, 2024, 7:44 PM

#

it says tuple!

magic dune Jun 26, 2024, 7:44 PM

#

!d sklearn.datasets.load_iris

arctic wedgeBOT Jun 26, 2024, 7:44 PM

#

sklearn.datasets.load_iris

sklearn.datasets.load_iris(*, return_X_y=False, as_frame=False)
Load and return the iris dataset (classification).

The iris dataset is a classic and very easy multi-class classification dataset...

unkempt apex Jun 26, 2024, 7:44 PM

#

in train.py! in updatedqn func!

#

no wait it's list

#

train.py -> https://www.pythonmorsels.com/p/26whg/ at line 21 and 119

in this code in update_dqn method the batch is list

#

it's List!

deep sleet Jun 26, 2024, 7:47 PM

#

Does anyone have a good link to understand what bootstrap in random forests is?

unkempt apex Jun 26, 2024, 7:47 PM

#

in that list we have a list and then all tuples 32!!

deep sleet Jun 26, 2024, 7:48 PM

#

I tried to understand them from several resources but I can't seem to understand the advantage of it when they explain it

past meteor Jun 26, 2024, 7:49 PM

#

deep sleet Does anyone have a good link to understand what bootstrap in random forests is...

It's basically you have your dataset which has 5 samples: A, B, C, D, E

You train 3 trees. While training them you don't use the original dataset, you sample with replacement (literally, you take a sample and put it back) to make a new dataset.

For tree #1 you may have A A B C D, for tree #2 A B C D D, and for tree #3 A B C E E
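
The sampling step described above can be sketched with the standard library alone. `bootstrap_sample` is a hypothetical helper name, and the seed is fixed only so reruns give the same draws; the exact letters printed are not meant to match the example in the message.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return rng.choices(data, k=len(data))


dataset = ["A", "B", "C", "D", "E"]
rng = random.Random(0)  # fixed seed so reruns give the same draws

for tree in range(1, 4):
    sample = sorted(bootstrap_sample(dataset, rng))
    never_drawn = sorted(set(dataset) - set(sample))
    print(f"tree #{tree}: {' '.join(sample)} | never drawn: {' '.join(never_drawn) or '-'}")
```

Each tree sees a dataset of the original size, but with some rows repeated and some missing, which is where the extra randomness comes from.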

#

It's just another way to increase randomness

#

this is bootstrapping, visually

#

replace "compute statistic X" with "train a tree"

unkempt apex Jun 26, 2024, 7:50 PM

#

https://paste.pythondiscord.com/JAGQ
whole print(batch)

deep sleet Jun 26, 2024, 7:50 PM

#

but doesn't that make it harder to actually find the relation between data?

past meteor Jun 26, 2024, 7:50 PM

#

good question

#

it does, but it's the point 😄

#

Decision trees overfit really really easily

unkempt apex Jun 26, 2024, 7:51 PM

#

yeah , because then it will make individual tuples !! for like 32 samples!!

past meteor Jun 26, 2024, 7:51 PM

#

Look at it like this:

Your dataset is a noisy sample from a distribution

#

When you're fitting a ML model you're interested in knowing the actual relation between the independent / dependent variable

#

not just that of your training set (overfitting)

#

So the observation

> but doesn't that make it harder to actually find the relation between data?

is true, but it applies to the literal relationship in your training set. You absolutely don't want to replicate this. This is by definition overfitting

#

To truly understand why this is the case you actually have to study the bias variance trade-off

deep sleet Jun 26, 2024, 7:53 PM

#

past meteor To truly understand why this is the case you actually have to study the bias var...

I will google this

past meteor Jun 26, 2024, 7:53 PM

#

Trees have very little (inductive) bias. They can fit basically everything

#

But, if you change 1 example the tree may be very different (high variance)

#

Random forest trades a tiny bit of bias for a massive reduction in variance
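
One way to put numbers on the variance reduction, as an idealized sketch: if every tree's prediction has variance `sigma2` and any two trees have correlation `rho`, the variance of their average is `rho * sigma2 + (1 - rho) * sigma2 / n_trees`. The function name and the values of `sigma2` and `rho` below are made up for illustration; real trees are not identically distributed, so treat this as intuition only.

```python
def ensemble_variance(sigma2, rho, n_trees):
    """Variance of the mean of n_trees predictions, each with variance
    sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


for n in (1, 10, 100, 1000):
    print(n, ensemble_variance(sigma2=1.0, rho=0.3, n_trees=n))
```

Adding trees shrinks the second term toward zero, so the variance flattens out near `rho * sigma2`. That is consistent with the earlier point that more estimators lower the risk of overfitting, with diminishing returns and a growing compute cost. Lowering `rho` is what bootstrapping and per-split feature subsampling are for.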

unkempt apex Jun 26, 2024, 7:54 PM

#

first of all in buffer ( deque ) we are just appending experiences, which look like this:
([10, 180, 96, 104, 4, -4], 0, 0, [10, 175, 92, 108, 4, -4], False)

and then we are creating samples (32) from whole buffer [ consider that buffer may have thousands of this experiences ]
so batch will have now 32 samples
now we have to add this 32 samples each into 5 variables which are s, a, r, ns, d
so that's why I am using zip
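
A minimal sketch of the flow described in this message, assuming the buffer holds flat five-element tuples. The buffer capacity and the generated experiences are invented for illustration, and this is not the code from the linked pastes.

```python
import random
from collections import deque

buffer = deque(maxlen=10_000)  # capacity is an arbitrary choice for this sketch

# Fill the buffer with made-up experiences:
# (state, action, reward, next_state, done)
for step in range(100):
    state = [10, 180 + step, 96, 104, 4, -4]
    next_state = [10, 175 + step, 92, 108, 4, -4]
    buffer.append((state, 0, 0, next_state, False))  # append the tuple itself

batch = random.sample(list(buffer), 32)  # a flat list of 32 experience tuples

# zip(*batch) transposes 32 five-tuples into 5 tuples of 32 items each.
states, actions, rewards, next_states, dones = zip(*batch)

print(len(states), len(actions), len(rewards), len(next_states), len(dones))
# prints: 32 32 32 32 32
```

This only works when `batch` is a flat list of five-element tuples; any extra level of nesting changes what `zip` sees, so no `for` loop over the batch is needed once the structure is flat.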

deep sleet Jun 26, 2024, 7:55 PM

#

past meteor So the observation > but doesn't that make it harder to actually find the rela...

So randomness help us remove stuff that just happen to seem like they are related(noise) and let us actually find the variables that are actually dependent on each other?

past meteor Jun 26, 2024, 7:55 PM

#

deep sleet So randomness help us remove stuff that just happen to seem like they are relate...

That's a helpful way to look at it for now

unkempt apex Jun 26, 2024, 7:55 PM

#

unkempt apex first of all in buffer ( deque ) we are just appending this experiences ```([10,...

and I just want to remove that []!! which is bothering me!

deep sleet Jun 26, 2024, 7:55 PM

#

past meteor Random forest trades a tiny bit of bias for a massive reduction in variance

I kinda understand this but will give it another read once I understand the concept of bias and variance

#

Tysm man!

unkempt apex Jun 26, 2024, 7:56 PM

#

yeah but here usecase is opposite, we have 5 elements which will be converted into 5 separate variables

deep sleet Jun 26, 2024, 7:56 PM

#

Idk what I would have done without you

past meteor Jun 26, 2024, 7:57 PM

#

no problem

#

I like answering your questions because they're good ones

unkempt apex Jun 26, 2024, 7:57 PM

#

okay lemme try at least then!

past meteor Jun 26, 2024, 7:57 PM

#

You preemptively ask what is covered in a typical, rigorous ML class

unkempt apex Jun 26, 2024, 7:57 PM

#

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 5)

deep sleet Jun 26, 2024, 7:58 PM

#

past meteor You preemptively ask what is covered in a typical, rigorous ML class

I am glad you view it this way and don't find it annoying

unkempt apex Jun 26, 2024, 7:58 PM

#

6? how?

#

yeah!!!

#

that extra [] is bothering !!

#

because while appending it is appending as list of list, nested list in short!

#

now how can we remove that stupid [], or I am doing mistake while appending?

past meteor Jun 26, 2024, 8:01 PM

#

deep sleet So randomness help us remove stuff that just happen to seem like they are relate...

This is what my old slides had to say about this btw

#

decision trees are extremely unstable

unkempt apex Jun 26, 2024, 8:02 PM

#

shit happens with that!

#

what batch[0]?

deep sleet Jun 26, 2024, 8:03 PM

#

past meteor decision trees are extremely unstable

Yeah makes sense, I've been trying to tune hyperparameters on them and it really changes the results

unkempt apex Jun 26, 2024, 8:03 PM

#

not working!

#

because
batch[0] is printing this

past meteor Jun 26, 2024, 8:04 PM

#

Ah and last but not least, random forest is just bagging + not considering all variables at each split

#

So if you understand that slide, you understand RF

unkempt apex Jun 26, 2024, 8:05 PM

#

same, there also same output!

#

while returning batch

past meteor Jun 26, 2024, 8:08 PM

#

Everyone is different right, but I'd go for regular Q learning first before diving into DQN

#

regular Q learning is easy enough to implement from scratch that when you upgrade it to DQN you'll at least have confidence that you know what you're doing 🙂

unkempt apex Jun 26, 2024, 8:09 PM

#

past meteor regular Q learning is easy enough to implement from scratch that when you upgrad...

yeah you are trolling me!😂

unkempt apex Jun 26, 2024, 8:09 PM

#

past meteor Everyone is different right, but I'd go for regular Q learning first before divi...

simple Q-learning is just creating tables for q values, so it's boring!

past meteor Jun 26, 2024, 8:10 PM

#

not really

#

You can do Q learning with function approximation as well

#

The deep net is just approximating the table

#

It doesn't need to be a deep neural net, it can be any function approximator
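
For reference, a small sketch of the tabular version being discussed. The constants, the `q_update` helper, and the example states are illustrative assumptions, not code from this conversation. In DQN, the lookup `q_table[state]` would be replaced by the output of a network (or any other function approximator) evaluated on the state.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.99  # discount factor (illustrative value)
N_ACTIONS = 2

# The "table": one row of action values per state, created on first use.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(state, action, reward, next_state, done):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + GAMMA * max(q_table[next_state])
    q_table[state][action] += ALPHA * (target - q_table[state][action])


# States must be hashable to index the table, so lists become tuples.
s = (10, 180, 96, 104, 4, -4)
s_next = (10, 175, 92, 108, 4, -4)
q_update(s, action=0, reward=1.0, next_state=s_next, done=False)
print(q_table[s])  # [0.1, 0.0]
```

The table only has entries for states that were actually visited, which is the practical limit that function approximation gets around.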

unkempt apex Jun 26, 2024, 8:20 PM

#

yeah, but with neural net it's interesting

wispy jackal Jun 26, 2024, 8:22 PM

#

can someone who got an internship in data science msg tell me how i could get one as well?

#

like what do i need to know/put in resume/ apply for internship

past meteor Jun 26, 2024, 8:31 PM

#

wispy jackal can someone who got an internship in data science msg tell me how i could get on...

#career-advice has all the pros you need for this question 😎

#

I just sent an email to a company I wanted to intern at sounding very motivated and they "hired" me for the internship

unkempt apex Jun 26, 2024, 8:36 PM

#

I got that working, with itertools

#

from itertools import chain

#

https://paste.pythondiscord.com/TS6A

#

this approach is also good!

#

([10, 330, 16, 184, 4, -4], 0, 0, [10, 325, 12, 188, 4, -4], False)
([10, 245, 100, 100, 4, -4], 1, 0, [10, 250, 96, 104, 4, -4], False)
([10, 180, 120, 80, 4, -4], 1, 0, [10, 185, 116, 84, 4, -4], False)
([10, 225, 276, 76, 4, 4], 0, 0, [10, 220, 272, 72, 4, 4], False)
([10, 260, 320, 120, 4, 4], 1, 0, [10, 265, 316, 116, 4, 4], False)
([10, 185, 236, 36, 4, 4], 1, 0, [10, 190, 232, 32, 4, 4], False)

#

now I got this with

 for x in batch:
        print(x[0])

#

now need to append
s, a, r, ns, d

#

now how can I append this values into 5 diff. variables?

#

no I think I am too much computing here and there, I should take a look at how it is appending!

orchid lintel Jun 26, 2024, 9:15 PM

#

Sick. I wonder how well this compares to the big proprietary ones? https://timefold.ai/blog/new-open-source-solver-python

Timefold

Optimize routing and scheduling in Python: a new open source solver...

Automate and optimize your operations scheduling in Python with Timefold AI

past meteor Jun 26, 2024, 9:21 PM

#

orchid lintel Sick. I wonder how well this compares to the big proprietary ones? https://timef...

The amount of time that has gone into CPLEX over decades makes me think it still blows stuff like this out of the water for sufficiently large problems

#

Does not surprise me that it was made here. This is such a niche in Belgium 😅

#

A lot of my coursework was on this, fun stuff

deep sleet Jun 26, 2024, 9:21 PM

#

past meteor Ah and last but not least, random forest is just bagging + not considering all v...

Yeah I understood that when I was looking at the hyperparameters

wispy jackal Jun 26, 2024, 9:22 PM

#

past meteor I just sent an email to a company I wanted to intern at sounding very motivated ...

ahah like what did you write? and what company was it

past meteor Jun 26, 2024, 9:23 PM

#

wispy jackal ahah like what did you write? and what company was it

I just wrote that I'd been to some of their presentations and their work really interests/inspired me and that I wanted to do an internship. It's not a company you know (and if you did I wouldn't say which because it doxes me)

#

https://github.com/google/or-tools this is interesting in this space

GitHub

GitHub - google/or-tools: Google's Operations Research tools:

Google's Operations Research tools:. Contribute to google/or-tools development by creating an account on GitHub.

#

It's mostly for modelling problems, you can pick your solver "backend"

#

It's missing a lot of the things timefold has though, I see they offer metaheuristics

#

Or rather, it's exclusively metaheuristics based

#

ig that answers the question. Metaheuristics are ime slower than doing something like simplex/branch and bound if your search space is tractable and the problem isn't non-linear, non-convex, ...

orchid lintel Jun 26, 2024, 9:33 PM

#

I guess the real bottleneck is an open foundation to build on, like BLAS and LAPACK which are government-maintained.

main drift Jun 26, 2024, 9:37 PM

#

Um... Hi. I'm new here. Where do I go to ask for help? (When replying to me please @ me so I know you are talking to me. This is a force of habit, I'm sorry if it's an inconvenience.)

past meteor Jun 26, 2024, 9:37 PM

#

yeah, it's not only that. It's also just that the underlying algorithms are different

#

It's been too long since I looked at things like CPLEX but they do mostly standard, LP, IP, MIP, QP, ...

#

Which can be more efficient than full blown metaheuristics, if your problem allows for it

#

writing a genetic algorithm or so is a fun coding exercise btw 🙂

past meteor Jun 26, 2024, 9:40 PM

#

main drift Um... Hi. I'm new here. Where do I go to ask for help? (When replying to me plea...

hey hey, welcome. You can check out #❓｜how-to-get-help .

In general, you can ask questions in a relevant room (like here) and people will do their best to answer. My biggest tip is to ask the question straight away like:

"What libraries can I use to do linear regression." instead of "I need help" or "I need help with linear regression" 😄

deep sleet Jun 26, 2024, 9:41 PM

#

main drift Um... Hi. I'm new here. Where do I go to ask for help? (When replying to me plea...

#❓｜how-to-get-help

main drift Jun 26, 2024, 9:43 PM

#

Okay well... here goes. I started using Python recently and I'm attempting to use Rasa to build an Ai. The only issue is, it does not install completely. I'm using a virtual environment. Pip, Python, and absl-py are all the latest version. I get a HUGE Error message somewhere in the downloading process. I can put that Error Message here if that's allowed.

left tartan Jun 26, 2024, 9:46 PM

#

main drift Okay well... here goes. I started using Python recently and I'm attempting to us...

Yah, definitely open a help thread with the error message (see the link above)

past meteor Jun 26, 2024, 9:46 PM

#

You can paste a lot of code like this, might be easier to share

#

!paste

arctic wedgeBOT Jun 26, 2024, 9:46 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

main drift Jun 26, 2024, 9:54 PM

#

I made a post, barely fit in the block with my paragraphs, lol.

deep sleet Jun 26, 2024, 9:54 PM

#

https://youtu.be/hDKCxebp88A?si=Bn6dKsNGpaNGUyzS, I finished the first seven and a half hours, which covered the basics of linear and logistic regression and decision trees + random forests with scikit-learn. Now would it be better to take a break from the course and go check projects like Kaggle notebooks that used these models before moving on?

YouTube

freeCodeCamp.org

Machine Learning with Python and Scikit-Learn – Full Course

This course is a practical and hands-on introduction to Machine Learning with Python and Scikit-Learn for beginners with basic knowledge of Python and statistics.

It is designed and taught by Aakash N S, CEO and co-founder of Jovian. Check out their YouTube channel here: https://youtube.com/@jovianhq

We'll start with the basics of machine lear...

coral field Jun 26, 2024, 11:56 PM

#

so if i have a dataset with one or two values that have a value of 0, and i want to use MAPE to evaluate the set, what alternatives do i have towards the zero values? the zero values represent like <0.1% of the complete dataset but i want to avoid trying to remove the values themselves

serene scaffold Jun 27, 2024, 12:37 AM

#

The chat bots on company websites are usually low effort shit.

Fine-tuning an interactive LLM to a specific company would probably have worse results than a RAG system that uses a generic interactive LLM.

Any options that involve LLMs will be slower and more expensive than the shitty systems.

deep sleet Jun 27, 2024, 2:21 AM

#

serene scaffold The chat bots on company websites are usually low effort shit. Fine-tuning an i...

What is a RAG system?

serene scaffold Jun 27, 2024, 2:24 AM

#

deep sleet What is a RAG system?

Retrieval augmented generation

It's basically when you ask a chat bot something, and it looks up information related to your question, and then passes both your message and the relevant information to a generative LLM. And it uses the extra information to answer the question.
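
A toy sketch of that flow, with the generation step left out. The documents, the helper names (`tokenize`, `retrieve`, `build_prompt`), and the word-overlap ranking are all assumptions made for illustration; real systems typically use embedding-based search, and the final prompt would be sent to whichever LLM is in use.

```python
import re

# Hypothetical knowledge base; a real system would index company documents.
DOCUMENTS = [
    "Refunds are issued within 14 days of receiving the returned item.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email from Monday to Friday.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (a stand-in for
    the embedding search a real system would likely use)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents):
    """Combine the retrieved context and the user's question into one prompt."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# The prompt would then be passed to a generative LLM; that call is omitted.
print(build_prompt("How long does shipping take?", DOCUMENTS))
```

The retrieved text is the "augmented" part of the input: the model sees both the question and the looked-up passage.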

deep sleet Jun 27, 2024, 2:25 AM

#

serene scaffold Retrieval augmented generation It's basically when you ask a chat bot somethin...

ohh so it gives the llm the necessary information to answer your question clearly instead of just pasting the documentation?

serene scaffold Jun 27, 2024, 2:27 AM

#

deep sleet ohh so it gives the llm the necessary information to answer your question clearl...

I mean if it's a chat bot that answers questions about libraries, "the necessary information to answer the question" might be the documentation.

#

What I mean is, for the rag system, the documentation might be the "augmented input". The output would be something generated by the LLM. It wouldn't just link you to the docs

deep sleet Jun 27, 2024, 2:29 AM

#

Noted!

#

you mentioned it would be slower and more expensive but ig how viable it is depends on the margin of each of the previous factors right?

serene scaffold Jun 27, 2024, 2:31 AM

#

deep sleet you mentioned it would be slower and more expensive but ig how viable it is depe...

the low-fi chatbots that answer questions on company websites are so profoundly unhelpful that I don't know why companies even have them. But I can't imagine that they're more computationally demanding than a generative LLM that requires a GPU.

#

If a company wants to have a chatbot on their website at all, imo, it should be LLM based. The ones most companies have are basically just a text version of phone answering bots.

deep sleet Jun 27, 2024, 2:32 AM

#

Yep

#

and it should actually be viable since it will reduce the amount of support tickets for customer service by a huge margin

#

It seems like a great idea , would like to implement it someday when I have deeper knowledge

serene scaffold Jun 27, 2024, 2:35 AM

#

deep sleet and it should actually be viable since it will reduce the amount of support tick...

My guess is that there are already startups offering RAG-based customer support bots, which are in turn making API calls to OpenAI (that the startup has to pay for, and then pass along that expense to the customer)

deep sleet Jun 27, 2024, 2:36 AM

#

serene scaffold My guess is that there are already startups offering RAG-based customer support ...

Yeah , I think I saw an ad for something similar a while ago

#

A question tho

#

Can't gpt-2 be viable for this?

#

or ig it depends on the context and required capabilities

serene scaffold Jun 27, 2024, 2:38 AM

#

you'd need to find an interaction-tuned version of GPT-2

#

ChatGPT is an interface for interaction-tuned versions of GPT-3, etc.

deep sleet Jun 27, 2024, 2:39 AM

#

ohhh

#

makes sense

#

I think there was something done by microsoft for that posted on hugging face

#

lemme search for it

serene scaffold Jun 27, 2024, 2:41 AM

#

it's all on hugging face.

#

🤗

deep sleet Jun 27, 2024, 2:42 AM

#

ah

#

DialoGPT

lapis sequoia Jun 27, 2024, 3:16 AM

#

if ill train a model
for 30 hrs on my laptop
will it f it up
gpu temp is below 65 since half hour
hour

lapis sequoia Jun 27, 2024, 3:16 AM

#

deep sleet ah

ayo how is it going dude

deep sleet Jun 27, 2024, 3:16 AM

#

lapis sequoia ayo how is it going dude

Hey man

#

Great!

#

wbu?

lapis sequoia Jun 27, 2024, 3:19 AM

#

am doing good

#

u found the thing?

#

@deep sleet

deep sleet Jun 27, 2024, 3:19 AM

#

oh the drive?

spring field Jun 27, 2024, 5:27 AM

#

what's the difference between multiple linear regression and using a fully-connected neural net? for like tabular data

wooden sail Jun 27, 2024, 6:02 AM

#

spring field what's the difference between multiple linear regression and using a fully-conne...

it's pretty much the same. some people might try and make the distinction between linear reg, multiple lin reg, and multivariate lin reg, but it's all referred to as linear regression as well

#

in lin reg you look for M and B that satisfy Y = MX + B, which you'll note is the same as the weights and biases of a dense layer

#

this is also one of the key observations made by yann lecun in a paper from like 13 years ago, pointing out that this means many algorithms that iteratively apply affine transformations followed by selection rules can be "unfolded" into a neural network that you can now explain and whose architecture is well motivated by an optimization algorithm with convergence guarantees

#

the 1 layer case being linear regression
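
A small sketch of the one-layer case on toy data, assuming a single input and a single output: the closed-form least-squares line and a one-unit dense layer with no activation, trained by gradient descent on mean squared error, should land on about the same `M` and `B`. The data, learning rate, and step count are hand-picked for this example.

```python
from statistics import fmean

# Made-up data that lies exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# Closed-form least squares for y = m * x + b.
mx, my = fmean(xs), fmean(ys)
m_ls = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b_ls = my - m_ls * mx

# The same model as a one-unit dense layer with no activation,
# trained by gradient descent on mean squared error.
w, b = 0.0, 0.0  # weight and bias
lr = 0.05        # learning rate, hand-picked for this toy data
for _ in range(2000):
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2 * fmean(e * x for e, x in zip(errors, xs))
    grad_b = 2 * fmean(errors)
    w -= lr * grad_w
    b -= lr * grad_b

print("least squares:", m_ls, b_ls)
print("dense layer:  ", w, b)
```

Stacking more layers with non-linear activations in between is what moves the model away from plain linear regression.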

#

https://dl.acm.org/doi/abs/10.5555/3104322.3104374 this paper eventually led into a flavor of what is now called "model-based deep learning"

Guide Proceedings

Learning fast approximations of sparse coding | Proceedings of the ...

spring field Jun 27, 2024, 6:08 AM

#

wooden sail in lin reg you look for M and B that satisfy Y = MX + B, which you'll note is th...

right, makes sense yeah

spring field Jun 27, 2024, 6:14 AM

#

wooden sail https://dl.acm.org/doi/abs/10.5555/3104322.3104374 this paper eventually led int...

how do I access the pdf?

wooden sail Jun 27, 2024, 6:17 AM

#

https://icml.cc/Conferences/2010/papers/449.pdf here

spring field Jun 27, 2024, 6:18 AM

#

thanks!

past meteor Jun 27, 2024, 6:45 AM

#

spring field what's the difference between multiple linear regression and using a fully-conne...

From a practical pov, with a neural net you can have a non-linear relationship between the independent and dependent variables

#

You can have this with linear regression but you need to specify all of them, including all interactions a priori.

The issue with neural networks is then that because you don't have to specify a relationship a priori you could be fitting the signal and the noise.

#

Naturally this makes linear regression white box and a DNN on tabular data black box

#

As a modeller it's also harder to get these fully connected networks right (on tabular). A lot more knobs and dials to turn

#

I like this question because you can answer it in two totally different ways (Edd's answer and mine) and I'm not sure which one you wanted haha

spring field Jun 27, 2024, 8:27 AM

#

I have no idea which one I wanted, I'll take both though

past meteor Jun 27, 2024, 8:38 AM

#

spring field I have no idea which one I wanted, I'll ...

Try doing some Kaggle competitions 😄 the ones I mentioned are low time investment

spring field Jun 27, 2024, 8:41 AM

#

alrighty

eager sundial Jun 27, 2024, 9:51 AM

#

is there any book/source where I can learn how to properly clean data? I have to make some models for uni but the dataset is a mess (high skew and kurtosis for some features, continuous and categorical features mixed, etc.). Also there's a high class imbalance, 45/45/10 (which I tried to solve using SMOTENC). Still, I can't get good results on prediction

hollow escarp Jun 27, 2024, 2:19 PM

#

Has anyone ever tried running paddle ocr on rocketchips ?

#

If yes I'm really curious about the details of how to do it because it's hard to find anything on the internet

abstract mica Jun 27, 2024, 3:47 PM

#

https://github.com/waefrebeorn/KAN-Stem

GitHub

GitHub - waefrebeorn/KAN-Stem: attempt at using gpt4o to create a K...

attempt at using gpt4o to create a KAN stem training script - waefrebeorn/KAN-Stem

#

I’ll be on huggingface discord more for this since VC told me it would be better to ask there, putting this here for posterity

unkempt wigeon Jun 27, 2024, 4:01 PM

#

Making a simple neuron for a workout Network Temple and get that understanding of it

deep sleet Jun 27, 2024, 6:16 PM

#

What is a vector and tensor in the context of ml?

wooden sail Jun 27, 2024, 6:28 PM

#

deep sleet What is a vector and tensor in the context of ml?

this is a bit of a rabbit hole question, but roughly:

from the maths standpoint: a vector is an element of a vector space, and a tensor is a multilinear transformation
for ML people: any multidimensional array is a tensor, and 1d arrays are vectors. this is enough to get around the basic ML code and papers, but not the more sophisticated stuff or if you want to go in depth
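
A quick sketch of the "ML people" meaning above, assuming NumPy (PyTorch and TensorFlow tensors behave the same way for this purpose). The array names and shapes are made up for illustration:

```python
import numpy as np

# In everyday ML code, "tensor" just means an n-dimensional array.
scalar = np.array(3.0)                       # 0-d: a single number
vector = np.array([1.0, 2.0, 3.0])           # 1-d: what ML people call a vector
matrix = np.ones((2, 3))                     # 2-d
batch_of_images = np.zeros((8, 32, 32, 3))   # 4-d: batch, height, width, channels

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("images", batch_of_images)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")
```

This only covers the array view; it says nothing about the multilinear-map view from the maths side.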

river cape Jun 27, 2024, 6:36 PM

#

why do we need minima when we are already differentiating the loss function with respect to the weight and bias in backpropagation?

wooden sail Jun 27, 2024, 6:39 PM

#

you got that backwards

#

the only reason we take the derivatives in backprop is that it can be useful in finding local minima

river cape Jun 27, 2024, 6:39 PM

#

wooden sail the only reason we take the derivatives in backprop is that it can be useful in ...

How?

#

For a minimum, shouldn't we set the derivative equal to 0?

wooden sail Jun 27, 2024, 6:40 PM

#

under the condition that a function is differentiable and locally convex, it can be shown with some effort that following the negative of the gradient with a proper step size will eventually lead you to a local minimum with gradient 0

wooden sail Jun 27, 2024, 6:41 PM

#

river cape For a minima , shouldnt we equate the derivative to 0?

this is impossible to do directly for anything other than trivial cases

deep sleet Jun 27, 2024, 6:41 PM

#

wooden sail this is a bit of a rabbit hole question, but roughly: * from the maths standpoin...

ohh

wooden sail Jun 27, 2024, 6:41 PM

#

a network with 1 layer and a nonlinear activation function is already a case where doing that explicitly is impossible

#

there are also cases where you could technically do it, but the effort of inverting a matrix is prohibitive, so you can't anyway

#

you almost always have one or both of these cases together in any interesting problem
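
For contrast, here is one of the trivial cases where setting the derivative to 0 does work: a 1-D least-squares line fit. This is only a sketch with made-up data; once a nonlinear activation is involved, the equivalent equations generally have no closed-form solution.

```python
# Trivial case: fit y = m*x + b by least squares, where
# d(loss)/dm = 0 and d(loss)/db = 0 can be solved by hand.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # made-up data lying exactly on y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Solving the two "derivative = 0" equations gives these formulas.
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - m * mean_x
print(m, b)
```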

river cape Jun 27, 2024, 6:45 PM

#

wooden sail you almost always have one or both of these cases together in any interesting pr...

Wait so our approach is to minimize the loss , by finding the minima at which the loss function is minimum for that respective weight?

wooden sail Jun 27, 2024, 6:45 PM

#

that's a redundant way of putting it, but yes

#

you write the loss as a function of the weights, and then tweak the weights in such a way that the loss is small

river cape Jun 27, 2024, 6:46 PM

#

And we use the formula W(new) = W(old) - L.R * the gradient of the Loss w.r.t W(old)

wooden sail Jun 27, 2024, 6:46 PM

#

yes

#

well, that's vanilla gradient descent, but the other methods build up on it
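
The update rule quoted above, W(new) = W(old) - L.R * gradient, can be sketched on a toy problem. The loss here is a made-up one-parameter quadratic, chosen so the minimizer (w = 3) is known in advance:

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # derivative of the loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for step in range(100):
    w = w - learning_rate * grad(w)   # W(new) = W(old) - L.R * gradient

print(w, loss(w))
```

The minimizer is never solved for directly; the loop just keeps stepping against the gradient until the steps become tiny.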

#

i must add that there are also gradient-free methods

#

the philosophy is similar, but you trade in convergence guarantees for a shot at global optimality and relaxing the need for differentiability

river cape Jun 27, 2024, 6:47 PM

#

wooden sail you write the loss as a function of the weights, and then tweak the weights in s...

One more thing , let's say I have 9 trainable parameters so the loss would be the function of those parameters , which in turn means that the loss function is 10-D function right?

river cape Jun 27, 2024, 6:47 PM

#

wooden sail the philosophy is similar, but you trade in convergence guarantees for a shot at...

I still have to cover that

wooden sail Jun 27, 2024, 6:47 PM

#

river cape One more thing , let's say I have 9 trainable parameters so the loss would be th...

no

#

the loss is scalar, and usually real-valued at that

river cape Jun 27, 2024, 6:48 PM

#

wooden sail the loss is scalar, and usually real-valued at that

But if you graph it , it would be in 10 dimensions right?

wooden sail Jun 27, 2024, 6:49 PM

#

sure

#

but that 10 never shows up in any of the math you do on it

#

you have a function f: R^9 -> R

#

or possibly something more restrictive like a 9 dimensional manifold or just some subset of R^9, instead of R^9
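
A small sketch of what f: R^9 -> R means in code, using a hypothetical loss over 9 parameters. The loss takes 9 numbers in and gives one number out, and its gradient has 9 entries; the "10" from the graph never appears.

```python
# Hypothetical loss: squared distance of 9 parameters from a made-up target.
target = [1.0] * 9

def loss(params):
    return sum((p - t) ** 2 for p, t in zip(params, target))

def gradient(params):
    # one partial derivative per parameter
    return [2.0 * (p - t) for p, t in zip(params, target)]

params = [0.0] * 9
value = loss(params)     # a single real number
g = gradient(params)     # a list of 9 numbers
print(value, len(g))
```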

river cape Jun 27, 2024, 6:50 PM

#

Ummm I see thats one thing

#

Nevertheless, I cleared my confusion regarding the minima

#

Btw can we differentiate a max function?

wooden sail Jun 27, 2024, 6:51 PM

#

remember the minima are the values the loss takes, the minimizers are the parameters

wooden sail Jun 27, 2024, 6:51 PM

#

river cape Btw can we differentiate a max function?

not in the conventional sense, no

river cape Jun 27, 2024, 6:52 PM

#

wooden sail not in the conventional sense, no

So the perceptron loss function doesnt use GD I believe?

wooden sail Jun 27, 2024, 6:52 PM

#

it does

#

but you'll either accept the output as probabilities of a categorical distribution, or use a smooth approximation to the max function during training

river cape Jun 27, 2024, 6:53 PM

#

wooden sail it does

How will it differentiate as the loss function is max(0,-y*f(X))

wooden sail Jun 27, 2024, 6:53 PM

#

it won't

#

the relu is also not differentiable at 0 btw. all modules make an arbitrary choice of subgradient for the relu there, since it's subdifferentiable

#

for classifiers, you remove the max altogether or use something like a softmax
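
To make the subgradient remark concrete, here is a sketch that applies the same convention to the perceptron loss max(0, -y*f(X)) mentioned above. The function names are hypothetical and the choice of 0 at the kink is just one valid option, not a claim about what any particular library does:

```python
def perceptron_loss(y, score):
    """max(0, -y * score) for a label y in {-1, +1}."""
    return max(0.0, -y * score)

def perceptron_subgradient(y, x, w):
    """One valid subgradient of the loss with respect to w."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if -y * score > 0:             # misclassified: loss is the linear piece -y * w.x
        return [-y * xi for xi in x]
    return [0.0] * len(w)          # correct side, or exactly at the kink: pick 0

w = [0.5, -1.0]
x, y = [1.0, 2.0], 1               # score = -1.5, so this point is misclassified
g = perceptron_subgradient(y, x, w)
w_new = [wi - 0.1 * gi for wi, gi in zip(w, g)]

old_score = sum(wi * xi for wi, xi in zip(w, x))
new_score = sum(wi * xi for wi, xi in zip(w_new, x))
print(perceptron_loss(y, old_score), perceptron_loss(y, new_score))
```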

river cape Jun 27, 2024, 6:54 PM

#

wooden sail for classifiers, you remove the max altogether or use something like a softmax

So that it brings down the summation to a probability?

wooden sail Jun 27, 2024, 6:55 PM

#

what summation?

river cape Jun 27, 2024, 6:56 PM

#

wooden sail what summation?

f(x)= w1X1 + w2X2 +b2

wooden sail Jun 27, 2024, 6:57 PM

#

and wdym by "bring down"

#

the usual approach is that, if a classifier is supposed to output a particular class, this is (roughly) the same as saying that class has a probability 1 and the others have probability 0

#

and now we get the network's output probabilities to match that

river cape Jun 27, 2024, 7:04 PM

#

wooden sail and now we get the network's output probabilities match that

What I meant was, we first use forward propagation to find the dot product, and then use an activation function like softmax to bring it down to a range.

wooden sail Jun 27, 2024, 7:04 PM

#

sure
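
A minimal sketch of the idea discussed above: raw scores go through a softmax to become probabilities, and the loss compares them against "probability 1 for the true class". The scores are made up, and cross-entropy is assumed as the loss since the chat does not name one:

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # target is "probability 1 for target_index, 0 elsewhere"
    return -math.log(probs[target_index])

scores = [2.0, 1.0, 0.1]     # made-up raw outputs (w.x + b) for three classes
probs = softmax(scores)
loss = cross_entropy(probs, target_index=0)
print(probs, loss)
```

The loss shrinks toward 0 as the probability of the true class approaches 1, and both softmax and the log are smooth, so plain gradient descent applies.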

river cape Jun 27, 2024, 7:16 PM

#

wooden sail sure

Question

#

Lets say I have a regression problem

#

I would use 2 nodes in the input layer , 2 nodes in hidden layer and 1 node in the output layer

#

My loss function would be mean squared error

#

and let's assume that only the bias of the output node is variable , all of the remaining parameters are constant

#

So technically speaking my loss is entirely dependent on the bias right

wooden sail Jun 27, 2024, 7:19 PM

#

yes

river cape Jun 27, 2024, 7:20 PM

#

And now I find the derivative of the loss wrt to the bias

#

and lets say it is positive

#

so that would mean if i increase the bias , the loss would also increase right?

wooden sail Jun 27, 2024, 7:21 PM

#

that is what the gradient tells you, yes

#

the gradient points in the direction that a function increases the most

river cape Jun 27, 2024, 7:21 PM

#

So we bring down the value of the bias by subtracting it with the derivative

wooden sail Jun 27, 2024, 7:22 PM

#

subtraction is not commutative so your wording is very ambiguous, but yes

river cape Jun 27, 2024, 7:23 PM

#

But if the derivative is positive , wouldnt it also mean that decreasing the bias , would decrease my loss?
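
The question above can be checked numerically. This is a sketch with made-up numbers, assuming everything except the output bias is frozen, so the network's pre-bias outputs are just constants:

```python
# Made-up frozen network outputs (before adding the bias) and targets.
fixed_outputs = [1.0, 2.0, 3.0]
targets = [0.5, 1.5, 2.5]

def mse(bias):
    return sum((o + bias - t) ** 2
               for o, t in zip(fixed_outputs, targets)) / len(targets)

def dmse_dbias(bias):
    return sum(2.0 * (o + bias - t)
               for o, t in zip(fixed_outputs, targets)) / len(targets)

bias = 1.0
grad = dmse_dbias(bias)          # positive here
new_bias = bias - 0.1 * grad     # so the update lowers the bias
print(grad, mse(bias), mse(new_bias))
```

With a positive derivative, a small enough step down in the bias lowers the loss; too large a step can overshoot the minimum.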

deep sleet Jun 27, 2024, 7:24 PM

#

What are the pros and cons of capping?

#

because can't outliers in certain scenarios show you the relation between certain variables that you otherwise can't find?

chrome lake Jun 27, 2024, 10:14 PM

#

Hi, I was making this project from tensorflow https://www.tensorflow.org/tutorials/keras/text_classification_with_hub and I wanted to deploy it to a free hosting site, like pythonanywhere. The problem that I have encountered is that pythonanywhere doesn't support TensorFlow or Keras. So I thought of saving the model with pickle, which is supported; however, I realized that you can't save a keras model using pickle and that you need to use an .h5 file format, which is loaded using keras.
Is there anything that I can do to load the model without using keras or tensorflow?
Also, sorry if this is not the correct channel to ask this

scenic parcel Jun 27, 2024, 10:54 PM

#

left tartan Half my life is replacing pandas code. Its good for rapid dev, but hits a wall q...

what do you replace it with? duckdb? if one was concerned with performance they'd not even be using python no? (unless pytorch or something is involved)

deep sleet Jun 27, 2024, 11:13 PM

#

I was reading some code on kaggle and encountered this

#


```
In Logistic Regression, we use default value of C = 1. It provides good performance with approximately 85% accuracy on both the training and the test set. But the model performance on both the training and test set are very comparable. It is likely the case of underfitting.

I will increase C and fit a more flexible model
```

#

Why would this be underfitting? isn't this perfect for the model?

#

it was able to capture the general trend and ignore the noise

left tartan Jun 27, 2024, 11:39 PM

#

Either DuckDB or some Polars or just consolidating (refactoring), and some just properly vectorizing operations (ie: getting rid of loops). Performance is fine with Python, driving Polars or DuckDB, although sure there's room for gain... but in analytical workflows, my biggest battle is complexity and reuse: sql (and dbt) give me a better structure @scenic parcel

sturdy canyon Jun 28, 2024, 2:41 AM

#

Anybody have a resource/framework they'd recommend for distributed training? I tend to use AWS as a cloud provider, and have set up an internet facing multi-instance inference platform on EC2 in the past. I am reluctant to use SageMaker due to my impression it's trading ease of use for increased cost and abstracting things that I should probably learn instead. Though, if that's not the case I'm willing to change my mind.

tender hearth Jun 28, 2024, 3:47 AM

#

@left tartan Sorry for disturbing you. Do you have an example of using DuckDB? I plan to use it with Django instead of Pandas but find it hard to use persistent data and create custom SQL based on query parameters.

left tartan Jun 28, 2024, 3:48 AM

#

tender hearth <@738234281146712084> Sorry for disturbing you. Do you have an example of using ...

Search this discord for 'import duckdb', I've posted a few examples in past

tender hearth Jun 28, 2024, 3:48 AM

#

thank you so much

woven hollow Jun 28, 2024, 4:46 AM

#

what is duckdb?

devout sail Jun 28, 2024, 4:54 AM

#

deep sleet ```The training-set accuracy score is 0.8476 while the test-set accuracy to be 0...

Even though you want to avoid overfitting, you still expect the performance on the training set to be a bit better, since that's the data you minimized your error for. So they're theorizing that since the training accuracy is about the same and even slightly lower than the test accuracy (meaning, the performance on the training set is comparable to data it wasn't trained on), that the model didn't eke out everything it could out of the training data, and suggest that might be because the regularization is too strict.

deep sleet Jun 28, 2024, 4:58 AM

#

devout sail Even though you want to avoid overfitting, you still expect the performance on t...

Ohh

#

Makes sense

#

Tysm!

half lintel Jun 28, 2024, 5:10 AM

#

is there a pandas-specific channel (or server)? I'm fairly experienced with python but pretty new to Pandas, would appreciate some help as I try and do things...

jaunty helm Jun 28, 2024, 5:11 AM

#

half lintel is there a pandas-specific channel (or server)? I'm fairly experienced with pyt...

not really, but you can ask here or open a thread in #1035199133436354600

tawdry monolith Jun 28, 2024, 5:28 AM

#

https://youtu.be/kQQaO5Cm5AI?si=boVbr88e72MzLWA8

Is this enough pandas to move forward in my journey to become an ML engineer, or should I learn more things about it?

YouTube

WsCube Tech

Python Pandas Tutorial for Beginners [FREE | Learn Pandas in 3 Hours

In this video, learn Python Pandas Tutorial for Beginners [FREE] | Learn Pandas in 3 Hours.

00:00:00 What is Data Analysis
00:15:10 What is Data Structures in Pandas (Pandas Series Data Structures)
00:29:42 DataFrames Data Structures in Pandas
00:41:01 Arithmetic Operators in Pandas
00:48:40 Delete and Insert Data in Pandas
00:58:22 Write ...


past meteor Jun 28, 2024, 5:31 AM

#

tawdry monolith https://youtu.be/kQQaO5Cm5AI?si=boVbr88e72MzLWA8 Is this enough pandas to be m...

I strongly recommend you to just use the official pandas tutorial/docs

#

https://pandas.pydata.org/docs/user_guide/index.html

#

Often times people making these videos/courses don't really know the tech either, they target beginners that can't tell and make money off of that

tawdry monolith Jun 28, 2024, 5:34 AM

#

Ok thank you

tawdry monolith Jun 28, 2024, 5:39 AM

#

past meteor I strongly recommend you to just use the official pandas tutorial/docs

BTW one question I want to ask: do I have to learn everything about pandas, or?

glad whale Jun 28, 2024, 5:40 AM

#

@past meteor which bachelor / master did you do?

#

Im just curious

past meteor Jun 28, 2024, 5:41 AM

#

tawdry monolith BTW one questions I want to ask do I have to learn everything about pandas or ?

skim the documentation to know what exists and then start using it, then use the docs as a reference. Nobody knows "everything about Pandas", but a lot of people know where to find what they need etc.

past meteor Jun 28, 2024, 5:41 AM

#

glad whale Im just curious

and why

glad whale Jun 28, 2024, 5:42 AM

#

past meteor and why

I want to choose a masters and I see that you are proficient in the field of data science

past meteor Jun 28, 2024, 5:42 AM

#

What are your options?

past meteor Jun 28, 2024, 5:51 AM

#

glad whale I want to choose a masters and I see that you are proficient in the field of dat...

On average I think MS CS will teach you the ideas behind ML models and will also give you the required baggage to deploy models. My sample size isn't huge, but what's often missing from MS CS is the "finesse" of actually doing statistical modelling.

Statistics is another viable option but it's the opposite. You'll get all the finesse of modelling imaginable but probably not enough of the real world concerns (deployment, MLops, ...).

There's also MS data science (or AI). There you can't go off of the name, I'd really have to see the content because all of them I've seen are very different.

Finally, you can also pick applied fields if you have a specific interest you want to apply data science in. Experimental psychology, bioinformatics, computational chemistry, actuarial science, ... are all examples and there are many more

wooden sail Jun 28, 2024, 5:53 AM

#

there's also signal processing :lemon_fingerguns_shades: that comes with variants like medical sigproc/imaging, communications, and more

past meteor Jun 28, 2024, 5:53 AM

#

exactly, also a fine choice. My alma mater doesn't offer it, but it offers EE

#

EE into ML is a very solid choice as well. I think all of the very specific and advanced vision courses were exclusively done there (due to the signal proc background)

wooden sail Jun 28, 2024, 6:11 AM

#

maybe i would add that data science and ML are probably best seen as tools you apply within another field, so you'll always be better if you have the specific domain knowledge of where you plan on using them. if you already know what applications you like, you can mix the two things together. if you don't, then a more stand-alone learning of DS and ML might be better, with the understanding that you'll have to learn about the application area later

#

i think both zestar and i learned all our DS and ML stuff in the context of a particular application, and as a result both of us know a lot of non overlapping methods and maths simply because some are more common in some fields

toxic mortar Jun 28, 2024, 6:43 AM

#

Hello guys, I'm working on an NLP classification task that involves specialized terminology/lingo. Is it realistic to fine-tune existing models such as bert/some other, or would you recommend starting with a baseline model such as naive bayes/some other and then working through iterations with a custom nlp model? I'd analyze the dataset to see what type of data I'm working with, and then run some preliminary tests with various models to get a performance baseline to compare against later on. Any insights / docs on structuring the dev process would be appreciated. Thanks! 😄

main citrus Jun 28, 2024, 6:57 AM

#

Does someone have an AI model which can reduce noise for an mp4 file?

#

Or mp3

lapis sequoia Jun 28, 2024, 7:07 AM

#

can anyone help me get the room direction from a 2D image? The image will always be an indoor image

odd meteor Jun 28, 2024, 7:51 AM

#

toxic mortar Hello guys, I'm working an NLP classification task that involves specialized ter...

It depends... 😃 if I have enough time and I'm not rushing to beat any deadline, I'll definitely start from the classics; building a baseline model.

If I'm in my "let's go family, it's show timeeeee" mood, I'll go as far as adding transfer learning to it just so I can compare and contrast different model performance. (to be honest this part is more fun for me lol)

topaz stirrup Jun 28, 2024, 7:54 AM

#

guys when making ai, do u make a neuron class? and what are its attributes

#

im trying to make an adaptive neural network, and its well confusing me

south wraith Jun 28, 2024, 8:12 AM

#

hi folks, been having a lot of issues with my base code for my teaching module for my AI. Hoping that someone may be able to give me a few pointers on why this issue may be happening? I'm not super advanced in python, but everything appears correct, and it just keeps erroring. very frustrating. Hoping someone can have a peek to get a second pair of eyes on it to see what it is that I am missing please?

#

If anyone thinks they would be able to lend a quick hand, feel free to DM me.

past meteor Jun 28, 2024, 9:04 AM

#

south wraith hi folks, been having a lot of issues with my base code for my teaching module f...

just a heads up, in this channel it's typically more helpful to ask your question directly and be as specific as possible

#

People will rarely commit to DMs, they'll typically prefer to see a question they're able to answer directly (or not)

drifting plaza Jun 28, 2024, 10:01 AM

#

What courses/youtube videos/resources do you all suggest when learning about pytorch?

unkempt apex Jun 28, 2024, 10:04 AM

#

drifting plaza What courses/youtube videos/resources do you all suggest when learning about pyt...

Docs!

wooden sail Jun 28, 2024, 10:16 AM

#

i did sigproc for my masters too

toxic mortar Jun 28, 2024, 12:00 PM

#

What do you think about kaggle ml competitions?

toxic mortar Jun 28, 2024, 12:01 PM

#

toxic mortar Hello guys, I'm working an NLP classification task that involves specialized ter...

Do you think looking through the best approaches from leaderboards for a specific similar use-case might help me with my problem?

south wraith Jun 28, 2024, 12:18 PM

#

past meteor just a heads up, in this channel it's typically more helpful to ask your questio...

Problem is that I don't know what to ask other than what I asked, because I could ask one question and the problem may be something completely different that I'm just not seeing, which is why I need a fresh set of eyes to look over it as I've probably got tunnel vision in relation to it.

left tartan Jun 28, 2024, 12:22 PM

#

south wraith Problem is that I don't know what to ask other than what I asked, because I coul...

At least give more information and context. Your prompt was devoid of any details, code, context, etc

left tartan Jun 28, 2024, 12:22 PM

#

south wraith Problem is that I don't know what to ask other than what I asked, because I coul...

You say it's 'erroring'. What errors? Etc

south wraith Jun 28, 2024, 12:25 PM

#

"UnboundLocalError: cannot access local variable 'y' where it is not associated with a value"

#

that's the error at the moment.

#

Before that, "ValueError: too many values to unpack (expected 3)"

#

Before that there was serialisation error

#

I've been going round and round on the same errors for a while now.

#

I have no idea what error is the actual error.

#

What is the actual real error though, must be something other than that because there are multiple errors that I keep going in circles with.

#

I'm currently cutting the code down so that I can upload it.

#

Essentially the error is somewhere inside....
```
i = self.sigmoid(np.dot(x, self.W_i[0, :]) + np.dot(h_prev, self.U_i) + self.b_i)
f = self.sigmoid(np.dot(x, self.W_f[0, :]) + np.dot(h_prev, self.U_f) + self.b_f)
c = f * c_prev + i * self.tanh(np.dot(x, self.W_c[0, :]) + np.dot(h_prev, self.U_c) + self.b_c)
o = self.sigmoid(np.dot(x, self.W_o[0, :]) + np.dot(h_prev, self.U_o) + self.b_o)
h = o * self.tanh(c)
y = np.dot(h, self.W_hy)
```

#

as far as I am aware, because y isn't getting a value.

#

and nothing inside there is causing an exception

#

So I've tried many things. I've sent x with an np newaxis, I've sent x as it stands without modification or alteration. But nothing has worked.
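
The full code isn't shown here, so the following is only a guess at the pattern. One common way to get "cannot access local variable 'y'" is that the line assigning `y` never runs (a swallowed exception, or an `if`/loop body that is skipped) and `y` is used afterwards. The function and the dict key below are hypothetical:

```python
def forward(x):
    try:
        h = x["hidden"]     # raises KeyError when the key is missing
        y = h * 2
    except KeyError:
        pass                # the real error is silently swallowed here
    return y                # y was never assigned on the failing path

try:
    forward({})
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```

If the real code has a similar `try`/`except` or conditional around the forward pass, printing or re-raising the caught exception usually reveals the underlying error.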

south wraith Jun 28, 2024, 12:46 PM

#

If you would like to help, then please go to my post and have a look.

left tartan Jun 28, 2024, 12:51 PM

#

south wraith If you would like to help, then please go to my post and have a look.

Link to post?

left tartan Jun 28, 2024, 12:52 PM

#

south wraith I have no idea what error is the actual error.

For what it's worth, those are just code issues. They're all 'actual' errors that you need to fix first, before getting to anything related to your training.

south wraith Jun 28, 2024, 12:56 PM

#

Can't upload the traceback as it hits the 2000 character limit. sorry

left tartan Jun 28, 2024, 12:57 PM

#

south wraith Can't upload the traceback as it hits the 2000 character limit. sorry

First step: open a help thread so the conversation stays in one place: #❓｜how-to-get-help

#

!paste long blocks of code

arctic wedgeBOT Jun 28, 2024, 12:57 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

south wraith Jun 28, 2024, 12:57 PM

#

left tartan Link to post?

https://discord.com/channels/267624335836053506/1256220828173860916

south wraith Jun 28, 2024, 12:58 PM

#

left tartan !paste long blocks of code

thanks

past meteor Jun 28, 2024, 1:18 PM

#

wooden sail i did sigproc for my masters too

One day when I grow up I'll know 10 % of the maths you do 🙏

#

I'm reading a 1000+ page refresher on operating systems whenever I have a spare hour

south wraith Jun 28, 2024, 2:21 PM

#

howdy peeps, just got a bit further ahead of where I was earlier thanks to Billy, but there are still a few issues.. Things aren't being broadcast together properly.

As example setup...

import numpy as np
a = np.random.random((1, 1))
b = np.random.random((96,))
b= b.reshape(1, -1)
result = np.dot(a, b)

This works correctly and is broadcast together.
I have 2 np arrays in other code of same variance that I have applied them to and now I'm being told they can't be broadcast together in the sigmoid.

ValueError: operands could not be broadcast together with shapes (1,12288) (1,96)

I did a
xax = self.W_i.reshape(1,-1)
to reshape the array like I did in the quick tester.

x1=np.dot(x, xax)
x2=np.dot(h_prev, self.U_i)
i = self.sigmoid(x1 + x2 + self.b_i)

But sigmoid isn't working right..

File "/home/user/AI/ai4.py", line 118, in forward
i = self.sigmoid(x1 + x2 + xbi)
~^~

#

It points at "x1 + x2"
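
A sketch that reproduces the shape mismatch in the error above. The shapes of `W_i` and `U_i` are assumptions, picked only so the numbers match the message (128 * 96 = 12288); the real ones aren't shown. Note that `np.dot` on two 2-D arrays is a matrix product, not broadcasting, so the `+` is the first place where shapes have to line up:

```python
import numpy as np

x = np.random.random((1, 1))
W_i = np.random.random((128, 96))     # hypothetical shape, 12288 entries in total
h_prev = np.random.random((1, 96))
U_i = np.random.random((96, 96))      # hypothetical shape

x1 = np.dot(x, W_i.reshape(1, -1))    # (1, 1) @ (1, 12288) -> (1, 12288)
x2 = np.dot(h_prev, U_i)              # (1, 96) @ (96, 96)  -> (1, 96)
print(x1.shape, x2.shape)

try:
    x1 + x2
except ValueError as err:
    print(err)                         # the two shapes cannot be broadcast
```

If that guess is right, `reshape(1, -1)` is flattening the whole weight matrix into one row. Both terms need to come out as `(1, 96)` before they can be added.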

lapis sequoia Jun 28, 2024, 3:48 PM

#

Hi,
I am working on one-step-ahead time series forecasting. The following picture shows the forecasting result; as you can see, the time series is highly variable, and the R2 is 62%. I am using two models, CNN-LSTM-Attention and GRU-Attention. Please, I need your opinion: what do you think about the results? Can they be improved?

unkempt apex Jun 28, 2024, 6:28 PM

#

right side -> loss
left side -> average reward ( Q values )

why it's confusing: the loss values seem to be increasing (which is bad), whereas the average Q values are increasing (which is nice)

spring field Jun 28, 2024, 6:35 PM

#

unkempt apex right side -> loss left side -> average reward ( Q values ) why it is confusing...

is the loss td error? it's a bit finicky because per the papers and such what you want to do is gradient ascent not descent, so the value should indeed be increasing, but usually you just invert the loss and descend it, though you could then invert it again for plotting I guess? (I can't tell what you're doing without the code anyway) either way, RL do be a bit confusing 😁

unkempt apex Jun 28, 2024, 6:42 PM

#

yeah, RL is bit confusing for initial episodes, because it learns very slow!

#

do you need my code?

#

to check ?

spring field Jun 28, 2024, 7:01 PM

#

I'm honestly not sure what I would be looking for, I'm ever so slightly out of loop with RL right now 😁

unkempt apex Jun 28, 2024, 7:02 PM

#

yeah, I mean I have to train this on at least 100k then something will be out!

scenic parcel Jun 28, 2024, 7:35 PM

#

https://archive.is/20240410144342/https://medium.com/@cautaerts/a-dataframe-is-a-bad-abstraction-8b2d84fa373f

#

article about a dataframe being a bad abstraction. first argument seems to be lack of ability for type checking

left tartan Jun 28, 2024, 7:59 PM

#

scenic parcel article about a dataframe being a bad abstraction. first argument seems to be la...

Somewhat a silly debate in Python/dynamic type land

#

Oh, their argument is really against tables?

#

That everything should exist as an entity/object?

glossy urchin Jun 28, 2024, 8:12 PM

#

would this be the place to ask about data scraping

spring field Jun 28, 2024, 8:18 PM

#

left tartan Oh, their argument is really against tables?

does it go so far to claim that NoSQL is the only way, lol?

left tartan Jun 28, 2024, 8:19 PM

#

spring field does it go so far to claim that NoSQL is the only way, lol?

It's more about strongly typing I guess. That we need some typing overlay? I didn't finish yet

sweet harness Jun 28, 2024, 9:01 PM

#

hay

#

Guys, has anybody tried to create a trading bot with a neuroevolution algo?

topaz stirrup Jun 28, 2024, 9:16 PM

#

guys can anyone take a look at my code? i really cant find the issue on my genetic algorithm, it just does not want to learn!!!!! (i trained it for an hour but the average score didnt increase, while it should learn a high score within minutes)

half bolt Jun 28, 2024, 10:00 PM

#

How to get started with ai on mobile ?

past meteor Jun 28, 2024, 10:08 PM

#

not really

#

You need dependent typing or more for this to work

#

Dataframe libs like Pandas can let you arbitrarily add new columns with whatever names at whatever time

#

How will you know what the type is at any point in time

#

Because, that's the point of data frames. Removing that removes the point

past meteor Jun 28, 2024, 10:12 PM

#

past meteor Dataframe libs like Pandas can let you arbitrarily add new columns with whatever...

Can they solve that problem? It's one I spent too much time thinking of myself (strongly typed DFs)

#

I know Pandera

#

hmmm

#

Sounds convoluted

#

This is not the way to solve this problem (nor are statically typed dataframes)

#

Data versioning is also somewhat a solved problem

#

Have you heard of slowly changing dimensions?

#

Data warehouses have a data versioning problem

#

But, the schema is consistent

#

It's appropriate in some situations, but not in others

#

You can flip this problem on its head

#

tag datasets, tag runs

#

have a small CLI tool that can roll back time to a tagged dataset and execute a run on its commit hash

#

this is what I'd do for ML. I actually do this without the CLI tool

#

For analytics this is a terrible idea

#

because it's a solved problem

rich moth Jun 28, 2024, 10:25 PM

#

Couldn't you incorporate data versioning into the code encoding the version information as part of the metadata for each document?

past meteor Jun 28, 2024, 10:25 PM

#

What are we talking about though

#

NLP? Vision? or general data

#

for analytics or similar

#

for the NLP and vision this might be an OK idea

spring field Jun 28, 2024, 10:26 PM

#

sweet harness Guys, did anybody tried to create a trading bot with neuroevolution algo?

someone probably has? though if you plan on trading purely based on price, know that that is quite baseless of a trading strategy

past meteor Jun 28, 2024, 10:27 PM

#

typical "business" data

#

even the data I work with, which is not "business" data

#

structured data

#

With a relatively fixed schema

rich moth Jun 28, 2024, 10:29 PM

#

spring field someone probably has? though if you plan on trading purely based on price, know ...

Well, I store all my datasets in elasticsearch. When I embed the data into the server, you can create custom fields that track information related to the dataset. I imagine you could track where the information is coming from and update the embeddings of the data more easily.

scenic parcel Jun 28, 2024, 10:31 PM

#

I just make a strenum that tells me what the columns of a df are
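
A sketch of the enum-of-column-names idea, using plain dict rows instead of a real dataframe so it runs without extra libraries. The table and column names are made up. `enum.StrEnum` needs Python 3.11+, so this uses the older `str` + `Enum` mix-in, which works similarly for this purpose:

```python
from enum import Enum

class Col(str, Enum):
    """Hypothetical column names for a made-up orders table."""
    ORDER_ID = "order_id"
    PRICE = "price"
    QUANTITY = "quantity"

rows = [
    {Col.ORDER_ID.value: 1, Col.PRICE.value: 9.5, Col.QUANTITY.value: 2},
    {Col.ORDER_ID.value: 2, Col.PRICE.value: 4.0, Col.QUANTITY.value: 5},
]

# Cheap schema check: which expected columns are missing from any row?
missing = [c.value for c in Col if any(c.value not in row for row in rows)]
totals = [row[Col.PRICE.value] * row[Col.QUANTITY.value] for row in rows]
print(missing, totals)
```

A typo like `Col.PRCIE` fails immediately with an `AttributeError` and gets flagged by editors, whereas a mistyped string key only fails when that line runs.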

past meteor Jun 28, 2024, 10:31 PM

#

there's DBT

#

But you'll be writing SQL then

tropic moss Jun 28, 2024, 10:31 PM

#

Pro tip: instead of just slamming dependency installs into your terminal, read and at least understand their function

past meteor Jun 28, 2024, 10:35 PM

#

would you cache all the methods?

deep sleet Jun 28, 2024, 10:36 PM

#

sweet harness Guys, did anybody tried to create a trading bot with neuroevolution algo?

Yo

#

dms

past meteor Jun 28, 2024, 10:37 PM

#

I assume image_resized is a transformation step?

#

Do you cache the result after each call or recompute?

#

I like types as much as the next guy but ...

#

There's also other tools

#

Just test your code

#

When the effort of types gets too much just write a test

#

what do you mean?

#

yes, and? I don't know what you mean?

#

define "a lot of files"

#

you mean, code?

#

So you mean, a lot of data

#

Seems like you have something very specific in mind and I don't understand it. That's fine.

#

Stuff like Airflow solves a lot of this

#

and DBT models are also made exactly for this

#

There's also something called "data lineage" worth looking into

#

https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/lineage.html

#

DVC is something I'm so skeptical about

#

make it and once I have it in my hands I can critique it better

#

I think your workflow is idiosyncratic

#

You've had problems that are unique to what you are/were doing

#

But they may very well not be the problems and scenarios that are common

#

Why isn't your data in a DB

#

And that's when it becomes idiosyncratic

#

why

#

... an object storage?

#

You're saying all of this because, fundamentally, you did these projects solo

#

How do you scale what you did to teams

#

A database does a lot more than just reading and writing files

#

Firstly, there's a thing called the medallion architecture. In this image they're showing it off with structured data

#

But you can do the same idea for unstructured data (images, sound, ...)

#

You keep the data in bronze, you can transform it to silver in multiple ways and times

#

If you change (the result of) your transform logic, which is a very very expensive thing to do irl, you can make a different section in silver and/or bronze

#

Also

#

How are you enforcing role based access control?

#

At a basic level if you don't want all the goodies that object storages have

#

Having a principled way to do user management is important

#

A big part of minio and S3 is just governance

#

Having the same tier of fine grained permissions with git and git LFS... idk

#

Okay so, then your data isn't in your repo

#

then it's in S3. Then it is in an object storage

#

Then isn't an entire project localized to your git repo if you use S3 or similar anyway?

#

How is that different to now where you have a DAG that reads data from an object storage, does transforms and writes it back? (the status quo)

#

What if you have a feature branch that has CD to a staging area

#

You submit a PR

#

It gets merged, CD to prod

#

You read main, you know it's occurred

#

Nah i'm just describing the standard workflow of companies with good engineering hygiene

#

If you have continuous deployment and you test stuff out in other branches isn't what you read in main exactly what happened in reality

#

With the ELT "pattern" you never tamper with your source data which means you can absolutely checkout to a commit, run your pipeline and have that dataset

#

Especially if you have the date of the commit and add created_at < commit_date
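A rough sketch of that `created_at < commit_date` filter, using an in-memory SQLite table as a stand-in for the untouched source data. The table name `raw_events`, its columns, and the helper `snapshot_as_of` are made up for illustration; it also assumes every timestamp is an ISO 8601 string in the same timezone, so plain string comparison orders them correctly.

```python
import sqlite3

# In-memory stand-in for the raw (never modified) source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (id INTEGER, created_at TEXT, value REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "2024-06-01T09:00:00", 10.0),
        (2, "2024-06-15T12:30:00", 12.5),
        (3, "2024-06-27T18:45:00", 11.0),
    ],
)


def snapshot_as_of(conn, commit_date):
    """Return only the raw rows that already existed at commit_date."""
    cur = conn.execute(
        "SELECT id, created_at, value FROM raw_events "
        "WHERE created_at < ? ORDER BY id",
        (commit_date,),
    )
    return cur.fetchall()


# Pretend the commit being checked out was made on 2024-06-20.
rows = snapshot_as_of(conn, "2024-06-20T00:00:00")
```

Running the checked-out pipeline code on `rows` would then rebuild the dataset as it looked at that commit, as long as the source rows are append-only.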

#

I'm "challenging" you on this not because I don't think it's a good idea

#

It is, it just isn't worth the paradigm shift imo

#

But the same can be said about really knowing what exists in this space already

#

Rather, to try old things

#

So as to not reinvent the wheel, but square 😄

#

Which is an odd place to start

#

It's not

#

It's very niche

#

https://www.reddit.com/r/MachineLearning/comments/mrb096/comment/gun8aa0/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button this is how I feel about DVC, said the same as me but differently

ploomber-io's comment on ""[Discussion]" Should I be using DVC (Dat...

#

More eloquently

#

https://www.reddit.com/r/mlops/comments/16u7o2w/comment/k2ji9lc/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

qalis's comment on "Data versioning: what is out there?"

#

another link, says the same as I do

past meteor Jun 28, 2024, 11:24 PM

#

past meteor With the ELT "pattern" you never tamper with your source data which means you ca...

.

#

idk how your pipeline can be anything but ephemeral

#

it should always be

#

If it is, just run it

#

Just show me if/when it's done and I'll have an unbiased look

#

But if you invented a square wheel out of blindspots I'll tell you

#

gl with the takehome

sage sparrow Jun 28, 2024, 11:39 PM

#

I have this project where I must predict the next day's high and low temperature and was checking for normality, this is the Q-Q plot. What should I do about the not normally distributed residuals? Or how should I investigate this further?

deep sleet Jun 28, 2024, 11:42 PM

#

sage sparrow I have this project where I must predict the next day's high and low temperature...

what features do you train it with?

sage sparrow Jun 29, 2024, 12:14 AM

#

deep sleet what features do you train it with?

features = ['dew', 'humidity', 'precip', 'precipcover', 'windgust', 'cloudcover', 'visibility']

deep sleet Jun 29, 2024, 12:16 AM

#

Do most of these features have a linear relationship?

#

If not then you can give a look to randomforests

sage sparrow Jun 29, 2024, 12:21 AM

#

deep sleet Do most of these features have a linear relationship?

How should I check if the features have a linear relationship?

deep sleet Jun 29, 2024, 12:22 AM

#

sage sparrow How should I check if the features have a linear relationship?

by plotting each feature against the target

#

It's called data analysis

#

understanding the relation between your features and the target to identify the best model
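A small sketch of a numeric companion to those plots: the Pearson correlation between each feature and the target. The numbers below are invented purely for illustration (only the feature names `dew` and `cloudcover` come from the list above), and a scatter plot is still worth looking at, since a low coefficient can hide a strong non-linear relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values: next-day high temperature and two candidate features.
temp_max = [20.0, 22.0, 25.0, 27.0, 30.0]
features = {
    "dew": [10.0, 11.0, 12.5, 13.5, 15.0],        # moves in step with temp_max
    "cloudcover": [50.0, 20.0, 80.0, 30.0, 60.0],  # no obvious pattern
}

# Values near +1 or -1 suggest a roughly linear relation with the target.
correlations = {name: pearson(vals, temp_max) for name, vals in features.items()}
```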

sage sparrow Jun 29, 2024, 12:25 AM

#

deep sleet understanding the relation between your features and he target to identify the b...

Yup, they have a linear relationship. I've done tests but have been cramming so much info these days that I'm not sure about anything right now lol

Thank you by the way, for your responses

deep sleet Jun 29, 2024, 12:26 AM

#

sage sparrow Yup, they have a linear relationship. I've done tests but have been cramming so ...

hmm I am honestly new to this too xd

sage sparrow Jun 29, 2024, 12:26 AM

#

Ooh boy, still, how long do you have?

deep sleet Jun 29, 2024, 12:27 AM

#

Well if so I think that's the best you can do and someone with more experience will probably have a better suggestion

deep sleet Jun 29, 2024, 12:27 AM

#

sage sparrow Ooh boy, still, how long do you have?

about 2 weeks xd

sage sparrow Jun 29, 2024, 12:27 AM

#

All right, thank you : )

scenic parcel Jun 29, 2024, 1:46 AM

#

I've found 3 typos in dagster's docs so far

sage sparrow Jun 29, 2024, 2:39 AM

#

scenic parcel I've found 3 typos in dagster's docs so far

Sue them

scenic parcel Jun 29, 2024, 2:42 AM

#

my payday is coming

sage sparrow Jun 29, 2024, 2:48 AM

#

https://tenor.com/view/kermit-the-frog-tea-drink-meme-sip-gif-7971490

unique spoke Jun 29, 2024, 11:48 AM

#

Anyone over here experienced with Opencv?

#

Was wondering how I could just identify all objects in an image

#

Doesnt require recognition but just detection

#

for example :

#

in a street like this, it would maybe identify all the different people and place a box enclosing them

#

I would also be using an edge detector on this

proven inlet Jun 29, 2024, 1:14 PM

#

I'm trying to train AI using Torch & Transformers but This chatbot literally copies me.

#

i used 7k lines of messages to train it

#

using Cuda via google colab

#

from transformers import BertLMHeadModel, BertTokenizer, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    num_train_epochs=10,
    per_device_train_batch_size=2,
    save_steps=2000,
    save_total_limit=2,
    learning_rate=0.001
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
    data_collator=data_collator
)

trainer.train()

model.save_pretrained("./trained_model")
tokenizer.save_pretrained("./trained_model")

model = BertLMHeadModel.from_pretrained("./trained_model").to(device)
tokenizer = BertTokenizer.from_pretrained("./trained_model")

def chatbot_response(input_text, model, tokenizer):
    input_ids = tokenizer(input_text, return_tensors='pt').input_ids.to(device)
    output = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.pad_token_id)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    return response

if __name__ == "__main__":
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            break
        response = chatbot_response(user_input, model, tokenizer)
        print(f'Bot: {response}')

#

it's just a snippet from the code

glass ridge Jun 29, 2024, 1:29 PM

#

does ml require using linux

#

or i can just use windows

proven inlet Jun 29, 2024, 1:33 PM

#

u can use windows

half bolt Jun 29, 2024, 1:51 PM

#

glass ridge does ml require using linux

Use jupyter notebook

half bolt Jun 29, 2024, 1:52 PM

#

proven inlet I'm trying to train AI using Torch & Transformers but This chatbot literally cop...

Congrat you just made a troll bot

proven inlet Jun 29, 2024, 1:52 PM

#

half bolt Congrat you just made a troll bot

😭

glass ridge Jun 29, 2024, 1:52 PM

#

so u guys using windows

half bolt Jun 29, 2024, 1:52 PM

#

proven inlet 😭

I code on mobile so any idea how to start ai?

half bolt Jun 29, 2024, 1:52 PM

#

glass ridge so u guys using windows

You can use Linux bro

proven inlet Jun 29, 2024, 1:52 PM

#

glass ridge so u guys using windows

u can use any os u want

half bolt Jun 29, 2024, 1:52 PM

#

Android on top

proven inlet Jun 29, 2024, 1:53 PM

#

not android 💀

half bolt Jun 29, 2024, 1:53 PM

#

proven inlet not android 💀

Why not I'm trying ai on android

glass ridge Jun 29, 2024, 1:53 PM

#

proven inlet u can use any os u want

cause i watched a video that recommended linux

half bolt Jun 29, 2024, 1:53 PM

#

Don't tell me I can't:(

proven inlet Jun 29, 2024, 1:53 PM

#

bro's trying to burn his phone

half bolt Jun 29, 2024, 1:53 PM

#

proven inlet bro's trying to burn his phone

What?!?

#

I'm using a cloud bro

proven inlet Jun 29, 2024, 1:54 PM

#

oh

#

then ur actually training it on pc

half bolt Jun 29, 2024, 1:54 PM

#

You run the code inside the cloud

proven inlet Jun 29, 2024, 1:54 PM

#

cloud server

half bolt Jun 29, 2024, 1:54 PM

#

proven inlet then ur actually training it on pc

Technically yes

#

I have to buy a pc lol

#

Adios amigos

proven inlet Jun 29, 2024, 1:55 PM

#

i prefer google colab

proven inlet Jun 29, 2024, 1:55 PM

#

proven inlet ```python training_args = TrainingArguments( output_dir="./results", ove...

ok but fr someone help me 😭

#

this mf copies me

#

buoyant vine Jun 29, 2024, 2:02 PM

#

For starters you are using the wrong type of llm

#

BERT type models absolutely do not want to generate text

#

They only ingest text and spit out numbers

#

They don't generate paragraphs

#

Secondly idk how you are training your model but building your own llm takes a monumental amount of data

proven inlet Jun 29, 2024, 2:06 PM

#

buoyant vine Secondly idk how you are training your model but building your own llm takes a m...

i have 7k lines of messages isn't that good?

proven inlet Jun 29, 2024, 2:06 PM

#

buoyant vine BERT type models absolutely do not want to generate text

what should i use instead

buoyant vine Jun 29, 2024, 2:10 PM

#

proven inlet i have 7k lines of messages isn't that good?

Try millions or billions before you start seeing it produce sane text

#

Training llms from scratch requires an insane amount of data

proven inlet Jun 29, 2024, 2:11 PM

#

i dont train it from scratch

#

i use dbmdz/bert-base-turkish-cased

buoyant vine Jun 29, 2024, 2:11 PM

#

You can't use that model

proven inlet Jun 29, 2024, 2:11 PM

#

why?

buoyant vine Jun 29, 2024, 2:11 PM

#

Because it is not built to generate text

#

It is built to ingest text and spit out numbers for classifying data

#

My advice would be to use llama or the like for your chat, whatever

proven inlet Jun 29, 2024, 2:12 PM

#

i used to use gpt2

#

but thats english

#

i need turkish

buoyant vine Jun 29, 2024, 2:12 PM

#

And models that understand non-english are going to be massive

#

I.e. 7+ billion params

#

I.e several GB at minimum

#

Idk how well llama even handles Turkish

#

Probably better with the 40B model

#

But it isn't something you can run easily yourself
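Back-of-the-envelope arithmetic behind those sizes, as a sketch: memory for the weights alone is roughly parameter count times bytes per parameter. The helper name is made up, and the estimate deliberately ignores activations, the KV cache and framework overhead, so real usage is higher.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory needed just to hold the weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9


# A 7-billion-parameter model stored as 16-bit floats (2 bytes each).
fp16_7b = weight_memory_gb(7e9, 2)

# The same model quantized to roughly 4 bits (0.5 bytes) per parameter.
int4_7b = weight_memory_gb(7e9, 0.5)

# A 40-billion-parameter model at 16 bits.
fp16_40b = weight_memory_gb(40e9, 2)
```

That works out to about 14 GB, 3.5 GB and 80 GB respectively, which is why the larger variants are hard to run on a single consumer GPU.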

proven inlet Jun 29, 2024, 2:14 PM

#

buoyant vine But it isn't something you can run easily yourself

why

buoyant vine Jun 29, 2024, 2:15 PM

#

Do you have a GPU with like 20GB+ vram?

#

You can try run it

#

Probably best to try via ollama

#

But the bigger models require a lot of hardware

deep sleet Jun 29, 2024, 2:19 PM

#

wouldn't it be better to just use an api for translation?

buoyant vine Jun 29, 2024, 2:19 PM

#

Maybe the smaller one will work with Turkish? But I somewhat doubt it, since Turkish has a pretty minimal presence in the datasets they use to train these things

buoyant vine Jun 29, 2024, 2:20 PM

#

deep sleet wouldn't it be better to just use an api for translation?

This is indeed also an option but I guess it depends on how many resources you want it to use, and if you need it to maintain context across it, since normally translation does lose some value of the text

deep sleet Jun 29, 2024, 2:21 PM

#

Yeah makes sense

buoyant vine Jun 29, 2024, 2:21 PM

#

Maybe translate -> gpt2 -> translate will be functional enough for your use?

#

I'd recommend Argos Translate for the actual translate bit

#

Since it is free and self-hostable

#

And from our experience pretty solid

proven inlet Jun 29, 2024, 2:23 PM

#

buoyant vine Maybe translate -> gpt2 -> translate will be functional enough for your use?

yeah i was thinking of that

#

but then i also need to translate dataset right?

deep sleet Jun 29, 2024, 2:23 PM

#

No

proven inlet Jun 29, 2024, 2:23 PM

#

but dataset is turkish

#

gpt2 is english

deep sleet Jun 29, 2024, 2:23 PM

#

ohhh

#

Yeah ig

proven inlet Jun 29, 2024, 2:24 PM

#

ok so, should i use gpt2 via translation or llama?

buoyant vine Jun 29, 2024, 2:25 PM

#

What are you training it to do?

proven inlet Jun 29, 2024, 2:25 PM

#

chatbot

buoyant vine Jun 29, 2024, 2:25 PM

#

So why do you need to train it for that?

#

If this is a LLM you just take a pre trained one and adjust the prompt

#

For llama, chatgpt etc...

proven inlet Jun 29, 2024, 2:26 PM

#

i will make that chatbot mimic my friend

#

i have 6k+ messages of him

buoyant vine Jun 29, 2024, 2:26 PM

#

I would just do that via rag tbh

#

If you fine tune like you're doing now you're probably causing more damage to the model than actually training it

#

There likely won't be enough text to change the weights it has already trained

deep sleet Jun 29, 2024, 2:27 PM

#

buoyant vine I would just do that via rag tbh

how would rag work in this context?

buoyant vine Jun 29, 2024, 2:27 PM

#

So RAG (look it up) and llama probably best for you?

proven inlet Jun 29, 2024, 2:27 PM

#

buoyant vine If this is a LLM you just take a pre trained one and adjust the prompt

doesn't gpt2 count for that?

buoyant vine Jun 29, 2024, 2:27 PM

#

deep sleet how would rag work in this context?

Use the 'train' set of messages to give the model context on how it should reply and try to mimic
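A toy sketch of that retrieval step, under loud assumptions: real RAG setups usually rank messages with embeddings and a vector store, while this uses plain word overlap so it runs with no dependencies. The messages, `retrieve` and `build_prompt` are all made up for illustration; the resulting prompt would be sent to whatever LLM you end up using.

```python
def tokenize(text):
    """Lowercase a message and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, messages, k=2):
    """Return the k stored messages that share the most words with the query."""
    query_words = tokenize(query)

    def score(message):
        words = tokenize(message)
        union = query_words | words
        # Jaccard similarity: shared words divided by all distinct words.
        return len(query_words & words) / len(union) if union else 0.0

    return sorted(messages, key=score, reverse=True)[:k]


def build_prompt(query, examples):
    """Put the retrieved messages in front of the user's message as style examples."""
    lines = ["Reply in the style of these example messages:"]
    lines += [f"- {example}" for example in examples]
    lines += ["", f"User: {query}", "Reply:"]
    return "\n".join(lines)


# Hypothetical stand-ins for the friend's saved messages.
friend_messages = [
    "lol no way that game was rigged",
    "bro i am so tired today",
    "pizza tonight or what",
    "that game last night was insane",
]

query = "did you watch the game"
examples = retrieve(query, friend_messages, k=2)
prompt = build_prompt(query, examples)
```

The model's weights never change here; the friend's style only reaches it through the examples in the prompt.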

buoyant vine Jun 29, 2024, 2:28 PM

#

proven inlet doesn't gpt2 count for that?

Not really

deep sleet Jun 29, 2024, 2:28 PM

#

buoyant vine Use the 'train' set of messages to give the model context on how it should reply...

oh k

buoyant vine Jun 29, 2024, 2:28 PM

#

Or at least it is too small of a model

#

You basically need a 'big' LLM to do the actual text generation and have a conversation with

proven inlet Jun 29, 2024, 2:29 PM

#

does Llama have its own pretrained words in it

proven inlet Jun 29, 2024, 2:29 PM

#

buoyant vine You basically need a 'big' LLM to do the actual text generation and have a conve...

i dont have billions of text so

buoyant vine Jun 29, 2024, 2:29 PM

#

Yes that is the idea with llama and others

#

I'd play around with ollama

#

It is a self-hostable service that lets you easily switch out models and prompts

proven inlet Jun 29, 2024, 2:30 PM

#

what do u mean switching out models and prompts

#

i just need to specify one model and use it

sweet harness Jun 29, 2024, 2:40 PM

#

spring field someone probably has? though if you plan on trading purely based on price, know ...

Giving it past 60 days prices as % change.
Started training it.
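For anyone curious, turning prices into "% change over the past N days" inputs can be sketched like this. The prices are invented and the window is 3 instead of 60 only to keep the example small; the function names are hypothetical.

```python
def pct_changes(prices):
    """Fractional change from each price to the next one."""
    return [(after - before) / before for before, after in zip(prices, prices[1:])]


def sliding_windows(values, size):
    """Every run of `size` consecutive values, one window per time step."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]


# Made-up daily closing prices.
prices = [100.0, 102.0, 101.0, 105.0, 104.0]

changes = pct_changes(prices)           # one value fewer than prices
inputs = sliding_windows(changes, 3)    # each window is one input to the model
```

With 60-day windows you would call `sliding_windows(changes, 60)` on a much longer price history.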

deep sleet Jun 29, 2024, 2:48 PM

#

spring field someone probably has? though if you plan on trading purely based on price, know ...

but if you give it time and price you want it to find the patterns for you

proven inlet Jun 29, 2024, 2:53 PM

#

@buoyant vine is it ok if i use allenai/longformer-base-4096 for llama

sweet harness Jun 29, 2024, 3:25 PM

#

Progress so far

half bolt Jun 29, 2024, 4:04 PM

#

Who knows a free good cloud for ai

#

The one I'm using costs coins and its a pain in the neck to get them

native pumice Jun 29, 2024, 4:05 PM

#

Hi everyone, im new to AI and I dont use python that much.

Im trying to use this model https://github.com/vikhyat/moondream but i seem to have some issues.

When I install via pip these things in a sequence

numpy (1.26.4 because something doesnt work well with 2+)
torch
moondream2 (deps specified on the github page)

I get an index out of bounds error when trying to use the model from the first example from the repo.

BUT if i do pip freeze > requirements.txt and then clean my venv and run an installation using the generated requirements I no longer get the index out of bounds error (using the same input)

What could be the issue?

GitHub

GitHub - vikhyat/moondream: tiny vision language model

tiny vision language model. Contribute to vikhyat/moondream development by creating an account on GitHub.

misty shuttle Jun 29, 2024, 4:06 PM

#

What is random_state in sklearn's train_test_split? and why should i set it to 42?

proven inlet Jun 29, 2024, 4:08 PM

#

half bolt Who knows a free good cloud for ai

google colab

lapis sequoia Jun 29, 2024, 4:14 PM

#

misty shuttle What is random_state in sklearn's train_test_split? and why should i set it to 4...

42 is kinda like a seed which makes the data split the same every time you run it

#

If u dont set it, then before splitting it will shuffle the data differently every time you run
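A stand-in sketch of the idea using only the standard library; this is not sklearn's actual implementation, and `split` is a made-up helper. In sklearn itself the equivalent call is `train_test_split(X, y, test_size=0.2, random_state=42)`, and 42 is just a convention: any fixed integer gives a repeatable split.

```python
import random


def split(rows, test_fraction, seed=None):
    """Shuffle a copy of the rows, then cut it into train and test parts."""
    rng = random.Random(seed)   # a fixed seed makes the shuffle repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))

# Same seed twice: the shuffle, and therefore the split, is identical.
train_a, test_a = split(rows, 0.2, seed=42)
train_b, test_b = split(rows, 0.2, seed=42)
same_split = (train_a, test_a) == (train_b, test_b)

# No seed: the shuffle can differ from run to run.
train_c, test_c = split(rows, 0.2)
```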

noble topaz Jun 29, 2024, 4:15 PM

#

Hello guys. I want to upload a model and a dataset in streamlit and when i press run to say the accuracy. Can you please help?

lapis sequoia Jun 29, 2024, 4:17 PM

#

native pumice Hi everyone, im new to AI and I dont use python that much. Im trying to use thi...

Try creating a separate clean environment

rich moth Jun 29, 2024, 4:18 PM

#

Finally getting decent results and thats just the first epoch.

river cape Jun 29, 2024, 4:18 PM

#

Hi

rich moth Jun 29, 2024, 4:18 PM

#

Howdy

river cape Jun 29, 2024, 4:18 PM

#

rich moth Howdy

I am doing good

#

How are you?

lapis sequoia Jun 29, 2024, 4:19 PM

#

rich moth Finally getting decent results and thats just the first epoch.

No overfitting?

river cape Jun 29, 2024, 4:20 PM

#

In backpropagation, which weights get adjusted first: the weights closer to the input layer or the weights closer to the output layer?

rich moth Jun 29, 2024, 4:20 PM

#

lapis sequoia No overfitting?

not yet! 👍🏻

lapis sequoia Jun 29, 2024, 4:20 PM

#

Nice 👌

misty shuttle Jun 29, 2024, 4:28 PM

#

lapis sequoia 42 is kinda like seed which make data split same every time you runs

data split into what, test and train?

#

im already setting a limit for the data that is being used to train right?

lapis sequoia Jun 29, 2024, 4:33 PM

#

misty shuttle im already setting a limit for the data that is being used to train right?

It's not the splitting itself, but before splitting it randomises the data

misty shuttle Jun 29, 2024, 4:36 PM

#

lapis sequoia Its not spliting but before spliting it randomises the data

I'm afraid I don't understand you- does setting it to 42 control the data in some way?

lapis sequoia Jun 29, 2024, 4:36 PM

#

misty shuttle I'm afraid I don't understand you- does setting it to 42 control the data in som...

Yes

#

U know concept of random int?

misty shuttle Jun 29, 2024, 4:37 PM

#

I do not

lapis sequoia Jun 29, 2024, 4:37 PM

#

U know function called random.randint?

misty shuttle Jun 29, 2024, 4:37 PM

#

I do yeah in python

#

it gives a random integer in a specified range

lapis sequoia Jun 29, 2024, 4:38 PM

#

So if u dont set random_state it will shuffle the rows differently

#

Every time you run it

misty shuttle Jun 29, 2024, 4:39 PM

#

and 42 is what keeps it stable?

lapis sequoia Jun 29, 2024, 4:39 PM

#

misty shuttle and 42 is what keeps it stable?

Yes

misty shuttle Jun 29, 2024, 4:39 PM

#

okay i understand now- thank you

lapis sequoia Jun 29, 2024, 4:40 PM

#

👍

lapis sequoia Jun 29, 2024, 6:25 PM

#

What are the most advanced parts of ML/AI in terms of skill?

river cape Jun 29, 2024, 6:43 PM

#

lapis sequoia What are the most advances parts of ML/AI in terms of skill?

Like do you want in terms of the difficulty to learn ?

left tartan Jun 29, 2024, 6:48 PM

#

lapis sequoia What are the most advances parts of ML/AI in terms of skill?

What is skill anyway? It's not just knowledge, but experience applying that knowledge plus understanding when to apply which techniques.

#

So, the hardest part of developing skill is actually using the knowledge and learning from the experience

#

For Ai/ml, this means tackling a wide range of problems using a variety of techniques, and understanding which techniques are most likely to be fruitful (my point is that nothing individually is 'hard', the hard part is acquiring sufficient experience)

lapis sequoia Jun 29, 2024, 7:07 PM

#

river cape Like do you want in terms of the difficulty to learn ?

Yes

#

What academic stuff? I took calc1-3 matrix and linear algebra , optimization, and a bunch of stuff. I don’t know, data science differs so severely from one place to another and it’s relatively new and wasn’t a thing when I was in undergrad

#

Just in terms of ML/AI. Like, I don’t know, what form of deep learning is the hardest? Like specifics. I just grinded NLP for a month straight. Probably reinforcement learning.

#

Like, in undergrad, I dealt with partials so much, to the point it is just nonsense. It just varies so much from place to place. Like, it is confusing. My friend has a masters in EE and mostly writes in PyTorch and TensorFlow, but he is engineering stuff like, let me show you https://github.com/devin1126/DevBot-1.0

GitHub

GitHub - devin1126/DevBot-1.0: This repository contains all of the ...

This repository contains all of the code that was used in the creation of the first iteration of my custom surveillance robot coined the 'DevBot'. Please read the README.md file for...

#

For intense optimization, yeah.

#

No, I never found it confusing

#

It’s not, it is hard when you have to see the statics once it is optimized to see how parameters change when things are optimized. That is very hard.

#

No, like, say f(x, y; a, b, c) = something, right? You have to maximize over x and y, not a, b and c. When it is optimized, parameters change.
You just take the partials of those to see if the whole thing was optimized correctly and if everything all together holds. I was just asking like, what is the highest level of mastery in ML/AI at the moment.

unique spoke Jun 29, 2024, 7:32 PM

#

Hey Lisan Al Gayib

#

U experienced with cv?

unique spoke Jun 29, 2024, 7:32 PM

#

unique spoke for example :

uk a lib i can use for this

#

??

#

thats amazing

#

I was hoping you could suggest a way I could achieve what I linked

unkempt apex Jun 29, 2024, 7:34 PM

#

after 25k episodes!
left -> loss
right -> average Q

unique spoke Jun 29, 2024, 7:34 PM

#

for objects in the street? Like I want something which is more general. Walking down a street recording this, it should be able to identify all objects

#

Im checking them out. While semantic segmentation does seem to really help, honestly the boring one suits my project more. Where do you normally find these? (Could you link if you found one already)?

#

Also whats your suggestion for how I should detect them - using Haar cascades, LBPs, etc.

river cape Jun 29, 2024, 7:38 PM

#

So usually the gradients are calculated starting from the output layer and moving backward to the input layer and then the weights are updated simultaneously for all layers?
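A tiny hand-written sketch of that order of operations, assuming a made-up network with one input, one tanh hidden unit and one linear output, so there are just two weights. The names and starting values are arbitrary; frameworks like PyTorch do the same bookkeeping automatically.

```python
import math


def train_step(w1, w2, x, target, lr=0.1):
    """One forward pass, one backward pass, then one update of both weights."""
    # Forward pass: input -> hidden -> output.
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Backward pass: start at the output and apply the chain rule
    # layer by layer back toward the input.
    dy = y - target                  # dLoss/dy
    grad_w2 = dy * h                 # gradient for the output-layer weight
    dh = dy * w2                     # gradient flowing back into the hidden unit
    grad_w1 = dh * (1 - h * h) * x   # gradient for the input-layer weight

    # Update: both gradients were computed from the old weights,
    # and only now are all the weights changed together.
    return w1 - lr * grad_w1, w2 - lr * grad_w2, loss


w1, w2 = 0.5, -0.3
losses = []
for _ in range(50):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=0.8)
    losses.append(loss)
```

Note that `grad_w1` needs `dh`, which depends on the output layer, so the output-side gradient has to be computed first even though the weights themselves are updated in one go.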

unique spoke Jun 29, 2024, 7:41 PM

#

lemme check. thanks!

river cape Jun 29, 2024, 7:45 PM

#

Actually gradients are calculated in the backward pass right?

#

Oh i see

unique spoke Jun 29, 2024, 7:49 PM

#

Thanks for your help Lisan Al Gayib, have narrowed it down to MS COCO and VIDVIP

#

also just another q b4 I go, as a beginner with CV , for image classification , do you recommend I do keras and then move to neural networks or should I directly move to neural networks

river cape Jun 29, 2024, 8:20 PM

#

Well its very deep but I get the idea now

#

Thanks mate

#

Whats the difference between pytorch , tensorflow and keras?

glass ridge Jun 29, 2024, 8:23 PM

#

buoyant vine Yes that is the idea with llama and others

does the documentation that u ve suggested contain all the stuff that i need to know about numpy for ml (data science)

glass ridge Jun 29, 2024, 8:26 PM

#

buoyant vine Yes that is the idea with llama and others

this is the doc if u dont remember it https://realpython.com/numpy-tutorial/

NumPy Tutorial: Your First Steps Into Data Science in Python – Real...

In this tutorial, you'll learn everything you need to know to get up and running with NumPy, Python's de facto standard for multidimensional data arrays. NumPy is the foundation for most data science in Python, so if you're interested in that field, then this is a great place to start.

past meteor Jun 29, 2024, 8:29 PM

#

glass ridge this is the doc if u dont remember it https://realpython.com/numpy-tutorial/

you should learn numpy from the official docs

#

https://numpy.org/doc/stable/user/absolute_beginners.html

glass ridge Jun 29, 2024, 8:33 PM

#

past meteor you should learn numpy from the official docs

will i need all methods in numpy fr ml

past meteor Jun 29, 2024, 8:33 PM

#

No but this is the kind of thing you should read to know what methods exist, then do a project with it, and then you can use it as a reference

unreal geyser Jun 29, 2024, 8:34 PM

#

i don't see anything to learn in numpy

#

its simple and straightforward

glass ridge Jun 29, 2024, 8:35 PM

#

unreal geyser i don't see anything to learn in numpy

yeah , but u have to memorize the methods

past meteor Jun 29, 2024, 8:35 PM

#

You absolutely do not

glass ridge Jun 29, 2024, 8:35 PM

#

past meteor You absolutely do not

?

unreal geyser Jun 29, 2024, 8:35 PM

#

that's what i meant, they are pretty simple and self-explanatory, so more like you will remember them once you use them

#

pytorch and numpy have mostly the same api for operations

past meteor Jun 29, 2024, 8:37 PM

#

glass ridge ?

In my opinion it's always a good idea to learn what methods the library has to get a sense of what it can do. Don't memorize them. When you have a project you'll forget which methods exist to solve a specific problem but you'll know where to look to find it

unreal geyser Jun 29, 2024, 8:37 PM

#

i believe you should remember function names, the mostly used ones at least

glass ridge Jun 29, 2024, 8:39 PM

#

so, i gotta go directly to the official documentation

#

and see which methods i will need

#

ok thx guys

glass ridge Jun 29, 2024, 8:40 PM

#

past meteor https://numpy.org/doc/stable/user/absolute_beginners.html

is this the doc u mean?

past meteor Jun 29, 2024, 8:40 PM

#

yes

glass ridge Jun 29, 2024, 8:40 PM

#

ok

#

i will start on it

finite lodge Jun 29, 2024, 8:41 PM

#

Hi all, Im using seaborn to generate a plot, however I cant get the legend to be outside of the graph...
I tried to solve it but it got cut off...

Relevant code:

sns.set_theme()
sns.set_style("whitegrid")
sns.set_context("paper")

#plt.figure(figsize=(12, 4.8))

plot = sns.barplot(data=df, x='files', y='similarity', hue='type')
sns.move_legend(plot, "upper left", bbox_to_anchor=(1, 1))

sns.despine()

plt.savefig('plt.svg')

Thank you in advance

left tartan Jun 29, 2024, 9:01 PM

#

seaborn. It's pretty good, I use it occasionally (plotly's my main). Not frequently enough to remember how to place the legend tho 🙂

past meteor Jun 29, 2024, 9:03 PM

#

Yeah, I mostly use plotly as well

#

seaborn is what I use for extensive EDAs because of jointplot etc

#

fwiw you can set your plotting backend with Pandas, in case you're using that. Meaning you can do df.plot() and have it output plotly, seaborn or matplotlib

past meteor Jun 29, 2024, 9:07 PM

#

finite lodge Hi all, Im using seaborn to generate a plot, however I cant get the legend to be...

As for moving the legend, big tip: most seaborn plots are matplotlib plots. It's often better to google "how to move the legend with matplotlib" in my experience.

finite lodge Jun 29, 2024, 9:07 PM

#

past meteor fwiw you can set your plotting backend with Pandas, in case you're using that. M...

Not sure if you are talking to me lol, but Im using pandas to create the dataframe already

past meteor Jun 29, 2024, 9:07 PM

#

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html

finite lodge Jun 29, 2024, 9:07 PM

#

past meteor As for moving the legend, big tip: the most of seaborn plots are matplotlib plot...

Indeed, however the equivalent code for matplotlib is much more complex unfortunately

deep sleet Jun 29, 2024, 9:07 PM

#

I think you add the box anchor parameter for that

past meteor Jun 29, 2024, 9:08 PM

#

plt.legend(loc="upper right") does that work for you?

#

if not, what do you want to do?

#

Maybe I don't get the question

deep sleet Jun 29, 2024, 9:08 PM

#

I had used it in my previous code lemme see

finite lodge Jun 29, 2024, 9:08 PM

#

past meteor if not, what do you want to do?

Having the legend outside the plot, but not cut off like the image

finite lodge Jun 29, 2024, 9:08 PM

#

past meteor `plt.legend(loc="upper right")` does that work for you?

let me see

past meteor Jun 29, 2024, 9:08 PM

#

What is "outside of the plot"

#

oh

#

it's cut off I see

#

Yeah, try upper right

finite lodge Jun 29, 2024, 9:10 PM

#

past meteor Yeah, try upper right

Nope, it is inside and overlapping. I think this is the default behavior

deep sleet Jun 29, 2024, 9:10 PM

#

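import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("whitegrid")
sns.set_context("paper")

plt.figure(figsize=(12, 4.8))

plot = sns.barplot(data=df, x='files', y='similarity', hue='type')
# Pin the legend's upper-left corner to the axes' upper-right corner,
# which places the legend just outside the plot area.
plot.legend(loc='upper left', bbox_to_anchor=(1, 1))

sns.despine()

# Keep the axes within the left 85% of the figure, and let savefig
# expand the saved area so the legend is not cut off.
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.savefig('plt.svg', bbox_inches='tight')
plt.show()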

#

try this

finite lodge Jun 29, 2024, 9:12 PM

#

deep sleet ```import seaborn as sns import matplotlib.pyplot as plt sns.set_style("whitegr...

Working thanks!

#

How did it work though?

#

Also, I did not know one could use plt.show outside a notebook btw

past meteor Jun 29, 2024, 9:12 PM

#

I didn't know tight layout was a thing for non subplots

deep sleet Jun 29, 2024, 9:13 PM

#

finite lodge Working thanks!

the bbox_to_anchor makes it outside and the rest makes sure it gets enough space

#

I honestly don't understand it that well

#

I found it on stackoverflow because I wanted the same thing a while ago

#

but I am glad it worked

past meteor Jun 29, 2024, 9:14 PM

#

Can you try this plt.legend(bbox_to_anchor=(0, 1), loc='upper left', ncol=1)?

finite lodge Jun 29, 2024, 9:15 PM

#

past meteor Can you try this `plt.legend(bbox_to_anchor=(0, 1), loc='upper left', ncol=1)`?

Where? replacing the plt.legend of @deep sleet ?

past meteor Jun 29, 2024, 9:17 PM

#

sns.set_theme()
sns.set_style("whitegrid")
sns.set_context("paper")

#plt.figure(figsize=(12, 4.8))
fig, ax = plt.subplots()

plot = sns.barplot(data=df, x='files', y='similarity', hue='type', ax=ax)
ax.legend(bbox_to_anchor=(0, 1), loc='upper left', ncol=1)

sns.despine(fig=fig)

fig.savefig('plt.svg')

#

something like this

#

I vastly prefer making my figure and axis manually and passing it around

#

More explicit 🙂

finite lodge Jun 29, 2024, 9:19 PM

#

past meteor ```python sns.set_theme() sns.set_style("whitegrid") sns.set_context("paper") #...

Not working though

past meteor Jun 29, 2024, 9:20 PM

#

interesting

#

luckily you already have a solution lol

finite lodge Jun 29, 2024, 9:22 PM

#

True true

#

Also, is a "group separator" possible in seaborn?

half bolt Jun 29, 2024, 10:07 PM

#

@past meteor can I run ai code on a Google Jupyter notebook or bothosting or pydroid3 on mobile ?

#

Or even a vps??

past meteor Jun 29, 2024, 10:20 PM

#

half bolt @past meteor can I run ai code on Google jupiter notebook or bothosting...

Vague question. What is "AI code"

#

You can run non neural net algos on most consumer grade computers

lapis sequoia Jun 29, 2024, 10:21 PM

#

are most CNNs made through cv2, like, I do not know, ImageDataGenerator and stuff?

past meteor Jun 29, 2024, 10:21 PM

#

It ultimately depends on what it is. If it's LLMs you will need heavier hardware

lapis sequoia Jun 29, 2024, 10:22 PM

#

past meteor It ultimately depends on what it is. If it's LLMs you will need heavier hardware

I thought LLMs use RNNs

past meteor Jun 29, 2024, 10:22 PM

#

No

#

They're transformers

lapis sequoia Jun 29, 2024, 10:22 PM

#

are RNNs kind of just irrelevant?

past meteor Jun 29, 2024, 10:23 PM

#

There are cases where they still outperform transformers. They have a lot fewer parameters

#

It's conceivable you have problems where an rnn will be better