#data-science-and-ml


long canopy
#

need to sit in a corner and think, thanks for help clarifying what i mean lol

final kiln
#

I don't know how I helped, but happy I did

frigid badge
#

I still need ideas on how to make the programming system

#

what do you think, should I do it via specific classes that correspond to attributes and physical laws? for example

CHECK-ATTRIBUTE -> (Material) [Corresponding attribute] returns T/F

OBJECT-DENSITY -> (Material) returns object density
... so on...

would this be too simple? too complicated?

final kiln
#

I think an object oriented language, when it comes to physics makes a lot of sense

#
from typing import Literal

class Material:
    density: float
    state: Literal["gas", "solid", "liquid"]

    def to_liquid(self):
        ...
frigid badge
iron basalt
#

You see a stream of cars drive by in front of you. They are all the same green car. Then, after a long time, you finally see a red car. If you count how many green and how many red cars you have seen, red has a low probability of being seen, yet seeing the red car told you a lot more about what kinds of cars there are than seeing yet another green car. Seeing a green car is common, not special, and does not tell you anything new.

#

If you are trying to hack a program and seeking information on its behavior, are you looking for common / normal / not rare behavior or rare behavior? Which tells you more about all the possible ways the program can behave given random inputs?
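
The car example is what information theory calls surprisal: an outcome with probability p carries -log2(p) bits of information. A minimal sketch, where the 99% / 1% split between green and red cars is an assumed number chosen only for illustration:

```python
import math

def surprisal_bits(p: float) -> float:
    """Information content, in bits, of an outcome that occurs with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Assumed probabilities, for illustration only: 99% green cars, 1% red cars.
p_green, p_red = 0.99, 0.01
green_bits = surprisal_bits(p_green)  # close to zero: a green car is no surprise
red_bits = surprisal_bits(p_red)      # several bits: a red car is rare and informative
print(f"green: {green_bits:.4f} bits, red: {red_bits:.4f} bits")
```

The same logic applies to the program question: rare behavior has high surprisal, so each observation of it narrows down the space of possible behaviors more than another common one does.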

frigid badge
#

yea that makes sense

iron basalt
frigid badge
#

I want to use this as the programming language the MLM uses to learn. It should be complex enough to create a learning algorithm and custom behaviour in relation to materials.

1.    CHECKATTRIBUTE -> TOOL(OPTIONAL), (Attribute) Material
2.    ALLOYING -> [Material, Material, Material…]
3.    GASBLENDING -> [Material, Material, Material…]
4.    BLENDING -> [Material, Material, Material…]
5.    FOR (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … 
6.    WHILE (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … 
7.    IF (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … 
8.    ELSE-IF  (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … 
9.    ELSE
10.    MATERIALSTATE(Material) returns the state of the material as a string.
11.    MATERIALSTAT -> (Stat), (Material) returns the current value of the material
tidal glen
#

hello, I am trying to integrate a model into an Android app and I am getting this error
java.lang.RuntimeException: Error occurred when initializing ImageClassifier: Input tensor has type kTfLiteFloat32: it requires specifying NormalizationOptions metadata to preprocess input images.
I don't have much knowledge about ML
can anybody help??
I think I have to add metadata to the tflite model,
but I don't have any idea how to do it
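
One possible way to add that metadata is the metadata writer in the `tflite-support` Python package. The sketch below is an assumption-heavy starting point, not a confirmed fix for this app: the file paths are placeholders, and the mean/std of 127.5 is only a common choice for float models that expect inputs in [-1, 1]. The values must match whatever preprocessing the model was trained with.

```python
def normalize_pixel(value: float, mean: float = 127.5, std: float = 127.5) -> float:
    """What NormalizationOptions describes: (pixel - mean) / std, applied per channel."""
    return (value - mean) / std

def add_image_classifier_metadata(model_path, label_path, output_path,
                                  mean=127.5, std=127.5):
    """Write a copy of the model with normalization and label metadata attached."""
    # Imported lazily so this file can be read without tflite-support installed.
    from tflite_support.metadata_writers import image_classifier, writer_utils

    writer = image_classifier.MetadataWriter.create_for_inference(
        writer_utils.load_file(model_path), [mean], [std], [label_path]
    )
    writer_utils.save_file(writer.populate(), output_path)

# With the assumed defaults, pixel values 0..255 map to the range -1..1.
print(normalize_pixel(0.0), normalize_pixel(255.0))
```

A call would look like `add_image_classifier_metadata("model.tflite", "labels.txt", "model_with_metadata.tflite")` (all three names hypothetical), and the resulting file is the one to bundle in the Android app.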

long canopy
#

god bless the SPAADIA corpus

rugged mist
agile cobalt
#

extremely rough guesses but I'd imagine either

  • their model is overfit and the performance wouldn't hold up in practice, in particular I'd be very sceptical of its reasoning capabilities
    • might just have been out of scope for their research, but there being no "instruct" model nor open source weights at all reinforces that for me
  • if you can get that much better with smaller models, then normal LLMs are ridiculously underfit, to the point that we should be able to get 10x better performance at the current 7~13B-ish parameter counts
#

these benchmarks also look pretty sus imo

where did they even take that "llama llm" 3b parameters model from? the official llama2 models are 7, 13 and 70b

#

We compared BitNet b1.58 to our reproduced FP16 LLaMA LLM in various sizes.
To ensure a fair comparison, we pre-trained the models on the RedPajama dataset for 100 billion tokens.
if I'm understanding this correctly, they only trained the big LLM for 100 billion tokens?
no wonder it sucks

#

the comparison with StableLM might be a bit more fair, but iirc StableLM is so bad nobody even uses it in the first place?

final kiln
#

trying to find a github link

final kiln
#

just to see if I understood correctly

#

every weight is now -1, 0 or 1

#

the implications of this, if it works, are really big I think, cuz this is great for electronics right
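
To make "every weight is now -1, 0 or 1" concrete, here is a minimal sketch of the absmean rounding scheme as I understand it from the BitNet b1.58 paper: divide by the mean absolute weight, round, and clip to [-1, 1]. The weight values are made up, and this only shows the rounding step, not the training procedure.

```python
def absmean_ternarize(weights, eps=1e-8):
    """Map a 2-D list of float weights to {-1, 0, 1} plus one scale factor."""
    magnitudes = [abs(w) for row in weights for w in row]
    gamma = sum(magnitudes) / len(magnitudes)  # mean absolute weight
    scale = gamma + eps                        # eps avoids division by zero
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]  # round, then clip
        for row in weights
    ]
    return ternary, gamma

weights = [[0.9, -0.05, 0.4], [-1.2, 0.02, -0.6]]  # made-up values
ternary, gamma = absmean_ternarize(weights)
print(ternary, gamma)
```

With weights restricted to these three values, a matrix-vector product needs only additions and subtractions of the inputs, which is the part that would matter for hardware.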

final kiln
frigid badge
#

gotcha

#

so I decided I want the language's structure to be like this:

CHECKATTRIBUTE -> TOOL(OPTIONAL), (Attribute) Material;
MATERIALSTATE -> (Material);
MATERIALSTAT -> (Stat), (Material);
ALLOYING -> [Material, Material, Material…];
GASBLENDING -> [Material, Material, Material…];
BLENDING -> [Material, Material, Material…];
FOR (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … {LOGIC}
WHILE (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … {LOGIC}
IF (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] …  {LOGIC}
ELSE-IF (CONDITION-A) OPTIONAL [&&, || (and, or) (CONDITION-B)] … {LOGIC} 
ELSE {LOGIC}
FUNC -> (Param-1, Param-2…) {LOGIC}
VAR -> (VariableName), (int/str/float/list/set); 
VAR = Value;
RETURN VARTYPE(VAR)
#

it's clear enough, flexible while also allowing for complexity and limitations

#

if anyone has any ideas on how I can make an interpreter for this without costing too much performance-wise, I'm all ears!
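
One cheap way to interpret a command language like this is a dispatch table: each keyword maps to a Python function, so running a statement costs one dictionary lookup plus the call. The sketch below is hypothetical throughout. The `Material` fields, the comma-separated argument syntax, and the three handlers are my assumptions for illustration, and control flow (FOR/WHILE/IF) is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Material:
    name: str
    state: str                                    # "gas", "solid" or "liquid"
    stats: dict = field(default_factory=dict)     # e.g. {"density": 7.87}
    attributes: set = field(default_factory=set)  # e.g. {"magnetic"}

def material_state(env, name):
    return env[name].state

def material_stat(env, stat, name):
    return env[name].stats[stat]

def check_attribute(env, attribute, name):
    return attribute in env[name].attributes

# Dispatch table: one dictionary lookup per statement, no long if/elif chain.
COMMANDS = {
    "MATERIALSTATE": material_state,
    "MATERIALSTAT": material_stat,
    "CHECKATTRIBUTE": check_attribute,
}

def run_statement(env, line):
    """Run one statement such as 'MATERIALSTAT -> (density), (iron);'."""
    head, _, rest = line.strip().rstrip(";").partition("->")
    args = [part.strip(" ()[]") for part in rest.split(",") if part.strip(" ()[]")]
    return COMMANDS[head.strip()](env, *args)

env = {"iron": Material("iron", "solid", {"density": 7.87}, {"magnetic"})}
print(run_statement(env, "MATERIALSTAT -> (density), (iron);"))
print(run_statement(env, "CHECKATTRIBUTE -> (magnetic), (iron);"))
```

If parsing ever shows up in a profile, the usual next step is to parse each program once into a list of (function, args) pairs and replay that list, rather than re-splitting strings on every run.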

final kiln
# final kiln > Entropy in information theory is directly analogous to the entropy in statisti...

ok I'm gonna try to use this as a starting point and walk back to information theory and then ML

  • for a given microscopic system we have a state space that is the tensor product of the state space of each of its particles
  • from my point of view, all I can observe are averaged out variables, such as average momentum, average energy, etc
  • each combination of these averaged-out values that I can measure experimentally (a macro state) partitions state space: there are N micro states for each macro state
  • for a given macro state, entropy is the logarithmic measure of the number of possible micro states
  • in a sense then, entropy kinda measures the probability of a macro state, which is why systems evolve to higher entropy, a bit of a circular thing really
  • so the analogy here is that the macro state is the random variable (from which I can extract average values), and the micro states are the various realizations of the random variable

I think I'm getting somewhere

microstates here would then be like the number of things I have to write in order to fully describe the random variable,

tho I think that in statistical mechanics you have the assumption that each microstate is equally probable, idk how that affects this picture
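
On the equal-probability question, a short derivation suggests the two pictures agree. It uses the standard Gibbs form of the entropy, which is not something stated earlier in this chat, with Ω microstates compatible with the macrostate and probabilities p_i:

```latex
S = -k_B \sum_{i=1}^{\Omega} p_i \ln p_i ,
\qquad
p_i = \frac{1}{\Omega}
\;\Longrightarrow\;
S = -k_B \sum_{i=1}^{\Omega} \frac{1}{\Omega} \ln \frac{1}{\Omega} = k_B \ln \Omega
```

So "log of the number of microstates" is the special case of uniform p_i. Dropping k_B and switching to log base 2 gives Shannon's H = -Σ p_i log2 p_i, which also covers unequal probabilities.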

hasty gust
# frigid badge so I decided I want the languages structure to be like this: ``` CHECKATTRIBUTE ...

I'm joining in this discussion quite late, but what are you hoping to do with such a model? The structure of the language seems fine, I'm no expert on language structure nor am I very familiar with condensed matter physics, but the basic operations are there and can be fleshed out in the future. However, why do you feel like you need a new programming language? This process can be described well in OOP or functional programming. In terms of performance, I don't believe making a new language vs doing this in an existing language will really matter.

hasty gust
final kiln
#

has been a struggle

#

I'm gonna try to pick it from the other side

frigid badge
# hasty gust I'm joining in this discussion quite late, but what are you hoping to do with su...

I just want the model to have a very specific and limited set of tools, which will force it to mix and match them to get the best possible results. In my opinion this needs to be done in a controlled environment where all of the outcomes and possible matches can be predicted relatively easily. With a larger, more fleshed-out language it would be way more difficult to look at its limitations and try to force the MLM to "innovate". This is all my own opinion ofc and I could be wrong

hasty gust
#

I suppose my confusion is that you're basically describing OOP. The first step in OOP is to collect all the objects that you want to manipulate and their relationships to each other. You then label these objects as classes, which describe how they can be manipulated. Imo, you've basically described the problem that OOP was designed for. There's a known number of material properties (if I'm understanding your project correctly), and known ways materials can interact with each other to potentially create those properties.

final kiln
hasty gust
# frigid badge I just want the model to have a very specific and limited set of tools which wil...

To explain my thoughts, this programming language isn't going for computation. It's describing processes for your model to evaluate. In this case, where you're essentially trying to describe a process graph to an ML model, your language of choice really won't matter, and you shouldn't need to care much about performance. (I could be completely misinterpreting your project since it's 4 am for me and I haven't had my morning coffee)

final kiln
#

https://www.youtube.com/watch?v=0GCGaw0QOhA

this all makes sense, until you need to generalize to biased coins

for a fair coin, sure, I need one bit to communicate an outcome, but that's exactly the same thing as for a biased coin, it's either heads or tails, so I only need one bit regardless of how improbable it is, as long as it's not p=0

#

unless like, we're interpolating between communicating between zero bits and communicating a bit ? if that even makes sense, which it doesn't because even if I would consider "not sending a bit" as a communication, that's still exactly the same as sending a bit

(im dead >.<)
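
A possible way out of the biased-coin puzzle: entropy is the average cost per flip when many flips are encoded together, not the cost of sending one isolated flip. The sketch below assumes a coin with P(heads) = 0.9 and a hand-picked prefix-free code over pairs of flips. Both are illustrative choices, not taken from the video.

```python
import math

def binary_entropy(p: float) -> float:
    """Average information, in bits per flip, of a coin with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.9  # assumed bias, for illustration only
# A prefix-free code for PAIRS of flips: likely pairs get short codewords.
code = {"HH": "0", "HT": "10", "TH": "110", "TT": "111"}
prob = {"H": p, "T": 1 - p}
bits_per_pair = sum(prob[a] * prob[b] * len(word) for (a, b), word in code.items())
bits_per_flip = bits_per_pair / 2

print(f"entropy:   {binary_entropy(p):.3f} bits/flip")
print(f"pair code: {bits_per_flip:.3f} bits/flip")
```

Even this crude pair code averages fewer than one bit per flip, and coding longer blocks gets closer to the entropy. For a fair coin no such saving is possible, which is why its entropy is exactly one bit.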

lapis sequoia
#

Reddit Agrees To $60M Deal That Allows Google To Train AI Models On Its Posts Xd

final kiln
lapis sequoia
#

And they would be like so what

#

At this stage mega 'influential' companies are not doing what consumer wants Xd

final kiln
lapis sequoia
#

They are about to IPO, pension funds etc. will buy shares. They have got AI training data Xd

final kiln
#

i am aware, i received the invite thing

#

since im not a us citizen I can't participate, and not sure if I would tbh

lapis sequoia
#

Well people like to socialise - and are simply letting themselves be corralled into reddit like through funnels Xd

final kiln
#

even unis are selling their students data it's insane

lapis sequoia
#

Reddit - people could like posts. Posts could not be liked on IRC Xd

final kiln
#

IRC ?

lapis sequoia
#

Letting themselves be corralled online, where estimating people's emotions is hard, insecurity is higher, and there is yearning for likes

#

internet relay chat Xd

final kiln
#

that's before my time

lapis sequoia
#

history repeats

#

Like why would one interact in moderated online environment mostly

final kiln
#

this is such an amazing short

lapis sequoia
#

Well many proverbs are simply made. Like discord is bad tea is good Xd

#

or Joe is cool, sun is cool, rain is wet, cold is cold Xd

#

I am asking local AI model how would AI AGI enslave humans, would it even be interested

#

Humans are by far more interested in being AI slaves than AI is in being a master

frigid badge
#

how is this even topic related???

final kiln
lapis sequoia
#

Which part?

final kiln
#

All of it, you're making a statement about the interests of billions of people.

#

And what AI wants, if it wants or ever will want, is unknown

frigid badge
final kiln
lapis sequoia
#

Well - AI that knows humans can not delete it . It may simply ignore humans. If you listen to Elon he claims AI wants to control. Elon wants to be controlled maybe

serene scaffold
#

I don't listen to Elon.

hasty gust
#

Why are we talking about AI as though it's not just an equation at the moment

serene scaffold
#

This channel isn't really a place for grandiose speculation about apocalyptic AI.

buoyant vine
#

Has anyone played around with AWS's Trn1 instances?

#

Are they as good as AWS claim?

final kiln
#

I've considered, I got some spot quota for them

buoyant vine
#

The specs on the chips seem to suggest they have some enormous compute power for 16bit floats and int8 models

#

but for me to test would involve quite a bit of work 😅 So just asking around beforehand

lapis sequoia
#

Well AI regulation is debated now in most countries

final kiln
buoyant vine
#

I would be planning to run our inference engines on them

#

which deploy fine-tuned BERT-type models

lapis sequoia
#

However it seems they simply want some limited AI AGI that will enslave humans. AGI itself is like meh

final kiln
#

Right now I usually just pull the Nvidia docker container and set it up to have GPU access right, would it be similar for TPU ?

buoyant vine
#

It is more similar to onnx

#

Model compiles to their instructions -> Trainium chips execute those instructions

final kiln
#

Interesting

lapis sequoia
#

Neuromorphic computing is an approach to computing that is inspired by the structure and function of the human brain. A neuromorphic computer/chip is any device that uses physical artificial neurons to do computations. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software s...

#

move on gpu

hasty gust
#

as someone who works with neuromorphic computing, it's not going to replace gpus, it's just another way to compute

lapis sequoia
#

how come?

final kiln
#

They have an AMI for it, so there's no need to pull docker

hasty gust
#

Well for one, the technology is not quite there yet. It's still pretty much an arms race between memcomputing, neuromorphic, and quantum computing to see which one will become viable. In addition, the computing tasks where they have an advantage over classical computing are very specialized. This area of computing, which I've generally seen referred to as alternative computing, is essentially computing at scale. It'll most likely end up being another specialized component like a gpu rather than completely replacing it, as they serve different purposes.

#

Just the challenges of making a useful memristive material still haven't been solved yet

#

If by neuromorphic engineering you're referring to SNNs, that's potentially viable in the next couple of years. The main challenge with SNNs is that their performance is often just comparable to ANNs rather than being a generational advantage. From the research I've done with recurrent spiking neural networks, they have a similar ROC curve to their ANN counterparts, but are better at general cases and fail more often for edge cases. The biggest issue is: why spend 2 weeks implementing stuff like neurogenesis when you could train an LSTM much faster?

final kiln
#

I think it's appropriate because it's hard not to envy the engineering marvel you see all over nature

hasty gust
#

There's definitely a lot to learn there. I believe vanadium oxide is the closest thing we have to neurons, and they don't come close

final kiln
#

I saw this recently

hasty gust
#

The goal is to mimic the nonlinear behavior of actual neurons, rather than the simple linear behavior of perceptrons

final kiln
#

Yeah that would be cool

hasty gust
#

But well, we're not quite there yet. We have some cool random number generators last I checked lol

final kiln
#

But also very complicated I would imagine

#

I think my math is all over the place lol, but stuff works out to more or less the same order of magnitude

hasty gust
#

Yup, the main barrier is we don't have a material at room temperature that behaves the way we want. It's currently a material hunt on the experimental side, while people try to figure out how to create a new theory of computing that's Turing complete.

final kiln
hasty gust
#

I believe the most accurate physical model of a neuron is coupled magnetic oscillators.

hasty gust
#

My knowledge of the mechanical side of things is because most of the physicists I interact with work on it. My own research is in physics informed machine learning, for high energy physics.

long canopy
#

what's up with that 1 bit LLM paper

#

anyone try an implementation?

final kiln
#

My supervisor at the time talked about physics informed ML, tho at the time I didn't have much interest in ML tbh

final kiln
lapis sequoia
#

I saw it

#

With a rapid expansion of AI ability to spot connections yes tech explosion

#

Question is do most people want AGI to do most so people can relax, enjoy or people want limited AGI dictators

hasty gust
long canopy
#

it would speed up inference like crazy

lapis sequoia
#

Yes

#

Forget about phd

#

Now that person-to-AI communication is free, substandard ways of learning are becoming obsolete

hasty gust
#

I'd be very interested in the results. I'm a bit skeptical of this paper, as arXiv isn't exactly peer reviewed, and the paper doesn't go into too much detail.

final kiln
lapis sequoia
#

Well what do you like doing? Or its chasing latest trends?

long canopy
#

i just need cheap inference, queries on whole codebases currently cost 1 USD lol

final kiln
final kiln
hasty gust
final kiln
#

But has Microsoft stamp on it I think

lapis sequoia
#

For many in the past, access to uni and professors was costly. Their place was full of backward people. Now people can and do talk to AI directly and privately. It is also changing the ways people act

hasty gust
#

Yup it has the Microsoft stamp, I'm just skeptical. I don't think it's a scam, but I definitely think their data is pretty crazy.

final kiln
#

If we can have bit models, do we really need quantum computing for ex

#

QC has very limited usage as it is

lapis sequoia
#

Maybe it's a bit too much to ask. However, after talking to AI did you change the way you behave?

hasty gust
#

Yup, the goal of quantum computing is to avoid the exponential cost of algorithms.

lapis sequoia
#

Avoidances of exponential costs - nice

hasty gust
#

I don't think it's a realistic goal, but if it's possible, then it has a lot of potential applicability

#

At least that's the reason why so much money gets thrown at it

final kiln
hasty gust
final kiln
#

Yeah it saves a ton of time for sure

hasty gust
#

It's allowed me to focus more on what I care about instead of learning a new api pretty much

lapis sequoia
#

Well maybe effect is more pronounced in kids - teens people who still actively question ways of expressing

#

And yes it saves a lot when it comes to code

final kiln
#

Also writing the boring stuff, I can make it write my docstrings and stuff

#

Just needs me to audit it and make changes here and there

#

Ah and emails. Automating email writing has been the best contribution of gpt to my life for sure

hasty gust
#

They claim to find prime factorizations in polynomial time

long canopy
#
paul_mk1

Fun to see ternary weights making a comeback. This was hot back in 2016 with BinaryConnect and TrueNorth chip from IBM research (disclosure, I was one of the lead chip architects there). Authors seemed to have missed the history. They should at least cite Binary Connect or Straight Through Estimators (not my work). Helpful hint to authors: you can...

lapis sequoia
#

AI can be used for thinking too. And inventing

long canopy
final kiln
lapis sequoia
#

It can simply decide what to do and that is a relief

jade grotto
#

hi

supple inlet
#

anyone here using a tesla p40 in their deep learning / LLM rig?

past meteor
#

No, we run rtx Quadro cards

wooden sail
#

A100 gang

#

note that no particular person should ever buy any of these

supple inlet
#

thats crazy

past meteor
#

As much as I dislike non fixed price cloud services I think they're the way to go for GPUs

supple inlet
#

will a p40 be able to locally run LLM

past meteor
#

24GB vram right? You can run certain things

supple inlet
#

yeah but im a little confused, a youtuber was saying theres two cards each at 12

past meteor
#

Honestly, I don't know the specs by heart. Your odds are higher going through Reddit and whatnot to find someone with exactly your (envisioned) setup

supple inlet
#

Thanks ill try that

hasty gust
#

(don't actually buy one though. Just if you have access to one, don't bother with multi gpu setups unless you really need more than 80gb)

supple inlet
#

Which cpu is better suited for LLM: i5 12600k or ryzen 7 5800x

long canopy
#

we need an encoder/decoder architecture for a transformer when the "form" or "type" of the input is different than the output, right?

wooden sail
#

it might be wiser to pay for usage of a compute cluster online than to build an expensive computer that will anyway not be great at it

past meteor
#

spot on

dusty valve
past meteor
#

I hadn't realized they were such a big flex 🤷

dusty valve
#

Can i get ur hiring managers number

serene scaffold
#

I was reading a paper yesterday, and it just said "language models (LMs)" and I don't think they ever said LLM in the paper. And that made me happy.

past meteor
#

You can quantise them pretty hard but I have no sense of how good they are afterwards

serene scaffold
burnt oxide
#

hello i need help with understanding image classification model 🙂

past meteor
#

I read something along the lines of "bigger + quantized > smaller unquantized"

serene scaffold
past meteor
#

idk if you're saying the opposite or not - don't have a frame of reference for LLMs' performance wrt size

past meteor
serene scaffold
serene scaffold
burnt oxide
# past meteor yes, just ask your specific question please

okay so coming from a person who's just done the basics of python
now we are given a task to perform LSH on images and design an image retrieval system
so initially we are given 50,000 images in a folder that we are supposed to load and pre-process
so i have no clue where to start or how to load the data (we are restricted to use pytorch or tensorflow) so i want basically 2 things

  1. how should i load the data? and in what data structure should i store it?
  2. could someone give an overview of which features to extract from the data/images?
past meteor
burnt oxide
past meteor
#

That's a great place to start because they directly answer question 1

#

For 2) how well do you understand the role of convolutional layers in a CNN architecture?

long canopy
#

anyone attempt implementing the 1bit paper?

burnt oxide
past meteor
long canopy
past meteor
#

I'm going to be honest and say I'm not entirely familiar with LSH

burnt oxide
serene scaffold
arctic wedgeBOT
#
The Zen of Python (line 14):

Now is better than never.

past meteor
#

But what I can say is that the role of the conv layers in a CNN is to extract features from the raw input

#

Features that should carry some (domain) semantics

#

A good place to start is likely taking a pretrained CNN (resnet, xception, ...) without the fully connected layers and giving your image as input to that

#

If none of this makes sense (and you're in it for the long game - not just to complete your assignment) I'd have a look at understanding (C)NNs first 🙂

agile cobalt
#

Q1? Tasks? What's that, homework or some take-home assignment?

burnt oxide
agile cobalt
#
  • pre-processing: any computer vision tutorial at all should explain it
  • meaningful features: try a bunch of different things. Which easy-to-compute features would you use to manually divide these into different classes (even if very broad)?
  • LSH: See this for a high level explanation, but if they're asking you to implement it you should be able to figure it out yourselves / find resources explaining it more technically
  • querying against it should be trivial after implementing it
  • for the metrics, literally look up their names to see how they work and how to implement them (or find which method to use from a library you're allowed to use)
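
For the LSH and querying bullets, one common variant is random-hyperplane LSH, sketched below in plain Python. This is only one of several LSH families and may not be the one the assignment expects. The feature vectors are made up, and in practice they would come from whatever feature extractor is used.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (as normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group vector ids into buckets keyed by their signature."""
    index = defaultdict(list)
    for i, vec in enumerate(vectors):
        index[signature(vec, hyperplanes)].append(i)
    return index

def candidates(index, query, hyperplanes):
    """Ids that share the query's bucket; rank these by exact distance afterwards."""
    return index.get(signature(query, hyperplanes), [])

features = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.1], [-1.0, 0.1, 0.5]]  # made-up vectors
planes = make_hyperplanes(num_bits=8, dim=3)
index = build_index(features, planes)
matches = candidates(index, [2.0, 0.4, 0.0], planes)  # same direction as features[0]
print(matches)
```

Vectors pointing in similar directions tend to land in the same bucket, so a query only compares against that bucket instead of all 50,000 images. Using several independent tables usually improves recall.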
past meteor
agile cobalt
#

tbh the way they worded the second point (Extract meaning features) sounds a bit weird
like... the way they're wording it, it's not clear to me if you are supposed/allowed to use a deep neural network at all or if you're supposed to do it manually

though I guess that if they're telling you to use pytorch/tensorflow you probably should use one?

meager ridge
#

hey are RTX GPUs so powerful I couldn't run code for them on the less fancy GPUs available on compute engines?

past meteor
meager ridge
past meteor
#

So I'm expecting traditional methods (surf, brief, ...) are also on the table 🤷

meager ridge
#

but im having issues with installation

past meteor
#

But features from a pretrained cnn are just better

agile cobalt
burnt oxide
past meteor
#

At least, that's what I noticed empirically

past meteor
#

Seems strange

agile cobalt
#

that first "not" is confusing me

burnt oxide
#

not allowed to use torch or tf to load data , rest we can

agile cobalt
#

how are you supposed to load then? PIL?

past meteor
burnt oxide
past meteor
#

cv2.imread

burnt oxide
#

50k images with that?

#

and even if i open it, how am i supposed to get the important data out of it and how am i supposed to store it AHHHHHHHh

past meteor
#
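import os
import cv2
import numpy as np

def read_from_folder(path: str) -> list[np.ndarray]:
    images = []
    for file_name in os.listdir(path):  # original used an undefined name, `folder`
        img = cv2.imread(os.path.join(path, file_name))
        if img is not None:  # cv2.imread returns None for files it cannot decode
            images.append(img)
    return images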
#

smth like this presumably

#

or PIL

#

can be anything

burnt oxide
#

so a list of all images hmm

median drift
#

Hello guys, is learning data structures and algorithms relevant to becoming a data scientist?

past meteor
#

Oh, it says "use PIL in the docs"

#

Lord, I have sinned in the past 🙏

agile cobalt
#

I actually made something vaguely similar to that a little while ago, but using an existing open source model for embeddings + an open source vector database for querying instead of implementing it myself. The overall idea is

  • user submits image
  • create a vector out of it
  • query against saved vectors
  • retrieve similar records

with open source tools and no limitations the first prototype was done and working in 1 day, but it took a while to polish, create embeddings for all of the data, and put online
past meteor
#

I took a course called search engines & information retrieval in uni

#

💯 % obsolete

past meteor
agile cobalt
# median drift ?

you'll want to have a high level understanding of the basic concepts, but nothing too much in detail

statistics on the other hand...

long canopy
past meteor
#

Knowing how to work with most non-exotic data structures (what their performance characteristics are like) is something I'd say most people that code should know

#

But it's something you can learn over a weekend

#

@median drift I don't respond to DMs sorry

burnt oxide
#

okay so

#

am i supposed to use PIL?

past meteor
#

or openCV

#

Installing OpenCV can be a pain

#

If you can avoid it in its entirety that may be nice

burnt oxide
#

nah we have used opencv once

#

so its already installed

past meteor
#

Then you could just use that

odd meteor
wooden sail
#

there is one part of DSA that does overlap with your duties in data science... depending on your definition of data science

#

tasks involving discrete optimization overlap with DSA and dynamic programming. being able to recognize them can save you a lot of heartache, since they're often combinatorial and scale very poorly

#

no need to implement the algs yourself, but you need to be able to recognize the problems to know how/where to look up the solution

desert oar
#

e.g. being able to rewrite multiple linear scans over a dataset with a single loop using dynamic programming, or being able to reason through an accidentally-quadratic algorithm that's exploding your cloud costs

#

knowing broadly what kind of access and insertion time complexity to expect from a hash table, things like that
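
A small sketch of the "accidentally quadratic" case: the two functions below return the same result, but the first scans a list for every element while the second uses a hash-based set. The function names and data are made up for illustration.

```python
def shared_ids_quadratic(left, right):
    # `x in right` scans the whole list each time: O(len(left) * len(right)).
    return [x for x in left if x in right]

def shared_ids_linear(left, right):
    right_set = set(right)  # built once; membership checks are O(1) on average
    return [x for x in left if x in right_set]

left = list(range(0, 2000, 2))
right = list(range(0, 2000, 3))
same = shared_ids_quadratic(left, right) == shared_ids_linear(left, right)
print(same)
```

At a few thousand rows the difference is invisible. At a few million, the first version is the kind of thing that shows up on the cloud bill.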

past meteor
#

One of my friends has the opinion that everyone should write at least 1 program in C, it can be fairly simple even

#

Goes hand in hand with this, you don't need to master it but knowing some of this stuff at a super high level at least makes you stop and think a little bit more 🙂

lapis sequoia
#

Hi guys, does anybody know how to install Python Box2D using pipenv?
I have Python 3.11.0, but box2d-py 2.3.8 refuses to install into the Pipfile

#

are there any possible alternatives to box2d-py 2.3.8?

odd meteor
#

We don't have that many people from Africa here, but if by chance you reside in or are from Africa, and you're interested in attending a computer vision summer school in Kenya, please do apply

The African Computer Vision Summer School (ACVSS) 2024 application closes on 15th March.

https://www.acvss.ai/home

#

There's funding as well. (T&C applies)

potent sky
# odd meteor I don't think it's really all that important to become a data scientist. Learn ...

This, but I'll add: you probably don't need to prepare as intensely as you would for an algorithms interview, but having an understanding of the common data structures and algorithms will be helpful; you will see them reappearing in seemingly unrelated areas of CS.
Because in principle most data structures and algorithms are just a manner of breaking down a problem into mathematically composable parts, which is a widely applicable skill, in CS and beyond.

odd meteor
# lapis sequoia is it free

Idk about International applicants (would have to ask my friend. He's one of the organisers) but there's funding (which usually covers flight and accommodation) for Africans whose application got accepted.

final kiln
#

Surprised they don't have a Docker file in their repo

#

I am starting to see why Rust's borrow checker exists and why it doesn't let me do certain stuff: every time it bothers me, it means I was about to shoot myself in the foot

#

Can avoid most problems by not doing fancy pointer stuff

#

I'm setting up a config file that can be read by all processes, otherwise it's gonna be mayhem

#

And the rust binary is going to be reading data from the same file, it reads it, deletes it, waits for the next. This way I have a complete separation of concerns, the pipeline decides everything related to data and the binary just takes data, batches it and performs gradient descent, it doesn't care which epoch or slice it is or anything else
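
The read, delete, wait handoff can be sketched in a few lines. This is a Python stand-in for illustration only: the real consumer here is the Rust binary, and the JSON format, the file name, and both function names are assumptions.

```python
import json
import os
import tempfile
import time

def publish_batch(path, batch):
    """Producer side: write to a temp file, then rename it into place atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(batch, handle)
    os.replace(tmp_path, path)  # the consumer never sees a half-written file

def consume_batch(path, poll_seconds=0.01, timeout=1.0):
    """Consumer side: wait for the file, read it, delete it. Returns None on timeout."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)
    with open(path) as handle:
        batch = json.load(handle)
    os.remove(path)  # deleting signals the pipeline that the slot is free again
    return batch

with tempfile.TemporaryDirectory() as workdir:
    slot = os.path.join(workdir, "batch.json")
    publish_batch(slot, {"inputs": [1, 2, 3]})
    first = consume_batch(slot)                 # the published batch
    second = consume_batch(slot, timeout=0.05)  # nothing new arrived: None
```

The write-then-rename step matters because the consumer polls the same path. Without it, the reader could open the file while the pipeline is still writing. The producer also has to wait for the file to disappear before publishing the next batch, or it will overwrite one that was never read.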

final kiln
#

So the workflow will be, I hit a button on GitHub actions, it finds me a suitable spot instance, deploys prefect there. Then I manually trigger the training loops that I want

#

When I'm done I bring it down, or leave an automation that brings the spot instance down after all pipelines are done

ornate ledge
#

Hi guys, I have been self-learning to code for 2 months. I went from not even defining functions and not knowing what PEP 8 was to my current state. I have tried to follow SOLID principles. This example is a simple webscraper, and the code works. What would you do differently as someone with more experience? What parts look like bad code, or newbie behavior?

from selectolax.parser import HTMLParser
from webscraper import WebScraper
import re
import json

def get_data_container(html):
    tree = HTMLParser(html)
    cars = tree.css('form div.d-md-none')
    return cars

def process_title(title):
    _, rest = title.split('&', 1)
    brand, model, year = rest.split('.')
    return brand, model, year

def extract_number(text):
    match = re.search(r'\d+', text)
    return match.group() if match else None

def extract_car_data(car):
    data = {
        'title': car.css_first('td.brandtitle-sm > a').attrs['href'],
        'passengers': extract_number(car.css_first('td.brandtitle-sm > span').text().strip()),
        'price': int(extract_number(car.css_first('span.precio-sm').text().replace(',', ''))),
        'details': [item.strip() for item in car.css_first('div.transtitle').text().replace('|', '').strip().split('\n') if item.strip()]
    }
    data['brand'], data['model'], data['year'] = process_title(data['title'])
    return data

if __name__ == "__main__":
    URL = 'URL'
    scraper = WebScraper(URL)
    html = scraper.get_html()
    cars = get_data_container(html)
    data = [extract_car_data(car) for car in cars]
    with open('cars.json', 'w') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    scraper.close()
#

webscraper.py

from playwright.sync_api import sync_playwright

class WebScraper:

def __init__(self, url):
    self.playwright = sync_playwright().start()
    self.browser = self.playwright.chromium.launch(headless=True)
    self.page = self.browser.new_page()
    self.page.route('**/*.{png,jpg,jpeg}', lambda route, _: route.abort())
    self.url = url

def get_html(self, selector = None):
    self.page.goto(self.url)
    if selector:
        self.page.wait_for_selector(selector)
    html = self.page.inner_html('body')
    return html

def close(self):
    self.browser.close()
    self.playwright.stop()

I factored this one out with the intention of adding subclasses or extra capabilities such as navigation and so on

#

BY THE WAY SOME UNDERSCORES DISAPPEAR WHEN COPYING THE CODE, never mind that

final kiln
#

So you get

Abc
ornate ledge
#
from selectolax.parser import HTMLParser
from webscraper import WebScraper
import re
import json

def get_data_container(html):
    tree = HTMLParser(html)
    cars = tree.css('form div.d-md-none')
    return cars

def process_title(title):
    _, rest = title.split('&', 1)
    brand, model, year = rest.split('.')
    return brand, model, year

def extract_number(text):
    match = re.search(r'\d+', text)
    return match.group() if match else None

def extract_car_data(car):
    data = {
        'title': car.css_first('td.brandtitle-sm > a').attrs['href'],
        'passengers': extract_number(car.css_first('td.brandtitle-sm > span').text().strip()),
        'price': int(extract_number(car.css_first('span.precio-sm').text().replace(',', ''))),
        'details': [item.strip() for item in car.css_first('div.transtitle').text().replace('|', '').strip().split('\n') if item.strip()]
    }
    data['brand'], data['model'], data['year'] = process_title(data['title'])
    return data

if __name__ == "__main__":

    URL = 'https://crautos.com/autosusados/searchresults.cfm?c=02281'
    scraper = WebScraper(URL)
    html = scraper.get_html()
    cars = get_data_container(html)
    data = [extract_car_data(car) for car in cars]
    with open('cars.json', 'w') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    scraper.close()
#

from playwright.sync_api import sync_playwright

class WebScraper:

    def __init__(self, url):
        self.playwright = sync_playwright().start()
        self.browser = self.playwright.chromium.launch(headless=True)
        self.page = self.browser.new_page()
        self.page.route('**/*.{png,jpg,jpeg}', lambda route, _: route.abort())
        self.url = url

    def get_html(self, selector = None):
        self.page.goto(self.url)
        if selector:
            self.page.wait_for_selector(selector)
        html = self.page.inner_html('body')
        return html

    def close(self):
        self.browser.close()
        self.playwright.stop()
ornate ledge
iron basalt
#

(Note they use natural log instead of base 2 in physics, but they could use base 2 (and do depending on what they are doing))

#

(When base e, it's "nats" (natural unit of information) instead of "bits")

final kiln
iron basalt
#

This is just classic statistics gas stuff.

final kiln
# iron basalt

Yes they have that equivalence, but in statistical mechanics you assume every micro state is equally probable, so it's like looking at the fair coin only, whose case I already find intuitive

iron basalt
#

So what most people probably think of with the classic idea of entropy.

final kiln
#

Calculation of the entropy for the biased coin is what I don't find intuitive

#

I get it's talking about an average number of bits

#

Because it wouldn't make sense to talk about 0.5bits for ex

#

What average it is, idk

final kiln
iron basalt
#

So let's say you have a set of things you want to send over a wire encoded with bits: {red, green, blue, orange, purple, pink, yellow, brown}. You could use a simple encoding with 3 bits per message: {000, 001, 010, 011, 100, 101, 110, 111}. So on average you are sending 3 bits over the wire. Now let's say you use this encoding instead: {1, 01000, 01001, 01010, 01011, 01100, 01101, 01110}. The other colors now use 5 bits, and if they were all equally likely this would use an average of (1 + 7 × 5) / 8 = 4.5 bits, so it's worse. But what if all the messages are not equally likely? What if red is really likely and the rest almost never happen? Then this would be close to an average of 1 bit. Btw as a detail on the receiver side, if the first bit it receives is a 0, then it knows it needs to read 4 more. So it would be nice to have a formula to compute this average required number of bits to encode these messages...
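To make the averages concrete, here is a small sketch that computes the expected message length for both encodings. The "mostly red" probabilities (93% red, 1% for each other color) are made up for illustration and are not from the discussion.

```python
colors = ["red", "green", "blue", "orange", "purple", "pink", "yellow", "brown"]

# The two encodings described in the message.
fixed_code = dict(zip(colors, ["000", "001", "010", "011", "100", "101", "110", "111"]))
skewed_code = dict(zip(colors, ["1", "01000", "01001", "01010",
                                "01011", "01100", "01101", "01110"]))

def average_bits(code, probs):
    """Expected number of bits per message under the given probabilities."""
    return sum(probs[color] * len(code[color]) for color in code)

uniform = {color: 1 / 8 for color in colors}

# Hypothetical skewed distribution: red 93%, every other color 1%.
mostly_red = {color: 0.01 for color in colors}
mostly_red["red"] = 0.93

for name, probs in [("uniform", uniform), ("mostly red", mostly_red)]:
    print(name,
          "| fixed:", round(average_bits(fixed_code, probs), 3),
          "| skewed:", round(average_bits(skewed_code, probs), 3))
```

With equal probabilities the skewed code costs 4.5 bits against 3 for the fixed code, while under the assumed red-heavy distribution it drops to about 1.28 bits.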

final kiln
#

Wait how did the likelihood change the amount of bits sent in this case

#

Don't matter if red is super likely, you're still sending the same number of bits unless you decide on a more clever encoding

#

Ah that's exactly it

#

You're sending 1

#

1=red in your other encoding

iron basalt
#

Yeah, and you can measure the difference between the encodings, and then you get the KL.

#

Basically abusing the probabilities to send less, on average.
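A rough sketch of that "difference between encodings", assuming idealized codes where an outcome with assumed probability q costs -log2(q) bits. The two distributions are hypothetical: the true one is mostly red, while the code was designed as if all eight colors were equally likely.

```python
import math

def entropy_bits(p):
    """Best achievable average bits per message when outcomes follow p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average bits when outcomes follow p but the code was designed for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p, q):
    """KL(p || q): extra bits per message paid for using the wrong code."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.93] + [0.01] * 7   # hypothetical true distribution, mostly red
q = [1 / 8] * 8           # what the 3-bit code implicitly assumes

print("entropy       :", round(entropy_bits(p), 4))
print("cross-entropy :", round(cross_entropy_bits(p, q), 4))
print("KL(p || q)    :", round(kl_bits(p, q), 4))
```

The KL value is exactly the cross-entropy minus the entropy, i.e. the average number of bits wasted by coding for q when the data really follows p.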

final kiln
#

Yeah I get it makes sense.

#

Okay it makes sense in this setup of two people or computers trying to communicate, but what is it about it that makes it appear in so many places other than message sending, like in physics and ML?

#

I assume that in physics it's gonna tell me (if I use base 2), the number of bits on average that I need to write down the current micro state of a given macro state

#

Ok yeah I think I get it, it's the average number of bits I need to describe an outcome, given that I can use knowledge of the dist itself to minimize that average
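For the biased coin specifically, a minimal sketch of the calculation (the probabilities below are arbitrary examples). A fractional result such as 0.47 bits only makes sense as an average: it is the per-flip cost when a long run of flips is encoded together.

```python
import math

def coin_entropy_bits(p_heads):
    """Entropy in bits of a coin that lands heads with probability p_heads."""
    total = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0:  # by convention 0 * log(0) counts as 0
            total -= p * math.log2(p)
    return total

for p in (0.5, 0.9, 0.99, 1.0):
    print(f"P(heads)={p}: {coin_entropy_bits(p):.3f} bits per flip on average")
```

The fair coin gives exactly 1 bit per flip, and the number falls towards 0 as the coin becomes more predictable.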

iron basalt
#

They use natural log though, so they don't use bits.

#

Except now they sometimes do.

final kiln
#

Usually in physics log=log10, maths log=loge and CS log=log2

iron basalt
#

In the past, pre-shannon, natural log just made sense, because it makes the math easier.

#

Euler's number makes calculus stuff easy.

final kiln
iron basalt
final kiln
#

Idk what that means but I assume post Shannon is better

iron basalt
#

Yeah. It's a paraphrase, I can't find the source anymore, but Shannon's work affected physics too, and it really changed everyone's perspective on things.

#

It at first might seem like it's just about efficient communication over wires, but applies far beyond that.

final kiln
crisp raptor
#

Some dude wrote a NN in ps

final kiln
#

Really cool diagram

iron basalt
# final kiln

Now next time someone says "X bits," you can say "X Shannons."

meager ridge
final kiln
final kiln
# final kiln

I can see how this abstract diagram can be applied in many situations

wooden sail
#

the idea of a signal travelling through a channel is key in many control theory, communications, signal processing, and optimization problems

meager ridge
final kiln
meager ridge
meager ridge
final kiln
meager ridge
final kiln
# final kiln

Information source could be some physical system, the transmitter would be something that somehow couples with the quantity I want to measure, the receiver would be the actual display, like a needle thingy pointing to a scale, and the destination is my brain

#

Or, the information source could be my body, the transmitter a black hole, the signal the black hole radiation emitted after I'm absorbed by it, the receiver a measurement apparatus and the destination is someone trying to decode my age

#

Wait no, in that case I'm the message

#

The information source is whatever bumped me into it

#

Or am I the transmitter since all that's being decoded is my age ?

#

I'm gonna read the paper tomorrow, I think I'm getting all of them wrong

iron basalt
#

(Also it should be pretty straightforward to make the connection to compression now)

civic elm
#

Hi, I read a book that said RandomForest is the most popular multi classifier on Kaggle, but the author did not cite the source, I wonder if this is true?

agile cobalt
#

you could try determining it based on the classes used in https://www.kaggle.com/datasets/kaggle/meta-kaggle-code but it wouldn't be 100% accurate and may take a while

in practice, RandomForest is a pretty good place to start when working with tabular data though; you might want to try something fancier like gradient boosting, but random forest is about as 'safe' as it gets when it comes to avoiding pitfalls and getting a baseline

past meteor
#

An untuned GBM is usually what performs best for me in a diverse set of problems

#

It could also be an implementation issue on sklearn's side but GBMs typically are faster to train for me as well on CPU

full ore
#

Hey peeps, does anyone mind taking a look at #1212891108418256936 if they get a chance, i know it looks daunting to read but I've tried to include as much context as possible to hopefully reduce the number of questions I receive. I don't think there will be no questions, though

desert oar
unreal mesa
#

anybody using Hugging Face or NLTK? are you able to design complex chatbots with any?

potent sky
final kiln
#

I requested it again using Gmail

#

Ah I already got it ._.

#

That was it

#

Like it took a couple minutes

potent sky
#

Perf

past meteor
vale swallow
#

Hi, in PyTorch I’m so confused on how to get the output channels for that first conv2d layer (the 6… is that just a random number that was chosen?). I’m also confused on how to figure out what to put in for that fc1, can someone pls explain thanks

#

Also, how can I check size of the image as I build the network?

broken eagle
#

Heyo. Anyone has experience working with Whisper? Facing some weird issues with the transcription.

Whisper not predicting end of sequence token properly. Will like to discuss if you have encountered similar issues 🙂

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @lapis sequoia until <t:1709295520:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

unreal mesa
final kiln
#

the Llama 2 models are not too shabby

#

I wonder if it's worth it to get a serverless setup for the 70B model

#

perhaps save me from the chat gpt subscription + a lot more privacy

final kiln
wraith flicker
#

Im new to python.. is it possible to make a tool integrated with chat GPT where it has a pre loaded question and just simple input is required? For example:

The tool would say “Enter city:”

You’d enter the city name and press enter, and it would feed back the temperature in that city.

Example:
Enter city: New York
Response: The temperature in New York is **

This is just an example, i’d use it for multiple purposes.

serene scaffold
#

<@&831776746206265384> ad

jagged bane
#

!pban 1141814750808903720 seems like you're just here to advertise

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied ban to @random summit permanently.

lapis sequoia
#

Hey!
Figured I'd ask here, as I don't see a reason to open a help channel for this.

What is the best LLM library/framework for Python which supports AMD GPUs?

Thanks 🙂

devout fossil
#

I have used PyCharm for all of my python needs. I am starting to learn ML and Jupyter Notebook is recommended for that. Can I set up PyCharm so that it has the same features or should I learn how to use Jupyter?

final kiln
#

In vs code there's an extension for it

devout fossil
#

Yes, as I looked deeper it looks like PyCharm can just act like Jupyter Notebook.

final kiln
devout fossil
#

Scientific project is what I need. It is a PyCharm Professional only feature, but I have it

dusty forge
#

I think the extension will simply open Jupyter inside Pycharm if I'm not mistaken

devout fossil
#

Yeah, it creates the jupyter server but uses pycharm UI

vagrant root
#

heyy

#

i am a student that has worked in python and c and done some webdev along with ml and ai

long canopy
#

bitnet impl

#

from 1bit paper

vagrant root
#

what should i expect

#

and what should i prepare for

final kiln
surreal sierra
raw mortar
final kiln
#

let's gooooooo

#

the second blue one is already using a rust binary

#

and training a model

#

now would be nice to have a flow chart for it, and logs

frigid badge
#

ok so I went offline for a few days sorry about that, but I decided to cut down some of the work and just use python to allow the MLM to try make tools and such

#

now I just gotta figure out how the MLM will take that information and decide on its use, how I would actually apply the use in the simulation, and how the MLM will pass on the information to their offspring.

grand breach
#

is 42k rows 17 columns high dimensionality?

frigid badge
#

so any ideas are welcome...

serene scaffold
#

which is to say, it's 17 dimensions. the rules for doing math in 17-dimensional space are the same, regardless of how many points you have in it. Be it 10, or 100, or 42,000

#

To answer "is 17 dimensions high?", this is a matter of perspective. but I've seen 2048 dimensions

supple inlet
#

Hello everyone is this a good place to ask about a pandas dataframe question?

left tartan
supple inlet
#

Ah nice.

#

I have a main df with a location column and another df which has a ledger of locations with corresponding postcodes. I want to add a new postcode column with the correct postcodes for the locations in the main df. Where im tripping up is the main df has duplicate locations. Both df locations spellings are the same

peak jackal
#

so you want to use the second df to add a column to the first dataframe based on the location. you're keying the 2nd dataframe's locations (the one of the one-to-many relationship) to the first dataframe's locations (the many of the one-to-many relationship) and pulling the postcode across

dusty forge
#

df2 postal codes to df1 where df1 location = df2 location

#

ChatGPT gave me this

#

`# Assuming df1 and df2 are your DataFrames
# df1 is the DataFrame you want to add information to
# df2 is the DataFrame containing the lookup table

# Create a dictionary from the lookup table
lookup_dict = df2.set_index('key_column')['value_column'].to_dict()

# Use map() to apply the lookup
df1['new_column'] = df1['key_column'].map(lookup_dict)`

#

so I think key-column would be location, and value column the postal codes ( @supple inlet )

peak jackal
#

you could splice out what you want from the original dataframes, if there's more, then merge it accordingly.

dusty forge
#

yeah gpt also gave me Merge, but that seems to be adding the full df2 to df1, but i guess it has options
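For what it's worth, a minimal sketch of the merge route that only pulls the postcode across. The frames and column names are invented for illustration; the real ones will differ.

```python
import pandas as pd

# Hypothetical data: the main frame repeats locations, the ledger lists each once.
main_df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "location": ["Leeds", "York", "Leeds", "Hull"],
})
ledger_df = pd.DataFrame({
    "location": ["Leeds", "York", "Bath"],
    "postcode": ["LS1", "YO1", "BA1"],
    "region": ["North", "North", "South West"],  # extra column we do not want
})

# Keep only the key and the value column so nothing else is pulled across.
lookup = ledger_df[["location", "postcode"]]

# how="left" keeps every row of main_df, duplicates included.
# validate raises an error if the ledger lists the same location twice.
merged = main_df.merge(lookup, on="location", how="left", validate="many_to_one")
print(merged)
```

Locations missing from the ledger (Hull here) end up with a missing postcode rather than being dropped, which makes gaps in the ledger easy to spot.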

dusty forge
#

coming from the SQL side of things, I would merge this outside of Python honestly, but might be a great moment to learn it with pandas

#

What are the pros and cons of data loading, wrangling, cleaning outside of Python, likely with SQL, and then store it in a db table specifically for ML? As opposed to wrangling/cleaning within Python?

wraith flicker
#

Anyone have any idea why this won't work?

Youd enter the two basketball teams, and chatgpt would tell you who has better statistics overall.

Example:
Enter two teams separated with a comma: Philadelphia 76ers, Miami Heat
Response: According to sources, the team with better overall statistics is ----

This is just an example, i’d use it for multiple purposes.

dusty forge
#

you might want to cover up your key 😉

wraith flicker
thorny drum
#

is there a channel for data engineering questions or does this serve as such?

#

I have a pyspark issue I wanted to learn about

crystal fjord
#

ok I want to make my own python back end using flask and pytorch weights that were already trained on a dataset for a project I have, and I just can't seem to understand how to make it read the live camera feed from my nginx server or how to even pair the two together. I need help with it, if anyone knows what to do please help!

supple inlet
#

@dusty forge @peak jackal Thanks for your help, ill give your suggestions a go. At first i did use gpt and bard for help and it suggested merge which didnt work great but perhaps thats down to my bad prompt.

supple inlet
# dusty forge What are the pros and cons of data loading, wrangling, cleaning outside of Pytho...

This is going to sound stupid and i totally would but IT at work is refusing giving me access to the DB. So what im stuck with is downloading raw data (excel) from a web portal and making do with that. Im a data analyst (2 months in) and its my first job out of uni and the IT department sucks. They refused to let me download python and said its a security risk but my machine already had it installed lol.

peak jackal
#

i feel that

#

data analysts/scientists embedded in the non-IT team seem to be in an awkward position in these things so often

#

2¢: I understand if you just want to get what you need to get done. But if you had "I use Python to do data analysis!" in your resume, or in your interviews... lean on that. Let them know that you were hired for a reason. You're going to need to communicate effectively with the IT team, and this is a reason to start a conversation with them.

versed pilot
versed pilot
odd meteor
# vale swallow Hi, in PyTorch I’m so confused on how to get the output channels for that first ...

Yes, you can pick the number of output channels you want. Hopefully, this long post provides you a little bit of clarity.

Suppose we have 1 input channel and 3 output channels, it would look like the attached image (see image 1). So here, I'm using the @ symbol to denote the number of channel(s)

  • On the left, we have an input represented by a 12x12 image with a single channel.
  • On the right, our output is a feature map that is 10x10 in size and has three channels.
    To consider a concrete example, let's assume again we have an input image 5. (see image 2)

Now the convolution operation for this first output feature map is exactly the same convolution operation you're probably familiar with. I modified the notation a bit here in order to include the channel reference.

The @ symbol followed by 1 refers to the first output channel. (see 3rd image). Then we'll also perform the same convolution operation for the 2nd and 3rd output channel respectively (see 4th image)

In fact, we are using the same convolution operation across all three output channels. The only distinction here is that, each output channel uses a different set of weights; in other words, we use different feature detectors for each channel to produce different outputs. However, the convolution operation itself remains exactly the same for each output channel.

=================================

Suppose we have 3 input channels and 5 output channels. Now if we have 5 output channels, we need 5 kernels, and since we have 3 input channels, each of the kernels has to have three channels as well. (see image 6)

if I want to code this now in PyTorch it'll be

import torch
import torch.nn as nn

# 3 input channels, 5 output channels, 2x2 kernels
layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=2)
print(layer.weight.shape)  # torch.Size([5, 3, 2, 2])

This configuration generates a four-dimensional weight tensor of shape 5x3x2x2: (output channels, input channels, kernel height, kernel width).

Hopefully this clarifies things a bit for you.

odd meteor
# vale swallow Hi, in PyTorch I’m so confused on how to get the output channels for that first ...

In that 1st fully connected layer, we're simply flattening the image (tensor) in order to perform the last arm of the work (MLP) where the classification happens.

The CNN part is the 1st arm which is used to extract features from the image before it's passed to the last arm (MLP part)

So the last convolutional layer outputs 16 channels, each a 5 (feature map width) x 5 (feature map height) map, which means fc1 takes 16 x 5 x 5 = 400 input features.

120 is the number of output features from the fc1.
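As for checking the size as you build the network, here is a sketch that tracks the spatial size by hand. The architecture is an assumption, since the screenshot isn't visible here: a 32x32 input, two 5x5 convolutions without padding, each followed by 2x2 max pooling.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Assumed layers (not confirmed by the original question):
# 32x32 input -> conv(5x5, 6 ch) -> maxpool(2) -> conv(5x5, 16 ch) -> maxpool(2)
size = 32
size = conv_output_size(size, kernel_size=5)            # conv1: 32 -> 28
size = conv_output_size(size, kernel_size=2, stride=2)  # pool : 28 -> 14
size = conv_output_size(size, kernel_size=5)            # conv2: 14 -> 10
size = conv_output_size(size, kernel_size=2, stride=2)  # pool : 10 -> 5

out_channels = 16  # channels produced by the last conv layer
fc1_in_features = out_channels * size * size
print(size, fc1_in_features)  # 5 400
```

In PyTorch itself, printing `x.shape` after each layer inside `forward` while feeding one dummy batch through should show the same numbers.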

long canopy
#

anyone know to what extent loss functions are convex?

final kiln
supple inlet
long canopy
final kiln
#

Idk if this is correct but, if 0 is a point, the hyper surface tends to infinity, and is continuous

#

Then it's gotta be convex at least on avg

long canopy
#

no guarantee and we need regular enough convexity to apply convex optimization algorithms

#

we don't even know if we can reach 0

final kiln
#

Wait convexity is a global thing from what I'm seeing now. Uhm I think that it's always gonna be a funnel shape with local minima here and there and tending to infinity cuz the model not modelling the data is the easy thing

#

But a pure convex shape almost looks like an idealized case right

#

Not sure tbh

#

I think this looks realistic

#

And this looks artificial

supple inlet
grand breach
past meteor
grand breach
#

or I should select the columns manually

past meteor
#

Either you use a method with regularisation (most popular ML algorithms do this), or you do a PCA with hyperparameter tuning

wooden sail
#

convexity in general also doesn't mean there's a unique solution, either

#

for nonlinear cost functions, you can safely assume they're non convex. ml in general has no guarantee of optimality

dusty forge
#

Short q: why do the x values get stored in 2D, and the y values in 1D? Why not both in 2D by default?

dull idol
#

Hi, 2-3 months ago there was a post on reddit about an alternative to jupyter notebooks, but I couldn't quite remember the project name
you edit it like a regular python file, but it can be executed interactively like jupyter,
it also says that it guarantee the flow of code execution (top to bottom), so it can be run like a regular python file

wooden sail
#

you used iloc to take all rows, and then a number of columns, from a dataframe. then you used values to get a numpy array representation of that

#

for y, you said "all rows, 1 column". that's 1d

#

for x, you specified a range of columns, even if the range is a singleton

dusty forge
#

ahhhh

#

I see

#

would this work for y? [:, -1:-1]

wooden sail
#

try and see

#

think carefully about the ranges

#

what do you expect 1:-1 to do? what about -1:-1?

dusty forge
#

I tried -1:-1 as in, select the last up to the last, to force a range

#

did not work

wooden sail
#

you should leave the dimension empty to specify "up to the last element"

dusty forge
#

for y I only want to have the last column, hence trying -1:-1 to force a range selection even though it's only one column

wooden sail
#

because -1:-1 is an empty slice

#

it contains nothing

#

leave it empty or write None if you want to let the slice go "to the end"

#

!e

import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print(x[:, -1:])
print(x[:, -1:None])
arctic wedgeBOT
#

@wooden sail :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | [[3]
002 |  [6]]
003 | [[3]
004 |  [6]]
wooden sail
#

do take a chance to read through those links on slicing

dusty forge
#

How would you select only the last column, without the need to reshape in a later step?

#

Basically the second part of my original question, how can I read the data and store the y, in the same way as storing the X, immediately as a 2D array?

wooden sail
dusty forge
#

ohhhhh

#

-1:

wooden sail
#

in two different ways, too

#

please read what i wrote carefully

dusty forge
#

oke so
-1:-1 as in, last column up to last column = does not work as it's an empty slice
-1: as in, last column up to the end = does work
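A quick sketch of the resulting shapes, reusing the small array from the eval above, makes the difference between an integer index and a slice visible:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x[:, -1].shape)     # (2,)   integer index drops the column axis
print(x[:, -1:].shape)    # (2, 1) slice from the last column to the end
print(x[:, -1:-1].shape)  # (2, 0) start == stop, so the slice is empty
print(x[:, [-1]].shape)   # (2, 1) a list index also keeps the axis
```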

#

selecting the last column as the 'start' confused me

#

Thanks, yeah that worked, not sure why the course is not using that, I guess they want us to learn the reshape method

untold bloom
#

[:, [-1]] (a list with one column index) might be clearer

#

though that guarantees a copy, which may or may not be desired (probably won't matter)

#

also .values is sad code nowadays, cool people use .to_numpy()

#

.values doesn't always give a NumPy array, whereas .to_numpy(), as its name suggests, does. The cases where .values gives back something else (e.g. a pandas array) are super rare, but still

#

lastly, some APIs might as well accept pandas stuff, so chances are you don't need to go to NumPy's domain at all maybe
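Putting those suggestions together on the pandas side, a small sketch with a made-up frame (features first, target in the last column):

```python
import pandas as pd

# Hypothetical dataset, not the one from the course.
df = pd.DataFrame({"level": [1, 2, 3], "years": [2, 5, 9], "salary": [40, 55, 90]})

X = df.iloc[:, :-1].to_numpy()      # all columns except the last
y_1d = df.iloc[:, -1].to_numpy()    # single integer position gives a 1-D array
y_2d = df.iloc[:, [-1]].to_numpy()  # list of positions gives a 2-D column

print(X.shape, y_1d.shape, y_2d.shape)  # (3, 2) (3,) (3, 1)
```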

dusty forge
#

Ok understood, will keep this in mind. I'm following a Udemy course where they use .values, but I'm also reading the pandas (and numpy) book by Wes McKinney as an additional source, just to have a more complete view

#

The course starts with Regression and it might be possible that they will change the code along the way to teach us other (and perhaps better) ways to code for similar purposes, that's what I'm hoping for anyways.

orchid forge
#

I've been trying to learn data analysis on my own for a few months
I see it's getting tougher for me.
Thinking about buying a course
Can anyone recommend a good one?

odd meteor
dusty forge
# orchid forge I've been trying to learn data analysis all by my own from few months I see it's...

For solely Data Analysis, I highly recommend Maven Analytics, they offer courses both on Udemy and their own platform. Very clear and clean explanation, each course comes with its own dataset, and you 'play' the role of the analyst on different topics depending on the course. I have several of their courses: Power BI, Power Excel/Pivot/Query. They explain the theory, followed by the practical exercise, followed by you getting a virtual message from one of your colleagues asking for an analysis (this is the actual exercise to check if you understand what to do), then the solution.

orchid forge
odd meteor
# orchid forge I've been trying to learn data analysis all by my own from few months I see it's...

Check the pricing of https://DataCamp.com or https://DataQuest.io

You can also check Udemy

Dataquest

97% of learners recommend Dataquest for learning AI and data skills. Better teaching = better outcomes. Take a free lesson now >>

orchid forge
#

K

#

Also idk why I think that data analysis would not be a cool job for a girl like me

dusty forge
#

Holy moly this code, I tried to write down my notes as comments but how close or wrong am I here?

long canopy
#

thanks for the answer btw

wooden sail
#

you study each one individually and group them by how nice they are

#

convexity, smoothness, etc

long canopy
#

currently wondering whether % linearity of total layers could play a role in how nice they are

wooden sail
#

you can't assume anything about a function, it's your job to show it belongs to a family or to recognize at a glance it has a special form

wooden sail
#

linear funcs compose into a single linear func

long canopy
wooden sail
#

nope

#

as soon as you have a nonlinear activation, no

long canopy
#

ah right

#

softmax etc

wooden sail
#

if those werent there youd have just one linear or affine map

#

yeah

past meteor
#

It's worth writing this out to see it
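Written out as a tiny numeric sketch (the weights are arbitrary numbers picked for illustration): two affine layers collapse into one, and putting a ReLU between them breaks the equivalence.

```python
import numpy as np

# Two stacked affine layers, 2 -> 2 -> 1, with made-up weights.
W1 = np.array([[1.0, -1.0],
               [2.0, 0.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([0.5])
x = np.array([1.0, 2.0])

# Without an activation: W2 (W1 x + b1) + b2 == (W2 W1) x + (W2 b1 + b2)
W = W2 @ W1
b = W2 @ b1 + b2
stacked = W2 @ (W1 @ x + b1) + b2
print(stacked, W @ x + b)      # both are [0.5]

# With a ReLU in between, the single affine map no longer matches.
hidden = np.maximum(W1 @ x + b1, 0.0)
with_relu = W2 @ hidden + b2
print(with_relu)               # [1.5]
```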

final kiln
#

All things considered, gradient descent is a pretty crude algorithm that assumes very little about the search space, just that it is first order differentiable.

If we're using it then I'd assume most inputs will not have nice properties.

wooden sail
#

it's not just gradient descent though, ML also relies on the gradients being stochastic

#

all of the gradients are wrong at each iteration due to noise and batching, but the error averages out through the training

#

you reap the benefit of being able to escape saddle points this way

#

scheduling of the step size can also help you escape local minima
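A toy sketch of the "wrong but averages out" part, using a made-up linear regression problem. With the parameters held fixed, each mini-batch gradient differs from the full gradient, yet the mean over one epoch of equal-sized batches matches it. In real training the parameters move between batches, so the cancellation is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up regression problem: 1000 samples, 3 features.
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)
w = np.zeros(3)  # current parameters, held fixed for this comparison

def mse_gradient(X_part, y_part, w):
    """Gradient of the mean squared error on the given rows."""
    residual = X_part @ w - y_part
    return 2.0 * X_part.T @ residual / len(y_part)

full_grad = mse_gradient(X, y, w)

# One shuffled epoch split into 20 mini-batches of 50 rows each.
order = rng.permutation(len(y))
batch_grads = np.array([
    mse_gradient(X[idx], y[idx], w) for idx in np.split(order, 20)
])

print("full gradient      :", full_grad)
print("one batch gradient :", batch_grads[0])
print("mean over batches  :", batch_grads.mean(axis=0))
```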

final kiln
#

Yeah you're working with the wrong surface at each iteration

wooden sail
#

that's where all the momentum and what not comes in. you almost never use GD or SGD directly

final kiln
#

Was trying to train a GAN with the SGD optimizer, as soon as I changed to Adam it worked

wooden sail
#

that's cuz GD actually has fairly strict conditions to guarantee convergence to a stationary point

#

either a step size that decreases quickly enough, or lipschitz continuity, along with starting close enough to a stationary point

#

adam adaptively uses momentum to make a step size schedule
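Roughly, the update looks like the sketch below: a plain NumPy version of the Adam rule on a toy quadratic. The learning rate of 0.1 and the test function are illustrative choices, not anything from the GAN experiment.

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimal Adam loop: momentum plus a per-parameter step size."""
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)  # running mean of gradients (momentum)
    v = np.zeros_like(w)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2**t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step is scaled per parameter
    return w

# Toy badly scaled quadratic: f(w) = 0.5 * (100 * w[0]**2 + w[1]**2), minimum at 0.
def grad(w):
    return np.array([100.0, 1.0]) * w

result = adam_minimize(grad, w0=[1.0, 1.0])
print(result)
```

Dividing by the running root-mean-square of the gradient is what gives each parameter its own effective step size, so the steep and the shallow direction are treated about the same here.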

final kiln
final kiln
long canopy
#

are there special fine-tuning algorithms, or is it just doing more training on a special dataset?

dusty forge
#

What I think it says in the first line is: arrange the X-axis values in decimal steps, specifically 0.1. But what does the reshape in the second line do, or perhaps, why is this second line needed? And in line three, why do I need to predict on X_grid instead of X? Wouldn't changing the actual values of the data, and showing the data be separate? To me this looks like I'm changing the values of the data just to show it nicer in the graph.

#

What I mean is, when I have values like 15, 33, 71 for example, those are the true values. Changing them to 15.0, 33.0, 71.0 so the graph looks smoother feels wrong. As opposed to only changing the axis' intervals from 0 to 10, to 0.0 to 10.0.

#

That's how I interpret this part
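Here's a sketch of how I'd test that interpretation, assuming the course code follows the usual pattern of `np.arange(min, max, 0.1)` followed by a reshape. The data and the polynomial stand-in for the regressor are made up; the reshape is there because scikit-learn style `predict` methods expect a 2-D array of shape (samples, features).

```python
import numpy as np

# Hypothetical training data: 10 integer levels and a curved response.
X = np.arange(1, 11).reshape(-1, 1)   # 2-D column, never modified below
y = (X.ravel() ** 2).astype(float)

# Stand-in for the course's regressor: a degree-2 polynomial fit.
coeffs = np.polyfit(X.ravel(), y, deg=2)

def predict(grid):
    """Evaluate the fitted curve at each row of a 2-D column of x values."""
    return np.polyval(coeffs, grid.ravel())

# A dense grid of NEW x positions, used only for drawing a smooth curve.
X_grid = np.arange(X.min(), X.max(), 0.1)  # 1-D array in steps of 0.1
X_grid = X_grid.reshape(-1, 1)             # same values as a 2-D column

y_curve = predict(X_grid)
print(X.shape, X_grid.shape, y_curve.shape)
```

In this sketch X keeps its original ten values; X_grid is a separate array of extra points between them, so the plotted curve gets denser without the data changing.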

dull idol
grand breach
#

what is the way to find correlation between a numerical feature & a categorical target variable

#

there are 17 columns and they can be segregated as demographic, call and one more (3 columns)

urban gazelle
#

Hi

dusty forge
#

You also want to look if you need all the 16 other columns as Xn or able to ignore some of those if you think it won't add value

odd meteor
dusty forge
#

Not cell-based? So I guess it would be one of the many IDE's? PyCharm, VSCode?

dull idol
#

they also said that you can rerun the file/notebook as a script,
they somehow managed the notebook state so that you dont have to deal with "which cell should i run first to make this code work"

#

all I can remember is that their website was green 😂

final kiln
#

You can select the parts you want to run

#

Ig the disadvantage was that the graphs weren't neatly displayed after the code

dull idol
#

spyder IDE? I know this program but its not it

final kiln
#

It's as if you ran it as a script or in the terminal

final kiln
#

Now I want to get into neovim so that I can tell everyone I use neovim

#

Jk, I think it's really good, looking at how fast people are with it

devout fossil
#

I was following a tutorial and the histogram for one of the columns was like this image. The tutorial then said that we should use the log function to modify that column so that it had more of a bell curve.

Why is that needed? Does it just yield better results and should be done on all skewed columns? Or is it more specific to the needs later on?

random sapphire
#

HI

#

i've got a doubt does MLOPS have scope in 3-4 years

final kiln
#

MLOps is a lot of work, idk how much they're paid but it ain't enough ._.

random sapphire
final kiln
#

What do you mean by scope in 3 years

#

You mean like, will it still be relevant ?

random sapphire
final kiln
# random sapphire exactly

I don't see much reason for it to disappear. As long as you need infrastructure to do ML, you need MLOps

random sapphire
#

ohk

left tartan
#

What I’ve heard from several big (really big) firms is: MLOps is an area they are struggling with. Like DevOps, it’s a complex topic that requires serious engineering, planning and oversight.

#

Just talked with a large firm last week and they specifically called out mlops as their biggest challenge/gap

lapis sequoia
odd meteor
# devout fossil I was following a tutorial and the histogram for one of the columns was like thi...

Log transform is one of the most popular transformation techniques applied to skewed data whose distribution isn't Gaussian, in a bid to make it closer to one.

Aside from transforming skewed data to approximate normality, there are other reasons why a log transform is useful:

  • Reducing the impact of outliers
  • Linearizing relationships between variables
  • Stabilizing variance in heteroscedastic data
  • Simplifying complex relationships
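A minimal sketch of the skew-reduction point, assuming a made-up log-normal column (the kind of shape prices or incomes often have) and a hand-rolled `skewness` helper. None of this is from the tutorial, it just illustrates the effect:

```python
import math
import random


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical right-skewed column: strictly positive log-normal values.
column = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
# Plain log works here because every value is > 0; math.log1p is a common
# alternative when the column contains zeros.
logged = [math.log(v) for v in column]

skew_before = skewness(column)
skew_after = skewness(logged)
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

Whether the transform is worth applying still depends on the model used later: tree-based models mostly don't care, while linear models and distance-based methods tend to benefit.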
devout fossil
odd meteor
devout fossil
#

Thanks!

odd meteor
long canopy
#

adam optimizer is still SOTA?

#

is there any reason to use something else?

buoyant steppe
#

I'm trying to forecast a signal that has the same pattern every day using a SARIMA model, but the forecast ends up close to zero. Does anyone know how I can improve the forecasting result?

left tartan
arctic wedgeBOT
#

:x: failed to apply.

final kiln
#

Tbh, I think Llama 2 7B is GPT-3.5 level. Can talk and do basic tasks if well instructed, but also not very intelligent and will easily misunderstand context

#

It can also run on the CPU, not that slow

#

I'm running a discord bot on a private server to test it out during the day

final kiln
long canopy
long canopy
#

in the image below, z_j^l is the pre-activation (weighted input) of the jth neuron in layer l, C is the cost function, and delta is the neuron's error

#

is there a case where delta is NOT defined this way?

final kiln
#

The error would be the cost function itself

#

In gradient descent you use the gradient to ascertain the direction towards a minimum

#

But like, it doesn't really give any info about how much distance to travel

long canopy
#

ah right, that's the l,j parameter's net instantaneous contribution to the error

final kiln
final kiln
latent radish
#

just a meme

#

like im amazed

#

"Wooooow"

final kiln
#

Oh. I'm finding it hard to interpret the facial expression

final kiln
#

So like if increasing z, also increases the error, then what you want is to decrease z

And if increasing z decreases the error, then what you want is to increase z

The full gradient gives you the relative values between the changes in the various z's so that the resulting vector points in the direction of greatest ascent.

This is why force of gravity = - gradient ( gravitational potential )
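A small sketch of the sign rule above, assuming a made-up two-variable cost with a known minimum at (3, -1). The `cost`, `gradient` and `learning_rate` names are illustrative only:

```python
def cost(z):
    """Toy cost with its minimum at z = (3, -1)."""
    return (z[0] - 3.0) ** 2 + (z[1] + 1.0) ** 2


def gradient(z):
    """Partial derivatives dC/dz_j of the toy cost above."""
    return [2.0 * (z[0] - 3.0), 2.0 * (z[1] + 1.0)]


z = [0.0, 0.0]
learning_rate = 0.1  # the gradient only gives a direction; the step size is our choice
for _ in range(100):
    grad = gradient(z)
    # Positive dC/dz_j means increasing z_j raises the cost, so z_j is decreased
    # (and the other way round), hence the minus sign.
    z = [z_j - learning_rate * g_j for z_j, g_j in zip(z, grad)]

print(z, cost(z))
```

The minus sign in the update is the same one as in force = - gradient(potential): you move against the direction of steepest ascent.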

long canopy
#

pytorch is computationally optimized right? i.e., it's as fast as if we do a manual implementation in numpy right?

buoyant vine
#

it is probably going to be faster

#

Numpy is a good tool

#

but it is not specialised for the things PyTorch can do

hollow mortar
long canopy
#

yeah pytorch has the cuda stuff

#

probably is superior to numpy after a certain size

hollow mortar
hollow mortar
#

there are more than this list

long canopy
#

need to choose a compute renting platform

#

what do you peeps like?

final kiln
#

I'm using AWS cuz I have a lot of free credit. Otherwise I think I'd use vast.ai, seems to have the best prices

final kiln
final kiln
#

If you use it try using Spot, which will give you the best pricing in exchange for letting them take the machine away with a 2min warning

long canopy
#

oh nice

#

hm, might make sense to learn to use their API to quickly make VMs and deactivate them right?

#

get some MLOps skills on the way

final kiln
#

I think there are clients that can be used for all cloud providers, which can help minimize vendor lock-in

long canopy
#

skypilot

final kiln
#

Perhaps, terraform falls in this category I think

#

Haven't used it yet

#

My MLOps setup is a GitHub Actions workflow that brings up a spot instance and does a Prefect deployment. That deployment is itself a pipeline that coordinates a Rust binary that performs the actual training and a Python task that fetches data from a hosted Parquet file, processes it and gives it to the Rust program

astral niche
#

Hey I'm having trouble with color detection for OpenCV regarding HSV values, and was wondering if I could get some help on it

final kiln
#

They'll also register data to be read by MLFlow

#

And finally ofc, I'll eventually have the attention mechanisms coded in custom CUDA kernels

#

So that I can benchmark their performance

long canopy
final kiln
long canopy
final kiln
#

MLOps you have automation

quick sandal
#

pyvista + trimesh + some data =

#

wip

unique summit
#

Any ML/Vision genius have a quick 10-15 mins to chat about drawing recognition

buoyant steppe
#

Any time series ML models expert can have 10 mins of quick discussion about model performance?

dull idol
#

It works as it said, plain python file.
each cell is just a method with its dependencies specified, very readable.

#

vim keybindings work out of the box too!!

dusty forge
dull idol
#

it offers the same interactivity as jupyter, but underneath it's just a python file

#

on jupyter you can run any cell anytime you want, it's good for prototyping
but it's a pain when you have to figure out in what order the cells need to run to make the code work

#

on marimo, it figures out each cell's dependencies by itself

gloomy anvil
#

sounds cool. It shouldn't be that hard in jupyter though, it's supposed to run in order 🙂
And you can use jupytext to have jupyter notebooks as plain text python (just how vscode also does it)

long canopy
#

what do you guys use for hyperparam tuning?

odd meteor
odd meteor
long canopy
final kiln
#

I use the pipeline itself, which sends the data to MLFlow

long canopy
final kiln
long canopy
#

oh I see

long canopy
final kiln
#

In case of GitHub actions, you can write matrices and it computes the various possible combinations
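Roughly what a matrix strategy does, sketched in plain Python. The hyperparameter names and values are made up, and this only mimics the expansion rather than being how GitHub implements it:

```python
from itertools import product

# Hypothetical hyperparameter axes, analogous to the keys under `strategy.matrix`.
matrix = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [32, 64],
    "optimizer": ["adam", "sgd"],
}

# One job per combination: the Cartesian product of all axes.
jobs = [dict(zip(matrix, values)) for values in product(*matrix.values())]

for job in jobs:
    print(job)
```

Each dict would correspond to one CI job, with its values passed to the training script (via env vars, for example).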

#

In case of prefect, it's a bit more manual but the UI is good enough because the form caches the values from the previous run

#

I'm still seeing how this will go down tho, I might need to write a script or code it to behave like GitHub matrices

#

But I don't think a script will be needed

long canopy
final kiln
final kiln
#

I've been exploring a lot of things, trying to find the best ways of doing this or that

#

it's a lot of work overall, but I'm hoping that once I get to a good pattern I'll never have to have so much work with it

#

I'll just keep some boilerplate code on some repo so I can reuse it

long canopy
#

yeah it would be nice if there were some mlops libraries that would help out w/ this stuff

final kiln
#

I think github is in a very good position to solve many of these problems, at least from where I stand

#

if they enabled gpu runners for github actions, with competitive pricing to aws

#

and with drivers ready and all that good stuff

buoyant vine
#

but Azure already has some pretty good CI/CD integration for GPU runners

#

For AWS runners, idk if you already use it. But CML makes it like a 1 line setup to spin up instances on AWS and attach them as runners automatically

#

and can do things like reuse idle instances, etc...

final kiln
#

Seems like it

buoyant vine
#

yeah

final kiln
buoyant vine
#
 deploy-runner:
    runs-on: ubuntu-latest
    if: "!startsWith(github.event.head_commit.message, '[skip')"
    steps:
      - uses: iterative/setup-cml@v2
      - uses: actions/checkout@v3
      - name: Deploy runner on EC2
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.CRAWLER_GH_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_CML_REGION: ${{ env.AWS_CML_REGION }}

        run: |
          cml runner launch \
              --repo=https://github.com/my/repo \
              --cloud=aws \
              --cloud-region=$AWS_CML_REGION \
              --cloud-type=g5.8xlarge \
              --cloud-hdd-size=100 \
              --labels=cml-runner-large \
              --reuse-idle \
              --idle-timeout=1200

  model-training:
    if: "!startsWith(github.event.head_commit.message, '[skip')"
    timeout-minutes: 5000
    needs: [deploy-runner]
    runs-on: [self-hosted, cml-runner-large]

Example CI config

#

cut out the company specific stuff

#

but that is about all you need to do really, and then just tell CI to run what ever script starts the training

final kiln
final kiln
# buoyant vine but that is about all you need to do really, and then just tell CI to run what e...

Are you running the training thing directly or are you deploying something ?

My previous setup was a self hosted runner in which I passed the training parameters via env, which I would get by defining a matrix strategy.

It was great, except that it took a long time to start the self hosted runner and that slowed me down quite a lot.

So now I'm deploying a prefect thing, which exposes a UI that I can use to start runs dynamically. I can set the training parameters there and everything will get recorded on MLFlow.

#

I also couldn't selectively stop runs. Or start new ones. Apart from the very good version control (hyper parameter search strategy got written directly in the commit), it was very clumsy and slow

final kiln
#

Prefect is promising to solve all my problems tho.

#

They have a managed solution

#

I can create work pools, turn them on and off, once deployed they know to reconstruct the pipeline from the correct commit

#

And it seems to support spot

#

And handles the fault tolerance, tho not sure if it will work out neatly when GPU is involved

#

So like, I'm just gonna have a CD pipeline on GitHub, and a bunch of deployments that I can trigger manually on the UI

#

Input is easily written via pydantic models

#

If the spot work pools work as I imagine it, this is awesome

idle stone
#

are top university college courses on ai pre chatgpt release worth studying or is the material too old to be considered good enough to spend hours on? had someone hook me up with a bunch of old lectures from a few places and im seeing if it would be worth it to study them alongside my current school lectures

surreal sierra
#

Hi does anyone that does data analysis know what the best way to analyse timesheet data would be? Thank you

final kiln
idle stone
#

excellent, i'm so glad to hear this! and if some of the intro lectures are from 2015, would those also still be good or should i find something more up to date?

final kiln
# idle stone excellent, i'm so glad to hear this! and if some of the intro lectures are from ...

2015 is getting to be a bit ancient tbh, but really, it depends on what stuff you're studying. For example the stuff 3blue1brown shows in his ML videos has not changed and I don't think it will

When you're getting to more recent models it's best to check the time when they came out I think. For ex transformers appeared in 2017, so the more mature material will be at least a couple years after that

left tartan
#

Yah, if you take an AI course like I did in grad school (multiple decades ago)... that stuff is terribly outdated. God, what a waste of time.

serene scaffold
left tartan
#

But im not sure if it was irrelevant at the time, or that AI in general was at a real low point in that era.

serene scaffold
left tartan
#

So yah?

#

I think nearly half the course was on rules driven nonsense/expert systems

serene scaffold
#

expert systems sounds outdated even for 2003

left tartan
#

At least from a historical review. Absolutely nonsense, and we knew it

hollow mortar
#

constructing knowledge bases has a bright future

#

it is the replacement of 1d written natural language

#

1d as in text flow in one direction, word, sentence, paragraph, chapter, document

#

and the ability to refer to any point in your knowledge base at any other point, automatically infer differences between viewpoints, so much

#

automatic detection of contradiction in ones logic, detection of self justifying logical systems

#

in addition this improves the agency of human actors instead of constructing things external to conscious bodies which conducts thought

lapis sequoia
#

AI is good and bad at the same time. If citizens can no longer replace governments, won't govs simply become openly dictatorial? Xd

idle stone
raven brook
#

hi

#

format1 = ff.FortranRecordReader('(F8.4,F9.4,ES12.4, F8.4, 1x, 1x, 1x, I3, I3, ES12.4, F8.3, F8.3, 1x, ES12.4, 1x, ES12.4, ES12.4, 1x, ES12.4, ES12.4 1x, ES12.4, 1x, 1x,I3,I2,I3,I2,I3,I2)')
I'm trying to read a fortran formatted output file using this command but I get the following error

#

InvalidFormat:
Token: type=ED7, value=ES has invalid neighbouring token

#

at first I had this error but with value=x, I fixed it by putting a 1 in front of every x. But when I try this fix with ES, I still get the same error

#

anyone know what Im doing wrong?

past meteor
left tartan
#

You know what we need? More rules!

past meteor
#

It has use cases but they're slim

#

I don't think it's healthy to trim down AI to ML and ML to NLP and NLP to LLMs

buoyant vine
#

Tbh, I think to most people AI = ChatGPT rn

idle stone
#

yeah that's a wild reduction

#

yeah 100%

past meteor
#

ML gives approximate answers and the rule based stuff gives you exact answers

#

ML needs data and rule based stuff needs ... rules

buoyant vine
#

if people actually work in the industry/tech then I don't think there really is any bad slimming

#

it is just to people who aren't techy, the only AI they see are the likes of ChatGPT or deep fake Taylor Swift nudes

past meteor
#

I think it holds for AI professionals to a certain extent as well

#

How many people could set up a rule based system when it's appropriate (or recognise that it is for that matter)

left tartan
#

!mute 1181398920358789120 1d this behavior is clearly not appropriate. Read our #rules

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied timeout to @raven brook until <t:1709583921:f> (1 day).

left tartan
dusty valve
#

Too real

versed pilot
final kiln
versed pilot
#

Depends what we mean by AI, things like linear regression and even basic classifiers are ancient

#

And it is worth dipping into the history of computer vision and speech recognition if you are working in those fields, it's a long history

final kiln
abstract wasp
#

Have any of you guys tried to access the Shapenet dataset?

long canopy
#

am currently living the distilbert life

long canopy
#

torch.cat((torch.ones(len(datalist1)), torch.zeros(len(datalist2))), dim=0) is there a nicer way to write this

final kiln
#

Yes, by expanding it out into variables named in a self-documenting manner

#

Probably also by pre-allocating the array with zeros and incrementing the ones you want to be ones
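Both suggestions sketched out, assuming PyTorch is installed. `datalist1` / `datalist2` here are made-up stand-ins for the lists in the question, and the other variable names are just one way to make it self-documenting:

```python
import torch

# Hypothetical stand-ins for the two datasets from the question.
datalist1 = ["pos_a", "pos_b", "pos_c"]
datalist2 = ["neg_a", "neg_b"]

num_positive = len(datalist1)
num_negative = len(datalist2)

# Option 1: name the pieces, then concatenate.
positive_labels = torch.ones(num_positive)
negative_labels = torch.zeros(num_negative)
labels = torch.cat((positive_labels, negative_labels), dim=0)

# Option 2: pre-allocate zeros and set the leading block to one.
labels_prealloc = torch.zeros(num_positive + num_negative)
labels_prealloc[:num_positive] = 1.0

print(labels)
```

Both give the same tensor, so it mostly comes down to which reads better next to the rest of the code.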

viral dagger
#

Guys, cv2.imread is not reading all the png files in a path, what can be the reasons ? specifically stops after reading 446 files

hazy panther
#

I have a server sending objects of data to another server through MQTT.

According to predefined parameters, the data should run through some filters and ml models and return to a user results.

What are the best practices, architectures and technologies/frameworks I should use considering that I use python as a server language.

Example, if I have a data parcel that is 1 second of ECG samples, and pre-defined param states that I want to fetch heart rate results, I should by default filter the ECG samples and reduce noise, and if the parcel doesn't include any R peaks, I should use another filter and set R peaks for this second of data. Only then the ml heart rate model can run. Another parameter is the rate of samples. Some devices have different rates and should be included within the inputs.

This is one example. Consider having multiple filters that are sometimes needed and sometimes not. Consider having some models that require a buffer of 10 seconds of data before they can run and produce results. In my head, it looks a bit like a branch with nodes. If there are resources you recommend, or anything similar worth reviewing, I'll be more than happy to check them out. I just really don't know what to search for

ancient glade
#

I want to study EDA, but I need datasets with some missing values so I can learn how to handle missing data better. But the datasets that I want to use don't have missing values. Is there any efficient way to generate missing values, let's say 12%? It should be random. I tried brute-forcing it, but the dataset has around 40k entries. Pls help

jaunty helm
ancient glade
jaunty helm
jaunty helm
# ancient glade Yeah I wanted some efficient solution, brute forcing is not worth it
>>> import pandas as pd, numpy as np, seaborn as sns
>>> df = sns.load_dataset('iris')
>>> df
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]
>>> rng = np.random.default_rng()
>>> mask = rng.random(size=df.shape) < 0.12
>>> df[mask] = np.nan
>>> df
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          NaN           1.4          0.2        NaN
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2        NaN
3             4.6          NaN           1.5          0.2     setosa
4             5.0          3.6           NaN          NaN     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           NaN          3.0           NaN          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica

[150 rows x 5 columns]
>>>
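The same idea on a larger frame, as a sketch assuming numpy and pandas are installed. The 40k-row frame is synthetic and the seed is only there to make the run repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded only to make the sketch repeatable

# Hypothetical stand-in for a 40k-row dataset with five numeric columns.
df = pd.DataFrame(rng.normal(size=(40_000, 5)), columns=list("abcde"))

# One uniform draw in [0, 1) per cell; a cell is masked when its draw is below 0.12.
mask = rng.random(size=df.shape) < 0.12

# Boolean-mask assignment blanks every selected cell in one vectorized step,
# with no Python-level loop over rows.
df[mask] = np.nan

missing_fraction = df.isna().to_numpy().mean()
print(f"{missing_fraction:.3f}")
```

Each cell is decided independently, so the missing share comes out close to 12% rather than exactly 12%.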
ancient glade
#

Yeah but will it work on a dataset with more than 40k values?

#

I am concerned about the time

jaunty helm
jaunty helm
ancient glade
#

So how exactly does it work? I would like to learn the intricacies of it

jaunty helm
ancient glade
#

I understand the code, I am more interested in how the function actually works on the dataframe. Like what makes it faster, and what mathematics goes on behind it

jaunty helm
ancient glade
#

Okay, thanks

odd mountain
#

I have a neat-python project but my issue is that the agents have the same-ish behaviour with a minority of the population showing variation in behaviour... What am I doing wrong (I know it's not enough info but if someone's willing to help, I'll share repo / provide any more info)

versed dirge
#

Hello all! I had an open question that went unresolved. Trying to get into the transformers library from HuggingFace but I keep hitting a dependency error on distutils. Can anyone in this group tell me what I'm doing wrong? Would love to explore the topic of ML and AI here with everyone but this bump is popping up for some reason; recent install of transformers where the requirement is supposedly not there anymore since distutils has been deprecated?

long canopy
mint shard
#

Guys can someone guide me for creating an ai?

#

dm me

long canopy
#

fml, rocm docker image is frigging huge

long canopy
buoyant vine
#

002_salute Yup, which is why we use onnxruntime aha

#

Nvidia Cuda with pyTorch is also insanely big

buoyant vine
#

onnxruntime can support multiple execution platforms

#

rocm and migraphx are two of them for AMD gpus

#

I think the docker image size with them enabled tends to be < 1GB if I remember right

long canopy
#

woah incredible

buoyant vine
#

does involve converting your model to onnx first though

long canopy
#

any performance loss wrt rocm/cuda?

buoyant vine
#

but it is well supported by pytorch and TF

buoyant vine
#

shouldn't be

long canopy
#

very cool, thanks a lot for the reference

buoyant vine
#

For us we actually get faster times, but that is because onnxruntime has some specialised handling for transformers

#

but yeah, if you are unfamiliar with it, highly recommend

long canopy
#

incredible stuff, tyvm!

surreal sierra
#

Hi can anyone recommend predictive models for timesheet data? I want to separate by employee, the month and the project if this is possible

grand breach
#

what could be said about the prediction bias in random forest & decision trees algorithms on imbalanced dataset ?

serene scaffold
grand breach
#

they perform really well and give high accuracy on imbalanced dataset but that leads to overfitting

final kiln
# long canopy 55 GB lol

It's a nightmare, every time I try to get AMD to do the thing it don't do it and I spend countless hours to then have Nvidia do the thing in less than an hour

grand breach
final kiln
#

Oh I think you can

buoyant vine
#

you can definitely train it

#

What you can do is train with onnx, export the model weights to something like safetensors, then import them back to pytorch if needed

final kiln
#

Might've been something specific to the model I guess

#

I was exploring models from all the frameworks, ended up only being able to do inference with them

long canopy
#

uh is your entire screen supposed to freeze when you run pytorch with cuda?

final kiln
#

Paddle in particular was a bit troublesome, also tf

buoyant vine
#

For real though, if it is using most of your GPU memory, and the actual cores itself, then yeah your display is going to start suffering

#

If your CPU allows it, you may find it temporarily beneficial to switch the display to be rendered with the CPU's inbuilt iGPU

surreal sierra
serene scaffold
mint palm
#

If I make a numpy array somewhere in between, gradient won't flow through it, right?????

long canopy
#

any framework for managing epochs?

#

get me model xy epoch z

#

keep every 5th epoch

spiral whale
#

do u know any background remover model?

#

idk if they are technically called salient object detection

final kiln
final kiln
long canopy
#

heatmap analysis

final kiln
#

It keeps your training loop details hidden from the code you use to send data to your logger

#

A recent one I'm using is to have one process that doesn't care about which epoch it is rn, it just reads the data from the disk and applies gradient descent, this inside a while True

#

And then another that keeps generating data and storing it to the disk

#

This way one process trains, the other pre processes the data, and there's 0 down time for each
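A toy version of that two-worker layout, with several assumptions: threads instead of separate processes, JSON files in a temp dir, a one-weight "model" fitting y = 2x, and a stop signal so the sketch terminates (the real thing would just keep looping):

```python
import json
import os
import tempfile
import threading
import time


def produce_batches(directory, num_batches, done):
    """Producer: keeps writing pre-processed batches to disk."""
    for i in range(num_batches):
        batch = {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}  # toy data for y = 2x
        tmp_path = os.path.join(directory, f"batch_{i}.tmp")
        with open(tmp_path, "w") as f:
            json.dump(batch, f)
        # Rename only once fully written, so the trainer never reads a partial file.
        os.replace(tmp_path, os.path.join(directory, f"batch_{i}.json"))
    done.set()


def train(directory, done, learning_rate=0.01):
    """Trainer: no epoch counter, it just consumes whatever batches are on disk."""
    weight, batches_seen = 0.0, 0
    while True:
        finished = done.is_set()  # read the flag before listing to avoid missing files
        ready = sorted(p for p in os.listdir(directory) if p.endswith(".json"))
        if not ready:
            if finished:
                break
            time.sleep(0.001)
            continue
        for name in ready:
            path = os.path.join(directory, name)
            with open(path) as f:
                batch = json.load(f)
            pairs = list(zip(batch["x"], batch["y"]))
            # One gradient-descent step on mean squared error for y = weight * x.
            grad = sum(2 * (weight * x - y) * x for x, y in pairs) / len(pairs)
            weight -= learning_rate * grad
            os.remove(path)
            batches_seen += 1
    return weight, batches_seen


with tempfile.TemporaryDirectory() as directory:
    done = threading.Event()
    producer = threading.Thread(target=produce_batches, args=(directory, 200, done))
    producer.start()
    weight, batches_seen = train(directory, done)
    producer.join()

print(weight, batches_seen)
```

The write-then-rename step is there so the trainer never picks up a half-written batch, which is the main thing to watch when the disk is the queue.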

long canopy
#

what's the reasoning behind yielding the entire model? seems interesting

#

to work in a REPL-like environment?

final kiln
final kiln
#

Thank you

#

Still looking for the best ways to do stuff

#

Tbh, I think compiled languages will have their place in my workflow.

#

For experimentation, jupyter and python for sure, but for a full implementation I think the compilation step is crucial, there's a ton of useful stuff that can be done there like checking if the matrix dimensions make sense

#

Not even the full code, just the model code, it's super helpful

#

There's this tho

surreal sierra
long canopy
#

distilbert weights

spring field
#

hello people, bit of a conceptual question for my understanding
what are the hidden layers supposed to do? so I understand it's basically linear transformations and sigmoid functions (or some other function like ReLU) all the way through and the linear transformations make sense as a means to go from more inputs to less outputs via those linear transformations, like if you have 3 separate parameters and 1 value as the prediction, you need to at some point or over the course of the hidden layers transform those 3 into 1, alright, so that makes sense and like you then also put them through those sigmoid functions or whatever
but, how do you decide on what transformations you're gonna do exactly? because you could for example expand the inputs via linear transformations and then compress them again to the output dimensions and like how do you figure out that you need to expand the inputs to more dimensions for example?
and more generally, how do you figure out how many hidden layers you need for a particular thing?
I'd also appreciate some resources if you can throw some my way 😁
I'm sort of reading this rn: http://neuralnetworksanddeeplearning.com/
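The "expand, then compress" idea from the question, as a bare-bones forward pass. The layer sizes (3 -> 5 -> 1), the random weights and the input values are arbitrary picks for illustration, not a recommended configuration:

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dense(inputs, weights, biases):
    """One layer: a linear transformation followed by a sigmoid on each output."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Random starting weights; training would adjust these via gradient descent."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
layer_sizes = [3, 5, 1]  # 3 inputs -> expand to 5 hidden units -> compress to 1 output
layers = [
    random_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

activation = [0.2, -0.4, 0.9]  # hypothetical input features
for weights, biases in layers:
    activation = dense(activation, weights, biases)
    print(len(activation), activation)
```

Changing `layer_sizes` is all it takes to try a different shape, which is partly why people end up searching over configurations empirically.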

long canopy
#

we don't have a good understanding of why particular settings work better than others

long canopy
spring field
#

alright, thanks, currently I feel like I have a rough understanding (being at the peak of Dunning-Kruger's effect be like) of like the sort of whole process and how like you try to reach a local minimum and such, I just don't quite get the layer configuration if you can call that, I feel like I could throw something random together and it would work, it just doesn't seem quite right, but yeah, I am practicing this ofc, so ig it's mostly time that will tell 😁

long canopy
#

wrt layer configuration

spring field
#

yeah, that's what I'm currently sort of working with

long canopy
#

but yeah, layer configuration is literal magic atm.; what's worth knowing in these is their different types. convolutional, recurrent, feed forward, attention, etc. etc., and the very general idea of what purpose they tend to work for

spring field
#

but also, I was wondering, not planning to do it right now, but like I see being able to use this to sort of "predict" statistical outcomes of certain parameters by minimizing the loss, but, for example, are these sort of neural networks also used for something like game AI or do they use different models altogether? I just sort of wanted to eventually practice this by implementing a somewhat "sentient" npc in a game, but currently at a bit of a loss for how to even approach it, especially with these linear/sigmoid/relu models

spring field
spring field
long canopy
#

if you're able to reduce your problem to generating a statistical distribution, you'll be able to use ML for it

spring field
#

I see, that makes sense then, it also seems rather complicated, but I guess that's the fun part 😁

long canopy
spring field
#

I suppose I'd have to read some actual research into such uses for AI at that point

#

overall exciting though

long canopy
#

definitely!

leaden rock
#

Gradient

#

Gradient descent?

iron basalt
spring field
spring field
iron basalt
#

(Also made by one of the first computer game developers)

#

(Coined the term "machine learning")

spring field
#

can that even be considered ML? isn't it just follow these rules and these and brute force this stuff... I mean, I guess it could be considered AI (as outlined by one of the pinned messages), but ML? well, maybe, I haven't looked into that work

iron basalt
#

Later some extra heuristics added.

spring field
#

alright

iron basalt
#

(Also this was on the first computer implementation of checkers that he wrote too)

iron basalt
#

You can imagine that a plain old table would not scale well, the memory usage would become ridiculous, and it has no ability to deal with things it has not stored previously (exact same) (if it's not in the table we just go full brute force again (with some heuristics)).

odd meteor
# long canopy get me model xy epoch z

You can use Model Checkpointing to achieve this. You can use it to do stuff like: get me the model weights at epoch 8, or the best model weights after training completes, etc.

#

If you are using Lightning it comes even much easier
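A framework-agnostic sketch of the retention policy ("keep every 5th epoch" plus a single best checkpoint). This is not Lightning's API: the helper names, file layout, and the toy weights and loss are all made up for illustration:

```python
import json
import os
import tempfile


def save_checkpoint(directory, name, epoch, weights, val_loss):
    """Write one checkpoint file; a real setup would store the model's state here."""
    with open(os.path.join(directory, name), "w") as f:
        json.dump({"epoch": epoch, "weights": weights, "val_loss": val_loss}, f)


def load_checkpoint(directory, name):
    with open(os.path.join(directory, name)) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as directory:
    best_loss = float("inf")
    for epoch in range(1, 21):
        weights = [0.1 * epoch]             # stand-in for real model parameters
        val_loss = (epoch - 12) ** 2 / 100  # toy loss that bottoms out at epoch 12
        if epoch % 5 == 0:
            # Periodic snapshot: one file per kept epoch.
            save_checkpoint(directory, f"epoch_{epoch:03d}.json", epoch, weights, val_loss)
        if val_loss < best_loss:
            # Best-so-far snapshot: a single file that gets overwritten.
            best_loss = val_loss
            save_checkpoint(directory, "best.json", epoch, weights, val_loss)

    saved = sorted(os.listdir(directory))
    checkpoint_10 = load_checkpoint(directory, "epoch_010.json")
    best = load_checkpoint(directory, "best.json")

print(saved)
print(checkpoint_10["epoch"], best["epoch"])
```

"Get me model xy epoch z" then becomes a file lookup, as long as epoch z was one of the kept ones.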

long canopy
long canopy
#

looks interesting, thanks a lot!

iron basalt
# spring field making sentient npcs for games?

IDK what sentient means, but for NPCs in a game they tend to be most interesting via emergent behavior caused by lots of simple rules interacting with each other and/or the environment. You could have them be small neural networks, or make use of them in part. Something like a language model to interact with does not really work in practice, it's something a lot people are trying but from a game design POV it does not really make sense unless you just want random stuff / interactions to happen, which makes it less of a game and more of a simulation to mess around in.

spring field
#

so, uhh, sth that would at least to some extent resemble a human (ish) player in a 1st person shooter is how I'd go about describing the result I'm sort of looking to achieve eventually

long canopy
hollow mortar
#

dont know if theyve been mentioned but i remember reading about perceptrons and it seemed to be some theory about the choices in regards to layer configuration

spring field
#

nooo, my hopes and dreamssss... shattered in the snap of the moment

iron basalt
spring field
#

well, hope dies last, so at least I got some of that still left now...

#

I mean, fair, from a game design POV, I can't really imagine anything particularly impressive to come out of this experiment, but I'd like to practice with it that way

long canopy
iron basalt
#

Maybe to act as a shadow-boxing like bot, for high level competitive play practice, but other than that, not sure. When NPCs in a game become too good, or too unpredictable they become impossible to design a game with.

iron basalt
#

Standard RL agent, has been done before, runs on a PC easily.

serene scaffold
#

!otn a this is not a language model it's a shooter

arctic wedgeBOT
#

:ok_hand: Added this-is-not-a-language-model-it’s-a-shooter to the names list.

spring field
#

honestly, I have no issue with it becoming too good, besides I suppose it could be possible to somewhat revert it to previous revisions where it wasn't as good? by using weights and stuff from those previous generations or whatever?

serene scaffold
#

actually I don't like that

#

!otn rm this is not a language model it's a shooter

arctic wedgeBOT
#

:ok_hand: Removed this-is-not-a-language-model-it’s-a-shooter from the names list.

long canopy
#

they did mention sentience

serene scaffold
#

!otn a this isn't a language model it's an fps

arctic wedgeBOT
#

:ok_hand: Added this-isn’t-a-language-model-it’s-an-fps to the names list.

long canopy
iron basalt
spring field
#

do llms use the same concept of input layer + hidden layers + output layer ?

iron basalt
#

Usually this is done with classic game AI, like hierarchical portfolio search or something.

long canopy
iron basalt
#

Technically, HPS can make use of neural networks in it, since it allows mixing of AI methods.

#

And hard coded stuff.

serene scaffold
long canopy
spring field
serene scaffold
iron basalt
iron basalt
#

(Note, it's not "mostly complete" at all)

long canopy
#

is this a poster for ants

spring field
# serene scaffold something being a language model is a description of what it does, not how it's ...

alright, but like, is all machine learning/ai/whatever other buzzwords pretty much using this idea of input layers + hidden layers + output layers as the sort of base for everything? sure it can be cyclic somewhere along the hidden layers and such, but can it be something other than this (and possibly rules, so like, obviously it could be like a couple if statements here and there and such, but yeah)

long canopy
#

yes that's all it is

iron basalt
spring field
long canopy
hollow mortar
#

yep so regarding the question about choosing hidden layers, in classifiers, there is theory about how the neurons relate to the ability to properly categorise input data

#

the ability of a multilayer perceptron to draw boundaries may be a good search term
however this is something i just looked into so i can't provide info

long canopy
#

we don't have the results yet

#

i have been doing a literature review (of abstracts, like a real researcher)

#

there are no results

#

we have no clue

serene scaffold
long canopy
#

the statement has been accepted

spring field
#

cool and I thought people had a lot of this figured out already, guess I couldn't have been further from the truth, lol
it is pretty nice to be learning about this also because it's sort of demystifying the concept of ai for me
thanks for the valuable information everyone, I'll probably hang around here more often from now on 😁

hollow mortar
#

this was also in reference to how to pick the number of layers and number of neurons

long canopy
#

randomized grid search is the only way

#

only the RNGods will save us

iron basalt
hollow mortar
#

i feel like there are better deterministic optimisation techniques out there which dont exist yet, in an attempt to slay the rng gods :D

long canopy
#

if you find something tag me lol

hollow mortar
#

but some ppl dont think so, i was having a whole thing in the emacs server

#

lol u2

hollow mortar
#

i have a non linear optimiser without constraints that doesnt work for all cases i made, but not in researchy enough circles to know where ( it/ the approach) sits on the scale of ( easy peasy did it at age 10 -> meh -> cool and fresh)

long canopy
#

is it just a general nonlinear optimizer?

hollow mortar
#

its not really neural net pilled, yh just general

#

but it would seem ( without exploration) that general opt could be applied to some nn

long canopy
hollow mortar
long canopy
hollow mortar
#

ok to picture a nn in typical human optimisation form there is some function which takes all weights of the net as input and produces some single real num which represents the error for that training data
and that can be minimised
no constraints
is this linear could it be done w simplex
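A quick numerical check on the linearity question, assuming the smallest possible "network" (one sigmoid neuron, one made-up training sample). For a linear (affine) objective, the error at the midpoint of two weight vectors would equal the average of the errors at the two ends:

```python
import math


def error(weights):
    """Squared error of a one-neuron network sigmoid(w * x + b) on a single sample."""
    w, b = weights
    x, target = 1.5, 1.0  # hypothetical training sample
    prediction = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return (prediction - target) ** 2


a = (-2.0, 0.0)
b = (2.0, 0.0)
midpoint = tuple((p + q) / 2 for p, q in zip(a, b))

# For a linear (affine) objective these two numbers would be equal.
at_midpoint = error(midpoint)
average_of_ends = (error(a) + error(b)) / 2
print(at_midpoint, average_of_ends)
```

The two numbers differ, so the error is not linear in the weights once a nonlinearity is involved, and simplex (which solves linear programs) does not apply directly. That is a big part of why gradient-based methods are the default here.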

#

idk i sleepy

#

gnight

dusty valve