#data-science-and-ml | Python | Page 107

lapis sequoia Feb 25, 2024, 9:13 PM

#

final kiln Feb 25, 2024, 9:13 PM

#

Yeah sometimes that happens because the lib code is not properly typed

#

But you can still do the help thing

lapis sequoia Feb 25, 2024, 9:18 PM

#

final kiln But you can still do the help thing

hmm, that's interesting. How would you find log-probabilities of normal distributions, which have domain (-inf, inf)?

final kiln Feb 25, 2024, 9:20 PM

#

lapis sequoia hmm, that's interesting. How would you find log-probabilities of normal distribu...

The domain has nothing to do with it, you have to look at the target. The y axis in the normal distribution represents probability density, so you'd have to apply log to that somehow

#

log(X) where X ~ N

#

My starting point would be here maybe, log(p(x)dx), find some way to make this make sense

lapis sequoia Feb 25, 2024, 9:23 PM

#

final kiln The domain has nothing to do with it, you have to look at the target. The y axis...

you mean log-normal distribution?

final kiln Feb 25, 2024, 9:24 PM

#

lapis sequoia you mean log-normal distribution?

Yes that seems to be it

#

Tho a bit of a boring proof compared to going for the intuitive picture of infinitesimals

#

No wait, here ln(x) ~ N right

lapis sequoia Feb 25, 2024, 9:26 PM

#

final kiln Yes that seems to be it

i'm trying to apply techniques from discrete action space for actor-critic to continuous action space and it's not working

final kiln Feb 25, 2024, 9:26 PM

#

lapis sequoia i'm trying to apply techniques from discrete action space for actor-critic to co...

Idk anything about that tho

lapis sequoia Feb 25, 2024, 9:27 PM

#

final kiln No wait, here ln(x) ~ N right

no, it's log(X) where X ~ N, because we want X to be bounded by (-inf, 0), not (0, 1)

final kiln Feb 25, 2024, 9:27 PM

#

If you want to interpret your array as a set of probabilities people usually apply softmax, but honestly idk your context in any way

final kiln Feb 25, 2024, 9:29 PM

#

lapis sequoia no, it's log(X) where X ~ N, because we want X to be bounded by (-inf, 0), not (...

I'm confused, above the log normal is log(X) ~ N, what you seem to be looking for is log(X) where X ~ N, but that doesn't put any bound on X, it spans the real line still

#

But I reckon the derivation is super similar

lapis sequoia Feb 25, 2024, 9:33 PM

#

final kiln I'm confused, above the log normal is log(X) ~ N, what you seem to be looking fo...

lmao forgot log of normal distribution is actually a straight line

final kiln Feb 25, 2024, 9:33 PM

#

lapis sequoia lmao forgot log of normal distribution is actually a straight line

That can't be tho

#

The area under the curve must sum up to 1

#

And any individual p(x)dx is a probability too, so it must be above or eq to 0

#

That is you can't have negative numbers

final kiln Feb 25, 2024, 9:34 PM

#

lapis sequoia lmao forgot log of normal distribution is actually a straight line

Oh sorry you said log of

#

Ah I got it, I think you have to do like

lapis sequoia Feb 25, 2024, 9:37 PM

#

final kiln Oh sorry you said *log of*

straight line makes sense, because normal distribution tends to 0 at both -inf and +inf, but straight line tends to -inf and +inf respectively

final kiln Feb 25, 2024, 9:37 PM

#

Integral of A log(p(x))dx = 1

#

And find A = 1 / that integral

#

Like just normalize it

#

This the only way that makes sense, because you can't really do log(X)

lapis sequoia Feb 25, 2024, 9:38 PM

#

final kiln Integral of A log(p(x))dx = 1

this is just for taking log of probability which improves numerical precision

lapis sequoia Feb 25, 2024, 9:38 PM

#

final kiln This the only way that makes sense, because you can't really do log(X)

I don't need it to be a distribution, you can't normalise a straight line anyway

final kiln Feb 25, 2024, 9:39 PM

#

lapis sequoia this is just for taking log of probability which improves numerical precision

Ah, in that case you need to take it to the positive reals first

#

Maybe exp(x)

#

Or add a displacement in case X is bounded

final kiln Feb 25, 2024, 9:40 PM

#

lapis sequoia this is just for taking log of probability which improves numerical precision

But again, that's totally not a probability

#

Since prob is not negative

lapis sequoia Feb 25, 2024, 9:41 PM

#

final kiln But again, that's totally not a probability

I can do log(X)^2

lapis sequoia Feb 25, 2024, 9:42 PM

#

final kiln But again, that's totally not a probability

or log(abs(X)) where X ~ N

final kiln Feb 25, 2024, 9:42 PM

#

lapis sequoia I can do log(X)^2

You can't because X can be negative

final kiln Feb 25, 2024, 9:42 PM

#

lapis sequoia or log(abs(X)) where X ~ N

You are just transforming the real line, you can even omit X~N

#

X is not a probability

lapis sequoia Feb 25, 2024, 9:43 PM

#

final kiln You are just transforming the real line, you can even omit X~N

I think taking log probability of normal distribution makes no sense. It only works with categorical actions and not continuous actions

final kiln Feb 25, 2024, 9:44 PM

#

lapis sequoia I think taking log probability of normal distribution makes no sense. It only wo...

I mean, that's what you asked for right

final kiln Feb 25, 2024, 9:44 PM

#

lapis sequoia hmm, that's interesting. How would you find log-probabilities of normal distribu...

Here

lapis sequoia Feb 25, 2024, 9:44 PM

#

final kiln I mean, that's what you asked for right

im very new to RL, thanks for your help

final kiln Feb 25, 2024, 9:45 PM

#

lapis sequoia im very new to RL, thanks for your help

I don't know anything about it, otherwise I could've helped more

#

Been getting enough fun with supervised learning

lapis sequoia Feb 25, 2024, 9:46 PM

#

final kiln Been getting enough fun with supervised learning

i dont even know what it means

lapis sequoia Feb 25, 2024, 9:46 PM

#

final kiln Been getting enough fun with supervised learning

do you manually change hyperparameters?

final kiln Feb 25, 2024, 9:46 PM

#

lapis sequoia i dont even know what it means

Just means labeled data

final kiln Feb 25, 2024, 9:46 PM

#

lapis sequoia do you manually change hyperparameters?

Yeah

#

Gotta do a grid search, I'm even doing a whole pipeline for it
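
A bare-bones version of that kind of grid search, assuming a hypothetical `evaluate` function that returns a validation score for one configuration. Here it is a stand-in formula so the sketch runs on its own; in practice it would train and score a model, and the parameter names and values are made up:

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
param_grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32],
}

def evaluate(learning_rate: float, batch_size: int) -> float:
    """Placeholder score; replace with real training + validation."""
    return -abs(learning_rate - 1e-3) - 0.0001 * abs(batch_size - 32)

best_score, best_params = float("-inf"), None
# product() enumerates every combination of the listed values.
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

The cost grows multiplicatively with each added hyperparameter, which is usually the reason people end up building a pipeline around it.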

lapis sequoia Feb 25, 2024, 9:47 PM

#

final kiln Yeah

do you use tensorflow?

final kiln Feb 25, 2024, 9:47 PM

#

lapis sequoia do you use tensorflow?

I prefer pytorch, recently started using the rust bindings

lapis sequoia Feb 25, 2024, 9:49 PM

#

final kiln I prefer pytorch, recently started using the rust bindings

I know about 1% about pytorch compared to tensorflow. will reverse engineer a ready continuous action space from someone else, should find the solution

final kiln Feb 25, 2024, 9:50 PM

#

lapis sequoia I know about 1% about pytorch compared to tensorflow. will reverse engineer a re...

They are fairly similar I think, they all are really.

Be sure to check with papers with code too. I recently spent like 2 weeks training a model that was already at maximum performance since the start >.>

#

Like always check what other people are getting for that dataset and how and if your results make sense in that context

#

I was getting 56% accuracy for a dataset where the best BERT gets 65%, given that my models are way smaller and likely much less optimized, it's a good result

#

And the way BERT got there was by pre training it on next token prediction on the usual massive amounts of text

tidal bough Feb 25, 2024, 10:06 PM

#

lapis sequoia lmao forgot log of normal distribution is actually a straight line

you misplaced the 2, making it a multiplier instead of a power.
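
For reference, a minimal sketch of the log-density being discussed, assuming a univariate Gaussian with mean `mu` and standard deviation `sigma` (the function name `normal_log_prob` is just illustrative). The log of the normal pdf is quadratic in x, so it is a downward parabola rather than a straight line. This is also the quantity a continuous-action policy would typically use in place of the log-probability of a categorical action:

```python
import math

def normal_log_prob(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Log of the Gaussian density N(x; mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

# Equal steps in x give growing drops in log p(x): a parabola, not a line.
for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(normal_log_prob(x), 4))
```

Note that this is a log-density, not the log of a probability, so it can be positive when `sigma` is small.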

long canopy Feb 26, 2024, 12:10 AM

#

are literally all non-mamba leaders in the open source LLM leaderboards some variation on transformers?

long canopy Feb 26, 2024, 4:39 AM

#

do ALL loss functions for transformers work in the following way? the transformer produces a probability distribution of the most likely next token; then a loss function scores the probability assigned to the actual next token? the aim then is that, in the following step, a better probability is assigned to the actual next token?

agile cobalt Feb 26, 2024, 5:02 AM

#

It's pretty hard to make statements about "ALL" things in any research field, especially one that's moving/evolving pretty fast

Even if it boils down to that, the loss function which "scores the probability assigned to the actual next token" part can be extremely complicated thanks to reinforcement learning from human feedback (RLHF) / Proximal Policy Optimization (PPO)

long canopy Feb 26, 2024, 5:04 AM

#

agile cobalt It's pretty hard to make statements about "ALL" things in any research field, sp...

will look into those keywords, thanks!

agile cobalt Feb 26, 2024, 5:04 AM

#

even further down the rabbit hole: https://huggingface.co/blog/pref-tuning
(and not gonna lie, I don't understand much of that either)

remote stream Feb 26, 2024, 6:41 AM

#

Guys i got a question

#

they have given me a dataset (generated or something), when i check profession and age
average age of students seems to be 41

#

they want me to build an application with target audience regarding time management in social media

#

so do I consider the target audience?

remote stream Feb 26, 2024, 8:11 AM

#

hello?

final kiln Feb 26, 2024, 8:12 AM

#

long canopy do ALL loss functions for transformers work in the following way? the transforme...

I've seen two major setups for training them, in case of decoder architectures (single branch with causal self attention, i.e. zeroing of the self attention scores of future words), you feed the x[i:i+n] and then expect it to reproduce x[i+1:i+1+n], so it's like you're asking it to reproduce the input except for the first word and to guess the last word. So it's actually not just doing next token prediction, it's also doing transcription.
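
A minimal sketch of that shift, using a made-up list of token ids (`tokens`, `n`, and the toy probabilities are illustrative assumptions, not taken from any real model). Because of the causal mask, each position only sees the tokens up to itself, so every position is effectively its own next-token prediction and the loss is averaged over all of them:

```python
import math

tokens = [11, 42, 7, 99, 3, 18]  # hypothetical token ids
i, n = 0, 5

inputs = tokens[i : i + n]           # x[i:i+n]
targets = tokens[i + 1 : i + 1 + n]  # x[i+1:i+1+n], same length, shifted by one

def next_token_loss(probs_for_target):
    """Mean cross-entropy given the probability assigned to each true next token."""
    return -sum(math.log(p) for p in probs_for_target) / len(probs_for_target)

# Made-up probabilities the model assigned to targets[t] at each position t.
toy_probs = [0.5, 0.25, 0.9, 0.1, 0.6]
print(inputs, targets, next_token_loss(toy_probs))
```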

#

The second major setup I've seen is that for encoder architectures, single branch with no masking of attention scores. In this case you take the input, say x, and you substitute tokens at random and assign them a special token and then expect the transformer to reproduce the entire input but substitute back the tokens by means of guessing.
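
A simplified sketch of that masking step, assuming whitespace-split words as tokens and a placeholder `[MASK]` symbol (`mask_tokens` is a hypothetical helper). As far as I know, BERT selects about 15% of positions, sometimes substitutes a random or unchanged token instead of the mask, and computes the loss only at the selected positions; this sketch keeps just the basic idea:

```python
import random

MASK = "[MASK]"  # placeholder symbol; real tokenizers define their own

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_input, labels); labels are None where nothing was masked."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model is scored on recovering this token
        else:
            masked.append(tok)
            labels.append(None)  # unmasked positions do not contribute to the loss
    return masked, labels

sentence = "the cat sat on the mat".split()
print(mask_tokens(sentence, mask_prob=0.3))
```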

final kiln Feb 26, 2024, 8:13 AM

#

final kiln I've seen two major setups for training them, in case of decoder architectures (...

This is how GPTs are trained

final kiln Feb 26, 2024, 8:14 AM

#

final kiln The second major setup I've seen is that for encoder architectures, single branc...

This is how BERT is trained

wispy pulsar Feb 26, 2024, 8:28 AM

#

Guys, I coded a virtual Hadron Collider also known as a Monte Carlos, can anyone help me with something?

final kiln Feb 26, 2024, 8:30 AM

#

wispy pulsar Guys, I coded a virtual Hadron Collider also known as a Monte Carlos, can anyone...

woah, I've also coded something similar, what do you need help with ?

#

did you use geant4 ?

wispy pulsar Feb 26, 2024, 8:30 AM

#

The collected data, I don't know what to do with it, and I'm using Python

final kiln Feb 26, 2024, 8:31 AM

#

wispy pulsar The collected data, I don't know what to do with it, and I'm using Python

what did you simulate exactly, and how did you set it up ?

wispy pulsar Feb 26, 2024, 8:31 AM

#

3163906 different interactions of all particles

final kiln Feb 26, 2024, 8:32 AM

#

wispy pulsar 3163906 different interactions of all particles

right, which particles, which transport code ?

wispy pulsar Feb 26, 2024, 8:32 AM

#

Um.

#

I was studying physics and connected GR and QM, and connected the strong force to the table, and was able to generate the results based off that.

final kiln Feb 26, 2024, 8:33 AM

#

wispy pulsar I was studying physics and connected GR and QM, and connected the strong force t...

can you show me the code that you wrote, maybe it's easier for me to understand what you did

wispy pulsar Feb 26, 2024, 8:33 AM

#

I can't provide it because of the implications of this all..

#

I can't trust anyone but myself with the information as of yet

final kiln Feb 26, 2024, 8:34 AM

#

GR is usually not involved in these simulations though, as their mass is actually very small and we're talking at very small scales

wispy pulsar Feb 26, 2024, 8:34 AM

#

Yes, it's connecting GR and QM

final kiln Feb 26, 2024, 8:34 AM

#

wispy pulsar I can't provide it because of the implications of this all..

then I cannot help

wispy pulsar Feb 26, 2024, 8:34 AM

#

So even on the quantum scale it still applies

final kiln Feb 26, 2024, 8:35 AM

#

wispy pulsar Yes, it's connecting GR and QM

there's actually no connecting to be done between QM and GR, QM is a mathematical formalism that you apply to classical theories, it's more of a framework, there's first and second quantizations. Applying these formalisms to GR has been unsuccessful as far as I understand

wispy pulsar Feb 26, 2024, 8:36 AM

#

I did a lot of research in to it, trust me. The connection became obvious after a while.

#

I have a ton of calculations in regards to all of that nonsense

#

But I'm here now trying to make sense of the data I've gathered

final kiln Feb 26, 2024, 8:38 AM

#

wispy pulsar I did a lot of research in to it, trust me. The connection became obvious after ...

my understanding is that the more recent developments point to spacetime as being a quantum object already, like with ER=EPR, which would explain why applying quantization to it doesn't work

wispy pulsar Feb 26, 2024, 8:38 AM

#

I solved the connection differently.

#

Using logic

final kiln Feb 26, 2024, 8:39 AM

#

that's super suss tho

wispy pulsar Feb 26, 2024, 8:39 AM

#

You may think it, but it's what my research led to.

#

All the calculations, figuring out how things actually worked compared to what we think we know

#

And figuring out the connection between the two

final kiln Feb 26, 2024, 8:39 AM

#

we don't think we know anything though, we have data and a bunch of explanatory models that fit to that data

wispy pulsar Feb 26, 2024, 8:40 AM

#

Ask yourself this question

#

If you were to place two objects with the same mass in a vacuum, no matter how far apart they were from each other, how fast do you think their velocity would get?

final kiln Feb 26, 2024, 8:41 AM

#

early proponents of QM didn't even think of QM as describing reality, they believed QM to model our data only, which explains why it's so weird, it's more about our ignorance of the world than about the world itself

wispy pulsar Feb 26, 2024, 8:41 AM

#

I know it's about the ignorance, I solved for the ignorance.

final kiln Feb 26, 2024, 8:42 AM

#

wispy pulsar If you were to place two objects with the same mass in a vacuum, no matter how f...

I'd just use a bare bones newtons law of gravity for that

wispy pulsar Feb 26, 2024, 8:42 AM

#

final kiln I'd just use a bare bones newtons law of gravity for that

Yeah, I did too, and did more, and more, and more, and it always still broke the speed of light.

#

Now will you help me with this process? lol

final kiln Feb 26, 2024, 8:43 AM

#

no I can't help you with your data since I don't know how you produced it

wispy pulsar Feb 26, 2024, 8:43 AM

#

It's not about how it was produced, but how the current data can be observed.

final kiln Feb 26, 2024, 8:43 AM

#

wispy pulsar Yeah, I did too, and did more, and more, and more, and it always still broke the...

that tends to happen if you don't use relativity

wispy pulsar Feb 26, 2024, 8:44 AM

#

final kiln that tends to happen if you don't use relativity

I ended up using everything in relativity in the end and it still broke the lorentz factor, so...

final kiln Feb 26, 2024, 8:44 AM

#

wispy pulsar The collected data, I don't know what to do with it, and I'm using Python

I cannot answer this question without details about the simulation

#

or what you're looking for

#

usually from monte carlo simulations you get a bunch of particle tracks

#

along with their associated energy losses at each interaction with the medium

wispy pulsar Feb 26, 2024, 8:45 AM

#

This is one of the results from one of the interactions

final kiln Feb 26, 2024, 8:45 AM

#

which you can use to do lots of things

final kiln Feb 26, 2024, 8:46 AM

#

wispy pulsar This is one of the results from one of the interactions

again it's meaningless to me without knowing what you did

#

it's like me saying I simulated the japanese economy using language models, but then I don't tell you how I did it and show you a random graph

wispy pulsar Feb 26, 2024, 8:47 AM

#

What would you do in my position then with the implications of it all?

#

I can't trust anyone but myself I feel like.

final kiln Feb 26, 2024, 8:47 AM

#

wispy pulsar What would you do in my position then with the implications of it all?

if you have something useful for the scientific community I'd write a paper on it and submit it to peer review

#

I'd also open source the code

wispy pulsar Feb 26, 2024, 8:48 AM

#

This is the answer to everything.

#

I don't know if everyone should be open to such a thing until more is known

final kiln Feb 26, 2024, 8:48 AM

#

unless it can be used for nefarious purposes, in that case I'd be careful not to release something that produces too accurate results and place it behind a form or something

wispy pulsar Feb 26, 2024, 8:49 AM

#

That's what I'm saying, you could create the biggest explosion ever with this information.

#

You can create cures for diseases, create diseases, not even sure of the implications at the end of it all.

#

All I know is it has to be kept close to chest

final kiln Feb 26, 2024, 8:50 AM

#

sure, but they're empty claims until they've gone through the very rigorous peer review process you usually see in particle physics

wispy pulsar Feb 26, 2024, 8:51 AM

#

I was solving for the The Equation of Almost Everything and connected it mathematically, it wasn't just a logic step.

final kiln Feb 26, 2024, 8:51 AM

#

final kiln unless it can be used for nefarious purposes, in that case I'd be careful not to...

they do this with Llama for example

wispy pulsar Feb 26, 2024, 8:51 AM

#

It's how I got these values in the first place was because of that equation.

final kiln Feb 26, 2024, 8:51 AM

#

like you can do anything with mathematics, it's like when you get super good with python and you can just do wtv

#

doesn't mean wtv is good, most times it's actually kinda bad

#

what truly matters is putting your model against experimental data

wispy pulsar Feb 26, 2024, 8:52 AM

#

What I am saying is the results are accurate based on the results of the data.

final kiln Feb 26, 2024, 8:52 AM

#

which data exactly ?

wispy pulsar Feb 26, 2024, 8:52 AM

#

All the interactions between the different particles

final kiln Feb 26, 2024, 8:52 AM

#

in your simulation ?

wispy pulsar Feb 26, 2024, 8:52 AM

#

A Monte Carlo

final kiln Feb 26, 2024, 8:52 AM

#

that's not experimental data

#

that's using existing models to produce synthetic data

wispy pulsar Feb 26, 2024, 8:53 AM

#

This is off of my previous data, no other models.

final kiln Feb 26, 2024, 8:53 AM

#

and where did that data come from ?

wispy pulsar Feb 26, 2024, 8:54 AM

#

The Equation of Almost Everything

final kiln Feb 26, 2024, 8:54 AM

#

that's just an existing model

#

you need to build a big machine and then collide the particles, you can't just use an existing model

wispy pulsar Feb 26, 2024, 8:54 AM

#

I made improvements on it and calculated data on the different interactions

#

It shows every interaction based off the Monte Carlo

final kiln Feb 26, 2024, 8:55 AM

#

this kind of physics is very stale right now due to hardware limitations, there's not gonna be a new big discovery until they build a larger hadron collider or someone sees something crazy in smaller experiments

wispy pulsar Feb 26, 2024, 8:55 AM

#

I have the data, it doesn't matter.

#

The calculations went through without issues

final kiln Feb 26, 2024, 8:56 AM

#

I can tell from this conversation that you have synthetic data produced from the existing models

wispy pulsar Feb 26, 2024, 8:56 AM

#

Models that I improved on

final kiln Feb 26, 2024, 8:56 AM

#

okay, do you have new predictions ?

wispy pulsar Feb 26, 2024, 8:57 AM

#

That's what I'm trying to figure out.

#

I have all this data but don't know how to read it properly

final kiln Feb 26, 2024, 8:58 AM

#

that strikes me as very odd, it's usually very straightforward to understand the predictions of your own model

wispy pulsar Feb 26, 2024, 8:59 AM

#

The first test was for the frequency and strength of the interaction, then I calculated for the graph above as well as other statistics.

#

But I don't know what it all means

final kiln Feb 26, 2024, 9:00 AM

#

wispy pulsar The first test was for the frequency and strength of the interaction, then I cal...

in MC simulations the stats are already kinda baked into the simulation

wispy pulsar Feb 26, 2024, 9:01 AM

#

MC simulations?

final kiln Feb 26, 2024, 9:01 AM

#

monte carlo simulations of particle transport and interaction with matter

wispy pulsar Feb 26, 2024, 9:02 AM

#

Sounds like that's what I do next.

final kiln Feb 26, 2024, 9:02 AM

#

you take existing models to calculate collision cross sections, which you use as probability distributions of the final variables resulting from each interaction
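
As a toy illustration of that idea (not a real transport code): treat the relative cross sections as weights for picking the interaction type, and sample the distance to the next interaction from an exponential distribution whose rate is the total macroscopic cross section. All numbers and names below are made up for the sketch:

```python
import random

# Hypothetical macroscopic cross sections in 1/cm (made-up values).
CROSS_SECTIONS = {"elastic": 0.30, "inelastic": 0.15, "absorption": 0.05}

def sample_step(rng: random.Random):
    """Sample (distance to next interaction, interaction type)."""
    sigma_total = sum(CROSS_SECTIONS.values())
    distance = rng.expovariate(sigma_total)  # mean free path = 1 / sigma_total
    kinds = list(CROSS_SECTIONS)
    weights = [CROSS_SECTIONS[k] for k in kinds]
    kind = rng.choices(kinds, weights=weights, k=1)[0]
    return distance, kind

rng = random.Random(42)
steps = [sample_step(rng) for _ in range(10_000)]
mean_path = sum(d for d, _ in steps) / len(steps)
frac_elastic = sum(k == "elastic" for _, k in steps) / len(steps)
print(round(mean_path, 3), round(frac_elastic, 3))
```

With these inputs the sampled mean path should land near 1 / 0.5 = 2 cm and the elastic fraction near 0.30 / 0.50 = 0.6, which is the sense in which the statistics are baked into the simulation.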

wispy pulsar Feb 26, 2024, 9:02 AM

#

I need to keep doing my monte carlo progression

#

Thanks for the help.

final kiln Feb 26, 2024, 9:09 AM

#

this reminds me that I have to finish writing a whole thing about this stuff

steady hedge Feb 26, 2024, 11:43 AM

#

Hello everyone

void crescent Feb 26, 2024, 12:11 PM

#

so im running a keras sequential model on colab, and have REALLY low accuracies

#

on top of that, it just ends abruptly at 14 epochs when i clearly specified 20 epochs

#

ive never seen something like this happen

#

my model architecture is:

```python
def create_improved_model():
    model = Sequential()

    model.add(Conv2D(64, kernel_size=3, input_shape=(50, 50, 3), activation="relu"))
    model.add(BatchNormalization())

    model.add(Conv2D(32, kernel_size=3, activation="relu"))
    model.add(BatchNormalization())

    model.add(Dropout(0.25))  # Adjust the dropout rate as needed

    model.add(Flatten())
    model.add(Dense(1, activation="softmax"))

    # Compile the model
    adam = Adam(learning_rate=0.0001)
    model.compile(loss="binary_crossentropy", optimizer=adam, metrics=["accuracy"])

    return model
```

#

can someone tell me why this is happening?
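
One thing that stands out in the model above, offered as a likely cause rather than a confirmed diagnosis: softmax normalizes across the units of a layer, so with `Dense(1, activation="softmax")` the single output is always 1.0 regardless of the input. With `binary_crossentropy`, a single sigmoid unit is the usual choice. The pure-Python sketch below (helper names are illustrative) shows the difference without needing Keras:

```python
import math

def softmax(logits):
    """Softmax over a list of logits (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function, mapping any real logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for logit in (-5.0, 0.0, 5.0):
    # One-unit softmax: exp(z) / exp(z) == 1 for every z.
    print(logit, softmax([logit])[0], round(sigmoid(logit), 4))
```

The stop at epoch 14 can't be diagnosed from the model definition alone; the `fit` call and any callbacks would be needed for that.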

karmic roost Feb 26, 2024, 12:40 PM

#

anyone here have experience with OCR?

#

which model do you use?

potent sky Feb 26, 2024, 12:46 PM

#

The principles proposed in the RCNN paper continue to remain useful
Tho more powerful models must've been developed for each different "module" of the process

#

e.g. ViTs for extracting rich feature map representations

long canopy Feb 26, 2024, 2:26 PM

#

@final kiln ty for comments!

void crescent Feb 26, 2024, 2:31 PM

#

even with an accuracy of 88%, for some reason it always returns the EXACT same prediction EVERY single time for both images


```python
class_1_img = "/content/sample_data/10253_idx5_x501_y351_class1.png"
class_0_img = "/content/breast-hispathology-images/10301/0/10301_idx5_x1001_y1651_class0.png"

# Load and resize the images
img_0 = cv2.imread(class_0_img)
img_0 = cv2.resize(img_0, (50, 50), interpolation=cv2.INTER_LINEAR)

img_1 = cv2.imread(class_1_img)
img_1 = cv2.resize(img_0, (50,50), interpolation=cv2.INTER_LINEAR)

# Ensure the images have the correct shape and type
img_0 = np.expand_dims(img_0, axis=0)  # Add batch dimension
img_0 = img_0.astype('float32') / 255.0  # Normalize pixel values between 0 and 1

img_1 = np.expand_dims(img_1, axis=0)
img_1 = img_1.astype("float32")/ 255.0

# Make predictions
prediction_0 = model.predict(img_0)
prediction_1 = model.predict(img_1)

print(prediction_0[0][0])
print(prediction_1[0][0])
```

#

can someone tell why exactly is this happening
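
A possible explanation, going only by the snippet above: the second resize call passes `img_0` rather than `img_1`, so both predictions are made on the same pixels. (If the model still ends in a one-unit softmax, that would also pin every output at 1.0.) Running each image through one helper avoids that kind of copy-paste slip. A minimal sketch, assuming NumPy is available and using random arrays in place of the real files; `preprocess` is a hypothetical helper, and the `cv2.imread`/`cv2.resize` step is left out so the example runs on its own:

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1] and add a leading batch dimension."""
    return np.expand_dims(img.astype("float32") / 255.0, axis=0)

rng = np.random.default_rng(0)
# Stand-ins for two already-loaded, already-resized 50x50 RGB images.
raw_0 = rng.integers(0, 256, size=(50, 50, 3), dtype=np.uint8)
raw_1 = rng.integers(0, 256, size=(50, 50, 3), dtype=np.uint8)

batch_0 = preprocess(raw_0)
batch_1 = preprocess(raw_1)

print(batch_0.shape, batch_1.shape)
print(np.array_equal(batch_0, batch_1))  # distinct inputs stay distinct
```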

carmine wharf Feb 26, 2024, 2:54 PM

#

Hi everyone, I have been trying to implement U-net model in tensorflow from original paper. In the paper it is said that the network uses "unpadded convolutions". This led to some shape problems in my case. Every implementation that I have seen uses "same" as padding for convolution. I do not understand it. If the paper states "unpadded convolutions", shouldn't one use "valid" as padding?

final kiln Feb 26, 2024, 3:57 PM

#

carmine wharf Hi everyone, I have been trying to implement U-net model in tensorflow from orig...

uhm if you don't pad your convolution kernels the resulting output will be slightly smaller in size

#

it's not really the kernel being padded right

#

it's the input matrix

#

probably a good idea to go over what each of those modes mean, but for the 1d case, which is much easier to follow

carmine wharf Feb 26, 2024, 4:01 PM

#

exactly, if you don't pad, the output will be smaller, which is what they show in the paper, if I understand correctly
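
For the 1D case mentioned above, a small pure-Python sketch of the two modes. The function `conv1d` and its zero-padding scheme are illustrative, written to mirror what "valid" and "same" mean for stride 1 and an odd kernel size:

```python
def conv1d(signal, kernel, padding="valid"):
    """Stride-1 cross-correlation; 'same' zero-pads so output length == input length."""
    k = len(kernel)
    if padding == "same":
        pad = (k - 1) // 2          # assumes an odd kernel size
        signal = [0.0] * pad + list(signal) + [0.0] * pad
    elif padding != "valid":
        raise ValueError(f"unknown padding mode: {padding}")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 1.0, 1.0]

print(conv1d(x, w, "valid"))  # length 5 - 3 + 1 = 3
print(conv1d(x, w, "same"))   # length 5
```

In the original U-Net figure, each unpadded 3x3 convolution trims 2 pixels from the side length (572 -> 570 -> 568), which is why the skip connections there need cropping. With "same" padding the shapes line up without cropping, which may be one practical reason implementations prefer it.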

final kiln Feb 26, 2024, 4:03 PM

#

carmine wharf exactly, if you don't pad, the output will be smaller, which is what they show in t...

if I had to guess, there's likely just not a big performance difference between the modes, the UNET in particular is a very strong and versatile architecture

#

or, maybe the other way around right

#

there's a performance difference that was found in a later paper

#

so every1 does the other way

carmine wharf Feb 26, 2024, 4:04 PM

#

maybe not, but I try to understand why people implement it in a way that from my point of view differs from the original paper

#

ah ok

#

would you have the reference of the later paper you mention ?

final kiln Feb 26, 2024, 4:04 PM

#

carmine wharf maybe not, but I try to understand why people implement it in a way that from my...

try to look into more recent papers, even the transformer has already suffered some mutations

final kiln Feb 26, 2024, 4:05 PM

#

carmine wharf would you have the reference of the later paper you mention ?

not really, check the website papers with code for SOTA papers

#

there should be a section for the UNET somewhere

carmine wharf Feb 26, 2024, 4:08 PM

#

ok, great ! thank you !

void crescent Feb 26, 2024, 4:15 PM

#

why does this happen

ValueError: Failed to find data adapter that can handle input: <class 'PIL.Image.Image'>, <class 'NoneType'>

my code is:

```python
def tensor_to_image(tensor):
    tensor = tensor*255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor)>3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return PIL.Image.fromarray(tensor)


def load_image():
  #reshuffle
  reshuffled = test_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
  for i, (image,label) in enumerate(test_dataset.take(1)):
    img = tensor_to_image(image[0])
    img.save("sample_img.png")
    print(parasite_or_not(model.predict(img)))

load_image()
```

#

the image is getting saved

#

but this error also comes with it
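
Going by the error text, `model.predict` is being handed a `PIL.Image.Image`, which Keras does not know how to turn into a batch; it expects something array-like, such as a NumPy array or tensor with a leading batch dimension. One option is to keep saving the PIL image for inspection but predict on the tensor itself. A minimal sketch of the shape handling, assuming NumPy (`to_batch` is a hypothetical helper, and the random array stands in for `image[0]`):

```python
import numpy as np

def to_batch(image) -> np.ndarray:
    """Convert one HxWxC image (array-like, e.g. np.asarray(pil_img)) to shape (1, H, W, C)."""
    arr = np.asarray(image, dtype=np.float32)
    if arr.ndim == 3:
        arr = arr[np.newaxis, ...]  # add the batch dimension predict() expects
    return arr

single = np.random.default_rng(0).random((64, 64, 3))  # stand-in for image[0]
batch = to_batch(single)
print(batch.shape)

# In the original loop this would be used roughly as:
#   model.predict(to_batch(image[0]))
# rather than model.predict(img), where img is a PIL image.
```

Separately, `reshuffled` is built but the loop iterates over `test_dataset`, so the shuffle has no effect as written.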

viscid glen Feb 26, 2024, 5:38 PM

#

Hello! Is there anyone with experience with OCR and organizing the output into a table who can help me with a question? I will really appreciate it 🙂

long canopy Feb 26, 2024, 5:49 PM

#

what would be your preferred way to make a 27x27 heatmap

potent sky Feb 26, 2024, 6:01 PM

#

long canopy what would be your preferred way to make a 27x27 heatmap

Heatmap of what

#

And from what

long canopy Feb 26, 2024, 6:03 PM

#

potent sky Heatmap of what

ended up just using plt.imshow

desert oar Feb 26, 2024, 6:15 PM

#

long canopy ended up just using `plt.imshow`

this was going to be my response. just pay attention to the aspect and interpolation settings.
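
A minimal sketch of those two settings on a 27x27 array, assuming Matplotlib and NumPy are installed; the random data and the output filename are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).random((27, 27))  # placeholder 27x27 values

fig, ax = plt.subplots(figsize=(5, 5))
# interpolation="nearest" draws each cell as a flat square (no smoothing);
# aspect="equal" keeps the cells square.
im = ax.imshow(data, interpolation="nearest", aspect="equal", cmap="viridis")
fig.colorbar(im, ax=ax)
fig.savefig("heatmap.png", dpi=150)
```

For a plain grid of numbers, leaving the cells unsmoothed is usually what you want; smoothing modes such as "bilinear" are more relevant when the array samples something continuous.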

long canopy Feb 26, 2024, 6:19 PM

#

desert oar this was going to be my response. just pay attention to the `aspect` and `interp...

will do, ty!

long canopy Feb 26, 2024, 6:57 PM

#

what the heck are all these interpolation options

#

you guys use these at all? since all I have is a 27x27 array, I'm using None, but at which point would I use something else?

final kiln Feb 26, 2024, 6:58 PM

#

I think that's just to make the image lighter or something

#

Yeah I think high fidelity render is heavier, so you interpolate the pixels

#

I usually just use the option to get it to not do interpolation

#

Uhmmm

#

Yeah that's it right, it loads like a percentage of the pixels and displays interpolated values

long canopy Feb 26, 2024, 7:01 PM

#

I guess, if you just need a datapoint more precise than what you actually have

final kiln Feb 26, 2024, 7:01 PM

#

No the other way around rite, since it's averaging out

long canopy Feb 26, 2024, 7:02 PM

#

no clue when this would be relevant

final kiln Feb 26, 2024, 7:04 PM

#

Chat gpt says it's just for appearance purposes, like when resizing the image, zooming in and out, etc, you gotta interpolate the missing pixels

#

Guess not zooming in, but scaling it

long canopy Feb 26, 2024, 7:09 PM

#

hm, it would make sense in the following situation: you've got a sensor that makes measurements, one axis of the heatmap is distance, the other is some form of quantity of interference, and the hardware of your sensor, or the experimentation costs, don't allow you to take intermediary distances

#

so your heatmap is pixelated, but this would be an inadequate representation of the data because it should vary continuously with distance and degree of interference

final kiln Feb 26, 2024, 7:21 PM

#

Like if you do an interpolation it's not guaranteed to be the correct value tho

#

But I get what you're saying, I've used cubic splines a lot when I knew that even a line between points was a good approx

teal lance Feb 26, 2024, 7:44 PM

#

long canopy Feb 26, 2024, 7:53 PM

#

i REALLY need to get up to date with hardcore statistics

#

took two classes a while back but it's not enough

final kiln Feb 26, 2024, 7:58 PM

#

Been thinking of brushing up on some math too

#

Maybe go deeper in stats

#

I'm thinking, once I finish this project and also land a job in Switzerland (which hopefully will happen this week), I'm gonna start reviewing some stats heavy math. I can just stick it with the rest of my thesis.

long canopy Feb 26, 2024, 8:01 PM

#

final kiln I'm thinking, once I finish this project and also land a job in Switzerland (which...

drop me a message when they need another ml enthusiast lol

final kiln Feb 26, 2024, 8:01 PM

#

They're a purely ML company, with offices in like 3 countries

#

And they'll be expanding to the US this year, which might be how I get the L1

#

Will see, haven't gotten the job yet, and I have a tooon of time

long canopy Feb 26, 2024, 8:03 PM

#

good luck

final kiln Feb 26, 2024, 8:03 PM

#

Thank you

#

I'm also gonna move my course credits to a different uni, the objective is to present the thesis in front of the top experts in the field. Still don't know how I'm gonna do it yet, but shouldn't be too hard due to EU standardization

#

I got a ton of cool stuff planned for the future

#

Like, on all fronts

desert oar Feb 26, 2024, 8:12 PM

#

long canopy what the heck are all these interpolation options

they're for actually showing images and things like actual heatmaps, rather than just visualizing a grid of numbers

desert oar Feb 26, 2024, 8:12 PM

#

final kiln Maybe go deeper in stats

you definitely know enough "math" from what i've seen. i'd say to go for probability and stats specifically

desert oar Feb 26, 2024, 8:12 PM

#

long canopy so your heatmap is pixelated, but this would be an inadequate representation of ...

precisely this

final kiln Feb 26, 2024, 8:15 PM

#

desert oar you definitely know enough "math" from what i've seen. i'd say to go for probabi...

Yeah but it's been like 2 years since college, I gotta start exercising some of those muscles to get them back in shape

desert oar Feb 26, 2024, 8:15 PM

#

prob & stats will work you out well enough

#

you're super far ahead of a lot of people

final kiln Feb 26, 2024, 8:15 PM

#

My thesis is monte Carlo stuff so stats and probs actually fit well with it

desert oar Feb 26, 2024, 8:16 PM

#

perfect

long canopy Feb 26, 2024, 8:16 PM

#

desert oar prob & stats will work you out well enough

any recs beyond beginner stuff?

teal lance Feb 26, 2024, 8:25 PM

#

Any quants ?🙌🏾🙌🏾

desert oar Feb 26, 2024, 8:29 PM

#

long canopy any recs beyond beginner stuff?

recs for what? textbooks? topics to cover?

long canopy Feb 26, 2024, 8:30 PM

#

desert oar recs for what? textbooks? topics to cover?

either

desert oar Feb 26, 2024, 8:31 PM

#

long canopy either

what's your background?

long canopy Feb 26, 2024, 8:31 PM

#

econ major math minor

#

i know my way around a transformer

#

but literally 0 ML theory

desert oar Feb 26, 2024, 8:34 PM

#

long canopy econ major math minor

so you've taken econometrics?

#

that was where i started about a decade ago. econ major + math minor, trying to figure out how SVM, CNN, and random forest models worked

long canopy Feb 26, 2024, 8:36 PM

#

desert oar so you've taken econometrics?

yeah, a bit of R but sort of was not very interested in the material until recently

#

was the last semester too

desert oar Feb 26, 2024, 8:36 PM

#

if you're still in school, try to take a stats class outside of the econ department. i.e. "statistics" and not "econometrics". some of it will seem very familiar, some of it will seem like an alien reinterpretation of what you've already learned, some of it will be new.

#

if you can take a class on probability modeling with the math department that will help as well. my stochastic processes class was a great way to reinforce probability as well as think more clearly about applying it to real problems.

long canopy Feb 26, 2024, 8:40 PM

#

noted, will look into doing these, thanks a lot

#

highly appreciated

desert oar Feb 26, 2024, 8:41 PM

#

long canopy noted, will look into doing these, thanks a lot

sure! also make sure you are very comfortable with linear algebra and calculus. the MIT OCW course by Strang is legendary for a reason. that + the 3b1b courses are great, even if you got an A in those course sequences, you might learn something or at least gain new intuition.

#

for modern ML stuff i'm not that well-versed either because i don't actually do it much at work. but i do really like Dive Into Deep Learning https://d2l.ai/

long canopy Feb 26, 2024, 8:43 PM

#

desert oar sure! also make sure you are very comfortable with linear algebra and calculus. ...

thankfully linear algebra, calculus and pure math are a breeze

long canopy Feb 26, 2024, 8:43 PM

#

desert oar for modern ML stuff i'm not that well-versed either because i don't actually do ...

nice! will look into this too

versed pilot Feb 26, 2024, 9:19 PM

#

long canopy no clue when this would be relevant

Hanning/Hamming windows come up a lot in signal processing. https://en.wikipedia.org/wiki/Window_function#Examples_of_window_functions

#

If you think of the spatial frequencies contained in the heat map, the pixellated view is kind of obscuring things

#

the windowed view is allowing the "fundamental" spatial frequencies to come through

#

And yeah, it would make more "physical" sense if you think of remote sensing where a pixel is 500m x 500m, but there is no way the contents of the pixel on earth are uniformly one flat colour, unless it's the roof of the Tesla factory or something 😉. The windowing will let you see the underlying shape a bit more clearly, even though it is still blurred
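
A minimal sketch of the smoothing idea described above, on a made-up one-dimensional row of "pixels". The sample values and the window length of 5 are assumptions for illustration; `np.hanning` and `np.convolve` are standard NumPy calls. In matplotlib, the same family of windows is what `imshow(..., interpolation="hanning")` selects.

```python
import numpy as np

# A blocky row of pixels: a flat-topped step, like one row of a pixelated heatmap.
signal = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)

window = np.hanning(5)        # Hann window: zero at the ends, peak in the middle
window /= window.sum()        # normalise so the overall level is preserved

# Each output sample is a weighted average of its neighbours.
smoothed = np.convolve(signal, window, mode="same")
print(smoothed)
```

The hard edges of the step turn into a gradual ramp, which is the "underlying shape, still blurred" effect mentioned in the message.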

ionic sun Feb 26, 2024, 9:25 PM

#

i have a plot that looks like this

#

i want to have a line that fits only the linear region extending to the x axis

#

the x intercept is what i want how to do it?

#

#

something like this

versed pilot Feb 26, 2024, 9:27 PM

#

can you reject the low values, there's a voltage below which you don't get much current, right?

ionic sun Feb 26, 2024, 9:28 PM

#

yeah i can reject the low values just filter it out

#

but i'm quite new to numpy

versed pilot Feb 26, 2024, 9:28 PM

#

It's been a while since I did anything with transistors.

ionic sun Feb 26, 2024, 9:28 PM

#

it more a data sci thing i think

versed pilot Feb 26, 2024, 9:28 PM

#

I guess the other thing is to do a diff of your points which effectively is the derivative

ionic sun Feb 26, 2024, 9:28 PM

#

im not sure how to do it in python

#

but umm isnt that quite difficult

#

can i just curve fit

versed pilot Feb 26, 2024, 9:29 PM

#

and reject all the points where the derivative is low so the curve is nearly flat

ionic sun Feb 26, 2024, 9:29 PM

#

a line on the linear region

versed pilot Feb 26, 2024, 9:29 PM

#

fit on the remaining points that are neither low voltage, nor "flat" current i.e. diff(I) = ~0

#

https://numpy.org/doc/stable/reference/generated/numpy.diff.html would this help?

ionic sun Feb 26, 2024, 9:32 PM

#

i was also gonna ask if the polyfit function works

ionic sun Feb 26, 2024, 9:32 PM

#

versed pilot https://numpy.org/doc/stable/reference/generated/numpy.diff.html would this help...

imma look into it

versed pilot Feb 26, 2024, 9:34 PM

#

I've done it in scipy before for polynomials, but I don't think the transistor curve works that way

#

and you only want its linear part, you want to be rejecting all the points outside that linear part before doing a line fit
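
A rough sketch of the approach described in this thread: drop the low-voltage points, drop the points where the derivative is small, fit a line to what is left, and read off the x intercept. The function name, the `v_min` cut-off, the `slope_frac` threshold, and the synthetic data are all assumptions for illustration, not values from the actual plot.

```python
import numpy as np

def linear_region_x_intercept(v, i, v_min, slope_frac=0.5):
    """Fit a line to the steep part of an I-V curve.

    Returns (slope, intercept, x_intercept).
    """
    v = np.asarray(v, dtype=float)
    i = np.asarray(i, dtype=float)
    # Point-wise derivative dI/dV; np.gradient keeps the array length unchanged.
    slope = np.gradient(i, v)
    # Keep points above the voltage cut-off whose slope is close to the steepest slope.
    mask = (v >= v_min) & (slope >= slope_frac * slope.max())
    m, b = np.polyfit(v[mask], i[mask], deg=1)
    # The fitted line m*v + b crosses zero at v = -b / m.
    return m, b, -b / m

# Synthetic curve: flat up to 0.7, then a straight line of slope 2.
v = np.linspace(0.0, 2.0, 21)
i = np.where(v > 0.7, 2.0 * (v - 0.7), 0.0)
m, b, x0 = linear_region_x_intercept(v, i, v_min=0.8)
print(m, b, x0)
```

On real data the thresholds need tuning by eye, and noisy measurements may need smoothing before the derivative is meaningful.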

#

Good luck

desert oar Feb 26, 2024, 9:39 PM

#

ionic sun i have a plot that looks like this

do you know where the "elbow" is located, or are you trying to figure out how to locate it / work around it?

oak orbit Feb 26, 2024, 9:44 PM

#

Hello ! I have a dataframe, named 'ctl' made after a csv file: each row corresponds to a country (in two-letter format) and each column is named after a language (also in two-letter format). The data in this dataframe is in the following form for a given row: ,,,,,,,,,,,,,,,,,,"{'percent': 100.0, 'official': True}","{'percent': 1.9, 'official': False}","{'percent': 0.47, 'official': False}",,,,,, for example. I want to create a dictionary from this dataframe where, for each country (in lowercase two-letter format), it associates the most spoken language (in lowercase two-letter format). In order to do this i just aim to search for the highest percent found in the row, but it's in a string format so it's kinda difficult to do this...another approach was to find something higher than 50 in the string but i also failed...is there an easier approach to solve this ?

ionic sun Feb 26, 2024, 9:48 PM

#

desert oar do you know where the "elbow" is located, or are you trying to figure out how to...

yeah i do know where its located

#

so far right now i took two points in the linear regime and calculated its slope

#

and modeled the linear part with scipy

oak orbit Feb 26, 2024, 9:49 PM

#

for context I have a CSV file in which each entry has a 'username' and the languages the user speak, along with their proficiency level (Native, A2, B1...). The goal of my project is to, for each username, determine which languages they can speak, associated with their proficiency level. My output file should represent languages using
codes. (i.e., i should have ‘es’ - not ‘mx’ or ‘en’ not ‘eng’ or ‘us’).

#

hope this is right channel

vital plank Feb 26, 2024, 9:54 PM

#

anyone knows how to use sklearn in jupyter lab desktop version?
I have already installed scikit-learn, started scripts, create a new kernel and change it to my jupyterlab. Help pleaseee

desert oar Feb 26, 2024, 9:57 PM

#

ionic sun so far right now i took two points in the linear regime and calculated its slope

that's what i was going to suggest. pick the first non-elbow point on the left, and the last point, and just compute the slope between them

desert oar Feb 26, 2024, 9:58 PM

#

oak orbit hope this is right channel

this is the right channel but i think your first post is really deep into XY territory. your 2nd post clarifies, but it's not clear specifically what you want here.

oak orbit Feb 26, 2024, 10:01 PM

#

sorry !!
country en ... mxc kck
0 AC {'percent': 99.0, 'official': False} ... NaN NaN
1 AD NaN ... NaN NaN
2 AE {'percent': 50.0, 'official': False} ... NaN NaN
3 AF NaN ... NaN NaN
4 AG {'percent': 86.0, 'official': True} ... NaN NaN

left tartan Feb 26, 2024, 10:04 PM

#

I think you're including a lot of unnecessary information here:

#

You want to know the most spoken language?

oak orbit Feb 26, 2024, 10:04 PM

#

basically country goes through 278 row and most common languages are column titles

left tartan Feb 26, 2024, 10:04 PM

#

Given Country, Language, Population (or something like that)?

#

So your dataframe has: Country, Language and Population?

oak orbit Feb 26, 2024, 10:05 PM

#

country and all languages

left tartan Feb 26, 2024, 10:06 PM

#

I think you need to show more about what a single country with multiple languages looks like.

oak orbit Feb 26, 2024, 10:06 PM

#

headers would be like country, eng, fr, es, it...

#

oki doki

left tartan Feb 26, 2024, 10:06 PM

#

Show us the first few rows of the CSV plz

oak orbit Feb 26, 2024, 10:08 PM

#

AC,"{'percent': 99.0, 'official': False}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,AD,,"{'percent': 51.0, 'official': True}","{'percent': 43.0, 'official': False}","{'percent': 7.5, 'official': False}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
AE,"{'percent': 50.0, 'official': False}",,,,"{'percent': 78.0, 'official': True}","{'percent': 7.0, 'official': False}","{'percent': 2.9, 'official': False}","{'percent'

left tartan Feb 26, 2024, 10:08 PM

#

Ah, each language is a column?

#

Show me the header line plz

oak orbit Feb 26, 2024, 10:09 PM

#

and headers is like country,en,ca,es,fr,ar,ml,ps,bal,fa,haz,uz_Arab,tk,prd,bgn,ug,kk_Arab,pt,sq,el,mk,hy,ku,az,umb,kmb,ln,und,cy,gn,sm,de,bar,it,hr,sl,hu,zh_Hant,wbp,hnj,nl,pap,sv,az_Cyrl,tly,ttt,tkr,bs,bs_Cyrl,sr,sr_Latn,bn,rkt,syl,rhg,ccp,my,grt,mro,mni,vls,wa,mos,dyu,ff,ff_Adlm,bg,ru,tr,rn,sw,fon,yo,ms,ms_Arab,qu,ay,aro,vec,ja,kgp,ko,yrl,gub,xav,dz,ne,tsj,lep,tn,af,be,zh,yue,pa,fil,ur,hi,ta,vi,pl,gu,ro,pdt,uk,so,iu,iu_Latn,oj,ojs,chp,moe,cr,mic,atj,bla,crk,den,dgr,csw,moh,nsk,dak,clc,hur,crg,war,lil,oka,pqm,crl,kwk,gwi,lua,lu,kg,lol,rw,sg,gsw,lmo,rm,rmo,wae,bci,sef,dnj,kfo,bqv,arn,bum,ewo,ybb,bbj,nnh,bkm,bas,bax,byv,mua,maf,bfd,bss,kkj,dua,mgo,jgo,ksf,ken,agq,ha_Arab,nmg,yav,wuu,yue_Hans,hsn,hak,,gan,ii,za,mn_Mong,bo,lis,ky_Arab,nxq,khb,tdd,lcp,uz_Cyrl,lzh,guc,kea,cs,sk,nds,vmf,da,swg,ksh,hsb,frr,dsb,frs,stq,pfl,aa,fo,kl,jut,arq,kab,qug,et,fi,vro,arz,ti,tig,ssy,byn,gl,eu,ast,ext,an,oc,am,om,sid,wal,gez,rmf,se,smn,sms,hif,fj,rtm,chk,pon,kos,yap,uli,pcd,br,co,frp,ia,puu,sco,lt,ga,gd,kw,en_Shaw,ka,xmf,ab,os,gcr,ak,ee,abr,gur,ada,gaa,nzi,ha,saf,man,man_Nkoo,sus,nqo,kpe,fan,bvb,pnt,tsd,quc,ch,knf,ht,id,jv,su,mad,min,bew,ban,bug,bjn,ace,sas,bbc,mak,ljp,rej,gor,nij,kge,aoz,kvr,lbw,gay,rob,mdr,sxn,sly,mwv,he,yi,lad,gv,te,mr,kn,or,bho,awa,as,bgc,mag,mwr,mai,hne,dcc,bjj,sat,wtm,ks,kok,gom,swv,gbm,lmn,sd,gon,kfy,doi,kru,sck,wbq,xnr,tcy,wbr,khn,sd_Deva,brx,noe,bhb,raj,hi_Latn,hoc,mtr,unr,bhi,hoj,kha,kfr,unx,bfy,srx,saz,bfq,njo,ria,bpy,bft,bra,btv,lif,lah,sa,kht,dv,ckb,az_Arab,lrc,syr,mzn,glk,sdh,rmt,bqi,luz,lki,gbz,is,sc,nap,lij,scn,sdc,fur,egl,pms,rgn,jam,ryu,ki,luy,luo,kam,kln,guz,mer,mas,ebu,dav,teo,pko,saq,ky,km,cja,kdt,gil,zdj,wni,kk,ug_Cyrl,lo,kjg,ku_Arab,si,vai,men,vai_Latn,st,zu,ss,xh,sgs,lb,lv,ltg,ary,zgh,tzm,shi,shi_Latn,rif,rif_Latn,gag,mg,mh,bm,ffm,snk,mwk,ses,tmh,bm_Nkoo,khq,dtm,kao,bmq,bze,shn,kac,mnw,mn,wo,mt,mfe,ny,tum,tog,yua,nhe,nhw,maz,nch,sei,iba,zmi,dtp,vmw,ndc,ts,ngl,seh,mg

#

cropped because too long

vital plank Feb 26, 2024, 10:09 PM

#

anyone knows how to use sklearn in jupyter lab desktop version?
I have already installed scikit-learn, started scripts, create a new kernel and change it to my jupyterlab. Help pleaseee

oak orbit Feb 26, 2024, 10:09 PM

#

sorry this is like the worst formatting possible

left tartan Feb 26, 2024, 10:09 PM

#

It's kinda terrible, but it's ok

#

So, the first thing I'd do is melt() the dataframe

#

import pandas as pd
df = pd.DataFrame({"col1": ["USA"], "en": ["{'percent': 43.0, 'official': False}"], "es": ["{'percent': 43.0, 'official': False}"]})
print(df)

#

Sample df so you can test this easier.

#

import pandas as pd
df = pd.DataFrame({"country": ["USA"], "en": ["{'percent': 43.0, 'official': False}"], "es": ["{'percent': 43.0, 'official': False}"]})
melted_df = df.melt(id_vars="country", var_name = "language")
print(melted_df)

#

Now, you just need to parse out the percent from value.
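
One possible way to finish the job after `melt()`, as a sketch rather than the only approach. It assumes every non-empty cell is a Python-literal dict string like the ones in the CSV rows shown above; `ast.literal_eval` is the standard-library way to parse such strings safely. The two sample rows mirror the AD and AE rows posted earlier; the names `melted`, `top` and `most_spoken` are made up for this example.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "country": ["AD", "AE"],
    "en": [None, "{'percent': 50.0, 'official': False}"],
    "ca": ["{'percent': 51.0, 'official': True}", None],
    "es": ["{'percent': 43.0, 'official': False}", None],
    "ar": [None, "{'percent': 78.0, 'official': True}"],
})

# Long format: one row per (country, language), empty cells dropped.
melted = df.melt(id_vars="country", var_name="language").dropna(subset=["value"])

# Turn each dict-looking string into a real dict and pull out the number.
melted = melted.assign(
    percent=melted["value"].map(lambda s: ast.literal_eval(s)["percent"])
)

# For each country, keep the row with the highest percent.
top = melted.loc[melted.groupby("country")["percent"].idxmax()]
most_spoken = {c.lower(): lang.lower() for c, lang in zip(top["country"], top["language"])}
print(most_spoken)
```

Countries with no language data at all simply do not appear in the resulting dictionary.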

oak orbit Feb 26, 2024, 10:15 PM

#

ill try thank you very much !!

desert oar Feb 26, 2024, 10:17 PM

#

oak orbit sorry !! country en ... mxc kck 0 A...

did you create this data? with python dicts embedded in csv? in the future i strongly suggest not doing this.

left tartan Feb 26, 2024, 10:18 PM

#

vital plank anyone knows how to use sklearn in jupyter lab desktop version? I have already ...

What isn't working?

oak orbit Feb 26, 2024, 10:19 PM

#

desert oar did you create this data? with python dicts embedded in csv? in the future i str...

no i had to work from this file, but i cant agree more with you

left tartan Feb 26, 2024, 10:20 PM

#

What is your error?

vital plank Feb 26, 2024, 10:20 PM

#

ModuleNotFoundError: No module named 'sklearn'

left tartan Feb 26, 2024, 10:20 PM

#

Also:

#

!code

arctic wedgeBOT Feb 26, 2024, 10:20 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

vital plank Feb 26, 2024, 10:21 PM

#

import numpy as np
from sklearn import linear_model

# `train` is a pandas DataFrame defined earlier in the notebook
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(x, y)

print('Coefficients: ', regr.coef_)

left tartan Feb 26, 2024, 10:21 PM

#

What happens when you run pip install -U scikit-learn

#

In jupyter, %pip install scikit-learn

vital plank Feb 26, 2024, 10:22 PM

#

basically says i have it installed already

vital plank Feb 26, 2024, 10:22 PM

#

left tartan In jupyter, `%pip install scikit-learn`

same output with this command

left tartan Feb 26, 2024, 10:22 PM

#

vital plank basically says i have it installed already

Then restart the kernel and try again

vital plank Feb 26, 2024, 10:23 PM

#

let me try, thanks

#

same error, sklearn not found

left tartan Feb 26, 2024, 10:27 PM

#

And what happens with %pip install?

#

That was to confirm you weren't running in some other virtual environment. I'm stumped if that said it was installed.

vital plank Feb 26, 2024, 10:28 PM

#

ERROR: You must give at least one requirement to install (see "pip help install")

#

first time im using jupyter desktop, i had this problems too with wget and some other libraries, i did install some other files to solve those issues

#

and run scripts/activate

left tartan Feb 26, 2024, 10:33 PM

#

I said earlier: %pip install scikit-learn

#

That's what I was wondering

vital plank Feb 26, 2024, 10:34 PM

#

should i just use jupyterlab in explorer? 🤣

left tartan Feb 26, 2024, 10:38 PM

#

What happens when you do %pip install scikit-learn?

#

In the same notebook

vital plank Feb 26, 2024, 10:39 PM

#

vital plank basically says i have it installed already

this

#

requirements already satisfied

left tartan Feb 26, 2024, 10:54 PM

#

And what's the exact message when you run the function?

#

!traceback

arctic wedgeBOT Feb 26, 2024, 10:54 PM

#

Traceback

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
        ~~~~^~~
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

vital plank Feb 26, 2024, 10:55 PM

#

left tartan Feb 26, 2024, 10:56 PM

#

What about !pip install scikit-learn?

vital plank Feb 26, 2024, 10:56 PM

#

left tartan Feb 26, 2024, 10:57 PM

#

That would run from inside the jupyter cell

#

Same as %pip

#

Did you run %pip from the jupyter notebook?

vital plank Feb 26, 2024, 10:58 PM

#

%pip is not recognized

left tartan Feb 26, 2024, 10:58 PM

#

Hold up:

#

Inside Jupyter, in the same notebook, create a new empty Cell. In that cell put %pip install scikit-learn. Run that cell. Share the screenshot.

vital plank Feb 26, 2024, 10:59 PM

#

with cell what do you mean? a new folder?

left tartan Feb 26, 2024, 10:59 PM

#

vital plank

This is a Jupyter cell.

vital plank Feb 26, 2024, 10:59 PM

#

oh

#

says i need to restart kernel to use adapted packages (if i restart it it says the same thing)
The system cant find the directory

left tartan Feb 26, 2024, 11:01 PM

#

Hit the restart button

#

The button next to code

vital plank Feb 26, 2024, 11:02 PM

#

yeah, i did, and when i run it it says the same thing

left tartan Feb 26, 2024, 11:02 PM

#

Show me ss

vital plank Feb 26, 2024, 11:02 PM

#

#

#

The spanish text says this:

says i need to restart kernel to use adapted packages
The system cant find the directory

left tartan Feb 26, 2024, 11:04 PM

#

%pip install -U scikit-learn

vital plank Feb 26, 2024, 11:04 PM

#

same thing

left tartan Feb 26, 2024, 11:05 PM

#

Ok, try !pip install scikit-learn

vital plank Feb 26, 2024, 11:07 PM

#

you are the best, that was it!! tysmmm!!

left tartan Feb 26, 2024, 11:08 PM

#

vital plank you are the best, that was it!! tysmmm!!

What's happening is: the jupyter kernel is running in a different environment than your terminal window. So when you pip install in Terminal, it's installing to a different python install.
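
A small sketch for checking whether the kernel and the terminal really use different Python installs, assuming nothing about how Jupyter was set up. `sys.executable` is the path of the interpreter running the code; the helper name `pip_install_command` is made up for this example.

```python
import sys

def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    # "python -m pip" installs into that exact interpreter's environment,
    # regardless of which "pip" happens to be first on PATH.
    return [sys.executable, "-m", "pip", "install", package]

# Which Python is this kernel (or terminal session) actually using?
print(sys.executable)
print(" ".join(pip_install_command("scikit-learn")))
```

Running the first `print` in a notebook cell and again in `python` started from the terminal shows whether the two paths match. If they differ, running the printed command in a terminal installs into the kernel's environment.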

vital plank Feb 26, 2024, 11:09 PM

#

i see, what can i do to sync terminal and jupy kernel?

final kiln Feb 26, 2024, 11:14 PM

#

https://www.sciencedirect.com/science/article/pii/S0896627321005018

#

I forget I can't do attachments here

final kiln Feb 26, 2024, 11:14 PM

#

final kiln https://www.sciencedirect.com/science/article/pii/S0896627321005018

You need a CNN with 8 layers to reproduce the computational power of a single neuron

#

Paper says 128 feature maps per layer, assuming 3x3 kernels (I can't find the actual size so I'm gonna guess a small one), that puts the number of parameters at around 2500

#

Times the 86 billion neurons that gives us an estimate of

#

2.15e+14 parameters in the brain
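
The multiplication above, spelled out. The 2500 parameters per neuron and the 86 billion neurons are the figures quoted in the chat and are not re-derived here; the 2 bytes per parameter is an extra assumption (16-bit weights) added only to show the storage arithmetic.

```python
params_per_neuron = 2_500      # figure quoted in the chat
neurons = 86e9                 # 86 billion neurons

total_params = params_per_neuron * neurons
print(f"{total_params:.2e}")   # 2.15e+14

bytes_per_param = 2            # assumption: 16-bit weights
petabytes = total_params * bytes_per_param / 1e15
print(petabytes, "PB")
```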

#

Which I do believe matches my previous Fermi estimate

left tartan Feb 26, 2024, 11:28 PM

#

vital plank i see, what can i do to sync terminal and jupy kernel?

I don't know your configuration at all... depends on where the jupyter kernel is running/etc.

long canopy Feb 26, 2024, 11:28 PM

#

versed pilot Hanning/Hamming windows come up a lot in signal processing. https://en.wikipedia...

I see! thanks very interesting stuff

final kiln Feb 26, 2024, 11:28 PM

#

Here

final kiln Feb 26, 2024, 11:29 PM

#

final kiln Paper says 128 feature maps per layer, assuming 3x3 kernels (I can't find the ac...

The kernels are likely a bit larger than 3x3, so they might actually match quite well, via two different paths

#

Kinda cool

vital plank Feb 26, 2024, 11:30 PM

#

left tartan I don't know your configuration at all... depends on where the jupyter kernel is...

Gotcha, thanks

final kiln Feb 26, 2024, 11:31 PM

#

It's 1.72e+16, so I did count the zeros right

#

So this would be a 40 petabyte model

long canopy Feb 27, 2024, 12:23 AM

#

one_hot_encoded_following_bigrams_according_to_maximal_distance

#

i don't care ok the variable names will be as long as they need to

desert oar Feb 27, 2024, 1:15 AM

#

vital plank i see, what can i do to sync terminal and jupy kernel?

depends on how you installed python and jupyter, how you set up the terminal, etc.

long canopy Feb 27, 2024, 2:15 AM

#

seems like mistral just went closed source

#

depressing

left tartan Feb 27, 2024, 2:34 AM

#

long canopy seems like mistral just went closed source

Link?

mild vine Feb 27, 2024, 2:37 AM

#

does anyone know why my machine learning bot learns through all the noise at the start but there is that one individual per generation that is all the way down there

#

Is something possibly wrong with my code?

long canopy Feb 27, 2024, 2:43 AM

#

left tartan Link?

https://news.ycombinator.com/item?id=39517016

smy20011

Mistral Remove "Committing to open models" from their website

undone plaza Feb 27, 2024, 5:29 AM

#

does anyone have any good resources to get a really quick overview of python stuff that would be useful for math modelling/data sci (ie numpy, matplotlib, scipy, pandas, sklearn)

ionic sun Feb 27, 2024, 5:30 AM

#

im using scipy for curve fitting

#

but its giving me a strange error, i dont understand why this is wrong

left tartan Feb 27, 2024, 7:09 AM

#

undone plaza does anyone have any good resources to get a really quick overview of python stu...

Kaggle.com/learn

final kiln Feb 27, 2024, 10:28 AM

#

ionic sun but its giving me a strange error, i dont understand why this is wrong

Looks very clear to me, one of your arrays contains inf or Nan values and the algo won't work with those so they interrupt the execution by throwing an error

#

Just print every array along the way until you find out where they get introduced, then try to find out how they are introduced, and by then it should be clear what you have to do
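
Instead of printing whole arrays, a quicker check is to count the non-finite entries in each one. This is a sketch with made-up data; `report_non_finite` is a hypothetical helper name, while `np.isfinite` is the standard NumPy test that is False for both NaN and inf.

```python
import numpy as np

def report_non_finite(**arrays):
    """Return {name: number of NaN/inf entries} for each array passed in."""
    return {
        name: int((~np.isfinite(np.asarray(a, dtype=float))).sum())
        for name, a in arrays.items()
    }

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, np.nan, 5.0, np.inf])
counts = report_non_finite(x=x, y=y)
print(counts)

# Keep only the positions where both x and y are finite before fitting.
mask = np.isfinite(x) & np.isfinite(y)
x_clean, y_clean = x[mask], y[mask]
print(x_clean, y_clean)
```

Masking the bad points gets the fit running, but it is still worth finding out where they come from (a division by zero or a log of a non-positive number are common causes).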

final kiln Feb 27, 2024, 10:30 AM

#

long canopy https://news.ycombinator.com/item?id=39517016

Damn, that's disappointing

hexed yew Feb 27, 2024, 11:03 AM

#

Anyone know how to parse through PDFs with complicated tables and maintain table structure? I’ve exhausted a lot of the common libraries and was wondering if there’s something out there I’m missing

red bane Feb 27, 2024, 11:23 AM

#

I'm trying to run the model I've uploaded as an image.
I expected that the gpu (NVIDIA GeForce GTX 1660 TI) would run it faster than the cpu (AMD Ryzen 7 4800 H with Radeon Graphics), but it turned out to be the opposite with gpu taking about 5 minutes, while cpu takes about 1.5 minutes.
Is this supposed to be like this? If not, what can I do to fix it? If tensorboard profiler is what I have to look at, how can i verify that gpu is running as it should?

arctic talon Feb 27, 2024, 11:38 AM

#

Can somebody explain how to find solution to question C

wooden sail Feb 27, 2024, 11:54 AM

#

you may think of the law of total probability and bayes rule to write down alternative factorizations which you can then draw as diagrams. or maybe you know these concepts under the idea of computing "posterior probabilities"

void crescent Feb 27, 2024, 12:03 PM

#

guys

#

whenever i use dataset.take() the model gives accurate results

#

but if i use cv2.imread() it gives inaccurate results

#

i dont get it, why exactly is this happening?

wooden sail Feb 27, 2024, 12:05 PM

#

are your images color images?

#

cv2's imread returns the slices in order BGR

#

take the output from dataset.take and cv2.imread and compare the individual slices, it might be that the slices are ordered differently

void crescent Feb 27, 2024, 12:06 PM

#

ohhh

#

how do i rearrange to RGB

void crescent Feb 27, 2024, 12:06 PM

#

wooden sail take the output from dataset.take and cv2.imread and compare the individual slic...

dataset.take doesnt exactly return anything

#

i have to use enumerate() to get it

#

it returns a tensor() when i do that

wooden sail Feb 27, 2024, 12:07 PM

#

all right, that sounds fine

#

numpy arrays have a transpose function that, more generally, allows you to swap axes

#

np.transpose(your_array, axes=(2,1,0)) would give the correct order in that case

#

i don't recall what cv2 returns exactly, but there should be an equivalent way of reordering axes for it if it doesn't return numpy arrays

void crescent Feb 27, 2024, 12:10 PM

#

#

returns this

#

(this isnt resized for my model btw)

wooden sail Feb 27, 2024, 12:11 PM

#

what's the type?

void crescent Feb 27, 2024, 12:11 PM

#

#

so i CAN do that

wooden sail Feb 27, 2024, 12:11 PM

#

ok, then you can reshape exactly as i showed above

void crescent Feb 27, 2024, 12:11 PM

#

thanks

wooden sail Feb 27, 2024, 12:11 PM

#

give that a shot and see

void crescent Feb 27, 2024, 12:11 PM

#

ill see if it works

#

@wooden sail

label = load_one_image()

img = cv2.imread("sample.png")
transposed = np.transpose(img, axes=(2,1,0))

reshaped = cv2.resize(transposed, (224, 224))

final_img = np.expand_dims(reshaped, axis=0)

print(label)
print(str(parasite_or_not(model.predict(final_img))))

returns:

ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(None, 224, 224, 224)

wooden sail Feb 27, 2024, 12:20 PM

#

what's the original shape of img before you do anything to it

void crescent Feb 27, 2024, 12:23 PM

#

(224, 224, 3)

#

i resize just in case

wooden sail Feb 27, 2024, 12:23 PM

#

ok, try this instead, that was my bad
img = img[:,:,[2,1,0]]
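
A small check of why the indexing version works and the transpose did not, using a tiny made-up array in place of a real `cv2.imread` result (height x width x 3, channels in B, G, R order). Only NumPy is used here; in OpenCV itself, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does the same channel reorder.

```python
import numpy as np

# Stand-in for cv2.imread output: a 2x2 image that is pure blue in BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255

# Reorder the last axis only: the shape stays (2, 2, 3).
rgb = bgr[:, :, [2, 1, 0]]
print(rgb.shape, rgb[0, 0].tolist())

# np.transpose moves whole axes instead, so the channel axis ends up first.
moved = np.transpose(bgr, axes=(2, 1, 0))
print(moved.shape)
```

With a 224x224 image the transposed array has shape (3, 224, 224), which is consistent with the shape mismatch in the error message above once it is resized and batched.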

void crescent Feb 27, 2024, 12:25 PM

#

ok ty

#

however i really wanna see if it gets the parasitized cell correct

#

because it always got the uninfected ones

#

right

wooden sail Feb 27, 2024, 12:27 PM

#

that's up to you 😛 i just wanted to point out that cv2 loads images in an unusual order. make sure that the arrays cv2 gives you are in the same order as the other loading method or the training is going to have issues

void crescent Feb 27, 2024, 12:28 PM

#

training is already done

#

its getting everything correct

#

reason im going through all this trouble is because i need to deploy to web

#

so i have to use cv2.imread()

#

instead of taking new images

#

btw

#

the unchanged image

#

plt.imshow() will work fine on it right?

wooden sail Feb 27, 2024, 12:31 PM

#

should be the case

void crescent Feb 27, 2024, 12:33 PM

#

wait

#

ok nvm

#

i dont need plt.imshow since its already being saved

#

so i can just see the images

#

facing another problem

#

def load_one_image():
  #reshuffle
  reshuffled = test_dataset.shuffle(buffer_size=8, reshuffle_each_iteration=True).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
  for i, (image,label) in enumerate(test_dataset.take(1)):
    img = tensor_to_image(image[0])
    url = f"sample.png"
    img.save(url)
    return str(parasite_or_not(label.numpy()[0]))

for i in load_image():
  url = f"/content/drive/MyDrive/malaria_samples/sample_img{i[0]}.png"

  img = cv2.imread(url)
  img = img[:,:,[2,1,0]]

  resized = cv2.resize(img, (224, 224))

  final_img = np.expand_dims(resized, axis=0)

  print("For image "+i[0])
  print("Label: "+i[1])
  print("Predicted: "+str(parasite_or_not(model.predict(final_img))))

#

when i do for i in load_image() it cant read the image

#

because it doesnt exist

#

for some reason the images never got created?

#

oh

#

its bc google colab suck

#

i have to rerun the method when i changed it

wooden sail Feb 27, 2024, 12:46 PM

#

that's an issue with notebooks and the possibility of running cells out of order

void crescent Feb 27, 2024, 12:47 PM

#

wait

#

its making mistakes

#

still

#

it said this is uninfected

#

and when i do this:

for i, (image,label) in enumerate(test_dataset.take(9)):
    ax = plt.subplot(3, 3, i+1)
    plt.imshow(image[0])
    plt.title("Label: "+str(parasite_or_not(label.numpy()[0])) + " Predict: " + str(parasite_or_not(model.predict(image)[0][0])))
    plt.axis("off")

it gets everything correct

wooden sail Feb 27, 2024, 12:55 PM

#

you'll have to compare what the difference between the images gotten with take vs with cv2 is

void crescent Feb 27, 2024, 12:57 PM

#

it always predicts U

wooden sail Feb 27, 2024, 12:58 PM

#

are you loading the exact same image just with 2 different methods?

void crescent Feb 27, 2024, 1:03 PM

#

no its reshuffled

#

in the cv2.imread()

wooden sail Feb 27, 2024, 1:13 PM

#

wdym by reshuffled?

void crescent Feb 27, 2024, 1:50 PM

#

ok

#

OHHHHH

#

bro

#

this is why i should start acting on my own

#

and stop reading online guides

desert oar Feb 27, 2024, 2:15 PM

#

void crescent this is why i should start acting on my own

a true moment of clarity!

void crescent Feb 27, 2024, 2:17 PM

#

idk why i wanted to use cv2.imread

#

i should have just used dataset.take()

#

and returned the result

potent pollen Feb 27, 2024, 2:46 PM

#

I would like to create a backprop algorithm but I struggle to find a convincing explanation of the process. Do you guys have a pdf file or stuff that explain it clearly pls ?

past meteor Feb 27, 2024, 2:58 PM

#

potent pollen I would like to create a backprop algorithm but I struggle to find a convincing ...

I usually don't like video content but this is the exception to the rule. https://www.youtube.com/watch?v=VMj-3S1tku0 or https://www.youtube.com/watch?v=i94OvYb6noo

#

Most backprop guides don't place enough emphasis on automatic differentiation imo
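
To make the automatic-differentiation point concrete, here is a minimal scalar reverse-mode sketch. It is an illustration written for this thread, supporting only `+` and `*` between `Value` objects; the class and attribute names are choices made for this example, not any library's API.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None   # replaced by the op that creates this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so every node comes after its inputs,
        # then apply the chain rule from the output back to the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b * w          # y = w*x + b*w
y.backward()
print(x.grad, w.grad, b.grad)   # 2.0 4.0 2.0
```

Backprop in a neural network is this same procedure applied to the loss, with tensors in place of scalars. Note how `w` is used twice and its gradient accumulates from both paths.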

potent pollen Feb 27, 2024, 3:00 PM

#

Ok tysm I will watch them asap!

final kiln Feb 27, 2024, 3:21 PM

#

still very much work in progress, but not too shabby right ? compile time checks on matrix multiplication and inference of the resulting dimensions

past meteor Feb 27, 2024, 3:24 PM

#

final kiln still very much work in progress, but not too shabby right ? compile time checks ...

Are you making a rust library?

final kiln Feb 27, 2024, 3:26 PM

#

past meteor Are you making a rust library?

I'm porting my pytorch code to rust (torch c++ bindings in rust), just playing with the idea of compile time matrix checks, might include it in a utilities file for this project but if it proves super useful I might make it into one yeah

past meteor Feb 27, 2024, 3:27 PM

#

final kiln I'm porting my pytorch code to rust (torch c++ bindings in rust), just playing w...

Maybe this can be an inspiration: https://github.com/dragonfly-ai/slash

#

They do the same compile time shape checking etc.

final kiln Feb 27, 2024, 3:28 PM

#

past meteor Maybe this can be an inspiration: <https://github.com/dragonfly-ai/slash>

interesting, will check it out for sure

#

never coded in scala tho, and am learning rust so idk if I'll be able to find good connections due to me being newbie in both

#

in rust it seems I'll be entirely relying on macros to do this

past meteor Feb 27, 2024, 3:34 PM

#

Ah that's fair

long canopy Feb 27, 2024, 4:15 PM

#

is the only way to get the FLOPS of a piece of code to manually analyze the code and add counters?

wispy pulsar Feb 27, 2024, 4:17 PM

#

So I'm downloading Nous Hermes 2 Mistral DPO, would this be a good AI to use for python coding and data analysis?

agile cobalt Feb 27, 2024, 4:21 PM

#

wispy pulsar So I'm downloading Nous Hermes 2 Mistral DPO, would this be a good AI to use for...

by "for python coding and data analysis", you mean literally asking the model to write code for you?
If so, no - no model is good for that, and we strongly recommend against doing that at all.

wispy pulsar Feb 27, 2024, 4:21 PM

#

It's the only option I have right now with my knowledge on coding and structure.

#

If it can calculate the data that I receive that's good enough for me anyways

long canopy Feb 27, 2024, 4:36 PM

#

looks like doing FLOPS estimation is an absolute pita

void crescent Feb 27, 2024, 4:38 PM

#

@app.route('/predict')
def predict():
  pred_array = get_img_array()

  for i in pred_array:
    url = i[0]
    label = i[1]
    prediction = i[2]
  
  return f'<center><img url="{url}" height=224 width=224><h3>Prediction: {prediction}</h3></center>'

guys i need help in displaying the images and predictions of my model in a flask application

#

i think this will work

#

wait nvm

#

what exactly do i do
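
Two things stand out in the snippet above: `<img>` takes a `src` attribute rather than `url`, and the `return` sits after the loop, so only the last item is ever rendered. A minimal sketch of one possible fix follows, assuming each entry of `pred_array` is a `(url, label, prediction)` triple; `render_predictions` and the sample data are hypothetical names, not part of the original app.

```python
from html import escape


def render_predictions(pred_array):
    """Build one HTML snippet containing every image and its prediction."""
    blocks = []
    for url, label, prediction in pred_array:
        blocks.append(
            '<div style="text-align:center">'
            f'<img src="{escape(url)}" height="224" width="224">'
            f'<h3>Prediction: {escape(str(prediction))}</h3>'
            '</div>'
        )
    return "\n".join(blocks)


# Hypothetical sample data in the (url, label, prediction) shape assumed above.
sample = [("/static/cat.jpg", "cat", "cat"), ("/static/dog.jpg", "dog", "cat")]
page = render_predictions(sample)
print(page)
```

Inside the Flask view, the body would then reduce to `return render_predictions(get_img_array())`, provided the image URLs are reachable by the browser.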

steady hedge Feb 27, 2024, 5:23 PM

#

Yoooooo

#

Evening

#

I want to ask a question??... Been into data science for a while.... I need a platform where I can test my skills

north bluff Feb 27, 2024, 5:26 PM

#

Try Kaggle

long canopy Feb 27, 2024, 5:39 PM

#

is MIPS a good measure for code performance, instead of FLOPS?

#

no: different instructions have different costs

agile cobalt Feb 27, 2024, 5:48 PM

#

there are no really good benchmarks/ways to measure things, at least looking at it in an absolute way

anything that gives you one single absolute number as a way of measuring something will be ignoring a lot of context, and that context can make enormous differences

you can compare how two things perform on your system for your problem, but that isn't always going to transfer 1:1 to other people's environments

long canopy Feb 27, 2024, 5:51 PM

#

agile cobalt there are no *really* good benchmarks/ways to measure things, at least looking a...

yeah I just need to optimize distribution of tasks over a cluster of computers with different abilities

#

might just benchmark a piece of the algorithm I need to run on each computer, and use that as the "unit" measurement
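
A rough sketch of that idea: run the same micro-benchmark on every machine, then hand out tasks in proportion to the scores. The function name, node names, and scores are hypothetical, and this ignores network cost and uneven task sizes.

```python
def split_tasks(total_tasks, scores):
    """Split total_tasks across machines in proportion to their benchmark scores."""
    total_score = sum(scores.values())
    exact = {name: total_tasks * s / total_score for name, s in scores.items()}
    shares = {name: int(value) for name, value in exact.items()}
    # Hand out the tasks lost to rounding down, largest fractional part first.
    leftover = total_tasks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - shares[name], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical scores: work units per second from the same benchmark on each node.
shares = split_tasks(100, {"node_a": 1.0, "node_b": 3.0})
print(shares)
```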

iron basalt Feb 27, 2024, 8:06 PM

#

long canopy is the only way to get the FLOPS of a piece of code to manually analyze the code...

Yes, but you need to look at the compiled output of the program (the disassembly). Also, some people count MADs (multiply-adds) as two floating-point operations and others count them as one, so when comparing numbers it's important to know how the FLOPs were counted. If MADs are counted as two, programs that make more use of them will pretty much always win in a FLOPs measurement (e.g. Nvidia counts them as two, so the theoretical FLOPs figure on the box, used as a selling point, is much higher than what many programs reach in practice).

long canopy Feb 27, 2024, 8:08 PM

#

iron basalt Yes, but you need to look at the compiled output of a program (the disassembly)....

what an absolute pain

iron basalt Feb 27, 2024, 8:09 PM

#

long canopy what an absolute pain

You may want to just measure throughput more directly, how many GB/s of input data can it process.

long canopy Feb 27, 2024, 8:10 PM

#

iron basalt You may want to just measure throughput more directly, how many GB/s of input da...

definitely, thanks a lot for the input

iron basalt Feb 27, 2024, 8:11 PM

#

iron basalt You may want to just measure throughput more directly, how many GB/s of input da...

That is, transform to the correct output, regardless of what in-between steps are used (maybe no FLOPs at all).
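
A minimal sketch of measuring throughput this way, using only the standard library. `measure_throughput` is a hypothetical helper and `sum` stands in for the real workload; the payload size and repeat count are arbitrary.

```python
import time


def measure_throughput(func, payload, repeats=5):
    """Return the best observed throughput of func(payload) in bytes per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return len(payload) / best


payload = bytes(range(256)) * 4096  # about 1 MiB of sample input
rate = measure_throughput(lambda data: sum(data), payload)
print(f"{rate / 1e9:.3f} GB/s")
```

Taking the best of several runs reduces noise from other processes; the number is only comparable between machines if they run the same function on the same input.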

iron basalt Feb 27, 2024, 8:13 PM

#

long canopy yeah I just need to optimize distribution of tasks over a cluster of computers w...

Before spending a bunch of time on parallelization, apply Amdahl's law to see if it's actually going to be worth it / do anything.
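
For reference, Amdahl's law says that if a fraction p of the runtime can be parallelized over n workers, the overall speedup is 1 / ((1 - p) + p / n). A small sketch, with 0.9 as a made-up parallel fraction:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup when only parallel_fraction of the work scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workload where 90% of the runtime parallelizes perfectly.
for n in (2, 10, 100):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With 10% serial work the speedup can never reach 10x, no matter how many machines are added.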

long canopy Feb 27, 2024, 8:13 PM

#

iron basalt Before spending a bunch of time on parallelization, apply Amdahl's law to see if...

woah, had no clue this existed!

#

very cool, thanks seriously, will be reading up on it properly

#

i need to do very concrete stuff and need very concrete numbers/data for this part of the setup so that definitely helps

long canopy Feb 28, 2024, 1:45 AM

#

anyone got resources on distributed inference?

karmic zealot Feb 28, 2024, 6:07 AM

#

Hello everyone, how can I learn data science and ai effectively, what should I learn before jumping into this field?

worldly dawn Feb 28, 2024, 6:15 AM

#

karmic zealot Hello everyone, how can I learn data science and ai effectively, what should I l...

I would suggest to look at the pinned resources

heavy hornet Feb 28, 2024, 6:19 AM

#

Hi, so I am trying to clone and install an API, but during that process I am getting this error. The only reason I can think of is that it's trying to access a file located in the Program Files folder on the C drive, but I installed git on my E drive. Any idea how to resolve this?

#

pip3 install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI

GitHub

GitHub - philferriere/cocoapi: Clone of COCO API - Dataset @ http:/...

Clone of COCO API - Dataset @ http://cocodataset.org/ - with changes to support Windows build and python3 - philferriere/cocoapi

#

This was the command that i tried to run

long canopy Feb 28, 2024, 6:45 AM

#

anyone have thoughts on embedding programming language tokens

final kiln Feb 28, 2024, 7:22 AM

#

long canopy anyone have thoughts on embedding programming language tokens

My first thought is that it doesn't sound very efficient in terms of implementation, at least in the general case. Programming languages evolve and get new features and keywords all the time. Do this for every programming language out there and it starts to look hard to even keep track of over time. Once you had an unsupervised setting where you just throw text at the model and it learns syntax, now you have to change the embedder every X months.

I think it may be a good idea for self contained models specialized in a single language or in a single language version. It would definitely result in smaller models due to the reduced vocabulary size

buoyant steppe Feb 28, 2024, 10:37 AM

#

Is there anyone familiar with SARIMA model?

tender umbra Feb 28, 2024, 12:36 PM

#

i have a simple question: is this equation wrong, since it accounts only for p_1 and not p_2?

supple inlet Feb 28, 2024, 1:11 PM

#

I'm thinking of building a deep learning/LLM rig from a Dell PowerEdge R720 with 2x Tesla P40. Has anyone here done this or something similar? It seems like the most cost-effective route compared to 3090s or 4090s, and I'm hoping I'll learn more about servers

serene scaffold Feb 28, 2024, 1:46 PM

#

tender umbra i have a simple question, is this equation wrong, since it does account for only...

what is p_2, in this context?

serene scaffold Feb 28, 2024, 1:48 PM

#

supple inlet I'm thinking of building a deep learning/LLM rig from a Dell PowerEdge R720 ...

how much memory does each Tesla p40 have? Because having two GPUs with y memory is worse than having one GPU with 2y memory. It's not the same as CPUs.

supple inlet Feb 28, 2024, 2:19 PM

#

24gb each, im based in UK and can pick them up for about £150. For a 24gb 3090 on ebay its about £630 and 4090 is £1700+

tender umbra Feb 28, 2024, 2:28 PM

#

serene scaffold what is p_2, in this context?

it's a decision tree to predict cat or dog. p_1 is the probability that it's a cat, p_2 that it's a dog. this is from Andrew Ng's course on decision trees.

#

https://www.coursera.org/learn/advanced-learning-algorithms/lecture/ZSbs2/choosing-a-split-information-gain

Coursera

Choosing a split: Information Gain - Decision trees | Coursera

Video created by DeepLearning.AI, Stanford University for the course "Advanced Learning Algorithms". This week, you'll learn about a practical and very commonly used learning algorithm the decision tree. You'll also learn about variations of the ...

strong wagon Feb 28, 2024, 2:39 PM

#

Hey guys, sorry if this is the wrong forum but I was wondering if anyone had any books they could recommend to get started with NLP? There are a lot of different sources out there and was wondering if there were any someone in here found more useful than others.

lapis sequoia Feb 28, 2024, 2:40 PM

#

strong wagon Hey guys, sorry if this is the wrong forum but I was wondering if anyone had any...

Daniel Jurafsky and James H Martin. Speech and Language Processing

serene scaffold Feb 28, 2024, 2:41 PM

#

strong wagon Hey guys, sorry if this is the wrong forum but I was wondering if anyone had any...

This is the right channel. However, I wouldn't recommend a book for learning about NLP. Though reasonable people can disagree.

boreal gale Feb 28, 2024, 2:42 PM

#

tender umbra https://www.coursera.org/learn/advanced-learning-algorithms/lecture/ZSbs2/choosi...

the equation is right.

H(p_1^left) is entropy of the subgroup after the left split, it contains terms of p_1 and p_2 inside

more precisely:

H(p_1^left) = -p_1 log2(p_1) - p_2 log2(p_2), where p_2 = 1 - p_1
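
A small sketch of the two-class entropy and the information gain of a split, in bits. The function names and the example split are made up for illustration; `w_left` and `w_right` are the fractions of samples that go to each branch.

```python
import math


def binary_entropy(p1):
    """Entropy in bits of a two-class node, where p1 is the fraction of class 1."""
    if p1 in (0.0, 1.0):
        return 0.0  # by convention 0 * log2(0) = 0
    p2 = 1.0 - p1
    return -p1 * math.log2(p1) - p2 * math.log2(p2)


def information_gain(p1_root, p1_left, w_left, p1_right, w_right):
    """Entropy at the root minus the weighted entropy of the two branches."""
    weighted = w_left * binary_entropy(p1_left) + w_right * binary_entropy(p1_right)
    return binary_entropy(p1_root) - weighted


# Hypothetical split: 10 animals, 5 cats; left branch has 4/5 cats, right has 1/5.
gain = information_gain(0.5, 0.8, 0.5, 0.2, 0.5)
print(round(gain, 3))
```

Note that p_2 never has to be passed in: it is fully determined by p_1, which is why the slide can write the entropy as a function of p_1 alone.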

tender umbra Feb 28, 2024, 2:50 PM

#

boreal gale the equation is right. `H(p_1^left)` is entropy of the subgroup after the left ...

oh got it. thanks

final kiln Feb 28, 2024, 2:55 PM

#

what's the intuition behind the information entropy definition ? in physics entropy measures the number of microstates associated with a given macrostate

boreal gale Feb 28, 2024, 2:55 PM

#

tender umbra oh got it. thanks

but it's worth noting i have not seen it written like that before.. it's only after i watched his video you linked that i realised that's what he meant

boreal gale Feb 28, 2024, 2:59 PM

#

final kiln what's the intuition behind the information entropy definition ? in physics entr...

the way i think of it is that entropy is a measure of "how many bits do i need to recreate the information i was given" (with some degree of hand waving)

brave cobalt Feb 28, 2024, 2:59 PM

#

does anyone else have experience with the 429 error? i'm using the pytrends library

final kiln Feb 28, 2024, 3:00 PM

#

boreal gale the way i think of it is that entropy is a measure of "how many bit do i need to...

So the macro state is the message and the microstate is the bits that build up the message

#

Wait no that doesn't make sense

boreal gale Feb 28, 2024, 3:00 PM

#

brave cobalt is there someone else that experience with 429 error? im using pytrends library

HTTP 429 means you are exceeding some rate limit and the server is complaining; don't make so many requests at once or in such a short time

final kiln Feb 28, 2024, 3:00 PM

#

Uhmmm

brave cobalt Feb 28, 2024, 3:04 PM

#

boreal gale HTTP 429 means you are exceeding some rate limit and the server is complaining, ...

i've tried that, i reduced my requests and it still doesn't work. it feels like google has just blocked me from their API

#

i use my laptop and PC to run the code, it's all the same

final kiln Feb 28, 2024, 3:05 PM

#

final kiln Uhmmm

I think I can see it more or less. The more certain a bit is, the less it will be capable at conveying a message

boreal gale Feb 28, 2024, 3:05 PM

#

final kiln So the macro state is the message and the microstate is the bits that build up t...

so say you have an array of bits of length 10

1111111111 and 0000000000 have 0 entropy, meaning you need almost nothing to recreate that array
1010101010 has an entropy of 1 bit per symbol, the maximum, meaning you kinda need to store exactly that array to recreate it
(but now that i think of it, it's not really true, is it? store 10 and a run length of 5, that's not exactly 10 bits.. i guess the point is that shannon entropy measures this without any further method of compression / purely from a probabilistic point of view)
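
The numbers above can be checked with a short sketch that estimates entropy from symbol frequencies alone. `empirical_entropy` is a hypothetical helper; it treats the string as independent draws and ignores order, which is exactly why it cannot see the repeating "10" pattern.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Entropy in bits per symbol, estimated from the symbol frequencies only."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


for bits in ("1111111111", "0000000000", "1010101010"):
    print(bits, empirical_entropy(bits))
```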

tender umbra Feb 28, 2024, 3:06 PM

#

final kiln what's the intuition behind the information entropy definition ? in physics entr...

entropy is a nice metric here to judge how pure the dataset is. if the dataset has only dogs or only cats, entropy is 0. the maximum is when it's equally balanced. you don't need to use entropy; any other function with similar characteristics works. in fact people often use 1-p_1^2-p_2^2 (aka gini) instead of entropy these days, because it's cheaper to compute.
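
A quick sketch of the Gini impurity for the two-class case, to show it has the same shape as entropy: zero for a pure node, largest at a 50/50 split. `gini_impurity` is a hypothetical name.

```python
def gini_impurity(p1):
    """Two-class Gini impurity, where p1 is the fraction of class 1."""
    p2 = 1.0 - p1
    return 1.0 - p1 ** 2 - p2 ** 2


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, gini_impurity(p))
```

It avoids the logarithm entirely, which is where the cheaper computation comes from.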

boreal gale Feb 28, 2024, 3:06 PM

#

brave cobalt ive tried with, reduced my request and it doesnt work, i felt like google just b...

inspect the headers and see if it's suggesting a time where you can start using it again

final kiln Feb 28, 2024, 3:07 PM

#

Wait but isnt entropy calculated over all possible events ? A single bit string would be a realization of a random variable

brave cobalt Feb 28, 2024, 3:08 PM

#

boreal gale inspect the headers and see if it's suggesting a time where you can start using ...

#

like this?

final kiln Feb 28, 2024, 3:08 PM

#

Yeah looking at a single bit

tender umbra Feb 28, 2024, 3:09 PM

#

final kiln Wait but isnt entropy calculated over all possible events ? A single bit string ...

above picture considers both p_1 and p_2. those are the only two possibilities. but in general there can be more terms. i usually think of it as a metric to measure purity in this case, nothing more.

boreal gale Feb 28, 2024, 3:10 PM

#

brave cobalt

not really. look at the response object that you were given, from whatever python library you are using - the headers of the HTTP response should be parsed and stored in the object
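
A sketch of what inspecting the headers could look like. It assumes you can get the response headers as a plain mapping; how to reach them through pytrends is not shown here, and the server is not obliged to send `Retry-After` at all. `retry_after_seconds` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, now=None):
    """Return how long to wait according to a Retry-After header, or None if absent."""
    value = None
    for name, header_value in headers.items():
        if name.lower() == "retry-after":
            value = header_value.strip()
    if value is None:
        return None
    if value.isdigit():
        return float(value)               # delay given directly in seconds
    when = parsedate_to_datetime(value)   # otherwise an HTTP date
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds({"Retry-After": "120"}))
print(retry_after_seconds({"Content-Type": "text/html"}))
```

If the header is missing, backing off for progressively longer intervals between attempts is the usual fallback.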

final kiln Feb 28, 2024, 3:10 PM

#

tender umbra above picture considers both p_1 and p_2. those are the only two possibilities. ...

By pure, do you mean that the prob dist is uniform ?

tender umbra Feb 28, 2024, 3:11 PM

#

no not uniform. pure mean only dogs or only cats. so opposite of uniform

#

uniform would be a set with an equal number of dogs and cats.

final kiln Feb 28, 2024, 3:12 PM

#

But in your picture entropy seems to be maximum at 0.5

tender umbra Feb 28, 2024, 3:12 PM

#

we are trying to minimise entropy, or maximise information gain. same thing

final kiln Feb 28, 2024, 3:13 PM

#

Ah okay it's a different measure I see

tender umbra Feb 28, 2024, 3:14 PM

#

yes. you can go through the lecture. he explains in simple ways

final kiln Feb 28, 2024, 3:14 PM

#

No wait in the wiki it says it's entropy

#

So if the coin is balanced entropy is maximized

tender umbra Feb 28, 2024, 3:14 PM

#

do you want to join on a vc channel?

boreal gale Feb 28, 2024, 3:14 PM

#

i usually think of it as a metric to measure purity in this case,
high entropy = low purity
low entropy = high purity

#

it's flipped, maybe that's the source of confusion

final kiln Feb 28, 2024, 3:15 PM

#

final kiln No wait in the wiki it says it's entropy

I'm looking at entropy here

#

So 0.5 = max entropy

boreal gale Feb 28, 2024, 3:16 PM

#

Pr(X=1) = 0.5 leads to max entropy

#

not that 0.5 is the max entropy

final kiln Feb 28, 2024, 3:16 PM

#

Right

#

I mean, just a choice of words

#

0.5 => max entropy

boreal gale Feb 28, 2024, 3:16 PM

#

yep

final kiln Feb 28, 2024, 3:17 PM

#

Okay so, the more uniformly distributed the dataset is, the higher the entropy

#

At least in the coin flip case

boreal gale Feb 28, 2024, 3:17 PM

#

indeed

final kiln Feb 28, 2024, 3:17 PM

#

So entropy measures how uniformly distributed a dataset is ?

boreal gale Feb 28, 2024, 3:18 PM

#

yep that's a valid interpretation

final kiln Feb 28, 2024, 3:18 PM

#

Uhm okay

#

Gonna see if I can relate it to information

#

I don't see how uniformity implies information in some capacity. Since if the random var is uniformly distributed I can't really use it to encode a message

#

Maybe the other way around ? If it's not uniformly distributed I can decode it very easily

#

Or at least, predict it from the others

#

So in your previous example with the random string of bits

#

What is missing is the distribution itself

#

So, (00000000, P(0)=1) means low entropy

#

Because you just need one 0 to predict the entire sequence

#

As you decrease P(0), one 0 will correspond to a higher number of possibilities

#

Idk if I'm getting anywhere here >.>

#

Ah okay the description right after the graph is very telling

#

The entropy of the unknown result of the next toss of the coin is maximized if the coin is fair (that is, if heads and tails both have equal probability 1/2). This is the situation of maximum uncertainty as it is most difficult to predict the outcome of the next toss;

#

So it's measuring uncertainty

wooden sail Feb 28, 2024, 3:26 PM

#

final kiln I don't see how uniformity implies information in some capacity. Since if the ra...

this is correct though, uniform distributions have maximum entropy

#

encoding techniques for communications seek to yield a dictionary of symbols that are close to equiprobable

final kiln Feb 28, 2024, 3:27 PM

#

I'm just having trouble seeing how high uncertainty = high information content

wooden sail Feb 28, 2024, 3:27 PM

#

they're almost synonymous

final kiln Feb 28, 2024, 3:27 PM

#

I think I can see the uncertainty bit

#

Wait ,are they

#

If I send a message there's not much uncertainty in it, but a ton of information

wooden sail Feb 28, 2024, 3:28 PM

#

there's a precise definition of information here

#

as an example, most text can be compressed

#

that hints at the existence of a more compact representation of the exact same content with different, fewer symbols

#

you can achieve that by removing all the redundancy and expected patterns from text. those are the things that make it easy to read for you as a person, but they reduce the uncertainty in the string of symbols that represent the content

final kiln Feb 28, 2024, 3:30 PM

#

I'm looking for the definition in the wiki

wooden sail Feb 28, 2024, 3:30 PM

#

things like huffman encoding do exactly this

final kiln Feb 28, 2024, 3:33 PM

#

I'm not seeing a formal definition of information

#

Just the entropy formula

#

There's a paper on this from the 40's

wooden sail Feb 28, 2024, 3:34 PM

#

that is what entropy is

final kiln Feb 28, 2024, 3:34 PM

#

Guess I'm reading it

#

Then I don't get it

wooden sail Feb 28, 2024, 3:35 PM

#

entropy is the standard measure of information of a random variable

final kiln Feb 28, 2024, 3:35 PM

#

Right I understood that far

#

But in real terms

wooden sail Feb 28, 2024, 3:35 PM

#

the idea being that if a random variable is constant, it carries 0 information since you already know what it is

final kiln Feb 28, 2024, 3:36 PM

#

Okay sure, but from my point of view, I can't use a random process to convey information right

#

Like if it's uniform

wooden sail Feb 28, 2024, 3:36 PM

#

the classical example is that if you're in the desert and are dying of thirst, the knowledge that it's hot and sunny carries no value to you, but the rare event that it rains has a huge amount of information

wooden sail Feb 28, 2024, 3:36 PM

#

final kiln Okay sure, but from my point of view, I can't use a random process to convey inf...

this is exactly the idea: it's the randomness that carries information

#

if it's not random, you already know the outcome and it is moot

#

and a uniform distribution is, in this sense, "the most random" because you can make no guess as to what event happens next

#

if one event happens more often, it immediately means it is less informative because you expect it to happen
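
In formula terms, the information carried by an event with probability p is -log2(p) bits, and entropy is the average of that over the distribution. A short sketch using the desert example; the 0.99 and 0.01 probabilities are invented for illustration.

```python
import math


def surprisal_bits(p):
    """Information, in bits, gained by observing an event of probability p."""
    return -math.log2(p)


def entropy_bits(probs):
    """Average surprisal over a distribution given as a list of probabilities."""
    return sum(p * surprisal_bits(p) for p in probs if p > 0)


# Hypothetical desert weather: sunny 99% of days, rain 1% of days.
print("sunny:", round(surprisal_bits(0.99), 3), "bits")
print("rain: ", round(surprisal_bits(0.01), 3), "bits")
print("entropy, desert:", round(entropy_bits([0.99, 0.01]), 3))
print("entropy, fair coin:", entropy_bits([0.5, 0.5]))
```

The rare event carries far more bits than the common one, yet the desert distribution as a whole has low entropy because the rare event seldom happens.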

final kiln Feb 28, 2024, 3:37 PM

#

So would you say that entropy measures how much information a particular string of bytes has about the prob distribution that is generating it

wooden sail Feb 28, 2024, 3:37 PM

#

yes

final kiln Feb 28, 2024, 3:37 PM

#

Alright I got it then, it makes some sense

wooden sail Feb 28, 2024, 3:38 PM

#

and entropy in base 2 in particular measures the number of bits needed to describe an event happening

#

for a uniform distribution, you need the maximum number of bits per event, log2 of the number of possible events

#

a very biased distribution can be explained with way fewer bits

#

now you can think back to the concrete example of huffman encoding for text

#

what do you do there? if a letter or string of letters happens very often, it is less informative and we represent it with fewer bits
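
A compact sketch of Huffman coding that returns only the code length assigned to each symbol, which is enough to see frequent symbols getting shorter codes. `huffman_code_lengths` is a hypothetical helper and the input string is a toy example.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, tie_breaker, {symbol: depth so far}).
    heap = [(count, i, {symbol: 0}) for i, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol inside moves one level deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "aaaabbc"
lengths = huffman_code_lengths(text)
total_bits = sum(lengths[ch] for ch in text)
print(lengths, total_bits)
```

Here the most frequent letter gets a 1-bit code and the rarer ones get 2 bits each.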

final kiln Feb 28, 2024, 3:39 PM

#

But so like, you only know an accurate value for entropy if you know the prob distribution

#

So from the string of bytes, I can't really be certain about their entropy

wooden sail Feb 28, 2024, 3:40 PM

#

right. in communications, you construct the source yourself too, it's your task to make it have a good distribution

final kiln Feb 28, 2024, 3:40 PM

#

In which case, how is it useful, is it another case of large numbers making it accurate

wooden sail Feb 28, 2024, 3:40 PM

#

and if you hope to describe any statistical event, you need to either know or learn the statistical distribution (this is what ML is about)

final kiln Feb 28, 2024, 3:41 PM

#

I think I get yeah, awesome, thank you for your help I think I would've taken like a day or two to dig this one up

final kiln Feb 28, 2024, 3:41 PM

#

wooden sail right. in communications, you construct the source yourself too, it's your task ...

Also interesting

wooden sail Feb 28, 2024, 3:41 PM

#

one way of thinking about ML is "i have no idea about the statistical distribution of data. lemme hook up this black box thing and show it so many examples that it learns the distribution on its own"

final kiln Feb 28, 2024, 3:42 PM

#

Makes sense too.

wooden sail Feb 28, 2024, 3:42 PM

#

(noting that estimation theory uses a different, though tangentially related, definition of information)

novel notch Feb 28, 2024, 4:21 PM

#

Hello.
Is there anyway to match two shapes that consist of two or more contours in opencv?

#

I tried using matchShapes with grayscaled images but the results were totally inaccurate

long canopy Feb 28, 2024, 5:25 PM

#

wooden sail as an example, most text can be compressed

results about huffman encoding are what is supporting this claim?

#

intuitively it looks true

final kiln Feb 28, 2024, 5:30 PM

#

long canopy intuitively it looks true

Yeah intuition is easy for this one, like you can omit 'lot stuff from english still understand message'

long canopy Feb 28, 2024, 5:31 PM

#

right but I wonder whether it depends on some statistical properties of the reducible text, or whether the reduced text has some sort of specific statistical properties wrt. the reducible text

final kiln Feb 28, 2024, 5:32 PM

#

Not sure if I understood what you mean. I think that there's something fundamental to the reduced text, as in, it's the smallest amount of bits needed to represent the text

#

Then you add redundancy and you get the English language

wooden sail Feb 28, 2024, 5:36 PM

#

both

#

and meaning of the text is also a different matter 😛

final kiln Feb 28, 2024, 5:42 PM

#

how can the redundancy part be important tho, isn't it just something you shave off anyway

wooden sail Feb 28, 2024, 5:43 PM

#

so the cool thing about shannon's coding theorem is that it's not constructive

#

it tells you there's a theoretical amount of information your data contains, but not how to reach it

#

you can't in general reduce a message to its bare minimum, and even if you could, there's possibly more than one way of doing it

final kiln Feb 28, 2024, 5:45 PM

#

that actually sounds kinda odd

wooden sail Feb 28, 2024, 5:46 PM

#

data can be encoded in more than one way, and you can compare how "efficient" each way is w.r.t. the theoretical "best code"

final kiln Feb 28, 2024, 5:46 PM

#

i'd assume that the "purest" way to encode data is binary

wooden sail Feb 28, 2024, 5:47 PM

#

but you can't directly find what that best code is in any straightforward way

#

there's more than one way to encode something in binary

#

the entropy helps you find how many bits you need, but not how to construct the bit stream

final kiln Feb 28, 2024, 5:48 PM

#

sure, but once you compress it, the final representation should be equal for all of them - this totally coming from what my brain feels like it should be tho lol

wooden sail Feb 28, 2024, 5:48 PM

#

no, that's the point

#

shannon's theorems don't tell you what the "final representation" is

#

in fact, it may not be achievable at all

final kiln Feb 28, 2024, 5:49 PM

#

but there is one and is unique right ?

wooden sail Feb 28, 2024, 5:49 PM

#

nope

#

that's what i said above

final kiln Feb 28, 2024, 5:49 PM

#

0101, there's no other way of describing this

#

anything else would be a change of symbols

wooden sail Feb 28, 2024, 5:49 PM

#

nothing is said about whether you can achieve it (might not exist at all) or whether it's unique (may be more than one)

wooden sail Feb 28, 2024, 5:49 PM

#

final kiln anything else would be a change of symbols

this is the whole point

#

the meaning of symbols is assigned by the encoder and decoder

#

you don't know what 0101 represents

#

it could be an int, a char, or anything else encoded in an arbitrary way

#

you can pick that yourself

#

same as the number 1 can be stored as a float, int, short, long, char, str, etc

#

those all use a different number of bits to represent the same thing

final kiln Feb 28, 2024, 5:50 PM

#

that's not what I'm getting at tho, what I'm saying is that all those "meanings" eventually reduce to that bit string

wooden sail Feb 28, 2024, 5:51 PM

#

how do you mean?

final kiln Feb 28, 2024, 5:51 PM

#

if that makes sense

#

like, you can have a string and a number, represent them in binary and they end up being represented by the same bit string

#

after ideal compression

wooden sail Feb 28, 2024, 5:52 PM

#

i'm not sure i get what you mean

#

the original meaning is unique

#

the encoding is arbitrary and not unique

final kiln Feb 28, 2024, 5:53 PM

#

so you have a string of text, which you can represent in binary and then compress it

#

actually idk if it makes sense, never studied this stuff, but my intuition is that if there is a process for compressing a message to its bare minimum bits, there should be a unique representation

wooden sail Feb 28, 2024, 5:55 PM

#

neither of those is true though 😛

#

there is no general process to do that, and the representation is not unique in general, if it exists at all

#

the only thing the theorem can guarantee you is that if you use any fewer bits than the limit, you immediately lose information and cannot recover the original meaning

final kiln Feb 28, 2024, 5:56 PM

#

can't you have something like byte pair encoding

#

but reducing it to the smallest bit string possible

wooden sail Feb 28, 2024, 5:57 PM

#

no

final kiln Feb 28, 2024, 5:57 PM

#

it doesn't seem like it should be impossible

wooden sail Feb 28, 2024, 5:59 PM

#

you can look this up, any compression scheme you can imagine, and this one ofc included because language models use byte pair encoding a lot, has been compared to the shannon limit

teal adder Feb 28, 2024, 6:00 PM

#

Someone sends a machine learning project in python?

final kiln Feb 28, 2024, 6:01 PM

#

so if I have a vocabulary of one letter, but I can repeat it

"aaaa"

can't I encode it like so

"aaaa" -> 4 - b'100'

wooden sail Feb 28, 2024, 6:01 PM

#

if you like

final kiln Feb 28, 2024, 6:01 PM

#

isn't that the theoretical minimum number of bits

wooden sail Feb 28, 2024, 6:02 PM

#

is it? that is not random if that is the whole string

#

so it has 0 entropy

#

if you consider it in the context of a larger dictionary, then you have to study the statistical properties of that 😛

final kiln Feb 28, 2024, 6:02 PM

#

well yeah more bits make it redundant, and if I take one bit I change the string

wooden sail Feb 28, 2024, 6:03 PM

#

btw there's a proof that all compression algorithms will fail for at least one data input, yielding a sequence that is even longer than the original

#

so no single algorithm could reach the shannon limit for all data

final kiln Feb 28, 2024, 6:03 PM

#

ah

wooden sail Feb 28, 2024, 6:03 PM

#

and really very few compression algorithms have been shown to be able to do that for a single type of data in the first place

#

the proof is cute, by the pigeonhole principle

#

if you start with files of a size N and compress them to size M, M < N, with a fixed compression alg that is invertible

final kiln Feb 28, 2024, 6:06 PM

#

reason why I think it's weird, is because physically there's a maximum amount of information you can place in a given volume of space, once you reach it you get a black hole and all the information gets encoded at the surface

wooden sail Feb 28, 2024, 6:06 PM

#

the number of files of length M is smaller than the number of files of length N. so either the algorithm was not invertible and you violated the shannon limit/lost information, or some of the encoded files are actually of length M' >= N
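
The counting behind this pigeonhole argument is easy to check numerically: there are 2^N bit strings of length N but only 2^N - 1 strings that are strictly shorter, so an invertible compressor cannot shrink all of them. The helper names below are hypothetical; the `zlib` call just shows one real compressor producing output longer than a tiny input.

```python
import zlib


def count_strings_of_length(n):
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n


def count_strings_shorter_than(n):
    """Number of distinct bit strings of length 0, 1, ..., n-1 (equals 2^n - 1)."""
    return sum(2 ** k for k in range(n))


n = 8
inputs = count_strings_of_length(n)
outputs = count_strings_shorter_than(n)
print(inputs, "inputs but only", outputs, "shorter outputs")

# A real compressor on a one-byte input: the result is longer than the input.
compressed = zlib.compress(b"a")
print(len(b"a"), "byte in,", len(compressed), "bytes out")
```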

final kiln Feb 28, 2024, 6:07 PM

#

so I'd imagine the black hole surface as being like, the perfect compressed encoding for the stuff that's inside

wooden sail Feb 28, 2024, 6:07 PM

#

idk whether that analogy is true in the first place, and if so, whether it helps here anyway 😛

final kiln Feb 28, 2024, 6:07 PM

#

yes it is true, there's a thing about it

#

wait

#

https://en.wikipedia.org/wiki/Holographic_principle

Holographic principle

The holographic principle is a property of string theories and a supposed property of quantum gravity that states that the description of a volume of space can be thought of as encoded on a lower-dimensional boundary to the region — such as a light-like boundary like a gravitational horizon. First proposed by Gerard 't Hooft, it was given a prec...

#

it's helpful because it would be the perfect algorithm that you said doesn't exist

wooden sail Feb 28, 2024, 6:10 PM

#

well, we just showed above that it doesn't exist 😛

final kiln Feb 28, 2024, 6:11 PM

#

wooden sail well, we just showed above that it doesn't exist 😛

experiment always wins tho

#

so most likely scenario is I'm doing the analogy wrong

wooden sail Feb 28, 2024, 6:11 PM

#

you're mistaking "difficult" with "impossible"

frigid badge Feb 28, 2024, 6:12 PM

#

I'm currently working on making two of my simulations into a much bigger and complex one that simulates the human experience. so first up we use a hyper focused Machine learning model to try and mimic a "General Intelligence" (a general intelligence in the context of the simulation not the real world) This intelligence is placed in a world with different materials that have unique properties. These materials can be combined to create tools and technology based on a system of patterns, where the created items have the characteristics of the materials used. To manage this, I need to figure out a way to show the basic rules of this simulation, which is why I'm considering creating a simple programming language that will represent the physics in this world, allowing the MLM to learn how to work with it over time.

My question is: how can I do this in a way that allows for many different possibilities without slowing down the simulation or making it too complex for the MLM to handle effectively?

final kiln Feb 28, 2024, 6:12 PM

#

ah so I was sorta right then, just read it wrong

frigid badge Feb 28, 2024, 6:13 PM

#

frigid badge I'm currently working on making two of my simulations into a much bigger and com...

I'd love your opinions and ideas!

final kiln Feb 28, 2024, 6:13 PM

#

frigid badge I'm currently working on making two of my simulations into a much bigger and com...

oh I've had a similar idea for a programming language that encodes physical laws

#

was some time ago, and in the context of MC simulations, the idea was to have a language that came with the physical models already coded

#

instead of coding a lib for it

frigid badge Feb 28, 2024, 6:15 PM

#

final kiln oh I've had a similar idea for a programming language that encodes physical laws

not at all what im going for

#

more like simulating a new version of physics in relation to the simulation and what is present in it, as in it's flexible but with limitations forcing the MLM to be "creative"

final kiln Feb 28, 2024, 6:16 PM

#

"creating a simple programming language that will represent the physics in this world"

frigid badge Feb 28, 2024, 6:16 PM

#

this world being the simulation...

final kiln Feb 28, 2024, 6:16 PM

#

sure, similar to my idea, not the same

#

but you're probably looking for a game engine or something of the sort right

frigid badge Feb 28, 2024, 6:17 PM

#

not at all no, the simulation is already built...

#

My question is: how can I do this in a way that allows for many different possibilities without slowing down the simulation or making it too complex for the MLM to handle effectively?

#

im basically looking for ideas on how this can be done in a cost efficient way that also allows the MLM to be stuck within certain limits forcing it to look into different solutions

final kiln Feb 28, 2024, 6:18 PM

#

you mean like, you're trying to find a way to describe the simulated world to the language model ?

frigid badge Feb 28, 2024, 6:18 PM

#

more or less

final kiln Feb 28, 2024, 6:19 PM

#

uhm

#

is your model multi modal or can it be made so ?

frigid badge Feb 28, 2024, 6:19 PM

#

I want it to be a programming language so the model can use it further on to make tools, new materials and tech

frigid badge Feb 28, 2024, 6:19 PM

#

final kiln is your model multi modal or can it be made so ?

Im building the model from scratch around the programming language im making right now

final kiln Feb 28, 2024, 6:19 PM

#

ah I see, you want an API for the language model to interact with your simulation

frigid badge Feb 28, 2024, 6:19 PM

#

no

final kiln Feb 28, 2024, 6:20 PM

#

you want an actual programming language for it to interact with the world, or a DSL

frigid badge Feb 28, 2024, 6:22 PM

#

No, I don't want it to interact with the world via a programming language. I want to make a custom programming language to represent the world so the MLM can visualise what is around its 3D representation inside the simulation, fetching information like the structure of the material, its state [liquid, gas or solid] and the materials present inside said structure, all of this to allow it to try and create tools and tech to help in its survival / its community's survival.

final kiln Feb 28, 2024, 6:24 PM

#

Uhmm okay I think I understand, it's a DSL describing 3d space and physics

#

Why not use game engines and their code, sounds like you'd be duplicating a lot of work

frigid badge Feb 28, 2024, 6:25 PM

#

pretty much yea

final kiln Feb 28, 2024, 6:25 PM

#

With a game engine you could even visualize stuff easier

frigid badge Feb 28, 2024, 6:25 PM

#

not really, the information is back end

#

like everything is generated into custom classes that are then represented visually inside the simulation

#

and to make it a bit more difficult im using Ursina, a python based open source engine

final kiln Feb 28, 2024, 6:26 PM

#

Can't you use those classes ?

frigid badge Feb 28, 2024, 6:26 PM

#

yes but again I want the model to try and make complex structures using the programming language

final kiln Feb 28, 2024, 6:27 PM

#

Presumably if you code the world, and it renders, then that code should be enough to represent the world

#

Regardless of compilation steps in between

frigid badge Feb 28, 2024, 6:27 PM

#

...? how would radioactivity be rendered...?

final kiln Feb 28, 2024, 6:27 PM

#

Ah wait but it needs to be dynamic right

final kiln Feb 28, 2024, 6:27 PM

#

frigid badge ...? how would radioactivity be rendered...?

Wdym ?

frigid badge Feb 28, 2024, 6:28 PM

#

im talking about an extensive simulation with a ton of different attributes and physical laws, not all of them are going to be rendered because that would fry my equipment xd

final kiln Feb 28, 2024, 6:28 PM

#

Radioactivity renders like this

#

Quite literally

final kiln Feb 28, 2024, 6:29 PM

#

frigid badge im talking about an extensive simulation with a ton of different attributes and p...

Uhm I see what you mean

#

Certain things you don't want to code in a game engine

#

You just want values to be generated so that you can feed to the LM

frigid badge Feb 28, 2024, 6:30 PM

#

no... again I want the MLM to take the programming language and try to structure it into useful scripts, like tools and tech

#

I legit want the MLM to understand the attributes of the elements inside the simulation, find uses for them and try to adapt over time to use them

final kiln Feb 28, 2024, 6:30 PM

#

Interesting

frigid badge Feb 28, 2024, 6:30 PM

#

and then to try and combine these attributes into more useful items

final kiln Feb 28, 2024, 6:31 PM

#

Wait wait, but is the LM constructing the physics ?

frigid badge Feb 28, 2024, 6:31 PM

#

which is why i need it to be a programming language which is formatted in a specific way, to give limits but allow flexibility

#

no not at all the physics are independent and predefined

final kiln Feb 28, 2024, 6:32 PM

#

So you just want a language which efficiently represents building blocks within the physics you coded

#

An example would make it easier for me

frigid badge Feb 28, 2024, 6:36 PM

#

ok then... lets say I have 3 materials, Uranium, Osmium and wood.
I want to make a radioactive material that is dense and does not emit too much radiation so it can be used as a weapon. The way I would do that inside the hypothetical programming language could possibly be:

Materials = [U, O, W]
Possible_Patterns = [[U, O, W], [U, W, O], [O, U, W], [O, W, U], [W, U, O], [W, O, U]]

# logic to try and check for the best pattern based on stats and attributes inherited via the crafting process's logic

#

something of the sort
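
For reference, the "check every pattern" idea above can be sketched in plain Python. This is only a rough illustration: the attribute numbers, the `SLOT_WEIGHTS` rule (earlier slots in a pattern count more) and the `craft` / `best_pattern` helpers are all made up here and are not part of the simulation being described.

```python
from itertools import permutations

# Hypothetical per-material attributes; the numbers are invented for illustration.
MATERIALS = {
    "U": {"density": 19.1, "radiation": 9.0},
    "O": {"density": 22.6, "radiation": 0.0},
    "W": {"density": 0.7, "radiation": 0.0},
}

# Assumed crafting rule: earlier slots in a pattern contribute more to the result.
SLOT_WEIGHTS = (0.5, 0.3, 0.2)


def craft(pattern):
    """Blend the attributes of the materials in `pattern`, weighted by slot."""
    return {
        attr: sum(w * MATERIALS[m][attr] for w, m in zip(SLOT_WEIGHTS, pattern))
        for attr in ("density", "radiation")
    }


def best_pattern(max_radiation):
    """Return the densest pattern whose radiation stays within the limit."""
    candidates = [
        (craft(p)["density"], p)
        for p in permutations(MATERIALS)
        if craft(p)["radiation"] <= max_radiation
    ]
    return max(candidates)[1] if candidates else None


print(best_pattern(max_radiation=3.0))
```

With only three materials the six orderings can be enumerated directly; the number of orderings grows factorially, so a larger material set would likely need a smarter search than brute force.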

final kiln Feb 28, 2024, 6:38 PM

#

does not emit too much radiation so it can be used as a weapon

.>

#

sounds like you want to code my framework

frigid badge Feb 28, 2024, 6:39 PM

#

my dude there is a difference between wearing a slightly nuclear necklace and being stabbed with a slightly nuclear weapon

final kiln Feb 28, 2024, 6:40 PM

#

just reacting to the weapon part, which is a bit suspicious

#

in any case, I have a similar thing in which you can combine materials like that

#

and then test for radiation and whatnot

frigid badge Feb 28, 2024, 6:40 PM

#

no no its not just to combine materials

final kiln Feb 28, 2024, 6:41 PM

#

it also does CSG

frigid badge Feb 28, 2024, 6:41 PM

#

but also to try and teach the model about their properties

final kiln Feb 28, 2024, 6:41 PM

#

so you can construct an environment and etc

frigid badge Feb 28, 2024, 6:41 PM

#

trial and error

final kiln Feb 28, 2024, 6:41 PM

#

frigid badge but also to try and teach the model about their properties

uhm, right, so you want the language model to code the MC input files

frigid badge Feb 28, 2024, 6:41 PM

#

final kiln so you can construct an environment and etc

everything is done and dusted, I am just looking for ideas on how to make the programming language so I can build the model around it

final kiln Feb 28, 2024, 6:42 PM

#

what I did was something like this

#


water = H*2 + O

#

and it would compile all the information from the elements to make the data needed to simulate radiation in water

#

you give it density too to get the liquid, gas, etc
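
The `H*2 + O` syntax is something Python operator overloading can provide. A minimal sketch, assuming nothing about the actual framework mentioned here: the `Compound` class, the `ATOMIC_MASS` table and the `molar_mass` method are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Standard atomic masses in g/mol, only for the two elements used below.
ATOMIC_MASS = {"H": 1.008, "O": 15.999}


@dataclass
class Compound:
    """A bag of element symbols with counts, e.g. {"H": 2, "O": 1}."""
    atoms: dict = field(default_factory=dict)

    def __mul__(self, n):
        # H * 2 -> two hydrogen atoms
        return Compound({sym: count * n for sym, count in self.atoms.items()})

    def __add__(self, other):
        # Merge the atom counts of both operands into a new compound.
        merged = dict(self.atoms)
        for sym, count in other.atoms.items():
            merged[sym] = merged.get(sym, 0) + count
        return Compound(merged)

    def molar_mass(self):
        return sum(ATOMIC_MASS[sym] * count for sym, count in self.atoms.items())


H = Compound({"H": 1})
O = Compound({"O": 1})

water = H * 2 + O
print(water.atoms, round(water.molar_mass(), 3))
```

Because `water` is an ordinary Python variable, whoever writes the script (a person or a model) is free to name it anything; the composition lives in the object, not in the name.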

frigid badge Feb 28, 2024, 6:43 PM

#

yea thing is I won't be giving every element a name, or any data at all, the model will only get data to allow it to try new things, from there it depends on the model itself and the environment around it

final kiln Feb 28, 2024, 6:43 PM

#

it's a python variable, the model can name it however it wants

#

for the geometry I did this

frigid badge Feb 28, 2024, 6:44 PM

#

I don't think this is what I'm looking for, my guy

final kiln Feb 28, 2024, 6:44 PM

#

with Sphere(params) as s1:
    with Cube(params) as c1:
        ...

etc

#

this would construct a, I forget what it's called actually, but it's a tree structure used in computer graphics to optimize ray tracing

#

so like c1 is inside s1, and that info is used to not ray trace s1 when appropriate
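
A rough sketch of how nested `with` blocks can build such a containment tree in Python. The `Shape` base class, its `_stack` bookkeeping and the `dump` helper are assumptions made for illustration, not the DSL described above; the tree in question may be something like a bounding volume hierarchy, but that is a guess.

```python
class Shape:
    """Base shape that records nesting through `with` blocks."""

    _stack = []  # shapes whose `with` block is currently open

    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None

    def __enter__(self):
        # Whatever shape is open right now becomes this shape's parent.
        if Shape._stack:
            self.parent = Shape._stack[-1]
            self.parent.children.append(self)
        Shape._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        Shape._stack.pop()
        return False  # never swallow exceptions


class Sphere(Shape):
    pass


class Cube(Shape):
    pass


def dump(shape, depth=0):
    """Return the containment tree as a list of indented lines."""
    lines = ["  " * depth + shape.name]
    for child in shape.children:
        lines.extend(dump(child, depth + 1))
    return lines


with Sphere("s1") as s1:
    with Cube("c1") as c1:
        pass
    with Cube("c2") as c2:
        pass

print("\n".join(dump(s1)))
```

A ray tracer could then walk this tree and only test the children of `s1` when a ray actually hits `s1`.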

frigid badge Feb 28, 2024, 6:45 PM

#

I just want the model to try and learn based on the information it finds around it and try to apply the scientific method along with trial and error. You're more trying to visualise this; I don't want any advanced visuals beyond representing the model and the base materials.

#

hell even those are just cubes...

final kiln Feb 28, 2024, 6:45 PM

#

frigid badge hell even those are just cubes...

not bounding boxes

frigid badge Feb 28, 2024, 6:45 PM

#

...?

final kiln Feb 28, 2024, 6:46 PM

#

frigid badge I just want the model to try and learn based on the information it finds around ...

sure im just giving you an example based on the DSL I created within py

#

eventually I was gonna do something inspired by JSX

#

to get dynamic objects

#

was gonna be so cool

#

I gotta finish that thing

#

def some_obj():
    t = useTime()
    return <Box>
        <Sphere radius={t**2} />
    </Box>

#

uhm

meager ridge Feb 28, 2024, 6:57 PM

#

wooden sail you're mistaking "difficult" with "impossible"

this should be a t-shirt

wooden sail Feb 28, 2024, 6:58 PM

#

i meant it the other way around though

#

many people fail to grasp that things are proven to be impossible 😛

frigid badge Feb 28, 2024, 6:58 PM

#

wooden sail many people fail to grasp that things are proven to be impossible 😛

for now

wooden sail Feb 28, 2024, 6:58 PM

#

exactly like this

meager ridge Feb 28, 2024, 6:58 PM

#

i mean i wouldn't wear it but it would sell

frigid badge Feb 28, 2024, 6:58 PM

#

I strongly believe that given enough time we would break the laws of physics

#

and by enough time I mean eons

meager ridge Feb 28, 2024, 6:59 PM

#

frigid badge I strongly believe that given enough time we would break the laws of physics

that just means we have the laws wrong

final kiln Feb 28, 2024, 6:59 PM

#

wooden sail many people fail to grasp that things are proven to be impossible 😛

reality doesn't care about your proof tho, if all of a sudden 1 + 1 = 3 was observed everywhere that's what it would be

#

you'd have to adapt your formalism

wooden sail Feb 28, 2024, 6:59 PM

#

please don't ping me for this

meager ridge Feb 28, 2024, 6:59 PM

#

lol

final kiln Feb 28, 2024, 6:59 PM

#

uhm, sure

frigid badge Feb 28, 2024, 7:00 PM

#

final kiln reality doesn't care about your proof tho, if all of a sudden 1 + 1 = 3 was obse...

...?

#

sorry but out of what you say I understand like 30% at best

final kiln Feb 28, 2024, 7:01 PM

#

frigid badge ...?

what I'm stating is that experiment is king, and the rest is mathematical idealism

#

if it were any other way we wouldn't have AI in the first place for ex

#

the 1+1=3 was just an extreme to illustrate a point

frigid badge Feb 28, 2024, 7:02 PM

#

I got like 10-20% of that sorry dude

final kiln Feb 28, 2024, 7:02 PM

#

frigid badge I got like 10-20% of that sorry dude

which part is confusing

frigid badge Feb 28, 2024, 7:02 PM

#

ima just give up and back out of this...

final kiln Feb 28, 2024, 7:03 PM

#

frigid badge ima just give up and back out of this...

uhm sure, if you want to understand my point I'd be happy to explain tho

frigid badge Feb 28, 2024, 7:04 PM

#

walks away slowly

meager ridge Feb 28, 2024, 7:09 PM

#

ok i have a much less theoretical question

I'm working on something to learn from a bunch of very long budget documents (pdfs), combined text and tables

assuming I can get both types of data extracted, what would be the best approach to building something that could answer questions about the documents?

e.g.: "how much did the parks department spend on maintenance in 2022"

#

(I have a good understanding of LLMs and I know spacy, huggingface etc, i'm just trying to map out a good workflow)

final kiln Feb 28, 2024, 7:11 PM

#

uhm, I'd possibly try to tokenize everything and store it in chunks that can be embedded in a template prompt

#

your question here 

[pre computed chunk]

and then it would output the answer if possible, like iterating through the various chunks until it finds it
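
The chunk-and-template idea above could look roughly like this. It is a sketch under assumptions: plain keyword overlap stands in for the embedding-based retrieval a real RAG setup would normally use, the `PROMPT_TEMPLATE` wording is invented, the sample text and its figures are made up, and the call to an actual language model is left out entirely.

```python
import re

# Assumed prompt layout: question first, then one pre-computed chunk.
PROMPT_TEMPLATE = (
    "Answer using only the context below.\n\n"
    "Question: {question}\n\n"
    "Context:\n{chunk}\n"
)


def split_into_chunks(text, max_words=50):
    """Cut the text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokens(text):
    """Lowercased alphanumeric tokens, as a set for cheap overlap checks."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_chunks(question, chunks):
    """Order chunks by how many question tokens they share (most first)."""
    q = tokens(question)
    return sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)


def build_prompts(question, text, top_k=2, max_words=50):
    """Fill the template with the `top_k` most relevant chunks."""
    chunks = split_into_chunks(text, max_words)
    return [
        PROMPT_TEMPLATE.format(question=question, chunk=c)
        for c in rank_chunks(question, chunks)[:top_k]
    ]


# Made-up sample text, not taken from any real budget document.
document = (
    "The parks department spent 1.2 million on maintenance in 2022. "
    "The library budget grew by 3 percent."
)
question = "how much did the parks department spend on maintenance in 2022"

for prompt in build_prompts(question, document, top_k=1, max_words=10):
    print(prompt)
```

Ranking first and only prompting with the best few chunks avoids iterating the model over every chunk in a very long document. Tables would probably need their own chunking so that a row keeps its column headers.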

#

https://research.ibm.com/blog/retrieval-augmented-generation-RAG

IBM Research Blog

What is retrieval-augmented generation? | IBM Research Blog

RAG is an AI framework for retrieving facts to ground LLMs on the most accurate information and to give users insight into AI’s decisionmaking process.

meager ridge Feb 28, 2024, 7:13 PM

#

cool -- is there a better way than just using a diff model to answer qs from the table data?

final kiln Feb 28, 2024, 7:13 PM

#

oh, there's something recent from nvidia too

#

that does exactly that wait

meager ridge Feb 28, 2024, 7:14 PM

#

(ideally this is something i can set up on my own server as opposed to paying for textract)

final kiln Feb 28, 2024, 7:14 PM

#

https://blogs.nvidia.com/blog/chat-with-rtx-available-now/

NVIDIA Blog

Chat with RTX Now Free to Download | NVIDIA Blog

New tech demo gives anyone with an NVIDIA RTX GPU the power of a personalized GPT chatbot, running locally on their Windows PC.

#

I think it's free

#

gets you a UI and everything, you just feed it the documents

meager ridge Feb 28, 2024, 7:15 PM

#

ayooo

final kiln Feb 28, 2024, 7:15 PM

#

and then ask questions

meager ridge Feb 28, 2024, 7:16 PM

#

awesome, thank you!!!

frigid badge Feb 28, 2024, 7:22 PM

#

final kiln https://blogs.nvidia.com/blog/chat-with-rtx-available-now/

shi- i was working on something similar using open source LLMs

#

Concept of how the model could work

final kiln Feb 28, 2024, 7:26 PM

#

frigid badge Concept of how the model could work

yeah it looks interesting

#

I think there are some recent developments made by google that would make LLMs work in the context of scientific discovery

#

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/

frigid badge Feb 28, 2024, 7:28 PM

#

im an 18 year old soldier working from a war room on a laptop :Y

#

I make everything I use to save cash

#

as I got like 300$ in bank account after 2 months of working so

final kiln Feb 28, 2024, 7:29 PM

#

im 10 years older, we are not that different

frigid badge Feb 28, 2024, 7:29 PM

#

you also live on 150$ a month?

final kiln Feb 28, 2024, 7:29 PM

#

no I pay rent

meager ridge Feb 28, 2024, 7:29 PM

#

im 20 years older i haven't paid my rent in six months because im on rent strike

frigid badge Feb 28, 2024, 7:29 PM

#

yea I dont I live in a room with 6 other doods :")

frigid badge Feb 28, 2024, 7:30 PM

#

meager ridge im 20 years older i haven't paid my rent in six months because im on rent strike

thats a thing?

meager ridge Feb 28, 2024, 7:30 PM

#

you can make anything a strike

final kiln Feb 28, 2024, 7:30 PM

#

frigid badge yea I dont I live in a room with 6 other doods :")

been there too when I was your age

final kiln Feb 28, 2024, 7:30 PM

#

meager ridge im 20 years older i haven't paid my rent in six months because im on rent strike

hopefully the trend spreads : D

frigid badge Feb 28, 2024, 7:30 PM

#

my guy im pretty sure I have some unique experience here :")

meager ridge Feb 28, 2024, 7:30 PM

#

they won't give me a new lease unless i sign away my rent stabilization, evicting people is actually really difficult

final kiln Feb 28, 2024, 7:31 PM

#

meager ridge they won't give me a new lease unless i sign away my rent stabilization, evictin...

yeah, as it should be

frigid badge Feb 28, 2024, 7:31 PM

#

frigid badge my guy im pretty sure I have some unique experience here :")

like im currently part of a war doing 17 hour shifts while still trying to run my personal life which is 137km away I really dont think anyone else here has done that :")

final kiln Feb 28, 2024, 7:31 PM

#

frigid badge my guy im pretty sure I have some unique experience here :")

are you studying AI ?

dusty forge Feb 28, 2024, 7:32 PM

#

Danggg why is the entire world hyped on NN and LLMs ... I'm still learning predictive modeling on ML level 😆

frigid badge Feb 28, 2024, 7:32 PM

#

final kiln are you studying AI ?

in my spare time im teaching myself how to work with it

final kiln Feb 28, 2024, 7:32 PM

#

frigid badge like im currently part of a war doing 17 hour shifts while still trying to run m...

wait like a literal war ?

frigid badge Feb 28, 2024, 7:32 PM

#

im in the war room rn lol

meager ridge Feb 28, 2024, 7:32 PM

#

final kiln yeah, as it should be

oh for sure, i think it just comes as a surprise to a lot of ppl who are worried about getting evicted!

meager ridge Feb 28, 2024, 7:32 PM

#

final kiln gets you a UI and everything, you just feed it the documents

the problem is the GPU i would be using is on a compute VM lol

still tho, good looks

final kiln Feb 28, 2024, 7:33 PM

#

frigid badge im in the war room rn lol

ah okay, just to be clear, you are not in an actual war where people die right ?

#

unfortunately that's always possible given the state of things

frigid badge Feb 28, 2024, 7:33 PM

#

final kiln ah okay, just to be clear, you are not in an actual war where people die right ?

it depends sometimes I get sent to places with semi active combat to document things

dusty forge Feb 28, 2024, 7:33 PM

#

guys, can we keep this channel for the actual topics?

frigid badge Feb 28, 2024, 7:33 PM

#

yea true we did drift off topic

dusty forge Feb 28, 2024, 7:34 PM

#

feel free to chat about wars and stuff on off-topic or even private, many people do not need to be dragged into sensitive subjects

final kiln Feb 28, 2024, 7:34 PM

#

meager ridge oh for sure, i think it just comes as a surprise to a lot of ppl who are worried...

yeah it was for me too, I had to check because I'm taking a bit of a risk for my current career step. but like, I get along with my landlord very well and we're in constant communication, we'd figure it out if it came to that

frigid badge Feb 28, 2024, 7:34 PM

#

dusty forge feel free to chat about wars and stuff on off-topic or even private, many people...

we already agreed and stopped dood no need to go on

final kiln Feb 28, 2024, 7:34 PM

#

dusty forge feel free to chat about wars and stuff on off-topic or even private, many people...

oh sorry

frigid badge Feb 28, 2024, 7:35 PM

#

frigid badge Concept of how the model could work

anyway, so this is what I was thinking the model should be built around,

meager ridge Feb 28, 2024, 7:35 PM

#

anyways how hard do we think its gonna be to adapt this for linux
https://github.com/NVIDIA/trt-llm-rag-windows

GitHub

GitHub - NVIDIA/trt-llm-rag-windows: A developer reference project ...

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM - NVIDIA/trt-llm-rag-windows

#

i am less familiar with that shit

#

(good with setting up vms and good at following instructions, but not a lot of experience with GPU set-up)

frigid badge Feb 28, 2024, 7:37 PM

#

I tried enabling cuDNN, it's a bloody pain

#

I got 0 idea how to work with GPUs beyond that

meager ridge Feb 28, 2024, 7:38 PM

#

is it possible the only windows-specific thing is the UX?

#

(i do NOT really know how computers work)

final kiln Feb 28, 2024, 7:45 PM

#

meager ridge (i do NOT really know how computers work)

computers are fancy calculators with memory

#

*nailed it *

final kiln Feb 28, 2024, 7:45 PM

#

meager ridge anyways how hard do we think its gonna be to adapt this for linux https://github...

ah I didn't know it was just for windows

meager ridge Feb 28, 2024, 7:46 PM

#

i mean its prob doable

#

im sure its doable

final kiln Feb 28, 2024, 7:46 PM

#

uhm, I wonder if you can run a windows docker container

#

on linux

meager ridge Feb 28, 2024, 7:46 PM

#

oh maybe

final kiln Feb 28, 2024, 7:46 PM

#

behind it would be a windows VM ofc

meager ridge Feb 28, 2024, 7:47 PM

#

id still have to figure out how to run an app on a VM

final kiln Feb 28, 2024, 7:47 PM

#

but there might be images prepared with all the drivers and stuff

meager ridge Feb 28, 2024, 7:47 PM

#

er run a UX

final kiln Feb 28, 2024, 7:47 PM

#

meager ridge id still have to figure out how to run an app on a VM

if it exposes a web UI, all you need to do is open that port on the docker container right

meager ridge Feb 28, 2024, 7:47 PM

#

um ima see and report back

final kiln Feb 28, 2024, 7:48 PM

#

it's gonna be funky for sure due to gpu stuff, but im curious to see if it works

meager ridge Feb 28, 2024, 7:48 PM

#

yeah this is one of those awesome "choose your pain in the ass adventure" situations

final kiln Feb 28, 2024, 7:48 PM

#

it would seem so yes

#

can't believe nvidia went for win tho

final kiln Feb 28, 2024, 7:49 PM

#

frigid badge I got 0 idea how to work with GPUs beyond that

are you using windows or linux ?

frigid badge Feb 28, 2024, 7:54 PM

#

final kiln are you using windows or linux ?

windows

final kiln Feb 28, 2024, 7:55 PM

#

frigid badge windows

linux tends to be better for this stuff

#

but using the gpu is easy if you're using a python ML lib like pytorch

#

you kinda just do .to("cuda")

#

and a copy is sent to the gpu

#

all operations thereafter happen on the gpu

#

and that's pretty much everything

#

unless you want to code your own custom layers on the gpu
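
A minimal sketch of that workflow in PyTorch, assuming `torch` is installed with a CUDA-enabled build and that the NVIDIA driver is set up. The layer and batch sizes are arbitrary, and the CPU fallback is there only so the script still runs on a machine where CUDA is not available yet.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)  # moves the layer's parameters
batch = torch.randn(8, 4).to(device)      # returns a copy on the target device

output = model(batch)                     # computed on whichever device was picked
print(output.shape, output.device)
```

If this prints `cpu` on a machine with an RTX card, the usual suspects are a CPU-only PyTorch build or a missing driver rather than the code itself.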

frigid badge Feb 28, 2024, 7:56 PM

#

just got an RTX 3060 for better development but cant figure out how tf to use it

final kiln Feb 28, 2024, 7:57 PM

#

frigid badge just got an RTX 3060 for better development but cant figure out how tf to use it

should be possible to do what I just described

#

but you do need to install the drivers and cuda

frigid badge Feb 28, 2024, 7:57 PM

#

I am telling you that I am struggling with even using cuDNN

final kiln Feb 28, 2024, 7:57 PM

#

uhm

nimble heath Feb 28, 2024, 7:57 PM

#

frigid badge just got an RTX 3060 for better development but cant figure out how tf to use it

sounds quite sad you figuring that out after gettin it lol

#

good luck tho

final kiln Feb 28, 2024, 7:58 PM

#

frigid badge I am telling you that I am struggling with even using cuDNN

you gonna wanna use this

#

https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/

frigid badge Feb 28, 2024, 7:58 PM

#

nimble heath sounds quite sad you figuring that out after gettin it lol

I saved for 4 years to buy a new setup, i7 14th gen (I now realise 14th gen is a mistake but eh) with an RTX 3060 12GB

final kiln Feb 28, 2024, 7:58 PM

#

install docker and run this command "docker run -it --gpus=all --rm nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi"

#

should be enough

frigid badge Feb 28, 2024, 7:59 PM

#

now I live 137km away from it and I cant even figure out how to use the GPU

frigid badge Feb 28, 2024, 7:59 PM

#

final kiln install docker and run this command "docker run -it --gpus=all --rm nvidia/cuda:...

on windows?

final kiln Feb 28, 2024, 7:59 PM

#

frigid badge on windows?

yes

nimble heath Feb 28, 2024, 7:59 PM

#

And here i am, saving money and still sittin on a computer from 2014 or somethin

#

(Only got a new one because i just needed open gl 3.2 support)

frigid badge Feb 28, 2024, 8:04 PM

#

I spent every cent I had

#

my old pc was from 2014 lol

long canopy Feb 28, 2024, 8:09 PM

#

Suppose that, for any batch, if token X_1 appears, token X_2 will also appear. What's the most computationally efficient way to infer this property?

final kiln Feb 28, 2024, 8:09 PM

#

nimble heath And here i am, saving money and still sittin on a computer from 2014 or somethin

You guys don't need to spend too much money on GPU tho, since you can just rent it

final kiln Feb 28, 2024, 8:10 PM

#

long canopy Suppose that, for any batch, if token X_1 appears, token X_2 will also appear. W...

Where does X_2 appear ? In the same sequence or it can appear anywhere in the batch of sequences ?

long canopy Feb 28, 2024, 8:11 PM

#

final kiln Where does X_2 appear ? In the same sequence or it can appear anywhere in the ba...

Oh, suppose our model takes only singular sequences and batches of size one (slightly misspoke)

final kiln Feb 28, 2024, 8:12 PM

#

long canopy Oh, suppose our model takes only singular sequences and batches of size one (sli...

I'm more confused now ahah,

So the shape of the input would be: (1, 1, number_of_tokens) ?

long canopy Feb 28, 2024, 8:12 PM

#

yep

final kiln Feb 28, 2024, 8:12 PM

#

Uhm

#

If there's only one token in the sequence, there's no place for X_2 to appear

#

I maybe miswrote it

#

I totally did

long canopy Feb 28, 2024, 8:13 PM

#

final kiln I'm more confused now ahah, So the shape of the input would be: (1, 1, number_o...

shape (1,1,x) is a column/row vector

#

this is what I mean

#

no it's not

#

i messed up too

final kiln Feb 28, 2024, 8:14 PM

#

Usually you have (batch_dimension, sequence_size)

#

So you mean (1, 10) maybe

#

[[token1, token2, ...]]

long canopy Feb 28, 2024, 8:16 PM

#

yeah uh i think (1,1,x) = (1,x)

#

in terms of shape

final kiln Feb 28, 2024, 8:16 PM

#

Well yes

#

Okay in any case, what do you mean by inferring tho

long canopy Feb 28, 2024, 8:17 PM

#

if there is X_1 in the input the model will necessarily produce X_2
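
One cheap way to check such a rule empirically is to scan recorded (input, output) token pairs for counterexamples. This is a sketch under assumptions: the `violations` and `rule_holds` helpers and the integer token ids are hypothetical, and a scan like this can only refute the rule on the data it has seen, never prove it for the model in general.

```python
def violations(pairs, x1, x2):
    """Yield (input, output) pairs where x1 is in the input but x2 is not in the output."""
    for inp, out in pairs:
        if x1 in inp and x2 not in out:
            yield inp, out


def rule_holds(pairs, x1, x2):
    """True if no recorded pair contradicts 'x1 in input implies x2 in output'."""
    # `any` stops at the first counterexample instead of scanning everything.
    return not any(True for _ in violations(pairs, x1, x2))


# Made-up token ids, each pair is (input sequence, generated sequence).
X_1, X_2 = 1, 2
observed = [
    ([5, 1, 7], [2, 9]),
    ([3, 4], [8]),
    ([1], [2]),
]

print(rule_holds(observed, X_1, X_2))
print(rule_holds(observed + [([1, 6], [9])], X_1, X_2))
```

Each pair costs one membership test per sequence, so the whole check is linear in the total number of tokens; converting sequences to sets first helps if the same pairs are checked against many rules.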

final kiln Feb 28, 2024, 8:17 PM

#

Well you train it to do so