#data-science-and-ml

half bolt
#

Like any type

#

Simple AI for games, or AI graphs, like for analysis

glass ridge
#

In arrays, does axis 0 represent columns or rows?

serene scaffold
#

No, the opposite

glass ridge
#

what about 1 dimension

serene scaffold
#

Rows are second to last and columns are last

serene scaffold
glass ridge
#

ok

serene scaffold
glass ridge
#

in 3 dimensions?

serene scaffold
#

Then it's like you have a stack of mateices

#

*matrices

glass ridge
#

?

serene scaffold
#

A matrix is a 2d array

glass ridge
#

ah i got it

serene scaffold
#

So if you have a 3d array, it's like you have a stack of 2d arrays

glass ridge
#

so axis 0 is columns?

serene scaffold
#

So if you have an array of shape (5, 7, 6), that's a stack of five matrices that have seven rows and six columns

glass ridge
#

yeah

serene scaffold
#

Which means that rows are dimension one and columns are dimension 2

#

And if you have an array of shape (8, 5, 7, 6), it's like you have a stack of stacks.

glass ridge
#

so in 3 dimensions, axis 0 is the matrix, axis 1 is the row, axis 2 is the column

#

right?

serene scaffold
#

But you can also use -1 and -2 like list indexing

#

So for an array with two or more dimensions, -1 will always be columns and -2 will always be rows
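
A minimal NumPy sketch of the axis numbering described above, using the (5, 7, 6) shape from the example; the array contents are placeholders chosen only for illustration.

```python
import numpy as np

# A stack of five matrices, each with seven rows and six columns.
stack = np.zeros((5, 7, 6))

print(stack.shape[0])   # 5: position in the stack (axis 0)
print(stack.shape[-2])  # 7: rows (axis 1, or -2)
print(stack.shape[-1])  # 6: columns (axis 2, or -1)

# Reducing over an axis collapses it: summing over rows (-2) leaves one
# value per column for each matrix in the stack.
col_sums = stack.sum(axis=-2)
print(col_sums.shape)  # (5, 6)

# The same negative indices work for a plain 2D matrix.
matrix = np.arange(12).reshape(3, 4)
row_totals = matrix.sum(axis=-1)  # sum across the columns of each row
print(row_totals)  # [ 6 22 38]
```

Using -1 and -2 keeps the code the same whether the input is a single matrix or a stack of them.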

#

Gotta run

glass ridge
#

thank u

glass ridge
#

now i understood

serene scaffold
#

I'm glad

deep sleet
#

Just a random question

#

Is pandas efficient on large scale datasets?

#

Or is there something else you learn for data with let's say millions of rows?

small wedge
deep sleet
#

Noted

left tartan
#

I treat Pandas as something that's convenient but not performant: I only use it because it's commonly used/known, but anything serious I use something else

deep sleet
#

Ohh

#

Do you have any resources to read about polars

deep sleet
left tartan
deep sleet
#

Oh

#

Thx!

spring field
#

right, I'm just saying that you can't really analyse the changes in stock price based on previous results

#

time and price is simply not enough to predict such a complex machine as the market

deep sleet
#

It depends on the instrument, tbh. For something like EURUSD you can actually spot, with the human eye, patterns that are related specifically to time

#

And then you can add level 2 market data if you have access to it, which I think should give it the necessary input, right?

#

I know that I barely have any knowledge in the field of AI, but I've been in the forex market specifically for 2 years, and I can say price and time are enough for a human with enough backtesting data to find, not a specific pattern, but a context of how the market operates

abstract wasp
#

Hi, I have a task of analyzing the linkage between a person’s housing and its impact on their health. Basically, I have chunks of text (some are just one or two sentences) where interviewees answer this question. What kinds of info should I grab from this data? I was thinking of doing a sentiment analysis and an emotion detection, and maybe doing like a word cloud to see which words come up most often. Do you guys have better ideas? lol

serene scaffold
#

If for a given person, you don't get information about both their housing and their health, that person is useless for you

#

Can you imagine what parameters of a person's housing situation might impact their health?

#

Also, word clouds are really just for fun. If you want to see what words appear most often, you should just make a table of word counts.
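
A table of word counts can be sketched with the standard library alone. The responses and the stop-word list below are made up for illustration; a real analysis would use the interview texts and a fuller stop-word list.

```python
import re
from collections import Counter

# Made-up responses, for illustration only.
responses = [
    "The rent is too high and the stress keeps me awake.",
    "Repairs never get done, so the mold makes my asthma worse.",
    "High rent means stress, and stress means headaches.",
]

# A tiny, hypothetical stop-word list.
stop_words = {"the", "is", "and", "me", "my", "so", "too", "get", "never"}

counts = Counter(
    word
    for text in responses
    for word in re.findall(r"[a-z']+", text.lower())
    if word not in stop_words
)

# Print the table, most frequent words first.
for word, n in counts.most_common(5):
    print(f"{word:<10} {n}")
```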

abstract wasp
abstract wasp
#

The extra stuff, not the key phrases

#

Is that what you meant? lol

serene scaffold
#

Not exactly

#

I mean I guess you can think of it that way

#

The point is that if someone says, for example, that they live in an apartment, you need to get that information. And you need to be able to identify the responses from everyone else who lives in an apartment and see what they have in common regarding their health.
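
One way to sketch that idea is to tag each response with a housing category and a health category, then count how often they occur together. The keyword lists and responses here are hypothetical, and plain substring matching is a crude stand-in for whatever extraction method ends up being used.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists; real categories should come from reading the data.
HOUSING = {"apartment": ["apartment"], "house": ["house"]}
HEALTH = {"stress": ["stress", "anxious"], "respiratory": ["asthma", "mold"]}

def tag(text, keywords):
    """Return the categories whose keywords appear in the text."""
    lowered = text.lower()
    return {label for label, words in keywords.items()
            if any(word in lowered for word in words)}

# Made-up responses, for illustration only.
responses = [
    "I live in an apartment and the rent causes me a lot of stress.",
    "Our house has mold, and my asthma has gotten worse.",
    "The manager ignores repairs in my apartment, so I feel anxious.",
    "My health is fine.",
]

by_housing = defaultdict(Counter)
for text in responses:
    housing, health = tag(text, HOUSING), tag(text, HEALTH)
    if not housing or not health:
        continue  # a response without both pieces of information is skipped
    for h in housing:
        by_housing[h].update(health)

for h, health_counts in by_housing.items():
    print(h, dict(health_counts))
```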

abstract wasp
#

Ohhh that’s a good idea

abstract wasp
#

Also once I have all this info, how should I format this to make a sort of report

lapis sequoia
#

Can someone give me criticism on a model I made?

#

i just need feedback

serene scaffold
serene scaffold
lapis sequoia
#

no, I just need to rest

serene scaffold
abstract wasp
#

We mostly see a lot of mental health issues because the high rent causes a lot of stress to the tenant. Some of it is stress due to lack of help around the unit (like when they request repairs, the LL/manager doesn’t help)

deep sleet
#

saying that the style of your neighborhood and how clean your housing is affects your psychology to a degree that it also affects your life span

abstract wasp
deep sleet
#

Here are some similar ones if they help

#
#

Yes exactly

abstract wasp
deep sleet
#

No worries!

unique spoke
#

Anybody ever worked with the COCO dataset

#

For some reason, I'm not able to download any of the files

#

Can anyone help

sweet harness
#

I'm very disappointed. After a full night of training, these are the results.
Orange is Buy and HOLD, and Blue is the bot's performance.

past meteor
mental rampart
#

can someone tell me why my datasets are not being loaded

boreal nest
#

Reading a lab note on IBM's AI Engineer course.. surely there's a better variable name for this..

boreal nest
#

where is your datasets module located?

#

It's better to take a more straightforward approach and just use the pandas function 'read_csv' in your notebook. There's no need to complicate it.

mental rampart
#

i got it fixed but thnxs for the input

boreal nest
#

mhmm, and alias your pandas. it's convention at this point
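
A minimal sketch of that advice. The in-memory text stands in for a file on disk, and the column names are made up; with a real dataset the file path would be passed instead.

```python
import io

import pandas as pd  # "pd" is the conventional alias

# Stand-in for a CSV file; with a real dataset, pass its path to read_csv.
csv_text = io.StringIO("name,score\nana,3\nben,5\n")

df = pd.read_csv(csv_text)
print(df.head())
```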

finite lodge
#

Hi, using ax.legend(loc='upper left', bbox_to_anchor=(1, 1)) removes the title of the legend, any idea why?

broken eagle
#

https://machinelearning.apple.com/research/recognizing-people-photos

"Building a Gallery

A gallery is a collection of frequently occurring people in a user’s Photos library. To build the gallery in an unsupervised manner, Photos relies on clustering techniques to form groups, or clusters, of face and upper body feature vectors that correspond to the people detected in the library. In Photos, we developed a novel agglomerative clustering algorithm that enables an efficient incremental update of existing clusters and can scale to large libraries. To build these clusters, we start with a clustering algorithm that uses a combination of the face and upper body embeddings for each observation. This step is fairly conservative because when it joins two instances, they are permanently associated. We tune the algorithm so that each first-pass cluster only groups together very close matches, providing high precision but many, smaller clusters. Each cluster is represented by the running average of its embeddings as instances are added. ...."

Is anyone familiar with how they implemented their clustering?

How did they go about doing this :

"To build these clusters, we start with a clustering algorithm that uses a combination of the face and upper body embeddings for each observation. This step is fairly conservative because when it joins two instances, they are permanently associated."

How are two instances permanently associated...?

mental rampart
#

can someone help me
i already have accelerate but its giving this error
"ImportError: Using the Trainer with PyTorch requires accelerate>=0.21.0"

royal haven
#

The error explains the issue: the installed accelerate package must be version 0.21.0 or higher.
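
A quick way to see which version the current interpreter actually has is `importlib.metadata`. The comparison helper below is a simplified sketch that only handles plain numeric versions; `packaging.version.Version` is the more robust choice when that package is available.

```python
from importlib import metadata

def version_tuple(version):
    """Turn '0.21.0' into (0, 21, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:  # pad so that '0.21' compares equal to '0.21.0'
        parts.append(0)
    return tuple(parts)

def meets_minimum(installed, minimum):
    return version_tuple(installed) >= version_tuple(minimum)

try:
    installed = metadata.version("accelerate")
    print(installed, meets_minimum(installed, "0.21.0"))
except metadata.PackageNotFoundError:
    print("accelerate is not installed in this interpreter")
```

If this reports an older version, or no installation at all, the package was probably installed into a different Python environment than the one running the code.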

mild dirge
#

That's what I think from reading just this snippet at least

broken eagle
#

It is the first time I see people do a multi-stage clustering effort, making use of different features at each stage. Very curious what kind of algorithm they used and where to learn things like that :/. I'm just kinda stuck with the typical algos atm - K-means, HDBSCAN etc

mild dirge
#

I think this image explains it well. They have a separate detector for upper body and face

#

They need to match which upper body belongs to which face, so that is the "matching"

#

They then combine the embeddings (by simply concatenating, for example; there could be other ways too)

#

So you have one "person" embedding

#

Which they probably then use for clustering (in a multi-step clustering process)

#

But that is just from this image, they might do it differently, but I'm not gonna read the entire blog post rn 😛
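
To make the idea concrete, here is a rough sketch of a conservative first pass: concatenate the two embeddings, join an observation to an existing cluster only when it is very close, and keep a running average per cluster. This is not Apple's algorithm; the function names, the distance measure and the threshold are all assumptions, and "permanently associated" is read here as "labels are never revised".

```python
import numpy as np

def person_embedding(face, body):
    """Concatenate the two unit-normalised embeddings into one vector."""
    face = face / np.linalg.norm(face)
    body = body / np.linalg.norm(body)
    return np.concatenate([face, body])

def conservative_clusters(embeddings, max_distance):
    """Greedy pass: join the nearest cluster only if it is very close,
    otherwise start a new cluster. Assignments are never undone."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        best, best_dist = None, float("inf")
        for i, centroid in enumerate(centroids):
            dist = float(np.linalg.norm(emb - centroid))
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist <= max_distance:
            counts[best] += 1
            # Running average of the members' embeddings.
            centroids[best] += (emb - centroids[best]) / counts[best]
        else:
            centroids.append(emb.copy())
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return labels, centroids

# Toy 2D "embeddings": two sightings of one person, one of another.
observations = [
    person_embedding(np.array([1.0, 0.0]), np.array([1.0, 0.0])),
    person_embedding(np.array([0.99, 0.05]), np.array([1.0, 0.1])),
    person_embedding(np.array([0.0, 1.0]), np.array([0.0, 1.0])),
]
labels, centroids = conservative_clusters(observations, max_distance=0.3)
print(labels)  # [0, 0, 1]
```

A small `max_distance` gives many tight clusters, which matches the "high precision but many, smaller clusters" description in the quoted post; a later stage would then have to merge them.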

broken eagle
serene wedge
#

How to compute Triangular Moving Average?

left tartan
misty shuttle
#

Hey! I did freeCodeCamp's course on Machine Learning, and I wanted to implement the algorithms presented using scikit-learn. I am unable to find projects/practice problems/tutorials on these algorithms (on how to implement them specifically), can anyone help me out?

serene scaffold
misty shuttle
sweet jungle
#

I'm writing my own neural network code, and in the backpropagation I've run into some issues with the numpy.dot() function.
For my example I've got my neurons in the format (9,20,20,9), i.e. an input layer of 9, 2 hidden layers of 20, and an output layer of 9. For the backpropagation I have the line
hidden_error = np.dot(np.array(self.networktable[i]).T, previous_delta) where self.networktable[i] is a 2D list, i.e. a list of lists. Each of the contained lists holds the weights of one entire neuron, and there's one of those for each neuron. previous_delta is a 1D list with one value for each of the neurons on the next layer.

So for the 2nd hidden layer the following occurs:
When I try to dot product them, I'm taking a 2D list with 20 lists, each with 20 values inside, and trying to dot product that with a 1D list of 9 values. This causes a shape error due to the different sizes. However, I'm not sure what I'm meant to do instead, because this approach worked for the example at https://www.youtube.com/watch?v=7qYtIveJ6hU, but he doesn't appear to do it a layer at a time. Not sure how I could fix this?

#

His version of the line looks like this; I've taken it from the pandas format and used numpy, which I don't think affects it other than syntax.

#

any ideas?
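
One possible reading of the shape error, assuming networktable[i] holds the incoming weights of layer i: the error for a hidden layer is propagated back through the weights that connect it to the next layer, not through its own incoming weights. The sketch below only checks shapes; the weight values and the delta are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = (9, 20, 20, 9)

# weights[k] maps layer k to layer k + 1: one row per neuron in layer k + 1,
# one column per neuron in layer k.
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print([w.shape for w in weights])  # [(20, 9), (20, 20), (9, 20)]

# Delta of the 9 output neurons (random stand-in values).
output_delta = rng.normal(size=9)

# Error for the second hidden layer: use the weights that connect it to the
# output layer, shape (9, 20), not the layer's own incoming weights (20, 20).
hidden_error = weights[2].T @ output_delta
print(hidden_error.shape)  # (20,)
```

With this layout, (20, 9) times a vector of 9 works, while (20, 20) times a vector of 9 raises exactly the kind of shape error described.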

glad mountain
#

I am currently working on a project aimed at recognizing whether a photo of any individual exists in the university's records. The proposed method involves storing the embeddings of each student's photo, along with their details, in a vector database. When a photo needs to be compared, the system will generate the embedding value for that photo and then compare this value against the database. If the value falls within a specific threshold, it will indicate that the individual exists in the record.

I am seeking expert advice on whether this approach is feasible. If there are any concerns with this method, I would appreciate recommendations for the best solution.

past meteor
#

If it's just about using it then the sklearn docs themselves have tons of examples

#

Do reinforcement learning

#

Like, read some of the book from Sutton & Barto

#

You'll learn DP in no time 😄

#

Dynamic programming is at the essence of RL

#

It's not hard at all, you'll have no problems doing it whatsoever

#

You've heard of Q learning right?

#

Q learning is approximate dynamic programming

lapis sequoia
past meteor
#

Basically, you're estimating the value of each state action pair

#

the value of that SA combination is called the Q-value

#

If you have this you can derive the optimal policy from it

#

Nothing fancy yet, right?

#

The thing is, with DP you have a "world model": a representation of the problem that you can prod to get the reward for your next action

#

And in DP you evaluate all actions in each state

#

In the real world you can't do that without being able to rewind

#

Q learning is "data based", you sample one action (giving more weight to the current policy) and then do your update like that. No need for "rewinds" or a "world model"
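
A minimal tabular sketch of that update rule. The five-state corridor, the reward and the hyperparameters are made up for illustration; the point is only that each update uses one sampled transition rather than sweeping over every action with a model.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Toy environment: reward 1 for reaching the rightmost state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

rng = random.Random(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    for t in range(100):  # cap on episode length
        # Sample one action: mostly the current greedy choice, sometimes random
        # (also random when both estimates are tied).
        if rng.random() < EPSILON or Q[state][0] == Q[state][1]:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Update from this single sampled transition; no world model needed.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state
        if state == N_STATES - 1:
            break

# Derive the policy from the Q-values: the best action in each state.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy problem the learned policy should end up stepping right in every non-terminal state.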

#

My explanation is terrible

#

I give up

#

If I had a whiteboard itd all make sense lol

#

Doing DSA without doing leetcode first hmmm

#

Makes sense if you're in a hurry ig

misty shuttle
river cape
#

Can perceptron loss only be used with a single perceptron?

serene scaffold
river cape
#

Because normally in a neural network we would use MSE, MAE, Huber loss, categorical, binary and sparse, right?

hazy bear
#

i have learnt intermediate level ml from kaggle.com

#

and some of pandas library

#

but how do I get into deeper machine learning

serene scaffold
hazy bear
#

wait

#

its just a fancy word for a matrix

serene scaffold
serene scaffold
#

That's okay

#

Tabular data is when you have a fixed number of pieces of information (features) about each thing

#

Whereas images are non-tabular because each pixel is a feature, and the number of them varies depending on the size of the image.

#

Neural networks really shine with non-tabular data. But creating a network for tabular data is a good place to start.

finite lodge
#

Hi, Im using seaborn, does anyone know why the color palette changes when adding more elements to a group (using hue)?

#

Nvm found it

deep sleet
#

Somehow the ball made it seem much more professional xddd

#

xdd

#

Nicee

#

oh , just waiting until you see how the previous ones go?

#

Good luck

past meteor
#

This is exactly what I do

#

I'm using Postgres with the vector extension so it's an all in one solution

#

Yes but it's not rocket science

#

Very easy to implement

#

I use dagster to embed my stuff once per night

#

Store it in pgVector, this is deployed as a separate entity from my DB.

Next to that, I also have the backend that can access pgVector, embeds the query, does cosine sim, sort and take top k
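
The "embed the query, cosine sim, sort, take top k" step can be sketched in plain NumPy. The vectors and the acceptance threshold are made up; in the setup described above, the database would do this search rather than application code.

```python
import numpy as np

def top_k(query, stored, k=3):
    """Indices and cosine similarities of the k stored vectors closest to the query."""
    stored_unit = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = stored_unit @ query_unit
    order = np.argsort(sims)[::-1][:k]  # highest similarity first
    return order, sims[order]

# Toy 2D "embeddings"; real ones have hundreds of dimensions.
stored = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([0.9, 0.1])

idx, sims = top_k(query, stored, k=2)
print(idx, sims)

THRESHOLD = 0.8  # hypothetical cut-off; it has to be tuned on real data
is_match = bool(sims[0] >= THRESHOLD)
print(is_match)
```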

iron basalt
#

RL is in many ways more DP than many DP algorithms.

past meteor
#

Interesting way to view it, I agree

iron basalt
#

Just historically, what was being researched.

#

The dumb name was intentional, to avoid them not wanting to fund it.

past meteor
#

In my mind I always think DP needs the reward model and transition probabilities etc.

#

And that maximising the Bellman equation isn't necessarily DP

#

Would you call Monte Carlo methods DP?

iron basalt
#

DP is just mathematical optimization, and specifically about this kind of building things up via breadcrumbs (exact or not).

#

Which is why it's so broad, most algorithms typically used practically on which a business is built are DP.

past meteor
#

It needs to be a specific kind of mathematical optimisation, no?

iron basalt
past meteor
#

Yes, but I attribute that mostly to maximising the bellman equation

#

Unless you say that all methods that do that are DP (and vice versa)

iron basalt
#

In different contexts it's not just the Bellman equation.

past meteor
#

If that's the angle, then sure

#

Discussing the details of this is above my pay grade though

#

I just know distinctly that many papers called Q learning, let's say temporal difference methods in general, approximate DP

iron basalt
#

Bellman is essential though yes.

past meteor
#

But I don't know enough about this to be able to make a taxonomy myself 🙂

#

(or to critique existing ones, yours or theirs)

iron basalt
#

Although in other contexts you have like the Hamilton-Jacobi-Bellman equation.

iron basalt
#

So as to not be confused with DP, the programming method, as found in CS classes.

past meteor
#

DP as a programming method is stupid

#

In my mind it's an optimization method first and foremost yah

iron basalt
#

Yeah a lot of things in CS are weird like this, including the term CS.

#

(The need to tack on "science" to everything to make it feel more legit)

#

Or what a "greedy" algorithm even is.

past meteor
#

the programming in DP to me is the same programming in linear programming, IL, MIP, ...

#

Because that's the context I learnt about those, in quant business

iron basalt
#

Yeah, mathematical optimization.

#

A part of it.

past meteor
#

as well as greedy algorithms

iron basalt
#

Greedy is not actually well defined, it just kind of "feels" greedy.

#

But whether an algorithm is DP is.

past meteor
#

Can't you define it as places where you have some sort of bellman recursion and you place the gamma term to be 0

iron basalt
#

So if you ever find a CS exam that asks if an algorithm is greedy or DP or something, they are not exclusive and only one is well defined...

past meteor
#

So your chosen policy only tries to maximize P(Reward|State)
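
A toy illustration of that suggestion, under the assumption that "greedy" means ignoring everything after the immediate reward. The states, actions and rewards are invented; the only point is how the choice changes between gamma = 0 and gamma > 0.

```python
# Deterministic toy problem: transitions[state][action] = (reward, next_state).
# "end" is terminal and has value 0.
transitions = {
    "start": {"grab": (1.0, "end"), "wait": (0.0, "later")},
    "later": {"collect": (10.0, "end")},
}

def value(state, gamma):
    """Bellman recursion: best achievable discounted return from a state."""
    if state == "end":
        return 0.0
    return max(reward + gamma * value(nxt, gamma)
               for reward, nxt in transitions[state].values())

def best_action(state, gamma):
    """Pick the action maximising reward + gamma * value(next_state)."""
    def score(action):
        reward, nxt = transitions[state][action]
        return reward + gamma * value(nxt, gamma)
    return max(transitions[state], key=score)

print(best_action("start", gamma=0.0))  # grab
print(best_action("start", gamma=0.9))  # wait
```

With gamma = 0 only the immediate reward counts, so the recursion degenerates into the greedy choice; with gamma = 0.9 the delayed reward of 10 wins.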

iron basalt
#

Had to correct Wikipedia on some of the DP and greedy stuff, they did change it though, so that's nice.

#

Not the correction I made, just a quote from Wikipedia: "From a dynamic programming point of view, Dijkstra's algorithm for the shortest path problem is a successive approximation scheme that solves the dynamic programming functional equation for the shortest path problem by the Reaching method."

#

Yet we don't consider it approximate. Since we run the whole thing till the end.

#

And we know we will reach some reasonable end.

iron basalt
#

Greedy is supposed to mean that you kind of don't enumerate all your options, you just pick the first one with limited scope.

#

But that scope can vary between the algorithms / contexts. If you are looping over a bunch of options and picking the best, is that really "greedy?" Sure it's not globally picking the best, but it's also not super local either.

#

So how local is local?

#

If I'm doing backtracking stuff and back out just 1 level, is that still local?

#

What about 2?

#

Usually it just boils down to practically, "the inner most loop."

deep sleet
#

This is a bit dumb but what does DP refer to in this context?

iron basalt
deep sleet
#

Like what do you mean by DP in this conversation

iron basalt
deep sleet
#

Ohh

#

Thx

lapis sequoia
#

Ight, wish I didn’t start deep learning a month ago. However, what is the most important math when it comes to deep learning and why do I feel it is matrix and linear algebra?

cloud flower
#

Hello everyone,

I'm currently working on a homework problem where we need to approximate the natural logarithm using an iteration method described by B.C. Carlson. The method involves iterative computation of arithmetic and geometric means, and seems quite interesting.

The iteration method is detailed in the paper:
B.C. Carlson: An Algorithm for Computing Logarithms and Arctangents, MathComp. 26 (118), 1972 pp. 543-549. DOI:10.1090/S0025-5718-1972-0307438-2.

Here's a summary of what I need to do:

- Initialize \( a_0 = \frac{(1+x)}{2} \) and \( g_0 = \sqrt{x} \).
- Iteratively compute \( a_{i+1} = \frac{a_i + g_i}{2} \) and \( g_{i+1} = \sqrt{a_{i+1} \cdot g_i} \).
- Use \( \frac{x-1}{a_i} \) as an approximation to \( \ln(x) \).

The task requires writing a function `approx_ln(x, n)` that uses \( n \) iterations of this algorithm to approximate the natural logarithm \( \ln(x) \).

I'm a bit stuck on how to implement this in code, especially handling the iterations and ensuring the accuracy of the approximation. Could someone provide guidance or a starting point for this implementation? Any help would be greatly appreciated!
ocean apex
#

@cloud flower I faced the same issues recently, but luckily, I got some assistance. Unfortunately, I can't send you any links on this platform. However, if you send me a message, I can offer you some possible support.

half lintel
#

pandas/dataframe question (noob)

I have a bunch of rows in a DF, each of which have a couple of attributes (timestamp, account id, resource type, resource id); And a "status" value

How can I get a summary of the number of each "status" for each account_id + resource_type + resource_id?

Feels like groupby + value_counts() but I can't get my head around it

#

df['STATE'].value_counts() gives me a nice series with:

  • running: 2052
  • stopped: 180
But I need those for each "thing" (which is resource_type + resource_id)
agile cobalt
#

just groupby + count

you'll end up with a Multi Index though, so you'll probably want to call reset_index() after it

half lintel
#

Hmm
df.groupby(['ACCOUNT_ID', 'TYPE', 'ID', 'STATE']).count().reset_index()
Looks kind of close. It's counting columns that aren't in the groupby

agile cobalt
#

you specify

  • which columns to group by
  • which columns to operate on (default: all columns you did not group by)
  • which operation to do on those columns
#

for count, it should be the same regardless of which column it is counting

half lintel
#

some of the other columns have zero or NaN, the tallies seem to be different depending on that.

#

but just using a column which is always present (e.g. timestamp) looks OK

#

sorry can't cut/paste, discord is on different machine to work (where it's banned)

unkempt wigeon
#

What should I teach a small neural network?

half lintel
#

Thanks @agile cobalt

#

I am a fairly competent Python dev, but would like to get (much) better at data. But there is lots to learn in pandas

hidden sapphire
#

Anyone have any good papers/other resources on facial recognition machine learning models?

half lintel
#

@agile cobalt from a DF with [a,b,c,d] columns, how might I add two columns:
group-by [a,b,c] + how many state=running
group-by [a,b,c] + how many total states

#

Trying to make a summary. Input data has hundreds of thousands of rows (the status of every resource, every 15 minutes). The group-by you helped with reduced it to one row per unique resource, with a count, which is super cool.
But would be nice to have two columns running="116" and total="149"

half lintel
#

Got it - using .size() and .unstack()
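
A small sketch of what that .size() and .unstack() approach can look like. The column names follow the groupby shown earlier in the thread; the rows are made up.

```python
import pandas as pd

# Made-up rows using the column names from the groupby above.
df = pd.DataFrame({
    "ACCOUNT_ID": ["a1", "a1", "a1", "a1", "a2"],
    "TYPE": ["vm", "vm", "vm", "db", "vm"],
    "ID": ["r1", "r1", "r1", "r2", "r3"],
    "STATE": ["running", "running", "stopped", "running", "stopped"],
})

summary = (
    df.groupby(["ACCOUNT_ID", "TYPE", "ID", "STATE"])
      .size()                 # rows per group, regardless of NaNs in other columns
      .unstack(fill_value=0)  # one column per STATE value
)
summary["total"] = summary.sum(axis=1)
summary = summary.reset_index()
print(summary)
```

Unlike .count(), which skips NaN values column by column, .size() counts rows, so the tallies do not depend on which other columns happen to be empty.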

glass ridge
#

I have a question posted in python-help about arrays, can anyone help?

broken eagle
#

Does anyone have info on constrained clustering algorithms?

past meteor
zealous acorn
#

Hello, can someone recommend a book for me to learn AI with python? Learning by reading books appeared to be more effective than learning from the internet for me. Thank you in advance

past meteor
zealous acorn
#

Oh i'll make sure to check them out tysm

#

Do u know any actual book tho?

past meteor
#

they're actual books

#
zealous acorn
#

Ah alright alright

#

Tysm

pseudo moon
#

could anyone help me with wandb here? I'm trying to find out how to increase the upload limit here. I'm stuck with 0.934 MB every run. How can I increase it?

mellow glen
#

can anyone resolve this error
plz ping me

serene scaffold
#

actually, I see the problem. you wrote import open ai. that's not the right import statement

#

it's probably something like import open_ai or import openai. there definitely won't be a space.

mellow glen
#

ModuleNotFoundError Traceback (most recent call last)
Cell In[33], line 1
----> 1 from langchain import PromptTemplate
2 from langchain.chains import RetrievalQA
3 from langchain.embeddings import HuggingFaceEmbeddings

ModuleNotFoundError: No module named 'langchain'

serene scaffold
mellow glen
#

it is installed

serene scaffold
#

what did you do to install it?

mellow glen
#

pip install langchain

serene scaffold
#

okay, so the python instance that you installed langchain to is not the one you're using to run it

serene scaffold
# mellow glen pip install langchain

make a new cell in your notebook with these two lines, and run that cell, and show the result in this chat as text (not a screenshot)

import sys
print(sys.executable)
mellow glen
#

c:\Users\kabhi.conda\envs\mchatbot\python.exe

serene scaffold
mellow glen
serene scaffold
# mellow glen its ok btw thx bro

you need to install langchain in the python that's at c:\Users\kabhi.conda\envs\mchatbot\python.exe. you have more than one python on your computer. if you did pip install langchain and it worked, it went to some other python (probably the default one)

#

doing c:\Users\kabhi.conda\envs\mchatbot\python.exe -m pip install langchain might solve it, but powershell (like conda) is something I don't use.
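
From inside the notebook, the same idea can be expressed without typing the path by hand, because sys.executable already points at the interpreter the kernel is running. The helper name below is made up, and the actual install call is left commented out so the sketch has no side effects.

```python
import sys

def pip_install_command(package):
    """Command that runs pip with the same interpreter this code runs in."""
    return [sys.executable, "-m", "pip", "install", package]

command = pip_install_command("langchain")
print(" ".join(command))

# To actually run it from a notebook cell:
# import subprocess
# subprocess.check_call(command)
```

In IPython and Jupyter, `%pip install langchain` in a cell is a shorter way to install into the environment of the running kernel.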

polar minnow
#

Hi guys, I'm learning Python programming for ML, with libraries such as Matplotlib, NumPy, Pandas and SkLearn. I'd like to find a job within some months, and I plan to build some small projects by following along with some paid online courses (on Udemy). What do you think? Do I need to learn other things? I'll get to learn TensorFlow as well, although I don't know if it's enough to work

mellow glen
#

File c:\Users\kabhi.conda\envs\mchatbot\lib\site-packages\langchain_api\module_import.py:69, in create_importer.<locals>.import_by_name(name)
68 try:
---> 69 module = importlib.import_module(new_module)
70 except ModuleNotFoundError as e:

File c:\Users\kabhi.conda\envs\mchatbot\lib\importlib_init_.py:127, in import_module(name, package)
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:991, in find_and_load(name, import)

File <frozen importlib._bootstrap>:961, in find_and_load_unlocked(name, import)

File <frozen importlib._bootstrap>:219, in _call_with_frames_removed(f, *args, **kwds)

File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:991, in find_and_load(name, import)

File <frozen importlib._bootstrap>:973, in find_and_load_unlocked(name, import)
...
76 ) from e
77 raise
79 try:

vestal spruce
serene scaffold
#

That isn't to say that you shouldn't learn it if it interests you, but if you need to train for a job, you should look at more attainable options.

serene scaffold
mellow glen
#

ok

mellow glen
#

its showing No Python at '"cd D:\MP\End-to-end-Medical-Chatbot-using-Llama2\python.exe'
this error

#

but i have py

cloud flower
river cape
#

model.fit(X_train,y_train,epochs=100)

Which gradient descent is used here? and if I dont mention the batch_size , which gradient descent will be used?

tidal bough
#

check the docs for whatever model that is, they should say

tidal bough
# cloud flower Help

You might want to ask in #1035199133436354600, since it's implementation you want help with. Also, people will generally not write code for you, so if possible ask concrete questions and/or post what you've written so far.

cloud flower
# tidal bough You might want to ask in <#1035199133436354600>, since it's implementation you w...

This is what I have so far:

import math

def approx_ln(x, n):
    # Ensure the input x is greater than 0
    if x <= 0:
        raise ValueError("x must be greater than 0")
    
    # Initialization
    a = (1 + x) / 2   # a_0
    g = math.sqrt(x)  # g_0
    # Iterate n times
    for _ in range(n):
        a_next = (a + g) / 2  # Compute a_{i+1}
        g = math.sqrt(a_next * g)  # Compute g_{i+1}
        a = a_next  # Update a to a_{i+1}
    
    # Approximation of ln(x)
    ln_x_approx = (x - 1) / a
    return ln_x_approx

# Example usage
x = 5
n = 10
print(f"Approximation of ln({x}) with {n} iterations: {approx_ln(x, n)}")
tidal bough
#

That looks right - so what's the problem?

cloud flower
tidal bough
#

You could check that it converges to math.log(x) as n increases.

cloud flower
#

I see, so I compare the output of the approx_ln function to the result of math.log(x) for increasing values of n, and observe how the approximation improves as the number of iterations increases.

#

Pseudocode-wise, what I thought was that we need something that excludes it accepting x <= 0. Then we can initialize a_0, g_0, and then in the last step use the appropriate loop that just iterates over 1 to n, where n is what you give as input.

tidal bough
tidal bough
cloud flower
tidal bough
#

Calculate math.log(5) - approx_ln(5,n) for n in range(1,20), then plot it with matplotlib (with a logarithmic y-axis). In my case I also compared it to a ∼4^(-n) line, because it seems it fits the points perfectly (it's surprising to me, normally approximations don't converge this nicely).
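
A sketch of that check without the plot: it prints the error for each n together with the ratio of successive errors, which should sit near 4 if the error really shrinks like 4^(-n). The function repeats the iteration from the code posted above, minus the input validation.

```python
import math

def approx_ln(x, n):
    """Same iteration as in the posted code, without the x <= 0 check."""
    a = (1 + x) / 2
    g = math.sqrt(x)
    for _ in range(n):
        a = (a + g) / 2
        g = math.sqrt(a * g)
    return (x - 1) / a

errors = [math.log(5) - approx_ln(5, n) for n in range(1, 20)]

for n, (err, nxt) in enumerate(zip(errors, errors[1:]), start=1):
    print(f"n={n:2d}  error={err:.3e}  error(n)/error(n+1)={err / nxt:.3f}")
```

On a logarithmic y-axis (for example with matplotlib's `plt.semilogy`), a constant ratio like this shows up as a straight line.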

misty shuttle
#

why does this error occur?

#

the xlabel and ylabel code does not work either

tidal bough
#

that means that somewhere you did plt.title = ...

#

much like how in this cell you're doing plt.xlabel = ..., which is also wrong. these are functions, you should be calling them, not redefining them.

misty shuttle
#

Oh right, but even when I change it to plt.xlabel('X axis') and run it, it still doesn't work. Gives the same error

#

This is the entire document, literally nothing else

tidal bough
#

The simplest fix would be to restart the notebook.

river cape
tidal bough
#

Well, sure, .fit in sklearn can do completely different things depending on what you're calling it on.

unkempt apex
#

how can I import modules in google colab?

#

which have dependencies in it!

tidal bough
#

you just import them as normal.

#

if you mean how to install them - with %pip install somecoolpackage

unkempt apex
#

nah, I did with gdrive

#

I mount that first!

#
os.chdir('/content/drive/MyDrive/Pong/')
!ls
#

so this prints all the files which I need

#

but when I import them , it shows error

#

no wait I solved that!

toxic palm
#

How to execute & test aws lambda locally in vscode?

versed pilot
# polar minnow Hi guys, I'm learning Python programming for ML, libraries such Matplotlib, Nump...

Do consider data analyst jobs. You are listing Matplotlib which is a visualisation library, numpy for arrays and Pandas which is a dataframe analysis library. While these are sort of prerequisites for ML, SkLearn is the only ML library in your list. With this toolset I would advise you to also look at jobs involving data transformation, data analysis and visualisation. I'd also advise you to learn statistics libraries such as scipy.stats (while you are at it, have a look at scipy.signal and scipy.optimize ) and/or statsmodels. You don't say what your background is otherwise, do you have a social or natural science or engineering degree that involved statistics, computational methods, mathematics in general?

deep sleet
#

What is smote

opaque oasis
#

What's a good GitHub repo to look at for creating a GitHub portfolio for data science, data engineering and AI? What type of projects should I have?

deep sleet
loud plank
#

Where would you guys recommend starting for someone who wants to create AI and is learning from the absolute basics, with only a small foundation in Python?

left tartan
loud plank
past meteor
#

Often times the people that make it don't really know what they're doing either 🙂

#

If you're going to do that, look for open courses like from MIT or so

loud plank
#

And 20 minutes in I feel like I’m wasting my time

past meteor
#

Or check out the second pinned post, I have a bunch of books there but they'll need a decent understanding of Python

loud plank
#

so I’ve been struggling to find a resource that actually teaches structure, basic understanding of code, etc.

#

yeah I saw

past meteor
#

Pick any book, literally any book, that teaches Python and is somewhat recent

#

and you'll be fine

loud plank
#

Will do
Thank you both for the help

#

Yall are awesome

left tartan
# loud plank Yall are awesome

!kin And, start doing small projects. It doesn't matter what it is (web site? Tic tac toe? Etc): the easier the better

arctic wedgeBOT
#
Kindling Projects

The Kindling projects page on Ned Batchelder's website contains a list of projects and ideas programmers can tackle to build their skills and knowledge.

past meteor
left tartan
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

past meteor
#

Another practical tip I can give is that if I want to understand something I read several books on the same topic 😩

#

Hence why I'd say, pick any beginner book. You'll probably read several of them

unkempt wigeon
#

How do I start a neural network library?

cloud flower
# tidal bough Calculate `math.log(5) - approx_ln(5,n)` for n in range(1,20), then plot it with...

Hey, the follow-up task to the first one was:

Plot both functions, ln and approx_ln, in one plot and the difference of both functions in another plot. Do this for different values of n.
And this is what I wrote:

import numpy as np
import matplotlib.pyplot as plt

def approx_ln(x, n):
    if x <= 0:
        raise ValueError("x must be greater than 0")
    
    # Initial values
    a_i = (1 + x) / 2
    g_i = np.sqrt(x)
    
    # Iterative calculation
    for _ in range(n):
        a_next = (a_i + g_i) / 2
        g_next = np.sqrt(a_next * g_i)
        a_i, g_i = a_next, g_next
    
    # Final approximation
    ln_approx = (x - 1) / a_i
    return ln_approx

# a range of x values
x_values = np.linspace(0.1, 5, 400)

# true ln(x) values is computed
true_ln_values = np.log(x_values)

# values of n for approximation
n_values = [1, 2, 5, 10]

# subplots are created here
fig, axes = plt.subplots(2, 1, figsize=(10, 12))

# The ln(x) and approx_ln(x) for different n values is plotted
for n in n_values:
    approx_ln_values = [approx_ln(x, n) for x in x_values]
    axes[0].plot(x_values, approx_ln_values, label=f'approx_ln, n={n}')
axes[0].plot(x_values, true_ln_values, label='ln(x)', color='black', linestyle='--')
axes[0].set_title('True ln(x) and Approximations')
axes[0].set_xlabel('x')
axes[0].set_ylabel('ln(x)')
axes[0].legend()
axes[0].grid(True)

# The difference between ln(x) and approx_ln(x) is plotted
for n in n_values:
    approx_ln_values = [approx_ln(x, n) for x in x_values]
    difference = true_ln_values - approx_ln_values
    axes[1].plot(x_values, difference, label=f'n={n}')
axes[1].set_title('Difference between ln(x) and approx_ln(x)')
axes[1].set_xlabel('x')
axes[1].set_ylabel('Difference')
axes[1].legend()
axes[1].grid(True)

# Show plots
plt.tight_layout()
plt.show()
#

Curious what you think.

deep sleet
#

Does anyone know how to use tqdm with scikit-learn cross-validation so I can see the progress?
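
One possible way to get a progress bar here, as a minimal sketch assuming tqdm and scikit-learn are installed: `cross_val_score` has no tqdm hook that I know of, so the loop over folds is written by hand and wrapped in `tqdm`. The helper name `cross_val_with_progress`, the estimator, and the generated data are all placeholders, not anything from the thread.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from tqdm import tqdm


def cross_val_with_progress(estimator, X, y, n_splits=5, seed=0):
    """Run stratified k-fold CV by hand so tqdm can show one tick per fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in tqdm(cv.split(X, y), total=n_splits, desc="CV folds"):
        model = clone(estimator)  # fresh, unfitted copy for every fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return np.array(scores)


# Placeholder data, only here so the sketch runs on its own
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
scores = cross_val_with_progress(LogisticRegression(max_iter=1000), X, y)
print(scores.mean())
```

The bar only advances once per fold, so it is most useful when each fit is slow.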

cloud flower
toxic palm
#

Can we start an AWS Lambda function from VS Code using terraform?
I mean, is terraform only for creating & destroying resources, or can it also do the above?

violet gull
#

Connect the lambda function to an api endpoint and call it from any framework that calls apis

toxic palm
# violet gull Wym by start?

This is what i did as of now

--> Created a .tf file where I wrote the code to create a Lambda function
--> After executing the series of commands init, plan & apply, the Lambda got created on the AWS console.
--> Now, if I want to use that Lambda, I need to run it from the AWS console, pass the input & see the output.
--> But I want to execute this Lambda from my console & see the output in my console itself, without touching the AWS portal. Can terraform do this?
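
For calling an already-deployed Lambda from a local terminal, one option outside Terraform is the AWS SDK. The sketch below assumes boto3's Lambda client and its `invoke` call; the function name `my-function`, the payload, and the `FakeLambdaClient` stand-in are hypothetical and only there so the example runs without AWS credentials.

```python
import io
import json


def invoke_lambda(client, function_name, payload):
    """Call a Lambda synchronously and return its decoded JSON result."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


class FakeLambdaClient:
    """Hypothetical stand-in for boto3.client("lambda"), no AWS needed."""

    def invoke(self, FunctionName, InvocationType, Payload):
        event = json.loads(Payload)
        body = json.dumps({"function": FunctionName, "echo": event})
        return {"StatusCode": 200, "Payload": io.BytesIO(body.encode("utf-8"))}


result = invoke_lambda(FakeLambdaClient(), "my-function", {"x": 1})
print(result)

# With real credentials configured, the same helper would be used as:
#   import boto3
#   invoke_lambda(boto3.client("lambda"), "my-function", {"x": 1})
```

Terraform would still own creating and destroying the function; the script only sends it an event and prints what comes back.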

deep sleet
#

I believe my code has data leakage since it shows that I get accuracy 0.99 something

#
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

Y = train_metadata['target']
X = train_metadata.drop(columns=['target'])

numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns


X[numeric_features] = X[numeric_features].apply(lambda x: x.fillna(x.mean()))

for col in categorical_features:
    X[col] = X[col].fillna(X[col].mode()[0])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])

# Split to get 10% for training
X_train, X_rest, y_train, y_rest = train_test_split(X, Y, test_size=0.9, random_state=42)

X_test, _, y_test, _ = train_test_split(X, Y, test_size=0.9, random_state=43)

X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)
left tartan
#

Your split isn't splitting

left tartan
deep sleet
#

also shouldn't it be a very low chance for them to be the actual same 10%

left tartan
#

Why wouldn't you use train test split once to get the train and test?

left tartan
# deep sleet also shouldn't it be a very low chance for them to be the actual same 10%

I don't know your data, I'm just saying this is a weird way to do this. You can pass train size and test size in a single call: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
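
A minimal sketch of the single-call split suggested here, on placeholder data rather than `train_metadata`. Two independent 10% draws from the same table will on average share about a tenth of their rows, whereas one call with both `train_size` and `test_size` gives slices that cannot overlap. Whether that overlap alone explains the 0.99 accuracy is not something this sketch can tell.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for X and Y from the snippet above
X = np.arange(1000).reshape(-1, 1)
Y = (X.ravel() % 2 == 0).astype(int)

# One call, two disjoint 10% slices; the remaining 80% is simply left out
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, train_size=0.1, test_size=0.1, random_state=42, stratify=Y
)

# Each placeholder row holds a unique value, so set overlap == row overlap
overlap = set(X_train.ravel()) & set(X_test.ravel())
print(len(X_train), len(X_test), len(overlap))
```

Doing the mean/mode imputation after this split, fitted on the training slice only, would close the other possible leak in the original snippet.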

deep sleet
#

Ohhh okay

vale jungle
#

people what's up with interactive graphing these days?

#

looks like a bunch of web-based trash which renders in browsers

#

is there anything for data exploration which is comparable to the ease-of-use of ggplot through rstudio?

agile cobalt
#

Jupyter Notebooks with seaborn, plotly or bokeh

vale jungle
#

why not just seaborn?

agile cobalt
#

eh I like plotly better myself

vale jungle
#

and plotly is the kind of web-dependent trash I'm referencing as being suboptimal

#

let me state that more politically

#

plotly, I'd like to avoid, and I think categorically as I noted above

agile cobalt
#

interactivity can be complicated

vale jungle
#

I don't find it useful in my work so it's fine to have say no tooltip at all. I am just looking at trends.

#

in fact it can be a dead graph once I draw it!

#

as long as it can be redrawn instantly when I change filters / columns / rerun analysis and build some new summary column

agile cobalt
#

why did you ask for interactive graphing then?..

vale jungle
#

because the graphing process is interactive in ggplot

#

e.g. you layer, you filter, you manipulate, and the graph rendering process is indeed interactive

#

it's just not point and click interactive, as I guess you assumed I must mean

#

my b to confuse

agile cobalt
#

sounds like literally any IPython terminal / Jupyter Notebook?

vale jungle
#

yeah I don't think a terminal is as great for ease of use, nor a jupyter notebook

#

I'd be doing analysis via pycharm or vscode

#

whichever the graphing library's rendering requirements integrated best with

agile cobalt
#

you realize you can use notebooks inside of VS Code / PyCharm right?

#

(and I'd strongly recommend using inside of them over a web interface)

vale jungle
#

why would I want that?

agile cobalt
#

if anything maybe take a look at Spyder, might be closer to R Studio

vale jungle
#

they always seemed, to me, to be a way to containerize examples for export in the web-ed systems

#

not necessary for actual data science

agile cobalt
vale jungle
#

I want to build an environment once and persist it

agile cobalt
#

IPython works just as well, if not better, if you're familiar with terminals and don't need inline plots

vale jungle
#

e.g. I like the environment in which I install tf and pytorch etc to be static

#

see inline plots - what I want is ease of data exploration. So like, rstudio just puts that plot right on the screen, right above your working environment's CLIs

agile cobalt
vale jungle
#

I love that. I am trying to find a similar configuration to enact a chillness like that for a python-based data analysis.

agile cobalt
vale jungle
#

so what's up with notebooks? do they have their own environments, or can you link any environment to a notebook?

#

can they share an env?

#

ah

agile cobalt
#

the notebook itself does not contain the environment - you have to specify which environment it'll use as its backend

vale jungle
#

so what does it do?

agile cobalt
#

just contains the code and its output

vale jungle
#

if my environment is local, and my ide is local (you said I can use a notebook in pycharm)

agile cobalt
#

yes

vale jungle
#

so I'd write a little plotly and run this notebook and that would handle rendering directly to what?

agile cobalt
#
  • You should create a virtual environment (or a conda environment) and install your packages to that environment
  • Your IDE can create it for you, or you can create it manually and tell your IDE to connect to that environment
  • You can specify which environment to use when running scripts or terminal sessions
  • You can specify which environment to use as the "Kernel" for notebooks, which is just a fancy way of saying which environment they're running in
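
A quick way to confirm which environment a script, terminal session, or notebook kernel is actually running in is to ask the interpreter itself. This is a small sketch using only the standard library; the helper name `describe_environment` is made up for illustration.

```python
import sys


def describe_environment():
    """Report which interpreter and environment the current process uses."""
    return {
        "executable": sys.executable,  # path to the running python
        "prefix": sys.prefix,          # root of the active environment
        # True inside a venv/virtualenv; conda environments report False here
        "in_venv": sys.prefix != sys.base_prefix,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running the same cell in a notebook and the same lines in a terminal shows whether both are pointed at the same environment.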
vale jungle
#

do I need to coordinate between the notebook and plotly, or does plotly coordinate with the notebook which calls my local environment to render it?

agile cobalt
#

you would just write some code like
```py
# cell 1
import pandas as pd
import plotly.express as px

# cell 2
df = pd.read_csv('data.csv')
df  # show a preview of the df

# cell 3
px.bar(df, x="name", y="score")  # shows the plot as the cell output
```

vale jungle
#

yeah looking good, I'm testing it now

#

any way to log output?

agile cobalt
vale jungle
#

data. I can see my graph (adjusted from the example graph in the jupyter docs)

#

but if I do print('hi there')

#

I see no changes

agile cobalt
#

did you remember to actually run the cell?

vale jungle
#

nope!

#

was looking for a button like that in the locality of the code & output

#

they're making some big deal about "packaging" those things together

#

tyty

agile cobalt
#

remember to check the hotkeys for things like running cells

vale jungle
#

sure thing. So spyder vs jupyter?

#

spyder I presume local environment setup is easier?

#

or is that the de facto tool for using notebooks with a local env even?

agile cobalt
agile cobalt
vale jungle
#

I don't cling to the past 😏

#

I shall try both and report back

agile cobalt
#

there is also the option of just using neither, writing your code in normal .py files, and running it in a normal terminal
(either entire files at once like normal, or just select regions of it then use some keybind to tell your IDE to run it in the terminal. shift + enter does that in VS Code)

vale jungle
#

hurrrgth man I've done that plenty, I'm looking for cutting edge and that means hot-reloading of live-edited code

small wedge
#

jurigged this

vale jungle
#

if it must be scripted

small wedge
#

look into that module, it's pretty nice

agile cobalt
#

you should be able to use ctrl + enter to re-run the cell you're currently editing in Jupyter, not sure about Spyder, but I'd advise against actual hot-reloading

vale jungle
#

looks good. But I think I still have need for the web environment and some behind-the-scenes I/O management, which it seems jupyter / spyder may do

vale jungle
vale jungle
#

thank you both. It's been productive, I feel like I can get back into viz with an approach which isn't from the stone age.

agile cobalt
#

btw, as a bit of a side note, for actual web dashboards (not exploration) you would typically use something like streamlit, dash or gradio

they all support most plotting libraries

vale jungle
#

gotta admit I do all my web graphing in ChartJS because it's just got exquisitely pretty defaults and the minified version is small enough to ship to p much any client

#

and if I'm writing it for production, I mean. The time to write is worth it.

left tartan
#

I use plotly also because it's easy to export to plotly.js

merry ore
#

Hello folks, not sure if this is the right place to ask, but are there any good solutions right now for using native webgpu for inferencing right now?

mellow glen
#

can anyone help me integrate the py code of a chatbot with a pre-built website ??

magic cedar
unkempt apex
#

I only ran colab for 3 hours and now it is saying timeout!! ( I mean no access )

#

but the docs say that you can use it for 12 hours!

rich moth
orchid thunder
#

dude ML is so fun to learn

#

minus the math

#

and everything else

rich moth
unkempt apex
#

did this create the problem?

rich moth
unkempt apex
#

anyways I am using kaggle now!

past meteor
#

Add more padding/margin

#

For simplicity's sake I have it render html and render it directly

#

I'm not streaming the response rn, once I do I'll have to change it to markdown and convert it to html client side

#

Because I think it'll misbehave when it's streaming a response like <h1> the title ...

#

I've been slacking with polishing mine

#

it's functionally done but I want to add polish

#

nice

#

I created a static site generator by mistake for my personal site 😢

past meteor
#

Yesn't. I have a system that hooks into an existing build tool (vite) to render my markdown into HTML, add web components and then publish to my site on every commit. Also has basic stuff like automatic reference lists, template inheritance and so on.

#

I didn't have to write any of the hard stuff myself

#

yup, but I don't "trust" their support for web components

#

All my links are turned into this style

#

done at build time

#

yes

#

At build time I just have some JS (but it could've been Python or Go) that does this:

markdown => HTML => replace <a> to <custom-component> . I also generate a reference list like this
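
Since the build step "could've been Python", here is a rough Python sketch of the last stage only: replacing `<a>` tags in already-rendered HTML with `<custom-component>` and collecting a reference list. It is not the author's JS. The regex, the `ref` attribute, and the `<ol>` output format are assumptions made for illustration, and a real build would more likely use an HTML parser than a regex.

```python
import re

# Matches simple anchors such as <a href="https://example.com">text</a>
ANCHOR = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.DOTALL)


def replace_links(html):
    """Swap <a> tags for <custom-component> and collect a reference list."""
    references = []

    def swap(match):
        url, text = match.group(1), match.group(2)
        if url not in references:
            references.append(url)
        index = references.index(url) + 1  # 1-based reference number
        return f'<custom-component href="{url}" ref="{index}">{text}</custom-component>'

    body = ANCHOR.sub(swap, html)
    items = "".join(f"<li>{url}</li>" for url in references)
    return body + f"\n<ol>{items}</ol>", references


page = '<p>See <a href="https://example.com/a">A</a> and <a href="https://example.com/b">B</a>.</p>'
html, refs = replace_links(page)
print(html)
```

Because this runs at build time, the served page needs no JS for the rewrite itself, which matches the "My JS is only at build" point.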

#

Maybe hugo does this?

#

I was too lazy to read the docs so I did the stupid thing and created my own 🤣

#

My JS is only at build

#

The served site has virtually no JS

#

just disabled JS out of curiosity and most of my site still works (dark mode dies)

#

yes, I used to use it but meh

#

a dev container with a dataset inside?

#

Sure, go for it

#

unsure how big the userbase is of dev containers

remote stream
#

guys how important is maths for Machine Learning

past meteor
#

nix is a good solution for this

toxic mortar
#

Hello, are there some ML models/techniques that can identify sub-keywords within concatenated words or strings? Something like tokenization, segmentation, or sequence labeling techniques. I am looking to decompose a concatenated username string. Thanks!!
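
As a baseline before reaching for a model, segmentation against a word list can be done with plain dynamic programming (the classic "word break" approach). To be clear, this is a dictionary method and not an ML technique; the vocabulary, the 20-character piece limit, and the `segment` helper are hypothetical. Learned subword tokenizers would be the next step if a fixed word list is too rigid.

```python
def segment(text, vocabulary, max_len=20):
    """Split text into the fewest pieces; unknown characters stay single."""
    text = text.lower()
    n = len(text)
    # best[i] holds (number of pieces, pieces) for the prefix text[:i]
    best = [(0, [])] + [None] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in vocabulary or len(piece) == 1:
                count, pieces = best[start]
                candidate = (count + 1, pieces + [piece])
                if best[end] is None or candidate[0] < best[end][0]:
                    best[end] = candidate
    return best[n][1]


vocab = {"dark", "knight", "night", "rider", "star", "wars", "fan"}
print(segment("darkknightrider", vocab))
print(segment("starwarsfan99", vocab))
```

Minimising the number of pieces is a crude scoring rule; weighting pieces by word frequency would be a natural refinement.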

past meteor
#

It's enough to have a decent basis and then you can learn what else you need when necessary

remote stream
#

because i am new to learning ml

past meteor
#

it's not a question you can answer precisely

#

Start by going in blind without knowing math and building things

remote stream
#

like what topics are highly used i

past meteor
#

To see if you like the idea of ML

remote stream
#

i watched khan academy for linear algebra

past meteor
#

And then read any standard book on linear algebra and then calculus

#

No videos, use a book

remote stream
#

and youtube for calculus

past meteor
#

Learning from videos is a myth 🤓

remote stream
#

he teaches with application in ml

past meteor
#

sure sure but you have to engage with the material

#

In uni you kind of do that by taking notes and summarizing them after the fact

remote stream
#

Unis are kinda ded at teaching those stuff

past meteor
#

I don't think anyone is watching Khan academy and taking notes

remote stream
#

fr

past meteor
#

Hence, go for material that you need to actively do effort in order to "parse"

#

So books

remote stream
#

but books got lot of unnecessary things too ryt

#

the things which we wont even see in ml?

#

like too much theory

remote stream
# past meteor So books

for example, for PCA, just knowing how it works and its structure is necessary for ML, not the mathematical derivation and calculation, right? (Because we have libraries for it.)

past meteor
#

Ok so at the risk of sounding like a massive gatekeeper

#

This is something you're imo only allowed to say after you know enough about the theory

#

Before learning it you absolutely don't know what is and isn't relevant

#

In hindsight I can say "thing X and Y were irrelevant to know"

#

And maybe that's some sort of bias because knowing them helped me understand different things more easily

past meteor
#

They all want you to pick up to k principal components

#

maybe having principal component 1, 3 and 7 is the best and not range(1, 8)

#

Or maybe you need kernel PCA or an autoencoder etc etc
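
To make the "1, 3 and 7 instead of range(1, 8)" point concrete, here is a small sketch assuming scikit-learn's `PCA` and random placeholder data. It only shows the mechanics of picking non-contiguous components; which components are actually best would depend on the downstream task.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # placeholder data

pca = PCA(n_components=8).fit(X)
scores = pca.transform(X)  # columns are components 1..8, by explained variance

first_k = scores[:, :3]             # the usual "top k" choice
hand_picked = scores[:, [0, 2, 6]]  # components 1, 3 and 7 instead

print(first_k.shape, hand_picked.shape)
```

Libraries make the top-k choice the default, which is exactly why it helps to know enough theory to question it.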

remote stream
#

so i just learn enough theory and start learning ml

#

and if i find any barriers i go relearn or learn about the topic and come back?

#

does that sound correct

past meteor
#

Something like that

remote stream
#

i am telling that cuz i dont wanna waste much time on learning the unwanted math

#

and learn how to actually build the model

#

cuz that will force me to learn the ones that i will actually have to learn

#

that sounds fine

past meteor
#

doing the entire theory at once never works ye

#

I think my method is a bit more hardcore but it works ™️

remote stream
#

I am absorbing most of the basic maths for ml

past meteor
#

If I read a book on say reinforcement learning I implement the majority of algorithms along the way

remote stream
#

and am going to try learning ml while relearning, or learning, the maths that i haven't learnt

past meteor
#

For Rust I did this

remote stream
#

learning along the way the things we missed for foundation

past meteor
#

For something like Rust I'd argue it makes sense to do that and not learn "as you go"

remote stream
#

do you catch the context

past meteor
#

yes

remote stream
#

not py deep learning?

#

ah ic

past meteor
#

anyway, the most important thing is just starting

#

Time spent optimizing how you'll learn and creating roadmaps is time spent not learning. Nowadays I just pick any book and read that

#

Instead of wasting time looking for the optimal 😄

#

zen buddhism

remote stream
#

But I atleast try 1 hour to optimize my learning

#

cuz blindly going on and not knowing the prerequisites hits hard

proper crag
#

this is effective if you alr know math but if i didn know any what math else i can think of to try out

past meteor
#

The problem with data is that in software if you do things incorrectly you often get runtime errors

#

or it won't compile

#

With data there's a whole class of problems you don't know you're doing incorrectly unless you know otherwise

proper crag
#

im srry i mean *if i didn know any math ....as project usually structured upon real world concept like math,clients request

past meteor
#

No type safety will stop you from leaking data from test to train

tidal bough
#

yo dawg, i heard you like containers, so i put each of your datasets into a container inside your dev container

past meteor
#

✨ idiosyncratic✨

#

No, it's just weird i'm sorry :p

#

How hard is setting up a dev env?

#

Especially for Python

#

I use Ansible to install pyenv

#

and pipx and poetry

#

and then I just do poetry install and I have my dev env?

#

I think on your part tbh, I don't see what problem it's solving

#

Don't mean it in a rude way - I'm genuinely curious

#

I care about reproducibility and automation a lot

#

So why not just do this:

#

You use whatever means to bootstrap a Python environment, likely just pip install a requirements file

#

And you just download the dataset with a get request?

#

or you use wget

#

maybe our workflows are so different I don't get it, I'll chalk it down to that haha

#

I basically have docker compose files that spin up things I can use in dev (say, a database) as well as being populated with the right env variables during deployment

#

But I make assumptions like "everyone will have Python and Docker installed"

#

Running Python itself in a dev container is a waste of time imho

serene scaffold
past meteor
#

Sure if you use codespaces this probably makes sense but hence why I keep using ✨ idiosyncratic ✨

serene scaffold
#

I saw someone on reddit who was like "I hate when vendors only ship their product as a docker image. not everyone uses docker." bruh just install it
it's great.

past meteor
#

it's basically the same workflow

#

except with mine you need to do docker compose -f <the-file-name> up -d

vale jungle
#

hey, when you're exploring data, do you guys activate the same environment in pycharm & jupyter notebook and fuss about in pycharm and only transcribe to jupyter when you have the gist of things? or do you use one of jupyter's consoles such that you share the inner ( Python ) variables?

past meteor
#

What standard?

#

There is no standard

#

By M*crosoft at large

#

I don't know anyone in the real world that uses it 🤷

#

Except for me, briefly

serene scaffold
past meteor
#

I don't dislike your setup

vale jungle
#

how is it? commendable? because that seems ideal

past meteor
#

But don't do as if you invented hot water haha

#

it's the same as others

serene scaffold
vale jungle
#

I don't find conda complicated

past meteor
#

I see more repos asking you to run some infra with docker compose up than those that use dev containers

vale jungle
#

but I maybe would've found the jupyter plugin to pycharm w/o knowing what jupyter even did

past meteor
#

At the end of the day, I don't care

#

Whatever they propose I'll use

#

Do you use vs code?

vale jungle
#

it's not wasted time

#

that I do

past meteor
#

I'll check out your question(s) in a second @vale jungle

serene scaffold
vale jungle
#

I am fluent in regular code

#

CLIs, scripts, programs

past meteor
#

But I hope you can see this is a very specific setup?

#

I mean, coding exclusively in the cloud

#

You have specific problems to your setup

#

Hence why I call it idiosyncratic

#

Your entire set up is ephemeral

serene scaffold
# vale jungle I am fluent in regular code

alright. well you can write notebooks in pycharm, or in jupyter's browser interface. they're both essentially the same. they just have buttons in different places. and pycharm will have code completion.

past meteor
#

When the majority of us are coding in persistent places

past meteor
#

anyway, we're going in circles

past meteor
#

If it's really ad hoc data exploration I'd potentially do all of it in a notebook

serene scaffold
vale jungle
#

i was referring to onboarding people

#

oh, I see what you are saying.

#

yeah that's an immediate benefit of containerization

#

dang pycharm's jupyter plugins require pycharm professional

#

how's vscode for python in general?

serene scaffold
serene scaffold
vale jungle
#

sadly I'm graduated

#

well lookin like I'm gonna try it out

#

excellent thank you community, I'm up and running in vscode with my jupys and securely wrapped in a conda.

past meteor
serene scaffold
#

I'm like that with pycharm

vale jungle
#

I'm the opposite baby I try everything day 1

past meteor
#

yup, I would've used Pycharm if I started with that, I'm sure of it

serene scaffold
past meteor
small wedge
#

I used pycharm first and realized I hated it now I use vscode

vale jungle
#

bah nonsense

#

I agree for a naive user, but let's say you've already tried 2-3 tools in the past with sensible parallels

#

but I may have no point doing it either : / I concede I am not 100% certain of my methods

#

once you've learned five programming languages and 5 IDEs it's kind of like yeah maybe you can get a sense of abilities / disabilities

past meteor
#

I use just 1 editor, vs code

#

Learning 5 editors would be strange (for me)

vale jungle
#

well lemme tell you, rstudio kicks configured vscode's ass, and I would argue that emacs is a superior tool for a couple of niche languages (particularly erlang & elixir)

#

like, for r, erlang, & elixir, vscode just isn't the right choice because of community commitments to other interfaces and the resulting package availability.

past meteor
#

Emacs just uses the language server protocol doesn't it? Anything that works there works on vs code or other stuff like sublime

vale jungle
#

supposedly, I guess

#

I didn't even know that the language server protocol was something which vscode implemented lol

past meteor
#

Anyhow, I think this is a micro optimization 🙂 but yeah, whichever you prefer works

vale jungle
#

I assume there are elements which prevent somebody from just releasing all emacs packages for vscode, because it's the case that that isn't what happens

past meteor
#

I only have my reservations of a few

#

(I think we got off topic though)

vale jungle
#

ehhh it's not a microoptimization. Like I'd say the default R studio object explorer is a beautiful tool

#

but yeah, surely not focused.

past meteor
#

vs code's R plugin has that too

vale jungle
#

true, true. It's not just that though, it's how graphs are displayed inline

#

it's a bunch of things

#

and it's all preconfigured

past meteor
#

source: I've used it

vale jungle
#

like out of the box, Rstudio is a beautiful data exploration environment. Vscode requires some configuration to achieve that .

past meteor
#

I didn't configure anything

#

just installed the plugin

vale jungle
#

does tidyverse formatting work in vscode?

#

always liked how data tables print in rstudio

past meteor
vale jungle
past meteor
#

I'm pretty sure it does

#

In the back it just renders the tables as html

vale jungle
#

I dunno I really just felt it was magical the last few times I tried anything else

#

it's hard to find any motivation to move off r studio because of that.

#

if it ever messes with me though VScode is awaiting me for my R-dependent work, too

spring field
#

I have always loved PyCharm

#

I also use VSCode

finite sierra
#

I'm interested in learning about machine learning but don't know where to start, I'm torn between 3blue1brown's "What is a neural network" series and Andrew ng's machine learning course on Coursera, which should I choose? or should I watch both?

serene scaffold
spring field
#

mmm, I'm using PyCharm pro, so I don't need to switch the context for web frontend pithink

spring field
#

nah, PyCharm Pro edition integrates all of Webstorm

tranquil mist
#

I work in data and most of my data visualization starts in excel hehe
I then transition to vscode to actual modeling and coding

tranquil mist
#

You don’t need MUCH, just get to eigenspaces and the basics of probability theory
This will give you intuitive understanding of what machine “learning” is

unkempt apex
#

how can I convert 23 gb dataset into 2 gb!!

#

random selection of data?
because I just wanna test something

#

the problem is root dir contains several directories which are confusing

#

TuSimple !

#

which has folders in folders and ......

#

wait!

#

I downloaded that from kaggle!

#

give me correct tree command
tree -d

#

???
it is printing 1000 dir

#

then I have to use pastebin lol!

#
├── test_set
│   ├── clips
│   ├── readme.md
│   └── test_tasks_0627.json
└── train_set
    ├── clips
    ├── label_data_0313.json
    ├── label_data_0531.json
    ├── label_data_0601.json
    ├── readme.md
    └── seg_label
#
├── test_label.json
├── test_set
│   ├── clips
│   │   ├── 0530
│   │   ├── 0531
│   │   └── 0601
│   ├── readme.md
│   └── test_tasks_0627.json
└── train_set
    ├── clips
    │   ├── 0313-1
    │   ├── 0313-2
    │   ├── 0531
    │   └── 0601
    ├── label_data_0313.json
    ├── label_data_0531.json
    ├── label_data_0601.json
    ├── readme.md
    └── seg_label
        ├── 0313-1
        ├── 0313-2
        ├── 0530
        ├── 0531
        ├── 0601
        ├── list
        ├── test.json
        └── train_val.json

19 directories, 9 files
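
Given a layout like the one above, one way to cut the dataset down for a quick test is to copy a random fraction of the clip folders into a new directory. This is a stdlib-only sketch; the helper name `copy_random_subset`, the 10% fraction, and the assumption that each clip lives in its own subfolder are mine, and the fake folders exist only so the example runs anywhere.

```python
import random
import shutil
import tempfile
from pathlib import Path


def copy_random_subset(src, dst, fraction, seed=0):
    """Copy a random fraction of the immediate subfolders of src into dst."""
    src, dst = Path(src), Path(dst)
    folders = sorted(p for p in src.iterdir() if p.is_dir())
    keep = max(1, round(len(folders) * fraction))
    chosen = random.Random(seed).sample(folders, keep)
    for folder in chosen:
        shutil.copytree(folder, dst / folder.name)
    return sorted(p.name for p in chosen)


# Tiny fake layout; real clip folders would sit under something like
# train_set/clips/0531/ (the per-clip subfolder level is an assumption)
root = Path(tempfile.mkdtemp())
for i in range(20):
    clip = root / "clips" / f"clip_{i:02d}"
    clip.mkdir(parents=True)
    (clip / "frame.txt").write_text("placeholder")

picked = copy_random_subset(root / "clips", root / "subset", fraction=0.1)
print(picked)
```

The label JSON files would also need filtering down to the clips that were kept, otherwise they point at images that are no longer there.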
#

what are these numbers?
depth of dir?

#

need pastebin again

#

wait there is a readme

#

again pastebin

#

yeah!, finding right dataset , always a task!

#

are you seriously downloading whole dataset?

#

yeah , I have downloaded that with script also!

#

it's fast

wispy jackal
#

i know its not python but do you guys have a really good tutorial for data analysis in excel??

unkempt apex
#

need to search!

#

S3?

#

ohh !!

#

our budget?
what does this mean?

#

you were giving interviews?

#

just to watch 2 guys discussing!

#

you were giving interviews to other companies right?
what about current status?

#

and currently you are working in some org?

#

which GPU they ( you ) use then?

#

then what?

#

ohhh, can I ask what you do ( non profit )

#

website?

#

dox?

#

I am so dumb!

#

lm anon?

#

wtf! this short form! nice!

#

I typed 'L' lol

#

hey can I change google account if my GPU runtime is over?

#

on google colab

#

that will be nice!

#

yeah ! can do that for a GPU!!

#

yeah, it's working!

#

I have student id for AWS

#

100 dollars credit!

#

for GPU?

#

yeah!

#

thanks for this!

#

yeah, that's the thing

#

but anyways colab is also fast

#

suggest me some basic task where I can apply CNN for fun!

#

this is advance!

#

suggest for me!

#

can you share your web?

#

bruhh, come on I just need to extract features and show some output

#

dm?

#

individual computer characters? what does this mean?

#

like all characters on keyboard or what?

#

yeah that would be nice then!

#

should I start with one by one, Like first A then B

#

generate images? of what kind?

#

input -> L
output -> L image?

#

how can I develop this?

#

yeah I have read about that!

#

pytesseract

#

so you are saying , we will give input as letter name and then it will give output as that image?

#

( if else in my mind ) !!

#

which model you have implemented on that resume chat

merry cloak
#

Hi,
I'm encountering an error when trying to train my keras functional model. The error only occurs when using a custom loss function I implemented, as the problem does not arise when using TensorFlow's built-in loss functions. Here's the error:

Traceback (most recent call last):
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 344, in <module>
game_model.fit([test_map_input, test_unit_stats_input, test_tech_input, test_tribe_input, test_high_level_mask,
File "/home/quirin/PycharmProjects/vector_note_server/venv/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/quirin/PycharmProjects/vector_note_server/venv/lib/python3.12/site-packages/keras/src/losses/loss.py", line 113, in reduce_values
or tuple(values.shape) == ()
^^^^^^^^^^^^^^^^^^^
ValueError: Cannot iterate over a shape with unknown rank.

I tried debugging it while setting run_functions_eagerly(True), but it led to a different error which I also don't understand. At this point, i'd be happy to return even random or constant values in my loss function without getting an error
does anyone have an idea why i get that error and how i can fix it?
below i'll paste the loss function i tried last.

unkempt apex
#

what is unknown rank in that error?

#

your input shape is wrong !

merry cloak
#

what do you mean with wrong?
and the input shape for what?

#

Yeah... sorry - i've been through a lot of experimenting

unkempt apex
#

you have written very advanced pythonic code or what!

merry cloak
serene scaffold
#

!otn a you have written very advanced pythonic code

arctic wedgeBOT
#

:ok_hand: Added you-have-written-very-advanced-pythonic-code to the names list.

unkempt apex
#

he got rewarded for that?

serene scaffold
unkempt apex
#

I am still reading his code

unkempt apex
#

😂 what the hell is happening!

#

please tell me what was that?

serene scaffold
#

"you have written very advanced, pythonic code" is what every python dev wants to hear when they hand in their side quest.

unkempt apex
#

😂 ohhh!

#

because I have never written ! Lol

#

I have not even passed 10 lines

#

nice code though!

#

Legend!

#

python question

he is calling the function , but where he is storing the value of loss?

merry cloak
#

yes, that has to do with the basic structure of the model.

I have multiple models combined into a single big one, and for every one of them it always calls the loss, and I have these conditionals to figure out what shape and model I need to design the loss for

unkempt apex
#

ohh got it!!, it is conditional! into one "loss" variable

merry cloak
#

because tensorflow won't allow ordinary conditionals in their functions as far as i know

#

wait a second, i can send you the error you can expect when trying

#

Traceback (most recent call last):
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 342, in <module>
game_model.fit([test_map_input, test_unit_stats_input, test_tech_input, test_tribe_input, test_high_level_mask,
File "/home/quirin/PycharmProjects/vector_note_server/venv/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 309, in custom_loss
return tf.cond(c, single_output_loss, multi_output_loss)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 267, in single_output_loss
if last_dim in [len(ACTION_TYPES), len(UNIT_ACTIONS_LIST), MAP_SIZE * MAP_SIZE, len(TECH_ACTIONS_LIST), len(TRIBE_ACTIONS_LIST), len(BUILD_ACTIONS_LIST)]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: Using a symbolic tf.Tensor as a Python bool is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.

#

It is using tf.Tensors, which can hold a boolean value, but can't be used as such

#

It might when enabling the "run_functions_eagerly", but i wouldn't be too sure of it.

#

I'll try that, but even then, it wouldn't actually fix my issue, right?
It would just be a better written version of the function that doesn't work anyways.

#

Already on it

#

Right, which really irritates me, since it only throws that error when i use my custom function.

#

when i just call the premade cross entropy it works just fine...

#

yes, but after splitting it off into groups (the function you see was a test after hours of trying to figure out what's wrong)
there should actually be some other calculations.

#

uhh, you're right. when i return constant 0 in the multi_output_loss it runs through smoothly

#

    losses = [
        tf.keras.losses.categorical_crossentropy(y_true[0], y_pred[0]),  # action
        tf.keras.losses.categorical_crossentropy(y_true[1], y_pred[1]),
        tf.keras.losses.categorical_crossentropy(y_true[2], y_pred[2]),
        tf.keras.losses.categorical_crossentropy(y_true[3], y_pred[3]),
        tf.keras.losses.categorical_crossentropy(y_true[4], y_pred[4]),
        tf.keras.losses.categorical_crossentropy(y_true[5], y_pred[5]),
    ]

    return tf.constant(0.0)  # tf.add_n(losses)

#

Tensor("data_9:0", shape=(None, 4), dtype=float32)
Tensor("test_game_model_1/gameModelDense6_1/Relu:0", shape=(None, 4), dtype=float32)
Tensor("data_10:0", shape=(None, 11, 11, 10), dtype=float32)
Tensor("test_game_model_1/cond/Identity:0", shape=(None, 11, 11, 10), dtype=float32)
Tensor("data_11:0", shape=(None, 11, 11), dtype=float32)
Tensor("test_game_model_1/cond/Identity_1:0", shape=(None, 11, 11), dtype=float32)
Tensor("data_12:0", shape=(None, 25), dtype=float32)
Tensor("test_game_model_1/cond_1/Identity:0", shape=(None, 25), dtype=float32)
Tensor("data_13:0", shape=(None, 4), dtype=float32)
Tensor("test_game_model_1/cond_2/Identity:0", shape=(None, 4), dtype=float32)
Tensor("data_14:0", shape=(None, 11, 11, 19), dtype=float32)
Tensor("test_game_model_1/cond_3/Identity:0", shape=(None, 11, 11, 19), dtype=float32)

doesn't really look like it...

#

y_true: Tensor("data_9:0", shape=(None, 4), dtype=float32)
y_pred: Tensor("test_game_model_1/gameModelDense6_1/Relu:0", shape=(None, 4), dtype=float32)
y_true: Tensor("data_10:0", shape=(None, 11, 11, 10), dtype=float32)
y_pred: Tensor("test_game_model_1/cond/Identity:0", shape=(None, 11, 11, 10), dtype=float32)
y_true: Tensor("data_11:0", shape=(None, 11, 11), dtype=float32)
y_pred: Tensor("test_game_model_1/cond/Identity_1:0", shape=(None, 11, 11), dtype=float32)
y_true: Tensor("data_12:0", shape=(None, 25), dtype=float32)
y_pred: Tensor("test_game_model_1/cond_1/Identity:0", shape=(None, 25), dtype=float32)
y_true: Tensor("data_13:0", shape=(None, 4), dtype=float32)
y_pred: Tensor("test_game_model_1/cond_2/Identity:0", shape=(None, 4), dtype=float32)
y_true: Tensor("data_14:0", shape=(None, 11, 11, 19), dtype=float32)
y_pred: Tensor("test_game_model_1/cond_3/Identity:0", shape=(None, 11, 11, 19), dtype=float32)

#

the output for the y_true and y_pred prints...

#

Oh, i have an idea now.
I expected to get some kind of list or tuple inputted to my loss function, when i have multiple outputs.
instead i think they are iterating over them in advance and only give me the single pairs.
that was my mistake i think

#

It was before.
i have no idea why it didn't throw the error while trying to index them

#

oh... it used the batch size as an index. That's why i got errors with inputs, because whatever i got for my loss output used a different batch size than everything else.
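
A minimal sketch of the two points above, using NumPy arrays as stand-ins for the tensors (an assumption, so the example runs without TensorFlow). As the prints suggest, the loss receives one `y_true`/`y_pred` pair per output, so indexing the first axis picks one sample out of the batch rather than one output head. `pick_loss_name` and the shape-to-name table are hypothetical and only show how the heads could be told apart by their static shape with ordinary Python.

```python
import numpy as np

batch_size = 8  # hypothetical; the printed tensors show None for this axis

# Stand-in for one (y_true, y_pred) pair as printed above: shape (batch, 4).
y_true = np.zeros((batch_size, 4), dtype=np.float32)

# Indexing axis 0 selects a single sample, not "the first output head".
first = y_true[0]
print(first.shape)  # (4,)

# Hypothetical lookup from the static per-sample shape to a loss name.
LOSS_BY_SHAPE = {
    (4,): "categorical_crossentropy",
    (25,): "categorical_crossentropy",
    (11, 11): "per_tile_loss",
    (11, 11, 10): "per_tile_categorical_loss",
    (11, 11, 19): "per_tile_categorical_loss",
}


def pick_loss_name(y_pred):
    """Choose a loss by static shape; the batch axis is ignored."""
    per_sample_shape = tuple(y_pred.shape[1:])
    return LOSS_BY_SHAPE[per_sample_shape]


print(pick_loss_name(np.zeros((batch_size, 11, 11, 10))))
```

In a TensorFlow graph the same idea would rely on everything except the batch axis being statically known, which matches the shapes printed above.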

sturdy canyon
#

Finally got multi-gpu training running! Though I would have expected a bit more of a speed up (left is 4 Tesla T4s vs. the right being one 2070 Super), so there's probably some more work to do

late oracle
#

someone teach me neural networks like im a 5 year old

serene scaffold
umbral delta
#

why is df.join giving all nans? the dfs share no columns
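
One likely cause, sketched with made-up data: `DataFrame.join` aligns on the index, not on columns, so a frame rebuilt from a plain list gets a fresh 0..n-1 index and may not line up with the original. The frame `d`, its index values and the `pixels` list below are hypothetical stand-ins, and this assumes the real GeoDataFrame has a non-default index.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. after filtering).
d = pd.DataFrame({"name": ["a", "b", "c"]}, index=[10, 20, 30])
pixels = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Rebuilding from a list drops the index, so nothing lines up in the join.
misaligned = pd.DataFrame(pixels).add_prefix("p")
joined_bad = d.join(misaligned)
print(joined_bad)

# Reusing the original index keeps the rows aligned.
aligned = pd.DataFrame(pixels, index=d.index).add_prefix("p")
joined_ok = d.join(aligned)
print(joined_ok)
```

If this is what is happening, passing `index=d.index` when building the raster frame should be enough.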

#

code: ```py
def rasterize_data(d: gp.GeoDataFrame):
    xmin, ymin, xmax, ymax = d.total_bounds
    resolution = 1000000

    width = int((xmax - xmin) / resolution)
    height = int((ymax - ymin) / resolution)

    transform = rasterio.transform.from_origin(xmin, ymax, resolution, resolution)

    print(d["geometry"].shape)

    raster = d["geometry"].apply(
        lambda x: rasterize(
            [(x, 1)],
            out_shape=(height, width),
            transform=transform,
            fill=0,
            all_touched=True,
            dtype="float32",
        ).flatten()
    )
    raster = pd.DataFrame(raster.tolist()).add_prefix("p")
    print("raster")
    print(raster)
    print("d")
    print(d)
    print("joined")
    print(d.join(raster))

    d = d.join(raster, how="left")
    print(d)

    return d
```
left tartan
lapis sequoia
#

I was drunkenly speedrunning tensor flow and a nn while blaring Chief Keef on spotify and filming it on OBS, and my pc just gave out. If it turned back on and stuff, should it be fine? It was when the epochs started getting higher.

small wedge
lapis sequoia
serene scaffold
lapis sequoia
serene scaffold
iron basalt
#

If you plan on training a large model on your own hardware, make sure the cooling is good, not just so it does not turn itself off, but because it will also slow itself down if it gets too warm. This also includes the room temperature.

#

(There is a reason why all this new LLM training is taking down some power grids now)

lapis sequoia
#

Bro, how much RAM does TF take up? I feel like it takes up a lot and I have to use Jupyter for it. Is PyTorch less taxing? I feel like this is taking up a good amount of ram and I have 32 GB of ram

serene scaffold
serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
#

It’s not frying my cpu, it is actually not too bad. I just went wild on it, I was drunk, yes, but I would usually never do that. Does it really not?

serene scaffold
#

Why would the same code run faster or slower in a notebook? It's the same code.

lapis sequoia
#

I configured it through that environment

serene scaffold
#

You configured what?

lapis sequoia
#

Tensor and libraries I use for it

serene scaffold
#

Okay, well Jupyter is just another way of writing/editing code and telling python what code to run. Once python is running the code, it doesn't matter where it came from.

lapis sequoia
#

No matter what the kernel still runs. That’s pretty dumb of myself not to consider. I should probably remove stuff from VS code in all honesty

serene scaffold
#

Remove stuff from vs code, to achieve what?

lapis sequoia
#

It could not hurt, I mean, I have 600 scripts of csv files, C/C++, sql files in one place

serene scaffold
#

If you have 600 csv files, those are just csv files, not scripts of csv files

#

Anyway, is there a problem you're currently facing?

lapis sequoia
#

I just meant a lot of stuff in one place, need to throw out. And no problem. It happened and it’s acting fine now. Just kind of a jump scare.

serene scaffold
#

Cool

proper crag
#

is an array like a tuple or list, where a single object consists of multiple values?

violet gull
mild herald
#

Does anyone have a working program that uses the RagDatasetGenerator.generate_questions_from_nodes() functionality, preferably with a limit on question generation?

#

Got about this far before hitting a ReadTimeout: Error running coroutine

```py
from llama_index.core.evaluation import DatasetGenerator, RelevancyEvaluator
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Response, Settings
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.ollama import Ollama

import nest_asyncio

nest_asyncio.apply()

reader = SimpleDirectoryReader("data/")
documents = reader.load_data()

llm = Ollama(model="llama3")
Settings.llm = llm

data_gen = RagDatasetGenerator.from_documents(documents)

eval_questions = data_gen.generate_questions_from_nodes()
print(eval_questions)
```
urban helm
#

im thinking about making a CNN based around speech recognition, how would output even look like, and how would i even train it?

urban helm
#

damn alright

delicate bay
#

anybody here knows about YOLO custom cfg?
Traceback (most recent call last):
File "train.py", line 616, in <module>
train(hyp, opt, device, tbwriter)
File "train.py", line 88, in train
model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
File "C:\Users\kuzey\yolov7\models\yolo.py", line 528, in __init__
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
File "C:\Users\kuzey\yolov7\models\yolo.py", line 738, in parse_model
anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
KeyError: 'depth_multiple'

keep getting this error no idea how to fix
Even though I got depth multiple in the cfg
# YOLOv7 custom configuration file
# Number of classes
nc: 3 # Number of classes (Kalsifikasyon, Normal, Kitle)

# Anchors
anchors:
  - [10,13, 16,30, 33,23] # P3/8
  - [30,61, 62,45, 59,119] # P4/16
  - [116,90, 156,198, 373,326] # P5/32

# YOLOv7 backbone and head configuration
backbone:
  name: darknet53 # Darknet53 or any other supported backbone
  depth_multiple: 0.33 # Depth multiple for backbone
  width_multiple: 0.50 # Width multiple for backbone

head:
  num_classes: ${nc}
  anchors: ${anchors}

# Training parameters
train:
  img_size: 640
  batch_size: 16
  epochs: 300
  data: C:\Users\kuzey\OneDrive\Masaüstü\data.yaml
  cfg: models/yolov7-custom.cfg
  weights: yolov7.pt
  device: ''
  multi_scale: False
  freeze: [0]

# Testing parameters
test:
  img_size: 640
  batch_size: 16
  data: C:\Users\kuzey\OneDrive\Masaüstü\data.yaml
  cfg: models/yolov7-custom.cfg
  weights: yolov7.pt
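
A possible explanation, sketched with a plain dict instead of the real YAML loader: the line in the traceback reads `d['depth_multiple']` from the top level of the parsed config, while the pasted file appears to put `depth_multiple` under `backbone:`. The `cfg` dict and `read_multiples` below are hypothetical and only mirror that lookup; they assume the nesting is as the pasted file suggests.

```python
# Hypothetical parsed config, mirroring the nesting in the pasted file.
cfg = {
    "nc": 3,
    "anchors": [[10, 13, 16, 30, 33, 23]],
    "backbone": {
        "name": "darknet53",
        "depth_multiple": 0.33,
        "width_multiple": 0.50,
    },
}


def read_multiples(d):
    # Same top-level lookups as the line shown in the traceback.
    return d["depth_multiple"], d["width_multiple"]


try:
    read_multiples(cfg)
except KeyError as exc:
    missing = exc.args[0]
    print("missing top-level key:", missing)

# With the two keys at the top level the lookup succeeds.
flat_cfg = {**cfg, "depth_multiple": 0.33, "width_multiple": 0.50}
print(read_multiples(flat_cfg))
```

If that is the cause, moving `depth_multiple` and `width_multiple` to the top level of the cfg would at least get past this particular KeyError.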

loud sail
#

Hello

#

Can anyone please help me here

left tartan
loud sail
loud sail
crude zephyr
#

Is there a way we can download models from roboflow?

left tartan
#

I've only been telling you for a year now

agile cobalt
#

how so? isn't it just SQL with support for a lot of inputs/outputs?

left tartan
#

Columnar (like clickhouse/snowflake), support for Python UDFs, various optimizations for windowing/etc, the json support and parquet support is more than just different inputs, the no config/in memory footprint (like SQLite) makes it very ergonomic, the language extensions (around windowing, dynamic column expressions https://duckdb.org/2023/08/23/even-friendlier-sql.html, ASOF joins, positional joins, etc.). I think they've taken every good idea everyone's come up with in OLAP land and Sql land and Dataframe land and unified it.

dusty valve
#

Bruh

#

Those were fast reactions

#

stelersus potentially self botting?

#

Ok

cinder schooner
#

Hello, i'm working on fine grained classification for car models. Since it's fine grained classification the dataset is inherently unbalanced (long tailed). I can get so far by tweaking the loss or the sampling method. Would generating images with diffusion models for undersampled classes be a good idea? is there another method i can try to use?

serene scaffold
#

@proper crag lists are not arrays--don't get them mixed up

violet gull
serene scaffold
serene scaffold
violet gull
#

Arrays are not immutable

serene scaffold
#

The shape of an array is immutable.

violet gull
#

Just for the record this was a beginner question posted in a wrong channel. The semantics aren’t that serious

dusty valve
dusty valve
serene scaffold
# dusty valve Couldnt u store a list in a list?

yes, but the outer list doesn't "know" that it contains another list, and you can't treat the sub-lists as part of the outer list. Whereas multi-dimensional arrays are one thing and can be indexed as one thing.
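
A small sketch of that difference, assuming NumPy is the "array" in question; the values are made up.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
arr = np.array(nested)

# One index expression addresses both dimensions of the array at once.
print(arr[1, 2])   # 6
print(arr[:, 0])   # first column: [1 4]
print(arr.shape)   # (2, 3)

# The outer list only holds two objects; indexing goes one level at a time.
print(nested[1][2])  # 6
try:
    nested[:, 0]
except TypeError as exc:
    list_error = type(exc).__name__
    print(list_error)  # TypeError
```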

dusty valve
#

Ok

proper crag
#

like i know tuple and list are different, but both are where data is stored

past meteor
serene scaffold
past meteor
#

I guess I'm talking about C or similar? Are you talking about the Python array type?

serene scaffold
#

I'm talking about numpy, since that's the most widely used "array" in Python.

past meteor
#

I love this question because 2 people will always be talking about wildly different things (when array vs list)

serene scaffold
#

tbh, society went down hill the first time someone used "array" in computer science.

#

that person cursed us all to eternal semantic overload.

past meteor
#

Exactly

wooden sail
#

if you make a new array, it'll assign a whole new chunk in the heap so that the elements are contiguous

past meteor
#

Hence why it's a strange question because it depends on the level of abstraction

#

It's quite common in interviews

#

Like, what is immutable etc. not being able to mutate the thing on the stack? The heap? Both etc.

tidal bough
#

a list contains a PyObject** - a pointer to a (dynamically allocated) array of pointers to pyobjects

#

that's the header of all variable-sized python objects

#

it's a few fields like the pointer to the type, a refcount and (for variable-sized ones) the size

#

which, hmm, something is funky here actually

#

What do you mean?

#

(interestingly I think lists are maybe misusing PyObject_VAR_HEAD a bit. since... lists aren't variable-sized in that sense, their variable-sized part is stored outside of the struct itself, unlike with strings. Either that or I'm just assuming too much)

#

That's because it's a macro

arctic wedgeBOT
#

`Include/object.h` line 101
```h
#define PyObject_VAR_HEAD      PyVarObject ob_base;
```
`Include/object.h` lines 157 to 160
```h
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
```
tidal bough
#

Sure - it's just a perfectly normal dynamic array, with an element type of PyObject* (pointer to a pyobject)

#

when the list runs out of space, it reallocates the dynamic array to a larger size, moving all of its elements (which are just pointers - the objects they point to aren't moved)
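
One way to watch this from Python, as a rough CPython-specific sketch: `sys.getsizeof` reports the list's own allocation, which grows in steps rather than on every append. The exact sizes depend on the interpreter version, so none are shown here.

```python
import sys

items = []
sizes = []
first = object()
items.append(first)

for _ in range(100):
    items.append(object())
    sizes.append(sys.getsizeof(items))

# Fewer distinct sizes than appends means spare capacity was reserved.
distinct_sizes = sorted(set(sizes))
print(len(sizes), "appends,", len(distinct_sizes), "distinct list sizes")

# The element itself is never copied; the list only stores a pointer to it.
print(items[0] is first)  # True
```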

#

Not totally sure what you mean - a pyobject may have references of its own, yeah. Or it may not - a string, say, doesn't reference anything.

#

That's why you can't have an array of PyObject, yes - only of PyObject*, which is what lists do.

#

Yeah, I think you're missing this one *. A PyObject is really a pyobject, no references involved. That's why a list must have an array of PyObject*s, not PyObjects.

#

it's kind of like an ob_item: Vec<Rc<PyObject>> :p
(not really, because refcounting in python is done inside the pyobjects in question, but similar semantics)

#
  1. Arrays of what?
  2. We do, array.array.
#

yeah, it's just a 1d array of some predefined types
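
For reference, a short sketch of `array.array` from the standard library: the type code fixes the element type up front, and anything that doesn't fit is rejected.

```python
from array import array

doubles = array("d", [1.0, 2.0, 3.0])  # "d" is the type code for C double
doubles.append(4.5)
print(doubles.itemsize)  # bytes per element on this platform
print(doubles.tolist())  # [1.0, 2.0, 3.0, 4.5]

try:
    doubles.append("five")  # not a number, so it cannot be stored
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```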

past meteor
#

If you find this stuff interesting you could read "fluent python"

#

This convo is part of the topics it covers, albeit in less detail

serene scaffold
#

imagine using array.array

serene scaffold
past meteor
#

DSA got rebranded to data structures and algorithms for a good hour instead of data science and ai 😂

#

But speaking of data science. Would you find it interesting if I'd do kaggle competitions or similar and do write ups of them in my blog?

#

At a beginner / intermediate level

#

I only ended up doing non data stuff there so far

hoary merlin
#

i need a package for sentiment analysis
any reccs?

#

im using flair but having problems with it

real phoenix
#

I want to write a book about data science. Which would be the best tool to help me ? Claude ? ChatGPT ? Gemini ? Also pitch in with your versions too please.

agile cobalt
#

Nobody wants to read a book written by a LLM.

real phoenix
#

You'd be surprised 🙂

small wedge
#

brainrot

iron basalt
#

Yes, probably, technically you can allocate the PyObjects anywhere. PyListObject is a dynamic array of PyObject references. This is what lets it have different types of objects in the same list, because PyObject is a base class.

#

Yeah.

#

The struct itself is a dynamic array, of references.

#

Each element in the array is a reference to a PyObject elsewhere.

#

So it's an array of memory addresses.

#

It's a contiguous chunk of memory addresses which are all the same size (pointer size).

#

Oh, no I meant the ob_item.

#

Which is the main thing about this type, everything else is data to manage that.

wild coral
#

does anyone have any experience with pytorch combined with numerical integration / gradients ?

#

particularly quad torch?

iron basalt
#

An array is a fixed length, contiguous, ordered, fixed element size chunk of memory.

#

It becomes a multi-dimensional array depending on how you access it.
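
A tiny sketch of that idea, using a plain Python list as the one-dimensional buffer and row-major indexing; the sizes and the helper `at` are made up for illustration.

```python
rows, cols = 3, 4
flat = list(range(rows * cols))  # one flat, one-dimensional buffer


def at(i, j):
    """Read element (i, j) from the flat buffer using row-major order."""
    return flat[i * cols + j]


print(at(0, 0))  # 0
print(at(1, 2))  # 6
print(at(2, 3))  # 11
```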

#

Well no, the type distinction is important, we usually define types by their interfaces.

#

Like how the PyListObject is no longer just an array because it's first a dynamic array (due to how we handle the array / reallocate it), and a list due to what it stores.

#

It's built on top of an array, but we don't really call it an array anymore.

#

Almost everything is built on top of an array.

#

A binary tree for example can be built on top of an array, but we don't call it an array due to how we store and read from it.

#

(The access pattern / interface)
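
As an illustration of a tree stored in an array, here is the usual level-order layout, where node `i` has its children at `2*i + 1` and `2*i + 2`. The node labels and helper names are made up.

```python
# Level-order layout: node i has children at 2*i + 1 and 2*i + 2.
tree = ["a", "b", "c", "d", "e", "f", "g"]


def children(i):
    left, right = 2 * i + 1, 2 * i + 2
    return [tree[k] for k in (left, right) if k < len(tree)]


def parent(i):
    return tree[(i - 1) // 2] if i > 0 else None


print(children(0))  # ['b', 'c']
print(children(1))  # ['d', 'e']
print(children(3))  # []
print(parent(5))    # c
```

The storage is still a flat array; only the index arithmetic makes it behave like a tree.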

#

Otherwise, everything is just an array.

#

So calling arrays and lists two different things is important, because the first does not get across much about how a list functions. Same thing for the tree.

#

Also the list in Python is supposed to match the idea from math a bit more.

#

Yeah, math has lists.

#

It has arrays, lists, tuples, sets, etc. Ofc, you can build everything on sets, which they do for other reasons.

#

Maps from subset of natural numbers to some set.

#

(list)

#

Yeah, and since sets are supposed to be able to have all kinds of things in them, Python does too.

#

For practical purposes Python lists are also dynamic arrays, which is where all the real practical use comes from.

#

Dynamic arrays are usually the first thing one tries to get before doing anything else these days.

#

And then everything on top of that.

#

Dynamic array is just an array that can resize.

#

Yes, often, but it can be allocated anywhere.

#

No, you can do either in either.

#

Make an array on the stack that is very large, put your dynamic array in there.

#

That's exactly the same as what the heap does.

#

It's also one big array.

left tartan
iron basalt
#

OS managed yes, although people often have other heaps managed by themselves.

iron basalt
#

But it's not a dynamic array.

#

It's a list.

real phoenix
#

That's the idea*

iron basalt
#

It has the interface of a dynamic array still though in addition, the ability to resize.

#

There is the OS's heap, aka just called "the heap." There is the stack which is allocated and handed over to the program, although it can also be resized btw. And a user can also not use either and instead allocate memory pages directly from the OS.

#

If there is no OS you own all the memory (one big array, can point to anywhere in it, even null).

#

Yeah, OS needs to give you (virtual) memory.

#

The compiler decides for the stack.

#

If it wants some.

#

At startup.

#

The heap is an abstraction built on top of virtual memory.

#

When you call malloc it calls the OS's heap alloc which calls the virtual memory alloc.

#

No, it operates on pages.

#

When you alloc via malloc like a 10 element int array, the allocator underneath still has to get at least a page from the OS, since the OS only hands out memory in pages.

#

So like 4kB.
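
A quick way to check the page size on a given machine and see the rounding, as a sketch: `pages_needed` is a made-up helper, and the 4-byte int is an assumption. Allocators typically carve many small allocations out of the same pages, so this shows the granularity of what the OS hands out, not the cost of each individual malloc.

```python
import mmap

page = mmap.PAGESIZE  # page size of this machine, often 4096


def pages_needed(nbytes):
    """Whole pages needed to back nbytes (ceiling division)."""
    return -(-nbytes // page)


request = 10 * 4  # ten ints, assuming 4 bytes each
print(page, pages_needed(request), pages_needed(page + 1))
```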

#

It is scattered, but in pages, and pages are fixed size chunks. In addition, they may not even be main memory, it can swap them in and out using the disk.

#

It also gives you a virtual address space for your process and your pointers are virtual pointers that need to be remapped to physical addresses.

#

No, the heap is built on virtual memory.

#

Virtual memory also makes use of the memory management unit, a hardware implementation of this address remapping and paging.

#

Physically, yeah.

#

If there is an OS (that makes use of virtual memory) you are not directly dealing with the physical layout.

#

The clever indexing you mentioned is one of the cool things about virtual memory, it handles it for you, and so from the programmer's POV it's contiguous.

#

You can for example make your own heap. Request some pages / space from the OS, then build a heap data structure on that.

#

The heap is a convenience for allocating things of different sizes without having to think about where to put them, but it comes with downsides, it's often not a good choice for performance and makes memory management more error prone in manual memory management languages.

#

For their use cases it serves no purpose.

#

The best way to not shoot yourself in the foot and get max performance in a language like C is to use region based memory management, which is like the thing mentioned before where you can just get a big array/chunk and then put your actual data structure in there / build on top of it. Each region for a different part of the code / purpose. https://en.wikipedia.org/wiki/Region-based_memory_management
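
To make the idea concrete, here is a toy bump allocator over one preallocated buffer, written in Python only for readability (the point above is about C). The `Arena` class and its method names are hypothetical; real arenas also deal with alignment and growth, which this sketch skips.

```python
class Arena:
    """Bump allocator over one preallocated buffer (a region)."""

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.offset = 0

    def alloc(self, nbytes):
        """Hand out the next nbytes of the region as a writable view."""
        if self.offset + nbytes > len(self.buffer):
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += nbytes
        return memoryview(self.buffer)[start:start + nbytes]

    def reset(self):
        # Releases everything in the region at once.
        self.offset = 0


arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(8)
a[0] = 255
print(arena.offset)     # 24
print(arena.buffer[0])  # 255
arena.reset()
print(arena.offset)     # 0
```

There is no per-object free: everything allocated for one purpose is released together by resetting the region.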

#

Unfortunately due to malloc being built in, many C programs have just been spamming it all over their program, leading to many memory leaks and performance issues.

#

Region based memory management used to be the standard way apps were developed (especially arcade games) but once garbage collection became the more normal thing and most were used to that, it became a lost art.

#

In general most don't spend enough time in C to learn these things.

#

(And why would you? Most things can just be done with Python or whatever anyhow in way less time...)

#

You still see it in game engines, because they can't ship a 10 fps game.

#

Or other high perf.

#

Or even ML, where everyone is kind of doing it via one big region loaded onto the GPU (you can heap alloc on the GPU).

unkempt apex
#

hey lisan
weather prediction from sky images?
( to learn about CNN )

#

Categorize them into classes like "Sunny", "Cloudy", "Rainy", "Stormy".