#data-science-and-ml
In arrays, does axis 0 represent columns or rows?
Depends on how many dimensions there are. If it's a 2d array, 1 is rows and 0 is columns
No, the opposite
What about 1 dimension?
Rows are second to last and columns are last
If it's one dimensional, the concept of rows and columns does not apply. It's just a flat sequence.
ok
This is always true for any array with two or more dimensions.
In 3 dimensions?
?
A matrix is a 2d array
ah i got it
So if you have a 3d array, it's like you have a stack of 2d arrays
so axis 0 is columns?
So if you have an array of shape (5, 7, 6), that's a stack of five matrices that have seven rows and six columns
yeah
Which means that rows are dimension one and columns are dimension 2
And if you have an array of shape (8, 5, 7, 6), it's like you have a stack of stacks.
So in 3 dimensions, axis 0 is the matrix, axis 1 is the row, and axis 2 is the column
right?
yeah, got it
If you have (8,5,7,6) then 2 is rows and 3 is columns. The dimension for rows and columns is always the last two
But you can also use -1 and -2 like list indexing
So for an array with two or more dimensions, -1 will always be columns and -2 will always be rows
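A quick NumPy check of the shape rules described above, using the (5, 7, 6) and (8, 5, 7, 6) shapes from the discussion (the zero-filled arrays are just placeholders):

```python
import numpy as np

# A stack of five matrices, each with 7 rows and 6 columns
stack = np.zeros((5, 7, 6))

print(stack.shape[0])   # 5 -> number of matrices (axis 0)
print(stack.shape[-2])  # 7 -> rows (second-to-last axis)
print(stack.shape[-1])  # 6 -> columns (last axis)

# A "stack of stacks": rows and columns are still the last two axes
stacks = np.zeros((8, 5, 7, 6))
print(stacks.shape[-2], stacks.shape[-1])  # 7 6

# Reducing over an axis removes it from the shape
print(stack.sum(axis=0).shape)   # (7, 6): the five matrices added together
print(stack.sum(axis=-1).shape)  # (5, 7): one value per row of each matrix
```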
Gotta run
thank u
got it
I was really confused
now I understand
I'm glad
Just a random question
Is pandas efficient on large scale datasets?
Or is there something else you'd learn for data with, let's say, millions of rows?
yes if you use it properly
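A small sketch of what "using it properly" usually means in pandas: vectorized column operations instead of a row-by-row apply, and categorical dtypes for repeated strings. The million-row DataFrame is synthetic and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# One million synthetic rows (illustrative data only)
rng = np.random.default_rng(0)
n = 1_000_000
df = pd.DataFrame({
    "price": rng.uniform(1, 100, n),
    "qty": rng.integers(1, 10, n),
    "region": rng.choice(["north", "south", "east", "west"], n),
})

# Slow pattern: a Python-level function called once per row
# df["total"] = df.apply(lambda r: r["price"] * r["qty"], axis=1)

# Fast pattern: vectorized column arithmetic runs in compiled code
df["total"] = df["price"] * df["qty"]

# Low-cardinality strings use far less memory as a categorical dtype
df["region"] = df["region"].astype("category")

# Grouped aggregations are vectorized too
summary = df.groupby("region", observed=True)["total"].sum()
print(summary)
```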
Noted
Polars is notably faster / better for large data sets, as are a few other libraries. Pandas is making some improvements, but generally it hits various performance walls.
I treat Pandas as something that's convenient but not performant: I only use it because it's commonly used/known, but for anything serious I use something else
Can you give examples of the other tools you've personally used?
I use DuckDB, as my pipeline is primarily SQL.
right, I'm just saying that you can't really analyse the changes in stock price based on previous results
time and price is simply not enough to predict such a complex machine as the market
It depends on the instrument tbh. For something like EURUSD you can actually spot, with the human eye, patterns that are related specifically to time
And then you can add Level 2 market data if you have access to it, which I think should give it the necessary input, right?
I know I barely have any knowledge in the field of AI, but I've been in the forex market specifically for 2 years, and I can say price and time are enough for a human with enough backtesting data to find, not a specific pattern, but a context for how the market operates
Hi, I have a task of analyzing the linkage between a person's housing and its impact on their health. Basically, I have chunks of text (some are just one or two sentences) where interviewees answer this question. What kinds of info should I grab from this data? I was thinking of doing a sentiment analysis and an emotion detection, and maybe a word cloud to see which words come up most often. Do you guys have better ideas? lol
You need to extract key phrases that tell you about their housing and which tell you about their health. And those need to be separate
If for a given person, you don't get information about both their housing and their health, that person is useless for you
Can you imagine what parameters of a person's housing situation might impact their health?
Also, word clouds are really just for fun. If you want to see what words appear most often, you should just make a table of word counts.
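A word-count table takes only a few lines with collections.Counter. The responses and the stopword list below are invented placeholders, not data from the project:

```python
import re
from collections import Counter

# Hypothetical interview snippets (illustrative only)
responses = [
    "The rent is too high and it stresses me out.",
    "My landlord never fixes anything, so the apartment is damp.",
    "The apartment is quiet and close to a park.",
]

# A tiny hand-picked stopword list; a real analysis would use a fuller one
stopwords = {"the", "is", "and", "it", "me", "to", "a", "so", "my", "out"}

counts = Counter(
    word
    for text in responses
    for word in re.findall(r"[a-z']+", text.lower())
    if word not in stopwords
)

# Print a simple word-count table, most frequent first
for word, n in counts.most_common(5):
    print(f"{word:<12}{n}")
```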
Ok, that makes sense, I’ll delete the fluff from the extracted text
The fluff?
Not exactly
I mean I guess you can think of it that way
The point is that if someone says, for example, that they live in an apartment, you need to get that information. And you need to be able to identify the responses from everyone else who lives in an apartment and see what they have in common regarding their health.
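The idea above (tag housing and health information separately, drop people who lack either, then compare groups such as apartment dwellers) could be prototyped with simple keyword rules. Everything here is hypothetical: the responses, the keyword lists, and the first_match helper. Real interview text would need much more careful coding, e.g. manual annotation or a trained extractor:

```python
import pandas as pd

# Hypothetical responses; not real interview data
responses = pd.DataFrame({
    "person": [1, 2, 3, 4],
    "text": [
        "I live in an apartment and the rent keeps me up at night.",
        "Our apartment has mold and my asthma got worse.",
        "We own a house near a park and I sleep well.",
        "The house is cold in winter and I get sick a lot.",
    ],
})

# Hypothetical keyword -> label mappings
housing_terms = {"apartment": "apartment", "house": "house"}
health_terms = {"asthma": "physical", "sick": "physical", "up at night": "mental", "stress": "mental"}

def first_match(text, terms):
    """Return the label of the first keyword found in text, or None."""
    lowered = text.lower()
    for keyword, label in terms.items():
        if keyword in lowered:
            return label
    return None

responses["housing"] = responses["text"].apply(first_match, terms=housing_terms)
responses["health"] = responses["text"].apply(first_match, terms=health_terms)

# Keep only people for whom both housing and health information was found
usable = responses.dropna(subset=["housing", "health"])
print(pd.crosstab(usable["housing"], usable["health"]))
```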
Ohhh that’s a good idea
Yeah, I guess to answer that question: the type and quality of housing would be one, whether the rent is within or outside their budget, their neighborhood (people, the area in general), and their relationship with the landlord/manager
Also, once I have all this info, how should I format it to make a sort of report?
always assume the answer is yes and give the information someone would need to fulfil your request.
are you looking at the effect on physical health, or mental health? and if it's both, are you treating them separately?
no, I just need to rest
you need to rest? sure. next time you ask for help, remember to assume that someone will help, and ask your question right out of the gate.
Yeah I’m looking at both, I will make distinctions between them (saying it’s related to mental and/or physical health)
We mostly see a lot of mental health issues because the high rent causes the tenant a lot of stress; some of it is stress due to a lack of help with the unit (e.g. when they request repairs, the landlord/manager doesn't help)
This might not be related as much but I read a research paper a while ago
saying that the style of your neighborhood and how clean your housing is affect your psychology to a degree that it also affects your lifespan
yeah, that's what we see too
how did you format your research paper?
Oh I just read that out of interest
Here are some similar ones if they help
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0219622
https://www.mdpi.com/1660-4601/11/3/3453
https://www.mdpi.com/1660-4601/16/17/3206
You can use GPT to summarize them, and if you find any interesting points, you might give them a read
Yes exactly
oh sorry i misread lol
and ty for the links 🙌
No worries!
Has anybody ever worked with the COCO dataset?
For some reason, I'm not able to download any of the files
Can anyone help?
I'm very disappointed. After a full night of training, these are the results.
Orange is Buy and HOLD, and Blue is the bot's performance.
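For a comparison like this, buy-and-hold is the baseline: its equity curve just tracks the price. A minimal sketch of computing both curves from closing prices and a position series; the prices and positions are invented, and fees, slippage, and shorting are ignored:

```python
import numpy as np

# Hypothetical closing prices and bot positions (1 = long, 0 = flat);
# position[t] is held from close t to close t + 1
prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])
position = np.array([1, 1, 0, 0, 1, 1])

# Simple returns between consecutive closes
returns = prices[1:] / prices[:-1] - 1

# Buy-and-hold is always long, so its equity just tracks the price
buy_and_hold = np.cumprod(1 + returns)

# The bot only earns a period's return when it was long going into it
strategy = np.cumprod(1 + position[:-1] * returns)

print(f"buy and hold: {buy_and_hold[-1]:.4f}")
print(f"strategy:     {strategy[-1]:.4f}")
```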
The polars user guide
can someone tell me why my datasets are not being loaded?
Reading a lab note on IBM's AI Engineer course... surely there's a better variable name for this...
check your current working directory, or try "./data/test.csv".
where is your datasets module located?
It's better to take a more straightforward approach and just use the pandas function read_csv in your notebook. There's no need to complicate it.
I got it fixed, but thanks for the input
yea i did that
mhmm, and alias pandas (import pandas as pd). It's convention at this point
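Both suggestions combined, as a sketch: print the working directory that relative paths resolve against, then load the file with pd.read_csv under the conventional pd alias. The data/test.csv location is only the example path suggested earlier, not necessarily where the dataset actually lives:

```python
import os
from pathlib import Path

import pandas as pd  # the conventional alias

# Relative paths are resolved against the current working directory
print("cwd:", os.getcwd())

# Example location; adjust to wherever the dataset really is
csv_path = Path("data") / "test.csv"

if csv_path.exists():
    df = pd.read_csv(csv_path)
    print(df.head())
else:
    print(f"{csv_path.resolve()} not found; check the working directory")
```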
Hi, using ax.legend(loc='upper left', bbox_to_anchor=(1, 1)) removes the title of the legend, any idea why?
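One likely cause, assuming the legend was first created by seaborn or by an earlier call that set a title: ax.legend(...) builds a brand-new legend, and the old title is not carried over. Passing title= explicitly keeps it (with seaborn, sns.move_legend is designed to reposition the existing legend instead). The plot data and the "Split" title are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="train")
ax.plot([0, 1], [1, 0], label="test")
ax.legend(title="Split")  # stands in for a legend created earlier, e.g. by seaborn

# Calling ax.legend again creates a new legend that does not reuse the old title,
# so pass the title explicitly along with the placement arguments
legend = ax.legend(title="Split", loc="upper left", bbox_to_anchor=(1, 1))
print(legend.get_title().get_text())  # Split
```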
https://machinelearning.apple.com/research/recognizing-people-photos
"Building a Gallery
A gallery is a collection of frequently occurring people in a user’s Photos library. To build the gallery in an unsupervised manner, Photos relies on clustering techniques to form groups, or clusters, of face and upper body feature vectors that correspond to the people detected in the library. In Photos, we developed a novel agglomerative clustering algorithm that enables an efficient incremental update of existing clusters and can scale to large libraries. To build these clusters, we start with a clustering algorithm that uses a combination of the face and upper body embeddings for each observation. This step is fairly conservative because when it joins two instances, they are permanently associated. We tune the algorithm so that each first-pass cluster only groups together very close matches, providing high precision but many, smaller clusters. Each cluster is represented by the running average of its embeddings as instances are added. ...."
Is anyone familiar with how they implemented their clustering?
How did they go about doing this:
"To build these clusters, we start with a clustering algorithm that uses a combination of the face and upper body embeddings for each observation. This step is fairly conservative because when it joins two instances, they are permanently associated."
How are two instances permanently associated...?
can someone help me
I already have accelerate, but it's giving this error:
"ImportError: Using the Trainer with PyTorch requires accelerate>=0.21.0"
The error explains the issue: the required pip package accelerate must be version 0.21.0 or higher, so the one you have is probably older (or installed in a different environment).
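To see which accelerate version the running interpreter actually has (it can differ from what pip reports if several Pythons are installed), a standard-library-only sketch could look like this. The version parsing is simplified and ignores pre-release ordering; the usual fix is then pip install -U accelerate in that same environment:

```python
import re
from importlib.metadata import PackageNotFoundError, version

def parse_release(v: str) -> tuple:
    """Turn '0.21.0' or '0.30.1.dev0' into a tuple of its leading numeric parts."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(package: str, minimum: str) -> bool:
    """True if `package` is installed in *this* interpreter at version >= `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.21' and '0.21.0' compare as equal
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))

print("accelerate >= 0.21.0:", meets_minimum("accelerate", "0.21.0"))
```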
They basically concatenate the body feature vector to the face feature vector (or embedding as they call it). Because a person will always have a body and face, these feature vectors can simply be concatenated as they are associated with each other.
That's what I think from reading just this snippet at least
I see. When reading through the approach, I had the impression that it works like this:
- First step -> they tuned the clusterer to be very strict, and only clustered points that are very close together. After that, they made them permanently associated with each other (not sure how they do this).
This is the first time I've seen people do a multi-stage clustering effort, making use of different features at each stage. I'm very curious what kind of algorithm they used and where to learn things like that :/. I'm kinda stuck with the typical algorithms atm: k-means, HDBSCAN, etc.
I think this image explains it well. They have a separate detector for upper body and face
They need to match what upper body belongs to what face, so that is the "matching"
They then combine the embeddings (e.g. by simply concatenating them; there could be other ways too)
So you have one "person" embedding
Which they probably then use for clustering (in a multi-step clustering process)
But that is just from this image, they might do it differently, but I'm not gonna read the entire blog post rn 😛
You're absolutely right. My question is more targeted towards just the clustering technique
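This is not Apple's algorithm (the quoted post doesn't give enough detail for that), just a minimal sketch of a conservative first pass as described: concatenate normalized face and body embeddings, let an instance join the nearest cluster only if it is very close to that cluster's running-average centroid, and never undo a join. The toy 3-dimensional embeddings, the first_pass_clusters helper, and the 0.3 threshold are all made up:

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def first_pass_clusters(face_embs, body_embs, threshold):
    """Greedy, conservative pass: an instance joins the nearest cluster only if it is
    within `threshold` of that cluster's running-average centroid; otherwise it starts
    a new cluster. Assignments are never revisited, so joins are permanent."""
    centroids, counts, labels = [], [], []
    for face, body in zip(face_embs, body_embs):
        # One "person" vector: normalized face embedding followed by normalized body embedding
        x = np.concatenate([l2_normalize(face), l2_normalize(body)])
        best, best_dist = None, np.inf
        for i, c in enumerate(centroids):
            d = np.linalg.norm(x - c)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist < threshold:
            counts[best] += 1
            # Incremental running average: c_new = c + (x - c) / n
            centroids[best] += (x - centroids[best]) / counts[best]
            labels.append(best)
        else:
            centroids.append(x.astype(float))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids

# Toy embeddings: two detections of person A, then two of person B
faces = np.array([[1, 0, 0], [0.99, 0.05, 0], [0, 1, 0], [0.02, 1, 0]], dtype=float)
bodies = np.array([[0, 0, 1], [0, 0.03, 1], [1, 0, 0], [1, 0.01, 0]], dtype=float)

labels, centroids = first_pass_clusters(faces, bodies, threshold=0.3)
print(labels)  # [0, 0, 1, 1]
```

Later stages (merging these small, high-precision clusters) would then work on the centroids rather than on individual instances.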
How do you compute a triangular moving average?
Do you know what a triangular moving average is?
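One common definition of a triangular moving average is a weighted moving average with triangular weights (1, 2, 3, 2, 1 for a window of 5), which for odd windows equals a simple moving average applied twice. Definitions vary between charting tools, so treat this NumPy sketch as one convention; the price series is made up:

```python
import numpy as np

def triangular_moving_average(x, window):
    """Weighted moving average with triangular weights, e.g. window=5 -> 1, 2, 3, 2, 1.
    Returns only the fully covered ('valid') positions: len(x) - window + 1 values."""
    half = (window + 1) // 2
    rising = np.arange(1, half + 1, dtype=float)
    # Mirror the ramp; for odd windows drop the duplicated peak
    weights = np.concatenate([rising, rising[::-1][window % 2:]])
    weights /= weights.sum()
    return np.convolve(x, weights, mode="valid")

def sma(x, n):
    """Simple moving average over the 'valid' positions."""
    return np.convolve(x, np.ones(n) / n, mode="valid")

prices = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
print(triangular_moving_average(prices, 5))
# For an odd window, this matches a simple moving average of length 3 applied twice
print(sma(sma(prices, 3), 3))
```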
Hey! I did freeCodeCamp's course on Machine Learning, and I wanted to implement the algorithms presented using scikit-learn. I'm unable to find projects/practice problems/tutorials on these algorithms (on how to implement them specifically). Can anyone help me out?
woah, that repo is so cool! Thank you so much, this is going to be very handy!
I'm writing my own neural network code, and in the backpropagation I've run into some issues with the numpy.dot() function.
For my example I've got my neurons in the format (9,20,20,9), i.e. an input layer of 9, 2 hidden layers of 20, and an output layer of 9. For the backpropagation I have the line
hidden_error = np.dot(np.array(self.networktable[i]).T, previous_delta), where self.networktable[i] is a 2D list, i.e. a list of lists: each inner list holds the weights of one entire neuron, and there's one of those for each neuron. previous_delta is a 1D list with one value for each neuron in the next layer.
So for the 2nd hidden layer, the following occurs:
When I try to dot product them, I'm taking a 2D list with 20 lists, each with 20 values inside, and trying to dot product that with a 1D list of 9 values. This causes a shape error due to the different sizes. However, I'm not sure what I'm meant to do instead, because this approach worked in this example: https://www.youtube.com/watch?v=7qYtIveJ6hU, but he doesn't appear to do it one layer at a time. Not sure how I could fix this??
His version of the line looks like this. I've taken it from the pandas format and used numpy instead, which I don't think affects it other than syntax.
any ideas?
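A likely culprit, assuming networktable[i] holds the weights of layer i itself: when back-propagating into a hidden layer, the error has to be pushed back through the weights of the layer after it. For the second hidden layer that's the 9x20 output weights, so their transpose dotted with the 9 output deltas gives 20 values. A self-contained NumPy sketch for a (9, 20, 20, 9) network with sigmoid activations and squared error; the one-row-per-neuron weight layout mirrors the question, but all variable names are made up and the weight updates themselves are left out:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [9, 20, 20, 9]

# weights[k] connects layer k to layer k + 1: one row per neuron in layer k + 1,
# one column per input, like the "one list of weights per neuron" layout in the question
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
# shapes: (20, 9), (20, 20), (9, 20)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=9)
target = rng.uniform(size=9)

# Forward pass, keeping every layer's activation
activations = [x]
for W in weights:
    activations.append(sigmoid(np.dot(W, activations[-1])))

# Output delta for squared error with sigmoid outputs
out = activations[-1]
delta = (out - target) * out * (1 - out)   # shape (9,)
deltas = [delta]

# Walk backwards: a hidden layer's error comes through the weights of the layer AFTER it
for k in range(len(weights) - 1, 0, -1):
    hidden_error = np.dot(weights[k].T, delta)   # e.g. (20, 9) @ (9,) -> (20,)
    a = activations[k]
    delta = hidden_error * a * (1 - a)
    deltas.insert(0, delta)

print([d.shape for d in deltas])   # [(20,), (20,), (9,)]
```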
I am currently working on a project aimed at recognizing whether a photo of any individual exists in the university's records. The proposed method involves storing the embeddings of each student's photo, along with their details, in a vector database. When a photo needs to be compared, the system will generate the embedding value for that photo and then compare this value against the database. If the value falls within a specific threshold, it will indicate that the individual exists in the record.
I am seeking expert advice on whether this approach is feasible. If there are any concerns with this method, I would appreciate recommendations for the best solution.
Like, from scratch or just using the algorithm?
If it's just about using it then the sklearn docs themselves have tons of examples
Do reinforcement learning
Like, read some of the book from Sutton & Barto
You'll learn DP in no time 😄
Dynamic programming is at the essence of RL
It's not hard at all, you'll have no problems doing it whatsoever
You've heard of Q learning right?
Q learning is approximate dynamic programming
Bro ?? Fr?
Basically, you're estimating the value of each state-action pair
that SA combination is called the Q-value
If you have this you can derive the optimal policy from it
Nothing fancy yet, right?
The thing is, with DP you have a "world model": a representation of the problem you can prod to get the reward for your next action
And in DP you evaluate all actions in each state
In the real world you can't do that without being able to rewind
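To make the contrast concrete, here is a sketch of classic DP (value iteration) on a made-up 5-state chain where the world model is known, so every action in every state can be evaluated with a Bellman backup. The chain, rewards, and discount factor are invented for illustration:

```python
import numpy as np

N_STATES = 5   # states 0..4; state 4 is the terminal goal
GAMMA = 0.9

def model(state, action):
    """Known world model: (next_state, reward, done). Action 0 = left, 1 = right."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

V = np.zeros(N_STATES)
for sweep in range(100):
    new_V = V.copy()
    for s in range(N_STATES - 1):   # the terminal state keeps value 0
        # Bellman optimality backup: query the model for *every* action in this state
        backups = []
        for a in (0, 1):
            nxt, r, done = model(s, a)
            backups.append(r if done else r + GAMMA * V[nxt])
        new_V[s] = max(backups)
    converged = np.max(np.abs(new_V - V)) < 1e-12
    V = new_V
    if converged:
        break

# Expected values for states 0..3: 0.9**3, 0.9**2, 0.9, 1.0
print(V)
```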
Q learning is "data based", you sample one action (giving more weight to the current policy) and then do your update like that. No need for "rewinds" or a "world model"
My explanation is terrible
I give up
If I had a whiteboard it'd all make sense lol
anyway, here's the book http://incompleteideas.net/book/the-book-2nd.html
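And a tabular Q-learning counterpart on the same kind of toy chain: the agent never queries a model for its updates; it only calls step() to sample one transition at a time, choosing actions epsilon-greedily (mostly following its current Q estimates), and updates Q from what it observed. The chain, rewards, and hyperparameters are all invented for illustration:

```python
import numpy as np

N_STATES = 5          # states 0..4; state 4 is the terminal goal
LEFT, RIGHT = 0, 1
rng = np.random.default_rng(0)

def step(state, action):
    """Environment: move left/right on a chain, reward 1 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose(q_row, epsilon):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < epsilon:
        return int(rng.integers(2))
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

Q = np.zeros((N_STATES, 2))
alpha, gamma, epsilon = 0.5, 0.9, 0.3

for episode in range(500):
    state = 0
    for _ in range(100):
        action = choose(Q[state], epsilon)
        nxt, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action in the sampled next state
        target = reward if done else reward + gamma * Q[nxt].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = nxt
        if done:
            break

greedy_policy = Q[:-1].argmax(axis=1)
print(greedy_policy)   # 1 = move right, in every non-terminal state
```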
Doing DSA without doing leetcode first hmmm
Makes sense if you're in a hurry ig
Using them. I'll check out the docs as well, thanks!
Can perceptron loss only be used with a single perceptron?
when you say "perceptron loss", is there a code thing that you're referring to? like something from torch.nn?
No, I wanted to know when we use it
Because normally in a neural network we would use MSE, MAE, Huber loss, and categorical, binary, and sparse categorical cross-entropy, right?
I have learnt intermediate-level ML from kaggle.com
and some of the pandas library
but how do I get into deeper machine learning?
Try making a neural network that classifies tabular data.
tabular what?
wait
it's just a fancy word for a matrix
Tables and matrices aren't exactly the same thing
That's okay
Tabular data is when you have a fixed number of pieces of information (features) about each thing
Whereas images are non-tabular because each pixel is a feature, and the number of them varies depending on the size of the image.
Neural networks really shine with non-tabular data. But creating a network for tabular data is a good place to start.
Hi, I'm using seaborn. Does anyone know why the color palette changes when adding more elements to a group (using hue)?
Nvm found it
Somehow the ball made it seem much more professional xddd
xdd
Nicee
oh, just waiting until you see how the previous ones go?
Good luck
This is exactly what I do
I'm using Postgres with the vector extension (pgvector), so it's an all-in-one solution
Yes but it's not rocket science
Very easy to implement
I use Dagster to embed my stuff once per night
and store it in pgvector, which is deployed as a separate entity from my DB.
Alongside that, I also have a backend that can access pgvector: it embeds the query, computes cosine similarity, sorts, and takes the top k
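The query side of that setup (embed, cosine similarity, sort, top k), plus the threshold from the original question, sketched in memory with NumPy. pgvector does the equivalent inside Postgres; the embeddings, the student IDs, and the 0.8 threshold are placeholders, and a real threshold would have to be tuned on known same-person/different-person pairs:

```python
import numpy as np

def top_k_matches(query, gallery, ids, k=3, threshold=0.8):
    """Return up to k (id, cosine similarity) pairs, best first, keeping only
    matches whose similarity is at least `threshold`."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity with every stored embedding
    order = np.argsort(-sims)[:k]     # indices of the k most similar rows
    return [(ids[i], float(sims[i])) for i in order if sims[i] >= threshold]

# Placeholder "student" embeddings and IDs
gallery = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])
ids = ["s001", "s002", "s003"]
query = np.array([1.0, 0.05, 0.0])

matches = top_k_matches(query, gallery, ids)
print(matches if matches else "no one in the records is close enough")
```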
Technically just DP. DP does not need to be an exact thing.
RL is in many ways more DP than many DP algorithms.
Interesting way to view it, I agree
Just historically, what was being researched.
The dumb name was intentional, so that they wouldn't object to funding it.
In my mind I always think DP needs the reward model and transition probabilities etc.
And that maximising the Bellman equation isn't necessarily DP
Would you call Monte Carlo methods DP?
DP is just mathematical optimization, and specifically about this kind of building things up via breadcrumbs (exact or not).
Which is why it's so broad: most of the algorithms that businesses are practically built on are DP.
It needs to be a specific kind of mathematical optimisation, no?
The recursive relationship.
Yes, but I attribute that mostly to maximising the Bellman equation
Unless you say that all methods that do that are DP (and vice versa)
In different contexts it's not just the Bellman equation.
If that's the angle, then sure
Discussing the details of this are above my pay grade though
I just know distinctly that many papers called Q learning, let's say temporal difference methods in general, approximate DP
Bellman is essential though yes.
But I don't know enough about this to be able to make a taxonomy myself 🙂
(or to critique existing ones, yours or theirs)
Although in other contexts you have like the Hamilton-Jacobi-Bellman equation.
Q learning is just DP, "approximate" can get across the practical implementation feeling better, but is not needed.
So as to not be confused with DP, the programming method, as found in CS classes.
DP as a programming method is stupid
In my mind it's an optimization method first and foremost yah
Yeah a lot of things in CS are weird like this, including the term CS.
(The need to tack on "science" to everything to make it feel more legit)
Or what a "greedy" algorithm even is.
the programming in DP to me is the same programming in linear programming, IL, MIP, ...
Because that's the context I learnt about those, in quant business
as well as greedy algorithms
Greedy is not actually well defined, it just kind of "feels" greedy.
But whether an algorithm is DP is.
Can't you define it as cases where you have some sort of Bellman recursion and you set the gamma term to 0?
So if you ever find a CS exam that asks if an algorithm is greedy or DP or something, they are not exclusive and only one is well defined...
So your chosen policy only tries to maximize P(Reward|State)
Had to correct Wikipedia on some of the DP and greedy stuff, they did change it though, so that's nice.
Not the correction I made, just a quote from Wikipedia: "From a dynamic programming point of view, Dijkstra's algorithm for the shortest path problem is a successive approximation scheme that solves the dynamic programming functional equation for the shortest path problem by the Reaching method."
Yet we don't consider it approximate. Since we run the whole thing till the end.
And we know we will reach some reasonable end.
Problem is what counts as a "local" decision. What is "local?" You will find that different algorithms consider different things to be "local."
Greedy is supposed to mean that you don't really enumerate all your options; you just pick the first one within a limited scope.
But that scope can vary between the algorithms / contexts. If you are looping over a bunch of options and picking the best, is that really "greedy?" Sure it's not globally picking the best, but it's also not super local either.
So how local is local?
If I'm doing backtracking stuff and back out just 1 level, is that still local?
What about 2?
Usually it just boils down to practically, "the inner most loop."
This is a bit dumb but what does DP refer to in this context?
Which context?
Like what do you mean by DP in this conversation
Dynamic Programming.
Ight, wish I didn’t start deep learning a month ago. However, what is the most important math when it comes to deep learning and why do I feel it is matrix and linear algebra?
Can someone please critique my model? I need feedback. Not shilling, just need to improve. https://github.com/nickkatsy/Jupyter_Notebooks/blob/main/hate_tweets_LSTM_NLP.ipynb
Hello everyone,
I'm currently working on a homework problem where we need to approximate the natural logarithm using an iteration method described by B.C. Carlson. The method involves iterative computation of arithmetic and geometric means, and seems quite interesting.
The iteration method is detailed in the paper:
B. C. Carlson, "An Algorithm for Computing Logarithms and Arctangents", Math. Comp. 26 (118), 1972, pp. 543-549. DOI: 10.1090/S0025-5718-1972-0307438-2.
Here's a summary of what I need to do:
- Initialize \( a_0 = \frac{1+x}{2} \) and \( g_0 = \sqrt{x} \).
- Iteratively compute \( a_{i+1} = \frac{a_i + g_i}{2} \) and \( g_{i+1} = \sqrt{a_{i+1} \cdot g_i} \).
- Use \( \frac{x-1}{a_i} \) as an approximation to \( \ln(x) \).
The task requires writing a function `approx_ln(x, n)` that uses \( n \) iterations of this algorithm to approximate the natural logarithm \( \ln(x) \).
I'm a bit stuck on how to implement this in code, especially handling the iterations and ensuring the accuracy of the approximation. Could someone provide guidance or a starting point for this implementation? Any help would be greatly appreciated!
@cloud flower I faced the same issues recently, but luckily, I got some assistance. Unfortunately, I can't send you any links on this platform. However, if you send me a message, I can offer you some possible support.
pandas/dataframe question (noob)
I have a bunch of rows in a DF, each of which have a couple of attributes (timestamp, account id, resource type, resource id); And a "status" value
How can I get a summary of the number of each "status" for each account_id + resource_type + resource_id?
Feels like groupby + value_counts() but I can't get my head around it
df['STATE'].value_counts() gives me a nice series with:
- running: 2052
- stopped: 180
But I need those for each "thing" (which is resource_type + resource_id)
just groupby + count
you'll end up with a MultiIndex though, so you'll probably want to call reset_index() after it
Hmm
df.groupby(['ACCOUNT_ID', 'TYPE', 'ID', 'STATE']).count().reset_index()
Looks kind of close. It's counting columns that aren't in the groupby
you specify
- which columns to group by
- which columns to operate on (default: all columns you did not group by)
- which operation to do on those columns
for count, it should be the same regardless of which column it is counting
some of the other columns have zero or NaN, the tallies seem to be different depending on that.
but just using a column which is always present (e.g. timestamp) looks OK
sorry, can't copy/paste, Discord is on a different machine from work (where it's banned)
What should I teach a small neural network?
Thanks @agile cobalt
I am a fairly competent Python dev, but would like to get (much) better at data. But there is a lot to learn in pandas
Anyone have any good papers/other resources on facial recognition machine learning models?
etrotta, from a DF with [a,b,c,d] columns, how might I add two columns:
group-by [a,b,c] + how many state=running
group-by [a,b,c] + how many total states
Trying to make a summary. The input data has hundreds of thousands of rows (the status of every resource, every 15 minutes). The group-by you helped with reduces it to one row per unique resource, with a count, which is super cool.
But would be nice to have two columns running="116" and total="149"
Got it - using .size() and .unstack()
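A sketch of that .size() + .unstack() approach, with running and total columns per resource. The DataFrame is invented but uses the column names from the question; the CPU column is a made-up extra containing NaNs, to show why .count() gave different tallies per column while .size() doesn't care:

```python
import numpy as np
import pandas as pd

# Hypothetical snapshot data: one row per resource per polling interval
df = pd.DataFrame({
    "ACCOUNT_ID": ["a1", "a1", "a1", "a1", "a2", "a2"],
    "TYPE":       ["ec2", "ec2", "ec2", "rds", "ec2", "ec2"],
    "ID":         ["i-1", "i-1", "i-1", "db-1", "i-9", "i-9"],
    "STATE":      ["running", "running", "stopped", "running", "stopped", "stopped"],
    "CPU":        [12.0, np.nan, 0.0, 40.0, np.nan, np.nan],
})

keys = ["ACCOUNT_ID", "TYPE", "ID"]

# .size() counts rows per group, regardless of NaNs in other columns
per_state = df.groupby(keys + ["STATE"]).size().unstack("STATE", fill_value=0)

summary = per_state.assign(total=per_state.sum(axis=1)).reset_index()
print(summary[keys + ["running", "total"]])

# By contrast, .count() skips NaNs, so counting the CPU column undercounts
print(df.groupby(keys)["CPU"].count())
```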
I have a question posted in python-help about arrays, can anyone help?
Does anyone have info on constrained clustering algorithms?
Define constrained?
Hello, can someone recommend a book for me to learn AI with python? Learning by reading books appeared to be more effective than learning from the internet for me. Thank you in advance
look at the pinned post I made 🙂
they're actual books
https://www.statlearning.com/ is a book and so is https://arxiv.org/abs/2106.11342
could anyone help me with wandb? I'm trying to find out how to increase the upload limit. I'm stuck at 0.934 MB every run. How can I increase it?
can anyone resolve this error
plz ping me
please show the whole error message in the terminal in the chat as text (not a screenshot)
actually, I see the problem. you wrote import open ai. that's not the right import statement
it's probably something like import open_ai or import openai. there definitely won't be a space.
ModuleNotFoundError Traceback (most recent call last)
Cell In[33], line 1
----> 1 from langchain import PromptTemplate
2 from langchain.chains import RetrievalQA
3 from langchain.embeddings import HuggingFaceEmbeddings
ModuleNotFoundError: No module named 'langchain'
okay, so langchain isn't installed. what did you do that was intended to install langchain?
it is installed
what did you do to install it?
pip install langchain
okay, so the python instance that you installed langchain to is not the one you're using to run it
make a new cell in your notebook with these two lines, and run that cell, and show the result in this chat as text (not a screenshot)
import sys
print(sys.executable)
c:\Users\kabhi.conda\envs\mchatbot\python.exe
I don't know how to fix it if you're using conda.
its ok btw thx bro
you need to install langchain in the python that's at c:\Users\kabhi.conda\envs\mchatbot\python.exe. you have more than one python on your computer. if you did pip install langchain and it worked, it went to some other python (probably the default one)
doing c:\Users\kabhi.conda\envs\mchatbot\python.exe -m pip install langchain might solve it, but powershell (like conda) is something I don't use.
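Building on that advice: running pip through the interpreter the notebook itself uses guarantees the package lands in that environment. A sketch (in Jupyter, the %pip install langchain magic does the same thing); pip_install_command is a hypothetical helper and the actual install call is left commented out:

```python
import subprocess
import sys

def pip_install_command(package: str) -> list:
    """Build a pip command bound to the interpreter that is running this code."""
    return [sys.executable, "-m", "pip", "install", package]

print("This kernel runs:", sys.executable)
cmd = pip_install_command("langchain")
print(" ".join(cmd))
# Uncomment to actually install into this interpreter's environment:
# subprocess.check_call(cmd)
```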
Hi guys, I'm learning Python programming for ML, with libraries such as Matplotlib, NumPy, Pandas, and scikit-learn. I'd like to find a job within a few months, and I plan to build some small projects following along with some online paid courses (on Udemy). What do you think? Do I need to learn other things? I'll learn TensorFlow as well, although I don't know if it's enough to get a job
File c:\Users\kabhi.conda\envs\mchatbot\lib\site-packages\langchain_api\module_import.py:69, in create_importer.<locals>.import_by_name(name)
68 try:
---> 69 module = importlib.import_module(new_module)
70 except ModuleNotFoundError as e:
File c:\Users\kabhi.conda\envs\mchatbot\lib\importlib_init_.py:127, in import_module(name, package)
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:991, in find_and_load(name, import)
File <frozen importlib._bootstrap>:961, in find_and_load_unlocked(name, import)
File <frozen importlib._bootstrap>:219, in _call_with_frames_removed(f, *args, **kwds)
File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:991, in find_and_load(name, import)
File <frozen importlib._bootstrap>:973, in find_and_load_unlocked(name, import)
...
76 ) from e
77 raise
79 try:
new output
would anyone help me with my problem in #1035199133436354600 ?
If you don't already have a degree related to CS where you learned ML concepts, you almost certainly will not be able to get a job in ML from self-study.
That isn't to say that you shouldn't learn it if it interests you, but if you need to train for a job, you should look at more attainable options.
this probably means that you tried to import something that wasn't found.
ok
it's showing No Python at '"cd D:\MP\End-to-end-Medical-Chatbot-using-Llama2\python.exe'
this error
but i have py
Help
model.fit(X_train,y_train,epochs=100)
Which gradient descent is used here? And if I don't specify batch_size, which gradient descent will be used?
check the docs for whatever model that is, they should say
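If this is Keras (epochs= suggests it), model.fit uses batch_size=32 when you don't pass one, so each weight update is computed from a mini-batch of 32 samples; how the update is applied depends on the optimizer given to compile(). The NumPy sketch below is not Keras internals, just a toy linear regression on synthetic data showing how batch_size alone decides between stochastic, mini-batch, and full-batch gradient descent:

```python
import numpy as np

def train_linear(X, y, batch_size, epochs=300, lr=0.1, seed=0):
    """Gradient descent on mean squared error for y ~ X @ w.
    batch_size=1 -> stochastic GD; batch_size=len(X) -> full-batch GD;
    anything in between -> mini-batch GD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))                   # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)    # MSE gradient on this batch
            w -= lr * grad
    return w

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w                                            # noiseless synthetic targets

for bs in (1, 32, len(X)):
    print(bs, train_linear(X, y, batch_size=bs))
```

All three settings reach the same weights here; what changes is how many updates happen per epoch (200, 7, and 1 respectively) and how noisy each one is.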
You might want to ask in #1035199133436354600, since it's implementation you want help with. Also, people will generally not write code for you, so if possible ask concrete questions and/or post what you've written so far.
This is what I have so far:
import math

def approx_ln(x, n):
    # Ensure the input x is greater than 0
    if x <= 0:
        raise ValueError("x must be greater than 0")
    # Initialization
    a = (1 + x) / 2  # a_0
    g = math.sqrt(x)  # g_0
    # Iterate n times
    for _ in range(n):
        a_next = (a + g) / 2  # Compute a_{i+1}
        g = math.sqrt(a_next * g)  # Compute g_{i+1} from a_{i+1} and the old g_i
        a = a_next  # Update a to a_{i+1}
    # Approximation of ln(x)
    ln_x_approx = (x - 1) / a
    return ln_x_approx

# Example usage
x = 5
n = 10
print(f"Approximation of ln({x}) with {n} iterations: {approx_ln(x, n)}")
That looks right - so what's the problem?
How do I check if the solution achieves what they want in the problem?
You could check that it converges to math.log(x) as n increases.
I see, so comparing the output of the approx_ln function to the result of math.log(x) for increasing values of n, and observe how the approximation improves as the number of iterations increases.
Pseudocode-wise, what I thought was that we need something that excludes it accepting x <= 0. Then we can initialize a_0, g_0, and then in the last step use the appropriate loop that just iterates over 1 to n, where n is what you give as input.
Sure. You could even make some plots, like this one:
Sure. Technically the problem statement doesn't say what "n iterations" means (whether to count the 0 values as an iteration or not), but this is a reasonable choice.
Would you happen to know what I would need to add to my code to achieve this?
Calculate math.log(5) - approx_ln(5,n) for n in range(1,20), then plot it with matplotlib (with a logarithmic y-axis). In my case I also compared it to a ∼4^(-n) line, because it seems it fits the points perfectly (it's surprising to me, normally approximations don't converge this nicely).
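The same convergence check without the plot: print the error against math.log(x) for n = 1..10 and the ratio between successive errors, which should settle near 4 if the error shrinks like 4^(-n). The function restates the approach posted above (same n-iteration convention):

```python
import math

def approx_ln(x, n):
    """Carlson's iteration, counting n updates after the initial a_0, g_0."""
    if x <= 0:
        raise ValueError("x must be greater than 0")
    a = (1 + x) / 2
    g = math.sqrt(x)
    for _ in range(n):
        a = (a + g) / 2          # a_{i+1} from a_i and g_i
        g = math.sqrt(a * g)     # g_{i+1} uses the new a_{i+1} and the old g_i
    return (x - 1) / a

x = 5.0
errors = [abs(math.log(x) - approx_ln(x, n)) for n in range(1, 11)]
for n, err in enumerate(errors, start=1):
    ratio = errors[n - 2] / err if n > 1 else float("nan")
    print(f"n={n:2d}  error={err:.3e}  ratio={ratio:.2f}")
```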
that means that somewhere you did plt.title = ...
much like how in this cell you're doing plt.xlabel = ..., which is also wrong. these are functions, you should be calling them, not redefining them.
Oh right, but even when I change it to plt.xlabel('X axis') and run it, it still doesn't work. It gives the same error
This is the entire document, literally nothing else
Since you've already run a cell once that replaced plt.title with a string, removing that code won't undo its effects.
The simplest fix would be to restart the notebook.
Is it different for each model?
Well, sure, .fit in sklearn can do completely different things depending on what you're calling it on.
you just import them as normal.
if you mean how to install them - with %pip install somecoolpackage
nah, I did it with gdrive
I mount that first!
os.chdir('/content/drive/MyDrive/Pong/')
!ls
so this prints all the files which I need
but when I import them , it shows error
no wait I solved that!
How do I execute and test an AWS Lambda function locally in VS Code?
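One lightweight way to test a Python Lambda locally (in VS Code or anywhere else) is to call the handler directly with a hand-made event, since a Python Lambda handler is just a function taking (event, context). The handler body and event below are hypothetical; to emulate the actual Lambda runtime, AWS SAM CLI's sam local invoke is another option:

```python
import json

def lambda_handler(event, context):
    """Hypothetical handler: greets the name in the event, API-Gateway-style response."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"Hello, {name}!"})}

if __name__ == "__main__":
    # Hand-written test event; real events depend on the trigger (API Gateway, S3, ...)
    fake_event = {"name": "Ada"}
    response = lambda_handler(fake_event, context=None)
    print(response)
```

Running this file with the VS Code debugger lets you set breakpoints inside the handler like any other script.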
Do consider data analyst jobs. You are listing Matplotlib which is a visualisation library, numpy for arrays and Pandas which is a dataframe analysis library. While these are sort of prerequisites for ML, SkLearn is the only ML library in your list. With this toolset I would advise you to also look at jobs involving data transformation, data analysis and visualisation. I'd also advise you to learn statistics libraries such as scipy.stats (while you are at it, have a look at scipy.signal and scipy.optimize ) and/or statsmodels. You don't say what your background is otherwise, do you have a social or natural science or engineering degree that involved statistics, computational methods, mathematics in general?
What is SMOTE?
What's a good GitHub repo to look at when creating a GitHub portfolio for data science, data engineering and AI? What types of projects should I have?
nvm
Where would you guys recommend starting if I want to create AI and learn from the absolute basics, with only a small foundation in Python?
Get good at Python. Forget about AI for a few months, and just get good at small Python projects. This will save you a lot of aggravation later. While you do that, maybe read about machine learning concepts and topics.
Have any resources? Or is some random 12-hour video on YouTube good enough lol
don't watch long youtube videos. They don't teach you anything
Often times the people that make it don't really know what they're doing either 🙂
If you're going to do that, look for open courses, like the ones from MIT
That’s my biggest problem
The videos go over data types, syntax, etc.
And 20 minutes in I feel like I’m wasting my time
Or check out the second pinned post, I have a bunch of books there but they'll need a decent understanding of Python
so I’ve been struggling to find a resource that actually teaches structure, a basic understanding of code, etc.
yeah I saw
Pick any book, literally any book, that teaches Python and is somewhat recent
and you'll be fine
!kin And, start doing small projects. It doesn't matter what it is (website? tic-tac-toe? etc.): the easier the better
The Kindling projects page on Ned Batchelder's website contains a list of projects and ideas programmers can tackle to build their skills and knowledge.
I speed-read this one to see if I could vouch for it, and I like it
!res Also, in case you haven't seen this:
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Another practical tip I can give is that if I want to understand something I read several books on the same topic 😩
Hence why I'd say, pick any beginner book. You'll probably read several of them
How do I start a neural network library?
Hey, the follow-up task to the first one was:
Plot both functions, ln and approx_ln, in one plot and the difference of both functions in another plot. Do this for different values of n.
And this is what I wrote:
```py
import numpy as np
import matplotlib.pyplot as plt

def approx_ln(x, n):
    if x <= 0:
        raise ValueError("x must be greater than 0")
    # Initial values
    a_i = (1 + x) / 2
    g_i = np.sqrt(x)
    # Iterative calculation
    for _ in range(n):
        a_next = (a_i + g_i) / 2
        g_next = np.sqrt(a_next * g_i)
        a_i, g_i = a_next, g_next
    # Final approximation
    ln_approx = (x - 1) / a_i
    return ln_approx

# A range of x values
x_values = np.linspace(0.1, 5, 400)
# Compute the true ln(x) values
true_ln_values = np.log(x_values)
# Values of n for the approximation
n_values = [1, 2, 5, 10]
# Create the subplots
fig, axes = plt.subplots(2, 1, figsize=(10, 12))

# Plot ln(x) and approx_ln(x) for different values of n
for n in n_values:
    approx_ln_values = [approx_ln(x, n) for x in x_values]
    axes[0].plot(x_values, approx_ln_values, label=f'approx_ln, n={n}')
axes[0].plot(x_values, true_ln_values, label='ln(x)', color='black', linestyle='--')
axes[0].set_title('True ln(x) and Approximations')
axes[0].set_xlabel('x')
axes[0].set_ylabel('ln(x)')
axes[0].legend()
axes[0].grid(True)

# Plot the difference between ln(x) and approx_ln(x)
for n in n_values:
    approx_ln_values = [approx_ln(x, n) for x in x_values]
    difference = true_ln_values - approx_ln_values
    axes[1].plot(x_values, difference, label=f'n={n}')
axes[1].set_title('Difference between ln(x) and approx_ln(x)')
axes[1].set_xlabel('x')
axes[1].set_ylabel('Difference')
axes[1].legend()
axes[1].grid(True)

# Show plots
plt.tight_layout()
plt.show()
```
Curious what you think.
Does anyone know how to use tqdm with scikit-learn cross-validation so I can see the progress?
And these are the plots @tidal bough
Can we start an AWS Lambda function from VS Code using Terraform?
I mean, is Terraform only for creating & destroying resources, or can it also do the above?
Wym by start?
Connect the lambda function to an api endpoint and call it from any framework that calls apis
This is what I've done so far:
--> Created a .tf file where I wrote the code to create a Lambda fn
--> After executing the series of commands init, plan & apply, the Lambda got created in the AWS console.
--> Now, if I want to use that Lambda, I need to run it from the AWS console, pass the input & see the output.
--> But I want to execute this Lambda from my console & see the output in my console itself, without touching the AWS portal. Can Terraform do this?
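Terraform is mainly for creating, updating and destroying resources rather than calling them. To invoke the deployed function from your own terminal, the AWS CLI (`aws lambda invoke`) or boto3 can do it. A minimal boto3 sketch, assuming AWS credentials are already configured locally; the function name you pass is whatever your .tf file created:

```python
import json


def build_payload(event):
    """Encode the event the way Lambda expects it: JSON bytes."""
    return json.dumps(event).encode("utf-8")


def invoke(function_name, event, region=None):
    """Synchronously invoke a deployed Lambda and return its decoded JSON result."""
    import boto3  # imported here so build_payload works without boto3 installed

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=build_payload(event),
    )
    result = json.loads(response["Payload"].read())
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda returned an error: {result}")
    return result
```

Usage would look like `print(invoke("my-function", {"name": "terminal"}))`, where `my-function` is a placeholder.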
I believe my code has data leakage, since it shows an accuracy of 0.99-something
```py
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

Y = train_metadata['target']
X = train_metadata.drop(columns=['target'])

numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns

X[numeric_features] = X[numeric_features].apply(lambda x: x.fillna(x.mean()))
for col in categorical_features:
    X[col] = X[col].fillna(X[col].mode()[0])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])

# Split to get 10% for training
X_train, X_rest, y_train, y_rest = train_test_split(X, Y, test_size=0.9, random_state=42)
X_test, _, y_test, _ = train_test_split(X, Y, test_size=0.9, random_state=43)

X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)
```
Your split isn't splitting
You're sampling from the entire X for your train, then sampling again from the entire X for your test
ohh what should I do?
also, isn't there a very low chance of them being the exact same 10%?
Why wouldn't you use train test split once to get the train and test?
I don't know your data, I'm just saying this is a weird way to do this. You can pass train size and test size in a single call: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
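To see why two separate calls leak even though the two 10% samples are never identical: independent draws from the same rows share roughly 10% of their members, while shuffling once and slicing gives disjoint sets. Below is a plain-Python sketch of both (no sklearn). In scikit-learn the second approach corresponds to one call such as `train_test_split(X, Y, train_size=0.1, test_size=0.1, random_state=42)`, which returns non-overlapping sets. Separately, the `fillna` means and modes above are computed on all rows before splitting, which is a smaller leak of its own.

```python
import random


def sample_indices(n_rows, fraction, seed):
    """Independently sample `fraction` of row indices (like one separate split call)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_rows), int(n_rows * fraction)))


def split_once(n_rows, train_frac, test_frac, seed):
    """Shuffle once and slice, so train and test can never share a row."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train, n_test = int(n_rows * train_frac), int(n_rows * test_frac)
    return set(idx[:n_train]), set(idx[n_train:n_train + n_test])


n = 10_000
train_a = sample_indices(n, 0.1, seed=42)
test_a = sample_indices(n, 0.1, seed=43)   # second, independent draw
print("overlap with two draws:", len(train_a & test_a))  # expected ~100; exact value depends on seeds

train_b, test_b = split_once(n, 0.1, 0.1, seed=42)
print("overlap with one split:", len(train_b & test_b))  # 0
```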
Ohhh okay
people what's up with interactive graphing these days?
looks like a bunch of web-based trash which renders in browsers
is there anything for data exploration which is comparable to the ease-of-use of ggplot through rstudio?
Jupyter Notebooks with seaborn, plotly or bokeh
why not just seaborn?
eh I like plotly better myself
and plotly is the kind of web-dependent trash I'm referencing as being suboptimal
let me state that more politically
plotly, I'd like to avoid, and I think categorically as I noted above
interactivity can be complicated
I don't find it useful in my work so it's fine to have say no tooltip at all. I am just looking at trends.
in fact it can be a dead graph once I draw it!
as long as it can be redrawn instantly when I change filters / columns / rerun analysis and build some new summary column
why did you ask for interactive graphing then?..
because the graphing process is interactive in ggplot
e.g. you layer, you filter, you manipulate, and the graph rendering process is indeed interactive
it's just not point and click interactive, as I guess you assumed I must mean
my b to confuse
sounds like literally any IPython terminal / Jupyter Notebook?
yeah I don't think a terminal is great for ease of use, nor a Jupyter notebook
I'd be doing analysis via pycharm or vscode
whichever the graphing library's rendering requirements integrated best with
you realize you can use notebooks inside of VS Code / PyCharm right?
(and I'd strongly recommend using inside of them over a web interface)
why would I want that?
if anything maybe take a look at Spyder, might be closer to R Studio
they always seemed, to me, to be a way to containerize examples for export in the web-ed systems
not necessary for actual data science
just interactivity
I want to build an environment once and persist it
IPython works just as well, if not better, if you're familiar with terminals and don't need inline plots
e.g. I like the environment in which I install tf and pytorch etc. to be static
see, inline plots - what I want is ease of data exploration. So like, rstudio just puts that plot right on the screen, right above your working environment's CLIs
that is literally the point of using notebooks for exploration
the way Spyder does it might be closer to what you want though
I love that. I am trying to find a similar configuration to enact a chillness like that for a python-based data analysis.
You can just create a virtual environment, install everything to it, then select that environment when running scripts or set it as your kernel inside of notebooks
so what's up with notebooks? do they have their own environments, or can you link any environment to a notebook?
can they share an env?
ah
the notebook itself does not contain the environment - you have to specify which environment it'll use as its backend
so what does it do?
just contains the code and its output
if my environment is local, and my ide is local (you said I can use a notebook in pycharm)
yes
so I'd write a little plotly and run this notebook and that would handle rendering directly to what?
- You should create a virtual environment (or a conda environment) and install your packages to that environment
- Your IDE can create it for you, or you can create it manually and tell your IDE to connect to that environment
- You can specify which environment to use when running scripts or terminal sessions
- You can specify which environment to use as the "Kernel" for notebooks, which is just a fancy way of saying which environment they're running in
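For the first bullet, a virtual environment can also be created from Python with the standard-library `venv` module, which is what `python -m venv .venv` runs on the command line. A minimal sketch using a temporary directory; in practice you'd point it at something like `.venv` inside your project:

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its Python executable."""
    venv.create(path, with_pip=with_pip)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return Path(path) / bin_dir / exe


with tempfile.TemporaryDirectory() as tmp:
    python = create_env(Path(tmp) / ".venv", with_pip=False)  # skip pip to keep the demo fast
    env_python_existed = python.exists()
    print(env_python_existed)  # True
```

With a real environment activated, `pip install ipykernel` followed by `python -m ipykernel install --user --name myenv` registers it as a notebook kernel; VS Code and PyCharm can also just be pointed at the environment's interpreter.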
do I need to coordinate between the notebook and plotly, or does plotly coordinate with the notebook which calls my local environment to render it?
you would just write some code like
```py
# cell 1
import pandas as pd
import plotly.express as px

# cell 2
df = pd.read_csv('data.csv')
df  # show a preview of the df

# cell 3
px.bar(df, x="name", y="score")  # shows the plot as the cell output
```
log what exactly?
data. I can see my graph (adjusted from the example graph in the jupyter docs)
but if I do print('hi there')
I see no changes
did you remember to actually run the cell?
nope!
was looking for a button like that in the locality of the code & output
they're making some big deal about "packaging" those things together
tyty
remember to check the hotkeys for things like running cells
sure thing. So spyder vs jupyter?
with Spyder, I presume local environment setup is easier?
or is that even the de facto tool for using notebooks with a local env?
not really, it is fairly simple for both as long as you do create and select a virtual environment
(or a conda environment, if you need to install certain niche packages)
Spyder is very different from Jupyter Notebooks. The de facto standard is Jupyter Notebooks, but Spyder might be closer to whatever your experience with R was
there is also the option of just using neither, writing your code in normal .py files, and running it in a normal terminal
(either entire files at once like normal, or just select regions of it then use some keybind to tell your IDE to run it in the terminal. shift + enter does that in VS Code)
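A middle ground between notebooks and plain scripts: VS Code's Python/Jupyter tooling and Spyder both treat `# %%` comments as cell boundaries in a normal `.py` file, so you can run one region at a time in an interactive window while the file stays a regular script. A small sketch of the layout; the data and computation are placeholders:

```python
# %% Load data (placeholder values)
import statistics

scores = {"alice": 91, "bob": 78, "carol": 85}

# %% Explore: re-run just this cell as you tweak it
mean_score = statistics.mean(scores.values())
top = max(scores, key=scores.get)
print(f"mean={mean_score:.1f}, top={top}")
```

Running `python file.py` still executes every cell top to bottom, so nothing notebook-specific leaks into the file.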
hurrrgth man I've done that plenty, I'm looking for cutting edge and that means hot-reloading of live-edited code
jurigged 
if it must be scripted
look into that module, it's pretty nice
you should be able to use ctrl + enter to re-run the cell you're currently editing in Jupyter, not sure about Spyder, but I'd advise against actual hot-reloading
looks good. But I think I still need the web environment and some behind-the-scenes I/O management, which it seems Jupyter/Spyder may handle
yeah this is all I mean. I don't wanna rerender on every keystroke.
super thank you for your time
thank you both. It's been productive, I feel like I can get back into viz with an approach which isn't from the stone age.
btw, as a bit of a side note, for actual web dashboards (not exploration) you would typically use something like streamlit, dash or gradio
they all support most plotting libraries
gotta admit I do all my web graphing in ChartJS because it's just got exquisitely pretty defaults and the minified version is small enough to ship to p much any client
and if I'm writing it for production, I mean. The time to write is worth it.
I use plotly also because it's easy to export to plotly.js
Hello folks, not sure if this is the right place to ask, but are there any good solutions right now for using native WebGPU for inference?
can anyone help me integrate the Python code of a chatbot with a pre-built website?
I only ran Colab for 3 hours and now it says timeout!! (I mean no access)
but the docs say that you can use it for 12 hours!
were you hammering that baby? lol
it really is,, RL is a blast also
nah!! but I forgot to change the runtime before closing the Chrome windows!
did this create a problem?

anyways I am using kaggle now!
Add more padding/margin
For simplicity's sake I have it render html and render it directly
I'm not streaming the response rn, once I do I'll have to change it to markdown and convert it to html client side
Because I think it'll misbehave when it's streaming a response like <h1> the title ...
I've been slacking with polishing mine
it's functionally done but I want to add polish
nice
I created a static site generator by mistake for my personal site 😢
Yesn't. I have a system that hooks into an existing build tool (vite) to render my markdown into HTML, add web components and then publish to my site on every commit. Also has basic stuff like automatic reference lists, template inheritance and so on.
I didn't have to write any of the hard stuff myself
yup, but I don't "trust" their support for web components
All my links are turned into this style
done at build time
yes
At build time I just have some JS (but it could've been Python or Go) that does this:
markdown => HTML => replace <a> with <custom-component>. I also generate a reference list like this
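The build step above was written in JS, but since Python would work too, here is a rough Python sketch of the HTML => custom element part plus the reference list. It assumes the markdown is already rendered to HTML and that links look like `<a href="...">text</a>`; `link-ref` is a made-up element name, and a regex is only good enough for simple generated HTML (a real HTML parser is safer).

```python
import re

# Matches simple <a href="...">text</a> tags only
ANCHOR = re.compile(r'<a\s+href="([^"]*)"\s*>(.*?)</a>', re.DOTALL)


def rewrite_links(html, tag="link-ref"):
    """Replace <a> tags with a custom element and append a numbered reference list."""
    refs = []

    def _swap(match):
        href, text = match.group(1), match.group(2)
        if href not in refs:
            refs.append(href)
        n = refs.index(href) + 1  # repeated links reuse their first number
        return f'<{tag} href="{href}" data-ref="{n}">{text}</{tag}>'

    body = ANCHOR.sub(_swap, html)
    items = "".join(f'<li id="ref-{i}">{href}</li>' for i, href in enumerate(refs, 1))
    return body + f'\n<ol class="references">{items}</ol>', refs


page = ('<p>See <a href="https://a.example">A</a> and <a href="https://b.example">B</a>, '
        'again <a href="https://a.example">A</a>.</p>')
out, refs = rewrite_links(page)
```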
Maybe hugo does this?
I was too lazy to read the docs so I did the stupid thing and created my own 🤣
My JS is only at build
The served site has virtually no JS
just disabled JS out of curiosity and most of my site still works (dark mode dies)
yes, I used to use it but meh
a dev container with a dataset inside?
Sure, go for it
unsure how big the userbase is of dev containers
guys, how important is maths for Machine Learning?
nix is a good solution for this
Hello, are there any ML models/techniques that can identify sub-keywords within concatenated words or strings? Something like tokenization, segmentation, or sequence labeling techniques. I am looking to decompose concatenated username strings. Thanks!!
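Before reaching for an ML model, a common baseline for this is dictionary-based word segmentation with dynamic programming: try every split point and keep the split that uses the fewest known words. A minimal sketch; the tiny vocabulary is a placeholder (in practice you'd use a large word list, ideally with frequencies), and unknown characters are allowed as single-character pieces at a penalty:

```python
def segment(text, vocab, unknown_cost=10):
    """Split `text` into vocabulary words, minimising (number of words + penalties)."""
    text = text.lower()
    max_len = max((len(w) for w in vocab), default=1)
    # best[i] holds (cost, pieces) for the cheapest segmentation of text[:i]
    best = [(0, [])] + [None] * len(text)
    for i in range(1, len(text) + 1):
        options = []
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in vocab:
                cost = 1
            elif len(piece) == 1:
                cost = unknown_cost  # unknown character, kept on its own
            else:
                continue
            prev_cost, prev_pieces = best[j]
            options.append((prev_cost + cost, prev_pieces + [piece]))
        best[i] = min(options)
    return best[-1][1]


vocab = {"the", "dark", "knight", "night", "rises"}
print(segment("TheDarkKnight", vocab))  # ['the', 'dark', 'knight']
```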
Having a decent basis in math will go a long way. Personally I didn't do a crazy amount of math in university, just a standard linear algebra and then a calculus course
It's enough to have a decent basis and then you can learn what else you need when necessary
can you tell me how much is necessary
because i am new to learning ml
it's not a question you can answer precisely
Start by going in blind without knowing math and building things
like, which topics are used the most?
To see if you like the idea of ML
I watched Khan Academy for linear algebra
And then read any standard book on linear algebra and then calculus
No videos, use a book
and youtube for calculus
Learning from videos is a myth 🤓
he teaches with application in ml
sure sure but you have to engage with the material
In uni you kind of do that by taking notes and summarizing them after the fact
Unis are kinda ded at teaching that stuff
I don't think anyone is watching Khan academy and taking notes
fr
Hence, go for material that you need to actively do effort in order to "parse"
So books
but books have a lot of unnecessary things too, right?
the things which we won't even see in ML?
like too much theory
for example, for PCA, just knowing how it works and its structure is what's necessary for ML, not the mathematical derivation and calculation, right? (because we have libraries for it)
Ok so at the risk of sounding like a massive gatekeeper
This is something you're imo only allowed to say after you know enough about the theory
Before learning it you absolutely don't know what is and isn't relevant
In hindsight I can say "thing X and Y were irrelevant to know"
And maybe that's some sort of bias because knowing them helped me understand different things more easily
PCA is a good example, if you know the theory behind PCA you absolutely know where it's not appropriate and why the libraries doing PCA all suck
They all want you to pick up to k principal components
maybe having principal component 1, 3 and 7 is the best and not range(1, 8)
Or maybe you need kernel PCA or an autoencoder etc etc
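To make the point concrete: once you have the eigendecomposition of the covariance matrix, nothing forces you to keep only the first k components. A NumPy sketch (eigenvectors of the covariance, sorted by explained variance) that keeps components 1, 3 and 7 instead of range(1, 8); the data is random and purely illustrative:

```python
import numpy as np


def pca_components(X):
    """Return (eigenvalues, eigenvectors) of the covariance of X, largest variance first."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def project(X, eigvecs, which):
    """Project centered X onto an arbitrary subset of components, e.g. which=[0, 2, 6]."""
    Xc = X - X.mean(axis=0)
    return Xc @ eigvecs[:, which]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
eigvals, eigvecs = pca_components(X)
Z_first_k = project(X, eigvecs, list(range(3)))  # the usual "first k components"
Z_custom = project(X, eigvecs, [0, 2, 6])        # components 1, 3 and 7 (1-based)
```

Each projected column's variance equals its eigenvalue, so the choice can be driven by whatever criterion fits the problem rather than by rank order alone.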
so I just learn enough theory and start learning ML
and if I hit any barriers, I go relearn or learn about the topic and come back?
does that sound correct?
Something like that
I'm saying that cuz I don't wanna waste much time learning unneeded math
and learn how to actually build the model
cuz that will force me to learn the ones that i will actually have to learn
that sounds fine
doing the entire theory at once never works ye
I think my method is a bit more hardcore but it works ™️
I am absorbing most of the basic maths for ml
If I read a book on say reinforcement learning I implement the majority of algorithms along the way
and I'm going to try learning ML while relearning, or learning for the first time, the maths I haven't learnt yet
For Rust I did this
learning along the way the things we missed for foundation
For something like Rust I'd argue it makes sense to do that and not learn "as you go"
what I meant is learning roughly, then going through it again while relearning and implementing
do you catch the context
yes
anyway, the most important thing is just starting
Time spent optimizing how you'll learn and creating roadmaps is time spent not learning. Nowadays I just pick any book and read that
Instead of wasting time looking for the optimal 😄
zen buddhism
True that
But I at least try for 1 hour to optimize my learning
cuz blindly going on and not knowing the prerequisites hits hard
this is effective if you already know math, but if I didn't know any, what else could I think of to try out?
The problem with data is that in software if you do things incorrectly you often get runtime errors
or it won't compile
With data there's a whole class of problems you don't know you're doing incorrectly unless you know otherwise
I'm sorry, I mean *if I didn't know any math... since projects are usually structured around real-world concepts like math or client requests
No type safety will stop you from leaking data from test to train
yo dawg, i heard you like containers, so i put each of your datasets into a container inside your dev container
✨ idiosyncratic✨
No, it's just weird i'm sorry :p
How hard is setting up a dev env?
Especially for Python
I use Ansible to install pyenv
and pipx and poetry
and then I just do poetry install and I have my dev env?
I think on your part tbh, I don't see what problem it's solving
Don't mean it in a rude way - I'm genuinely curious
I care about reproducibility and automation a lot
So why not just do this:
You use whatever means to bootstrap a Python environment, likely just pip install a requirements file
And you just download the dataset with a get request?
or you use wget
maybe our workflows are so different I don't get it, I'll chalk it down to that haha
I basically have docker compose files that spin up things I can use in dev (say, a database) as well as being populated with the right env variables during deployment
But I make assumptions like "everyone will have Python and Docker installed"
Running Python itself in a dev container is a waste of time imho
imagine having friends who don't have python and docker installed
Sure if you use codespaces this probably makes sense but hence why I keep using ✨ idiosyncratic ✨
I saw someone on reddit who was like "I hate when vendors only ship their product as a docker image. not everyone uses docker." bruh just install it
it's great.
it's basically the same workflow
except with mine you need to do docker compose -f <the-file-name> up -d
hey, when you're exploring data, do you guys activate the same environment in PyCharm & Jupyter Notebook, fuss about in PyCharm, and only transcribe to Jupyter when you have the gist of things? Or do you use one of Jupyter's consoles so that you share the inner (Python) variables?
I agree so much haah
What standard?
There is no standard
By M*crosoft at large
I don't know anyone in the real world that uses it 🤷
Except for me, briefly
Not everyone uses pycharm, or uses notebooks to explore data. do you dislike the jupyter interface in pycharm?
I don't dislike your setup
haven't even tried it yet, I just set up jupyter and conda and didn't wanna overcomplicate
how is it? commendable? because that seems ideal
if you don't want to overcomplicate, start by deleting conda. (this is my opinion.)
I don't find conda complicated
I see more repos asking you to run some infra with docker compose up than those that use dev containers
but I maybe would've found the jupyter plugin to pycharm w/o knowing what jupyter even did
At the end of the day, I don't care
Whatever they propose I'll use
Do you use vs code?
I'll check out your question(s) in a second @vale jungle
I recommend writing "regular code" before you start using notebooks. notebooks have a few landmines that are easier to understand once you know how python handles variables and state.
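One of those landmines, sketched in plain Python: a notebook kernel is essentially one shared namespace that cells are executed into, in whatever order you click them. The `run_cells` helper below is a hypothetical simulation (just `exec` into a dict), not how Jupyter is implemented, but it reproduces two classic surprises: re-running a cell changes the result, and a deleted cell's variables linger.

```python
def run_cells(cells, order, namespace=None):
    """Execute cell sources in `order` inside one shared namespace, like a kernel does."""
    ns = {} if namespace is None else namespace
    for i in order:
        exec(cells[i], ns)
    return ns


cells = ["total = 0", "total = total + 10", "result = total * 2"]

clean = run_cells(cells, [0, 1, 2])     # top to bottom: result == 20
rerun = run_cells(cells, [0, 1, 1, 2])  # cell 1 clicked twice: result == 40

# Delete cell 0 but keep the same "kernel": no error, because `total` lingers
edited = ["", cells[1], cells[2]]
stale = run_cells(edited, [1, 2], namespace=run_cells(cells, [0, 1, 2]))

# A fresh kernel (new namespace) exposes the problem
try:
    run_cells(edited, [1, 2])
    fresh_error = None
except NameError as exc:
    fresh_error = exc
```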
But I hope you can see this is a very specific setup?
I mean, coding exclusively in the cloud
You have specific problems to your setup
Hence why I call it idiosyncratic
Your entire set up is ephemeral
alright. well you can write notebooks in pycharm, or in jupyter's browser interface. they're both essentially the same. they just have buttons in different places. and pycharm will have code completion.
When the majority of us are coding in persistent places
.
anyway, we're going in circles
So what I do is I code in .py files and while I'm doing that I import it in notebooks and iteratively work on both. The reason why I do that is for the reasons that Stelercus mentions (the landmines)
If it's really ad hoc data exploration I'd potentially do all of it in a notebook
I do something like this, but my "iterative workbench" is an IPython repl rather than a notebook. (Which, as you know, is just a low-fi notebook with no pretense of reusability.)
i was referring to onboarding people
oh, I see what you are saying.
yeah that's an immediate benefit of containerization
dang pycharm's jupyter plugins require pycharm professional
how's vscode for python in general?
I don't think so, but I'm a pro user, so I wouldn't know for sure
if you're a student, you can get pycharm pro for phr33
a lot of people like it.
sadly I'm graduated
well lookin like I'm gonna try it out
excellent thank you community, I'm up and running in vscode with my jupys and securely wrapped in a conda.
Fantastic. Ultimately I think people just use what they use first and stick with it. I used vs code first so I stuck to that
I'm like that with pycharm
I'm the opposite baby I try everything day 1
yup, I would've used Pycharm if I started with that, I'm sure of it
but now you get to save like 80 dollars a year
There's no real point to doing this imo
I used pycharm first and realized I hated it now I use vscode
bah nonsense
I agree for a naive user, but let's say you've already tried 2-3 tools in the past with sensible parallels
but I may have no point doing it either : / I concede I am not 100% certain of my methods
once you've learned five programming languages and 5 IDEs it's kind of like yeah maybe you can get a sense of abilities / disabilities
well lemme tell you, rstudio kicks configured vscode's ass, and I would argue emacs is a superior tool for a couple of niche languages (particularly erlang & elixir)
like, for r, erlang, & elixir, vscode just isn't the right choice because of community commitments to other interfaces and the resulting package availability.
Emacs just uses the language server protocol doesn't it? Anything that works there works on vs code or other stuff like sublime
supposedly, I guess
I didn't even know that the language server protocol was something which vscode implemented lol
Anyhow, I think this is a micro optimization 🙂 but yeah, whichever you prefer works
I assume there are elements which prevent somebody from just releasing all emacs packages for vscode, because it's the case that that isn't what happens
ehhh it's not a microoptimization. Like I'd say the default R studio object explorer is a beautiful tool
but yeah, surely not focused.
vs code's R plugin has that too
true, true. It's not just that though, it's how graphs are displayed inlines
it's a bunch of things
and it's all preconfigured
source: I've used it
like out of the box, Rstudio is a beautiful data exploration environment. Vscode requires some configuration to achieve that .
does tidyverse formatting work in vscode?
always liked how data tables print in rstudio
unless that counts as configuration
that counts!
I dunno, I really just felt it was magical the last few times I tried anything else
it's hard to find any motivation to move off r studio because of that.
if it ever messes with me though VScode is awaiting me for my R-dependent work, too
I'm interested in learning about machine learning but don't know where to start, I'm torn between 3blue1brown's "What is a neural network" series and Andrew ng's machine learning course on Coursera, which should I choose? or should I watch both?
These two things are not interchangeable. 3b1b's videos are about getting a general understanding of something, whereas Coursera courses are more comprehensive. You can start with 3b1b and see how you feel.
mmm, I'm using PyCharm pro, so I don't need to switch the context for web frontend 
nah, PyCharm Pro edition integrates all of Webstorm
True, if all you’re doing is data stuff tho
I work in data and most of my data visualization starts in excel hehe
I then transition to vscode to actual modeling and coding
I HIGHLY suggest learning some basic linear algebra and statistics before jumping into machine learning
You don’t need MUCH, just get to eigenspaces and the basics of probability theory
This will give you intuitive understanding of what machine “learning” is
how can I cut a 23 GB dataset down to 2 GB!!
random selection of data?
because I just wanna test something
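A random subset is reasonable for a quick test. A minimal sketch, assuming the label files hold one JSON object per line and each record points at its image through a path field; for TuSimple that field is, as far as I know, `raw_file`, but check the readme. It keeps a seeded random fraction of the records and returns the image paths you would then copy; the demo records are placeholders.

```python
import json
import random
import tempfile
from pathlib import Path


def subsample_labels(src, dst, fraction, seed=0, path_key="raw_file"):
    """Keep a random `fraction` of JSON-lines records; return the image paths they reference."""
    lines = [ln for ln in Path(src).read_text().splitlines() if ln.strip()]
    k = min(len(lines), max(1, int(len(lines) * fraction)))
    kept = random.Random(seed).sample(lines, k)
    Path(dst).write_text("\n".join(kept) + "\n")
    return [json.loads(ln)[path_key] for ln in kept]


# Demo with placeholder records standing in for a real label file
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "label_data_demo.json")
    records = [{"raw_file": f"clips/demo/{i}/20.jpg", "lanes": []} for i in range(100)]
    src.write_text("\n".join(json.dumps(r) for r in records) + "\n")
    paths = subsample_labels(src, Path(tmp, "label_subset.json"), fraction=0.1, seed=1)
    subset_lines = Path(tmp, "label_subset.json").read_text().splitlines()
```

Copying only the returned files (e.g. with `shutil.copy2`) gives a small dataset with a matching label file.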
the problem is the root dir contains several directories, which is confusing
TuSimple !
which has folders in folders and ......
wait!
I downloaded that from kaggle!
give me the correct tree command
tree -d
???
it is printing 1000 dir
then I have to use pastebin lol!
```
├── test_set
│   ├── clips
│   ├── readme.md
│   └── test_tasks_0627.json
└── train_set
    ├── clips
    ├── label_data_0313.json
    ├── label_data_0531.json
    ├── label_data_0601.json
    ├── readme.md
    └── seg_label
```
```
├── test_label.json
├── test_set
│   ├── clips
│   │   ├── 0530
│   │   ├── 0531
│   │   └── 0601
│   ├── readme.md
│   └── test_tasks_0627.json
└── train_set
    ├── clips
    │   ├── 0313-1
    │   ├── 0313-2
    │   ├── 0531
    │   └── 0601
    ├── label_data_0313.json
    ├── label_data_0531.json
    ├── label_data_0601.json
    ├── readme.md
    └── seg_label
        ├── 0313-1
        ├── 0313-2
        ├── 0530
        ├── 0531
        ├── 0601
        ├── list
        ├── test.json
        └── train_val.json

19 directories, 9 files
```
what are these numbers?
depth of dir?
need pastebin again
wait there is a readme
again pastebin
yeah! finding the right dataset is always a task!
are you seriously downloading the whole dataset?
yeah, I downloaded it with a script too!
it's fast
I know it's not Python, but do you guys have a really good tutorial for data analysis in Excel??
need to search!
S3?
ohh !!
our budget?
what does this mean?
you were giving interviews?
just to watch 2 guys discussing!
you were giving interviews to other companies right?
what about current status?
and currently you are working in some org?
which GPU do they (you) use then?
then what?
ohhh, can I ask what you do (non-profit)?
website?
dox?
I am so dumb!
lm anon?
wtf! this short form! nice!
I typed 'L' lol
hey, can I switch Google accounts if my GPU runtime is over?
on google colab
that will be nice!
yeah ! can do that for a GPU!!
yeah, it's working!
I have student id for AWS
100 dollars credit!
for GPU?
yeah!
thanks for this!
yeah, that's the thing
but anyways colab is also fast
suggest some basic tasks where I can apply a CNN for fun!
this is advanced!
suggest for me!
can you share your website?
bruhh, come on I just need to extract features and show some output
Who is Lisan Al-Gaib, Mahdi?
should I do this for alphabets?
dm?
individual computer characters? what does this mean?
like all characters on keyboard or what?
yeah that would be nice then!
should I go one by one, like first A, then B?
generate images? of what kind?
input -> L
output -> L image?
how can I develop this?
yeah I have read about that!
pytesseract
so you're saying we give the letter name as input and then it outputs that letter's image?
(if-else in my mind)!!
which model have you implemented in that resume chat?
Hi,
I'm encountering an error when trying to train my keras functional model. The error only occurs when using a custom loss function I implemented, as the problem does not arise when using TensorFlow's built-in loss functions. Here's the error:
```
Traceback (most recent call last):
  File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 344, in <module>
    game_model.fit([test_map_input, test_unit_stats_input, test_tech_input, test_tribe_input, test_high_level_mask,
  File "/home/quirin/PycharmProjects/vector_note_server/venv/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/quirin/PycharmProjects/vector_note_server/venv/lib/python3.12/site-packages/keras/src/losses/loss.py", line 113, in reduce_values
    or tuple(values.shape) == ()
       ^^^^^^^^^^^^^^^^^^^
ValueError: Cannot iterate over a shape with unknown rank.
```
I tried debugging it while setting run_functions_eagerly(True), but it led to a different error which I also don't understand. At this point, I'd be happy to return even random or constant values in my loss function without getting an error
does anyone have an idea why I get that error and how I can fix it?
below I'll paste the loss function I tried last.
what do you mean with wrong?
and the input shape for what?
Yeah... sorry - i've been through a lot of experimenting
you have written very advanced pythonic code or what!
At first I thought it was some error with the shapes, but since it works with the built-in functions, it shouldn't be a problem with the shapes in the model itself
!otn a you have written very advanced pythonic code
:ok_hand: Added you-have-written-very-advanced-pythonic-code to the names list.
what was that?
he got rewarded for that?
you are now the proud creator of a dank meme
I am still reading his code
and what is this!
😂 what the hell is happening!
please tell me what was that?
"you have written very advanced, pythonic code" is what every python dev wants to hear when they hand in their side quest.
😂 ohhh!
because I have never written ! Lol
I have not even passed 10 lines
nice code though!
Legend!
python question
He is calling the function, but where is he storing the value of the loss?
yes, that has to do with the basic structure of the model.
I have multiple models combined in a single big one, and for every one of them it calls the loss, and I have these conditionals to figure out what shape and model I need to design the loss for.
Ohh, got it!! It's conditional, all into one "loss" variable!
Because TensorFlow won't allow ordinary conditionals in its functions, as far as I know.
Wait a second, I can send you the error you can expect when trying it:
Traceback (most recent call last):
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 342, in <module>
game_model.fit([test_map_input, test_unit_stats_input, test_tech_input, test_tribe_input, test_high_level_mask,
File "/home/quirin/PycharmProjects/vector_note_server/venv/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 309, in custom_loss
return tf.cond(c, single_output_loss, multi_output_loss)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/quirin/PycharmProjects/Polytopia_AI_Agent/discord_test_model.py", line 267, in single_output_loss
if last_dim in [len(ACTION_TYPES), len(UNIT_ACTIONS_LIST), MAP_SIZE * MAP_SIZE, len(TECH_ACTIONS_LIST), len(TRIBE_ACTIONS_LIST), len(BUILD_ACTIONS_LIST)]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: Using a symbolic tf.Tensor as a Python bool is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.
It is using tf.Tensors, which can hold a boolean value, but can't be used as such
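A minimal sketch of the two usual ways around this error, assuming TF 2.x. The traceback above comes from `last_dim in [...]`, which forces a symbolic tensor into a Python bool. You can either branch in Python on the *static* shape (`y_pred.shape[-1]`, a plain int or `None` at trace time), or keep the condition as a tensor and branch with `tf.cond`, whose branches are zero-argument callables. `NUM_ACTION_TYPES` is a hypothetical stand-in for `len(ACTION_TYPES)`, and the squared-error fallback is only a placeholder, not the original loss.

```python
import tensorflow as tf

NUM_ACTION_TYPES = 4  # hypothetical stand-in for len(ACTION_TYPES)


@tf.function
def loss_with_static_shape(y_true, y_pred):
    # y_pred.shape[-1] is the static last dimension: a plain Python int
    # (or None) known while tracing, so an ordinary `if` is allowed here.
    if y_pred.shape[-1] == NUM_ACTION_TYPES:
        return tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)


@tf.function
def loss_with_tensor_condition(y_true, y_pred):
    # tf.shape(...) is a symbolic tensor while tracing, so the branch
    # has to be a graph op (tf.cond) instead of a Python `if`/`in`.
    is_action = tf.equal(tf.shape(y_pred)[-1], NUM_ACTION_TYPES)
    return tf.cond(
        is_action,
        lambda: tf.keras.losses.categorical_crossentropy(y_true, y_pred),
        lambda: tf.reduce_mean(tf.square(y_true - y_pred), axis=-1),
    )
```

The static-shape version is usually simpler when the dimension you branch on is fixed by the model architecture, as the output sizes here are.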
It might work when enabling run_functions_eagerly, but I wouldn't be too sure of it.
I'll try that, but even then, it wouldn't actually fix my issue, right?
It would just be a better written version of the function that doesn't work anyways.
Already on it
Right, which really irritates me, since it only throws that error when I use my custom function.
When I just call the premade cross-entropy, it works just fine...
yes, but after splitting it off into groups (the function you see was a test after hours of trying to figure out what's wrong)
there should actually be some other calculations.
https://paste.pythondiscord.com/VCCA
this one might be easier to read
Uhh, you're right. When I return a constant 0 in multi_output_loss, it runs through smoothly.
```py
# excerpt from multi_output_loss
losses = [
    tf.keras.losses.categorical_crossentropy(y_true[0], y_pred[0]),  # action
    tf.keras.losses.categorical_crossentropy(y_true[1], y_pred[1]),
    tf.keras.losses.categorical_crossentropy(y_true[2], y_pred[2]),
    tf.keras.losses.categorical_crossentropy(y_true[3], y_pred[3]),
    tf.keras.losses.categorical_crossentropy(y_true[4], y_pred[4]),
    tf.keras.losses.categorical_crossentropy(y_true[5], y_pred[5]),
]
return tf.constant(0.0)  # tf.add_n(losses)
```
```
Tensor("data_9:0", shape=(None, 4), dtype=float32)
Tensor("test_game_model_1/gameModelDense6_1/Relu:0", shape=(None, 4), dtype=float32)
Tensor("data_10:0", shape=(None, 11, 11, 10), dtype=float32)
Tensor("test_game_model_1/cond/Identity:0", shape=(None, 11, 11, 10), dtype=float32)
Tensor("data_11:0", shape=(None, 11, 11), dtype=float32)
Tensor("test_game_model_1/cond/Identity_1:0", shape=(None, 11, 11), dtype=float32)
Tensor("data_12:0", shape=(None, 25), dtype=float32)
Tensor("test_game_model_1/cond_1/Identity:0", shape=(None, 25), dtype=float32)
Tensor("data_13:0", shape=(None, 4), dtype=float32)
Tensor("test_game_model_1/cond_2/Identity:0", shape=(None, 4), dtype=float32)
Tensor("data_14:0", shape=(None, 11, 11, 19), dtype=float32)
Tensor("test_game_model_1/cond_3/Identity:0", shape=(None, 11, 11, 19), dtype=float32)
```
doesn't really look like it...
```
y_true: Tensor("data_9:0", shape=(None, 4), dtype=float32)
y_pred: Tensor("test_game_model_1/gameModelDense6_1/Relu:0", shape=(None, 4), dtype=float32)
y_true: Tensor("data_10:0", shape=(None, 11, 11, 10), dtype=float32)
y_pred: Tensor("test_game_model_1/cond/Identity:0", shape=(None, 11, 11, 10), dtype=float32)
y_true: Tensor("data_11:0", shape=(None, 11, 11), dtype=float32)
y_pred: Tensor("test_game_model_1/cond/Identity_1:0", shape=(None, 11, 11), dtype=float32)
y_true: Tensor("data_12:0", shape=(None, 25), dtype=float32)
y_pred: Tensor("test_game_model_1/cond_1/Identity:0", shape=(None, 25), dtype=float32)
y_true: Tensor("data_13:0", shape=(None, 4), dtype=float32)
y_pred: Tensor("test_game_model_1/cond_2/Identity:0", shape=(None, 4), dtype=float32)
y_true: Tensor("data_14:0", shape=(None, 11, 11, 19), dtype=float32)
y_pred: Tensor("test_game_model_1/cond_3/Identity:0", shape=(None, 11, 11, 19), dtype=float32)
```
That's the output of the y_true and y_pred prints...
Oh, I have an idea now.
I expected to get some kind of list or tuple passed to my loss function when I have multiple outputs.
Instead, I think Keras iterates over them in advance and only gives me the single pairs.
That was my mistake, I think.
It was before.
I have no idea why it didn't throw an error when I tried to index them.
Oh... it used the batch dimension for the index. That's why I got errors with the inputs: whatever I got for my loss output had a different batch size than everything else.
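To illustrate that conclusion with plain NumPy (no Keras involved): once each output's `y_true` arrives on its own, axis 0 is the batch, so `y_true[0]` is the first *sample*, not the first output. The `(32, 4)` shape mirrors the printed `(None, 4)` action head with a hypothetical batch size of 32, and `per_sample_cross_entropy` is an illustrative helper, not Keras's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 32  # hypothetical; Keras shows this dimension as None
# One output head's targets: one-hot rows over 4 action types
y_true = np.eye(4)[rng.integers(0, 4, size=batch_size)]
y_pred = np.full((batch_size, 4), 0.25)  # a uniform guess for every sample

# Axis 0 is the batch: this is a single sample, not "the action output"
print(y_true[0].shape)  # (4,)


def per_sample_cross_entropy(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy per sample; reduces only the class axis."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


losses = per_sample_cross_entropy(y_true, y_pred)
print(losses.shape)  # (32,): one value per sample, batch axis kept
```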
Finally got multi-gpu training running! Though I would have expected a bit more of a speed up (left is 4 Tesla T4s vs. the right being one 2070 Super), so there's probably some more work to do 
Someone teach me neural networks like I'm a 5-year-old.
No. There is no way to simplify them to a five-year-old level that actually means anything.
Why is df.join giving all NaNs? The dfs share no columns.
code:
```py
import geopandas as gp
import pandas as pd
import rasterio.transform
from rasterio.features import rasterize


def rasterize_data(d: gp.GeoDataFrame):
    xmin, ymin, xmax, ymax = d.total_bounds
    resolution = 1000000
    width = int((xmax - xmin) / resolution)
    height = int((ymax - ymin) / resolution)
    transform = rasterio.transform.from_origin(xmin, ymax, resolution, resolution)
    print(d["geometry"].shape)
    # One flattened (height * width) raster per geometry
    raster = d["geometry"].apply(
        lambda x: rasterize(
            [(x, 1)],
            out_shape=(height, width),
            transform=transform,
            fill=0,
            all_touched=True,
            dtype="float32",
        ).flatten()
    )
    # One column per pixel: p0, p1, ...
    raster = pd.DataFrame(raster.tolist()).add_prefix("p")
    print("raster")
    print(raster)
    print("d")
    print(d)
    print("joined")
    print(d.join(raster))
    d = d.join(raster, how="left")
    print(d)
    return d
```
Show dataframes, and result of join. Rest of code doesn't really matter
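One common cause, assuming `d` does not have a default 0..n-1 index (for example after filtering rows; the chat doesn't show it): `join` aligns rows on index labels, and `pd.DataFrame(raster.tolist())` gets a fresh `RangeIndex`. A small reproduction with made-up data:

```python
import pandas as pd

# Stand-in for `d`: an index that is NOT 0..n-1 (e.g. after filtering rows)
d = pd.DataFrame({"name": ["a", "b", "c"]}, index=[10, 11, 12])
# Stand-in for the Series of flattened rasters, which keeps d's index
raster = pd.Series([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], index=d.index)

# pd.DataFrame(list_of_lists) starts a fresh RangeIndex 0, 1, 2 ...
wide_bad = pd.DataFrame(raster.tolist()).add_prefix("p")
joined_bad = d.join(wide_bad)  # labels 10..12 never match 0..2
print(joined_bad)              # p0 and p1 are all NaN

# Passing the original index keeps the rows aligned
wide_ok = pd.DataFrame(raster.tolist(), index=raster.index).add_prefix("p")
joined_ok = d.join(wide_ok)
print(joined_ok)
```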
I was drunkenly speedrunning TensorFlow and an NN while blaring Chief Keef on Spotify and filming it in OBS, and my PC just gave out. If it turns back on and works, should it be fine? It happened when the epochs started getting higher.
it's probably fine, obs and tensorflow can both be cpu/gpu heavy processes so as long as your components weren't hot enough to boil water or something you're good
Thanks. Pretty dumb on my part.
The hardware should be fine. But you will have lost all your training progress.
Pretty dumb on my part, Chief Keef and a couple of beers with the bros kinda made me lose sight. But not enough sight to want to do nothing else while drunk
Your computer might also turn itself off automatically if the hardware gets dangerously hot. I had an old laptop that did that.
If you plan on training a large model on your own hardware, make sure the cooling is good, not just so it does not turn itself off, but because it will also slow itself down if it gets too warm. This also includes the room temperature.
(There is a reason why all this new LLM training is taking down some power grids now)
Bro, how much RAM does TF take up? I feel like it takes up a lot, and I have to use Jupyter for it. Is PyTorch less taxing? I feel like this is taking up a good amount of RAM, and I have 32 GB.
The library will have a negligible impact on the ram usage
Do you have a gpu? Are you talking about cpu ram or GPU ram?
RTX3090, do you mean vram?
Okay, and what are you trying to do, and did you confirm that the computation is taking place on the GPU?
Yes. It chills out like crazy when I use Jupyter
Whether you're doing it in Jupyter or a regular python program has zero effect on computation speed.
It’s not frying my cpu, it is actually not too bad. I just went wild on it, I was drunk, yes, but I would usually never do that. Does it really not?
Why would the same code run faster or slower in a notebook? It's the same code.
I configured it through that environment
You configured what?
Tensor and libraries I use for it
Okay, well Jupyter is just another way of writing/editing code and telling python what code to run. Once python is running the code, it doesn't matter where it came from.
No matter what, the kernel still runs. That's pretty dumb of me not to consider. I should probably remove stuff from VS Code, in all honesty.
Remove stuff from vs code, to achieve what?
It could not hurt, I mean, I have 600 scripts of csv files, C/C++, sql files in one place
If you have 600 csv files, those are just csv files, not scripts of csv files
Anyway, is there a problem you're currently facing?
I just meant a lot of stuff in one place, need to throw out. And no problem. It happened and it’s acting fine now. Just kind of a jump scare.
Cool
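On the earlier question of confirming that the computation actually runs on the GPU: a quick check, assuming TensorFlow 2.x, is to list the GPUs TensorFlow can see and look at which device a freshly computed tensor lives on. An empty list means TF is running on the CPU.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

x = tf.random.uniform((1024, 1024))
y = tf.matmul(x, x)
# Ends in "device:GPU:0" when the op ran on the first GPU, "device:CPU:0" otherwise
print(y.device)
```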
Is an array like a tuple or list, where a single object consists of multiple values?
tuples are immutable but both can contain many or few values. Array is another word for list
thx
Does anyone have a working program that uses the RagDatasetGenerator.generate_questions_from_nodes() functionality, preferably with a limit on question generation?
Got about this far before hitting a `ReadTimeout: Error running coroutine`:
```py
from llama_index.core.evaluation import DatasetGenerator, RelevancyEvaluator
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Response, Settings
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.ollama import Ollama
import nest_asyncio

nest_asyncio.apply()

reader = SimpleDirectoryReader("data/")
documents = reader.load_data()

llm = Ollama(model="llama3")
Settings.llm = llm

data_gen = RagDatasetGenerator.from_documents(documents)
eval_questions = data_gen.generate_questions_from_nodes()
print(eval_questions)
```
I'm thinking about making a CNN for speech recognition. What would the output even look like, and how would I train it?
damn alright
Does anybody here know about YOLO custom cfgs?
Traceback (most recent call last):
File "train.py", line 616, in <module>
train(hyp, opt, device, tbwriter)
File "train.py", line 88, in train
model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create
File "C:\Users\kuzey\yolov7\models\yolo.py", line 528, in __init__
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
File "C:\Users\kuzey\yolov7\models\yolo.py", line 738, in parse_model
anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
KeyError: 'depth_multiple'
I keep getting this error and have no idea how to fix it.
Even though I do have depth_multiple in the cfg:
```yaml
# YOLOv7 custom configuration file

# Number of classes
nc: 3  # Number of classes (Kalsifikasyon = calcification, Normal, Kitle = mass)

# Anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv7 backbone and head configuration
backbone:
  name: darknet53  # Darknet53 or any other supported backbone
  depth_multiple: 0.33  # Depth multiple for backbone
  width_multiple: 0.50  # Width multiple for backbone
head:
  num_classes: ${nc}
  anchors: ${anchors}

# Training parameters
train:
  img_size: 640
  batch_size: 16
  epochs: 300
  data: C:\Users\kuzey\OneDrive\Masaüstü\data.yaml
  cfg: models/yolov7-custom.cfg
  weights: yolov7.pt
  device: ''
  multi_scale: False
  freeze: [0]

# Testing parameters
test:
  img_size: 640
  batch_size: 16
  data: C:\Users\kuzey\OneDrive\Masaüstü\data.yaml
  cfg: models/yolov7-custom.cfg
  weights: yolov7.pt
```
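The traceback shows `parse_model` reading `d['anchors']`, `d['nc']`, `d['depth_multiple']` and `d['width_multiple']` straight from the top level of the loaded cfg, while this cfg appears to nest `depth_multiple`/`width_multiple` under `backbone:`, which would explain the `KeyError`. A small stdlib check of that assumption, using dicts that mirror the YAML (the `check_top_level_keys` helper is hypothetical, not part of YOLOv7). Note also that the stock YOLOv7 model YAMLs describe `backbone` and `head` as lists of layers rather than a `name:` entry.

```python
# Keys parse_model looks up at the top level, per the traceback above
REQUIRED_TOP_LEVEL = ("anchors", "nc", "depth_multiple", "width_multiple")


def check_top_level_keys(cfg: dict) -> list:
    """Return the keys parse_model would look up at the top level but not find."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in cfg]


# Shape of the cfg above: the multiples are nested under "backbone"
nested = {
    "nc": 3,
    "anchors": [[10, 13, 16, 30, 33, 23]],
    "backbone": {"name": "darknet53", "depth_multiple": 0.33, "width_multiple": 0.50},
}
# Same values moved to the top level, where the lookup expects them
flat = {
    "nc": 3,
    "anchors": [[10, 13, 16, 30, 33, 23]],
    "depth_multiple": 0.33,
    "width_multiple": 0.50,
}

print(check_top_level_keys(nested))  # ['depth_multiple', 'width_multiple']
print(check_top_level_keys(flat))    # []
```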
Just ask your question, or use a help thread #❓|how-to-get-help
Ohh okey thank u
I just want to ask: are there any free courses anywhere for AI/ML with Python?
"CS50 for AI" is one
Can you please send me the link? And are there any prerequisites for this course?
I would say maths.
😧
Is there a way we can download models from roboflow?
I've only been telling you for a year now
how so? isn't it just SQL with support for a lot of inputs/outputs?
<begins typing furiously>
Columnar (like ClickHouse/Snowflake), support for Python UDFs, various optimizations for windowing, etc. The JSON and Parquet support is more than just different inputs, and the no-config, in-memory footprint (like SQLite) makes it very ergonomic. Then there are the language extensions (around windowing, dynamic column expressions https://duckdb.org/2023/08/23/even-friendlier-sql.html, ASOF joins, positional joins, etc.). I think they've taken every good idea anyone has come up with in OLAP land, SQL land and DataFrame land and unified it.
Arrays and lists are different
Bruh
Those were fast reactions
stelersus potentially self botting?
Ok
Hello, I'm working on fine-grained classification of car models. Since it's fine-grained classification, the dataset is inherently unbalanced (long-tailed). I can only get so far by tweaking the loss or the sampling method. Would generating images with diffusion models for undersampled classes be a good idea? Is there another method I could try?
@proper crag lists are not arrays--don't get them mixed up
Sure, technically, but in Python and to a beginner, no.
This is wrong. Lists are completely mutable, can be heterogeneous, and are strictly one-dimensional. Arrays have immutable shapes, are homogeneous, and can be multi-dimensional.
No. Everyone who does data science needs to know the difference. Especially beginners.
Arrays are not immutable
The shape of an array is immutable.
Just for the record this was a beginner question posted in a wrong channel. The semantics aren’t that serious
Couldnt u store a list in a list?
yes, but the outer list doesn't "know" that it contains another list, and you can't treat the sub-lists as part of the outer list. Whereas multi-dimensional arrays are one thing and can be indexed as one thing.
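To make the list-vs-array distinction concrete with NumPy (the values are arbitrary): a nested list needs one indexing step per level and a loop to pull out a column, while a 2D array is a single object indexed in one step. "Growing" the array with `np.append` returns a new array rather than changing the original's shape.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]  # a list holding references to two lists
arr = np.array(nested)           # one 2D object with shape (2, 3)

# Nested list: one indexing step per level, and a "column" needs a loop
print(nested[1][2])                # 6
print([row[0] for row in nested])  # [1, 4]

# 2D array: indexed as a single thing, columns come straight out
print(arr[1, 2])             # 6
print(arr[:, 0])             # [1 4]
print(arr.shape, arr.dtype)  # one shape, one dtype for every element

# The shape is fixed: np.append builds a NEW array instead of growing arr
bigger = np.append(arr, [[7, 8, 9]], axis=0)
print(arr.shape, bigger.shape)  # (2, 3) (3, 3)

# The list, by contrast, grows in place and accepts anything
nested.append(["seven"])
print(len(nested))  # 3
```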
Ok
I'm not mixing things up; I'm asking whether they're like lists, i.e. an object that consists of multiple values.
Like, I know tuples and lists are different, but both are where data is stored.
But what about an array on the heap? Like a pointer to dynamically allocated memory. Isn't the shape of that mutable because you can allocate more when necessary?
Are you talking about arrays that are python objects?
I guess I'm talking about C or similar? Are you talking about the Python array type?
I'm talking about NumPy, since that's the most widely used "array" in Python.
I love this question because 2 people will always be talking about wildly different things (when array vs list)
tbh, society went downhill the first time someone used "array" in computer science.
that person cursed us all to eternal semantic overload.
Exactly
interestingly, not in the way you might think
if you make a new array, it'll assign a whole new chunk in the heap so that the elements are contiguous
Very true
Hence why it's a strange question because it depends on the level of abstraction
It's quite common in interviews
Like, what counts as immutable? Not being able to mutate the thing on the stack? On the heap? Both?
a list contains a PyObject** - a pointer to a (dynamically allocated) array of pointers to pyobjects
that's the header of all variable-sized python objects
it's a few fields like the pointer to the type, a refcount and (for variable-sized ones) the size
which, hmm, something is funky here actually
What do you mean?
(interestingly I think lists are maybe misusing PyObject_VAR_HEAD a bit. since... lists aren't variable-sized in that sense, their variable-sized part is stored outside of the struct itself, unlike with strings. Either that or I'm just assuming too much)
That's because it's a macro
`Include/object.h` line 101
```h
#define PyObject_VAR_HEAD PyVarObject ob_base;
```
`Include/object.h` lines 157 to 160
```h
typedef struct {
PyObject ob_base;
Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;```
Sure - it's just a perfectly normal dynamical array, with an element type of PyObject* (pointer to a pyobject)
when the list runs out of space, it reallocates the dynamical array to a larger size, moving all of its elements (which are just pointers - the objects they point to aren't moved)
Not totally sure what you mean - a pyobject may have references of its own, yeah. Or it may not - a string, say, doesn't reference anything.
That's why you can't have an array of PyObject, yes - only of PyObject*, which is what lists do.
Yeah, I think you're missing this one *. A PyObject is really a pyobject, no references involved. That's why a list must have an array of PyObject*s, not PyObjects.
it's kind of like an ob_item: Vec<Rc<PyObject>> :p
(not really, because refcounting in python is done inside the pyobjects in question, but similar semantics)
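A quick way to see that resizing behaviour from Python, assuming CPython, where `sys.getsizeof` of a list counts the header plus the allocated pointer slots, not the objects they point to. The exact sizes vary by version, so none are hard-coded here.

```python
import sys

lst = []
sizes = []
for _ in range(64):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))  # header + allocated pointer slots

# The reported size only changes when the pointer array is reallocated,
# which happens far less often than once per append (over-allocation).
reallocs = sum(1 for before, after in zip(sizes, sizes[1:]) if after != before)
print(reallocs, "size changes over", len(sizes), "appends")

# The slots hold pointers: both entries reference the very same object,
# and getsizeof(lst) does not include the objects being pointed to.
x = object()
pair = [x, x]
print(pair[0] is pair[1])  # True
```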
- Arrays of what?
- We do,
array.array.
yeah, it's just a 1d array of some predefined types
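For comparison, a minimal sketch with the standard library's `array.array`: elements are stored inline as raw C values of a single typecode (so it is homogeneous), yet it can still grow. The values are arbitrary.

```python
from array import array

a = array("d", [1.0, 2.0, 3.0])  # typecode "d": C doubles stored inline
a.append(4.0)                    # it can still grow, like a list
print(a.itemsize, len(a))        # 8 4

try:
    a.append("five")             # every element must fit the one typecode
except TypeError as exc:
    print("rejected:", exc)

mixed = [1.0, "five", None]      # a list holds references, so any mix is fine
print(len(mixed))                # 3
```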
If you find this stuff interesting you could read "fluent python"
This convo is part of the topics it covers, albeit in less detail
imagine using array.array
fluent python is the greatest tome of all python
DSA got rebranded to data structures and algorithms for a good hour instead of data science and ai 😂
But speaking of data science. Would you find it interesting if I'd do kaggle competitions or similar and do write ups of them in my blog?
At a beginner / intermediate level
I only ended up doing non data stuff there so far
I need a package for sentiment analysis.
Any recs?
I'm using flair but having problems with it.
I want to write a book about data science. Which would be the best tool to help me ? Claude ? ChatGPT ? Gemini ? Also pitch in with your versions too please.
Nobody wants to read a book written by a LLM.
You'd be surprised 🙂
brainrot
Yes, probably, technically you can allocate the PyObjects anywhere. PyListObject is a dynamic array of PyObject references. This is what lets it have different types of objects in the same list, because PyObject is a base class.
Yeah.
The struct itself is a dynamic array, of references.
Each element in the array is a reference to a PyObject elsewhere.
So it's an array of memory addresses.
It's a contiguous chunk of memory addresses which are all the same size (pointer size).
Oh, no I meant the ob_item.
Which is the main thing about this type, everything else is data to manage that.
does anyone have any experience with pytorch combined with numerical integration / gradients ?
particularly quad torch?
An array is a fixed length, contiguous, ordered, fixed element size chunk of memory.
It becomes a multi-dimensional array depending on how you access it.
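A minimal illustration of that idea, with a Python list standing in for one contiguous chunk: a 3 by 4 "matrix" is just 12 slots, and the 2D view comes entirely from the row-major index arithmetic `i * cols + j`. The names are illustrative.

```python
rows, cols = 3, 4
flat = list(range(rows * cols))  # stand-in for one contiguous block: 0..11


def at(i: int, j: int) -> int:
    """Row-major lookup: row i starts at offset i * cols."""
    return flat[i * cols + j]


print(at(1, 2))                         # 6
print([at(i, 0) for i in range(rows)])  # first "column": [0, 4, 8]
```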
Well no, the type distinction is important, we usually define types by their interfaces.
Like how the PyListObject is no longer just an array because it's first a dynamic array (due to how we handle the array / reallocate it), and a list due to what it stores.
It's built on top of an array, but we don't really call it an array anymore.
Almost everything is built on top of an array.
A binary tree for example can be built on top of an array, but we don't call it an array due to how we store and read from it.
(The access pattern / interface)
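The classic example is the implicit binary tree behind a binary heap: the tree lives in a flat list, and the parent/child links are pure index arithmetic. A small sketch; the helper names are illustrative, and `heapq.heapify` from the standard library is what produces the layout.

```python
import heapq


def children(i: int) -> tuple:
    """Indices of node i's left and right children in the flat list."""
    return 2 * i + 1, 2 * i + 2


def parent(i: int) -> int:
    """Index of node i's parent (for i > 0)."""
    return (i - 1) // 2


def is_min_heap(values: list) -> bool:
    """Check the heap property: no node is smaller than its parent."""
    return all(values[parent(i)] <= values[i] for i in range(1, len(values)))


data = [5, 3, 8, 1, 9, 2]
heapq.heapify(data)  # rearranges the flat list into the implicit tree layout
print(data, is_min_heap(data))
left, right = children(0)
print(data[0], data[left], data[right])  # root and its two children
```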
Otherwise, everything is just an array.
So calling arrays and lists two different things is important, because the first does not get across much about how a list functions. Same thing for the tree.
Also, the list in Python is supposed to match the idea from math a bit more.
Yeah, math has lists.
It has arrays, lists, tuples, sets, etc. Ofc, you can build everything on sets, which they do for other reasons.
Maps from subset of natural numbers to some set.
(list)
Yeah, and since sets are supposed to be able to hold all kinds of things, Python lists can too.
For practical purposes Python lists are also dynamic arrays, which is where all the real practical use comes from.
Dynamic arrays are usually the first thing one tries to get before doing anything else these days.
And then everything on top of that.
Dynamic array is just an array that can resize.
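A minimal sketch of that in Python, with a pre-sized list standing in for a raw fixed-size block of memory: when the block is full, allocate one twice as large and copy the slots over. Doubling is one common growth policy (CPython's list grows by a smaller factor), so treat the constants as illustrative.

```python
class DynamicArray:
    """Sketch: a fixed-size backing block, replaced by one twice as big when full."""

    def __init__(self) -> None:
        self._capacity = 1
        self._size = 0
        self._block = [None] * self._capacity  # stand-in for raw fixed-size memory

    def append(self, value) -> None:
        if self._size == self._capacity:
            self._grow(2 * self._capacity)
        self._block[self._size] = value
        self._size += 1

    def _grow(self, new_capacity: int) -> None:
        # "Allocate" a larger block and copy the existing elements across
        new_block = [None] * new_capacity
        for i in range(self._size):
            new_block[i] = self._block[i]
        self._block = new_block
        self._capacity = new_capacity

    def __getitem__(self, i: int):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._block[i]

    def __len__(self) -> int:
        return self._size

    @property
    def capacity(self) -> int:
        return self._capacity


arr = DynamicArray()
for n in range(10):
    arr.append(n * n)
print(len(arr), arr.capacity, arr[9])  # 10 16 81
```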
Yes, often, but it can be allocated anywhere.
No, you can do either in either.
Make an array on the the stack that is very large, put your dynamic array in there.
That's exactly the same as what the heap does.
It's also one big array.
I think this is a dangerous leap, as array implies different memory layout than list.
OS managed yes, although people often have other heaps managed by themselves.
It's built on top of a dynamic array. See the implementation in C.
But it's not a dynamic array.
It's a list.
That's the idea*
It still has the interface of a dynamic array, though, including the ability to resize.
There is the OS's heap, aka just called "the heap." There is the stack which is allocated and handed over to the program, although it can also be resized btw. And a user can also not use either and instead allocate memory pages directly from the OS.
If there is no OS you own all the memory (one big array, can point to anywhere in it, even null).
Yeah, OS needs to give you (virtual) memory.
The compiler decides for the stack.
If it wants some.
At startup.
The heap is an abstraction built on top of virtual memory.
When you call malloc, it calls the OS's heap alloc, which calls the virtual memory alloc.
No, it operates on pages.
When you malloc, say, a 10-element int array, the memory ultimately comes from the OS in whole pages, since the OS can only hand out memory in pages.
So like 4kB.
It is scattered, but in pages, and pages are fixed size chunks. In addition, they may not even be main memory, it can swap them in and out using the disk.
It also gives you a virtual address space for your process and your pointers are virtual pointers that need to be remapped to physical addresses.
No, the heap is built on virtual memory.
Virtual memory also makes use of the memory management unit, a hardware implementation of this address remapping and paging.
Physically, yeah.
If there is an OS (that makes use of virtual memory) you are not directly dealing with the physical layout.
The clever indexing you mentioned is one of the cool things about virtual memory, it handles it for you, and so from the programmer's POV it's contiguous.
You can for example make your own heap. Request some pages / space from the OS, then build a heap data structure on that.
The heap is a convenience for allocating things of different sizes without having to think about where to put them, but it comes with downsides, it's often not a good choice for performance and makes memory management more error prone in manual memory management languages.
For their use cases it serves no purpose.
The best way to not shoot yourself in the foot and get max performance in a language like C is to use region based memory management, which is like the thing mentioned before where you can just get a big array/chunk and then put your actual data structure in there / build on top of it. Each region for a different part of the code / purpose. https://en.wikipedia.org/wiki/Region-based_memory_management
Unfortunately due to malloc being built in, many C programs have just been spamming it all over their program, leading to many memory leaks and performance issues.
Region-based memory management used to be the standard way apps were developed (especially arcade games), but once garbage collection became the norm and most people got used to it, it became a lost art.
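A toy sketch of the region/arena idea in Python, assuming a `bytearray` as the one big block: allocation just bumps an offset (rounded up for alignment), and `reset()` releases everything at once. Real arenas in C work on raw memory; this only shows the bookkeeping.

```python
class Arena:
    """Toy region/arena: one big block, bump-pointer allocation, bulk free."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)  # the single up-front "region"
        self._offset = 0             # next free byte

    def alloc(self, nbytes: int, align: int = 8) -> memoryview:
        # Round the start up to the requested alignment, then bump the offset
        start = (self._offset + align - 1) // align * align
        end = start + nbytes
        if end > len(self._buf):
            raise MemoryError("arena exhausted")
        self._offset = end
        return memoryview(self._buf)[start:end]

    def reset(self) -> None:
        # Frees everything at once; views handed out earlier now alias
        # memory that will be reused, so they must not outlive the region.
        self._offset = 0

    @property
    def used(self) -> int:
        return self._offset


arena = Arena(64)
header = arena.alloc(10)
payload = arena.alloc(4)  # starts at offset 16 because of 8-byte alignment
payload[:] = b"abcd"
print(arena.used)         # 20
arena.reset()
print(arena.used)         # 0
```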
In general most don't spend enough time in C to learn these things.
(And why would you? Most things can just be done with Python or whatever anyhow in way less time...)
You still see it in game engines, because they can't ship a 10 fps game.
Or other high perf.
Or even ML, where everyone is kind of doing it via one big region loaded onto the GPU (you can heap-allocate on the GPU).