#data-science-and-ml


past meteor
#

If for some reason you can't use a neural network, you can treat dimensionality reduction as what ML practitioners call a hyperparameter, and search over it
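Treating the number of PCA components as a hyperparameter could look roughly like the sketch below. It is numpy-only and uses a nearest-centroid classifier as a hypothetical stand-in for whatever downstream model you would actually use. PCA is fit on the training split only, and each candidate dimension is scored on a held-out split. In scikit-learn the same idea is usually written as a `Pipeline` with `PCA` and a grid search over `n_components`.

```python
import numpy as np

def pca_fit(X, k):
    """Return the column means and the top-k principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, axes):
    return (X - mean) @ axes.T

def nearest_centroid_accuracy(Z_train, y_train, Z_val, y_val):
    # Hypothetical stand-in for the real downstream model
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_val[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_val).mean())

def sweep_n_components(X_train, y_train, X_val, y_val, candidates):
    scores = {}
    for k in candidates:
        mean, axes = pca_fit(X_train, k)  # fit PCA on the training data only
        scores[k] = nearest_centroid_accuracy(
            pca_transform(X_train, mean, axes), y_train,
            pca_transform(X_val, mean, axes), y_val)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic example data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
best_k, scores = sweep_n_components(X[:150], y[:150], X[150:], y[150:], [1, 2, 5, 10])
print(best_k, scores)
```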

desert oar
#

great point. that's the "people do it anyway" part 😛

past meteor
desert oar
#

i actually have a project coming up where i might be able to play around with this. see if similarity in a "dumb" space like PCA is semantically useful

past meteor
#

In my experience it's likely not

desert oar
#

this would be more like "tabular" data, not images or text

wooden sail
#

the problem is that, with a good enough cost function, a large enough network, and if you don't reduce the rank of the covariance too much, this WILL work anyway

#

because you can't force your classifier to not learn another representation on top of the PCA you do yourself

#

this becomes very interesting though if you examine how many layers and params are needed to get a good result

past meteor
#

But yeah, I think the issue here is that it's unfair to talk about PCA in cases where you do have labels

desert oar
#

the project is actually an unsupervised learning project

past meteor
#

Because then yes, it's self-evident there are possibly better things out there

desert oar
#

any labels would be me or my colleagues manually combing through examples and labeling them "yes", "no", or "meh"

wooden sail
#

you might need more layers if PCA is all you do at first, but with enough data and time, any network will be fine as long as you didn't PCA too hard

past meteor
#

It's a lot more interesting to talk about the unsupervised case indeed, where it's totally exploratory and you have no labels

#

Like in the original question actually I think

desert oar
#

the original question was totally lacking in context as far as i saw

wooden sail
#

reminds me of a task we had a student try once. some sort of sparse recovery. the input data was considered in the original domain, but also after doing PCA and wavelet decomp. they were using a fairly large network and essentially unlimited synthetic training data, and there was no difference among the results 😛 as is to be expected

#

it becomes interesting when thinking about shrinking networks or trying to get them real-time capable for large inputs

past meteor
#

Well tbh my summary is that it's been a while since I heard anything about dimensionality reduction for classifiers, and for good reason

wooden sail
#

i do hear a lot about it, but mostly through judicious choice of cost functions to promote special behavior in the latent space

past meteor
#

At least in the case of PCA we did a million toy examples in uni where it destroys your data.

wooden sail
#

yeah, just pca straight up, no

past meteor
#

If it's about auto encoders and beta-VAEs etc. yes there's a lot of material there

#

But typically they're truly unsupervised, which is, again, what makes them interesting. Typically I don't see people taking the latent vectors and using them for classification downstream

wooden sail
#

not explicitly, at any rate

spice mountain
#

Do you guys know if the GPU acceleration is model agnostic in Lightning?

#

Like, is there some universal parameter to the pl.LightningModule (pl = PyTorch Lightning) class that I can set so I utilize all my GPU cores?

#

It seems this exists:

buoyant vine
#

it will default to auto though, meaning it will use your GPU automatically if it can

#

Also I would probably avoid using multiple GPUs to begin with until you are a bit more familiar with Lightning's behaviour; the multi-GPU, multi-device stuff is a bit more annoying than they let on 😅 with a couple of unfortunate bugs scattered in the mix

spice mountain
#

I have a report for Thursday, still haven't trained pithink

buoyant vine
#

how big is your model thonk

spice mountain
#

But it is on an HPC, so I have a sht ton of cores available

spice mountain
buoyant vine
#

Lightning will normally say when you start running

#

also, just a personal preference, but I'd also set up something like MLflow or Neptune to monitor the training process

#

helps to also give a bit of an indication of how long it is going to be before learning tapers off

thorn flame
#

Hey guys! Has anyone used ChatterBot recently?

toxic mortar
#

What do these parameters p, T and m stand for?

#

I understand why we want to penalize greater errors by squaring the difference. However, I don't get why there is a 1/2 in front of it. Is that some math convention?

#

Another question. To find the minimal error we use this gradient descent method. When taking the partial derivative dE/dw_j, why are we doing it on w^T x, but not on Sum(w_j x_j)?

#

They are not the same dimensions; how can we compare apples to oranges?

mild dirge
#

Whenever you see 0.5 in a cost function, it is probably because there is a square there, and the derivative will cancel those two out
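Here it is written out, assuming the image shows the usual squared-error cost with p samples and m features per sample. The 2 from differentiating the square cancels the 1/2. Also, w^T x^(k) and the sum over j of w_j x_j^(k) are the same scalar, so it makes no difference which form you differentiate.

```latex
E(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{p}\left(y^{(k)} - \mathbf{w}^T\mathbf{x}^{(k)}\right)^2,
\qquad
\mathbf{w}^T\mathbf{x}^{(k)} = \sum_{j=1}^{m} w_j x_j^{(k)}

\frac{\partial E}{\partial w_j}
= \frac{1}{2}\sum_{k=1}^{p} 2\left(y^{(k)} - \mathbf{w}^T\mathbf{x}^{(k)}\right)\left(-x_j^{(k)}\right)
= -\sum_{k=1}^{p}\left(y^{(k)} - \mathbf{w}^T\mathbf{x}^{(k)}\right)x_j^{(k)}
```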

echo mesa
#

Guys, does anyone know any books or resources about vectorisation and its implementation in programming? So far I've only seen people using numpy, but I never got to see how it works or how to implement it from scratch

mild dirge
#

Vectorization is just the ability to apply the same/similar computation on multiple elements. The way this is implemented is often still sequentially, but in a faster language like C. @echo mesa

#

But sometimes parallel computing, and even a GPU can be used for performing the actual computation.
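Here is a small illustration of that idea: the same sum of squares, written once as a Python loop and once as a single NumPy call. Both give the same value. The NumPy version hands the whole array to compiled code in one call, and that is where the speedup comes from. The function names are made up for the example.

```python
import numpy as np

def sum_of_squares_loop(values):
    # One interpreted Python operation per element: slow for large inputs
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(values):
    # The whole array goes to NumPy's compiled code in a single call
    arr = np.asarray(values, dtype=float)
    return float(np.dot(arr, arr))

data = np.random.default_rng(0).normal(size=100_000)
print(sum_of_squares_loop(data), sum_of_squares_vectorized(data))
```

Timing both with `timeit` on a large array shows the gap.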

mild dirge
echo mesa
mild dirge
#

numpy is open Source, so you could take a look at that

#

Though it would probably be simplest if you already have some experience with numpy, and know how to use it

echo mesa
mild dirge
#

"in a way which implements multiple chunks at the same time which is where the question of how is" Do you mean that multiple chunks are processed at the same time?

#

Because that would be parallel processing, you can use OpenMP in C(++) for that

echo mesa
echo mesa
mild dirge
#

Yeah that's the parallel part. vectorization (or array programming) lends itself well to parallel computing.

#

Because you write the instructions such that "multiple elements can be processed at once"

#

Which can then be implemented with parallel computing

#

Or sequentially in a lower-level language

echo mesa
mild dirge
#

Yeah, so SIMD is especially connected to vectorization, because you apply the same operation to multiple elements. And the GPU lends itself to these types of operations.

#

But numpy does not really use the GPU. but a lot of the syntax is used to instruct the computer to apply the same operation to many elements.

mild dirge
#

That is the one I use sometimes for my code

#

If I were you, I would look into OpenMP as well. It is very simple to set up and use (support already comes with most C/C++ compilers, e.g. GCC with -fopenmp)

echo mesa
#

Gotcha thanks

dusty cloud
#

hello, does anyone know any breadboard or circuit simulators? Something that has sensors in it, like temperature sensors, water sensors, etc.? I want to simulate something with a Raspberry Pi

tidal bough
# toxic mortar What these parameters p,T and m stand for?

The T here isn't a variable, it means transposition.
The 1/2, as wccamel mentioned, is just so that the derivative looks nicer. (It doesn't matter what positive constant we put in front, since minimizing E(w) and c·E(w) for any c > 0 leads to the same w.)
p, it seems, is the size of y, so the number of samples. m, meanwhile, is the size of each x^(k), which would make it the number of input features in each sample.
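Here is a quick numerical check of both points. It assumes the cost in the image is E(w) = 1/2 * sum_k (y_k - w^T x_k)^2, and stores X with one sample per row (p rows, m columns). First, w^T x gives the same number as sum_j w_j x_j. Second, the analytic gradient, with the 1/2 already cancelled, matches a finite-difference estimate.

```python
import numpy as np

def cost(w, X, y):
    # E(w) = 1/2 * sum_k (y_k - w^T x_k)^2, X has shape (p, m)
    r = y - X @ w
    return 0.5 * float(r @ r)

def grad(w, X, y):
    # dE/dw_j = -sum_k (y_k - w^T x_k) * x_kj: the 1/2 cancels the 2 from the square
    return -(X.T @ (y - X @ w))

rng = np.random.default_rng(0)
p, m = 20, 3
X = rng.normal(size=(p, m))
y = rng.normal(size=p)
w = rng.normal(size=m)

# w^T x for one sample is just the sum over j of w_j * x_j
print(np.isclose(w @ X[0], sum(w[j] * X[0, j] for j in range(m))))  # True

# Central finite differences agree with the analytic gradient
eps = 1e-6
numeric = np.array([(cost(w + eps * e, X, y) - cost(w - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(m)])
print(np.allclose(numeric, grad(w, X, y), atol=1e-5))  # True
```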

tidal bough
spice mountain
#

But can anyone help me understand this;

I wish to train multiple models with this same script. The file will be in the same folder, but basically on the HPC I can set a script to run and then start the same script on another set of GPU cores; but they will share the same storage folders, i.e. they can overwrite each other's files.

I am not so strong with Lightning, but it appears that if I do it this way, they will all write their checkpoint to last.ckpt? Is this correct? Can any of you suggest a quick fix for this, so I can give each run a number or something and it ends up in the ckpt name?

https://pastecode.io/s/1aquwr9n

#

Like, my instincts tell me I can just change this for every run, but is it that simple?

```py
# run all checkpoint hooks
if trainer.global_rank == 0:
    print("Summoning checkpoint.")
    ckpt_path = os.path.join(ckptdir, "last.ckpt")
    trainer.save_checkpoint(ckpt_path)
```
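It can be about that simple: the path just has to differ per run. Below is a minimal sketch of one way to do that. It assumes the jobs run under SLURM, which sets `SLURM_JOB_ID`, or that you pass your own run id. The helper name `per_run_ckpt_path` is made up for this example. With Lightning's `ModelCheckpoint` callback, the equivalent would be giving each run a different `dirpath`.

```python
import os

def per_run_ckpt_path(ckptdir, run_id=None, filename="last.ckpt"):
    """Return a checkpoint path unique to one run (hypothetical helper).

    run_id falls back to SLURM_JOB_ID (an assumption about how the HPC jobs
    are launched) and finally to the process id.
    """
    if run_id is None:
        run_id = os.environ.get("SLURM_JOB_ID", str(os.getpid()))
    run_dir = os.path.join(ckptdir, f"run_{run_id}")
    os.makedirs(run_dir, exist_ok=True)  # safe if the folder already exists
    return os.path.join(run_dir, filename)
```

In the hook above, `ckpt_path = per_run_ckpt_path(ckptdir, run_id=...)` would replace the hard-coded `last.ckpt` path.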
spice mountain
tender niche
#

Hello guys!

#

Posting from main thread to this chat:

I need help correcting a small piece of logic in a PySpark query, which I haven't been able to solve for 2 days :( ... would really appreciate any help.
To give some context:
I am trying to write a query that identifies pairs of airlines operating on the same date by reading a flights.txt file. In other words, I need to find the pairs of airlines that share the same origin and the same dates, and determine the count for each pair. The final result must be sorted with the airline pairs in alphabetical order (for both airline names in the pair) and the counts in descending order.

My query returns wrong counts.

This is my code with a simple expected example at the end as well
https://pastebin.com/TLEfWjnN

In the example input it should return 3 2 2
but my prog returns 4 4 4

wild sable
#

Hey can someone rate my writing

river cape
#

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
Could anyone tell me why we use fit and transform?

quiet seal
#

Is there a library that's basically desmos but in the discrete domain? [is signal processing data science?]

#

I spent a whole day trying to model a method for alias-suppressed waveforms on desmos and it didn't work, then I realized I'm trying to model digital integration and differencing in a discrete space with an evenly-spaced sample rate and desmos is in the continuous domain 😐

#

I need math libraries that work in the discrete domain, whose output I can feed to something like plotly or matplotlib
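For the discrete-domain basics, NumPy alone is enough. Treat the signal as an array indexed by integer sample number n at a fixed rate fs. Integration is then a running sum, and differentiation is a first difference. The sample rate and test tone below are assumed values. The arrays can go straight into matplotlib, where `plt.stem` or `plt.step` show the sampled nature better than `plt.plot`, or into plotly.

```python
import numpy as np

fs = 48_000          # sample rate in Hz (assumed)
f0 = 440.0           # test tone in Hz (assumed)
n = np.arange(256)   # integer sample indices: the discrete "x axis"
t = n / fs

x = np.sin(2 * np.pi * f0 * t)

# Discrete integration: running sum scaled by the sample period
integrated = np.cumsum(x) / fs

# Discrete differencing: first difference scaled by the sample rate
differenced = np.diff(x) * fs   # one sample shorter than x

# Differencing undoes the running sum (up to floating-point error)
recovered = np.diff(integrated) * fs
print(np.allclose(recovered, x[1:]))  # True
```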

umbral charm
#
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import odeint
from matplotlib.animation import FuncAnimation, PillowWriter
#  Parameters and model from earlier
# ...
def model(u, t, sigma, rho, beta):
   x, y, z = u
   dxdt = sigma * (y - x)
   dydt = x * (rho - z) - y
   dzdt = x * y - beta * z
   return [dxdt, dydt, dzdt]

sigma = 10
beta = 8/3
rho = 28
# Solve first half
v0 = [1,0,0]
t = np.linspace(0, 25, 5000)
v = odeint(model, v0, t, args=(sigma, rho, beta))

# Marker points to be plotted for these rows
vskip = v[::20]

# Figure object for the animation
fig = plt.figure(figsize=(6, 4), dpi=150)


def animate(i):
   """ This function runs for each frame """

   # Clear the plot
   plt.cla()

   # Plot the solution up to this point
   plt.plot(v[:(i * 20), 0], v[:(i * 20), 2], lw=0.3, color="steelblue")

   # Add a big marker
   plt.plot(vskip[i, 0], vskip[i, 2], "o", color="steelblue", markersize=8)

   #  Make pretty
   plt.ylim([0, 50])
   plt.xlim([-30, 30])
   plt.title(f"t = {round(t[i * 20], 1)}")
   plt.xlabel("x(t)")
   plt.ylabel("z(t)")

# Make the animation
anim = FuncAnimation(fig, func=animate, frames=len(vskip))
anim.save("animation.gif", writer=PillowWriter(fps=20))
#

is there any way I could make this animate faster?

#

takes a good minute
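One likely cost in the script above is that `plt.cla()` wipes the axes on every frame. Each frame then rebuilds the line, marker, labels and limits from scratch. A common alternative is to create the artists once and only update their data per frame. A lower dpi also gives PillowWriter fewer pixels to encode. This sketch assumes `v` and `t` from the script, and `make_animation` is a made-up name. How much it saves depends on where the minute actually goes, so it is worth timing.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

def make_animation(v, t, step=20):
    """Animate the x-z projection by updating existing artists each frame."""
    fig, ax = plt.subplots(figsize=(6, 4), dpi=100)
    # Static parts are set once, not on every frame
    ax.set_xlim(-30, 30)
    ax.set_ylim(0, 50)
    ax.set_xlabel("x(t)")
    ax.set_ylabel("z(t)")
    (trail,) = ax.plot([], [], lw=0.3, color="steelblue")
    (marker,) = ax.plot([], [], "o", color="steelblue", markersize=8)
    title = ax.set_title("")
    n_frames = len(v[::step])

    def animate(i):
        k = i * step
        # Only the data of the existing artists changes
        trail.set_data(v[:k + 1, 0], v[:k + 1, 2])
        marker.set_data([v[k, 0]], [v[k, 2]])
        title.set_text(f"t = {t[k]:.1f}")
        return trail, marker, title

    anim = FuncAnimation(fig, animate, frames=n_frames)
    return fig, anim, animate
```

Usage with the arrays from the script: `fig, anim, _ = make_animation(v, t)` followed by `anim.save("animation.gif", writer=PillowWriter(fps=20))`.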

odd meteor
# river cape imputer = SimpleImputer(missing_values = np.nan , strategy = 'mean') imputer.fit...

fit(): This method computes, from the training data, the parameters needed to perform the missing-value imputation. For SimpleImputer, fit() calculates the value that will replace the missing values; in this case, the mean. The fit() method essentially "learns" from the data.

transform(): After fit() has computed the mean, transform() applies the transformation to the data. Here, SimpleImputer replaces all missing values with the mean computed by fit().

In essence, you can think of this duo, fit() & transform(), as "learn from this data" vs "apply what you've learnt to the same data and/or to new data (usually the validation or test set)".

Since fit() is used to learn from data, it's called on the training data only. transform(), on the other hand, is used on the train, validation, and test sets.
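A stripped-down stand-in for `SimpleImputer(strategy="mean")` makes the split visible. `fit` stores the per-column means, and `transform` only reads them. `MeanImputer` is a hypothetical toy class for illustration, not scikit-learn's code.

```python
import numpy as np

class MeanImputer:
    """Toy stand-in for SimpleImputer(strategy="mean") to show fit vs transform."""

    def fit(self, X):
        # "Learn": one number per column, the mean of the non-missing values
        self.means_ = np.nanmean(X, axis=0)
        return self

    def transform(self, X):
        # "Apply": replace NaNs with the means learned during fit
        X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X

X_train = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
X_test = np.array([[np.nan, np.nan]])

imp = MeanImputer().fit(X_train)  # learns means [2.0, 6.0] from training data only
print(imp.transform(X_test))      # [[2. 6.]]
```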

verbal venture
#

Does anyone know how U-Net goes from 572 to 570 in feature size in the first conv layer?
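The 572 to 570 step comes from U-Net's convolutions being unpadded ("valid"). A 3x3 kernel can't be centred on the outermost pixel on either side, so each spatial dimension loses 2. Below is the general formula, as given in the shape notes of PyTorch's `nn.Conv2d` documentation, as a small helper. The helper name is made up.

```python
def conv_output_size(n, kernel_size, padding=0, stride=1, dilation=1):
    """Spatial output size of a convolution (or pooling) along one dimension."""
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv_output_size(572, 3))              # 570: valid 3x3 conv loses 1 pixel per border
print(conv_output_size(570, 3))              # 568: second conv of the block
print(conv_output_size(568, 2, stride=2))    # 284: 2x2 max pooling with stride 2
print(conv_output_size(572, 3, padding=1))   # 572: "same" padding keeps the size
```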

iron basalt
#

C has some automated vectorization by the compiler, but manual SIMD can often be a lot faster.

#

Numpy's implementation makes heavy use of this and it's why it's fast.

#

GPUs have this too and in the case of AMD it's all SIMD vector hardware, even when doing single floats for example.

buoyant vine
#

Has anyone tried to use a GRU (with GloVe) and BCE loss? For some reason, if I try to make the classifier multi-label rather than multi-class, the model learns nothing and its F1 score, recall, etc. all drop to 0, but if I make it multi-class and use CrossEntropyLoss, it seems to learn fine?

magic dune
#

is the chain rule that helpful?

#

💀

shadow viper
devout oak
#

I don't know if this is the correct topic to ask, but I am trying to do image-to-text with Tesseract OCR. I get an empty string for one of the images I use. I tried thresholding but it didn't change anything.

#

I don't really know what makes it return empty either. Like, what kind of error does it give?

mild dirge
#

It's AI after all.

#

But maybe you have processed the image incorrectly, or the settings are incorrect.

devout oak
#

yeah true, it's very clear text, high-resolution white letters on a colored background, which I convert to grayscale and threshold. I will crop the image to single lines to see what's wrong, I guess

#

well, I found the issue: even though I run it with Turkish, it can't recognize "ş" and returns an error

#

I will try to train it myself to make it recognize

obsidian sand
#

Hi, Does anyone know how to apply SMOTE to BERT?

sick drift
#

Hi, I'm using Python's seaborn library to display data. I have the data in the following format:

array approach (a,a,a,a,b,b,b,b)
array clients (7000,7000,700,700,7000,7000,700,700)
array latency (......)
array throughput (......)

It's supposed to be a throughput/latency graph, but it's not displaying the mean values, just the individual points

#

It's supposed to be a mean per client setting

#

I can obviously calculate the mean manually, but then I don't have error bars either
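seaborn only averages when the plotting function aggregates. `lineplot` and `pointplot`, for example, draw the mean of repeated y values at each x with a confidence-interval error bar by default. `scatterplot` plots every point. If you'd rather compute the numbers yourself and keep error bars, a pandas `groupby` gives the mean and standard deviation per (approach, clients) group. The latency and throughput values below are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "approach":   ["a", "a", "a", "a", "b", "b", "b", "b"],
    "clients":    [7000, 7000, 700, 700, 7000, 7000, 700, 700],
    "latency":    [12.0, 14.0, 3.0, 5.0, 10.0, 12.0, 2.0, 4.0],        # hypothetical
    "throughput": [900.0, 1100.0, 200.0, 240.0, 1000.0, 1200.0, 210.0, 250.0],
})

# One row per (approach, clients) setting with mean and sample std of each metric
summary = (
    df.groupby(["approach", "clients"])
      .agg(latency_mean=("latency", "mean"), latency_std=("latency", "std"),
           throughput_mean=("throughput", "mean"), throughput_std=("throughput", "std"))
      .reset_index()
)
print(summary)
```

For the plot itself, `plt.errorbar(summary["throughput_mean"], summary["latency_mean"], xerr=summary["throughput_std"], yerr=summary["latency_std"])` per approach is one option.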

river cape
#

X_train, X_test, Y_train, Y_test = train_test_split(X , Y , test_size = 0.2 , random_state = 1)

#

Does this split the dataset into train set and test set?
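Yes. It returns four arrays in the order X_train, X_test, Y_train, Y_test, with 20% of the rows in the test part, and `random_state` fixes the shuffle. Below is a hypothetical numpy reimplementation to show roughly what happens. The shuffle order won't match scikit-learn's.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.2, seed=1):
    """Shuffle row indices once, then cut them into a test part and a train part."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))  # round the test count up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # X and y are indexed with the same indices, so rows stay paired with labels
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```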

cold osprey
river cape
#

btw while installing ipykernel for jupyter , does it also install the pandas libraries?

earnest ridge
#

can anyone help? Here I am trying to map the species column to numerical values, but after mapping, the species column shows only NaN values
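`Series.map` returns NaN for every value that isn't an exact key in the dict. A column full of NaN usually means one of two things. Either the keys don't match the data (stray whitespace, different capitalisation), or the mapping cell ran twice, so the column already held numbers. Here is a sketch with made-up species values.

```python
import pandas as pd

df = pd.DataFrame({"species": ["setosa", "Versicolor ", "virginica", "setosa"]})
mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}

# Values that don't exactly match a key become NaN
print(df["species"].map(mapping).isna().sum())  # 1: "Versicolor " does not match

# Normalise whitespace and case before mapping
df["species_code"] = df["species"].str.strip().str.lower().map(mapping)
print(df["species_code"].tolist())  # [0, 1, 2, 0]
```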

river cape
#

I have the ipykernel installed in the myenv environment. How do I solve this error?

left tartan
left tartan
buoyant vine
#

Am I losing my mind, or does no one use a GRU with GloVe and PyTorch for multi-label classification 😅

I have been trying to find some resources on it because, for some reason, when using BCE loss my model decides it shall learn absolutely nothing. But every tutorial, documentation page, and piece of existing code seems to at best use multi-class classification, and in their "Keras vs PyTorch" type blog posts they don't even compare BCE in PyTorch with BCE in Keras: they have Keras using BCE and working, but Torch using CE.

sadge Has anyone got any good resources for using a GRU and GloVe together with PyTorch?

buoyant vine
#

wdym?

past meteor
#

Glove is a way to obtain embeddings you can use for downstream tasks and GRU does both, the embeddings and the downstream task.

#

I guess you could combine them by running your text through GloVe to obtain embeddings that are used as input for the GRU.

#

Tell me if I've failed to answer your question 🙂

buoyant vine
#

I don't think the GRU creates the embedding?

With my setup at least, we have Text -> GloVe (N_tokens * 300) -> GRU -> hidden -> output classifier layer
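For the BCE issue, a few things are worth checking. These are guesses, since the training code isn't shown. `nn.BCEWithLogitsLoss` expects raw logits, with no sigmoid or softmax applied first. It also expects float multi-hot targets with the same shape as the logits. With many rare labels, a model can drift to predicting all zeros, which is exactly what an F1 of 0 looks like, and `BCEWithLogitsLoss` has a `pos_weight` argument aimed at that. The numpy sketch below reproduces the loss maths, so shapes and targets can be sanity-checked outside PyTorch. It illustrates the maths and is not PyTorch's implementation.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed from raw scores.

    logits:  (batch, num_labels) raw outputs of the last linear layer, no activation
    targets: (batch, num_labels) multi-hot floats, 1.0 where a label applies
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Stable form of -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))]
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())

def predict_labels(logits, threshold=0.5):
    # Each label gets its own sigmoid; no softmax across labels
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return (probs >= threshold).astype(int)

logits = np.array([[2.0, -1.0, 0.5]])
targets = np.array([[1.0, 0.0, 1.0]])   # this sample has labels 0 and 2
print(bce_with_logits(logits, targets))
print(predict_labels(logits))            # [[1 0 1]]
```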

hallow cargo
#

Thank you, it seems I figured out the problem with inputting the data; you indirectly got me to understand what the point of tf.data is. I also learned from a Stack Overflow post that CSV files are really unoptimized, so I switched to a file format specialized for TensorFlow, .tfrecord. I currently have it working, but it is taking 6 seconds per step (a batch of shape (2048, 512, 17)), just like it did with CSV, which defeats the whole point of switching. Looking at my task manager, I only get a load on my GPU for just under a second at the start of each step, similar to the CSV version, so that does not seem to be the issue. I know you specialize in PyTorch, but would you have any idea what could cause this? I understand (2048, 512, 17) is quite a big tensor, yet it should load my GPU throughout, no?

This is currently what my generator is yielding:
yield tf.transpose(tf.convert_to_tensor(np.array(list(features.values()))), perm=[1, 2, 0]), labels
Which, granted, is quite long, although in isolated testing it's practically instant.
What do you think could be causing such long steps?

mild dirge
#

If your GPU is only busy for a part of a second every few seconds, then your CPU is probably the bottleneck

buoyant vine
hallow cargo
mild dirge
#

How long did it take to load a single batch then?

#

And do you use 1 or multiple workers for data loading?

hallow cargo
#

I think we've got the issue, I haven't defined that

#

Thank you

past meteor
#

I'd have to think about your BCE Multiclass issue

#

If I remember correctly you are truly trying to do multi label classification

buoyant vine
#

yeah

buoyant vine
past meteor
#

Have you made confusion matrices to see what is up at a basic level?

buoyant vine
#

it is not really even in a state where it is useful to look at the confusion matrixes

#

you can see it just kinda nukes itself

past meteor
#

Debugging other people's ML models, heck even my own models is such a hassle

#

There's nothing off the top of my head that I can recommend, sorry!

cold osprey
# river cape

change to the environment u have ipykernel installed

river cape
cold osprey
#

after creating the venv, u have to activate that venv before installing any packages

river cape
#

my venv is activated and I have installed the packages

cold osprey
#

pip list and see that it's there

river cape
#

Yep it is there in that

river cape
cold osprey
#

my venv's name is venv and its running python 3.9.13

cold osprey
#

can u pip list with the venv activated?

#

should be a short list since u only have ipykernel installed

spring scarab
#

Has anyone successfully taken a Keras-trained model and converted it to ONNX?

river cape
cold osprey
cold osprey
#

it just shows the python version

river cape
#

it shows jupyter kernel

cold osprey
#

click python environments

#

and u shud see a list

river cape
#

Seee these both

cold osprey
#

yes select myenv

#

should work then

river cape
#

for which one?

cold osprey
#

python environment

river cape
#

jupyter kernel or

cold osprey
#

im not sure what the jupyter kernel one is for actually

river cape
wanton merlin
#

Yo people I want to learn AI/ML any suggestions of tutorials and resources. A dumbed down version ?

left tartan
feral kernel
#

How much VRAM can you solder onto an RTX 4090? 64 or 128 GB?

fiery bane
#

I feel like all these concepts are related, and I'm wondering if there's a taxonomy to organize these ideas?

Continual learning (lifelong learning, incremental learning).
Meta-learning (learning-to-learn), few shots learning.
Transfer learning, domain adaptation.

brittle storm
#

hi

#

can someone help me?

cold osprey
brittle storm
spice mountain
#

Hey, so I have an issue;

I have downloaded a .ckpt file from the internet for VQGAN (https://github.com/CompVis/taming-transformers/tree/master). Whenever I try to extract it from the .zip, it turns into a folder, which I can't pass into my PyTorch Lightning program. Anybody got a clue what to do? I am running on Scientific Linux (an HPC cluster)
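One thing worth checking: since PyTorch 1.6, `torch.save` writes a zip-based format by default. A `.ckpt` produced that way is therefore already a zip archive. Unzipping it gives you a folder of internal files, while `torch.load` and Lightning expect the original file. If that is what's happening here, the fix is to skip extraction and pass the downloaded `.ckpt` directly. If instead the download is a real `.zip` wrapping a `.ckpt`, extraction should produce a single `.ckpt` file. A stdlib sketch to tell which case you have (`describe_checkpoint` is a made-up name):

```python
import zipfile

def describe_checkpoint(path):
    """Report whether a checkpoint file is itself a zip archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
        return f"zip archive with {len(names)} entries, e.g. {names[:3]}"
    return "not a zip archive"
```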

wanton merlin
feral kernel
#

Yo why is pytorch so buggy on mac, literally almost every time i run it, it says error?

serene scaffold
feral kernel
serene scaffold
feral kernel
serene scaffold
#

and by performance, do you mean "make it faster" or "get rid of errors"?

feral kernel
quaint loom
#

Hi guys. I am currently trying to run a random forest on my data. I want the random forest to run on 4 different areas. The "Position" column is numeric and I have split the positions into areas like this:
'Restored Area 1': [1, 2, 3, 4],
'Restored Area 2': [9, 10, 11, 12],
'Unrestored Area 1': [5, 6, 7, 8],
'Unrestored Area 2': [13, 14, 15, 16]

Can anyone see what mistake I'm making? I end up with everything in A.

Here is the code: https://paste.pythondiscord.com/UJOA
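The pasted code isn't visible here, but one guess is that the position-to-area lookup never matches, so every row falls through to a default. Inverting the dict into a position-to-area lookup and using `map` makes mismatches show up as NaN instead. This sketch uses hypothetical rows and assumes Position holds integers. If it was read in as strings, convert it first with `astype(int)`.

```python
import pandas as pd

areas = {
    "Restored Area 1":   [1, 2, 3, 4],
    "Restored Area 2":   [9, 10, 11, 12],
    "Unrestored Area 1": [5, 6, 7, 8],
    "Unrestored Area 2": [13, 14, 15, 16],
}

# Invert {area: [positions]} into {position: area}
position_to_area = {pos: area for area, positions in areas.items() for pos in positions}

df = pd.DataFrame({"Position": [1, 6, 10, 16, 3]})  # hypothetical sample rows
df["Area"] = df["Position"].map(position_to_area)   # unmatched positions become NaN
print(df)
```

With an `Area` column in place, `df.groupby("Area")` gives one subset per area to fit a separate forest on.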

long canopy
#

is there a term to designate old-school AIs, i.e., AI before the likes of ChatGPT?

#

e.g., AI in Halo 1's enemy NPCs

agile cobalt
#

I wouldn't even call these "AIs", just NPCs at best
if I had to guess, it probably just uses some pathfinding algorithm like A* to find the shortest path between the NPC and the player, then takes that path
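For what A* itself looks like, here is a generic grid version with a Manhattan-distance heuristic. It only illustrates the algorithm mentioned above and is not how any particular game implements its NPCs. The function name and grid encoding are made up for the example.

```python
import heapq

def astar(grid, start, goal):
    """Shortest path on a 4-connected grid (0 = free, 1 = wall) using A*.

    Returns the list of (row, col) cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance never overestimates when each move costs 1
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk back through the predecessors
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```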

long canopy
#

found it: GOFAI is the term

serene scaffold
#

though these days, I can't really think of an example of not-machine learning that's still considered "AI"

neon field
#

does anyone have a convolutional-code dataset for satellite signals in FEC (forward error correction)?

jagged pulsar
#

hi, so I'm trying to play a wave file with pyaudio and dynamically plot it with matplotlib. I already have a script that plots it statically, and it looks like this:

import wave
import numpy as np
import matplotlib.pyplot as plt

wav = wave.open("test.wav", 'r')
raw = wav.readframes(-1)
sample = wav.getframerate()  # read the sample rate before closing the file
wav.close()

raw = np.frombuffer(raw, "int16")

time = np.linspace(0, len(raw) / sample, num=len(raw))

plt.plot(time, raw, color="green")
plt.show()

but I have no idea how to do this, can you help me?
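One way to make it dynamic is to read the file in chunks rather than all at once. Each chunk goes to the audio output, and the same chunk updates the plot. Below is a sketch of the reading side, assuming 16-bit PCM like the static script. `iter_wav_chunks` is a made-up helper name. Each raw chunk could be passed to a PyAudio output stream's `write()`. The samples could update an existing line with `line.set_data(...)` followed by `plt.pause(0.001)`.

```python
import wave
import numpy as np

def iter_wav_chunks(path, chunk_frames=1024):
    """Yield (raw_bytes, samples, start_time) for consecutive chunks of a WAV file.

    Assumes 16-bit PCM; for multi-channel files only the first channel is kept
    in `samples`, while raw_bytes stays untouched for playback.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frame_index = 0
        while True:
            raw = wav.readframes(chunk_frames)
            if not raw:
                break  # end of file
            samples = np.frombuffer(raw, dtype=np.int16)
            if channels > 1:
                samples = samples.reshape(-1, channels)[:, 0]
            yield raw, samples, frame_index / rate
            frame_index += len(raw) // (2 * channels)  # 2 bytes per 16-bit sample
```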

quaint gorge
#

I want to evaluate the effectiveness of a text classification algorithm in terms of how many mistakes it makes. Does anyone with experience know where to look? I want to know, on average, how many mistakes it will make in a sample of n
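With a labelled test set, the number of mistakes in n new samples is roughly binomial. The expected count is n times the error rate measured on the test set. An interval on that rate, here the Wilson score interval, gives a plausible range. Here is a sketch, assuming the new samples look like the test samples. `mistake_estimate` is a made-up name.

```python
import math

def mistake_estimate(n_errors, n_evaluated, n_future, z=1.96):
    """Project the test-set error rate onto n_future samples.

    Uses the Wilson score interval (z=1.96 is roughly 95%) for the error rate.
    """
    p_hat = n_errors / n_evaluated
    denom = 1 + z**2 / n_evaluated
    centre = (p_hat + z**2 / (2 * n_evaluated)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_evaluated
                         + z**2 / (4 * n_evaluated**2)) / denom
    low, high = max(0.0, centre - half), min(1.0, centre + half)
    return {
        "error_rate": p_hat,
        "expected_mistakes": p_hat * n_future,
        "mistakes_range": (low * n_future, high * n_future),
    }

# e.g. 30 mistakes on 1000 labelled test texts, projected onto 200 new texts
print(mistake_estimate(30, 1000, 200))
```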

past meteor
# long canopy is there a term to designate old-school AIs, i.e., AI before the likes of ChatGP...

I agree with Stelercus and Etrotta's answers.

Just to add, pathfinding algorithms (breadth-first, depth-first, A*, ...) are typically considered "AI" and they aren't machine learning at all. They all fall under the broader category of knowledge representation and reasoning: https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning.

Personally, I think this stuff still matters to an extent because, unlike with ML, with reasoning you typically get exact, white-box answers, whereas ML typically gives you an approximation that is also pretty opaque.

long canopy
past meteor
long canopy
river cape
#

Hey guys could anyone suggest any ml models that I can use for repair and service website?

past meteor
#

Can you format this, this is unreadable. I don't think people will bother to read this sorry 😅

feral kernel
# past meteor Can you format this, this is unreadable. I don't think people will bother to rea...

Here is the new code and the new error. I tried reducing the batch size, increasing the dimensions, and decreasing the indices. `import torch.nn as nn
import torch.optim as optim
import torch
import time

# Define the FCNN with Bessel activation
class FCNN(nn.Module):
    def __init__(self):
        super(FCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3)
        self.fc1 = nn.Linear(64 * 62 * 62, 256)
        self.fc2 = nn.Linear(256, 10)
        self.bessel = torch.special.bessel_j0  # Bessel function as activation

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(-1, 64 * 62 * 62)  # Reshape for fully connected layer
        x = self.bessel(self.fc1(x))
        x = self.fc2(x)
        return x

# Instantiate the model, loss function, and optimizer
model = FCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.LBFGS(model.parameters(), lr=0.01, max_iter=20)  # Quasi-Newtonian optimizer

# Break the input matrix into 8x8 matrices
batch_size = 8
num_batches = fft_result_tensor.shape[1] // batch_size

# Training loop
def closure():
    optimizer.zero_grad()
    total_loss = 0.0

    for i in range(num_batches):
        start_idx = i * batch_size
        end_idx = start_idx + batch_size

        # Extract an 8x8 tensor from the input
        input_batch = fft_result_tensor[:, start_idx:end_idx].unsqueeze(0).unsqueeze(1)

        # Forward pass
        outputs = model(input_batch)

        # Calculate loss
        target_labels = torch.tensor([0])  # Replace with your target labels
        loss = criterion(outputs, target_labels)

        # Accumulate loss
        total_loss += loss

    # Backward pass
    total_loss.backward()
    return total_loss

# Perform optimization
start_time = time.time()
for epoch in range(10):  # Adjust the number of epochs as needed
    optimizer.step(closure)

end_time = time.time()
training_time = end_time - start_time
print(f"Training time: {training_time} seconds")`

#

`1510 else:
-> 1511 return self._call_impl(*args, **kwargs)

File /opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
1515 # If we don't have any hooks, we want to skip the rest of the logic in
1516 # this function, and just call forward.
1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1518 or _global_backward_pre_hooks or _global_backward_hooks
1519 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520 return forward_call(*args, **kwargs)
1522 try:
1523 result = None

Cell In[27], line 17, in FCNN.forward(self, x)
15 def forward(self, x):
16 x = self.conv1(x)
---> 17 x = x.view(-1, 64 * 62 * 62) # Reshape for fully connected layer
18 x = self.bessel(self.fc1(x))
19 x = self.fc2(x)

RuntimeError: shape '[-1, 246016]' is invalid for input of size 78720`
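Here are the numbers in that error, worked through. `view(-1, F)` only works when the total element count is a multiple of F. 64 * 62 * 62 assumes a 64x64 input to the unpadded 3x3 conv (64 - 3 + 1 = 62), but the tensor that actually arrives is smaller. The counts below are taken from the traceback.

```python
numel = 78720                      # total elements in the conv output (from the error)
channels = 64                      # conv1 output channels
features_expected = 64 * 62 * 62   # 246016, what fc1 was built for

print(numel % features_expected)   # 78720: not a multiple, so view(-1, 246016) fails
print(numel // channels)           # 1230 spatial positions per channel in this batch
```

Instead of hard-coding the size, you could read the flattened size off the tensor itself, e.g. with `x.flatten(1)`. `nn.LazyLinear(256)` infers its input size on the first forward pass. Both are options to consider, not something the original code does.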

past meteor
feral kernel
#

I will reduce the batch size even lower

past meteor
#

Ah, you should paste the code in there

feral kernel
past meteor
#

And then send us the link here

feral kernel
#

I tried to download the text and sent it to the chat but it won't accept it

past meteor
#

Alright, that's better. (I also don't answer to DMs, sorry)

#

Where does fft_result_tensor come from?

feral kernel
#

Also I changed the elements to 26240, but it still doesn't work

past meteor
#

This can be annoying to do, but what I did in the past is use pen and paper and literally compute this. Maybe add it in comments and track it.

feral kernel
#

So i added the stride and padding size to the code and changed the code to have 3 channels and 256 as , so (256*3 channels +2 -1 )/1 +1=770. Nvm i need to check the size of the csv file first

past meteor
past meteor
#

Then you should be able to call .shape on the fft_result_tensor

feral kernel
past meteor
feral kernel
past meteor
#

I'd make sure your shapes are truly 71 x 256 x 3 so you don't have an oopsie

#

You can use the equation above to calculate how large your output will be; it'll be X * Y * 64

feral kernel
past meteor
#

Indeed

#

Don't be afraid to use a debugger to track the shapes throughout

feral kernel
feral kernel
# past meteor Indeed

ValueError: Expected input batch_size (207) to match target batch_size (1). I changed the output size to match the input size but still this error, so i changed x = x.view(1, 64 * 36 * 23)

past meteor
#

How familiar are you with PyTorch?

feral kernel
#

so i need to change batch size to 207?

past meteor
feral kernel
feral kernel
past meteor
#

I think at this point it's really just best you read those docs, especially if you're in it for the long haul

#

I also got to go so I won't be able to help

feral kernel
#

Yep, I skimmed and read some of it earlier today and before; I will read more and try to get used to code formatting and syntax. Also, why does Jupyter sometimes run code and say it was successful but not show any progress or printing, even though I wrote print?

desert onyx
#

Hi, is there anyone who can help me with something simple?

small wedge
long canopy
#

anything I should be following if I'm interested in attempts to model the emergent abilities of LLMs, in the sense of mathematical-ish modeling?

left tartan
lucid tide
#

Who has knowledge on tensorflow, QNN, PNN and transformer architectures?

feral kernel
# left tartan Can you share the code that you’re wondering about?

Thanks!`import torch.nn as nn
import torch.optim as optim
import torch
import time

# Define the FCNN with Bessel activation
class FCNN(nn.Module):
    def __init__(self):
        super(FCNN, self).__init__()
        # Set padding and stride for the convolutional layer
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=1, stride=1)

        # Modify the size of the fully connected layer to match the input tensor dimensions
        self.fc1 = nn.Linear(69 * 256 * 3, 256)  # Adjusted size for 4x4 matrices
        self.bessel = torch.special.bessel_j0  # Bessel function as activation
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(-1, 64 * 36 * 23)  # Reshape for the fully connected layer

        # Apply Bessel activation to the reshaped input tensor
        x = self.bessel(x)
        x = x.view(0, 13428)  # Reshape back to original dimensions
        x = self.fc2(x)
        return x

# Instantiate the model, loss function, and optimizer
model = FCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.LBFGS(model.parameters(), lr=0.01, max_iter=20)  # Quasi-Newtonian optimizer

# Break the input matrix into 4x4 matrices
batch_size = 1
num_batches = fft_result_tensor.shape[1] // batch_size

# Training loop
def closure():
    optimizer.zero_grad()
    total_loss = 0.0

    for i in range(num_batches):
        start_idx = i * batch_size
        end_idx = start_idx + batch_size

        # Extract a 4x4 tensor from the input
        input_batch = fft_result_tensor[:, start_idx:end_idx].unsqueeze(0).unsqueeze(1)

        # Forward pass
        outputs = model(input_batch)

        # Calculate loss
        target_labels = torch.tensor([0])  # Replace with your target labels
        loss = criterion(outputs, target_labels)

        # Accumulate loss
        total_loss += loss

    # Backward pass
    total_loss.backward()
    return total_loss`
winter drift
#

is anyone familiar with finetuning gpt, i built a dataset and would like to test it out but am a bit lost

split compass
#

Greetings everyone
I have started to work on RLHF recently. So I'm thinking is there anyone who has any kind of experience in it.

rigid cape
#

hey guys , is there any recommended curriculum for learning machine learning using python ?

rigid cape
#

I know basic data analysis using pandas and numpy, but that's it.

lapis sequoia
#

check out IBM's Machine Learning with Python on Coursera

#

there's an audit version

lapis sequoia
#

hi, I'm trying to implement face recognition in Python. For that I created a venv and use VS Code with a Jupyter notebook. I'm wondering why I get no output from this code:
for directory in os.listdir("lfw"):
    for file in os.listdir(os.path.join("lfw", directory)):
        os.path.join("lfw", directory, file)
        os.path.join(NEG_PATH, file)
        print(file)
        print("hello)
The folder lfw exists, NEG_PATH exists, and not even the print("Hello") statement works. Any idea why?

cold osprey
#

!code

arctic wedge BOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

lapis sequoia
#

hi, i'm trying to implement face recognition in Python. For that i created a venv, and i'm using VS Code with a Jupyter notebook. i am wondering why i get no output from this code:
```py
for directory in os.listdir("lfw"):
    for file in os.listdir(os.path.join("lfw", directory)):
        os.path.join("lfw", directory, file)
        os.path.join(NEG_PATH, file)
        print(file)
        print("hello)
```
The folder lfw exists, NEG_PATH exists, and not even the print("Hello") statement works. Any idea why?

devout python
#

Okay this is a super dumb question but ^c mean, I get it for some reason when I run my script on colab, but nowhere else...

#

what does ^c mean*

tidal bough
#

that's what ctrl-C (interrupt) tends to write

devout python
#

Any idea why Colab writes that? It runs fine on my desktop.

#

can it be because the files are too big to hold in memory?

river cape
#

So I have a service-based website where we repair all kinds of electronic gadgets. I was thinking of implementing a chatbot that could answer the most basic queries, and if the user's problem is still not solved, I would like to run a price prediction model on the data the chatbot has collected to compute the approximate cost of the damage or repair. Is there any way to implement this? Please, I need help

cold osprey
#

for directory in os.listdir("lfw"):
    for file in os.listdir(os.path.join("lfw", directory)):
        os.path.join("lfw", directory, file)
        os.path.join(NEG_PATH, file)
        print(file)
        print("hello)

#

cant rly tell but r there indent problems?

#

cant help much without seeing the actual file directory too
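Two things in the snippet above look like they could explain the missing output, though this is a guess without seeing the folders: `print("hello)` is missing its closing quote, which should make the whole cell fail with a SyntaxError before any line runs, and the `os.path.join(...)` calls only build path strings and then throw them away. A minimal sketch of what the loop seems intended to do (collect every file from the per-person subfolders of `lfw` into `NEG_PATH`), assuming moving rather than copying is wanted; `flatten_into` is a hypothetical helper name:

```python
import os
import shutil


def flatten_into(src_root, dest_dir):
    """Move every file found one level below src_root into dest_dir.

    Hypothetical helper; assumes a layout like lfw/<person_name>/<image>.jpg.
    Relative paths are resolved against the current working directory,
    which in Jupyter is usually the notebook's folder.
    """
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for directory in os.listdir(src_root):
        subdir = os.path.join(src_root, directory)
        if not os.path.isdir(subdir):
            continue  # skip stray files sitting directly in src_root
        for file in os.listdir(subdir):
            old_path = os.path.join(subdir, file)    # keep the joined paths
            new_path = os.path.join(dest_dir, file)  # instead of discarding them
            shutil.move(old_path, new_path)
            moved.append(new_path)
    return moved
```

Used as `flatten_into("lfw", NEG_PATH)`. Printing `os.getcwd()` first is a quick way to confirm where the relative `"lfw"` path is actually being looked up.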

quick sapphire
#

It will set you up with an advisor, and you can look through the different roles at your convenience on the way to getting your certifications

fallow frost
#

How do you guys generate graphs like these? there was a website or smth?

tidal bough
#

if you mean programmatically, there is pygraphviz

lavish kraken
#

Does anyone know why a hyperparameter grid search with XGBoost takes so much time to run? it's frustrating... it's been running endlessly for hours, wtf

#

i have even reduced the size of the data from 200k rows to 2,000

cold osprey
#

how big is ur param_grid

lavish kraken
#

i don't know how to quantify how big it is

cold osprey
#

u can try for only 1 set of parameters and see how long that takes

lavish kraken
#

how do you guys calculate or know how big a parameter is

cold osprey
#

and multiply to get total

lavish kraken
#

If i share the code snippet, could you help me edit it or make some changes to make it run faster?

#

can i share my code snippet, respectfully?

odd meteor
# lavish kraken Does anyone know why Hyperparameter Grid Search with XGBoost takes like so much ...

Here are two things to do to speed up the training time.

  1. Since you're using XGBoost, if you have access to a GPU on your machine, use it to speed up your hyperparameter tuning. Add the parameter tree_method='gpu_hist' when instantiating your XGBClassifier (in XGBoost 2.0 and later, the equivalent is tree_method='hist' together with device='cuda').

  2. Reduce your search space. The larger your search space, the longer it'll take to finish running. So you might want to reduce the number of hyperparameters you're trying to tune and the number of values you search for each of them.

lavish kraken
untold bloom
#

or the verbose mode of the GridSearchCV tells you how big it is in the first line ℓoℓ

#

so maybe don't need the manual coding but still

odd meteor
# lavish kraken i don't know how to quantify how big it is

Hopefully this long ass post you're about to read will clarify things for you.

Imagine you're using GridSearchCV for hyperparameter tuning. You're interested in tuning 3 hyperparameters (let's call them A, B, and C for now), and each one of them has 4 candidate values in its search space.

A = [4, 40, 65, 100]
B = ['Hey', 'Hoo', 'Haa', 'Santa']
C = [0.4, 0.25, 1.5, 6.5]

Now, to determine the total number of fits your GridSearchCV will make when tuning hyperparameters, you simply need to multiply the number of unique values in each hyperparameter's search space. In our small example (remember we have 3 hyperparameters and each has 4 candidate values), this will be:

4 (for hyperparameter 1) × 4 (for hyperparameter 2) × 4 (for hyperparameter 3) = 64 fits

So your GridSearchCV will perform a total of 64 fits.

Now, if you're performing cross-validation and the number of folds = 5, then GridSearchCV will perform a total of 320 fits; 64 (total fits from hyperparameter tuning) × 5 (number of cross-validation folds).

Again, now factor in the sample size of your observations, that is, the total number of rows in your data (both train and test). The bigger your train data, the longer it'll take to perform those 320 fits. And of course, the bigger the data you're using for batch prediction (your test / validation data, whichever one you're calling .predict() on), the longer it takes to make predictions as well (this part doesn't take much time compared to fitting the model to the training data).

So, you see how with just 3 hyperparameters, 4 candidate values each, and 5-fold cross-validation, you're calling .fit() 320 times on your train data. Now imagine what happens when you increase this param_space.

Once you understand the scenario above, you can easily compute the same thing with your current setup.
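The counting above can be checked in a few lines of plain Python. This sketch reuses the toy A/B/C grid from the example and assumes 5-fold cross-validation. One detail worth adding: with scikit-learn's default `refit=True`, GridSearchCV also does one extra fit of the best candidate on the full training data after the search, so the real total is one more than the cross-validation count.

```python
import math

# Same toy grid as in the example above
param_grid = {
    "A": [4, 40, 65, 100],
    "B": ["Hey", "Hoo", "Haa", "Santa"],
    "C": [0.4, 0.25, 1.5, 6.5],
}
n_folds = 5

# Number of candidate combinations = product of the list lengths
n_candidates = math.prod(len(values) for values in param_grid.values())
n_cv_fits = n_candidates * n_folds

print(n_candidates)  # 64
print(n_cv_fits)     # 320
```

Setting `verbose=1` on GridSearchCV prints the same candidate and fit counts when the search starts.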

past meteor
#

Two tips:

  1. Focus on tuning the number of estimators (n_estimators) hyperparameter only. It's the highest-value one.
  2. Use random search instead of grid search
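Random search replaces "try every combination" with "try a fixed budget of sampled combinations", so the cost is roughly `n_iter × folds` no matter how big the grid is. A dependency-free sketch over the same toy A/B/C grid; `score` is a hypothetical stand-in for a cross-validated model score, and in practice scikit-learn's `RandomizedSearchCV` (with its `n_iter` parameter) does this for you:

```python
import itertools
import random

# Same toy grid as in the GridSearchCV example above
param_grid = {
    "A": [4, 40, 65, 100],
    "B": ["Hey", "Hoo", "Haa", "Santa"],
    "C": [0.4, 0.25, 1.5, 6.5],
}


def score(params):
    """Hypothetical objective; in practice this would be a cross-validated score."""
    return -abs(params["A"] - 50) - abs(params["C"] - 1.0)


def random_search(grid, n_iter, seed=0):
    """Evaluate n_iter distinct, randomly chosen combinations from grid."""
    names = list(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    rng = random.Random(seed)
    # Sample without replacement, capped at the number of available combinations
    sampled = rng.sample(combos, k=min(n_iter, len(combos)))
    results = []
    for combo in sampled:
        params = dict(zip(names, combo))
        results.append((score(params), params))
    best_score, best_params = max(results, key=lambda pair: pair[0])
    return best_params, best_score, len(results)


best_params, best_score, n_evaluated = random_search(param_grid, n_iter=10)
print(f"evaluated {n_evaluated} of 64 combinations; best: {best_params}")
```

Materializing the full product is fine for a small grid like this one; for large or continuous spaces you would sample each hyperparameter independently instead.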
exotic epoch
#

hey guys

#

I have to fill in some code according to the comments

past meteor
exotic epoch
#

okay, so here is the Google Colab link which contains the code: https://colab.research.google.com/drive/14mVqgJjhQWsxq0sddOOGxnpgwmuCa00B#scrollTo=FbZAW_DTLWbp and these are the instructions, guys:
Download the iris.csv dataset from Kaggle using this link: https://www.kaggle.com/datasets/saurabh00007/iriscsv
Put the data in the same directory as your notebook.
Start following this notebook: https://colab.research.google.com/drive/14mVqgJjhQWsxq0sddOOGxnpgwmuCa00B#scrollTo=iHnErH5UAzTY
Read the comments carefully and complete the code wherever it is needed

#

please help me guys

past meteor
exotic epoch
#

it is homework, yep

#

For example, here, according to the green instruction (comment), u have to import the confusion matrix

#

and i don't know how to do that yet, but i am really working on it

past meteor
#

So, I think you'll learn more if you do this yourself. I checked the notebook and it's basically exclusively things from scikit-learn

exotic epoch
#

yes it is

#

i can't do it myself, unfortunately

past meteor
exotic epoch
#

thanks anyways for the help

past meteor
exotic epoch
#

don't have the necessary knowledge

past meteor
#

You're in week 5, if I bail you out right now you'll be stuck later on. I want to help but the best way to help is letting you figure it out 🙂

#

If you have specific questions like "how does model X work" or "why is this method like this and not like that", I and most people here will still be happy to help though

exotic epoch
#

okay i see

#

Thanks dude

long canopy
#

anything I should be following if I'm interested in attempts to model (in the sense of mathematical-ish modeling) how emergent abilities might arise in LLMs?

serene scaffold
#

Anyone going to NeurIPS? I'll be there for the workshops only.

feral kernel
#

Hi, I'm still getting another error, ValueError: not enough values to unpack (expected 3, got 2), even though I changed the height, width, and channels. The shape is Shape: torch.

feral kernel
# feral kernel Hi, im still getting another error, ValueError: not enough values to unpack (exp...

```py
import torch.nn as nn
import torch.optim as optim
import torch
import time

# Define the FCNN with Bessel activation
class FCNN(nn.Module):
    def __init__(self, input_dim):
        super(FCNN, self).__init__()
        # Adjust convolution based on input dimensions
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3)
        # Unpack input dimensions
        channels, height, width = input_dim
        # Hidden layer size based on input and output dimensions
        hidden_size = 64 * height * width
        self.fc1 = nn.Linear(hidden_size, hidden_size)  # Modified size for dynamic input
        self.bessel = torch.special.bessel_j0  # Bessel function as activation
        self.fc2 = nn.Linear(hidden_size, channels * height * width)  # Output same as input

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(-1, x.shape[1] * x.shape[2] * x.shape[3])  # Reshape based on input dimensions
        x = self.bessel(x)
        x = self.fc1(x)
        x = self.fc2(x)
        x = x.view(-1, channels, height, width)  # Reshape to match input
        return x

# Initialize model with actual input dimension
model = FCNN(fft_result_tensor.shape)

# Adjust loss function based on desired output type (e.g., reconstruction)
criterion = nn.MSELoss()

# Use a more suitable optimizer for large datasets
optimizer = optim.Adam(model.parameters())

# Break input matrix into batches
batch_size = 4
num_batches = fft_result_tensor.shape[1] // batch_size
```

digital marsh
#

Hello there, I've been looking into AI for 3 days. I watched a video about an AI playing Flappy Bird and it worked well, so I tried the same process with Snake, but the results aren't that good. I think my problem is the outputs, and maybe how I add and remove fitness. For fitness, I just add fitness when the snake eats food, gets a new high score, or gets a better average score,
and I just remove a snake when it dies or touches itself.
And the most important part is the output. I do this:
```py
for x, snake in enumerate(snakos.copy()):
    pposition = ["right", "left", "up", "down"]
    output = nets[x].activate(
        (int(snake.body_pos[0][0]), int(snake.body_pos[0][1]),
         closest_apple(snake.body_pos, apples[x])[0],
         closest_apple(snake.body_pos, apples[x])[1]))
    snake.direction = get_optimal_direction(snake.body_pos[0], apples[x], all_position[-1], snake.body_pos) if output[0] > 0.5 else pposition[randint(0, len(pposition) - 1)]
    all_position.append(snake.direction)
    snake.update_position()
    snake.display()
    ge[x].fitness += 0.1
```

#

thanks for the reading and the help
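As posted, the network only chooses between the hand-written `get_optimal_direction` and a random move, so it never really learns to steer. One common pattern with NEAT-style networks, sketched here as an assumption rather than a verified fix for this project, is to give the network one output node per direction and move in the direction with the highest activation. `choose_direction` is a hypothetical helper; it also skips the move that would reverse the snake straight into its own body:

```python
DIRECTIONS = ["right", "left", "up", "down"]
OPPOSITE = {"right": "left", "left": "right", "up": "down", "down": "up"}


def choose_direction(outputs, current_direction):
    """Pick the direction whose network output is largest.

    outputs: one activation per entry of DIRECTIONS (the network config
    would need 4 output nodes for this). Reversing onto the body is skipped.
    """
    # Indices of DIRECTIONS, best activation first
    ranked = sorted(range(len(DIRECTIONS)), key=lambda i: outputs[i], reverse=True)
    for i in ranked:
        candidate = DIRECTIONS[i]
        if candidate != OPPOSITE.get(current_direction):
            return candidate
    return current_direction
```

In the loop above this would replace the thresholding line with something like `snake.direction = choose_direction(output, snake.direction)`.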

odd meteor
magic bloom
#

I just built an AI data scientist. Any suggestions on how to improve? https://youtu.be/ZjpNx8qNnaA

🚀Explore OpenAI Assistants API🚀

🔗 Github Repo: https://github.com/calapsss/assistants-api-easy

🔗 STAY IN THE LOOP:
Medium: https://pipsworld.medium.com/
Twitter: https://twitter.com/pips_ai

✨What is Assistants API
The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage ...

buoyant vine
#

"AnalAssit" is truly an unfortunate name to give it 😅

desert oar
#

personally, the editing style and clickbait title puts me off

#

you built something and i didn't, so i don't want to be dismissive. maybe i'm just not the target audience for this kind of content.

spare briar
magic bloom
magic bloom
magic bloom
magic bloom
serene scaffold
#

@magic bloom keep in mind that we don't allow self-promotion when it falls under advertising

magic bloom
serene scaffold
#

Can*

magic bloom
fading scaffold
#

how to count std using pd?

odd meteor
fading scaffold
odd meteor
# fading scaffold yes that's what i mean, pardon me 😅

I presume you've already read your data into pandas and df is the name of your dataframe. You can compute the standard deviation of any column using this

`df['column_name'].std()`. If you want to see the standard deviation of all numeric columns in your data, you can use the `describe()` method to get the descriptive stats: `df.describe()`
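A small runnable version of the above, with made-up column names. One detail worth knowing: pandas' `.std()` defaults to the sample standard deviation (`ddof=1`, dividing by n - 1), while NumPy's `np.std` defaults to the population version (`ddof=0`), so the two can disagree on the same data.

```python
import pandas as pd

df = pd.DataFrame({"x": [2, 4, 4, 4, 5, 5, 7, 9], "y": [1, 1, 1, 1, 1, 1, 1, 1]})

sample_std = df["x"].std()             # ddof=1, the pandas default
population_std = df["x"].std(ddof=0)   # divide by n instead of n - 1

print(population_std)  # 2.0
print(df.std())        # per-column (sample) std of all numeric columns
print(df.describe())   # count, mean, std, min, quartiles, max
```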

river cape
#

Any good sites to find datasets other than kaggle?

#

And how do I increase the accuracy of the model?

past meteor
#

That's why kaggle is a great platform, it's not just about the data but also about what people used it for

toxic mortar
#

I don't get why saturation is problematic. Does that mean all of the weights won't have the same sign? Why would that be a bad thing? Some features are worse than others

#

Topic: [Activation functions CNN]

quartz hawk
#

Hey guys, so I'm working on a project that requires extracting some data from images as text (key-value pairs).

The images are scanned PYQs (previous year question papers), and I want to extract the course name, course ID, year of examination, type of examination (major or minor), and the course's department.

So my initial approach is to use some kind of OCR (pytesseract) and use LangChain to extract the key-value pairs from the text.

Is there any better approach to this problem than this?
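If the question papers share a fairly fixed layout, a rule-based pass over the OCR text may be enough before reaching for an LLM. A minimal sketch, assuming pytesseract has already produced plain text; the sample text, field labels, and regex patterns are all hypothetical and would need adjusting to the real papers:

```python
import re

# Hypothetical OCR output from one scanned paper
ocr_text = """
DEPARTMENT OF COMPUTER SCIENCE
MAJOR EXAMINATION 2022
Course Code: CSL101   Course Name: Introduction to Programming
"""

# One pattern per field; group(1) holds the value to extract
PATTERNS = {
    "department": r"DEPARTMENT OF ([A-Z ]+)",
    "exam_type": r"\b(MAJOR|MINOR)\s+EXAMINATION\b",
    "year": r"EXAMINATION\s+((?:19|20)\d{2})",
    "course_id": r"Course\s*Code\s*:\s*([A-Z]{2,4}\s?\d{3,4})",
    "course_name": r"Course\s*Name\s*:\s*(.+)",
}


def extract_fields(text):
    """Return a dict of field -> first regex match (or None if not found)."""
    fields = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[key] = match.group(1).strip() if match else None
    return fields


print(extract_fields(ocr_text))
```

Fields that come back as `None` are a natural trigger for a fallback (manual review, or the LLM approach for just those pages).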

past meteor
toxic mortar
#

Just to confirm, when the gradient is tailing off, does that mean there are little to no updates?

past meteor
# toxic mortar Okay, the opposite of the saturation is some function that has lim(x->inf) = inf...

Basically, with large input you get close to -1 and 1 which means you get small gradients indeed. It's typically better to have something that allows for the gradient to flow nicely through the network, like relu.

You mentioned "something predictable", relu can cause the numbers internally to be more "unpredictable" or cause something called internal covariate shift but other tricks like batch normalisation account for this.

#

Predictable output in hidden layers is not a goal in and of itself, though. With gradient descent you want covariates to be on the same scale. Tanh gives you somewhat of a guarantee here, but not fully. Relu even less so.
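To put numbers on the saturation point: the derivative of tanh is 1 - tanh(x)^2, which shrinks towards zero as |x| grows, while ReLU's derivative stays at 1 for any positive input. A quick check in plain Python:

```python
import math


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


for x in [0.0, 1.0, 3.0, 6.0]:
    print(f"x={x:>4}: tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

Backpropagation multiplies these local derivatives layer by layer, so a few saturated tanh units in a row can shrink the gradient reaching earlier layers to almost nothing, which is the "little to no updates" effect asked about. ReLU has its own failure mode: its gradient is exactly 0 for negative inputs.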

past meteor
toxic mortar
wild sluice
#

how would i check a pandas dataframe column for a certain phrase, like checking whether a row in the name column contains a certain word?
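`Series.str.contains` is the usual tool for this. A small sketch with a made-up `name` column: `case=False` makes the match case-insensitive, `regex=False` treats the phrase literally (so characters like `.` or `(` aren't regex syntax), and `na=False` treats missing values as non-matches.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John Smith", "jane doe", None, "Johnny B. Goode"]})

# Substring match anywhere in the string
mask = df["name"].str.contains("john", case=False, regex=False, na=False)
print(mask.tolist())  # [True, False, False, True]
print(df[mask])       # only the matching rows

# Whole-word match instead of substring: a regex with word boundaries
word_mask = df["name"].str.contains(r"\bjohn\b", case=False, na=False)
print(word_mask.tolist())  # [True, False, False, False]
```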

lapis sequoia
#

load() missing 1 required positional argument: 'loader' in
```py
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

'''
This is an example showing how to create an export file from
an existing chat bot that can then be used to train other bots.
'''

chatbot = ChatBot('Export Example Bot')

# First, lets train our bot with some data
trainer = ChatterBotCorpusTrainer(chatbot)

trainer.train('chatterbot.corpus.english')

# Now we can export the data to a file
trainer.export_for_training('./my_export.json')
```

#

pls help me

mental bane
#

i installed TensorFlow, but while running it in a Jupyter notebook it throws the error "SymbolAlreadyExposedError: Symbol Zeros is already exposed as ()." i can't find a solution; can someone help me with it?

long canopy
feral kernel
#

hey, why does Jupyter keep dying when i train a neural net? Not enough memory? But it is a small neural net.

odd meteor
fading scaffold
odd meteor
feral kernel
warm shard
#

Why isn't anyone talking about Gemini?

#

that model is doing insane things

agile cobalt
#

most of the multi-modal capabilities will only be released next year, iirc right now the publicly accessible version should be more or less on the same level as GPT-3.5, maybe just a bit better at non-English languages?

one way or the other, it's not that much more impressive than AWS's, Anthropic's and other closed source models

stone glacier
#

hey all,
is the python mlx module exclusive to MacOS devices?

#

or can a windows setup or github workspace run it?

feral kernel
# odd meteor Does this only happen when you try to train a NN? You need to figure out if it's...

```py
import torch
import torch.nn as nn
import time
from torch.cuda.amp import autocast, GradScaler
from torch.utils.checkpoint import checkpoint
from torch.optim.lbfgs import LBFGS

class FCNN(nn.Module):
    def __init__(self, input_dim):
        super(FCNN, self).__init__()

        # Adjust convolution based on input dimensions
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3)  # Further reduced filter count

        # Unpack input dimensions
        channels, height, width = input_dim

        # Hidden layer size based on input dimensions
        hidden_size = 16 * height * width

        # Define network layers
        self.fc1 = nn.Linear(hidden_size, hidden_size)
        self.bessel = torch.special.bessel_j0
        self.fc2 = nn.Linear(hidden_size, channels * height * width)

    def forward(self, x):
        # Forward pass
        with torch.inference_mode():
            if is_available():
                x = x.to("mps")
                model = model.to("mps")

            x = self.conv1(x)
            x = x.view(-1, x.shape[1] * x.shape[2] * x.shape[3])
            x = checkpoint.checkpoint(self.fc1, x)
            x = self.bessel(x)
            x = checkpoint.checkpoint(self.fc2, x)
            x = x.view(-1, channels, height, width)

        return x

# Initialize model with actual input dimension
model = FCNN(fft_result_tensor.shape)

# Adjust loss function for MPS
criterion = nn.functional.mse_loss

# Move model to MPS device if available
if is_available():
    model = model.to("mps")

# Break input matrix into batches
batch_size = 4
num_batches = fft_result_tensor.shape[1] // batch_size

# L-BFGS optimizer
optimizer = LBFGS(model.parameters())
```

#

```py
def closure():
    optimizer.zero_grad()
    total_loss = 0.0

    for i in range(num_batches):
        start_idx = i * batch_size
        end_idx = start_idx + batch_size

        # Extract a batch of input
        input_batch = fft_result_tensor[:, start_idx:end_idx].unsqueeze(0).unsqueeze(1)

        # Move input to MPS device if available
        if is_available():
            input_batch = input_batch.to("mps")

        # Forward pass
        with autocast():
            outputs = model(input_batch)

            # Calculate loss
            target_labels = input_batch
            loss = criterion(outputs, target_labels)
            total_loss += loss

        # Print progress
        if i % 10 == 0:  # Adjust print frequency
            print(f"Batch {i+1}/{num_batches}, Loss: {loss.item():.4f}")

    return total_loss

# Perform optimization
start_time = time.time()
for epoch in range(10):
    optimizer.step(closure)

    # Save the model at the end of each epoch
    torch.save(model.state_dict(), f"model_lbfgs_mps_epoch_{epoch+1}.pth")

end_time = time.time()
total_training_time = end_time - start_time
print(f"Total training time: {total_training_time:.2f} seconds")
```

#

How do i fix this, it says not enough values to unpack

desert oar
#

@feral kernel 1) i suggest posting longer code section at https://paste.pythondiscord.com/ 2) post the full error message on that same site, including the "traceback" which should point to the exact line of code where the error occurs; you can use that to figure out why the error occurred
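The error message already says what is going wrong: `channels, height, width = input_dim` needs a 3-element shape, but "expected 3, got 2" means `fft_result_tensor.shape` has only two dimensions. One way to handle that, sketched in plain Python so it runs without the original tensor, is to normalize the shape first and treat a 2-D input as a single channel. `to_chw` is a hypothetical helper, and the single-channel reading is an assumption based on the `nn.Conv2d(1, ...)` first layer in the posted code:

```python
def to_chw(shape):
    """Return (channels, height, width) for a 2-D or 3-D shape.

    Hypothetical helper: a 2-D shape (height, width) is treated as a single
    channel, matching a first layer of nn.Conv2d(1, ...).
    """
    dims = tuple(shape)  # works for plain tuples and torch.Size alike
    if len(dims) == 2:
        height, width = dims
        return 1, height, width
    if len(dims) == 3:
        return dims[0], dims[1], dims[2]
    raise ValueError(f"expected a 2-D or 3-D shape, got {len(dims)} dims: {dims}")


channels, height, width = to_chw((128, 64))  # e.g. a 2-D FFT result
print(channels, height, width)               # 1 128 64
```

Once the unpacking works, a few more issues in the posted versions will likely surface: `forward` uses `channels`, `height` and `width`, which are local variables of `__init__` (storing them as `self.channels` and so on would fix that); a 3x3 `nn.Conv2d` with the default `padding=0` outputs (H - 2) x (W - 2), so the flattened size won't match `hidden_size` unless `padding=1` is passed; and L-BFGS expects its closure to call `backward()` on the loss, which the second closure never does, while `torch.inference_mode()` inside `forward` switches gradient tracking off entirely.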

feral kernel
slow totem
#

hello! I have an urgent question, and would really, extremely appreciate any help. Thanks in advance

So, I'm trying to use the Llama-7B model to generate answers to a few simple questions. The issue is that I have a 1 GB RAM server, and it absolutely cannot handle Llama or Mistral unless I want to give up all the intelligence (at that point I'd better just use DialoGPT). I aim to generate an answer to each question using specific documents (but these are general-knowledge questions, so the docs are not required; I'm only hoping they might reduce computation).

Coming to what I need help with: are there any alternatives to these LLMs that aren't completely stupid but work on my 1 GB RAM server? Alternatively, is there a hosted inference API for Llama, Mistral, or similar LLMs with a free tier that I can work with?

Thanks, have a cookie for reading through 🍪

serene scaffold
slow totem
serene scaffold
slow totem
serene scaffold
#

also, you need access to GPU compute. If you don't have a GPU, you should immediately give up completely on trying to run any instruction-tuned LLMs

serene scaffold
slow totem
slow totem
serene scaffold
#

LLMs existed before "ChatGPT-like" LLMs became popular. And now everyone thinks "LLM" refers only to ChatGPT-like models

#

(ChatGPT is also an instruction-tuned LLM)

slow totem
serene scaffold
#

But I can't think of anything worthwhile you could potentially do with AI on a server with only 1GB RAM.

tidal bough
#

(An LLM is initially trained on massive datasets to complete text - that is, predict the next token repeatedly. That's what some people call a "base model" these days. If you asked this model a question, it might give some answer if you make it look sufficiently like a Q-A dataset, or it may complete the prompt with some questions of its own. There are many tasks such models are useful for, like creative writing, but they aren't assistants of any kind - to turn it into something like ChatGPT, you need to tune it with something like RLHF to alter its utility function from "emit tokens that are like what would follow this text in my training data" to "emit tokens that wouldn't get me punished during RLHF".)

desert hawk
#

Howdy, I am a self-taught web dev, which led to me securing a web dev role. Before getting into web I was studying and absolutely loved Python (but was advised to pick up web to secure a job, which ultimately worked out). But now that I'm working in web, I feel like I can spend the time to pick up Python again. I am going to go through a 100 Days of Python course as well as a TensorFlow course, since that seems like a lot of fun!

hollow sentinel
#
```py
pd.read_excel("/content/Gross Collections, by Type of Tax and State - IRS Data Book Table 5 2022.xlsx", header = 3)
```
#

so the problem here is that the header is merged and centered

#

should i just unmerge and uncenter it and see what happens?

#

i don't really know how to read this data

#
```
Unnamed: 0    Unnamed: 1    Unnamed: 2    Total    Individual income\ntax withheld\nand FICA tax [3]    Individual income\ntax payments and \nSECA tax [3]    Unemployment\ninsurance tax    Railroad\nretirement tax    Estate and \ntrust income \ntax [4]    Unnamed: 9    Unnamed: 10    Unnamed: 11
0    NaN    -1.000000e+00    -2.0    -3.000000e+00    -4.000000e+00    -5.000000e+00    -6.0    -7.0    -8.0    -9.0    -10.0    -11.0
1    United States, total    4.901514e+09    475871099.0    4.321609e+09    3.089258e+09    1.133996e+09    7046465.0    6148312.0    85160093.0    28909393.0    4445883.0    70679117.0
2    Alabama    3.605756e+07    1936430.0    3.356064e+07    2.368409e+07    9.255441e+06    73015.0    3525.0    544565.0    267069.0    30921.0    262500.0
```
#

this is what i'm getting so far

#

i'm not sure why i'm getting these unnamed things

#

if anyone knows, feel free to ping me

#

i have no idea what i'm doing

serene scaffold
#

and it looks like the first row doesn't actually have names for each column

hollow sentinel
#

hmmm

hollow sentinel
serene scaffold
hollow sentinel
#

how do i provide the column names manually?

serene scaffold
#

idk how well that will work as I've never tried to open an excel sheet with row or column spanning cells

hollow sentinel
#

provide a list?

serene scaffold
hollow sentinel
#

ah, thanks.

serene scaffold
hollow sentinel
serene scaffold
hollow sentinel
# serene scaffold if someone ever sends you an excel book with merged cells, send them that sonic ...
```py
pd.read_excel("/content/Gross Collections, by Type of Tax and State - IRS Data Book Table 5 2022.xlsx", skiprows=5, 
              header = ["Total Internal Revenue collections", "Business Income Taxes", "Total",
                        "Individual income tax withheld and FICA tax", "Individual income tax payments and SECA tax [3]",
                        "Unemployment insurance tax", "Railroad retirement tax", "Estate and trust income  tax [4]", 
                        "Estate tax", "Gift Tax", "Excise Tax"])
```
#

```
ValueError                                Traceback (most recent call last)
<ipython-input-57-d484ef631d33> in <cell line: 1>()
----> 1 pd.read_excel("/content/Gross Collections, by Type of Tax and State - IRS Data Book Table 5 2022.xlsx", skiprows=5,
      2 header = ["Total Internal Revenue collections", "Business Income Taxes", "Total",
      3 "Individual income tax withheld and FICA tax", "Individual income tax payments and SECA tax [3]",
      4 "Unemployment insurance tax"])

5 frames
/usr/local/lib/python3.10/dist-packages/pandas/io/common.py in validate_header_arg(header)
    196 header = cast(Sequence, header)
    197 if not all(map(is_integer, header)):
--> 198 raise ValueError("header must be integer or list of integers")
    199 if any(i < 0 for i in header):
    200 raise ValueError("cannot specify multi-index header with negative integers")

ValueError: header must be integer or list of integers
```

serene scaffold
#

oh fuck

#

should be names=. sorry about that
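For reference, `header=` only accepts row numbers, and custom column labels go in `names=`. `read_excel` and `read_csv` share these keywords, so this sketch uses `read_csv` on an in-memory string to stay runnable without the Excel file; the rows and labels are made up. The `read_csv` docs note that when rows have more fields than there are names, the extra leading columns become the index, which appears to be why the state names end up as the index in the output further down (11 names for 12 columns). Here `index_col` makes that explicit. It may also be worth checking `skiprows`: the first read showed "United States, total" and "Alabama" rows, but the named output starts at Alaska.

```python
import io

import pandas as pd

# Made-up stand-in for a sheet with title rows and a merged, multi-row header
raw = (
    "Table 5 title\n"
    "merged header row 1\n"
    "merged header row 2\n"
    "United States total,100,20\n"
    "Alabama,10,2\n"
    "Alaska,5,1\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=3,                            # skip the title and header rows
    header=None,                           # no header row left in the data
    names=["state", "total", "business"],  # supply our own column labels
    index_col="state",
)
df = df.drop(index="United States total")  # keep only the per-state rows
print(df)
```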

hollow sentinel
#

well, this will be a cool story to tell on an interview

serene scaffold
hollow sentinel
#

ah yeah i was reading the same thing, i just couldn't wrap my head around it

hollow sentinel
serene scaffold
#

YAY

hollow sentinel
# serene scaffold YAY
```
     Total Internal Revenue collections  Business Income Taxes        Total  Individual income tax withheld and FICA tax  Individual income tax payments and SECA tax [3]  Unemployment insurance tax  Railroad retirement tax  Estate and trust income  tax [4]  Estate tax  Gift Tax  Excise Tax
Alaska                               6572445.0               150882.0    6323953.0                                    4423914.0                                        1689140.0                     11933.0                   2008.0                          196959.0     35788.0      98.0     61723.0
Arizona                             71814870.0              5116779.0   64739720.0                                   45071814.0                                       18865670.0                    129407.0                   2013.0                          670817.0    209065.0   30814.0   1718491.0
Arkansas                            40231970.0              4846558.0   34464074.0                                   26995802.0                                        6983808.0                    141622.0                   2969.0                          339873.0    157175.0  133501.0    630661.0
California                         696826462.0             77361863.0  608660632.0                                  427216972.0                                      174510813.0                    849338.0                   8073.0                         6075435.0   5778261.0  669690.0   4356015.0
Colorado                            88448670.0              7523650.0   80022210.0                                   55735321.0                                       23393304.0                    111294.0                  20460.0                          761831.0    192972.0   46954.0    662883.0
```
#

that's probably a nightmare to look at

hollow sentinel
# serene scaffold YAY

my idea is to create a project where i compare how much states get paid by the federal government and how much states have to pay the federal government in taxes

#

idk if that's a good idea, but i wanted to do it bc i'm interviewing for a gov org and i figured i'd make it domain specific

serene scaffold
hollow sentinel
#

also, would it be a bad idea to put this in sql?

serene scaffold
hollow sentinel
#

yeah this isn't like a massive dataset

hollow sentinel
#

like a stacked bar chart?

#

idk what else to do with the data

desert oar
#

Could also do some interesting comparisons with state GDP, again both total and per capita

#

Usually it's also a good idea to just look at the distribution of each variable individually

desert oar
hollow sentinel
#

i just don’t know what hypothesis i have

graceful anvil
#

is anyone here familiar with keras 3?

feral kernel
#

Hey, what is the most portable high-performance (24 GB of VRAM or more) desktop or laptop for machine learning that i can bring as a carry-on and that weighs less than 10 pounds? Lol, maybe a Mac Studio?

mild dirge
#

I wouldn't really want to buy a really expensive laptop for good performance, it's pretty bad value for your money compared to a desktop.

glad moth
#

It is better to buy a desktop for data analysis or computational work, if you have a good office.

#

I tried a laptop for my work and always suffered from heating or overheating, even though I went for a high-spec laptop

past meteor
#

I prefer a laptop because I'm on the go a lot. I take the train semi frequently and usually work there.

#

Desktops are a lot more cost efficient / value for money and have better longevity. It depends what you're after 🙂

odd meteor
# glad moth I tried laptop for my work, I always suffer from heating or over heating even I ...

We all have individual preference. For me, nothing beats the feeling of being able to set up my GPU cluster and train my model locally. I can't afford the kind of setup I want at the moment, so for now, I always rent / use Kaggle GPU.

So weigh your options and go with what rocks your boat. If you're someone like me who prefers investing in personal pc, then you might be interested in TensorBook and other machines that have similar spec. https://lambdalabs.com/deep-learning/laptops/tensorbook


wooden sail
#

i would advise that no laptop will ever be "good" at ml atm

glad moth
#

@past meteor
Yes, your working conditions decide which option fits your situation better..

feral kernel
feral kernel
#

Where can u rent an H100 or A100 for really cheap? 1.6/hr is expensive if u train for a while

buoyant vine
#

you cant mmLol

#

GPU machines on the cloud are always very expensive

#

1.6/hr is super cheap though if you have access to a H100 or A100

#

We spent close to $12-16/hr for each one of our training machines

buoyant vine
#

I love my 3070ti, but it is shit for anything other than fairly small models

#

Trying to do anything productive with it on heavy compute is a nightmare because you have nowhere near enough VRAM

hollow sentinel
#

thoughts? that's a hypothesis, right?

#

i feel like i get discouraged from my projects because they don't really do anything

hollow sentinel
#

i want to derive something interesting from the data and actually answer something

past meteor
#

Imo a very important side note to the GPU discussion is that this only applies to LLMs

#

You can do a lot with a laptop if it isn't LLMs. Computer vision for instance doesn't need that much vram. We do it on edge devices for instance.

hollow sentinel
#

i can't really think of anything

#

any ideas would be helpful

#

is it too complicated?

#

like a tax v aid analysis essentially

#

if that makes sense

#

this isn't for a school project, i'm doing this for fun

left tartan
#

It's a fine question... perhaps start with a scatter plot of [tax revenue] to [financial aid]. Starting with simple graphs is a nice way to get started

hollow sentinel
#

what column am i supposed to be using here?

hollow sentinel
left tartan
#

I dunno what you’re trying to do, but column b appears to be c+d

#

Oh, c+d+j+k+l
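A quick way to confirm that kind of relationship programmatically, rather than by eye, is to sum the candidate component columns and compare the result with the total. The column names and numbers below are hypothetical stand-ins for the sheet's lettered columns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "total": [110.0, 47.0],
        "business": [20.0, 7.0],
        "individual": [80.0, 35.0],
        "estate": [10.0, 5.0],
    },
    index=["State A", "State B"],
)

components = ["business", "individual", "estate"]
residual = df["total"] - df[components].sum(axis=1)  # 0 wherever total == sum
print(residual)
print((residual.abs() < 1e-6).all())  # True if the relationship holds in every row
```

A small tolerance is used instead of `==` because the real sheet's values are floats.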

hollow sentinel
left tartan
#

You have another data source for financial aid from gov?

hollow sentinel
left tartan
#

First step is usually exploring the data. I would probably do a few scatter plots to see how gov aid relates to corporate and or individual tax.

#

Then, I’d look at change over time; is there some relationship between previous aid and future income

#

Basic exploratory stuff, without getting into anything complex.

#

Other variables might include weather, neighboring states, economic factors like unemployment rates, etc

hollow sentinel
left tartan
#

Nah, just giving you pointers

hollow sentinel
#

here it is

hollow sentinel
#
import requests

# USAspending API endpoint for per-state award totals
url = 'https://api.usaspending.gov/api/v2/recipient/state/'

# pass the query string via params instead of also hard-coding it into the URL
params = {
    'year': 'latest'
}

response = requests.get(url, params=params, timeout=30)

if response.status_code == 200:
    data = response.json()  # a list with one dict per state/territory
    parsed_data = {}

    for item in data:
        state_code = item['code']  # two-letter code, e.g. 'AL'
        state_name = item['name']  # full state name (not used below)

        state_info = {
            'Type': item['type'],
            'Amount': item['amount'],
            'Count': item['count']
        }

        # key each record by its state code
        parsed_data[state_code] = state_info

    print(parsed_data)
else:
    print(f"Error: {response.status_code} - {response.text}")
#

this is what i coded up

#

the values match, total awarded amount is 391.2 bil
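Instead of a dict of dicts, the response can go straight into a DataFrame with the same `State`/`Amount`/`Count` columns that `df` has later in the thread. A minimal sketch, assuming `response.json()` is a list of dicts with the `code`, `name`, `type`, `amount` and `count` keys the loop above reads; `records` is a hypothetical stand-in for that response, with figures copied from the `df` sample further down:

```python
import pandas as pd

# hypothetical stand-in for response.json()
records = [
    {"code": "AK", "name": "Alaska", "type": "state", "amount": 1.493428e10, "count": 24395},
    {"code": "AL", "name": "Alabama", "type": "state", "amount": 5.712495e10, "count": 83522},
]

# keep the columns of interest and rename them to match the later `df`
aid = (
    pd.DataFrame(records)[["code", "name", "amount", "count"]]
    .rename(columns={"code": "State", "name": "state_name",
                     "amount": "Amount", "count": "Count"})
)
print(aid)
```

Keeping the full `name` as well makes matching against a table indexed by full state names much easier later on.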

left tartan
#

Might be interesting to analyze at the district level, if you can get the tax data in the same granularity.

hollow sentinel
#

and then dive deeper

left tartan
#

Makes sense. One layer at a time.

hollow sentinel
#

so i can plot the scatterplot?

desert oar
desert oar
hollow sentinel
#
merged_data = pd.concat([df, data])
print(merged_data.columns)

import seaborn as sns
sns.scatterplot(data = merged_data, x= "Amount", y="Business Income Taxes")
#

well that's not good.

#

it seems like in my haste to create a merged dataframe, everything turned into NaNs

#

seems like it's a common problem with .concat

left tartan
#

you should print the merged_data and look at the data. You'll see it didn't do what you want. You probably want .merge(), not .concat()

left tartan
#

by default concat stacks frames end to end (top and bottom), while merge puts them side by side (left/right)

hollow sentinel
hollow sentinel
tidal bough
#

the real difference i'd say is that concat is "basic" whereas merge can do an arbitrary sql-like join
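A toy illustration of the difference, with made-up frames: `concat` along rows stacks, `concat(axis=1)` pairs rows by index label (here 0 and 1, regardless of the `key` values), and `merge` matches rows on the values of a column:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["b", "a"], "y": [20, 10]})

# concat along rows (axis=0, the default): frames are stacked, columns unioned
stacked = pd.concat([left, right])
# concat along columns: side by side, rows paired by index label (0 with 0, 1 with 1)
side_by_side = pd.concat([left, right], axis=1)
# merge: SQL-style inner join on the values in "key"
joined = left.merge(right, on="key")

print(stacked)       # 4 rows; y is NaN for left's rows, x is NaN for right's
print(side_by_side)  # row 0 pairs key "a" with key "b": no matching by value
print(joined)        # a -> 10, b -> 20
```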

hollow sentinel
#

so i have to use merge, but idk what arguments to use. currently looking at the doc.

#

does merge work if you don't have identical keys?

left tartan
hollow sentinel
#
>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
>>> df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8
>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7
#

these have identical values tho

#

so would that work in my case?

tidal bough
#

what do you mean you don't have identical keys? how do you know which rows of your two dataframes correspond to each other, then?

hollow sentinel
#
print(data.columns)
Index(['Total Internal Revenue collections', 'Business Income Taxes', 'Total', 'Individual income tax withheld and FICA tax', 'Individual income tax payments and SECA tax [3]', 'Unemployment insurance tax', 'Railroad retirement tax', 'Estate and trust income  tax [4]', 'Estate tax', 'Gift Tax', 'Excise Tax'], dtype='object')
tidal bough
#

so how do you know which row in data corresponds to, say, first row in df?

hollow sentinel
#

i don't 😦

#

that's the problem

tidal bough
#

...so they aren't related? why do you want to merge them, then?

hollow sentinel
#

i'm gonna try something hang on

tidal bough
hollow sentinel
#

hmmm

tidal bough
#

(I suspect that the actual answer is that they are just in the same order - that is, the first row of df should be matched with the first row of data and so on. if that's the case, that's just a pd.merge by the index, which is the default, or equivalently a pd.concat with axis=1.)

hollow sentinel
tidal bough
#

pd.merge(df, data)

hollow sentinel
# tidal bough `pd.merge(df, data)`
Index(['Total Internal Revenue collections', 'Business Income Taxes', 'Total', 'Individual income tax withheld and FICA tax', 'Individual income tax payments and SECA tax [3]', 'Unemployment insurance tax', 'Railroad retirement tax', 'Estate and trust income  tax [4]', 'Estate tax', 'Gift Tax', 'Excise Tax'], dtype='object')
---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
<ipython-input-36-a91edcbf3e7a> in <cell line: 7>()
      5 sns.scatterplot(data)
      6 
----> 7 merged_dataframe = pd.merge(df, data)

2 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py in _validate_left_right_on(self, left_on, right_on)
   1432                 common_cols = left_cols.intersection(right_cols)
   1433                 if len(common_cols) == 0:
-> 1434                     raise MergeError(
   1435                         "No common columns to perform merge on. "
   1436                         f"Merge options: left_on={left_on}, "

MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
#

i need to somehow use these arguments: left_on=None, right_on=None, left_index=False, right_index=False

tidal bough
#

ah, I was thinking of join I think. For merge you want left_index=True, right_index=True to merge by index.

hollow sentinel
#

i see, i'll try that

hollow sentinel
# tidal bough ah, I was thinking of `join` I think. For merge you want `left_index=True, right...
merged_dataframe = pd.merge(df, data, left_index = True, right_index = True)
print(merged_dataframe.head(5))
Empty DataFrame
Columns: [State, Amount, Count, Total Internal Revenue collections, Business Income Taxes, Total, Individual income tax withheld and FICA tax, Individual income tax payments and SECA tax [3], Unemployment insurance tax, Railroad retirement tax, Estate and trust income  tax [4], Estate tax, Gift Tax, Excise Tax]
Index: []
#

hmmmm

#

that's not good either

tidal bough
#

what's df.index?

hollow sentinel
tidal bough
#

and data.index?

hollow sentinel
#

!pastebin

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

hollow sentinel
#

i just want all the columns together side by side

#

from both dataframes

#

idk exactly what's going on

tidal bough
#

since one of the indexes holds strings and the other holds ints, there are no shared index values.
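A small reproduction of the empty result, with made-up frames shaped like the two in this thread: one is indexed by state names, the other has the default integer index, so an index-on-index merge finds no common labels and the inner join keeps nothing:

```python
import pandas as pd

# one frame indexed by state name, the other by the default integer RangeIndex
by_name = pd.DataFrame({"tax": [1.0, 2.0]}, index=["Alabama", "Alaska"])
by_position = pd.DataFrame({"State": ["AL", "AK"], "Amount": [5.0, 6.0]})

# no index label is shared ("Alabama" never equals 0), so the inner join is empty
empty = pd.merge(by_name, by_position, left_index=True, right_index=True)
print(empty.shape)  # (0, 3)
```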

hollow sentinel
#

can i cast?

#

nah casting wouldn't work

tidal bough
#

> i just want all the columns together side by side
then do something like pd.concat(df,data.reset_index(), axis=1)

hollow sentinel
# tidal bough > i just want all the columns together side by side then do something like `pd.c...
<ipython-input-45-ec38f56b9666>:1: FutureWarning: In a future version of pandas all arguments of concat except for the argument 'objs' will be keyword-only.
  pd.concat(df,data.reset_index(), axis=1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-45-ec38f56b9666> in <cell line: 1>()
----> 1 pd.concat(df,data.reset_index(), axis=1)
      2 

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332 
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

TypeError: concat() got multiple values for argument 'axis'
#

hmmm

tidal bough
#

argh, right, pd.concat takes a list of frames as its first argument, so pd.concat([df, data.reset_index()], axis=1)

hollow sentinel
#

AY

hollow sentinel
# tidal bough argh, right, pd.concat also takes a list, so `pd.concat([df, data.reset_index()]...
data = pd.concat([df, data.reset_index()], axis=1)
print(data.columns)
sns.scatterplot(data = data, x = "Amount", y ="Business Income Taxes")
ValueError                                Traceback (most recent call last)
<ipython-input-51-f11a6ce936f0> in <cell line: 1>()
----> 1 data = pd.concat([df, data.reset_index()], axis=1)
      2 print(data.columns)
      3 sns.scatterplot(data = data, x = "Amount", y ="Business Income Taxes")

2 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in insert(self, loc, column, value, allow_duplicates)
   4815         if not allow_duplicates and column in self.columns:
   4816             # Should this be a different kind of error??
-> 4817             raise ValueError(f"cannot insert {column}, already exists")
   4818         if not isinstance(loc, int):
   4819             raise TypeError("loc must be int")

ValueError: cannot insert level_0, already exists
#

whack :(((

#

This error happens when you call reset_index() on a pandas DataFrame and the name the old index would get conflicts with an existing column name.
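The fallback naming that produces this error can be reproduced on a toy frame: the first `reset_index()` adds a column called `index`, the second falls back to `level_0`, and the third has no free default name left. The two usual fixes are discarding the old index with `drop=True` or giving it a non-clashing name first (here a hypothetical `row_id`):

```python
import pandas as pd

frame = pd.DataFrame({"Amount": [1.0, 2.0]})

once = frame.reset_index()   # old index becomes a column named "index"
twice = once.reset_index()   # "index" is taken, so pandas falls back to "level_0"
print(list(twice.columns))   # ['level_0', 'index', 'Amount']

try:
    twice.reset_index()      # both default names are taken
    raised = False
except ValueError as err:
    raised = True
    print(err)               # cannot insert level_0, already exists

# fixes: discard the old index, or give it a non-clashing name first
dropped = twice.reset_index(drop=True)
renamed = twice.rename_axis("row_id").reset_index()
print(list(renamed.columns))  # ['row_id', 'level_0', 'index', 'Amount']
```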

tidal bough
#

well, don't do it twice on the same dataframe.

hollow sentinel
#

oh yeah

#

wait but how do i fix it

#

hmmm

desert oar
#

you can also change the name of the index so it doesn't clash with a column, or just drop it entirely with reset_index(drop=True) if you don't actually need the index

#

however keep in mind that concat will still align rows by index. it's like a "full outer join", if you're familiar with database terminology. so you need to do whatever you need to do in order to make sure that both data frames have the same index
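To see that alignment, a toy example (made-up values) with one frame on the default integer index and one indexed by state names. Dropping the second frame's index pairs rows purely by position, which is only correct if the row order really matches:

```python
import pandas as pd

codes = pd.DataFrame({"State": ["AK", "AL"], "Amount": [1.0, 2.0]})       # index 0, 1
taxes = pd.DataFrame({"Tax": [10.0, 20.0]}, index=["Alaska", "Alabama"])  # index = names

# rows are aligned by index label, like a full outer join: nothing matches, so NaNs appear
outer = pd.concat([codes, taxes], axis=1)
print(outer.shape)  # (4, 3)

# positional pairing: only safe when both frames are in the same row order
positional = pd.concat([codes, taxes.reset_index(drop=True)], axis=1)
print(positional.shape)  # (2, 3)
```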

hollow sentinel
#
data = pd.concat([df, data.reset_index(drop=True)], axis=1)
print(data.columns)
sns.scatterplot(data = data, x = "Amount", y ="Business Income Taxes")
Index(['State', 'Amount', 'Count', 'State', 'Amount', 'Count', 'State', 'Amount', 'Count', 'level_0', 'State', 'Amount', 'Count', 'index', 'Total Internal Revenue collections', 'Business Income Taxes', 'Total', 'Individual income tax withheld and FICA tax', 'Individual income tax payments and SECA tax [3]', 'Unemployment insurance tax', 'Railroad retirement tax', 'Estate and trust income  tax [4]', 'Estate tax', 'Gift Tax', 'Excise Tax'], dtype='object')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-880ecaf4dcf2> in <cell line: 3>()
      1 data = pd.concat([df, data.reset_index(drop=True)], axis=1)
      2 print(data.columns)
----> 3 sns.scatterplot(data = data, x = "Amount", y ="Business Income Taxes")

10 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/construction.py in _sanitize_ndim(result, data, dtype, index, allow_2d)
    696             if allow_2d:
    697                 return result
--> 698             raise ValueError("Data must be 1-dimensional")
    699         if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
    700             # i.e. PandasDtype("O")

ValueError: Data must be 1-dimensional
tidal bough
#

if you don't need data's index one can just do pd.concat([df, data], ignore_index=True, axis=1)

desert oar
#

i see several duplicate columns. duplicate columns are always going to cause a problem, i really wish pandas would just prohibit it

hollow sentinel
#

that's probably bc i ran the code multiple times

#

shit

tidal bough
#

ah, good eye. I suspect that's a result of replacing data with concat(df, data) several times

#

don't reuse your variable names! if you chain a lot of operations, use pandas's method chaining instead; that's cleaner anyway.
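Why re-running the cell piles up columns, sketched with toy frames; the loop simulates executing the same notebook cell twice. Writing the result to a new name leaves the inputs untouched, so re-running is harmless:

```python
import pandas as pd

df = pd.DataFrame({"State": ["AK", "AL"], "Amount": [1.0, 2.0]})
data = pd.DataFrame({"Business Income Taxes": [5.0, 6.0]})

# running "data = pd.concat([df, data], axis=1)" twice: the second run
# concatenates df onto a frame that already contains df's columns
for _ in range(2):
    data = pd.concat([df, data], axis=1)
print(list(data.columns))
# ['State', 'Amount', 'State', 'Amount', 'Business Income Taxes']
print(data.columns.duplicated().any())  # True

# a fresh name keeps the cell safe to re-run
taxes = pd.DataFrame({"Business Income Taxes": [5.0, 6.0]})
for _ in range(2):
    combined = pd.concat([df, taxes], axis=1)
print(list(combined.columns))  # ['State', 'Amount', 'Business Income Taxes']
```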

hollow sentinel
#

omg i didn't realize

#

fuck

#

ugh

#

how do i fix this?

desert oar
#

i don't entirely agree about the method chaining: as soon as you make a typo or introduce a bug, you end up needing intermediate variables anyway to figure out where the bug is. and overall, yes, descriptive names are useful not only for your own comprehension but also because you don't end up re-running the same code over and over and causing problems like this

desert oar
# hollow sentinel how do i fix this?

reload from scratch and start again. if your notebook does not run cleanly from top to bottom, stop whatever you are doing and fix it so that it does

hollow sentinel
#

reload from scratch?

untold bloom
desert oar
#

yes, read csv all over again @hollow sentinel

desert oar
tidal bough
desert oar
#

it ignores the index along the axis that you are concatenating, not "the index"

untold bloom
#

RangeIndex-es the concatenation axis

#

after concatenating
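A quick way to see what `ignore_index=True` does with `axis=1` (toy frames): the concatenation axis is then the columns, so the column labels are replaced by integers. That is consistent with the later seaborn error saying it could not interpret the value "Amount". Resetting each frame's row index keeps the labels while pairing rows by position:

```python
import pandas as pd

df = pd.DataFrame({"State": ["AK", "AL"], "Amount": [1.0, 2.0]})
data = pd.DataFrame({"Business Income Taxes": [5.0, 6.0]})

# axis=1 concatenates columns, so ignore_index=True relabels the COLUMNS 0, 1, 2
relabelled = pd.concat([df, data], ignore_index=True, axis=1)
print(list(relabelled.columns))  # [0, 1, 2]

# keep the column labels; pair rows by position by resetting each row index
aligned = pd.concat([df.reset_index(drop=True), data.reset_index(drop=True)], axis=1)
print(list(aligned.columns))  # ['State', 'Amount', 'Business Income Taxes']
```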

desert oar
#

pandas terminology is an absolute shit storm

tidal bough
#

aaah, right

hollow sentinel
#

ok i restarted and ran all cells

#
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-27da461b5c0f> in <cell line: 3>()
      1 merged_data = pd.concat([df, data], ignore_index=True, axis=1)
      2 print(merged_data.columns)
----> 3 sns.scatterplot(data = merged_data, x = "Amount", y ="Business Income Taxes")

4 frames
/usr/local/lib/python3.10/dist-packages/seaborn/_oldcore.py in _assign_variables_longform(self, data, **kwargs)
    936 
    937                 err = f"Could not interpret value `{val}` for parameter `{key}`"
--> 938                 raise ValueError(err)
    939 
    940             else:

ValueError: Could not interpret value `Amount` for parameter `x`
#
sns.scatterplot(data = merged_data, x = "Amount", y ="Business Income Taxes")
desert oar
hollow sentinel
#

meaning the column doesn't exist in the merged dataframe

#

why does it not exist?

hollow sentinel
desert oar
hollow sentinel
desert oar
agile cobalt
#

Can you show what df and data look like before merging/concatenating them?

hollow sentinel
# agile cobalt Can you show what `df` and `data` look like before merging/concatenating them?
                      Total Internal Revenue collections  Business Income Taxes         Total  Individual income tax withheld and FICA tax  Individual income tax payments and SECA tax [3]  Unemployment insurance tax  Railroad retirement tax  Estate and trust income  tax [4]  Estate tax   Gift Tax  Excise Tax
United States, total                        4.901514e+09            475871099.0  4.321609e+09                                 3.089258e+09                                     1.133996e+09                   7046465.0                6148312.0                        85160093.0  28909393.0  4445883.0  70679117.0
Alabama                                     3.605756e+07              1936430.0  3.356064e+07                                 2.368409e+07                                     9.255441e+06                     73015.0                   3525.0                          544565.0    267069.0    30921.0    262500.0
Alaska                                      6.572445e+06               150882.0  6.323953e+06                                 4.423914e+06                                     1.689140e+06                     11933.0                   2008.0                          196959.0     35788.0       98.0     61723.0
Arizona                                     7.181487e+07              5116779.0  6.473972e+07                                 4.507181e+07                                     1.886567e+07                    129407.0                   2013.0                          670817.0    209065.0    30814.0   1718491.0
Arkansas                                    4.023197e+07              4846558.0  3.446407e+07                                 2.699580e+07                                     6.983808e+06                    141622.0                   2969.0                          339873.0    157175.0   133501.0    630661.0
this is data
#
 State        Amount   Count
0    AK  1.493428e+10   24395
1    AL  5.712495e+10   83522
2    AR  2.984787e+10  114712
3    AS  4.678064e+08     909
4    AZ  1.015892e+11   61249
this is df
agile cobalt
#

opening it in Notepad without word wrap, it looks like the first row's index is "United States, total"?

hollow sentinel
#

.to_list doesn't seem to work

#

AttributeError                            Traceback (most recent call last)
<ipython-input-15-e0a86854e54e> in <cell line: 1>()
----> 1 print(data.to_list())
      2 print(df.to_list())

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5900         ):
   5901             return self[name]
-> 5902         return object.__getattribute__(self, name)
   5903 
   5904     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'to_list'

#
print(data.to_list())
agile cobalt
#

ah, that only exists for series/indexes, you can use to_dict though
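For reference, `to_list()` is defined on Series and Index objects, while a whole DataFrame converts with `to_dict()`. A toy example:

```python
import pandas as pd

df = pd.DataFrame({"State": ["AK", "AL"], "Amount": [1.0, 2.0]})

print(df["State"].to_list())      # ['AK', 'AL']   (Series method)
print(df.index.to_list())         # [0, 1]         (Index method)
print(df.to_dict(orient="list"))  # {'State': ['AK', 'AL'], 'Amount': [1.0, 2.0]}
```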

hollow sentinel
#

oh, ok.

agile cobalt
#

anyway, are you 1000% sure that the data is properly aligned the exact way you want?

hollow sentinel
#

not really

agile cobalt
#

How do you want to align it then? join on the State name/code?

hollow sentinel
#

but the values aren't the same iirc

#

one has abbreviations, the other has full names

agile cobalt
#

you can just map name -> code or vice-versa (though you'll need to get a list of every state's name and code)

hollow sentinel
#

hmmm

hollow sentinel
agile cobalt
#
state_names = {
    "AL": "Alabama",  # this is what AL means right?
    # ...
}
data["state_name"] = data["state"].map(state_names)

pd.merge(left=df, right=data, left_index=True, right_on="state_name")
hollow sentinel
#

oh a dictionary. ok, yeah AL is alabama

agile cobalt
#

you'll probably want to get the dictionary from Google or something instead of writing it yourself, though you might have to parse it from a list into a dictionary yourself
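A sketch of the same idea using the frames as they appear in the samples above: `df` holds two-letter codes in a `State` column, while `data` is indexed by full state names, so the code-to-name column goes on `df`. The three-entry `state_names` dict is only a placeholder for a complete mapping (the `df` sample also contains territories such as AS, American Samoa), and the figures are abbreviated from the samples:

```python
import pandas as pd

# tiny stand-ins shaped like the samples above
df = pd.DataFrame({"State": ["AK", "AL", "AR"], "Amount": [1.49e10, 5.71e10, 2.98e10]})
data = pd.DataFrame(
    {"Business Income Taxes": [150882.0, 1936430.0, 4846558.0]},
    index=["Alaska", "Alabama", "Arkansas"],
)

# placeholder mapping; a real one needs every state and territory
state_names = {"AK": "Alaska", "AL": "Alabama", "AR": "Arkansas"}
df["state_name"] = df["State"].map(state_names)

# codes missing from the dict map to NaN and would silently drop out of the join
print(df.loc[df["state_name"].isna(), "State"].to_list())

merged = df.merge(data, left_on="state_name", right_index=True, how="inner")
print(merged[["State", "Amount", "Business Income Taxes"]])
```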

agile cobalt
#

might work but I wouldn't trust it for this lol

#

maybe the parsing part, but accurately listing all states, not really

hollow sentinel
agile cobalt
#

you will also have to decide wtf to do about the United States, total

#

it should contain the sum of all states in data?

hollow sentinel
#

you're right, idk how to handle it. yep.

#

all the states summed
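One option for the aggregate row is to set it aside before merging and use it only as a sanity check. A sketch on a made-up stand-in for `data`; with the real figures the difference may not be exactly zero if the national total includes amounts not attributed to any single state:

```python
import pandas as pd

# stand-in for `data`: per-state rows plus an aggregate row (values made up)
data = pd.DataFrame(
    {"Business Income Taxes": [100.0, 30.0, 70.0]},
    index=["United States, total", "Alabama", "Alaska"],
)

total_row = data.loc["United States, total"]
states_only = data.drop(index="United States, total")

# sanity check: do the state rows add up to the published total?
print(states_only.sum() - total_row)
```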

agile cobalt
#

should've checked earlier but just to check: data contains exactly one row per state right?

hollow sentinel
#

huh?

agile cobalt
#

is data missing any states?
does data have the multiple lines with the same state?
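Both questions can be checked directly. A sketch with a made-up `State` column and a placeholder set of expected codes (a real check needs the full list of states and territories); passing `validate="one_to_one"` to `pd.merge` is another way to make pandas raise if either side has duplicate keys:

```python
import pandas as pd

df = pd.DataFrame({"State": ["AK", "AL", "AL", "AZ"]})
expected = {"AK", "AL", "AR", "AZ"}  # placeholder; use the full list of codes

duplicated_codes = df.loc[df["State"].duplicated(keep=False), "State"].unique().tolist()
missing_codes = sorted(expected - set(df["State"]))
print(duplicated_codes)  # ['AL']
print(missing_codes)     # ['AR']
```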

hollow sentinel
#

from what i can see, it has all states from alabama to wyoming

#

lemme see

agile cobalt
#

try something like:
totals = data[["Amount", "Count"]].sum()
totals["state"] = None
totals["state_name"] = "United States, total"
totals = pd.DataFrame(totals).transpose()
data = pd.concat([data, totals], axis='rows')

hollow sentinel
# agile cobalt try something like ```py totals = data[["Amount", "Count"]].sum() totals["state"...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-22-d444038c6275> in <cell line: 4>()
      2 print(df.head(5))
      3 
----> 4 totals = data[["Amount", "Count"]].sum()
      5 totals["state"] = None
      6 totals["state_name"] = "United States, total"

2 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
   6128                 if use_interval_msg:
   6129                     key = list(key)
-> 6130                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6131 
   6132             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Index(['Amount', 'Count'], dtype='object')] are in the [columns]"
agile cobalt
river cape
#

Can anyone give me a roadmap of building a chatbot

hollow sentinel
agile cobalt
#

print(repr(dataframe.columns))

hollow sentinel
# agile cobalt print(repr(dataframe.columns))
Index(['Total Internal Revenue collections', 'Business Income Taxes', 'Total', 'Individual income tax withheld and FICA tax', 'Individual income tax payments and SECA tax [3]', 'Unemployment insurance tax', 'Railroad retirement tax', 'Estate and trust income  tax [4]', 'Estate tax', 'Gift Tax', 'Excise Tax'], dtype='object')
agile cobalt
#

in this case we care about data, not df

hollow sentinel
agile cobalt
#

hmm weird

#

and it gave an error when you did data[["Amount", "Count"]]?

hollow sentinel
#

yes

agile cobalt
#

try again, just that on its own, and see if it gives the same error

hollow sentinel
agile cobalt
#

I'm willing to bet that the global state got messed up and data did not contain what it should when you tried it the last time

hollow sentinel
hollow sentinel
# agile cobalt I'm willing to bet that the global state got messed up and `data` was not contai...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-27-fb57bb0e3443> in <cell line: 5>()
      3 print(repr(df.columns))
      4 
----> 5 print(data[["Amount", "Columns"]])

2 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
   6128                 if use_interval_msg:
   6129                     key = list(key)
-> 6130                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6131 
   6132             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Index(['Amount', 'Columns'], dtype='object')] are in the [columns]"
same thing
agile cobalt
#

yeah no clue, try debugging on your own for a bit (as in, try to get the code I suggested working, or do something akin to it) or try a different approach (as in, do something completely different from my suggestion)

hollow sentinel
feral kernel
#

Hey, why can't I convert all my images into one massive tensor, break it down into tiny matrices, and train on them one by one? I tried that, but it keeps giving me... I guess I'll just train one by one.
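Splitting one big array into small pieces and training on them in turn is essentially mini-batch training, which data loaders (and Keras' `fit`) normally do for you. A minimal NumPy sketch of the slicing, with hypothetical 8x8 RGB images; the key point is that the batch axis stays first in every piece:

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8, 3))  # 10 hypothetical 8x8 RGB images
labels = np.arange(10)

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs; the batch axis stays first."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

batch_shapes = [xb.shape for xb, yb in iterate_minibatches(images, labels, 4, rng)]
print(batch_shapes)  # [(4, 8, 8, 3), (4, 8, 8, 3), (2, 8, 8, 3)]
```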

desert oar
#

it says those columns are missing; well, are they? check the data. what columns are there, if not those? how does it differ from what you expected? where could the difference have arisen? and so on

smoky dome
#

Hey
I'm working on an AI for an Ultimate Tic-Tac-Toe game and am currently developing the training method for its neural network (below). The network outputs a policy (probabilities for the individual moves) and a value. I tested it on manually created boards, and it worked with the SGD optimizer. With Adam, however, the policy loss got worse, and the network gave one move a probability of 1 and all others 0 instead of my target distribution. Can anyone help me figure out why?

import torch
from torch.utils.data import DataLoader

def train_neural_net(self, dataset, epoch_start=0, epoch_stop=20, cpu=0):
    torch.manual_seed(cpu)
    self.model.train()
    criterion = AlphaLoss()  # user-defined combined policy + value loss
    optimizer = torch.optim.SGD(self.model.parameters(), lr=0.003)

    # batch_size=1: one (state, policy, value) example per optimizer step
    train_loader = DataLoader(dataset, batch_size=1, shuffle=True, num_workers=0, pin_memory=False)

    for epoch in range(epoch_start, epoch_stop):
        total_loss = 0.0

        for i, data in enumerate(train_loader, 0):
            state, policy, value = data

            optimizer.zero_grad()
            policy_pred, value_pred = self.model(state)

            loss = criterion(value_pred, value, policy_pred, policy)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()  # accumulate so the epoch loss can be reported

queen junco
#

I'm making a learning chatbot that works like Copilot AI: it saves everything the user asks as it learns words. Right now I need code that checks whether something is an English word or not.

serene scaffold
queen junco
serene scaffold
# queen junco No

am I not speaking English when I tell you that I'm eating a croissant?

queen junco
serene scaffold
queen junco
#

No I'm interested in English words

serene scaffold
#

What is an English word?

queen junco
#

I already fixed it

#

Words that originated with English people

serene scaffold
queen junco
#

If you search up fehe it will search your address to on pc

#

It'll also show healthcare

serene scaffold
#

@queen junco you shouldn't drop discord gifts in this server. selfbots will snipe them

queen junco
#

Nuh uh

serene scaffold
#

nuh uh?

queen junco
#

discord.gift/Udzwm3hrQECQBnEEFFCEwdSq

verbal sand
#

I have this Python dataframe and I want to get the percentage gain or loss for a particular company:
% gain or loss = 100 * (Avg Traded Price for Sell - Avg Traded Price for Buy) / Avg Traded Price for Buy

#

Can anyone help me with this one, please? There's a groupby and apply function that I'm struggling to code for this.

serene scaffold
serene scaffold
#

also, in scientific contexts you usually express this as a fraction (0.05 rather than 5), not on a 0 to 100 scale

#

so don't multiply it by 100.

verbal sand
#

Doesn't look like a good approach.

serene scaffold
#

it's a great approach.

#

if you don't want to do it like that, your alternative is to pivot it so that buy and sell are the two columns

#

and then you can do (Avg Traded Price for Sell - Avg Traded Price for Buy) / Avg Traded Price for Buy
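A sketch of the pivot approach, assuming a hypothetical trades table where each row has a `Company`, a `Side` (`Buy`/`Sell`, as in the `pivot_table` call further down) and an `Avg Traded Price` (prices made up). Passing `values=` gives plain `Buy`/`Sell` columns instead of a MultiIndex, and the gain is then column arithmetic with no lambda:

```python
import pandas as pd

# hypothetical trades table
trades = pd.DataFrame({
    "Company": ["A", "A", "B", "B"],
    "Side": ["Buy", "Sell", "Buy", "Sell"],
    "Avg Traded Price": [100.0, 110.0, 50.0, 45.0],
})

# one row per company, one column per side
prices = trades.pivot_table(index="Company", columns="Side",
                            values="Avg Traded Price", aggfunc="mean")

# relative gain as a fraction (multiply by 100 for a percentage)
prices["gain"] = (prices["Sell"] - prices["Buy"]) / prices["Buy"]
print(prices)
```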

verbal sand
#

ya pivoting on buy and sell seems the right way

#

df.groupby('Company').apply(lambda x: )

serene scaffold
#

no.

#

!docs pandas.DataFrame.pivot_table

arctic wedgeBOT
#

DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
verbal sand
#

I'm having a hard time forming the right lambda function for this

serene scaffold
#

the solution does not involve a lambda

#

so if you write one, the solution is automatically wrong.

verbal sand
serene scaffold
#

and for for-loops, it's closer to .95

verbal sand
#

what are the decimal values you're specifying. i don't get that

serene scaffold
#

80 percent of the time, if you think the solution to a pandas problem involves a lambda, that's wrong

#

and 95 percent of the time, if you think the solution to a pandas problem involves a for loop, that's wrong.

#

did you come up with code to do the pivoting?

verbal sand
#

Oh nice! I know using loops on dataframes is a terrible idea...but didn't know about lambda

#

i'm writing it

serene scaffold
#

.apply circumvents pandas optimizations almost as egregiously as for loops do

#

that is, you only get optimizations if you're using pandas' native methods, and if you're looping or applying non-pandas functions/methods, you're not.

verbal sand
#

Got it!

#
table = pd.pivot_table(df, index=['Company'], columns=['Side'], aggfunc="sum")
#

oh wait how do we highlight the code 😅

#

This is what I get. Not sure how can I subtract the buy and the sell column above

agile cobalt
#

MultiIndexes are powerful, but can be way more of a pain than useful most of the time
I'd recommend just doing table.columns = ['buy', 'sell']
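Assigning `['buy', 'sell']` only works when the pivot produced exactly two columns. If several numeric columns were aggregated, each one gets a (column, side) pair. A sketch on a hypothetical trades table with `Company`, `Side`, `Avg Traded Price` and `Quantity` columns (values made up), showing two ways to get flat columns:

```python
import pandas as pd

trades = pd.DataFrame({
    "Company": ["A", "A", "B", "B"],
    "Side": ["Buy", "Sell", "Buy", "Sell"],
    "Avg Traded Price": [100.0, 110.0, 50.0, 45.0],
    "Quantity": [10, 10, 5, 5],
})

table = pd.pivot_table(trades, index=["Company"], columns=["Side"], aggfunc="sum")
print(table.columns.tolist())  # MultiIndex: one (column, side) pair per combination

# option 1: select one top-level column to get plain Buy/Sell columns
price = table["Avg Traded Price"]
# option 2: flatten every (column, side) pair into a single string label
table.columns = [f"{col}_{side}" for col, side in table.columns]
print(table.columns.tolist())
```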

serene scaffold
serene scaffold
#

@verbal sand did you get it?

shy rock
#

Does anyone has a script to find ( Extract) all the tables in a sql query including nested sub querry tables

frosty goblet
#

hey guys how does one plot an x axis and y axis on matplotlib?

silent patio
#

@frosty goblet you can also use pandas which uses matplotlib as its core. pandas should be easier to use with a lot of examples online (w3schools and such)

frosty goblet
hoary jay
#

If i have two n-dimensional matrices (say embeddings of words) what's the best way to calculate correlation between them? Something better than cosine similarity?

desert oar
weary crown
#

https://hastebin.com/share/ufujobacah.python

running a vgg19 on google quickdraw dataset - when i train my code in model.fit, it says that it has shape 999,224,224, 3 and expected 224,224,3
i understand what that means - that i have to separate my individual images
but why can't it train on the whole dataset?
after all, x_train and y_train include all 999 training instances ...
why do i have to separate them each and how
isnt the whole model trained at once :9
:(

#

model.input_shape is (224 ,224, 3) which makes sense since that's the dimensions of my image - but when im fitting/training the model, shouldnt that input be (999, 224, 224, 3) since there are 999 images to be trained??

#

its weird that during training it wants 1 image ... instead of all of them ... and im sure model.fit doesnt go in a loop lol

final spruce
#

What is meant by ∆ in the neural network's backward phase?

agile cobalt
weary crown
agile cobalt
#

what is that 999 then, if not the batch size?

weary crown
#

batch size rn is 32 - the thing i dont understand is why the model seems to want to train on 1 image at a time instead of the whole x_train

agile cobalt
#

what do you think that the batch size is in first place?

#

and what's its purpose

weary crown
agile cobalt
#

and how many examples are you trying to fit into 1 pass?

weary crown
#
# Train the model
batch_size = 32
epochs = 20

model.fit(
    x=x_train,
    y=y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test)
)
#

x_train shape is (999, 224, 224, 3) bc it has all 999 training things in it

#

but model.fit wants (224, 224, 3) so i do train in a for loop from 0 -> 999 or what bc im pretty sure u dont do that

agile cobalt
#

never mind, I thought that you had to break it down into batches yourself but looks like keras does that for you

#

yeah no clue, paste the actual traceback?

weary crown
#
Epoch 1/20
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-086b6a8fd937> in <cell line: 11>()
      9 # Make sure the input shape matches the model's expected input shapeTd
     10 
---> 11 model.fit(
     12     x=x_train,
     13     y=y_train,

1 frames
/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py in tf__train_function(iterator)
     13                 try:
     14                     do_return = True
---> 15                     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16                 except:
     17                     do_return = False

ValueError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1377, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1360, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1349, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1126, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py", line 298, in assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(None, 999, 224, 224, 3)
agile cobalt
#

you should have mentioned those extra Nones

#

mind checking what x_train.shape is exactly?

weary crown
#

(9, 999, 224, 224, 3)

#

tbh idk what the 9 is doing there

agile cobalt
#

yeah where tf did that 9 come from x-x

weary crown
#

so js get rid of it?

agile cobalt
#

right now it thinks that you have 9 items, each with shape (999, 224, 224, 3)

weary crown
#

train test split somehow fucked up my stuff

agile cobalt
#

did you stack the different images instead of concatenating?

#

what was the shape before that split

weary crown
#

(12, 1000)

#

js get rid of the 9 somehow or

agile cobalt
#

right now you have something like cat [1, 2, 3] dog [4, 5, 6] bird [7, 8, 9] instead of cat 1 cat 2 cat 3 dog 4 dog 5 dog 6 bird 7 bird 8 bird 9, and it's splitting like
```
Train
cat [1, 2, 3]
dog [4, 5, 6]

Test
bird [7, 8, 9]
```

weary crown
#

ohh

#

so removing the 9 will fix that ok lemme try that

agile cobalt
#

not just removing it in any arbitrary way, but reshaping it at the right place, or changing the way you're creating it in the first place

weary crown
#

bc like i had the data originally correct

#

img_data was (12, 1000) - 12 classes each with 1k images

#

but after i split, a 9 popped up from nowhere

agile cobalt
#

it probably split 9 classes into train, 3 classes into test

weary crown
#

ohh yeah it did do that

agile cobalt
#

it shouldn't have a "classes" axis though

#

it should be just (12000, 224, 224, 3) pre-split

weary crown
#

ohh

#

oh wait

#

@agile cobalt img_data is (999, 224, 224) pre split

agile cobalt
#

did you do something like thing = thing[0, ...]

weary crown
#

no

#

@agile cobalt img data shape works actually

#

12 classes, 999 instances, 224x224 per instance

#

img_classes pre split is 12x1000 which works

#

so after train test split its fucking up somehow

#
```
from sklearn.model_selection import train_test_split

# split data into training and testing
x_train, x_test, y_train, y_test = train_test_split(
    img_data,
    img_classes,
    test_size=0.2,
    random_state=42,
    shuffle=True
)
```
this is how i split @agile cobalt
agile cobalt
#

Again, for the last time, you should NOT have the 12 in img_data

weary crown
#

pre split:
img data (12, 999, 224, 224)
img classes (12, 1000)

post split:
x_train (9, 999, 224, 224, 3, 3)
x_test (3, 999, 224, 224)
y_train(9, 1000, 3, 3)
y_test (3, 1000)

#

9 + 3 = 12 so i get that ig

agile cobalt
#

The 999 and 1000 should be the same number (it doesn't matter whether it's 999, 1000, or any other number). Right now you have 999 images per category but 1000 labels per category

weary crown
#

idk why its off by 1

agile cobalt
#

It should be:
```
pre split:
img data (12000, 224, 224)
img classes (12000)

post split:
x_train (9000, 224, 224, 3)
x_test (3000, 224, 224, 3)
y_train (9000)
y_test (3000)
```
and make sure that it is shuffled properly (primarily, make sure that y_test contains all 12 values at least once)
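A minimal sketch of getting from a (classes, images, H, W) array to that layout. It assumes the images sit in one NumPy array shaped like the thread's img_data (toy sizes here: 12 classes x 50 random 8x8 "images" instead of 1000 of 224x224) and that the model wants three identical channels for grayscale drawings. It mirrors the thread's train_test_split call (test_size=0.2) and adds stratify=y, a real train_test_split parameter that keeps every class in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in (assumption): 12 classes x 50 grayscale 8x8 images with random
# pixels, instead of 12 x 1000 images of 224x224.
n_classes, per_class, h, w = 12, 50, 8, 8
rng = np.random.default_rng(42)
img_data = rng.random((n_classes, per_class, h, w))  # (classes, images, H, W)

# Collapse the class axis: (12, 50, 8, 8) -> (600, 8, 8), one row per image.
x = img_data.reshape(-1, h, w)
# One label per image, in the order reshape produced: fifty 0s, fifty 1s, ...
y = np.repeat(np.arange(n_classes), per_class)

# Assumed model input is (H, W, 3): copy the gray channel into 3 channels.
x = np.repeat(x[..., np.newaxis], 3, axis=-1)  # (600, 8, 8, 3)

# shuffle=True mixes the classes; stratify=y keeps all 12 classes in both splits.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, shuffle=True, stratify=y
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
# (480, 8, 8, 3) (120, 8, 8, 3) (480,) (120,)
```

Because reshape walks the class axis first, np.repeat(np.arange(n_classes), per_class) yields labels in exactly the same order as the flattened images, which is what keeps each image paired with its class.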

#

other than that, good luck getting it to the right format

weary crown
#

oh wait i think i might have it

#

but why is there a 999 instead of 1k lol @agile cobalt

#

i flattened it so that its like (12000, 224, 224)

#

but its (11988, 224, 224)

#

999 * 12

#

idk why its 999

#

shit is dumb asl

#

literally 0 clue how its 999 data but 1000 targets

#

i literally imported the data to be 1000 how tf are there 999???

#

now when i ask for 1001 drawings i get 1001 labels and 1000 data

#

wtf that's so weird, now the other one is wrong?? what is going on ...

#

anyone know how this happens lol weirdest error ive ever faced

#

@agile cobalt ran drawing_count on it and it seems that every category loads in 1000 images like expected

#

so idk where the 999 data is from
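One way to make a 999-vs-1000 mismatch impossible is to derive the labels from the images that were actually loaded, instead of building the label list separately. A sketch, assuming the drawings can be collected into a dict mapping class name to an array of images; build_dataset and that dict layout are hypothetical, not from the thread. The per-class print shows which class comes up short.

```python
import numpy as np


def build_dataset(images_by_class):
    """Flatten {class_name: array of images} into (x, y, class_names).

    `images_by_class` is a hypothetical layout; adapt it to how the drawings
    are actually loaded.
    """
    class_names = sorted(images_by_class)
    xs, ys = [], []
    for label, name in enumerate(class_names):
        imgs = np.asarray(images_by_class[name])
        print(f"{name}: {len(imgs)} images")  # spot a class that comes up short
        xs.append(imgs)
        # Labels are generated from the images actually loaded, so the two
        # counts cannot drift apart the way a separate label list can.
        ys.append(np.full(len(imgs), label))
    return np.concatenate(xs), np.concatenate(ys), class_names


if __name__ == "__main__":
    toy = {"cat": np.zeros((4, 8, 8)), "dog": np.zeros((3, 8, 8))}  # dog is one short
    x, y, names = build_dataset(toy)
    print(x.shape, y.shape, names)
```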

feral kernel
#

Yo, is mojo any good for ML?

agile cobalt
#

I would wait until they release a 1.0 version before really trying to use it - it is still missing a lot of core, essential features right now

left tartan
lapis sequoia
#

I need some ML ideas

serene scaffold
lapis sequoia
#

no

serene scaffold
lapis sequoia
#

No

#

I do not want to keep doing finance bro stuff

#

maybe optimization

hard pebble
#

I'm having so much trouble getting TensorFlow to work on my MacBook Air M1. Despite installing it from the terminal with 'pip install tensorflow', when I import tensorflow in my program it always says it cannot find the module tensorflow

#

Is there a different or more specific way I have to install tensorflow?

serene scaffold
hard pebble
serene scaffold
hard pebble
#

For the code editor - /Users/'name'/Desktop/pythonProject1/venv/bin/python
For the terminal - /Users/'name'/ENTER/bin/python

serene scaffold
#

so you need to install tensorflow in the environment that is being used by the editor
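In practice that means running pip through the editor's interpreter (the venv path printed above, followed by -m pip install tensorflow). The same idea from inside Python, as a sketch: sys.executable is the interpreter currently running, and -m pip makes pip install into that interpreter's environment. pip_install_command is a hypothetical helper name.

```python
import sys


def pip_install_command(package):
    """Return a pip command that installs `package` for *this* interpreter.

    `sys.executable -m pip` targets the environment of the Python that runs
    this code, rather than whichever `pip` happens to be first on PATH.
    """
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print("Interpreter in use:", sys.executable)
    print("Install with:", " ".join(pip_install_command("tensorflow")))
    # To run it directly:
    # import subprocess; subprocess.run(pip_install_command("tensorflow"), check=True)
```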

hard pebble
#

Ahhh okay. Thanks for your help!

serene scaffold
#

you can delete import sys; print(sys.executable); exit()

#

thank you for following the instructions exactly (a lot of people do not)

hard pebble
#

I'll definitely be back if I still run into issues lol

#

Seems Google Colab is pretty useful too. It's working fine on Google Colab

quaint loom
desert oar
weary crown
#

ok guys
finished writing the code and the Flask backend but it shows this

googling shows it's some error because of different dependencies, but when i update tensorflow in pycharm vs google colab, the former updates to 2.15.1 while the latter only upgrades to 2.14.1
and i'm not sure that's the issue
but it's prob some discrepancy? idk
how do i fix the deps diff
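To see exactly where the two setups differ, one option is to print the installed versions in each environment and compare them. A sketch using the standard library's importlib.metadata; the package list is an assumption, so adjust it to the project's imports. If TensorFlow turns out to be the only difference, installing the same version in both places (for example pinning tensorflow==2.14.1, the version Colab ended up with) removes that variable.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run once with PyCharm's interpreter and once in Colab, then diff the output.
    for name, ver in installed_versions(["tensorflow", "keras", "numpy", "Flask"]).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```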

#

maybe smth is going on tho

#

i can link the other code if yall want

pine prawn
#

Hello there. I'm facing compatibility issues.

I'm expected to run two models in the same Conda environment. However:

Model A requires TensorFlow 1.x, so my CUDA version must not be above 10.0 to be compatible (tf 1.15 - CUDA 10.0)
(according to https://www.tensorflow.org/install/source#gpu)

Model B requires PyTorch, and the latest version of PyTorch that supports CUDA 10.0 is PyTorch v1.2.0
(according to https://pytorch.org/get-started/previous-versions/)

However, there are a lot of PyTorch versions that support CUDA 10.2

My question is: is it okay for me to use CUDA 10.2 instead? Will TensorFlow 1.15 support it despite the website seemingly suggesting otherwise?

Downloading CUDA and its cuDNN takes a lot of data, so I want to be a bit more careful before downloading them (and I think they may corrupt my other, currently working environment).

pine prawn
#

Are you sure? PyTorch 1.2.0 sounds very outdated

sage bolt
#

CUDA 10.0 would be the best option for compatibility with both TensorFlow 1.15 and PyTorch 1.2.0.

pine prawn
#

In fact, model B's official documentation suggests that PyTorch needs to be >= 1.6.0

pine prawn
#

But PyTorch 1.6.0+ needs CUDA 10.2+

sage bolt
#

idk, you have to choose one of these options:

  • PyTorch 1.2.0 (for CUDA 10.0)
  • TensorFlow 2.3 (for CUDA 10.1)
  • TensorFlow 2.4 (for CUDA 11.0)
  • TensorFlow 2.5 (for CUDA 11.2)
  • TensorFlow 2.6 (for CUDA 11.2)
  • TensorFlow 2.7 (for CUDA 11.2)
wheat thicket
#

hlo i am new here

long canopy
#

what do you guys follow for "serious" AI-related news? By serious I mean, stuff that gets into the details, aimed at programmers

long canopy
#

there are probably new journals popping up nowadays and that sort of thing

cosmic willow
#

i just want to play around with AI. is there some module or something where i just give it a bunch of inputs and outputs with a value describing how good each step was, and after some time it returns a network?

frail mauve
#

I have to build 3 endpoints for work:

  1. Ring try-on
    Input is a hand image
    I know how to identify hand landmarks with Google's MediaPipe
    I just don't know how to size the ring image according to the hand and stick it to a particular finger.

  2. Bracelet try-on
    Input is a hand image
    How do I identify the wrist?
    How do I size the image according to the wrist size and stick it on the wrist?

  3. Earring try-on
    Input is a face image
    How do I identify the ear?
    How do I size the image according to the ear size and stick it on the ear?

Please help me with libraries etc...

frail mauve
#

Please tag me

golden ridge
#

hello guys, i have a linear regression model and i want a graph to plot the accuracy, but the problem is that my dataframe has several columns. how would i plot them in a graph and then draw the linear regression line?

serene scaffold
golden ridge
serene scaffold
#

For future reference, any time you need help in connection to a dataframe, you should show it. Because a dataframe could have unlimited possible schemas.

golden ridge
# serene scaffold Can you show the dataframe?
```
0    2    4.0    4.0    1.0    83.391    3    17.8    34.4    52.0    0
1    2    5.0    5.0    1.0    83.104    3    17.8    34.0    52.0    0
2    2    6.0    6.0    1.0    82.843    3    17.7    33.7    52.0    0
3    3    3.0    11.0    2.0    83.460    3    18.0    33.1    52.0    0
4    3    4.0    12.0    2.0    81.994    3    18.0    33.0    51.0    0
...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
931    2    6.0    6.0    1.0    80.312    3    16.4    26.1    72.0    0
932    1    2.0    9.0    2.0    83.470    3    16.4    25.4    73.0    1
933    2    2.0    2.0    1.0    81.678    3    16.4    26.3    75.0    0
934    2    4.0    4.0    1.0    80.535    3    16.4    26.5    76.0    0
935    2    6.0    6.0    1.0    80.380    3    16.3    26.0    73.0    0
936 rows × 10 columns
```
pine void
#

hello everyone. I just ran a TensorFlow model that took an hour on Google Colab. How can I save it so I do not have to re-run my code? Normally Google Colab makes me re-run all my code

pine void
serene scaffold
dense crane
serene scaffold
lapis sequoia
#

Hello

dense crane
#

Nad not this is an issue

craggy crescent
#

Hey everyone! I'm new here
I need to know whether data engineering is considered a good career or not. I am interested in building databases and cleaning data. Not sure if there are enough job offers for DEs. Could someone please give me more details?

golden ridge
long canopy
#

what are some AI-related news sources targeted at comp sci people and programmers?

agile cobalt
#

I personally follow The Batch newsletter and these two YouTube channels: bycloudai and AI Explained


jaunty mural
#

Hello, I haven't looked at the server in a while, especially this thread. Are there any recent trends or big improvements in data analysis tools for Python?

agile cobalt
#

I guess that pola.rs has been gaining traction (but is still not anywhere near as popular as pandas)
other than that, all the generative AI stuff I guess

pearl barn
#

For people who use Jupyter: how do you save your progress? Can I save the same project under different names? And what should I do when the kernel is dead??

left tartan
#

Just restart the kernel.

lapis sequoia
#

Do any of you mess with optimization?

feral kernel
#

Hi,
```
def forward(self, fft_result_tensors):
    # Forward pass through the network

    # Use MPS if available (optional)
    try:
        import torch.backends.mps
        is_available = True
    except ImportError:
        is_available = False

    if is_available:
        device = "mps"
        fft_result_tensors = [tensor.to(device) for tensor in fft_result_tensors]
        self.to(device)
```
```
 78     self.to(device)
 80 # Compress the input before feeding it to the model

AttributeError: 'list' object has no attribute 'to'
```
Why am I getting this error? I already looped over each tensor in the list
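The traceback excerpt does not show which line raised, so this is a guess rather than a diagnosis: the error appears when .to() is called on something that is itself a list, for example if some entries of fft_result_tensors are lists of tensors rather than tensors. A sketch of a helper that walks nested lists, tuples and dicts and moves every tensor it finds (move_to_device is a hypothetical name, not a PyTorch API). Separately, import torch.backends.mps succeeds on recent PyTorch builds whether or not an MPS device exists, so torch.backends.mps.is_available() is the check that reflects the hardware.

```python
def move_to_device(obj, device):
    """Recursively call .to(device) on every tensor inside nested containers.

    A flat comprehension like [t.to(device) for t in items] fails with
    "'list' object has no attribute 'to'" as soon as one item is a list.
    """
    if isinstance(obj, list):
        return [move_to_device(item, device) for item in obj]
    if isinstance(obj, tuple):
        return tuple(move_to_device(item, device) for item in obj)
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    return obj.to(device)  # a torch.Tensor, or anything else with a .to method


# Usage inside forward(), assuming torch is imported:
# device = "mps" if torch.backends.mps.is_available() else "cpu"
# fft_result_tensors = move_to_device(fft_result_tensors, device)
# self.to(device)
```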

proven ruin
#

hey guys

#

i've been working on a personal project for a while now

#

it's a simple weight tracker app

#

I would like my program, whose UI is written in tkinter, to have a built-in graph inside it

#

what library do you recommend to plot dates and weight?

hard pebble
#

Are there any benefits to using TensorFlow over PyTorch, or is it all preference?

versed pilot
trim saddle
versed pilot
trim saddle
#

A cleaner way would be to move certain functionality into a .py file once it's tested/developed in the notebook, and import it from there afterwards.
Then your logic is defined in .py files, which don't carry all the notebook metadata that clutters diffs

#
#

i can also recommend pyscaffold + the data-science extension, which sets up a whole project template for that workflow

#

(also developed by Florian Wilhelm)

umbral charm
#
```
def secant(f, a, b, tol):
    x0 = a
    x1 = b
    n = 0
    while abs(x0 - x1) > tol:
        n = n + 1
        x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0))
        x0 = x1
        x1 = x2
        print(x0)
    return x2, n
```
#

this is my secant method. When i call it with a function and some tolerance, it prints the root twice, i.e. one extra iteration, but i can't figure out why

#

```
1.0319286204529856
0.9929544004363596
1.0004259265358206
1.0000054550936772
0.9999999957379389
1.0000000000000426
1.0
1.0
(1.0, 8)
```

trim saddle
#

Print the difference maybe too, i guess when x0 is 1, x2 still has a bigger value, which causes the while to eval to true an extra time?

umbral charm
#

that seems to be the problem

#
```
1.1 1.0319286204529856 1.0319286204529856
1.0319286204529856 0.9929544004363596 0.9929544004363596
0.9929544004363596 1.0004259265358206 1.0004259265358206
1.0004259265358206 1.0000054550936772 1.0000054550936772
1.0000054550936772 0.9999999957379389 0.9999999957379389
0.9999999957379389 1.0000000000000426 1.0000000000000426
1.0000000000000426 1.0 1.0
1.0 1.0 1.0
```
#

this is x0 x1 and x2 printed out

#

So is my root found when x1 and x2 are 1 or when all 3 are one

trim saddle
#

x0 and x1 have to be 1 for your while loop to stop, and then x2 is 1 too, because x1 = x2
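Put differently, the loop can only stop after a step smaller than tol. Judging from the printout, the final real correction (1.0000000000000426 to 1.0) was still above tol, so one more step was needed to confirm convergence, and that step landed on the same 1.0. A sketch (not the poster's code) that also checks the residual f(x2), so it can stop as soon as it hits an exact root, and guards against the zero denominator when f(x0) == f(x1). The ftol and max_iter parameters are additions of this sketch.

```python
def secant(f, x0, x1, tol=1e-12, ftol=0.0, max_iter=100):
    """Secant method; returns (root, iterations).

    Stops when the last step |x2 - x1| is below tol OR the residual |f(x2)|
    is at most ftol. With the default ftol=0.0 the residual test only fires
    on an exact root, which saves the extra zero-length confirmation step.
    """
    f0, f1 = f(x0), f(x1)
    for n in range(1, max_iter + 1):
        if f1 == f0:
            raise ZeroDivisionError("f(x0) == f(x1): secant step is undefined")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        f2 = f(x2)
        if abs(x2 - x1) < tol or abs(f2) <= ftol:
            return x2, n
        # Shift the window and reuse function values instead of recomputing.
        x0, f0 = x1, f1
        x1, f1 = x2, f2
    raise RuntimeError(f"no convergence within {max_iter} iterations")
```

For example, secant(lambda x: x - 1, 0.0, 2.0) returns (1.0, 1): the first step lands exactly on the root and the residual check stops there, whereas with only the step-size test it would take a second, zero-length step to confirm.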

south gull
#

If you just want to serve models, whatever is available and most convenient on your target platform

versed pilot
lapis sequoia
south gull
# versed pilot But isn't their hardware TPU optimised for Tensorflow? There was a chart about G...

There are a couple of distinctions you need to make. Most importantly, whether you care about training or serving models. For the former, you care about the autograd engine. PyTorch does support a wide range of accelerators, and TPU is one of them if memory serves right (easy to check yourself, don't take my word for it). If you care about serving models, then the platform is much more important. What you want to do there is compile your model with a graph compiler. For example, if you deploy on NVIDIA Jetson devices, you'll want to use TensorRT

#

You can convert both PyTorch and TensorFlow models, with their loaded weights, to various formats such as ONNX

left tartan
long canopy
#

what graph modeling options do I have if I need to:

  1. Name nodes,
  2. Name edges/links
  3. Set edge directions
  4. Navigate a graph of more than 1000 nodes
#

i'm most of all interested in navigation, i.e., if there exists a ready-made interface to move about such huge graphs

safe ermine
#

what is the best way to learn ai programming in python from scratch?

left tartan
safe ermine
#

the most advanced thing ik rn is like the basics of arrays of records

left tartan
safe ermine
#

what kinda things come under the fundamentals

#

bc i’m not sure where to go from where i am now

#

bc currently i’m just following the stuff i need for school

left tartan
shell ruin
#

I'm doing an ML refresher and starting off with the basic housing price predictor. I was playing around with seaborn for some EDA and got an empty-ish heatmap. Does anyone have any thoughts as to why?

long canopy
#

information information information, yeah yeah. informatiooooooooooooooooooooooooooooooooooooon (and data), yeah

#

turns out Gephi is great for graph visualization
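Gephi can import graphs from spreadsheets, which covers the named nodes, named edges and edge directions from the list above. A sketch, assuming Gephi's CSV import recognizes Id/Label columns for nodes and Source/Target/Type/Label columns for edges (worth checking in the import dialog of your Gephi version); write_gephi_csv and the sample graph are hypothetical.

```python
import csv
import io


def write_gephi_csv(nodes, edges, nodes_file, edges_file):
    """Write a named, directed graph as two CSV tables for Gephi's import.

    nodes: {node_id: node_label}
    edges: [(source_id, target_id, edge_label), ...], all treated as directed
    """
    node_writer = csv.writer(nodes_file)
    node_writer.writerow(["Id", "Label"])
    for node_id, label in nodes.items():
        node_writer.writerow([node_id, label])

    edge_writer = csv.writer(edges_file)
    edge_writer.writerow(["Source", "Target", "Type", "Label"])
    for source, target, label in edges:
        edge_writer.writerow([source, target, "Directed", label])


if __name__ == "__main__":
    nodes = {"a": "Alice", "b": "Bob", "c": "Carol"}
    edges = [("a", "b", "knows"), ("b", "c", "manages")]
    nodes_buf, edges_buf = io.StringIO(), io.StringIO()
    write_gephi_csv(nodes, edges, nodes_buf, edges_buf)
    print(nodes_buf.getvalue())
    print(edges_buf.getvalue())
```

For real files, open them with newline="" as the csv module recommends, e.g. open("edges.csv", "w", newline="").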

left tartan
shell ruin
left tartan
left tartan
#

That last one is a correlation matrix. The .corr is the important piece missing.

shell ruin
#

So I did try it using .corr(), assuming that I would only get numeric columns, but using .corr() I was getting an error as it was trying to convert a string field to a float. Which of course would not work.

#

Huh, okay, I got it. So by not relying on .corr() to select the numeric columns and doing it manually, I was able to recreate the matrix

#

Just in case someone else ever needs an example
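The original example was not preserved, so here is a sketch of the manual fix described above (select the numeric columns yourself, then call .corr()), on a small made-up frame with hypothetical column names. Since pandas 2.0, DataFrame.corr() no longer drops non-numeric columns by default (numeric_only defaults to False), which is why the string field triggered a float-conversion error.

```python
import pandas as pd

# Toy housing-style frame (made-up numbers) mixing numeric and text columns.
df = pd.DataFrame({
    "price": [200_000, 250_000, 310_000, 180_000],
    "rooms": [3, 4, 5, 2],
    "area_m2": [90, 120, 150, 70],
    "city": ["A", "B", "A", "C"],  # text column: plain df.corr() raises on pandas >= 2.0
})

# Select the numeric columns explicitly, then correlate only those.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# Shortcut with the same effect on pandas >= 1.5: df.corr(numeric_only=True)
# The matrix can then go into a heatmap, e.g. seaborn.heatmap(corr, annot=True).
```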

left tartan
#

There are other methods for correlation of categorical fields, but yah, great!

shell ruin
left tartan
shell ruin
#

Sounds good, thanks!

quaint loom
desert oar
#

that's also maybe too much code to expect someone to debug without additional context

long canopy
#

Any communities focused on local LLMs?

serene scaffold
lapis sequoia
#

What do you use for big data? And is all of this actually a job?

quaint loom
# desert oar it's not clear to me what you're asking... what's a "random forest test"?

So, basically a “random forest” is a technique in machine learning where a group of decision trees work together to improve predictions. Kinda like a statistical test. So my question is why my code is not grouping my locations. I have been sampling at several areas, and each area has 4 sites, so I have grouped them together. But it seems like my code has some mistakes, lumping all the areas together instead of keeping them grouped. I hope this makes it clearer ☺️

desert oar
quaint loom
desert oar
#

i see... i wouldn't really say that's "kinda like a statistical test"

#

methodological issues aside, you should just be able to fit the model in scikit-learn and then extract feature importance scores

#

i don't know how this relates to grouping of locations. i think you might need to explain your actual goal more

quaint loom
#

The method itself doesn't have anything to do with the grouping; rather, the samples were collected at different locations. Let's say that at restored area 1, parameter X1 is significantly influenced by the dependent variable, but at unrestored area 2, X2 is. So I want the random forest to run on each group separately (Restored area 1, Restored area 2, Unrestored area 1 and Unrestored area 2). Maybe that is where the problem is: the machine learning method isn't running on each group individually but on the entire dataset.

desert oar
quaint loom
#

Maybe I am using the terms differently. By dependent variable I mean: CH4 (methane) flux is the dependent variable, while Total Nitrogen (TN), Total Phosphorus (TP), and chlorophyll-a (chl a) are independent variables. The Random Forest model would help in understanding how well these variables predict CH4 flux and which of them is most important for the prediction.
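Running the random forest on each area separately comes down to a groupby loop that fits one model per group. A sketch, assuming a long-format DataFrame with one row per sample, an area column and the variables named above. The column names (area, TN, TP, chl_a, CH4_flux) and the random numbers are placeholders, not the poster's data, so the printed importances mean nothing here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical column names; synthetic data only so the sketch runs end to end.
FEATURES = ["TN", "TP", "chl_a"]
TARGET = "CH4_flux"
GROUP = "area"

rng = np.random.default_rng(0)
areas = ["Restored 1", "Restored 2", "Unrestored 1", "Unrestored 2"]
df = pd.DataFrame({
    GROUP: np.repeat(areas, 30),  # 30 samples per area
    "TN": rng.random(120),
    "TP": rng.random(120),
    "chl_a": rng.random(120),
})
df[TARGET] = rng.random(120)

importances = {}
for area, sub in df.groupby(GROUP):
    # One independent model per area: it only ever sees that area's rows.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(sub[FEATURES], sub[TARGET])
    importances[area] = pd.Series(model.feature_importances_, index=FEATURES)

# Columns are areas, rows are features.
print(pd.DataFrame(importances).round(3))
```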

odd meteor
trim saddle
rapid charm
#

Hi, I am very new to using Python through the command prompt and I am having trouble downloading pytorch. I keep getting the error: ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch
How can I resolve this error?

shut girder
#

Hello, I'm currently very confused. Why do people say that exploratory data analysis is a step in data analysis? I thought exploratory data analysis is just an approach to data analysis and is the "full picture," meaning that once conclusions are drawn from EDA, those conclusions can be used for decisions. Am I getting the process of data analysis wrong?

trim saddle
rapid charm
#

Ok, will do. Thank you.

trim saddle
snow fog
#

is there a library that will extract the text from a PDF exactly as it is laid out? Let me share an example PDF. In the PDF below, columns are space-separated, but after using an OCR library some columns come out merged and some are split onto new lines. I could use a PDF extraction library, but it discards the spaces. The first 3 columns and the last 5 are not a problem for me, but the middle 3 are. If there is a library that extracts the text as-is, then I can split out the columns based on the number of characters each one takes.

trim saddle
tidal bough
deft spire
#

Can I ask for help with Unity ML-Agents here? Technically the backend is in Python

past meteor
# quaint loom Maybe I am using the term different. I am using the dependent variable as : CH4 ...

In principle you can get the feature importance out of your model but if your variables are correlated you can't really say anything interesting.

Example:

Most methods have a more or less built-in regularisation method. If variable A is perfectly correlated with variable B, and B is in turn perfectly correlated with the dependent variable, you'd split on A first and see that B no longer gives extra information for predicting the dependent variable.

To make it worse, you run your experiment again with a different seed, now it splits on B first.

Can you see what the problem is? The feature importance method you use will claim A is important and B has 0 importance and vice versa in the second run. This is patently untrue in reality, both of them are highly predictive.

This is what I meant with "you can't say anything interesting". At best you can say that your specific model instance holds these variables as important/unimportant but this type of claim has a very low internal validity let alone external validity.
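The A/B example above can be reproduced with scikit-learn. A sketch on synthetic data, assuming B is an exact copy of A and the target depends on A plus noise. scikit-learn's decision trees visit features in a random order at each split, so ties between equally good splits are broken by random_state; with a single split (a stump) the whole importance goes to whichever copy wins, and changing the seed can flip it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
a = rng.random(200)
X = np.column_stack([a, a])               # feature B is an exact copy of feature A
y = 3 * a + rng.normal(0, 0.1, size=200)  # target depends on A (equivalently B)

for seed in range(4):
    # max_depth=1 keeps it to one split, so the tie-break is easy to see:
    # one feature gets importance 1.0 and the other gets 0.0.
    tree = DecisionTreeRegressor(max_depth=1, random_state=seed).fit(X, y)
    print(seed, tree.feature_importances_)
```

Which seeds pick A and which pick B is not something to rely on; the point is that no single run reports "A and B are equally predictive", even though they are.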

pearl barn
#

Does anyone know a good Discord server for learning data analysis, finding projects to apply what I learn, and asking questions about problems I run into in my projects or code??

pearl barn
#

I'm studying Python for data analysis from freeCodeCamp. Is that a good source to start from??

wooden sail
#

my turn to ask for help: what's the proper way of managing/changing the BLAS backend for numpy and similar packages on Windows? ideally with conda

the context: a new AMD CPU for which MKL doesn't perform so well, and the old MKL flag trick was patched around 2021. because of that, i want to use OpenBLAS as a backend instead, but ideally also be able to switch to MKL when needed (necessary for some tests, since my code will ultimately run on an Intel cluster)

past meteor
maiden arch
#
```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Read CSV file
csv = pd.read_csv('/home/needjobcoder/devlopment/python/dataSciencePractice/practice/stockMarket/archive/ADANIPORTS.csv')

# Extract dates and volumes
dates = csv['Date']
volumes = csv['Volume']

# Create a bar plot
fig, ax = plt.subplots()
ax.bar(dates, volumes, width=1, edgecolor="white", linewidth=0.7)

# Set labels and title
ax.set(xlabel='Date', ylabel='Volume', title='Volume Over Time')

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

# Save the plot to a file
plt.savefig('output_plot.png')

# plt.show()
```
#

volumes
```
/home/needjobcoder/devlopment/python/dataSciencePractice/venv/bin/python /home/needjobcoder/devlopment/python/dataSciencePractice/practice/main.py
0       27294366
1        4581338
2        5124121
3        4609762
4        2977470
          ...
3317     9390549
3318    20573107
3319    11156977
3320    13851910
3321    12600934
```

#

dates
```
0       2007-11-27
1       2007-11-28
2       2007-11-29
3       2007-11-30
4       2007-12-03
           ...
3317    2021-04-26
3318    2021-04-27
3319    2021-04-28
3320    2021-04-29
3321    2021-04-30
```

boreal gale
# maiden arch dates 0 2007-11-27 1 2007-11-28 2 2007-11-29 3 2007-1...

i think your first problem is that your dates here are still not parsed as proper datetime objects, so pandas/matplotlib has no choice but to literally plot every single date as a string, and you get a black cluster where the labels overlap due to lack of space. You need to read the CSV with the parse_dates arg, see: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#:~:text=%3DTrue%2C-,parse_dates,-%3DNone%2C or manually parse it with pd.to_datetime() (and overwrite the column, for example)
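A sketch of the parse_dates fix. It uses the first five Date/Volume rows from the output above, inlined so it runs without the original CSV; swap in the real path when using it. With a real datetime column, matplotlib uses a time axis and spaces the tick labels automatically.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the thread's Date/Volume data, inlined for a self-contained demo.
sample_csv = io.StringIO(
    "Date,Volume\n"
    "2007-11-27,27294366\n"
    "2007-11-28,4581338\n"
    "2007-11-29,5124121\n"
    "2007-11-30,4609762\n"
    "2007-12-03,2977470\n"
)

# parse_dates turns Date into datetime64 instead of plain strings.
csv = pd.read_csv(sample_csv, parse_dates=["Date"])
# Alternative after loading: csv["Date"] = pd.to_datetime(csv["Date"])

fig, ax = plt.subplots()
ax.bar(csv["Date"], csv["Volume"], width=1)  # width is in days on a date axis
ax.set(xlabel="Date", ylabel="Volume", title="Volume Over Time")
fig.autofmt_xdate()  # rotate and right-align the date labels
fig.savefig("output_plot.png")
```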

maiden arch