#data-science-and-ml


misty flint
#

imblearn? are your classes that imbalanced? that may be the reason tbh

safe viper
#

It's just so discouraging when you feel like you can't do anything to solve the problem.

#

The field feels like such a black box

misty flint
#

yep ×100

safe viper
#

I'm only in second year of college. I really wanted to go into ML but this like broke that. Sorry im like ranting but its just so annoying.

#

Maybe just the more data science part of it is not for me

#

arg

misty flint
#

you shouldnt get discouraged

#

these are tough problems and it doesnt mean you arent good at it

#

im a grad student and i basically get the same results as you

misty flint
#

im reinventing the wheel again in my DL class with minitorch

#

and sometimes it feels like im just banging my head against the table

misty flint
#

interesting read talking about applications in machine learning + cybersecurity

pseudo wren
#

Yes I am

urban prism
#

I have an object detection problem with two types of images: cards, and random images without cards. The card images have bounding boxes and the others don't. I only include the random images so my model can learn to report that there are no cards in an image. How should I store this data? (I have to turn it into TFRecord files later because I have to use the TensorFlow Object Detection API.) I have the images as numpy files such as id-1.npy and keep their bounding box information in a dictionary {"id-1":[],"id-2":[448,123,343,532]} (here id-1 is one of the random images, so it has no bounding box information, hence the empty list). How should I go about this?

misty flint
graceful glacier
#

hello

#

i have a question most likely related to numpy

#

so currently i am tasked with creating a mock schedule for an nba team and here are the rules

#

im having issues with the second bullet point

tiny osprey
#

you can pass an EarlyStopping callback to stop training when validation loss has not improved over a set number of epochs
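To make the patience idea concrete, here is a framework-free sketch of the rule an early-stopping callback applies. The function name and the loss values are made up for illustration; this is not the Keras implementation, only the logic it is commonly described as following.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training "stops" once the loss has failed to improve for `patience`
    consecutive epochs after the best epoch seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best value: remember it and reset the waiting period.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    # Ran out of epochs without triggering the rule.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses for one training run.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)
```

In Keras the equivalent is usually written as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)` passed in the `callbacks` list of `model.fit`.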

graceful glacier
#

essentially i need to create a column in pandas that generates either a 0 or a 1 but it has to generate exactly 6 1's

#

how can i accomplish this^^^?

misty flint
#

have you tried doing it on paper first

misty flint
#

!e

import numpy as np
x = np.ones((6,1))
print(x)
arctic wedgeBOT
#

@misty flint :white_check_mark: Your eval job has completed with return code 0.

001 | [[1.]
002 |  [1.]
003 |  [1.]
004 |  [1.]
005 |  [1.]
006 |  [1.]]
graceful glacier
#

how can i add this randomly to a matrix of zeros then?

#

a column of zeros to be more specific

misty flint
#

pretty sure you can stack them

#

lets see

#

!e

import numpy as np
a = np.array([0,0,0,0,0,0])
b = np.array([1,1,1,1,1,1])
c = np.column_stack([a,b])

print(c)
arctic wedgeBOT
#

@misty flint :white_check_mark: Your eval job has completed with return code 0.

001 | [[0 1]
002 |  [0 1]
003 |  [0 1]
004 |  [0 1]
005 |  [0 1]
006 |  [0 1]]
radiant trout
#

col=np.zeros(20)

#

col[np.random.choice(np.arange(20),6,replace=False)]=1
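One detail worth checking: `np.random.choice` samples with replacement unless `replace=False` is passed, so the same position can be drawn twice and the column can end up with fewer than six 1's. A minimal sketch of the pandas version, assuming 20 rows and a hypothetical column name `flag`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)   # seed is arbitrary, only for repeatability
n_rows, n_ones = 20, 6                # assumed sizes for illustration

flags = np.zeros(n_rows, dtype=int)
# replace=False guarantees 6 distinct positions, hence exactly 6 ones.
flags[rng.choice(n_rows, size=n_ones, replace=False)] = 1

df = pd.DataFrame({"flag": flags})    # "flag" is a hypothetical column name
print(df["flag"].sum())
```

An alternative with the same guarantee is to build six 1's followed by zeros and shuffle the array with `rng.permutation`.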

elfin copper
#

Hey

#

Anyone have experience in Reinforcement learning

thorn pier
#

Hi everyone, I have a project about "scanning results of corona tests". Which algo should I use? KNN or SVM?

tacit basin
wicked grove
#

Hello, im training a vgg 19 model and im using k fold cross validation and want to plot the validation accuracy

#

Should i use this ?

tacit basin
wicked grove
tacit basin
#

so like if you train for 10 epochs, at each epoch you will have a different validation loss or other metric. you can use only the last epoch, that's a choice. just something to remember: neural nets can overfit with too many epochs and you will see overfitted results, where some previous epochs could be better, unless you use early stopping or save the best model checkpoint

#

so yeah with early stopping or saving the best checkpoint you could just plot one loss value / metric per training

#

then you could just append the results to a list and calculate mean, standard deviation etc
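A small sketch of that bookkeeping, assuming each fold yields a list of per-epoch validation losses. The numbers below are invented; with Keras they would come from something like `history.history["val_loss"]` for each fold.

```python
import numpy as np

# Hypothetical per-fold validation curves: one list of per-epoch losses per fold.
fold_val_losses = [
    [0.90, 0.60, 0.50, 0.55],
    [0.80, 0.50, 0.45, 0.50],
    [1.00, 0.70, 0.60, 0.58],
]

# One number per fold: the best (lowest) validation loss, which is what
# early stopping or best-checkpoint saving would keep.
best_per_fold = [min(curve) for curve in fold_val_losses]

mean_loss = float(np.mean(best_per_fold))
std_loss = float(np.std(best_per_fold))
print(best_per_fold, round(mean_loss, 3), round(std_loss, 3))
```

Plotting `best_per_fold` against the fold index then gives one point per fold, which tends to be easier to read than overlaying every epoch of every fold.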

unique tartan
#

I don't even have money hahah !

tacit basin
unique tartan
#

I have just heard it before , but I don't know what are its functions

#

Thanks

wicked grove
wicked grove
#

And i want for each fold the last epoch's accuracy and loss curve to be plotted

wicked grove
# wicked grove

@tacit basin how do i get this kinda graph kfold cross validation

tacit basin
wicked grove
#

And loss vs the fold

tacit basin
#

so you would just add the accuracy and losses for each fold to say list and then plot that. would that work?

#

you are probably interested in average value and st deviation for example as well?

#

something like that:

odd meteor
steady basalt
wicked grove
#

but i can't understand the plot i am getting

#
for i in range(len(histories)):
        # plot loss
        plt.subplot(211)
        plt.title('Cross Entropy Loss')
        plt.plot(histories[i].history['loss'], color='blue', label='train')
        plt.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        plt.subplot(212)
        plt.title('Classification Accuracy')
        plt.plot(histories[i].history['accuracy'], color='blue', label='train')
        plt.plot(histories[i].history['val_accuracy'], color='orange', label='test')
plt.show()
wicked grove
tacit basin
wicked grove
wicked grove
#

And got this graph

#

@tacit basin but idk if this is correct

tacit basin
#

What graph did you get?

wicked grove
wicked grove
#

Like does it mean it's underfitting cause the curves fall flat

tacit basin
#

Or colors are folds? And x axis epochs?

wicked grove
tacit basin
#

Nice

serene scaffold
#

@slim frigate ask here.

wicked grove
#

Im getting kinda confused

#

If it's underfitting cause it the loss curves look flat

tacit basin
#

It kind of starts overfitting from epoch 6 ish

#

At least on that blue orange curve that start at high loss

#

Not sure why the other folds start at low losses, maybe something with the setup?

wicked grove
#

When i read these graphs

#

Am i only supposed to look at the loss the curve or the accuracy one as well

tacit basin
#

Yes, both. Loss is what the model uses to adjust its weights; accuracy is the human-readable metric.

steady basalt
#

Anyone know in pandas how to return only rows which meet condition

#

.where and Iloc giving errors

#

I want to say where the columns are saying True, or 1 would also work as I converted to binary

tidal bough
#

one usually does something like df[df["country"] == "my cool country"] or whatever.

#

the idea is that you get a boolean Series from comparisons, and you can index with it.
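A short sketch of boolean indexing on a toy frame. The column names and values are invented for illustration; the same pattern covers a string comparison, a column that already holds True/False, and a 0/1 column.

```python
import pandas as pd

# Toy data; all names here are hypothetical.
df = pd.DataFrame(
    {
        "country": ["A", "B", "A"],
        "is_member": [True, False, True],
        "flag": [1, 0, 1],
    }
)

by_value = df[df["country"] == "A"]   # comparison produces a boolean Series
by_bool = df[df["is_member"]]         # a boolean column can be used directly
by_binary = df[df["flag"] == 1]       # 0/1 column: compare against 1

print(by_value)
```

`df.loc[mask]` is equivalent and is often preferred when a column selection is needed as well, for example `df.loc[df["flag"] == 1, ["country"]]`.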

wicked grove
slim frigate
# serene scaffold @slim frigate ask here.

data = {
"Joe": {
"math": 65,
"science": 78,
"english": 98,
"gym": 89
},
"Bill": {
"math": 55,
"science": 72,
"english": 87,
"gym": 95
},
"Tim": {
"math": 100,
"science": 45,
"english": 75,
"gym": 92
},
"Sally": {
"math": 30,
"science": 25,
"english": 45,
"gym": 100
},
"Jane": {
"math": 100,
"science": 100,
"english": 100,
"gym": 60
}
}

i want to add the values with inputs like a app In order not to keep edit the dictionary every time

serene scaffold
radiant trout
manic tangle
#

maybe a very broad question, but how do I decide whether I should try and write an algorithm or use an ai / machine learning approach for a task?

serene scaffold
slim frigate
serene scaffold
slim frigate
serene scaffold
slim frigate
#

thanks

misty flint
manic tangle
random sapphire
#

I just made a youtube video about working with image data in python for anyone interested in image processing for computer vision / machine learning. Let me know if you have feedback: https://www.youtube.com/watch?v=kSqxn6zGE0c

In this video I show how to work with image data in python! Using the popular python packages matplotlib and opencv you will learn how to open image data, how the data is formatted, some ways to manipulate the data and save it off in a different format. If you enjoy you can also check out my live twitch streams (below). Image data is extremely p...

tacit basin
prisma mist
#

refactored and rerunning a training set that took 40 mins last time... this should be fun

tacit basin
#

This is the point where model started overfitting

#

Red curve ( test/valid) started going in 'the wrong' direction

#

While train (blue) decreases

prisma mist
#

a SVM is using 100 percent of a core for training for > 20 mins... anyone know how to make it multi-threaded?

austere swift
#

when they fall flat that just means it converged

#

which means its done training

wicked grove
wicked grove
wicked grove
#

And the loss looks like it is flattening

wicked grove
steady basalt
#

Has anyone here used the ukbiobank, urgent research help needed

#

@serene scaffold perhaps?

gentle swift
tacit basin
upper spindle
#

i got the correct cwd but still comes out with an error

#

error is FileNotFoundError: [Errno 2] No such file or directory: 'Downloads\\sp_500_stocks(1).csv'

prisma mist
# upper spindle

try using the full path. your cwd suggests otherwise. also.. not important but if you're using windows use raw-strings for less chances of errors

upper spindle
#

i tried full path but still comes out with the same error

stone marlin
#

You could try something like

os.path.expanduser("~/Downloads/whatever.csv")
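The same idea with `pathlib`, as a sketch. It assumes the file really sits in the current user's Downloads folder; the helper name is hypothetical. Printing the resolved path together with `exists()` usually shows quickly whether the problem is the folder or the file name.

```python
from pathlib import Path


def downloads_file(name: str) -> Path:
    """Build an absolute path to a file in the user's Downloads folder."""
    return Path.home() / "Downloads" / name


csv_path = downloads_file("sp_500_stocks(1).csv")
# If exists() is False, the path or the file name is wrong, not the read call.
print(csv_path, csv_path.exists())
```

Because the path is absolute, it no longer depends on the current working directory of the notebook or script.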
mint palm
#

is embedding just encoding?

odd meteor
# random sapphire I just made a youtube video about working with image data in python for anyone i...

I've just subscribed. Nice Channel.

Happy early congratulations on hitting 1k subscribers. 🎉🎉🎉

You should easily hit 1k subs if about 60 more people subscribe to your channel.

Hey guys let's get RoblksCube to 1k. Consider subscribing to his YouTube channel 🙏🙏

https://youtube.com/channel/UCxladMszXan-jfgzyeIMyvw

prisma mist
#

someone pls help... i'm utterly stuck... trying to use pandas.concat feature

for i in enumerate(col_name):
    column_name = col_name[i[0]]
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    fuck = {
            "Column Name": column_name,
            "Pearson Correlation Coefficient": pearson_coef,
            "P-value of": p_value,
        }
    i_m_crying = pd.DataFrame(fuck)
    df_local_list.append(i_m_crying)
    percof_smry = pd.concat(df_local_list, ignore_index=True)
ValueError: If using all scalar values, you must pass an index
#

i tried every solution but i keep getting errors.. how do i use pd.concat in a for loop?

azure marsh
mint palm
#

is it trained

prisma mist
#

that's not the problem... i got the idx correct... it's the i_m_crying = pd.DataFrame(fuck) that's saying i'm using scalar value

tacit basin
prisma mist
serene scaffold
tacit basin
prisma mist
#

i can keep using the old code

percof_smry = pd.DataFrame({'Column Name': [], 'Pearson Correlation Coefficient': [], 'P-value of': []})
for i in range (0,len(col_name)):
    pearson_coef, p_value = stats.pearsonr(df[col_name[i]], df['SEVERITYCODE'])
    percof_smry = percof_smry.append({"Column Name":col_name[i],"Pearson Correlation Coefficient": pearson_coef , "P-value of": p_value }, ignore_index=True)
print(percof_smry)

but DataFrame.append is deprecated

tacit basin
prisma mist
# tacit basin where is the error raised? at concat?

i'm trying to use the new pd.concat feature. right now the error is at

i_m_crying = pd.DataFrame(fuck)

it goes away if i use pd.Series but that's the wrong thing to use... i need the output in a concatenated DataFrame

tacit basin
prisma mist
#

SOLVED:

df_local_list = []
for i in enumerate(col_name):
    column_name = col_name[i[0]]
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    fuck = [{
            "Column Name": column_name,
            "Pearson Correlation Coefficient": pearson_coef,
            "P-value of": p_value,
        }]
    i_m_crying = pd.DataFrame.from_dict(fuck)
    df_local_list.append(i_m_crying)
percof_smry = pd.concat(df_local_list, ignore_index=True)

monkey around long enough and you eventually produce works of shakespeare. my exasperation can be seen thru my variable names

azure marsh
mint palm
prisma mist
#

pd.append... simple... straight forward.. easy to use
what they replaced it with: pd.concat
how to use it:

  1. turn your dict into a list... x=[dict]
  2. pass it to a dataframe... framed = pd.DataFrame(x)
  3. append it to a list again ... appendify = [] appendify.append(framed)
  4. now you can use concat... your_objective_df = pd.concat(appendify) 👏
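For row-by-row results like these, a shorter route is to collect plain dicts in a list and build the DataFrame once at the end, which needs neither `append` nor `concat`. The sketch below uses invented correlation values in place of the `stats.pearsonr` calls so that it runs on its own; the function name is hypothetical.

```python
import pandas as pd


def summarize(results):
    """Turn {column_name: (coefficient, p_value)} into a summary DataFrame."""
    rows = []
    for column_name, (coef, p_value) in results.items():
        # Each row is a plain dict; no intermediate one-row DataFrames needed.
        rows.append(
            {
                "Column Name": column_name,
                "Pearson Correlation Coefficient": coef,
                "P-value of": p_value,
            }
        )
    # A list of dicts is read as a list of records, one row per dict.
    return pd.DataFrame(rows)


# Made-up numbers standing in for stats.pearsonr(df[col], df["SEVERITYCODE"]).
fake_results = {"SPEED": (0.12, 0.03), "WEATHER": (-0.05, 0.40)}
percof_smry = summarize(fake_results)
print(percof_smry)
```

This also explains the earlier `ValueError`: `pd.DataFrame` given a single dict of scalars has no way to infer the row index, whereas a list containing that dict is interpreted as one row.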
#

who comes up with these ideas?.... rewriting features just for the sake of putting out a new version

misty flint
#

concat comes from numpy world. its good for concatenating matrices

azure marsh
#

It's been around pandas for a long time, maybe as long as append. It's also more general

misty flint
#

i dont understand the issue tbh.

prisma mist
#

what's not to understand? it's written right there

azure marsh
#

The issue is it adds several steps to appending a dict to a dataframe

#

Though they can be put on one line it's ugly

#

But python's tagline is to have one obvious way of doing things

prisma mist
#

pd.append was working fine... they didn't need to deprecate it

azure marsh
#

If you want to read the actual reasons

#

Series.append and DataFrame.append [are] making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.

prisma mist
#

more 👎 than 👍 ... and his reasoning was flawed... what we ended up doing was pd.concat with ignore_index=True .... seems like the devs just needed to do "something" to put out a new version and forced this issue thinking it up in isolation

#

my old code:

percof_smry = pd.DataFrame({'Column Name': [], 'Pearson Correlation Coefficient': [], 'P-value of': []})
for i in range (0,len(col_name)):
    pearson_coef, p_value = stats.pearsonr(df[col_name[i]], df['SEVERITYCODE'])
    percof_smry = percof_smry.append({"Column Name":col_name[i],"Pearson Correlation Coefficient": pearson_coef , "P-value of": p_value }, ignore_index=True)
print(percof_smry)

my new code:

df_local_list = []
for i in enumerate(col_name):
    column_name = col_name[i[0]]
    pearson_coef, p_value = stats.pearsonr(df[column_name], df["SEVERITYCODE"])
    the_values = [
        {
            "Column Name": column_name,
            "Pearson Correlation Coefficient": pearson_coef,
            "P-value of": p_value,
        }
    ]
    pass_to_df = pd.DataFrame.from_dict(the_values)
    df_local_list.append(pass_to_df)
percof_smry = pd.concat(df_local_list, ignore_index=True)
print(percof_smry)

it has to pass thru extra steps now

tacit basin
#

What deep neural network architecture would be good for large image classification? Both larger and smaller details will be important for the classification.

azure marsh
#

If you provide counters to each of their reasons that would be much more relevant than one code sample

prisma mist
azure marsh
#

Go away? Is this your discord? I'm merely asking you to think beyond your specific use case

#

You're welcome to ignore the request and exit the discussion. You're trying to debate the deprecation of an API, if you're looking for more than blind agreement perhaps look elsewhere.

modern cypress
#

Can anyone direct me to pretrained object detection models for COCO? If anything like that exists?

prisma mist
#

go away means don't talk to me... some of us have real models to train rather than having pointless debates

modern cypress
azure marsh
#

But there are surely more across all frameworks

minor elbow
#

or something like that

azure marsh
#

And as I mentioned above you can roll multiple of your lines into one. List append doesn't need to be its own line, and neither does from_dict

#

Additionally, rolling it all into one line is less readable and maintainable, specifically the old dict creation

random sapphire
prisma mist
minor elbow
#

you would have to work on it a bit most probably, was just eyeballing it

#

i feel like you would do well to work on your python fundamentals though

#

depends on what stats.pearsonr gives back, you could unpack it in the first comprehension maybe

prisma mist
minor elbow
#

generators can be nice if you need to do a bit too much dancing for a straight comprehension

#

i definitely lean towards using more lines and being clearer than trying to wrap it all in one giant thing

steady basalt
#

Love the package tho

#

Sometimes I am confused why they fix what isn’t broken

#

Btw I don’t see anything wrong with concat

#

I don’t think it’s new

minor elbow
#

i think one of the problems is, where do u draw the line about what to keep for backwards compatibility

#

like if they acknowledge it was a bad idea/didnt really work as expected, id rather they got rid of it

#

rather than having a million different ways to do things, many of which are suboptimal

#

also maintaining a bunch of older stuff that should have been deprecated takes time away from doing newer better things

#

like as an extreme, ive worked at place that still run on mainframes, the argument is 'well they still work fine' which is true on one level but in practise it means they cant do anything modern that their users/customers want to do

unique tartan
#

Hey Guys ! Can u send some examples or ur projects using AI

#

You know what I mean, like a face detector or whatever

grave frost
#

@iron basalt seems Numenta is ditching HTM https://arxiv.org/abs/2201.00042
Apparently, it doesn't change things.... much for TBT, but I am currently pestering some guys about the status

anyways, its a pretty bad paper with plenty of criticisms - not to mention the DL baselines it competed against were weak, GOFAI stuff used to disguise cheating and general sussiness regarding inconsistent methodologies across experiments

languid stratus
misty flint
#

just like if your system is run on COBOL. where are you going to find COBOL programmers nowadays kekHands

warm stirrup
#

thanks, this helped a lot

misty flint
languid stratus
brave sand
#

so I’m close to completing LunarLander on OpenAI, how different is that from a 4 motor drone?

mint palm
#

how to deal with problem that LabelEncoding takes input only similar datatype

#

these are my modelling parameters

#

what kind of encoding do you suggest

misty flint
pseudo wren
#

Having trouble getting all the elements from my dataframe into one list

#

I’m looking to get everything into a list in order to calculate polarity

#

For some reason this is not working right now

#

I’ve been trying to find an answer to this question and can’t as of yet

novel elbow
karmic valley
#

anyone help me make a quick loop. if free for 5min just @winged grove would appreciate

#

want to create a loop or something. all my images are in order. like part_1, part_2, part3 . want code to automatically do code for like part[i+1].png

#
from skimage import io, img_as_float
import numpy as np

image= io.imread(r'C:\Users\guest\Dropbox\con1_outfolder_split_30sbeforepeak2min30safterpeak\part_1.png')
image = img_as_float(image)
print(np.mean(image))
novel elbow
karmic valley
#

only thing i am thinking is will it give me in order. because i want to see printed results in order. like first part 1 then part 2 then part 3 @novel elbow

#

got this error

 images = [img_as_float(io.imread(f'C:\Users\guest\Dropbox\con4_outfolder_split_30sbeforepeak2min30safterpeak\part_{i}.png')) for i in range(10)]
                                                                                                                             ^
SyntaxError: invalid syntax
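A likely cause of this error is the `\U` in `C:\Users`: in a normal (non-raw) string Python reads it as the start of a Unicode escape. A raw f-string (`rf'...'`) or `pathlib` avoids that. The sketch below only builds the file names in order; the helper name is hypothetical, and it starts counting at 1 on the assumption that the files are named part_1, part_2, and so on (`range(10)` would start at part_0).

```python
from pathlib import PureWindowsPath


def part_paths(folder: str, count: int, start: int = 1):
    """Return paths part_<start>.png ... in numeric order."""
    base = PureWindowsPath(folder)
    return [str(base / f"part_{i}.png") for i in range(start, start + count)]


# Raw string: backslashes are kept literally, so \U is not an escape.
folder = r"C:\Users\guest\Dropbox\con4_outfolder_split_30sbeforepeak2min30safterpeak"
paths = part_paths(folder, count=3)
for p in paths:
    print(p)
```

Each path can then be passed to `io.imread` inside an ordinary `for` loop, which prints the means in the same order as the list.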
wicked grove
#

How can i know if it is wrong tho

graceful glacier
#

whats the easiest way to extract the last digit from a number in pandas?

misty flint
#

youll need to select a column from your dataframe first since that function only accepts pandas series

#

your_list = df['column A'].to_list()

tacit basin
wicked grove
iron basalt
# grave frost @iron basalt seems Numenta is ditching HTM https://arxiv.org/abs/2201....

I don't think they are ditching it for DL, they have pretty much always been doing some DL-ish stuff too (I think Jeff already said in his book that HTM was not right, the specifics of it, but general things like having cortical columns still persist). I have not read the paper all that much so IDK about its quality. It does not interest me that much. I'm more interested in Jeff's grid cells idea (thousand brains theory, but also just for localization / regular grid cells stuff).

#

While we were inspired by HTM and such, we don't do it the way they do because it never got really good results (if it does not work, we move on, though it's hard to tell since it's a multi-arm bandit problem). The big picture of the structure and such is there / modelling the neocortex, but the details of how to do that are very different.

#

You can also see their use of DL in their first papers on grid cells based object detection in which they use a pre-trained CNN to simply demonstrate that the grid cells can identify objects given only patches of the original image in any order (as a sequence of eye movements). Ideally the DL part would be replaced with something more biologically plausible that at least gets similar results (we have done that, which also enables online learning in our case). Numenta as a company has different things going on and for me personally it's hit and miss. Sometimes it's a really nice idea like thousand brains theory, but sometimes it's kind of meh.

tacit basin
#

maybe convert the column to string, slice it, and convert back to a number

df.AA.astype(str).str[-1].astype(int)
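For an integer column, the remainder after dividing by 10 gives the same answer without the string round trip. A sketch with invented values; the column name `AA` follows the message above, and `abs()` is there so negative numbers behave the same way in both versions.

```python
import pandas as pd

df = pd.DataFrame({"AA": [123, 40, 7, -58]})  # toy integer column

# String route: the .str accessor only works after converting to str.
last_digit_str = df["AA"].astype(str).str[-1].astype(int)

# Arithmetic route: remainder of the absolute value modulo 10.
last_digit_mod = df["AA"].abs() % 10

print(last_digit_mod.tolist())
```

The arithmetic route assumes whole numbers; for floats or mixed text the string route is the safer of the two.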
tacit basin
wicked grove
#

And i split it five times

tacit basin
#

what is it, classification? i forgot sry. if classification you can check the number of classes in each split and also in the validation set

subtle spoke
#

does anyone know how to use ffmpeg? I'm following these instructions but keep getting an error saying the file can't be found, even though I saved it in the same directory as the ffmpeg-split.py file

#

This is the command I ran

mint palm
subtle spoke
mint palm
#

suggest some architectures for a completely categorical dataset

slim frigate
#

hi

#
from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.styles import Font
data = {"rename": {
        "math": 20,
        "science": 20,
        "english": 20,
        "gym": 20}
        }

def info():
    name = input('Your name ? : ')
    math = input('Your math degree ? : ')
    science = input('Your science degree ? : ')
    english = input('Your english degree ? : ')
    gym = input('Your gym degree ? : ')
    data.update({name:{"math":math,"science":science,"english":english,"gym":gym }})
    input('press any key ...')

for a in range(2):
    info()
    a+=1

wb = Workbook()
ws = wb.active
ws.title = "Grades"

headings = ['Name'] + list(data['rename'].keys())
ws.append(headings)

for person in data:
    grades = list(data[person].values())
    ws.append([person] + grades)

for col in range(1, 6):
    ws[get_column_letter(col) + '1'].font = Font(bold=True, color="0099CCFF")

wb.save("NewGrades.xlsx")
#

result :

#

but i want change the output without rename data

#

any suggestions can help me

stone marlin
#

This does not appear to be data science, you may want to ask in the standard help rooms.

mint palm
#

embedding needs encoding first right?

rotund trellis
#

Is there a way for me to use drone-acquired images of water/ocean/lakes and use them to check for pollution using machine learning?

haughty dust
#

Can someone advise what filter to keep on images while making word clouds in python??
Since I am not able to get proper imprint of the person with word cloud

steady basalt
fallen jackal
#

hey, i have a cropped opencv2 image that i want to predict in a model that i made.
the model requires shape (28,28), but when i try to reshape it i get the error
"cannot reshape array of size 63948 into shape (28,28)'"

img = cv2.imread(u'/content/drive/MyDrive/Data/Project/Screenshot_44.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

hImg, wImg, _ = img.shape
boxes = pytesseract.image_to_boxes(img)
for b in boxes.splitlines():
  b = b.split(' ')
  x,y,w,h = int(b[1]), int(b[2]), int(b[3]), int(b[4])

  width = w-x
  height = h-y
  n = width - height

  if (width > height):
    h = int(h + n/2)
    y = int(y - n/2)

  elif (height > width):
    w = int(w - n/2)
    x = int(x + n/2)

  crop_img = img[hImg-h:hImg-y, x:w]
  reshape = crop_img.reshape((28,28))
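`reshape` can only rearrange the values that are already there, and 28 × 28 is 784 values while the crop holds 63948 (which equals 146 × 146 × 3, so it is probably a 146×146 colour crop). What is needed is a resize plus a conversion to one channel. The sketch below does a crude nearest-neighbour resize in NumPy only, so it runs without OpenCV; the function name is hypothetical, and in practice `cv2.cvtColor(crop, cv2.COLOR_RGB2GRAY)` followed by `cv2.resize(gray, (28, 28))` would be the usual choice.

```python
import numpy as np


def to_square(crop: np.ndarray, size: int = 28) -> np.ndarray:
    """Collapse an H x W x 3 crop to one channel and resample to size x size."""
    # Average the colour channels if present (a rough grayscale conversion).
    gray = crop.mean(axis=2) if crop.ndim == 3 else crop
    # Nearest-neighbour sampling: pick `size` evenly spaced rows and columns.
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    return gray[np.ix_(rows, cols)]


crop = np.zeros((146, 146, 3), dtype=np.uint8)  # stand-in for crop_img
small = to_square(crop)
print(crop.size, small.shape)
```

Depending on how the model was trained, the result may also need scaling to the 0 to 1 range and an extra batch dimension before calling `predict`.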
rotund trellis
grave frost
steady basalt
#

How high up are they taken

#

If exists a vast library of these images then you could do it

forest knoll
#

Hi guys, is it possible to use CNN to train a model without labels in order to query similar images?

serene scaffold
humble mountain
#

Someone need help for a code?

serene scaffold
forest knoll
#

@serene scaffold I have 20 query images which have been cropped, I need to rank 10 most similar images among 5000 images, they are not quite similar
I only know I can use some feature extraction algo like SIFT or color histogram to find those image, but some images are still not found. And I know transfer learning is a way to modify the existing CNN model like VGG. but what's the most accurate way to do that?

pseudo wren
spring marsh
#

Does it matter if I normalize my data if I am using logistic regression ? I tried using standard scaler from sklearn and I am somehow getting way better results in confusion matrix and classification report

spring marsh
#

I can send the whole code if someone wants to take a look

sour spindle
#

normalizing data for lstm layers helps a lot since it keeps values within the range of the tanh and sigmoid functions

fossil ivy
#

Hey everyone! Anyone here experienced with Elasticsearch who could help me out? Im using ES 7.17 for lower level security but when I try to access the running node it tells me now that I am missing authentication credentials
elasticsearch.exceptions.AuthenticationException: AuthenticationException(401, 'security_exception', 'missing authentication credentials for REST request [/persons]')

spring marsh
forest knoll
#

@grave frost thank you for suggestion, i'm studying contrastive learning and transfer learning to see which one is more accurate

high grove
#

How to make a graph in third quadrant in matplotlib ?

hasty kiln
#

What Data Science and Data Analysis Skills Are Required to Become an ML Engineer?

#

I read in an article about the requirements to become an ML engineer, that you should have some experience with Data Science, data analysis and other major requirements.

steady basalt
#

actual ml engineers in research are literal gods

#

Check out some papers on designing new algorithms

#

Tbh I don’t think I’ll ever reach that level due to the hard cap on my math

hasty kiln
steady basalt
#

Ur gona need like

#

Literal years of education or experience

#

Take CS at uni maybe, then a statistical postgrad

#

Do u use python? Or R

hasty kiln
novel elbow
#

In my view a ml engineer is the one who knows how to apply software eng good practices into ml model development/deployment (mainly testing and CI)

desert oar
#

good point of course. but you might want to look into scikit-learn's dataset generators for some insight into generating somewhat "realistic" datasets https://scikit-learn.org/stable/datasets/sample_generators.html

novel elbow
unreal charm
#

Hi, how can i convert a csv file to base64 encode?

hasty kiln
karmic moth
#

Hi guys, i have a question, its simple and it has 2 parts, so basically

Should we train a CNN with Conv1D layers, with 2D or 3D data, or both are possible?
Should we train a LSTM model with 2D or 3D data, or both are possible?
I need someone's advice on this before I proceed.

mint palm
#

sys:1: DtypeWarning: Columns (1) have mixed types.Specify dtype option on import or set low_memory=False.

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
#

error^^^^^^^^^^ on running

jovial summit
#

Continuing from #help-cherries

For context, I'm a mechanical engineer without a ton of software experience that's now writing finite-element analysis code. We have an old mess of MPI-based C++ code that nobody understands. I'm currently looking to rewrite it. I have a simpler version of it implemented in Python now using Numba, to see if it works with multithreading the way we need.

I'm trying to determine if Numba is the best way to implement this, or if I should look into something else like Cython

Here's a snippet that shows the 'type' of operations that it's mostly based on.

@njit(parallel = True)  
def stepAllCalcs(dx, dz, ux3, uz3, ux2, uz2, ux1, uz1, lam, mu, lam_2mu, dt2rho, weights):
    co_dxx = 1/dx**2
    co_dzz = 1/dz**2
    co_dxz = 1/(4.0 * dx * dz)

    #Ux
    dux_dxx = co_dxx * (ux2[1:-1,0:-2] - 2*ux2[1:-1,1:-1] + ux2[1:-1,2:])
    dux_dzz = co_dzz * (ux2[0:-2,1:-1] - 2*ux2[1:-1,1:-1] + ux2[2:,1:-1])
    dux_dxz = co_dxz * (ux2[0:-2,2:] - ux2[2:,2:]- ux2[0:-2,0:-2] + ux2[2:,0:-2])

    (...)

    # Stress G
    stressUX = lam_2mu * dux_dxx + lam * duz_dxz + mu * (dux_dzz + duz_dxz)

It's mostly simple array addition, with some scalar multiplication.

@desert oar

desert oar
#

other than the horrifying 70s style variable names, this looks about as good as it's going to get

#

(i know i know it's math notation rendered in ascii, i've written/used code like this)

#

maybe there are some additional optimizations for this kind of calculation but i wouldn't know any

jovial summit
desert oar
#

writing high-performance cython is a lot closer to C than Python

#

you are still messing with pointers and such, and even worse you now have to worry about interacting with python, thread safety, reference counting, etc.

#

and at that point you're probably better off with the original c++ application

#

which leads me to ask: is this performing significantly worse than the C++ version?

jovial summit
#

we don't have an exact comparison between the two

desert oar
#

it seems like numba already uses openmp for parallelization, fwiw https://numba.discourse.group/t/does-numba-support-mpi-and-or-openmp-parallelization/483/2

jovial summit
#

this is a 2d version that I wrote a while back and am just now trying on our HPC, while the 'actual' code is 3d

#

trying to figure out if it's worth rewriting in 3d

jovial summit
#

just multiple cores, which numba seems to handle fine

#

also if anyone has general input for how to approach writing large-scale stuff like this I'd appreciate it

desert oar
#

not that i know of, no

#

i think openmp is what a lot of numerical programs use anyway

trim pond
#

Hi I want to make a project that uses object detection. I have some tf and data science experience but never used computer vision and stuff. Which libraries or frameworks do you guys recommend?

#

Or any courses to get started?

jolly knoll
#

Guys, I have a column called "Routes" with 900 unique values. Should I have one-hot encoded it? Haha

#

If not, what should I have done?

jolly knoll
#

for feature selection (rfe), then model development (random forest)

desert oar
jolly knoll
#

hmm what should I have done with it instead?

jolly knoll
mint palm
#

should i apply hash encoding after test_train_split??

#

or before??????????????//

desert oar
desert oar
mint palm
desert oar
#

but normally you should fit/train your data transformations only on the training set, not on the test set. data transformation is part of your model!
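
A minimal sketch of that rule in plain Python rather than scikit-learn. The helpers `fit_categories` and `encode` are hypothetical names and the route labels are made-up data: the mapping is learned from the training rows only, then reused unchanged on the test rows, with unseen categories sent to -1.

```python
def fit_categories(train_values):
    """Learn a category -> integer mapping from the training set only."""
    return {cat: idx for idx, cat in enumerate(sorted(set(train_values)))}


def encode(values, mapping, unknown=-1):
    """Apply a fitted mapping; categories unseen during fitting become `unknown`."""
    return [mapping.get(v, unknown) for v in values]


train = ["A-B", "B-C", "A-B", "C-D"]
test = ["B-C", "D-E"]  # "D-E" never appears in the training data

mapping = fit_categories(train)        # fitted on train only
train_encoded = encode(train, mapping)
test_encoded = encode(test, mapping)   # reuses the fitted mapping, no refit
```

If the mapping were fitted on train and test together, information about the test set would leak into the model.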

mint palm
#

how should i deal with Y

desert oar
#

why would you?

mint palm
#

consider my dataset to be 100% categorical

#

should i just one_hot y

#

or LabelEncode Y

desert oar
#

assuming you're using scikit-learn, OneHotEncoder has a bunch of extra features that you don't need for labels

#

and hashing makes no sense for a variety of reasons; i encourage you to think about why

mint palm
#

yeah ok on it, i hope it doesnt give me 100 accuracy on all three sets this time

#

else i am screwed

desert oar
mint palm
#

yeah tried all that stuff

#

i am actually not making any mistakes

#

that model was just not suited for the problem i believe thats why i am switching the model

desert oar
#

100% accuracy isn't a bad thing btw, but it does probably mean that your model is badly overfitted

mint palm
desert oar
#

that's covering up the problem, not solving it

mint palm
#

i have lower loss on validation than on train

#

that suggests an "inconclusive performance"

desert oar
#

that suggests bugs in your code to me, or a particularly unlucky train/test split

#

this is why i think nested cross validation is a much better approximation of out-of-sample performance than a plain split

mint palm
desert oar
#

you can try posting your code, can't make any guarantees

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

mint palm
#

i have the dataset and code in google drive and colab

#

are you comfortable with those

#

i can dm you, if you give me permission to

desert oar
#

that's a bit more than i'll have time to look at

mint palm
#

This website paste.pythondiscord.com/ is currently offline. Cloudflare's Always Online™ shows a snapshot of this web page from the Internet Archive's Wayback Machine. To check for the live version, click Refresh.

desert oar
supple leaf
#

salt rock lamp, could you give me a tip how i can find the maximum value from my code:

import pandas as pd
import matplotlib.pyplot as plt
#import numpy as np

var = pd.read_excel(r'/Users/pontusskol/Desktop/data.xlsx')
print(var)

x = list(var['X values'])
y = list(var['Y values'])

plt.figure(figsize=(10,10))
plt.style.use('seaborn')
plt.plot(x,y, '-o',label='x,y')
plt.scatter(x,y,marker="o",s=100,edgecolors="black",c="yellow")
plt.title("Excel sheet to Scatter Plot")
plt.show()

Which gives me this graph as i showed you before:

#

Ive been searching on internet but I just cant make it work :/

supple leaf
#

by using this?

xmax = x[numpy.argmax(y)]
ymax = y.max()
young egret
#

Is there a way to make my twint program update its data in real time?

agile cobalt
desert oar
#

either you compute a bunch of values on a very finely-spaced grid and do a search, or use some numerical optimization routine
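
For sampled data like the x/y columns above, the grid search amounts to picking the largest sample. A small sketch with made-up values, kept in plain Python because `x` and `y` in the snippet above are lists (so `y.max()` would not exist on them):

```python
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.5, 7.2, 6.1, 2.0]

# Index of the largest y value (the plain-Python equivalent of numpy.argmax).
i_max = max(range(len(y)), key=lambda i: y[i])
x_max, y_max = x[i_max], y[i_max]
print(f"maximum y={y_max} at x={x_max}")
```

This only finds the largest measured point; locating a peak between samples would need interpolation or an optimization routine.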

supple leaf
#

okay thanks, will look into that

mint palm
#

you got any error?

jovial summit
#

@desert oar sorry for the random ping, but it seemed like you were knowledgeable about Numba - any ideas where I'd start troubleshooting LLVM / SVML? I'm trying to enable it for the project from earlier but it doesn't seem to be working

#

right now I'm just doing numba._try_enable_svml which always returns false despite having the libs installed

mint palm
#

how to handle hash encoding if my column has more than one datatype?

umbral sage
#

does anyone know if I load a model through HDFS how can I load it to use like pickle.load would since it is from a connection instead of a file

lapis sequoia
#

hello guys how can i fix installing kivy in anaconda errors

desert oar
#

i have a conceptual model about how numba works, and i know what llvm is, and that's about it

lapis sequoia
#

ERROR: Could not find a version that satisfies the requirement kivy-deps.sdl2 (from versions: none)
ERROR: No matching distribution found for kivy-deps.sdl2

#

how to fix it?

mint palm
#
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.```
ocean pendant
#

hello guys

#

im kinda new to learning curve analysis . does anyone know if this curve is good or not

#

im trying to run optimization but I dont exactly know how I can fine tune the hyperparameters according to the learning curve

unreal charm
#

Hi I want to convert this: ```label,question,answer
label 1,pytanie 1?,odpowiedź 1
label 1-2,pytanie 2?,odpowiedź 1
label 1-2,pytanie 1?,odpowiedź 1
label 1-2,pytanie 2?,odpowiedź 2
label 1-2,pytanie 1?,odpowiedź 2
label 2,pytanie 2?,odpowiedź 2

#

with base64

#

and instead of this:

lapis sequoia
unreal charm
#

bGFiZWwscXVlc3Rpb24sYW5zd2VyCmxhYmVsIDEscHl0YW5pZSAxPyxvZHBvd2llZMW6IDEKbGFiZWwgMS0yLHB5dGFuaWUgMj8sb2Rwb3dpZWTFuiAxCmxhYmVsIDEtMixweXRhbmllIDE/LG9kcG93aWVkxbogMQpsYWJlbCAxLTIscHl0YW5pZSAyPyxvZHBvd2llZMW6IDIKbGFiZWwgMS0yLHB5dGFuaWUgMT8sb2Rwb3dpZWTFuiAyCmxhYmVsIDIscHl0YW5pZSAyPyxvZHBvd2llZMW6IDIK

#

I have that written in my csv file: ```98,71,70,105,90,87,119,115,99,88,86,108,99,51,82,112,98,50,52,115,89,87,53,122,100,50,86,121,67,109,120,104,89,109,86,115,73,68,69,115,99,72,108,48,89,87,53,112,90,83,65,120,80,121,120,118,90,72,66,118,100,50,108,108,90,77,87,54,73,68,69,75,98,71,70,105,90,87,119,103,77,83,48,121,76,72,66,53,100,71,70,117,97,87,85,103,77,106,56,115,98,50,82,119,98,51,100,112,90,87,84,70,117,105,65,120,67,109,120,104,89,109,86,115,73,68,69,116,77,105,120,119,101,88,82,104,98,109,108,108,73,68,69,47,76,71,57,107,99,71,57,51,97,87,86,107,120,98,111,103,77,81,112,115,89,87,74,108,98,67,65,120,76,84,73,115,99,72,108,48,89,87,53,112,90,83,65,121,80,121,120,118,90,72,66,118,100,50,108,108,90,77,87,54,73,68,73,75,98,71,70,105,90,87,119,103,77,83,48,121,76,72,66,53,100,71,70,117,97,87,85,103,77,84,56,115,98,50,82,119,98,51,100,112,90,87,84,70,117,105,65,121,67,109,120,104,89,109,86,115,73,68,73,115,99,72,108,48,89,87,53,112,90,83,65,121,80,121,120,118,90,72,66,118,100,50,108,108,90,77,87,54,73,68,73,75

#

why is that?

#

my code:

    if not name or name == '':
        print("badn name")

    label_check = db.session.query(Labels.label_name,Labels.label_id).first()

    if label_check == None:
        print("no labels")


    header = ['question', 'label', 'answer']
    data = db.session.query(Labels.label_name,Questions.question,Answers.answer)\
    .filter(Labels.label_id==Answers.label_id)\
    .filter(Labels.label_id==Questions.label_id).all()
    result = all_many_schema.dumps(data)
        
    with open(f"dowolands/{name}.csv", 'w', newline='') as f:
        header = ['label', 'question', 'answer']
        writer = csv.writer(f)
        writer.writerow(header)
        i = 0
        for range in data:
            writer.writerow(data[i])
            i = i+1
    data = open(f"dowolands/{name}.csv", "r").read().encode('utf8')
    encoded = base64.b64encode(data)
    with open(f"dowolands/{name}.csv", 'w') as f:
        writer = csv.writer(f)
        writer.writerow(encoded)
        f.close()
    print(encoded)```
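
A likely explanation, sketched with made-up data: `base64.b64encode` returns `bytes`, and `writerow` treats its argument as a sequence of fields, so iterating over the bytes yields one integer per byte. Decoding to text and writing that string directly, without `csv.writer`, should keep the base64 intact.

```python
import base64
import csv
import io

encoded = base64.b64encode("label,question\n".encode("utf8"))

# Iterating over bytes yields integers, which is what writerow() ends up writing.
buf = io.StringIO()
csv.writer(buf).writerow(encoded)
as_fields = buf.getvalue().strip()

# Writing the decoded text as-is keeps the base64 string intact.
as_text = encoded.decode("ascii")
```

In the code above that would mean replacing the final `writer.writerow(encoded)` with something like `f.write(encoded.decode("ascii"))`.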
jolly knoll
# desert oar probably not, i wouldn't worry about it

Thanks, that's a relief to know. But do you know what should I have done instead with dealing with high-cardinality columns? I've read of PCA but I heard it's designed for continuous variables. Would very much appreciate your input!

jaunty mural
#

night guys, sorry for disturbing, I feel a bit sick and can't concentrate on a simple task. How can I add points (markers) to this subplot and prevent scientific notation from being displayed on the axis?

cerulean finch
#

whats the ".exe" encoding bois?

iron basalt
iron basalt
# mint palm embedding needs encoding first right?

Encoding is putting something into some kind of system of signals (very generic term). Embedding's goal is to embed something into something to gain new insight about it and other things related to it. You can imagine embedding the data as analogous to Archimedes embedding Hiero's crown into liquid to measure its volume.

#

Whenever you change the form of some data you have technically encoded it. But whether or not that encoding is useful in that it lets you compare things is what matters.

mint palm
#

how do i predict embedding layer attributes

mint palm
#

input_dim
output_dim
and input_length
while embedding

iron basalt
#

You need to define some embedded space and then have something that embeds items into that space. The input_dim is whatever the input's dim is. The output_dim is whatever you decided.
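
As a rough sketch of what those three numbers mean for a lookup-style embedding layer (assuming integer-encoded categories, with random values standing in for weights that would normally be learned):

```python
import random

input_dim = 5     # number of distinct category ids (vocabulary size)
output_dim = 3    # length of each embedding vector, a free design choice
input_length = 4  # number of ids per input sequence

rng = random.Random(0)
# The embedding is a lookup table with one row of output_dim numbers per id.
table = [[rng.uniform(-1, 1) for _ in range(output_dim)] for _ in range(input_dim)]

sequence = [0, 3, 3, 1]                  # integer-encoded input
embedded = [table[i] for i in sequence]  # input_length rows of output_dim values
```

The same id always maps to the same vector, which is what lets training pull related categories closer together.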

brave sand
#

will this equation work for all reinforcement learning tasks?

#

new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
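
For context, that line is the tabular Q-learning update. A minimal sketch of it in use, with made-up constants and a tiny table; `q_update` is a hypothetical helper name:

```python
LEARNING_RATE = 0.1
DISCOUNT = 0.95


def q_update(q_table, state, action, reward, next_state):
    """Apply one tabular Q-learning update in place and return the new value."""
    current_q = q_table[state][action]
    max_future_q = max(q_table[next_state])
    new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)
    q_table[state][action] = new_q
    return new_q


# Two states with two actions each.
q_table = {"s0": [0.0, 0.0], "s1": [1.0, 2.0]}
new_q = q_update(q_table, "s0", action=1, reward=1.0, next_state="s1")
```

The update assumes states and actions are discrete and few enough to fit in a table; continuous or very large spaces usually need a function approximator instead.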

pastel valley
#

im using free google colab and i cant train 100 epochs all in a single runtime so can i just train it 50 epochs first then save the model then next runtime load and train it for another 50 epochs?

iron basalt
brave sand
odd meteor
# ocean pendant im kinda new to learning curve analysis . does anyone know if this curve is good...

From your plot, you can see that your train set (blue line) Loss reduces as the number of Epoch increases. However, we can't say the same about the Validation loss.

The validation loss briefly reduced alongside train loss until, say, in the 7th Epoch when it slowly starts to diverge.

So, in essence the bigger/wider the resulting space caused by the divergence between your Train loss and Validation loss, the more your model overfits the data

odd meteor
# ocean pendant im trying to run optimization buut I dont exactly know how I can fine tune the ...

Try using the EarlyStopping callback to prevent overfitting.

from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
model.fit(X_train, y_train, epochs=250, validation_data=(X_test, y_test), callbacks=[early_stopping])

Also, for hyperparameter tuning in DL with an approach similar to RandomizedSearchCV in sklearn, you could use the sklearn wrapper from keras

from keras.wrappers.scikit_learn import KerasClassifier
quartz plank
#

Has anyone tried out neural intent?

lapis sequoia
#

i'm trying to make a new column that converts the S&P ratings to numbers

#
import pandas as pd

grades = {
    'AAA': 1,
    'AA+': 2,
    'AA': 3,
    'AA-': 4,
    'A+': 5,
    'A': 6,
    'A-': 7,
    'BBB+': 8,
    'BBB': 9,
    'BBB-': 10,
    'BB+': 11,
    'BB': 12,
    'BB-': 13,
    'B+': 14,
    'B': 15,
    'B-': 16,
    'CCC+': 17,
    'CCC': 18,
    'CCC-': 19,
    'CC': 20,
    'C': 21,
    'D': 22,
}

states = pd.read_csv('./data/states_credit_scores.csv')
states_frame = pd.DataFrame(states)
number_sp = [grades[x] for x in states_frame['Rating']]
states_frame['Rating_Num'] = number_sp
states_frame.sort_values(by='Rating_Num', inplace=True)
states_frame

countries = pd.read_csv('./data/countries_credit_scores.csv')
countries_frame = pd.DataFrame(countries)
number_cp = [grades[x] for x in countries_frame['Rating']]
countries_frame['Rating_Num'] = number_cp
countries_frame.sort_values(by='Rating_Num', inplace=True)
countries_frame
#

this is my error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/snap/pycharm-professional/278/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/snap/pycharm-professional/278/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/amicharski/PycharmProjects/njBudget/main.py", line 37, in <module>
    number_cp = [grades[x] for x in countries_frame['Rating']]
  File "/home/amicharski/PycharmProjects/njBudget/main.py", line 37, in <listcomp>
    number_cp = [grades[x] for x in countries_frame['Rating']]
KeyError: nan
spark dirge
#

turn it into a for loop and print each iteration? that or running in a debugger to see which key is causing problems.
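
Something along these lines, using a shortened grades table and made-up ratings. `KeyError: nan` suggests at least one empty cell in the Rating column, which pandas reads as NaN:

```python
import math

grades = {"AAA": 1, "AA+": 2, "AA": 3}
ratings = ["AA", math.nan, "AAA", "NR"]  # made-up column values

missing = []
rating_num = []
for position, rating in enumerate(ratings):
    if rating in grades:
        rating_num.append(grades[rating])
    else:
        # Record what failed instead of raising KeyError.
        missing.append((position, rating))
        rating_num.append(None)

print(missing)
```

Once the offending rows are known, they can be dropped, filled in, or given an explicit grade.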

tranquil yarrow
#

Anyone know of a way to invert the background/axis/label colors on a matplotlib 3d plot?
I'm trying to plot the orbits of solar system bodies, and they are color-coded with light colors because they will be on dark background (space) eventually.
I could probably convert to HSL and then just turn down the lightness, but a dark background would probably just be better.
Here is what they look like currently:

#

For anyone curious, the visible blue orbit is Neptune, the largest colored orbit is the dwarf planet Gonggong, and the biggest gray orbit is Comet Ikeya-Zhang.

lapis sequoia
#

Any suggestions on how to solve systems of differential equations on GPU using Python? Are there any packages like SciPy that offer this functionality on the GPU? I posted this in the #algos-and-data-structs channel too but it might be more appropriate for this channel.

iron basalt
#

Which functions from SciPy do you want?

lapis sequoia
#

@iron basalt Numerical solvers. For example, the solve_ivp function in SciPy solves a system of ODEs but uses the CPU. Is there anything like that available for GPU?

lime current
#

Hello, has anyone used or worked on "spotify/ANNOY" machine learning nearest neighbor model?
I need held on my project!!!!

#

help*

iron basalt
#

It seems SciPy's default solving method is Runge-Kutta of order 5 (4). Assuming I'm interpreting "RK45" correctly and they don't actually mean "RKF45".

misty flint
#

we're moving into squiggle's domain

iron basalt
#

Just checked the source code for it, it's Runge-Kutta order 5 (4).

#

If you have never written a solver for it before there are plenty of tutorials, the code is really short.

#

So you can first write it with numpy, then move that to numba.
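
To give an idea of how short it is, here is a sketch of the classic fixed-step fourth-order Runge-Kutta method. Note this is the simpler textbook RK4, not the adaptive 5(4) scheme SciPy uses, and it handles a single scalar equation in plain Python; a system would use NumPy arrays for `y`.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h with classic RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def solve(f, t0, y0, t_end, steps):
    """Integrate from t0 to t_end with a fixed number of equal steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Test problem with a known solution: y' = -y, y(0) = 1, so y(1) = exp(-1).
approx = solve(lambda t, y: -y, 0.0, 1.0, 1.0, steps=100)
error = abs(approx - math.exp(-1))
```

The inner loop is the part that would be worth compiling with numba or moving to the GPU.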

misty flint
#

minitorch uses numba PikaThink

iron basalt
#

SciPy's solve_ivp is basically just RK45 with some extra code for picking methods other than RK45, and parameter wrangling.

slate hollow
#

can someone tell me why i need to install visual studio for cuda? i haven't installed any of the tools that it comes with, just the editor

#

and yet cuda seems to do just fine

#

what's so magical about vs 2019 that cuda wants

#

?

iron basalt
slate hollow
#

Microsoft SDK
?

iron basalt
#

Windows SDK*

slate hollow
#

this thing?

iron basalt
#

(unless you use mingw or something like that, but that is unofficial)

#

CUDA I think makes use of the visual studio SDK on windows so it may be that one (or both).

#

I know that whatever system you are using, CUDA hijacks your C++ compiler so you can write kernels in C++ directly.

#

Ah found the info: ```

Visual Studio is an IDE (Integrated Development Environment). It's the user interface.

Build Tools include the compiler that compiles your source code into machine code.

Windows SDK contains headers, libraries and sample code used to develop applications.

#

I would think that it needs the SDK and build tools.

#

But it probably wants to also integrate into visual studio.

slate hollow
#

thanks!

iron basalt
#

Windows does not have a standard way of dealing with SDKs / libs like Linux, so it's all a mess there.

misty flint
#

oof

#

sounds about right tho

#

think i wanna try this manim library

#

to see if i can make a short clip about numpy's broadcasting

rocky bough
#

Hello, I'm looking to have an interactive bot that reacts to user messages in certain scenarios and tries to match up a user's response up to one of a few different request options. I'm kind of lost as to what approach I should take here. Any general pointers would be much appreciated!

iron basalt
rocky bough
#

yes

iron basalt
#

Have you tried something really simple like naive bayes?

rocky bough
#

I haven't really tried anything yet. I'm looking for pointers on what I can read up on or specific libraries to use

#

I've done Binary Classification to use, but now I need something more

#

and I also dont really have that large of a training set to work on

iron basalt
#

Even more simple, you can just check for keywords in the messages.

#

Basically naive bayes, but hand crafted probabilities.
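
A rough sketch of the keyword approach; the intents and keyword sets are made up for illustration, and each match counts as one vote rather than a real probability:

```python
INTENT_KEYWORDS = {
    "greeting": {"hello", "hi", "hey"},
    "help": {"help", "stuck", "problem"},
    "goodbye": {"bye", "goodbye", "later"},
}


def classify(message):
    """Return the intent whose keywords overlap the message most, or None."""
    words = set(message.lower().split())
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


intent = classify("hi I have a problem and need help")
```

Splitting on whitespace is deliberately naive: punctuation stuck to a word (like "hello!") will not match, so real input would need some tokenizing first.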

misty flint
rocky bough
#

yeah, I've considered that, but in some cases, it would be necessary to differentiate between who is being referred to

iron basalt
#

Well that is not just classification, that is much more complicated.

misty flint
#

yeah best to start simple. you can always iterate later

rocky bough
#

for example if the message says "I will do xyz, you should do abc", that needs to be analysed

iron basalt
#

However, you could first classify it with something dumb like naive bayes, and then try to figure out stuff from there based on that class.

misty flint
#

might as well learn NLP basics too

iron basalt
#

Or yeah, actually learn NLP.

misty flint
#

you could use this as an excuse to learn more

misty flint
#

funny enough we went over RNNs today

#

super classic

#

not even LSTMs yet

#

or attention mechanisms

#

we should get to transformers eventually

#

and modern NLP architectures

iron basalt
#

If you are struggling with LSTM, try looking at GRU. It's better in every way.

#

More simple, better results.

misty flint
#

i also looked at that today

#

but on my own

rocky bough
#

ok, thanks for the pointers. so Naive Bayes can provide some basic classification, that can then be further analyzed, and if I need more I need to look further into NLP

#

since at the same time it can't be unreasonably complicated because more complex algorithms usually take more processing power, its probably better to keep it simple stupid anyway

mint palm
#

Can someone please help me understand how n_components works in hashing?

iron basalt
mint palm
#

I understand how it works

#

But dont know how dimensions work
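
For reference, a sketch of the general hashing trick, which is not necessarily how category_encoders implements it internally. Every value is hashed to one of `n_components` buckets, so the output always has exactly that many columns regardless of how many distinct categories exist. The values below are made up.

```python
import hashlib


def hash_bucket(value, n_components):
    """Map a category to one of n_components columns using a stable hash."""
    digest = hashlib.md5(str(value).encode("utf8")).hexdigest()
    return int(digest, 16) % n_components


def hash_encode(row, n_components):
    """Encode a row of categorical values as n_components counts."""
    encoded = [0] * n_components
    for value in row:
        encoded[hash_bucket(value, n_components)] += 1
    return encoded


row = ["route_17", "monday", "express"]  # made-up categorical values
vector = hash_encode(row, n_components=16)
```

A smaller `n_components` means more unrelated categories collide in the same bucket, which saves columns but makes the features harder to interpret.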

lapis sequoia
#

Hello guys, looking for some easy to follow python repos on github (data engineering preferred) where the code is written in a modular and production appropriate manner.

#

Basically I have been writing code in Jupyter notebooks for data lift and shift but would like to learn how to convert the code into a more modular and reusable format.

steady basalt
#

Boys my essay and coding assignment has been set

#

Which supervised models should I become an expert in 🧐

jaunty mural
#

wow, after good cup of tea I've improved my simple script file to plot 4 subplots

icy nebula
#

Hi there, Is there anyone who has a sample presentation/study file analyzing PCA components in terms of original variables? ( I am struggling to find a good example that explains PCA in business context)

odd meteor
# steady basalt Which supervised models should I become an expert in 🧐

As many as you can. There are many supervised learning algorithms; once you know 2 or 3, it'll be easier to grasp how the others work too. The syntax is almost the same across algorithms, though the hyperparameters sometimes differ.

Knowing both Linear-based and Tree-based algorithm is quite important

odd meteor
# icy nebula Hi there, Is there anyone who has a sample presentation/study file analyzing PCA...

Do you understand what PCA does? I think if you've understood it very well you can easily apply/implement it in any business context.

You can check this video https://youtu.be/FgakZw6K1QQ

odd meteor
# lapis sequoia Hello guys, looking for some easy to follow python repos on github (data enginee...

I don't know of any but try checking this https://youtu.be/bkJZDmreIpA then put on your FBI hat and do a quick digging on their GitHub repo. You might find what you seek therein

icy nebula
steady basalt
#

Which two… I already know how all of them “work” I meant on an expert level

#

We have to use them predictively as well as write essays

odd meteor
mint palm
#
    categorical_columns = [c for c in dataset.columns if (c != 'Slice Type (Output)')]

    hs = category_encoders.HashingEncoder(cols=categorical_columns, n_components=16)
    d = hs.fit_transform(dataset)```
#

i applied this encoding, is this actually correct, i worry about not able to see any correlation

patent pine
#

If I want to compare a column from 2 data frames, is there a more efficient way than df1.compare(df2) ?

#

Sorry I can not help with your problem. I was asking about mine 😂😂😭

desert oar
#

and what do you mean by "efficient"?

desert oar
#

xgboost seems to need more "care and feeding" to get good model performance, and generally is slower

#

and catboost never gave me good results compared to lightgbm on the problems where i tried it

serene scaffold
desert oar
#

data science code is usually bad quality

#

read the scikit-learn source code, their code is usually pretty decent

#

it's a bit "old school" in some respects, but for the most part it's a well-organized and thoughtful code base

serene scaffold
#

hmm, old school in what way?

desert oar
#

no type hints, * imports

#

one thing that they do which is interesting is mapping 1:1 **kwargs to instance attributes, this is actually enforced by their base classes

#

in a world without type annotations, that's a really nice thing

#

and in general it makes it impossible to accidentally discard user input

#

also distinguishing "generated" fields by suffixing with _ is kind of ad-hoc but a very useful convention

#

of course they almost certainly should have gone the R/statsmodels route of returning a "result" object instead of mutating the original "model" object and adding a bunch of fields

#

oh yeah, another old school thing: fields that are not initialized in __init__ and using hasattr() to check the current state of the object

#

the flipside of making the model fitted in-place is that you can chain transformers easily, but that's kind of a quirky thing that you don't usually need anyway
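
A toy illustration of those conventions in plain Python; `MeanCenterer` is a made-up class that does not inherit from any scikit-learn base class:

```python
class MeanCenterer:
    """Toy transformer following the conventions described above."""

    def __init__(self, with_mean=True):
        # Constructor arguments are stored unchanged under the same name.
        self.with_mean = with_mean

    def fit(self, values):
        # Learned state gets a trailing underscore and only exists after fit().
        self.mean_ = sum(values) / len(values) if self.with_mean else 0.0
        return self  # returning self is what makes chaining possible

    def transform(self, values):
        if not hasattr(self, "mean_"):
            raise RuntimeError("call fit() before transform()")
        return [v - self.mean_ for v in values]


centered = MeanCenterer().fit([1.0, 2.0, 3.0]).transform([1.0, 2.0, 3.0])
```

The `hasattr` check is the "old school" part: whether the object is fitted is inferred from which attributes happen to exist.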

red sphinx
#

hey yall

#

i wanted to ask a question

#

if AI isnt telling the code if the person says "hello" or hi or sup or wassup or whatever word that means a welcoming action then how is AI made i mean like siri and google assistant

serene scaffold
red sphinx
#

yea but if siri has components that include AI

serene scaffold
#

for example, if you ask siri a factual question, it uses an information retrieval algorithm to find a statement that answers the question, and that is AI.

red sphinx
#

so when i say hello it uses a algorithm to know what does hello even mean and what to answer if a user says hello

#

is that what your saying?

serene scaffold
#

no, just saying "hello" to siri and getting a response does not include any AI.

red sphinx
#

then how?, how does it know what hello means

serene scaffold
#
if user.says() in ('hello', 'hi'):
    return random.choice(['hi', 'hello', 'greetings'])
serene scaffold
red sphinx
#

so you mean that this is how siri is mean

serene scaffold
#

it's unlikely that the siri source code has whole conversations mapped out like this

red sphinx
#

ofcourse

#

thats what im saying

serene scaffold
#

but for trivial conversations, it's probably just picking from a few canned responses, just using speech recognition.

red sphinx
#

oh,

#

well thanks

serene scaffold
#

yw

red sphinx
serene scaffold
red sphinx
#

a quick one

#

thanks

red sphinx
#

return random.choice

#

i want to put that in my code

#

look ill give you an example

serene scaffold
#

it's my code, so you owe me 100 bucks

modest shuttle
#

Hello,
Which is better for detecting fake news?
Supervised Learning? Semi-Supervised Learning? UnSupervised Learning? Deep Learning?

red sphinx
#
message = input("Type your message: ")
if message == ("hello"):
print("hello", "hi", "greetings")
#

will this work?

serene scaffold
#

also none of those are mutually exclusive with deep learning

serene scaffold
red sphinx
#

oh

#

so how do i make it random then?

modest shuttle
serene scaffold
red sphinx
#

so i can just type

#
message = input("")

if message == hello:
return random.choice(['hi', 'hello'])
arctic wedgeBOT
#

Indentation

Indentation is leading whitespace (spaces and tabs) at the beginning of a line of code. In the case of Python, they are used to determine the grouping of statements.

Spaces should be preferred over tabs. To be clear, this is in reference to the character itself, not the keys on a keyboard. Your editor/IDE should be configured to insert spaces when the TAB key is pressed. The amount of spaces should be a multiple of 4, except optionally in the case of continuation lines.

Example

def foo():
    bar = 'baz'  # indented one level
    if bar == 'baz':
        print('ham')  # indented two levels
    return bar  # indented one level

The first line is not indented. The next two lines are indented to be inside of the function definition. They will only run when the function is called. The fourth line is indented to be inside the if statement, and will only run if the if statement evaluates to True. The fifth and last line is like the 2nd and 3rd and will always run when the function is called. It effectively closes the if statement above as no more lines can be inside the if statement below that line.

Indentation is used after:
1. Compound statements (eg. if, while, for, try, with, def, class, and their counterparts)
2. Continuation lines

More Info
1. Indentation style guide
2. Tabs or Spaces?
3. Official docs on indentation

modest shuttle
serene scaffold
red sphinx
#

well thanks

serene scaffold
#

a company I interviewed for told me about their fake news detection algorithm, but I don't think they want me repeating it.

serene scaffold
red sphinx
#

oh ok

#

then ill just make it in VSC then send it here

#

well ik i have asked from you alot

#

but just the last question

#

user.says() THIS

#

THIS

#

I HAVE SUFFERED FROM THIS

#

sorry caps but dude please tell me

serene scaffold
#

that part is entirely made up. there is no user.says()

red sphinx
#

ik

#

but the user.says

#

has the input code and stuff

#

please tell me how do i make it

serene scaffold
#

I don't know.

red sphinx
#

oh

#

ok

#

well thanks for the help

#

bye

odd meteor
patent pine
# desert oar what do you mean by "compare" exactly?

I have two models, and I want to compare their output. I want to know which event did they predict differently.
And since I'm dealing with massive data frames, I want to use the most efficient way to compare. Efficient in terms of memory.

patent pine
keen helm
#

ok, i have https://www.toptal.com/developers/hastebin/epelemifuz.properties as (a* algorithm) pathfind AI, i just wanna ask, where is the (0,0) point in this list https://www.toptal.com/developers/hastebin/novaconile.ini

short heart
#

is there an easy way to replace values with their mean for array area. For example:

[[0,6,0],
[6,3,1]]

and id want to take mean of [0,6],[6,3],[0,1] so result will be following:

[[3,4.5,0.5],
[3,4.5,0.5]]```

with numpy or without, anything will do
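
One way to do it without numpy, sketched on the example above: transpose with `zip`, average each column, then repeat the row of means once per original row.

```python
rows = [[0, 6, 0],
        [6, 3, 1]]

# zip(*rows) walks the columns; each column is replaced by its mean.
column_means = [sum(col) / len(col) for col in zip(*rows)]
result = [list(column_means) for _ in rows]
```

With a NumPy array `a`, something like `np.broadcast_to(a.mean(axis=0), a.shape)` should give the same values.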
serene scaffold
#

you can use mean imputation to replace the numeric NaNs and mode imputation to replace the string NaNs. Both of these can involve DataFrame.fillna. If you have any follow up questions, keep in mind that I will not look at any screenshots of text, only actual text in a markdown block in the chat or in the pastebin.

serene scaffold
acoustic crow
# serene scaffold you can use mean imputation to replace the numeric NaNs and mode imputation to r...

Apologies for sending them in such format. I am not looking to replace the values in the dataset, but instead whenever the function encounters a NULL value just to skip over it and do nothing with it. This is where I define the column list and the function


# positive integer columns
pos_int_col = data_check[['ApplicationFinancedAmount'
              ,'AssetHighestValueGapRatio'
              ,'AssetHighestValueManufacturingYear'
              ,'DeductionPercentage']] 

# Creating a function to check for negative values "find_neg_index" and print the value of the row and its position
# The function takes two arguments 
# df - the dataframe to validte upon
# num_col - is a predifined list of integer values only columns within the dataframe / or a single integer column

def find_neg_index(df, num_col):
  
  neg_dict = {}
# Iterating on column level
  for col in num_col:

# Creating a list within the dictionary and adding the column name as key and input an empty list as it pair
    neg_dict[col] = []

# Getting the full lenght of the dataframe, row
    indx_list = range(0,len(df[col]))

# Creating an empty list for the index position
    neg_indices = []
  
# Iterating on row level
    for indx in indx_list:   
    
# Extracting the value on each row the loop is working on
      val = data_check.loc[indx,col]

# Setting the condition for the validation and transforming string values to numeric 
      if pd.to_numeric(val) < 0:
        print('Find ',val,'at row',indx,'for column',col)
        neg_indices.append(indx)
        neg_dict.update({col:neg_indices})
        
  return neg_dict ```
#

After that when I parse the dataframe and the list of columns through the function I get the following error:


#error message
Find  -62242.65 at row 25 for column ApplicationFinancedAmount
ValueError: Unable to parse string "NULL" at position 0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "NULL"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<command-156159945049379> in <module>
----> 1 find_neg_index(data_trans, pos_int_col)

<command-3934177330870497> in find_neg_index(df, num_col)
     26 
     27 # Setting the condition for the validation and transforming string values to numeric
---> 28       if pd.to_numeric(val) < 0:
     29         print('Find ',val,'at row',indx,'for column',col)
     30         neg_indices.append(indx)

/databricks/python/lib/python3.8/site-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    152         coerce_numeric = errors not in ("ignore", "raise") ```
#

I just want for the function to skip over the NULLS but I am not sure how to achieve that.

serene scaffold
#

if you call dropna on a series, it will give you a copy of the series with no NaNs. if you do it on a df, it will give you a copy without rows that had at least one NaN
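
Since the failing values look like the literal string "NULL" rather than NaN, they may need converting before `dropna` can see them. In pandas, `pd.to_numeric(series, errors="coerce")` turns unparseable entries into NaN. The skipping logic itself, sketched in plain Python with made-up values and a hypothetical helper name:

```python
def find_negative_positions(values):
    """Return positions of negative numbers, skipping anything non-numeric."""
    positions = []
    for position, raw in enumerate(values):
        try:
            number = float(raw)
        except (TypeError, ValueError):
            continue  # "NULL", None, and other unparseable entries are skipped
        if number < 0:
            positions.append(position)
    return positions


column = ["NULL", "120.5", "-62242.65", None, "-3"]  # made-up values
negatives = find_negative_positions(column)
```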

keen helm
#

how does A* determine x and y position

lapis sequoia
#

What rl algorithm would be smart to choose for a simple shooter game.

The agent would be given the
position and rotation of enemies (direction they're facing)
distance to the enemies, and its own

rotation, speed, position and acceleration.

Possible moves would be
Turn left, right
accelerate forwards backwards left and right, fire.

Positive points for hitting enemies, negative points for being hit.

(If for whatever reason more complexity is needed, enemies could move, the agent could accelerate in 8 directions instead of 4 (diagonal) and the agent only gets position of visible enemies.)

lapis sequoia
#

Is QR decomposition in numpy deterministic? As in, does it involve any random factors, or will it give the same output for the same matrix every time?

misty flint
#

could probably use simple q-learning for that
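For reference, a rough sketch of what tabular Q-learning looks like for an action set like the one described. It assumes the continuous observations (positions, rotations, speeds) have already been discretized into a hashable state, which the thread does not cover; the action names, `alpha`, `gamma` and `epsilon` values are placeholders, not tuned settings.

```python
import random
from collections import defaultdict

# Hypothetical action names based on the moves listed in the question.
ACTIONS = ["turn_left", "turn_right", "forward", "backward",
           "strafe_left", "strafe_right", "fire"]

def make_q_table():
    # Every unseen state starts with a value of 0.0 for each action.
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(q, state, epsilon=0.1):
    # Epsilon-greedy: explore with probability epsilon, otherwise pick the best known action.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(q[state], key=q[state].get)

def update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Standard Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

q = make_q_table()
update(q, "s0", "fire", 1.0, "s1")   # +1 reward for hitting an enemy
print(choose_action(q, "s0", epsilon=0.0))
```

If the discretized state space gets too large for a table, a function approximator (for example a DQN) is the usual next step.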

brave sand
#

if i had a drone learn how to fly, what would be the reward?

lapis sequoia
brave sand
lapis sequoia
#

hm yeah Q learning is related to reinforcement learning, and qr decomposition is a very different process.
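On the QR question itself: as far as I know `np.linalg.qr` uses no random numbers, so repeated calls on the same matrix on the same machine should return the same factors. A quick check, using an arbitrary example matrix:

```python
import numpy as np

# Arbitrary example matrix (assumption, just for the check).
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Factor the same matrix twice with the default (reduced) mode.
q1, r1 = np.linalg.qr(a)
q2, r2 = np.linalg.qr(a)

print(np.allclose(q1, q2), np.allclose(r1, r2))  # same factors on both calls
print(np.allclose(q1 @ r1, a))                   # Q @ R reconstructs the input
```

Results can still differ in sign or in the last few bits between different machines or BLAS/LAPACK builds, so compare with a tolerance rather than exact equality.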

mint palm
#

Will i have to experiment when it comes making architectures like this:

misty flint
lapis sequoia
#

hi

#

where do i ask for help

misty flint
#

try a #help channel

serene scaffold
lapis sequoia
#

tysm

misty flint
#

yes there should be available ones

#

so try to use those

lapis sequoia
misty flint
#

all good

worldly zinc
#

Do questions about numerical integration methods go here?

serene scaffold
worldly zinc
#

Ah, darn

#

I tried the help channel but no luck, I'll try another server

#

thanks!

paper compass
#

most of these audio clips contain multiple labels (in rating system 0-3)

misty flint
#

not sure

#

speech recognition is not my area of expertise

#

but it does sound like an interesting problem

lapis sequoia
#

only longer interviews, seems too messy to work with

misty flint
#

oof

#

probs gotta clean it first before can get to usable state then

vestal ocean
#
```py
artists_european = artists_european.drop(['Position','Track Name', 'URL', 'Date','Region'], axis = 1)
```
#

Why does this code give me this?

#

But I get this when running:
```py
artists_european = artists_european.groupby("Artist")['Streams'].sum()
```

#

How can I make it stay in the previous format while just summing the streams for each artist?

lapis sequoia
#

with previous format you mean including the index labels ?

#

or the order ?

vestal ocean
#

i can just reset it
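A small sketch of keeping the result as a regular DataFrame: `groupby(...).sum()` on a single column returns a Series indexed by artist, and either `as_index=False` or `.reset_index()` turns it back into a two-column frame. The sample rows are invented; only the `Artist` and `Streams` column names come from the snippet above.

```python
import pandas as pd

# Hypothetical sample rows (assumption); column names are from the thread.
artists_european = pd.DataFrame({
    "Artist": ["A", "B", "A"],
    "Streams": [10, 5, 7],
})

# as_index=False keeps "Artist" as an ordinary column instead of the index.
totals = artists_european.groupby("Artist", as_index=False)["Streams"].sum()

# Equivalent: group first, then move the index back into a column.
totals_alt = artists_european.groupby("Artist")["Streams"].sum().reset_index()
print(totals)
```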

lapis sequoia
paper compass
#

but training model with this data is kinda unclear for me, how should i correctly label those clips?

lapis sequoia
paper compass
#

if for example one clip contains 4 labels, with associated number from 1 to 3

lapis sequoia
vestal ocean
lapis sequoia
#

is the dataset publicly available ?

misty flint
#

3 means all 3 agree on that label

vestal ocean
paper compass
lapis sequoia
vestal ocean
mint palm
#

suggest a functional api plz, columns 7 and 8 are strongly correlated, the others comparatively much less

misty flint
vestal ocean
misty flint
#

i also dont know what you tried so theres always dif ways for improvements

paper compass
misty flint
#

who knows tbh

#

you cant really look at a dataset and know immediately without exploring and/or trying a few models

paper compass
#

so there's hope, that's all i really needed to know lol

misty flint
#

glad i could help (?) kekHands

paper compass
#

yes u did help! thank you 🙂

misty flint
#

@strange stump what do you mean by just plotting and gradients?

#

you seem like youre at least comfortable in excel, no?

strange stump
#

scatter plots for example

misty flint
#

thats a good start

strange stump
#

i used python to analyse my data for my degree

#

i am a physics student

misty flint
#

i see

steady basalt
strange stump
#

had some "useless" data i needed to clean and then identify peaks

#

i used python for this

misty flint
#

physics is a good background for this tbh

#

since you are used to working with messy data kekHands

strange stump
#

ya thats probably why i got the interview

misty flint
#

thats good

#

i would try to step back and try to understand the dataset when you open it up

#

look at the column names, see if theres any attached documentation that might give you more context

strange stump
#

yeah ok

misty flint
#

then you can maybe figure out what exactly you want to plot

strange stump
#

so to my understanding

misty flint
#

what is independent vs dependent

strange stump
#

id wanna clean it first

#

technically i can do that with python

misty flint
#

typically but you may also receive an already cleaned dataset

strange stump
#

true!

misty flint
#

if youre more comfortable in python, then feel free

#

whatever youre most comfortable with

strange stump
#

and they said i have 30 mins for this

#

so its probably already cleaned

misty flint
#

then after understanding the columns, i would do some EDA (Exploratory Data Analysis)

strange stump
#

i just gotta draw some conclusions about the variables they give

misty flint
#

pandas is really good for EDA

strange stump
#

i might need to look at that more

#

i just remember using pandas to store data in a dataframe

strange stump
#

and then doing fourier transforms on my cleaned data

lapis sequoia
misty flint
#

since you can see avg, max, min, count, etc.
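For example, `DataFrame.describe()` gives those summary statistics in one call. The columns and values here are made up just to show the shape of the output:

```python
import pandas as pd

# Hypothetical dataset (assumption) standing in for whatever the interview provides.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2.0, 4.1, 5.9, 8.2],
})

# describe() reports count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()
print(summary.loc[["count", "mean", "min", "max"]])

# Other quick first-look calls: column dtypes and missing values per column.
print(df.dtypes)
print(df.isna().sum())
```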

misty flint
strange stump
#

HAHAHAHAH

#

yeah its just nice discrete data i hope

#

i wouldnt expect anything too spicy just a nice simple graph

misty flint
#

probs otherwise it would be a bit much to expect from a data analyst role

strange stump
#

graduate level too...

misty flint
#

yeah how comfortable are you with python viz libraries

#

either matplotlib or plotly or seaborn

strange stump
#

for example matplotlib?

#

ive used matplotlib to draw my graphs

misty flint
#

make sure you can

  1. plot graphs
  2. label axes and titles
  3. draw simple regression lines
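A minimal matplotlib sketch covering those three points, assuming a simple x/y dataset (the numbers are invented). `np.polyfit` with degree 1 is one straightforward way to get a straight-line fit; it is not the only option.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data (assumption): exactly y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Degree-1 least-squares fit returns (slope, intercept).
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")                                       # 1. plot the graph
ax.plot(x, slope * x + intercept, color="red", label="linear fit")   # 3. regression line
ax.set_xlabel("x (independent variable)")                            # 2. axis labels
ax.set_ylabel("y (dependent variable)")
ax.set_title("Scatter plot with regression line")                    # 2. title
ax.legend()
```

In a notebook or interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to see the figure.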
strange stump
#

the other channel user said seaborn is just for visualisation too but not necessary

misty flint
#

yeah it just looks nicer is all

#

im partial to plotly myself

#

but you should be able to still convey the info with matplotlib

strange stump
#

should i bother learning how to present data with seaborn?

misty flint
#

ehh

#

do you think it will help

strange stump
#

i mean if its visually pleasing they might like it more?

misty flint
#

or do you think the company cares more about the info

strange stump
#

psychology and sht like that

misty flint
#

i mean its not that hard to pick up tbh

#

so up to you

#

i wouldnt use matplotlib graphs in any documentation or papers but thats me

strange stump
#

i think theyre a bit ugly too yeah xD

#

ok fine i got the visualisation bit

#

and maybe the analysis

#

so i should be good then?

misty flint
#

good

strange stump
#

i just gotta answer some questions apparently

misty flint
#

get some practice in working with regular datasets and you should be good

#

just so you dont run out of time

strange stump
#

oh yeah

#

ok i should be good then

#

thanks man

#

ill ping you next week IF i need it 😄

misty flint
#

no problem bud

#

best of luck

vestal ocean
lapis sequoia
mild dirge
#

Using an existing neural network model (efficient net b1) for object recognition, but untrained. Trying to fit it on dataset of 400 classes (bird images) with about 100 images per class. Is this going to take really long to train/ is it possible from an untrained efficient net b1 model?

#

Running it with pytorch on a 2080 gpu with cuda

agile cobalt
mild dirge
#

hmm

#

I'm more so wondering if it would even converge after a reasonable amount of epochs

#

it takes about 5 mins per epoch (40k images)

#

But I guess there's no good way to tell

#

Training such a massive network from scratch seems not very do-able

agile cobalt
#

7.9M parameters with a depth of 186...

mild dirge
#

yeah haha

#

using transfer learning it's literally 1 epoch that takes about 1 minute and 90% accuracy

#

Doing it for a project comparing pre-trained and non-pretrained networks

agile cobalt
mild dirge
#

But otherwise i'd have to design my own network, which makes the comparison basically worthless

mild dirge
agile cobalt
#

the v2 sounds like it should be a direct improvement on the v1

mild dirge
#

right haha

agile cobalt
#

same authors, two years of progress later down the line

mild dirge
#

Don't see it in pytorch, and all my code is with pytorch rn so that would complicate a lot

#

I'll just try efficientnet b0 for now, it seems a lot smaller and at least 3 times faster per iteration

mild dirge
#

!paste

#

Currently looking at this code (custom CNN model using PyTorch), And i'm not completely sure how the shapes match for a specific line (line 46)

#

The input shape there is 64 x 7 x 7 but in the forward pass they explain that the output after the layer before it would be 128 x 7 x 7 (line 68)

#

The code seems to work fine however, so is the comment wrong, or am I missing something?

#

And a bonus question, They seem to bundle these layers up multiple times. Does this pattern have a name? what does res stand for?

#

Appreciate any response!

misty flint
#

possibly ResNet

#

where are all our CV peeps kekHands

misty flint
#

ResNet follows VGG’s full convolutional layer design. The residual block has two convolutional layers with the same number of output channels. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function.

#

im not even a computer vision guy

mild dirge
#

Thx for the reply!

mild dirge
misty flint
#

yeah maybe the comment is wrong. i believe you can always check

#

have it print the size at that line or something

#

good night

misty flint
stone marlin
#

I feel like one thing that people looking to advance in their DS careers tend to not think about is DE and Devops stuff, as well as Business-related things. It's totally possible to become a staff or whatever DS without this stuff, but having a general understanding, in my opinion, makes one much more competitive in the industry and allows for a more holistic understanding of the entire data pipeline --- instead of just modeling.

But since I'm doing MLE right now, my opinion is pretty biased, ha.

lone drum
#

hello
```python
  File "C:\Users\Admin\AppData\Local\Temp/ipykernel_2592/3914410830.py", line 1
    nf_df_cur_exp = df[df['Expiry_new'] == 2022-03-24]
                                                ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
```

#

How can I fix the above error?

#

When I try `nf_df_cur_exp = df[df['Expiry_new'] == 2022-0o3-24]` I get an empty dataframe.

#

Can anyone help me with this? Ping me when you reply.

iron basalt
#

Did you mean "2022-03-24"?
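To spell that out: without quotes, `2022-0o3-24` is plain integer arithmetic (2022 minus 3 minus 24), so the filter compares the column to 1995 and matches nothing. A sketch of both the string comparison and a datetime comparison, using invented rows and assuming the column holds either date strings or datetimes:

```python
import pandas as pd

# Hypothetical rows (assumption); the column name is from the thread.
df = pd.DataFrame({
    "Expiry_new": ["2022-03-24", "2022-03-31", "2022-03-24"],
    "Strike Price": [14350.0, 14400.0, 14350.0],
})

# Unquoted, the "date" is just subtraction, which is why the frame came back empty.
print(2022 - 0o3 - 24)

# If the column holds strings, compare against a string.
nf_df_cur_exp = df[df["Expiry_new"] == "2022-03-24"]

# If the column holds datetimes, compare against a Timestamp instead.
df["Expiry_dt"] = pd.to_datetime(df["Expiry_new"])
nf_df_cur_exp_dt = df[df["Expiry_dt"] == pd.Timestamp("2022-03-24")]
print(len(nf_df_cur_exp), len(nf_df_cur_exp_dt))
```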

lone drum
inland zephyr
#

i have question about loss while training a model

Epoch 1/10
8/8 [==============================] - 14s 578ms/step - loss: 529.6362 - accuracy: 0.7676 - categorical_crossentropy: 529.6362 - val_loss: 62.1763 - val_accuracy: 0.0000e+00 - val_categorical_crossentropy: 62.1763
Epoch 2/10
8/8 [==============================] - 2s 248ms/step - loss: 466.1245 - accuracy: 0.8966 - categorical_crossentropy: 466.1245 - val_loss: 78.1461 - val_accuracy: 0.1207 - val_categorical_crossentropy: 78.1461
Epoch 3/10
8/8 [==============================] - 2s 248ms/step - loss: 201.3840 - accuracy: 0.9024 - categorical_crossentropy: 201.3840 - val_loss: 139.4732 - val_accuracy: 0.1762 - val_categorical_crossentropy: 139.4732
...
Epoch 9/10
8/8 [==============================] - 2s 252ms/step - loss: 60.5674 - accuracy: 0.9659 - categorical_crossentropy: 60.5674 - val_loss: 897.9677 - val_accuracy: 0.7778 - val_categorical_crossentropy: 897.9677
Epoch 10/10
8/8 [==============================] - 2s 245ms/step - loss: 66.1619 - accuracy: 0.9669 - categorical_crossentropy: 66.1619 - val_loss: 924.7500 - val_accuracy: 0.8333 - val_categorical_crossentropy: 924.7500
3/3 [==============================] - 1s 241ms/step - loss: 414.3506 - accuracy: 0.8426 - categorical_crossentropy: 414.3506

I used a small number of epochs (10) since my dataset is very small. I don't know if it's normal for the validation loss to spike this quickly while the evaluate result looks fine.

shell anvil
#

that never happened to me

#

when you say the data are very small, what you really mean with that?

#

@inland zephyr

inland zephyr
#

18 classes, each class has 18 images

#

using an 80:20 split, so 16 train and 4 test. On top of that, I also set a 0.1 validation split in the training phase

shell anvil
#

ok

#

returning to your doubt

shell anvil
#

but...

#

I'm not 100% sure

#

I'm new to machine learning

#

but I think its fine

inland zephyr
#

but this is what happens when I call model.evaluate()

Accuracy: 0.790123462677002
AUC: 0.5
Precision: 0.0555555559694767
Recall: 0.0555555559694767
F1-Sco: 0.0555555559694767

I think this is troublesome

lone drum
#

my dataframe looks like this:
```python
1 Strike Price Token_x Exchange_x ... Vega_y Gamma_y Expiry_new_y
0 14350.0 102048025.0 NgdE ... None None 2022-03-24
1 14350.0 102048025.0 NSsgE ... None None 2022-03-24
```
I want to make the first row the header.
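One way to promote the first row to the header, sketched on a tiny invented frame (the real one has more columns). It assumes the header values really are sitting in row 0 of the data:

```python
import pandas as pd

# Hypothetical frame (assumption) whose first row holds the intended column names.
raw = pd.DataFrame([
    ["Strike Price", "Token_x", "Exchange_x"],
    [14350.0, 102048025.0, "A"],
    [14350.0, 102048025.0, "B"],
])

# Drop the first row from the data, then reuse its values as the column labels.
df = raw.iloc[1:].reset_index(drop=True)
df.columns = raw.iloc[0].tolist()
print(df)
```

The columns keep the `object` dtype after this, so numeric columns may need `pd.to_numeric` afterwards. If the frame was read from a file, passing the right `header=` argument to the reader is usually cleaner.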

#

ping me when reply

dusty ivy
#

HI guyz

#

can anyone help me with how to implement this one?

#

I implemented this in C++ but the output doesn't seem correct

#
void train(vector<vector<double>> xy){
            int x = 0, y = 1;
            int epoch = 3;
            while (epoch--){
                random_shuffle(xy.begin(), xy.end());
                double tot_err = 0;
                while(tot_err < 0.01){
                    for(vector<double> data : xy){
                        double y_c = predict(data[x]);
                        // a.
                        err = data[y] - y_c;
                        tot_err += err * err;
                        // b.
                        b1 = b1 + alpha * err * data[x];
                        b0 = b0 + alpha * err;            
                    }
                }
            }
        }
#

the epoch here happens with the variable x and y are done distributed.

#
Total error: 1.80167e+09
Total error: 1.45195e+09
Total error: 1.54914e+09
y = -2556 + 6608x
#

The good thing here is that the total error is not zero, but the fitted line should be something like y = -2467 + 256x, with a slope in the hundreds rather than the thousands, so my output seems too different.
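For comparison, a Python sketch of the same per-sample update rule (`b0 += alpha * err`, `b1 += alpha * err * x`) on data where the true line is known. Two things stand out in the C++ above: the inner `while (tot_err < 0.01)` loop exits after a single pass because the summed error is already far above 0.01, so each epoch is one sweep, and three epochs is very few. A learning rate that is too large for the scale of `x` is another common cause of coefficients that are far off, though that cannot be confirmed from the snippet alone. The data, `alpha` and epoch count below are assumptions chosen so the example converges.

```python
import random

def sgd_linear_regression(xs, ys, alpha=0.01, epochs=200, seed=0):
    """Fit y = b0 + b1 * x with per-sample gradient updates (a sketch, not the original code)."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)                  # reshuffle once per epoch
        for x, y in data:
            err = y - (b0 + b1 * x)        # a. prediction error for this sample
            b0 += alpha * err              # b. update the intercept
            b1 += alpha * err * x          #    and the slope
    return b0, b1

# Hypothetical noiseless data (assumption): y = 2 + 3x with x already in [0, 1].
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 + 3.0 * x for x in xs]
b0, b1 = sgd_linear_regression(xs, ys, alpha=0.1, epochs=2000)
print(round(b0, 3), round(b1, 3))
```

If the real `x` values are large, rescaling them (for example to zero mean and unit variance) before training and then converting the coefficients back is a common way to keep the updates stable.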

tacit basin
dusty ivy
#
def sum_of_sqerrors(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    return sum(error(alpha, beta, x_i, y_i) ** 2
               for x_i, y_i in zip(x, y))

@tacit basin what is the zip(x, y) here?
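For anyone reading along: `zip(x, y)` pairs up the elements of the two sequences position by position, so the loop sees `(x[0], y[0])`, then `(x[1], y[1])`, and so on. A small runnable sketch; the `error` helper below is an assumed definition (predicted minus actual for the line `beta * x + alpha`), since the original one is not shown in the thread.

```python
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]

# zip pairs items by position and stops at the shorter input.
pairs = list(zip(x, y))
print(pairs)

def error(alpha, beta, x_i, y_i):
    # Assumed helper: prediction of the line beta * x + alpha, minus the actual value.
    return (beta * x_i + alpha) - y_i

def sum_of_sqerrors(alpha, beta, x, y):
    # Each (x_i, y_i) produced by zip is one data point.
    return sum(error(alpha, beta, x_i, y_i) ** 2 for x_i, y_i in zip(x, y))

total = sum_of_sqerrors(0.0, 2.0, x, y)
print(total)
```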

tacit basin
dusty ivy
#

okay thanks...

steady basalt
#

does anyone know when scikit-learn's random forest was released?

#

I am quite surprised that these authors did not show in their results that RF can score higher than their NN. On this dataset I see 74% with RF without oversampling

#

is this sort of academic trickery prevalent?

#

And I wonder why these essentially homework level projects are being published

misty flint
mint palm
#
    model.add(Dense(8, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dense(4, activation='tanh'))
    model.add(BatchNormalization())
    model.add(Dense(3, activation='softmax'))```
#

comment on performance please

tacit basin
#

Beautiful

#

Although a bit unusual that train is worse than test

mint palm
mild dirge
#

Maybe the test images are cherry-picked? @mint palm

mint palm
mild dirge
#

Ah

#

well if it's not averaged it could just be an outlier

mint palm
#

?

#

you mean intentionally picked examples, for a normal distribution?

mild dirge
#

Have another pytorch question, when using transfer learning you often see something like in this code I attached. the model.classifier = line. Is this an existing part of the model that we replace with our own layers?

mild dirge
mint palm
karmic moth
#

Does anyone know TF-IDF well? My question is: should we remove extremely rare words/features and the most common words/features when producing TF-IDF vectors, by using min_df and max_df?

sinful bramble
#

Please, I need help. I want to crop the passport of each student base on the of the student in the album. I wrote an algorithm which can crop the passports, but it crops them in random order, whereas I want the first passport to be 001.jpg and the second passport to be 002.jpg.

mild dirge
#

Does that make sense?

karmic moth
grave frost
#

@iron basalt Pretty disappointed in Numenta all in all, the fact that they're resorting to such base tricks to try and show the performance of their methods is...honestly appalling.

#

they feed an explicit one-hot-encoded vector to their model for a meta learning, multi-task RL env and they have the gall to call it a "prior" which other DL models don't have access to?

#

pretty much exploiting the definition of a prior smh

mint palm
mild dirge
#

I mean if that test accuracy is correct then it seems fine

#

but it is weird that your test accuracy is higher than training accuracy

#

So there might be some unknown underlying problem

desert oar
#

you should check the docs to see what compare does, it probably does more than you need

mild dirge
#

Is it unconventional to not freeze a large part of the model when using transfer learning?

lapis sequoia
#

Can someone help me figure out why it's not working when I try it in a loop?

#

model is a list of strings

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

Anyway, it looks like you're trying to select rows where df['Model'] is an element of model, is that right?

lapis sequoia
#
IndexError                                Traceback (most recent call last)
<ipython-input-50-9ca28fa27480> in <module>
      1 for m in model:
----> 2     x=df[df["Model"]==m].sort_values("Total",ascending=False).iloc[0]    ##Taking the one with the maximum Total

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    893 
    894             maybe_callable = com.apply_if_callable(key, self.obj)
--> 895             return self._getitem_axis(maybe_callable, axis=axis)
    896 
    897     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1499 
   1500             # validate the location
-> 1501             self._validate_integer(key, axis)
   1502 
   1503             return self.obj._ixs(key, axis=axis)

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_integer(self, key, axis)
   1442         len_axis = len(self.obj._get_axis(axis))
   1443         if key >= len_axis or key < -len_axis:
-> 1444             raise IndexError("single positional indexer is out-of-bounds")
   1445 
   1446     # -------------------------------------------------------------------

IndexError: single positional indexer is out-of-bounds```
serene scaffold
lapis sequoia
#

there are repeated rows. I wanna take the one with the maximum "Total" column. Because it's the latest

serene scaffold
#

try df.groupby('Model')['Total'].max()

#

without the for loop. just that one statement by itself.
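A short sketch of both points, on invented rows with the thread's `Model` and `Total` columns. The `IndexError` most likely comes from a value in `model` that matches no rows, because `.iloc[0]` on an empty frame has nothing to return. And if the whole row is needed rather than just the maximum, `idxmax` gives the row labels to select:

```python
import pandas as pd

# Hypothetical data (assumption); "Other" stands in for the remaining columns.
df = pd.DataFrame({
    "Model": ["A", "A", "B", "B"],
    "Total": [10, 30, 5, 2],
    "Other": ["old", "new", "new", "old"],
})

# A model name that is not in the frame gives an empty selection,
# and calling .iloc[0] on it raises "single positional indexer is out-of-bounds".
empty = df[df["Model"] == "C"]
print(len(empty))

# Maximum Total per model, as suggested above.
max_total = df.groupby("Model")["Total"].max()

# Full row with the maximum Total for each model: idxmax returns the row labels.
latest = df.loc[df.groupby("Model")["Total"].idxmax()]
print(latest)
```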

lapis sequoia
#

Then do i drop the duplicates and replace their total value from this new DF?

serene scaffold
#

If I were to do so, I'd have to see the rest of the DataFrame. but I'm heading out now.

tacit basin
mint palm
#

what does it mean by n_sample and n_feature

#

in above

mild dirge
final spruce
#

Hey, I'm trying to make my own trading bot.
Does anyone know a good API I could use to get stock information (not only the value/volume but also values from indicators like MACD / RSI)?

formal breach
#

Guys, is Dataquest a good site to learn Python? I've learned the basics, so now I'll move on to doing projects and learning about data science (data analysis and machine learning). Are there better places to learn, or is this good? I like learning by visualising and doing projects.

jaunty mural
lapis sequoia
tacit basin
mild dirge
#

ah alright

#

Using SGD now with decaying learning rates for the entire model so won't be using that I think

#

thx for the replies!

tacit basin
mild dirge
#

ah cool. Just started using pytorch for this project. I'll just probably try to wrap up this project as soon as possible to start on the report. but pytorch seems really cool.

#

The fact that it is a lot lower level than sklearn, which I mostly use, really helps me understand stuff better

tacit basin
#

Sorry it's discriminative learning rates not differential

iron basalt
# grave frost @iron basalt Pretty disappointed in Numenta all in all, the fact that ...

Is this still about that same paper? Can you link it again? Let me put it this way, I am impressed with Numenta's ideas, not their results or comparisons with others. For example, there are several others that have also gone and run with the grid cell idea and their stuff seems to be getting results. So I would suggest taking their ideas and trying to make them work yourself, and avoid the issues that they have. There is ofc always drama in the ML community and such. If you think the idea might still have some merit to it but they did it wrong, either in their implementation or method of testing / comparison, then you can do it the right way. You can find those that blindly follow Numenta's work, and those that are overly dismissive.

lapis sequoia
#

If anyone here with experience in the AI & ML field doesn't mind recommending a solid book for AI beginners/juniors, please kindly do, as I'm really confused about learning AI.

stark zenith
#

Been doing data manipulation in pandas for like 6 months now for work and just now got a strong hold on apply, and now I feel like i can both rule the world, and need to reinvent everything I wrote so far.

proven meadow
#

So full disclosure I originally asked this in a help channel but it’s kind of an open ended question so I think it fits here better.

Hello, so I have a project where I need to parse through a txt file of a classic novel, examine all lines of spoken dialogue, and (this is the hard part) decide which character speaks which line.

My teacher has not lectured us on NLP before, and I honestly don’t know where to start for the actual classification algorithm. If anyone can help guide me with any tips on what I would have to employ, links to resources (that aren’t too mathy for a HS sophomore), explain some packages that can help, etc., that would be great, thanks!

serene scaffold
#

what class is this for?

proven meadow
serene scaffold
misty flint
#

so can SpaCy

proven meadow
#

Sorry I’m a noob at this I don’t really know ML

blissful bone
misty flint
#

depending on the complexity of the text, simple classification models should suffice

#

otherwise

#

youre looking at maybe more advanced stuff

serene scaffold
proven meadow
misty flint
#

so instead of classification youre doing clustering. for the most part the same tbh

serene scaffold
misty flint
#

inb4 10+ details

proven meadow
#

Give me a moment

misty flint
#

this is like when you work with a business stakeholder

#

no offense to business peeps

grave frost
# iron basalt Is this still about that same paper? Can you link it again? Let me put it this w...

ye, https://arxiv.org/abs/2201.00042
my main issue is why the level of inconsistencies in the overall testing methodology? put it on the forum, authors won't even reply 🤷‍♂️

its not a major thing really, but....it does put a dent in Numenta's overall credibility

proven meadow
# serene scaffold uh okay. are there any other details like this?

Ok so basically the project is: Build a "profile" for each character in the novel. The profile includes all of their spoken dialogue as well as a list of adjectives that would accurately describe their characterization in the novel. I'm pretty sure it's supposed to be unsupervised clustering (ie I will not go through the novel by hand and match words to characters).

#

This is my teacher's first year doing this lab so it's pretty open-ended, I don't need something perfect

serene scaffold
proven meadow
#

Oh oops yeah. In that case what I meant is that I'm only doing this for one novel

#

wait