#data-science-and-ml


winged stratus
#

i can try and get up to 1000 more images

grave breach
#

Yes, if you lower the quality 1000 should be ok

winged stratus
#

500x500?

#

i dont know what i should lower it to

grave breach
#

Make a try

pallid parcel
#

Anybody know of a library that produce PNG files from data points?

crude fable
#

opencv?

#

what kind of data points?

pallid parcel
#

Well, data points for line graphs or candlestick graphs

crude fable
#

ah, then matplotlib

pallid parcel
#

Matplotlib makes PNG files?

crude fable
#

of course

pallid parcel
#

Huh

#

Ok

#

Thanks

crude fable
#

just plt.savefig("*.png")
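A minimal sketch of the `savefig` approach mentioned above. The data points, axis labels, and the file name `line_graph.png` are made up for illustration; the `Agg` backend is chosen only so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up data points for a simple line graph.
xs = [0, 1, 2, 3, 4]
ys = [10.0, 10.5, 9.8, 11.2, 11.0]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("time")
ax.set_ylabel("price")

# The ".png" extension selects the output format; the file name is arbitrary.
out_path = os.path.join(tempfile.gettempdir(), "line_graph.png")
fig.savefig(out_path)
plt.close(fig)
```

Candlestick charts need more work (matplotlib has no built-in candlestick call), but the saving step is the same.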

pallid parcel
#

Nice, sounds simple enough

#

Been looking all over

crude fable
#

Do u have an idea of how GANs could converge?

grave frost
#

that's too few. even using the DeiT recipe wouldn't yield much

grave breach
shy ember
#

i was hoping for 7-9 clusters but it doesn't seem to separate them far enough

grave breach
#

For this you should use k-means clustering

crude fable
shy ember
#

from what i recall this was why i didn't use k-means in the first place

crude fable
#

the K in k-means is a hyperparameter I think

grave breach
grave breach
#

Maybe you could try DBSCAN

#

If I'm not mistaken, scikit-learn has DBSCAN implemented
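A small sketch of what trying DBSCAN in scikit-learn could look like. The two synthetic blobs and the `eps` / `min_samples` values are assumptions picked for this toy data, not recommendations for the dataset discussed here. Unlike k-means, the number of clusters is not passed in; it falls out of the density parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
points = np.vstack([blob_a, blob_b])

# eps is the neighbourhood radius, min_samples the density threshold.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)

# DBSCAN marks noise points with the label -1, so exclude it when counting.
n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters)
```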

crude fable
#

I am not quite familiar with the general training scheme of GANs. Do D and G both start from scratch or is D initialized with a good enough checkpoint?

grave breach
#

They both start from scratch

shy ember
grave breach
#

I think you have to calculate it by yourself

#

But

#

Once you have all the points in a cluster

crude fable
#

quote from Stackoverflow

grave breach
#

You can do the mean of all the xs and the mean of all the ys

#

So you have the center
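The "mean of the xs and mean of the ys" idea can be sketched with NumPy. The points and labels below are invented; in practice `labels` would come from whatever clustering step was used, and the `-1` check assumes a DBSCAN-style noise label.

```python
import numpy as np

# Invented 2-D points with a cluster label per point.
points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 0, 1, 1])

centers = {}
for label in np.unique(labels):
    if label == -1:
        continue  # skip noise points, if the clustering algorithm produces them
    # Mean over rows gives (mean of xs, mean of ys) for this cluster.
    centers[int(label)] = points[labels == label].mean(axis=0)

print(centers)
```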

#

@shy ember

lapis sequoia
#

I'm assuming you mean "better counterpart" as GPT-3, but that's a private beta, so what else can I use?

crude fable
#

maybe BART?

lapis sequoia
#

What's BART?

crude fable
#

auto-regressively pre-trained models are generally better at generative tasks like convo

#

A whole bunch of pre-trained models here

#

Basically BART = BERT (as encoder) + GPT (as decoder)

livid oar
simple shadow
#

hey everyone, how do i read a folder of images in github through python in google colab?

grave frost
# livid oar Checkout this new article 😃 https://www.analyticsvidhya.com/blog/2021/05/top-8...

ehh, that kind of stuff is always shite. There aren't any "must-to-know" libs ever - only those that are basic (numpy, pandas). It's absolutely useless to "learn" a library. you have to learn concepts, not libraries. in NLP, you do get some use out of those for basic processing, but I would rather say the person learns "TF/Pytorch" (i.e. understand the practical terminology and API usage rather than just a lib) to use NLP models since most ML tools don't do well in real-world NLP tasks.

#

You'd find much better blogs on Medium than Analytics Vidhya tbh

grave breach
#

And then open them via colab

grave breach
#

@simple shadow

short heart
#

How does fbprophet work?

#

cause I gave it some dataset and the output it generated kind of follows true values, but when it went to the values it didnt see, the prophet kinda starts..sucking?

#

what I mean is, how does it predict something? why did it even predict the values it was trained on if thats what happened on screenie?

grave frost
short heart
#

wdym

grave frost
#

exactly what I mean.....?

#

you can't expect something to work as perfect with minimal effort 🤷

short heart
#

i just want to know how fbprophet makes predictions and why it predicts what's already there

grave frost
simple shadow
grave breach
#

*google account

royal lintel
#

cannot pickle 'weakref' object
anyone knows how to solve it? Saving model using joblib.dump
tag if answered please

desert oar
desert oar
# short heart cause I gave it some dataset and the output it generated kind of follows true va...

Prophet is essentially regression, and this is a common problem in forecasting. See the "how prophet works" section https://research.fb.com/blog/2017/02/prophet-forecasting-at-scale/

Forecasting is a data science task that is central to many activities within an organization. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline.

desert oar
#

Basically, in order to predict a change in trend, the change either needs to be cyclical, or there needs to be some external feature that can indicate the trend is changing, and in what direction

#

Otherwise really the only reasonable thing a model could do is to continue the previous trend
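To make the "continue the previous trend" point concrete, here is a toy sketch using a plain linear fit. This is not how Prophet works internally (Prophet fits a more elaborate trend-plus-seasonality model); the series and the assumed flattening after the training window are invented.

```python
import numpy as np

# Training window: a clean upward trend.
t_train = np.arange(10)
y_train = 2.0 * t_train + 1.0

# Fit a straight line; with no other features this is all the model knows.
slope, intercept = np.polyfit(t_train, y_train, deg=1)

# Forecast the next five steps by extending the fitted line.
t_future = np.arange(10, 15)
forecast = slope * t_future + intercept

# Assumption for illustration: the real series flattens after training ends.
y_future_actual = np.full(5, y_train[-1])
errors = np.abs(forecast - y_future_actual)
print(errors)  # the error grows the further out we forecast
```

Nothing in the training data signals the flattening, so the forecast keeps climbing and the error grows with the horizon.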

short heart
#

Makes sense

desert oar
#

That said, it might be doing the wrong thing here in that what should be an unusual deviation from trend appears to be a change in trend

#

Maybe there are some configurable parameters that can help with this issue

#

Also, it would help if you drew a vertical line where the training data ended and the test data began

short heart
#

Yeah, I did draw one after I posted, but I'm not on my PC now

#

Give me a sec

#

Should be something like that

desert oar
grave frost
#

that won't be very high accuracy, but I guess it gets the job done

desert oar
#

Neural networks have shown themselves to be pretty lackluster in time series modeling compared to other domains

desert oar
#

I believe it if you are doing something like classifying 1000 different time series

grave frost
#

you can check out kaggle's jane street, and everyone is running nets (I think the winner too, not sure tho)

desert oar
#

For time series prediction on a single series? I haven't seen good results but I am obviously willing to be proven wrong

grave frost
#

jane street is stock prediction BTW

#

Yirun's Solution (1st on 2021-03-29): Training Supervised Autoencoder with MLP

#

ahh, and a blend of xgboost

#
Cross-Validation (CV) Strategy and Feature Engineering:

    5-fold 31-gap purged group time-series split
    Remove first 85 days for training
    Forward-fill the missing values
    Transfer all resp targets (resp, resp_1, resp_2, resp_3, resp_4) to action for multi-label classification
    During inference, the mean of all predicted actions is taken as the final probability

Deep Learning Model:

    Use autoencoder to create new features, adding along with original features to the MLP
    Train autoencoder and MLP together in each CV split to prevent data leakage
    Add target information to autoencoder (supervised learning) to force it generating more relevant features, and to create a shortcut for backpropagation of gradient
    Add Gaussian noise layer before encoder for data augmentation and to prevent overfitting
    Use swish activation function instead of ReLU to prevent ‘dead neuron’ and smooth the gradient
    Batch Normalisation and Dropout are used for MLP
    Train the model with 3 different random seeds and take the average to reduce prediction variance
    Only use the models (with different seeds) trained in the last two CV splits since they have seen more data
    Only monitor the BCE loss of MLP instead of the overall loss for early stopping
    Use Hyperopt to find the optimal hyperparameter set

Ahh, the mind of grandmasters works in different ways than the rest of us mortals can comprehend

desert oar
#

Yeah I have no doubt that eventually we will figure out useful general purpose neural network models that offer incremental improvements over regression-based methods

#

But for now it's a mischaracterization to suggest that a regression model like prophet will be inherently less effective than something based on an LSTM

grave frost
desert oar
#

The Jane Street problem is also a lot more sophisticated than forecasting a single series 🤷‍♂️

#

I'm not trying to say this particular model is the best thing since sliced bread either

#

Neural networks I would expect to perform much better on a complicated evaluation task like this

grave frost
#

!remindme

arctic wedgeBOT
#
Missing required argument

expiration

desert oar
#

I have no doubt that the winning forecast will at some point use deep learning somewhere along the way

grave frost
#

uh-huh

tawdry hamlet
#

Hi, does anyone have any good info on how to interpret the output of the critic/discriminator module in a Wasserstein GAN (WGAN-GP)? I am struggling to interpret whether a large number means 'normal' or a small number means normal, and looking around online I'm seeing conflicting info?

#

Would love for some clarification

proper basin
#

I'm using xml.sax to parse a 200GB XML file and I'm trying to get byte offsets so I can seek back. Is there a way to do this? The Locator interface only provides line/column numbers that are useless for seeking

lapis sequoia
#

200GB... geez

proper basin
#

yeah it's been a pain

lapis sequoia
#

I can imagine.

#

Are you using a generator to deal with that much overhead?

proper basin
#

I'm using xml.sax which has a push API, so I read some bytes, feed them to the XML parser, and it calls callbacks where I handle the XML elements

#

so like I feed 16k bytes to it, one of the event is interesting and I want to note the file position, and I don't know how to do that

#

If I count bytes when I push data in, I know where the 16k buffer is, but I don't know where exactly in the buffer the event happens
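One possible angle, sketched under assumptions: the standard-library `xml.parsers.expat` module (the parser that `xml.sax`'s `ExpatParser` wraps) exposes a `CurrentByteIndex` attribute that can be read inside a callback. The helper name `find_element_offsets`, the sample document, and the 7-byte chunk size are all made up for illustration; whether this fits the 200GB workflow above is untested.

```python
import xml.parsers.expat


def find_element_offsets(chunks, wanted_name):
    """Feed byte chunks to expat and record where each wanted start tag begins."""
    offsets = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name == wanted_name:
            # Byte offset, from the start of the whole input, of this event.
            offsets.append(parser.CurrentByteIndex)

    parser.StartElementHandler = on_start
    for chunk in chunks:
        parser.Parse(chunk, False)  # push API: more data may follow
    parser.Parse(b"", True)  # signal end of input
    return offsets


data = b"<root><a/><item id='1'>x</item><item id='2'>y</item></root>"
# Deliberately tiny chunks so tags get split across pushes.
chunks = [data[i:i + 7] for i in range(0, len(data), 7)]
offsets = find_element_offsets(chunks, "item")
print(offsets)
```

Each recorded offset can then be passed to `file.seek()` on the original file opened in binary mode.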

lapis sequoia
#

Way over my head, wish I could help

#

The object event though

#

what happens when you print(blah.dir())

#

print(blah.__dir__())

#

Where blah is the obj

#

Even better, use ipython so you can tab out on the obj

proper basin
#

unfortunately the parser's internals are in C so I can't really get at the internals

lapis sequoia
#

Its still an obj is it not?

#

i.e.

#
blah blah your code blah blah

x = your callbacks obj at the event level that is interesting


Now you just need to see what properties x has
#

So to do, you can tab it out if you make x available to you in ipython

#

In [1]: f = str()

In [2]: f.casefold
capitalize() format_map() isnumeric() maketrans() split()
casefold() index() isprintable() partition() splitlines()
center() isalnum() isspace() replace() startswith()
count() isalpha() istitle() rfind() strip()
encode() isascii() isupper() rindex() swapcase()
endswith() isdecimal() join() rjust() title()
expandtabs() isdigit() ljust() rpartition() translate()
find() isidentifier() lower() rsplit() upper()
format() islower() lstrip() rstrip() zfill()
function()

#

Like that. I tabbed out and all the things f can do or has, pop up 🙂

proper basin
#

Oh yeah there's stuff, just not useful stuff

#
(Pdb) p parser
<xml.sax.expatreader.ExpatParser object at 0x7f393cee4430>
(Pdb) p dir(parser)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_bufsize', '_close_source', '_cont_handler', '_decl_handler_prop', '_dtd_handler', '_ent_handler', '_entity_stack', '_err_handler', '_external_ges', '_interning', '_lex_handler_prop', '_namespaces', '_parser', '_parsing', '_reset_cont_handler', '_reset_lex_handler_prop', '_source', 'character_data', 'close', 'end_element', 'end_element_ns', 'end_namespace_decl', 'external_entity_ref', 'feed', 'getColumnNumber', 'getContentHandler', 'getDTDHandler', 'getEntityResolver', 'getErrorHandler', 'getFeature', 'getLineNumber', 'getProperty', 'getPublicId', 'getSystemId', 'notation_decl', 'parse', 'prepareParser', 'processing_instruction', 'reset', 'setContentHandler', 'setDTDHandler', 'setEntityResolver', 'setErrorHandler', 'setFeature', 'setLocale', 'setProperty', 'skipped_entity_handler', 'start_doctype_decl', 'start_element', 'start_element_ns', 'start_namespace_decl', 'unparsed_entity_decl']
proper basin
#

Ok I wrote untractable code that records the byte offset of the buffer and the line number inside that buffer

lapis sequoia
#

What's the logic that goes into selecting a loss function

grave frost
#

do you mean selecting a loss function or why do we use one?

lapis sequoia
#

I mean, what loss function should I choose

#

Yeah, selecting one, I worded that badly lol

#

Like atm I'm doing an MLP sequential model for recognising numbers, what loss function should I be choosing?

grave frost
#

classification on multi-class --> cross entropy

#

I mean, there aren't that many options are there?

lapis sequoia
#

What's the difference between that and the sparse variant?

grave frost
#

the sparse accepts one-hot encoded labels I think - not sure tho

lapis sequoia
#

Ah, makes sense

#

Is it just for performance enhancement?

grave frost
#

I mean in TF/keras i don't think it does accept sparse.

#

it's the opposite in fact

lapis sequoia
#

Then why use it

grave frost
#

If you want to provide labels using one-hot representation, please use CategoricalCrossentropy loss

lapis sequoia
#

Do you have a decision tree for this?

grave frost
#

but in SO, Im pretty sure I read it as sparse loss for sparse labels

lapis sequoia
#

Makes sense

#

Thanks

grave frost
#

yup, it seems sparse_categorical_crossen is for labels that are NOT sparse.
https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
that's a weird convention

#

The sparse_categorical_crossentropy is a little bit different, it works on integers that's true, but these integers must be the class indices, not actual values.
Ahh, that makes sense in the terminology, if it does represent indices
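The relationship between the two losses can be checked with a few lines of NumPy. The two functions below are plain re-implementations written for this illustration, not the Keras functions themselves, and the probabilities are invented.

```python
import numpy as np

# Invented predicted class probabilities for three samples, three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])

# The same labels in two encodings: class indices and one-hot rows.
class_indices = np.array([0, 1, 2])
one_hot = np.eye(3)[class_indices]


def categorical_crossentropy(y_one_hot, y_prob):
    # Sum over classes; only the true class contributes because of the one-hot mask.
    return -np.sum(y_one_hot * np.log(y_prob), axis=1)


def sparse_categorical_crossentropy(y_index, y_prob):
    # Pick the probability of the true class directly by index.
    return -np.log(y_prob[np.arange(len(y_index)), y_index])


dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(class_indices, probs)
print(np.allclose(dense_loss, sparse_loss))
```

Both give the same per-sample loss; the sparse form just skips building the one-hot array.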

sharp reef
#

Yeah it really helps not having to keep one-hot distributions in memory for potentially multiple batches

grave frost
#

I don't think it represents that significant of a overhead

lapis sequoia
#

How can I use the transformers lib bert model to continue a conversation (and basically be a chatbot)

#

I've got this code, but I don't know how to turn it's output into usable text.

      tokenized = tokenizer.tokenize(message.content)
      indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized)
      tokens_tensor = torch.tensor([indexed_tokens])
      tokens_tensor = tokens_tensor.to('cpu')
      model.to('cpu')
      outputs = model(tokens_tensor)
      encoded_layers = outputs[0]
#

The encoded_layers is just a buncha random numbers

serene scaffold
#

@lapis sequoia I can look at this tomorrow. However I believe the tokenizer can decode the "random numbers".

lapis sequoia
#

I know they're not random, they're outputs of the... neurons, I think they're called?

serene scaffold
#

I know you know they're not random. Bert uses transformers and I don't believe that transformers use neurons.

lapis sequoia
#

ah

foggy field
#

Hi, I think this might be a stupid question but I'm stuck. How do I train KNNRegressor from scikit using my entire dataset and then predict a target feature from 1 outside observation?

#

i asked my question more indepth in help-pretzel

#

but no one has responded yet
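A minimal sketch of the fit-on-everything, predict-one-row pattern, assuming the scikit-learn class meant here is `KNeighborsRegressor`. The toy data and `n_neighbors=2` are invented. The main thing to notice is that a single observation still has to be passed as a 2-D array of shape `(1, n_features)`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy dataset: one feature, target is ten times the feature.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Train on the entire dataset.
model = KNeighborsRegressor(n_neighbors=2)
model.fit(X, y)

# One outside observation, kept 2-D: one row, one feature.
new_observation = np.array([[2.4]])
prediction = model.predict(new_observation)
print(prediction)  # average target of the two nearest training points
```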

foggy field
#

anyone able to help?

flint mason
#
RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x1 and 128x32)

Anyone familiar with this error? I'm using ResNet with a modified nn.Linear. The error is thrown when trying to evaluate

flint mason
# foggy field anyone able to help?
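def predict_single(input, target, model):
    # Add a batch dimension so a single sample has shape (1, ...)
    inputs = input.unsqueeze(0)
    # Pass the batched tensor (not the raw sample) to the model
    predictions = model(inputs)
    prediction = predictions[0].detach()
    print("Input:", input)
    print("Target:", target)
    print("Prediction:", prediction)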
#

Will work for almost anything other than image classification

foggy field
#

Oh thanks

#

ill have a look

lapis sequoia
#

a simple solN would be having the layers kinda same dimensions instead of reducing

wispy tiger
#

Hey folks! I'm trying to detect motion and people with opencv and deploy it with fastapi, but I'm having some trouble integrating the two. Details are in #help-potato. Could someone pop in and help out?

vast thunder
#

Hello everyone!
So for a CNN (Convolutional Neural Network) , I am using Tensorflow . and I have Conv2d for the first 4 layers, a flatten , and then some dense layers with the last dense layer being Dense(3) with relu activation in ALL of them . My input image is loaded with CV2 and is B&W (the shape is (480, 640, 1)) and the output should be a number, for each image . The number should be either 0, 1 , or 2 . But with my Input (x values) looking something like :

[array([[213, 212, 212, ..., 149, 149, 149],
       [212, 212, 211, ..., 151, 151, 151],
       [211, 211, 211, ..., 150, 150, 150],
       ...,
       [ 27,  27,  27, ...,  18,  18,  18],
       [ 27,  27,  27, ...,  17,  17,  17],
       [ 27,  27,  27, ...,  17,  17,  17]], dtype=uint8), array([[212, 212, 212, ..., 150, 150, 149],
       [211, 211, 211, ..., 152, 151, 151],
       [211, 211, 211, ..., 152, 151, 151],
 . . .

And my output (y values) looking like :

[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

It gives the following error :

ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'numpy.ndarray'>"}), (<class 'list'> containing values of types {"<class 'int'>"})

Why though? Because the types don't match? But when I turn the y values into np.array (like all of them) , I still get an error, telling the input sizes don't match . Anyone knows why?

tidal bough
vast thunder
#

I can send you the model code if you want

tidal bough
#

If that's the case, the input shape for multiple images should be (N,480,640,1) (a single tensor/numpy array)

vast thunder
#

Oh, N being the count?

tidal bough
#

Yup

vast thunder
#

Lemme try

tidal bough
#

It looks like you're passing a list. If you convert that list to a numpy array, what's the shape and dtype?

vast thunder
#

sorry my laptop crashed

#

after running the code with input shape of N

#

so uhm, Im waiting for it to load

#

Okay so

#

Um

#

It doesn't show the dtype

#

@tidal bough Should I convert EVERY element into one np array?

tidal bough
tidal bough
vast thunder
#

This is for testing purposes , just wanted to practice

#

That's the output

#

The input is images

#

Images , loaded by cv2, grayscaled

tidal bough
#

!e

import numpy as np
arr = np.arange(100).reshape(5,10,2)
print(arr.shape)
print(arr.dtype)
arctic wedgeBOT
#

@tidal bough :white_check_mark: Your eval job has completed with return code 0.

001 | (5, 10, 2)
002 | int64
tidal bough
#

check the shape and dtype of what you're passing to the model

vast thunder
#

kk

#

uint8, (19, 1, 480, 640) is the shape

#

So I'm guessing ... the input shape should be (19, 480, 640, 1)?

tidal bough
#

Presumably it will complain about shape if you pass this tensor to it.

vast thunder
#
model.add(Conv2D(20, (5, 5), (2, 2), input_shape=(19, 480, 640, 1), activation='relu'))

Is this alright as the first layer?

tidal bough
#

If your shape is (19, 1, 480, 640), you can make it (19, 480, 640, 1) using .transpose([0,2,3,1])
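A short NumPy sketch of the reshaping discussed here. The all-zero arrays stand in for the real cv2 images; the label list matches the one posted above.

```python
import numpy as np

# Stand-ins for 19 grayscale images, each loaded with a leading channel axis.
images = [np.zeros((1, 480, 640), dtype=np.uint8) for _ in range(19)]

# A list of equally shaped arrays becomes one array with a batch axis.
batch = np.array(images)
print(batch.shape, batch.dtype)  # (19, 1, 480, 640) uint8

# Move the channel axis to the end: (N, 1, H, W) -> (N, H, W, 1).
batch = batch.transpose([0, 2, 3, 1])

# Labels as a single integer array with one entry per image.
labels = np.array([0] * 7 + [1] * 6 + [2] * 6)
```

With the data in this layout, the layer's `input_shape` describes one image only, i.e. `(480, 640, 1)`, without the batch size.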

errant gull
#

What would be the best way to append a column of 1s as the 0th column to a numpy array

X = ([4,5,6,7,8,9],
     [4,5,6,7,8,9],
.
.
.
X should then be:
X = ([1,4,5,6,7,8,9],
     [1,4,5,6,7,8,9],
     [1, ..
     [1, ..
.
.
.
tidal bough
#

So (480,640,1) is right.

vast thunder
#

So (480, 640, 1) then

#

okay, so I just transpose, then fit?

tidal bough
#

yeah, I think so

vast thunder
#

Lemme try

tidal bough
#

np.ones((X.shape[0],1)) is a (n,1)-shape array where n is X's number of rows.
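Putting that together, a minimal sketch of prepending the column of ones. `np.hstack` is one of several ways to do this; the two-row `X` mirrors the example in the question.

```python
import numpy as np

X = np.array([[4, 5, 6, 7, 8, 9],
              [4, 5, 6, 7, 8, 9]])

# (n, 1) column of ones with the same dtype as X.
ones = np.ones((X.shape[0], 1), dtype=X.dtype)

# Stack horizontally so the ones become column 0.
X_with_ones = np.hstack([ones, X])
print(X_with_ones)
```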

vast thunder
#

Thanks @tidal bough , really appreciate it!

grave frost
#

wait, you can force Convolutionals layers to find horizontal/vertical patterns?

#

does anyone know the research/technical term behind this?

lapis sequoia
#

you mean having different kind of filters in same layer?

#

yes. you can look out for inception network.

grave frost
lapis sequoia
#

how do i install pyaudio?

lapis sequoia
grave frost
lapis sequoia
#

..

#

inception network.

grave frost
lapis sequoia
#

i was just implying that if it is like that then it is possible.

grave frost
#

and the guys who described it looked like they had written it all in 10 mins

lapis sequoia
#

alright. thats why i kinda asked question as i was not 100% sure about it.

grave frost
#

literally fucking one statement "Musically motivated CNN". I mean, why publish a paper when you can't even write out its whole architecture

grave frost
#

so you can guess what's on it

lapis sequoia
#

wow

#

1.5

grave frost
#

:trash

lapis sequoia
#

this seems like same person. IEEE format.

grave frost
#

ahh, why didn't they link the repo with this paper?

lapis sequoia
#

there is one in which i sent hold on

#

oh i think its reference never mind

grave frost
#

like that paper should have been the first thing in the repo

#

well, their technique is interesting, but severely outdated

lapis sequoia
#

its just 5 layers?

#

or is there repetition like vgg?

grave frost
#

uh-huh. on top of that, they have custom layers for temporal and spatial, didn't even run NAS on it

#

A mediocre submission, but I guess it's 5 years old so I have to cut them some slack.

#

I was thinking of using LeViT. Just gotta convince them to gimme more GPUs

lapis sequoia
#

wow i need to learn a lot in data science.

grave frost
#

imo I like ViT a lot, but its main drawback is the fact that it requires labelled data for pre-training

#

There was a new technique called BYOL, but I haven't got around to understanding it or its results + it's highly experimental. the advantage of it is that it can use random unlabelled images from the task and pre-train on that, just like in NLP

winged stratus
#

have there been any advances in gans since wgan-gp?

hard hound
#

Hey could someone help me define a function which computes average of every last 50 ints for every point?
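One way to read this request is a trailing moving average: for each point, the mean of that point and up to 49 points before it. The function name `trailing_mean` and the choice to average over fewer values near the start are assumptions, not something specified in the question.

```python
from collections import deque


def trailing_mean(values, window=50):
    """Return, for every position, the mean of the last `window` values seen so far."""
    recent = deque()
    running_sum = 0
    means = []
    for value in values:
        recent.append(value)
        running_sum += value
        if len(recent) > window:
            # Drop the oldest value once the window is full.
            running_sum -= recent.popleft()
        means.append(running_sum / len(recent))
    return means


# Small window so the behaviour is easy to check by hand.
print(trailing_mean([1, 2, 3, 4, 5], window=3))
```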

grave frost
#

it's honestly quite confusing to keep advances in something. I guess you can check benchmarks 🤷

lapis sequoia
#

Should I be one hot-encoding the training set as well (if possible) or just the labels?

inland zephyr
#

Hello all. I want to ask if there is any good tutorial on transfer learning with VGGNet or FaceNet on a custom-built dataset for a production-level architecture? I need to retrain it for a specific face recognition task while cutting production time. I searched on Medium and elsewhere, but there's a lack of this kind of tutorial.

forest zenith
#

hi, my code runs, but the voice of my AI speaks too fast

#

here is my code:

import speech_recognition
import pyttsx3
from datetime import datetime

now = datetime.now()

name = input("Please enter your name before using this: ")
today = now.strftime("%B %d, %Y")

robot_ear = speech_recognition.Recognizer()
robot_mouth = pyttsx3.init()
robot_brain = ""

with speech_recognition.Microphone() as mic:
    print("Robot: I'm listening")
    audio = robot_ear.listen(mic)

print("Robot:...")

try:
    you = robot_ear.recognize_google(audio)
except:
    you = ""
print("you: " + you)

if you == "":
    robot_brain = "I can't hear you, try again."
elif "hello" in you:
    robot_brain = "Hello " + name
elif "today" in you:
    robot_brain = today
elif you == "WWDC 2021":
    robot_brain = "WWDC 2021 will start from June 7 to 11. You can see details in https://developer.apple.com/wwdc21/"
else:
    robot_brain = "Sorry, we not supported this question."    
print("Robot:", robot_brain)

robot_mouth = pyttsx3.init()
robot_mouth.say(robot_brain)
robot_mouth.runAndWait()
lapis sequoia
serene scaffold
lapis sequoia
#

I can't figure out how to look at the methods it has

#

I tried decode but there was an error, lemme try it again

serene scaffold
#

The docs

lapis sequoia
#

The error is that encoded_layers is a list, and int can't convert it to a number (for tokenizer.decode)

#

Ah, I think I need to call convert_tokens_to_ids on the encoded_layers, then feed it to decode, possibly

serene scaffold
lapis sequoia
#

aight

#

Well, that's progress, I guess?

sick wedge
#

can anyone understand this error?

lapis sequoia
#

Trying convert_ids_to_tokens seems to return something that decode can't handle, and convert_tokens_to_ids and putting it through decode returns [UNK]

mint palm
#

the course got updated, lets goooooooooo..................!!!!!!!!!!!

lapis sequoia
# sick wedge

You need to make sure your output layer and your labels layer are the same shape

#

What's your output layer?

sick wedge
#

No worries I fixed it

lapis sequoia
#

Cool

serene scaffold
lapis sequoia
#

No, I haven't, let me try that

#

Feeding batch decode the output directly: TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Feeding batch decode convert_ids_to_tokens: ValueError: only one element tensors can be converted to Python scalars
Feeding batch decode convert_tokens_to_ids:

serene scaffold
lapis sequoia
#

output directly:

Traceback (most recent call last):
    await coro(*args, **kwargs)
  File ".\public-chatbot02.py", line 28, in on_message
    msg_out = tokenizer.batch_decode(encoded_layers)
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 3019, in batch_decode
    for seq in sequences
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 3019, in <listcomp>
    for seq in sequences
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 3055, in decode
    **kwargs,
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils.py", line 731, in _decode
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils.py", line 706, in convert_ids_to_tokens
    index = int(index)

convert_ids_to_tokens:

Traceback (most recent call last):
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\discord\client.py", line 343, in _run_event
    await coro(*args, **kwargs)
  File ".\public-chatbot02.py", line 28, in on_message
    token_ids = tokenizer.convert_ids_to_tokens(encoded_layers)
  File "C:\Users\owenp\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils.py", line 706, in convert_ids_to_tokens
    index = int(index)
ValueError: only one element tensors can be converted to Python scalars
#

This is how I get encoded_layers

      outputs = model(tokens_tensor)
      encoded_layers = outputs[0]
serene scaffold
arctic wedgeBOT
#

pseudobert/pseudofiers/base_pseudofier.py line 70

def _call_bert(self, text: str, start: int, end: int) -> t.Generator[t.Tuple[str, float], None, None]:
serene scaffold
#

see if that gives you any leads.

lapis sequoia
#

Should I be one hot-encoding the training set as well (if possible) or just the labels?

granite sierra
#

Hello, I'm doing object detection using darknet yolo, how would I write conditional statements based on the information in the image, like "if object in image, do this"

grave frost
glass cedar
grave frost
#

honestly, in reality there are very few times that you ever want to one-hot encode your x array

vast thunder
#

Guys a cross-entropy loss basically runs soft-max on the losses?

granite sierra
# grave frost if `model.predict(image) == 'tree': do_something`

hmm, that's what I thought, the issue is I'm running the model through the exe, like sending the image path over command prompt to the exe.

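import os
from pexpect import popen_spawn

def darknet(note):
    # Run darknet from its own directory so the relative data/cfg/weights paths resolve
    os.chdir("C:/Users/Denis/FinalProject/Carla/darknet-master/build/darknet/x64")
    # Start darknet once and keep the process open; image paths are sent to it later
    process = popen_spawn.PopenSpawn("darknet.exe detector test data/obj.data cfg/yolov4.cfg yolov4.weights")
    print(note)
    return process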

note = "Darknet Ready"
darknet_process = darknet(note)

def car_recogniser(new, detected_car_loop, sensor_name_image):
    global darknet_process
    carla_image_exists = os.path.isfile(f"C:/Users/Denis/FinalProject/Carla/carla/temp/{sensor_name_image}.jpg")
    if carla_image_exists:
        carla_image = (f"C:/Users/Denis/FinalProject/Carla/carla/temp/{sensor_name_image}.jpg").encode()
        darknet_process.send(carla_image+b"\n")
grave frost
granite sierra
#

I posted the code above what i'm doing, I'm running the model through an executable, its not within the script. The model itself isn't written in the script, I wasn't sure how I could incorporate the official darknet model in a pyscript
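If the executable route is kept, one option is to read back what darknet prints and branch on that. The sketch below assumes detections appear as lines of the form `label: NN%`; that format, the sample text, and the helper name `parse_detections` are assumptions for illustration and should be checked against the real output.

```python
import re

# Assumed detection line format: "<label>: <confidence>%"
DETECTION_LINE = re.compile(r"^(?P<label>[^:]+):\s*(?P<confidence>\d+)%")


def parse_detections(output_text):
    """Extract (label, confidence) pairs from darknet-style console output."""
    detections = []
    for line in output_text.splitlines():
        match = DETECTION_LINE.match(line.strip())
        if match:
            detections.append((match.group("label").strip(), int(match.group("confidence"))))
    return detections


# Invented sample of what the process might print for one image.
sample_output = "data/sample.jpg: Predicted in 25.3 milli-seconds.\ncar: 98%\nperson: 61%\n"
detections = parse_detections(sample_output)

# "if object in image, do this"
car_detected = any(label == "car" and confidence >= 50 for label, confidence in detections)
if car_detected:
    print("car found, do something")
```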

grave frost
granite sierra
#

I guess? My method does seem kinda jank

vast thunder
#

Guys is it necessary to memorize/understand all the loss , or generally, formulas that Machine learning has? Considering you use something like Tensorflow though

grave breach
#

Pay attention to the fact that I wrote "use"

vast thunder
#

Well

grave breach
#

Because, if you use some auto ml, like autokeras, you don't really get equal results

#

But you can still do a lot

vast thunder
#

I can somewhat write a model in keras, without knowing what the frick the formulas mean

grave breach
#

I suggest you use autokeras

vast thunder
#

But I wonder, can I continue like this?

grave breach
#

No

#

But

vast thunder
vast thunder
grave breach
#

A library that automate all the model buildingh

#

*building

#

Training

#

Ecc.

#

You give it data, and it creates and trains a suitable model

vast thunder
#

Oh cool . But isn't the model building itself fun, and generally the whole point?

grave breach
#

I seuggest you to start with autokeras, move to keras and then move to pytorch

#

*suggest

vast thunder
#

I am already using keras ig

#

I understand conv2d, dense, maxpooling, dropout, and flatten. I guess that's enough for image classification

grave breach
vast thunder
vast thunder
modern vine
#

Anybody use Chatterbot? How can I recover the author of a message?

twin fiber
#

hello, could anyone please help me understand an issue i'm having with a model i'm building for predicting customer defaults on loan repayments

#

would really really appreciate it

paper agate
#

How can i learn about machine learning in Python? :P

exotic maple
twin fiber
#

I am applying a weight of evidence function to a train data set, however around half of my variables are categorical and have been split into their sub-categories which are using binary 1,0 to say whether that sub category was selected. my WoE values however are extremely low and I think I haven't dealt with the categorical variables correctly, so I just need some advice on how to do so

desert oar
twin fiber
#

nope

desert oar
#

what's the target of the model? yes/no if the loan defaulted?

twin fiber
#

yes

desert oar
#

dont people just bin numerical features to compute weight of evidence?

#

then it's just the weight of evidence in each bin

twin fiber
#

it's an assignment

#

and they have given us data with loads of variables

#

so I can't imagine we are meant to leave out all of the categorical variables

desert oar
#

thats not what im saying

twin fiber
#

as there are so many, like education type, housing type, contract type etc

desert oar
#

im saying that weight of evidence only works on categorical variables

#

and that non-categorical variables need to be binned into categories
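For reference, a rough sketch of weight of evidence computed per category. It uses the convention WoE = ln(share of non-defaults / share of defaults); some packages flip the sign, so check which one applies. The function name, the `housing` column, and the data are invented, and every category is assumed to contain at least one default and one non-default (real code needs smoothing or merging for empty cells).

```python
import math
from collections import Counter


def weight_of_evidence(categories, defaulted):
    """Return {category: WoE} for one categorical column and a 0/1 default flag."""
    good = Counter()  # non-defaults per category
    bad = Counter()   # defaults per category
    for category, is_bad in zip(categories, defaulted):
        if is_bad:
            bad[category] += 1
        else:
            good[category] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe = {}
    for category in set(good) | set(bad):
        share_good = good[category] / total_good
        share_bad = bad[category] / total_bad
        woe[category] = math.log(share_good / share_bad)
    return woe


# One categorical column kept as a single variable, not one-hot encoded.
housing = ["rent", "rent", "rent", "rent", "own", "own", "own", "own"]
defaulted = [1, 1, 1, 0, 0, 0, 0, 1]
woe = weight_of_evidence(housing, defaulted)
print(woe)
```

A numerical feature would go through the same function after being cut into bins, with the bin label playing the role of the category.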

twin fiber
#

well that can't be true because I'm following a tutorial and they have numerical variables exclusively

desert oar
#

what tutorial?

sour abyss
twin fiber
#

maybe I'm misunderstanding categories

#

it's a tutorial from my university

desert oar
#

link to the tutorial if you can

twin fiber
#

a lab session

#

I can't link it it wont allow access

#

here is a sample

#

all of those variables are numerical in the data set

desert oar
#

what's this sc module?

twin fiber
#

yes

desert oar
#

?

twin fiber
#

I mean none of those variables are categorical in the dataset they are all numerical values

paper agate
twin fiber
#

i think i'm misunderstanding something about categories maybe

desert oar
#

what is sc?

twin fiber
#

oh

desert oar
#

where is this woebin_ply function from?

twin fiber
#

scorecard

#

idk I'm very new to this

desert oar
#

is this a publicly available package?

twin fiber
#

yes

desert oar
#

i am pretty sure that the "bin" in that name means that the numerical features are divided into "bins", in order to produce categories

#

maybe im wrong

twin fiber
#

either way I have so many and it just looks wrong because all the sub categories have been given their own column

desert oar
#

yes that seems wrong

#

don't one-hot encode the categorical features

twin fiber
#

i just dont know what to do

#

because in the examples they only deal in numerical variables

desert oar
#

can you link to this scorecard package? the one i found does not appear to be correct

twin fiber
#

so they never discuss what to do with categorical ones

desert oar
#

were you ever actually taught how WoE is calculated?

#

is it in a textbook? course notes?

twin fiber
#

yeah but i'm cramming before it is due tomorrow

#

underestimated the difficulty

desert oar
#

well if you cant link to the package somewhere on the web, and you can't post the definition that you've been given, there isn't much i can say other than what i've already said

twin fiber
#

this i guess

desert oar
#

it looks like bins_adj were calculated by sc.woebin

#

i.e. those are "bins" calculated from the numerical features

#

which are themselves categories

twin fiber
#

before this step

#

we used a zscore function on the data

#

if that helps

desert oar
#

that's a good idea to do. but don't use zscore on anything categorical, it doesn't make sense to do it.

twin fiber
#

and thats when i saw the split in the df

desert oar
#

it looks like sc.woebin supports categorical features naturally, according to the docs http://shichen.name/scorecard/reference/woebin.html

#

woebin generates optimal binning for numerical, factor and categorical variables

twin fiber
#

yeah i can't understand that page it's way above what i'm doing

#

i'm following their tutorial and applying the functions to the provided data set

#

nowhere do they explain how to deal with categorical variables 😦

desert oar
#

(this is also the R docs i think)

#

what i would do is this:

  1. make sure that your categorical features are either pd.Categorical or otherwise not a numeric type (e.g. int, float)
  2. use sc.woebin and sc.woebin_ply as normal
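Step 1 could look roughly like this in pandas. The column names and values are invented, and whether a given binning function then treats these columns as categorical is an assumption to check against its docs.

```python
import pandas as pd

# Hypothetical sample; housing_type holds category codes stored as integers.
df = pd.DataFrame({
    "housing_type": [1, 2, 2, 3],
    "contract_type": ["cash", "revolving", "cash", "cash"],
    "income": [30000.0, 52000.0, 41000.0, 38000.0],
    "defaulted": [0, 1, 0, 0],
})

categorical_cols = ["housing_type", "contract_type"]  # picked by hand
for col in categorical_cols:
    df[col] = df[col].astype("category")  # no longer a numeric dtype

# Whatever is still numeric after the cast:
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(df.dtypes)
print(numeric_cols)
```

The point of the cast is that integer codes such as 1, 2, 3 stop looking like a measurable quantity to downstream code.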
#

just try it

#

use a small sample of the dataset

#

pick 2 categoricals and 2 numerics

twin fiber
#

i dont know how

desert oar
#

you dont know how to select columns from a pandas dataframe?

#

actually you dont even need to, it takes whole dataframes

#
    dt: A data frame with both x (predictor/feature) and y (response/label) variables.
    y: Name of y variable.
    x: Name of x variables. Default is None. If x is None, 
      then all variables except y are counted as x variables.
#

if you're using jupyter, go into a new cell and type ?sc.woebin

#

it will show you the docs

twin fiber
#

i'm using colab

desert oar
#

same thing

#

also Address is almost certainly a categorical variable...

twin fiber
#

what am i meant to be reading I honestly am very new to this i can't understand what this is saying to me

desert oar
#

you need to be learning basic python then, and basic pandas usage, as well as the fundamental vocabulary that people use to talk about data

#

im sorry if they didnt teach you this and you were expected to figure it out

#

thats not good teaching style and its counterproductive to learning

twin fiber
#

:/

desert oar
#

what kind of course is this? something actuarial?

twin fiber
#

business analytics

desert oar
#

i see

#

can you at least show the previous code sections

#

just copy and paste the code i dont need the text

#

i can try to give a very quick explanation of what it is doing

twin fiber
#

there is quite a lot

#

i don't know how far to go back

#

I just need to know how to deal with these categorical variables because my WoE are so low they are useless

#

I don't understand why they didn't use categorical variables in any of the examples

#

have about 20 graphs like that

#

some look like this which seems wrong

#

that's all that's relevant I think

exotic maple
#

It's the first time i've seen weight of evidence and information value lol. It looks interesting and easy to calculate, especially IV, but WoE... I don't think i'm getting what it intends to represent

twin fiber
#

the negative or positive impact of the variable relative to your target I guess

#

I'm not too familiar either

exotic maple
#

towards data science tends to do a good job explaining stuff

twin fiber
#

yeah thank you i'll have a look

#

sadly i'm just going to have to continue modelling even though I know what I've done is incorrect

#

can't figure this out right now and have a lot more to finish

chilly pebble
#

Hi everyone, would this be a place to talk about a decision I need to make by tomorrow morning? I have to decide if I am going to move forward with my CS masters degree or if I am making the switch to an MS in Data Science. Are any of you professional data scientists?

chilly pebble
desert oar
#

@twin fiber it doesn't look like they only used numeric variables

twin fiber
#

they did

#

I have a copy of the dataset

#

it is all numeric

desert oar
#

gender is not a numeric variable

twin fiber
#

they even had random numbers in the excel file for the address

#

I know but they used numbers in the excel file to replace

desert oar
#

then they were just showing how to use the code

#

try it on the categorical data

twin fiber
#

i have tried their code on my data

#

that is how i'm at this stage

#

what i'm saying is in their examples they are only applying the code to numeric values

#

whereas I have categories and subcategories

#

and I don't know how to deal with them

#

I have used pandas.get_dummies()

#

which is meant to handle the categorical variables, but that has just divided the categories into the respective sub categories and assigned a 1 or a 0

desert oar
#

im telling you not to use that pd.get_dummies or anything like it

#

just run it on the data with the categorical features, without trying to transform them

#

the docs clearly state that the default "tree" method works on both

twin fiber
#

I'm just following the code

#

from my lectures

#

so I feel like it must be done this way surely

desert oar
#

the code says to use pd.get_dummies on the categorical features?

#

it will certainly work

#

in fact it should be equivalent to not doing it, now that i think about it

twin fiber
#

not explicitly on categorical variables

desert oar
#

so what exactly is the problem that you are encountering

twin fiber
#

the problem is that my data just seems wrong

desert oar
#

well pd.get_dummies makes no sense to use on numeric features

twin fiber
#

it has split sub categories up

#

and i have so many columns

desert oar
#

what is a subcategory?

#

did you consider using fewer columns?

twin fiber
#

and my WoE numbers are so low they are almost meaningless

#

like

desert oar
#

that's probably because the splits were wrong

twin fiber
#

type of job

#

and within that variable the customers can answer 5 different options

#

so my pd.get_dummies has split that into 5 columns

#

1 for each job

#

and assigned a 1 if it was selected

#

this slides along, there are just too many columns

#

and they say to remove any info_values below 0.1

#

all of mine are below 0.1 so I know it's wrong

desert oar
#

can you just show the code that you ran

twin fiber
#

which part man

#

my code is pages long

desert oar
#
import pandas as pd
import scorecardpy as sc

df = pd.DataFrame([
  {'gender': 'f', 'weight_lbs': 120, 'is_adult': 1},
  {'gender': 'f', 'weight_lbs': 130, 'is_adult': 1},
  {'gender': 'f', 'weight_lbs': 60, 'is_adult': 0},
  {'gender': 'm', 'weight_lbs': 50, 'is_adult': 0},
  {'gender': 'm', 'weight_lbs': 70, 'is_adult': 0},
  {'gender': 'f', 'weight_lbs': 30, 'is_adult': 0},
  {'gender': 'f', 'weight_lbs': 45, 'is_adult': 0},
  {'gender': 'm', 'weight_lbs': 175, 'is_adult': 1},
  {'gender': 'm', 'weight_lbs': 163, 'is_adult': 1},
])

bins = sc.woebin(
    df,
    y='is_adult',
    x=['weight_lbs', 'gender'],
    breaks_list={'weight_lbs': [100]},
)
df_woe = sc.woebin_ply(df, bins)
print(df_woe)

this works. i had to manually specify a break point for the weight_lbs column, not entirely sure why, maybe not enough data for the tree splitting algorithm. but it works.

twin fiber
desert oar
#

ok, none of that looks offensive. pd.get_dummies will ignore numerical columns and only convert the categorical ones, so that should be fine

#

however it looks like you never actually used the z-scores you created

twin fiber
#

hmm

desert oar
#

you'd have to do something like customer_data[zscores.columns] = zscores to replace the numeric columns in the customer data with the z-score versions
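A small sketch of that assignment, assuming `customer_data` has a mix of numeric and non-numeric columns. The data is invented and the z-score is computed by hand here rather than with whatever function the lab used.

```python
import pandas as pd

# Hypothetical stand-in for the real customer table.
customer_data = pd.DataFrame({
    "income": [30000.0, 50000.0, 70000.0],
    "age": [25.0, 35.0, 45.0],
    "job_type": ["a", "b", "a"],
})

numeric_cols = customer_data.select_dtypes(include="number").columns
numeric = customer_data[numeric_cols]

# z-score: subtract the column mean, divide by the column standard deviation.
zscores = (numeric - numeric.mean()) / numeric.std(ddof=0)

# Overwrite only the numeric columns; job_type is left untouched.
customer_data[zscores.columns] = zscores
print(customer_data)
```

Without the last assignment, `zscores` is a separate dataframe and nothing downstream ever sees the standardized values.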

twin fiber
#

i guess i didn't see them use it in their code

desert oar
#

(ideally you wouldn't replace them, you'd make new columns, but it sounds like you're already struggling w/ the python basics and at this point you just need to get it working)

#

maybe they use it later in the code?

twin fiber
#

well the first lab was preprocessing

#

it ended with zscore function to normalize the data

#

now the second lab goes straight to WoE

#

and they don't use the zscores created previously

#

they just do the WoE stuff I'm trying to do now

#

but my results are just terrible

desert oar
#

its also not that big a deal if you dont use the z scores

#

what code did you use to compute the WoE scores

#

that is what i am interested in

#

because that is where we might be able to identify the problems and misunderstandings

#

evidently you were never taught what WoE actually is or how to use python. which makes me upset and frustrated on your behalf, but that's not something we can change now.

twin fiber
#

yeah it's been an insane few days

desert oar
#

so i want to at least see specifically what you did that generated the bad output, so i can try to at least patch up your understanding enough to survive your exam

twin fiber
#

the zscore line took me 3.5 hours yesterday

#

to get to work and remove the right columns LOL

#

it's an assignment due tomorrow at 4, luckily no exam

#

but i have so much more to try and learn

desert oar
#

im sorry to hear that. learning is always slow at first, but having poor instruction makes it that much worse.

twin fiber
#

yeah true thanks for your concern haha

#

that's what i used to compute the scores

desert oar
#

and how did you compute bins_adj

twin fiber
#

well first it was this

#

and then used that other function to manually adjust some of them

#

for the WoE plots, I have quite a few that look like this

#

that can't be right

desert oar
#

ok, and how did you compute breaks_adj

velvet thorn
#

oh lord

#

weight of evidence

#

it's been a while 🥴

twin fiber
#

what do you mean?

desert oar
#

it looks to me like this data is highly imbalanced. where you only have a small % of "bad" cases

desert oar
twin fiber
#

i just used the function

#

sc.woebin_adj

#

and it has a little input box to manually play with bins

#

and i just tried different bins to try and get some of the unintuitive plots to be monotonic

#

which is what was advised in the lectures slides

grave frost
#

**My biggest mystery in ML is:**
HOW IN THE HOLY FUCK CAN I GET BETTER ACCURACY IF I TRAIN A MODEL FROM SCRATCH THAN PRE-TRAINING

twin fiber
#

is this something to worry about

grave frost
desert oar
#

it means that there is 1 column that has all the same data in every entry

#

no, it is not something to worry about

#

it really would help if you at least listed the steps that you took to get from "a dataframe" to "the final output"

twin fiber
#

um okay

#

i will try

desert oar
#

i imagine you did something like this:

  1. load the data
  2. apply woebin to df and got bins
  3. apply woebin_adj to bins and got breaks_adj
  4. apply woebin to df with breaks_adj and got bins_adj
  5. apply woebin_ply to df with bins_adj to get the final bad output
#

is that close enough?

#

and you are worried because the output seems like it's so low-quality that you think you did something wrong

twin fiber
#

give me a sec i will close frames and then show code

desert oar
#

it would be easier to read that way anyway

twin fiber
#

okay

#

how do i copy and paste multiple modules

#

not letting me highlight more than 1

desert oar
#

notebook cells? i dont know if you can in colab. i hate colab.

grave frost
#

@desert oar ok, then how do I do this

One classic technique is to gradually reduce the learning rate, then increase it and slowly draw it down, again, several times.
it just sounds like random Learning rates with extra steps
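One concrete form of the "draw it down, bump it up, draw it down again" idea is a cosine schedule with restarts. This is a sketch under that assumption, not necessarily the schedule the quoted text has in mind; the function name and the numbers are illustrative.

```python
import math


def cosine_with_restarts(step, cycle_len, lr_max, lr_min=0.0):
    """Learning rate at `step`.

    Within each cycle the rate decays along a cosine from lr_max towards
    lr_min, then jumps back to lr_max when the next cycle starts.
    """
    pos = (step % cycle_len) / cycle_len  # position in the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * pos))


schedule = [cosine_with_restarts(s, cycle_len=5, lr_max=0.1) for s in range(10)]
for step, lr in enumerate(schedule):
    print(step, round(lr, 4))
```

The values are fully determined by the step number, so the schedule is a repeating decay pattern rather than random learning rates.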

twin fiber
#

yeah we have to use it

#

it's shit though imo

grave frost
#

you can share notebook in colab to work on it in real time

desert oar
#

i agree. i like that they make all the libraries and models available and i like that you get free gpu compute, but the interface is hot garbage.

desert oar
twin fiber
desert oar
twin fiber
#

the seed is my student number

#

so i've just changed that

desert oar
#

ok, good. i do not want to know that information.

#

thank you for sharing the code

grave frost
#

I have never experienced such shit in my life please send help

twin fiber
#

thank you for helping

desert oar
#

so do you think that any of these features should have a strong predictive effect on the target?

#

maybe this data is just crap for this task

twin fiber
#

that's not the problem

#

well

#

maybe

#

but i mean

#

they have this in the code

desert oar
#

if you selected the data yourself then it's very very possible that it's just not a good dataset

twin fiber
#

look at my IV values on the right

#

so all of my variables have no impact?

#

and 1 has very weak impact

#

i've obviously done something wrong

distant path
#

What level of python do I need to learn opencv?

grave frost
#

Right, so basically I have a NN with an efficientnet + some Conv layers. Initially, I initialize the effnet with imagenet weights and then train from scratch on a pretty big dataset.

Next, I increase the regularization of the whole Net and re-fit to another smaller dataset. so the weights of all layers would be used again. the mystery is that this whole method performs much better than if I froze my base network (effnet) and pre-trained on the small dataset.

I need theories on this. Any idea what could have happened?

desert oar
#

@twin fiber personally i think that IVs are fucked up because you used pd.get_dummies. for a variable like NAME_HOUSING_TYPE, you need to be adding up the IVs across all the individual columns created by pd.get_dummies
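To see how one-hot splitting can push IVs down, here is a toy comparison of the IV of a whole categorical variable against the IV of each of its dummy columns. The counts are invented and the helper `information_value` is hypothetical, using the common definition: sum over bins of (bad share minus good share) times ln(bad share / good share).

```python
import math


def information_value(groups):
    """IV of one variable, given {bin: (good_count, bad_count)}."""
    total_good = sum(g for g, _ in groups.values())
    total_bad = sum(b for _, b in groups.values())
    iv = 0.0
    for good, bad in groups.values():
        good_share = good / total_good
        bad_share = bad / total_bad
        iv += (bad_share - good_share) * math.log(bad_share / good_share)
    return iv


# Hypothetical counts of (good, bad) customers per housing type.
housing = {"rented": (100, 40), "owned": (300, 30), "with_parents": (100, 30)}

iv_whole = information_value(housing)

# Each dummy column only distinguishes "this level" from "everything else".
iv_dummies = {}
for level, (good, bad) in housing.items():
    rest_good = sum(g for k, (g, _) in housing.items() if k != level)
    rest_bad = sum(b for k, (_, b) in housing.items() if k != level)
    iv_dummies[level] = information_value({1: (good, bad), 0: (rest_good, rest_bad)})

print(round(iv_whole, 4))
print({k: round(v, 4) for k, v in iv_dummies.items()})
```

In this toy data every dummy column scores at or below the whole variable, and one of them falls under a 0.1 cut-off even though the variable as a whole clears it comfortably. The dummy IVs also do not simply add up to the whole-variable figure, so treating the variable as a single categorical column is the cleaner comparison.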

#

are you very very certain that they told you to use WoE and IV on the output from pd.get_dummies?

distant path
desert oar
distant path
#

What are the things I should know about

twin fiber
#

umm

desert oar
twin fiber
#

maybe not but I don't understand why we would go through a whole preprocessing stage

#

in the previous lab

#

and then ignore it and use fresh data

desert oar
#

are you sure they werent just showing you various ways to process your data

twin fiber
desert oar
#

it looks like you did, based on the column names

#

from my messing around with this module, it doesnt look like it "splits" columns for you into separate columns, one per category. however get_dummies definitely does that

twin fiber
#

oh i did yes

desert oar
#

i bet your WoE numbers look fine but your IVs look bad

#

try it without the splits

desert oar
twin fiber
#

what do you mean how do i do that

grave frost
desert oar
#

i have to step away for a while... @twin fiber maybe go take a break and get some fresh air. sounds like you've been at this nonstop for hours.

heady yarrow
#

hello

twin fiber
#

i have been but sadly I don't have time to stop as it's due tomorrow and I have a lot more stuff to do r.e. applying models

heady yarrow
twin fiber
#

thanks so much for your time

#

I appreciate you ❤️

heady yarrow
#

I want to get a list of players data here.

grave frost
#

for generating code with AI, why don't we have masked "code modules" with special tokens for variables, chars (language specific subtleties etc.) and apply the masking LM recipe to get a good (kind of advanced) code autocompleter?

heady yarrow
#

what should I do?

rigid ledge
#

hey guys, I am training a graph GAN that consumes a lot of RAM. what are the tricks to reduce its usage ?

bronze skiff
#

uh, what part consumes a lot of ram?

#

you can lazily load in data, saving you some memory on the input side... if your backprop graphs are too large to fit in memory, you can do gradient checkpointing

#

if you need to use a large batch size, but not enough space locally, maybe distribute over multiple nodes

#

your graph representation could also be really inefficient-- are you storing entire adjacency matrices, or just lists?

#

lots of things you can try

#

but you should learn how to use a profiler first, to figure out memory usages of each part
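As a minimal illustration of measuring before optimizing, the standard-library `tracemalloc` module can compare two graph representations. The ring graph, its size, and the helper names are made up. This only tracks Python-level allocations, so GPU or framework memory would need a different tool.

```python
import tracemalloc


def peak_bytes(build):
    """Run build() and return the peak memory traced while it ran."""
    tracemalloc.start()
    result = build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


n = 500
edges = [(i, (i + 1) % n) for i in range(n)]  # a sparse ring graph: n edges


def dense_adjacency():
    # n x n matrix, almost entirely zeros for a sparse graph.
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        matrix[a][b] = 1
    return matrix


def adjacency_lists():
    # One short neighbour list per node.
    neighbours = {i: [] for i in range(n)}
    for a, b in edges:
        neighbours[a].append(b)
    return neighbours


dense_peak = peak_bytes(dense_adjacency)
sparse_peak = peak_bytes(adjacency_lists)
print(dense_peak, sparse_peak)
```

The dense matrix grows with the square of the node count while the lists grow with the number of edges, which is the kind of difference a profiler makes visible.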

lapis sequoia
#

if i am lacking images on my data set, how good/bad would be using a GAN to generate more images? 🙂

zinc lark
#

do i need cuda installed locally after I build pytorch from source w/ CUDA support? I know that building CUDA-enabled pytorch requires CUDA, but what about running that build?

short inlet
#

Because you will need Train Test Split and everything to run detection

charred skiff
#

Helloooo, a basic question I have: why do we multiply the input with weights?

haughty tree
#

can i start this course to get knowledge in ml

verbal niche
#

Hey there ! i was looking into the math behind neural nets, can anyone suggest some resources where they explain how they actually work ?

distant path
inland zephyr
#

i need some suggestions about face recognition with a transfer learning method. Let's say i have a hundred unique persons but only one image per person as the source. The task is to identify whose face this is. Is it feasible if i use an image generator to make train, test and valid images for each person?

#

since what i have now is one image per person

inland zephyr
verbal niche
inland zephyr
#

Ian Goodfellow
you're welcome

brittle stratus
brittle stratus
brittle stratus
#

Has anyone here used Named-Entity-Recognition to categorize streamed data and then perform sentiment analysis with BERT or VADER or some other pretrained model? I have only done sentiment analysis but am trying to also use NER, so I don't have to query my streamed data prior and can just analyze all of it real-time. Any help would be greatly appreciated!

raven knoll
#

I trained a logistic regression model with some scraped data and saved the model. Now im trying to import it and im trying to predict the test data from my school project but I keep getting the message could not convert string to float:

My code looks like this right now.

test_data = df["text"]

# loading the vectorizer
vect_name = open("ml-vectorizer/tldf-vectorizer.sav", "rb")
Xtest = vectorizer.transform(test_data)

# loading the model
model_name = open("ml-models/sentiment-model.sav", "rb")
model = joblib.load(model_name)

model.predict(test_data)

I have no clue where to look...
neon marsh
#

Does anyone know where I can find a tutorial of how to do this using python and tensorflow?

grim fiber
#
X = teamid_matches.loc[:, ['home_team', 'away_team', 'home_team_cup', 'away_team_cup']]
X = np.array(X).astype('int32')
res_filter_drop = res_filter.drop(['date', 'home_score', 'away_score', 'tournament', 'city', 'country', 'neutral'], axis=1)
# Append data: simply exchange 'home team name' with 'away team name', 'home team championship' with 'away team
# championship', and replace the result
_X = X.copy()

_X[:, 0] = X[:, 1]
_X[:, 1] = X[:, 0]
_X[:, 2] = X[:, 3]
_X[:, 3] = X[:, 2]

y = res_filter_drop.loc[:, ['Winner']]
y = np.array(y).astype('int32')
y = np.reshape(y, (1, 900))
y = y[0]

_y = y.copy()

for i in range(len(_y)):
    if (_y[i] == 1):
        _y[i] = 2
    elif (_y[i] == 2):
        _y[i] = 1

X = np.concatenate((X, _X), axis=0)

y = np.concatenate((y, _y))

# Shuffle and split test, train
X, y = shuffle(X, y)
#

Guys I dont understand this piece of code right here. why did he concatenate the columns for this prediction using svm?
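One reading of the pasted code, following its own comment: it builds a mirrored copy of every match (home and away swapped, result flipped) and appends those as extra rows, so the concatenation is along rows rather than columns. A tiny numpy sketch with invented numbers, assuming the label coding 1 = home win and 2 = away win:

```python
import numpy as np

# Columns: home_team, away_team, home_team_cup, away_team_cup (toy values).
X = np.array([[10, 20, 1, 0],
              [30, 40, 0, 2]])
y = np.array([1, 2])  # assumed coding: 1 = home win, 2 = away win

# Mirror each match: swap the home/away columns ...
X_swapped = X[:, [1, 0, 3, 2]]
# ... and swap labels 1 and 2, leaving any other label unchanged.
y_swapped = np.where(y == 1, 2, np.where(y == 2, 1, y))

# axis=0 stacks the mirrored matches underneath the originals.
X_aug = np.concatenate((X, X_swapped), axis=0)
y_aug = np.concatenate((y, y_swapped))

print(X_aug)
print(y_aug)
```

The training set doubles in size, and the model sees every pairing from both sides.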

haughty tree
grim fiber
thorny kite
#

oh good math

raven knoll
grim fiber
#

yes you used test_data

#

@raven knoll
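A self-contained sketch of the pattern being pointed at: load the saved vectorizer as well as the model, transform the text, and predict on the transformed matrix instead of the raw strings. The toy texts, labels, and file names are invented, and the temporary directory only exists to make the example runnable.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data standing in for the scraped data.
train_texts = ["great movie", "loved it", "terrible film", "hated it"]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(train_texts), train_labels)

with tempfile.TemporaryDirectory() as tmp:
    vect_path = os.path.join(tmp, "vectorizer.sav")   # hypothetical file names
    model_path = os.path.join(tmp, "model.sav")
    joblib.dump(vectorizer, vect_path)
    joblib.dump(model, model_path)

    # Opening a file is not enough; both objects have to be loaded.
    loaded_vectorizer = joblib.load(vect_path)
    loaded_model = joblib.load(model_path)

test_data = ["loved it", "terrible film"]
X_test = loaded_vectorizer.transform(test_data)  # text -> numeric features
predictions = loaded_model.predict(X_test)       # features, not raw strings
print(predictions)
```

Passing the raw strings to `predict` is what produces the "could not convert string to float" message, because the model only understands the numeric features it was trained on.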

kindred radish
#

So I've come up with this diagram for a single-layer perceptron network:

#

Sigma is the summation function, phi is the activation function, x are the inputs, y are the outputs

#

Would this be the correct diagram for a multi-layer perceptron network with one hidden layer?

#

Basically just checking if you have layers of activation functions? Or is it only the output layer that has them?
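In the usual multi-layer perceptron, every layer applies summation followed by an activation, the hidden layer included. A minimal numpy forward pass, with arbitrary layer sizes and sigmoid chosen only for illustration:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
x = rng.normal(size=3)              # 3 inputs
W_hidden = rng.normal(size=(4, 3))  # 4 hidden units
b_hidden = np.zeros(4)
W_out = rng.normal(size=(2, 4))     # 2 outputs
b_out = np.zeros(2)

h = sigmoid(W_hidden @ x + b_hidden)  # hidden layer: weighted sum, then activation
y = sigmoid(W_out @ h + b_out)        # output layer: weighted sum, then activation

# Without the hidden activation, two layers collapse into a single linear map.
linear_two_layers = W_out @ (W_hidden @ x)
collapsed = (W_out @ W_hidden) @ x
print(y)
print(np.allclose(linear_two_layers, collapsed))
```

The last two lines show why the hidden activation matters: stacking purely linear layers gives nothing a single layer could not already express.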

mint palm
#

Does anyone currently have access to the coursera deep learning specialization, course 2? plz dm... there's a small thing i need help with... please and thank you

rough otter
#

if i have two correlated variables like sqft_living (house sqft) and sqft_above (upstairs sqft), is it necessary or beneficial to get rid of one of the two?

sweet zenith
#

hey guys, which is better tensorflow or pytorch?

merry lintel
#

most likely depends on the use case

austere swift
#

^

#

i personally like pytorch better

#

its more pythonic

#

but for getting started using keras with tensorflow is easier

merry lintel
#

@austere swift

#

i have a question

#

im pretty much a beginner but how did you get started with ai/ml?

#

i know dumb question but i just wonder

austere swift
#

i just started out doing some projects and learning about it

#

you have to have a good math background first though

#

linear algebra and calculus mainly

merry lintel
#

did you study it by yourself?

austere swift
#

yes

#

i made a simple neural network from scratch using only numpy

#

when i first started

atomic tide
austere swift
#

that way i can learn more about it

atomic tide
#

I actually started by taking an online course run by Berkeley. It's what got me into computer science and programming.

merry lintel
#

i am currently in 8th grade but i dont live in uk/us so it means a bit different thing

#

oh okay

merry lintel
atomic tide
#

Khan Academy is great!

merry lintel
#

agreed

atomic tide
#

I don't think they have much on AI, but their math content is really good.

#

That would be a great way to get your math up to the level required to really understand what you're doing in AI/ML.

merry lintel
#

yeah thats fine because i enjoy maths anyway xd

atomic tide
merry lintel
#

i mean you can probably do ai without actually understanding the math much but i want to know what am i doing

merry lintel
atomic tide
merry lintel
#

and one more thing

#

if i want to get started with making a basic neural network from scratch should i use numpy?

atomic tide
# merry lintel is it free?

I believe Kaggle has free courses. Edx and Coursera courses can usually be audited for free (i.e. you have access to all the materials but don't get a certificate at the end).

atomic tide
#

There's one textbook I recommend above all others for AI, and that's Artificial Intelligence: A Modern Approach by Russell and Norvig. It provides a pretty comprehensive overview of the subject.

merry lintel
#

oh okay

#

i will check out all the resources

#

thanks a lot

atomic tide
#

No prob 👍

knotty flume
#

hey guys hows it going

#

how do i increase the bar space in matplotlib in graphs

#

in barchart

#
AttributeError: 'Rectangle' object has no property 'rwdith'
also got this error
#
TypeError: barh() got multiple values for argument 'width'
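Both errors look consistent with how matplotlib names things: `rwidth` is a `hist` option rather than a bar option (and it is misspelled in the first message), while in `barh` the second positional argument is already called `width` (the bar length), so the bar thickness is `height`. A small sketch with made-up data:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt

labels = ["a", "b", "c"]
values = [3, 5, 2]

fig, (ax_v, ax_h) = plt.subplots(1, 2)

# Vertical bars: thickness is `width`. Smaller values leave more space between bars.
bars_v = ax_v.bar(labels, values, width=0.4)

# Horizontal bars: `values` fills the `width` slot (bar length), thickness is `height`.
bars_h = ax_h.barh(labels, values, height=0.4)
```

The default thickness is 0.8 in both cases, so anything below that widens the gaps.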
neon marsh
grave frost
#

oh god, I joined this Ai server on discord and it's absolute shite. they think AGI is just a task, like classification
You would think an Ai-focused server would be better than a channel in the python server

grave frost
austere swift
#

I just said I personally like it more

#

I still use keras a lot lol

grave frost
#

Are you sponsored by FAIR to spread Pytorch?

#

😛

#

oof, I hate all these stupid dependencies

knotty flume
#

Tell me about it

grave frost
#

I just hate numba. it f-ups all my stuff

true crag
#

any Idea?
ValueError: Input 0 of layer sequential is incompatible with the layer: expected axis -1 of input shape to have value 96 but received input with shape (32, 1)

#

I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)

#

and that one

grave frost
#

what do you think the error means?

limber yew
#

Hello guys!

#

soooo I am like 14 and I want to learn AI

#

can anyone recommend any vids?

#

Anyone?

misty torrent
#

Does the memory / cpu required to drop a column from a dataframe scale with the amount of records in the column?

sly salmon
#

what is a data pipeline? is this similar to a ci/cd pipeline?

uncut orbit
#

its berkeley coding academy

#

its pretty good and helpful

grave frost
tidal bough
misty torrent
#

Wait I'm disabled, why not just... not load useless columns

#

then I don't have to drop it

#

big brain time
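For a CSV source, "not loading useless columns" could be as simple as the `usecols` argument of `pandas.read_csv`. The column names and inline CSV text here are invented:

```python
import io

import pandas as pd

# Stand-in for a file on disk.
csv_text = "id,name,notes,score\n1,ann,long text,0.5\n2,bob,more text,0.7\n"

# Only the listed columns are parsed into the dataframe.
df = pd.read_csv(io.StringIO(csv_text), usecols=["id", "score"])
print(df.columns.tolist())
```

The skipped columns never become part of the dataframe, so there is nothing to drop afterwards.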

sly salmon
misty torrent
rapid ridge
#

I have a pipeline where I need to send data, but should I yield data as a single object or send it in a list of hashes?

limber yew
rapid ridge
# desert oar Can you elaborate

how do I properly build the following pipeline where all my data sources share the same structure ?? https://dpaste.org/knNX#L1,15 the example: https://avleonov.com/wp-content/uploads/2017/05/home-grown_vulnerability_database.png

I currently need something like a chat where a user sends a message, then the other party receives that data and saves it
my approach

  • celery for task queue (2 tasks one each 24 hours, and another each hour)
  • node manager that acts as a chat to stream data
  • scrapers streaming data to node manager
  • node manager saves data < I am not sure if a list of dicts or single json object = many queries to db
true crag
#

I am trying to create a chatbot, I have a simple keras using intents, should I spend time adding information to that json, or try to find an already made one, or should I use a different method?

exotic maple
#

for any of the more experienced guys here, do you think joining a kaggle competition and getting a decent score (even if not top of the leaderboards) would be good to showcase some python / DS proficiency?

sour abyss
#

quick stats-related question - in AP statistics we learn to estimate with confidence with methods that are optimal as long as we are sampling less than 10% of our population. I read somewhere that if we can sample over 10%, then there are more optimal statistical procedures to estimate parameters (a confidence interval, for example). how do these methods differ from the ones used?

ebon geyser
#

Anyone here who knows of a good and free AI module/API? except Chatterbot

celest python
#

Does anyone know the difference between ggplot in Python and in R?

glacial sparrow
#

do 'interpretable machine learning' and 'explainable artificial intelligence' mean the same thing?

ripe forge
#

Yes, pretty much

uncut barn
#

what are the different types of scenarios of when a NN overfits/underfits?

grave frost
grave frost
uncut barn
#

What about number of hidden units?

#

Or batch size

grave frost
#

that is model power - the number of parameters

uncut barn
#

Do hyperparms also affect over/underfitting?

grave frost
uncut barn
# grave frost ofc

Ok but would this depend on the data or is there a general explanation for each hyperparam?

grave frost
#

knowledge about your hyperparameters would also help a lot

uncut barn
#

Kl could you share a useful link

grave frost
#

🤷 I did this a long time ago, so I don't remember any resources. I guess google overfitting and underfitting and read every single article you get your hands on

uncut barn
#

Alright

#

Thanks for ur help for the above

worldly ruin
#

anyone have any luck installing pandas on an m1 mac?

uncut barn
#

when the number of samples is less than the number of features does this mean we are prone to underfitting?

grave frost
loud kindle
#

what are you trying to overcome exactly?

grave frost
#

~~ google lords, forgive me for sinning,~~ but Is it just me, or is pytorch very easy to ....... debug?

#

previously, I only did tensor manipulation with PT. but now, I am running a pytorch model repo which is basically shit - and I am finding it far easier to edit the files rather than change it on my side; which is something I would have never dared to do with TF

uncut barn
#

yes it is @grave frost

grave frost
#

anyone have any leads on this type of output tensor(-0., device='cuda:0', grad_fn=<NegBackward>) I am unable to locate the reason and the particular code

#

I assume it has something to do with my model gradients, since it seems to be at 0 validation accuracy

#

Hmm...I will try pre-processing my data again

main moat
#

I just want to make sure what I have done so far makes sense, because it looks like I can't use any of the generic test train split method such as from sklearn.

My goal is to detect which user is typing based on samples from a typing test which captures key press and release times. There is a warmup prompt, then a real test. Regardless, all of the data for each user is put in one file, and looks like this:

1620322580238912200::1620322580342812900::h
1620322580421799400::1620322580502047400::e
1620322580559144700::1620322580677672600::l
1620322580741669700::1620322580861339900::l
1620322580878105600::1620322580967490900::o
1620322580984171600::1620322581046345000::
1620322581103447000::1620322581181768700::r
1620322582151222600::1620322582230004600::backspace
1620322582215080400::1620322582309995200::w
1620322586805930900::1620322586941929800::o
1620322587061273700::1620322587149182600::r
1620322587125172800::1620322587245511000::l
1620322587285746100::1620322587365764900::d
`timeUp` and `timeDown` are in nanoseconds.
#

For each user, I calculate what the text ended up looking like when they were finished with the prompt. I won't go into detail but I'm sure the algorithm that does this works as intended. I just want to give an overview of my whole process.

I put each user's table of data in a pd.DataFrame, and calculate the following:

  • timeDown of first key to timeUp of same key
  • timeUp of first key to timeDown of second key
  • timeUp between two consecutive keys
  • timeDown between two consecutive keys
  • count of times key was pressed

I add this info to a group of lists, organized by character typed. For example 'l' in the above example was pressed twice, so the following dictionary entries would be made:

'l_d_d_avg': [1620322580741669700 - 1620322580559144700, 1620322580878105600 - 1620322580741669700]
'l_d_u_avg': [1620322580677672600 - 1620322580559144700, 1620322580861339900 - 1620322580741669700]
'l_u_u_avg': [1620322580861339900 - 1620322580677672600, 1620322580967490900 - 1620322580861339900]
'l_count': 2
#

All these calculations are stored as key-value pairs in a dict.
All users will not type exactly the same count of each character, but each dict entry is supposed to be a column. That means different users will give me different number of columns. So after looping through all the data, I replace the lists with average(list) , which greatly reduces the chance of not getting each character's stats. In other words, our attributes are character-focused, not key-event-focused. Any missing columns from this approach are unlikely, but can be imputed.

Now that I've shared all that, I can discuss what I want verification on. I want to make sure I'm splitting the train and test in a logical way. Because I need consecutive rows for my calculations, I can't just randomly split the rows for train and test, because consecutive rows may not be consecutive keystrokes if I do that, and it will throw off my calculations.

#

My Solution:
Currently, I take a percentage of unique, random indices (sampling without replacement) of the input rows before I start getting the character stats. Then while looping, I keep track of the index. If the current index is in that sample of indices, I copy the stats for that iteration to test_dict. If not, I copy them to train_dict. That way, train and test can both have averages from accurate data, but they will be computed from mutually exclusive data points.
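
A minimal sketch of the split described above, in plain Python. The `rows` data, the `split_key_stats` helper, and the 20% test fraction are made up for illustration, and only two of the listed stats are computed. One caveat it makes visible: the down-to-down latency for row i reads row i+1's timestamp, so train and test are exclusive per computed stat, not per raw timestamp.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical rows in typing order: (time_down_ns, time_up_ns, key).
rows = [
    (100, 180, "h"),
    (200, 290, "e"),
    (300, 370, "l"),
    (400, 480, "l"),
    (500, 590, "o"),
]


def split_key_stats(rows, test_fraction=0.2, seed=0):
    """Compute per-key stats over consecutive rows, routing each row's stats to train or test."""
    rng = random.Random(seed)
    n_test = max(1, int(len(rows) * test_fraction))
    # Unique random row indices (sampling without replacement).
    test_idx = set(rng.sample(range(len(rows)), n_test))

    train, test = defaultdict(list), defaultdict(list)
    for i, (down, up, key) in enumerate(rows):
        target = test if i in test_idx else train
        # Hold time: timeDown to timeUp of the same key.
        target[f"{key}_d_u_avg"].append(up - down)
        if i + 1 < len(rows):
            # Latency: timeDown of this key to timeDown of the next key.
            target[f"{key}_d_d_avg"].append(rows[i + 1][0] - down)

    # Collapse each list to its average so every user yields one value per column.
    train_avg = {name: mean(vals) for name, vals in train.items()}
    test_avg = {name: mean(vals) for name, vals in test.items()}
    return train_avg, test_avg, test_idx


train_stats, test_stats, test_idx = split_key_stats(rows)
```

Because the stats are computed on the original row order before routing, consecutive-keystroke calculations stay valid no matter which indices land in the test sample.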

lapis sequoia
#

sklearn or tensorflow?

neon marsh
main moat
#

@lapis sequoia sklearn

ebon geyser
#

Ping me if u do plz

silver widget
#

hi all.. I've been studying hyperparameter tuning. Of course my helpers are youtube and datacamp videos.
But I came up with a question here;
grid = GridSearchCV(estimator= rfc, param_grid = param_grid, cv=3, n_jobs=2, verbose=2)
That's my code for gridsearchCV
Should I use all my dataset for fit or still need to train_test_split the data again? If I need to split it again, what does cv=3 do?
Thanks in advence

#

*advance
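
On the GridSearchCV question above, one common pattern is sketched here: hold out a test set first, then let the grid search cross-validate on the training part only. With `cv=3`, each parameter combination is fitted three times, each time validating on a different third of the training data. The synthetic dataset and the small `param_grid` are assumptions for illustration, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: any tabular classification problem).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Keep a final test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rfc = RandomForestClassifier(random_state=0)
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}  # hypothetical grid

# cv=3: every parameter combination is scored with 3-fold cross-validation
# carried out inside X_train.
grid = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=3)
grid.fit(X_train, y_train)

# After the search, the best model is refit on all of X_train by default.
test_score = grid.score(X_test, y_test)
```

The held-out `test_score` is then an estimate on data that played no part in choosing the hyperparameters.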

void prism
#

Hello everyone, I just found the Python server and I am really excited to see how many channels there are. I was wondering if there is an equivalent server for R, I tried searching for it but I couldn't find it. Thanks in advance

light rain
#

{
"hi": "hello"
}
how can i get hi by the value hello in json
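
A small sketch of the reverse lookup being asked about: JSON objects load as Python dicts, which only index by key, so finding a key by its value means scanning the items. The helper name `keys_for_value` is made up here, and it returns a list because several keys could share one value.

```python
import json

doc = json.loads('{"hi": "hello"}')


def keys_for_value(mapping, value):
    """Return every key whose value equals `value`."""
    return [key for key, val in mapping.items() if val == value]


matches = keys_for_value(doc, "hello")  # ['hi']
```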

#

idk if this is the right channel lmao

sick wedge
#

I just designed my first Neural Network Model with a CNN architecture, it is meant to detect masked faces.

I've just trained it through 10 epochs, but can anyone explain these values to me such as accuracy and val_accuracy? I understand pretty well what they mean but what should I be aiming for with those values?

#

1st epoch

#

10th epoch

#

isn't it bad that the val_accuracy stays at around 78%, whereas the accuracy has increased from 71% to 98%

#

btw I did test the model with 6 images and got 50% accuracy so far

grave frost
#

your model is overfitting. you should aim for the highest val_accuracy and your accuracy should be around 4-6% your val accuracy

sick wedge
#

okay, I heard that term before, but what does it mean 😅

#

it's when it fits the data well but can't make good predictions with new data right?

grave frost
#

it means that your model is basically "memorizing" your input data

#

which is not good, we want it to find the pattern

sick wedge
#

also did you mean it should be around + or - 4-6%, e.g. if i have 70% val accuracy i should have 65-75 accuracy

grave frost
#

in the radius of 4-6% should apply to a lot of cases

#

+- 4->6%

sick wedge
#

okay

#

I almost thought u meant accuracy should be 4-6% of 70

sick wedge
grave frost
sick wedge
#

I think I'll learn more from the latter, so will try that 😄

#

my model's really basic though

grave frost
#

or you can add dropout layer

#

you can read up more about it on net, it's pretty easy to implement

sick wedge
#

okay, you think that might fix it alone? or is that on top of simplifying the model

grave frost
#

it should. if it doesn't then you can simply increase the effectiveness of dropout. that would def do it
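
For reference, a minimal NumPy sketch of what a dropout layer does during training (the usual "inverted dropout" formulation). This is an illustration of the idea, not the Keras implementation; in Keras the equivalent is adding a `tf.keras.layers.Dropout(rate)` layer, and "increasing the effectiveness" corresponds to raising `rate`.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of activations and rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return activations
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
hidden = np.ones((4, 1000))  # stand-in for one layer's activations
dropped = dropout(hidden, rate=0.5, rng=rng)
```

Since a different random subset of units is silenced on every step, the network cannot lean on any single unit to memorize the training set.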

sick wedge
#

Okay 👍

#

tbh I thought it was just cuz my training data sucked, so that's good I don't have to find more

grave frost
#

nah, it's fine if the network can overfit. Data scientists always first overfit on the data to see everything works, then gradually increase the complexity of the model

sick wedge
#

But you said I have to reduce the complexity of the model, why is it different in my case?

grave frost
#

ah, ignore that part

#

I was just describing a debugging flow for Models. you can leave it

sick wedge
#

okay sweet

#

I'll see about getting that dropout layer implemented then, thanks 🙂

ebon geyser
#

Anyways, anyone who used ChatterBot library before?

mint palm
#

the earlier algo was trained on regular birds that are often seen

#

so why am i wrong if i marked option 1?

#

is it because i would still have lower examples

desert oar
#

this is a weird question because presumably data augmentation is part of your model training pipeline, so #2 would imply #1

chilly geyser
#

It's also a weirdly high-level question

mint palm
#

so what would be the perfect one out of them

desert oar
#

The city expects a better system from you within the next 3 months
Better get a telephoto lens and start birdwatching 😛

mint palm
#

haha

desert oar
#

The question says "which should you do first" (emphasis mine), which maybe helps narrow it down. What is option 3? is there an option 4+?

#

I wonder if option 2 is inherently wrong because you should be putting 800 of the images into train and 200 into test

mint palm
desert oar
#

yep, that. that's the correct answer

#

that's what i would personally want to do first

#

then you can answer the question of "how bad is our model at predicting these birds if we only have 1000 of them?"

mint palm
#

2 are ruled out

desert oar
#

what's the full statement of #3

#

also i assume this is a graded assignment from the past and you arent sneaking around rule 5 🙂

mint palm
#

oh sorrrrrryyyy

desert oar
#

yeah pedantically the right answer is #3 - make sure your evaluation metric takes the new bird species into account, before trying to actually do anything with the new bird data

#

its a weird question though

mint palm
#

what is the main reason to rule out # 1

desert oar
#

because it's not what you do first

#

it's part of the process, but not the first thing you should do

mint palm
#

yeah but is it still acceptable solution later on?

desert oar
#

sure, but that's not what the question is asking

#

(they should have bolded first)

mint palm
#

ok

desert oar
#

this is actually a good question and as you can see it tripped me up too. it almost seems designed for you to get it wrong and then learn from your wrong answer.

#

rather, the question is badly phrased but the intent behind it is good

mint palm
#

so almost all are right but we first try a new eval metric to know if we can tweak the code and work without changing data sets?

desert oar
#

i think the idea is, if you add the new bird species to your evaluation metrics for the current model, you will have 2 things happen:

  1. the model will no longer appear to (erroneously) underperform on known birds
  2. the model will be total useless shit on the new birds
mint palm
#

yeah

desert oar
#

then you can iterate towards improving 2 while maybe accepting some decrease in 1 (e.g. if the birds are similar or they happen to look similar in pictures)

mint palm
#

so we work after checking what initially we can do with what made earlier

desert oar
#

i think that is the idea, yes. something like test-driven development but for machine learning.

#

presumably also you still care about overall accuracy too, so you might be looking at multiple metrics even if you're only using 1 metric in your cross val loop

mint palm
#

ok got it ....i am quite sure it was beyond syllabus....but andrew did say it will be worth the thinking and experience

mint palm
#

or literally multiple metrics (won't that create multiple f1 scores)

desert oar
#

yeah maybe you want to at least keep track of accuracy, f1, and brier score
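
A quick sketch of tracking those three metrics side by side with scikit-learn. The labels and predicted probabilities are toy values, and the problem is simplified to binary ("new species" vs. not), which is an assumption; the bird task in the question would be multi-class.

```python
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score

# Toy data: 1 = "new bird species", 0 = anything else.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]  # predicted probability of class 1
y_pred = [int(p >= 0.5) for p in y_prob]  # hard labels at a 0.5 threshold

metrics = {
    # Fraction of hard predictions that are correct.
    "accuracy": accuracy_score(y_true, y_pred),
    # Harmonic mean of precision and recall for class 1.
    "f1": f1_score(y_true, y_pred),
    # Mean squared error of the probabilities (lower is better).
    "brier": brier_score_loss(y_true, y_prob),
}
```

Accuracy and F1 judge the thresholded labels, while the Brier score judges how well calibrated the probabilities themselves are.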

mint palm
#

👍 brier score is still left to be covered by me

mint palm
#

thank you very much

grave frost
#

I think honestly, we can still make an AI that can code very well without reaching the AGI mark

#

since code would just be a set of instructions, we have an easy one-to-many relationship to map code with instructions performed. Once we have this loss metric, then it becomes easier to arrange research in a way to minimize that metric

#

the point won't be to write code fully autonomously, but to understand what a dev has already written and help them write much much more thereby reducing the programmers required, effort and money put in by companies for projects.

#

code is available in plenty, and we already have automatic documenters that can document each function, so there's that

#

I think coding itself is ripe for automation 🤑

granite wolf
#

anyone happen to be good with SQLAlchemy?

exotic maple
#

does it replace all devs? Not even close, just as a "coding AI" wouldnt make all devs obsolete either.

#

but id be surprised if a FAANG wasn't working on something like that

grave frost
#

I don't mean no-code; I mean generating most of the code

exotic maple
#

Is that broad enough? 😳

grave frost
#

mostly, the reason why I think we can achieve automated programming before AGI is that you can't really measure effectiveness of AGI - it's not a simple task to test and achieve best results on. But for automating coding, we can easily create a loss function. what's left is just to optimize it

#

so it's just the modelling part. and seeing the ingenuity from transformers, it does seem like a possibility

chilly geyser
#

'easy loss function' for coding what

#

What losses would you be considering for that?

#

All losses essentially describe the structure of perfection and I'd say it's quite difficult to describe what is 'perfect code.'

Even if you just want the structure of something 'good enough' - what is 'good enough' code?

grave frost
#

mostly, something that runs. you guys are missing the point - the aim is not to fully automate the developers but drastically reduce the amount of developers required. this strategy plays out in almost every innovation and no reason for it to not work this time too

chilly geyser
#

That's somewhat true, but despite Humans Need Not Apply being released in 2015, I feel like the economic revolution hasn't happened or would happen soon

grave frost
#

humans always overestimate future; and CGP grey is not an AI expert

#

I would be willing to bet my every nickel that it would happen in my lifetime

chilly geyser
#

FWIW, I think a no-licenses-required GPT-3 will transform the whole industry, especially as I think it's very end-user friendly

#

It can do stuff like "give me this website design" or "make sure no contrast decisions are bad"

#

Which is already quite a lot of freelance work I'd say?

grave frost
#

it def would not; it suffers too much from biases. it would require some regulation but yeah, your general hypothesis is correct imo

#

again, you overestimate capabilities of GPT3

#

it's just a giant overfitted model

#

it's not supposed to be for technical CS tasks or medical ones

#

only simple languages - and their usage

chilly geyser
#

I'm not talking technical CS or medical

#

Simple language and simple work is a lot of work

#

Deploy this webapp, make a website showcasing X

grave frost
#

yeah, GPT3 can do that. give it notes on news, and you have a news article for that news in the style of an editor

chilly geyser
#

I mean the fact is most work is boring work

#

And GPT3 has the capability to replace that

grave frost
#

yeah, it can. and it has happened

#

a lot of newspapers have these AI-generated low-level articles at the end (for testing)

#

what's your point? it seems you are proving my own point to demonstrate what models are taking over specific industries

chilly geyser
#

Just discussion

grave frost
#

ahh. and consider the fact that oh, "LAnGUaAgEs aRe HaRD" by the non-technical people few years back. they overestimated the complexity of language. while GPT-3 can't write deep books or win a nobel, it can take over quite a few jobs.

I suspect the same thing is gonna happen. programmers think their coding is a pretty complicated task, but then it might later turn out that it wasn't as much as you thought - tho by the time most realize, they would be jobless 🤷

#

if companies are paying for 10 experts at 100,000$, and that product saves them the seats of 5, then they only have to pay like half a million rather than a whole million

chilly geyser
#

tl;dw, weird deprecation issues, and also possibly weird architected systems

#

Also it's just funny so I like posting lol

grave frost
#

I know next to nothing about web-dev but I really don't see why same AI can't generate code to interface with microservices??

grave frost
#

you have to compete with 20% more people who would work more at lower salary, so bye-bye money and perks

limpid saddle
#

Hello, I need a bit of help with a kaggle project
First of all, this was supposed to take the Phrases and cut them down into words then put it in the same row, so why is it different?

slate hollow
#

hey

#
import os
import random
import tensorflow as tf


def process_text(text: tf.Tensor, cutoff_len: int = 300, word_len: int = 100, pad: str = 'asdf') -> tf.Tensor:
    print(text, text.numpy())
    initial = tf.strings.lower(tf.strings.substr(text, 0, cutoff_len))
    no_special = tf.strings.regex_replace(initial, r'[^\w\s*]', '')
    brs_removed = tf.strings.regex_replace(no_special, r'<br\s*/?>', ' ')
    split_up = tf.strings.split(brs_removed)[:-1][:word_len]
    # tf.pad(split_up, [[0, max(0, word_len - len(split_up))]], "CONSTANT", constant_values=tf.constant(pad))
    return split_up  # remove the trailing word because it might be cutoff


raw_data = ([[[1, 0], open(f'data/aclImdb/train/neg/{f}', encoding='utf-8').readline().strip()]
             for f in os.listdir('data/aclImdb/train/neg')] +
            [[[0, 1], open(f'data/aclImdb/train/pos/{f}', encoding='utf-8').readline().strip()]
             for f in os.listdir('data/aclImdb/train/pos')])
random.shuffle(raw_data)

values = [d[0] for d in raw_data]
reviews = [d[1] for d in raw_data]
data = tf.data.Dataset.from_tensor_slices((values, reviews))
data = data.map(lambda v, r: (v, process_text(r)))
#

(the /neg and /pos directories are just a bunch of txt files)

#

so it's some simple strings

#

the thing is, i get this weird error: AttributeError: 'Tensor' object has no attribute 'numpy'

#

i googled it, and they said eager execution should solve this, but they also say tf2 (which is the one i'm using) has eager execution enabled by default so

velvet thorn
slate hollow
#

yeah i tried that, but what should i pass to input and outputT

#

passing a raw tf.Tensor doesn't seem to work

fiery coyote
#

Hello everyone, everything good?

Could someone guide me to solve a problem?

I have a text (recipes) where I need to identify the words that are the ingredients within it, how can I solve this problem? Could someone guide me?

Thanks!

serene scaffold
fiery coyote
fiery coyote
sour abyss
#

quick stats-related question - in AP statistics we learn to estimate with confidence with methods that are optimal as long as we are sampling less than 10% of our population. I read somewhere that if we can sample over 10%, then there are more optimal statistical procedures to estimate parameters (a confidence interval, for example). how do these methods differ from the ones used?

desert oar
# sour abyss quick stats-related question - in AP statistics we learn to estimate with confid...

they are probably referring to the "finite population correction" that can be applied to certain situations
https://stats.stackexchange.com/q/401763/36229
https://web.ma.utexas.edu/users/mks/M358KInstr/TenPctCond.pdf

exotic maple
#

do you remember the criteria for determining finite or infinite populations? I only ever saw it in the context of manufacturing and populations are assumed infinite always (you dont have a production ceiling)

desert oar
#

the 10% rule mentioned above is as good as any, i think

#

i don't think i've ever had to use the FPC "in anger" although in hindsight i probably should have in some cases

sour abyss
#

i think that's where the bit about infinity comes in, if N = infinity then this isn't a problem

#

so the finite population correction can take care of this in confidence intervals by multiplying the "original" margin of error by sqrt((N - n)/N) i think, right?
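
A small sketch of that correction. The usual textbook form of the factor is sqrt((N - n) / (N - 1)), which is nearly identical to sqrt((N - n) / N) once N is large. The function names and numbers here are made up for illustration.

```python
import math


def fpc(population_size, sample_size):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


def corrected_margin(margin_of_error, population_size, sample_size):
    """Shrink an 'infinite population' margin of error by the FPC."""
    return margin_of_error * fpc(population_size, sample_size)


# Sampling 100 out of 1,000 (exactly 10%) trims the margin by about 5%.
factor = fpc(1000, 100)
narrow = corrected_margin(3.0, 1000, 100)
```

The factor tends to 1 as N grows (the "N = infinity" case) and reaches 0 when the whole population is sampled, where no sampling uncertainty is left.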

exotic maple
olive lichen
#

hey all, I'm working on building a text classifier using nltk and was hoping to get some guidance from someone. is anyone free to talk a bit about it?

weary summit
#

Hello,
I have a 2d numpy array, and I would like to evaluate the following equation.
Is there a numpy like(vectorized) way to implement this evaluation rather than just iterating over the values?

Bonus question, is there a way(vectorized again) to save each result of the product inside a new 1d numpy array?
Thanks in advance

true crag
#

I am using keras, tensorflow to make a simple AI bot, can u recommend any better data processing method?

#

I need to make a simple chatbot that ll interact with the user, keep a conversation... Is that process ok? Can I achieve that by adding more data to my training session?

iron basalt
# weary summit
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([1, 2, 3, 4, 5])
>>> m1 = x[1:] * y[:-1]
>>> m2 = x[:-1] * y[1:]
>>> s = 0.5 * np.sum(m1 - m2)
#

Where s is the final half sum result, m1 is the first elementwise-multiplication and m2 the second.

weary summit
iron basalt
uncut barn
#

Anyone know why there are 3 conv layers in a row; this is Alexnet?

bold timber
#

Hi everybody

pallid parcel
#

How do I rename these columns in pandas? I'm using pandas.read_json() to read the data.
{ "c": [ 120.57, 120.52, 120.54 ], "h": [ 120.75, 120.63, 120.75 ], "l": [ 120.54, 120.46, 120.48 ], "o": [ 120.74, 120.58, 120.53 ], "s": "ok", "t": [ 1615302420, 1615302480, 1615302540 ], "v": [ 483063, 498590, 516948 ] }
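
One way this could look, as a sketch: `DataFrame.rename(columns=...)` takes a mapping from old names to new ones. The long names (close, high, low, open, status, timestamp, volume) are a guess at what the single letters mean, and the frame is built from the parsed dict here instead of `pandas.read_json()` to keep the example self-contained.

```python
import pandas as pd

data = {
    "c": [120.57, 120.52, 120.54],
    "h": [120.75, 120.63, 120.75],
    "l": [120.54, 120.46, 120.48],
    "o": [120.74, 120.58, 120.53],
    "s": "ok",  # scalar, repeated on every row
    "t": [1615302420, 1615302480, 1615302540],
    "v": [483063, 498590, 516948],
}

df = pd.DataFrame(data)

# Assumed meanings of the one-letter keys; adjust to the real API docs.
column_names = {
    "c": "close",
    "h": "high",
    "l": "low",
    "o": "open",
    "s": "status",
    "t": "timestamp",
    "v": "volume",
}
renamed = df.rename(columns=column_names)
```

`rename` returns a new DataFrame, so the result has to be assigned (or `df.columns` set directly) for the change to stick.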

bold timber
#

I have a question: How do I know 'spain' is equal to [0.0 0.0 1.0 27.0 48000.0]?

#

Are the binary digits placed randomly? Please give me a clue. I'm so confused trying to understand it

lapis sequoia
#

one good example would be if we have a list of variables

#
list = ['a', 'b', 'c']```
#

if we pass it through enc it gives us

#

a
will be
[1, 0, 0]

#

it basically takes all your strings and transforms them into a matrix

#

so what in reality you are seeing is this

#

(don't mind this just a line filler)
[0, 0, 1, 27, 48000]
 ^^^^^^^  ^^  ^^^^^
 Spain    Age Salary

#

btw for a starter I'd recommend never using fit_transform as it might confuse you
just use fit and then transform
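
A minimal sketch of the fit-then-transform flow with scikit-learn's `OneHotEncoder`, using the a/b/c list from above. It also shows that the positions are not random: the encoder sorts the categories it saw during `fit`, and each category gets the column at its sorted position.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# The encoder expects a 2-D array: one row per sample, one column per feature.
values = np.array(["a", "b", "c"]).reshape(-1, 1)

enc = OneHotEncoder()
enc.fit(values)  # learns the sorted categories: a, b, c

# transform returns a sparse matrix by default, so densify it to look at it.
encoded = enc.transform(values).toarray()
learned = enc.categories_[0].tolist()  # ['a', 'b', 'c']
```

Here 'a' maps to [1, 0, 0], 'b' to [0, 1, 0], and 'c' to [0, 0, 1]. By the same rule, a country that sorts last alphabetically among the ones in the data would get the last of the one-hot columns.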

bold timber
bold timber
broken quail
#

Hello everyone, I am new to data science (2 months studying data cleaning, simple machine learning, and a little bit of Streamlit)

Does anyone have a suggestion for a simple portfolio project?

uncut barn
#

In LeNet from 6x14x14 ---> 16x10x10 via convolution, would the total number of parameters be 16x((6x14x14) + 1) where 6x14x14 is the dimension of each kernel?
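
A worked sketch of the arithmetic, for what it's worth. A convolution's parameter count depends on the kernel size and channel counts, not on the 14x14 spatial size of the input. Going from 14 to 10 with stride 1 and no padding implies a 5x5 kernel, so a fully connected 6-to-16 channel convolution would have 16 x (6x5x5 + 1) parameters. The helper names are made up, and note that the original LeNet-5 connects each output map to only a subset of the input maps, which gives a smaller count (1,516).

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Trainable parameters of a fully connected 2-D convolution layer."""
    per_filter = in_channels * kernel_h * kernel_w + (1 if bias else 0)
    return out_channels * per_filter


out_size = conv_output_size(14, 5)  # 14 - 5 + 1 = 10
n_params = conv_params(6, 16, 5, 5)  # 16 * (6*5*5 + 1) = 2416
```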

hoary wigeon
#

I need help

#

asap

#

I got an dataset for assignment, I don't want you to complete my assignment
I want some hint to explore dataset

Dataset : HousingPrice

How can i start finding insights ?

#

I have related time with price, Year with price/area