#data-science-and-ml


oblique belfry
#

No. I'm in the same boat. I want a tad bit more structure at my workplace.

vital cipher
#

Guys, I'd like input on material and things to learn for cyber security threat analysis, like where to start, or any other suggestions.

#

do share your inputs

#

🙂

somber kayak
torn garden
#

@drifting hemlock that was a lot of words for saying basically nothing

clear jetty
#

@torn garden Seems to me @drifting hemlock is talking about how to get data science work done when you're dealing with people on your team with very little (if any) technical knowledge? Or am I reading that wrong?

drifting hemlock
#

@torn garden @clear jetty I guess I did a bad job of explaining myself. My point is that, in the workplace, it's very difficult to structure a data science team within an organization, to provide them with the tools and the databanks they need to concentrate on their assigned tasks, and, furthermore, to give them the structure to actually deploy machine learning models that work for the business.

#

This is a problem that you don't actually read about in the myriad of blogs out there. For example, take a simple binary classification task. As a hobby project it's very simple: clean the data set, train a model, deploy a simple Flask server to act as an API.

Is it that simple in a work environment? Hardly, and even less so if you're working with a team. First, you don't get a dataset as you would on Kaggle; you have to create it from scratch using multiple sources. Then you have to document the process, because you have to explain to the stakeholders how the model works and why it predicted a certain value, or you'll risk your credibility. You also need a cloud service to work with your teammates, because GitHub alone will not be enough. And you need a whole pipeline to retrain your model every now and then. Not to mention a lot of things in between to ensure that something can be deployed within the organization.

#

If the last paragraph was confusing, it's because building and managing a data science team is that confusing. We don't really have a platform that makes this process a lot easier, and if there is one, well, let me know, that'd be great!

scarlet night
#

I'm participating in a data science course, and I made this visualization to address the question of "what is the correlation between Carbon Monoxide concentration and air quality?"

From the more professional data scientists here, is there any advice on things to improve about the graph?

drifting hemlock
#

@scarlet night Well I'm not the best when it comes to visualizations but usually there are a couple questions that you can ask yourself to improve them:

  1. Who is your audience? If you define how technical your audience is and how much domain language they have then you can either simplify or add more relevant details to it. If it's a broader audience you might even want to add a caption that summarizes what you are trying to say.
  2. What problem (or benefit) are you trying to pinpoint in this visualization? If it's an all-time low, then you might want to add a label with the value of the highest or lowest point. I know that we have it at the left/right side of it, but it can be hard to read.
#

Hope you nail that course btw :)

#

Oh and I forgot to say, it looks great for me! I just wanted to give some advice.

scarlet night
#

Thanks, I've had the wisdom of the average people by asking people what they think in other Discord servers

The audience is between the academic people and the average person, as while having specific and more technical terms and data, its insights and results apply to everyone: pollutants in our air make the air quality worse

small nebula
#

im fairly new to data science, and im trying to get started with tensorflow. does anyone have a moment to help me get started?

silent swan
#

@drifting hemlock I don't think there's going to be a single solution to everything you mentioned. Data Science is (almost intentionally) an incredibly broad term and covers a ton of different contexts/paradigms, and I think no one system is going to be able to streamline all the different use cases and workflows. So it's not surprising to me that you've not found much meaningful to that end. I think everyone just patches together some workflow that works well for them and their particular problem and sticks with it.

#

@small nebula what're you trying to do, and is there a reason why tensorflow over pytorch?

small nebula
#

well tensorflow is the first thing that came up on my google search, a good while ago. been playing around a bit with the tutorials and haven't really checked out pytorch yet

silent swan
#

I am quite biased, but unless you have a strong reason to use TensorFlow (the specific model you want is in TF, or you need to deploy models for business usage), PyTorch would be more recommended

#

e.g. if you're just learning DL for fun / curiosity, go with PyTorch

scarlet night
#

@void anvil I find graphing it like that makes the data less meaningful, as it is much harder for people to understand the significance of a roughly 45 degree angle line compared to 2 lines graphed over time lining up
Also, graphing it over time serves the purpose of showing that this is a constant correlation, not just a day overlapping.

#

Though seeing it, I do see what you mean, but I don't think it helps as much

velvet thorn
#

@silent swan why do you say that?

silent swan
#

which statement

oblique belfry
#

If you find TF frustrating but still want to use it, I'd recommend trying out Keras. It's a higher level api for TF (and other platforms) that simplifies dev.

silent swan
#

I'd recommend PyTorch over TF/Keras regardless unless you have specific reasons for needing TF

#

a higher level API can make it harder rather than easier to debug

velvet thorn
#

I agree with that

#

the latter statement

#

so far TF/Keras have worked fine for me though, so I guess I personally see no compelling reason to try PyTorch out

scarlet night
silent swan
#

worth noting that PyTorch does support Tensorboard now

#

TF also has Eager mode now but from everything I've heard, it's still tedious af to use/debug

oblique belfry
#

I have found Tensorboard to be....annoying.

I get the debugging part. I like the abstraction from the static graph.

The more I have played with Pytorch, the more I want to use it on our next work project.

drifting hemlock
#

Does anyone work with Tableau Server or Online?

snow junco
silent swan
#

personally I've never used Tensorboard much

#

I much prefer to log my own metrics and then build my own visualizations

drifting hemlock
#

What do you guys use to deploy your experiments?

#

I'm deciding between Flask and Django. Flask would be easier, but I'm concerned about scalability, and I don't know if Django would be overkill for a very simple API.

drifting hemlock
#

Well definitely I'm not a Flask expert, so I'm open to suggestions, @void anvil do you think it would lag if I set up an API that handles multiple requests?

#

It's not gonna be a TON of requests, but it would be great to have a responsive API that deals with a decent amount of requests.

rough anvil
#

@snow junco I'm currently about 20% through the udemy course you linked, and so far it has been fantastic.

#

it isn't geared towards absolute beginners, but as long as you know the basics you'll be fine

drifting hemlock
#

@void anvil I guess I'll try Flask then, thanks!

drifting hemlock
#

You guys think that 25% vs 75% would count as an imbalanced dataset for a binary classification exercise?

silent swan
#

I would say yes, but whether it's "sufficiently imbalanced" might depend on your context

drifting hemlock
#

Yeah that's a good point

silent swan
#

eg if you have a ton of data and the problem is simple, and you're using the right metric, 3-1 might not be a big issue

#

but if it's a hard problem and with some metrics, even 2-1 can be very bad if you don't account for it
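
To make the "right metric" point concrete, here is a minimal sketch on a made-up 75/25 dataset. The labels, the always-majority "model", and the helper names are all assumptions for illustration, not anything from the discussion.

```python
# Hypothetical 75/25 binary dataset: 75 negatives (0) and 25 positives (1).
y_true = [0] * 75 + [1] * 25

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of the examples carrying `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    return (recall(y_true, y_pred, 0) + recall(y_true, y_pred, 1)) / 2


print(accuracy(y_true, y_pred))           # 0.75
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the useless model with 75%, while a metric that weights both classes equally shows it is no better than guessing.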

languid trail
#

hey guys - what method would you use for scraping pages that are loaded with javascript?
PhantomJS is deprecated, PyQt4 doesn't work for me, chrome webdriver also does not work for me

#

or maybe someone can help me fix the chrome webdriver issue:

```
  File "scraper.py", line 4, in <module>
    driver = webdriver.Chrome("/root/chromedriver")
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
```
smoky pendant
#

hm, I can give it a shot. with chrome webdriver do you mean selenium, or am I just being stupid about another tool here xD

languid trail
#

yeah selenium

smoky pendant
#

hm, sounds interesting. it often gets the job done for me. what site, and what are you trying to get?

#

oh, just looked at the error. the driver is not starting.

languid trail
#
```python
from selenium import webdriver

url = 'https://pythonprogramming.net/parsememcparseface/'
driver = webdriver.Chrome("/root/chromedriver")
driver.get(url)
dynamicText = driver.find_element_by_id("yesnojs")
message = dynamicText.get_attribute('innerHTML')
print("Dynamic text element says: '" + message + "'." )
```
smoky pendant
#

code looks correct, never used arguments with the driver. just put it in the working directory. but I am pretty sure it should work like you have it.

#

what chrome version are you running?

languid trail
#

80

smoky pendant
languid trail
#

yep

#

wget https://chromedriver.storage.googleapis.com/80.0.3987.16/chromedriver_linux64.zip

smoky pendant
#

then the driver should be correct, hm.

#

(unknown error: DevToolsActivePort file doesn't exist)

languid trail
#

let's see if adding --disable-dev-shm-usage will help 😄

smoky pendant
#

just going to recommend that link xD, just reading over it.

#

sounds like something new in a new update

#

I started seeing this problem on Monday 2018-06-04. Our tests run each weekday. It appears that the only thing that changed was the google-chrome version (which had been updated to current) JVM and Selenium were recent versions on Linux box ( Java 1.8.0_151, selenium 3.12.0, google-chrome 67.0.3396.62, and xvfb-run).

#

wait, that says chrome 67, not 80.

#

But when we investigated further, we noticed that the XVFB screen didn't start properly, and that was causing this error. After we fixed the XVFB screen, the issue was resolved.

languid trail
#

it works

#

i just removed my path from driver = webdriver.Chrome(chrome_options=chrome_options)

#

🆗

#

Senior Python Developer

smoky pendant
#

glad you got it working 😃
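
For anyone hitting the same `DevToolsActivePort file doesn't exist` error, the fix described above (pass Chrome options, drop the explicit driver path) might look roughly like the sketch below. Only `--disable-dev-shm-usage` comes from the chat; `--headless` and `--no-sandbox` are assumptions that are often suggested when Chrome runs as root or without a display, and `make_driver` and `CHROME_FLAGS` are made-up names.

```python
# Only --disable-dev-shm-usage is from the chat. The other two flags are
# assumptions for a root / no-display setup; trim the list to what you need.
CHROME_FLAGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def make_driver(flags=CHROME_FLAGS):
    """Start Chrome through Selenium with the given command-line flags."""
    # Imported here so this module can be loaded even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    # Newer Selenium releases take options=; chrome_options= in the chat is the
    # older name for the same argument. With no path given, chromedriver must
    # be discoverable on PATH.
    return webdriver.Chrome(options=options)
```

Calling `driver = make_driver()` followed by `driver.get(url)` would then replace the `webdriver.Chrome("/root/chromedriver")` line in the scraper; remember `driver.quit()` at the end.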

serene scaffold
#

Anyone experienced with spaCy?

#

I'm doing some unit testing and it gets convoluted when I've changed a property of the Token or Doc class in another test, so there's no way to anticipate what properties the Doc object in my test will have

#

I'm wondering if there's a way to reset the Token and Doc classes at the beginning of each test.

uncut shadow
#

Hey. What is the difference between a * b and numpy.dot(a, b)?

#

Also, what is learning rate and weights in Machine learning?

silent swan
#

a*b will do element wise multiplication. np.dot will do a dot product (which in the case of vectors, means element-wise multiplication and then summing)

#

weights are the parameters of your model. learning rate is a hyperparameter for your optimizer that controls how big a step it takes on each update, i.e. how quickly it tries to fit the data
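
A toy sketch of both terms, assuming a one-weight model `y = w * x` trained with plain gradient descent. The data and the `train` helper are invented for illustration.

```python
# Toy data generated from y = 3 * x, so the weight we hope to learn is 3.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def train(learning_rate, steps=50):
    """Fit y = w * x by gradient descent on mean squared error and return w."""
    w = 0.0  # the model's single weight, i.e. its only learned parameter
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # the learning rate scales every update
    return w


print(train(learning_rate=0.01))   # close to 3.0
print(train(learning_rate=0.001))  # still well short of 3.0 after 50 steps
print(train(learning_rate=0.2))    # steps too large: w overshoots and blows up
```

The weight is what changes during training; the learning rate is chosen beforehand and decides whether those changes are too timid, about right, or unstable.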

uncut shadow
#

So np.sum(a * b) != np.dot(a, b)?

#

Hmmm... What are weights for?

silent swan
#

they're the parts of the model that you learn

#

it depends on what a and b are
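
A short sketch of "it depends on what a and b are", using small made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a * b        # values 4, 10, 18
dot_1d = np.dot(a, b)      # 1*4 + 2*5 + 3*6 = 32
summed = np.sum(a * b)     # also 32: for 1-D arrays the two agree

# For 2-D arrays np.dot is a matrix product, so the two are no longer the same.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise_2d = A * B     # rows (5, 12) and (21, 32)
matrix_product = np.dot(A, B)  # rows (19, 22) and (43, 50)
summed_2d = np.sum(A * B)  # a single number, 70

print(dot_1d == summed)
print(matrix_product)
```

So `np.sum(a * b)` equals `np.dot(a, b)` for two vectors, but not once either argument has more than one dimension.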

lapis sequoia
#

any ideas? I'll post the error as well

silent swan
#

always post the error when you're asking why you're getting errors

lapis sequoia
silent swan
#

your series has dtype object, rather than int or float

lapis sequoia
#

hm better use quotes i guess

silent swan
#

try calling .astype(float) or checking if there're any weird values (e.g. strings) in your series

lapis sequoia
#

must be because i dont know well the Series class yet, but iterating and checking each value was indeed int type

#

anyway, i used .astype and will see if everything goes right now
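
A sketch of the "check for weird values" suggestion, assuming a Series where one stray string is hiding among integers. The data is made up; an object-dtype Series can hold ordinary Python ints, which may be why spot-checking individual values looked fine.

```python
import pandas as pd

# Hypothetical column: mostly ints, but one stray string makes the dtype object.
s = pd.Series([1, 2, "n/a", 4])
print(s.dtype)  # object

# s.astype(float) would raise ValueError on "n/a", so coerce first and look at
# which entries could not be parsed.
numeric = pd.to_numeric(s, errors="coerce")  # unparseable values become NaN
bad_values = s[numeric.isna()]

print(bad_values.tolist())  # ['n/a']
print(numeric.dtype)        # float64
```

Once the offending values are cleaned up or dropped, `.astype(float)` goes through without complaint.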

compact bluff
#

I have a very large dask array representing an image, how can I store it as a png quickly?

#

skimage's imsave is quite slow and so is converting the dask array to a PIL image

deft harbor
silent swan
#

my recommendation is

#

don't use deep learning for finance

#

speaking as someone with experience with both quant finance and deep learning

silent swan
#

standard statistics/econometrics problem

#

basically: don't treat stock prediction as a machine learning, treat it as an economics problem that you need to solve with statistics tools

#

more concretely: don't do stock prediction

heavy thicket
lapis sequoia
#

Guys, what are the best resources to learn data science, not just like machine learning but literally all the theory, i'm really overwhelmed

#

I tried to learn statistics by my own, but i don't see the theories being applied to data science, so i kinda need a practical way to learn? Any suggestion for this?

stray monolith
#
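```python
import matplotlib.pyplot as plt


def plot(x_axis, points, title, x_label, y_label):
    """Draw a bar chart of points against x_axis with the y axis fixed to 0-10."""
    plt.title(title)
    plt.xlabel(x_label, fontsize=3)
    plt.ylabel(y_label)
    plt.bar(x_axis, points, width=0.80)
    plt.xticks(rotation=90)  # vertical tick labels so long names don't overlap
    plt.ylim([0, 10])
    plt.show()
```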
heavy thicket
#

@stray monolith Maybe try this: plt.ylim(ymin=0)

delicate carbon
#

I want to use matplotlib.pyplot to plot a double line graph in which i can analyze for overfitting/underfitting. I am confused as to which variables I should pass to it https://pastebin.com/yfiZbKGy Ive messed around with trying to graph my intended goal for awhile before deleting my attempts

Do I put training and validation outputs before or after training or in other words what are the ideal metrics to use to plot for overfitting/underfitting?

silent swan
#

@lapis sequoia you can use ML/NLP to generate features for prediction, but unless you're ready to read up a lot about quant finance, don't get anywhere near the stock prediction problem. There're too many ways to do it wrongly and draw the wrong conclusions if you don't know what you're doing.

#

@lapis sequoia Elements of Statistical Learning, but sooner or later you're going to need to dig in and read a standard probability/statistics textbook

uncut shadow
#

@stray monolith I think you already solved this problem, but if you didn't, then you should do plt.ylim(0). The same can be achieved on the x axis with plt.xlim(0) (but not in your plot)

oblique belfry
silent swan
#

I'd say stick with convnets for now

#

Hinton has been pushing capsules for quite a while but as far as I've seen it's still not had any meaningful adoption

lapis sequoia
#

@silent swan I have that book, but i think that's more into machine learning rather than the workflow in data science, can you recommend a more general text in Data Science? The thing is that i'm confused on what to really learn and how to connect the dots

oblique belfry
#

Oh...I am def going to stick with Convnets. I just find it funny how he thinks max pooling is stupid, and for some reason it works better than all the other pooling.

stray monolith
#

Guys I fixed that issue.

#

plt.ylim() was not working for some reason

#

Then I found out that the problem was because the points were being plotted as Strings

#

I converted the data to float using list comprehension and everything is fine now
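
A small sketch of that fix with made-up values. Matplotlib treats string values as category labels rather than numbers, which is a likely reason `plt.ylim()` seemed to have no effect.

```python
# Hypothetical values as they might arrive from a CSV reader or a scraper.
points = ["3.5", "10", "7.25"]

# Strings compare character by character, so "10" sorts before "3.5".
print(max(points))  # 7.25 (the string "7.25", not the largest number)

# The fix described above: convert with a list comprehension before plotting.
numeric_points = [float(p) for p in points]
print(max(numeric_points))  # 10.0
```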

#

But thanks for your input anyways @heavy thicket @uncut shadow

uncut shadow
#

ohh

#

that makes sense lol

worn stratus
#

I need to discretise continuous (and continuous-ish) data for an ID3 decision tree. At the moment, I'm just finding the best possible value to split an attribute in two. Is this a valid solution, or do I need to try and account for cases where I need to split the data into more than two parts?
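
The "best possible value to split an attribute in two" approach can be sketched as below, assuming information gain as the criterion and midpoints between sorted neighbours as candidate thresholds. The function names and data are hypothetical, not the poster's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, gain) for the `value <= threshold` split with the highest gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_threshold, best_gain = (lo + hi) / 2, gain
    return best_threshold, best_gain


# Hypothetical attribute values and class labels.
temperatures = [64, 65, 68, 69, 70, 71, 72, 75]
play = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no"]

threshold, gain = best_binary_split(temperatures, play)
print(threshold, round(gain, 3))
```

Binary thresholds are also how C4.5 handles continuous attributes, and because the same attribute can be split again further down the tree, a series of two-way splits can still carve a range into more than two intervals.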

blazing bramble
#

Anyone here experienced at webscraping?

#

i'm having some difficulties regarding scraping a website.

#

I want to store all the data but, the show-more button at the end of the page, hides info so i only get a bit of the information

#

How do i get around this..

worn stratus
#

Use selenium and press the button with that

blazing bramble
#

i was using python

#

and beautiful soup

#

is there any way to accomplish this with bs?

lapis sequoia
#

it depends on the site, but in some cases you need something like selenium

#

there's also chromedriver

#

and the mozilla one, mozilladriver i think

#

sorry geckodriver

#

if you want to stick to just beautiful soup, there could be some way to trigger it, but it entirely depends on the site

blazing bramble
#

the site is just best buy

lapis sequoia
#

a lot of things will use some backend undocumented api that runs the site

blazing bramble
#

the show more button

#

so what exactly is selenium?

lapis sequoia
#

its a browser emulator of sorts

#

capable of keyboard/mouse interaction

blazing bramble
#

hm.. interesting

#

sounds more powerful than bs

lapis sequoia
#

BS has some nice functionality to it

#

i prefer it in most cases, only use selenium if absolutely necessary

blazing bramble
#

but don't most modern sites have multiple pages and show-more buttons

#

where is bs applicable if it cannot

lapis sequoia
#

sure but a lot of them also usually have backend apis that serve the data

#

and a lot of those things can still be triggered with the proper request

#

theres scrapy too

blazing bramble
#

lol ima sound like a noob but

lapis sequoia
#

scrapy looks really nice

#

bs is just easier to use

blazing bramble
#

how do u find the backend api requests that you're referring to?

lapis sequoia
#

you can use firefox or chrome

#

the dev tools have a network inspector

#

you can watch elements of the page load

blazing bramble
#

yes i was doing this

lapis sequoia
#

so you look for ones that are jso

blazing bramble
#

i noticed some thing called loadmore

lapis sequoia
#

json

blazing bramble
#

was not sure what to do, however

#

json? why json?

lapis sequoia
#

thats what most apis use

#

an api endpoint typically just returns a json document

#

or graphql

#

which i think also registers as json in the network inspector

#

search pages in particular usually have them

#

do you have an example link?

blazing bramble
#

um

#

yes

#

i can just send the current link

#

i am working with atm

blazing bramble
#

hm..

lapis sequoia
#

now if that json has all the data you need

blazing bramble
#

what is this?

lapis sequoia
#

you can change the parameters in that call

#

its the api that drives their search

#

if you have the network tab open

#

go to that page

#

then click 'show more'

#

sort by Type

#

look for json

blazing bramble
#

ok

#

if you do this, then you get the link you just sent?

lapis sequoia
#

yeah

blazing bramble
#

Ok, interesting. And now you can scrape this weblink

lapis sequoia
#

exactly

#

and if you look through it, it should give you enough info to filter, and page through the results

#

and json is easy to work with once you have it

#

no html parsing

blazing bramble
#

so bs is still applicable here

#

correct?

lapis sequoia
#

yes, though really you could do it with just requests

#
```python
import json
import requests

response = requests.get(...)
json_data = json.loads(response.text)
```
blazing bramble
#

so this data is in json format?

lapis sequoia
#

yeah

#

apis dont always provide everything though

blazing bramble
#

I see, however I am not too familiar with how json is read or even works.

lapis sequoia
#

json is easy

blazing bramble
#

gotta read up on it

lapis sequoia
#

its dicts and lists

blazing bramble
#

you said that the apis don't give everything?

#

interesting...

lapis sequoia
#

yeah a lot of times search pages will have like

#

link and title

#

so then id use beautiful soup to hit that link

#

scrape the content that isnt served by the api

#

that best buy api is a bit crazy in terms of detail

#

pretty heavy

blazing bramble
#

that's a good thing for my purposes i think

lapis sequoia
#

definitely

#

not so great for actually running a site

blazing bramble
#

lol i'm just a student who's trynna play with some new technology for my resume

lapis sequoia
#

scraping is a good place to start

blazing bramble
#

this project seems so daunting now 0-0

lapis sequoia
#

just keep at it, its not too hard

#

i like to use jupyter to prototype these things

#

makes it a lot easier to test navigating through the site to build the scraper

blazing bramble
#

jupyter?

lapis sequoia
#
#

its great for prototyping, you can re run small parts of the code

#

rather than re-running everything

#

if you hit an error, you just retry

#

and you can use it to document the exploration

#

so you can look back later, put it all together

#

great for learning, especially if you are handling larger datasets

blazing bramble
#

ok so sorry lol

#

but like how exactly do i use this link lol

lapis sequoia
#

no worries

blazing bramble
#

it's a json right? so like json manipulation or?

lapis sequoia
#

yes its json

#

so if you do something like this

#
```python
import json
import requests

response = requests.get('https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page=2&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc')
json_data = json.loads(response.text)
```
#

json_data is a dict

blazing bramble
#

key pairs

#

if i remember correctly

lapis sequoia
#

yup

#

json_data['products']

#

is a list of products

#

its a list of dicts

#

and if you look at the url, it has 'page=2'

#

so that list is the list of dicts of products on page 2

#

iterate through the pages, iterate through the products lists

blazing bramble
#

wow!

#

that's really cool

#

How exactly did you know 'products' was the key for this json file

lapis sequoia
#

firefox has a nice json navigator

#

just looked at it, its at the top

#

if you look at json_data['totalPages'] as well, thats the total number of pages

#

so youd want to iterate through a list from 1 to that

#

and change the url each time to set page= that number

blazing bramble
#

So if you'd want

#

all the products

#

from each page

lapis sequoia
#

just make a list and append each page's product list to that list

blazing bramble
#

how would you navigate through each page like that tho, just a loop through the pages?

lapis sequoia
#

do something like

#
```python
for page in range(1, json_data['totalPages'] + 1):
    ...  # request & json stuff goes here
```
#

and in the request url you'd put that page

#

where it says page=

#

so

#
f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'
blazing bramble
#

oh ok so then load in the request url

#

then just append the different 'products' into a list?

lapis sequoia
#

yeah exactly

blazing bramble
#

kinda like what you showed above?

lapis sequoia
#

yeah

#

use extend

blazing bramble
#

extend?

lapis sequoia
#

create a new list, then list.extend(new list)

#
```python
# language list
language = ['French', 'English', 'German']

# another list of language
language1 = ['Spanish', 'Portuguese']

language.extend(language1)

# Extended List
print('Language List: ', language)
```
#

except in this case it'd be extending with json_data['products']
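
Applied to product lists, the difference between `extend` and `append` looks like this. The two page lists are made-up stand-ins for `json_data['products']`.

```python
# Hypothetical stand-ins for json_data['products'] from two different pages.
page_1_products = [{"name": "laptop"}, {"name": "phone"}]
page_2_products = [{"name": "tablet"}]

all_products = []
all_products.extend(page_1_products)  # adds each product dict individually
all_products.extend(page_2_products)
print(len(all_products))  # 3

nested = []
nested.append(page_1_products)  # adds the whole page as a single item
nested.append(page_2_products)
print(len(nested))  # 2 (a list of pages, not a flat list of products)
```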

blazing bramble
#

ahhhh

#

interesting.

#

I will now try this out

#

thank you for all the help!

lapis sequoia
#

no problem, good luck

blazing bramble
#

I really appreciate the examples, I am very much an example-based learner, and I could not find too many examples on the topic online.

#

Do you have any resources you could possibly pass on to someone who just wants to learn some more?

lapis sequoia
#

github is one of the best sources honestly

#

just search through open repos

blazing bramble
#

If you don't mind answering, are you also a college student?

lapis sequoia
#

if there is a particular library or package you need help with, just search it

#

no i'm a data engineer

#

build apis, scrapers, stuff like that

blazing bramble
#

hey that's cool

lapis sequoia
#

work with the data science team

#

yeah its fun stuff

blazing bramble
#

when'd you get interested in scraping and stuff?

lapis sequoia
#

i was pretty young and i loved news, but hated the news sites at the time

#

i wanted something that would show me just the title and text

#

and rss wasnt really used at the time

#

no google news

#

so i made a scraper that ran and gathered all the news

#

ive made a few hundred since

blazing bramble
#

damn lol

#

I'm just a cs student looking for some meat n potatoes on my resume, thought a scraper would be interesting.

lapis sequoia
#

its good stuff

blazing bramble
#

it's really cool I think

#

the whole internet becomes a data-base

lapis sequoia
#

yeah pretty much

blazing bramble
#

I was able to find the backend api json thing through the network tools, but i'm still confused as to if it represents every single page or just one

lapis sequoia
#

the json is just for one page

blazing bramble
#

so then what do i load into requests

#

like just the page i'm working with?

lapis sequoia
#

like i said above, something like

#
```python
for page in range(1, json_data['totalPages'] + 1):
    url = f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'
    ...  # request & json stuff
```
#

then do requests.get(url)

#

see the page=?

blazing bramble
#

on the top of the page?

lapis sequoia
#

in the url

#

&page=

blazing bramble
#

oh yea

#

i see

#

you'd want to update that

#

which is what the for loop is accomplishing

#

incrementing it by 1 each time

lapis sequoia
#

yup

blazing bramble
#

for each page

#

ooooooo

lapis sequoia
#

yeah and since totalPages is in the json dump

#

youd get the first url

#

and take the totalPages from that

#
```python
import json
import requests

# the first page tells us how many pages there are
response = requests.get('https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page=1&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc')
json_data = json.loads(response.text)

for page in range(1, json_data['totalPages'] + 1):
    url = f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'
    ...  # request & json stuff
```
#

that way as the # of pages changes, the scraper knows and iterates through the right length list

#

i mean do it better than that, but that's the basic idea
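
One way the basic idea could be fleshed out is sketched below. `BASE_URL` is the URL from the chat with `{page}` as a placeholder; `get_json`, `fetch_all_products`, the `delay` parameter, and the 10-second timeout are assumptions added for illustration, and the endpoint itself may have changed since.

```python
import time

BASE_URL = (
    "https://www.bestbuy.ca/api/v2/json/sku-collections/17056"
    "?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA"
    "&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"
)


def get_json(url):
    """Fetch a URL and parse the JSON body, raising on HTTP errors."""
    import requests  # imported here so the rest can be tried without network access

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


def fetch_all_products(fetch=get_json, delay=0.0):
    """Collect the 'products' lists from every page into one flat list."""
    first = fetch(BASE_URL.format(page=1))
    products = list(first["products"])
    # Page 1 is already in hand, so continue from page 2.
    for page in range(2, first["totalPages"] + 1):
        time.sleep(delay)  # optional pause between requests
        products.extend(fetch(BASE_URL.format(page=page))["products"])
    return products
```

Passing the fetch function in as a parameter is a design choice that lets the paging logic be exercised with canned dictionaries before pointing it at the real site; `fetch_all_products(delay=1.0)` would then run it for real.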

blazing bramble
#

ok everything is starting to click

#

what is the f in the url in the loop sir?

#

f'

lapis sequoia
#

its an f string

#

its the same as doing

#
"string {0}".format("is a string")
#

which returns 'string is a string'

#

so if page = 3

#
f"page {page}"
#

would return 'page 3'

#
f"string {'is a string'}"
#

LUL in retrospect string is a string is not the best example

blazing bramble
#

uhh

lapis sequoia
blazing bramble
#

I have not seen this before haha

lapis sequoia
#

its new in 3

#

3.6 even

blazing bramble
#

wow cool

#

new stuff is always cool

lapis sequoia
#

actually for this something like

#

you could also do something like

#
```python
base_url = "https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"

for page in range(1, json_data['totalPages'] + 1):
    url = base_url.format(page)
    ...  # json stuff
```
blazing bramble
#

wow

#

f strings are cool

lapis sequoia
#

yeah they are shorter and look better

#

but they aren't re-usable the way my example above is

blazing bramble
#

but essentially what's going on is they're taking the thing and making them string literals right?

lapis sequoia
#

yes the f string evaluates the contents of the {} and returns a string

#

where as in the example above, the {} is only evaluated when the string has .format(value)

#

until you do that, {} is just a part of the string
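
A tiny sketch of that timing difference, with made-up variable names:

```python
page = 3

# An f-string is evaluated right where it is written, using the current value of page.
url_now = f"page={page}"
page = 4
print(url_now)  # page=3 (already a finished string, unaffected by the new value)

# A plain string keeps its {} as ordinary characters until .format() is called,
# so one template can be filled in many times.
template = "page={}"
print(template)            # page={}
print(template.format(1))  # page=1
print(template.format(2))  # page=2
```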

blazing bramble
#

hm... why can't we just toss the url into the loop without the f tho

#

like what's its data type

lapis sequoia
#

you need it to include the contents of page

#

so if page = 3

#

you need the url to say page=3

blazing bramble
#

that makes sense

#

and you got the loop incrementing page

#

but why does the whole thing have to be a string

lapis sequoia
#

the url?

blazing bramble
#

mhm

lapis sequoia
#

because you need a url to make a request

#

and the url needs to be a string

#

so either

#
```python
base_url = "https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"

for page in range(1, json_data['totalPages'] + 1):
    url = base_url.format(page)
    ...  # json stuff
```
#

or

#
```python
for page in range(1, json_data['totalPages'] + 1):
    url = f"https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"
    ...  # json stuff
```
#

id argue the first one with .format looks nicer

blazing bramble
#

I think the second one makes more sense to me

lapis sequoia
#

they work the same

#

you could also do

blazing bramble
#

Ofc, yes

#

but like conceptually

#

the second one sits easier with me, just a preference

lapis sequoia
#
```python
for page in range(1, json_data['totalPages'] + 1):
    url = "https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc".format(page)
    ...  # json stuff
```
#

yeah f strings are far easier to read

#

i prefer them myself

#

but its only in i think 3.6+

#

.format is older

blazing bramble
#

i gotta read up on format

#

i forget

#

what it does

lapis sequoia
blazing bramble
#

ahh it's a substituent

#

i see.

#

page is entered into the {}

#

ok that makes sense

lapis sequoia
#

yes

#

you could also have {0}

#

0 being the first item in the list inside .format()

blazing bramble
#

you are extremely helpful man

#

thanks a lot

lapis sequoia
#

i'm bored

#

and like sharing

#

no problem

blazing bramble
#

I mean,

#

i guess i'm bored too,

#

trynna learn this stuff lmao

#

to pass time

#

but it's very cool so, it's a nice pass time haha

lapis sequoia
#

yeah it can be a good way to pass the time

#

especially if its for you

blazing bramble
#

Hey did you ever have experience with getting a co-op?

lapis sequoia
#

co-op?

blazing bramble
#

internship

lapis sequoia
#

oh no

blazing bramble
#

different names different countries

lapis sequoia
#

i was in IT by trade, got right into a paid job

#

that led to dev work, mostly python

#

which led to this

blazing bramble
#

wow so did you have to get a cs-degree or not even?

lapis sequoia
#

not even

blazing bramble
#

o that's cool

#

many pathways

lapis sequoia
#

yeah i wouldn't recommend my route

#

its harder

#

but its the road i took, so what can you do

blazing bramble
#

haha yes i'm in school and planning on staying in it

#

the thing is, it's almost time to find experience

#

i'm scared for technicals man

lapis sequoia
#

make your own experience

blazing bramble
#

I currently am

lapis sequoia
#

come up with some projects for yourself

#

build scrapers, bots

#

set them up right

#

learn how to use docker and deploy things

#

learn version control

blazing bramble
#

what are these things?

lapis sequoia
#

its a tip of an iceberg is what it is

#

docker is a great platform for virtualization

#

and delivering software in isolated containers

#

what it basically does is allow you to specify an environment for your app/script to run in

#

as an example, i have a docker script that loads a pre-built Ubuntu image

#

installs all of the required packages

#

copies the source over

#

you can run the code within that environment

#

that pretty much eliminates the odds that changes in your environment will break your script

#

then you can take that docker image and directly deploy it

#

so if you had a scraper, for example, you could push it to any server, either hosted at home or something on like AWS

#

just by pushing the entire image

#

no setting stuff up, cause its already set up

#

and that might sound hard

blazing bramble
#

wow, this shit

lapis sequoia
#

but the file that does it all

blazing bramble
#

sounds so complex

lapis sequoia
#

is like 20 lines

#

super simple

blazing bramble
#

wow

lapis sequoia
#

let me find an example

blazing bramble
#

where the hell'd you learn this stuff

#

damn, youtube got nothing

lapis sequoia
#

i did IT for a long time

#

just learned everything i could

#

automated everything i could

blazing bramble
#

I just wish i knew some good resources you know?

lapis sequoia
#

automate the simple stuff is a great starter

blazing bramble
#

is python a good place to put my money in terms of where to be in cs

#

i'm learning so much shit

lapis sequoia
#

sorry automate the boring stuff

#

great starter

blazing bramble
#

like html, css, c, java, low-level, a bunch of stuff

#

But i don't know where I wanna become specialized in

#

python is a nice language, solid, simple, and powerful

lapis sequoia
#

language knowledge is pretty transferable

#

once you have your head around a couple

#

learning others is easy

blazing bramble
#

that's true.

lapis sequoia
#

python is a great starter

#

java is good

blazing bramble
#

going from java to c was quite simple

#

but i hate c

#

i'm not a fan

earnest prawn
#

thats because java came from c

blazing bramble
#

well that'd make a lot of sense

#

but our school makes us do memory management manually

#

even tho, we have garb collectors

#

it be like that 😦

earnest prawn
#

well because a) garbage collectors suck, they cost performance and require a larger runtime and b) manual memory management teaches you something about how things work

blazing bramble
#

hey man, that's true

earnest prawn
#

there is a reason for example many of the libraries you use in data science like tensorflow are written in C++

blazing bramble
#

isn't c++ super powerful

#

like it's modern capabilities

velvet thorn
#

knowing a little about how computers work @ a low level is really helpful sometimes

#

you don't need to be able to write a device driver

earnest prawn
#

C++ is super powerful

#

in fact

#

c++ is too powerful

#

c++ is basically c + every object oriented feature mankind has thought of, which makes it a super overloaded language where, if you have a new project, you'll have engineers arguing about which of the versions from c++98 over 11 to 17 they should use

#

and once they have come down to a version of c++ what the best design pattern is because you can achieve the same thing in a million ways

velvet thorn
#

1983 - Bjarne Stroustrup bolts everything he's ever heard of onto C to create C++. The resulting language is so complex that programs must be sent to the future to be compiled by the Skynet artificial intelligence. Build times suffer

earnest prawn
#

a bunch of people at work who do profesional c++ in their department have come up with the theory that stroustrup came up with C++ only to make money off the books people will buy to understand it

blazing bramble
#

As someone who is taking c++ nxt semester, these perspectives are very interesting.

earnest prawn
#

i think there is a big difference between learning a language at school where youre limited to the feature set your teachers throw at you

blazing bramble
#

no you're right

#

but still

earnest prawn
#

and using it professionally where everyone and their mom has another idea on what design pattern is the best

velvet thorn
#

you could try Scala

#

it's a good introduction to functional programming

earnest prawn
#

when did we go to functional programming

blazing bramble
#

i'm still a junior loll

earnest prawn
#

and more interestingly

blazing bramble
#

idek what that is

earnest prawn
velvet thorn
#

I think it started from the f-string discussion

#

which was in the context of scraping

blazing bramble
#

Yes! i'm making a scraper

lapis sequoia
#

which led to me going on about deploying it

blazing bramble
#

my first real side project

lapis sequoia
#

which led to languages

blazing bramble
#

@lapis sequoia do you primarily program python or also do some others from time to time?

lapis sequoia
#

my work is all python, do some C# for game dev i do for myself

#

i've done others

blazing bramble
#

interesting, a true python master

lapis sequoia
#

i'm no master

#

just an old amateur

blazing bramble
#

So, how much math is involved in data science.

velvet thorn
#

data science is a pretty wide field

blazing bramble
#

typically in a college degree, the ds pathways are more math focused

velvet thorn
#

it depends on what exactly you're doing.

#

minimally, I would say basic statistics and linear algebra

#

and calculus

blazing bramble
#

wow calc too?

velvet thorn
#

if you're planning to go down the research scientist path, then a lot more mathematics, probably

blazing bramble
#

I was planning on not doing calc again lol

velvet thorn
#

only if you need to do theory

#

that's a bit higher-level?

#

like if you're doing DL research

lapis sequoia
#

There's a lot of work for just supporting data science as well

velvet thorn
#

I'd say for the majority of junior data scientists it wouldn't be relevant

lapis sequoia
#

As long as you have a grasp of the basic concepts

velvet thorn
#

except insofar as you need it to understand stuff like gradient descent (actually, more or less only gradient descent...)

blazing bramble
#

the world of development is super vast by the sounds of it

#

almost overwhelmingly vast

velvet thorn
#

there are also things that are data science and not dev work, and vice versa

#

you can be a very bad coder and still be a fairly good data scientist

blazing bramble
#

Damn i thought development was the thing that tied everything together kinda

velvet thorn
#

well, if you mean software development specifically...then not really? because you can do DS without software at all, though that would be inefficient

#

however, it's fairly common for data scientists to not-really be developers

#

i.e. their expertise is in statistical analysis and experimentation

blazing bramble
#

isn't the minimum these days a college degree for a post tho?

velvet thorn
#

and being able to code is just a means to an end

#

that would depend on your country, really

#

mine is a little uptight about education, and I don't actually have a CS degree

#

so I kind of took the alternative route?

blazing bramble
#

i'm in north america

#

also very degree orientated

velvet thorn
#

not sure how it works over there.

#

but I'd say that that kind of requirement is becoming more and more common...?

#

and if you don't fulfil it then you probz need to show competence in some other way

blazing bramble
#

yep

deft harbor
#

Does anyone have any good resources on art and AI?

#

Im curious about music generation, lighting, paintings, etc.

wind marlin
#

Anyone here doing financial time series forecasting? Is there anything open source that you found to work well? Perhaps utilizing RL for forecasting?

wind marlin
#

Is a 1D convolutional network with attention something that I can get help with? I heard it's good for forecasting but haven't seen any code examples or examples of anyone using it.

worn stratus
#

Machine learning for stock predictions isn't generally done. Youre almost certainly not going to do very well with it

#

Although as a learning exercise its probably fine

deft harbor
#

Its pretty common to team machine learning with options trading. However, it pulls outside data and isn't just trying to predict the future based on movement alone.

#

Also, the fund managers I have discussed it with use it for marking potential trades not auto trading.

drifting hemlock
#

What do you guys think about AutoML? Is it good enough?

deft harbor
#

My biggest fear with machine learning is people creating models that impact our daily lives, but they dont really have a clue about what is going on.

#

That said, never used automl

drifting hemlock
#

And that's exactly one of the reasons many experiments don't get deployed, I mean if you develop a machine learning model to predict X thing, then people are going to ask why it made such prediction. I feel that with AutoML you don't exactly know what's going on behind the scenes.

#

I mean it looks great and all, I've tinkered a bit with IBM Watson which is basically an AutoML, but I don't know, I don't trust it lol

deft harbor
#

I'm concerned about people being able to deploy these models without understanding all the issues and ethical questions

#

Looking at you COMPAS, can't even see behind the curtain to defend yourself in court

deft harbor
#

They will use satellite images to track shipments, global warming trends and weather data.. Etc

silent swan
#

using DL to generate features 👍
using DL to predict stocks 👎

silent swan
#

I was just explicitly agreeing with you

deft harbor
#

Eh, its new year, I missed the first comment.

#

🍻

wind marlin
#

I don't believe anyone can get great results using ML only, but I have a deep understanding of building algorithms that backtest decently already. My algorithms output into an oscillator format. I've only gotten to where I am because I know how other builders think. There's a lot of them and they all make the same errors. With ML I can use more instruments because I can't keep up visually, I'd rather get an alert than watch screens all day, I would lose my mind lol But I agree with you guys about stocks, I know about the satellite parking lot stuff. Evidence of insider trading/commitment is actually where ML should shine. I have some ideas for how to spot that. I want many tools that each look at different things.

delicate carbon
#

Im trying to build a decision tree model with a bot I am making for Discord. I have no clue what im doing in regards to the ML part lol. I try to read tutorials but it gets so insanely heavy into math I zone out and look for another source.

I have a great dataset that is widely used. As long as I receive good accuracy and precision I think Ill be alright..

#

Before I even deploy the bot I will test the model against real data and if its misclassifying then i know ill need to make adjustments. I just play around with parameters and values until the results show. I know its not methodical but idk .. its works for me

ocean beacon
#

You probably won't have enough data when you are training your discord bot live. You need to train your bot offline and feed it tens/hundreds of thousands of data.
I suggest doing a standalone ML project first and then worry about discord. ML is a really broad field. Some people who work in ML never work on the model but are purely occupied with gathering sensible data and optimizing those further or they are in the field of generating data which they plan to feed into their model. You are trying to make a finished product (the discord bot) but the core of your project (ML) is still at its infancy.

knotty canopy
#

hey any1 know im trying to read excel file using pandas but i come across problem with o/p that it showing Nan ?

#

in terminal o/p?

paper niche
#

what’s o/p?

worn stratus
#

output I assume

knotty canopy
#

Yep

deft harbor
#

copy and paste the actual error

delicate carbon
#

@ocean beacon This is the dataset i am using https://archive.ics.uci.edu/ml/datasets/phishing+websites there is also this one but holy cow, its an enormous file http://www.fcsit.unimas.my/research/legit-phish-set

I have thought about that, "what if my bot is misclassifying too many real samples?" Maybe I could use the decision tree + reinforcement learning or decision tree + a neural network (albeit, the simplest form of neural network. I really dont understand how to implement the logic of an extremely robust neural network)

#

I was reading a little bit about 2-class neural networks which might work

deft harbor
#

So your goal would be to take items people are typing, and then capture ones that are phishing attempts?

delicate carbon
#

When users post a link it will go to my model which is being hosted by flask. All links are analyzed

deft harbor
#

If so, you will need more than just a decision tree. You will need NLP items to process the text, created topics, etc.

delicate carbon
#

I did something similar. I created a py program that extracts features from incoming links

#

My dataset has 30 attributes so I extract those 30 attributes from urls

#

those 30 attributes from urls are then put into a new dataset. That new dataset is sent to the model which will determine if its phishing or not

deft harbor
#

Ok, not sure how well it performs, but you can check for false positives by doing a confusion matrix or plotting AUC
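As a rough illustration of the confusion-matrix check mentioned above, here is a dependency-free sketch with invented labels (1 = phishing, 0 = legitimate). The function name and the data are assumptions for the example; scikit-learn's `confusion_matrix` and `roc_auc_score` in `sklearn.metrics` cover the same ground.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

# Invented example labels, not results from the phishing dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
# Share of legitimate links that were wrongly flagged as phishing.
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])
```

For a link-checking bot the false positive rate matters a lot, since every false positive is a legitimate link flagged in front of users.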

delicate carbon
#

I havent tested the dataset yet but I do have my decision tree all set up so I can do it whenever i want

#

thanks

deft harbor
#

I would try tweaking the decision tree model, measuring overall performance and also check things like AUC before feeding that data to a NN

#

Thats just the way I would go about it

delicate carbon
#

Definitely. I read good things about boosted trees so I will play around with that as well. What is NN?

deft harbor
#

Also, you could apply bagging or boosting to the tree as well

#

neural network

delicate carbon
#

Oh right 🙂

#

If i were to use a tree + NN.. how would I consider the algorithm?

Decision tree classifies (assuming it classifies correctly) -> send to NN so that the DT keeps learning -> send results from NN back to DT?

#

like a constant feedback loop of-sorts

lusty coral
#

Hey guys. I'm coming from general discussion

#

I'm doing some data analysis via pandas. I have to do this task periodically like every 15secs. Though I think dataframes are slow, or I'm using them in a wrong way.

#

I mean I thought of using python matrices, but thought it would be a lot harder to deal with.

#

I need to change some specific rows, and update my dataframe etc

#

I need to get that data as well, and I'm aware that pandas probably won't work for a live demo

#

But for now, i just want to know whether I'm doing things wrongly, or any other ideas you guys have considering my code? I'm not the best, still learning. So there is that

deft harbor
#

Pandas becomes real slow when you start appending stuff to a dataframe over and over.
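One common way around the repeated-append slowdown is to collect plain Python rows first and build the DataFrame once at the end. A minimal sketch, with made-up column names and readings (none of this comes from the code being discussed):

```python
import pandas as pd

def build_frame(readings):
    """Accumulate rows in a list, then create the DataFrame in one step."""
    rows = []
    for sensor, value in readings:
        # Appending to a list is cheap; growing a DataFrame row by row
        # copies the whole frame each time.
        rows.append({"sensor": sensor, "value": value})
    return pd.DataFrame(rows)

# Hypothetical readings, purely for illustration.
frame = build_frame([("a", 1.0), ("b", 2.5), ("a", 3.0)])
```

For a job that runs every 15 seconds, the same idea applies: do the per-item work on lists or dicts and touch pandas once per cycle.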

#

I'm not sure I follow everything in the code, as the comments aren't English.

lusty coral
#

I could translate them if it would help

#

I did it nonetheless 😄

#

This one is the latest one. I tried using apply and update. But I think I'm doing something wrong

#

I need to update my data and calculate some values, then I decide some pricing stuff. I have not implemented pricing stuff because I need to resolve my issues first I think

delicate carbon
#

basically im just looking for a "simple" solution. When I read about implementation of ML and theyre like this is how you do it:

#

im like okay i dont know what any of that means lol

#

i dont have that background

#

but i can read code and understand that so i have to study actual implementation logic

lusty coral
oblique belfry
#

Has anyone used autoencoders to great effect before? I am thinking about using it in a signal processing problem, but all the hoopla I see about it is with MNIST examples.

There are a lot of academic papers using them, but I am not so sure if they work well in the real world.

velvet thorn
#

@delicate carbon that's entropy

#

basically, you want to create rules that allow you to split a dataset into subsets where each subset has "similar" values of the variable you're trying to predict

#

higher "similarity" is lower entropy
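To make the entropy idea concrete, here is a small sketch in plain Python. The function names and the toy labels are assumptions for illustration; a decision tree library would do this internally when it scores candidate splits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def split_entropy(left, right):
    """Size-weighted average entropy of the two subsets produced by a split."""
    total = len(left) + len(right)
    return len(left) / total * entropy(left) + len(right) / total * entropy(right)

mixed = ["phish", "legit", "phish", "legit"]  # maximally mixed: high entropy
pure = ["phish", "phish", "phish", "phish"]   # all the same: zero entropy
```

A split that turns `mixed` into two pure subsets brings the weighted entropy down from 1 bit to 0, which is what the tree is looking for when it picks a rule.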

compact bluff
#

I have a csv with the following columns: update_rule, K, N, L, attack, time_taken, and eve_score (where attack can be "none" or "geometric"). I want to do two things:

  • average time_taken and eve_score for each row with the same update_rule, K, N, L, and attack
  • get rid of the attack column and replace eve_score with eve_score_no_attack and eve_score_geometric

at the end these should be the columns: update_rule, K, N, L, time_taken, eve_score, eve_score_no_attack, eve_score_geometric. does anyone know how I can do this? I've been looking at the pandas api but I don't really care about which library I use

velvet thorn
#

that's basically a groupby mean

compact bluff
#

I don't see how it can be used in this case. can you explain a little more?

velvet thorn
#

do you know how groupby works?

#

if I understand you correctly, what you want to do is df.groupby(['update_rule', 'K', 'N', 'L', 'attack']).mean() (for the first part)

compact bluff
#

ahhh that probably will do what I want

#

do you know how to approach the second part?

velvet thorn
#

I'm not really sure what you want there

compact bluff
#

attack isn't numeric which makes it a little harder to work with in analysis. I want to get rid of that column. the start of my processed csv is currently:

update_rule,K,N,L,attack,time_taken,eve_score
anti_hebbian,4.0,4.0,4.0,geometric,12.9164006,79.53125
anti_hebbian,4.0,4.0,4.0,none,11.625815399999999,78.90625

I want it to be:

update_rule,K,N,L,time_taken,eve_score_no_attack,eve_score_geometric
anti_hebbian,4.0,4.0,4.0,12.271108,78.90625,79.53125

note that the updated time_taken comes from the average of the two previous time takens

velvet thorn
#

if I understand correctly, you want to groupby attack then

#

and take the mean

#

and then join onto the result of the first part
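One possible way to do both parts, sketched with the two sample rows from the question. It uses `pivot_table` to spread `eve_score` across the attack values rather than a second groupby, which is a swapped-in variation on the suggestion above; the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "update_rule": ["anti_hebbian", "anti_hebbian"],
    "K": [4.0, 4.0], "N": [4.0, 4.0], "L": [4.0, 4.0],
    "attack": ["geometric", "none"],
    "time_taken": [12.9164006, 11.625815399999999],
    "eve_score": [79.53125, 78.90625],
})
keys = ["update_rule", "K", "N", "L"]

# Part 1: mean time per parameter combination, across both attack types.
time_mean = df.groupby(keys)["time_taken"].mean()

# Part 2: one eve_score column per attack value (mean over repeated runs).
scores = df.pivot_table(index=keys, columns="attack", values="eve_score", aggfunc="mean")
scores = scores.rename(columns={"none": "eve_score_no_attack",
                                "geometric": "eve_score_geometric"})

# Join the two pieces on the shared index and flatten back to ordinary columns.
result = scores.join(time_mean).reset_index()
result.columns.name = None
```

The `attack` column disappears because its values have become column names, so everything left in `result` is numeric apart from `update_rule`.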

wind marlin
#

Is there a way to use Kite for Colab?

tawdry rose
#

im new to pandas

#

where can i find

#

sample datas

oblique belfry
#

datasets?

#

Try Kaggle.

tawdry rose
#

hm okay

#

what is difference between ML and deep learning

desert oar
#

@tawdry rose deep learning is machine learning with certain kinds of "deep" neural networks

tawdry rose
#

hm thanks 🙂

sand gyro
#

In my code I am converting a csv file into pandas then I am adjusting the data in the columns
I have created two excel workbooks before the if statement
I have two issues in my code the first issue is I have tried a number of solutions but I cannot add the dataframe data in the excel sheet properly according to the if statement. The if statement sets conditions to add the pandas data rows to the accepted contacts otherwise the rows not passing the condition should be added to the rejected contacts workbooks. What am I getting is in the accepting contacts the full contacts are looped in the workbook and are added once into the rejected contacts

for column, row in df.iterrows():
  row_check = row.isna()
  if (not row_check[0] and not row_check[1]) and ((not row_check[2] and not row_check[3]) or (not row_check[2]) and (not row_check[6]) or (not row_check[16]) and (not row_check[12] and not row_check[13] and not row_check[14] and row_check[15]) or (not row_check[8] and not row_check[9] and not row_check[10] and not row_check[11] and (not row_check[7] or not row_check[17]))):
       for r in dataframe_to_rows(df, index=False, header=False):
           ws.append(r)
else:
       for r in dataframe_to_rows(df, index=False, header=False):
           ws2.append(r)

wb.save("Accepted Contacts.xlsx")
wb2.save("Rejected Contacts.xlsx")
velvet thorn
#

that...does not look good.

#

iterrows is generally a dodgy way to do things
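A sketch of the usual alternative to `iterrows` here: build one boolean mask for the whole frame and split on it. The pasted loop also appends the entire `df` on every iteration instead of the current row, which would explain the duplicated contacts. The column names, the sample data and the simplified acceptance rule below are all assumptions; the real condition has more clauses.

```python
import pandas as pd

# Invented contacts; None marks a missing value.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", None, "Dee"],
    "last_name": ["Lee", "Ray", "Poe", "Kim"],
    "email": ["a@x.org", None, "c@x.org", None],
    "phone": ["555-0100", None, "555-0102", "555-0103"],
})

present = df.notna()
# Simplified rule: both names present and at least one way to reach the person.
accept = present["first_name"] & present["last_name"] & (
    present["email"] | present["phone"]
)

accepted = df[accept]
rejected = df[~accept]

# Each frame can then be written out in one call (needs an Excel engine
# such as openpyxl installed):
# accepted.to_excel("Accepted Contacts.xlsx", index=False)
# rejected.to_excel("Rejected Contacts.xlsx", index=False)
```

Naming the columns in the mask also makes the rule much easier to read than positional checks like `row_check[16]`.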

granite sierra
#

some theory help.

I am creating a neural net, with a softmax activation function as the output. If I'm doing cross entropy, do I have to one-hot encode the output of the softmax before doing cross entropy? If I do have to one-hot encode, how does one go about doing this?

#

or does it come out of the softmax as a one-hot encoded vector?

silent swan
#

cross entropy is computed between distributions

#

in practice, your labels will be a length-N vector of class IDs

#

and your predictions will be an NxK matrix of either prediction probabilities or logits

granite sierra
#

ok and the output of the softmax will be the x_pred in the cross entropy

#

with x_actual being the comparison array, right?

#

correct?

silent swan
#

that sounds correct
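A minimal NumPy sketch of that setup: labels as a length-N vector of class IDs, predictions as an NxK matrix of softmax probabilities. The function names and the example logits are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    n = probs.shape[0]
    picked = probs[np.arange(n), labels]  # probability of the true class per row
    return -np.log(picked).mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])   # N=2 samples, K=3 classes
labels = np.array([0, 1])              # class IDs, not one-hot

probs = softmax(logits)
loss = cross_entropy(probs, labels)
```

The softmax output is not one-hot: each row is a probability distribution, and the loss only looks at the probability given to the correct class.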

granite sierra
#

a little more theory help

#

@silent swan to update the weights on the backpropagation, do I take the derivative of the activation function (softmax in this scenario) or do I take the derivative of the cost function (cross entropy in this scenario).

velvet thorn
#

the latter.

#

well, eventually the former too

#

in terms of backpropagation, there is no difference between an "actual" layer and an activation layer

granite sierra
#

when would the former be needed?

velvet thorn
#

well, you need to propagate backwards throughout the network, right?

granite sierra
#

yea

velvet thorn
#

you can think of each layer as applying a function to the previous layer's output

#

starting with the input

#

that's the forward pass.

#

in this case, the last layer is the softmax

#

the result of that then goes into calculating the loss with reference to the ground truth

#

as well as the gradient

#

so now you have loss and gradient

#

you propagate the gradient backwards through the last layer (there are no weights to update there)

#

but nevertheless you must apply the chain rule so the gradient's magnitude is correct

#

your second last layer is probably a FC layer or something

#

and that has weights to update

jolly briar
#

i think i've seen this used as a concrete example for back-propagation before

velvet thorn
#

that one is great

#

I've seen it

jolly briar
#

👌

granite sierra
#

Ok thanks, I'll take a look at that, FC layer?
I thought you started the back prop from output backwards and not from the input layer again

velvet thorn
#

fully connected

#

yes, that is correct

#

that's why it's backpropagation

granite sierra
#

ok, sorry I got confused when you said you start from the input, all good

velvet thorn
#

the point is using the chain rule to get the derivative of the loss function with respect to the current layer, right?

granite sierra
#

yea

#

but surely by that logic, you would need the derivative of the softmax?

#

since that would be the "current layer" when starting the backprop

velvet thorn
#

eventually

#

but remember

#

the value of the loss function is derived from the output of the softmax

#

so the very first gradient you can calculate is of the loss with respect to the softmax output

#

is that what you mean?

granite sierra
#

yea

velvet thorn
#

yeah, then you go backwards

granite sierra
#

so output of softmax is used to calculate the cross entropy, and then I take the derivative of the CE to get the gradient, I compare the previous weights. Do i get the derivative of the sigmoid in the hidden layer then

velvet thorn
#

yes, unless you don't want to train it

granite sierra
#

whats the point of that then xD

velvet thorn
#

frozen layers?

granite sierra
#

oh yea sure

#

I only have 1 hidden layer though

#

I won't have frozen layers anyway

#

Ok I'll try and figure it out

#

so far what I've found is I struggle more with the python implementation than the math/theory

hexed rampart
#

i was running my neural network and i got a DSATray.exe error talking about a "gaurd page" and the ide said this: Process finished with exit code -1073740791 (0xC0000409). What happened?

granite sierra
#

@velvet thorn my issue with that article is that the example it gives, it uses squared error and sigmoid in a multi class classification, and the code/math is much simpler for both of those over softmax with cross entropy

granite sierra
#

For cross entropy, where do I get the actual probabilities? Is that just the target value? Because the target value won't be a vector of the same size as the predicted probabilities

velvet thorn
#

yes

#

a perfect classifier would predict 1.0 for the actual value

#

and 0.0 for everything else

granite sierra
#

oh

#

so for example if I have 5 options, and the target value is num_1, the vector will be [1, 0, 0, 0, 0]

velvet thorn
#

1.0, 0.0... actually

#

but not that important

#

wait, do you mean the target

#

or the prediction?

granite sierra
#

target

velvet thorn
#

yes

#

then int

granite sierra
#

ok how would I get the target value as vector, like is this where one-hot encoding comes in?

velvet thorn
#

uh

#

you're rolling

#

your own NN right

#

then yes
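For a hand-rolled network, one-hot encoding the integer targets takes only a few lines of NumPy. A sketch, with the function name as an assumption:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn a vector of integer class IDs into an (N, num_classes) 0/1 matrix."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Five options; the first sample's target is class 0, the second's is class 3.
targets = one_hot(np.array([0, 3]), num_classes=5)
```

Each row of `targets` then has the same length as a row of softmax output, so the two can be compared directly in the cross entropy.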

granite sierra
#

yea

#

oki got it, I feel stupid af lmao

#

gm, question, so if I wanted to optimise the gradient descent, is there a way I could use adam without having to code it myself. would that entail wrapping my model in scikits learn wrapper

velvet thorn
#

uh...well.

#

not sure how you want to do that

#

it's not that hard to code though!
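For a sense of scale, here is a compact sketch of the Adam update rule in NumPy, tried out on a toy quadratic. The class layout and the toy problem are assumptions; the default hyperparameters are the commonly quoted ones.

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer for a single parameter array."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients
        self.v = None  # running mean of squared gradients
        self.t = 0     # step counter, used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * grads ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # correct the zero-init bias
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy problem: minimise sum((w - 3)^2), whose gradient is 2 * (w - 3).
w = np.array([0.0, 10.0])
optimizer = Adam(lr=0.1)
for _ in range(500):
    w = optimizer.step(w, 2 * (w - 3))
```

In a real network there would be one `Adam` instance (or one pair of `m`/`v` arrays) per weight matrix, fed with the gradients from backpropagation.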

granite sierra
#

fair, maybe I'll just stick with sgd, it doesnt have to be a good classifier, just a classifier

tawdry rose
#

how can i plot correlation heatmap

#

without seaborn

#

with matplotlib

worn stratus
#

It might be worth having a look at seaborn's source code - no idea how readable or not it is

tawdry rose
#

btw can i show correlations with any other plot

worn stratus
#

I've implemented a simple NN with one hidden layer, and I think it works, but I'm not sure. I'm just doing the most basic sigmoid/sigmoid_prime stuff - and it doesn't handle xor, should it?

silent swan
#

just compute a correlation matrix, and then use plt.imshow
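A sketch of that suggestion using only pandas and matplotlib, on randomly generated data (the column names and values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: three independent columns plus one that tracks column "a".
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
df["d"] = df["a"] * 2 + rng.normal(scale=0.1, size=100)

corr = df.corr()  # Pearson correlation matrix

fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax, label="Pearson correlation")
# fig.savefig("correlation_heatmap.png")  # or plt.show() in an interactive session
```

Fixing `vmin` and `vmax` to -1 and 1 keeps the colour scale comparable between plots.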

granite sierra
#

I have a question, this guy calculates cross entropy, but doesn't use it anywhere in his backprop.

https://beckernick.github.io/neural-network-scratch/

Is that wrong or did he get around it somehow that I am unable to see? I thought the point of the cost function was to see how far off the weights were, but how is he training if he isnt using the cost function values

velvet thorn
#

@worn stratus if you have a hidden layer, it should be able to handle nonlinearities

worn stratus
#

Yeah, it turns out it does/did - but I was just bad and read 0.05 as 0.5

velvet thorn
#

@granite sierra doesn't he do that under "Implementing Backpropagation"

granite sierra
#

well he does something, but he doesnt use the cost function value

#

he literally only uses the cost function value for printing the current loss

#

@velvet thorn do you understand what he is doing, is it correct?

velvet thorn
#

but

#

why do you say that?

#
output_error_signal = (output_probs - training_labels) / output_probs.shape[0]

error_signal_hidden = np.dot(output_error_signal, layer2_weights_array.T)
error_signal_hidden[hidden_layer <= 0] = 0
granite sierra
#

sure but thats not cross entropy?

#

he has a cross entropy function higher up that he calculates loss

velvet thorn
#

oh

#

I see your confusion

#

this is why naming functions well is important

#

cross_entropy_softmax_loss_array

#

you mean this, right?

#

that calculates the "average" loss

granite sierra
#

yea I think so, let me check

velvet thorn
#

but output_error_signal is what you actually use for backpropagation

#

because remember

granite sierra
#

oh

velvet thorn
#

when you apply the loss function to the predictions and target

#

you should get one value per pair
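The reason the loss value never shows up in the backward pass is that, for softmax followed by cross entropy, the gradient of the mean loss with respect to the logits simplifies to `(probs - one_hot_targets) / N`. That is the same form as `output_error_signal`. A sketch that checks the shortcut against finite differences, with invented numbers and helper names:

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(logits, one_hot_targets):
    probs = softmax(logits)
    return -(one_hot_targets * np.log(probs)).sum(axis=1).mean()

def analytic_gradient(logits, one_hot_targets):
    """Closed-form gradient of the mean loss with respect to the logits."""
    return (softmax(logits) - one_hot_targets) / logits.shape[0]

def numeric_gradient(logits, one_hot_targets, eps=1e-6):
    """Central finite differences, one logit at a time."""
    grad = np.zeros_like(logits)
    for idx in np.ndindex(*logits.shape):
        up, down = logits.copy(), logits.copy()
        up[idx] += eps
        down[idx] -= eps
        grad[idx] = (mean_cross_entropy(up, one_hot_targets)
                     - mean_cross_entropy(down, one_hot_targets)) / (2 * eps)
    return grad

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
one_hot_targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

analytic = analytic_gradient(logits, one_hot_targets)
numeric = numeric_gradient(logits, one_hot_targets)
```

So the cross entropy function is only needed for monitoring; its derivative is what actually drives the weight updates.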

#

pretty bad naming TBH

#

and that's old code

#

maybe I should do a blog on this

granite sierra
#

why is it old? (might be a dumb question)

velvet thorn
#

I thought there were more resources available

#

because it uses xrange

#

which is a Python 2 function

#

not a dumb question

granite sierra
#

oh sure

#

yea but the rest of the code seems relatively ok

#

minus the bad naming

velvet thorn
#

I only briefly glanced through it

granite sierra
#

so he uses loss function just to calculate the average loss and prints that out every epoch

velvet thorn
#

I just woke up so

#

reading code is difficult

granite sierra
#

which side of the world you on, east or west

#

thanks for the help though

velvet thorn
#

well

#

I suppose

#

that depends on how you rotate the globe, right? pithink

#

but east

granite sierra
#

Fair, thanks for help, muchacho gracias

tardy pasture
#

Anyone has experience with using Gaussian process for regression problems?

#

sorry if i posted this in the channel

lapis sequoia
#

!ask

arctic wedgeBOT
#
ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

lapis sequoia
#

the most elaborate 0 to DL approach covering theory, math and code https://d2l.ai/

#

^ can some mod pin this

tardy pasture
#

Thank you very much

native stag
native stag
#

hours upon hours of resources

granite sierra
#

Hey guys, so I have a neural net, multiclass, if I have 3 outputs and 5 hiddens, my softmax is returning 3x5, so like [0.15911229 0.66902292 0.17186479] but 5x

#

so it gives this

#

[[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]]

#

does anyone know why it might do this?

granite sierra
#

nvm all good, I fixed it

haughty storm
#

How can I make my for loops closer to O(logN) efficiency?

worn stratus
#

Replace them with a binary search?
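To spell out the binary search idea: it only applies when the loop is searching sorted data, in which case the standard library's `bisect` module does the work. A sketch with hypothetical function names:

```python
import bisect

def contains_linear(sorted_values, target):
    """O(N): look at every element until a match turns up."""
    for value in sorted_values:
        if value == target:
            return True
    return False

def contains_binary(sorted_values, target):
    """O(log N): halve the search range each step; requires sorted input."""
    index = bisect.bisect_left(sorted_values, target)
    return index < len(sorted_values) and sorted_values[index] == target

values = list(range(0, 1_000_000, 2))  # sorted even numbers
```

A loop that has to touch every element (summing, transforming) cannot go below O(N); the gain comes from restructuring the problem so most elements never need to be visited.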

haughty storm
#

ok

#

sure

#

i mean what are good approaches to doin that

uncut shadow
#

Hey. I have 2 questions which I don't quite understand:

  1. How to turn words and text into code machine can understand (for chatbot, so bot can create a new meaningful sentence).
  2. After processing etc. the network will return a number. How can I turn this number into a word again?
#

I'm trying to make a deep learning chatbot, but these 2 things are impossible for me to solve

velvet thorn
#

if you don’t know how to do those

#

you probably need to do a lot of work.

#

a lot.

#

look up tokenisation and embedding @uncut shadow

#

that’s a good start
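
As a rough sketch of the two questions above: tokenisation splits text into words, a vocabulary maps each word to an integer id the network can consume, and the reverse mapping turns a predicted id back into a word. The toy corpus, the whitespace tokeniser, and the helper names below are all assumptions for illustration; a real chatbot would use a library tokeniser and learned embeddings on top of these ids.

```python
corpus = ["the cat sat on the mat", "the dog sat on the rug"]  # toy data


def tokenize(text):
    """Naive tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


# Build word -> id, reserving id 0 for words never seen in the corpus.
word_to_id = {"<unk>": 0}
for sentence in corpus:
    for token in tokenize(sentence):
        word_to_id.setdefault(token, len(word_to_id))

# Reverse mapping: id -> word, used to turn network output back into text.
id_to_word = {i: word for word, i in word_to_id.items()}


def encode(text):
    """Text -> list of integer ids."""
    return [word_to_id.get(token, 0) for token in tokenize(text)]


def decode(ids):
    """List of integer ids -> text."""
    return " ".join(id_to_word[i] for i in ids)


ids = encode("the dog sat on the mat")
print(ids)
print(decode(ids))
```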

tardy pasture
#

How do I use a GaussianProcessRegressor() for time series forecasting?
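
A minimal sketch of one way to approach this with scikit-learn, assuming the time index is the single input feature and the series is a made-up noisy sine wave. The kernel (a periodic `ExpSineSquared` term plus `WhiteKernel` noise) is an assumption that suits this toy data, not a general recommendation. Forecasting here just means predicting at time points past the end of the training range.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

# Hypothetical series: 40 time steps of a noisy sine with period 10.
rng = np.random.default_rng(0)
t_train = np.arange(0, 40, dtype=float).reshape(-1, 1)
y_train = np.sin(2 * np.pi * t_train.ravel() / 10) + 0.1 * rng.standard_normal(40)

# Periodic structure plus observation noise; hyperparameters are refined during fit.
kernel = ExpSineSquared(length_scale=1.0, periodicity=10.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(t_train, y_train)

# Forecast the next 10 steps, with a per-point standard deviation.
t_future = np.arange(40, 50, dtype=float).reshape(-1, 1)
mean, std = gpr.predict(t_future, return_std=True)
print(mean.shape, std.shape)
```

The returned standard deviation is the main reason to pick a Gaussian process here: it gives an uncertainty band around each forecast point.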

drowsy grove
#

Hi guys, I am wondering if I can ask a simple pandas question

#

Right now I have this small sample dataset with order details:

#

My train of thought is: since it's a given that it has been processed, I will begin with all the orders with COMPLETED status.

#

I can see by eye that there is only one row with Inactive status. But to be accurate, I still use groupby(STATUS) on the result of the last step to show the anomaly.

#

In total I got:
c = df_merge[df_merge.ORDER_STATUS_CD == 'S']
c.groupby(['STATUS'])['ORDER_ID'].size()
df_merge[(df_merge.STATUS == 'Inactive') & (df_merge.ORDER_STATUS_CD == 'S')]

#

I feel like this is way too verbose and I have to use my eye to identify which groupby result group ends up being 1. I am wondering if there is a way I can directly utilize the fact that there will be 1 group with a size of 1 in the 2nd step. And directly get that row I need from there.

#

Instead of checking the result, thinking 'Oh, I should focus on this group as its size is 1', and then using the 3rd line of code to finally get the result.

#

Hope I made myself clear. Sorry for being verbose here.

#

Thanks.
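
One way to skip the eyeballing step, sketched under assumptions: the frame below is made-up stand-in data that only reuses the column names from the code above (`ORDER_ID`, `STATUS`, `ORDER_STATUS_CD`). `groupby(...).transform("size")` attaches each row's group size to the row itself, so the rows belonging to the size-1 group can be selected directly.

```python
import pandas as pd

# Hypothetical stand-in for df_merge; the real data has more columns.
df_merge = pd.DataFrame({
    "ORDER_ID": [101, 102, 103, 104, 105],
    "STATUS": ["Active", "Active", "Inactive", "Active", "Inactive"],
    "ORDER_STATUS_CD": ["S", "S", "S", "P", "P"],
})

# Step 1: keep only the orders whose status code is 'S'.
completed = df_merge[df_merge["ORDER_STATUS_CD"] == "S"]

# Step 2: size of each row's STATUS group, aligned to the rows of `completed`.
group_size = completed.groupby("STATUS")["ORDER_ID"].transform("size")

# Step 3: keep the rows whose group has exactly one member.
anomaly = completed[group_size == 1]
print(anomaly)
```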

lapis sequoia
#

@drowsy grove expected input and desired output please.. rephrase your question

#

how do you define an order as being 'processed by mistake'

uncut shadow
#

Well, I know what word tokenization and embedding are, but (iirc) it only allows matching 2 words, so for example alice, wonderland.

lapis sequoia
#

what do you mean.. elaborate

#

what do you think embeddings are?

uncut shadow
#

I mean, I'm thinking about how to make a chatbot in Python which would create its own sentences based on sentences it has seen. The problem is how to make sentences. It can check if 2 words have the same (or close) meaning, but idk if it's going to work with sentences.

lapis sequoia
#

then you don't understand what embeddings are

#

let's take a look at two sentences, and tell me what you observe from them

#

I waited at the signal

#

vs

#

He gave me a signal

#

what's different here?

uncut shadow
#

Well, everything but signal is different

#

And the meaning of signal is technically the same in both sentences

lapis sequoia
#

the semantics changed depending on the context

#

embeddings are the same way.. they are a representation

#

and some embeddings are context aware.. so the word signal when converted to an embedding is not the same and changes with the context

#

you can hence arrive at sentence level context aware embeddings

#

understand?

#

Your objective is to have your chatbot make responses.. one similar system that makes responses based on context is widely called Question Answering..

#

Gmail smart reply for example

#

And when you're using something like Gmail's smart compose.. that's something called NLG, that uses context in realtime to update its predictions... returning the most semantically similar prediction based on its trained corpus

#

@uncut shadow andddd you disappeared

uncut shadow
#

@lapis sequoia I'm back :P

lapis sequoia
#

read and ask questions if you have any

uncut shadow
#

Ok, I'll look into it and I'll try to google something, thanks!

lapis sequoia
#

semantic similarity with sentence level embeddings is the concept I just told you about

pearl kiln
#

any machine learning tutorial on youtube

#

?????

#

good

drowsy grove
#

@lapis sequoia My bad. Reading my question again, it does seem that the basic desired output is missing.

#

Input is the 15rows * 11columns dataframe.

#

Desired output is the record/row of the order that was 'processed by mistake'.

lapis sequoia
#

that's fine..

#

but how do you define a row as having been processed by mistake

drowsy grove
#

As for what is defined as 'processed by mistake', that is a very good question. I honestly do not know. The given information is that there is indeed one order in this dataframe that is processed by mistake.

lapis sequoia
#

that's your filter condition

drowsy grove
#

Exactly.

lapis sequoia
#

ohk.. so you don't know

drowsy grove
#

So I've been trying to deduce from the given.

lapis sequoia
#

show me the df.. let's see..

#

and is this homework?

drowsy grove
#

Exactly.

#

Nope...like a small project haha.

#

It's a merged dataframe.

#

Oops, that looks awful.

lapis sequoia
#

of course it does...

drowsy grove
lapis sequoia
#

what does active inactive mean

drowsy grove
#

That's exactly where I got stumped.

#

What I was trying to do was find one record that is an anomaly.

#

And that must be the one.

#

Basically I'm deducing my filter condition from the result.

lapis sequoia
#

let's try to be less verbose.. and more objective

drowsy grove
#

Since there's only 1 inactive record that shows "COMPLETED". I thought it must be it.

#

That's a huge problem of mine. Verbosity.

#

IMO, 'processed by mistake' consists of 2 parts: 'processed' and 'by mistake'

lapis sequoia
#

let's see what can be your possible conditions based on the fact that you don't know the information about the fields you have

drowsy grove
#

Sure!

lapis sequoia
#

im just waiting for you to let me

drowsy grove
#

My bad. Go on.

lapis sequoia
#
  1. Status of User Inactive, but Order Status CD shows as S
#
  2. Order Status Cancelled but Order Status CD shows as S or P
#
  3. No order date or No Order ID but order status CD shows as P or S
#

that's it

#

from what I can gather from this

drowsy grove
#

Makes total sense. I failed to see the possibility of 2 and 3.

lapis sequoia
#

it helps to not get in your own way

#

looking at a problem requires breaking it down into manageable pieces and solving the parts

drowsy grove
#

You are right. Lemme look at the data again.

#

So I shall create a filter for each situation and then compare the results?

jolly briar
#

i've a script that i wanted to call using a bash alias, as
alias s='path/to/script.py'

the issue is that the script requires some libraries that aren't in my base python install and it also calls some folders within the project (from dir import constants) that i think aren't working when trying to use it as an alias like this.

what's the best approach here?

lapis sequoia
#

hmm I dunno man.. maybe head to #unix

jolly briar
#

think it's a python question

#

seems like it'd be a pretty common pattern / problem

lapis sequoia
#

I know.. but you're talking about a bash alias.. seems like they'd be able to help.. or you can ask in a help channel

jolly briar
#

don't have a specific question - this channel seems ok... i don't think a bash alias is particularly exotic

#

I've just realised that I'm in the #data-science-and-ml channel not the general python channel 🤦, sorry

drowsy grove
#

@lapis sequoia BTW, I ran the queries for all 3 scenarios and only the first scenario returned 1 row.

#

I think now I can safely assume that 1 is the filter condition for 'processed by mistake' right?

#

Thanks. This looks also doable with SQL.

raven glacier
#

Anyone with experience in learning Spark from scratch via PySpark?

drowsy grove
#

Regarding exploratory data analysis: What can be done in the beginning when analyzing a dataset?

#

I can think of checking dimensions, missing values.

#

Run a df.info(), plot some visualization. Anything else?
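
A few more first-pass checks beyond shape and missing values, as a sketch on a made-up frame (the column names are placeholders): summary statistics, duplicate rows, per-column cardinality, and value counts for categorical columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one duplicated row.
df = pd.DataFrame({
    "price": [9.5, 12.0, np.nan, 12.0, 30.0],
    "category": ["a", "b", "b", "b", "c"],
})

print(df.shape)                  # dimensions
df.info()                        # dtypes and non-null counts
missing = df.isna().sum()        # missing values per column
summary = df.describe()          # count/mean/std/quantiles for numeric columns
n_duplicates = df.duplicated().sum()          # fully duplicated rows
cardinality = df.nunique()                    # distinct values per column
category_counts = df["category"].value_counts()  # frequency of each category

print(missing, summary, n_duplicates, cardinality, category_counts, sep="\n\n")
```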

brisk swallow
#

Using Pandas and Numpy at the moment

velvet thorn
#

@raven glacier me

raven glacier
velvet thorn
#

well

#

if you like books, maybe? I didn't learn from a book

#

do you already have experience with pandas?

raven glacier
#

Very basic knowledge

#

Where did you learn from

velvet thorn
#

I was already experienced with pandas

#

the concepts are more or less the same

#

so I just started working with it

#

the main difference is SQL-like queries

#

and the ML model

raven glacier
#

okay so you just learnt as you went along by googling etc

velvet thorn
#

more or less, yeah

#

I knew enough to be able to do work

#

that was enough

raven glacier
#

fair enough mate

uncut shadow
#

Do you think making my own models from scratch (for example, only with NumPy and Matplotlib) makes sense? I want to jump into machine learning and things like deep learning so do you think implementing my own models from scratch makes sense?

velvet thorn
#

yes

#

well, I would do it (have done it)

jolly briar
#
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import pandas_datareader as pdr
plt.show()
plt.close("all")
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ts.plot()

the above isn't plotting, using ipython in a terminal (osx), i'm expecting a popup plot but am not getting one, anything obvious I'm missing?

#

ok, plt.show() just has to be at the end it seems 🙃

#

this seems kinda verbose, if i'm running ts.plot() in an interactive session it seems that i'm going to want to view it immediately, rather than having to enter plt.show() each time after

drowsy grove
#

I was asked to describe "how to make a cup of tea to people who know nothing about it" today in a big data analyst interview.

jolly briar
#

3 to 4 minutes, don't squeeze the bag

drowsy grove
#

I was a little flabbergasted but I tried my best to break it down into steps.

#

I know there isn't a correct answer. But what shall I specifically emphasize when answering questions like this.

#

@jolly briar they didn't say it comes in bags

jolly briar
#

you can make reasonable assumptions i'm sure

drowsy grove
#

But in my limited experience, loose tea, tea bags, procedure is the same haha.

jolly briar
#

well, depends on the tea for temp and stuff but sure... i would assume that they're just seeing a problem broken down into steps here tho

drowsy grove
#

That's what I thought. Still, a fun question and thought I would share.

#

They made me choose

#

Either this or how to get leaves off my lawn.

jolly briar
#

unless they were waggling their cup at you when they asked, maybe it was a hint

drowsy grove
#

I never used leaf blower once so I chose

#

Haha, it would be. But this was on the phone so I wouldn't know.

jolly briar
#

hope it went ok 🙏

drowsy grove
#

@jolly briar what does plt.close("all") do in your code?

#

Thanks man!

jolly briar
#

closes all the plots i reckon, i've not used matplotlib for ages so it's all a mystery... ggplot > matplotlib 😤

drowsy grove
#

Ugh, I just do df.groupby()....plot() usually

#

Would I just be using the plot function of pandas?

jolly briar
#

yea but it's calling matplotlib anyway afaik

drowsy grove
#

Gotcha. Thanks. I learned from Brandon Rhodes's talk.

#

Most people on Kaggle use sns. And it does make a difference I have to admit.

jolly briar
#

matplotlib is rank in my opinion

drowsy grove
#

rank as in stink?

jolly briar
#

yes, i don't like it... i always feel as though i'm having to do more than is necessary when i use it, feels too low level or whatever. I think seaborn helps there, but gg ( for me ) is really nice. Building things up with pipes in dplyr, passing to gg, facet wraps etc, are all quite nice

silent swan
#

I like matplotlib because it gives me explicit control over everything

jolly briar
#

yeah i can get that, i've never really found i need it tho

drowsy grove
#

I need to play with gg and see what it does

#

What is facet wrap? @jolly briar

jolly briar
#

it makes great plots in a pretty logical manner...
facet wrap enables you to create a grid of plots easily, like facet_wrap(sex ~ .) will create a grid by sex

silent swan
#

I've had to make very many different kinds of plots, so sooner or later any premade abstraction fails for me