#data-science-and-ml | Python | Page 213

oblique belfry Dec 24, 2019, 9:04 PM

#

No. I’m in the same boat. I want a tad bit more structure at my workplace.

vital cipher Dec 25, 2019, 12:52 PM

#

guys want inputs as in material, things to learn.... on cyber security threat analysis-- like where to start.....or any suggestions

#

was looking into materials from https://medium.com/cyberdefenders/python-for-cybersecurity-lesson-4-network-traffic-analysis-with-python-6321f4c9d3f7
but this does not share any insights on threat analysis.....

#

do share your inputs

#

🙂

somber kayak Dec 26, 2019, 12:05 AM

#

I would hop on over to #cybersecurity @vital cipher

torn garden Dec 26, 2019, 10:57 PM

#

@drifting hemlock that was a lot of words for saying basically nothing

clear jetty Dec 26, 2019, 11:52 PM

#

@torn garden Seems to me @drifting hemlock is talking about how to get data science work done when you're dealing with people on your team with very little (if any) technical knowledge? Or am I reading that wrong?

drifting hemlock Dec 27, 2019, 1:57 AM

#

@torn garden @clear jetty I guess I did a bad job of explaining myself. My point is that, in the workplace, it's very difficult to structure a data science team within an organization and provide them with the tools and the databanks they need to concentrate on the tasks they are assigned, and furthermore, to provide them with the structure to actually deploy machine learning models that work for the business.

#

This is a problem that you don't actually read about in the myriad of blogs out there. For example, if we are assigned a simple binary classification task and we do it as a hobby, then it's very simple: clean the data set, train a model, deploy a simple Flask server to act as an API.

Is it that simple in a work environment? Hardly, even less so if you're working with a team. First, you don't get a dataset as in Kaggle; you have to create it from scratch using multiple sources. Then you have to document the process, because you have to inform the stakeholders how the model works and why it predicted a certain value, or you'll risk losing credibility. You also need a cloud service to work with your teammates, because using only GitHub will not be enough. And you need a whole pipeline to retrain your model every now and then. Not to mention a lot of things in between to ensure that something can be deployed within the organization.

#

If the last paragraph was confusing, that's because it really is that confusing to build and manage a data science team. We don't really have a platform that makes this process a lot easier, and if there is one, well, let me know, that'd be great!

scarlet night Dec 27, 2019, 2:24 AM

#

I'm participating in a data science course, and I made this visualization to address the question of "what is the correlation between Carbon Monoxide concentration and air quality?"

From the more professional data scientists here, is there any advice on things to improve about the graph?

📎 trial_fig.png

drifting hemlock Dec 27, 2019, 2:37 AM

#

@scarlet night Well I'm not the best when it comes to visualizations but usually there are a couple questions that you can ask yourself to improve them:

Who is your audience? If you define how technical your audience is and how much domain language they have then you can either simplify or add more relevant details to it. If it's a broader audience you might even want to add a caption that summarizes what you are trying to say.
What problem (or benefit) are you trying to pinpoint in this visualization? If it's an all-time low, then you might want to add a label with the value of the highest or lowest point. I know that we have it at the left/right side of it, but it can be hard to read.

#

Hope you nail that course btw :)

#

Oh and I forgot to say, it looks great to me! I just wanted to give some advice.

scarlet night Dec 27, 2019, 2:41 AM

#

Thanks, I've had the wisdom of the average people by asking people what they think in other Discord servers

The audience is between the academic people and the average person, as while having specific and more technical terms and data, its insights and results apply to everyone: pollutants in our air make the air quality worse

small nebula Dec 27, 2019, 3:26 AM

#

im fairly new to data science, and im trying to get started with tensorflow. does anyone have a moment to help me get started?

silent swan Dec 27, 2019, 3:27 AM

#

@drifting hemlock I don't think there's going to be a single solution to everything you mentioned. Data Science is (almost intentionally) an incredibly broad term and covers a ton of different contexts/paradigms, and I think no one system is going to be able to streamline all the different use cases and workflows. So it's not surprising to me that you've not found much meaningful to that end. I think everyone just patches together some workflow that works well for them and their particular problem and sticks with it.

#

@small nebula what're you trying to do, and is there a reason why tensorflow over pytorch?

small nebula Dec 27, 2019, 3:28 AM

#

well tensorflow is the first thing that came up on my google search, a good while ago. been playing around a bit with the tutorials and haven't really checked out pytorch yet

silent swan Dec 27, 2019, 3:29 AM

#

I am quite biased, but unless you have a strong reason to use TensorFlow (the specific model you want is in TF, or you need to deploy models for business usage), PyTorch would be more recommended

#

e.g. if you're just learning DL for fun / curiosity, go with PyTorch

scarlet night Dec 27, 2019, 3:45 AM

#

@void anvil I find graphing it like that makes the data less meaningful, as it is much harder for people to understand the significance of a roughly 45 degree angle line compared to 2 lines graphed over time lining up
Also, graphing it over time serves the purpose of showing that this is a constant correlation, not just a day overlapping.

#

(An example of graphing it like a scatter)

📎 unknown.png

#

Though seeing it, I do see what you mean, but I don't think it helps as much

velvet thorn Dec 27, 2019, 4:14 AM

#

@silent swan why do you say that?

silent swan Dec 27, 2019, 4:15 AM

#

which statement

oblique belfry Dec 27, 2019, 4:52 AM

#

If you find TF frustrating but still want to use it, I’d recommend trying out Keras. It’s a higher level api for TF (and other platforms) that simplifies dev.

silent swan Dec 27, 2019, 4:54 AM

#

I'd recommend PyTorch over TF/Keras regardless unless you have specific reasons for needing TF

#

a higher level API can make it harder rather than easier to debug

velvet thorn Dec 27, 2019, 5:33 AM

#

I agree with that

#

the latter statement

#

so far TF/Keras have worked fine for me though, so I guess I personally see no compelling reason to try PyTorch out

scarlet night Dec 27, 2019, 5:37 AM

#

A related article talking about the differences between TF and PyTorch:
https://medium.com/@UdacityINDIA/tensorflow-or-pytorch-the-force-is-strong-with-which-one-68226bb7dab4

Medium

Tensorflow or PyTorch : The force is strong with which one?

By — Yashwardhan Jain

silent swan Dec 27, 2019, 5:44 AM

#

worth noting that PyTorch does support Tensorboard now

#

TF also has Eager mode now but from everything I've heard, it's still tedious af to use/debug

oblique belfry Dec 27, 2019, 1:32 PM

#

I have found Tensorboard to be....annoying.

I get the debugging part. I like the abstraction from the static graph.

The more I have played with Pytorch, the more I want to use it on our next work project.

drifting hemlock Dec 27, 2019, 3:52 PM

#

Does anyone work with Tableau Server or Online?

snow junco Dec 27, 2019, 5:11 PM

#

Has anyone tried this course? Is this course good for someone that is new to ds? https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/

Udemy

Learn Python for Data Structures, Algorithms & Interviews

Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

silent swan Dec 27, 2019, 7:09 PM

#

personally I've never used Tensorboard much

#

I much prefer to log my own metrics and then build my own visualizations

drifting hemlock Dec 27, 2019, 9:28 PM

#

What do you guys use to deploy your experiments?

#

I'm deciding between Flask or Django. Flask would be easier but I'm concerned about the scalability, but I don't know if Django would be an overkill for a very simple API.

drifting hemlock Dec 28, 2019, 12:17 AM

#

Well definitely I'm not a Flask expert, so I'm open to suggestions, @void anvil do you think it would lag if I set up an API that handles multiple requests?

#

It's not gonna be a TON of requests, but it would be great to have a responsive API that deals with a decent amount of requests.

rough anvil Dec 28, 2019, 12:34 AM

#

@snow junco I'm currently about 20% through the udemy course you linked, and so far it has been fantastic.

#

it isn't geared towards absolute beginners, but as long as you know the basics you'll be fine

drifting hemlock Dec 28, 2019, 1:36 AM

#

@void anvil I guess I'll try Flask then, thanks!

drifting hemlock Dec 28, 2019, 2:56 AM

#

You guys think that 25% vs 75% would count as an imbalanced dataset for a binary classification exercise?

silent swan Dec 28, 2019, 2:57 AM

#

I would say yes, but whether it's "sufficiently imbalanced" might depend on your context

drifting hemlock Dec 28, 2019, 3:05 AM

#

Yeah that's a good point

silent swan Dec 28, 2019, 3:06 AM

#

eg if you have a ton of data and the problem is simple, and you're using the right metric, 3-1 might not be a big issue

#

but if it's a hard problem and with some metrics, even 2-1 can be very bad if you don't account for it
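
To make the point about metrics concrete, here is a rough sketch using made-up labels with the 75% / 25% split from the question. Balanced accuracy is just one example of an imbalance-aware metric; it was not named in the conversation, and the "model" below is a deliberately useless one that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Made-up labels with the 75% / 25% split from the question.
y_true = [0] * 75 + [1] * 25
# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))                    # 0.75
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy makes the do-nothing model look decent, while the per-class view shows it never finds the minority class. Whether that matters depends on the context, as noted above.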

languid trail Dec 28, 2019, 9:15 AM

#

hey guys - what method would you use for scraping pages that are loaded with javascript?
PhantomJS is deprecated, PyQt4 doesn't work for me, chrome webdriver also does not work for me

#

or maybe someone can help me fix the chrome webdriver issue:

```
  File "scraper.py", line 4, in <module>
    driver = webdriver.Chrome("/root/chromedriver")
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
```

smoky pendant Dec 28, 2019, 9:17 AM

#

hm, I can give it a shot. with chrome webdriver do you mean selenium, or am I just being stupid about another tool here xD

languid trail Dec 28, 2019, 9:17 AM

#

yeah selenium

#

I downloaded the chromedriver here: https://chromedriver.storage.googleapis.com/80.0.3987.16/chromedriver_linux64.zip

smoky pendant Dec 28, 2019, 9:18 AM

#

hm, sounds interesting. it often gets the job done for me. what site, and what are you trying to get?

#

oh, just looked at the error. the driver is not starting.

languid trail Dec 28, 2019, 9:19 AM

#

for learning I want to scrape this site: https://pythonprogramming.net/parsememcparseface/ and the <p> with id "yesnojs"

Python Programming Tutorials

Python Programming tutorials from beginner to advanced on a massive variety of topics. All video and text tutorials are free.

#

```
from selenium import webdriver

url = 'https://pythonprogramming.net/parsememcparseface/'
driver = webdriver.Chrome("/root/chromedriver")
driver.get(url)
dynamicText = driver.find_element_by_id("yesnojs")
message = dynamicText.get_attribute('innerHTML')
print("Dynamic text element says: '" + message + "'." )
```

smoky pendant Dec 28, 2019, 9:20 AM

#

code looks correct, never used arguments with the driver. just put it in the working directory. but I am pretty sure it should work like you have it.

#

what chrome version are you running?

languid trail Dec 28, 2019, 9:21 AM

#

80

smoky pendant Dec 28, 2019, 9:21 AM

#

you downloaded from this site? https://chromedriver.storage.googleapis.com/index.html?path=80.0.3987.16/

languid trail Dec 28, 2019, 9:21 AM

#

yep

#

wget https://chromedriver.storage.googleapis.com/80.0.3987.16/chromedriver_linux64.zip

smoky pendant Dec 28, 2019, 9:22 AM

#

then the driver should be correct, hm.

#

(unknown error: DevToolsActivePort file doesn't exist)

languid trail Dec 28, 2019, 9:22 AM

#

https://stackoverflow.com/questions/50642308/webdriverexception-unknown-error-devtoolsactiveport-file-doesnt-exist-while-t

Stack Overflow

WebDriverException: unknown error: DevToolsActivePort file doesn't...

I am trying to launch chrome with an URL, the browser launches and it does nothing after that.

I am seeing the below error after 1 minute:

Unable to open browser with url: 'https://www.google.co...

#

lets see if adding --disable-dev-shm-usage will help 😄

smoky pendant Dec 28, 2019, 9:23 AM

#

just going to recommend that link xD, just reading over it.

#

sounds like something new in a new update

#

I started seeing this problem on Monday 2018-06-04. Our tests run each weekday. It appears that the only thing that changed was the google-chrome version (which had been updated to current) JVM and Selenium were recent versions on Linux box ( Java 1.8.0_151, selenium 3.12.0, google-chrome 67.0.3396.62, and xvfb-run).

#

wait, that says chrome 67, not 80.

#

But when we investigated further, noticed that XVFB screen doesn't started property and thats causing this error. After we fix XVFB screen, it resolved the issue.

#

also this -> https://github.com/heroku/heroku-buildpack-google-chrome/issues/46

GitHub

DevToolsActivePort file doesn't exist. · Issue #46 · heroku/hero...

I`m trying to run selenium with chromedriver (with python) on heroku and I'm getting this error: Traceback (most recent call last): ... File "/app/.heroku/python/lib/python3.6/site...

languid trail Dec 28, 2019, 9:29 AM

#

📎 unknown.png

#

it works

#

i just removed my path from driver = webdriver.Chrome(chrome_options=chrome_options)
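
The exact options that made it work are only visible in the screenshot, so the following is a sketch under assumptions rather than the actual fix. It passes `--disable-dev-shm-usage` (the flag mentioned above) plus `--headless` and `--no-sandbox`, which are assumed here because the script appears to run as root on a machine without a display. It also uses `options=` and `find_element(By.ID, ...)`, the spellings used by newer Selenium releases in place of `chrome_options=` and `find_element_by_id`.

```python
def build_chrome_args():
    """Flags often suggested for Chrome on a display-less Linux box (assumed setup)."""
    return [
        "--headless",               # no display server available (assumption)
        "--no-sandbox",             # often needed when running as root (assumption)
        "--disable-dev-shm-usage",  # the flag tried in the chat
    ]


def scrape_dynamic_text(url, element_id):
    """Load a JavaScript-rendered page and return the innerHTML of one element."""
    # Imported lazily so build_chrome_args() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    for arg in build_chrome_args():
        options.add_argument(arg)

    # No explicit driver path: chromedriver is expected to be found on PATH.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        element = driver.find_element(By.ID, element_id)
        return element.get_attribute("innerHTML")
    finally:
        driver.quit()  # always release the browser process


# Example call (needs Chrome and a matching chromedriver installed):
# print(scrape_dynamic_text("https://pythonprogramming.net/parsememcparseface/", "yesnojs"))
```

The `try`/`finally` is there so a failed page load does not leave a stray Chrome process behind.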

#

🆗

#

Senior Python Developer

smoky pendant Dec 28, 2019, 9:30 AM

#

glad you got it working 😃

serene scaffold Dec 28, 2019, 4:26 PM

#

Anyone experienced with spaCy?

#

I'm doing some unit testing and it gets convoluted when I've changed a property of the Token or Doc class in another test, so there's no way to anticipate what properties the Doc object in my test will have

#

I'm wondering if there's a way to reset the Token and Doc classes at the beginning of each test.

uncut shadow Dec 28, 2019, 5:06 PM

#

Hey. What is the difference between a * b and numpy.dot(a, b)?

#

Also, what is learning rate and weights in Machine learning?

silent swan Dec 28, 2019, 6:22 PM

#

a*b will do element wise multiplication. np.dot will do a dot product (which in the case of vectors, means element-wise multiplication and then summing)

#

weights are the parameters of your model. learning rate is a hyperparameter for your optimizer that influences how quickly you want your optimizer to try to fit the data
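
A tiny illustration of both terms, written as plain gradient descent on a one-weight model. The data and the two learning-rate values are made up for the example; real optimizers and models are more involved.

```python
def train(xs, ys, learning_rate, steps=50):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # the single weight (parameter) the model learns
    n = len(xs)
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # The learning rate scales how far each update moves the weight.
        w -= learning_rate * grad
    return w


# Toy data generated from y = 3x, so the ideal weight is 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

print(train(xs, ys, learning_rate=0.001))  # small steps: still well short of 3
print(train(xs, ys, learning_rate=0.05))   # larger steps: very close to 3
```

On this toy problem a much larger rate (for example 0.2) overshoots on every step and the weight diverges, which is why the learning rate usually needs some tuning.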

uncut shadow Dec 28, 2019, 6:50 PM

#

So np.sum(a * b) != np.dot(a, b)?

#

Hmmm... What are weights for?

silent swan Dec 28, 2019, 7:08 PM

#

they're the parts of the model that you learn

#

it depends on what a and b are
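
A quick check of "it depends", using small made-up arrays: for 1-D vectors the two expressions agree, while for 2-D arrays `np.dot` performs matrix multiplication and returns a matrix.

```python
import numpy as np

# 1-D vectors: the dot product is the element-wise product summed up.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a * b)          # [ 4 10 18]
print(np.sum(a * b))  # 32
print(np.dot(a, b))   # 32

# 2-D arrays: np.dot is matrix multiplication, so the two are no longer equal.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.sum(A * B))  # 70, a single number
print(np.dot(A, B))   # 2x2 matrix: [[19, 22], [43, 50]]
```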

lapis sequoia Dec 29, 2019, 12:45 AM

#

trying to make a seasonal decomposition on a time series but i get some errors.

📎 unknown.png

#

the series type object

📎 unknown.png

#

any ideas? I'll post the error as well

silent swan Dec 29, 2019, 12:48 AM

#

always post the error when you're asking why you're getting errors

lapis sequoia Dec 29, 2019, 12:48 AM

#

📎 unknown.png

silent swan Dec 29, 2019, 12:49 AM

#

your series has dtype object, rather than int or float

lapis sequoia Dec 29, 2019, 12:49 AM

#

hm better use quotes i guess

silent swan Dec 29, 2019, 12:49 AM

#

try calling .astype(float) or checking if there're any weird values (e.g. strings) in your series

lapis sequoia Dec 29, 2019, 12:55 AM

#

must be because i dont know well the Series class yet, but iterating and checking each value was indeed int type

#

anyway, i used .astype and will see if everything goes right now
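
For reference, a small sketch of how an object-dtype series can be inspected before converting it. The series below is hypothetical (the real one is only in the screenshots), and `pd.to_numeric` is used here as one way to locate "weird values"; it was not mentioned in the conversation.

```python
import pandas as pd

# Hypothetical series: the values look numeric, but one stray string
# means pandas stores the whole column with dtype object.
s = pd.Series([10, 12, "n/a", 15], dtype=object)
print(s.dtype)  # object

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric = pd.to_numeric(s, errors="coerce")
print(s[numeric.isna()])  # the offending entries

# Once the bad values are handled, the series converts cleanly.
clean = numeric.dropna().astype(float)
print(clean.dtype)  # float64
```

If every value really is an int, as described above, the series can still have dtype object (for example when it was built from mixed or parsed data), and a plain `.astype(float)` is enough.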

compact bluff Dec 29, 2019, 1:31 AM

#

I have a very large dask array representing an image, how can I store it as a png quickly?

#

skimage's imsave is quite slow and so is converting the dask array to a PIL image

deft harbor Dec 29, 2019, 1:45 AM

#

📎 1_4ZWNTBXDUiKMRKtgSegqew.png

silent swan Dec 29, 2019, 7:17 AM

#

my recommendation is

#

don't use deep learning for finance

#

speaking as someone with experience with both quant finance and deep learning

silent swan Dec 29, 2019, 7:56 AM

#

standard statistics/econometrics problem

#

basically: don't treat stock prediction as a machine learning problem, treat it as an economics problem that you need to solve with statistics tools

#

more concretely: don't do stock prediction

heavy thicket Dec 29, 2019, 10:33 AM

#

My first graph with matplotlib, pandas and seaborn! python

📎 Figure_4.png

lapis sequoia Dec 29, 2019, 11:02 AM

#

Guys, what are the best resources to learn data science, not just like machine learning but literally all the theory, i'm really overwhelmed

#

I tried to learn statistics on my own, but i don't see the theories being applied to data science, so i kinda need a practical way to learn? Any suggestion for this?

stray monolith Dec 29, 2019, 2:00 PM

#

How to force Y-axis to start from 0 (zero)?

📎 Figure_1.png

#

def plot(x_axis, points, title, x_label, y_label):
    plt.title(title)
    plt.xlabel(x_label, fontsize=3)
    plt.ylabel(y_label)
    plt.bar(x_axis, points, width=0.80)
    plt.xticks(rotation=90)
    plt.ylim([0, 10])
    plt.show()

heavy thicket Dec 29, 2019, 3:11 PM

#

@stray monolith Maybe try this: plt.ylim(ymin=0)

delicate carbon Dec 29, 2019, 5:31 PM

#

I want to use matplotlib.pyplot to plot a double line graph in which i can analyze for overfitting/underfitting. I am confused as to which variables I should pass to it https://pastebin.com/yfiZbKGy Ive messed around with trying to graph my intended goal for awhile before deleting my attempts

Do I put training and validation outputs before or after training or in other words what are the ideal metrics to use to plot for overfitting/underfitting?

Pastebin

[Python] #!/usr/bin/env python3 # Build Decision Tree Model # ...

silent swan Dec 29, 2019, 6:10 PM

#

@lapis sequoia you can use ML/NLP to generate features for prediction, but unless you're ready to read up a lot about quant finance, don't get anywhere near the stock prediction problem. There're too many ways to do it wrongly and draw the wrong conclusions if you don't know what you're doing.

#

@lapis sequoia Elements of Statistical Learning, but sooner or later you're going to need to dig in and read a standard probability/statistics textbook

uncut shadow Dec 29, 2019, 6:56 PM

#

@stray monolith I think you already solved this problem, but if you didn't, then you should do plt.ylim(0). The same can be achieved for the x axis with plt.xlim(0) (but not in your plot)

oblique belfry Dec 29, 2019, 10:15 PM

#

http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf

To those who do Computer Vision, this paper is very interesting. Geoffrey Hinton is one of the authors of the paper. His resume is ridiculous.

silent swan Dec 29, 2019, 10:58 PM

#

I'd say stick with convnets for now

#

Hinton has been pushing capsules for quite a while but as far as I've seen it's still not had any meaningful adoption

lapis sequoia Dec 30, 2019, 12:40 AM

#

@silent swan I have that book, but i think that’s more into machine learning rather than the workflow in data science, can you recommend a more general text in Data Science? The thing is that i’m confused on what to really learn and how to connect the dots

oblique belfry Dec 30, 2019, 1:02 AM

#

Oh...I am def going to stick with Convnets. I just find it funny how he thinks max pooling is stupid, and for some reason it works better than all the other pooling.

stray monolith Dec 30, 2019, 5:10 AM

#

Guys I fixed that issue.

#

plt.ylim() was not working for some reason

#

Then I found out that the problem was because the points were being plotted as Strings

#

I converted the data to float using list comprehension and everything is fine now

#

But thanks for your input anyways @heavy thicket @uncut shadow
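
A minimal reproduction of the fix described above, with made-up data. The labels and values are placeholders; the point is only that the string values are converted to floats before plotting so the y-axis is numeric.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: values read from a file often arrive as strings.
labels = ["a", "b", "c"]
raw_points = ["7.5", "3.0", "9.1"]

# Strings are treated as categories, so convert them to numbers first.
points = [float(p) for p in raw_points]

fig, ax = plt.subplots()
ax.bar(labels, points, width=0.80)
ax.set_ylim(bottom=0)  # the axis is numeric now, so this starts it at zero
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.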

uncut shadow Dec 30, 2019, 9:34 AM

#

ohh

#

that makes sense lol

worn stratus Dec 30, 2019, 9:12 PM

#

I need to discretise continuous (and continuous-ish) data for an ID3 decision tree. At the moment, I'm just finding the best possible value to split an attribute in two. Is this a valid solution, or do I need to try and account for cases where I need to split the data into more than two parts?
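
One possible reading of "finding the best possible value to split an attribute in two" is a threshold search by information gain, similar in spirit to how C4.5 treats continuous attributes. The sketch below is an assumption about the approach, not the poster's code, and the data is made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        # Candidate threshold: midpoint between two neighbouring values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Made-up data: small values are "no", large values are "yes".
values = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(values, labels))  # (5.5, 1.0)
```

Because the tree can test the same continuous attribute again further down, repeated binary splits can carve a range into more than two parts when that is needed.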

blazing bramble Dec 30, 2019, 10:49 PM

#

Anyone here experienced at webscraping?

#

i'm having some difficulties regarding scraping a website.

#

I want to store all the data but, the show-more button at the end of the page, hides info so i only get a bit of the information

#

How do i get around this..

worn stratus Dec 30, 2019, 10:59 PM

#

Use selenium and press the button with that

blazing bramble Dec 30, 2019, 11:05 PM

#

i was using python

#

and beautiful soup

#

is there any way to accomplish this with bs?

lapis sequoia Dec 30, 2019, 11:06 PM

#

it depends on the site, but in some cases you need something like selenium

#

there's also chromedriver

#

and the mozilla one, mozilladriver i think

#

sorry geckodriver

#

https://chromedriver.chromium.org/

ChromeDriver - WebDriver for Chrome

WebDriver for Chrome

#

https://github.com/mozilla/geckodriver

GitHub

mozilla/geckodriver

WebDriver for Firefox. Contribute to mozilla/geckodriver development by creating an account on GitHub.

#

if you want to stick to just beautiful soup, there could be some way to trigger it, but it entirely depends on the site

blazing bramble Dec 30, 2019, 11:08 PM

#

the site is just best buy

lapis sequoia Dec 30, 2019, 11:08 PM

#

a lot of things will use some backend undocumented api that runs the site

blazing bramble Dec 30, 2019, 11:08 PM

#

the show more button

#

so what exactly is selenium?

lapis sequoia Dec 30, 2019, 11:09 PM

#

its a browser emulator of sorts

#

https://selenium-python.readthedocs.io/getting-started.html

#

capable of keyboard/mouse interaction

blazing bramble Dec 30, 2019, 11:10 PM

#

hm.. interesting

#

sounds more powerful than bs

lapis sequoia Dec 30, 2019, 11:10 PM

#

BS has some nice functionality to it

#

i prefer it in most cases, only use selenium if absolutely necessary

blazing bramble Dec 30, 2019, 11:11 PM

#

but don't most modern sites have multiple pages and show-more buttons

#

where is bs applicable if it cannot

lapis sequoia Dec 30, 2019, 11:12 PM

#

sure but a lot of them also usually have backend apis that serve the data

#

and a lot of those things can still be triggered with the proper request

#

theres scrapy too

blazing bramble Dec 30, 2019, 11:13 PM

#

lol ima sound like a noob but

lapis sequoia Dec 30, 2019, 11:13 PM

#

scrapy looks really nice

#

bs is just easier to use

blazing bramble Dec 30, 2019, 11:13 PM

#

how do u find the backend api requests that you're referring to?

lapis sequoia Dec 30, 2019, 11:13 PM

#

you can use firefox or chrome

#

the dev tools have a network inspector

#

you can watch elements of the page load

blazing bramble Dec 30, 2019, 11:14 PM

#

yes i was doing this

lapis sequoia Dec 30, 2019, 11:14 PM

#

so you look for ones that are jso

blazing bramble Dec 30, 2019, 11:14 PM

#

i noticed some thing called loadmore

lapis sequoia Dec 30, 2019, 11:14 PM

#

json

blazing bramble Dec 30, 2019, 11:14 PM

#

was not sure what to do, however

#

json? why json?

lapis sequoia Dec 30, 2019, 11:16 PM

#

thats what most apis use

#

an api endpoint typically just returns a json document

#

or graphql

#

which i think also registers as json in the network inspector

#

search pages in particular usually have them

#

do you have an example link?

blazing bramble Dec 30, 2019, 11:17 PM

#

um

#

yes

#

i can just send the current link

#

i am working with atm

#

https://www.bestbuy.ca/en-ca/collection/video-games-accessories-on-sale/17056?icmp=event_evergreen_shopbycategory_videogames

lapis sequoia Dec 30, 2019, 11:18 PM

#

oh i see

#

hang on

#

https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page=2&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc

blazing bramble Dec 30, 2019, 11:20 PM

#

hm..

lapis sequoia Dec 30, 2019, 11:20 PM

#

now if that json has all the data you need

blazing bramble Dec 30, 2019, 11:20 PM

#

what is this?

lapis sequoia Dec 30, 2019, 11:20 PM

#

you can change the parameters in that call

#

its the api that drives their search

#

if you have the network tab open

#

go to that page

#

then click 'show more'

#

sort by Type

#

look for json

blazing bramble Dec 30, 2019, 11:22 PM

#

ok

#

if you do this, then you get the link you just sent?

lapis sequoia Dec 30, 2019, 11:23 PM

#

yeah

blazing bramble Dec 30, 2019, 11:23 PM

#

Ok, interesting. And now you can scrape this weblink

lapis sequoia Dec 30, 2019, 11:23 PM

#

exactly

#

and if you look through it, it should give you enough info to filter, and page through the results

#

and json is easy to work with once you have it

#

no html parsing

blazing bramble Dec 30, 2019, 11:25 PM

#

so bs is still applicable here

#

correct?

lapis sequoia Dec 30, 2019, 11:25 PM

#

yes, though really you could do it with just requests

#

import json
import requests

response = requests.get(...)
json_data = json.loads(response.text)

blazing bramble Dec 30, 2019, 11:26 PM

#

so this data is in json format?

lapis sequoia Dec 30, 2019, 11:26 PM

#

yeah

#

apis dont always provide everything though

blazing bramble Dec 30, 2019, 11:26 PM

#

I see, however I am not too familiar with how json is read or even works.

lapis sequoia Dec 30, 2019, 11:27 PM

#

json is easy

blazing bramble Dec 30, 2019, 11:27 PM

#

gotta read up on it

lapis sequoia Dec 30, 2019, 11:27 PM

#

its dicts and lists
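
To show what "dicts and lists" means in practice, here is a small sketch that parses a made-up JSON string. The `products` and `totalPages` key names mirror the Best Buy response discussed in this conversation; the fields inside each product are invented placeholders.

```python
import json

# A made-up response body; the fields inside each product are placeholders.
body = '{"totalPages": 3, "products": [{"name": "Controller"}, {"name": "Headset"}]}'

data = json.loads(body)

print(type(data))              # <class 'dict'>
print(type(data["products"]))  # <class 'list'>
for product in data["products"]:
    print(product["name"])     # each item in the list is itself a dict
```

JSON objects become Python dicts and JSON arrays become Python lists, so the usual indexing and looping apply.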

blazing bramble Dec 30, 2019, 11:27 PM

#

you said that the apis don't give everything?

#

interesting...

lapis sequoia Dec 30, 2019, 11:27 PM

#

yeah a lot of times search pages will have like

#

link and title

#

so then id use beautiful soup to hit that link

#

scrape the content that isnt served by the api

#

that best buy api is a bit crazy in terms of detail

#

pretty heavy

blazing bramble Dec 30, 2019, 11:28 PM

#

that's a good thing for my purposes i think

lapis sequoia Dec 30, 2019, 11:28 PM

#

definitely

#

not so great for actually running a site

blazing bramble Dec 30, 2019, 11:29 PM

#

lol i'm just a student who's trynna play with some new technology for my resume

lapis sequoia Dec 30, 2019, 11:29 PM

#

scraping is a good place to start

blazing bramble Dec 30, 2019, 11:29 PM

#

this project seems so daunting now 0-0

lapis sequoia Dec 30, 2019, 11:29 PM

#

just keep at it, its not too hard

#

i like to use jupyter to prototype these things

#

makes it a lot easier to test navigating through the site to build the scraper

blazing bramble Dec 30, 2019, 11:30 PM

#

jupyter?

lapis sequoia Dec 30, 2019, 11:30 PM

#

https://jupyter.org/

Project Jupyter

The Jupyter Notebook is a web-based interactive computing platform. The notebook combines live code, equations, narrative text, visualizations, interactive dashboards and other media.

#

its great for prototyping, you can re run small parts of the code

#

rather than re-running everything

#

if you hit an error, you just retry

#

and you can use it to document the exploration

#

so you can look back later, put it all together

#

great for learning, especially if you are handling larger datasets

blazing bramble Dec 30, 2019, 11:40 PM

#

ok so sorry lol

#

but like how exactly do i use this link lol

lapis sequoia Dec 30, 2019, 11:40 PM

#

no worries

blazing bramble Dec 30, 2019, 11:40 PM

#

it's a json right? so like json manipulation or?

lapis sequoia Dec 30, 2019, 11:41 PM

#

yes its json

#

so if you do something like this

#

import json
import requests

response = requests.get('https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page=2&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc')
json_data = json.loads(response.text)

#

json_data is a dict

blazing bramble Dec 30, 2019, 11:44 PM

#

key pairs

#

if i remember correctly

lapis sequoia Dec 30, 2019, 11:44 PM

#

yup

#

json_data['products']

#

is a list of products

#

its a list of dicts

#

and if you look at the url, it has 'page=2'

#

so that list is the list of dicts of products on page 2

#

iterate through the pages, iterate through the products lists

#

📎 unknown.png

blazing bramble Dec 30, 2019, 11:49 PM

#

wow!

#

that's really cool

#

How exactly did you know 'products' was the key for this json file

lapis sequoia Dec 30, 2019, 11:50 PM

#

firefox has a nice json navigator

#

just looked at it, its at the top

#

if you look at json_data['totalPages'] as well, thats the total number of pages

#

so youd want to iterate through a list from 1 to that

#

and change the url each time to set page= that number

blazing bramble Dec 30, 2019, 11:56 PM

#

So if you'd want

#

all the products

#

from each page

lapis sequoia Dec 30, 2019, 11:56 PM

#

just make a list and append each page's product list to that list

blazing bramble Dec 30, 2019, 11:57 PM

#

how would you navigate through each page like that tho, just a loop through the pages?

lapis sequoia Dec 30, 2019, 11:57 PM

#

do something like

#

for page in range(1, json_data['totalPages']+1):
  ... request & json stuff

#

and in the request url you'd put that page

#

where it says page=

#

so

#

f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'

blazing bramble Dec 31, 2019, 12:02 AM

#

oh ok so then load in the request url

#

then just append the different 'products' into a list?

lapis sequoia Dec 31, 2019, 12:02 AM

#

yeah exactly

blazing bramble Dec 31, 2019, 12:03 AM

#

kinda like what you showed above?

lapis sequoia Dec 31, 2019, 12:03 AM

#

yeah

#

use extend

blazing bramble Dec 31, 2019, 12:03 AM

#

extend?

lapis sequoia Dec 31, 2019, 12:03 AM

#

create a new list, then list.extend(new list)

#

https://www.programiz.com/python-programming/methods/list/extend

Python List extend()

The extend() extends the list by adding all items of a list (passed as an argument) to the end.

#

# language list
language = ['French', 'English', 'German']

# another list of language
language1 = ['Spanish', 'Portuguese']

language.extend(language1)

# Extended List
print('Language List: ', language)

#

except in this case it'd be extending with json_data['products']
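
A small sketch of why extend is suggested here rather than append. The two page lists below are made-up stand-ins for json_data['products'] from two different pages, not real API output.

```python
# Hypothetical stand-ins for json_data['products'] from two pages.
page_1_products = [{"name": "laptop A"}, {"name": "laptop B"}]
page_2_products = [{"name": "laptop C"}]

# extend() adds each product individually, giving one flat list.
all_products = []
all_products.extend(page_1_products)
all_products.extend(page_2_products)

# append() adds each page's list as a single item, giving a list of lists.
nested = []
nested.append(page_1_products)
nested.append(page_2_products)

print(len(all_products))  # 3 products in one flat list
print(len(nested))        # 2 items, each of them a whole page
```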

blazing bramble Dec 31, 2019, 12:04 AM

#

ahhhh

#

interesting.

#

I will now try this out

#

thank you for all the help!

lapis sequoia Dec 31, 2019, 12:05 AM

#

no problem, good luck

blazing bramble Dec 31, 2019, 12:06 AM

#

I really appreciate the examples, I am very much an example-based learner, and I could not find too many examples on the topic online.

#

Do you have any resources you could possibly pass on to someone who just wants to learn some more?

lapis sequoia Dec 31, 2019, 12:06 AM

#

github is one of the best sources honestly

#

just search through open repos

blazing bramble Dec 31, 2019, 12:07 AM

#

If you don't mind answering, are you also a college student?

lapis sequoia Dec 31, 2019, 12:07 AM

#

if there is a particular library or package you need help with, just search it

#

no i'm a data engineer

#

build apis, scrapers, stuff like that

blazing bramble Dec 31, 2019, 12:07 AM

#

hey that's cool

lapis sequoia Dec 31, 2019, 12:07 AM

#

work with the data science team

#

yeah its fun stuff

blazing bramble Dec 31, 2019, 12:08 AM

#

when'd you get interested in scraping and stuff?

lapis sequoia Dec 31, 2019, 12:08 AM

#

i was pretty young and i loved news, but hated the news sites at the time

#

i wanted something that would show me just the title and text

#

and rss wasnt really used at the time

#

no google news

#

so i made a scraper that ran and gathered all the news

#

ive made a few hundred since

blazing bramble Dec 31, 2019, 12:09 AM

#

damn lol

#

I'm just a cs student looking for some meat n potatoes on my resume, thought a scraper would be interesting.

lapis sequoia Dec 31, 2019, 12:10 AM

#

its good stuff

blazing bramble Dec 31, 2019, 12:10 AM

#

it's really cool I think

#

the whole internet becomes a data-base

lapis sequoia Dec 31, 2019, 12:10 AM

#

yeah pretty much

blazing bramble Dec 31, 2019, 12:11 AM

#

I was able to find the backend api json thing through the network tools, but i'm still confused as to if it represents every single page or just one

lapis sequoia Dec 31, 2019, 12:12 AM

#

the json is just for one page

blazing bramble Dec 31, 2019, 12:12 AM

#

so then what do i load into requests

#

like just the page i'm working with?

lapis sequoia Dec 31, 2019, 12:12 AM

#

like i said above, something like

#

for page in range(1, json_data['totalPages']+1):
  url = f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'
  ...  # request & json stuff, using this url

#

then do requests.get(url)

#

see the page=?

blazing bramble Dec 31, 2019, 12:14 AM

#

on the top of the page?

lapis sequoia Dec 31, 2019, 12:14 AM

#

in the url

#

&page=

blazing bramble Dec 31, 2019, 12:15 AM

#

oh yea

#

i see

#

you'd want to update that

#

which is what the for loop is accomplishing

#

incrementing it by 1 each time

lapis sequoia Dec 31, 2019, 12:16 AM

#

yup

blazing bramble Dec 31, 2019, 12:16 AM

#

for each page

#

ooooooo

lapis sequoia Dec 31, 2019, 12:16 AM

#

yeah and since totalPages is in the json dump

#

youd get the first url

#

url = f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'

#

and take the totalPages from that

#

import json
import requests

response = requests.get('https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page=1&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc')
json_data = json.loads(response.text)

for page in range(1, json_data['totalPages']+1):
  url = f'https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc'
  ...  # request & json stuff, using this url

#

that way as the # of pages changes, the scraper knows and iterates through the right length list

#

i mean do it better than that, but that's the basic idea
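
One possible way to "do it better", as a rough sketch and not a tested scraper. It assumes the response still has the 'products' and 'totalPages' keys discussed above. The names BASE_URL, fetch_page and collect_products are made up for this example, and the timeout value is arbitrary.

```python
BASE_URL = (
    "https://www.bestbuy.ca/api/v2/json/sku-collections/17056"
    "?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA"
    "&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"
)


def fetch_page(page):
    """Download one page of results and return the decoded JSON dict."""
    import requests  # imported lazily so collect_products can be tried offline

    response = requests.get(BASE_URL.format(page=page), timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def collect_products(fetch=fetch_page):
    """Walk every page and return all products in one flat list."""
    first = fetch(1)
    products = list(first["products"])
    # Page 1 is already downloaded, so continue from page 2.
    for page in range(2, first["totalPages"] + 1):
        products.extend(fetch(page)["products"])
    return products
```

Calling collect_products() with no arguments would hit the real site. Passing your own fetch function lets you try the loop on saved or fake pages first, and reusing the first response means page 1 is only requested once.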

blazing bramble Dec 31, 2019, 12:22 AM

#

ok everything is starting to click

#

what is the f in the url in the loop sir?

#

f'

lapis sequoia Dec 31, 2019, 12:22 AM

#

its an f string

#

its the same as doing

#

"string {0}".format("is a string")

#

which returns 'string is a string'

#

so if page = 3

#

f"page {page}"

#

would return 'page 3'

#

f"string {'is a string'}"

#

LUL in retrospect string is a string is not the best example

blazing bramble Dec 31, 2019, 12:25 AM

#

uhh

lapis sequoia Dec 31, 2019, 12:25 AM

#

https://realpython.com/python-f-strings/

Python 3's f-Strings: An Improved String Formatting Syntax (Guide)...

As of Python 3.6, f-strings are a great new way to format strings. Not only are they more readable, more concise, and less prone to error than other ways of formatting, but they are also faster! By the end of this article, you will learn how and why to start using f-strings t...

blazing bramble Dec 31, 2019, 12:25 AM

#

I have not seen this before haha

lapis sequoia Dec 31, 2019, 12:25 AM

#

its new in 3

#

3.6 even

blazing bramble Dec 31, 2019, 12:25 AM

#

wow cool

#

new stuff is always cool

lapis sequoia Dec 31, 2019, 12:26 AM

#

actually for this something like

#

you could also do something like

#

base_url = "https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"

for page in range(1, json_data['totalPages']+1):
  url = base_url.format(page)
  ...  # json stuff

blazing bramble Dec 31, 2019, 12:29 AM

#

wow

#

f strings are cool

lapis sequoia Dec 31, 2019, 12:29 AM

#

yeah they are shorter and look better

#

but they aren't re-usable the way my example above is

blazing bramble Dec 31, 2019, 12:30 AM

#

but essentially what's going on is they're taking the thing and making them string literals right?

lapis sequoia Dec 31, 2019, 12:30 AM

#

yes the f string evaluates the contents of the {} and returns a string

#

whereas in the example above, the {} is only filled in when you call .format(value) on the string

#

until you do that, {} is just a part of the string
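
A tiny sketch of that difference. The variable names here are only for illustration.

```python
page = 3

# An f-string is evaluated immediately, using whatever `page` is right now.
eager = f"page={page}"

# A plain string with {} is only a template until .format() is called,
# so the same template can be reused for many values.
template = "page={}"
urls = [template.format(n) for n in range(1, 4)]

page = 4  # changing the variable later does not change the finished f-string

print(eager)     # page=3
print(template)  # page={}
print(urls)      # ['page=1', 'page=2', 'page=3']
```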

#

📎 unknown.png

blazing bramble Dec 31, 2019, 12:32 AM

#

hm... why can't we just toss the url into the loop without the f tho

#

like what it's data type

lapis sequoia Dec 31, 2019, 12:33 AM

#

you need it to include the contents of page

#

so if page = 3

#

you need the url to say page=3

blazing bramble Dec 31, 2019, 12:33 AM

#

that makes sense

#

and you got the loop incrementing page

#

but why does the whole thing have to be a string

lapis sequoia Dec 31, 2019, 12:34 AM

#

the url?

blazing bramble Dec 31, 2019, 12:34 AM

#

mhm

lapis sequoia Dec 31, 2019, 12:34 AM

#

because you need a url to make a request

#

and the url needs to be a string

#

so either

#

base_url = "https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"

for page in range(1, json_data['totalPages']+1):
  url = base_url.format(page)
  ...  # json stuff

#

or

#

for page in range(1, json_data['totalPages']+1):
  url = f"https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={page}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc"
  ...  # json stuff

#

id argue the first one with .format looks nicer

blazing bramble Dec 31, 2019, 12:35 AM

#

I think the second one makes more sense to me

lapis sequoia Dec 31, 2019, 12:36 AM

#

they work the same

#

you could also do

blazing bramble Dec 31, 2019, 12:36 AM

#

Ofc, yes

#

but like conceptually

#

the second one sits easier with me, just a preference

lapis sequoia Dec 31, 2019, 12:36 AM

#

for page in range(1, json_data['totalPages']+1):
  url = "https://www.bestbuy.ca/api/v2/json/sku-collections/17056?categoryid=&currentRegion=BC&include=facets%2C%20redirects&lang=en-CA&page={}&pageSize=24&path=&query=&exp=&sortBy=relevance&sortDir=desc".format(page)
  ...  # json stuff

#

yeah f strings are far easier to read

#

i prefer them myself

#

but its only in i think 3.6+

#

.format is older

blazing bramble Dec 31, 2019, 12:37 AM

#

i gotta read up on format

#

i forget

#

what it does

lapis sequoia Dec 31, 2019, 12:38 AM

#

https://realpython.com/python-string-formatting/

Python String Formatting Best Practices – Real Python

Learn the four main approaches to string formatting in Python, as well as their strengths and weaknesses. You'll also get a simple rule of thumb for how to pick the best general purpose string formatting approach in your own programs.

blazing bramble Dec 31, 2019, 12:38 AM

#

ahh it's a substituent

#

i see.

#

page is entered into the {}

#

ok that makes sense

lapis sequoia Dec 31, 2019, 12:39 AM

#

yes

#

you could also have {0}

#

0 being the index of the first argument passed to .format()

blazing bramble Dec 31, 2019, 12:39 AM

#

you are extremely helpful man

#

thanks a lot

lapis sequoia Dec 31, 2019, 12:39 AM

#

i'm bored

#

and like sharing

#

no problem

blazing bramble Dec 31, 2019, 12:40 AM

#

I mean,

#

i guess i'm bored too,

#

trynna learn this stuff lmao

#

to pass time

#

but it's very cool so, it's a nice pass time haha

lapis sequoia Dec 31, 2019, 12:40 AM

#

yeah it can be a good way to pass the time

#

especially if its for you

blazing bramble Dec 31, 2019, 12:41 AM

#

Hey did you ever have experience with getting a co-op?

lapis sequoia Dec 31, 2019, 12:41 AM

#

co-op?

blazing bramble Dec 31, 2019, 12:41 AM

#

internship

lapis sequoia Dec 31, 2019, 12:41 AM

#

oh no

blazing bramble Dec 31, 2019, 12:41 AM

#

different names different countries

lapis sequoia Dec 31, 2019, 12:41 AM

#

i was in IT by trade, got right into a paid job

#

that led to dev work, mostly python

#

which led to this

blazing bramble Dec 31, 2019, 12:42 AM

#

wow so did you have to get a cs-degree or not even?

lapis sequoia Dec 31, 2019, 12:42 AM

#

not even

blazing bramble Dec 31, 2019, 12:42 AM

#

o that's cool

#

many pathways

lapis sequoia Dec 31, 2019, 12:42 AM

#

yeah i wouldn't recommend my route

#

its harder

#

but its the road i took, so what can you do

blazing bramble Dec 31, 2019, 12:43 AM

#

haha yes i'm in school and planning on staying in it

#

the thing is, it's almost time to find experience

#

i'm scared for technicals man

lapis sequoia Dec 31, 2019, 12:43 AM

#

make your own experience

blazing bramble Dec 31, 2019, 12:43 AM

#

I currently am

lapis sequoia Dec 31, 2019, 12:43 AM

#

come up with some projects for yourself

#

build scrapers, bots

#

set them up right

#

learn how to use docker and deploy things

#

learn version control

blazing bramble Dec 31, 2019, 12:44 AM

#

what are these things?

lapis sequoia Dec 31, 2019, 12:44 AM

#

its a tip of an iceberg is what it is

#

docker is a great platform for containerization, a lightweight kind of virtualization

#

and delivering software in isolated containers

#

what it basically does is allow you to specify an environment for your app/script to run in

#

as an example, i have a docker script that loads a pre-built Ubuntu image

#

installs all of the required packages

#

copies the source over

#

you can run the code within that environment

#

that pretty much eliminates the odds that changes in your environment will break your script

#

then you can take that docker image and directly deploy it

#

so if you had a scraper, for example, you could push it to any server, either hosted at home or something on like AWS

#

just by pushing the entire image

#

no setting stuff up, cause its already set up

#

and that might sound hard

blazing bramble Dec 31, 2019, 12:47 AM

#

wow, this shit

lapis sequoia Dec 31, 2019, 12:47 AM

#

but the file that does it all

blazing bramble Dec 31, 2019, 12:47 AM

#

sounds so complex

lapis sequoia Dec 31, 2019, 12:47 AM

#

is like 20 lines

#

super simple

blazing bramble Dec 31, 2019, 12:47 AM

#

wow

lapis sequoia Dec 31, 2019, 12:47 AM

#

let me find an example

blazing bramble Dec 31, 2019, 12:47 AM

#

where the hell'd you learn this stuff

#

damn, youtube got nothing

lapis sequoia Dec 31, 2019, 12:48 AM

#

LUL

#

i did IT for a long time

#

just learned everything i could

#

automated everything i could

blazing bramble Dec 31, 2019, 12:48 AM

#

I just wish i knew some good resources you know?

lapis sequoia Dec 31, 2019, 12:48 AM

#

automate the simple stuff is a great starter

blazing bramble Dec 31, 2019, 12:48 AM

#

is python a good place to put my money in terms of where to be in cs

#

i'm learning so much shit

lapis sequoia Dec 31, 2019, 12:48 AM

#

sorry automate the boring stuff

#

https://automatetheboringstuff.com/

#

great starter

blazing bramble Dec 31, 2019, 12:49 AM

#

like html, css, c, java, low-level, a bunch of stuff

#

But i don't know where I wanna become specialized in

#

python is a nice language, solid, simple, and powerful

lapis sequoia Dec 31, 2019, 12:49 AM

#

language knowledge is pretty transferable

#

once you have your head around a couple

#

learning others is easy

blazing bramble Dec 31, 2019, 12:50 AM

#

that's true.

lapis sequoia Dec 31, 2019, 12:50 AM

#

python is a great starter

#

java is good

blazing bramble Dec 31, 2019, 12:50 AM

#

going from java to c was quite simple

#

but i hate c

#

i'm not a fan

earnest prawn Dec 31, 2019, 12:50 AM

#

thats because java's syntax came from c (by way of c++)

blazing bramble Dec 31, 2019, 12:50 AM

#

well that'd make a lot of sense

#

but our school makes us do memory management manually

#

even tho, we have garb collectors

#

it be liek that 😦

earnest prawn Dec 31, 2019, 12:51 AM

#

well because a) garbage collectors suck, they cost performance and require a larger runtime and b) manual memory management teaches you something about how things work

blazing bramble Dec 31, 2019, 12:53 AM

#

hey man, that's true

earnest prawn Dec 31, 2019, 12:53 AM

#

there is a reason for example many of the libraries you use in data science like tensorflow are written in C++

blazing bramble Dec 31, 2019, 12:53 AM

#

isn't c++ super powerful

#

like it's modern capabilities

velvet thorn Dec 31, 2019, 12:53 AM

#

knowing a little about how computers work @ a low level is really helpful sometimes

#

you don't need to be able to write a device driver

earnest prawn Dec 31, 2019, 12:54 AM

#

C++ is super powerful

#

in fact

#

c++ is too powerful

#

c++ is basically c + every object oriented feature mankind has thought of, which makes it a super overloaded language: on a new project you'll have engineers arguing about which of the versions, from c++98 through 11 to 17, they should use

#

and once they have settled on a version of c++, they argue about what the best design pattern is, because you can achieve the same thing in a million ways

velvet thorn Dec 31, 2019, 12:55 AM

#

1983 - Bjarne Stroustrup bolts everything he's ever heard of onto C to create C++. The resulting language is so complex that programs must be sent to the future to be compiled by the Skynet artificial intelligence. Build times suffer

earnest prawn Dec 31, 2019, 12:57 AM

#

a bunch of people at work who do professional c++ in their department have come up with the theory that stroustrup came up with C++ only to make money off the books people will buy to understand it

blazing bramble Dec 31, 2019, 12:58 AM

#

As someone who is taking c++ nxt semester, these perspectives are very interesting.

earnest prawn Dec 31, 2019, 12:58 AM

#

i think there is a big difference between learning a language at school where youre limited to the feature set your teachers throw at you

blazing bramble Dec 31, 2019, 12:58 AM

#

no you're right

#

but still

earnest prawn Dec 31, 2019, 12:59 AM

#

and using it professionally where everyone and their mom has another idea on what design pattern is the best

velvet thorn Dec 31, 2019, 12:59 AM

#

you could try Scala

#

it's a good introduction to functional programming

earnest prawn Dec 31, 2019, 12:59 AM

#

when did we go to functional programming

blazing bramble Dec 31, 2019, 12:59 AM

#

i'm still a junior loll

earnest prawn Dec 31, 2019, 12:59 AM

#

and more interestingly

blazing bramble Dec 31, 2019, 12:59 AM

#

idek what that is

earnest prawn Dec 31, 2019, 12:59 AM

#

why is this all happening in #data-science-and-ml

velvet thorn Dec 31, 2019, 1:00 AM

#

I think it started from the f-string discussion

#

which was in the context of scraping

blazing bramble Dec 31, 2019, 1:00 AM

#

Yes! i'm making a scraper

lapis sequoia Dec 31, 2019, 1:00 AM

#

which led to me going on about deploying it

blazing bramble Dec 31, 2019, 1:00 AM

#

my first real side project

lapis sequoia Dec 31, 2019, 1:00 AM

#

which led to languages

blazing bramble Dec 31, 2019, 1:01 AM

#

@lapis sequoia do you primarily program python or also do some others from time to time?

lapis sequoia Dec 31, 2019, 1:02 AM

#

my work is all python, do some C# for game dev i do for myself

#

i've done others

blazing bramble Dec 31, 2019, 1:02 AM

#

interesting, a true python master

lapis sequoia Dec 31, 2019, 1:02 AM

#

i'm no master

#

just an old amateur

blazing bramble Dec 31, 2019, 1:04 AM

#

So, how much math is involved in data science.

velvet thorn Dec 31, 2019, 1:04 AM

#

data science is a pretty wide field

blazing bramble Dec 31, 2019, 1:04 AM

#

typically in a college degree, the ds pathways are more math focused

velvet thorn Dec 31, 2019, 1:04 AM

#

it depends on what exactly you're doing.

#

minimally, I would say basic statistics and linear algebra

#

and calculus

blazing bramble Dec 31, 2019, 1:07 AM

#

wow calc too?

velvet thorn Dec 31, 2019, 1:07 AM

#

if you're planning to go down the research scientist path, then a lot more mathematics, probably

blazing bramble Dec 31, 2019, 1:07 AM

#

I was planning on not doing calc again lol

velvet thorn Dec 31, 2019, 1:07 AM

#

only if you need to do theory

#

that's a bit higher-level?

#

like if you're doing DL research

lapis sequoia Dec 31, 2019, 1:08 AM

#

There's a lot of work for just supporting data science as well

velvet thorn Dec 31, 2019, 1:08 AM

#

I'd say for the majority of junior data scientists it wouldn't be relevant

lapis sequoia Dec 31, 2019, 1:08 AM

#

As long as you have a grasp of the basic concepts

velvet thorn Dec 31, 2019, 1:08 AM

#

except insofar as you need it to understand stuff like gradient descent (actually, more or less only gradient descent...)

blazing bramble Dec 31, 2019, 1:11 AM

#

the world of development is super vast by the sounds of it

#

almost overwhelmingly vast

velvet thorn Dec 31, 2019, 1:11 AM

#

there are also things that are data science and not dev work, and vice versa

#

you can be a very bad coder and still be a fairly good data scientist

blazing bramble Dec 31, 2019, 1:12 AM

#

Damn i thought development was the thing that tied everything together kinda

velvet thorn Dec 31, 2019, 1:12 AM

#

well, if you mean software development specifically...then not really? because you can do DS without software at all, though that would be inefficient

#

however, it's fairly common for data scientists to not-really be developers

#

i.e. their expertise is in statistical analysis and experimentation

blazing bramble Dec 31, 2019, 1:13 AM

#

isn't the minimum these days a college degree for a post tho?

velvet thorn Dec 31, 2019, 1:13 AM

#

and being able to code is just a means to an end

#

that would depend on your country, really

#

mine is a little uptight about education, and I don't actually have a CS degree

#

so I kind of took the alternative route?

blazing bramble Dec 31, 2019, 1:14 AM

#

i'm in north america

#

also very degree orientated

velvet thorn Dec 31, 2019, 1:16 AM

#

not sure how it works over there.

#

but I'd say that that kind of requirement is becoming more and more common...?

#

and if you don't fulfil it then you probz need to show competence in some other way

blazing bramble Dec 31, 2019, 1:18 AM

#

yep

deft harbor Dec 31, 2019, 3:32 AM

#

Does anyone have any good resources on art and AI?

#

Im curious about music generation, lighting, paintings, etc.

wind marlin Dec 31, 2019, 12:01 PM

#

Anyone here doing financial time series forecasting? Is there anything open source that you found to work well? Perhaps utilizing RL for forecasting?

wind marlin Dec 31, 2019, 2:31 PM

#

Is a 1D convolutional network with attention something that I can get help with? I heard it's good for forecasting but haven't seen any code examples or examples of anyone using it.

worn stratus Dec 31, 2019, 3:58 PM

#

Machine learning for stock predictions isn't generally done. Youre almost certainly not going to do very well with it

#

Although as a learning exercise its probably fine

deft harbor Dec 31, 2019, 4:24 PM

#

Its pretty common to team machine learning with options trading. However, it pulls outside data and isn't just trying to predict the future based on movement alone.

#

Also, the fund managers I have discussed it with use it for marking potential trades not auto trading.

drifting hemlock Dec 31, 2019, 9:13 PM

#

What do you guys think about AutoML? Is it good enough?

deft harbor Dec 31, 2019, 9:37 PM

#

My biggest fear with machine learning is people creating models that impact our daily lives, but they dont really have a clue about what is going on.

#

That said, never used automl

drifting hemlock Dec 31, 2019, 9:39 PM

#

And that's exactly one of the reasons many experiments don't get deployed, I mean if you develop a machine learning model to predict X thing, then people are going to ask why it made such prediction. I feel that with AutoML you don't exactly know what's going on behind the scenes.

#

I mean it looks great and all, I've tinkered a bit with IBM Watson which is basically an AutoML, but I don't know, I don't trust it lol

deft harbor Jan 1, 2020, 12:01 AM

#

I'm concerned about people being able to deploy these models without understanding all the issues and ethical questions

#

Looking at you COMPAS, can't even see behind the curtain to defend yourself in court

deft harbor Jan 1, 2020, 1:12 AM

#

They will use satellite images to track shipments, global warming trends and weather data.. Etc

silent swan Jan 1, 2020, 1:33 AM

#

using DL to generate features 👍
using DL to predict stocks 👎

silent swan Jan 1, 2020, 2:17 AM

#

I was just explicitly agreeing with you

deft harbor Jan 1, 2020, 2:19 AM

#

Eh, its new year, I missed the first comment.

#

🍻

wind marlin Jan 1, 2020, 6:06 AM

#

I don't believe anyone can get great results using ML only, but I have a deep understanding of building algorithms that backtest decently already. My algorithms output into an oscillator format. I've only gotten to where I am because I know how other builders think. There's a lot of them and they all make the same errors. With ML I can use more instruments because I can't keep up visually, I'd rather get an alert than watch screens all day, I would lose my mind lol But I agree with you guys about stocks, I know about the satellite parking lot stuff. Evidence of insider trading/commitment is actually where ML should shine. I have some ideas for how to spot that. I want many tools that each look at different things.

delicate carbon Jan 2, 2020, 9:55 AM

#

Im trying to build a decision tree model with a bot I am making for Discord. I have no clue what im doing in regards to the ML part lol. I try to read tutorials but it gets so insanely heavy into math I zone out and look for another source.

I have a great dataset that is widely used. As long as I receive good accuracy and precision I think Ill be alright..

#

Before I even deploy the bot I will test the model against real data and if its misclassifying then i know ill need to make adjustments. I just play around with parameters and values until the results show. I know its not methodical but idk .. its works for me

ocean beacon Jan 2, 2020, 10:19 AM

#

You probably won't have enough data when you are training your discord bot live. You need to train your bot offline and feed it tens/hundreds of thousands of data points.
I suggest doing a standalone ML project first and then worry about discord. ML is a really broad field. Some people who work in ML never work on the model but are purely occupied with gathering sensible data and optimizing those further or they are in the field of generating data which they plan to feed into their model. You are trying to make a finished product (the discord bot) but the core of your project (ML) is still at its infancy.

knotty canopy Jan 2, 2020, 11:54 AM

#

hey, does anyone know why this happens? I'm trying to read an Excel file using pandas, but the o/p is showing NaN

#

in terminal o/p?

paper niche Jan 2, 2020, 2:10 PM

#

what’s o/p?

worn stratus Jan 2, 2020, 2:13 PM

#

output I assume

knotty canopy Jan 2, 2020, 3:25 PM

#

Yep

deft harbor Jan 2, 2020, 6:31 PM

#

copy and paste the actual error

delicate carbon Jan 2, 2020, 6:34 PM

#

@ocean beacon This is the dataset i am using https://archive.ics.uci.edu/ml/datasets/phishing+websites there is also this one but holy cow, its an enormous file http://www.fcsit.unimas.my/research/legit-phish-set

I have thought about that, "what if my bot is misclassifying too many real samples?" Maybe I could use the decision tree + reinforcement learning or decision tree + a neural network (albeit, the simplest form of neural network. I really dont understand how to implement the logic of an extremely robust neural network)

Phishing Dataset

Complete datasets for phishing and legitimate websites are available for download here. We provide (a) raw samples of phishing websites and legitimate websites, and (b) feature vector data for machine learning based phishing detection.

#

I was reading a little bit about 2-class neural networks which might work

deft harbor Jan 2, 2020, 6:45 PM

#

So your goal would be to take items people are typing, and then capture ones that are phishing attempts?

delicate carbon Jan 2, 2020, 6:46 PM

#

When users post a link it will go to my model which is being hosted by flask. All links are analyzed

deft harbor Jan 2, 2020, 6:46 PM

#

If so, you will need more than just a decision tree. You will need NLP items to process the text, created topics, etc.

delicate carbon Jan 2, 2020, 6:46 PM

#

I did something similar. I created a py program that extracts features from incoming links

#

My dataset has 30 attributes so I extract those 30 attributes from urls

#

those 30 attributes from urls are then put into a new dataset. That new dataset is sent to the model which will determine if its phishing or not

deft harbor Jan 2, 2020, 6:49 PM

#

Ok, not sure how well it performs, but you can check for false positives by building a confusion matrix or plotting the ROC curve and looking at AUC

#

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#sphx-glr-auto-examples-model-selection-plot-roc-crossval-py
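
A hand-rolled sketch of what a confusion matrix counts, using made-up labels (1 = phishing, 0 = legitimate) rather than real model output. The function name confusion_counts is invented for this example. In practice scikit-learn's confusion_matrix and roc_auc_score in sklearn.metrics do the same job.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1  # flagged, and really phishing
            else:
                fp += 1  # flagged, but actually legitimate
        else:
            if truth == positive:
                fn += 1  # missed a phishing link
            else:
                tn += 1  # correctly left alone
    return tp, fp, fn, tn


# Made-up labels for eight links: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
false_positive_rate = fp / (fp + tn)  # share of legitimate links flagged
recall = tp / (tp + fn)               # share of phishing links caught

print(tp, fp, fn, tn)        # 3 1 1 3
print(false_positive_rate)   # 0.25
print(recall)                # 0.75
```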

delicate carbon Jan 2, 2020, 6:50 PM

#

I havent tested the dataset yet but I do have my decision tree all set up so I can do it whenever i want

#

thanks

deft harbor Jan 2, 2020, 6:51 PM

#

I would try tweaking the decision tree model, measuring overall performance and also check things like AUC before feeding that data to a NN

#

Thats just the way I would go about it

delicate carbon Jan 2, 2020, 6:52 PM

#

Definitely. I read good things about boosted trees so I will play around with that as well. What is NN?

deft harbor Jan 2, 2020, 6:52 PM

#

Also, you could apply bagging or boosting to the tree as well

#

neural network

delicate carbon Jan 2, 2020, 6:53 PM

#

Oh right 🙂

#

If i were to use a tree + NN.. how would I consider the algorithm?

Decision tree classifies (assuming it classifies correctly) -> send to NN so that the DT keeps learning -> send results from NN back to DT?

#

like a constant feedback loop of-sorts

lusty coral Jan 2, 2020, 7:07 PM

#

Hey guys. I'm coming from general discussion

#

I'm doing some data analysis via pandas. I have to do this task periodically like every 15secs. Though I think dataframes are slow, or I'm using them in a wrong way.

#

Here is my code: https://github.com/gokturksm/oyleler/blob/master/main_hibrit.py

GitHub

gokturksm/oyleler

öylesine çalışmalar. Contribute to gokturksm/oyleler development by creating an account on GitHub.

#

I mean I thought of using python matrices, but thought it would be a lot harder to deal with.

#

I need to change some specific roads, and update my dataframe etc

#

I need to get that data as well, and I'm aware that pandas probably won't work for a live demo

#

But for now, i just want to know whether I'm doing things wrongly, or any other ideas you guys have considering my code? I'm not the best, still learning. So there is that

deft harbor Jan 2, 2020, 7:39 PM

#

Pandas becomes real slow when you start appending stuff to a dataframe over and over.
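
A sketch of the usual workaround, assuming repeated appends really are the bottleneck in that script (not verified against the linked code). The column names road_id and speed are invented for illustration.

```python
import pandas as pd


def build_frame(n_rows):
    """Collect plain dicts first, then build the DataFrame once."""
    rows = []
    for i in range(n_rows):
        # Appending to a Python list is cheap; growing a DataFrame
        # row by row copies the existing data every time.
        rows.append({"road_id": i, "speed": 50 + i % 10})
    return pd.DataFrame(rows)


frame = build_frame(1000)
print(frame.shape)  # (1000, 2)
```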

#

I'm not sure I follow everything in the code, as the comments aren't English.

lusty coral Jan 2, 2020, 7:44 PM

#

I could translate them if it would help

#

I did it nonetheless 😄

#

https://github.com/gokturksm/oyleler/blob/master/calisma_kisaltma.py

GitHub

gokturksm/oyleler

öylesine çalışmalar. Contribute to gokturksm/oyleler development by creating an account on GitHub.

#

This one is the latest one. I tried using apply and update. But I think I'm doing something wrong

#

I need to update my data and calculate some values, then I decide some pricing stuff. I have not implemented pricing stuff because I need to resolve my issues first I think

delicate carbon Jan 2, 2020, 8:17 PM

#

basically im just looking for a "simple" solution. When I read about implementation of ML and theyre like this is how you do it:

📎 unknown.png

#

im like okay i dont know what any of that means lol

#

i dont have that background

#

but i can read code and understand that so i have to study actual implementation logic

lusty coral Jan 2, 2020, 8:34 PM

#

Search n log n in Google. Codeproject.com has a good explanation

oblique belfry Jan 2, 2020, 9:04 PM

#

Has anyone used autoencoders to great effect before? I am thinking about using one in a signal processing problem, but all the hoopla I see about them is with MNIST examples.

There are a lot of academic papers using them, but I am not so sure if they work well in the real world.

velvet thorn Jan 2, 2020, 11:23 PM

#

@delicate carbon that's entropy

#

basically, you want to create rules that allow you to split a dataset into subsets where each subset has "similar" values of the variable you're trying to predict

#

higher "similarity" is lower entropy
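
A minimal sketch of that idea in code, using Shannon entropy in bits. The labels and the helper names entropy and split_entropy are made up for the example. A real decision tree library handles this internally.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        (count / total) * log2(total / count)
        for count in Counter(labels).values()
    )


def split_entropy(groups):
    """Size-weighted average entropy of the subsets produced by a rule."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * entropy(group) for group in groups)


mixed = ["phish", "legit", "phish", "legit"]
print(entropy(mixed))  # 1.0, a 50/50 mix is as uncertain as it gets

# A rule that separates the classes perfectly leaves no uncertainty.
perfect = [["phish", "phish"], ["legit", "legit"]]
print(split_entropy(perfect))  # 0.0
```

A tree picks the rule whose split lowers the entropy the most, which is the difference between the two numbers above.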

compact bluff Jan 3, 2020, 1:34 AM

#

I have a csv with the following columns: update_rule, K, N, L, attack, time_taken, and eve_score (where attack can be "none" or "geometric"). I want to do two things:

average time_taken and eve_score for each row with the same update_rule, K, N, L, and attack
get rid of the attack column and replace eve_score with eve_score_no_attack and eve_score_geometric

at the end these should be the columns: update_rule, K, N, L, time_taken, eve_score_no_attack, eve_score_geometric. does anyone know how I can do this? I've been looking at the pandas api but I don't really care about which library I use

velvet thorn Jan 3, 2020, 1:58 AM

#

that's basically a groupby mean

compact bluff Jan 3, 2020, 2:03 AM

#

I don't see how it can be used in this case. can you explain a little more?

velvet thorn Jan 3, 2020, 2:04 AM

#

do you know how groupby works?

#

if I understand you correctly, what you want to do is df.groupby(['update_rule', 'K', 'N', 'L', 'attack']).mean() (for the first part)

compact bluff Jan 3, 2020, 2:12 AM

#

ahhh that probably will do what I want

#

do you know how to approach the second part?

velvet thorn Jan 3, 2020, 2:16 AM

#

I'm not really sure what you want there

compact bluff Jan 3, 2020, 2:23 AM

#

attack isn't numeric which makes it a little harder to work with in analysis. I want to get rid of that column. the start of my processed csv is currently:

update_rule,K,N,L,attack,time_taken,eve_score
anti_hebbian,4.0,4.0,4.0,geometric,12.9164006,79.53125
anti_hebbian,4.0,4.0,4.0,none,11.625815399999999,78.90625

I want it to be:

update_rule,K,N,L,time_taken,eve_score_no_attack,eve_score_geometric
anti_hebbian,4.0,4.0,4.0,12.271108,78.90625,79.53125

note that the updated time_taken comes from the average of the two previous time takens

velvet thorn Jan 3, 2020, 2:27 AM

#

if I understand correctly, you want to groupby attack then

#

and take the mean

#

and then join onto the result of the first part
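One way this could look in pandas, as a sketch rather than the only approach: it uses the two sample rows posted above, averages time_taken per configuration, spreads eve_score into one column per attack value with `unstack`, and joins the two results. Using `unstack` in place of a second manual groupby is a choice made here, not something stated in the conversation.

```python
import io
import pandas as pd

# The two sample rows from the question.
csv_text = """update_rule,K,N,L,attack,time_taken,eve_score
anti_hebbian,4.0,4.0,4.0,geometric,12.9164006,79.53125
anti_hebbian,4.0,4.0,4.0,none,11.625815399999999,78.90625
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ["update_rule", "K", "N", "L"]

# Mean time over every run of a configuration, regardless of attack.
time_mean = df.groupby(keys)["time_taken"].mean()

# Mean eve_score per attack, then turn the attack values into columns.
scores = (
    df.groupby(keys + ["attack"])["eve_score"].mean()
      .unstack("attack")
      .rename(columns={"none": "eve_score_no_attack",
                       "geometric": "eve_score_geometric"})
)

result = scores.join(time_mean).reset_index()
result = result[keys + ["time_taken", "eve_score_no_attack", "eve_score_geometric"]]
print(result)
```

`result.to_csv("processed.csv", index=False)` would write it back out; the file name is a placeholder.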

wind marlin Jan 3, 2020, 2:12 PM

#

Is there a way to use Kite for Colab?

tawdry rose Jan 3, 2020, 3:26 PM

#

im new to pandas

#

where can i find

#

sample datas

oblique belfry Jan 3, 2020, 4:10 PM

#

datasets?

#

Try Kaggle.

tawdry rose Jan 3, 2020, 4:52 PM

#

hm okay

#

what is difference between ML and deep learning

desert oar Jan 3, 2020, 6:44 PM

#

@tawdry rose deep learning is machine learning with certain kinds of "deep" neural networks

tawdry rose Jan 3, 2020, 6:49 PM

#

hm thanks 🙂

sand gyro Jan 3, 2020, 7:33 PM

#

In my code I convert a csv file into a pandas DataFrame and then adjust the data in the columns.
I create two Excel workbooks before the if statement.
I have two issues in my code. The first: I have tried a number of solutions, but I cannot add the DataFrame rows to the right Excel sheet according to the if statement. The if statement sets the conditions for adding a row to the accepted contacts; rows that do not pass the condition should be added to the rejected contacts workbook. What I am getting instead is that the full contact list is repeated in the accepted contacts workbook and added once to the rejected contacts.

for column, row in df.iterrows():
  row_check = row.isna()
  if (not row_check[0] and not row_check[1]) and ((not row_check[2] and not row_check[3]) or (not row_check[2]) and (not row_check[6]) or (not row_check[16]) and (not row_check[12] and not row_check[13] and not row_check[14] and row_check[15]) or (not row_check[8] and not row_check[9] and not row_check[10] and not row_check[11] and (not row_check[7] or not row_check[17]))):
       for r in dataframe_to_rows(df, index=False, header=False):
           ws.append(r)
else:
       for r in dataframe_to_rows(df, index=False, header=False):
           ws2.append(r)

wb.save("Accepted Contacts.xlsx")
wb2.save("Rejected Contacts.xlsx")

velvet thorn Jan 3, 2020, 10:19 PM

#

that...does not look good.

#

iterrows is generally a dodgy way to do things
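For what it's worth, two things in the posted loop would explain the symptom: the `else:` lines up with the `for`, so Python treats it as a for/else that runs once after the loop finishes, and both branches call `dataframe_to_rows(df, ...)`, which appends the whole frame instead of the current row. A sketch of an iterrows-free alternative follows. The column names and the acceptance rule are hypothetical stand-ins for the poster's positional checks.

```python
import pandas as pd

# Hypothetical contacts; these column names are placeholders.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", None, "Dee"],
    "last_name":  ["Lee", "Ray", "Fox", "Kim"],
    "email":      ["ann@example.com", None, "c@example.com", None],
    "phone":      [None, "555-0101", "555-0102", None],
})

present = df.notna()  # True wherever a cell holds a value

# Simplified stand-in for the long condition:
# both name fields filled in, plus at least one way to reach the person.
accept = (
    present["first_name"]
    & present["last_name"]
    & (present["email"] | present["phone"])
)

accepted = df[accept]    # rows that pass
rejected = df[~accept]   # every other row, each exactly once

# Writing the files needs an Excel engine such as openpyxl installed:
# accepted.to_excel("Accepted Contacts.xlsx", index=False)
# rejected.to_excel("Rejected Contacts.xlsx", index=False)
print(len(accepted), len(rejected))
```

Because the mask is computed once for the whole frame, each row lands in exactly one of the two outputs.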

granite sierra Jan 4, 2020, 3:04 PM

#

some theory help.

I am creating a neural net with a softmax activation function as the output. If I'm doing cross entropy, do I have to one-hot encode the output of the softmax before computing the cross entropy? If I do have to one-hot encode, how does one go about doing this?

#

or does it come out of the softmax as a one-hot encoded vector?

silent swan Jan 4, 2020, 6:58 PM

#

cross entropy is computed between distributions

#

in practice, your labels will be a length-N vector of class IDs

#

and your predictions will be an NxK matrix of either prediction probabilities or logits
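A small NumPy sketch of that layout, with made-up numbers: `labels` is a length-N vector of class IDs, `logits` is N x K, and the softmax output is used as-is (it is not one-hot encoded). The function names are illustrative.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    n = labels.shape[0]
    picked = probs[np.arange(n), labels]  # probability of the true class per row
    return float(-np.log(picked).mean())

# N = 3 samples, K = 3 classes (arbitrary example values).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 1, 2])

probs = softmax(logits)   # each row sums to 1
loss = cross_entropy(probs, labels)
print(probs, loss)
```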

granite sierra Jan 4, 2020, 7:51 PM

#

ok and the output of the softmax will be the x_pred in the cross entropy

#

with x_actual being the comparison array, right?

#

correct?

silent swan Jan 4, 2020, 9:06 PM

#

that sounds correct

granite sierra Jan 4, 2020, 10:51 PM

#

a little more theory help

#

@silent swan to update the weights on the backpropagation, do I take the derivative of the activation function (softmax in this scenario) or do I take the derivative of the cost function (cross entropy in this scenario).

velvet thorn Jan 4, 2020, 10:56 PM

#

the latter.

#

well, eventually the former too

#

in terms of backpropagation, there is no difference between an "actual" layer and an activation layer

granite sierra Jan 4, 2020, 11:02 PM

#

when would the former be needed?

velvet thorn Jan 4, 2020, 11:10 PM

#

well, you need to propagate backwards throughout the network, right?

granite sierra Jan 4, 2020, 11:11 PM

#

yea

velvet thorn Jan 4, 2020, 11:11 PM

#

you can think of each layer as applying a function to the previous layer's output

#

starting with the input

#

that's the forward pass.

#

in this case, the last layer is the softmax

#

the result of that then goes into calculating the loss with reference to the ground truth

#

as well as the gradient

#

so now you have loss and gradient

#

you propagate the gradient backwards through the last layer (there are no weights to update there)

#

but nevertheless you must apply the chain rule so the gradient's magnitude is correct

#

your second last layer is probably a FC layer or something

#

and that has weights to update
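A compact sketch of the forward and backward pass described above, assuming one fully connected hidden layer with a sigmoid activation and a softmax output trained with cross entropy. The sizes, data, and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 samples, 3 features, 5 hidden units, 3 classes.
X = rng.normal(size=(4, 3))
Y = np.eye(3)[np.array([0, 2, 1, 2])]  # one-hot targets

W1 = rng.normal(scale=0.1, size=(3, 5))
b1 = np.zeros(5)
W2 = rng.normal(scale=0.1, size=(5, 3))
b2 = np.zeros(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)   # hidden layer output
    p = softmax(h @ W2 + b2)   # class probabilities
    return h, p

def loss_fn(p, Y):
    return float(-(Y * np.log(p)).sum(axis=1).mean())

def backward(X, Y, h, p, W2):
    n = X.shape[0]
    d_logits = (p - Y) / n          # loss and softmax derivatives combined
    dW2 = h.T @ d_logits            # gradient for the output FC weights
    db2 = d_logits.sum(axis=0)
    d_h = d_logits @ W2.T           # chain rule: push gradient back to hidden output
    d_z1 = d_h * h * (1.0 - h)      # through the sigmoid (no weights here)
    dW1 = X.T @ d_z1                # gradient for the hidden FC weights
    db1 = d_z1.sum(axis=0)
    return dW1, db1, dW2, db2

lr = 0.5
h, p = forward(X, W1, b1, W2, b2)
loss_before = loss_fn(p, Y)
for _ in range(200):
    h, p = forward(X, W1, b1, W2, b2)
    dW1, db1, dW2, db2 = backward(X, Y, h, p, W2)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
h, p = forward(X, W1, b1, W2, b2)
loss_after = loss_fn(p, Y)
print(loss_before, loss_after)
```

The activation steps have no weights of their own, but their derivatives still appear in the chain, which is the point made above.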

jolly briar Jan 4, 2020, 11:12 PM

#

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

Matt Mazur

Mazur

A Step by Step Backpropagation Example

Background Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example…

#

i think i've seen this used as a concrete example for back-propagation before

velvet thorn Jan 4, 2020, 11:13 PM

#

that one is great

#

I've seen it

jolly briar Jan 4, 2020, 11:13 PM

#

👌

granite sierra Jan 4, 2020, 11:13 PM

#

Ok thanks, I'll take a look at that, FC layer?
I thought you started the back prop from output backwards and not from the input layer again

velvet thorn Jan 4, 2020, 11:13 PM

#

fully connected

#

yes, that is correct

#

that's why it's backpropagation

granite sierra Jan 4, 2020, 11:14 PM

#

ok, sorry I got confused when you said you start from the input, all good

velvet thorn Jan 4, 2020, 11:14 PM

#

the point is using the chain rule to get the derivative of the loss function with respect to the current layer, right?

granite sierra Jan 4, 2020, 11:14 PM

#

yea

#

but surely by that logic, you would need the derivative of the softmax?

#

since that would be the "current layer" when starting the backprop

velvet thorn Jan 4, 2020, 11:17 PM

#

eventually

#

but remember

#

the value of the loss function is derived from the output of the softmax

#

so the very first gradient you can calculate is of the loss with respect to the output of the softmax

#

is that what you mean?

granite sierra Jan 4, 2020, 11:18 PM

#

yea

velvet thorn Jan 4, 2020, 11:18 PM

#

yeah, then you go backwards

granite sierra Jan 4, 2020, 11:22 PM

#

so output of softmax is used to calculate the cross entropy, and then I take the derivative of the CE to get the gradient, I compare the previous weights. Do i get the derivative of the sigmoid in the hidden layer then

velvet thorn Jan 4, 2020, 11:24 PM

#

yes, unless you don't want to train it

granite sierra Jan 4, 2020, 11:25 PM

#

whats the point of that then xD

velvet thorn Jan 4, 2020, 11:26 PM

#

frozen layers?

granite sierra Jan 4, 2020, 11:26 PM

#

oh yea sure

#

I only have 1 hidden layer though

#

I won't have frozen layers anyway

#

Ok I'll try and figure it out

#

so far what I've found is I struggle more with the python implementation than the math/theory

hexed rampart Jan 4, 2020, 11:32 PM

#

i was running my neural network and i got a DSATray.exe error talking about a "gaurd page" and the ide said this: Process finished with exit code -1073740791 (0xC0000409). What happened?

granite sierra Jan 4, 2020, 11:49 PM

#

@velvet thorn my issue with that article is that the example it gives, it uses squared error and sigmoid in a multi class classification, and the code/math is much simpler for both of those over softmax with cross entropy

granite sierra Jan 5, 2020, 12:26 AM

#

For cross entropy, where do I get the actual probabilities? Is that just the target value? Because the target value won't be a vector of the same size as the predicted probabilities

velvet thorn Jan 5, 2020, 12:27 AM

#

yes

#

a perfect classifier would predict 1.0 for the actual value

#

and 0.0 for everything else

granite sierra Jan 5, 2020, 12:27 AM

#

oh

#

so for example if I have 5 options, and the target value is num_1, the vector will be [1, 0, 0, 0, 0]

velvet thorn Jan 5, 2020, 12:29 AM

#

1.0, 0.0... actually

#

but not that important

#

wait, do you mean the target

#

or the prediction?

granite sierra Jan 5, 2020, 12:29 AM

#

target

velvet thorn Jan 5, 2020, 12:29 AM

#

yes

#

then int

granite sierra Jan 5, 2020, 12:30 AM

#

ok how would I get the target value as vector, like is this where one-hot encoding comes in?

velvet thorn Jan 5, 2020, 12:30 AM

#

uh

#

you're rolling

#

your own NN right

#

then yes
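A short sketch of one-hot encoding integer targets with NumPy when rolling your own network. The helper name `one_hot` and the example targets are illustrative; class index 0 here corresponds to the "num_1" case mentioned above.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class IDs into rows of a (len(labels), num_classes) matrix."""
    return np.eye(num_classes)[labels]

targets = np.array([0, 3, 1])   # three samples, five possible classes
encoded = one_hot(targets, 5)
print(encoded)

# Going back from one-hot rows to class IDs:
recovered = np.argmax(encoded, axis=1)
```

Each row of `encoded` can then be compared against the matching row of softmax probabilities.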

granite sierra Jan 5, 2020, 12:30 AM

#

yea

#

oki got it, I feel stupid af lmao

#

gm, question: if I wanted to optimise the gradient descent, is there a way I could use Adam without having to code it myself? Would that entail wrapping my model in a scikit-learn wrapper?

velvet thorn Jan 5, 2020, 1:17 AM

#

uh...well.

#

not sure how you want to do that

#

it's not that hard to code though!
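To give a sense of scale, here is a minimal sketch of the Adam update rule in NumPy, using the commonly cited default hyperparameters. The class name and the toy objective are illustrative; this is not tied to any particular network code in this conversation.

```python
import numpy as np

class Adam:
    """Adam state for ONE parameter array; create one instance per array."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # running mean of gradients
        self.v = None   # running mean of squared gradients
        self.t = 0      # step counter, used for bias correction

    def step(self, param, grad):
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected estimates
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy problem: minimise f(w) = sum((w - 3)^2), whose gradient is 2 * (w - 3).
w = np.zeros(2)
opt = Adam(lr=0.1)
for _ in range(500):
    grad = 2 * (w - 3.0)
    w = opt.step(w, grad)
print(w)
```

In a hand-rolled network, the plain `W -= lr * dW` update would be replaced by `W = opt_W.step(W, dW)` with a separate optimiser object for each weight and bias array.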

granite sierra Jan 5, 2020, 1:27 AM

#

fair, maybe I'll just stick with sgd, it doesnt have to be a good classifier, just a classifier

tawdry rose Jan 5, 2020, 5:15 PM

#

how can i plot correlation heatmap

#

without seaborn

#

with matplotlib

worn stratus Jan 5, 2020, 5:16 PM

#

It might be worth having a look at seaborn's source code - no idea how readable or not it is

tawdry rose Jan 5, 2020, 5:17 PM

#

btw can i show correlations with any other plot

worn stratus Jan 5, 2020, 5:29 PM

#

I've implemented a simple NN with one hidden layer, and I think it works, but I'm not sure. I'm just doing the most basic sigmoid/sigmoid_prime stuff - and it doesn't handle xor, should it?

silent swan Jan 5, 2020, 6:41 PM

#

just compute a correlation matrix, and then use plt.imshow
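A sketch of that suggestion using only pandas and matplotlib. The DataFrame is random placeholder data, and the colour map and tick labelling are choices made here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: three independent columns plus one tied closely to "a".
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
df["d"] = 2 * df["a"] + rng.normal(scale=0.1, size=100)

corr = df.corr()   # Pearson correlation matrix (columns x columns)

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")

# Label both axes with the column names.
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation")

# plt.show()  # uncomment when running as a script to open the window
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the colour scale comparable between datasets.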

granite sierra Jan 5, 2020, 10:40 PM

#

I have a question, this guy calculates cross entropy, but doesn't use it anywhere in his backprop.

https://beckernick.github.io/neural-network-scratch/

Is that wrong or did he get around it somehow that I am unable to see? I thought the point of the cost function was to see how far off the weights were, but how is he training if he isnt using the cost function values

nick becker

Building a Neural Network from Scratch in Python and in TensorFlow

Neural Networks, Hidden Layers, Backpropagation, TensorFlow

velvet thorn Jan 5, 2020, 10:48 PM

#

@worn stratus if you have a hidden layer, it should be able to handle nonlinearities

worn stratus Jan 5, 2020, 10:48 PM

#

Yeah, it turns out it does/did - but I was just bad and read 0.05 as 0.5

velvet thorn Jan 5, 2020, 10:50 PM

#

@granite sierra doesn't he do that under "Implementing Backpropagation"

granite sierra Jan 5, 2020, 10:50 PM

#

well he does something, but he doesnt use the cost function value

#

he literally only uses the cost function value for printing the current loss

#

@velvet thorn do you understand what he is doing, is it correct?

velvet thorn Jan 5, 2020, 10:53 PM

#

but

#

why do you say that?

#

output_error_signal = (output_probs - training_labels) / output_probs.shape[0]

error_signal_hidden = np.dot(output_error_signal, layer2_weights_array.T)
error_signal_hidden[hidden_layer <= 0] = 0

granite sierra Jan 5, 2020, 10:54 PM

#

sure but thats not cross entropy?

#

he has a cross entropy function higher up that he calculates loss

velvet thorn Jan 5, 2020, 10:56 PM

#

oh

#

I see your confusion

#

this is why naming functions well is important

#

cross_entropy_softmax_loss_array

#

you mean this, right?

#

that calculates the "average" loss

granite sierra Jan 5, 2020, 10:57 PM

#

yea I think so, let me check

velvet thorn Jan 5, 2020, 10:57 PM

#

but output_error_signal is what you actually use for backpropagation

#

because remember

granite sierra Jan 5, 2020, 10:57 PM

#

oh

velvet thorn Jan 5, 2020, 10:57 PM

#

when you apply the loss function to the predictions and target

#

you should get one value per pair

#

pretty bad naming TBH

#

and that's old code

#

maybe I should do a blog on this

granite sierra Jan 5, 2020, 10:57 PM

#

why is it old? (might be a dumb question)

velvet thorn Jan 5, 2020, 10:57 PM

#

I thought there were more resources available

#

because it uses xrange

#

which is a Python 2 function

#

not a dumb question

granite sierra Jan 5, 2020, 10:58 PM

#

oh sure

#

yea but the rest of the code seems relatively ok

#

minus the bad naming

velvet thorn Jan 5, 2020, 11:00 PM

#

I only briefly glanced through it

granite sierra Jan 5, 2020, 11:00 PM

#

so he uses loss function just to calculate the average loss and prints that out every epoch
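A quick numerical check of why the printed loss value never needs to be fed into backprop: for softmax followed by mean cross entropy, the gradient with respect to the logits works out to `(output_probs - training_labels) / N`, which is the `output_error_signal` line quoted above. The data below is random and the helper names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(logits, one_hot):
    p = softmax(logits)
    return float(-(one_hot * np.log(p)).sum(axis=1).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))            # 4 samples, 3 classes
training_labels = np.eye(3)[[0, 2, 1, 2]]   # one-hot targets

# Analytic gradient, same form as the article's output_error_signal.
output_probs = softmax(logits)
analytic = (output_probs - training_labels) / output_probs.shape[0]

# Numerical gradient of the loss by central differences.
numeric = np.zeros_like(logits)
eps = 1e-6
for i in range(logits.shape[0]):
    for j in range(logits.shape[1]):
        up = logits.copy()
        up[i, j] += eps
        down = logits.copy()
        down[i, j] -= eps
        numeric[i, j] = (mean_cross_entropy(up, training_labels)
                         - mean_cross_entropy(down, training_labels)) / (2 * eps)

max_gap = float(np.abs(analytic - numeric).max())
print(max_gap)
```

So the loss function is still what is being differentiated; only its derivative shows up in the training code, while its value is useful for monitoring.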

velvet thorn Jan 5, 2020, 11:00 PM

#

I just woke up so

#

reading code is difficult

granite sierra Jan 5, 2020, 11:01 PM

#

which side of the world you on, east or west

#

thanks for the help though

velvet thorn Jan 5, 2020, 11:02 PM

#

well

#

I suppose

#

that depends on how you rotate the globe, right?

#

but east

granite sierra Jan 5, 2020, 11:08 PM

#

Fair, thanks for help, muchacho gracias

tardy pasture Jan 6, 2020, 3:01 AM

#

Anyone has experience with using Gaussian process for regression problems?

#

sorry if i posted this in the channel

lapis sequoia Jan 6, 2020, 3:13 AM

#

!ask

arctic wedgeBOT Jan 6, 2020, 3:13 AM

#

ask

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

lapis sequoia Jan 6, 2020, 3:22 AM

#

the most elaborate 0 to DL approach covering theory, math and code https://d2l.ai/

#

^ can some mod pin this

tardy pasture Jan 6, 2020, 3:26 AM

#

Thank you very much

native stag Jan 6, 2020, 2:07 PM

#

https://github.com/brylevkirill/notes

GitHub

brylevkirill/notes

Resources to learn more about Machine Learning and Artificial Intelligence - brylevkirill/notes

native stag Jan 6, 2020, 2:27 PM

#

hours upon hours of resources

granite sierra Jan 6, 2020, 6:51 PM

#

Hey guys, so I have a neural net, multiclass, if I have 3 outputs and 5 hiddens, my softmax is returning 3x5, so like [0.15911229 0.66902292 0.17186479] but 5x

#

so it gives this

#

[[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]
[0.19122457 0.36888354 0.4398919 ]]

#

does anyone know why it might do this?

granite sierra Jan 6, 2020, 9:02 PM

#

nvm all good, I fixed it

haughty storm Jan 6, 2020, 9:13 PM

#

How can I make my for loops closer to O(logN) efficiency?

worn stratus Jan 6, 2020, 9:31 PM

#

Replace them with a binary search?

haughty storm Jan 6, 2020, 9:33 PM

#

ok

#

sure

#

i mean what are good approaches to doin that
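One concrete case, as a sketch: if the loop is scanning a sorted list for a value, the standard library's `bisect` module does the same lookup in O(log N) steps. Whether this applies depends on what the loops actually do, which the question does not say; the data here is made up.

```python
import bisect

def contains_linear(sorted_values, target):
    """O(N): look at every element until the target turns up."""
    for value in sorted_values:
        if value == target:
            return True
    return False

def contains_binary(sorted_values, target):
    """O(log N): halve the search range each step. Requires sorted input."""
    i = bisect.bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target

values = list(range(0, 1_000_000, 2))   # sorted even numbers

print(contains_binary(values, 123456))  # even, so present
print(contains_binary(values, 123457))  # odd, so absent
```

For repeated membership tests on unsorted data, converting to a `set` once is another common option, with average O(1) lookups.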

uncut shadow Jan 6, 2020, 10:39 PM

#

Hey. I have 2 questions which I don't quite understand:

How to turn words and text into code machine can understand (for chatbot, so bot can create a new meaningful sentence).
After processing etc. the network will return a number. How can I turn this number into a word again?

#

I'm trying to make a deep learning chatbot, but these 2 things are impossible for me to solve

velvet thorn Jan 7, 2020, 12:28 AM

#

if you don’t know how to do those

#

you probably need to do a lot of work.

#

a lot.

#

look up tokenisation and embedding @uncut shadow

#

that’s a good start
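A bare-bones sketch of both directions asked about: words to numbers (tokenise, then look up IDs and embedding vectors) and numbers back to words (take the index of the highest score and look it up in the vocabulary). The embedding table and the "network output" below are random placeholders; a real chatbot would learn them.

```python
import numpy as np

corpus = ["I waited at the signal", "He gave me a signal"]

def tokenize(sentence):
    """Simplest possible tokeniser: lowercase and split on whitespace."""
    return sentence.lower().split()

# Vocabulary: every distinct token gets an integer ID.
vocab = sorted({tok for sentence in corpus for tok in tokenize(sentence)})
word_to_id = {word: i for i, word in enumerate(vocab)}
id_to_word = {i: word for word, i in word_to_id.items()}

def encode(sentence):
    return [word_to_id[tok] for tok in tokenize(sentence)]

def decode(ids):
    return " ".join(id_to_word[i] for i in ids)

ids = encode("He gave me a signal")

# Embedding: one vector per vocabulary entry (random here, learned in practice).
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))
vectors = embedding_table[ids]   # what the network would actually consume

# Going back: a network typically emits one score per vocabulary entry.
scores = rng.normal(size=len(vocab))   # stand-in for real network output
predicted_word = id_to_word[int(np.argmax(scores))]
print(ids, decode(ids), predicted_word)
```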

tardy pasture Jan 7, 2020, 12:47 AM

#

How do I use a GaussianProcessRegressor() for time series forecasting?

drowsy grove Jan 7, 2020, 3:57 AM

#

Hi guys, I am wondering if I can ask a simple pandas question

#

Right now I have this small sample dataset with order details:

#

My task is to find out the ONE order here that "has been processed by mistake".

📎 unknown.png

#

My train of thought is: since it's a given that it has been processed, I will begin with all the orders with COMPLETED status.

#

So here is what I got

📎 unknown.png

#

I know I can see that there is only one row with Inactive status. But to be accurate, I still use groupby to find out that the anomaly can be shown by using groupby(STATUS) on the result of the last step.

#

And eventually I can get the row I am after with this

📎 unknown.png

#

In total I got:
c = df_merge[df_merge.ORDER_STATUS_CD == 'S']
c.groupby(['STATUS'])['ORDER_ID'].size()
df_merge[(df_merge.STATUS == 'Inactive') & (df_merge.ORDER_STATUS_CD == 'S')]

#

I feel like this is way too verbose and I have to use my eye to identify which groupby result group ends up being 1. I am wondering if there is a way I can directly utilize the fact that there will be 1 group with a size of 1 in the 2nd step. And directly get that row I need from there.

#

Instead of checking the result 'Oh, I should focus on this group as its size is 1' and then used the 3rd line of code to finally get the result.

#

Hope I made myself clear. Sorry for being verbose here.

#

Thanks.
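A sketch of one way to skip the manual inspection step: `groupby(...).filter(...)` keeps only the rows whose group satisfies a condition, so the row belonging to the size-1 group can be pulled out directly. The column names come from the code above; the rows are invented, since the real data is only in the screenshots.

```python
import pandas as pd

# Invented stand-in for the merged frame.
df_merge = pd.DataFrame({
    "ORDER_ID":        [1, 2, 3, 4],
    "ORDER_STATUS_CD": ["S", "S", "S", "P"],
    "STATUS":          ["Active", "Active", "Inactive", "Active"],
})

completed = df_merge[df_merge["ORDER_STATUS_CD"] == "S"]

# Keep rows whose STATUS group contains exactly one row.
odd_one_out = completed.groupby("STATUS").filter(lambda g: len(g) == 1)

# Equivalent without a Python-level lambda: broadcast each group's size to its rows.
sizes = completed.groupby("STATUS")["ORDER_ID"].transform("size")
same_result = completed[sizes == 1]
print(odd_one_out)
```

This still relies on the assumption that the anomaly is the only row in its group, which the data has to confirm.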

lapis sequoia Jan 7, 2020, 4:52 AM

#

@drowsy grove expected input and desired output please.. rephrase your question

#

how do you define an order as being 'processed by mistake'

uncut shadow Jan 7, 2020, 7:32 AM

#

Well, I know what is word tokenization and embedding, but (iirc) it only allows to match 2 words so for example alice, wonderland.

lapis sequoia Jan 7, 2020, 7:57 AM

#

what do you mean.. elaborate

#

what do you think embeddings are?

uncut shadow Jan 7, 2020, 8:22 AM

#

I mean, I'm thinking about how to make a chatbot in Python which would create its own sentences based on sentences it has seen. The problem is how to make sentences. It can check if 2 words have the same (or close) meaning, but idk if it's going to work with sentences.

lapis sequoia Jan 7, 2020, 8:45 AM

#

then you don't understand what embeddings are

#

let's take a look at two sentences, and tell me what you observe from them

#

I waited at the signal

#

vs

#

He gave me a signal

#

what's different here?

uncut shadow Jan 7, 2020, 8:46 AM

#

Well, everything, but signal is different

#

And the meaning of signal is technically the same in both sentences

lapis sequoia Jan 7, 2020, 8:53 AM

#

the semantics changed depending on the context

#

embeddings are the same way.. they are a representation

#

and some embeddings are context aware.. so the word signal when converted to an embedding is not the same and changes with the context

#

you can hence arrive at sentence level context aware embeddings

#

understand?

#

Your objective is to have your chatbot make responses.. one similar system that makes responses based on context is widely called Question Answering..

#

Gmail smart reply for example

#

And when you're using something like Gmail's smart compose.. that's something called NLG, that uses context in realtime to update its predictions... returning the highest semantically similar prediction based on its trained corpus

#

@uncut shadow andddd you disappeared

uncut shadow Jan 7, 2020, 9:01 AM

#

@lapis sequoia I'm back :P

lapis sequoia Jan 7, 2020, 9:02 AM

#

read and ask questions if you have any

uncut shadow Jan 7, 2020, 9:04 AM

#

Ok, I'll look into it and I'll try to google something, thanks!

lapis sequoia Jan 7, 2020, 9:09 AM

#

semantic similarity with sentence level embeddings is the concept I just told you about

pearl kiln Jan 7, 2020, 11:11 AM

#

any machine learning tutorial on youtube

#

?????

#

good

drowsy grove Jan 7, 2020, 12:24 PM

#

@lapis sequoia My bad. Reading my question again, it does seem that the basic desired output is missing.

#

Input is the 15rows * 11columns dataframe.

#

Desired output is the record/row of the order that was 'processed by mistake'.

lapis sequoia Jan 7, 2020, 12:29 PM

#

that's fine..

#

but how do you define a row as having been processed by mistake

drowsy grove Jan 7, 2020, 12:29 PM

#

As for what is defined as 'processed by mistake', that is a very good question. I honestly do not know. The given information is that there is indeed one order in this dataframe that is processed by mistake.

lapis sequoia Jan 7, 2020, 12:29 PM

#

that's your filter condition

drowsy grove Jan 7, 2020, 12:29 PM

#

Exactly.

lapis sequoia Jan 7, 2020, 12:29 PM

#

ohk.. so you don't know

drowsy grove Jan 7, 2020, 12:30 PM

#

So I've been trying to deduce it from the given.

lapis sequoia Jan 7, 2020, 12:30 PM

#

show me the df.. let's see..

#

and is this homework?

drowsy grove Jan 7, 2020, 12:30 PM

#

Exactly.

#

Nope...like a small project haha.

#

It's a merged dataframe.

#

Oops, that looks awful.

lapis sequoia Jan 7, 2020, 12:31 PM

#

of course it does...

drowsy grove Jan 7, 2020, 12:31 PM

#

📎 unknown.png

lapis sequoia Jan 7, 2020, 12:31 PM

#

what does active inactive mean

drowsy grove Jan 7, 2020, 12:31 PM

#

That's exactly where I got stumped.

#

What I was trying to do was find the one record that is an anomaly.

#

And that must be the one.

#

Basically I'm deducing my filter condition from the result.

lapis sequoia Jan 7, 2020, 12:33 PM

#

let's try to be less verbose.. and more objective

drowsy grove Jan 7, 2020, 12:33 PM

#

Since there's only 1 inactive record that shows "COMPLETED". I thought it must be it.

#

That's a huge problem of mine. Verbosity.

#

IMO, 'processed by mistake' consists of 2 parts: 'processed' and 'by mistake'

lapis sequoia Jan 7, 2020, 12:34 PM

#

let's see what can be your possible conditions based on the fact that you don't know the information about the fields you have

drowsy grove Jan 7, 2020, 12:34 PM

#

Sure!

lapis sequoia Jan 7, 2020, 12:34 PM

#

im just waiting for you to let me

drowsy grove Jan 7, 2020, 12:34 PM

#

My bad. Go on.

lapis sequoia Jan 7, 2020, 12:35 PM

#

Status of User Inactive, but Order Status CD shows as S

#

Order Status Cancelled but Order Status CD shows as S or P

#

No order date or No Order ID but order status CD shows as P or S

#

that's it

#

from what I can gather from this
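The three candidate conditions above could be written as boolean masks and counted in one go, roughly as below. `STATUS`, `ORDER_STATUS_CD`, and `ORDER_ID` appear in the code earlier in the thread; `ORDER_STATUS` and `ORDER_DATE` are assumed column names, and the rows are invented.

```python
import pandas as pd

# Invented rows; ORDER_STATUS and ORDER_DATE are assumed column names.
df_merge = pd.DataFrame({
    "ORDER_ID":        [101, 102, 103, 104, 105],
    "ORDER_DATE":      ["2019-12-01", "2019-12-02", "2019-12-03",
                        "2019-12-04", "2019-12-05"],
    "ORDER_STATUS":    ["Completed", "Completed", "Pending", "Cancelled", "Completed"],
    "ORDER_STATUS_CD": ["S", "S", "P", "C", "S"],
    "STATUS":          ["Active", "Inactive", "Active", "Active", "Active"],
})

processed = df_merge["ORDER_STATUS_CD"].isin(["S", "P"])

checks = {
    # 1. user inactive, yet the order code says S
    "inactive_user_completed":
        (df_merge["STATUS"] == "Inactive") & (df_merge["ORDER_STATUS_CD"] == "S"),
    # 2. order cancelled, yet the code says S or P
    "cancelled_but_processed":
        (df_merge["ORDER_STATUS"] == "Cancelled") & processed,
    # 3. no order ID or no order date, yet the code says S or P
    "missing_id_or_date":
        (df_merge["ORDER_ID"].isna() | df_merge["ORDER_DATE"].isna()) & processed,
}

hits = {name: df_merge[mask] for name, mask in checks.items()}
counts = {name: len(rows) for name, rows in hits.items()}
print(counts)
```

A check that returns exactly one row is a candidate for the mistaken order; checks returning zero rows can be ruled out for this dataset.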

drowsy grove Jan 7, 2020, 12:36 PM

#

Makes total sense. I failed to see the possibility of 2 and 3.

lapis sequoia Jan 7, 2020, 12:37 PM

#

it helps to not get in your own way

#

looking at a problem requires breaking down in to manageable pieces and solving the parts

drowsy grove Jan 7, 2020, 12:37 PM

#

You are right. Lemme look at the data again.

#

So I shall create filter for each situation and then compare the results?

jolly briar Jan 7, 2020, 12:38 PM

#

i've a script that i wanted to call using a bash alias as
alias s='path/to/script.py'

the issue is that the script requires some libraries that aren't in my base python install and it also calls some folders within the project (from dir import constants) that i think aren't working when trying to use it as an alias like this.

what's the best approach here?

lapis sequoia Jan 7, 2020, 12:39 PM

#

hmm I dunno man.. maybe head to #unix

jolly briar Jan 7, 2020, 12:39 PM

#

think it's a python question

#

seems like it'd be a pretty common pattern / problem

lapis sequoia Jan 7, 2020, 12:40 PM

#

I know.. but you're talking about a bash alias.. seems like they'd be able to help.. or you can ask in a help channel

jolly briar Jan 7, 2020, 12:40 PM

#

don't have a specific question - this channel seems ok... i don't think a bash alias is particularly exotic

#

I've just realised that I'm in the #data-science-and-ml channel not the general python channel 🤦, sorry

drowsy grove Jan 7, 2020, 1:25 PM

#

@lapis sequoia BTW, I ran the queries for all 3 scenarios and only the first scenario returned 1 row.

#

I think now I can safely assume that 1 is the filter condition for 'processed by mistake' right?

📎 unknown.png

#

Thanks. This looks also doable with SQL.

raven glacier Jan 7, 2020, 2:52 PM

#

Anyone with experience in learning Spark from scratch via PySpark?

drowsy grove Jan 7, 2020, 5:02 PM

#

Regarding exploratory data analysis: What can be done in the beginning when analyzing a dataset?

#

I can think of checking dimensions, missing values.

#

Run a df.info(), plot some visualization. Anything else?
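A few more first-pass checks that are commonly run alongside those, sketched on a tiny made-up frame. None of this is a fixed recipe; the column names are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing age, one missing city, and one duplicated row.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 25],
    "income": [40000, 52000, 61000, 58000, 40000],
    "city":   ["Oslo", "Lima", "Oslo", None, "Oslo"],
})

shape = df.shape                                 # rows x columns
dtypes = df.dtypes                               # are types what you expect?
missing = df.isna().sum()                        # missing values per column
summary = df.describe()                          # count/mean/std/quartiles for numeric columns
n_duplicates = int(df.duplicated().sum())        # fully repeated rows
city_counts = df["city"].value_counts(dropna=False)  # category frequencies, NaN included
corr = df.select_dtypes("number").corr()         # pairwise correlation of numeric columns

print(shape, missing.to_dict(), n_duplicates)
```

From here, histograms per column (`df.hist()`) and scatter plots of the most correlated pairs are typical next steps.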

brisk swallow Jan 7, 2020, 8:27 PM

#

Using Pandas and Numpy at the moment

velvet thorn Jan 7, 2020, 9:32 PM

#

@raven glacier me

raven glacier Jan 7, 2020, 9:33 PM

#

This book a good way?

http://shop.oreilly.com/product/0636920240303.do

Learning Spark

Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads effic...

velvet thorn Jan 7, 2020, 9:35 PM

#

well

#

if you like books, maybe? I didn't learn from a book

#

do you already have experience with pandas?

raven glacier Jan 7, 2020, 9:41 PM

#

Very basic knowledge

#

Where did you learn from

velvet thorn Jan 7, 2020, 9:42 PM

#

I was already experienced with pandas

#

the concepts are more or less the same

#

so I just started working with it

#

the main difference is SQL-like queries

#

and the ML model

raven glacier Jan 7, 2020, 9:48 PM

#

okay so you just learnt as you went along by googling etc

velvet thorn Jan 7, 2020, 9:48 PM

#

more or less, yeah

#

I knew enough to be able to do work

#

that was enough

raven glacier Jan 7, 2020, 9:55 PM

#

fair enough mate

uncut shadow Jan 7, 2020, 10:44 PM

#

Do you think making my own models from scratch (for example, only with NumPy and Matplotlib) makes sense? I want to jump into machine learning and things like deep learning so do you think implementing my own models from scratch makes sense?

velvet thorn Jan 7, 2020, 10:52 PM

#

yes

#

well, I would do it (have done it)

jolly briar Jan 7, 2020, 11:20 PM

#

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import pandas_datareader as pdr
plt.show()
plt.close("all")
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ts.plot()

the above isn't plotting, using ipython in a terminal (osx), i'm expecting a popup plot but am not getting one, anything obvious I'm missing?

#

ok, plt.show() just has to be at the end it seems 🙃

#

this seems kinda verbose, if i'm running ts.plot() in an interactive session it seems that i'm going to want to view it immediately, rather than having to enter plt.show() each time after
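One option for interactive sessions, as a sketch: matplotlib's interactive mode (`plt.ion()`) makes figures appear and refresh as plotting commands run, so a trailing `plt.show()` is not needed each time. In IPython, the `%matplotlib` magic sets up the same behaviour. Whether a window actually pops up still depends on the backend available on the machine.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.ion()  # interactive mode: draw figures as soon as they are created or updated

ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ax = ts.plot()  # with a GUI backend, the window should appear without plt.show()
```

`plt.ioff()` switches back to the default blocking behaviour.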

drowsy grove Jan 7, 2020, 11:26 PM

#

I was asked to describe "how to make a cup of tea to people who know nothing about it" today in a big data analyst interview.

jolly briar Jan 7, 2020, 11:26 PM

#

3 to 4 minutes, don't squeeze the bag

drowsy grove Jan 7, 2020, 11:26 PM

#

I was a little flabbergasted but I tried my best to break it down into steps.

#

I know there isn't a correct answer. But what shall I specifically emphasize when answering questions like this.

#

@jolly briar they didn't say it comes in bags

jolly briar Jan 7, 2020, 11:27 PM

#

you can make reasonable assumptions i'm sure

drowsy grove Jan 7, 2020, 11:27 PM

#

But in my limited experience, loose tea, tea bags, procedure is the same haha.

jolly briar Jan 7, 2020, 11:28 PM

#

well, depends on the tea for temp and stuff but sure... i would assume that they're just seeing a problem broken down into steps here tho

drowsy grove Jan 7, 2020, 11:28 PM

#

That's what I thought. Still, a fun question and thought I would share.

#

They made me choose

#

Either this or how to get leaves off my lawn.

jolly briar Jan 7, 2020, 11:29 PM

#

unless they were waggling their cup at you when they asked, maybe it was a hint

drowsy grove Jan 7, 2020, 11:29 PM

#

I never used leaf blower once so I chose

#

Haha, it would be. But this was on the phone so I wouldn't know.

jolly briar Jan 7, 2020, 11:31 PM

#

hope it went ok 🙏

drowsy grove Jan 7, 2020, 11:31 PM

#

@jolly briar what does plt.close("all") do in your code?

#

Thanks man!

jolly briar Jan 7, 2020, 11:32 PM

#

closes all the plots i reckon, i've not used matplotlib for ages so it's all a mystery... ggplot > matplotlib 😤

drowsy grove Jan 7, 2020, 11:32 PM

#

Ugh, I just do df.groupby()....plot() usually

#

Would I just be using the plot function of pandas?

jolly briar Jan 7, 2020, 11:33 PM

#

yea but it's calling matplotlib anyway afaik

drowsy grove Jan 7, 2020, 11:34 PM

#

Gotcha. Thanks. I learned from Brandon Rhodes's talk.

#

Most people on Kaggle use sns. And it does make a difference I have to admit.

jolly briar Jan 7, 2020, 11:35 PM

#

matplotlib is rank in my opinion

drowsy grove Jan 7, 2020, 11:38 PM

#

rank as in stink?

jolly briar Jan 7, 2020, 11:39 PM

#

yes, i don't like it... i always feel as though i'm having to do more than is necessary when i use it, feels too low level or whatever. I think seaborn helps there, but gg ( for me ) is really nice. Building things up with pipes in dplyr, passing to gg, facet wraps etc, are all quite nice

silent swan Jan 7, 2020, 11:44 PM

#

I like matplotlib because it gives me explicit control over everything

jolly briar Jan 7, 2020, 11:45 PM

#

yeah i can get that, i've never really found i need it tho

drowsy grove Jan 7, 2020, 11:45 PM

#

I need to play with gg and see what it does

#

What is facet wrap? @jolly briar

jolly briar Jan 7, 2020, 11:46 PM

#

it makes great plots in a pretty logical manner...
facet wrap enables you to create a grid of plots easily, like facet_wrap(sex ~ .) will create a grid by sex

silent swan Jan 7, 2020, 11:46 PM

#

I've had to make very many different kinds of plots, so sooner or later any premade abstraction fails for me