# Scrape Sensor Platform


obtuse smelt

I cannot make heads or tails of this thing.

I have read a lot of the documentation.
https://www.home-assistant.io/integrations/scrape
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#
https://scrapy.org/
https://youtu.be/0yGQ-No6kCg?feature=shared

I have read the various forum posts.
https://community.home-assistant.io/t/the-new-way-to-scrape/499964/35
https://www.smarthomejunkie.net/how-to-scrape-websites-get-actual-energy-prices-in-home-assistant/
https://stackoverflow.com/questions/53521911/scrape-a-tag-with-multiple-attributes
https://community.home-assistant.io/t/solved-sensor-scrape-need-help-with-attribute/112130/2

I have also written out my own explanations:


How to find a thing:
<tag attribute="value">

tag = Select (the word before the first ' ' [space])

attribute = Attribute (the attribute name, without the '=')

'>' = use an arrow to move down one level of the tree [you can start at any unique tag (node), regardless of how deep it sits in the tree]

tag:nth-child(integer) = go to the tag at that integer position and spit that out [usually combined with an attribute]
Note 1: the integer is usually off compared to the inspector (usually a few less).
Note 2: it does nothing if the tag has an attribute and you have not specified it.
Note 3: it only works with the first attribute in a tag.
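A minimal sketch of what the CSS position selectors actually count, using only Python's standard html.parser (the scrape integration itself uses BeautifulSoup, so this is an illustration, not its code; the class and function names are made up). In CSS, tag:nth-child(n) means "an element that is the n-th element child of its parent, counting every tag, and that is also a 'tag'", while tag:nth-of-type(n) counts only siblings with the same tag name. The HTML below is hypothetical. One likely reason the positions differ from the browser inspector (Note 1) is that the inspector shows the page after its JavaScript has run, while scrape parses the HTML as it was downloaded.

```python
from html.parser import HTMLParser

# Illustrative sketch; ChildPositions / child_positions are hypothetical names.
# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ChildPositions(HTMLParser):
    """Record (tag, parent, nth_child, nth_of_type) for every element.

    nth_child counts ALL earlier element siblings, whatever their tag;
    nth_of_type counts only earlier siblings with the same tag name.
    Text between tags is ignored, just as CSS ignores it.
    """

    def __init__(self):
        super().__init__()
        self.stack = [("#document", [])]  # (open tag, tag names of its children so far)
        self.rows = []

    def handle_starttag(self, tag, attrs):
        parent, siblings = self.stack[-1]
        siblings.append(tag)
        self.rows.append((tag, parent, len(siblings), siblings.count(tag)))
        if tag not in VOID:
            self.stack.append((tag, []))

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def child_positions(html):
    parser = ChildPositions()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical <head>, not the real page.
sample = ("<head><meta charset='utf-8'><meta name='viewport' content='width=device-width'>"
          "<link rel='icon' href='/favicon.ico'><link rel='stylesheet' href='/app.css'></head>")

for row in child_positions(sample):
    print(row)
```

In this sample, link:nth-child(1) matches nothing because the first child of head is a meta, while link:nth-child(3) and link:nth-of-type(1) both point at the icon link.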


value_template [how to split]:
'{{ value.split("character(s)_you_want_to_split_on")[1] }}'
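Home Assistant templates are Jinja, and value there is a plain string, so value.split(...) is Python's str.split. A small sketch of what the [1] index picks; the scraped text and the split_pick helper are made up for illustration:

```python
def split_pick(value, separator, index=1):
    """Hypothetical helper equivalent to the template value.split(separator)[index].

    str.split cuts the text at every occurrence of separator;
    [0] is the part before the first occurrence, [1] the part after it
    (up to the next occurrence, if there is one).
    """
    return value.split(separator)[index]


raw = "Battery: 12.6 V"  # made-up scraped text
print(split_pick(raw, ": "))        # 12.6 V
print(split_pick(raw, ": ", 0))     # Battery
# Chaining, like value.split(": ")[1].split(" ")[0] in a template:
print(split_pick(split_pick(raw, ": "), " ", 0))  # 12.6
# If the separator is missing there is only one piece, so there is no [1]:
print("no colon here".split(": "))  # ['no colon here']
```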

The dang thing still doesn't behave the way I expect it to.


My Config:

Resource: https://beta.omnimetrix.net/account/login

[I'm trying to parse the generator data inside, but first I'm teaching myself how scrape works]
Select: link:nth-child(1)
Attribute: rel

Returns: ['stylesheet']

Which is the 3rd 'link' tag

This is repeatable, but which tag comes back changes with different tag/attribute combinations; it is not always the 3rd one.
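A sketch of why Select: link:nth-child(1) can come back with the 3rd 'link' tag, under an assumed page structure. The selector does not mean "the first link"; it matches every link that is the first element child of its own parent, and scrape then takes the first match in document order (its Index option, which defaults to 0). The markup below is hypothetical, with the stylesheet link wrapped in a noscript tag as one possible way this happens, and the matcher is a stdlib stand-in for BeautifulSoup with made-up names. The brackets in ['stylesheet'] are probably because BeautifulSoup treats rel as a multi-valued attribute and returns it as a list.

```python
from html.parser import HTMLParser

# Hypothetical helper names; this is not Home Assistant or BeautifulSoup code.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NthChildMatcher(HTMLParser):
    """Collect, in document order, the attributes of every <tag> element
    that is the n-th element child of its parent (CSS tag:nth-child(n))."""

    def __init__(self, tag, n):
        super().__init__()
        self.tag, self.n = tag, n
        self.stack = [("#document", 0)]  # (open tag, element children seen so far)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        parent, count = self.stack[-1]
        count += 1
        self.stack[-1] = (parent, count)
        if tag == self.tag and count == self.n:
            self.matches.append(dict(attrs))
        if tag not in VOID:
            self.stack.append((tag, 0))

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def select_all(html, tag, n):
    parser = NthChildMatcher(tag, n)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical page: three <link> tags, the third one inside <noscript>.
page = (
    "<html><head>"
    "<meta charset='utf-8'>"
    "<link rel='preconnect' href='https://example.com'>"
    "<link rel='icon' href='/favicon.ico'>"
    "<noscript><link rel='stylesheet' href='/app.css'></noscript>"
    "</head><body></body></html>"
)

print(select_all(page, "link", 1))  # [{'rel': 'stylesheet', 'href': '/app.css'}]
```

Here the first link in the file is the preconnect one, but the first element matching link:nth-child(1) is the stylesheet inside noscript, the 3rd link overall.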

I saw something about disabling JavaScript, but the website doesn't load if I do.

obtuse smelt

The same thing happens for:

Select: script
Attribute: src
Returns: //js.hs-scripts.com/2712909.js

Which is the 3rd tag under head??
If I use body > script, it drops to the correct first value inside body.
If I use head > script, it stays at the 3rd value under head. Whaaaa?

I think this has something to do with the id= field? Idk, spitballing.


body > script:nth-child(2), attribute src: returns the src of the 1st script in body
body > script:nth-child(3), attribute src: returns the src of the 2nd script in body
The rest of the (child integer, attribute) combinations I tried under body seem to work properly.
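These body results fit nth-child counting if the first element child of body is something other than a script (for example a noscript or a div): then script:nth-child(2) is the first script and script:nth-child(3) the second. A minimal sketch with hypothetical body children and made-up helper names, plus nth-of-type, which counts only script tags. Assuming Soup Sieve (the CSS engine BeautifulSoup uses) handles it as standard CSS, body > script:nth-of-type(1) should select the first script directly.

```python
# Hypothetical element children of <body>, in order: (tag, attributes).
body_children = [
    ("noscript", {}),
    ("script", {"src": "first.js"}),
    ("script", {"src": "second.js"}),
]


def nth_child(children, tag, n):
    """Hypothetical helper for CSS tag:nth-child(n): the n-th child
    (1-based, counting every tag), but only if that child is a <tag>."""
    if 1 <= n <= len(children) and children[n - 1][0] == tag:
        return children[n - 1][1]
    return None


def nth_of_type(children, tag, n):
    """Hypothetical helper for CSS tag:nth-of-type(n): the n-th child among those named <tag>."""
    same = [attrs for name, attrs in children if name == tag]
    return same[n - 1] if 1 <= n <= len(same) else None


print(nth_child(body_children, "script", 1))    # None: child 1 is the <noscript>
print(nth_child(body_children, "script", 2))    # {'src': 'first.js'}
print(nth_child(body_children, "script", 3))    # {'src': 'second.js'}
print(nth_of_type(body_children, "script", 1))  # {'src': 'first.js'}
```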

obtuse smelt

For: <head> <meta property="og:url" content="https://beta.omnimetrix.net/app/main/machineUnit/machineUnits?cid=xyz"> </head>
head > meta:nth-child(6), attribute property: works as expected
head > meta:nth-child(6), attribute content: does nothing
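A sketch for checking what actually sits at each position in the head of the raw HTML, again with stdlib html.parser on a hypothetical head (only the og:url tag is taken from the post; HeadChildren and head_children are made-up names). Attributes are looked up by name, like dictionary keys, so in BeautifulSoup the order of property and content inside the tag should not matter. If content comes back empty, it may be worth dumping every attribute of the element that sits at that position in the downloaded HTML, since it may not be the tag the inspector shows.

```python
from html.parser import HTMLParser

# Hypothetical helper names; stdlib stand-in, not BeautifulSoup.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HeadChildren(HTMLParser):
    """Collect (tag, attributes) for each direct element child of <head>,
    keyed by its CSS nth-child position (1-based, counting every tag)."""

    def __init__(self):
        super().__init__()
        self.depth = None  # None outside <head>; nesting depth inside it
        self.children = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head" and self.depth is None:
            self.depth = 0
            return
        if self.depth == 0:  # a direct child of <head>
            self.children[len(self.children) + 1] = (tag, dict(attrs))
        if self.depth is not None and tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.depth = None
        elif self.depth and tag not in VOID:
            self.depth -= 1


def head_children(html):
    parser = HeadChildren()
    parser.feed(html)
    parser.close()
    return parser.children


# Hypothetical <head>; only the og:url tag comes from the post.
head_html = (
    "<head><meta charset='utf-8'><script src='x.js'></script>"
    "<meta property='og:url' "
    "content='https://beta.omnimetrix.net/app/main/machineUnit/machineUnits?cid=xyz'>"
    "</head><body></body>"
)

for position, (tag, attrs) in head_children(head_html).items():
    print(position, tag, attrs)

# Both attributes of the same tag are readable by name:
print(head_children(head_html)[3][1]["content"].split("cid=")[1])  # xyz
```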