#how can I use this selector in the


split beacon
#

I'm not familiar with the scrape integration, but you need a CSS selector from this JS?

#

Is there some other class, ID, or other differentiator on the second element with the .fr-view class or is it just always the second element within a list or something?

#

Difficult to guess without a test case, but possibly .fr-view:nth-child(2) p a, and then it looks like you may need to add attribute: "text" at the same level as select

#

Maybe .fr-view:nth-child(2) > p a

marble goblet
#

I was messing with nth-child but it didn't work out in the browser console, e.g. document.querySelectorAll(".fr-view:nth-child(1)") returns the whole array and not that specific element

marble goblet
#

that div is the parent of the .fr-view tho, since this one doesn't have an id

#

I just used the div id and it works, let's hope it doesn't change in the future

split beacon
#

document.querySelectorAll returns all elements that match the selector while document.querySelector only returns the first match. When chained like you had it originally, the latter selects from the node list of the former

#

CSS is wild, but each space is kind of like a narrowing indicator: .fr-view matches all elements with that class, and .fr-view followed by a space and p matches all p elements within .fr-view elements

#

Where you probably tripped up is that the node list returned by querySelectorAll is an array-like object, so the first element is at index 0, which is why I mentioned the second element (1). nth-child is not an array index: it counts from 1, so nth-child(2) is the equivalent of index 1, and nth-child(1) is the first element (index 0)

#

Not sure if that made sense but:

  • Node list returned by querySelectorAll
    • Element A = 0
    • Element B = 1
  • Elements selected by nth-child
    • Element A = 1
    • Element B = 2
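
To make the off-by-one concrete, here is a minimal sketch that picks the n-th match of a selector by its position in the node list, counting from 1 the way nth-child does. The helper name nthMatch and the root parameter are made up for illustration; in the console, root would just be document.

```javascript
// Hypothetical helper: return the n-th match (1-based, like nth-child)
// from the node list, which itself is indexed from 0.
function nthMatch(root, selector, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be an integer >= 1");
  }
  const matches = root.querySelectorAll(selector); // array-like, 0-based
  return n <= matches.length ? matches[n - 1] : null;
}

// Browser console usage (skipped when no DOM is available):
if (typeof document !== "undefined") {
  console.log(nthMatch(document, ".fr-view", 2)); // Element B in the list above
}
```

This only helps in the console or in a script; it is not something a selector-only config can express.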
#

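CSS selector .fr-view:nth-child(2) should be the same element as document.querySelectorAll(".fr-view")[1], at least when the .fr-view elements are siblings and the second one is also the second child of their parent (nth-child counts position among all siblings, not among matches). Then, with a space to select elements within that element, p a should be all anchor links within a paragraph element within the second .fr-view element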
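
One caveat worth checking in the console: nth-child counts an element's position among all of its parent's children, not among the elements that match .fr-view. A small sketch (nthChildPosition is a hypothetical helper name) prints both numbers side by side so any mismatch is visible:

```javascript
// Hypothetical helper: 1-based position of `el` among its parent's element
// children, i.e. the n for which :nth-child(n) would match it.
// Returns null when the element has no parent element.
function nthChildPosition(el) {
  const parent = el.parentElement;
  if (!parent) return null;
  return Array.prototype.indexOf.call(parent.children, el) + 1;
}

// Browser console usage (skipped when no DOM is available):
if (typeof document !== "undefined") {
  document.querySelectorAll(".fr-view").forEach((el, i) => {
    console.log(`NodeList index ${i} -> nth-child position ${nthChildPosition(el)}`);
  });
}
```

If each .fr-view sits alone inside its own wrapper div, every one of them reports position 1, and .fr-view:nth-child(2) matches nothing.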

#

<div class="fr-view"><a ...wont get selected... /><p><a ...will get selected... /></p></div>

marble goblet
#

how can I test the .fr-view:nth-child(2) selector in the browser console?

split beacon
#

Since only 1 element is expected, document.querySelector('.fr-view:nth-child(2)') should return an element. If more than 1 is expected, use document.querySelectorAll instead

#

They both pretty much work like CSS selectors do with the difference of returning a node list (querySelectorAll) or a single element reference, if found (querySelector)
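
For testing a selector in the console, a rough sketch like the one below reports what both calls give back for the same selector. inspectSelector is a made-up name, and root stands in for document.

```javascript
// Hypothetical helper: summarize what querySelector and querySelectorAll
// return for the same selector.
function inspectSelector(root, selector) {
  const first = root.querySelector(selector);  // single Element, or null if nothing matches
  const all = root.querySelectorAll(selector); // NodeList, empty if nothing matches
  return {
    matchCount: all.length,
    firstText: first ? first.textContent.trim() : null,
  };
}

// Browser console usage (skipped when no DOM is available):
if (typeof document !== "undefined") {
  console.log(inspectSelector(document, ".fr-view:nth-child(2)"));
}
```

A matchCount of 0 with firstText of null means the selector itself matches nothing on the page, as opposed to a problem with how the result is read.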

marble goblet
#

doesn't work

split beacon
#

Possible to show the DOM hierarchy of the element you're generally trying to select?

marble goblet
#

this is the page https://www.leandertx.gov/518/Water-Conservation - I was trying to get the "Phase 2" text, which is the first <a> in there. But FYI, it already works when using the ID. Currently using the following: "#divEditora7552112-e916-485a-bcdb-80069cbc7bda p a"

#

I was just trying for a more generic way in case the div id changes

split beacon
#

That helps a lot. Your hunch is probably right that the unique identifier is likely to change in the future and should probably be avoided. I'd probably go with this: #page .fr-view > p a since it seems like fr-view can also be in a splash dialog, the one you want is inside #page, and the > p a limits it to links inside paragraphs that are direct children of .fr-view, so the first match should be the link you want

#

Maybe it would help for me to understand the goal -- are you hoping to pull Phase 3 when that becomes an update? This will also fall apart somewhat regardless of approach if a new link is added before the Phase link you're seeking

#

Another option is a[href*="Phase"] to get any anchor link with an href containing "Phase"
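
To compare the two candidates in the console, something like this sketch tries each selector in turn and reports which one matched first. The ordering of the list and the name firstMatchText are my assumptions, not something the page or the scrape integration requires.

```javascript
// Candidate selectors from this thread, most preferred first (ordering is an assumption).
const CANDIDATE_SELECTORS = [
  "#page .fr-view > p a", // structural selector
  'a[href*="Phase"]',     // any link whose href contains "Phase"
];

// Hypothetical helper: return the first selector that matches, with its link text.
function firstMatchText(root, selectors) {
  for (const selector of selectors) {
    const el = root.querySelector(selector);
    if (el) {
      return { selector, text: el.textContent.trim() };
    }
  }
  return null; // none of the selectors matched
}

// Browser console usage (skipped when no DOM is available):
if (typeof document !== "undefined") {
  console.log(firstMatchText(document, CANDIDATE_SELECTORS));
}
```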

marble goblet
#

I'm assuming they will change the phase text which should still be extracted with the same selector

#

of course, as long as they don't just add other anchors and stuff

#

it's a new website design, the previous one was easier to scrape since it had a dedicated uri, too bad..