#Scraping Search Results

fleet silo
#

Hello everyone,

I hope this message finds you well. I'm currently working on a web scraping project where I need to extract URLs from search results based on a specific keyword. However, I've encountered a challenge in identifying the correct class name to use for extracting the URLs from the search results' HTML.

I've thoroughly examined the HTML structure, and it seems that the issue might be related to how the class name is being identified. I've attempted to use a certain class name, but it appears that either the structure has changed or there's a discrepancy in identifying the class name accurately.

For those of you experienced in web scraping or API utilization, I kindly ask for your guidance on the most effective approach to determine the accurate class name or alternative techniques to successfully extract the URLs from the search results.

If any of you can offer assistance, insights, code examples, or recommended resources, I would greatly appreciate your input. Your expertise would be of immense help in overcoming this challenge.

Thank you very much for considering my request, and I eagerly await any valuable advice you might have to share.

fleet silo
#

@lone dawn

lone dawn
#

urm

#

when i was doing a web scraping project using selenium, i just copied the xpath and chucked that in my code, but sometimes it didn't work

#

you're better off waiting for someone who has more experience than me in this topic, sorry

fleet silo
#

*ping

lone dawn
#

im not sure who will, just wait and see who responds

fleet silo
#

WEB SCRAPING

lone dawn
#

i would also advise you not to dm people, just wait

lone dawn
#

🤷‍♂️

fleet silo
#

Ahhman

lone dawn
#

post more in one place, that's my advice

#

there will not always be someone here that can answer your question

fleet silo
#

Yeah

#

Wait ik this guy

#

Hes the boss here

#

@worldly moat broo helpppp

#

webbb

worldly moat
#

What search results are you scraping

fleet silo
# worldly moat What search results are you scraping

So there's this API that scrapes Google searches. If I put in "abc", it returns the sites ranked for that word, but everything it returns is HTML. I just want to automatically extract the sites' URLs from that HTML. I tried various libraries, then OpenAI's API, to locate the class names that contain the URLs, but failed.

worldly moat
#

Google disallows scraping and actively blocks scraping using various techniques

fleet silo
#

import re

import requests
from bs4 import BeautifulSoup

# Step 1: Retrieve sites ranked for the provided keyword

keyword = input("Enter the keyword to search: ")
search_url = f"https://webspi.p.rapidapi.com/search/{keyword}"
headers = {
    "X-RapidAPI-Key": apiKey,  # apiKey must be defined before this point
    "X-RapidAPI-Host": "webspi.p.rapidapi.com"
}

response = requests.get(search_url, headers=headers)

if response.status_code != 200:
    print("Error accessing search API.")
    exit()

html_code = response.text

# Step 2: Identify the class name pattern using regular expressions

# Capture the value of the first class="..." (or class='...') attribute
class_name_pattern = re.compile(r'class=["\']([^"\']+)["\']')

# Step 3: Extract URLs based on the identified class name

soup = BeautifulSoup(html_code, "lxml")

# Try to identify the class name pattern from the first 10 <a> tags

class_name_found = False
for tag in soup.find_all("a")[:10]:
    match = class_name_pattern.search(str(tag))
    if match:
        class_name = match.group(1)
        class_name_found = True
        break

if not class_name_found:
    print("No class name pattern found in the first 10 <a> tags.")
    exit()

identified_urls = []
search_results = soup.find_all("a", class_=class_name)
for result in search_results:
    url = result.get("href")  # some <a> tags carry no href at all
    if url:
        identified_urls.append(url)

if not identified_urls:
    print("No URLs identified in the search results.")
    exit()
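
Since the class names in the returned HTML may change or be hard to pin down, one alternative is to skip class names and filter on the href values instead. The following is only a rough sketch under assumptions: it uses the standard library rather than BeautifulSoup, and LinkCollector, extract_external_urls, skip_hosts and sample_html are made-up names for illustration, not part of the API above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, regardless of its class."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(href)


def extract_external_urls(html_code, skip_hosts=("google.",)):
    """Return unique absolute http(s) URLs whose host matches no skip_hosts entry."""
    collector = LinkCollector()
    collector.feed(html_code)
    urls = []
    for href in collector.links:
        parts = urlparse(href)
        if parts.scheme not in ("http", "https"):
            continue  # drop relative links, javascript:, mailto:, etc.
        if any(fragment in parts.netloc for fragment in skip_hosts):
            continue  # drop links that point back at the search engine itself
        if href not in urls:
            urls.append(href)
    return urls


# Hypothetical HTML standing in for response.text
sample_html = """
<div><a class="x1" href="https://example.com/page">Example</a>
<a href="/relative/path">Internal</a>
<a class="zz" href="https://www.google.com/preferences">Settings</a>
<a href="https://example.org/">Org</a>
<a href="https://example.com/page">Duplicate</a></div>
"""
print(extract_external_urls(sample_html))
```

In the script above this would be called as extract_external_urls(html_code). Whether filtering by host is enough depends on what the API really returns, so treat the skip_hosts default as a starting guess.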

worldly moat
#

So the reason you're not getting much help here is because it's not something that is allowed here and isn't really possible.

worldly moat
#

Disallowed by their TOS

worldly moat
#

They likely have access to the Google API

#

We use the Google search API ourselves
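
The thread doesn't say which Google API is meant. Assuming it is the Custom Search JSON API, a minimal sketch of getting result URLs as JSON (no HTML parsing needed) could look like this. The api_key and engine_id values are placeholders you would have to create yourself, and build_search_url, extract_links, search and sample_payload are names made up for this example.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id):
    """Build the request URL; key, cx and q are the API's query parameters."""
    params = urllib.parse.urlencode({"key": api_key, "cx": engine_id, "q": query})
    return f"{API_ENDPOINT}?{params}"


def extract_links(payload):
    """Pull the 'link' field out of each entry in the response's 'items' list."""
    return [item["link"] for item in payload.get("items", []) if "link" in item]


def search(query, api_key, engine_id):
    """Perform the request (needs network access and valid credentials)."""
    url = build_search_url(query, api_key, engine_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_links(json.load(resp))


# Hypothetical, trimmed-down response used to show the parsing step offline
sample_payload = {
    "items": [
        {"title": "Example", "link": "https://example.com/"},
        {"title": "Example Org", "link": "https://example.org/docs"},
    ]
}
print(extract_links(sample_payload))
```

With real credentials, search("abc", api_key, engine_id) would return the list of result URLs directly.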

#

!google example

fleet silo
#

Is it free???

worldly moat
#

Yes

fleet silo
#

Shishhhhhhh

#

Any videos related to that?

worldly moat
#

No idea

#

I just read their docs

fleet silo
#

Idk anything abt that

fleet silo
#

Btw I'm new to this stuff, I don't know what's wrong or right

#

I'm sorry