#detecting REAL web crawlers
The knowledge sources provided don't contain enough information for a comprehensive way to verify the legitimacy of web crawlers using IP addresses and other information. The userAgent.isBot property can help identify whether a request comes from a bot, but it only reflects the User-Agent string the client sends, so it cannot confirm that a bot is the crawler it claims to be.
However, there are a few strategies you can consider:
- Check Referer headers: Servers can examine the Referer header of incoming requests to verify whether they originate from trusted pages. This can help confirm that a request comes from a legitimate source, but it is not foolproof because headers can be spoofed. Source (https://vercel.com/guides/understanding-csrf-attacks)
- Robots.txt: You can use a robots.txt file to tell search engine crawlers which URLs they can access on your site. This helps control the behavior of legitimate crawlers, but it won't prevent illegitimate crawlers from accessing your pages. Source (https://nextjs.org/docs/app/api-reference/file-conventions/metadata/robots)
- Content Security Policy (CSP): Implementing a strong CSP can help protect your site from certain types of attacks, such as script injection. It restricts what a browser may load and run, so it does not by itself identify or block crawlers. Source (https://vercel.com/guides/content-security-policy)
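As a rough illustration of the Referer check from the list above, here is a minimal sketch in TypeScript. The helper name isTrustedReferer and the allowlist of origins are hypothetical, not part of Next.js or Vercel, and a missing header is treated as untrusted purely as an assumption of this sketch.

```typescript
// Hypothetical helper: returns true only when the Referer header parses as a
// URL whose origin exactly matches one of the caller-supplied trusted origins.
function isTrustedReferer(
  referer: string | null | undefined,
  trustedOrigins: string[],
): boolean {
  // Assumption of this sketch: a missing header counts as untrusted.
  if (!referer) return false;
  try {
    // Compare full origins (scheme + host + port) rather than substrings, so
    // "https://example.com.evil.test" does not pass for "https://example.com".
    const origin = new URL(referer).origin;
    return trustedOrigins.includes(origin);
  } catch {
    // new URL() throws on malformed input; treat that as untrusted.
    return false;
  }
}

// Example usage with a placeholder origin.
const trusted = ["https://example.com"];
console.log(isTrustedReferer("https://example.com/pricing", trusted));
console.log(isTrustedReferer("https://example.com.evil.test/", trusted));
```

Because any client can send an arbitrary Referer value, a check like this is a weak signal at best and should not be the only gate.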
For a more robust solution, you might need to consider using a third-party service that specializes in bot detection and management. These services can analyze various factors, including IP addresses, to determine the legitimacy of a bot.
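One IP-based check that is not covered by the sources above, but that several crawler operators (Google among them) document for verifying their own bots, is forward-confirmed reverse DNS: look up the hostname for the requesting IP, check that it belongs to the operator's domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below is a minimal TypeScript version under stated assumptions: the DnsResolver interface, the function names, and the example domain list are all hypothetical, and the domains to allow should be taken from each operator's own documentation.

```typescript
// Hypothetical minimal DNS surface, injected so the logic can be exercised
// without network access. This is not a library type.
interface DnsResolver {
  reverse(ip: string): Promise<string[]>; // PTR lookup: IP -> hostnames
  resolve(hostname: string): Promise<string[]>; // forward lookup: hostname -> IPs
}

// True when hostname equals one of the domains or is a subdomain of one.
// Domains are expected in lowercase, without a trailing dot.
function hostnameInDomains(hostname: string, domains: string[]): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, "");
  return domains.some((d) => host === d || host.endsWith("." + d));
}

// Forward-confirmed reverse DNS: the IP must reverse-resolve to a hostname in
// an allowed domain, and that hostname must resolve back to the same IP.
async function verifyCrawlerIp(
  ip: string,
  allowedDomains: string[],
  dns: DnsResolver,
): Promise<boolean> {
  let hostnames: string[];
  try {
    hostnames = await dns.reverse(ip);
  } catch {
    return false; // no PTR record, or the lookup failed
  }
  for (const hostname of hostnames) {
    if (!hostnameInDomains(hostname, allowedDomains)) continue;
    try {
      const addresses = await dns.resolve(hostname);
      // Plain string comparison: IPv6 addresses would need normalizing first.
      if (addresses.includes(ip)) return true;
    } catch {
      // Forward lookup failed for this hostname; try the next one.
    }
  }
  return false;
}
```

In Node.js, the reverse and lookup functions from the dns/promises module (lookup called with the all option) could back this interface. Since DNS lookups are slow relative to a request, results would normally be cached per IP, and the check would only run for requests whose User-Agent claims to be a known crawler.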