#Katana crawl external URL
Just making sure. 😆
Taking a look.
Could you try to run it with the -v flag as well?
Does it not show the origin in the json output if you use that?
It only shows a list of URLs.
Perhaps I'm misunderstanding your question.
{
  "timestamp": "2023-10-31T10:43:07.753261-04:00",
  "request": {
    "method": "GET",
    "endpoint": "https://google.com/search/howsearchworks/?fg=1",
    "tag": "a",
    "attribute": "href",
    "source": "https://www.google.com/"
  },
  "response": {
    "status_code": 200,
    "headers": {
      "cross_origin_resource_policy": "cross-origin",
      "expires": "Fri, 01 Jan 1990 00:00:00 GMT",
      "server": "sffe",
      "last_modified": "Tue, 24 Oct 2023 06:00:00 GMT",
      "cache_control": "no-cache, must-revalidate",
      "vary": "Accept-Encoding",
      "report_to": "{\"group\":\"uxe-owners-acl/www_google\",\"max_age\":2592000,\"endpoints\":[{\"url\":\"https://csp.withgoogle.com/csp/report-to/uxe-owners-acl/www_google\"}]}",
      "alt_svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000",
      "x_xss_protection": "0",
      "pragma": "no-cache",
      "content_security_policy_report_only": "script-src 'nonce-0Pm5NkE8gatAJyhIMNDtMg' 'report-sample' 'strict-dynamic' 'unsafe-eval' 'unsafe-inline' http: https:; object-src 'none'; report-uri https://csp.withgoogle.com/csp/uxe-owners-acl/www_google; base-uri 'none';require-trusted-types-for 'script'; report-uri https://csp.withgoogle.com/csp/uxe-owners-acl/www_google",
      "content_type": "text/html",
      "x_content_type_options": "nosniff",
      "cross_origin_opener_policy_report_only": "same-origin; report-to=\"uxe-owners-acl/www_google\"",
      "accept_ranges": "bytes",
      "date": "Tue, 31 Oct 2023 14:43:07 GMT"
    },
    "technologies": [
      "HTTP/3",
      "Google Tag Manager",
      "YouTube"
    ]
  }
}
endpoint is what it discovered, source is where it discovered it from
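A minimal Python sketch of pulling those two fields out of a record, assuming each line of the -jsonl output is one JSON object shaped like the record shown earlier in this thread. The helper name edge is hypothetical and not part of Katana.

```python
import json

def edge(line):
    """Return (source, endpoint) from one JSONL record, or None if either is missing."""
    record = json.loads(line)
    request = record.get("request", {})
    source = request.get("source")      # page the link was found on
    endpoint = request.get("endpoint")  # URL that was discovered
    if source is None or endpoint is None:
        return None
    return source, endpoint

# Trimmed copy of the record from this thread, used only as sample input.
sample = (
    '{"request": {"method": "GET", '
    '"endpoint": "https://google.com/search/howsearchworks/?fg=1", '
    '"tag": "a", "attribute": "href", '
    '"source": "https://www.google.com/"}}'
)

print(edge(sample))
```

Reading it as "source links to endpoint" makes it easier to see which page an external URL came from.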
-do, -display-out-scope display external endpoint from scoped crawling
echo google.com | katana -jsonl -fs fqdn -do
This limits the scope to the FQDN and then shows everything outside of it with an error saying it's out of scope. You could use dn instead if you wanted everything in google.com to be in scope and everything else out of scope, etc.
{
"timestamp": "2023-10-31T11:20:04.996739-04:00",
"request": {
"method": "GET",
"endpoint": "https://www.google.com/history/optout?hl=en&fg=1",
"tag": "a",
"attribute": "href",
"source": "https://www.google.com/"
},
"error": "out of scope"
}
Scrub the data to only show things with the out of scope error
I don't understand. Are you saying googleusercontent.com shouldn't be out of scope if the FQDN is google.com?
I'm not following what you are saying. Why would googleusercontent be in scope for google.com?
If you do what I said and keep only the out-of-scope entries, you'd get all the external URLs.
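One way that filtering step could look in Python, as a sketch only. It assumes the output has one JSON object per line and that out-of-scope records carry a top-level "error": "out of scope" field, as in the record pasted in this thread. The function name out_of_scope_endpoints is hypothetical.

```python
import json

def out_of_scope_endpoints(lines):
    """Yield the endpoint of every record whose error field is 'out of scope'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(record, dict):
            continue
        if record.get("error") == "out of scope":
            endpoint = record.get("request", {}).get("endpoint")
            if endpoint:
                yield endpoint

# Sample input: one out-of-scope record and one in-scope record.
sample_lines = [
    '{"request": {"endpoint": "https://www.google.com/history/optout?hl=en&fg=1", '
    '"source": "https://www.google.com/"}, "error": "out of scope"}',
    '{"request": {"endpoint": "https://google.com/search/howsearchworks/?fg=1", '
    '"source": "https://www.google.com/"}, "response": {"status_code": 200}}',
]

external = list(out_of_scope_endpoints(sample_lines))
print(external)
```

Passing sys.stdin instead of sample_lines would let the same function read output piped from the crawl.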
www.google.com will be out of scope for google.com if the field scope is fqdn.
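To make that concrete, here is a simplified model of an FQDN scope check in Python. It assumes "in scope" means the URL's hostname exactly equals the seed host; this is an illustration of the behavior described in this thread, not Katana's actual implementation, and in_fqdn_scope is a hypothetical name.

```python
from urllib.parse import urlsplit

def in_fqdn_scope(url, seed_host):
    """Simplified model: in scope only when the hostname matches the seed exactly."""
    host = urlsplit(url).hostname or ""
    return host.lower() == seed_host.lower()

# Both URLs are taken from the records in this thread; the seed is google.com.
print(in_fqdn_scope("https://www.google.com/history/optout?hl=en&fg=1", "google.com"))  # False
print(in_fqdn_scope("https://google.com/search/howsearchworks/?fg=1", "google.com"))    # True
```

Under this model a subdomain such as www.google.com counts as a different host from google.com, which matches the out-of-scope record shown above.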