#Katana crawl external URL


vagrant canyon
#

Are you using google.com as the target, or is the target different?

#

Just making sure. 😆

#

Taking a look.

#

Could you try to run it with the -v flag as well?

carmine notch
#

Does it not show the origin in the JSON output if you use that?

agile saffron
#

It only shows a list of URLs

carmine notch
#

Perhaps I'm misunderstanding your question:

{
  "timestamp": "2023-10-31T10:43:07.753261-04:00",
  "request": {
    "method": "GET",
    "endpoint": "https://google.com/search/howsearchworks/?fg=1",
    "tag": "a",
    "attribute": "href",
    "source": "https://www.google.com/"
  },
  "response": {
    "status_code": 200,
    "headers": {
      "cross_origin_resource_policy": "cross-origin",
      "expires": "Fri, 01 Jan 1990 00:00:00 GMT",
      "server": "sffe",
      "last_modified": "Tue, 24 Oct 2023 06:00:00 GMT",
      "cache_control": "no-cache, must-revalidate",
      "vary": "Accept-Encoding",
      "report_to": "{\"group\":\"uxe-owners-acl/www_google\",\"max_age\":2592000,\"endpoints\":[{\"url\":\"https://csp.withgoogle.com/csp/report-to/uxe-owners-acl/www_google\"}]}",
      "alt_svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000",
      "x_xss_protection": "0",
      "pragma": "no-cache",
      "content_security_policy_report_only": "script-src 'nonce-0Pm5NkE8gatAJyhIMNDtMg' 'report-sample' 'strict-dynamic' 'unsafe-eval' 'unsafe-inline' http: https:; object-src 'none'; report-uri https://csp.withgoogle.com/csp/uxe-owners-acl/www_google; base-uri 'none';require-trusted-types-for 'script'; report-uri https://csp.withgoogle.com/csp/uxe-owners-acl/www_google",
      "content_type": "text/html",
      "x_content_type_options": "nosniff",
      "cross_origin_opener_policy_report_only": "same-origin; report-to=\"uxe-owners-acl/www_google\"",
      "accept_ranges": "bytes",
      "date": "Tue, 31 Oct 2023 14:43:07 GMT"
    },
    "technologies": [
      "HTTP/3",
      "Google Tag Manager",
      "YouTube"
    ]
  }
}

#

endpoint is what it discovered, source is where it discovered it from
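
To make the endpoint/source distinction concrete, here is a minimal Python sketch that parses one line of the -jsonl output and pulls out both fields. The sample line is trimmed from the record above; the function name endpoint_and_source is made up for illustration and is not part of katana.

```python
import json

# One JSONL record, trimmed from the output shown above (response omitted).
SAMPLE_LINE = (
    '{"timestamp": "2023-10-31T10:43:07.753261-04:00", '
    '"request": {"method": "GET", '
    '"endpoint": "https://google.com/search/howsearchworks/?fg=1", '
    '"tag": "a", "attribute": "href", '
    '"source": "https://www.google.com/"}}'
)


def endpoint_and_source(line):
    """Return (endpoint, source) from a single JSONL record.

    endpoint is the URL that was discovered; source is the page
    on which it was discovered. Missing fields come back as None.
    """
    record = json.loads(line)
    request = record.get("request", {})
    return request.get("endpoint"), request.get("source")


endpoint, source = endpoint_and_source(SAMPLE_LINE)
print(f"{endpoint} was discovered on {source}")
```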

carmine notch
#

-do, -display-out-scope display external endpoint from scoped crawling

#

echo google.com | katana -jsonl -fs fqdn -do

This limits the scope to the FQDN and then shows everything outside of that with an error saying it's out of scope. You could use dn instead if you wanted everything in google.com to be in scope and everything else flagged as out of scope, etc.

#
{
  "timestamp": "2023-10-31T11:20:04.996739-04:00",
  "request": {
    "method": "GET",
    "endpoint": "https://www.google.com/history/optout?hl=en&fg=1",
    "tag": "a",
    "attribute": "href",
    "source": "https://www.google.com/"
  },
  "error": "out of scope"
}
carmine notch
#

Scrub the data to only show things with the out of scope error
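
One way to do that scrubbing, sketched in Python under a few assumptions: the input is the -jsonl output with one JSON object per line, and out-of-scope records carry "error": "out of scope" exactly as in the record shown earlier. The function name and the sample lines are illustrative only.

```python
import json

# Two illustrative records: the out-of-scope one from this thread,
# plus an in-scope one with no "error" field.
SAMPLE_LINES = [
    '{"request": {"method": "GET", '
    '"endpoint": "https://www.google.com/history/optout?hl=en&fg=1", '
    '"tag": "a", "attribute": "href", '
    '"source": "https://www.google.com/"}, "error": "out of scope"}',
    '{"request": {"method": "GET", '
    '"endpoint": "https://google.com/search/howsearchworks/?fg=1", '
    '"tag": "a", "attribute": "href", '
    '"source": "https://www.google.com/"}}',
]


def out_of_scope_endpoints(lines):
    """Yield the endpoint of every record flagged as out of scope."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict):
            continue
        if record.get("error") == "out of scope":
            endpoint = record.get("request", {}).get("endpoint")
            if endpoint:
                yield endpoint


for url in out_of_scope_endpoints(SAMPLE_LINES):
    print(url)
```

Passing sys.stdin in place of SAMPLE_LINES should let the same function read the crawler's output from a pipe.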

carmine notch
#

I'm not sure what you are saying

#

It says it is out of scope

carmine notch
#

I'm not following what you are saying. Why would googleusercontent be in scope for google.com?

If you do what I said and only keep the ones marked out of scope, you'd get all the external URLs.

carmine notch