Delayed discovery/indexing of your real pages, especially on a new site with low authority.
Even if Google is smart enough not to index the junk URL, you still pay the price in crawl budget and trust.
- Duplicate content signals and canonical confusion
If the junk URL serves the homepage content and also outputs the homepage's canonical tag:
Google sees many URLs returning the same content with the same canonical.
That’s not fatal, but it creates:
duplicate content noise, and
weaker URL-level confidence while Google figures out what’s real.
This can contribute to “Discovered – currently not indexed” or “Crawled – currently not indexed” on legitimate pages, because Google is triaging what to trust first.
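One way to confirm this pattern is to request a URL that should not exist and see which canonical the returned HTML declares. The sketch below is a minimal illustration that uses only Python's standard-library HTML parser; the helper names (`find_canonical`, `canonical_points_elsewhere`) and the example.com URLs are hypothetical, and it assumes you have already fetched the page body yourself.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "canonical nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_points_elsewhere(requested_url: str, html: str) -> bool:
    """True when the page declares a canonical that differs from the URL requested."""
    canonical = find_canonical(html)
    if canonical is None:
        return False
    # Ignore a trailing slash so "https://example.com" matches "https://example.com/"
    return canonical.rstrip("/") != requested_url.rstrip("/")
```

If `canonical_points_elsewhere` returns True for a made-up path and the canonical is the homepage, the site is serving homepage content (and the homepage canonical) on junk URLs.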
- Wrong status codes break expected crawling heuristics
Googlebot expects:
real page → 200
moved page → 301/308
non-existent → 404/410
When a site returns 200 for non-existent URLs, Google has to infer intent from content patterns. That inference step is why you see more:
soft 404 reports,
inconsistent URL Inspection results,
and slower indexing.
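The status-code expectations above can be turned into a quick self-check. This is a rough sketch, not how Google classifies pages: `ProbeResult` and `classify_missing_url` are hypothetical names, the labels are made up for illustration, and the body comparison is a deliberately naive exact match. It assumes you have requested a URL you know does not exist and captured the status and body.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Status code and body returned for a URL that should not exist."""
    status: int
    body: str


def classify_missing_url(probe: ProbeResult, homepage_body: str) -> str:
    """Label how the server answered a request for a non-existent URL."""
    if probe.status in (404, 410):
        return "ok"  # the server correctly reports the page as missing
    if probe.status in (301, 302, 307, 308):
        return "redirect"  # junk URL is redirected somewhere; check the target
    if probe.status == 200:
        # A 200 for a non-existent URL is the soft-404 pattern.
        if probe.body.strip() == homepage_body.strip():
            return "soft-404-homepage"  # fallback route is serving the homepage
        return "soft-404"
    return "other"
```

Anything other than "ok" for a made-up path means crawlers have to guess whether the page is real.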
- sitemap.xml being treated like an app route is a big problem
If /sitemap.xml is ever served via the app's fallback route or JS rather than as an actual XML document:
Google may show sitemap errors (format, fetch, parsing, or “couldn’t read”).
Even if it “reads” it sometimes, inconsistent delivery can cause:
delayed URL discovery,
partial crawling,
or repeated reprocessing.
For a new site, the sitemap is one of your strongest “please crawl these” signals, so this matters a lot.
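To check whether /sitemap.xml is really being delivered as XML and not as the app's HTML shell, you can inspect the response you get back. The sketch below is an illustration under assumptions: `check_sitemap_response` is a hypothetical helper, it expects you to pass in the status, Content-Type header, and raw body from your own fetch, and it only checks well-formedness and the root element (not every rule in the sitemap protocol).

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap_response(status: int, content_type: str, body: bytes) -> List[str]:
    """Return a list of problems found in a /sitemap.xml response (empty if none)."""
    problems: List[str] = []

    if status != 200:
        problems.append(f"unexpected status {status}")

    # Drop parameters such as "; charset=utf-8" before comparing
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("application/xml", "text/xml"):
        problems.append(f"unexpected content type {media_type!r}")

    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Typical when a fallback route returns the app's HTML shell
        problems.append("body is not well-formed XML")
        return problems

    expected_roots = (f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex")
    if root.tag not in expected_roots:
        problems.append(f"unexpected root element {root.tag!r}")

    return problems
```

Running this against the same URL several times (for example, before and after a redeploy) helps catch the inconsistent delivery described above.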
- Practical effects you’ll see in Search Console
Common symptoms tied to this exact issue:
“Crawled – currently not indexed” on legit pages (Google is being cautious)
“Duplicate, Google chose different canonical than user” or “Alternate page with proper canonical tag”
Soft 404 reports (or inconsistent URL Inspection results)
Indexing volatility when you redeploy or when caching changes