#SPA Fallback errors

dapper hedge

The implications are ongoing...
This behaviour can absolutely mess with Googlebot’s crawling and indexing. It won’t “kill” rankings forever, but it slows trust-building and can keep good pages in “crawled, not indexed” purgatory longer than they should be.

  1. 200 for junk URLs (instead of 404) = “soft 404” + crawl waste

When Googlebot hits:

/this-should-never-exist-123 → 200 OK + homepage HTML

Google sees a successful response but the content doesn’t match the URL intent. Typical outcomes:

Soft 404 classification (Google’s term for “this looks like an error page even though it returns 200”).

Wasted crawl budget: Google spends fetches on garbage URLs that should have been rejected instantly.

Delayed discovery/indexing of your real pages, especially on a new site with low authority.

Even if Google is smart enough not to index the junk URL, you still pay the price in crawl and trust.
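
A quick way to confirm the soft-404 behaviour is to request a path that cannot exist and compare the answer with the homepage. This is only a sketch: `classifyJunkResponse` and `probeSoft404` are hypothetical helper names, and the exact-match body comparison assumes the homepage HTML is identical between requests (no nonces or timestamps).

```javascript
// Classify how a server answered a request for a URL that should not exist.
// `homepageBody` is the HTML served at "/", fetched separately for comparison.
function classifyJunkResponse(status, body, homepageBody) {
  if (status === 404 || status === 410) return 'hard-404';
  if (status >= 300 && status < 400) return 'redirect';
  if (status === 200 && body === homepageBody) return 'soft-404';
  return 'other';
}

// Probe a site with a random path that is very unlikely to exist.
// `fetchImpl` defaults to the global fetch available in Node 18+.
async function probeSoft404(origin, fetchImpl = fetch) {
  const junkPath = '/this-should-never-exist-' + Math.random().toString(36).slice(2);
  const [home, junk] = await Promise.all([
    fetchImpl(origin + '/', { redirect: 'manual' }),
    fetchImpl(origin + junkPath, { redirect: 'manual' }),
  ]);
  return classifyJunkResponse(junk.status, await junk.text(), await home.text());
}
```

Calling `probeSoft404('https://your-domain.example')` should resolve to `'hard-404'` on a correctly configured site and to `'soft-404'` while the fallback is active.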

  2. Duplicate content signals and canonical confusion

If the junk URL serves homepage content and also outputs homepage canonical:

Google sees many URLs returning the same content with the same canonical.

That’s not fatal, but it creates:

duplicate content noise, and

weaker URL-level confidence while Google figures out what’s real.

This contributes to “Discovered – currently not indexed” or “Crawled – currently not indexed” on legitimate pages, because Google is triaging what to trust first.

  3. Wrong status codes break expected crawling heuristics

Googlebot expects:

real page → 200

moved page → 301/308

non-existent → 404/410

When a site returns 200 for non-existent URLs, Google has to infer intent from content patterns. That inference step is why you see more:

soft 404 reports,

URL inspection weirdness,

and slower indexing.
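
As a rough illustration of the status-code contract above, the routing decision can be written as one small function. The route tables here are hypothetical examples, not the site's real page list, and `resolveRoute` is an assumed name.

```javascript
// Hypothetical route tables; a real site would build these from its page list.
const knownPages = new Set(['/', '/services/office-removals']);
const movedPages = new Map([['/office-removals', '/services/office-removals']]);

// Decide the response for a path: 200 for real pages, 301 for moved pages,
// 404 for everything else (never a 200 fallback).
function resolveRoute(path) {
  if (knownPages.has(path)) return { status: 200 };
  if (movedPages.has(path)) return { status: 301, location: movedPages.get(path) };
  return { status: 404 };
}
```

The point of the sketch is the last line: anything not explicitly known or moved falls through to 404 rather than to the homepage.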

  4. sitemap.xml being treated like an app route is a big problem

If /sitemap.xml is ever served via fallback/JS rather than as a static XML file:

Google may show sitemap errors (format, fetch, parsing, or “couldn’t read”).

Even if it “reads” it sometimes, inconsistent delivery can cause:

delayed URL discovery,

partial crawling,

or repeated reprocessing.

For a new site, the sitemap is one of your strongest “please crawl these” signals, so this matters a lot.
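
One way to spot a sitemap that is being answered by the fallback is to check the response's status, Content-Type and root element. A minimal sketch, with `checkSitemapResponse` as a hypothetical helper; it is a heuristic, not an XML validator.

```javascript
// Return a list of problems with a /sitemap.xml response; empty means it looks fine.
function checkSitemapResponse(status, contentType, body) {
  const problems = [];
  if (status !== 200) problems.push('status is ' + status + ', expected 200');
  if (!/\bxml\b/i.test(contentType || '')) {
    problems.push('Content-Type is not XML: ' + contentType);
  }
  // An HTML document here usually means the homepage was served instead.
  if (/^<!doctype html|^<html/i.test(body.trimStart())) {
    problems.push('body is HTML, likely the SPA fallback');
  }
  if (!/<(urlset|sitemapindex)[\s>]/.test(body)) {
    problems.push('no <urlset> or <sitemapindex> root element');
  }
  return problems;
}
```

Feed it the status, the `Content-Type` header and the body text of a fetch to `/sitemap.xml`; running it a few times across deploys would also show whether delivery is inconsistent.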

  5. Practical effects you’ll see in Search Console

Common symptoms tied to this exact issue:

Crawled – currently not indexed on legit pages (Google is cautious)

“Duplicate, Google chose different canonical than user” or “Alternate page with proper canonical tag”

Soft 404 coverage issues (or weird URL inspection results)

Indexing volatility when you redeploy or caching changes

  6. Why your redirects workaround helps (but doesn’t fully solve it)

Your 301 redirects for known legacy URLs (eg /office-removals → /services/office-removals) help because:

Googlebot quickly lands on the correct canonical URL

Link equity consolidates properly

Users don’t bounce from the wrong page

But it doesn’t stop infinite junk URLs from returning 200 unless the platform returns proper 404/410.

dapper hedge

It's been 2 weeks and I'm still waiting for senior tech support to address this issue. In the meantime:

SPA Fallback Issue — All Mitigation Measures Taken

Root cause: The Kubernetes ingress intercepts all requests and returns index.html (the homepage) with HTTP 200 for any URL that doesn't match a file. That covers unknown paths and .html-suffixed URLs, so the 404 and 301 responses the server would have sent never reach the client. This bypasses the application server entirely.

What the platform is doing (observed behaviour)

Any unknown URL (e.g. /wrong-page) → returns homepage HTML with HTTP 200 instead of 404
Any .html-suffixed URL (e.g. /about.html) → returns the page content with HTTP 200 instead of a 301 redirect to the clean URL
Server-side response headers (e.g. Cache-Control: no-cache) are overridden to public, max-age=300 by the platform
The node server.cjs correctly handles all of the above locally on localhost:3000 — the ingress intercepts before the server is ever reached

Mitigation 1 — server.cjs redirect logic (bypassed by ingress)

Added .html → clean URL 301 redirect logic directly in the Express server. Not effective — the ingress rewrites the URL before it reaches the server, so the rule never fires.
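
For reference, the redirect logic was along these lines. This is a simplified sketch rather than the actual server.cjs code: `cleanUrlFor` and `htmlSuffixRedirect` are illustrative names, and the middleware drops any query string.

```javascript
// Map a ".html" path to its clean URL, or return null if no redirect applies.
function cleanUrlFor(path) {
  if (!path.endsWith('.html')) return null;
  const stripped = path.slice(0, -'.html'.length);
  // "/index.html" and "/guides/index.html" collapse to their directory.
  if (stripped.endsWith('/index')) return stripped.slice(0, -'index'.length);
  return stripped;
}

// Express-style middleware: 301 to the clean URL, otherwise pass through.
function htmlSuffixRedirect(req, res, next) {
  const target = cleanUrlFor(req.path);
  if (target === null) return next();
  res.redirect(301, target);
}
```

It would be registered with `app.use(htmlSuffixRedirect)` ahead of the static file handler, which works on localhost:3000 but never runs behind the ingress.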

Mitigation 2 — Cloudflare Bulk Redirects CSV (free plan limit)

Generated a CSV of all .html → clean URL redirects for Cloudflare's Bulk Redirects feature. Not effective — free plan has a 10-entry limit; the site has 80+ pages.
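
The CSV itself is easy to generate from the list of clean paths. A sketch only: the `source,target,status` column order is an assumption about the import format and should be checked against Cloudflare's documentation, and `buildRedirectCsv` is a hypothetical helper.

```javascript
// Build one "source,target,status" row per page, skipping the homepage.
// Assumes every clean path "/x" has a legacy twin at "/x.html".
function buildRedirectCsv(origin, cleanPaths) {
  return cleanPaths
    .filter((p) => p !== '/')
    .map((p) => `${origin}${p}.html,${origin}${p},301`)
    .join('\n');
}
```

For example, `buildRedirectCsv('https://example.com', ['/', '/about'])` yields a single row for `/about.html`.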

Mitigation 3 — Cloudflare Page Rule for .html → 301 (deployed, unverified)

Created a Cloudflare Page Rule: perthpriorityremovals.com.au/*.html → https://perthpriorityremovals.com.au/$1 (301 Permanent Redirect). Cache was fully purged. Status: deployed but unverifiable. The domain is currently stuck in a "linking pending" state pointing to the old deployment, so end-to-end testing is blocked.

Mitigation 4 — _redirects file (Netlify/CF Pages format, ignored by platform)

The Astro build generates a public/_redirects file with /* /404.html 404. Not effective — this format is only processed by Cloudflare Pages or Netlify, not by the custom node server.cjs.

Mitigation 5 — Client-side 404 detection via routes.json with noindex injection (removed, caused harm)

A previous implementation used a client-side script that fetched routes.json (a list of all valid URLs) and injected <meta name="robots" content="noindex"> when a page wasn't in the list. Removed — it was incorrectly marking valid pages as noindex, causing two guide pages to disappear from Google's index.

Mitigation 6 — Client-side redirect on homepage (current, active workaround)

Added a 5-line inline <script is:inline> to index.astro (homepage only). Logic: if window.location.pathname !== '/', immediately call window.location.replace('/404'). Since the SPA fallback exclusively serves the homepage HTML for unknown paths, this script only ever fires in the fallback scenario. Effective for user experience — visitors see the custom 404 page. Limitation: HTTP status code remains 200; the redirect URL becomes /404.
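
The check is roughly the following. It is written here as a small function so the decision can be tested outside a browser; `fallbackRedirectTarget` is an illustrative name, and the real inline script may differ in detail.

```javascript
// Decide where the fallback-served homepage should send the visitor.
// Returns the redirect target, or null when the homepage was requested directly.
function fallbackRedirectTarget(pathname) {
  return pathname !== '/' ? '/404' : null;
}

// In the browser (inside the inline script on the homepage only):
if (typeof window !== 'undefined') {
  const target = fallbackRedirectTarget(window.location.pathname);
  if (target !== null) window.location.replace(target);
}
```

Query strings do not affect `pathname`, so `/?utm_source=x` stays on the homepage. A request for `/index.html` would be redirected, which is worth keeping in mind if that URL is linked anywhere.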

What a proper platform-level fix would look like

The ideal resolution (requiring Emergent/platform support) would be one of:

Disable SPA fallback on the Kubernetes ingress for this deployment — serve a true 404 from server.cjs instead of falling back to index.html
Allow server.cjs to handle routing before the ingress rewrites requests — so .html 301s and 404 responses reach the client intact
Respect origin Cache-Control headers — server.cjs sets no-cache for HTML files; the ingress overrides this to public, max-age=300, making it impossible to push updates without a CDN purge
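
To make the Cache-Control point concrete, here is a sketch of the origin's intent and a check for whether the edge changed it. Both function names are hypothetical, the "extensionless path means HTML" rule is an assumption, and so is the non-HTML value.

```javascript
// Pick a Cache-Control value by file type: HTML must always be revalidated.
// Extensionless paths such as "/about" are assumed to be HTML pages.
// The value for other assets is an assumption, not taken from server.cjs.
function cacheControlFor(path) {
  const isHtml = path.endsWith('.html') || !/\.[a-z0-9]+$/i.test(path);
  return isHtml ? 'no-cache' : 'public, max-age=300';
}

// Compare what the origin intended with what the client actually received.
function cacheHeaderOverridden(path, receivedHeader) {
  return receivedHeader !== cacheControlFor(path);
}
```

With the behaviour described above, `cacheHeaderOverridden('/about', 'public, max-age=300')` is true, which is the symptom to show support.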
true raft

Hello,

Please reach out to our email support team via:

[email protected].

Thanks, @true raft