
2026-06-02 · 8 min read

Why a Hosting Checker Can Be Wrong: CDN and Reverse Proxy

A hosting checker reads what a domain shows to the public internet, and a CDN like Cloudflare deliberately changes that picture. This guide explains why the visible IP is often the edge, not the real host, and which signals raise or lower confidence. It also covers where the honest answer is simply origin hosting unknown, and why bypassing a site's security to find the origin is the wrong move.

Author: Tomáš Mahrík
[Image: How a CDN like Cloudflare hides the origin host from a checker]

If you run a domain through a hosting checker and the answer comes back as "Cloudflare," you have not really learned where the website lives. You have learned which network answered the request. That is not a bug in the tool, and it is not a trick by the site owner. It is how a large and growing share of the web is built: a content delivery network sits in front of the real server, and the real server stays out of view. This article explains, honestly, why a hosting checker is sometimes wrong, what it can still tell you with confidence, and where the only correct answer is "we do not know."

The short version: a hosting checker reports what a domain exposes to the public internet. When a CDN or reverse proxy sits in front of the origin, what the domain exposes is the CDN. The tool is reading the truth — it is just not the truth you were asking about.

Origin hosting versus the edge

Every website runs on an origin — the server (or cluster) where the application code, the database and the actual files live. Historically, the origin was also the thing you talked to directly: you resolved the domain to an IP, that IP belonged to a hosting company, and "who hosts this site" had one clean answer.

A CDN changes that. A content delivery network is a layer of servers distributed around the world that sit between visitors and the origin, caching content close to users and proxying the rest. The visitor connects to the nearest CDN node (the edge); the edge serves cached content directly or fetches it from the origin behind the scenes. Cloudflare's own explainer describes a CDN as a geographically distributed group of servers that caches content near end users, which is exactly the property that makes the origin disappear from public view (Cloudflare CDN learning center).

This is not a niche setup. According to the HTTP Archive Web Almanac, CDN adoption now covers a large share of websites and a majority of the most popular ones, with a handful of providers — Cloudflare prominent among them — serving an outsized portion of traffic (HTTP Archive Web Almanac 2025 – CDN). When you check a random popular site today, the odds that you are looking at an edge rather than an origin are genuinely high.

For the practical difference between the two, see the companion guide on hosting detection in general.

What DNS looks like behind a CDN

The clearest place to see the edge replace the origin is DNS. The pattern differs slightly by provider, but the shape is consistent.

  1. Cloudflare, in its standard full setup, has you move your domain's nameservers to Cloudflare (a partial, CNAME-based setup also exists). The A/AAAA records for proxied hostnames then resolve to Cloudflare anycast IPs (commonly within ranges such as 104.16.0.0/13 and 172.64.0.0/13). The origin IP is configured inside Cloudflare and is not published in public DNS for those proxied records.
  2. Fastly is usually wired up with a CNAME pointing at a Fastly-managed hostname (for example, something under fastly.net), so the public resolution chain ends at Fastly's network rather than your server.
  3. Akamai likewise relies on CNAME chains into Akamai's edge hostnames (often under edgekey.net or edgesuite.net), with mapping handled inside Akamai's DNS.
  4. Amazon CloudFront sits behind a cloudfront.net distribution hostname (again typically via CNAME or an alias record), resolving to AWS edge IPs.

In all four cases the public DNS answer points at the provider's edge network. Cloudflare's edge IPs are anycast, meaning the same IP is announced from many locations at once; some other providers rely partly or mainly on DNS-based mapping that hands each visitor a nearby edge address. Either way, the geolocation of that IP tells you about the CDN's points of presence, not where the origin's server racks are. The origin address still exists, because the CDN has to reach it somehow, but it is held privately inside the provider's configuration, not in the zone the rest of the internet can query.
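
As a rough illustration of the CNAME patterns listed above, the following Python sketch matches an already-resolved CNAME chain against the provider suffixes named in this section. The function name and the suffix table are assumptions made for this example, not the checker's actual code, and a real table would be longer and updated regularly.

```python
from typing import Optional

# Suffixes taken from the provider patterns described above.
# Illustrative only: a production table would cover more providers.
CDN_CNAME_SUFFIXES = {
    "fastly.net": "Fastly",
    "edgekey.net": "Akamai",
    "edgesuite.net": "Akamai",
    "cloudfront.net": "Amazon CloudFront",
}


def cdn_from_cname_chain(chain: list[str]) -> Optional[str]:
    """Return the first CDN whose hostname suffix appears in the chain."""
    for hostname in chain:
        # Normalize: drop the trailing root dot and compare case-insensitively.
        name = hostname.rstrip(".").lower()
        for suffix, provider in CDN_CNAME_SUFFIXES.items():
            # Match whole labels only, so "notfastly.net" does not match.
            if name == suffix or name.endswith("." + suffix):
                return provider
    return None
```

A domain on Cloudflare's nameserver-based setup would return None here, because its public records are plain A/AAAA answers with no provider hostname in the chain. That is why the CNAME chain is only one signal among several.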

Why the visible IP is often the CDN

This is the single most common reason a hosting checker looks "wrong." A checker resolves the domain, gets an IP, looks up the ASN (the autonomous system that owns that IP block), and reports the organization behind it. When a CDN is in front, that organization is the CDN. So the checker faithfully reports "Cloudflare" or "Amazon / CloudFront" or "Fastly" — and that is genuinely the network you are connecting to.

What it cannot see from that one lookup is the origin behind the proxy. The whole point of a reverse proxy is that the client never talks to the origin directly; it talks to the proxy, which talks to the origin on a private path. There is no public record that says "the real server is at provider X." Treating the visible IP as the host in this situation produces a confident, clean, and wrong answer about origin hosting — while being a perfectly correct answer about which CDN is in play.
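
To make the "which network answered" step concrete, here is a minimal Python sketch that checks a resolved IP against a table of known edge ranges using the standard ipaddress module. The two Cloudflare prefixes are an illustrative subset, and the table and function name are assumptions for this example; a real checker would more likely consult an ASN database or the provider's current published list.

```python
import ipaddress
from typing import Optional

# Illustrative subset only (assumption): providers publish longer lists
# that change over time, so a real tool should not hard-code them.
EDGE_RANGES = {
    "Cloudflare": ["104.16.0.0/13", "172.64.0.0/13"],
}


def edge_network_for_ip(ip: str) -> Optional[str]:
    """Name the edge network that owns this IP, or None if not in the table.

    This answers "which network did I connect to", not "who hosts the origin".
    """
    addr = ipaddress.ip_address(ip)
    for provider, cidrs in EDGE_RANGES.items():
        for cidr in cidrs:
            # Membership is False when address and network versions differ.
            if addr in ipaddress.ip_network(cidr):
                return provider
    return None
```

A match only says the visible IP belongs to an edge network. A None result means the IP is not in this small table, which is not evidence that the IP is the origin.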

This is why a good checker separates two questions that beginners tend to merge: which network is serving this domain (often answerable) and who runs the origin server (frequently not).

Signals that raise — or lower — confidence

No single lookup is decisive, so the honest approach is to combine signals and report a confidence level rather than a verdict. Useful signals include:

  1. CNAME chain. Following the CNAME records often names the provider outright (*.fastly.net, *.edgekey.net, *.cloudfront.net). The chain is one of the most reliable CDN tells.
  2. ASN of the resolved IP. Mapping the IP to its autonomous system distinguishes a CDN/cloud network from a traditional hosting ASN. It identifies the network, not necessarily the origin host.
  3. HTTP response headers. Headers such as Server, Via, X-Cache, CF-Ray and Server-Timing frequently disclose an intermediary. The MDN HTTP headers reference documents what each one means and which are proxy/cache related — but headers are self-reported and can be stripped, faked or omitted, so they corroborate rather than prove.
  4. NS records. Nameservers delegated to a provider (for example, Cloudflare-assigned nameservers) strongly suggest that provider is managing the edge.
  5. MX records. Mail is a quiet, supplementary hint. Email does not pass through the web CDN, so MX records point at the real mail infrastructure. When that infrastructure is a hosted mail suite (for example, Google Workspace or Microsoft 365), it says nothing about the origin. When the MX record points at a host on the domain itself or at a specific hosting provider, it can suggest where the origin sits, even when the web layer is fully proxied.
  6. Historical DNS. Passive DNS history can reveal the IP a domain resolved to before a CDN was added. This is a legitimate, public-record signal — it is reading the past, not probing the present.
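
The header signal from the list above can be sketched in a few lines of Python. This hypothetical helper only collects the headers named in item 3 from a response that has already been fetched; it deliberately draws no conclusion by itself, since headers are self-reported.

```python
from typing import Mapping

# Header names from the list above, lowercased for comparison.
DISCLOSURE_HEADERS = ("server", "via", "x-cache", "cf-ray", "server-timing")


def disclosure_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Return the subset of response headers that may disclose an intermediary.

    HTTP header names are case-insensitive, so names are lowercased first.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in DISCLOSURE_HEADERS if name in lowered}
```

The caller still has to interpret the values, and an empty result proves nothing, because a proxy can strip or omit every one of these headers.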

Even with all of these aligned, the result is a probability, not a certainty. Stacking independent signals is what moves a report from "guess" to "high confidence." Public market datasets like W3Techs hosting provider rankings are built on similar fingerprinting, and they describe their numbers as detection-based estimates for the same reason: the underlying signals are probabilistic.
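
One simple way to stack independent signals is a noisy-OR combination, where each observed signal removes a share of the remaining doubt. The Python sketch below is an assumption-laden illustration: the weights, the band thresholds and the function names are all hypothetical and are not taken from any real checker.

```python
from typing import Iterable

# Hypothetical per-signal weights (assumption): how much each signal alone
# would convince us that a given CDN is in front of the domain.
SIGNAL_WEIGHTS = {
    "cname_chain": 0.6,
    "asn": 0.5,
    "ns_records": 0.4,
    "headers": 0.3,
}


def combined_confidence(observed: Iterable[str]) -> float:
    """Noisy-OR: 1 minus the product of (1 - weight) over observed signals."""
    remaining_doubt = 1.0
    for signal in set(observed):  # count each signal at most once
        remaining_doubt *= 1.0 - SIGNAL_WEIGHTS.get(signal, 0.0)
    return 1.0 - remaining_doubt


def confidence_band(score: float) -> str:
    """Map a score to a coarse band; thresholds are illustrative."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```

The noisy-OR form only makes sense when the signals are reasonably independent. Two signals that derive from the same underlying record should not both be counted.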

For the header- and CNAME-level mechanics of telling the four big networks apart, see the dedicated CDN detection guide.

The ethical boundary: do not unmask hidden origins

There is a category of techniques aimed at defeating the proxy and exposing the origin IP a site owner has deliberately hidden — scanning provider IP ranges, abusing misconfigured services, mining leaked SSL certificate data, triggering error pages, or hitting endpoints that bypass the WAF. This site does not do that, and this guide will not teach it.

The reason is simple: hiding the origin behind a CDN is very often a security measure. It protects the real server from direct denial-of-service attacks and from probing. Bypassing that protection to reveal an IP the owner chose to conceal undermines a legitimate defense, and depending on method and jurisdiction it can cross into unauthorized access. A hosting checker should work entirely from information a domain publishes voluntarily — DNS records, response headers, public passive-DNS history, ASN registries. If the origin is not discoverable from those public, non-intrusive sources, the correct response is not to dig harder. It is to say so.

That self-limit is a feature, not a weakness. A tool you can run on a competitor, a client, or your own infrastructure without crossing a line is a tool you can keep using.

How the report should label the result

Honest output is the whole point of an article about trust, so here is the labeling discipline a checker should follow:

  1. "CDN detected: Cloudflare" (or Fastly / Akamai / CloudFront) — when the edge network is identified with confidence from CNAME, ASN, nameservers and headers. This describes the layer you are connecting to.
  2. "Origin hosting: unknown (behind CDN)" — when a CDN is detected but the origin cannot be determined from public, non-intrusive signals. This is the correct, honest answer, and it should never be quietly replaced with the CDN's name.
  3. "Origin hosting: likely [provider]" with a confidence note — when independent signals (historical DNS, MX, supporting NS) converge on a probable origin, clearly flagged as an inference rather than a fact.
  4. "Hosting provider: [provider]" — reserved for the case where there is no proxy in front and the resolved IP genuinely belongs to the origin host.

The failure mode to avoid is collapsing "origin unknown" into a confident hosting name. A report that says "we see Cloudflare, and we cannot see past it without crossing a line we will not cross" is more useful than one that guesses — because you can trust the next thing it tells you.
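
The labeling rules above could be encoded roughly as follows. This is a minimal Python sketch with a hypothetical function name and parameters; the label strings follow the four cases in the list, and the key property is that a detected CDN never ends up in the hosting-provider slot.

```python
from typing import Optional


def report_labels(
    cdn: Optional[str],
    resolved_provider: str,
    origin_guess: Optional[str] = None,
) -> list[str]:
    """Build report labels without collapsing "origin unknown" into a name.

    cdn: detected edge network, or None when no proxy is in front.
    resolved_provider: organization behind the visible IP.
    origin_guess: probable origin inferred from independent public signals.
    """
    if cdn is None:
        # No proxy in front: the visible IP belongs to the host itself.
        return [f"Hosting provider: {resolved_provider}"]

    labels = [f"CDN detected: {cdn}"]
    if origin_guess:
        # An inference, so it is flagged as one.
        labels.append(f"Origin hosting: likely {origin_guess} (inferred, not confirmed)")
    else:
        labels.append("Origin hosting: unknown (behind CDN)")
    return labels
```

For example, report_labels("Cloudflare", "Cloudflare") yields the CDN label plus "Origin hosting: unknown (behind CDN)", never "Hosting provider: Cloudflare".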

Summary

A hosting checker reads what a domain shows to the public internet. When a CDN or reverse proxy sits in front of the origin, the public answer is the CDN, the visible IP is the edge, and the real host stays private by design. The right move is to combine signals — CNAME chain, ASN, headers, NS, MX, historical DNS — into a confidence level, to clearly separate "CDN detected" from "origin hosting unknown," and to refuse to bypass a site's security to unmask a hidden origin. As CDN adoption keeps climbing, "we connect to the edge and cannot ethically see past it" will only become a more common — and more honest — answer.

If you want to see these signals laid out for a specific domain, run it through the checker and read the labels for what they say, not for what you hoped they would say.