Google’s .zip & .mov TLDs – Blocking phishing URLs

The launch of Google’s .zip and .mov top-level domains (TLDs) stirred controversy in the security community and the internet at large, because of the opportunity they present for crafting increasingly plausible-looking phishing URLs.

On the face of it, we assumed that the issue would likely warrant a cursory investigation and marketing response to help raise awareness of the tell-tale signs to look out for. However, in forming our response we realised that we could make changes to the way we parse and filter URLs to better guard against spoofing attacks.

Fool me once

Amid the controversy, a post by security researcher Bobby Rauch highlights how an attacker might use the .zip TLD, posing a simple question:

Can you quickly tell which of the URLs below is legitimate and which one is a malicious phish that drops evil.exe?

https://github.com∕kubernetes∕kubernetes∕archive∕refs∕tags∕@v1271.zip

https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.27.1.zip

We came across this example published with no context or link to the full post and, despite the URL parsing experience that we’ve built (developing and running a service that handles about 1 billion web requests a day), our R&D architect initially found it hard to tell the two apart.

“Ah!” I said, “I see where a naive parser might go wrong.”

I then proceeded to mechanically quote the RFCs and point out that the first slash terminates the *authority* section of a URL so, while it may look at first glance like everything before the ‘@’ character might be interpreted as *userinfo* for the host `v1271.zip`, we were actually perfectly safe.

Intercepting all of the web traffic that originates from our schools throws up some unique challenges. We end up filtering requests from a mix of up-to-date browsers, old software that implements old and obsoleted specs, and applications that speak non-RFC-compliant HTTP dialects with a dedicated backend service and were never expected to talk to an actual web server or proxy. This means that we have to detect and gracefully handle a host of quirky behaviours, to avoid breaking the software our customers depend on. But we’re stricter about the way that we interpret URLs, to guard against any ambiguity that may lead to incorrectly applied filtering.

Usually it’s the job of the web browser to parse the URL as you see it in the URL bar, which is then written in a different format inside an actual HTTP request. Notably, the host and *userinfo* components get written into dedicated header fields while the URL-path is provided as a URI as part of the initial request line. Parsing becomes a little more complicated when your browser is explicitly configured to use a proxy server. The intermediary is expected to accept an absolute-form URI that’s identical to the full URL, but the upshot of filtering HTTP requests is that we rely on the client to tell us where you want to go.
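To make the two request formats concrete, here is a minimal Python sketch of how a client might write the same URL into a request sent directly to a server and into one sent via a proxy. The `request_head` helper and the `example.com` URL are our own illustrative assumptions, not part of any browser or of SurfProtect, and the sketch ignores *userinfo* and every header other than `Host`.

```python
from urllib.parse import urlsplit


def request_head(url: str, via_proxy: bool = False) -> list[str]:
    """Return the request line and Host header a client might send (sketch)."""
    parts = urlsplit(url)
    # The Host header carries the host (and port, if one was given).
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    if via_proxy:
        # Absolute-form: the proxy receives the full URL.
        target = url
    else:
        # Origin-form: only the path and query reach the request line.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return [f"GET {target} HTTP/1.1", f"Host: {host}"]


url = "http://example.com/archive/v1.27.1.zip?download=1"
print("\n".join(request_head(url)))
print("\n".join(request_head(url, via_proxy=True)))
```

In both cases the filter only sees what the client chose to write, which is why the client's own URL parsing matters so much.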

Since we trust major browsers to perform this task safely, I reasoned that our users were no more exposed to phishing attacks than if the URL pointed to a host under any other TLD.

The dangers of unicode

The eagle-eyed among us have probably already noticed that the slashes in our two candidate URLs don’t quite look the same. That’s because, while the legitimate URL uses the expected, ASCII-defined slash (“/”) character to delimit path segments, the phishing example employs some cunning sleight-of-hand to replace slashes with a look-alike unicode character.
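A quick way to expose this kind of substitution is to list every non-ASCII character in a string along with its code point and name. The sketch below uses Python's standard `unicodedata` module; the `non_ascii_characters` helper is a name we made up for illustration, and the sample string assumes the look-alike is U+2215, which is only one of several slash-like characters an attacker could choose.

```python
import unicodedata


def non_ascii_characters(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127
    ]


# A real slash produces no findings; the look-alike is reported by name.
print(non_ascii_characters("github.com/kubernetes"))
print(non_ascii_characters("github.com\u2215kubernetes"))
```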

It’s relatively easy to see the difference when you’re directly comparing two side-by-side examples, but the replacement is far harder to notice when clicking on a link that is shown on its own, particularly when the *protocol* component of the URL is omitted:

github.com∕kubernetes∕kubernetes∕archive∕refs∕tags∕@v1271.zip

The effect is easier to describe with this visualisation:

[Image: Google's .zip & .mov TLDs]

Because there’s no actual slash character in what we thought to be the path, this substitution means that “correct” parsing of the URL will interpret everything to the left of the ‘@’ as authentication data, while the host and path are expected on the right. This means that the link actually takes you to v1271.zip, which is a valid hostname under the newly introduced TLD.
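The parsing rule can be sketched in a few lines of Python. This is a simplified illustration rather than a complete RFC 3986 parser: the `split_authority` helper is hypothetical, it assumes the URL includes a scheme followed by `://`, and it assumes the look-alike character is U+2215.

```python
def split_authority(url: str) -> tuple[str, str]:
    """Return (userinfo, host[:port]) for a URL of the form scheme://authority..."""
    _, _, rest = url.partition("://")
    # The authority ends at the first ASCII '/', '?' or '#'.
    authority = rest
    for delimiter in "/?#":
        authority = authority.split(delimiter, 1)[0]
    # Everything before the last '@' is userinfo; the rest is the host.
    userinfo, _, hostport = authority.rpartition("@")
    return (userinfo, hostport)


legit = "https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.27.1.zip"
# Same shape as the phishing example, with U+2215 standing in for each slash.
phish = (
    "https://github.com\u2215kubernetes\u2215kubernetes"
    "\u2215archive\u2215refs\u2215tags\u2215@v1271.zip"
)

print(split_authority(legit))  # host is github.com, no userinfo
print(split_authority(phish))  # host is v1271.zip
```

Because none of the look-alike characters terminates the authority, the whole of the apparent path ends up in the *userinfo* portion.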

Even when we knew to look for unexpected behaviour in one of the example URLs, examining the correctness of the parser completely missed the point of the vulnerability: we assumed the ‘@’ character to be safe because we erroneously thought it belonged to a portion of the URL in which it has no particular meaning, rather than actually delimiting the *userinfo* subcomponent from the rest of the *authority*.

Note that SurfProtect will in this case still apply your school’s filtering policy to the actual host you’re being directed to, but phishing attempts aren’t simple to classify based on the content they serve alone, and we’d rather find a safer solution for guarding against users being tricked.

An attack on .zip

The claim that we’re dubious of is that the problem is somehow caused by the introduction of the new .zip and .mov TLDs.

While it’s true that the collision with a popular file extension helps to hide the fact that there’s even another potential hostname that appears after the ‘@’ character, the post’s own URL demonstrates that there are other viable concealment methods:

https://medium.com/@bobbyrsec/the-dangers-of-googles-zip-tld-5e1e675e59a5

If that format doesn’t worry users then it’s unlikely that anyone would be alarmed by this similar format:

medium.com∕@pepperoni.pizza∕the-dangers-of-too-much-cheese-5e1e675e59a5

And we suspect that internationalized domain names will sufficiently obfuscate even long-established country-specific TLDs for users to follow links without concern:

http://www.microsoft.com∕articles∕@xn--bb-eka.at/de/reiseplanung-services/vor-ihrer-reise/reservierung-sitzplatz

Our Response

Many organisations are now apparently recommending that people block access to these new TLDs, and schools can certainly do so if they wish by adding entries for `.zip` and `.mov` to their URL blocklists within the SurfProtect panel.

Contrary to this advice, however, we believe that blocking access to ICANN-approved TLDs that are not reserved for any specific category of content would be neither an appropriate action for a content filtering system nor an effective countermeasure against the spoofing attacks discussed in this article. Instead, we will update SurfProtect’s URL filtering to block access to any URL that contains unicode within the *userinfo* field (which subsumes the obsoleted username:password portion from RFC1738).
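As a rough illustration of the rule, the check could look something like the Python sketch below. This is not SurfProtect's implementation: `userinfo_of` and `should_block` are hypothetical names, the parsing is deliberately simplified, and the URL is assumed to include a scheme followed by `://`.

```python
def userinfo_of(url: str) -> str:
    """Return the userinfo portion of a URL, or '' if there is none (sketch)."""
    _, _, rest = url.partition("://")
    # The authority ends at the first ASCII '/', '?' or '#'.
    authority = rest
    for delimiter in "/?#":
        authority = authority.split(delimiter, 1)[0]
    userinfo, at, _ = authority.rpartition("@")
    return userinfo if at else ""


def should_block(url: str) -> bool:
    """Block any URL whose userinfo contains a non-ASCII character."""
    return not userinfo_of(url).isascii()


legit = "https://github.com/kubernetes/kubernetes/archive/refs/tags/v1.27.1.zip"
# U+2215 is assumed here as the look-alike slash.
phish = "https://github.com\u2215kubernetes\u2215archive\u2215@v1271.zip"

print(should_block(legit))  # False
print(should_block(phish))  # True
```

A URL with purely ASCII *userinfo* passes this check, so it narrows the attack surface rather than closing it.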

We believe that this response will better protect users from the threat of phishing by ensuring that the components of a URL can be identified through visual inspection.

This action will not prevent users from being misled by all phishing URLs, since correct interpretation is still required to identify the values of each component within a URL. In the example provided in RFC3986, it is noted that

`ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm`

“might lead a human user to assume that the host is ‘cnn.example.com’, whereas it is actually ‘10.0.0.1’”.
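Python's standard `urllib.parse.urlsplit` can be used to confirm how this example breaks down. The snippet is only a demonstration of one parser's reading of the RFC's example URL.

```python
from urllib.parse import urlsplit

parts = urlsplit("ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm")

print(parts.username)  # cnn.example.com&story=breaking_news
print(parts.hostname)  # 10.0.0.1
print(parts.path)      # /top_story.htm
```

Every character here is plain ASCII, so a rule that only rejects unicode in *userinfo* would let this URL through.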

While we can successfully block some attacks, we still believe that user education is necessary to effectively combat the risk of phishing.
