RFCs and point out that the first slash terminates the *authority* section of a URL. So, while it may look at first glance like everything before the ‘@’ character might be interpreted as *userinfo* for the host `v1271.zip`, we were actually perfectly safe.

Intercepting all of the web traffic that originates from our schools throws up some unique challenges. We end up filtering requests from a mix of up-to-date browsers, old software that implements old and obsoleted specs, and applications that speak non-RFC-compliant HTTP dialects with a dedicated backend service and were never expected to talk to an actual web server or proxy. This means that we have to detect and gracefully handle a host of quirky behaviours to avoid breaking the software our customers depend on. But we’re stricter about the way that we interpret URLs, to guard against any ambiguity that might lead to incorrectly applied filtering.

Usually it’s the job of the web browser to parse the URL as you see it in the URL bar, which is then written in a different format inside an actual HTTP request. Notably, the host and *userinfo* components get written into dedicated header fields, while the URL path is provided as a URI as part of the initial request line. Parsing becomes a little more complicated when your browser is explicitly configured to use a proxy server. The intermediary is expected to accept an absolute-form URI that’s identical to the full URL, but the upshot of filtering HTTP requests is that we rely on the client to tell us where it wants to go.
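The parsing rule described above is easy to demonstrate. Here's a minimal sketch using Python's standard-library `urllib.parse` (not our filtering code); `github.com` is just an illustrative host name, while `v1271.zip` is the host from the URL discussed above:

```python
# Sketch: the first slash ends the authority, so an '@' after it can
# never introduce userinfo. Hosts here are illustrative examples only.
from urllib.parse import urlsplit

# '@' appears *before* the first slash: everything up to it is userinfo,
# and the real host is v1271.zip.
tricky = urlsplit("https://github.com@v1271.zip/some/path")
print(tricky.hostname)  # -> v1271.zip
print(tricky.username)  # -> github.com

# '@' appears *after* the first slash: the authority already ended,
# so the '@' is plain path data and the host is what it appears to be.
safe = urlsplit("https://github.com/kubernetes@v1271.zip")
print(safe.hostname)  # -> github.com
print(safe.username)  # -> None
```

The same split is what a strict filter performs: anything past the first slash after the authority is path, never credentials.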
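To make the two request shapes concrete, here's a hypothetical sketch (not our proxy's code; `github.com` is an illustrative host) of what the same navigation looks like on the wire, first sent directly to the origin server and then through an explicitly configured proxy:

```python
# Sketch of HTTP/1.1 request forms. Hosts are illustrative examples.

# Direct to the origin server: origin-form. The host lives in the Host
# header, and the request line carries only the path.
direct = (
    "GET /kubernetes@v1271.zip HTTP/1.1\r\n"
    "Host: github.com\r\n"
    "\r\n"
)

# Via an explicit proxy: absolute-form. The full URL sits on the request
# line, so the proxy must parse it -- strictly -- for itself.
proxied = (
    "GET https://github.com/kubernetes@v1271.zip HTTP/1.1\r\n"
    "Host: github.com\r\n"
    "\r\n"
)
```

In both cases the filter only ever sees what the client chose to send, which is why strict, unambiguous URL parsing on our side matters.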
Since we trust major browsers to perform this task safely, I reasoned that our users were no more exposed to phishing attacks than if the URL pointed to a host under any other TLD.