Friday, 10 December 2021

CVE-2021-44228 log4j RCE mitigation

 "This seems to be generating some buzz" - a passing comment in $WORK's chat app - promoted me to go look at this in a bit more detail. As a systems admin, I generally let the devs guy worry about the health of the applications while I deal with the infrastructure, but this one is bad. Real bad. Like Corona virus for Java application servers. It even came from China (but kudos to the AliBaba guys for letting everyone know - this could have gone very differently).

I've never been able to work on Java developer timescales - and I didn't think this vulnerability would let me. So...

Fail2ban

I've got a small cluster of proxies fronting the web and application servers. These have fail2ban running which does a good job of keeping the script-kiddies out (really - I needed to put in a bypass for the company we subcontract the pen-testing to). So first off was a fail2ban rule:

#
[Definition]

failregex =     ^<HOST>.*\"\${jndi:ldap://
ignoreregex =


But fail2ban reads the log files to get its input. The log files don't get written until the request is processed. It won't catch the first hit.

Containment

The exploit works by retrieving a malware payload from an LDAP server. So the next step I took was to add firewall rules preventing our application servers from connecting to ports 389 and 636 other than our whielisted internal LDAP servers. 

Of course that's only going to help when the attacker is using an LDAP server running on the default ports. Bit it was worth doing. We were already getting attempts to exploit out servers, but they were crude / badly targeted. Until 14 minutes after I rolled out the firewall change. When we got hit by a request which would have triggered a successful exploit.

Prevention

The best mitigation (apart from applying the patch) is to set the "formatMsgNoLookups=true" option (hint for non-Java people out there - add this on the Java command line prefixed with "-D"). However according to the documentation I could find this only works on some version of log4j / it is far from clear just now if those versions are a sub-set or a superset of the versions which are vulnerable to the exploit, and I did not have time to go find out.
 
It seems obvious now, but there is a better way of protecting the systems. The proxy cluster uses nginx, so I went on to add this in the config:

if ($http_user_agent ~* \{jndi: ) {
        return 400 ;
}
if ($http_x_api_version ~*  \{jndi: ) {
        return 400 ;
}

(note that the second statement may have a functional impact).

I don't know if I've covered the entire attack surface with this, but now I get to go to bed and our servers live for another day.