HAPROXY (Rate Limiter + Bad-User Detection)
NGINX (rDNS + GeoIP)
Both platforms used here are completely open source. Please refer to their documentation at the links below:
NGINX: https://nginx.org/en/docs/
HAPROXY: http://www.haproxy.org/#docs
In this article I shall discuss how we can build our own Service Protection Layer, with modules that protect our web application against DoS, DDoS and bad bots (identified by their User-Agents), along with a Web Application Firewall.
Whenever we introduce an extra service layer into our web application architecture, there is a good chance of added latency. Hence, for best results, always follow the flow below when building a Service Protection Layer in front of your web application.
HAPROXY ----Static Content----> Web Application
HAPROXY ----Dynamic Content----> NGINX ----> Web Application
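For reference, here is a minimal HAProxy routing sketch for the flow above; the backend names, server addresses and the file extensions used to identify static content are my own assumptions, not part of any specific setup:
frontend fe_service_protection
    bind *:80
    # Static content is served straight from the web application
    acl static_content path_end -i .css .js .png .jpg .gif .ico
    use_backend be_webapp if static_content
    # Dynamic content is routed through NGINX for the rDNS and GeoIP checks
    default_backend be_nginx
backend be_webapp
    server app1 10.0.0.10:8080 check
backend be_nginx
    server nginx1 10.0.0.20:80 check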
Using HAPROXY we shall perform:
Rate Limiting
User-Agent Detection (Bot Mitigation)
Using NGINX we shall perform:
Fake Google/Yahoo/Bing/Apple Bot Detection (rDNS)
Geo-Blocking (GeoIP)
Please download the signatures from https://github.com/aarvee11/webclient-detection or feel free to write your own!
The art of detecting and blocking attacks is all about finding Signatures, Patterns and understanding the Attack Vector!
Rate Limiting: Rate limiting is a technique used to stop a particular client from abusing our platform with overwhelming requests, ensuring that the resources on the web application server are always available for legitimate users. Rate limiting for a web application should be gauged against four different parameters:
- Number of TCP Connections per client IP
- Rate of Incoming TCP Connections from a Client IP
- Rate of Incoming HTTP Requests from an HTTP Client
- Rate of HTTP Errors generated by an HTTP Client
Note: A client at the application/HTTP layer can be identified by IP, Cookie, Parameter, IP+Cookie, IP+Cookie+User-Agent, or any other HTTP header such as Authorization.
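As an example of identifying a client by something other than the source IP, here is a minimal sketch that tracks a session cookie; the cookie name SESSIONID, the table sizing and the request-rate threshold are assumptions for illustration only:
# A dedicated table for per-cookie counters (a proxy section can only declare one stick-table of its own)
backend st_cookie_rates
    stick-table type string len 64 size 100k expire 30s store http_req_rate(10s)
# In the frontend: track the (hypothetical) session cookie and flag clients that exceed the request rate
http-request track-sc1 req.cook(SESSIONID) table st_cookie_rates if { req.cook(SESSIONID) -m found }
http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-Cookie, if { sc1_http_req_rate() gt 70 }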
When you notice a very large volume of traffic getting caught by the rate controls, prepare to scale out your service protection layer infrastructure using techniques like AWS Auto Scaling based on the Network-In and Network-Out metrics at the instance level.
Bad-bots: Most bots leave a signature behind, and User-Agent is one of the key headers used to detect them. We shall compare the User-Agent against a list of known bad bots that can be downloaded from the GitHub repository mentioned above.
Fake Google/Yahoo/Bing/Apple bots: Most web admins do not set any security controls on friendly bots like the ones mentioned above. The reason? If we block these bots, our SEO, visibility and functionality can take a hit, since these are friendly bots and not bad bots. How do they identify themselves? By connecting IP address and User-Agent. All of the above providers clearly state that they cannot publish their IP address ranges because they change very dynamically, so the only option left is to check the User-Agent, which is consistent. The challenge for a security administrator is that the User-Agent field is entirely configurable, so any client can present a well-known friendly-bot User-Agent and still cause harm to our application. Hence I adopted the reverse-lookup approach suggested by Google: using the rDNS module, I perform a reverse DNS lookup on the connecting IP address of any client presenting one of these User-Agents, and clients that turn out to be fake are rejected automatically. Cool, isn't it?
GeoIP: IP addresses are distributed worldwide through ICANN, and based on that allocation data we can determine the country from which a client is connecting to our web application. What if we have been seeing a good amount of attacks from a specific country? Since we cannot keep track of individual IP addresses, we can opt for geo-blocking by making NGINX geo-aware using the GeoIP module. Once done, we can choose which countries to allow explicitly and block the others, or vice versa!
For the ease of administrators, and to keep this article from being purely theoretical, I have provided config snippets below:
For any further queries please feel free to InMail me!
Happy Hunting folks!
HAPROXY Config Snippet:
In the Frontend Section:
=======================================
# Capture the Rate Limit Headers and Actual Client IP in Logs
capture request header X-Haproxy-ACL len 256
capture request header X-Bad-User len 64
capture request header X-Forwarded-For len 64
capture request header User-Agent len 256
capture request header Host len 32
capture request header X-Geo len 8
capture request header X-Google-Bot len 8
capture request header Referer len 256
# Do Not Rate Control Google based bots. Fake Google bot attacks shall be filtered at NGINX
acl google-ua hdr(user-agent) -i -f <path-to-webclient-detection-directory>/google-ua.lst
http-request add-header X-Google-Bot %[req.fhdr(X-Google-Bot,-1)]Goog-YES, if google-ua
# Define a table that will store IPs associated with counter
stick-table type ip size 500k expire 30s store conn_cur,conn_rate(3s),http_req_rate(10s),http_err_rate(10s)
# Enable tracking of src IP in the sticktable - Secops
tcp-request content track-sc0 src
=======================================
# RATE LIMITING RULES
acl sensitive-urls path -i /api/app/login /api/app/otp /api/app/forgotpass
# Flag the request if the client already has 100 connections opened
http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-over-100-active-connections, if { src_conn_cur ge 100 }
# Flag the request if the client has opened more than 65 TCP connections in 3 seconds
http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-over-65-connections-in-3seconds, if { src_conn_rate ge 65 }
# Flag the request if the client has passed the HTTP error rate (10 HTTP errors in 10 seconds)
http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-10-errors-in-10-seconds, if { sc0_http_err_rate() gt 10 }
# Flag the request if the client has passed the HTTP request rate (70 HTTP requests in 10 seconds), except on the sensitive URLs handled below
http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-70-HTTPRequests-in-10-seconds, if { sc0_http_req_rate() gt 70 } !sensitive-urls
# Flag requests exceeding 30 requests in a 10-second interval on the sensitive URLs defined above
http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-Sensitive-URLs, if { sc0_http_req_rate() gt 30 } sensitive-urls
=======================================
# FLAGGING BAD-BOTS @ HAPROXY LEVEL
acl badbots hdr_sub(user-agent) -f <path-to-webclient-detection-directory>/bad-bots.lst
acl nullua hdr_len(user-agent) 0
acl availua hdr(user-agent) -m found
acl ua-regex hdr_reg(user-agent) -i .+?[/\s][\d.]+
acl tornodes src -f <path-to-webclient-detection-directory>/tor-exit-nodes.lst
acl http_trace method TRACE
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]BadBot, if badbots
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]No-UA, if nullua !google-ua
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]No-UA, if !availua !google-ua
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Invalid-UA, if !ua-regex !google-ua
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Tor-Node, if tornodes
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Trace-Method, if http_trace
=======================================
# FLAGGING SCANNERS @ HAPROXY LEVEL
acl scanner hdr_sub(user-agent) -f <path-to-webclient-detection-directory>/scanners.lst
http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Scanner, if scanner
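The rules above only flag and log offending clients. If you also want HAProxy to reject them outright, the lines below are a minimal sketch; the deny status codes are my own choice and deny_status needs a reasonably recent HAProxy version, so treat this as an assumption rather than part of the original setup:
=======================================
# OPTIONALLY DENY FLAGGED REQUESTS INSTEAD OF ONLY LOGGING THEM
http-request deny deny_status 429 if { req.fhdr(X-Haproxy-ACL,-1) -m found }
http-request deny deny_status 403 if { req.fhdr(X-Bad-User,-1) -m found }
=======================================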
NGINX Config Snippet:
location / {
# Enable reverse DNS verification for the friendly-bot User-Agents matched below
resolver 8.8.8.8;
rdns_allow "(.*)google(.*)";
rdns_allow "(.*)crawl.yahoo.net(.*)";
rdns_allow "(.*)search.msn.com(.*)";
rdns_allow "(.*)applebot.apple.com(.*)";
rdns_deny "^(?!(.*)google(.*))|^(?!(.*)crawl.yahoo.net(.*))|^(?!(.*)search.msn.com(.*))|^(?!(.*)applebot.apple.com(.*))";
if ($http_user_agent ~* "(.*)[Gg]oogle(.*)") {
rdns on;
}
if ($http_user_agent ~* "(.*)[Ss]lurp(.*)") {
rdns on;
}
if ($http_user_agent ~* "(.*)[Bb]ing(.*)") {
rdns on;
}
if ($http_user_agent ~* "(.*)[Aa]pplebot(.*)") {
rdns on;
}
# Set a Variable to take an action based on the result
if ($rdns_hostname ~* ((.*)googlebot\.com)) {
set $valid_bot 1;
}
if ($rdns_hostname ~* ((.*)google\.com)) {
set $valid_bot 1;
}
# Return 403 if the reverse lookup shows the claimed Googlebot is not genuine
if ($valid_bot = "0") {
return 403;
}
# Geo-blocking section ($allowed_country is defined in the http context; see the sketch below)
set $exclusions 0;
if ($allowed_country = yes) {
set $exclusions 1;
}
if ($exclusions = "0") {
return 451;
}
}