HAPROXY(Rate Limiter + BadUser Detection)


Platforms used are completely Open Source. Please refer to their respective documentation from the respective links below:

NGINX: https://nginx.org/en/docs/

HAPROXY: http://www.haproxy.org/#docs

In this article I shall be discussing on how can we build our own Service Protection Layer which shall include modules to protect our web application against DoS, DDoS, Bad Bots based on User-Agents and a Web Application Firewall.

Any additional layer in our web application architecture, when we introduce an extra service layer, the chances of increase in latency is high. Hence for best results always follow the below flow when you are building a Service Protection Layer in front of your firewall.

HAPROXY ———-StaticContent————> Web Application

HAPROXY —–DynamicContent —–> NGINX ——> Web Application

Using HAPROXY we shall perform:

Rate Limiting

User-Agent Detection (Bot Mitigation)

Using NGINX we shall perform:

Fake Google/Yahoo/Bing Bots


Please download the signatures from https://github.com/aarvee11/webclient-detection or feel free to write your own !

The art of detecting and blocking attacks is all about finding Signatures, Patterns and understanding the Attack Vector!

Rate Limiting: Rate Limiting is a technique used to limit a particular client from abusing our platform with over-whelming requests ensuring that the resources on the web application server are always available for the legitimate users. Rate Limiting for a web application should be gauged for 4 different parameters:

  1. Number of TCP Connections per client IP
  2. Rate of Incoming TCP Connections from a Client IP
  3. Rate of Incoming HTTP Request from a HTTP Client
  4. Rate of HTTP Errors getting generated by a HTTP Client

Note: Client at application/HTTP layer can be identified based on IP, Cookie, Parameter, IP+Cookie, IP+Cookie+User-Agent or any Other HTTP Header like Authorization etc.

When you are noticing a very huge traffic getting caught at Rate Controls, prepare yourself to start scaling your service protection later infra using techniques like AWS Auto-Scaling based on Network-In and Network-Out parameters at the instance level

Bad-bots: Most of the bots always leave a signature behind. User-Agent is one of the key-headers which detected the same. We shall be using the User-Agent comparison against a known bad bots list that can be downloaded from the git mentioned above

Fake Google/Yahoo/Bing/Applebots: Most of the web admins do not set any security controls on any of the friendly bots like mentioned above. Reason? Definitely if we block the above bots, our SEO, Visibility and Functionality also can take a hit as these are friendly bots and not bad-bots. How do they identify themselves? Connecting IP Address and User-Agent. All the above service providers clearly declare that they cannot publish their IP Address range as it keeps changing very dynamically and the only option left is to check the User-Agent which is consistent. But the challenge for a security administrator is that the User-Agent field is pretty much configurable and hence any client out there can start using the well known friendly bots User-Agent Signatures and cause harm to our application. Hence I adopted to the method of performing a reverse lookup approach for the connecting IP Address as suggested by Google! I am using the rDNS module to perform the connecting IP Address based on their User-Agents and when found they are fake are getting rejected automatically. Cool isn’t it?

GeoIP: ICANN distributes the IP Addresses all over the world and hence based on the same information, we can determine the country from which a client is getting connected and accessing our web application. What if we have been seeing a good amount of attacks from a specific country? As we cant keep mapping the IP Address, we can always opt for a Geo Blocking method by making NGINX Geo Aware using the GeoIP Module. Once done, we can choose which countries to be allowed explicitly and block others or vice-versa!

For the ease of administrators and not to keep this article very theoretical, I have provided config snippets below:

For any further queries please feel free to InMail me!

Happy Hunting folks!

HAPROXY Config Snippet:

In the Frontend Section:


# Capture the Rate Limit Headers and Actual Client IP in Logs

 capture request header X-Haproxy-ACL len 256

 capture request header X-Bad-User len 64

 capture request header X-Forwarded-For len 64

 capture request header User-Agent len 256

 capture request header Host len 32

 capture request header X-Geo len 8

 capture request header X-Google-Bot len 8

 capture request header Referer len 256

 # Do Not Rate Control Google based bots. Fake Google bot attacks shall be filtered at NGINX

 acl google-ua hdr(user-agent) -i -f <path-to-webclient-detection-directory>/google-ua.lst

 http-request add-header X-Google-Bot %[req.fhdr(X-Google-Bot,-1)]Goog-YES, if google-ua

 # Define a table that will store IPs associated with counter

 stick-table type ip size 500k expire 30s store conn_cur,conn_rate(3s),http_req_rate(10s),http_err_rate(10s)

 # Enable tracking of src IP in the sticktable - Secops

 tcp-request content track-sc0 src



acl sensitive-urls path -i /api/app/login path -i /api/app/otp path -i /api/app/forgotpass

 # Reject the new connection if the client already has 100 opened

 http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-over-100-active-connections, if { src_conn_cur ge 100 }

 # Reject the new connection if the client has opened more than 65 TCP connections in 3 seconds

 http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-over-65-connections-in-3seconds, if { src_conn_rate ge 65 }

 # Reject the connection if the client has passed the HTTP error rate (10 HTTP Errors in 10 Seconds)

 http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-10-errors-in-10-seconds, if { sc0_http_err_rate() gt 10 }

 # Reject the connection if the client has passed the HTTP request rate (70 HTTP Requests in 10 Seconds)

 http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-70-HTTPRequests-in-10-seconds, if { sc0_http_req_rate() gt 70 } !listing

 # Reject Requests that are hitting more than 20 requests in 10 seconds interval for restaurant listing - Sensitive URLs

 http-request add-header X-Haproxy-ACL %[req.fhdr(X-Haproxy-ACL,-1)]Rate-Limit-Listing, if { sc0_http_req_rate() gt 30 } sensitive-urls



acl badbots hdr_sub(user-agent) -f <path-to-webclient-detection-directory>/bad-bots.lst

 acl nullua hdr_len(user-agent) 0

 acl availua hdr(user-agent) -m found

 acl ua-regex hdr_reg(user-agent) -i .+?[/\s][\d.]+

 acl tornodes src -f <path-to-webclient-detection-directory>/tor-exit-nodes.lst

 http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]BadBot, if badbots

 http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]No-UA, if nullua !google-ua

 http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]No-UA, if !availua !google-ua

 http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Invalid-UA, if !ua-regex !google-ua

 http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Tor-Node, if tornodes

 http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Trace-Method, if http_trace



acl scanner hdr_sub(user-agent) -f <path-to-webclient-detection-directory>/scanners.lst

http-request add-header X-Bad-User %[req.fhdr(X-Bad-User,-1)]Scanner, if scanner

 NGINX Config Snippet:

location / {

        #Enable Reverse DNS for all the Googlebots


 rdns_allow "(.*)google(.*)";

 rdns_allow "(.*)crawl.yahoo.net(.*)";

 rdns_allow "(.*)search.msn.com(.*)";

 rdns_allow "(.*)applebot.apple.com(.*)";

 rdns_deny "^(?!(.*)google(.*))|^(?!(.*)crawl.yahoo.net(.*))|^(?!(.*)search.msn.com(.*))|^(?!(.*)applebot.apple.com(.*))";

        if ($http_user_agent ~* "(.*)[Gg]oogle(.*)") {

            rdns on;


        if ($http_user_agent ~* "(.*)[Ss]lurp(.*)") {

            rdns on;


        if ($http_user_agent ~* "(.*)[Bb]ing(.*)") {

            rdns on;


        if ($http_user_agent ~* "(.*)[Aa]pplebot(.*)") {

            rdns on;


        # Set a Variable to take an action based on the result

        if ($rdns_hostname ~* ((.*)googlebot\.com)) {

            set $valid_bot 1;


        if ($rdns_hostname ~* ((.*)google\.com)) {

            set $valid_bot 1;


        # Return 403 is not Google

        if ($valid_bot = "0") {

            return 403;


 #Geo Block Section

        if ($allowed_country = yes) {

            set $exclusions 1;


        if ($exclusions = "0") {

            return 451;



Share it on your Social Network !


Hits: 267