Wallarm API Firewall outperforms Nginx in a production environment
2021-09-15 08:43:50 · lab.wallarm.com

Wallarm API Firewall is a free, lightweight API firewall that protects your API endpoints in cloud-native environments using API schema validation. It relies on a positive security model: calls that match a predefined API specification are allowed, and everything else is rejected.

Wallarm API Firewall is available as a Docker container (with 15M+ pulls to date). You can run the API Firewall Docker container through docker-compose or in Kubernetes. For instructions on launching a sample app protected by Wallarm API Firewall, check out the Quick Start section of the API Firewall repository.
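
As a quick illustration, a standalone launch looks roughly like the following; the environment variable names follow the repository's documented configuration and may change between releases, so defer to the Quick Start for the authoritative version. APIFW_API_SPECS points at your OpenAPI 3.0 specification, APIFW_URL is the address the firewall listens on, and APIFW_SERVER_URL is the protected backend:

$ docker run --rm -it -p 8282:8282 \
    -v $(pwd)/openapi.yaml:/opt/resources/openapi.yaml \
    -e APIFW_API_SPECS=/opt/resources/openapi.yaml \
    -e APIFW_URL=http://0.0.0.0:8282 \
    -e APIFW_SERVER_URL=http://backend:9090 \
    wallarm/api-firewall:v0.6.4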

When creating Wallarm API Firewall, we prioritized speed and efficiency to ensure that our customers would have the fastest APIs possible. The firewall is written in Go and uses fasthttp, an HTTP request library that’s up to 10x faster than Go’s built-in net/http solution.
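
For a feel of the library's API, here is a minimal fasthttp server sketch (illustrative only, not API Firewall's code). Unlike net/http, which allocates fresh request and response objects for each request, fasthttp hands every request to the handler through a pooled, reusable RequestCtx:

// A minimal fasthttp server sketch, for illustration only.
package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

func main() {
	// ctx bundles the request and response; fasthttp pools and reuses
	// these objects across requests, which avoids per-request allocations.
	handler := func(ctx *fasthttp.RequestCtx) {
		ctx.SetContentType("application/json")
		ctx.SetBodyString(`{"status":"ok"}`)
	}
	log.Fatal(fasthttp.ListenAndServe(":8282", handler))
}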

In this article, we’ll show you how we measured and improved Wallarm API Firewall’s performance on a common API workload. We’ll also compare the performance of API Firewall to an nginx proxy. Finally, we’ll share detailed statistics and show the configurations we used.

How does the Wallarm API Firewall perform?

To evaluate the performance of each Wallarm API Firewall release, we built an internal performance profiling tool that relies on Go’s built-in runtime/pprof package.

A common scenario for an API firewall solution is handling HTTP POST requests with a JSON body. We chose to test a 27.5KB JSON file to simulate the body of the request. Many APIs handle data volumes smaller than 27.5KB per request, but we decided to use a larger file to highlight performance bottlenecks.

We ran our profiling tool with four scenarios: 1, 2, 10, and 10,000 requests per connection, to understand the performance implications of each scenario. We received the following results with API Firewall v0.6.2 (one release behind the latest version):

cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkFastOpenAPIServerPost1ReqPerConn        1559984              7597 ns/op            4176 B/op         51 allocs/op
BenchmarkFastOpenAPIServerPost2ReqPerConn        1704481              6968 ns/op            4176 B/op         51 allocs/op
BenchmarkFastOpenAPIServerPost10ReqPerConn       1839904              6430 ns/op            4176 B/op         51 allocs/op
BenchmarkFastOpenAPIServerPost10KReqPerConn      1795530              6573 ns/op            4176 B/op         51 allocs/op

The test showed that our firewall takes approximately 7,000 ns to process a request, with each request causing 51 memory allocations and using about 4KB of memory in total.
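
The profiling tool itself isn't public, but output in this format comes from Go's standard benchmark harness (go test -bench . -benchmem). Here is a minimal sketch of such a benchmark; handler and largeJSON are hypothetical stand-ins for the firewall's request handler and the 27.5KB payload:

// bench_test.go: a sketch of a benchmark that produces the output format above.
package main

import (
	"bytes"
	"testing"

	"github.com/valyala/fasthttp"
)

// largeJSON stands in for the 27.5KB test payload (28160 bytes).
var largeJSON = bytes.Repeat([]byte(" "), 28160)

// handler stands in for the firewall's fasthttp request handler.
func handler(ctx *fasthttp.RequestCtx) {
	ctx.SetStatusCode(fasthttp.StatusOK)
}

func BenchmarkFastOpenAPIServerPost(b *testing.B) {
	// Build the request once, so the loop measures handler work only.
	var ctx fasthttp.RequestCtx
	ctx.Request.Header.SetMethod(fasthttp.MethodPost)
	ctx.Request.Header.SetContentType("application/json")
	ctx.Request.SetRequestURI("/test/signup")
	ctx.Request.SetBody(largeJSON)

	b.ReportAllocs() // emits the B/op and allocs/op columns
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		handler(&ctx)
	}
}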

One way to compare the performance of HTTP firewalls is to look at the number of requests they can handle per second.

To get a requests-per-second figure for Wallarm API Firewall, we used Apache's HTTP server benchmarking tool, ab. We ran it with 10,000 requests over a single connection stream (no parallelism), with our JSON file as the test payload:

$ ab -n 10000 -p large.json -T application/json http://127.0.0.1:8282/test/signup

Document Path:          /test/signup
Document Length:        20 bytes
Concurrency Level:      1
Time taken for tests:   14.620 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2150000 bytes
Total body sent:        283770000
HTML transferred:       200000 bytes
Requests per second:    684.01 [#/sec] (mean)
Time per request:       1.462 [ms] (mean)
Time per request:       1.462 [ms] (mean, across all concurrent requests)
Transfer rate:          143.62 [Kbytes/sec] received
                        18955.32 kb/s sent
                        19098.94 kb/s total
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    1   0.4      1       4
Waiting:        1    1   0.4      1       4
Total:          1    1   0.4      1       4
Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      2
  80%      2
  90%      2
  95%      2
  98%      3
  99%      3
 100%      4 (longest request)

We used the resulting 1.462 ms/request timing as the performance baseline for the next Wallarm API Firewall release.

Boosting performance with Wallarm API Firewall v0.6.4

For the v0.6.4 release of Wallarm API Firewall, we focused on increasing the firewall’s performance.

We rewrote the firewall code to avoid converting requests from fasthttp objects into the net/http format. Previously we had to perform this conversion because the OpenAPI validation library only understood objects from Go's built-in net/http package and could not work with fasthttp requests directly; for v0.6.4, we implemented support for fasthttp objects in the OpenAPI validation code itself.
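
For illustration, the removed conversion step looks roughly like the following when built from stock libraries (fasthttp's fasthttpadaptor package plus the kin-openapi validator); the firewall's actual pre-v0.6.4 code differed in detail, and v0.6.4 skips this step entirely:

// Sketch of the old path: materialize a net/http request from the fasthttp
// context, then hand it to a net/http-only OpenAPI validator. The conversion
// copies headers and body on every request; that is the overhead v0.6.4 removes.
package validation

import (
	"net/http"

	"github.com/getkin/kin-openapi/openapi3filter"
	"github.com/getkin/kin-openapi/routers"
	"github.com/valyala/fasthttp"
	"github.com/valyala/fasthttp/fasthttpadaptor"
)

func validateViaNetHTTP(ctx *fasthttp.RequestCtx, route *routers.Route,
	pathParams map[string]string) error {
	var r http.Request
	if err := fasthttpadaptor.ConvertRequest(ctx, &r, true); err != nil {
		return err
	}
	// *fasthttp.RequestCtx implements context.Context, so it doubles as the
	// validation context here.
	return openapi3filter.ValidateRequest(ctx, &openapi3filter.RequestValidationInput{
		Request:    &r,
		PathParams: pathParams,
		Route:      route,
	})
}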

We also reduced the number of times we copy HTTP header information during request processing: we now move more information around using pointers, instead of relying on Go to create copies of objects passed by value into function calls.
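
The underlying idea is plain Go mechanics, sketched below with a hypothetical header structure (not the firewall's actual type): a struct passed by value is copied on every call, while a pointer shares a single instance.

// Illustrative only; parsedHeaders is a hypothetical stand-in type.
package headers

type parsedHeaders struct {
	host        []byte
	contentType []byte
	cookies     map[string][]byte
}

// By value: Go copies the struct (its slice and map headers) on each call.
func checkByValue(h parsedHeaders) bool { return len(h.host) > 0 }

// By pointer: only a single machine word is passed; no per-call copy.
func checkByPointer(h *parsedHeaders) bool { return len(h.host) > 0 }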

Finally, instead of parsing request objects in their entirety, we used the bufio.Reader.Peek method to examine only the necessary parts of each request, combining it with methods from the strconv package for efficient parsing of numeric values.
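
To make the technique concrete, here is a minimal, self-contained sketch (not the firewall's actual parsing code): Peek returns a view into bufio.Reader's internal buffer without copying or consuming it, and strconv converts just the bytes of interest without fmt-style scanning overhead.

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strconv"
)

func main() {
	raw := []byte("POST /test/signup HTTP/1.1\r\nContent-Length: 28160\r\n\r\n")
	br := bufio.NewReader(bytes.NewReader(raw))

	// Peek exposes buffered bytes without advancing the reader or copying.
	head, err := br.Peek(len(raw))
	if err != nil {
		panic(err)
	}

	// Slice out just the Content-Length value and parse it with strconv.
	key := []byte("Content-Length: ")
	start := bytes.Index(head, key) + len(key)
	end := start + bytes.IndexByte(head[start:], '\r')
	n, err := strconv.Atoi(string(head[start:end]))
	if err != nil {
		panic(err)
	}
	fmt.Println("content length:", n) // prints: content length: 28160
}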

After implementing the changes, we reran our profiling tool and got a faster set of results:

cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkFastOpenAPIServerPost1ReqPerConn        2487135              4781 ns/op             944 B/op         21 allocs/op
BenchmarkFastOpenAPIServerPost2ReqPerConn        2791074              4261 ns/op             944 B/op         21 allocs/op
BenchmarkFastOpenAPIServerPost10ReqPerConn       3134810              3805 ns/op             944 B/op         21 allocs/op
BenchmarkFastOpenAPIServerPost10KReqPerConn      2934400              4046 ns/op             944 B/op         21 allocs/op

We successfully reduced memory allocations by over half, from 51 allocations/request to 21 allocations/request. We also achieved a significant drop in memory usage, from 4KB per request to less than 1KB per request.

How did this translate into the number of requests handled per second? We reran the Apache benchmarking tool with our original parameters:

$ ab -n 10000 -p large.json -T application/json http://127.0.0.1:8282/test/signup

Document Path:          /test/signup
Document Length:        20 bytes
Concurrency Level:      1
Time taken for tests:   13.389 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2150000 bytes
Total body sent:        283770000
HTML transferred:       200000 bytes
Requests per second:    746.89 [#/sec] (mean)
Time per request:       1.339 [ms] (mean)
Time per request:       1.339 [ms] (mean, across all concurrent requests)
Transfer rate:          156.82 [Kbytes/sec] received
                        20697.88 kb/s sent
                        20854.70 kb/s total
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    1   0.5      1       5
Waiting:        1    1   0.4      1       5
Total:          1    1   0.5      1       5
Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      2
  80%      2
  90%      2
  95%      2
  98%      2
  99%      3
 100%      5 (longest request)

Our request time averaged 1.339 ms per request, approximately 9% faster than the previous release.

Wallarm API Firewall vs. nginx proxy speed comparison

To put the performance numbers in context, we compared Wallarm API Firewall’s performance against nginx, an HTTP server that’s known for its fast response times.

We configured nginx in proxy_pass mode, as this is the mode optimized for passing requests through. In proxy mode, nginx does not parse request bodies; it simply forwards each request that matches a proxy_pass directive to the backend byte-for-byte.

nginx has many compile-time options that can impact performance, so we’re sharing the complete details of the nginx default Ubuntu binary we used for testing:

$ nginx -V
nginx version: nginx/1.18.0 (Ubuntu)
built with OpenSSL 1.1.1f  31 Mar 2020
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-KTLRnK/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

Here’s the configuration we supplied to nginx:

worker_processes 8; # equal to the number of CPU cores on our machine
...
client_body_buffer_size 128K; # greater than our JSON file size
...
        location / {
                proxy_pass http://127.0.0.1:9090;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
        }

We used 8 nginx worker processes based on the nginx performance tuning guide (see the CPU Affinity section). We set the buffer size to 128KB to ensure that our JSON payload would fit into the buffer in its entirety and would not cause any slowdowns. The proxy_pass configuration sets two headers that are common in a proxy configuration: Host and X-Real-IP.

We ran Apache benchmarking tests with the same number of requests and concurrency as we used in our Wallarm API Firewall testing. nginx was just over twice as fast in terms of requests handled per second, achieving 1571 requests/second compared to API Firewall's 747:

$ ab -c 1 -n 10000 -p ./large.json -T application/json http://127.0.0.1/test/signup

Document Path:          /test/signup
Document Length:        20 bytes

Concurrency Level:      1
Time taken for tests:   6.365 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1920000 bytes
Total body sent:        283720000
HTML transferred:       200000 bytes
Requests per second:    1570.99 [#/sec] (mean)
Time per request:       0.637 [ms] (mean)
Time per request:       0.637 [ms] (mean, across all concurrent requests)
Transfer rate:          294.56 [Kbytes/sec] received
                        43527.56 kb/s sent
                        43822.13 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    1   0.1      0      10
Waiting:        0    0   0.1      0      10
Total:          0    1   0.1      1      10
ERROR: The median and mean for the processing time are more than twice the standard
       deviation apart. These results are NOT reliable.

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%     10 (longest request)

We then wanted to check whether these findings would hold in a production-like environment, so we increased the number of concurrent requests from 1 to 200. Under this load, nginx handled 7888 requests per second:

$ ab -c 200 -n 10000 -p ./large.json -T application/json http://127.0.0.1/test/signup

Document Path:          /test/signup
Document Length:        20 bytes

Concurrency Level:      200
Time taken for tests:   1.268 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1920000 bytes
Total body sent:        283720000
HTML transferred:       200000 bytes
Requests per second:    7887.76 [#/sec] (mean)
Time per request:       25.356 [ms] (mean)
Time per request:       0.127 [ms] (mean, across all concurrent requests)
Transfer rate:          1478.96 [Kbytes/sec] received
                        218546.42 kb/s sent
                        220025.38 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    7   4.4      6      20
Processing:     0   18  10.9     16      68
Waiting:        0   15   9.9     14      60
Total:          0   25  13.2     24      78

Percentage of the requests served within a certain time (ms)
  50%     24
  66%     28
  75%     31
  80%     33
  90%     44
  95%     50
  98%     55
  99%     60
 100%     78 (longest request)

The Wallarm API Firewall handled a whopping 13005 requests/second, a 65% improvement over nginx:

$ ab -c 200 -n 10000 -p ./large.json -T application/json http://127.0.0.1:8282/test/signup

Document Path:          /test/signup
Document Length:        20 bytes

Concurrency Level:      200
Time taken for tests:   0.769 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2150000 bytes
Total body sent:        283770000
HTML transferred:       200000 bytes
Requests per second:    13005.81 [#/sec] (mean)
Time per request:       15.378 [ms] (mean)
Time per request:       0.077 [ms] (mean, across all concurrent requests)
Transfer rate:          2730.71 [Kbytes/sec] received
                        360415.95 kb/s sent
                        363146.67 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5   1.6      5      12
Processing:     2   10   5.4      9      59
Waiting:        2    8   5.2      7      56
Total:          3   15   5.7     14      68

Percentage of the requests served within a certain time (ms)
  50%     14
  66%     15
  75%     16
  80%     17
  90%     18
  95%     23
  98%     36
  99%     44
 100%     68 (longest request)

During testing, Wallarm API Firewall validated every single data field of the 27.5KB JSON body against the OpenAPI schema, whereas nginx only forwarded the requests and set proxy-related headers. If nginx had done any API validation at all, it would likely have fallen even further behind Wallarm API Firewall.

We experimented with different nginx configurations to try to speed up the HTTP proxy, but did not observe any meaningful improvements.

Submit your fastest nginx configuration for a chance to win $500!

While we’re proud of Wallarm API Firewall’s performance, we want to make sure our comparison with nginx is fair. If you know how to configure nginx to match API Firewall’s throughput on our hardware configuration with 200 concurrent connections, we want to hear from you! We’re offering a $500 prize for the fastest nginx configuration submitted for our test scenario.

The machine we used for all performance tests has an Intel® Core™ i7-9750H CPU with clock speed of 2.60GHz. You can download the sample JSON file that we used from the fasthttp repo on GitHub.

For a chance to win, submit your configuration as a new issue in the wallarm/api-firewall GitHub repository.

