AWS system failing on HEAD request but hardly on GET requests on stress test
I'm running a stress test with Locust on:
- c4.xlarge (attacker has c4.4xlarge)
- 1 instance
- amazonlinux 2017.03
The load balancer is:
- classic type
- internet-facing
- stickiness is disabled for both 80 & 443
- 80 is forwarded to 80
- 443 is forwarded to 80
- idle timeout is 60s
- cross zone load balancing is enabled
- access logs disabled
- connection draining is enabled with timeout 300 seconds
- health check is configured as:
  - Ping Target: HTTP:80/status.html
  - Timeout: 5 seconds
  - Interval: 30 seconds
I do simple HEAD and GET requests to a /status.html endpoint with the same distribution numbers: 25000 users with 1000 spawned per second.
For the HEAD requests I get a lot of these errors:
- 504, GATEWAY_TIMEOUT
- 408, REQUEST_TIMEOUT
- 503, Service unavailable: Back-end server is at capacity
The error rate is about 10%. Strangely, for the GET requests I get hardly any errors: not even 1%.
Why would that happen?
If you need more details about the setup I can provide it. Unfortunately I am very new to AWS so I do not know what to provide. Sorry!
Here are some access logs from the production ELB from before the problem occurs. Sorry, I couldn't get logs from the stress test so far.
Here is a 504:
2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
And later a 503:
2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
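Entries like the ones above are easier to compare once they are split into named fields. The following is a minimal sketch, assuming the standard Classic Load Balancer access log field order; the names `parse_entry` and `errors_by_method` are hypothetical helpers, and the two sample lines are copied from the logs above. It tallies ELB 5xx responses per HTTP method, which is the comparison the HEAD versus GET question needs.

```python
import shlex
from collections import Counter

# Field order of a Classic Load Balancer access log entry.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_entry(line):
    """Split one access log line into a dict keyed by field name."""
    # shlex keeps the quoted request and user-agent strings together.
    entry = dict(zip(FIELDS, shlex.split(line)))
    entry["method"] = entry["request"].split(" ", 1)[0]
    entry["elb_status_code"] = int(entry["elb_status_code"])
    return entry


def errors_by_method(lines):
    """Return {method: (elb_5xx_count, total_requests)}."""
    totals, errors = Counter(), Counter()
    for line in lines:
        entry = parse_entry(line)
        totals[entry["method"]] += 1
        if entry["elb_status_code"] >= 500:
            errors[entry["method"]] += 1
    return {method: (errors[method], totals[method]) for method in totals}


# Two entries copied from the production logs above.
SAMPLE = [
    '2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 '
    '0.00006 10.751718 0.000021 200 200 0 159 '
    '"GET https://my.endpoint.com/some_script HTTP/1.1" '
    '"GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
    '2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 '
    '"POST https://my.endpoint.com/some_script HTTP/1.1" "-" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
]

if __name__ == "__main__":
    print(errors_by_method(SAMPLE))
```

In the 504 and 503 entries the backend field is `-` and all three timing fields are `-1`, so the load balancer never received a response from the instance for those requests.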
amazon-web-services amazon-ec2 load-balancing amazon-elb stress-testing
You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?
– Tim
Apr 25 at 5:30
I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.
– steros
Apr 25 at 7:47
I have also edited the title. I hope it is more appropriate!
– steros
Apr 25 at 7:56
Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.
– Tim
Apr 25 at 8:10
"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).
– Michael - sqlbot
Apr 25 at 13:24
asked Apr 25 at 2:27 by steros, edited Apr 26 at 5:54
1 Answer
Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.
Theory
I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:
- Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect
- Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.
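The gradual ramp-up can be sketched as a simple schedule of user counts over time. This is only an illustration: `ramp_schedule` is a hypothetical helper, and the three-hour ramp with 30-minute steps is an assumed reading of "at least a few hours", not a value AWS documents. Feeding the schedule into the load tool (Locust in this case) is left out.

```python
def ramp_schedule(target_users, ramp_seconds, step_seconds):
    """Return (elapsed_seconds, user_count) pairs for a linear ramp-up."""
    if ramp_seconds <= 0 or step_seconds <= 0:
        raise ValueError("durations must be positive")
    # Number of equal steps that fit into the ramp period (at least one).
    steps = max(1, ramp_seconds // step_seconds)
    schedule = []
    for i in range(1, steps + 1):
        users = round(target_users * i / steps)
        schedule.append((i * step_seconds, users))
    return schedule


if __name__ == "__main__":
    # Assumed values: 25000 users reached over 3 hours in 30-minute steps.
    for elapsed, users in ramp_schedule(25000, 3 * 3600, 1800):
        print(f"after {elapsed // 60:>3} min: {users} users")
```

Watching the error rate at each step shows whether errors appear only once a certain request rate is reached, or already at low load.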
ELB Background
Initially, a load balancer may be one small virtual server in each AZ distributing the load. As your load increases, AWS adds either more or larger servers behind the scenes to take your load, which is why the load balancer's DNS records can change regularly. If you throw a huge load at these small servers they will fail. My guess is that they prioritize traffic when overloaded, and a GET request is typically more important than a HEAD request.
Links
There is a relevant thread on the AWS forums, which seems to support my theory.
You should also look at the AWS network stress page to ensure you're not creating a DDoS.
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "2"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f964496%2faws-system-failing-on-head-request-but-hardly-on-get-requests-on-stress-test%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.
Theory
I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:
Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect- Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.
ELB Background
Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases, AWS adds either more or larger servers behind the scenes to take your load, which is why the DNS records can change regularly. If you throw a huge load at these small servers they will fail. When overloaded they likely prioritize traffic, and a GET request is typically more important than a HEAD request.
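One way to see this scaling from the outside is to resolve the load balancer's DNS name repeatedly during the test and watch the address set change. The sketch below uses only the standard library; `resolve_ips` and `diff_ips` are hypothetical helper names, and the `resolver` parameter exists only so the lookup can be swapped out.

```python
import socket
from typing import Callable, Set, Tuple


def resolve_ips(hostname: str, port: int = 80,
                resolver: Callable = socket.getaddrinfo) -> Set[str]:
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def diff_ips(before: Set[str], after: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) addresses between two resolutions."""
    return after - before, before - after
```

Call `resolve_ips` with your load balancer's hostname every minute or so and log the result of `diff_ips` against the previous set. It is also worth checking whether your load tool re-resolves the name during the run; if it resolves once and reuses the addresses, it may keep hitting the original nodes.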
Links
There is a relevant thread on the AWS forums, which seems to support my theory.
You should also look at the AWS network stress page to ensure you're not creating a DDoS.
answered Apr 25 at 18:27
Tim
You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?
– Tim
Apr 25 at 5:30
I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.
– steros
Apr 25 at 7:47
I have also edited the title. I hope it is more appropriate!
– steros
Apr 25 at 7:56
Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests when it is too busy. You also can't just throw a huge load at a load balancer; you have to ramp it up or ask AWS to provision it for a large load in advance. I suggest you run a 24-hour load test, ramping up slowly from zero to your target load over that time. Monitor how the errors change over time.
– Tim
Apr 25 at 8:10
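To compare HEAD and GET failures over the course of such a test, the results can be bucketed by time and method. This is a rough sketch under assumptions: `error_rates` is a hypothetical helper, each record is a `(timestamp_seconds, method, status)` tuple produced by your own load tool, and a status of 0 is used here to mean "no response".

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def error_rates(records: Iterable[Tuple[float, str, int]],
                bucket_s: int = 300) -> Dict[Tuple[int, str], float]:
    """Map (bucket_start_seconds, method) to the fraction of failed requests.

    A status of 0 stands for "no response" (timeout or connection error);
    any status >= 500 is also counted as a failure.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for timestamp, method, status in records:
        key = (int(timestamp // bucket_s) * bucket_s, method.upper())
        totals[key] += 1
        if status == 0 or status >= 500:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in sorted(totals)}
```

If the HEAD failure rate climbs with load while the GET rate stays flat, that would be consistent with the load balancer theory. If both climb together, the instances are the more likely suspect.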
"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).
– Michael - sqlbot
Apr 25 at 13:24