AWS system failing on HEAD request but hardly on GET requests on stress test
I'm running a stress test with Locust on:
- c4.xlarge (attacker has c4.4xlarge)
- 1 instance
- amazonlinux 2017.03
The load balancer is:
- classic type
- internet-facing
- stickiness is disabled for both 80 & 443
- 80 is forwarded to 80
- 443 is forwarded to 80
- idle timeout is 60s
- cross zone load balancing is enabled
- access logs disabled
- connection draining is enabled with timeout 300 seconds
- health check is configured as:
  - Ping Target: HTTP:80/status.html
  - Timeout: 5 seconds
  - Interval: 30 seconds
I send simple HEAD and GET requests to the /status.html endpoint, both with the same load profile: 25,000 users, spawned at 1,000 per second.
For the HEAD requests I get a lot of these errors:
- 504, GATEWAY_TIMEOUT
- 408, REQUEST_TIMEOUT
- 503, Service unavailable: Back-end server is at capacity
The error rate is about 10%. Strangely, for the GET requests I get hardly any errors: not even 1%.
Why would that happen?
If you need more details about the setup, I can provide them. Unfortunately I am very new to AWS, so I do not know what to provide. Sorry!
Here are some access logs from the production ELB from before the problem occurs; sorry, I couldn't get logs from the stress test so far.
First, a 504:
2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
and later a 503:
2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
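The log lines above follow the Classic Load Balancer access log layout for HTTP/HTTPS listeners. A minimal Python sketch for pulling the fields apart, using only the standard library, might look like this; the names `ElbEntry`, `parse_elb_line` and `reached_backend` are made up for illustration:

```python
import shlex
from typing import NamedTuple


class ElbEntry(NamedTuple):
    """One entry of a Classic Load Balancer access log (HTTP/HTTPS listener)."""
    timestamp: str
    elb: str
    client: str                      # client ip:port
    backend: str                     # backend ip:port, or "-" if none was recorded
    request_processing_time: float   # seconds; -1 when not available
    backend_processing_time: float   # seconds; -1 when not available
    response_processing_time: float  # seconds; -1 when not available
    elb_status_code: int
    backend_status_code: int
    received_bytes: int
    sent_bytes: int
    request: str
    user_agent: str
    ssl_cipher: str
    ssl_protocol: str


def parse_elb_line(line: str) -> ElbEntry:
    """Split one log line into named fields; quoted fields are kept whole."""
    parts = shlex.split(line)
    if len(parts) != 15:
        raise ValueError(f"expected 15 fields, got {len(parts)}")
    return ElbEntry(
        parts[0], parts[1], parts[2], parts[3],
        float(parts[4]), float(parts[5]), float(parts[6]),
        int(parts[7]), int(parts[8]), int(parts[9]), int(parts[10]),
        parts[11], parts[12], parts[13], parts[14],
    )


def reached_backend(entry: ElbEntry) -> bool:
    """True if the load balancer recorded a backend instance for the request."""
    return entry.backend != "-"


# Two lines copied from the logs above: one success, one 504.
SAMPLE_OK = (
    '2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 '
    '0.000101 31.01214 0.00002 200 200 166 160 '
    '"POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2'
)
SAMPLE_504 = (
    '2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 '
    '"POST https://my.endpoint.com/some_script HTTP/1.1" "-" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2'
)

if __name__ == "__main__":
    for line in (SAMPLE_OK, SAMPLE_504):
        entry = parse_elb_line(line)
        print(entry.elb_status_code, reached_backend(entry),
              entry.backend_processing_time)
```

Read this way, the successful requests in the samples already spent roughly 8 to 40 seconds in backend processing, while the 504 and 503 lines have no backend and -1 for all three timings. That would be consistent with the load balancer not being able to hand those requests to an instance, though this is an interpretation and not something the logs prove on their own.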
amazon-web-services amazon-ec2 load-balancing amazon-elb stress-testing
You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?
– Tim
Apr 25 at 5:30
I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.
– steros
Apr 25 at 7:47
I have also edited the title. I hope it is more appropriate!
– steros
Apr 25 at 7:56
Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.
– Tim
Apr 25 at 8:10
"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).
– Michael - sqlbot
Apr 25 at 13:24
asked Apr 25 at 2:27 by steros, edited Apr 26 at 5:54
1 Answer
Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.
Theory
I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:
- Pre-warm the load balancer so it can scale, by contacting AWS and telling them the load you expect.
- Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest that initially you scale up to high request rates over at least a few hours.
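As a rough sketch of the second option, the function below builds a stepped ramp-up plan instead of spawning everything at once. For comparison, 25,000 users spawned at 1,000 per second reach full load in about 25 seconds. The name `ramp_schedule`, the 1.5x growth factor and the 300-second step are illustrative assumptions, not values from AWS or from this answer; stretch them out for a ramp over several hours.

```python
from typing import List, Tuple


def ramp_schedule(start_users: int,
                  target_users: int,
                  growth: float = 1.5,       # assumed multiplier per step
                  step_seconds: int = 300    # assumed hold time per step
                  ) -> List[Tuple[int, int]]:
    """Return (elapsed_seconds, user_count) steps that grow geometrically
    from start_users until target_users is reached."""
    if start_users <= 0 or target_users < start_users:
        raise ValueError("need 0 < start_users <= target_users")
    if growth <= 1.0 or step_seconds <= 0:
        raise ValueError("need growth > 1.0 and step_seconds > 0")

    steps: List[Tuple[int, int]] = []
    users = start_users
    elapsed = 0
    while users < target_users:
        steps.append((elapsed, users))
        # Always advance by at least one user so the loop terminates.
        users = min(target_users, max(users + 1, int(users * growth)))
        elapsed += step_seconds
    steps.append((elapsed, target_users))
    return steps


if __name__ == "__main__":
    for elapsed, users in ramp_schedule(1000, 25000):
        print(f"t+{elapsed:>5}s  {users:>6} users")
```

With these defaults, going from 1,000 to 25,000 users takes nine steps and 2,400 seconds (40 minutes). The resulting user counts can be fed to whatever staged-load mechanism your load tool offers.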
ELB Background
Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases, AWS adds either more or larger servers behind the scenes to take it, which is why the load balancer's DNS records can change regularly. If you throw a huge load at these small servers before they have scaled, they will fail. My guess is that they prioritize traffic when overloaded, and a GET request is typically more important than a HEAD request.
Links
There is a relevant thread on the AWS forums, which seems to support my theory.
You should also look at the AWS network stress page to make sure your test doesn't amount to a DDoS.
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "2"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f964496%2faws-system-failing-on-head-request-but-hardly-on-get-requests-on-stress-test%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.
Theory
I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:
Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect- Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.
ELB Background
Initially, a load balancer may be just one small virtual server in each AZ distributing the load. As your load increases, AWS adds more or larger servers behind the scenes to take it, which is why the load balancer's DNS records can change regularly. If you throw a huge load at these small servers before they have scaled, they will fail. My guess is that they prioritize traffic when overloaded, and a GET request is typically more important than a HEAD request.
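One rough way to see the scaling described above is to resolve the load balancer's DNS name during the test and note when the set of addresses changes. This is a sketch using only the Python standard library. The helper names `resolve_ips` and `ip_changes` and the hostname in the example are hypothetical, and a changed address set is only a hint that scaling happened, not proof.

```python
import socket
from typing import Dict, Set


def resolve_ips(hostname: str, port: int = 80) -> Set[str]:
    """Return the set of IP addresses the resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def ip_changes(before: Set[str], after: Set[str]) -> Dict[str, Set[str]]:
    """Report which addresses appeared or disappeared between two lookups."""
    return {"added": after - before, "removed": before - after}


if __name__ == "__main__":
    # Hypothetical name; substitute your own load balancer's DNS name.
    elb_name = "my-load-balancer.example.com"
    first = resolve_ips(elb_name)
    second = resolve_ips(elb_name)  # in practice, look up again some minutes later
    print(ip_changes(first, second))
```

Local resolver caching can hide changes, so compare lookups taken some time apart.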
Links
There is a relevant thread on the AWS forums, which seems to support my theory.
You should also look at the AWS network stress page to ensure your test doesn't amount to a DDoS.
answered Apr 25 at 18:27
Tim
You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?
– Tim
Apr 25 at 5:30
I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.
– steros
Apr 25 at 7:47
I have also edited the title. I hope it is more appropriate!
– steros
Apr 25 at 7:56
Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests when it is too busy. You also can't just throw a huge load at a load balancer; you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24-hour load test, ramping up slowly from zero to your target load over that time. Monitor how the errors change with time.
– Tim
Apr 25 at 8:10
"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).
– Michael - sqlbot
Apr 25 at 13:24