AWS system failing on HEAD request but hardly on GET requests on stress testDoes EC2 have a rate limit? It seems if you use AB to test the request we get throttledAWS ELB - Stress test - Transient ErrorCan ping ec2 server (ubuntu/apache) but don't get response from http requestHow can I perform IIS stress test 40,000 http requests per second?AWS Ubuntu 11.04 failing to apt-get update/upgradeSpot instances frequently terminated in AWS auto scaling group (failing system health check)Ubuntu raring apt-get update is failing on AWS/EC2?AWS: Kitterman SPF record test failingI'm trying to stress test a site but no hosting company will let me. What can I do?Whats better: stress test the entire system vs profiling and stress testing specific parts?

Can there be a single technologically advanced nation, in a continent full of non-technologically advanced nations?

Copy previous line to current line from text file

Is there an official reason for not adding a post-credits scene?

How can I support myself financially as a 17 year old with a loan?

Adjacent DEM color matching in QGIS

Word meaning as function of the composition of its phonemes

What is the solution to this metapuzzle from a university puzzling column?

US born but as a child of foreign diplomat

Gerrymandering Puzzle - Rig the Election

How to safely wipe a USB flash drive

Would you use llamarse for an animal's name?

Why did Thanos need his ship to help him in the battle scene?

Will 700 more planes a day fly because of the Heathrow expansion?

3D Volume in TIKZ

29er Road Tire?

How can I get a job without pushing my family's income into a higher tax bracket?

What to use instead of cling film to wrap pastry

How to write a 12-bar blues melody

Do publishers care if submitted work has already been copyrighted?

How can I roleplay a follower-type character when I as a player have a leader-type personality?

Why does sound not move through a wall?

Is “snitty” a popular American English term? What is its origin?

Floor of Riemann zeta function

What does 'made on' mean here?



AWS system failing on HEAD request but hardly on GET requests on stress test


Does EC2 have a rate limit? It seems if you use AB to test the request we get throttledAWS ELB - Stress test - Transient ErrorCan ping ec2 server (ubuntu/apache) but don't get response from http requestHow can I perform IIS stress test 40,000 http requests per second?AWS Ubuntu 11.04 failing to apt-get update/upgradeSpot instances frequently terminated in AWS auto scaling group (failing system health check)Ubuntu raring apt-get update is failing on AWS/EC2?AWS: Kitterman SPF record test failingI'm trying to stress test a site but no hosting company will let me. What can I do?Whats better: stress test the entire system vs profiling and stress testing specific parts?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








1















I'm running a stress test with Locust on:



  • c4.xlarge (attacker has c4.4xlarge)

  • 1 instance

  • amazonlinux 2017.03

The load balancer is:



  • classic type

  • internet-facing

  • stickiness is disabled for both 80 & 443

  • 80 is forwarded to 80

  • 443 is forwarded to 80

  • idle timeout is 60s

  • cross zone load balancing is enabled

  • access logs disabled

  • connection draining is enabled with timeout 300 seconds

  • health check is configured as:

    • Ping Target: HTTP:80/status.html

    • Timeout:5 seconds

    • Interval:30 seconds


I do simple HEAD and GET requests to a /status.html endpoint with the same distribution numbers 25000 users with 1000 spawned per second.
For the head requests I get a lot of these errors:



  • 504, GATEWAY_TIMEOUT

  • 408, REQUEST_TIMEOUT

  • 503, Service unavailable: Back-end server is at capacity

The error rate is at about 10%. But strangely for the GET request I get hardly any errors. It is not even 1%.



Why would that happen?



If you need more details about the setup I can provide it. Unfortunately I am very new to AWS so I do not know what to provide. Sorry!



Here is some access logs from the production elb before the problem occurs, sorry I couldn't get the logs on stress test so far.



here a 504:



2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


and later 503:



2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2









share|improve this question
























  • You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

    – Tim
    Apr 25 at 5:30











  • I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

    – steros
    Apr 25 at 7:47











  • I have also edited the title. I hope it is more appropriate!

    – steros
    Apr 25 at 7:56











  • Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.

    – Tim
    Apr 25 at 8:10






  • 1





    "access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

    – Michael - sqlbot
    Apr 25 at 13:24

















1















I'm running a stress test with Locust on:



  • c4.xlarge (attacker has c4.4xlarge)

  • 1 instance

  • amazonlinux 2017.03

The load balancer is:



  • classic type

  • internet-facing

  • stickiness is disabled for both 80 & 443

  • 80 is forwarded to 80

  • 443 is forwarded to 80

  • idle timeout is 60s

  • cross zone load balancing is enabled

  • access logs disabled

  • connection draining is enabled with timeout 300 seconds

  • health check is configured as:

    • Ping Target: HTTP:80/status.html

    • Timeout:5 seconds

    • Interval:30 seconds


I do simple HEAD and GET requests to a /status.html endpoint with the same distribution numbers 25000 users with 1000 spawned per second.
For the head requests I get a lot of these errors:



  • 504, GATEWAY_TIMEOUT

  • 408, REQUEST_TIMEOUT

  • 503, Service unavailable: Back-end server is at capacity

The error rate is at about 10%. But strangely for the GET request I get hardly any errors. It is not even 1%.



Why would that happen?



If you need more details about the setup I can provide it. Unfortunately I am very new to AWS so I do not know what to provide. Sorry!



Here is some access logs from the production elb before the problem occurs, sorry I couldn't get the logs on stress test so far.



here a 504:



2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


and later 503:



2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2









share|improve this question
























  • You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

    – Tim
    Apr 25 at 5:30











  • I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

    – steros
    Apr 25 at 7:47











  • I have also edited the title. I hope it is more appropriate!

    – steros
    Apr 25 at 7:56











  • Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.

    – Tim
    Apr 25 at 8:10






  • 1





    "access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

    – Michael - sqlbot
    Apr 25 at 13:24













1












1








1








I'm running a stress test with Locust on:



  • c4.xlarge (attacker has c4.4xlarge)

  • 1 instance

  • amazonlinux 2017.03

The load balancer is:



  • classic type

  • internet-facing

  • stickiness is disabled for both 80 & 443

  • 80 is forwarded to 80

  • 443 is forwarded to 80

  • idle timeout is 60s

  • cross zone load balancing is enabled

  • access logs disabled

  • connection draining is enabled with timeout 300 seconds

  • health check is configured as:

    • Ping Target: HTTP:80/status.html

    • Timeout:5 seconds

    • Interval:30 seconds


I do simple HEAD and GET requests to a /status.html endpoint with the same distribution numbers 25000 users with 1000 spawned per second.
For the head requests I get a lot of these errors:



  • 504, GATEWAY_TIMEOUT

  • 408, REQUEST_TIMEOUT

  • 503, Service unavailable: Back-end server is at capacity

The error rate is at about 10%. But strangely for the GET request I get hardly any errors. It is not even 1%.



Why would that happen?



If you need more details about the setup I can provide it. Unfortunately I am very new to AWS so I do not know what to provide. Sorry!



Here is some access logs from the production elb before the problem occurs, sorry I couldn't get the logs on stress test so far.



here a 504:



2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


and later 503:



2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2









share|improve this question
















I'm running a stress test with Locust on:



  • c4.xlarge (attacker has c4.4xlarge)

  • 1 instance

  • amazonlinux 2017.03

The load balancer is:



  • classic type

  • internet-facing

  • stickiness is disabled for both 80 & 443

  • 80 is forwarded to 80

  • 443 is forwarded to 80

  • idle timeout is 60s

  • cross zone load balancing is enabled

  • access logs disabled

  • connection draining is enabled with timeout 300 seconds

  • health check is configured as:

    • Ping Target: HTTP:80/status.html

    • Timeout:5 seconds

    • Interval:30 seconds


I do simple HEAD and GET requests to a /status.html endpoint with the same distribution numbers 25000 users with 1000 spawned per second.
For the head requests I get a lot of these errors:



  • 504, GATEWAY_TIMEOUT

  • 408, REQUEST_TIMEOUT

  • 503, Service unavailable: Back-end server is at capacity

The error rate is at about 10%. But strangely for the GET request I get hardly any errors. It is not even 1%.



Why would that happen?



If you need more details about the setup I can provide it. Unfortunately I am very new to AWS so I do not know what to provide. Sorry!



Here is some access logs from the production elb before the problem occurs, sorry I couldn't get the logs on stress test so far.



here a 504:



2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


and later 503:



2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2






amazon-web-services amazon-ec2 load-balancing amazon-elb stress-testing






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Apr 26 at 5:54







steros

















asked Apr 25 at 2:27









sterossteros

1114




1114












  • You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

    – Tim
    Apr 25 at 5:30











  • I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

    – steros
    Apr 25 at 7:47











  • I have also edited the title. I hope it is more appropriate!

    – steros
    Apr 25 at 7:56











  • Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.

    – Tim
    Apr 25 at 8:10






  • 1





    "access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

    – Michael - sqlbot
    Apr 25 at 13:24

















  • You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

    – Tim
    Apr 25 at 5:30











  • I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

    – steros
    Apr 25 at 7:47











  • I have also edited the title. I hope it is more appropriate!

    – steros
    Apr 25 at 7:56











  • Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.

    – Tim
    Apr 25 at 8:10






  • 1





    "access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

    – Michael - sqlbot
    Apr 25 at 13:24
















You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

– Tim
Apr 25 at 5:30





You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

– Tim
Apr 25 at 5:30













I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

– steros
Apr 25 at 7:47





I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

– steros
Apr 25 at 7:47













I have also edited the title. I hope it is more appropriate!

– steros
Apr 25 at 7:56





I have also edited the title. I hope it is more appropriate!

– steros
Apr 25 at 7:56













Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.

– Tim
Apr 25 at 8:10





Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer, you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24 hour load test, ramping up slowly from zero to your target load over that time. Monitor errors as they change with time.

– Tim
Apr 25 at 8:10




1




1





"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

– Michael - sqlbot
Apr 25 at 13:24





"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

– Michael - sqlbot
Apr 25 at 13:24










1 Answer
1






active

oldest

votes


















2














Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.



Theory



I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:




  • Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect

  • Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.

ELB Background



Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases AWS behind the scenes adds either more or larger servers to take your load, which is why DNS can change regularly. If you throw a huge load at these small servers they will fail, likely prioritizing traffic, and a GET request is typically more important than a HEAD request.



Links



There is a relevant thread on the AWS forums, which seems to support my theory.



You should also look at the AWS network stress page to ensure you're not creating a DDOS.






share|improve this answer























    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "2"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f964496%2faws-system-failing-on-head-request-but-hardly-on-get-requests-on-stress-test%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.



    Theory



    I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:




    • Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect

    • Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.

    ELB Background



    Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases AWS behind the scenes adds either more or larger servers to take your load, which is why DNS can change regularly. If you throw a huge load at these small servers they will fail, likely prioritizing traffic, and a GET request is typically more important than a HEAD request.



    Links



    There is a relevant thread on the AWS forums, which seems to support my theory.



    You should also look at the AWS network stress page to ensure you're not creating a DDOS.






    share|improve this answer



























      2














      Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.



      Theory



      I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:




      • Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect

      • Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.

      ELB Background



      Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases AWS behind the scenes adds either more or larger servers to take your load, which is why DNS can change regularly. If you throw a huge load at these small servers they will fail, likely prioritizing traffic, and a GET request is typically more important than a HEAD request.



      Links



      There is a relevant thread on the AWS forums, which seems to support my theory.



      You should also look at the AWS network stress page to ensure you're not creating a DDOS.






      share|improve this answer

























        2












        2








        2







        Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.



        Theory



        I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:




        • Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect

        • Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.

        ELB Background



        Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases AWS behind the scenes adds either more or larger servers to take your load, which is why DNS can change regularly. If you throw a huge load at these small servers they will fail, likely prioritizing traffic, and a GET request is typically more important than a HEAD request.



        Links



        There is a relevant thread on the AWS forums, which seems to support my theory.



        You should also look at the AWS network stress page to ensure you're not creating a DDOS.






        share|improve this answer













        Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.



        Theory



        I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:




        • Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect

        • Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.

        ELB Background



        Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases AWS behind the scenes adds either more or larger servers to take your load, which is why DNS can change regularly. If you throw a huge load at these small servers they will fail, likely prioritizing traffic, and a GET request is typically more important than a HEAD request.



        Links



        There is a relevant thread on the AWS forums, which seems to support my theory.



        You should also look at the AWS network stress page to ensure you're not creating a DDOS.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Apr 25 at 18:27









        TimTim

        18.3k41950




        18.3k41950



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Server Fault!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f964496%2faws-system-failing-on-head-request-but-hardly-on-get-requests-on-stress-test%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Wikipedia:Vital articles Мазмуну Biography - Өмүр баян Philosophy and psychology - Философия жана психология Religion - Дин Social sciences - Коомдук илимдер Language and literature - Тил жана адабият Science - Илим Technology - Технология Arts and recreation - Искусство жана эс алуу History and geography - Тарых жана география Навигация менюсу

            Bruxelas-Capital Índice Historia | Composición | Situación lingüística | Clima | Cidades irmandadas | Notas | Véxase tamén | Menú de navegacióneO uso das linguas en Bruxelas e a situación do neerlandés"Rexión de Bruxelas Capital"o orixinalSitio da rexiónPáxina de Bruselas no sitio da Oficina de Promoción Turística de Valonia e BruxelasMapa Interactivo da Rexión de Bruxelas-CapitaleeWorldCat332144929079854441105155190212ID28008674080552-90000 0001 0666 3698n94104302ID540940339365017018237

            What should I write in an apology letter, since I have decided not to join a company after accepting an offer letterShould I keep looking after accepting a job offer?What should I do when I've been verbally told I would get an offer letter, but still haven't gotten one after 4 weeks?Do I accept an offer from a company that I am not likely to join?New job hasn't confirmed starting date and I want to give current employer as much notice as possibleHow should I address my manager in my resignation letter?HR delayed background verification, now jobless as resignedNo email communication after accepting a formal written offer. How should I phrase the call?What should I do if after receiving a verbal offer letter I am informed that my written job offer is put on hold due to some internal issues?Should I inform the current employer that I am about to resign within 1-2 weeks since I have signed the offer letter and waiting for visa?What company will do, if I send their offer letter to another company