AWS system failing on HEAD request but hardly on GET requests on stress test


I'm running a stress test with Locust on:

- c4.xlarge (attacker has c4.4xlarge)
- 1 instance
- amazonlinux 2017.03

The load balancer is:

- classic type
- internet-facing
- stickiness is disabled for both 80 & 443
- 80 is forwarded to 80
- 443 is forwarded to 80
- idle timeout is 60s
- cross zone load balancing is enabled
- access logs disabled
- connection draining is enabled with timeout 300 seconds
- health check is configured as:
  - Ping Target: HTTP:80/status.html
  - Timeout: 5 seconds
  - Interval: 30 seconds

I send simple HEAD and GET requests to the /status.html endpoint, using the same distribution numbers for both: 25,000 users, spawned at 1,000 per second.
For the HEAD requests I get a lot of these errors:

- 504, GATEWAY_TIMEOUT
- 408, REQUEST_TIMEOUT
- 503, Service unavailable: Back-end server is at capacity

The error rate is about 10%. Strangely, the GET requests produce hardly any errors: not even 1%.

Why would that happen?
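To make the comparison reproducible outside of Locust, a minimal sketch along the following lines could send the same requests with each method and tally the error rate per method. The helper names (`probe`, `error_rates`) and the host name in the usage comment are assumptions for illustration; they are not part of the original test setup.

```python
import http.client
from collections import Counter


def probe(host, path, method, port=80, timeout=10):
    """Send one request; return the HTTP status code, or None on a network error."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path)
        response = conn.getresponse()
        response.read()  # drain the body (empty for HEAD) before closing
        return response.status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def error_rates(results):
    """Map each method to its fraction of failed requests.

    results is an iterable of (method, status) pairs; a status of None
    or any status >= 400 counts as a failure.
    """
    totals, errors = Counter(), Counter()
    for method, status in results:
        totals[method] += 1
        if status is None or status >= 400:
            errors[method] += 1
    return {method: errors[method] / totals[method] for method in totals}


# Hypothetical usage (the host name is a placeholder):
# results = [(m, probe("my-elb.example.com", "/status.html", m))
#            for m in ("HEAD", "GET") for _ in range(100)]
# print(error_rates(results))
```

Running both methods from the same client, at the same rate, helps rule out differences in how the load tool itself treats HEAD and GET.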

If you need more details about the setup I can provide them. Unfortunately I am very new to AWS, so I do not know what to provide. Sorry!

Here are some access logs from the production ELB, taken before the problem occurs. Sorry, I haven't been able to get logs from the stress test so far.

Here is a 504:

2019-04-26T02:41:20.330496Z XXX xxx.xxx.xxx.xxx:63054 xxx.xxx.xxx.xxx:80 0.000101 31.01214 0.00002 200 200 166 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Java/1.6.0_26" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:40.594005Z XXX xxx.xxx.xxx.xxx:50071 xxx.xxx.xxx.xxx:80 0.00006 10.751718 0.000021 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.446229Z XXX xxx.xxx.xxx.xxx:63063 xxx.xxx.xxx.xxx:80 0.000065 30.900277 0.00002 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:20.517259Z XXX xxx.xxx.xxx.xxx:56506 xxx.xxx.xxx.xxx:80 0.000053 30.829553 0.000018 200 200 0 161 "GET https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.652118Z XXX xxx.xxx.xxx.xxx:50120 xxx.xxx.xxx.xxx:80 0.000069 8.69724 0.000024 401 401 60 48 "POST https://my.endpoint.com/some_script HTTP/1.1" "go-resty/1.10.2 (https://github.com/go-resty/resty)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.360268Z XXX xxx.xxx.xxx.xxx:45201 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.361199Z XXX xxx.xxx.xxx.xxx:50120 - -1 -1 -1 504 0 146 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2

And later a 503:

2019-04-26T02:41:44.490135Z XXX xxx.xxx.xxx.xxx:50044 xxx.xxx.xxx.xxx:80 0.000062 28.220316 0.000019 200 200 0 320 "GET https://my.endpoint.com/some_script HTTP/1.1" "restify/4.3.1 (x64-linux; v8/5.1.281.111; OpenSSL/1.0.2n) node/6.12.3" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:32.082311Z XXX xxx.xxx.xxx.xxx:32882 xxx.xxx.xxx.xxx:80 0.000031 40.62881 0.000022 200 200 117 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:43.859743Z XXX xxx.xxx.xxx.xxx:32781 xxx.xxx.xxx.xxx:80 0.000077 28.851417 0.000015 200 200 184 78 "POST https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.2.1 curl/7.53.1 PHP/5.6.38" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.532781Z XXX xxx.xxx.xxx.xxx:51094 xxx.xxx.xxx.xxx:80 0.000027 21.178497 0.000014 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:51.568865Z XXX xxx.xxx.xxx.xxx:45267 xxx.xxx.xxx.xxx:80 0.000026 21.142531 0.00002 200 200 0 159 "GET https://my.endpoint.com/some_script HTTP/1.1" "GuzzleHttp/6.3.3 curl/7.40.0 PHP/5.5.25" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:46.195626Z XXX xxx.xxx.xxx.xxx:55182 xxx.xxx.xxx.xxx:80 0.000084 26.516262 0.000017 200 200 269 160 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:41:42.982043Z XXX xxx.xxx.xxx.xxx:56428 xxx.xxx.xxx.xxx:80 0.000114 29.747779 0.000019 200 200 107 305 "POST https://my.endpoint.com/some_script HTTP/1.1" "Apache-HttpClient/4.0.3 (java 1.5)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.543180Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-04-26T02:42:13.587978Z XXX xxx.xxx.xxx.xxx:47351 - -1 -1 -1 503 0 0 0 "POST https://my.endpoint.com/some_script HTTP/1.1" "-" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
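Log lines like these are easier to reason about once they are split into named fields. The sketch below assumes the lines follow the Classic Load Balancer access log layout (timestamp, ELB name, client, backend, three timing fields, two status codes, byte counts, request, user agent, SSL cipher, SSL protocol); the helper names `parse_line` and `summarize` are made up for illustration.

```python
import shlex
from collections import Counter

# Field order of a Classic Load Balancer access log entry.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time",
    "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes",
    "request", "user_agent", "ssl_cipher", "ssl_protocol",
]

TIMING_FIELDS = (
    "request_processing_time",
    "backend_processing_time",
    "response_processing_time",
)


def parse_line(line):
    """Split one access log line into a dict of named fields."""
    # shlex keeps the quoted request and user-agent strings together.
    entry = dict(zip(FIELDS, shlex.split(line)))
    for name in TIMING_FIELDS:
        entry[name] = float(entry[name])  # -1 means no timing was recorded
    return entry


def summarize(lines):
    """Count entries per ELB status code and find the slowest backend time."""
    entries = [parse_line(line) for line in lines if line.strip()]
    by_status = Counter(entry["elb_status_code"] for entry in entries)
    slowest = max(
        (entry["backend_processing_time"] for entry in entries),
        default=None,
    )
    return by_status, slowest
```

Applied to the excerpts above, this kind of summary makes two things visible at a glance: the successful requests already spend roughly 10 to 40 seconds in the backend, and the 504 and 503 entries carry `-1` in all three timing fields.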

edited Apr 26 at 5:54

asked Apr 25 at 2:27

steros


You've said you're getting an autoscaling group failure, but ASG only increases server counts based on load. Do you mean you're getting a load balancer error? What kind of load balancer is it?

– Tim
Apr 25 at 5:30

I'm not sure what is failing. I only know it is an ASG so I suppose it might be that? I will add the load balancer info.

– steros
Apr 25 at 7:47

I have also edited the title. I hope it is more appropriate!

– steros
Apr 25 at 7:56

Your title suggests the instance is failing, but are you sure it's the instance? Could it be the load balancer? I wonder if it discards HEAD requests if it is too busy. You also can't just throw a huge load at a load balancer; you have to ramp it up or ask AWS to provision it for large load in advance. I suggest you run a 24-hour load test, ramping up slowly from zero to your target load over that time. Monitor how the errors change over time.

– Tim
Apr 25 at 8:10

"access logs disabled" would seem like the place to start. Turn them on and review them, as well as any logs on your web servers. Also, is there a reason for choosing a Classic balancer? Unless you have a specific reason, you should probably be using an Application Load Balancer (ALB).

– Michael - sqlbot
Apr 25 at 13:24

amazon-web-services amazon-ec2 load-balancing amazon-elb stress-testing


1 Answer

Here's my best guess, based on what little you've told us. Really, you need to be looking at auto-scaling group logs and instance logs.

Theory

I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:

- Pre-warm the load balancer so it can scale, by contacting AWS and telling them the load you expect.
- Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest you initially ramp up to high request rates over at least a few hours.
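The second option can be planned as a stepped schedule. The following is only a sketch: the function names, the three-hour duration, and the step count are illustrative assumptions, and feeding the result into the load tool (for example through the custom load shape hook that newer Locust versions provide) is left out.

```python
def ramp_schedule(target_users, duration_s, steps):
    """Return (start_second, user_count) pairs growing linearly to target_users."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    step_length = duration_s / steps
    return [
        (round(i * step_length), round(target_users * (i + 1) / steps))
        for i in range(steps)
    ]


def users_at(schedule, elapsed_s):
    """Return the user count that applies elapsed_s seconds into the test."""
    current = 0
    for start, users in schedule:
        if elapsed_s >= start:
            current = users
    return current


# Example: reach 25,000 users in five equal steps over three hours.
schedule = ramp_schedule(target_users=25000, duration_s=3 * 3600, steps=5)
for start, users in schedule:
    print(f"t+{start:>5}s -> {users} users")
```

Holding each step for a while before moving on makes it easier to see at which load level errors first appear.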

ELB Background

Initially a load balancer may be one small virtual server in each AZ distributing the load. As your load increases, AWS adds more or larger servers behind the scenes to take your load, which is why the IP addresses behind the load balancer's DNS name can change regularly. If you throw a huge load at these small servers they will fail. My guess is that they prioritize traffic when that happens, and a GET request is typically more important than a HEAD request.
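One way to watch this scaling from the outside is to resolve the load balancer's DNS name periodically during a test and note when the set of addresses changes. This is a rough sketch using only the standard library; the function names and the host name in the usage comment are placeholders, not values from the question.

```python
import socket


def resolve_ips(hostname, port=80):
    """Return the set of IP addresses the name currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def diff_ips(previous, current):
    """Return (added, removed) address sets between two resolutions."""
    return current - previous, previous - current


# Hypothetical usage during a load test (host name is a placeholder):
# import time
# seen = resolve_ips("my-elb.example.com")
# while True:
#     time.sleep(60)
#     now = resolve_ips("my-elb.example.com")
#     added, removed = diff_ips(seen, now)
#     if added or removed:
#         print("added:", sorted(added), "removed:", sorted(removed))
#     seen = now
```

If the address set never changes while errors climb, that would be consistent with the load balancer not having scaled yet, although it does not prove it.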

Links

There is a relevant thread on the AWS forums, which seems to support my theory.

You should also look at the AWS network stress page to make sure your test doesn't amount to a DDoS.

answered Apr 25 at 18:27

Tim

18.3k41950

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "2"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f964496%2faws-system-failing-on-head-request-but-hardly-on-get-requests-on-stress-test%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Here's my best guess. It's a guess based on what little you've told us, whereas really you need to be looking at auto-scaling group logs and instance logs.

Theory

I suspect that you're slamming a high volume of requests at a load balancer without warming it. You have two options:

Pre-warm a load balancer so it can scale, by contacting AWS and telling them the load you expect

Increase your load gradually so the load balancer has a chance to scale. This can take longer than you expect, so I suggest initially you scale up over at least a few hours to high request rates.

ELB Background

Links

There is a relevant thread on the AWS forums, which seems to support my theory.

You should also look at the AWS network stress page to make sure your test doesn't amount to a DDoS attack.

answered Apr 25 at 18:27

Tim

