Most efficient way to batch delete S3 Files



      I'd like to be able to batch delete thousands or tens of thousands of files at a time on S3. Each file would be anywhere from 1 MB to 50 MB. Naturally, I don't want the user (or my server) to wait while the files are being deleted. Hence, the questions:

      1. How does S3 handle file deletion, especially when deleting large numbers of files?

      2. Is there an efficient way to do this that makes AWS do most of the work? By efficient, I mean making the fewest requests to S3, taking the least time, and using the fewest resources on my servers.

      amazon-s3 batch-processing

      asked Apr 2 '15 at 4:06 by SudoKill, edited Apr 2 '15 at 4:58 by tpml7




















          5 Answers
































          AWS supports bulk deletion of up to 1000 objects per request using the S3 REST API and its various wrappers. This method assumes you know the S3 object keys you want to remove (that is, it's not designed to handle something like a retention policy, files that are over a certain size, etc.).

          The S3 REST API can specify up to 1000 files to be deleted in a single request, which is much quicker than making individual requests. Remember, each request is an HTTP (thus TCP) request, so each request carries overhead. You just need to know the objects' keys and create an HTTP request (or use a wrapper in your language of choice). AWS provides great information on this feature and its usage. Just choose the method you're most comfortable with!

          I'm assuming your use case involves end users specifying a number of specific files to delete at once, rather than initiating a task such as "purge all objects that refer to picture files" or "purge all files older than a certain date" (which I believe is easy to configure separately in S3).

          If so, you'll know the keys that you need to delete. It also means the user will likely want more real-time feedback about whether their file was deleted successfully or not. References to exact keys are supposed to be very quick, since S3 was designed to scale efficiently despite handling an extremely large amount of data.

          If not, you can look into asynchronous API calls. You can read a bit about how they'd work in general from this blog post, or search for how to do it in the language of your choice. This would allow the deletion request to take up its own thread, and the rest of the code can execute without making a user wait. Or, you could offload the request to a queue. But both of these options needlessly complicate either your code (asynchronous code can be annoying) or your environment (you'd need a service/daemon/container/server to handle the queue), so I'd avoid this scenario if possible.

          Edit: I don't have the reputation to post more than 2 links, but you can see Amazon's comments on request rate and performance here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html And the S3 FAQ notes that bulk deletion is the way to go if possible.
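          For reference, here's a minimal sketch of what such a bulk request looks like with the AWS CLI (the bucket name and keys below are placeholders): a manifest file listing up to 1000 keys, and a single delete-objects call that removes all of them in one request.

          delete.json:

          {
            "Objects": [
              {"Key": "path/to/file-0001"},
              {"Key": "path/to/file-0002"}
            ],
            "Quiet": true
          }

          aws s3api delete-objects --bucket MY_BUCKET_NAME --delete file://delete.json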






          answered Apr 2 '15 at 19:27 by Ed D'Azzo












































            The excruciatingly slow option is s3 rm --recursive if you actually like waiting.



            Running parallel s3 rm --recursive with differing --include patterns is slightly faster, but a lot of time is still spent waiting, as each process individually fetches the entire key list in order to perform the --include pattern matching locally.



            Enter bulk deletion.



            I found I was able to get the most speed by deleting 1000 keys at a time using aws s3api delete-objects.



            Here's an example:



            cat file-of-keys | xargs -P8 -n1000 bash -c 'aws s3api delete-objects --bucket MY_BUCKET_NAME --delete "Objects=[$(printf "Key=%s," "$@")],Quiet=true"' _


            • The -P8 option on xargs controls the parallelism. It's eight in this case, meaning 8 instances of 1000 deletions at a time.

            • The -n1000 option tells xargs to bundle 1000 keys for each aws s3api delete-objects call.

            • Removing ,Quiet=true or changing it to false will spew out server responses.

            • Note: there's an easily missed _ at the end of that command line. @VladNikiforov posted an excellent explanation of what it's for in the comments, so I'm just going to link to that.

            But how do you get file-of-keys?



            If you already have your list of keys, good for you. Job complete.



            If not, here's one way I guess:



            aws s3 ls "s3://MY_BUCKET_NAME/SOME_SUB_DIR" | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ |SOME_SUB_DIR|p" >file-of-keys





            answered Jun 22 '18 at 6:38 by antak (edited Oct 2 '18 at 1:27)




















            • Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing the objects (going beyond 1 parallel process was enough to hit the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr '\n' '\0' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

              – SEK
              Aug 13 '18 at 18:09












            • You probably should also have stressed the importance of the _ at the end :) I missed it, and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

              – Vlad Nikiforov
              Oct 1 '18 at 12:42












            • @VladNikiforov Cheers, edited.

              – antak
              Oct 2 '18 at 1:30











            • One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot of keys (10M in my case) you may have a network error, or a throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000-key batches. Now for each file I can issue the delete command and then delete the file. If anything goes wrong, I can continue. (A sketch of this split-based variant follows below.)

              – joelittlejohn
              Apr 3 at 12:32
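            A minimal sketch of that split-based, resumable variant (bucket name and file names are placeholders, and it assumes keys contain no whitespace):

            # Split the key list into 1000-key chunks
            split -l 1000 file-of-keys batch-

            # One delete-objects request per chunk; remove a chunk file only after
            # its request succeeds, so a failed run can simply be re-executed
            for f in batch-*; do
              keys=$(printf 'Key=%s,' $(cat "$f"))
              aws s3api delete-objects --bucket MY_BUCKET_NAME \
                --delete "Objects=[${keys}],Quiet=true" \
              && rm "$f"
            done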
































            I was frustrated by the performance of the web console for this task. I found that the AWS CLI command does this well. For example:



            aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files



            For a large file hierarchy, this may take a considerable amount of time. You can set this running in a tmux or screen session and check back later.
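            For example, something along these lines would let the delete run unattended in a detached tmux session (the session name is arbitrary):

            # Start the recursive delete in a detached tmux session
            tmux new-session -d -s s3-cleanup \
              'aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files'

            # Re-attach later to check on progress
            tmux attach -t s3-cleanup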






            answered Aug 9 '17 at 19:01 by dannyman


















            • It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk.

              – Brandon
              Feb 22 '18 at 4:35

































            A neat trick is using lifecycle rules to handle the deletion for you. You can queue up a rule to delete the prefix or objects that you want, and Amazon will just take care of the deletion.



            https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
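            As a rough sketch (bucket name, prefix, and the one-day expiry are placeholders), a rule like the following expires everything under a prefix, after which Amazon deletes those objects in the background:

            lifecycle.json:

            {
              "Rules": [
                {
                  "ID": "purge-old-uploads",
                  "Filter": {"Prefix": "old-uploads/"},
                  "Status": "Enabled",
                  "Expiration": {"Days": 1}
                }
              ]
            }

            aws s3api put-bucket-lifecycle-configuration --bucket MY_BUCKET_NAME --lifecycle-configuration file://lifecycle.json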






            answered Apr 9 at 20:59 by cam8001

















































              Without knowing how you're managing the s3 buckets, this may or may not be particularly useful.



              The AWS CLI tools have a command called "sync", which can be particularly effective for ensuring S3 has the correct objects. If you, or your users, are managing S3 from a local filesystem, you may be able to save a ton of work determining which objects need to be deleted by using the CLI tools.



              http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
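              For example (the local path, bucket, and prefix are placeholders), sync with --delete also removes any S3 objects that no longer exist locally:

              aws s3 sync ./local-dir s3://MY_BUCKET_NAME/some-prefix --delete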






              answered Apr 2 '15 at 19:42 by Bill B






















